Our strategy consists of implementing networks based on logical gates, modeled by perceptrons with fixed weights and biases. This hybrid neural model was introduced in [9, 10]. Here, each perceptron in the network is activated by a so-called squashing activation function: a differentiable, parametric family of functions that satisfies natural invariance requirements and contains the rectified linear unit as a particular case [16, 17]. These squashing functions approximate the cutting function in the nilpotent logical operators. A relevant characteristic of this family is its differentiability, which is essential for gradient-based optimization.
In this investigation, we implemented the following function:
where the parameter is a nonzero real value that needs to be adjusted for the model to converge.
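For illustration, a minimal NumPy sketch of a squashing activation of this kind is given below. The particular parameterization and the parameter name `beta` are assumptions based on the literature cited above and may differ from the exact form of Eq. 2 used in this work.

```python
import numpy as np

def squashing(x, beta=10.0):
    """Differentiable approximation of the cutting function [x] = min(1, max(0, x)).

    A sketch in the spirit of [16, 17]; as beta -> +infinity the function converges
    pointwise to the cutting function while remaining smooth, which keeps gradients
    well defined for backpropagation.
    """
    # (1/beta) * ln((1 + e^(beta*x)) / (1 + e^(beta*(x-1)))), written in a
    # numerically stable way via log(1 + e^a) = logaddexp(0, a)
    return (np.logaddexp(0.0, beta * x) - np.logaddexp(0.0, beta * (x - 1.0))) / beta
```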
Thus, a perceptron in the neural network's hidden layers can model a threshold-based nilpotent operator [9, 10]: a conjunction, a disjunction, or even an aggregative operator.
Consequently, the weights of the first layer are learned, while the hidden layers of the pre-designed neural block act as logical operators with frozen weights and biases. In other words:
The first layer is trainable and has been implemented using an Exponential Linear Unit (ELU) activation function.
At the same time, the activation functions in the hidden layers model the cutting function of the nilpotent logical operators using the so-called squashing function (defined by Eq. 2), which avoids the vanishing gradient problem. Besides logical operators, preference operators can also be modeled this way [10].
The final layer is again trainable, with a sigmoid activation function.
The weights in the first and last layers are optimized during training to establish an association between the input parameters and the output y. In the second layer, we define nodes with different, frozen weights and biases (see Eq. 1), grouping different relations between the input parameters. Thus, each of these nodes is essentially a hypothesis grouping all of the parameters with different statistical weights. Finally, the additional internal layers perform logical operations; some of them are summarized in Table 2. In Fig. 3, we illustrate this architecture, considering 4 nodes.
Some examples of logical operators and their corresponding implementation (Csiszár et al. 2020c)
For now, with simplicity in mind, we implement two AND layers (conjunctions) followed by an OR layer (disjunction) to logically evaluate the nodes, a structure that models human reasoning in the decision process (see the sketch below).
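A hypothetical Keras sketch of this block is shown below: a trainable ELU input layer producing 4 nodes, frozen logic layers (two conjunctions followed by a disjunction) with squashing activations, and a trainable sigmoid output layer. The number of input features, the squashing sharpness `beta`, and the way the 4 nodes are split between the two conjunctions are assumptions for illustration, not the exact configuration used in this work.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

BETA = 10.0  # assumed sharpness of the squashing activation

def squashing(x, beta=BETA):
    # Smooth approximation of the cutting function [x] = min(1, max(0, x))
    return (tf.math.softplus(beta * x) - tf.math.softplus(beta * (x - 1.0))) / beta

def logic_layer(bias):
    # Frozen dense layer implementing a nilpotent operator [x_1 + ... + x_n + bias]:
    # bias = -(n - 1) gives a conjunction, bias = 0 a disjunction (cf. Table 2).
    return layers.Dense(
        1,
        activation=squashing,
        kernel_initializer=keras.initializers.Constant(1.0),
        bias_initializer=keras.initializers.Constant(float(bias)),
        trainable=False,
    )

n_features = 12  # assumed number of input parameters

inputs = keras.Input(shape=(n_features,))
nodes = layers.Dense(4, activation="elu")(inputs)                  # trainable first layer, 4 nodes
first_pair = layers.Lambda(lambda t: t[:, 0:2])(nodes)
second_pair = layers.Lambda(lambda t: t[:, 2:4])(nodes)
and_1 = logic_layer(bias=-1.0)(first_pair)                         # AND over nodes 1-2
and_2 = logic_layer(bias=-1.0)(second_pair)                        # AND over nodes 3-4
or_out = logic_layer(bias=0.0)(layers.Concatenate()([and_1, and_2]))  # OR over the conjunctions
outputs = layers.Dense(1, activation="sigmoid")(or_out)            # trainable output layer
model = keras.Model(inputs, outputs)
```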
In both the reference model (dense layers with ReLU activation functions) and our implemented Logic-Operator Neural Network (LONN), we used the mean squared error (MSE) between labels and predictions as the loss function. For the optimization we used Adam, an algorithm for gradient-based optimization of stochastic objective functions [19], with a learning rate set to 0.02 (see Table 4).
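The corresponding training setup might look as follows; this is a sketch matching only the reported settings (MSE loss, Adam with learning rate 0.02), while the epoch count and batch size are assumptions, and `X_train` / `y_train` denote the preprocessed training data described below.

```python
# Compile and fit the model defined above (illustrative values for epochs/batch size)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.02),
    loss="mean_squared_error",
)
history = model.fit(X_train, y_train, epochs=200, batch_size=32)
```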
Model-specific hyperparameters for comparison: dense-layer ReLU network vs. LONN
Since the categorical parameters were binary (for instance sex, smoke, anemia), we converted them to 1s and 0s. The other numerical parameters were normalized using min-max normalization, x' = (x - x_min)/(x_max - x_min), where x_min and x_max are the minimal and maximal values of the vector to which x belongs. With these data transformations, we obtained a homogeneous input matrix for the deep-learning model.
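A minimal preprocessing sketch under these transformations is given below; the file name, the label column name, and the exact column list are hypothetical and only illustrate the binary encoding and min-max normalization described above.

```python
import pandas as pd

df = pd.read_csv("patient_data.csv")              # hypothetical file name
binary_cols = ["sex", "smoke", "anemia"]          # binary categorical parameters
numeric_cols = [c for c in df.columns if c not in binary_cols + ["target"]]

df[binary_cols] = df[binary_cols].astype(int)     # encode as 0/1
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].min()) / (
    df[numeric_cols].max() - df[numeric_cols].min()
)
```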
Finally, we selected 80% of the shuffled data for training to obtain a balanced training sample and to avoid potential biases as well as overfitting during training. The optimizer hyperparameters are listed in Table 3, while the model hyperparameters are listed in Table 4. These parameters were manually tuned; an implementation with automatically tuned parameters will be presented in future work.
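The 80/20 shuffled split could be obtained as sketched below; the label column name and random seed are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["target"]).to_numpy()
y = df["target"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=42
)
```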
Hyperparameters of the Adam optimizer for the dense-layer ReLU network and the LONN
Therefore, our LONN model simulates cognitive processes such as rational, logical thinking, with the caveat that this logic is combined with fuzziness: the logical operations are not exact but essentially fuzzy, owing to the implemented continuous-valued operators (see Fig. 4) [20].
Representation of natural thinking processes (A) and their implementation by neural networks (B). Logical processes, such as the combination of "and" and "or" operations, are implemented as fuzzy logic, representing natural uncertainties in the thinking process
Even though this design was inspired by cognitive processes, our aim is not to reproduce native human thinking in silico, but rather to implement processes closer to natural ones so that the information processing becomes interpretable. Furthermore, we aim to retain a flexible modeling method that can be applied in different, relevant medical environments: the model needs only a few parameters from the medical space, yet addresses the major medical workflow, from medical analyses to diagnoses and treatment selection. The main relevant result is plausible suggestions of medical treatments, which should be applicable in nearly all medical domains.