Our strategy consists of implementing networks based on logical gates, modeled by perceptrons with fixed weights and biases. This hybrid neural model was introduced in [9, 10]. Here, each perceptron in the network is activated by a so-called squashing activation function: a differentiable, parametric family of functions that satisfies natural invariance requirements and contains the rectified linear unit as a particular case [16, 17]. These squashing functions approximate the cutting function in the nilpotent logical operators. A relevant characteristic of this family is its differentiability, which is essential for gradient-based optimization.
In this investigation, we implemented the following function:
where the parameter is a nonzero real value that needs to be adjusted for the model to converge.
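For illustration, a minimal NumPy sketch of a squashing activation of this kind is given below. The particular parameterization and the parameter name `beta` are assumptions based on the literature cited above and may differ from the exact form of Eq. 2 used in this work.

```python
import numpy as np

def squashing(x, beta=10.0):
    """Differentiable approximation of the cutting function [x] = min(1, max(0, x)).

    A sketch in the spirit of [16, 17]; as beta -> +infinity the function converges
    pointwise to the cutting function while remaining smooth, which keeps gradients
    well defined for backpropagation.
    """
    # (1/beta) * ln((1 + e^(beta*x)) / (1 + e^(beta*(x-1)))), written in a
    # numerically stable way via log(1 + e^a) = logaddexp(0, a)
    return (np.logaddexp(0.0, beta * x) - np.logaddexp(0.0, beta * (x - 1.0))) / beta
```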
Thus, a perceptron in the neural network's hidden layers can model a threshold-based nilpotent operator [9, 10]: a conjunction, a disjunction, or even an aggregative operator.
Consequently, the weights of the first layer are learned, while the hidden layers of the pre-designed neural block act as logical operators with frozen weights and biases. In other words:
The first layer is trainable and has been implemented using an Exponential Linear Unit (ELU) activation function.
At the same time, the activation functions in the hidden layers model the cutting function of the nilpotent logical operators using the so-called squashing function (defined by Eq. 2), which avoids the vanishing gradient problem. Besides logical operators, preference operators can also be modeled this way [10].
The final layer is again trainable, with a sigmoid activation function.
The weights in the first and last layers are optimized during training to establish an association between the input parameters and the output y. In the second layer, we define nodes with different, frozen weights and biases (see Eq. 1), grouping different relations between the input parameters. Thus, each of these nodes is essentially a hypothesis grouping all of the parameters with different statistical weights. Finally, the additional internal layers perform logical operations; some of them are summarized in Table 2. In Fig. 3, we illustrate this architecture, considering 4 nodes.
Some examples of logical operators and their corresponding implementation (Csiszár et al. 2020c)
For now, with simplicity in mind, we implement two AND layers (conjunctions) followed by an OR layer (disjunction) to logically evaluate the nodes, a structure that models human reasoning in the decision process (see the sketch below).
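A hypothetical Keras sketch of this block is shown below: a trainable ELU input layer producing 4 nodes, frozen logic layers (two conjunctions followed by a disjunction) with squashing activations, and a trainable sigmoid output layer. The number of input features, the squashing sharpness `beta`, and the way the 4 nodes are split between the two conjunctions are assumptions for illustration, not the exact configuration used in this work.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

BETA = 10.0  # assumed sharpness of the squashing activation

def squashing(x, beta=BETA):
    # Smooth approximation of the cutting function [x] = min(1, max(0, x))
    return (tf.math.softplus(beta * x) - tf.math.softplus(beta * (x - 1.0))) / beta

def logic_layer(bias):
    # Frozen dense layer implementing a nilpotent operator [x_1 + ... + x_n + bias]:
    # bias = -(n - 1) gives a conjunction, bias = 0 a disjunction (cf. Table 2).
    return layers.Dense(
        1,
        activation=squashing,
        kernel_initializer=keras.initializers.Constant(1.0),
        bias_initializer=keras.initializers.Constant(float(bias)),
        trainable=False,
    )

n_features = 12  # assumed number of input parameters

inputs = keras.Input(shape=(n_features,))
nodes = layers.Dense(4, activation="elu")(inputs)                  # trainable first layer, 4 nodes
first_pair = layers.Lambda(lambda t: t[:, 0:2])(nodes)
second_pair = layers.Lambda(lambda t: t[:, 2:4])(nodes)
and_1 = logic_layer(bias=-1.0)(first_pair)                         # AND over nodes 1-2
and_2 = logic_layer(bias=-1.0)(second_pair)                        # AND over nodes 3-4
or_out = logic_layer(bias=0.0)(layers.Concatenate()([and_1, and_2]))  # OR over the conjunctions
outputs = layers.Dense(1, activation="sigmoid")(or_out)            # trainable output layer
model = keras.Model(inputs, outputs)
```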
In both the reference model (dense layers with ReLU activation functions) and our implemented Logic-Operator Neural Network (LONN), we used the mean squared error (MSE) between labels and predictions as the loss function. For the optimization we used Adam, an algorithm for gradient-based optimization of stochastic objective functions [19], with a learning rate set to 0.02 (see Table 4).
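The corresponding training setup might look as follows; this is a sketch matching only the reported settings (MSE loss, Adam with learning rate 0.02), while the epoch count and batch size are assumptions, and `X_train` / `y_train` denote the preprocessed training data described below.

```python
# Compile and fit the model defined above (illustrative values for epochs/batch size)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.02),
    loss="mean_squared_error",
)
history = model.fit(X_train, y_train, epochs=200, batch_size=32)
```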
Model-specific hyperparameters for comparison: dense-layer ReLU network vs. LONN
Since the categorical parameters were binary (for instance sex, smoke, anemia), we converted them to 1s and 0s. The other numerical parameters were normalized using min-max normalization, x' = (x - x_min)/(x_max - x_min), where x_min and x_max are the minimal and maximal values of the vector to which x belongs. With these data transformations, we obtained a homogeneous input matrix for the deep-learning model.
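A minimal preprocessing sketch under these transformations is given below; the file name, the label column name, and the exact column list are hypothetical and only illustrate the binary encoding and min-max normalization described above.

```python
import pandas as pd

df = pd.read_csv("patient_data.csv")              # hypothetical file name
binary_cols = ["sex", "smoke", "anemia"]          # binary categorical parameters
numeric_cols = [c for c in df.columns if c not in binary_cols + ["target"]]

df[binary_cols] = df[binary_cols].astype(int)     # encode as 0/1
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].min()) / (
    df[numeric_cols].max() - df[numeric_cols].min()
)
```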
Finally, we selected 80% of the shuffled data for training to obtain a balanced training sample and to avoid potential biases as well as overfitting during training. The optimizer hyperparameters are listed in Table 3, while the model hyperparameters are listed in Table 4. These parameters were manually tuned; an implementation with automatically tuned parameters will be presented in future work.
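The 80/20 shuffled split could be obtained as sketched below; the label column name and random seed are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["target"]).to_numpy()
y = df["target"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=42
)
```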
Hyperparameters of the Adam optimizer for the dense-layer ReLU network and the LONN
Therefore, our LONN model simulates cognitive processes such as rational, logical thinking, with the caveat that this logic is combined with fuzziness: the logical operations are not exact but essentially fuzzy, owing to the implemented continuous-valued operators (see Fig. 4) [20].
Representation of natural thinking processes (A) and their implementation by neural networks (B). Logical processes, such as the combination of "and" and "or" operations, are implemented as fuzzy logic, representing natural uncertainties in the thinking process
Even though this design was inspired by cognitive processes, our aim is not to reproduce native human thinking in silico, but rather to implement processes closer to natural ones so that the information processing becomes interpretable. Furthermore, we aim to retain a flexible modeling method that can be applied in different, relevant medical environments: the model needs only a few parameters from the medical space, yet addresses the major medical workflow, from medical analyses to diagnoses and treatment selection. The main relevant result is plausible suggestions of medical treatments, which should be applicable in nearly all medical domains.