This section describes how we used transfer learning with a reduced MobileNetV2 (MNV2) [43] to predict pain intensity. We selected MNV2 because it performed well for binary pain classification from facial video in [10]. The proposed architecture combines two sub-networks: (1) an MNV2 consisting of the first 5 inverted residual blocks, pre-trained on ImageNet, for the RGB X-ITE images [10], and (2) the simple CNN architecture from the previous subsection for the RFc prediction images. Three dense layers were added to combine the concatenated outputs of (1) and (2); they have 1024, 512, and 128 neurons, respectively, and are ReLU-activated. The final dense output layer is activated with the softmax function. The resulting model is trained for 150 epochs.