The ADANN followed an identical topology and training strategy to the CNN described in the previous section, with the addition of two domain adaptation techniques: adaptive batch normalization and domain adversarial training.
Adaptive batch normalization is a domain adaptation technique that encodes domain-specific information in sets of batch normalization parameters and class-specific information in the weights and biases of the network (Li et al., 2016). This is achieved by associating a set of batch normalization parameters with each subject during training while using common weights and biases across all subjects. During training, the mean and variance of activations within each feature map were tracked, resulting in 128 parameters per block that encode domain information. In contrast to regular batch normalization, which is used for regularization during training but not afterwards, the adaptive batch normalization parameters associated with a subject are retained to adapt activations after training. This enables a model pre-trained on a large number of subjects with multiple repetitions of gestures to be adapted to an unseen subject by learning their batch normalization parameters from a small amount of data (here, a single repetition of each gesture). In practice, this adaptation can be performed with the PyTorch library by running forward passes over the single repetition while the model is in train mode and all convolutional and linear layers are frozen (by setting the requires_grad attribute of their parameters to False). This updates only the running mean and running variance parameters and leaves the model weights unchanged. In addition to its role in previous ADANN studies, adaptive batch normalization alone has proven effective for EMG gesture recognition (Cote-Allard et al., 2017; Du et al., 2017).
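The adaptation step described above can be sketched in PyTorch as follows. The architecture here is a minimal stand-in, not the authors' exact network; only the freeze-then-forward pattern is the point.

```python
import torch
import torch.nn as nn

# Minimal AdaBN adaptation sketch (architecture is illustrative, not the paper's network).
model = nn.Sequential(
    nn.Conv1d(8, 128, kernel_size=3, padding=1),
    nn.BatchNorm1d(128),   # 128 feature maps -> 128 running means/variances per block
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(128, 5),
)

# Freeze convolutional and linear layers so only BN running statistics can change.
for module in model.modules():
    if isinstance(module, (nn.Conv1d, nn.Linear)):
        for p in module.parameters():
            p.requires_grad = False

model.train()  # train mode: BN updates running_mean/running_var on each forward pass
calibration = torch.randn(32, 8, 100)  # placeholder for one repetition of each gesture
with torch.no_grad():
    model(calibration)  # forward pass adapts the BN statistics to the new subject
```

Note that running statistics are updated during the forward pass whenever the model is in train mode, even inside `torch.no_grad()`, so no optimizer step is needed.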
Domain adversarial training is another technique used by ADANN that can improve generalization of the model to different domains (subjects) (Ganin et al., 2016; Côté-Allard et al., 2020a). Domain adversarial training relies on the network having two heads with which to simultaneously predict the elicited gesture and the subject who elicited that gesture during training. The heads consist of linear layers that operate in parallel to produce predictions on different characteristics of the data given the same input from the convolutional blocks. These layers produce two loss terms: a gesture prediction loss and a domain divergence loss. Standard backpropagation is used for the gesture prediction loss; however, the gradient of the divergence loss is reversed (multiplied by −λ) before it reaches the convolutional blocks. In theory, this training strategy penalizes domain-specific information by regularizing across subjects while encoding gesture-specific information. Effectively, the system is trained to differentiate between gestures while being unable to differentiate between users. An appropriate penalty was observed when λ was set to 0.1, as suggested in past works (Côté-Allard et al., 2020a).
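The gradient reversal at the heart of domain adversarial training can be sketched with a custom autograd function, as in Ganin et al. (2016); the class name and the toy tensor below are illustrative, with λ = 0.1 as in the text.

```python
import torch

# Gradient reversal sketch: identity in the forward pass, gradient scaled by
# -lambda in the backward pass, so the shared blocks are pushed to *confuse*
# the domain head while still minimizing the gesture loss.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient for lam itself

features = torch.ones(4, 8, requires_grad=True)       # stand-in for shared features
reversed_features = GradReverse.apply(features, 0.1)  # feeds the domain head
reversed_features.sum().backward()
# each element of features.grad is now -0.1 instead of +1
```

In a full training loop, the gesture head would receive `features` directly while the domain head receives `reversed_features`, so the two loss terms pull the shared blocks in the directions described above.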
The domain adversarial training was further optimized by using only two output neurons when computing the domain divergence loss. These neurons represent whether the input came from a particular subject or from any other subject. This strategy, as opposed to using one neuron per subject in the training set, enabled the domain to be distinguished with a higher degree of certainty, resulting in a more appropriate penalty term. During each epoch of training, a random subject from the training set was selected as the particular subject, which ensured approximately equal representation over the course of training. Balance between the domain labels was achieved by ensuring that half of each batch came from the selected subject and the remainder from other subjects. Inputs originating from the selected subject were assigned a subject label of 1, whereas the remaining inputs were assigned a label of 0. The domain divergence was computed via cross-entropy between the assigned labels and the predictions of the domain head.
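The batch construction and two-neuron domain head described above can be sketched as follows. All names, the feature dimension of 128, and the placeholder data are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Sketch of the two-output domain head: label 1 if the sample comes from the
# subject randomly selected for this epoch, 0 for any other subject.
def make_domain_batch(subject_data, selected_subject, batch_size=16):
    """Half the batch from the selected subject, half from all remaining subjects."""
    half = batch_size // 2
    own = subject_data[selected_subject][:half]
    other_ids = [s for s in subject_data if s != selected_subject]
    others = torch.cat([subject_data[s] for s in other_ids])[:half]
    x = torch.cat([own, others])
    y = torch.cat([torch.ones(half, dtype=torch.long),   # selected subject -> 1
                   torch.zeros(half, dtype=torch.long)]) # any other subject -> 0
    return x, y

domain_head = nn.Linear(128, 2)   # two output neurons: selected vs. any other subject
criterion = nn.CrossEntropyLoss()

subject_data = {s: torch.randn(20, 128) for s in range(5)}  # placeholder features
selected = int(torch.randint(0, 5, (1,)))  # a random "particular" subject per epoch
x, y = make_domain_batch(subject_data, selected)
divergence_loss = criterion(domain_head(x), y)
```

In training, this loss would be backpropagated through the gradient reversal described earlier, while the gesture head's cross-entropy loss is backpropagated normally.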