In the cross-domain EEG emotion recognition problem, it is often necessary to deal with multiple domains whose distributions differ significantly. To reduce the cost of labeling training data for the target task, experience can be drawn from multiple labeled source subjects to help predict the target subject. Likewise, because EEG data grow stale over time, predictions for a new session of a subject can benefit from that subject's previously labeled sessions.
Unlike many existing studies that directly merge multiple source domains into a single source domain, we explore a more challenging and practical multi-source scenario in which each source domain, with its own distribution, is treated as an independent source domain. In multi-source unsupervised domain adaptation, we are generally given $K$ distinct source domains $\mathcal{D}_s = \{\mathcal{D}_{s_k} \mid k = 1,\ldots,K\}$, where the $k$-th source domain contains $n_k$ i.i.d. labeled examples drawn from the joint distribution $P_{s_k}(X_{s_k}, Y_{s_k})$, and a target domain of $n_t$ i.i.d. unlabeled examples sampled from the joint distribution $P_t(X_t, Y_t)$. It is assumed that the joint distributions of the source domains differ from one another and that each source domain's joint distribution also differs from the target domain's. Our aim is to reduce the discrepancy between the $K$ labeled source domains and the unlabeled target domain so that the extracted features are domain-invariant and the source learners perform better when applied to the target domain.
We propose a network capable of handling multiple domains with different distributions for the cross-domain EEG emotion recognition problem. As shown in Figure 1, our proposed Multi-Source Joint Domain Adaptation (MSJDA) framework consists of three parts: (a) domain-shared feature extractors, (b) domain-private feature extractors, and (c) domain-private label predictors.
Figure 1. The proposed multi-source joint domain adaptation (MSJDA) framework. For brevity, only the $i$-th and $j$-th ($i \neq j$ and $i, j \in \{1,\ldots,K\}$) of the $K$ source domains are shown.
A domain-shared feature extractor $E$ is a shared network structure through which the EEG data of all source and target domains first pass. Its goal is to enable the neural network to learn abstract representations shared by the different domains; these representations should be general across all domains for subsequent processing. For an input sample $x$ from a specific domain, the feature extracted by $E$ is denoted $e = E(x)$.
Each of the domain-private feature extractors, $F = \{F_k \mid k = 1,\ldots,K\}$, is a separate feature-extraction structure for the pair formed by the $k$-th source domain and the target domain, and follows the domain-shared feature extractor $E$. In other words, there are $K$ network branches after $E$, each corresponding to one source domain. The role of $F_k$ is to extract deep representations relevant to the final classification task for its pair of source and target domains. If $e_k$ uniformly denotes the general features of either the $k$-th source domain or the target domain that are input to $F_k$, then the corresponding output of $F_k$ is $f_k = F_k(e_k)$.
The domain-private label predictors are denoted $G = \{G_k \mid k = 1,\ldots,K\}$, where $G_k$ is connected to $F_k$. We treat each labeled source domain as an independent source domain: each has its own label predictor, trained by supervised learning and attached to the corresponding private feature extractor. The output of the $k$-th label predictor is $g_k = G_k(f_k)$.
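To make the three-part layout concrete, the following minimal PyTorch sketch wires the shared extractor $E$, the private extractors $F_k$, and the private predictors $G_k$ together. The layer types, widths, input dimension (310, e.g., differential-entropy features over 62 channels × 5 frequency bands), $K$, and the number of classes are all illustrative assumptions, not the exact architecture described here:

```python
import torch.nn as nn

K = 3            # number of source domains (illustrative assumption)
N_CLASSES = 3    # e.g., negative / neutral / positive emotions (assumption)
IN_DIM, SHARED_DIM, PRIV_DIM = 310, 128, 64  # hypothetical feature/layer sizes

class MSJDA(nn.Module):
    """Minimal sketch of the three-part MSJDA layout (not the exact architecture)."""

    def __init__(self):
        super().__init__()
        # (a) domain-shared feature extractor E: all domains pass through it first
        self.E = nn.Sequential(nn.Linear(IN_DIM, SHARED_DIM), nn.ReLU())
        # (b) K domain-private feature extractors F_k, one branch per source domain
        self.F = nn.ModuleList(
            [nn.Sequential(nn.Linear(SHARED_DIM, PRIV_DIM), nn.ReLU()) for _ in range(K)]
        )
        # (c) K domain-private label predictors G_k (a single layer each, as in the text)
        self.G = nn.ModuleList([nn.Linear(PRIV_DIM, N_CLASSES) for _ in range(K)])

    def forward(self, x, k):
        # Route a batch through the shared extractor and the k-th private branch.
        e = self.E(x)      # e = E(x)
        f = self.F[k](e)   # f_k = F_k(e_k)
        g = self.G[k](f)   # g_k = G_k(f_k), returned as logits
        return f, g
```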
Our work is inspired by Zhu et al. (2019), who proposed a multiple-feature-spaces adaptation network to address the cross-domain image classification problem in computer vision. They utilized MMD to match the marginal distributions of each pair of source and target domains and reduced the prediction divergence of the individual source classifiers on target samples. In contrast, our method considers the joint distributions of each pair of source and target domains separately and does not need to align classifiers from different source domains.
For a labeled source-domain sample, the error between its predicted label and its ground-truth label, which we call the label prediction loss, is computed with the cross-entropy loss function $L_{CE}(\cdot,\cdot)$. The label prediction loss for the $k$-th source domain is

$$\mathcal{L}_{cls}^{k} = \frac{1}{n_k} \sum_{i=1}^{n_k} L_{CE}\big(G_k(F_k(E(x_i^{s_k}))),\, y_i^{s_k}\big),$$

where $(x_i^{s_k}, y_i^{s_k})$ denotes the $i$-th labeled example of the $k$-th source domain.
After passing through the domain-shared feature extractor, the unlabeled target-domain data enter the network branch of every source domain. In this way, we measure the JMMD between each source domain and the target domain in a one-to-one manner. The model presented here considers only the case where each label predictor has a single layer, but it is easily generalized to the multilayer case. The domain-private feature $f_k$ and the label prediction $g_k$, i.e., the activations of the last two layers of each network branch, are used to compute the joint distribution difference (the joint adaptation loss) in these layers for each pair of source and target domains. This difference is taken as an approximation of the difference between the original joint distributions of the corresponding source and target domains. The joint adaptation loss between the $k$-th source domain and the target domain is then computed as

$$\mathcal{L}_{jmmd}^{k} = \Big\| \mathbb{E}_{P_{s_k}}\big[\phi(f_k) \otimes \psi(g_k)\big] - \mathbb{E}_{P_t}\big[\phi(f_k) \otimes \psi(g_k)\big] \Big\|_{\mathcal{H}_f \otimes \mathcal{H}_g}^{2},$$

where $\phi(\cdot)$ and $\psi(\cdot)$ are the feature maps of the reproducing-kernel Hilbert spaces $\mathcal{H}_f$ and $\mathcal{H}_g$ associated with the $f$ and $g$ layers, respectively.
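Concretely, the empirical JMMD over a source batch and a target batch can be estimated by taking the element-wise product of the per-layer kernel matrices. The sketch below is a minimal PyTorch estimate assuming a single fixed-bandwidth Gaussian kernel per layer; the bandwidths `gamma_f` and `gamma_g` are illustrative assumptions (in practice, a multi-kernel or median-heuristic bandwidth is common):

```python
import torch

def gaussian_kernel(a, b, gamma):
    # RBF kernel matrix: k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)
    return torch.exp(-gamma * torch.cdist(a, b) ** 2)

def jmmd_loss(f_s, g_s, f_t, g_t, gamma_f=1.0, gamma_g=1.0):
    """Empirical JMMD over the (f, g) layer pair for one source-target pair.

    f_s, f_t: domain-private features F_k(.) for a source / target batch
    g_s, g_t: label-predictor activations G_k(.) (e.g., softmax outputs)
    """
    # Joint kernel = element-wise product of the per-layer Gaussian kernels,
    # which compares the layers' joint distributions rather than each
    # marginal distribution separately.
    k_ss = gaussian_kernel(f_s, f_s, gamma_f) * gaussian_kernel(g_s, g_s, gamma_g)
    k_tt = gaussian_kernel(f_t, f_t, gamma_f) * gaussian_kernel(g_t, g_t, gamma_g)
    k_st = gaussian_kernel(f_s, f_t, gamma_f) * gaussian_kernel(g_s, g_t, gamma_g)
    # Standard (biased) MMD^2 estimate with the joint kernel.
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()
```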
Therefore, the total loss function is the sum of the label prediction loss and the joint adaptation loss; that is, the objective function of the proposed network is

$$\mathcal{L} = \sum_{k=1}^{K} \big( \mathcal{L}_{cls}^{k} + \lambda\, \mathcal{L}_{jmmd}^{k} \big),$$

where $\lambda > 0$ is the tradeoff parameter between the two losses. The objective function is optimized as

$$\min_{E,\, F_1,\ldots,F_K,\, G_1,\ldots,G_K} \mathcal{L}$$

to improve the performance of the label predictors and reduce the distribution difference between the source and target domains, thereby learning discriminative and domain-invariant feature representations.
During training, EEG samples from one of the source domains, together with target-domain samples, are input to fit the model in each iteration. At test time, the prediction for each target-domain sample is determined by averaging the outputs of all $K$ label predictors.
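Putting the pieces together, one training iteration and the test-time averaging might look as follows. This is a sketch reusing the hypothetical `MSJDA` and `jmmd_loss` definitions above; the optimizer, learning rate, and the value of λ are assumptions:

```python
import torch
import torch.nn.functional as Fn  # aliased to avoid clashing with the extractors F

model = MSJDA()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer/lr assumed
lam = 1.0  # tradeoff parameter lambda (assumed value)

def train_step(x_s, y_s, x_t, k):
    """One iteration: a labeled batch from source domain k plus an unlabeled target batch."""
    f_s, g_s = model(x_s, k)   # source batch through the k-th branch
    f_t, g_t = model(x_t, k)   # target batch through the same branch
    loss_cls = Fn.cross_entropy(g_s, y_s)                # label prediction loss
    loss_jmmd = jmmd_loss(f_s, Fn.softmax(g_s, dim=1),   # joint adaptation loss on
                          f_t, Fn.softmax(g_t, dim=1))   # the (f, g) activations
    loss = loss_cls + lam * loss_jmmd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def predict(x_t):
    # Test time: average the softmax outputs of all K label predictors.
    probs = torch.stack([Fn.softmax(model(x_t, k)[1], dim=1) for k in range(K)])
    return probs.mean(dim=0).argmax(dim=1)
```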