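We assume the following model for the observed data $x_{ijg}$:

$$x_{ijg} = \alpha_g + a_{ij}^T \beta_g + \gamma_{jg} + \sum_{l=1}^{m_j} b_{jgl} Z_{ijl} + \delta_{jg} \varepsilon_{ijg}, \tag{1}$$

$$Z_{ij1}, \dots, Z_{ijm_j} \sim N(0, 1), \qquad \varepsilon_{ijg} \sim N(0, \sigma_g^2).$$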
Here $i$ is the index for the observation, $j$ the index for the batch and $g$ the index for the variable. The term $a_{ij}^T \beta_g$ parametrizes the effect of experimental conditions or, in general, any factors of interest $a_{ij}$ on the measurements of variable $g$. In this paper, $a_{ij}$ is a dummy variable representing the binary variable of interest $y_{ij}$, with $a_{ij} = 1$ if $y_{ij} = 2$ and $a_{ij} = 0$ if $y_{ij} = 1$. The term $\varepsilon_{ijg}$ represents random noise, unaffected by batch effects. The term $\gamma_{jg}$ corresponds to the mean shift in location of variable $g$ in the $j$-th batch compared to the unobserved, hypothetical data unaffected by batch effects. The term $\delta_{jg}$ corresponds to the scale shift of the residuals for variable $g$ in the $j$-th batch. As in the SVA model (Appendix A.2, Additional file 1), the $Z_{ijl}$ are random latent factors. In contrast to the SVA model, in our model the distribution of the latent factors is independent of the individual observation. However, since the loadings $b_{jgl}$ of the latent factors are batch-specific, the latent factors induce batch effects in our model as well. More precisely, they lead to correlation structures that vary between batches. In the SVA model, by contrast, all batch effects are induced by the latent factors. Without the summand $\sum_{l=1}^{m_j} b_{jgl} Z_{ijl}$, model (1) would equal the model underlying the ComBat method; see Appendix A.1 (Additional file 1).
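To make the roles of the individual terms concrete, the following is a minimal simulation sketch of a model of this form, not the authors' implementation. It assumes a variable-specific intercept, standard normal latent factors and normally distributed noise with variable-specific variance. The function name `simulate_batches`, its arguments and all parameter ranges are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_batches(n_per_batch, n_vars, n_factors, seed=0):
    """Draw observed and batch-free data from a model of the form (1).

    Every distribution and range used for the parameters below is an
    illustrative assumption, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, 1.0, n_vars)   # variable-specific intercepts (assumed)
    beta = rng.normal(0.0, 0.5, n_vars)    # effect of the binary factor per variable
    sigma = rng.uniform(0.5, 1.5, n_vars)  # noise standard deviation per variable

    observed, batch_free, labels = [], [], []
    for n_j, m_j in zip(n_per_batch, n_factors):
        a = rng.integers(0, 2, n_j)                  # dummy variable a_ij in {0, 1}
        eps = rng.normal(0.0, sigma, (n_j, n_vars))  # noise eps_ijg, free of batch effects
        gamma = rng.normal(0.0, 1.0, n_vars)         # location shift gamma_jg of batch j
        delta = rng.uniform(0.5, 2.0, n_vars)        # scale shift delta_jg of batch j
        b = rng.normal(0.0, 0.5, (m_j, n_vars))      # batch-specific loadings b_jgl
        z = rng.normal(0.0, 1.0, (n_j, m_j))         # latent factors Z_ijl

        signal = alpha + np.outer(a, beta)           # part shared by both data versions
        batch_free.append(signal + eps)              # hypothetical data without batch effects
        observed.append(signal + gamma + z @ b + delta * eps)
        labels.append(a)
    return observed, batch_free, labels


# Two batches with 30 and 40 observations, 5 variables, 2 and 3 latent factors.
observed, batch_free, labels = simulate_batches([30, 40], n_vars=5, n_factors=[2, 3])
print([x.shape for x in observed])
```

Setting the loadings `b` to zero in this sketch leaves only the location shift and the scale shift, which corresponds to the pure location-and-scale case described above for ComBat.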
The unobserved data $x^*_{ijg}$ not affected by batch effects is assumed to have the form

$$x^*_{ijg} = \alpha_g + a_{ij}^T \beta_g + \varepsilon_{ijg}.$$
The remaining batch effect adjustment methods considered in this paper are described in Appendix A.3 (Additional file 1).