EpiScanpy chromatin data integration workflow

Anna Danese
Maria L. Richter
Kridsadakorn Chaichoompu
David S. Fischer
Fabian J. Theis
Maria Colomé-Tatché

When several datasets of the same omic layer (single-cell ATAC-seq or DNA methylation) are analysed jointly, potential batch effects must be removed. EpiScanpy supports this with the bbKNN batch correction method (ref. 33). Integrating the batches requires a common feature space, so a preliminary step is to build count matrices on a shared set of features, such as genomic windows or a set of peaks common to all datasets. To obtain a good joint embedding, the features used must be representative of every dataset. To that end, we select the most variable features in each dataset separately and then concatenate the datasets, keeping the intersection of the variable features. Alternatively, epiScanpy can merge the datasets using the union of the different feature spaces. Additional quality control and filtering are recommended to remove features that are covered in too few cells and cells that contain too few covered features. Finally, we apply library size normalisation and run the integration method on the concatenated matrix.
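
The preprocessing steps described above can be illustrated with a minimal NumPy sketch. This is not the epiScanpy API: the function names, the `(counts, names)` input layout, the use of plain variance as the variability score, and the default thresholds are all assumptions made for illustration. The sketch covers per-dataset variable feature selection, concatenation on the intersection, coverage filtering and library size normalisation; the batch correction step itself is not reproduced.

```python
import numpy as np


def top_variable_features(counts, names, n_top):
    """Return the names of the n_top features with the highest variance across cells.

    Plain variance is an assumed stand-in for a variability score.
    """
    variances = counts.var(axis=0)
    order = np.argsort(variances)[::-1][:n_top]
    return {names[i] for i in order}


def merge_on_shared_variable_features(datasets, n_top, min_cells=1, min_features=1):
    """Hypothetical helper: concatenate datasets on their shared variable features.

    datasets is a list of (counts, names) pairs, where counts is a
    cells x features array and names lists the feature of each column.
    """
    # 1. Select the most variable features in each dataset separately,
    #    then keep only those selected in every dataset (the intersection).
    shared = None
    for counts, names in datasets:
        selected = top_variable_features(counts, names, n_top)
        shared = selected if shared is None else shared & selected
    if not shared:
        raise ValueError("no variable feature is shared by all datasets")
    features = sorted(shared)

    # 2. Reorder each dataset to the shared feature order and stack the cells,
    #    recording which batch every cell came from.
    blocks, batch = [], []
    for batch_id, (counts, names) in enumerate(datasets):
        column = {name: i for i, name in enumerate(names)}
        blocks.append(counts[:, [column[f] for f in features]])
        batch.extend([batch_id] * counts.shape[0])
    merged = np.vstack(blocks).astype(float)
    batch = np.array(batch)

    # 3. Drop features covered in too few cells, then cells with too few
    #    covered features.
    keep_features = (merged > 0).sum(axis=0) >= min_cells
    merged = merged[:, keep_features]
    features = [f for f, keep in zip(features, keep_features) if keep]
    keep_cells = (merged > 0).sum(axis=1) >= min_features
    merged, batch = merged[keep_cells], batch[keep_cells]

    # 4. Library size normalisation: rescale every cell to the median total count.
    totals = merged.sum(axis=1, keepdims=True)
    merged = merged / totals * np.median(totals)
    return merged, features, batch
```

The returned matrix and batch labels are the kind of input a batch-aware integration method expects. Taking the union of the feature spaces instead of the intersection would require filling features absent from a dataset, which this sketch does not attempt.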
