Three commonly used scRNA-seq datasets were employed to evaluate the performance of different batch effect removal methods. The first dataset, “panc_rm,” includes human pancreas cells measured by five different platforms. To measure the ability of different methods to detect batch-specific cell types, we manually removed “ductal” cells from the “CEL-seq” sub-dataset and “acinar” and “alpha” cells from the “inDrop” sub-dataset. The “ductal” cell type has the largest number of cells in the “CEL-seq” sub-dataset; with its removal, the primary variance of “CEL-seq” may be determined by the second and third most numerous cell types, i.e., “acinar” and “alpha” cells. We therefore also removed these two cell types from one of the “inDrop” sub-datasets, which was selected as the integration anchor. The second dataset, “cell_lines,” is composed of three sub-datasets, all sequenced on the 10x platform. Two of them are pure cell lines (“Jurkat” and “293 T”), and “Mix” is an equal mixture of “Jurkat” and “293 T” cells. For the “Mix” dataset, we ran the standard “Seurat” pipeline to cluster and annotate the cells: clusters with high XIST expression were labeled “293 T” and all others “Jurkat.” The third dataset, “DC_rm,” consists of human DCs sequenced with the Smart-seq2 protocol. We also removed CD1C DCs from batch 1 and CD141 DCs from batch 2; these two DC subtypes are biologically similar.
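The two preprocessing steps above (deleting a cell type from selected batches only, and labeling “Mix” clusters by XIST) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the function names are hypothetical, and summarizing each cluster by its mean XIST expression against a fixed `threshold` is an assumption, since the text only states that clusters with "high expression of XIST" were labeled “293 T.”

```python
import numpy as np


def remove_batch_specific(cell_types, batches, removals):
    """Boolean keep-mask that drops every cell whose (batch, cell type) pair is in `removals`.

    `removals` maps a batch label to the cell type labels deleted from that batch only,
    so the same cell type is left untouched in all other batches.
    """
    cell_types = np.asarray(cell_types)
    batches = np.asarray(batches)
    keep = np.ones(len(cell_types), dtype=bool)
    for batch, types in removals.items():
        keep &= ~((batches == batch) & np.isin(cell_types, list(types)))
    return keep


def label_by_xist(cluster_ids, xist, threshold):
    """Label a cluster "293 T" if its mean XIST expression exceeds `threshold`, else "Jurkat".

    The mean-per-cluster rule and the threshold value are illustrative assumptions.
    """
    cluster_ids = np.asarray(cluster_ids)
    xist = np.asarray(xist, dtype=float)
    labels = np.empty(len(cluster_ids), dtype=object)
    for cluster in np.unique(cluster_ids):
        in_cluster = cluster_ids == cluster
        labels[in_cluster] = "293 T" if xist[in_cluster].mean() > threshold else "Jurkat"
    return labels
```

For “panc_rm,” the removal map would look like `{"CEL-seq": ["ductal"], "inDrop": ["acinar", "alpha"]}`, assuming the batch column uses those exact labels.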

Two recently published benchmark datasets, “SCP424_PBMC” and “SCP425_cortex,” were also included to compare the methods. They contain thousands of cells from peripheral blood mononuclear cells and brain tissue, respectively, profiled with over ten protocols that cover most single-cell and single-nucleus profiling methods. The log-10 K data and meta information were downloaded from the Single Cell Portal (Additional file 1: Table S1). We also tested the performance of iMAP on five additional datasets with various numbers of cells; detailed information can be found in Additional file 1: Table S1.

To test the performance, especially the time cost, of iMAP on large-scale datasets, we ran iMAP on the Tabula Muris dataset, which consists of mouse cells sequenced on two platforms, i.e., Smart-seq2 and 10x. The downloaded Seurat object was updated to version 3 with the “UpdateSeuratObject” function, and the sequencing platforms were regarded as the batches. Another dataset containing over 600,000 cells from the Human Cell Atlas was also used to test the scalability of iMAP; its detailed information can be found in Additional file 1: Table S1.

The “CRC” dataset was used to test the application of iMAP to tumor microenvironments. Nearly 50,000 cells from human colon cancer were sequenced on either the Smart-seq2 or the 10x platform. Cells from different patients sequenced by Smart-seq2 show less technical variation than those sequenced by 10x [30]. Therefore, we regarded all Smart-seq2 cells as a single batch and each patient sequenced by 10x as a separate batch. Cell-type and tissue-source annotations were obtained from the original publication.
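The batch definition for “CRC” can be expressed as a small labeling rule. In this minimal sketch the function name and the exact platform strings (`"Smart-seq2"`, `"10x"`) are hypothetical and would need to match the actual metadata columns.

```python
def crc_batch_labels(platforms, patients):
    """Per-cell batch labels: one shared batch for all Smart-seq2 cells,
    and one separate batch for each patient sequenced by 10x."""
    labels = []
    for platform, patient in zip(platforms, patients):
        if platform == "Smart-seq2":
            # Smart-seq2 cells show little between-patient technical variation.
            labels.append("Smart-seq2")
        else:
            # Each 10x patient becomes its own batch.
            labels.append(f"{platform}_{patient}")
    return labels
```

With this rule, the number of batches equals one plus the number of patients profiled by 10x.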

We compared our method with nine leading scRNA-seq batch effect removal methods: ComBat, scVI, LIGER, fastMNN, BBKNN, Harmony, Scanorama, Seurat v3, and DESC (see Additional file 1: Table S2 for detailed version information). ComBat and BBKNN correction were performed using the scanpy APIs “scanpy.pp.combat” and “scanpy.external.pp.bbknn,” respectively. scVI was run with the default parameters, and the resulting latent representations were used for further analysis. LIGER's “optimizeALS” was run with “k = 20.” We used the “SeuratWrappers” versions of fastMNN (“RunFastMNN”) and Harmony (“RunHarmony”). Scanorama was run with the default parameters of “scanorama.correct.” The dimension parameters of Seurat v3 were all set to “dims = 1:30.” DESC was run with the default parameters, with “louvain_resolution” set to 1.0. Because some methods do not return corrected expression values, we compared all methods using UMAP embeddings, which were computed with the same parameters of the Python package “umap-learn.”

There is an extensive list of batch effect removal evaluation indices in the literature [6]. Widely used ones include kBET (k-nearest neighbor batch-effect test) [18], LISI (Local Inverse Simpson’s Index) [13], ASW (average silhouette width), and ARI (adjusted Rand index). We argue that ARI and ASW are cluster-level indices and cannot reliably evaluate the mixing of cells from different batches at the local, single-cell level (Additional file 1: Fig. S1a). kBET and LISI evaluate batch mixing locally by comparing the batch distribution among a cell’s kNNs with the global batch distribution. kBET is well suited to evaluating the integration of batch-shared cell types; one drawback, however, is that it ignores cell types when measuring batch mixing. This can produce unfair results when the proportions of shared cell types differ greatly between batches [13]. LISI can evaluate both the identification of batch-specific cell types and the integration of batch-shared cell types, but it is hard to summarize the single-cell LISI values into a simple statistic for comparing methods. Nonetheless, kBET and LISI are reliable metrics when appropriately employed, so we first used these two metrics to compare the methods. For kBET, we computed the acceptance rates for each cell type separately and took the median value over all tested cells as the final output. For the “DC_rm” and “panc_rm” datasets, only the cell types appearing in all batches were taken into account; since no cell type appears in all three sub-datasets of “cell_lines,” we computed the acceptance rates separately for the integration of “Jurkat” and “Mix” and for the integration of “293 T” and “Mix.”
The number of nearest neighbors, k, has a large effect on kBET results; following the kBET paper, we ran kBET with a series of k values set to 5%, 10%, 15%, 20%, and 25% of the total number of cells. For LISI, we computed the cLISI and iLISI values for each cell, with the ideal cLISI equal to one. iLISI values of different methods were compared for each cell type separately, because the optimal value is cell type-specific and determined by the number of batches containing that cell type [13].
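The kBET acceptance rate and the LISI scores described above can be illustrated with the simplified sketch below. It is not the kBET or LISI reference implementation: it uses brute-force Euclidean kNNs in which each cell counts as its own neighbor, tests every cell instead of a random subsample, and computes LISI with uniform weights over the kNNs rather than the perplexity-calibrated Gaussian kernel of the original method. All function names are hypothetical. The chi-square tail probability uses the closed-form expressions for integer degrees of freedom so that only NumPy and the standard library are needed.

```python
import math

import numpy as np


def knn_indices(embedding, k):
    """Indices of each cell's k nearest neighbours (Euclidean); a cell counts as its own neighbour."""
    x = np.asarray(embedding, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    coef = math.sqrt(2 / math.pi) * math.exp(-x / 2)
    total, term = math.erfc(math.sqrt(x / 2)), math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += coef * term
    return total


def kbet_acceptance_rate(embedding, batches, k, alpha=0.05):
    """Fraction of cells whose kNN batch counts pass a Pearson chi-square test
    against the global batch frequencies (simplified kBET)."""
    levels, codes = np.unique(np.asarray(batches), return_inverse=True)
    expected = k * np.bincount(codes, minlength=len(levels)) / len(codes)
    accepted = 0
    for row in knn_indices(embedding, k):
        observed = np.bincount(codes[row], minlength=len(levels))
        stat = float(np.sum((observed - expected) ** 2 / expected))
        accepted += chi2_sf(stat, len(levels) - 1) >= alpha
    return accepted / len(codes)


def kbet_over_k(embedding, batches, fractions=(0.05, 0.10, 0.15, 0.20, 0.25)):
    """Acceptance rate with k set to each fraction of the total number of cells."""
    n = len(batches)
    return {f: kbet_acceptance_rate(embedding, batches, max(1, round(f * n))) for f in fractions}


def simple_lisi(embedding, labels, k):
    """Inverse Simpson's index of `labels` within each cell's kNN, using uniform weights.

    With batch labels this plays the role of iLISI; with cell type labels, of cLISI (ideal 1).
    """
    _, codes = np.unique(np.asarray(labels), return_inverse=True)
    scores = np.empty(len(codes))
    for i, row in enumerate(knn_indices(embedding, k)):
        frac = np.bincount(codes[row]) / k
        scores[i] = 1.0 / np.sum(frac ** 2)
    return scores
```

Per-cell-type acceptance rates could be obtained by calling `kbet_acceptance_rate` on each cell type's subset of the embedding; recomputing the kNNs within that subset is an assumption of this sketch. Note that very small k values give the chi-square test little power, which is one reason to scan several k values.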

Considering that each of these indices has limitations in simultaneously evaluating cell type and batch mixing, we propose two new indices to evaluate batch mixing. Our evaluation procedure is also based on the kNNs of each cell and consists of two successive steps (Additional file 1: Fig. S1b). First, we classify all cells as “positive” or “negative.” “Positive” cells are those surrounded mostly by cells of the same cell type. By default, a cell is labeled “positive” only if at least 50% of its kNNs share its cell type label, and “negative” otherwise (k is set to the minimum of 100 and the number of cells of this cell type). Second, positive cells are further split into “true” and “false” positive cells by a second dichotomous classifier. “True” positive cells are those surrounded by appropriate proportions of cells from different batches. We use the three-sigma rule of thumb to test whether the observed batch distribution in a positive cell’s neighborhood is consistent with the global batch distribution. Consider a cell with cell type label $y$, and let $N_1, N_2, \dots, N_n$ be the numbers of cells of type $y$ in the $n$ batches. We define $p_i = N_i / \sum_{j=1}^{n} N_j$ for $i = 1, 2, \dots, n$. If we sample $k$ cells of type $y$, the expected number of cells from batch $i$ is $kp_i$, with binomial standard deviation $\sqrt{kp_i(1-p_i)}$. We regard a positive cell as a true positive if the numbers of its neighbors from each batch all lie within three standard deviations of the expectation. That is, if the kNNs of a positive cell have the batch distribution $k_1, k_2, \dots, k_n$, the cell is a true positive when

$$\max\left\{0,\; kp_i - 3\sqrt{kp_i(1-p_i)}\right\} \le k_i \le kp_i + 3\sqrt{kp_i(1-p_i)} \quad \text{for all } i = 1, 2, \dots, n.$$

With these two classification steps, we can automatically identify cells that are not well mixed. We use the proportions of positive and true positive cells as quantitative indices to evaluate the performance of the batch effect removal methods.
This two-classifier system also provides an effective tool for visualizing the batch effect removal results.
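The two-step positive / true positive classification can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the iMAP implementation: the function names are hypothetical, kNNs are found by brute-force Euclidean distance on the embedding, and each cell is counted as one of its own k nearest neighbors (the text does not specify this detail). Distances are held in memory as a full matrix, so the sketch suits only small examples.

```python
import numpy as np


def classify_cells(embedding, cell_types, batches, k_max=100, min_frac=0.5):
    """Return boolean arrays (positive, true_positive) for every cell."""
    x = np.asarray(embedding, dtype=float)
    cell_types = np.asarray(cell_types)
    batches = np.asarray(batches)
    levels = np.unique(batches)
    # Brute-force Euclidean kNN; each cell is counted as its own neighbour (assumption).
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1, kind="stable")

    positive = np.zeros(len(x), dtype=bool)
    true_positive = np.zeros(len(x), dtype=bool)
    for c in range(len(x)):
        same_type = cell_types == cell_types[c]
        # k = min(100, number of cells of this cell type).
        k = min(k_max, int(same_type.sum()))
        nbrs = order[c, :k]
        # Step 1: positive if at least `min_frac` of the kNNs share the cell type label.
        if np.mean(cell_types[nbrs] == cell_types[c]) < min_frac:
            continue
        positive[c] = True
        # Step 2: p_i = N_i / sum_j N_j, the share of this cell type found in batch i.
        n_per_batch = np.array([np.sum(same_type & (batches == b)) for b in levels])
        p = n_per_batch / n_per_batch.sum()
        k_obs = np.array([np.sum(batches[nbrs] == b) for b in levels])
        # Three-sigma band around k * p_i with binomial standard deviation.
        sd = np.sqrt(k * p * (1 - p))
        lower = np.maximum(0.0, k * p - 3 * sd)
        upper = k * p + 3 * sd
        true_positive[c] = bool(np.all((k_obs >= lower) & (k_obs <= upper)))
    return positive, true_positive


def mixing_indices(embedding, cell_types, batches, **kwargs):
    """Proportions of positive and true positive cells among all cells."""
    positive, true_positive = classify_cells(embedding, cell_types, batches, **kwargs)
    return positive.mean(), true_positive.mean()
```

As a worked example of the band: with k = 100 and two batches contributing equally to a cell type (p_1 = p_2 = 0.5), the standard deviation is 5, so each k_i must lie in [35, 65]. A cell whose 100 neighbors all come from one batch is therefore positive but not a true positive. When a cell type is absent from batch i, p_i = 0 and the band collapses to [0, 0], so any neighbor from that batch makes the cell a false positive.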
