This scRNA-seq dataset contains mouse neuronal cells from the dorsal root ganglion (Usoskin et al., 2015). The processed expression values were downloaded from the GitHub repository of the zinbwave paper. Following the preprocessing procedures described in the zinbwave paper, the authors considered 622 cells classified into 11 neuronal cell types, denoted NF1 to NF5, NP1 to NP3, PEP1, PEP2, and TH. Genes with fewer than 20 counts were removed, leaving 12,132 genes for the subsequent analyses, with 68% zero counts. The authors showed the existence of a batch effect related to the picking session of the cells. Thus, the picking session covariate (with values Cold, RT-1, and RT-2) was treated as a batch covariate in the real data analysis. The batch effect was associated both with the expression measures and with the relationship between zero inflation and sequencing depth, as shown in Figure 5 of the zinbwave paper (Hicks et al., 2015; Van den Berge et al., 2018). We reproduced Figures 5A,B of the zinbwave paper in Supplementary Figures S1A,B. Sequencing depth varies widely among batches, which weakens the overall association with zero inflation when cells are pooled across batches (Supplementary Figure S1A).

Zero inflation was also identified in the Usoskin dataset. Supplementary Figure S1B shows histograms of the observational weights for zero counts, calculated by the ZINB-WaVE model with cell type as a covariate, with and without the batch effect as a fixed effect. In both cases the weights have a pronounced mode near zero, which indicates zero inflation in the Usoskin dataset. The real data analysis of the processed Usoskin dataset was performed to identify DE genes for each of the 11 defined cell types vs. the rest. Simulated datasets based on this dataset were generated by splatter with the corresponding estimated parameters.
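The gene filter and the reported zero percentage can be illustrated with a minimal sketch. It assumes that "genes with fewer than 20 counts" means the total count of a gene summed over all cells; the protocol does not state this explicitly. The function names `filter_genes` and `zero_fraction` and the toy matrix are hypothetical and are not taken from the zinbwave code.

```python
from typing import List, Tuple


def filter_genes(counts: List[List[int]], min_total: int = 20) -> Tuple[List[int], List[List[int]]]:
    """Keep genes (rows of a genes x cells matrix) whose total count is >= min_total.

    Assumption: the threshold is applied to the row sum over all cells.
    Returns the indices of the kept genes and the filtered matrix.
    """
    kept_idx = [g for g, row in enumerate(counts) if sum(row) >= min_total]
    return kept_idx, [counts[g] for g in kept_idx]


def zero_fraction(counts: List[List[int]]) -> float:
    """Proportion of entries in a genes x cells matrix that are exactly zero."""
    n_total = sum(len(row) for row in counts)
    n_zero = sum(1 for row in counts for x in row if x == 0)
    return n_zero / n_total if n_total else 0.0


# Toy genes x cells matrix (hypothetical, not Usoskin data).
toy = [
    [0, 5, 20],   # total 25 -> kept
    [1, 0, 0],    # total 1  -> removed
    [0, 0, 0],    # total 0  -> removed
    [10, 10, 0],  # total 20 -> kept
]
kept, filtered = filter_genes(toy)
print(kept)                               # [0, 3]
print(round(zero_fraction(filtered), 3))  # 0.333
```

On the real data, the same zero fraction computed over the 12,132 retained genes is what the 68% figure refers to.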
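The observational weights behind Supplementary Figure S1B are, in ZINB-WaVE, posterior probabilities that a count was generated by the negative binomial (NB) component rather than by the excess-zero component. The sketch below computes that probability for a single count, given a mean `mu`, NB size (inverse dispersion) `theta`, and zero-inflation probability `pi`. It assumes these parameters are already known; in ZINB-WaVE they are estimated from the fitted model, which this sketch does not attempt. It also shows why the histograms concern zero counts: a positive count can only come from the NB component, so its weight is always 1.

```python
def nb_prob_zero(mu: float, theta: float) -> float:
    """P(Y = 0) for a negative binomial with mean mu and size theta."""
    return (theta / (theta + mu)) ** theta


def zinb_weight(y: int, mu: float, theta: float, pi: float) -> float:
    """Posterior probability that count y comes from the NB component of a
    zero-inflated NB model with zero-inflation probability pi.

    A weight near 0 means a zero count is most likely an excess zero.
    """
    if y > 0:
        # Positive counts cannot come from the point mass at zero.
        return 1.0
    p_nb_zero = (1.0 - pi) * nb_prob_zero(mu, theta)
    return p_nb_zero / (pi + p_nb_zero)


# Hypothetical parameters for one gene in one cell.
print(zinb_weight(3, mu=5.0, theta=1.0, pi=0.3))  # 1.0
print(round(zinb_weight(0, mu=5.0, theta=1.0, pi=0.3), 2))  # 0.28
```

A histogram of such weights over all zero counts that piles up near zero is the pattern read as evidence of zero inflation.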
To generate null datasets without DE genes, we simulated 10,000 genes and 250 cells, with five balanced batches and two balanced groups of cells. Twelve methods were applied to identify DE genes between the two groups in each of the 30 simulated null datasets. A gene was declared DE if its unadjusted p-value was less than or equal to 0.05. Because these simulated datasets contain no true DE genes, every declared DE gene is a false positive. The empirical per-comparison error rate (PCER) of each method was calculated as the proportion of genes declared DE and was compared with the nominal PCER of 0.05.
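The empirical PCER computation can be sketched as follows. The sketch assumes the proportion is pooled over all simulated datasets; with the same number of genes per dataset, as here, this equals the mean of the per-dataset proportions. The function name `empirical_pcer` is hypothetical, and the uniform p-values stand in for a well-calibrated test under the null; they are not results from the twelve methods.

```python
import random
from typing import Sequence


def empirical_pcer(pvalues_per_dataset: Sequence[Sequence[float]], alpha: float = 0.05) -> float:
    """Proportion of gene-level tests with unadjusted p <= alpha.

    Under a null simulation every declared DE gene is a false positive,
    so this proportion estimates the per-comparison error rate (PCER).
    """
    n_tests = sum(len(p) for p in pvalues_per_dataset)
    if n_tests == 0:
        raise ValueError("no p-values supplied")
    n_declared = sum(1 for p in pvalues_per_dataset for x in p if x <= alpha)
    return n_declared / n_tests


# 30 hypothetical null datasets of 10,000 genes with uniform p-values.
rng = random.Random(1)
sims = [[rng.random() for _ in range(10_000)] for _ in range(30)]
print(round(empirical_pcer(sims), 3))  # close to the nominal 0.05
```

A method whose empirical PCER lies well above 0.05 is anti-conservative; one well below it is conservative.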

Computational times of the differential expression methods on the simulated null Usoskin and Tung datasets generated by splatter. Times were measured with the number of cores set to 1 and to 8, on a cluster with 24 Intel Xeon Processor (Skylake, IBRS) CPUs at 2.60 GHz (2593 MHz) and 128 GB RAM.
