scRNA-seq data pre-processing

Hansong Lee
Ji-Young Joo
Dong Hyun Sohn
Junho Kang
Yeuni Yu
Hae Ryoun Park
Yun Hak Kim

Single-cell gene expression data were processed using 10× Genomics Cell Ranger v3.1.0. Raw BCL files from the Illumina sequencing platform were demultiplexed into FASTQ files using the ‘cellranger mkfastq’ pipeline, and the FASTQ files were then analyzed using the ‘cellranger count’ pipeline. This step aligns reads to the human reference genome (GRCh38, v3.0.0) and quantifies gene expression using unique molecular identifiers (UMIs) and cell barcodes, producing a cell-by-gene count matrix. To remove low-quality cells, we filtered out cells with fewer than 500 or more than 20,000 UMIs, and cells with a mitochondrial gene fraction above 20%. In addition, we removed cells with fewer than 250 or more than 5000 detected genes, as well as cells with less than 80% complexity (the log-transformed number of genes detected per UMI); such cells could be interpreted as specific cell types, artifacts, or contaminants. We also retained only genes expressed in more than 0.1% of the cells, both to eliminate genes with zero counts and to prevent genes expressed in only a few cells from lowering the average across all other cells. Because some samples contained a large number of cells (maximum 12,177 cells), possible doublets were estimated using Scrublet, and 3.5% of cells were eliminated (maximum 11,852 cells) [13].
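
The cell- and gene-level thresholds described above can be sketched in Python as follows. This is a minimal illustration with NumPy, not the authors' code: the function names `qc_cell_mask` and `gene_mask` are hypothetical, the count matrix is assumed to be a dense cells-by-genes array, the mitochondrial genes are assumed to be flagged beforehand (for example by an "MT-" gene-symbol prefix), and "complexity" is assumed to mean log10(genes detected) divided by log10(UMIs), which the text does not state explicitly.

```python
import numpy as np


def qc_cell_mask(counts, is_mito,
                 min_umis=500, max_umis=20_000,
                 min_genes=250, max_genes=5_000,
                 max_mito_frac=0.20, min_complexity=0.80):
    """Return a boolean mask of cells that pass all QC thresholds.

    counts  : (n_cells, n_genes) array of UMI counts (assumed dense).
    is_mito : (n_genes,) boolean array flagging mitochondrial genes
              (how genes are flagged is an assumption, e.g. "MT-" prefix).
    """
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)

    n_umis = counts.sum(axis=1)            # total UMIs per cell
    n_genes = (counts > 0).sum(axis=1)     # genes detected per cell
    mito_frac = counts[:, is_mito].sum(axis=1) / np.maximum(n_umis, 1)

    # Assumed definition of complexity: log10(genes) / log10(UMIs).
    # Cells with 0 or 1 UMI give undefined values; treat them as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        complexity = np.log10(n_genes) / np.log10(n_umis)
    complexity = np.nan_to_num(complexity, nan=0.0, posinf=0.0, neginf=0.0)

    return (
        (n_umis >= min_umis) & (n_umis <= max_umis)
        & (n_genes >= min_genes) & (n_genes <= max_genes)
        & (mito_frac <= max_mito_frac)
        & (complexity >= min_complexity)
    )


def gene_mask(counts, min_cell_frac=0.001):
    """Return a boolean mask of genes expressed in more than 0.1% of cells."""
    counts = np.asarray(counts)
    frac_cells_expressing = (counts > 0).mean(axis=0)
    return frac_cells_expressing > min_cell_frac
```

In use, one might compute `keep = qc_cell_mask(counts, is_mito)`, subset the matrix with `counts[keep]`, and then apply `gene_mask` to the remaining cells. Whether the gene filter was applied before or after the cell filter is not stated in the text, so that order is also an assumption. Doublet estimation with Scrublet is a separate step and is not covered by this sketch.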
