R (version 3.6.1, https://www.r-project.org/) and the Signac R package (version 1.0.0, https://github.com/timoast/signac/) [20,21] were used for downstream analysis. Barcodes representing genuine cells were identified mainly by the TSS enrichment score and the number of unique fragments. The filtering thresholds were chosen with reference to the official Signac tutorial (https://satijalab.org/signac/) and previous studies [20]. Cells were retained if they met all of the following criteria: (1) >3000 and <10000 unique fragments in peak regions; (2) enrichment at transcription start sites (TSS) ≥2; (3) percentage of reads in peaks ≥15; (4) blacklist ratio ≤0.025; and (5) nucleosome signal <10. Cells that were outliers for these QC metrics were removed.
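The five thresholds above can be read as a single per-barcode predicate. The following is a minimal sketch in plain Python rather than the Signac code used for the analysis; the dictionary keys and the barcode names are hypothetical and chosen only for illustration.

```python
# Illustrative QC filter. Thresholds come from the criteria stated in the text;
# the metric names and example barcodes are hypothetical.
def passes_qc(cell):
    """Return True if a barcode meets all five QC criteria."""
    return (
        3000 < cell["peak_region_fragments"] < 10000
        and cell["tss_enrichment"] >= 2
        and cell["pct_reads_in_peaks"] >= 15
        and cell["blacklist_ratio"] <= 0.025
        and cell["nucleosome_signal"] < 10
    )

# Made-up per-barcode metrics for demonstration only.
cells = {
    "cell_a": {"peak_region_fragments": 5200, "tss_enrichment": 4.1,
               "pct_reads_in_peaks": 42.0, "blacklist_ratio": 0.004,
               "nucleosome_signal": 0.8},
    "cell_b": {"peak_region_fragments": 1500, "tss_enrichment": 3.0,
               "pct_reads_in_peaks": 30.0, "blacklist_ratio": 0.010,
               "nucleosome_signal": 1.2},   # too few fragments
    "cell_c": {"peak_region_fragments": 7000, "tss_enrichment": 1.4,
               "pct_reads_in_peaks": 25.0, "blacklist_ratio": 0.010,
               "nucleosome_signal": 0.9},   # TSS enrichment too low
}

kept = [barcode for barcode, metrics in cells.items() if passes_qc(metrics)]
print(kept)
```

A barcode is kept only when every criterion holds, so failing a single metric is enough to exclude it.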
After QC, the resulting high-quality scATAC-seq datasets were normalized by term frequency-inverse document frequency (TF-IDF) using the Signac function “RunTFIDF”. The dimensionality of the DNA accessibility assay was then reduced by latent semantic indexing (LSI). The first LSI component was removed from downstream analysis because it usually captures sequencing depth rather than biological variation.
After linear dimensional reduction, with the cells embedded in a low-dimensional space, we performed graph-based clustering using the Seurat function “FindClusters” and non-linear dimension reduction using the UMAP algorithm (Seurat function “RunUMAP”) to identify and visualize cell clusters.
To define the set of genes that were specifically and highly active in each cluster, we generated a count matrix and calculated a gene score for each gene with the Signac function “GeneActivity()”. The activity of each gene was quantified from the chromatin accessibility associated with that gene in the scATAC-seq data: the gene activity matrix was built from the reads mapped to the gene body and promoter (2 kb upstream of the TSS). To facilitate cluster annotation, the gene activity of the TopFeatures was examined and the gene scores were visualized with “DotPlot”. Finally, the gene activity of typical cell type-specific marker genes was visualized to support clustering and cell type assignment of the scATAC-seq data.
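The idea behind the gene score can be sketched as follows: extend each gene body 2 kb upstream of its TSS, taking strand into account, and count the fragments from each cell that overlap the extended region. This plain-Python sketch is an illustration under stated assumptions, not the “GeneActivity()” implementation; the gene names, coordinates, barcodes, and the half-open interval convention are all hypothetical.

```python
def extended_region(start, end, strand, upstream=2000):
    """Gene body plus promoter: extend upstream of the TSS, strand-aware."""
    if strand == "+":
        return max(0, start - upstream), end   # TSS is at the start
    return start, end + upstream               # TSS is at the end

def gene_activity(genes, fragments, upstream=2000):
    """Count, per gene and per cell, the fragments overlapping the region.

    genes:     {name: (chrom, start, end, strand)}
    fragments: iterable of (chrom, start, end, barcode)
    Intervals are treated as half-open [start, end), which is an assumption.
    """
    scores = {}
    for name, (chrom, start, end, strand) in genes.items():
        r_start, r_end = extended_region(start, end, strand, upstream)
        per_cell = {}
        for f_chrom, f_start, f_end, barcode in fragments:
            if f_chrom == chrom and f_start < r_end and f_end > r_start:
                per_cell[barcode] = per_cell.get(barcode, 0) + 1
        scores[name] = per_cell
    return scores

# Hypothetical genes and fragments for demonstration only.
genes = {
    "GENE_A": ("chr1", 5000, 8000, "+"),
    "GENE_B": ("chr1", 20000, 24000, "-"),
}
fragments = [
    ("chr1", 3100, 3300, "cell_1"),    # in GENE_A promoter
    ("chr1", 2500, 2900, "cell_1"),    # upstream of the 2 kb window
    ("chr1", 6000, 6200, "cell_2"),    # in GENE_A body
    ("chr1", 25500, 25700, "cell_2"),  # in GENE_B promoter (minus strand)
    ("chr2", 6000, 6200, "cell_1"),    # different chromosome
]
scores = gene_activity(genes, fragments)
```

The nested loop is kept for readability; on real data an interval index would be needed to make the overlap search tractable.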