Secondary Analysis

Katharina Rindler
Wolfgang M. Bauer
Constanze Jonak
Matthias Wielscher
Lisa E. Shaw
Thomas B. Rojahn
Felix M. Thaler
Stefanie Porkert
Ingrid Simonitsch-Klupp
Wolfgang Weninger
Marius E. Mayerhoefer
Matthias Farlik
Patrick M. Brunner

The Seurat package (version 3.1.4, RRID: SCR_016341) was used to perform quality control and to integrate all samples. Cells were filtered on the number of expressed genes (between 200 and 4,000) and on the percentage of mitochondrial gene expression (less than 12%). These thresholds exclude multiplets and dead cells leaking mRNA while retaining cell types with naturally higher mitochondrial content (such as macrophages). All cells not meeting these criteria were discarded. After QC, the TCR sequencing and transcriptome sequencing data were merged by adding clonotype frequencies and CDR3 amino acid sequences as metadata columns of the Seurat objects.

All samples were aligned with the standard integration pipeline recommended by the Seurat developers (12, 13). Briefly, gene expression counts were log-normalized, and 2,000 variable features were selected for each sample individually; these features were used to find integration anchors and for principal component analysis (PCA). Based on the variance explained by each principal component (elbow plot), the first 22 principal components were selected as input for dimensionality reduction and for clustering with the Louvain algorithm at a resolution of 0.6. Clusters were visualized in two-dimensional space by Uniform Manifold Approximation and Projection (UMAP).

Cell types were assigned to clusters by identifying cluster markers with the “FindAllMarkers” command and by running the SingleR package (version 1.0.5) (14). Differential gene expression (|logFC| > 0.25, adjusted p-value < 0.05) was calculated with the FindMarkers command using the Wilcoxon rank-sum test, and p-values were adjusted for multiple comparisons with the Bonferroni correction.

The scran package was used to identify droplets containing more than one cell (15). This approach simulates thousands of doublets by adding together the profiles of two randomly chosen single cells. To calculate doublet scores, the cells were clustered together with this set of randomly generated doublets.
Then, for each cell in the original dataset, the number of simulated doublets in its neighborhood was recorded and used to calculate the score, considering the 200 nearest neighbors of each cell. The doublet score was the log10 of the ratio between the number of simulated doublets and the total number of neighbors considered for that cell.
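The per-cell QC thresholds described above can be expressed as a simple boolean mask. The following is a minimal NumPy sketch, not the authors' Seurat code. It assumes a cells × genes matrix of raw counts, human mitochondrial genes prefixed with `MT-`, and strict inequalities at the 200 and 4,000 gene bounds; the helper name `qc_mask` is hypothetical.

```python
import numpy as np

def qc_mask(counts, gene_names, min_genes=200, max_genes=4000,
            max_mito_pct=12.0, mito_prefix="MT-"):
    """Return a boolean mask of cells passing QC (hypothetical helper).

    counts: cells x genes array of raw counts.
    gene_names: gene symbols; mitochondrial genes are assumed to start with "MT-".
    """
    counts = np.asarray(counts)
    # Number of genes with at least one count in each cell.
    n_genes = (counts > 0).sum(axis=1)
    # Percentage of each cell's counts coming from mitochondrial genes.
    is_mito = np.array([g.upper().startswith(mito_prefix) for g in gene_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    pct_mito = np.divide(100.0 * mito, total,
                         out=np.zeros(len(total), dtype=float), where=total > 0)
    # Keep cells inside the gene-count window and below the mitochondrial cutoff.
    return (n_genes > min_genes) & (n_genes < max_genes) & (pct_mito < max_mito_pct)
```

Indexing the count matrix with the returned mask (`counts[mask]`) keeps only the passing cells.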
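Log-normalization and the elbow-plot choice of principal components can be illustrated in a few lines. This sketch assumes Seurat's default LogNormalize scale factor of 10,000 and approximates ScaleData by centering and scaling each gene, without its clipping. It omits variable-feature selection and anchor-based integration. The function names `log_normalize` and `pca_variance_ratio` are hypothetical.

```python
import numpy as np

def log_normalize(counts, scale_factor=1e4):
    """LogNormalize-style transform: log1p(count / cell total * scale_factor)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * scale_factor)

def pca_variance_ratio(x, n_pcs=22):
    """Fraction of total variance explained by each of the first n_pcs components."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant genes stay at zero after centering
    z = (x - mu) / sd  # per-gene z-scores (no clipping, unlike ScaleData's default)
    s = np.linalg.svd(z, compute_uv=False)  # singular values, descending
    var = s ** 2  # proportional to the variance captured by each PC
    return var[:n_pcs] / var.sum()
```

Plotting these ratios, or the corresponding standard deviations that Seurat's ElbowPlot displays, against the component index shows where additional components stop adding much variance, which is how a cutoff such as 22 PCs is chosen.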
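The differential expression criteria (Wilcoxon rank-sum test, Bonferroni correction, |logFC| > 0.25, adjusted p < 0.05) can be sketched with SciPy's `mannwhitneyu`, which implements the rank-sum test. This is an illustration under simplifying assumptions, not Seurat's FindMarkers. The fold change here is the difference in mean log-normalized expression, which is not necessarily how Seurat defines it. Seurat's gene pre-filtering is also omitted. The name `find_markers` is hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def find_markers(x, in_group, logfc_threshold=0.25, alpha=0.05):
    """Two-sided Wilcoxon rank-sum test per gene with Bonferroni adjustment.

    x: cells x genes matrix of log-normalized expression.
    in_group: boolean mask selecting the cells of the cluster of interest.
    Returns (indices of genes passing both thresholds, logfc, adjusted p-values).
    """
    x = np.asarray(x, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    a, b = x[in_group], x[~in_group]
    n_genes = x.shape[1]

    # Simplified fold change: difference in mean log-normalized expression.
    logfc = a.mean(axis=0) - b.mean(axis=0)

    # Wilcoxon rank-sum (Mann-Whitney U) test for each gene.
    pvals = np.array([
        mannwhitneyu(a[:, g], b[:, g], alternative="two-sided").pvalue
        for g in range(n_genes)
    ])

    # Bonferroni: multiply by the number of tests and cap at 1.
    p_adj = np.minimum(pvals * n_genes, 1.0)
    keep = (np.abs(logfc) > logfc_threshold) & (p_adj < alpha)
    return np.flatnonzero(keep), logfc, p_adj
```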
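The neighborhood-based doublet score described above can be sketched in NumPy. This is a simplified illustration of the described idea, not scran's implementation. It assumes log-normalized counts (scale factor 10,000) and a PCA embedding with `n_pcs` components, and it uses Euclidean k-nearest neighbors in that embedding in place of the clustering step. The name `doublet_scores` and every default except k = 200 are hypothetical. Cells with no simulated doublets among their neighbors get a score of -inf, because log10(0) is undefined. The full distance matrix limits this sketch to small datasets.

```python
import numpy as np

def doublet_scores(counts, n_sim=1000, k=200, n_pcs=20, seed=0):
    """log10(simulated doublets among the k nearest neighbors / k) for each real cell.

    counts: cells x genes matrix of raw counts (dense; small data only).
    Returns -inf for cells with no simulated doublets among their neighbors.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Simulate doublets by summing the profiles of two randomly chosen cells
    # (in this sketch, i and j may occasionally be the same cell).
    i = rng.integers(0, n_cells, size=n_sim)
    j = rng.integers(0, n_cells, size=n_sim)
    combined = np.vstack([counts, counts[i] + counts[j]])

    # Log-normalize (assumed scale factor 10,000) and center each gene.
    totals = combined.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    x = np.log1p(combined / totals * 1e4)
    x -= x.mean(axis=0)

    # PCA embedding of real and simulated cells via SVD.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    emb = u[:, :n_pcs] * s[:n_pcs]

    # Squared Euclidean distances from each real cell to every real or simulated cell.
    sq = (emb ** 2).sum(axis=1)
    d = sq[:n_cells, None] + sq[None, :] - 2.0 * emb[:n_cells] @ emb.T
    d[np.arange(n_cells), np.arange(n_cells)] = np.inf  # a cell is not its own neighbor

    # k nearest neighbors; indices >= n_cells belong to simulated doublets.
    nn = np.argpartition(d, k, axis=1)[:, :k]
    frac = (nn >= n_cells).mean(axis=1)
    with np.errstate(divide="ignore"):
        return np.log10(frac)
```

Higher (less negative) scores mark cells whose neighborhoods are dominated by simulated doublets.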

Cell cycle scores were calculated as implemented in the Seurat package, which combines the expression of cell cycle marker genes into scores. The marker sets comprised 43 genes primarily expressed in G1/S and 55 genes primarily expressed in G2/M, as described in more detail by Tirosh et al. (16).
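Seurat's cell cycle scoring builds on module scores. For each gene set, it takes the mean expression of the set minus the mean expression of control genes drawn from the same average-expression bins. It then calls G1 when both the S and G2/M scores are negative and otherwise assigns the phase with the higher score. The sketch below approximates that logic in NumPy and is not Seurat's code. The 24 equal-frequency bins, the control-gene sampling and the function names are assumptions. The gene index arrays stand in for the G1/S and G2/M marker lists.

```python
import numpy as np

def module_score(x, gene_idx, n_bins=24, n_ctrl=100, rng=None):
    """Mean expression of gene_idx minus mean of expression-matched control genes."""
    rng = np.random.default_rng(0) if rng is None else rng
    avg = x.mean(axis=0)
    # Assign every gene to one of n_bins equal-frequency bins by average expression.
    order = np.argsort(avg, kind="stable")
    bins = np.empty(len(avg), dtype=int)
    bins[order] = np.arange(len(avg)) * n_bins // len(avg)
    # For each gene in the set, sample control genes from the same bin.
    ctrl = set()
    for g in gene_idx:
        pool = np.flatnonzero(bins == bins[g])
        picks = rng.choice(pool, size=min(n_ctrl, len(pool)), replace=False)
        ctrl.update(picks.tolist())
    ctrl_idx = np.array(sorted(ctrl))
    return x[:, gene_idx].mean(axis=1) - x[:, ctrl_idx].mean(axis=1)

def cell_cycle_phase(x, s_idx, g2m_idx, n_bins=24, seed=0):
    """Return (S score, G2/M score, phase label) for each cell of a cells x genes matrix."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n_ctrl = min(len(s_idx), len(g2m_idx))  # control count tied to the smaller set
    s = module_score(x, s_idx, n_bins, n_ctrl, rng)
    g2m = module_score(x, g2m_idx, n_bins, n_ctrl, rng)
    phase = np.where(s > g2m, "S", "G2M")
    phase = np.where(s == g2m, "Undecided", phase)
    phase = np.where((s < 0) & (g2m < 0), "G1", phase)  # both negative -> G1
    return s, g2m, phase
```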
