The cell line mixture dataset consists of 12 samples from 4 different cell lines. Each sample was tagged with a different HTO as described previously (Stoeckius, et al., 2018). Droplets were demultiplexed independently of the HTOs, based on the distinct transcription profiles of the 4 cell lines, using a standard single-cell clustering workflow as described by Amezquita, et al. (2020). Empty droplets were removed using the emptyDrops method (Lun, et al., 2019). The remaining 7,596 droplets were normalized and log-transformed using the logNormCounts method from the R package scater. The top 5,000 genes with the largest biological component of variance were selected using modelGeneVar. Principal component analysis (PCA) was then applied to the top 500 most variable genes in this set. Droplets were clustered using the Walktrap community detection algorithm (Pons and Latapy, 2006) applied to the nearest-neighbor graph (n = 10 neighbors) constructed from the top 50 principal components. A total of 19 clusters were detected. The larger clusters (≥ 200 droplets) reflected the different cell lines, and droplets from these clusters were labeled accordingly, except for one cluster of 294 cells, which showed a high proportion of mitochondrial reads (mean of 16.6%), indicating apoptotic cells. This cluster and the remaining smaller clusters, some of which likely reflect multiplets, were labeled as ‘uncertain’. The dataset was downloaded from Gene Expression Omnibus (GSM3501446 and GSM3501447).
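The final labeling rule described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis, which relied on R packages. The size threshold of 200 droplets comes from the text; the mitochondrial cutoff of 10%, the class and function names, the cell line names, and all example values other than the 294-droplet cluster with 16.6% mitochondrial reads are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CLUSTER_SIZE = 200        # from the text: larger clusters have >= 200 droplets
MAX_MEAN_MITO_PERCENT = 10.0  # ASSUMPTION: hypothetical cutoff, not stated in the text


@dataclass
class ClusterSummary:
    cluster_id: int
    n_droplets: int
    mean_mito_percent: float
    cell_line: Optional[str]  # cell line matched by expression profile, if any


def label_cluster(c: ClusterSummary) -> str:
    """Return the cell line label for a cluster, or 'uncertain'."""
    if c.n_droplets < MIN_CLUSTER_SIZE:
        return "uncertain"  # small clusters, some likely multiplets
    if c.mean_mito_percent > MAX_MEAN_MITO_PERCENT:
        return "uncertain"  # high mitochondrial fraction suggests apoptotic cells
    if c.cell_line is None:
        return "uncertain"  # no cell line could be assigned
    return c.cell_line


# Hypothetical cluster summaries; only cluster 2 uses numbers from the text.
clusters = [
    ClusterSummary(1, 1500, 3.2, "line_A"),
    ClusterSummary(2, 294, 16.6, "line_B"),
    ClusterSummary(3, 45, 4.0, None),
]
labels = {c.cluster_id: label_cluster(c) for c in clusters}
print(labels)
```

With these example inputs, only cluster 1 receives a cell line label; the high-mitochondrial cluster and the small cluster are both marked ‘uncertain’.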