For library preparation, 10x Genomics Chromium Single Cell 3′ RNA-seq v3 kits were used, and gene expression libraries were prepared per the manufacturer’s instructions. Four biological replicates totaling 8 processed tumors were sequenced in 2 batches: Run A, 2 NT2.5 tumors and 2 NT2.5-LM tumors; Run B, 2 NT2.5 tumors and 2 NT2.5-LM tumors. These tumors were a subset of a larger batch of tumors that included various mouse treatments, with each batch having an equal assortment of samples from multiple treatment groups to reduce technical biases. Here, we restrict our analysis to replicates under the vehicle treatment condition. Libraries were sequenced on an Illumina HiSeq X Ten or NovaSeq. Paired-end reads were processed using CellRanger v3.0.2 and mapped to the mm10 transcriptome with default settings. ScanPy v1.8.2 and Python v3 were used for quality control and basic filtering. DoubleDetection v4.2 with the Louvain clustering algorithm v0.7.1 was used to identify doublets. For gene filtering, all genes expressed in fewer than 3 cells within a tumor (NT2.5 and NT2.5-LM) were removed. Cells expressing fewer than 200 genes or more than 8,000 genes, or having more than 15% mitochondrial gene expression, were also removed. Gene expression was total-count normalized to 10,000 reads per cell and log-transformed. Highly variable genes were identified using default ScanPy parameters, and the total counts per cell and the percent mitochondrial genes expressed were regressed out. Finally, gene expression was scaled to unit variance and values exceeding 10 standard deviations were removed. Neighborhood graphs were constructed using 10 nearest neighbors and 30 principal components.
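The filtering and normalization thresholds above can be illustrated with a minimal, library-free sketch. This is not the authors' code: in ScanPy these steps are usually done with `sc.pp.filter_cells`, `sc.pp.filter_genes`, `sc.pp.normalize_total`, and `sc.pp.log1p`. The function name `qc_and_normalize`, the `mt-` prefix used to flag mitochondrial genes, the order of the cell and gene filters, and the toy matrix are all assumptions made for illustration.

```python
import math


def qc_and_normalize(counts, gene_names, min_genes=200, max_genes=8000,
                     max_mito=0.15, min_cells=3, target_sum=10_000):
    """Filter cells and genes, then total-count normalize and log-transform.

    counts: list of per-cell lists of raw counts (cells x genes).
    Assumption: mitochondrial genes carry the mouse 'mt-' prefix.
    """
    mito_idx = [j for j, g in enumerate(gene_names)
                if g.lower().startswith("mt-")]

    # Cell filter: number of detected genes and mitochondrial fraction.
    kept_cells = []
    for row in counts:
        n_genes = sum(1 for c in row if c > 0)
        total = sum(row)
        mito_frac = sum(row[j] for j in mito_idx) / total if total else 0.0
        if min_genes <= n_genes <= max_genes and mito_frac <= max_mito:
            kept_cells.append(row)

    # Gene filter: keep genes detected in at least `min_cells` retained cells.
    kept_idx = [j for j in range(len(gene_names))
                if sum(1 for row in kept_cells if row[j] > 0) >= min_cells]

    # Scale each cell to `target_sum` total counts, then apply log(1 + x).
    normalized = []
    for row in kept_cells:
        sub = [row[j] for j in kept_idx]
        total = sum(sub)
        scale = target_sum / total if total else 0.0
        normalized.append([math.log1p(c * scale) for c in sub])

    return normalized, [gene_names[j] for j in kept_idx]


# Toy example with relaxed thresholds so the 5 x 4 matrix is not emptied.
genes = ["Erbb2", "Cd24a", "mt-Co1", "Actb"]
raw = [
    [5, 3, 0, 2],  # kept
    [4, 0, 1, 5],  # kept (10% mitochondrial)
    [1, 0, 9, 0],  # dropped: 90% mitochondrial
    [0, 0, 0, 7],  # dropped: only 1 gene detected
    [2, 2, 0, 0],  # kept
]
normalized, kept_genes = qc_and_normalize(
    raw, genes, min_genes=2, max_genes=3, min_cells=2)
```

With the thresholds stated in the text, the call would use the defaults (`min_genes=200`, `max_genes=8000`, `max_mito=0.15`, `min_cells=3`, `target_sum=10_000`).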
Tumors were clustered together within cell lines using Louvain clustering (resolution parameter 0.12), and cancer cells were identified as Lcn+, Wfdc2+, Cd24a+, Cd276+, Col9a1+, Erbb2+ (Berger et al., 2010; Gündüz et al., 2016; Seaman et al., 2017; Sidiropoulos et al., 2022; Yang et al., 2009; Yeo et al., 2020). All other cell clusters and doublets were removed. This left ~10,000 NT2.5 cancer cells and ~9,000 NT2.5-LM cancer cells, which were combined by total raw count normalization to 10,000 reads, with log transformation and batch correction on cell lines via ComBat. The 250 top differentially expressed genes in the cancer clusters from each cell line were identified using the Wilcoxon rank-sum test and compared for overlap with pathways from the ‘KEGG_2019_Mouse’ database using GSEAPY (Gene Set Enrichment Analysis in Python).
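To make the pathway-overlap step concrete, the sketch below scores the overlap between a list of differentially expressed genes and one pathway gene set with a one-sided hypergeometric test. This is an illustrative stand-in, not GSEAPY's implementation or the authors' code. The function name `overlap_pvalue`, the background size, and the placeholder gene identifiers are hypothetical, and the sketch assumes every pathway gene belongs to the background.

```python
from math import comb


def overlap_pvalue(de_genes, pathway_genes, background_size):
    """Return (overlap size, P(X >= overlap)) under a hypergeometric null.

    Null model: the DE list is a random draw of n genes from a background of
    `background_size` genes, K of which belong to the pathway.
    """
    de, pathway = set(de_genes), set(pathway_genes)
    k = len(de & pathway)          # observed overlap
    n, K, N = len(de), len(pathway), background_size
    # Sum the upper tail exactly with integer arithmetic, then divide once.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return k, tail / comb(N, n)


# Toy example: 20 background genes, 5 DE genes, a 4-gene pathway, 3 shared.
de_list = ["g1", "g2", "g3", "g4", "g5"]
pathway = ["g1", "g2", "g3", "g10"]
k, p = overlap_pvalue(de_list, pathway, background_size=20)
print(k, round(p, 4))
```

In the analysis described above, `de_genes` would be the 250 top genes for one cell line and the test would be repeated for each pathway in the gene set library, with correction for multiple testing.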