Patient samples from the GSE161529 dataset of Pal and colleagues were downloaded from the Gene Expression Omnibus (GEO) [7]. For each patient sample, we generated a folder containing the barcodes, features, and matrix files. We imported seven normal patient sample datasets into R with the Seurat Read10X function. Patient samples N-N280-Epi, N-N1105-Epi, and N-MH0064-Epi are premenopausal and nulliparous, N-MH0023-Epi is premenopausal and parous, N-PM0342-Epi is postmenopausal and nulliparous, and N-PM0372-Epi and N-MH275-Epi are postmenopausal and parous [7]. We removed potentially low-quality cells, defined as cells with ≥ 20% of counts from mitochondrial genes, < 500 detected genes, < 1500 unique molecular identifiers (UMIs), or a log10GenesPerUMI value < 0.8. We also removed genes expressed in fewer than ten cells. SCTransform was used to normalize and scale the data and to find variable features, and anchor-based integration was then performed. We used 100 principal components for principal component analysis (PCA), followed by t-SNE and UMAP. Because SCTransform normalizes the data effectively, using 100 principal components contributes to a robust analysis. We performed clustering at several cluster resolution values and generated heatmaps, t-SNE plots, and dot plots. The top differentially expressed genes were used as population-specific marker genes and examined in detail.
We also examined eight scRNA-seq TN BrC datasets from GSE161529 using the above protocol, with the additional step of removing cell populations that highly express CD31 (endothelial cell marker) or CD45 (immune cell marker). Four samples had BRCA mutations: TN-B1-MH4031, TN-B1-MH0131, TN-B1-Tum0554, and TN-B1-MH0177; “B1” indicates BRCA mutant. Four samples had wild-type BRCA: TN-MH0126, TN-MH0135, TN-SH0106, and TN-MH0114-T2.
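The cell- and gene-level quality-control thresholds described above can be illustrated with a minimal sketch. This is not the R/Seurat code used in the study; it is a plain-Python illustration of the filtering logic only. Several details are assumptions made for the example: the per-cell data layout (a dictionary of gene symbol to UMI count), the identification of mitochondrial genes by the "MT-" symbol prefix, the definition of log10GenesPerUMI as log10(genes detected) divided by log10(UMIs), and applying the gene filter after the cell filter. The function names are hypothetical.

```python
import math

# Thresholds as stated in the text; a cell is kept only if it passes all four.
MAX_MITO_FRACTION = 0.20        # remove cells with >= 20% mitochondrial counts
MIN_GENES = 500                 # remove cells with < 500 detected genes
MIN_UMI = 1500                  # remove cells with < 1500 UMIs
MIN_LOG10_GENES_PER_UMI = 0.8   # remove cells with log10GenesPerUMI < 0.8
MIN_CELLS_PER_GENE = 10         # remove genes detected in fewer than 10 cells


def cell_passes_qc(counts, mito_prefix="MT-"):
    """Return True if one cell passes all cell-level thresholds.

    counts: dict mapping gene symbol -> UMI count (assumed layout).
    mito_prefix: assumed naming convention for mitochondrial genes.
    """
    n_umi = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Checking the count thresholds first also guards the logarithms below.
    if n_umi < MIN_UMI or n_genes < MIN_GENES:
        return False
    mito_counts = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    mito_fraction = mito_counts / n_umi
    # Assumed definition: log10(genes detected) / log10(UMIs).
    complexity = math.log10(n_genes) / math.log10(n_umi)
    return (mito_fraction < MAX_MITO_FRACTION
            and complexity >= MIN_LOG10_GENES_PER_UMI)


def filter_cells(cells):
    """Keep only cells (barcode -> counts dict) that pass cell-level QC."""
    return {bc: counts for bc, counts in cells.items() if cell_passes_qc(counts)}


def filter_genes(cells, min_cells=MIN_CELLS_PER_GENE):
    """Drop genes detected (count > 0) in fewer than min_cells cells."""
    n_detected = {}
    for counts in cells.values():
        for gene, c in counts.items():
            if c > 0:
                n_detected[gene] = n_detected.get(gene, 0) + 1
    keep = {g for g, n in n_detected.items() if n >= min_cells}
    return {bc: {g: c for g, c in counts.items() if g in keep}
            for bc, counts in cells.items()}
```

Under these assumptions, a sample would be processed as `filter_genes(filter_cells(cells))`, where `cells` maps each barcode to its gene counts. In a Seurat workflow the same thresholds would typically be applied to per-cell metadata columns rather than to dictionaries.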