Datasets

Kexin Huang
Yun Zhang
Haoran Gong
Zhengzheng Qiao
Tiangang Wang
Weiling Zhao
Liyu Huang
Xiaobo Zhou

This study used gene expression data from four LUAD datasets obtained from the TCGA (https://www.cancer.gov/tcga) and Gene Expression Omnibus (GEO) (www.ncbi.nlm.nih.gov/geo/) databases. The TCGA expression matrix was derived from bulk RNA sequencing, whereas the GEO expression matrices were obtained with microarray technology. In total, 1202 LUAD tumor samples and 128 normal samples were included in our study. First, gene expression data for 533 LUAD samples and 59 normal samples from the TCGA-LUAD dataset were downloaded from the Genomic Data Commons (GDC) Data Portal (https://gdc-portal.nci.nih.gov/) using the R package TCGAbiolinks [35]. DNA copy number, somatic mutation, and clinical data were downloaded from the TCGA data portal (https://portal.gdc.cancer.gov/). The TCGA-LUAD dataset served as the training cohort for model construction (https://www.cancer.gov/tcga). Next, gene expression data for GSE68465 and GSE10072 were obtained from the GEO database [17, 18]. We combined the 443 LUAD samples from GSE68465 with the 49 normal samples from GSE10072 to form validation cohort 1. Both datasets were generated on the same platform, GPL96 (Affymetrix GeneChip Human Genome U133 Array Set HG-U133A), so they contain the same set of genes. We applied the ‘ComBat’ function in the sva package to correct for batch effects [46,47]. In addition, gene expression data for 226 tumor samples and 20 normal samples from GSE31210 were used as validation cohort 2 [19]. Clinical information for these samples, including survival time and pathological and histologic stage, was obtained from GEO for downstream analysis. Details of the four datasets are given in Table A in S1 Appendix.
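The cohort composition described above can be checked with a short sketch. The Python snippet below is illustrative only and is not the authors' code: the names `Cohort`, `COHORTS`, `totals`, and `batch_labels` are hypothetical, and only the sample counts and accession numbers come from the text. It tallies tumor and normal samples across the three cohorts and builds one batch label per sample for validation cohort 1, the kind of per-sample grouping vector that a batch-correction step expects, assuming samples are ordered tumor first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    name: str
    role: str
    tumor: dict   # source dataset -> number of tumor samples
    normal: dict  # source dataset -> number of normal samples


# Counts and accession numbers are taken from the text; the structure is an assumption.
COHORTS = [
    Cohort("TCGA-LUAD", "training", {"TCGA-LUAD": 533}, {"TCGA-LUAD": 59}),
    Cohort("validation cohort 1", "validation", {"GSE68465": 443}, {"GSE10072": 49}),
    Cohort("validation cohort 2", "validation", {"GSE31210": 226}, {"GSE31210": 20}),
]


def totals(cohorts):
    """Return (tumor, normal) sample counts summed over all cohorts."""
    tumor = sum(sum(c.tumor.values()) for c in cohorts)
    normal = sum(sum(c.normal.values()) for c in cohorts)
    return tumor, normal


def batch_labels(cohort):
    """Return one label per sample (tumor samples first), naming its source dataset."""
    labels = []
    for counts in (cohort.tumor, cohort.normal):
        for dataset, n in counts.items():
            labels.extend([dataset] * n)
    return labels


if __name__ == "__main__":
    tumor, normal = totals(COHORTS)
    print(f"tumor={tumor}, normal={normal}")
    labels = batch_labels(COHORTS[1])
    print(f"validation cohort 1: {len(labels)} samples, batches={sorted(set(labels))}")
```

Note that in validation cohort 1 the batch label coincides with tumor/normal status, because all tumor samples come from GSE68465 and all normal samples from GSE10072; the sketch only makes that grouping explicit.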
