We obtained the latest expression profiles, copy number variation (CNV) data, and clinical characteristics from The Cancer Genome Atlas (TCGA) [26]. A total of 556 samples (500 tumors and 56 normal tissues) were included in this study. We also downloaded fragments per kilobase of exon model per million mapped reads (FPKM) data for lung adenocarcinoma (LUAD) from the transcriptome RNA-sequencing data in the Gene Expression Omnibus (GEO) database [27], comprising 226 LUAD samples from the GSE31210 dataset and 83 LUAD samples from the GSE30219 dataset.
For TCGA-LUAD, samples lacking clinical follow-up information, survival time, or survival status were removed. Genes with FPKM <1 in more than half of the samples were removed. Only tumor and normal tissue samples (Primary Solid Tumor and Solid Tissue Normal) were retained.
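The two TCGA-LUAD filters described above can be illustrated with a minimal pandas sketch. It is not the authors' code: the function names, the column names `os_time` and `os_status`, and the toy values are hypothetical, and the expression matrix is assumed to have genes as rows and samples as columns.

```python
import pandas as pd


def drop_incomplete_samples(clinical, required=("os_time", "os_status")):
    """Keep only samples with non-missing values in every required column.

    The column names are hypothetical placeholders, not TCGA field names.
    """
    return clinical.dropna(subset=list(required))


def filter_low_expression(fpkm, threshold=1.0, max_fraction=0.5):
    """Drop genes with FPKM < threshold in more than max_fraction of samples.

    `fpkm` is assumed to have genes as rows and samples as columns.
    """
    low_fraction = (fpkm < threshold).mean(axis=1)
    return fpkm.loc[low_fraction <= max_fraction]


# Toy data for illustration only (not taken from TCGA).
clinical = pd.DataFrame(
    {
        "os_time": [420.0, 655.0, 1310.0, 88.0, None],
        "os_status": [1, 0, 1, 0, 0],
    },
    index=["s1", "s2", "s3", "s4", "s5"],
)
fpkm = pd.DataFrame(
    {
        "s1": [0.2, 5.0, 0.5],
        "s2": [0.1, 3.0, 2.0],
        "s3": [0.0, 0.4, 0.9],
        "s4": [2.0, 7.0, 4.0],
        "s5": [9.0, 9.0, 9.0],
    },
    index=["geneA", "geneB", "geneC"],
)

# Remove samples without survival data first, then filter genes.
clinical_kept = drop_incomplete_samples(clinical)
fpkm_kept = filter_low_expression(fpkm.loc[:, clinical_kept.index])
```

In this sketch a gene with FPKM <1 in exactly half of the samples is kept, following the "more than half" wording of the criterion.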
For the GEO data, publicly available LUAD patient data were included according to the following criteria: samples lacking clinical follow-up information, survival time, or survival status were removed. Probes corresponding to multiple genes were removed. For gene symbols represented by multiple expression values, the median value was taken. The clinical information of the samples is shown in Table 1. The workflow of this study is presented in Figure 1.
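The probe-level processing of the GEO data can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the function name and toy values are hypothetical, and the sketch assumes that probes annotated to several genes are marked with a `" /// "` separator in the annotation, which depends on the platform annotation file actually used.

```python
import pandas as pd


def collapse_probes_to_genes(expr, probe_to_symbol, sep=" /// "):
    """Collapse a probe-level matrix (probes x samples) to gene level.

    Probes with no symbol, or with several symbols joined by `sep`
    (an assumed annotation convention), are discarded. Probes sharing
    one gene symbol are summarized by their per-sample median.
    """
    symbols = probe_to_symbol.reindex(expr.index)
    is_single = symbols.notna() & ~symbols.fillna("").str.contains(sep, regex=False)
    kept = expr.loc[is_single]
    return kept.groupby(symbols[is_single]).median()


# Toy data for illustration only.
expr = pd.DataFrame(
    {"s1": [2.0, 6.0, 1.0, 3.0], "s2": [4.0, 8.0, 1.0, 5.0]},
    index=["p1", "p2", "p3", "p4"],
)
probe_to_symbol = pd.Series(
    {"p1": "EGFR", "p2": "EGFR", "p3": "KRAS /// TP53", "p4": "KRAS"}
)

gene_expr = collapse_probes_to_genes(expr, probe_to_symbol)
```

Here probe `p3` is dropped because it maps to two genes, and the two `EGFR` probes are merged into a single row holding their median in each sample.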
Figure 1. The workflow of this study.
Table 1. Clinical sample information for the three datasets.
∗Lifelong nonsmoker (fewer than 100 cigarettes smoked in lifetime) = 1; current smoker (includes daily smokers and nondaily or occasional smokers) = 2; current reformed smoker for >15 years = 3; current reformed smoker for ≤15 years = 4; current reformed smoker, duration not specified = 5; smoking history not documented = 7.