RNA-seq data for 521 samples and miRNA-seq data for 465 samples of colon adenocarcinoma were retrieved from the TCGA data portal (https://portal.gdc.cancer.gov/). R and the GDCRNATools package were used to read the RNA-seq sample sheet and to remove duplicated samples and samples that were neither primary tumors nor normal tissues. In total, 469 primary CRC tumors and 41 normal tissues were retained. The RNA-seq data covered more than 60,000 genes, including noncoding genes, annotated with Ensembl gene IDs. For miRNAs, an expression matrix of 451 primary tumors and 8 normal tissues was built containing the expression levels of all miRNAs. The miRNA-seq data included more than 2,500 miRNAs with annotated miRNA IDs. The corresponding clinical information was also downloaded. The sample sheets provided the case ID, sample ID, sample type, and clinical information such as race, age, gender, pathologic stage, vital status, and days to death or days to last follow-up. This study complied with the publication guidelines provided by TCGA (https://cancergenome.nih.gov/publications/publicationguidelines). All packages and databases used in the following analyses are well-established, openly available resources and required no further ethical approval.
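The sample-sheet filtering described above can be illustrated with a minimal sketch. The study performed this step in R with GDCRNATools; the Python below only mirrors the logic on toy data and is not the authors' code. The field names (`sample_id`, `case_id`, `sample_type`), the example rows, and the choice to keep the first occurrence of a repeated sample are all assumptions made for illustration.

```python
from collections import Counter

# Toy rows standing in for a sample sheet.
# Field names and IDs are hypothetical, not real TCGA records.
SAMPLE_SHEET = [
    {"sample_id": "S1-01A", "case_id": "C1", "sample_type": "Primary Tumor"},
    {"sample_id": "S1-01A", "case_id": "C1", "sample_type": "Primary Tumor"},  # repeated sample
    {"sample_id": "S2-01A", "case_id": "C2", "sample_type": "Primary Tumor"},
    {"sample_id": "S2-11A", "case_id": "C2", "sample_type": "Solid Tissue Normal"},
    {"sample_id": "S3-02A", "case_id": "C3", "sample_type": "Recurrent Tumor"},
    {"sample_id": "S4-06A", "case_id": "C4", "sample_type": "Metastatic"},
]

# Sample types retained for the analysis: primary tumors and normal tissues.
KEEP_TYPES = {"Primary Tumor", "Solid Tissue Normal"}


def filter_samples(rows, keep_types=KEEP_TYPES):
    """Drop unwanted sample types and repeated sample IDs.

    Assumption of this sketch: when a sample ID appears more than once,
    the first occurrence is kept.
    """
    seen = set()
    kept = []
    for row in rows:
        if row["sample_type"] not in keep_types:
            continue  # e.g. recurrent or metastatic tumors
        if row["sample_id"] in seen:
            continue  # repeated sample
        seen.add(row["sample_id"])
        kept.append(row)
    return kept


filtered = filter_samples(SAMPLE_SHEET)
counts = Counter(row["sample_type"] for row in filtered)
print(dict(counts))
```

Applied to the real RNA-seq sample sheet, a filter of this kind would correspond to reducing the 521 downloaded samples to the 469 primary tumors and 41 normal tissues reported above.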