TCGA and GTEx data acquisition, normalization and quality control

Saiful Effendi Syafruddin
Wan Fahmi Wan Mohamad Nazarie
Nurshahirah Ashikin Moidu
Bee Hong Soon
M. Aiman Mohtar

The analysis combined TCGA-GBM and GTEx normal brain RNA-Seq read count data. Raw gene-level RNA-Seq read counts for GBM were downloaded from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov). GTEx data were used for the normal brain tissues: raw gene-level RNA-Seq read counts for the cortex, frontal cortex and anterior cingulate cortex were obtained from the GTEx Portal (https://gtexportal.org/home/datasets) on 29/03/19. This allowed differential gene expression analysis to be performed on 166 GBM tumour samples from TCGA and 408 normal brain tissue samples from GTEx. Pre-processing of the raw read counts involved data filtering and data normalization. Both data sets were then normalized at the gene level, using the mean, to log2-counts per million (log2-CPM); this adjusts the raw data for factors that would prevent direct comparison of expression measures and ensures that the expression distributions are similar for each sample across the whole experiment. Data that were unlikely to be informative, or that were simply erroneous, were removed using a variance filter (less than 15) and a low-abundance filter (less than 4).
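
The pre-processing described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and several details are assumptions because the protocol does not specify them: the two count tables are merged on shared gene identifiers, the low-abundance cutoff (4) is applied to the mean raw count per gene, the variance cutoff (15) is read as a percentile rank of per-gene variance in log2-CPM, and a pseudocount of 1 is added before taking logarithms. All function names and the toy counts are hypothetical.

```python
import math
from statistics import mean, pvariance


def combine_counts(tumour, normal):
    """Merge two gene -> [counts per sample] tables on their shared genes."""
    return {g: tumour[g] + normal[g] for g in tumour if g in normal}


def log2_cpm(counts, pseudocount=1.0):
    """Convert raw counts to log2 counts per million.

    The pseudocount (an assumption, not from the protocol) avoids log2(0).
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total reads per sample (column sum).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        g: [math.log2(row[j] / lib_sizes[j] * 1e6 + pseudocount)
            for j in range(n_samples)]
        for g, row in counts.items()
    }


def filter_genes(counts, logcpm, min_mean_count=4, variance_percentile=15):
    """Return the genes that pass both filters.

    Assumed interpretation of the thresholds:
    - low abundance: drop genes whose mean raw count is below min_mean_count
    - variance: drop the lowest variance_percentile % of the remaining genes,
      ranked by variance of their log2-CPM values
    """
    kept = [g for g in counts if mean(counts[g]) >= min_mean_count]
    ranked = sorted(kept, key=lambda g: pvariance(logcpm[g]))
    n_drop = int(len(ranked) * variance_percentile / 100)
    dropped = set(ranked[:n_drop])
    return [g for g in kept if g not in dropped]


# Toy data (made up for illustration; not real TCGA or GTEx values).
tcga_toy = {"GENE_A": [10, 20], "GENE_B": [0, 1], "GENE_C": [90, 79]}
gtex_toy = {"GENE_A": [30], "GENE_C": [70], "GENE_D": [5]}

combined = combine_counts(tcga_toy, gtex_toy)
normalized = log2_cpm(combined)
passing = filter_genes(combined, normalized)
```

In practice the filters and normalization would be run on the full 166 tumour and 408 normal samples, and the order of filtering relative to normalization should follow whatever tool was actually used; the order here is only one plausible reading of the text.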
