TCGA Data Download and Preprocessing

Rui Huang
Jinying Liu
Hui Li
Lierui Zheng
Haojun Jin
Yaqing Zhang
Wei Ma
Junhong Su
Min Wang
Kun Yang

Gene expression quantification data and the corresponding clinical information for hepatocellular carcinoma (HCC) were downloaded from The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection. The 424 HTSeq-counts files comprised 371 tumor samples and 50 normal samples. The extracted clinical information included follow-up time and clinical status. The TCGA expression matrix was built by merging the raw TCGA count files and converting the gene IDs. RPKM (Reads Per Kilobase per Million mapped reads) values were then calculated for use in weighted gene co-expression network analysis (WGCNA).
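The RPKM step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `rpkm`, the dictionary layout, and the gene lengths are hypothetical, and it assumes that the per-sample library size is the sum of the counts in the matrix (a pipeline may instead use the total number of mapped reads reported by the aligner).

```python
def rpkm(counts, gene_lengths_bp):
    """Convert raw read counts to RPKM.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    gene_lengths_bp: dict mapping gene ID -> gene length in base pairs.
    Assumption: library size = sum of all counts in that sample's column.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Total counted reads per sample (used as "mapped reads" here).
    totals = [sum(counts[g][j] for g in genes) for j in range(n_samples)]

    result = {}
    for g in genes:
        length = gene_lengths_bp[g]
        row = []
        for j in range(n_samples):
            if totals[j] == 0:
                row.append(0.0)  # avoid division by zero for empty samples
            else:
                # RPKM = count * 1e9 / (library size * gene length in bp)
                row.append(counts[g][j] * 1e9 / (totals[j] * length))
        result[g] = row
    return result


# Hypothetical toy input: two genes, two samples.
toy_counts = {"GENE_A": [500, 0], "GENE_B": [1500, 1000]}
toy_lengths = {"GENE_A": 2000, "GENE_B": 1000}
toy_rpkm = rpkm(toy_counts, toy_lengths)
```

Dividing by gene length and library size in one expression keeps the arithmetic easy to check by hand against the RPKM definition.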

We used the “limma” package (Ritchie et al., 2015) in R to normalize and log2-transform the expression matrix of each GEO and TCGA dataset. Differentially expressed genes (DEGs) were then identified for each GEO and TCGA matrix from the transformed expression values, and genes were sorted by their log2 fold change (logFC). Next, rank analysis was performed using the R package “RRA.” DEGs were screened using the criteria P < 0.05 and |logFC| > 1.
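The sorting and screening criteria can be sketched in Python as follows. This only illustrates the thresholds stated above and does not reproduce limma or the rank analysis. The function name `screen_degs`, the record layout, and the example values are hypothetical. The text does not say whether P is a raw or an adjusted P value, so the sketch simply filters on whatever value is supplied.

```python
def screen_degs(results, p_cutoff=0.05, logfc_cutoff=1.0):
    """Keep genes with P < p_cutoff and |logFC| > logfc_cutoff.

    results: list of dicts with keys "gene", "logFC", and "p"
             (hypothetical layout for per-gene test results).
    Returns the retained records sorted by logFC, largest first.
    """
    kept = [
        r for r in results
        if r["p"] < p_cutoff and abs(r["logFC"]) > logfc_cutoff
    ]
    return sorted(kept, key=lambda r: r["logFC"], reverse=True)


# Hypothetical per-gene results for one dataset.
toy_results = [
    {"gene": "GENE_A", "logFC": 2.4, "p": 0.001},   # up-regulated, kept
    {"gene": "GENE_B", "logFC": -1.8, "p": 0.020},  # down-regulated, kept
    {"gene": "GENE_C", "logFC": 0.6, "p": 0.004},   # fold change too small
    {"gene": "GENE_D", "logFC": 3.1, "p": 0.200},   # not significant
]
toy_degs = screen_degs(toy_results)
```

Both inequalities are strict, matching the criteria as written, so a gene with |logFC| exactly equal to 1 is excluded.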
