To build synthetic data matrices covering stomach adenomas and adenocarcinomas and normal stomach tissue, we downloaded the RNA transcriptome datasets (HTSeq-Counts and HTSeq-FPKM) and the relevant clinical information from the Genotype-Tissue Expression Project (GTEx) (https://www.gtexportal.org/) and The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/). We then converted the FPKM values of the synthetic matrix to TPM values using the data.table, tibble, dplyr, and tidyr R packages. This yielded two synthetic data matrices: the Counts matrix, used only to identify differentially expressed lncRNAs, and the TPM matrix, used for all other analyses. To reduce statistical bias, patients with stomach adenomas and adenocarcinomas who had missing overall survival (OS) values or short OS values (<30 days) were excluded. With the relevant clinical information, we retrieved 306 patients and randomly divided them into a train risk group and a test risk group at a 1:1 ratio using Strawberry Perl and the caret R package.
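The three data-handling steps described above (FPKM-to-TPM conversion, exclusion by OS, and the 1:1 random split) can be illustrated with a minimal sketch. It is written in Python with the standard library only, not with the R packages and Strawberry Perl used in the study, and it is not the authors' code. The function names, the dictionary layout, and the toy values are assumptions made for illustration. The conversion uses the standard per-sample relation TPM_i = FPKM_i / sum_j(FPKM_j) x 10^6. The split is a plain seeded shuffle, which is simpler than the partitioning the caret package offers.

```python
import random


def fpkm_to_tpm(fpkm_by_sample):
    """Convert FPKM to TPM within each sample.

    fpkm_by_sample: {sample_id: {gene_id: fpkm}} (hypothetical layout).
    TPM_i = FPKM_i / sum(FPKM over all genes in the sample) * 1e6.
    """
    tpm = {}
    for sample, genes in fpkm_by_sample.items():
        total = sum(genes.values())
        tpm[sample] = {
            gene: (value / total * 1e6 if total > 0 else 0.0)
            for gene, value in genes.items()
        }
    return tpm


def filter_by_survival(os_days_by_patient, min_days=30):
    """Keep patients whose OS is known and at least min_days long."""
    return {
        patient: days
        for patient, days in os_days_by_patient.items()
        if days is not None and days >= min_days
    }


def split_half(patient_ids, seed=0):
    """Randomly split patients 1:1 (plain shuffle, an assumption here)."""
    ids = sorted(patient_ids)          # sort first so the seed is reproducible
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Toy inputs (invented values, for illustration only).
fpkm = {"S1": {"lncA": 10.0, "lncB": 30.0, "geneC": 60.0}}
tpm = fpkm_to_tpm(fpkm)

clinical = {"P1": 400, "P2": None, "P3": 12, "P4": 30}
kept = filter_by_survival(clinical)    # drops P2 (missing) and P3 (<30 days)

train_ids, test_ids = split_half([f"P{i}" for i in range(306)], seed=42)
```

Because TPM is normalised within each sample, the values of every sample sum to one million, which gives a quick sanity check on the converted matrix. With 306 patients, the split yields two groups of 153.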