RNA-Seq profiles from the GTEx project

Lin Jiang
Chao Xue
Sheng Dai
Shangzhen Chen
Peikai Chen
Pak Chung Sham
Haijun Wang
Miaoxin Li

The normalized expression datasets at the gene level and the transcript level were downloaded from the GTEx project (V7) [73]: GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_tpm.gct.gz and GTEx_Analysis_2016-01-15_v7_RSEMv1.2.22_transcript_tpm.txt.gz. Sample sizes differed between tissues, ranging from 5 to 564 (Additional file 1: Table S2). Initially, there were 196,520 transcripts and 56,205 genes in 53 tissues. Expression values were measured in transcripts per million (TPM). Because TPM is effective for cross-tissue comparison [74], we did not convert the expression values to any other unit. A series of quality control procedures was then carried out. First, the mean and standard deviation of the expression values of all genes were calculated in each tissue. Second, in the correlation-based evaluation, three tissues (whole blood, pancreas, and pituitary) showed low Pearson correlation with the other tissues (Fig. 2a and Additional file 1: Figure S4) and were excluded. Third, in the calculation of tissue-selective expression, genes or transcripts with ≤ 0.01 TPM in all tissues were excluded. Genes whose Ensembl IDs had no corresponding official HGNC gene symbol were excluded as well. Finally, 131,292 transcripts and 31,659 genes in 50 tissues were retained for subsequent analysis.
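The quality control steps described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the nested-dictionary data layout, and the toy identifiers (ENSG_A, GENE_A, and so on) are all hypothetical. The text does not state whether correlations were computed on raw or transformed TPM, what cutoff defined "low" correlation, or whether the ≤ 0.01 TPM filter was applied before or after tissue exclusion. The sketch assumes raw per-tissue means, leaves the cutoff to the user, and applies the expression filter to the retained tissues only.

```python
from math import sqrt
from statistics import mean


def tissue_means(tpm, sample_tissue):
    """Average each gene's TPM over the samples of each tissue.

    tpm: {gene_id: {sample_id: TPM}}; sample_tissue: {sample_id: tissue}.
    """
    tissues = sorted(set(sample_tissue.values()))
    return {
        gene: {
            t: mean(v for s, v in values.items() if sample_tissue[s] == t)
            for t in tissues
        }
        for gene, values in tpm.items()
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mean_correlation_per_tissue(profile):
    """Mean Pearson correlation of each tissue with every other tissue."""
    genes = sorted(profile)
    tissues = sorted(next(iter(profile.values())))
    vec = {t: [profile[g][t] for g in genes] for t in tissues}
    return {
        t: mean(pearson(vec[t], vec[u]) for u in tissues if u != t)
        for t in tissues
    }


def filter_genes(profile, symbols, excluded_tissues=(), min_tpm=0.01):
    """Drop excluded tissues, then drop genes that are <= min_tpm in every
    remaining tissue or that have no gene symbol in `symbols`."""
    kept = {}
    for gene, by_tissue in profile.items():
        values = {t: v for t, v in by_tissue.items() if t not in excluded_tissues}
        if all(v <= min_tpm for v in values.values()):
            continue  # not expressed in any retained tissue
        if gene not in symbols:
            continue  # no official symbol for this Ensembl ID
        kept[gene] = values
    return kept


# Toy data (hypothetical): three genes, four samples, two tissues.
tpm = {
    "ENSG_A": {"s1": 10.0, "s2": 14.0, "s3": 3.0, "s4": 5.0},
    "ENSG_B": {"s1": 0.0, "s2": 0.01, "s3": 0.005, "s4": 0.0},
    "ENSG_C": {"s1": 2.0, "s2": 2.0, "s3": 8.0, "s4": 6.0},
}
sample_tissue = {"s1": "liver", "s2": "liver", "s3": "lung", "s4": "lung"}
symbols = {"ENSG_A": "GENE_A", "ENSG_B": "GENE_B"}  # ENSG_C has no symbol

profile = tissue_means(tpm, sample_tissue)
correlations = mean_correlation_per_tissue(profile)
kept = filter_genes(profile, symbols)
print(correlations)
print(kept)
```

In this toy run, ENSG_B is removed because it never exceeds 0.01 TPM and ENSG_C is removed because it lacks a symbol. With real data, tissues whose mean correlation stands out as low would be passed to filter_genes through excluded_tissues.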
