To identify testis-restricted transcripts, we downloaded preanalyzed global gene expression data from the GTEx Consortium (16). Normalized expression levels [transcripts per million reads (TPM)] of 18,059 transcripts in all nonreproductive tissues (42) and testis were analyzed. To stringently exclude transcripts with expression in nonreproductive tissues, we used a 0.5-TPM expression value as a cutoff. The number of nonreproductive tissues with >0.5 TPM expression for each transcript was plotted against the log2 TPM expression in testis. Transcripts with robust expression in testis (>10 TPM) and <0.5 TPM in all other tissues were considered testis-restricted. One hundred eighty-five transcripts were identified by these filtering criteria. Note that the true number of testis-specific genes is likely higher, because the GTEx data analysis discards multimapping reads, which could prevent the discovery of genes from multicopy gene families. To identify which of the 185 transcripts are eutherian specific, we used OrthoDB (17) to manually curate those without orthologs outside of eutherian mammals.
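The expression filter described above can be illustrated with a minimal sketch. It is not the code used in the study. The transcript names, tissue labels, and TPM values are hypothetical, and the sketch assumes each record contains only testis and the nonreproductive tissues. It also assumes that no pseudocount is added before the log2 transform, which the text does not specify.

```python
import math

TESTIS = "Testis"        # assumed tissue label; not taken from the source
OFF_CUTOFF = 0.5         # TPM cutoff for nonreproductive tissues (from the text)
TESTIS_CUTOFF = 10.0     # TPM cutoff for robust testis expression (from the text)


def summarize(tpm_by_tissue):
    """Return the two plotted quantities for one transcript:
    (number of nonreproductive tissues with >0.5 TPM, log2 testis TPM)."""
    n_expressed = sum(
        1
        for tissue, tpm in tpm_by_tissue.items()
        if tissue != TESTIS and tpm > OFF_CUTOFF
    )
    testis_tpm = tpm_by_tissue[TESTIS]
    # Assumption: no pseudocount, so a testis TPM of zero maps to -inf.
    log2_testis = math.log2(testis_tpm) if testis_tpm > 0 else float("-inf")
    return n_expressed, log2_testis


def is_testis_restricted(tpm_by_tissue):
    """Apply both criteria: >10 TPM in testis and <0.5 TPM in every other tissue."""
    others_silent = all(
        tpm < OFF_CUTOFF
        for tissue, tpm in tpm_by_tissue.items()
        if tissue != TESTIS
    )
    return tpm_by_tissue[TESTIS] > TESTIS_CUTOFF and others_silent


# Hypothetical toy data: transcript -> {tissue: TPM}
expr = {
    "GENE_A": {"Testis": 85.0, "Liver": 0.1, "Lung": 0.0},
    "GENE_B": {"Testis": 120.0, "Liver": 3.2, "Lung": 0.2},
    "GENE_C": {"Testis": 4.0, "Liver": 0.0, "Lung": 0.0},
}

restricted = [name for name, tpm in expr.items() if is_testis_restricted(tpm)]
print(restricted)
```

In this toy input only GENE_A passes both criteria: GENE_B is expressed in a nonreproductive tissue, and GENE_C falls below the testis threshold.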

To determine which of the 185 testis-restricted genes are aberrantly expressed in cancer (CTAs), we queried TCGA using cBioPortal (43) for expression in breast invasive carcinoma (n = 1105), cervical squamous cell carcinoma and endocervical adenocarcinoma (n = 306), colon adenocarcinoma (n = 382), esophageal carcinoma (n = 185), head and neck squamous cell carcinoma (n = 522), liver hepatocellular carcinoma (n = 374), lung adenocarcinoma (n = 523), lung squamous cell carcinoma (n = 501), ovarian serous cystadenocarcinoma (n = 307), sarcoma (n = 263), skin cutaneous melanoma (n = 472), stomach adenocarcinoma (n = 415), and uterine corpus endometrial carcinoma (n = 177) datasets. The average and maximal RNA-Seq by Expectation Maximization (RSEM) expression values were calculated across all 13 tumor types (n = 5532).
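The summary step above can be sketched as follows. This is an illustration only: it assumes that "across all 13 tumor types" means pooling every tumor sample before taking the mean and maximum, which the text does not state explicitly. The function name and the toy RSEM values are hypothetical. The cohort sizes are those listed above.

```python
from statistics import mean

# Cohort sizes as listed in the text.
COHORT_SIZES = {
    "breast invasive carcinoma": 1105,
    "cervical squamous cell carcinoma and endocervical adenocarcinoma": 306,
    "colon adenocarcinoma": 382,
    "esophageal carcinoma": 185,
    "head and neck squamous cell carcinoma": 522,
    "liver hepatocellular carcinoma": 374,
    "lung adenocarcinoma": 523,
    "lung squamous cell carcinoma": 501,
    "ovarian serous cystadenocarcinoma": 307,
    "sarcoma": 263,
    "skin cutaneous melanoma": 472,
    "stomach adenocarcinoma": 415,
    "uterine corpus endometrial carcinoma": 177,
}

# The per-cohort counts add up to the pooled sample size given in the text.
total_samples = sum(COHORT_SIZES.values())


def pooled_summary(rsem_by_tumor_type):
    """Return (mean, max) RSEM for one gene over all samples pooled
    across tumor types (hypothetical helper; pooling is an assumption)."""
    pooled = [
        value
        for values in rsem_by_tumor_type.values()
        for value in values
    ]
    return mean(pooled), max(pooled)


# Hypothetical RSEM values for one gene in two cohorts.
toy = {
    "breast invasive carcinoma": [0.0, 2.0, 10.0],
    "skin cutaneous melanoma": [4.0],
}
avg_rsem, max_rsem = pooled_summary(toy)
print(total_samples, avg_rsem, max_rsem)
```

Pooling weights each cohort by its sample count. Averaging per-cohort means instead would weight the 13 tumor types equally and could give a different value.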

