The lists of lncRNAs and PCGs were downloaded from the GENCODE website. Release v27 was used for human genes annotated on the GRCh38 genome sequence (gencode.v27.long_noncoding_RNAs.gtf.gz; gencode.v27.basic.annotation.gtf.gz). Release M16 was used for mouse genes annotated on the GRCm38 genome sequence (gencode.vM16.long_noncoding_RNAs.gtf.gz; gencode.vM16.basic.annotation.gtf.gz). PCGs were selected from the basic annotation when both the gene and the transcript were annotated as "protein_coding". The total numbers of genes, transcripts and exons considered in each species are reported in Supplementary Table S1.
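The PCG selection rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard GENCODE GTF attribute keys `gene_type` and `transcript_type`, and it assumes that genes, transcripts and exons are counted as unique `gene_id`, `transcript_id` and `exon_id` values. The function names `open_gtf`, `parse_attributes` and `count_protein_coding` are hypothetical.

```python
import gzip
import re
from typing import Dict, Iterable, Iterator

# Matches quoted GTF attributes such as: gene_type "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def open_gtf(path: str) -> Iterator[str]:
    """Yield lines from a gzip-compressed GTF file."""
    with gzip.open(path, "rt") as handle:
        yield from handle


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the ninth GTF column into a dict (last value wins for repeated keys)."""
    return dict(ATTR_RE.findall(field))


def count_protein_coding(lines: Iterable[str]) -> Dict[str, int]:
    """Count genes, transcripts and exons where BOTH gene_type and
    transcript_type are "protein_coding" (assumed reading of the rule)."""
    genes, transcripts, exons = set(), set(), set()
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed rows
        feature = fields[2]
        if feature not in ("transcript", "exon"):
            continue  # gene rows carry no transcript_type
        attrs = parse_attributes(fields[8])
        if attrs.get("gene_type") != "protein_coding":
            continue
        if attrs.get("transcript_type") != "protein_coding":
            continue
        genes.add(attrs.get("gene_id"))
        transcripts.add(attrs.get("transcript_id"))
        if feature == "exon":
            exons.add(attrs.get("exon_id"))
    return {
        "genes": len(genes),
        "transcripts": len(transcripts),
        "exons": len(exons),
    }
```

For example, `count_protein_coding(open_gtf("gencode.v27.basic.annotation.gtf.gz"))` would tally the human basic annotation, assuming the file has been downloaded to the working directory. Whether the resulting counts match Supplementary Table S1 depends on the counting conventions assumed here.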
An independent validation of the results from GENCODE was obtained by collecting human lncRNA annotation data from six different databases: the FANTOM5 database (FANTOM CAT genes; FANTOM_CAT.lv3_robust.only_lncRNA.gtf) (Hon et al., 2017), the NONCODE v5 database (Fang et al., 2018), the BIGTranscriptome database release 2016 lncRNA catalog (You et al., 2017), the LncBook database (Ma et al., 2019), the MiTranscriptome database (Iyer et al., 2015), and the LNCipedia database version 5.2 (Volders et al., 2013). The results obtained from the mouse genome were validated using lncRNA annotations from the NONCODE v5 database.
The lists of lncRNAs and PCGs of Drosophila melanogaster and Caenorhabditis elegans were downloaded from the BioMart data mining tool (Smedley et al., 2015) in the Ensembl genome database (release 91).