Protein-coding genes in the CHM genome were predicted by combining homology-based, de novo (ab initio), and transcriptome-based prediction methods. Five ab initio gene prediction programs were used: Augustus (http://Augustus.gobics.de/, version 2.5.5), GENSCAN (http://genes.mit.edu/GENSCAN.html, version 1.0), Geneid (http://genome.crg.es/software/geneid/), GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/, version 3.0.2), and SNAP (http://korflab.ucdavis.edu/software.html, version 2013-11-29). Protein sequences of six homologous species (Vitis vinifera, Arabidopsis thaliana, Oryza sativa, Nelumbo nucifera, Aquilegia coerulea, and Amborella trichopoda) were downloaded from Ensembl or NCBI. These protein sequences were aligned against the repeat-masked CHM genome using TBLASTN [40] (E-value ≤ 1E−05), and GeneWise (https://www.ebi.ac.uk/Tools/psa/genewise, version 2.2.0) was used to predict gene models from the resulting alignments. The RNA-seq data were mapped to the CHM genome using TopHat (http://ccb.jhu.edu/software/tophat/index.shtml, version 2.0.8) [41], and Cufflinks (http://cufflinks.cbcb.umd.edu/, version 2.1.1) [42] was then used to assemble the transcripts into gene models. Trinity (version 2.0.8) was used to assemble the RNA-seq data de novo. A weighted, non-redundant gene set was generated by EVidenceModeler (EVM) [43], which keeps only the longest model per locus. PASA (version 2.0.2; http://pasapipeline.github.io/) [43] was then used to refine the gene structures. Finally, gene models were filtered by removing genes with 20% or more of their CDS overlapping transposable elements (TEs) and genes with coding regions shorter than 150 bp. The final gene set contained 79,668 protein-coding genes (Supplementary Tables S9, S10 and Supplementary Figs. S5, S6).
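
The final filtering step can be illustrated with a short Python sketch. It is not the authors' pipeline. The data layout (CDS and TE intervals as half-open (start, end) tuples on the same sequence), the function names, and the reading of the thresholds (a gene is dropped when at least 20% of its CDS bases overlap TEs, or when its total CDS length is below 150 bp) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates, all on the same sequence.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(cds: List[Interval], tes: List[Interval]) -> int:
    """Number of CDS bases covered by at least one TE interval."""
    merged_tes = merge_intervals(tes)
    total = 0
    for c_start, c_end in merge_intervals(cds):
        for t_start, t_end in merged_tes:
            total += max(0, min(c_end, t_end) - max(c_start, t_start))
    return total


def keep_gene(
    cds: List[Interval],
    tes: List[Interval],
    max_te_fraction: float = 0.20,  # assumed reading of the "20%" threshold
    min_cds_bp: int = 150,          # coding regions shorter than this are dropped
) -> bool:
    """Return True if the gene model passes both (hypothetical) filters."""
    cds_len = sum(end - start for start, end in merge_intervals(cds))
    if cds_len == 0 or cds_len < min_cds_bp:
        return False
    return overlap_bp(cds, tes) / cds_len < max_te_fraction
```

For example, `keep_gene([(0, 100), (200, 300)], [(50, 250)])` rejects a two-exon model whose 200 bp of CDS has 100 bp inside a TE. In practice the intervals would be parsed from the gene and repeat annotation files, and a tool such as an interval tree would be preferable to the nested loop for genome-scale data.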
