De novo assembly and functional annotation

Yin Zhang
Yang Li
Xiao Liang
Xiaojuan Cao
Longfei Huang
Jie Yan
Yanxing Wei
Jian Gao

High-quality sequences are indispensable for de novo assembly. Before assembly, raw sequencing reads were cleaned by removing adapter sequences and ambiguous nucleotides. All clean reads from the libraries of the three groups were then assembled into transcripts with the Trinity software. Trinity is a modular method that combines three components: Inchworm, Chrysalis and Butterfly. First, Inchworm assembles reads into a collection of linear contigs using a greedy k-mer-based approach. Contigs longer than 200 bases were used for subsequent analysis. Chrysalis then clusters related contigs and builds a de Bruijn graph for each cluster. Finally, Butterfly analyzes the paths in each de Bruijn graph based on the reads and read pairings, and outputs full-length transcripts for alternatively spliced isoforms. After assembly, the TGICL clustering software (J. Craig Venter Institute, Rockville, MD, USA) was used to cluster the transcripts and remove redundant ones; the remaining sequences were defined as unigenes.

The unigenes were searched with Blastx (E-value < 10⁻⁵) against the non-redundant protein (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO) and Clusters of Orthologous Groups (COG) databases. GO annotation of the unigenes was produced with Blast2GO, based on the results of the NCBI Nr database annotation. Blastn was used to align the unigenes to the Nr database and to identify the proteins with the highest sequence similarity to each unigene, together with their functional annotations.
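The two numeric cutoffs in this protocol (contigs longer than 200 bases, and hits with an E-value below 10⁻⁵) can be illustrated with a short sketch. It is not the authors' pipeline: the function names `parse_fasta`, `filter_contigs` and `best_hits` are hypothetical, the hit table is assumed to be in the default 12-column BLAST tabular layout (`-outfmt 6`, E-value in the 11th column), and taking the lowest E-value per unigene as the "best" hit is an assumption standing in for "highest sequence similarity".

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_CONTIG_LENGTH = 200  # contigs must be longer than this (cutoff from the text)
MAX_EVALUE = 1e-5        # hits must have an E-value below this (cutoff from the text)


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-delimited token as the identifier.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_CONTIG_LENGTH
) -> Dict[str, str]:
    """Keep only sequences strictly longer than min_length bases."""
    return {name: seq for name, seq in records if len(seq) > min_length}


def best_hits(
    blast_lines: Iterable[str], max_evalue: float = MAX_EVALUE
) -> Dict[str, Tuple[str, float]]:
    """Return the lowest-E-value hit per query from tab-separated BLAST output.

    Assumes the default tabular column order, where column 1 is the query,
    column 2 the subject, and column 11 the E-value.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= max_evalue:
            continue  # hit does not pass the E-value cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    # Toy data for illustration only; identifiers are made up.
    fasta = [">contig1", "ACGT" * 10, ">contig2", "ACGT" * 60]
    print(sorted(filter_contigs(parse_fasta(fasta))))

    rows = [
        "unigene1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-20\t200",
        "unigene1\tprotB\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-30\t250",
        "unigene2\tprotC\t40.0\t50\t30\t0\t1\t150\t1\t50\t1e-3\t40",
    ]
    print(best_hits(rows))
```

In practice the assembler and BLAST apply such cutoffs themselves through their own options; the sketch only makes explicit what the two thresholds mean when applied to a FASTA file and a hit table.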
