Published: Oct 5, 2023 DOI: 10.21769/BioProtoc.4830
Reviewed by: Yi Zheng
Abstract
With the rapid development of long-read sequencing technologies, an increasing number of complex genomes are being assembled. Genome alignment is an essential step in most genomics research, and aligning complex genomes sensitively and accurately is critical for extracting information from them. In this protocol, we introduce AnchorWave (Anchored Wavefront alignment), which uses conserved sequences as anchors to identify collinear blocks and then performs nucleotide-level global sequence alignment using a 2-piece affine gap cost strategy. We give two examples: aligning maize (Zea mays) to sorghum (Sorghum bicolor), and aligning genomes of different maize cultivars. Compared with sorghum, maize has undergone an additional round of whole-genome duplication. Moreover, the maize genome has abundant transposable elements (TEs), genome rearrangements, and gene losses. AnchorWave provides a significant improvement over previous methods when aligning plant genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication.
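A 2-piece affine gap cost charges a gap of length k the cheaper of two affine functions, min(q1 + k·e1, q2 + k·e2): short gaps pay a small opening penalty with a steeper extension, while long gaps (for example, TE insertions) pay a larger opening penalty with a cheaper extension. The minimal Python sketch below computes a global alignment cost under this penalty with a plain O(nm) Gotoh-style dynamic program. It is not AnchorWave's wavefront implementation, and the mismatch and gap parameters (`mismatch`, `q1`, `e1`, `q2`, `e2`) are illustrative assumptions rather than AnchorWave's defaults.

```python
"""Global alignment cost under a 2-piece affine gap penalty (illustrative sketch)."""

INF = float("inf")


def gap_cost(k: int, q1: int = 4, e1: int = 2, q2: int = 24, e2: int = 1) -> int:
    """Cost of one gap of length k >= 1: the cheaper of two affine functions."""
    return min(q1 + k * e1, q2 + k * e2)


def global_align_cost(a: str, b: str, mismatch: int = 4,
                      q1: int = 4, e1: int = 2, q2: int = 24, e2: int = 1) -> int:
    """Minimum cost of a global alignment of a and b (match = 0; parameters assumed)."""
    n, m = len(a), len(b)

    def grid():
        return [[INF] * (m + 1) for _ in range(n + 1)]

    # H: best cost for a[:i] vs b[:j], ending in any state.
    # E1/E2: ending in a gap in `a` (consuming b), scored with piece 1 or piece 2.
    # F1/F2: ending in a gap in `b` (consuming a), scored with piece 1 or piece 2.
    H, E1, E2, F1, F2 = grid(), grid(), grid(), grid(), grid()
    H[0][0] = 0
    for j in range(1, m + 1):  # leading gap in a
        E1[0][j] = q1 + j * e1
        E2[0][j] = q2 + j * e2
        H[0][j] = min(E1[0][j], E2[0][j])
    for i in range(1, n + 1):  # leading gap in b
        F1[i][0] = q1 + i * e1
        F2[i][0] = q2 + i * e2
        H[i][0] = min(F1[i][0], F2[i][0])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend a gap of the same piece.
            E1[i][j] = min(H[i][j - 1] + q1 + e1, E1[i][j - 1] + e1)
            E2[i][j] = min(H[i][j - 1] + q2 + e2, E2[i][j - 1] + e2)
            F1[i][j] = min(H[i - 1][j] + q1 + e1, F1[i - 1][j] + e1)
            F2[i][j] = min(H[i - 1][j] + q2 + e2, F2[i - 1][j] + e2)
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = min(H[i - 1][j - 1] + sub,
                          E1[i][j], E2[i][j], F1[i][j], F2[i][j])
    return H[n][m]
```

With these assumed parameters the two pieces cross at k = 20, so a 30-bp gap costs 54 instead of the 64 that the first affine function alone would charge. This is why the second piece helps with long InDels.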
Keywords: AnchorWave
Background
Recent technological advances have led to a large number of genomes being sequenced. We propose unlocking the secrets of genomes through genome comparison. Genome alignment is a fundamental step of genomics analysis, providing a set of intermediate results for downstream analyses. However, previous genome alignment tools have difficulty handling long (> 50 bp) insertions and deletions (InDels) and the whole-genome duplications (WGDs) that have occurred between two plant genomes. In addition, compared with mammalian genomes, plant genomes have shorter genes and more complex structural variations and chromosome rearrangements. Software developed by the mammalian genomics community therefore cannot sensitively and accurately align plant genomes.

This protocol illustrates AnchorWave, which handles dispersed repeats, active TEs, high sequence diversity, and WGD well (Song et al., 2022). AnchorWave combines a collinear region identification approach with a 2-piece affine gap cost global alignment strategy: it first identifies collinear regions using conserved anchors and then performs base-pair-resolved global sequence alignment for each anchor and inter-anchor region. AnchorWave provides a significant improvement for base-pair-resolved plant genome alignment.
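The two stages described above can be illustrated with a simplified sketch. Given candidate anchors (matches between reference and query, each with a score), a dynamic program selects the highest-scoring chain of anchors that do not overlap and appear in the same order in both genomes; the intervals between consecutive chained anchors then become the inter-anchor regions to be aligned base by base. The `Anchor` type, its 0-based half-open coordinates, and the additive scoring are hypothetical. AnchorWave's own collinearity algorithm and scoring differ in detail, and it also handles situations such as WGD that this single-chain sketch does not.

```python
"""Collinear anchor chaining and inter-anchor regions (simplified sketch)."""
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """Hypothetical anchor: a reference/query match, 0-based half-open coordinates."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    score: float


def chain_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring chain of anchors increasing in both genomes."""
    a = sorted(anchors, key=lambda x: (x.ref_start, x.qry_start))
    if not a:
        return []
    best = [x.score for x in a]  # best chain score ending at anchor i
    prev = [-1] * len(a)         # predecessor of anchor i in that chain
    for i in range(len(a)):
        for j in range(i):
            # j may precede i only if it ends before i starts in BOTH genomes.
            if a[j].ref_end <= a[i].ref_start and a[j].qry_end <= a[i].qry_start:
                if best[j] + a[i].score > best[i]:
                    best[i] = best[j] + a[i].score
                    prev[i] = j
    i = max(range(len(a)), key=lambda k: best[k])
    chain = []
    while i != -1:  # backtrack from the best end point
        chain.append(a[i])
        i = prev[i]
    return chain[::-1]


Region = Tuple[Tuple[int, int], Tuple[int, int]]  # ((ref_start, ref_end), (qry_start, qry_end))


def inter_anchor_regions(chain: List[Anchor], ref_len: int, qry_len: int) -> List[Region]:
    """Intervals before, between, and after chained anchors (may be empty)."""
    regions = []
    r, q = 0, 0
    for x in chain:
        regions.append(((r, x.ref_start), (q, x.qry_start)))
        r, q = x.ref_end, x.qry_end
    regions.append(((r, ref_len), (q, qry_len)))
    return regions
```

An anchor whose query position is out of order relative to its neighbours (as after a rearrangement) is left out of the chain, and the regions flanking the retained anchors are what would be passed on to base-pair-level global alignment.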
Software and datasets
Software
AnchorWave (Song et al., 2022; v1.0.1; https://github.com/baoxingsong/AnchorWave; MIT License)
Samtools (Li et al., 2009; v1.6; http://www.htslib.org)
minimap2 (Li, 2018; v2.17-r941; https://github.com/lh3/minimap2)
ggplot2 (Wickham, 2016; v3.3.5; https://ggplot2.tidyverse.org)
Input data
Reference genome in FASTA format and annotation in GFF(3) format
Query genome in FASTA format
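AnchorWave derives its anchors from the reference gene annotation: coding sequences (CDS) described in the GFF(3) file are extracted from the reference FASTA and then located in the query genome. AnchorWave performs this extraction itself; the minimal Python sketch below only illustrates what the step involves. It assumes well-formed GFF3 in which each CDS feature has a single transcript ID in its `Parent` attribute, and a genome already loaded into a dictionary. The function names `parse_cds` and `cds_sequence` are hypothetical.

```python
"""Extract spliced CDS sequences from a GFF3 annotation (illustrative sketch)."""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
Segment = Tuple[str, int, int, str]  # (seqid, start, end, strand); GFF3 1-based inclusive


def parse_cds(gff_lines: Iterable[str]) -> Dict[str, List[Segment]]:
    """Group CDS features by their Parent (transcript) ID; assumes one parent per feature."""
    cds: Dict[str, List[Segment]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is not None:
            cds[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return dict(cds)


def cds_sequence(segments: List[Segment], genome: Dict[str, str]) -> str:
    """Concatenate CDS segments in genomic order; reverse-complement on the minus strand."""
    seqid, strand = segments[0][0], segments[0][3]
    ordered = sorted(segments, key=lambda s: s[1])
    seq = "".join(genome[seqid][start - 1:end] for _, start, end, _ in ordered)
    return seq.translate(COMPLEMENT)[::-1] if strand == "-" else seq
```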
Procedure