2.7. Genome Sequence Reassembly Using Lep-Anchor

John H. Boyle
Pasi M. A. Rastas
Xin Huang
Austin G. Garner
Indra Vythilingam
Peter A. Armbruster

Genome anchoring, haplotype reduction, and scaffold reassembly of the AalbF2 genome assembly (Palatini et al., 2020) were performed with Lep-Anchor [16]. The initial input consisted of the 9 linkage maps described above, with marker intervals from Lep-MAP3, comprising 111,328 SNP markers. We additionally used a contig-contig alignment chain file calculated with HaploMerger2 [28] and minimap2 [29] mappings of Ae. albopictus PacBio reads (SRR8839546-57, 60–70). We first split the scaffold-level assembly of Palatini et al. (2020) into contigs using a custom script. To make use of the Hi-C scaffolding links in the Palatini et al. (2020) genome, we constructed an artificial paf file containing alignments between the contigs that had originally been joined into scaffolds. We followed the pipeline described in the Lep-Anchor wiki [30] (scripts are provided in our Dryad repository). First, we removed contigs that were full haplotypes using findFullHaplotypes.awk (parameter minScore = 20), based on the alignment chain only. We then lifted the markers on these haplotype contigs over to non-haplotype contigs using the LiftoverHaplotypes module. Next, CleanMap was run to assign contigs to chromosomes. CleanMap split 11 contigs between two or more chromosomes. We determined the split sites in these 11 contigs by manually inspecting the contig-contig alignments and by using HaploMerger2 (hm.batchA3.misjoin_processing). Five contigs could be split exactly; for 5 of the remaining 6 contigs, the location of the split site could be approximated from the alignments.
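The scaffold-splitting step can be illustrated with a minimal Python sketch. This is not the authors' custom script (which is in their Dryad repository); it assumes that scaffold gaps are represented as runs of N, and the function names and the `scaffold_index` contig naming scheme are hypothetical. It splits a scaffold into contigs and records which contigs were neighbours in the original scaffold, which is the information an artificial alignment file would need to carry.

```python
import re

# Hypothetical helper names; contigs are named "<scaffold>_<index>" for illustration only.

def split_scaffold(name, seq):
    """Split a scaffold into contigs at runs of N (assumed gap character).

    Returns a list of (contig_name, start, end, contig_seq) tuples, where
    start and end are 0-based, end-exclusive coordinates on the scaffold.
    """
    contigs = []
    # Each match is a maximal run of non-gap bases.
    for i, match in enumerate(re.finditer(r"[^Nn]+", seq), start=1):
        contigs.append((f"{name}_{i}", match.start(), match.end(), match.group()))
    return contigs


def adjacent_links(contigs):
    """List pairs of contigs that were neighbours in the original scaffold.

    Returns (left_contig, right_contig, gap_length) tuples, where gap_length
    is the number of gap characters that separated the two contigs.
    """
    links = []
    for left, right in zip(contigs, contigs[1:]):
        gap_length = right[1] - left[2]  # start of right minus end of left
        links.append((left[0], right[0], gap_length))
    return links


if __name__ == "__main__":
    pieces = split_scaffold("scaf1", "ACGTNNNNGGCCNNA")
    for contig in pieces:
        print(contig)
    for link in adjacent_links(pieces):
        print(link)
```

Each link produced this way could then be written out as one record of an artificial alignment file; the exact fields Lep-Anchor expects in that file are not described here and should be taken from its wiki.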

We then ran PlaceAndOrientContigs (POC) with a bed file from Map2Bed that included all non-haplotype contigs not assigned to any chromosome (i.e., contigs with no linkage map markers). Following POC, we used the propagate script (iterated 9 times, until no additional contigs could be added) to place contigs that could be assigned to a single chromosome. The remaining contigs were left without a chromosome assignment. We ran POC three times: first with only the contigs that had a clear assignment, and then twice more, removing newly found haplotypes between runs (using removeHaplotypes.awk).
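The "iterate until no additional contigs can be added" logic of the propagation step can be sketched as a fixed-point loop. The Python below is only an illustration of that idea, not Lep-Anchor's propagate script: it assumes that evidence takes the form of simple pairwise links between contigs, and all names (`propagate_once`, `propagate`, `max_rounds`) are hypothetical. An unplaced contig is assigned only when every placed contig it is linked to lies on the same chromosome.

```python
def propagate_once(assigned, links):
    """Find unplaced contigs whose placed neighbours all share one chromosome.

    assigned: dict mapping contig name -> chromosome.
    links: iterable of (contig_a, contig_b) pairs (assumed form of evidence).
    Returns a dict of newly placeable contigs -> chromosome.
    """
    candidates = {}
    for a, b in links:
        for placed, unplaced in ((a, b), (b, a)):
            if placed in assigned and unplaced not in assigned:
                candidates.setdefault(unplaced, set()).add(assigned[placed])
    # Keep only contigs supported by exactly one chromosome.
    return {c: next(iter(chroms)) for c, chroms in candidates.items() if len(chroms) == 1}


def propagate(assigned, links, max_rounds=100):
    """Repeat propagate_once until a round adds no contigs.

    Returns (final_assignment, number_of_rounds_that_added_contigs).
    """
    assigned = dict(assigned)  # do not modify the caller's dict
    links = list(links)        # allow repeated passes over the links
    rounds = 0
    while rounds < max_rounds:
        new = propagate_once(assigned, links)
        if not new:
            break
        assigned.update(new)
        rounds += 1
    return assigned, rounds


if __name__ == "__main__":
    start = {"c1": 1, "c5": 2}
    evidence = [("c1", "c2"), ("c2", "c3"), ("c1", "x"), ("c5", "x"), ("c4", "c9")]
    final, rounds = propagate(start, evidence)
    print(final, rounds)
```

In this toy input, `x` is linked to two different chromosomes and `c4` and `c9` have no placed neighbours, so all three stay unassigned, mirroring the contigs left without a chromosome assignment.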

Custom R scripts were used to produce Marey maps for each chromosome. The pipeline was run twice. After the first run, we identified regions in three of the maps that contradicted the other maps (one in pseudochromosome 2 and two in pseudochromosome 3). All of the contradictory regions were in maps produced using markers for which one intercross F0 parent was heterozygous. These problematic regions appeared to be caused by gaps without markers, combined with the large number of crossovers resulting from the F7 intercross design. By flipping the markers in one region (of linkage map 5) and splitting two other maps into 2 and 6 parts (linkage maps 4 and 8, respectively), we obtained a set of 15 consistent maps, which were used in the second run of the pipeline.
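A Marey map plots each marker's genetic map position against its physical position, so a region where map position falls as physical position rises stands out as a contradiction. The following Python sketch (the authors used custom R scripts, which are not reproduced here) shows one simple way to flag such regions and to flip the markers within a region. The function names are hypothetical, and flipping is assumed here to mean reflecting map positions within the region; the source does not specify how the flip was carried out.

```python
def decreasing_runs(markers):
    """Find physical intervals where map position decreases along the chromosome.

    markers: list of (physical_position_bp, map_position_cM) tuples.
    Returns a list of (start_bp, end_bp) intervals.
    """
    ordered = sorted(markers)
    runs = []
    start = end = None
    for (p0, g0), (p1, g1) in zip(ordered, ordered[1:]):
        if g1 < g0:
            if start is None:
                start = p0
            end = p1
        elif start is not None:
            runs.append((start, end))
            start = None
    if start is not None:
        runs.append((start, end))
    return runs


def flip_region(markers, lo_bp, hi_bp):
    """Reverse the map order of markers with lo_bp <= physical position <= hi_bp.

    Map positions inside the region are reflected about the region's
    midpoint (assumed meaning of "flipping"); other markers are unchanged.
    """
    inside = [g for p, g in markers if lo_bp <= p <= hi_bp]
    if not inside:
        return list(markers)
    total = min(inside) + max(inside)
    return [(p, total - g) if lo_bp <= p <= hi_bp else (p, g) for p, g in markers]


if __name__ == "__main__":
    toy = [(100, 0.0), (200, 1.0), (300, 4.0), (400, 3.0), (500, 2.0), (600, 5.0)]
    bad = decreasing_runs(toy)
    print("contradictory regions:", bad)
    fixed = flip_region(toy, *bad[0])
    print("after flipping:", decreasing_runs(fixed))
```

Real linkage maps contain many markers at identical map positions and some genotyping noise, so a practical check would need a tolerance rather than a strict comparison between neighbouring markers.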
