Genome Assembly, Polishing, and Assessment

Xiao Xiong
Yogeshwar D Kelkar
Chris J Geden
Chao Zhang
Yidong Wang
Evelien Jongepier
Ellen O. Martinson
Eveline C Verhulst
Jürgen Gadau
John H Werren
Xu Wang

Raw sequencing reads from the Aub sample, from both the PacBio library and the 10× Genomics library, were checked for sequencing quality with FastQC (Andrews et al., 2010) before genome assembly. De novo assembly of the M. raptorellus Aub sample was performed with the Supernova v2.1.1 assembler (Weisenfeld et al., 2017) using 400 million reads subsampled from the total reads generated from the 10× Genomics library. Filtered PacBio HiFi reads were assembled with hifiasm v0.13 (Cheng et al., 2021) and HiCanu v2.1.1 (Nurk et al., 2020), both assemblers designed for long-read data. The Kop CLS PacBio data were assembled with Canu v2.1 (Koren et al., 2017).
The Kop Canu assembly was polished with Pilon v1.22 (parameter setting: fix = all) (Walker et al., 2014) to correct small errors using high-quality 150 bp paired-end Illumina short reads (Table 1). A final round of polishing with Arrow (VariantCaller version 2.1.0) was performed to correct large structural errors, using the raw PacBio reads aligned with Minimap2 (Li, 2018).
The mitochondrial genomes of the Aub and Kop cultures are identical in sequence (100% sequence identity) apart from a single 11 bp indel. The Aub 10× Genomics reads were aligned to the repeat-masked Kop assembly using the ALIGN pipeline of the Longranger v2.1.6 software suite (Zheng et al., 2016). A total of 58,350 SNPs were called with UnifiedGenotyper in the Genome Analysis Toolkit (GATK) (McKenna et al., 2010; DePristo et al., 2011). SNPs in repetitive regions and variants outside the coverage depth threshold (120–500 bp) were filtered out using BEDTools v2.30.0 (Quinlan, 2014). In total, 11,523 homozygous SNPs between Aub and Kop were identified, and the percentage of fixed differences in the nuclear genome was estimated to be 0.0038%.
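As a rough illustration of the SNP filtering step, the following Python sketch keeps SNPs whose read depth falls inside the stated 120 to 500 window and that lie outside repeat intervals, then counts the homozygous ones. The `Snp` record, the function names, and the in-memory interval layout are hypothetical and chosen only for illustration; the study itself used GATK and BEDTools. The denominator used to express fixed differences as a percentage is not stated in the text, so it is left as a caller-supplied assumption.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Tuple


class Snp(NamedTuple):
    """Hypothetical minimal SNP record (not a GATK or BEDTools type)."""
    chrom: str
    pos: int           # 0-based position on the contig
    depth: int         # read depth at the site
    homozygous: bool   # True if the call is homozygous non-reference


# Repeat annotation: contig name -> sorted, non-overlapping half-open intervals.
Repeats = Dict[str, List[Tuple[int, int]]]


def in_repeat(snp: Snp, repeats: Repeats) -> bool:
    """Return True if the SNP position falls inside any repeat interval."""
    intervals = repeats.get(snp.chrom, [])
    # Index of the last interval whose start is <= snp.pos (or -1 if none).
    i = bisect_right(intervals, (snp.pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= snp.pos < intervals[i][1]


def filter_snps(snps: List[Snp], repeats: Repeats,
                min_depth: int = 120, max_depth: int = 500) -> List[Snp]:
    """Keep SNPs within the depth window and outside repeats."""
    return [
        s for s in snps
        if min_depth <= s.depth <= max_depth and not in_repeat(s, repeats)
    ]


def percent_fixed(n_fixed: int, n_bases: int) -> float:
    """Fixed differences as a percentage of n_bases.

    n_bases is an assumption supplied by the caller; the source text does
    not state which denominator was used.
    """
    return 100.0 * n_fixed / n_bases
```

With real data, the retained records would be passed through `filter_snps` and the homozygous subset counted before calling `percent_fixed` with whatever number of compared bases is appropriate for the analysis.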
To achieve the best assembly, the draft assemblies produced by the different assemblers for both the Aub and Kop samples were merged into a single draft assembly using the assembly merging tool quickmerge v0.3.0 (Chakraborty et al., 2016). Potential bacterial contamination was screened using a pipeline described in our previous research (Wang et al., 2020), and no contaminating bacterial contigs were found. The merged draft assembly was then polished for indel correction with Pilon v1.23.0 (Walker et al., 2014) using the 10× Genomics Illumina short reads, yielding the final high-quality assembly. The final genome assembly was evaluated based on contig N50 and RNA-seq read mapping percentages, and genome completeness was assessed with BUSCO version 4.0.6 (Seppey et al., 2019). BUSCO scores were calculated using the arthropoda_odb10 dataset, which contains a total of 1,013 orthologs.
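For readers unfamiliar with the contig N50 metric used in the evaluation, the sketch below shows the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` is hypothetical, and this is a minimal illustration rather than the tool the authors used to produce their statistics.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the total assembly length; the
    length of the contig that crosses that point is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for positive lengths; kept for safety
```

For example, contigs of lengths 2, 3, 4, 5, and 6 total 20; the two longest (6 and 5) sum to 11, which passes the halfway mark of 10, so the N50 is 5.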

Summary statistics of the Muscidifurax raptorellus genome assemblies.
