Genome assembly

Pengjie Wang
Jiaxin Yu
Shan Jin
Shuai Chen
Chuan Yue
Wenling Wang
Shuilian Gao
Hongli Cao
Yucheng Zheng
Mengya Gu
Xuejin Chen
Yun Sun
Yuqiong Guo
Jiangfan Yang
Xingtan Zhang
Naixing Ye

We estimated the genome size of the sequenced individuals by flow cytometry (BD FACSCalibur, BD Bioscience, USA) using tomato and maize as internal controls [50]. The CSS HD genome was assembled by combining PacBio circular consensus sequencing (HiFi) data [51] with Hi-C data. First, HiFi reads were assembled with Hifiasm using the default parameters. We then built two levels of chromosome-scale assembly: a monoploid genome and an allele-defined, haplotype-resolved assembly.
For the monoploid assembly, although a Purge_dups step is already included in Hifiasm, we checked the read depth and filtered the primary contigs of the initial Hifiasm assembly with Purge_dups (v1.25) using the default parameters, and evaluated the results by Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness [27] and duplication scores. Hi-C reads were assessed with HiC-Pro (v2.11.4) [52] and uniquely mapped to the contig assemblies. Meanwhile, Juicer tools (v1.11.08) [53] and the 3D-DNA pipeline (v180114) [54] were used to detect and correct misassembled contigs. To assign contigs to the appropriate groups, we aligned the Hi-C-corrected contigs against the “Shuchazao” chromosome-scale assembly using RaGOO (v1.1). Finally, the ALLHiC optimize algorithm (v0.9.13) was used to adjust the order and orientation of the contigs within each group.
We merged the haplotig sequences and the alternative assembly and reran Purge_dups (v1.25) with the cutoff parameter “2 6 10 12 20 70”. The purged primary contigs and alternative contigs were merged and regarded as a draft contig assembly for the monoploid genome. The resulting contigs were subjected to haplotype phasing using the ALLHiC algorithm with default parameters, with the monoploid genome sequences used as the reference for identifying allelic contigs. Chromosome-level haplotype A and haplotype B of CSS HD were fully resolved and released.
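The purging step above is judged by BUSCO completeness and duplication, so a small helper for comparing two BUSCO runs may be useful. The sketch below parses the one-line BUSCO summary string (the `C:…[S:…,D:…],F:…,M:…,n:…` notation) and checks whether duplication fell without a large loss of completeness. The function names, the 0.5-point tolerance, and the example scores are hypothetical illustrations, not values or criteria reported by the authors.

```python
import re
from typing import Dict

# One-line BUSCO summary, e.g. C:97.1%[S:80.0%,D:17.1%],F:1.2%,M:1.7%,n:1614
_BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> Dict[str, float]:
    """Extract C, S, D, F, M (percentages) and n (BUSCO count) from a summary line."""
    match = _BUSCO_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    scores: Dict[str, float] = {
        key: float(value) for key, value in match.groupdict().items()
    }
    scores["n"] = int(scores["n"])
    return scores


def purging_helped(before: Dict[str, float], after: Dict[str, float],
                   max_complete_loss: float = 0.5) -> bool:
    """True if duplication dropped and completeness fell by at most
    max_complete_loss percentage points (an assumed tolerance)."""
    return (after["D"] < before["D"]
            and before["C"] - after["C"] <= max_complete_loss)


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not results from the paper.
    before = parse_busco_summary("C:97.1%[S:80.0%,D:17.1%],F:1.2%,M:1.7%,n:1614")
    after = parse_busco_summary("C:96.9%[S:93.5%,D:3.4%],F:1.3%,M:1.8%,n:1614")
    print("purging helped:", purging_helped(before, after))
```

A falling duplicated fraction (D) with a near-constant complete fraction (C) is the pattern one would expect when retained haplotigs are removed from a primary assembly without discarding real gene content.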
In addition, we calculated heterozygosity with GenomeScope2 [55] using a k-mer size of 33. The chromosomal localization and collinearity of genes were visualized with TBtools [56].
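GenomeScope2 estimates heterozygosity by fitting a model to a k-mer count histogram, that is, the number of distinct k-mers observed at each multiplicity. The source does not say which k-mer counter produced that histogram; in practice a dedicated tool such as Jellyfish or KMC would be used on the full read set. The following is only a minimal in-memory sketch of what the histogram contains, assuming canonical k-mers and k = 33; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads: Iterable[str], k: int = 33) -> Dict[int, int]:
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # Skip k-mers containing ambiguous bases such as N.
            if _VALID_BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    histogram = Counter(counts.values())
    return dict(sorted(histogram.items()))


if __name__ == "__main__":
    # Toy reads with a small k so the result is easy to inspect by hand.
    for multiplicity, n_kmers in kmer_histogram(["ACGTA", "ACGTA"], k=3).items():
        print(multiplicity, n_kmers)
```

In a heterozygous diploid genome, such a histogram typically shows two peaks, one at roughly half the coverage of the other, and the relative size of the lower peak is what informs the heterozygosity estimate.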
