These and other analyses were performed on the UC-Riverside High Performance Computing Cluster, the Galaxy Platform [93], or dedicated local workstations. To identify markers for linkage analysis, Illumina reads were trimmed with Sickle (github.com/najoshi/sickle) using default settings. The reads were then aligned to Stitch7 using BWA-mem [28], and variants were called following GATK best practices [94], retaining all SNPs and indels ≤3 nt. Markers were generated to resemble a pseudo-testcross (Aa × AA or AA × Aa) and filtered to include those showing Mendelian segregation (χ² ≤0.05). To reduce miscalls resulting from low sequence coverage, particularly homozygous calls made when heterozygosity is the true state, we imputed correct genotypes when possible. To enable this, the genotypes of each parent were split into two VCF files, one consisting of markers that were heterozygous in parent1 and homozygous in parent2, and the other of markers with the reverse configuration. The VCF files were then subsampled with vcftools [95] to ensure a gap of at least 5 kb between markers, resulting in 23,661 markers across the four parents. Phases were then determined with Beagle 4.0 [96], using a window of 1000 and an overlap of 300. We removed 758, 567, 246, and 271 markers that showed unexpected segregation ratios from parents 1306, 618, 6629, and 550, respectively, leaving a total of 22,010 phased markers. Genotype-corrector [97] was then used to identify miscalls and impute more parsimonious genotypes, using default parameters and an F2 population type and excluding contigs with fewer than 15 markers. This was applied to 1,380,769 genotypes across the crosses, resulting in 15,888 corrections, most of which (85%) changed homozygous calls to heterozygous ones. After incorporating markers modified by Genotype-corrector and adding some additional contigs from the FALCON assembly, 6,993 markers were identified and passed to JoinMap 5.0 (Kyasma, Wageningen, Netherlands).
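The marker-thinning step can be illustrated with a minimal sketch. This is not a reimplementation of vcftools; it is a simple greedy pass that assumes markers are given as (contig, position) pairs and that a marker is kept when it lies at least 5 kb from the previously kept marker on the same contig. The function name `thin_markers`, the `min_gap` parameter, and the example coordinates are hypothetical.

```python
from collections import defaultdict


def thin_markers(markers, min_gap=5000):
    """Greedily thin markers so kept markers on a contig are >= min_gap bp apart.

    `markers` is an iterable of (contig, position) tuples. The input format and
    the greedy left-to-right rule are assumptions made for illustration.
    """
    by_contig = defaultdict(list)
    for contig, pos in markers:
        by_contig[contig].append(pos)

    kept = []
    for contig in sorted(by_contig):
        last_kept = None
        for pos in sorted(by_contig[contig]):
            # Keep the first marker on each contig, then any marker that is
            # far enough from the most recently kept one.
            if last_kept is None or pos - last_kept >= min_gap:
                kept.append((contig, pos))
                last_kept = pos
    return kept


# Hypothetical coordinates, for illustration only.
example = [
    ("contig1", 100),
    ("contig1", 2000),
    ("contig1", 5100),
    ("contig1", 9000),
    ("contig1", 10100),
    ("contig2", 50),
]
thinned = thin_markers(example)
```

With the example coordinates, the markers at 2000 and 9000 on `contig1` are dropped because each lies within 5 kb of the marker kept before it.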
JoinMap generated independent genetic maps for each haplotype of the four parents, resulting in eight genetic maps. The initial maps were inflated in genetic size due to aneuploidy and errors not corrected by Genotype-corrector. Consequently, a first round of corrections addressed miscalls due to aneuploidy, which can force genotypes to appear completely heterozygous or homozygous across a contig regardless of haplotype phase. A second round focused on miscalls at the edges of contigs. A third round removed miscalls inside small contigs that had previously been excluded from Genotype-corrector. A final round eliminated 770 markers placed more than 30 cM from the nearest marker. The eight polished genetic maps, which incorporated 6,166 markers, were then assembled into chromosomes using ALLMAPS [98], adding 100 Ns between nonoverlapping contigs. All Illumina sequencing fastq files were subsequently remapped to the chromosome-scale assembly, and all heterozygous polymorphisms for downstream analyses were re-called via GATK [99], following the procedure used with Stitch7.
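The final filtering round, which removed markers placed more than 30 cM from the nearest marker, can be sketched as follows. This is only an illustration of the criterion, not the procedure run in JoinMap: it assumes a single linkage group represented as a mapping from marker name to map position in cM, evaluates every marker against the original map in one pass, and keeps a marker that has no neighbour at all. The function name `drop_isolated_markers` and the example positions are hypothetical.

```python
def drop_isolated_markers(positions_cm, max_gap=30.0):
    """Return the markers whose nearest neighbour lies within max_gap cM.

    `positions_cm` maps marker name -> position (cM) on one linkage group.
    Distances are measured on the original map in a single pass (an
    assumption), so removals do not cascade.
    """
    ordered = sorted(positions_cm.items(), key=lambda item: item[1])
    kept = {}
    for i, (name, pos) in enumerate(ordered):
        gaps = []
        if i > 0:
            gaps.append(pos - ordered[i - 1][1])  # distance to left neighbour
        if i < len(ordered) - 1:
            gaps.append(ordered[i + 1][1] - pos)  # distance to right neighbour
        # A lone marker has no neighbour to measure against and is kept.
        if not gaps or min(gaps) <= max_gap:
            kept[name] = pos
    return kept


# Hypothetical map positions, for illustration only.
example_map = {"m1": 0.0, "m2": 4.5, "m3": 12.0, "m4": 50.0}
filtered = drop_isolated_markers(example_map)
```

Here `m4` is removed because its nearest neighbour, `m3`, is 38 cM away, while the other three markers each have a neighbour within 30 cM.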