The apples (Malus spp.) investigated here are from the USDA apple germplasm repository in Geneva, NY, USA. Leaf tissue was collected from 1949 accessions. The countries of origin of the accessions are indicated in Supplementary Fig. S1 and Supplementary Table S1. DNA was extracted from these accessions using commercial extraction kits. Genotyping-by-sequencing (GBS) libraries were generated according to Elshire et al. [11]. A visual overview of all data processing and analysis steps described below is provided in Supplementary Fig. S2.

The samples were processed with two different restriction enzyme treatments (ApeKI and PstI-EcoT22I) in separate GBS libraries and were sequenced using Illumina HiSeq 2000 technology (96 samples per lane) at Cornell University (Ithaca, New York, USA) across 42 lanes, generating 100-bp single-end reads. The DNA sequence data are available as NCBI BioProject PRJNA636391. Reads that failed Illumina’s ‘chastity filter’ were removed, and the remaining reads were aligned to the Malus × domestica GDDH13 v1.1 reference genome [12] using the Burrows-Wheeler Aligner v0.7.12 [13] and the TASSEL version 5 pipeline [14]. The kmerLength parameter was set to 82 for ApeKI and 89 for PstI-EcoT22I, and minMAF was set to 0.01 during the DiscoverySNPCallerPluginV2 step. Non-biallelic sites and indels were removed using VCFtools v0.1.14 [15]. The VCF files for the two enzyme treatments were then merged using a custom Perl script that preferentially kept SNPs called from the PstI-EcoT22I libraries when the same SNP was identified from both restriction enzyme treatments. The resulting data set contained 1949 accessions and 1,103,605 SNPs. Mean read depth per individual, mean read depth per SNP, and the proportion of heterozygotes per site were calculated using VCFtools v0.1.14 [15] (Supplementary Fig. S3).
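The merge rule described above (keep every SNP from either library, but prefer the PstI-EcoT22I record where a site was called in both) can be illustrated with a minimal Python sketch. This is not the custom Perl script used in the study; the function name `merge_snp_calls`, the keying of sites by chromosome and position, and the toy records are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Assumption: a site is identified by (chromosome, position) and the value is
# the corresponding VCF record, represented here by a placeholder string.
Site = Tuple[str, int]


def merge_snp_calls(preferred: Dict[Site, str], other: Dict[Site, str]) -> Dict[Site, str]:
    """Return the union of two call sets, keeping the `preferred` record at shared sites."""
    merged = dict(other)
    merged.update(preferred)  # records from the preferred library overwrite shared sites
    return dict(sorted(merged.items()))  # order by chromosome name, then position


# Toy records (hypothetical, not real data)
pst_calls = {("Chr01", 100): "pst_record_100", ("Chr01", 250): "pst_record_250"}
ape_calls = {("Chr01", 100): "ape_record_100", ("Chr02", 40): "ape_record_40"}

merged_calls = merge_snp_calls(preferred=pst_calls, other=ape_calls)
```

In this toy case the shared site Chr01:100 is represented by the PstI-EcoT22I record, and the sites unique to either library are retained.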

Missing genotypes in the VCF files were imputed using LinkImputeR [16] with the following filters: maximum missingness of 0.30, minor allele frequency (MAF) of 0.01, minimum depth of 8, and Hardy–Weinberg equilibrium threshold of p = 0.0001. The resulting data set had an imputation accuracy of 0.9778 and a correlation value of 0.8764, with 1598 accessions and 68,392 SNPs remaining.
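To make two of these filters concrete, the following Python sketch computes per-site missingness and MAF from a genotype vector and applies the 0.30 and 0.01 thresholds. It does not reproduce LinkImputeR, and it omits the depth and Hardy–Weinberg filters. The genotype coding (0, 1 or 2 copies of the alternate allele, `None` for missing), the function names, and the inclusive comparisons are assumptions.

```python
from typing import List, Optional

# Assumption: 0, 1 or 2 copies of the alternate allele; None marks a missing call.
Genotype = Optional[int]


def missingness(site: List[Genotype]) -> float:
    """Proportion of samples with no genotype call at this site."""
    return sum(g is None for g in site) / len(site)


def minor_allele_frequency(site: List[Genotype]) -> float:
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in site if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_site_filters(site: List[Genotype],
                        max_missing: float = 0.30,
                        min_maf: float = 0.01) -> bool:
    """True if the site meets both the missingness and the MAF threshold."""
    return missingness(site) <= max_missing and minor_allele_frequency(site) >= min_maf


# Toy sites (hypothetical, not real data)
variable_site = [0, 1, 2, None]       # 25% missing, alternate allele frequency 0.5
monomorphic_site = [0, 0, 0, 0]       # no minor allele
sparse_site = [None, None, 1, 0]      # 50% missing
```

Here only `variable_site` passes; `monomorphic_site` fails the MAF threshold and `sparse_site` fails the missingness threshold.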

The data set was filtered to include only accessions in the USDA apple germplasm collection that were relevant to modern apple development, namely accessions labelled as M. domestica (N = 1154), M. sieversii (N = 195), Malus (L.) baccata Borkh. (N = 40), Malus floribunda Sieb. ex Van Houtte (N = 17), Malus orientalis Uglitzk. (N = 17), and M. sylvestris (N = 15). Next, the VCF file was converted using PLINK v1.07 [17,18] and filtered at a MAF threshold of 0.01, resulting in 1438 accessions and 47,925 SNPs.

Our genotype calling pipeline assumes all accessions are diploid (2x), and we therefore aimed to exclude triploid accessions (3x). Previous work has confirmed that triploids can be identified from GBS data due to their excessive heterozygosity [19]. We examined heterozygosity by individual, and contrasted these values with labels available in the USDA germplasm database for 2x, 3x and 4x accessions (Supplementary Fig. S4). Using a Tukey test, we determined that accessions labelled as 3x were significantly more heterozygous than 2x (p < 1 × 10⁻¹⁵) or 4x (p = 1.385 × 10⁻⁴) individuals. There was no significant difference in heterozygosity between 2x and 4x accessions, indicating that the accessions labelled as 4x were likely all autotetraploids and could, therefore, be treated as diploid for the purposes of genotype calling and all downstream analyses. The mean proportion of heterozygous genotypes was 0.191 for accessions labelled as 2x, 0.226 for 3x accessions, and 0.182 for 4x accessions. Based on these results, we removed 168 accessions with heterozygosity >0.21 that we inferred to be triploid, including 28 labelled as 2x, 51 labelled as 3x, and 2 labelled as 4x. There were 62 accessions labelled as 3x that were not removed by this filter. The majority of the accessions removed (N = 147) were labelled as M. domestica. After filtering, 1270 accessions remained.
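The heterozygosity filter can be sketched in Python as follows. This is an illustration, not the code used in the study: the genotype coding (0, 1 or 2 copies of the alternate allele, with 1 denoting a heterozygote), the function names, and the toy accessions are assumptions. Only the 0.21 threshold is taken from the text.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: 0, 1 or 2 copies of the alternate allele; 1 is a heterozygous call.
Genotype = Optional[int]


def heterozygosity(genotypes: List[Genotype]) -> float:
    """Proportion of called genotypes that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    return sum(g == 1 for g in called) / len(called)


def split_by_heterozygosity(samples: Dict[str, List[Genotype]],
                            threshold: float = 0.21) -> Tuple[List[str], List[str]]:
    """Return (kept, removed) accession names; removed means heterozygosity > threshold."""
    kept, removed = [], []
    for name, genotypes in samples.items():
        if heterozygosity(genotypes) > threshold:
            removed.append(name)  # inferred triploid under the filter described above
        else:
            kept.append(name)     # also reached when no genotypes are called (NaN)
    return kept, removed


# Toy accessions (hypothetical, not real data)
toy_samples = {
    "accession_A": [0, 1, 0, 0, 2, 0, 0, 0, 0, 0],  # heterozygosity 0.1
    "accession_B": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # heterozygosity 0.3
}
kept, removed = split_by_heterozygosity(toy_samples)
```

With the toy data, `accession_A` is kept and `accession_B` is removed. Applied to the study's counts, removing 168 of 1438 accessions leaves the 1270 reported above.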
