SNP calling was performed with the Axiom Analysis Suite (AAS) software from Affymetrix®, following the instructions in the Axiom Analysis Suite 3.1 user guide³. Before calling SNPs for the 874 samples of the combined data set (EUCLEG and NJAU), we first assessed the performance of the 355K SoySNP microarray on the 480 EUCLEG samples alone. This step was considered necessary because the 355K SoySNP microarray was developed using the NJAU collection and might perform sub-optimally on plant material of a different origin. In brief, the Affymetrix® Power Tools (APT) software package, version 1.15.0, implemented in AAS performed sample quality control based on 20,000 non-polymorphic probe sets. It used two parameters: Dish Quality Control (DQC), which measures the contrast between signal and background noise, and Sample Call Rate (QC-CR), the ratio of genotype-called SNPs to attempted SNPs in a sample. Based on the criteria DQC > 0.82 and QC-CR ≥ 97%, AAS filtered out four poor-quality samples. The R package SNPolisher version 1.3.6.7, implemented in AAS, was then used for SNP calling with 609,883 probe sets targeting 355,595 SNPs. Its Ps_Classification function classified the SNPs/probe sets into six categories based on the following SNP QC metrics: call rate (CR) ≥ 97%, Fisher's linear discriminant (FLD) ≥ 3.6, heterozygous strength offset (HetSO) ≥ −0.1, and homozygote ratio offset (HomRO) ≥ 0.3 for one- or two-cluster SNPs or ≥ −0.9 for three-cluster SNPs. A summary of the SNP classification was obtained for the 476 good-quality samples of the EUCLEG collection. We compared it with the SNP classification summary obtained for the NJAU collection by Wang et al. (2016).
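The sample and SNP filters described above can be sketched in Python as simple threshold checks. This is a minimal illustration only, with three caveats. First, the class, function and constant names are hypothetical. Second, in practice the metric values would come from the AAS/APT output. Third, the SNP check is a simplified pass/fail test against the quoted thresholds, not SNPolisher's six-category Ps_Classification.

```python
from dataclasses import dataclass

# Sample QC thresholds quoted in the text (constant names are hypothetical).
DQC_MIN = 0.82    # a sample passes only if DQC > 0.82 (strict)
QC_CR_MIN = 97.0  # sample call rate in percent; passes if QC-CR >= 97


@dataclass
class SampleQC:
    sample_id: str
    dqc: float    # Dish Quality Control value
    qc_cr: float  # sample call rate, percent


def passing_samples(samples):
    """Return IDs of samples with DQC > 0.82 and QC-CR >= 97%."""
    return [s.sample_id for s in samples if s.dqc > DQC_MIN and s.qc_cr >= QC_CR_MIN]


@dataclass
class SnpMetrics:
    probeset_id: str
    cr: float        # SNP call rate, percent
    fld: float       # Fisher's linear discriminant
    het_so: float    # heterozygous strength offset
    hom_ro: float    # homozygote ratio offset
    n_clusters: int  # genotype clusters observed: 1, 2 or 3


def passes_snp_qc(m):
    """Simplified check against the four SNP QC thresholds in the text.

    Not a reimplementation of SNPolisher's Ps_Classification, which
    assigns each probe set to one of six categories.
    """
    # The HomRO threshold depends on the number of genotype clusters.
    hom_ro_min = -0.9 if m.n_clusters == 3 else 0.3
    return (
        m.cr >= 97.0
        and m.fld >= 3.6
        and m.het_so >= -0.1
        and m.hom_ro >= hom_ro_min
    )


# Usage with made-up values: S2 fails DQC, S3 fails QC-CR.
samples = [SampleQC("S1", 0.95, 99.1), SampleQC("S2", 0.80, 99.0), SampleQC("S3", 0.90, 96.5)]
print(passing_samples(samples))  # ['S1']
```

Note that the DQC comparison is strict (>) while the call-rate comparisons are inclusive (≥), mirroring the criteria as stated.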
In a second step, we genotyped the combined data set (EUCLEG and NJAU), starting from the raw fluorescence data and following the procedure described above. In the quality control step, 69 poor-quality samples were excluded, and SNP calling was performed on the remaining 805 good-quality samples. After genotyping, low-quality SNPs were excluded based on the SNP QC metrics, yielding a final genotyping data set of 229,557 SNPs. This data set was divided into three subsets for further processing: EUCLEG, NJAU-Wild, and NJAU-Cultivated, comprising 477, 82, and 246 accessions, respectively. For the definition of the NJAU-Wild and NJAU-Cultivated subsets, we refer to Wang et al. (2016). In further analyses, we considered either the whole collection (EUCLEG and NJAU) or some of these subsets.
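The sample bookkeeping above can be checked, and a genotype table split into the three subsets, with a short sketch. The dictionary-based genotype table and the 0/1/2 genotype coding are assumptions for illustration. They are not the export format of AAS.

```python
# Sample counts as reported in the text.
TOTAL_SAMPLES = 874
FAILED_QC = 69
SUBSET_SIZES = {"EUCLEG": 477, "NJAU-Wild": 82, "NJAU-Cultivated": 246}

passed_qc = TOTAL_SAMPLES - FAILED_QC            # 805 good-quality samples
assert sum(SUBSET_SIZES.values()) == passed_qc   # subsets cover all retained samples


def split_by_subset(genotypes, subset_of):
    """Partition a genotype table into one table per subset.

    genotypes: {sample_id: [genotype calls]}  (hypothetical format, e.g.
               0/1/2 = homozygous A / heterozygous / homozygous B)
    subset_of: {sample_id: subset label}
    """
    subsets = {}
    for sample_id, calls in genotypes.items():
        subsets.setdefault(subset_of[sample_id], {})[sample_id] = calls
    return subsets


# Usage with toy data.
toy = {"s1": [0, 2], "s2": [1, 0], "s3": [2, 2]}
labels = {"s1": "EUCLEG", "s2": "NJAU-Wild", "s3": "EUCLEG"}
print(split_by_subset(toy, labels))
# {'EUCLEG': {'s1': [0, 2], 's3': [2, 2]}, 'NJAU-Wild': {'s2': [1, 0]}}
```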
For some of the downstream analyses, the genomic coordinates of the SNPs were required. During the development of the 355K SoySNP microarray, SNP coordinates had been assigned using an older version of the soybean reference genome sequence (Glyma.Wm82.a1). We therefore positioned the SNPs onto the newer reference genome sequence Glyma.Wm82.a2, which has improved assembly and gene annotation quality compared with Glyma.Wm82.a1. Finally, the 224,993 SNPs corresponding to probes that could be positioned onto the 20 soybean chromosomes using a BLAST query were retained for further analyses.
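The repositioning step can be sketched as follows, under three stated assumptions. First, the probe sequences were aligned against Glyma.Wm82.a2 with BLAST in tabular output format (`-outfmt 6`). Second, the 20 chromosomes are named Gm01 to Gm20. Third, unanchored sequences (for example scaffolds) carry other names.

The text does not say how multiple hits were resolved, so keeping the highest-bitscore hit per probe is a simplification chosen for illustration. Deriving the SNP coordinate itself would also require the SNP's offset within the probe, which is not modelled here.

```python
import csv
import io

# Assumed chromosome names in Glyma.Wm82.a2; other sequences (e.g. scaffolds)
# are treated as unanchored.
CHROMOSOMES = {f"Gm{i:02d}" for i in range(1, 21)}


def best_hits(blast_tsv):
    """Keep the highest-bitscore hit per probe from BLAST -outfmt 6 text.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        probe, subject = row[0], row[1]
        sstart, send, bitscore = int(row[8]), int(row[9]), float(row[11])
        start = min(sstart, send)  # minus-strand hits have sstart > send
        if probe not in best or bitscore > best[probe][2]:
            best[probe] = (subject, start, bitscore)
    return best


def on_chromosomes(hits):
    """Return {probe: (chromosome, start)} for hits on Gm01-Gm20 only."""
    return {p: (s, start) for p, (s, start, _) in hits.items() if s in CHROMOSOMES}


# Usage with made-up hits: p1 maps best to Gm05, p2 only to a scaffold.
blast = (
    "p1\tGm05\t100.0\t35\t0\t0\t1\t35\t1000\t1034\t1e-12\t65.9\n"
    "p1\tscaffold_21\t94.3\t35\t2\t0\t1\t35\t50\t84\t1e-08\t52.0\n"
    "p2\tscaffold_99\t100.0\t35\t0\t0\t1\t35\t534\t500\t1e-12\t65.9\n"
)
print(on_chromosomes(best_hits(blast)))  # {'p1': ('Gm05', 1000)}
```

A stricter variant could discard probes with two or more equally scoring hits, because such probes cannot be placed unambiguously.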