SNP discovery by GBS

F. Taranto
N. D’Agostino
B. Greco
T. Cardi
P. Tripodi

GBS involves five major steps: sample preparation, library assembly, sequencing, SNP calling and diversity analysis. Genomic DNA was extracted using the DNeasy® Plant Mini Kit (QIAGEN, Germany). DNA quality and concentration were assessed from absorbance at 260 and 280 nm using a UV-Vis spectrophotometer (ND-1000; NanoDrop, Thermo Scientific, Wilmington, DE, USA). A trial DNA digestion was carried out using the 6-base cutter HindIII.

GBS was performed at the Institute for Genomic Diversity (Cornell University, Ithaca, NY, USA) as described by Elshire et al. [17]. Genome complexity was reduced by digesting the genomic DNA of each sample with ApeKI, a methylation-sensitive restriction enzyme. The resulting fragments from all samples were directly ligated to a pair of enzyme-specific adapters and combined into pools. PCR amplification was carried out to generate the GBS library, which was sequenced in a single Illumina HiSeq 2500 run (Illumina Inc., USA). The sequencing produced millions of reads split across multiple FASTQ files.

All unique sequence tags from each sequence file were captured and then collapsed to generate a master tag file. Master tags were aligned to the CM334 reference genome available at http://peppergenome.snu.ac.kr [33] using the Burrows-Wheeler Aligner (BWA) tool (version 0.7.8-r455) with default settings. The GBS analysis pipeline implemented in TASSEL (version 3.0.166) was used to call SNPs [34]. SNP calling within the TASSEL-GBS pipeline produced a raw HapMap genotypic data file. A two-step filtering procedure was used to retain high-quality SNPs. Initial filtering was performed with settings for minimum minor allele frequency (mnMAF = 0.01), minimum taxa coverage (mnTCov = 0.1) and minimum site coverage (mnSCov = 0.8). Genotypes with a large amount of missing data were filtered out based on the minimum minor allele count (mnMAC = 10). SNPs that passed either the specified minimum minor allele count (mnMAC) or the minimum minor allele frequency (mnMAF) were kept for downstream analysis. Subsequently, high-quality SNP markers were selected using TASSEL-GBS with the following parameters: minimum count 150, minimum frequency 0.01 and maximum frequency 1.0.
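The logic of the initial site filter can be illustrated with a minimal Python sketch. This is not the TASSEL-GBS implementation: the function names, the two-letter diploid genotype encoding with "NN" for missing calls, and the definition of the minor allele as the second most common allele are all assumptions made for illustration. Only the threshold values (mnMAF = 0.01, mnMAC = 10, mnSCov = 0.8) come from the text above.

```python
from collections import Counter


def site_stats(calls, missing="NN"):
    """Return (coverage, maf, mac) for one site.

    calls: list of two-letter diploid genotypes, e.g. "AA", "AG", "NN".
    This encoding is an assumption for the sketch, not the HapMap format itself.
    """
    called = [g for g in calls if g != missing]
    coverage = len(called) / len(calls) if calls else 0.0
    alleles = Counter(a for g in called for a in g)
    if len(alleles) < 2:
        # Monomorphic (or empty) site: no minor allele.
        return coverage, 0.0, 0
    counts = sorted(alleles.values(), reverse=True)
    mac = counts[1]              # copies of the second most common allele
    maf = mac / sum(counts)      # frequency among all called alleles
    return coverage, maf, mac


def keep_site(calls, mn_maf=0.01, mn_mac=10, mn_scov=0.8):
    """Keep a site if it is sufficiently covered and passes EITHER mnMAF OR mnMAC."""
    coverage, maf, mac = site_stats(calls)
    return coverage >= mn_scov and (maf >= mn_maf or mac >= mn_mac)


if __name__ == "__main__":
    # Hypothetical site: 8 homozygotes, 1 heterozygote, 1 missing call.
    example = ["AA"] * 8 + ["AG"] + ["NN"]
    print(site_stats(example), keep_site(example))
```

The "either/or" test on allele count and frequency mirrors the statement that SNPs passing mnMAC or mnMAF were retained; the site-coverage check is applied on top of it.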

Read depth and coverage data were obtained using custom R scripts and BEDTools [35]. To identify the peri-centromeric regions of the 12 Capsicum chromosomes, we used the pepper COSII genetic map [36]. For each chromosome, peri-centromeric flanking markers were selected and their positions were defined from the information available at the Sol Genomics Network [37]. The COSII markers used to define the peri-centromeric regions are reported in Additional file 1: Table S3. The vcf-annotate tool from VCFtools (version 0.1.13) was used to count the SNPs falling within coding regions. All sequences were submitted to the NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/) under the accession number SRP070992.
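Once a peri-centromeric interval has been defined for each chromosome from its flanking markers, assigning SNPs to that interval is a simple position lookup. The sketch below shows one way this could be done in Python. It is not the authors' script: the function name, the chromosome labels and all coordinates are hypothetical placeholders, and a single inclusive interval per chromosome is assumed.

```python
from collections import Counter


def count_pericentromeric(snps, regions):
    """Count SNPs inside and outside the peri-centromeric interval of each chromosome.

    snps:    iterable of (chromosome, position) tuples.
    regions: dict mapping chromosome -> (start, end), inclusive coordinates
             assumed to be derived from the flanking markers.
    """
    inside = Counter()
    outside = Counter()
    for chrom, pos in snps:
        bounds = regions.get(chrom)
        if bounds is not None and bounds[0] <= pos <= bounds[1]:
            inside[chrom] += 1
        else:
            outside[chrom] += 1
    return inside, outside


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    regions = {"Chr01": (100, 200)}
    snps = [("Chr01", 50), ("Chr01", 100), ("Chr01", 150),
            ("Chr01", 201), ("Chr02", 150)]
    inside, outside = count_pericentromeric(snps, regions)
    print(dict(inside), dict(outside))
```

SNPs on a chromosome with no defined interval are counted as outside, which keeps the per-chromosome totals complete.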
