The GATK v3.3 suite of tools (DePristo et al., 2011) was used to realign reads in regions containing insertions/deletions (indels) and to call variants with HaplotypeCaller, under the assumption of a diploid organism. Because no training dataset was available to guide SNP filtering, we used GATK hard filters to exclude false positives. The filters were applied as described in the GATK Best Practices, together with an RMSMappingQuality of ≥30. After SNP filtering, the variant data were gathered into a single file using the VCFtools package (Danecek et al., 2011). The R package SNPRelate (Zheng et al., 2012) was used to remove SNPs in linkage disequilibrium, with a sliding window of 5000 nucleotides and a threshold of 2. The resulting dataset was used for principal component analysis (PCA), also in SNPRelate, and to obtain supporting data on population structure with the program Admixture (Alexander et al., 2009). The SnpEff package (Cingolani et al., 2012) was used for SNP variant annotation, and genome annotation files were retrieved from GeneDB (Logan-Klumpler et al., 2012).
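The mapping-quality part of the hard filtering can be illustrated with a minimal Python sketch. It is not the GATK implementation and not necessarily how the authors ran the step. It assumes that RMSMappingQuality is stored in the VCF INFO column under the key `MQ` (as GATK writes it) and that records lacking a usable `MQ` value are discarded. The function names and the example records are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=40;MQ=58.2;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag entries without '=' map to an empty string
    return info


def passes_mq_filter(vcf_line, min_mq=30.0):
    """Return True if the record's MQ annotation is at least min_mq."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    info = parse_info(fields[7])  # INFO is the eighth VCF column
    try:
        return float(info["MQ"]) >= min_mq
    except (KeyError, ValueError):
        # Assumption: records with a missing or non-numeric MQ are dropped.
        return False


def filter_vcf_lines(lines, min_mq=30.0):
    """Yield header lines unchanged and only the records that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes_mq_filter(line, min_mq):
            yield line


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t250.0\t.\tDP=40;MQ=58.20\n",
    "chr1\t200\t.\tC\tT\t90.0\t.\tDP=12;MQ=21.75\n",
    "chr1\t300\t.\tG\tA\t60.0\t.\tDP=8\n",
]
kept = list(filter_vcf_lines(example))
```

Here `kept` holds the two header lines and the record at position 100; the other two records are excluded, one for low mapping quality and one for a missing `MQ` annotation. In practice the remaining Best Practices hard filters would be applied in the same pass, normally with GATK itself rather than with custom code.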