2.4. Variant prediction calling, SNP filtering and principal component analysis (PCA)

D.G. Teixeira
G.R.G. Monteiro
D.R.A. Martins
M.Z. Fernandes
V. Macedo-Silva
M. Ansaldi
P.R.P. Nascimento
M.A. Kurtz
J.A. Streit
M.F.F.M. Ximenes
R.D. Pearson
A. Miles
J.M. Blackwell
M.E. Wilson
A. Kitchen
J.E. Donelson
J.P.M.S. Lima
S.M.B. Jeronimo

The GATK v. 3.3 suite of tools (DePristo et al., 2011) was used to realign reads in regions containing insertions/deletions (indels) and to call variants with HaplotypeCaller under the assumption of a diploid organism. Because no training dataset was available to guide SNP filtering, we used GATK hard filters to exclude false positives: the filters were applied as described in the GATK Best Practices, with the RMSMappingQuality option set to ≥30. After SNP filtering, the variant data were gathered into a single file using the VCFtools package (Danecek et al., 2011). The R package SNPRelate (Zheng et al., 2012) was used to remove SNPs in linkage disequilibrium, with a sliding window of 5000 nucleotides and a threshold of 2. The resulting dataset was used for PCA, also with SNPRelate, and to obtain supporting data on population structure with the program Admixture (Alexander et al., 2009). The SnpEff package (Cingolani et al., 2012) was used for SNP variant annotation, and genome annotation files were retrieved from GeneDB (Logan-Klumpler et al., 2012).
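The mapping-quality criterion can be illustrated with a minimal Python sketch. It is not the GATK implementation and does not reproduce the other Best Practices hard filters; it only shows what keeping records with RMSMappingQuality ≥30 means at the level of a VCF file, assuming the annotation is stored under the INFO key `MQ`, as GATK writes it. The function names and the example records are hypothetical and were chosen for illustration only.

```python
from typing import Dict, Iterable, List

MIN_MQ = 30.0  # RMSMappingQuality cutoff stated in the text


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping (flags map to '')."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_mq(record_line: str, min_mq: float = MIN_MQ) -> bool:
    """Return True when the record carries an MQ annotation that is >= min_mq."""
    columns = record_line.rstrip("\n").split("\t")
    if len(columns) < 8:  # INFO is the eighth VCF column
        return False
    mq = parse_info(columns[7]).get("MQ")
    if mq is None:
        # Assumption of this sketch: records without MQ are dropped.
        return False
    try:
        return float(mq) >= min_mq
    except ValueError:
        return False


def filter_vcf_lines(lines: Iterable[str]) -> List[str]:
    """Keep header lines unchanged and only those records that pass the MQ cutoff."""
    return [line for line in lines if line.startswith("#") or passes_mq(line)]


# Hypothetical records, not data from the study.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=20;MQ=58.2",
    "chr1\t200\t.\tC\tT\t50\t.\tDP=15;MQ=21.7",
    "chr1\t300\t.\tG\tA\t50\t.\tDP=12",
]

kept = filter_vcf_lines(example_vcf)
```

In this sketch only the record at position 100 survives alongside the two header lines; the record at position 200 fails the cutoff and the record at position 300 has no `MQ` annotation. In practice the filtering would be done with the GATK tools themselves rather than with ad hoc parsing.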
