Variant analysis

Fen Pan
Hong Zhang
Xiaoyan Dong
Weixing Ye
Ping He
Shulin Zhang
Jeff Xianchao Zhu
Nanbert Zhong

High-quality sequence reads were mapped to the S. pneumoniae R6 reference genome (GenBank accession no. ASM704v1) using BWA (version 0.7.12-r1039; ref. 16). The alignments were then refined with two commands from the Picard package (http://sourceforge.net/projects/picard/): “FixMateInformation” ensured that mate-pair information was in sync between each read and its mate, and “MarkDuplicates” marked potential PCR duplicates. Where multiple read pairs had identical external coordinates, only the pair with the best mapping quality was retained; the others were marked as PCR duplicates.

Mapped reads were then locally realigned around indels using the GATK package (https://software.broadinstitute.org/gatk/) in two steps: “RealignerTargetCreator” identified suspicious intervals likely to need realignment, and “IndelRealigner” realigned those intervals. After realignment, variants were called using the Bayesian approach implemented in GATK.

The variants were further filtered according to the following criteria: RMS mapping quality ≥25; site quality score ≥30; variant confidence/quality by depth ≥2; ≥16 reads covering each site, with at least eight reads mapping to each strand; and at least five times as many reads supporting the major variant as the minor variant. Sites that failed these criteria in any strain were removed from the analysis.
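The filtering criteria can be illustrated with a minimal Python sketch. It is not the authors' code: the `SiteStats` record, its field names, and both functions are hypothetical, and it assumes the per-site statistics (mapping quality, site quality, quality by depth, per-strand and per-allele read counts) have already been extracted from the variant calls. It also assumes that "eight reads mapping to each strand" means at least eight, and that a site with no record in a given strain is not counted as a failure for that strain.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class SiteStats:
    """Per-strain statistics for one site (hypothetical record layout)."""
    rms_mapping_quality: float  # RMS mapping quality of reads at the site
    site_quality: float         # site quality score
    quality_by_depth: float     # variant confidence / quality by depth
    forward_reads: int          # reads mapping to the forward strand
    reverse_reads: int          # reads mapping to the reverse strand
    major_allele_reads: int     # reads supporting the major variant
    minor_allele_reads: int     # reads supporting the minor variant


def passes_filters(s: SiteStats) -> bool:
    """Return True if the site meets every threshold stated in the text."""
    depth = s.forward_reads + s.reverse_reads
    return (
        s.rms_mapping_quality >= 25
        and s.site_quality >= 30
        and s.quality_by_depth >= 2
        and depth >= 16
        # Assumption: "eight reads mapping to each strand" is a minimum.
        and s.forward_reads >= 8
        and s.reverse_reads >= 8
        # Major variant supported by at least five times the minor's reads.
        and s.major_allele_reads >= 5 * s.minor_allele_reads
    )


def sites_passing_in_all_strains(
    calls: Dict[str, Dict[int, SiteStats]]
) -> Set[int]:
    """Keep positions that fail the filters in no strain.

    `calls` maps strain name -> genome position -> SiteStats.
    Assumption: a position missing from a strain is not a failure.
    """
    seen: Set[int] = set()
    failed: Set[int] = set()
    for per_strain in calls.values():
        for position, stats in per_strain.items():
            seen.add(position)
            if not passes_filters(stats):
                failed.add(position)
    return seen - failed
```

A position is dropped as soon as one strain fails any threshold, which mirrors the last sentence of the filtering description. In practice these thresholds would more likely be applied with the variant caller's own filtering tools than with custom code.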
