Calling of small variants

Miquel Àngel Schikora-Tamarit
Toni Gabaldón

PerSVade’s small variant calling pipeline (module “call_small_variants”) uses up to three alternative methods, GATK Haplotype Caller (HC) [67] (v4.1.2), freebayes (FB) [66] (v1.3.1), and/or bcftools (BT) [68] (v1.9), to call and filter single-nucleotide polymorphisms (SNPs) and small insertions/deletions (IN/DELs) in haploid or diploid configuration (specified with the -p option). The input is the bam file generated by the “align_reads” module.
This module defines as high-confidence (PASS) variants those at positions with a read depth above the value provided with --min_coverage, with extra filters for HC and FB. For HC, it keeps as PASS variants those where (1) there are <4 additional variants within 20 bases; (2) the mapping quality is >40; (3) the confidence based on depth is >2; (4) the phred-scaled p-value is <60; (5) the MQRankSum is >−12.5; and (6) the ReadPosRankSum is >−8. For FB, PerSVade’s “call_small_variants” keeps as PASS variants those where (1) the quality is >1 or the alternate allele observation count is >10; (2) the strand balance probability of the alternate allele is >0; (3) the number of observations on the reverse strand is >0; and (4) the number of reads placed to the right/left of the allele is >1.
Then, bcftools (v1.10) and custom python code are used to normalize and merge the variants called by each software into a consensus variant set, which includes only those variants called with high confidence by N or more algorithms. This results in one .vcf file with the high-confidence variants for each N. Note that this .vcf file only keeps variants for which the fraction of reads covering the alternative allele is above the value provided with --min_AF (for example, 0.9 for haploids or 0.25 for diploids). For diploid calls, the module assigns the genotype with the strongest support (the one called by most programs). In addition, the quality of each variant is calculated as the mean of the qualities reported by the three algorithms.
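The consensus step described above can be illustrated with a minimal Python sketch. This is not PerSVade's actual code: the record layout (the keys `filter`, `qual`, `af`, `gt`) and the function name `consensus_variants` are hypothetical, and the sketch assumes that variants have already been normalized so that identical calls share the same (chromosome, position, ref, alt) key. It also assumes that the allele fraction compared against `min_af` and the reported quality are means over the callers that marked the variant as PASS, and that genotype ties are broken by whichever genotype was seen first; the source does not specify these details.

```python
from collections import Counter, defaultdict
from statistics import mean


def consensus_variants(calls, min_callers, min_af):
    """Keep variants marked PASS by at least `min_callers` callers.

    `calls` maps a caller name to a list of per-variant dicts
    (hypothetical layout, for illustration only).
    """
    # Group the PASS records of every caller by normalized variant key.
    by_variant = defaultdict(list)
    for caller, records in calls.items():
        for rec in records:
            if rec["filter"] == "PASS":
                key = (rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
                by_variant[key].append(rec)

    consensus = {}
    for key, recs in by_variant.items():
        # Require high-confidence support from N or more callers.
        if len(recs) < min_callers:
            continue
        # Assumption: compare the mean allele fraction against min_af.
        if mean(r["af"] for r in recs) <= min_af:
            continue
        # Genotype called by most programs (ties: first one seen).
        genotype = Counter(r["gt"] for r in recs).most_common(1)[0][0]
        consensus[key] = {
            "n_callers": len(recs),
            "qual": mean(r["qual"] for r in recs),
            "gt": genotype,
        }
    return consensus


# Toy input: two variants, three callers (values are made up).
calls = {
    "HC": [
        {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G",
         "filter": "PASS", "qual": 60.0, "af": 0.5, "gt": "0/1"},
        {"chrom": "chr1", "pos": 200, "ref": "C", "alt": "T",
         "filter": "PASS", "qual": 30.0, "af": 0.1, "gt": "0/1"},
    ],
    "FB": [
        {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G",
         "filter": "PASS", "qual": 40.0, "af": 0.5, "gt": "0/1"},
        {"chrom": "chr1", "pos": 200, "ref": "C", "alt": "T",
         "filter": "PASS", "qual": 20.0, "af": 0.1, "gt": "0/1"},
    ],
    "BT": [
        {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G",
         "filter": "LowDepth", "qual": 50.0, "af": 0.5, "gt": "1/1"},
    ],
}

result = consensus_variants(calls, min_callers=2, min_af=0.25)
print(result)
```

With `min_callers=2`, only the variant at position 100 is kept in this toy input: the variant at position 200 is supported by two callers but falls below the allele-fraction threshold, and the BT record at position 100 is ignored because it is not PASS.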
Beyond the filtered variant calls, this module writes a tabular file with all the raw variants and various metadata columns (e.g., the programs that called each variant), which can be used to apply custom filters to the variants.
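As an illustration of such custom filtering, the following Python sketch reads a tab-separated table and keeps the variants called by a required set of programs above a quality cutoff. The column names (`called_by`, `mean_qual`), the example rows, and the function `filter_raw_variants` are hypothetical and were chosen only for this example; the actual column names of the PerSVade output should be checked in the file itself.

```python
import csv
import io

# Hypothetical table layout; the real output columns may differ.
example_table = (
    "chrom\tpos\tref\talt\tcalled_by\tmean_qual\n"
    "chr1\t100\tA\tG\tHC,FB,BT\t55.0\n"
    "chr1\t200\tC\tT\tFB\t12.0\n"
    "chr2\t50\tG\tGA\tHC,BT\t31.5\n"
)


def filter_raw_variants(handle, required_callers, min_qual):
    """Return rows called by all `required_callers` with quality >= min_qual."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        callers = set(row["called_by"].split(","))
        # Set inclusion: every required caller must have called the variant.
        if required_callers <= callers and float(row["mean_qual"]) >= min_qual:
            kept.append(row)
    return kept


kept = filter_raw_variants(io.StringIO(example_table), {"HC"}, 30.0)
for row in kept:
    print(row["chrom"], row["pos"], row["ref"], row["alt"], row["called_by"])
```

To apply the same filter to a real file, the `io.StringIO` object can be replaced by an open file handle, after adapting the column names.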
