SNP and genotype calling
# The initial input files for this protocol are the outputs from RealignerTargetCreator and IndelRealigner in Genome Analysis Toolkit (GATK), generating for each sample a realigned BAM file.
Step 1: For each sample, use HaplotypeCaller in GATK to generate per-sample genomic variants call formats (GVCFs). The command is as follows:
gatk --java-options "-Xmx4g" HaplotypeCaller -R REFERENCE.fasta -I input.bam -O output.g.vcf.gz -ERC GVCF
Step 2: Then the GVCFs are passed to GenotypeGVCFs in GATK, generating a raw VCF file in which all samples have been jointly genotyped. The command is as follows:
gatk --java-options "-Xmx4g" GenotypeGVCFs -R REFERENCE.fasta -V input1.g.vcf.gz -V input2.g.vcf.gz … -O raw.vcf.gz
Note: The GATK4 GenotypeGVCFs tool can take only one input track. Options are 1) a single single-sample GVCF 2) a single multi-sample GVCF created by CombineGVCFs or 3) a GenomicsDB workspace created by GenomicsDBImport. A sample-level GVCF is produced by HaplotypeCaller with the `-ERC GVCF` setting.
Step 3: Filter SNPs using VCFTOOLS
vcftools --gzvcf raw.vcf.gz --maf 0.05 --min-meanDP 4 --max-meanDP 50 --max-missing 0.7--remove-filtered-all -- --minGQ 10 --min-alleles 2 --max-alleles 2 --remove-indels --recode -c | gzip -c > filterSNP.vcf.gz
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this
article to respond.