To confirm the accuracy, sensitivity, and specificity of the imputation, we used the initial release of whole-genome sequencing data from the MVP study. These data were collected and sequenced with a focus on elucidating the pathophysiology of COVID-19 infection from participants' genomes. Sequencing was performed using Illumina’s Sequencing by Synthesis technology to a target depth of 30x. Per-sample variant calling for 10,413 samples was performed on Trellis (ref. 59), a cloud-based data and task management framework. Briefly, reads were aligned to the GRCh38 reference genome with BWA-MEM (version 0.7.15), and variants were called with HaplotypeCaller in GATK 4.1.0.0. Genotypes of all samples were aggregated into a matrix table using the gVCF combiner implemented in Hail (ref. 55) for additional quality-control steps. We retained high-quality genotypes by applying the following steps:
I. Variants in low-complexity regions and ENCODE blacklist regions were removed.
II. Variants within regions of atypical sequencing depth (DP < 10 or DP > 400) were discarded. For haploid genotypes on sex chromosomes, DP > 5 was required.
III. Genotypes were retained if they were:
a. Homozygous reference with Genotype Quality (GQ) > 20, or
b. Homozygous alternate with the Phred-scaled likelihood of the homozygous reference genotype (PL[0]) > 20 and the ratio of alternate-allele depth (DPALT) to total depth at the site (DPSITE), DPALT/DPSITE, > 0.9, or
c. Heterozygous with PL[0] > 20, the ratio of the sum of DPALT and reference-allele depth (DPREF) to DPSITE, (DPALT + DPREF)/DPSITE, > 0.9, and DPALT/DPSITE > 0.2.
IV. Variants with a high missing rate (> 0.8) were discarded, as were variants with a population-wide Hardy-Weinberg equilibrium P value ≤ 1 × 10^-5 for minor allele frequency (MAF) ≥ 1%, or ≤ 1 × 10^-6 for MAF < 1%.
V. Samples with low call rate (≤ 0.97) or low overall sequencing coverage (mean depth ≤ 18) were excluded.
This processing resulted in 187,790,701 variants in 10,390 individuals.
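The genotype-level, variant-level, and sample-level thresholds described above can be illustrated with a minimal Python sketch. This is not the Hail pipeline used in the study: the `Genotype` record and the functions `keep_genotype`, `keep_variant`, and `keep_sample` are hypothetical names introduced here for illustration, and the sketch assumes that a variant is discarded if it fails either the missing-rate criterion or the Hardy-Weinberg criterion.

```python
from dataclasses import dataclass


@dataclass
class Genotype:
    """Hypothetical per-genotype record; field names are illustrative."""
    kind: str     # "hom_ref", "het", or "hom_alt"
    gq: int       # Genotype Quality
    pl_ref: int   # PL[0], Phred-scaled likelihood of homozygous reference
    dp_ref: int   # DPREF, depth supporting the reference allele
    dp_alt: int   # DPALT, depth supporting the alternate allele
    dp_site: int  # DPSITE, total depth at the site


def keep_genotype(g: Genotype) -> bool:
    """Apply the genotype-level retention criteria from the text."""
    if g.kind == "hom_ref":
        return g.gq > 20
    if g.dp_site <= 0:
        # Allele-depth ratios are undefined without coverage.
        return False
    alt_fraction = g.dp_alt / g.dp_site
    if g.kind == "hom_alt":
        return g.pl_ref > 20 and alt_fraction > 0.9
    if g.kind == "het":
        informative_fraction = (g.dp_alt + g.dp_ref) / g.dp_site
        return g.pl_ref > 20 and informative_fraction > 0.9 and alt_fraction > 0.2
    return False


def keep_variant(missing_rate: float, hwe_p: float, maf: float) -> bool:
    """Variant-level filter. Assumption: failing either criterion discards."""
    if missing_rate > 0.8:
        return False
    # The HWE threshold depends on minor allele frequency.
    hwe_threshold = 1e-5 if maf >= 0.01 else 1e-6
    return hwe_p > hwe_threshold


def keep_sample(call_rate: float, mean_depth: float) -> bool:
    """Sample-level filter on call rate and mean sequencing depth."""
    return call_rate > 0.97 and mean_depth > 18
```

For example, under these assumptions `keep_genotype(Genotype("het", 99, 50, 14, 15, 30))` passes, because 29 of 30 reads support one of the two called alleles and half of the reads support the alternate allele, whereas the same site with only 5 alternate reads would fail the DPALT/DPSITE > 0.2 criterion.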