2.2 DeepVariant and GLnexus

Taedong Yun, Helen Li, Pi-Chuan Chang, Michael F Lin, Andrew Carroll, Cory Y McLean

We used DeepVariant v0.8.0 and the publicly released WGS model v0.8.0 to generate the single-sample variant calls for all samples in GIAB, CSER and PAGE. A single-line command to run DeepVariant in a pre-built docker container is available on the DeepVariant public repository (https://github.com/google/deepvariant). The DeepVariant calls for the sample from the 1000 Genomes Project were generated using a custom model trained exclusively for the NovaSeq platform. Both the custom model and all single-sample DeepVariant calls generated by it are publicly available, as described in ‘Availability and Implementation’ in the Abstract.
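
As an illustration of the single-sample step, the sketch below prints (as a dry run) a docker invocation of the run_deepvariant entry point for one sample. It is not the exact command used in this study: the image location, the directory and reference paths, the sample name and the helper deepvariant_cmd are assumptions made for the example, and the DeepVariant repository README should be consulted for the command matching a given release. The output file naming is chosen so that the gVCFs match the deepvariant.*.g.vcf.gz pattern used for merging.

```shell
#!/usr/bin/env bash
# Sketch: build a DeepVariant docker command for one WGS sample (dry run).
# ASSUMPTIONS: image location, paths and sample names are placeholders,
# not values taken from the paper.
set -euo pipefail

deepvariant_cmd() {
  local sample="$1" input_dir="$2" output_dir="$3"
  local version="0.8.0"
  # Assumed image location for this release; verify against the README.
  local image="gcr.io/deepvariant-docker/deepvariant:${version}"
  printf '%s ' \
    docker run \
    -v "${input_dir}:/input" \
    -v "${output_dir}:/output" \
    "${image}" \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/input/reference.fasta \
    "--reads=/input/${sample}.bam" \
    "--output_vcf=/output/deepvariant.${sample}.vcf.gz" \
    "--output_gvcf=/output/deepvariant.${sample}.g.vcf.gz"
  printf '\n'
}

# Print the command for a hypothetical sample; nothing is executed here.
deepvariant_cmd HG002 /data/input /data/output
```

Printing the command rather than running it keeps the sketch safe to try on a machine without docker; piping the printed line to bash would execute it.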

To merge and evaluate the multiple cohorts in parallel, we deployed the open-source GLnexus algorithm using Apache Beam (https://beam.apache.org) on Google internal compute clusters. The Beam-based pipeline abstracts away the need to specify multi-threading on a single machine (as is done in the open-source GLnexus), and deploys hundreds of different parameter configurations on thousands of CPUs. The pipeline produces identical scientific results to the open-source GLnexus v1.2.2 when run with the same parameters. To both ensure our train/test dataset split is non-overlapping and limit computational costs of this study, we used separate individual chromosomes for pipeline optimization and evaluation. For consistency with previous studies (Lin et al., 2018; Poplin et al., 2018b), we used chromosome 2 to optimize the pipeline, and computed final performance benchmarks separately on chromosome 20. The optimized DeepVariant parameters from this study, which are discussed in detail in Results, are included in open-source GLnexus v1.2.2 or later versions in two presets: --config DeepVariantWGS for WGS and --config DeepVariantWES for WES. After installing the GLnexus command line tool, users can merge DeepVariant calls in these optimized setups using a single command like

$ glnexus_cli --config DeepVariantWGS \
    deepvariant.*.g.vcf.gz > cohort.bcf
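
The choice between the two presets, and a common follow-up step of converting the merged BCF to a compressed VCF, can be sketched as follows. The helpers glnexus_preset and merge_cmds and the output prefix are hypothetical names introduced for this example, and the bcftools/bgzip conversion is an assumed convenience step rather than part of the procedure described here. The script only prints the commands so they can be inspected before being run.

```shell
#!/usr/bin/env bash
# Sketch: pick the GLnexus preset for an assay type and print the
# merge and BCF-to-VCF conversion commands (dry run).
# ASSUMPTIONS: helper names and the output prefix are illustrative only.
set -euo pipefail

# Map an assay type to the preset name given in the text.
glnexus_preset() {
  case "$1" in
    WGS) echo "DeepVariantWGS" ;;
    WES) echo "DeepVariantWES" ;;
    *)   echo "unknown assay type: $1" >&2; return 1 ;;
  esac
}

# Print the two commands for a given assay type and output prefix.
merge_cmds() {
  local assay="$1" out_prefix="$2" preset
  preset="$(glnexus_preset "$assay")"
  # Joint-call all single-sample gVCFs with the optimized preset.
  echo "glnexus_cli --config ${preset} deepvariant.*.g.vcf.gz > ${out_prefix}.bcf"
  # Optional: convert the BCF output to a bgzip-compressed VCF.
  echo "bcftools view ${out_prefix}.bcf | bgzip -@ 4 -c > ${out_prefix}.vcf.gz"
}

merge_cmds WES cohort
```

Keeping the preset lookup in its own function means an unsupported assay type fails early with a message instead of producing a glnexus_cli call with an invalid --config value.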

In addition to parameter optimization, we modified the internals of both DeepVariant and GLnexus for better communication between the tools and to improve the joint-calling process. All modifications were incorporated into open-sourced DeepVariant (v0.8.0 or later) and GLnexus (v1.2.2 or later).
