2.2 DeepVariant and GLnexus

Taedong Yun; Helen Li; Pi-Chuan Chang; Michael F Lin; Andrew Carroll; Cory Y McLean


This method is extracted from research article: Bioinformatics, Jan 2021

Accurate, scalable cohort variant calls using DeepVariant and GLnexus

DOI: 10.1093/bioinformatics/btaa1081


We used DeepVariant v0.8.0 and the publicly released WGS model v0.8.0 to generate the single-sample variant calls for all samples in GIAB, CSER and PAGE. A single-line command to run DeepVariant in a pre-built Docker container is available in the public DeepVariant repository (https://github.com/google/deepvariant). The DeepVariant calls for the sample from the 1000 Genomes Project were generated using a custom model trained exclusively for the NovaSeq platform. Both the custom model and all single-sample DeepVariant calls generated by it are publicly available, as described in ‘Availability and Implementation’ in the Abstract.
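The one-step command in the repository has roughly the shape sketched below. This is an illustration, not the authors' exact invocation: the image name is a placeholder to be replaced with the one listed in the DeepVariant README for the release in use, and the /data mount, file names, sample name and shard count are all assumptions. The flags should also be checked against the run_deepvariant documentation for that release. The function only prints the command so that it can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch: print (do not run) a DeepVariant Docker command for one sample.
# DV_IMAGE is a placeholder, NOT a real image name; take it from the README.
DV_IMAGE="${DV_IMAGE:-IMAGE_FROM_DEEPVARIANT_README:0.8.0}"

# deepvariant_cmd SAMPLE DATA_DIR [SHARDS]
# Hypothetical helper; paths under /data are illustrative only.
deepvariant_cmd() {
  sample="$1"; data_dir="$2"; shards="${3:-8}"
  printf '%s ' docker run -v "${data_dir}:/data" "${DV_IMAGE}" \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/data/reference.fasta \
    --reads="/data/${sample}.bam" \
    --output_vcf="/data/deepvariant.${sample}.vcf.gz" \
    --output_gvcf="/data/deepvariant.${sample}.g.vcf.gz" \
    --num_shards="${shards}"
  printf '\n'
}

deepvariant_cmd "sampleA" "$PWD/data"
```

The gVCF output is the file that matters for joint calling, since GLnexus merges per-sample gVCFs; the naming pattern here is chosen only so that the outputs would match a glob such as deepvariant.*.g.vcf.gz.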

To merge and evaluate the multiple cohorts in parallel, we deployed the open-source GLnexus algorithm using Apache Beam (https://beam.apache.org) on Google internal compute clusters. The Beam-based pipeline removes the need to specify multi-threading on a single machine (as is done in the open-source GLnexus) and runs hundreds of different parameter configurations on thousands of CPUs. When run with the same parameters, the pipeline produces scientific results identical to those of the open-source GLnexus v1.2.2.

To ensure that our train/test dataset split is non-overlapping and to limit the computational cost of this study, we used separate individual chromosomes for pipeline optimization and evaluation. For consistency with previous studies (Lin et al., 2018; Poplin et al., 2018b), we used chromosome 2 to optimize the pipeline and computed final performance benchmarks separately on chromosome 20.

The optimized DeepVariant parameters from this study, which are discussed in detail in Results, are included in open-source GLnexus v1.2.2 and later versions as two presets: --config DeepVariantWGS for WGS and --config DeepVariantWES for WES. After installing the GLnexus command-line tool, users can merge DeepVariant calls with these optimized settings using a single command such as:

$ glnexus_cli --config DeepVariantWGS \
    deepvariant.*.g.vcf.gz > cohort.bcf
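For cohorts that mix WGS and WES batches, the choice between the two presets can be made explicit in a small wrapper. The sketch below is illustrative only: the preset names and the --config option come from the text above, while the function names and the "wgs"/"wes" labels are hypothetical. It prints the merge command rather than running it.

```shell
#!/bin/sh
# Sketch: map an assay label to the GLnexus preset named in the text.
# glnexus_preset and glnexus_merge_cmd are hypothetical helper names.
glnexus_preset() {
  case "$1" in
    wgs|WGS) echo "DeepVariantWGS" ;;
    wes|WES) echo "DeepVariantWES" ;;
    *) echo "unknown assay type: $1" >&2; return 1 ;;
  esac
}

# Print the merge command for the given assay label (fails on unknown labels).
glnexus_merge_cmd() {
  preset="$(glnexus_preset "$1")" || return 1
  echo "glnexus_cli --config ${preset} deepvariant.*.g.vcf.gz > cohort.bcf"
}

glnexus_merge_cmd wes
```

Failing on an unrecognized label, rather than falling back to a default preset, avoids silently merging exome data with whole-genome settings.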

In addition to parameter optimization, we modified the internals of both DeepVariant and GLnexus for better communication between the tools and to improve the joint-calling process. All modifications were incorporated into open-sourced DeepVariant (v0.8.0 or later) and GLnexus (v1.2.2 or later).

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
