Published: Apr 20, 2018 DOI: 10.21769/BioProtoc.2805 Views: 31445
Reviewed by: Beatrice Li
Abstract
Recent advances in Next Generation Sequencing (NGS) technologies have given impetus to the search for the causes of rare genetic disorders. Since 2005, in the aftermath of the Human Genome Project, efforts have been made to understand the rare variants underlying genetic disorders. Benchmarking a bioinformatics pipeline for whole exome sequencing (WES) has always been a challenge. In this protocol, we describe detailed steps from quality check to variant analysis using a WES pipeline, compare the results with publicly deposited NGS data, and survey the techniques, algorithms and software tools used at each step. We observed that variant calling on exome and whole genome datasets yields different metrics when the variant callers GATK and VarScan are compared under different parameters. Furthermore, we found that VarScan with strict parameters could recover 80-85% of high quality GATK SNPs from NGS data, with decreased sensitivity. We believe our protocol, in the form of a pipeline, can be used by researchers interested in performing WES analysis for genetic diseases and other clinical phenotypes.
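The 80-85% recovery figure can be read as the share of one caller's SNPs that also appear in the other caller's output. The following is a minimal sketch of that arithmetic. It assumes both call sets are available as VCF text lines and that a SNP is matched on chromosome, position, reference allele and alternate allele. The function names (`load_snps`, `recovered_fraction`) and the matching rule are illustrative assumptions, not the authors' method.

```python
def load_snps(vcf_lines):
    """Collect single-base substitutions from VCF data lines as a set of keys.

    Each key is (chrom, pos, ref, alt). Header lines, blank lines and
    records that are not single-base substitutions (e.g. indels) are skipped.
    """
    snps = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if len(ref) == 1 and len(alt) == 1 and alt != ".":
            snps.add((chrom, int(pos), ref, alt))
    return snps


def recovered_fraction(reference_snps, other_snps):
    """Fraction of reference_snps that are also present in other_snps."""
    if not reference_snps:
        raise ValueError("reference call set is empty")
    return len(reference_snps & other_snps) / len(reference_snps)
```

With the high quality GATK SNPs as `reference_snps` and the VarScan calls as `other_snps`, a return value between 0.80 and 0.85 would correspond to the range reported above.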
Keywords: Whole exome sequencing
Background
Next Generation Sequencing (NGS) technologies have paved the way for rapid sequencing of large numbers of samples. From the whole genome to the transcriptome to the exome, NGS has changed the way we look at nonspecific germline variants, somatic mutations and structural variants, besides identifying associations between a variant and human genetic disease (Singleton et al., 2011). This can help us understand complex genetic disorders, improve diagnosis and assess disease risk. The analysis of exome sequencing data to find variants, however, still poses multiple challenges. For example, there are several commercial and open source pipelines, but configuring them (Pabinger et al., 2014; Guo et al., 2015), in terms of benchmarking and optimization, is a time-consuming process. Across the steps, viz. quality check, alignment, recalibration, variant calling and variant annotation, one needs to reach consensus on the set of tools, so that the output of one tool can be fed as the input of the next (Stajich et al., 2002; Gentleman et al., 2004; Chang and Wang, 2012). While integrating, it would be appropriate to check and test the tools before reproducing and maintaining highly heterogeneous pipelines (Hwang et al., 2015). In this protocol, we discuss the steps of whole exome sequencing (WES) analysis and a pipeline to identify variants from exome sequence data. Our pipeline uses open source tools that cover each step from quality check to variant calling (see Software section).
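The requirement that each tool's output feed the next tool's input can be checked before any tool is run. The sketch below lists the five steps in the order named above and verifies that adjacent steps agree on a file format. The format labels (`fastq`, `bam`, `vcf`) and the names `Stage`, `validate_chain` and `plan_outputs` are assumptions made for illustration; they are not taken from the protocol's own scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # step name as used in the text
    consumes: str  # assumed input format
    produces: str  # assumed output format


# Step order follows the text; the formats are illustrative assumptions.
STAGES = [
    Stage("quality check", "fastq", "fastq"),
    Stage("alignment", "fastq", "bam"),
    Stage("recalibration", "bam", "bam"),
    Stage("variant calling", "bam", "vcf"),
    Stage("variant annotation", "vcf", "vcf"),
]


def validate_chain(stages):
    """Return (upstream, downstream) name pairs whose formats do not match."""
    mismatches = []
    for up, down in zip(stages, stages[1:]):
        if up.produces != down.consumes:
            mismatches.append((up.name, down.name))
    return mismatches


def plan_outputs(sample, stages):
    """Return one numbered output file name per stage for a sample."""
    plan = []
    for index, stage in enumerate(stages, start=1):
        slug = stage.name.replace(" ", "_")
        plan.append(f"{sample}.{index:02d}_{slug}.{stage.produces}")
    return plan
```

An empty list from `validate_chain(STAGES)` means every hand-off is consistent. Swapping a tool for one with a different output format shows up as a named mismatch rather than as a failure halfway through a run.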
Equipment
Software
All the software can be downloaded or used from the following locations:
Procedure
Category
Systems Biology > Genomics > Exome capture