NGS data preprocessing

Noam Hadar; Grisha Weintraub; Ehud Gudes; Shlomi Dolev; Ohad S Birk

This method is extracted from the research article published in Database (Oxford), June 2023:

GeniePool: genomic database with corresponding annotated samples based on a cloud data lake architecture

DOI: 10.1093/database/baad043

Publicly available human whole-exome sequencing (WES) raw data are retrieved from the NCBI Sequence Read Archive (SRA) by entering the following query in the SRA search bar:

((((illumina[Platform]) AND homo sapiens[Organism]) AND WXS[Strategy]) AND "Homo sapiens"[orgn:__txid9606] AND cluster_public[prop] AND "biomol dna"[Properties])

The resulting table lists the SRA run accessions together with the corresponding BioProject and BioSample IDs. Raw sequencing data are downloaded with the ‘prefetch’ command of sratoolkit 2.11.0 and extracted to FASTQ with ‘fastq-dump’; the ‘--split-files’ option is added for paired-end runs.

Reads are cleaned with Trimmomatic 0.39 (12). They are then aligned to hg38 (UCSC version) with BWA-MEM (13) and processed with Picard, following the GATK 4.2.2.0 pipeline (14). Variants are called with GATK ‘HaplotypeCaller’ to produce VCF files. Parallel hg19 VCFs are generated with Picard's ‘LiftoverVcf’ tool using UCSC's hg38ToHg19 chain file. Variant effects are annotated with SnpEff 5.0e (15).

The pipeline runs on our institutional high-performance computing infrastructure at Ben-Gurion University of the Negev. The output VCF files are gzipped and uploaded to the AWS S3 bucket.
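The per-run steps above can be sketched as a dry-run command planner. It only assembles and prints the command lines for one SRA run and executes nothing. This is a minimal sketch under stated assumptions, not the authors' code. The following are all assumptions, because the source does not specify them:

- SortSam and MarkDuplicates (invoked through the gatk wrapper) as the Picard steps.
- The read-group fields.
- The Trimmomatic SLIDINGWINDOW/MINLEN settings.
- The SnpEff database name "GRCh38.99".
- The reference and chain file paths.
- The bucket name.

Steps such as base quality score recalibration are omitted.

```python
# Dry-run sketch of the per-run WES preprocessing steps; nothing is executed.
# ASSUMPTION: all paths, file names, Trimmomatic settings, the SnpEff database
# and the S3 bucket below are hypothetical placeholders, not the authors' values.
import shlex
from dataclasses import dataclass
from typing import List, Optional

HG38 = "ref/hg38.fa"                     # UCSC hg38 FASTA (hypothetical path)
HG19 = "ref/hg19.fa"                     # UCSC hg19 FASTA (hypothetical path)
CHAIN = "ref/hg38ToHg19.over.chain"      # decompressed UCSC chain file (hypothetical path)
TRIMMOMATIC = "trimmomatic-0.39.jar"
SNPEFF_DB = "GRCh38.99"                  # hypothetical SnpEff database choice
BUCKET = "s3://example-genomes-bucket"   # hypothetical bucket name
TRIM_STEPS = ["SLIDINGWINDOW:4:20", "MINLEN:36"]  # hypothetical trimming settings


@dataclass
class Step:
    argv: List[str]
    stdout: Optional[str] = None  # file that standard output is redirected into


def plan_run(acc: str, sample: str, paired: bool) -> List[Step]:
    """Return the ordered command lines for one SRA run accession."""
    steps = [Step(["prefetch", acc])]
    if paired:
        # --split-files writes <acc>_1.fastq and <acc>_2.fastq
        steps.append(Step(["fastq-dump", "--split-files", acc]))
        trimmed = [f"{acc}_1.trim.fastq", f"{acc}_2.trim.fastq"]
        # Trimmomatic PE: 2 inputs, then paired/unpaired outputs for each mate;
        # unpaired survivors are not used further in this sketch.
        steps.append(Step(["java", "-jar", TRIMMOMATIC, "PE",
                           f"{acc}_1.fastq", f"{acc}_2.fastq",
                           trimmed[0], f"{acc}_1.unpaired.fastq",
                           trimmed[1], f"{acc}_2.unpaired.fastq", *TRIM_STEPS]))
    else:
        steps.append(Step(["fastq-dump", acc]))  # writes <acc>.fastq
        trimmed = [f"{acc}.trim.fastq"]
        steps.append(Step(["java", "-jar", TRIMMOMATIC, "SE",
                           f"{acc}.fastq", trimmed[0], *TRIM_STEPS]))

    # BWA-MEM alignment with a read group (HaplotypeCaller requires one);
    # bwa expands the literal "\t" sequences into tab characters.
    read_group = f"@RG\\tID:{acc}\\tSM:{sample}\\tPL:ILLUMINA"
    steps.append(Step(["bwa", "mem", "-R", read_group, HG38, *trimmed],
                      stdout=f"{acc}.sam"))
    # ASSUMPTION: Picard step = coordinate sort + duplicate marking with an index.
    steps.append(Step(["gatk", "SortSam", "-I", f"{acc}.sam",
                       "-O", f"{acc}.sorted.bam", "--SORT_ORDER", "coordinate"]))
    steps.append(Step(["gatk", "MarkDuplicates", "-I", f"{acc}.sorted.bam",
                       "-O", f"{acc}.dedup.bam",
                       "--METRICS_FILE", f"{acc}.dup_metrics.txt",
                       "--CREATE_INDEX", "true"]))
    hg38_vcf = f"{acc}.hg38.vcf.gz"
    steps.append(Step(["gatk", "HaplotypeCaller", "-R", HG38,
                       "-I", f"{acc}.dedup.bam", "-O", hg38_vcf]))
    # Parallel hg19 call set via Picard LiftoverVcf.
    hg19_vcf = f"{acc}.hg19.vcf.gz"
    steps.append(Step(["gatk", "LiftoverVcf", "-I", hg38_vcf, "-O", hg19_vcf,
                       "--CHAIN", CHAIN,
                       "--REJECT", f"{acc}.hg19.rejected.vcf.gz", "-R", HG19]))
    # SnpEff writes the annotated VCF to standard output.
    ann_vcf = f"{acc}.hg38.ann.vcf"
    steps.append(Step(["java", "-jar", "snpEff.jar", SNPEFF_DB, hg38_vcf],
                      stdout=ann_vcf))
    steps.append(Step(["gzip", ann_vcf]))
    # Upload the gzipped outputs (ASSUMPTION: annotated hg38 + lifted hg19).
    for vcf in (ann_vcf + ".gz", hg19_vcf):
        steps.append(Step(["aws", "s3", "cp", vcf, f"{BUCKET}/{acc}/{vcf}"]))
    return steps


if __name__ == "__main__":
    for step in plan_run("SRR0000000", "SAMN00000000", paired=True):
        redirect = f" > {step.stdout}" if step.stdout else ""
        print(shlex.join(step.argv) + redirect)
```

Running the script prints the twelve commands for a hypothetical paired-end run. Each line could then be executed on an HPC node or wrapped in a scheduler job script. Keeping command construction separate from execution lets a plan be inspected before any compute is spent on it.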

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.