Variant calling

Claudio Urra; Dayan Sanhueza; Catalina Pavez; Patricio Tapia; Gerardo Núñez-Lillo; Andrea Minio; Matthieu Miossec; Francisca Blanco-Herrera; Felipe Gainza; Alvaro Castro; Dario Cantu; Claudio Meneses



This method is extracted from research article: G3 (Bethesda), Jul 2023

Identification of grapevine clones via high-throughput amplicon sequencing: a proof-of-concept study

DOI: 10.1093/g3journal/jkad145


Raw sequence quality was assessed with FastQC v0.11.7 (Andrews 2010), and coverage was then standardized to 20× by subsampling with seqtk v1.3-r106 (https://github.com/lh3/seqtk): 137,372,000 reads were kept per clone genome in CH, 119,020,000 in SB, 124,600,000 in CS, and 103,685,230 in M. Reads were trimmed with Trim Galore v0.5.0 at a PHRED quality threshold of Q > 25 (Krueger 2012). Each clone genome was mapped to the primary assembly of its cultivar's genome with bwa-mem v0.7.17-r1188 (Li et al. 2008). Before variant calling, the mapped reads were sorted with Samtools v1.9 (Li et al. 2009) and prepared with Picard-tools v2.16.1 using the AddOrReplaceReadGroups, MarkDuplicates, and CleanSam commands (https://broadinstitute.github.io/picard/).
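Read counts for a coverage target follow from the usual relation: reads ≈ coverage × genome size / read length. The Python sketch below is illustrative only. The 500-Mbp genome size and 150-bp read length are hypothetical, since neither value is given in this method, and each read is treated as one sequenced fragment. The helper that builds a `seqtk sample` command line assumes that subcommand was used for the subsampling; the method names the tool but not the subcommand or the seed.

```python
def reads_for_coverage(target_coverage, genome_size_bp, read_length_bp):
    """Number of reads needed to reach target_coverage, rounded up."""
    if genome_size_bp <= 0 or read_length_bp <= 0:
        raise ValueError("genome size and read length must be positive")
    total_bases = target_coverage * genome_size_bp
    # Ceiling division without floating point.
    return -(-total_bases // read_length_bp)


def seqtk_sample_command(fastq_path, n_reads, seed=100):
    """Build the argv for `seqtk sample -s<seed> <fastq> <n_reads>`.

    The seed is an assumption. Using the same seed for both files of a
    read pair keeps the mates in sync.
    """
    return ["seqtk", "sample", f"-s{seed}", fastq_path, str(n_reads)]


# Hypothetical inputs: 20x coverage of a 500-Mbp assembly with 150-bp reads.
n_reads = reads_for_coverage(20, 500_000_000, 150)
print(n_reads)  # 66666667
print(" ".join(seqtk_sample_command("reads_R1.fastq", n_reads)))
```

The command is returned as an argument list rather than executed, so it can be inspected first and then passed to `subprocess.run` with standard output redirected to the subsampled FASTQ file.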

We used GATK HaplotypeCaller v4.0.9.0 (McKenna et al. 2010) to call variants in each clone genome against the primary assembly of its cultivar: the assemblies of Zhou et al. (2019) for SB and CH clones, that of Chin et al. (2016) for CS clones, and that of Massonnet et al. (2020) for M clones. Two variant calling protocols were used: one calling each sample individually, and one with a joint genotyping step combining all samples, following the GATK best practices (available at https://gatk.broadinstitute.org). A variant quality filter of Q > 100 was applied in both protocols. The genome-wide distribution of variants detected in all clones was evaluated with a Circos plot (Krzywinski et al. 2009). Variant and gene densities were calculated in 100-kbp windows for plotting. Only variants consistently present in each clone's replicates were used for principal component analysis (PCA). To identify clone-specific variants, we extracted variants that were present in all replicates of a clone and absent in all the other samples.
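The replicate-consistency and clone-specificity rules reduce to set operations. The following Python sketch is not the authors' code. It assumes each sample's calls have already been parsed into (chrom, pos, ref, alt, qual) tuples, reads "Q > 100" as a threshold on the VCF QUAL value, and uses made-up sample and clone names.

```python
QUAL_THRESHOLD = 100  # assumption: "Q > 100" is applied to the VCF QUAL column


def passing_variants(calls, min_qual=QUAL_THRESHOLD):
    """Return the (chrom, pos, ref, alt) keys whose QUAL exceeds min_qual."""
    return {(c, p, r, a) for c, p, r, a, q in calls if q > min_qual}


def consistent_variants(samples_by_clone, variants_by_sample):
    """For each clone, keep variants present in every one of its replicates."""
    consistent = {}
    for clone, samples in samples_by_clone.items():
        sets = [variants_by_sample[s] for s in samples]
        consistent[clone] = set.intersection(*sets) if sets else set()
    return consistent


def clone_specific_variants(samples_by_clone, variants_by_sample):
    """Variants in all replicates of a clone and absent from all other samples."""
    consistent = consistent_variants(samples_by_clone, variants_by_sample)
    specific = {}
    for clone, samples in samples_by_clone.items():
        others = set()
        for sample, variants in variants_by_sample.items():
            if sample not in samples:
                others |= variants
        specific[clone] = consistent[clone] - others
    return specific


# Hypothetical toy data: two clones with two replicates each.
raw_calls = {
    "cloneA_rep1": [("chr1", 100, "A", "G", 250.0), ("chr1", 200, "C", "T", 300.0),
                    ("chr2", 50, "G", "A", 80.0)],
    "cloneA_rep2": [("chr1", 100, "A", "G", 180.0), ("chr1", 200, "C", "T", 150.0)],
    "cloneB_rep1": [("chr1", 200, "C", "T", 400.0), ("chr2", 75, "T", "C", 500.0)],
    "cloneB_rep2": [("chr1", 200, "C", "T", 120.0), ("chr2", 75, "T", "C", 90.0)],
}
samples_by_clone = {
    "cloneA": ["cloneA_rep1", "cloneA_rep2"],
    "cloneB": ["cloneB_rep1", "cloneB_rep2"],
}
variants_by_sample = {s: passing_variants(c) for s, c in raw_calls.items()}
specific = clone_specific_variants(samples_by_clone, variants_by_sample)
print(specific["cloneA"])  # {('chr1', 100, 'A', 'G')}
print(specific["cloneB"])  # set()
```

In the toy data, chr1:200 is shared by both clones and so is specific to neither. chr2:75 passes the filter in only one cloneB replicate, so it is dropped as inconsistent.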

PCA plots were generated in R v3.5.3 with the packages factoextra v1.0.5 and FactoMineR v1.4.1. Functional effects of the variants were predicted with SnpEff v4.3t (Cingolani et al. 2012).
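A PCA of this kind takes a samples × variants table as input. The method does not say how variants were encoded, so the Python sketch below assumes simple presence/absence (1/0) coding and hypothetical sample names. It only builds the table that a PCA routine would consume, and it drops variants shared by every sample because they contribute no variance.

```python
def presence_absence_matrix(variants_by_sample):
    """Build a samples x variants 0/1 table from per-sample variant sets.

    Returns (samples, variants, matrix), where matrix[i][j] is 1 if
    samples[i] carries variants[j]. Variants present in every sample
    are omitted because a constant column carries no variance.
    """
    samples = sorted(variants_by_sample)
    all_variants = sorted(set().union(*variants_by_sample.values()))
    informative = [
        v for v in all_variants
        if sum(v in variants_by_sample[s] for s in samples) < len(samples)
    ]
    matrix = [
        [1 if v in variants_by_sample[s] else 0 for v in informative]
        for s in samples
    ]
    return samples, informative, matrix


# Hypothetical toy input: variant keys are (chrom, pos) pairs.
toy = {
    "cloneA_rep1": {("chr1", 100), ("chr1", 200)},
    "cloneA_rep2": {("chr1", 100), ("chr1", 200)},
    "cloneB_rep1": {("chr1", 200), ("chr2", 75)},
}
samples, variants, matrix = presence_absence_matrix(toy)
for name, row in zip(samples, matrix):
    print(name, row)
```

The rows could then be written to a delimited file and read into R for the PCA step itself.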

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
