2.1. SNP selection and bioinformatics

Ellika Faust
Eeva Jansson
Carl André
Kim Tallaksen Halvorsen
Geir Dahle
Halvor Knutsen
María Quintela
Kevin A. Glover

To find discriminant and divergent SNPs for identifying nonlocal corkwing wrasse, we used 2b‐RAD sequence data from western Norway (import region) and Skagerrak‐Kattegat (export region). Western sample sequences were taken from Faust et al. (2018) and comprised 40 individuals from Austevoll, the only region where the authors did not detect any escapees or potential hybrids. As a reference for the exported fish, we used 120 individuals from three locations in the Skagerrak‐Kattegat (Risør, Sandefjord and Kungsbacka). All raw sequences are available in NCBI's Sequence Read Archive (BioProject PRJNA702627). The unpublished sequences were sampled and processed in the same way as those from Austevoll, using a modified version of 2b‐RAD (Wang et al., 2012); the full procedure is described in Faust et al. (2018). All sequences were mapped to the published Symphodus melops genome (Mattingsdal, 2017) using bowtie2 (Langmead & Salzberg, 2012). Variant calling followed the GATK pipeline (McKenna et al., 2010) using UnifiedGenotyper, after realigning sequences around indels and recalibrating base quality scores (BQSR). Variant quality scores were recalibrated (VQSR) using sites that were identical across technical replicates as the training set. To ensure high confidence in genotypes and SNPs, we filtered sites with vcftools (Danecek et al., 2011), removing those with low quality by depth (QD < 2.0), strong strand bias (FS > 60, SOR > 2) or low mapping quality (MQ < 40). Sites with more than 10% missing data or with a fraction of heterozygotes above 0.5 (possible lumped paralogs) were also removed, leaving a total of 10,747 putative SNPs.
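The site-level filters described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the vcftools commands used in the study. The function names, the genotype coding (0, 1, 2 for the number of alternate alleles, None for a missing call) and the choice to compute the heterozygote fraction over called genotypes only are assumptions made for this sketch. The annotation thresholds are read as exclusion criteria.

```python
from typing import Optional, Sequence


def passes_annotation_filters(qd: float, fs: float, sor: float, mq: float) -> bool:
    """Return True if a site survives the annotation-based filters.

    Assumption: the thresholds quoted in the text mark sites to remove
    (QD < 2.0, FS > 60, SOR > 2, MQ < 40).
    """
    return qd >= 2.0 and fs <= 60.0 and sor <= 2.0 and mq >= 40.0


def passes_genotype_filters(
    genotypes: Sequence[Optional[int]],
    max_missing: float = 0.10,
    max_het: float = 0.5,
) -> bool:
    """Return True if a site survives the missing-data and heterozygosity filters.

    genotypes: one entry per individual; 0/1/2 = alternate allele count,
    None = missing call (hypothetical coding chosen for this sketch).
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n_total == 0 or not called:
        return False  # nothing to evaluate, so drop the site
    missing_fraction = (n_total - len(called)) / n_total
    # Assumption: heterozygote fraction is taken over called genotypes only.
    het_fraction = sum(1 for g in called if g == 1) / len(called)
    return missing_fraction <= max_missing and het_fraction <= max_het
```

A site with six heterozygotes among ten called genotypes would be dropped as a possible lumped paralog, while a site with one missing call in ten would be kept.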

To select the most divergent SNPs between western and Skagerrak‐Kattegat individuals, we conducted pairwise comparisons between Austevoll (western Norway) and each of the three locations in Skagerrak‐Kattegat. A total of 387 SNPs, distributed over 270 contigs, ranked among the 500 highest FST values in all three pairwise comparisons. Reading and converting between file formats was done using vcfR (Knaus & Grünwald, 2016, 2017) and radiator (Gosselin, 2019), and the package diveRsity (Keenan et al., 2013) was used to calculate pairwise FST.
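The selection rule above (keep only SNPs that rank among the top 500 FST values in every pairwise comparison) amounts to intersecting three ranked lists. The following is a minimal sketch of that idea, assuming per-SNP FST values have already been computed elsewhere. The function names, the dictionary layout and the tie-breaking by SNP identifier are hypothetical choices for illustration and are not taken from the study.

```python
from typing import Dict, Set


def top_n_ids(fst: Dict[str, float], n: int) -> Set[str]:
    """Return the identifiers of the n SNPs with the highest FST.

    Ties are broken by SNP identifier so the result is deterministic
    (an assumption of this sketch).
    """
    ranked = sorted(fst, key=lambda snp: (-fst[snp], snp))
    return set(ranked[:n])


def shared_top_snps(fst_by_comparison: Dict[str, Dict[str, float]], n: int = 500) -> Set[str]:
    """Return SNPs that are in the top n of every pairwise comparison."""
    top_sets = [top_n_ids(fst, n) for fst in fst_by_comparison.values()]
    if not top_sets:
        return set()
    return set.intersection(*top_sets)


# Toy example with four SNPs and n = 2 (values are invented for illustration).
toy = {
    "Austevoll_vs_Risor": {"s1": 0.9, "s2": 0.8, "s3": 0.1, "s4": 0.05},
    "Austevoll_vs_Sandefjord": {"s1": 0.7, "s2": 0.2, "s3": 0.6, "s4": 0.0},
    "Austevoll_vs_Kungsbacka": {"s1": 0.5, "s2": 0.45, "s3": 0.4, "s4": 0.3},
}
shared = shared_top_snps(toy, n=2)
```

Requiring a SNP to appear in all three lists favours markers that separate Austevoll from the whole export region rather than from a single sampling location.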

SNPs displaying FST values >0.4 (183 SNPs in total) were used for SNP locus primer design, which resulted in four assays with a total of 106 SNPs. Primer design, amplification and genotype calling were based on the Agena MassARRAY iPLEX Platform, as described by Gabriel et al. (2009). The 106 selected SNP loci were analysed in four assay groups (Table S1). The accuracy, efficiency and power of the four assays to correctly identify escaping individuals from the two populations and their potential offspring were estimated using the R package HYBRIDDETECTIVE (Wringe et al., 2017a). Genotype frequencies from the reference samples in Austevoll and Risør, with 40 individuals each, were used to simulate three replicates of three independent data sets with pure parents (Pure1 and Pure2), first‐ and second‐generation hybrids (F1 and F2), and backcrosses between F1 and pure parents (BC1 and BC2). The simulated data sets contained 288 individuals and were analysed using the R package parallelnewhybrid (Wringe et al., 2017b) and NEWHYBRIDS v. 1.1 (Anderson & Thompson, 2002), which estimates the posterior probability of each individual belonging to one of the six hybrid classes. The analysis was run with default priors and genotype proportions, with a burn‐in period of 50,000 iterations followed by 300,000 MCMC sweeps. Where MCMC chains did not converge, simulations were re‐analysed. Power was estimated as the product of efficiency (correctly assigned individuals over the known individuals per class) and accuracy (correctly assigned individuals over individuals assigned to that class), as described in Wringe et al. (2017a). Simulations demonstrated high efficiency (>94%), accuracy (>98%) and power (>94%) to detect individuals from all six hybrid classes (Figure S1).
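The definitions of efficiency, accuracy and power given above can be written out directly. The sketch below is not the HYBRIDDETECTIVE implementation; it only applies the three ratios to a list of known classes and a list of assigned classes. The function name, the input format and the convention of reporting an accuracy of 0.0 when no individual was assigned to a class are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def class_metrics(
    true_classes: Sequence[str], assigned_classes: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Compute efficiency, accuracy and power for each known class.

    efficiency = correctly assigned / known individuals in the class
    accuracy   = correctly assigned / individuals assigned to the class
    power      = efficiency * accuracy
    """
    if len(true_classes) != len(assigned_classes):
        raise ValueError("true and assigned class lists must have equal length")

    known = Counter(true_classes)
    assigned = Counter(assigned_classes)
    correct = Counter(t for t, a in zip(true_classes, assigned_classes) if t == a)

    metrics = {}
    for cls in known:
        efficiency = correct[cls] / known[cls]
        # Assumption: accuracy is reported as 0.0 if nothing was assigned to the class.
        accuracy = correct[cls] / assigned[cls] if assigned[cls] else 0.0
        metrics[cls] = {
            "efficiency": efficiency,
            "accuracy": accuracy,
            "power": efficiency * accuracy,
        }
    return metrics


# Toy example: one Pure1 individual is misassigned as F1 (invented data).
true_toy = ["Pure1"] * 4 + ["F1"] * 4
assigned_toy = ["Pure1", "Pure1", "Pure1", "F1", "F1", "F1", "F1", "F1"]
toy_metrics = class_metrics(true_toy, assigned_toy)
```

In the toy example the misassigned individual lowers the efficiency of Pure1 (three of four known individuals recovered) and the accuracy of F1 (four of five assignments correct), which shows why the two ratios are combined into a single power value.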
