2.5. DNA extraction and genotyping

Suzanne J. Kelson
Michael R. Miller
Tasha Q. Thompson
Sean M. O’Rourke
Stephanie M. Carlson

We conducted genetic analyses on all of the tissue samples collected in 2014. For the 2015–2017 samples, we analysed a subset of ~50% of the samples by including every other study pool in the final analysis. We subsampled in the later years because preliminary analyses of the 2014 data showed that results were consistent when a smaller number of samples was used. In total, we analysed n = 4,517 fish. For analyses of changes in genotype frequencies, we focused on the n = 3,081 fish that were captured systematically during electrofishing surveys. A breakdown of these fish by year, location, sample pool and age class is given in Table 1. Raw sequence data are available at the NCBI SRA (accession PRJNA599015).

Table 1. Number of pools and fish included in genetic samples in 2014–2017, by sample location

We conducted DNA extractions and restriction site‐associated DNA capture (RAD capture, or RAPTURE) using the methods and bait sets described in Ali et al. (2016). Briefly, DNA was extracted from caudal fin tissue using a bead‐based protocol, and SbfI RAD libraries were prepared and captured through hybridization with 500 unique RAPTURE baits distributed across all 29 chromosomes in the O. mykiss genome. We sequenced the libraries on an Illumina HiSeq using paired‐end 100‐bp reads (2014 samples) or 150‐bp reads (2015–2017 samples). We demultiplexed the sequence data using custom scripts (Ali et al., 2016) and aligned sequences to a rainbow trout genome assembly (https://www.ncbi.nlm.nih.gov/assembly/GCF_002163495.1/) using the MEM algorithm of the Burrows–Wheeler Aligner (bwa; Li & Durbin, 2009) with standard parameters. We then used samtools (Li et al., 2009) to filter alignments (removing unmapped reads, supplementary alignments and nonprimary alignments, and retaining only properly paired reads), sort alignments, remove PCR (polymerase chain reaction) duplicates (using both samtools [rmdup] and Picard tools [MarkDuplicates], https://broadinstitute.github.io/picard/) and index the binary alignment map (BAM) files (see Table S2 for the number of reads retained at each step).
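
The alignment filters described above correspond to bits of the SAM FLAG field. The following is a minimal sketch of that filtering logic, not the commands actually run for this study (which are not given in the text). The function names `keep_alignment` and `samtools_view_args` and the file names are hypothetical; the flag values come from the SAM format specification, and `-b`, `-f`, `-F` and `-o` are standard `samtools view` options.

```python
# SAM FLAG bits (from the SAM format specification).
PROPER_PAIR = 0x2       # read mapped in a proper pair
UNMAPPED = 0x4          # read unmapped
SECONDARY = 0x100       # nonprimary (secondary) alignment
SUPPLEMENTARY = 0x800   # supplementary alignment

REQUIRE_MASK = PROPER_PAIR                            # bits that must be set
EXCLUDE_MASK = UNMAPPED | SECONDARY | SUPPLEMENTARY   # bits that must be unset


def keep_alignment(flag: int) -> bool:
    """Return True if an alignment with this FLAG passes the filters in the text."""
    has_required = (flag & REQUIRE_MASK) == REQUIRE_MASK
    has_excluded = (flag & EXCLUDE_MASK) != 0
    return has_required and not has_excluded


def samtools_view_args(in_bam: str, out_bam: str) -> list:
    """Build an equivalent `samtools view` call (hypothetical helper; not executed here)."""
    return ["samtools", "view", "-b",
            "-f", str(REQUIRE_MASK),
            "-F", str(EXCLUDE_MASK),
            "-o", out_bam, in_bam]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(samtools_view_args("sample.bam", "sample.filtered.bam")))
```

Expressing the filter as a required mask and an excluded mask keeps the rule in one place, so the same two integers can be reused in a pure-Python check and on the command line.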

We used Analysis of Next Generation Sequencing Data (angsd) for all analyses of RAPTURE sequencing data (Korneliussen, Albrechtsen, & Nielsen, 2014). We inferred major and minor alleles from genotype likelihoods at sites with a high probability of being variable (SNP p < 1e−6). We estimated allele frequencies assuming a fixed major but unknown minor allele (Kim et al., 2011). Sites were included if they had a minor allele frequency >0.05 and had data in at least 50% of the samples. From these sites, we created two sets of genotype files: one for PCAs, which could include the maximum number of samples without bias from differences in data quality among individuals, and another for descriptive genetics. For the first set, we used a single‐read sampling approach in which, for each individual, a single base was randomly sampled at each site passing the above filters and used for downstream analyses. This approach produces an “identity by state” (IBS) matrix, mitigates the effects of differences in coverage (number of sequence reads) between individuals and facilitates the use of samples with low coverage, thus allowing more samples to be included in downstream analyses than is possible with other approaches (see also Kelson, Miller, Thompson, O'Rourke, & Carlson, 2019). We applied a discriminant analysis of principal components (DAPC; Jombart, Devillard, & Balloux, 2010) to the IBS matrix, using only SNPs on Omy05 (n = 415 SNPs), to assign individuals to migratory, heterozygous or resident genotype groups (described further in Kelson et al., 2019). For the second set, we called genotypes for all SNPs located on the 500 RAPTURE baits (i.e., SNPs that were enriched during sequencing and therefore had relatively high coverage) using a uniform prior and a posterior probability cutoff of 0.95; we refer to these as “called genotypes” (n = 473 SNPs). We used the called genotypes to calculate metrics of genetic diversity (described below).
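
To illustrate the single‐read sampling idea, the sketch below draws one read per individual per site and then computes a pairwise IBS distance (the proportion of shared sites at which the sampled bases differ). It is a simplified illustration in plain Python, not the angsd implementation used in the study. The input layout (per‐site counts of major and minor allele reads), the function names and the toy data are all assumptions made for this example.

```python
import random
from itertools import combinations


def sample_single_reads(read_counts, rng):
    """Sample one read per individual per site.

    read_counts[i][s] is (n_major, n_minor) for individual i at site s.
    Returns 0 (major allele), 1 (minor allele) or None (no reads) per site.
    """
    sampled = []
    for indiv in read_counts:
        row = []
        for n_major, n_minor in indiv:
            total = n_major + n_minor
            if total == 0:
                row.append(None)  # missing data at this site
            else:
                # One read drawn uniformly at random: every individual
                # contributes exactly one base, whatever its coverage.
                row.append(0 if rng.randrange(total) < n_major else 1)
        sampled.append(row)
    return sampled


def ibs_distance_matrix(sampled):
    """Pairwise proportion of mismatching bases over sites with data in both."""
    n = len(sampled)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = [(a, b) for a, b in zip(sampled[i], sampled[j])
                  if a is not None and b is not None]
        if shared:
            d = sum(a != b for a, b in shared) / len(shared)
        else:
            d = float("nan")  # no overlapping sites
        dist[i][j] = dist[j][i] = d
    return dist


# Toy data (hypothetical): three individuals, three sites.
counts = [
    [(12, 0), (0, 9), (3, 0)],  # high coverage
    [(1, 0), (0, 1), (0, 0)],   # low coverage, one site missing
    [(0, 2), (4, 0), (1, 0)],
]
sampled = sample_single_reads(counts, random.Random(1))
dist = ibs_distance_matrix(sampled)
```

In this toy data set the first two individuals carry the same alleles at every shared site, so their distance is 0 even though their coverage differs by an order of magnitude. That is the property that lets low‐coverage samples be retained.
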
Genotype data sets used for analyses in this paper are available on Dryad (Kelson et al., 2020).
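
The “called genotypes” rule described above (uniform prior, posterior probability cutoff of 0.95) can be sketched as follows. This is a minimal illustration and not the angsd code: the function name `call_genotype`, the three‐genotype input layout and the choice to treat the cutoff as inclusive (≥ 0.95) are assumptions made for this example.

```python
def call_genotype(likelihoods, cutoff=0.95):
    """Call a genotype from three genotype likelihoods under a uniform prior.

    likelihoods: (L(major/major), L(major/minor), L(minor/minor)).
    Returns the number of minor alleles (0, 1 or 2), or None if no genotype
    reaches the posterior cutoff (treated as missing).
    """
    total = sum(likelihoods)
    if total <= 0:
        return None  # no information at this site
    # With a uniform prior, the posterior is the normalized likelihood.
    posteriors = [lik / total for lik in likelihoods]
    best = max(range(3), key=lambda g: posteriors[g])
    return best if posteriors[best] >= cutoff else None


# Hypothetical likelihoods for illustration.
confident = call_genotype((0.98, 0.019, 0.001))  # posterior 0.98 for major/major
ambiguous = call_genotype((0.6, 0.4, 0.0))       # best posterior only 0.6
```

Genotypes that fall below the cutoff are left uncalled rather than forced to the most likely state. This is why the approach suits the relatively high‐coverage SNPs on the baits.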
