2.3. Read Alignment and Filtering

Alexandra K. Fraik; John R. McMillan; Martin Liermann; Todd Bennett; Michael L. McHenry; Garrett J. McKinney; Abigail H. Wells; Gary Winans; Joanna L. Kelley; George R. Pess; Krista M. Nichols

This method is extracted from research article: Genes (Basel), Jan 2021

The Impacts of Dam Construction and Removal on the Genetics of Recovering Steelhead (Oncorhynchus mykiss) Populations across the Elwha River Watershed

DOI: 10.3390/genes12010089

Data quality filtering and genotyping were conducted using the STACKS pipeline [63,64]. The process_radtags program from STACKS v. 1.44 was used to process the forward and reverse reads from each sequencing lane: samples were de-multiplexed, and reads were quality filtered and trimmed to 85 bases. Because the SbfI site is found in both forward and reverse reads, process_radtags was run independently on the forward and reverse reads from each sequencing lane to identify the read end carrying the SbfI site and barcode. Reads from the SbfI-site end were then combined into the R1 file for each sample, and reads from the randomly fragmented end were concatenated into the R2 file for each sample. Following quality filtering, PCR clones were removed using the clone_filter script from STACKS. Quality- and clone-filtered reads from each sample were aligned to the Oncorhynchus mykiss reference genome (NCBI: GCA_002163495.1 [39]) using bwa with default parameters [65]. Samtools was used to sort and index the aligned reads from bwa and to remove unmapped reads and improperly paired reads [66]. The resulting bam files were genotyped in STACKS (v. 2.2) using gstacks with default parameters. The populations module in STACKS was used to collate genotypes across samples and populations, keeping only loci genotyped in at least 65% of individuals in each population. Because we grouped all individuals into one population, this filter retained only loci genotyped in at least 65% of all individuals. We used VCFtools [67] to filter the merged output file from the populations module to remove non-biallelic sites, indels, SNPs with a minor allele frequency <1%, SNPs with >10% missing data, and individuals with >20% missing data. Due to the presence of highly similar paralogs from the salmonid-specific genome duplication [68], we used HDplot to identify and remove possible paralogs using a combination of heterozygosity and read-ratio deviation [69].
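
As a rough illustration of the two statistics used in the paralog screen, the sketch below computes, for a single SNP, the proportion of heterozygous individuals (H) and a z-score for how far allele read counts in heterozygotes deviate from the expected 1:1 ratio (D). It is not the HDplot code used in the study. The function names, the input layout, and the cutoff values are hypothetical placeholders, since the thresholds applied are not reported here.

```python
import math


def hdplot_stats(genotypes, ref_depths, alt_depths):
    """Return (H, D) for one SNP.

    genotypes  : per-individual calls, 0 = hom. ref, 1 = het, 2 = hom. alt,
                 None = missing (assumed encoding, not taken from the source)
    ref_depths : per-individual read counts supporting the reference allele
    alt_depths : per-individual read counts supporting the alternate allele
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None, None

    het_idx = [i for i, g in enumerate(genotypes) if g == 1]
    h = len(het_idx) / len(called)  # proportion of heterozygotes

    # Pool allele-specific reads across heterozygous individuals only.
    ref_reads = sum(ref_depths[i] for i in het_idx)
    alt_reads = sum(alt_depths[i] for i in het_idx)
    total = ref_reads + alt_reads
    if total == 0:
        return h, None

    # Under a 1:1 expectation, reference reads are Binomial(total, 0.5),
    # so the standard deviation is sqrt(total * 0.5 * 0.5).
    std = math.sqrt(total * 0.5 * 0.5)
    d = (ref_reads - total / 2) / std
    return h, d


def is_putative_paralog(h, d, max_h=0.6, max_abs_d=7.0):
    """Flag a SNP when H or |D| exceeds a cutoff.

    max_h and max_abs_d are illustrative placeholders only; suitable values
    are normally chosen by inspecting the H-versus-D plot for the data set.
    """
    if h is None:
        return False
    if h > max_h:
        return True
    return d is not None and abs(d) > max_abs_d


# Toy example: four called individuals, two heterozygotes with balanced reads.
h, d = hdplot_stats([0, 1, 1, 2, None], [10, 5, 5, 0, 0], [0, 5, 5, 10, 0])
flagged = is_putative_paralog(h, d)
```

Singleton loci are expected to show moderate H and D near zero, while collapsed paralogs tend to show excess heterozygosity, skewed read ratios, or both.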
Post-filtering, we retained 1125 individual Oncorhynchus mykiss (567 individuals from 15 sampling sites pre-dam removal and 558 individuals from four sampling sites post-dam removal) and 71,320 SNPs (Table 1). All further analyses, except for principal components analyses, were conducted on pre-dam and post-dam removal sample sets separately.
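
The missing-data and allele-frequency thresholds behind these retained counts can be sketched on a small genotype matrix. This is only an illustration of the filtering logic, not the VCFtools commands the authors ran. The function name and the input layout are hypothetical, and the order of operations (SNP filters first, then individual filters on the retained SNPs) is an assumption, since the order is not specified.

```python
def filter_snps_and_individuals(genotypes, min_maf=0.01,
                                max_snp_missing=0.10, max_ind_missing=0.20):
    """Apply MAF and missing-data filters to biallelic SNP genotypes.

    genotypes : dict mapping sample name -> list of alternate-allele counts
                (0, 1, 2) per SNP, with None for a missing call
                (hypothetical layout chosen for this sketch)
    Returns (kept_snp_indices, kept_sample_names).
    """
    samples = list(genotypes)
    if not samples:
        return [], []
    n_snps = len(genotypes[samples[0]])

    kept_snps = []
    for j in range(n_snps):
        observed = [genotypes[s][j] for s in samples
                    if genotypes[s][j] is not None]
        missing_rate = 1 - len(observed) / len(samples)
        # Drop SNPs with more than the allowed fraction of missing calls.
        if not observed or missing_rate > max_snp_missing:
            continue
        # Minor allele frequency from alternate-allele counts (diploid).
        alt_freq = sum(observed) / (2 * len(observed))
        maf = min(alt_freq, 1 - alt_freq)
        if maf < min_maf:
            continue
        kept_snps.append(j)

    kept_samples = []
    for s in samples:
        if not kept_snps:
            break
        n_missing = sum(genotypes[s][j] is None for j in kept_snps)
        # Drop individuals with too many missing calls among retained SNPs.
        if n_missing / len(kept_snps) <= max_ind_missing:
            kept_samples.append(s)

    return kept_snps, kept_samples


# Toy data: thresholds are loosened for the SNP filter because four samples
# cannot meaningfully express a 10% missing-data cutoff.
toy = {
    "s1": [0, 1, None, 0],
    "s2": [1, 1, None, 0],
    "s3": [2, None, 0, 0],
    "s4": [0, 1, 1, 0],
}
snps, inds = filter_snps_and_individuals(toy, max_snp_missing=0.25)
```

With the default arguments the function uses the thresholds stated in the text (minor allele frequency of at least 1%, at most 10% missing data per SNP, at most 20% missing data per individual).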

Table 1. All sequenced samples arranged by population (AD, ID, SBLR, and BD) and sampling site, ordered from upstream (top) to downstream (bottom) and divided by relative anadromous barrier location.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
