Data quality filtering and genotyping were conducted with the STACKS pipeline [63,64]. The process_radtags program from STACKS v. 1.44 was applied to the forward and reverse reads from each sequencing lane to demultiplex samples, quality-filter reads, and trim reads to 85 bases. Because the SbfI restriction site can occur in either the forward or the reverse read of a pair, process_radtags was run separately on the forward and reverse reads from each lane to identify which end carried the barcode and SbfI site. Reads containing the SbfI site were then combined into a single R1 file per sample, and reads from the randomly sheared end were concatenated into a single R2 file per sample. After quality filtering, PCR duplicates (clones) were removed with the clone_filter program from STACKS.
Quality- and clone-filtered reads from each sample were aligned to the Oncorhynchus mykiss reference genome (NCBI: GCA_002163495.1 [39]) with bwa [65] using default parameters. Samtools [66] was used to sort and index the aligned reads and to remove unmapped reads and improperly paired reads. The resulting BAM files were genotyped with gstacks in STACKS v. 2.2 using default parameters. The STACKS populations program was then used to collate genotypes across samples and populations, retaining only loci genotyped in at least 65% of individuals in each population. Because all individuals were grouped into a single population, this filter retained only loci genotyped in at least 65% of all individuals.
We used VCFtools [67] to filter the merged output of the populations program, removing non-biallelic sites, indels, sites with a minor allele frequency <1%, SNPs with >10% missing data, and individuals with >20% missing data. Because the salmonid-specific whole-genome duplication [68] left highly similar paralogs in the genome, we used HDplot [69] to identify and remove putative paralogs based on a combination of heterozygosity and read-ratio deviation.
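The read-orientation step can be pictured with a minimal Python sketch. It assumes exact-match inline barcodes at the start of the restriction-site read, immediately followed by the SbfI cut-site remnant TGCAGG; the barcode sequences, sample names, and the helpers `orient_pair` and `split_by_sample` are hypothetical. It does not reproduce process_radtags, which also applies quality filtering and can rescue barcodes containing sequencing errors. It only illustrates how each pair could be assigned to a sample and rearranged so that the SbfI end lands in R1 and the sheared end in R2, with both reads truncated to 85 bases (truncating after barcode removal is our assumption).

```python
from typing import Dict, Iterable, List, Optional, Tuple

# SbfI cuts CCTGCA^GG; reads from the restriction-site end start with this remnant.
SBFI_REMNANT = "TGCAGG"


def orient_pair(r1: str, r2: str, barcodes: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Return (sample, site_read, sheared_read), with the barcode stripped from
    the site read, or None if neither read starts with barcode + remnant."""
    # Try the pair as sequenced, then swapped, because the SbfI end can be in either read.
    for read, mate in ((r1, r2), (r2, r1)):
        for barcode, sample in barcodes.items():
            if read.startswith(barcode + SBFI_REMNANT):
                return sample, read[len(barcode):], mate
    return None


def split_by_sample(
    pairs: Iterable[Tuple[str, str]],
    barcodes: Dict[str, str],
    trim_len: int = 85,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Collect SbfI-end reads into per-sample R1 lists and sheared-end reads into R2 lists."""
    r1_out: Dict[str, List[str]] = {}
    r2_out: Dict[str, List[str]] = {}
    for r1, r2 in pairs:
        hit = orient_pair(r1, r2, barcodes)
        if hit is None:
            continue  # no exact barcode + cut-site match in either read: discard the pair
        sample, site_read, sheared_read = hit
        # Truncate both reads to a fixed length (85 bases in the text).
        r1_out.setdefault(sample, []).append(site_read[:trim_len])
        r2_out.setdefault(sample, []).append(sheared_read[:trim_len])
    return r1_out, r2_out
```

For example, a pair whose reverse read begins with a known barcode followed by TGCAGG is assigned to that barcode's sample, and the reverse read (minus its barcode) is placed in R1.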
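The SNP- and individual-level thresholds applied with VCFtools can be expressed as a small NumPy sketch. VCFtools provides options such as `--remove-indels`, `--min-alleles`/`--max-alleles`, `--maf`, `--max-missing`, and `--missing-indv` that correspond to these filters, but the exact commands are not given in the text. The sketch assumes genotypes are stored as ALT-allele counts (0, 1, 2) in a sites-by-individuals matrix with -1 marking missing calls, and it applies the site filters before computing per-individual missingness on the surviving sites; that ordering, the encoding, and the function names are assumptions, not VCFtools internals. The 65% call-rate filter from populations is left out because it would be redundant with the 10% missingness cutoff in this simplified SNP-level version.

```python
import numpy as np

MISSING = -1  # encoding for a missing genotype call (assumption)
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(ref: str, alt: str) -> bool:
    # One single-base REF and exactly one single-base ALT; multi-allelic
    # sites ("C,G") and indels ("AT") both fail this test.
    return ref in BASES and alt in BASES


def filter_genotypes(genotypes, refs, alts, min_maf=0.01,
                     max_site_missing=0.10, max_indv_missing=0.20):
    """Return boolean masks (keep_sites, keep_individuals).

    genotypes: sites x individuals array of ALT-allele counts (0, 1, 2),
    with MISSING for uncalled genotypes.
    """
    g = np.asarray(genotypes)
    called = g != MISSING

    # 1. Keep only biallelic SNPs (drops indels and multi-allelic sites).
    keep_sites = np.array([is_biallelic_snp(r, a) for r, a in zip(refs, alts)], dtype=bool)

    # 2. Drop SNPs with more than 10% missing genotypes.
    site_missing = (~called).mean(axis=1)
    keep_sites &= site_missing <= max_site_missing

    # 3. Drop SNPs with minor allele frequency below 1%, using called genotypes only.
    n_called = called.sum(axis=1)
    alt_count = np.where(called, g, 0).sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        alt_freq = alt_count / (2.0 * n_called)
    maf = np.nan_to_num(np.minimum(alt_freq, 1.0 - alt_freq), nan=0.0)
    keep_sites &= maf >= min_maf

    # 4. Drop individuals with more than 20% missing data across retained SNPs.
    if not keep_sites.any():
        return keep_sites, np.zeros(g.shape[1], dtype=bool)
    indv_missing = (~called[keep_sites]).mean(axis=0)
    keep_indv = indv_missing <= max_indv_missing
    return keep_sites, keep_indv
```

Computing individual missingness only over retained SNPs means an individual is judged on the loci that actually enter downstream analyses, rather than on sites already discarded.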
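HDplot [69] summarizes each SNP with two statistics: H, the proportion of genotyped individuals that are heterozygous, and D, a z-score measuring how far the pooled read counts of the two alleles in heterozygotes deviate from the 1:1 ratio expected at a single-copy locus. The Python sketch below computes both for one SNP, following our reading of [69]: D = (a - n/2) / sqrt(n/4), where a is the total read count for one allele across heterozygotes and n is the total for both alleles. Taking the REF allele as a, the function names, and the cutoffs in `flag_paralog` are hypothetical; in practice thresholds are chosen by inspecting a scatter plot of H against D, and the values used in this study are not stated in the text.

```python
import math
from typing import Optional, Sequence, Tuple


def hdplot_stats(
    genotypes: Sequence[Optional[int]],
    allele_depths: Sequence[Tuple[int, int]],
) -> Tuple[float, float]:
    """Return (H, D) for one biallelic SNP.

    genotypes: ALT-allele count per individual (0, 1, 2) or None if missing.
    allele_depths: (REF reads, ALT reads) per individual.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return math.nan, math.nan
    hets = [i for i, g in enumerate(genotypes) if g == 1]
    h = len(hets) / len(called)  # proportion of genotyped individuals that are heterozygous

    # Pool allele read counts across heterozygous individuals only.
    reads_a = sum(allele_depths[i][0] for i in hets)
    reads_b = sum(allele_depths[i][1] for i in hets)
    n = reads_a + reads_b
    if n == 0:
        return h, math.nan
    # z-score of reads_a under Binomial(n, 0.5): mean n/2, standard deviation sqrt(n/4).
    d = (reads_a - n / 2) / math.sqrt(n / 4)
    return h, d


def flag_paralog(h: float, d: float, max_h: float = 0.6, max_abs_d: float = 5.0) -> bool:
    # Hypothetical cutoffs: high heterozygosity or a strongly skewed read ratio
    # suggests reads from two paralogous copies were merged into one locus.
    return h > max_h or abs(d) > max_abs_d
```

For example, four individuals all called heterozygous with 30 REF and 10 ALT reads each give H = 1 and D = sqrt(40) ≈ 6.3, the combination of excess heterozygosity and skewed read ratio that HDplot associates with paralogs.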
After filtering, we retained 1,125 Oncorhynchus mykiss individuals (567 individuals from 15 sampling sites before dam removal and 558 individuals from four sampling sites after dam removal) and 71,320 SNPs (Table 1). All further analyses, except for principal component analyses, were conducted separately on the pre-dam removal and post-dam removal sample sets.
All sequenced samples, arranged by population (AD, ID, SBLR, and BD) and sampling site, ordered from upstream (top) to downstream (bottom), and divided by location relative to anadromous barriers.