Paired-end reads were trimmed with Trimmomatic v0.36 (57) using the following parameters: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:30. After trimming, a total of 14.1 million reads were retained, with an average of 706 thousand reads (SD = 240 thousand; read lengths, 30 to 126 bp) per pool. Trimmed FASTQ files were mapped against the L. saxatilis reference genome from a single Crab ecotype individual [N50, 44,284 bp; NG50 (based on genome size 1.35 Gbp), 55,450 bp; maximum scaffold length, 608,273 bp; total number of contigs, 388,619].
Kofler et al. (58) reported that pool-seq data tend to show high levels of disagreement in population genetic metrics (e.g., FST) when different read mappers are used. To address this concern, we initially used two different read mappers, BWA mem v0.7.15 (59) and CLC v5.0.3 (www.qiagenbioinformatics.com), both with default parameters. The two mappers differed only minimally in depth and FST. However, CLC handled repetitive regions better than BWA (i.e., it penalized the mapping score appropriately at repeated regions). This agrees with Kofler et al. (58), who showed that CLC outperformed other mappers. Consequently, we used CLC for read mapping.
For all samples, more than 97% of the reads were mapped to the reference genome, with a minimal number (<0.7%) mapped as singletons. Given the high level of fragmentation of the reference genome, paired reads were mapped to the same scaffold in only ~70% of the cases. Of the paired reads that mapped to different scaffolds, >50% had high mapping quality scores (Q20 or higher) and were thus retained for downstream analyses. BAM files were processed with SAMtools v1.3.1 (60), BEDtools v2.25.0 (61), and Picard tools v2.7.1 (http://broadinstitute.github.io/picard).
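The N50 and NG50 statistics quoted for the reference genome can be illustrated with a short Python sketch. It is not the code used to produce the reported values; the function names (nx, n50, ng50) and the toy scaffold lengths are hypothetical and serve only to show how the two statistics are commonly defined: N50 uses half of the total assembly length as its target, whereas NG50 uses half of an assumed genome size.

```python
def nx(lengths, target):
    """Return the length of the sequence at which the running sum of
    lengths, taken from longest to shortest, first reaches `target`.
    Returns None if the sequences never add up to `target`."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def n50(lengths):
    """N50: the target is half of the total assembly length."""
    return nx(lengths, sum(lengths) / 2)


def ng50(lengths, genome_size):
    """NG50: the target is half of an assumed genome size."""
    return nx(lengths, genome_size / 2)


# Toy scaffold lengths in bp (hypothetical, for illustration only).
scaffolds = [100, 80, 60, 40, 20]

print(n50(scaffolds))        # target is 150 bp (half of 300 bp)
print(ng50(scaffolds, 200))  # target is 100 bp (half of 200 bp)
```

Because NG50 depends on the assumed genome size rather than on the assembly itself, it is larger than N50 whenever the assembly total exceeds that genome size, which is consistent with the reported NG50 being higher than the N50.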
For each set of BAM files, we filtered out reads with base quality lower than 30 and mapping quality lower than 20, as well as reads that mapped to very short contigs (<500 bp; 158,060 contigs). We also removed positions of low coverage to avoid uncertain calls and positions of very high coverage to avoid potential repetitive regions. Using the function mixmdl from the R (62) package mixtools, we fitted the depth of coverage to a mixture distribution model that defines three classes of coverage: low, medium, and high. We used this model to define cutoff values that remove 100% of the low-coverage distribution and 50% of the high-coverage distribution (low cutoff = 14 and high cutoff = 204; see fig. S15 for details). Population genomic analyses were performed on the filtered files, which had an average coverage of 68× (min = 14×; max = 204×; SD = 18×).
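The position-level filters described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline that was actually run (which relied on SAMtools, BEDtools, Picard tools, and R). The Site record, the filter_sites function, and the toy data are hypothetical. Treating the cutoffs as inclusive bounds is an assumption based on the reported minimum (14×) and maximum (204×) coverage after filtering.

```python
from typing import NamedTuple

# Cutoffs taken from the text; treating them as inclusive is an assumption.
MIN_DEPTH = 14        # low-coverage cutoff
MAX_DEPTH = 204       # high-coverage cutoff
MIN_CONTIG_LEN = 500  # contigs shorter than this (bp) are discarded


class Site(NamedTuple):
    """One genomic position with its depth of coverage (hypothetical record)."""
    contig: str
    pos: int
    depth: int


def filter_sites(sites, contig_lengths):
    """Keep sites on contigs of at least MIN_CONTIG_LEN bp whose depth
    lies within [MIN_DEPTH, MAX_DEPTH]."""
    kept = []
    for site in sites:
        if contig_lengths[site.contig] < MIN_CONTIG_LEN:
            continue  # very short contig
        if not (MIN_DEPTH <= site.depth <= MAX_DEPTH):
            continue  # coverage too low or too high
        kept.append(site)
    return kept


# Toy data (hypothetical): one long contig and one short contig.
contig_lengths = {"contig_1": 1200, "contig_2": 450}
sites = [
    Site("contig_1", 10, 13),   # below the low cutoff
    Site("contig_1", 11, 14),   # at the low cutoff
    Site("contig_1", 12, 68),   # typical coverage
    Site("contig_1", 13, 204),  # at the high cutoff
    Site("contig_1", 14, 205),  # above the high cutoff
    Site("contig_2", 5, 68),    # contig shorter than 500 bp
]

kept = filter_sites(sites, contig_lengths)
print([site.pos for site in kept])
```

The read-level filters (base quality and mapping quality) are not covered by this sketch, since they act on individual reads before depth is computed.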
