Population Genomics

Jocelyn P Colella
Anna Tigano
Olga Dudchenko
Arina D Omer
Ruqayya Khan
Ivan D Bochkov
Erez L Aiden
Matthew D MacManes

We used ANGSD v. 0.93 (Korneliussen et al. 2014) to call variants with high confidence from low-coverage population genomic data for the 3 species (26 P. eremicus, 9 P. crinitus, and 5 P. maniculatus). First, an initial list of high-quality SNPs was identified by analyzing all samples from the 3 species together with the settings: -SNP_pval 1e-6 -minMapQ 20 -minQ 20 -setMinDepth 20 -minInd 20 -minMaf 0.01. Then, allele frequencies for each of those high-quality SNPs were calculated independently for each species, with the following filter: for each variable site, at least half of the P. crinitus and P. eremicus samples and all P. maniculatus samples (-minInd) had to meet the mapping-quality (-minMapQ) and base-quality (-minQ) thresholds.
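The per-species sample requirement can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the helper names are hypothetical, rounding "half" up for the odd P. crinitus sample size is an assumption, and reusing the value 20 for the per-species -minMapQ and -minQ thresholds is also an assumption, since the text does not give those values. Only flags quoted in the text are used; input and output options are deliberately left out.

```python
import math

# Sample sizes reported in the text.
SAMPLE_SIZES = {"P. eremicus": 26, "P. crinitus": 9, "P. maniculatus": 5}

# Fraction of samples that must pass the filters at each variable site.
REQUIRED_FRACTION = {"P. eremicus": 0.5, "P. crinitus": 0.5, "P. maniculatus": 1.0}


def min_ind(species):
    """Return a candidate -minInd value for one species.

    Rounding up (so that 9 samples gives 5) is an assumption.
    """
    return math.ceil(SAMPLE_SIZES[species] * REQUIRED_FRACTION[species])


def discovery_filters():
    """Filter flags for the joint SNP discovery run, exactly as quoted in the text."""
    return [
        "-SNP_pval", "1e-6",
        "-minMapQ", "20",
        "-minQ", "20",
        "-setMinDepth", "20",
        "-minInd", "20",
        "-minMaf", "0.01",
    ]


def per_species_filters(species, min_map_q=20, min_q=20):
    """Filter flags for the per-species allele frequency run.

    The defaults of 20 reuse the discovery thresholds (an assumption).
    """
    return [
        "-minInd", str(min_ind(species)),
        "-minMapQ", str(min_map_q),
        "-minQ", str(min_q),
    ]


if __name__ == "__main__":
    for name in SAMPLE_SIZES:
        print(name, " ".join(per_species_filters(name)))
```

Under these assumptions the requirement works out to 13 of 26 P. eremicus, 5 of 9 P. crinitus, and 5 of 5 P. maniculatus samples.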

Differentiation among species was examined using a multidimensional scaling (MDS) analysis in ANGSD. MDS plots were generated in R v. 3.6.1 (R Core Team 2017) based on the covariance matrix. Cook’s D was used to identify outliers (Cook and Weisberg 1984; Williams 1987). As an additional measure of differentiation, we estimated weighted and unweighted global FST values for each species pair using realSFS in ANGSD. NGSadmix v. 33 (Skotte et al. 2013) was used to fit genomic data into K populations under a maximum-likelihood model, both to parse species-level differences and to provide a preliminary screen for genomic admixture. We tested K = 1 through K = (N − 1), where N is the total number of individuals examined. NGSadmix was run for all species combined and again for each species independently. Individuals with <90% assignment to a particular species were considered putatively admixed. To examine the impact of coverage on the detection of admixture, we also compared coverage distributions between admixed and non-admixed individuals. Nonetheless, expanded sample sizes with greater sequencing depth will be necessary to detail patterns of population structure and introgression.
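The 90% assignment rule and the range of K values can be illustrated with a short Python sketch. It assumes the admixture proportions are available as a whitespace-delimited table with one row per individual and one column per cluster; the function names and the example values are hypothetical and are not data from the study.

```python
def k_values(n_individuals):
    """Return the tested values K = 1 through K = N - 1."""
    return list(range(1, n_individuals))


def parse_proportions(text):
    """Parse a whitespace-delimited table: one row per individual, one column per cluster."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(value) for value in fields])
    return rows


def flag_admixed(proportions, threshold=0.90):
    """Flag individuals whose largest ancestry proportion is below the threshold."""
    return [max(row) < threshold for row in proportions]


if __name__ == "__main__":
    # Made-up proportions for three individuals at K = 3.
    example = """
    0.98 0.01 0.01
    0.60 0.35 0.05
    0.05 0.92 0.03
    """
    print(flag_admixed(parse_proportions(example)))
    # All species combined: 26 + 9 + 5 = 40 individuals.
    print(k_values(40)[0], k_values(40)[-1])
```

In this made-up example only the second individual, with a maximum assignment of 60%, would be treated as putatively admixed.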

We used the Pairwise Sequentially Markovian Coalescent (PSMC v. 0.6.5-r67; Li and Durbin 2011) to examine historical demographic changes through time for each species. PSMC analyses are not suitable for low-coverage genomes; therefore, we used the higher-coverage reads that were used to generate the high-quality, chromosome-length assemblies for each species (P. crinitus, assembly methods detailed above; P. eremicus, SAMEA5799953, Tigano et al. 2020; P. maniculatus, GCA_003704035.1, Harvard University). Reference assemblies were indexed in BWA, and high-quality reads (q > 20; Skewer, Jiang et al. 2014) were mapped to their respective de novo assembled reference to identify heterozygous sites. Samblaster removed PCR duplicates, and Picard (http://broadinstitute.github.io/picard/) added a read group to the resulting BAM file and generated a sequence dictionary (CreateSequenceDictionary) from the reference assembly. For each species, samtools was used to sort and index alignments, and variants were called using mpileup in bcftools v. 1.10.2 (call, Li et al. 2009). Consensus sequences were called in VCFtools v. 0.1.16 (vcf2fq, Danecek et al. 2011). PSMC distributions of effective population size (Ne) were estimated with 100 bootstrap replicates, and results were visualized with gnuplot v. 5.2 (Williams and Kelley 2010) using Perl scripts available at https://github.com/lh3/psmc. Output was scaled by a generation time of 6 months (0.5 year; Millar 1989; Pergams and Lacy 2008) and a general mammalian mutation rate of 2.2 × 10⁻⁹ substitutions/site/year (Kumar and Subramanian 2002).
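The scaling step can be written out as a small worked example. The sketch below follows the scaling relations described in the PSMC documentation (N0 = θ0 / (4μs), time in generations = 2·N0·t_k, Ne = N0·λ_k). It makes two assumptions that the text does not state: the per-year mutation rate is converted to a per-generation rate by multiplying by the generation time, and the bin size s is 100 bp. The function names and the input values in the usage example are hypothetical.

```python
MU_PER_YEAR = 2.2e-9          # substitutions/site/year, from the text
GENERATION_TIME_YEARS = 0.5   # 6 months, from the text
BIN_SIZE = 100                # assumption: bin size in bp used to build the PSMC input


def mu_per_generation(mu_per_year=MU_PER_YEAR, generation_time=GENERATION_TIME_YEARS):
    """Convert a per-year rate to a per-generation rate (an assumed conversion)."""
    return mu_per_year * generation_time


def scale_psmc(theta0, t_k, lambda_k, mu_gen,
               bin_size=BIN_SIZE, generation_time=GENERATION_TIME_YEARS):
    """Scale one PSMC interval to (time in years, effective population size).

    theta0   : scaled mutation rate per bin from the PSMC output
    t_k      : scaled time of the interval
    lambda_k : relative population size of the interval
    """
    n0 = theta0 / (4.0 * mu_gen * bin_size)   # reference population size N0
    generations = 2.0 * n0 * t_k              # time in generations
    years = generations * generation_time     # time in years
    ne = n0 * lambda_k                        # effective population size
    return years, ne


if __name__ == "__main__":
    mu_gen = mu_per_generation()
    # Hypothetical PSMC values, chosen only to make the arithmetic easy to follow.
    years, ne = scale_psmc(theta0=0.00044, t_k=0.5, lambda_k=2.0, mu_gen=mu_gen)
    print(f"mu per generation: {mu_gen:.2e}")
    print(f"time: {years:.0f} years, Ne: {ne:.0f}")
```

With these hypothetical inputs the per-generation rate is 1.1 × 10⁻⁹, N0 is about 1,000, and the interval corresponds to roughly 500 years ago with an Ne of about 2,000.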
