Given the clear discordances between mitochondrion-based and phenotype-based phylogenies (12, 15), we performed an extensive series of phylogenetic analyses using different data types and analytical approaches. Our goal was to develop robust conclusions regarding population history that are supported by multiple datasets and analyses.

Phylogenetic analysis of concatenated whole-genome SNV data. The dataset for this analysis consisted of SNV calls for the 15 baboons from the diversity panel (table S4), plus one gelada (T. gelada). From the variant call file (vcf) produced by GATK, only SNV positions surviving filtering steps conducted with vcffilter from vcflib (https://github.com/ekg/vcflib; settings: -f "QUAL > 20 & DP > 10 & MQ > 30 & QD > 20") were used for further analysis (24,588,548 SNVs). SNVs were extracted from the filtered vcf file with bcftools from SAMtools 1.2 (settings: bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n'). The resulting table was converted into individual FASTA sequences using a custom Python script. Individual cases where a baboon exhibited more than one different nonreference allele at the same site were recorded as ambiguous. Merging all FASTA sequences into a single file provided a multiple sequence alignment of all individuals and all concatenated SNVs. Positions in the alignment where no information was given for at least one species were removed. A total of 22,433,604 SNVs remained for analysis. Model selection using the Bayesian information criterion in IQ-TREE 1.3.13 (49) revealed the TVM+ASC+G model as the best-fit model for this dataset. Phylogenetic trees were reconstructed with ML and Bayesian approaches using IQ-TREE and MrBayes 3.2.6 (50), respectively. IQ-TREE settings: TVM+ASC+G model, 1000 ultrafast bootstraps; MrBayes settings: TVM+G, 100,000 generations and 10% burnin.
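The custom Python script is not reproduced here, so the following is only a minimal sketch of how the bcftools table could be turned into concatenated per-sample sequences. The function names (`call_to_base`, `table_to_alignment`) are hypothetical. Coding reference/alternate heterozygotes with IUPAC symbols, and dropping a column when any single sample lacks a call, are assumptions rather than documented details of the original script.

```python
# Sketch only: converts lines of
#   CHROM <tab> POS <tab> REF <tab> ALT <tab> sample=GT <tab> sample=GT ...
# into one concatenated SNV sequence per sample.

# Assumption: heterozygous ref/alt calls are written as IUPAC ambiguity codes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_to_base(ref, alts, gt):
    """Translate one genotype string (e.g. '0/1') into one alignment character."""
    idx = gt.replace("|", "/").split("/")
    if "." in idx:
        return "-"  # missing call
    nonref = {i for i in idx if i != "0"}
    if len(nonref) > 1:
        return "N"  # more than one distinct nonreference allele: ambiguous
    alleles = [ref] + alts
    bases = {alleles[int(i)] for i in idx}
    if len(bases) == 1:
        return bases.pop()
    return IUPAC[frozenset(bases)]


def table_to_alignment(lines):
    """Build {sample: sequence}, skipping columns with any missing call."""
    seqs = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        ref = fields[2].strip()
        alts = fields[3].strip().split(",")
        column = {}
        for entry in fields[4:]:
            sample, gt = entry.split("=", 1)
            column[sample.strip()] = call_to_base(ref, alts, gt.strip())
        if "-" in column.values():
            continue  # assumption: drop the site if any sample lacks data
        for sample, base in column.items():
            seqs.setdefault(sample, []).append(base)
    return {sample: "".join(bases) for sample, bases in seqs.items()}


if __name__ == "__main__":
    demo = [
        "chr1\t10\tA\tG\tb1=0/0\tb2=0/1\tb3=1/1",
        "chr1\t20\tC\tT,G\tb1=1/2\tb2=0/0\tb3=2/2",
        "chr1\t30\tG\tA\tb1=./.\tb2=0/0\tb3=1/1",
    ]
    for name, seq in table_to_alignment(demo).items():
        print(f">{name}\n{seq}")
```

Writing each returned sequence under a `>sample` header gives the concatenated alignment in FASTA form. For a table with tens of millions of rows, a streaming variant that writes per-sample files incrementally would likely be preferable to holding all columns in memory.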

Polymorphism-aware phylogenetic model analysis of whole-genome data. To estimate species-level phylogeny while allowing for current and possible ancient polymorphism, we applied the PoMo model (23) implemented in IQ-TREE (49) to the baboon diversity Panu_2.0 SNV data together with fourfold degenerate sites of the orthologous gene set. Briefly, the PoMo model represents the evolution of an individual nucleotide site within a given fixed species-level phylogeny as a continuous-time Markov chain along that phylogeny. Rather than considering only four states (four alternative nucleotides) for a given genomic position, PoMo allows for polymorphism within species by expanding the state space in the Markov chain to include heterozygous nucleotide compositions, assuming two nucleotides per site, in addition to the traditional four nucleotide states. Mutation (e.g., using the HKY model) introduces new nucleotides. The Moran model describes genetic drift, that is, changes in allele frequencies over time. PoMo generates a single species tree but allows for incomplete lineage sorting (ILS). Additional details are available in (23, 24).
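The expanded state space can be illustrated with a short sketch. It assumes a virtual population of n chromosomes per species and at most two nucleotides segregating at a site. The value of n used in the analysis is not stated here, so n = 10 below is purely illustrative. The names `pomo_states` and `moran_drift_rate` are hypothetical, and the drift rate i(n - i)/n is one common scaling of the Moran model, not necessarily the one used internally by IQ-TREE.

```python
from itertools import combinations

BASES = "ACGT"


def pomo_states(n):
    """Enumerate PoMo-style states for a virtual population of n chromosomes.

    Fixed states carry one nucleotide in all n copies; polymorphic states
    carry two nucleotides with counts i and n - i, for 0 < i < n.
    """
    fixed = [((b, n),) for b in BASES]
    polymorphic = [
        ((a, i), (b, n - i))
        for a, b in combinations(BASES, 2)
        for i in range(1, n)
    ]
    return fixed + polymorphic


def moran_drift_rate(i, n):
    """Rate of moving from count i to i + 1 (or i - 1) under Moran drift.

    Assumption: time is scaled so that the rate is i * (n - i) / n.
    """
    return i * (n - i) / n


if __name__ == "__main__":
    n = 10  # illustrative value only
    states = pomo_states(n)
    print(len(states), "states: 4 fixed +", len(states) - 4, "polymorphic")
    print("drift rate at i = 5:", moran_drift_rate(5, n))
```

The count follows 4 + 6(n - 1): four fixed states plus n - 1 frequency classes for each of the six unordered nucleotide pairs. Drift moves a site between neighbouring frequency classes, while mutation is the only way out of a fixed state.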

Simulation study comparing methods. We analyzed the robustness of the PoMo results to admixture between differentiating lineages using the baboon phylogeny as the assumed context. We defined the input phylogeny as that obtained by modeling potential admixture among the six baboon species through f-statistics (see below). Total branch lengths for each lineage were set as inferred from baboon data, and Watterson’s θ within species was set to 0.0025. We tested the ability of PoMo to accurately reconstruct the phylogeny for P. kindae by varying the proportion of admixture into the P. kindae lineage from a northern clade species from 0 (no admixture) to 80%. We simulated 1000 genes (1000 bp per gene) on five chromosomes and created 1000 gene trees using MSMS (51). We next concatenated the sequence data for five chromosomes and all gene trees in each species. We then used both PoMo and the HKY model to generate phylogenies and compared their ability to reconstruct the correct species-level phylogeny.
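To give a sense of scale for the within-species diversity setting, the sketch below relates Watterson's θ to the number of segregating sites through the standard estimator θ_W = S / (a_n L), with a_n = Σ_{i=1}^{n-1} 1/i. It assumes that 0.0025 is a per-site value, and the sample size of two chromosomes is an illustrative choice, not a parameter taken from the simulations. The function names are hypothetical.

```python
def harmonic_a(n_chrom):
    """Return a_n, the sum of 1/i for i = 1 .. n_chrom - 1."""
    return sum(1.0 / i for i in range(1, n_chrom))


def watterson_theta(seg_sites, n_chrom, length):
    """Per-site Watterson estimator from S segregating sites in `length` bp."""
    return seg_sites / (harmonic_a(n_chrom) * length)


def expected_seg_sites(theta, n_chrom, length):
    """Expected number of segregating sites for a given per-site theta."""
    return theta * harmonic_a(n_chrom) * length


if __name__ == "__main__":
    # One 1000-bp simulated gene, two sampled chromosomes (illustrative).
    s = expected_seg_sites(0.0025, 2, 1000)
    print("expected segregating sites per gene:", s)
    print("theta recovered from S:", watterson_theta(s, 2, 1000))
```

Under these assumptions, each 1000-bp gene carries only a few segregating sites within a species, which helps explain why many genes are simulated and concatenated before tree inference.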
