Ancestral state inference

Riley J. Mangan
Fernando C. Alsina
Federica Mosti
Jesús Emiliano Sotelo-Fonseca
Daniel A. Snellings
Eric H. Au
Juliana Carvalho
Laya Sathyan
Graham D. Johnson
Timothy E. Reddy
Debra L. Silver
Craig B. Lowe

We implemented gonomics: primateRecon to estimate the ancestral allele states using a maximum likelihood framework [90] from our alignment of the human, chimpanzee, bonobo, gorilla, and orangutan genomes. We used this program to estimate both the human-chimpanzee ancestor and the human-gorilla ancestor.

We first estimated the neutral rate of evolution based on four-fold degenerate sites in codons using the knownGenes track on the UCSC Genome Browser as our gene set. We used PHAST: msa_view to extract four-fold degenerate codon sites and estimated branch lengths for a fixed-topology tree using a Jukes-Cantor model of evolution [91] by maximum likelihood [92] (PHAST: phyloFit).
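As an illustration of what a four-fold degenerate site is, the sketch below marks third codon positions where any of the four bases encodes the same amino acid under the standard genetic code. It works on a single in-frame coding sequence and is not how msa_view operates on an alignment; the names `FOURFOLD_PREFIXES` and `fourfold_site_offsets` are hypothetical.

```python
# Codon prefixes (first two bases) whose third position is four-fold
# degenerate in the standard genetic code: Leu (CT), Val (GT), Ser (TC),
# Pro (CC), Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_site_offsets(cds: str) -> list[int]:
    """Return 0-based offsets of four-fold degenerate third positions.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon is ignored.
    """
    cds = cds.upper()
    offsets = []
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 2] in FOURFOLD_PREFIXES:
            offsets.append(start + 2)
    return offsets
```

Substitutions at these positions do not change the encoded protein, which is why they are commonly used as a proxy for neutral evolution.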

A base was determined to be present in the ancestral node if a base was present in at least two species on two independent lineages connected to the ancestral node. For alignment columns where an ancestral base was determined to be present, we first reconstructed the probabilities of A, C, G, and T in the ancestral node using the tree inferred from four-fold degenerate sites [90]. We then used one of two methods to assign a single base to the ancestor from these four probabilities. These distinct methods of ancestral state inference reflect the specific experimental use cases for the resulting inferred sequences.

In the first method, we biased the reconstruction towards an extant species base by requiring that the probabilities of the three other bases sum to at least 0.8 for the most likely base to be assigned as the ancestral state. This method produced a conservative estimate of divergent sites between modern and ancestral species and was used in the ascertainment of HAQERs, chimp-AQERs, and gorilla-AQERs.

We used our second method of ancestral state inference to annotate the ancestral allele for segregating sites among modern humans. In this method, we first implemented gonomics: vcfToFa to construct a FASTA format sequence of the human reference genome in which the reference allele at each segregating site is replaced with the alternate allele from a VCF format file. We then appended this sequence to our multiple alignment and gave the reference and alternate human sequences equal weight. We then calculated the four base probabilities for the human-chimpanzee ancestor and accepted the most likely allele as the ancestral state if its probability was greater than or equal to 99%. For uncertain positions, we assigned an N to the ancestral state to ensure that only high-confidence SNPs were retained for subsequent analysis of derived allele frequencies.
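The two assignment rules can be sketched in a few lines. This is a toy illustration, not the gonomics implementation: the posterior is computed on a star tree with leaves attached directly to the ancestor under a Jukes-Cantor model, the branch lengths are made up, and all function names are hypothetical. The fallback to the extant base in the first rule is an assumption drawn from the phrase "bias the reconstruction towards an extant species base"; the text does not state it explicitly.

```python
import math

BASES = "ACGT"


def jc69(parent: str, child: str, t: float) -> float:
    """Jukes-Cantor probability of `child` given `parent` after branch length t
    (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if parent == child else 0.25 - 0.25 * decay


def ancestor_posterior(leaves):
    """Posterior over the ancestral base for (observed_base, branch_length)
    leaves hanging directly off the ancestor (toy star tree)."""
    joint = {}
    for anc in BASES:
        p = 0.25  # uniform base frequencies under Jukes-Cantor
        for base, t in leaves:
            p *= jc69(anc, base, t)
        joint[anc] = p
    total = sum(joint.values())
    return {b: joint[b] / total for b in BASES}


def call_biased(probs, extant_base, threshold=0.8):
    """First rule: move away from the extant base only if the other three
    bases together reach the threshold."""
    others = {b: p for b, p in probs.items() if b != extant_base}
    if sum(others.values()) >= threshold:
        return max(others, key=others.get)
    return extant_base  # assumed fallback, not stated in the text


def call_strict(probs, threshold=0.99):
    """Second rule: accept the most likely base only at high confidence,
    otherwise report N."""
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "N"


# Hypothetical column: two close lineages carry A, a more distant one carries G.
probs = ancestor_posterior([("A", 0.006), ("A", 0.006), ("G", 0.02)])
print(call_biased(probs, extant_base="A"))
print(call_strict(probs))
```

Under the first rule a site counts as divergent only when the evidence against the extant base is strong, which keeps the set of inferred substitutions conservative. The second rule treats all four bases symmetrically and masks anything below the confidence cutoff.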
