Identifying differential selection between humans and primates through population modeling

Hong Gao
Tobias Hamp
Jeffrey Ede
Joshua G. Schraiber
Jeremy McRae
Moriel Singer-Berk
Yanshen Yang
Anastasia S. D. Dietrich
Petko P. Fiziev
Lukas F. K. Kuderna
Laksshman Sundaram
Yibing Aashish Wu
Yair Field
Chen Chen
Serafim Batzoglou
Francois Aguet
Gabrielle Lemire
Rebecca Reimers
Daniel Balick
Mareike C. Janiak
Martin Kuhlwilm
Joseph D. Orkin
Shivakumara Manu
Alejandro Valenzuela
Juraj Bergman
Marjolaine Rousselle
Felipe Ennes Silva
Lidia Agueda
Julie Blanc
Marta Gut
Dorien de Vries
Ian Goodhead
R. Alan Harris
Muthuswamy Raveendran
Axel Jensen
Idriss S. Chuma
Julie E. Horvath
Christina Hvilsom
David Juan
Peter Frandsen
Fabiano R. de Melo
Fabrício Bertuol
Hazel Byrne
Iracilda Sampaio
Izeni Farias
João Valsecchi do Amaral
Mariluce Messias
Maria N. F. da Silva
Mihir Trivedi
Rogerio Rossi

We first established a neutral background distribution of per-gene mutation rates for each primate species by fitting the Poisson random field model to the segregating synonymous variants in that species. Under this model, the observed number of segregating synonymous sites is a Poisson random variable whose mean is determined by mutation rate, demography, and sample size (34). For simplicity, we assumed an equilibrium (i.e., constant-size) demography for all species except humans; for humans, we used Moments (51) to find the best-fitting demographic history based on the folded site frequency spectrum of synonymous sites. We placed a gamma-distributed prior on mutation rates, which also accounts for the impact of GC content on mutation rate. We optimized the prior parameters by maximum likelihood and computed the posterior distribution of the mutation rate per gene.
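As an illustration of the neutral step, the sketch below shows why a gamma prior is convenient with a Poisson count: the posterior is again a gamma distribution, and the marginal likelihood used to tune the prior is negative binomial. It is a simplified sketch, not the authors' implementation. It assumes an equilibrium demography, under which the expected number of segregating sites among n chromosomes is the mutation rate times Watterson's constant, and it ignores the GC-content adjustment. The function names and the rate parameterization (alpha, beta) are hypothetical.

```python
import math

def harmonic_a(n_chrom):
    """Watterson's constant a_n = sum_{i=1}^{n-1} 1/i for n sampled chromosomes."""
    return sum(1.0 / i for i in range(1, n_chrom))

def posterior_mutation_rate(s_obs, n_chrom, alpha, beta):
    """Gamma posterior for a gene's scaled mutation rate theta.

    Assumed model (illustrative): S ~ Poisson(theta * a_n), theta ~ Gamma(alpha, rate=beta).
    Returns (posterior shape, posterior rate, posterior mean).
    """
    a_n = harmonic_a(n_chrom)
    shape = alpha + s_obs   # conjugate update: add the observed count
    rate = beta + a_n       # conjugate update: add the Poisson exposure
    return shape, rate, shape / rate

def marginal_loglik(s_obs, n_chrom, alpha, beta):
    """Log marginal probability of S after integrating out theta (negative binomial).

    Summing this over genes gives an objective for maximum-likelihood
    estimation of the prior parameters alpha and beta.
    """
    a_n = harmonic_a(n_chrom)
    return (math.lgamma(alpha + s_obs) - math.lgamma(alpha) - math.lgamma(s_obs + 1)
            + alpha * math.log(beta / (beta + a_n))
            + s_obs * math.log(a_n / (beta + a_n)))
```

For example, `posterior_mutation_rate(10, 4, 2.0, 1.0)` combines 10 observed segregating synonymous sites in 4 chromosomes with a Gamma(2, 1) prior. A non-equilibrium demography, as used for humans, would replace `harmonic_a` with the expected count under the fitted history.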

The number of segregating nonsynonymous sites is modeled as a Poisson random variable, as for synonymous sites, but with additional selection parameters. We assumed that every nonsynonymous mutation in a gene shares the same population-scaled selection coefficient γ_ig. To estimate the selection coefficient of each gene in each species explicitly while controlling for differences in population size across species, we devised a two-step procedure analogous to an expectation-maximization algorithm.
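To make the role of the selection parameter concrete, the sketch below computes the expected number of segregating sites in a sample under a standard Poisson random field stationary density at equilibrium. It is an illustrative sketch under stated assumptions, not the authors' two-step procedure: the scaling convention for gamma (conventions differ between papers), the midpoint-rule integration, and the function name are all assumptions made here.

```python
import math

def expected_segregating(theta, gamma, n_chrom, n_steps=20000):
    """Expected number of segregating sites among n_chrom chromosomes.

    Assumed stationary density of derived-allele frequency x (equilibrium PRF):
        f(x) = theta * (1 - exp(-2*gamma*(1-x))) / ((1 - exp(-2*gamma)) * x * (1-x))
    which reduces to theta / x when gamma == 0 (neutral case).
    A site at frequency x is segregating in the sample with probability
    1 - x**n - (1-x)**n. Intended for modest |gamma|; very negative gamma
    would overflow the exponential.
    """
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        x = (k + 0.5) * h  # midpoint rule avoids the endpoints x = 0 and x = 1
        if gamma == 0:
            weight = 1.0 / x
        else:
            weight = math.expm1(-2.0 * gamma * (1.0 - x)) / (
                math.expm1(-2.0 * gamma) * x * (1.0 - x))
        p_seg = 1.0 - x ** n_chrom - (1.0 - x) ** n_chrom
        total += p_seg * weight
    return theta * total * h
```

With `gamma = 0` the result matches the neutral expectation, theta times the sum of 1/i for i from 1 to n-1. A negative `gamma` (purifying selection) lowers the expected count and a positive one raises it, which is the signal a per-gene selection estimate draws on.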

To identify genes in which selective constraint in humans differs from that in nonhuman primates, we developed a likelihood ratio test of whether the population-scaled selection coefficients differ significantly between humans and other primates. We then assessed whether our population genetic modeling improved the correlation between selection estimates from our primate data and previous gene-constraint metrics in humans, including pLI (28) and s_het (111). To validate the performance of our model, we performed population genetic simulations.
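The general shape of such a likelihood ratio test can be sketched with a deliberately simplified stand-in. Instead of comparing selection coefficients, the example below tests whether two Poisson counts (for example, nonsynonymous segregating sites in humans and in other primates) share one rate multiplier relative to their neutral expectations. This is not the authors' test. The function name and inputs are hypothetical, and the chi-square approximation with one degree of freedom is assumed to hold.

```python
import math

def poisson_lrt(k_human, e_human, k_primate, e_primate):
    """Likelihood ratio test: do two Poisson counts share one rate multiplier?

    k_* are observed counts; e_* are expected counts under neutrality (exposures).
    Null: both counts have mean r * e_* with a shared r.
    Alternative: each group has its own multiplier.
    Returns (test statistic, p-value from a chi-square with 1 degree of freedom).
    """
    r0 = (k_human + k_primate) / (e_human + e_primate)  # shared-rate MLE under the null

    def term(k, e):
        # Contribution k * log(observed / fitted under null); zero when k == 0.
        return 0.0 if k == 0 else k * math.log(k / (r0 * e))

    stat = 2.0 * (term(k_human, e_human) + term(k_primate, e_primate))
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square (1 df) survival function
    return stat, p_value
```

For instance, `poisson_lrt(10, 100.0, 50, 100.0)` compares a gene with strong depletion in one group against weaker depletion in the other. In a full analysis the two hypotheses would instead be likelihoods maximized over a shared versus separate selection coefficient.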
