Investigating the Interplay between Probabilistic Distances and Likelihood Ratio Test Significance

Richard H Adams; Heath Blackmon; Michael DeGiorgio

This method is extracted from research article: Syst Biol, Feb 2021

Of Traits and Trees: Probabilistic Distances under Continuous Trait Models for Dissecting the Interplay among Phylogeny, Model, and Data

DOI: 10.1093/sysbio/syab009

We conducted an array of simulations to investigate the interplay between probabilistic model distances and the significance of the likelihood ratio test used for model selection. First, we simulated a random equation M225 tip phylogeny based on a pure-birth Yule process (Yule 1925) with a birth rate of 10 (Supplementary Fig. S6a available on Dryad). Next, we simulated data sets under an OU model with equation M226 and equation M227. For each value of equation M228 in this range, we simulated 100 replicate data sets with this model (equation M229), and for each replicate, we fit two alternative models to the simulated data: 1) a BM model and 2) an OU model. Model parameters were estimated by maximum likelihood with the fitContinuous function provided in GEIGER (Harmon et al. 2007). We used the results of these two fitted models to compute a likelihood ratio test, with significance assessed assuming a chi-squared distribution with one degree of freedom. Because all observations were simulated under an OU model, fitting a BM model represents a scenario of model misspecification, and we used these simulations to characterize the relationship between the significance of the likelihood ratio test between the two models and the probabilistic distance between them. We repeated this same analysis for a larger tree with equation M230 tips (Supplementary Fig. S7a available on Dryad).
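The likelihood ratio test described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R workflow used with GEIGER; the function name `lrt_pvalue_df1` and the two log-likelihood values are hypothetical and stand in for the maximized log-likelihoods that a fitting routine would report for the BM (null) and OU (alternative) models.

```python
import math

def lrt_pvalue_df1(loglik_null, loglik_alt):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns the test statistic and its p-value under a chi-squared
    distribution with one degree of freedom.
    """
    # Test statistic: twice the gain in maximized log-likelihood.
    stat = 2.0 * (loglik_alt - loglik_null)
    # Optimizer noise can make the statistic slightly negative; clamp to zero.
    stat = max(stat, 0.0)
    # Chi-squared (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical maximized log-likelihoods (illustrative values only).
loglik_bm = -152.3   # simpler model (BM)
loglik_ou = -149.1   # richer model (OU, one extra parameter)

stat, p_value = lrt_pvalue_df1(loglik_bm, loglik_ou)
print(f"LRT statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

The closed form for the survival function holds only for one degree of freedom, which matches the BM versus OU comparison here, where the OU model adds a single parameter.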

We also expanded these simulations to investigate the impacts of tree shape, tree size, and evolutionary parameters on the significance of the likelihood ratio test alongside probabilistic distances. We simulated data sets under an OU model using three different tree sizes (128, 512, and 1024 tips) and three different tree shapes (“balanced,” “left unbalanced,” and a randomly generated Yule tree with birth rate equation M232; example tree shapes shown at the top of Fig. 5), using branch lengths scaled to give a total tree height of 1.0. For the “balanced” and “left unbalanced” shapes, lineage splits are evenly distributed from the time of sampling to the root of the tree, and all internal branches or shortest external branches are of equal length. For each tree size and shape, we simulated character trait data sets using an OU model that varied in the parameter equation M233, and we computed both the likelihood ratio test and the Hellinger distance between an OU model and a BM model, each fit to the simulated data set using the fitContinuous function of GEIGER.
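To give a feel for how a Hellinger distance between a BM and an OU model behaves, the sketch below works through a deliberately reduced case. It is not the computation used in the study, which compares the models over the whole tree. The assumptions here are: only the marginal distribution of a single tip at tree height 1.0 is considered, both models share the same rate sigma^2 = 1, and the root state equals the OU optimum so that both means coincide. The OU strength is called `alpha` in the code, and all function names are hypothetical.

```python
import math

def hellinger_normal(mu1, var1, mu2, var2):
    """Hellinger distance between two univariate normal distributions."""
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    # Bhattacharyya coefficient for two univariate normals.
    bc = math.sqrt(2.0 * s1 * s2 / (var1 + var2)) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * (var1 + var2))
    )
    # H^2 = 1 - BC; guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - bc))

def tip_variance_bm(sigma2, height):
    """Variance of a tip trait under BM after `height` units of time."""
    return sigma2 * height

def tip_variance_ou(sigma2, alpha, height):
    """Variance of a tip trait under OU with a fixed root state."""
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * height))

# Assumed illustrative settings: sigma^2 = 1, tree height = 1.0, equal means.
alphas = [0.01, 0.1, 1.0, 5.0]
distances = [
    hellinger_normal(0.0, tip_variance_bm(1.0, 1.0),
                     0.0, tip_variance_ou(1.0, a, 1.0))
    for a in alphas
]
for a, h in zip(alphas, distances):
    print(f"alpha = {a:5.2f}  Hellinger distance = {h:.4f}")
```

In this reduced setting the distance is close to zero for small `alpha`, where the OU variance approaches the BM variance, and grows as `alpha` increases. That matches the intuition that a stronger pull toward the optimum makes the OU model easier to tell apart from BM.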

Investigating the relationship between model distances and the significance of likelihood ratio tests between fitted BM and OU models (traits simulated under an OU model). Results are shown for three different tree shapes: “balanced” (left panels), “left unbalanced” (center), and trees simulated under a Yule model with the birth rate equation M234 (right), with equal branch lengths that are scaled to give a total tree height of 1.0. equation M235 values for a likelihood ratio test comparing the OU and BM models as a function of their Hellinger distance (equation M236) are shown for three different tree sizes: 128 (a–c), 512 (d–f), and 1024 (g–i) tips. Circles and bars show the mean and standard deviation, respectively, of the distribution of 10 replicate equation M237 values (subtracted from one). Each simulation replicate was computed by incrementally increasing the equation M238 parameter of the OU model from equation M239 to equation M240 (from left to right in each panel, colored according to the blue scale shown), at increments of 0.01.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
