Assessing Precision and Recall Using Simulated Data

Lina Kloub; Sean Gosselin; Matthew Fullmer; Joerg Graf; Johann Peter Gogarten; Mukul S Bansal

This method is extracted from research article: Mol Biol Evol, Feb 2021

Systematic Detection of Large-Scale Multigene Horizontal Transfer in Prokaryotes

DOI: 10.1093/molbev/msab043

We performed an extensive simulation study to systematically assess how a wide range of parameters affect the precision and recall of HoMer: HGT rates, HMGT rates, HMGT size, number of contigs (i.e., genome assembly fragmentation), HGT inference error, and HMGT inference parameters (i.e., $\langle x, y, z \rangle$ values). Details of this analysis appear in the supplementary section "Assessing HoMer Using Simulated Data," Supplementary Material online. The analysis shows that HoMer achieves high precision and recall when applied to simulated data that roughly mimic the average characteristics of our real Aeromonas data set (supplementary table S16, Supplementary Material online), and that our default $\langle x, y, z \rangle$ values of $\langle 3, 4, 1 \rangle$ provide a good overall trade-off between precision and recall. We also find that the number of HGTs has the largest impact on precision, which can degrade rapidly as the number of HGTs increases, particularly under the more permissive HMGT inference parameter setting of $\langle 2, 3, 1 \rangle$ (supplementary table S17, Supplementary Material online). The simulation study also shows that HGT inference error has the largest impact on recall, with recall decreasing consistently as HGT inference error increases (supplementary table S19, Supplementary Material online).
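For readers who want to reproduce this kind of evaluation, the two metrics can be sketched as follows. This is only an illustration, not the authors' evaluation code: the event representation (donor, recipient, set of gene families), the exact-match criterion, and the function name `precision_recall` are all assumptions made here, and the criteria actually used to match inferred HMGTs to simulated ones are described in the supplementary material.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical event representation (an assumption for illustration):
# (donor, recipient, gene families transferred together).
Event = Tuple[str, str, FrozenSet[str]]


def precision_recall(true_events: Iterable[Event],
                     inferred_events: Iterable[Event]) -> Tuple[float, float]:
    """Return (precision, recall) of inferred events against simulated truth.

    An inferred event counts as a true positive only if it matches a
    simulated event exactly; this strict criterion is an assumption.
    """
    true_set = set(true_events)
    inferred_set = set(inferred_events)
    true_positives = len(true_set & inferred_set)

    # Convention chosen here: report 0.0 when a denominator is empty.
    precision = true_positives / len(inferred_set) if inferred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


# Toy data: three simulated events, four inferred events, two in common.
simulated = [
    ("A", "B", frozenset({"g1", "g2", "g3"})),
    ("C", "D", frozenset({"g4", "g5", "g6"})),
    ("A", "D", frozenset({"g7", "g8", "g9"})),
]
inferred = [
    ("A", "B", frozenset({"g1", "g2", "g3"})),
    ("C", "D", frozenset({"g4", "g5", "g6"})),
    ("B", "C", frozenset({"g10", "g11", "g12"})),
    ("D", "A", frozenset({"g13", "g14", "g15"})),
]

precision, recall = precision_recall(simulated, inferred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, two of the four inferred events are correct (precision 0.50) and two of the three simulated events are recovered (recall 0.67). Spurious inferences lower precision, while missed events lower recall, which is consistent with the trends reported above for increasing numbers of HGTs and increasing HGT inference error.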

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
