Assessing Precision and Recall Using Simulated Data

Lina Kloub, Sean Gosselin, Matthew Fullmer, Joerg Graf, Johann Peter Gogarten, Mukul S Bansal

We performed an extensive simulation study to systematically assess how a wide range of parameters affects the precision and recall of HoMer. These parameters include HGT rates, HMGT rates, HMGT size, number of contigs (i.e., genome assembly fragmentation), HGT inference error, and the HMGT inference parameters (i.e., the x,y,z values). Details of this analysis appear in the supplementary section Assessing HoMer Using Simulated Data, Supplementary Material online. The analysis shows that HoMer achieves high precision and recall when applied to simulated data that roughly mimic the average characteristics of our real Aeromonas data set (supplementary table S16, Supplementary Material online). It also shows that our default x,y,z values of 3,4,1 provide a good overall trade-off between precision and recall. The number of HGTs has the largest impact on precision, which can degrade rapidly as the number of HGTs increases, particularly under the more permissive HMGT inference setting of 2,3,1 (supplementary table S17, Supplementary Material online). HGT inference error, in turn, has the largest impact on recall, with recall decreasing consistently as HGT inference error increases (supplementary table S19, Supplementary Material online).
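Precision is the fraction of inferred events that are true, and recall is the fraction of true (simulated) events that were inferred. The sketch below computes both. It is a generic illustration, not HoMer's actual evaluation code. It assumes, hypothetically, that each HMGT event is represented as a hashable tuple of (donor, recipient, frozenset of transferred genes) and that an inferred event counts as correct only if it matches a simulated event exactly. The matching criteria used in the actual study are described in the supplementary material and may be more lenient (for example, partial gene-set overlap).

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    inferred: Iterable[Hashable], true: Iterable[Hashable]
) -> Tuple[float, float]:
    """Return (precision, recall) of inferred events against simulated truth.

    Events are compared by exact equality (an assumption for illustration).
    Convention used here: precision is 1.0 when nothing was inferred
    (no false positives), and recall is 1.0 when there was nothing to find.
    """
    inferred_set = set(inferred)
    true_set = set(true)
    true_positives = len(inferred_set & true_set)
    precision = true_positives / len(inferred_set) if inferred_set else 1.0
    recall = true_positives / len(true_set) if true_set else 1.0
    return precision, recall


# Hypothetical events: (donor, recipient, genes transferred together).
true_events = {
    ("A", "B", frozenset({"g1", "g2", "g3"})),
    ("C", "D", frozenset({"g4", "g5"})),
}
inferred_events = {
    ("A", "B", frozenset({"g1", "g2", "g3"})),  # correct inference
    ("E", "F", frozenset({"g6", "g7"})),        # false positive
}

p, r = precision_recall(inferred_events, true_events)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.50, recall=0.50
```

In a simulation sweep of the kind described above, one would call such a function once per simulated replicate and parameter setting (e.g., per HGT rate or per x,y,z choice) and average the results.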
