We consider a gene to be lateral if it contains, or is overlapped by, at least one inferred lateral segment that meets two distinct length thresholds. First, the inferred lateral segment must itself contain at least a specified minimum number of k-mers (including k-mers in any intervening gaps up to G = 2k): 10 for the BA dataset, 100 for EB, 500 for ECS (Cong et al., 2016b) and 10 for BAC. These values approximate the average length of all LGT detections in each dataset, thereby controlling in part for differences in sequence diversity among the datasets. Second, the overlap between the segment and the gene must extend for at least a specified minimum number of k-mers (again including k-mers in gaps up to G = 2k): 10 for BA, 100 for ECS and EB (Cong et al., 2016b) and 10 for BAC.
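The two-threshold rule can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that genes and inferred lateral segments are given as half-open intervals in k-mer index coordinates, so that an interval's length equals its number of k-mers, and that gaps up to G = 2k have already been merged into each segment. The names `Interval`, `THRESHOLDS` and `is_lateral` are hypothetical; only the threshold values come from the text.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """Half-open interval in k-mer index coordinates (an assumption)."""
    start: int  # index of the first k-mer (inclusive)
    end: int    # index one past the last k-mer (exclusive)


# Per-dataset thresholds from the text:
# (minimum k-mers in the segment, minimum k-mers in the overlap).
THRESHOLDS = {
    "BA": (10, 10),
    "EB": (100, 100),
    "ECS": (500, 100),
    "BAC": (10, 10),
}


def is_lateral(gene: Interval, segments: Iterable[Interval], dataset: str) -> bool:
    """Return True if at least one segment meets both length thresholds."""
    min_segment, min_overlap = THRESHOLDS[dataset]
    for seg in segments:
        segment_len = seg.end - seg.start
        # Overlap is zero or negative when the intervals are disjoint.
        overlap_len = min(gene.end, seg.end) - max(gene.start, seg.start)
        if segment_len >= min_segment and overlap_len >= min_overlap:
            return True
    return False
```

For example, under these assumptions a gene spanning k-mers 1000 to 2000 and a 150-k-mer segment that overlaps its last 50 k-mers would be called lateral under the BA thresholds but not under the EB thresholds, because the overlap falls short of 100 k-mers.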