Gene-wise estimates of nucleotide diversity (π), Watterson’s theta (θ), and Tajima’s D were obtained by splitting the whole-genome alignment into individual gene alignments using EMBOSS (48) and analyzing each gene alignment with the Python package EggLib (50). Fst values were also calculated using EggLib, specifying groups by (i) country and (ii) sublineage as identified by TB-Profiler. Within each country, we calculated genomic pairwise Hamming distances using the program snp-dists (https://github.com/tseemann/snp-dists).
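
As a rough illustration of what these per-gene statistics measure, the following minimal Python sketch computes π (here as the mean number of pairwise differences per alignment), Watterson’s θ, and Tajima’s D from a small set of aligned sequences. It does not use EggLib or snp-dists; the function names `hamming` and `diversity_stats`, the skipping of sites that contain non-ACGT characters, and the per-alignment (rather than per-site) scaling are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def diversity_stats(seqs):
    """Return (pi, theta_w, tajimas_d) for equal-length aligned sequences.

    pi and theta_w are per-alignment totals, not per-site values.
    Sites with any character outside A/C/G/T are skipped (an assumption).
    tajimas_d is None when there are no segregating sites.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0
    for column in zip(*(s.upper() for s in seqs)):
        if any(base not in "ACGT" for base in column):
            continue
        counts = [column.count(base) for base in set(column)]
        if len(counts) < 2:
            continue  # monomorphic site
        seg_sites += 1
        # Differing pairs = all pairs minus pairs sharing the same base.
        identical = sum(c * (c - 1) / 2 for c in counts)
        pi += (n_pairs - identical) / n_pairs

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1
    if seg_sites == 0:
        return 0.0, 0.0, None

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return pi, theta_w, (pi - theta_w) / sqrt(variance)


# Toy alignment of four sequences (invented for illustration).
toy = ["ACGT", "ACGA", "ACTA", "ACTA"]
pi, theta_w, tajimas_d = diversity_stats(toy)
mean_hamming = sum(hamming(a, b) for a, b in combinations(toy, 2)) / 6
```

For an alignment without ambiguous characters, π computed this way equals the mean pairwise Hamming distance, which is why `mean_hamming` and `pi` agree on the toy data.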

For each gene, we additionally calculated the number of homoplasies and the number of homoplasic sites using a Fitch downpass algorithm, as implemented in DendroPy (51). Gene-wise parsimony scores were calculated by identifying homoplasic mutations in each gene and summing the Fitch parsimony scores (i.e., the minimum number of independent emergences of each mutation).

To investigate potential alterations of fitness induced by the various mutations in and immediately upstream of the lldD2 gene, we devised a metric for transmissibility associated with each mutation, akin to that in (6). First, homoplasies in this gene were categorized as being in the promoter region, in codon 3, or in codon 253. All homoplasy emergences were mapped to the respective branch in the full phylogenetic time tree. For each homoplasy emergence, we recorded the number of descendants and the number of deme transitions in the corresponding subtree. The reasoning behind this was that ancestors with homoplasic mutations that increase transmissibility should have more descendants and more deme transitions per unit time than ancestors without these mutations. For the latter control group, we randomly extracted subtrees with a height distribution comparable to that of the subtrees with promoter/codon 3/codon 253 mutations, and we labeled this control group as “none” (that is, no homoplasic mutations in lldD2). Figure S2 shows linear models of the number of sampled descendants against subtree height for each mutation category. To test whether these categories have different slopes, we used an analysis of covariance (ANCOVA) procedure. We first set up a simple null model in which the number of descendants depends on subtree height but not on the mutation category (including the none group). In the alternative model, the relationship between subtree height and the number of descendants varies among the four mutation categories.
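
The Fitch parsimony count behind the homoplasy scores described above can be sketched for a single alignment site as follows. This is a minimal stand-alone illustration, not DendroPy's implementation: the nested-tuple tree representation, the restriction to strictly bifurcating trees, and the names `fitch_score` and `is_homoplasic` are assumptions made for the example.

```python
def fitch_score(tree, tip_states):
    """Minimum number of state changes at one site (Fitch downpass).

    tree: nested 2-tuples whose leaves are tip names (strings).
    tip_states: dict mapping each tip name to its state at this site.
    """
    changes = 0

    def downpass(node):
        nonlocal changes
        if isinstance(node, str):  # leaf: its state set is the observed state
            return {tip_states[node]}
        left, right = (downpass(child) for child in node)
        shared = left & right
        if shared:  # children agree on at least one state: no change needed
            return shared
        changes += 1  # disjoint state sets: one change on this pair of branches
        return left | right

    downpass(tree)
    return changes


def is_homoplasic(tree, tip_states):
    """A site is homoplasic if it needs more changes than (states - 1)."""
    n_states = len(set(tip_states.values()))
    return fitch_score(tree, tip_states) > n_states - 1


# Toy tree with six tips (invented for illustration).
toy_tree = ((("a", "b"), ("c", "d")), ("e", "f"))
# 'T' appears in two separate clades, so it must have emerged twice.
convergent = {"a": "T", "b": "A", "c": "T", "d": "A", "e": "A", "f": "A"}
# 'T' is confined to one clade, so a single emergence suffices.
clade_specific = {"a": "T", "b": "T", "c": "A", "d": "A", "e": "A", "f": "A"}
```

On the toy data, `fitch_score(toy_tree, convergent)` exceeds the single change needed for `clade_specific`, which is the signal that marks the first site as homoplasic. Summing such per-site scores over the homoplasic sites of a gene would give a gene-wise score in the spirit of the one described.
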
For both the null and the alternative model, we applied weights based on the number of deme transitions, computed as (number of deme transitions)² + 1, i.e., no transitions gave a weight of 1, one transition a weight of 2, and two transitions a weight of 5. The ANCOVA rejected the null model, showing a significant preference for the per-group model (F test, P = 0.038). Analysis of the individual height-group interaction terms showed that the coefficient for promoter mutations was significantly different from zero, indicating a positive association between lldD2 promoter mutations and transmissibility. Note that, if the weighting by deme transitions is removed, the ANCOVA no longer significantly prefers the alternative model (F test, P = 0.114), and there is no evidence for an interaction between homoplasy group and subtree height in predicting the number of descendants.
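
The model comparison can be sketched as a weighted least-squares F test between two nested linear models. The sketch below uses only the Python standard library to stay self-contained; in practice a statistics package would normally be used. The exact model specification is an assumption: the null model is taken as descendants ~ height, and the alternative adds a group-specific intercept and a group-specific slope. The function names `deme_weight`, `weighted_rss`, and `ancova_f` are hypothetical.

```python
def deme_weight(n_transitions):
    """Weight from the text: (number of deme transitions)^2 + 1."""
    return n_transitions**2 + 1


def weighted_rss(X, y, w):
    """Fit weighted least squares and return the weighted residual sum of squares."""
    p = len(X[0])
    # Augmented normal equations [X'WX | X'Wy].
    A = [
        [sum(wi * r[j] * r[k] for r, wi in zip(X, w)) for k in range(p)]
        + [sum(wi * r[j] * yi for r, yi, wi in zip(X, y, w))]
        for j in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return sum(
        wi * (yi - sum(b * x for b, x in zip(beta, r))) ** 2
        for r, yi, wi in zip(X, y, w)
    )


def ancova_f(heights, descendants, groups, transitions):
    """Return (F, df_num, df_den) comparing the null and per-group models."""
    w = [deme_weight(t) for t in transitions]
    other_levels = sorted(set(groups))[1:]  # first level is the baseline
    X_null = [[1.0, h] for h in heights]
    X_alt = [
        [1.0, h]
        + [float(g == lv) for lv in other_levels]      # group intercepts
        + [h * float(g == lv) for lv in other_levels]  # height-group interactions
        for h, g in zip(heights, groups)
    ]
    rss_null = weighted_rss(X_null, descendants, w)
    rss_alt = weighted_rss(X_alt, descendants, w)
    df_num = len(X_alt[0]) - len(X_null[0])
    df_den = len(heights) - len(X_alt[0])
    f_stat = ((rss_null - rss_alt) / df_num) / (rss_alt / df_den)
    return f_stat, df_num, df_den
```

The P value would then be the upper tail of the F distribution with `df_num` and `df_den` degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df_num, df_den)`. Replacing `deme_weight` with a constant weight of 1 gives the unweighted comparison mentioned in the text.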
