MCAN dataset. The MCAN dataset was composed of seven draft genomes downloaded from GenBank (CIPT 140010059, NC_015848.1; CIPT 140060008, NC_019950.1; CIPT 140070008, NC_019965.1; CIPT 140070002, NZ_CAOL00000000.1; CIPT 140070005, NZ_CAOM00000000.1; CIPT 140070013, NZ_CAON00000000.1; and CIPT 140070007, NZ_CAOO00000000.1).

MTBC datasets. We downloaded all the available genomes from the studies of Coll et al. (16), Walker et al. (41), Guerra-Assunção et al. (24), and Comas et al. (14). In total, 7977 genomes were originally downloaded. For the dN/dS calculations and phoR variant screening, we used all the downloaded genomes, with the aim of increasing the robustness of the measures and the number of variants per gene. We identified all clusters at a maximum distance of 15 SNPs (a common threshold in M. tuberculosis epidemiology), removed samples potentially coinfected with more than one strain, and then kept just one representative from each cluster. Thus, the final number of genomes for these analyses was 4595. The rest of the analyses were performed on smaller subsets of samples because of computational limitations or the specific features of each dataset. A 1591-sequence subset of the Coll et al. (16) samples was used for the recombination analyses within the MTBC, as it includes global representatives of the MTBC diversity. A smaller subset of these, comprising 219 sequences that are also global representatives, was used for Gubbins because it was not computationally feasible to run the program with more strains. Last, genomes from the Guerra-Assunção et al. (24) dataset, which includes samples taken over a 15-year period in a high-transmission setting (and is thus enriched in transmission clusters), were used for the phoR transmission analysis (n = 1187). Information about all the strains used in this study (including their accession numbers) can be found in table S7.
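
The cluster-based deduplication described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: it assumes that sequences are already aligned and of equal length, that gaps and N are ignored when counting differences, that clusters are formed by single linkage (the linkage method is not stated in the text), and that the representative is simply the first sample name in alphabetical order. The function names `snp_distance` and `cluster_representatives` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences, skipping gaps and Ns."""
    ignore = {"-", "N"}
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def cluster_representatives(seqs: Dict[str, str], max_dist: int = 15) -> Dict[str, List[str]]:
    """Group samples by single linkage at <= max_dist SNPs.

    Returns a mapping {representative: sorted list of cluster members}.
    Assumption: the representative is the alphabetically first member.
    """
    # Union-find structure: every sample starts as its own cluster root.
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair of samples within the distance threshold.
    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_dist:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return {min(members): sorted(members) for members in clusters.values()}


if __name__ == "__main__":
    # Toy alignment; a threshold of 1 SNP is used only to keep the example small.
    toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTTTTTTT"}
    print(cluster_representatives(toy, max_dist=1))
```

The all-against-all comparison is quadratic in the number of samples, so for thousands of genomes one would normally compute distances on variable positions only, or use a dedicated tool.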

The most likely ancestral genome of MTBC. The MTBC ancestor was derived in a previous publication by maximum parsimony and likelihood methods (20). This ancestor is H37Rv-like in terms of genome structural variants, but H37Rv alleles were replaced by those present in the inferred common ancestor of all MTBC lineages.
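
As a minimal illustration of the allele replacement described above, the following Python sketch substitutes single-nucleotide alleles into a reference sequence while leaving its length and structure unchanged. It is an assumption-laden toy, not the procedure from reference 20: the function name `apply_ancestral_alleles`, the 1-based coordinate convention, and the example alleles are all hypothetical.

```python
from typing import Dict


def apply_ancestral_alleles(reference: str, ancestral: Dict[int, str]) -> str:
    """Return a copy of `reference` with the given positions set to ancestral alleles.

    `ancestral` maps 1-based positions to single-nucleotide alleles, so the
    result keeps the coordinates and structure of the reference.
    """
    seq = list(reference)
    for pos, allele in ancestral.items():
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the reference")
        if len(allele) != 1:
            raise ValueError("only single-nucleotide substitutions are supported")
        seq[pos - 1] = allele
    return "".join(seq)


if __name__ == "__main__":
    # Toy reference and made-up ancestral alleles at positions 2 and 8.
    print(apply_ancestral_alleles("ACGTACGT", {2: "T", 8: "A"}))
```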
