Pangenomes for the entire genome data set (1,684 strains) and the clade A data set (1,644 strains) were created using Roary (40) with default settings. Core gene alignments were generated using the --mafft option in Roary and comprised 859 genes for the entire data set and 978 genes for the clade A data set. To detect recombination events and remove them from the core genome alignment, we used BratNextGen with default settings, including 20 hidden Markov model (HMM) iterations, 100 permutations run in parallel on a cluster, and a 5% significance level, similar to the settings used in earlier publications (41, 42). To determine sequence clusters (SCs) in the core genome alignment from which significant recombinations had been removed, we performed 5 estimation runs of the hierBAPS method (43) with 3 levels of hierarchy and the prior upper bound for the number of clusters ranging from 50 to 200. All runs converged to the same estimate of the posterior mode clustering. We used the second level of hierarchy (postBNGBAPS.2) to define the SCs in our collection. To estimate a phylogenetic tree, we used RAxML (44) with the GTR+Gamma model on the core gene alignment stripped of recombination. Bootstrapping was disabled in RAxML because of the extremely long runtime.
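The two command-line steps of this workflow (Roary with a MAFFT core gene alignment, and a single RAxML search under GTR+Gamma without bootstrapping) can be sketched as follows. This is a minimal illustration, not the authors' actual invocation: the executable names (`roary`, `raxmlHPC`), the file and directory names, the thread count, and the random seed are all assumptions, and the function names are hypothetical helpers. The sketch only builds the argument lists and does not run anything. The BratNextGen and hierBAPS steps are not covered here.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def roary_command(gff_files: Sequence[Path], out_dir: Path, threads: int = 8) -> List[str]:
    """Build a Roary call that also writes a MAFFT-based core gene alignment.

    Thread count and output directory are illustrative assumptions.
    """
    if not gff_files:
        raise ValueError("Roary needs at least one GFF3 annotation file")
    return [
        "roary",
        "-e", "--mafft",        # request a core gene alignment, aligned with MAFFT
        "-p", str(threads),     # number of threads
        "-f", str(out_dir),     # output directory
        *[str(p) for p in gff_files],
    ]


def raxml_command(alignment: Path, run_name: str, seed: int = 12345) -> List[str]:
    """Build a single RAxML tree search under GTR+Gamma.

    No bootstrap flags are passed, so no bootstrap replicates are computed.
    The seed value is an arbitrary assumption.
    """
    return [
        "raxmlHPC",
        "-s", str(alignment),   # input alignment (recombination already removed)
        "-n", run_name,         # suffix used for RAxML output files
        "-m", "GTRGAMMA",       # GTR substitution model with Gamma rate heterogeneity
        "-p", str(seed),        # seed for the parsimony starting tree
    ]


if __name__ == "__main__":
    # Hypothetical input names, for illustration only.
    gffs = [Path("strain_001.gff"), Path("strain_002.gff")]
    print(shlex.join(roary_command(gffs, Path("roary_out"))))
    print(shlex.join(raxml_command(Path("core_no_recomb.aln"), "cladeA")))
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` once the paths point at real annotation files and a recombination-filtered alignment.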