Sequence variation in the data set was visualized by using a novel MDS algorithm (45) that represents each sequence as a point in three-dimensional space, with the location of each point being determined by the pairwise differences between that sequence and all other sequences in the data set. To compare the MDS visualization of sequence variation with the inferred evolutionary relationships between sequences, we interpolated the gene tree into the visualization using a neighbor-joining-based algorithm that we developed previously (46). For clarity, not all sequences were included in this gene tree; instead, the data set was preclustered by using AbundantOTU (39) with a 99% sequence similarity threshold. The resulting sequences still represented all genera in the initial data set. We aligned the sequences with MUSCLE v.3.8.3 (47) and created an unrooted maximum likelihood gene tree using RAxML v.8.0.0 (48) with the generalized time-reversible (GTR) gamma model of nucleotide substitution and 1,000 rapid bootstrap replicates to determine statistical confidence in the tree topology; all subsequent multiple-sequence alignments and gene trees were constructed in the same way, except that the gene trees were rooted by including an outgroup sequence.
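The idea of placing each sequence in three-dimensional space according to its pairwise differences from all other sequences can be sketched as follows. This is a minimal illustration only: it substitutes classical (Torgerson) MDS on uncorrected p-distances for the novel MDS algorithm of reference 45, which is not described here. The function names (`p_distance`, `distance_matrix`, `classical_mds`) are hypothetical, and the power-iteration eigen-solver assumes the leading eigenvalues of the centered matrix are positive and well separated, which holds for near-Euclidean distance matrices.

```python
import math
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances for a list of aligned sequences."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = p_distance(seqs[i], seqs[j])
    return dist


def classical_mds(dist, dims=3, iters=500, seed=0):
    """Classical MDS (an assumption, not the method of ref. 45): one point per sequence."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the current leading eigenvector of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        degenerate = False
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                degenerate = True
                break
            v = [x / norm for x in w]
        if degenerate:
            break
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no further positive eigenvalue; remaining axes stay at zero
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]  # made-up aligned sequences
    for point in classical_mds(distance_matrix(toy)):
        print(point)
```

Each row of the returned list is the (x, y, z) position of one sequence; sequences with few pairwise differences land close together.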
To evaluate the sequence similarity between isolates from three clades that contained different amounts of sequence variation, we constructed both a rooted gene tree and a heat map of pairwise sequence divergence for each clade. For the clades corresponding to the Gigasporaceae and Rhizoglomus, the gene trees were made by using all aligned sequences. For the clade corresponding to the genera Claroideoglomus and Entrophospora, the gene tree was made by using representative sequences for each species and each evolutionary history (sequence group) (see Results); these representatives were created by clustering sequences using AbundantOTU (39) with a 97% sequence similarity threshold. Each of the three gene trees was rooted by using a sequence from the most closely related clade in the data set (Fig. 1). Clades on each gene tree that represented a single species were collapsed for clarity and colored by species. For each of the three clades, heat maps of pairwise sequence divergence were made from a random subsample of 240 sequences, split evenly among all species and geographic isolates and, for Claroideoglomus and Entrophospora, both sequence groups. Pairwise genetic distances for the aligned sequences were calculated with MEGA 7 (49), using the Kimura 2-parameter model of nucleotide substitution (50) with gamma rate distribution. This was the best-fitting substitution model overall for the three clades, as determined by the substitution model-fitting function in MEGA 7; the use of different substitution models gave qualitatively identical results.
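For readers unfamiliar with the distance measure, the Kimura 2-parameter calculation underlying such a heat map can be sketched as below. This is an illustrative reimplementation, not MEGA 7 code. With P and Q the proportions of transitional and transversional differences, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q); with gamma-distributed rates of shape a, it becomes d = (a/2)[(1 - 2P - Q)^(-1/a) + (1/2)(1 - 2Q)^(-1/a) - 3/2]. The function names and the `gamma_shape` argument are hypothetical, and the shape value used in the analysis is not stated here, so none is assumed.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b, gamma_shape=None):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    gamma_shape is a hypothetical parameter: None gives the plain K2P distance,
    a positive number gives the gamma-corrected form.
    """
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    if gamma_shape is None:
        return -0.5 * math.log(w1) - 0.25 * math.log(w2)
    a = gamma_shape
    return (a / 2.0) * (w1 ** (-1.0 / a) + 0.5 * w2 ** (-1.0 / a) - 1.5)


def pairwise_matrix(seqs, gamma_shape=None):
    """Symmetric matrix of K2P distances, i.e. the values a heat map would display."""
    n = len(seqs)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mat[i][j] = mat[j][i] = k2p_distance(seqs[i], seqs[j], gamma_shape)
    return mat
```

Calling `pairwise_matrix` on the subsampled, aligned sequences of one clade would give the matrix to plot; the gamma-corrected distance is never smaller than the uncorrected one for the same pair.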
Correspondence between the rRNA gene tree (left) and two different views of the multidimensional scaling (MDS) visualization (right) for sequences from 21 species of AM fungi colored by genus. Branches within genera on the gene tree are collapsed for clarity, with the bootstrap value for each of these genus-level clades being noted on its branch; in contrast, each sequence is represented as a point in the MDS visualization.