Mutational signatures of substitutions on a trinucleotide sequence context were inferred from sets of somatic mutation counts using the sigfit (v.2.1.0) R package³¹. Initially, signature extraction was performed de novo for a range of numbers of signatures (N = 2,...,10), using counts of mutations grouped per sample, per individual and per species. To account for differences in sequence composition across samples, and especially across species, mutational opportunities per sample, per individual and per species were calculated from the reference trinucleotide frequencies across the analysable genome of each sample (see ‘Calculation of analysable genome size’), and supplied to the ‘extract_signatures’ function in sigfit. The ‘convert_signatures’ function in sigfit was subsequently used to transform the extracted signatures to a human-relative representation (Fig. 2b), by scaling the mutation probability values using the corresponding human genome trinucleotide frequencies. The best-supported number of signatures, on the basis of overall goodness-of-fit³¹ and consistency with known COSMIC signatures (https://cancer.sanger.ac.uk/signatures/), was found to be N = 3. The cleanest deconvolution of the three signatures was achieved when using the mutation counts grouped by species, rather than by sample or individual. The three extracted signatures (labelled SBSA, SBSB and SBSC) were found to be highly similar to COSMIC signatures SBS1 (cosine similarity 0.96), SBS5 (0.89) and SBS18 (0.91), respectively. These signatures were independently validated using the MutationalPatterns (v.1.12.0) R package⁶⁸, which produced comparable signatures (respective cosine similarities 0.999, 0.98 and 0.89).
This de novo signature extraction approach, however, failed to deconvolute signatures SBSA and SBSB entirely from each other, resulting in a general overestimation of the exposure to SBSA (Extended Data Fig. 15). To obtain more accurate estimates of signature exposure, the deconvolution was repeated using an alternative approach that combines signature fitting and extraction in a single inference process³¹. More specifically, the ‘fit_extract_signatures’ function in sigfit was used to fit COSMIC signature SBS1 (retrieved from the COSMIC v.3.0 signature catalogue; https://cancer.sanger.ac.uk/signatures/) to the mutation counts grouped by species (with species-specific mutational opportunities), while simultaneously extracting two additional signatures de novo (SBSB and SBSC). Before this operation, COSMIC SBS1 was transformed from its human-relative representation to a genome-independent representation using the ‘convert_signatures’ function in sigfit. By completely deconvoluting SBS1 and SBSB, this approach yielded a version of SBSB that was more similar to COSMIC SBS5 (cosine similarity 0.93); the similarity of SBSC to COSMIC SBS18 was the same under both approaches (0.91).
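As an illustration of the change of representation described above, the following Python sketch converts a signature between a genome-relative and a genome-independent form by dividing or multiplying each mutation probability by the corresponding opportunity and renormalising. It is a minimal sketch of the general idea and not the sigfit implementation; the function names, the four-category toy signature and all frequency values are hypothetical (real signatures have 96 categories, with each trinucleotide frequency shared by its three possible substitutions).

```python
def normalise(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def to_genome_independent(signature, opportunities):
    """Divide out the opportunities of the genome the signature is expressed in."""
    return normalise([s / o for s, o in zip(signature, opportunities)])


def to_genome_relative(signature, opportunities):
    """Impose the opportunities of a target genome on a genome-independent signature."""
    return normalise([s * o for s, o in zip(signature, opportunities)])


# Hypothetical toy values: four mutation categories instead of 96.
human_freqs = [0.4, 0.3, 0.2, 0.1]        # assumed human opportunities
other_freqs = [0.25, 0.25, 0.25, 0.25]    # assumed opportunities of another genome
signature_human = [0.5, 0.3, 0.1, 0.1]    # signature in human-relative form

independent = to_genome_independent(signature_human, human_freqs)
round_trip = to_genome_relative(independent, human_freqs)
other_species = to_genome_relative(independent, other_freqs)
```

Converting to the genome-independent form and back to the human-relative form should return the original signature, which provides a simple consistency check.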
a, Mutational signatures inferred de novo from the species mutational spectra shown in Fig. 2a. Signatures are shown in a human-genome-relative representation. SBSA is the de novo equivalent of COSMIC signature SBS1 (Fig. 2b). b, Exposure of each sample to each of the mutational signatures shown in a. Samples are arranged horizontally as in Fig. 1b. c, Regression of signature-specific mutation burdens on individual age for human, mouse and naked mole-rat samples. Regression was performed using mean mutation burden per individual. Shaded areas indicate 95% confidence intervals of the regression lines. BW, black-and-white; H, harbour; N, naked; RT, ring-tailed.
Finally, the inferred signatures were re-fitted to the mutational spectrum of each sample (using the ‘fit_signatures’ function in sigfit with sample-specific mutational opportunities) to estimate the exposure of each sample to each signature. The fitting of the three signatures yielded spectrum reconstruction similarity values (measured as the cosine similarity between the observed mutational spectrum and a spectrum reconstructed from the inferred signatures and exposures) with median 0.98 and interquartile range 0.96–0.99. Although the purely de novo extraction approach and the ‘fitting and extraction’ approach yielded comparable versions of signatures SBSB and SBSC, the fixing of COSMIC SBS1 in the latter approach resulted in lower SBS1 exposures and higher SBSB exposures in most samples, owing to the cleaner deconvolution of these two signatures (Fig. 2, Extended Data Fig. 15).
To examine potential variation in the spectrum of signature SBS5 across species, the following procedure was conducted for each species: individual-specific mutation counts and mutational opportunities were calculated for each individual in the species, and the ‘fit_extract_signatures’ function was used to fit COSMIC signatures SBS1, SBS18 and SBS34 (transformed to a genome-independent representation using the ‘convert_signatures’ function) to the mutational spectra of each individual, while simultaneously inferring one additional signature (corresponding to signature SBS5 as manifested in that species; Extended Data Fig. 6).
To assess the presence in non-human colorectal crypts of mutational signatures caused by APOBEC or colibactin, which have been previously observed in human crypts⁸, we used an expectation–maximization algorithm for signature fitting, in combination with likelihood ratio tests (LRTs). More specifically, for each non-human sample, we tested for exposure to colibactin (signature SBS88, COSMIC v.3.2) by comparing the log-likelihoods of (i) a model fitting COSMIC signatures SBS1, SBS5, SBS18, SBS34 and SBS88, and (ii) a reduced model fitting only the first four signatures. Benjamini–Hochberg multiple-testing correction was applied to the P values that resulted from the LRTs, and colibactin exposure was considered significant in a sample if the corresponding corrected q-value was less than 0.05. We followed the same approach to assess exposure to APOBEC (SBS2 and SBS13), using two separate sets of LRTs for models including either SBS2 or SBS13, in addition to SBS1, SBS5, SBS18 and SBS34. APOBEC exposure was considered significant in a sample if its q-values for the models including SBS2 and SBS13 were both less than 0.05. This analysis identified 1/180 crypts with significant exposure to each of colibactin and APOBEC (although the evidence for the presence of the relevant signatures in these two crypts was not conclusive). To test for depletion of colibactin or APOBEC exposure in non-human crypts relative to human crypts, we first applied the LRT-based method described above to a published set of 445 human colorectal crypts⁸, identifying 92 colibactin-positive and 9 APOBEC-positive crypts. We then compared the numbers of colibactin- and APOBEC-positive crypts in the human and non-human sets using two separate Fisher’s exact tests (‘fisher.test’ function in R). This revealed the difference in colibactin exposure to be highly significant (P = 7 × 10⁻¹⁴), unlike the difference in APOBEC exposure (P = 0.30).
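The testing procedure described above can be sketched in Python using only the standard library. This is a hedged illustration and not the code used in the study: the expectation–maximization updates assume a multinomial mixture of fixed signatures, the LRT statistic is assumed to be compared against a chi-squared distribution with one degree of freedom (the reference distribution is not stated in the text), and all function names are hypothetical. The Fisher's exact tests use the crypt counts reported above.

```python
import math


def em_fit(counts, signatures, n_iter=500):
    """Maximum-likelihood exposures of one sample to fixed signatures (EM).
    Assumes every category with mutations has non-zero probability
    under at least one signature."""
    k = len(signatures)
    exposures = [1.0 / k] * k
    total = sum(counts)
    for _ in range(n_iter):
        new = [0.0] * k
        for c, n in enumerate(counts):
            if n == 0:
                continue
            mix = sum(exposures[j] * signatures[j][c] for j in range(k))
            for j in range(k):
                new[j] += n * exposures[j] * signatures[j][c] / mix
        exposures = [v / total for v in new]
    return exposures


def log_likelihood(counts, signatures, exposures):
    """Multinomial log-likelihood, omitting the constant term."""
    ll = 0.0
    for c, n in enumerate(counts):
        if n > 0:
            mix = sum(e * sig[c] for e, sig in zip(exposures, signatures))
            ll += n * math.log(mix)
    return ll


def lrt_p_value(counts, full_sigs, reduced_sigs):
    """LRT of full versus reduced model (assumed chi-squared, 1 d.f.)."""
    ll_full = log_likelihood(counts, full_sigs, em_fit(counts, full_sigs))
    ll_red = log_likelihood(counts, reduced_sigs, em_fit(counts, reduced_sigs))
    stat = max(0.0, 2.0 * (ll_full - ll_red))
    return math.erfc(math.sqrt(stat / 2.0))  # chi-squared survival, 1 d.f.


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted q-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]:
    sums the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical four-category signatures; the third plays the role of SBS88.
s1, s2, s3 = [0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]
p_absent = lrt_p_value([50, 50, 12, 12], [s1, s2, s3], [s1, s2])
p_present = lrt_p_value([40, 40, 12, 60], [s1, s2, s3], [s1, s2])
q_values = benjamini_hochberg([p_absent, p_present])

# Counts from the text: positive and negative crypts, non-human then human.
p_colibactin = fisher_exact_two_sided(1, 179, 92, 353)
p_apobec = fisher_exact_two_sided(1, 179, 9, 436)
```

With the reported counts, the two Fisher's exact tests should give P values consistent with those stated in the text, that is, highly significant for colibactin and non-significant for APOBEC.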
Mutational spectra of somatic indels identified in each species were generated using the ‘indel.spectrum’ function in the Indelwald tool for R (24/09/2021 version; https://github.com/MaximilianStammnitz/Indelwald).