For the comparison to the dN/dS approaches shown in Supplementary Fig. S5, we used the dN/dS values available at https://github.com/spond/SARS-CoV-2-variation (Martin et al. 2021) for all SARS-CoV-2 sequences. These values were calculated using the FEL approach (Kosakovsky Pond and Frost 2005). Specifically, the file https://raw.githubusercontent.com/spond/SARS-CoV-2-variation/master/windowed-sites-fel-all.csv contains dN/dS estimates made from large-scale GISAID sequence sets in three-month windows, starting with the earliest sequences from December 2019 and continuing until 31 January 2022. We averaged the dN/dS values for each site over all three-month windows and analyzed these time-window-averaged values. To keep the sequence sets comparable, the comparisons in Supplementary Fig. S5 use fitness estimates from our approach computed using only sequences in the mutation-annotated tree as of 31 January 2022. We restricted the analysis to this timeframe because the large-scale dN/dS analyses at https://github.com/spond/SARS-CoV-2-variation (which use state-of-the-art methods) were only available for that range of dates.
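The per-site averaging over time windows can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-window FEL estimates have already been parsed into (site, window, dN/dS) triples, since the actual column names in windowed-sites-fel-all.csv are not given here, and the toy values are hypothetical. Skipping missing (None or NaN) estimates is also an assumption about how windows without an estimate were treated.

```python
import math
from collections import defaultdict


def average_dnds_over_windows(records):
    """Average dN/dS for each site over all time windows.

    `records` is an iterable of (site, window, dnds) tuples, one per site per
    three-month window (a hypothetical input layout). Entries whose dnds is
    None or NaN are skipped (an assumed handling of missing windows), so a
    site with no valid estimate in any window is absent from the result.
    Returns a dict mapping site -> mean dN/dS across its valid windows.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for site, _window, dnds in records:
        if dnds is None or math.isnan(dnds):
            continue
        sums[site] += dnds
        counts[site] += 1
    return {site: sums[site] / counts[site] for site in sums}


# Hypothetical toy values for two sites across three windows.
toy = [
    (1, "2019-12", 0.5), (1, "2020-03", 1.0), (1, "2020-06", 1.5),
    (2, "2019-12", 0.2), (2, "2020-03", float("nan")), (2, "2020-06", 0.4),
]
print(average_dnds_over_windows(toy))
```

Averaging only over windows that have an estimate keeps sites that are unobserved in early windows from being pulled toward zero; whether the original analysis did the same is not stated.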
The dN/dS ratios provide only a single number for each site, so they cannot be compared directly to either the mutation-effect estimates or the deep mutational scanning, both of which estimate the effects of individual amino-acid mutations. We therefore computed site-summary metrics for the mutation-effect estimates and for the deep mutational scanning as the average effect of all measured amino-acid mutations at each site, excluding stop codons. The correlations in Supplementary Fig. S5 are between the dN/dS values and these site-summary metrics.
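A site-summary metric of this kind can be sketched as below. This is a hypothetical illustration rather than the authors' implementation: it assumes the effects are held in a dict keyed by (site, mutant amino acid) containing only non-wildtype amino acids, and that stop codons are written as "*", a common convention that may differ from the original data files.

```python
from collections import defaultdict


def site_summary(mutation_effects):
    """Average the effects of all measured amino-acid mutations at each site.

    `mutation_effects` maps (site, mutant_aa) -> effect (a hypothetical
    layout holding only non-wildtype amino acids). Stop codons, written here
    as "*", are excluded from the average.
    Returns a dict mapping site -> mean effect.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (site, mutant_aa), effect in mutation_effects.items():
        if mutant_aa == "*":  # exclude stop codons
            continue
        totals[site] += effect
        counts[site] += 1
    return {site: totals[site] / counts[site] for site in totals}


# Hypothetical effects; the stop codon at site 10 does not enter the average.
effects = {
    (10, "A"): -1.0, (10, "K"): -3.0, (10, "*"): -8.0,
    (11, "G"): 0.5,
}
print(site_summary(effects))  # {10: -2.0, 11: 0.5}
```

Averaging only over measured mutations means sites with few measured mutations get noisier summaries; the text does not say whether a minimum count was required.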
We also compared both our mutation-effect estimates and the spike deep mutational scanning measurements (Dadonaite et al. 2023) to predictions from three other algorithms: those of Maher et al. (2022), Thadani et al. (2023), and Rodriguez-Rivas et al. (2023).
These comparisons are shown in Supplementary Fig. S6. The Maher et al. (2022) and Thadani et al. (2023) studies report mutation-level predictions, so they are compared directly to the deep mutational scanning and to our mutation-effect estimates. Rodriguez-Rivas et al. (2023) report only site-level metrics, so their predictions are compared to site-summary metrics, as in the dN/dS analysis.
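The difference between mutation-level and site-level comparisons comes down to which key the two sets of values are matched on before correlating. The sketch below assumes Pearson correlation (the coefficient used for Supplementary Fig. S6 is not stated here) and hypothetical dict inputs. Keys are (site, mutant amino acid) pairs for the Maher et al. and Thadani et al. comparisons, and bare site numbers for the Rodriguez-Rivas et al. comparison, where the other side would be site summaries computed as described for the dN/dS analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Requires at least two values with nonzero variance on each side.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlate_shared(a, b):
    """Correlate two dicts over the keys they share.

    Keys are (site, mutant_aa) for mutation-level comparisons or a site
    number for site-level comparisons; keys present in only one dict are
    ignored. Returns (r, number_of_shared_keys).
    """
    shared = sorted(set(a) & set(b))
    r = pearson_r([a[k] for k in shared], [b[k] for k in shared])
    return r, len(shared)


# Mutation-level: hypothetical predictor scores vs. hypothetical measurements.
predicted = {(1, "A"): 0.1, (1, "K"): -0.5, (2, "G"): -1.2, (3, "L"): 0.0}
measured = {(1, "A"): 0.2, (1, "K"): -0.4, (2, "G"): -1.5}
r, n = correlate_shared(predicted, measured)
print(f"r = {r:.3f} over {n} shared mutations")
```

Only the shared keys enter the correlation, so each comparison is restricted to mutations (or sites) that both methods score.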