5.4. ENSEMBLE

Gregory-Neal W. Gomes, Mickaël Krzeminski, Ashley Namini, Erik W. Martin, Tanja Mittag, Teresa Head-Gordon, Julie D. Forman-Kay, Claudiu C. Gradinaru

ENSEMBLE 2.1 [31] was used to determine a subset of conformations from an initial pool of conformers created by the statistical coil generator TraDES[55, 56]. All modules were given equal rank, and all other ENSEMBLE parameters were left at their default values.

To balance the risks of over-fitting (under-restraining) and under-fitting (over-restraining), we performed multiple independent ENSEMBLE calculations with 100 conformers (Nconf = 100), as suggested by Ref [58]. We then either averaged the results of the independent calculations or combined them to form ensembles with larger numbers of conformers (e.g., Nconf = 500). To test whether changing the ensemble size affects the structural properties of the ensemble or its agreement with experimental observables, we repeated the Sic1 SAXS+PRE ensemble calculations while varying Nconf (details in the Supporting Information). The polymer properties and the agreement with experimental observables are robust for Nconf in the range ca. 50–100. Below Nconf ≈ 50, agreement with the restraining data (SAXS and PRE) worsens, and the ensembles do not agree with the validating data (smFRET and CSs). Above Nconf ≈ 150, ensembles agree with the experimental observables, but the increased ensemble-to-ensemble variation suggests that 5 replicates (independently calculated ensembles with the same set of restraints) are insufficient to ensure convergence. Larger ensembles are also calculated more quickly (> 72 hours for Nconf = 20 vs. ca. 1 hour for Nconf = 100). Ensembles of 100 conformers were therefore chosen to minimize both the computational cost per ensemble calculation and the ensemble-to-ensemble variation.
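The two ways of using independent replicates described above (averaging results across replicates, or pooling them into one larger ensemble) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes each replicate is represented by a 1D array of some per-conformer property (e.g. radius of gyration in Å), and the function names are hypothetical.

```python
import numpy as np


def replicate_summary(per_replicate_values):
    """Mean and standard error of an ensemble-averaged property across replicates.

    per_replicate_values: list of 1D arrays, one per independent ENSEMBLE
    calculation, each holding a per-conformer property (hypothetical input).
    """
    # Ensemble average within each replicate (e.g. over 100 conformers).
    replicate_means = np.array([np.mean(v) for v in per_replicate_values])
    n_rep = len(replicate_means)
    mean = replicate_means.mean()
    # Standard error of the mean across the independent replicates.
    sem = replicate_means.std(ddof=1) / np.sqrt(n_rep)
    return mean, sem


def pool_replicates(per_replicate_values):
    """Concatenate replicates into one larger ensemble (e.g. 5 x 100 -> 500)."""
    return np.concatenate(per_replicate_values)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for 5 replicates of Nconf = 100 (not real Sic1 data).
    reps = [rng.normal(30.0, 5.0, size=100) for _ in range(5)]
    mean, sem = replicate_summary(reps)
    pooled = pool_replicates(reps)
    print(f"replicate mean = {mean:.2f} +/- {sem:.2f}, pooled Nconf = {pooled.size}")
```

With equal-sized replicates, the mean of the pooled ensemble equals the mean of the replicate means. The spread across replicates (the SEM) is what reveals the ensemble-to-ensemble variation used above to judge convergence.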

NMR data were obtained from BMRB accession numbers 16657 (Sic1) and 16659 (pSic1)[29]. A total of 413 PRE restraints were used, with a typical conservative upper and lower bound of ±5 Å on the PRE distance restraints[57, 82]. This tolerance was used in computing the χ2 metric for the PRE data. CSs were back-calculated using the SHIFTX calculator[51], and a total of 90 Cα CSs and 85 Cβ CSs were used. The CS χ2 metric was computed using both the experimental uncertainty σexp and the uncertainty of the SHIFTX calculator (σSHIFTX = 0.98 ppm for Cα CSs and σSHIFTX = 1.10 ppm for Cβ CSs[51]). CRYSOL[33] with default solvation parameters was used to predict the solution scattering of individual structures from their atomic coordinates. A total of 235 data points from q = 0.02 to q = 0.254 Å−1 were used in SAXS-restrained ensembles. The SAXS χ2 metric was computed using the experimental uncertainty of each data point.
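The three χ2 metrics above differ mainly in which uncertainty enters the denominator. The sketch below shows one plausible reading under stated assumptions: a generic reduced χ2; the ±5 Å PRE tolerance used directly as σ; the CS uncertainty formed by adding σexp and σSHIFTX in quadrature; and a single least-squares scale factor applied to the back-calculated SAXS intensities. The exact normalization inside ENSEMBLE 2.1 is not given here, and all function names are hypothetical. Inputs are assumed to be already ensemble-averaged back-calculated values.

```python
import numpy as np

# SHIFTX uncertainties (ppm) quoted in the text.
SIGMA_SHIFTX = {"CA": 0.98, "CB": 1.10}


def reduced_chi2(obs, calc, sigma):
    """Generic reduced chi^2: mean of squared, uncertainty-scaled residuals."""
    obs, calc, sigma = np.broadcast_arrays(
        np.asarray(obs, float), np.asarray(calc, float), np.asarray(sigma, float)
    )
    return float(np.mean(((obs - calc) / sigma) ** 2))


def chi2_cs(obs, calc, sigma_exp, atom):
    """CS chi^2 with experimental and SHIFTX uncertainties added in quadrature."""
    sigma = np.sqrt(np.asarray(sigma_exp, float) ** 2 + SIGMA_SHIFTX[atom] ** 2)
    return reduced_chi2(obs, calc, sigma)


def chi2_pre(d_obs, d_calc, tol=5.0):
    """PRE chi^2 treating the +/-5 A restraint tolerance as sigma (assumption)."""
    return reduced_chi2(d_obs, d_calc, tol)


def chi2_saxs(i_exp, i_calc, sigma_exp):
    """SAXS chi^2 after fitting one scale factor c to I_calc (assumption)."""
    i_exp, i_calc, sigma_exp = (np.asarray(a, float) for a in (i_exp, i_calc, sigma_exp))
    w = 1.0 / sigma_exp ** 2
    # Weighted least-squares scale minimizing sum(w * (I_exp - c * I_calc)^2).
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc ** 2)
    return reduced_chi2(i_exp, c * i_calc, sigma_exp)
```

Fitting a scale factor for SAXS reflects the fact that experimental intensities are usually on an arbitrary scale. Adding the CS uncertainties in quadrature keeps a predictor with a ~1 ppm error from dominating the fit.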

Accessible volume (AV) simulations[34, 35] were used to predict the sterically accessible space of the dye attached to each conformation via its flexible linker (Figure 1D). These calculations were performed using the AvTraj[34] v0.0.9 and MDTraj[83] v1.9.3 packages in Python 3.7.6. In the quasi-static approximation, the inter-dye distance dynamics within the AVs of a given conformation are quasi-static on the timescale of the donor excited state ($\tau_{DA} \le \tau_{D0} = 3.7$ ns). The per-conformer mean FRET efficiency is therefore $e = \int E(r_{DA})\, P(r_{DA})\, dr_{DA}$, where $P(r_{DA})$ is the distribution of inter-dye distances resulting from the AV simulation of that conformation, and $E(r_{DA}) = \left[1 + (r_{DA}/R_0)^6\right]^{-1}$. End-to-end distance reconfiguration times for IDPs and unfolded proteins are typically in the range 50–150 ns [84], so the end-to-end distance is also quasi-static on the timescale of $\tau_{DA}$. The back-calculated ensemble-averaged efficiency is the linear average of the per-conformer FRET efficiencies, $\langle E\rangle_{ens} = \langle e\rangle$. The quasi-static approximation gives the same $\langle E\rangle_{ens}$, within error, as a more computationally demanding method that combines Monte Carlo simulations of the photon emission process with Brownian dynamics simulations of dye translational diffusion within the accessible volume (details in the Supporting Information). Further support for the quasi-static averaging approach comes from multiparameter E vs. $\tau_{DA}$ histograms (Figure S2), which provide complementary information on inter-dye distances and dynamics, but with different experimental integration times.
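The quasi-static averaging described above can be written in a few lines of NumPy. This sketch does not use the AvTraj API; it assumes the donor and acceptor AVs of each conformer are already available as uniformly weighted point clouds (in Å), so that all donor-acceptor point pairs sample $P(r_{DA})$, and the integral for $e$ becomes a sample mean. The function names, the synthetic point clouds, and the value R0 = 50 Å are hypothetical.

```python
import numpy as np


def fret_efficiency(r, r0):
    """Foerster efficiency E(r) = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (np.asarray(r, float) / r0) ** 6)


def pairwise_distances(donor_xyz, acceptor_xyz):
    """All donor-acceptor point distances, a sample of P(r_DA) for one conformer."""
    diff = donor_xyz[:, None, :] - acceptor_xyz[None, :, :]
    return np.linalg.norm(diff, axis=-1).ravel()


def per_conformer_efficiency(av_distances, r0):
    """e = integral of E(r) P(r) dr, estimated as the mean of E over AV samples."""
    return float(np.mean(fret_efficiency(av_distances, r0)))


def ensemble_efficiency(av_distances_per_conformer, r0):
    """<E>_ens = <e>: linear average of the per-conformer efficiencies."""
    e = [per_conformer_efficiency(d, r0) for d in av_distances_per_conformer]
    return float(np.mean(e))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    confs = []
    # Two hypothetical conformers whose AV centers are 40 A and 70 A apart.
    for offset in (40.0, 70.0):
        donor = rng.normal(0.0, 3.0, size=(200, 3))
        acceptor = rng.normal(0.0, 3.0, size=(200, 3)) + np.array([offset, 0.0, 0.0])
        confs.append(pairwise_distances(donor, acceptor))
    print(ensemble_efficiency(confs, r0=50.0))
```

Note that efficiencies, not distances, are averaged: because E(r) is nonlinear, evaluating E at the mean inter-dye distance would generally give a different value from the quasi-static $\langle e\rangle$.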

The uncertainty in $\langle E\rangle_{ens}$, $\sigma_{E,ens}$, is ca. 0.01 and combines the SEM and the uncertainty in $R_0$. Differences $|\langle E\rangle_{exp} - \langle E\rangle_{ens}| \le \sqrt{\sigma_{E,exp}^2 + \sigma_{E,ens}^2} \approx 0.02$ indicate no disagreement between back-calculated and experimental mean transfer efficiencies. A comprehensive description of the ENSEMBLE calculations, restraints and back-calculations can be found in the Supporting Information.
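The agreement criterion for mean transfer efficiencies amounts to comparing the absolute difference with the two uncertainties added in quadrature. A minimal sketch follows; the function names are hypothetical, and $\sigma_{E,exp} = 0.017$ in the example is an illustrative value chosen so that the combined uncertainty is close to 0.02 when $\sigma_{E,ens} = 0.01$.

```python
import math


def combined_uncertainty(sigma_exp, sigma_ens):
    """Quadrature sum sqrt(sigma_exp^2 + sigma_ens^2)."""
    return math.sqrt(sigma_exp ** 2 + sigma_ens ** 2)


def agrees(e_exp, e_ens, sigma_exp, sigma_ens):
    """True if |<E>_exp - <E>_ens| does not exceed the combined uncertainty."""
    return abs(e_exp - e_ens) <= combined_uncertainty(sigma_exp, sigma_ens)


if __name__ == "__main__":
    # sigma_ens = 0.01 from the text; sigma_exp = 0.017 is illustrative only.
    print(combined_uncertainty(0.017, 0.01))
    print(agrees(0.50, 0.515, 0.017, 0.01), agrees(0.50, 0.53, 0.017, 0.01))
```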
