We used trRosetta to predict the structures of the following components: FANCA, FANCB, FANCC, FANCE, FANCF, FANCG, FANCL and Fanconi anemia core complex associated protein 100 (FAAP100). trRosetta model building is a two-step process: in the first step a deep residual convolutional neural network generates inter-residue distance and orientation predictions, and in the second step these predictions are used to model the protein of interest (Yang et al., 2020). The MSAs were used as inputs to the neural network, which predicts distance distributions and orientation information for all residue pairs. These predictions were then used as input to a custom Rosetta-based folding protocol. This protocol randomly sets the backbone torsions and uses random subsets of the predictions as restraints for a quasi-Newton energy minimization in torsion space (MinMover), carried out in the centroid representation (Rosetta's reduced residue representation). For each domain, 150 centroid models were generated and each model was then refined with the Rosetta full-atom FastRelax protocol. The refined models were ranked by their REF2015 score, and the top three models were selected and manually inspected. For all domains except the CC domains and FAAP100 α/β+CtH we observed a well converged structure; representative structures from this modeling are shown in Fig. 3(a).
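As a rough illustration of the sampling and selection logic described above, the following Python sketch draws a random subset of predicted restraints and ranks refined models by score. The helper names, the `keep_fraction` parameter and the example values are hypothetical and are not part of trRosetta or Rosetta; the sketch assumes only that a lower Rosetta energy indicates a better model.

```python
import random


def sample_restraint_subset(restraints, keep_fraction, rng):
    """Return a random subset of the predicted restraints.

    `restraints` is a list of arbitrary restraint records (here, residue-pair
    tuples); `keep_fraction` is a hypothetical parameter, not a trRosetta option.
    """
    n_keep = max(1, round(keep_fraction * len(restraints)))
    return rng.sample(restraints, n_keep)


def select_top_models(scores, n_top=3):
    """Rank models by full-atom score (lower is better) and keep the best n_top.

    `scores` maps a model name to its score after refinement.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [name for name, _ in ranked[:n_top]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical residue-pair restraints (i, j).
    restraints = [(i, i + 5) for i in range(10)]
    print(sample_restraint_subset(restraints, 0.5, rng))

    # Hypothetical scores for four refined models.
    scores = {"m1": -310.2, "m2": -355.0, "m3": -298.7, "m4": -340.1}
    print(select_top_models(scores))
```

In the protocol described here, the ranking step would be applied to the 150 refined models of each domain, with the three best-scoring models passed on to manual inspection.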
The original trRosetta pipeline was unable to generate converged models for the sequence between the β-propeller regions and the β-sandwich regions of FAAP100 and FANCB, or for the sequence of FAAP100 α/β+CtH. We therefore employed a modified version of the network which, in addition to the MSA, also used information on the top 50 putative structural homologs identified by HHsearch against the PDB100 database of templates. HHsearch hits were converted into 2D network inputs by extracting pairwise distances and orientations from the template structure for the matched positions only. Additionally, the positional (1D) similarity and confidence scores provided by HHsearch, as well as the backbone torsions, were used; these 1D features were tiled along both axes of the 2D inputs and stacked with them to produce the final 2D feature matrix. Features for all unmatched positions were set to zero. Templates were first processed independently by one round of 2D convolutions and were then merged into a single 2D feature matrix using a pixel-wise attention mechanism. This processed feature matrix was then concatenated with the features extracted from the MSA as in the original trRosetta network; the architecture of the upstream part of the network remained unchanged. For the CC domains this improved the quality of the models for the β-propellers as well as the models for the extended helices C-terminal to the β-propellers. For FAAP100 α/β+CtH we modeled FAAP100 CC+βsand+α/β+CtH with this modified version and found strong convergence for all of the domains. The coiled-coil domains of FANCB and FAAP100, and FAAP100 α/β and CtH, were manually extracted for use in the next stages. The results from this modified version of trRosetta are shown in Supplementary Fig. S2(b).
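The template featurization and merging steps described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the network's implementation: the function names, array shapes and channel counts are hypothetical, a residue pair is treated as unmatched if either residue is unmatched, and the attention logits are passed in directly, whereas in the real network they would be learned.

```python
import numpy as np


def build_template_features(pair_feats, pos_feats, matched):
    """Combine 2D and tiled 1D template features into one 2D feature matrix.

    pair_feats: (L, L, C2) pairwise distances/orientations from the template.
    pos_feats:  (L, C1) positional features (e.g. scores, backbone torsions).
    matched:    (L,) boolean mask of query positions aligned to the template.
    """
    L, c1 = pos_feats.shape
    # Tile the 1D features along both axes of the 2D input.
    rows = np.broadcast_to(pos_feats[:, None, :], (L, L, c1))
    cols = np.broadcast_to(pos_feats[None, :, :], (L, L, c1))
    feats = np.concatenate([pair_feats, rows, cols], axis=-1)
    # Zero all features where either residue of the pair is unmatched
    # (assumption about how "unmatched positions" apply to pairs).
    pair_mask = matched[:, None] & matched[None, :]
    return feats * pair_mask[..., None]


def merge_templates(template_feats, attention_logits):
    """Merge T templates into one feature matrix with pixel-wise attention.

    template_feats:   (T, L, L, C) per-template feature matrices.
    attention_logits: (T, L, L) unnormalized weights (learned in a real network).
    """
    # Softmax over the template axis, computed independently for each pixel.
    logits = attention_logits - attention_logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[..., None] * template_feats).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, n_templates = 6, 3
    matched = np.array([True, True, False, True, True, False])
    feats = np.stack([
        build_template_features(
            rng.normal(size=(L, L, 4)), rng.normal(size=(L, 2)), matched
        )
        for _ in range(n_templates)
    ])
    merged = merge_templates(feats, rng.normal(size=(n_templates, L, L)))
    print(feats.shape, merged.shape)
```

With uniform logits the merge reduces to a plain average over templates; non-uniform logits let the network favor different templates at different residue pairs.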