参见作者原研究论文

本实验方案简略版
Jan 2019
Advertisement

本文章节


 

Probabilistic Models for Predicting Mutational Routes to New Adaptive Phenotypes
用于预测新的适应性表型突变路径的概率模型   

Eric  LibbyEric Libby* Peter A. LindPeter A. Lind* (*共同第一作者)
引用 收藏 提问与回复 分享您的反馈 Cited by

Abstract

Understanding the translation of genetic variation to phenotypic variation is a fundamental problem in genetics and evolutionary biology. The introduction of new genetic variation through mutation can lead to new adaptive phenotypes, but the complexity of the genotype-to-phenotype map makes it challenging to predict the phenotypic effects of mutation. Metabolic models, in conjunction with flux balance analysis, have been used to predict evolutionary optimality. These methods however rely on large scale models of metabolism, describe a limited set of phenotypes, and assume that selection for growth rate is the prime evolutionary driver.

Here we describe a method for computing the relative likelihood that mutational change will translate into a phenotypic change between two molecular pathways. The interactions of molecular components in the pathways are modeled with ordinary differential equations. Unknown parameters are offset by probability distributions that describe the concentrations of molecular components, the reaction rates for different molecular processes, and the effects of mutations. Finally, the likelihood that mutations in a pathway will yield phenotypic change is estimated with stochastic simulations.

One advantage of this method is that only basic knowledge of the interaction network underlying a phenotype is required. However, it can also incorporate available information about concentrations and reaction rates as well as mutational biases and mutational robustness of molecular components. The method estimates the relative probabilities that different pathways produce phenotypic change, which can be combined with fitness models to predict evolutionary outcomes.

Keywords: Evolutionary forecasting (进化预测), Mathematical modeling (数学建模), Adaptation (适应性), Mutation (突变), Evolution (进化), Genotype-to-phenotype map (基因型-表型图谱)

Background

The ability to forecast evolution would not only make evolutionary biology more predictive but could have translational impact with applications in biotechnology (e.g., synthetic biology or biofuels) or medicine (e.g., antibiotic resistance or cancer). Most previous success in evolutionary forecasting has been in asexual microbial populations subject to strong selection and relies on extensive historical sequence data to estimate the fitness of different strains (Luksza and Lassig 2014; Neher et al., 2014). Such models are not based on the mechanistic details of how mutations translate into fitness differences. The models also assume that selection is the main determinant of evolutionary outcomes which is a limiting assumption because evolutionary routes can be significantly influenced by factors other than selection. For example, if the rate of phenotypic production differs across evolutionary routes then more likely mutations that confer small gains in fitness may be observed more often than rare mutations that confer large fitness gains; being first may be more important than being best. In such cases, accurate prediction of the mutational routes to adaptive phenotypes requires knowledge of their relative rates of phenotypic production.

The rates of phenotypic production can be shaped by a variety of factors. For instance, a higher rate of phenotypic production can be caused by mutational hotspots, which are sections of DNA that increase the frequency of mutational events in nearby genes. Another possibility is that some genes have a greater capacity to translate mutation into phenotypic change, i.e., have a larger number of sites that can be mutated with functional effects. This can due to the properties of the individual protein (Lind et al., 2017) or ncRNA, in terms of mutational robustness, but also because proteins and ncRNAs have different functions in the molecular interaction networks underpinning adaptive phenotypes (Lind et al., 2015 and 2019). A simple example of the latter case would be when expression of a gene is controlled by both a negative and a positive regulator, activation of that gene occurs more often through mutations in the negative regulator simply due to loss-of-function mutations being more likely than gain-of-function mutations.

To account for the factors affecting the rate of phenotypic production, mechanistic information is needed. An arena in which mechanistic information has been used to successfully predict evolution is in metabolism (Edwards and Palsson 2000; O'Brien et al., 2015). There is a wealth of data on the biochemical reactions involved in central metabolism in different microbial species. This data can be used to form explicit metabolic models that in conjunction with a technique known as flux balance analysis can predict the growth rate of an organism in simple environments. By manipulating which reactions are present in the metabolic model, the phenotypic effects, i.e., growth rates, of mutational knockouts can be predicted. Similar metabolic models have also been used to predict how organisms will interact in different environments via the exchange of biochemical compounds including spatio-temporal interactions (Liu et al., 2015; Bocci et al., 2018). While the metabolic models have been successful in certain evolutionary predictions, they are limited in their applicability: they focus mainly on growth phenotypes in environments in which organisms are assumed to be actively reproducing. Moreover, the growth phenotypes are computed using large-scale models with hundreds or thousands of reactions and typically involve optimization of a “biomass’’ function of some 50+ variables. The models also do not usually incorporate any type of regulation, e.g., transcriptional/translational control or protein modifications, but see Chandrasekaran and Price, 2010 for one such example. Thus, they are not well suited to generalizations outside of metabolism.

Here we describe a method to predict the rates that mutational changes to different molecular networks produce phenotypic change. One advantage of this method is that only knowledge of the general architecture of the molecular networks is needed to make a prediction. However information about reaction rates, concentrations, mutation rates and gene size, mutational robustness of components can be included if known. The method can incorporate different types of reactions, for example conformational changes and enzymatic reactions, and the phenotypes predicted are not limited to optimization for growth rate. Predictions generated by the method could potentially be combined with origin-fixation models (McCandlish and Stoltzfus, 2014) in order to predict the mutation rate to an adaptive phenotype in the absence of unbiased experimental data that is often difficult to obtain. Information about mutational biases (Lind et al., 2019) and the molecular effects of mutations on protein function from different prediction methods (Capriotti et al., 2005; Bromberg and Rost, 2007; Kumar et al., 2009; Dehouck et al., 2011; Capriotti et al., 2013; Celniker et al., 2013; Yates et al., 2014; Choi and Chan, 2015) can be incorporated into the model described here to adjust the rates of disabling and enabling mutation in different genes. The method described here is also useful for providing null models in order to test the causes of repeated evolution (Lind, 2018; Lind et al., 2019). It could also be one component in understanding the molecular bases of complex genetic diseases and for evolutionary forecasting of antibiotic resistance and cancer, especially when experimental data is incomplete.

Equipment

  1. Computer
    Any desktop computer that fulfills the system requirements for the programming language(s) used. Here a desktop computer with the following configuration was used: iMac with a 3.6 GHz Intel Core i7 processor and 16 GB 2400 MHz DDR4 memory

Software

  1. MATLAB (MathWorks®, http://www.mathworks.com/)
    MATLAB was used to solve the differential equations, analyze the resulting data, and generate figures. However, other programming language(s) would have sufficed so long as they can be used to numerically solve differential equations and perform simple mathematical operations, e.g., Python, Julia, Octave, or Mathematica. We note that in the associated paper (Lind et al., 2019) a combination of languages was used but for simplicity here we have put everything into a single language. MATLAB code for the procedures RunModelComparison and PlotModelComparison is available at https://github.com/ericlibby/BioProtocol.

Procedure

This protocol describes a method for computing the relative likelihood that a mutational change will translate into a phenotypic change in two molecular pathways. The pathways do not have to produce the same phenotype but there should be a way of determining what the phenotype is, based on the interactions between pathway components. Thus, the protocol assumes that the mechanistic details of the molecular pathways are understood; however, information may be missing concerning the reaction kinetics, the concentrations of molecular compounds, and/or the effects of mutations. This protocol outlines a procedure that randomly samples many models of the pathways and compares the likelihood that mutations in the pathway–that affect the reaction kinetics–alter the phenotype(s). The steps of the procedure are illustrated in Figure 1 using the same notation as in the text below. Figure 2 and Figure 3 presents the procedure in a pseudocode format and the full MATLAB code is available at https://github.com/ericlibby/BioProtocol.



Figure 1. Overview of modeling methodology. Details are explained in the text below using the same notation.


  1. Build mathematical models of the molecular pathways that enable comparison
    1. Identify the set of biochemical reactions in each molecular pathway that determine expression of a particular phenotype. The types of reactions can vary depending on the pathway modeled and can include, for example, enzymatic reactions, conformational changes in proteins, oligomerization, and transcription initiation. The reactions should link the expression of the phenotype to some indicator, e.g., a biochemical compound (or function of compounds). For example, if the phenotype is the color of an organism then an appropriate indicator might be the amount of a pigment compound. In the article where this method was first used (Lind et al., 2019) the indicators were three diguanylate cyclase enzymes that could each be activated by mutations in their regulatory networks.
    2. For each set of biochemical reactions, formulate a system of differential equations that describes the reaction kinetics of the indicator (see Figure 1 for example). Each pathway is likely affected by biochemical compounds whose dynamics are governed by external pathways. It is often better to treat these compounds as constants rather than include more differential equations for their dynamics. For instance, if a pathway affects an indicator through phosphorylation then the intracellular pool of phosphate groups may be held as a constant rather than modeled through an additional set of differential equations. In general, it also helps to keep a similar level of detail/abstraction across the models of the pathways for better comparison. As an example, in (Lind et al., 2019) the models of the different pathways focused on the interactions between proteins and ignored processes involving transcription.


      Figure 2. Modeling of the AWS network. The reactions of the AWS regulatory network from (Lind et al., 2019) can be illustrated by an interaction diagram. The mathematical model of AWS is a system of ordinary differential equations describing all reactions and rates. In this case, the active dimer RR is the indicator of phenotypic change.

  2. Choose parameter distributions to solve the differential equations
    We assume that the systems of differential equations used to model the pathways cannot be solved analytically. To solve each system of differential equations numerically, three key pieces of information are needed:
    1. Initial concentrations of the molecular compounds.
    2. Reaction rates for the interactions between compounds.
    3. Time interval over which to solve the differential equations.
    It is possible that all pieces of information are known for a model in which case there will be only one numerical solution for each pathway model. In most cases, however, information is missing and must be estimated. 

    1. If there is any information concerning a) or b) that is known or can be estimated from empirical data then use this to constrain the sampling procedure. For unknown concentrations or reaction rates, choose a distribution for each that seems reasonable based on empirical data or at least encompasses a broad spectrum of likely dynamics. For instance, in Lind et al. (2019) there was no information concerning reaction rates or concentrations, so the authors chose the distribution for the reaction rates to be a uniform distribution on log space (i.e., 10U[−2,2]) and the distribution for initial concentrations to be a uniform distribution U[0,10]. These choices were in line with other pathway reaction kinetic models and allowed for numerically stable solutions. Importantly, whatever distributions are chosen, it is useful to see the effects of different choices on the results of the pathway comparison as part of a post-analysis parameter sensitivity study.
    2. The time interval for numerical solving the differential equations depends on the particular molecular system and the dynamics of the indicator compound. If the indicator reaches a steady state then that might set the time interval; however, in some systems transient, short-term behavior may be more biologically relevant. A distribution could be chosen for the time to solve the differential equations; however, in practice it is often faster and easier to evaluate the results when a fixed time is chosen and used for all models–especially when all other parameters are sampled randomly.

  3. Choose distributions for the effects of mutations and the amount of change in the indicator compound that corresponds to phenotypic change
    The prior steps allow for the computation of the indicator compound in each pathway, as a baseline for comparison. The choices made in this step will allow numerical solution of a pathway’s differential equation model following a mutation (or set of mutations) and determination of whether the phenotype has changed.
    1. We assume that the relevant mutations to a pathway are those that alter reaction rates, i.e., change the internal dynamics of a pathway, and we classify mutations as enabling or disabling based on whether they increase/decrease reaction rates. Thus, we need a way of determining new reaction rate(s) in the differential equation model of a pathway, following a mutation (or a set of mutations). Ideally, one would have information on all possible mutations and their effects on reaction rates. However, this is unlikely to be available. In the absence of empirical data, a distribution should be picked such that given a reaction rate and whether a mutation increases or decreases reaction rates, a new reaction rate is determined. For example, in (Lind et al., 2019) enabling mutations were assumed to affect reaction rates multiplicatively and thus new reaction rates were computed as the product of the original reaction rate and a multiplicative factor sampled from 10U[0,2] (similarly a factor of 10U[−2,0]) was used for disabling mutations. 
    2. The differential equation models for each pathway link the concentration of an indicator compound to a phenotype. Since the sampling distributions in Step B may result in different baseline concentrations of indicator compound, there needs to be a way of assessing whether a mutation (or set of mutations) produces a phenotypic change. In the absence of empirical data, a simple approach is to establish that if the indicator increases in concentration above some threshold then the phenotype changes. The actual threshold will depend on the biological system and how much of a change in the indicator is biologically relevant. Without this information, the threshold should be high enough to avoid issues of numerical precision and will depend on the differential equation solver and its parameters (e.g., step sizes in Runge-Kutta methods). For simplicity, we used a threshold of 0.000001 which was similar in magnitude to the numerical tolerances used in the differential equation solver.

  4. Run simulation routine to compute the likelihood of a phenotypic change given all combinations of mutations
    Use the code “Run Model Comparison” (https://github.com/ericlibby/BioProtocol/blob/master/RunModelComparison.m) to generate the data. The pseudocode for “Run Model Comparison” is included below as Algorithm 1 (Figure 3). “Run Model Comparison” use the procedure “Phenotype Change” for each pathway to compute the probability that mutation in a pathway causes a phenotypic change. The pseudocode for “Phenotype Change” is shown in Figure 3. “Phenotype Change” uses several procedures to set the parameter distributions (Procedure B, C above). The pseudocode for these are shown in Figure 4. To visualize the results, use the code “Plot Model Comparison” (https://github.com/ericlibby/BioProtocol/blob/master/PlotModelComparison.m) to create a contour plot that shows the log2 ratio of relative probabilities between two pathways with axes denoting the probabilities of enabling and disabling change.


    Figure 3. Pseudocode for the “RunModelComparison” and “PhenotypeChange” procedures. “PhenotypeChange” compute the probability that mutation in a pathway causes a phenotypic change. “RunModelComparison” computes the relative likelihood that Pathway 1 is used relative to Pathway 2. MATLAB code for RunModelComparison is available at https://github.com/ericlibby/BioProtocol/blob/master/RunModelComparison.m.


    Figure 4. Pseudocode for procedures used by “PhenotypeChange” procedure shown in Figure 3. The SolveModel procedure is not listed as it is assumed to be a numerical solver available as a built-in function or from a library/package.

Data analysis

The result of running the codes “Run Model Comparison” (https://github.com/ericlibby/BioProtocol/blob/master/RunModelComparison.m) and “Plot Model Comparison” (https://github.com/ericlibby/BioProtocol/blob/master/PlotModelComparison.m) is a contour plot that shows the ratio of likelihoods that the pathways produce a phenotypic change for different values of the probabilities of enhancing and disabling mutations (the vertical and horizontal axes, respectively). In terms of statistical analyses, it depends on the conclusions that are drawn. The contour plot only shows the ratio of the average likelihood for each pathway. Further analyses can investigate the standard deviation or maximum/minimum of the likelihoods. We recommend that additional analyses be performed to validate any conclusions. In particular, the analyses can be performed again with changes to the probability distributions to test for parameter sensitivity. Alternatively, the number of iterations can be increased/decreased by some factor to evaluate whether the contour plot remains stable. Examples of these analyses can be found in Figure 5 and associated methods of Lind et al., 2019.

Acknowledgments

This protocol was adapted from (Lind et al., 2019).

Competing interests

The authors declare no competing financial interest.

References

  1. Bocci, F., Suzuki, Y., Lu, M. and Onuchic, J. N. (2018). Role of metabolic spatiotemporal dynamics in regulating biofilm colony expansion. Proc Natl Acad Sci U S A 115: 4288-4293.
  2. Bromberg, Y. and Rost, B. (2007). SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res 35(11): 3823-3835.
  3. Capriotti, E., Calabrese, R., Fariselli, P., Martelli, P. L., Altman, R. B. and Casadio, R. (2013). WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation. BMC Genomics 14 Suppl 3: S6.
  4. Capriotti, E., Fariselli, P. and Casadio, R. (2005). I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 33(Web Server issue): W306-310.
  5. Celniker, G., Nimrod, G., Ashkenazy, H., Glaser, F., Martz, E., Mayrose, I., Pupko, T. and Ben-Tal, N. (2013). ConSurf: using evolutionary data to raise testable hypotheses about protein function. Isr J Chem 53: 199-206.
  6. Chandrasekaran, S. and Price, N. D. (2010). Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 107: 17845-17850. 
  7. Choi, Y. and Chan, A. P. (2015). PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31(16): 2745-2747.
  8. Dehouck, Y., Kwasigroch, J. M., Gilis, D. and Rooman, M. (2011). PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 12: 151. 
  9. Edwards, J. S. and Palsson, B. O. (2000). Metabolic flux balance analysis and the in silico analysis of Escherichia coli K-12 gene deletions. BMC Bioinformatics 1: 1.
  10. Kumar, P., Henikoff, S. and Ng, P. C. (2009). Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4(7): 1073-1081. 
  11. Lind, P. A. (2018). Evolutionary forecasting of phenotypic and genetic outcomes of experimental evolution in Pseudomonas. bioRxiv. 342261.
  12. Lind, P. A., Arvidsson, L., Berg, O. G. and Andersson, D. I. (2017). Variation in mutational robustness between different proteins and the predictability of fitness effects. Mol Biol Evol 34(2): 408-418.
  13. Lind, P. A., Farr, A. D. and Rainey, P. B. (2015). Experimental evolution reveals hidden diversity in evolutionary pathways. Elife 4: e07074.
  14. Lind, P. A., Libby, E., Herzog, J. and Rainey, P. B. (2019). Predicting mutational routes to new adaptive phenotypes. Elife 8: e38822.
  15. Liu, J., Prindle, A., Humphries, J., Gabalda-Sagarra, M., Asally, M., Lee, D. Y., Ly, S., Garcia-Ojalvo, J. and Suel, G. M. (2015). Metabolic co-dependence gives rise to collective oscillations within biofilms. Nature 523(7562): 550-554. 
  16. Luksza, M. and Lassig, M. (2014). A predictive fitness model for influenza. Nature 507(7490): 57-61.
  17. McCandlish, D. M. and Stoltzfus, A. (2014). Modeling evolution using the probability of fixation: history and implications. Q Rev Biol 89(3): 225-252.
  18. Neher, R. A., Russell, C. A. and Shraiman, B. I. (2014). Predicting evolution from the shape of genealogical trees. Elife 3: e03568.
  19. O'Brien EJ, Monk JM, Palsson BO. (2015). Using Genome-scale Models to Predict Biological Capabilities. Cell 161:971-987.
  20. Yates, C. M., Filippis, I., Kelley, L. A. and Sternberg, M. J. (2014). SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol 426(14): 2692-2701.

简介

了解遗传变异到表型变异的转化是遗传学和进化生物学中的一个基本问题。通过突变引入新的遗传变异可以导致新的适应性表型,但是基因型到表型图的复杂性使得预测突变的表型效应具有挑战性。代谢模型与通量平衡分析相结合,已被用于预测进化的最优性。然而,这些方法依赖于新陈代谢的大规模模型,描述了一组有限的表型,并假设对生长速率的选择是主要的进化驱动力。

在这里,我们描述了一种计算突变变化将转化为两个分子途径之间的表型变化的相对可能性的方法。用常微分方程对通路中分子成分的相互作用进行建模。未知参数被描述分子成分浓度,不同分子过程的反应速率以及突变影响的概率分布所抵消。最后,通过随机模拟估计了途径突变产生表型变化的可能性。

该方法的一个优点是仅需要表型下的相互作用网络的基本知识。但是,它也可以合并有关浓度和反应速率以及分子组分的突变偏倚和突变稳健性的可用信息。该方法估计了不同途径产生表型变化的相对概率,可以将其与适应性模型相结合以预测进化结果。
【背景】 预测进化的能力不仅可以使进化生物学更具预测性,而且还可以在生物技术(例如,合成生物学或生物燃料)或医学(例如,抗生素)中产生转化影响抵抗力或癌症)。进化预测的大多数先前成功是在无性微生物种群中进行的,这些种群需要进行大量选择,并依靠广泛的历史序列数据来估计不同菌株的适应性(Luksza and Lassig 2014; Neher et al。,2014)。 。这样的模型不是基于突变如何转化为适应性差异的机制细节。该模型还假设选择是进化结果的主要决定因素,这是一个限制性假设,因为进化路径可能会受到选择以外的因素的显着影响。例如,如果表型产生的速率在进化途径上有所不同,那么与提供较大适应性的稀有突变相比,可以更频繁地观察到产生较小适应性的突变;成为第一可能比成为最佳更重要。在这种情况下,要准确预测到适应性表型的突变途径,需要了解其相对表型产生率。
表型产生的速率可以由多种因素决定。例如,突变热点可能导致较高的表型产生率,突变热点是增加附近基因突变事件频率的DNA片段。另一种可能性是某些基因具有将突变转化为表型变化的更大能力,即具有大量可以通过功能作用进行突变的位点。这可能是由于单个蛋白质(Lind et al。,2017)的特性(在突变稳健性方面),也可能是因为蛋白质和ncRNA在分子相互作用网络中具有不同的功能,这些分子支撑了适应性表型(Lind et al。,2015年和2019年)。后一种情况的一个简单示例是,当基因的表达同时受负调节剂和正调节剂的控制时,该基因的激活更多地通过负调节剂中的突变发生,这仅仅是由于功能丧失突变的可能性更高而不是功能获得突变。

为了解释影响表型产生速率的因素,需要机械信息。在新陈代谢的领域中,已经使用机械信息成功地预测了进化(Edwards和Palsson,2000; O'Brien等,2015)。关于不同微生物物种中枢代谢涉及的生化反应,有大量数据。该数据可用于形成明确的代谢模型,并与称为通量平衡分析的技术结合使用,可以预测生物在简单环境中的生长速度。通过操纵代谢模型中存在的反应,可以预测突变敲除的表型效应,即生长速率。类似的代谢模型也已用于预测生物如何通过生化化合物交换(包括时空相互作用)在不同环境中相互作用(Liu 等人,,2015; Bocci 等人,,2018年)。尽管代谢模型已在某些进化预测中取得成功,但它们的适用性受到限制:它们主要关注假定生物体正在积极繁殖的环境中的生长表型。此外,生长表型是使用具有数百或数千个反应的大规模模型计算得出的,通常涉及优化50多个变量的“生物量”函数。这些模型通常也不会合并任何类型的调节,例如 ,转录/翻译控制或蛋白质修饰,但有关此类示例,请参见Chandrasekaran和Price,2010。因此,它们不适用于新陈代谢之外的概括。
在这里,我们描述了一种预测不同分子网络的突变变化产生表型变化的速率的方法。该方法的一个优点是,只需要了解分子网络的一般结构即可进行预测。但是,如果已知,则可以包含有关反应速率,浓度,突变率和基因大小,组分的突变稳健性的信息。该方法可以并入不同类型的反应,例如构象变化和酶促反应,并且预测的表型不限于优化生长速率。该方法产生的预测可能会与原点固定模型结合使用(McCandlish和Stoltzfus,2014),以便在缺乏通常难以获得的无偏实验数据的情况下预测适应性表型的突变率。有关来自不同预测方法(Capriotti等),2005; Bromberg和Rost的有关突变偏见的信息(Lind 等人,,2019)以及突变对蛋白质功能的分子影响,2007年;库玛等人,2009年;迪霍克等人,2011年; Capriotti等人,2013年;塞里克尔等人等人,2013年; Yates等人,2014年; Choi和Chan,2015年)可以并入此处描述的模型中,以调整不同基因的失能和致突变率。此处描述的方法对于提供空模型以测试重复进化的原因也很有用(Lind,2018; Lind et al。,2019)。它也可能是理解复杂遗传疾病的分子基础以及对抗生素抗性和癌症进行进化预测的一个组成部分,尤其是在实验数据不完整时。

关键字:进化预测, 数学建模, 适应性, 突变, 进化, 基因型-表型图谱

设备

  1. 电脑
    满足所用编程语言的系统要求的任何台式计算机。在这里,使用了具有以下配置的台式计算机:具有3.6 GHz Intel Core i7处理器和16 GB 2400 MHz DDR4内存的iMac

软件

  1. MATLAB(MathWorks ®, http://www.mathworks.com/ )< br /> 使用MATLAB求解微分方程,分析所得数据并生成图形。但是,只要其他编程语言可用于数值求解微分方程并执行简单的数学运算即可,例如 ,Python,Julia,Octave或Mathematica。我们注意到在相关论文中(Lind et al。,2019),使用了多种语言的组合,但是为了简单起见,我们将所有内容都都统一为一种语言。 https://github.com/ericlibby/BioProtocol 上提供了RunModelComparison和PlotModelComparison过程的MATLAB代码。 br />

程序

该协议描述了一种计算突变变化将转化为两个分子途径表型变化的相对可能性的方法。途径不必产生相同的表型,但是应该有一种基于途径组分之间的相互作用确定表型是什么的方法。因此,该协议假定已经了解了分子途径的机理细节。但是,可能缺少有关反应动力学,分子化合物浓度和/或突变影响的信息。该方案概述了随机采样许多途径模型的程序,并比较了影响反应动力学的途径突变改变表型的可能性。该过程的步骤在图1中使用与下文相同的符号表示。图2和图3以伪代码格式显示了该过程,完整的MATLAB代码可从 https://github.com/获得。 ericlibby / BioProtocol 。


图1.建模方法概述。下文中使用相同的符号说明了详细信息。

  1. 建立分子路径的数学模型以进行比较
    1. 确定每个分子途径中确定特定表型表达的生化反应集。反应的类型可以根据建模的途径而变化,并且可以包括例如酶促反应,蛋白质的构象变化,寡聚和转录起始。反应应将表型的表达与某种指示物例如(一种生化化合物(或化合物的功能))联系起来。例如,如果表型是生物体的颜色,则适当的指示符可能是色素化合物的量。在首次使用此方法的文章中(Lind et al。,2019),指标是三种双鸟苷酸环化酶,每种均可被其调节网络中的突变激活。
    2. 对于每组生化反应,制定一个描述该指示剂反应动力学的微分方程组(例如,参见图1)。每种途径都可能受生化化合物的影响,其动态性受外部途径控制。通常最好将这些化合物视为常量,而不是为它们的动力学包括更多的微分方程。例如,如果途径通过磷酸化影响指示剂,则磷酸酯基团的细胞内池可以保持恒定而不是通过另外一组微分方程来建模。通常,它还有助于在路径模型之间保持相似的详细程度/抽象度,以便更好地进行比较。例如,在(Lind et al。,2019)中,不同途径的模型着重于蛋白质与涉及转录的被忽略过程之间的相互作用。


      图2. AWS网络的建模。(交互等,2019)中的AWS监管网络的反应可以通过交互图来说明。 AWS的数学模型是描述所有反应和速率的常微分方程系统。在这种情况下,有效的二聚体RR是表型变化的指标。

  2. 选择参数分布来求解微分方程
    我们假设用于解析路径的微分方程组无法解析。为了用数值方法求解每个微分方程组,需要三个关键信息:
    1. 分子化合物的初始浓度。
    2. 化合物之间相互作用的反应速率。
    3. 解决微分方程的时间间隔。
    一个模型的所有信息都是已知的,在这种情况下,每个路径模型只有一个数值解。但是,在大多数情况下,信息会丢失,必须对其进行估算。

    1. 如果存在任何有关a)或b)的信息,这些信息已知或可以从经验数据中估计出来,则可以使用此信息来约束采样过程。对于未知的浓度或反应速率,请根据经验数据为每种分布选择似乎合理的分布,或者至少包含可能的动态范围。例如,在Lind et al。(2019)中没有有关反应速率或浓度的信息,因此作者选择了反应速率的分布为对数空间上的均匀分布(例如,,10 U [-2,2] ),初始浓度的分布为均匀分布U [0,10]。这些选择与其他途径反应动力学模型一致,并允许数值稳定的解决方案。重要的是,无论选择哪种分布,作为分析后参数敏感性研究的一部分,查看不同选择对途径比较结果的影响都是有用的。
    2. 用于数值求解微分方程的时间间隔取决于特定的分子系统和指示剂化合物的动力学。如果指示器达到稳定状态,则可以设置时间间隔。但是,在某些系统中,短暂的短期行为可能在生物学上更相关。可以选择时间分布来求解微分方程。但是,实际上,选择固定时间并将其用于所有模型时,评估结果通常更快,更容易,尤其是在随机抽取所有其他参数时。

  3. 选择突变的分布和与表型变化相对应的指示剂化合物变化量
    先前的步骤允许计算每个途径中的指示剂化合物,作为比较的基准。在此步骤中进行的选择将允许对突变(或一组突变)之后的路径的微分方程模型进行数值求解,并确定表型是否已改变。
    1. 我们假设途径的相关突变是那些会改变反应速率的突变,即改变途径的内部动力学,并且我们根据突变是否增加/降低了反应速率将突变分为启用或禁用。因此,我们需要一种确定突变(或一组突变)后在途径的微分方程模型中确定新反应速率的方法。理想情况下,应该了解所有可能的突变及其对反应速率的影响。但是,这不太可能。在没有经验数据的情况下,应该选择一种分布,以便在给定反应速率以及突变是增加还是减少反应速率的情况下,确定新的反应速率。例如,在(Lind et al。,2019)中,假定使突变能够成倍地影响反应速率,因此,新反应速率被计算为原始反应速率与从10中采样的倍增因子的乘积 U [0,2] (大约是10 U [-2,0] 的因数)用于禁用突变。
    2. 每个途径的微分方程模型将指示剂化合物的浓度与表型联系起来。由于步骤B中的采样分布可能导致指示剂化合物的基线浓度不同,因此需要一种评估突变(或一组突变)是否产生表型变化的方法。在缺乏经验数据的情况下,一种简单的方法是确定如果指示剂的浓度增加到某个阈值以上,则表型发生变化。实际阈值将取决于生物系统以及指标中多少变化与生物学相关。没有此信息,阈值应该足够高以避免出现数值精度问题,并且将取决于微分方程求解器及其参数(例如,Runge-Kutta方法中的步长)。为简单起见,我们使用的阈值为0.000001,其大小与微分方程求解器中使用的数值公差相似。

  4. 运行模拟例程以计算在所有突变组合下表型变化的可能性
    使用代码“运行模型比较”( https://github.com/ericlibby/ BioProtocol / blob / master / RunModelComparison.m )来生成数据。下面包含了“运行模型比较”的伪代码作为算法1(图3)。 “运行模型比较”为每个路径使用“表型改变”过程,以计算路径中的突变导致表型改变的可能性。 “表型更改”的伪代码如图3所示。“表型更改”使用几种过程来设置参数分布(上述过程B,C)。它们的伪代码如图4所示。要使结果可视化,请使用代码“ Plot Model Comparison”(目标图形 https://github.com/ericlibby/BioProtocol/blob/master/PlotModelComparison.m )创建等高线图,以显示相对概率的log 2 比两个路径之间的轴,轴表示启用和禁用更改的可能性。


    图3.“ RunModelComparison”和“ PhenotypeChange”过程的伪代码。“ PhenotypeChange”计算路径突变引起表型改变的可能性。 “ RunModelComparison”计算使用路径1相对于路径2的相对可能性。可通过 https://github.com/ericlibby/BioProtocol/blob/master/RunModelComparison.m。


    图4。图3所示的“ PhenotypeChange”过程所使用的过程的伪代码。未列出SolveModel过程,因为它假定是可作为内置函数或库中的数值求解器/ package。

数据分析

运行代码“运行模型比较”的结果( https://github.com /ericlibby/BioProtocol/blob/master/RunModelComparison.m )和“图模型比较”( https://github.com/ericlibby/BioProtocol/blob/master/PlotModelComparison.m )是一个等高线图,它显示了针对不同的值,途径产生表型变化的可能性比。增强和禁用突变的概率(分别为垂直轴和水平轴)。在统计分析方面,这取决于得出的结论。等高线图仅显示每个路径的平均似然比。进一步的分析可以研究可能性的标准偏差或最大值/最小值。我们建议执行其他分析以验证任何结论。特别地,可以通过改变概率分布来再次执行分析以测试参数敏感性。或者,可以将迭代次数增加/减少一些因素,以评估轮廓图是否保持稳定。这些分析的示例可以在Lind的等距图5及其相关方法中找到,2019年。

致谢

该协议改编自(Lind et al。,2019)。

利益争夺

作者声明没有竞争性的经济利益。

参考文献

  1. Bocci,F.,Suzuki,Y.,Lu,M.和Onuchic,J.N.(2018)。 代谢时空动力学在调节生物膜菌落扩展中的作用。 Proc Natl Acad美国科学(Sci USA) 115:4288-4293。
  2. Bromberg,Y。和Rost,B。(2007)。 SNAP:预测非同义多态性对功能的影响。 核酸Res 35(11):3823-3835。
  3. Capriotti,E.,Calabrese,R.,Fariselli,P.,Martelli,P.L.,Altman,R.B.和Casadio,R.(2013)。 WS-SNPs&amp; GO:使用功能注释来预测人类蛋白质变体的有害影响的网络服务器 BMC基因组学 14补充3:S6。
  4. Capriotti,E.,Fariselli,P.和Casadio,R.(2005)。 I-Mutant2.0:预测蛋白质序列或结构突变后稳定性的变化。 Nucleic Acids Res 33(Web服务器问题):W306-310。
  5. Celniker,G.,Nimrod,G.,Ashkenazy,H.,Glaser,F.,Martz,E.,Mayrose,I.,Pupko,T.和Ben-Tal,N.(2013)。 ConSurf:使用进化数据提出有关蛋白质功能的可检验假设。 Isr J Chem 53:199-206。
  6. Chandrasekaran,S.和Price,N.D.(2010)。 大肠杆菌中基因组规模代谢和调控网络的概率整合建模和结核分枝杆菌。 Proc Natl Acad Sci USA 107:17845-17850。
  7. Choi,Y.和Chan,A.P.(2015)。 PROVEAN网络服务器:一种预测氨基酸替代和插入缺失功能作用的工具。 生物信息学 31(16):2745-2747。
  8. Dehouck,Y.,Kwasigroch,J.M.,Gilis,D.和Rooman,M.(2011)。 PoPMuSiC 2.1:用于评估突变和序列最优时蛋白质稳定性变化的Web服务器。 a> BMC生物信息学 12:151。
  9. Edwards,J。S.和Palsson,B。O.(2000)。 代谢通量平衡分析和大肠杆菌 K-的计算机分析12个基因缺失。 BMC生物信息学 1:1。
  10. Kumar,P.,Henikoff,S.和Ng,P.C.(2009)。 使用SIFT算法预测编码非同义变体对蛋白质功能的影响。 Nat Protoc 4(7):1073-1081。&nbsp;
  11. Lind,P.A.(2018年)。 假单胞菌实验进化的表型和遗传结果的进化预测。 bioRxiv。 342261。
  12. Lind,P.A.,Arvidsson,L.,Berg,O.G.和Andersson,D.I.(2017)。 不同蛋白质之间的突变鲁棒性变化和适应性效应的可预测性。 Mol Biol Evol 34(2):408-418。
  13. Lind,P.A.,Farr.A.D。和Rainey,P.B。(2015年)。 实验进化揭示了进化途径中隐藏的多样性。 Elife 4:e07074。
  14. Lind,P.A.,Libby,E.,Herzog,J.和Rainey,P.B.(2019年)。 预测新适应性表型的突变途径。 Elife 8 :e38822。
  15. Liu,J.,Prindle,A.,Humphries,J.,Gabalda-Sagarra,M.,Asally,M.,Lee,D.Y.,Ly,S.,Garcia-Ojalvo,J.和Suel,G.M.(2015)。 代谢共依赖性在生物膜内引起集体振荡。 自然 523(7562):550-554。
  16. Luksza,M.和Lassig,M.(2014)。 一种流感的预测适应度模型。 自然 507( 7490):57-61。
  17. McCandlish,D.M.和Stoltzfus,A.(2014)。 使用注视概率对演化进行建模:历史和启示。 Q Rev Biol 89(3):225-252。
  18. Neher,R.A.,Russell,C.A.和Shraiman,B.I.(2014)。 根据族谱树的形状预测进化。 Elife 3:e03568。
  19. O'Brien EJ,Monk JM,Palsson BO。 (2015)。 使用基因组规模模型来预测生物功能。 细胞
  20. Yates,C.M.,Filippis,I.,Kelley,L.A.和Sternberg,M.J.(2014)。 可疑:使用网络功能增强了对单个氨基酸变异(SAV)表型的预测。 J Mol Biol 426(14):2692-2701。
登录/注册账号可免费阅读全文
  • English
  • 中文翻译
免责声明 × 为了向广大用户提供经翻译的内容,www.bio-protocol.org 采用人工翻译与计算机翻译结合的技术翻译了本文章。基于计算机的翻译质量再高,也不及 100% 的人工翻译的质量。为此,我们始终建议用户参考原始英文版本。 Bio-protocol., LLC对翻译版本的准确性不承担任何责任。
Copyright Libby and Lind. This article is distributed under the terms of the Creative Commons Attribution License (CC BY 4.0).
引用: Readers should cite both the Bio-protocol article and the original research article where this protocol was used:
  1. Libby, E. and Lind, P. A. (2019). Probabilistic Models for Predicting Mutational Routes to New Adaptive Phenotypes. Bio-protocol 9(20): e3407. DOI: 10.21769/BioProtoc.3407.
  2. Lind, P. A., Libby, E., Herzog, J. and Rainey, P. B. (2019). Predicting mutational routes to new adaptive phenotypes. Elife 8: e38822.
提问与回复

(提问前,请先登录)bio-protocol作为媒介平台,会将您的问题转发给作者,并将作者的回复发送至您的邮箱(在bio-protocol注册时所用的邮箱)。为了作者与用户间沟通流畅(作者能准确理解您所遇到的问题并给与正确的建议),我们鼓励用户用图片的形式来说明遇到的问题。

当遇到任何问题时,强烈推荐您通过上传图片的形式提交相关数据。