Journal of Bacteriology, May 2003, p. 2692-2699, Vol. 185, No. 9
0021-9193/03/$08.00+0 DOI: 10.1128/JB.185.9.2692-2699.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
MINIREVIEW
Department of Bioengineering, University of California, San Diego, La Jolla, California 92093-0412
Modern modeling approaches in biology need to be easily scalable and able to integrate available "-omics" data (38) that may contain tens of thousands of measurements. A constraint-based modeling approach (5, 14, 62) meets these criteria and at present is the only methodology by which genome-scale models have been constructed. The few parameters used in a constraint-based framework enable models to be built quickly and to encompass a larger portion of biochemical reaction networks than the portion currently encompassed by other modeling methodologies. To date, constraint-based models account for the largest metabolic models in terms of numbers of genes and reactions and have proven to be predictive of some types of data, including phenomic data (15, 26, 63), qualitative transcriptomic data (9), and gene knockout data (16, 54).
Escherichia coli is a well-studied organism, and much is known about its metabolism, regulation, and physiology. Constraint-based models of E. coli have been under development for the past 13 years. The continual growth in the size and scope of constraint-based E. coli models, as shown in Fig. 1, illustrates the iterative nature of in silico model building and how such models expand in scope and completeness over time. While many modeling approaches have been used to study E. coli (2, 12, 57, 65), in this minireview we focus on the development of successive constraint-based models that have been formulated to describe the metabolic network of E. coli and summarize the models' abilities to predict or explain phenotypic behavior. The principles that have been developed and the experiences that have been gained from modeling E. coli can be directly applied to modeling other organisms; this process has begun for Haemophilus influenzae (17), Helicobacter pylori (48), Saccharomyces cerevisiae (8, 23), and Methylobacterium extorquens (58), and more models are expected to emerge soon. The scope of constraint-based in silico models should continue to grow, and these models are likely to have a variety of uses in the near future.
FIG. 1. Development of successive constraint-based FBA models of E. coli. Constraint-based models of E. coli first focused on metabolism. By the time the complete genome was sequenced (1997), only 26% of metabolic genes were accounted for in FBA models. Over the next 5 years the number grew to include nearly 80% of the metabolic genes. Methods for incorporating transcriptional regulation have been developed and implemented in a core metabolic model of E. coli, as have methods for including protein synthesis. Expanding the regulatory and protein synthesis models to the genome scale can be accomplished by using information that is known today (indicated by dotted lines). Further functional analysis of genes should increase the size of models (dashed lines). These three components can be combined to form an integrated model (E. coli i2K) that accounts for nearly 2,000 genes. The superscript letters indicate references, as follows: a, reference 4; b, reference 55; c, reference 32; d, references 59 and 60; e, references 42 and 43; f, reference 16; g, reference 11; h, reference 9; and i, reference 1.
Stoichiometric constraints can be represented by the matrix equation Sv = 0, where S is the stoichiometric matrix describing all the reactions in the network and v is a vector describing the fluxes through each of the reactions. Each column of S corresponds to an individual reaction, and the rows of S correspond to the different metabolites. The stoichiometric coefficients of a reaction are then represented as the elements of the corresponding column (i.e., Sij is the stoichiometric coefficient of the ith metabolite in the jth reaction). The equation Sv = 0 imposes the restriction that the total rate of production for any metabolite must equal the total rate of consumption for that metabolite. In addition to stoichiometric or mass balance constraints, thermodynamic constraints and enzyme capacity constraints place limits on the range of values for individual fluxes (vj) in the network. Enzyme capacity constraints place an upper limit on the values that a given flux can take. Application of thermodynamic constraints further restricts the range of flux values. If a reaction is irreversible, the corresponding flux must be greater than or equal to zero; however, reversible reactions can have positive or negative flux values. A more systematic representation of thermodynamic constraints has appeared (3, 44). Stoichiometric, enzyme capacity, and thermodynamic constraints represent hard, inviolable physicochemical constraints that cells must abide by.
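The constraints described above can be illustrated with a minimal sketch. The three-reaction chain, the capacity value of 10, and the helper name `is_feasible` are assumptions made for illustration only; they do not come from any of the E. coli models reviewed here.

```python
import numpy as np

# Hypothetical three-reaction chain (illustrative, not from the article):
#   R1: -> A      R2: A -> B      R3: B ->
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = np.array([
    [1, -1,  0],   # A: produced by R1, consumed by R2
    [0,  1, -1],   # B: produced by R2, consumed by R3
], dtype=float)

# Assumed bounds: every reaction is irreversible (flux >= 0) and has an
# assumed enzyme capacity of 10 flux units.
lower = np.zeros(3)
upper = np.full(3, 10.0)

def is_feasible(v, tol=1e-9):
    """Return True if v satisfies Sv = 0 and the lower/upper flux bounds."""
    v = np.asarray(v, dtype=float)
    balanced = np.allclose(S @ v, 0.0, atol=tol)
    within_bounds = np.all(v >= lower - tol) and np.all(v <= upper + tol)
    return bool(balanced and within_bounds)

print(is_feasible([2, 2, 2]))     # mass balanced and inside the bounds
print(is_feasible([2, 1, 1]))     # A is produced faster than it is consumed
print(is_feasible([12, 12, 12]))  # mass balanced but above the capacity bound
```

Only the first flux vector lies in the allowable solution space; the second violates the mass balance and the third violates the capacity constraint.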
Given the governing constraints, the next step involves characterizing the allowable solution space and predicting which solution a cell is likely to use. Different techniques exist under the constraint-based modeling framework, including extreme pathway analysis (50), elementary mode analysis (52, 53), flux balance analysis (FBA) (5, 14, 19, 62), and minimization of metabolic adjustments (54), which aid in this process. Figure 2 illustrates the different constraint-based modeling techniques used to characterize the solution space defined by the network and the applied constraints.
FIG. 2. Constraint-based modeling. Application of constraints to a reconstructed metabolic network leads to a defined solution space in which a cell's network must operate. From this solution space a number of methods have been developed that help predict or explain phenotypic behavior. Linear optimization can be used to find solutions in the space that maximize or minimize a given objective (5, 14, 19, 62), and mixed-integer linear programming (MILP) can be used to find multiple optima if they exist (30, 41). Elementary mode analysis (52, 53) and extreme pathway analysis (50) can be used to characterize vectors in the solution space; the edges of the space correspond to extreme pathways (EP) and are a subset of the elementary modes (EM). Phenotypic phase plane analysis shows for what conditions the metabolic network operates under different limitations (18). The effects of gene deletions can also be computed. In the diagram the old optimal solution (point a) does not lie in the new solution space. A new optimum can be calculated (point b), or a suboptimal solution that is closest to the old optimum can be calculated (point c) (54). In addition, work has been done by using experimental flux measurements (indicated by a point) to back-calculate objective functions (indicated by vectors) (6a).
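Two of the calculations named in the Fig. 2 legend can be written compactly. The notation below is an assumption introduced for illustration: c is a vector of objective weights, α_j and β_j are the lower and upper bounds on flux v_j, D is the set of reactions removed by a gene deletion, and v^a is the old optimal solution (point a). The first problem is the linear optimization used in FBA, applied after a deletion to obtain a new optimum (point b); the second is a sketch of the closest-point calculation (point c), assuming Euclidean distance is the measure of closeness.

```latex
% New optimum after a deletion (point b): linear program
\[
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S v = 0, \\
& \alpha_j \le v_j \le \beta_j \quad \text{for all reactions } j, \\
& v_j = 0 \quad \text{for all } j \in D .
\end{aligned}
\]

% Suboptimal solution closest to the old optimum (point c): quadratic program
\[
\begin{aligned}
\min_{v}\quad & \lVert v - v^{a} \rVert_2^{2} \\
\text{subject to}\quad & S v = 0, \\
& \alpha_j \le v_j \le \beta_j \quad \text{for all reactions } j, \\
& v_j = 0 \quad \text{for all } j \in D .
\end{aligned}
\]
```

Both problems share the same feasible set; they differ only in the criterion used to pick a point from it.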
TABLE 1. Elementary mode analysis and extreme pathway analysis models
TABLE 2. Flux balance models
The extreme pathways for a larger network containing 78 reactions and 53 metabolites were calculated for two different carbon sources, glucose and succinate. The extreme pathway vectors calculated for the conditions when the biomass precursors were included in the network as a growth reaction were correlated with results from FBA when growth was used as the objective function (49). More recently, the elementary modes in a larger representation of E. coli's metabolic network (containing 110 reactions and 89 metabolites) were calculated and analyzed (56). In this study five different carbon sources were used, and again a reaction representing the drain of metabolic precursors needed for growth was added to the network. Elementary mode analysis was used to determine which of 90 genes were essential for growth; a reported 90% of the predictions agreed with experimental data when phenotypes were classified as either growth or no growth. The number of elementary modes varied depending on the carbon source and ranged from 598 when acetate was the carbon source to 27,099 for glucose. The modes were further analyzed to identify which enzymes would most likely be regulated for changing growth conditions (i.e., different carbon sources). A good correlation between regulatory predictions and measured mRNA expression data was found (56).
The metabolic networks studied by elementary mode or extreme pathway analysis are smaller than the networks studied by FBA. This limitation is due to the computational complexity associated with calculating these vectors and not to limitations on known reaction stoichiometry (28).
Direct determination of optimality properties can be accomplished by using optimization procedures that circumvent the exhaustive enumeration of extreme pathway analysis or elementary mode analysis. Linear programming has been used extensively to determine the optimality properties of reconstructed E. coli networks.
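A minimal sketch of such a linear programming calculation follows. The four-reaction network, its bounds, and the function name `run_fba` are hypothetical and chosen only to keep the example small; SciPy's `linprog` is used here as a convenient solver and is not the software used in the studies reviewed.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustrative only):
#   R1: -> A (uptake)     R2: A -> B
#   R3: A -> (secretion)  R4: B -> (growth drain)
# Rows are metabolites (A, B); columns are reactions (R1..R4).
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  0, -1],   # B
], dtype=float)

# Assumed bounds: uptake capped at 10, enzyme capacity of 6 on R2,
# all reactions irreversible, no upper limit on R3 and R4.
bounds = [(0, 10), (0, 6), (0, None), (0, None)]

# Objective: maximize the growth drain R4. linprog minimizes, so the
# objective coefficient is negated.
c = np.zeros(4)
c[3] = -1.0

def run_fba(S, bounds, c):
    """Solve max(-c.v) subject to Sv = 0 and the given flux bounds."""
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, -res.fun

fluxes, growth = run_fba(S, bounds, c)
print("optimal growth flux:", growth)
print("one optimal flux distribution:", fluxes)
```

In this toy network the growth flux is limited by the assumed capacity of R2 rather than by uptake, so the optimal objective value is 6. The flux distribution returned is only one of several that achieve it, since uptake beyond 6 units can be routed to secretion without changing the objective.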
α-ketoglutarate [αKG], pyruvate, and acetyl coenzyme A) (32). In their analysis, production of high-energy phosphate bonds on ATP and GTP was used as the objective function.
Majewski and Domach studied the optimal behavior of the metabolic network under two different types of constraints, enzymatic capacity constraints and electron transport chain constraints. Both types of constraints placed an upper limit on the value for fluxes through different reactions in the network. By maximizing the utilization of the network for the production of high-energy phosphate bonds under either an enzymatic constraint (αKG dehydrogenase) or an electron transport chain capacity constraint, equations describing the onset of acetate overflow and the rate of acetate production could be derived. The experimentally determined secretion patterns for acetate overflow in E. coli agreed with the network operating under enzymatic capacity constraints (32). The model led to the conclusion that the electron transport chain capacity is a constraint only when the growth rate approaches the maximum achievable growth rate (32).
Model built on Neidhardt's compendium. Varma and Palsson's reconstruction of E. coli's metabolic network contained both anabolic and catabolic reactions based on previously published information (34, 35). The model contained 53 catabolic reactions and 94 biosynthetic reactions that produce the amino acids, nucleic acids, and cell membrane and cell wall constituents found in cell biomass (59, 60). Several network properties were calculated from this model, such as optimal production of cofactors (ATP, NADPH, NADH) (60), optimal production of metabolic precursors (such as pyruvate or succinate) (60), maximal theoretical yields for amino acid and nucleic acid production (59), and evaluation of constraints (energy, redox, or stoichiometric) that restrict production of metabolites or cofactors (59, 60), as well as optimal flux distributions for biomass production (61). Model predictions agreed with experimental results when E. coli was grown on glucose minimal medium under aerobic and anaerobic conditions and if the metabolic network was optimized for the production of biomass constituents (63). Sensitivity analysis of the predicted growth rate with respect to the biomass composition was conducted, and it was found that the sensitivity of the biomass yield to changes in metabolite requirements was low (61).
Effects of growth rate-dependent biomass composition. Pramanik and Keasling's metabolic network was an expansion of Varma and Palsson's model and consisted of 300 reactions and 289 metabolites (an additional 17 reactions and 16 metabolites were added later) (42, 43). The biomass composition of E. coli varies with the carbon source and the growth rate (36). Varma and Palsson's model had used a fixed biomass composition based on previously published data, while Pramanik and Keasling's model used derived equations relating growth rate to biomass requirements, allowing for changes in biomass composition in a growth rate-dependent manner (43).
To analyze the accuracy of the model, 13 experimentally measured flux values (64) were compared to fluxes calculated by using Pramanik and Keasling's model (43). Three experimental conditions were examined: anaerobic growth on glucose, aerobic growth on acetate, and aerobic growth on acetate plus glucose. For aerobic growth on acetate plus glucose the average difference between experimental flux measurements and model predictions was 16%. Similar results were found for aerobic growth on acetate (17%). No experimental flux values were available for comparison for anaerobic glucose, but a branched TCA cycle (with an oxidative branch producing αKG and a reductive branch producing succinyl coenzyme A) had been observed experimentally; the model's prediction agreed with this observation. Further analysis of the model also indicated that the predicted flux values were sensitive to biomass composition (42, 43). Other studies have been conducted by using slight modifications of Pramanik and Keasling's model to investigate the effects of large-scale gene deletions or additions (6) and to calculate a minimal gene set (7).
Edwards and Palsson's model included 720 reactions and 436 metabolites involved in glycolysis, the TCA cycle, the pentose phosphate pathway, respiration, anaplerotic reactions, fermentative reactions, amino acid biosynthesis and degradation, nucleotide biosynthesis and interconversions, fatty acid biosynthesis and degradation, phospholipid biosynthesis, cofactor biosynthesis, and metabolite transport. This model was validated with mutant data (16), was used to design quantitative experiments (15), and was found to predict the outcome of adaptive evolution (26).
The results of in silico gene deletion studies were compared with growth data obtained with known mutants. The in vivo growth characteristics of a series of E. coli mutants on several different carbon sources were examined and compared to the in silico deletion results. In this analysis, 68 of 79 (86%) of the in silico predictions were consistent with the experimental observations (16). The predictions of the in silico E. coli model were highly consistent with phenotypes of known mutants and knockouts.
Phenotypic phase plane analysis was developed (18) and applied to Edwards and Palsson's E. coli model. Quantitative predictions regarding optimal usage of a carbon source and oxygen to maximize the growth rate were made and tested for a variety of substrates. Growth on M9 minimal medium with acetate, malate, or succinate as the primary carbon source agreed with the computational hypothesis (15, 26); however, glycerol supported only suboptimal growth of E. coli. Repeated exponential balanced growth batch cultures on glycerol were then incubated for 60 days (around 900 generations), and the cells reproducibly evolved towards the a priori predicted optimal growth behavior (26).
Incorporating effects of transcriptional regulation. A shortcoming of the purely stoichiometric metabolic models is that they do not account for transcriptional regulation, so all the gene products are assumed to be available to the cell to optimize its performance in a defined environment. This assumption is based on the rationale that E. coli would have evolved its regulatory network to allow optimal growth under conditions to which the microorganism was already adapted. Some instances where this assumption might not be true are for E. coli mutants or for growth on multiple carbon sources. It has also been found that some carbon sources do not initially support optimal behavior, although limited adaptive evolution data do suggest that over time E. coli adjusts its metabolic fluxes to find the optimal solution (26; S. S. Fong and B. O. Palsson, unpublished data).
To address this issue, the effects of transcriptional regulation have recently been included in a constraint-based model of E. coli central metabolism (9). The method for modeling transcriptional regulation is based on Boolean logic, where the genes can be either on or off, and their status is evaluated based on conditional if statements. The regulatory network has been reconstructed for central metabolism in E. coli by using this method. Covert and Palsson's metabolic and regulatory network includes 149 genes (16 are regulatory genes), which take part in 113 metabolic reactions, 45 of which are regulated by 16 regulatory proteins (9).
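The Boolean approach can be sketched in a few lines. All gene names, signals, reaction names, and rules below are hypothetical and serve only to show the mechanism: genes are evaluated as on or off from conditional statements, and reactions whose gene is off have their fluxes constrained to zero before the metabolic model is solved.

```python
# Hypothetical Boolean regulatory rules (names are illustrative only).
def evaluate_rules(env):
    """Return the on/off state of each gene for a given environment."""
    glucose = bool(env.get("glucose", False))
    oxygen = bool(env.get("oxygen", False))
    genes = {}
    genes["regA"] = glucose            # IF glucose THEN regulator regA is on
    genes["altC"] = not genes["regA"]  # IF NOT regA THEN altC is on
    genes["aero"] = oxygen             # IF oxygen THEN aero is on
    genes["ferm"] = not oxygen         # IF NOT oxygen THEN ferm is on
    return genes

def regulated_bounds(bounds, genes, gene_for_reaction):
    """Copy the flux bounds, closing every reaction whose gene is off."""
    new_bounds = dict(bounds)
    for reaction, gene in gene_for_reaction.items():
        if not genes[gene]:
            new_bounds[reaction] = (0.0, 0.0)
    return new_bounds

# Assumed unregulated bounds and gene-to-reaction assignments.
bounds = {
    "R_alt_uptake": (0.0, 10.0),
    "R_resp": (0.0, 20.0),
    "R_ferm": (0.0, 20.0),
}
gene_for_reaction = {
    "R_alt_uptake": "altC",
    "R_resp": "aero",
    "R_ferm": "ferm",
}

genes = evaluate_rules({"glucose": True, "oxygen": True})
print(regulated_bounds(bounds, genes, gene_for_reaction))
```

With glucose and oxygen both present, the alternative uptake and fermentation reactions are closed and only the respiratory reaction keeps its original bounds. The resulting bounds would then be passed to the optimization step in place of the unregulated ones.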
Covert and Palsson's regulated model has been used to compare regulated model predictions of gene deletions with mutant data and predictions from an unregulated model (9). The unregulated network predicted 97 of 116 cases correctly (83.6%), while the regulated network predicted 106 of 116 cases correctly (91.4%); thus, incorporating regulation improved the accuracy of in silico knockout predictions. The model was also used to calculate time courses of batch growth. The in silico predictions were in agreement with the experimental by-product (acetate, formate, and ethanol) secretion patterns, as well as the glucose uptake and biomass production patterns (9).
Recent work has incorporated regulatory constraints into a constraint-based model of E. coli central metabolism (9). The expansion of this regulatory model to include the effects of regulation on larger genome-scale models of metabolism is one of the next foreseeable steps in building more accurate constraint-based models of E. coli. Tools that should aid in this effort include RegulonDB, a publicly available database containing information about gene regulation in E. coli (47), and methods that extract gene regulatory networks from transcriptomic data (24).
Finding objective functions. Many issues remain regarding the selection of an objective function. Biomass compositions have often been used as the basis for an optimal-growth objective function. The effects of growth rate-dependent changes in biomass composition have already been accounted for (43). In addition to these advances, work is being done to back-calculate objective functions based on measured flux data, so that utilization of a calculated objective function yields a solution that minimizes the error between predicted and experimentally measured fluxes (6a).
Alternative solutions. A single linear optimization identifies one solution in the solution space; however, alternate optimal solutions can exist in the allowable solution space. These equivalent solutions can be calculated by using a variety of techniques, such as mixed-integer linear programming (30, 41), extreme pathway analysis (39), and elementary mode analysis (52); which optimal solution is actually used by the cell is still not known.
The in silico finding that the same phenotype can be attained in more than one way with the same underlying network gives rise to the possibility that it may be difficult to determine the true state of a cell. Preliminary experimental data support this expectation since strains that evolve to have the same growth phenotype (26) are not identical (Fong and Palsson, unpublished). Further, evolution of phosphotransferase system knockouts in E. coli also supports this expectation (22).
Gene knockouts. The in silico representation of biological associations among genes, proteins, and reactions is important when the effects of gene deletions are modeled. Enzyme subunits and enzyme complexes need to be taken into account when associations among genes, enzymes, and reactions are made (Fig. 3). Deleting a gene in constraint-based models results in removing the reactions associated with the protein from the network, unless other isozymes are present. Removal of reactions changes the solution space, and some wild-type solutions might be eliminated (Fig. 2). Knocking out essential genes from the model produces no solutions which allow for cellular growth under the governing constraints.
FIG. 3. Gene-protein-reaction associations. The association between the enzyme fumarate reductase and the genes which code for its subunits is shown. All four gene products come together to make a functional enzyme. This enzyme is capable of carrying out two reactions, (i) the transfer of electrons from menaquinol (MKH2) to fumarate (FUM) and (ii) the transfer of electrons from demethylmenaquinol (DMKH2) to FUM. The products of both reactions are succinate (SUCC) and either menaquinone (MK) or demethylmenaquinone (DMK). Deletion of any of the subunits would eliminate the functional enzyme. This is simulated by removing the two reactions from the network (unless an isozyme exists).
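The association shown in Fig. 3 can be sketched as a small data structure. The gene names `frdA` to `frdD`, the reaction identifiers, the isozyme example, and the function name `active_reactions` are assumptions added for illustration; the figure legend itself states only that four gene products form the enzyme and that it carries out two reactions.

```python
# Each reaction maps to a list of alternative enzymes (isozymes).
# Each enzyme is the set of genes whose products are ALL required (subunits).
gpr = {
    # Fumarate reductase: one four-subunit enzyme, two reactions.
    "FRD_MK":  [{"frdA", "frdB", "frdC", "frdD"}],
    "FRD_DMK": [{"frdA", "frdB", "frdC", "frdD"}],
    # Hypothetical reaction catalyzed by either of two isozymes.
    "RXN_ISO": [{"geneP"}, {"geneQ"}],
}

def active_reactions(gpr, deleted_genes):
    """Return the reactions that keep at least one fully intact enzyme."""
    deleted = set(deleted_genes)
    return {
        reaction
        for reaction, enzymes in gpr.items()
        if any(not (subunits & deleted) for subunits in enzymes)
    }

print(sorted(active_reactions(gpr, {"frdB"})))            # both FRD reactions lost
print(sorted(active_reactions(gpr, {"geneP"})))           # isozyme geneQ rescues RXN_ISO
print(sorted(active_reactions(gpr, {"geneP", "geneQ"})))  # RXN_ISO lost
```

Reactions missing from the returned set would be removed from the network (for example, by fixing their fluxes to zero) before the solution space is recomputed.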
Application of additional constraints. Significant progress has been made in the last 13 years towards building constraint-based models of E. coli; they are now genome-scale models. Enhancing the predictive capabilities of these models in the future should be accomplished by broadening the scope of the models (including other cellular processes), as well as exploring the use of additional constraints. The utilization of other physicochemical constraints, such as the conservation of energy, kinetic constraints, osmotic balances, or electroneutrality, should further reduce the allowable solution space, resulting in more accurate predictions. A framework for implementing energy balance constraints (3, 44) has already been developed and applied to the central metabolic network of E. coli (3).
Integrated models. Other cellular processes can be described in a constraint-based modeling framework based on the genome sequence, such as transcription (1), translation (1), and DNA replication. These processes place direct metabolite and energy demands (i.e., through the objective function) on the metabolic network. These processes are coupled, and metabolism affects the rates of transcription, translation, and replication; these processes, in return, direct metabolism (Fig. 4). The development of constraint-based models that include metabolism, regulation, and protein synthesis should allow simultaneous reconciliation of diverse "-omics" data (such as proteomic, metabolomic, transcriptomic, and phenomic data) and back-calculation of biological parameters (such as promoter strengths).
FIG. 4. Integrated constraint-based model of E. coli: the E. coli i2K model. Constraint-based modeling frameworks have been developed for metabolism (5, 14, 19, 50, 52, 62), regulation (9), transcription, and translation (1). The connectivity among the three modeling components is shown here. Integration of these three modeling components should produce an integrated model of E. coli that accounts for nearly 2,000 genes, referred to as the E. coli i2K model. This model can be used to reconcile diverse "-omics" data and utilize the data to more accurately predict a cellular phenotype.