A FRAMEWORK FOR MODELING IN REGULATORY NETWORKS

Mohsen Ben Hassine1, Radhi Mhiri2, Lamine Mili3

1 ISET de Nabeul, Computer Engineering, Tunisia

2 Faculté des Sciences de Tunis, Electrical Engineering, Tunisia

3 Virginia Tech, Electrical and Computer Engineering, USA

Abstract

The study of regulatory networks in systems biology, and of their ensuing dynamics, is critical to understanding the huge volume of genomic data currently being collected. Advances in nanotechnology enable scientists, for the first time, to trace biological processes at the nanoscale by tracking the movements of molecules. Projecting the real system onto graphical and mathematical tools enables biologists to understand it better and even to predict its behaviour. Nevertheless, the lack of a general framework that guides the biologist toward efficient modelling remains a great challenge. In this paper we clarify several issues concerning the modelling of regulatory networks using a straightforward method.

Keywords: auto-regulation, synthetic circuits, delay time, sensitivity analysis, homeostasis, noise, data mining.

  1. MODELLING IN SYSTEMS BIOLOGY

Most kinds of systems that are likely to be of interest involve entities (proteins, metabolites, signaling molecules, etc.) that can be cast as “nodes” interacting with each other via “edges” representing reactions that may be catalyzed by other substances such as enzymes. These will also typically involve feedback loops in which some of the nodes interact directly with the edges. We refer to the basic constitution of this kind of representation as a structural model. The classical modelling strategy in biology (and in engineering), the ordinary differential equation (ODE) approach, comprises three initial phases. It starts with this kind of structural model, in which the reactions and effectors are known. The next level consists of the kinetic rate equations describing the local properties of each edge. The third level involves the parameterization of the model, that is, providing values for the parameters.

Armed with such knowledge, any number of software packages can predict the time evolution of the variables (the concentrations) until they may reach a steady state. This is done (internally) by recasting the system as a series of coupled ordinary differential equations which are then solved numerically. We refer to this type of operation as forward modelling, and provided that the structural model, equations, and values of the parameters are known, it is comparatively easy to produce such models and compare them with an experimental reality. In practice, however, the experimental data that are most readily available do not include the parameters at all, and are simply measurements of the (time-dependent) variables, of which fluxes and concentrations are the most common. Comparison of the data with the forward model is then much more difficult, as we have to solve an inverse modelling, reverse engineering or system identification problem.

Direct solution of such problems is essentially impossible, as they are normally hugely underdetermined and do not have an analytical solution. The normal approach is thus an iterative one in which a candidate set of parameters is proposed, the system run in the forward direction, and on the basis of some metric of closeness to the desired output a new set of parameters is tested. Eventually (assuming that the structural model and the equations are adequate), a satisfactory set of parameters, and hence solutions, will be found. These methods are much more computer-intensive than those required for simple forward modelling, as potentially many thousands or even millions of candidate models must be tested. We note, however, that there are a number of other modelling strategies and issues that may lead one to wish to choose different types of model from that described. First, the ODE model assumes that compartments are well stirred and that the concentrations of the participants are sufficiently great as to permit fluctuations to be ignored. If this is not the case then stochastic simulations (SS) are required. If flow of substances between many contiguous compartments is involved, and knowledge of the spatial dynamics is required (as is common in computational fluid dynamics), partial differential equations (PDEs) are necessary. SS and PDE models are again much more computationally intensive, although in the latter case the designation of a smaller subset of representative compartments may be effective (Mendes and Kell, 2001).
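The iterative propose, simulate and score loop described above can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the one-variable model dx/dt = k_prod - k_deg*x, the synthetic "measurements", the Euler integrator and the simple pattern search are all choices made for this sketch and are not taken from the paper.

```python
def simulate(k_prod, k_deg, dt=0.1, n_steps=100, sample_every=10):
    """Forward model: Euler integration of dx/dt = k_prod - k_deg * x, x(0) = 0."""
    x, samples = 0.0, []
    for step in range(1, n_steps + 1):
        x += dt * (k_prod - k_deg * x)
        if step % sample_every == 0:
            samples.append(x)          # keep one "measurement" per time unit
    return samples


def cost(params, data):
    """Closeness metric: sum of squared differences to the measured values."""
    if min(params) <= 0:
        return float("inf")            # reject non-physical rate constants
    return sum((s - d) ** 2 for s, d in zip(simulate(*params), data))


def fit(data, start=(0.2, 0.2), step=0.5, tol=1e-5, max_iter=20000):
    """Pattern search: perturb one parameter at a time, keep any improvement,
    and halve the step when no perturbation helps."""
    best = list(start)
    best_cost = cost(best, data)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_cost = cost(trial, data)
                if trial_cost < best_cost:
                    best, best_cost, improved = trial, trial_cost, True
        if not improved:
            step /= 2
    return best, best_cost


# Synthetic data generated with hypothetical "true" parameters (1.0, 0.5).
data = simulate(1.0, 0.5)
estimate, residual = fit(data)
print(estimate, residual)
```

Every candidate costs one full forward simulation, which is why inverse modelling is so much more expensive than a single forward run. Real networks have many more parameters and usually call for global optimizers rather than this local search.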

Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques. This enables the modeller to construct the network in a reduced and optimized way, and thus offers a “middle-out” strategy that keeps insight from both the bottom-up and top-down approaches (fig.1).

Figure.1: Middle-out modeling strategy

  2. MATHEMATICAL MODELLING AND SYSTEM DYNAMICS THEORY

In order to turn the static map (the biologist's graphical representation of the system) into a dynamic model that can provide insight into the temporal evolution of biochemical reaction networks, a set of differential equations is needed. The general rule for expressing the evolution of a biochemical species x is: dx/dt = rate of production – rate of decay ± rate of transport. Each interaction between species can be assigned a specific function. Consider, for example, the Goldbeter model of the mitotic oscillator (fig.2; Goldbeter, 1991).

Figure.2: The Mitotic Oscillator

Cyclin is produced at a constant rate (vi). Its decay has two parts: a natural first-order (exponential) decay and an induced decay caused by a protease-cyclin complex (X, C). We can convert the ODEs into a set of block diagrams and nodes as used in the engineering sciences (fig.3); for example, the cyclin equation can be modelled as in fig.4.

Figure.3: Examples of blocks diagrams and nodes used in engineering sciences

Figure.4: The Cyclin circuit
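As a hedged illustration of the production-minus-decay rule, the sketch below writes the cyclin balance as code and embeds it in a three-variable cascade in the spirit of Goldbeter (1991). The equations for the kinase fraction M and the protease fraction X, the parameter values and the initial state are given here as recalled from the literature on this model; they are assumptions of this sketch and should be checked against the original paper before any quantitative use.

```python
# Illustrative parameter values (assumed, not taken from this paper).
PARAMS = {"vi": 0.025, "vd": 0.25, "Kd": 0.02, "kd": 0.01, "Kc": 0.5,
          "VM1": 3.0, "V2": 1.5, "VM3": 1.0, "V4": 0.5,
          "K1": 0.005, "K2": 0.005, "K3": 0.005, "K4": 0.005}


def goldbeter_rates(C, M, X, p):
    """Rates of change of cyclin C, active kinase fraction M, active protease fraction X."""
    # Cyclin: constant production, protease-driven decay, first-order decay.
    dC = p["vi"] - p["vd"] * X * C / (p["Kd"] + C) - p["kd"] * C
    V1 = p["VM1"] * C / (p["Kc"] + C)      # cyclin activates the kinase
    V3 = p["VM3"] * M                      # the kinase activates the protease
    dM = V1 * (1 - M) / (p["K1"] + 1 - M) - p["V2"] * M / (p["K2"] + M)
    dX = V3 * (1 - X) / (p["K3"] + 1 - X) - p["V4"] * X / (p["K4"] + X)
    return dC, dM, dX


def simulate(p=PARAMS, state=(0.01, 0.01, 0.01), dt=0.001, t_end=100.0, keep_every=100):
    """Explicit Euler integration; returns a list of (t, C, M, X) samples."""
    C, M, X = state
    out = [(0.0, C, M, X)]
    for step in range(1, int(round(t_end / dt)) + 1):
        dC, dM, dX = goldbeter_rates(C, M, X, p)
        C, M, X = C + dt * dC, M + dt * dM, X + dt * dX
        if step % keep_every == 0:
            out.append((step * dt, C, M, X))
    return out


trajectory = simulate()
for t, C, M, X in trajectory[::100]:
    print(f"t={t:6.1f}  C={C:.3f}  M={M:.3f}  X={X:.3f}")
```

The small time step is deliberate: the saturation constants K1 to K4 make the kinase and protease switches steep, so a coarse explicit step would overshoot. A stiff ODE solver would be the usual choice in practice.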

One important feature of a biochemical network is its robustness, which reflects the sensitivity of the system, that is, its ability to preserve homeostasis (equilibrium). As in engineering, we can stimulate the system by varying an input signal (parameter) and observing the effect on the output; we can also test the validity and efficiency of a feedback loop by opening it (Angeli et al., 2004).

3. MATHEMATICAL FUNCTIONS AND CELL PHENOTYPES

Regulatory networks are governed by the same mathematical functions that modellers commonly use to express positive versus negative feedback, activation versus repression and inhibition, the fraction of free operator, etc. Table 1 lists the most useful functions for regulation.

Model / Use / Function
Yagil rules / Inducible enzyme (e.g. lactose) / F(O) = (1 + k1*E^p)/(k + k1*E^p)
Yagil rules / Repressible enzyme (e.g. trp) / F(O) = (1 + k1*E^p)/(1 + k*k1*E^p)
Michaelis-Menten / Enzyme-catalysed reaction / F(S) = Vmax*S/(S + Km)
Hill functions / Activation / F(X) = Vmax*TF^n/(Km + TF^n)
Hill functions / Repression / F(X) = Vmax/(Km + TF^n)
MM with ... / Competitive inhibition / F(S) = Vmax*S/(S + KS*(1 + I/Ki))
MM without... / Non-competitive inhibition / F(S) = Vmax*S/((S + KS)*(1 + I/Ki))
Hill functions / Multiple TF activation (OR gate) / F(S) = (TF1/K1)^n/(1 + (TF1/K1)^n + (TF2/K2)^n)
Hill functions / Multiple TF repression (OR gate) / F(S) = 1/(1 + (TF1/K1)^n + (TF2/K2)^n)
Gaussian function / Internal noise / F(X) = N(μ, σ^2)
Delay function / Time delay (transcription, translation initiation) / X(t) = F(Y(t-τ))

Table 1: Useful functions of regulation
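A few of the functions in Table 1 translate directly into code. The sketch below follows the Michaelis-Menten and Hill forms exactly as the table states them; the function names and the numerical values in the usage lines are hypothetical and chosen only for illustration.

```python
def michaelis_menten(S, Vmax, Km):
    """Enzyme-catalysed reaction rate: Vmax*S/(S + Km)."""
    return Vmax * S / (S + Km)


def hill_activation(TF, Vmax, Km, n):
    """Activation by a transcription factor: Vmax*TF^n/(Km + TF^n)."""
    return Vmax * TF**n / (Km + TF**n)


def hill_repression(TF, Vmax, Km, n):
    """Repression by a transcription factor: Vmax/(Km + TF^n)."""
    return Vmax / (Km + TF**n)


def two_input_repression(TF1, TF2, K1, K2, n):
    """Repression by two factors: 1/(1 + (TF1/K1)^n + (TF2/K2)^n)."""
    return 1.0 / (1.0 + (TF1 / K1) ** n + (TF2 / K2) ** n)


# Usage with made-up values.
print(michaelis_menten(S=1.0, Vmax=2.0, Km=1.0))      # half-saturation: S equals Km
print(hill_activation(TF=2.0, Vmax=1.0, Km=4.0, n=2))
print(hill_repression(TF=2.0, Vmax=1.0, Km=4.0, n=2))
```

Raising the Hill exponent n makes the response more switch-like, which is what allows these functions to stand in for logic gates in a regulatory circuit.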

Long experience in modelling regulatory, transduction and metabolic networks has yielded many rules about cell phenotypes: apoptosis, proliferation, differentiation, stress response, mitosis, bifurcation, etc.

  1. Negative feedback loops

Negative feedback loops, common in biochemical pathways, are known to provide stability, and withstand considerable variations and random perturbations of biochemical parameters.

  2. Positive feedback loops

A positive-feedback network forms the basis for cellular memory, allowing cells of identical genotype to achieve different phenotypes depending on the external signals received. The behaviour of the system therefore depends on its history, which can give rise to hysteresis.
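The memory effect can be illustrated with a minimal sketch of a single self-activating gene. The model dx/dt = basal + beta*x^n/(K^n + x^n) - gamma*x and all parameter values below are assumptions chosen so that the system has two stable states; they are not taken from the paper.

```python
def settle(x0, basal=0.05, beta=1.0, K=0.5, n=4, gamma=1.0, dt=0.01, t_end=50.0):
    """Integrate the self-activation model with explicit Euler and return the final level."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        production = basal + beta * x**n / (K**n + x**n)   # positive feedback term
        x += dt * (production - gamma * x)
    return x


low = settle(x0=0.0)    # a cell that has never been induced
high = settle(x0=1.5)   # a cell that was previously driven to high expression
print(low, high)
```

With identical parameters the two runs settle at different levels, so the final state records the history of the cell. Sweeping an external input up and then down in such a model would trace the hysteresis loop mentioned above.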

  3. Delay time

A generic feature of all intracellular biochemical processes is the time required to complete the whole sequence of reactions that yields an observable quantity. Theoretically, time delay is known to be a source of instability, and it has been linked to oscillations and transient dynamics in several biological functions. For example, delay in repression has been shown, both theoretically and experimentally, to be a primary factor in increasing inter-cellular heterogeneity in gene expression within a population.
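The destabilizing role of delay can be sketched with a delayed negative feedback loop. The model dx/dt = beta/(1 + x(t-tau)^n) - gamma*x, the parameter values and the constant initial history are assumptions made for this illustration only.

```python
def simulate_delayed_repression(tau, beta=4.0, n=4, gamma=1.0,
                                x0=0.5, dt=0.01, t_end=100.0):
    """Euler integration of a self-repressing gene whose repression acts after a delay tau."""
    lag = int(round(tau / dt))             # delay expressed in time steps
    xs = [x0]
    for i in range(int(round(t_end / dt))):
        # Before t = tau the history is assumed constant and equal to x0.
        x_past = xs[i - lag] if i >= lag else x0
        xs.append(xs[i] + dt * (beta / (1.0 + x_past**n) - gamma * xs[i]))
    return xs


def late_amplitude(xs):
    """Peak-to-trough range over the second half of the run."""
    tail = xs[len(xs) // 2:]
    return max(tail) - min(tail)


amp_no_delay = late_amplitude(simulate_delayed_repression(tau=0.0))
amp_delay = late_amplitude(simulate_delayed_repression(tau=2.0))
print(amp_no_delay, amp_delay)
```

Without delay the loop relaxes to a steady level; with a sufficiently long delay the same loop keeps oscillating, because the repression always responds to an outdated concentration.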

  4. Noise

Genetically identical cells exposed to the same environmental conditions can show significant variation in molecular content and marked differences in phenotypic characteristics. This variability is linked to stochasticity in gene expression, which is generally viewed as having detrimental effects on cellular function with potential implications for disease. However, stochasticity in gene expression can also be advantageous. It can provide the flexibility needed by cells to adapt to fluctuating environments or respond to sudden stresses, and a mechanism by which population heterogeneity can be established during cellular differentiation and development. Negative feedback reduces fluctuations by increasing expression when protein numbers are low and decreasing expression when protein numbers are high. Negative feedback is more likely to evolve as an attenuator of stochasticity in systems dominated by extrinsic fluctuations (Paulsson, 2004; Hooshangi and Weiss, 2006). Alternatively, intrinsic fluctuations could be reduced by an additional positive feedback loop to maintain high protein copy numbers despite the negative feedback needed to attenuate extrinsic fluctuations.

4. SENSITIVITY, SYNTHETIC CIRCUITS AND MEASUREMENT TECHNIQUES

Sensitivity analysis represents a cornerstone in the analysis of complex systems. It examines the effect of changing some parameter P of the model on the response of the system variables. The goals of this analysis are to:

- Determine factors that may contribute to output variability and so need the most consideration.

- Find parameters that can be eliminated in order to simplify the model without grossly altering its behaviour.

- Find the optimal region for use in a calibration study.

- Check which groups of factors interact with each other.

- Evaluate the model, thus creating an output distribution or response.

- Assess the influence of each variable or group of variables using correlation/regression, Bayesian inference, machine learning, or other methods (data mining).
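One common way to quantify the first two goals is a normalized local sensitivity coefficient, (P/y)(dy/dP), estimated by finite differences. The sketch below applies it to a Michaelis-Menten rate; the helper name, the step size and the parameter values are assumptions of this example, not a method prescribed by the paper.

```python
def normalized_sensitivity(model, params, name, rel_step=1e-4):
    """Central-difference estimate of (P / y) * dy/dP for parameter `name`."""
    h = params[name] * rel_step
    up, down = dict(params), dict(params)
    up[name] += h
    down[name] -= h
    dy_dp = (model(**up) - model(**down)) / (2 * h)
    return dy_dp * params[name] / model(**params)


def mm_rate(S, Vmax, Km):
    """Michaelis-Menten rate used as the model output y."""
    return Vmax * S / (S + Km)


params = {"S": 2.0, "Vmax": 1.0, "Km": 1.0}   # made-up operating point
sensitivities = {name: normalized_sensitivity(mm_rate, params, name) for name in params}
print(sensitivities)
```

A coefficient near zero marks a parameter that could be fixed or eliminated with little effect on the output, whereas a coefficient of magnitude one or more marks a parameter that deserves careful calibration. For this model the analytical values are 1 for Vmax and -Km/(S + Km) for Km, which the finite-difference estimate should reproduce.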

To break down the complexity of regulatory networks, the forward engineering of gene circuits and its associated experimental techniques (mutant cells, such as the cdc25Δ and wee1 mutations in yeast; Novak, 2001) enable modellers to build desired networks with specific properties predicted from mathematical models, using knowledge from biochemistry, molecular biology, and genetics. Consequently, we can engineer new cellular behaviour and improve our understanding of naturally occurring networks (Bratsun et al., 2005). Fig.5 presents some examples.

Figure.5: synthetic genetic networks.

Massive amounts of data are being generated by genomics and proteomics projects, thanks to sophisticated genetic engineering tools (gene knock-outs and insertions, PCR) and measurement technologies (fluorescent proteins, microarrays, blotting, FRET). Polymerase chain reaction (PCR) is a technique that amplifies DNA (typically a gene or part of a gene). By creating multiple copies of a piece of DNA that would otherwise be present in too small a quantity to detect, PCR enables the use of measurement techniques. Suppose that we wish to know at what rate a certain gene X is being transcribed under a particular set of conditions in which the cell finds itself. Fluorescent proteins may be used for that purpose. For instance, green fluorescent protein (GFP) is a protein that fluoresces green when exposed to UV light. It is produced by the jellyfish Aequorea victoria, and its gene has been isolated so that it can be used as a reporter gene. The GFP gene is inserted (cloned) into the chromosome, adjacent to or very close to the location of gene X, so both are controlled by the same promoter region. Thus, gene X and GFP are transcribed simultaneously and then translated (Fig. 6), and so by measuring the intensity of the GFP light emitted one can estimate how much of X is being expressed. Fluorescent protein methods are particularly useful when combined with flow cytometry. Flow cytometry devices can be used to sort individual cells into different groups on the basis of characteristics such as cell size, shape, or amount of measured fluorescence, at rates of up to thousands of cells per second. In this manner, it is possible, for instance, to count how many cells in a population express a particular gene under a specific set of conditions.

Figure. 6: Fluorescent protein method

5. NOISE IN GENETIC NETWORKS

Biochemical networks are stochastic: fluctuations in numbers of molecules are generated intrinsically by the dynamics of the network and extrinsically by interactions of the network (fig. 7) with other stochastic systems (Elowitz et al, 2002; Swain et al, 2002). Stochastic effects in protein numbers can drive developmental decisions (Arkin et al, 1998; Maamar et al, 2007; Nachman et al, 2007; Suel et al, 2007), be inherited for several generations (Rosenfeld et al, 2005; Kaufmann et al, 2007), and have perhaps influenced the organization of the genome (Swain, 2004; Becskei et al, 2005). Intrinsic fluctuations are generated by intermolecular collisions affecting the timing of individual reactions. Their strength is increased by low copy numbers. The source of extrinsic fluctuations, however, is mostly unknown (Kaern et al, 2005), although cell cycle effects (Rosenfeld et al, 2005; Volfson et al, 2006) and upstream networks (Volfson et al, 2006) contribute. Yet extrinsic fluctuations dominate cellular variation in both prokaryotes (Elowitz et al, 2002) and eukaryotes (Raser and O’Shea, 2004). They are colored, having a lifetime that is not negligible but comparable to the cell cycle (Rosenfeld et al, 2005), and they are nonspecific, potentially affecting equally many molecules in the system (Pedraza and van Oudenaarden, 2005). They are thus difficult to model and their effects hard to predict (Austin et al, 2006; Cox et al, 2006; Geva-Zatorsky et al, 2006; Scott et al, 2006; Sigal et al, 2006; Tanase-Nicola et al, 2006; Tsimring et al, 2006; Volfson et al, 2006; Maithreye and Sinha, 2007).

Intrinsic and extrinsic stochasticity can be measured by creating a copy of the network of interest in the same cellular environment as the original network (Elowitz et al, 2002). We can then define intrinsic and extrinsic variables, and their fluctuations generate intrinsic and extrinsic stochasticity (Swain et al, 2002). Intrinsic variables typically specify the copy numbers of the molecular components of the network. Their values differ for each copy of the network. Extrinsic variables often describe molecules that affect equally each copy of the network. Their values are therefore the same for each copy.

Figure.7: Noise in regulatory networks

Noise strength is usually reported in terms of the standard deviation $\sigma_q$ of a stochastic variable $q$. The Fano factor, defined as $F = \sigma_q^2 / \langle q \rangle$, is related to the relative standard deviation by $\sigma_q / \langle q \rangle = (F / \langle q \rangle)^{1/2}$; because $q$ measures molecule number, $F$ is a dimensionless quantity. When number fluctuations are due to a Poisson process, we have $F = 1$. The Fano factor of an arbitrary stochastic system reveals deviations from Poissonian behaviour. It is a sensitive measure of noise and the unit in which we report our results. If we consider a single gene, we can draw the ODEs as follows (fig.8):

Figure .8: Single Gene Expression

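The average number of proteins synthesized per mRNA transcript is $N = K_P / \gamma_R$, the mean number of proteins is $K_R N / \gamma_P$, and the Fano factor is approximately $N + 1$. Here $K_R$ and $K_P$ denote the transcription and translation rates, and $\gamma_R$ and $\gamma_P$ the mRNA and protein decay rates.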
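These relations can be checked numerically with a stochastic simulation of the single-gene model. The sketch below uses the Gillespie algorithm with four reactions (transcription, mRNA decay, translation, protein decay); the rate values, the random seed and the run length are arbitrary assumptions, and a finite run only approximates the theoretical values.

```python
import random


def simulate_gene(k_r, k_p, g_r, g_p, t_end, seed=1):
    """Gillespie simulation of mRNA (m) and protein (p) copy numbers.
    Returns the time-averaged protein mean and Fano factor."""
    rng = random.Random(seed)
    m = round(k_r / g_r)                     # start near the expected means
    p = round(k_r * k_p / (g_r * g_p))
    t = 0.0
    sum_p = sum_p2 = 0.0                     # time-weighted sums of p and p^2
    while t < t_end:
        rates = (k_r, g_r * m, k_p * m, g_p * p)
        total = sum(rates)
        dt = min(rng.expovariate(total), t_end - t)   # waiting time to next event
        sum_p += p * dt
        sum_p2 += p * p * dt
        t += dt
        u = rng.random() * total             # choose which reaction fires
        if u < rates[0]:
            m += 1                           # transcription
        elif u < rates[0] + rates[1]:
            m -= 1                           # mRNA decay
        elif u < rates[0] + rates[1] + rates[2]:
            p += 1                           # translation
        else:
            p -= 1                           # protein decay
    mean = sum_p / t
    variance = sum_p2 / t - mean**2
    return mean, variance / mean


# Assumed rates: N = k_p / g_r = 5 proteins per transcript, expected mean = 100.
mean_p, fano = simulate_gene(k_r=1.0, k_p=5.0, g_r=1.0, g_p=0.05, t_end=20000.0)
print(mean_p, fano)
```

With these rates the expected mean is 100 proteins and the approximate Fano factor is N + 1 = 6. The approximation assumes that mRNA decays much faster than protein, so a simulated value slightly below 6 is to be expected.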

If we take into account the possibility of mutual activation and repression of the promoter, and tune the transcriptional and translational efficiencies, we obtain different noise behaviours (fig.9).

Figure.9: Slow promoter transitions and transcriptional bursting (Kærn et al., 2005)

Intrinsic and extrinsic noise can be measured and distinguished with two reporter genes (cfp, yfp) controlled by identical regulatory sequences. In the absence of intrinsic noise, the two fluorescent proteins fluctuate in a correlated fashion over time in a single cell. Thus, in a population, each cell will have the same amount of both proteins, although that amount will differ from cell to cell because of extrinsic noise. Expression of the two genes may become uncorrelated in individual cells because of intrinsic noise, giving rise to a population in which some cells express more of one fluorescent protein than the other.

The scatter plot in fig.10 illustrates the fluorescence technique using two strains of E. coli: one quiet (M22) and one noisy (D22). Each point represents the mean fluorescence intensities from one cell. Spread of points perpendicular to the diagonal line on which CFP and YFP intensities are equal corresponds to intrinsic noise, whereas spread parallel to this line is increased by extrinsic noise. The total noise is defined by $\eta_{tot}^2 = \eta_{int}^2 + \eta_{ext}^2$.

Figure.10: Experimental quantification of noise (Elowitz et al., 2002)
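The decomposition of total noise can be sketched from paired reporter readings. The estimators below are the standard dual-reporter definitions usually attributed to Elowitz et al. (2002) and Swain et al. (2002), written from memory for this example; the function name and the fluorescence values are invented for illustration.

```python
def noise_decomposition(cfp, yfp):
    """Return (intrinsic, extrinsic, total) squared noise from paired single-cell readings."""
    n = len(cfp)
    mean_c = sum(cfp) / n
    mean_y = sum(yfp) / n
    norm = mean_c * mean_y
    # Intrinsic: differences between the two reporters within the same cell.
    intrinsic = sum((c - y) ** 2 for c, y in zip(cfp, yfp)) / n / (2 * norm)
    # Extrinsic: covariance of the two reporters across cells.
    extrinsic = (sum(c * y for c, y in zip(cfp, yfp)) / n - norm) / norm
    total = (sum(c * c + y * y for c, y in zip(cfp, yfp)) / n - 2 * norm) / (2 * norm)
    return intrinsic, extrinsic, total


# Invented readings: reporters perfectly correlated, so all noise is extrinsic.
eta_int2, eta_ext2, eta_tot2 = noise_decomposition([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(eta_int2, eta_ext2, eta_tot2)
```

With these definitions the intrinsic and extrinsic terms add up to the total exactly, which mirrors the relation between the three noise measures stated in the text.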

Finally, we can state some important results concerning the study of noise in genetic networks:

- Extrinsic noise is not gene-specific, but intrinsic noise is.

- Extrinsic noise is predominant over intrinsic noise.

- Noise depends neither on the regulatory pathway nor on the absolute rate of expression.

- Noise depends on the rate of a slow upstream promoter transition, such as chromatin remodelling.

- Downstream effects of noise can have profound phenotypic consequences, drastically affecting the stability of gene expression.

- Noise (and consequently cell-to-cell variability) is amplified at transitions in long cascades.

- Autoregulation in gene circuits (in particular negative feedback loops) provides stability.

- Noise can be controlled by kinetic parameters.

6. CONCLUSION

Drawing on a large body of recent research articles, this paper presents an overview of the engineering methods used for modelling regulatory networks. With this framework, the modeller can readily abstract the network, analyse the dependencies among its parts, study the sensitivity of the system (the effect of tuning key parameters and of noise) and even predict its behaviour (synthetic circuits, mutants, etc.). For more details, many paradigms are available (2,3,4,5,6,8,12).

References

1. Armen R. Kherlopian, Ting Song, Qi Duan. A review of imaging techniques for systems biology. BMC Systems Biology 2008, 2:74.

2. A. Goldbeter. A minimal cascade model for the mitotic oscillator involving cyclin and cdc2 kinase. Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 9107-9111, October 1991.

3. Somdatta Sinha. A Simple Approach to Study Designs in Complex Biochemical Pathways. 74th Annual Meeting, New Delhi, Oct. 31 – Nov. 2, 2008.

4. David Angeli, James E. Ferrell, Jr., and Eduardo D. Sontag. Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. PNAS, 1822–1827, February 17, 2004.

5. Michael C. Mackey, Moisés Santillán, Necmettin Yildirim. Modeling operon dynamics: the tryptophan and lactose operons as paradigms. C. R. Biologies 327 (2004) 211–224.