One of the most complex questions in quantitative biology is how to manage noise sources and their consequences for cell function. Noise in genetic networks is inevitable, because chemical reactions are probabilistic and genes, mRNAs and proteins are often present in variable numbers per cell. Previous research has focused on counting these numbers, either experimentally, using complex fluorescence techniques, or theoretically, by characterizing the probability distribution of mRNA and protein numbers in cells. In this work, we propose a modeling-based approach: we build a mathematical model that predicts the number of mRNAs and proteins over time, and we develop a computational method to extract the noise-related information in such a biological system. Our approach contributes to answering the question of how the numbers of mRNAs and proteins change in living cells over time and how these changes induce noise. Moreover, we calculate the entropy of the system; this turns out to be important information for prediction and could help us understand how noise is generated and propagated.
Randomness, or noise, in biological systems has long been predicted from basic physical principles [
To understand noise in biological systems, biochemical circuits and genetic networks are often used as the measured noise properties to elucidate the structure and the function of the underlying gene circuit [
$$\frac{dn(t)}{dt} = \alpha - \gamma\, n(t) \tag{1}$$
with parameters $\alpha$ representing the rate of production and $\gamma$ the rate of decay of the number of proteins $n(t)$. However, such a continuous-time formulation neglects the discrete nature of proteins and the random timing of molecular transitions [
In general, Kolmogorov's equations are used as the master equation to capture the distribution of the chemical components of the gene circuit over time. The state of the system is defined by a vector $n(t) = (n_1, n_2, \cdots, n_N)$, where $n_i(t)$ represents the $i$-th component of molecule $n$ at time $t$; $a_i$ and $\nu_i$ are internal parameters representing, respectively, the propensity of the dynamics and the actual change in $n_i$ resulting from the change in the previous state. The probability $p(n, t)$ that the system is in state $n$ at time $t$ is described by the following differential equation:
$$\frac{\partial p(n, t)}{\partial t} = \sum_{j=1}^{N} \left[ a_j(n - \nu_j)\, p(n - \nu_j, t) - a_j(n)\, p(n, t) \right] \tag{2}$$
This equation is valid only if we assume that the probability of two or more reactions occurring in the time interval $dt$ is negligible compared with that of a single reaction. In addition, (2) can only be solved numerically for relatively simple systems. In a recent work by [
$$\frac{\partial P_{m,n}}{\partial t} = \nu_0 \left( P_{m-1,n} - P_{m,n} \right) + \nu_1 \left( P_{m,n-1} - P_{m,n} \right) + d_0 \left[ (m+1) P_{m+1,n} - m P_{m,n} \right] + d_1 \left[ (n+1) P_{m,n+1} - n P_{m,n} \right] \tag{3}$$
The rates in (3) have the following meanings: $\nu_0$ is the probability per unit time of transcription, $\nu_1$ the probability per unit time of translation, $d_0$ the probability per unit time of degradation of an mRNA, and $d_1$ the probability per unit time of degradation of a protein. The authors use a particular generating function to transform (3) into a first-order PDE, which is solved using a simple approximation. However, this model applies only to a single cell, and all rates $\nu_0$, $\nu_1$, $d_0$ and $d_1$ are fixed over time. Further, by assuming that protein synthesis occurs in bursts ($m = 0$), the authors derive a Kolmogorov (master) equation for gene expression that considers only proteins and includes mRNAs implicitly (since $n$ and $m$ seem to be correlated over time). In the next section, we re-examine this model and propose a new one in order to overcome the above limitations.
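A single-cell model of this kind can be explored with a stochastic simulation. The sketch below is a minimal Gillespie-style simulation and is not the method of the cited authors. It assumes that the translation propensity scales with the mRNA count ($\nu_1 m$), a common convention for two-stage models that is an assumption here; the function names and the parameter values in the usage note are hypothetical.

```python
import random


def gillespie_two_stage(nu0, nu1, d0, d1, t_end, m0=0, n0=0, seed=0):
    """Simulate mRNA (m) and protein (n) counts in one cell.

    Assumed propensities (illustrative, not taken from the paper):
    transcription nu0, translation nu1 * m,
    mRNA decay d0 * m, protein decay d1 * n.
    """
    rng = random.Random(seed)
    t, m, n = 0.0, m0, n0
    times, mrna, protein = [t], [m], [n]
    while True:
        rates = [nu0, nu1 * m, d0 * m, d1 * n]
        total = sum(rates)
        if total == 0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        r = rng.random() * total  # pick which reaction fires
        if r < rates[0]:
            m += 1
        elif r < rates[0] + rates[1]:
            n += 1
        elif r < rates[0] + rates[1] + rates[2]:
            m -= 1
        else:
            n -= 1
        times.append(t)
        mrna.append(m)
        protein.append(n)
    return times, mrna, protein


def time_average(times, values, t_end):
    """Time-weighted mean of a piecewise-constant trajectory."""
    total = 0.0
    for i, value in enumerate(values):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += value * (t_next - times[i])
    return total / t_end
```

For example, `gillespie_two_stage(2.0, 0.5, 0.2, 0.1, 1000.0)` returns the event times and the two count trajectories; under the assumed propensities the time-averaged mRNA count should settle near $\nu_0 / d_0$.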
Our setup is motivated by the need to overcome the limitations of the previous models by increasing the number of cells and relaxing the restriction to constant parameters. We propose a new, flexible and more general model for a population of $N$ cells. This model is an extended version of the previous Kolmogorov equation with additional cell-dependent constraints.
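$$\begin{cases} \dfrac{\partial p(m, t \mid m_0, t_0)}{\partial t} = \displaystyle\sum_{j=1}^{N} \left[ a_j(m - \nu_j)\, p(m - \nu_j, t \mid m_0, t_0) - a_j(m)\, p(m, t \mid m_0, t_0) \right] \\[2ex] \dfrac{\partial p(n, t \mid n_0, t_0)}{\partial t} = \displaystyle\sum_{j=1}^{N} \left[ a_j(n - d_j)\, p(n - d_j, t \mid n_0, t_0) - a_j(n)\, p(n, t \mid n_0, t_0) \right] \end{cases} \tag{4}$$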
The parameters of the model have an autoregressive form:
$$\begin{cases} a_j(m) = \vartheta_1\, a_{j-1}(m) + \theta_1 \\ a_j(n) = \vartheta_1\, a_{j-1}(n) + \theta_1 \end{cases} \tag{5}$$
The transcription, translation and degradation rates are assumed to vary from one cell to another as
$$\nu_j = \nu_0\, e^{-0.005 j} \quad \text{and} \quad d_j = d_0\, e^{-0.001 j} \tag{6}$$
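The parameter rules (5) and (6) can be tabulated directly. The following is a minimal sketch; the function names, the starting value and the coefficient values used in the usage note are hypothetical choices for illustration and do not come from the paper.

```python
import math


def cell_rates(nu0, d0, n_cells):
    """Cell-dependent rates from Equation (6), for j = 1, ..., n_cells."""
    nu = [nu0 * math.exp(-0.005 * j) for j in range(1, n_cells + 1)]
    d = [d0 * math.exp(-0.001 * j) for j in range(1, n_cells + 1)]
    return nu, d


def ar1_propensities(a0, vartheta, theta, n_steps):
    """Autoregressive recursion from Equation (5):
    a_j = vartheta * a_{j-1} + theta, starting from a_0 = a0."""
    a = [a0]
    for _ in range(n_steps):
        a.append(vartheta * a[-1] + theta)
    return a
```

With `cell_rates(2.0, 2.0, 100)` both sequences decrease slowly across cells. When $|\vartheta_1| < 1$, the recursion in `ar1_propensities` converges to the fixed point $\theta_1 / (1 - \vartheta_1)$, so the propensities forget their initial value after a few iterations.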
We assume that, for $k \in [0, N]$, the first $\nu_1, \cdots, \nu_k$ are sequences of transcription rates and the later $\nu_{k+1}, \cdots, \nu_N$ are sequences of translation rates, with $\nu_0$ being the fixed initial rate. Our model, which consists of Equations (4)-(6), is well adapted to various real biological promoter changes. Note that, with 100 cells, Equation (4) is a system of 200 equations with 100 by 2 unknowns, which is likely to be solvable only numerically and after suitable approximations. To efficiently predict the number of mRNAs and proteins over time, we rely on the following assumptions.
Proposition 1. Over time, the number of mRNAs/proteins is perfectly correlated with the probability mass functions of mRNAs $m(t)$ and proteins $n(t)$, respectively. That is, $m(t) = p(m, t)\, m_0$ and $n(t) = p(n, t)\, n_0$, where $m_0$ and $n_0$ are initial measurements.
Proof: The proof follows from our algorithm and solution in this paper.
Proposition 2. Let $n = n(t)$ be the number of proteins and $\eta = \eta(n, \Delta t)$ be the noise generated by $n$ proteins (or $m$ for mRNAs) in the same time interval $\Delta t$. Then there exists a unique constant $C$ such that $\eta(n, \Delta t) = C\, p(n, \Delta t)$, which means that noise in cells is proportional to the probability distribution of protein and mRNA numbers.
Proof:
Let $\eta = \eta(n, \Delta t)$ and $\Delta n(t) = n(t) - n(t-1)$ be, respectively, the noise and the change in the number of proteins in a cell. By the simple decomposition of the numbers of mRNAs/proteins, $p(\Delta n(t)) = p(n, \Delta t)$ and $p(\Delta n(t)) = p(n(t)) - p(n(t-1))$ (by the additivity property of the probability distribution). By definition, we also have $\eta = \dfrac{\Delta n}{\Delta t}$, $p(\Delta n) = \dfrac{\Delta n}{N}$ and $\sum n = N$. This implies that

$$p(\Delta n(t)) = \frac{\Delta n}{N} \;\Rightarrow\; N\, p(\Delta n(t)) = \Delta n(t) = n(t) - n(t-1).$$

Multiplying the right-hand side by $\dfrac{\Delta t}{\Delta t}$, we obtain

$$N \times p(\Delta n(t)) = \frac{n(t) - n(t-1)}{\Delta t}\, \Delta t.$$

Since $\eta(n, \Delta t) = \dfrac{n(t) - n(t-1)}{\Delta t}$, it follows that $N \times p(\Delta n(t)) = \eta(n, \Delta t) \times \Delta t$, and thus

$$\frac{N}{\Delta t} \times p(\Delta n(t)) = \eta(n, \Delta t),$$

leading to $C \times p(n, \Delta t) = \eta(n, \Delta t)$. Finally, we conclude that $C = \dfrac{N}{\Delta t}$.
Here we put $\Delta t = \dfrac{T}{N_0}$, where $T$ is the total time, $N_0$ is the total number of points in the simulation and $N$ is the total number of mRNAs or proteins in a single cell. In the next section we introduce our method and algorithm for solving Equations (4)-(6).
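As a numerical illustration of Proposition 2, the sketch below computes the constant $C = N / \Delta t$ with $\Delta t = T / N_0$ from one cell's sampled counts and checks that $\eta = C\, p$ holds for every increment. The function name and the sample counts in the usage note are hypothetical, and the normalized increment $\Delta n / N$ is used for $p$ exactly as in the definitions of the proof.

```python
def noise_from_counts(counts, total_time):
    """Return (C, p, eta) for equally spaced molecule counts of one cell.

    Uses the definitions of Proposition 2:
    dt = T / N0, N = sum of counts,
    p = delta_n / N, eta = delta_n / dt, C = N / dt.
    """
    n_points = len(counts)            # N0: number of sampling points
    dt = total_time / n_points        # time step
    total = sum(counts)               # N: total number of molecules
    deltas = [counts[i] - counts[i - 1] for i in range(1, n_points)]
    p = [delta / total for delta in deltas]
    eta = [delta / dt for delta in deltas]
    return total / dt, p, eta
```

For the hypothetical counts `[10, 12, 15, 15, 18]` sampled over a total time of 10 units, $\Delta t = 2$ and $N = 70$, so $C = 35$, and multiplying each entry of `p` by `C` reproduces the corresponding entry of `eta`.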
We propose a straightforward method for solving the above problem based on numerical approximation. Because the analytical solution to Equation (4) is (at least) hard to obtain, even for a "reasonable" number of cells, we use a numerical algorithm based on an adapted stochastic simulation approach. In our algorithm, two random variables $m(t)$ and $n(t)$ determine the temporal evolution of the system. The variable $\tau_k$ is the time until the next event occurs, and the probability density of an event (appearance of $m(t)$ or $n(t)$) is evaluated from our model (4), which gives the approach more flexibility and wider applicability than previous ones. The main purpose of the algorithm is to simulate the process noise while simultaneously predicting the online probability mass function (≈ probability density) of each event over time. An important assumption here is that the hypothetical probability distributions of the translation and transcription rates are of the form $\nu_j \sim N(2, 0.05)$ and that the mRNA and protein degradation rates are $d_j \sim N(2, 0.05)$. This is in line with the existence of a one-to-one relation between the dynamics and the distribution for predictable dynamic systems. The algorithm is presented next.
Our Algorithm
Input: Initial data $m_0$, $n_0$
Outputs: $P_m$, $P_n$
1. Set $a_0(m) := m_0$, $a_0(n) := n_0$.
2. For j = 1:k do [k = number of iterations]
a. Let $\nu_j \sim N(2, 0.05)$, $d_j \sim N(2, 0.05)$ be the changes associated with a single event;
b. Compute $a_j(m) = \theta\, a_{j-1}(m) + \vartheta$, $a_j(n) = \theta\, a_{j-1}(n) + \vartheta$;
c. Compute $\alpha_m(j) = \vartheta\, \alpha_m(j-1) + \theta$, $\alpha_n(j) = \vartheta\, \alpha_n(j-1) + \theta$;
d. Compute $P_m(j) = \dfrac{-\nu_j \left( \alpha_m(j) - \nu_j \right) + \xi_1(j)}{1 + \nu_j}$, $P_n(j) = \dfrac{-d_j \left( \alpha_n(j) - d_j \right) + \xi_2(j)}{1 + d_j}$, where $\xi_1(j) \sim Po(10)$, $\xi_2(j) \sim Po(10)$;
e. Normalize $P_m$, $P_n$.
3. Output $P_m$, $P_n$.
End
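The steps above can be sketched in Python as follows. This is one possible reading of the listing, not the authors' implementation, and several details are assumptions: 0.05 is treated as the standard deviation of the normal draws, $\alpha(0)$ is initialized with the initial count, only the recursion of step c is used because step d depends on $\alpha$ alone, and the normalization of step e is taken to be min-max scaling because the listing does not specify it. The function names and the default values of `vartheta` and `theta` are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def min_max(values):
    """ASSUMED normalization: rescale values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def run_algorithm(m0, n0, k, vartheta=0.5, theta=1.0, seed=0):
    """Return normalized sequences (P_m, P_n) after k iterations."""
    rng = random.Random(seed)
    alpha_m, alpha_n = float(m0), float(n0)  # assumed initial values
    raw_m, raw_n = [], []
    for _ in range(k):
        # Step a: changes associated with a single event
        nu = rng.gauss(2.0, 0.05)
        d = rng.gauss(2.0, 0.05)
        # Step c: autoregressive update
        alpha_m = vartheta * alpha_m + theta
        alpha_n = vartheta * alpha_n + theta
        # Step d: unnormalized values with Poisson(10) perturbations
        raw_m.append((-nu * (alpha_m - nu) + poisson(rng, 10)) / (1 + nu))
        raw_n.append((-d * (alpha_n - d) + poisson(rng, 10)) / (1 + d))
    # Step e: normalization
    return min_max(raw_m), min_max(raw_n)
```

A call such as `run_algorithm(30, 20, 100)` yields two lists of 100 values in $[0, 1]$. A different normalization (for example, dividing by the sum) would change the scale of the outputs but not the structure of the loop.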
The initial data is a matrix of randomly generated numbers between one and fifty for mRNAs and between one and forty for proteins. The rows represent the cells and the columns are the numbers of mRNAs/proteins counted at each time interval. We therefore have 100 cells (the population) and 50 samples taken at intervals of one time unit, for a total of 50 time units over the entire population (a unit could be a second, a minute or an hour, depending on the experiment). The bar and image plots of the initial data are shown in
Our results are presented in a series of figures. We first plot the variability of the number of proteins in cells over time for a sample of 50.
Next, we plot the solutions of (4) over time and explain their relevance for our work. We plot the pmf (probability mass function) of the mRNAs and proteins in separate graphs for each sample, then the histograms of the distributions, and finally the scatter plot of $P_n$ against $P_m$. Our observations are presented in the caption of each figure.
It can be seen that all probability values are between 0.1 and 0.9 and do not overlap in most cases; this indicates that mRNA and protein numbers may be dynamically dependent, and therefore correlated. Next, we predict the numbers of mRNAs and proteins $m_j$, $n_j$ using a straightforward probabilistic concept which states that "a good value of $m$ (or $n$) depends on a good guess of $p$". The predictions of the numbers of mRNAs and proteins $m_j$, $n_j$ (for iteration $j = 1, 2, 3, \cdots, 100$) are then given by the following Markov equations.
$$m_j = \begin{cases} P_m(0)\, m_0, & \text{if } j = 0 \\ P_m(j)\, m_{j-1}, & \text{if } j > 0 \end{cases} \tag{7}$$
$$n_j = \begin{cases} P_n(0)\, n_0, & \text{if } j = 0 \\ P_n(j)\, n_{j-1}, & \text{if } j > 0 \end{cases} \tag{8}$$
Leading to the following results for mRNA
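The recursions (7) and (8) amount to a running product of the probabilities applied to the initial count. A minimal sketch is given below; the function name and the example values in the usage note are hypothetical.

```python
def predict_counts(initial_count, probabilities):
    """Apply Equations (7)/(8): the first prediction is P(0) * initial_count,
    and each later one is P(j) times the previous prediction."""
    counts = []
    previous = initial_count
    for p in probabilities:
        previous = p * previous
        counts.append(previous)
    return counts
```

For instance, `predict_counts(40, [0.5, 0.5, 0.9])` gives predictions of approximately 20, 10 and 9. Because every factor is at most one, the predicted counts produced by this recursion cannot increase from one iteration to the next.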
To measure the uncertainty associated with each sample of mRNA or protein counts, we introduce the concept of entropy over a population, which is calculated as follows:
(for mRNAs) $$H(m) = -\sum_{j=1}^{N} p(m_j) \log\left( p(m_j) \right) \tag{9}$$
(for Proteins) $$H(n) = -\sum_{j=1}^{N} p(n_j) \log\left( p(n_j) \right) \tag{10}$$
Computational results are shown in the figures below in the discussion section.
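The entropies (9) and (10) can be evaluated with a few lines of code. The sketch below makes three assumptions that the text does not state: the logarithm is natural, the input values are rescaled so that they sum to one, and zero-probability terms are skipped (using the convention $0 \log 0 = 0$). The function name is hypothetical.

```python
import math


def shannon_entropy(probabilities):
    """Entropy H = -sum p_j log p_j (natural log).

    ASSUMPTIONS: inputs are non-negative weights that are rescaled to sum
    to one, and zero entries contribute nothing to the sum.
    """
    total = sum(probabilities)
    return -sum(
        (p / total) * math.log(p / total) for p in probabilities if p > 0
    )
```

A uniform distribution over four outcomes gives $\log 4$, the maximum for four outcomes, while a distribution concentrated on a single outcome gives zero, so larger values correspond to more uncertain counts.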
We have shown (
the standard deviation is clearly not constant over time. Such distributions are poorly characterized by Gaussian statistics. This paper was primarily designed to promote a modelling culture among noise biologists and modellers, and to help address the sources and consequences of noise in cell development.
The advantage of counting single molecules (mRNAs or proteins) is that one obtains the probability distribution of molecules corresponding to each stage of the "central dogma" of molecular biology for each single gene. The mathematical model developed here differs from those that cellular biologists are accustomed to encountering [
The authors would like to thank Dr. Sakumura for dedicating his precious time, giving many insightful comments and suggestions. This work was supported by the GCOE International Senior Research Fellowship, NAIST and the Grant of Aid from the Ministry of Education, Culture, Science and Technology (MEXT) Japan. We also thank the Universities of Sorbonne (France) and AUAF for their generous support and collaboration.
Jimbo, H.C., Ngongo, S.I., Mbassi, A. and Andjiga, N.G. (2017) Novel Quantitative Approach for Predicting mRNA/Protein Counts in Living Cells. Applied Mathematics, 8, 1128-1139. https://doi.org/10.4236/am.2017.88085