12. Stochastic simulation of biological circuits


Concepts

  • Sampling out of probability distributions is a useful way to computationally determine their properties.

  • Master equations can equivalently be defined in terms of a set of moves and associated propensities and updates.

  • The Gillespie algorithm enables sampling out of probability distributions described by master equations.

Techniques

  • Application of the Gillespie algorithm.

  • Profiling of code for optimization.


[1]:
import multiprocessing
import tqdm

import numpy as np
import scipy.stats as st
import numba

import biocircuits

# Plotting modules
import bokeh.io
import bokeh.plotting

bokeh.io.output_notebook()

# Line profiler (can install with conda install line_profiler)
%load_ext line_profiler
Loading BokehJS ...

Note: To run this notebook, you will need to update the biocircuits package, which can be done by doing pip install --upgrade biocircuits on the command line.


Sampling out of master equations

For some stories, we know the probability distribution. For example, if the story is that we are doing \(n\) independent coin flips, each with probability \(p\) of landing heads, the number \(h\) of heads is Binomially distributed. We could prove that this is the case, or look it up in a book. But sometimes, it is too difficult (or impossible) to prove what the distribution is, so we can numerically compute its properties by sampling out of the distribution. Sampling involved using a random number generator to simulate the story that generates the distribution. So, if you know the story and you have a computer handy, you can sample to be able to get a plot of your distribution, though you will not get the analytical form.

Let’s demonstrate this with the Binomial distribution. We will take \(n = 25\) and \(p = 0.25\) and compute \(P(h \mid n, p)\), the probability of getting \(h\) heads in \(n\) flips, each with probability \(p\) of landing heads. We will draw 10, 30, 100, and 300 samples and plot them versus the expected Binomial distribution.

[2]:
def simulate_coinflips(n, p, size=1):
    """
    Simulate n_samples sets of n coin flips with prob. p of heads.
    """
    n_heads = np.empty(size, dtype=int)
    for i in range(size):
        n_heads[i] = np.sum(np.random.random(size=n) < p)
    return n_heads


size = (30, 100, 1000, 10000)
n = 25
p = 0.25
h_plot = np.arange(26)
theor_dist = st.binom.pmf(h_plot, n, p)

plots = []
for n_samp in size:
    plot = bokeh.plotting.figure(
        plot_height=200,
        plot_width=300,
        x_axis_label="h",
        y_axis_label="P(h)",
        title=f"{n_samp} samples",
    )
    h = simulate_coinflips(n, p, size=n_samp)
    plot.circle(h_plot, theor_dist)
    plot.circle(
        np.arange(h.max() + 1), np.bincount(h) / n_samp, color="orange"
    )
    plots.append(plot)

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

As we can see, if we sample out of the probability distribution, we can approximately calculate the actual distribution. If we sample enough, the approximation is very good.

Sampling is such a powerful strategy that highly efficient algorithms with convenient APIs have been developed to sample out of named probability distributions. For example, we could have used np.random.binom() as a drop-in (and much more efficient) replacement for the simulate_coinflips() function above.

We will use the same strategy for solving master equations. We will find a way to sample out of the distribution that is governed by the master equation. This technique was pioneered by Dan Gillespie in the last 70s. For that reason, these sampling techniques are often called Gillespie simulations. The algorithm is sometimes referred to as a stochastic simulation algorithm, or SSA.

Here, we will explore how this algorithm works by looking at simple production of a protein.

The dynamical equations

For simple protein production, we have the following reactions.

\begin{align} \text{DNA} \rightarrow \text{mRNA} \rightarrow \text{protein} \end{align}

Macroscale equations

As we’ve seen before, the deterministic dynamics, which describe mean concentrations over a large population of cells, are described by the ODEs

\begin{align} \frac{\mathrm{d}m}{\mathrm{d}t} &= \beta_m - \gamma_m m, \\[1em] \frac{\mathrm{d}p}{\mathrm{d}t} &= \beta_p m - \gamma_p p. \end{align}

The same equations should hold if \(m\) and \(p\) represent the mean numbers of cells; we would just have to appropriately rescale the constants. Assuming the \(m\) and \(p\) are now numbers (so we are not free to pick their units), we can nondimensionalize using \(\gamma_m\) to nondimensionalize time. This leads to redefinition of parameters and variables

\begin{align} &\beta_m/\gamma_m \to \beta_m, \\[1em] &\beta_p/\gamma_m \to \beta_p, \\[1em] &\gamma_m t \to t. \end{align}

The dimensionless equations are

\begin{align} \frac{\mathrm{d}m}{\mathrm{d}t} &= \beta_m - m, \\[1em] \frac{\mathrm{d}p}{\mathrm{d}t} &= \beta_p m - \gamma p, \end{align}

with \(\gamma = \gamma_p/\gamma_m\).

The Master equation

We can write a master equation for these dynamics. In this case, each state is defined by an mRNA copy number \(m\) and a protein copy number \(p\). So, we will write a master equation for \(P(m, p, t)\).

\begin{align} \frac{\mathrm{d}P(m,p,t)}{\mathrm{d}t} &= \beta_m P(m-1,p,t) + (m+1)P(m+1,p,t) - \beta_m P(m,p,t) - mP(m,p,t) \nonumber \\[1em] &+\beta_p mP(m,p-1,t) + \gamma (p+1)P(m,p+1,t) - \beta_p mP(m,p,t) - \gamma p P(m,p,t). \end{align}

We implicitly define \(P(m, p, t) = 0\) if \(m < 0\) or \(p < 0\). This is the master equation we will sample from using the stochastic simulation algorithm (SSA) or Gillespie algorithm.

The Gillespie algorithm

The transition probabilities are also called propensities in the context of stochastic simulation. The propensity for a given transition, say indexed \(i\), is denoted as \(a_i\). The equivalence to notation we introduced for master equations is that if transition \(i\) results in the change of state from \(n'\) to \(n\), then \(a_i = W(n\mid n')\).

To cast this problem for a Gillespie simulation, we can write each change of state (moving either the copy number of mRNA or protein up or down by 1 in this case) and their respective propensities.

\begin{align} \begin{array}{ll} \text{reaction, }r_i & \text{propensity, } a_i \\ m \rightarrow m+1,\;\;\;\; & \beta_m \\[0.3em] m \rightarrow m-1, \;\;\;\; & m\\[0.3em] p \rightarrow p+1, \;\;\;\; & \beta_p m \\[0.3em] p \rightarrow p-1, \;\;\;\; & \gamma p. \end{array} \end{align}

We will not carefully prove that the Gillespie algorithm samples from the probability distribution governed by the master equation, but will state the principles behind it. The basic idea is that events (such as those outlined above) are rare, discrete, separate events. I.e., each event is an arrival of a Poisson process. The Gillespie algorithm starts with some state, \((m_0,p_0)\). Then a state change, any state change, will happen in some time \(\Delta t\) that has a certain probability distribution (which we will show is Exponential momentarily). The probability that the state change that happens is reaction \(j\) is proportional to \(a_j\). That is to say, state changes with high propensities are more likely to occur. Thus, choosing which of the \(n\) state changes happens in \(\Delta t\) is a matter of drawing an integer \(j\) in \([1,n]\) where the probability of drawing \(j\) is

\begin{align} \frac{a_j}{\sum_i a_i}. \end{align}

Now, how do we determine how long the state change took? The probability density function describing that a given state change \(i\) takes place in time \(t\) is

\begin{align} P(t\mid a_i) = a_i\, \mathrm{e}^{-a_i t}, \end{align}

since the time it takes for arrival of a Poisson process is Exponentially distributed. The probability that it has not arrived in time \(\Delta t\) is the probability that the arrival time is greater than \(\Delta t\), given by the complementary cumulative distribution function for the Exponential distribution.

\begin{align} P(t > \Delta t\mid a_i) = \int_{\Delta t}^\infty \mathrm{d}t\,P(t\mid a_i) = \mathrm{e}^{-a_i \Delta t}. \end{align}

Now, say we have \(n\) processes that arrive in time \(t_1, t_2, \ldots\). The probability that none of them arrive before \(\Delta t\) is

\begin{align} P(t_1 > \Delta t, t_2 > \Delta t, \ldots) &= P(t_1 > \Delta t) P(t_2 > \Delta t) \cdots = \prod_i \mathrm{e}^{-a_i \Delta t} = \mathrm{exp}\left[-\Delta t \sum_i a_i\right]. \end{align}

This is the same as the probability of a single Poisson process with \(a = \sum_i a_i\) not arriving before \(\Delta t\). So, the probability that it does arrive in \(\Delta t\) is Exponentially distributed with mean \(\left(\sum_i a_i\right)^{-1}\).

So, we know how to choose a state change and we also know how long it takes. The Gillespie algorithm then proceeds as follows.

  1. Choose an initial condition, e.g., \(m = p = 0\).

  2. Calculate the propensity for each of the enumerated state changes. The propensities may be functions of \(m\) and \(p\), so they need to be recalculated for every \(m\) and \(p\) we encounter.

  3. Choose how much time the reaction will take by drawing out of an exponential distribution with a mean equal to \(\left(\sum_i a_i\right.)^{-1}\). This means that a change arises from a Poisson process.

  4. Choose what state change will happen by drawing a sample out of the discrete distribution where \(P_i = \left.a_i\middle/\left(\sum_i a_i\right)\right.\). In other words, the probability that a state change will be chosen is proportional to its propensity.

  5. Increment time by the time step you chose in step 3.

  6. Update the states according to the state change you choose in step 4.

  7. If \(t\) is less than your pre-determined stopping time, go to step 2. Else stop.

Gillespie proved that this algorithm samples the probability distribution described by the master equation in his seminal papers in 1976 and 1977. (We recommend reading the latter.) You can also read a concise discussion of how the algorithm samples the master equation in section 4.2 of Del Vecchio and Murray.

Coding up the Gillespie simulation

To code up the Gillespie simulation, we first make an an array that gives the changes in the counts of \(m\) and \(p\) for each of the four reactions. This is a way of encoding the updates in the particle counts that we get from choosing the respective state changes.

[3]:
# Column 0 is change in m, column 1 is change in p
simple_update = np.array(
    [
        [1, 0],  # Make mRNA transcript
        [-1, 0],  # Degrade mRNA
        [0, 1],  # Make protein
        [0, -1],  # Degrade protein
    ],
    dtype=np.int,
)

Next, we make a function that updates the array of propensities for each of the four reactions. We update the propensities (which are passed into the function as an argument) instead of instantiating them and returning them to save on memory allocation while running the code. It has the added benefit that it forces you to keep track of the indices corresponding to the update matrix. This helps prevent bugs. It will naturally be a function of the current population of molecules. It may in general also be a function of time, so we explicitly allow for time dependence (even though we will not use it in this simple example) as well.

[4]:
def simple_propensity(propensities, population, t, beta_m, beta_p, gamma):
    """Updates an array of propensities given a set of parameters
    and an array of populations.
    """
    # Unpack population
    m, p = population

    # Update propensities
    propensities[0] = beta_m  # Make mRNA transcript
    propensities[1] = m  # Degrade mRNA
    propensities[2] = beta_p * m  # Make protein
    propensities[3] = gamma * p  # Degrade protein

Making a draw

Finally, we write a general function that draws a choice of reaction and the time interval for that reaction. This is the heart of the Gillespie algorithm, so we will take some time to discuss speed. First, to get the time interval, we sample a random number from an exponential distribution with mean \(\left(\sum_i a_i\right)^{-1}\). This is easily done using the np.random.exponential() function.

Next, we have to select which reaction will take place. This amounts to drawing a sample over the discrete distribution where \(P_i = a_i\left(\sum_i a_i\right)^{-1}\), or the probability of each reaction is proportional to its propensity. This can be done using scipy.stats.rv_discrete, which allows specification of an arbitrary discrete distribution. We will write a function to do this.

[5]:
def sample_discrete_scipy(probs):
    """Randomly sample an index with probability given by probs."""
    return st.rv_discrete(values=(range(len(probs)), probs)).rvs()

This is a nice one-liner, but is it fast? There may be significant overhead in setting up the scipy.stats discrete random variable object to sample from each time. Remember, we can’t just do this once because the array probs changes with each step in the SSA because the propensities change. We will therefore write a less elegant, but maybe faster way of doing it.

Another way to sample the distribution is to generate a uniformly distributed random number \(q\), with \(0 < q < 1\) and return the value \(j\) such that

\begin{align} \sum_{i=0}^{j-1} p_i < q < \sum_{i=0}^{j}p_i. \end{align}

We’ll code this up.

[6]:
def sample_discrete(probs):
    """Randomly sample an index with probability given by probs."""
    # Generate random number
    q = np.random.rand()

    # Find index
    i = 0
    p_sum = 0.0
    while p_sum < q:
        p_sum += probs[i]
        i += 1
    return i - 1

Now let’s compare the speeds using the %timeit magic function. This is a useful tool to help diagnose slow spots in your code.

[7]:
# Make dummy probs
probs = np.array([0.1, 0.3, 0.4, 0.05, 0.15])

print('Result from scipy.stats:')
%timeit sample_discrete_scipy(probs)

print('\nResult from hand-coded method:')
%timeit sample_discrete(probs)
Result from scipy.stats:
558 µs ± 9.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Result from hand-coded method:
1.39 µs ± 43.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Wow! The less concise method is a couple of orders of magnitude faster! So, we will ditch using scipy.stats, and use our hand-built sampler instead.

Now we can write a function to do our draws.

[8]:
def gillespie_draw(propensity_func, propensities, population, t, args=()):
    """
    Draws a reaction and the time it took to do that reaction.

    Parameters
    ----------
    propensity_func : function
        Function with call signature propensity_func(population, t, *args)
        used for computing propensities. This function must return
        an array of propensities.
    propensities : ndarray
        Propensities for each reaction as a 1D Numpy array.
    population : ndarray
        Current population of particles
    t : float
        Value of the current time.
    args : tuple, default ()
        Arguments to be passed to `propensity_func`.

    Returns
    -------
    rxn : int
        Index of reaction that occured.
    time : float
        Time it took for the reaction to occur.
    """
    # Compute propensities
    propensity_func(propensities, population, t, *args)

    # Sum of propensities
    props_sum = propensities.sum()

    # Compute next time
    time = np.random.exponential(1.0 / props_sum)

    # Compute discrete probabilities of each reaction
    rxn_probs = propensities / props_sum

    # Draw reaction from this distribution
    rxn = sample_discrete(rxn_probs)

    return rxn, time

SSA time stepping

Now we are ready to write our main SSA loop. We will only keep the counts at pre-specified time points. This saves on RAM, and we really only care about the values at given time points anyhow.

Note that this function is generic. All we need to specify our system is the following.

  • A function to compute the propensities

  • How the updates for a given reaction are made

  • Initial population

Additionally, we specify necessary parameters, an initial condition, and the time points at which we want to store our samples. So, providing the propensity function and update are analogous to providing the time derivatives when using scipy.integrate.odeint().

[9]:
def gillespie_ssa(propensity_func, update, population_0, time_points, args=()):
    """
    Uses the Gillespie stochastic simulation algorithm to sample
    from probability distribution of particle counts over time.

    Parameters
    ----------
    propensity_func : function
        Function of the form f(params, t, population) that takes the current
        population of particle counts and return an array of propensities
        for each reaction.
    update : ndarray, shape (num_reactions, num_chemical_species)
        Entry i, j gives the change in particle counts of species j
        for chemical reaction i.
    population_0 : array_like, shape (num_chemical_species)
        Array of initial populations of all chemical species.
    time_points : array_like, shape (num_time_points,)
        Array of points in time for which to sample the probability
        distribution.
    args : tuple, default ()
        The set of parameters to be passed to propensity_func.

    Returns
    -------
    sample : ndarray, shape (num_time_points, num_chemical_species)
        Entry i, j is the count of chemical species j at time
        time_points[i].
    """

    # Initialize output
    pop_out = np.empty((len(time_points), update.shape[1]), dtype=np.int)

    # Initialize and perform simulation
    i_time = 1
    i = 0
    t = time_points[0]
    population = population_0.copy()
    pop_out[0, :] = population
    propensities = np.zeros(update.shape[0])
    while i < len(time_points):
        while t < time_points[i_time]:
            # draw the event and time step
            event, dt = gillespie_draw(
                propensity_func, propensities, population, t, args
            )

            # Update the population
            population_previous = population.copy()
            population += update[event, :]

            # Increment time
            t += dt

        # Update the index
        i = np.searchsorted(time_points > t, True)

        # Update the population
        pop_out[i_time : min(i, len(time_points))] = population_previous

        # Increment index
        i_time = i

    return pop_out

Running and parsing results

We can now run a set of SSA simulations and plot the results. We will run 100 trajectories and store them, using \(\beta_p = \beta_m = 10\) and \(\gamma = 0.4\). We will also use the nifty package tqdm to give a progress bar so we know how long it is taking.

[10]:
# Specify parameters for calculation
args = (10.0, 10.0, 0.4)
time_points = np.linspace(0, 50, 101)
population_0 = np.array([0, 0], dtype=int)
size = 100

# Seed random number generator for reproducibility
np.random.seed(42)

# Initialize output array
samples = np.empty((size, len(time_points), 2), dtype=int)

# Run the calculations
for i in tqdm.tqdm(range(size)):
    samples[i, :, :] = gillespie_ssa(
        simple_propensity, simple_update, population_0, time_points, args=args
    )
100%|██████████| 100/100 [00:24<00:00,  4.14it/s]

We now have our samples, so we can plot the trajectories. For visualization, we will plot every trajectory as a thin blue line, and then the average of the trajectories as a thick orange line.

[11]:
# Set up plots
plots = [
    bokeh.plotting.figure(
        plot_width=300,
        plot_height=200,
        x_axis_label="dimensionless time",
        y_axis_label="number of mRNAs",
    ),
    bokeh.plotting.figure(
        plot_width=300,
        plot_height=200,
        x_axis_label="dimensionless time",
        y_axis_label="number of proteins",
    ),
]

# Plot trajectories and mean
for i in [0, 1]:
    for x in samples[:, :, i]:
        plots[i].line(
            time_points, x, line_width=0.3, alpha=0.2, line_join="bevel"
        )
    plots[i].line(
        time_points,
        samples[:, :, i].mean(axis=0),
        line_width=6,
        color="orange",
        line_join="bevel",
    )

# Link axes
plots[0].x_range = plots[1].x_range

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

We can also compute the steady state properties by considering the end of the simulation. The last 50 time points are at steady state, so we will average over them.

[12]:
print("mRNA mean copy number =", samples[:, -50:, 0].mean())
print("protein mean copy number =", samples[:, -50:, 1].mean())
print("\nmRNA variance =", samples[:, -50:, 0].std() ** 2)
print("protein variance =", samples[:, -50:, 1].std() ** 2)
print("\nmRNA noise =", samples[:, -50:, 0].std() / samples[:, -50:, 0].mean())
print(
    "protein noise =", samples[:, -50:, 1].std() / samples[:, -50:, 1].mean()
)
mRNA mean copy number = 10.0748
protein mean copy number = 251.6994

mRNA variance = 10.26200496
protein variance = 2145.33703964

mRNA noise = 0.317965262817717
protein noise = 0.18402023679851773

Finally, we can compute the steady state probability distributions. To plot them, we plot the empirical cumulative distribution function (ECDF) from the sampling. The theoretical distribution for mRNA is Poisson, which is overlayed in orange.

[13]:
# mRNA ECDF
p_m = biocircuits.viz.ecdf(
    samples[:, -50:, 0].flatten(),
    plot_width=300,
    plot_height=200,
    x_axis_label="mRNA copy number",
)

# Theoretial mRNA CDF (Poisson)
p_m.circle(
    np.arange(25), st.poisson.cdf(np.arange(25), args[0]), color="orange"
)

# protein ECDF
p_p = biocircuits.viz.ecdf(
    samples[:, -50:, 1].flatten(),
    plot_width=300,
    plot_height=200,
    x_axis_label="protein copy number",
)

bokeh.io.show(bokeh.layouts.gridplot([p_m, p_p], ncols=2))

As we expect, the mRNA copy number matches the Poisson distribution. We also managed to get a distribution for the protein copy number that we could plot.

You now have the basic tools for doing Gillespie simulations. You just need to code the propensity function and the update, and you’re on your way! (Note, though, that this only works for Gillespie simulations where the states are defined by particle counts, which should suffice for this course.)

Implementation in the biocircuits package

This algorithm is implemented in the biocircuits package. A significant speed boost is achieved by just-in-time compliation using Numba. To utilize this feature, you need to just-in-time compile (JIT) your propensity function. You can insist that everything is compiled (and therefore skips the comparably slow Python interpreter) by using the @numba.njit decorator. In many cases, like in this one for simple gene expression, that is all you have to do.

[14]:
@numba.njit
def simple_propensity(propensities, population, t, beta_m, beta_p, gamma):
    """Updates an array of propensities given a set of parameters
    and an array of populations.
    """
    # Unpack population
    m, p = population

    # Update propensities
    propensities[0] = beta_m  # Make mRNA transcript
    propensities[1] = m  # Degrade mRNA
    propensities[2] = beta_p * m  # Make protein
    propensities[3] = gamma * p  # Degrade protein

You can also utilize more threads and do the Gillespie simulations in parallel if you like. Because of the speed boosts, we will take more samples using the biocircuits implementation.

[15]:
samples = biocircuits.gillespie_ssa(
    simple_propensity,
    simple_update,
    population_0,
    time_points,
    size=1000,
    args=args,
    n_threads=4,
    progress_bar=False,
)

We now have 4000 trajectories, and we can again make the plots as we did before. With so many trajectories, though, we should not show them all, since there would be too many glyphs on the plot. We therefore thin out the samples for plotting, taking only every 100th trajectory.

[16]:
# Set up plots
plots = [
    bokeh.plotting.figure(
        plot_width=300,
        plot_height=200,
        x_axis_label="dimensionless time",
        y_axis_label="number of mRNAs",
    ),
    bokeh.plotting.figure(
        plot_width=300,
        plot_height=200,
        x_axis_label="dimensionless time",
        y_axis_label="number of proteins",
    ),
]

# Plot trajectories and mean
for i in [0, 1]:
    for x in samples[::100, :, i]:
        plots[i].line(
            time_points, x, line_width=0.3, alpha=0.2, line_join="bevel"
        )
    plots[i].line(
        time_points,
        samples[:, :, i].mean(axis=0),
        line_width=6,
        color="orange",
        line_join="bevel",
    )

# Link axes
plots[0].x_range = plots[1].x_range

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

Simulating the repressilator

We have seen a structure to how we can set up Gillespie simulations; we need to specify the propensities and updates, along with an initial population. It helps clarify the system being simulated and also avoids bugs if we make tables for

  1. The species whose populations we are describing;

  2. The update-propensity pairs;

  3. The parameter values.

After constructing the tables, coding up the update and propensity functions is much easier.

To demonstrate this procedure, we will perform a Gillespie simulation for the repressilator, as described in Elowitz and Leibler. We will consider both RNA and DNA in a repressilator where gene 1 represses gene 2, gene 2 represses gene 3, and gene 3 represses gene 1. We explicitly consider mRNA and protein. A repressor binds its operon with chemical rate constant \(k_r\), and an operon may have zero, one, or two repressors bound. The unbinding rate of a repressor when one is bound is \(k_{u,1}\) and that of a repressor when two are bound if \(k_{u,2}\), with \(k_{u,2} < k_{u,1}\) to capture cooperativity. Transcription happens when no repressors are bound to a promoter region with rate constant \(k_{m,u}\) and happens when one or two repressors are bound with rate \(k_{m,o}\).

As we build the simulation, let’s start with a table of the populations we are keeping track of.

index

description

variable

0

gene 1 mRNA copy number

m1

1

gene 1 protein copy number

p1

2

gene 2 mRNA copy number

m2

3

gene 2 protein copy number

p2

4

gene 3 mRNA copy number

m3

5

gene 3 protein copy number

p3

6

Number of repressors bound to promoter of gene 3

n1

7

Number of repressors bound to promoter of gene 1

n2

8

Number of repressors bound to promoter of gene 2

n3

Note that we labeled each species with an index, which corresponds to its position in the array of populations in the simulation.

Next, we can set up a table of updates and propensities for the moves we allow in the Gillespie simulation. We also assign an index to each entry here, as this helps keep track of everything.

index

description

update

propensity

0

transcription of gene 1 mRNA

m1 m1 + 1

kmu*(n3 == 0) + kmo*(n3 > 0)

1

transcription of gene 2 mRNA

m2 m2 + 1

kmu*(n1 == 0) + kmo*(n1 > 0)

2

transcription of gene 3 mRNA

m3 m3 + 1

kmu*(n2 == 0) + kmo*(n2 > 0)

3

translation of gene 1 protein

p1 p1 + 1

kp * m1

4

translation of gene 2 protein

p2 p2 + 1

kp * m2

5

translation of gene 3 protein

p3 p3 + 1

kp * m3

6

degradation of gene 1 mRNA

m1 m1 - 1

gamma_m * m1

7

degradation of gene 2 mRNA

m2 m2 - 1

gamma_m * m2

8

degradation of gene 3 mRNA

m3 m3 - 1

gamma_m * m3

9

degradation of unbound gene 1 protein

p1 p1 - 1

gamma_p * p1

10

degradation of unbound gene 2 protein

p2 p2 - 1

gamma_p * p2

11

degradation of unbound gene 3 protein

p3 p3 - 1

gamma_p * p3

12

degradation of bound gene 1 protein

n1 n1 - 1

gamma_p * n1

13

degradation of bound gene 2 protein

n2 n2 - 1

gamma_p * n2

14

degradation of bound gene 3 protein

n3 n3 - 1

gamma_p * n3

15

binding of protein to gene 1 operator

n3 n3 + 1, p3 p3 - 1

kr * p3 * (n3 < 2)

16

binding of protein to gene 2 operator

n1 n1 + 1, p1 p1 - 1

kr * p1 * (n1 < 2)

17

binding of protein to gene 3 operator

n2 n2 + 1, p2 p2 - 1

kr * p2 * (n2 < 2)

18

unbinding of protein to gene 1 operator

n3 n3 - 1, p3 p3 + 1

ku1*(n3 == 1) + 2*ku2*(n3 == 2)

19

unbinding of protein to gene 2 operator

n1 n1 - 1, p1 p1 + 1

ku1*(n1 == 1) + 2*ku2*(n1 == 2)

20

unbinding of protein to gene 3 operator

n2 n2 - 1, p2 p2 + 1

ku1*(n2 == 1) + 2*ku2*(n2 == 2)

Finally, we have parameters that were introduced in the propensities, so we should have a table defining them.

parameter

value

units

kmu

0.5

1/sec

kmo

5×10\(^{-4}\)

1/sec

kp

0.167

1/molec-sec

gamma_m

0.005776

1/sec

gamma_p

0.001155

1/sec

kr

1

1/molec-sec

ku1

224

1/sec

ku2

9

1/sec

We have now clearly defined all of our parameters, updates, and propensities. With the initial population, out simulation is not completely defined. In practice, we recommend constructing tables like this (and including them in your publications!) if you are going to do Gillespie simulations in your work.

We are now tasked with coding up the propensities and the updates. Starting with the propensities, we recall that the biocircuits module requires a call signature of propensity(propensities, population, t, *args), where the args are the parameters. We find that it is clearest to list the arguments one at a time in the function definition. It is also much clearer to unpack the population into individual variables than to use indexing. Finally, when returning the array of propensities, we recommend having one propensity for each line indexed in order.

[17]:
@numba.njit
def repressilator_propensity(
    propensities, population, t, kmu, kmo, kp, gamma_m, gamma_p, kr, ku1, ku2
):
    m1, p1, m2, p2, m3, p3, n1, n2, n3 = population

    propensities[0] = kmu if n3 == 0 else kmo
    propensities[1] = kmu if n1 == 0 else kmo
    propensities[2] = kmu if n2 == 0 else kmo
    propensities[3] = kp * m1
    propensities[4] = kp * m2
    propensities[5] = kp * m3
    propensities[6] = gamma_m * m1
    propensities[7] = gamma_m * m2
    propensities[8] = gamma_m * m3
    propensities[9] = gamma_p * p1
    propensities[10] = gamma_p * p2
    propensities[11] = gamma_p * p3
    propensities[12] = gamma_p * n1
    propensities[13] = gamma_p * n2
    propensities[14] = gamma_p * n3
    propensities[15] = kr * p3 * (n3 < 2)
    propensities[16] = kr * p1 * (n1 < 2)
    propensities[17] = kr * p2 * (n2 < 2)
    propensities[18] = ku1 * (n3 == 1) + 2 * ku2 * (n3 == 2)
    propensities[19] = ku1 * (n1 == 1) + 2 * ku2 * (n1 == 2)
    propensities[20] = ku1 * (n2 == 1) + 2 * ku2 * (n2 == 2)

Now, we can code up the update. The update is a matrix where entry i,j is the change in species i due to move j. Since we have indexes both the species and the moves (in the update/propensity table), we can include the indices when we define the update for clarity.

[18]:
repressilator_update = np.array(
    [   # 0   1   2   3   4   5   6   7   8
        [ 1,  0,  0,  0,  0,  0,  0,  0,  0], # 0
        [ 0,  0,  1,  0,  0,  0,  0,  0,  0], # 1
        [ 0,  0,  0,  0,  1,  0,  0,  0,  0], # 2
        [ 0,  1,  0,  0,  0,  0,  0,  0,  0], # 3
        [ 0,  0,  0,  1,  0,  0,  0,  0,  0], # 4
        [ 0,  0,  0,  0,  0,  1,  0,  0,  0], # 5
        [-1,  0,  0,  0,  0,  0,  0,  0,  0], # 6
        [ 0,  0, -1,  0,  0,  0,  0,  0,  0], # 7
        [ 0,  0,  0,  0, -1,  0,  0,  0,  0], # 8
        [ 0, -1,  0,  0,  0,  0,  0,  0,  0], # 9
        [ 0,  0,  0, -1,  0,  0,  0,  0,  0], # 10
        [ 0,  0,  0,  0,  0, -1,  0,  0,  0], # 11
        [ 0,  0,  0,  0,  0,  0, -1,  0,  0], # 12
        [ 0,  0,  0,  0,  0,  0,  0, -1,  0], # 13
        [ 0,  0,  0,  0,  0,  0,  0,  0, -1], # 14
        [ 0,  0,  0,  0,  0, -1,  0,  0,  1], # 15
        [ 0, -1,  0,  0,  0,  0,  1,  0,  0], # 16
        [ 0,  0,  0, -1,  0,  0,  0,  1,  0], # 17
        [ 0,  0,  0,  0,  0,  1,  0,  0, -1], # 18
        [ 0,  1,  0,  0,  0,  0, -1,  0,  0], # 19
        [ 0,  0,  0,  1,  0,  0,  0, -1,  0], # 20
    ],
    dtype=int)

Next, we specify the parameter values according to the parameter table. Remember that we need to package them in a tuple, after defining them.

[19]:
# Parameter values
kmu = 0.5
kmo = 5e-4
kp = 0.167
gamma_m = 0.005776
gamma_p = 0.001155
kr = 1.0
ku1 = 224.0
ku2 = 9.0

repressilator_args = (kmu, kmo, kp, gamma_m, gamma_p, kr, ku1, ku2)

Finally, we specify the initial population and the time points we want for the sampling.

[20]:
# State with 10 copies of everything, nothing bound to operators
repressilator_pop_0 = np.array([10, 10, 10, 10, 10, 10, 0, 0, 0], dtype=int)

repressilator_time_points = np.linspace(0, 80000, 4001)

And we are all set to perform the calculation. We will make a single trace and then plot it.

[21]:
# Perform the Gillespie simulation
pop = biocircuits.gillespie_ssa(
    repressilator_propensity,
    repressilator_update,
    repressilator_pop_0,
    repressilator_time_points,
    args=repressilator_args,
)


# Make plot
colors = bokeh.palettes.d3["Category10"][3]
p = bokeh.plotting.figure(
    height=250,
    width=600,
    x_axis_label="time (min)",
    y_axis_label="protein copy number",
)
for c, i in enumerate([1, 3, 5]):
    p.line(repressilator_time_points / 60, pop[0, :, i], color=colors[c])

bokeh.io.show(p)