In this notebook I am going to fit a Bayesian SIR model in JAGS. Some of this material is a simplification of the more advanced spatial SEIRS model found in this thesis by Grant Brown from 2015 which is well worth reading. Another very useful paper is Lekone and Finkelstadt which has no spatial aspect, and is the majority of what is re-created here. I have changed the notation to what I believe is a clearer version.
The main idea is to simplify the differential equations of the SIR model into difference equations and then use probability distributions and transmission matrices to quantify uncertainties between compartments. All time is treated as discrete, which is fine for current COVID-19 data which is almost all available at a daily time step. In another document I will extend this model to SIER, and further a spatial or stratified type model. In the next document I will fit this model to a chosen country from the ECDC data set.
It may be possible to fit a similar model in Stan but this would require integrating out many of the discrete parameters.
First data:
Now latent variables:
Some parameters:
Some extra parameters are derived:
The model equations are given by: \[S(t+1) = S(t) - N_{S \rightarrow I}(t)\] \[I(t+1) = I(t) + N_{S \rightarrow I}(t) - N_{I \rightarrow R}(t)\] \[S(t) + I(t) + R(t) = N\]
In English these equations mean:
Each of the latent variables is given a binomial distribution
\[N_{S \rightarrow I}(t) \sim Bin(S(t), p_{S \rightarrow I}(t))\] \[N_{I \rightarrow R}(t) \sim Bin(I(t), p_{I \rightarrow R})\]
In words these equations mean:
These probabilities are set at
\[p_{S \rightarrow I}(t) = 1 - \exp \left[ - \frac{\beta(t)}{N} I(t) \right]\] This equation makes the proportion of people moving from susceptible to infected dependent on the proportion of people who are infected (\(I(t)/N\)), i.e. the greater the proportion of people infected, the fewer can become infected.
\[p_{I \rightarrow R} = 1 - \exp( - \gamma)\] where
Finally, the transmission rate is assumed to be constant up until a time point \(t^*\) at which point control measures are introduced:
\[\beta(t) = \left\{ \begin{array}{ll} \beta, & t< t^* \\ \beta e^{-q(t - t^*)}, & t \ge t^* \end{array} \right.\]
One of the key parameters to estimate is the basic reproduction number which is defined as \(R_0\) or \(\beta / \gamma\) in this notation. This model has the cool feature that we can estimate a time dependent reproduction number as:
\[ R_0(t) = \frac{\beta(t)}{\gamma} \frac{S(t)}{N}\]
Ideally, the data for this model would be \(N, N_{S \rightarrow I}(t), N_{I \rightarrow R}(t)\), the latter three for all time values. In fact we often do not have access to \(N_{S \rightarrow I}(t)\) (the number of susceptible individuals who become infected) so we may need to integrate out these values during the Bayesian model.
Below is some code for a JAGS model:
jags_code = '
model {
# Likelihood
for (t in 1:T) {
N_I_R[t] ~ dbinom(p_I_R, I[t])
# These are imputed:
N_S_I[t] ~ dbinom(p_S_I[t], S[t])
R_0[t] <- (beta[t]/gamma)*(S[t]/N)
}
# These are the known time evolution steps:
for(t in 2:T) {
S[t] <- S[t-1] - N_S_I[t-1]
I[t] <- I[t-1] + N_S_I[t-1] - N_I_R[t-1]
R[t] <- N - S[t] - I[t]
}
# Need a value for S[1] and I[1]
I[1] <- I_start # Assume the number of people infected on day 1 is the same as those transitioning from E to I
R[1] <- R_start # As above but for removed
S[1] <- N - I[1] - R[1] # Left over
# This is the prior on p_S_I[t]
for(t in 1:T) {
p_S_I[t] <- 1 - exp( - beta[t] * I[t] / N )
}
# Sort out the change point
for(t in 1:(t_star - 1)) {
beta[t] <- beta_const
}
for (t in t_star:T) {
beta[t] <- beta_const * exp( - q * (t - t_star))
}
# These are the priors on the other probabilities
p_I_R <- 1 - exp( -gamma )
# Now the prior on the hyper-parameters
beta_const ~ dunif(0.3, 0.6) #= 0.4
gamma <- 1/gamma_inv
gamma_inv ~ dunif(5,8) # 6
q ~ dbeta(2, 10) # 0.2
# Can now forecast into the future
for (t in (T+1):T_max) {
# Transitions
N_I_R[t] ~ dbinom(p_I_R, I[t])
N_S_I[t] ~ dbinom(p_S_I[t], S[t])
# R_0
R_0[t] = (beta[t]/gamma)*(S[t]/N)
# Compartment values
S[t] <- S[t-1] - N_S_I[t-1]
I[t] <- I[t-1] + N_S_I[t-1] - N_I_R[t-1]
R[t] = N - S[t] - I[t]
# Probability values
p_S_I[t] <- 1 - exp( - beta[t] * I[t] / N )
beta[t] <- beta_const * exp( - q * (t - t_star))
}
}
'
I will first try simulating some data from this model and checking that JAGS actually works.
# Summary values
N = 1e3 # Population size
T = 30 # Maximum time steps
t = 1:T
# First the hyper-parameters
t_star = 20 # Days in before intervention
gamma_inv = 10; gamma = 1/gamma_inv
beta_const = 0.6
q = 0.4
# Now the probabilities
p_I_R = 1 - exp(-gamma)
# Give an initial values for everthing required
p_S_I = N_S_I = N_I_R = S = I = R = beta = rep(NA, T)
beta[1:(t_star - 1)] = beta_const
beta[t_star:T] = beta_const * exp( - q * (t[t_star:T] - t_star))
S[1] = N - 10; I[1] = 10; R[1] = 0
p_S_I[1] = 1 - exp( - beta[1] * I[1] / N )
N_S_I[1] = rbinom(1, S[1], p_S_I[1])
N_I_R[1] = rbinom(1, I[1], p_I_R)
# Now can loop through filling in the other values
for (t in 2:T) {
S[t] <- S[t-1] - N_S_I[t-1]
I[t] <- I[t-1] + N_S_I[t-1] - N_I_R[t-1]
R[t] = N - S[t] - I[t]
p_S_I[t] <- 1 - exp( - beta[t] * I[t] / N )
# Now get the actual values
N_I_R[t] = rbinom(1, I[t], p_I_R)
# These are usually imputed:
N_S_I[t] = rbinom(1, S[t], p_S_I[t])
}
# Create R0
R_0 = (beta/gamma)*(S/N)
# Create a plot of the epidemic
library(ggplot2)
library(tidyr)
tibble(t = 1:T,
S, I, R) %>%
pivot_longer(names_to = 'Compartment', values_to = 'People', -t) %>%
ggplot(aes(x = t, y = People, colour = Compartment)) +
geom_vline(xintercept = t_star) +
#scale_y_log10() +
geom_line()
Now we can fit this model by providing the data to JAGS
N_future = 50 # Forecast 100 days into the future
T_max = T + N_future
jags_data = list(N = N,
T = T,
T_max = T_max,
t_star = t_star,
I_start = I[1],
R_start = R[1],
N_S_I = c(N_S_I, rep(NA, N_future)),
N_I_R = c(N_I_R, rep(NA, N_future)))
library(R2jags)
jags_run = jags(data = jags_data,
parameters.to.save = c("gamma_inv", "q", "beta",
"beta_const", "S","I",
"R", "R_0"),
model.file = textConnection(jags_code))
plot(jags_run)
Now plot the number of infected over time:
S_post = jags_run$BUGSoutput$median$S
I_post = jags_run$BUGSoutput$median$I
R_post = jags_run$BUGSoutput$median$R
tibble(t = 1:T_max,
S_post, I_post, R_post) %>%
pivot_longer(names_to = 'Compartment', values_to = 'People', -t) %>%
ggplot(aes(x = t, y = People, colour = Compartment)) +
geom_vline(xintercept = t_star) +
geom_line()
Also plot the dynamic \(R_0\) against it’s true value
R0_post = jags_run$BUGSoutput$median$R_0
tibble(t = 1:T_max,
R0_true = c(R_0, rep(NA, N_future)),
R0_post = R0_post) %>%
pivot_longer(names_to = 'Type', values_to = 'R0', -t) %>%
ggplot(aes(x = t, y = R0, colour = Type)) +
geom_vline(xintercept = t_star) +
geom_line()
## Warning: Removed 50 row(s) containing missing values (geom_path).