
Disease mapping is the study of the distribution of disease relative risks or rates in space and time, and it normally uses generalized linear mixed models (GLMMs) that include fixed effects and spatial, temporal, and spatio-temporal random effects. Model fitting and statistical inference are commonly accomplished through the empirical Bayes (EB) and fully Bayes (FB) approaches. The EB approach usually relies on penalized quasi-likelihood (PQL), while the FB approach, which has become increasingly popular in recent years, usually uses Markov chain Monte Carlo (McMC) techniques. However, the conventional use of posterior sampling via McMC for inference poses many challenges. These include the need to evaluate convergence of posterior samples, which often requires extensive simulation and can be very time consuming. Spatio-temporal models used in disease mapping are often very complex, and McMC methods may lead to large Monte Carlo errors if the dimension of the data at hand is large. To address these challenges, a new strategy based on integrated nested Laplace approximations (INLA) has recently been developed as a promising alternative to McMC. This technique is becoming more popular in disease mapping because it can fit fairly complex space-time models much more quickly than McMC. In this paper, we show how to fit different spatio-temporal models for disease mapping with INLA using the Leroux CAR prior for the spatial component, and we compare INLA with McMC using Kenya HIV incidence data for the period 2013-2016.

Statistical methods for disease mapping have grown very fast in the last decade. Modern registers provide a great deal of high-quality data recorded for different regions over a period of time (e.g. years). This has brought new challenges and goals, which in turn require new and more flexible statistical models, faster and less computationally demanding methods for model fitting, and advanced software to implement them. Spatio-temporal disease mapping models are widely used to describe the temporal variation and geographical patterns of mortality risks or rates. The information obtained from these analyses is useful for health researchers and policy makers, since it helps in formulating hypotheses about the aetiology of a disease, looking for risk factors, allocating resources efficiently to hot-spot areas, and planning prevention and intervention measures.

Spatio-temporal models are widely used in disease mapping studies because they make it possible to borrow strength from spatial and temporal neighbours to reduce the high variability that is common to classical risk estimators such as the standardized mortality ratio (SMR), in particular when studying rare diseases or sparsely populated areas. These models are usually formulated in a hierarchical Bayesian framework and typically rely on generalized linear mixed models (GLMMs). Model fitting and statistical inference are commonly accomplished through the empirical Bayes (EB) and fully Bayes (FB) approaches. The EB approach usually relies on penalized quasi-likelihood (PQL) [
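As a small illustration of why the SMR is unstable in sparsely populated areas, the Python sketch below computes $\mathrm{SMR}_i = Y_i/E_i$ and its approximate Poisson standard error $\sqrt{Y_i}/E_i$ for three hypothetical areas. The counts are invented for illustration and are not taken from the Kenya data.

```python
import math

# Hypothetical observed (Y) and expected (E) counts for three areas;
# illustrative numbers only, not values from the Kenya HIV data.
areas = {"large": (1000, 1000.0), "medium": (12, 10.0), "small": (2, 0.5)}

def smr(y, e):
    """Standardized ratio of observed to expected counts, Y / E."""
    return y / e

def smr_se(y, e):
    """Approximate standard error of the SMR under a Poisson model: sqrt(Y) / E."""
    return math.sqrt(y) / e

for name, (y, e) in areas.items():
    print(f"{name:>6}: SMR = {smr(y, e):.2f}, SE = {smr_se(y, e):.2f}")
```

With only two observed cases and half an expected case, the small area gets an SMR of 4 but a standard error of about 2.8. This is the kind of noise that spatial and temporal smoothing is meant to reduce.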

The FB approach provides posterior marginal distributions of the target parameters and therefore gives a complete picture of them instead of a single point estimate. However, this approach poses several challenges. The posterior distributions are usually not available in closed form, and hence inference is usually achieved via McMC algorithms. This raises the need to evaluate convergence of posterior samples, which often requires extensive simulation and can be very time consuming. Spatio-temporal models used in disease mapping are often very complex, and McMC methods may lead to large Monte Carlo errors and long computation times if the dimension of the data at hand is large [

To address these challenges, a new strategy based on integrated nested Laplace approximations (INLA) has recently been proposed [

There is an extensive literature in Bayesian spatio-temporal disease mapping. For parametric models, see for example [

In this paper, our focus is to implement spatio-temporal disease mapping models using the INLA methodology. Most of the research in spatial and spatio-temporal disease mapping with INLA considers the Besag et al. [

The rest of this paper is organized as follows. In Section 2, spatial models are reviewed and the different spatio-temporal models to be fitted with INLA are described. Section 3 reviews the R-INLA package and the prior distributions used. In Section 4, these models are used to analyze Kenya HIV incidence data for the years 2013-2016. In Section 5, we compare the INLA and McMC techniques, and conclusions are given in Section 6.

Consider a large area, say a country, divided into small areas (for example provinces or counties) labelled by $i = 1, 2, \dots, n$, and let $Y_i$ denote the number of incident cases (or deaths) in the $i$th small area. Conditional on the relative risk $\theta_i$, $Y_i$ is assumed to follow a Poisson distribution with mean $\mu_i = E_i\theta_i$, where $E_i$ is the number of expected cases. That is,

$$Y_i \mid \theta_i \sim \text{Poisson}(\mu_i = E_i\theta_i), \tag{1}$$

$$\log(\mu_i) = \log(E_i) + \log(\theta_i),$$

where $\log(E_i)$ is the offset and $\log(\theta_i)$ is modeled as

$$\log(\theta_i) = \alpha + u_i, \tag{2}$$

where $\alpha$ is the global risk and $u_i$ is the spatially structured random effect. Very often, an intrinsic conditional autoregressive (ICAR) prior is used to model the vector of spatially structured random effects $\mathbf{u} = (u_1, \dots, u_n)'$. That is,

$$\mathbf{u} \sim N(\mathbf{0}, \sigma^2 R^{-}), \tag{3}$$

where $R^{-}$ denotes the Moore-Penrose inverse of $R$, $\sigma^2$ is the variance component, and $R$ is the $n \times n$ spatial neighbourhood matrix with $ij$th element

$$R_{ij} = \begin{cases} n_i, & \text{if } i = j, \\ -1, & \text{if } j \sim i, \\ 0, & \text{otherwise,} \end{cases} \tag{4}$$

where $n_i$ is the number of neighbours of area $i$ and $i \sim j$ indicates that areas $i$ and $j$ are neighbours. Typically, two areas are neighbours if they share a common border.
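The following is a minimal Python sketch of Equation (4). It assumes the neighbour structure is given as 0-based adjacency lists, which is a hypothetical input format. In the appendix, the same matrix (Q.xi) is built from an INLA graph file. The sketch also shows that every row of $R$ sums to zero, so $R$ is singular and the ICAR prior in Equation (3) is improper.

```python
def icar_structure(neighbours):
    """Build the ICAR structure matrix R of Equation (4).

    neighbours[i] lists the 0-based indices of the areas adjacent to area i;
    the lists are assumed to be symmetric (j in neighbours[i] iff i in neighbours[j]).
    """
    n = len(neighbours)
    R = [[0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbours):
        R[i][i] = len(nbrs)      # n_i on the diagonal
        for j in nbrs:
            R[i][j] = -1         # -1 for each neighbour j ~ i
    return R

# Hypothetical map of four areas on a line: 0 - 1 - 2 - 3
nbrs = [[1], [0, 2], [1, 3], [2]]
R = icar_structure(nbrs)
for row in R:
    print(row)
# Every row sums to zero, so R has rank n - 1 for a connected map.
print([sum(row) for row in R])
```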

The full conditional distribution of $u_i$ given all the remaining components $\mathbf{u}_{-i} = (u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n)$ can be expressed as follows:

$$u_i \mid \mathbf{u}_{-i} \sim N\left(\frac{1}{n_i}\sum_{j \sim i} u_j,\ \frac{\sigma^2}{n_i}\right). \tag{5}$$

However, this model has been criticized since the spatial and non-spatial effects are not identifiable, as noticed by Eberly and Carlin [

$$\mathbf{u} \sim N(\mathbf{0}, \sigma^2 Q^{-1}), \qquad Q = \rho R + (1 - \rho) I_n, \tag{6}$$

where $\rho \in [0, 1]$ is a spatial smoothing parameter and $I_n$ is the $n \times n$ identity matrix. When $\rho = 0$, the LCAR prior reduces to an exchangeable (independent) prior $\mathbf{u} \sim N(\mathbf{0}, \sigma^2 I_n)$, and when $\rho = 1$, it reduces to the ICAR model $\mathbf{u} \sim N(\mathbf{0}, \sigma^2 R^{-})$ [

The univariate full conditional distribution of $u_i$ can be expressed as

$$u_i \mid \mathbf{u}_{-i} \sim N\left(\frac{\rho}{1 - \rho + \rho n_i}\sum_{j \sim i} u_j,\ \frac{\sigma^2}{1 - \rho + \rho n_i}\right). \tag{7}$$
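Equation (7) follows from a general Gaussian result. If $\mathbf{u}$ has precision $Q/\sigma^2$, the full conditional of $u_i$ has mean $-\sum_{j \ne i} Q_{ij}u_j / Q_{ii}$ and variance $\sigma^2/Q_{ii}$. The Python sketch below uses a hypothetical four-area map and made-up values of $\mathbf{u}$ to compute the conditional both ways. It then checks that $\rho = 1$ recovers Equation (5) and that $\rho = 0$ gives the independent prior.

```python
def leroux_precision(neighbours, rho):
    """Q = rho * R + (1 - rho) * I of Equation (6), from 0-based neighbour lists."""
    n = len(neighbours)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbours):
        Q[i][i] = rho * len(nbrs) + (1.0 - rho)
        for j in nbrs:
            Q[i][j] = -rho
    return Q

def conditional_from_precision(Q, u, i, sigma2):
    """Gaussian full conditional (mean, variance) of u_i when u has precision Q / sigma2."""
    mean = sum(-Q[i][j] * u[j] for j in range(len(u)) if j != i) / Q[i][i]
    return mean, sigma2 / Q[i][i]

def conditional_eq7(neighbours, u, i, rho, sigma2):
    """Closed form of Equation (7)."""
    denom = 1.0 - rho + rho * len(neighbours[i])
    return rho * sum(u[j] for j in neighbours[i]) / denom, sigma2 / denom

nbrs = [[1], [0, 2], [1, 3], [2]]   # hypothetical map: 0 - 1 - 2 - 3
u = [0.5, -0.2, 0.3, 0.1]           # made-up values of the spatial effects
sigma2, i = 0.8, 1
results = {}
for rho in (0.0, 0.6, 1.0):
    Q = leroux_precision(nbrs, rho)
    results[rho] = (conditional_from_precision(Q, u, i, sigma2),
                    conditional_eq7(nbrs, u, i, rho, sigma2))
    print(rho, results[rho])
```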

Suppose now that, for every small area $i$, data have been recorded for different time periods $t = 1, \dots, T$. Then, conditional on the relative risk $\theta_{it}$, the count of events $Y_{it}$ in region $i$ at time $t$ is assumed to be Poisson distributed with mean $\mu_{it} = E_{it}\theta_{it}$, where $E_{it}$ is the number of expected cases. That is,

$$Y_{it} \mid \theta_{it} \sim \text{Poisson}(\mu_{it} = E_{it}\theta_{it}), \qquad \log(\mu_{it}) = \log(E_{it}) + \log(\theta_{it}), \tag{8}$$

where $\log(\theta_{it})$ can be specified in different ways to define various models.
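The Python sketch below makes the role of the offset in Equation (8) concrete. It computes the Poisson log-likelihood with $\log(E_{it})$ added to the linear predictor $\log(\theta_{it})$. The data are assumed to be flattened over $(i, t)$, and the values are illustrative only.

```python
import math

def poisson_loglik(y, E, log_theta):
    """Poisson log-likelihood for Equation (8), summed over all (i, t) cells.

    y, E and log_theta are equal-length sequences of observed counts,
    expected counts and linear-predictor values log(theta_it).
    """
    total = 0.0
    for y_it, e_it, eta_it in zip(y, E, log_theta):
        log_mu = math.log(e_it) + eta_it        # log(E_it) enters as an offset
        total += y_it * log_mu - math.exp(log_mu) - math.lgamma(y_it + 1)
    return total

# Illustrative (made-up) data for two areas observed over two periods
y = [3, 0, 7, 5]
E = [2.5, 1.0, 5.0, 4.0]
log_theta = [0.1, -0.2, 0.3, 0.2]
print(poisson_loglik(y, E, log_theta))
```

Because the offset is simply added to the linear predictor, doubling $E_{it}$ with $\log\theta_{it} = 0$ gives the same likelihood as keeping $E_{it}$ and setting $\log\theta_{it} = \log 2$.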

Various spatio-temporal models for disease mapping have been considered in the literature, most of them based on ICAR priors and extending the popular BYM model [

In this section, we consider a Bayesian model with a parametric linear trend for the temporal component, following the model proposed by Bernardinelli [

$$\log(\theta_{it}) = \alpha + u_i + (\beta + \delta_i)\, t, \tag{9}$$

where $\alpha$ is the intercept that quantifies the average outcome rate in the entire study region, $u_i$ is the spatial random effect, $\beta$ is the main linear time trend, which represents the global time effect, and $\delta_i$ is a differential trend that captures the interaction between the linear time trend and the spatial effect $u_i$. In this paper, the LCAR prior proposed by Leroux [
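The following short Python sketch evaluates the linear predictor in Equation (9). It uses hypothetical values for $\alpha$, $\beta$, $u_i$ and $\delta_i$, none of which are estimates from this paper. The output shows that each area follows its own straight line in $\log\theta_{it}$, with slope $\beta + \delta_i$.

```python
import math

def log_theta_linear(alpha, u, beta, delta, T):
    """Equation (9): log(theta_it) = alpha + u_i + (beta + delta_i) * t, t = 1..T.

    Returns one list per area, with one entry per time period.
    """
    return [[alpha + u_i + (beta + d_i) * t for t in range(1, T + 1)]
            for u_i, d_i in zip(u, delta)]

# Hypothetical values for three areas and T = 4 periods (not estimates).
alpha, beta = -0.1, 0.05
u = [0.4, 0.0, -0.4]          # spatial effects
delta = [0.02, 0.0, -0.08]    # area-specific departures from the global trend
rows = log_theta_linear(alpha, u, beta, delta, T=4)
for i, row in enumerate(rows):
    print(f"area {i}: slope {beta + delta[i]:+.2f},",
          "relative risks", [round(math.exp(v), 3) for v in row])
```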

In the model specified above, a linearity assumption is imposed on the differential temporal trend $\delta_i$. However, this assumption may not be realistic in practice. Change points in temporal trends are common, due to improvements in treatment, screening programmes, early detection and intervention, and advances in research generally. Thus, it is necessary to extend Equation (9) by relaxing the linearity constraint and assuming dynamic non-parametric trends. In this paper, various non-parametric models, which also include space-time interactions, are examined. In these models, the LCAR prior distribution is used for the spatial component, unlike the models considered by Knorr-Held [

$$\log(\theta_{it}) = \alpha + u_i + \phi_t + \gamma_t + \delta_{it}, \tag{10}$$

where $\alpha$ and $u_i$ have the same parameterization as in Equation (9). The term $\phi_t$ denotes the temporally unstructured random effect, while $\gamma_t$ represents the temporally structured random effect. Finally, $\delta_{it}$ is the space-time interaction term; additive models are obtained when the interaction term is omitted. All the components in Equation (10) are usually modelled as Gaussian Markov random fields (GMRFs), Rue and Held [
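The RW1 and RW2 priors for $\gamma_t$ are GMRFs with precision $\tau_\gamma R_t$, where $R_t = D'D$ and $D$ is the first- or second-order difference matrix. The Python sketch below builds $R_t$ the same way as Q.gammaRW1 and Q.gammaRW2 in the appendix code, which use R's `diff(diag(T), differences = ...)`. The function names are illustrative. The resulting matrices have rank $T - 1$ (RW1, where constants are in the null space) and $T - 2$ (RW2, where linear trends are in the null space).

```python
def difference_matrix(T, order):
    """Order-th differences of the T x T identity, like diff(diag(T), differences = order) in R."""
    D = [[int(i == j) for j in range(T)] for i in range(T)]
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(T)] for i in range(len(D) - 1)]
    return D

def rw_structure(T, order):
    """Structure matrix R_t = D'D of an RW1 (order = 1) or RW2 (order = 2) prior."""
    D = difference_matrix(T, order)
    return [[sum(d[i] * d[j] for d in D) for j in range(T)] for i in range(T)]

for order in (1, 2):
    print(f"RW{order}, T = 4:")
    for row in rw_structure(4, order):
        print(row)
```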

There are four ways to define the structure matrix, as presented in Knorr-Held [

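| Space-time interaction | $R_\delta$ | Rank of $R_\delta$ (RW1 for $\gamma$) | Rank of $R_\delta$ (RW2 for $\gamma$) |
|---|---|---|---|
| Type I | $I_s \otimes I_t$ | $I \cdot T$ | $I \cdot T$ |
| Type II | $I_s \otimes R_t$ | $I \cdot (T-1)$ | $I \cdot (T-2)$ |
| Type III | $R_s \otimes I_t$ | $(I-1) \cdot T$ | $(I-1) \cdot T$ |
| Type IV | $R_s \otimes R_t$ | $(I-1) \cdot (T-1)$ | $(I-1) \cdot (T-2)$ |

Source: Ugarte et al. (2014). Here $I$ denotes the number of areas and $T$ the number of time periods.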

interpreted as different spatial trends for each year without any temporal structure. The Type IV interaction, which is the most complex of the space-time interactions, assumes that the $\delta_{it}$ are completely dependent over space and time. This type of interaction is appropriate if temporal trends differ from region to region but are more likely to be similar for adjacent regions.
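The ranks in the table can be checked numerically, since $\operatorname{rank}(A \otimes B) = \operatorname{rank}(A)\operatorname{rank}(B)$. The Python sketch below builds the four interaction structure matrices for a hypothetical map of three areas on a line and $T = 4$ periods. It computes their ranks exactly with rational Gaussian elimination. With $I = 3$ and $T = 4$, the table predicts ranks of 12, 9, 8, 6 (RW1) and 12, 6, 8, 4 (RW2) for Types I to IV.

```python
from fractions import Fraction

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def kron(A, B):
    """Kronecker product of two matrices stored as lists of lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def rank(M):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows, r = [[Fraction(x) for x in row] for row in M], 0
    for c in range(len(rows[0])):
        p = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][c] != 0:
                f = rows[k][c] / rows[r][c]
                rows[k] = [x - f * y for x, y in zip(rows[k], rows[r])]
        r += 1
    return r

def rw_structure(T, order):
    """R_t = D'D, with D the order-th difference matrix (RW1: order 1, RW2: order 2)."""
    D = identity(T)
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(T)] for i in range(len(D) - 1)]
    return [[sum(d[i] * d[j] for d in D) for j in range(T)] for i in range(T)]

def icar_structure(neighbours):
    """ICAR structure matrix of Equation (4) from 0-based neighbour lists."""
    n = len(neighbours)
    return [[len(neighbours[i]) if i == j else (-1 if j in neighbours[i] else 0)
             for j in range(n)] for i in range(n)]

n_areas, T = 3, 4
R_s = icar_structure([[1], [0, 2], [1]])   # hypothetical map: 0 - 1 - 2
I_s, I_t = identity(n_areas), identity(T)
ranks = {}
for order, label in ((1, "RW1"), (2, "RW2")):
    R_t = rw_structure(T, order)
    ranks[label] = {"Type I": rank(kron(I_s, I_t)), "Type II": rank(kron(I_s, R_t)),
                    "Type III": rank(kron(R_s, I_t)), "Type IV": rank(kron(R_s, R_t))}
    print(label, ranks[label])
```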

Different combinations of priors for the temporally structured effect (RW1 or RW2) and the type of interaction produce 20 models in addition to models 1, 2a, and 2b discussed in Section 2.1. Models 3a and 3b are the additive models (obtained when the interaction term is dropped) with RW1 and RW2 priors for the temporally structured effect, respectively. Models 4a and 4b are Type I interaction models with RW1 and RW2 priors for the temporally structured effect, respectively. Models 5a and 5b are the same as models 4a and 4b but with a Type II interaction. Models 6a and 6b are Type III interaction models, and models 7a and 7b include a Type IV interaction. In addition, models without the unstructured temporal effect are considered:

- models 8a and 8b are additive models with RW1 and RW2 priors for the temporally structured effect;
- models 9a and 9b are Type I interaction models;
- models 10a and 10b are Type II interaction models;
- models 11a and 11b are Type III interaction models;
- models 12a and 12b are Type IV interaction models.
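The labelling scheme can be generated mechanically. Doing so also confirms the count of 20 non-parametric models, and 23 in total once models 1, 2a and 2b are included. The Python sketch below does this. The tuple layout (has $\phi_t$, interaction type, prior for $\gamma_t$) is only an illustrative encoding.

```python
from itertools import product

interactions = ["additive", "Type I", "Type II", "Type III", "Type IV"]
rw_priors = {"a": "RW1", "b": "RW2"}

# Models 3-7 include the unstructured temporal effect phi_t; models 8-12 drop it.
nonparametric = {}
for (offset, has_phi), (k, inter), (suffix, rw) in product(
        [(3, True), (8, False)], enumerate(interactions), rw_priors.items()):
    nonparametric[f"{offset + k}{suffix}"] = (has_phi, inter, rw)

parametric = ["1", "2a", "2b"]   # parametric trend models of Section 2.1
print(len(nonparametric), len(parametric) + len(nonparametric))
print(nonparametric["10a"])
```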

The Bayesian inference using INLA methodology is implemented in a package called inla, which is a C program [^{rd} June 2014 was used.

Models in INLA are run by specifying the linear predictor of the model as a formula object in R. Fixed effects enter the formula directly, while the function f() is used for random effects and other smooth effects such as non-linear terms. The interface is very flexible and has options that allow different models and priors to be specified easily. Several authors [

Spatial latent effects for lattice data in R-INLA have a multivariate normal prior distribution with zero mean and precision matrix $\tau C$. Here $\tau$ is a precision parameter and $C$ is a square, symmetric structure matrix that controls the spatial dependence; it can take different forms to induce different types of spatial interaction. When $C$ is completely specified, as for the spatio-temporal interaction effect, the "generic0" model is used. It defines a multivariate normal prior distribution with zero mean and precision matrix $\tau C$, where $C$ is normally defined by the user.

For the case of spatially structured random effect, the “besag” and “generic1” models are used to implement the ICAR [

$$Q = I_n - \frac{\beta}{\lambda_{\max}} C, \tag{11}$$

where $C$ is the structure matrix and $\lambda_{\max}$ is the maximum eigenvalue of $C$, which allows the parameter $\beta$ to take values between 0 and 1. Ugarte [
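This explains how "generic1" can encode the Leroux prior, which is what the appendix code does by passing Cmatrix = Q.Leroux $= I - R$. Every row of $R$ sums to zero, so $(I - R)\mathbf{1} = \mathbf{1}$. Because $R$ is positive semidefinite, no eigenvalue of $I - R$ exceeds 1, so $\lambda_{\max} = 1$. Equation (11) then gives $Q = I - \beta(I - R) = (1 - \beta)I + \beta R$, which is Equation (6) with $\rho = \beta$. The Python sketch below checks this identity numerically on a hypothetical four-area map. In R-INLA the precision is additionally scaled by $\tau$.

```python
def icar_structure(neighbours):
    """ICAR structure matrix R of Equation (4) from 0-based neighbour lists."""
    n = len(neighbours)
    return [[len(neighbours[i]) if i == j else (-1 if j in neighbours[i] else 0)
             for j in range(n)] for i in range(n)]

def generic1_precision(C, beta, lambda_max):
    """Equation (11): I - (beta / lambda_max) * C (R-INLA also multiplies by tau)."""
    n = len(C)
    return [[float(i == j) - beta / lambda_max * C[i][j] for j in range(n)]
            for i in range(n)]

nbrs = [[1, 2], [0, 2, 3], [0, 1], [1]]   # hypothetical four-area map
n = len(nbrs)
R = icar_structure(nbrs)
C = [[float(i == j) - R[i][j] for j in range(n)] for i in range(n)]   # C = I - R

# (I - R) 1 = 1 because the rows of R sum to zero, so 1 is an eigenvalue of C;
# R is positive semidefinite, so no eigenvalue of C exceeds 1: lambda_max = 1.
C_times_ones = [sum(row) for row in C]
print(C_times_ones)

beta = 0.7
Q_generic1 = generic1_precision(C, beta, lambda_max=1.0)
Q_leroux = [[beta * R[i][j] + (1.0 - beta) * (i == j) for j in range(n)]
            for i in range(n)]                                        # Equation (6)
max_diff = max(abs(a - b) for ra, rb in zip(Q_generic1, Q_leroux)
               for a, b in zip(ra, rb))
print(max_diff)
```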

In addition to the ICAR specification implemented in the besag model, the bym model can be used to implement the sum of spatially structured and unstructured random effects described in the convolution model [

It should be emphasized that, to ensure the identifiability of the interaction term $\boldsymbol{\delta}$, appropriate sum-to-zero constraints are needed, depending on the type of interaction (see

$$\pi^{*}(\boldsymbol{\delta}) = \pi(\boldsymbol{\delta} \mid A\boldsymbol{\delta} = \mathbf{e}), \tag{12}$$

where $A\boldsymbol{\delta} = \mathbf{e}$ denotes linear constraints on $\boldsymbol{\delta}$, with the rows of $A$ given by the eigenvectors of $R_\delta$ that span its null space. Hence, to ensure the identifiability of $\boldsymbol{\delta}$, the null space of the respective structure matrix $R_\delta$ is computed, and the resulting eigenvectors are used as linear constraints in the estimation of $\boldsymbol{\delta}$. Consequently, the number of linear constraints needed is always equal to the rank deficiency of $R_\delta$ (see
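For the Type II interaction used in model 10a, the null space of $R_\delta$ is spanned by vectors that are constant over time within one area. The constraints are therefore $\sum_t \delta_{it} = 0$ for each area $i$. The Python sketch below uses the same ordering as the appendix code: area index running fastest, with $R_\delta = R_t \otimes I_S$ and $A = \mathbf{1}_T' \otimes I_S$, mirroring R and A.constr. It works through a hypothetical small case with $S = 3$ and $T = 4$. It checks that each constraint lies in the null space and that the number of constraints equals the rank deficiency.

```python
from fractions import Fraction

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def kron(A, B):
    """Kronecker product, same block layout as R's kronecker(A, B)."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def rank(M):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows, r = [[Fraction(x) for x in row] for row in M], 0
    for c in range(len(rows[0])):
        p = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][c] != 0:
                f = rows[k][c] / rows[r][c]
                rows[k] = [x - f * y for x, y in zip(rows[k], rows[r])]
        r += 1
    return r

S, T = 3, 4                                   # hypothetical small example
R_t = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]]   # RW1, T = 4
R_delta = kron(R_t, identity(S))             # Type II, like R in the appendix
A = kron([[1] * T], identity(S))             # like A.constr: sum over t of delta_it = 0

# Each constraint row a satisfies a' R_delta = 0, i.e. lies in the null space.
in_null_space = all(
    sum(a[k] * R_delta[k][j] for k in range(S * T)) == 0
    for a in A for j in range(S * T))
deficiency = S * T - rank(R_delta)
print(in_null_space, len(A), deficiency, rank(A))
```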

In R-INLA, a model is fitted by calling the function inla(), which returns an inla object containing the fitted model. This function:

- allows different likelihood models to be specified (family argument);
- computes marginal densities of the latent effects and, by default, of the hyperparameters;
- lets the user select the integration strategy for the approximations (control.inla argument).

In addition, posterior marginals for the linear predictor can be computed (control.predictor argument). Quantities for model choice and selection, such as the effective number of parameters (pD) and the Deviance Information Criterion (DIC), are also provided (control.compute argument).

The choice of prior distributions is very important in Bayesian inference because it can seriously affect the posterior distributions. Hyperprior distributions are defined in R-INLA with the argument hyper. Here, the hyperprior distributions for the spatial component are $\log\tau_s \sim \text{logGamma}(1, 0.01)$ and $\text{logit}(\lambda_s) \sim \text{logitbeta}(4, 2)$. This informative prior for $\lambda_s$ is used because the data at hand are known to show high spatial correlation. If no information about the amount of spatial correlation is available, a non-informative prior such as logitbeta(1, 1) can be used [
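It helps to see what these hyperpriors imply on the natural scale. A logGamma$(a, b)$ prior on $\log\tau_s$ is a Gamma$(a, b)$ prior (shape $a$, rate $b$) on $\tau_s$. A logitbeta$(a, b)$ prior on $\text{logit}(\lambda_s)$ is a Beta$(a, b)$ prior on $\lambda_s$. The Python sketch below computes the implied prior means and $P(\lambda_s > 0.5)$ under Beta(4, 2). For the tail probability it uses the binomial form of the Beta CDF, which holds for integer parameters.

```python
import math

# log(tau_s) ~ logGamma(1, 0.01)  <=>  tau_s ~ Gamma(shape = 1, rate = 0.01)
shape, rate = 1.0, 0.01
prior_mean_tau = shape / rate                # E[tau_s] = shape / rate

# logit(lambda_s) ~ logitbeta(4, 2)  <=>  lambda_s ~ Beta(4, 2)
a, b = 4, 2
prior_mean_lambda = a / (a + b)

def beta_cdf_int(x, a, b):
    """Beta(a, b) CDF for integer a, b via the binomial identity
    I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j)."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

p_lambda_above_half = 1 - beta_cdf_int(0.5, a, b)
print(prior_mean_tau, round(prior_mean_lambda, 3), p_lambda_above_half)
```

Under Beta(4, 2), the prior mean of $\lambda_s$ is 2/3 and about 81% of the prior mass lies above 0.5, which is why this prior is described as informative. Beta(1, 1) would put exactly half of the mass there.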

In this section, we apply the models discussed in the previous sections to 2013-2016 HIV data collected by the Ministry of Health, Kenya. The data were extracted from the Kenya AIDS Indicator Surveys (KAIS), conducted by the Government of Kenya. The data have been described in Section 1. The main objective of the survey was to collect high-quality data on the prevalence of HIV and sexually transmitted infections (STIs) among adults, and to assess knowledge of HIV and STIs among the population.

All 23 models discussed in Section 2 were fitted to the 2013-2016 HIV data using INLA. An important feature of the INLA technique is that computation time and cost are substantially lower than with McMC methods, so many models can be fitted and compared in a much shorter time. For model selection and comparison, the Deviance Information Criterion (DIC) [
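The model ranking by DIC can be reproduced directly from the reported values. The Python sketch below copies the DIC column of the table in this section (parametric models 1, 2a, 2b and non-parametric models 3a to 12b, with a = RW1 and b = RW2) and sorts the models. It only re-ranks the published numbers and does not refit anything.

```python
# DIC values as reported in the table of this section; a = RW1, b = RW2.
dic = {
    "1": 16586.88, "2a": 16587.01, "2b": 16586.98,
    "3a": 36445.96, "3b": 36446.11, "4a": 2541.65, "4b": 2540.87,
    "5a": 2536.35, "5b": 2541.03, "6a": 2536.91, "6b": 2536.90,
    "7a": 2537.49, "7b": 2539.85, "8a": 36446.13, "8b": 36446.30,
    "9a": 2541.69, "9b": 2540.87, "10a": 2536.30, "10b": 2540.98,
    "11a": 2536.89, "11b": 2536.90, "12a": 2537.45, "12b": 2539.78,
}

ranked = sorted(dic, key=dic.get)        # smallest DIC first
best = ranked[0]
print("best model:", best, "DIC =", dic[best])
for m in ranked[1:6]:
    print(f"  {m}: DIC - DIC(best) = {dic[m] - dic[best]:.2f}")
```

The printed differences show how close the leading models are. Model 10a has the smallest DIC, but models 5a, 11a, 6a, 6b and 11b are all within one unit of it.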

For the best model (model 10a), the estimated logarithm of the relative risks consists of four components:

- a global risk ($\hat\alpha$), the overall risk common to all areas;
- the spatial location risk ($\hat u$), which can arise from factors associated with a specific area;
- a temporal risk trend common to all regions ($\hat\gamma$), which can arise from changes in disease coding, diagnostics, or policies affecting the whole country;
- a region-specific temporal risk trend ($\hat\delta$), attributed to specific effects of each county.

Parametric models

| Model | $\bar D$ | pD | DIC |
|---|---|---|---|
| Model 1 | 16,469.70 | 117.18 | 16,586.88 |
| Model 2a | 16,469.82 | 117.19 | 16,587.01 |
| Model 2b | 16,469.79 | 117.19 | 16,586.98 |

Non-parametric models with $\log\theta_{it} = \alpha + u_i + \phi_t + \gamma_t + \delta_{it}$

| Model | Space-time interaction | $\bar D$ (a: RW1) | pD (a: RW1) | DIC (a: RW1) | $\bar D$ (b: RW2) | pD (b: RW2) | DIC (b: RW2) |
|---|---|---|---|---|---|---|---|
| Model 3 | Additive model | 36,361.70 | 84.26 | 36,445.96 | 36,361.85 | 84.26 | 36,446.11 |
| Model 4 | Type I | 2354.81 | 186.84 | 2541.65 | 2354.02 | 186.85 | 2540.87 |
| Model 5 | Type II | 2349.25 | 187.10 | 2536.35 | 2355.64 | 185.39 | 2541.03 |
| Model 6 | Type III | 2349.29 | 187.62 | 2536.91 | 2349.28 | 187.62 | 2536.90 |
| Model 7 | Type IV | 2350.34 | 187.15 | 2537.49 | 2353.30 | 186.55 | 2539.85 |

Non-parametric models with $\log\theta_{it} = \alpha + u_i + \gamma_t + \delta_{it}$

| Model | Space-time interaction | $\bar D$ (a: RW1) | pD (a: RW1) | DIC (a: RW1) | $\bar D$ (b: RW2) | pD (b: RW2) | DIC (b: RW2) |
|---|---|---|---|---|---|---|---|
| Model 8 | Additive model | 36,361.87 | 84.26 | 36,446.13 | 36,362.04 | 84.26 | 36,446.30 |
| Model 9 | Type I | 2354.84 | 186.85 | 2541.69 | 2354.00 | 186.87 | 2540.87 |
| Model 10 | Type II | 2349.20 | 186.85 | 2536.30 | 2355.59 | 185.39 | 2540.98 |
| Model 11 | Type III | 2349.27 | 187.62 | 2536.89 | 2349.28 | 187.62 | 2536.90 |
| Model 12 | Type IV | 2350.30 | 187.15 | 2537.45 | 2353.23 | 186.55 | 2539.78 |

It is clear from this figure that counties in the Western region of Kenya have a higher risk of HIV infection than the other counties. In particular, Homa Bay, Siaya, Migori and Kisumu counties show high relative risks. Finally,

The specific temporal trends (in log scale) for four selected counties are shown in

In Bayesian modeling and inference, the popular McMC methods pose several challenges. One is that McMC relies on posterior sampling, which requires evaluating the convergence of the posterior samples. This usually requires extensive simulation that can be costly and time consuming. Frequently used software for implementing McMC includes WinBUGS, OpenBUGS, selected R packages such as MCMCpack, and SAS procedures. WinBUGS has gained a lot of popularity in recent years and has been used to run most of the McMC analyses.

INLA, which has been proposed as an alternative to the computationally burdensome McMC, is implemented in an R package (R-INLA) and performs Bayesian modeling without posterior sampling. Unlike McMC algorithms, which rely on Monte Carlo integration, R-INLA performs Bayesian analyses using numerical integration. This takes much less time because it does not require extensive iterative computation. However, there have been few attempts to compare the performance of these software packages, particularly for spatio-temporal models in disease mapping. In this section, we compare the McMC and INLA techniques using the best-fitting model (model 10a) from the analysis of the Kenya HIV data in Section 4.

| Parameter | McMC | INLA |
|---|---|---|
| $\alpha$ | −0.031 (0.007) | −0.366 (0.001) |
| $\sigma_u^2$ | 0.892 (0.223) | 1.712 (0.421) |
| $\lambda_u$ | 0.722 (0.123) | 0.555 (0.178) |
| $\sigma_\gamma^2$ | 0.018 (0.016) | 0.008 (0.003) |
| $\sigma_\delta^2$ | 0.045 (0.005) | 0.042 (0.005) |

INLA also has some shortcomings. One is that hyperparameters cannot be handled as flexibly as in WinBUGS: it is difficult to implement prior distributions for the standard deviations in INLA, whereas this can be done easily in WinBUGS. In some situations, placing prior distributions on the standard deviations, rather than fixing them or placing priors on the precisions, can lead to better model fits. Additionally, there is no easy way to place hyperprior distributions on the precisions of the fixed effects.

INLA offers several options for improving the models. First, we explored the full Laplace approximation strategy in INLA. It did not change the parameter estimates, and computation took longer than with the simplified Laplace approximation. It did, however, produce different goodness-of-fit measures that were closer to those from WinBUGS. Furthermore, the simplified Laplace strategy is not sufficient for computing predictive measures [

Spatial and spatio-temporal models are usually formulated in a hierarchical Bayesian framework and typically rely on generalized linear mixed models (GLMMs). Model fitting and statistical inference are commonly accomplished through the empirical Bayes (EB) and fully Bayes (FB) approaches. The EB approach usually relies on penalized quasi-likelihood (PQL), while the FB approach usually uses Markov chain Monte Carlo (McMC) techniques. Spatio-temporal models used in disease mapping are often very complex, and McMC methods may lead to large Monte Carlo errors and long computation times if the dimension of the data at hand is large. To address these challenges, a new strategy based on integrated nested Laplace approximations (INLA) has recently been proposed as a promising alternative to McMC. In this paper, we have shown that INLA can fit fairly complex space-time models much more quickly than McMC algorithms. INLA has the further advantage that it can easily be used in the free software R through the package R-INLA. The INLA methodology also provides several quantities for Bayesian model choice and selection, such as the effective number of parameters (pD) and the Deviance Information Criterion (DIC). The main disadvantage of INLA is that hyperparameters cannot be handled as flexibly as in WinBUGS. It is difficult to implement prior distributions for the standard deviations in INLA, while this can be done easily in WinBUGS. In some situations, placing prior distributions on the standard deviations, rather than fixing them or placing priors on the precisions, can lead to better model fits. Furthermore, there is no easy way to place hyperprior distributions on the precisions of the fixed effects.

Most of the work in spatial and spatio-temporal disease mapping with McMC and INLA uses the intrinsic conditional autoregressive (ICAR) prior for the spatially structured variability. However, the ICAR prior is improper and has the undesirable large-scale property of producing negative pairwise correlations between regions located further apart. Moreover, the variance components in the BYM convolution model are not identifiable from the data, and informative hyperpriors are needed for posterior inference. In this paper, we consider the LCAR prior as an alternative to the ICAR prior. The LCAR prior does not produce such negative correlations and includes a parameter that quantifies spatial dependence as well as unstructured heterogeneity. We compared INLA and McMC using the LCAR prior for the spatial random effects. WinBUGS is a popular tool for FB disease mapping, while INLA was introduced recently and is now gaining popularity. Both techniques produce similar parameter estimates, except for the smoothing parameter, for which McMC gives a higher estimate than INLA. To improve the models in INLA, we explored the full Laplace approximation strategy. It did not change the parameter estimates, and computation took longer than with the simplified Laplace approximation. It did, however, produce different goodness-of-fit measures that were closer to those from WinBUGS.

Finally, our analysis of the Kenya HIV incidence data for the period 2013-2016 shows that the incidence rate is still high, and that counties in the Western region show a significantly higher risk than the other counties. In particular, Homa Bay, Siaya, Migori and Kisumu counties show the highest risks. The reasons why these counties show high HIV incidence risks are still under investigation, and further research is needed.

The authors declare no conflicts of interest regarding the publication of this paper.

Tonui, B., Mwalili, S. and Wanjoya, A. (2018) Spatio-Temporal Variation of HIV Infection in Kenya. Open Journal of Statistics, 8, 811-830. https://doi.org/10.4236/ojs.2018.85053

R-INLA code for model 10a

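```r
# Model 10a: Leroux CAR spatial effect, RW1 prior for time and Type II interaction
# (RW2 structure matrices are also built below; they are used by the "b" models).
# Assumes `kenya` is a SpatialPolygons* object for the 47 counties and `Data` is a
# data frame with columns y, E, ID.area, ID.year and ID.area.year, where
# ID.area.year = (ID.year - 1) * S + ID.area (area index running fastest).
library(INLA)
library(spdep)

S <- 47  # number of counties
T <- 4   # number of years (2013-2016)

# Spatial graph: counties sharing a border are neighbours
temp <- poly2nb(kenya)
nb2INLA("kenya.graph", temp)
H <- inla.read.graph(filename = "kenya.graph")

# Temporal structure matrices R_t = D'D for RW1 and RW2 priors
D1 <- diff(diag(T), differences = 1)
Q.gammaRW1 <- t(D1) %*% D1
D2 <- diff(diag(T), differences = 2)
Q.gammaRW2 <- t(D2) %*% D2

# ICAR structure matrix R of Equation (4)
Q.xi <- matrix(0, H$n, H$n)
for (i in 1:H$n) {
  Q.xi[i, i] <- H$nnbs[[i]]
  Q.xi[i, H$nbs[[i]]] <- -1
}

# C = I - R for "generic1": its largest eigenvalue is 1, giving the Leroux CAR prior
Q.Leroux <- diag(S) - Q.xi

# Type II interaction: RW1 structure in time, independent across counties
R <- kronecker(Q.gammaRW1, diag(S))
r.def <- S                                        # rank deficiency of R
A.constr <- kronecker(matrix(1, 1, T), diag(S))   # sum over years is zero for each county

formula <- y ~ f(ID.area, model = "generic1", Cmatrix = Q.Leroux, constr = TRUE,
                 hyper = list(prec = list(prior = "loggamma", param = c(1, 0.01)),
                              beta = list(prior = "logitbeta", param = c(4, 2)))) +
  f(ID.year, model = "rw1", constr = TRUE,
    hyper = list(prec = list(prior = "loggamma", param = c(1, 0.00005)))) +
  f(ID.area.year, model = "generic0", Cmatrix = R, rankdef = r.def, constr = TRUE,
    hyper = list(prec = list(prior = "loggamma", param = c(1, 0.00005))),
    extraconstr = list(A = A.constr, e = rep(0, S)))

model10a <- inla(formula, family = "poisson", data = Data, E = E,
                 control.predictor = list(compute = TRUE, cdf = c(log(1))),
                 control.compute = list(dic = TRUE),
                 control.inla = list(strategy = "simplified.laplace"))
```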