**Open Journal of Statistics**

Vol.08 No.05(2018), Article ID:87540,20 pages

10.4236/ojs.2018.85053

Spatio-Temporal Variation of HIV Infection in Kenya

Benard Tonui^{1*}, Samuel Mwalili^{2}, Anthony Wanjoya^{2 }

^{1}Department of Mathematics & Computer Science, University of Kabianga, Kericho, Kenya

^{2}Department of Statistics & Actuarial Science, Jomo Kenyatta University of Agriculture & Technology, Juja, Kenya

Copyright © 2018 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: May 21, 2018; Accepted: September 24, 2018; Published: September 27, 2018

ABSTRACT

Disease mapping is the study of the distribution of disease relative risks or rates in space and time, and normally uses generalized linear mixed models (GLMMs) which includes fixed effects and spatial, temporal, and spatio-temporal random effects. Model fitting and statistical inference are commonly accomplished through the empirical Bayes (EB) and fully Bayes (FB) approaches. The EB approach usually relies on the penalized quasi-likelihood (PQL), while the FB approach, which has increasingly become more popular in the recent past, usually uses Markov chain Monte Carlo (McMC) techniques. However, there are many challenges in conventional use of posterior sampling via McMC for inference. This includes the need to evaluate convergence of posterior samples, which often requires extensive simulation and can be very time consuming. Spatio-temporal models used in disease mapping are often very complex and McMC methods may lead to large Monte Carlo errors if the dimension of the data at hand is large. To address these challenges, a new strategy based on integrated nested Laplace approximations (INLA) has recently been recently developed as a promising alternative to the McMC. This technique is now becoming more popular in disease mapping because of its ability to fit fairly complex space-time models much more quickly than the McMC. In this paper, we show how to fit different spatio-temporal models for disease mapping with INLA using the Leroux CAR prior for the spatial component, and we compare it with McMC using Kenya HIV incidence data during the period 2013-2016.

**Keywords:**

HIV, INLA, McMC, Leroux CAR Prior, Disease Mapping, Spatio-Temporal Models

1. Introduction

Statistical methods for disease mapping have grown very fast in the last decade. Modern registers provide a lot of information with high quality data recorded for different regions over a period of time (e.g. years). This has brought in new challenges and goals which also require new and more flexible statistical models, faster and less computationally demanding methods for model fitting, and advance softwares to implement them. Spatio-temporal disease mapping models are widely used to describe the temporal variation and geographical patterns of mortality risks or rates. The information obtained from these analyses is useful for health researchers and policy makers since it helps in formulating hypothesis about the aetiology of a disease, looking for risk factors and also allocation of resources efficiently in hot spot areas, or planning prevention and intervention measures.

Spatio-temporal models are mainly used in disease mapping studies because they make it possible to borrow strength from spatial and temporal neighbours to reduce the high variability that is common to classical risk estimators, such as the standardized mortality ratio (SMR) when studying, in particular, rare diseases or low populated areas. These models are usually formulated in a hierarchical Bayesian framework and typically rely on generalized linear mixed models (GLMM). Model fitting and statistical inference are commonly accomplished through the empirical Bayes (EB) and fully Bayes (FB) approaches. The EB approach usually relies on the penalized quasi-likelihood (PQL) [1], while the FB approach usually uses Markov chain Monte Carlo (McMC) techniques [2]. Both approaches have been used in the literature and both have advantages and disadvantages [3], but the FB approach has experienced an enormous expansion due to the advent of modern computers and free software to run McMC algorithms such as WinBUGS [4].

The FB approach provides posterior marginal distributions of the target parameters and consequently it provides a whole picture about the target parameters instead of a single point estimate. However, there are many challenges associated with this approach. The posterior sampling distributions are not readily available in a closed form and hence inference is usually achieved via McMC algorithms. This includes the need to evaluate convergence of posterior samples, which often requires extensive simulation and can be very time consuming. Spatio-temporal models used in disease mapping are often very complex and McMC methods may lead to large Monte Carlo errors and large computation time if the dimension of the data at hand is large [5]. Moreover, the available software do not implement easily specific algorithms that are often needed [6]. Hence, there is a need to strike a balance between the exact inference, model complexity and computing time. This is also an issue in spatio-temporal disease mapping where the data at hand are usually large and the models are complex. Additionally, there is also a challenge in choosing priors for the hyper parameters in order to obtain reliable inference [7][8].

To address these challenges, a new strategy based on integrated nested Laplace approximations (INLA) has recently been recently proposed [9]as a promising alternative to the McMC. This technique is now gaining popularity in disease mapping as compared to the McMC. This is because of its ability to fit very complex space-time models much more quickly than the McMC. Many latent Gaussian models, which comprises the models described in this paper, have conditional independence properties that lead to sparse precision matrices. This is an advantage in INLA since it helps in speeding up the computation thus providing Bayesian inference without running long and complex McMC algorithms. INLA also has an additional attractive feature since it can be easily used in the free software [10], with the package R-INLA [11].

There is an extensive literature in Bayesian spatio-temporal disease mapping. For parametric models, see for example [12][13][14]and Knorr-Held and Besag [15]for non-parametric time trends models. A major contribution to spatio-temporal disease mapping is a research paper byKnorr-Held [16], which describes four different types of space-time interactions. Most of the previous work in disease mapping is based on the popular conditional autoregressive priors (CAR) to model both the spatial and temporal effects extending the initial work of Besag, York, J. and Mollie [17]. However, there are other approaches based on splines that have been developed. For example, within an EB frame work, MacNab and Dean [18]proposed autoregressive local smoothing in space and B-spline smoothing for time. On the other hand, Ugarte, Goicoa, and Militino [19]and Ugarte, Goicoa, and Etxeberria [20]used a pure interaction P-spline model for space and time, while Ugarte, Goicoa, and Etxeberria [21]consider an ANOVA type P-spline model to study spatio-temporal patterns of prostate cancer mortality in Spain. From a FB frame work, see MarNab [22]and MacNab, and Gustafson [23]for the application of spline smoothing in disease mapping.

In this paper, our focus is to implement spatio-temporal disease mapping models using the INLA methodology. Most of the research in spatial and spatio-temporal disease mapping with INLA considers the Besag et al. [17]model (hereafter BYM model) which includes two spatial effects: one assuming a Gaussian exchangeable prior to model unstructured heterogeneity and another one assuming an intrinsic conditional autoregressive (ICAR) prior for the spatially structured variability [5][24][25][26]. However, the ICAR prior is improper and has the undesirable largescale property of leading to a negative pairwise correlation for regions located further apart [27][28]. In addition, the variance components in the BYM convolution model are not identifiable from the data [29][30]and informative hyper priors are needed for posterior inference. In this paper, we consider the prior proposed by Leroux, Lei, and Breslow [31], LCAR hereafter in this paper. This prior has been proved to be better than the ICAR prior [32]and can be easily implemented using the R-INLA package. This model has previously been used to construct a local adaptive algorithm for spatial smoothing [33].

The rest of this paper is organized as follows. In Section 2, a review of spatial model is given and different spatio-temporal models that will be fitted with INLA are described. A review of the R-INLA package and prior distributions to be used is presented in Section 3. In Section 4, the models discussed are used to analyze Kenya HIV incidence data for the years 2013-2016. In Section 5, we compare the INLA and McMC techniques and finally conclusion is given in Section 6.

2. Spatio-Temporal Models for Disease Mapping

Consider a large area, say a country, divided into small areas (let us say provinces or counties) that will be labelled by $i=1,2,\cdots ,n$ , and let ${Y}_{i}$ denote the number of incident cases (or deaths) in the ith small area. Then conditional on the relative risk ${\theta}_{i}$ , ${Y}_{i}$ is assumed to follow a Poisson distribution with mean ${\mu}_{i}={E}_{i}{\theta}_{i}$ , where ${E}_{i}$ is the number of expected cases. That is

${Y}_{i}|{\theta}_{i}~Poisson\left({\mu}_{i}={E}_{i}{\theta}_{i}\right)\mathrm{;}$ (1)

$\mathrm{log}\left({\mu}_{i}\right)=\mathrm{log}\left({E}_{i}\right)+\mathrm{log}(\theta i)$

here $\mathrm{log}\left({E}_{i}\right)$ is the offset and $\mathrm{log}\left({\theta}_{i}\right)$ is modeled as

$\mathrm{log}\left({\theta}_{i}\right)=\alpha +{u}_{i}$ (2)

where $\alpha $ is the global risk and ${u}_{i}$ is the spatially structured random effect. Very often, an intrinsic conditional autoregressive (ICAR) prior is used to modeled the vector of spatially structured random effects $u={\left({u}_{1},\cdots ,{u}_{n}\right)}^{\prime}$ . That is,

$u~N\left(0\mathrm{,}{\sigma}^{2}{R}^{-}\right)$ (3)

where ${}^{-}$ denotes the Moore-Penrose inverse of a matrix, ${\sigma}^{2}$ is the variance component and $R$ is the $n\times n$ spatial neighbourhood matrix with ij th element defined as:

${R}_{ij}=\{\begin{array}{l}{n}_{i}\text{,}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}i=j\\ -\mathrm{1,}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}j~i\\ \mathrm{0,}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{otherwise}\end{array}$ (4)

where ${n}_{i}$ represents the number of neighbours of area i and $i~j$ indicates that areas i and j are neighbours. Typically, two areas are neighbours if they share a common border.

The full conditional distributions of ${u}_{i}$ given all the other remaining components ${u}_{-i}=\left({u}_{1}\mathrm{,}\cdots \mathrm{,}{u}_{i-1}\mathrm{,}{u}_{i+1}\mathrm{,}\cdots \mathrm{,}{u}_{n}\right)$ can be expressed as follows:

${u}_{i}|{u}_{-i}~\text{Normal}\left(\frac{1}{{n}_{i}}{\displaystyle \sum _{j~i}^{n}}\text{\hspace{0.05em}}{u}_{j}\mathrm{,}\frac{{\sigma}^{2}}{{n}_{i}}\right)$ (5)

However, this model has been criticized since the spatial and non-spatial effects are not identifiable, as noticed by Eberly and Carlin [34]. To fix these identifiability problems, Leroux [31]considered the following LCAR prior that takes into account spatially structured and unstructured variability:

$u~N\left(0\mathrm{,}{\sigma}^{2}{Q}^{-1}\right)\mathrm{,}Q=\left[\rho R+\left(1-\rho \right)I\right]$ (6)

where $\rho \in \left[\mathrm{0,1}\right]$ is a spatial smoothing parameter and $I$ is a $n\times n$ identity matrix. When $\rho =0$ , the LCAR prior reduces to an exchangeable (independent) prior $u~N\left(0\mathrm{,}{\sigma}^{2}I\right)$ , and when $\rho =1$ , it reduces to the ICAR model $u~N\left(0\mathrm{,}{\sigma}^{2}{R}^{-}\right)$ [27][35].

The univariate full conditional distribution of ${u}_{i}$ can be expressed as:

${u}_{i}|{u}_{j\ne i}~\text{Normal}\left(\frac{\rho}{\left(1-\rho \right)+{n}_{i}\rho}{\displaystyle \sum _{j~i}^{n}}\text{\hspace{0.05em}}{u}_{j}\mathrm{,}\frac{{\sigma}_{u}^{2}}{\left(1-\rho \right)+{n}_{i}\rho}\right)$ (7)

Suppose now that for every small area i, data has been recorded for different time periods denoted by $t=1,\cdots ,T$ . Then, conditional on the relative risk ${\theta}_{it}$ , ${Y}_{it}$ which is the count of events in region i at time t is assumed to be Poisson distributed with mean ${\mu}_{it}={E}_{it}{\theta}_{it}$ , where ${E}_{it}$ is the number of expected cases. That is;

${Y}_{it}|{\theta}_{it}~Poisson\left({\mu}_{it}\mathrm{=}{E}_{it}{\theta}_{it}\right)\mathrm{,}log\left({\mu}_{it}\right)=log\left({E}_{it}\right)+log\left({\theta}_{it}\right)$ (8)

here, $log\left({\theta}_{it}\right)$ can be specified in different ways to define various models.

Various spatio-temporal models for disease mapping have been considered in the literature, with most of them based on the popular ICAR models extending the popular BYM model [17]. In this section, we discuss three models with parametric time trends and a set of non-parametric models that include different types of space-time interactions [16]. The INLA methodology will be used to fit these models.

2.1. Linear Time Trend Models

In this section, we consider a Bayesian model with a parametric linear trend for the temporal component which is with the model proposed by Bernardinelli [12]. This model is just an extension of the BYM spatial model but with an additional linear time trend and a differential time trend for each small area. The logarithm of the relative risks are modelled as follows;

$log\left({\theta}_{it}\right)=\alpha +{u}_{i}+\left(\beta +{\delta}_{i}\right)\cdot t$ (9)

where $\alpha $ is the intercept that quantifies the average outcome rate in the entire study region, ${u}_{i}$ is the spatial random effect, $\beta $ is the main linear time trend which represents the global time effect, and ${\delta}_{i}$ is a differential trend which captures the interaction between the linear time trend and the spatial effect ${u}_{i}$ . In this paper, the LCAR prior proposed by Leroux [31]is considered for the spatial effects ${u}_{i}$ and three different priors for the differential trend ${\delta}_{i}$ are explored. The first model is the one which assumes an exchangeable distribution and is denoted as Model 1. The second model considers an ICAR prior and the is denoted as Model 2a. The third one considers the LCAR prior for the area specific slopes and the model is denoted as 2b.

2.2. Nonparametric Dynamic Time Trend Models

In the model specified above, a linearity assumption imposed on the differential temporal trend ${\delta}_{i}$ . However, this assumption may not be realistic under practical situations, where it is common to observe change points in temporal trends due to improvement in treatments, screening programmes and early detection and intervention, and generally advances in research. Thus, it is necessary to extend Equation (9) by releasing out the linearity constraint and assuming a dynamic non-parametric trends. In this paper, various non-parametric models which also includes space-time interactions are examined. In these models, the LCAR prior distribution is used for the spatial component unlike the models considered by Knorr-Held [16]in which an ICAR prior distribution is used for the spatial component. Here, the logarithm of the relative risk is model as;

$log\left({\theta}_{it}\right)=\alpha +{u}_{i}+{\varphi}_{t}+{\gamma}_{t}+{\delta}_{it}$ (10)

here $\alpha $ and ${u}_{i}$ have the same parameterization as in Equation (9). The term ${\varphi}_{t}$ denote the temporally unstructured random effects and while the term ${\gamma}_{t}$ represent temporally structured random effects. Finally, ${\delta}_{it}$ is the space-time interaction term. Note that additive models are obtained if the interaction terms are not there. All the components in the model 10 are usually modelled as Gaussian Markov random fields (GMRF), Rueand Held [36]and prior distributions can be specified according to some structure matrices. Here, the spatial random effects ${u}_{i}$ is assumed to have LCAR prior distribution proposed by Leroux [31]. The temporally unstructured random effects ${{\varphi}^{\prime}}_{t}\text{s}$ are modeled as Gaussian exchangeable prior with mean 0 and variance ${\sigma}_{\varphi}^{2}$ . That is, $\varphi ~N\left(0\mathrm{,}{\sigma}_{\varphi}^{2}{I}_{t}\right)$ where $\varphi ={\left({\varphi}_{1}\mathrm{,}\cdots \mathrm{,}{\varphi}_{T}\right)}^{\prime}$ and ${I}_{t}$ is the $T\times T$ identity matrix. The temporally structured random effect $\gamma ={\left({\gamma}_{1},\cdots ,{\gamma}_{T}\right)}^{\prime}$ is modeled dynamically, for example, using a random walk of order 1(RW1) or order 2 (RW2). That is, ${\gamma}_{t}|{\gamma}_{t-1}~N\left({\gamma}_{t-1}\mathrm{,}{\sigma}^{2}\right)$ for RW1 and ${\gamma}_{t}|{\gamma}_{t-1}\mathrm{,}{\gamma}_{t-2}~N\left(2{\gamma}_{t-1}+{\gamma}_{t-2}\mathrm{,}{\sigma}^{2}\right)$ for RW2. The interaction terms $\delta ={\left({\delta}_{11},\cdots ,{\delta}_{nT}\right)}^{\prime}$ are assumed to be follow a Gaussian distribution with a precision matrix given as ${\sigma}_{\delta}^{2}{R}_{\delta}$ , where ${\sigma}_{\delta}^{2}$ is the variance parameter and ${R}_{\delta}$ is the structure matrix given by the Kronecker product of the corresponding structure matrices which identify the type of the temporal and/or spatial main effects which interact [37].

There are four ways to define the structure matrix, as presented in Knorr-Held [16]and reported in Table 1, reproduced from Ugarte, Adin, Goicoa, and Militino [38]. For the Type I interactions, all ${{\delta}^{\prime}}_{it}\text{s}$ are a priori independent. Therefore, it is assumed that there is no spatial and/or temporal structure on the interaction either and therefore ${\delta}_{it}~N\left(\mathrm{0,}1/{\tau}_{\delta}\right)$ . In Type II interactions, each ${\delta}_{i.},i=1,\cdots ,n$ follows a random walk (RW1 or RW2), independently of all other areas. Type II interactions will be suitable if temporal trends are different region to region, but do not have any structure in space. In Type III interactions, the parameters of the tth time point $\left\{{\delta}_{.1}\mathrm{,}\cdots \mathrm{,}{\delta}_{\mathrm{.}T}\right\}$ have a spatial structure independent from the other time points. Hence each ${\delta}_{.t},t=1,\cdots ,T$ follows an independent intrinsic autoregression. Type III interactions can be

Table 1. Specification and rank deficiency for different space-time interactions.

Source: Ugarte et al. (2014).

interpreted as different spatial trends for each year without any temporal structure. Type IV interaction, which is the most complex among the space-time interactions, assumes that ${{\delta}^{\prime}}_{it}\text{s}$ are completely dependent over space and time. This type of interaction will be appropriate if temporal trends are different from region to region, but are more likely to be similar for adjacent regions. Table 1 gives a summary of the structure matrices for the different type of space-time interactions and the rank deficiencies. To ensure identifiability of the interaction term $\delta $ in case of rank deficiency, specific sum-to-zero constraints have to be used. If these constraints are not included then the interaction terms are confounded with the main time effect $\gamma $ . It is only the Type I interaction which does not need additional constraints as this prior does not induce a rank deficiency, see Table 1.

Different combinations of priors for the temporally structured effect (RW1 or RW2) and the type of interaction produce 20 additional models to models 1, 2a, and 2b discussed in Section 2.1. Models 3a and 3b are the additive models (obtained when the interaction term is dropped) with RW1 and RW2 for the temporally structured effect, respectively. Models 4a and 4b are Type I interaction models with RW1 and RW2 for the temporally structured effect, respectively. Models 5a and 5b are the same as models 4a and 4b but with a Type II interaction. Models 6a and 6b are Type III interaction models, and Models 7a and 7b include a Type IV interaction. In addition, models without the unstructured temporal effect are considered. Models 8a and 8b are additive models with RW1 and RW2 priors for the temporally structured effect. Models 9a and 9b are Type I interaction models, Models 10a and 10b are Type II interaction, Models 11a and 11b include a Type III interaction models and Models 12a and 12b are the Type IV interaction models.

3. Bayesian Inference Using Integrated Nested Laplace Approximations (INLA)

The Bayesian inference using INLA methodology is implemented in a package called inla, which is a C program [11]. This program is based on the GRMFLib-library, which incorporates efficient algorithms for sparse matrices [36]. Here, the computations are speeded up by the implementation of parallel computing elements. An R-interface called R-INLA is available to ease the usage of the inla program. The inla program is has been incorporated within the R library [10]. The software is available for free download at http://www.r-inla.org and run in a Linux, MAC and Windows environment. For the analyses in this paper, the INLA library built on the 3^{rd} June 2014 was used.

The models in INLA can be ran by specifying the linear predictor of the model as a formula object in R using the function f() for the smooth effects such as fixed effects, non linear terms and random effects. The interface is very flexible and it has options that allows different models and priors to be specified easily. Several authors [39][40][41][42]summarize the different spatial models available in R-INLA as latent effects that can be used to build models. In this section, only an overview of the spatial models that will be used to fit the models considered in this paper will be provided.

Spatial latent effects for the lattice data in R-INLA consist of a prior distribution which follow a multivariate normal distribution with zero mean and precision matrix $\tau C$ , where $\tau $ is a precision parameter and $C$ is a square and symmetric structure matrix which controls how the spatial dependence is and it can assume different forms to induce different types of spatial interaction. When $C$ is completely specified, like in the case of spatio-temporal interaction effect, the “generic0” model is implemented and it defines a multivariate normal prior distribution with zero mean and generic precision matrix $C$ which is normally defined by the user.

For the case of spatially structured random effect, the “besag” and “generic1” models are used to implement the ICAR [17]and LCAR [31]prior distributions respectively. The besag model for the ICAR prior corresponds to a multivariate normal with zero mean and precision matrix $\tau R$ , with ${R}_{ij}$ equal to ${n}_{i}$ if $i=j$ , −1 if $i~j$ and 0 otherwise, where ${n}_{i}$ is the number of neighbours of county i and and $i~j$ indicates that counties i and j are neighbours. On the other hand, the LCAR prior, which forms the basis of the space-time disease mapping models discussed in this paper, can not be obtained directly in R-INLA, but the generic1 model can be used to introduce it easily. This model implements a multivariate normal distribution with zero mean and precision matrix $\tau Q$ , with

$Q=\left({I}_{n}-\frac{\beta}{{\lambda}_{max}}C\right)$ (11)

where $C$ is the structure matrix and ${\lambda}_{max}$ is the maximum eigenvalue of matrix $C$ which allows the parameter $\beta $ take values between 0 and 1. Ugarte [38]show that when $C=I-R$ then ${\lambda}_{\mathrm{max}}=1$ . Hence, the LCAR model proposed by Leroux et al. (1999) can be easily implemented in the R-INLA with a generic1 model by taking $C=I-R$ , so that $Q=\left(1-\beta \right)I+\beta R$ with $\beta \in \left(\mathrm{0,1}\right)$ .

In addition to the ICAR specification implemented in the besag model, bym model can be used to implement the sum of spatially structured and unstructured random effects described in the convolution model [17]. Similarly, for the spatially structured temporal random effects, the first and second order random walk priors are implemented using “rw1” and “rw2” models respectively. Finally, the identically independent random effects can be implemented using the “iid” model. In all these models, only the priors corresponding to the precision parameters (the inverse of the variance components) should be specified. In this paper, the following precision parameters are considered: ${\tau}_{u}=1/{\sigma}_{u}^{2}$ for the spatially structured random effect; ${\tau}_{\varphi}=1/{\sigma}_{\varphi}^{2}$ for the temporally unstructured random effect; ${\tau}_{\gamma}=1/{\sigma}_{\gamma}^{2}$ for the temporally structured random effect and ${\tau}_{\delta}=1/{\sigma}_{\delta}^{2}$ for the space-time interaction term.

To ensure the identifiability of the interaction term $\delta $ , it should be emphasized here that sum-to-zero constraints should be used depending on the type of interaction (see Table 1). The vector $\delta $ follows an IGMRF which is improper, i.e. its precision matrix or equivalently its structure matrix ${R}_{\delta}$ is not of full rank. Its improper distribution denoted by ${\pi}^{\ast}\left(\delta \right)$ is written as

${\pi}^{\ast}\left(\delta \right)=\pi \left(\delta |A\delta =e\right)$ (12)

where $A\delta =e$ denotes linear constraints on $\delta $ with $A$ given by those eigenvectors of ${R}_{\delta}$ which span the null space. Hence, to ensure the identifiability of $\delta $ , the null space of the respective structure matrix ${R}_{\delta}$ is computed using the obtained eigenvectors as linear constraints for the estimation of $\delta $ . Consequently, the number of linear constraints which are necessary is always equal to the rank deficiency of ${R}_{\delta}$ (see Table 1) and $e$ will be a vector of zeros.

In R-INLA, the model is normally fitted with a call to function inla(), which returns an inla object with the fitted model. This function provides for specification of different likelihood models (family object), computes marginal densities of the latent effects and, by default, the hyperparameters and also enables one to select the integration strategy for the approximations (control.inla object). In addition to the posterior marginal densities, it is possible to compute posterior marginals for the linear predictor (control.predictor object). Several quantities for model choice and selection such as the effective number of parameters (pD) and the Deviance Information Criterion (DIC) are also provided within INLA (control.compute object).

4. Prior Distributions

The choice of prior distributions is very important in Bayesian inference because it can seriously affect the posterior distributions. The hyperprior distributions are defined in R-INLA with the argument hyper. Here, the hyperprior distributions for the spatial components are $log{\tau}_{s}~\mathrm{log}\text{Gamma}\left(1,0.01\right)$ and $\text{logit}\left({\lambda}_{s}\right)~\text{logitbeta}\left(\text{4},\text{2}\right)$ . This informative prior for ${\lambda}_{s}$ is used since the data at hand are known to show high spatial correlation. If no information about the amount of spatial correlation is available, a non informative prior such as a logitbeta (1, 1) can be used [38]. For the temporally unstructured component $\phi $ , a $log{\tau}_{\varphi}~\mathrm{log}\text{Gamma}\left(1,0.01\right)$ hyperprior is chosen [25]. For the temporally structured component $\gamma $ (RW1 or RW2) and the interaction term $\delta $ , minimally informative priors (which are the default priors) $log{\tau}_{\gamma}~\mathrm{log}\text{Gamma}\left(1,0.00005\right)$ , $log{\tau}_{\delta}~\mathrm{log}\text{Gamma}\left(1,0.00005\right)$ have been used. Further details on the choice of the priors for the precision (variance) parameters can be obtained inWakefield [7]andFong [8], among other papers. Finally, a Gaussian exchangeable prior with mean 0 and variance 1000 is used for the fixed effect.

5. Application to HIV Incidence Data

In this section we apply the models discussed in the previous sections to 2013-2016 HIV data collected by the Ministry of Health, Kenya. The data was extracted from the Kenya Aids Indicator Surveys (KAIS), conducted by the Government of Kenya. The data has been described in Section 1. The main objective of survey was to collect high quality data on the prevalence of HIV and sexually transmitted infections (STI) among adults, and to assess knowledge of HIV and STI among the populations.

All the 23 models already discussed in Section 2 were fitted to the 2013-2016 HIV data using INLA. An important feature of the INLA technique is that the computation time and cost are reduced substantially as compared to the McMC methods, and therefore many models can be fitted and compared in a much shorter time. For model selection and comparison, the Deviance Information Criterion (DIC) [4]was used. The DIC is the sum of the posterior mean of the deviance $\overline{D}$ (a measure of goodness of fit) and the effective number of parameters pD (a measure of model complexity). The best fitting model is one with the smallest DIC value. In our analysis in this paper, all models were fitted using the Simplified Laplace approximations strategy.

Table 2 shows $\overline{D}$ , pD and the DIC values for the 23 fitted models discussed in Section 2. It can be seen that while parametric models and the additive models (both parametric and non-parametric) are parsimonious with small values of the effective number of parameters pD, they are far from the best model since they have large DIC values. It is also clear that amongst the models fitted, models with type II and type IV interactions with a RW1 prior for the temporally structured effect show lower DIC values. Furthermore, models without the temporally unstructured effect seem to perform better. Finally, in terms of the trade-off between model fit and complexity, model 10a is the best model since it has the smallest DIC value. This model incorporates the spatially structured random effect with a LCAR prior, a temporally structured random effect with a RW1 prior and a type II interaction term. The corresponding R code for this model is provided in the Appendix.

For the best model (model 10a), the estimated logarithm of the relative risks obtained is made up of four different components: a global risk (denoted by $\widehat{\alpha}$ ) which is the overall risk common to all areas; the spatial location risk ( $\widehat{u}$ ) that

Table 2. Model comparison for the 23 fitted spatio-temporal models.

can arise due to factors associated to a specific area; a temporal risk trend common to all regions ( $\widehat{\gamma}$ ) that can arise due to changes in coding the disease, diagnostics, policies affecting the whole country and finally a region specific temporal risk trend $\widehat{\delta}$ attributed to specific effects of each county. Figure 1 shows the spatial and temporal patterns for HIV cases in Kenya. Figure 1 (upper left figure) shows the spatial incidence risk $\left({\widehat{\zeta}}_{i}=exp\left({\widehat{u}}_{i}\right)\right)$ associated to each county and constant along the period. Figure 1 (upper right figure) shows the posterior probability that the spatial risk is greater than 1 $\left(p=P\left({\zeta}_{i}>1|Y\right)\right)$ . Probabilities above 0.9 point towards high risk areas. Some discussions about reference thresholds in relative risks and cut-off probabilities can be obtained in Richardson, Thomson, and Best [43], Ugarte et al. [13], and Ugarte, Goicoa, and Militino [44].

It is clear from this figure that there is a higher risk of HIV infection in the counties to the Western region of Kenya as compared to the other counties. In particular, Homa Bay, Siaya, Migori and Kisumu counties show high relative risks. Finally, Figure 1 (bottom figure) shows the temporal risk trend common to all counties. Generally, there is an increasing trend in the whole period which indicates that there might be some factors affecting the whole country that produce an increase in risk along the period. There is a non-linear trend behavior of the temporal pattern over time, thus explaining the reason why the parametric linear trend models do not fit well to the HIV data as compared to the non-parametric ones.

The specific temporal trends (in log scale) for four selected counties are shown in Figure 2. There is a clear differences among counties, which means including the interaction term in the model is appropriate. Figure 3 displays the spatio-temporal pattern of HIV incidence rates for each county for the four-year period (2013-2016), and finally Figure 4 shows the posterior probabilities that the relative risks are greater than 1. It is clear from the information provided by the two maps that there is an increase in risk as the maps are getting darker with years. A number of counties in the Western region of Kenya show higher significant risk of HIV as compared to other regions.

6. Comparison of McMC and INLA Techniques

In Bayesian modeling and inference there are several challenges in the use of the popular McMC. One challenge is that the McMC uses posterior sampling inference which requires the need to evaluate convergence of posterior samples. This usually requires extensive simulation that can be costly and time consuming. The frequently used software packages for the implementation of the McMC technique include WinBUGS, OpenBUGS, as well as certain selected R packages such as McMCpack and SAS procedures. WinBUGS has gained a lot of popularity in the recent past and has been used to run most of the McMC analyses.

INLA which has been proposed as an alternative to the burdensome McMC can be implemented as an R package (R-INLA) and performs Bayesian modeling without using the posterior sampling methods. Unlike McMC algorithms, which rely on Monte Carlo integration, the R-INLA package performs Bayesian analyses using numerical integration which requires much shorter time since it does not require extensive iterative computation. Very often, Bayesian modeling using the INLA methodology takes much shorter time as compared to modeling using McMC. However, there have been limited attempts to compare performance capabilities of these software packages particularly for the case of spatio-temporal models in a disease mapping. In this section, a comparison of the McMC and INLA techniques based on the best fitting model (model 10a) in analysis of the Kenya HIV data in section 4 is provided.

Table 3 shows parameter estimates and the standard errors obtained using both McMC and INLA. The model estimates of the model parameters obtained by McMC and INLA methods are generally quite similar. The largest differnce is obtained for the spatial smoothing parameters, where the estimated standard errors by McMC is larger than the one obtained by INLA. As already mentioned above, INLA also does have an advantage over WinBUGS in terms of the computation time. For model 10a, it took 254 seconds to run an McMC while the computation time in INLA is 16.7 seconds.

Figure 1. Upper left figure: The spatial pattern of HIV incidence risks ${\zeta}_{i}=\mathrm{exp}\left({u}_{i}\right)$ ; Upper right figure: Posterior probabilities $P\left({\zeta}_{i}>1|Y\right)$ ; Bottom figure: Temporal trend of HIV incidence risks.

Figure 2. Specific temporal trends (in log scale) for the four selected counties: Homa Bay, Bomet, Nairobi and Wajir.

Figure 3. Relative incidence risk of HIV by counties.

Figure 4. $P\left({\widehat{r}}_{it}>1|Y\right)$ posterior probability distribution by counties.

Table 3. Model parameter estimates and standard errors obtained using McMC and INLA.

INLA also have some shortcomings. One challenge involves the ability to use hyperparameters as flexibly as in WinBUGS. While it is difficult to implement prior distributions for the standard deviations in INLA, this can be done easily in WinBUGS. Placing prior distributions on the standard deviations rather than fixing them or placing them on the precisions can lead to better model fits in some situations. Additionally, there is not an easy way to place hyperprior distributions on the precisions of the fixed effects.

There are many options in INLA for improving the models. Initially, we explore specifying the use of a full Laplace approximation strategy in INLA, but this does not lead to different parameter estimates and computation time is longer as compared to simple Laplace approximation. Specifying the full Laplace strategy did, however, lead to different goodness of fit measures that were closer to those produced with WinBUGS. Furthermore, the simplified Laplace strategy is not sufficient for computing predictive measures [25][45].

7. Conclusions

Spatial and spatio-temporal models are usually formulated in a hierarchical Bayesian framework and typically relies on generalized linear mixed models (GLMM). Model fitting and statistical inference are commonly accomplished through the empirical Bayes (EB) and fully Bayes (FB) approaches. The EB approach usually relies on the penalized quasilikelihood (PQL), while the FB approach usually uses Markov chain Monte Carlo (McMC) techniques. Spatio-temporal models used in disease mapping are often very complex and McMC methods may lead to large Monte Carlo errors and large computation time if the dimension of the data at hand is large. To address these challenges, a new strategy based on integrated nested Laplace approximations (INLA) has recently been proposed as a promising alternative to the McMC. In this paper, it is shown that INLA is able to fit fairly complex space-time models much more quickly than the McMC algorithms. INLA also has an additional attractive feature since it can be easily used in the free software R, with the package R-INLA. The INLA methodology also provides several quantities for Bayesian model choice and selection such as the effective number of parameters (pD) and the Deviance Information Criterion (DIC). The disadvantage of INLA involves the ability to use hyperparameters as flexibly as in WinBUGS. It is difficult to implement prior distributions for the standard deviations in INLA, while this can be done easily in WinBUGS. Placing prior distributions on the standard deviations rather than fixing them or placing them on the precisions can lead to better model fits in some situations. Furthermore, there is not an easy way to place hyperprior distributions on the precisions of the fixed effects.

Most of the works in spatial and spatio-temporal disease mapping with McMC and INLA considers the intrinsic conditional autoregressive (ICAR) prior for the spatially structured variability. However, the ICAR prior is improper and has the undesirable largescale property of leading to a negative pairwise correlation for regions located further apart. Moreover, the variance components in the BYM convolution model are not identifiable from the data and informative hyperpriors are needed for posterior inference. In this paper, we consider the LCAR prior as an alternative to the ICAR prior. The LCAR prior does not produce such negative correlations and has the advantage of including a parameter that quantifies spatial dependence as well as unstructured heterogeneity. A comparison of INLA and McMC has been done using the LCAR prior for the spatial random effects. WinBUGS is a populal tool for analysis in FB disease mapping while INLA was recently introduced and is now gaining popularity. Both techniques produce similar parameter estimates, except for the smoothing parameter, where McMC tends to overestimate it a bit more than INLA. To improve the models in INLA, we explore specifying the use of a full Laplace approximation strategy, but this does not lead to different parameter estimates and computation time is longer as compared to simple Laplace approximation. Specifying the full Laplace strategy did, however, lead to different goodness of fit measures that were closer to those produced with WinBUGS.

Finally, our analysis of the Kenya HIV incidence data for the period 2013-2016 shows that the incidence rate is still high, and counties located to the Western region show a significant high risk as compared with the other counties. In particular, Homa Bay, Siaya, Migori and Kisumu counties shows the highest risks. The reasons why these counties show high HIV incidence risks is a subject that is still under investigation and further research is needed.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Tonui, B., Mwalili, S. and Wanjoya, A. (2018) Spatio-Temporal Variation of HIV Infection in Kenya. Open Journal of Statistics, 8, 811-830. https://doi.org/10.4236/ojs.2018.85053

References

- 1. Breslow, N.E. and Clayton, D.G. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association, 88, 9-25.
- 2. Gilks, W., Richardson, S. and Spiegelhalter, D. (2005) Markov Chain Monte Carlo. Wiley, Hoboken.
- 3. Goicoa, T., Ugarte, M. and Etxeberra, J., et al. (2012) Comparing Car and p-Spline Models Inspatial Disease Mapping. Environmental and Ecological Statistics, 19, 573-599. https://doi.org/10.1007/s10651-012-0201-8
- 4. Spiegelhalter, D., Best, N., Carlin, B. and van derLinde, A. (2002) Bayesian Measures of Model Complexity and Fit (with Discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 583-639. https://doi.org/10.1111/1467-9868.00353
- 5. Schrodle, B., Held, L. and Riebler, A. (2011) Using Integrated Nested Laplace Approximations for the Evaluation of Veterinary Surveillance Data from Switzerland: A Case-Study. Journal of the Royal Statistical Society, Series C, 60, 261-279. https://doi.org/10.1111/j.1467-9876.2010.00740.x
- 6. Schmid, V. and Held, L. (2004) Bayesian Extrapolation of Space-Time Trends in Cancer Registry Data. Biometrics, 60, 1034-1042. https://doi.org/10.1111/j.0006-341X.2004.00259.x
- 7. Wakefield, J. (2007) Disease Mapping and Spatial Regression with Count Data. Biostatistics, 8, 158-183. https://doi.org/10.1093/biostatistics/kxl008
- 8. Fong, Y., Rue, H. and Wakefield, J. (2010) Bayesian Inference for Generalized Linear Mixed Models. Biostatistics, 11, 397-412. https://doi.org/10.1093/biostatistics/kxp053
- 9. Martino, S. and Rue, H. (2009) Implementing Approximate Bayesian Inference Using Integrated Nested Laplace Approximation: A Manual for the Inla Program. Department of Mathematical Sciences, Norway. http://www.math.ntnu.no/hrue/GMRFLib
- 10. R Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. MRC Biostatistics Unit, Austria. http://www.R-project.org/
- 11. Rue, H., Martino, S. and Chopin, N. (2009) Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society, Series B, 71, 319-392. https://doi.org/10.1111/j.1467-9868.2008.00700.x
- 12. Bernardinelli, L., Clayton, D. and Montomoli, C. (1995). Bayesian Estimates of Disease Maps: How Important Are Priors? Statistics in Medicine, 14, 2411-2431. https://doi.org/10.1002/sim.4780142111
- 13. Assuncao, R., Reis, I. and Oliveira, C. (2001) Diffusion and Prediction of Leishmaniasis in a Large Metropolitan Area in Brazil with a Bayesian Space-Time Model. Statistics in Medicine 20, 2319-2335. https://doi.org/10.1002/sim.844
- 14. Ugarte, M.D., Goicoa, T. and Ibanez, B., et al. (2009) Evaluating the Performance of Patio-Temporal Bayesian Models in Disease Mapping. Environmetrics, 20, 647-665. https://doi.org/10.1002/env.969
- 15. Knorr-Held, L. and Besag, J. (1998) Modelling Risk from a Disease in Time and Space. Statistics in Medicine, 17, 2045-2060. https://doi.org/10.1002/(SICI)1097-0258(19980930)17:18<2045::AID-SIM943>3.0.CO;2-P
- 16. Knorr-Held, L. (2000) Bayesian Modeling of Inseparable Space-Time Variation in Disease Risk. Statistics in Medicine, 19, 2555-2567. https://doi.org/10.1002/1097-0258(20000915/30)19:17/18<2555::AID-SIM587>3.0.CO;2-#
- 17. Besag, J., York, J. and Mollie, A. (1991) Bayesian Image Restoration with Two Applications in Spatial Statistics (with Discussion). Annals of the Institute of Statistical Mathematics, 43, 1-59. https://doi.org/10.1007/BF00116466
- 18. MacNab, Y.C. and Dean, C.B. (2001) Autoregressive Spatial Smoothing and Temporal Spline Smoothing for Mapping Rates. Biometrics, 57, 949-956. https://doi.org/10.1111/j.0006-341X.2001.00949.x
- 19. Ugarte, M.D., Goicoa, T. and Militino, A.F. (2010) Spatio-Temporal Modeling of Mortality Risks Using Penalized Splines. Environmetrics, 21, 270-289. https://doi.org/10.1002/env.1011
- 20. Ugarte, M.D., Goicoa, T. and Etxeberria, J. (2012) Projections of Cancer Mortality Risks Using Spatio-Temporal p-Spline Models. Statistical Methods in Medical Research, 21, 545-560. https://doi.org/10.1177/0962280212446366
- 21. Ugarte, M.D., Goicoa, T. and Etxeberria, J. (2012) A p-Spline Anova Type Model in Space-Time Disease Mapping. Stochastic Environmental Research and Risk Assessment, 26, 835-845. https://doi.org/10.1007/s00477-012-0570-4
- 22. MacNab, Y.C. (2007) Spline Smoothing in Bayesian Disease Mapping. Environmetrics, 18, 727-744. https://doi.org/10.1002/env.876
- 23. MacNab, Y.C. and Gustafson, P. (2007) Regression b-Spline Smoothing in Bayesian Disease Mapping: With an Application to Patient Safety Surveillance. Statistics in Medicine, 26, 4455-4474. https://doi.org/10.1002/sim.2868
- 24. Schrodle, B. and Held, L. (2011a) A Primer on Disease Mapping and Ecological Regression Using INLA. Computational Statistics, 26, 241-258. https://doi.org/10.1007/s00180-010-0208-2
- 25. Schrodle, B. and Held, L. (2011) Spatio-Temporal Disease Mapping Using INLA. Environmetrics, 22, 725-734. https://doi.org/10.1002/env.1065
- 26. Held, L., Schrodle, B. and Rue, H. (2009) Posterior and Cross-Validatory Predictive Checks: A Comparison of MCMC and INLA. In: Kneib, T. and Tutz, G., Eds., Statistical Modeling and Regression Structures, Springer, Berlin, 91-110.
- 27. MacNab, Y.C. (2011) On Gaussian Markov Random Fields and Bayesian Disease Mapping. Statistical Methods in Medical Research, 20, 49-68. https://doi.org/10.1177/0962280210371561
- 28. Botella-Rocamora, P., Lpez-Qulez, A. and Martnez-Beneito, M.A. (2013) Spatial Moving Average Risk Smoothing. Statistics in Medicine, 32, 2595-2612. https://doi.org/10.1002/sim.5704
- 29. MacNab, Y.C. (2014) On Identification in Bayesian Disease Mapping and Ecological-Spatial Regression Models. Statistical Methods in Medical Research, 23, 134-155. https://doi.org/10.1177/0962280212447152
- 30. Rampaso, R., de Souza, A.D.P. and Flores, E. (2016) Bayesian Analysis of Spatial Data Using Different Variance and Neighborhood Structures. Journal of Statistical Computation and Simulation, 86, 535-552. https://doi.org/10.1080/00949655.2015.1022549
- 31. Leroux, B.G., Lei, X. and Breslow, N. (1999) Estimation of Disease Rates in Small Areas: A New Mixed Model for Spatial Dependence. In: Statistical Models in Epidemiology, the Environment and Clinical Trials, Springer-Verlag, New York, 135-178.
- 32. Lee, D. (2011) A Comparison of Conditional Autoregressive Models Used in Bayesian Disease Mapping. Spatial and Spatio-Temporal Epidemiology, 2, 79-89. https://doi.org/10.1016/j.sste.2011.03.001
- 33. Lee, D. and Mitchell, R. (2013) Locally Adaptive Spatial Smoothing Using Conditional Autoregressive Models. Journal of the Royal Statistical Society Series C, 62, 593-608. https://doi.org/10.1111/rssc.12009
- 34. Eberly, L.E. and Carlin, B.P. (2000) Identifiability and Convergence Issues for Markov Chain Monte Carlo Fitting of Spatial Models. Statistics in Medicine, 19, 2279-2294.
- 35. Lee, D., Ferguson, C. and Mitchell, R. (2009) Air Pollution and Health in Scotland: A Multi-City Study. Biostatistics, 10, 409-423. https://doi.org/10.1093/biostatistics/kxp010
- 36. Rue, H. and Held, L. (2005) Gaussian Markov Random Fields: Theory and Applications. CRC Press, London. https://doi.org/10.1201/9780203492024
- 37. Clayton, D. (1996) Generalized Linear Mixed Models. In: Gilks, W., et al., Eds., Markov Chain Monte Carlo in Practice, Chapman & Hall, London, 275-301.
- 38. Ugarte, M.D., Adin, A., Goicoa, T. and Militino, A.F. (2014) On Fitting Spatiotemporal Disease Mapping Models Using Approximate Bayesian Inference. Statistical Methods in Medical Research, 23, 507-530. https://doi.org/10.1177/0962280214527528
- 39. Gmez-Rubio, V., Bivand, R.S. and Rue, H. (2014) Spatial Models Using Laplace Approximation Methods. In: Fischer, M.M. and Nijkamp, P.E., Eds., Handbook of Regional Science, Springer, Berlin, 1401-1417.
- 40. Bivand, R.S., Gmez-Rubio, V. and Rue, H. (2015) Spatial Data Analysis with R-INLA with Some Extensions. Journal of Statistical Software, 63, 1-31. https://doi.org/10.18637/jss.v063.i20
- 41. Lindgren, F. and Rue, H. (2015) Bayesian Spatial Modeling with R-INLA. Journal of Statistical Software, 63, 1-25.
- 42. Blangiardo, M. and Cameletti, M. (2015) Spatial and Spatio-Temporal Bayesian Models with R-INLA. John Wiley and Sons Inc., New York.
- 43. Richardson, S., Thomson, A. and Best, N.A. (2004) Interpreting Posterior Relative Risk Estimates in Disease-Mapping Studies. Environmental Health Perspectives, 112, 1016-1025.
- 44. Ugarte, M.D., Goicoa, T. and Militino, A.F. (2009) Empirical Bayes and Fully Bayes Procedures to Detect High Risk Areas in Disease Mapping. Computational Statistics & Data Analysis, 53, 2938-2949. https://doi.org/10.1016/j.csda.2008.06.002
- 45. Carroll, R., Lawson, A.B., Faes, C., Kirby, R.S., Aregay, M. and Watjou, K. (2015) Comparing INLA and OpenBUGS for Hierarchical Poisson Modeling in Disease Mapping. Spatial and Spatio-Temporal Epidemiology, 14-15, 45-54.

Appendix

R-INLA code for model 10a

#Type II interaction and RW2 prior for time #

S=47

T=4

temp <- poly2nb(kenya)

nb2INLA("kenya.graph", temp)

kenya.adj <- paste(getwd(),"/kenya.graph",sep="")

H <- inla.read.graph(filename="kenya.graph")

# Temporal graph

D1 <- diff(diag(T),differences=1)

Q.gammaRW1 <- t(D1)%*%D1

D2 <- diff(diag(T),differences=2)

Q.gammaRW2 <- t(D2)%*%D2

Q.xi <- matrix(0, H$n, H$n)

for (i in 1:H$n){

Q.xi[i,i]=H$nnbs[[i]]

Q.xi[i,H$nbs[[i]]]=-1

}

Q.Leroux <- diag(S)-Q.xi

R <- kronecker(Q.gammaRW1,diag(S))

r.def <- S

A.constr <- kronecker(matrix(1,1,T),diag(S))

formula <- y ˜f(ID.area, model="generic1", Cmatrix= Q.Leroux, constr=TRUE,

hyper=list(prec=list(prior="loggamma", param=c(1,0.01)),

beta=list(prior="logitbeta", param=c(4,2))))+

f(ID.year, model="rw1", constr=TRUE,

hyper=list(prec=list(prior="loggamma", param=c(1,0.00005))))+

f(ID.area.year,model="generic0", Cmatrix=R, constr=TRUE,

hyper=list(prec=list(prior="loggamma", param=c(1,0.00005))),

extraconstr=list(A=A.constr, e=rep(0,S)))

27

model10a<-inla(formula, family="poisson", data=Data, E=E,

control.predictor=list(compute=TRUE,cdf=c(log(1))),

control.compute=list(dic=TRUE),

control.inla=list(strategy="simplified.laplace"))