Cardiorespiratory diseases are a serious public health problem worldwide. Identifying spatial patterns in health events is an efficient way to guide public policies in environmental health. However, only a few studies have considered spatial pattern analysis, understood here as the evaluation of spatial autocorrelation, the degree of autocorrelation, and the behavior of spatial dependence over distance. Therefore, the objective of this study is to propose a set of procedures to evaluate the spatial patterns of cardiorespiratory diseases in the Federal District, Brazil. Specifically, our proposal is based on four questions: a) is the spatial distribution of all patients clustered, random or dispersed? b) what is the degree of clustering for either high values or low values of patients? c) what is the spatial dependence behavior? d) considering the spatial variation, at what distance does the type of distribution (clustered, random or dispersed) begin to change? We chose four methods to answer these questions: Global Moran’s I (question “a”); Getis-Ord General G (question “b”); semivariogram analysis (question “c”); and multi-distance spatial cluster analysis (K-function) (question “d”). Our results suggest a different behavior for people up to 5 years old (clustered, p < 0.01), especially at distances below 2.5 km. For people above 59 years old, clustering is significant only at short distances (<200 m). For the other age groups, the spatial distribution is essentially random. Our study showed that it was possible to capture evidence of health disparities in the Federal District.

Cardiorespiratory diseases are a serious public health problem worldwide [

Ning et al. (2012) [

Many studies have applied geostatistical methods to evaluate the cause and effect relationships between two or more variables, such as occurrence of diseases due to environmental factor [

Mitchell (1999) [

Thus, the objective of this study is to propose a set of procedures to evaluate the spatial patterns of cardiorespiratory diseases. Specifically, our proposal is based on four questions: 1) is the spatial distribution of all patients clustered, random or dispersed? 2) what is the degree of clustering for either high values or low values of patients? 3) what is the spatial dependence behavior? 4) considering the spatial variation, at what distance does the type of distribution (clustered, random or dispersed) begin to change?

This study used data from the Federal District (FD), the Brazilian state in which the city of Brasília, Brazil’s capital, is located. The FD has an area of 5802 km^{2} and an estimated population of 2.7 million people. It is the fourth most populous city in Brazil [

Three types of data were used in this study: a) health data, b) a geographic addressing database, and c) demographic census data.

Health data were provided by the Individual Health System [

The geographic addressing database was provided by the Urban State Secretary of FD [

The demographic census data were provided by the Brazilian Institute of Geography and Statistics [

Initially, the health data were summarized in order to combine patients who live in the same area. A total of 7066 patients admitted for circulatory and respiratory system illnesses between 2008 and 2013 were grouped into 169 address groups.

Subsequently, the groups were joined to the geographic addressing database (using the address as the matching key), and the 7066 patients were integrated into the geographic information system (GIS) database. Finally, based on the principle of proportionality (number of patients per population), the 169 address groups were normalized using the demographic data.
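The proportionality step can be illustrated with a minimal Python sketch. It assumes each address group carries a patient count and a census population; the function name, the group identifiers and the sample numbers are hypothetical and are not taken from the study.

```python
def normalize_by_population(patients, population):
    """Return patients per inhabitant for each address group.

    Both arguments are dicts keyed by a hypothetical address-group id.
    Groups without a census population are skipped to avoid dividing by zero.
    """
    rates = {}
    for group_id, count in patients.items():
        pop = population.get(group_id, 0)
        if pop > 0:
            rates[group_id] = count / pop
    return rates


# Hypothetical example values, not data from the study.
patients = {"A": 20, "B": 5, "C": 3}
population = {"A": 400, "B": 50, "C": 0}
rates = normalize_by_population(patients, population)
```

How groups without a matching census population were actually handled is not stated in the text; skipping them is only one possible choice.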

We used ArcGIS, version 10.2, to integrate the data and to evaluate the spatial patterns.

We chose four methods to answer the questions on which this study is based (presented in the introduction): Global Moran’s I (question “a”); Getis-Ord General G (question “b”); semivariogram analysis (question “c”); and multi-distance spatial cluster analysis (K-function) (question “d”).

The Global Moran’s I test measures spatial autocorrelation based on two parameters: location (address of each patient) and values (number of patients living in the same location). The results of this test classify the distribution as clustered, dispersed or random. Equation (1) shows the calculation used in the Global Moran’s I test [

where Z_{i} is the deviation of the number of patients from its mean for the polygons that represent each of the 169 address groups; W_{i,j} is the spatial weight between polygons i and j; n is the total number of address groups, which in this study was 169; and S_{0} is the aggregate of all the spatial weights, represented by Equation (2).
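Written out with the symbols just defined, the Global Moran’s I statistic and the aggregate of the spatial weights take the following standard form. This is a sketch that assumes the study used the usual formulation of the test; the summation limits are written explicitly for clarity.

$$
I = \frac{n}{S_{0}} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} W_{i,j} \, Z_{i} \, Z_{j}}{\sum_{i=1}^{n} Z_{i}^{2}} \tag{1}
$$

$$
S_{0} = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{i,j} \tag{2}
$$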

A positive Moran’s I index value indicates a tendency toward clustering, while a negative value indicates a tendency toward dispersion. However, this classification tendency can only be accepted based on a hypothesis test. The null hypothesis of the Global Moran’s I test is that the value of each polygon (in this study, the number of patients) is randomly distributed across the study area. The null hypothesis is rejected in two cases: first, when the p-value is ≤0.10 and z-score is ≥1.65, so the distribution is statistically significantly clustered, and second, when the p-value is ≤0.10 and the z-score is ≤−1.65, so the distribution is statistically significantly dispersed.
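The index and the decision rule above can be sketched in Python with NumPy. Several parts are assumptions made for illustration: the binary fixed-distance-band weights, the function names, and the use of a permutation test to obtain a z-score and p-value. The permutation test is a stand-in chosen for brevity and is not necessarily how the software used in the study derives these quantities.

```python
import numpy as np


def distance_band_weights(coords, band):
    """Binary weights: 1 if two distinct points lie within `band`, else 0."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = (dist <= band).astype(float)
    np.fill_diagonal(w, 0.0)  # a feature is not its own neighbor
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * sum_ij(W_ij Z_i Z_j) / sum_i(Z_i^2)."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()          # deviation from the mean
    s0 = w.sum()              # aggregate of all spatial weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)


def permutation_z_and_p(values, w, n_perm=999, seed=0):
    """Pseudo z-score and two-sided pseudo p-value by shuffling the values."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    sims = np.array([morans_i(rng.permutation(values), w)
                     for _ in range(n_perm)])
    z = (observed - sims.mean()) / sims.std()
    extreme = np.sum(np.abs(sims - sims.mean()) >= abs(observed - sims.mean()))
    p = (extreme + 1) / (n_perm + 1)
    return observed, z, p


def classify(z, p, alpha=0.10, z_crit=1.65):
    """Decision rule from the text: reject randomness at p <= 0.10."""
    if p <= alpha and z >= z_crit:
        return "Clustered"
    if p <= alpha and z <= -z_crit:
        return "Dispersed"
    return "Random"


# Four hypothetical locations on a line, one unit apart.
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
w = distance_band_weights(coords, band=1.0)
i_grouped = morans_i([1, 1, 5, 5], w)      # similar values side by side
i_alternating = morans_i([1, 5, 1, 5], w)  # dissimilar values side by side
```

In this toy layout the grouped values give a positive index and the alternating values a negative one, which matches the interpretation of the sign given above.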

The Getis-Ord General G, the second test, identifies the degree of clustering for either high or low values. Equation (3) shows the calculation used in the test [

where X_{i} and X_{j} are the numbers of patients for polygons i and j; W_{i,j} is the spatial weight between polygons i and j; n is the total number of address groups; and ∀j ≠ i indicates that polygons i and j cannot be the same polygon.
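Using these definitions, the General G statistic can be written in its standard form as follows. This is a sketch that assumes the study applied the usual formulation of the test.

$$
G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} W_{i,j} \, X_{i} \, X_{j}}{\sum_{i=1}^{n} \sum_{j=1}^{n} X_{i} \, X_{j}}, \quad \forall j \neq i \tag{3}
$$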

A high General G index indicates that high values for the attribute are clustered, while a low General G index indicates low values for the attribute are clustered. The null hypothesis of the Getis-Ord General G is that there is no spatial clustering. The null hypothesis can be rejected in two cases: first, when the p-value is ≤0.10 and the z-score is ≥1.65, so there is clustering for high values (High Cluster), and second, when the p-value is ≤0.10 and the z-score is ≤−1.65, so there is clustering for low values (Low Cluster).

The Getis-Ord General G test is sensitive to the choice of distance over which the spatial relationships among features are calculated. In this study we chose a fixed distance band, in which each feature is analyzed within the context of its neighboring features.
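A fixed distance band and the resulting General G value can be sketched in Python with NumPy. The function names, the binary weighting inside the band and the toy coordinates are assumptions made for illustration; the band distance actually used in the study is not stated in the text.

```python
import numpy as np


def band_weights(coords, band):
    """Fixed distance band: weight 1 for distinct features within `band`."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = (dist <= band).astype(float)
    np.fill_diagonal(w, 0.0)
    return w


def general_g(values, w):
    """G = sum_ij(W_ij X_i X_j) / sum_ij(X_i X_j), for all j != i."""
    x = np.asarray(values, dtype=float)
    cross = np.outer(x, x)
    np.fill_diagonal(cross, 0.0)  # drop the i == j terms
    return (w * cross).sum() / cross.sum()


def expected_g(w):
    """Expected G when values are unrelated to location: S0 / (n (n - 1))."""
    n = w.shape[0]
    return w.sum() / (n * (n - 1))


# Four hypothetical locations on a line, one unit apart.
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
w = band_weights(coords, band=1.0)
g_high_together = general_g([5, 5, 1, 1], w)  # high values are neighbors
g_high_apart = general_g([5, 1, 1, 5], w)     # high values are separated
g_expected = expected_g(w)
```

In this toy case the observed G exceeds the expected value when the high counts are neighbors and falls below it when they are far apart. Changing `band` changes which pairs count as neighbors, which is why the result depends on the chosen distance.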

According to Chun and Griffith (2013) [

where γ(d) is the semivariogram value; d is the distance; A_{i} is the value at location i; and A_{j} is the value at location j. In this study, these values were the number of patients in each location.
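In its usual form, the empirical semivariogram is half the average squared difference between the values at pairs of locations separated by distance d. The expression below is a sketch under that assumption; the symbol γ(d) for the semivariogram value and N(d) for the number of pairs at distance d are notation introduced here.

$$
\gamma(d) = \frac{1}{2 N(d)} \sum_{(i,j)\,:\, d_{i,j} = d} \left( A_{i} - A_{j} \right)^{2} \tag{4}
$$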

The result of the empirical semivariogram is a chart that shows all pairs of locations that were compared. The x-axis shows the distance between the two locations of each pair, and the y-axis shows the semivariogram value calculated by the empirical semivariogram (Equation (4)). The chart is interpreted as follows: the lower the y-value, the more similar the values of the two locations compared at the separation distance shown on the x-axis.
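The points of such a chart can be computed with a short Python sketch using NumPy. The function names, the optional distance binning and the toy data are assumptions for illustration and are not taken from the study.

```python
import numpy as np


def semivariogram_cloud(values, coords):
    """Return (distance, semivariance) for every pair of locations.

    Each pair contributes 0.5 * (A_i - A_j)^2 at its separation distance.
    """
    a = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(a), k=1)  # each unordered pair once
    dist = np.sqrt(((coords[i] - coords[j]) ** 2).sum(axis=1))
    gamma = 0.5 * (a[i] - a[j]) ** 2
    return dist, gamma


def binned_semivariogram(dist, gamma, bin_edges):
    """Average the cloud within distance bins; empty bins give NaN."""
    which = np.digitize(dist, bin_edges) - 1
    out = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        mask = which == b
        if mask.any():
            out[b] = gamma[mask].mean()
    return out


# Three hypothetical locations and patient counts.
coords = [[0, 0], [3, 4], [6, 8]]
values = [1, 1, 5]
dist, gamma = semivariogram_cloud(values, coords)
binned = binned_semivariogram(dist, gamma, bin_edges=[0, 6, 12])
```

A pair with identical counts produces a semivariogram value of zero, while pairs whose counts differ produce larger values, which is the similarity reading described above.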

The K-function is used to examine whether the distribution is clustered or dispersed considering a range of distances [

where L(d) is the K-function; d is the distance; n is the total number of address groups; A is the total area of the address groups; and k_{i,j} is the weight. This weight will be 1 when the distance between i and j is less than d and will be 0 otherwise.
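With these symbols, a common transformation of the K-function can be written as below. This is a sketch that assumes the study used this standard form without an additional edge correction. Under this transformation, the expected value for a random distribution equals the distance d.

$$
L(d) = \sqrt{\frac{A \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} k_{i,j}}{\pi \, n \, (n - 1)}}
$$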

The k value is calculated at several distances and displayed on a chart, which shows the value observed, the value expected and the confidence level. When the observed k value for a particular distance is above the line for the expected value, the distribution is more clustered. When the observed k value is below the line for the expected value, the distribution is more dispersed. When the observed k value is larger than the upper confidence level, clustering for that distance is statistically significant. When the observed k value is smaller than the lower limits of the confidence level, dispersion for that distance is statistically significant [
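The comparison of observed values against expected values and a confidence envelope can be sketched in Python with NumPy. The function names, the rectangular study area, the simulation-based envelope and the absence of edge correction are all assumptions made for illustration; they are not necessarily the settings used in the study.

```python
import numpy as np


def l_function(coords, distances, area):
    """L(d) = sqrt(A * sum_{i} sum_{j != i} k_ij / (pi * n * (n - 1))).

    k_ij is 1 when the distance between i and j is less than d, else 0.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude i == j
    return np.array([
        np.sqrt(area * np.count_nonzero(dist < d) / (np.pi * n * (n - 1)))
        for d in distances
    ])


def random_envelope(n, width, height, distances, n_sim=99, seed=0):
    """Lowest and highest L(d) over simulated random point sets."""
    rng = np.random.default_rng(seed)
    sims = np.array([
        l_function(rng.uniform([0.0, 0.0], [width, height], size=(n, 2)),
                   distances, width * height)
        for _ in range(n_sim)
    ])
    return sims.min(axis=0), sims.max(axis=0)


def classify(observed, expected, lower, upper):
    """Apply the reading rules described in the text at one distance."""
    if observed > upper:
        return "significant clustering"
    if observed < lower:
        return "significant dispersion"
    return "more clustered" if observed > expected else "more dispersed"


# Four hypothetical points on the corners of a unit square, area set to 4.
corners = [[0, 0], [1, 0], [0, 1], [1, 1]]
l_values = l_function(corners, [0.5, 1.2, 1.5], area=4.0)
lower, upper = random_envelope(20, 10.0, 10.0, [1.0, 2.0, 3.0], n_sim=19)
```

Because the expected value under this transformation equals the distance itself, `classify` can be called with `expected=d` at each distance.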

Of the 7066 patients, 3381 (47.8%) were up to 5 years old, which was the largest group. The smallest group, with 177 patients (2.5%), was those between 6 and 17 years old. The groups between 18 and 59 years old and above 59 years old included 1908 and 1600 patients, respectively (

The descriptive analysis of the data before normalization showed that the 7066 patients were distributed across 169 address groups in the FD. The average of this distribution was 41.81 (±76.9) patients per address. The patients up to 5 years old had the highest average, 20.1 (±42.1) patients per address (

Parameter | Up to 5 years | Between 6 and 17 years | Between 18 and 59 years | Above 59 years | All ages |
---|---|---|---|---|---|
Maximum | 272 | 25 | 105 | 128 | 428 |
Total | 3381 | 177 | 1908 | 1600 | 7066 |
Average | 20.1 | 1.1 | 11.3 | 9.5 | 41.8 |
Standard deviation | 42.1 | 2.7 | 19.8 | 17.8 | 76.9 |

In the normalized data analysis (patients per population in each address), the age group above 59 years had the highest average, 0.8 (±5.8), followed by the group up to 5 years old, with an average of 0.8 (± 4.2). The lowest average was in the age group between 6 and 17 years old, 0.02 (±0.2),

There was clustering (I value > 0) when the analysis considered the patients of all ages (I = 0.012, p < 0.10). However, upon analyzing age groups, only the group up to 5 years old exhibits statistically significant tendency toward clustering (I = 0.024, p < 0.10). Other age groups presented a tendency to cluster (I value > 0), but without statistical significance (

There was no high or low clustering for the spatial distribution of patients. Random distributions were identified in all cases by Getis-Ord General G analysis due to the z-score value (near zero) and p-value (>0.10),

Most points had low semivariogram values for all age groups, which means that the numbers of patients are related (similar) across distances. Only the groups up to 5 years old and above 59 years old showed high semivariogram values, especially between 1500 and 5000 m (

Considering a range of distances, some differences in spatial patterns were found among age groups. For patients up to 5 years old, there is statistically significant clustering until ~2500 m. Above this distance, the spatial distribution is random. For the age groups 6 - 17 years old and 18 - 59 years old, the spatial distribution is random, except for the patients between 6 and 17 at distances between ~5500 and ~8000 m, when the distribution shows statistically significant clustering. For the oldest patients (>59 years old) there is statistically significant clustering at lower distances (~<1000 m) and between ~7500 and ~10,500 m. Above ~10,500 m, the distribution exhibits statistically significant dispersion for the oldest patients. Finally, considering all age groups, there is significant clustering up to ~2500 m (

Our results showed that the highest patient rates (patients per population) occur among children and seniors. We expected this result, since previous studies have already shown that people up to 5 years old and above 59 years old are more vulnerable to cardiorespiratory diseases [

Parameter | Up to 5 years | Between 6 and 17 years | Between 18 and 59 years | Above 59 years | All ages |
---|---|---|---|---|---|
Minimum | 0 | 0 | 0 | 0 | 0.0001 |
Maximum | 52 | 2 | 14 | 75 | 19 |
Total | 132 | 4 | 22 | 137 | 35 |
Average | 0.8 | 0.02 | 0.1 | 0.8 | 0.2 |
Standard deviation | 4.2 | 0.2 | 1.1 | 5.8 | 1.5 |

Age | Moran’s Index (I) | z-score | p-value | Distribution |
---|---|---|---|---|
Up to 5 years old | 0.024 | 2.055 | 0.040 | Clustered |
Between 6 and 17 years old | 0.015 | 1.470 | 0.141 | Random |
Between 18 and 59 years old | 0.007 | 1.348 | 0.178 | Random |
Above 59 years of age | 0.011 | 1.561 | 0.118 | Random |
All ages | 0.012 | 1.678 | 0.093 | Clustered |

Age | General G | z-score | p-value | Distribution |
---|---|---|---|---|
Up to 5 years old | 0.000114 | 0.537 | 0.591 | Random |
Between 6 and 17 years old | 0.000090 | 0.044 | 0.965 | Random |
Between 18 and 59 years old | 0.000070 | −0.197 | 0.844 | Random |
Above 59 years of age | 0.000098 | 0.161 | 0.872 | Random |
All ages | 0.000095 | 0.130 | 0.897 | Random |

The youngest group and the all-ages group were the only ones that presented positive spatial autocorrelation (cluster). We suggest that a similar factor is responsible for the occurrence of cardiorespiratory illness in the FD for these two groups (<5 years old; all ages). Assuming that this factor is related to the urban environment, the spatial distribution of air pollution is probably affecting the occurrence of cardiorespiratory illness. Several studies have shown the relation between air pollution and cardiorespiratory diseases [

For the all-ages analysis specifically, the youngest group (up to 5 years old) is probably influencing the results, because it is predominant in our analysis, representing 47.8% of our sample.

Although the groups up to 5 years old and all ages present positive spatial autocorrelation, there was no clustering specifically of high or low values. This result may arise because the highest and lowest values are geographically isolated outliers in the FD.

The semivariogram analysis showed that most points have low semivariogram values, which suggests that the number of patients per residential sector is homogeneous. For the groups up to 5 years old and above 59 years old we found a small number of points with high semivariogram values, especially between 1500 and 5000 m. This result shows that for children and seniors the spatial distribution of patients is less homogeneous.

In the K-function analysis, the group up to 5 years old and the group above 59 years old showed significant clustering at short distances (<2500 m). Above 2500 m the spatial distribution is random or dispersed. This result can be compared with the semivariogram, in which the highest concentration of high semivariogram values for children and seniors lies between 1500 and 5000 m. The heterogeneity (high semivariogram values) over this range of distances may be driving the change in spatial distribution (from clustered to random or dispersed).

There are some limitations in this study. The first concerns the method used to estimate the population in each address group where there are patients. The polygons that represent the health data and those that represent the population data differ in spatial scale; in other words, their shapes and sizes are not the same. We suggest that future studies apply the dasymetric method [

The second limitation concerns the shape of the input data used to calculate the spatial patterns. We used polygons, which represent the aggregated health data, and took the centroid of each polygon to evaluate the spatial patterns. This is the GIS technique used when the input data are not points. This transformation can mask some patterns, especially because the polygons are not uniform. However, we highlight that this limitation does not invalidate the results, since it stems from the availability of data. Other studies have reported results with the same limitation (centroid of each polygon), such as Cook et al. (2013) [

Finally, the health data are not representative of all hospital admissions in the FD. The health data provided to us are a subsample of the National Health Database, which includes information about residential addresses. According to the census for the FD, the National Health Database included 399,564 hospital admissions for cardiorespiratory diseases during the period 2008-2013, while our subsample contains 7066 hospital admissions.

This study was the first in Brazil to evaluate the spatial patterns of cardiorespiratory diseases for all age groups (children, teens, adults and seniors). It was also the first environmental health study of Brazilian cities to test this set of geostatistical approaches: spatial autocorrelation, degree of clustering, semivariogram and K-function. Most studies in Brazil have used only spatial autocorrelation analysis [

Our study showed that in the FD the spatial occurrence of cardiorespiratory diseases differs among the age groups. Understanding the spatial distribution of diseases, the degree of clustering and the dependence behavior in terms of distance is critical to the development of public policies in health, environment, and urban planning.

The authors thank the CAPES Foundation, an agency under the Ministry of Education of Brazil, which provided a scholarship to the first author. The authors also thank ESRI for providing the package of tools that make up the ArcGIS 10 family through contract number 2011 MLK 8733, and Imagem for its support in establishing the terms of use between the Geoscience Institute, University of Brasília, and ESRI.

Weeberb Requia, Henrique Roig (2015) Analyzing Spatial Patterns of Cardiorespiratory Diseases in the Federal District, Brazil. Health, 07, 1283-1293. doi: 10.4236/health.2015.710143