Exposure to particulate matter with an aerodynamic diameter of less than 2.5 μm (PM2.5) may increase the risk of lung cancer. Because satellites provide repeated, broad-area coverage, atmospheric remote sensing may offer a unique opportunity to monitor air quality and to help fill the air pollution data gaps that hinder efforts to study air pollution and protect public health. This geographical study explores whether there is an association between PM2.5 and lung cancer mortality rate in the conterminous USA. Lung cancer (ICD-10 codes C34-C34) death counts and populations at risk by county were extracted for the period 2001 to 2010 from the U.S. CDC WONDER online database. The 2001-2010 Global Annual Average PM2.5 Grids from MODIS and MISR Aerosol Optical Depth dataset was used to calculate a 10-year average PM2.5 concentration. Exploratory spatial data analysis, spatial regression (a spatial lag and a spatial error model), and a spatially extended Bayesian Markov chain Monte Carlo simulation found a significant positive association between lung cancer mortality rate and PM2.5. The association justifies further toxicological investigation of the biological mechanism of the adverse effect of PM2.5 pollution on lung cancer. The Global Annual Average PM2.5 Grids from MODIS and MISR Aerosol Optical Depth dataset provides a continuous surface of PM2.5 concentrations and is a useful data source for environmental health research.
Lung cancer is a leading cause of cancer mortality in the United States. Exposure to particulate matter with an aerodynamic diameter of less than 2.5 μm (PM2.5) may increase risk of lung cancer [
Epidemiological studies of air pollution (including PM2.5) often rely on ground monitoring networks to provide exposure metrics, but ground monitoring data often lack spatially complete coverage. Public health concerns motivate efforts to broaden spatial and temporal coverage. Because satellites provide repeated, broad-area coverage, atmospheric remote sensing may offer a unique opportunity to monitor air quality at continental, national and regional scales. To provide a continuous surface of PM2.5 concentrations for health and environmental research, researchers at Battelle Memorial Institute, in collaboration with the Center for International Earth Science Information Network at Columbia University, developed the Global Annual Average PM2.5 Grids from MODIS and MISR Aerosol Optical Depth (AOD), covering the years 2001 to 2010 [
This study examined whether there is an association between PM2.5 and lung cancer mortality rate in the conterminous USA using the Global Annual Average PM2.5 Grids. The study used a suite of geographical approaches, including remote sensing, GIS, cartography (map visualization and comparison), exploratory spatial data analysis (ESDA) and spatially extended statistical models.
The 2001-2010 Global Annual Average PM2.5 Grids from Moderate Resolution Imaging Spectroradiometer (MODIS) and Multi-angle Imaging Spectroradiometer (MISR) Aerosol Optical Depth (AOD) dataset [
In the Global Annual Average PM2.5 Grids data archive, each annual average data file contains integer values for a global 0.5° × 0.5° grid of estimated PM2.5 concentrations (in µg/m3) covering the world from 70°N to 60°S. The MODIS and MISR AOD retrievals were converted to ground-level concentrations using a conversion factor developed by van Donkelaar et al. (2010). Level 3 global monthly-mean MODIS and MISR AOD data for 2001-2010 were acquired from NASA LAADS and the NASA Langley ASDC, respectively. The MODIS level 3 (L3) monthly data were disaggregated from 1° to 0.5° resolution to match the resolution of the MISR AOD data. AOD retrievals from either instrument that were expected to have a bias greater than ±(0.1 or 20%) relative to ground-based Aerosol Robotic Network (AERONET; Holben et al. 1998) AOD, owing to high surface albedo or other persistent factors, were removed from the analysis. The filtered MODIS and MISR data were then combined by taking the mean of each grid cell for each month of the year. Ground-level concentrations of dry 24-hour PM2.5 were estimated from the satellite observations of total-column AOD by applying a conversion factor that accounts for the spatial and temporal relationship between the two. This conversion factor is a function of aerosol size, aerosol type, diurnal variation, relative humidity and the vertical structure of aerosol extinction; these quantities were derived from a global 3-D chemical transport model (GEOS-Chem: http://acmg.seas.harvard.edu/geos/), and the factor assumes a relative humidity of 50% (van Donkelaar et al. 2010). The satellite AOD data were multiplied by monthly-mean conversion factors (calculated as a climatological mean over 2001-2006) for each grid cell. Finally, an annual-average surface PM2.5 concentration was estimated by taking the mean of the monthly estimates for each year.
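The monthly processing chain described above (disaggregate MODIS, drop flagged retrievals, average the two instruments, apply the conversion factor, average the months) can be sketched with NumPy as follows. This is a hypothetical illustration, not the dataset producers' code: the array layout, the boolean bias masks (assumed to be precomputed from the AERONET comparison) and the nearest-neighbour replication used for the 1° to 0.5° disaggregation are all assumptions.

```python
import numpy as np


def disaggregate(grid, factor):
    """Copy each cell of the last two axes into a factor x factor block
    (nearest-neighbour disaggregation; assumed, not documented in the source)."""
    return np.repeat(np.repeat(grid, factor, axis=-2), factor, axis=-1)


def annual_pm25(modis_1deg, misr_05deg, modis_bad, misr_bad, eta):
    """Hypothetical sketch of the monthly AOD-to-PM2.5 workflow.

    modis_1deg : (12, ny, nx) monthly MODIS AOD on a 1-degree grid
    misr_05deg : (12, 2*ny, 2*nx) monthly MISR AOD on a 0.5-degree grid
    modis_bad, misr_bad : boolean masks on the 0.5-degree grid, True where a
        retrieval is expected to exceed the AERONET bias threshold
    eta : (12, 2*ny, 2*nx) monthly climatological AOD-to-PM2.5 factors
    Returns the annual-mean PM2.5 grid.
    """
    modis = disaggregate(np.asarray(modis_1deg, dtype=float), 2)
    misr = np.array(misr_05deg, dtype=float)  # copy so the input is untouched
    modis[modis_bad] = np.nan  # remove retrievals flagged as biased
    misr[misr_bad] = np.nan
    # Mean of the two instruments per cell and month, skipping removed values.
    combined = np.nanmean(np.stack([modis, misr]), axis=0)
    monthly_pm25 = combined * eta  # apply the monthly conversion factor
    return np.nanmean(monthly_pm25, axis=0)  # annual mean of monthly values
```

With a uniform MODIS AOD of 0.2, a MISR AOD of 0.4 and a factor of 50, every cell comes out at 15; masking MISR in one cell leaves only MODIS there, giving 10.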
Lung cancer (ICD-10 codes C34-C34: malignant neoplasms of trachea, bronchus and lung) death count and population at risk by county for the conterminous USA were extracted for the period from 2001 to 2010 from the National Center for Health Statistics Compressed Mortality File 1999-2010 in the CDC WONDER online database [
To link lung cancer mortality rate with PM2.5, the 10-year (2001-2010) mean PM2.5 raster grid was first resampled so that each 0.5° × 0.5° grid cell was subdivided into 20 × 20 smaller cells (0.025° × 0.025°) that retain the original PM2.5 value. The purpose of the resampling was to split grid cells that straddle county boundaries into smaller cells that can be allocated to the neighboring counties, yielding a more accurate county-average PM2.5.
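The subdivision step amounts to a nearest-neighbour upsampling in which every sub-cell inherits its parent's value. A minimal NumPy sketch, assuming the grid is held as a 2-D array; the helper for sub-cell centroid coordinates (north-up grid, rows running southward) is a hypothetical addition used to decide which county each sub-cell belongs to:

```python
import numpy as np


def subdivide(grid, factor=20):
    """Split every cell of a 2-D grid into factor x factor sub-cells that keep
    the parent cell's value (no interpolation)."""
    grid = np.asarray(grid)
    return np.kron(grid, np.ones((factor, factor), dtype=grid.dtype))


def subcell_centroids(lon0, lat0, cell_size, nrows, ncols):
    """Centroid longitudes/latitudes of a north-up grid whose upper-left
    corner is (lon0, lat0); rows run southward, columns eastward."""
    lons = lon0 + (np.arange(ncols) + 0.5) * cell_size
    lats = lat0 - (np.arange(nrows) + 0.5) * cell_size
    return np.meshgrid(lons, lats)  # each of shape (nrows, ncols)
```

Because each sub-cell copies its parent's value, the mean over the whole grid is unchanged; only the spatial granularity used in the county overlay changes (0.5° / 20 = 0.025°).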
The resampled PM2.5 grid was then overlaid with the map of lung cancer mortality rates, and a GIS zonal statistics function was used to calculate the average PM2.5 value for each county: the mean of the PM2.5 values of all resampled cells whose centroids fall within the county.
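Once each resampled cell has been assigned to the county containing its centroid, the zonal mean reduces to a grouped average. The sketch below assumes a hypothetical integer "county ID" raster aligned with the resampled grid (building it, e.g. by rasterizing county polygons at the sub-cell centroids, is outside the sketch); it is not the GIS function used in the study.

```python
import numpy as np


def county_means(pm25_fine, county_id, n_counties):
    """Average fine-grid PM2.5 over the sub-cells assigned to each county.

    county_id : integer raster, same shape as pm25_fine, giving the county
        whose polygon contains each sub-cell centroid (-1 = outside all
        counties). This raster is an assumed input.
    """
    ids = np.asarray(county_id).ravel()
    vals = np.asarray(pm25_fine, dtype=float).ravel()
    inside = ids >= 0
    sums = np.bincount(ids[inside], weights=vals[inside], minlength=n_counties)
    counts = np.bincount(ids[inside], minlength=n_counties)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # NaN for a county that received no centroid
```

For example, a county covered by three sub-cells valued 10 and a neighbour covered by two sub-cells valued 20 get means of 10 and 20; a county with no sub-cell centroid is returned as NaN rather than silently as zero.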
Exploratory spatial data analysis (ESDA) [
Univariate LISA produced a cluster map for each of the two variables. Bivariate LISA results were presented as a Moran scatter plot and a cluster map. The bivariate Moran’s I value measures the strength and direction of the overall spatial association between mortality rate and PM2.5, while the local statistics describe this relationship in each county. On the bivariate Moran scatter plot, the mortality rate averaged across all neighboring counties (its spatial lag) is plotted against PM2.5 in each county, and the slope of the regression line is the bivariate Moran’s I. If the slope is significantly different from zero, there is an association between mortality rate and PM2.5. Significance was tested by comparison with a reference distribution obtained by random permutations [
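A minimal NumPy sketch of the global bivariate Moran's I and its permutation test, assuming z-standardized variables and a row-standardized weights matrix. The regular-lattice rook contiguity below is a toy stand-in for county contiguity, and the one-sided pseudo p-value (M + 1)/(R + 1) is one common convention; the study's exact weights and permutation settings are not given in this section.

```python
import numpy as np


def rook_contiguity(nrows, ncols):
    """Binary rook-contiguity matrix for a regular lattice (a toy stand-in
    for county contiguity)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[r * ncols + c, rr * ncols + cc] = 1.0
    return W


def bivariate_moran(x, y, W, permutations=999, seed=0):
    """Global bivariate Moran's I between x and the spatial lag of y.

    With z-scores zx, zy and row-standardized W, I = zx . (W zy) / n, which
    equals the OLS slope of W zy on zx in the bivariate Moran scatter plot.
    The pseudo p-value is one-sided (positive association), from random
    permutations of y across locations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    Wr = W / np.where(row_sums == 0, 1.0, row_sums)  # islands keep a zero row
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    n = len(x)
    observed = zx @ (Wr @ zy) / n
    rng = np.random.default_rng(seed)
    sims = np.array([zx @ (Wr @ rng.permutation(zy)) / n
                     for _ in range(permutations)])
    p_value = (np.sum(sims >= observed) + 1) / (permutations + 1)
    return observed, p_value
```

Here x plays the role of PM2.5 and y of mortality rate; permuting y breaks any spatial link between a county's PM2.5 and its neighbours' mortality, which generates the reference distribution.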
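Two regression models were fitted to examine the relationship between lung cancer mortality rate and PM2.5: a spatial lag model and a spatial error model. Both models address the spatial autocorrelation that may exist in the data. Spatial autocorrelation is the tendency for data values at locations closer to each other in space to be more similar. If spatial autocorrelation exists, the assumption of independent observations and errors in classical statistical models may be violated. Spatial regression methods capture spatial dependence in the regression, avoiding statistical problems such as unstable parameter estimates and unreliable significance tests, and also provide information on the spatial relationships among the variables involved. The spatial lag model (also called the Spatial Auto-Regressive model, or SAR) takes the form:

$$y = \rho W y + X\beta + \varepsilon,$$

and the spatial error model takes the form:

$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon,$$

where $y$ is the vector of county lung cancer mortality rates, $X$ is the matrix of explanatory variables (a constant and PM2.5), $\beta$ is the vector of regression coefficients, $W$ is the spatial weights matrix, $\rho$ is the spatial autoregressive (lag) coefficient, $\lambda$ is the spatial error coefficient, and $\varepsilon$ is a vector of independent, identically distributed error terms.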
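To illustrate how the spatial lag coefficient ρ can be estimated by maximum likelihood, the following NumPy sketch maximizes the concentrated log-likelihood over a grid of ρ values. The lattice weights, the grid search and the function names are illustrative assumptions; the source does not say which software or optimizer produced the reported estimates.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for a regular lattice
    (a toy stand-in for county contiguity)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[r * ncols + c, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def fit_spatial_lag(y, X, W, rhos=np.linspace(-0.95, 0.95, 381)):
    """ML fit of y = rho*W*y + X*beta + eps via the concentrated log-likelihood.

    For each candidate rho, beta and sigma^2 have closed-form OLS solutions on
    the filtered response y - rho*W*y; log|I - rho*W| is computed from the
    eigenvalues of W (Ord's approach). A grid search picks the best rho.
    """
    n = len(y)
    Wy = W @ y
    eig = np.linalg.eigvals(W).real  # real for row-standardized symmetric contiguity
    best = None
    for rho in rhos:
        ay = y - rho * Wy
        beta, *_ = np.linalg.lstsq(X, ay, rcond=None)
        resid = ay - X @ beta
        sigma2 = resid @ resid / n
        loglik = (np.sum(np.log(1.0 - rho * eig))
                  - 0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0))
        if best is None or loglik > best["loglik"]:
            best = {"rho": rho, "beta": beta, "sigma2": sigma2, "loglik": loglik}
    return best
```

Usage: simulate `y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ beta + eps)` on a 20 × 20 lattice and `fit_spatial_lag(y, X, W)` recovers ρ near 0.6. The log-determinant term is what distinguishes this from simply regressing y on Wy by OLS, which would give biased estimates of ρ.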
The association between lung cancer mortality rate and PM2.5 was also explored using a more sophisticated spatially extended Bayesian Markov chain Monte Carlo (MCMC) simulation model. Simulation-based algorithms for Bayesian inference make it possible to fit very complicated hierarchical models, including those with spatially correlated random effects. In this geographical study, spatial autocorrelation could exist within the values of both the lung cancer mortality rate and PM2.5. The following model, with a convolution prior for the random effects, was fitted:

$$O_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \log E_i + \beta_0 + \beta_1\,\mathrm{PM2.5}_i + b_i + h_i,$$
where $i$ indexes counties, $O_i$ is the observed lung cancer death count, and $E_i$ is the expected death count, reflecting age-standardized values. For the model specification, an improper (flat) prior was assumed for the intercept $\beta_0$ and a uniform prior distribution for the fixed-effect parameter $\beta_1$. A fixed effect applies equally to all counties. Two sets of county-specific random effects were included in the model. The first set, $b_i$, consists of spatially structured random effects assigned an intrinsic Gaussian conditional autoregressive (CAR) prior distribution (Suwa et al. 2002). The second set, $h_i$, is assigned an exchangeable (non-spatial) normal prior. The random effect for each county is thus the sum of a spatially structured component $b_i$ and an unstructured component $h_i$. This is termed a convolution prior (Suwa et al. 2002; Best 1999). The model is more flexible than one with only CAR random effects, since it lets the data decide how much of the residual disease risk is due to spatially structured variation and how much is unstructured over-dispersion. To complete the model specification, conjugate inverse-gamma prior distributions were assigned to the variance parameters of the exchangeable and CAR priors.
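The expected counts $E_i$ are not derived in this section. One common way to obtain them, shown here as an assumption rather than the authors' documented procedure, is internal indirect age standardization: apply the pooled age-specific death rates of all counties to each county's age structure. The sketch also computes the standardized mortality ratio $O_i / E_i$.

```python
import numpy as np


def expected_counts(deaths, population):
    """Expected deaths per county by internal indirect age standardization.

    deaths, population : arrays of shape (n_counties, n_age_groups)
    R_a is the pooled death rate in age group a over all counties, and
    E_i = sum_a population[i, a] * R_a.
    """
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    ref_rate = deaths.sum(axis=0) / population.sum(axis=0)  # R_a
    return population @ ref_rate


def smr(deaths, population):
    """Standardized mortality ratio O_i / E_i for each county."""
    observed = np.asarray(deaths, dtype=float).sum(axis=1)
    return observed / expected_counts(deaths, population)
```

With internal standardization the expected counts sum to the observed total, so a county with SMR above 1 has more deaths than its age structure alone would predict.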
The Bayesian model was fitted by MCMC simulation using the Gibbs sampling algorithm. Having specified the model as a full joint distribution over all quantities, whether parameters or observables, we wish to sample values of the unknown parameters from their conditional (posterior) distribution given the stochastic nodes that have been observed. MCMC methods perform Monte Carlo simulations that generate parameter values from Markov chains whose stationary distribution is identical to the joint posterior distribution of interest. After these Markov chains converge to their stationary distribution, the simulated parameter values form a correlated sample from the posterior distribution. The basic idea behind Gibbs sampling is to sample successively from the conditional distribution of each node given all the others. Under broad conditions this process eventually yields samples from the joint posterior distribution of the unknown quantities. Summaries of the post-convergence MCMC samples provide posterior inference for the model parameters, including the posterior density of the covariate effect.
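The Gibbs idea of cycling through full conditional distributions can be seen on a much simpler model than the spatial one above. The sketch below is a toy, not the WinBUGS model: it samples the mean and variance of normally distributed data under a normal prior on the mean and a conjugate inverse-gamma prior on the variance, alternating the two closed-form conditionals. The prior settings and the 7,500-iteration burn-in (borrowed from the convergence point reported for the study's chains) are assumptions for illustration.

```python
import numpy as np


def gibbs_normal(y, n_iter=20000, burn_in=7500, m0=0.0, s02=1e6,
                 a0=0.001, b0=0.001, seed=1):
    """Gibbs sampler for y_j ~ N(mu, sigma2), mu ~ N(m0, s02),
    sigma2 ~ InvGamma(a0, b0). Returns post-burn-in draws of (mu, sigma2)."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n, total = len(y), y.sum()
    mu, sigma2 = y.mean(), y.var()  # starting values
    draws = []
    for it in range(n_iter):
        # mu | sigma2, y is normal (precision-weighted prior and data).
        prec = 1.0 / s02 + n / sigma2
        mu = rng.normal((m0 / s02 + total / sigma2) / prec, np.sqrt(1.0 / prec))
        # sigma2 | mu, y is inverse-gamma: draw a gamma precision and invert.
        rate = b0 + 0.5 * np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(a0 + 0.5 * n, 1.0 / rate)
        if it >= burn_in:
            draws.append((mu, sigma2))
    return np.array(draws)
```

Each update conditions on the most recent value of the other parameter, so successive draws are correlated, which is why post-convergence summaries report a Monte Carlo error alongside the posterior standard deviation.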
The model was fitted using the WinBUGS software, an interactive Windows version of the BUGS (Bayesian inference Using Gibbs Sampling) program for Bayesian analysis of complex statistical models using MCMC techniques [
The univariate Moran’s I scatter plots for lung cancer mortality rate and PM2.5 are shown in
The bivariate LISA cluster map is shown in
average mortality rate in its neighbors) in the central parts of the east and south, and “low-low” clusters in the central and western regions. A few “low-high” clusters are in the central region, and “high-low” clusters in the central, western and northeastern regions. These spatial outliers, especially areas with low PM2.5 but high mortality rates, warrant further investigation of whether other factors dominate the effects on mortality. These four categories of clusters correspond to the four quadrants of the Moran scatter plot as shown in
on the mean with the axes drawn such that the four quadrants are clearly shown. The high-high and low-low locations (positive local spatial correlation) represent spatial clusters, while the high-low and low-high locations (negative local spatial correlation) represent spatial outliers. The positive bivariate Moran’s I value of 0.3851 (p = 0.05) indicates an overall positive spatial association and neighborhood effect between lung cancer mortality rate and PM2.5. Note that the so-called spatial clusters shown on the LISA cluster map refer only to the core of the cluster [
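The quadrant labels can be reproduced from the scatter plot coordinates: the first label refers to a county's own standardized PM2.5, the second to the neighbour average of standardized mortality. The sketch below is a simplification for illustration: a LISA cluster map shows only locations whose local statistic is significant under a conditional permutation test, and that significance filter is omitted here.

```python
import numpy as np


def moran_quadrants(x, y, W):
    """Moran scatter plot quadrant of each location for a bivariate LISA:
    the first label refers to x at the location, the second to the spatial
    lag (neighbour average) of y. Significance filtering is not applied."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    lag = (W / np.where(row_sums == 0, 1.0, row_sums)) @ zy
    high_x, high_lag = zx >= 0, lag >= 0
    return np.where(high_x & high_lag, "high-high",
                    np.where(~high_x & ~high_lag, "low-low",
                             np.where(high_x, "high-low", "low-high")))
```

On a four-county chain with both variables increasing from one end to the other, the two low ends come out "low-low" and the two high ends "high-high"; reversing x turns them into "high-low" and "low-high" spatial outliers.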
Model description

| Y | No. of variables | No. of observations | Degrees of freedom |
|---|---|---|---|
| LUNG | 3 | 2931 | 2929 |

Model fit

| R² | Log likelihood | AIC |
|---|---|---|
| 0.5650 | −10819.3 | 21644.6 |

Model estimation

| Variable | Coefficient | Std. Error | Z-Value | p |
|---|---|---|---|---|
| ρ | 0.7261 | 0.0156 | 46.423 | 0.0000 |
| CONSTANT | 11.1278 | 0.8629 | 12.895 | 0.0000 |
| PM25 | 0.5319 | 0.0691 | 7.701 | 0.0004 |

Diagnostic tests

| Test type | Test | DF | Value | p |
|---|---|---|---|---|
| Heteroskedasticity | Breusch-Pagan | 1 | 3.888 | 0.0486 |
| Spatial dependence | Likelihood Ratio | 1 | 1603.998 | 0.0000 |
after introducing the spatial lag term. In the likelihood ratio test of the spatial lag dependence, the result is significant. Introducing the spatial lag term did not completely remove the spatial effects.
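Two of the reported fit statistics can be checked by hand. AIC is −2 log L + 2k, where k counts the estimated coefficients (here ρ, the constant and PM2.5 for the lag model; the constant and PM2.5 for the error model, whose table lists 2 variables). A likelihood ratio or Breusch-Pagan statistic with 1 degree of freedom is referred to a χ²(1) distribution, whose upper tail equals erfc(√(x/2)). The sketch below uses only the Python standard library; the numbers are taken from the two regression tables.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2 * k


def chi2_sf_df1(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square(1), using
    X = Z**2 with Z standard normal: P = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


print(aic(-10819.3, 3))       # lag model, approximately 21644.6
print(aic(-10833.3, 2))       # error model, approximately 21670.6
print(chi2_sf_df1(3.888))     # Breusch-Pagan, lag model, approximately 0.0486
print(chi2_sf_df1(1603.998))  # underflows to 0.0, reported as 0.0000
```

Both AIC values match the tables, which confirms that k = 3 for the lag model and k = 2 for the error model.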
In the spatial error model (
The Markov chains began to converge after about 7,500 iterations (simulation runs with parameter value updates). After convergence, each iteration generates values that fluctuate within a consistent range, representing the posterior distribution of each model parameter. Inferences about the model parameters were made using the simulated values from iterations 7,500 to 20,000.
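The columns of the fixed-effects table (posterior mean, median, standard deviation, MC error and 95% credible set) are summaries of the post-convergence draws. The sketch below computes them for one parameter's chain. It estimates the Monte Carlo error by batch means (standard deviation of batch averages divided by the square root of the number of batches); the choice of 50 batches is an assumption, and the source does not state how its MC error was computed.

```python
import numpy as np


def summarize(chain, burn_in=7500, n_batches=50):
    """Posterior summaries for one scalar parameter from a single MCMC chain,
    discarding the first burn_in iterations."""
    post = np.asarray(chain[burn_in:], dtype=float)
    m = len(post) // n_batches  # iterations per batch
    batch_means = post[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    mc_error = batch_means.std(ddof=1) / np.sqrt(n_batches)
    lo, hi = np.percentile(post, [2.5, 97.5])  # equal-tailed 95% credible set
    return {"mean": post.mean(), "median": np.median(post),
            "sd": post.std(ddof=1), "mc_error": mc_error, "ci95": (lo, hi)}
```

The MC error measures how precisely the chain estimates the posterior mean, not the posterior uncertainty itself; it shrinks as more iterations are run, whereas the posterior standard deviation does not.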
Model description

| Y | No. of variables | No. of observations | Degrees of freedom |
|---|---|---|---|
| LUNG | 2 | 2931 | 2929 |

Model fit

| R² | Log likelihood | AIC |
|---|---|---|
| 0.5620 | −10833.3 | 21670.6 |

Model estimation

| Variable | Coefficient | Std. Error | Z-Value | p |
|---|---|---|---|---|
| CONSTANT | 47.0179 | 1.7097 | 27.500 | 0.0000 |
| PM25 | 1.1852 | 0.1885 | 6.289 | 0.0000 |
| λ | 0.7318 | 0.0155 | 47.136 | 0.0000 |

Diagnostic tests

| Test type | Test | DF | Value | p |
|---|---|---|---|---|
| Heteroskedasticity | Breusch-Pagan | 1 | 4.019 | 0.0450 |
| Spatial dependence | Likelihood Ratio | 1 | 1575.931 | 0.0000 |
| Fixed effects | Posterior mean | Posterior median | Standard deviation | MC error | 95% credible set |
|---|---|---|---|---|---|
| β0 | 0.037 | 0.029 | 0.058 | 0.002 | (−0.068, 0.184) |
| β1 | 1.308 | 1.316 | 0.245 | 0.011 | (0.892, 1.731) |

\* Posterior means, medians, and 95% credible sets are based on post-convergence iterations (iterations 7,500 to 20,000). Dependent variable: lung cancer mortality rate. Fixed effects: β0, intercept; β1, effect of PM2.5.
value. The PM2.5 parameter value density curve and
A significant positive association was found between lung cancer mortality rate and PM2.5: there is an excess risk of lung cancer mortality in areas with high PM2.5 levels. This study used a geographical (ecological) approach. Ecological studies are more useful for generating and testing hypotheses (Rytkönen 2003). The statistically significant association between lung cancer mortality and PM2.5 can be taken as indicative of a potential air pollution effect. The association justifies further toxicological research into the biological mechanism of the adverse effect of PM2.5 pollution. Although the mechanism underlying the correlation between PM2.5 exposure and lung cancer has not been fully elucidated, PM2.5-induced oxidative stress has been considered an important molecular mechanism of PM2.5-mediated toxicity [
Remote sensing could help fill the pervasive data gaps that hinder efforts to study air pollution and protect public health. The Global Annual Average PM2.5 Grids from MODIS and MISR Aerosol Optical Depth dataset provides a continuous surface of PM2.5 concentrations and is a useful data source for health and environmental research.
Hu, Z.Y. and Baker, E. (2017) Geographical Analysis of Lung Cancer Mortality Rate and PM2.5 Using Global Annual Average PM2.5 Grids from MODIS and MISR Aerosol Optical Depth. Journal of Geoscience and Environment Protection, 5, 183-197. https://doi.org/10.4236/gep.2017.56017