This paper proposes a simple method for optimizing an Air Quality Monitoring Network (AQMN) using a Geographical Information System (GIS), interpolation techniques and historical data. Existing air quality stations are systematically eliminated and the missing data are filled in using the most appropriate interpolation technique. The interpolated data are then compared with the observed data. Pre-defined performance measures, namely root mean square error (RMSE), mean absolute percentage error (MAPE) and correlation coefficient (r), are used to check the accuracy of the interpolated data. An algorithm was developed in a GIS environment and the process was simulated for several sets of measurements conducted at different locations in Riyadh, Saudi Arabia. The methodology helps decision makers find the optimal number of stations needed without compromising the coverage of concentrations across the study area.
It is well known that air pollution harms human health as well as the environment. Due to rapid urbanization and industrialization, air pollution is of particular significance in large cities. Continuous monitoring of air pollution with a well-designed air quality monitoring network (AQMN) is the first step in addressing this issue. Obtaining continuously monitored data to ensure safe levels of air quality is one of the primary objectives of an AQMN, in addition to evaluating exposure hazards and implementing effective control strategies. Environmental protection agencies therefore look for an optimal AQMN design that meets these objectives while minimizing cost.
The methodology to design a new AQMN or evaluate an existing one has attracted the attention of several researchers. Maximum sensitivity of the collected data [
A linear programming approach was also used by many researchers to site an optimum AQMN. A multi-attribute utility function method was used for siting the air quality network by Kainuma et al. [
Geographical Information Systems (GIS) and their components, such as interpolation methods, are now applied in almost every field. GIS and spatial interpolation techniques have also been used in AQMN design. Bayraktar et al. [
The methods summarized above are very useful, well established and have been implemented widely; however, a simple GIS-based methodology could further reduce the complexity of AQMN design. The basic advantage of GIS is that it organizes geographic data in a way that eases the decision making process. In addition, it provides several advanced functions to manage statistical and spatial data, interpolate the data to create a smooth surface, extract data from the interpolated surface, and create algorithms to automate the process. Furthermore, its results can be visualized in interactive maps, which further simplifies decision making. Building on these advantages, this paper proposes a simple and innovative process to optimize an AQMN using GIS, interpolation methods and historical data. The existing stations are systematically eliminated by creating several interpolated maps and comparing them with the observed values. The number of stations that can be eliminated is governed by pre-defined performance criteria. In recent times an increasing trend of air pollution has been observed in the city of Riyadh, Saudi Arabia, and there is an emphasis on frequent air pollution measurements [
Interpolation predicts values for cells in a raster using a limited number of sample data points, which helps in predicting unknown values for any geographic station. Five interpolation methods were selected to estimate the concentrations of air pollution at the unknown stations. The selected methods were 1) Inverse Distance Weighted (IDW); 2) Spline (SPL); 3) Ordinary Kriging (OK); 4) Universal Kriging (UK) and 5) Natural Neighbor (NN). These methods have been widely used in estimating the air pollution concentrations.
The IDW uses a method of interpolation that estimates cell values by averaging the values of sample data points in the neighborhood of each processing cell [
The SPL uses an interpolation method that estimates values using a mathematical function that minimizes overall surface curvature, resulting in a smooth surface that passes exactly through the input points. This method was found superior in varying health risk from air pollution [
Kriging is an advanced geostatistical procedure that generates an estimated surface from a scattered set of points with z-values. Like the other methods, Kriging forms a weighted combination of monitor values, but it also uses the spatial autocorrelation among the data to determine the weights. Kriging generally takes two forms, ordinary and universal Kriging [
Natural Neighbor interpolation finds the closest subset of input samples to a query point and applies weights to them based on proportionate areas to interpolate a value. It is also known as Sibson or “area-stealing” interpolation.
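To make the distance-weighting idea behind IDW concrete, the following is a minimal sketch in plain Python. It is not the ArcGIS implementation used in the study: the function name `idw`, the power parameter of 2 and the sample coordinates are all assumptions chosen for illustration.

```python
import math

def idw(samples, query, power=2.0):
    """Inverse distance weighted estimate at `query`.

    samples: list of (x, y, value) tuples; query: (x, y).
    The exponent `power` is an assumed default, not a value from the study.
    """
    num = 0.0
    den = 0.0
    for x, y, value in samples:
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return value  # query coincides with a sample point
        w = 1.0 / d ** power  # closer samples receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical stations at the corners of a 12 km cell: (x km, y km, concentration)
samples = [(0.0, 0.0, 10.0), (12.0, 0.0, 20.0), (0.0, 12.0, 30.0), (12.0, 12.0, 40.0)]
centre = idw(samples, (6.0, 6.0))   # equidistant from all four samples
near_first = idw(samples, (1.0, 1.0))  # dominated by the first sample
```

At the centre of the cell all four weights are equal, so the estimate reduces to the plain average of the samples; close to a station the estimate approaches that station's value.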
A statistical error is the amount by which an observation differs from its expected value. The statistical indices selected to measure performance are Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), Nash-Sutcliffe equation (NSE) [
where Intr_i = interpolated value; Obs_i = observed value; n = number of observations.
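For reference, the usual textbook forms of three of these measures can be written in the notation defined above. They are assumed, rather than confirmed, to match the expressions used in this study; the overline denotes the mean of the observed values.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{Intr}_i - \mathrm{Obs}_i\right)^2}

\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{\mathrm{Intr}_i - \mathrm{Obs}_i}{\mathrm{Obs}_i}\right|

\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(\mathrm{Obs}_i - \mathrm{Intr}_i\right)^2}{\sum_{i=1}^{n}\left(\mathrm{Obs}_i - \overline{\mathrm{Obs}}\right)^2}
```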
RMSE is a frequently used measure of the differences between values predicted by a model or an estimator and the values actually observed. It represents the sample standard deviation of the differences between predicted and observed values. RMSE gives important information on how well the magnitude of a pollutant concentration is predicted; a value close to zero represents good predictions. The mean absolute percentage error (MAPE) is calculated by dividing the sum of the absolute percentage errors by the number of observations, and a value equal or close to zero is considered ideal. The coefficient of correlation is a measure of linear dependence between two variables, and it was chosen to indicate the correspondence of timing and evolution between observed and interpolated concentration values. The coefficient of efficiency (NSE) indicates the normalized fit of the model, and its value ranges from −∞ to 1. It compares the mean square error generated by a particular model simulation to the variance of the output sequence; a value of 1 indicates a perfect fit [
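The measures described above can be computed with a few lines of Python. This is a sketch using the standard definitions of RMSE, MAPE, Pearson's r and NSE; the function names and the sample values are illustrative assumptions and do not come from the study's data.

```python
import math

def rmse(obs, intr):
    """Root mean square error between observed and interpolated values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, intr)) / len(obs))

def mape(obs, intr):
    """Mean absolute percentage error (observed values must be non-zero)."""
    return 100.0 * sum(abs((p - o) / o) for o, p in zip(obs, intr)) / len(obs)

def pearson_r(obs, intr):
    """Pearson correlation coefficient between the two series."""
    mo = sum(obs) / len(obs)
    mp = sum(intr) / len(intr)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, intr))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in intr)
    return cov / math.sqrt(var_o * var_p)

def nse(obs, intr):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, values can be negative."""
    mo = sum(obs) / len(obs)
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, intr))
    return 1.0 - sq_err / sum((o - mo) ** 2 for o in obs)

# Hypothetical observed and interpolated concentrations at one station
observed = [10.0, 20.0, 30.0, 40.0]
interpolated = [12.0, 18.0, 33.0, 37.0]

print(rmse(observed, interpolated), mape(observed, interpolated))
print(pearson_r(observed, interpolated), nse(observed, interpolated))
```

Note that RMSE carries the units of the concentration, whereas MAPE, r and NSE are dimensionless, which is why a single threshold such as MAPE ≤ 25 can be compared across pollutants.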
Several studies have used error statistics in comparing observed and predicted meteorology and air quality data. RMSE was used by Shad et al. [
The main objective of this optimization process is to eliminate as many stations as possible and fill in the missing information with interpolated values. The steps involved in this process are illustrated in
Step 1: Selection of stations
As a first step, a single station (P) or a set of stations (P1, P2, ...) is selected for elimination from the vector dataset used for creating the raster. A particular station or set of stations can be chosen, or all possible station combinations can be tested in a loop.
Step 2: Storing the observed values
The observed concentrations at the selected station (P ()) or stations (P1 (), P2 () …) are stored as arrays. These values will later be compared with the interpolated values.
Step 3: Creating vector layer
A vector layer is a coordinate-based data model that represents geographic features, such as points. Each point is a station represented by its geographical coordinates. The layer has a column of “z” values used for interpolation. Several columns of “z” values, i.e. the pollutant concentrations, are added to the vector layer for simulation. In this step the vector layer is created without the selected stations.
Step 4: Creating raster from the vector
A raster is a spatial data model that defines space as an array of equally sized cells arranged in rows and columns, and composed of single or multiple bands. Each cell contains an attribute value and location coordinates. Rasters are created from the vector data by applying appropriate interpolation techniques. In this step five rasters are created using the IDW, Spline, Ordinary Kriging, Universal Kriging and Natural Neighbor methods.
Step 5: Extracting the interpolated value
This step calculates the predicted concentrations. The values at the eliminated stations are extracted from the created rasters. The values for each interpolation method (IDW, SPL, OK, UK, and NN) are stored in separate arrays (IDW1(), IDW2()…; SPL1(), SPL2()…; OK1(), OK2()…; UK1(), UK2()…; and NN1(), NN2()…). This process of interpolation and value extraction is repeated for all the observed datasets.
Step 6: Performance measure
The process of interpolation and value extraction generates arrays of interpolated values. In this step, the performance measures are applied to the observed and interpolated values. RMSE, Bias, and correlation coefficient are calculated for the selected stations and the five interpolation methods. The interpolation method with the best performance measures (lowest error) is then chosen and the others are discarded. The measures are compared with the pre-defined threshold limits and, if they are within the limits, the station combination is stored in the possible station combinations array C ().
Step 7: Repeat for another station combination
The process is repeated for another station combination until all the combinations are exhausted.
Step 8: Finding the best possible station combination
The best station combinations can then be chosen from the possible station combinations array C (). The decision maker selects from this list the combination of stations to be eliminated from the AQMN.
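Steps 1 to 8 can be sketched as a leave-k-out loop in plain Python. This is a simplified illustration under several assumptions: only an IDW interpolator is supplied (the study compares five methods run in ArcGIS), MAPE alone is used as the acceptance measure, and the names `find_removable`, `idw`, `mape` and the 3 × 3 test grid are hypothetical.

```python
import itertools
import math

def idw(points, query, power=2.0):
    """Inverse distance weighted estimate; points are (x, y, value) tuples."""
    num = den = 0.0
    for x, y, value in points:
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def mape(obs, est):
    """Mean absolute percentage error."""
    return 100.0 * sum(abs((e - o) / o) for o, e in zip(obs, est)) / len(obs)

def find_removable(coords, datasets, k, methods, mape_limit=25.0):
    """coords: {station: (x, y)}; datasets: list of {station: value}."""
    candidates = []  # plays the role of the array C()
    for removed in itertools.combinations(sorted(coords), k):   # Steps 1 and 7
        kept = [s for s in coords if s not in removed]
        best = None
        for name, interpolate in methods.items():               # Step 4
            obs, est = [], []
            for data in datasets:
                # Step 3: point layer without the removed stations
                pts = [(coords[s][0], coords[s][1], data[s]) for s in kept]
                for s in removed:
                    obs.append(data[s])                          # Step 2
                    est.append(interpolate(pts, coords[s]))      # Step 5
            score = mape(obs, est)                               # Step 6
            if best is None or score < best[1]:
                best = (name, score)
        if best[1] <= mape_limit:
            candidates.append((removed, best[0], best[1]))
    return sorted(candidates, key=lambda c: c[2])                # Step 8

# Hypothetical 3 x 3 station grid with a planar concentration field
coords = {1: (0, 0), 2: (1, 0), 3: (2, 0),
          4: (0, 1), 5: (1, 1), 6: (2, 1),
          7: (0, 2), 8: (1, 2), 9: (2, 2)}
datasets = [{s: base + x + 2 * y for s, (x, y) in coords.items()}
            for base in (10.0, 20.0)]

best_single = find_removable(coords, datasets, k=1, methods={"IDW": idw})
print(best_single[0])  # the centre station is reproduced best by its neighbours
```

In this toy grid the centre station is surrounded symmetrically, so its neighbours reproduce it almost exactly and it ranks first for removal; corner and edge stations are harder to reconstruct. Additional interpolators can be supplied through the `methods` dictionary.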
The proposed methodology is applied to the city of Riyadh, Saudi Arabia. The city is divided into sixteen cells that are identical in area and each cell is 12 km × 12 km. The measurements were carried out intermittently from September 2011 to September 2012. Most of the measurements have been conducted approximately in the center of each cell (
Because the measurements are staggered, 24 continuous datasets were prepared from the available measurements by averaging the hourly measurements over the entire study period for all 16 stations. These 24 datasets were used in the simulation to create rasters with the different interpolation techniques, and the interpolated values were compared with the observed values. ESRI’s ArcGIS (ESRI, Redlands) exposes several functions to run the interpolations and extract the necessary data. These functions were customized to run the simulation process through the Microsoft Visual Basic for Applications (VBA) environment in ArcGIS.
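The averaging described above can be sketched as follows, assuming that the 24 datasets correspond to the 24 hours of the day and that each raw record carries a station identifier, an hour of day and a concentration. The record layout and the function name `hourly_profiles` are assumptions for illustration, not the study's data format.

```python
from collections import defaultdict

def hourly_profiles(records):
    """Average staggered measurements into 24 hour-of-day datasets.

    records: iterable of (station_id, hour_of_day, concentration).
    Returns a list of 24 dicts mapping station_id -> mean concentration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, hour, value in records:
        sums[(station, hour)] += value
        counts[(station, hour)] += 1
    datasets = [dict() for _ in range(24)]
    for (station, hour), total in sums.items():
        datasets[hour][station] = total / counts[(station, hour)]
    return datasets

# Hypothetical records: station 1 was measured twice at hour 0 on different days
records = [(1, 0, 10.0), (1, 0, 14.0), (2, 0, 8.0), (1, 13, 30.0)]
profiles = hourly_profiles(records)
print(profiles[0])
```

Each of the 24 resulting dictionaries can then serve as one "z" column of the station layer.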
The pollutant concentration data were collected from 16 stations as shown in
No. of stations to be eliminated | Possible combinations | No. of simulations | Approximate time of simulation (hours) |
---|---|---|---|
1 | 16 | 1920 | 0.3 |
2 | 120 | 14,400 | 2.25 |
3 | 560 | 67,200 | 10.5 |
4 | 1820 | 218,400 | 34 |
5 | 4368 | 524,160 | 81 |
6 | 8008 | 960,960 | 150 |
7 | 11,440 | 1,372,800 | 214 |
8 | 12,870 | 1,544,400 | 241 |
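The combination and simulation counts in the table above can be reproduced with a short calculation. The number of combinations is the binomial coefficient C(16, k); the factor of 120 simulations per combination is inferred here as 24 datasets × 5 interpolation methods, which matches every row of the table but is an interpretation rather than a stated formula.

```python
from math import comb

N_STATIONS = 16   # stations in the network
N_DATASETS = 24   # hourly averaged datasets
N_METHODS = 5     # IDW, Spline, OK, UK, NN

def simulation_counts(max_removed=8):
    """Return (k, combinations, simulations) for k = 1..max_removed."""
    rows = []
    for k in range(1, max_removed + 1):
        combos = comb(N_STATIONS, k)
        rows.append((k, combos, combos * N_DATASETS * N_METHODS))
    return rows

rows = simulation_counts()
for k, combos, runs in rows:
    print(f"{k} removed: {combos:,} combinations, {runs:,} simulations")
```

The count peaks at k = 8 because C(16, k) is symmetric about half the number of stations, which is why the simulation time grows so quickly up to that point.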
SO2 and CO, and the results of the simulations are discussed as follows. The interpolation was performed using five methods i.e. IDW, Spline, OK, UK and NN. The IDW and UK outperformed other methods particularly in terms of producing lower RMSE, MAPE values and higher value of r2.
The primary sources of ground level O3 are automobiles, cement and power plants, construction activities and biogenic or natural sources. Small industries such as paint shops, dry cleaners and bakeries are also known to contribute to O3. These 10 stations fall within such areas. Stations 10 and 7 are located in a dense residential area, and stations 2 and 3 are in an industrial area. Agricultural areas are found near station 9, and construction activities are reported near station 14. Stations 1 and 16 are on the city outskirts, where small scale industries are located.
The simulation results for NOx concentrations are shown in
No. of station(s) to be eliminated | Optimal station(s) combination that can be eliminated | Interpol. method | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|---|---|
1 | 11 | IDW | 3.224 | 0.953 | 4.041 | 0.944 | 0.960 |
2 | 11, 15 | IDW | 3.811 | 0.925 | 5.153 | 0.919 | 0.973 |
3 | 4, 11, 15 | UK | 3.964 | 0.913 | 7.991 | 0.901 | 1.003 |
4 | 4, 11, 12, 15 | UK | 4.588 | 0.911 | 15.699 | 0.884 | 1.099 |
5 | 4, 5, 11, 15, 16 | IDW | 5.495 | 0.862 | 22.805 | 0.844 | 1.095 |
6 | 4, 8, 11, 12, 13, 15 | UK | 6.113 | 0.831 | 25.322 | 0.821 | 1.015 |
7 | 4, 5, 8, 11, 12, 15, 16 | IDW | 6.488 | 0.832 | 36.469 | 0.809 | 1.081 |
8 | 4, 5, 6, 8, 11, 12, 15, 16 | IDW | 7.018 | 0.817 | 49.610 | 0.775 | 1.138 |
No. of station(s) to be eliminated | Optimal station(s) combination that can be eliminated | Interpol. method | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|---|---|
1 | 11 | IDW | 5.999 | 0.865 | 3.792 | 0.848 | 1.020 |
2 | 3, 11 | IDW | 7.470 | 0.808 | 4.006 | 0.731 | 0.961 |
3 | 3, 5, 11 | UK | 9.084 | 0.619 | 6.591 | 0.599 | 0.984 |
4 | 3, 5, 7, 11 | UK | 10.766 | 0.439 | 12.184 | 0.430 | 1.025 |
5 | 3, 5, 7, 11, 14 | IDW | 11.521 | 0.461 | 15.621 | 0.459 | 1.022 |
6 | 3, 5, 7, 11, 14, 16 | UK | 12.119 | 0.438 | 19.839 | 0.438 | 1.019 |
7 | 3, 5, 6, 7, 11, 14, 16 | IDW | 14.834 | 0.395 | 23.148 | 0.341 | 0.966 |
8 | 1, 3, 5, 6, 7, 11, 14, 16 | IDW | 14.684 | 0.312 | 27.468 | 0.287 | 0.977 |
was found that station 11 was the best one, with RMSE = 5.999, MAPE = 3.792 and r2 = 0.864. For eight station eliminations, RMSE and MAPE increased to 14.684 and 27.468, while r2 decreased to 0.312 (Figures 3-5). For the MAPE ≤ 25 scenario, a maximum of 7 stations could be eliminated (3, 5, 6, 7, 11, 14, and 16); however, the RMSE and r2 values were not within the predefined limits. Taking MAPE as the priority, a minimum of 9 stations is needed to produce satisfactory NOx concentration maps as shown in
Station 1, which is located on the outskirts of the city, was the best single station to eliminate, with RMSE = 2.155, MAPE = 2.155 and r2 = 0.944. For eight stations, RMSE and MAPE increased to 9.962 and 69.931, while r2 fell to 0.301, as shown in
The results of simulations run on CO data are shown in
As observed from Tables 2-5, there is no single station which is common among the list of possible elimination stations.
No. of station(s) to be eliminated | Optimal station(s) combination that can be eliminated | Interpol. method | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|---|---|
1 | 1 | UK | 2.155 | 0.323 | 2.155 | −1.507 | 0.983 |
2 | 1, 4 | IDW | 3.991 | 0.446 | 5.925 | 0.421 | 1.029 |
3 | 1, 4, 13 | UK | 4.599 | 0.375 | 10.792 | 0.355 | 1.042 |
4 | 1, 4, 6, 13 | IDW | 4.737 | 0.268 | 15.562 | 0.167 | 1.063 |
5 | 1, 4, 6, 13, 15 | IDW | 5.124 | 0.582 | 25.187 | 0.428 | 1.140 |
6 | 1, 4, 6, 11, 13, 15 | IDW | 6.258 | 0.642 | 43.558 | 0.284 | 1.248 |
7 | 1, 4, 6, 7, 11, 13, 15 | UK | 7.579 | 0.469 | 60.337 | 0.038 | 1.278 |
8 | 1, 4, 6, 7, 9, 11, 13, 15 | UK | 9.962 | 0.301 | 69.931 | 0.100 | 1.279 |
No. of station(s) to be eliminated | Optimal station(s) combination that can be eliminated | Interpol. method | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|---|---|
1 | 10 | IDW | 0.144 | 0.877 | 6.608 | 0.841 | 0.914 |
2 | 10, 16 | UK | 0.174 | 0.838 | 6.842 | 0.775 | 0.901 |
3 | 2, 10, 16 | UK | 0.237 | 0.807 | 11.283 | 0.671 | 0.863 |
4 | 2, 10, 14, 16 | UK | 0.239 | 0.718 | 16.928 | 0.659 | 0.915 |
5 | 2, 9, 10, 14, 16 | IDW | 0.236 | 0.665 | 25.182 | 0.647 | 0.953 |
6 | 2, 4, 9, 10, 14, 16 | IDW | 0.287 | 0.581 | 34.782 | 0.476 | 0.870 |
7 | 2, 4, 7, 9, 10, 14, 16 | IDW | 0.288 | 0.509 | 44.646 | 0.467 | 0.951 |
8 | 2, 4, 7, 9, 10, 14, 15, 16 | UK | 0.329 | 0.444 | 51.856 | 0.374 | 0.926 |
Pollutant | Stations needed to achieve MAPE < 25 |
---|---|
O3 | 1, 2, 3, 5, 6, 7, 9, 10, 14, 16 |
NOx | 1, 2, 4, 8, 9, 10, 12, 13, 15 |
SO2 | 2, 3, 5, 7, 8, 9, 10, 11, 12, 14, 16 |
CO | 1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 15 |
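Each row of the table above is the complement of the corresponding elimination set within the 16 stations. A minimal sketch, assuming the stations are numbered 1 to 16 as in the tables (the function name `stations_needed` is illustrative):

```python
ALL_STATIONS = set(range(1, 17))  # stations numbered 1..16

def stations_needed(eliminated):
    """Return the stations that remain after removing `eliminated`."""
    return sorted(ALL_STATIONS - set(eliminated))

# NOx: the seven stations that can be eliminated under the MAPE criterion
needed_nox = stations_needed([3, 5, 6, 7, 11, 14, 16])
print(needed_nox)
```

Intersecting or uniting such sets across pollutants is a quick way to see that no station appears in every elimination list.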
Pollutant | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|
O3 | 6.114 | 0.831 | 25.322 | 0.821 | 1.016 |
NOx | 24.173 | 0.208 | 57.374 | 0.189 | 1.122 |
CO | 0.490 | 0.234 | 82.670 | 0.221 | 1.112 |
SO2 | 17.579 | 0.056 | 240.894 | −4.570 | 2.302 |
Pollutant | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|
NOx | 14.834 | 0.396 | 23.148 | 0.341 | 0.966 |
CO | 0.565 | 0.285 | 73.538 | 0.275 | 1.183 |
O3 | 10.736 | 0.815 | 85.331 | 0.432 | 1.438 |
SO2 | 37.826 | 0.109 | 198.364 | −0.194 | 0.900 |
Pollutant | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|
SO2 | 5.125 | 0.582 | 25.128 | 0.431 | 1.139 |
O3 | 11.834 | 0.550 | 28.723 | 0.496 | 0.929 |
NOx | 26.512 | 0.063 | 51.687 | 0.045 | 1.101 |
CO | 0.740 | 0.182 | 80.081 | 0.141 | 1.038 |
Pollutant | RMSE | r2 | MAPE | NSE | ACFT |
---|---|---|---|---|---|
CO | 0.236 | 0.665 | 25.181 | 0.647 | 0.953 |
NOx | 17.480 | 0.287 | 31.760 | 0.283 | 1.047 |
O3 | 15.769 | 0.343 | 60.332 | 0.310 | 1.004 |
SO2 | 20.571 | 0.149 | 327.356 | −1.631 | 3.067 |
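The four prioritization tables above can be compared programmatically. The sketch below assumes that the first row of each table is the priority pollutant and ranks the scenarios by the worst MAPE among the remaining pollutants; this ranking rule is an illustrative assumption, not a criterion stated in the study.

```python
# MAPE of the non-priority pollutants in each scenario, copied from the tables
mape_by_priority = {
    "O3":  {"NOx": 57.374, "CO": 82.670, "SO2": 240.894},
    "NOx": {"CO": 73.538, "O3": 85.331, "SO2": 198.364},
    "SO2": {"O3": 28.723, "NOx": 51.687, "CO": 80.081},
    "CO":  {"NOx": 31.760, "O3": 60.332, "SO2": 327.356},
}

def rank_scenarios(table):
    """Order priority pollutants by the worst MAPE of the other pollutants."""
    return sorted(table, key=lambda priority: max(table[priority].values()))

ranking = rank_scenarios(mape_by_priority)
print(ranking)
```

Under this rule the SO2-priority scenario ranks first, because it is the only one in which no other pollutant exceeds a MAPE of roughly 80.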
and CO is about 57 and 82, while SO2 has a very high MAPE value of over 240. Considering NOx as the priority, the MAPE values for CO and O3 were about 73 and 85. In this case also, SO2 has a very high MAPE (over 198), as shown in
A simple method of optimizing the AQMN is proposed using GIS, interpolation techniques and historical data. Existing air quality stations are systematically eliminated and the missing data are filled in using the most appropriate interpolation technique. The interpolated data are then compared with the observed data. The pre-defined performance measures RMSE, MAPE and r2 were used to check the accuracy of the interpolated data, and NSE and ACFT supported their validity. The process was simulated for several sets of observed data using an algorithm developed in a GIS environment. No single combination of stations could be eliminated while keeping MAPE at 25 or less for all the pollutants. The pollutants can instead be prioritized to find the best scenario. The prioritization showed that the best scenario was obtained by giving priority to SO2, which resulted in MAPE values of about 28, 51 and 80 for O3, NOx and CO, respectively.
This methodology helps decision makers find the optimal number of stations needed without compromising the coverage of concentrations across the study area. Although it is a simple procedure, it has a few limitations. First, a continuous set of data is required to get reliable simulation results; because no such continuous dataset was available, the staggered measurements were averaged into hourly data for a single day and simulated in the present case study. Secondly, the process is computationally intensive and hence requires large computing resources, although these are not very expensive nowadays. Lastly, more parameters can be included in the performance measures to get the most appropriate results.
We gratefully acknowledge the financial support of King Abdulaziz City for Science and Technology (KACST) under grant number 32-594.
Mohammed M. Shareef, Tahir Husain, Badr Alharbi (2016) Optimization of Air Quality Monitoring Network Using GIS Based Interpolation Techniques. Journal of Environmental Protection, 07, 895-911. doi: 10.4236/jep.2016.76080