Although many studies have explored the quality of Texas groundwater, very few have investigated the concurrent distributions of more than one pollutant, which provide insight into the temporal and spatial behavior of constituents within and between aquifers. The purpose of this research is to study the multivariate spatial patterns of seven health-related Texas groundwater constituents: calcium (Ca), chloride (Cl), nitrate (NO_{3}), sodium (Na), magnesium (Mg), sulfate (SO_{4}), and potassium (K). Data were extracted from the Texas Water Development Board’s database for the nine years 2000 through 2008. A multivariate geostatistical model was developed to examine the interactions between the constituents. The model had seven dependent variables, one for each constituent, and five independent variables: altitude, latitude, longitude, major aquifer and water level. Exploratory analyses show that the data have no temporal patterns but do hold spatial patterns as well as intrinsic correlation. The intrinsic correlation allowed the use of a Kronecker form for the covariance matrix. The model was validated with a split sample. Estimates of iteratively re-weighted generalized least squares converged after four iterations. The Matern covariance function estimates are a zero nugget, a practical range of 44 miles and a variance of 0.8340, with kappa fixed at 2. To show that our assumptions are reasonable and the choice of model is appropriate, we perform residual validation and universal kriging. Moreover, prediction maps for the seven constituents are estimated from data at new locations. The results point to alarmingly increasing levels of these constituents’ concentrations, which calls for more intensive monitoring and groundwater management.

Groundwater is one of Texas’s natural resources and supplies the majority of the total water used in the state [

This study focuses on the simultaneous spatial and temporal distributions of the seven most investigated groundwater constituents (five major and two minor constituents) over a nine-year period. A deeper understanding of the factors affecting constituent levels may lead to more successful and specialized programs designed to protect groundwater from contamination. The results of this research project may be relevant to preventing and controlling groundwater contamination.

Data were obtained from the Texas Water Development Board Groundwater Database for all Texas wells from 2000 through 2008. The samples were checked for flagged values, that is, values marked as unreliable because of sampling problems, threshold conditions, or other criteria, to ensure acceptable results. The wells are sampled periodically every four years (

To investigate temporal effects, Fisher’s F-test was used to test the null hypothesis of unchanged variances between samples from the following pairs of years: 2000 versus 2004, 2001 versus 2005, 2002 versus 2006, 2003 versus 2007 and 2004 versus 2008. The F-test p-values were all greater than the significance level of 0.05, so the null hypothesis of equal variances could not be rejected; none of the constituents showed a detectable change in variance between 2000 and 2008. A repeated-measures t-test was then used to test the null hypothesis of unchanged means for the seven constituents (calcium, chloride, magnesium, potassium, nitrate, sulfate, sodium), comparing the same pairs of years. The t-test p-values were also all greater than 0.05, so the null hypothesis of equal means could not be rejected either. Moreover, mapping of annual concentrations showed that, for each of the seven constituents, the differences between the levels of two years were around zero. Because neither the variances nor the means changed across the years, it was concluded that the data set does not contain temporal effects, and all records between 2000 and 2008 were combined.
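The two test statistics described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the five paired values are hypothetical log-concentrations, the function names are made up for the example, and only the statistics and degrees of freedom are computed (p-values would come from the F and t distributions in a statistics package).

```python
import math
import statistics


def variance_ratio_f(sample_a, sample_b):
    """Return Fisher's F statistic (ratio of sample variances) and both degrees of freedom."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1


def paired_t(before, after):
    """Return the paired (repeated-measures) t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical log-concentrations for the same five wells sampled four years apart.
year_2000 = [4.10, 3.85, 4.60, 5.02, 3.40]
year_2004 = [4.05, 3.90, 4.55, 5.10, 3.42]

f_stat, df1, df2 = variance_ratio_f(year_2000, year_2004)
t_stat, df = paired_t(year_2000, year_2004)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df; paired t = {t_stat:.3f} on {df} df")
```

An F statistic close to 1 and a paired t statistic close to 0 are what one would expect when neither the variances nor the means change between the two sampling years.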

Exploratory analyses showed that mean well depths are similar across the State, and that the variances of the constituents are smaller within an aquifer than between aquifers. This means that the location of a constituent has an effect on its level, which calls for a spatial model. Descriptive statistics showed skewness and kurtosis values far from 0 and 3, respectively, which are the characteristic values of the normal distribution. Therefore, the variables generally exhibit asymmetric distributions with long tails and several outliers (
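As a rough illustration of the skewness and kurtosis check, the sketch below computes moment-based skewness and (non-excess) kurtosis for a small hypothetical sample and for its natural logarithm. The ten values are invented for the example, and the moment estimator used here is an assumption; the paper does not say which estimator produced its descriptive statistics.

```python
import math


def skewness_kurtosis(values):
    """Return moment-based skewness and non-excess kurtosis (about 0 and 3 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical right-skewed concentrations (mg/L) with one large outlier.
raw = [12.0, 18.5, 25.0, 31.0, 40.0, 55.0, 68.5, 90.0, 160.0, 1560.0]
logged = [math.log(v) for v in raw]

raw_skew, raw_kurt = skewness_kurtosis(raw)
log_skew, log_kurt = skewness_kurtosis(logged)
print(f"raw:    skewness = {raw_skew:.2f}, kurtosis = {raw_kurt:.2f}")
print(f"logged: skewness = {log_skew:.2f}, kurtosis = {log_kurt:.2f}")
```

On data of this shape the log transform pulls the long right tail in, which is the usual motivation for modeling log-concentrations.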

Variable | Mean | Minimum | Maximum | Median | Std. Dev. | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|
Calcium | 93.72 | 0.35 | 1560.00 | 68.50 | 104.38 | 4.19 | 25.80 |
Chloride | 159.32 | 1.85 | 17000.00 | 41.60 | 421.61 | 18.41 | 635.51 |
Potassium | 5.95 | 0.20 | 99.30 | 4.40 | 6.20 | 3.53 | 25.69 |
Magnesium | 36.59 | 0.22 | 559.00 | 24.20 | 44.60 | 3.93 | 24.23 |
Sodium | 122.51 | 2.39 | 13900.00 | 47.15 | 301.76 | 24.74 | 1052.38 |
Nitrate | 17.33 | 0.00 | 425.88 | 8.01 | 31.68 | 5.03 | 35.69 |
Sulfate | 192.40 | 0.00 | 5110.00 | 46.50 | 389.27 | 4.08 | 22.56 |

Variables | ln(Ca) | ln(Mg) | ln(Na) | ln(K) | ln(SO_{4}) | ln(NO_{3}) | ln(Cl) |
---|---|---|---|---|---|---|---|
ln(Ca) | 1 | 0.695 | 0.043 | 0.218 | 0.403 | 0.428 | 0.359 |
ln(Mg) | 0.695 | 1 | 0.166 | 0.525 | 0.557 | 0.489 | 0.315 |
ln(Na) | 0.043 | 0.166 | 1 | 0.628 | 0.696 | 0.026 | 0.832 |
ln(K) | 0.218 | 0.525 | 0.628 | 1 | 0.629 | 0.206 | 0.541 |
ln(SO_{4}) | 0.403 | 0.557 | 0.696 | 0.629 | 1 | 0.288 | 0.678 |
ln(NO_{3}) | 0.428 | 0.489 | 0.026 | 0.206 | 0.288 | 1 | 0.157 |
ln(Cl) | 0.359 | 0.315 | 0.832 | 0.541 | 0.678 | 0.157 | 1 |

Variables | Ca | Mg | Na | K | SO_{4} | NO_{3} | Cl |
---|---|---|---|---|---|---|---|
Ca | 1.00 | 0.67 | 0.64 | 0.44 | 0.76 | 0.52 | 0.81 |
Mg | | 1.00 | 0.72 | 0.77 | 0.87 | 0.46 | 0.82 |
Na | | | 1.00 | 0.59 | 0.86 | 0.41 | 0.88 |
K | | | | 1.00 | 0.64 | 0.43 | 0.61 |
SO_{4} | | | | | 1.00 | 0.45 | 0.81 |
NO_{3} | | | | | | 1.00 | 0.46 |
Cl | | | | | | | 1.00 |

A preliminary variogram analysis enables us to visualize the spatial correlation. At this point, we recognize that there may be mean effects that are unaccounted for. Nevertheless, these initial variogram estimates indicate consistent spatial behavior across aquifers. None of the constituents showed any significant anisotropy within an aquifer. Multivariate intrinsic correlation exists when the correlation between the variables of a multivariate data set is independent of the spatial correlation. It simplifies data modeling: under intrinsic correlation, the joint covariance matrix is given by the Kronecker product
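To make the Kronecker structure concrete, the sketch below builds a joint covariance matrix as the Kronecker product of a between-constituent matrix and a spatial correlation matrix. The symbols T, R and sigma are assumptions introduced for illustration, as are the three well-to-well distances. For brevity the spatial part uses an exponential correlation rather than the Matern form with kappa fixed at 2, and the 0.695 entry is simply the ln(Ca) and ln(Mg) correlation from the correlation table, used as a stand-in.

```python
import math


def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Stand-in between-constituent matrix for two constituents (assumption).
T = [[1.0, 0.695],
     [0.695, 1.0]]

# Hypothetical distances (miles) between three wells.
distances = [[0.0, 10.0, 25.0],
             [10.0, 0.0, 15.0],
             [25.0, 15.0, 0.0]]
PHI = 8.22  # range parameter in miles, taken from the covariance parameter table

# Exponential spatial correlation, used here only to keep the example short.
R = [[math.exp(-d / PHI) for d in row] for row in distances]

# Joint covariance: constituents index the outer blocks, wells the inner blocks.
sigma = kronecker(T, R)
print(len(sigma), "x", len(sigma[0]))
```

The practical benefit is that only the small matrices T and R need to be estimated and inverted, since the inverse of a Kronecker product is the Kronecker product of the inverses.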

Principal components analysis (PCA) was conducted to study and visualize the correlations between the variables, with the aim of limiting the number of variables to be measured afterwards, and to visualize observations in a two- or three-dimensional space in order to identify similarities and dissimilarities among observations. PCA was performed using the correlation matrix, which brings the measurements onto a common scale. In other words, a constituent whose concentration varies between 0 and 400 will not weigh more in the projection than a constituent varying between 0 and 1. The Scree Plot of
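A correlation-matrix PCA can be sketched with plain power iteration, which recovers the first principal component and its share of the total variance. The input below is the correlation matrix of the log-transformed constituents from the correlation table; whether the authors ran PCA on the log scale is not stated, so treat that choice, and the helper name `leading_component`, as assumptions.

```python
import math

LABELS = ["ln(Ca)", "ln(Mg)", "ln(Na)", "ln(K)", "ln(SO4)", "ln(NO3)", "ln(Cl)"]
# Correlation matrix copied from the table of log-transformed constituents.
R = [
    [1.000, 0.695, 0.043, 0.218, 0.403, 0.428, 0.359],
    [0.695, 1.000, 0.166, 0.525, 0.557, 0.489, 0.315],
    [0.043, 0.166, 1.000, 0.628, 0.696, 0.026, 0.832],
    [0.218, 0.525, 0.628, 1.000, 0.629, 0.206, 0.541],
    [0.403, 0.557, 0.696, 0.629, 1.000, 0.288, 0.678],
    [0.428, 0.489, 0.026, 0.206, 0.288, 1.000, 0.157],
    [0.359, 0.315, 0.832, 0.541, 0.678, 0.157, 1.000],
]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def leading_component(matrix, iterations=500):
    """Power iteration: return the largest eigenvalue and its unit-length eigenvector."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        prod = mat_vec(matrix, vec)
        norm = math.sqrt(sum(p * p for p in prod))
        vec = [p / norm for p in prod]
    eigenvalue = sum(p * v for p, v in zip(mat_vec(matrix, vec), vec))  # Rayleigh quotient
    return eigenvalue, vec


eigenvalue, loadings = leading_component(R)
# For a correlation matrix the total variance equals the number of variables.
share = eigenvalue / len(R)
print(f"first component explains {share:.1%} of the total variance")
for label, loading in zip(LABELS, loadings):
    print(f"{label:8s} {loading:.3f}")
```

Because every variable has unit variance in a correlation matrix, the loadings reflect the correlation pattern only and not the measurement scale of each constituent.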

For our estimation we used the K-Bessel (Matern) model for the semivariogram, since its smoothness can be adjusted. Although the computations are cumbersome, the advantage of this model is that the behavior of the semivariogram near the origin can be estimated from the data rather than assumed to be of a certain form. Also, by changing the value of the smoothness parameter α, we can obtain other semivariogram models. For example, when α = 0.5 we get the exponential covariance function [
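The following sketch evaluates a Matern correlation function and derives a practical range from it. Several things here are assumptions rather than details given in the text: the parameterization (u^kappa K_kappa(u) / (2^(kappa-1) Gamma(kappa)) with u = h / phi), the definition of the practical range as the distance at which the correlation drops to 0.05, and the numerical evaluation of the Bessel function by a trapezoid rule. The smoothness parameter is called `kappa` here, matching the covariance parameter table, while the paragraph above calls it α.

```python
import math


def bessel_k(nu, x, upper=12.0, steps=4000):
    """K_nu(x) from the integral of exp(-x cosh t) cosh(nu t) over t >= 0 (trapezoid rule)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def matern_correlation(h, phi, kappa):
    """Matern correlation at distance h for range phi and smoothness kappa."""
    if h == 0:
        return 1.0
    u = h / phi
    return (u ** kappa) * bessel_k(kappa, u) / (2 ** (kappa - 1) * math.gamma(kappa))


def practical_range(phi, kappa, threshold=0.05):
    """Distance at which the correlation falls to the threshold, found by bisection."""
    lo, hi = 0.1 * phi, 50.0 * phi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if matern_correlation(mid, phi, kappa) > threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# With smoothness 0.5 the Matern correlation reduces to the exponential model.
print(matern_correlation(3.0, 1.0, 0.5), math.exp(-3.0))

# Range (8.22 miles) and kappa (2) as reported in the covariance parameter table.
practical = practical_range(8.22, 2.0)
print(f"practical range = {practical:.2f} miles")
```

Under these assumptions the computed practical range should come out near the 44.12 miles reported in the covariance parameter table, which suggests the reported range and practical range are linked in this way.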

In order to determine the significant covariates to include in the model, we performed a model selection exercise. The research started with six covariates: altitude, latitude, longitude, major aquifer, water level and percentage of irrigated acres per county. The percentage of irrigated acres per county was excluded from the analysis since it showed no association with any of the constituents. Model selection was then performed to decide which of the remaining covariates to include in our model. The criterion for model selection was Akaike’s information criterion (AIC). One thousand records were selected by simple random sampling without replacement. Many regressions were fitted and their AIC values were compared: we started with no intercept, then fitted single-predictor models, then two-predictor models, and so on. The regression model with the lowest AIC value (3143.04) was found to be [Ca, Cl, NO_{3}, Na, Mg, SO_{4}, K] = longitude + altitude + water depth + aquifer effect (

Regression | AIC |
---|---|
Nitrate = 1 | 3643.62 |
Nitrate = Altitude | 3530.97 |
Nitrate = Latitude | 3573.52 |
Nitrate = Longitude | 3507.15 |
… | … |
Nitrate = Water Depth | 3510.61 |
Nitrate = Longitude + Altitude | 3509.11 |
Nitrate = Longitude + Latitude | 3502.37 |
Nitrate = Longitude + Water Depth | 3401.39 |
… | … |
Nitrate = Altitude + Latitude + Longitude + Water Depth | 3371.91 |
… | … |
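The AIC comparison in the table can be imitated on a small scale as sketched below. Everything in the example is hypothetical: the thirty synthetic wells, the coefficients used to generate the response, and the choice of Gaussian AIC computed as n ln(RSS / n) + 2k up to an additive constant. Each candidate model here includes an intercept, which may differ from how the authors set up their regressions.

```python
import math


def ols_rss(X, y):
    """Fit least squares via the normal equations and return the residual sum of squares."""
    p = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def aic(X, y):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2 * (number of coefficients)."""
    n = len(y)
    return n * math.log(ols_rss(X, y) / n) + 2 * len(X[0])


# Hypothetical wells: (longitude, water depth, log-concentration), generated deterministically.
wells = []
for i in range(30):
    lon = -106.0 + 0.4 * i
    depth = 50.0 + 37.0 * ((7 * i) % 11)
    noise = 0.05 * ((13 * i) % 7 - 3)
    wells.append((lon, depth, 2.0 + 0.1 * lon + 0.002 * depth + noise))

y = [w[2] for w in wells]
candidates = {
    "1": [[1.0] for _ in wells],
    "Longitude": [[1.0, w[0]] for w in wells],
    "Water Depth": [[1.0, w[1]] for w in wells],
    "Longitude + Water Depth": [[1.0, w[0], w[1]] for w in wells],
}
scores = {name: aic(X, y) for name, X in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"Response = {name}: AIC = {score:.2f}")
print("lowest AIC:", best)
```

The model with the lowest AIC is retained; adding a covariate is only worthwhile when the drop in residual variance outweighs the penalty of 2 per extra coefficient.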

Because the concentrations were log transformed prior to modeling (

Parameter | Estimate |
---|---|
Nugget (tausq) | 0 |
Range | 8.22 miles |
Sill (sigmasq) | 0.83 |
Kappa (smoothness parameter) | 2 |
Practical range | 44.12 miles |

Constituent | Mx: maximum level allowed in public water supplies by the EPA | Log(Mx) |
---|---|---|
Calcium | 200 mg/L (calcium and magnesium combined) | 5.298317 |
Chloride | 250 mg/L | 5.521461 |
Magnesium | 200 mg/L (calcium and magnesium combined) | 5.298317 |
Nitrate | 10 mg/L | 2.302585 |
Potassium | 10 mg/L | 2.302585 |
Sodium | 20 mg/L | 2.995732 |
Sulfate | 250 mg/L | 5.521461 |

Constituent | Affected aquifers |
---|---|
Calcium | Gulf Coast, Ogallala, Edwards-Trinity Plateau and Seymour |
Nitrate | Ogallala, Seymour and Edwards-Trinity Plateau |
Chloride | Ogallala, Edwards-Trinity Plateau and Hueco-Mesilla Bolson |
Potassium | Ogallala |
Sodium | All |
Sulfate | Ogallala and Edwards-Trinity Plateau |

This study focused on seven of the most researched groundwater constituents in Texas, namely calcium (Ca), chloride (Cl), nitrate (NO_{3}), sodium (Na), magnesium (Mg), sulfate (SO_{4}), and potassium (K). Data were extracted from the TWDB database for the nine years from 2000 to 2008. No temporal effects were found in the concentrations, but the data were found to be intrinsically correlated. This allowed us to model the between-constituent effects spatially and individually using IRWGLS. The multivariate geostatistical model used latitude, longitude, elevation, water depth and aquifer effect as the mean covariates. Also, a covariance structure was estimated to link the constituents’ concentrations with the spatial structure. The absence of temporal differences in the concentrations of the seven constituents points to the value of geostatistical modeling techniques as an estimation method. This study provides an example where simultaneous modeling of pollutants can be conducted based on locality, which closes a big gap in environmental research where interaction between variables is significant. This study also shows that calcium, chloride, nitrate, sodium, magnesium, sulfate, and potassium follow similar increasing trends. The high concentrations that exceed EPA tolerated levels presented in the prediction maps (