
A linked simulation-optimization model can be used to solve complex groundwater pollution source identification problems. Advanced simulators have been developed and successfully linked with numerous optimization algorithms for this purpose. However, identifying pollution sources in a groundwater aquifer with a linked simulation-optimization model has proven to be computationally expensive. To reduce this computational burden, an approximate simulator, the artificial neural network (ANN) model, can be used as a surrogate for the complex, time-consuming numerical simulation model. However, for a large-scale aquifer system, the performance of the ANN-based surrogate model is not satisfactory when a single ANN model is used to predict the concentration at different observation locations. In such a situation, the model efficiency can be enhanced by developing a separate ANN model for each observation location, so that the number of ANN models equals the number of observation wells in the aquifer. As a result, the complexity of the ANN-based simulation-optimization model depends on the number of observation wells. This study therefore uses a modified formulation to find the optimal number of observation wells, which in turn reduces the computational time of the model. The performance of the ANN-based simulation-optimization model is evaluated by identifying the groundwater pollutant sources of a hypothetical study area. This limited evaluation shows that the model has potential for field application.

Identifying a groundwater pollutant source is difficult and time-consuming because an aquifer simulation model has to be coupled with an optimization model. Because the simulation model is linked with the optimization model, the combined model is known as a simulation-optimization model. Source identification can be regarded as an inverse problem and can be solved using an inverse optimization technique. The complexity of the aquifer simulation model makes the solution of the source identification problem computationally expensive. The focus of every source identification model is therefore not only to solve the problem but also to do so efficiently with respect to computational time. Numerous attempts have been made with various simulation models, but the real challenge lies in the effectiveness of the simulator. Several techniques exist for solving groundwater source identification problems, viz. the response matrix approach, the embedded technique, and linked simulation-optimization methods. The response matrix method was first applied to unknown groundwater pollution source identification by Ref. [

One of the major advantages of linked simulation-optimization is that the simulator is linked externally to the optimization model. As a result, even a complex simulator can be incorporated into the optimization model easily. Numerous researchers have linked various groundwater simulation models with optimization models. Ref. [

The review of the literature suggests that an ANN-based simulation-optimization model is computationally efficient in locating the pollution sources of an aquifer. It has also been observed that, for a large-scale aquifer, the performance of the ANN model is unsatisfactory when a single ANN model is used to predict the contaminant concentrations at different observation locations. For this reason, a separate ANN model has to be developed for each observation location, so the number of ANN models equals the number of observation wells used in the model. Incorporating more observation wells increases the computational time of the model. At the same time, neither the optimal locations of the observation wells nor the number of wells to be considered in the model is known in advance. Considering all these aspects, this study presents a modified optimization formulation for identifying the contaminant sources. The data required for training the ANN models are generated using the groundwater transport model MT3DMS. The performance of the developed methodology is evaluated on a large hypothetical study area by identifying groundwater pollutant sources.

The hypothetical problem considered by [^{2}. The presence of two rivers on the western and southern sides of the aquifer defines the boundary condition to be a constant head, whereas no flow boundaries exist on the remaining north-east directions. The hydrogeological parameters for the study area are given in

Parameters | Values |
---|---|
Hydraulic conductivity in x direction, K_{xx} (m/s) | 0.0002 |
Hydraulic conductivity in y direction, K_{yy} (m/s) | 0.0002 |
Effective porosity | 0.25 |
Time steps, ∆t (months) | 3 |
Longitudinal dispersivity, α_{L} (m) | 40 |
Transverse dispersivity, α_{T} (m) | 9.6 |

Sources | Time Step 1 | Time Step 2 | Time Step 3 | Time Step 4 | Time Step 5 |
---|---|---|---|---|---|
S1 | 908.42 | 1130.50 | 653.35 | 902.13 | 721.25 |
S2 | 644.02 | 1023.87 | 1139.88 | 781.09 | 889.77 |
S3 | 0 | 0 | 0 | 0 | 0 |
S4 | 0 | 1024.16 | 652.05 | 1117.45 | 889.77 |
S5 | 987.08 | 0 | 0 | 1104.82 | 639.93 |

Time Step | Pumping Rates (m^{3}/day) | Time Step | Pumping Rates (m^{3}/day) |
---|---|---|---|
1 | 327.024 | 11 | 272.52 |
2 | 163.512 | 12 | 218.016 |
3 | 218.016 | 13 | 327.024 |
4 | 318.528 | 14 | 163.512 |
5 | 109.008 | 15 | 381.528 |
6 | 327.024 | 16 | 217.72 |
7 | 272.520 | 17 | 272.520 |
8 | 163.512 | 18 | 218.010 |
9 | 381.528 | 19 | 327.024 |
10 | 109.008 | 20 | 272.520 |

The groundwater pollution sources can be identified using an inverse optimization technique. The optimization model uses two objective functions. The first minimizes the difference between the observed and the simulated concentrations at the observation locations. The second drives the model to select those observation wells whose concentrations are large and which therefore monitor effectively throughout the stress period. In particular, the second objective discourages the selection of observation wells located very far from the pollutant sources, because the contaminant concentrations observed in those wells are negligible at all time steps. Together, the two objectives allow the model to identify the pollutant sources effectively and to select only those well locations that monitor large contaminant concentrations. The objective function can be written as:

Minimize

Subject to the constraints

where: ^{th} observation well location at time period n;

To identify the unknown contaminant sources in an aquifer, an aquifer simulator has to be linked with the optimization model. The aquifer processes can be simulated by solving the flow and transport equations numerically. As mentioned earlier, the numerical simulation model is computationally expensive, and the computational time can be reduced by using an ANN-based surrogate model. However, developing and running an ANN model for every observation well also increases the computational time for a large study area. Therefore, the modified formulation presented above searches for an optimal number of observation wells, which reduces the computational burden of the model and allows the pollutant sources to be identified efficiently. The ANN simulation is performed only for those locations where the z value is equal to one. As mentioned earlier, the simulation is performed for five years, and the search finds the optimal number of wells at the end of the 1st, 3rd, and 5th years. A total of 30 potential observation wells is used for all three time periods. From these potential observation wells, the objective function directs the optimization model to select the best match between observed and simulated concentrations over all the considered locations and time steps. The constraint restricts the number of selected monitoring wells to between 10 and 20 in each of the three time periods.
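The role of the z values and the 10 to 20 well limit can be illustrated with a short Python sketch. The paper's exact objective function is not reproduced here: the squared-difference misfit, the function names, and the array layout (one row per time period, one column per potential well) are all assumptions made for illustration, and the second objective is not modelled.

```python
import numpy as np

N_WELLS = 30                      # potential observation wells (from the text)
MIN_WELLS, MAX_WELLS = 10, 20     # allowed number of selected wells per time period

def selection_is_feasible(z):
    """Check the well-count constraint.

    z is a 0/1 array of shape (n_periods, N_WELLS); z[n, i] == 1 means
    well i is selected in time period n (assumed layout).
    """
    counts = z.sum(axis=1)
    return bool(np.all((counts >= MIN_WELLS) & (counts <= MAX_WELLS)))

def masked_misfit(z, c_obs, c_sim):
    """Sum of squared observed-minus-simulated concentrations,
    counted only at selected wells (assumed form of the first objective)."""
    return float(np.sum(z * (c_obs - c_sim) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = np.zeros((3, N_WELLS), dtype=int)
    z[:, :15] = 1                              # select the first 15 wells in each period
    c_obs = rng.random((3, N_WELLS))           # synthetic "observed" concentrations
    c_sim = c_obs + 0.01                       # synthetic "simulated" concentrations
    print(selection_is_feasible(z), masked_misfit(z, c_obs, c_sim))
```

Because unselected wells contribute nothing to the misfit, a surrogate model only has to be evaluated where the corresponding entry of z is one.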

The additional binary decision variable introduced in the constraint function decides whether an observation well is selected: it takes the value 1 if the well is selected and 0 otherwise. Note that placing a large number of observation wells far from the pollutant sources does not help to identify the sources efficiently, because wells far from the sources are redundant. Hence, an optimal number of well locations around the pollutant sources identifies the sources efficiently. The optimization model is solved using genetic algorithms because it contains binary variables.
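A genetic algorithm over binary variables can be sketched as follows. This is a minimal illustration, not the authors' implementation: binary tournament selection is swapped in for the stochastic uniform selection listed in the paper's parameter table, "scattered" crossover is assumed to mean a random-mask crossover, the function name `run_binary_ga` is hypothetical, and the population size and generation count are deliberately much smaller than the reported 200 and 2000.

```python
import numpy as np

def run_binary_ga(fitness, n_bits, pop_size=40, generations=60,
                  crossover_fraction=0.8, mutation_rate=0.02,
                  elite_count=2, seed=0):
    """Minimise `fitness` over 0/1 vectors of length n_bits."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)]            # best individuals first
        # Elitism: carry the best individuals over unchanged.
        children = [pop[i].copy() for i in range(elite_count)]
        while len(children) < pop_size:
            # Binary tournament: the lower index wins because pop is sorted.
            a, b = rng.integers(0, pop_size, size=2)
            c, d = rng.integers(0, pop_size, size=2)
            p1, p2 = pop[min(a, b)], pop[min(c, d)]
            if rng.random() < crossover_fraction:
                # Random-mask crossover: each bit comes from either parent.
                mask = rng.integers(0, 2, size=n_bits).astype(bool)
                child = np.where(mask, p1, p2)
            else:
                child = p1.copy()
            # Bit-flip mutation.
            flips = rng.random(n_bits) < mutation_rate
            children.append(np.where(flips, 1 - child, child))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])

if __name__ == "__main__":
    # Toy problem: recover a hidden 30-bit well-selection pattern.
    target = np.array([1, 0] * 15)
    best, score = run_binary_ga(lambda ind: int(np.sum(ind != target)), n_bits=30)
    print(best, score)
```

In the source identification setting, the fitness function would call the surrogate models and return the misfit, and the chromosome would also have to encode the source fluxes; the toy fitness above only stands in for that evaluation.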

In this evaluation process, the optimization model repeatedly calls the ANN models developed for each observation location. In each successive generation, the existing population undergoes selection, crossover, and mutation to produce an improved generation. This continues until the termination criteria of the simulation-optimization model are satisfied. The best individual of the final generation is taken as the solution of the inverse optimization problem. The genetic algorithm parameters used are shown in

MODFLOW and MT3DMS models are used for simulating the groundwater flow and transport processes. In the present methodology, the data pattern required for training the ANN model is generated using the MT3DMS model.

Parameter | Adopted Value | Function Parameter | Adopted Function |
---|---|---|---|
Population size | 200 | Scaling function | Rank |
Generations | 2000 | Selection function | Stochastic uniform |
Crossover fraction | 0.8 | Mutation function | Constraint dependent |
Elite count | 0.5 | Crossover function | Scattered |

The simulation of groundwater flow and transport in an aquifer can be solved using the two partial differential equations. The groundwater flow equation for two-dimensional flow in a confined homogeneous aquifer can be written according to [

where:

The governing equation for contaminant mass transport in groundwater is given by [

where: c is the concentration of dissolved chemical species (

Ref. [

For the present methodology, a single-hidden-layer architecture is adopted. There are no definite rules for selecting the number of hidden layers and the number of neurons in the architecture; these are determined by trial and error. The ANN model is trained using the Levenberg-Marquardt algorithm. A unipolar sigmoidal transfer function is used for the neurons in the hidden layer and a purely linear transfer function for those in the output layer. Two important ANN parameters, the learning rate and the momentum rate, must be selected through a comparative study. Once a combination of these parameters has been chosen, it is used in training the ANN model. The learning rate is the rate at which the weights change during the training phase. A fast learning rate requires less computational time to train the network, whereas a slow learning rate slows down training and requires more computational time [

There are five pollutant sources in the study area, and the simulation is performed for a period of five years at an interval of ninety days. The pollutant sources are active for five time steps, so a total of 25 pollutant source fluxes (five sources at five time steps) are used as the input to the ANN model. The output of each ANN model is the concentration at its observation location at each of the 20 time steps that make up the five-year simulation period. There are 30 observation wells in the aquifer, so 30 models have been developed, each of which predicts the contaminant concentrations at one well location. Three layers are used: one input layer, one hidden layer, and one output layer. The developed architecture for the ANN model can be represented as 25-40-20 which has been shown in

The ANN models developed for the thirty observation wells of the illustrative study area are incorporated with the optimization model for identification of pollutant sources and the optimal location of the monitoring wells.
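The 25-40-20 architecture described above, with a sigmoidal hidden layer and a linear output layer, can be sketched as a forward pass in NumPy. The weights below are random placeholders: the Levenberg-Marquardt training used in the paper is not implemented, and the function names and weight layout are assumptions made for illustration.

```python
import numpy as np

# 25 source fluxes in, 40 hidden neurons, 20 concentrations out (from the text).
N_INPUTS, N_HIDDEN, N_OUTPUTS = 25, 40, 20

def sigmoid(x):
    """Unipolar sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-x))

def init_network(seed=0):
    """Create untrained placeholder weights for one observation well."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(N_HIDDEN, N_INPUTS)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(scale=0.1, size=(N_OUTPUTS, N_HIDDEN)),
        "b2": np.zeros(N_OUTPUTS),
    }

def predict(net, fluxes):
    """Map 25 source fluxes to 20 concentrations at one well."""
    hidden = sigmoid(net["W1"] @ fluxes + net["b1"])   # sigmoidal hidden layer
    return net["W2"] @ hidden + net["b2"]              # purely linear output layer

if __name__ == "__main__":
    # One network per observation well, as in the text (30 wells).
    networks = [init_network(seed=i) for i in range(30)]
    fluxes = np.random.default_rng(42).uniform(0.0, 1.0, size=N_INPUTS)
    print(predict(networks[0], fluxes).shape)
```

Keeping one small network per well means that the optimization model can skip the networks of unselected wells entirely.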

^{2}, it can be observed that the performance of the ANN model as an approximate simulator is quite good.

As discussed above, the developed ANN models are linked with the optimization model, and the optimization problem is solved using genetic algorithms. In every generation, the ANN models have to be run to calculate the simulated concentrations at the observation locations. Although there are thirty ANN models, only the models of those locations where the z value is one are actually run in a given generation. Thus, the number of ANN models run in every generation is always less than the total number of observation wells. The actual and predicted pollutant sources obtained with the proposed model are compared as shown in

Although the movement of pollutants in groundwater is a slow process, the pollutant sources are dynamic, so the optimal observation wells are selected at the end of the first, third, and fifth years. For each time period, the model selects an optimal number of wells from a total of 30 potential observation well locations. In the first year, the optimal well locations selected by the model are close to the source locations. It can be seen that the optimal wells follow the path of the plume and that a total of 15 wells are selected, which lies between 10 and 20, the minimum and maximum numbers of optimal wells allowed by the model (

The observation well locations for the second and third time periods can be seen in

plume. This is because each set of optimal wells monitors for two years, and the changing concentrations at the wells cause the selected locations to shift with time. Hence, with the passage of time, the optimal wells move along the direction of the plume. The model selects 10 optimal wells for each of these two time periods.

Most of the optimal observation wells selected for the three time periods are very close to the pollutant sources. This is because the pollutant sources are active for only five time steps. In the remaining time steps, the sources are inactive, and the pollutant concentrations decrease. It can also be noted that the observation well located very far from the pollutant sources (W 30) is not selected by the model, because the contaminant does not reach it. As mentioned earlier, an ANN model is developed for each of the thirty observation wells, and the selected optimal wells correspond to the ANN models that perform best as approximate simulators. The ANN-GA model took only 6 hours of CPU time to solve the simulation-optimization problem on a 2.27 GHz processor.

Time Steps | Source Locations | Actual Sources (g/s) | Estimated Sources (g/s) | Absolute Relative Error (%) |
---|---|---|---|---|
1 | S1 | 908.42 | 909.54 | −0.12 |
1 | S2 | 644.02 | 643.20 | 0.12 |
1 | S3 | 0 | 66.6 | - |
1 | S4 | 0 | 7.73 | - |
1 | S5 | 987.08 | 986.203 | 0.08 |
2 | S1 | 1130.5 | 1125.20 | 0.46 |
2 | S2 | 1023.87 | 1029.73 | −0.57 |
2 | S3 | 0 | 22.16 | - |
2 | S4 | 1024.16 | 988.78 | 3.45 |
2 | S5 | 0 | 0.38 | - |
3 | S1 | 653.35 | 663.18 | −1.50 |
3 | S2 | 1139.88 | 1125.71 | 1.24 |
3 | S3 | 0 | 12.34 | - |
3 | S4 | 652.05 | 728.73 | −11.76 |
3 | S5 | 0 | 0.48 | - |
4 | S1 | 902.15 | 892.55 | 1.06 |
4 | S2 | 781.09 | 796.56 | −1.98 |
4 | S3 | 0 | 4.73 | - |
4 | S4 | 1117.45 | 1017.66 | 8.92 |
4 | S5 | 1104.82 | 1101.45 | 0.30 |
5 | S1 | 721.24 | 726.87 | 0.77 |
5 | S2 | 889.77 | 886.64 | 0.35 |
5 | S3 | 0 | 10.23 | - |
5 | S4 | 457.91 | 509.22 | −11.20 |
5 | S5 | 639.93 | 642.24 | −0.36 |
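The error column of the table above can be reproduced with a few lines of Python. The signed convention (actual minus estimated, divided by actual) is an assumption inferred from rows such as S1 at time step 1; the function name is hypothetical. Rows where the actual flux is zero have no defined relative error, which is why the table shows "-" for them.

```python
def relative_error_percent(actual, estimated):
    """Signed relative error in percent; None when the actual flux is zero."""
    if actual == 0:
        return None          # relative error is undefined for a zero reference
    return 100.0 * (actual - estimated) / actual

if __name__ == "__main__":
    rows = [
        ("S1, step 1", 908.42, 909.54),
        ("S4, step 2", 1024.16, 988.78),
        ("S4, step 3", 652.05, 728.73),
        ("S3, step 1", 0.0, 66.6),
    ]
    for label, actual, estimated in rows:
        err = relative_error_percent(actual, estimated)
        print(label, "-" if err is None else f"{err:.2f}")
```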

Time period | Number of selected optimal wells | Location of the optimal observation wells (i, j, k) |
---|---|---|
1 | 15 | W1 (30, 56, 1); W3 (32, 53, 1); W4 (63, 52, 1); W5 (66, 49, 1); W6 (70, 61, 1); W9 (72, 49, 1); W10 (75, 50, 1); W14 (82, 53, 1); W16 (86, 32, 1); W19 (87, 32, 1); W21 (89, 31, 1); W22 (86, 32, 1); W24 (87, 32, 1); W25 (89, 31, 1); W26 (92, 32, 1) |

Time period | Number of selected optimal wells | Location of the optimal observation wells (i, j, k) |
---|---|---|
2 | 10 | W7 (38, 62, 1); W10 (66, 49, 1); W11 (68, 53, 1); W12 (69, 57, 1); W19 (82, 53, 1); W21 (82, 50, 1); W23 (86, 61, 1); W26 (92, 32, 1); W27 (93, 36, 1); W28 (99, 36, 1) |

Time period | Number of selected optimal wells | Location of the optimal observation wells (i, j, k) |
---|---|---|
3 | 10 | W12 (69, 57, 1); W13 (70, 61, 1); W15 (74, 63, 1); W17 (75, 59, 1); W18 (80, 61, 1); W19 (82, 53, 1); W20 (82, 66, 1); W21 (83, 50, 1); W23 (86, 61, 1); W29 (103, 36, 1) |

The present methodology adopts a linked simulation-optimization approach for efficient identification of pollutant sources in an aquifer. To reduce the computational time of simulating the groundwater flow and transport processes, an ANN model is used as an approximate simulator. The developed methodology has been applied to a large study area of approximately 17 km^{2} with 30 potential observation wells. A separate ANN model was developed for each of the thirty observation wells, because the performance of the simulation model is not satisfactory when a single ANN model is used to predict the concentrations at all observation wells. However, running all of these ANN models also takes considerable CPU time. Therefore, the present study focuses on selecting optimal observation wells, which ultimately reduces the computational time. The results show that the locations of the optimal wells change from one time period to another, reflecting the dynamic state of the monitoring network. The wells were selected by the model considering the budgetary constraints. These optimal wells capture the pollutant concentrations well enough for the estimated fluxes to match the actual fluxes closely. The relative error of the estimated source fluxes is below about 2% for sources S1, S2, and S5, although it reaches about 12% for S4 in some time steps. The performance evaluation indicates that the ANN model can be used efficiently as an approximate simulator for a large study area. However, detailed hydrological parameters of a real large aquifer are required to evaluate the performance of the present methodology further.

Sophia Leichombam, Rajib Kumar Bhattacharjya (2016) Identification of Unknown Groundwater Pollution Sources and Determination of Optimal Well Locations Using ANN-GA Based Simulation-Optimization Model. Journal of Water Resource and Protection, 08, 411-424. doi: 10.4236/jwarp.2016.83034