
Climatic variability influences the hydrological cycle, which in turn affects stream discharge. This variability can be represented by ocean-atmospheric oscillations, which provide an opportunity to forecast streamflow. Accurate and reliable prediction of future water availability is a key step for successful water resource management in arid regions. Four widely used ocean-atmospheric indices were used in this study for annual streamflow volume prediction: the Pacific Decadal Oscillation (PDO), El-Niño Southern Oscillation (ENSO), Atlantic Multidecadal Oscillation (AMO), and North Atlantic Oscillation (NAO). The Multivariate Relevance Vector Machine (MVRVM), a data-driven model based on a Bayesian learning approach, was used as the prediction model. The model was applied to four unimpaired stream gages in Utah that span the state from north to south. Different models were developed based on combinations of oscillation indices in the input. A total of 60 years (1950-2009) of data were used for the analysis. The model was trained on 50 years of data (1950-1999) and tested on 10 years of data (2000-2009). The best combination of oscillation indices and lead time was identified for each gage and used to develop the prediction model. The predicted flow agreed reasonably well with the actual annual flow volume. The sensitivity analysis shows that the PDO and ENSO have a relatively stronger effect than the other oscillation indices in Utah. The MVRVM predictions were compared with those of the Support Vector Machine (SVM) and the Artificial Neural Network (ANN); the MVRVM performed relatively better.

The ocean-atmospheric indices are connected to climatic variability around the globe. Streamflow depends on the distribution of the precipitation in time and space as well as in the type and state of the basin, which, in turn, depends on the climatic conditions [

The Pacific Decadal Oscillation (PDO), El-Niño Southern Oscillation (ENSO), Atlantic Multi-decadal Oscillation (AMO), and North Atlantic Oscillation (NAO) are popular oceanic-atmospheric oscillation indices. Decadal-scale climate variation over the Pacific Ocean and its surroundings is strongly related to the PDO, which is coherent with the wintertime climate over North America [

The signal strength of oscillation indices varies spatially around the world. It is thus important to identify the influential oscillation indices, or their combinations, and the corresponding lead time that produce the best prediction model for a given stream gage location. Accurate long lead-time streamflow prediction can benefit the management of water resources at the basin scale [

Several physically based models have been developed to understand the behavior of water resources systems. The complexity of these models, and the difficulty and expense of acquiring the data they require, have limited their application. To overcome these limitations, data-driven models are often used as an alternative to physically based models. They are characterized by their ability to quickly capture the underlying physics of the system by relating inputs and outputs. They are robust and capable of making reasonable predictions using historical data [

Artificial Neural Network (ANN), Support Vector Machine (SVM), and Relevance Vector Machine (RVM) are popular data-driven models. The ANN model can implicitly detect complex nonlinear relationships between the response and the predictors, and it performs well even if the data contain noise. However, it has a number of disadvantages. The ANN model may get stuck in a local minimum rather than reaching the global minimum. Also, an incorrect network definition may cause overfitting. The SVM model is a very popular machine learning model. However, it makes unnecessarily liberal use of basis functions: the number of support vectors increases linearly with the size of the training data [

The objective of this study is to identify the best combinations of oscillation indices and to predict long lead-time annual streamflow volume accurately and reliably at four unimpaired stream gages in Utah that span the state from north to south.

Four stream gages were chosen in Utah (

Site ID | Name | Basin area (km^{2}) | Stream length (km) | Stream slope | Gage latitude (°) | Gage longitude (°) |
---|---|---|---|---|---|---|

10128500 | Weber River near Oakley | 419.8 | 40.7 | 0.020 | 40.737 | −111.247 |

10131000 | Chalk Creek at Coalville | 643.1 | 60.4 | 0.010 | 40.921 | −111.401 |

10174500 | Sevier River at Hatch | 880.6 | 50.1 | 0.007 | 37.651 | −112.430 |

09330500 | Muddy Creek near Emery | 271.9 | 32.3 | 0.004 | 38.982 | −111.249 |

Relevance Vector Machine is a supervised learning model based on sparse Bayesian learning. This is a model of identical functional form to the SVM developed by Vapnik [

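For the given input-target pairs $\{x_n, t_n\}_{n=1}^{N}$, the model is written as

$$t_n = y(x_n; w) + \epsilon_n \qquad (1)$$

Target $t_n$ is the sum of the unknown function $y$ and a noise term $\epsilon_n$, which is assumed to be Gaussian with zero mean and variance $\sigma^{2}$.

The unknown function $y$ is the product of the design matrix ($\Phi$) and the weight parameter ($w$). In vector form, Equation (1) can be written as

$$t = \Phi w + \epsilon$$

where the target and weight vectors are expressed as $t = (t_1, \ldots, t_N)^{T}$ and $w = (w_0, \ldots, w_N)^{T}$, respectively. Independent Gaussian noise is assumed. Thus,

$$p(t \mid w, \sigma^{2}) = (2\pi\sigma^{2})^{-N/2} \exp\left\{-\frac{1}{2\sigma^{2}} \lVert t - \Phi w \rVert^{2}\right\} \qquad (2)$$

To avoid overfitting in Equation (2), $w$ is constrained with a prior probability (Equation (3)) [

$$p(w \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1}) \qquad (3)$$

where $\alpha$ is the hyperparameter vector that controls the deviation of each weight from zero [

The posterior covariance and mean of the weights are

$$\Sigma = (\sigma^{-2}\Phi^{T}\Phi + A)^{-1}, \qquad \mu = \sigma^{-2}\Sigma\Phi^{T}t \qquad (4)$$

with $A = \mathrm{diag}(\alpha_0, \ldots, \alpha_N)$. For uniform hyperpriors over $\log\alpha$ and $\log\sigma$, the hyperparameters are estimated by maximizing the marginal likelihood

$$p(t \mid \alpha, \sigma^{2}) = \int p(t \mid w, \sigma^{2})\, p(w \mid \alpha)\, dw \qquad (5)$$

Equation (5) is solved by iterative re-estimation.

$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^{2}}, \qquad (\sigma^{2})^{new} = \frac{\lVert t - \Phi\mu \rVert^{2}}{N - \sum_i \gamma_i} \qquad (6)$$

where $\mu_i$ is the $i^{th}$ posterior mean weight and $N$ is the number of data examples. The quantity $\gamma_i = 1 - \alpha_i \Sigma_{ii}$, where $\Sigma_{ii}$ is the $i^{th}$ diagonal element of the posterior weight covariance computed with the current $\alpha$ and $\sigma^{2}$.

The learning algorithm proceeds by iterating Equation (6) together with updating the posterior statistics $\Sigma$ and $\mu$ until convergence. For a new input $x_*$, the predictive distribution is Gaussian,

$$p(t_* \mid t, \alpha, \sigma^{2}) = \mathcal{N}(t_* \mid y_*, \sigma_*^{2})$$

where $y_* = \mu^{T}\phi(x_*)$ and $\sigma_*^{2} = \sigma^{2} + \phi(x_*)^{T}\Sigma\,\phi(x_*)$.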
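The iterative re-estimation of Equation (6) can be illustrated with a short NumPy sketch. This is a single-output RVM written for illustration only, not the multivariate MVRVM used in this study; the function names `fit_rvm` and `predict_rvm`, the pruning threshold `alpha_max`, the iteration count, and the initial values are all assumptions introduced here.

```python
import numpy as np

def fit_rvm(Phi, t, n_iter=300, alpha_max=1e9):
    """Sparse Bayesian regression by iterative re-estimation (single output).

    Phi is the N x M design matrix and t the N-vector of targets.
    Returns full-length posterior mean weights, the posterior covariance
    of the retained weights, the retained column indices, and sigma^2.
    """
    N, M = Phi.shape
    alpha = np.ones(M)                       # assumed initial hyperparameters
    sigma2 = max(0.1 * np.var(t), 1e-12)     # assumed initial noise variance
    active = np.arange(M)                    # columns not yet pruned
    for _ in range(n_iter):
        if active.size == 0:
            break
        P = Phi[:, active]
        # Posterior covariance and mean of the retained weights
        Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[active]))
        mu = Sigma @ P.T @ t / sigma2
        # gamma_i = 1 - alpha_i * Sigma_ii, clipped to guard against round-off
        gamma = np.clip(1.0 - alpha[active] * np.diag(Sigma), 0.0, 1.0)
        with np.errstate(divide="ignore", invalid="ignore"):
            alpha[active] = gamma / mu**2    # hyperparameter update
        resid = t - P @ mu
        sigma2 = float(resid @ resid) / max(N - gamma.sum(), 1e-12)
        # Drop weights whose alpha has effectively diverged (NaN is dropped too)
        active = active[alpha[active] < alpha_max]
    weights = np.zeros(M)
    Sigma = np.zeros((active.size, active.size))
    if active.size:
        P = Phi[:, active]
        Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[active]))
        weights[active] = Sigma @ P.T @ t / sigma2
    return weights, Sigma, active, sigma2

def predict_rvm(Phi_new, weights, Sigma, active, sigma2):
    """Predictive mean and variance for new design-matrix rows."""
    P = Phi_new[:, active]
    mean = Phi_new @ weights
    var = sigma2 + np.einsum("ij,jk,ik->i", P, Sigma, P)
    return mean, var
```

Columns whose hyperparameter exceeds `alpha_max` are removed from the model, which is how the sparse set of relevance vectors emerges in this sketch.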

The model used in this study is one introduced by Thayananthan [

Unimpaired monthly streamflow data for 1950-2009 were collected from the US Geological Survey (USGS) for Weber River near Oakley, Chalk Creek at Coalville, Sevier River at Hatch, and Muddy Creek near Emery. The values were then converted to annual flow volumes using an appropriate conversion factor.
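The conversion factor itself is not stated. As a minimal sketch, assuming the monthly values are monthly mean discharges in cubic feet per second and that annual totals follow the calendar year (both are assumptions, as is the function name `annual_volume_kaf`), the annual volume in 1000 ac-ft could be computed as follows:

```python
import calendar

CUBIC_FEET_PER_ACRE_FOOT = 43_560.0
SECONDS_PER_DAY = 86_400.0

def annual_volume_kaf(monthly_mean_cfs, year):
    """Convert 12 monthly mean discharges (ft^3/s) to an annual volume in 1000 ac-ft."""
    if len(monthly_mean_cfs) != 12:
        raise ValueError("expected 12 monthly values")
    total_ft3 = 0.0
    for month, discharge in enumerate(monthly_mean_cfs, start=1):
        days = calendar.monthrange(year, month)[1]  # handles leap years
        total_ft3 += discharge * days * SECONDS_PER_DAY
    return total_ft3 / CUBIC_FEET_PER_ACRE_FOOT / 1000.0
```

If the source data are organized by water year rather than calendar year, the month-to-year grouping would need to change accordingly.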

Pacific Decadal Oscillation (PDO) is a climate phenomenon associated with the persistent, bi-modal climate patterns in the North Pacific Ocean. It is an interannual climate index which can be used as an integrator of the overall winter climate conditions in the North Pacific. The PDO also refers to a numerical climate index based on the sea surface temperatures (SST) in a particular region of the North Pacific which has an interannual signature [

The El-Niño Southern Oscillation (ENSO) is a complex ocean-atmosphere interaction that causes cyclical patterns of warming and cooling of the sea surface in the tropical Pacific, with pronounced global climatic teleconnections. El-Niño is the warm phase and La Niña is the cold phase. ENSO has a characteristic return frequency of 4 to 6 years and usually persists for 1 to 2 years. Several studies have shown that it is associated with streamflow variability in the western United States [

The AMO index was introduced by Enfield et al. [

NAO is a dominant mode of winter climate variability in the North Atlantic region, ranging from Central North America to Europe and well into Northern Asia. A positive NAO means below-normal pressure across the high latitudes of the North Atlantic and above-normal pressure over the Central North Atlantic, the Eastern United States, and Western Europe. The opposite holds for the negative phase. The NAO index varies from year to year but also tends to remain in one phase for intervals lasting several years. The monthly average NAO data were collected from the National Center for Atmospheric Research (www.cgd.ucar.edu/cas/jhurrell/indices.html) and annual averages were computed for 1945-2009 (

The input consists of annualized ocean-atmospheric oscillation indices and the output is the annualized streamflow volume. The oscillation indices at time step t were used to predict the annual streamflow volume at time step t + i, where i is the lead time (1 to 5 years). Model performance was assessed using the root mean square error (RMSE), correlation coefficient, and efficiency in the test phase.
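The pairing of indices at time step t with flow at time step t + i, and the split into the 1950-1999 training period and the 2000-2009 test period, can be sketched as follows. The function names, argument layout, and the choice to split by the year of the target flow are assumptions made for illustration.

```python
import numpy as np

def make_lagged_pairs(index_years, indices, flow_years, flow, lead):
    """Pair index values observed in year t with flow observed in year t + lead.

    indices has one row per entry of index_years and one column per oscillation index.
    """
    indices = np.asarray(indices, dtype=float)
    flow_by_year = {int(y): float(q) for y, q in zip(flow_years, flow)}
    X, y, target_years = [], [], []
    for row, year in enumerate(index_years):
        target_year = int(year) + lead
        if target_year in flow_by_year:      # keep only years with observed flow
            X.append(indices[row])
            y.append(flow_by_year[target_year])
            target_years.append(target_year)
    return np.array(X), np.array(y), np.array(target_years)

def split_by_target_year(X, y, target_years, last_train_year=1999):
    """Split pairs into training (targets up to last_train_year) and test sets."""
    train = target_years <= last_train_year
    return (X[train], y[train]), (X[~train], y[~train])
```

With indices available from 1945 and flow from 1950, every lead time from 1 to 5 years yields 60 pairs under this scheme, 50 for training and 10 for testing.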

Different models were developed based on different combinations of oscillation modes in the input. Model 1 uses all four oscillation indices (PDO, ENSO, AMO, and NAO), which results in one model run for each lead time. Model 2 drops one oscillation index and uses the remaining three, which results in four model runs for each lead time. Model 3 drops two oscillation indices and uses the remaining pair, which results in six model runs for each lead time. Model 4 uses only one oscillation mode at a time, which results in four model runs for each lead time. Model 1 is the base case, while Models 2 to 4 give the relative influence of the ocean-atmospheric oscillation indices on annual streamflow volume prediction for each selected gage. For each model type, the combination of oscillation indices and lead time corresponding to the best test result was identified and used to predict the long lead-time annual streamflow volume. For each lead time, the combination of oscillations that produced the best prediction was also identified. This shows the relative influence of the oscillations at a given lead time for each selected gage. The prediction results from the MVRVM were also compared with those from the ANN and SVM.
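The run counts for the four model types follow directly from the number of ways to choose indices. A short sketch that enumerates them is given below; the names `INDICES`, `LEAD_TIMES`, and `model_runs` are illustrative only.

```python
from itertools import combinations

INDICES = ("PDO", "ENSO", "AMO", "NAO")
LEAD_TIMES = range(1, 6)  # 1- to 5-year lead

def model_runs():
    """Map each model type to its index combinations (Model 1 uses all four)."""
    runs = {}
    for model_number, size in enumerate((4, 3, 2, 1), start=1):
        runs[f"Model {model_number}"] = list(combinations(INDICES, size))
    return runs

runs = model_runs()
runs_per_lead = sum(len(combos) for combos in runs.values())
total_runs = runs_per_lead * len(LEAD_TIMES)
```

This gives 1 + 4 + 6 + 4 = 15 input combinations per lead time, or 75 runs per gage across the five lead times.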

Model 1

Model 2

Model 3

The best test RMSE was obtained from the pair of PDO + ENSO at 4 year lead for Weber River near Oakley (

Model 4

The test RMSE at each selected gage for 1 to 5-year lead time is shown in

time. For Chalk Creek at Coalville, AMO produced a relatively small test RMSE compared to the other oscillation indices. However, ENSO at 4-year lead produced a comparable prediction. For Sevier River at Hatch, PDO produced the best test RMSE at 2-year lead, followed by ENSO at 4-year lead. For Muddy Creek near Emery, ENSO and PDO produced relatively better test RMSE at 1- and 2-year leads. Among the 3-, 4-, and 5-year leads, ENSO and AMO predicted relatively better at 4-year lead. In general, the 4-year lead produced the best test result.

The best prediction for each model, the corresponding combination of oscillation indices, and lead-time for Model 1 to Model 4 are shown in Tables 2-5, respectively.

The predictions from Model 1 to Model 4 are presented in Figures 7-10, respectively. The first and second columns show the training and test phases, respectively. The third column plots actual versus predicted annual flow volume for the training phase, and the fourth column shows the same plot for the test phase. The results show that the model predicted annual flow volume reasonably well using ocean-atmospheric oscillation indices, with good agreement between the actual and predicted volumes. In the plots of predicted versus actual streamflow volume, the points cluster around the 45-degree line except for extreme flow events. This is because the oscillation indices do not fully represent the underlying physical processes responsible for streamflow generation. In general, the prediction gives a reasonable estimate of future water availability, which could be used for planning and management of water resources at the basin scale.
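The agreement between actual and predicted volumes is summarized by RMSE, correlation coefficient, and efficiency. A minimal sketch of these statistics is given below. The definition of efficiency is not given in the text; the Nash-Sutcliffe form used here is an assumption, and the function names are illustrative.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    return float(np.corrcoef(actual, predicted)[0, 1])

def efficiency(actual, predicted):
    """Nash-Sutcliffe efficiency (assumed definition): 1 is a perfect fit,
    0 matches predicting the mean of the actual values."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(1.0 - np.sum((a - p) ** 2) / np.sum((a - a.mean()) ** 2))
```

Under this definition a model that performs worse than the mean of the observations has a negative efficiency.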

By comparing Models 2 to 4 with the base case (Model 1), the relative influence of each oscillation index was estimated subjectively for each gage.

In Model 1, the best model prediction was obtained at 4-year lead for all stream gages except Sevier River at Hatch where the best model prediction was obtained at 3-year lead. Therefore, for comparing Model 2 (

Stream gage | Train correlation | Test correlation | Test RMSE (1000 ac-ft) | Efficiency | Lead time (year) | Combination of indices |
---|---|---|---|---|---|---|

Weber River near Oakley | 0.43 | 0.53 | 32.88 | 0.08 | 4 | All |

Chalk Creek at Coalville | 0.48 | 0.39 | 16.76 | - | 4 | All |

Sevier River at Hatch | 0.34 | 0.56 | 58.2 | 0.22 | 3 | All |

Muddy Creek near Emery | 0.37 | 0.38 | 9.97 | 0.23 | 4 | All |

Gage | Train correlation | Test correlation | Test RMSE (1000 ac-ft) | Efficiency | Lead time (year) | Combination of indices |
---|---|---|---|---|---|---|

Weber River near Oakley | 0.39 | 0.67 | 29.42 | 0.261 | 4 | Dropping NAO |

Chalk Creek at Coalville | 0.94 | 0.45 | 13.85 | 0.202 | 4 | Dropping AMO |

Sevier River at Hatch | 0.62 | 0.62 | 57.22 | 0.246 | 3 | Dropping NAO |

Muddy Creek near Emery | 0.58 | 0.51 | 9.26 | 0.332 | 3 | Dropping AMO |

Sevier River at Hatch | 0.73 | 0.82 | 47.87 | 0.473 | 2 | Dropping AMO^{*} |

^{*}Second model.

Gage | Train correlation | Test correlation | Test RMSE (1000 ac-ft) | Efficiency | Lead time (year) | Combination of indices |
---|---|---|---|---|---|---|

Weber River near Oakley | 0.49 | 0.72 | 33.95 | 0.02 | 4 | PDO + ENSO |

Chalk Creek at Coalville | 0.57 | 0.61 | 19.01 | - | 4 | PDO + ENSO |

Sevier River at Hatch | 0.62 | 0.87 | 41.68 | 0.60 | 2 | PDO + NAO |

Muddy Creek near Emery | 0.90 | 0.82 | 7.13 | 0.61 | 2 | PDO + NAO |

Gage | Train correlation | Test correlation | Test RMSE (1000 ac-ft) | Efficiency | Lead time (year) | Combination of indices |
---|---|---|---|---|---|---|

Weber River near Oakley | 0.41 | 0.40 | 36.86 | - | 4 | ENSO |

Chalk Creek at Coalville | 0.45 | 0.30 | 19.72 | - | 4 | ENSO |

Sevier River at Hatch | 0.49 | 0.60 | 58.26 | 0.218 | 4 | ENSO |

Muddy Creek near Emery | 0.14 | 0.35 | 10.15 | 0.199 | 4 | AMO |

Sevier River at Hatch | 0.36 | 0.84 | 57.09 | 0.249 | 2 | PDO^{*} |

Muddy Creek near Emery | 0.47 | 0.84 | 9.13 | 0.351 | 2 | PDO^{*} |

^{*}Second model.

year lead was used. Based on the test RMSE, the model prediction shows good improvement over Model 1 when NAO was dropped for Weber River near Oakley. Reasonable improvement was obtained when PDO was dropped. The model prediction marginally deteriorated when AMO was dropped. However, the prediction deteriorated significantly when ENSO was dropped. For Chalk Creek at Coalville, significant improvement over Model 1 was obtained by dropping AMO and PDO, and marginal improvement was obtained by dropping NAO and ENSO. For Sevier River at Hatch, the prediction improved by dropping NAO. The result marginally deteriorated by dropping PDO and deteriorated significantly by dropping ENSO. For Muddy Creek near Emery, the prediction results marginally deteriorated by dropping PDO and NAO. However, they deteriorated significantly when ENSO and AMO were dropped. These comparisons help to identify the relative strength of the oscillation indices for each gage.

In machine learning, model predictions deteriorate when trivial predictors are used. Since the prediction result improved over Model 1 when NAO was dropped for Weber River near Oakley, NAO is not an influential ocean-atmospheric oscillation index for annual streamflow volume prediction at this gage. The PDO and AMO have marginal influence, while ENSO has strong influence because the prediction results deteriorated significantly compared to Model 1 when it was dropped.

For Chalk Creek at Coalville, AMO and PDO are not influential oscillation indices because the prediction results improved compared to Model 1 when they were dropped. ENSO and NAO, however, have marginal influence, as the prediction results improved only marginally compared to Model 1 when they were dropped. NAO is not an influential oscillation index for Sevier River at Hatch because the prediction results improved when it was dropped. Since the results marginally deteriorated by dropping PDO, it may have marginal influence on the annual flow volume prediction. The prediction results, however, deteriorated significantly by dropping ENSO, so ENSO has a strong signal for annual flow volume prediction for Sevier River at Hatch. For Muddy Creek near Emery, PDO and NAO have marginal influence and ENSO has strong influence on annual flow volume predictions.
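The comparison against the base case can be made concrete by expressing the change in test RMSE as a percentage of the Model 1 value. The sketch below uses the test RMSE values reported in the Model 1 and Model 2 tables for the three gages where both best models share the same lead time; the function name and the percentage measure are illustrative choices, since the influence classes in this study were assigned subjectively.

```python
def rmse_change_percent(base_rmse, reduced_rmse):
    """Percent change in test RMSE relative to the base case (negative = improvement)."""
    return 100.0 * (reduced_rmse - base_rmse) / base_rmse

# (Model 1 test RMSE, Model 2 test RMSE, dropped index), RMSE in 1000 ac-ft
cases = {
    "Weber River near Oakley": (32.88, 29.42, "NAO"),
    "Chalk Creek at Coalville": (16.76, 13.85, "AMO"),
    "Sevier River at Hatch": (58.2, 57.22, "NAO"),
}

changes = {
    gage: round(rmse_change_percent(base, reduced), 1)
    for gage, (base, reduced, _dropped) in cases.items()
}
```

A negative change means the test RMSE fell when the index was dropped, which suggests that the dropped index contributes little at that gage.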

Based on the correlation coefficient between actual and predicted volume in the test phase, the overall results of Model 3 improved over Model 1. The relative strength of the oscillation indices was again estimated subjectively by comparing results from Model 3 (

Again, Model 4 (

In general, PDO and ENSO produced relatively better streamflow volume predictions than the other annualized oceanic-atmospheric oscillation indices. The best model was usually obtained at 3- and 4-year leads. The best combination of oscillations can be used to develop an accurate model prediction. In addition to fixing the lead time and identifying the best combinations of oscillation indices, the best combinations of oscillations were also identified for each lead time (1 through 5 years) (

The results from the MVRVM for Model 1 to Model 4 were compared with the corresponding SVM and ANN models for each gage. The software used to develop the SVM model was obtained from the SVM and Kernel Methods Matlab Toolbox. The software used to develop the ANN model was obtained from Aston University Engineering and Applied Science (http://www1.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/downloads/).

A total of 500 bootstrap runs were used to construct the histograms. The 2.5^{th} and 97.5^{th} percentile values of test RMSE were computed and are shown by red dotted lines in

Lead time (year) | Weber River near Oakley | Chalk Creek at Coalville | Sevier River at Hatch | Muddy Creek near Emery |
---|---|---|---|---|

1 | ENSO and AMO | All | ENSO, AMO, and NAO | ENSO, AMO, and NAO |

2 | ENSO, AMO, and NAO | PDO and ENSO | PDO and NAO | PDO and NAO |

3 | ENSO, AMO, and NAO | PDO, ENSO, and NAO | PDO, ENSO, and AMO | PDO, ENSO, and NAO |

4 | PDO, ENSO, and AMO | PDO, ENSO, and NAO | PDO, ENSO, and AMO | ENSO and AMO |

5 | ENSO and AMO | ENSO, AMO, and NAO | ENSO and AMO | AMO |

indicating that the developed model is robust and reasonable to use as a long lead-time streamflow prediction model.
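The percentile bounds on test RMSE can be sketched as follows. The resampling scheme is not described in the text; this sketch resamples the test-period pairs of actual and predicted volumes with replacement, which is an assumption and a simplification (a fuller analysis might instead resample the training data and refit the model on each run). The function name and seed are illustrative.

```python
import numpy as np

def bootstrap_rmse_interval(actual, predicted, n_boot=500, seed=0):
    """Bootstrap distribution of RMSE with its 2.5th and 97.5th percentiles."""
    rng = np.random.default_rng(seed)
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    n = a.size
    # Each row holds the indices of one resample drawn with replacement
    idx = rng.integers(0, n, size=(n_boot, n))
    rmse_samples = np.sqrt(np.mean((a[idx] - p[idx]) ** 2, axis=1))
    lower, upper = np.percentile(rmse_samples, [2.5, 97.5])
    return rmse_samples, float(lower), float(upper)
```

A histogram of `rmse_samples` with vertical lines at `lower` and `upper` corresponds to the kind of plot described above.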

The relationship between streamflow and climate variability, represented by ocean-atmospheric oscillation indices, is key to annual streamflow volume prediction. Accurate and reliable long-term streamflow prediction is crucial for the management of water resources at the basin scale. This study identifies the best combinations of oscillation indices and lead time for each selected stream gage and uses them for annual streamflow volume prediction. It also presents the relative influence of each oscillation index at each selected stream gage. The streamflow volumes were predicted at 1- to 5-year leads using the MVRVM model, and the prediction results were refined using the optimal combination of oscillations and the corresponding lead time. The model predictions were satisfactory. Model 1 was the base case, in which all four oscillation indices (PDO, ENSO, AMO, and NAO) were used, while Models 2, 3, and 4 were developed from different combinations of oscillation indices. The best predictions were usually obtained at a 4-year lead time. Although relatively better predictions were obtained at 2- and 3-year leads at some gages, the 4-year lead predictions were comparable. ENSO and PDO generally predicted better than AMO and NAO for all gages. For the fixed lead time used in this paper (4-year except Sevier River at Hatch), ENSO and PDO showed strong to marginal influence, while AMO and NAO had weak to marginal influence. In addition, the combinations of oscillations that produced the best results for each lead time were identified. Different combinations of oscillations produced the best predictions at different lead times. This information can be used to enhance the model predictions. In general, the model predicted reasonably well; however, it did not perform well in capturing extreme events. This shows that the oscillation indices used in this paper are not enough to represent the physical processes associated with streamflow generation. Bootstrap analysis was used to test the robustness and generalization capability of the model. The narrow bounds of the histograms showed that the model was robust. Also, the actual test statistics fell between the 2.5^{th} and 97.5^{th} percentile values, which indicated that the model prediction was consistent and well generalized. The comparison showed that the MVRVM outperformed the ANN and SVM. The pattern of predictions, however, remained similar across all machine learning models.