Application of PLS-Regression as Downscaling Tool for Pichola Lake Basin in India

doi:10.4236/ijg.2010.12007

Paper Menu >>

Journal Menu >>

International Journal of Geosciences, 2010, 1, 51-57

doi:10.4236/ijg.2010.12007 Published Online August 2010 (http://www.SciRP.org/journal/ijg)

Application of PLS-Regression as Downscaling Tool for

Pichola Lake Basin in India

Manish Kumar Goyal*, Chandra Shekhar Prasad Ojha

Department of Civil Engineering, Indian Institute of Technology, Roorkee, India

E-mail: vipmkgoyal@rediffmail.com

Received July 12, 2010; revised July 14, 2010; accepted July 23, 2010

Abstract

In this paper, downscaling models are developed using Partial Least Squares (PLS) Regression for obtaining

projections of mean monthly precipitation to lake-basin scale in an arid region in India. The effectiveness of

this approach is demonstrated through application to downscale the predictand for the Pichola lake region in

Rajasthan state in India, which is considered to be a climatically sensitive region. The predictor variables are

extracted from (1) the National Centers for Environmental Prediction (NCEP) reanalysis dataset for the pe-

riod 1948-2000, and (2) the simulations from the third-generation Canadian Coupled Global Climate Model

(CGCM3) for emission scenarios A1B, A2, B1 and COMMIT for the period 2001-2100. The selection of

important predictor variables becomes a crucial issue for developing downscaling models since reanalysis

data are based on wide range of meteorological measurements and observations. In this paper, we use PLS

regression for quality prediction and its use for the variable selection based on the variable importance. The

results of downscaling models using PLS regression show that precipitation is projected to increase in future

for A2 and A1B scenarios, whereas it is least for B1 and COMMIT scenarios using predictors.

Keywords: PLS Regression, Precipitation, VIP Score

1. Introduction

A general circulation model is a numerical mathematical

model that gives the analysis of atmosphere in all three

spatial dimensions based on conservation laws of mo-

mentum, energy and water vapor. These models are the

most reliable tool for estimating the changes in the cli-

mate. These are also known as global climate models,

generally abbreviated as GCMs. These are mathematical

representations of atmospheric and oceanic properties

and processes that help describe the earth’s climate sys-

tem [1,2]. However, in most climate change impact

studies, such as hydrological impacts of climate change,

impact models are usually required to simulate sub-grid

scale phenomenon and therefore require input data at

similar sub-grid scale. The methods used to convert

GCM outputs into local meteorological variables re-

quired for reliable hydrological modeling are usually

referred to as “downscaling” techniques: [3,4]. Hydro-

logic variables, such as precipitation, etc., are significant

parameters for climate change impact studies. A proper

assessment of probable future precipitation and their

variability are to be made for various hydro-climatology

scenarios.

A number of papers have previously reviewed down-

scaling concepts and recently, downscaling has found

wide application in hydro-climatology for scenario con-

struction and simulation/prediction of 1) low-frequency

rainfall events [5] 2)streamflow [6] 3) precipitation [7] 4)

streamflow [8].

In this paper, we present a downscaling methodology

based on partial least square (PLS) regression technique

to study climate change impact over Pichola lake basin in

an arid region. The objective of this study is to obtain 1)

predictor selection based on Variable Importance in the

Projection (VIP) score 2) downscale mean monthly pre-

cipitation using PLS-regression approach from simula-

tions of CGCM3 for latest IPCC scenarios. The scenarios

which are studied in this paper are relevant to Intergov-

ernmental Panel on Climate Change’s (IPCC’s) fourth

assessment report (AR4) which was released in 2007.

2. Study Region

The area of the this study is the Pichola lake catchment

in Rajasthan state in India that is situated from 72.5°E to

77.5°E and 22.5°N to 27.5°N. The Pichola lake basin,

located in Udaipur district, Rajasthan is one of the major

M. GOYAL ET AL.

sources for water supply for this arid region. During the

past several decades, the streamflow regime in the catch-

ment has changed considerably, which resulted in water

scarcity, low agriculture yield and degradation of the

ecosystem in the study area. Regions with arid and

semi-arid climates could be sensitive even to insignifi-

cant changes in climatic characteristics. Understanding

the relationships among the hydrologic regime, climate

factors, and anthropogenic effects are important for the

sustainable management of water resources in the entire

catchment hence this study area was chosen because of

aforementioned reasons. It receives an average annual

precipitation of 608 mm based on data available from

1975-2000. It has a tropical monsoon climate where most

of the precipitation is confined to a few months of the

monsoon season. The location map of the study region is

shown in Figure 1.

3. Data Extraction

The monthly mean atmospheric variables were derived

from the National Center for Environmental Prediction

(NCEP/NCAR) (hereafter called NCEP) reanalysis data

set [9] for a period of January 1948 to December 2000.

The data have a horizontal resolution of 2.5° latitude X

longitude and seventeen constant pressure levels in ver-

tical. The atmospheric variables are extracted for nine

grid points whose latitude ranges from 22.5 to 27.5 °N,

and longitude ranges from 72.5 to 77.5°E at a spatial

resolution of 2.5°. The meteorological data, i.e., precipi-

tation are used at monthly time scale from records avail-

able for Pichola Lake which is located in Udaipur at 24°

34’N latitude and 73°40’E longitude. The data is avail-

able for the period January 1975 to December 2000 [10].

The Canadian Center for Climate Modeling and Analysis

(CCCma) provides GCM data for a number of surface

and atmospheric variables for the CGCM3 T47 version

which has a horizontal resolution of roughly 3.75° lati-

tude by 3.75° longitude and a vertical resolution of 31

levels. CGCM3 is the third version of the CCCMA Cou-

pled Global Climate Model which makes use of a sig-

nificantly updated atmospheric component AGCM3 and

uses the same ocean component as in CGCM2. The data

comprise of present-day (20C3M) and future simulations

forced by four emission scenarios, namely A1B, A2, B1

and COMMIT. Data was obtained for CGCM3 climate

of the 20th Century (20CM3) experiments used in this

study.

The nine grid points surrounding the study region are

selected as the spatial domain of the predictors to ade-

quately cover the various circulation domains of the pre-

dictors considered in this study. The GCM data is re-

gridded to a common 2.5° using inverse square interpo-

lation technique. The utility of this interpolation algo-

rithm was examined in previous downscaling studies

[7,8]. The development of downscaling models for e

predictand variable precipitation begins with selection of

potential predictors followed by application of PLS re-

gression on downscaling model. The developed model is

then used to obtain projections of precipitation from

simulations of CGCM3.

3. Introduction to Partial Least Square

Regression and Selection of Predictors

3.1. Partial Least Square Regression

Partial least squares (PLS) regression is used to describe

the relationship between multiple response variables and

Figure 1. Location map of the study region in Rajasthan State of India with NCEP grid.

M. GOYAL ET AL.

predictors through the latent variables. PLS regression

can analyze data with strongly collinear, noisy, and nu-

merous X-variables, and also simultaneously model sev-

eral response variables, Y. In general, the PLS approach

is particularly useful when one or a set of dependent

variables (or time series) need to be predicted by a large

set of predictor variables (or time series) that are strongly

cross-correlated. This is often the case in empirical

downscaling of climate variables [11]. For details of PLS

regression, one can refer to Manne [12] and Wold [13].

3.2. Selections of Predictors

The selection of appropriate predictors is one of the most

important steps in a downscaling exercise for down-

scaling predictands. The predictors are chosen by the

following criteria: 1) predictors are skillfully predicted

by GCMs 2) they should represent important physical

processes in the context of the enhanced greenhouse ef-

fect 3) they should not be strongly correlated to each

other [14]. We have used 9 large-scale atmospheric vari-

ables, viz, air temperature (at 925,500 and 200mb pres-

sure levels), geopotential height (at 200 and 500mb

pressure levels), zonal (u) and meridional (v) wind ve-

locities (at 925 and 200mb pressure levels), as the pre-

dictors for downscaling GCM output to mean monthly

precipitation over a catchment.

The VIP (Variable Importance in the Projection)

scores obtained by the PLS regression, has been paid an

increasing attention as an importance measure of each

explanatory variable or predictor [15]. The variable se-

lection procedure under PLS is proposed with an appli-

cation to downscaling technique for identifying influ-

encing variables on understand the impact of climate

change. The VIP scores which are obtained by PLS re-

gression, can be used to select most influential variables

or predictors, X [15]. The VIP score can be estimated for

j-th X-variable by

(,)

diij

VIPRY tw

RYt 







(1)

where Rd is defined as the mean of the squares of the

correlation coefficients (R) between the variables and the

component.

(,) (,)

RdX cRxc

p

 (2)

Usually the predictor variable whose VIP score is

greater than 0.8 and above is considered as an important

variable [16]. It can be seen form Figure 2 that seven

predictor variables namely air temperature at 925 mb,

500 mb and 200 mb, zonal wind (925 mb); meridoinal

wind (925 mb); zeo-potential height 500 mb and 200 mb

have their VIP score greater than 0.8. Hence, these vari-

ables are used in the prediction model to obtain projected

predictands.

Figure 2. VIP of the predictand variable (precipitation) of the three-component PLSR model.

M. GOYAL ET AL.

4. Downscaling of GCM Models

There are several different methods, which can be used

to derive the relationship between local and large-scale

climates. PLS regression is used to downscale mean

monthly precipitation in this study. The data of potential

predictors is first standardized. Standardization is widely

used prior to statistical downscaling to reduce bias (if

any) in the mean and the variance of GCM predictors

with respect to that of NCEP-reanalysis data [17]. Stan-

dardization is done for a baseline period of 1948 to 2000

because it is of sufficient duration to establish a reliable

climatology, yet not too long, nor too contemporary to

include a strong global change signal [8,17].

To develop downscaling models, the feature vectors

(i.e., predictors) which are prepared from NCEP record,

are partitioned into a training set and a validation set.

Feature vectors in the training set are used for calibrating

the model, and those in the validation set are used for

validation. The 26-year mean monthly observed precipi-

tation data series were broken up into a calibration period

and a validation period. The models were calibrated on

the calibration period 1975 to 1989 and validation in-

volved period 1990 to 2000. The various error criteria

are used as an index to assess the performance of the

model. Based on the latest IPCC scenario, models for

mean monthly precipitation were evaluated based on the

accuracy of the predictions for validation data set. The

following criteria of PLS regression models were chosen

in this study.

1) The Q²CUM index measures the global contribution

of the h first components to the predictive quality of the

model. The Q²CUM (h) index writes:

CUM jj

PRESS

QRRS



 (3)

where PRESSj PRESS being associated with a j-compo-

nent PLS model and RRSj–1 RRS being associated with a

(j-1) component PLS model. It must be as close as possi-

ble to 1.

2) The R²XCUM index is the sum of the coefficients of

determination between the explanatory variables and the

h first components. It is therefore a measure of the ex-

planatory power of the h first components for the ex-

planatory variables of the model.

21*100

(1)

RX np







(4)

3) The R²YCUM index is the sum of the coefficients of

determination between the dependent variables and the h

first components. It is therefore a measure of the ex-

planatory power of the h first components for the de-

pendent variables of the model.

1(1)() *100

h adjhL

RYRY nh













(5)

where

[]

()











(6)

where []h

is the prediction of the observation yi for an

h-component model and y the average of the nL observa-

tions, yi.

A comparison of proposed downscaling method with

the commonly used principal component regression was

performed. The leading principal components, which

together explain about 98% of predictor’s variability,

were retained to be used in empirical model (PCRM2)

development. The same error criteria were used to assess

the performance of model.

5. Results and Discussions

Seven predictor variables namely air temperature at 925

mb, 500 mb and 200 mb; zonal wind (925 mb); merido-

inal wind (925 mb); zeo-potential height 500 mb and 200

mb at 9 NCEP grid points with a dimensionality of 63,

are used as the standardized data of potential predictors.

These feature vectors are provided as input to the PLS

regression and PCR downscaling models. PLS regression

is performed on this dataset. Results of the PLS regres-

sion model (viz. PLSRM1) and Principal Component

based regression model (viz. PLSRM2) are tabulated in

Table 2.

Model quality indexes Q²CUM index, R²XCUM and

R²YCUM index have been shown in Table 1. It is clear that

all three indexes are highest for the three component of

for predictand precipitation. Q²CUM index, R²XCUM and

R²YCUM index are 0.647, 0.724 and 0.907, respectively.

Hence, model quality can be considered as good.

Coefficient of correlation (CC) was in the range of

0.80-0.87, RMSE was in the range of 43.20-44.98, N-S

Index was in the range of 0.58-0.75 and MAE was in the

range of 0.44-0.58 for PLS regression based model

PLSRM1 for training and validation set. For PCRM2,

Coefficient of correlation (CC) was in the range of 0.42-

0.61, RMSE was in the range of 78.18-120.45, N-S In-

dex was in the range of 0.31-0.10 and MAE was in the

range of −0.16-0.19 for PCR based model PCRM2 for

training and validation set. Hence it is clear from Table 2

that PLS regression is performed better than principal

component regression. A comparison of mean monthly

observed precipitation with precipitation simulated using

PLS regression models PLSRM1 has been shown from

Figure 3 for validation period.

Once the downscaling models have been calibrated

M. GOYAL ET AL.

and validated, the next step is to use these models to

downscale the control scenario simulated by the GCM.

The GCM simulations are run through the calibrated and

validated PLS regression model (viz. PLSRM1) to obtain

future simulations of predictand. The predictand (viz.

precipitation) patterns are analyzed with box plots for 20

year time slices. The middle line of the box gives the

median whereas the upper and lower edges give the 75

Table 1. Various quality measures of PLS regression model

(PLSRM1).

Precipitation (PLSRM1)

Index Comp1 Comp2 Comp3

Q²CUM 0.468 0.633 0.647

R²XCUM 0.486 0.659 0.724

R²YCUM 0.747 0.884 0.907

Figure 3. Typical results for comparison of the monthly

observed precipitation with precipitation simulated using

PLR regression downscaling model PLSRM1 for NCEP

data.

Figure 4. Box plots results from the PLS regression-based

downscaling model PLSRM1 for the predictand precipiation.

percentile and 25 percentile of the data set, respectively.

Typical results of downscaled predictand (precipitation)

obtained from the predictors are presented in Figure 4.

In part (i) of Figure 4, the precipitation downscaled us-

ing NCEP and GCM datasets are compared with the ob-

served precipitation for the study region using box plots.

The projected precipitation for 2001-2020, 2021-2040,

2041-2060, 2061-2080 and 2081-2100 for the four sce-

narios A1B, A2, B1 and COMMIT are shown in (ii), (iii),

(iv) and (v) respectively.

From the box plots of downscaled predictand (Figure

4), it can be observed that precipitation are projected to

increase in future for A1B, A2 and B1 scenarios. The

average value of observed precipitation for last 26 years

(1975-2000) is 608 millimeters. The average value of

precipitation for 100 years (2001-2100) from SRES A1B

scenario of CCCma is 677 millimeters while average

M. GOYAL ET AL.

Table 2. Various performance statistics of model PLSRM1 and PCRM2.

CR SSE MSE RMSE NMSE N-S Index MAE

Model Training Validation Training Validation TrainingValidation Training ValidationTrain-

ing

Valida-

tion Training Valida-

tion

Train-

ing

Valida-

tion

PLSRM1 0.87 0.80 145691.44 111969.46 2023.491866.1644.9843.20 0.250.410.75 0.58 0.580.44

PCRM2 0.61 0.42 311122.00 451324.00 6112.0014509.0078.18120.450.050.030.31 0.10 −0.160.19

value of precipitation for 100 years (2001-2100) from

SRES A2 scenario of CCCma is 719 millimeters. The

mean value of precipitation for 100 years (2001-2100)

from SRES B1 scenario of CCCma is 626 millimeters

while mean value of precipitation for 100 years (2001-

2100) from COMMIT scenario of CCCma is 618 milli-

meters. Hence, it is clear that the projected increase of

precipitation is high for A1B and A2 scenarios whereas it

is least for B1 scenario.

This is because the scenario A1B and A2 have the

highest concentration of atmospheric carbon dioxide

(CO2) equal to 720 ppm and 850 ppm, while the same for

B1 and COMMIT scenarios are 550 ppm and ≈ 370

ppm respectively. Rise in concentration of atmospheric

CO2 in the atmosphere causes the earth’s average tem-

perature to increase, which in turn causes increase in

evaporation especially at lower latitudes. The evaporated

water would eventually precipitate [17]. In the COMMIT

scenario, where the emissions are held the same as in the

year 2000, no significant trend in the pattern of projected

future precipitation could be discerned. The overall re-

sults show that the projections obtained for precipitation

are indeed robust.

6. Conclusions

This paper investigates the suitability of partial least

square regression approach to downscale mean monthly

precipitation from GCM output to local scale. The effec-

tiveness of this model is demonstrated through the ap-

plication of lake catchments in arid region in India. The

predictands are downscaled from simulations of CGCM3

for four IPCC scenarios namely SRES A1B, A2, B1 and

COMMIT.

The selection of relevant predictors used for empirical

model development plays a crucial role. PLS regression

has been applied for selection of important variables

which have a VIP score greater than 0.8. PLS regression

seems to be a useful tool for downscaling. PLS regres-

sion seems to be a useful alternative to the commonly

used PCR method for empirical downscaling. The results

of downscaling models using PLS regression show that

precipitation is projected to increase in future for A2 and

A1B scenarios, whereas it is least for B1 and COMMIT

scenarios using predictors.

5. References

[1] R. Weisse and R. Oestreicher, “Reconstruction of Poten-

tial Evaporation for Water Balance Studies,” Climate Re-

search, Vol. 16, No. 2, 2001, pp. 123-131.

[2] C. Prudhomme, D. Jakob and C. Svensson, “Uncertainty

and Climate Change Impact on the Flood Regime of

Small UK Catchments,” Journal of Hydrology, Vol. 277,

No. 1, 2003, pp. 1-23.

[3] R. L. Wilby, C. W. Dawson and E. M. Barrow, “SDSM –

A Decision Support Tool for the Assessment of Climate

Change Impacts,” Environmental Modelling & Software,

Vol. 17, No. 2, 2002, pp. 147-159.

[4] M. K. Goyal and C. S. P. Ojha, “Robust Weighted Re-

gression as a Downscaling Tool in Temperature Projec-

tions,” International Journal of Global Warming. http://

www.inderscience.com/browse/index.php?journalID=331

&action=coming

[5] R. L. Wilby, “Modelling Low-Frequency Rainfall Events

Using Airflow Indices, Weather Patterns and Frontal Fre-

quencies,” Journal of Hydrology, Vol. 213, No.1-4, 1998,

pp. 380-392.

[6] A. J. Cannon and P. H. Whitfield, “Downscaling Recent

Streamflow Conditions in British Columbia, Canada Us-

ing Ensemble Neural Network Models,” Journal of Hy-

drology, Vol. 259, No. 1, 2002, pp. 136-151.

[7] S. Tripathi, V. V. Srinivas and R. S. Nanjundiah, “Down-

scaling of Precipitation for Climate Change Scenarios: A

Support Vector Machine Approach,” Journal of Hydrol-

ogy, Vol. 330, No. 3-4, 2006, pp. 621-640.

[8] S. Ghosh and P. P. Mujumdar, “Statistical Downscaling

of GCM Simulations to Streamflow Using Relevance

Vector Machine,” Advances in Water Resources, Vol. 31,

No. 1, 2008, pp. 132-146.

[9] E. Kalnay, et al., “The NCEP/NCAR 40-Year Reanalysis

Project,” Bulletin of the American Meteorological Society,

Vol. 77, No. 3, 1996, pp. 437-471.

[10] S. D. Khobragade, “Studies on Evaporation from Open

Water Surfaces in Tropical Climate,” PhD Dissertation,

Indian Institute of Technology, Roorkee, 2009.

[11] K. Bergant and L. K. Bogataj, “N-PLS Regression as

Empirical Downscaling Tool in Climate Change Studies,”

Theoretical and Applied Climatology, Vol. 81, No. 1-2,

2005, pp. 11-23.

[12] R. Manne, “Analysis of Two Partial Least Squares Algo-

rithms for Multivariate Calibration,” Chemometrics and

Intelligent Laboratory Systems, Vol. 2, No. 1, 1987, pp.

187-197.

[13] W. Svante, M. Sjostrom and L. Eriksson, “PLS-Regre-

ssion: A Basic Tool of Chemometric,” Chemometrics and

Intelligent Laboratory Systems, Vol. 58, No. 2, 2001, pp.

109-130.

[14] B. C. Hewitson and R. G. Crane, “Climate Downscaling:

M. GOYAL ET AL.

Techniques and Application,” Climate Research, Vol. 7,

1996, pp. 85-95.

[15] I. G. Chong and C. H. Jun, “Performance of Some Vari-

able Selection Methods When Multicollinearity is Pre-

sent,” Chemometrics and Intelligent Laboratory Systems,

Vol. 78, No. 1-2, 2005, pp. 103-112.

[16] L. Eriksson, E. Johansson, N. Kettaneh-Wold and S.

Wold, Multi- and Megavariate Data Analysis: Principles

and Applications, Umetrics Academy, Umeå, 2001.

[17] A. Anandhi, V. V. Srinivas, D. N. Kumar, R. S. Nanjun-

diah, “Role of Predictors in Downscaling Surface Tem-

perature to River Basin in India for IPCC SRES Scenar-

ios Using Support Vector Machine,” International Jour-

nal of Climatology, Vol. 29, No. 4, 2009, pp. 583-603.