
This study develops a statistical modelling framework for forecasting urban water consumption in the city of Aquidauana, Brazil, using monthly data from 2005 to 2014 and multiple linear regression, cluster analysis, and principal component analysis. The forecasts were based on historical data collected through the SANESUL System (Water Systems of South Mato Grosso). The meteorological data were provided by the Water Resources Monitoring Center of South Mato Grosso (CEMTEC). The statistical model developed explains 71.5% of the variance with three factors: number of consumers (19.3%), seasonality (37.8%), and climate regression (14.3%). The model was further validated using an independent set of data from January 2005 to November 2014, with an R^{2} of 86% and an error of 1.7%. The results indicated no intervention of climate variables in the phenomenon. Combined with an understanding of its potential and limitations on the part of water resource managers and public policy makers, this tool can be used in the regulation of per capita consumption, thereby helping to optimize available resources and contributing to the sustainable use of water resources.

Research needs for addressing climate change impacts with hydrological models include estimates of scaling parameters, model validation, generation of climate scenarios and data, and modular modeling tools that provide a framework to facilitate interdisciplinary research. Solutions to these problems would significantly improve the ability of models to assess the effects of climate change [

Assessment of seasonal and long-term water availability is not only important for sustaining human life, biodiversity and the environment, but also helpful for water authorities and farmers to determine agricultural water management and water allocation. Climate change is one of the greatest pressures on the hydrological cycle along with population growth, pollution, land use changes, and other factors [

Many studies have evaluated climate change impacts on streamflow, including spatial components of water availability, using various modelling methods across the world's climates [

Climate impacts on water resources vary between river basins. The frequency of droughts and floods will increase under future climate conditions. Runoff and streamflow are more sensitive to rainfall than to evapotranspiration. Efficient water use and integrated management will be increasingly important for reducing the impacts of water scarcity and droughts. Although many water management approaches have been adapted to mitigate climate impacts, there is still a need to determine local solutions. It is necessary to know how much water can be used in each irrigation area and river basin, when the water is available, how much water can be stored for use in drought periods, how the quantity of water resources varies over the long term, and how it is linked to energy and biodiversity.

The aim of this study is to develop a statistical model to predict future urban water consumption as a result of climatic changes, and to develop a better understanding of it by considering economic and climatic factors.

For this study, the consumption region of the city of Aquidauana was chosen, with an average daily water consumption of 381 liters/day. Temperature is one of the factors that can influence water consumption [

Aquidauana is located in the south of the Brazilian Midwest region, in the Pantanal (wetlands) of South Mato Grosso, in the micro-region of Aquidauana. It lies at latitude 20°28′15″ South and longitude 55°47′13″ West, at an altitude of 149 m, between the Piraputanga and Maracaju mountain ranges. Its territory is divided into two parts: the low one (two-thirds of the town) and the high one (in the mountain ranges).

The region has a tropical climate, with an annual average temperature of 27°C and two contrasting seasons. The period between October and April is marked by floods and high temperatures, while the period from mid-July to the end of September is one of drought, with frosts and milder temperatures of approximately 15°C. The municipality occupies an area of 16,958.496 km^{2}.

SPSS (Statistical Package for the Social Sciences) and

The data were processed using the following resources: descriptive statistics, correlation analysis, scatter plots, hierarchical cluster analysis, principal component analysis, and finally, statistical modeling and model validation based on residual analysis.

Regarding the statistical model, the reasonable number of intervening variables guided the use of multiple regression analysis, with correlation as the limiting indicator of the participation of these variables in the model. Therefore, the variables with the highest correlation coefficients were taken as the explanatory variables of the model. From the reduced variables,

where

The Seasonal coefficient indicator (SC) includes the effect of seasonality on models and here it was calculated by the ratio between the measured volume of the month by the average of the measured volume of the year, as seen in Equation (2):
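The ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `seasonal_coefficients` and the monthly volumes are hypothetical and are not taken from the SANESUL data.

```python
from statistics import mean


def seasonal_coefficients(monthly_volumes):
    """Return the seasonal coefficient (SC) of each month in one year.

    SC = measured volume of the month / average measured volume of the year.
    """
    yearly_mean = mean(monthly_volumes)
    return [volume / yearly_mean for volume in monthly_volumes]


# Hypothetical monthly measured volumes (m^3) for one year, illustration only.
volumes = [150000, 145000, 140000, 135000, 130000, 125000,
           120000, 132000, 136000, 140000, 146000, 151000]

sc = seasonal_coefficients(volumes)
for month, value in enumerate(sc, start=1):
    print(f"month {month:2d}: SC = {value:.3f}")
```

By construction the coefficients of one year average to 1, so values above 1 mark months of above-average consumption and values below 1 mark months of below-average consumption.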

In this study, a descriptive analysis of the variables was carried out and, subsequently, the hypotheses were tested using multiple regression models. The root mean square error (RMSE) was used to verify the accuracy of the model.

where:
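As a minimal sketch, the RMSE can be computed in its standard form, the square root of the mean squared difference between observed and estimated values. The function name `rmse` and the sample values are assumptions made for illustration; the standard definition is assumed here because the equation itself is not reproduced in the text.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and estimated monthly volumes (m^3), illustration only.
observed = [139000, 142500, 131000]
estimated = [138200, 143900, 132100]
print(f"RMSE = {rmse(observed, estimated):.1f} m^3")
```

The RMSE is expressed in the same unit as the consumption, which makes it directly comparable with the measured volumes.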

Multivariate analysis techniques, such as principal component analysis (PCA), are powerful tools for analysing a large number of variables. They reduce the dimension of the observation matrix without losing the important information in the original data, thus enabling further investigation of the time-space behaviour of the variables involved in the problem, as well as the detection of groups of variables that behave homogeneously. The method aims to describe the data contained in a matrix of individuals by numerical characters, in which p characters are measured on n individuals.

The basic input of principal component analysis is the data matrix. With m variables in n observations, the normalized data matrix (with zero mean and unit variance) of the wind speed can be written as an m × n matrix, indicated by Z, from which the correlation matrix R, given in Equation 1, can be obtained.

where

Due to the eigenvector orthogonality, the inverse of

Each row of Z corresponds to a principal component (MC) that forms the time series associated with the eigenvalues. Values of Y at the n-th location may be calculated by:

The solution to this equation is unique. It accounts for the total variation present in the initial group of variables: MC1 explains the maximum possible variance of the initial data, MC2 explains the maximum possible variance still unexplained, and so forth, until the last component, MCm, which contributes the smallest share of the total variance of the initial data.

In this study, every MC accounts for a portion of the total variance of the wind speed monthly data, and the MCs are arranged in decreasing order of the most significant eigenvalues of a_{1} in A, given by:

Total variance of the system (V) is defined as the sum of the variances of the observed variables; therefore, V is given by:

where S is the variance of the observed variables, and λᵢ are the eigenvalues. The total variance can therefore also be understood as the trace of the correlation matrix, that is, the sum of its main diagonal.

The variance explained by each component is:

The number of MCs retained was based on the Kaiser truncation criterion, which considers as most significant those eigenvalues that are greater than one.
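The relations described above (total variance as the sum of the eigenvalues, the share explained by each component, and the Kaiser criterion) can be sketched as follows. The eigenvalues are hypothetical, chosen only so that they sum to the number of variables as the eigenvalues of a correlation matrix do; they are not the eigenvalues obtained in this study, and the function names are assumptions.

```python
def explained_variance(eigenvalues):
    """Share of the total variance explained by each component.

    The total variance V is the sum of the eigenvalues, which for a
    correlation matrix equals its trace (the number of variables).
    """
    total_variance = sum(eigenvalues)
    return [value / total_variance for value in eigenvalues]


def kaiser_select(eigenvalues):
    """Return the 1-based ranks of components kept by the Kaiser criterion.

    Components are ranked by decreasing eigenvalue and kept when the
    eigenvalue is greater than one.
    """
    ranked = sorted(eigenvalues, reverse=True)
    return [rank for rank, value in enumerate(ranked, start=1) if value > 1.0]


# Hypothetical eigenvalues of a 6 x 6 correlation matrix, illustration only.
eigenvalues = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]

shares = explained_variance(eigenvalues)
kept = kaiser_select(eigenvalues)
print("explained shares:", [round(share, 3) for share in shares])
print("components kept by the Kaiser criterion:", kept)
```

With these illustrative values, the first three components have eigenvalues above one and would be retained, while the remaining three would be discarded.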

There are two types of group classification methods or algorithms. One is the hierarchical method, in which the number of groups is not defined in advance. The larger groups are successively divided into smaller subgroups of individuals with similar characteristics. The final structure of classes is presented as a classification tree (dendrogram), which gives an objective summary of the results. The other is the non-hierarchical method, in which the number of groups is set a priori. In both clustering methods, individuals are classified into groups by means of a grouping function and a mathematical grouping criterion [

This is a hierarchical method that uses the Euclidean distance to measure the similarity or dissimilarity between individuals; that is, the distance between individuals X_{i} and X_{j} is given by [

Ward proposes that at any stage of the analysis the loss of information, which results from the grouping of individuals into clusters, is measured by the total sum of squared deviations (SQD) of every point from the mean of the cluster to which it belongs [

where n is the total number of the elements of the grouping and x_{i} is the ith element of the grouping.
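The two quantities used by this method, the Euclidean distance between individuals and the sum of squared deviations (SQD) of a cluster, can be illustrated with the sketch below. It assumes the usual definition of SQD as the sum of squared deviations from the cluster mean; the function names and the sample points are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two individuals given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sqd(cluster):
    """Sum of squared deviations of every point from the mean of its cluster."""
    n = len(cluster)
    dims = len(cluster[0])
    centroid = [sum(point[k] for point in cluster) / n for k in range(dims)]
    return sum((point[k] - centroid[k]) ** 2
               for point in cluster for k in range(dims))


def merge_cost(cluster_a, cluster_b):
    """Loss of information (increase in total SQD) when two clusters merge."""
    return sqd(cluster_a + cluster_b) - sqd(cluster_a) - sqd(cluster_b)


# Hypothetical two-dimensional individuals, illustration only.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 0.0), (4.0, 2.0)]

print("distance:", euclidean((0.0, 0.0), (3.0, 4.0)))
print("SQD of a:", sqd(a))
print("cost of merging a and b:", merge_cost(a, b))
```

At each stage, Ward's method merges the pair of clusters with the smallest merge cost, so that the loss of information grows as slowly as possible.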

The survey of average monthly consumption showed that it varies throughout the year, being higher in summer, with a peak in January, and lower in winter, especially in July. In general, consumption tends to decrease from March onwards and to increase from November onwards. August shows a peak relative to the other winter months, a result of the dry weather during this period, which increases consumption. During the week, Sunday is the day of lowest consumption and Friday the day of highest. Wednesdays and Saturdays are close to the average consumption.

The same may occur in relation to consumption throughout the day. In general, peak consumption begins at 12:00 p.m. and remains more or less constant, with minor variations, until 5:00 p.m. It then begins to decrease at about 6:00 p.m., becoming nearly constant between 9:00 p.m. and 12:00 a.m. Consumption is reduced between 1:00 and 6:00 a.m., with the minimum occurring at 6:00 a.m. After this period, it starts to increase again.

We used 3285 observations, whose variables were classified according to class, month, seasonality, rainfall (rain), temperature (Temp), relative humidity (RH), wind speed, and number of consumers. The statistical validation sample size (n) was determined by calculating the size of a stratified random sample for mean estimates in a finite population [

Descriptive statistics are presented in

Weak positive and negative correlations between the variables were observed. The graphical representation of the quantitative variables made it possible to assess their joint behavior and whether or not they are associated. A very useful device for verifying such an association is the scatter plot [

Figures 1-3 show a lack of interaction between the variables, suggesting no association between water consumption and air temperature, rainfall, wind speed, or relative humidity. This result was also confirmed according to

The graphical analysis of Figures 1-3 presented correlation trends, since they show increasing linear trend and formation of dispersion clouds, respectively. ViannaI, V. and Depexe, M. D. [

Statistic | Max. temperature | Max. humidity | Wind speed | Rain | Consumption (m^{3}) | Number of consumers | SC | Estimated |
---|---|---|---|---|---|---|---|---|
Average | 32.17 | 90.20 | 13.10 | 6.61 | 139,274 | 12,781 | 1.00 | 139,669 |
Standard deviation | 2.53 | 5.93 | 2.55 | 30.67 | 15,416 | 658 | 0.08 | 14,348 |
Maximum | 37.37 | 96.77 | 30.96 | 301.13 | 185,999 | 13,982 | 1.22 | 175,660 |
Minimum | 23.53 | 68.00 | 8.64 | 0.00 | 107,529 | 11,793 | 0.84 | 109,622 |

Variable | Consumption | Temperature | Humidity | Wind speed | Rain | Number of consumers |
---|---|---|---|---|---|---|
Temperature | 0.909 | | | | | |
Humidity | 0.599 | 0.566 | | | | |
Wind speed | 0.66 | 0.617 | −0.021 | | | |
Rain | 0.101 | 0.124 | 0.048 | −0.284 | | |
Number of consumers | 0.791 | 0.766 | 0.845 | 0.292 | 0.027 | |
SC | 0.998 | 0.911 | 0.598 | 0.684 | 0.082 | 0.785 |

Significant correlation is noted at the 0.01 level.

analyzes. Thus, the accuracy and errors obtained were analyzed. At the end of the study, an equation capable of predicting the volume of consumed water from Umuarama with acceptable errors was reached.

Lins et al., [

The scatter plot, shown in

Daily urban water consumption in Aquidauana from 2005 to 2014 was modeled and the statistical model developed explains 71.5% of the variance with the following three factors: number of consumers (19.3%), seasonality (37.8%), and climate regression (14.3%). The model was further validated using an independent set of data from January 2014 to August 2014, yielding an R^{2} of 86%. The results indicated a good performance of the statistical model developed to describe the temporal variations of the use of urban water in Aquidauana.

Considering also the selection of the intervening variables in water demand, a cluster analysis was performed, which is a set of statistical techniques whose aim is to group objects according to their characteristics, forming groups or homogeneous clusters [

Wong, J. S., Zhang, Q. and Chen, Y. D. [ R^{2} of 76%.

The relationship between detrended seasonal urban water use and weather and climate variables (precipitation, maximum temperature) has been examined at daily, monthly, and seasonal scales using stepwise multiple regression and autoregressive integrated moving average (ARIMA) models. At seasonal and monthly timescales, interannual variation in maximum temperature is the most important predictor of seasonal water consumption per capita, explaining up to 48% of the variation in seasonal monthly water consumption. At a daily scale, one-day lagged seasonal water demand and maximum temperature are the variables that are significant in all the daily models. Together with day of the week and precipitation, these variables explained up to 87% of the variation in seasonal daily water consumption in summer. ARIMA models that take into account temporal autocorrelation explain between 70 and 81% of daily seasonal water consumption in summer months [

Following the selection of variables, a statistical model was developed using regression techniques, yielding a multiple linear equation. For Aquidauana, the coefficients of the model, at a significance level of 1% for the F test, were the following: linear coefficient (intercept) = −155,517; maximum temperature = 118; minimum humidity = −74; wind speed = −297; rain = 4.8; number of consumers = 13.4; and seasonality = 130,186, with an error of 1.7% and R^{2} = 0.865.
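To make the use of these coefficients concrete, the sketch below evaluates the linear model for one set of inputs. The coefficients are the ones reported above; the function and parameter names are hypothetical. The inputs are the averages from the descriptive statistics table, with one caveat: that table reports maximum humidity, which is used here in place of minimum humidity purely as an assumption for illustration. The output is assumed to be a monthly volume in m^3.

```python
# Coefficients reported for Aquidauana in the text above.
COEFFICIENTS = {
    "intercept": -155517.0,
    "max_temperature": 118.0,
    "min_humidity": -74.0,
    "wind_speed": -297.0,
    "rain": 4.8,
    "consumers": 13.4,
    "seasonality": 130186.0,
}


def predict_consumption(max_temperature, min_humidity, wind_speed,
                        rain, consumers, seasonality):
    """Estimate water consumption with the multiple linear model."""
    c = COEFFICIENTS
    return (c["intercept"]
            + c["max_temperature"] * max_temperature
            + c["min_humidity"] * min_humidity
            + c["wind_speed"] * wind_speed
            + c["rain"] * rain
            + c["consumers"] * consumers
            + c["seasonality"] * seasonality)


# Table averages; the humidity value is the table's maximum humidity,
# substituted for minimum humidity as an illustrative assumption.
estimate = predict_consumption(
    max_temperature=32.17,
    min_humidity=90.20,
    wind_speed=13.10,
    rain=6.61,
    consumers=12781,
    seasonality=1.00,
)
print(f"estimated consumption: {estimate:,.0f} m^3")
```

With these inputs the estimate falls close to the average measured consumption in the descriptive statistics table, which suggests the coefficients are being applied on the intended scale.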

The verification of the adequacy of the model was performed using residual analysis in

By analyzing the histogram, it was first concluded that the residuals were normally distributed, since the frequencies appeared close to the normal distribution curve. However, the Kolmogorov-Smirnov test did not confirm the hypothesis of normality. Since the residual analysis diagnosed a violation of the initial assumptions, it is recommended to check for possible biases so that the model fits the data and the assumptions made.

There is a positive relationship between water consumption and temperature, and an inverse relationship with precipitation [

Water use has been studied in terms of seasonal or daily variations [

To draw meaningful inferences on water consumption as it relates to weather and climate variability, multi-scale analysis is needed. Multi-scale temporal analyses allow us to project short-term and long-term water demand based on the fluctuations of climate variables, namely temperature and precipitation. Water resource managers need not only seasonal climate but also daily weather information as they relate to water supply and demand, and may need to identify the most important variables for short-term operational (i.e. daily, weekly) and mid- to long-term tactical or strategic (i.e. monthly, seasonal, yearly) planning [

The validation of the model was performed by the relative error method [ R^{2} of 86.5%. An independent check was conducted using daily water consumption for Campo Grande from 1 January 2005 to 31 December 2014 (

The average per capita water consumption observed in the regions was 175 L/inhabitant per day, which is consistent with typical values for this community size [

In addition, the premise of a correlation between water consumption and socio-economic factors can be confirmed, that is, the hypothesis that the distribution of water consumption differs significantly with the socio-economic conditions of the population. Further investigation should be directed at adjusting the proposed model by including economic variables, holiday periods, and their interactions.

Models applied to the management of water resources offer a useful tool that can assist in the expansion and regulation of water supply, taking the local context as a parameter for projecting and optimizing demand variability.

The authors thank the universities for releasing their faculty members to carry out this work, the Sanitation Company Basico de Agua de Mato Grosso do Sul for releasing the water consumption and number-of-consumers data, and the Climate Monitoring Centre and Water Resources of Mato Grosso do Sul for releasing the climate data.

Amaury de Souza, Flavio Aristone, Ismail Sabbah, Debora A. da Silva Santos, Ana Paola de Souza Lima, Gabriela Lima (2015) Climatic Variations and Consumption of Urban Water. Atmospheric and Climate Sciences, 05, 292-301. doi: 10.4236/acs.2015.53022