Research shows that the projection pursuit cluster (PPC) model can form a suitable index for overcoming the difficulties of comprehensive evaluation, and it can be used to analyze complex multivariate problems. The PPC model is widely used in multifactor cluster and evaluation analysis, but some problems remain to be solved in practice, such as calibration of the cutoff radius parameter. In this study, a new model based on the projection pursuit principle, the projection pursuit dynamic cluster (PPDC) model, is developed and applied to water resources carrying capacity evaluation in China for the first time. The PPDC model makes two improvements over the PPC model: 1) a new projection index is constructed based on the dynamic cluster principle, which avoids the parameter calibration problem of the PPC model; 2) the cluster results are output directly by the PPDC model, whereas in the PPC model they must be obtained from the scatter points of the projected characteristic values or from a re-analysis of those values. The results show that the PPDC model is an effective and powerful tool for exploratory analysis of multifactor data, and that it offers a new method for water resources carrying capacity evaluation. The PPDC model and its application to water resources carrying capacity evaluation are introduced in detail in this paper.
A difficulty frequently encountered in water resources carrying capacity evaluation is that many factors are involved and the interrelationships among them are complex. The capacity cannot be evaluated according to a single factor; all the effect factors associated with water resources carrying capacity must be considered. However, there is as yet no unified standard for the evaluation index system. At present, it is difficult to resolve complex high-dimensional problems directly. If the dimensionality can be reduced effectively, multidimensional problems can be resolved in a space that can be visualized, such as a three-dimensional, two-dimensional, or even one-dimensional space.
Friedman and Tukey developed a projection pursuit principle [
Based on the projection pursuit principle, many new mathematical methods for exploratory analysis of high-dimensional data have been developed [2-8], and the projection pursuit cluster (PPC) model is one of them. The PPC model is an effective method for exploratory analysis of multifactor data and is widely used in multivariable prediction, clustering, and evaluation [9-15].
However, the PPC model has the following disadvantages in practice: 1) The cutoff radius, the only parameter in the PPC model, is hard to estimate even though it has a significant effect on the results. At present, the cutoff radius is still set based on experience, e.g. to ten percent of the square root of the data variance along the largest principal axis [
To resolve the problems mentioned above, Wang and Ni developed a projection pursuit dynamic cluster (PPDC) model and used it in the regional partition of water resources in China [
A linear projection technique is described in this study. High-dimensional data are projected onto a one-dimensional space, and the features of the high-dimensional data are studied through the projected characteristics in the one-dimensional space [
Let $x_{ij}^{0}$ ($i = 1, 2, \dots, n$; $j = 1, 2, \dots, m$, where $n$ is the total number of samples and $m$ is the total number of effect factors of each sample) be the initial value of the $j$th factor of the $i$th sample. The steps of developing the PPDC model are the following [
To eliminate the effect of the different value ranges of the cluster factors, the initial data are standardized before they are used in the PPDC model. The standardization formula used in this study is
$$x_{ij} = \frac{x_{ij}^{0} - x_{j\min}^{0}}{x_{j\max}^{0} - x_{j\min}^{0}} \qquad (1)$$
where $x_{j\max}^{0}$ and $x_{j\min}^{0}$ are the initial maximum and minimum of the $j$th factor, respectively.
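The standardization step can be sketched in Python as min-max scaling, which is what a formula built from the initial maximum and minimum of each factor suggests. The function name `min_max_standardize`, the handling of a constant factor, and the toy data are assumptions made for illustration, not taken from the paper.

```python
def min_max_standardize(data):
    """Scale each factor (column) of `data` to the range [0, 1].

    `data` is a list of samples; each sample is a list of factor values.
    """
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in data:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # Assumption: a constant factor carries no information, so map it to 0.0.
            scaled_row.append((value - low) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical toy data: 3 samples, 2 factors (not the data used in the study).
raw = [[100.0, 2.0], [300.0, 4.0], [200.0, 8.0]]
standardized = min_max_standardize(raw)
```

After scaling, every factor lies between 0 and 1, so no single factor dominates the projection merely because of its units.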
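In essence, projection means observing the characteristics of the data from all angles. The main purpose of projection pursuit is to find hidden structure in higher-dimensional data sets by searching through all their low-dimensional projections [
The projected characteristic value $z_i$ of the $i$th sample is the linear projection of the standardized data $x_{ij}$:
$$z_i = \sum_{j=1}^{m} a_j x_{ij} \qquad (2)$$
where $a = (a_1, a_2, \dots, a_m)$ is the projection axis vector, which is also called the projection direction vector in the PPC model.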
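A minimal sketch of the linear projection onto a single axis follows. It assumes the projection axis is rescaled to unit length before use, which is a common convention in projection pursuit but is an assumption here; the names `normalize` and `project` and the sample values are hypothetical.

```python
import math


def normalize(direction):
    """Rescale a direction vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("projection direction must be non-zero")
    return [c / norm for c in direction]


def project(samples, direction):
    """Return one projected characteristic value per sample."""
    a = normalize(direction)
    return [sum(aj * xj for aj, xj in zip(a, row)) for row in samples]


# Hypothetical standardized samples (3 samples, 2 factors).
samples = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
z = project(samples, [3.0, 4.0])  # the direction is rescaled to (0.6, 0.8)
```

Each sample is reduced to one number, so the clustering that follows operates on a one-dimensional set of values.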
Cluster analysis is a tool for exploratory data analysis that tries to find the intrinsic structure of data by organizing patterns into groups or clusters [
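Define $r_{ik}$ ($i, k = 1, 2, \dots, n$) as the absolute distance between the projected characteristic values $z_i$ and $z_k$, namely $r_{ik} = |z_i - z_k|$.
Let $\Omega = \{z_1, z_2, \dots, z_n\}$, and define $s(a)$ as
$$s(a) = \sum_{z_i, z_k \in \Omega} r_{ik} \qquad (3)$$
Then, assume that all samples are classified into $p$ clusters. $\Theta_h$ ($h = 1, 2, \dots, p$) is the projected characteristic value space of cluster $h$, which contains all the projected characteristic values of cluster $h$, and
$$\Theta_h = \{\, z_i \mid d(A_h - z_i) \le d(A_t - z_i),\ t = 1, 2, \dots, p,\ t \ne h \,\} \qquad (4)$$
where $d(A_h - z_i) = |z_i - A_h|$ and $d(A_t - z_i) = |z_i - A_t|$, and $A_h$ and $A_t$ are the initial cluster cores of cluster $h$ and cluster $t$, respectively. In practice, the average projected characteristic value of each cluster is used as the new cluster core, and the iteration is repeated until the criterion is met [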
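The iteration described above, in which each projected value joins its nearest cluster core and each core then moves to the average of its members, can be sketched as follows. The stopping rule (assignments no longer change), the treatment of an empty cluster, and the name `dynamic_cluster_1d` are assumptions made for this sketch; the source does not state its convergence criterion here.

```python
def dynamic_cluster_1d(z, cores, max_iter=100):
    """Cluster one-dimensional projected values around moving cluster cores.

    Returns the cluster label of each value and the final cores.
    """
    cores = list(cores)
    labels = None
    for _ in range(max_iter):
        # Assign every projected value to the nearest cluster core.
        new_labels = [
            min(range(len(cores)), key=lambda h: abs(zi - cores[h])) for zi in z
        ]
        if new_labels == labels:  # assumed criterion: assignments are stable
            break
        labels = new_labels
        # Use the average projected value of each cluster as its new core.
        for h in range(len(cores)):
            members = [zi for zi, lab in zip(z, labels) if lab == h]
            if members:  # assumption: an empty cluster keeps its old core
                cores[h] = sum(members) / len(members)
    return labels, cores


# Hypothetical projected values forming two obvious groups.
z = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
labels, cores = dynamic_cluster_1d(z, cores=[0.0, 1.0])
```

Because the labels come out of the iteration itself, no cutoff radius has to be chosen and no separate reading of a scatter plot is needed.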
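Next define
$$D_h(a) = \sum_{z_i, z_k \in \Theta_h} r_{ik} \qquad (5)$$
and
$$d(a) = \sum_{h=1}^{p} D_h(a) \qquad (6)$$
where $D_h(a)$ sums the distances between the projected characteristic values within cluster $h$.
Finally, according to $s(a)$ and $d(a)$, the new projection index in the PPDC model can be defined as
$$QQ(a) = s(a) - d(a) \qquad (7)$$
The larger the value of $s(a)$, the larger the distances between data points overall; the smaller the value of $d(a)$, the smaller the distances between data points within each cluster. The projection index measures the degree to which the data points in the projection are concentrated locally (small $d(a)$) while, at the same time, expanded globally (large $s(a)$) [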
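To make the trade-off between local concentration and global expansion concrete, the sketch below computes an index as the overall sum of pairwise distances minus the sum of pairwise distances inside each cluster. Taking the difference of the two terms and summing over ordered pairs are assumptions of this sketch, and the function names are hypothetical.

```python
def pairwise_distance_sum(values):
    """Sum of |u - v| over all ordered pairs of values."""
    return sum(abs(u - v) for u in values for v in values)


def projection_index(z, labels):
    """Global spread of the projected values minus their within-cluster spread."""
    global_spread = pairwise_distance_sum(z)
    within_spread = 0.0
    for h in set(labels):
        members = [zi for zi, lab in zip(z, labels) if lab == h]
        within_spread += pairwise_distance_sum(members)
    return global_spread - within_spread


# Hypothetical projected values: two close points and one distant point.
z = [0.0, 0.1, 1.0]
tight = projection_index(z, [0, 0, 1])  # close points share a cluster
loose = projection_index(z, [0, 1, 1])  # distant points share a cluster
```

Grouping the two close points gives the larger index, which matches the intent that a good projection and partition keep clusters compact while spreading the data as a whole.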
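According to the above analysis, the PPDC model can be expressed by
$$\max\ QQ(a) \quad \text{s.t.} \quad \|a\| = 1 \qquad (8)$$
where $QQ(a)$ is the projection index and $a$ is the projection direction vector.
Equation (8) shows that the PPDC model is an optimization problem. Genetic algorithms (GA) have been able to converge to the global optimum when coping with large and complex problems [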
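The search for the best projection direction can be illustrated with a small real-coded evolutionary search. This is a simplified stand-in for a genetic algorithm, not the authors' implementation: the population size, crossover and mutation scheme, initial cluster cores, and the form of the index (global spread minus within-cluster spread) are all assumptions, and every function name is hypothetical.

```python
import math
import random


def project(samples, direction):
    norm = math.sqrt(sum(c * c for c in direction)) or 1.0
    return [sum(c / norm * x for c, x in zip(direction, row)) for row in samples]


def cluster_1d(z, p, max_iter=50):
    lo, hi = min(z), max(z)
    # Assumption: initial cores are spread evenly over the projected range.
    cores = [lo + (hi - lo) * (h + 0.5) / p for h in range(p)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(p), key=lambda h: abs(zi - cores[h])) for zi in z]
        if new == labels:
            break
        labels = new
        for h in range(p):
            members = [zi for zi, lab in zip(z, labels) if lab == h]
            if members:
                cores[h] = sum(members) / len(members)
    return labels


def spread(values):
    return sum(abs(u - v) for u in values for v in values)


def index(samples, direction, p):
    """Assumed projection index: global spread minus within-cluster spread."""
    z = project(samples, direction)
    labels = cluster_1d(z, p)
    within = sum(
        spread([zi for zi, lab in zip(z, labels) if lab == h]) for h in range(p)
    )
    return spread(z) - within


def evolve_direction(samples, p, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    m = len(samples[0])
    population = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism) and refill with offspring.
        population.sort(key=lambda a: index(samples, a, p), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            w = rng.random()
            # Arithmetic crossover followed by Gaussian mutation.
            children.append(
                [w * u + (1 - w) * v + rng.gauss(0, 0.1) for u, v in zip(mother, father)]
            )
        population = parents + children
    best = max(population, key=lambda a: index(samples, a, p))
    norm = math.sqrt(sum(c * c for c in best)) or 1.0
    return [c / norm for c in best]  # unit-length projection direction


# Hypothetical standardized data: two groups of three samples, two factors.
toy = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2], [0.8, 0.9], [0.9, 1.0], [1.0, 0.8]]
best = evolve_direction(toy, p=2)
```

Because the direction is rescaled to unit length inside `project`, the search can work on unconstrained vectors and the unit-length condition is imposed only when the result is returned.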
The PPDC model is used in water resources carrying capacity evaluation in China. Five major factors of water resources carrying capacity are selected as the index system: 1) available water resources per capita (m³·person⁻¹), 2) available water resources per unit GDP (10⁻² m³·(RMB Yuan)⁻¹), 3) available water resources per unit of the estimated price of 45 kinds of potential resources (10⁻² m³·(RMB Yuan)⁻¹), 4) available water resources per unit arable area (m³·hm⁻²), and 5) available water resources per unit area (10⁴ m³·km⁻²). This index system reflects the water resources supporting capacity for population development (factor 1), economic development (factors 2 and 3), and eco-environment protection (factors 4 and 5). The data are shown in
The PPDC model is used to perform a cluster analysis for the regional partition of China according to water resources carrying capacity.
For comparative analysis, water resources carrying capacity is clustered in two cases, namely three clusters and four clusters. Based on the data in
The optimal projection direction is obtained for each case, p = 3 and p = 4.
The projected characteristic values z and the cluster results are also obtained, as shown in
In
The schematic diagram of regional partition of water resources carrying capacity in China is shown in
The larger the value of z, the better the water resources carrying capacity. According to the index system in this study, the results of the PPDC model lead to four major conclusions: 1) Water resources carrying capacity is better in south China than in north China. Tibet Autonomous Region, Guangdong Province, and Fujian Province are the three regions with the best water resources carrying capacity in China. That is to say, in the regions of cluster 1, social and economic development may be well suited to the water resources situation. 2) Most regions with poor water resources carrying capacity are concentrated in north China and the Gansu Panhandle. Ningxia Hui Autonomous Region has the most serious water resources carrying capacity situation, followed by Inner Mongolia Autonomous Region, Gansu Province, and Xinjiang Uygur Autonomous Region. 3) The cluster results in this study are consistent with the actual situation in China. Because many rivers, such as the Yangtze River, Ya-lu-tsang-pu River, Nujiang-Salween River, Lancangjiang-Mekong River, and Pearl River, run through or rise in the southern part of China, water resources are abundant in south China, and its water resources carrying capacity is correspondingly good. Therefore, the South-to-North Water Transfer Project, which is now being implemented, is one of the effective measures for improving the water resources carrying capacity of north China. 4) The regional distribution of water resources carrying capacity is similar to that of water resources quantity in China [
The PPDC model combines the dynamic cluster method with the projection pursuit principle, which is an effective improvement on the PPC model. Because no parameter calibration is needed and the required final result is output directly, the PPDC model is easy to operate in practice. The study shows that the PPDC model offers a new method for water resources carrying capacity evaluation. However, the application of the PPDC model to multifactor evaluation needs further improvement. In addition, water quality is one of the main factors of water resources carrying capacity because it is related to the availability of water resources. Owing to a lack of water quality data, the evaluation index system in this research contains no water quality indexes, and the evaluation focuses mainly on water resources quantity rather than water quality.
This work is part of the Program of China Meteorological Administration (CCFS-09-19) and Institute of Plateau Meteorology of China Meteorological Administration (BROP200801 and BROP200907). The constructive comments and suggestions from the editor and anonymous reviewers, which resulted in a significant improvement of the manuscript, are gratefully appreciated. The opinions expressed here are those of the authors and not those of other individuals or organizations.