Fuzzy Time Series Forecasting Based on K-Means Clustering

Zhiqiang Zhang
Department of Statistics, School of Economics, Xiamen University, Xiamen, P.R. China
e-mail: jsxzx06@xmu.edu.cn

Qiong Zhu
School of Mathematical Science, Xiamen University, Xiamen, P.R. China
e-mail: 516191479@qq.com

Abstract: Many forecasting models based on the concepts of fuzzy time series have been proposed in the past decades. These models have been widely applied to various problem domains, especially to forecasting problems in which the historical data are linguistic values. In this paper, we present a new fuzzy time series forecasting model, which uses the historical data as the universe of discourse, clusters the universe of discourse with the K-means clustering algorithm, and then adjusts the clusters into intervals. The proposed method is applied to forecasting the enrollments of the University of Alabama. It is shown that the proposed model achieves a significant improvement in forecasting accuracy compared with other fuzzy time series forecasting models.

Keywords: fuzzy time series; fuzzy sets; K-means; enrollments

1. Introduction

A drawback of traditional forecasting methods is that they cannot deal with forecasting problems in which the historical data are represented by linguistic values. Using fuzzy time series to deal with forecasting problems can overcome this drawback. Song and Chissom [1][2] pioneered the study of fuzzy time series models and were the first to use the historical enrollment data of the University of Alabama. Because fuzzy time series perform better in some kinds of forecasting problems, many researchers have since proposed different fuzzy time series models in order to improve the forecasting accuracy. Chen [3] presented a simplified method of fuzzy time series forecasting of enrollments that uses arithmetic operations rather than complicated max-min composition operations.
Wang, Chen, and Lee [4] considered a high-order time-variant fuzzy time series model for enrollment forecasting. Huarng [5] presented a heuristic model for fuzzy time series that uses heuristic knowledge to improve the forecast of enrollments. Jilani, Burney, and Ardil [6] used a triangular function to define the fuzzy sets.

In this paper, we present a new method to forecast enrollments based on K-means clustering techniques. First, we select the historical data as the universe of discourse. Then we apply the K-means clustering algorithm to cluster the data into intervals of different lengths. Based on the intervals obtained, we propose a new method to forecast the enrollments of the University of Alabama. The proposed model is easy to implement and forecasts more accurately than the fuzzy time series methods in [1] to [6].

The rest of this paper is organized as follows. In Section 2, we briefly review the basic definitions of fuzzy time series models. In Section 3, we present a new method for handling forecasting problems based on K-means clustering techniques and illustrate it by forecasting the enrollments of the University of Alabama. In Section 4, we compare the proposed forecasting model with existing methods. Finally, a summary and conclusions are given in Section 5.

2. Fuzzy Time Series

In this section, we briefly review some basic concepts of fuzzy time series proposed by Song and Chissom [1][2], where the values of fuzzy time series are represented by fuzzy sets.

Let $U$ be the universe of discourse, where $U = \{u_1, u_2, \ldots, u_n\}$. A fuzzy set $A$ in the universe of discourse $U$ can be represented by

$$A = \frac{f_A(u_1)}{u_1} + \frac{f_A(u_2)}{u_2} + \cdots + \frac{f_A(u_n)}{u_n} \tag{1}$$

where $f_A$ is the membership function of the fuzzy set $A$, $f_A : U \to [0, 1]$, $f_A(u_i)$ denotes the grade of membership of $u_i$ in the fuzzy set $A$, and $1 \le i \le n$.
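To make the notation of Eq. (1) concrete, the following minimal sketch stores a discrete fuzzy set as a mapping from elements of the universe of discourse to membership grades and prints it in the "grade/element" form. The element names, the grades, and the helper `fuzzy_set_notation` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical universe of discourse and membership grades (illustration only).
universe = ["u1", "u2", "u3", "u4"]
grades = {"u1": 0.5, "u2": 1.0, "u3": 0.5, "u4": 0.0}


def fuzzy_set_notation(universe, f_A):
    """Render a discrete fuzzy set in the form f_A(u_1)/u_1 + ... + f_A(u_n)/u_n."""
    # Every membership grade must lie in [0, 1].
    if not all(0.0 <= f_A[u] <= 1.0 for u in universe):
        raise ValueError("membership grades must lie in [0, 1]")
    # The "+" and "/" are separators of the notation, not arithmetic operations.
    return " + ".join(f"{f_A[u]}/{u}" for u in universe)


print(fuzzy_set_notation(universe, grades))
```

Here the "+" and "/" symbols only separate the pairs of grade and element, as in Eq. (1); no sum or quotient is computed.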
Let $Y(t)$ $(t = 0, 1, 2, \ldots)$, a subset of $\mathbb{R}$, be the universe of discourse on which fuzzy sets $f_i(t)$ $(i = 1, 2, 3, \ldots)$ are defined, and let $F(t)$ be the collection of the $f_i(t)$. Then $F(t)$ is called a fuzzy time series on $Y(t)$.

If there exists a fuzzy logical relationship $R(t-1, t)$ such that

$$F(t) = F(t-1) \circ R(t-1, t),$$

where both $F(t)$ and $F(t-1)$ are fuzzy sets and the symbol "$\circ$" is the max-min composition operator, then $F(t)$ is said to be derived from $F(t-1)$, denoted by the fuzzy logical relationship $F(t-1) \to F(t)$.

If $F(t-1) = A_i$ and $F(t) = A_j$, where $A_i$ and $A_j$ are fuzzy sets, then the fuzzy logical relationship between $F(t-1)$ and $F(t)$ can be represented by $A_i \to A_j$, where $A_i$ and $A_j$ are called the current state and the next state of the fuzzy logical relationship, respectively.

Open Journal of Applied Sciences, Supplement: 2012 World Congress on Engineering and Technology. Copyright © 2012 SciRes.

The support of the National Social Science Fund Project (11BTJ001), the MOE Key Laboratory of Econometrics, and the Fujian Key Laboratory of Statistical Sciences is gratefully acknowledged.

3. A New Method for Fuzzy Time Series Forecasting

In this section, we present the stepwise procedure of the proposed method for fuzzy time series forecasting based on historical time series data and apply the proposed method to forecast the enrollments of the University of Alabama. TABLE 1 shows the historical enrollment data of the University of Alabama.

TABLE 1. HISTORICAL ENROLLMENTS OF THE UNIVERSITY OF ALABAMA

| Year | Actual enrollments | Year | Actual enrollments |
|------|--------------------|------|--------------------|
| 1971 | 13055 | 1982 | 15433 |
| 1972 | 13563 | 1983 | 15497 |
| 1973 | 13867 | 1984 | 15145 |
| 1974 | 14696 | 1985 | 15163 |
| 1975 | 15460 | 1986 | 15984 |
| 1976 | 15311 | 1987 | 16859 |
| 1977 | 15603 | 1988 | 18150 |
| 1978 | 15861 | 1989 | 18970 |
| 1979 | 16807 | 1990 | 19328 |
| 1980 | 16919 | 1991 | 19337 |
| 1981 | 16388 | 1992 | 18876 |

The proposed method and the experimental results are presented as follows.

Step 1: Apply the K-means clustering algorithm to partition the historical time series data into 14 clusters and sort the data in each cluster in ascending order. The results are as follows:

{13055}, {13563}, {13867}, {14696}, {15145, 15163}, {15311, 15433, 15460, 15497}, {15603}, {15861}, {15984}, {16388}, {16807, 16859, 16919}, {18150}, {18876, 18970}, {19328, 19337}.

Step 2: Calculate the cluster center $\mathit{cluster\_center}_m$ of each cluster $\mathit{cluster}_m$, shown in TABLE 2, as follows:

$$\mathit{cluster\_center}_m = \frac{\sum_{j=1}^{r} d_j}{r} \tag{2}$$

where $d_1, \ldots, d_r$ are the $r$ data values in $\mathit{cluster}_m$.

Step 3: Adjust the clusters into intervals according to the following rules. Assume that $\mathit{cluster\_center}_m$ and $\mathit{cluster\_center}_{m+1}$ are adjacent cluster centers. Then the upper bound $\mathit{cluster\_uBound}_m$ of $\mathit{cluster}_m$ and the lower bound $\mathit{cluster\_lBound}_{m+1}$ of $\mathit{cluster}_{m+1}$, shown in TABLE 2, can be calculated as follows:

$$\mathit{cluster\_uBound}_m = \frac{\mathit{cluster\_center}_m + \mathit{cluster\_center}_{m+1}}{2} \tag{3}$$

$$\mathit{cluster\_lBound}_{m+1} = \mathit{cluster\_uBound}_m \tag{4}$$

where $m = 1, 2, \ldots, k-1$. Because there is no cluster before the first cluster and no cluster after the last cluster, the lower bound $\mathit{cluster\_lBound}_1$ of the first cluster and the upper bound $\mathit{cluster\_uBound}_k$ of the last cluster are calculated as follows:

$$\mathit{cluster\_lBound}_1 = \mathit{cluster\_center}_1 - (\mathit{cluster\_uBound}_1 - \mathit{cluster\_center}_1)$$

$$\mathit{cluster\_uBound}_k = \mathit{cluster\_center}_k + (\mathit{cluster\_center}_k - \mathit{cluster\_lBound}_k)$$

After applying the procedure, we obtain the following intervals; the middle value of each interval is given in TABLE 2:

$u_1 = [12801, 13309)$, $u_2 = [13309, 13715)$, $u_3 = [13715, 14282)$, $u_4 = [14282, 14925)$, $u_5 = [14925, 15290)$, $u_6 = [15290, 15514)$, $u_7 = [15514, 15732)$, $u_8 = [15732, 15923)$, $u_9 = [15923, 16186)$, $u_{10} = [16186, 16625)$, $u_{11} = [16625, 17506)$, $u_{12} = [17506, 18537)$, $u_{13} = [18537, 19128)$, $u_{14} = [19128, 19537]$.

TABLE 2. THE INTERVALS GENERATION PROCESS FROM THE CLUSTERS OF THE HISTORICAL ENROLLMENTS OF THE UNIVERSITY OF ALABAMA

| Cluster | Data | Cluster center | Lower bound | Upper bound | Middle value |
|---------|------|----------------|-------------|-------------|--------------|
| 1 | {13055} | 13055 | 12801 | 13309 | 13055 |
| 2 | {13563} | 13563 | 13309 | 13715 | 13512 |
| 3 | {13867} | 13867 | 13715 | 14281.5 | 13998 |
| 4 | {14696} | 14696 | 14281.5 | 14925 | 14603.25 |
| 5 | {15145, 15163} | 15154 | 14925 | 15289.6 | 15107.3 |
| 6 | {15311, 15433, 15460, 15497} | 15425.25 | 15289.6 | 15514.1 | 15401.9 |
| 7 | {15603} | 15603 | 15514.1 | 15732 | 15623.1 |
| 8 | {15861} | 15861 | 15732 | 15922.5 | 15827.25 |
| 9 | {15984} | 15984 | 15922.5 | 16186 | 16054.3 |
| 10 | {16388} | 16388 | 16186 | 16624.85 | 16405.4 |
| 11 | {16807, 16859, 16919} | 16861.7 | 16624.85 | 17505.85 | 17065.4 |
| 12 | {18150} | 18150 | 17505.85 | 18536.5 | 18021.2 |
| 13 | {18876, 18970} | 18923 | 18536.5 | 19127.8 | 18832.2 |
| 14 | {19328, 19337} | 19332.5 | 19127.8 | 19537.3 | 19332.6 |

Step 4: Define each fuzzy set $X_i$ based on the intervals and the historical enrollments shown in TABLE 1, where the fuzzy set $X_i$ denotes a linguistic value of the enrollments. As in [6], we use a triangular function to define the fuzzy sets $X_i$.

Step 5: Defuzzify the fuzzy data using the forecasting formula

$$t_j = \begin{cases} \dfrac{1.5}{\dfrac{1}{a_1} + \dfrac{0.5}{a_2}}, & \text{if } j = 1, \\[3ex] \dfrac{2}{\dfrac{0.5}{a_{j-1}} + \dfrac{1}{a_j} + \dfrac{0.5}{a_{j+1}}}, & \text{if } 2 \le j \le n-1, \\[3ex] \dfrac{1.5}{\dfrac{0.5}{a_{n-1}} + \dfrac{1}{a_n}}, & \text{if } j = n, \end{cases} \tag{5}$$

where $a_{j-1}$, $a_j$, $a_{j+1}$ are the midpoints of the fuzzy intervals $X_{j-1}$, $X_j$, $X_{j+1}$, respectively, and $t_j$ yields the predicted enrollment. The forecasted enrollments are given in TABLE 3.

TABLE 3.
FORECASTING OF THE PROPOSED MODEL

| Year | Enrollments | Fuzzy set | Forecast | Year | Enrollments | Fuzzy set | Forecast |
|------|-------------|-----------|----------|------|-------------|-----------|----------|
| 1971 | 13055 | X1 | 13204 | 1982 | 15433 | X6 | 15381 |
| 1972 | 13563 | X2 | 13511 | 1983 | 15497 | X6 | 15381 |
| 1973 | 13867 | X3 | 14017 | 1984 | 15145 | X5 | 15049 |
| 1974 | 14696 | X4 | 14567 | 1985 | 15163 | X5 | 15049 |
| 1975 | 15460 | X6 | 15381 | 1986 | 15984 | X9 | 16082 |
| 1976 | 15311 | X6 | 15381 | 1987 | 16859 | X11 | 17120 |
| 1977 | 15603 | X7 | 15617 | 1988 | 18150 | X12 | 17963 |
| 1978 | 15861 | X8 | 15832 | 1989 | 18970 | X13 | 18743 |
| 1979 | 16807 | X11 | 17120 | 1990 | 19328 | X14 | 19163 |
| 1980 | 16919 | X11 | 17120 | 1991 | 19337 | X14 | 19163 |
| 1981 | 16388 | X10 | 16474 | 1992 | 18876 | X13 | 18743 |

4. A Comparison of Different Forecasting Methods

In this section, the accuracy of the values forecasted by the proposed model is compared with that of other models on the basis of the mean square error (MSE) of the forecasted values, computed as

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (\mathit{actual\_value}_i - \mathit{forecasted\_value}_i)^2}{n} \tag{6}$$

where $n$ is the number of years for which the enrollments are forecasted. The MSE of the proposed method is compared with that of the other methods in TABLE 4 and TABLE 5.

TABLE 4. A COMPARISON OF MSE OF THE PROPOSED METHOD WITH THE EXISTING METHODS

| Year | Enrollment | Song [1] | Song [2] | Chen [3] | Wang [4] |
|------|------------|----------|----------|----------|----------|
| 1971 | 13055 | - | - | - | - |
| 1972 | 13563 | 14000 | - | 14000 | - |
| 1973 | 13867 | 14000 | - | 14000 | - |
| 1974 | 14696 | 14000 | - | 14000 | - |
| 1975 | 15460 | 15500 | 14700 | 15500 | - |
| 1976 | 15311 | 16000 | 14800 | 16000 | 16260 |
| 1977 | 15603 | 16000 | 15400 | 16000 | 15511 |
| 1978 | 15861 | 16000 | 15500 | 16000 | 16003 |
| 1979 | 16807 | 16000 | 15500 | 16000 | 16261 |
| 1980 | 16919 | 16813 | 16800 | 16833 | 17407 |
| 1981 | 16388 | 16813 | 16200 | 16833 | 17119 |
| 1982 | 15433 | 16789 | 16400 | 16833 | 16188 |
| 1983 | 15497 | 16000 | 16800 | 16000 | 14833 |
| 1984 | 15145 | 16000 | 16400 | 16000 | 15497 |
| 1985 | 15163 | 16000 | 15500 | 16000 | 14745 |
| 1986 | 15984 | 16000 | 15500 | 16000 | 15163 |
| 1987 | 16859 | 16000 | 15500 | 16000 | 16384 |
| 1988 | 18150 | 16813 | 16800 | 16833 | 17659 |
| 1989 | 18970 | 19000 | 19300 | 19000 | 19150 |
| 1990 | 19328 | 19000 | 17800 | 19000 | 19770 |
| 1991 | 19337 | 19000 | 19300 | 19000 | 19928 |
| 1992 | 18876 | - | 19600 | 19000 | 15837 |
| MSE | - | 775687 | 407507 | 321418 | 226611 |

TABLE 5. A COMPARISON OF MSE OF THE PROPOSED METHOD WITH THE EXISTING METHODS

| Year | Enrollment | Huarng [5] | Jilani [6] | Our Method |
|------|------------|------------|------------|------------|
| 1971 | 13055 | - | 13579 | 13204 |
| 1972 | 13563 | 14000 | 13798 | 13511 |
| 1973 | 13867 | 14000 | 13798 | 14017 |
| 1974 | 14696 | 14000 | 14452 | 14567 |
| 1975 | 15460 | 15500 | 15373 | 15381 |
| 1976 | 15311 | 15500 | 15373 | 15381 |
| 1977 | 15603 | 16000 | 15623 | 15617 |
| 1978 | 15861 | 16000 | 15883 | 15832 |
| 1979 | 16807 | 16000 | 17079 | 17120 |
| 1980 | 16919 | 17500 | 17079 | 17120 |
| 1981 | 16388 | 16000 | 16497 | 16474 |
| 1982 | 15433 | 16000 | 15737 | 15381 |
| 1983 | 15497 | 16000 | 15737 | 15381 |
| 1984 | 15145 | 15500 | 15024 | 15049 |
| 1985 | 15163 | 16000 | 15024 | 15049 |
| 1986 | 15984 | 16000 | 15883 | 16082 |
| 1987 | 16859 | 16000 | 17079 | 17120 |
| 1988 | 18150 | 17500 | 17991 | 17963 |
| 1989 | 18970 | 19000 | 18802 | 18743 |
| 1990 | 19328 | 19000 | 18994 | 19163 |
| 1991 | 19337 | 19500 | 18994 | 19163 |
| 1992 | 18876 | 19000 | 18916 | 18743 |
| MSE | - | 86694 | 41426 | 22717 |

5. Conclusion

This study proposed a new method for fuzzy time series forecasting with high accuracy. The K-means algorithm used by the proposed method is simple and can be implemented easily in mathematical software such as Matlab. The method has been applied to the historical time series data of enrollments of the University of Alabama for a comparative study with the existing methods. TABLE 4 and TABLE 5 show that the proposed method attains the lowest MSE (22717) among the compared methods, that is, a higher forecasting accuracy than the methods presented before.

REFERENCES

[1] Q. Song, B. S. Chissom, "Forecasting enrollments with fuzzy time series - Part I", Fuzzy Sets and Systems, 54 (1993) 1-10.

[2] Q. Song, B. S. Chissom, "Forecasting enrollments with fuzzy time series - Part II", Fuzzy Sets and Systems, 62 (1994) 1-8.

[3] S. M. Chen, "Forecasting enrollments based on fuzzy time series", Fuzzy Sets and Systems, 81 (1996) 311-319.

[4] J. R. H Wang, S. M. Chen, C. H. Lee, "Handling forecasting problems using fuzzy time series", Fuzzy Sets and Systems, 100 (1998) 217-228.

[5] K. Huarng, "Heuristic models of fuzzy time series for forecasting", Fuzzy Sets and Systems, 123 (2001) 369-386.

[6] T. A. Jilani, S. M. A. Burney, C. Ardil, "Fuzzy metric approach for fuzzy time series forecasting based on frequency density based partitioning", In: Proceedings of World Academy of Science, Engineering and Technology 23 (2009) 1307-6884.
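As a supplementary illustration, the interval construction and forecasting steps of the proposed method (Steps 2, 3 and 5) and the MSE of Eq. (6) can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the authors' implementation: the clusters of Step 1 are taken as given rather than recomputed with K-means (whose result depends on initialization), the function and variable names are hypothetical, and intermediate values are kept unrounded, whereas TABLE 2 prints rounded ones.

```python
# Clusters from Step 1, taken as given; the K-means run itself is not reproduced here.
clusters = [
    [13055], [13563], [13867], [14696], [15145, 15163],
    [15311, 15433, 15460, 15497], [15603], [15861], [15984], [16388],
    [16807, 16859, 16919], [18150], [18876, 18970], [19328, 19337],
]
# Enrollments for 1971-1992 (TABLE 1) and the forecasts printed in TABLE 3 for X1..X14.
enrollments = [13055, 13563, 13867, 14696, 15460, 15311, 15603, 15861,
               16807, 16919, 16388, 15433, 15497, 15145, 15163, 15984,
               16859, 18150, 18970, 19328, 19337, 18876]
table3 = [13204, 13511, 14017, 14567, 15049, 15381, 15617,
          15832, 16082, 16474, 17120, 17963, 18743, 19163]


def build_intervals(clusters):
    """Steps 2-3: cluster centers (Eq. 2), then interval bounds (Eqs. 3-4)."""
    centers = [sum(c) / len(c) for c in clusters]
    # Each interior boundary is the mean of two adjacent centers.
    cuts = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    # The outermost bounds mirror the nearest boundary about the end centers.
    bounds = [2 * centers[0] - cuts[0]] + cuts + [2 * centers[-1] - cuts[-1]]
    return list(zip(bounds, bounds[1:]))


def forecast(a, j):
    """Step 5 (Eq. 5) for the fuzzy set with 0-based index j; a holds the midpoints."""
    if j == 0:
        return 1.5 / (1 / a[0] + 0.5 / a[1])
    if j == len(a) - 1:
        return 1.5 / (0.5 / a[-2] + 1 / a[-1])
    return 2 / (0.5 / a[j - 1] + 1 / a[j] + 0.5 / a[j + 1])


def mse(actual, predicted):
    """Mean square error as in Eq. (6)."""
    return sum((x - y) ** 2 for x, y in zip(actual, predicted)) / len(actual)


intervals = build_intervals(clusters)
midpoints = [(lo + hi) / 2 for lo, hi in intervals]
forecasts = [forecast(midpoints, j) for j in range(len(midpoints))]

# Each year's fuzzy set is the cluster that contains its enrollment.
set_index = [next(j for j, c in enumerate(clusters) if x in c) for x in enrollments]
mse_printed = mse(enrollments, [table3[j] for j in set_index])

for (lo, hi), t in zip(intervals, forecasts):
    print(f"[{lo:.2f}, {hi:.2f})  forecast {t:.1f}")
print(f"MSE with the printed TABLE 3 forecasts: {mse_printed:.0f}")
```

Under these assumptions the unrounded forecasts agree with the values printed in TABLE 3 to within one enrollment; the small differences appear to come from the rounding of intermediate values in TABLE 2. Evaluating Eq. (6) on the printed TABLE 3 forecasts gives an MSE of about 22717, the value reported in TABLE 5.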