Journal of Intelligent Learning Systems and Applications, 2011, 3, 230-241
doi:10.4236/jilsa.2011.34026 Published Online November 2011 (http://www.SciRP.org/journal/jilsa)
Copyright © 2011 SciRes. JILSA

Recurrent Support and Relevance Vector Machines Based Model with Application to Forecasting Volatility of Financial Returns

Altaf Hossain¹*, Mohammed Nasser²
¹Department of Statistics, Islamic University, Kushtia, Bangladesh; ²Department of Statistics, Rajshahi University, Rajshahi, Bangladesh.
Email: *rasel_stat71@yahoo.com, mnasser.ru@gmail.com

Received May 15th, 2011; revised June 15th, 2011; accepted June 24th, 2011.

ABSTRACT
In recent years, GARCH-type models (especially ARMA-GARCH) and computational-intelligence-based techniques have been used successfully for financial forecasting. The techniques in question are the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM). This paper applies ARMA-GARCH, recurrent SVM (RSVM) and recurrent RVM (RRVM) to volatility forecasting. Two GARCH methods based on RSVM and RRVM are compared with parametric GARCH models (pure GARCH and ARMA-GARCH) in terms of their ability to forecast multi-periodically. The models are evaluated on four performance metrics: MSE, MAE, DS, and the linear regression R squared. The real data in this study are two Asian stock market composite indices, BSE SENSEX and NIKKEI225. The paper also examines the effects of outliers on modeling and forecasting volatility. Our experiments show that RSVM and RRVM perform almost equally well, and both forecast better than the GARCH-type models. The ARMA-GARCH model is superior to the pure GARCH model, and only RRVM and RSVM exhibit robustness in forecasting.

Keywords: RSVM, RRVM, ARMA-GARCH, Outliers, Volatility Forecasting

1. Introduction
In financial markets, volatility is important because volatility forecasts of stock prices are crucial for portfolio selection, derivative pricing, risk measurement and hedging strategies.
A risk manager must know today the likelihood that his portfolio will decline in the future, and he may want to sell it before it becomes too volatile [1]. According to Merton [2], the expected market return is related to predictable stock market volatility. Because volatility prediction is necessary, a large number of time-series-based volatility models have been developed since the introduction of the ARCH model by Engle [3]. Bollerslev [4] later generalized the model as GARCH to capture higher-order ARCH effects; see [5] for a review and references. To deal with this complexity, Wong et al. [6] adopted the well-known GARCH model in the form of the so-called mixture AR-GARCH model for exchange rate prediction. Tang et al. [7] explored the mixture ARMA-GARCH model for stock price prediction; see [8] for more details. Evidence on the forecasting ability of the GARCH model is mixed. Anderson and Bollerslev [9] showed that the GARCH model provides good volatility forecasts. Conversely, several empirical studies found that the GARCH model tends to give poor forecasting performance [10-15].

To obtain more accurate predictions, machine learning approaches have recently been introduced successfully to predict volatility based on various models of the GARCH family. Examples include:
- [16] for a neural-network-based GJR model;
- [17] for SVM-based GARCH;
- [18] for RVM-based GARCH, EGARCH and GJR;
- [19,20] for SVM-based GARCH with wavelet and spline wavelet kernels;
- [21] for neural networks based on nine different models of the GARCH family.

Neural networks suffer from overfitting, and their training algorithm can converge to a local minimum that is not unique [22]. In this regard, the Support Vector Machine developed by Vapnik [23] is a novel learning algorithm with various applications to prediction problems [24-28]. Its training yields the globally optimal solution.
The SVM algorithm, based on structural risk minimization, is equivalent to solving a convex programming problem
where a unique solution is always obtained. Moreover, a kernel function satisfying Mercer's conditions removes the difficulty of working with a nonlinear mapping function in a high-dimensional space [29].

The RVM, an alternative to the SVM, is a probabilistic model introduced by Tipping in 2000. The RVM has recently become a powerful tool for prediction problems. One of its main advantages is that it has a functional form identical to the SVM. It therefore enjoys the main benefits of SVM-based techniques: generalization and sparsity. At the same time, the RVM avoids several disadvantages of the SVM:
- the SVM requires optimal values of the regularization parameter C and of the width ε of the insensitive tube;
- it requires a kernel function satisfying Mercer's conditions;
- it generates point predictions only, whereas the RVM provides distributional (probabilistic) predictions [30].

Tipping [30] illustrated the RVM's predictive ability on several popular benchmarks by comparing it with the SVM, and the empirical analysis showed that the RVM outperformed the SVM. Other applications of the RVM to prediction problems can be found in [31-33].

Chen et al. [34] applied the SVM to model and forecast GARCH(1, 1) volatility using the recurrent SVM (RSVM) concept of Chen et al. [35]. That concept follows from the recurrent algorithm of neural networks and the least squares SVM of [36]. Accordingly, Ou and Wang [37] proposed the recurrent RVM (RRVM) to model and forecast GARCH(1, 1) volatility, based on the recurrent SVM concept of [34,35]. These recurrent models are dynamic and capture a longer memory of past information than the feedforward SVM and RVM, which are static.

Multi-period forecasts of stock market return volatilities are used in many applied areas of finance where long-horizon measures of risk are necessary.
Yet very little is known about how to forecast variances several periods ahead, as most of the focus has been on one-period-ahead forecasts. In this regard, only Chen et al. [34] considered multi-period-ahead forecasting alongside one-period-ahead forecasting. They showed that the multi-period-ahead method performs better than its counterpart in forecasting volatility. In particular, Ou and Wang [37] did not consider the multi-period-ahead method when forecasting volatility by RRVM. Moreover, none of these studies investigated the above models (GARCH type, RSVM and RRVM) together on two Asian (emerging) stock market composite indices: BSE SENSEX and NIKKEI225. Forecasting the volatility of the BSE and NIKKEI225 markets more accurately is important given the recent potential growth of these markets.

Our first contribution is to apply ARMA-GARCH, together with pure GARCH, RSVM and RRVM, to multi-period-ahead volatility forecasting. Two GARCH methods based on RSVM and RRVM are compared with parametric GARCH models (pure GARCH and ARMA-GARCH) in terms of their ability to forecast the volatility of BSE SENSEX and NIKKEI225.

Second, inspired by Tang et al. [7], we put more emphasis on the comparison between the ARMA-GARCH and pure GARCH models in forecasting the volatility of emerging stock market returns.

Of increasing importance in time series modeling and forecasting is the problem of outliers. The volatility of emerging stock market returns poses special challenges in this regard. In sharp contrast to well-developed stock markets, emerging markets are generally characterized by high volatility. In addition, high volatility in these markets is often marked by frequent and erratic changes. These changes are usually driven by local events (such as political developments) rather than by events of global importance [38,39]. Outliers in time series were first studied by Fox in 1972.
Outliers bias the parameter estimates of classical models (ARMA and GARCH type), even when they occur as independent, isolated events. They distort the fitted process even when the model is correctly specified, so their effects must be removed or accommodated. They diminish the reliability of the results; see [40-43] for more details. Outliers may affect forecasts through a carryover effect on the ARCH and GARCH terms, and they may have a permanent effect on the parameter estimates.

There are different types of outliers, such as innovational and additive outliers. There are also different criteria for detecting them in conventional (GARCH-type) volatility modeling, such as Likelihood Ratio and Lagrange Multiplier tests; see, for example, [43-45]. In this paper, however, outliers are not classified, and these numerical tests are not used to detect them. Instead, we use a simple graphical (Quantile-Quantile) check to detect general outliers.

A large body of research examines how outliers affect GARCH-type models. However, no attention has been given to the effects of outlying observations when GARCH-type models are combined with computational-intelligence-based techniques (SVM and RVM) to forecast the volatility of emerging stock market returns.

Third, we re-examine the effects of outliers on the ACFs, descriptive statistics, and classical tests (Ljung-Box Q and ARCH-LM) in the context of emerging stock markets.

Finally, we check the impact of outliers or unusual observations on model estimation and forecasting, that is,
examining the robustness of RSVM and RRVM compared with GARCH-type models, especially in forecasting volatility in the presence of outliers.

The remainder of the paper is organized as follows. The next two sections review the SVM and RVM algorithms. Section 4 specifies the empirical model and forecasting scheme. Section 5 describes the BSE SENSEX and NIKKEI225 composite index data and discusses the volatility forecasting performance of all models. Finally, Section 6 concludes.

2. Support Vector Machines
The SVM handles classification and regression problems by mapping the input data into a higher-dimensional feature space. In this paper, the SVM is used only for regression. Its central feature is that the regression surface is determined by a subset of points, the support vectors (SVs). All other points are unimportant in determining the regression surface.

Vapnik introduced an ε-insensitive zone in the error loss function (Figure 1). Training vectors that lie within this zone are deemed correct. Those that lie outside it are deemed incorrect and contribute to the error loss function. As with classification, these incorrect vectors also become part of the support vector set. Vectors lying on the dotted lines are SVs, whereas those within the ε-insensitive zone are not important for the regression function.

The SVM algorithm tries to construct a linear function such that the training points lie within a distance ε of it (Figure 1). Consider a set of training data $\{(x_1, y_1), \ldots, (x_n, y_n)\} \subset X \times \mathbb{R}$, where X denotes the space of the input patterns. The goal of the SVM is to find a function f(x) that has at most ε deviation from the targets $y_i$ for all the training data and, at the same time, is as flat as possible. Let the linear function f take the form

$$f(x) = \langle w, x \rangle + b, \quad w \in X,\ b \in \mathbb{R}. \tag{1}$$

Figure 1.
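Approximation function (solid line) of SV regression using an ε-insensitive zone.

The optimal regression function is given by the minimum of the functional

$$\Phi(w, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*), \tag{2}$$

where C is a prespecified value and $\xi_i$, $\xi_i^*$ are slack variables representing upper and lower constraints on the outputs of the system. Flatness in (1) means a small $\|w\|$. Using the ε-insensitive loss function

$$L_\varepsilon(y) = \begin{cases} 0, & |f(x) - y| < \varepsilon, \\ |f(x) - y| - \varepsilon, & \text{otherwise}, \end{cases} \tag{3}$$

the solution is given by

$$\max_{\alpha, \alpha^*} W(\alpha, \alpha^*) = \max_{\alpha, \alpha^*} \left\{ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle + \sum_{i=1}^{n} \left[ \alpha_i (y_i - \varepsilon) - \alpha_i^* (y_i + \varepsilon) \right] \right\} \tag{4}$$

with constraints

$$0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, 2, \ldots, n, \qquad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0. \tag{5}$$

Solving (4) subject to (5) determines the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$. The regression function is then given by (1), where

$$w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) x_i, \qquad b = -\frac{1}{2} \langle w, x_r + x_s \rangle, \tag{6}$$

and $x_r$ and $x_s$ are support vectors. Thus w is determined by the training patterns $x_i$ that are SVs, namely those with $\alpha_i - \alpha_i^* \ne 0$. In a sense, the complexity of the SVM is independent of the dimension of the input space, because it depends only on the number of SVs.

To enable the SVM to model nonlinear relationships, we map the input data into a feature space F via a mapping $\phi: X \to F$. The optimization problem (4) can then be written as

$$\max_{\alpha, \alpha^*} W(\alpha, \alpha^*) = \max_{\alpha, \alpha^*} \left\{ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle \phi(x_i), \phi(x_j) \rangle + \sum_{i=1}^{n} \left[ \alpha_i (y_i - \varepsilon) - \alpha_i^* (y_i + \varepsilon) \right] \right\}.$$

The decision function therefore depends on the data only through the inner products $\langle \phi(x_i), \phi(x_j) \rangle$.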
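The following minimal Python sketch makes the roles of ε and C in (1)-(3) concrete. It evaluates the ε-insensitive loss (3) and the objective (2) for a given linear function (1). It is an illustration only: it does not solve the dual problem (4)-(5), and the function names are hypothetical. The sketch relies on one fact about the slacks. For fixed w and b, the smallest feasible slacks satisfy $\xi_i + \xi_i^* = L_\varepsilon$ for each training point.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Epsilon-insensitive loss (3): zero inside the tube, linear outside it."""
    r = abs(f_x - y)
    return 0.0 if r < eps else r - eps


def linear_f(w, b, x):
    """Linear regression function (1): f(x) = <w, x> + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def svr_primal_objective(w, b, xs, ys, C, eps):
    """Objective (2) evaluated at a given (w, b).

    For fixed w and b, the smallest feasible slacks are
    xi_i = max(0, y_i - f(x_i) - eps) and xi_i* = max(0, f(x_i) - y_i - eps).
    At most one of them is nonzero, so xi_i + xi_i* equals the
    epsilon-insensitive loss of point i.
    """
    flatness = 0.5 * sum(wj * wj for wj in w)
    slack = sum(eps_insensitive_loss(y, linear_f(w, b, x), eps) for x, y in zip(xs, ys))
    return flatness + C * slack


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0]]
    ys = [0.0, 1.0, 2.5]
    # Only the third point leaves the tube (residual 0.5 > eps = 0.1).
    print(svr_primal_objective([1.0], 0.0, xs, ys, C=1.0, eps=0.1))  # about 0.9
```

Raising C penalizes the points outside the tube more heavily relative to the flatness term. Widening ε lets more points fall inside the tube, where they contribute nothing.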
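These inner products can be computed with a kernel function, without explicitly mapping the data into the higher-dimensional space, which would be time-consuming:

$$K(x, z) = \langle \phi(x), \phi(z) \rangle.$$

By using a kernel function, it is possible to compute the SVM without explicitly mapping to the feature space.

3. Relevance Vector Machines
Let $\{x_i, t_i\}_{i=1}^{n}$ be the training data. The goal is to model the data by a function indexed by parameters w, defined as

$$y(x; w) = \sum_{j=1}^{m} w_j \phi_j(x) = w^T \phi(x), \tag{7}$$

where the basis function $\phi(x) = (\phi_1(x), \ldots, \phi_m(x))^T$ is nonlinear, $w = (w_1, \ldots, w_m)^T$ is the weight vector and x is the input vector. The target is the sum of the function and an error term,

$$t_i = y(x_i; w) + \varepsilon_i, \tag{8}$$

and in vector form

$$\mathbf{t} = \mathbf{y} + \boldsymbol{\varepsilon}. \tag{9}$$

For simplicity, the $\varepsilon_i$ are assumed to be independent Gaussian with mean zero and variance $\sigma^2$. The likelihood of a single observation under (8) is

$$p(t_i \mid w, \sigma^2) = N\left(t_i \mid y(x_i; w), \sigma^2\right) = (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{(t_i - y(x_i; w))^2}{2\sigma^2} \right\}.$$

For the n training points together,

$$p(\mathbf{t} \mid w, \sigma^2) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{(t_i - y(x_i; w))^2}{2\sigma^2} \right\} = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \|\mathbf{t} - \Phi w\|^2 \right\}, \tag{10}$$

where:
- $\mathbf{t} = (t_1, \ldots, t_n)^T$;
- Φ is the $(n \times m)$ design matrix $\Phi = [\phi(x_1), \ldots, \phi(x_n)]^T$;
- $\phi(x_i) = [1, K(x_i, x_1), \ldots, K(x_i, x_n)]^T$, so that $m = n + 1$.

Maximum likelihood estimation of w and σ² may cause overfitting. To avoid it, following [46], a zero-mean Gaussian prior over the weights w is introduced:

$$P(w \mid \alpha) = \prod_{i=1}^{m} N(w_i \mid 0, \alpha_i^{-1}), \tag{11}$$

where $\alpha_i$ is the ith element of the hyperparameter vector α, assigned to the model parameter $w_i$. By Bayes' rule,

$$P(w, \alpha, \sigma^2 \mid \mathbf{t}) = \frac{P(\mathbf{t} \mid w, \alpha, \sigma^2)\, P(w, \alpha, \sigma^2)}{P(\mathbf{t})}. \tag{12}$$

The posterior in (12) cannot be calculated directly, because its denominator contains the normalizing integral $P(\mathbf{t}) = \int P(\mathbf{t} \mid w, \alpha, \sigma^2) P(w, \alpha, \sigma^2)\, dw\, d\alpha\, d\sigma^2$. However, the posterior can be decomposed as

$$P(w, \alpha, \sigma^2 \mid \mathbf{t}) = P(w \mid \mathbf{t}, \alpha, \sigma^2)\, P(\alpha, \sigma^2 \mid \mathbf{t}). \tag{13}$$

The first term of (13) can be written as

$$P(w \mid \mathbf{t}, \alpha, \sigma^2) = \frac{P(\mathbf{t} \mid w, \sigma^2)\, P(w \mid \alpha)}{P(\mathbf{t} \mid \alpha, \sigma^2)}.$$

Note that $P(\mathbf{t} \mid \alpha, \sigma^2) = \int P(\mathbf{t} \mid w, \sigma^2) P(w \mid \alpha)\, dw$ is a convolution of Gaussians. $P(w \mid \alpha)$ is a Gaussian prior, and by (10) $P(\mathbf{t} \mid w, \sigma^2)$ is a Gaussian likelihood. Hence the posterior $P(w \mid \mathbf{t}, \alpha, \sigma^2)$ is Gaussian [47]:

$$P(w \mid \mathbf{t}, \alpha, \sigma^2) = (2\pi)^{-m/2} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (w - \mu)^T \Sigma^{-1} (w - \mu) \right\}, \tag{14}$$

with covariance

$$\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1} \tag{15}$$

and mean

$$\mu = \sigma^{-2} \Sigma \Phi^T \mathbf{t}, \tag{16}$$

where $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_m)$. To evaluate Σ and μ, we need the hyperparameters α and σ² that maximize the second term of (13), $P(\alpha, \sigma^2 \mid \mathbf{t}) \propto P(\mathbf{t} \mid \alpha, \sigma^2) P(\alpha) P(\sigma^2)$. For uniform hyperpriors, $P(\alpha)$ and $P(\sigma^2)$ can be ignored, and it is only necessary to maximize $P(\mathbf{t} \mid \alpha, \sigma^2)$ with respect to α and σ². The problem then becomes the maximization of

$$P(\mathbf{t} \mid \alpha, \sigma^2) = \int P(\mathbf{t} \mid w, \sigma^2) P(w \mid \alpha)\, dw = (2\pi)^{-n/2} \left| \sigma^2 I + \Phi A^{-1} \Phi^T \right|^{-1/2} \exp\left\{ -\frac{1}{2} \mathbf{t}^T \left( \sigma^2 I + \Phi A^{-1} \Phi^T \right)^{-1} \mathbf{t} \right\}. \tag{17}$$

This is called the marginal likelihood. Its maximization with respect to α and σ² is known as type II maximum likelihood, or the evidence procedure. The hyperparameters cannot be obtained in closed form and are estimated iteratively. Following [30], the solutions are

$$\alpha_i^{\text{new}} = \frac{\gamma_i}{\mu_i^2}, \tag{18}$$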
which are obtained by differentiating (17) and equating the result to zero. Here:
- $\mu_i$ is the ith posterior mean weight from (16);
- $\gamma_i = 1 - \alpha_i \Sigma_{ii} \in [0, 1]$ can be interpreted as a measure of how well each parameter $w_i$ is determined by the data;
- $\Sigma_{ii}$ is the ith diagonal element of the posterior weight covariance (15), computed with the current α and σ² values.

Differentiating (17) with respect to σ² leads to

$$(\sigma^2)^{\text{new}} = \frac{\|\mathbf{t} - \Phi\mu\|^2}{n - \sum_{i} \gamma_i}. \tag{19}$$

The learning algorithm applies (18) and (19) repeatedly, updating Σ and μ by (15) and (16), until a suitable convergence criterion is met. During re-estimation, many $\alpha_i$ tend to infinity, so that w has only a few nonzero weights. The corresponding training vectors are called relevance vectors and are analogous to the support vectors of the SVM. The resulting model thus enjoys the SVM's properties of sparsity and generalization.

Given a new input $x_*$, the probability distribution of the output is given by the predictive distribution

$$P(t_* \mid \mathbf{t}, \alpha_{MP}, \sigma^2_{MP}) = \int P(t_* \mid w, \sigma^2_{MP})\, P(w \mid \mathbf{t}, \alpha_{MP}, \sigma^2_{MP})\, dw = N(t_* \mid y_*, \sigma_*^2),$$

where $\alpha_{MP}$ and $\sigma^2_{MP}$ denote the most probable hyperparameter values found above. This distribution is Gaussian with mean $y_* = \mu^T \phi(x_*)$ and variance $\sigma_*^2 = \sigma^2_{MP} + \phi(x_*)^T \Sigma \phi(x_*)$. So the predictive mean is $y(x_*; \mu)$, and the predictive variance has two components: the estimated noise variance $\sigma^2_{MP}$ and the uncertainty in the weights.

4. Empirical Modeling and Forecasting Scheme
In this paper, the data we analyze are daily financial returns $y_t$. They are converted from the corresponding price or index $p_t$ by the continuously compounded transformation

$$y_t = 100\,(\ln p_t - \ln p_{t-1}). \tag{20}$$

A GARCH(1, 1) specification is the most popular form for modeling and forecasting the conditional variance of returns [48]. We therefore consider the GARCH(1, 1) model throughout this paper.

4.1. The Linear Pure GARCH/ARMA-GARCH Model
The basic linear "pure" GARCH(1, 1) model is

$$y_t = \sigma_t \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{IID}(0, 1), \tag{21}$$

$$\sigma_t^2 = w + \alpha_1 y_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \tag{22}$$

where $\sigma_t^2$ is the conditional variance. The basic linear ARMA(p, q)-GARCH(1, 1) model is

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + u_t + \theta_1 u_{t-1} + \cdots + \theta_q u_{t-q}, \tag{23}$$
where

$$u_t = y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p} - \theta_1 u_{t-1} - \cdots - \theta_q u_{t-q}, \qquad u_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = w + \alpha_1 u_{t-1}^2 + \beta_1 \sigma_{t-1}^2. \tag{24}$$

When p = 1 and q = 0, the model reduces to the AR(1)-GARCH(1, 1) process. The important point is that the conditional variance of $u_t$ is $\hat{u}^2_{t|t-1} = E_{t-1}(u_t^2) = \sigma_t^2$. Define $w_t = u_t^2 - \hat{u}^2_{t|t-1} = u_t^2 - \sigma_t^2$ and substitute it into (22) (with $u_t = y_t$) or (24). This shows that the squared errors follow an ARMA(1, 1) process [4,49,50]:

$$u_t^2 = w + (\alpha_1 + \beta_1) u_{t-1}^2 + w_t - \beta_1 w_{t-1}, \tag{25}$$

where $w_t$ is a white-noise error. The parameters must satisfy w > 0, α1 ≥ 0 and β1 ≥ 0 to ensure that the conditional variance is positive. Together with these nonnegativity conditions, if α1 + β1 < 1, then $\{u_t\}$ is covariance stationary.

4.2. Recurrent SVM/RVM Based GARCH Model
For the recurrent SVM or RVM methods, the nonlinear AR(1)-GARCH(1, 1) model has the following form:

$$y_t = f(y_{t-1}) + u_t, \tag{26-1}$$

$$u_t^2 = g(u_{t-1}^2, w_{t-1}) + w_t. \tag{26-2}$$

The algorithm of the recurrent SVM- or RVM-based GARCH model is as follows.

Step 1: Fit the SVM (or RVM) to the returns $y_t$ in AR(1) form, $y_t = f(y_{t-1}) + u_t$, $t = 1, 2, \ldots, N$, over the full sample period N to obtain the residuals $u_1, u_2, \ldots, u_N$.

Step 2: Run the recurrent SVM (or RVM), $u_t^2 = g(u_{t-1}^2, w_{t-1}) + w_t$, on the squared residuals $u_1^2, u_2^2, \ldots, u_{N_1}^2$ ($N_1 < N$), without updating. This yields n multi-period-ahead volatility forecasts $\hat{u}^2_{N_1+1}, \hat{u}^2_{N_1+2}, \ldots, \hat{u}^2_{N_1+n}$.

For estimation, the residuals $w_{t-1}$ are set to zero at the first iteration of Step 2. The feedforward SVM (or RVM) is then run to obtain estimated residuals $\hat{w}_t$. These estimated residuals are used as the new $w_{t-1}$ inputs, and the process is repeated until the stopping criterion is satisfied. Unlike the parametric case, by using the proposed
approach we do not need any assumption on the model parameters to ensure stationarity. We use the R packages "e1071" and "kernlab" to model and forecast with the SVM and RVM, respectively.

4.3. Evaluation Measures and Proxy of Actual Volatility
The mean squared error

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (u_i^2 - \hat{u}_i^2)^2$$

is a perfectly acceptable measure of performance. In practice, however, the ultimate goal of any testing strategy is to confirm that the results of the models are robust and capable of measuring the profitability of a system. It is therefore important to design the test from the outset. Following [51-53], prediction performance is also evaluated using the mean absolute error (MAE) and directional symmetry (DS) statistics [54,55]:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| u_i^2 - \hat{u}_i^2 \right|,$$

$$\mathrm{DS} = \frac{100}{n} \sum_{i=1}^{n} a_i \ (\%), \qquad a_i = \begin{cases} 1, & (u_i^2 - u_{i-1}^2)(\hat{u}_i^2 - \hat{u}_{i-1}^2) \ge 0, \\ 0, & \text{otherwise}. \end{cases}$$

MAE measures the average magnitude of the forecasting errors and, relative to MSE, weights large forecast errors less heavily. DS measures the correctness of the predicted turning points, giving a rough indication of whether the forecasted volatility moves in the right direction.

A linear regression technique is also employed to evaluate the forecasting performance of the volatility models. We regress the squared return on the forecasted volatility over the out-of-sample period, and the squared correlation R² serves as a measure of forecasting performance. We report the proportion of the sample variation explained by the forecasts with the R² statistic [56]. This statistic is the squared sample correlation between the actual and forecasted volatility:

$$R^2 = \frac{\left[ \sum_{i=1}^{n} (u_i^2 - \overline{u^2})(\hat{u}_i^2 - \overline{\hat{u}^2}) \right]^2}{\sum_{i=1}^{n} (u_i^2 - \overline{u^2})^2 \sum_{i=1}^{n} (\hat{u}_i^2 - \overline{\hat{u}^2})^2},$$

where $\overline{u^2}$ and $\overline{\hat{u}^2}$ are the sample means of $u_i^2$ and $\hat{u}_i^2$.

The fundamental problem with evaluating volatility forecasts on real data is that volatility is unobservable, so actual values with which to compare the forecasts do not exist.
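The four evaluation criteria can be sketched in Python as follows. This is an illustrative sketch, not the authors' code (they worked in R). The list `actual` holds the proxy values $u_i^2$ and `forecast` holds the forecasts $\hat{u}_i^2$. Table 3 reports Sqrt MSE, which is the square root of `mse`. Two assumptions are worth stating:
- DS is computed over the n - 1 consecutive pairs inside the forecast window, because the text does not state how the first out-of-sample point is paired.
- R² is taken as the squared sample correlation, which equals the R² of a simple linear regression with intercept.

```python
import math


def mse(actual, forecast):
    """Mean squared error between actual (proxy) and forecasted volatility."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def directional_symmetry(actual, forecast):
    """DS in percent over the n - 1 consecutive pairs of the window (assumption).

    a_i = 1 when the actual and forecasted series move in the same direction
    (or one of them is flat) between i - 1 and i, and 0 otherwise.
    """
    pairs = len(actual) - 1
    hits = sum(
        1
        for i in range(1, len(actual))
        if (actual[i] - actual[i - 1]) * (forecast[i] - forecast[i - 1]) >= 0
    )
    return 100.0 * hits / pairs


def r_squared(actual, forecast):
    """Squared sample correlation between actual and forecasted volatility."""
    n = len(actual)
    ma = sum(actual) / n
    mf = sum(forecast) / n
    sxy = sum((a - ma) * (f - mf) for a, f in zip(actual, forecast))
    sxx = sum((a - ma) ** 2 for a in actual)
    sff = sum((f - mf) ** 2 for f in forecast)
    return sxy * sxy / (sxx * sff)


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0, 2.0]    # hypothetical squared returns (proxy)
    forecast = [1.0, 2.5, 2.0, 2.5]  # hypothetical volatility forecasts
    print(math.sqrt(mse(actual, forecast)), mae(actual, forecast))
    print(directional_symmetry(actual, forecast), r_squared(actual, forecast))
```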
Because volatility itself is unobservable, researchers must make an auxiliary assumption about how the actual ex post volatility is calculated. In this paper, we follow the standard approach of previous research [17,18,37]. We use the squared return, assuming a zero mean, as the proxy of actual volatility against which MSE, MAE, DS and R² are calculated:

$$u_t^2 = y_t^2,$$

where $y_t$ denotes the returns.

5. Empirical Results
5.1. Data Description
We examine the Bombay Stock Exchange (BSE) SENSEX index of the Indian stock market and the NIKKEI225 index of the Japanese stock market. Forecasting the volatility of these two markets more accurately is important because their recent potential growth has attracted foreign and local investors. The BSE index has increased more than tenfold from June 1990 to the present. Using information from April 1979 onwards, the long-run rate of return on the BSE SENSEX works out to be 18.6% per annum. This translates to roughly 9% per annum after compensating for inflation.

The NIKKEI225 average has deviated sharply from the textbook model of stock averages growing at a steady exponential rate. The average hit its all-time high on December 29, 1989, during the peak of the Japanese asset price bubble. On that day it reached an intraday high of 38957.44 before closing at 38915.87, having grown sixfold during the decade. Subsequently it lost nearly all these gains, closing at 7054.98 on March 10, 2009, 81.9% below its peak twenty years earlier.

The stock index prices are collected from Yahoo Finance and transformed into log returns before analysis. For BSE, the whole sample of size 1000, spanning 05 Oct. 2006 to 01 Nov. 2010, is used in the experiment to check the predictive capability and reliability of the proposed models.
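Equation (20) and the volatility proxy $u_t^2 = y_t^2$ amount to two short transformations. The Python sketch below is illustrative only: the authors worked in R, and the price values in the example are hypothetical.

```python
import math


def log_returns(prices):
    """Continuously compounded returns in percent, Eq. (20): 100 (ln p_t - ln p_{t-1})."""
    return [100.0 * (math.log(p1) - math.log(p0)) for p0, p1 in zip(prices, prices[1:])]


def volatility_proxy(returns):
    """Squared returns, the proxy of actual volatility (zero-mean assumption)."""
    return [y * y for y in returns]


if __name__ == "__main__":
    prices = [100.0, 110.0, 99.0]  # hypothetical index levels
    y = log_returns(prices)        # roughly [9.53, -10.54]
    print(y, volatility_proxy(y))
```

A series of N + 1 prices yields N returns, so the first return is dated at the second price.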
The first 900 observations are used for in-sample estimation, and the last 100 are reserved for out-of-sample forecasting. For NIKKEI225, the whole sample of size 2411, spanning 04 Jan. 2001 to 01 Nov. 2010, is used for the same purposes. The first 2171 observations are for in-sample estimation, and the last 240 are reserved for out-of-sample forecasting.

The daily log-levels and returns of BSE and NIKKEI225 are depicted in Figures 2 and 3, respectively. Both figures show that the return series are mean-stationary. They also exhibit the typical volatility clustering phenomenon, with periods of unusually large volatility followed by periods of relative tranquility.

The autocorrelation functions (ACFs) of the return and squared return series for both markets are depicted in Figures 4 and 5, respectively. In Figure 4 (non-squared
ACFs), almost all the spikes are within the boundary formed by the standard errors, that is, the ACFs decay very quickly toward zero. In Figure 5 (squared ACFs), by contrast, almost all the spikes go outside the boundary. The squared returns thus show slowly decreasing positive autocorrelations, especially for NIKKEI225. Figure 5 indicates that volatility clustering is reflected in significant correlations of the squared returns. The autocorrelation coefficients of the squared returns are larger and more persistent than those of the (non-squared) return series.

Figure 2. Bombay stock exchange (BSE) index: 2006.10.5-2010.11.1. (a) Log-levels; (b) Returns.
Figure 3. Japan stock exchange (NIKKEI 225) index: 2001.1.4-2010.11.1. (a) Log-levels; (b) Returns.
Figure 4. ACF for the returns of both markets.
Figure 5. ACF for squared returns of both markets.

We must point out that the return series show little or no correlation, but their squares show high correlation. This indicates an ARCH or GARCH effect, especially for NIKKEI225. It is not clear why volatility clustering is not clearly reflected in significant correlations of the squared returns of BSE. This may be due to influential outlying observations.

Figures 6 and 7 show the QQ plots of returns and squared returns, respectively, for each market. Figure 6 (return series) shows that some observations call the normality assumption into question, that is, they may be outliers or unusual observations. Figure 7 (squared returns) clearly shows that the normality assumption is violated. It reveals two outliers or unusual observations for BSE and a group of outliers for NIKKEI225.

Table 1 reports the summary statistics and diagnostics for the total sample of BSE and NIKKEI225 returns. From
the table, we can see that the means of the returns are close to zero, as expected. Both series are characterized by excess kurtosis and asymmetry. The Bera-Jarque [57] test strongly rejects the normality hypothesis for the returns of each market. For both series, the Ljung-Box Q(10) statistics of the returns indicate no significant autocorrelation at the 1% and 5% levels of significance. At the 10% level and above, however, there is relevant autocorrelation in the BSE return series. The Q(10)* values reveal no significant correlation in the squared returns at the 1% and 5% levels of significance, and for BSE not even at the 10% level. Engle's (1982) ARCH-LM tests show no significant evidence of GARCH effects (i.e., heteroscedasticity) for either series, strongly so for BSE.

Figure 6. Quantile-Quantile plots for the returns of BSE and NIKKEI225.
Figure 7. Quantile-Quantile plots for the squared returns of BSE and NIKKEI225.

Table 1. Descriptive statistics for the returns of BSE and NIKKEI.

Returns       BSE SENSEX            NIKKEI 225
Minimum       -11.60444             -12.11103
Maximum       15.98998              13.23458
Mean          0.04965155            -0.01669452
Variance      4.058489              2.682107
Skewness      0.171791              -0.2858808
Kurtosis      6.093471              6.220848
Normality     272.5166 [0.0000]     52.794 [3.434e-12]
Q(10)         18.26964 [0.0505]     13.48905 [0.19759]
Q(10)*        7.091121 [0.7168]     16.04430 [0.09837]
ARCH-LM       7.187405 [0.8449]     16.22137 [0.18130]

Note:
- Kurtosis quoted is excess kurtosis.
- Normality is the Bera-Jarque (1981) normality test.
- Q(10) is the Ljung-Box Q test of order 10 for raw returns.
- Q(10)* is the Ljung-Box Q test for squared returns.
- ARCH-LM is Engle's (1982) LM test for ARCH effects.
- p-values are in brackets.
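Table 1 reports Ljung-Box Q(10) statistics for the returns, and Q(10)* for the squared returns. The Python sketch below illustrates how such a statistic, $Q(m) = n(n+2)\sum_{k=1}^{m}\hat{\rho}_k^2/(n-k)$, and its p-value can be computed. It is not the authors' code and will not reproduce Table 1 without the original data. Two assumptions apply:
- Q(m) is referred to a χ² distribution with m degrees of freedom, with no degrees-of-freedom adjustment.
- The p-value uses the closed-form χ² tail probability, which holds only for even m, such as m = 10.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag (common denominator, biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def chi2_sf_even_df(q, df):
    """P(chi^2_df > q) for even df: exp(-q/2) * sum_{j < df/2} (q/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even number of degrees of freedom")
    half = q / 2.0
    term = 1.0  # (q/2)^0 / 0!
    total = 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j  # turns (q/2)^(j-1)/(j-1)! into (q/2)^j / j!
        total += term
    return math.exp(-half) * total


def ljung_box(x, lags=10):
    """Ljung-Box Q(lags) and its p-value against chi-square with `lags` df."""
    n = len(x)
    rho = sample_acf(x, lags)
    q = n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
    return q, chi2_sf_even_df(q, lags)


if __name__ == "__main__":
    returns = [1.0, -2.0] * 50  # hypothetical, strongly autocorrelated series
    print(ljung_box(returns))                      # Q(10) on returns
    print(ljung_box([r * r for r in returns]))     # Q(10)* on squared returns
```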
This numerical examination of the daily BSE and NIKKEI225 returns suggests that the returns are not characterized by heteroscedasticity and time-varying autocorrelation. This contradicts the following prior evidence and stylized facts:
1) the graphical check (Figure 5) indicates the presence of time-varying volatility in BSE and NIKKEI225;
2) the statistics (maximum, variance and kurtosis) of BSE and NIKKEI225 are comparatively higher than those of other markets used in previous research for different periods;
3) return series generally exhibit volatility clustering and a leptokurtic pattern for most markets in the world.

This situation is caused by the outliers or unusual observations detected by the graphical check (Figure 7: QQ plots). To check the robustness of the models used, the unusual observations are kept in the data set. We nevertheless assume that the BSE and NIKKEI225 return series exhibit volatility clustering and a leptokurtic pattern. It is therefore appropriate to model and forecast them by GARCH(1, 1).

5.2. In-Sample Estimation or Training Results
We first fit the in-sample return series to the GARCH(1, 1) and ARMA-GARCH(1, 1) models in (22) and (24) to obtain their maximum likelihood estimates. The estimation results and diagnostic tests of these volatility models for the BSE and NIKKEI225 returns are not reported here, as the main focus is on out-of-sample forecasting. It is
seen that, based on the log likelihood (LL), AIC and BIC criteria, the ARMA-GARCH(1, 1) model fits the data better than the pure GARCH(1, 1) model.

We now turn to our models, the recurrent support vector machine and the recurrent relevance vector machine. They are trained using the algorithm stated in Step 1 and Step 2. When training the RSVM, two parameters, γ and C, are considered because the SVM is sensitive to them, while ε is set to 0.005 in all cases. We apply ten-fold cross-validation to tune γ and C, each over the range $[2^{-5}, 2^{5}]$. The optimal parameters are $(C, \gamma) = (2^{2}, 2^{-2})$ for BSE and $(2^{5}, 2^{2})$ for NIKKEI225. Table 2 shows the training results (only the numbers of support vectors and relevance vectors) of RSVM and RRVM for both markets. Table 2 shows that RRVM is more adequate for the in-sample series than RSVM in each market, since RRVM produces far fewer relevance vectors than RSVM produces support vectors.

5.3. Out-of-Sample Volatility Forecasting Results
Table 3 summarizes the forecasting performance based on the four measures defined in Section 4.3: MSE, MAE, DS and R². For BSE, ARMA-GARCH generates smaller values of Sqrt MSE (3.1742) and MAE (3.0199) and a larger value of R² (0.00553) than pure GARCH. For NIKKEI225, ARMA-GARCH generates smaller values of Sqrt MSE (2.8094) and MAE (2.2401) than pure GARCH. ARMA-GARCH and pure GARCH produce the same DS for each market and the same R² for NIKKEI225. Hence the ARMA-GARCH model outperforms the pure GARCH model.
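The paper does not state the formula used for the parametric multi-period-ahead forecasts. The Python sketch below therefore uses the standard textbook GARCH(1, 1) recursion, which should be read as an assumption. The one-step forecast is $w + \alpha_1 u_T^2 + \beta_1 \sigma_T^2$. For $h \ge 2$, $E_T[\sigma^2_{T+h}] = w + (\alpha_1 + \beta_1) E_T[\sigma^2_{T+h-1}]$. The recursion shows how a single large squared residual $u_T^2$, such as an outlier, raises every subsequent forecast. Those forecasts then decay only geometrically toward $w/(1 - \alpha_1 - \beta_1)$. This is one plausible mechanism behind the upward shift of the GARCH forecasts discussed below. The parameter values in the example are hypothetical.

```python
def garch11_forecasts(w, alpha1, beta1, last_u2, last_sigma2, horizon):
    """Multi-step conditional variance forecasts E_T[sigma^2_{T+h}], h = 1..horizon.

    Standard GARCH(1, 1) recursion (assumed, not taken from the paper):
      h = 1:  w + alpha1 * u_T^2 + beta1 * sigma_T^2
      h >= 2: w + (alpha1 + beta1) * E_T[sigma^2_{T+h-1}],
    because E_T[u^2_{T+j}] = E_T[sigma^2_{T+j}] for j >= 1.
    """
    forecasts = []
    s2 = w + alpha1 * last_u2 + beta1 * last_sigma2
    for _ in range(horizon):
        forecasts.append(s2)
        s2 = w + (alpha1 + beta1) * s2
    return forecasts


if __name__ == "__main__":
    # Hypothetical parameters; the unconditional variance w / (1 - alpha1 - beta1) is 1.0.
    # A large last squared residual (11.0) starts the path at 2.0, which then decays toward 1.0.
    path = garch11_forecasts(0.1, 0.1, 0.8, last_u2=11.0, last_sigma2=1.0, horizon=5)
    print([round(v, 4) for v in path])
```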
The RSVM and RRVM, in turn, perform better than the GARCH-type models (pure GARCH and ARMA-GARCH) in all cases except the following, all for NIKKEI225: there the GARCH-type models beat RSVM on MSE and DS, and beat RRVM on R². Comparing RSVM with RRVM, RSVM is better on MAE and R² only, whereas RRVM is better on MSE and DS, in both markets. We attribute the much poorer forecasting performance of the GARCH-type models to the effect of outliers on the traditional GARCH-type models; that is, RSVM and RRVM, unlike the GARCH-type models, remain robust in forecasting because of the way they are estimated.

Figures 8 and 9 plot the multi-period-ahead forecasts of the machine learning models (RSVM and RRVM) and of the GARCH-type models (pure GARCH and ARMA-GARCH) against the actual values for BSE and NIKKEI225, respectively. The plots show that the machine learning techniques forecast better than the GARCH-type models. The series forecast by the GARCH-type models are pushed upward by the outliers, with ARMA-GARCH less affected than pure GARCH. For BSE, the RSVM forecasts are unaffected, owing to its robustness, whereas the RRVM forecasts are slightly affected by the outliers. For NIKKEI225, no model is noticeably affected, possibly because its outliers form a group that is not very influential.

Table 2. Training results for RSVM and RRVM.
                  BSE     NIKKEI225
No. of S.V.s      661     2049
No. of R.V.s       77       49

Table 3. Multi-period-ahead forecasting accuracy of the different models on real data.
                            BSE                                     NIKKEI225
Model         Sqrt MSE   MAE      DS (%)   R²          Sqrt MSE   MAE      DS (%)   R²
GARCH         3.2568     3.1024   51       0.00485     2.8124     2.2464   52.5     0.00071
ARMA-GARCH    3.1742     3.0199   51       0.00553     2.8094     2.2401   52.5     0.00071
RSVM          1.1160     0.6258   65       0.03011     3.0073     1.6864   50.4     0.00417
RRVM          1.0670     0.8104   87       0.02973     2.7466     1.7677   95.8     4.58E-5

Figure 8. Volatility forecasts of BSE index returns.

Figure 9. Volatility forecasts of NIKKEI225 index returns.

6. Conclusions

To assess model performance, we apply recurrent SVM and RVM as hybrid approaches to GARCH modeling and compare them with the traditional pure GARCH and ARMA-GARCH models in multi-period-ahead forecasting of the volatility of two Asian (emerging) stock markets, BSE SENSEX and NIKKEI225. The models are evaluated out of sample using four criteria: MSE, MAE, DS and linear regression R². In parallel, we examine the robustness of the models in forecasting volatility in the presence of outliers, which are detected simply by inspecting Q-Q plots. The Q-Q plots of the observed volatility clearly reveal two outliers for BSE and a group of outliers for NIKKEI225. Because of these outliers, the ACFs (especially for BSE), the descriptive statistics and the classical tests (Ljung-Box Q and ARCH-LM) give misleading results, which agrees with previous research on outliers in time series analysis.
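The Q-Q screen described above is a visual check. A minimal numeric stand-in might look like the Python sketch below; it assumes a normal reference distribution, a reference line through the quartiles, and a cut-off of k = 3 robust standard deviations, none of which is stated in the paper, and `qq_outliers` is a hypothetical helper name.

```python
from statistics import NormalDist, quantiles


def qq_outliers(series, k=3.0):
    """Return indices of points lying far from the normal Q-Q reference line.

    Hypothetical helper: a numeric stand-in for inspecting a normal Q-Q plot
    by eye. The reference line passes through the sample and theoretical
    first and third quartiles (the construction used by R's qqline), so its
    slope is a robust estimate of the standard deviation. A point is flagged
    when its vertical distance from the line exceeds k times that slope.
    """
    n = len(series)
    if n < 4:
        raise ValueError("need at least 4 observations")
    std_normal = NormalDist()
    # Theoretical quantiles at plotting positions (r + 0.5) / n, r = 0..n-1.
    theoretical = [std_normal.inv_cdf((r + 0.5) / n) for r in range(n)]
    # Indices of the observations in ascending order of value.
    order = sorted(range(n), key=lambda i: series[i])

    q1, _, q3 = quantiles(series, n=4)
    z1, z3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    if slope <= 0:
        raise ValueError("interquartile range is zero; reference line undefined")
    intercept = q1 - slope * z1

    flagged = [
        i
        for r, i in enumerate(order)
        if abs(series[i] - (intercept + slope * theoretical[r])) > k * slope
    ]
    return sorted(flagged)
```

For the toy series `[1.1, 0.8, 1.0, 1.3, 0.9, 1.2, 0.7, 1.05, 0.95, 6.0]`, `qq_outliers` returns `[9]`, flagging only the isolated spike. Anchoring the line at the quartiles rather than fitting it by least squares keeps the line from being dragged toward the very outliers it is meant to expose.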
From the experimental results, we conclude that 1) the outliers significantly affect the parameter estimates of the pure GARCH and ARMA-GARCH models; 2) RRVM uses far fewer relevance vectors than RSVM uses support vectors; 3) the computational-intelligence-based techniques (RSVM and RRVM) perform better than the GARCH-type models in out-of-sample forecasting; 4) the ARMA-GARCH model is superior to the pure GARCH model in out-of-sample forecasting; 5) RSVM and RRVM perform almost equally in out-of-sample forecasting, with RSVM better on MAE and R² and RRVM better on MSE and DS; and 6) both RRVM and RSVM remain robust in forecasting because of the way they are estimated, although RRVM, as a Bayesian approach, is slightly affected by outliers. Theoretically, the RVM is a probabilistic model whose functional form is identical to that of the SVM, but unlike the SVM it does not require free parameters such as C and ε to be tuned, and its kernel need not satisfy Mercer's condition. Considering these empirical results and the theoretical properties of RVM and SVM, we favor the recurrent RVM, in line with previous research, for forecasting the volatility of emerging stock markets, even in the presence of outliers.

REFERENCES
[1] R. F. Engle and A. J. Patton, “What Good Is a Volatility Model?” Journal of Quantitative Finance, Vol. 1, No. 2, 2001, pp. 237-245. doi:10.1088/1469-7688/1/2/305
[2] R. C. Merton, “On Estimating the Expected Return on the Market: An Exploratory Investigation,” Journal of Financial Economics, Vol. 8, 1980, pp. 323-361. doi:10.1016/0304-405X(80)90007-0
[3] R. F. Engle, “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation,” Econometrica, Vol. 50, No. 4, 1982, pp. 987-1007. doi:10.2307/1912773
[4] T. Bollerslev, “Generalized Autoregressive Conditional Heteroscedasticity,” Journal of Econometrics, Vol. 31, No. 3, 1986, pp. 307-327. doi:10.1016/0304-4076(86)90063-1
[5] S. H. Poon and C. Granger, “Forecasting Volatility in Financial Markets: A Review,” Journal of Economic Literature, Vol. 41, No. 2, 2003, pp. 478-539. doi:10.1257/002205103765762743
[6] W. C. Wong, F. Yip and L. Xu, “Financial Prediction by Finite Mixture GARCH Model,” Proceedings of Fifth International Conference on Neural Information Processing, Kitakyushu, 21-23 October 1998, pp. 1351-1354.
[7] H. Tang, K. C. Chun and L. Xu, “Finite Mixture of ARMA-GARCH Model for Stock Price Prediction,” Proceedings of 3rd International Workshop on Computational Intelligence in Economics and Finance (CIEF 2003), North Carolina, 26-30 September 2003, pp. 1112-1119.
[8] A. Hossain and M. Nasser, “Comparison of Finite Mixture of ARMA-GARCH, Back Propagation Neural Networks and Support-Vector Machines in Forecasting Financial Returns,” Journal of Applied Statistics, Vol. 38, No. 3, 2011, pp. 533-551.
[9] T. G. Andersen and T. Bollerslev, “Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts,” International Economic Review, Vol. 39, No. 4, 1998, pp. 885-905. doi:10.2307/2527343
[10] T. J. Brailsford and R. W. Faff, “An Evaluation of Volatility Forecasting Techniques,” Journal of Banking and Finance, Vol. 20, No. 3, 1996, pp. 419-438. doi:10.1016/0378-4266(95)00015-1
[11] R. Cumby, S. Figlewski and J. Hasbrouck, “Forecasting Volatility and Correlations with EGARCH Models,” Journal of Derivatives, Vol. 1, No. 2, Winter 1993, pp. 51-63.
[12] S. Figlewski, “Forecasting Volatility,” Financial Markets, Institutions and Instruments, Vol. 6, No. 1, 1997, pp. 1-88. doi:10.1111/1468-0416.00009
[13] P. Jorion, “Predicting Volatility in the Foreign Exchange Market,” Journal of Finance, Vol. 50, No. 2, 1995, pp. 507-528. doi:10.2307/2329417
[14] P. Jorion, “Risk and Turnover in the Foreign Exchange Market,” In: J. A. Frankel, G. Galli and A. Giovannini, Eds., The Microstructure of Foreign Exchange Markets, Chicago University Press, Chicago, 1996.
[15] D. G. McMillan, A. E. H. Speight and O. Gwilym, “Forecasting UK Stock Market Volatility: A Comparative Analysis of Alternate Methods,” Applied Financial Economics, Vol. 10, No. 4, 2000, pp. 435-448. doi:10.1080/09603100050031561
[16] R. G. Donaldson and M. Kamstra, “An Artificial Neural Network-GARCH Model for International Stock Returns Volatility,” Journal of Empirical Finance, Vol. 4, No. 1, 1997, pp. 17-46. doi:10.1016/S0927-5398(96)00011-4
[17] F. Perez-Cruz, J. A. Afonso-Rodriguez and J. Giner, “Estimating GARCH Models Using Support Vector Machines,” Journal of Quantitative Finance, Vol. 3, 2003, pp. 163-172.
[18] P. H. Ou and H. Wang, “Predicting GARCH, EGARCH, GJR Based Volatility by the Relevance Vector Machine: Evidence from the Hang Seng Index,” International Research Journal of Finance and Economics, No. 39, 2010, pp. 46-63.
[19] L. B. Tang, H. Y. Sheng and L. X. Tang, “Forecasting Volatility Based on Wavelet Support Vector Machine,” Expert Systems with Applications, Vol. 36, No. 2, 2009, pp. 2901-2909.
[20] L. B. Tang, H. Y. Sheng and L. X. Tang, “GARCH Prediction Using Spline Wavelet Support Vector Machine,” Journal of Neural Computing and Application, Vol. 18, No. 8, 2009, pp. 913-917.
[21] M. Bildirici and Ö. Ö. Ersin, “Improving Forecasts of GARCH Family Models with the Artificial Neural Networks: An Application to the Daily Returns in Istanbul Stock Exchange,” Expert Systems with Applications, Vol. 36, No. 4, 2009, pp. 7355-7362. doi:10.1016/j.eswa.2008.09.051
[22] L. J. Cao and F. Tay, “Application of Support Vector Machines in Financial Time Series Forecasting,” International Journal of Management Science, Vol. 29, No. 4, 2001, pp. 309-317.
[23] V. N. Vapnik, “The Nature of Statistical Learning Theory,” 2nd Edition, Springer-Verlag, New York, 1995.
[24] L. J. Cao and F. Tay, “Modified Support Vector Machines in Financial Time Series Forecasting,” Journal of Neurocomputing, Vol. 48, No. 1-4, 2002, pp. 847-861.
[25] K. J. Kim, “Financial Time Series Forecasting Using Support Vector Machines,” Journal of Neurocomputing, Vol. 55, No. 1-2, 2003, pp. 307-319.
[26] W. Huang, Y. Nakamori and S. Y. Wang, “Forecasting Stock Market Movement Direction with Support Vector Machine,” Journal of Computers & Operational Research, Vol. 32, No. 10, 2005, pp. 513-522.
[27] C. J. Lu, T. S. Lee and C. C. Chiu, “Financial Time Series Forecasting Using Independent Component Analysis and Support Vector Regression,” Journal of Decision Support Systems, Vol. 47, No. 2, 2009, pp. 115-125.
[28] H. S. Kim and S. Y. Sohn, “Support Vector Machines for Default Prediction of SMEs Based on Technology Credit,” European Journal of Operational Research, Vol. 201, No. 3, 2010, pp. 838-846.
[29] A. J. Smola and B. Schölkopf, “A Tutorial on Support Vector Regression,” Journal of Statistics and Computing, Vol. 14, No. 3, 2004, pp. 199-222.
[30] M. E. Tipping, “Sparse Bayesian Learning and the Relevance Vector Machine,” Journal of Machine Learning Research, Vol. 1, 2001, pp. 211-244.
[31] C. M. Bishop and M. E. Tipping, “Variational Relevance Vector Machine,” In: C. Boutilier and M. Goldszmidt, Eds., Uncertainty in Artificial Intelligence, Morgan Kaufmann, Waltham, 2000, pp. 46-53.
[32] S. Ghosh and P. P. Mujumdar, “Statistical Downscaling of GCM Simulations to Stream Flow Using Relevance Vector Machine,” Advances in Water Resources, Vol. 31, No. 1, 2008, pp. 132-146.
[33] D. Porro, N. Hdez, I. Talavera, O. Nunez, A. Dago and R. J. Biscay, “Performance Evaluation of Relevance Vector Machines as a Nonlinear Regression Method in Real World Chemical Spectroscopic Data,” 19th International Conference on Pattern Recognition (ICPR 2008), Tampa, 8-11 December 2008, pp. 1-4.
[34] S. Chen, K. Jeong and W. Härdle, “Support Vector Regression Based GARCH Model with Application to Forecasting Volatility of Financial Returns,” SFB 649 Discussion Paper 2008-014. http://edoc.hu-berlin.de/series/sfb649papers/200814/PDF/14.pdf
[35] S. Chen, K. Jeong and W. Härdle, “Recurrent Support Vector Regression for a Nonlinear ARMA Model with Applications to Forecasting Financial Returns,” SFB 649 Discussion Paper 2008-051.
[36] J. A. K. Suykens and J. Vandewalle, “Recurrent Least Squares Support Vector Machines,” IEEE Transactions on Circuits and Systems I, Vol. 47, No. 7, 2000, pp. 1109-1114. doi:10.1109/81.855471
[37] P. H. Ou and H. Wang, “Predict GARCH Based Volatility of Shanghai Composite Index by Recurrent Relevant Vector Machines and Recurrent Least Square Support Vector Machines,” Journal of Mathematics Research, Vol. 2, No. 2, 2010.
[38] G. Bekaert and C. R. Harvey, “Emerging Equity Market Volatility,” Journal of Financial Economics, Vol. 43, No. 1, 1997, pp. 29-78. doi:10.1016/S0304-405X(96)00889-6
[39] R. Aggarwal, C. Inclan and R. Leal, “Volatility in Emerging Stock Markets,” Journal of Financial and Quantitative Analysis, Vol. 34, No. 1, 1999, pp. 33-55. doi:10.2307/2676245
[40] J. Ledolter, “The Effects of Additive Outliers on the Forecasts from ARIMA Models,” International Journal of Forecasting, Vol. 5, No. 2, 1989, pp. 231-240. doi:10.1016/0169-2070(89)90090-3
[41] R. F. Engle and G. G. J. Lee, “A Permanent and Transitory Component Model of Stock Returns Volatility,” Discussion Paper 92-44R, University of California, San Diego, 1993.
[42] P. H. Franses, D. van Dijk and A. Lucas, “Short Patches of Outliers, ARCH and Volatility Modeling,” Discussion Paper 98-057/4, Tinbergen Institute, Erasmus University, Rotterdam, 1998.
[43] P. H. Franses and H. Ghijsels, “Additive Outliers, GARCH and Forecasting Volatility,” International Journal of Forecasting, Vol. 15, No. 1, 1999, pp. 1-9. doi:10.1016/S0169-2070(98)00053-3
[44] A. J. Fox, “Outliers in Time Series,” Journal of the Royal Statistical Society, Series B, Vol. 34, No. 3, 1972, pp. 350-363.
[45] C. Chen and L. Liu, “Joint Estimation of Model Parameters and Outlier Effects in Time Series,” Journal of American Statistical Association, Vol. 88, No. 421, 1993, pp. 284-297. doi:10.2307/2290724
[46] M. E. Tipping, “Relevance Vector Machine,” Microsoft Research, Cambridge, 2000.
[47] M. E. Tipping, “Bayesian Inference: An Introduction to Principles and Practice in Machine Learning,” Advanced Lectures on Machine Learning, Vol. 3176/2004, 2004, pp. 41-62. doi:10.1007/978-3-540-28650-9_3
[48] P. R. Hansen and A. Lunde, “A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH (1, 1)?” Journal of Applied Econometrics, Vol. 20, No. 7, 2005, pp. 873-889.
[49] J. D. Hamilton, “Time Series Analysis,” Princeton University Press, Princeton, 1997.
[50] W. Enders, “Applied Econometric Time Series,” 2nd Edition, John Wiley & Sons, New York, 2004.
[51] F. E. H. Tay and L. Cao, “Application of Support Vector Machines in Financial Time-Series Forecasting,” Omega, Vol. 29, No. 4, 2001, pp. 309-317. doi:10.1016/S0305-0483(01)00026-3
[52] M. Thomason, “The Practitioner Method and Tools: A Basic Neural Network-Based Trading System Project Revisited (Parts 1 and 2),” Journal of Computational Intelligence in Finance, Vol. 7, No. 3, 1999, pp. 36-45.
[53] M. Thomason, “The Practitioner Method and Tools: A Basic Neural Network-Based Trading System Project Revisited (Parts 3 and 4),” Journal of Computational Intelligence in Finance, Vol. 7, No. 3, 1999, pp. 35-48.
[54] C. Brooks, “Predicting Stock Index Volatility: Can Market Volume Help?” Journal of Forecasting, Vol. 17, No. 1, 1998, pp. 59-80. doi:10.1002/(SICI)1099-131X(199801)17:1<59::AID-FOR676>3.0.CO;2-H
[55] I. A. Moosa, “Exchange Rate Forecasting: Techniques and Applications,” Macmillan Press Ltd, London, 2000.
[56] H. Theil, “Principles of Econometrics,” Wiley, New York, 1971.
[57] A. K. Bera and C. M. Jarque, “An Efficient Large-Sample Test for Normality of Observations and Regression Residuals,” Australian National University Working Papers in Econometrics, 40, Canberra, 1981.
