The time series of share prices is a highly noisy, non-stationary chaotic system with both linear and non-linear characteristics, so using either a linear or a non-linear prediction model alone has inherent limitations. This paper establishes a combined ARIMA and RBF-ANN model and makes short-term predictions on the time series of the CSI 300 index using several typical input variables. The results show that the combined model with multiple input indicators achieves higher precision than a single ARIMA model, a single RBF-ANN model, or the combined model with a single input variable.
Prediction models of share prices and indices are of both practical and theoretical significance. Y. Bai [
The Autoregressive Integrated Moving Average (ARIMA) model is a well-known model for time series prediction [
The ARIMA model first turns an integrated (non-stationary) time series into a stationary one by differencing, and then regresses the dependent variable on its own lagged values and on the present and lagged values of the stochastic error term.
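For reference, the standard textbook form of the model in Box-Jenkins notation is given below. The symbols are assumed and may differ from those used in the original equations; $B$ is the lag operator ($B y_t = y_{t-1}$), $\varepsilon_t$ is white noise, and the constant is omitted:

```latex
\phi(B)\,(1-B)^{d}\,y_t = \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p},
\qquad
\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q}
```

For $d = 0$ this reduces to the ARMA(p, q) model $y_t = \sum_{i=1}^{p}\phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q}\theta_j \varepsilon_{t-j}$. Some texts write the moving-average terms with minus signs instead.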
If the sequence
We can establish the ARMA (p, q) model, which is:
In this Equation,
In practice, the Box-Jenkins ARIMA modeling procedure can be summarized in six steps:
1) Carry out a stationarity test on the sequence (such as the ADF unit root test). If the sequence is not stationary, transform it into a stationary sequence by differencing (or by differencing its logarithm);
2) Determine the order of the ARIMA model, generally using the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) for preliminary identification, and the Akaike Information Criterion (AIC) and Schwarz Criterion (SC) for quantitative order selection;
3) Estimate the model parameters and test their significance, so as to evaluate their rationality and adjust the model;
4) Carry out a hypothesis test to check whether the residual sequence is white noise. However, in the combined model described below the ARIMA residuals are fitted by another model, so the procedure still works when this test is not passed;
5) Carry out diagnostic analysis to confirm that the model agrees with the characteristics of the observed data;
6) Use the selected model for prediction.
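Step 2 relies on the sample ACF and PACF. The following is a minimal pure-Python sketch, assuming no EViews or statistics library, of the sample ACF, of the PACF computed with the Durbin-Levinson recursion, and of the common approximate 95% band ±1.96/√n used to judge where the functions "cut off". The function names are illustrative only.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (denominator n, the usual choice)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations phi_kk from the Durbin-Levinson recursion."""
    r = acf(x, max_lag)  # r[i] holds the lag-(i+1) autocorrelation
    phi = []             # coefficients phi_{k-1,1..k-1} of the previous order
    out = []
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(values, n):
    """Lags whose estimate lies outside the approximate 95% band +/- 1.96/sqrt(n)."""
    band = 1.96 / math.sqrt(n)
    return [lag for lag, v in enumerate(values, start=1) if abs(v) > band]
```

For an AR(1) series the PACF is large at lag 1 and close to zero afterwards, which is the "truncation" pattern the order-identification step looks for.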
The RBF neural network is an effective feed-forward artificial neural network with fast learning convergence. Theory shows that an RBF neural network can approximate any continuous function and has the best-approximation property, which the BP neural network lacks [
linear weighting to the output signals of the hidden layer.
A radial basis function (RBF) is usually defined as a monotonic function of the Euclidean distance between two points in space. It is a non-negative, non-linear, radially symmetric function that decays in both directions away from its center. The most common choice is the Gaussian function:
In the Equation,
In the Equation,
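The structure described above, with Gaussian hidden units followed by a linear weighted output, can be sketched as follows. This assumes the common Gaussian form exp(-‖x − c‖² / (2σ²)); the original equation did not survive, so the exact scaling of σ used in the paper may differ. Function and parameter names are hypothetical.

```python
import math


def gaussian_rbf(x, center, sigma):
    """Assumed Gaussian basis exp(-||x - c||^2 / (2 sigma^2)); equals 1 at the center."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_forward(x, centers, sigmas, weights, bias=0.0):
    """Hidden layer: one Gaussian unit per center. Output layer: linear weighted sum."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, sigmas)]
    return bias + sum(w * h for w, h in zip(weights, hidden))
```

Because the output is linear in the weights, once the centers and widths are fixed the weights can be found by ordinary least squares.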
The learning process of the RBF neural network can be divided into an unsupervised stage and a supervised stage. The unsupervised stage determines the center and variance of each basis function. The commonly used K-means clustering algorithm clusters the training samples and obtains the cluster centers
In the Equation:
The supervised learning stage aims to determine the connection weights
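The two learning stages can be illustrated with a small pure-Python sketch: K-means supplies the centers, a shared width is set with the heuristic σ = d_max/√(2k) (an assumption; the paper's exact variance formula was lost in extraction), and the output weights are then solved by least squares through the normal equations. The small ridge term is an added assumption that keeps the system solvable. All names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain K-means; the resulting cluster centers serve as RBF centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster becomes empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def common_width(centers):
    """Assumed heuristic: sigma = d_max / sqrt(2k), d_max = largest center spacing."""
    k = len(centers)
    d_max = max((math.sqrt(sq_dist(a, b)) for a in centers for b in centers), default=0.0)
    return d_max / math.sqrt(2 * k) if d_max > 0 else 1.0


def design_row(x, centers, sigma):
    """Bias term followed by one Gaussian activation per hidden unit."""
    return [1.0] + [math.exp(-sq_dist(x, c) / (2 * sigma ** 2)) for c in centers]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rbf(X, y, k, ridge=1e-8):
    """Unsupervised stage (centers, width), then supervised stage (output weights)."""
    centers = kmeans(X, k)
    sigma = common_width(centers)
    H = [design_row(x, centers, sigma) for x in X]
    cols = len(H[0])
    # Normal equations (H^T H + ridge * I) w = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(cols)]
         for i in range(cols)]
    b = [sum(h[i] * t for h, t in zip(H, y)) for i in range(cols)]
    return centers, sigma, solve(A, b)


def predict_rbf(model, x):
    centers, sigma, weights = model
    return sum(w * h for w, h in zip(weights, design_row(x, centers, sigma)))
```

Solving for the linear output weights in closed form, rather than by gradient descent, is one reason RBF networks train quickly once the centers are fixed.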
The time series
First, we need to make prediction on
Next, the prediction on Sequence
Correspondingly, the output matrix is:
In the Equation: B is the lag operator;
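The equations of this part did not survive extraction. Assuming the usual additive hybrid decomposition, in which the series is the sum of a linear part forecast by ARIMA and a non-linear residual forecast by the RBF network from the previous day's indicators, the data flow could look like the hypothetical sketch below. The helper names and the one-day lag are assumptions, not the paper's exact input and output matrices.

```python
def arima_residuals(actual, arima_fitted):
    """Non-linear remainder after removing the ARIMA fit: e_t = y_t - L_hat_t."""
    return [y - l for y, l in zip(actual, arima_fitted)]


def lagged_design(features, target, lag=1):
    """Pair the indicator vector of day t - lag with the target value of day t.

    features: one vector per day, e.g. [open, high, low, close, volume]
    target:   one scalar per day (here, the ARIMA residuals)
    """
    X = [list(features[t - lag]) for t in range(lag, len(target))]
    y = [target[t] for t in range(lag, len(target))]
    return X, y


def combined_forecast(arima_forecast, residual_forecast):
    """Hybrid forecast: linear part from ARIMA plus non-linear part from the RBF net."""
    return arima_forecast + residual_forecast
```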
To test the effectiveness of the combined model, this paper uses earlier historical data to predict later historical data, so that the predictions can be compared with the real data. Because the Chinese stock market moved especially sharply, and was hard to forecast, from the end of 2014 to the beginning of 2015, this paper tests the model on data from this period: 190 trading days from July 1st, 2014, to April 10th, 2015. The model predicts only one closing price, for the day after the last day of input data, and a single one-day prediction can hardly test the model's effectiveness, so the test uses dynamic prediction. For example, the data of the 1st to the 126th day are used to predict the closing price of the 127th day, the data of the 2nd to the 127th day to predict the closing price of the 128th day, and so on. In this way, we predict the closing prices of 64 days and compare them with the real data. The data source is the Wind database.
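The dynamic (rolling-window) prediction scheme can be sketched as follows. The model is represented by a hypothetical `fit_predict` callable that is refit on each 126-day window and returns a one-step-ahead forecast; indices are 0-based, so the first window covers days 1 to 126 and forecasts day 127.

```python
from typing import Callable, List, Sequence


def dynamic_forecast(series: Sequence[float], window: int,
                     fit_predict: Callable[[Sequence[float]], float]) -> List[float]:
    """Rolling one-step-ahead forecasts.

    Window i trains on series[i : i + window] and predicts series[i + window];
    with 190 observations and window = 126 this yields 190 - 126 = 64 forecasts.
    """
    forecasts = []
    for start in range(len(series) - window):
        train = series[start:start + window]
        forecasts.append(fit_predict(train))  # refit on every window
    return forecasts
```

For instance, plugging in a naive "last value" model shows the bookkeeping: each forecast for day t is built only from days t − 126 to t − 1.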
First, we need to determine the order of the ARIMA (p, d, q) model. The sequence chart drawn with EViews 7.2 clearly shows that the index data do not have the zero-mean, constant-variance characteristics of a stationary series, and the p-value of the ADF unit root test is 0.9999, far above 0.05, which means the sequence is non-stationary. Because price indices change proportionally, we first take logarithms of the data and then take the first difference of the log series. The ADF p-value of the transformed sequence is below 0.0001, so it passes the stationarity test, and d = 1.
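The log-difference transform, and its inverse used later to turn forecasts of the transformed series back into price levels, can be written in a few lines. This is a minimal sketch with illustrative names; the ADF test itself is not reimplemented here (in Python, `adfuller` in `statsmodels.tsa.stattools` returns the test p-value as the second element of its result).

```python
import math


def log_diff(prices):
    """z_t = ln(P_t) - ln(P_{t-1}); the result is one observation shorter."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def invert_log_diff(last_price, z_forecast):
    """Undo the difference and then the log, step by step: P_t = P_{t-1} * exp(z_t)."""
    out = []
    p = last_price
    for z in z_forecast:
        p = p * math.exp(z)
        out.append(p)
    return out
```

Round-tripping a price series through both functions recovers the original prices after the first one.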
Next, we inspect the ACF and PACF plots at multiple lags of Sequence
A preliminary reading of the plots shows cut-offs at lags 2 and 4 in both the ACF and PACF, so four combinations, p ∈ {2, 4} and q ∈ {2, 4}, are tried. Computing the AIC and SC values shows that both are smallest when p = 4 and q = 2 (AIC = −6.06868, SC = −5.92467), so the ARIMA (4, 1, 2) model is used. In this model the t-test p-values of all regression terms are below the 0.05 significance level, so every term is significant. The LM autocorrelation test on the residuals gives a p-value of 0.9999, well above 0.05, showing that no autocorrelation remains in the residual sequence. By inverting the first difference and the logarithm, the forecast sequence
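The order selection above compares information criteria over a small grid. The sketch below assumes the per-observation definitions that EViews reports, AIC = −2l/T + 2k/T and SC = −2l/T + k·ln(T)/T, with l the log-likelihood, k the number of estimated coefficients and T the sample size. The rule "smallest AIC, ties broken by SC" is an illustrative choice; in the text both criteria happen to agree on (4, 2).

```python
import math
from itertools import product


def eviews_style_ic(loglik, k, t):
    """Per-observation information criteria (assumed EViews convention)."""
    aic = -2.0 * loglik / t + 2.0 * k / t
    sc = -2.0 * loglik / t + k * math.log(t) / t
    return aic, sc


# The candidate grid tried in the text: p in {2, 4}, q in {2, 4}.
CANDIDATE_ORDERS = list(product([2, 4], [2, 4]))


def select_order(criteria):
    """criteria maps (p, q) -> (aic, sc); pick the smallest AIC, ties broken by SC."""
    return min(criteria, key=lambda pq: criteria[pq])
```

If the models are fitted with statsmodels instead, the `llf` attribute of a fitted `statsmodels.tsa.arima.model.ARIMA` result can feed this function; its own `aic` and `bic` attributes are not divided by T, so their values are on a different scale from EViews output.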
Remove the predicted
The model can then be used to predict subsequent data. When predicting the t-th day, the data of the (t − 1)th day serve as the training input of the neural network. The predicted value is de-normalized (anti-normalized) to obtain
In the equation, # is an indicator function that equals 1 for a positive argument and 0 otherwise, and n is the total number of predictions.
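The evaluation quantities used in this part can be sketched as below. Min-max scaling is assumed as the normalization being inverted ("anti-normalization"), and because the direction-accuracy equation was lost, its form here, (1/n)·Σ #[(y_t − y_{t−1})(ŷ_t − y_{t−1})], is an assumption built from the definition of # and n given above. MAPE is the standard mean absolute percentage error reported in the tables.

```python
def minmax_params(values):
    """Min and max of the training data, reused to map predictions back."""
    return min(values), max(values)


def normalize(v, lo, hi):
    return (v - lo) / (hi - lo)


def denormalize(u, lo, hi):
    """Inverse of normalize: the 'anti-normalization' step."""
    return lo + u * (hi - lo)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def direction_accuracy(prev_actual, actual, predicted):
    """Share of days on which the predicted move from y_{t-1} has the sign of the real move.

    Assumed form: (1/n) * sum of #[(y_t - y_{t-1}) * (yhat_t - y_{t-1})],
    with #(z) = 1 if z > 0 else 0.
    """
    hits = sum(1 for y0, y, yhat in zip(prev_actual, actual, predicted)
               if (y - y0) * (yhat - y0) > 0)
    return hits / len(actual)
```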
ups and downs is relatively accurate.
To examine the advantage of the combined model, we also fit the price sequence separately with a single ARIMA model and a single RBF neural network model. The relative errors of the three models' predictions with respect to the true values are in
Clearly, the combined ARIMA and RBF-ANN model proposed in this paper has the smallest error fluctuation and the highest prediction precision of the three models.
Furthermore, we examine how the models perform when predicting further ahead. To predict the data of the t-th day, the paper uses the first t − 1, t − 2, t − 3 and t − 4 data, respectively, as the training set. The resulting MAPE is shown as
Model | ARIMA | RBF-ANN | Combined Model |
---|---|---|---|
MAPE | 1.5691% | 1.7772% | 0.9071% |
Training set | The first t − 1 | The first t − 2 | The first t − 3 | The first t − 4 |
---|---|---|---|---|
ARIMA Model MAPE | 1.5691% | 2.0427% | 2.4952% | 2.6728% |
Combined Model MAPE | 0.9071% | 1.6206% | 3.1094% | 3.5059% |
When the training set is lagged by more than two periods (t − 3 and t − 4), the forecast errors rise markedly, and the errors of the combined model exceed those of the ARIMA model. This supports the view that the chaotic characteristics of stock data grow rapidly as the forecast horizon lengthens, and that the neural network model is better suited to capturing short-term patterns of change. It is therefore reasonable for this paper to restrict itself to short-term prediction one period ahead.
Furthermore, to examine whether using several indicators as inputs to the RBF neural network (the opening price, the highest and lowest prices, and the transaction volume) improves prediction precision, the paper also runs a control experiment: the combined model is applied under the same conditions but with the closing price sequence alone as input, ignoring the other factors. This gives a MAPE of 1.2624%, higher than the MAPE of 0.9071% obtained by the model in this paper. The multiple-indicator input therefore improves prediction accuracy and is necessary.
To demonstrate practical use, the paper also applies the combined model to genuinely future data: the closing price on November 2nd, 2015. The inputs are the daily opening price, closing price, highest price, lowest price and transaction volume of the 202 trading days from January 1st, 2015, to October 31st, 2015. The combined model outputs a closing price of 3504.32. To forecast further days, the dynamic prediction method described in the model effectiveness test must be used to predict future closing prices one day at a time.
Share price fluctuations are chaotic and complex and have both linear and non-linear characteristics, so it is difficult for a single linear or non-linear prediction model to capture all of the information they contain. This paper proposes a combined ARIMA and RBF-ANN prediction model that uses several price indicators and transaction volume as combined input variables, giving the model high precision and good operability. An empirical study of short-term prediction on the CSI 300 index verifies the advantage of the combined model: its precision clearly exceeds that of the single linear or non-linear model, as well as that of the variant using the closing price sequence alone as input. However, the model's short-term predictions are clearly better than its longer-term ones, and the relations between share prices and other factors, as well as prediction on weekly and monthly data, remain to be studied.
Lyuxun Yang, Xi Cheng (2015) Predictive Analytics on CSI 300 Index Based on ARIMA and RBF-ANN Combined Model. Journal of Mathematical Finance, 05, 393-400. doi: 10.4236/jmf.2015.54033