Open Journal of Statistics

2161-718X

Scientific Research Publishing

10.4236/ojs.2017.74050

Univariate Time-Series Analysis of Second-Hand Car Importation in Zambia

Stanley Jere*, Bornwell Kasense, Bwalya Bupe Bwalya

Department of Mathematics and Statistics, Mulungushi University, Kabwe, Zambia

* E-mail: sjere@mu.ac.zm (SJ)

Received 7 July 2017; accepted 26 August 2017; published 29 August 2017

This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

Zambia depends largely on the international second-hand car (SHC) market for its motor vehicle supply. The importation of second-hand cars into Zambia presents a time series problem. The data used in this paper are monthly SHC importation figures from 1 January 2014 to 31 December 2016. The data were analyzed using Exponential Smoothing (ES) and Autoregressive Integrated Moving Average (ARIMA) models. The results showed that ARIMA (2, 1, 2) was the best fit for SHC importation, since its errors were smaller than those of the Simple (SES), Double (DES) and Triple (TES) Exponential Smoothing models. The four error measures used were root-mean-square error (RMSE), mean absolute error (MAE), mean percentage error (MPE) and mean absolute percentage error (MAPE). Forecasts were produced using the ARIMA (2, 1, 2) model for the 18 months from January 2017. Although there was a percentage increase of 90.6% from November 2015 to December 2016, SHC importation in Zambia has generally been decreasing, with a percentage change of 59.5% from January 2014 to December 2016. The forecasts also show a gradual percentage decrease of 1.12% by June 2018. These results are useful to policy and decision makers in Government departments such as the Zambia Revenue Authority (ZRA) and the Road Development Agency (RDA) as they plan and execute their duties.

Keywords: Zambia, Importation, Second-Hand Car, Exponential Smoothing Models, ARIMA Models, Forecasting

1. Introduction

Over the last two decades, private vehicle ownership in developing countries has increased at an unprecedented pace. Between 1990 and 2005 the total number of registered vehicles in developing countries rose from 110 million to 210 million, and by some estimates it is forecast to reach 1.2 billion by 2030 [1]. Rising incomes explain a large share of this growth: as people get richer, they can afford the personal mobility that an automobile confers. High-income countries export large numbers of second-hand vehicles to low-income countries, and this trade will probably grow [2]. Zambia, as a developing country, has experienced strong economic growth over the last decade, and its growth outlook is also positive [3]. This sustained growth has shifted the consumption patterns of Zambian households. The economic reality of Zambia is that the majority of the population is middle class and hence made up of middle-income earners. This reality has forced Zambians to depend on the second-hand market for their motor vehicle supply. This is supported by [3], who noted that consumers with less purchasing power are more likely to be able to afford second-hand motor vehicles. In addition, car purchasing in most developing countries has been high owing to a rapid increase in ICT usage (Internet and mobile penetration), rising GDP and an emerging middle-class society [4].

The importation of second-hand cars into Zambia presents a time series problem. Several techniques exist for analysing time series, but this study concentrates on the Exponential Smoothing (ES) and Autoregressive Integrated Moving Average (ARIMA) models. The smoothing techniques discussed in [5] include Simple Exponential Smoothing (SES), which is applied when the pattern is nearly horizontal; Double Exponential Smoothing (DES), also known as Holt's Exponential Smoothing (HES), which is applied when the data show a trend; and Holt-Winters (HW), also known as Triple Exponential Smoothing (TES), which is applied when the data show both trend and seasonality. The best fit of the three ES models, chosen according to whether the data exhibit a level, a trend and/or seasonality, is then compared with the ARIMA model. The ARIMA model is another method used to model and forecast time series data. ARIMA models are also known as the “Box-Jenkins” approach, following the work of Box and Jenkins [6]. This paper therefore focuses on univariate time-series models as major tools for decision making.

2. Methodology

Figure 1 shows the flowchart of the methodology.

Two main classes of models are considered in this paper: the Exponential Smoothing (ES) models and the Autoregressive Integrated Moving Average (ARIMA) models. The first class comprises the SES, DES and TES models. The three models are analysed and the best-fit model is chosen depending on whether the data exhibit a level, a trend and/or seasonality. The second class comprises the ARIMA models, with the following model-building process: tentative identification of a model, estimation of the parameters of the identified model, and diagnostic checks. The best fits from the two classes are finally compared to choose the model for forecasting (Figure 1).

2.1. Exponential Smoothing (ES) Models

2.1.1. Simple Exponential Smoothing (SES)

SES is applied when the data pattern is nearly horizontal, that is, when previous data show no particular trend or seasonal variation. For the series $\phi_1, \phi_2, \ldots, \phi_t$, the forecast for the next value $\phi_{t+1}$, say $\bar{\phi}_{t+1}$, is based on the weights $\alpha$ and $1 - \alpha$ applied to the most recent observation $\phi_t$ and the most recent forecast $\bar{\phi}_t$ respectively. Here $\alpha$ is the smoothing constant, $\phi_t$ is the actual value for period $t$ and $\bar{\phi}_t$ is the forecast value for period $t$. The model is of the form

$$\bar{\phi}_{t+1} = \bar{\phi}_t + \alpha(\phi_t - \bar{\phi}_t), \quad 0 < \alpha < 1,\ t > 0. \tag{1}$$

The value of $\alpha$ is chosen subjectively: a value close to zero smooths out unwanted cyclical and irregular components, while a value close to one is used for forecasting.
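The recursion in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the routine used for the analysis: the function name, the demo series and the choice to start from the first observation are all assumptions made here.

```python
def simple_exponential_smoothing(series, alpha, initial_forecast=None):
    """Return one-step-ahead forecasts from Equation (1).

    forecasts[t] is the forecast made for series[t]; the final element
    is the forecast for the period after the last observation.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Assumption: start from the first observation unless told otherwise.
    forecast = series[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for observed in series:
        # Equation (1): move the forecast towards the latest observation.
        forecast = forecast + alpha * (observed - forecast)
        forecasts.append(forecast)
    return forecasts


demo_series = [10.0, 12.0, 11.0, 13.0]  # hypothetical data
ses_forecasts = simple_exponential_smoothing(demo_series, alpha=0.5)
```

With a larger alpha the forecast follows the latest observation more closely; with a smaller alpha it changes more slowly.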

2.1.2. Double Exponential Smoothing (DES)

This technique is used when the data exhibit a trend, that is, when the time series can be described by an additive model with an increasing or decreasing trend and no seasonality. The model is

$$\bar{\phi}_t = \alpha \phi_t + (1 - \alpha)(\bar{\phi}_{t-1} + \beta_{t-1}), \quad 0 < \alpha < 1, \tag{2}$$

$$\beta_t = \theta(\bar{\phi}_t - \bar{\phi}_{t-1}) + (1 - \theta)\beta_{t-1}, \quad 0 < \theta < 1, \tag{3}$$

$$\hat{\phi}_{t+m} = \bar{\phi}_t + \beta_t m, \tag{4}$$

where $\phi_t$ is the actual value at time $t$, $\bar{\phi}_t$ is the level of the series at time $t$, $\beta_t$ is the slope (trend) of the series at time $t$ and $\hat{\phi}_{t+m}$ is the forecast $m$ periods ahead. $\alpha$ and $\theta$ (each taking the values $0.1, 0.2, \ldots, 0.9$) are the smoothing coefficients for level and trend respectively; the trend coefficient $\theta$ is the parameter that R reports as beta. The best values of $\alpha$ and $\theta$ correspond to the minimum mean square error (MSE).
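As a rough sketch of Equations (2)-(4), the level and slope updates can be written as follows. The initial level and slope (first observation and first difference) are an assumption of this example, as are the function name and the demo series; the paper does not state how the recursion was started.

```python
def double_exponential_smoothing(series, alpha, theta, horizon=1):
    """Holt's linear method following Equations (2)-(4).

    Returns (level, slope, forecasts), where forecasts[m - 1] is the
    m-step-ahead forecast from the end of the series.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initial values: first observation and first difference.
    level = series[0]
    slope = series[1] - series[0]
    for observed in series[1:]:
        previous_level = level
        # Equation (2): update the level.
        level = alpha * observed + (1 - alpha) * (previous_level + slope)
        # Equation (3): update the slope.
        slope = theta * (level - previous_level) + (1 - theta) * slope
    # Equation (4): extrapolate the last level along the last slope.
    forecasts = [level + slope * m for m in range(1, horizon + 1)]
    return level, slope, forecasts


# A perfectly linear (hypothetical) series is extrapolated along its line.
des_level, des_slope, des_forecasts = double_exponential_smoothing(
    [2.0, 4.0, 6.0, 8.0], alpha=0.5, theta=0.5, horizon=2
)
```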

2.1.3. Triple Exponential Smoothing (TES)

The TES model is applied when the time series exhibits seasonality. It incorporates three smoothing equations: the first for the level, the second for the trend and the third for seasonality. The triple exponential smoothing model is

$$\bar{\phi}_t = \alpha \frac{\phi_t}{S_{t-p}} + (1 - \alpha)(\bar{\phi}_{t-1} + \beta_{t-1}), \quad 0 < \alpha < 1, \tag{5}$$

$$\beta_t = \theta(\bar{\phi}_t - \bar{\phi}_{t-1}) + (1 - \theta)\beta_{t-1}, \quad 0 < \theta < 1, \tag{6}$$

$$S_t = \gamma \frac{\phi_t}{\bar{\phi}_t} + (1 - \gamma) S_{t-p}, \quad 0 < \gamma < 1. \tag{7}$$

The prediction for time period $T + \tau$ is then

$$\hat{\phi}_{T+\tau} = (\bar{\phi}_T + \tau \beta_T) S_T, \tag{8}$$

where $\bar{\phi}_T$ is the smoothed estimate of the level at time $T$, $\beta_T$ is the smoothed estimate of the trend at time $T$ and $S_T$ is the smoothed estimate of the appropriate seasonal component at $T$. $\alpha$, $\theta$ and $\gamma$ are the level, trend and seasonal smoothing parameters respectively, $S_t$ is the seasonal component at time $t$ and $p$ is the number of seasons per year.
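The three updates in Equations (5)-(7) and the forecast in Equation (8) can be sketched as below. The initialisation (mean of the first season for the level, average change between the first two seasons for the slope, first-season ratios for the seasonal indices) is a hypothetical choice made for this example, and the forecast uses the most recent seasonal index for the matching season as the "appropriate seasonal component".

```python
def holt_winters_multiplicative(series, period, alpha, theta, gamma, horizon):
    """Multiplicative triple exponential smoothing, Equations (5)-(8)."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first, second = series[:period], series[period:2 * period]
    # Assumed initial values (hypothetical, not taken from the paper).
    level = sum(first) / period
    slope = (sum(second) - sum(first)) / period ** 2
    seasonal = [value / level for value in first]
    for t in range(period, len(series)):
        season = t % period
        previous_level = level
        # Equation (5): deseasonalise the observation and update the level.
        level = (alpha * series[t] / seasonal[season]
                 + (1 - alpha) * (previous_level + slope))
        # Equation (6): update the slope.
        slope = theta * (level - previous_level) + (1 - theta) * slope
        # Equation (7): update the seasonal index for this season.
        seasonal[season] = (gamma * series[t] / level
                            + (1 - gamma) * seasonal[season])
    n = len(series)
    # Equation (8): trend extrapolation times the matching seasonal index.
    return [(level + tau * slope) * seasonal[(n - 1 + tau) % period]
            for tau in range(1, horizon + 1)]


# Hypothetical series with a pure two-period seasonal pattern and no trend.
tes_forecasts = holt_winters_multiplicative(
    [10.0, 20.0, 10.0, 20.0, 10.0, 20.0], period=2,
    alpha=0.5, theta=0.5, gamma=0.5, horizon=2,
)
```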

2.2. Autoregressive Integrated Moving Average (ARIMA) Model

ARIMA modelling has the following stages: identification, estimation, diagnosis and prediction. The “I” stands for “integrated”, which implies that the series is differenced before modelling and that, upon completion of the modelling, the results are integrated (summed back) to produce the final predictions and estimates [7]. The model is denoted ARIMA (p, d, q); differencing the series d times yields a stationary ARMA (p, q) process. In ARIMA (p, d, q), p is the order of the autoregressive (AR) part, d is the number of times the data need to be differenced to become stationary and q is the order of the moving average (MA) part. The R statistical package was used to perform the identification, estimation, diagnosis and prediction stages. The expressions for the AR, MA and ARMA models are:

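AR model:

$$\hat{Y}_t = \vartheta_1 Y_{t-1} + \vartheta_2 Y_{t-2} + \cdots + \vartheta_p Y_{t-p} + \varepsilon_t = \sum_{i=1}^{p} \vartheta_i Y_{t-i} + \varepsilon_t, \tag{9}$$

MA model:

$$\hat{Y}_t = \varphi_1 \varepsilon_{t-1} + \varphi_2 \varepsilon_{t-2} + \cdots + \varphi_q \varepsilon_{t-q} = \sum_{i=1}^{q} \varphi_i \varepsilon_{t-i}, \tag{10}$$

and ARMA model:

$$\hat{Y}_t = \sum_{i=1}^{p} \vartheta_i Y_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \varphi_i \varepsilon_{t-i}, \tag{11}$$

where $\vartheta_i$ is the $i$-th autoregressive parameter, $\varepsilon_t$ is the error term at time $t$ and $\varphi_i$ is the $i$-th moving-average parameter.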
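To make Equation (11) concrete, the sketch below computes a one-step-ahead ARMA prediction from past values and past residuals. It assumes the unknown current shock is replaced by its expected value of zero; the function name and all numbers are hypothetical.

```python
def arma_one_step(history, residuals, ar_coeffs, ma_coeffs):
    """One-step-ahead ARMA(p, q) prediction based on Equation (11).

    history[-1] is Y_{t-1} and residuals[-1] is eps_{t-1}. The current
    shock eps_t is unknown at forecast time and is set to zero here.
    """
    p, q = len(ar_coeffs), len(ma_coeffs)
    if len(history) < p or len(residuals) < q:
        raise ValueError("not enough past values for the requested orders")
    # Autoregressive part: weighted sum of the last p observations.
    ar_part = sum(ar_coeffs[i] * history[-1 - i] for i in range(p))
    # Moving-average part: weighted sum of the last q residuals.
    ma_part = sum(ma_coeffs[i] * residuals[-1 - i] for i in range(q))
    return ar_part + ma_part


arma_prediction = arma_one_step(
    history=[1.0, 2.0], residuals=[0.5, -0.5],
    ar_coeffs=[0.5, 0.25], ma_coeffs=[0.4, 0.2],
)
```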

2.3. Assumption: Stationarity

The stationarity assumption implies that the mean, variance and autocorrelation structure do not change over time. A stationary series looks flat, with no trend, constant variance over time and no periodic fluctuations (seasonality). This assumption applies to ARIMA models and not to ES models. When the data are found to be non-stationary, the first difference (d = 1) is used. Only in extreme cases is the second difference (d = 2) applied.
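Differencing itself is a simple operation, sketched here on a hypothetical linear series: one pass replaces each value by its change from the previous period, and a second pass differences the result again.

```python
def difference(series, d=1):
    """Apply first differencing d times (y_t - y_{t-1} on each pass)."""
    result = list(series)
    for _ in range(d):
        result = [current - previous
                  for previous, current in zip(result, result[1:])]
    return result


linear_trend = [3, 5, 7, 9, 11]  # hypothetical series with a steady trend
first_difference = difference(linear_trend, d=1)
second_difference = difference(linear_trend, d=2)
```

For this series the first difference is already constant, which is why d = 1 is usually enough when the non-stationarity comes from a roughly linear trend.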

2.4. Model-Selection Criteria

Four model-selection metrics are used to evaluate the performance of the estimated Exponential Smoothing models and the estimated ARIMA model. The best-fit model is the one with the smallest errors on the largest number of metrics. The metrics are the Root Mean Square Error (RMSE), the Mean Absolute Percentage Error (MAPE), the Mean Percentage Error (MPE) and the Mean Absolute Error (MAE).

Table 1 shows how the error measures are calculated.
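Since the formula column of Table 1 is empty in this text, the sketch below uses the standard textbook definitions of the four measures. The sign convention (error = actual minus fitted) and the expression of MPE and MAPE in percent are assumptions of this example, and the input numbers are hypothetical.

```python
import math


def accuracy_metrics(actual, fitted):
    """Standard RMSE, MAE, MPE and MAPE for paired actual/fitted values."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed sign convention: error = actual - fitted.
    errors = [a - f for a, f in zip(actual, fitted)]
    percent_errors = [100.0 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(percent_errors) / n,
        "MAPE": sum(abs(pe) for pe in percent_errors) / n,
    }


metrics = accuracy_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

MPE keeps the sign of the errors, so positive and negative errors can cancel; this is why it is usually read as a measure of bias rather than of accuracy.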

3. Results and Discussion

The collected data were loaded into R version 3.3.3 to perform the analysis outlined in the subsections that follow. Figure 2 shows a plot of the original SHC imports data.

Figure 2 indicates a trend.

3.1. Exponential Smoothing Output

The following output was generated in R.

3.1.1. Simple Exponential Smoothing

The R output for the SES model is shown in Table 2. The smoothing parameter was estimated at α = 0.9104, with initial state l = 5521.3676 and AIC = 578.5190.

Table 1 Model accuracy metrics (formulas not preserved in this copy)

Criteria
RMSE
MAPE
MPE
MAE

Table 2 Model Information for Simple Exponential Smoothing

Smoothing parameters	Initial states	sigma	AIC	AICc	BIC
alpha = 0.9104	l = 5521.3676	473.3778	578.5190	579.2690	583.2696

Table 3 Model Information for Double Exponential Smoothing

Smoothing parameters	Initial states	sigma	AIC	AICc	BIC
alpha = 0.8006, beta = 0.0004	l = 6033.9228, b = −118.9081	465.3511	581.2877	583.2877	589.2053

The fitted SES model took the form

$$\bar{\phi}_{t+1} = 0.9104\,\phi_t + 0.0896\,\bar{\phi}_t. \tag{12}$$

3.1.2. Double Exponential Smoothing

The R output for the DES model is shown in Table 3. The level and trend smoothing parameters were estimated at α = 0.8006 and β = 0.0004 (θ in Equation (3)) respectively, with initial states l = 6033.9228 and b = −118.9081, and AIC = 581.2877.

Substituting these estimates into Equations (2) and (3) gives the fitted DES model for SHC importation:

$$\bar{\phi}_t = 0.8006\,\phi_t + 0.1994(\bar{\phi}_{t-1} + \beta_{t-1}), \tag{13}$$

$$\beta_t = 0.0004(\bar{\phi}_t - \bar{\phi}_{t-1}) + 0.9996\,\beta_{t-1}. \tag{14}$$

3.1.3. Triple Exponential Smoothing

Table 4 shows the R output for the HW model. The level, trend and seasonal smoothing parameters were estimated at α = 0.7706147, β = 0 and γ = 1 respectively. The coefficients are a = 2158.93914 and b = −116.18546.

Table 4 Model Information for Triple Exponential Smoothing

Smoothing parameters	Coefficients
alpha = 0.7706147, beta = 0, gamma = 1	a = 2158.93914, b = −116.18546

Using Equations (5)-(7), the fitted HW model for SHC imports is

$$\bar{\phi}_t = 0.7706147\,\frac{\phi_t}{S_{t-p}} + 0.2293853(\bar{\phi}_{t-1} + \beta_{t-1}), \tag{15}$$

$$\beta_t = \beta_{t-1}, \tag{16}$$

$$S_t = \frac{\phi_t}{\bar{\phi}_t}. \tag{17}$$

3.1.4. Choice of Appropriate Exponential Smoothing Technique

Figure 3 shows plots of the three fitted ES models against the original data for ease of comparison and selection.

Figure 3(a), which shows TES, is easily eliminated because, by graphical inspection, it does not mimic the time plot of the observed data as well as the other two. The choice was therefore between Figure 3(b) and Figure 3(c), which look very much alike and both mimic the observed data quite well. To choose between them, the AIC of each model was calculated and compared, and the model with the smaller AIC was chosen.

The AICs in Table 5 show that SES was a better fit than DES. Hence SES was chosen as the appropriate technique of the three ES models compared.

3.2. Autoregressive Integrated Moving Average

3.2.1. Model Identification and Selection

The first step in ARIMA modelling is a time plot. Figure 4 shows time plots of the SHC imports for d = 0 and d = 1. ARIMA modelling requires the observed data to be stationary; if they are not, they must be made stationary. Figure 4(a) and the ACF and PACF plots in Figure 5 show that the observed data are non-stationary. Hence Figure 4(b), Figure 5(c) and Figure 5(d) show the series after first differencing, that is, d = 1.

Model selection requires that the ACF and PACF plots for d = 1 in Figure 5 be examined to establish the most suitable ARIMA model. However, these plots gave no clear indication of significant spikes at any one lag. Hence, several tentative ARIMA models and their respective AICs were examined, as shown in Table 6. ARIMA (2, 1, 2) was chosen as the best fit of the tentative models examined. Although the first six models had smaller AICs, their parameters were found to be insignificant, whereas all the estimated parameters of ARIMA (2, 1, 2) were significant, as Table 7 shows.

3.2.2. Estimation

When estimating the parameters, R gave the following output for ARIMA (2, 1, 2) in Table 7(a). Then their significance was tested by use of p-value (see Table 7(b) for p-values of each parameter).
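The figures reported in Table 7 can be cross-checked with a short calculation. The sketch below assumes that each p-value comes from a two-sided normal (Wald) test of coefficient divided by standard error, and that the AIC counts five estimated parameters (four ARMA coefficients plus the innovation variance); neither assumption is stated in the text, and the pairing of standard errors with coefficients follows the order in which they are listed.

```python
import math

# Estimates and standard errors as listed in Table 7(a).
estimates = {
    "ar1": (-1.0536, 0.0881),
    "ar2": (-0.9947, 0.0258),
    "ma1": (1.0907, 0.1881),
    "ma2": (0.9983, 0.1942),
}


def two_sided_p_value(coefficient, standard_error):
    """Two-sided p-value of z = coefficient / s.e. under a standard normal."""
    z = coefficient / standard_error
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


p_values = {name: two_sided_p_value(coef, se)
            for name, (coef, se) in estimates.items()}

log_likelihood = -264.47
n_parameters = 5  # assumption: 4 ARMA coefficients + innovation variance
aic = 2 * n_parameters - 2 * log_likelihood
```

Under these assumptions the recomputed values can be compared directly with the p-values in Table 7(b) and the AIC in Table 7(a); small differences are expected because the inputs are rounded.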

The parameters found significant at the 5% significance level were AR (1), AR (2), MA (1) and MA (2).

Table 5 AIC for SES and DES

Model	AIC	Ranking
SES	578.5190	1
DES	581.2877	2

Table 6 Measure of Accuracy for selected ARIMA models

Tentative model	AIC	Rank	Significant parameters
ARIMA (0, 1, 1)	535.51	1	
ARIMA (1, 1, 0)	535.55	2	
ARIMA (2, 1, 0)	537.31	3	
ARIMA (1, 1, 1)	537.44	4	
ARIMA (0, 1, 2)	537.81	5	
ARIMA (1, 1, 2)	538.81	6	**
ARIMA (2, 1, 2)	538.93	7	****
ARIMA (3, 1, 0)	539.29	8	
ARIMA (2, 1, 1)	539.3	9	
ARIMA (0, 1, 3)	539.29	10	
ARIMA (1, 1, 3)	540.77	11	
ARIMA (3, 1, 1)	540.78	12	
ARIMA (0, 1, 4)	541.12	13	
ARIMA (4, 1, 0)	541.28	14	
ARIMA (4, 1, 1)	542.72	15	
ARIMA (1, 1, 4)	543.28	16	
ARIMA (3, 1, 2)	543.29	17	
ARIMA (3, 1, 3)	544.17	18	

Note: * Number of significant parameters and lesser prediction errors.

Table 7 (a) Model Information for ARIMA (2, 1, 2); (b) p-values for estimated coefficients

(a)

Parameter	Estimate	s.e.
ar1	−1.0536	0.0881
ar2	−0.9947	0.0258
ma1	1.0907	0.1881
ma2	0.9983	0.1942

sigma² = 198,508; log likelihood = −264.47; AIC = 538.93

(b)

Variables	Coefficients	p-value
AR(1)	−1.0536	0*
AR(2)	−0.9947	0*
MA(1)	1.0907	0.000000006720324*
MA(2)	0.9983	0.0000002724817*

Note: * implies p-value < 0.05, hence a significant coefficient.

Hence the fitted ARIMA (2, 1, 2) model, using Equation (11), was

$$\hat{X}_t = 1.0907\,\varepsilon_{t-1} + 0.9983\,\varepsilon_{t-2} - 1.0536\,X_{t-1} - 0.9947\,X_{t-2}. \tag{18}$$

Figure 6 plots the fitted model against the observed SHC imports and shows that the fitted values track the actual imports closely.
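One way to read the fitted AR coefficients is through the roots of the AR polynomial. The calculation below is an observation about the reported estimates rather than a step described in the paper: it solves $\lambda^2 - \vartheta_1 \lambda - \vartheta_2 = 0$ for the inverse roots and derives the length of the cycle they imply for the differenced series.

```python
import cmath
import math

ar1, ar2 = -1.0536, -0.9947  # AR estimates from Table 7

# Inverse roots solve lambda^2 - ar1 * lambda - ar2 = 0.
discriminant = ar1 ** 2 + 4 * ar2
inverse_root = (ar1 + cmath.sqrt(discriminant)) / 2

# A modulus below 1 means the AR part is stationary; a value close to 1
# means that oscillations die out slowly.
modulus = abs(inverse_root)

# For a complex root pair, the angle gives the cycle length in months.
cycle_length = 2 * math.pi / abs(cmath.phase(inverse_root))
```

The inverse roots form a complex pair just inside the unit circle, with an implied cycle of roughly three months. This is consistent with the repeating three-month pattern visible in the point forecasts of Table 9.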

3.2.3. Diagnostic Checking

The adequacy of the best-fit model was checked by analysing its residuals to ensure that they form a white noise process. The ACF of the residuals, the Q-Q plot and the histogram of the residuals were used for this purpose. Figure 7 shows that the residuals are white noise, and all p-values of the Ljung-Box test are greater than 0.05. Hence ARIMA (2, 1, 2) is indeed the best-fit model.
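For reference, the Ljung-Box statistic used in this check is $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ autocorrelation of the residuals. The sketch below computes only the statistic; the p-value requires a chi-square distribution function, which is left out to keep the example within the standard library. The residuals shown are hypothetical.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum((values[t] - mean) * (values[t - lag] - mean)
                    for t in range(lag, n))
    return numerator / denominator


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# Strongly alternating (hypothetical) residuals are clearly not white noise.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q_statistic = ljung_box_q(alternating, max_lag=1)
```

A large Q relative to the chi-square reference distribution indicates remaining autocorrelation; p-values above 0.05, as reported for Figure 7, indicate none was detected.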

3.3. Discussion

The preceding sections showed that, of the three Exponential Smoothing techniques used in this analysis (SES, DES and TES), SES fitted the SHC imports data better than DES and TES. Its fitted model was estimated to be

$$F_{t+1} = 0.9104\,Y_t + 0.0896\,F_t.$$

ARIMA (2, 1, 2) also fitted the data well compared with the other tentative ARIMA models. It was estimated to be

$$\hat{Y}_t = 1.0907\,\varepsilon_{t-1} + 0.9983\,\varepsilon_{t-2} - 1.0536\,Y_{t-1} - 0.9947\,Y_{t-2}.$$

The question that remains is which of the four models considered in this report, as highlighted in Figure 1, is the best fit for the SHC imports data. The accuracy of each fit was evaluated using the four metrics discussed in Section 2.4. Each metric was applied to determine and rank the performance of the models for the given time series. Table 8 summarizes the four models and their forecasting performances.

Table 8 Criteria for selecting the better model between SES, DES, TES and ARIMA

Measures of accuracy (Errors)	SES	DES	TES	ARIMA
RMSE	480.0931	465.3511	580.6247	439.3114*
MAE	358.1159	366.8639	372.0239	325.0751*
MPE	−5.300821*	−0.09889986	0.7725993	−4.472449
MAPE	15.39877	15.67945	18.14711	13.46418*
Model ranking	2	3	4	1

Note: Smaller error (*) implies better fit.

The results indicate that the ARIMA model performs better than the other models for this time series. ARIMA (2, 1, 2) has the smallest error on three of the four metrics (RMSE, MAE and MAPE), and so it was concluded that ARIMA (2, 1, 2) is the best-fit model for the SHC imports data. It can therefore be used to forecast future imports of SHCs. Note, however, that although the SES model gives the second-best forecasts after the ARIMA model, the performance of each model depends on the data used.

Here, it should be noted that differences between their performances are related to the differences between the methods of determining forecasts in the ES and in the ARIMA models. The forecasting method in the ES models relies on a weighted average of the past observed values in which the weights decline exponentially. This basically implies that the data for more recent observations contribute significantly more than the previous data does. The ARIMA model, however, has three parts: autoregression, integration and moving average, with the future value of a variable being a linear combination of the past values and the associated errors.
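The exponentially declining weights can be made explicit. Unrolling the SES recursion shows that the observation $k$ periods back receives weight $\alpha(1-\alpha)^k$; the sketch below evaluates these weights for the fitted value α = 0.9104 (the function name is an assumption of this example).

```python
def ses_weights(alpha, n_terms):
    """Weights alpha * (1 - alpha)**k on the observation k periods back."""
    return [alpha * (1 - alpha) ** k for k in range(n_terms)]


# Weights on the four most recent observations for the fitted SES model.
fitted_weights = ses_weights(0.9104, 4)
total_weight = sum(fitted_weights)
```

With α this close to one, almost all of the weight falls on the latest observation, so the fitted SES forecast is close to a "last value" forecast.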

4. Forecasting

Forecasting is usually the last stage in time series analysis, as indicated in Figure 1. It plays a significant role in planning and decision making for policy makers. Decision makers who take both current and likely future events into account are better placed to make sound decisions. Thus, however uncertain forecasts might appear, decision makers should not ignore them. Table 9 shows forecasts for 18 months (from January 2017 to June 2018).

Figure 8 is a graphical representation of the forecast for ARIMA (2, 1, 2) for a future period of 18 months starting at January 2017.
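The interval columns in Table 9 can be related to a forecast standard error. Assuming the limits are symmetric normal intervals (point forecast plus or minus a normal quantile times the standard error, which the text does not state explicitly), the January 2017 row implies the standard error computed below.

```python
from statistics import NormalDist

# January 2017 row of Table 9.
point_forecast = 2045.993
upper_80 = 2628.462
upper_95 = 2936.802


def implied_standard_error(point, upper, coverage):
    """Standard error implied by a symmetric normal prediction interval."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (upper - point) / z


se_from_80 = implied_standard_error(point_forecast, upper_80, 0.80)
se_from_95 = implied_standard_error(point_forecast, upper_95, 0.95)
```

Both intervals lead to essentially the same standard error, which supports the normal-interval reading. The intervals widen with the horizon, and from July 2017 onwards the lower 95% limits are negative, which is not meaningful for import counts and shows how uncertain the longer-range forecasts are.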

5. Conclusion

Zambia depends largely on the international second-hand car market for its motor vehicle supply. In this paper, monthly time series data on second-hand car (SHC) importation were analyzed using SES, DES, TES and ARIMA techniques. The quality of each technique was determined by comparing the fitted model's predictions with the observed data. The results showed that ARIMA (2, 1, 2) was the best fit for SHC importation because its errors were smaller than those of SES, DES and TES. The four error measures used were RMSE, MAE, MPE and MAPE. Forecasts were produced using the ARIMA (2, 1, 2) model for the 18 months from January 2017. Although there was a percentage increase of 90.6% from November 2015 to December 2016, SHC importation in Zambia has generally been decreasing, with a percentage change of 59.5% from January 2014 to December 2016. The forecasts also show a gradual percentage decrease of 1.12% by June 2018. Ultimately, these results can be used by Government departments such as the Zambia Revenue Authority and the Road Development Agency to plan and execute their duties effectively.

Table 9 ARIMA (2, 1, 2) forecasts for the next 18 months

Time (months)	Point Forecast	Lo 80	Hi 80	Lo 95	Hi 95
Jan 2017	2045.993	1463.52478	2628.462	1155.18458	2936.802
Feb 2017	2047.141	1217.04098	2877.242	777.61237	3316.670
Mar 2017	2237.907	1236.01893	3239.796	705.65127	3770.163
Apr 2017	2035.770	876.03471	3195.505	262.10811	3809.431
May 2017	2059.000	757.79151	3360.209	68.97329	4049.027
Jun 2017	2235.582	818.66531	3652.498	68.59512	4402.568
Jul 2017	2026.425	493.22452	3559.625	−318.40260	4371.252
Aug 2017	2071.159	428.86885	3713.450	−440.50729	4582.826
Sep 2017	2232.065	496.66342	3967.467	−422.00285	4886.133
Oct 2017	2018.035	185.98431	3850.086	−783.84482	4819.915
Nov 2017	2083.496	159.69855	4007.294	−858.69851	5025.691
Dec 2017	2227.411	223.49336	4231.330	−837.31682	5292.140
Jan 2018	2010.667	−77.89577	4099.231	−1183.51437	5204.849
Feb 2018	2095.888	−73.16495	4264.941	−1221.39227	5413.168
Mar 2018	2221.684	−18.82067	4462.188	−1204.87199	5648.239
Apr 2018	2004.377	−312.46350	4321.218	−1538.92478	5547.679
May 2018	2108.213	−281.03488	4497.461	−1545.82639	5762.253
Jun 2018	2214.954	−239.45714	4669.366	−1538.74416	5968.653

Cite this paper

Jere, S., Kasense, B. and Bwalya, B.B. (2017) Univariate Time-Series Analysis of Second-Hand Car Importation in Zambia. Open Journal of Statistics, 7, 718-730. https://doi.org/10.4236/ojs.2017.74050

References

[1] Dargay, J., Gately, D. and Sommer, M. (2007) Vehicle Ownership and Income Growth, Worldwide: 1960-2030. Energy Journal, 28, 143-170. https://doi.org/10.5547/ISSN0195-6574-EJ-Vol28-No4-7

[2] Davis, L.W. and Kahn, M.E. (2011) Cash for Clunkers? The Environmental Impact of Mexico’s Demand for Used Vehicles. Access, No. 38, 15. http://www.accessmagazine.org/articles/spring-2011/cash-clunkers-environmental-impact-mexicos-demand-used-vehicles/

[3] Chikuba, Z. (2014) Used Motor Vehicle Imports and the Impact on Transportation in Zambia. Working Paper No. 21, Zambia Institute for Policy Analysis and Research (ZIPAR).

[4] Kamau, H. (2014) Trade in Second-Hand Vehicles: Sustainable Transport Africa.

[5] Gardner, E.S. (1985) Exponential Smoothing: The State of the Art. Journal of Forecasting, 4, 1-28. https://doi.org/10.1002/for.3980040103

[6] Box, G. and Jenkins, G. (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.

[7] Tularam, G.A. and Saeed, T. (2016) Oil-Price Forecasting Based on Various Univariate Time-Series Models. American Journal of Operations Research, 6, 226-235. https://doi.org/10.4236/ajor.2016.63023