Open Journal of Statistics
Vol.07 No.05(2017), Article ID:80097,25 pages
10.4236/ojs.2017.75062
Performance of Existing Biased Estimators and the Respective Predictors in a Misspecified Linear Regression Model
Manickavasagar Kayanan1,2, Pushpakanthie Wijekoon3
1Department of Physical Science, Vavuniya Campus of the University of Jaffna, Vavuniya, Sri Lanka
2Postgraduate Institute of Science, University of Peradeniya, Peradeniya, Sri Lanka
3Department of Statistics and Computer Science, University of Peradeniya, Peradeniya, Sri Lanka
Copyright © 2017 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
Received: September 19, 2017; Accepted: October 28, 2017; Published: October 31, 2017
ABSTRACT
In this paper, the performance of existing biased estimators (Ridge Estimator (RE), Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator) and the respective predictors was considered in a misspecified linear regression model when there exists multicollinearity among the explanatory variables. A generalized form was used to compare these estimators and predictors in the mean square error sense. Further, theoretical findings were established using the mean square error matrix and the scalar mean square error. Finally, a numerical example and a Monte Carlo simulation study were carried out to illustrate the theoretical findings. The simulation study revealed that the LE and RE outperform the other estimators when weak multicollinearity exists, and that the RE, r-k class and r-d class estimators outperform the other estimators when moderate and high multicollinearity exist, for certain values of the shrinkage parameters. The predictors based on the LE and RE are always superior to the other predictors for certain values of the shrinkage parameters.
Keywords:
Misspecified Regression Model, Generalized Biased Estimator, Generalized Predictor, Mean Square Error Matrix, Scalar Mean Square Error
1. Introduction
It is well known that misspecification of the linear model is unavoidable in practical situations. Misspecification may occur when some irrelevant explanatory variables are included in, or some relevant explanatory variables are excluded from, the model. Excluding relevant explanatory variables from a regression model causes these variables to become part of the error term, so the mean of the error term of the model is no longer zero. Furthermore, the excluded variables may be correlated with the variables in the model. According to the assumptions of the linear regression model, the error term should be independently and identically normally distributed with mean zero and variance σ². Therefore, one or more assumptions of the linear regression model will be violated when the model is misspecified, and hence the estimators become biased and inconsistent.
Further, it is well known that the ordinary least squares estimator (OLSE) loses its desirable properties if multicollinearity exists among the explanatory variables in the regression model. To overcome this problem, biased estimators based on the sample model alone, or on the sample model combined with exact or stochastic restrictions, have been proposed in the literature. The motivation of this article is to examine the performance of the existing biased estimators in the misspecified linear regression model when multicollinearity exists.
Sarkar [1] examined the consequences of omitting some relevant explanatory variables from a linear regression model when multicollinearity exists among the explanatory variables. Recently, Şiray [2] and Wu [3] discussed the efficiency of the r-d class estimator and the r-k class estimator over some existing estimators, respectively. Teräsvirta [4] discussed biased estimation with stochastic linear restrictions in a regression model misspecified by including an irrelevant variable with incorrectly specified prior information. Later, the efficiency of the Mixed Regression Estimator (MRE) under a regression model misspecified by excluding a relevant variable, with correctly specified prior information, was discussed by Mittelhammer [5] , Ohtani and Honda [6] , Kadiyala [7] and Trenkler and Wijekoon [8] . Further, the superiority of the MRE over the OLSE under the misspecified regression model with incorrectly specified sample and prior information was discussed by Wijekoon and Trenkler [9] . Hubert and Wijekoon [10] considered the improvement of the Liu estimator under a misspecified regression model with stochastic restrictions.
In this paper, the performance of existing biased estimators of the linear regression model based on sample information, namely the Principal Component Regression Estimator (PCRE) introduced by Massy [11] , the Ridge Estimator (RE) defined by Hoerl and Kennard [12] , the r-k class estimator proposed by Baye and Parker [13] , the Almost Unbiased Ridge Estimator (AURE) proposed by Singh et al. [14] , the Liu Estimator (LE) proposed by Liu [15] , the Almost Unbiased Liu Estimator (AULE) proposed by Akdeniz and Kaçıranlar [16] and the r-d class estimator proposed by Kaçıranlar and Sakallıoğlu [17] , is examined under the misspecified regression model without combining any prior information with the sample model. A generalized form representing all the above estimators is used to compare these estimators and their respective predictors easily.
The rest of this article is organized as follows. The model specification and the respective OLSE are given in section 2. In section 3, a generalized form to represent the estimators under the misspecified regression model is proposed. In section 4, the Mean Square Error Matrix (MSEM) and Scalar Mean Square Error (SMSE) comparisons between two generalized estimators and their respective predictors are considered. In section 5, a numerical example and a Monte Carlo simulation study are given to illustrate the theoretical results under the SMSE criterion. Finally, some concluding remarks are stated in section 6. The references and the Appendix are given at the end of the paper.
2. Model Specification
Assume that the true regression model is given by

y = Xβ + Zγ + u (2.1)

where y is the n×1 vector of observations on the dependent variable, X and Z are the n×l and n×p matrices of observations on the regressors, β and γ are the l×1 and p×1 vectors of unknown coefficients, and u is the n×1 vector of disturbances with mean vector zero and dispersion matrix σ²I.
Let us say that the researcher misspecifies the regression model by excluding the p regressors in Z, and fits

y = Xβ + ε (2.2)
According to Singh et al. [14] , by applying the spectral decomposition to the symmetric matrix X′X (since X′X is a positive definite matrix) we have X′X = TΛT′, where T = (t₁, t₂, …, t_l) is the orthogonal matrix whose columns are the eigenvectors of X′X and Λ = diag(λ₁, λ₂, …, λ_l), λᵢ being the ith eigenvalue of X′X. Let T_r = (t₁, t₂, …, t_r) be the remaining columns of T having deleted l − r columns, where r ≤ l. Hence, T_r′X′XT_r = Λ_r = diag(λ₁, λ₂, …, λ_r).
Let U = XT and α = T′β; then models (2.1) and (2.2) can be written in canonical form as

y = Uα + Zγ + u (2.3)

y = Uα + ε (2.4)
The OLS estimator of model (2.4) is given by

α̂ = (U′U)⁻¹U′y = Λ⁻¹U′y (2.5)

Using (2.3), α̂ can be written as

α̂ = Λ⁻¹U′(Uα + Zγ + u) = α + Λ⁻¹U′Zγ + Λ⁻¹U′u (2.6)
Hence, the expectation vector and the dispersion matrix of α̂ are given by

E(α̂) = α + Λ⁻¹U′Zγ (2.7)

and

D(α̂) = σ²Λ⁻¹ (2.8)

respectively.
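The canonical reduction above is straightforward to reproduce numerically. The following sketch uses simulated data with made-up dimensions (not the paper's dataset) to build the canonical form and the OLSE α̂ of the misspecified model (2.4):

```python
import numpy as np

# Sketch with simulated data (dimensions and values are illustrative only).
rng = np.random.default_rng(0)
n, l, p = 50, 4, 2
X = rng.normal(size=(n, l))          # included regressors
Z = rng.normal(size=(n, p))          # relevant regressors omitted by mistake
beta, gamma = np.ones(l), 0.5 * np.ones(p)
y = X @ beta + Z @ gamma + rng.normal(size=n)   # true model (2.1)

# Spectral decomposition X'X = T Lambda T' and canonical regressors U = XT
lam, T = np.linalg.eigh(X.T @ X)
U = X @ T                            # U'U = Lambda (diagonal)

# OLSE of the misspecified canonical model (2.4): alpha_hat = Lambda^{-1} U'y
alpha_hat = (U.T @ y) / lam

# Omitted-variable term Lambda^{-1} U'Z gamma that shifts E(alpha_hat) in (2.7)
shift = (U.T @ (Z @ gamma)) / lam
```

Because U′U is diagonal, the OLSE reduces to an elementwise division; the `shift` vector is the bias contribution of the excluded regressors that appears in (2.7).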
3. Modified Biased Estimators, Predictors and Their Generalized Form
To combat multicollinearity, several researchers have introduced different types of biased estimators in place of the OLSE. Seven such estimators, namely the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator, are given below respectively:

β̂_RE = (X′X + kI)⁻¹X′y (3.1)

β̂_AURE = (I − k²(X′X + kI)⁻²)β̂ (3.2)

β̂_LE = (X′X + I)⁻¹(X′X + dI)β̂ (3.3)

β̂_AULE = (I − (1 − d)²(X′X + I)⁻²)β̂ (3.4)

β̂_PCRE = T_r(T_r′X′XT_r)⁻¹T_r′X′y (3.5)

β̂_rk = T_r(T_r′X′XT_r + kI)⁻¹T_r′X′y (3.6)

β̂_rd = T_r(T_r′X′XT_r + I)⁻¹(T_r′X′XT_r + dI)(T_r′X′XT_r)⁻¹T_r′X′y (3.7)

where k > 0, 0 < d < 1 and β̂ = (X′X)⁻¹X′y is the OLS estimator of β.
Further, Xu and Yang [18] showed that Equations (3.5)-(3.7) could be rewritten as

β̂_PCRE = T_rT_r′β̂ (3.8)

β̂_rk = T_rT_r′β̂_RE (3.9)

β̂_rd = T_rT_r′β̂_LE (3.10)
In the case of misspecification, the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator for the model (2.4) can be written in canonical form as

α̂_RE = W_kα̂ (3.11)

α̂_AURE = A_kα̂ (3.12)

α̂_LE = F_dα̂ (3.13)

α̂_AULE = D_dα̂ (3.14)

α̂_PCRE = H_rα̂ (3.15)

α̂_rk = H_rW_kα̂ (3.16)

α̂_rd = H_rF_dα̂ (3.17)

respectively,

where W_k = (Λ + kI)⁻¹Λ, A_k = I − k²(Λ + kI)⁻², F_d = (Λ + I)⁻¹(Λ + dI), D_d = I − (1 − d)²(Λ + I)⁻², H_r = T_rT_r′, k > 0 and 0 < d < 1.
It is clear that W_k and F_d are positive definite, and H_r, H_rW_k and H_rF_d are non-negative definite.

Now consider

A_k = I − k²(Λ + kI)⁻² = diag{(λᵢ² + 2kλᵢ)/(λᵢ + k)²}

and

D_d = I − (1 − d)²(Λ + I)⁻² = diag{((λᵢ + 1)² − (1 − d)²)/(λᵢ + 1)²}

Since λᵢ > 0 and 0 < d < 1, every diagonal element is positive. Hence, A_k and D_d are also positive definite.
Since the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator are all based on the OLS estimator α̂, we can use the following generalized form:

α̂_G = Gα̂ (3.18)

where G is a positive definite matrix if it stands for W_k, A_k, F_d or D_d, and a non-negative definite matrix if it stands for H_r, H_rW_k or H_rF_d.
The expectation vector, bias vector, dispersion matrix and mean square error matrix of α̂_G can be calculated as

E(α̂_G) = G(α + Λ⁻¹U′Zγ) (3.19)

Bias(α̂_G) = E(α̂_G) − α = (G − I)α + GΛ⁻¹U′Zγ (3.20)

D(α̂_G) = σ²GΛ⁻¹G′ and MSEM(α̂_G) = σ²GΛ⁻¹G′ + Bias(α̂_G)Bias(α̂_G)′ (3.21)

Based on (3.19)-(3.21), the respective expectation vector, bias vector and dispersion matrix of the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator can easily be obtained; they are given in Table A1 in the Appendix.
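For concreteness, the G matrices and the resulting MSEM of (3.19)-(3.21) can be sketched as follows; the eigenvalues, α, σ² and the omitted-variable term η = Λ⁻¹U′Zγ are made-up values, and the names W_k, F_d, A_k, D_d and H_r follow the definitions above:

```python
import numpy as np

lam = np.array([10.0, 2.0, 0.5, 0.05])   # made-up eigenvalues of X'X
I = np.eye(4)
k, d, r = 0.5, 0.5, 2

W_k = np.diag(lam / (lam + k))                        # RE: (Lambda + kI)^-1 Lambda
A_k = I - k**2 * np.diag(1.0 / (lam + k)**2)          # AURE
F_d = np.diag((lam + d) / (lam + 1))                  # LE: (Lambda + I)^-1 (Lambda + dI)
D_d = I - (1 - d)**2 * np.diag(1.0 / (lam + 1)**2)    # AULE
H_r = np.diag((lam >= np.sort(lam)[::-1][r - 1]).astype(float))  # keeps r largest

def msem(G, alpha, eta, sigma2):
    """MSEM (3.21): sigma^2 G Lambda^-1 G' + bias bias', bias as in (3.20)."""
    bias = (G - I) @ alpha + G @ eta
    return sigma2 * G @ np.diag(1.0 / lam) @ G.T + np.outer(bias, bias)

alpha, eta, sigma2 = np.ones(4), 0.1 * np.ones(4), 1.0
for name, G in [("RE", W_k), ("AURE", A_k), ("LE", F_d), ("AULE", D_d),
                ("PCRE", H_r), ("r-k", H_r @ W_k), ("r-d", H_r @ F_d)]:
    print(name, np.trace(msem(G, alpha, eta, sigma2)))   # SMSE = tr(MSEM)
```

In the canonical frame every G is diagonal, so the seven estimators differ only in how each component of α̂ is shrunk (or, for the PCRE family, dropped).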
By using the approach of Kadiyala [7] and Equations ((2.3) and (2.4)), the generalized prediction function can be defined as follows:

ŷ_G = Uα̂_G (3.22)

y = Uα + Zγ + u (3.23)

where y is the actual value and ŷ_G is the corresponding predictor.

The MSEM of the generalized predictor is given by

MSEM(ŷ_G) = E[(ŷ_G − y)(ŷ_G − y)′] (3.24)

Note that the predictors based on the OLSE, RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator are denoted by ŷ_OLSE, ŷ_RE, ŷ_AURE, ŷ_LE, ŷ_AULE, ŷ_PCRE, ŷ_rk and ŷ_rd, respectively.
4. Mean Square Error Comparisons
4.1. Mean Square Error Matrix (MSEM) Comparison of Generalized Estimators
If two generalized biased estimators α̂_G₁ = G₁α̂ and α̂_G₂ = G₂α̂ are given, the estimator α̂_G₂ is said to be superior to α̂_G₁ with respect to the MSEM sense if and only if MSEM(α̂_G₁) − MSEM(α̂_G₂) ≥ 0.
Let us consider the difference

MSEM(α̂_G₁) − MSEM(α̂_G₂) = σ²(G₁Λ⁻¹G₁′ − G₂Λ⁻¹G₂′) + b₁b₁′ − b₂b₂′

Now let D = G₁Λ⁻¹G₁′ − G₂Λ⁻¹G₂′, b₁ = Bias(α̂_G₁) and b₂ = Bias(α̂_G₂); then the above difference can be written as

MSEM(α̂_G₁) − MSEM(α̂_G₂) = σ²D + b₁b₁′ − b₂b₂′ (4.1)
The following theorem can be stated for the superiority of α̂_G₂ over α̂_G₁ with respect to the MSEM criterion.
Theorem 1: If G₁Λ⁻¹G₁′ is positive definite, α̂_G₂ is superior to α̂_G₁ in the MSEM sense when the regression model is misspecified due to excluding relevant variables if and only if λ₁ ≤ 1 and b₂′(σ²D + b₁b₁′)⁻¹b₂ ≤ 1, where λ₁ is the largest eigenvalue of G₂Λ⁻¹G₂′(G₁Λ⁻¹G₁′)⁻¹, D = G₁Λ⁻¹G₁′ − G₂Λ⁻¹G₂′, b₁ = Bias(α̂_G₁) and b₂ = Bias(α̂_G₂).
Proof: Assume that G₁Λ⁻¹G₁′ is positive definite, which implies that (G₁Λ⁻¹G₁′)⁻¹ exists.

Due to Lemma 3 (see Appendix), D is non-negative definite if λ₁ ≤ 1, where λ₁ is the largest eigenvalue of G₂Λ⁻¹G₂′(G₁Λ⁻¹G₁′)⁻¹.

Hence, according to Lemma 2 (see Appendix), MSEM(α̂_G₁) − MSEM(α̂_G₂) is non-negative definite if and only if b₂′(σ²D + b₁b₁′)⁻¹b₂ ≤ 1, which completes the proof.
4.2. Scalar Mean Square Error (SMSE) Comparison of Generalized Estimators
If two generalized biased estimators α̂_G₁ and α̂_G₂ are given, the estimator α̂_G₂ is said to be superior to α̂_G₁ with respect to the SMSE sense if and only if SMSE(α̂_G₁) − SMSE(α̂_G₂) = tr(MSEM(α̂_G₁)) − tr(MSEM(α̂_G₂)) ≥ 0.
The following theorem can be stated for the superiority of α̂_G₂ over α̂_G₁ with respect to the SMSE criterion.
Theorem 2: α̂_G₂ is superior to α̂_G₁ when the regression model is misspecified due to excluding relevant variables with respect to the SMSE sense if and only if

b₂′b₂ ≤ σ²tr(D) + b₁′b₁

where D = G₁Λ⁻¹G₁′ − G₂Λ⁻¹G₂′, b₁ = Bias(α̂_G₁) and b₂ = Bias(α̂_G₂).
Proof: Let us consider

SMSE(α̂_G₁) − SMSE(α̂_G₂) = tr(MSEM(α̂_G₁) − MSEM(α̂_G₂))

Using (4.1) we can write

SMSE(α̂_G₁) − SMSE(α̂_G₂) = σ²tr(D) + b₁′b₁ − b₂′b₂

Then α̂_G₂ is superior to α̂_G₁ if SMSE(α̂_G₁) − SMSE(α̂_G₂) ≥ 0.

SMSE(α̂_G₁) − SMSE(α̂_G₂) ≥ 0 if and only if

b₂′b₂ ≤ σ²tr(D) + b₁′b₁

which completes the proof.
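The trace identity underlying Theorem 2 can be checked numerically. The sketch below uses made-up values of Λ, α, σ² and the omitted-variable term η, with G₁ the OLSE (G = I) and G₂ a ridge-type matrix:

```python
import numpy as np

lam = np.array([5.0, 1.0, 0.2])           # made-up eigenvalues
Linv = np.diag(1.0 / lam)
sigma2 = 1.0
alpha = np.array([1.0, -0.5, 0.3])
eta = np.array([0.2, 0.1, -0.1])          # stands in for Lambda^-1 U'Z gamma

def bias_and_msem(G):
    b = (G - np.eye(3)) @ alpha + G @ eta               # (3.20)
    return b, sigma2 * G @ Linv @ G.T + np.outer(b, b)  # (3.21)

G1 = np.eye(3)                             # OLSE
G2 = np.diag(lam / (lam + 0.5))            # RE with k = 0.5
b1, M1 = bias_and_msem(G1)
b2, M2 = bias_and_msem(G2)

# SMSE difference (Theorem 2): tr(M1 - M2) = sigma^2 tr(D) + b1'b1 - b2'b2
D = G1 @ Linv @ G1.T - G2 @ Linv @ G2.T
lhs = np.trace(M1 - M2)
rhs = sigma2 * np.trace(D) + b1 @ b1 - b2 @ b2
```

The two sides agree to machine precision, and the superiority condition b₂′b₂ ≤ σ²tr(D) + b₁′b₁ is exactly the statement lhs ≥ 0.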
4.3. Mean Square Error Matrix (MSEM) Comparison of Generalized Predictors
If two generalized predictors ŷ_G₁ = Uα̂_G₁ and ŷ_G₂ = Uα̂_G₂ are given, the predictor ŷ_G₂ is said to be superior to ŷ_G₁ with respect to the MSEM sense if and only if MSEM(ŷ_G₁) − MSEM(ŷ_G₂) ≥ 0.
Let us consider the difference MSEM(ŷ_G₁) − MSEM(ŷ_G₂). The following theorem can be stated for the superiority of ŷ_G₂ over ŷ_G₁ with respect to the MSEM criterion.
Theorem 3: ŷ_G₂ is superior to ŷ_G₁ in the MSEM sense when the regression model is misspecified due to excluding relevant variables if and only if B is non-negative definite, e₂ ∈ C(B) and e₂′B⁻e₂ ≤ 1, where B = σ²U(D − (G₁ − G₂)Λ⁻¹ − Λ⁻¹(G₁ − G₂)′)U′ + e₁e₁′, eⱼ = Ubⱼ − Zγ for j = 1, 2, C(B) stands for the column space of B and e₂′B⁻e₂ is independent of the choice of the g-inverse B⁻ of B.
Proof: Using (4.1), the MSEM difference of the two generalized predictors can be written as

MSEM(ŷ_G₁) − MSEM(ŷ_G₂) = σ²U(D − (G₁ − G₂)Λ⁻¹ − Λ⁻¹(G₁ − G₂)′)U′ + e₁e₁′ − e₂e₂′ (4.2)

After some straightforward calculation, Equation (4.2) can be written as

MSEM(ŷ_G₁) − MSEM(ŷ_G₂) = B − e₂e₂′

where B = σ²U(D − (G₁ − G₂)Λ⁻¹ − Λ⁻¹(G₁ − G₂)′)U′ + e₁e₁′ and eⱼ = Ubⱼ − Zγ, j = 1, 2.

Due to Lemma 1 (see Appendix), B − e₂e₂′ is a non-negative definite matrix if and only if B is non-negative definite, e₂ ∈ C(B) and e₂′B⁻e₂ ≤ 1, where C(B) stands for the column space of B and e₂′B⁻e₂ is independent of the choice of the g-inverse B⁻ of B, which completes the proof.
Note that the conditions derived under Theorem 1 are obviously sufficient for MSEM(α̂_G₁) − MSEM(α̂_G₂) ≥ 0, but they do not by themselves guarantee that (4.2) is non-negative definite. Consequently, we may say that there are situations where α̂_G₂ is superior to α̂_G₁ in the MSEM sense while the corresponding predictor ŷ_G₂ is not superior to ŷ_G₁.
4.4. Scalar Mean Square Error (SMSE) Comparison of Generalized Predictors
Using (4.2), the SMSE difference of the two generalized predictors can be written as

SMSE(ŷ_G₁) − SMSE(ŷ_G₂) = σ²tr(U(D − (G₁ − G₂)Λ⁻¹ − Λ⁻¹(G₁ − G₂)′)U′) + e₁′e₁ − e₂′e₂

The following theorem can be stated for the superiority of ŷ_G₂ over ŷ_G₁ with respect to the SMSE criterion.
Theorem 4: ŷ_G₂ is superior to ŷ_G₁ in the SMSE sense when the regression model is misspecified due to excluding relevant variables if and only if

e₂′e₂ ≤ σ²tr(U(D − (G₁ − G₂)Λ⁻¹ − Λ⁻¹(G₁ − G₂)′)U′) + e₁′e₁
Proof: ŷ_G₂ is superior to ŷ_G₁ if SMSE(ŷ_G₁) − SMSE(ŷ_G₂) ≥ 0.

SMSE(ŷ_G₁) − SMSE(ŷ_G₂) ≥ 0 if and only if

e₂′e₂ ≤ σ²tr(U(D − (G₁ − G₂)Λ⁻¹ − Λ⁻¹(G₁ − G₂)′)U′) + e₁′e₁

which completes the proof.
Based on Theorem 1, Theorem 2, Theorem 3 and Theorem 4, we can obtain the corresponding results for each pair of biased estimators and respective predictors by substituting the relevant matrices W_k, A_k, F_d, D_d, H_r, H_rW_k and H_rF_d for G₁ and G₂. The results are summarized in Tables A2-A6 in the Appendix.
5. Illustration of Theoretical Results
5.1. Numerical Example
To illustrate our theoretical results, we consider the dataset on Total National Research and Development Expenditures as a Percent of Gross National Product by Country: 1972-1986. It represents the relationship between the dependent variable Y, the percentage spent by the United States, and the four independent variables X₁, X₂, X₃ and X₄, where X₁ represents the percentage spent by the former Soviet Union, X₂ that spent by France, X₃ that spent by West Germany and X₄ that spent by Japan. The data were discussed in Gruber [19] and have been analysed by Akdeniz and Erol [20] and Li and Yang [21] , among others. Now we assemble the data as follows:
(The observation matrix X and the response vector y are as given in Gruber [19] .)
Note that the eigenvalues of X′X are 312.932, 0.754, 0.045, 0.037 and 0.002, the condition number is 299, and the Variance Inflation Factor (VIF) values are 6.91, 21.58, 29.75 and 1.79. Since the condition number is greater than 100 and the first three VIF values are greater than 5, serious multicollinearity exists in the data set.
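The diagnostics quoted above can be computed as in the following sketch; the function names are ours, and since the actual dataset is given in Gruber [19], a synthetic collinear matrix stands in for it here:

```python
import numpy as np

def condition_number(X):
    """sqrt(lambda_max / lambda_min) of X'X."""
    lam = np.linalg.eigvalsh(X.T @ X)   # ascending order
    return float(np.sqrt(lam[-1] / lam[0]))

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    n, m = X.shape
    out = []
    for j in range(m):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(float(tss / (resid @ resid)))   # = 1 / (1 - R^2)
    return np.array(out)

# Synthetic example: the third column is nearly a copy of the first,
# so its VIF and the condition number are both large.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=30)
```

A column that is independent of the others keeps a VIF near 1, while the near-duplicate pair drives both its VIF and the condition number far above the usual cut-offs.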
After the standardization of the data, the corresponding OLS estimate α̂ was computed, and the estimate of σ² was obtained for the standardized data (since there are ten observations and four parameters).
Table 1 shows the estimated SMSE values of the OLSE, RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator for the regression model with respect to the shrinkage parameters (k/d) under different levels of misspecification, where m denotes the number of variables in the model and p denotes the number of misspecified variables. Table 2 shows the corresponding estimated SMSE values of the predictors. For simplicity, we choose shrinkage parameter values k and d in the range (0, 1).

Table 1. Estimated SMSE values of the estimators.

From Table 1, it can be observed that the minimum SMSE of the estimators depends on the values of the shrinkage parameters and the level of misspecification, which agrees with our theoretical findings.

Table 2. Estimated SMSE values of the predictors.

According to Table 2, it can be observed that the predictors behave differently from the respective estimators, which also agrees with our theoretical findings.
5.2. Monte Carlo Simulation Study
For further clarification, a Monte Carlo simulation study is carried out under different levels of misspecification using R 3.2.5. Following McDonald and Galarneau [22] , the explanatory variables are generated as

x_ij = (1 − ρ²)^(1/2) z_ij + ρ z_i,m+1,  i = 1, 2, …, n; j = 1, 2, …, m,

where z_ij are independent standard normal pseudo-random numbers and ρ is specified so that the theoretical correlation between any two explanatory variables is given by ρ². A dependent variable is generated by using the equation

y_i = β₁x_i1 + β₂x_i2 + ⋯ + β_m x_im + ε_i,  i = 1, 2, …, n,

where ε_i is a normal pseudo-random number with mean zero and variance one. In this study, we choose β as the normalized eigenvector corresponding to the largest eigenvalue of X′X, for which β′β = 1. We consider the following set-up to investigate the effects of different degrees of multicollinearity on the estimators:
ρ = 0.9: condition number = 6.06 and VIF = (4.84, 4.83, 4.82, 4.81, 4.87) (weak multicollinearity)

ρ = 0.99: condition number = 20.12 and VIF = (46.09, 46.12, 46.02, 45.97, 46.56) (moderate multicollinearity)

ρ = 0.999: condition number = 64 and VIF = (458.3, 459.2, 458.1, 457.8, 463.4) (high multicollinearity)
Three different sets of observations are considered by selecting n = 50, 75 and 100 when m = 5, where m denotes the number of variables in the model and p denotes the number of misspecified variables. For simplicity, we select values of k and d in the range (0, 1).
The simulation is repeated 2000 times by generating new pseudo-random numbers, and the simulated SMSE values of the estimators and predictors are obtained using the following equations:

SMSE(α̂_G) = (1/2000) Σ_{t=1}^{2000} (α̂_G^(t) − α)′(α̂_G^(t) − α)

and

SMSE(ŷ_G) = (1/2000) Σ_{t=1}^{2000} (ŷ_G^(t) − y^(t))′(ŷ_G^(t) − y^(t)),

respectively.
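A minimal version of this simulation can be sketched as follows; it is written in Python rather than R, uses a ridge estimator standing in for the seven estimators, and uses fewer repetitions than the paper's 2000:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, rho, reps, k = 50, 5, 0.9, 500, 0.5

# McDonald-Galarneau design: corr(x_j, x_j') = rho^2
Zm = rng.normal(size=(n, m + 1))
X = np.sqrt(1 - rho**2) * Zm[:, :m] + rho * Zm[:, [m]]

# beta: normalized eigenvector of the largest eigenvalue of X'X (beta'beta = 1)
_, vecs = np.linalg.eigh(X.T @ X)
beta = vecs[:, -1]

XtX = X.T @ X
sse_ols = sse_ridge = 0.0
for _ in range(reps):
    y = X @ beta + rng.normal(size=n)               # fresh errors each repetition
    b_ols = np.linalg.solve(XtX, X.T @ y)
    b_ridge = np.linalg.solve(XtX + k * np.eye(m), X.T @ y)
    sse_ols += np.sum((b_ols - beta) ** 2)
    sse_ridge += np.sum((b_ridge - beta) ** 2)

smse_ols, smse_ridge = sse_ols / reps, sse_ridge / reps
```

Holding X fixed across repetitions and averaging the squared estimation error mirrors the simulated-SMSE formula above; the same loop can accumulate predictor errors (ŷ − y) instead.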
Tables 3-5 show the estimated SMSE values of the estimators for the selected values of the shrinkage parameters (k/d) under weak, moderate and high multicollinearity, respectively. Tables 6-8 show the corresponding estimated SMSE values of the predictors for the regression model, respectively.
From Tables 3-8, we can summarise the results as shown in Table 9.
6. Conclusions
In this study, a common form of superiority conditions was obtained for comparisons among the biased estimators (RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator) and their predictors by using a generalized form for the misspecified linear regression model when multicollinearity exists among the explanatory variables. Furthermore, the theoretical findings were illustrated by using a numerical example and a Monte Carlo simulation study.
The simulation study shows that the LE and RE outperform the other estimators when weak multicollinearity exists, and that the RE, r-k class and r-d class estimators outperform the other estimators when moderate and high multicollinearity exist, for selected values of the shrinkage parameters, respectively. It can also be noted that the predictors based on the LE and RE are always superior to the other predictors for selected values of the shrinkage parameters when multicollinearity exists among the explanatory variables.

Table 3. Estimated SMSE values of the estimators under weak multicollinearity.

According to Table 3, it can be observed that the LE and RE are superior to the other estimators for certain ranges of the shrinkage parameters under different levels of misspecification when weak multicollinearity exists.

Table 4. Estimated SMSE values of the estimators under moderate multicollinearity.

According to Table 4, it can be observed that the RE, r-k class and r-d class estimators are superior to the other estimators for certain ranges of the shrinkage parameters under different levels of misspecification when moderate multicollinearity exists.

Table 5. Estimated SMSE values of the estimators under high multicollinearity.

According to Table 5, it can be observed that the best-performing estimator among the RE, r-k class and r-d class estimators depends on the values of the shrinkage parameters and the level of misspecification when high multicollinearity exists.

Table 6. Estimated SMSE values of the predictors under weak multicollinearity.

Table 7. Estimated SMSE values of the predictors under moderate multicollinearity.

Table 8. Estimated SMSE values of the predictors under high multicollinearity.

According to Tables 6-8, it can be observed that the predictors based on the LE and RE are superior to the other predictors for certain ranges of the shrinkage parameters under different levels of misspecification.

Table 9. Shrinkage parameter ranges for superior estimators and predictors.
One limitation of this study is the assumption that the error variance is the same for all models, even when relevant variables are omitted from the model.
Cite this paper
Kayanan, M. and Wijekoon, P. (2017) Performance of Existing Biased Estimators and the Respective Predictors in a Misspecified Linear Regression Model. Open Journal of Statistics, 7, 876-900. https://doi.org/10.4236/ojs.2017.75062
References
- 1. Sarkar, N. (1989) Comparisons among Some Estimators in Misspecified Linear Models with Multicollinearity. Annals of the Institute of Statistical Mathematics, 41, 717-724. https://doi.org/10.1007/BF00057737
- 2. Şiray, G.Ü. (2015) r-d Class Estimator under Misspecification. Communications in Statistics—Theory and Methods, 44, 4742-4756. https://doi.org/10.1080/03610926.2013.835421
- 3. Wu, J. (2016) Superiority of the r-k Class Estimator over Some Estimators in a Misspecified Linear Model. Communication in Statistics—Theory and Methods, 45, 1453-1458. https://doi.org/10.1080/03610926.2013.863934
- 4. Teräsvirta, T. (1980) Linear Restrictions in Misspecified Linear Models and Polynomial Distributed Lag Estimation. Department of Statistics, University of Helsinki, Helsinki.
- 5. Mittelhammer, R.C. (1981) On Specification Error in the General Linear Model and Weak Mean Square Error Superiority of the Mixed Estimator. Communications in Statistics—Theory and Methods, 167-176. https://doi.org/10.1080/03610928108828027
- 6. Ohtani, K. and Honda, Y. (1984) On Small Sample Properties of the Mixed Regression Predictor under Misspecification. Communications in Statistics—Theory and Methods, 2817-2825. https://doi.org/10.1080/03610928408828863
- 7. Kadiyala, K. (1986) Mixed Regression Estimator under Misspecification. Economics Letters, 21, 27-30. https://doi.org/10.1016/0165-1765(86)90115-1
- 8. Trenkler, G. and Wijekoon, P. (1989) Mean Square Error Matrix Superiority of the Mixed Regression Estimator under Misspecification. Statistica, 49, 65-71.
- 9. Wijekoon, P. and Trenkler, G. (1989) Mean Square Error Matrix Superiority of Estimators under Linear Restrictions and Misspecification. Economics Letters, 30, 141-149. https://doi.org/10.1016/0165-1765(89)90052-9
- 10. Hubert, M. and Wijekoon, P. (2004) Superiority of the Stochastic Restricted Liu Estimator under Misspecification. Statistica, 64, 153-162.
- 11. Massy, W.F. (1965) Principal Components Regression in Exploratory Statistical Research. Journal of the American Statistical Association, 60, 234-266. https://doi.org/10.1080/01621459.1965.10480787
- 12. Hoerl, A. and Kennard, R. (1970) Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12, 55-67. https://doi.org/10.1080/00401706.1970.10488634
- 13. Baye, R. and Parker, F. (1984) Combining Ridge and Principal Component Regression: A Money Demand Illustration. Communications in Statistics—Theory and Methods, 13, 197-205. https://doi.org/10.1080/03610928408828675
- 14. Singh, B., Chaubey, Y.P. and Dwivedi, T.D. (1986) An Almost Unbiased Ridge Estimator. The Indian Journal of Statistics, 48, 342-346.
- 15. Liu, K. (1993) A New Class of Biased Estimate in Linear Regression. Communications in Statistics—Theory and Methods, 22, 393-402. https://doi.org/10.1080/03610929308831027
- 16. Akdeniz, F. and Kaçıranlar, S. (1995) On the Almost Unbiased Generalized Liu Estimator and Unbiased Estimation of the Bias and MSE. Communications in Statistics—Theory and Methods, 24, 1789-1797. https://doi.org/10.1080/03610929508831585
- 17. Kaçıranlar, S. and Sakallıoğlu, S. (2001) Combining the Liu Estimator and the Principal Component Regression Estimator. Communications in Statistics—Theory and Methods, 30, 2699-2705. https://doi.org/10.1081/STA-100108454
- 18. Xu, W. and Yang, H. (2011) On the Restricted r-k Class Estimator and the Restricted r-d Class Estimator in Linear Regression. Journal of Statistical Computation and Simulation, 81, 679-691. https://doi.org/10.1080/00949650903471023
- 19. Gruber, M. (1998) Improving Efficiency by Shrinkage: The James-Stein and Ridge Regression Estimators. CRC Press, New York.
- 20. Akdeniz, F. and Erol, H. (2003) Mean Squared Error Matrix Comparisons of Some Biased Estimators in Linear Regression. Communications in Statistics—Theory and Methods, 32, 2389-2413. https://doi.org/10.1081/STA-120025385
- 21. Li, Y. and Yang, H. (2010) A New Stochastic Mixed Ridge Estimator in Linear Regression Model. Statistical Papers, 51, 315-323. https://doi.org/10.1007/s00362-008-0169-5
- 22. McDonald, G.C. and Galarneau, D.I. (1975) A Monte Carlo Evaluation of Some Ridge-Type Estimators. Journal of the American Statistical Association, 70, 407-416. https://doi.org/10.1080/01621459.1975.10479882
- 23. Baksalary, J. and Kala, R. (1983) Partial Orderings between Matrices One of Which Is of Rank One. Bulletin of the Polish Academy of Sciences Mathematics, 31, 5-7.
- 24. Trenkler, G. and Toutenburg, H. (1990) Mean Square Error Matrix Comparisons between Biased Estimators: An Overview of Recent Results. Statistical Papers, 31, 165-179. https://doi.org/10.1007/BF02924687
- 25. Wang, S., et al. (2006) Matrix Inequalities. 2nd Edition, Chinese Science Press, Beijing.
Appendix
Lemma 1: (Baksalary and Kala [23] )
Let B be an n×n non-negative definite matrix, b an n×1 vector and λ a positive real number. Then the following two conditions are equivalent:

i) λB − bb′ is non-negative definite;

ii) b ∈ C(B) and b′B⁻b ≤ λ, where C(B) stands for the column space of B and b′B⁻b is independent of the choice of the g-inverse B⁻ of B.
Lemma 2: (Trenkler and Toutenburg [24] )
Let β̂₁ and β̂₂ be two linear estimators of β. Suppose that D = D(β̂₁) − D(β̂₂) is positive definite; then MSEM(β̂₁) − MSEM(β̂₂) is non-negative definite if and only if b₂′(D + b₁b₁′)⁻¹b₂ ≤ 1, where D(β̂ⱼ), MSEM(β̂ⱼ) and bⱼ denote the dispersion matrix, mean square error matrix and bias vector of β̂ⱼ, j = 1, 2, respectively.
Lemma 3: (Wang et al. [25] )
Let A and B be n×n matrices with A positive definite and B non-negative definite. Then A − B is non-negative definite if and only if λ₁(BA⁻¹) ≤ 1, where λ₁(BA⁻¹) is the largest eigenvalue of the matrix BA⁻¹.
Table A1. Expectation vector, Bias vector and Dispersion matrix of the estimators.