Journal of Applied Mathematics and Physics
Vol. 07 No. 01 (2019), Article ID: 89889, 11 pages
DOI: 10.4236/jamp.2019.71008
Strong Consistency of Estimators under Missing Responses
Linran Zhang, Jingjing Zhang*
College of Science, University of Shanghai for Science and Technology, Shanghai, China
Copyright © 2019 by author(s) and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
Received: December 18, 2018; Accepted: January 12, 2019; Published: January 15, 2019
ABSTRACT
In this article, we focus on the semi-parametric error-in-variables model with missing responses: $y_i = \xi_i \beta + g(t_i) + \epsilon_i$, $x_i = \xi_i + \mu_i$, where $y_i$ are the response variables missing at random, $(\xi_i, t_i)$ are design points, $\xi_i$ are the potential variables observed with measurement errors $\mu_i$, and the unknown slope parameter $\beta$ and nonparametric component $g(\cdot)$ need to be estimated. We consider two different approaches to estimating $\beta$ and $g(\cdot)$ and, under appropriate conditions, study the strong consistency of the proposed estimators.
Keywords:
Semi-Parametric Model, Error-in-Variables, Missing Responses, Strong Consistency
1. Introduction
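Consider the following semi-parametric error-in-variables (EV) model
$$y_i = \xi_i \beta + g(t_i) + \epsilon_i, \quad x_i = \xi_i + \mu_i, \quad 1 \le i \le n, \tag{1.1}$$
where $y_i$ are the response variables, $(\xi_i, t_i)$ are design points, $\xi_i$ are the potential variables observed with measurement errors $\mu_i$, $E\mu_i = 0$, and $\epsilon_i$ are random errors with $E\epsilon_i = 0$. $\beta \in \mathbb{R}$ is an unknown parameter that needs to be estimated, $g(\cdot)$ is an unknown function defined on the closed interval $[0, 1]$, and $h(\cdot)$ is a known function defined on $[0, 1]$ satisfying
$$\xi_i = h(t_i) + v_i, \tag{1.2}$$
where $v_i$ are also design points.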
Model (1.1) and its special cases have attracted much attention in recent years. When $\mu_i \equiv 0$, so that $\xi_i$ are observed exactly, model (1.1) reduces to the general semi-parametric model, which was first introduced by Engle et al. [1]. In many applications, however, the covariates are measured with error, so EV models are often more practical than the ordinary regression model. In addition, when $y_i$ are completely observed and $g(\cdot) \equiv 0$, model (1.1) reduces to the usual linear EV model, which has been studied by Liu and Chen [2], Miao et al. [3], Miao and Liu [4], Fan et al. [5] and others. For complete data, model (1.1) itself has also been studied by many authors; see Cui and Li [6], Liang et al. [7], Zhou et al. [8] and others.
On the other hand, we often encounter incomplete data in practical applications of these models. In particular, some response variables may be missing, by design or by happenstance. For example, the responses may be very expensive to measure, so that only part of the $y_i$ are available. Missing responses are in fact very common in opinion polls, socio-economic investigations, market research surveys and so on. Therefore, we focus on the case where missing data occur only in the response variables. When $\xi_i$ can be fully observed, model (1.1) reduces to the usual semi-parametric model with missing responses, which has been studied by many scholars; see Wang et al. [9], Wang and Sun [10], Bianco et al. [11].
To deal with missing data, one method is to impute a plausible value for each missing datum and then analyze the results as if they were complete. In regression problems, common imputation approaches include linear regression imputation by Healy and Westmacott [12], nonparametric kernel regression imputation by Cheng [13], and semi-parametric regression imputation by Wang et al. [9] and Wang and Sun [10], among others. We here extend these methods to the estimation of $\beta$ and $g(\cdot)$ under the semi-parametric EV model (1.1). We obtain two approaches to estimating $\beta$ and $g(\cdot)$ with missing responses and study the strong consistency of the estimators.
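In this paper, suppose we obtain a random sample of incomplete data $\{(y_i, \delta_i, x_i, t_i), 1 \le i \le n\}$ from the model (1.1), where $\delta_i = 0$ if $y_i$ is missing, otherwise $\delta_i = 1$. Throughout this paper, we assume that $y_i$ is missing at random. The assumption implies that $\delta_i$ and $y_i$ are conditionally independent given $\xi_i$ and $t_i$. That is, $P(\delta_i = 1 \mid y_i, \xi_i, t_i) = P(\delta_i = 1 \mid \xi_i, t_i)$. This is a common assumption for statistical analysis with missing data and is reasonable in many practical situations.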
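The sampling scheme just described can be illustrated with a small simulation. This is only a sketch under assumptions of our own: the model is taken in the form $y_i = \xi_i \beta + g(t_i) + \epsilon_i$, $x_i = \xi_i + \mu_i$, $\xi_i = h(t_i) + v_i$, and the functions $g(t) = \sin(2\pi t)$, $h(t) = t$, the error distributions, the response probability and the function name `simulate_incomplete_sample` are all hypothetical choices, not taken from the paper.

```python
import math
import random


def simulate_incomplete_sample(n, beta=2.0, sigma_mu=0.3, sigma_eps=0.5, seed=0):
    """Draw one incomplete sample {(y_i, delta_i, x_i, t_i)} of size n.

    Every concrete choice below (g, h, error laws, response probability)
    is a hypothetical one made for illustration only.
    """
    rng = random.Random(seed)
    sample = []
    for i in range(1, n + 1):
        t = (i - 0.5) / n                      # design point t_i in (0, 1)
        xi = t + rng.uniform(-1.0, 1.0)        # xi_i = h(t_i) + v_i with h(t) = t
        x = xi + rng.gauss(0.0, sigma_mu)      # surrogate x_i = xi_i + mu_i
        y = xi * beta + math.sin(2 * math.pi * t) + rng.gauss(0.0, sigma_eps)
        p_observed = 0.6 + 0.3 * t             # depends on t_i only, never on y_i
        delta = 1 if rng.random() < p_observed else 0
        # A missing response is stored as None; x_i and t_i are always kept.
        sample.append((y if delta == 1 else None, delta, x, t))
    return sample


sample = simulate_incomplete_sample(200)
observed_fraction = sum(delta for _, delta, _, _ in sample) / len(sample)
print(f"observed responses: {observed_fraction:.0%}")
```

Because the probability of observing $y_i$ is a function of the design point alone, the indicator carries no information about the value of the missing response, which is the property the missing-at-random assumption requires.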
The paper is organized as follows. In Section 2, we list some assumptions. The main results are given in Section 3. Some preliminary lemmas are stated in Section 4. Proofs of the main results are provided in Section 5.
2. Assumptions
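In this section, we list some assumptions which will be used in the main results. Here $a_n = O(b_n)$ means $|a_n| \le C|b_n|$ for every $n$ and some constant $C > 0$, $a_n = o(b_n)$ means $a_n / b_n \to 0$ as $n \to \infty$, while a.s. stands for almost surely.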
(A0) Let , and be sequences of independent random variables satisfying
i) , , , is known.
ii) , for some .
iii) , , are independent of each other.
(A1) Let $v_i$ in (1.2) be a sequence satisfying
i) .
ii) , where is a permutation of .
iii) .
(A2) $g(\cdot)$ and $h(\cdot)$ are continuous functions satisfying the first-order Lipschitz condition on the closed interval $[0, 1]$.
(A3) Let be weight functions defined on [0, 1] and satisfy
i) a.s.
ii) a.s. for any .
iii) a.s.
(A4) The probability weight functions are defined on and satisfy
i) .
ii) , for any .
iii) .
Remark 2.1. Conditions (A0)-(A4) are standard regularity conditions that are commonly used in the literature; see Härdle et al. [14], Gao et al. [15] and Chen [16].
3. Main Results
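For model (1.1), we seek estimators of $\beta$ and $g(\cdot)$. The most natural idea is to delete all the missing data, which gives the model $\delta_i y_i = \delta_i \xi_i \beta + \delta_i g(t_i) + \delta_i \epsilon_i$. If $\xi_i$ could be observed, we could apply the least squares method to estimate the parameter $\beta$. If the parameter $\beta$ were known, then using the complete data $\{(\delta_i y_i, \delta_i x_i, \delta_i t_i), 1 \le i \le n\}$ we could define the estimator of $g(\cdot)$ to be
$$g_n^*(t, \beta) = \sum_{j=1}^{n} \delta_j W_{nj}^c(t) (y_j - x_j \beta),$$
where $W_{nj}^c(t)$ are weight functions satisfying (A3). On the other hand, for the semi-parametric EV model, Liang et al. [7] modified the least squares estimator (LSE) of the usual partially linear model to account for the measurement errors. Following their approach, we estimate $\beta$ by minimizing
$$SS(\beta) = \sum_{i=1}^{n} \delta_i \left\{ \left[ y_i - x_i \beta - g_n^*(t_i, \beta) \right]^2 - \Xi_\mu^2 \beta^2 \right\},$$
where $\Xi_\mu^2 = E\mu_i^2$ is the known variance of the measurement errors in (A0). This yields the modified LSE of $\beta$:
$$\hat{\beta}_c = \left[ \sum_{i=1}^{n} \delta_i \left( (\tilde{x}_i^c)^2 - \Xi_\mu^2 \right) \right]^{-1} \sum_{i=1}^{n} \delta_i \tilde{x}_i^c \tilde{y}_i^c, \tag{3.1}$$
where $\tilde{x}_i^c = x_i - \sum_{j=1}^{n} \delta_j W_{nj}^c(t_i) x_j$, $\tilde{y}_i^c = y_i - \sum_{j=1}^{n} \delta_j W_{nj}^c(t_i) y_j$. Substituting (3.1) into $g_n^*(t, \beta)$ gives
$$\hat{g}_c(t) = \sum_{j=1}^{n} \delta_j W_{nj}^c(t) (y_j - x_j \hat{\beta}_c). \tag{3.2}$$
The estimators $\hat{\beta}_c$ and $\hat{g}_c(t)$ use only the complete cases and so do not take all the sample information into account. Hence, to make up for the missing data, we apply an imputation method from Wang and Sun [10] and let
$$U_i^I = \delta_i y_i + (1 - \delta_i) \left[ x_i \hat{\beta}_c + \hat{g}_c(t_i) \right]. \tag{3.3}$$
Then, using the completed data $\{(U_i^I, x_i, t_i), 1 \le i \le n\}$, similarly to (3.1)-(3.2), one obtains another pair of estimators for $\beta$ and $g(\cdot)$, namely
$$\hat{\beta}_I = \left[ \sum_{i=1}^{n} \left( \tilde{x}_i^2 - \delta_i \Xi_\mu^2 \right) \right]^{-1} \sum_{i=1}^{n} \tilde{x}_i \tilde{U}_i^I, \tag{3.4}$$
$$\hat{g}_I(t) = \sum_{j=1}^{n} W_{nj}(t) (U_j^I - x_j \hat{\beta}_I), \tag{3.5}$$
where $\tilde{U}_i^I = U_i^I - \sum_{j=1}^{n} W_{nj}(t_i) U_j^I$, $\tilde{x}_i = x_i - \sum_{j=1}^{n} W_{nj}(t_i) x_j$, and $W_{nj}(t)$ are weight functions satisfying (A4).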
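The correction term in the modified LSE can be motivated by a short calculation. The sketch below uses notation that is our own assumption: $\tilde{x}_i^c$ and $\tilde{y}_i^c$ denote the covariate and response centred by the complete-case weights, $\Xi_\mu^2$ denotes the known variance of $\mu_i$, and the criterion is the measurement-error-corrected least squares criterion in the spirit of Liang et al. [7], restricted to the observed cases. It is an illustration, not a restatement of the authors' derivation.

```latex
% Assumed notation: \tilde{x}_i^c, \tilde{y}_i^c are centred by the complete-case
% weights; \Xi_\mu^2 = E\mu_i^2 is the known measurement-error variance.
\begin{aligned}
SS(\beta) &= \sum_{i=1}^{n} \delta_i \left[ \left( \tilde{y}_i^c - \tilde{x}_i^c \beta \right)^2
             - \Xi_\mu^2 \beta^2 \right], \\
\frac{\partial SS(\beta)}{\partial \beta}
          &= -2 \sum_{i=1}^{n} \delta_i \tilde{x}_i^c \left( \tilde{y}_i^c - \tilde{x}_i^c \beta \right)
             - 2 \Xi_\mu^2 \beta \sum_{i=1}^{n} \delta_i = 0, \\
\hat{\beta} &= \left[ \sum_{i=1}^{n} \delta_i \left( (\tilde{x}_i^c)^2 - \Xi_\mu^2 \right) \right]^{-1}
             \sum_{i=1}^{n} \delta_i \tilde{x}_i^c \tilde{y}_i^c .
\end{aligned}
```

Heuristically, $(\tilde{x}_i^c)^2$ overstates the squared centred covariate by about $\Xi_\mu^2$ on average, because $x_i$ contains the measurement error $\mu_i$. Without the subtraction the denominator would be too large and the estimator would be biased toward zero.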
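Based on the estimators for $\beta$ and $g(\cdot)$, we have the following results.
Theorem 3.1 Suppose that (A0)-(A3) are satisfied. Then for every $t \in [0, 1]$, the estimators $\hat{\beta}_c$ in (3.1) and $\hat{g}_c(t)$ in (3.2) satisfy
a) $\hat{\beta}_c \to \beta$ a.s.
b) $\hat{g}_c(t) \to g(t)$ a.s.
Theorem 3.2 Suppose that (A0)-(A4) are satisfied. Then for every $t \in [0, 1]$, the estimators $\hat{\beta}_I$ in (3.4) and $\hat{g}_I(t)$ in (3.5) satisfy
a) $\hat{\beta}_I \to \beta$ a.s.
b) $\hat{g}_I(t) \to g(t)$ a.s.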
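The two estimation procedures of this section, complete-case estimation followed by imputation, can be sketched numerically as follows. The sketch rests on several assumptions of our own: the weights are Nadaraya-Watson weights with a Gaussian kernel and a fixed bandwidth (the paper only requires weights satisfying (A3) and (A4)), the denominators carry the measurement-error correction described above, and the simulated $g$, $h$, error laws and the function names `simulate`, `smooth` and `estimate` are hypothetical.

```python
import math
import random


def simulate(n, beta, sigma_mu, sigma_eps, rng):
    """Simulate t, x, y, delta from the assumed model (all choices hypothetical)."""
    t, x, y, delta = [], [], [], []
    for i in range(1, n + 1):
        ti = (i - 0.5) / n
        xi = ti + rng.uniform(-1.0, 1.0)              # xi_i = h(t_i) + v_i, h(t) = t
        yi = xi * beta + math.sin(2 * math.pi * ti) + rng.gauss(0.0, sigma_eps)
        di = 1 if rng.random() < 0.6 + 0.3 * ti else 0
        t.append(ti)
        x.append(xi + rng.gauss(0.0, sigma_mu))       # x_i = xi_i + mu_i
        y.append(yi if di else 0.0)                   # placeholder when y_i is missing
        delta.append(di)
    return t, x, y, delta


def smooth(kernel, values, mask):
    """Nadaraya-Watson fit at every t_i using only points j with mask[j] == 1."""
    fitted = []
    for row in kernel:
        num = sum(k * v for k, v, m in zip(row, values, mask) if m)
        den = sum(k for k, m in zip(row, mask) if m)
        fitted.append(num / den)
    return fitted


def estimate(t, x, y, delta, var_mu, bandwidth=0.05):
    """Return (beta_c, g_c, beta_i, g_i), with g_c and g_i evaluated at the t_i."""
    n = len(t)
    kernel = [[math.exp(-0.5 * ((ti - tj) / bandwidth) ** 2) for tj in t] for ti in t]
    everyone = [1] * n

    # Complete-case step: centre x and y with weights built from observed cases.
    xc = [a - s for a, s in zip(x, smooth(kernel, x, delta))]
    yc = [a - s for a, s in zip(y, smooth(kernel, y, delta))]
    beta_c = (sum(d * a * b for d, a, b in zip(delta, xc, yc))
              / sum(d * (a * a - var_mu) for d, a in zip(delta, xc)))
    g_c = smooth(kernel, [b - a * beta_c for a, b in zip(x, y)], delta)

    # Imputation step: fill each missing response with x_i * beta_c + g_c(t_i).
    u = [b if d else a * beta_c + g for a, b, d, g in zip(x, y, delta, g_c)]
    xt = [a - s for a, s in zip(x, smooth(kernel, x, everyone))]
    ut = [a - s for a, s in zip(u, smooth(kernel, u, everyone))]
    beta_i = (sum(a * b for a, b in zip(xt, ut))
              / sum(a * a - d * var_mu for a, d in zip(xt, delta)))
    g_i = smooth(kernel, [b - a * beta_i for a, b in zip(x, u)], everyone)
    return beta_c, g_c, beta_i, g_i


rng = random.Random(1)
t, x, y, delta = simulate(400, beta=2.0, sigma_mu=0.3, sigma_eps=0.5, rng=rng)
beta_c, g_c, beta_i, g_i = estimate(t, x, y, delta, var_mu=0.3 ** 2)
print(f"complete-case beta: {beta_c:.3f}, imputation beta: {beta_i:.3f}")
```

In the imputation step the variance correction is applied only to the cases with an observed response. An imputed response is built from $x_i$ itself, so for those cases the measurement error enters the numerator and the denominator in the same way and no correction is needed.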
4. Preliminary Lemmas
In the sequel, let be some finite positive constants, whose values are unimportant and may change. Now, we introduce several lemmas, which will be used in the proof of the main results.
Lemma 4.1 (Baek and Liang [17] , Lemma 3.1) Let , be independent random variables with . Assume that is a triangular array of numbers with and . If for some . Then
Lemma 4.2 (Härdle et al. [14] , Lemma A.3) Let be independent random variables with , finite variances and . Assume that is a sequence of numbers such that for some and for . Then
for .
Lemma 4.3
a) Let , where or . Let , where or . Then, (A0)-(A4) imply that and
b) (A0)-(A4) imply that , , and
c) (A0)-(A4) imply that and
Lemma 4.4 Suppose that (A0)-(A4) are satisfied. Then one can deduce that
Lemma 4.3 follows easily from (A0)-(A4). The proof of Lemma 4.4 is analogous to the proof of Theorem 3.1(b).
5. Proof of Main Results
Firstly, we introduce some notations, which will be used in the proofs below.
Proof of Theorem 3.1(a). From (3.1), one can write that
(5.1)
Thus, to prove a.s., we only need to verify that and for .
Step 1. We prove Note that
By Lemma 4.3(a), we have a.s. Hence, it suffices to verify that a.s. for . Applying (A0), taking , , in Lemma 4.2, we can verify that
(5.2)
where is a sequence of independent random variables satisfying and . Therefore, we obtain from (A0) and (5.2). On the other hand, taking in Lemma 4.1, we have
(5.3)
where is a sequence of independent random variables satisfying and . By (A0) and Lemma 4.3, taking , , in Lemma 4.2, one can also deduce that
(5.4)
Note that, from Lemma 4.3(a), (5.2) and (5.3), we have
(5.5)
(5.6)
(5.7)
Therefore, from (5.2)-(5.7), one can deduce that , which yields that
Therefore, by Lemma 4.3(b), we can get that
Step 2. We verify that for . From (A0), we find out is a sequence of independent random variables with , , for some . Similar to (5.4), we deduce that
Meanwhile, from (A0)-(A3), Lemma 4.3, (5.2) and (5.3), one can achieve that
The proof of for is analogous. Thus, the proof of Theorem 3.1(a) is completed.
Proof of Theorem 3.1(b). From (3.2), for every , one can write that
Therefore, we only need to prove that a.s. for every and . From (A0)-(A3), Theorem 3.1(a), Lemma 4.3, (5.2) and (5.3), for every and any , one can get
Thus, the proof of Theorem 3.1(b) is completed.
Proof of Theorem 3.2(a). From (3.3)-(3.4), write that
Using an approach similar to Step 1 in the proof of Theorem 3.1(a), one can get
Therefore, we only need to verify that for . From (A0)-(A4), Lemmas 4.2-4.4, Theorem 3.1(a), (5.2)-(5.4), we have
In the same way, from (A0)-(A4), Lemmas 4.2-4.4, (5.2) and (5.3), one can similarly deduce that for . Thus, the proof of Theorem 3.2(a) is completed.
Proof of Theorem 3.2(b). From (3.5), write that
Therefore, we only need to prove that a.s. for every and . From (A0)-(A4), Lemmas 4.3-4.4, (5.2), (5.3), one can get
Meanwhile, the proof of for every and is analogous. Thus, the proof of Theorem 3.2(b) is completed.
Acknowledgements
The authors greatly appreciate the constructive comments and suggestions of the Editor and referee. This research was supported by the National Natural Science Foundation of China (11701368).
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.
Cite this paper
Zhang, L.R. and Zhang, J.J. (2019) Strong Consistency of Estimators under Missing Responses. Journal of Applied Mathematics and Physics, 7, 93-103. https://doi.org/10.4236/jamp.2019.71008
References
- 1. Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1986) Semiparametric Estimation of the Relation between Weather and Electricity Sales. Journal of the American Statistical Association, 81, 310-320. https://doi.org/10.1080/01621459.1986.10478274
- 2. Liu, J.X. and Chen, X.R. (2005) Consistency of LS Estimator in Simple Linear EV Regression Models. Acta Mathematica Scientia, Series B, 25, 50-58. https://doi.org/10.1016/S0252-9602(17)30260-6
- 3. Miao, Y., Yang, G. and Shen, L. (2007) The Central Limit Theorem for LS Estimator in Simple Linear EV Regression Models. Communications in Statistics—Theory and Methods, 36, 2263-2272. https://doi.org/10.1080/03610920701215266
- 4. Miao, Y. and Liu, W. (2009) Moderate Deviations for LS Estimator in Simple Linear EV Regression Model. Journal of Statistical Planning and Inference, 139, 3122-3131. https://doi.org/10.1016/j.jspi.2009.02.021
- 5. Fan, G.L., Liang, H.Y., Wang, J.F. and Xu, H.X. (2010) Asymptotic Properties for LS Estimators in EV Regression Model with Dependent Errors. Advances in Statistical Analysis, 94, 89-103. https://doi.org/10.1007/s10182-010-0124-3
- 6. Cui, H.J. and Li, R.C. (1998) On Parameter Estimation for Semi-Linear Errors-in-Variables Models. Journal of Multivariate Analysis, 64, 1-24. https://doi.org/10.1006/jmva.1997.1712
- 7. Liang, H., Härdle, W. and Carroll, R.J. (1999) Estimation in a Semiparametric Partially Linear Errors-in-Variables Model. The Annals of Statistics, 27, 1519-1535.
- 8. Zhou, H.B., You, J.H. and Zhou, B. (2010) Statistical Inference for Fixed-Effects Partially Linear Regression Models with Errors in Variables. Statistical Papers, 51, 629-650. https://doi.org/10.1007/s00362-008-0150-3
- 9. Wang, Q., Linton, O. and Härdle, W. (2004) Semiparametric Regression Analysis with Missing Response at Random. Journal of the American Statistical Association, 99, 334-345. https://doi.org/10.1198/016214504000000449
- 10. Wang, Q. and Sun, Z. (2007) Estimation in Partially Linear Models with Missing Responses at Random. Journal of Multivariate Analysis, 98, 1470-1493. https://doi.org/10.1016/j.jmva.2006.10.003
- 11. Bianco, A., Boente, G., González-Manteiga, W. and Pérez-González, A. (2010) Estimation of the Marginal Location under a Partially Linear Model with Missing Responses. Computational Statistics & Data Analysis, 54, 546-564. https://doi.org/10.1016/j.csda.2009.09.028
- 12. Healy, M.J.R. and Westmacott, M. (1956) Missing Values in Experiments Analysed on Automatic Computers. Applied Statistics, 5, 203-206. https://doi.org/10.2307/2985421
- 13. Cheng, P.E. (1994) Nonparametric Estimation of Mean Functionals with Data Missing at Random. Journal of the American Statistical Association, 89, 81-87. https://doi.org/10.1080/01621459.1994.10476448
- 14. Härdle, W., Liang, H. and Gao, J.T. (2000) Partial Linear Models. Physica-Verlag, Heidelberg. https://doi.org/10.1007/978-3-642-57700-0
- 15. Gao, J.T., Chen, X.R. and Zhao, L.C. (1994) Asymptotic Normality of a Class of Estimators in Partial Linear Models. Acta Mathematica Sinica, 37, 256-268.
- 16. Chen, H. (1988) Convergence Rates for Parametric Components in a Partly Linear Model. The Annals of Statistics, 16, 136-146. https://doi.org/10.1214/aos/1176350695
- 17. Baek, J.I. and Liang, H.Y. (2006) Asymptotic of Estimators in Semi-Parametric Model under NA Samples. Journal of Statistical Planning and Inference, 136, 3362-3382. https://doi.org/10.1016/j.jspi.2005.01.008