Open Journal of Statistics
Vol.06 No.05(2016), Article ID:71409,11 pages
10.4236/ojs.2016.65071
Efficient Shrinkage Estimation about the Partially Linear Varying Coefficient Model with Random Effect for Longitudinal Data
Wanbin Li
School of Mathematics and Statistics, Yancheng Teachers University, Yancheng, China

Copyright © 2016 by author and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/



Received: August 13, 2016; Accepted: October 18, 2016; Published: October 21, 2016
ABSTRACT
In this paper, an efficient shrinkage estimation procedure for the partially linear varying coefficient (PLVC) model with random effect is considered. A novel penalized estimating equation is introduced, which specifies the model structure by selecting the significant variables and estimating the nonzero coefficients. Under some mild conditions, the asymptotic properties of the proposed model selection and estimation procedure, such as sparsity and the oracle property, are established. Numerical simulation studies and a real data analysis are presented to examine the finite-sample performance of the procedure.
Keywords:
Partially Linear Varying Coefficient Model, Mixed Effect, Penalized Estimating Equation

1. Introduction
In an effort to reduce the risk of model misspecification, more flexible nonlinear and non/semiparametric models have been proposed for independent and subject-dependent data. For example, [1] introduced the varying coefficient model, and [2] and [3] studied the partially linear varying coefficient (PLVC) model and the single-index model, respectively, for longitudinal data. See [4] for an intensive review of longitudinal data.
As a natural extension of [5] , which used a marginal model for longitudinal data analysis, a random effect method is developed here to account for within-subject correlation in shrinkage estimation. In the literature on longitudinal data, the random effect method has received considerable attention. See, for example, [6] - [9] . Some advantages of the random effect method, including its computational efficiency, were mentioned in [3] .
For the PLVC model with random effect, an important problem is to choose the significant covariates. Shrinkage estimation based on regularization has attracted much interest. See, for example, [10] - [13] . Extensions of variable selection under the regularization framework to varying coefficient models include [5] [14] - [16] . In this article, the variable selection problem for the PLVC model with random effect is investigated. Because of its simplicity and wide usage, a shrinkage estimation procedure based on a penalized estimating equation is introduced, following the idea of [17] .
The rest of this article is organized as follows. In Section 2, the model, the estimation procedure and the statistical properties of the estimators are introduced. In Section 3, practical computational issues are discussed. Numerical simulations and a real data analysis illustrating the finite-sample performance are presented in Section 4. Section 5 concludes.
2. Estimation and Asymptotic Property
2.1. Penalized Estimating Equation
Let
be the observed data associated with the ith subject in a longitudinal study. We consider the partially linear varying coefficient mixed effect model (PLVCMeM)
(1)
where
is a
coefficient vector,
is a
vector of random effect with mean 0 and covariance matrix D,
is an unknown smooth
function vector and
is a random variable with mean 0 and variance
.
Assume that
is a set of B-spline basis functions of order
with
quasi-uniform internal knots. Then, each
can be expressed with a linear combination of normalized B-spline basis function
where
is the spline coefficient vector. Therefore, with the given spline basis
, model (1) can be approximated as
(2)
where 



where 
A primary goal for model (1) is to extract useful information from Z and X, so it is important to select and estimate the nonzero coefficients in 


where


where 

Among many choices of the penalty function

where 
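As an illustration of a penalty function that is commonly used in this literature, the following minimal sketch evaluates the first derivative of the SCAD penalty of [10]. It is an assumption that SCAD is the choice of interest here; the function name `scad_derivative` and the default `a = 3.7` (the value suggested in [10]) are illustrative and not taken from this article.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """Derivative p'_lam(|theta|) of the SCAD penalty (Fan and Li [10]).

    Assumed form: lam on [0, lam], (a*lam - t)/(a - 1) on (lam, a*lam],
    and 0 beyond a*lam, where t = |theta|.
    """
    t = np.abs(np.asarray(theta, dtype=float))
    # Constant, Lasso-like penalization rate for small coefficients
    small = lam * (t <= lam)
    # Linearly decaying rate on (lam, a*lam]; exactly zero for t > a*lam
    large = np.maximum(a * lam - t, 0.0) / (a - 1.0) * (t > lam)
    return small + large
```

Because the derivative vanishes for large |theta|, large coefficients are left essentially unpenalized, while small ones are shrunk at the Lasso rate; this is the mechanism behind the sparsity and oracle-type results for such penalties.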

2.2. Estimators of the Variance Components
Efficient estimation of the parameters of interest in model (5) depends on estimators of the variance components; therefore, consistent estimators of them are required. Suppose that the variance-covariance matrix for model (1) is

where 









where by the estimator 




Therefore, the estimator for 

2.3. Asymptotic Properties about the Estimators
In this section, we investigate the asymptotic behavior of the estimators of the parametric, nonparametric and variance components. Throughout the article, the following assumptions are needed to facilitate the technical details, although they may not be the weakest conditions. Let 



(C1) For some

(C2) The density function f(u), which generates the sequence of design points
(C3) The number of measurements m is bounded.
(C4) For an

(C5) Let 


Firstly, we show that the estimators given by (8) are asymptotically normal.
Theorem 1 Suppose that conditions (C1)-(C5) hold, then

where
To obtain the consistency and oracle property of the estimators, the following additional conditions are required, which are similar to those used in [10] and [15] .
(C6) Let


(C7) 


Theorem 2 Under the conditions (C1)-(C7) and the number of knots 
i) 
ii) 
where
Theorem 2 gives the convergence rate of the weighted estimators for both the parametric and the nonparametric components. Furthermore, the following two theorems establish the oracle property of the consistent estimators.
Theorem 3 Under the conditions (C1)-(C7) and the number of knots
Let


as


i) 
ii) 
According to Remark 1 in [10] , the variable selection procedure can specify the model correctly and efficiently. The next theorem further shows that, under some conditions, the estimators of the nonzero coefficients of the parametric component have the same asymptotic distribution as those based on the true model.
Let





Theorem 4 Under the conditions (C1)-(C7) and the number of knots 

where 
3. Practical Computational Issues
Denote that 



Step 1. Calculate the estimator 

Step 2. Solve the penalized estimators 
Step 3. Replace the estimator 

Remark 1. This modified penalized estimation procedure inherits the computational efficiency and sparsity of Lasso-type solutions. For the computational details, see [13] .
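To make the connection with the one-step sparse estimates of [13] concrete, the following minimal sketch applies a one-step local linear approximation to a SCAD-penalized weighted least squares problem. It is a sketch under stated assumptions, not the exact algorithm of this article: the design matrix `X` stands for the stacked covariates (including any spline terms), `V_inv` is a working inverse covariance treated as known, and the names `scad_derivative`, `soft_threshold` and `one_step_estimate` are hypothetical.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """SCAD penalty derivative (Fan and Li [10]); a = 3.7 is the usual default."""
    t = np.abs(np.asarray(theta, dtype=float))
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1.0) * (t > lam)

def soft_threshold(z, g):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def one_step_estimate(X, y, V_inv, lam, n_sweeps=200, tol=1e-8):
    """One-step sparse estimate for weighted least squares (assumed setup).

    Minimizes (1/(2N)) (y - X b)' V_inv (y - X b) + sum_j w_j |b_j|,
    where w_j is the SCAD derivative at an unpenalized initial estimate.
    """
    N = X.shape[0]
    A = X.T @ V_inv @ X / N
    c = X.T @ V_inv @ y / N
    beta0 = np.linalg.solve(A, c)       # unpenalized initial estimator
    w = scad_derivative(beta0, lam)     # coefficient-specific L1 weights
    beta = beta0.copy()
    for _ in range(n_sweeps):           # coordinate descent on the weighted Lasso
        old = beta.copy()
        for j in range(len(beta)):
            # partial residual correlation with coordinate j excluded
            r = c[j] - A[j] @ beta + A[j, j] * beta[j]
            beta[j] = soft_threshold(r, w[j]) / A[j, j]
        if np.max(np.abs(beta - old)) < tol:
            break
    return beta0, beta
```

In an iterative scheme, `V_inv` would be rebuilt from the current variance component estimates and the fit repeated; with working independence, `V_inv` is simply the identity matrix.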
Although our theoretical results give technical conditions on 






where 





4. Empirical Study
We now use two examples to illustrate the superiority of the proposed weighted shrinkage estimation over the one that ignores within-subject correlation.
Example 1. Consider a partially linear varying coefficient mixed effect model

where 












To illustrate the estimation accuracy of the proposed method, we define the generalized mean squared error (GMSE) and the square root of the average squared error (RASE) to be


For a thorough comparison, two other estimation methods are considered in addition to the method proposed in this article: the “naive” approach based on the working independence method, and the “ideal” one based on the true within-subject covariance. The estimators obtained by the “naive” approach, the proposed method and the “ideal” approach are denoted by


The results of variable selection, based on 100 replications, are given in Table 1. Table 1 shows that the proposed method can select the true model quite well and leads to smaller GMSE and RASE values. Table 2 reports a satisfactory estimation of the variance components. As the sample size increases, the performance improves.
Secondly, the variance, bias and mean squared error of the estimators of the nonzero parameters, denoted by “V”, “Bias” and “MSE”, are listed in Table 3. From Table 3, all three methods obtain consistent estimators with small bias. Moreover, the values of “V” and “MSE” indicate that the newly proposed method and the “ideal” method yield more efficient estimators than the “naive” method does. In addition, the asymptotic normality of the estimators of the parametric component is shown with the Quantile-Quantile plot in Figure 1.
Finally, in Figure 2, the curves estimated by all three methods fit the true nonparametric curve well. However, their 95% confidence intervals differ in length, with the proposed method and the “ideal” method showing similar and shorter intervals. All these results indicate a substantial improvement of the estimation with the method proposed in this article.
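The accuracy measures used in this example can be computed along the following lines. This is a minimal sketch that assumes the definitions commonly used in the variable selection literature (for example in [15]): GMSE as a quadratic form in the estimation error weighted by the second-moment matrix of the covariates, and RASE as the root of the average squared error of the estimated coefficient functions over a grid. The function names and argument layouts are illustrative assumptions.

```python
import numpy as np

def gmse(beta_hat, beta_true, X):
    """(beta_hat - beta)' S (beta_hat - beta), with S the sample estimate of E[X X']."""
    d = np.asarray(beta_hat, dtype=float) - np.asarray(beta_true, dtype=float)
    S = X.T @ X / X.shape[0]
    return float(d @ S @ d)

def rase(alpha_hat, alpha_true):
    """Root average squared error; inputs are (n_grid, q) arrays of function values."""
    diff = np.asarray(alpha_hat, dtype=float) - np.asarray(alpha_true, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def bias_var_mse(estimates, truth):
    """Per-coefficient bias, variance and MSE over replications (rows)."""
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    bias = est.mean(axis=0) - truth
    var = est.var(axis=0)                       # population variance (ddof=0)
    mse = np.mean((est - truth) ** 2, axis=0)   # equals var + bias**2
    return bias, var, mse
```

With the population variance, the decomposition MSE = V + Bias^2 holds exactly, which gives a quick consistency check on tabulated simulation summaries.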
Table 1. Simulation results for the variable selection.
Table 2. Simulation results for the estimators of the variance components
Table 3. Simulation results ×100 for 
Figure 1. Q-Q plot of the estimator
Example 2. To illustrate the effectiveness of the proposed estimation procedure, we apply it to a longitudinal AIDS data set, which is reported in [18] and comprises the HIV status of 283 homosexual males who were infected with HIV during a follow-up period between 1984 and 1991. The focus of this application is to examine the trend of the mean CD4 percentage depletion over time and to evaluate the effects of cigarette smoking, pre-HIV infection CD4 percentage and age at infection on the mean CD4 percentage after infection.
For the jth measurement of the ith subject, let 


Figure 2. The estimated average curve for the nonparametric component 




where the baseline of CD4 percentage 

By the analysis, there are two variables 


5. Conclusion and Discussion
This article considered an efficient shrinkage estimation for the partially linear varying coefficient model with random effect. A variance component model was employed to take within-subject correlation into consideration. Some asymptotic properties, such as the convergence rate, consistency and oracle property, were established. Moreover, the effectiveness of the procedure was further illustrated by a real data analysis. As a more ambitious goal, we would like to investigate variable selection for the mixed effect model under a more general within-subject covariance matrix.
Figure 3. The estimators for the mean CD4 percentage
Acknowledgements
This work was partially supported by the National Statistical Science Research Project of China [Grant No. 2014LZ14 and 2015LZ27] and the Yancheng Teachers’ Professor and Doctors’ Research Project [Grant No. 14YSYJB0108].
Cite this paper
Li, W.B. (2016) Efficient Shrinkage Estimation about the Partially Linear Varying Coefficient Model with Random Effect for Longitudinal Data. Open Journal of Statistics, 6, 862-872. http://dx.doi.org/10.4236/ojs.2016.65071
References
1. Hastie, T. and Tibshirani, R. (1993) Varying Coefficient Models. Journal of the Royal Statistical Society: Series B, 55, 757-796.
2. Ahmad, I., Leelahanon, S. and Li, Q. (2005) Efficient Estimation of a Semiparametric Partially Linear Varying Coefficient Model. Annals of Statistics, 33, 258-283. http://dx.doi.org/10.1214/009053604000000931
3. Pang, Z. and Xue, L.G. (2012) Estimation for the Single-Index Models with Random Effects. Computational Statistics and Data Analysis, 56, 1837-1853. http://dx.doi.org/10.1016/j.csda.2011.11.007
4. Xue, L.G. and Zhu, L.X. (2007) Empirical Likelihood for a Varying Coefficient Model with Longitudinal Data. Journal of the American Statistical Association, 102, 642-652. http://dx.doi.org/10.1198/016214507000000293
5. Zhao, P.X. and Xue, L.G. (2010) Variable Selection for Semiparametric Varying Coefficient Partially Linear Errors-in-Variable Models. Journal of Multivariate Analysis, 101, 1872-1883. http://dx.doi.org/10.1016/j.jmva.2010.03.005
6. Ruckstuhl, A.F., Welsh, A.H. and Carroll, R.J. (2000) Nonparametric Function Estimation of the Relationship between Two Repeatedly Measured Variables. Statistica Sinica, 10, 51-71.
7. Wang, N. (2003) Marginal Nonparametric Kernel Regression Accounting for within-Subject Correlation. Biometrika, 90, 43-52. http://dx.doi.org/10.1093/biomet/90.1.43
8. Su, L. and Ullah, A. (2007) More Efficient Estimation of Nonparametric Panel Data Models with Random Effects. Economics Letters, 96, 375-380. http://dx.doi.org/10.1016/j.econlet.2007.02.018
9. You, J. and Zhou, X. (2009) Partially Linear Models and Polynomial Spline Approximations for the Analysis of Unbalanced Panel Data. Journal of Statistical Planning and Inference, 139, 679-695. http://dx.doi.org/10.1016/j.jspi.2007.04.037
10. Fan, J. and Li, R. (2001) Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association, 96, 710-723. http://dx.doi.org/10.1198/016214501753382273
11. Zou, H. (2006) The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101, 1418-1429. http://dx.doi.org/10.1198/016214506000000735
12. Yuan, M. and Lin, Y. (2006) Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society: Series B, 68, 49-67. http://dx.doi.org/10.1111/j.1467-9868.2005.00532.x
13. Zou, H. and Li, R.Z. (2008) One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models. Annals of Statistics, 36, 1509-1533. http://dx.doi.org/10.1214/009053607000000802
14. Liang, H. and Li, R.Z. (2009) Variable Selection for Partially Linear Models with Measurement Errors. Journal of the American Statistical Association, 104, 234-248. http://dx.doi.org/10.1198/jasa.2009.0127
15. Zhao, P.X. and Xue, L.G. (2009) Variable Selection for Semiparametric Varying Coefficient Partially Linear Models. Statistics and Probability Letters, 79, 2148-2157. http://dx.doi.org/10.1016/j.spl.2009.07.004
16. Li, G., Lai, P. and Lian, H. (2014) Variable Selection and Estimation for Partially Linear Single-Index Models with Longitudinal Data. Statistics & Computing, 25, 579-593. http://dx.doi.org/10.1007/s11222-013-9447-8
17. Johnson, B., Lin, D. and Zeng, D. (2008) Penalized Estimating Functions and Variable Selection in Semiparametric Regression Models. Journal of the American Statistical Association, 103, 672-680. http://dx.doi.org/10.1198/016214508000000184
18. Kaslow, R.A., Ostrow, D.G., Detels, R., Phair, J.P., Polk, B.F. and Rinaldo, C.R. (1987) The Multicenter AIDS Cohort Study: Rationale, Organization and Selected Characteristics of the Participants. American Journal of Epidemiology, 126, 310-318. http://dx.doi.org/10.1093/aje/126.2.310
Appendix
Supplementary material related to this article is available from the author by email upon request.














