This paper investigates variable selection and imputation estimation simultaneously for the semiparametric partially linear varying-coefficient model when responses are missing in clustered data. A commonly used approach to missing data is complete-case analysis. Building on this idea, shrinkage estimation is discussed separately for each cluster. To avoid biased results and to improve estimation efficiency, this article introduces the Group Least Absolute Shrinkage and Selection Operator (Group Lasso) into the semiparametric model. The method combines local polynomial smoothing with the Least Absolute Shrinkage and Selection Operator, so that nonparametric estimation and variable selection can be carried out in a computationally efficient manner. The parametric estimators are obtained under the same criterion. For each cluster, the nonparametric and parametric estimators are derived, and their weighted averages over clusters are taken as the final estimators. The large-sample properties of these estimators are also derived.

In real applications, clustered data arise in various research areas, such as biomedicine. Generally, the data are grouped into classes of objects that share some similar property; for example, observations falling in the same confidence interval may be treated as one cluster. Numerous parametric approaches have been applied to the analysis of clustered data, and with the rapid development of computing techniques, nonparametric and semiparametric approaches have attracted increasing interest. See the work of Sun et al. [

Consider the semiparametric partially linear varying-coefficient model over all clusters, a useful extension of both the partially linear regression model and the varying-coefficient model. It satisfies

where

Obviously, when m = 1, model (1) reduces to the ordinary semiparametric partially linear varying-coefficient model for a single cluster. A series of papers (You and Chen [

In practice, however, responses are often not fully observed for various reasons. For example, some sampled units are unwilling to provide the desired information, and some investigators record incorrect information through carelessness. In such cases, a commonly used technique is to introduce a new variable

Owing to its practical importance, estimation of the semiparametric partially linear varying-coefficient model with missing responses has attracted the attention of many authors, such as Chu and Cheng [

It is worth pointing out that little work addresses missing and clustered data together, especially for the semiparametric partially linear varying-coefficient model. Ignoring the differences between clusters can drive the predicted responses Y far from their true values and make the estimators less robust. It is therefore necessary to account for the cluster structure in order to improve estimation efficiency. Based on the complete-case data, group lasso is introduced into the semiparametric partially linear varying-coefficient model separately for each cluster. Lasso is a popular technique for selecting variables and estimating parameters simultaneously, and it has attracted the attention of many authors, such as Tibshirani [
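As a minimal sketch of the complete-case step, assume a missing response is stored as NaN and that the indicator δ equals 1 when Y is observed and 0 otherwise (the usual convention; the paper's own definition is not visible in this text). The hypothetical helper `complete_cases_by_cluster` returns δ and, for each cluster, the indices of the complete cases on which a per-cluster group lasso would then be fitted.

```python
import numpy as np

def complete_cases_by_cluster(y, cluster):
    """Hypothetical helper for complete-case analysis within clusters.

    Assumes a missing response is coded as NaN; delta = 1 if Y is observed.
    Returns (delta, idx) where idx maps each cluster label to the row
    indices of its complete cases.
    """
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    delta = (~np.isnan(y)).astype(int)   # missingness indicator
    idx = {}
    for c in np.unique(cluster):
        # keep only rows of cluster c whose response is observed
        idx[int(c)] = np.flatnonzero((cluster == c) & (delta == 1))
    return delta, idx

y = np.array([1.2, np.nan, 0.7, 2.1, np.nan, 3.3])
cluster = np.array([0, 0, 0, 1, 1, 1])
delta, idx = complete_cases_by_cluster(y, cluster)
print(delta)            # [1 0 1 1 0 1]
print(idx[0], idx[1])   # [0 2] [3 5]
```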

The rest of the paper is organized as follows. Section 2 describes the proposed method. Section 3 provides its theoretical properties, and Section 4 concludes. The proofs of the main results are relegated to the Appendix.

Because some responses are missing, for simplicity we focus on the case where

In this situation, if the parametric component

where

Similarly, first consider the data of the jth cluster. Given any index value

According to

with respect to

Since it is assumed that the last

where
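The local linear idea behind the per-cluster nonparametric fit can be illustrated with a small sketch. It assumes the parametric part has already been removed from the response, so that y = X a(U) + e within one cluster, and uses the Epanechnikov kernel as one admissible choice under Assumption 5; the function name `local_linear_vc` and the simulated data are illustrative, not the paper's code. At a point u0, each coefficient is approximated by a_k + b_k (U - u0) and fitted by kernel-weighted least squares.

```python
import numpy as np

def epanechnikov(t):
    # symmetric density with compact support [-1, 1]
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def local_linear_vc(y, X, U, u0, h):
    """Local linear estimate of a(u0) in y = X a(U) + e (one cluster, sketch).

    Minimises sum_i [y_i - sum_k (a_k + b_k (U_i - u0)) X_ik]^2 K_h(U_i - u0)
    and returns the level part a(u0), a vector of length p.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.asarray(U, dtype=float) - u0
    w = epanechnikov(d / h) / h            # kernel weights K_h(U_i - u0)
    D = np.hstack([X, X * d[:, None]])     # local linear design [X, X*(U - u0)]
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    return coef[: X.shape[1]]

rng = np.random.default_rng(0)
n = 200
U = rng.uniform(0.0, 1.0, n)
X = rng.normal(size=(n, 2))
y = X[:, 0] * 2.0 * U + X[:, 1] * (1.0 - U)   # a(u) = (2u, 1 - u), no noise
a_hat = local_linear_vc(y, X, U, u0=0.5, h=0.2)
print(a_hat)   # approximately [1.0, 0.5]
```

Because the simulated coefficient functions are linear in u and there is no noise, the local linear fit recovers a(0.5) = (1, 0.5) up to rounding; with curved coefficient functions it would carry the usual O(h^2) bias.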

Many computational algorithms exist for lasso-type problems, such as the local quadratic approximation and least angle regression. For simplicity, this article describes an easy implementation based on the local quadratic approximation. Specifically, the implementation is based on an iterative algorithm with

be the KLASSO estimate obtained in the mth iteration for the jth cluster. Then the loss function in (6) can be locally approximated by

whose minimizer is given by

where
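The local quadratic approximation can be sketched for a plain least-squares group lasso. The KLASSO loss (6) itself is not recoverable from this text, so the sketch omits the kernel weights (they could be folded in by scaling the rows of X and y by the square roots of the weights). The function `group_lasso_lqa`, the threshold `eps`, and the simulated data are assumptions for illustration.

```python
import numpy as np

def group_lasso_lqa(X, y, groups, lam, n_iter=100, tol=1e-8, eps=1e-6):
    """Group lasso by local quadratic approximation (LQA): a generic sketch.

    Minimises ||y - X b||^2 + sum_k lam[k] * ||b_k||_2, where b_k collects the
    coefficients with the k-th group label (labels in sorted order).
    Near the current iterate b^(m), lam[k] * ||b_k|| is replaced by
    lam[k] / (2 ||b_k^(m)||) * ||b_k||^2 plus a constant, so each step is a
    ridge-type solve: b^(m+1) = (X'X + D^(m))^{-1} X'y.
    A group whose norm falls below eps is set to zero and never re-enters.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # unpenalised starting value
    active = np.ones(X.shape[1], dtype=bool)
    for _ in range(n_iter):
        d = np.zeros(X.shape[1])
        for k, g in enumerate(labels):
            idx = groups == g
            norm = np.linalg.norm(b[idx])
            if norm < eps:
                active[idx] = False              # shrunk to zero: drop the group
            else:
                d[idx] = lam[k] / (2.0 * norm)   # diagonal of D^(m) for group k
        b_new = np.zeros_like(b)
        if active.any():
            XA = X[:, active]
            b_new[active] = np.linalg.solve(XA.T @ XA + np.diag(d[active]), XA.T @ y)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 6))
groups = np.array([0, 0, 1, 1, 2, 2])
b_true = np.array([1.0, -1.0, 0.0, 0.0, 2.0, 0.5])
y = X @ b_true + 0.01 * rng.normal(size=n)
b_hat = group_lasso_lqa(X, y, groups, lam=[10.0, 10.0, 10.0])
print(np.round(b_hat, 3))   # the null group (columns 2-3) should be exactly zero
```

Dropping a group once its norm is negligible is the usual practical device in LQA, since the quadratic approximation is undefined at a zero group norm.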

Furthermore, combining the estimates from each cluster and each group by a weighted mean yields the final estimator of the coefficient vector

where
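The weighted-mean step can be sketched as follows. The weights used in the paper are not visible in the extracted text; this sketch assumes, purely for illustration, that cluster j receives weight n_j / Σ n_j, where n_j is its number of complete cases. The helper `pooled_estimate` and the numbers are hypothetical.

```python
import numpy as np

def pooled_estimate(estimates, sizes):
    """Weighted mean of per-cluster estimates (hypothetical helper).

    Assumption: cluster j gets weight n_j / sum(n), with n_j its number of
    complete cases. estimates has one row per cluster.
    """
    est = np.asarray(estimates, dtype=float)   # shape (m, p)
    w = np.asarray(sizes, dtype=float)
    w = w / w.sum()                            # normalise weights to sum to 1
    return w @ est

beta_by_cluster = [[1.0, 0.0], [2.0, 0.0], [4.0, 1.0]]
n_complete = [10, 20, 10]
print(pooled_estimate(beta_by_cluster, n_complete))   # [2.25 0.25]
```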

Based on the above estimator of the nonparametric component and under the same criterion, the lasso estimator of the parametric components

where

The following assumptions are needed to prove the theorems for the proposed estimation methods.

Assumption 1. The random variable U has a bounded support

Assumption 2. For each

Assumption 3. There is an

Assumption 4.

Assumption 5. The function K(.) is a symmetric density function with compact support.
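The Epanechnikov kernel K(t) = 0.75(1 - t^2) on [-1, 1] is one standard kernel satisfying Assumption 5. The sketch below checks numerically that it is a symmetric density with compact support and computes μ2 = ∫t²K(t)dt = 1/5 and ν0 = ∫K²(t)dt = 3/5, constants that typically enter the asymptotic bias and variance of local polynomial estimators. The choice of this particular kernel is an assumption; the paper only requires the conditions of Assumption 5.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) for |t| <= 1, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def trapezoid(f, x):
    # plain trapezoidal rule, written out to avoid version-specific NumPy helpers
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

grid = np.linspace(-1.5, 1.5, 30001)
k = epanechnikov(grid)
area = trapezoid(k, grid)              # integrates to 1: K is a density
mu2 = trapezoid(grid**2 * k, grid)     # second moment, 1/5 for this kernel
nu0 = trapezoid(k**2, grid)            # integral of K^2, 3/5 for this kernel
symmetric = np.allclose(epanechnikov(grid), epanechnikov(-grid))
outside_zero = np.all(k[np.abs(grid) > 1.0] == 0.0)
print(round(area, 4), round(mu2, 4), round(nu0, 4), symmetric, outside_zero)
# 1.0 0.2 0.6 True True
```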

Lemma 1. Suppose that Assumptions (A1)-(A5) hold,

Lemma 2. If (A1)-(A5),

The proofs of Lemma 1 and Lemma 2 can be found in Wang and Xia [

Suppose that Assumptions (A1)-(A5) hold. For the jth cluster, let

Theorem 1. Assume (A1)-(A5),

To study the oracle property, define the oracle estimators as follows:

Theorem 2. Suppose that the assumptions are satisfied. If

In the case where

model can be consistently identified. Because selecting p shrinkage parameters simultaneously is a great challenge, as shown in Zou [

where

where
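Following the adaptive idea of Zou, a common way to reduce p shrinkage parameters to a single λ is to set λ_k = λ / ||α̃_k||, where α̃_k is an unpenalized initial estimate of the kth group. Since the paper's exact formula is not visible here, the sketch below assumes this form; `adaptive_group_lambdas` is a hypothetical helper.

```python
import numpy as np

def adaptive_group_lambdas(b_init, groups, lam):
    """Group-specific shrinkage lam_k = lam / ||b_init_k||_2 (adaptive-lasso style).

    b_init: an unpenalised initial estimate; groups: group label of each
    coefficient. Returns lam_k in sorted-label order. A group whose initial
    estimate is exactly zero gets lam_k = inf (NumPy emits a warning).
    """
    b_init = np.asarray(b_init, dtype=float)
    groups = np.asarray(groups)
    return np.array([lam / np.linalg.norm(b_init[groups == g])
                     for g in np.unique(groups)])

lams = adaptive_group_lambdas([3.0, 4.0, 0.1, 0.0, 1.0, 0.0], [0, 0, 1, 1, 2, 2], lam=1.0)
print(lams)   # group 1, with the smallest initial norm, gets the heaviest penalty
```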

Obviously, the effective sample size

Note that
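Tuning-parameter selection by BIC can be sketched as a grid search. This sketch assumes a KLASSO-style criterion BIC(λ) = log(RSS_λ) + df_λ · log(nh)/(nh), with nh as the effective sample size and df_λ the number of nonzero groups; both the form and the RSS and df values in the example are made-up assumptions for illustration, not results from the paper.

```python
import numpy as np

def bic_select(lams, rss, df, n, h):
    """Pick the tuning parameter minimising an assumed KLASSO-style BIC.

    BIC(lam) = log(RSS_lam) + df_lam * log(n*h) / (n*h), where n*h is taken
    as the effective sample size and df_lam counts the nonzero groups.
    """
    n_eff = n * h
    bic = np.log(np.asarray(rss, dtype=float)) + np.asarray(df, dtype=float) * np.log(n_eff) / n_eff
    best = int(np.argmin(bic))
    return lams[best], bic

# illustrative values only: larger lam -> larger RSS, fewer nonzero groups
lams = [0.01, 0.1, 1.0]
best_lam, bic = bic_select(lams, rss=[10.0, 10.05, 14.0], df=[5, 3, 1], n=200, h=0.25)
print(best_lam)   # 0.1
```

Using log(RSS_λ / n) instead of log(RSS_λ) shifts every BIC value by the same constant and leaves the selected λ unchanged.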

Theorem 3 (Selection Consistency). Suppose that Assumptions (A1)-(A5) hold and that the tuning parameter

This paper studies shrinkage estimation of the semiparametric partially linear varying-coefficient model when responses are missing in clustered data. Building on the complete-case idea, it introduces group lasso into the semiparametric model separately for each cluster. The new method conducts variable selection and model estimation simultaneously, and it both reduces bias and improves estimation efficiency. The final nonparametric and parametric estimators are obtained as weighted means over clusters. The BIC criterion is applied to select the tuning parameter, and the asymptotic normality and consistency of the estimators are established theoretically.

This work is supported by the National Natural Science Foundation of China (61472093). This support is greatly appreciated.

Mingxing Zhang, Jiannan Qiao, Huawei Yang, Zixin Liu (2015) Shrinkage Estimation of Semiparametric Model with Missing Responses for Cluster Data. Open Journal of Statistics, 05, 768-776. doi: 10.4236/ojs.2015.57076

Proof. Based on Lemma 2 and as shown in Hunter and Li [

For simplicity, we follow (8) and

Since each diagonal component of

where

Proof. As is well known,

That is to say,

where

where