**Open Journal of Statistics**

Vol.06 No.02(2016), Article ID:65859,15 pages

10.4236/ojs.2016.62021

Marginal Conceptual Predictive Statistic for Mixed Model Selection

Cheng Wenren^{1}, Junfeng Shang^{2*}, Juming Pan^{2}

^{1}Process Modeling Analytics Department, Bristol-Myers Squibb, New York, NY, USA

^{2}Bowling Green State University, Bowling Green, OH, USA

Copyright © 2016 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

Received 9 March 2016; accepted 23 April 2016; published 26 April 2016

ABSTRACT

We focus on the development of model selection criteria in linear mixed models. In particular, we propose model selection criteria that follow Mallows’ Conceptual Predictive Statistic (C_{p}) [1] [2] in linear mixed models. When the observations in the data are correlated, the usual Gauss discrepancy of the univariate case is not appropriate for measuring the distance between the true model and a candidate model. Instead, we define a marginal Gauss discrepancy which takes the correlation into account in the mixed models. The model selection criterion, marginal C_{p}, called MC_{p}, serves as an asymptotically unbiased estimator of the expected marginal Gauss discrepancy. An improvement of MC_{p}, called IMC_{p}, is then derived and proved to be a more accurate estimator of the expected marginal Gauss discrepancy than MC_{p}. The performance of the proposed criteria is investigated in a simulation study. The simulation results show that in small samples, the proposed criteria outperform the Akaike Information Criterion (AIC) [3] [4] and the Bayesian Information Criterion (BIC) [5] in selecting the correct model; in large samples, their performance is competitive. Further, the proposed criteria perform significantly better for highly correlated response data than for weakly correlated data.

**Keywords:**

Mixed Model Selection, Marginal C_{p}, Improved Marginal C_{p}, Marginal Gauss Discrepancy, Linear Mixed Model

1. Introduction

With the development of data science over the past decades, people have become more aware of the complexity of real-life data. Univariate linear regression models with independent and identically distributed (i.i.d.) Gaussian errors cannot fit some types of data well, especially data with correlated observations. For instance, in longitudinal data, observations are usually recorded from the same individual over time. It is reasonable to assume that correlation exists among the observations from the same individual, and linear mixed models are therefore appropriate for modeling such data.

Since linear mixed models are extensively used, mixed model selection plays an important role in the statistical literature. The aim of mixed model selection is to choose the most appropriate model from a candidate pool in the mixed model setting. A variety of model selection criteria are employed to carry out this selection.

In linear mixed models, a number of criteria have been developed for model selection. The most widely used are information criteria such as the AIC [3] [4] and the BIC [5]. Sugiura [6] proposed a marginal AIC (mAIC) that incorporates the number of random-effects parameters into the penalty term. Shang and Cavanaugh [7] employed the bootstrap to estimate the penalty term of mAIC, yielding two variants of AIC. For longitudinal data, a special case of linear mixed models, Azari, Li and Tsai [8] proposed a corrected Akaike Information Criterion (AICc). The justification of AICc mainly addresses the challenge posed by the correlation matrix under certain conditions for the mixed models. Vaida and Blanchard [9] redefined the Akaike information based on the best linear unbiased predictor (BLUP) [10] - [12] for the random effects in the mixed models, and proposed a conditional AIC (cAIC). Dimova et al. [13] derived a series of small-sample variants of the Akaike Information Criterion for linear mixed models.

Another information criterion, BIC, can be considered a Bayesian alternative to AIC. In linear mixed models, the marginal BIC (mBIC) is obtained from the marginal AIC by replacing the constant 2 in the penalty with log N, where N is the sample size [14]. Jones [15] proposed a measure of the effective sample size to replace the sample size in the penalty term of BIC, leading to a new criterion BIC_{J}.

We note that the BIC-type criteria are derived using Bayesian approaches. In contrast, the AIC-type criteria are justified from the frequentist perspective and are based upon an information discrepancy. However, little research has relied on other discrepancies to propose criteria, such as Mallows’ C_{p} [1] [2], in linear mixed models. Because of their different derivations, each selection criterion has its own advantages, and no single criterion offers all the benefits for model selection. To further develop selection criteria in the mixed modeling setting, we aim to justify C_{p}-type criteria relying on the Gauss discrepancy.

Mallows’ C_{p} [1] [2] in linear regression models aims to estimate the Gauss discrepancy between the true model and a candidate model. It serves as an asymptotically unbiased estimator of the expected Gauss discrepancy. Fujikoshi and Satoh [16] identified C_{p} in multivariate linear regression. Davies et al. [17] presented the estimation optimality of C_{p} in linear regression models. Cavanaugh et al. [18] provided an alternate version of C_{p}. The Gauss discrepancy is an L_{2} norm measuring the distance between the true model and a candidate model in linear models. To select the most appropriate model among competing fitted models, the candidate model leading to the smallest value of C_{p} is chosen. However, because the covariance matrix of linear mixed models complicates the justification of selection criteria, a C_{p} statistic for linear mixed models has not yet been identified.

This paper extends the justification of C_{p} from linear models to linear mixed models. We first define a marginal Gauss discrepancy that reflects the correlation when measuring the distance between the true model and a candidate model. We rely on the assumption that, under certain conditions, the estimator of the correlation matrix for the candidate model is consistent for the true correlation matrix. The resulting criterion, the marginal C_{p}, abbreviated as MC_{p}, serves as an asymptotically unbiased estimator of the expected marginal Gauss discrepancy between the true model and a candidate model. An improvement of MC_{p}, abbreviated as IMC_{p}, is also proposed, and we justify IMC_{p} as an asymptotically unbiased estimator of the expected marginal Gauss discrepancy that is more accurate than MC_{p}. We examine the performance of the proposed criteria in a simulation study where we use various correlation structures and different sample sizes.

The paper is organized as follows: Section 2 presents the notation and defines the marginal Gauss discrepancy in the setting of linear mixed models. In Section 3, we provide the derivations of the model selection criteria MC_{p} and IMC_{p}. Section 4 presents a simulation study to demonstrate the effectiveness of the proposed criteria. Section 5 concludes.

2. Marginal Gauss Discrepancy

In this section, we will introduce the true model, also called the generating model, and the candidate model in the setting of linear mixed models, then define the marginal Gauss discrepancy.

Suppose that the generating model for the data is given by

$$y = X_{o}\beta_{o} + Z b_{o} + \varepsilon_{o}, \tag{2.1}$$

where y denotes an N × 1 response vector, X_{o} is an N × p_{o} design matrix of full column rank, and β_{o} is a p_{o} × 1 unknown vector of fixed effects. Z is an N × mr known matrix of full column rank and b_{o} is an mr × 1 unknown vector of random effects, where m is the number of cases (the sample size) and r is the dimension of the random effects for each case. Here, b_{o} ~ N(0, G_{o}), ε_{o} ~ N(0, σ_{o}^{2}I), b_{o} and ε_{o} are mutually independent, G_{o} is a positive definite matrix, and σ_{o}^{2} is a scalar.

We fit the data with a candidate model of the form

$$y = X\beta + Z b + \varepsilon, \tag{2.2}$$

where X is an N × p design matrix of full column rank, β is a p × 1 unknown vector, b ~ N(0, G), ε ~ N(0, σ^{2}I), and b and ε are mutually independent. The design matrix Z of the random effects and the random effects b are the same as those in the generating model. The matrix G is a positive definite matrix containing q unknown parameters.

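Since the random part of the model (i.e., Zb) is not subject to selection, it is easier to use the marginal form of linear mixed models [19]. Let ζ_{o} = Zb_{o} + ε_{o}; then the generating model (2.1) can be written as

$$y = X_{o}\beta_{o} + \zeta_{o}, \qquad \zeta_{o} \sim N(0, \sigma_{o}^{2}\Sigma_{o}), \tag{2.3}$$

where Σ_{o} = I + ZG_{o}Z′/σ_{o}^{2} is the scaled variance.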

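For the candidate model (2.2), let ζ = Zb + ε; we have

$$y = X\beta + \zeta, \qquad \zeta \sim N(0, \sigma^{2}\Sigma), \tag{2.4}$$

where Σ = I + ZGZ′/σ^{2} is the scaled variance. Since G is positive definite, Σ is positive definite and therefore nonsingular.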

In models (2.3) and (2.4), the terms ζ_{o} and ζ combine the random effects and the errors of the respective models. Since both are assumed to have mean zero, the scaled variances Σ_{o} and Σ contain all the information about the random effects and errors, including the correlation structures.
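To make the marginal form concrete, the following Python sketch builds the covariance of ζ = Zb + ε for a small random-intercept layout and checks that it factors as σ^{2} times a scaled variance. The function name `scaled_variance`, the toy dimensions, and the convention Var(b) = G and Var(ε) = σ^{2}I (so that Σ = I + ZGZ′/σ^{2}) are assumptions made for illustration, not notation confirmed by the text.

```python
import numpy as np


def scaled_variance(Z, G, sigma2):
    """Return Sigma such that Var(Z b + eps) = sigma2 * Sigma.

    Assumes Var(b) = G and Var(eps) = sigma2 * I (illustrative convention).
    """
    N = Z.shape[0]
    return np.eye(N) + Z @ G @ Z.T / sigma2


# Toy layout (hypothetical): m = 3 cases, n = 2 observations each, r = 1.
m, n = 3, 2
Z = np.kron(np.eye(m), np.ones((n, 1)))   # N x m random-intercept design
G = 4.0 * np.eye(m)                       # random-effects covariance
sigma2 = 2.0                              # error variance

Sigma = scaled_variance(Z, G, sigma2)
marginal_cov = Z @ G @ Z.T + sigma2 * np.eye(m * n)
```

Under these assumptions, `sigma2 * Sigma` reproduces `marginal_cov`, and entries of `Sigma` linking observations from different cases are zero.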

We measure the distance between the true model and a candidate model by defining the marginal Gauss discrepancy based on the marginal forms (2.3) and (2.4). The true model is assumed to be included in the pool of candidate models. Let θ_{o} and θ denote the parameter vectors of the true model and the candidate model, respectively. The marginal Gauss discrepancy between the true model and a candidate model is defined as

$$d(\theta, \theta_{o}) = E_{o}\left\{ (y - X\beta)' \Sigma^{-1} (y - X\beta) \right\},$$

where E_{o} denotes the expectation with respect to the true model. Note that the marginal Gauss discrepancy weights the L_{2} norm by the inverse scaled variance Σ^{−1}. Therefore, the correlation between observations is taken into account when we use the marginal Gauss discrepancy to measure the distance between the true model and a candidate model.
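As an illustration of how such a weighted discrepancy behaves, the sketch below evaluates it in closed form, assuming the discrepancy takes the weighted form E_{o}{(y − Xβ)′Σ^{−1}(y − Xβ)} and that y has mean X_{o}β_{o} and covariance σ_{o}^{2}Σ_{o} under the true model. The expectation of a quadratic form then splits into a trace term and a mean-mismatch term. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def marginal_gauss_discrepancy(X, beta, Sigma, X_o, beta_o, Sigma_o, sigma2_o):
    """E_o[(y - X beta)' Sigma^{-1} (y - X beta)] when y has mean X_o beta_o
    and covariance sigma2_o * Sigma_o (assumed form of the discrepancy)."""
    Sigma_inv = np.linalg.inv(Sigma)
    delta = X_o @ beta_o - X @ beta                        # mean mismatch
    variance_part = sigma2_o * np.trace(Sigma_inv @ Sigma_o)
    bias_part = float(delta @ Sigma_inv @ delta)
    return variance_part + bias_part


# Hypothetical toy setup: N = 4 observations in two correlated pairs.
N = 4
X_o = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
beta_o = np.array([1.0, 0.5])
Sigma_o = np.eye(N) + 3.0 * np.kron(np.eye(2), np.ones((2, 2)))
sigma2_o = 1.0

# Candidate equal to the truth, and an underspecified intercept-only candidate.
d_true = marginal_gauss_discrepancy(X_o, beta_o, Sigma_o, X_o, beta_o, Sigma_o, sigma2_o)
d_under = marginal_gauss_discrepancy(X_o[:, :1], beta_o[:1], Sigma_o, X_o, beta_o, Sigma_o, sigma2_o)
```

When the candidate coincides with the truth, only the trace term remains and equals σ_{o}^{2}N; dropping a needed covariate adds a strictly positive mean-mismatch term.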

Now let θ̂ denote an estimate of θ. For instance, θ̂ could be the maximum likelihood estimator (MLE) or the restricted maximum likelihood (REML) estimator. In this paper, the MLE is used. The marginal Gauss discrepancy between the true model and the fitted candidate model is defined as

which can be therefore expressed as

(2.5)

We define a transformed marginal Gauss discrepancy between the true generating model and the fitted candidate model as a linear function of the marginal Gauss discrepancy (2.5) as

(2.6)

Taking the expectation of the transformed marginal Gauss discrepancy (2.6), we obtain the expected transformed marginal Gauss discrepancy as

(2.7)

To obtain a model selection criterion based on the expected transformed marginal Gauss discrepancy in Equation (2.7), we will propose an unbiased or asymptotically unbiased estimator of it. As a first step, we simplify the expression of this discrepancy in Equation (2.7).

From expression (2.7), the expectation part in the numerator can be written as

(2.8)

where is a projection matrix such that. To explore a further expression of (2.8), we need to know the properties of.

Theorem 1. For every, the matrix satisfies the following properties:

1) is idempotent.

2) and.

The proof is given in the Appendix.
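The properties in Theorem 1 can be checked numerically. The sketch below assumes the projection matrix has the usual generalized least squares form H = X(X′Σ^{−1}X)^{−1}X′Σ^{−1}, and that the second property concerns the traces of H and I − H; both are assumptions, since the defining formulas are not reproduced in this text. The design matrix and the value f = 3 are arbitrary.

```python
import numpy as np


def gls_projection(X, Sigma):
    """H = X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} (assumed form of the projection)."""
    Sigma_inv = np.linalg.inv(Sigma)
    return X @ np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv)


rng = np.random.default_rng(0)
N, p = 10, 3
X = np.column_stack([np.ones(N), rng.uniform(size=(N, p - 1))])
# Exchangeable blocks: 2 cases of 5 observations, hypothetical f = 3.
Sigma = np.eye(N) + 3.0 * np.kron(np.eye(2), np.ones((5, 5)))

H = gls_projection(X, Sigma)
is_idempotent = np.allclose(H @ H, H)        # H H = H
trace_H = np.trace(H)                        # equals p for a rank-p projection
trace_residual = np.trace(np.eye(N) - H)     # equals N - p
```

Note that H is generally not symmetric when Σ is not a multiple of the identity, so it is an oblique rather than an orthogonal projection in the usual inner product.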

Corollary 1. Following Theorem 1, we have:

1).

2).

3).

The proof of Corollary 1 can be easily completed following Theorem 1.

By Corollary 1, expression (2.8) can be written as

(2.9)

Note that the scaled variance Σ is a function of the q unknown parameter vector of variance components γ, i.e.,. Azari, Li and Tsai [8] noted that under the assumption that the set of candidate models includes the true model, it is reasonable to assume that the MLE is a consistent estimator of. Therefore, we can approximate by Σ, i.e.,. In what follows, we will make use of this approximation.

First, since and, using the approximation and Theorem 1, we have the first term of (2.9) as

(2.10)

Second, using the approximation again, the first term of Equation (2.7) can be simplified as

(2.11)

Using expressions (2.9), (2.10), and (2.11), in (2.7) can be therefore approximated as

(2.12)

Following Mallows’ interpretation, in (2.12) can be expressed as

where V_{p} and B_{p} are, respectively, the “variance” and “bias” contributions given by

and

We comment that increasing the number p of fixed-effects parameters decreases the bias B_{p} of the fitted model but increases the variance V_{p} at the same time. The marginal Gauss discrepancy can therefore be viewed as a bias-variance trade-off. Since a smaller value of the discrepancy indicates a smaller distance between the true model and a candidate model, the size of the Gauss discrepancy reflects how close a fitted model is to the true model.

3. Derivations of Marginal C_{p} and Improved Marginal C_{p}

3.1. Marginal C_{p}

In this section, model selection criteria are developed by finding a statistic whose expectation equals, or asymptotically equals, the expected transformed marginal Gauss discrepancy.

We start with the expectation of the sum of squared errors SS_{Res} from a candidate model. In linear mixed models, the sum of squared errors SS_{Res} can be written as

By Theorem 1 and Corollary 1, the expectation of the “scaled sum of squared error” can be expressed by

and then we have

(3.1)

Similar to the derivation of Equation (2.11), the numerator of the first term of Equation (3.1) is expressed as

(3.2)

Then, by Equations (3.1) and (3.2), it is straightforward to construct a function, which is a linear combination of. It can be shown that the function T has the expectation

Note that the function T is not a statistic since the parameter is unknown. Here, we would like to use an estimator to replace in the function. Let denote the design matrix for the largest model in the candidate pool with. We assume that. Let represent the sum of squared errors for the corresponding fitted model and is written as

where and are the MLEs for parameters and in the largest candidate model respectively. The estimator cannot be expressed in a closed form and is calculated by computational algorithm where the iterations are needed.

For the estimator of, we use the mean squared error of the largest candidate model

(3.3)

which is an asymptotically unbiased estimator for, yet it is biased. In the justification of this estimator, using the approximation, we can represent in terms of, then the expected value of can be easily calculated as, i.e., asymptotically we can have. Serving as an asymptotically unbiased estimator of, the in Equation (3.3) for the largest candidate model is preferred to estimate.

MC_{p} is then obtained as

(3.4)

Note that MC_{p} is a biased estimator of the expected discrepancy in finite samples. However, under the assumption that the true model is included in the pool of candidate models, MC_{p} serves as an asymptotically unbiased estimator of the discrepancy in expression (2.7). The proof is nontrivial, yet simulations (not presented here) show that as the sample size increases, the curves of the average values of MC_{p}, of IMC_{p} (introduced in the following subsection), and of the discrepancy merge, indicating that MC_{p} and IMC_{p} are both asymptotically unbiased estimators of the discrepancy.
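For orientation, the following sketch computes a C_{p}-type statistic from weighted residual sums of squares. It uses the classical Mallows form SS_{Res}/s^{2} + 2p − N, with s^{2} taken as the mean squared error of the largest candidate model; whether this matches expression (3.4) exactly is an assumption, since that formula is not reproduced in this text. As a further simplification, Σ is treated as known instead of being estimated by maximum likelihood. All function names and data are hypothetical.

```python
import numpy as np


def weighted_ss_res(y, X, Sigma):
    """Generalized residual sum of squares (y - X b)' Sigma^{-1} (y - X b),
    with b the generalized least squares fit for a given Sigma."""
    Sigma_inv = np.linalg.inv(Sigma)
    b = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
    resid = y - X @ b
    return float(resid @ Sigma_inv @ resid)


def cp_type_statistic(ss_res, ss_res_full, N, p, p_full):
    """Classical Mallows form SS_Res / s2 + 2p - N, with s2 the mean squared
    error of the largest model (assumed analogue, not the paper's exact MC_p)."""
    s2 = ss_res_full / (N - p_full)
    return ss_res / s2 + 2 * p - N


# Hypothetical toy data: 4 cases with 5 observations each, exchangeable Sigma.
rng = np.random.default_rng(1)
m, n, p_full = 4, 5, 5
N = m * n
Sigma = np.eye(N) + 3.0 * np.kron(np.eye(m), np.ones((n, n)))
X_full = np.column_stack([np.ones(N), rng.uniform(size=(N, p_full - 1))])
noise = np.linalg.cholesky(Sigma) @ rng.standard_normal(N)   # covariance Sigma
y = X_full[:, :2] @ np.array([1.0, 2.0]) + noise              # true model uses 2 columns

ss_full = weighted_ss_res(y, X_full, Sigma)
ss_sub = weighted_ss_res(y, X_full[:, :2], Sigma)
cp_full = cp_type_statistic(ss_full, ss_full, N, p_full, p_full)
cp_sub = cp_type_statistic(ss_sub, ss_full, N, 2, p_full)
```

By construction the statistic equals the number of fixed effects for the largest model, which mirrors the familiar behavior of Mallows' C_{p} in ordinary regression; candidate models would then be ranked by their values, smaller being better.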

3.2. Improved Marginal C_{p}

To improve the performance of the MC_{p} statistic in linear mixed models, we propose an improved marginal C_{p}, called IMC_{p}, which is expected to be a more accurate (less biased) estimator of the expected transformed marginal Gauss discrepancy than MC_{p}. IMC_{p} is proposed as

(3.5)

where SS_{Res} and are the sum of squared errors from the candidate fitted model and the largest fitted model, respectively. Note that IMC_{p} provides us an asymptotically unbiased estimator of, i.e., , and it will be shown in what follows.

To evaluate the expectation of IMC_{p}, we first need to calculate the ratio of the sum of squared errors between the candidate model and the largest candidate model in the pool. By Corollary 1, we have

By using the approximation for all, we approximate and by H and, respectively, and and. Then, the ratio can be written as

(3.6)

To continue the proof, we will use the following theorem and corollary.

Theorem 2. If, then for any matrix K, we have.

The proof of Theorem 2 is presented in the Appendix.

Corollary 2. Following Theorem 2, we can obtain following results:

1).

2).

The proof of Corollary 2 is included in the Appendix.

By Theorem 1 and Corollary 2, we have

such that the quadratic forms and are independent. It follows that the expectation of in (3.6) can be written as

(3.7)

For the term in (3.7), since, we have

(3.8)

For the term in (3.7), we can prove that

Note that. To justify the distribution of, we have

where. For the distribution of y, we know that. We calculate that, and by Theorem 1, the matrix is idempotent. Therefore, we have, where

and by Corollary 2, we can calculate λ as

Now, its inverse follows an inverse Chi-square distribution, i.e., , with the expectation as

(3.9)
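The expectation of an inverse chi-square variable used at this step rests on a standard fact: if X follows a central chi-square distribution with ν > 2 degrees of freedom, then E[1/X] = 1/(ν − 2). The sketch below checks this by Monte Carlo using only the Python standard library. The degrees of freedom (20), the number of draws, and the function name are arbitrary choices for illustration; in the derivation the degrees of freedom would come from the residual quadratic form of the largest candidate model.

```python
import random


def mean_inverse_chi_square(df, n_draws, seed=0):
    """Monte Carlo estimate of E[1 / X] for X ~ central chi-square(df)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # chi-square(df) is a Gamma distribution with shape df/2 and scale 2
        total += 1.0 / rng.gammavariate(df / 2.0, 2.0)
    return total / n_draws


df = 20                                   # hypothetical degrees of freedom
estimate = mean_inverse_chi_square(df, n_draws=100_000)
exact = 1.0 / (df - 2)                    # closed form, valid for df > 2
```

The factor ν − 2 explains why a criterion built on the ratio of two residual sums of squares needs a degrees-of-freedom correction to be unbiased for the target quantity.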

Using the results of (3.8) and (3.9), we have the expectation of in (3.7) as

(3.10)

We recall that the criterion IMC_{p} in (3.5) is defined as

By the result of (3.10) and the approximation again , we have the expectation of IMC_{p} as

Hence, IMC_{p} is an asymptotically unbiased estimator of the expected transformed marginal Gauss discrepancy in Equation (2.7). Compared with the derivation of MC_{p}, the advantage of IMC_{p} is that it avoids the bias introduced by plugging an estimator in for an unknown parameter when deriving the criterion.

We comment that the proposed MC_{p} and IMC_{p} are justified under the assumption that the true model is contained in the set of candidate models. Hence, the MC_{p} and IMC_{p} values can be calculated for correctly specified and overfitted candidate models. The proposed criteria can also be applied to underspecified models, although their values will then be quite large and may not behave well.

4. Simulation Study

In this simulation study, we investigate the ability of MC_{p} in (3.4) and IMC_{p} in (3.5) to determine the correct set of fixed effects for the simulated data in different models.

4.1. Presentation of Simulations

Consider a setting in which data are generated by the model of the following form

where the random effects are uncorrelated with mean 0 and a common variance, and the errors are independent of each other with mean 0 and a common variance. It follows that the correlation between any two observations from the same case is the random-effects variance divided by the sum of the random-effects variance and the error variance, whereas observations from different cases are uncorrelated. Let f denote the ratio of the variance of the random effects to the variance of the errors. The correlation between observations from the same case then equals f/(1 + f), which is an increasing function of f. Therefore, a higher f implies a higher correlation between the observations in the same case.
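The link between the variance ratio f and the within-case correlation is easy to tabulate. The short sketch below evaluates f/(1 + f) for the three values of f used later in the simulations; the function name is hypothetical.

```python
def intraclass_correlation(f):
    """Correlation between two observations in the same case when
    f = (random-effects variance) / (error variance)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return f / (1.0 + f)


# Values of f considered in the simulation study.
correlations = {f: intraclass_correlation(f) for f in (3, 6, 9)}
```

Even the smallest ratio considered, f = 3, corresponds to a within-case correlation of 0.75, so all three settings describe strongly correlated data.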

For convenience, the generating model can also be expressed by

where are unknown coefficients of the fixed effects. It is assumed that the random effects with, and. We set for, an n_{i}-vector of ones, and . We also assume that the error term, and is independent of the random effects b.

Since the random part of the model (i.e. Zb) is not subject to selection, we would like to express the model by its marginal form. Let, we have

which can also be expressed by the general form as

(4.1)

where, is a scaled covariance matrix. Equivalently, the term has the following exchangeable correlation structure:, where, I is the identity matrix and J is the matrix of 1’s.
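A sketch of the exchangeable structure follows, assuming each case contributes a block of the form I + fJ to the scaled covariance matrix (this block form is inferred from the random-intercept setup with Z_{i} a vector of ones, since the displayed formulas are not reproduced in this text). The closed-form inverse comes from the Sherman-Morrison formula and avoids a numerical inversion. Function names and the choice m = 2 are hypothetical.

```python
import numpy as np


def exchangeable_block(n, f):
    """Scaled covariance of one case: I + f * J (assumed from the random-intercept setup)."""
    return np.eye(n) + f * np.ones((n, n))


def exchangeable_block_inverse(n, f):
    """Closed-form inverse of I + f * J via the Sherman-Morrison formula."""
    return np.eye(n) - (f / (1.0 + n * f)) * np.ones((n, n))


n, f, m = 5, 3.0, 2
block = exchangeable_block(n, f)
Sigma = np.kron(np.eye(m), block)          # block diagonal over m cases
corr_within = block[0, 1] / block[0, 0]    # off-diagonal over diagonal entry
```

Because the full matrix is block diagonal, its inverse is obtained block by block, which keeps weighted residual sums of squares cheap to compute even for many cases.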

In this simulation study, we generate a design matrix X with 5 columns. The first column of X is a column of ones and the other four columns are generated randomly from uniform distributions but are held fixed throughout the simulations. Therefore, the number of fixed effects, including the intercept, in the largest model is 5. We assume that the candidate vectors of covariates, from which the columns of X are to be selected, then there are candidate models in the candidate pool. Here, we illustrate the behavior of the model selection criteria using three generating models:

1) Model 1:,;

2) Model 2:,;

3) Model 3:,.

These three models correspond to the three βs:, and in model (4.1) with the number of fixed effects equal to 2, 3, and 4, respectively. Again, the MLEs are used for estimation in the simulations.

Furthermore, we consider the case where the correlated errors have varying degrees of exchangeable structure. The variance of the error term is set to 1, and three increasing values of the variance of the random effects are considered: 3, 6, and 9, corresponding to three values of f: 3, 6, and 9, respectively. We take the number of clusters (m) to be 5, 10, and 20, with the number of repetitions in a cluster fixed at n = 5. We employ a total of 100 realizations for each model.
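The data-generating step of such a simulation can be sketched as follows. Normal random intercepts and errors, covariates drawn from Uniform(0, 1), and the particular coefficient vector are assumptions chosen for illustration; only the design (m clusters of n = 5 observations, error variance 1, 100 realizations, a fixed design matrix with an intercept and four covariates) is taken from the description above.

```python
import numpy as np


def simulate_random_intercept(X, beta, m, n, f, sigma2=1.0, rng=None):
    """Draw one response vector from a random-intercept model with
    random-effects variance f * sigma2 and error variance sigma2."""
    rng = np.random.default_rng() if rng is None else rng
    tau2 = f * sigma2                                   # random-effects variance
    b = rng.normal(0.0, np.sqrt(tau2), size=m)          # one intercept per cluster
    eps = rng.normal(0.0, np.sqrt(sigma2), size=m * n)  # independent errors
    return X @ beta + np.repeat(b, n) + eps


rng = np.random.default_rng(2016)
m, n = 10, 5
# Design matrix: intercept plus four uniform covariates, fixed across realizations.
X = np.column_stack([np.ones(m * n), rng.uniform(size=(m * n, 4))])
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # hypothetical coefficients with p_o = 2
datasets = [simulate_random_intercept(X, beta, m, n, f=3.0, rng=rng) for _ in range(100)]
```

Each realization would then be fitted under every candidate model, and the correct selection rate of a criterion is the fraction of realizations in which it picks the generating set of fixed effects.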

4.2. Results

4.2.1. Model 1:

Table 1 presents the performance of the two versions of marginal C_{p} (MC_{p} and IMC_{p}), mAIC, and mBIC under model 1, for which p_{o} = 2. The correct model selection rate for each criterion is listed. We observe that for each f, IMC_{p} outperforms MC_{p}, and both outperform mAIC and mBIC in selecting the correct model in small samples. As the ratio f increases, the proposed criteria select the correct model more often.

4.2.2. Model 2:

We evaluate the proposed criteria for model 2 in the same manner as for model 1. Table 2 presents the performance of MC_{p}, IMC_{p}, mAIC, and mBIC under model 2, for which p_{o} = 3. The only change from model 1 to model 2 is that one more fixed-effect variable, X_{5}, is added to the true model. In Table 2, the simulation results for model 2 are similar to those for model 1. As the ratio f increases, the proposed criteria MC_{p} and IMC_{p} perform better, indicating that they can effectively carry out model selection in the mixed models. We also observe that IMC_{p} improves on the performance of MC_{p} in small samples. As m increases, the performance of IMC_{p} and MC_{p} becomes closer. Compared with the correct selection rates in model 1, all model selection criteria behave better in model 2.

4.2.3. Model 3:

As in the first two models, we evaluate the performance of the model selection criteria by their rates of correctly selecting the true model. The results are presented in Table 3. Model 3 is identical to model 2 except that one more significant fixed-effect variable, X_{2}, is added.

The simulation results for model 3 are similar to those for models 1 and 2. In terms of the rates of choosing the correct model, all criteria improve dramatically on model 3 relative to models 1 and 2, implying that the proposed MC_{p} and IMC_{p} implement model selection effectively when the fixed effects are significant. For moderately large sample sizes (m = 20), MC_{p} and IMC_{p} perform comparably to mAIC and mBIC in selecting the correct model.

Table 1. Correct selection rate in model 1.

Table 2. Correct selection rate in model 2.

Table 3. Correct selection rate in model 3.

5. Concluding Remarks

The simulation results illustrate that the proposed criteria MC_{p} and IMC_{p} outperform mAIC and mBIC in small samples when the observations are highly correlated. The results also show that as the ratio f between the variance of the random effects and that of the errors increases, MC_{p} and IMC_{p} perform better. Since a larger f implies a higher correlation between the observations, we conclude that as the correlation between observations increases, the proposed criteria MC_{p} and IMC_{p} perform better. Since a model with a small f that is close to 0 is similar to a linear regression model with independent errors, our proposed criteria offer no particular advantage in such a case.

The simulation results show that the proposed criteria MC_{p} and IMC_{p} significantly outperform mAIC and mBIC when the sample size is small. As the sample size increases, the performance of the proposed criteria becomes comparable to that of mAIC and mBIC. Therefore, MC_{p} and IMC_{p} are highly recommended in small samples in the setting of linear mixed models.

Our research (not shown in this paper) also shows that both proposed criteria behave best when maximum likelihood estimation (MLE) is employed, compared with restricted maximum likelihood (REML) estimation or least squares estimation. The behavior of MC_{p} and IMC_{p} under REML estimation needs to be further investigated in the future.

In the simulation study, the comparison among models 1, 2, and 3 shows that when the true model includes more significant fixed-effect covariates, the proposed criteria perform better in selecting the correct model. This indicates that models with more significant variables (larger βs) are more easily identified by the proposed criteria than models whose variables are not very significant.

Comparing the performance of MC_{p} and IMC_{p}, we find that when the sample size is small, IMC_{p} obtains a higher correct selection rate than MC_{p}, which demonstrates that IMC_{p} improves on MC_{p} in selecting the most appropriate model. However, when the sample size becomes larger, the performance of MC_{p} and IMC_{p} is nearly identical.

A model selection criterion is consistent if, as the sample size increases, it selects the true model with probability approaching 1. Note that MC_{p}, IMC_{p}, and mAIC are not consistent, whereas mBIC is consistent, as expected, since its penalty term prevents overfitting in large samples. As the simulation study demonstrates, the proposed criteria MC_{p} and IMC_{p} show their advantages in small samples, although they are originally justified with large-sample approximations, which is the case for quite a few other model selection criteria. For details on the consistency of model selection criteria in linear mixed models, see Jiang and Rao [20].

Cite this paper

Cheng Wenren, Junfeng Shang, Juming Pan (2016) Marginal Conceptual Predictive Statistic for Mixed Model Selection. *Open Journal of Statistics*, **06**, 239-253. doi: 10.4236/ojs.2016.62021

References

- 1. Mallows, C.L. (1973) Some Comments on Cp. Technometrics, 15, 661-675.
- 2. Mallows, C.L. (1995) More Comments on Cp. Technometrics, 37, 362-372.
- 3. Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In: Petrov, B.N. and Csaki, F., Eds., International Symposium on Information Theory, 267-281.
- 4. Akaike, H. (1974) A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716-723. http://dx.doi.org/10.1109/TAC.1974.1100705
- 5. Schwarz, G. (1978) Estimating the Dimension of a Model. Annals of Statistics, 6, 461-464. http://dx.doi.org/10.1214/aos/1176344136
- 6. Sugiura, N. (1978) Further Analysis of the Data by Akaike’s Information Criterion and the Finite Corrections. Communications in Statistics - Theory and Methods A, 7, 13-26. http://dx.doi.org/10.1080/03610927808827599
- 7. Shang, J. and Cavanaugh, J.E. (2008) Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection. Computational Statistics & Data Analysis, 52, 2004-2021. http://dx.doi.org/10.1016/j.csda.2007.06.019
- 8. Azari, R., Li, L. and Tsai, C. (2006) Longitudinal Data Model Selection. Applied Times Series Analysis, Academic Press, New York, 1-23. http://dx.doi.org/10.1016/j.csda.2005.05.009
- 9. Vaida, F. and Blanchard, S. (2005) Conditional Akaike Information for Mixed-Effects Models. Biometrika, 92, 351-370. http://dx.doi.org/10.1093/biomet/92.2.351
- 10. Henderson, C.R. (1950) Estimation of Genetic Parameters. Annals of Mathematical Statistics, 21, 309-310.
- 11. Harville, D.A. (1990) BLUP (Best Linear Unbiased Prediction) and beyond. In: Gianola, D. and Hammond, K., Eds., Advances in Statistical Methods for Genetic Improvement of Livestock, Springer, New York, 239-276. http://dx.doi.org/10.1007/978-3-642-74487-7_12
- 12. Robinson, G.K. (1991) That BLUP Is a Good Thing: The Estimation of Random Effects. Statistical Science, 6, 15-32. http://dx.doi.org/10.1214/ss/1177011926
- 13. Dimova, R.B., Mariantihi, M. and Talal, A.H. (2011) Information Methods for Model Selection in Linear Mixed Effects Models with Application to HCV Data. Computational Statistics & Data Analysis, 55, 2677-2697. http://dx.doi.org/10.1016/j.csda.2010.10.031
- 14. Müller, S., Scealy, J.L. and Welsh, A.H. (2013) Model Selection in Linear Mixed Models. Statistical Science, 28, 135-167. http://dx.doi.org/10.1214/12-STS410
- 15. Jones, R.H. (2011) Bayesian Information Criterion for Longitudinal and Clustered Data. Statistics in Medicine, 30, 3050-3056. http://dx.doi.org/10.1002/sim.4323
- 16. Fujikoshi, Y. and Satoh, K. (1997) Modified AIC and Cp in Multivariate Linear Regression. Biometrika, 84, 707-716. http://dx.doi.org/10.1093/biomet/84.3.707
- 17. Davies, S.L., Neath, A.A. and Cavanaugh, J.E. (2006) Estimation Optimality of Corrected AIC and Modified Cp in Linear Regression. International Statistical Review, 74, 161-168. http://dx.doi.org/10.1111/j.1751-5823.2006.tb00167.x
- 18. Cavanaugh, J., Neath, A.A. and Davies, S.L. (2010) An Alternate Version of the Conceptual Predictive Statistic Based on a Symmetrized Discrepancy Measure. Journal of Statistical Planning and Inference, 140, 3389-3398. http://dx.doi.org/10.1016/j.jspi.2010.05.002
- 19. Jiang, J. (2007) Linear and Generalized Linear Mixed Models and Their Applications. Springer, New York.
- 20. Jiang, J. and Rao, J.S. (2003) Consistent Procedures for Mixed Linear Model Selection. Sankhya, 65, 23-42.

Appendix

Proof of Theorem 1. 1) To prove that is idempotent, we calculate

Thus, we prove that is idempotent.
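The displayed calculation for this step is not reproduced in the text. Assuming the projection matrix has the generalized least squares form H = X(X′Σ^{−1}X)^{−1}X′Σ^{−1} (an assumption about the notation, with Σ standing for whichever scaled variance the projection is built from), the idempotence check can be sketched as:

```latex
\begin{aligned}
H^2 &= X (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} \, X (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} \\
    &= X (X^\top \Sigma^{-1} X)^{-1} \bigl[ X^\top \Sigma^{-1} X \, (X^\top \Sigma^{-1} X)^{-1} \bigr] X^\top \Sigma^{-1} \\
    &= X (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} = H .
\end{aligned}
```

The bracketed factor is the identity matrix of order p, which exists because X has full column rank and Σ is positive definite.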

2) By the properties of trace, we have

Therefore, we have

Thus, Theorem 1 is proved. □

Proof of Theorem 2. Let. We need to show that.

Since, there exists a p × 1 vector β_{1} such that.

By, there also exists a p × 1 vector β_{2} such that, which makes .

So we have. □

Proof of Corollary 2. 1) Since is positive definite, there exists an N × N matrix V with,

such that. It follows that.

Let, we can have. Then, we arrive at

Now, let

and

Since, by Theorem 2, we have, so that we can have, which leads to

The first part of Corollary 2 is therefore proved.

2) Following the first part proof of Corollary 2, since, we have. Then, we can conclude that

Therefore, the proof for the second part of Corollary 2 is completed. □

NOTES

^{*}Corresponding author.