Open Journal of Statistics, 2011, 1, 212-216
doi:10.4236/ojs.2011.13025 Published Online October 2011 (http://www.SciRP.org/journal/ojs)
Copyright © 2011 SciRes. OJS
A New Test for Large Dimensional Regression Coefficients
June Luo1, Yi-Jun Zuo2
1Department of Mathematical Sciences, Clemson University, Clemson, USA
2Department of Statistics and Probability, Michigan State University, Lansing, USA
E-mail: jluo@clemson.edu
Received August 19, 2011; revised September 22, 2011; accepted September 30, 2011
Abstract
This article considers hypothesis testing for coefficients in high dimensional regression models. I develop
a simultaneous test statistic for the hypothesis in both linear and partial linear models. The derived test is
designed for growing p and fixed n, where the conventional F-test is no longer appropriate. The asymptotic
distribution of the proposed test statistic under the null hypothesis is obtained.
Keywords: High Dimension, Ridge Regression, Hypothesis Test, Partial Linear Model, Asymptotic
1. Introduction
Some high dimensional data, such as gene expression
datasets from microarrays, exhibit the property that the
number of covariates greatly exceeds the sample size.
The “large p, small n” paradigm brings
challenges to many traditional statistical methods, and
thus the asymptotic properties of various estimators
when p goes to infinity much faster than n have been
discussed (see [1-3]). Reference [1] considered uniform
convergence for a large number of marginal discrepancy
measures targeted on univariate distributions, means and
medians. Reference [3] proposed a two sample test on
high dimensional means. Both of these
articles considered testing under “large p, small n”
without a regression structure, whereas the present article
concentrates on the regression setting.
Zhong and Chen in [4] proposed a test statistic for
testing the regression coefficients in linear models when
p/n → ρ ∈ (0,1). In microarray data, however, the number of
genes (p) is in the order of thousands whereas the sample
size (n) is much smaller, usually less than 50 because of
limits on replication. I therefore regard the setting in which
p goes to infinity while n remains constant as more practical.
Covariate selection for high dimensional linear regression
has received considerable attention in recent years.
Penalizing methods are alternatives to the traditional
least squares estimator for shrinkage estimation as in [5,
6]. Shao and Chow in [7] proposed a variable screening
method using ridge estimators as both p and n go to
infinity. In contrast to the assumptions in the literature, I
consider the “large p, fixed n” setting in linear models for
variable selection. Testing hypotheses on the regression
coefficients is critical in determining the effects of
covariates on an outcome variable. Motivated by the
recent need in biology to identify significant sets of genes,
rather than individual genes, I aim at developing simultaneous
tests for coefficients in linear regression models.
Partial linear models have been extensively
studied. They have a wide range of applications, from
statistics to biomedical sciences. In these models, some
of the relations are believed to be of a certain parametric
form while others are not easily parameterized. Several
approaches have been developed to construct estimators.
A profile likelihood approach was used in [8,9]. In this
article, I apply a difference based estimation method in
the partial linear models. The method of taking differences
to eliminate the effect of the unknown nonparametric
component has been used in both nonparametric
and semiparametric settings. Rice in [10] first introduced
a differencing estimator of the residual variance. Horowitz
and Spokoiny in [11] used the differencing method
to test between a parametric model and a nonparametric
alternative. After taking differences to eliminate the
bias induced by the nonparametric term, I concentrate
on estimating the linear component and then
formulate the test statistic for testing the linear components.
The article begins with the conventional F-test. I
then discuss the efficiency of the ridge estimator and propose
a new test statistic for the “large p, fixed n” setting. The
asymptotic distribution of the proposed test statistic
under the null hypothesis is established. Extensions to partial
linear models are then made in Section 3.
2. Test Statistics
Consider a linear regression model
Y = Xβ + ε (1)
where X = (X1, X2, ..., Xn)' is the n × p matrix of independent and
identically distributed observations, the covariates Xi1, Xi2, ..., Xip
are uncorrelated, Y = (Y1, Y2, ..., Yn)' is the vector of independent
responses, β is the p × 1 vector of regression coefficients, and
$\varepsilon \sim N(0, \sigma_p^2 I_n)$. I am interested in testing
a high-dimensional hypothesis
$$H_0: \beta = \beta_0 \quad \text{vs} \quad H_1: \beta \neq \beta_0 \tag{2}$$
for a specific $\beta_0$ in $R^p$.
2.1. F-Test for “Large n, Small p”
I start by reviewing the F-test for hypothesis (2)
given by Rao in [12]. When the sample size is large, the
least squares method can be used to estimate the coefficients.
The least squares estimator is
$\tilde{\beta} = (X'X)^{-1}X'Y$. The
conventional test statistic for (2) is given by
$$F_{n,p} = \frac{(\tilde{\beta} - \beta_0)' X'X (\tilde{\beta} - \beta_0)/p}{(Y - X\tilde{\beta})'(Y - X\tilde{\beta})/(n - p)}. \tag{3}$$
As proven in [12], under H0, Fn,p ~ Fp,n-p. Hence, an α-
level F-test rejects H0 if Fn,p > Fp,n-p;α, the upper α-
quantile of the Fp,n-p distribution. The F statistic is a
monotone function of the likelihood ratio statistic and is
distributed as a noncentral F distribution under the alter-
native (see [13]).
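As a concrete illustration of Equation (3), the following Python sketch computes the F statistic with NumPy. The function name `f_statistic` and the simulated data are assumptions made for illustration only and do not come from [12]. The sketch assumes n > p and a design matrix of full column rank.

```python
import numpy as np

def f_statistic(X, Y, beta0):
    """Conventional F statistic of Equation (3); needs n > p and full column rank."""
    n, p = X.shape
    if n <= p:
        raise ValueError("the F-test needs n > p")
    # Least squares estimator (X'X)^{-1} X'Y, computed without forming the inverse.
    beta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
    diff = beta_ls - beta0
    resid = Y - X @ beta_ls
    numerator = diff @ (X.T @ X) @ diff / p
    denominator = resid @ resid / (n - p)
    return numerator / denominator

# Simulated data generated under H0 (all values are illustrative).
rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.standard_normal((n, p))
beta0 = np.array([1.0, -0.5, 2.0])
Y = X @ beta0 + rng.standard_normal(n)
F = f_statistic(X, Y, beta0)
```

The resulting value would then be compared with the upper α quantile of the F distribution with (p, n − p) degrees of freedom, for example via `scipy.stats.f.ppf(1 - alpha, p, n - p)` if SciPy is available. The guard on n ≤ p makes explicit why this statistic cannot be used in the setting of the next subsection.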
2.2. A New Test Statistic for “Large p, Fixed n”
The F-test defined in Equation (3) has a limitation:
it cannot be applied when p is large and n is small. As
more and more datasets have a dimension larger than the
sample size, a test statistic is needed that suits the
“large p, small n” paradigm. Because the least
squares estimator is inappropriate when p > n, I
modify the F-statistic in two respects. The first is to replace
the least squares estimator with an appropriate estimator
of β. The second is to find the asymptotic distribution of
the new test statistic.
To overcome the singularity of X'X when p > n in
model (1), consider using penalizing methods. The ridge
estimator $\hat{\beta}$ of β in [14] is
$$\hat{\beta} = (X'X + h_p I_p)^{-1} X'Y \tag{4}$$
where hp is the regularization parameter. Luo in [15]
proved that $\hat{\beta}$ in Equation (4) is a mean squared
error consistent estimator of β under certain conditions. More
recently, Luo in [16] proved the mean squared error
consistency under less restrictive conditions. The assumptions
and the results in [16] are given below.
Assumption A. $1/h_p = o(1)$. For sufficiently large p,
there is a p × 1 vector b such that β = X'Xb. Furthermore,
there exists a constant ε > 0 such that each component of
b is $O(1/p^{1+\varepsilon})$.
Assumption B. σp and hp are chosen such that
$p^{-\varepsilon} h_p = o(1)$ and $\sigma_p = o(h_p^{0.5})$.
It was proven in [16] that under Assumptions A and B,
$\mathrm{bias}(\hat{\beta}_j) = O(p^{-\varepsilon} h_p) = o(1)$ and
$\mathrm{var}(\hat{\beta}_j) = O(\sigma_p^2/h_p) = o(1)$.
In this article, I take the opportunity to explore more precise
asymptotic results about $\hat{\beta}$ under Assumptions A and B. Because
X'X can have at most n positive eigenvalues, without
loss of generality, let λip be the ith nonzero eigenvalue of
X'X and assume λip > 0 for all i = 1, 2, ..., n. Let Г =
(τij)p×p be an orthogonal matrix such that
$$X'X = \Gamma \begin{pmatrix} \Lambda_{n \times n} & O_{n \times (p-n)} \\ O_{(p-n) \times n} & O_{(p-n) \times (p-n)} \end{pmatrix} \Gamma'$$
where Λn×n is a diagonal matrix with elements λip, i = 1,
2, ..., n.
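The ridge estimator of Equation (4) and the eigenvalue structure just described can be checked numerically. The sketch below is an illustration under assumed names (`ridge_estimator`, the simulated `X` and `Y`, and the value of `h` are all hypothetical). It uses the standard identity $(X'X + hI_p)^{-1}X' = X'(XX' + hI_n)^{-1}$ for h > 0, which avoids a p × p inverse when p is much larger than n.

```python
import numpy as np

def ridge_estimator(X, Y, h):
    """Ridge estimator (X'X + h I_p)^{-1} X'Y, computed through an n x n solve."""
    n = X.shape[0]
    # Identity: (X'X + h I_p)^{-1} X' = X' (X X' + h I_n)^{-1} for h > 0.
    return X.T @ np.linalg.solve(X @ X.T + h * np.eye(n), Y)

# Illustrative "large p, fixed n" data.
rng = np.random.default_rng(1)
n, p, h = 5, 200, 50.0
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

beta_hat = ridge_estimator(X, Y, h)
# Direct p x p formula from Equation (4), used here for comparison only.
beta_direct = np.linalg.solve(X.T @ X + h * np.eye(p), X.T @ Y)

# X'X is p x p but has rank at most n, so at most n eigenvalues are positive.
eigvals = np.linalg.eigvalsh(X.T @ X)
n_positive = int(np.sum(eigvals > 1e-8 * eigvals.max()))
```

Both routes give the same estimate up to rounding, and the count of numerically positive eigenvalues does not exceed n, in line with the decomposition above.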
Theorem 1. Under Assumptions A and B, given that
the p covariates are uncorrelated, if hp is chosen such that
$p^{-\varepsilon/2} h_p/\sigma_p = o(1)$ and $\lambda_{ip} = o(h_p)$
for all i = 1, 2, ..., n, I have
$$\frac{h_p}{\sigma_p}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \mathrm{diag}(X'X)) \quad \text{as } p \to \infty,$$
where diag(X'X) means the diagonal matrix with the diagonal
elements of X'X.
Proof.
Because the random error ϵ is multivariate normal, $\hat{\beta}$
is multivariate normal, with
$$\begin{aligned}
\mathrm{cov}(\hat{\beta}) &= \sigma_p^2 (X'X + h_p I_p)^{-1} X'X (X'X + h_p I_p)^{-1} \\
&= \frac{\sigma_p^2}{h_p^2} \left(\frac{X'X}{h_p} + I_p\right)^{-1} X'X \left(\frac{X'X}{h_p} + I_p\right)^{-1} \\
&= \frac{\sigma_p^2}{h_p} A (A^{-1} - I_p) A \\
&= \frac{\sigma_p^2}{h_p} (A - A^2),
\end{aligned}$$
where $A = (X'X/h_p + I_p)^{-1}$. The matrix Г'AГ is a diagonal
matrix with $h_p/(h_p + \lambda_{ip})$, i = 1, 2, ..., n as its first n
diagonal elements, and the rest (p – n) diagonal elements all equal to 1. So
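$$\mathrm{var}(\hat{\beta}_j) = \frac{\sigma_p^2}{h_p^2} \sum_{i=1}^{n} \tau_{ji}^2 \lambda_{ip} \frac{h_p^2}{(h_p + \lambda_{ip})^2}$$
for all j = 1, 2, ..., p, where $\tau_{ji}$ denotes the (j, i) entry of Г.
Under the assumption λip = o(hp) for all i = 1, 2, ..., n,
$$\lim_{p \to \infty} \frac{h_p^2}{(h_p + \lambda_{ip})^2} = 1$$
for all i = 1, 2, ..., n.
Noticing that $\sum_{i=1}^{n} \tau_{ji}^2 \lambda_{ip}$ is the jth diagonal
element of X'X, we have
$$\lim_{p \to \infty} \frac{h_p^2}{\sigma_p^2} \mathrm{var}(\hat{\beta}_j) = \mathrm{diag}(X'X)_j$$
for all j = 1, 2, ..., p,
where diag(X'X)j means the jth diagonal element of X'X.
Given that the p covariates are uncorrelated, I conclude
$$\lim_{p \to \infty} \frac{h_p^2}{\sigma_p^2} \mathrm{cov}(\hat{\beta}) = \mathrm{diag}(X'X). \tag{5}$$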
As in [16], $\mathrm{bias}(\hat{\beta}_j) = O(p^{-\varepsilon} h_p) = o(1)$, and the
assumption that $p^{-\varepsilon/2} h_p/\sigma_p = o(1)$ guarantees
$\mathrm{bias}(\hat{\beta}_j)\, h_p/\sigma_p = o(1)$.
Along with the result in (5), this completes the proof of
Theorem 1.
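The covariance algebra used in the proof can be checked numerically. The following sketch is only an illustration: the dimensions, the fixed value of `sigma`, and the grid of `h` values are arbitrary assumptions, and the helper names are hypothetical. It verifies that the exact ridge covariance equals $(\sigma^2/h)(A - A^2)$, that the eigenvalue formula reproduces the componentwise variances, and that $(h^2/\sigma^2)\,\mathrm{cov}(\hat{\beta})$ moves towards X'X as h grows relative to the eigenvalues of X'X.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 4, 30, 1.5
X = rng.standard_normal((n, p))
XtX = X.T @ X
I_p = np.eye(p)

def ridge_cov(h):
    """Exact covariance sigma^2 (X'X + h I)^{-1} X'X (X'X + h I)^{-1}."""
    M = np.linalg.inv(XtX + h * I_p)
    return sigma**2 * M @ XtX @ M

h = 1e4
cov = ridge_cov(h)

# Same matrix written as (sigma^2 / h) (A - A^2) with A = (X'X / h + I)^{-1}.
A = np.linalg.inv(XtX / h + I_p)
cov_via_A = sigma**2 / h * (A - A @ A)

# Componentwise variances from the eigen-decomposition X'X = Gamma Lambda Gamma'.
lam, Gamma = np.linalg.eigh(XtX)
var_eig = sigma**2 * (Gamma**2) @ (lam / (h + lam) ** 2)

def rel_error(h):
    """Relative Frobenius distance between (h^2 / sigma^2) cov and X'X."""
    scaled = h**2 / sigma**2 * ridge_cov(h)
    return np.linalg.norm(scaled - XtX) / np.linalg.norm(XtX)

errors = [rel_error(h_val) for h_val in (1e2, 1e3, 1e4)]
```

The relative error shrinks as h increases, which is the numerical counterpart of the condition λip = o(hp). Replacing X'X by diag(X'X) in the limit additionally relies on the assumption of uncorrelated covariates and is not checked here.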
Now I can modify Fn,p into a test statistic for the “large p,
fixed n” paradigm. Define
$$G_{n,p} = \frac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{h_p^2 (\hat{\beta} - \beta_0)' [\mathrm{diag}(X'X)]^{-1} (\hat{\beta} - \beta_0)/p}. \tag{6}$$
Under Assumptions A and B, as p → ∞, $\hat{\beta}$ in Equation
(4) is a mean squared error consistent estimator of β, which implies
that $\hat{\beta}$ converges in probability to β. Applying the continuous
mapping theorem,
$$\frac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{\sigma_p^2} \text{ converges in probability to } \frac{(Y - X\beta)'(Y - X\beta)}{\sigma_p^2},$$
which has a $\chi_n^2$ distribution. Under H0, as p → ∞, by
Theorem 1,
$$\frac{h_p^2}{\sigma_p^2} (\hat{\beta} - \beta_0)' [\mathrm{diag}(X'X)]^{-1} (\hat{\beta} - \beta_0)$$
converges to a $\chi_p^2$ distribution. By the law of large numbers,
$$\frac{h_p^2}{\sigma_p^2\, p} (\hat{\beta} - \beta_0)' [\mathrm{diag}(X'X)]^{-1} (\hat{\beta} - \beta_0)$$
converges to 1. Applying Slutsky’s theorem, I conclude
that under H0, the test statistic Gn,p in Equation (6) converges
in distribution to $\chi_n^2$ as p → ∞ with n a fixed constant.
Hence, an α-level test based on Gn,p rejects H0 if
$G_{n,p} > \chi^2_{n;\alpha}$, the upper α quantile of the
$\chi_n^2$ distribution.
3. Extension to Partial Linear Models
Partial linear models are more flexible than standard
linear models. They can be a suitable choice when one
suspects that the response Y depends linearly on X but
is nonlinearly related to Z. Consider a fixed design
version of the partial linear model, which has the matrix
form
$$Y = X\beta + f(Z) + \epsilon \tag{7}$$
where Y = (Y1, Y2, ..., Yn+1)', X is an (n + 1) × p matrix
whose ith row is given by xi, the p covariates of xi are
uncorrelated, and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_{n+1})'$
is normally distributed with mean vector 0 and covariance matrix
$\sigma_p^2 I_{n+1}$. Estimators of the linear component for the n > p
situation have been discussed in [17-19]. Because those methods
are not applicable for p > n, I propose the following
procedure to obtain a statistic for hypothesis (2) in the partial
linear model (7). Assume that zi → c0 as p → ∞
for all i = 1, 2, ..., n + 1, where c0 is a finite constant,
and that the unknown function f is continuous at the point c0.
Consider
$$y_{i+1} - y_i = (x_{i+1} - x_i)\beta + f(z_{i+1}) - f(z_i) + \epsilon_{i+1} - \epsilon_i. \tag{8}$$
Since zi → c0 for all i = 1, 2, ..., n + 1, for any ψ > 0,
there exists a large enough p so that
$$\max_{1 \le i \le n+1} |z_i - c_0| < \psi.$$
Because the function f is continuous at the point c0, for a large
enough p the quantity
$$\max_{1 \le i \le n+1} |f(z_i) - f(c_0)|$$
is arbitrarily small, which implies that for a finite n,
$$\max_{1 \le i \le n} |f(z_{i+1}) - f(z_i)| = o(1). \tag{9}$$
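Define a matrix
$$D = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix}_{n \times (n+1)}. \tag{10}$$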
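The differencing matrix D can be built and inspected directly. The sketch below is illustrative: the function name `difference_matrix`, the example vector `y`, and the smooth function `np.sin` standing in for the unknown f are assumptions, not part of the original procedure.

```python
import numpy as np

def difference_matrix(n):
    """n x (n + 1) matrix whose ith row has -1 in column i and 1 in column i + 1."""
    D = np.zeros((n, n + 1))
    idx = np.arange(n)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

n = 6
D = difference_matrix(n)

# D maps a vector to its consecutive differences y[i+1] - y[i].
y = np.arange(n + 1, dtype=float) ** 2
dy = D @ y

# When the design points are close to a common value c0, a continuous f is
# almost removed by differencing (here c0 = 0.3 and f = sin are hypothetical).
z = 0.3 + 1e-4 * np.arange(n + 1)
f_vals = np.sin(z)
max_diff = np.max(np.abs(D @ f_vals))

# Differenced errors are correlated: D D' is tridiagonal, not the identity.
DDt = D @ D.T
```

The off-diagonal entries of `DDt` show that consecutive differences share an error term, so the differenced errors are not independent.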
We now consider the matrix form of Equation (8),
which is
$$DY = DX\beta + Df(Z) + D\epsilon. \tag{11}$$
Because of Equation (9), I can ignore the
nonparametric part in model (11). Thus, (11) becomes
$$DY = DX\beta + D\epsilon \tag{12}$$
where matrix D is given in (10). Luo in [20] examined
the asymptotic distribution of the ridge estimator of β in (12).
The random errors Dϵ are not independent, and
thus the following procedure is crucial for extending
the previous results. Without loss of generality, assume
the sample size n is even. Define D1 and D2 as in
Equations (13) and (14).
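Applying D1 and D2 in place of D, Equation (12) becomes
$$D_1 Y = D_1 X\beta + D_1 \epsilon \tag{15}$$
and
$$D_2 Y = D_2 X\beta + D_2 \epsilon. \tag{16}$$
Notice that $D_1\epsilon \sim N(0, \sigma_p^2 I_{n/2})$ and
$D_2\epsilon \sim N(0, \sigma_p^2 I_{n/2})$.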
Now I can apply the results of Section 2.2 to model (15)
and model (16). It follows that the two statistics for
testing hypothesis (2) in models (15) and (16) are given
by
$$G^1_{n,p} = \frac{(D_1 Y - D_1 X\hat{\beta}_1)'(D_1 Y - D_1 X\hat{\beta}_1)}{h_p^2 (\hat{\beta}_1 - \beta_0)' [\mathrm{diag}(X'D_1'D_1X)]^{-1} (\hat{\beta}_1 - \beta_0)/p}$$
and
$$G^2_{n,p} = \frac{(D_2 Y - D_2 X\hat{\beta}_2)'(D_2 Y - D_2 X\hat{\beta}_2)}{h_p^2 (\hat{\beta}_2 - \beta_0)' [\mathrm{diag}(X'D_2'D_2X)]^{-1} (\hat{\beta}_2 - \beta_0)/p}$$
where $\hat{\beta}_1 = (X'D_1'D_1X + h_p I_p)^{-1} X'D_1'D_1Y$ and
$\hat{\beta}_2 = (X'D_2'D_2X + h_p I_p)^{-1} X'D_2'D_2Y$. When all assumptions
for Theorem 1 hold, under H0, both $G^1_{n,p}$ and $G^2_{n,p}$
converge in distribution to $\chi_n^2$ as p → ∞. Hence, the
decision rule is to reject H0 if
$\min(G^1_{n,p}, G^2_{n,p}) > \chi^2_{n;\alpha/2}$
and otherwise to fail to reject H0.
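The procedure of this section can be sketched as follows. This is an illustration under stated assumptions: D1 and D2 are taken to be the alternating rows of D scaled by 1/√2, which is one reading of Equations (13) and (14) that gives orthonormal rows, and all names, the simulated data, the value of `h`, and the function `np.sin` standing in for f are hypothetical. The critical value is deliberately left out of the code.

```python
import numpy as np

def split_difference_matrices(n):
    """Return D1 and D2, each (n/2) x (n + 1), with orthonormal rows."""
    if n % 2 != 0:
        raise ValueError("n is assumed to be even")
    D = np.zeros((n, n + 1))
    idx = np.arange(n)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    # Alternating rows of D share no observation, so their errors stay independent.
    return D[0::2] / np.sqrt(2.0), D[1::2] / np.sqrt(2.0)

def g_statistic(X, Y, beta0, h):
    """Ridge-based statistic of Section 2.2 for a generic design X and response Y."""
    n, p = X.shape
    beta_hat = X.T @ np.linalg.solve(X @ X.T + h * np.eye(n), Y)
    resid = Y - X @ beta_hat
    diff = beta_hat - beta0
    return (resid @ resid) / (h**2 * np.sum(diff**2 / np.sum(X**2, axis=0)) / p)

# Illustrative partial linear data with n + 1 observations, generated under H0.
rng = np.random.default_rng(4)
n, p, h = 6, 300, 40.0
X = rng.standard_normal((n + 1, p))
beta0 = rng.standard_normal(p) / p
z = 0.3 + 1e-4 * np.arange(n + 1)            # design points close to c0 = 0.3
Y = X @ beta0 + np.sin(z) + rng.standard_normal(n + 1)

D1, D2 = split_difference_matrices(n)
G1 = g_statistic(D1 @ X, D1 @ Y, beta0, h)   # statistic for model (15)
G2 = g_statistic(D2 @ X, D2 @ Y, beta0, h)   # statistic for model (16)
decision_stat = min(G1, G2)
```

The value `decision_stat` is the quantity compared with the chi-square critical value in the decision rule above. Passing `D1 @ X` and `D1 @ Y` to the same function used in the linear case mirrors the way the results of Section 2.2 are reused for models (15) and (16).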
5. References
[1] M. Kosorok and S. Ma, “Marginal Asymptotics for the
‘Large p, Small n’ Paradigm: With Applications to
Microarray Data,” Annals of Statistics, Vol. 35, No. 4, 2007,
pp. 1456-1486. doi:10.1214/009053606000001433
[2] J. Fan, P. Hall and Q. Yao, “To How Many Simultaneous
Hypothesis Tests Can Normal Student’s t or Bootstrap
Calibrations Be Applied,” Journal of the American Sta-
tistical Association, Vol. 102, No. 480, 2007, pp. 1282-
1288. doi:10.1198/016214507000000969
[3] S. Chen and Y. Qin, “A Two Sample Test for High Di-
mensional Data with Applications to Gene-Set Testing,”
Annals of Statistics, Vol. 38, No. 2, 2010, pp. 808-835.
doi:10.1214/09-AOS716
[4] P. Zhong and S. Chen, “Tests for High-Dimensional Re-
gression Coefficients with Factorial Designs,” Journal of
American Statistical Association, Vol. 106, No. 493, 2011,
pp. 260-274. doi:10.1198/jasa.2011.tm10284
[5] R. Tibshirani, “Regression Shrinkage and Selection via
the Lasso,” Journal of the Royal Statistical Society. Se-
ries B, Vol. 58, No. 1, 1996, pp. 267-288.
[6] J. Fan and J. Lv, “Sure Independence Screening for Ul-
tra-high Dimensional Feature Space (with Discussion),”
Journal of Royal Statistical Society, Vol. 70, No. 5, 2008,
pp. 849-911. doi:10.1111/j.1467-9868.2008.00674.x
[7] J. Shao and S. Chow, “Variable Screening in Predicting
Clinical Outcome with High-Dimensional Microarrays,”
Journal of Multivariate Analysis, Vol. 98, No. 8, 2007, pp.
1529-1538. doi:10.1016/j.jmva.2004.12.004
[8] R. Carroll, J. Fan, I. Gijbels and M. Wand, “Generalized
Partially Linear Single-Index Models,” Journal of Ameri-
can Statistical Association, Vol. 92, No. 438, 1997, pp.
477-489. doi:10.2307/2965697
[9] T. Severini and W. Wong, “Profile Likelihood and Con-
ditionally Parametric Models,” Annals of Statistics, Vol.
20, No. 4, 1992, pp. 1768-1802.
doi:10.1214/aos/1176348889
[10] J. Rice, “Bandwidth Choice for Nonparametric Regres-
sion,” Annals of Statistics, Vol. 12, No. 4, 1984, pp. 1215-
1230. doi:10.1214/aos/1176346788
[11] J. Horowitz and V. Spokoiny, “An Adaptive Rate-optimal
Test of a Parametric Mean-regression Model against a
Nonparametric Alternative,” Econometrica, Vol. 69, No.
3, 2001, pp. 599-631. doi:10.1111/1468-0262.00207
[12] C. Rao, H. Toutenburg, and C. Heumann, “Linear Models
and Generalizations,” Springer, New York, 2008.
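$$D_1 = \begin{pmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \cdots & 0 & 0 & 0 \\ \vdots & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{pmatrix}_{\frac{n}{2} \times (n+1)} \tag{13}$$
$$D_2 = \begin{pmatrix} 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & \cdots & 0 & 0 & 0 \\ \vdots & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}_{\frac{n}{2} \times (n+1)}. \tag{14}$$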
[13] T. Anderson, “An Introduction to Multivariate Statistical
Analysis,” Wiley, Hoboken, 2003.
[14] A. Hoerl and R. Kennard, “Ridge Regression Biased Es-
timation for Nonorthogonal Problems,” Technometrics,
Vol. 12, No. 1, 1970, pp. 55-67. doi:10.2307/1267351
[15] J. Luo, “The Discovery of Mean Square Error Consis-
tency of Ridge Estimator,” Statistics and Probability Let-
ters, Vol. 80, No. 5, 2010, pp. 343-347.
doi:10.1016/j.spl.2009.11.008
[16] J. Luo, “Asymptotical Properties of Coefficient of De-
termination for Ridge Regression with Growing Dimen-
sions,” Oriental Journal of Statistical Methods, Theory
and Applications, Vol. 1, No. 1, 2011, pp. 41-49.
[17] L. Wang, L. Brown and T. Cai, “A Difference Based
Approach to Semiparametric Partial Linear Model,” Elec-
tronic Journal of Statistics, Vol. 5, 2011, pp. 619-641.
[18] A. Yatchew, “An Elementary Estimator of the Partial
Linear Model,” Economics Letters, Vol. 57, No. 2, 1997,
pp. 135-143. doi:10.1016/S0165-1765(97)00218-8
[19] A. Yatchew, “Scale Economies in Electricity Distribution:
A Semiparametric Analysis,” Journal of Applied Eco-
nomics, Vol. 15, No. 2, 2000, pp. 187-210.
[20] J. Luo, “Asymptotic Efficiency of Ridge Estimator in
Linear and Semiparametric Linear Models,” Statistics
and Probability Letters, Vol. 82, No. 1, 2011, pp. 58-62.
doi:10.1016/j.spl.2011.08.018