Psychology
2014. Vol.5, No.1, 6-11
Published Online January 2014 in SciRes (http://www.scirp.org/journal/psych) http://dx.doi.org/10.4236/psych.2014.51002
Efficiency of Selecting Important Variable for Longitudinal Data
Jongmin Ra, Ki-Jong Rhee
Department of Education, Kookmin University, Seoul, South Korea
Email: rems2002@gmail.com
Received October 12th, 2013; revised November 13th, 2013; accepted December 9th, 2013
Copyright © 2014 Jongmin Ra, Ki-Jong Rhee. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In accordance with the Creative Commons Attribution License, all copyrights © 2014 are reserved for SCIRP and the owner of the intellectual property, Jongmin Ra and Ki-Jong Rhee. All copyrights © 2014 are guarded by law and by SCIRP as a guardian.
Variable selection with a large number of predictors is a very challenging and important problem in educational and social domains. However, relatively little attention has been paid to issues of variable selection in longitudinal data with application to education. Using longitudinal educational data (Test of English for International Communication, TOEIC), this study compares multiple regression, backward elimination, the group least absolute shrinkage and selection operator (LASSO), and linear mixed models in terms of their performance in variable selection. The results from the study show that the four different statistical methods retain different sets of predictors in their models. The linear mixed model (LMM) provides the smallest number of predictors (4 predictors among a total of 19 predictors). In addition, the LMM is the only appropriate method for repeated measurements and is the best method with respect to the principle of parsimony. This study also provides an interpretation of the model selected by the LMM in the conclusion using marginal $R^2$.
Key words: Group LASSO; Linear Mixed Model; Longitudinal Data; Marginal $R^2$; Variable Selection
Introduction
The characteristic of a longitudinal study is that individuals are measured repeatedly at different time points; such studies require special statistical methods because the set of observations on the same individual tends to be inter-correlated and can be explained by both fixed and random effects.
As longitudinal data are common in educational settings, the linear mixed model (LMM) has emerged as an effective approach since it can model within- and between-subject heterogeneity (Vonesh, Chinchilli, & Pu, 1996). The LMM also attempts to account for within-subject dependency in the multiple measurements by including one or more subject-specific variables in a regression model (Laird & Ware, 1982; Gilks, Wang, Yvonnet, & Coursaget, 1993).
Despite the development of statistical models, model selection criteria for the LMM have received little attention (Orelien & Edwards, 2008; Vonesh et al., 1996). However, several recent studies (Vonesh & Chinchilli, 1997; Vonesh et al., 1996; Zheng, 2000) suggest model fit indices that are useful for mixed effect models. More specifically, studies (Vonesh & Chinchilli, 1997; Vonesh et al., 1996) show that the marginal $R^2$ is preferred when only fixed-effect components are involved in the predicted values, but the conditional $R^2$ is preferred for random effects (Vonesh et al., 1996).
It is not uncommon to collect a large number of predictors to model an individual's reading achievement more accurately in educational and psychological fields. Thus, it is fundamental to select meaningful variables in multivariate statistical models (Zhang, Wahba, Lin, Voelker, Ferris, Klein, & Klein, 2004) to increase prediction accuracy and to provide a better understanding of concepts.
It is, however, challenging to select important variables when a response variable is measured repeatedly over a certain period of time because the selection of statistically significant variables is hindered by the correlation among the repeated measurements. Furthermore, classical variable selection methods, such as forward selection and backward elimination, are time-consuming, unstable, and sometimes unreliable for making inferences. Although there is a great deal of extant research examining issues of variable selection in linear regression, little research has investigated how differently and similarly various statistical methods perform on longitudinal data. This study aims to investigate how similarly and differently various statistical methods perform in the presence of repeated measurements in the data.
Hence, this study compares four different statistical methods, multiple regression, backward elimination, the group least absolute shrinkage and selection operator (LASSO), and the LMM, using Test of English for International Communication (TOEIC) data as individuals' reading achievement. For the LMM, the marginal $R^2$ for the variables remaining in the model is used to provide a better understanding of the impact of the selected predictors in the longitudinal data.
Multiple Linear Regression
Multiple linear regression is a flexible method of data analysis that may be appropriate whenever a response variable is to be examined in relation to any other predictors (Cohen, Cohen,
West, & Aiken, 2003). For instance, if a multiple regression method is used for predicting and explaining an individual's English achievement, many variables such as gender, age, and socio-economic status (SES) might all contribute toward the individual's English achievement.
The multiple regression method for predicting English achievement, Y, with the observed data $(X_{1i}, \ldots, X_{pi})$, $i = 1, \ldots, n$, is as follows: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$. This equation
shows the relationship between p predictors and a response variable Y, all of which are measured simultaneously on the subject. This method is called linear because the effects of the various predictors are treated as additive.
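As a minimal sketch, such an additive model can be fit in R with lm(); the data frame and variable names here are hypothetical, not taken from the paper:

# Sketch: additive multiple regression for English achievement
# (hypothetical data frame `students` with hypothetical columns).
fit <- lm(toeic ~ gender + age + ses, data = students)
summary(fit)            # coefficient estimates with partial t-tests
summary(fit)$r.squared  # proportion of variance explained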
In addition, much effort has been put into estimating the performance of different methods and choosing the best one by using fit indices such as AIC (Akaike, 1973), BIC (Schwarz, 1978), Mallows's $C_p$ (Mallows, 1973), and adjusted $R^2$. AIC and BIC are based on penalized maximum likelihood estimates. AIC is defined as $-2\log(L) + 2p$, where $\log(L)$ is the log-likelihood function of the parameters in the model evaluated at the maximum likelihood estimator, while the second term is a penalty for additional parameters in the model. Therefore, as the number of independent variables included in the model increases, the first term decreases while the penalty term increases. Conversely, as variables are dropped from the model, the lack-of-fit term increases while the penalty term decreases.
BIC is defined as $-2\log(L) + p \times \log(n)$. The penalty term for BIC is similar to that of AIC but uses a multiplier of $\log(n)$ instead of the constant 2, thereby incorporating the sample size. In general, AIC tends to choose overly complex models when the sample size is large, and BIC tends to choose overly simple models when the sample size is small but chooses the correct model as the sample size approaches infinity.
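As a brief illustration, R computes both criteria directly from fitted models; the sketch below continues the hypothetical example above:

# Sketch: comparing two nested candidate models by AIC and BIC.
fit2 <- lm(toeic ~ gender + age, data = students)  # drops one predictor
AIC(fit, fit2)  # -2 log(L) + 2 * df for each model
BIC(fit, fit2)  # -2 log(L) + df * log(n); log(n) exceeds 2 once n >= 8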
Mallows's $C_p$ is also commonly used to investigate how well a model fits the data and can be defined as $C_p = SSE_p/\hat{\sigma}^2 + 2p - n$.
In this equation, $\hat{\sigma}^2$ represents the estimate of $\sigma^2$, and $SSE_p$ is defined as $SSE_p = \sum_{i=1}^{n}\left(Y_i - \sum_{j=1}^{p} X_{ji}\hat{\beta}_j\right)^2$, where $\hat{\beta}$ is the estimator of $\beta$. Mallows's $C_p$ is calculated for all possible subset models. The model with the smallest value of $C_p$ is deemed to be the best linear model. As the number of independent variables (p) increases, the increased penalty term 2p is offset by a decreased SSE.
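A hedged sketch of this computation, assuming a full model that supplies the estimate of $\sigma^2$ and a hypothetical data frame dat with response y:

# Sketch: Mallows's Cp for one candidate subset model.
full   <- lm(y ~ ., data = dat)                # full model
sub    <- lm(y ~ x1 + x2, data = dat)          # candidate subset (hypothetical terms)
sigma2 <- summary(full)$sigma^2                # estimate of sigma^2
n      <- nrow(dat)
p      <- length(coef(sub))                    # parameters in the subset model
Cp     <- sum(residuals(sub)^2) / sigma2 + 2 * p - n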
Another commonly used fit index for model selection is $R^2$ or adjusted $R^2$. Both $R^2$ and adjusted $R^2$ represent the percentage of the variability of the response variable that is explained by the variation of the predictors. $R^2$ is a function of the total sum of squares (SST) and SSE, and the formula is given by $R^2 = 1 - SSE/SST$.
Adjusted $R^2$ takes into account the degrees of freedom used up by adding more predictors. Even though adjusted $R^2$ attempts to yield a more robust estimate of $R^2$, there is little difference between adjusted $R^2$ and $R^2$ when a large number of observations are included in a model. When the number of observations is very large compared to the number of predictors in a model, the values of $R^2$ and adjusted $R^2$ will be much closer because the ratio $(n-1)/(n-1-p)$ approaches 1. Despite the practical advantages of the multiple regression method, it is difficult to build multiple regression models for repeatedly measured responses. The multiple regression method is not appropriate for correlated response variables, as in longitudinal data, without accounting for the correlation within response variables.
Backward Elimination Approach
Besides the multiple regression approach, backward elimination is a common and important practice for selecting relevant variables from among a large number of predictors. A subset selection method is one of the most widely used variable selection approaches, in which one predictor at a time is added or deleted based on the F statistic iteratively (Bernstein, 1989). Subset selection methods, in general, provide an effective means of screening a large number of variables (Hosmer & Lemeshow, 2000). Since there is a possibility of a suppressor effect emerging in the forward inclusion method (Agresti & Finlay, 1986), the backward elimination method is usually the preferred method for exploratory analysis (Agresti, 2002; Hosmer & Lemeshow, 2000; Menard, 1995) and follows three steps.
First, obtain a regression equation that includes all p predictors. Second, conduct a partial F-test for each of the predictors, which indicates the significance of the corresponding predictor as if it were the last variable entered into the equation. Finally, select the lowest partial F value and compare it with a threshold $F_\alpha$, the value set equal to some predetermined level of significance $\alpha$. If the smallest partial F is less than $F_\alpha$, delete that variable and repeat the process for the remaining $p - 1$ predictors. This sequence continues until the smallest partial F at any given step is greater than $F_\alpha$. The variables that remain in the model are considered significant predictors. In general, the backward elimination method is computationally attractive and can be conducted with an estimation accuracy criterion or through hypothesis testing.
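A minimal sketch of this F-based procedure in R, using drop1() for the partial F-tests; the data frame dat and the threshold α = .05 are assumptions, not taken from the paper:

# Sketch: backward elimination driven by partial F-tests.
fit <- lm(y ~ ., data = dat)
repeat {
  tests <- drop1(fit, test = "F")[-1, ]       # partial F-test per remaining term
  if (nrow(tests) == 0) break                 # nothing left to drop
  worst <- which.max(tests[["Pr(>F)"]])       # least significant term
  if (tests[["Pr(>F)"]][worst] < 0.05) break  # all remaining terms significant
  fit <- update(fit, as.formula(paste(". ~ . -", rownames(tests)[worst])))
}
summary(fit)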
The backward elimination method, however, is far from perfect. It often leads to locally optimal solutions rather than a globally optimal solution. Also, the backward elimination method yields confidence intervals for effects and predicted values that are far too narrow (Altman & Andersen, 1989). The degree of correlation among the predictors affects the frequency with which authentic and noise predictors find their way into the final model (Derksen & Keselman, 1992). More specifically, the number of candidate predictors affects the number of noise predictors that gain entry to the model. Furthermore, it is well known that the backward elimination method will not necessarily produce the best model if there are redundant variables (Derksen & Keselman, 1992). It also yields $R^2$ values that are badly biased upward and has severe problems in the presence of collinearity. Since the backward elimination method gives biased regression coefficient estimates, with coefficients for the remaining variables that are too large, the estimates need to be shrunk. Besides these well-known inherent technical problems, the method is time-consuming when a large number of predictors are included in the model, and it is cumbersome to choose appropriate variables manually when categorical variables are included in the model as dummy variables.
The Group LASSO
To overcome the problems of the multiple regression and backward elimination approaches, a number of shrinkage methods have been developed to address the inherent problems of traditional variable selection methods (Bondell & Reich, 2008; Foster & George, 1994; George & McCulloch, 1993; Tibshirani, 1996). Among the many suggested shrinkage methods, the least absolute shrinkage and selection operator (LASSO) suggested by Tibshirani (1996) is one of the best-known penalized regression approaches (Bondell & Reich, 2008; Meier, van de Geer, & Bühlmann, 2008; Tibshirani, 1996). The LASSO method minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant (Tibshirani, 1996). It is also well known that all the variables in LASSO-type methods, such as the standardized LASSO and group LASSO (Yuan & Lin, 2006), need to be standardized before performing the analysis.
The LASSO method is defined as $\hat{\beta}_{LASSO} = \arg\min \sum_{i=1}^{n}\left(Y_i - \beta_0 - \sum_{j=1}^{p} X_{ij}\beta_j\right)^2 + \lambda\sum_{j=1}^{p}|\beta_j|$. In this equation, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ and $\lambda$ is a penalty, or tuning, parameter. The parameter controls the amount of shrinkage
that is applied to the estimates. The solution paths of LASSO
are piecewise linear and thus can be computed very efficiently. The variables selected by the LASSO method are included in the model with shrunken coefficients. The salient feature of the LASSO method is that it sets some coefficients to 0 and shrinks others. Furthermore, the LASSO method has two advantages over traditional estimation methods. One is that it estimates parameters and selects variables simultaneously (Tibshirani, 1996; Fan & Li, 2001). The other is that the solution path of the LASSO method moves in a predictable manner since it has good computational properties (Efron, Hastie, Johnstone, & Tibshirani, 2004). Thus, the LASSO method can be used for high-dimensional data as long as the number of predictors, p, is smaller than or equal to n ($p \le n$).
The LASSO method, however, has some drawbacks (Yuan & Lin, 2006). If the number of predictors (p) is larger than the number of observations (n), the LASSO method selects at most n variables due to the nature of the convex optimization problem. Also, the LASSO method tends to make selections based on the strength of individual derived input variables rather than the strength of groups of input variables, often resulting in selecting more variables than necessary. Another drawback of the LASSO method is that the solution depends on how the variables are orthonormalized: if any variable $X_i$ is reparameterized through a different set of orthonormal contrasts, a different set of variables may enter the solution. This is undesirable, since solutions to a variable selection and estimation problem should not depend on how the variables are represented. In addition, the LASSO solution raises another problem when categorical variables enter the model: the LASSO method treats categorical variables as individual variables rather than as a group (Meier et al., 2008). A major stumbling block of the LASSO method is that if there are groups of highly correlated variables, it tends to arbitrarily select only one variable from each group. This makes models difficult to interpret because predictors that are strongly associated with the outcome may not be included in the predictive model.
To remedy the shortcomings of the LASSO method, Yuan and Lin (2006) suggested the group LASSO, in which an entire group of predictors may drop out of the model depending on $\lambda$. The group LASSO is defined as $\hat{\beta}_{LASSO}(\lambda) = \arg\min\left(\left\|Y - \sum_{l=1}^{L} X_l\beta_l\right\|^2 + \lambda\sum_{l=1}^{L}\sqrt{p_l}\,\|\beta_l\|\right)$. In this equation, $X_l$ represents the predictors corresponding to the lth group, with corresponding coefficient sub-vector $\beta_l$, and the factor $\sqrt{p_l}$ takes into account the different group sizes. If $x = (x_1, \ldots, x_k)^T$, then $\|x\| = \left(\sum_{i=1}^{k} x_i^2\right)^{1/2}$. The group LASSO acts like the LASSO at the group level: depending on $\lambda$, an entire
group of predictors may drop out of the model. The group LASSO takes two steps. First, a solution path indexed by a tuning parameter is built. Then, the final model is selected on the solution path by cross-validation or using a criterion such as Mallows's $C_p$.
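A sketch of these two steps with the grplasso package used in the Appendix; the simulated data, group structure, and penalty grid are illustrative assumptions:

# Sketch: group LASSO solution path, then model choice along the path.
library(grplasso)
set.seed(1)
x     <- cbind(1, matrix(rnorm(100 * 6), 100, 6))  # intercept + 6 predictors
index <- c(NA, 1, 1, 2, 2, 3, 3)          # NA = unpenalized intercept; 3 groups
y     <- as.numeric(x %*% c(1, 2, -2, 0, 0, 1, -1) + rnorm(100))
lmax  <- lambdamax(x, y = y, index = index, model = LinReg())
fit   <- grplasso(x, y = y, index = index, model = LinReg(),
                  lambda = lmax * 0.5^(0:5))       # decreasing penalty path
fit$coefficients   # whole groups enter or leave together along the path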
This gives the group LASSO tremendous computational advantages compared with other methods. The group LASSO sets statistically insignificant variables to zero by incorporating shrinkage, as the standard LASSO does. Overall, the group LASSO method enjoys great computational advantages and excellent performance, and the number of nonzero coefficients in the LASSO and group LASSO methods is an unbiased estimate of the degrees of freedom (Efron et al., 2004).
Even though the group LASSO is suggested to overcome drawbacks of the standard LASSO, the group LASSO method still has some limitations. For example, the solution path of the group LASSO is not piecewise linear, which precludes the application of efficient optimization methods (Efron et al., 2004). It is also known that the method tends to select a larger number of groups than necessary, and thus includes some noisy variables in the model (Meier et al., 2008). Furthermore, the group LASSO method is not directly applicable to longitudinal data and needs further study to be suitable for repeated measurements. R code for the group LASSO is provided in the Appendix.
Linear Mixed Model
The linear mixed model (LMM) is another very useful approach for longitudinal studies to describe the relationship between a response variable and predictors. The LMM goes by different names in different fields. In economics, the term "random coefficient regression models" is common. In sociology, "multilevel modeling" is common, alluding to the fact that regression intercepts and slopes at the individual level may be treated as random effects of a higher level. In statistics, the term "variance components models" is often used in addition to mixed effect models, alluding to the fact that one may decompose the variance into components attributable to within-group versus between-group effects. All these terms are closely related, albeit emphasizing different aspects of the LMM. In the context of repeated measures, let $Y_i$ be an $n_i \times 1$ vector of observations from the ith subject. Then the LMM (Laird & Ware, 1982) is $Y_i = X_i\beta + Z_i b_i + \varepsilon_i$.
In this model, $X_i = (X_{1i}, \ldots, X_{pi})^T$ is an $n_i \times p$ fixed-effect design matrix, whereas $Z_i$ is a known $n_i \times q$ constant design matrix. $\beta = (\beta_1, \ldots, \beta_p)^T$ is a p-dimensional vector of unknown fixed-effect coefficients. Here, $b_i$ is assumed to be multivariate normally distributed with mean vector 0 and variance matrix $\Psi$; thus, the random effects vary by group. In addition, the variance-covariance matrix of the stacked random effects, $\mathrm{diag}(\Psi, \ldots, \Psi)$, should be symmetric and positive semidefinite (Laird & Ware, 1982). The $\varepsilon_i$ are vectors of error terms assumed to follow a normal distribution with mean vector 0 and variance-covariance matrix $\Omega$, which is the same for all subjects. It is also commonly assumed that $\Omega$ is diagonal with all diagonal values equal to $\sigma^2$. However, instead of assuming equal variance in grouped data, it is possible to allow unequal variances and correlated within-group errors. The vectors $b_i$ and $\varepsilon_i$ are assumed to be independent.
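As one possible implementation (an assumption; the paper does not name its R package), the Laird-Ware model above can be fit with lme4, with a random intercept per student capturing the within-subject correlation:

# Sketch: random-intercept LMM for repeated TOEIC scores
# (hypothetical long-format data frame: one row per student per wave).
library(lme4)
fit <- lmer(toeic ~ wave + major + lrc + est + me + mel + (1 | id),
            data = toeic_long)
summary(fit)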
Method
Participants and Variables
This study takes place in a public university in the Republic of Korea between 2009 and 2010, over two semesters. Participating students (n = 281) enrolled in TOEIC classes for four hours a week. Besides students' TOEIC scores, the TOEIC dataset records 20 predictors. Among the 20 predictors, 13 are continuous: age, father's education level (FEL), mother's education level (MEL), SES, English study time (EST), reading time, level of reading competence (LRC), materials written in English (ME), level of computer skill (LC), length of private tutoring (LPT), and three mean-centered cognitive assessment scores (STAS: state and trait anxiety scale; FLCAS: foreign language classroom anxiety scale; FRAS: foreign language reading anxiety scale); and 7 are categorical: major, gender, experience of private tutoring (EPT), experience of having foreign instructors (EFI), living area, length of staying abroad (LSA), and experience of staying in English-speaking countries (ESE). The wave 2, 3, and 4 data are collected every three months after collecting the wave 1 data.
Procedures
All the analyses are performed with R (R Development Core Team, 2013) due to the unavailability of the group LASSO approach in standard statistical packages such as SPSS. Once statistically significant predictors in the model are obtained, goodness of fit for the LMM can be considered. Among different types of $R^2$, such as the unweighted concordance correlation coefficient (CCC; Vonesh et al., 1996) and the proportional reduction in penalized quasi-likelihood (Zheng, 2000), the marginal $R^2$ (Vonesh & Chinchilli, 1997) is easy to compute and interpret in that it is a straightforward extension of the traditional $R^2$ (Orelien & Edwards, 2008). The marginal $R^2$ used in this analysis for selecting relevant variables is defined as

$R_m^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^T(Y_i - \hat{Y}_i)}{\sum_{i=1}^{n}(Y_i - \bar{Y}1_{p_i})^T(Y_i - \bar{Y}1_{p_i})}.$
In the equation shown above, n is the number of observations, $Y_i$ is an observed response vector, and $\hat{Y}_i$ is a predicted response vector. $\bar{Y}$ is the grand mean and $1_{p_i}$ is a vector of 1's. This equation implies $\hat{Y} = X\hat{\beta}$ and considers only fixed effects. In addition, because the marginal $R^2$ models the average subject ($\hat{Y} = X\hat{\beta}$), it leads to the term average model (Vonesh & Chinchilli, 1997), where $R_m^2$ is the proportionate reduction in residual variation explained by the modeled response of the average subject. Thus, when important predictors are not included in the model, the values of marginal $R^2$ decrease sharply. If the random effects are excluded from the computation of the predicted values that lead to the residuals, the marginal $R^2$ is able to select the most parsimonious model.
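Continuing the hypothetical lme4 sketch above, the marginal $R^2$ just defined can be computed by predicting from the fixed effects only:

# Sketch: marginal R^2 from fixed-effects-only predictions.
X    <- model.matrix(fit)                    # fixed-effect design matrix
yhat <- as.numeric(X %*% lme4::fixef(fit))   # Y-hat = X beta-hat (no random effects)
y    <- toeic_long$toeic
r2.marginal <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)  # mean(y) = grand mean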
Results
For descriptive analysis, frequencies and percentages of all variables are calculated. Regarding categorical variables, there are 9 different majors with similar numbers of participating students, except two majors (Child Education and Occupational Therapy), each of which comprises less than 10% of the total sample. Also, the smallest number of students (n = 14) is in the Child Education major compared to other majors. A relatively large number of students (n = 35) are from the Chung-Nam area. Regarding experience of classes with foreign instructors, about 44.9% of students have no such experience. About 15.7% of them have experience studying abroad and 29.2% are male. Furthermore, almost 60% of students never had private tutoring.
For continuous variables, means and standard deviations are calculated. In terms of outcomes across the 4 wave points, TOEIC reading scores increase over time: 234.69, 274.94, 264.75, and 284.03, respectively, although scores drop slightly between wave 2 and wave 3. The average age of students is 20.11 years. The average education level of fathers (3.42) is slightly higher than that of mothers (3.13). Furthermore, significantly different TOEIC scores across the four waves are found among different majors. Figure 1 shows that students in the medical major have high initial TOEIC scores.
The existence of a relationship between reading achievement and predictors across waves 1 through 4 is analyzed using four separate multiple regression runs. Results show that four majors are statistically significant (medical, nursing, e-business, tourism) across the 4 wave points. Besides students' major, the four separate multiple regression models contain only one variable (LRC) in common across the four wave points.
Results are also obtained from four separate backward elimination procedures for each wave, including nineteen predictors in the full model. Only five majors (medical, nursing, e-business, tourism, and childcare) are statistically significant across the four wave points. Besides individuals' major, seven predictors are significant across the four separate analyses: two variables (MEL and LRC) at the first wave, two variables (ME and LRC) at the second wave, one variable (LRC) at the third, and six variables (gender, FEL, EST, ME, LRC, and STAI) at the fourth wave point. The interesting point is that the four separate backward elimination procedures contain different sets of predictors in the model. This might imply that the backward elimination method is not suitable for dealing with repeated measurements.
Figure 1. Individual TOEIC scores across 4 waves in medical majors.
Compared to the backward elimination method, the group LASSO retains more predictors in the model. In addition, the four separate group LASSO procedures contain different sets of predictors. Besides students' major, a total of seventeen predictors are included across the four separate models: fourteen variables (gender, age, area, FEL, MEL, EFI, EST, ME, LRC, LSA, LPT, LC, STAI, and FLRAS) are selected in the first wave, ten variables (place, MEL, ME, LRC, LSA, LPT, and STAI) in the second wave, nine variables (gender, income, MEL, ME, LRC, LSA, EPT, LC, and STAI) in the third wave, and nine variables (age, area, FEL, EST, ME, LRC, LC, STAI, and FLRAS) in the fourth wave.
Compared to the multiple regression and backward elimination methods, the group LASSO includes more categorical variables, such as area, place, and length of staying abroad, in the finalized model. However, the results show that the four separate group LASSO runs also contain different sets of predictors. This might suggest that the group LASSO is inappropriate for repeated measurements.
Results obtained from the LMM show that all the majors and four continuous explanatory variables (MEL, EST, ME, and LRC) are included in the finalized model. The results reveal that TOEIC achievement is positively related to MEL (p < .05), EST (p < .01), and LRC (p < .01) but negatively related to ME (p < .01). An interesting finding is that ME affects TOEIC achievement positively in univariate analysis but negatively when ME is considered conditional on students' major, LRC, EST, and MEL.
Once statistically significant predictors are selected in the model, changes in marginal $R^2$ across all possible combinations of predictors are calculated in Table 1. Table 1 shows that Model 1 contains only MAJOR and LRC. Model 2 includes MAJOR and LRC with other predictors (ME, EST, and MEL). To identify which predictors most affect TOEIC achievement, marginal $R^2$ values for all possible combinations within Model 2 are also considered.
However, there is little variation among all possible combinations in Model 2. Values of the marginal $R^2$ for all possible combinations of the selected predictors range from .518 to .527. Model 3 contains five predictors selected from the LMM: students' major, LRC, EST, ME, and MEL. Finally, Model 4 includes all twenty predictors in the model.
The values of the four marginal $R^2$s for Model 1, Model 2, Model 3, and Model 4 are .513, .527, .539, and .544, respectively. Figure 2 also describes the changes in marginal $R^2$ across the four models.
As shown in Figure 2, there is little change in marginal $R^2$ (.005) between Model 4, which includes nineteen predictors, and Model 3, which includes 5 predictors. However, compared to the change in marginal $R^2$ from Model 4 to Model 3, the change in marginal $R^2$ from Model 3 to Model 2 is relatively large (.012). This result suggests that the four continuous variables (LRC, ME, EST, and MEL) should be included in the model.
Conclusion and Discussion
This study examines the relationship between TOEIC achievement and twenty predictors under four different statistical methods. Different sets of predictors are selected by the four statistical methods. The results show that there is strong evidence to support the existence of a relationship between TOEIC achievement and some predictors included in this study. Without considering
Table 1.
Marginal $R^2$ for all possible combinations.

Model  Variables                    Marginal $R^2$
1      Major, LRC                   0.513
2      Major, LRC, ME               0.518
       Major, LRC, EST              0.521
       Major, LRC, MEL              0.519
       Major, LRC, ME, EST          0.527
       Major, LRC, ME, MEL          0.527
       Major, LRC, EST, MEL         0.527
3      Major, LRC, EST, ME, MEL     0.539
4      All variables                0.544
Figure 2. Changes of marginal $R^2$ across the four models.
other predictors, there is much variation in TOEIC reading achievement among the nine different majors. As expected, students in the medical program have high TOEIC scores compared to students in other programs. Thus, it is necessary to investigate predictors that affect growth of TOEIC scores while considering group differences. Results from this study also show that LRC (level of English ability) is a useful variable to explain and predict TOEIC achievement. Interestingly, LRC is significant across all four statistical methods. This makes sense since the level of English ability affects TOEIC reading achievement positively across the four waves. However, a negative relationship between ME and TOEIC achievement emerges when ME is considered conditional on the other predictors (major, LRC, EST, and MEL) in the model.
The LMM reveals that there is little variation in the values of marginal $R^2$ across all possible combinations of predictors included in the final model. Among the four statistical methods, the LMM seems to be the most effective and useful for building a parsimonious model with important and meaningful predictors because it takes into account the repeated measurements and is flexible and powerful for analyzing balanced and unbalanced grouped data. However, these results must be regarded as very tentative and inconclusive because this is a search for plausible predictors, not a convincing test of any theory. Further development based on these results would require replication with other data and an explanation of why these variables appear as predictors of continuity of achievement.
Moreover, this study has some limitations. Besides simply finding important variables, it is necessary to deal with other considerations such as the optimal number of variables, interaction effects, and the ratio of variables to observations (O'Hara & Sillanpää, 2009). Another limitation is that the best-fit model
among the four statistical models is not pursued since the objective of this research is to test hypotheses based on theories.
Concerning the LASSO method, the group LASSO method enjoys great computational advantages and excellent performance, and the number of nonzero coefficients in the LASSO and group LASSO methods is an unbiased estimate of the degrees of freedom (Efron et al., 2004). However, it is necessary for further studies to consider the LASSO method in a hierarchical structure, since experiment and survey designs should be included in the model. Then, the LASSO method in the LMM framework would be useful for explaining random effects. Despite the limitations listed above, this study contributes to the field of education as a better way of explaining the relationship between personal predictors and English achievement.
REFERENCES
Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Agresti, A., & Finlay, B. (1986). Statistical methods for the social sciences (2nd ed.). San Francisco, CA: Dellen.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov, & F. Csaki (Eds.), Second international symposium on information theory (pp. 267-281). Budapest: Akademiai Kiado.
Altman, D. G., & Andersen, P. K. (1989). Bootstrap investigation of the stability of a Cox regression model. Statistics in Medicine, 8, 771-783.
Bernstein, I. H. (1989). Applied multivariate analysis. New York: Springer-Verlag.
Bondell, H. D., & Reich, B. J. (2008). Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR. Biometrics, 64, 115-123.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Derksen, S., & Keselman, H. J. (1992). Backward, forward and stepwise automated subset selection algorithms. British Journal of Mathematical and Statistical Psychology, 45, 265-282.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32, 407-489.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
Foster, D. P., & George, E. I. (1994). The risk inflation criterion for multiple regression. The Annals of Statistics, 22, 1947-1975.
George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881-889.
Gilks, W. R., Wang, C. C., Yvonnet, B., & Coursaget, P. (1993). Random effects models for longitudinal data using Gibbs sampling. Biometrics, 49, 441-453.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons.
Laird, N., & Ware, J. H. (1982). Random effects models for longitudinal data. Biometrics, 38, 963-974.
Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661-675.
Meier, L., van de Geer, S., & Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society, B, 70, 53-71.
Menard, S. (1995). Applied logistic regression analysis (Sage university paper series on quantitative applications in the social sciences, series no. 106) (2nd ed.). Thousand Oaks, CA: Sage.
O'Hara, R. B., & Sillanpää, M. J. (2009). A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis, 4, 85-118.
Orelien, J. G., & Edwards, L. J. (2008). Fixed effect variable selection in linear mixed models using $R^2$ statistics. Computational Statistics & Data Analysis, 52, 1896-1907.
R Development Core Team (2013). R: A language and environment for statistical computing. Vienna, Austria: The R Foundation for Statistical Computing. http://www.R-project.org/
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, B, 58, 267-288.
Vonesh, E. F., & Chinchilli, V. M. (1997). Linear and nonlinear models for the analysis of repeated measurements. New York: Marcel Dekker.
Vonesh, E. F., Chinchilli, V. M., & Pu, K. W. (1996). Goodness-of-fit in generalized nonlinear mixed-effects models. Biometrics, 52, 572-587.
Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, B, 68, 49-67.
Zhang, H. H., Wahba, G., Lin, Y., Voelker, M., Ferris, M., Klein, R., & Klein, B. (2004). Variable selection and model building via likelihood basis pursuit. Journal of the American Statistical Association, 99, 659-672.
Zheng, B. (2000). Summarizing the goodness of fit of generalized linear models for longitudinal data. Statistics in Medicine, 19, 1265-1275.
Appendix
Group LASSO (R code)
library(grplasso)  # provides grplasso() and LinReg()

# Assumed to be defined beforehand (as in the original script): toeic.group
# (the full dataset), ind (a permutation of row indices), toeic.te (the test
# set), cvind (cross-validation fold assignments), gr (the size of each
# predictor group), sx (the scaling factors used to standardize x), and
# cvlassoReg() (a user-written helper that picks lambda by cross-validation).

toeic.tr <- as.data.frame(toeic.group[ind[1:225], ])  # training set

# GROUP LASSO
cols <- ncol(toeic.tr) - 1
index.lasso <- rep(0, cols)   # group membership of each predictor column
numgr <- length(gr)
stg <- 1
ltg <- gr[1]
for (i in 1:numgr) {
  index.lasso[stg:ltg] <- i   # label the columns of group i (the original
                              # printed "<- 1"; a distinct label per group
                              # is assumed here)
  if (i < numgr) {
    stg <- stg + gr[i]
    ltg <- ltg + gr[i + 1]
  }
}

lambda <- c(2000, 1500, 1000, 500, 100, 10, 1, 0.1, 0.01)  # penalty grid
fold <- 10
lambda.lasso <- cvlassoReg(y ~ ., toeic.tr, fold, cvind, index.lasso, lambda)

ini.lasso <- grplasso(x = as.matrix(toeic.tr[, -30]),
                      y = as.matrix(toeic.tr[, 30]),
                      index = index.lasso, lambda = lambda.lasso,
                      model = LinReg(), penscale = sqrt)

lasso.pred <- as.matrix(toeic.te[, -30]) %*% ini.lasso$coefficients
beta.lasso <- ini.lasso$coefficients / sx  # back-transform to the original scale