This study attempted to interpret differential item discriminations between the individual and cluster levels by focusing on the patterns and magnitudes of item discriminations under the 2PL multilevel IRT model across a variety of simulation conditions. The consistency between the mean of the individual-level ability estimates and the cluster-level ability estimates was evaluated by the correlations between them. The two were highly correlated when the patterns of item discriminations were the same at the individual and cluster levels. The magnitudes of the item discriminations themselves had little effect on the correlations, as long as the patterns were the same at the two levels. However, the correlations were lower when the patterns of item discriminations differed between the individual and cluster levels. Moreover, the mean of the estimated individual-level abilities was not necessarily a good representation of the cluster-level ability when the patterns differed between the two levels.
Multilevel modeling has become a popular data analysis technique in psychological and educational measurement. Traditional psychometric models, such as classical test theory and item response theory (IRT) models, do not account for the nested structure of data. Multilevel modeling becomes important when researchers analyze nested data, because it takes into account both within-cluster and between-cluster variation in the data. One popular multilevel modeling technique is the hierarchical generalized linear model (HGLM). However, when HGLM is applied to multilevel IRT [
IRT models define the relationship between observed item scores and latent constructs for dichotomous and polytomous item response data. An IRT framework has been extended to a multilevel data structure [
One popular IRT model for dichotomously scored items is the 2-parameter logistic (2PL) model, in which the probability of an individual correctly responding to an item depends on the individual’s ability, the item difficulty, and the item discrimination. The 2PL IRT model can be written as
where
The 2PL IRT model also allows the item discriminations to vary freely across items in a test. Its extension to a multilevel IRT model has been investigated and documented by several authors [
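Because the equation itself is not reproduced in this text, the following is only a minimal sketch of the 2PL response function under one common parameterization, a(θ − b), which is an assumption here rather than the authors' stated form. The function name `p_correct_2pl` and the two example items are hypothetical. It shows how a larger discrimination makes the response probability change more steeply with ability.

```python
import math

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under an ASSUMED 2PL form:
    P = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty (b = 0) but different
# discriminations; the values 0.8 and 2.2 are used only for illustration.
for a in (0.8, 2.2):
    probs = [round(p_correct_2pl(theta, a, 0.0), 3) for theta in (-1.0, 0.0, 1.0)]
    print("a =", a, "P(correct) at theta = -1, 0, 1:", probs)
```

At an ability equal to the item difficulty both items give a probability of 0.5; away from that point the more discriminating item separates low and high abilities more sharply.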
Some authors have investigated cluster-level IRT modeling. For example, Mislevy [
Suppose we fit a 1PL multilevel IRT model. The pattern of item discriminations would be exactly the same at the individual and cluster levels; in other words, the model yields only one pattern of item discriminations for both levels. The magnitudes of the item discriminations would also be exactly the same at both levels (e.g., 1.0 for all items). Once the 2PL IRT model is applied to the multilevel model, however, both the patterns and the magnitudes of the item discriminations can differ across levels. The item discriminations may have the same patterns and the same magnitudes at the individual and cluster levels, the same patterns but different magnitudes, or different patterns and different magnitudes. However, the effects of differing patterns and magnitudes of item discriminations have not yet been demonstrated in the literature from a multilevel IRT modeling perspective.
If differential patterns or magnitudes of item discriminations between levels affect estimates of individual and cluster level abilities differently, the conclusion that Tate [
An item discrimination indicates the quality of an item, because it reflects how strongly the item correlates with the ability being measured. A higher discrimination corresponds to a greater correlation with the ability, which also leads to a higher scoring weight for the item. Therefore, given the same raw score, individuals who correctly answer items with high discriminations would have a higher estimated ability than individuals who correctly answer items with lower discriminations. In other words, it matters not only how many items were answered correctly, but also which items were answered correctly. Under the 2PL single-level IRT model, the individual ability estimates are weighted by item discriminations, such that
where
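The weighting idea described above can be illustrated with a small sketch. It assumes the common 2PL form a(θ − b) and uses a crude grid search for the maximum likelihood ability; the four hypothetical items, the helper names, and the grid range are all illustrative assumptions, not the authors' procedure. Two response patterns with the same raw score receive different discrimination-weighted scores and different ability estimates.

```python
import math

def log_likelihood(theta, responses, a, b):
    """2PL log-likelihood of one response pattern (assumed form a*(theta-b))."""
    ll = 0.0
    for u, a_i, b_i in zip(responses, a, b):
        p = 1.0 / (1.0 + math.exp(-a_i * (theta - b_i)))
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll

def ml_ability(responses, a, b):
    """Grid-search ML estimate of theta on [-4, 4] in steps of 0.01."""
    grid = [-4.0 + 0.01 * k for k in range(801)]
    return max(grid, key=lambda t: log_likelihood(t, responses, a, b))

def weighted_score(responses, a):
    """Discrimination-weighted sum of the item responses."""
    return sum(a_i * u for a_i, u in zip(a, responses))

# Hypothetical 4-item test: two weakly and two strongly discriminating items.
a = [0.8, 0.8, 2.2, 2.2]
b = [0.0, 0.0, 0.0, 0.0]
low_weight_correct = [1, 1, 0, 0]   # raw score 2, weak items correct
high_weight_correct = [0, 0, 1, 1]  # raw score 2, strong items correct

for pattern in (low_weight_correct, high_weight_correct):
    print(pattern, weighted_score(pattern, a), ml_ability(pattern, a, b))
```

Both patterns have a raw score of 2, but the pattern with the highly discriminating items correct has the larger weighted score and the larger ability estimate.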
Since the 2PL multilevel IRT model allows item discriminations to vary at both the individual and cluster levels, we hypothesized that different patterns of item discriminations between the levels cause the patterns of scoring weights to differ between the levels. For this reason, our hypothesis was that the aggregated mean individual-level abilities
item discriminations are the same between the individual and cluster levels. However, if the patterns of item discriminations are different between the levels, we hypothesized that the aggregated mean individual-level abilities
cluster-level ability
estimated cluster-level ability
The 2PL multilevel IRT model was used in this study for both data generation and model fitting. In IRT, the response outcome data are treated as categorical with binomial distributions. The idea of a 2PL multilevel IRT model is to measure each latent variable incorporated in the multilevel model with the IRT model. The 2PL multilevel IRT model can be written as
where
level k,
and
normal distribution with a mean of zero and a standard deviation of 1.0.
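As a rough illustration of how data could be generated from a two-level 2PL model, the sketch below assumes an additive form, logit P = a_ind·θ_ind + a_clu·θ_clu − b, with both abilities drawn from a standard normal distribution. This form, the function name `simulate_responses`, the evenly spaced difficulties, and the sample sizes are all assumptions made for illustration; the model equation used in the study is not reproduced in this text.

```python
import math
import random

def simulate_responses(a_ind, a_clu, b, n_clusters, cluster_size, seed=0):
    """Simulate dichotomous responses under an ASSUMED two-level 2PL form:
    logit P = a_ind[i]*theta_ind + a_clu[i]*theta_clu - b[i]."""
    rng = random.Random(seed)
    data = []
    for k in range(n_clusters):
        theta_clu = rng.gauss(0.0, 1.0)      # cluster-level ability
        for _ in range(cluster_size):
            theta_ind = rng.gauss(0.0, 1.0)  # individual-level ability
            row = []
            for a_w, a_b, b_i in zip(a_ind, a_clu, b):
                logit = a_w * theta_ind + a_b * theta_clu - b_i
                p = 1.0 / (1.0 + math.exp(-logit))
                row.append(1 if rng.random() < p else 0)
            data.append((k, theta_clu, theta_ind, row))
    return data

# Illustrative setup: 12 items, difficulties evenly spaced on [-2, 2]
# (the even spacing is an assumption), discriminations of 1.2 at the
# individual level and 0.8 at the cluster level for every item.
n_items = 12
b = [-2.0 + 4.0 * i / (n_items - 1) for i in range(n_items)]
data = simulate_responses([1.2] * n_items, [0.8] * n_items, b,
                          n_clusters=20, cluster_size=10, seed=1)
print(len(data), "simulated examinees; first response row:", data[0][3])
```

Fixing the seed keeps a simulated data set reproducible, which matters when the same condition is regenerated for comparison.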
It was assumed that the test consisted of 12 items with item difficulties ranging from −2.0 to 2.0. These item difficulties were fixed across conditions. The biserial correlations between the ability estimate and the propensity for a correct response were used to set the item discriminations for the individual and cluster levels. The five item discrimination values (0.8, 1.0, 1.2, 1.6, and 2.2) represented biserial correlations of 0.40, 0.50, 0.55, 0.66, and 0.77 between the latent trait and the propensity for a correct answer. These values were combined into 16 sets of item discriminations classified into three patterns (see
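The text does not state how the discriminations were converted to biserial correlations. One relation that approximately reproduces the reported pairs is ρ = a / √(a² + π²/3), which follows if the propensity is taken to be a·θ plus a standard logistic residual (variance π²/3) with θ having unit variance. This is an assumption offered as a plausibility check, not the authors' documented formula; the function name `implied_correlation` is hypothetical.

```python
import math

def implied_correlation(a):
    """Correlation between a unit-variance latent trait and a propensity
    a*theta + e, where e is ASSUMED to have logistic variance pi^2 / 3."""
    return a / math.sqrt(a * a + math.pi ** 2 / 3.0)

# Discrimination -> biserial correlation pairs as reported in the text.
reported = {0.8: 0.40, 1.0: 0.50, 1.2: 0.55, 1.6: 0.66, 2.2: 0.77}

for a, rho in reported.items():
    print(a, rho, round(implied_correlation(a), 3))
```

Under this assumption the implied values agree with the reported correlations to within about 0.02, with the largest gap at a discrimination of 1.0.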
Based on these specifications, the dichotomous item response data were randomly generated and fit with the 2PL multilevel IRT model shown in Equation (3). The Mplus software, using the maximum likelihood estimator with robust standard errors, was used to estimate the model parameters (
individual-level abilities
evaluated the consistency of the aggregated mean individual-level abilities
level ability
Results demonstrated three primary scenarios regarding Pearson’s product moment correlation between the aggregated mean individual-level abilities
Item discriminations

Set | Pattern | Individual level, Items 1-6 | Individual level, Items 7-12 | Cluster level, Items 1-6 | Cluster level, Items 7-12
---|---|---|---|---|---
1 | A | 0.8 | 0.8 | 0.8 | 0.8
2 | B | 0.8 | 0.8 | 1.2 | 1.2
3 | B | 0.8 | 0.8 | 2.2 | 2.2
4 | B | 1.2 | 1.2 | 0.8 | 0.8
5 | A | 1.2 | 1.2 | 1.2 | 1.2
6 | B | 1.2 | 1.2 | 2.2 | 2.2
7 | B | 2.2 | 2.2 | 0.8 | 0.8
8 | B | 2.2 | 2.2 | 1.2 | 1.2
9 | A | 2.2 | 2.2 | 2.2 | 2.2
10 | A | 1.0 | 1.6 | 1.0 | 1.6
11 | C | 0.8 | 0.8 | 1.0 | 1.6
12 | C | 1.2 | 1.2 | 1.0 | 1.6
13 | C | 2.2 | 2.2 | 1.0 | 1.6
14 | C | 1.0 | 1.6 | 0.8 | 0.8
15 | C | 1.0 | 1.6 | 1.2 | 1.2
16 | C | 1.0 | 1.6 | 2.2 | 2.2
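One way to make the three pattern labels in the table concrete is sketched below. It reads a single value at a level as applying to all 12 items, and it treats two levels as sharing a pattern when their discrimination profiles are proportional; both the reading and the proportionality rule are interpretive assumptions, and `classify` is a hypothetical helper, not part of the study.

```python
def classify(ind, clu, tol=1e-9):
    """Label a set as A, B, or C from its (Items 1-6, Items 7-12)
    discriminations at the individual and cluster levels.
    A: identical at both levels; B: proportional profiles (same pattern,
    different magnitudes); C: profiles that are not proportional."""
    if all(abs(x - y) <= tol for x, y in zip(ind, clu)):
        return "A"
    proportional = all(abs(x * clu[0] - y * ind[0]) <= tol
                       for x, y in zip(ind, clu))
    return "B" if proportional else "C"

# (individual 1-6, individual 7-12), (cluster 1-6, cluster 7-12), label
sets = {
    1: ((0.8, 0.8), (0.8, 0.8), "A"),   2: ((0.8, 0.8), (1.2, 1.2), "B"),
    3: ((0.8, 0.8), (2.2, 2.2), "B"),   4: ((1.2, 1.2), (0.8, 0.8), "B"),
    5: ((1.2, 1.2), (1.2, 1.2), "A"),   6: ((1.2, 1.2), (2.2, 2.2), "B"),
    7: ((2.2, 2.2), (0.8, 0.8), "B"),   8: ((2.2, 2.2), (1.2, 1.2), "B"),
    9: ((2.2, 2.2), (2.2, 2.2), "A"),  10: ((1.0, 1.6), (1.0, 1.6), "A"),
    11: ((0.8, 0.8), (1.0, 1.6), "C"), 12: ((1.2, 1.2), (1.0, 1.6), "C"),
    13: ((2.2, 2.2), (1.0, 1.6), "C"), 14: ((1.0, 1.6), (0.8, 0.8), "C"),
    15: ((1.0, 1.6), (1.2, 1.2), "C"), 16: ((1.0, 1.6), (2.2, 2.2), "C"),
}

for number, (ind, clu, label) in sets.items():
    print(number, label, classify(ind, clu))
```

Under these assumptions the rule reproduces the label given for each of the 16 sets.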
First, for conditions with the same patterns and the same magnitudes of item discriminations at both levels (Pattern A), and for conditions with the same patterns but higher magnitudes at the individual level (Sets 4, 7, and 8 under Pattern B), the correlations were very high, ranging from 0.757 to 0.928. These results indicated that the patterns and the magnitudes of item discriminations between levels affected the estimates and the correlations between the aggregated mean individual-level abilities
the mean of the individual-level ability of all individuals in that cluster. This statement also holds for the 1PL multilevel IRT model as a special case. This result suggested that, under these circumstances, it is reasonable to estimate the cluster-level ability by computing the mean of the individual-level abilities of all individuals in that cluster. Both approaches would produce similar results. Also, within this pattern, the correlations increased slightly as the number of clusters and the cluster size increased.
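The comparison described here, averaging individual-level ability estimates within each cluster and correlating those means with the cluster-level estimates, can be sketched as follows. The toy numbers and the names `cluster_means` and `pearson` are hypothetical; in the study the estimates came from the fitted model, not from values like these.

```python
import math
from collections import defaultdict

def cluster_means(cluster_ids, theta_ind_hat):
    """Mean of the individual-level ability estimates within each cluster."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, theta in zip(cluster_ids, theta_ind_hat):
        sums[k] += theta
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical estimates for three clusters of two individuals each.
cluster_ids = [0, 0, 1, 1, 2, 2]
theta_ind_hat = [-1.0, -0.6, 0.1, 0.3, 0.9, 1.3]
theta_clu_hat = {0: -0.7, 1: 0.2, 2: 1.0}

means = cluster_means(cluster_ids, theta_ind_hat)
keys = sorted(means)
r = pearson([means[k] for k in keys], [theta_clu_hat[k] for k in keys])
print(means, round(r, 3))
```

A correlation near 1 indicates that the aggregated means rank clusters in nearly the same way as the cluster-level estimates.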
Second, for conditions with the same patterns of item discriminations between levels but lower magnitudes at the individual level (Sets 2, 3, and 6 under Pattern B), the correlations were moderately high, ranging from 0.497 to 0.799. These results showed that the item discriminations at the individual level needed to be at least as high as those at the cluster level for the aggregated mean individual-level abilities
and the estimated cluster-level ability
some information when we aggregate the mean individual-level abilities
individual-level abilities
Third, for conditions with different patterns and different magnitudes of item discriminations between levels (Pattern C), the correlations were much lower, ranging from 0.174 to 0.462 across the six Pattern C conditions. In this scenario, it became obvious that the aggregated mean individual-level abilities
the estimated cluster-level ability
item discriminations had a huge impact on the estimates of the aggregated mean individual-level abilities
and the estimated cluster-level ability
much on the estimates of the aggregated mean individual-level abilities
ability
generalization of the aggregated mean individual-level abilities
item discriminations under the 2PL multilevel IRT model behave differently from the differential item discriminations under the 2PL single-level IRT model.
Overall, the Pearson’s product moment correlations between the aggregated mean individual-level abilities
and the estimated cluster-level ability
Correlations between the aggregated mean individual-level abilities and the estimated cluster-level ability
the aggregated mean individual-level abilities
individual traits have a different meaning from the cluster-level traits under conditions where the patterns and the magnitudes of the item discriminations differ.
Regarding the effect of simulation factors (see also
individual-level abilities
however, not necessarily in the way we typically understand the effect of sample size.
Regarding the effect of item discrimination magnitudes, first, for conditions with the same patterns and the same magnitudes of item discriminations at both levels (Pattern A), the correlations increased as the magnitudes of the item discriminations at either the individual level or the cluster level increased. These results were not surprising, because the item discrimination indicates the quality of an item. A higher item discrimination corresponds to a greater correlation with the ability, which also leads to a higher scoring weight for the item.
Second, for conditions with the same patterns but different magnitudes of item discriminations between the levels (Pattern B), the correlations increased as the magnitudes of the item discriminations at the individual level increased, given the same magnitudes at the cluster level. On the other hand, the correlations decreased as the magnitudes of the item discriminations at the cluster level increased, given the same magnitudes at the individual level. Also, the correlations were much lower when the magnitudes of the item discriminations at the individual level were much lower than those at the cluster level. Thus, the difference in magnitudes of item discriminations between levels affected the estimates of the aggregated mean individual-level abilities
This result confirmed that the patterns of item discriminations have to be at least the same between levels to obtain similar estimates between the aggregated mean individual-level abilities
Third, for conditions with different patterns and different magnitudes of item discriminations between levels (Pattern C), the correlations decreased as the magnitudes of the item discriminations at either the individual level or the cluster level increased. These findings were interesting, because they differed from what we typically understand about the effect of item discrimination; that is, that a higher item discrimination leads to a higher scoring weight for the item. If this statement is true under the 2PL multilevel IRT model, the aggregated mean individual-level abilities
abilities
discriminations and the magnitudes of the item discriminations at either the individual level or the cluster level increased. This result also demonstrated that the patterns of the item discriminations had a substantial effect on ability estimates under the 2PL multilevel IRT model.
Finally, it should be noted that there were interaction effects between our simulation factors. Namely, the correlations between the aggregated mean individual-level abilities
Our investigation demonstrated one way to interpret differential item discriminations between the individual and cluster levels in the 2PL multilevel IRT context. We found that the patterns of item discriminations between levels are more important than the magnitudes of item discriminations for obtaining consistent estimates of the aggregated mean individual-level abilities
revealed that this would be justifiable only if the item discriminations for both levels share the same patterns. For this reason, caution is warranted when aggregating individually estimated abilities, because
the mean of the estimated individual-level abilities
Although our investigation looked into three simulation factors over 64 simulation conditions, there are of course other factors and conditions that should be investigated. For example, it would be interesting to see how the number of items on each level would affect the estimates of the aggregated mean individual-level abilities
between levels. Item difficulty is another interesting factor. It would be interesting to see the effect of the interaction between the item difficulties and the patterns of item discriminations between levels. Extremely low or high item discrimination magnitudes under different patterns of item discriminations between levels should also be of interest for future investigation. Finally, it would be interesting to demonstrate algebraically how the individual ability estimates are weighted by the differential item discriminations under the 2PL multilevel IRT model.