Open Journal of Statistics
Vol. 04 No. 06 (2014), Article ID: 49247, 18 pages
10.4236/ojs.2014.46045
Multiple Choice Tests: Inferences Based on Estimators of Maximum Likelihood
Pedro Femia-Marzo*, Antonio Martín-Andrés
Biostatistics, Faculty of Medicine, Department of Statistics and OR, University of Granada, Granada, Spain
Email: *pfemia@ugr.es, amartina@ugr.es
Copyright © 2014 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/



Received 25 May 2014; revised 28 June 2014; accepted 12 July 2014
ABSTRACT
This paper revises and expands the model Delta for estimating the knowledge level in multiple choice tests (MCT). This model was originally proposed by Martín and Luna in 1989 (British Journal of Mathematical and Statistical Psychology, 42: 251) using conditional inference. The aim of this paper is therefore to obtain the unconditional estimators by means of the maximum likelihood method. Besides considering some properties arising from the unconditional inference, some additional issues regarding this model are also addressed, e.g. test-inversion confidence intervals and how to treat omitted answers. A free program that performs the calculations described in this paper is available on the website http://www.ugr.es/local/bioest/Delta.
Keywords:
Model Delta, Multiple-Choice Tests, Agreement, Guessing

1. Introduction
Multiple-choice tests (MCT in the following) are widely known as psychometric instruments intended to measure students' degree of knowledge of a specific subject. Nowadays, the enormous development of information technologies encourages new teaching methodologies in which MCTs constitute a fast and objective means of evaluation, especially when there is a large number of students. MCTs also stimulate students' active and self-managed learning. From a psychometric standpoint, MCTs are tools that can be adapted to different disciplines and knowledge levels, allowing high-level cognitive reasoning to be measured [1]. At the same time, MCTs can give greater validity and reliability than other assessment methods [2] [3]. Nevertheless, how to determine the students' degree of knowledge of the subject matter from their responses to an MCT is still a matter of debate [4] [5].
In this paper we consider a model, whose foundations were laid by Martín & Luna in 1989, for assessing students' knowledge from MCTs. We call this model Delta. Originally, the estimation of its parameters was addressed by means of a conditional method [6]; the goal here is to develop the unconditional (maximum likelihood) procedure. Once this goal is achieved, we review some of the features of the model based on the unconditional method and develop some additional advances, e.g. the estimation of the degree of knowledge in the presence of unanswered questions, a common situation that was not addressed in the original formulation. Before studying the unconditional method, however, we introduce the notation and give some background on the model Delta proposed by Martín & Luna (M & L in the following).
The model Delta is applicable to MCTs of the type “single best-answer multiple-choice questions”. This type of test consists of
items, each of which is composed of a statement (the stem) and
alternative answers, of which only one is correct (the key) and the remainder are distractors [1] . When a student answers the items of an MCT, the raw data can be summarized as shown in Table 1. Here
is the number of times that the alternative
is the correct answer,
is the number of times that the student gives the alternative answer
and
refers to the number of times that the student gives the alternative answer
when the correct one is the alternative
; obviously,
. When there are omissions,
stands for the number of items attempted and
for the number of answered questions when the alternative
is the correct one. In any case
and
refer to the number of answers given by the student and in presence of omissions
. If matrix notation is needed, we will consider



In the past, various scoring rules have been considered for evaluating a student's degree of knowledge from the data in Table 1. The simplest one, number-right scoring, has traditionally been criticized because it does not take random guessing into account, and this has given rise to various formula-scoring rules [4] [7]-[9]. In 1982 Hutchinson [10], using a model based on the theory of finite states, suggested that the test-taker's knowledge level is given by a


This rule, originally proposed by Lord and Novick [11] , is the classic penalty for guesses, according to which each incorrect answer is penalized with


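For reference, the classic correction for guessing mentioned above can be written in a few lines of Python. This is a minimal sketch, assuming every item is answered (so right + wrong = n) and using the standard penalty of 1/(K − 1) points per wrong answer; the function names are illustrative only and are not taken from the paper.

```python
def formula_score(right: int, wrong: int, k: int) -> float:
    """Number right minus the classic guessing penalty of 1/(k - 1) per wrong answer."""
    if k < 2:
        raise ValueError("an item needs at least two alternatives")
    return right - wrong / (k - 1)


def formula_knowledge_level(right: int, wrong: int, k: int) -> float:
    """Formula score as a proportion of the n = right + wrong answered items."""
    return formula_score(right, wrong, k) / (right + wrong)


# Blind guessing on 100 four-option items yields about 25 right and 75 wrong,
# which the penalty maps to zero knowledge.
print(formula_knowledge_level(25, 75, 4))  # 0.0
print(formula_knowledge_level(70, 30, 4))  # 0.6
```

The penalty is chosen so that a pure guesser's expected score is zero: a right answer occurs with probability 1/K, and the expected loss (1 − 1/K)/(K − 1) equals 1/K.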
The model proposed by M & L [6] follows Hutchinson’s concept, according to which the examinee is assumed to know a proportion


Table 1. Summary of the raw data of an MCT with















where












According to this, the estimation of the examinee’s knowledge level proposed by M & L is:

an expression which demonstrates that the relevant information for evaluating the test taker’s knowledge level is the sum of the proportions of successes, not the total of these. When the distribution of correct options in each position is homogeneous, that is if


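The sum-of-proportions idea can be illustrated numerically. The sketch below assumes the model form in which, when alternative i is the key, the answer is correct with probability Δ + (1 − Δ)πi (the examinee either knows the item, or guesses alternative i with probability πi, the πi summing to 1). Summing these probabilities over the K keys gives 1 + (K − 1)Δ, which suggests the estimator below. It is offered as an illustration of the principle under that assumption, not as a transcription of M & L's expression; the table layout x[i][j] (answer j given when the key is i) mirrors Table 1, and the code names are hypothetical.

```python
from typing import Sequence


def delta_sum_of_proportions(x: Sequence[Sequence[int]]) -> float:
    """Knowledge-level estimate from a K x K table x[i][j] = answers j when the key is i.

    Assumes every item is answered and every alternative is the key at least once.
    """
    k = len(x)
    r = [sum(row) for row in x]  # r_i: items whose key is alternative i
    # Sum of the per-key proportions of correct answers, x_ii / r_i.
    s = sum(x[i][i] / r[i] for i in range(k))
    # Under the assumed model, E[s] = 1 + (K - 1) * Delta.
    return (s - 1) / (k - 1)


# Balanced test: K = 4, n = 100, each alternative is the key 25 times, 70 right.
balanced = [
    [20, 2, 2, 1],
    [3, 15, 4, 3],
    [2, 3, 18, 2],
    [3, 2, 3, 17],
]
print(round(delta_sum_of_proportions(balanced), 10))  # 0.6
```

When the keys are balanced (each r_i = n/K), the sum of proportions equals Kx/n, where x is the total number of right answers, so the estimate reduces to (Kx/n − 1)/(K − 1). This is the classic formula score divided by n (0.6 for 70 right out of 100 with K = 4).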
Unlike the classical scoring rules, this model allows several questions of statistical inference to be addressed: performing a hypothesis test on



In their formulation, M & L [6] considered a conditional method to obtain the estimator


2. Estimation of Parameters Using the Method of Maximum Likelihood
In the following, and for the sake of simplicity, let us focus on the particular case where all the questions are answered. How to treat the more realistic situation where there are omissions is considered later in this paper; until then


Under M & L’s model, the













where










A) When





B) When




C) Otherwise


and the estimators


Solutions








Note that the model assumes that the



In the particular case of


where the third expression is the same value


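As an independent numerical check on the closed-form solutions, the maximum likelihood estimates can also be approximated with the EM algorithm, treating whether each right answer was known or guessed as missing data. This is a minimal sketch under the assumed model form P(answer j | key i) = Δ·[i = j] + (1 − Δ)πj, with the row totals r_i fixed. It is a swapped-in algorithm, not the authors' procedure, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cell_prob(i: int, j: int, delta: float, pi: Sequence[float]) -> float:
    """Assumed model: P(answer j | key i) = Delta * [i == j] + (1 - Delta) * pi_j."""
    return (delta if i == j else 0.0) + (1.0 - delta) * pi[j]


def log_likelihood(x: Sequence[Sequence[int]], delta: float, pi: Sequence[float]) -> float:
    """Multinomial log-likelihood, up to a constant, with row totals r_i fixed."""
    return sum(
        x[i][j] * math.log(cell_prob(i, j, delta, pi))
        for i in range(len(x))
        for j in range(len(x))
        if x[i][j] > 0
    )


def fit_delta_em(x: Sequence[Sequence[int]], tol: float = 1e-12,
                 max_iter: int = 100_000) -> Tuple[float, List[float]]:
    """Approximate ML estimates of (Delta, pi) by EM ('known' vs 'guessed' is missing)."""
    k = len(x)
    n = sum(sum(row) for row in x)
    delta, pi = 0.5, [1.0 / k] * k  # neutral starting values
    for _ in range(max_iter):
        # E-step: expected number of the x_ii right answers that were actually known.
        known = []
        for i in range(k):
            denom = cell_prob(i, i, delta, pi)
            known.append(x[i][i] * delta / denom if denom > 0 else 0.0)
        total_known = sum(known)
        # Expected number of guesses landing on each alternative j (column total minus known).
        guessed = [max(0.0, sum(x[i][j] for i in range(k)) - known[j]) for j in range(k)]
        # M-step: share of items known, and share of guesses falling on each alternative.
        new_delta = total_known / n
        total_guessed = sum(guessed)
        new_pi = [g / total_guessed for g in guessed] if total_guessed > 0 else pi
        change = max([abs(new_delta - delta)] + [abs(a - b) for a, b in zip(new_pi, pi)])
        delta, pi = new_delta, new_pi
        if change < tol:
            break
    return delta, pi


delta_hat, pi_hat = fit_delta_em([[30, 10], [15, 25]])
print(round(delta_hat, 6), [round(p, 6) for p in pi_hat])  # 0.375 [0.6, 0.4]
```

Off-diagonal counts are guesses with certainty, so only the diagonal counts need to be split in the E-step. EM increases the likelihood at every step but can be slow when the estimate lies near a boundary (Δ close to 0 or 1, or some πi close to 0), which is where the case distinctions above matter.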
3. Fit of the Model
Given that






which must be compared, in the usual way, with a theoretical distribution








One observation needs to be made here about expression (9): when

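A minimal sketch of a fit statistic of this kind, assuming the Pearson form X² = Σ (observed − expected)²/expected with expected counts r_i·p̂ij under the assumed model form p̂ij = Δ̂·[i = j] + (1 − Δ̂)π̂j. The degrees of freedom follow from the usual counting: K(K − 1) free cells (K rows with fixed totals) minus K free parameters (Δ and K − 1 of the πi), i.e. K(K − 2). Whether expression (9) uses exactly this statistic is an assumption; Δ̂ and π̂ are taken as inputs from any maximum likelihood fit, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def pearson_fit(x: Sequence[Sequence[int]], delta: float,
                pi: Sequence[float]) -> Tuple[float, int]:
    """Pearson X^2 for the assumed Delta model and its degrees of freedom K(K - 2)."""
    k = len(x)
    df = k * (k - 2)  # K(K - 1) free cells minus K free parameters
    stat = 0.0
    for i in range(k):
        r_i = sum(x[i])
        for j in range(k):
            p_ij = (delta if i == j else 0.0) + (1.0 - delta) * pi[j]
            expected = r_i * p_ij
            if expected > 0:
                stat += (x[i][j] - expected) ** 2 / expected
            elif x[i][j] > 0:
                return float("inf"), df  # an observed count in a cell the model forbids
    return stat, df


# A K = 3 table that matches Delta = 0.5, pi = (0.5, 0.3, 0.2) exactly.
print(pearson_fit([[75, 15, 10], [25, 65, 10], [25, 15, 60]], 0.5, [0.5, 0.3, 0.2]))
```

For K = 2 the counting gives zero degrees of freedom: the model is saturated and reproduces any table, so no goodness-of-fit test is available. For K ≥ 3 a p-value can be obtained from the chi-squared upper tail, e.g. with `scipy.stats.chi2.sf(stat, df)`.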
4. Standard Error of the Estimators
In Appendix III the variance-covariance matrix of the estimators for the parameters of the model is obtained. The elements of greatest interest are





where

By substituting parameters




one finds that the estimated standard errors of these are



It can be seen that







Moreover, as shown in Appendix III, the term inside the brackets in expression (10) is a positive-definite quadratic form in the parameters of the model, so that the lower its value, the lower






The fact that the













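The asymptotic variance of Δ̂ can also be obtained numerically as the (1, 1) element of the inverse expected Fisher information, which is a convenient cross-check on the closed-form expressions. The sketch assumes the model form P(answer j | key i) = Δ·[i = j] + (1 − Δ)πj, independent multinomial rows with fixed totals r_i, and parameters strictly inside the parameter space (every cell probability positive). It parameterizes by (Δ, π1, ..., π(K−1)) with πK = 1 − (π1 + ... + π(K−1)); the function names are hypothetical.

```python
from typing import List, Sequence


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting (adequate for small K)."""
    size = len(m)
    a = [list(row) + [1.0 if c == r else 0.0 for c in range(size)] for r, row in enumerate(m)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(size):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[size:] for row in a]


def delta_variance(r: Sequence[int], delta: float, pi: Sequence[float]) -> float:
    """Asymptotic Var(Delta-hat) from the expected Fisher information at (delta, pi)."""
    k = len(r)
    info = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            p_ij = (delta if i == j else 0.0) + (1.0 - delta) * pi[j]
            # Gradient of p_ij with respect to (Delta, pi_1, ..., pi_{K-1}).
            grad = [(1.0 if i == j else 0.0) - pi[j]]
            for m in range(k - 1):
                dpi = (1.0 if j == m else 0.0) - (1.0 if j == k - 1 else 0.0)
                grad.append((1.0 - delta) * dpi)
            # Each row i is multinomial with r_i trials.
            for a in range(k):
                for b in range(k):
                    info[a][b] += r[i] * grad[a] * grad[b] / p_ij
    return invert(info)[0][0]


var_hat = delta_variance([100, 100, 100], 0.5, [0.5, 0.3, 0.2])
print(var_hat ** 0.5)  # estimated standard error of Delta-hat
```

Plugging in the estimates (Δ̂, π̂) gives the estimated variance; multiplying every r_i by the same factor divides it by that factor, in line with the usual 1/n behaviour.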
5. Maximum Variance in MCTs with Homogeneous Distribution of the Correct Alternatives
5.1. General Case
If





an expression that coincides with the maximum variance predicted by M & L [12] for balanced tests. This is why the consequences derived from this expression (set out below) are the same as those given by these authors. The value of





It can be seen that the explicit value which










From this, one can estimate the effect that increasing the number of







Similarly, adding more items to the test has a greater effect in reducing the maximum variance when working with values lower than n.
An additional result is that one can determine the number





For example, a test with





Finally, in the tests with balanced values of












where







5.2. Special Case of K = 2
As pointed out previously, in tests with only two alternatives (such as true/false)



However, the principle considered by M & L for obtaining this variance is still valid: if K = 2 the random variables




a simpler expression than (19), whose estimate is given by

The case of


Note that, for a given value of









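With K = 2 the assumed model form gives p11 = Δ + (1 − Δ)π1 and p22 = Δ + (1 − Δ)π2, so p11 + p22 = 1 + Δ. The model then has two free parameters for two free cells, and (assuming the estimate is interior, so that no boundary case applies) the maximum likelihood estimates reproduce the observed proportions. Hence Δ̂ = x11/r1 + x22/r2 − 1, and because the two rows are independent binomials, its variance is the sum of two binomial variances. A minimal sketch with hypothetical names:

```python
from typing import Tuple


def delta_k2(x11: int, r1: int, x22: int, r2: int) -> Tuple[float, float]:
    """Point estimate and estimated variance of Delta for a two-alternative test."""
    p11 = x11 / r1  # proportion right when alternative 1 is the key
    p22 = x22 / r2  # proportion right when alternative 2 is the key
    delta_hat = p11 + p22 - 1.0
    var_hat = p11 * (1.0 - p11) / r1 + p22 * (1.0 - p22) / r2
    return delta_hat, var_hat


# 40 items keyed on each alternative; 30 and 25 right answers respectively.
d, v = delta_k2(30, 40, 25, 40)
print(d, round(v, 9))  # 0.375 0.010546875
```

The same variance is obtained by inverting the Fisher information of the two-parameter model, since in this saturated case (Δ, π1) is just a reparameterization of (p11, p22).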
6. Confidence Intervals for
6.1. General Case
The classic form of expressing the




In fact, in this situation a one-sided CI of the type

Agresti and Min [17] showed that, for discrete data such as these, it is more appropriate to obtain the CI by inverting a test, because this yields narrower CIs. It also has the advantage of making the results of the test and the CI consistent with each other. The principle is that if





by


under the null hypothesis and







where



determine the values of

A) When



B) When

C) When



D) Otherwise,

where


The

when



The difficulty with solving expression (25) is computational, because it has to be iterated in





6.2. The Case of K = 2
When





Among commercial programs, the most widely used is StatXact, statistical software for small-sample categorical and nonparametric data analysis [19].
Among free programs, the webpage http://www.ugr.es/local/bioest/software offers a large number of them, both exact and asymptotic, and for the case in which the


A very simple (and reliable) procedure which allows one to obtain the asymptotic CI for


where






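Under the assumed model form, Δ can also be written for K = 2 as a difference of two independent binomial proportions: p11 = Δ + (1 − Δ)π1 and p21 = 1 − p22 = (1 − Δ)π1 (the probability of choosing alternative 1 when alternative 2 is the key), so Δ = p11 − p21. Interval methods for a difference of proportions therefore apply directly. The sketch below uses the Agresti and Caffo adjustment of reference [21] (one success and one failure added to each group); whether it coincides with the expression used here is an assumption, and clipping to [−1, 1] is a choice made for this sketch. The function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def delta_ci_k2(x11: int, r1: int, x22: int, r2: int,
                level: float = 0.95) -> Tuple[float, float]:
    """Two-sided Agresti-Caffo interval for Delta = p11 - p21 when K = 2."""
    x21 = r2 - x22  # alternative 1 chosen when alternative 2 is the key
    # Add one success and one failure to each group.
    p1 = (x11 + 1) / (r1 + 2)
    p2 = (x21 + 1) / (r2 + 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p1 * (1 - p1) / (r1 + 2) + p2 * (1 - p2) / (r2 + 2))
    centre = p1 - p2
    # A difference of two proportions cannot leave [-1, 1].
    return max(-1.0, centre - z * se), min(1.0, centre + z * se)


print(delta_ci_k2(30, 40, 25, 40))
```

For 30 of 40 and 25 of 40 right answers, the 95% interval runs from roughly 0.16 to 0.56, around the point estimate 0.375.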
7. Treatment of Omitted Responses
Let us now consider how to treat omitted responses; i.e. when there is at least one



The first proposal assumes that the omissions occur because the student does not know the corresponding answers. The idea is therefore to estimate the degree of knowledge




Thus a student who answers 50 out of 100 questions and gets


The second proposal consists of imputing the omitted answers. Assuming the pattern given by the estimates


In this case, the confidence intervals are those obtained from this new data matrix
When the MCT is balanced, both methods give similar results. Otherwise, the imputation method implies a penalty for omissions that can be lower or higher, depending on the pattern of the vectors of probabilities


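One plausible reading of the imputation proposal, sketched under stated assumptions: each omitted item whose key is alternative i is replaced by its expected distribution of answers under the fitted model, i.e. mi·p̂ij is added to cell (i, j), where mi is a hypothetical count of omitted items keyed on alternative i and p̂ij = Δ̂·[i = j] + (1 − Δ̂)π̂j. The exact imputation rule used in the paper may differ, and the function name is hypothetical.

```python
from typing import List, Sequence


def impute_omissions(x: Sequence[Sequence[float]], omitted: Sequence[int],
                     delta: float, pi: Sequence[float]) -> List[List[float]]:
    """Completed table: x[i][j] + omitted[i] * p_ij under the fitted Delta model."""
    k = len(x)
    completed = []
    for i in range(k):
        row = []
        for j in range(k):
            p_ij = (delta if i == j else 0.0) + (1.0 - delta) * pi[j]
            row.append(x[i][j] + omitted[i] * p_ij)
        completed.append(row)
    return completed


# Four omitted items keyed on alternative 1, none on alternative 2.
print(impute_omissions([[10, 2], [3, 5]], [4, 0], 0.5, [0.5, 0.5]))
# [[13.0, 3.0], [3.0, 5.0]]
```

Each row total grows by exactly mi, so the completed table has the full design, and it can be analysed as if every item had been answered; the imputed cells are non-integer expected counts rather than observed data.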
8. Examples
Table 2 shows three examples of MCTs with

Table 2. Cases with data from the MCT with

which were considered in M & L's original paper [6]. This is a balanced test, and the results show that the original conditional estimates agree well with the maximum likelihood estimates; note that this includes the estimate of the variance


Table 3 covers two cases of MCT with



Table 4 shows the treatment of omitted answers according to the methods introduced in the previous section.
9. Discussion
In this paper the model Delta for MCTs has been revised and expanded. This model allows the knowledge level of an MCT taker to be assessed from a statistical perspective. It also allows some properties of this kind of test, such as the optimal number of choices or the test length, to be characterized objectively.
Given that the estimator

Regarding the central question in MCTs, namely deciding whether or not the examinee exceeds a given knowledge level


Because the inference methods proposed in this paper require a large amount of computation, a free program that carries them out is available on our group's website [23].
Table 3. Two examples of a test with

Table 4. Treatment of omitted answers. In this example




The behaviour of Delta can be observed by means of simulations. Figure 1 shows the point estimation of the parameter


Let us conclude by saying that the model Delta is constructed from formal and consistent standpoints [12]. It generalizes measures for evaluating knowledge that have already been considered from the classic point of view by measurement specialists [4] [7] [11]. Furthermore, this model has been successfully extended to cover more complex situations [13]-[15].
Figure 1. Behaviour of the model Delta. 50,000 simulations of MCT with K = 3 alternatives and




Table 5. Invariance of Delta and its SE under all possible re-arrangements (permutations) of the distracters.
Acknowledgements
This research was supported by the Spanish Ministry of Education and Science, grant number MTM2012-35591 (co-financed by the European Regional Development Fund).
Cite this paper
Pedro Femia-Marzo, Antonio Martín-Andrés (2014) Multiple Choice Tests: Inferences Based on Estimators of Maximum Likelihood. Open Journal of Statistics, 04, 466-483. doi: 10.4236/ojs.2014.46045
References
- 1. Tarrant, M., Ware, J. and Mohammed, A.M. (2009) An Assessment of Functioning and Non-Functioning Distractors in Multiple-Choice Questions: A Descriptive Analysis. BMC Medical Education, 9, 40-48. http://dx.doi.org/10.1186/1472-6920-9-40
- 2. Simkin, M.G. and Kuechler, W.L. (2005) Multiple-Choice Tests and Student Understanding: What Is the Connection? Decision Sciences Journal of Innovative Education, 3, 73-97. http://dx.doi.org/10.1111/j.1540-4609.2005.00053.x
- 3. Gronlund, N.E. and Waugh, C.K. (2008) Assessment of Student Achievement. Pearson, Upper Saddle River.
- 4. Scharf, E.M. and Baldwin, L.P. (2007) Assessing Multiple Choice Question (MCQ) Tests—A Mathematical Perspective. Active Learning in Higher Education, 8, 31-47. http://dx.doi.org/10.1177/1469787407074009
- 5. Lesage, E., Valcke, M. and Sabbe, E. (2013) Scoring Methods for Multiple Choice Assessment in Higher Education—Is It Still a Matter of Number Right Scoring or Negative Marking? Studies in Educational Evaluation, 39, 188-193. http://dx.doi.org/10.1016/j.stueduc.2013.07.001
- 6. Martín Andrés, A. and Luna del Castillo, J.D. (1989) Tests and Intervals in Multiple Choice Tests: A Modification of the Simplest Classical Model. British Journal of Mathematical and Statistical Psychology, 42, 251-263. http://dx.doi.org/10.1111/j.2044-8317.1989.tb00914.x
- 7. Budescu, D. and Bar-Hillel, M. (1993) To Guess or Not to Guess: A Decision-Theoretic View of Formula Scoring. Journal of Educational Measurement, 30, 277-291. http://dx.doi.org/10.1111/j.1745-3984.1993.tb00427.x
- 8. Bar-Hillel, M., Budescu, D. and Attali, Y. (2005) Scoring and Keying Multiple Choice Tests: A Case Study in Irrationality. Mind & Society, 4, 3-12. http://dx.doi.org/10.1007/s11299-005-0001-z
- 9. Espinosa, M.P. and Gardeazabal, J. (2010) Optimal Correction for Guessing in Multiple-Choice Tests. Journal of Mathematical Psychology, 54, 415-425. http://dx.doi.org/10.1016/j.jmp.2010.06.001
- 10. Hutchinson, T.P. (1982) Some Theories of Performance in Multiple-Choice Tests, and Their Implications for Variants of the Task. British Journal of Mathematical and Statistical Psychology, 35, 71-89.
- 11. Lord, F.M. and Novick, M.R. (1968) Statistical Theories of Mental Test Scores. Addison-Wesley, Menlo Park.
- 12. Martín Andrés, A. and Luna del Castillo, J.D. (1990) Multiple Choice Tests: Power, Length and Optimal Number of Choices Per Item. British Journal of Mathematical and Statistical Psychology, 43, 57-71. http://dx.doi.org/10.1111/j.2044-8317.1990.tb00926.x
- 13. Martín Andrés, A. and Femia Marzo, P. (2004) Delta: A New Measure of Agreement between Two Raters. British Journal of Mathematical and Statistical Psychology, 57, 1-19. http://dx.doi.org/10.1348/000711004849268
- 14. Martín Andrés, A. and Femia Marzo, P. (2005) Chance-Corrected Measures of Reliability and Validity in K × K Tables. Statistical Methods in Medical Research, 14, 473-492. http://dx.doi.org/10.1191/0962280205sm412oa
- 15. Martín Andrés, A. and Femia Marzo, P. (2008) Chance-Corrected Measures of Reliability and Validity in 2 × 2 Tables. Communications in Statistics-Theory and Methods, 37, 760-772. http://dx.doi.org/10.1080/03610920701669884
- 16. Peirce, C.S. (1884) The Numerical Measure of Success in Predictions. Science, 4, 453-454. http://dx.doi.org/10.1126/science.ns-4.93.453-a
- 17. Agresti, A. and Min, Y. (2001) On Small Sample Confidence Intervals for Parameters in Discrete Distributions. Biometrics, 57, 963-971. http://dx.doi.org/10.1111/j.0006-341X.2001.00963.x
- 18. Dunnett, C.W. and Gent, M. (1977) Significance Testing to Establish Equivalence between Treatments, with Special Reference to Data in the Form of 2 × 2 Tables. Biometrics, 33, 593-602. http://dx.doi.org/10.2307/2529457
- 19. Cytel (2014) StatXact Statistical Software. http://www.cytel.com/software-solutions/statxact
- 20. Group of Biostatistics of the University of Granada (Spain) (2014) Statistical Software. http://www.ugr.es/~bioest/software.htm
- 21. Agresti, A. and Caffo, B. (2000) Simple and Effective Confidence Intervals for Proportions and Difference of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54, 280-288.
- 22. Altman, D.G., Machin, D., Bryant, T.N. and Gardner, M.J. (2000) Statistics with Confidence. 2nd Edition, BMJ.
- 23. Femia Marzo, P. and Martín Andrés, A. (2014) Software Delta Website. http://www.ugr.es/local/bioest/Delta
- 24. Meyer, C.D. (2000) Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia.
Appendix I. Possible Values of
Since


where



increases with πi. Because




Appendix II. Estimation of the Parameters of the Model
The likelihood function for the model under consideration is

expression (2). Hence, if

This means that in order to obtain the estimators







1. Estimation of the Parameters πi
Since





where




If


Summing these gives

By adding by


From the above expressions one can deduce some conditions that must be satisfied by the constant








In addition, from expression (A4):

In order to estimate the parameters





from which:

As


then, from these three expressions and from expression (A10) one can deduce that:

and as a result:

With the goal of verifying that, of the two possible solutions given by expression (A10), only the one obtained with the positive sign




1) When




2) When





3) When














4) Finally, when




Because in this case




As the solution is always


when the goal is to estimate










2. Estimation of the Parameter Δ
In this section both the parameters πi and the parameter Δ are assumed to be unknown. Now,

When


then













When

Substituting this value into the equations of the previous section gives the following results. Substituting into expression (A9), each πi must satisfy:

so that expression (A10) (which, as we know, only makes sense for the “+” sign) indicates that:

As a result, the Equation (A15) becomes:

which, because it must be solved in

Once the value of




Finally, the inequality (A7) and the fact that the numerator in the first expression (A21) should be positive indicate, respectively, that:

3. Uniqueness of Solutions
In order to see that the solution to Equation (A20) is unique, let us look at the function












On the other hand, the function is convex with respect to








if









Similarly, in order to see whether the solution to the Equation (A15) is unique, let us look at the function






Since






expression (A12), and thus


expression (A22), and hence




Appendix III. Standard Error in the Estimators
With the aim of obtaining the matrix



by the elements



where:

Given the sampling scheme adopted (the values ri are fixed in advance), the following must hold:



And by defining



where

In order to determine


where obviously











In order to determine







where


By substituting the




The other elements of






Operating on both expressions, one finds that the


while


Obviously, the diagonal elements of


Replacing the unknown parameters by their respective estimators gives the corresponding estimators of the variances; i.e.


It should be pointed out that the magnitude of








the proposed result is thus obtained. Therefore, the minimum value for

when


NOTES
*Corresponding author.











