Open Journal of Statistics
Vol. 9, No. 4 (2019), Article ID: 94290, 9 pages

Bayesian Approach to Ranking and Selection for a Binary Measurement System

Mark Eschmann1, James D. Stamey2, Phil D. Young3, Dean M. Young4

1Department of Statistical Science, Baylor University, Waco, TX, USA

2Department of Statistical Science, Baylor University, Waco, TX, USA

3Department of Information Systems, Baylor University, Waco, TX, USA

4Department of Statistical Science, Baylor University, Waco, TX, USA

Copyright © 2019 by author(s) and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

Received: June 25, 2019; Accepted: August 10, 2019; Published: August 13, 2019


Binary measurement systems that classify parts as either pass or fail are widely used. Inspectors or inspection systems are often subject to error, and the error rates are unlikely to be identical across inspectors. We propose a random effects Bayesian approach to model the error probabilities and the overall conforming rate. We also introduce a subset-selection procedure to determine the best inspector in terms of overall classification accuracy. We provide simulation studies that demonstrate the viability of our proposed estimation, ranking, and subset-selection methods, and we apply the methods to a real data set.


Keywords: Bayesian Statistics, Quality Control, Binary Measurement Systems, Misclassification

1. Introduction

Repeated binary testing, often referred to as a binary measurement system (BMS), is regularly used in quality control studies as a means of assessing the quality of the units produced. However, these inspection methods are highly dependent on the quality of the individual inspectors, thus making the inspection itself an integral part of the quality control process. Two aspects of evaluating the inspection process are repeatability and reproducibility. A process's repeatability refers to how frequently a single inspector inspecting a single item will obtain the same result, while reproducibility refers to how often different inspectors inspecting the same item will reach the same conclusion. Estimating the classification rates of a system has been considered by several authors. [1] considered various sampling plans to assess the quality of a BMS. [2] derived maximum likelihood and method-of-moments estimators for the case of multiple raters, assuming fixed effects. When there are multiple inspectors, it may be of interest to determine which of several inspectors or inspection systems is performing best.

The model we consider here is a Bayesian version of [3]. Specifically, we consider a random effects model for multiple testers and multiple inspections per inspector from a Bayesian perspective. There are multiple advantages to using a Bayesian approach. For example, prior knowledge can be incorporated into the study with the use of informative prior distributions; this knowledge can be obtained either from previous data or from expert opinion. Also, even in the absence of prior knowledge, where the likelihood asymptotically dominates the prior, interval estimates generated under the Bayesian paradigm are based largely on the likelihood, which has been shown to be superior to other interval estimation methods [4]. Another advantage of the Bayesian paradigm is that if the prior is sufficiently informative, then the assumptions required for identifiability can be relaxed. Thus, our Bayesian approach can be used in situations where the parameters of the likelihood function are not identifiable. The Bayesian estimators considered here have no known closed form and, thus, must be approximated. We use Markov chain Monte Carlo (MCMC) simulation to sample from the model's posterior distribution and obtain parameter estimates.

The remainder of the paper is outlined as follows. In Section 2, we present the model and give identifiability assumptions. In Section 3, we describe our decision-theoretic procedure for ranking inspectors and selecting a superior subset. In Section 4, we apply our model and selection procedure to a real data set, and in Section 5, we present a simulation study that demonstrates the effectiveness of our subset-selection procedure. Finally, in Section 6, we provide several comments summarizing our results.

2. The Model

Assume that $N$ randomly selected items to be inspected are sampled from the general population of items. Let the true quality state of an item be denoted by $T$, where $T = 1$ indicates a good item and $T = 0$ denotes an item that fails to meet the quality specifications. The symbol $\tau$ denotes the overall conforming rate. Because we assume that no gold standard is used and because $T$ is a latent variable, we also assume $T \sim \text{Bernoulli}(\tau)$.

Repeated, independent, fallible observations are then obtained by $m$ different inspectors on the $i$th unit to indirectly assess its true state, where $i \in \{1, \ldots, N\}$. Let $Y_{ijk}$ denote the result of the $k$th inspection on the $i$th item by the $j$th inspector, where $Y_{ijk} = 1$ denotes a passed inspection, $Y_{ijk} = 0$ denotes a failed inspection, and $k \in \{1, \ldots, n_{ij}\}$. For each $Y_{ijk}$ and inspector $j$, we further define the conditional probabilities $\theta_{j,+} = P(Y_{ijk} = 1 \mid T_i = 0)$ (false positive rate) and $\theta_{j,-} = P(Y_{ijk} = 0 \mid T_i = 1)$ (false negative rate) with respect to the true state of the item, $T_i$. Further, assume

$(Y_{ijk} \mid T_i = 0) \sim \text{Bernoulli}(\theta_{j,+})$ (1)


$(Y_{ijk} \mid T_i = 1) \sim \text{Bernoulli}(1 - \theta_{j,-})$. (2)

Here, we initially assume that inspections are independent, given the true latent state of the $i$th part. This conditional-independence assumption yields

$\left( \sum_{k=1}^{l} Y_{ijk} \,\middle|\, T_i = 0 \right) \sim \text{Binomial}(l, \theta_{j,+})$. (3)

To relax the assumption that all inspectors have the same probability of classifying correctly, and to allow for other random heterogeneity, we consider the random effects model where

$\theta_{j,+} \sim \text{Beta}(\mu_+, \gamma_+), \quad \theta_{j,-} \sim \text{Beta}(\mu_-, \gamma_-)$, (4)

where the Beta distribution has been reparameterized such that $\mu = \alpha/(\alpha + \beta)$ and $\gamma = \alpha + \beta$. Thus, the reparameterized Beta probability density function (PDF) is

$f(x) = \dfrac{x^{\mu\gamma - 1}(1 - x)^{\gamma - \mu\gamma - 1}}{B(\mu\gamma, \gamma - \mu\gamma)}$. (5)
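The reparameterization in (5) can be checked directly by converting $(\mu, \gamma)$ back to the standard shape parameters $\alpha = \mu\gamma$ and $\beta = \gamma - \mu\gamma$. The sketch below (the function name is our own, not from the paper) evaluates the density using only the standard library:

```python
import math

def beta_pdf_mu_gamma(x, mu, gamma):
    """PDF of a Beta distribution reparameterized by its mean mu = a/(a+b)
    and concentration gamma = a + b, so that a = mu*gamma, b = gamma - mu*gamma."""
    a, b = mu * gamma, gamma - mu * gamma
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)

# Beta(mu=0.4, gamma=5) is the standard Beta(2, 3), whose density at 0.5 is 1.5
print(beta_pdf_mu_gamma(0.5, 0.4, 5))  # → 1.5
```

Under this parameterization the prior mean is $\mu$ regardless of $\gamma$, and $\gamma$ plays the role of a prior "sample size," which is what makes the reparameterization easy to elicit.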

To complete the hierarchical model, we require priors for $\mu_+$, $\mu_-$, $\gamma_+$, and $\gamma_-$. Specifically, we assume $\text{Beta}(\alpha_+, \beta_+)$ and $\text{Beta}(\alpha_-, \beta_-)$ priors for $\mu_+$ and $\mu_-$, respectively. Finally, $\text{Gamma}(c_+, d_+)$ and $\text{Gamma}(c_-, d_-)$ priors are used for $\gamma_+$ and $\gamma_-$, respectively. In the absence of prior information, $\text{Beta}(1, 1)$ priors can be used for $\mu_+$ and $\mu_-$, and diffuse Gamma priors for $\gamma_+$ and $\gamma_-$.
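To make the hierarchy concrete, the following sketch (our own illustrative code, not the authors' implementation) simulates latent states, inspector-specific error rates, and pass counts from the random effects model using Python's standard library:

```python
import random

def simulate_bms(N, m, n_rep, tau, mu_p, gamma_p, mu_m, gamma_m, seed=1):
    """Simulate pass counts x[i][j] from the random-effects BMS model:
    T_i ~ Bernoulli(tau); theta_{j,+} ~ Beta(mu_p*gamma_p, (1-mu_p)*gamma_p);
    theta_{j,-} ~ Beta(mu_m*gamma_m, (1-mu_m)*gamma_m); and, given T_i,
    each of the n_rep inspections is an independent Bernoulli trial."""
    rng = random.Random(seed)
    theta_fp = [rng.betavariate(mu_p * gamma_p, (1 - mu_p) * gamma_p)
                for _ in range(m)]  # false positive rates, one per inspector
    theta_fn = [rng.betavariate(mu_m * gamma_m, (1 - mu_m) * gamma_m)
                for _ in range(m)]  # false negative rates, one per inspector
    t = [1 if rng.random() < tau else 0 for _ in range(N)]  # latent states
    # a good item (t=1) passes with prob 1 - theta_fn[j];
    # a bad item (t=0) passes with prob theta_fp[j]
    x = [[sum(rng.random() < ((1 - theta_fn[j]) if t[i] else theta_fp[j])
              for _ in range(n_rep))
          for j in range(m)] for i in range(N)]
    return t, theta_fp, theta_fn, x
```

For example, `simulate_bms(50, 3, 3, 0.5, 0.15, 20, 0.1, 40)` mimics the settings used later in the paper's simulation study, with three inspectors and three repeats per item.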

We have chosen a Beta distribution to model the random effects because of its interpretability under this reparameterization. An often used alternative model structure is

$\text{logit}(\theta) \sim N(\mu, \sigma)$, (6)

where $\mu$ is generally given a normal prior and $\sigma$ is often given a half-$t$ or half-Cauchy prior.

For the parameters $\Theta = \{\theta_-, \theta_+\}$ and $\Psi := \{\mu_+, \gamma_+, \mu_-, \gamma_-, \tau\}$, the latent vector $t = [t_1, \ldots, t_N]$, and the observed data matrix $X = [x_1, \ldots, x_N]$, where $x_i = [x_{i,1}, \ldots, x_{i,m}]$, the joint density is

$f(x, t, \Theta \mid \Psi) = f(x \mid t, \Theta)\, f(t, \Theta \mid \Psi) = \tau^{\sum_{i=1}^{N} t_i} (1 - \tau)^{N - \sum_{i=1}^{N} t_i} \left( \prod_{j=1}^{m} (1 - \theta_{j,-})^{\sum_{i=1}^{N} t_i x_{ij} + \mu_-\gamma_- - 1} (\theta_{j,-})^{\sum_{i=1}^{N} t_i (n_{ij} - x_{ij}) + \gamma_- - \mu_-\gamma_- - 1} \right) \left( \prod_{j=1}^{m} (\theta_{j,+})^{\sum_{i=1}^{N} (1 - t_i) x_{ij} + \mu_+\gamma_+ - 1} (1 - \theta_{j,+})^{\sum_{i=1}^{N} (1 - t_i)(n_{ij} - x_{ij}) + \gamma_+ - \mu_+\gamma_+ - 1} \right)$. (7)

For the random effects model, the first assumption necessary for identifiability [3] is

$\mu_+ + \mu_- < 1$. (8)

The interpretation of (8) is that the overall expected probability of correctly classifying an item is greater than the probability of misclassifying it. This assumption is required due to the bimodal nature of the likelihood [4].

The second identifiability assumption assures that there are enough degrees of freedom to estimate all model parameters. This assumption requires two things: enough inspectors and inspections per inspector must be available to estimate the status of each item, and enough inspectors must be available to estimate the inspectors' random-effects parameters. The second condition requires at least two inspectors. Letting $l_j = \min(n_{1j}, \ldots, n_{Nj})$, a sufficient condition for the first requirement is that

$-1 + \prod_{j=1}^{m} (l_j + 1) \geq 2m + 1$. (9)

In the present model, (9) is sufficient because additional inspections do not harm the model identifiability.
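As a rough illustration of the degrees-of-freedom argument, condition (9) can be coded as a check that the number of distinct joint outcome patterns, less one probability constraint, is at least the number of free parameters ($2m$ error rates plus $\tau$). This reading of (9) is our reconstruction of the garbled original, and the function name is our own:

```python
from math import prod

def enough_df(l):
    """Check the degrees-of-freedom condition: with l[j] the minimum number
    of inspections by inspector j, the joint pass counts can take
    prod(l_j + 1) distinct patterns; after losing one degree of freedom
    (probabilities sum to 1), this must cover the 2m + 1 free parameters."""
    m = len(l)
    return prod(lj + 1 for lj in l) - 1 >= 2 * m + 1

# two inspectors with one inspection each fail the condition,
# while three repeat inspections per inspector satisfy it
print(enough_df([1, 1]), enough_df([3, 3]))  # → False True
```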

The third identifiability assumption is that both true negatives ($T_i = 0$) and true positives ($T_i = 1$) exist in the sample. This assumption is necessary because, for example, in the absence of true negatives one cannot estimate the false positive rates. [3] demonstrated that the absence of either true negatives or true positives essentially implies that there are enough data to estimate only half of the parameters, namely $\theta_+$, $\mu_+$, and $\gamma_+$, or $\theta_-$, $\mu_-$, and $\gamma_-$. We remark that the last two identifiability requirements can be omitted if one uses sufficiently informative priors on at least some parameters.

3. Ranking and Selecting Inspectors

Suppose we are interested in determining which inspector has the lowest overall error rate. Here, we combine the false positive and false negative rates into a single positive likelihood ratio (LR), $\eta_j = (1 - \theta_{j,+})/\theta_{j,-}$. Whichever inspector has the highest likelihood ratio is determined to be the best. The positive likelihood ratio may not always be the most appropriate combination of the error rates; it is simply the one we use here as an example. In some cases, the negative likelihood ratio, $\eta_j = \theta_{j,+}/(1 - \theta_{j,-})$, or even a weighted sum of $\theta_+$ and $\theta_-$ may be more appropriate; this choice can be made on a case-by-case basis. We follow the method of [5], who derived a decision-theoretic approach to partition parameters into two sets based on an ordering of the parameters of interest. [6] extended this work to determine the largest Poisson rate when counts are subject to misclassification. Here, we apply the method to partition a group of inspectors into a superior set, $S$, and an inferior set, $S^C$.

In creating a best subset, there are $m$ separate two-state decision problems. Each decision involves whether or not to place an inspector's likelihood ratio in the superior set, $d_{+k}: \eta_k \in S$. We assign the following constant loss functions:

$L_{+k}(\eta) = \begin{cases} 0 & \text{if } \eta_k = \eta_{best} \\ c_1 & \text{if } \eta_k \neq \eta_{best} \end{cases} \quad \text{and} \quad L_{-k}(\eta) = \begin{cases} c_2 & \text{if } \eta_k = \eta_{best} \\ 0 & \text{if } \eta_k \neq \eta_{best} \end{cases}$ (10)

where $L_{+k}$ and $L_{-k}$ are the loss functions for decisions $d_{+k}$ and $d_{-k}$, respectively. To make a decision, only the ratio $c = c_2/c_1$ is required. These loss functions determine the decision criterion: take action $d_{+k}$ and include $\eta_k$ as a candidate for the largest parameter if $P(\eta_k = \eta_{best} \mid x) \geq 1/(c + 1)$. Generally, $c_2 > c_1$ because failing to place the best $\eta_k$ in $S$ is the more serious error; thus, $c$ should be chosen larger than 1.

The probability that $\eta_i$ is the largest of the likelihood ratios is

$P(\eta_i = \eta_{best} \mid x) = \int_0^1 \int_0^{\eta_i} \cdots \int_0^{\eta_i} p(\eta \mid x)\, d\eta_1 \cdots d\eta_{i-1}\, d\eta_{i+1} \cdots d\eta_m\, d\eta_i$, (11)

where $p(\eta \mid x)$ is the marginal posterior of the likelihood ratios. MCMC methods are used to approximate (11) numerically. To accomplish this task, we generate a sample $(\eta_k^{(1)}, \eta_k^{(2)}, \ldots, \eta_k^{(B)})$ of size $B$ from the posterior distribution for $k = 1, 2, \ldots, m$, and then approximate the posterior probability that $\eta_k$ is the best parameter by

$\hat{P}(\eta_k = best(\eta_1, \ldots, \eta_m) \mid x) = \dfrac{\#\left( \eta_k^{(i)} = best(\eta_1^{(i)}, \ldots, \eta_m^{(i)}) \right)}{B}$, (12)

where $k = 1, \ldots, m$, $i = 1, 2, \ldots, B$, and $B$ is the Monte Carlo sample size.

The parameter $\eta_k$ is included in the superior set $S$ if

$\hat{P}(\eta_k = best(\eta_1, \ldots, \eta_m) \mid x) \geq 1/(c + 1), \quad k = 1, \ldots, m$. (13)
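Given posterior draws of the likelihood ratios from any MCMC sampler, the Monte Carlo approximation (12) and the selection rule (13) reduce to a few lines. The following sketch uses illustrative names of our own:

```python
def superior_set(eta_draws, c=10.0):
    """Given MCMC draws eta_draws[b][k] of the likelihood ratios for m
    inspectors, estimate P(eta_k is largest | x) as the fraction of draws
    in which inspector k has the largest ratio, then place inspector k in
    the superior set S when that probability is at least 1/(c + 1)."""
    B, m = len(eta_draws), len(eta_draws[0])
    wins = [0] * m
    for draw in eta_draws:
        wins[max(range(m), key=lambda k: draw[k])] += 1  # tally the winner
    p_best = [w / B for w in wins]
    S = [k for k in range(m) if p_best[k] >= 1.0 / (c + 1.0)]
    return p_best, S

# tiny illustration: inspector 0 wins 3 of 4 draws, inspector 2 wins 1
draws = [[3, 1, 2], [2, 1, 3], [3, 2, 1], [3, 1, 2]]
print(superior_set(draws, c=10.0))  # → ([0.75, 0.0, 0.25], [0, 2])
```

With $c = 10$, the inclusion threshold is $1/11 \approx 0.091$, so in the toy example both inspectors 0 and 2 land in $S$; only inspector 1, who never has the largest ratio, is excluded.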

4. Example

As an example, we consider data from [4] on a sample of 38 prints produced by inkjet cartridges. Three inspectors analyzed each print 3 times. Only the total number of passes out of the 9 inspections was provided, so for illustrative purposes, for those parts that did not have 0 or 9 passes, we distributed the number of passes across the three inspectors in a way that best matched the frequentist parameter estimates provided in [4]. We assign Beta(1, 9) priors to both $\mu_+$ and $\mu_-$ because both quantities are expected to be considerably below 0.50. Our expert was 95% certain that both misclassification rates were less than 0.40, and a Beta(1, 9) prior appropriately modeled this uncertainty. These distributions have prior 95% intervals of (0.003, 0.336) and an equivalent sample size of 10 observations, and, therefore, would be considered mildly informative. A Beta(1, 1) prior is used for $\tau$, and Gamma(0.1, 0.1) priors are used for both $\gamma_+$ and $\gamma_-$. A burn-in of 10,000 iterations was used, and inferences were based on the 20,000 subsequent iterations. The posterior summaries for each model parameter are provided in Table 1.
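The stated 95% prior interval for the Beta(1, 9) prior can be verified in closed form, since a Beta(1, b) distribution has CDF $F(x) = 1 - (1 - x)^b$. The helper below is our own check, not part of the paper:

```python
def beta1b_quantile(p, b):
    """Quantile of Beta(1, b): its CDF is F(x) = 1 - (1 - x)**b, so the
    p-quantile is 1 - (1 - p)**(1/b) in closed form."""
    return 1.0 - (1.0 - p) ** (1.0 / b)

# central 95% prior interval for a Beta(1, 9) misclassification rate
lo, hi = beta1b_quantile(0.025, 9), beta1b_quantile(0.975, 9)
print(round(lo, 3), round(hi, 3))  # → 0.003 0.336
```

The same CDF gives $P(\theta < 0.40) = 1 - 0.6^9 \approx 0.99$, consistent with an expert who is at least 95% certain the misclassification rates fall below 0.40.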

Table 1. Posterior summary for [4] example.

From Figure 1, we see that when the error rates are combined into the positive likelihood ratio, where a higher value is better, Inspector 1 has the highest overall LR. To apply the decision-theoretic procedure to determine whether any inspector is "best," we compute the posterior probability of each likelihood-ratio parameter being the largest. A value of $c = 10$ implies that it is 10 times worse to leave the best inspector out of the superior set than to put an inferior inspector in it; the critical probability is then $1/(10 + 1) \approx 0.091$. The posterior probabilities that Inspectors 1, 2, and 3, respectively, have the largest likelihood ratio are 0.891, 0.083, and 0.026. Only Inspector 1 exceeds the 0.091 probability threshold, so Inspector 1 is the only inspector placed in the superior set.

5. A Simulation Study

We conducted a simulation study to determine the effectiveness of the subset-selection procedure. We set the number of inspectors to $m = 7$ and the number of repeat inspections to $l = 3$. For $\tau = 0.5$, $\mu_+ = 0.15$, $\mu_- = 0.1$, $\gamma_+ = 20$, and $\gamma_- = 40$, we generated a single set of $\theta_{j,+}$'s and $\theta_{j,-}$'s. The values of $\theta_{j,+}$ and $\theta_{j,-}$ and the corresponding likelihood ratios are presented in Table 2.

The prior distributions used were

$\mu_+ \sim \text{Beta}(1, 1)$, (14)

$\mu_- \sim \text{Beta}(1, 1)$, (15)

$\gamma_+ \sim \text{Gamma}(0.1, 0.1)$, (16)

$\gamma_- \sim \text{Gamma}(0.1, 0.1)$, (17)

and

$\tau \sim \text{Beta}(1, 1)$. (18)

Figure 1. Posterior distributions of likelihood ratios.

Table 2. Misclassification parameters for simulation study.

Table 3. Simulation results for N = 50 .

Table 4. Simulation results for N = 100 .

Table 5. Simulation results for N = 200 .

Thus, relatively non-informative priors were employed for all parameters. We considered sample sizes of $N = 50$, 100, and 200 and generated 1000 data sets for each sample size. We monitored the probability that each likelihood ratio is the largest and the 95% credible set of the rank for each $\eta_k$. These results are provided in Tables 3-5. For the decision-theory problem, we used $c = 10$ and thus also monitored whether the true "best" inspector was included in the superior set, as well as the average size of the superior set. Because this paper focuses on the ranking and selection methods, those are the simulation results we report here. We also monitored posterior means and found that they were close to the true values, with small bias and interval coverage close to nominal for all parameters. The bias and coverage results are available upon request.

For all simulations, Inspector 6, the "best" inspector, yielded the highest probability of having the largest likelihood ratio and, therefore, was correctly estimated to be the best inspector most often. Also, the credible intervals on the ranks for Inspector 6 were consistently closest to the top rank. Conversely, Inspector 2, the "worst" inspector, produced the lowest probability of having the largest likelihood ratio and was correctly considered the worst inspector most often. Inspector 2 also yielded credible intervals for the rank with the largest values, implying that this inspector was generally ranked last. Thus, both the ranking and selection procedures performed well.

For all three considered sample sizes, the probability of the "best" inspector being included in the superior set was greater than 0.9. The average size of the superior set was 2.8 for a sample size of 50, 2.4 for a sample size of 100, and 2.2 for a sample size of 200.

6. Conclusions

In this paper, we have proposed a Bayesian random effects model for a binary measurement system. As shown in our real data example, combining the data with mildly informative priors yields an identifiable model in which inferences can be made on the overall classification rates along with comparisons of individual inspectors. Our simulation study shows that for moderate sample sizes, even when prior information is not available, the procedure works well, with the best inspector being included in the superior set a large percentage of the time.

The methods we have proposed could be extended to compare overall defective rates and classification probabilities of manufacturing plants rather than individual inspectors, as we have done here. Extending the approach from binary to continuous measurements is also of potential interest.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Eschmann, M., Stamey, J.D., Young, P.D. and Young, D.M. (2019) Bayesian Approach to Ranking and Selection for a Binary Measurement System. Open Journal of Statistics, 9, 436-444.


References

1. Danila, O., Steiner, S.H. and MacKay, R.J. (2008) Assessing a Binary Measurement System. Journal of Quality Technology, 40, 310-318.

2. van Wieringen, W.N. (2008) Measurement System Analysis for Binary Data. Technometrics, 50, 468-478.

3. Danila, O., Steiner, S.H. and MacKay, R.J. (2012) Assessing a Binary Measurement System with Varying Misclassification Rates Using a Latent Class Random Effects Model. Journal of Quality Technology, 44, 179-191.

4. Boyles, R.A. (2001) Gauge Capability for Pass-Fail Inspection. Technometrics, 43, 223-229.

5. Bratcher, T.L. and Bhalla, P. (1974) On the Properties of an Optimal Selection Procedure. Communications in Statistics: Simulation and Computation, 3, 191-196.

6. Stamey, J.A., Bratcher, T.L. and Young, D.M. (2004) Parameter Subset Selection and Multiple Comparisons of Poisson Rate Parameters with Misclassification. Computational Statistics & Data Analysis, 45, 467-479.