unit. Further suppose the prior on is such that and let represent the probability that the coin is fair.
The expected loss table is given in Table 4. From the description we see so that if then a priori we are indifferent between saving the coin or throwing it away. Then the expected risk of an immediate decision is:
We may also consider sampling data by flipping the coin, where each flip is assumed to cost 0.1 units. We now calculate the range of where it is beneficial to flip the coin. To do so we determine posteriors on after observing the possible results of a coin flip:
The predictive probability of observing heads or tails, at any point, is:
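A minimal Python sketch of the posterior update and the predictive probability for a coin that is either fair or biased is given here. The heads probability of the biased coin (3/4) and the function names are illustrative assumptions and are not values taken from the example.

```python
from fractions import Fraction


def predictive_heads(p_fair, q_biased):
    """Marginal probability of heads when the coin is fair with probability p_fair."""
    return p_fair * Fraction(1, 2) + (1 - p_fair) * q_biased


def posterior_fair(p_fair, q_biased, heads):
    """Posterior probability that the coin is fair after one flip (Bayes' rule)."""
    like_fair = Fraction(1, 2)
    like_biased = q_biased if heads else 1 - q_biased
    numerator = p_fair * like_fair
    return numerator / (numerator + (1 - p_fair) * like_biased)


# Hypothetical bias: the biased coin lands heads with probability 3/4.
q = Fraction(3, 4)
p = Fraction(1, 2)
print(predictive_heads(p, q))             # 5/8
print(posterior_fair(p, q, heads=True))   # 2/5
print(posterior_fair(p, q, heads=False))  # 2/3
```

Averaging the two posteriors with the predictive probabilities as weights returns the prior, which is a quick consistency check on any such update.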
Letting denote our posterior probability of the coin being biased after being flipped, the risk profile of the decision following an observation is:
Now we can relate the bounds on to bounds on:
Suppose, then the expected loss is:
We also need to include the cost of flipping (0.1 units), resulting in an expected loss for observing once then deciding of. To see when this risk is preferable to deciding immediately we solve the following inequalities for:
Observations should continue to be taken until leaves this interval, at which point a decision should be made. Hence the expected risk profile is:
With this risk profile we now compute the expected loss assuming we know the parameter. There are two cases: when or. In each case the expected loss as a function of is computed. The process is identical to the above so we just report them:
Recall the prior on was such that. Hence, the expected risk after learning is:
Thus the expected value of perfect information is the difference between Equation (14) (without perfect information) and Equation (17) (with perfect information):
This represents the maximum number of units we should be prepared to forgo in order to be informed of the true value of the cost parameter prior to commencing the SPRT. From this we obtain a new function that represents the loss resulting from the occurrence of a Type II error:
Equation (19) represents the expected value of the loss of making a Type II error, but discounted by the fact that we obtain information which allows more informed decisions to be made in subsequent SPRTs. A plot of (19) is given in Figure 2 where it can be observed that local minima in the expected loss occur at boundaries of indifference between choices in the initial SPRT, and that plateaus in the expected loss coincide with values of where it is never beneficial to take an observation for any value of.
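The construction of the discounted Type II loss can be illustrated numerically. The sketch below is a simplification under stated assumptions: it uses a one-step look-ahead in place of the full sequential risk profile, a hypothetical two-point prior on the Type II cost (values 1 and 4 with weights 2/3 and 1/3, chosen only so that the mean is 2), and a hypothetical heads probability of 0.75 for the biased coin. The observation cost of 0.1 follows the example. The function names are not taken from the paper.

```python
def immediate_risk(p, loss1, loss2):
    """Expected loss of deciding now, where p = P(state w0).

    Choosing d1 when w0 holds costs loss1 (Type I);
    choosing d0 when w1 holds costs loss2 (Type II).
    """
    return min(loss1 * p, loss2 * (1.0 - p))


def lookahead_risk(p, loss1, loss2, cost, lik0, lik1):
    """Smaller of deciding now and paying `cost` for one observation first.

    lik0 and lik1 are P(heads | w0) and P(heads | w1).
    """
    after = 0.0
    for l0, l1 in ((lik0, lik1), (1.0 - lik0, 1.0 - lik1)):
        marginal = p * l0 + (1.0 - p) * l1  # predictive probability of outcome
        if marginal > 0.0:
            posterior = p * l0 / marginal
            after += marginal * immediate_risk(posterior, loss1, loss2)
    return min(immediate_risk(p, loss1, loss2), cost + after)


def evpi(p, loss1, thetas, probs, cost, lik0, lik1):
    """Risk using the mean Type II cost minus expected risk once it is known."""
    mean_theta = sum(t * w for t, w in zip(thetas, probs))
    without = lookahead_risk(p, loss1, mean_theta, cost, lik0, lik1)
    with_info = sum(w * lookahead_risk(p, loss1, t, cost, lik0, lik1)
                    for t, w in zip(thetas, probs))
    return without - with_info


def discounted_type2_loss(p, loss1, thetas, probs, cost, lik0, lik1):
    """Mean Type II cost reduced by the value of learning its true value."""
    mean_theta = sum(t * w for t, w in zip(thetas, probs))
    return mean_theta - evpi(p, loss1, thetas, probs, cost, lik0, lik1)


# Hypothetical inputs (see the assumptions stated above).
THETAS, PROBS = (1.0, 4.0), (2.0 / 3.0, 1.0 / 3.0)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(discounted_type2_loss(p, 2.0, THETAS, PROBS, 0.1, 0.5, 0.75), 4))
```

Because the look-ahead risk is a minimum of functions that are linear in the Type II cost, it is concave in that cost, so the value of information computed this way is never negative.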
2.2. Noisy Information
Now assume we only receive noisy observations concerning meaning that following observation we are not certain of its value. The procedure is similar to that for perfect information and again we first perform an SPRT without considering the value of the information.
Denoting the true value of as and our observation as then this setting means. We also allow the distribution over what is observed to depend on the true value and hence generate a likelihood, from which a marginal distribution, expected value, and posterior distribution may be calculated in the usual way.
Figure 2. A plot of the expected loss incurred from committing a Type II error for Example 2.1 generated from Equation (19). The x-axis varies over the prior probability for the state of nature w, whilst the y-axis indicates the resulting expected loss.
For each potential a new expected value of is generated. Denoting as the expected loss before observation and as the expected loss after observing, the expected value of the noisy observation is calculated as:
Once this has been generated the consequence of the error will be reduced in the risk table just as was the case with perfect information, allowing a classical SPRT to be performed.
2.3. Noisy Information Numerical Example
We return to the setting of Example 2.1, but now assume that the probability that the true value is observed is only 0.8, i.e.,. This results in and.
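The marginal and posterior calculations for an observation that is correct with probability 0.8 can be sketched as follows. The prior weights of 2/3 and 1/3 mirror the split of trials described in Section 2.4; the two cost values (1 and 4), the assignment of the larger weight to the smaller value, and the function names are hypothetical placeholders.

```python
from fractions import Fraction


def noisy_update(prior, accuracy):
    """Marginal and posterior for a two-valued parameter seen through noise.

    prior maps each value to its prior probability; the true value is
    reported with probability `accuracy`, the other value otherwise.
    Returns {observed: (marginal probability, posterior dict)}.
    """
    result = {}
    for observed in prior:
        joint = {value: (accuracy if value == observed else 1 - accuracy) * weight
                 for value, weight in prior.items()}
        marginal = sum(joint.values())
        posterior = {value: mass / marginal for value, mass in joint.items()}
        result[observed] = (marginal, posterior)
    return result


def expected_value(dist):
    """Mean of a distribution given as {value: probability}."""
    return sum(value * weight for value, weight in dist.items())


# Hypothetical cost values 1 and 4 with prior weights 2/3 and 1/3.
prior = {1: Fraction(2, 3), 4: Fraction(1, 3)}
for observed, (marginal, posterior) in noisy_update(prior, Fraction(4, 5)).items():
    print(observed, marginal, expected_value(posterior))
# prints:
# 1 3/5 4/3
# 4 2/5 3
```

Weighting the two posterior means by the marginal probabilities recovers the prior mean of 2, as it must.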
After observing a value for we update its expected value to the following:
Letting represent the expected loss when (so, for example, is the loss from step 1 where future trials are not considered), then the value of information from our noisy observation for each is calculated as:
Note that each term in the above implicitly depends on the initial value assigned to. We then simply proceed as in Example 2.1 to obtain the final decision rule. The resulting loss tables are provided below for the three quantities listed in Equation (23):
As a result the expected value of noisy information is calculated as:
Now the new expected cost of a Type II error for the noisy information example can be determined as in Equation (26). A plot of this function is given in Figure 3, which can be contrasted with the perfect information case given in Figure 2. Note that, as before, the minima occur at boundaries of indifference and that plateaus occur where we would always (or never) take an observation no matter the value of. Also note that, in comparison to Figure 2, noisy information yields a larger expected cost of Type II error when the true value of does play a role in the decision making. This is to be expected, since noisy information is weaker and less useful than perfect information.
Figure 3. A plot of the expected loss incurred from committing a Type II error for Example 2.3 generated from Equation (28). The x-axis varies over the prior probability for the state of nature w, whilst the y-axis indicates the resulting expected loss.
2.4. Numerical Simulation
Details of a numerical simulation are now provided. The scenario detailed in Example 2.1 was tested in R by considering the outcome of 3 million trials of both the classical and adaptive frameworks.
Each classical trial consisted of:
1) An SPRT with consequence of Type I/II error of 2 and cost of observation 0.1 is run repeatedly until a Type II error is made. The bounds used are those in Equation (14), namely, before the value of information is considered.
2) Upon making a Type II error, the cost from that particular SPRT is stored. The value of is then learned and another SPRT is run using the true value for the consequence of Type II error. The two costs are added to provide the total value for that trial.
In accordance with our prior on, two-thirds of the trials were performed with the true value of, while the others had.
A further 3 million trials were then run using the adaptive framework under the same procedure but with different bounds in step 1. This is due to the different values used for the consequence of a Type II error seen in Equation (19). Using initial values of corresponded to using, resulting in bounds of approximately. The second step remains the same as in the classical trial.
The average costs are given in Table 5. As can be seen, this indicates a substantial improvement (21% in the numerical scenario here) from using the adaptive framework and formally taking such uncertainty into account.
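To indicate how a single SPRT of this kind might be simulated, a minimal Python sketch is given here. It is not the R code used for the reported results and it does not reproduce the two-stage trial structure or the figures in Table 5. The bounds, the heads probability of the biased coin, the seed, and all function names are hypothetical.

```python
import random


def run_sprt(rng, true_state, p0, lower, upper, lik0, lik1,
             cost, loss1, loss2, max_obs=1000):
    """Simulate one Bayesian SPRT; return (decision, observations, total loss).

    p tracks P(state 0). Sampling continues while lower < p < upper.
    lik0 and lik1 are P(success | state 0) and P(success | state 1).
    """
    p, n = p0, 0
    while lower < p < upper and n < max_obs:
        success = rng.random() < (lik0 if true_state == 0 else lik1)
        l0 = lik0 if success else 1.0 - lik0
        l1 = lik1 if success else 1.0 - lik1
        p = p * l0 / (p * l0 + (1.0 - p) * l1)  # Bayes update
        n += 1
    # Bayes decision: accept state 0 when that has the smaller expected loss.
    decision = 0 if loss2 * (1.0 - p) <= loss1 * p else 1
    error_loss = 0.0
    if decision == 1 and true_state == 0:
        error_loss = loss1  # Type I error
    elif decision == 0 and true_state == 1:
        error_loss = loss2  # Type II error
    return decision, n, n * cost + error_loss


def average_cost(trials, seed, p0, **kwargs):
    """Monte Carlo average of the total loss, drawing the state from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_state = 0 if rng.random() < p0 else 1
        total += run_sprt(rng, true_state, p0, **kwargs)[2]
    return total / trials


# Hypothetical bounds and likelihoods, for illustration only.
settings = dict(lower=0.3, upper=0.7, lik0=0.5, lik1=0.75,
                cost=0.1, loss1=2.0, loss2=2.0)
print(average_cost(10_000, seed=1, p0=0.5, **settings))
```

Comparing two frameworks then amounts to calling `average_cost` with two different sets of bounds and the same seed, so that both see the same stream of random draws.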
2.5. Statistical Dependence
To conclude we give a brief discussion on the effect of there being statistical dependence between the state of nature w and the cost parameter. Without loss of generality, consider a joint distribution as taking on the values (and associated probability) given in Table 6. This implies conditional probabilities as given in Table 7. Note that this specification ensures that w and are not independent.
Now consider the implementation of the SPRT. The initial loss table when w and were independent is given in Table 8. However, note that we can only incur losses governed by when the state of nature is. So any loss that occurs in the joint distribution when is true should not be considered here. Also note that an equivalent scenario will occur if the uncertainties were in both Type I and Type II errors. Thus, Table 8 should be corrected to that given in Table 9, where, as can be seen, remains constant at independently of the value of, and hence the value of. This means the SPRT will have constant losses that do not change between observations, and so we simply proceed as before.
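The step from a joint distribution to the conditional distribution of the cost parameter, and then to the constant loss entering the corrected table, can be sketched as follows. The joint probabilities, the cost values 1 and 4, and the state labels 0 and 1 are hypothetical and are not the entries of Table 6.

```python
from fractions import Fraction


def conditional_given_state(joint, state):
    """P(theta | w = state) from a joint table {(w, theta): probability}."""
    marginal = sum(prob for (w, _), prob in joint.items() if w == state)
    return {theta: prob / marginal
            for (w, theta), prob in joint.items() if w == state}


def conditional_mean(joint, state):
    """E[theta | w = state], the loss that enters the corrected table."""
    cond = conditional_given_state(joint, state)
    return sum(theta * prob for theta, prob in cond.items())


# Hypothetical joint distribution over (w, theta); not the values of Table 6.
joint = {
    (0, 1): Fraction(3, 10), (0, 4): Fraction(1, 10),
    (1, 1): Fraction(2, 10), (1, 4): Fraction(4, 10),
}
print(conditional_given_state(joint, 1))
print(conditional_mean(joint, 1))  # 3
```

With these numbers the unconditional mean of the cost is 5/2, whereas the mean conditional on state 1 is 3; it is the latter, which does not vary with the probability assigned to the state of nature, that is relevant to the loss table.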
Table 5. Average costs from the simulation described in Section 2.4.
Table 6. Assumed joint distribution between w and.
Table 7. Implied conditional probabilities.
Table 8. Initial loss table in the case of independence.
Table 9. Loss table in the case of statistical dependence.
3. Unknown Observation Cost
Now suppose that the costs of making a Type I () or Type II () error are known. This means that if we were to implement an immediate decision the expected loss will be unchanged from the classical setting. However, we assume the observation cost is uncertain but subject to some prior distribution and some specified data likelihood. The expected loss of making a decision after observation must then take into account not only the uncertainty concerning the information we may receive about the true state of nature, but also the uncertainty in the additional cost of having taken a further observation.
If we take the expected value of, as the observation cost, then we can determine bounds on values of within which we should seek additional data before implementing a decision. The expected risk profile (expected loss), as a function of would then be:
where is a concave (or linear) function of determined by the data generating mechanism. Then, for each possible information statement i we may receive (where here i contains both the information concerning the true state of nature and any information we gain concerning the cost of sampling), we can determine a posterior distribution on and updated expected value. With this we continue the SPRT leading to updated intervals which if does not fall within, would result in our now taking an immediate decision. The updated risk table would now have form:
Here is another concave (or linear) function in.
As the information i we may receive is currently unknown, we take the expectation of Equation (30). Subtracting this from Equation (29) (the expected risk without learning information) we obtain the expected value of that information, which can be thought of as the most we would be willing to pay for it in advance of seeing it. This should now be subtracted from, the original expected observation cost, to obtain what we would use as the adaptive information cost for the adaptive SPRT. Note that this value will be a function of. A classical SPRT is then performed with this adaptive observation cost until the true cost has been learned, at which point the test continues with the cost uncertainty removed, i.e., in the classical way.
Remark. The expected value of information is zero for any (the bounds on for which we would take further samples) and also for any that is always contained in.
As a toy example to aid in clarification of the above, suppose we are testing the efficacy of a drug and are certain of the costs incurred in making a Type I or Type II error (say 2 and 4 units respectively). Assume, however, that we have little experience in running clinical trials (our observation costs) and are not sure if it will be easy and cheap to organise () or relatively expensive (). Prior beliefs are that it is more likely to be cheap so that. Also suppose that the probability a bad drug passes the clinical trial is 0.5 whilst the probability that a drug that works passes is 0.8.
As we begin testing of the first drug we determine how to modify the SPRT procedure to take into account this uncertainty. Interest lies in the expected value of information of the observation cost, and we assume that the information will be of a perfect nature (namely remove all uncertainties). Noting that, the risk profile, without information, is:
So if we take a further observation and hence also determine the true value of. This leads to two possible further risk profiles depending on whether we learn or.
Recalling the prior on is such that leads to an expected risk after learning information of:
Subtracting Equation (34) (expected risk with knowledge of) from Equation (31) provides the expected value of perfect information for the observation cost:
A plot of Equation (35) is provided in Figure 4. Note that the areas where the expected value of information is zero are where the decision rule is the same regardless of the information concerning the cost of sampling, agreeing with our earlier remark, and that the expected value of sampling information increases to be maximal where we are currently indifferent between making an immediate decision or taking further samples. With this to hand, we would continue by performing the SPRT as if we had an observation cost of, and if we do take an observation we learn the true value of and continue the SPRT with this knowledge.
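A one-step look-ahead version of this calculation can be sketched in Python. The error costs (2 and 4) and the pass probabilities (0.5 and 0.8) follow the toy example. The two observation costs (0.1 and 0.5), the prior weight of 0.7 on the cheap cost, the assignment of the Type I loss to approving a drug that does not work, and the function names are assumptions made for illustration. The sketch is one possible reading and not the full sequential risk profile.

```python
def immediate_risk(p, loss1, loss2):
    """Expected loss of deciding now, where p = P(the drug works).

    Assumed mapping: approving a bad drug costs loss1 (Type I),
    rejecting a working drug costs loss2 (Type II).
    """
    return min(loss1 * (1.0 - p), loss2 * p)


def risk_after_trial(p, loss1, loss2, pass_bad, pass_good):
    """Expected immediate risk after one trial result, before sampling cost."""
    total = 0.0
    for l_good, l_bad in ((pass_good, pass_bad), (1.0 - pass_good, 1.0 - pass_bad)):
        marginal = p * l_good + (1.0 - p) * l_bad
        if marginal > 0.0:
            total += marginal * immediate_risk(p * l_good / marginal, loss1, loss2)
    return total


def evpi_observation_cost(p, costs, probs, loss1=2.0, loss2=4.0,
                          pass_bad=0.5, pass_good=0.8):
    """One-step value of learning the observation cost before deciding to sample."""
    now = immediate_risk(p, loss1, loss2)
    later = risk_after_trial(p, loss1, loss2, pass_bad, pass_good)
    mean_cost = sum(c * w for c, w in zip(costs, probs))
    without = min(now, mean_cost + later)
    with_info = sum(w * min(now, c + later) for c, w in zip(costs, probs))
    return without - with_info


# Hypothetical observation costs and prior weights.
COSTS, WEIGHTS = (0.1, 0.5), (0.7, 0.3)
for p in (0.1, 1.0 / 3.0, 0.5, 0.9):
    print(round(p, 3), round(evpi_observation_cost(p, COSTS, WEIGHTS), 4))
```

The value is zero whenever both possible costs lead to the same choice between deciding immediately and sampling, which is the behaviour described in the remark above.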
In this paper we have considered the generalisation of SPRTs from a classical to an adaptive utility setting, where preferences or associated costs are not assumed fully known but are instead learned through experience or by funding additional information through a survey, trial, etc. An unknown cost of Type I/II error was examined first, before the effect of an uncertain observation cost was considered.
Both perfect and noisy information were discussed, where we demonstrated the methods of quantifying the value for such information and numerical examples were provided to demonstrate the theory. Statistical dependence between the parameter and the state of nature was also considered and shown to not influence results. The numerical simulation indicated the enhanced performance by formally treating uncertainties and opportunities to learn within a SPRT in comparison to the somewhat easier modelling assumption of equating uncertainties in costs to their expected values.
Figure 4. A plot of the expected value of information in Example 3.1 given by Equation (35).
Cite this paper
McMeel, C. and Houlding, B. (2016) Incorporating Uncertain Costs within a Series of Sequential Probability Ratio Tests. Open Journal of Statistics, 6, 882-897. http://dx.doi.org/10.4236/ojs.2016.65073