An important problem with null hypothesis significance testing, as it is normally performed, is that it is uninformative to reject a point null hypothesis [1]. A way around this problem is to use range null hypotheses [2]. But the use of range null hypotheses also is problematic. Aside from the usual issues of whether null hypothesis significance tests can be justified at all, there is an issue that is specific to range null hypotheses: it is not straightforward how to calculate the probability of the data given a range null hypothesis. The traditional way is to use the single point that maximizes the obtained p-value. The Bayesian alternative is to propose a prior probability distribution and integrate across it. Frequentists and Bayesians disagree about a variety of issues, especially those pertaining to whether it is permissible to assign probabilities to hypotheses, and what gets lost in the shuffle is that the two camps actually come to different answers for the probability of the data given a range null hypothesis. Because the probability of the data given the hypothesis is a precursor, for both camps, to drawing conclusions about hypotheses, the fact that the camps obtain different values for this probability is crucial but seldom acknowledged. The goal of the present article is to bring out the problem in a manner accessible to researchers without strong mathematical or statistical backgrounds.
Frequentists and Bayesians disagree about how to handle the inverse inference issue: How does a researcher traverse a pathway from the calculated probability of the finding given a hypothesis (such as the null hypothesis) to the probability of the hypothesis given the finding? Bayesians argue that direct inverse inferences are invalid, thereby similarly invalidating the null hypothesis significance testing procedure. In contrast, frequentists criticize Bayesians for having to make unjustified assumptions about the prior probabilities of null hypotheses to allow the Bayesian machinery to run. In marked contrast to this well-aired dispute, there is little literature on what would seem to be an issue that precedes the inverse inference issue; namely, how does one calculate the probability of the finding given a hypothesis in the first place? It might seem that this is straightforward, and it is straightforward in the context of point hypotheses. But it is not straightforward in the context of range hypotheses, which provide the present focus. Put simply, the question of interest is: Given a range hypothesis, how can one calculate the probability of the finding given it?
To understand why we should care about range null hypotheses at all, it is necessary to consider, in detail, that which is so well known that few consider it carefully. First, there is a preliminary issue about whether hypotheses can have probabilities at all. Second, there is an additional preliminary issue about precisely the logic by which frequentists decide between competing hypotheses. My immediate goal in these sections is not to take sides but rather to bring out the disagreements. My more general goal is to show that both sides can be faulted not just on the difficult issue of inverse inference, but even on the more basic issue of calculating the probability of the finding given a range hypothesis. If a calculation this basic already is problematic, the inverse inference issue may be even more intractable.
Probabilities of Hypotheses

Bayesians and frequentists disagree with each other with respect to how they draw conclusions about hypotheses. Bayesians are willing to assume that hypotheses have probabilities anywhere between 0 and 1, whereas the furthest frequentists are willing to go is to allow that hypotheses can have probabilities of 0 (hypothesis is false) or 1 (hypothesis is true), but nothing between these extreme values. And of course, most frequentists will freely admit that they do not know whether to assign a probability of 0 or of 1 to a particular hypothesis. This admission causes most frequentists to focus on procedures for controlling the error rate, rather than assigning probabilities to particular hypotheses [
Graduate training in the sciences tends to stress scientific conservativism; scientists should demand reasonably impressive evidence before being willing to draw a conclusion. From this perspective, the fact that frequentists are more conservative than Bayesians easily can be taken as evidence that frequentists are more “scientific” than Bayesians. When contrasting the two perspectives against each other at the level of drawing conclusions about hypotheses, it is relatively easy to make this argument. It clearly is more conservative to admit to not knowing how to assign a probability to a hypothesis than to insist that one does know how to do this. Saying “I do not know” is more conservative than assigning numbers to hypotheses.
In the other direction, however, Bayesians could claim that frequentists are too liberal because they use 0.05 as the alpha level for deciding statistical significance. In general, Bayesians claim to be more conservative than frequentists because they insist that the probability of the favored hypothesis given the finding, when they get to that point, be at least 8 or even 10 times greater than the probability of the hypothesis that is not favored, before believing the favored hypothesis [
Thus, the two sides disagree on whether the notion of a probability of a hypothesis (other than zero or one) makes sense at all, on whether it is possible to calculate such an entity even if it did make sense, on the importance of the probabilities of findings given hypotheses, and on whether liberalism or conservativism should be judged by ratios of hypothesis probabilities or by probabilities of findings given hypotheses. But the main issue of interest here is yet to come, and it falls out of the issue of the plausibility of point null hypotheses.
Over many decades, there has accumulated much criticism pertaining to the null hypothesis significance testing procedure (NHSTP) [
My goal is to examine this argument carefully to see where it leads. However, it is first necessary to review the syllogisms that come into play in discussions of this sort.
Let us commence with the usual logic that accompanies traditional two-tailed significance tests. In such cases, researchers define a point null hypothesis to be contrasted against a range alternative hypothesis. Because the arguments to be developed do not depend on the idiosyncrasies of any particular type of study, let us consider the simplest possible case of coin tosses and whether or not the coin is fair. We might define null and alternative hypotheses as follows where P(H) refers to the probability of heads.
Case 1
H0: P(H) = 0.5
H1: P(H) ≠ 0.5
In the foregoing case, the logic is simple and based on the ability to use a small p-value to reject the null hypothesis. Specifically, we have the following syllogism.
Syllogism 1
H0 or H1 {Premise 1}
Not H0 {Premise 2}
Therefore, H1 {Conclusion}
There can be no doubt that Syllogism 1 is valid. But as pointed out earlier, Syllogism 1 can be criticized as not being informative because it is extremely implausible, a priori, that a coin is perfectly fair.
It is easy to imagine another state of affairs, accompanied by another syllogism. Consider Case 2 and Syllogism 2 below.
Case 2
H0: P(H) = 0.5
H1: P(H) > 0.5
Syllogism 2
H0 or H1 or something else [i.e., P(H) < 0.5] {Premise 1}
Not H0 {Premise 2}
Therefore, H1 {Conclusion}
Syllogism 2 has a rather obvious flaw that stems from the fact that Premise 1 states three possibilities, which is necessitated by the fact that Case 2 leaves open the possibility that P(H) can be greater than 0.5 (alternative hypothesis), equal to 0.5 (null hypothesis), or less than 0.5 (unstated hypothesis). Therefore, rejecting the null hypothesis that the probability of heads is equal to 0.5 does not allow an unambiguous conclusion about whether this probability is less than or greater than 0.5. Put simply, Syllogism 2 is blatantly invalid when based on Case 2. Perhaps it is a recognition of this invalidity that is responsible for some statistical authorities favoring range null hypotheses. As an example, consider Case 3 and Syllogism 3 below.
Case 3
H0: P(H) ≤ 0.5
H1: P(H) > 0.5
Syllogism 3
H0 or H1 {Premise 1}
Not H0 {Premise 2}
Therefore, H1 {Conclusion}
The combination of Case 3 and Syllogism 3 seems beautiful. It is logically valid and, at the same time, solves the problem that we had earlier of rejecting a non-plausible null hypothesis. Rejecting the null hypothesis in Case 3 is quite informative because doing so also causes half of the possibilities to be rejected, thereby allowing a directional hypothesis to be supported. Therefore, it is worth examining this combination in more detail.
The standard way to handle the combination of Case 3 and Syllogism 3 is to use a one-tailed test. For coin tosses, one would use the binomial theorem. Suppose that one has obtained k heads out of N tosses. The one-tailed probability is simply the probability of having obtained k heads out of N tosses, plus the probability of having obtained k + 1 heads out of N tosses, and so on, up to N heads out of N tosses. The binomial theorem is presented below as Equation (1):
P(k heads out of N tosses) = [N!/(k!(N − k)!)] p^k (1 − p)^(N−k). (1)
Suppose that an investigator performed a study that involved N = 20 coin tosses and k = 17 heads. The normal procedure would be to use Equation (1) as follows. Set p at the “fair coin” level of 0.5 (note that this p is not the same p as in p-value), and substitute 20 and 17 for N and k in Equation (1), respectively, but with three more iterations where 18, 19, and 20 are substituted for k. This is performed below:
P(17 or more heads out of 20 tosses) = [20!/(17!(20 − 17)!)] 0.5^17 (1 − 0.5)^(20−17) + [20!/(18!(20 − 18)!)] 0.5^18 (1 − 0.5)^(20−18) + [20!/(19!(20 − 19)!)] 0.5^19 (1 − 0.5)^(20−19) + [20!/(20!(20 − 20)!)] 0.5^20 (1 − 0.5)^(20−20) = 0.001
In other words, the result of our hypothetical experiment is highly significant and so Syllogism 3 can proceed without hindrance. Or so it seems.
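The tail calculation in Equation (1) is easy to check numerically. Below is a minimal Python sketch; the function name tail_prob is my own label, not from the article:

```python
from math import comb

def tail_prob(k, n, p):
    """P(k or more heads out of n tosses): Equation (1) summed from k to n."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# 17 or more heads out of 20 tosses of a fair coin
print(tail_prob(17, 20, 0.5))  # about 0.0013, which the article rounds to 0.001
```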
But there is a major problem with the foregoing mathematics in the context of Case 3. Specifically, the calculation performed gives the probability of 17 or more heads out of 20 tosses, based on a single population parameter (p = 0.5). But the calculation does not cover if p equals 0.49, 0.48, and so on down to 0. Depending on one’s philosophical perspective, it is far from clear that the calculation based on a single parameter applies to the whole range of values. Arguably, one would have to assign (prior) probabilities to all of the values within the range between p = 0 and p = 0.5, and integrate across that range. To render the arguments accessible to everyone, however, let us simplify the problem and see where the simplification takes us.
Let us commence with a null hypothesis that specifies only two values, rather than a range of values; later, we will add more values. This is shown in Case 4.
Case 4
H0: P(H) = 0.5 or P(H) = 0
H1: P(H) ≠ 0.5 and P(H) ≠ 0
Syllogism 4
H0 or H1 {Premise 1}
Not H0 {Premise 2}
Therefore, H1 {Conclusion}
Syllogism 4 is logically valid and it makes use of Case 4 where the null hypothesis specifies two values. Assuming, as usual, that a sufficiently low probability of the finding given the null hypothesis justifies rejecting the null hypothesis, how should the probability be calculated?
To answer this question, let us put aside the null hypothesis for a moment and consider the abstract case where we are concerned with the probability of A given that C or D is true. For example, imagine we are invited to dinner and we are interested in the probability that our hostess will serve chocolate for dessert (A) given that she serves chicken (C) or fish (D) for dinner. In symbols, we are interested in P(A | (C or D)) = P(A | (C ∪ D)).
Let us assume that C and D are mutually exclusive (C ∩ D = ∅). In the dinner example, our hypothetical hostess would never serve both chicken and fish for dinner, though she might serve either one or something else entirely. It is possible to rewrite the expression of interest so that we have only conditional and unconditional probabilities (see Equations (2)-(5) below):
P(A | (C ∪ D)) = [P(A ∩ C) + P(A ∩ D)] / [P(C) + P(D)], (2)
P(A | (C ∪ D)) = P(A ∩ C)/[P(C) + P(D)] + P(A ∩ D)/[P(C) + P(D)], (3)
P(A | (C ∪ D)) = [P(A ∩ C)/P(C)]·[P(C)/(P(C) + P(D))] + [P(A ∩ D)/P(D)]·[P(D)/(P(C) + P(D))], (4)
P(A | (C ∪ D)) = P(A | C)·[P(C)/(P(C) + P(D))] + P(A | D)·[P(D)/(P(C) + P(D))]. (5)
Equation (5) makes clear that we need not only the conditional probability of A given C or D, but we also need the unconditional probability of C and the unconditional probability of D, in order to calculate the conditional probability of A given that C or D is true. Returning to our hostess, if we wish to calculate the probability that she will serve chocolate for dessert given that she serves chicken or fish for dinner, we need to know the unconditional probability that she will serve chicken for dinner and the unconditional probability that she will serve fish for dinner. If we do not know these two unconditional probabilities, there is no way for us to calculate the conditional probability that our hostess will serve chocolate for dessert given that she serves chicken or fish for dinner.
Let us now apply what we learned from our hostess to consider again the combination of Case 4 and Syllogism 4, where the null hypothesis specifies that p = 0 or p = 0.50. Equation (5) tells us that in order to calculate the probability of getting, say, 17 heads out of 20 tosses, given that the population proportion of heads is 0 or 0.50, we would need to know the unconditional probability that the population proportion of heads is 0.50 and the unconditional probability that the population proportion of heads is 0. We saw earlier how to calculate the conditional probability of 17 heads out of 20 tosses given a single value (p = 0.50), but Equation (5) shows that this is insufficient when there are two population values to consider. Again, although we do need the conditional probability of 17 heads out of 20 tosses, given that p = 0 or p = 0.50, we also need the two unconditional probabilities concerning the 0 and 0.50 population values. Are we stuck?
It depends, to some extent, on one’s philosophical position pertaining to whether hypothesized population values can take on probabilities. If one’s answer is “yes,” as would be the case with most Bayesians, then we are not stuck. The researcher would find an arbitrary way of assigning probabilities to p = 0 and p = 0.50, and then it would be easy to carry the calculation through. An example of an arbitrary approach would be to say that because we have no reason to favor p = 0 over p = 0.50, or to favor p = 0.50 over p = 0, we can assign a probability of 0.5 to each of these. Using this arbitrary system, we might perform the following calculation:
P(17 or more heads out of 20 tosses | p = 0 or p = 0.5) = P(17 or more heads out of 20 tosses | p = 0)·[P(p = 0)/(P(p = 0) + P(p = 0.50))] + P(17 or more heads out of 20 tosses | p = 0.50)·[P(p = 0.50)/(P(p = 0) + P(p = 0.50))] = 0 + 0.001·[0.5/(0.5 + 0.5)] = 0.0005.
Note that because it is impossible to get 17 (or any) heads out of 20 tosses if the population proportion of heads is 0, the whole first term is 0. Regarding the second term, we saw earlier that the probability of 17 or more heads out of 20 tosses, if the population proportion is 0.5, is 0.001. Thus, the final value is 0.0005, which is half the value we had obtained earlier. We could have assigned a probability of 1 to the 0.5 value and a probability of 0 to the 0 value, in which case we would have obtained the same result as the calculation performed earlier. Or we could have assigned a probability of 0.75 to the 0.5 value and a probability of 0.25 to the 0 value, or anything else we wished. The advantage of arbitrary unconditional probabilities is that they allow us to run the mathematical machinery. A disadvantage is that admitting the arbitrariness of assignments of unconditional probabilities to population values renders difficult a convincing argument that any particular answer is the correct answer. After all, a different assignment of unconditional probabilities to population values would result in different “correct” answers. Is there another way to think about this?
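The mixture calculation of Equation (5) can be sketched in Python. The helper names are my own, and the weights are the arbitrary unconditional probabilities just discussed:

```python
from math import comb

def tail_prob(k, n, p):
    # P(k or more heads out of n tosses) for a single population value p
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def mixture_tail_prob(k, n, values_and_weights):
    # Equation (5): weight each conditional probability by the (normalized)
    # unconditional probability assigned to its population value
    total = sum(w for _, w in values_and_weights)
    return sum(tail_prob(k, n, p) * (w / total) for p, w in values_and_weights)

spike = mixture_tail_prob(17, 20, [(0.5, 1.0)])              # all weight on p = 0.5
equal = mixture_tail_prob(17, 20, [(0.0, 0.5), (0.5, 0.5)])  # equal weight on 0 and 0.5
print(spike, equal)  # the equal weighting gives exactly half the single-point value
```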
In fact, there is, though it also depends on arbitrariness [
An argument that can be used is as follows. Based on the binomial theorem, we would obtain a larger probability of 17 heads out of 20 tosses using a population value of 0.5 than using a value of 0. Therefore, by using 0.5 as the population value, we can be sure that we are obtaining the largest possible probability of the finding, given the null hypothesis. Because the goal eventually will be to use a low probability of the finding, given the null hypothesis, to reject the null hypothesis, using 0.5 as the population value renders a conservative judgment. If the probability of 17 heads out of 20 tosses is 0.001 assuming p = 0.50, the probability of 17 heads out of 20 tosses is lower than 0.001 using p = 0.49, p = 0.48, and so on, down to p = 0. In fact, as we saw earlier, the probability of 17 heads out of 20 tosses given a population value of 0, is 0.
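The monotonicity claim behind this maximization argument can be verified numerically; here is a quick Python check over a grid of population values (the 0.01 step is my own arbitrary choice):

```python
from math import comb

def tail_prob(k, n, p):
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# P(17 or more heads out of 20) never decreases as p climbs from 0 to 0.5,
# so p = 0.5 yields the largest (most conservative) probability in the null range
probs = [tail_prob(17, 20, v / 100) for v in range(51)]
assert all(a <= b for a, b in zip(probs, probs[1:]))  # nondecreasing in p
print(probs[0], probs[25], probs[50])  # 0 at p = 0, largest at p = 0.5
```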
So what is correct? If one assumes that hypotheses about population values can have probabilities other than 0 or 1, this results in a dilemma for those who wish to perform one-tailed tests to reject null hypotheses. That is, to make the logic work out so that rejecting the null hypothesis is both meaningful (because it specifies a range rather than a point) and also really does force acceptance of the alternative hypothesis, it is necessary to have a range null hypothesis rather than a point null hypothesis. However, again from the perspective that it is reasonable to assign probabilities to hypotheses, Equation (5) shows that the mathematics typically used gives wrong answers! That is, the mathematics of using only the binomial theorem, without taking unconditional probabilities into account, is blatantly wrong by Equation (5). On the other hand, if one uses the argument that it only is permissible to assign a value of 0 or 1 to each population value, perhaps it is possible to justify the binomial calculation that results in a value of 0.001 for the probability of 17 heads out of 20 tosses. We will discuss this further, but let us come closer to considering the whole range of population values first.
The combination of Case 4 and Syllogism 4 used a null hypothesis that specified two values: these were 0 and 0.50. Let us now consider 51 values: 0, 0.01, 0.02, ∙∙∙, 0.50. There is no expectation that anyone performing substantive research would use these values. Rather, the purpose is to move us closer to the continuous case, where a traditionalist would maximize and a Bayesian would integrate.
Case 5
H0: P(H) = 0 or P(H) = 0.01 or P(H) = 0.02, ∙∙∙, or P(H) = 0.50
H1: P(H) ≠ 0 and P(H) ≠ 0.01 and P(H) ≠ 0.02, ∙∙∙, and P(H) ≠ 0.50
Syllogism 5
H0 or H1 {Premise 1}
Not H0 {Premise 2}
Therefore, H1 {Conclusion}
Case 5 is similar to Case 4 except that 51 values are specified rather than two values. Using such a spread allows us to dramatize the difference between the two philosophical perspectives pertaining to whether hypotheses about population values can have probabilities.
Let us generalize Equation (5) to the case where there are 51 values rather than two, whose union qualifies under the null hypothesis. Generalization renders Equation (6):
P(A | (C ∪ D ∪ ⋯ ∪ W)) = P(A | C)·[P(C)/(P(C) + P(D) + ⋯ + P(W))] + P(A | D)·[P(D)/(P(C) + P(D) + ⋯ + P(W))] + ⋯ + P(A | W)·[P(W)/(P(C) + P(D) + ⋯ + P(W))], (6)

where W stands for the event corresponding to the fifty-first value.
We might ask what the probability is of getting 17 or more heads out of 20 trials given Equation (6). From the point of view that hypotheses about population values can have probabilities, we would need to compute the probability of 17 or more heads out of 20 tosses given each of the population values (0, 0.01, 0.02, ∙∙∙, 0.50), which is not much of a challenge. But we also would need to assign probabilities to the population values (that is, we need to assign unconditional probabilities). This is much more of a challenge. For example, should we assign an equal probability to each of the 51 population parameters included in the null hypothesis? Should we assign larger probabilities to values near 0.5? Alternatively, as 0.25 is in the middle of the range from 0 to 0.5, should we assign larger probabilities to values near 0.25? In fact, we could assume a normal-like distribution of values so that values near 0.25 are more likely than values at either extreme (0 or 0.5). Although there is no easy answer to this question, two points should be apparent. First, different decisions about assigning probabilities to population values will render different probabilities of getting 17 or more heads out of 20 tosses. Second, although it is possible to make decisions that would result in findings similar to the binomial calculation with which we commenced, it also is possible to make decisions that would result in findings that differ markedly from that obtained by the binomial calculation.
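To see how much the choice of unconditional probabilities matters, here is a sketch comparing three of the weighting schemes just mentioned; the helper names and the particular spread of the normal-like weights are my own arbitrary choices:

```python
from math import comb, exp

def tail_prob(k, n, p):
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def mixture(k, n, values_and_weights):
    # Equation (6): conditional probabilities weighted by normalized unconditional ones
    total = sum(w for _, w in values_and_weights)
    return sum(tail_prob(k, n, p) * (w / total) for p, w in values_and_weights)

values = [v / 100 for v in range(51)]                            # 0, 0.01, ..., 0.50
flat   = [(p, 1.0) for p in values]                              # equal probability on each value
peaked = [(p, exp(-(p - 0.25) ** 2 / 0.005)) for p in values]    # normal-like, peak at 0.25
spike  = [(0.5, 1.0)]                                            # all probability on 0.5

for name, prior in [("spike", spike), ("flat", flat), ("peaked", peaked)]:
    print(name, mixture(17, 20, prior))  # three markedly different answers
```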
Or, we could resort again to the strategy of maximizing the calculated probability of obtaining 17 or more heads out of 20 tosses. In this case, we would assign a probability of 1 to the population value of 0.5, and a probability of 0 to all of the other population values. In this case, Equation (6) would reduce down to a single term that is equivalent to the binomial calculation with which we commenced―namely, the probability of 17 or more heads out of 20 tosses is 0.001.
Let us now return to Case 3 and Syllogism 3, copied below for the reader’s convenience.
Case 3
H0: P(H) ≤ 0.5
H1: P(H) > 0.5
Syllogism 3
H0 or H1 {Premise 1}
Not H0 {Premise 2}
Therefore, H1 {Conclusion}
In the combination of Case 3 and Syllogism 3, we have that which truly represents the thinking in one-tailed tests, as opposed to the mere approximations in Cases 4 and 5.
In the combination of Case 3 and Syllogism 3, we have a continuous range, with an infinite number of points contained within the range from 0 to 0.5. Nevertheless, the issues that were raised still apply. From the philosophical point of view that hypothesized population values can have probabilities, the question in this continuous case is: What is the density distribution that the researcher should assign to the range going from 0 to 0.5? Here, if one desired, one could apply a uniform distribution, an approximation of the normal distribution, a triangular distribution, or many others. Further, the researcher might wish to define the peak, if there is one, at 0.5 but might instead choose 0.25, or even 0, with the choice being influenced by the shape of the distribution one wishes to assume. To find the probability of the finding given the range null hypothesis, the researcher would have to integrate across the range of values for the assumed distribution. The philosophical weak point, perhaps, is that it is difficult to know what distribution to use and also, if the chosen distribution has a peak, it may be difficult to know where that peak should be.
Or, we can be traditional, and again assign a prior probability of 1 to the population value of 0.5 and a probability of 0 to everything else. As usual, this results in a binomial calculation for the probability of 17 or more heads out of 20 tosses as being 0.001. If a researcher decided to assign probabilities other than 0 or 1 to hypotheses, there are many ways of integrating across the range from 0 to 0.5, depending on the assumed distribution, that would result in values for the probability of 17 or more heads out of 20 tosses that differ markedly from each other, and also from the strict binomial calculation based on a probability of 1 for the 0.5 population value.
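As a sketch of the continuous case, the integral of the tail probability against a uniform prior on [0, 0.5] (density 2 on that interval) can be approximated with a midpoint Riemann sum; the grid size here is my own arbitrary choice:

```python
from math import comb

def tail_prob(k, n, p):
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# integral over [0, 0.5] of P(17+ heads | p) times the uniform density 2,
# approximated by a midpoint Riemann sum on a fine grid
m = 5000
dp = 0.5 / m
integral = sum(tail_prob(17, 20, (i + 0.5) * dp) * 2 * dp for i in range(m))
print(integral)  # far below the maximized single-point value of about 0.0013
```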
Having taken considerable trouble to mark out the issues, does it make sense to compute the probabilities of findings given range null hypotheses? As we will see below, there is more than one way to think about it.
Calculations of probabilities of findings given hypotheses play a role in three schemes: Bayesian [
For many researchers, the goal of research is to come to probabilities of hypotheses. From the standard frequentist point of view, as explained earlier, hypotheses are true or false but do not have probabilities (other than zero or one). But if one is a Bayesian, then hypotheses have probabilities. In turn, if the ultimate goal is to assign probabilities to hypotheses (but informed by the data), there is no need to go through the formalism of accepting or rejecting them, and so there is no point in using any of the foregoing syllogisms. Rather, the exercise of determining probabilities of findings is a preliminary step to the eventual use of the famous theorem by Bayes to determine the probabilities of hypotheses. Thus, there is no reason to perform either one-tailed or two-tailed significance tests because statistical significance is irrelevant. Importantly for the issue of conditional probabilities of findings, a Bayesian would assume a prior density distribution of unconditional probabilities to aid in calculating the conditional probability of a finding, given a hypothesis, rather than assigning a probability of 1 to the fair coin value of 0.5. From a Bayesian point of view, then, the usual calculation of the probability of the finding given a range null hypothesis is blatantly wrong because of the failure to assign unconditional probabilities to all of the values in the range specified by the null hypothesis. Let me reiterate for emphasis. From a Bayesian point of view, the calculation of the probability of the finding given a range null hypothesis is wrong. This is not just a matter of the frequentist being too conservative, but rather a matter of plain mathematical wrongness.
According to Fisher [
The probability of the null hypothesis, given the finding, is not the same as the probability of the finding given the null hypothesis (Trafimow & Marks, 2015). From a frequentist point of view, the null hypothesis does not have a probability (other than zero or one) and even if one is not a frequentist, it should be obvious that the probability of the hypothesis given the finding is not the same as the probability of the finding given the hypothesis. Nevertheless, based on Neyman and Pearson [
From the point of view of controlling Type I error, we have seen that assigning an unconditional probability of 1 to the 0.5 value serves to provide the most conservative way to arrive at the probability of the finding given the null hypothesis. That is, it provides the largest possible probability and thus the smallest chance of meeting the required 0.05 criterion. But from another point of view, using a one-tailed test is not at all conservative. Consider again our original binomial calculation that the probability of 17 or more heads out of 20 tosses is 0.001. But if we were using a traditional point hypothesis with a two-tailed test, we would be interested in the probability of obtaining 17 or more heads out of 20 tosses, given that the coin is fair; and we also would be interested in the probability of obtaining 3 or fewer heads out of 20 tosses, given that the coin is fair. The probability of 17 or more heads out of 20 tosses, or 3 or fewer heads out of 20 tosses, is double the value that we calculated earlier. That is, the value is 0.001 × 2 = 0.002. Obviously, then, although using a one-tailed test is conservative from the point of view of assigning a prior probability of 1 to the maximum value contained in the range specified by the null hypothesis (i.e., 0.5), it is extremely liberal relative to the usual two-tailed test with a point null hypothesis that P(H) = 0.5.
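The doubling is easy to confirm: at p = 0.5 the binomial distribution is symmetric, so the lower tail (3 or fewer heads) exactly mirrors the upper tail (17 or more heads). A quick sketch (function name is mine):

```python
from math import comb

def binom_pmf(j, n, p):
    # probability of exactly j heads out of n tosses, per Equation (1)
    return comb(n, j) * p**j * (1 - p)**(n - j)

upper = sum(binom_pmf(j, 20, 0.5) for j in range(17, 21))  # 17 or more heads
lower = sum(binom_pmf(j, 20, 0.5) for j in range(0, 4))    # 3 or fewer heads
print(upper, lower, upper + lower)  # the tails match, so the two-tailed value doubles
```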
Well then, from the first and second points of view above, it is silly to engage in null hypothesis significance testing in the first place. That is, the goal either should be to come to a conclusion about the probability of the hypothesis given the finding, in which case one assumes that it is reasonable to assign probabilities to hypotheses; or to form a preliminary assessment of the strength of the evidence. In the former case, to calculate the conditional probability of the finding, one needs to assign unconditional probabilities of hypotheses to carry the calculation through. From this perspective, the usual one-tailed calculation gives blatantly wrong answers (again, not just conservative answers but wrong answers). And from the perspective of using the calculation as a preliminary assessment of the state of the evidence, a precise value is desirable, and the usual one-tailed calculation clearly is not precise. Consequently, it is only if one wishes to control Type I error that (a) significance testing makes sense and (b) one-tailed tests make sense to avoid the problem of rejecting an implausible null hypothesis. However, Trafimow and Earp have suggested that this point of view also is problematic. [
We have seen that, depending on one’s perspective, the traditional calculation for conditional probabilities of findings given range null hypotheses via maximization at the largest value in the range is either blatantly wrong, quite imprecise, a conservative overestimate of the actual probability of the finding (in which case the risk of Type II error is quite large), or a quite liberal underestimate of the actual probability of the finding (relative to two-tailed calculations with point hypotheses). I underscore that these contradictory assessments pertain to evaluating probabilities of findings given hypotheses. This contrasts with the usual demonstration that different points of view give contradictory assessments about how researchers should evaluate hypotheses given findings. To my knowledge, this is the first demonstration to emphasize that these contradictions occur at the level of data evaluation rather than just at the level of hypothesis evaluation.
It is interesting that if researchers only used point null hypotheses, although different philosophical perspectives would still demand differences in hypothesis evaluation, at least the calculations would not differ pertaining to data evaluation. That is, for example, the probability of 17 or more heads out of 20 tosses, given a fair coin, would be calculated the same way by everybody and in accordance with the binomial theorem as we saw earlier (probability = 0.001). Thus, data evaluation, at least, would not be controversial, though hypothesis evaluation would remain controversial. But matters change when range null hypotheses are used, which places researchers in the Neyman-Pearson tradition in a dilemma. On the one hand, they can use point null hypotheses, where the calculations pertaining to data evaluation are not controversial, but at the cost of rejecting null hypotheses that are not plausible anyway. Or, they can use range null hypotheses that have better plausibility; but where the calculation of the probability of the obtained findings, given the null hypothesis, is potentially quite problematic. More generally, whatever the philosophical perspective, range null hypotheses are problematic even from a data evaluation point of view prior to hypothesis evaluation. Unfortunately, it is not obvious how to defend the computational technique chosen to calculate the probability of the obtained finding given a range null hypothesis. A balanced assessment might be that all of them stand on rickety foundations.
Trafimow, D. (2017) Why It Is Problematic to Calculate Probabilities of Findings Given Range Null Hypotheses. Open Journal of Statistics, 7, 483-499. https://doi.org/10.4236/ojs.2017.73034