Several statistics are available for testing the homogeneity of odds ratios across strata. Asymptotic statistics lose their power in the “sparse-data” setting. Both asymptotic statistics and exact tests have low power when the sample sizes are small. We constructed a set of U-statistics and compared them with some existing statistics for testing homogeneity of odds ratios (OR) under different data settings. We evaluated their performance in terms of empirical size and power via Monte Carlo simulations. Our results showed that two of the U-statistics under study had higher power for testing homogeneity of odds ratios for 2 by 2 contingency tables. The application of the tests was illustrated with two real examples.
The odds ratio is commonly used to analyze the association between two factors that each have two categories. In epidemiological studies and clinical trials, these two factors usually refer to the exposure (treatment/intervention/risk) factor X and the outcome factor Y, respectively. The association between X and Y, however, could be modified or confounded by a third factor Z. For example, in a multi-center clinical trial, factor Z could be the center, and each center corresponds to a stratum of Z. Because the presence of heterogeneity among the odds ratios may call for different methods of analysis, researchers usually want to test whether the odds ratios are homogeneous across the strata of Z. Tests of this type are called tests for homogeneity of the odds ratios, or tests for null interaction.
A few procedures have been developed for testing the homogeneity of odds ratios. They are usually categorized into two classes: exact tests and asymptotic tests. Most of the asymptotic statistics were derived for “large-stratum” settings, where the sample size is large, and the number of strata is small. Liang and Self [
Our study presented in this paper compared a class of U-statistics with the established tests such as the Zelen test and the Breslow-Day test.
Statistics of the form

$$U = \frac{l!\,(n-l)!}{n!} \sum_{(n,l)} h(X_{i_1}, \cdots, X_{i_l}) \tag{1}$$

are known as U-statistics, where $\{i_1, \cdots, i_l\}$ is an $l$-subset of $\{1, \cdots, n\}$; the sum $\sum_{(n,l)}$ is taken over all subsets $1 \le i_1 < \cdots < i_l \le n$ of $\{1, 2, \cdots, n\}$; and $h$ is the kernel function, which is symmetric in its arguments. In our study, we investigated the applicability of this class of statistics with $l = 2$ in testing homogeneity of odds ratios among 2 by 2 tables. U-statistics were first identified as minimum-variance unbiased estimators by Halmos [
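The definition in expression (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name `u_statistic` and the example kernel are assumptions introduced here.

```python
from itertools import combinations
from math import comb


def u_statistic(sample, kernel, order=2):
    """Average a symmetric kernel over all subsets of size `order`."""
    n = len(sample)
    if n < order:
        raise ValueError("need at least `order` observations")
    total = sum(kernel(*subset) for subset in combinations(sample, order))
    # 1 / C(n, l) equals l! (n - l)! / n!, the normalising constant in (1).
    return total / comb(n, order)


# Illustrative kernel: h(x, y) = (x - y)^2 / 2 gives the unbiased sample variance.
data = [1.0, 2.0, 4.0, 7.0]
sample_variance = u_statistic(data, lambda x, y: (x - y) ** 2 / 2)
```

With `order=2` the sum runs over all pairs, which is the case studied in this paper.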
Assume that $a_k$ and $b_k$ are counts of independent binomial outcomes from $n_k$ and $m_k$ trials with and without exposure, respectively, in stratum $k$; $N_k$ is the total sample size of stratum $k$; and the tables are independent across strata. The commonly used estimate of the odds ratio of the $k$th stratum is

$$\hat{\psi}_k = \frac{a_k / (n_k - a_k)}{b_k / (m_k - b_k)}.$$

We want to test whether the odds ratios are homogeneous among all $K$ strata (all levels of the third variable Z), that is, to test $H_0: \psi_1 = \psi_2 = \cdots = \psi_K = \psi$ against $H_a: \psi_i \ne \psi_j$ for at least one pair $(i, j)$, where $i, j = 1, \cdots, K$, $i \ne j$.
Our study evaluated a class of weighted U-statistics $\sum_{i < j} w_{ij}\, h(\hat{\psi}_i, \hat{\psi}_j)$ as well as a class of unweighted U-statistics $\frac{2!\,(K-2)!}{K!} \sum_{i < j} h(\hat{\psi}_i, \hat{\psi}_j)$, where $\hat{\psi}_i$ and $\hat{\psi}_j$ are the estimates of the odds ratios in the $i$th and $j$th strata of Z, $h(\hat{\psi}_i, \hat{\psi}_j)$ is a function of $\hat{\psi}_i$ and $\hat{\psi}_j$, and $w_{ij}$ is the weight associated with $h(\hat{\psi}_i, \hat{\psi}_j)$. Based on the simulation results, we focus on the following two statistics in this paper:

$$\mathrm{U3} = \frac{2!\,(K-2)!}{K!} \sum_{i < j} \left| \log \hat{\psi}_i - \log \hat{\psi}_j \right|, \tag{2}$$

$$\mathrm{WU3} = \sum_{i < j}^{K} w_{ij} \left| \log \hat{\psi}_i - \log \hat{\psi}_j \right|. \tag{3}$$
All logarithms in the formulas are natural (base e). The sampling distribution of the estimated odds ratio is highly skewed when the sample size is small or moderate, so we used the natural logarithm of $\hat{\psi}$ in U3 and WU3 to reduce the skewness. Because a larger stratum yields a more accurate estimate of the odds ratio, a weight was selected for each $h(\hat{\psi}_i, \hat{\psi}_j)$ in expression (3), which is proportional to the stratum’s size. It has the following form:

$$w_{ij} = \frac{w_i w_j}{\sum_k w_k} \tag{4}$$

where $w_i = 1 / \mathrm{var}(\log \hat{\psi}_i)$. The log transform of the sample odds ratio has an asymptotic variance of a simple form: $\mathrm{var}(\log \hat{\psi}_k) \approx 1/a_k + 1/(n_k - a_k) + 1/b_k + 1/(m_k - b_k)$. If any cell count equals zero, the odds ratio estimate $\hat{\psi}_k$ and $\mathrm{var}(\log \hat{\psi}_k)$ from the above formula are undefined. We added 0.5 to each cell count of that table in the calculations to get the amended estimators [
| | Y = 1 | Y = 0 | Total |
|---|---|---|---|
| X = 1 | $a_k$ | $n_k - a_k$ | $n_k$ |
| X = 0 | $b_k$ | $m_k - b_k$ | $m_k$ |
| Total | $t_k$ | $N_k - t_k$ | $N_k$ |
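As a rough sketch of how expressions (2) to (4) fit together, the following Python computes U3 and WU3 from a list of strata given as `(a_k, n_k, b_k, m_k)`. The function names and the example tables are hypothetical, and the 0.5 amendment is applied only to tables that contain a zero cell, as described above.

```python
from itertools import combinations
from math import log


def log_odds_ratio(a, n, b, m):
    """Return (log odds ratio, asymptotic variance) for one 2 x 2 table.

    a of n exposed subjects and b of m unexposed subjects have Y = 1.
    """
    cells = [a, n - a, b, m - b]
    if 0 in cells:
        # Amended estimator: add 0.5 to every cell of a table with a zero.
        cells = [c + 0.5 for c in cells]
    a1, a0, b1, b0 = cells
    return log((a1 / a0) / (b1 / b0)), sum(1 / c for c in cells)


def u3_wu3(tables):
    """Compute U3 (expression 2) and WU3 (expressions 3 and 4)."""
    estimates = [log_odds_ratio(*t) for t in tables]
    weights = [1 / variance for _, variance in estimates]
    total_weight = sum(weights)
    pairs = list(combinations(range(len(tables)), 2))
    diffs = {(i, j): abs(estimates[i][0] - estimates[j][0]) for i, j in pairs}
    u3 = sum(diffs.values()) / len(pairs)
    wu3 = sum(weights[i] * weights[j] / total_weight * diffs[i, j]
              for i, j in pairs)
    return u3, wu3


# Hypothetical strata, for illustration only.
u3, wu3 = u3_wu3([(10, 20, 5, 20), (5, 20, 5, 20)])
```

Dividing by the number of pairs is the same as multiplying by $2!\,(K-2)!/K!$.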
A total of 10,000 data sets were simulated using the SAS subroutine RANBIN. Each data set contains a pre-specified number $K$ of 2 by 2 tables. The cell counts $a_k$ and $b_k$ were independently generated from the binomial distributions $(n_k, p_{1k})$ and $(m_k, p_{0k})$, where $p_{xk} = P(Y = 1 \mid X = x, Z = k)$ is the probability of Y = 1 when X = x (x = 1, 0) in the $k$th stratum. Each set of tables was simulated with given $n_k$, $m_k$, number of strata $K$ and odds ratios. For a given odds ratio $\psi_k$ and binomial proportion $p_{0k}$, $p_{1k}$ was calculated by solving:

$$\psi_k = \frac{p_{1k} / (1 - p_{1k})}{p_{0k} / (1 - p_{0k})}.$$
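The data-generating step can be sketched as follows. The study itself used the SAS subroutine RANBIN; this Python version is only an illustration under that description, with hypothetical function names, an arbitrary seed and arbitrary settings.

```python
import random


def p1_from_odds_ratio(psi, p0):
    """Solve psi = [p1 / (1 - p1)] / [p0 / (1 - p0)] for p1."""
    odds1 = psi * p0 / (1 - p0)
    return odds1 / (1 + odds1)


def binomial(rng, n, p):
    """Draw one Binomial(n, p) count as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_tables(rng, odds_ratios, n, m, p0):
    """Simulate one data set: a list of (a_k, n_k, b_k, m_k), one per stratum."""
    tables = []
    for psi in odds_ratios:
        p1 = p1_from_odds_ratio(psi, p0)
        tables.append((binomial(rng, n, p1), n, binomial(rng, m, p0), m))
    return tables


rng = random.Random(2019)  # arbitrary seed, for reproducibility only
data_set = simulate_tables(rng, odds_ratios=[1, 2, 3, 7], n=20, m=20, p0=0.2)
```

Setting all entries of `odds_ratios` to the same value generates data under the null hypothesis.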
Following the previous simulation study by Reis, Hirji and Afifi [
We compared the performance of the U-statistics with the Breslow-Day statistic and Zelen’s exact test in our simulation study. A C++ program was written to calculate the exact P-values, empirical sizes and power of these statistics. The empirical size was calculated as the percentage of times that a test rejected the null hypothesis of a common odds ratio at a pre-specified α level among the 10,000 data sets simulated with the same odds ratio in all K tables. The empirical power was calculated as the percentage of times that a test rejected the null hypothesis of a common odds ratio at the pre-specified α level when the data were simulated under an alternative hypothesis. Because the U-statistics studied here are sums of the absolute distances between all possible pairs of estimated odds ratios on the log scale, a large value of a U-statistic indicates heterogeneity of the odds ratios.
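Both empirical size and empirical power reduce to a rejection rate over the simulated data sets. The helper below is a hypothetical sketch, not the authors' C++ program; rejecting when the p-value is at most α is an assumption made here.

```python
def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulated data sets whose test rejects H0 at level alpha."""
    if not p_values:
        raise ValueError("no simulated p-values supplied")
    return sum(p <= alpha for p in p_values) / len(p_values)


# Made-up p-values, for illustration only.
example_rate = rejection_rate([0.01, 0.20, 0.04, 0.60])
```

Applied to p-values from data simulated under the null hypothesis, the result estimates the empirical size; applied to data simulated under an alternative, it estimates the empirical power.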
Theoretically, under suitable conditions, $[T - E(T)] / \sqrt{\mathrm{var}(T)}$ asymptotically follows $N(0, 1)$ as $K \to \infty$, where $T$ represents a U-statistic. In our study, the sample mean and the sample variance of the 10,000 statistics simulated under the null hypothesis were used to estimate $E(T)$ and $\mathrm{var}(T)$. In an actual application, the mean and variance may be estimated as suggested in our application section. In the simulation, one would use the result from expression [
The five factors affected the test statistics differently. The empirical size of the Breslow-Day test improved (moved closer to the predefined α level) as the values of $n_k$, $m_k : n_k$, $p_{0k}$ and the odds ratios increased, but diverged from the pre-specified α level as the number of strata increased. A weak trend was observed in which the empirical sizes of U3 and WU3 first moved closer to and then diverged from the pre-specified α level as the sample size $n_k$ increased (
The number of strata K had an apparent effect on the sizes of U-statistics; their empirical sizes were improved as K increased (
Seven settings of heterogeneous odds ratios were evaluated as alternative hypotheses in our study. In this article, however, we report only the empirical powers from the scenario in which the alternative odds ratios followed the pattern 1, 2, 3, 7; that is, under Ha, 25% of the generated tables had an odds ratio of 1, 25% of 2, 25% of 3 and 25% of 7. To show the effects of the different factors on the test statistics, we also simulated the critical values based on these factors (Figures 5-11).
All the statistics’ empirical power increased as nk increased (
Generally, U3 and WU3 performed well in terms of both size and power. Their empirical sizes were stable across settings, and their power was relatively high.
Assuming that the odds ratios of the $K$ 2 × 2 tables are the same and that $\log(\hat{\psi}_i)$ is normally distributed with variance $\sigma^2$, we can derive the expected values of U3 and WU3. They are:

$$E(\mathrm{U3}) = \frac{2!\,(K-2)!}{K!}\, E\Big( \sum_{i < j} \left| \log \hat{\psi}_i - \log \hat{\psi}_j \right| \Big) = 2 \cdot \frac{1}{2\sigma\sqrt{\pi}} \int_0^\infty x \exp\left( -\frac{x^2}{4\sigma^2} \right) dx = \frac{2\sigma}{\sqrt{\pi}} \tag{5}$$

$$E(\mathrm{WU3}) = E\Big( \sum_{i < j}^{K} w_{ij} \left| \log \hat{\psi}_i - \log \hat{\psi}_j \right| \Big) = E\left( \left| \log \hat{\psi}_i - \log \hat{\psi}_j \right| \right) \sum_{i < j}^{K} w_{ij} = \frac{2\sigma}{\sqrt{\pi}} \sum_{i < j}^{K} w_{ij} \tag{6}$$
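The closed form in expression (5) follows from a short calculation, sketched here under the stated normality assumption and the independence of the strata. The symbols $D$ and $f_D$ are introduced only for this sketch.

```latex
\begin{aligned}
D &= \log\hat{\psi}_i - \log\hat{\psi}_j \sim N(0,\, 2\sigma^2), \qquad
f_D(x) = \frac{1}{2\sigma\sqrt{\pi}} \exp\!\left(-\frac{x^2}{4\sigma^2}\right), \\
E|D| &= 2\int_0^\infty x\, f_D(x)\, dx
      = \frac{1}{\sigma\sqrt{\pi}}
        \left[-2\sigma^2 \exp\!\left(-\frac{x^2}{4\sigma^2}\right)\right]_0^\infty
      = \frac{2\sigma}{\sqrt{\pi}}.
\end{aligned}
```

Every pair $(i, j)$ has the same expectation, so averaging over all pairs leaves $2\sigma/\sqrt{\pi}$ unchanged for U3, and the weighted sum simply multiplies it by $\sum_{i < j} w_{ij}$ for WU3.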
The variance of U3 can also be expressed as:

$$\mathrm{var}(\mathrm{U3}) = \frac{2!\,(K-2)!}{K!} \left[ 2(K-2)\, 1.436\, \sigma^2 + 2\sigma^2 \right]. \tag{7}$$
The variance of WU3 can also be expressed as:

$$\mathrm{var}(\mathrm{WU3}) = 1.436\, \sigma^2 \sum_{j \ne k} w_{ij} w_{ik} + 2\sigma^2 \sum_{i < j} w_{ij}^2. \tag{8}$$
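For U3, the null moments and the resulting one-sided normal test can be sketched as below. The function names are hypothetical. The second term in the bracket of the variance is taken as $2\sigma^2$, an assumption that reproduces the variances reported for U3 in both application tables. The inputs in the usage lines (K = 22, expected value 0.762194, observed value 1.3056) are read from the U3 row of the second results table, with $\sigma^2$ backed out of the expected value.

```python
from math import pi, sqrt
from statistics import NormalDist


def u3_null_moments(K, sigma2):
    """Null mean (expression 5) and variance (expression 7) of U3."""
    mean = 2 * sqrt(sigma2) / sqrt(pi)
    # Assumption: the bracket in (7) is 2(K - 2) * 1.436 * sigma^2 + 2 * sigma^2.
    variance = 2 / (K * (K - 1)) * (2 * (K - 2) * 1.436 * sigma2 + 2 * sigma2)
    return mean, variance


def upper_tail_test(observed, mean, variance):
    """Standardise a statistic and return (z, one-sided upper-tail p-value)."""
    z = (observed - mean) / sqrt(variance)
    return z, 1 - NormalDist().cdf(z)


# Back out sigma^2 from the reported expected value, then rebuild the test.
sigma2 = pi * 0.762194 ** 2 / 4
mean, variance = u3_null_moments(22, sigma2)
z, p_value = upper_tail_test(1.3056, mean, variance)
```

The upper tail is used because a large value of the statistic indicates heterogeneity.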
To estimate $\sigma^2$, consider the Mantel-Haenszel estimator of the common odds ratio, $\hat{\psi}_{MH}$, as a weighted average of the stratum odds ratios. Given the weights, we can solve for $\sigma^2$ as a function of $\mathrm{var}(\log(\hat{\psi}_{MH}))$:

$$\sigma^2 = \mathrm{var}(\log(\hat{\psi}_{MH}))\, \frac{\left[ \sum (b_k c_k / N_k) \right]^2}{\sum (b_k c_k / N_k)^2} \tag{9}$$

The value of $\mathrm{var}(\log(\hat{\psi}_{MH}))$ can be calculated by the following formula:

$$\mathrm{var}(\log(\hat{\psi}_{MH})) = \frac{\sum G_i P_i}{2 \left[ \sum G_i \right]^2} + \frac{\sum (G_i Q_i + H_i P_i)}{2 \sum G_i \sum H_i} + \frac{\sum H_i Q_i}{2 \left[ \sum H_i \right]^2}$$

where $G_i = a_i d_i / N_i$, $H_i = b_i c_i / N_i$, $P_i = (a_i + d_i) / N_i$, $Q_i = (b_i + c_i) / N_i$. Here $a_i$, $b_i$, $c_i$ and $d_i$ denote the four cell counts of the $i$th table, with $a_i$ and $d_i$ on the main diagonal.
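The two formulas above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, each table is assumed to be passed as `(a, b, c, d)` with `a` and `d` on the main diagonal, and no zero-cell amendment is applied.

```python
def mh_log_variance(tables):
    """Variance of log(psi_MH) from the G, H, P, Q formula."""
    G = [a * d / (a + b + c + d) for a, b, c, d in tables]
    H = [b * c / (a + b + c + d) for a, b, c, d in tables]
    P = [(a + d) / (a + b + c + d) for a, b, c, d in tables]
    Q = [(b + c) / (a + b + c + d) for a, b, c, d in tables]
    sum_g, sum_h = sum(G), sum(H)
    term_g = sum(g * p for g, p in zip(G, P)) / (2 * sum_g ** 2)
    term_gh = (sum(g * q + h * p for g, h, p, q in zip(G, H, P, Q))
               / (2 * sum_g * sum_h))
    term_h = sum(h * q for h, q in zip(H, Q)) / (2 * sum_h ** 2)
    return term_g + term_gh + term_h


def sigma2_from_mh(tables):
    """Expression (9): rescale var(log psi_MH) by the weights b*c/N."""
    weights = [b * c / (a + b + c + d) for a, b, c, d in tables]
    return (mh_log_variance(tables) * sum(weights) ** 2
            / sum(w * w for w in weights))


# Hypothetical single table: the result equals 1/a + 1/b + 1/c + 1/d.
single = sigma2_from_mh([(10, 10, 5, 15)])
```

For a single table the variance formula collapses to the familiar sum of reciprocal cell counts, which gives a quick sanity check.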
To illustrate the application of these two U-statistics, we applied them, alongside the Breslow-Day statistic and the Zelen statistic, to two published data sets: 1) Alcohol consumption data (
| Age (years) | Status | Daily alcohol consumption 80+ g | Daily alcohol consumption 0-79 g | Odds ratio |
|---|---|---|---|---|
| 25-34 | Case | 1 | 0 | 33.63 |
| | Control | 9 | 106 | |
| 35-44 | Case | 4 | 5 | 5.05 |
| | Control | 26 | 164 | |
| 45-54 | Case | 25 | 21 | 5.67 |
| | Control | 29 | 138 | |
| 55-64 | Case | 42 | 34 | 6.36 |
| | Control | 27 | 139 | |
| 65-74 | Case | 19 | 36 | 2.58 |
| | Control | 18 | 88 | |
| 75+ | Case | 5 | 8 | 40.76 |
| | Control | 0 | 31 | |

Source: Statistical Methods in Cancer Research, volume 1, page 137.

| Statistic | Observed value | Expected value | Variance | z value | p-value |
|---|---|---|---|---|---|
| U3 | 2.24525 | 0.452702 | 0.144733 | 2.24525 | 0.000001228 |
| WU3 | 6.25017 | 4.47819 | 8.0402 | 0.624919 | 0.26601 |
| Breslow-Day | 9.38159 | | | | 0.0968 |
| Zelen exact | 0.0968552 | | | | 0.0969 |

| Test site | New drug: response | New drug: no response | Control drug: response | Control drug: no response | Odds ratio |
|---|---|---|---|---|---|
| 1 | 0 | 15 | 0 | 15 | 1 |
| 2 | 0 | 39 | 6 | 32 | 0.06 |
| 3 | 1 | 20 | 3 | 18 | 0.3 |
| 4 | 1 | 14 | 2 | 15 | 0.54 |
| 5 | 1 | 20 | 2 | 19 | 0.48 |
| 6 | 0 | 12 | 2 | 10 | 0.17 |
| 7 | 3 | 49 | 10 | 42 | 0.26 |
| 8 | 0 | 19 | 2 | 17 | 0.18 |
| 9 | 1 | 14 | 0 | 15 | 1.07 |
| 10 | 2 | 26 | 2 | 27 | 1.04 |
| 11 | 0 | 19 | 2 | 18 | 0.19 |
| 12 | 0 | 12 | 1 | 11 | 0.31 |
| 13 | 0 | 24 | 5 | 19 | 0.07 |
| 14 | 2 | 10 | 2 | 11 | 1.1 |
| 15 | 0 | 14 | 11 | 3 | 0.01 |
| 16 | 0 | 53 | 4 | 48 | 0.1 |
| 17 | 0 | 20 | 0 | 20 | 1 |
| 18 | 0 | 21 | 0 | 21 | 1 |
| 19 | 1 | 50 | 1 | 48 | 0.96 |
| 20 | 0 | 13 | 1 | 13 | 0.32 |
| 21 | 0 | 13 | 1 | 13 | 0.32 |
| 22 | 0 | 21 | 0 | 21 | 1 |

| Statistic | Observed value | Expected value | Variance | z value | p-value |
|---|---|---|---|---|---|
| U3 | 1.3056 | 0.762194 | 0.117405 | 1.58593 | 0.0564 |
| WU3 | 6.56955 | 4.11497 | 1.09492 | 2.34578 | 0.0095 |
| Breslow-Day | 25.7844 | | | | 0.0785 |
| Zelen exact | 0.0292015 | | | | 0.0292 |

To summarize the simulation study, we offer the following remarks. When the number of strata is not very small (K ≥ 6), the empirical sizes of U3 and WU3 were very stable across settings and stayed very close to the nominal level of 0.05. In terms of size and power, U3 and WU3 performed better than the Breslow-Day statistic and Zelen’s exact test. Therefore, U3 and WU3 are considered better statistics for testing the homogeneity of odds ratios in this situation. The test statistic U3 is recommended when the sample size is the same in each stratum, the number of strata is large and the sample size in each stratum is not large. Otherwise, WU3 is recommended.
The Breslow-Day test is conservative in most situations; its empirical size is close to 0.05 when the sample size is large. When the sample size is small, the Breslow-Day test is not recommended, and it is never recommended for sparse data.
When the sample size is small and the number of strata is small, say fewer than 5, Zelen’s exact test is recommended.
In our application, the mean and the variance of the statistics were estimated based on certain assumptions. The empirical power and size of U3 and WU3 would depend heavily on how good the estimator of $\sigma^2$ is.
This work was partially supported by Cancer Prevention Research Institute of Texas (RP170668).
The authors declare no conflicts of interest regarding the publication of this paper.
Wei, Q. and Lai, D.J. (2019) Test for Homogeneity of Odds Ratios Using U-Statistics. Open Journal of Statistics, 9, 347-360. https://doi.org/10.4236/ojs.2019.93024