Open Journal of Statistics, 2012, 2, 305-308
http://dx.doi.org/10.4236/ojs.2012.23037
Published Online July 2012 (http://www.SciRP.org/journal/ojs)

The Shortest Width Confidence Interval for Odds Ratio in Logistic Regression

Eugene Demidenko
Section of Biostatistics and Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, USA
Email: eugened@dartmouth.edu

Received May 16, 2012; revised June 18, 2012; accepted July 2, 2012

ABSTRACT

The shortest width confidence interval (CI) for odds ratio (OR) in logistic regression is developed based on a theorem proved by Dahiya and Guttman (1982). When the variance of the logistic regression coefficient estimate is small, the shortest width CI is close to the regular Wald CI obtained by exponentiating the CI for the regression coefficient estimate. However, when the variance increases, the optimal CI may be up to 25% narrower. It is demonstrated that the shortest width CI is favorable because it has a smaller probability of covering the wrong OR value compared with the standard CI. Closed-form iterations based on Newton's algorithm are provided, and an R function is supplied. A simulation study confirms the superior properties of the new CI for OR in small samples. Our method is illustrated with eight studies on parity as a preventive factor against bladder cancer in women.

Keywords: Bladder Cancer; Coverage Probability; Logistic Regression; Newton's Algorithm

1. Introduction

The odds ratio, as the exponentiated logistic regression coefficient, is a popular measure of association in medicine, epidemiology and biostatistics. Routinely, the confidence interval (CI) for the odds ratio (OR) in logistic regression is computed by exponentiating the CI for the beta-coefficient (log OR, hereafter denoted as $\beta$) [1,2].
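The routine computation just described can be sketched in R as follows. This is a minimal illustration rather than part of the method itself: the variable names are assumptions, and the inputs are the rounded OR and standard error reported for the Davis-Dao 2009 study in Table 1.

```r
# Wald CI for OR: exponentiate the symmetric CI for the log odds ratio.
# All names below are illustrative; inputs are rounded values from Table 1.
or_hat   <- 0.66             # estimated OR (Davis-Dao 2009)
sigma    <- 0.160            # standard error of the estimated log OR, treated as known
alpha    <- 0.05             # 1 - alpha is the nominal coverage probability
beta_hat <- log(or_hat)      # estimated logistic regression coefficient
z        <- qnorm(1 - alpha / 2)
wald_ci  <- exp(beta_hat + c(-1, 1) * z * sigma)
print(round(wald_ci, 2))
```

With these inputs the limits round to 0.48 and 0.90, which agrees with the standard limits listed for that study in Table 1.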
While it is true that if a CI for $\beta$ has coverage probability $1-\alpha$ the exponentiated CI for OR has the same coverage probability, such a CI does not have the shortest width and therefore can be improved. The goal of this note is to demonstrate how to compute the shortest CI for OR using a theorem proved in [3]. Previously, [4] suggested finding the shortest confidence interval for OR using the same approach, but their procedure for minimizing the interval's width gave only an approximate solution. In this paper, we find the exact minimum via Newton's iterations.

2. The Method

Let the coefficient of logistic regression $\beta$ be estimated by maximum likelihood (ML) so that $\hat\beta \simeq N(\beta, \sigma^2)$ in large samples. We want to construct the shortest CI for $\mathrm{OR} = e^{\beta}$ based on $\hat\beta$, assuming that its variance $\sigma^2$ is known. In practice, this variance is not known, but usually the sample size is large enough that one can assume that $\sigma^2$ is fixed. Routinely, one first constructs the $100(1-\alpha)\%$ CI for $\beta$ as

$$\left(\hat\beta - z_{1-\alpha/2}\,\sigma,\ \hat\beta + z_{1-\alpha/2}\,\sigma\right)$$

and then exponentiates it to obtain the $100(1-\alpha)\%$ CI for OR as

$$\left(e^{\hat\beta - z_{1-\alpha/2}\,\sigma},\ e^{\hat\beta + z_{1-\alpha/2}\,\sigma}\right),$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution, $z_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)$, and $\Phi$ is the cdf of the standard normal distribution. For example, if $\alpha = 0.05$ we have $z_{1-\alpha/2} = 1.96$. This CI will be referred to as the (traditional) Wald CI with symmetric z-values.

The idea of the shortest CI is to choose asymmetric z-values such that the coverage probability is the same, $1-\alpha$, but the length of the CI is minimum. Thus we seek a CI for OR in the form

$$\left(e^{\hat\beta + z_1\sigma},\ e^{\hat\beta + z_2\sigma}\right)$$

[...]

The coverage probability was computed as the proportion of simulated samples for which the CI covers the true OR; the CI width is computed as the median of 5000 widths (we prefer the median over the mean to reduce the unwanted effect of outliers in case of false convergence, especially for large OR values). The results of our simulations are depicted in Figure 3. The shortest width CI has a width consistently smaller than that of the regular CI, although for this particular simulation setup the gain is not very substantial.

Figure 3. The coverage probability and the width of two CIs for OR in logistic regression from a simulation study (the number of experiments = 5000; the nominal coverage probability = 95%). Both methods have coverage probability close to the nominal level. However, the "shortest width" CI is on average narrower than the traditional one (the width is computed as the median to avoid possible outliers). This difference increases with the value of the true OR.

Table 1. Odds ratios and their confidence intervals for child birth/parity as a preventive factor against bladder cancer in women computed via the traditional way and the shortest-width CI in eight studies.

                                   Standard CI       Shortest CI      % width
Study              OR     σ       Lower   Upper     Lower   Upper    reduction
Cantor 1992        0.67   0.201   0.45    0.99      0.43    0.96     1.9
LaVecchia 1993     1.08   0.315   0.60    2.06      0.51    1.87     6.8
Cantwell 2006      0.70   0.221   0.45    1.07      0.43    1.04     1.6
McGrath 2006       0.78   0.188   0.54    1.13      0.52    1.10     1.7
Prizment 2007      0.66   0.240   0.41    1.05      0.38    1.01     1.6
Davis-Dao 2009     0.66   0.160   0.48    0.90      0.47    0.88     2.4
Huang 2009         0.43   0.386   0.20    0.91      0.16    0.83     5.6
Dietrich current   0.71   0.293   0.40    1.26      0.36    1.18     4.7

5. Example

We illustrate the computation of the shortest width CI for OR using a recently published article on the meta-analysis of preventive and risk factors for bladder cancer in women [5]. Table 1 presents the results of eight case-control studies where the bladder cancer occurrence was correlated with the woman's parity. In most studies, it was found that child birth is a statistically significant preventive factor against bladder cancer. Traditional and shortest width CIs for OR are presented. The percent width reduction ranges from 1.6 to 6.8. Note that the shortest width CI tends to reduce the upper limit.

6. The R Function

The following function implements the Newton's iterations described earlier. For example, z1z2(sigma = 0.201) returns the values for $z_1$ and $z_2$ as -2.199928 1.797928.

    z1z2 = function(sigma, alpha = 0.05, eps = 0.000001, maxit = 100)
    {
        # start from the symmetric (Wald) z-values
        z1 = qnorm(alpha/2)
        z2 = -z1
        for(it in 1:maxit)
        {
            den = dnorm(z1) + dnorm(z2)
            # d1 = 0 when the coverage probability equals 1 - alpha
            d1 = pnorm(z2) - pnorm(z1) - 1 + alpha
            # d2 = 0 when z1 + z2 = -2*sigma
            d2 = z1 + z2 + 2*sigma
            # Newton's adjustments for z1 and z2
            delta1 = (d1 - d2*dnorm(z2))/den
            delta2 = (d1 + d2*dnorm(z1))/den
            if(abs(delta1) + abs(delta2) < eps) break
            z1 = z1 + delta1
            z2 = z2 - delta2
        }
        return(c(z1, z2))
    }

7. Acknowledgements

This work was supported by a grant from NIH/NCI R01 CA130880.

REFERENCES

[1] A. Agresti, "Categorical Data Analysis," 3rd Edition, Wiley, New York, 2002.

[2] B. Rosner, "Fundamentals of Biostatistics," 7th Edition, Duxbury, Pacific Grove, 2010.

[3] R. C. Dahiya and I. Guttman, "Shortest Confidence and Prediction Intervals for the Log-Normal," Canadian Journal of Statistics, Vol. 10, No. 4, 1982, pp. 277-291. doi:10.2307/3556194

[4] P. D. Wilson and P. Langenberg, "Usual and Shortest Confidence Intervals on Odds Ratios from Logistic Regression," The American Statistician, Vol. 53, No. 4, 1999, pp. 332-335.

[5] K. Dietrich, E. Demidenko, A. Schned, M. S. Zens, J. Heaney and M. R. Karagas, "Parity, Early Menopause and the Incidence of Bladder Cancer in Women: A Case-Control Study and Meta-Analysis," European Journal of Cancer, Vol. 47, No. 4, 2011, pp. 592-599. doi:10.1016/j.ejca.2010.10.007

Copyright © 2012 SciRes. OJS
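As a check on the function in Section 6, the sketch below reruns it for sigma = 0.201 (the Cantor 1992 study in Table 1) and turns the returned z-values into CI limits. The function body is restated from Section 6 only so that the block runs on its own; the names or_hat, coverage, ci_short, ci_wald and reduction are illustrative assumptions, and the inputs are the rounded values printed in Table 1.

```r
# Newton's iterations for the asymmetric z-values (restated from Section 6).
z1z2 <- function(sigma, alpha = 0.05, eps = 1e-6, maxit = 100) {
  z1 <- qnorm(alpha / 2)   # start from the symmetric Wald values
  z2 <- -z1
  for (it in 1:maxit) {
    den <- dnorm(z1) + dnorm(z2)
    d1 <- pnorm(z2) - pnorm(z1) - 1 + alpha   # coverage condition
    d2 <- z1 + z2 + 2 * sigma                 # second condition solved by the iterations
    delta1 <- (d1 - d2 * dnorm(z2)) / den
    delta2 <- (d1 + d2 * dnorm(z1)) / den
    if (abs(delta1) + abs(delta2) < eps) break
    z1 <- z1 + delta1
    z2 <- z2 - delta2
  }
  c(z1, z2)
}

or_hat <- 0.67    # Cantor 1992, rounded OR from Table 1
sigma  <- 0.201   # rounded standard error of the log OR from Table 1
z <- z1z2(sigma)

# Both conditions should hold at the returned solution (up to eps).
coverage <- pnorm(z[2]) - pnorm(z[1])   # should be close to 0.95
sum_z    <- z[1] + z[2]                 # should be close to -2 * sigma

# Shortest width CI versus the traditional Wald CI on the OR scale.
ci_short  <- or_hat * exp(z * sigma)
ci_wald   <- or_hat * exp(c(-1, 1) * qnorm(0.975) * sigma)
reduction <- 100 * (1 - diff(ci_short) / diff(ci_wald))

print(round(ci_short, 2))
print(round(ci_wald, 2))
print(reduction)
```

The rounded limits come out as 0.43 and 0.96 for the shortest width CI and 0.45 and 0.99 for the Wald CI, matching the Cantor 1992 row of Table 1, and the computed width reduction is close to the 1.9% reported there. Because both intervals scale with the estimated OR, the percent reduction depends only on sigma and alpha.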