Open Journal of Statistics, 2012, 2, 172-176
http://dx.doi.org/10.4236/ojs.2012.22019
Published Online April 2012 (http://www.SciRP.org/journal/ojs)

Modified Wilcoxon Signed-Rank Test

Ikewelugo Cyprian Anaene Oyeka, Godday Uwawunkonye Ebuh*
Department of Statistics, Faculty of Physical Sciences, Nnamdi Azikiwe University, Awka, Nigeria
Email: *ablegod007@yahoo.com

Received January 6, 2012; revised February 10, 2012; accepted February 19, 2012

ABSTRACT

This paper briefly reviews the Wilcoxon signed rank sum test and proposes a modification. Unlike the Wilcoxon method, the proposed approach does not require that the populations being studied be continuous. Also unlike the Wilcoxon signed rank test, the proposed method does not require the absence of zero differences or tied absolute values of differences. Rather, the proposed method structurally makes provision for these possibilities. The proposed test statistic also enables the estimation of the probabilities of positive, zero or tied, and negative differences within the data. This is illustrated with an example, and the proposed method is shown to be generally more efficient and hence more powerful than the Wilcoxon test statistic, with the power increasing as the number of tied observations or zero differences increases.

Keywords: Proposed Method; Probabilities; Positive; Zero; Tied; Negative; Frequencies

1. Introduction

The Wilcoxon signed rank test is a rank-based alternative to the parametric t-test that assumes only that the distribution of differences within pairs is symmetric, without requiring normality [1]. Let $X_i$ be the $i$th observation, $i = 1, 2, \ldots, n$, in a random sample of size $n$ drawn from population $X$ with unknown median $M$; or let $(X_i, Y_i)$ be the $i$th pair, $i = 1, 2, \ldots, n$, in a paired random sample of size $n$ drawn from populations $X$ and $Y$ with unknown medians $M_1$ and $M_2$ respectively. For the moment, we assume that $X$ and $Y$ are continuous.
In the one sample case, interest may be in testing that the unknown population median $M$ is equal to some specified value $M_0$. In the paired sample case, interest may be in testing that the unknown population medians are equal, that is $M_1 = M_2$, or that one population median is equal to at least some multiple of the other population median, that is $M_1 = c \cdot M_2 + k$ say, where $c$ ($c > 0$) and $k$ are real numbers, versus appropriate two-sided or one-sided alternative hypotheses. If the assumptions of the parametric tests are satisfied, the first hypothesis may be tested using the one sample t-test while the second hypothesis may be tested using the paired sample t-test. The third hypothesis, however, may not be readily tested using the parametric method because of problems of non-homogeneity. If the necessary assumptions of the parametric t-test cannot reasonably be made, a non-parametric method that often readily suggests itself in these situations is the Wilcoxon signed rank sum test [2].

In this paper, we briefly discuss the Wilcoxon method and then proceed to present a modified version of the method that may be appropriate for testing the above hypotheses.

2. The Wilcoxon Signed Rank Sum Test

According to [3,4], the Wilcoxon signed rank test is used to test the null hypothesis that the median of a distribution is equal to some value. It can be used in place of a one sample t-test or a paired t-test, or for ordered categorical data where a numerical scale is inappropriate but where it is possible to rank the observations.

To use the Wilcoxon signed rank sum test, we first find the difference between each observation and the hypothesized median in the one sample problem, or the difference between the paired observations in the paired sample problem. That is, in the one sample case we find $d_i = x_i - M_0$, and in the paired sample case $d_i = x_i - c\,y_i - k$, for $i = 1, 2, \ldots, n$.
We then take the absolute values of these differences and rank them either from the smallest to the largest or from the largest to the smallest, always taking note of the ranks of the absolute values with positive differences and those with negative differences. The requirement that the populations from which the samples are drawn are continuous makes it possible to state, at least theoretically, that the probability of obtaining zero differences or tied absolute values of the differences is zero.

Now, let $r(|d_i|)$ be the rank assigned to $|d_i|$, the absolute value of the $i$th difference $d_i$, for $i = 1, 2, \ldots, n$. Let

$$Z_i = \begin{cases} 1, & \text{if } d_i > 0; \\ 0, & \text{if } d_i < 0. \end{cases} \tag{1}$$

Let

$$\theta = P(Z_i = 1) \tag{2}$$

*Corresponding author.
Copyright © 2012 SciRes. OJS

Let

$$T^+ = \sum_{i=1}^{n} Z_i\, r(|d_i|) \tag{3}$$

That is, $T^+$ is the sum of the ranks of the absolute values with positive differences. Now for simplicity, but without loss of generality, we let $r(|d_i|) = i$ [2]. Then,

$$T^+ = \sum_{i=1}^{n} i\, Z_i \tag{4}$$

Now

$$E(Z_i) = \theta \quad \text{and} \quad \operatorname{Var}(Z_i) = \theta(1 - \theta) \tag{5}$$

Hence

$$E(T^+) = \sum_{i=1}^{n} i\, E(Z_i) = \theta \sum_{i=1}^{n} i$$

That is

$$E(T^+) = \frac{n(n+1)}{2}\,\theta \tag{6}$$

Note that an estimate of $\theta$, namely $\hat{\theta}$, may be obtained from the expression

$$\hat{\theta} = \frac{2T^+}{n(n+1)} \tag{7}$$

Also

$$\operatorname{Var}(T^+) = \sum_{i=1}^{n} i^2 \operatorname{Var}(Z_i)$$

since $\operatorname{Cov}(Z_i, Z_j) = 0$ for $i \neq j$. That is

$$\operatorname{Var}(T^+) = \frac{n(n+1)(2n+1)}{6}\,\theta(1 - \theta) \tag{8}$$

The null hypothesis that is usually tested in the Wilcoxon signed rank sum approach is [2]

$$H_0: \theta = \theta_0 = \tfrac{1}{2}$$

versus either a two-sided or an appropriate one-sided alternative hypothesis. For the paired sample case, this null hypothesis is equivalent to $H_0: P(M_1 > M_2) = \tfrac{1}{2}$. Similarly, for the one sample case this null hypothesis is equivalent to $H_0: M = M_0$. A large sample test statistic for any hypothesized value of $\theta$, $\theta_0$ say ($0 < \theta_0 < 1$), is given by

$$Z = \frac{T^+ - \dfrac{n(n+1)}{2}\,\theta_0}{\sqrt{\dfrac{n(n+1)(2n+1)}{6}\,\theta_0(1 - \theta_0)}} \tag{9}$$

This has approximately a standard normal distribution under the desired null hypothesis.
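As a rough illustration of Equations (3) and (6) to (9), the following Python sketch computes $T^+$ and the large-sample statistic for a hypothesized $\theta_0$. The function names (`midranks`, `wilcoxon_large_sample_z`) and the sample differences are assumptions made for illustration only. The sketch ranks from smallest to largest and, like this section, is meant for data with no zero or tied differences.

```python
import math

def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def wilcoxon_large_sample_z(d, theta0=0.5):
    """Return (T+, Z) following Equations (3) and (9)."""
    n = len(d)
    ranks = midranks([abs(x) for x in d])
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)            # Equation (3)
    mean = n * (n + 1) / 2 * theta0                                # Equation (6)
    var = n * (n + 1) * (2 * n + 1) / 6 * theta0 * (1 - theta0)    # Equation (8)
    return t_plus, (t_plus - mean) / math.sqrt(var)                # Equation (9)

# Hypothetical differences with no zeros and no ties (illustration only).
d = [1.5, -0.4, 2.1, 0.7, -1.2, 3.0]
t_plus, z = wilcoxon_large_sample_z(d)
print(f"T+ = {t_plus}, Z = {z:.3f}")
```

For these six hypothetical differences the positive values hold ranks 4, 5, 2 and 6, so $T^+ = 17$. With so small a sample the normal approximation is shown only to make the arithmetic concrete.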
But under the null hypothesis usually tested using the Wilcoxon signed rank test, that is $H_0: \theta = \theta_0 = \tfrac{1}{2}$, Equation (6) becomes

$$E(T^+ \mid H_0) = \frac{n(n+1)}{4} \tag{10}$$

and Equation (8) becomes

$$\operatorname{Var}(T^+ \mid H_0) = \frac{n(n+1)(2n+1)}{24} \tag{11}$$

Hence the test statistic of Equation (9) becomes

$$Z = \frac{T^+ - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}} \tag{12}$$

which under $H_0$ has approximately a standard normal distribution for fairly large sample size $n$ and may be used to test the null hypothesis of equal population medians. When all non-zero values are of the same sign, the Wilcoxon signed rank test reduces to the sign test and the two-tailed P-value is $\left(\tfrac{1}{2}\right)^{n-1}$, where $n$ is the number of non-zero values [5].

3. The Proposed Modified Method

We here drop the requirement that the populations from which the samples are drawn are continuous. We now only require that the populations be quantitative data measured on at least the ordinal scale. The populations could be continuous or discrete. The requirements of no zero differences or tied absolute values are also no longer necessary, since these problems are taken care of structurally by the following model specification. We continue to use $d_i$ to represent the difference between $x_i$ and the hypothesized population median $M_0$ in the one sample case, and the differences $x_i - y_i$ (or $x_i - c\,y_i - k$) in the paired sample case, for $i = 1, 2, \ldots, n$. We also continue to use, again without loss of generality, $r(|d_i|) = i$ to represent the rank assigned to the absolute value of the $i$th difference $d_i$. Let

$$Z_i = \begin{cases} 1, & \text{if } d_i > 0; \\ 0, & \text{if } d_i = 0; \\ -1, & \text{if } d_i < 0, \end{cases} \quad \text{for } i = 1, 2, \ldots, n \tag{13}$$

Also let

$$\pi^+ = P(Z_i = 1), \quad \pi^0 = P(Z_i = 0), \quad \pi^- = P(Z_i = -1) \tag{14}$$

where

$$\pi^+ + \pi^0 + \pi^- = 1 \tag{15}$$

Finally define

$$T = \sum_{i=1}^{n} i\, Z_i \tag{16}$$

That is, $T$ is the difference between the sum of the ranks assigned to absolute values with positive differences and the sum of the ranks assigned to absolute values with negative differences.
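The definitions in Equations (13) and (16) can be sketched in a few lines of Python. The helper names (`midranks`, `sign_score`, `modified_T`) and the data are hypothetical; tied absolute differences are given their mean ranks, which is an assumption borrowed from the illustrative example later in the paper.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def sign_score(d_i):
    """Z_i of Equation (13): 1, 0 or -1 for a positive, zero or negative difference."""
    return (d_i > 0) - (d_i < 0)

def modified_T(d):
    """T of Equation (16): rank-weighted sum of the sign scores."""
    ranks = midranks([abs(x) for x in d])
    return sum(r * sign_score(x) for r, x in zip(ranks, d))

# Hypothetical differences containing zeros and ties (illustration only).
d = [2, 0, -1, 3, 0, -2, 1]
ranks = midranks([abs(x) for x in d])
t_plus = sum(r for r, x in zip(ranks, d) if x > 0)   # ranks of positive differences
t_minus = sum(r for r, x in zip(ranks, d) if x < 0)  # ranks of negative differences
T = modified_T(d)
print(T, t_plus, t_minus)
```

Here the zero differences receive ranks but contribute nothing to the sum, so `T` equals `t_plus - t_minus`, matching the verbal description of $T$.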
Now

$$E(Z_i) = (1)\pi^+ + (0)\pi^0 + (-1)\pi^-$$

That is

$$E(Z_i) = \pi^+ - \pi^- \tag{17}$$

and

$$\operatorname{Var}(Z_i) = E(Z_i^2) - \left(E(Z_i)\right)^2 = (1)^2\pi^+ + (0)^2\pi^0 + (-1)^2\pi^- - (\pi^+ - \pi^-)^2$$

That is

$$\operatorname{Var}(Z_i) = \pi^+ + \pi^- - (\pi^+ - \pi^-)^2 \tag{18}$$

Furthermore

$$E(T) = \sum_{i=1}^{n} i\, E(Z_i) = (\pi^+ - \pi^-) \sum_{i=1}^{n} i$$

That is

$$E(T) = \frac{n(n+1)}{2}\,(\pi^+ - \pi^-) \tag{19}$$

Note that $\pi^+ - \pi^-$ may be estimated as $\hat{\pi}^+ - \hat{\pi}^-$ from

$$\hat{\pi}^+ - \hat{\pi}^- = \frac{2T}{n(n+1)} \tag{20}$$

Also

$$\operatorname{Var}(T) = \sum_{i=1}^{n} i^2 \operatorname{Var}(Z_i)$$

since $\operatorname{Cov}(Z_i, Z_j) = 0$ for $i \neq j$. Therefore

$$\operatorname{Var}(T) = \frac{n(n+1)(2n+1)}{6}\left(\pi^+ + \pi^- - (\pi^+ - \pi^-)^2\right) \tag{21}$$

Note that in the one sample case $\pi^+$, $\pi^0$ and $\pi^-$ are respectively, on the average, the probabilities that the population median is greater than, equal to, or less than the hypothesized median $M_0$, while in the paired sample case they are, on the average, the probabilities that one population median is greater than, equal to, or less than the other population median. They are estimated respectively as the relative frequencies of occurrence of 1, 0 and −1 in the frequency distribution of the $n$ elements of $Z_i$. That is

$$\hat{\pi}^+ = \frac{f^+}{n}; \quad \hat{\pi}^0 = \frac{f^0}{n}; \quad \hat{\pi}^- = \frac{f^-}{n} \tag{22}$$

where $f^+$, $f^0$ and $f^-$ are respectively the frequencies of occurrence of 1, 0 and −1 in the frequency distribution of $Z_i$.

Often the null hypothesis required to be tested is

$$H_0: \pi^+ - \pi^- = \theta_0 \quad \text{versus} \quad H_1: \pi^+ - \pi^- \neq \theta_0, \text{ say, } (-1 \le \theta_0 \le 1) \tag{23}$$

or versus an appropriate one-tailed alternative hypothesis. For the paired sample case, this null hypothesis is equivalent to

$$H_0: P(M_1 > M_2) - P(M_1 < M_2) = \theta_0$$

which for $\theta_0 = 0$ is the same as $H_0: M_1 = M_2$ or $M_1 = cM_2 + k$. For the one sample case, the null hypothesis is equivalent to

$$H_0: P(M > M_0) - P(M < M_0) = \theta_0$$

which for $\theta_0 = 0$ implies $H_0: M = M_0$. The test statistic is

$$Z = \frac{T - \dfrac{n(n+1)}{2}\,\theta_0}{\sqrt{\dfrac{n(n+1)(2n+1)}{6}\left(\pi^+ + \pi^- - (\pi^+ - \pi^-)^2\right)}} \tag{24}$$

which under $H_0$ has approximately a standard normal distribution for fairly large $n$. In practical applications, $\pi^+$ and $\pi^-$ of Equation (24) are usually replaced with their sample estimates of Equation (22).
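To make the steps from the relative frequencies of Equation (22) to the statistic of Equation (24) concrete, here is a minimal Python sketch. The function name `modified_wilcoxon_z`, the argument name `theta0` (standing for the hypothesized value of $\pi^+ - \pi^-$), the use of mean ranks for ties, and the data are all assumptions made for this illustration, not part of the paper.

```python
import math

def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def modified_wilcoxon_z(d, theta0=0.0):
    """Return T, the estimated probabilities and Z, following Equations (13)-(24)."""
    n = len(d)
    ranks = midranks([abs(x) for x in d])
    scores = [(x > 0) - (x < 0) for x in d]                 # Equation (13)
    T = sum(r * s for r, s in zip(ranks, scores))           # Equation (16)
    pi_plus = scores.count(1) / n                           # Equation (22)
    pi_zero = scores.count(0) / n
    pi_minus = scores.count(-1) / n
    spread = pi_plus + pi_minus - (pi_plus - pi_minus) ** 2
    var_T = n * (n + 1) * (2 * n + 1) / 6 * spread          # Equation (21), estimated
    if var_T <= 0:
        # Happens when every difference is zero or all non-zero ones share a sign.
        raise ValueError("estimated variance is zero; Z is undefined")
    z = (T - n * (n + 1) / 2 * theta0) / math.sqrt(var_T)   # Equation (24)
    return {"T": T, "pi_plus": pi_plus, "pi_zero": pi_zero,
            "pi_minus": pi_minus, "var_T": var_T, "Z": z}

# Hypothetical differences containing zeros and ties (illustration only).
result = modified_wilcoxon_z([2, 0, -1, 3, 0, -2, 1])
print(result)
```

The guard on `var_T` is a design choice of the sketch: when all differences are positive, say, the plug-in estimate of Equation (21) is exactly zero and the ratio cannot be formed.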
Note that $T^+$ as defined by Wilcoxon is the sum of the ranks of the absolute values of positive differences and that $\theta$ is the probability of the occurrence of only positive differences. Thus the specification of Wilcoxon's statistic does not explicitly provide for the possible occurrence of negative differences. So the $\pi^-$ in our proposed modification in Equation (14) is automatically set equal to zero in Wilcoxon's approach, and under $H_0$ the probability of the occurrence of positive differences is hypothesized as $H_0: \theta = \theta_0 = \tfrac{1}{2}$. Hence if we automatically set $\pi^- = 0$ and under the null hypothesis we set $\pi^+ - \pi^- = \theta_0 = \tfrac{1}{2}$ in Equation (24), then the proposed test statistic is seen to coincide with the Wilcoxon signed rank test statistic of Equation (12). The Wilcoxon test statistic could be rewritten as

$$Z = \frac{4T^+ - n(n+1)}{\sqrt{\dfrac{2n(n+1)(2n+1)}{3}}} \tag{25}$$

while the test statistic for the proposed method could be rewritten as

$$Z = \frac{2T - n(n+1)\,\theta_0}{\sqrt{\dfrac{2n(n+1)(2n+1)}{3}\left(\pi^+ + \pi^- - (\pi^+ - \pi^-)^2\right)}} \tag{26}$$

The hypothesis tested under Wilcoxon's approach, $H_0: \theta_0 = \tfrac{1}{2}$, is equivalent to $H_0: \pi^+ - \pi^- = 0$ tested under the proposed method. The relative rate at which these two test statistics are able to accept a true null hypothesis or reject a false null hypothesis can therefore be obtained by comparing the variance of $4T^+$ of Equation (25) with the variance of $2T$ of Equation (26), that is, in terms of the relative efficiency of $T$ compared with $T^+$. With $\operatorname{Var}(T^+)$ taken from Equation (11) and $\operatorname{Var}(T)$ from Equation (21),

$$RE(T; T^+) = \frac{\operatorname{Var}(4T^+)}{\operatorname{Var}(2T)} = \frac{4\operatorname{Var}(T^+)}{\operatorname{Var}(T)} = \frac{1}{\pi^+ + \pi^- - (\pi^+ - \pi^-)^2} \tag{27}$$

In other words

$$RE(T; T^+) = \frac{1}{\pi^+ + \pi^- - (\pi^+ - \pi^-)^2} \ge \frac{1}{\pi^+ + \pi^-}$$

since $(\pi^+ - \pi^-)^2 \ge 0$. That is

$$RE(T; T^+) \ge \frac{1}{1 - \pi^0} \tag{28}$$

since, from Equation (15), $\pi^+ + \pi^- = 1 - \pi^0$. Therefore

$$RE(T; T^+) \ge 1 \tag{29}$$

for $\pi^0 \ge 0$, increasing as $\pi^0$ increases.

Hence, the proposed modified method ($T$) is more efficient than Wilcoxon's method ($T^+$) whenever $\pi^0$ is not equal to 0, that is, whenever there are zero differences or tied observations in the data. The relative efficiency of $T$ compared with $T^+$ increases as $\pi^0$ increases. The two methods are equally efficient only when $\pi^0 = 0$, that is, when there are no ties in the data whatsoever. Thus unless there are no ties whatsoever in the data, the proposed method yields a more powerful test criterion than the Wilcoxon signed rank sum test statistic for the same sample size.

4. Illustrative Example

A random sample of twelve married women was selected and the women were asked to state the actual and the ideal number of children they would like to have. The results are presented in Table 1.

Table 1. Actual and ideal number of children by a sample of married women.

| Woman | Actual | Ideal |
|---|---|---|
| 1 | 4 | 5 |
| 2 | 1 | 5 |
| 3 | 6 | 5 |
| 4 | 1 | 6 |
| 5 | 7 | 5 |
| 6 | 1 | 9 |
| 7 | 4 | 4 |
| 8 | 2 | 6 |
| 9 | 8 | 8 |
| 10 | 5 | 5 |
| 11 | 4 | 4 |
| 12 | 4 | 5 |

To apply the Wilcoxon signed rank test, we take and rank the differences between the actual and ideal number of children for the sample of married women shown in Table 1. Tied absolute differences are assigned their mean ranks. The results are presented in Table 2.

Table 2. Ranks of absolute differences $r(|d_i|)$ of the differences $d_i$ between actual and ideal number of children in Table 1.

| No. of Woman | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ideal | 5 | 5 | 5 | 6 | 5 | 9 | 4 | 6 | 8 | 5 | 4 | 5 |
| Actual | 4 | 1 | 6 | 1 | 7 | 1 | 4 | 2 | 8 | 5 | 4 | 4 |
| $d_i$ (Ideal − Actual) | 1 | 4 | −1 | 5 | −2 | 8 | 0 | 4 | 0 | 0 | 0 | 1 |
| $\lvert d_i \rvert$ | 1 | 4 | 1 | 5 | 2 | 8 | 0 | 4 | 0 | 0 | 0 | 1 |
| Rank of $\lvert d_i \rvert$ | 6 | 9.5 | 6 | 11 | 8 | 12 | 2.5 | 9.5 | 2.5 | 2.5 | 2.5 | 6 |
| Sign of $d_i$ | + | + | − | + | − | + | 0 | + | 0 | 0 | 0 | + |

From Table 2 we have that the sum of the ranks of absolute differences with positive signs is

$$T^+ = 6 + 9.5 + 11 + 12 + 9.5 + 6 = 54$$

The null hypothesis to be tested with the Wilcoxon signed rank test is usually that the two populations of interest have equal medians, $H_0: \theta_0 = 0.5$. Hence under $H_0$ we obtain from Equations (6) and (8) that

$$E(T^+) = \frac{12(13)}{4} = 39 \quad \text{and} \quad \operatorname{Var}(T^+) = \frac{12(13)(25)}{24} = 162.5$$

The resulting test statistic (Equation (12)) under the Wilcoxon approach is

$$Z = \frac{54 - 39}{\sqrt{162.50}} = \frac{15}{12.748} = 1.177 \approx 1.18$$

(P-value = 0.1190), which is not statistically significant at the 5 percent level.

We now apply the modified Wilcoxon signed rank test to the data of Table 1 for comparative purposes.
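The Wilcoxon figures just quoted can be checked with a short Python sketch. The data are the Actual and Ideal columns of Table 1; the helper `midranks` and the variable names are hypothetical, and the one-sided normal tail probability is computed with the standard library's `math.erfc`.

```python
import math

def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

# Data from Table 1, women 1 to 12.
actual = [4, 1, 6, 1, 7, 1, 4, 2, 8, 5, 4, 4]
ideal = [5, 5, 5, 6, 5, 9, 4, 6, 8, 5, 4, 5]

d = [i - a for i, a in zip(ideal, actual)]           # d_i = Ideal - Actual
ranks = midranks([abs(x) for x in d])                # zeros are ranked too, as in Table 2
t_plus = sum(r for r, x in zip(ranks, d) if x > 0)   # sum of ranks with positive sign

n = len(d)
mean_h0 = n * (n + 1) / 4                            # Equation (10)
var_h0 = n * (n + 1) * (2 * n + 1) / 24              # Equation (11)
z = (t_plus - mean_h0) / math.sqrt(var_h0)           # Equation (12)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))      # upper-tail normal probability

print(ranks)
print(t_plus, mean_h0, var_h0, round(z, 3), round(p_one_sided, 4))
```

The ranks, $T^+ = 54$, the mean 39 and the variance 162.5 agree with the values above. The computed tail probability may differ slightly in the later decimals from the quoted 0.1190, which appears to correspond to a one-sided normal tail with $Z$ rounded to 1.18.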
Now from Table 2 and Equation (16) we have that

$$T = T^+ - T^- = 54 - (6 + 8) = 40$$

and from Equation (22)

$$\hat{\pi}^+ = \frac{6}{12} = 0.50; \quad \hat{\pi}^0 = \frac{4}{12} = 0.33; \quad \hat{\pi}^- = \frac{2}{12} = 0.17$$

From Equations (19) and (21), we have that

$$E(T) = \frac{12(13)}{2}(0.50 - 0.17) = 25.74$$

and

$$\operatorname{Var}(T) = \frac{12(13)(25)}{6}\left(0.50 + 0.17 - (0.50 - 0.17)^2\right) = 364.715$$

To test the null hypothesis of equal population medians using the modified approach we have from Equation (24), with $\theta_0 = 0$,

$$Z = \frac{40 - 0}{\sqrt{364.715}} = \frac{40}{19.098} = 2.094$$

(P-value = 0.0183), which is statistically significant at the 5 percent level. Hence we now reject the null hypothesis that the two populations have equal medians. Note that this same null hypothesis was not rejected (P-value = 0.1190) using the Wilcoxon signed rank test statistic unmodified for the possible presence of ties in the data.

Also note from Equation (28) that the relative efficiency of the modified test statistic $T$ to the Wilcoxon test statistic $T^+$ is estimated as

$$RE(T; T^+) \ge \frac{1}{1 - \hat{\pi}^0} = \frac{1}{1 - 0.33} = \frac{1}{0.67} = 1.49$$

Thus for the data being analysed, the proposed test statistic is at least 1.49 times more efficient, and hence more powerful, than the Wilcoxon signed rank test statistic $T^+$.

5. Conclusion

This paper briefly reviews the Wilcoxon signed rank sum test and proposes a modification. Unlike the Wilcoxon method, the proposed approach does not require that the populations being studied be continuous. Also unlike the Wilcoxon signed rank test, the proposed method does not require the absence of zero differences or tied absolute values of differences. Rather, the proposed method structurally makes provision for these possibilities. The proposed test statistic also enables the estimation of the probabilities of positive, zero or tied, and negative differences within the data. The proposed method is shown to be generally more efficient and hence more powerful than the Wilcoxon test statistic, with the power increasing as the number of tied observations or zero differences increases.

REFERENCES

[1] R. De-W. Derryberry, S. B. Schou and W. J. Conover, "Teaching Rank-Based Tests by Emphasizing Structural Similarities to Corresponding Parametric Tests," Journal of Statistics Education, Vol. 18, No. 1, 2010, pp. 1-19. www.amstat.org/publications/jse/v18n1/derryberry.pdf

[2] D. Gibbon, "Non Parametric Statistics," McGraw Hill, New York, 1971. onlinelibrary.wiley.com/doi/10.1111/j.2044-8317

[3] R. Shier, "Statistics," The Wilcoxon Signed Rank Sum Test, Mathematics Learning Support Centre, 2004, pp. 1-3. mlsc.lboro.ac.uk/resources/statistics/signtest.pdf

[4] I. C. A. Oyeka, "An Introduction to Applied Statistical Methods," 8th Edition, Nobern Avocation Publishing Company, Enugu, 2009, pp. 496-533.

[5] B. H. Robbins, "Non Parametric Tests," Scholars Series, 2010, pp. 1-30. biostat.mc.vanderbilt.edu/wiki/pub/Main/.../NonParametrics.pdf