Open Journal of Statistics
Vol.05 No.04(2015), Article ID:56539,8 pages
10.4236/ojs.2015.54026

On Discordance Tests for the Wrapped Cauchy Distribution

A. H. Abuzaid1, M. M. El-hanjouri2, M. M. Kulab2

1Department of Mathematics, Faculty of Science, Al-Azhar University-Gaza, Gaza, Palestine

2Department of Applied Statistics, Faculty of Economics and Administrative Sciences, Al-Azhar University-Gaza, Gaza, Palestine

Received 28 March 2015; accepted 17 May 2015; published 22 May 2015

ABSTRACT

Circular data, like any other type of data, are subject to contamination by unexpected observations, known as outliers. In this paper, four tests of discordancy for circular data, based on the M, C, D, and A statistics, are extended to the wrapped Cauchy distribution to detect possible outliers. The cut-off points and the power of performance are investigated via an extensive simulation study. Results show that the tests perform better as the concentration of the samples increases. Two real circular data sets are analysed for illustration.

Keywords:

Arc Length, Circular Distance, Outlier, Wrapped Normal Distribution

1. Introduction

Circular data are observations measured as angles, distributed within $[0, 2\pi)$ radians, and they can be represented on the circumference of a unit circle. Such data require special statistical methods for description and modelling rather than the conventional linear techniques. Circular data arise whenever periodic phenomena occur and are of interest to scientists in many fields, including biology, meteorology, physics, psychology, image analysis, medicine, astronomy, social sciences and earth sciences, see  . The existence of outliers is one of the most common problems in statistical analysis, and it extends to circular data because of the expected influence of outliers on the parameter estimates. In the context of circular data, outliers are defined as a set of observations that is inconsistent with the rest of the sample; such observations are expected to lie far from the mean direction of the circular sample. Despite this, there are only a few numerical and graphical tests of discordancy for circular samples. The problem of outliers in different types of circular data, including univariate samples, regression, functional relationship models and circular time series, has been addressed by several authors (see  - ).

The rest of this paper is organized as follows: Section 2 describes the properties of the wrapped Cauchy distribution. Section 3 presents four discordance tests to detect possible outliers in circular univariate data. In Section 4, the cut-off points for the tests are obtained based on samples generated from the wrapped Cauchy distribution. The power of performance is investigated via simulation studies in Section 5. Lastly, we apply the statistics to two real data sets for illustration in Section 6.

2. Wrapped Cauchy Distribution

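A circular random variable $\theta$ can be obtained from any random variable $X$ on the real line with probability density function $f(x)$ and distribution function $F(x)$ by defining $\theta = X \ (\mathrm{mod}\ 2\pi)$.

This means wrapping the original distribution on the real line around the circle to obtain the wrapped distribution. The Cauchy distribution on the real line has the density

$$f(x; \mu, \sigma) = \frac{1}{\pi}\,\frac{\sigma}{\sigma^{2} + (x-\mu)^{2}}, \quad -\infty < x < \infty, \qquad (1)$$

where $\mu$ and $\sigma > 0$ are the location and scale parameters, respectively (the mean and standard deviation of the Cauchy distribution do not exist). Wrapping this distribution around the circle gives the wrapped Cauchy distribution, denoted by $WC(\mu, \rho)$, with probability density function

$$f(\theta; \mu, \rho) = \frac{1}{2\pi}\,\frac{1-\rho^{2}}{1+\rho^{2}-2\rho\cos(\theta-\mu)}, \quad 0 \le \theta < 2\pi, \qquad (2)$$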

where $\mu$ is the mean direction and $\rho$ is the concentration parameter, which is also the mean resultant length. Then, the distribution function of the wrapped Cauchy is given by:

Reference  introduced the wrapped Cauchy distribution, and  illustrated that the wrapped Cauchy distribution can be obtained by mapping the Cauchy distribution onto the circle by the transformation . Reference  quantified the dispersion of the wrapped Cauchy distribution by the concentration parameter $\rho$, given in the form $\rho = e^{-\sigma}$, and explained that as $\rho$ approaches 0, the distribution converges to the circular uniform distribution with probability density function

$$f(\theta) = \frac{1}{2\pi}, \quad 0 \le \theta < 2\pi,$$

and as $\rho$ approaches one, the distribution tends to the point distribution concentrated in the direction $\mu$.
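The wrapping construction and density (2) can be illustrated with a short Python sketch. It assumes NumPy, the relation $\rho = e^{-\sigma}$ between the concentration parameter and the scale of the linear Cauchy distribution, and $0 < \rho < 1$; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def wrapped_cauchy_pdf(theta, mu=0.0, rho=0.5):
    """Density (2) of WC(mu, rho) evaluated at angle(s) theta, in radians."""
    theta = np.asarray(theta, dtype=float)
    return (1.0 - rho**2) / (2.0 * np.pi * (1.0 + rho**2 - 2.0 * rho * np.cos(theta - mu)))

def sample_wrapped_cauchy(n, mu=0.0, rho=0.5, rng=None):
    """Draw n angles by wrapping a linear Cauchy variate around the circle."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = -np.log(rho)                      # assumed relation rho = exp(-sigma), 0 < rho < 1
    x = mu + sigma * rng.standard_cauchy(n)   # Cauchy(mu, sigma) on the real line
    return np.mod(x, 2.0 * np.pi)             # theta = X mod 2*pi

# Usage: the sample mean resultant length should be close to rho.
rng = np.random.default_rng(0)
angles = sample_wrapped_cauchy(20000, mu=1.0, rho=0.7, rng=rng)
r_bar = np.abs(np.mean(np.exp(1j * angles)))
```

Small values of `rho` give a nearly uniform sample, while values close to one concentrate the angles around `mu`, in line with the limiting cases described above.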

The distribution is unimodal and symmetric about $\mu$. Reference  illustrated that the distribution enjoys the additive property and the central limit theorem; in other words, the convolution of the wrapped Cauchy distributions $WC(\mu_1, \rho_1)$ and $WC(\mu_2, \rho_2)$ is the wrapped Cauchy distribution $WC(\mu_1+\mu_2, \rho_1\rho_2)$. One of the main features of the wrapped Cauchy distribution is that it has heavy tails even for large concentrations, which makes the detection of outliers a hard task.
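The additive property can be checked through the trigonometric moments. The following is a standard argument sketched under the assumption $\rho = e^{-\sigma}$; the symbols $\phi_p$, $\theta_1$ and $\theta_2$ are introduced only for this illustration. The first line is the $p$-th trigonometric moment of a wrapped Cauchy variable with mean direction $\mu$ and concentration $\rho$, obtained from the Cauchy characteristic function at integer arguments. The second line multiplies the moments of two independent wrapped Cauchy variables, which shows that their sum is again wrapped Cauchy with mean direction $\mu_1 + \mu_2$ and concentration $\rho_1\rho_2$.

```latex
\phi_p = E\!\left[e^{ip\theta}\right] = e^{ip\mu - \sigma|p|} = \rho^{|p|} e^{ip\mu},
\qquad p = 0, \pm 1, \pm 2, \ldots

E\!\left[e^{ip(\theta_1+\theta_2)}\right]
  = \rho_1^{|p|} e^{ip\mu_1} \cdot \rho_2^{|p|} e^{ip\mu_2}
  = (\rho_1\rho_2)^{|p|} e^{ip(\mu_1+\mu_2)}
```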

3. Discordance Tests for Circular Samples

Suppose that we are given angles $\theta_1, \ldots, \theta_n$ that are observations in a random circular sample of size $n$ from a circular population. We consider four discordance tests based on the M, C, D, and A statistics to identify outliers in a univariate circular sample from the wrapped Cauchy (WC) distribution. Each test is carried out under the null hypothesis that the suspected observation is not an outlier.

3.1. M Statistic

The M statistic was proposed by  and is given in the following formulation, , where

$R$ is the resultant length, $R = \sqrt{C^{2} + S^{2}}$ with $C = \sum_{j=1}^{n} \cos\theta_j$ and $S = \sum_{j=1}^{n} \sin\theta_j$, and $R_{(-i)}$ is the resultant length obtained by excluding the ith observation. Reference  approximated the asymptotic distribution of the statistic for large values of the concentration parameter by a standard normal distribution after reformulation of the M statistic in terms of:

(3)

where.
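As a rough illustration of the quantities involved, the resultant length and its leave-one-out versions can be computed as follows. The ratio used in `m_statistic` is an assumption: it is the form $(R_{(-i)} - R + 1)/(n - R)$ often quoted in the circular-outlier literature, evaluated at the observation whose removal maximises $R_{(-i)}$, and it is not confirmed by the text of this section. All function names are hypothetical.

```python
import numpy as np

def leave_one_out_resultant(theta):
    """Return R (full sample) and the array of R_(-i), one per excluded observation."""
    theta = np.asarray(theta, dtype=float)
    C, S = np.cos(theta).sum(), np.sin(theta).sum()
    R = np.hypot(C, S)
    R_loo = np.hypot(C - np.cos(theta), S - np.sin(theta))
    return R, R_loo

def m_statistic(theta):
    """ASSUMED form: M = (R_(-i) - R + 1) / (n - R), with i maximising R_(-i)."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    R, R_loo = leave_one_out_resultant(theta)
    i = int(np.argmax(R_loo))                 # index of the most suspicious observation
    return (R_loo[i] - R + 1.0) / (n - R), i  # undefined when all angles coincide (n == R)

# Usage: five clustered angles and one roughly opposite angle (illustrative values).
m_value, suspect = m_statistic([0.1, 0.2, 0.15, 0.05, 0.12, 3.2])
```

Removing one unit vector changes the resultant length by at most one, so under this assumed form the value lies between 0 and 1.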

3.2. C Statistic

The C statistic was proposed by  and is given by

(4)

where $\bar{R}$ is the mean resultant length of the circular data set and $\bar{R}_{(-i)}$ is the mean resultant length obtained by excluding the ith observation.
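A minimal sketch of a statistic built from these two quantities is given below. The specific combination, the largest relative increase $(\bar{R}_{(-i)} - \bar{R})/\bar{R}$ over all observations, is an assumption based on the form commonly quoted for the C statistic; the function name is hypothetical.

```python
import numpy as np

def c_statistic(theta):
    """ASSUMED form: C = max_i (Rbar_(-i) - Rbar) / Rbar."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    C, S = np.cos(theta).sum(), np.sin(theta).sum()
    r_bar = np.hypot(C, S) / n                                         # mean resultant length
    r_bar_loo = np.hypot(C - np.cos(theta), S - np.sin(theta)) / (n - 1)
    gain = (r_bar_loo - r_bar) / r_bar                                 # relative increase per deletion
    i = int(np.argmax(gain))
    return float(gain[i]), i

# Usage: deleting the angle far from the cluster raises the mean resultant length the most.
c_value, suspect = c_statistic([0.1, 0.2, 0.15, 0.05, 0.12, 3.2])
```

A large value indicates that deleting a single observation makes the remaining sample markedly more concentrated.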

3.3. D Statistic

The D statistic was derived based on the relative arc lengths between the ordered observations of a circular sample $\theta_{(1)} \le \theta_{(2)} \le \cdots \le \theta_{(n)}$. Let $T_i$ be the arc length between consecutive observations, defined by $T_i = \theta_{(i+1)} - \theta_{(i)}$, $i = 1, \ldots, n-1$, and $T_n = 2\pi - \theta_{(n)} + \theta_{(1)}$. Define , and . Let  correspond to the greatest arc containing a single observation. The D statistic is two-tailed; therefore,  suggested the consideration of the minimum value of  and its inverse, where .
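The arc-based construction can be sketched as follows. The details are assumptions made for illustration: the arcs are the gaps between consecutive ordered angles (with the last arc closing the circle), the suspect observation is the one lying in the greatest arc that contains a single observation, and the statistic is the smaller of the ratio of its two flanking arcs and the inverse of that ratio. The sketch also assumes that there are no tied angles. The function name is hypothetical.

```python
import numpy as np

def d_statistic(theta):
    """ASSUMED form: D = min(D_k, 1/D_k), D_k = ratio of the arcs flanking the suspect angle."""
    theta = np.sort(np.mod(np.asarray(theta, dtype=float), 2.0 * np.pi))
    # arcs[i] is the arc from the i-th ordered angle to the next one (the last arc closes the circle)
    arcs = np.diff(np.append(theta, theta[0] + 2.0 * np.pi))
    prev_arcs = np.roll(arcs, 1)             # arc that precedes each ordered angle
    k = int(np.argmax(prev_arcs + arcs))     # angle inside the greatest single-observation arc
    d_k = arcs[k] / prev_arcs[k]             # assumes no ties, so both arcs are positive
    return min(d_k, 1.0 / d_k), theta[k]

# Usage: the isolated angle 3.2 has large arcs on both sides.
d_value, suspect_angle = d_statistic([0.1, 0.2, 0.15, 0.05, 0.12, 3.2])
```

Taking the minimum of the ratio and its inverse maps both tails onto the interval (0, 1], which is how the two-tailed nature of the statistic is handled in this sketch.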

3.4. A Statistic

Reference  defined the circular distance between two angles $\theta_i$ and $\theta_j$ as . Recently,  proposed a new test based on the summation of all circular distances from the point of interest to all other points, given in the form

(5)

Furthermore, the approximate distribution of the A statistic was discussed in  .
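A sketch of a statistic based on summed circular distances is shown below. Two choices are assumptions made for illustration: the circular distance is taken as $1 - \cos(\theta_i - \theta_j)$, and the largest sum is divided by $2(n-1)$, its maximum possible value, so that the result lies in [0, 1]. The function name is hypothetical.

```python
import numpy as np

def a_statistic(theta):
    """ASSUMED form: A = max_j sum_i (1 - cos(theta_i - theta_j)) / (2 (n - 1))."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    dist = 1.0 - np.cos(np.subtract.outer(theta, theta))  # pairwise circular distances, zero diagonal
    totals = dist.sum(axis=0)                             # total distance from each point to the others
    j = int(np.argmax(totals))
    return float(totals[j] / (2.0 * (n - 1))), j

# Usage: the angle opposite the cluster has the largest total distance.
a_value, suspect = a_statistic([0.1, 0.2, 0.15, 0.05, 0.12, 3.2])
```

For a highly concentrated sample without outliers, all pairwise distances are small, so the value stays close to zero.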

For the four tests of discordancy described above, cut-off points at the 10%, 5% and 1% upper percentiles have been obtained through simulation studies for samples generated from the von Mises distribution with various sample sizes and concentration parameters, and also for the wrapped normal distribution (see  ). The value of each statistic is compared with the associated cut-off point; if the value of the statistic is greater than the cut-off point, the null hypothesis is rejected and the observation is labelled as an outlier.

4. Cut-Off Points of the Discordance Tests

In this section, we obtain the cut-off points for the four test statistics based on simulation studies. The percentage points of the null distribution are estimated from outlier-free random circular samples generated from the wrapped Cauchy distribution with mean direction zero and concentration parameter $\rho$. We consider 12 values of the concentration parameter in the range of 0.1 to 0.999 and 20 different sample sizes from 5 to 150. For each generated random sample, the values of the four considered statistics M, C, D and A are calculated based on the formulas in Section 3.

For each combination of the sample size n and concentration parameter $\rho$, the process is repeated 3000 times to ensure the convergence of the desired percentiles (cut-off points). The obtained statistics are sorted in ascending order, and then the 10%, 5% and 1% upper percentiles of the outlier-free samples are obtained. Tables 1-4 present part of the cut-off points at the 5% percentile. The comprehensive cut-off points are available upon request from the authors.

Table 1. The 5th percentile cut-off points for the test based on the M statistic.

Table 2. The 5th percentile cut-off points for the test based on the C statistic.

Table 3. The 5th percentile cut-off points for the test based on the D statistic.

Table 4. The 5th percentile cut-off points for the test based on the A statistic.

From the obtained cut-off points we notice that:

Firstly, as one would expect, there is an inverse relationship between the cut-off points and the level of percentiles. Secondly, for the M statistic, an increase in the concentration parameter increases the cut-off points, while an increase in the sample size n decreases them. Thirdly, the cut-off points of the D statistic fluctuate slightly for , and are inversely related to either the sample size n or the concentration parameter for . Fourthly, for the C statistic, the cut-off points are a decreasing function of the concentration parameter, and there is an inverse relationship between the cut-off points and the sample size. Lastly, the cut-off points of the A statistic keep increasing as the concentration parameter increases up to , and then rapidly approach zero for . Furthermore, the effect of increasing the sample size depends on the concentration parameter as follows: 1) for a small concentration parameter the cut-off points decrease gradually; 2) for  the cut-off points are almost constant; 3) for a high concentration parameter the cut-off points increase gradually.
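The simulation procedure used to obtain the cut-off points can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, samples from the wrapped Cauchy distribution by wrapping a linear Cauchy variate with scale $-\ln\rho$, and uses an assumed form of the C statistic as the example statistic. All function names are hypothetical.

```python
import numpy as np

def sample_wrapped_cauchy(n, mu, rho, rng):
    """Wrap a linear Cauchy(mu, sigma) variate, sigma = -log(rho), around the circle."""
    return np.mod(mu - np.log(rho) * rng.standard_cauchy(n), 2.0 * np.pi)

def c_statistic(theta):
    """ASSUMED form of the C statistic: max_i (Rbar_(-i) - Rbar) / Rbar."""
    n = theta.size
    C, S = np.cos(theta).sum(), np.sin(theta).sum()
    r_bar = np.hypot(C, S) / n
    r_bar_loo = np.hypot(C - np.cos(theta), S - np.sin(theta)) / (n - 1)
    return np.max((r_bar_loo - r_bar) / r_bar)

def simulate_cutoffs(statistic, n, rho, reps=3000, levels=(0.10, 0.05, 0.01), seed=0):
    """Upper percentiles of a statistic over outlier-free WC(0, rho) samples of size n."""
    rng = np.random.default_rng(seed)
    values = np.array([statistic(sample_wrapped_cauchy(n, 0.0, rho, rng))
                       for _ in range(reps)])
    # the upper 5% point, for example, is the 0.95 quantile of the simulated null values
    return {level: float(np.quantile(values, 1.0 - level)) for level in levels}

# Usage: cut-off points for one (n, rho) combination.
cutoffs = simulate_cutoffs(c_statistic, n=20, rho=0.7, reps=500, seed=1)
```

Repeating the call over a grid of sample sizes and concentration parameters would produce a table of cut-off points with the same layout as Tables 1-4.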

5. Performance of the Discordance Tests

The power of performance of discordancy tests can be evaluated via several measures. References   stated that a good test of discordancy should have: 1) a high power function $P1 = 1 - \beta$, where $\beta$ is the probability of a Type-II error; 2) a high probability of identifying a contaminating value as an outlier when it is in fact an extreme value, where an extreme value is defined as a point with the maximum circular deviation, denoted by ; and 3) a low probability of wrongly identifying a good observation as discordant, where .

To study the performance of the four discordancy tests, we use 3000 samples for different sample sizes n and concentration parameters $\rho$. The samples are generated in such a way that $n-1$ of the observations come from  and the remaining one observation comes from , where $\lambda$ is the degree of contamination and . The M, C, D, and A statistics in each random sample are then calculated based on the corresponding equations given in Section 3. Furthermore, the values of the power of performance are obtained.
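The power study can be sketched along the following lines. Several details are assumptions made only for this illustration: the uncontaminated observations are drawn from a wrapped Cauchy distribution with mean direction zero, the single contaminant is drawn from a wrapped Cauchy distribution whose mean direction is shifted by $\lambda\pi$, the example statistic is an assumed form of the A statistic, and the power P1 is estimated as the proportion of contaminated samples whose statistic exceeds the simulated cut-off point. All function names are hypothetical.

```python
import numpy as np

def sample_wrapped_cauchy(n, mu, rho, rng):
    """Wrap a linear Cauchy(mu, sigma) variate, sigma = -log(rho), around the circle."""
    return np.mod(mu - np.log(rho) * rng.standard_cauchy(n), 2.0 * np.pi)

def a_statistic(theta):
    """ASSUMED form: max_j sum_i (1 - cos(theta_i - theta_j)) / (2 (n - 1))."""
    dist = 1.0 - np.cos(np.subtract.outer(theta, theta))
    return dist.sum(axis=0).max() / (2.0 * (theta.size - 1))

def contaminated_sample(n, rho, lam, rng):
    """n - 1 angles from WC(0, rho) plus one from WC(lam * pi, rho) (ASSUMED model)."""
    clean = sample_wrapped_cauchy(n - 1, 0.0, rho, rng)
    contaminant = sample_wrapped_cauchy(1, lam * np.pi, rho, rng)
    return np.concatenate([clean, contaminant])

def estimate_power(statistic, n, rho, lam, level=0.05, reps=3000, seed=0):
    """Estimate P1 as the share of contaminated samples exceeding the null cut-off."""
    rng = np.random.default_rng(seed)
    null = np.array([statistic(sample_wrapped_cauchy(n, 0.0, rho, rng))
                     for _ in range(reps)])
    cutoff = np.quantile(null, 1.0 - level)   # simulated cut-off point at the given level
    alt = np.array([statistic(contaminated_sample(n, rho, lam, rng))
                    for _ in range(reps)])
    return float(np.mean(alt > cutoff))

# Usage: with lam = 0 there is no contamination, so the estimate should be near the level.
p_null = estimate_power(a_statistic, n=20, rho=0.9, lam=0.0, reps=2000, seed=3)
p_full = estimate_power(a_statistic, n=20, rho=0.9, lam=1.0, reps=500, seed=4)
```

Checking the uncontaminated case first is a simple sanity test that the simulated cut-off points hold their nominal level.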

Figure 1 illustrates the behavior of power of performances of the tests for different cases. The main results can be summarized as follows:

Firstly, the performance of all statistics improves as the contamination level increases (Figure 1(a) and Figure 1(d)), and the tests perform better for  (Figure 1(c) and Figure 1(d)). The C and A statistics perform better than the other statistics for large contamination levels, while the M statistic is better for small contamination levels. Secondly, the power of performance is an increasing function of the concentration parameter (see Figure 1(a) and Figure 1(c)). Thirdly, for any sample size and for all considered discordancy tests at moderate or lower concentration parameters, the values of P1 are very low (less than 0.1) regardless of the contamination level (Figure 1(a) and Figure 1(b)). The weak performance for small concentration parameters is attributed to the heavy tails of the wrapped Cauchy distribution; similar trends are observed for P3 and P5. Lastly, the difference between P1 and P3 is generally very close to 0 in all cases.

6. Real Data Analysis

For illustration, two real data sets following the wrapped Cauchy distribution are analysed, and the proposed tests of discordancy are applied to them, as described in the following subsections.

6.1. The Ants’ Direction Data

Reference  randomly selected the directions chosen by 100 ants toward a black target when released in a round arena, as part of a study conducted by  . The wrapped Cauchy distribution has been shown to be the best-fitting distribution for the data  . The estimates of the location parameters, namely the circular mean and median, are 183˚ and 180˚, respectively; they are close to each other, which reflects the symmetry of the data distribution. Two measures of dispersion indicate that the data are moderately concentrated: the estimates of the mean resultant length and concentration parameter are 0.61 and 0.65, respectively.
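Descriptive statistics of this kind can be computed with a few lines of Python. The sketch below covers only the circular mean direction and the mean resultant length, using their standard definitions; it does not reproduce the circular median or the estimate of the concentration parameter, and the three angles in the usage line are made-up values rather than the ants' data. The function name is hypothetical.

```python
import numpy as np

def circular_summary(theta):
    """Return the mean direction (radians in [0, 2*pi)) and the mean resultant length."""
    theta = np.asarray(theta, dtype=float)
    c_bar, s_bar = np.cos(theta).mean(), np.sin(theta).mean()
    mean_direction = np.mod(np.arctan2(s_bar, c_bar), 2.0 * np.pi)
    mean_resultant_length = np.hypot(c_bar, s_bar)
    return mean_direction, mean_resultant_length

# Usage with made-up angles given in degrees.
direction, r_bar = circular_summary(np.deg2rad([170.0, 180.0, 190.0]))
direction_deg = np.rad2deg(direction)
```

A mean resultant length near one indicates tightly clustered directions, while a value near zero indicates widely dispersed directions.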

Table 5 gives the actual value of each test statistic and the corresponding cut-off points for  and , together with the decision. None of the test values exceeds the associated cut-off point; thus we may conclude that the ants' direction data set is free of outliers.

Figure 1. Relative performances of discordancy tests for wrapped Cauchy distribution.

Table 5. Results of discordancy tests on ants’ direction data.

6.2. Wind Data

The data consist of the wind directions at 6 a.m. and 12 noon, measured each day at the weather station in Milwaukee for 21 consecutive days. Reference  proposed a circular-circular regression model with errors following the wrapped Cauchy distribution. The regression curve is expressed in the form of a Möbius circle transformation. As an example,  used their model to regress the wind direction at 12 noon on that at 6 a.m. The maximum likelihood estimates of the parameters are  and . The circular errors obtained from the circular regression model consist of 21 observations measured in radians and are presented in Figure 2.

The circular mean and median of the circular errors are very close to zero, −0.04 and 0.031, respectively, and the estimates of the mean resultant length and concentration parameter are 0.552 and 0.773, respectively. Reference  considered observations number 5, 7, 12, 17 and 20 as outliers without using any discordance test, and stated that “Apart from five outliers, the proposed model seems to provide a satisfactory fit to the data”. We have applied the four discordancy tests M, C, D, and A to test whether the five suspected observations are outliers.

Table 6 presents the actual values of the discordancy test statistics, the corresponding cut-off points and the decisions, for  and . The results show that in the first iteration, the C statistic was able to detect observation number 5, with value 3.44, as an outlier, while the other tests failed to identify any point as an outlier.

In order to detect any other outliers, observation number 5 is excluded and the descriptive statistics are re-estimated: the mean of the circular errors is −0.015, which is closer to zero, and the estimates of the mean resultant length and concentration parameter are 0.62 and 0.8, respectively. Then, the four tests of discordancy are recomputed, as given in the second iteration in Table 6, for  at the 0.05 level of significance. The four tests of discordancy agree in identifying observation number 17 as a suspected outlying observation, but none of them identifies it as an outlier, since the test values are less than the corresponding cut-off points.

Figure 2. Circular plot of circular error of the wind data.

Table 6. Results of discordancy tests on wind data.

7. Conclusion

In this paper, four tests of discordancy, M, C, D and A, were extended to the wrapped Cauchy distribution; the cut-off points and the power of performance were investigated via an extensive simulation study. It was noticed that, for any sample size and for all considered discordancy tests at moderate or lower concentration parameters (), the power of performance is very low (less than 0.1) regardless of the contamination level λ, owing to the heavy-tailed character of the wrapped Cauchy distribution. Thus, it is recommended to propose various circular regression and functional relationship models with wrapped Cauchy errors, which are expected to be more robust to the existence of outliers. Moreover, the tests were applied to the ants' data set and the wind direction data set.

References

1. Jammalamadaka, S.R. and Sengupta, A. (2001) Topics in Circular Statistics. World Scientific Press, Singapore.
2. Collett, D. (1980) Outliers in Circular Data. Applied Statistics, 29, 50-57. http://dx.doi.org/10.2307/2346410
3. Hussin, A.G., Abuzaid, A., Zulkifili, F. and Mohamed, I. (2010) Asymptotic Covariance and Detection of Influential Observations in a Linear Functional Relationship Model for Circular Data with Application to the Measurements of Wind Directions. Science Asia, 36, 249-253. http://dx.doi.org/10.2306/scienceasia1513-1874.2010.36.249
4. Abuzaid, A.H., Mohamed, I.B. and Hussin, A.G. (2012) Boxplot for Circular Variables. Computational Statistics, 27, 381-392. http://dx.doi.org/10.1007/s00180-011-0261-5
5. Rambli, A., Mohamed, I., Hussin, A.G. and Ibrahim, S. (2012) On Discordance Test for the Wrapped Normal Data. Sains Malaysiana, 41, 769-778.
6. Ibrahim, S., Rambli, A., Hussin, A.G. and Mohamed, I. (2013) Outlier Detection in a Circular Regression Model Using COVRATIO Statistic. Communications in Statistics - Simulation and Computation, 42, 2272-2280. http://dx.doi.org/10.1080/03610918.2012.697239
7. Abuzaid, A.H., Mohamed, I.B. and Hussin, A.G. (2014) Procedures for Outlier Detection in Circular Time Series Models. Environmental and Ecological Statistics, 21, 793-809. http://dx.doi.org/10.1007/s10651-014-0281-8
8. Lévy, P. (1939) L’addition des variables aléatoires définies sur une circonférence. Bulletin de la Société Mathématique de France, 67, 1-41. http://www.numdam.org/item?id=BSMF_1939__67__1_0
9. McCullagh, P. (1996) Möbius Transformation and Cauchy Parameter Estimation. Annals of Statistics, 24, 787-808. http://dx.doi.org/10.1214/aos/1032894465
10. Fisher, N.I. (1993) Statistical Analysis of Circular Data. Cambridge University Press, London. http://dx.doi.org/10.1017/cbo9780511564345
11. Mardia, K.V. and Jupp, P.E. (2000) Directional Statistics. John Wiley & Sons, London.
12. Mardia, K.V. (1975) Statistics of Directional Data. Journal of the Royal Statistical Society, Series B, 37, 349-393.
13. Rao, J.S. (1969) Some Contributions to the Analysis of Circular Data. Ph.D. Thesis, Indian Statistical Institute, Calcutta.
14. Abuzaid, A.H., Mohamed, I.B. and Hussin, A.G. (2009) A New Test of Discordancy in Circular Data. Communications in Statistics - Simulation and Computation, 38, 682-691. http://dx.doi.org/10.1080/03610910802627048
15. Abuzaid, A.H., Hussin, A.G., Rambli, A. and Mohamed, I.B. (2012) Statistics for a New Test of Discordance in Circular Data. Communications in Statistics - Simulation and Computation, 41, 1882-1890. http://dx.doi.org/10.1080/03610918.2011.624239
16. David, H.A. (1970) Order Statistics. Wiley, New York and London.
17. Barnett, V. and Lewis, T. (1984) Outliers in Statistical Data. 2nd Edition, John Wiley & Sons, Chichester.
18. Jander, R. (1957) Die optische Richtungsorientierung der Roten Waldameise (Formica rufa L.). Zeitschrift für vergleichende Physiologie, 40, 162-238.
19. Ravindran, P. and Ghosh, S.K. (2012) Bayesian Analysis of Circular Data Using Wrapped Distributions. Journal of Statistical Theory and Practice, 5, 547-561. http://dx.doi.org/10.1080/15598608.2011.10483731
20. Kato, S., Shimizu, K. and Shieh, G.S. (2008) A Circular-Circular Regression Model. Statistica Sinica, 18, 633-643.