Journal of Applied Mathematics and Physics
Vol.04 No.08(2016), Article ID:69918,13 pages
10.4236/jamp.2016.48165
A New Approach for Dispersion Parameters
Ahmed Mohamed Mohamed El-Sayed
Department of Management Information Systems, High Institute for Specific Studies, Nazlet Al-Batran, Giza, Egypt

Copyright © 2016 by author and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/



Received 12 June 2016; accepted 19 August 2016; published 22 August 2016
ABSTRACT
This paper presents a new approach for identifying and estimating the dispersion parameters of bivariate, trivariate and multivariate correlated binary data, both as a scalar value and as a matrix of values. To this end, we review some recent studies on the impact of over-dispersion in univariate data analysis and compare the new approach with them. Following the property of McCullagh and Nelder [1] for identifying the dispersion parameter in the univariate case, we extend this property to the analysis of correlated binary data in higher dimensions. Finally, we use these estimates to modify the correlated binary data so as to reduce its over-dispersion, using the Hunua Ranges data as an ecological example.
Keywords:
Measures of Association, Correlated Binary Data, Dispersion Parameters, Scaled Deviance, Scalar Value, Scalar Matrix

1. Introduction
The dispersion parameter should be unity for univariate Bernoulli data, but it may deviate from unity when a study includes a sequence of Bernoulli outcomes that leads to a binomial variable. Over-dispersion occurs when the variance of the actual response exceeds the nominal variance, $V(\mu)$, as a function of the mean, $\mu$. In the univariate case, the dispersion parameter can be estimated easily using Pearson's chi-square or the deviance function. Many studies have addressed over-dispersion in the univariate case, namely when binomial data are used. These methods are difficult to extend to the bivariate case, because the correlated response variables may be associated, and this association must be taken into account when estimating the dispersion parameter. In the independence case, however, the dispersion parameter is estimated as in the univariate case. The dispersion parameters for bivariate correlated binary data can be estimated in two ways: first, when the dispersion parameter is a scalar; second, when there is a matrix of dispersion parameters. These estimates can be extended to trivariate and multivariate correlated binary data. We therefore present a new approach for identifying and estimating the dispersion parameters, as scalar and matrix values, for bivariate, trivariate and multivariate correlated binary data. After obtaining these estimates, we can also modify the correlated binary data so that the dispersion parameter is equal or close to unity.
This paper is organized as follows. Some previous studies are presented in Section 2.
The proposed approach for identifying and estimating the dispersion parameters as scalar and matrix values, and the impact of over-dispersion for bivariate, trivariate and multivariate binary outcomes associated with covariates, are presented in Sections 3, 4 and 5, respectively.
Finally, numerical examples for the vector generalized additive model (VGAM), or vector generalized linear model (VGLM), of Yee and Wild [2], and for the alternative quadratic exponential form (AQEF) measure of El-Sayed et al. [3], are given in Section 6.
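The univariate estimate mentioned in the Introduction can be sketched as follows. This is a minimal illustration, assuming grouped binomial counts and the usual Pearson estimate (the Pearson chi-square divided by the residual degrees of freedom); the function name `pearson_dispersion` and the counts are hypothetical and are not taken from the paper.

```python
def pearson_dispersion(successes, trials, fitted_probs, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi_square = 0.0
    for y, m, mu in zip(successes, trials, fitted_probs):
        nominal_variance = m * mu * (1.0 - mu)  # binomial variance m*mu*(1-mu)
        chi_square += (y - m * mu) ** 2 / nominal_variance
    return chi_square / (len(successes) - n_params)


# Made-up grouped counts, for illustration only.
successes = [2, 8, 1, 9]
trials = [10, 10, 10, 10]

# Intercept-only fit: one common success probability for all groups.
pooled = sum(successes) / sum(trials)
phi_hat = pearson_dispersion(successes, trials, [pooled] * 4, n_params=1)
print(round(phi_hat, 3))
```

An estimate well above unity, as in this toy example, would indicate over-dispersion relative to the binomial variance.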
2. Previous Studies
In this section, we present some studies on the over-dispersion problem as shown below:
(1) Smith and Heitjan [4] provided a statistical tool for detecting extra-binomial variation (over-dispersion). To test the nominal dispersion in the i-th (
) margin, it is important to give the relation, for
trials,
(1)
The hypothesis testing problem is formulated as

An appropriate procedure to test
is the score statistic suggested by Smith and Heitjan
(2)
where
is a random vector that registers the difference between actual information and nominal information, in the i-th margin with respect to every j-th (
) parameter, for
observations, namely
(3)
And
is the covariance matrix of
corrected for estimation of linear predictors,
, where



with p degrees of freedom. The eventual rejection of 

(2) Cook and Ng [5] described a bivariate logistic-normal mixture model for over-dispersed two-state Markov processes. Using these mixed models increases the standard errors of the marginal probability estimates. They did not give an explicit form for the over-dispersion estimate, but displayed the log-likelihood function for the full sample of m subjects as

where, the expectation, 


(3) Saefuddin et al. [6] showed the effect of over-dispersion on hypothesis tests in logistic regression.
A simple method proposed by Williams [7] was used to correct for the effect of over-dispersion by taking an inflation factor into account. This method adjusts the estimated standard errors of the parameters for the over-dispersion. Over-dispersion is often modelled through the variance of the response variable,


where 











where





The algorithm of the Williams method is as follows:
1. Assume


2. Compare 





3. Using the initial weights

we can recalculate the value of 

4. If 




If 




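The quantities that drive the steps above can be sketched in Python. The displayed equations did not survive in this copy, so the sketch assumes the form in which the Williams model is usually written: an inflated binomial variance m·π·(1 − π)·[1 + (m − 1)·φ] and weights 1 / [1 + (m − 1)·φ]. The function names are hypothetical, and the iteration over φ itself is not reproduced.

```python
def williams_variance(trials, prob, phi):
    """Binomial variance inflated by the factor 1 + (trials - 1) * phi (assumed form)."""
    return trials * prob * (1.0 - prob) * (1.0 + (trials - 1) * phi)


def williams_weight(trials, phi):
    """Weight that undoes the inflation factor in a weighted logistic fit (assumed form)."""
    return 1.0 / (1.0 + (trials - 1) * phi)


# With phi = 0 the nominal binomial variance is recovered.
print(williams_variance(10, 0.5, 0.0))
# With phi > 0 the variance grows and the weight drops below one.
print(williams_variance(10, 0.5, 0.1), williams_weight(10, 0.1))
```

Under this assumed form, a single Bernoulli trial (m = 1) is never inflated, which is consistent with the remark in the Introduction that the dispersion parameter should be unity for Bernoulli data.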
(4) Davila et al. [9] introduced a new approach for modeling multivariate data with over-dispersed binomial marginals. They illustrated this approach by analyzing data using a Gaussian copula with Beta-binomial margins. To model the over-dispersion, they used the Beta-binomial model, a generalization of the binomial distribution (Casella and Berger [10]). In this model, it is supposed that




where,


The conditional variance is

From the relation (12), we see that the marginal dispersion parameter is

Comparing relation (1) with relation (12), we note that the latter has the greater variance. Compared with the multivariate normal (MVN) model, the marginal GLM and the marginal over-dispersion model (ODM), their study showed that the Beta-binomial model (BBM) gave higher standard errors for the estimated parameters.
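The extra variance of the Beta-binomial model can be checked numerically. The sketch below assumes the standard parameterization, in which the success probability is drawn from a Beta(a, b) distribution, so that the variance is m·π·(1 − π)·[1 + (m − 1)·ρ] with π = a/(a + b) and ρ = 1/(a + b + 1). The symbols a, b and ρ and the chosen values are assumptions for illustration and need not match the notation of Davila et al. [9].

```python
import random
import statistics


def beta_binomial_variance(trials, a, b):
    """Closed-form Beta-binomial variance under the standard parameterization."""
    prob = a / (a + b)
    rho = 1.0 / (a + b + 1.0)
    return trials * prob * (1.0 - prob) * (1.0 + (trials - 1) * rho)


def simulate_beta_binomial(trials, a, b, n_draws, seed=0):
    """Draw counts by first sampling p from Beta(a, b), then summing Bernoulli(p) trials."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p = rng.betavariate(a, b)
        draws.append(sum(rng.random() < p for _ in range(trials)))
    return draws


trials, a, b = 20, 2.0, 3.0
prob = a / (a + b)
binomial_var = trials * prob * (1.0 - prob)              # nominal binomial variance
theoretical_var = beta_binomial_variance(trials, a, b)   # inflated variance
empirical_var = statistics.pvariance(simulate_beta_binomial(trials, a, b, 50_000))
print(binomial_var, theoretical_var, empirical_var)
```

The simulated variance should sit close to the closed-form value and well above the nominal binomial variance, which is the sense in which the Beta-binomial model is over-dispersed.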
(5) The vector generalized additive model (VGAM) was introduced by Yee and Wild [2] and implemented by Yee [11] [12]. The conditional distribution of the VGAM function for bivariate correlated binary responses,

where, 
And the

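To make the bivariate case concrete, the following Python sketch evaluates a log-linear distribution for two binary responses. The paper's own display is missing from this copy, so the form P(y1, y2) ∝ exp(u1·y1 + u2·y2 + u12·y1·y2) is an assumption, chosen because it matches the log-linear family for bivariate binary responses in the VGAM package; the names u1, u2 and u12 are hypothetical.

```python
import math


def loglinear_bivariate_pmf(u1, u2, u12):
    """Joint probabilities of (y1, y2) under the assumed log-linear form."""
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    weights = [math.exp(u1 * y1 + u2 * y2 + u12 * y1 * y2) for y1, y2 in cells]
    total = sum(weights)  # normalizing constant
    return {cell: w / total for cell, w in zip(cells, weights)}


pmf = loglinear_bivariate_pmf(u1=-0.5, u2=0.3, u12=1.2)

# The association parameter u12 is the log odds ratio of the 2x2 table.
odds_ratio = (pmf[(1, 1)] * pmf[(0, 0)]) / (pmf[(1, 0)] * pmf[(0, 1)])
print(sum(pmf.values()), odds_ratio, math.exp(1.2))
```

In this form the two responses are independent exactly when u12 = 0, which is the independence case in which the dispersion parameter is estimated as in the univariate setting.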
The conditional distribution of VGAM family function for trivariate binary responses, 

Note that a third order association parameter, 

The conditional distribution of VGAM (VGLM) function for multivariate correlated binary responses, 

where 
In the next section, we suggest a new approach to estimate the dispersion parameter, 
Using the following notation for the link functions, which enables us to use the regression model:

we have the log-likelihood function for the bivariate AQEF measure as

The log-likelihood function for the trivariate AQEF measure is

where,
Finally, the log-likelihood function for the multivariate AQEF measure is

where,

3. Dispersion Parameters in Bivariate Case
In this section, we determine the identification and estimation of a fixed value for dispersion parameter, 
3.1. Scalar Dispersion Parameter
We can use the variance-covariance matrix of 




Following the GLM property, the variance-covariance matrix of Y is
where,
And,
Then, the estimator of

Hence, we can show that

Then,
follows the non-central



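The estimator of this subsection is not legible in this copy, so the following Python sketch should not be read as the paper's formula. It shows one common moment-based analogue of the univariate Pearson estimate for a bivariate binary response: the sum of the quadratic forms r'V⁻¹r, where r is the residual vector and V the nominal 2×2 covariance matrix, divided by 2n minus the number of fitted parameters. The function name, the argument `joint_probs` (the fitted probability that both responses equal one) and the toy data are assumptions.

```python
def scalar_dispersion_bivariate(responses, means, joint_probs, n_params=0):
    """Generalized Pearson statistic divided by (2n - n_params); illustrative only."""
    total = 0.0
    for (y1, y2), (mu1, mu2), p11 in zip(responses, means, joint_probs):
        v11 = mu1 * (1.0 - mu1)        # nominal variance of the first response
        v22 = mu2 * (1.0 - mu2)        # nominal variance of the second response
        v12 = p11 - mu1 * mu2          # nominal covariance from the joint probability
        det = v11 * v22 - v12 ** 2
        r1, r2 = y1 - mu1, y2 - mu2
        # Quadratic form r' V^{-1} r written out for the 2x2 case.
        total += (v22 * r1 ** 2 - 2.0 * v12 * r1 * r2 + v11 * r2 ** 2) / det
    return total / (2 * len(responses) - n_params)


# Toy data: each of the four outcome patterns once, independent responses.
responses = [(1, 0), (0, 1), (1, 1), (0, 0)]
means = [(0.5, 0.5)] * 4
joint_probs = [0.25] * 4
phi_hat = scalar_dispersion_bivariate(responses, means, joint_probs)
print(phi_hat)
```

With the covariance term included, the association between the two responses enters the estimate, in line with the requirement stated in the Introduction.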
3.2. Matrix of Dispersion Parameters
Now, we use different values for dispersion parameter, such that 



The estimator of dispersion parameters matrix is
Then,

From the equation (26), we have
follows the non-central




We can correct the data using the estimates of dispersion parameters, 

4. Dispersion Parameters in Trivariate Case
We can define the response vector


4.1. Scalar Dispersion Parameter
The variance-covariance matrix of Y can be written as

where,
The estimator of

Since,
follows the non-central



4.2. Matrix of Dispersion Parameters
The variance-covariance matrix of Y can be displayed as

The estimator of dispersion parameters, 

Since,
follows the non-central




Similarly, we can correct the data using the estimates of dispersion parameters, 


5. Dispersion Parameters in Multivariate Case
We can define the response vector


5.1. Scalar Dispersion Parameter
The variance-covariance matrix of Y can be written as

where,
The estimator of


Since,
follows the non-central



5.2. Matrix of Dispersion Parameters
The variance-covariance matrix of Y can be displayed as

The estimator of dispersion parameters, 

Since,
follows the non-central







6. Numerical Examples
In this section, we present two examples. The first applies to bivariate correlated binary data and presents the results obtained using the AQEF measure and the VGLM measure, which are similar in the bivariate case. The second applies to trivariate binary data; here, however, the third-order association is absent from the VGAM (VGLM) measure. In both examples, we use the Hunua Ranges data, Yee [11] [12]. These data were collected in the Hunua Ranges, a small forest in southern Auckland, New Zealand.
At 392 sites in the forest, the presence or absence of 17 plant species was recorded along with the altitude. Each site had an area of 200 m². The Hunua Ranges data frame has 392 rows and 18 columns. Altitude is a continuous variable, and there are binary responses (presence = 1, absence = 0) for the 17 plant species. The data frame contains the following columns: agaaus, beitaw, corlae, cyadea, cyamed, daccup, dacdac, eladen, hedarb, hohpop, kniexc, kuneri, lepsco, metrob, neslan, rhosap, vitluc and altitude (meters above sea level).
6.1. Application to Bivariate Case
Hence, we will use the first two columns, agaaus and beitaw, as correlated binary outcome variables, 

We will use the estimates, 

From Table 1 and Table 2, we draw the following conclusions after modifying the correlated data by the estimates of the dispersion parameters:
1. The estimates of the regression parameters change.
2. The standard errors of the estimates of the association parameters decrease. This leads to a significant association between the two binary outcome variables,
3. The Wald test statistic shows lower values, which confirms a significant association between the two binary outcome variables,
4. The LRT increases, which also confirms the conclusion drawn from the Wald statistic.
5. The estimate of a scalar dispersion parameter, 
6. The estimates of the matrix of dispersion parameters, 

7. The scaled deviance value is increased.
6.2. Application to Trivariate Case
We will use the columns, cyadea, beitaw and kniexc, as the dependent correlated binary variables, 



Table 1. Results of AQEF and VGLM before modifying the data.
Hence, the LRTs will be compared with
Table 2. Results of AQEF and VGLM after modifying data.
Hence, the LRTs will be compared with
Table 3. Results before and after modifying data.
Hence, the LRTs will be compared with
From Table 3, we draw the following conclusions after modifying the data by the estimates of the dispersion parameters:
1. The estimates of the regression parameters change for both measures.
2. The scaled deviance increases for both measures.
3. The estimate of a scalar dispersion parameter, 
4. The estimates of values of dispersion parameters, 





5. For the VGLM measure, the LRTs reflect significant association between the pairwise outcome variables, 


For the AQEF measure, the LRTs also reflect significant association between the pairwise outcome variables, 

However, no significant association is observed between the correlated binary outcome variables, 
6. The LRT for the third association, which is observed from the AQEF measure, reflects no significant association between the correlated binary outcome variables, 
So, when modifying the correlated data, the estimates of dispersion parameters, 




Acknowledgements
For all my professors.
Cite this paper
Ahmed Mohamed Mohamed El-Sayed, (2016) A New Approach for Dispersion Parameters. Journal of Applied Mathematics and Physics,04,1554-1566. doi: 10.4236/jamp.2016.48165
References
- 1. McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd Edition, Chapman and Hall, London. http://dx.doi.org/10.1007/978-1-4899-3242-6
- 2. Yee, T.W. and Wild, C.J. (1996) Vector Generalized Additive Models. Journal of the Royal Statistical Society, Series B (Methodological), 58, 481-493.
- 3. El-Sayed, A.M.M., Islam, M.A. and Alzaid, A.A. (2013) Estimation and Test of Measures of Association for Correlated Binary Data. Bulletin of the Malaysian Mathematical Sciences Society 2, 36, 985-1008.
- 4. Smith, P. and Heitjan, F. (1993) Testing and Adjusting for Departures from Nominal Dispersion in Generalized Linear Models. Applied Statistics, 42, 31-34. http://dx.doi.org/10.2307/2347407
- 5. Cook, R.J. and Ng, E.T.M. (1997) A Logistic-Bivariate Normal Model for Over-Dispersed Two-State Markov Process. Biometrics, 53, 358-364. http://dx.doi.org/10.2307/2533121
- 6. Saefuddin, A., Setiabudi, N.A. and Achsani, N.A. (2011) The Effect of Over-Dispersion on Regression Based Decision with Application to Churn Analysis on Indonesian Mobile Phone Industry. European Journal of Scientific Research, 60, 584-592.
- 7. Williams, D.A. (1982) Extra-Binomial Variation in Logistic Linear Models. Applied Statistics, 31, 144-148. http://dx.doi.org/10.2307/2347977
- 8. Collett, D. (2003) Modeling Binary Data. 2nd Edition, Chapman and Hall, London.
- 9. Davila, E., Lopez, L.A. and Dias, L.G. (2012) A Statistical Model for Analyzing Interdependent Complex of Plant Pathogens. Revista Colombiana de Estadistica Numero especial en Bioestadistica, 35, 255-270.
- 10. Casella, G. and Berger, R. (2002) Statistical Inference. 2nd Edition, Duxbury Press, Florida.
- 11. Yee, T.W. (2008) The VGAM Package. R News, 8, 28-39.
- 12. Yee, T.W. (2010) The VGAM Package for Categorical Data Analysis. Journal of Statistical Software, 32, 1-34. http://dx.doi.org/10.18637/jss.v032.i10
















