Effect sizes are estimated from several study designs when the subjects are individually sampled. When the samples are the aggregate cluster of individuals, the within cluster correlation must be accounted for to construct correct confidence intervals, and to conduct valid statistical inference. The purpose of this article is to propose and evaluate statistical procedures for the estimation of the variance of the estimated attributable risk in parallel groups of clusters, and in a design dividing each of k clusters into two segments creating multiple sub-clusters. The estimated variance is the first order approximation and is obtained by the delta method. We apply the methodology and propose a Wald type confidence interval on the difference between two correlated attributable risks. We also construct a test on the hypothesis of equality of two correlated attributable risks. We evaluate the power of the proposed test via Monte-Carlo simulations.
In epidemiological research, it is important that the collected data are translated into interpretable results that can be easily communicated to clinicians. The need for “translatable” evidence from research studies is of prime importance in the evaluation of clinical interventions, because such evidence has the potential to immediately influence the course of patient treatment. When evaluating these studies, the effect size (ES) can be a useful measure of the comparative efficacy of the treatment under investigation. In randomized clinical trials, an effect size estimate quantifies the direction and magnitude of the effect of an intervention.
When exposure and disease risk are measured on a binary scale, several measures of effect size are in current use [
The concept of AR was introduced in [
The concept of AR and its statistical characteristics have been reviewed in [
In this paper, we obtain the variance of the estimated AR under cluster sampling, focusing on cohort and cross-sectional designs. In Section 2, we construct an AR estimator, and in Section 3, we derive its large sample variance adjusted for the intracluster correlation (ICC). In Section 4, we consider the split cluster design, and describe situations where we compare two correlated AR parameters. In Section 5, we conduct a Monte-Carlo experiment to evaluate the empirical power of Wald’s test on the null hypothesis of equality of two correlated attributable risk parameters. At the end of each section, we provide an example.
We start with a parallel group design where k clusters are exposed to a specified risk factor, and l clusters are not exposed, as in the data layout given in
In
Exposed | Non-Exposed | ||||||
---|---|---|---|---|---|---|---|
1 | 2 | 1 | 2 | ||||
absence of exposure. In the unexposed clusters, let $y_{rs}$ ($r = 1, 2, \ldots, l$; $s = 1, 2, \ldots, m_r$), with $y_{rs} = 1$ and $0$ denoting positive and negative responses, and let $Q_r = \Pr[y_{rs} = 1 \mid \text{unexposed cluster } r]$. Furthermore, let $X_i = \sum_{j=1}^{n_i} x_{ij}$ and $Y_r = \sum_{s=1}^{m_r} y_{rs}$ denote, respectively, the total number of events in the exposed and non-exposed clusters, provided that the misclassification error is zero. Therefore, conditional on $\pi_i$, $X_i$ has a binomial distribution with parameters $(n_i, \pi_i)$. Similarly, conditional on $Q_r$, $Y_r$ has a binomial distribution with parameters $(m_r, Q_r)$. To introduce a within-cluster correlation, we assume that $\pi_i$ follows a beta distribution $B(a, b)$ with probability density function (pdf) given in (1):
$$f(\pi_i \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \pi_i^{a-1} (1-\pi_i)^{b-1} \tag{1}$$
and that $Q_r$ follows a similar beta distribution, denoted $B(\alpha, \beta)$. The effect of the intracluster correlation among the responses may be accounted for as follows.
Under the transformations $P = a/(a+b)$ and $\rho_1 = (1+a+b)^{-1}$, the mean and variance of $\pi_i$ are given respectively by $E(\pi_i) = P$ and $\operatorname{Var}(\pi_i) = P(1-P)\rho_1$.
Similarly, let $Q = \alpha/(\alpha+\beta)$ and $\rho_2 = (1+\alpha+\beta)^{-1}$. Then $E(Q_r) = Q$ and $\operatorname{Var}(Q_r) = Q(1-Q)\rho_2$. Consequently, the unconditional distribution of $X_i$ is beta-binomial with $E(X_i) = n_i P$ and $\operatorname{Var}(X_i) = n_i P(1-P)[1+(n_i-1)\rho_1]$. Similarly, $E(Y_r) = m_r Q$ and
$\operatorname{Var}(Y_r) = m_r Q(1-Q)[1+(m_r-1)\rho_2]$.
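The mapping between the beta shape parameters and the pair (mean, intracluster correlation) can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical and are not part of the original paper.

```python
def beta_to_mean_icc(a, b):
    """Map beta shape parameters (a, b) to (P, rho) as in the text:
    P = a / (a + b) and rho = 1 / (1 + a + b)."""
    return a / (a + b), 1.0 / (1.0 + a + b)


def mean_icc_to_beta(P, rho):
    """Inverse map: recover (a, b) from the mean P and correlation rho.
    Requires 0 < P < 1 and 0 < rho < 1."""
    total = (1.0 - rho) / rho  # this is a + b
    return P * total, (1.0 - P) * total


def beta_binomial_moments(n, P, rho):
    """Unconditional mean and variance of the number of events in a
    cluster of size n under the beta-binomial model."""
    mean = n * P
    var = n * P * (1.0 - P) * (1.0 + (n - 1) * rho)
    return mean, var
```

For instance, with $a = b = 2$ the sketch gives $P = 0.5$ and $\rho = 0.2$, so a cluster of size 5 has variance $1.25 \times 1.8 = 2.25$ instead of the binomial value 1.25.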
It should be noted that the beta distribution assumptions imposed on the model parameters are not necessary; one may instead adopt a quasi-likelihood set-up by specifying the first two moments of $\pi_i$ and $Q_r$, as shown in [
The parameters $\rho_1$ and $\rho_2$ are interpreted, respectively, as the within-cluster correlations among all pairs of scores in the exposed and unexposed groups. We may obtain consistent estimators of $\operatorname{Var}(X_i)$ and $\operatorname{Var}(Y_r)$ on replacing the parameters $P$, $Q$, $\rho_1$, and $\rho_2$ with appropriate estimators from the data, as will be shown in the next sections. We shall now construct unbiased point estimators for the parameters $P$ and $Q$.
From [
$\operatorname{Var}(Y) = MQ(1-Q)[1+(m_0-1)\rho_2]$, where $N = \sum_{i=1}^{k} n_i$, $M = \sum_{r=1}^{l} m_r$,
$n_0 = \sum_{i=1}^{k} n_i^2/N$, and $m_0 = \sum_{r=1}^{l} m_r^2/M$. Clearly, $X/N$ and $Y/M$ are unbiased point estimators of $P$ and $Q$, respectively.
The data under the above set-up can then be summarized in a 2 × 2 table, as shown in
Formally, the AR is defined in [
$$AR = \frac{P(D) - P(D \mid \bar{E})}{P(D)} \tag{2}$$
where $P(D)$ is the percentage of disease in the population, and $P(D \mid \bar{E})$ is the percentage of disease in the population in the absence of exposure to the risk factor. Levin [
Using Bayes theorem, and from [ [
$$AR = \frac{P(E)(RR-1)}{1 + P(E)(RR-1)}. \tag{3}$$
Here, $RR$ is the relative risk (risk ratio), and $P(E)$ is the probability of exposure. The $RR$ is defined by $RR = P(D \mid E)/P(D \mid \bar{E})$. In terms of population parameters, the AR as defined in (3) is equivalent to:
$$AR = \frac{P(1-Q) - Q(1-P)}{P+Q} = \frac{P-Q}{P+Q}. \tag{4}$$
Under the transformation $\Psi = (1-AR)/(1+AR)$, we get $Q = \Psi P$. We shall use this transformation to facilitate the derivation of the large sample variance of AR.
The sample estimator of AR is obtained using the data in a 2 × 2 cross-classification, as given in
Epidemiologists use this statistic quite frequently to assess the consequences of an association between a binary outcome of interest ( D ) and exposure to a risk factor ( E ) . The total number of observations in the non-exposed and the exposed groups are given respectively by M and N , assumed fixed.
For a cross sectional or cohort study designs the A R estimator is from [
Response | ||||
---|---|---|---|---|
Total | ||||
Exposure | ||||
Total |
given by:
$$\widehat{AR} = \frac{X(M-Y) - Y(N-X)}{(X+Y)\,M}. \tag{5}$$
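As a quick illustration, the estimator in (5) can be coded directly from the four counts of the 2 × 2 table. The function names below are hypothetical, and the log transform $\hat\theta = \ln(1 - \widehat{AR})$ follows the definition given later in the paper.

```python
import math


def ar_hat(X, N, Y, M):
    """Attributable risk estimator of Equation (5).

    X, N: events and total observations in the exposed group.
    Y, M: events and total observations in the non-exposed group.
    """
    if X + Y == 0 or M == 0:
        raise ValueError("AR is undefined when there are no events or M = 0")
    return (X * (M - Y) - Y * (N - X)) / ((X + Y) * M)


def theta_hat(X, N, Y, M):
    """Log transform theta = ln(1 - AR) used for variance calculations."""
    return math.log(1.0 - ar_hat(X, N, Y, M))
```

With the counts reported in the paper's first two examples, `ar_hat(53, 152, 30, 96)` is about 0.066 and `ar_hat(35, 147, 16, 158)` is about 0.394, in line with the 6.6% and 39.4% quoted in the text.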
Following [
We first write $\operatorname{Var}(X) = NP(1-P)c_1$ and $\operatorname{Var}(Y) = MQ(1-Q)c_2$,
where $c_1 = 1 + (n_0-1)\rho_1$, $c_2 = 1 + (m_0-1)\rho_2$, $n_0 = \sum_{i=1}^{k} n_i^2/N$, and $m_0 = \sum_{i=1}^{l} m_i^2/M$.
Using the delta method [
$$\operatorname{Var}(\hat\theta) = \frac{N(1-P)c_1}{P(N+M\Psi)^2} + \frac{N^2(1-P\Psi)c_2}{MP\Psi(N+M\Psi)^2}. \tag{6}$$
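A consistent estimator of $\operatorname{Var}(\hat\theta)$ may be obtained on replacing the parameters $P$, $c_1$, $c_2$ and $\Psi$ by their moment estimators. With $\hat\theta = \ln(1-\widehat{AR})$, a $(1-\alpha)100\%$ confidence interval on AR is thus given as:
$$\left(1 - \exp\!\left(\hat\theta + z_{\alpha/2}\sqrt{\operatorname{var}(\hat\theta)}\right),\; 1 - \exp\!\left(\hat\theta - z_{\alpha/2}\sqrt{\operatorname{var}(\hat\theta)}\right)\right).$$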
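The variance expression (6) and the back-transformed interval can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the caller is assumed to supply plug-in estimates of $P$, $\Psi$, $c_1$ and $c_2$.

```python
import math
from statistics import NormalDist


def var_theta(N, M, P, psi, c1, c2):
    """Large-sample variance of theta-hat as written in Equation (6)."""
    denom = (N + M * psi) ** 2
    term_exposed = N * (1.0 - P) * c1 / (P * denom)
    term_unexposed = N**2 * (1.0 - P * psi) * c2 / (M * P * psi * denom)
    return term_exposed + term_unexposed


def ar_confidence_interval(ar, se_theta, level=0.95):
    """Wald interval on AR, built on theta = ln(1 - AR) and mapped back."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    theta = math.log(1.0 - ar)
    lower = 1.0 - math.exp(theta + z * se_theta)
    upper = 1.0 - math.exp(theta - z * se_theta)
    return lower, upper
```

Feeding in the point estimate and standard error quoted in Example 1 (AR of about 0.066 and a standard error of 0.085) returns an interval of roughly (−0.10, 0.21), close to the one reported there.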
The moment estimators of the intraclass correlations are obtained separately from the groups of exposed and unexposed clusters. The moment estimator of $\rho_1$ is given by:
$$\hat\rho_1 = \frac{MSB - MSW}{MSB + (n_0-1)\,MSW} \tag{7}$$
where
$$MSW = \frac{1}{n-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} \frac{x_{ij}(n_{ij}-x_{ij})}{n_{ij}} \tag{8}$$
$$MSB = \frac{1}{k-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i} \frac{(x_{ij}-x_i)^2}{n_{ij}}. \tag{9}$$
Similar expressions for the (MSW, MSB) are obtained for the clusters of unexposed. The quantities (MSW, MSB) are estimated from the one-way ANOVA model when the responses are measured on the binary scale. For details the readers are referred to [
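For readers who want to experiment, a commonly used one-way ANOVA estimator of the intracluster correlation for binary data can be sketched as below. This sketch makes assumptions that go beyond the text: it works with one count of events per cluster, and it uses the usual ANOVA adjusted mean cluster size for $n_0$, which equals the common cluster size when clusters are balanced but may differ from the $n_0$ defined in this paper when they are not. The function name is hypothetical.

```python
def anova_icc(successes, sizes):
    """One-way ANOVA estimate of the intracluster correlation for
    clustered binary data (assumed standard form, see lead-in).

    successes[i] is the number of events in cluster i, sizes[i] its size.
    Raises ZeroDivisionError if both mean squares are zero.
    """
    k = len(sizes)
    N = sum(sizes)
    total = sum(successes)
    # Within-cluster mean square.
    msw = sum(x * (n - x) / n for x, n in zip(successes, sizes)) / (N - k)
    # Between-cluster mean square.
    msb = (sum(x * x / n for x, n in zip(successes, sizes)) - total**2 / N) / (k - 1)
    # ANOVA adjusted mean cluster size (assumption, see lead-in).
    n0 = (N - sum(n * n for n in sizes) / N) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1.0) * msw)
```

Two extreme cases help check the behaviour: clusters of size 2 whose members always agree give an estimate of 1, while clusters that always split one event to one non-event give −1.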
We now consider two examples: the first uses data from a cross-sectional study, and the second uses data from a randomized prospective trial.
Example 1: Cross-Sectional Study: The effect of consanguinity on congenital heart defects (CHD).
The Saudi Arabian CHD registry [
The participating hospitals are from regions that cover the country, making the registry a nationwide data repository for the Kingdom of Saudi Arabia [Congenital Heart Disease Registry 2013]. The present example uses data on a major congenital heart disease, patent ductus arteriosus (PDA). The incidence of PDA has been reported to be approximately 1 in 2000 births, which accounts for 5% to 10% of all congenital heart diseases, with a female-to-male ratio of almost 2:1 [
Consanguineous marriages are common in Arab countries, with first-cousin marriages being the most common type. For example, in Jordan the prevalence of consanguinity was reported in [
To illustrate the methodology presented in this section, we sampled two children from the registry whose mothers are non-diabetic, with maternal age less than 40 years. Each sampled child was classified according to the presence/absence of PDA and the type of parental consanguinity (exposure variable). Therefore, for the exposed (children from consanguineous marriages restricted to first-degree cousins) and non-exposed (children from non-consanguineous marriages), the cluster size is $n = m = 2$. The data are presented in
Direct application of Equations (5), (8), (9), and (10) gives:
$$AR = \frac{(53)(66) - (30)(99)}{(83)(96)} = 6.6\%, \quad P(E) = 0.61,$$
$\rho_1 = 0.325$, $\rho_2 = 0.332$, $P = \Pr(D \mid \text{Consanguineous}) = 0.349$,
$Q = \Pr(D \mid \text{Non-Consanguineous}) = 0.313$, and $RR = 0.349/0.313 = 1.11$.
The square root of Equation (6) gives s e ( θ ^ ) = 0.085 , and the 95% confidence interval of AR is: −0.104 < AR < 0.210.
The AR estimate is interpreted as follows: among infants born with CHD, and assuming that PDA is a preventable event, prohibiting marriages between first-degree relatives would reduce the chance of having PDA by about 6%.
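The figures in Example 1 can be cross-checked against Levin's form of the attributable risk in (3). The sketch below uses the counts from the consanguinity table (53 of 152 exposed and 30 of 96 non-exposed children with PDA); the function name is hypothetical.

```python
def levin_ar(p_exposed, rr):
    """Attributable risk from exposure prevalence and relative risk,
    following Equation (3)."""
    excess = p_exposed * (rr - 1.0)
    return excess / (1.0 + excess)


# Counts from the Example 1 table.
X, N = 53, 152   # PDA cases and total among the exposed
Y, M = 30, 96    # PDA cases and total among the non-exposed

p_exposed = N / (N + M)      # sample exposure prevalence
rr = (X / N) / (Y / M)       # sample relative risk
ar = levin_ar(p_exposed, rr)
```

Here `p_exposed` is about 0.61, `rr` about 1.12 and `ar` about 0.066, so in this sample the value from (3), with the exposure prevalence estimated by $N/(N+M)$, coincides with the estimator in (5).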
Example 2: Prospective Cohort study (Weil’s data)
The data in this example was given first in [
Consanguinity | PDA present | PDA absent | Total |
---|---|---|---|
Yes | 53 | 99 | 152 |
No | 30 | 66 | 96 |
Total | 83 | 165 | 248 |
results from an experiment comparing two treatments. One group of 16 pregnant female rats was fed a control diet during pregnancy and lactation, while the diet of a second group of 16 pregnant females was treated with a chemical. For each cluster (a litter consisting of the pups born to one female rat), the number $n$ of pups alive at 4 days and the number $y$ of pups that survived the 31-day lactation period were recorded. The data are given as a fraction $y/n$ in
In
P = 0.24 and Q = 0.10 , giving relative risk R R = 2.4 .
$$AR = \frac{(35)(142) - (16)(112)}{(51)(158)} = 39.4\%.$$
$\rho(\text{control}) = 0.029$, $\rho(\text{treated}) = 0.040$, $n_0 = 9.84$, $m_0 = 9.16$, $se(\hat\theta) = 0.0555$, and the 95% confidence interval on AR is 0.33 < AR < 0.45.
Split-cluster experiments are used by investigators in the health sciences when naturally occurring aggregates of individuals with nested subgroups may be assigned to different treatments. Cited examples include split-mouth trials, in which a subject’s mouth is divided into two segments that are randomly assigned to different treatment groups. In other situations, randomization to treatment conditions may be possible at the person level within the cluster. In this case, when the treatment conditions are available within each cluster, the design is referred to as a multisite or split-cluster design (SCD). The major attraction of this design is that it removes a large portion of the inter-subject variation from the estimate of treatment effect, and hence has the potential to require fewer subjects than a parallel-arm design with the same power. When the response variable of interest is binary, statistical methods developed to evaluate the effect of intervention depend on non-parametric methods, as shown in [
In this section we present the data layout for the SCD (see
Under a set-up similar to that presented in the previous section, and with an appropriate change in notation, the random variables $X_i = \sum_{j=1}^{n_i} x_{ij}$ and $Y_i = \sum_{j=1}^{m_i} y_{ij}$ will have the same beta-binomial distributions, but they are no longer independent.
Group | Litter survival (y/n) | | | | | | | |
---|---|---|---|---|---|---|---|---|
Control | 13/13 | 12/12 | 9/9 | 9/9 | 8/8 | 8/8 | 12/13 | 11/12 |
Control (continued) | 9/10 | 9/10 | 8/9 | 11/13 | 4/5 | 5/7 | 7/10 | 7/10 |
Treatment | 12/12 | 11/11 | 10/10 | 9/9 | 10/11 | 9/11 | 9/11 | 8/9 |
Treatment (continued) | 8/9 | 4/5 | 7/9 | 4/7 | 5/10 | 3/6 | 3/10 | 0/7 |
Exposure | Dead | Alive | Total |
---|---|---|---|
Treated | 35 | 112 | 147 |
Control | 16 | 142 | 158 |
Total | 51 | 254 | 305 |
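The 2 × 2 summary for Example 2 can be rebuilt from the litter-level fractions y/n listed above. The sketch below is only a bookkeeping aid with hypothetical variable names; it aggregates the litters and then applies the estimator in (5).

```python
# Litter-level data as (survivors y, pups alive at 4 days n).
control_litters = [
    (13, 13), (12, 12), (9, 9), (9, 9), (8, 8), (8, 8), (12, 13), (11, 12),
    (9, 10), (9, 10), (8, 9), (11, 13), (4, 5), (5, 7), (7, 10), (7, 10),
]
treated_litters = [
    (12, 12), (11, 11), (10, 10), (9, 9), (10, 11), (9, 11), (9, 11), (8, 9),
    (8, 9), (4, 5), (7, 9), (4, 7), (5, 10), (3, 6), (3, 10), (0, 7),
]


def deaths_and_total(litters):
    """Return (number of deaths, number of pups) summed over litters."""
    alive = sum(y for y, n in litters)
    at_risk = sum(n for y, n in litters)
    return at_risk - alive, at_risk


X, N = deaths_and_total(treated_litters)   # exposed (treated) group
Y, M = deaths_and_total(control_litters)   # non-exposed (control) group
ar = (X * (M - Y) - Y * (N - X)) / ((X + Y) * M)
```

The aggregated counts are 35 deaths out of 147 treated pups and 16 out of 158 control pups, which gives an AR of about 0.394, matching the table and the 39.4% reported in the text.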
Clusters | ||||||
---|---|---|---|---|---|---|
Sub-Clusters | 1 | 2 | ||||
1 | ||||||
(Exposed) | ||||||
2 | ||||||
(Unexposed) | ||||||
$$\operatorname{Var}(X) = NP(1-P)[1+(u_1-1)\rho_1]$$
$$\operatorname{Var}(Y) = MQ(1-Q)[1+(u_2-1)\rho_2].$$
The correlation parameters ρ 1 and ρ 2 are estimated as shown in Equations (7)-(9).
Although the AR estimator maintains the same expression under split clusters, its variance is affected by the correlations within the sub-clusters, and between units in the exposed and the non-exposed sub-clusters.
Using the delta method, we can therefore show that
$$\operatorname{Var}(\hat\theta) = \frac{N(1-P)c_1}{P[N+M\psi]^2} + \frac{N^2(1-\psi P)c_2}{M\psi P[N+M\psi]^2} + \left\{ \frac{-2NP\rho_{12}}{M\psi P[N+M\psi]^2}\,\big[NM\psi(1-P)(1-\psi P)c_1c_2\big]^{1/2} \right\} \tag{10}$$
where $c_1 = 1+(u_1-1)\rho_1$, $c_2 = 1+(u_2-1)\rho_2$, $u_1 = \sum_{i=1}^{k} n_i^2/N$ and
$u_2 = \sum_{i=1}^{k} m_i^2/M$.
Here, $\rho_1$ is the intraclass correlation among the individuals in the sub-clusters of exposed, and $\rho_2$ is the intraclass correlation among the individuals in the sub-clusters of unexposed. Both correlations are estimated from the one-way ANOVA layout as explained in Equations (7)-(9). The cross-cluster correlation, denoted by $\rho_{12}$ and interpreted as an intercluster correlation, is similarly estimated from the data by first ignoring the splitting structure of the data and then using the one-way ANOVA to obtain the within and between mean squares. Substituting these quantities in (7), we obtain a moment estimator of $\rho_{12}$.
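The delta-method calculation behind a variance of this kind can be sketched generically. The code below is not a transcription of (10): it starts from the identity $1 - \widehat{AR} = Y(N+M)/\{(X+Y)M\}$, which follows from (5), differentiates $\hat\theta = \ln(1-\widehat{AR})$ with respect to $X$ and $Y$, and assumes that $\operatorname{cov}(X, Y) = \rho_{12}\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}$. The function name and the plug-in use of sample proportions are assumptions made for illustration.

```python
import math


def var_theta_split(X, N, Y, M, c1, c2, rho12):
    """Delta-method variance of theta = ln(1 - AR) when the exposed and
    non-exposed totals X and Y are correlated (split-cluster setting)."""
    P, Q = X / N, Y / M
    var_x = N * P * (1.0 - P) * c1
    var_y = M * Q * (1.0 - Q) * c2
    cov_xy = rho12 * math.sqrt(var_x * var_y)   # assumed covariance form
    # Partial derivatives of theta = ln Y + ln(N + M) - ln(X + Y) - ln M.
    d_x = -1.0 / (X + Y)
    d_y = 1.0 / Y - 1.0 / (X + Y)
    return d_x**2 * var_x + d_y**2 * var_y + 2.0 * d_x * d_y * cov_xy
```

Because the two partial derivatives have opposite signs, a positive $\rho_{12}$ lowers the variance relative to the parallel-group case, which is consistent with the negative third term in (10) and with the efficiency argument for split-cluster designs.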
Example 3: Split-Mouth Trial
For illustrating the proposed methodology, as a third example, we consider data from a split-mouth trial on 23 patients evaluating the effect of chlorhexidine in the treatment of gingivitis [
$\rho_1 = 0.0395$, $\rho_2 = 0.087$, $\rho_{12} = 0.039$.
$$AR = \frac{(82)(21) - (10)(71)}{(153)(92)} = \frac{1722 - 710}{14076} = 7.19\%.$$
$se(\hat\theta) = 0.092$, and the 95% CI on AR is (−0.112, 0.225).
Interest is focused on studying the change in disease-exposure etiology under varying conditions. We illustrate this situation using the published data [
For example, with family data we may be interested in evaluating the effect of a parental exposure variable on the disease status of the offspring, who can be divided into males and females within the same family. In this case, we
Treat | Affected (+) | Not Affected (−) | Total |
---|---|---|---|
Chloro. (1) | 82 | 10 | 92 |
Control (2) | 71 | 21 | 92 |
Total | 153 | 31 | 184 |
Exposure | Males (b): affected | Males (b): not affected | Males (b): subtotal | Females (g): affected | Females (g): not affected | Females (g): subtotal | Total |
---|---|---|---|---|---|---|---|
Father+ | 43 | 144 | 187 | 61 | 134 | 195 | 382 |
Father− | 21 | 107 | 128 | 22 | 94 | 116 | 244 |
Total | 64 | 251 | 315 | 83 | 228 | 311 | 626 |
have two correlated attributable risk estimators, one describing the disease-exposure etiology for males, and the other for females. The main interest here is to compare the AR of males to that of females from the same sibship.
Example 4: Correlated AR’s from Cross Sectional Study: Family Data
We now consider highly structured clustered familial data with a two-level hierarchy, in which blood measurements are taken on parents (level two) and their offspring (level one), together with other anthropometric features [
We present the general methodology as follows. Testing for a gender difference in the population AR is formulated as testing the null hypothesis $H_0: AR_1 = AR_2$ against a general unspecified alternative $H_1: AR_2 = AR_1 + \Delta$. Note that testing this null hypothesis is equivalent to testing $H_0: \theta_1 = \theta_2$, or $H_0: \Delta = 0$.
Let the point estimators be denoted by $\hat\theta_1$ and $\hat\theta_2$. The difference $D = \hat\theta_1 - \hat\theta_2$ is asymptotically unbiased and has variance $\operatorname{var}(D) = \operatorname{var}(\hat\theta_1) + \operatorname{var}(\hat\theta_2) - 2\operatorname{cov}(\hat\theta_1, \hat\theta_2)$.
Hence the null hypothesis is rejected whenever $Z = D/\sqrt{\operatorname{var}(D)}$ satisfies $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$, where $z_{\alpha/2}$ is the $(1-\alpha/2)100\%$ cut-off point of the standard normal distribution. With a slight difference in notation, $\operatorname{var}(\hat\theta_i)$ is similar to the expression in Equation (6). We derive $\operatorname{cov}(\hat\theta_1, \hat\theta_2)$ using the delta method. In general, the data will have a structure similar to that given in
We define the moment estimator as before:
$$\widehat{AR}_j = \frac{X_j(M_j - Y_j) - Y_j(N_j - X_j)}{M_j(X_j + Y_j)}, \quad j = 1, 2.$$
Let $\hat\theta_j = \ln(1 - \widehat{AR}_j)$; then, similar to the first situation, we have:
$$\tau_j^2 = \operatorname{var}(\hat\theta_j) = \frac{N_j(1-P_j)c_{1j}}{P_j(N_j + M_j\psi_j)^2} + \frac{N_j^2(1-P_j\psi_j)c_{2j}}{M_jP_j\psi_j(N_j + M_j\psi_j)^2}.$$
Exposure | Condition (1) | Condition (2) | ||
---|---|---|---|---|
Here,
$N_j = \sum_{i=1}^{k_j} n_{ji}$, $M_j = \sum_{i=1}^{l_j} m_{ji}$, $c_{1j} = 1 + (n_{0j}-1)\rho_{1j}$, $c_{2j} = 1 + (m_{0j}-1)\rho_{2j}$,
$n_{0j} = \frac{1}{N_j}\sum_{i=1}^{k_j} n_{ji}^2$, $m_{0j} = \frac{1}{M_j}\sum_{i=1}^{l_j} m_{ji}^2$, $\psi_j = (1-AR_j)/(1+AR_j)$, and
$P_j$ is the rate of exposure to the risk factor under the $j$th condition.
Moreover, $\rho_{1j}$ is the intracluster correlation of the exposed clusters under the $j$th condition, and $\rho_{2j}$ is the intracluster correlation of the unexposed clusters under the $j$th condition. These two parameters are estimated as described in (7).
For simplicity we assume that these correlations are constant among the exposed and unexposed.
Using the delta method we can show after some algebra that:
$$\operatorname{cov}(\hat\theta_1, \hat\theta_2) = \rho\,[\alpha_1\alpha_2\gamma_1\gamma_2 + \alpha_2\beta_1\gamma_2\delta_1 + \alpha_1\beta_2\gamma_1\delta_2 + \beta_1\beta_2\delta_1\delta_2]. \tag{11}$$
The correlation ρ which, under both conditions is the average correlation among the responses, is estimated as described in Section 3.
The values inside the square bracket are given by:
$$\alpha_j = \frac{\partial\theta_j}{\partial x_j} = \frac{-1}{P_j(N_j + M_j\psi_j)}$$
$$\beta_j = \frac{\partial\theta_j}{\partial y_j} = \frac{-N_j}{M_j\psi_jP_j(N_j + M_j\psi_j)}$$
$$\gamma_j^2 = \operatorname{var}(x_j) = N_jP_j(1-P_j)c_{1j}$$
$$\delta_j^2 = \operatorname{var}(y_j) = M_jQ_j(1-Q_j)c_{2j} = M_j\psi_jP_j(1-\psi_jP_j)c_{2j}, \quad j = 1, 2.$$
Therefore,
$$\operatorname{var}(\hat\theta_1 - \hat\theta_2) = \tau_1^2 + \tau_2^2 - 2\rho\,[\alpha_1\alpha_2\gamma_1\gamma_2 + \alpha_2\beta_1\gamma_2\delta_1 + \alpha_1\beta_2\gamma_1\delta_2 + \beta_1\beta_2\delta_1\delta_2]. \tag{12}$$
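The bracket in (11) and (12) is the product $(\alpha_1\gamma_1 + \beta_1\delta_1)(\alpha_2\gamma_2 + \beta_2\delta_2)$ written out term by term, which makes it easy to code. The sketch below takes the per-condition quantities as inputs (their values and signs are whatever the caller supplies); the function names are hypothetical.

```python
def cov_theta(rho, alpha, beta, gamma, delta):
    """Covariance of Equation (11).

    alpha, beta, gamma, delta are pairs (value under condition 1,
    value under condition 2); rho is the common correlation.
    """
    a1, a2 = alpha
    b1, b2 = beta
    g1, g2 = gamma
    d1, d2 = delta
    bracket = (a1 * a2 * g1 * g2 + a2 * b1 * g2 * d1
               + a1 * b2 * g1 * d2 + b1 * b2 * d1 * d2)
    return rho * bracket


def var_difference(tau1_sq, tau2_sq, cov):
    """Variance of theta1-hat minus theta2-hat, as in Equation (12)."""
    return tau1_sq + tau2_sq - 2.0 * cov
```

As a sanity check on the algebra, arbitrary inputs such as `alpha=(-0.1, -0.2)`, `beta=(0.3, 0.4)`, `gamma=(2, 3)`, `delta=(4, 5)` give a bracket of 1.4, which is also $(1.0)(1.4)$ from the factored form.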
Using the data in
Males: $P_1 = 0.59$, $M_1 = 128$, $AR_1 = 0.19$, $\operatorname{var}(\hat\theta_1) = 0.0142$.
Females: $P_2 = 0.63$, $M_2 = 116$, $AR_2 = 0.29$, $\operatorname{var}(\hat\theta_2) = 0.00578$.
$\operatorname{var}(\hat\theta_1 - \hat\theta_2) = 0.01987$, $z = (-0.211 + 0.342)/0.141 = 0.929$, and $p\text{-value} = 0.353$.
Therefore there is not enough evidence in the data to support the hypothesis of presence of gender differences for the paternal effect on the siblings’ hypertension status.
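The test statistic and two-sided p-value reported for the family data can be reproduced from the quoted summary values. This is a small sketch with a hypothetical function name; it uses the standard normal tail via `math.erfc` and takes the variance of the difference as given.

```python
import math


def wald_test(theta1, theta2, var_diff):
    """Two-sided Wald test of H0: theta1 = theta2.

    Returns the statistic Z = D / sqrt(var(D)) and its normal p-value.
    """
    z = (theta1 - theta2) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Summary values quoted in the text for males (1) and females (2).
z, p_value = wald_test(-0.211, -0.342, 0.01987)
```

With these inputs `z` is about 0.93 and `p_value` about 0.35, in agreement with the reported 0.929 and 0.353.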
We carried out a Monte-Carlo study, generating the observations from a bivariate beta-binomial distribution. We restricted our simulations to the situation in which the intracluster and the cross-cluster correlations are equal. We also assumed a fixed number of observations within each cluster. The purpose was to limit the number of scenarios under which we examine the properties of the proposed test statistic $Z$. The statistic $Z = D/\sqrt{\operatorname{var}(D)}$ is computed when both $P_1$ and $P_2$ are strictly positive, with the additional restriction
$$\rho < \frac{\tau_1^2 + \tau_2^2}{2\,[\alpha_1\alpha_2\gamma_1\gamma_2 + \alpha_2\beta_1\gamma_2\delta_1 + \alpha_1\beta_2\gamma_1\delta_2 + \beta_1\beta_2\delta_1\delta_2]}.$$
If these conditions are not satisfied, the sample is replaced until a total of 1000 iterations are obtained for each parameter combination.
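One building block of such a simulation, drawing beta-binomial cluster totals with a given mean and intracluster correlation, can be sketched with the standard library alone. This sketch covers only a single (marginal) beta-binomial sample; it does not reproduce the bivariate dependence between conditions used in the study, and the function name and default seed are illustrative assumptions.

```python
import random


def simulate_cluster_totals(n, P, rho, reps, seed=0):
    """Draw `reps` beta-binomial cluster totals for clusters of size n.

    Each cluster gets its own event probability from a beta distribution
    with mean P and shape sum (1 - rho) / rho, so that rho is the
    intracluster correlation (requires 0 < P < 1 and 0 < rho < 1).
    """
    rng = random.Random(seed)
    scale = (1.0 - rho) / rho
    a, b = P * scale, (1.0 - P) * scale
    totals = []
    for _ in range(reps):
        pi = rng.betavariate(a, b)  # cluster-specific probability
        totals.append(sum(rng.random() < pi for _ in range(n)))
    return totals
```

With many replicates, the sample mean and variance of the totals should be close to $nP$ and $nP(1-P)[1+(n-1)\rho]$, which gives a simple check that the generator induces the intended overdispersion before it is embedded in a power study.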
The population attributable risk, like the odds ratio and the relative risk, is a measure of disease-risk association. However, it has a special appeal to public health epidemiologists because it measures the percent reduction in the chance of having the outcome among subjects who are exposed to the risk factor. Clearly, not everyone in the population is exposed to the risk factor. For example, in evaluating the relationship between consanguinity and the risk of PDA, not all parents are relatives. Suppose that 55% of women in the population (as in traditional Saudi society) are married to a first cousin. The expected reduction in PDA among CHD newborns is then 0.55 × 0.06 = 3.3%.
We have developed estimators of the variance and the confidence interval on AR when the units of sampling are aggregates of individuals under three study designs. In all situations the estimation of the intraclass correlation is crucial to
k = l | n = m | 0.05 | 0.10 | 0.25 | 0.05 | 0.10 | 0.25 | 0.05 | 0.10 | 0.25 | |||
5 | 2 | 0.049 | 0.058 | 0.078 | 0.101 | 0.049 | 0.060 | 0.084 | 0.110 | 0.049 | 0.083 | 0.180 | 0.296 |
3 | 0.049 | 0.060 | 0.084 | 0.113 | 0.049 | 0.062 | 0.089 | 0.122 | 0.049 | 0.089 | 0.210 | 0.356 | |
5 | 0.050 | 0.062 | 0.092 | 0.132 | 0.050 | 0.063 | 0.097 | 0.136 | 0.049 | 0.096 | 0.247 | 0.436 | |
50 | 2 | 0.049 | 0.081 | 0.183 | 0.350 | 0.049 | 0.090 | 0.22 | 0.42 | 0.049 | 0.218 | 1.00 | 1.00 |
3 | 0.050 | 0.087 | 0.221 | 0.444 | 0.050 | 0.095 | 0.260 | 0.512 | 0.050 | 0.257 | 1.00 | 1.00 | |
5 | 0.051 | 0.097 | 0.281 | 0.595 | 0.050 | 0.104 | 0.312 | 0.635 | 0.050 | 0.309 | 1.00 | 1.00 | |
100 | 2 | 0.051 | 0.097 | 0.281 | 0.604 | 0.051 | 0.111 | 0.357 | 0.740 | 0.050 | 0.353 | 1.00 | 1.00 |
3 | 0.051 | 0.108 | 0.359 | 0.787 | 0.051 | 0.122 | 0.435 | 0.910 | 0.051 | 0.429 | 1.00 | 1.00 | |
5 | 0.050 | 0.124 | 0.472 | 0.99 | 0.050 | 0.140 | 0.537 | 1.00 | 0.051 | 0.530 | 1.00 | 1.00 |
conduct valid statistical inferences.
One of the objectives of this paper was to develop and evaluate a simple test statistic that could be used to compare dependent attributable risks in the case of clustered dichotomous outcome variables.
1) A major finding of our simulations is that, to test the equality of correlated attributable risks from either cross-sectional or cohort studies, one needs a much larger number of clusters than might be expected to achieve high power.
2) An interesting extension of our study is to construct model-based inference on the AR. This would require the development of a semiparametric model similar to the generalized estimating equations, or a full probabilistic model such as a generalized linear mixed model, in which the effect of multiple covariates may be accounted for.
3) A limitation of the simulation study is the restriction that the number of observations in all clusters is held constant (balanced design) and that the within-cluster and cross-cluster correlations are equal. The reason for these assumptions is to limit the number of factors that affect the power so that reasonable conclusions can be made. We believe, however, that these restrictions should not affect the overall conclusions.
The authors declare that there is no conflict of interest.
Shoukri, M., Donner, A. and Al-Mohanna, F. (2017) Estimation of Attributable Risk from Clustered Binary Data: The Case of Cross-Sectional and Cohort Studies. Open Journal of Statistics, 7, 240-253. https://doi.org/10.4236/ojs.2017.72019