Cross-section data from the Canadian Community Health Surveys are used to examine the relationship between moderate alcohol use and type 2 diabetes. Results from these data are compared with those obtained from prospective longitudinal studies. The major result is that both types of data yield similar conclusions about this relationship. This occurs because Canadian drinking behavior is established in early adulthood and remains relatively stable thereafter. The only difference between the two types of survey is the time at which information on drinking behavior is obtained. Since this timing does not matter when drinking behavior is stable over large age ranges, results from the two types of survey will be similar. Neither type of data can be used to support the proposition that the relationship between drinking behavior and the risk of diabetes is causal. Some advantages that sample survey data have over longitudinal data are also noted.
Much has been written on the effects of moderate alcohol consumption as a prophylactic for type 2 diabetes. The studies regarded as the most influential, and referred to most often, are prospective or longitudinal. They collect baseline information on a sample at a fixed point in time and then follow the respondents for a number of years until the respondent is diagnosed with diabetes or dies, or until the study is terminated for administrative reasons. The measure used to represent diabetes is the duration of diabetes-free life. One of the distinguishing features of the methodology is that only those respondents who do not have diabetes at baseline are retained for follow-up coverage. Some studies collect information at regular intervals as the study progresses but most do not.
There is a sufficiently large number of these studies to have generated three meta-studies reviewing their results [
The appeal of prospective studies is that the observed statistical relationship between alcohol use and the risk of type 2 diabetes is based on the respondent’s self-reported drinking behavior at baseline, which by construction is prior to the onset of diabetes. Because of this, the hypothesis that the relationship is causal has some appeal. On the other hand, results based on sample surveys involving cross-section data are seen as much less convincing, since drinking behavior at the time of the survey refers to a time after the onset of diabetes for those who suffer from the disease. This would have current behavior explaining events that happened in the past, or so it would seem.
It is argued here that because Canadian drinking habits are relatively stable over time and over a large range of ages the data requirements for both cross-sectional and prospective longitudinal surveys to be informative about this issue are similar. This is good news for diabetes research for two reasons. First, countries like Canada, for example, which have not produced many longitudinal medical or health surveys can use cross-section data to investigate what effect drinking behavior has on the probability of having diabetes using the sample surveys that are regularly carried out by Statistics Canada [
Finally, unlike longitudinal studies, there are no attrition problems when a cross-section survey is the source of the data. Respondents are contacted only once; there is no need to keep track of them, and inference problems due to non-response or death do not arise even if mortality is related to alcohol use.
The paper has the following format. The argument that cross-sectional and longitudinal surveys are similar with respect to when drinking behavior is determined is developed in detail in the next section. The statistical model is outlined in Section 3 and the results are contained in Section 4. These are discussed in Section 5, and the paper ends with a summary of the results and some conclusions.
When researchers use baseline data to examine outcomes that occur later in a project this data has to represent the respondent’s characteristics not just at the time the data was collected but for a considerable period prior to the collection date as well as for the follow-up period. In the case of alcohol consumption and diabetes risk in longitudinal surveys the authors of [
Stability of behavior is also required for cross-section data to be informative about this relationship, but the requirements are somewhat more stringent. Alcohol intake has to be stable for a period of unknown length prior to the onset of diabetes. For the samples used here many of the respondents became diabetic 10 or 15 years before the data was collected. If reported drinking behavior obtained in the survey is to be informative about the risk of having diabetes, it therefore has to be the same as it was long before the information on the respondent’s current alcohol consumption was obtained.
In Canada, drinking behavior is formed when respondents are in their twenties and remains remarkably stable up to ages 50 - 59 for men. There is considerably more variation for women. In
This table contains data from the 2000-01 and 2011-12 Canadian Community Health Surveys and gives proportions of regular drinkers by ten-year age groups for both male and female respondents. In 2000-01 the proportion of regular drinkers among males was around 50% for age groups 20 - 59, with little or no significant variation across age groups. For the 2011-12 sample there is hardly any variation at all over the first four age categories, but the proportions of male regular drinkers in the age groups 20 - 59 in 2011-12 were about 10% higher than they were eleven years earlier. For women, changes in behavior across the two surveys are much larger. In 2011-12 women in all age categories drank considerably more than they did eleven years earlier. The survey design is the same for both years, so although the respondents are not the same for the two surveys they come from the same distribution. For men, the conclusion is that alcohol consumption behavior does not change very much as respondents or cohorts get older, and thus the age at which this information is collected will not have a major impact on the results concerning the importance of alcohol intake for the risk of diabetes. Of course, the upward trend in regular drinking behavior could have an impact on the results. This issue will be examined later by looking at some simulations.
Proportion of regular drinkers by age group and survey year, males.

Age Group | 2000-01 | 2011-12 |
---|---|---|
20 - 29 | 0.498 | 0.564 |
30 - 39 | 0.518 | 0.555 |
40 - 49 | 0.514 | 0.549 |
50 - 59 | 0.472 | 0.588 |
60 - 69 | 0.402 | 0.557 |
70 - 79 | 0.341 | 0.497 |
Average | 0.447 | 0.555 |
Sample Size | 48206 | 44381 |

Proportion of regular drinkers by age group and survey year, females.

Age Group | 2000-01 | 2011-12 |
---|---|---|
20 - 29 | 0.251 | 0.368 |
30 - 39 | 0.325 | 0.364 |
40 - 49 | 0.320 | 0.398 |
50 - 59 | 0.285 | 0.426 |
60 - 69 | 0.227 | 0.367 |
70 - 79 | 0.143 | 0.279 |
Average | 0.255 | 0.371 |
Sample Size | 57533 | 52689 |
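The comparison across age groups and survey years can be checked directly from the two tables. The following is a minimal sketch, assuming the first table refers to men and the second to women (as the description in the text suggests); the function names are illustrative and not part of the original analysis.

```python
# Proportions of regular drinkers transcribed from the two tables above.
# Assumption: first table = males, second table = females.
AGE_GROUPS = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
MALES = {
    "2000-01": [0.498, 0.518, 0.514, 0.472, 0.402, 0.341],
    "2011-12": [0.564, 0.555, 0.549, 0.588, 0.557, 0.497],
}
FEMALES = {
    "2000-01": [0.251, 0.325, 0.320, 0.285, 0.227, 0.143],
    "2011-12": [0.368, 0.364, 0.398, 0.426, 0.367, 0.279],
}


def changes(table):
    """Return {age_group: (point_change, relative_change)} between surveys."""
    out = {}
    for age, early, late in zip(AGE_GROUPS, table["2000-01"], table["2011-12"]):
        out[age] = (late - early, (late - early) / early)
    return out


def spread(values):
    """Gap between the largest and smallest proportion in a list."""
    return max(values) - min(values)


for label, table in (("males", MALES), ("females", FEMALES)):
    # Variation across the first four age groups (20 - 59) within each survey.
    for year in ("2000-01", "2011-12"):
        print(label, year, "spread 20-59:", round(spread(table[year][:4]), 3))
    # Change between the two surveys for each age group.
    for age, (points, relative) in changes(table).items():
        print(label, age, round(points, 3), round(relative, 3))
```

The within-survey spread for ages 20 - 59 measures stability across age groups, while the between-survey changes measure the upward trend that the later simulations address.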
In the Canadian Community Health Surveys respondents are asked whether they have type 2 diabetes. They are also asked what type of medication has been prescribed for them. Most respondents (86% in 2011-12) were taking some form of oral medication, insulin, or sometimes both. Thus, although the information is self-reported, the fact that most respondents had some involvement with a medical practitioner suggests that it is quite reliable. The measure of diabetes used here is the answer to this question for type 2 diabetes. Call this $d_i$ for respondent $i$. This is a binary variable which takes the value 1 if respondent $i$ has type 2 diabetes at the time of the survey and 0 if not.
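The type 2 diabetes indicator variable is explained by a normal probability (probit) model. Define

$$d_i^* = X_i\beta + u_i \qquad (1)$$

where $X_i$ is a vector of personal characteristics of respondent $i$, including whether he or she is a regular or occasional drinker, and $u_i$ is a normally distributed error term. Then

$$\Pr(d_i = 1) = \Pr(d_i^* > 0) = \Phi(X_i\beta)$$

and

$$\Pr(d_i = 0) = 1 - \Phi(X_i\beta)$$

where $\Phi$ is the standard normal distribution function.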
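The normal probability model described above can be illustrated numerically. The sketch below assumes the usual probit form, in which the probability of having diabetes is the standard normal distribution function evaluated at the linear index $X_i\beta$. The baseline probability of 0.076 is an illustrative assumption, the coefficient −0.468 is the regular-drinker estimate reported for men aged 40 - 49, and the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_probability(index):
    """Pr(d_i = 1) under a probit model, given the linear index X_i * beta."""
    return STD_NORMAL.cdf(index)


def shifted_probability(baseline_prob, coefficient):
    """Probability after adding `coefficient` to the index implied by baseline_prob."""
    baseline_index = STD_NORMAL.inv_cdf(baseline_prob)
    return probit_probability(baseline_index + coefficient)


# Illustrative assumption: a respondent whose other characteristics imply a
# 7.6% probability of diabetes as a non-drinker, then switched to regular drinker.
p_regular = shifted_probability(0.076, -0.468)
print(round(p_regular, 3))
```

Because the model is nonlinear, the same coefficient changes the probability by different amounts at different baselines, which is why coefficients and predicted probabilities are usually reported separately.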
The respondent characteristics include six smoking categorical variables going from never smoked to being a daily smoker. There are four educational categories going from less than a high school diploma to a university degree. There is also information on the respondent’s age, body mass index, income decile and level of physical activity. But the last two were not used as regressors. The non-drinker category includes former drinkers. This is seen as problematic by some authors, [
Age Group | Regular Drinker | Occasional Drinker | Ln (BMI) | Age |
---|---|---|---|---|
40 - 49 | −0.468 (0.098) | −0.165 (0.100) | 0.291 (0.0034) | 0.090 (0.037) |
50 - 59 | −0.426 (0.069) | −0.151 (0.074) | 0.337 (0.026) | 0.120 (0.027) |
60 - 69 | −0.412 (0.061) | −0.066 (0.066) | 0.336 (0.024) | 0.068 (0.024) |
70 - 79 | −0.318 (0.067) | −0.100 (0.074) | 0.274 (0.028) | 0.069 (0.028) |
Age Group | Regular Drinker | Occasional Drinker | Ln (BMI) | Age |
---|---|---|---|---|
40 - 49 | −0.391 (0.153) | −0.169 (0.106) | 0.417 (0.039) | 0.065 (0.043) |
50 - 59 | −0.565 (0.077) | −0.245 (0.067) | 0.410 (0.027) | 0.108 (0.028) |
60 - 69 | −0.601 (0.066) | −0.170 (0.055) | 0.399 (0.024) | 0.075 (0.024) |
70 - 79 | −0.673 (0.075) | −0.184 (0.056) | 0.267 (0.026) | 0.030 (0.026) |
Discussion of the results begins with the age group 40 - 49 for males. The diabetes prevalence rate for the age groups 20 - 39 is quite low at 1.2%. It rises to 4.8%, four times higher, for the age group 40 - 49. It would therefore appear that most of the respondents in this age group who suffer from diabetes became diabetic in their late thirties or forties. As argued earlier, the drinking behavior of this age group is similar to what it was at younger ages, before the onset of diabetes. Thus, the large and significant regression coefficient for the categorical variable “regular drinker” for males of −0.468 (0.098) in the first row of
Other variables were included as regressors in the normal probability models. The coefficient of the natural logarithm of the respondent’s body mass index, BMI, was always the largest and most significant, followed by age and then some of the higher educational categories. Being a lifetime non-smoker also reduced the risk of diabetes. Income and physical activity were not included as explanatory variables because of the possibility of reverse causation. Being a heavy drinker was included but it was never significant, a result similar to that found by [
The parameter estimates for the older age groups are very similar to those for the age group 40 - 49. This result is somewhat surprising since the distribution of drinking behavior begins to shift towards occasional drinkers and non-drinkers as the cohorts get older. Apparently these changes are not large enough to alter the conclusions based on the youngest cohort.
The parameter estimates for women are similar to those for men except that being a regular drinker is more important for women and these coefficients increase with age. This is an unusual result since most studies find that the prophylactic effects of regular moderate alcohol use are less pronounced for women.
Age Group | Regular Drinker | Occasional Drinker | Non-Drinker |
---|---|---|---|
40 - 49 | 0.030 | 0.073 | 0.076 |
50 - 59 | 0.069 | 0.130 | 0.146 |
60 - 69 | 0.118 | 0.205 | 0.217 |
70 - 79 | 0.181 | 0.280 | 0.291 |
Age Group | Regular Drinker | Occasional Drinker | Non-Drinker |
---|---|---|---|
40 - 49 | 0.018 | 0.056 | 0.073 |
50 - 59 | 0.033 | 0.092 | 0.124 |
60 - 69 | 0.053 | 0.131 | 0.177 |
70 - 79 | 0.080 | 0.171 | 0.228 |
The slight upward trend in the proportion of regular drinkers means that the measured proportions for the age group 40 - 49 overstate the proportions of regular drinkers ten or twenty years earlier. If the measured proportions overstate the true proportions, the estimated effect of being a regular drinker on the probability of having diabetes will also be inflated. How large is the error associated with the use of the inflated data? To answer this question a simulation exercise was carried out in which 10% of the regular male drinkers and 37% of the regular female drinkers in the age group 40 - 49 were randomly reallocated equally to the two other categories. This leaves a proportion of regular drinkers equal to that of the age group 20 - 29 in the 2000-01 sample for each gender. For these simulated samples the regression coefficient associated with the regular drinker dummy moves toward zero, from −0.401 (0.076) to −0.346 (0.077) for men and from −0.372 (0.081) to −0.318 (0.090) for women. Neither change is significant, and the new coefficients are still many times their standard errors. Although the size of the response changes, the effect associated with being a regular drinker is still present and highly significant. Thus even if current drinking behavior does not represent exactly what respondents did twenty years earlier, it is still a good enough measure of their behavior.
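The reallocation shares used in the simulation can be reproduced from the tabulated proportions. The sketch below assumes that the starting proportions are those reported for ages 40 - 49 in 2011-12 (0.549 for men, 0.398 for women) and that the targets are those for ages 20 - 29 in 2000-01 (0.498 and 0.251). The category labels, the seed, and the function names are illustrative assumptions, not the procedure actually used.

```python
import random


def reallocation_share(current, target):
    """Share of regular drinkers to move so the proportion falls from current to target."""
    return 1.0 - target / current


def reallocate(categories, share, seed=0):
    """Randomly move `share` of the 'regular' respondents to the other two
    categories, alternating so the split is as equal as possible."""
    rng = random.Random(seed)
    result = list(categories)
    regular_idx = [i for i, c in enumerate(result) if c == "regular"]
    n_move = round(share * len(regular_idx))
    moved = rng.sample(regular_idx, n_move)
    for k, i in enumerate(moved):
        result[i] = "occasional" if k % 2 == 0 else "non-drinker"
    return result


male_share = reallocation_share(0.549, 0.498)    # roughly 10%
female_share = reallocation_share(0.398, 0.251)  # roughly 37%

# Hypothetical sample of 1000 men with 54.9% regular drinkers.
sample = ["regular"] * 549 + ["occasional"] * 200 + ["non-drinker"] * 251
simulated = reallocate(sample, male_share)
print(round(male_share, 3), round(female_share, 3))
print(simulated.count("regular") / len(simulated))
```

Re-estimating the model on the reallocated sample then shows how sensitive the regular-drinker coefficient is to the assumed overstatement of past drinking.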
One of the reasons why researchers pay so much more attention to longitudinal data sources is the belief that such data will bring them closer to discovering a causal relation between drinking behavior and the risk of diabetes. The issue of causality is an important one, and not being able to claim that results are causal is often seen as detracting from their credibility. The important question here is whether results based on either longitudinal or sample survey data can be used to support the hypothesis that there is a causal relation between drinking behavior and diabetes.
From a purely statistical point of view the answer to this question is most probably not and this result does not depend on which type of data is being used in the analysis. In the linear regression framework ( [
However, what is of slightly more concern is that even if an accurate picture of drinking behaviour could be observed it still might not be possible to confirm that the relation is causal. Suppose for example that instead of Equation (1) the true model is
where the Zi are highly correlated with the Xi variables but cannot be observed by the researcher. Ideally the model to be estimated should be based on the equation
The estimated β coefficients will not be significant but the δ coefficients will be. But when Equation (1) is the basis of the model then
and ui and Xi are not independent. The model based on Equation (1) is not causal not because of measurement error but because of omitted regressors which are correlated with the observable regressors.
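The omitted-regressor argument can be written out as a short sketch. The notation here is an assumption made for illustration: Equation (1) is taken to have the latent-index form $d_i^* = X_i\beta + u_i$, $v_i$ denotes an error term independent of both $X_i$ and $Z_i$, and $X_i$ and $Z_i$ are treated as scalars to keep the covariance step simple.

$$
\begin{aligned}
\text{True model:}\quad & d_i^* = Z_i\delta + v_i \\
\text{Ideal estimating equation:}\quad & d_i^* = X_i\beta + Z_i\delta + v_i, \qquad \beta = 0 \\
\text{Error term when Equation (1) is estimated:}\quad & u_i = Z_i\delta + v_i \\
\text{Hence:}\quad & \operatorname{Cov}(X_i, u_i) = \delta\,\operatorname{Cov}(X_i, Z_i) \neq 0
\end{aligned}
$$

Under these assumptions the estimated coefficient on $X_i$ in Equation (1) picks up the effect of the unobserved $Z_i$, so a significant estimate is not evidence of a causal effect of $X_i$.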
This is not just a hypothetical situation. Information on the respondent’s history of diet and physical activity, as well as detailed information on the timing and degree of being overweight or obese, is very important in the analysis of diabetes. This information is absent from almost all surveys, whatever their type. In this respect both types of survey are similar, and neither will be very informative about the issue of causality.
However, there is other evidence suggesting a causal relationship. [
The sample survey results presented here should be of considerable interest to diabetes researchers because they confirm what others have found using prospective data, namely that there is a “U” shaped relation between alcohol consumption and the risk of diabetes. Evidence of this result was obtained here when age groups were aggregated into the age group 40 - 79. Sample sizes were too small to use the individual age groups. Within the drinker category there are seven sub-categories going from less than once a month to drinking every day. The category with the largest regression coefficient was three to four days per week for men and five to six days per week for women. Optimal drinking behavior was never characterized by drinking every day. This confirms the “U” shaped relation between alcohol use and the risk of type 2 diabetes mentioned above.
In the introduction the study involving older cohorts by [
The results in this paper show that moderate alcohol use acts as a prophylactic in reducing the risk of type 2 diabetes. The data used here are cross-sectional and represent behavior at one point in time. However, Canadian drinking habits are fairly stable over time and across cohorts, so that information about them will be similar for both longitudinal and cross-section surveys. Thus it should not be surprising that the protective effects of moderate alcohol use that so many longitudinal studies have found should also apply to respondents in the 2011-12 Canadian Community Health Survey. This is useful information since this issue has not been examined using Canadian longitudinal data. There are advantages to using cross-section surveys in terms of cost and the avoidance of both the attrition and selection problems that arise in longitudinal surveys. It was also shown that neither type of survey could be used to justify a causal relation between alcohol use and type 2 diabetes. For longitudinal surveys, the fact that the information on alcohol use was collected prior to the onset of the disease is not sufficient to support the claim that the relation is causal.
Sample surveys like the Canadian Community Health Survey are a new source of data that can and should be used to explore how health issues are related to respondent behavior. However, some of the problems noted about these surveys could be circumvented by including more retrospective content, such as the history of the respondent’s weight and exercise habits, as well as more accurate information on how much and how often they drink alcohol.