Background and Objective: Many large cohort studies have collected data on the incidence of various chronic diseases and on their covariates and risk factors. However, approaches for using these large data sets and translating their results to inform and guide clinical disease prevention practice are not well developed. In this paper, we propose, based on large cohort study data, a novel conceptual cost-effective disease prevention design strategy for a target group when it is not affordable to include everyone in the target group for intervention. Methods and Results: Data from American Indian participants (n = 3516; 2056 women) aged 45 - 74 years in the Strong Heart Study, the diabetes risk prediction model from the study, a utility function, and regression models were used. A conceptual cost-effective disease prevention design strategy based on large cohort data was initiated. The application of the proposed strategy to diabetes prevention was illustrated. Discussion: The strategy may provide reasonable solutions to cost-effective prevention design issues. These issues include the complex associations of a disease with its significant risk factors, cost-effectively selecting individuals at high risk of developing disease to undergo intervention, individual differences in health conditions, choosing intervention risk factors and setting their appropriate, attainable, gradual and adaptive goal levels for different subgroups, and assessing the effectiveness of the prevention program. Conclusions: The strategy and methods shown in the illustrative example can be analogously adopted and applied to the prevention of other diseases. The proposed strategy provides a way to translate and apply epidemiological study results to clinical disease prevention practice.
Prevention of chronic diseases has emerged as an urgent issue due to increasing prevalence of the chronic diseases and their effects on medical care, public health and economic burden. For example, it is estimated that >18 million Americans have diabetes (DM) and are at risk of related vascular complications [
Let us consider designing a disease prevention program to reduce incident risk of a disease in a given time period, say, four years, for a group/community (called the target group) in a population for which it is not affordable to include everyone in the target group for intervention. We will use the following example to show the related issues in the design, and how to use available data from a large cohort study that includes the same or a similar group that is representative of the target group in terms of the factors considered (called the reference group) in the prevention design.
Example: Consider a DM (defined as having a fasting plasma glucose (FPG) ≥ 126 mg/dl or hemoglobin A1c (HbA1c) ≥ 6.5%) prevention in the target group (aged 40+ years AI with a waist circumference (WAIST) > 102 cm and free of DM).
Available result: The following SHS DM risk (probability) prediction model [
$$P(\text{an individual will develop DM in four years}) = \frac{1}{1 + \exp(-\mathit{xbeta})} \tag{1}$$
where
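$$\begin{aligned} \mathit{xbeta} = {} & 11.3544 - 0.0292 \times \text{Age} + 0.0167 \times \text{WAIST} + 0.2856 \times I(\text{elevated blood pressure}) \\ & + 0.0002 \times \text{FPG} \times \text{FPG} - 6.4798 \times \text{HbA1c} + 0.6856 \times \text{HbA1c} \times \text{HbA1c} \\ & + 0.0192 \times \operatorname{Log}(\text{UACR}) \times \operatorname{Log}(\text{UACR}) + 0.3723 \times I(\text{hypertriglyceridemia}) \end{aligned} \tag{2}$$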
and in which “elevated blood pressure” is defined as systolic blood pressure (SBP)/diastolic blood pressure (DBP) ≥ 130/85 mmHg or being on hypertension (HTN) medication treatment, UACR denotes the urinary albumin/creatinine ratio, hypertriglyceridemia is defined as triglyceride (TG) ≥ 150 mg/dl, and I(.) is the indicator function (for example, I(hypertriglyceridemia) = 1 if hypertriglyceridemia is present; = 0 otherwise).
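Equations (1) and (2) can be evaluated directly for one individual. The following is a minimal sketch rather than the SHS software: the function name `shs_dm_risk` and its argument names are illustrative, the logarithm is assumed to be the natural logarithm, and the example risk profile is made up.

```python
import math

def shs_dm_risk(age, waist_cm, elevated_bp, fpg, hba1c, uacr, high_tg):
    """Four-year DM probability from Equations (1) and (2).

    elevated_bp: True if SBP/DBP >= 130/85 mmHg or on HTN medication.
    high_tg:     True if TG >= 150 mg/dl.
    Log(UACR) is ASSUMED to be the natural logarithm.
    """
    log_uacr = math.log(uacr)
    xbeta = (11.3544
             - 0.0292 * age
             + 0.0167 * waist_cm
             + 0.2856 * int(elevated_bp)
             + 0.0002 * fpg * fpg
             - 6.4798 * hba1c
             + 0.6856 * hba1c * hba1c
             + 0.0192 * log_uacr * log_uacr
             + 0.3723 * int(high_tg))
    return 1.0 / (1.0 + math.exp(-xbeta))

# Hypothetical risk profile, for illustration only.
example_risk = shs_dm_risk(age=50, waist_cm=110, elevated_bp=False,
                           fpg=100, hba1c=5.0, uacr=6.0, high_tg=False)
print(f"Predicted 4-year DM probability: {example_risk:.3f}")
```

Because the coefficients are copied from Equation (2), any change to the published model would need to be mirrored here.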
Already collected data: Data from the reference group (the SHS baseline (1989-1991) AI participants, aged 45 - 75 years, with WAIST > 102 cm and free of DM).
It would be desirable to include everyone in the target group for intervention. However, this could be expensive and labor-intensive because of the size of the target group (based on the SHS data, about 46% of non-DM AI aged 40+ years may have WAIST > 102 cm, a large number even in a small community). In addition, not everyone in the target group will develop DM (based on the SHS data, only about 29% of AI in the target group would develop DM in 4 years). Therefore, ideally, only those persons who are at high risk of developing DM (or an affordable number within the budget limitation) would receive the intervention. To implement this approach we need to solve Problem 1. How to identify those at high risk of developing DM in the target group for intervention? Incident DM is usually the result of the combined effects of many risk factors such as FPG, HbA1c, WAIST, UACR, and metabolic syndrome traits, most of which are usually correlated [
$$U(p, \text{CIDM}, \text{Costs}, \text{Benefits}) = \text{CIDM} \times \text{SEN}(p) \times \text{Benefits} - (1 - \text{CIDM}) \times (1 - \text{SPE}(p)) \times \text{Costs} \tag{3}$$
Or, equivalently,
$$U(p, \text{CIDM}, \text{CBR}) = \text{CIDM} \times \text{SEN}(p) - (1 - \text{CIDM}) \times (1 - \text{SPE}(p)) \times \text{CBR} \tag{3a}$$
where CIDM is the estimated cumulative incidence of DM for the target group (= 0.2888, estimated from the reference group data); CBR = Costs/Benefits is a given costs-to-benefits ratio; p denotes a cutoff probability, for example, p from 0.1 to 0.9 in steps of 0.0001; and SEN(p) and SPE(p) are the respective sensitivity and specificity for a given p (i.e., relating to the accuracy of identifying those who will or will not develop incident DM), which can be obtained from the reference group data and the SHS DM risk prediction model.
For a given estimated CIDM, if CBR has been assumed/estimated for the intervention, the utility can be calculated at each p between 0.1 and 0.9. The optimal costs-benefits-balanced cutoff probability associated with the given CBR, denoted as p*, is defined as the cutoff probability with the highest utility, that is,
$$U(p^{*}, \text{CIDM}, \text{CBR}) = \max\{U(p, \text{CIDM}, \text{CBR}) : 0 < p < 1\} \tag{4}$$
In the special case when CBR equals CIDM/(1 − CIDM) (that is, the odds of DM), Equation (3a) reduces to U = CIDM × (SEN(p) + SPE(p) − 1), so by Equation (4) the corresponding p* also maximizes SEN(p) + SPE(p).
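The search for p* in Equations (3a) and (4) can be sketched as a grid search. This is an illustration under stated assumptions, not the authors' code: the predicted probabilities and outcomes below are synthetic, the grid step (0.001) is coarser than the 0.0001 used in the text to keep the example fast, and the function names are hypothetical.

```python
import numpy as np

def sen_spe(pred_prob, developed_dm, p):
    """Sensitivity and specificity when 'positive' means pred_prob >= p."""
    positive = pred_prob >= p
    sen = np.mean(positive[developed_dm])      # true positives / all cases
    spe = np.mean(~positive[~developed_dm])    # true negatives / all non-cases
    return sen, spe

def utility(pred_prob, developed_dm, p, cidm, cbr):
    """Equation (3a)."""
    sen, spe = sen_spe(pred_prob, developed_dm, p)
    return cidm * sen - (1.0 - cidm) * (1.0 - spe) * cbr

def optimal_cutoff(pred_prob, developed_dm, cidm, cbr, cutoffs):
    """Equation (4): the cutoff with the highest utility on the grid."""
    utilities = np.array([utility(pred_prob, developed_dm, p, cidm, cbr)
                          for p in cutoffs])
    best = int(np.argmax(utilities))
    return float(cutoffs[best]), float(utilities[best])

# Synthetic stand-in for the reference group (NOT SHS data).
rng = np.random.default_rng(0)
pred = rng.uniform(0.05, 0.95, size=2000)
outcome = rng.uniform(size=2000) < pred        # True = developed DM
cidm = float(outcome.mean())
cutoffs = np.arange(0.1, 0.9, 0.001)

odds = cidm / (1.0 - cidm)                     # the special-case CBR
p_star, u_star = optimal_cutoff(pred, outcome, cidm, odds, cutoffs)
print(f"p* = {p_star:.3f}, utility = {u_star:.4f}")
```

With CBR set to the odds of DM, the maximal utility equals CIDM × (max(SEN + SPE) − 1), which is the special case described above.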
In the case that funds are budgeted to have only a fixed number of individuals in the target group for the intervention, the affordable cutoff probability p^{†} can be simply estimated as
$$p^{\dagger} = \text{the } \left\{100 \times \left(1 - \frac{\text{fixed number in the target group for intervention}}{\text{estimated total number of individuals in the target group}}\right)\right\}\text{th percentile of all predicted probabilities from the AIs in the reference group} \tag{5}$$
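Equation (5) amounts to a single percentile lookup. The sketch below assumes NumPy's default linear-interpolation percentile, which may differ slightly from the percentile definition used in the original analysis; the function name and the evenly spaced "reference" probabilities are illustrative only.

```python
import numpy as np

def affordable_cutoff(reference_pred_prob, n_affordable, n_target):
    """Equation (5): cutoff that selects roughly n_affordable of n_target people."""
    percentile = 100.0 * (1.0 - n_affordable / n_target)
    return float(np.percentile(reference_pred_prob, percentile))

# Illustrative predicted probabilities (NOT SHS data).
reference_probs = np.linspace(0.05, 0.95, 1001)

# Budget covers 200 of an estimated 1000 people, i.e. the 80th percentile.
p_dagger = affordable_cutoff(reference_probs, n_affordable=200, n_target=1000)
selected_fraction = float(np.mean(reference_probs >= p_dagger))
print(f"p_dagger = {p_dagger:.3f}, fraction selected = {selected_fraction:.3f}")
```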
After identifying participants for intervention based on either the optimal costs-benefits-balanced cutoff probability p* or the affordable cutoff probability p^{†}, we immediately encounter Problem 2. How to choose the disease risk factors to address with intervention, and determine their appropriate, attainable and safe goal levels? As mentioned above, incident DM is usually the result of the combined effects of many risk factors. Therefore, a prevention program focused on only one or two risk factors may not be sufficient, which may decrease the efficacy of the program. Furthermore, the usual practice of setting one uniform goal for a risk factor for all participants in a prevention program may not be appropriate or attainable because of individual differences in risk factors and health conditions, and may sometimes even cause adverse effects and safety problems. Adverse events, medication toxicity, and safety problems are reasons that some clinical trials are discontinued. On the other hand, to reduce the risk of a disease for those “at high risk of developing DM” or “positive” individuals in the target group through a prevention program, one intuitive way is to improve the risk factor profiles of the “positive” individuals toward the profiles of the others in the target group who are “not-positive”. To implement these considerations and the approach, we adopted ways from our previous paper [
To reduce the effects of individual differences in risk factors and health conditions on setting goal levels for each of the risk factors, we divide all individuals in the reference group into subgroups based on some of the major risk factors in the prediction model, and derive goal levels for each of the risk factors separately for each subgroup. Because the reference group is representative of the target group, the goal levels derived for each subgroup from the reference group data can be adopted as the goal levels for the corresponding subgroup of the target group. Prevention settings to achieve the goal levels of all risk factors for each participant in the target group can then be designed individually based on his/her measured risk profile from the screening/baseline exam, the respective subgroup goal levels, and the prevention program. Individuals in each subgroup of the reference group are classified as positive (if their predicted incident risk from the prediction model is ≥ the given cutoff probability p*) or not-positive (otherwise). For each subgroup and each continuous risk factor, we propose to use a regression model to derive the goal level for the risk factor. In the regression model, the risk factor is the dependent variable, and the other risk factors in the prediction model and a classification variable (= 1 if an individual is positive; = 0 otherwise) are the independent variables. The least-squares means (LSM) and 95% confidence intervals (CI) of the risk factor for the positives and not-positives in the subgroup can then be estimated from the regression. The LSM represents the mean of the risk factor after adjusting for the other risk factors, since they may be correlated.
We propose to use the upper bound of the 95% CI of the LSM of the risk factor from the not-positives in the subgroup as the goal level of the risk factor for the subgroup (the lower bound will be used if the risk factor is negatively associated with the disease in the prediction model). For a dichotomous risk factor, a similar procedure using a logistic regression model will be applied. If the participants in each subgroup of the target group approach the goal levels of the risk factors for the subgroup through the prevention program, so that their risk factor levels no longer differ significantly from those of the not-positives, their expected disease risks should also decrease and approach the risks of those who are not positive.
For example, the regression model for deriving the upper bound of the 95% CI of the LSM of FPG from those not-positives in a subgroup (the goal level of risk factor FPG for the subgroup) is as follows.
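$$\text{FPG} = b_{0} + b_{1} \times I(\text{individual is positive}) + b_{2} \times \text{Age} + b_{3} \times \text{WAIST} + b_{4} \times I(\text{HTN medications}) + b_{5} \times \text{SBP} + b_{6} \times \text{DBP} + b_{7} \times \text{HbA1c} + b_{8} \times \operatorname{Log}(\text{UACR}) + b_{9} \times \operatorname{Log}(\text{TG}) + \varepsilon \tag{6}$$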
where ε denotes the error term and I(.) is the indicator function.
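A regression of the form of Equation (6) and the resulting goal level can be sketched with ordinary least squares. This is a simplified illustration, not the authors' procedure: the LSM is assumed to be the fitted value for a not-positive individual with all covariates at their subgroup means, the CI uses the normal quantile 1.96 rather than a t quantile, only one covariate is included, and the data are synthetic.

```python
import numpy as np

def lsm_not_positive(y, positive, covariates, z=1.96):
    """Adjusted mean of y for not-positives with an approximate 95% CI.

    y:          risk factor values (dependent variable), shape (n,)
    positive:   1 if the individual is positive, 0 otherwise, shape (n,)
    covariates: other risk factors, shape (n, k)
    Returns (lsm, lower, upper); 'upper' is the proposed goal level for a
    risk factor positively associated with the disease.
    """
    y = np.asarray(y, dtype=float)
    positive = np.asarray(positive, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), positive, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])        # residual variance
    xtx_inv = np.linalg.inv(X.T @ X)
    # Contrast: intercept, positive = 0, covariates at their means.
    c = np.concatenate([[1.0, 0.0], covariates.mean(axis=0)])
    lsm = float(c @ beta)
    se = float(np.sqrt(sigma2 * (c @ xtx_inv @ c)))
    return lsm, lsm - z * se, lsm + z * se

# Synthetic subgroup (NOT SHS data): FPG depends on status and age.
rng = np.random.default_rng(1)
n = 400
age = rng.uniform(45, 75, size=n)
is_positive = (rng.uniform(size=n) < 0.4).astype(float)
fpg = 95 + 8 * is_positive + 0.05 * (age - 55) + rng.normal(0, 5, size=n)

lsm, lower, upper = lsm_not_positive(fpg, is_positive, age.reshape(-1, 1))
print(f"LSM = {lsm:.1f}, 95% CI = ({lower:.1f}, {upper:.1f}), goal level = {upper:.1f}")
```

Statistical packages that report least-squares means directly would normally be preferred; the hand-rolled version is shown only to make the contrast vector explicit.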
Let APPDM_{positive,i} and APPDM_{not-positive,i} denote the estimated average predicted probabilities of developing DM (PPDM) in four years for the positives and not-positives, respectively, in the ith subgroup of the reference group, and let m_{i} and k_{i} denote the numbers of positives (intervention participants) and not-positives, respectively, in the ith subgroup of the target group. Then the two APPDMs for a subgroup can be used to pre-assess the expected intervention effect for the subgroup. In addition, the difference between the weighted averages
$$\frac{\sum_{i} m_{i}\, \text{APPDM}_{\text{positive},i}}{\sum_{i} m_{i}} - \frac{\sum_{i} k_{i}\, \text{APPDM}_{\text{not-positive},i}}{\sum_{i} k_{i}} \tag{7}$$
will give the pre-assessed expected intervention effect for all intervention participants. Furthermore, the difference between PPDM based on the risk factor measurements at the screening/baseline exam for prevention and at the exam at the end of the intervention period from each intervention participant can be used as a score to estimate the true prevention effect.
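As a worked illustration of Expression (7), the sketch below uses the subgroup sizes and APPDM values reported in the subgroup table of this paper. Using the reference-group counts as stand-ins for the target-group counts m_i and k_i is an assumption made here for illustration; the function name is hypothetical.

```python
# One tuple per subgroup:
# (m_i positives, APPDM positive, k_i not-positives, APPDM not-positive)
# Counts are reference-group sizes, ASSUMED to stand in for the target group.
subgroups = [
    (21, 0.356, 257, 0.164),   # FPG <= 106,    HbA1c <= 5.3
    (69, 0.405, 79, 0.210),    # FPG <= 106,    HbA1c 5.4 - 6.4
    (63, 0.360, 114, 0.218),   # FPG 107 - 125, HbA1c <= 5.3
    (151, 0.456, 39, 0.249),   # FPG 107 - 125, HbA1c 5.4 - 6.4
]

def pre_assessed_effect(groups):
    """Expression (7): weighted mean risk of positives minus that of not-positives."""
    total_m = sum(m for m, _, _, _ in groups)
    total_k = sum(k for _, _, k, _ in groups)
    mean_positive = sum(m * a for m, a, _, _ in groups) / total_m
    mean_not_positive = sum(k * a for _, _, k, a in groups) / total_k
    return mean_positive - mean_not_positive

effect = pre_assessed_effect(subgroups)
print(f"Pre-assessed expected reduction in 4-year DM risk: {effect:.3f}")
```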
The characteristics for baseline participants of the SHS have been reported previously [
CBR | 0.406 (=CIDM/(1-CIDM)) | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
---|---|---|---|---|---|---|
p* | 0.2945 | 0.148 | 0.295 | 0.302 | 0.404 | 0.468 |
%^{a} | 38.30% | 83.70% | 38.30% | 36.80% | 16.50% | 9.30% |
^{a}The respective expected percentage of American Indians in the target group who will be identified as “at high risk of developing DM” or “positive” by using the p* in the screening exam, and hence will be included for DM intervention. CIDM, estimated cumulative incidence rate of DM in 4 years in the target group (CIDM = 0.2888, based on the data from the reference group, and hence CIDM/(1-CIDM) = 0.4060); DM is defined as FPG ≥ 126 mg/dl or HbA1c ≥ 6.5%.
corresponding cutoff probability p* = 0.2945. If this p* is used for identification, then, based on the risk factors measured at the screening exam, those AI in the target group whose predicted probability (from Equation (1)) is ≥ p* (= 0.2945) would be classified as “at high risk of developing DM” or “positive” and be selected for intervention.
Based on the data from the reference group and Equation (5),
According to the methods explained in the Methods section, we divide all individuals in the reference group into four subgroups (FPG ≤ 106 mg/dL and HbA1c ≤ 5.3%, FPG ≤ 106 mg/dL and HbA1c 5.4% - 6.4%, FPG 107 - 125 mg/dL and HbA1c ≤ 5.3%, FPG 107 - 125 mg/dL and HbA1c 5.4% - 6.4%) based on the 50th percentiles of FPG (106 mg/dl) and HbA1c (5.3%).
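The four-way split above can be expressed as a small lookup. This is a sketch with hypothetical names; it assumes that values falling between the printed bands (for example, HbA1c of 5.35%) are assigned by comparing against the median cut points of 106 mg/dl and 5.3%.

```python
def assign_subgroup(fpg, hba1c):
    """Return the (FPG band, HbA1c band) subgroup for a DM-free individual.

    Cut points are the 50th percentiles given in the text:
    FPG 106 mg/dl and HbA1c 5.3%.
    """
    if fpg >= 126 or hba1c >= 6.5:
        raise ValueError("Individual meets the DM definition; not in the target group.")
    fpg_band = "FPG <= 106" if fpg <= 106 else "FPG 107-125"
    hba1c_band = "HbA1c <= 5.3" if hba1c <= 5.3 else "HbA1c 5.4-6.4"
    return fpg_band, hba1c_band

print(assign_subgroup(fpg=101, hba1c=5.6))
print(assign_subgroup(fpg=118, hba1c=5.1))
```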
Implementing a disease prevention intervention for all individuals in a target group is usually not economically affordable, or may result in unnecessary intervention for large percent of individuals with low risk [
FPG (mg/dl) | HbA1c (%) | Risk Factor | Not-Positive LSM | 95% CI Lower | 95% CI Upper | P^{a} |
---|---|---|---|---|---|---|
≤106 | ≤5.3 | FPG (mg/dl) | 97 | 96 | 98 | 0.0595 |
 | | HbA1c (%) | 4.9 | 4.8 | 4.9 | 0.0164 |
Not-Positive | Positive | UACR (mg/g) | 6 | 5 | 7 | 0.0001 |
n = 257 | n = 21 | TG (mg/dl) | 113 | 107 | 120 | 0.0019 |
APPDM = 0.164 | APPDM = 0.356 | TG ≥ 150 mg/dl | 21.7% | 16.81% | 27.48% | 0.0002^{c} |
APPDM-All = 0.178 | | SBP/DBP ≥ 130/85 mmHg or on medication for HTN | 59.7% | 53.16% | 65.98% | 0.0176^{c} |
 | | DBP^{b} (mmHg) | 77 | 76 | 78 | 0.0037 |
 | | SBP^{b} (mmHg) | 123 | 121 | 124 | 0.0033 |
 | | WAIST (cm) | 112 | 111 | 113 | <0.0001 |
≤106 | 5.4 - 6.4 | FPG (mg/dl) | 97 | 96 | 98 | 0.0005 |
 | | HbA1c (%) | 5.6 | 5.6 | 5.7 | <0.0001 |
Not-Positive | Positive | UACR (mg/g) | 7 | 5 | 10 | 0.0607 |
n = 79 | n = 69 | TG (mg/dl) | 117 | 105 | 129 | 0.1262 |
APPDM = 0.210 | APPDM = 0.405 | TG ≥ 150 mg/dl | 13.3% | 6.57% | 25.02% | 0.0076 |
APPDM-All = 0.301 | | SBP/DBP ≥ 130/85 mmHg or on medication for HTN | 46.9% | 30.94% | 63.44% | 0.0054 |
 | | DBP (mmHg) | 74 | 72 | 76 | 0.0314 |
 | | SBP (mmHg) | 122 | 119 | 125 | 0.0717 |
 | | WAIST (cm) | 112 | 110 | 115 | 0.0003 |
107 - 125 | ≤5.3 | FPG (mg/dl) | 112 | 111 | 113 | <0.0001 |
 | | HbA1c (%) | 4.9 | 4.9 | 5.0 | 0.3036 |
Not-Positive | Positive | UACR (mg/g) | 6 | 5 | 8 | 0.0255 |
n = 114 | n = 63 | TG (mg/dl) | 115 | 106 | 125 | 0.0002 |
APPDM = 0.218 | APPDM = 0.360 | TG ≥ 150 mg/dl | 7.7% | 3.79% | 14.98% | <0.0001 |
APPDM-All = 0.268 | | SBP/DBP ≥ 130/85 mmHg or on medication for HTN | 35.8% | 25.22% | 48.04% | <0.0001 |
 | | DBP (mmHg) | 75 | 74 | 77 | 0.0549 |
 | | SBP (mmHg) | 120 | 118 | 123 | 0.0011 |
 | | WAIST (cm) | 111 | 110 | 113 | <0.0001 |
107 - 125 | 5.4 - 6.4 | FPG (mg/dl) | 111 | 109 | 112 | <0.0001 |
 | | HbA1c (%) | 5.6 | 5.5 | 5.6 | <0.0001 |
Not-Positive | Positive | UACR (mg/g) | 3 | 2 | 6 | 0.0019 |
n = 39 | n = 151 | TG (mg/dl) | 108 | 93 | 125 | 0.0940 |
APPDM = 0.249 | APPDM = 0.456 | TG ≥ 150 mg/dl | 4.2% | 1.22% | 13.22% | 0.0002 |
APPDM-All = 0.413 | | SBP/DBP ≥ 130/85 mmHg or on medication for HTN | 32.2% | 17.27% | 51.96% | 0.0017 |
 | | DBP (mmHg) | 75 | 72 | 77 | 0.6384 |
 | | SBP (mmHg) | 125 | 121 | 129 | 0.8488 |
 | | WAIST (cm) | 111 | 108 | 113 | <0.0001 |
^{a}p-value from testing the difference of least-square means between positive and not-positive AI in a subgroup. ^{b}The results for DBP and SBP are based on data from those without hypertension medications treatments. ^{c}p-value from testing the difference of least-square rates of the metabolic syndrome trait between positive and not-positive AI in a subgroup. AI, American Indians; CI, confidence interval; DBP, diastolic blood pressure; n, the sample size; APPDM, estimated average predicted probability of developing DM in four years; FPG, fasting plasma glucose; HbA1c, hemoglobin A1c; HTN, hypertension; LSM, least-square mean; SBP, systolic blood pressure; TG, triglycerides; UACR, urinary albumin and creatinine ratio; WAIST, waist circumference.
among those AIs who participated the SHS, the proportions of those potential participants for DM interventions considered in the literature such as pre-DM or obese [
Recent clinical trials demonstrated that lifestyle/pharmaceutical interventions may prevent development of DM [
a) Addressed complex associations of a disease with its combined and correlated major risk factors, and used all available valuable results and costly collected data in the design.
b) It is reasonable to expect that individuals in the same subgroup have approximately similar health conditions. The proposed goal levels, based on the levels of risk factors from those not-positives in the same subgroup, accommodate subgroup differences and the combined and correlated effects of the DM risk factors. Therefore, these proposed goal levels might be more appropriate, attainable and safe than the usual uniform goal levels set for all participants in an intervention. Moreover, if a participant's levels of some risk factors already satisfy the respective subgroup goal levels, no intervention for those risk factors will be conducted, which saves cost.
c) The derived information and goal levels (
d)
e)
f) Easy prediction and assessments for the intervention as explained in Methods section.
g) Learnable. Data collected from the intervention might be added to the already collected data, and the expanded data then might be used to improve/update the disease prediction model and the subgroup goal levels for the future intervention.
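The cost-saving rule in item b) can be sketched as a comparison of a participant's measurements against the subgroup goal levels. The goal values below are the upper 95% CI bounds for the not-positives in the subgroup with FPG ≤ 106 mg/dl and HbA1c ≤ 5.3% from the subgroup table; the participant profile and the function name are hypothetical, and only continuous risk factors are handled.

```python
# Upper 95% CI bounds (goal levels) for the subgroup FPG <= 106, HbA1c <= 5.3.
# DBP/SBP goals are based on those without HTN medication (table footnote b).
GOALS_SUBGROUP_1 = {
    "FPG": 98,      # mg/dl
    "HbA1c": 4.9,   # %
    "UACR": 7,      # mg/g
    "TG": 120,      # mg/dl
    "DBP": 78,      # mmHg
    "SBP": 124,     # mmHg
    "WAIST": 113,   # cm
}

def factors_needing_intervention(measured, goals):
    """Risk factors whose measured level exceeds the subgroup goal level."""
    return sorted(name for name, goal in goals.items() if measured[name] > goal)

# Hypothetical screening-exam measurements for one participant.
participant = {"FPG": 104, "HbA1c": 4.8, "UACR": 5, "TG": 165,
               "DBP": 76, "SBP": 131, "WAIST": 118}

targets = factors_needing_intervention(participant, GOALS_SUBGROUP_1)
print("Intervene on:", targets)
```

Risk factors already at or below their goal level are left out of the individual plan, which is where the cost saving arises.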
We proposed and demonstrated how to use and translate the available research results from the SHS in the cost-effective design of a DM prevention program for the target group, and assessed/predicted the effectiveness of our proposed strategy. The strategy and methods shown in the illustrative example for DM prevention can be analogously adopted and applied to the prevention of other diseases. To our knowledge, the proposed cost-effective design strategy is new, representing a novel framework for the use and translation of large collected data to inform practice. However, such design strategies need to be tested and validated in real disease prevention studies. The proposed strategy depends on a disease prediction model and risk factor data from the same population as the target group, or a similar one. If the needed information is not available from the same population, one may use available information from another population that closely resembles the population under study. The cutoff probability p* from Equation (4) depends on the assumed/estimated CBR. The estimation of CBR depends on the intervention programs and the definitions of costs and benefits [
A limitation specific not to the approach but to diabetes is that the two risk factors that are more cost-effective are not on the causal path to the development of type 2 diabetes. Elevated triglycerides and blood pressure levels are a result of insulin resistance, which is the determinant that leads to elevated glucose levels and eventual pancreatic fatigue. It is not feasible to measure insulin resistance in a clinical setting, however. Thus, correcting the elevated triglycerides and blood pressure may not improve insulin resistance. This limitation is specific to diabetes; in most other chronic diseases, such as cardiovascular disease, the measurable risk factors are in the causative pathway (e.g., elevated LDL-C). Thus, the strategy presented here may be even more cost-effective in those cases.
The proposed strategy considers the complex associations of a disease with its combined and correlated risk factors and individual differences; provides ways to cost-effectively identify individuals for intervention and to simultaneously set gradual, attainable and safe goal levels for all risk factors in different subgroups; and forms an adaptive intervention framework. The proposed design strategy represents a way to use or translate available valuable results and costly collected data from large cohort studies for clinical disease prevention practice, and can be applied to group/community disease prevention interventions.
This study was supported by grants UO1 HL-41642, UO1 HL-41652, UO1 HL-41654, R01HL109284, R01HL109315, R01HL109319, R01HL109301, and R01HL109282 from the National Heart, Lung and Blood Institute. The SHS was approved by all Institutional Review Boards from the related universities, institutes, centers and the tribes. The authors wish to express their appreciation to all participating AI tribes/communities, the Indian Health Service, and the participants for their support and assistance. The authors also thank the SHS field center coordinators and the SHS staff for conducting exams and collecting the data.
The authors declare no conflicts of interest regarding the publication of this paper.
Wang, W.Y., Lee, E.T., Howard, B.V., Devereux, R., Zhang, Y. and Stoner, J.A. (2018) Large Cohort Data Based Cost-Effective Disease Prevention Design Strategy: Strong Heart Study. World Journal of Cardiovascular Diseases, 8, 588-601. https://doi.org/10.4236/wjcd.2018.812058