Prevalent cohort studies involve screening a sample of individuals from a population for disease, recruiting affected individuals, and prospectively following the cohort to record the occurrence of disease-related complications or death. This design features a response-biased sampling scheme: individuals who live a long time with the disease are preferentially sampled, so a naive analysis of the time from disease onset to death will overestimate survival probabilities. Unconditional and conditional analyses of the resulting data can yield consistent estimates of the survival distribution, subject to the validity of their respective model assumptions. The time of disease onset is retrospectively reported by sampled individuals; however, this report is often subject to measurement error. In this article we present a framework for studying the effect of measurement error in disease onset times in prevalent cohort studies, report on empirical studies of the effect in each framework of analysis, and describe likelihood-based methods to address such measurement error.
Both the conditional and unconditional analyses make use of the reported onset time, and the latter requires the additional assumption of a stationary disease incidence process. For individuals determined to have the disease at the time of assessment, the disease may have begun several years earlier, making accurate recall of the onset time difficult. There may therefore be considerable uncertainty about the reported onset time. The difference between the true onset time and the reported onset time represents recall, reporting, or measurement error; we will henceforth use the term measurement error.
Both the conditional and unconditional approaches to the analysis of prevalent cohort data will in general lead to biased estimators in the presence of measurement error. We therefore investigate the impact of this measurement error in both the conditional and unconditional frameworks for parametric and nonparametric settings.
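The sampling scheme and the error mechanism described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the simulation design used in this article: onset times are uniform over a long window before screening (a stand-in for a stationary incidence process), survival times are Weibull with illustrative shape and scale, the error in the reported onset time is additive and normal with standard deviation `sigma`, and there is no censoring. The function name and all default values are hypothetical.

```python
import random


def simulate_prevalent_cohort(n, shape=2.0, scale=2.0, sigma=1.0,
                              window=50.0, seed=1):
    """Draw n prevalent cases screened at calendar time 0.

    Assumptions (illustrative only): uniform onset times on (-window, 0),
    Weibull survival times, additive normal error in the reported onset.
    Returns a list of (true_onset, reported_onset, survival_time) tuples.
    """
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        onset = -rng.uniform(0.0, window)        # true calendar time of onset
        surv = rng.weibullvariate(scale, shape)  # time from onset to death
        if onset + surv > 0.0:                   # still alive at screening
            # Classical additive error: only the reported time is observed.
            reported = onset + rng.gauss(0.0, sigma)
            sample.append((onset, reported, surv))
    return sample


cohort = simulate_prevalent_cohort(5000)
mean_sampled = sum(s for _, _, s in cohort) / len(cohort)
print(f"mean survival time among sampled cases: {mean_sampled:.3f}")
```

With these assumed values the population mean survival time is 2Γ(1.5) ≈ 1.77, whereas the mean under length-biased sampling is E[T²]/E[T] ≈ 2.26, so the sampled mean should sit well above the population mean. Note that in this sketch a reported onset time can fall after the screening time; how such reports are handled is a modeling choice not addressed here.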
In retrospective studies, selected patients need to recall their disease onset times. The recalled times are very likely to differ from the exact disease onset times, even if they are quite close. Consider disease incidence over
where
The data obtained in this case are
where
If we ignore the measurement error and treat
From
Empirical properties of estimators in the presence of measurement error in disease onset time using the naive likelihood (NAIVE), conditional likelihood (COND) and unconditional likelihood (UNCOND); n = 500, nsim = 1000
Method | EBIAS | ESE | ASE | ECP | EBIAS | ESE | ASE | ECP
---|---|---|---|---|---|---|---|---
NAIVE | −0.434 | 0.025 | 0.024 | 0.000 | 0.095 | 0.042 | 0.042 | 0.381
COND | 0.033 | 0.053 | 0.055 | 0.937 | −0.250 | 0.065 | 0.064 | 0.024
UNCOND | −0.090 | 0.037 | 0.036 | 0.295 | −0.145 | 0.050 | 0.049 | 0.177
NAIVE | −0.316 | 0.022 | 0.022 | 0.000 | 0.174 | 0.041 | 0.041 | 0.013
COND | 0.003 | 0.040 | 0.040 | 0.958 | −0.085 | 0.060 | 0.059 | 0.702
UNCOND | −0.026 | 0.031 | 0.031 | 0.843 | −0.047 | 0.049 | 0.048 | 0.833
NAIVE | −0.267 | 0.023 | 0.022 | 0.000 | 0.214 | 0.041 | 0.041 | 0.000
COND | 0.001 | 0.036 | 0.034 | 0.943 | 0.001 | 0.054 | 0.055 | 0.958
UNCOND | 0.001 | 0.030 | 0.029 | 0.946 | 0.001 | 0.048 | 0.047 | 0.950
coverage probability is far from the nominal value. Further, as the variance of the measurement error decreases, the biases of the estimators shrink substantially and the empirical coverage probabilities move closer to the nominal level. This makes sense: the smaller the variance of the measurement error, the closer the reported onset time is to the true onset time, which reduces the impact of using the reported onset time in place of the true one.
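For readers less familiar with the summary measures, the sketch below shows how the four quantities could be computed from the output of a simulation study. It assumes the conventional reading of the abbreviations (EBIAS: mean estimate minus true value; ESE: standard deviation of the estimates across replicates; ASE: average of the model-based standard errors; ECP: proportion of Wald intervals covering the truth) and a nominal 95% level. The function name `summarize` and the multiplier 1.96 are assumptions, not taken from this article.

```python
import statistics


def summarize(estimates, std_errors, truth, z=1.96):
    """Summarize simulation replicates for one parameter.

    estimates:  point estimates, one per simulated data set
    std_errors: model-based standard errors, one per data set
    truth:      true parameter value used to generate the data
    z:          normal quantile for the Wald interval (assumed 95% level)
    """
    ebias = statistics.mean(estimates) - truth
    ese = statistics.stdev(estimates)   # empirical standard error
    ase = statistics.mean(std_errors)   # average model-based standard error
    covered = [abs(est - truth) <= z * se
               for est, se in zip(estimates, std_errors)]
    ecp = sum(covered) / len(covered)   # empirical coverage probability
    return {"EBIAS": ebias, "ESE": ese, "ASE": ase, "ECP": ecp}
```

Agreement between ESE and ASE indicates that the model-based standard errors track the sampling variability, so a poor ECP in that case reflects bias rather than misestimated variance.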
To illustrate the importance of correcting for measurement error in disease onset time for prevalent cohort samples, we plot the true survivor function against the survivor functions estimated from the naive, conditional and unconditional likelihoods without correcting for measurement error; both parametric and nonparametric models are considered.
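As a point of reference for the nonparametric conditional analysis, the sketch below implements the usual product-limit estimator with delayed entry, in which an individual joins the risk set only after the time from onset to recruitment. We assume this is the kind of estimator meant by the nonparametric conditional approach; the function name and argument layout are hypothetical. Setting every entry time to zero gives the ordinary Kaplan-Meier estimator, which corresponds to the naive analysis.

```python
def product_limit(entry, time, event):
    """Product-limit estimate of S(t) under left truncation.

    entry[i]: time from (reported) onset to recruitment
    time[i]:  observed time from onset to death or censoring (> entry[i])
    event[i]: 1 if death was observed, 0 if censored
    Returns a list of (t, S(t)) pairs at the distinct death times.
    """
    death_times = sorted({x for x, d in zip(time, event) if d})
    surv, curve = 1.0, []
    for t in death_times:
        # At risk at t: already recruited (entry < t) and still under
        # observation (time >= t).
        at_risk = sum(1 for a, x in zip(entry, time) if a < t <= x)
        deaths = sum(1 for x, d in zip(time, event) if d and x == t)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Three subjects; the second is censored at time 2.
print(product_limit([0.0, 1.5, 0.5], [1.0, 2.0, 3.0], [1, 0, 1]))
```

When the entry times are computed from misreported onset times, both the risk sets and the time origin are shifted, which is one way to see why the conditional estimator is affected by measurement error.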
A “correct” likelihood approach can be used to account for the measurement error in the onset time and will
Empirical properties of nonparametric and parametric survivor estimators at certain time points based on naive (NAIVE), conditional (COND) and unconditional (UNCOND) likelihoods; n = 500, nsim = 1000
t | True | Nonparametric NAIVE EST | Nonparametric NAIVE ESE | Nonparametric COND EST | Nonparametric COND ESE | Nonparametric UNCOND EST | Nonparametric UNCOND ESE | Parametric NAIVE EST | Parametric NAIVE ESE | Parametric COND EST | Parametric COND ESE | Parametric UNCOND EST | Parametric UNCOND ESE
---|---|---|---|---|---|---|---|---|---|---|---|---|---
2.537 | 0.2 | 0.516 | 0.024 | 0.216 | 0.026 | 0.273 | 0.024 | 0.522 | 0.019 | 0.219 | 0.021 | 0.276 | 0.019
2.195 | 0.3 | 0.617 | 0.022 | 0.291 | 0.032 | 0.360 | 0.030 | 0.623 | 0.018 | 0.298 | 0.026 | 0.367 | 0.023
1.914 | 0.4 | 0.699 | 0.021 | 0.362 | 0.039 | 0.440 | 0.035 | 0.705 | 0.017 | 0.376 | 0.031 | 0.453 | 0.026
1.665 | 0.5 | 0.770 | 0.019 | 0.435 | 0.045 | 0.518 | 0.039 | 0.773 | 0.015 | 0.455 | 0.034 | 0.537 | 0.028
1.429 | 0.6 | 0.831 | 0.017 | 0.510 | 0.051 | 0.595 | 0.043 | 0.832 | 0.013 | 0.537 | 0.036 | 0.620 | 0.028
1.194 | 0.7 | 0.886 | 0.014 | 0.592 | 0.056 | 0.674 | 0.046 | 0.883 | 0.011 | 0.625 | 0.037 | 0.704 | 0.027
0.945 | 0.8 | 0.933 | 0.011 | 0.683 | 0.062 | 0.757 | 0.049 | 0.928 | 0.008 | 0.721 | 0.035 | 0.791 | 0.024
2.537 | 0.2 | 0.428 | 0.024 | 0.210 | 0.024 | 0.223 | 0.022 | 0.436 | 0.019 | 0.212 | 0.019 | 0.224 | 0.015
2.195 | 0.3 | 0.546 | 0.023 | 0.302 | 0.029 | 0.319 | 0.027 | 0.555 | 0.018 | 0.305 | 0.024 | 0.322 | 0.020
1.914 | 0.4 | 0.645 | 0.021 | 0.389 | 0.034 | 0.411 | 0.031 | 0.654 | 0.017 | 0.397 | 0.028 | 0.417 | 0.023
1.665 | 0.5 | 0.733 | 0.019 | 0.477 | 0.040 | 0.503 | 0.036 | 0.737 | 0.016 | 0.489 | 0.031 | 0.512 | 0.026
1.429 | 0.6 | 0.809 | 0.017 | 0.565 | 0.046 | 0.594 | 0.042 | 0.809 | 0.014 | 0.582 | 0.032 | 0.606 | 0.027
1.194 | 0.7 | 0.875 | 0.015 | 0.654 | 0.051 | 0.685 | 0.046 | 0.871 | 0.011 | 0.677 | 0.032 | 0.700 | 0.026
0.945 | 0.8 | 0.930 | 0.011 | 0.745 | 0.055 | 0.776 | 0.049 | 0.924 | 0.008 | 0.776 | 0.028 | 0.796 | 0.022
Empirical properties of nonparametric and parametric percentile estimators based on naive (NAIVE), conditional (COND) and unconditional (UNCOND) likelihoods; n = 500, nsim = 1000
True | Nonparametric NAIVE EST | Nonparametric NAIVE ESE | Nonparametric COND EST | Nonparametric COND ESE | Nonparametric UNCOND EST | Nonparametric UNCOND ESE | Parametric NAIVE EST | Parametric NAIVE ESE | Parametric COND EST | Parametric COND ESE | Parametric UNCOND EST | Parametric UNCOND ESE
---|---|---|---|---|---|---|---|---|---|---|---|---
0.453 | 0.823 | 0.068 | 0.175 | 0.143 | 0.240 | 0.154 | 0.800 | 0.048 | 0.291 | 0.049 | 0.395 | 0.046
0.649 | 1.118 | 0.064 | 0.317 | 0.174 | 0.433 | 0.173 | 1.110 | 0.054 | 0.460 | 0.064 | 0.598 | 0.058
1.073 | 1.732 | 0.069 | 0.735 | 0.195 | 0.948 | 0.158 | 1.752 | 0.058 | 0.873 | 0.086 | 1.067 | 0.073
1.665 | 2.590 | 0.084 | 1.445 | 0.167 | 1.710 | 0.133 | 2.613 | 0.066 | 1.532 | 0.101 | 1.772 | 0.081
2.355 | 3.581 | 0.117 | 2.360 | 0.141 | 2.632 | 0.126 | 3.582 | 0.093 | 2.390 | 0.103 | 2.646 | 0.079
3.035 | 4.519 | 0.182 | 3.296 | 0.140 | 3.538 | 0.120 | 4.513 | 0.136 | 3.311 | 0.112 | 3.548 | 0.082
3.462 | 5.056 | 0.247 | 3.877 | 0.156 | 4.087 | 0.132 | 5.087 | 0.169 | 3.921 | 0.132 | 4.132 | 0.094
0.453 | 0.824 | 0.064 | 0.248 | 0.157 | 0.303 | 0.159 | 0.788 | 0.043 | 0.398 | 0.051 | 0.434 | 0.044
0.649 | 1.086 | 0.057 | 0.433 | 0.183 | 0.518 | 0.162 | 1.066 | 0.046 | 0.588 | 0.062 | 0.632 | 0.053
1.073 | 1.609 | 0.056 | 0.914 | 0.162 | 1.006 | 0.130 | 1.625 | 0.049 | 1.014 | 0.075 | 1.069 | 0.063
1.665 | 2.321 | 0.066 | 1.591 | 0.136 | 1.664 | 0.125 | 2.352 | 0.053 | 1.635 | 0.080 | 1.695 | 0.066
2.355 | 3.139 | 0.093 | 2.370 | 0.113 | 2.424 | 0.116 | 3.148 | 0.071 | 2.385 | 0.079 | 2.437 | 0.062
3.035 | 3.921 | 0.146 | 3.117 | 0.117 | 3.158 | 0.137 | 3.898 | 0.103 | 3.144 | 0.087 | 3.180 | 0.064
3.462 | 4.371 | 0.202 | 3.582 | 0.137 | 3.615 | 0.137 | 4.355 | 0.127 | 3.630 | 0.103 | 3.651 | 0.074
Nonparametric and parametric estimates of the survivor function based on the naive, conditional and unconditional likelihoods when measurement error in the disease onset time is present but ignored; n = 5000. (a) σ = 1; (b) σ = 0.5
yield unbiased estimators of the parameters of interest if the component model assumptions are correctly specified. Such a likelihood should be based on the reported onset time and the (possibly censored) survival time, which will require explicit modeling of the measurement error process. Let
The “correct” conditional likelihood for right-censored left-truncated data
Similarly, the joint density of the observed onset time and calendar time of death is
where the last equality is derived by (10).
The “correct” unconditional likelihood can then be constructed as follows,
where
Since
The maximum likelihood estimators
where
To examine the performance of “correct” likelihoods in the presence of measurement error in disease onset time, we use the same strategy to generate length-biased survival data with measurement error in disease onset times as in Section 3.2. The “correct” likelihood is considered here in two scenarios: the variance of the measurement error
Statistical models and methods for the analysis of prevalent cohort data have been reviewed here from both the conditional and unconditional frameworks. It is well known that naive analyses which ignore the selection bias lead to overestimation of the survivor probabilities. The conditional likelihood, based on the density for left-truncated event times, can be used to correct for this selection bias. The unconditional likelihood approach is based on the joint density of the backward and forward recurrence times and yields more efficient estimators by incorporating the information contained in the onset times. The typical assumption required to formulate the associated model is that of a stationary disease incidence process. Since both approaches make use of the onset time information to correct for selection effects, errors in the retrospectively reported disease onset time can have serious implications for estimation. We investigate the impact of measurement error in disease onset time for prevalent cohort samples and propose “correct” conditional and unconditional likelihoods to account for the measurement error.
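To make the contrast between the three analyses concrete, the sketch below evaluates the naive, conditional and unconditional log-likelihoods for a Weibull model when measurement error is ignored. It uses the standard forms for right-censored data: the conditional version divides each contribution by the survivor function at the truncation time, and the unconditional version divides by the mean survival time, which is the form that arises under a stationary incidence process. The parameterization S(t) = exp{−(t/scale)^shape} and all function names are assumptions made for illustration; they are not necessarily the notation of this article.

```python
import math


def weibull_log_f(t, shape, scale):
    """Log density of a Weibull(shape, scale) distribution at t > 0."""
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def weibull_log_S(t, shape, scale):
    """Log survivor function, log S(t) = -(t / scale) ** shape."""
    return -((t / scale) ** shape)


def log_likelihoods(entry, time, event, shape, scale):
    """Naive, conditional and unconditional Weibull log-likelihoods.

    entry[i]: time from (reported) onset to recruitment
    time[i]:  observed time from onset to death or censoring
    event[i]: 1 if death was observed, 0 if censored
    """
    naive = sum(weibull_log_f(x, shape, scale) if d
                else weibull_log_S(x, shape, scale)
                for x, d in zip(time, event))
    # Conditional: condition on survival up to the truncation time.
    cond = naive - sum(weibull_log_S(a, shape, scale) for a in entry)
    # Unconditional: normalize by the mean survival time (stationarity).
    mean = scale * math.gamma(1.0 + 1.0 / shape)
    uncond = naive - len(time) * math.log(mean)
    return {"naive": naive, "cond": cond, "uncond": uncond}
```

In the conditional form the entry times appear explicitly, while in the unconditional form they enter through the observed times measured from the reported onset, so misreported onset times affect both.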
The methods we proposed to correct for measurement error in this paper are based on the parametric model. It
Comparison of the true survivor function with estimated survivor functions based on conditional likelihood and “correct” conditional likelihood approach; σ = 1, n = 500, nsim = 1000
Empirical properties of estimators based on the naive conditional likelihood (COND.NA), the corrected conditional likelihood (COND.C), the naive unconditional likelihood (UNCOND.NA), and the corrected unconditional likelihood (UNCOND.C); n = 500, nsim = 1000
Method | EBIAS | ESE | ASE | ECP | EBIAS | ESE | ASE | ECP
---|---|---|---|---|---|---|---|---
COND.NA | 0.0329 | 0.0534 | 0.0551 | 0.937 | −0.2496 | 0.0655 | 0.0645 | 0.024
COND.C1 | −0.0024 | 0.0498 | 0.0516 | 0.957 | 0.0316 | 0.1682 | 0.1663 | 0.931
COND.C2 | 0.0011 | 0.0489 | 0.0507 | 0.968 | 0.0059 | 0.1140 | 0.1150 | 0.958
UNCOND.NA | −0.0903 | 0.0368 | 0.0356 | 0.295 | −0.1451 | 0.0503 | 0.0493 | 0.177
UNCOND.C1 | −0.0006 | 0.0464 | 0.0471 | 0.958 | 0.0188 | 0.1246 | 0.1214 | 0.955
UNCOND.C2 | 0.0005 | 0.0463 | 0.0471 | 0.962 | 0.0103 | 0.0984 | 0.0986 | 0.961
COND.NA | 0.0028 | 0.0389 | 0.0399 | 0.970 | −0.0877 | 0.0595 | 0.0591 | 0.703
COND.C1 | −0.0037 | 0.0383 | 0.0396 | 0.960 | 0.0389 | 0.1282 | 0.1154 | 0.947
COND.C2 | 0.0007 | 0.0381 | 0.0398 | 0.969 | 0.0011 | 0.0701 | 0.0720 | 0.960
UNCOND.NA | −0.0248 | 0.0309 | 0.0312 | 0.867 | −0.0485 | 0.0483 | 0.0483 | 0.826
UNCOND.C1 | 0.0021 | 0.0344 | 0.0361 | 0.968 | 0.0086 | 0.0703 | 0.0714 | 0.971
UNCOND.C2 | 0.0011 | 0.0334 | 0.0350 | 0.959 | 0.0019 | 0.0600 | 0.0618 | 0.964
1Denotes case of unknown
is of interest to investigate the limiting values of standard nonparametric estimators under both the conditional and unconditional frameworks. The modest increase in the standard error of the Weibull shape and scale parameters when
We focused on the classical error model in this study, but other measurement error models are also of interest; often individuals will report later onset times since their views on disease onset may be more closely tied to the onset of symptoms than the actual disease. Methods to correct for this kind of measurement error are also important and are under development.