**American Journal of Computational Mathematics**

Vol.05 No.04(2015), Article ID:61501,9 pages

10.4236/ajcm.2015.54035

Estimation and Forecasting Survival of Diabetic CABG Patients (Kalman Filter Smoothing Approach)

M. Saleem^{1}, K. H. Khan^{2}, Nusrat Yasmin^{1}

^{1}Centre for Advanced Studies in Pure and Applied Mathematics, Bahauddin Zakariya University, Multan, Pakistan

^{2}Department of Mathematics, College of Science and Humanities, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia

Copyright © 2015 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

Received 9 September 2015; accepted 23 November 2015; published 26 November 2015

ABSTRACT

In this paper, we present a new approach (Kalman Filter Smoothing) to estimate and forecast the survival of Diabetic and Non-Diabetic Coronary Artery Bypass Graft Surgery (CABG) patients. Survival proportions of the patients are obtained from a parametric lifetime model (Weibull distribution with the Kalman Filter approach). Moreover, an approach that constructs a complete population (CP) from the incomplete population (IP) of patients with 12 years of observation/follow-up is used for their survival analysis [1]. The survival proportions of the CP obtained from the Kaplan Meier method are used as observed values at time t (input) for the Kalman Filter Smoothing process to update the time-varying parameters. In the case of the CP, the term representing censored observations may be dropped from the likelihood function of the distribution. The maximum likelihood method, in conjunction with the Davidon-Fletcher-Powell (DFP) optimization method [2] and the cubic interpolation method, is used in the estimation of the survivors' proportions. The estimated and forecasted survival proportions of the CP of the Diabetic and Non-Diabetic CABG patients from the Kalman Filter Smoothing approach are presented in terms of statistics, survival curves, discussion and conclusion.

**Keywords:**

CABG Patients, Complete and Incomplete Populations, Weibull Distribution, Kalman Filter, Maximum Likelihood Method, DFP Method, Estimation and Forecasting of Survivors' Proportions

1. Introduction

Coronary Artery Disease (CAD) is a chronic disease that progresses with age at different rates. CAD results from the build-up of fats on the inner walls of the coronary arteries. The coronary arteries thus become narrow and, as a result, the blood flow to the heart muscles is reduced or blocked. The heart muscles therefore do not receive the required oxygenated blood, which leads to a heart attack. CAD is a leading cause of death worldwide (see Hansson [3], John [4], Sun and Hong [5], and William, Stephen, Thomas and Robert [6]). The medical scientists Goldstein [7], Jennifer [8] and William [6] are of the opinion that CABG is an effective treatment option for CAD patients. Medical research organizations such as the Heart and Stroke Foundation Canada [9], the American Heart Association [10] and the Virtual Health Care Team Columbia have classified the risk factors of CABG patients as modifiable (Hypertension, Diabetes, Smoking, High Cholesterol, Sedentary Lifestyle and Obesity) and non-modifiable (Age, Gender and Family History-Genetic Predisposition).

William, Ellis, Josef, Ralph and Robert [6] carried out a survival study on an incomplete population (progressive censoring of type 1) of 2011 CABG patients using the Kaplan Meier method [11]. The patients were grouped with respect to Male, Female, Age, Hypertension, Diabetes, Ejection Fraction, Vessels, Congestive Heart Failure, and Elective and Emergency Surgery. The patients underwent a first re-operation at Emory University hospitals from 1975 to 1993 (see William [12]). The patients were observed/followed up for 12 years. In the articles [13] [14] we proposed a procedure to make an IP a CP.

The Weibull distribution model has been used for survival analysis by Abernathy [15], Bunday [16], Cohen [17], Crow [18], Gross and Clark [19], Klein & Moeschberger [20], Lang [21], Lawless [22], and Paul [23]. In particular, the survival study of chronic diseases, such as AIDS and Cancer, has been carried out using Weibull distributions by Bain and Englehardt [24], Khan & Mahmud [23] [25], Klein & Moeschberger [20], Lawless [22] and Swaminathan and Brenner [26]. Lanju & William [27] applied the Weibull distribution to human survival data of patients with plasma cell and in response-adaptive randomization for survival trials. We [14] have carried out a survival analysis of CABG patients by parametric estimation (the classical approach) in the modifiable risk factors Hypertension and Diabetes.

The dynamic linear model (DLM) and Kalman Filter (KF) equations have been described by Harrison and Stevens [28]. According to Sorenson [2] and Greg [29], the Kalman Filter is a mathematical technique used to estimate the state of a process by minimizing the error of estimation. The Kalman Filter extracts signals from a series of incomplete and noisy measurements. It removes noise from the process parameters and retains useful information. The Kalman filter estimates the state of a dynamic linear model through its recurrence equations, which minimize the variance of the estimation error. To implement the Kalman filter, observed values are required as dependent variables for updating the process parameters. Although the Kalman Filter has, since its introduction, been a subject of research for engineering processes (see Frank [30]), the KF methodology has also been applied extensively in medical research, life-testing studies and survival analysis. For example, Meinhold and Singpurwalla [31] proposed a new method for inference and extrapolation in certain dose-response, damage-assessment, and accelerated-life-testing studies, using Kalman-filter smoothing. Anatoli, Kenneth and James [32] indicated that various multivariate stochastic process models have been developed to represent human physiological aging and mortality. These researchers considered the effects of observed and unobserved state variables on the age trajectory of physiological parameters. The parameters of the distribution used were estimated based on an extension of the theory of Kalman filters to include systematic mortality selection. Ludwig [33] considered models for discrete time panel and survival data, and used a generalized linear Kalman filter approach.

In our study, the Kalman filter technique is applied to estimate the parameters of the Weibull probability distribution using the data sets of Diabetic and Non-Diabetic CABG patients. For the construction of the KF equations, the survivor function of the probability distribution is linearized by the double-log transformation. The procedure to construct the linear form of the survivor function, as advocated by researchers (see Gross and Clark [19], Kalbfleisch and Prentice [34], Lawless [22], and Meinhold and Singpurwalla [35]), is followed. Survival proportions for the complete population of Diabetic CABG patients obtained from the Kaplan Meier method are used as observed values at time t for updating the time-varying parameters of the distribution. After defining the updating system of the parameters of a probability distribution with the KF approach (discussed in the methodology), the parameters are estimated at each time t by maximizing the likelihood function of the double-lognormal distribution through the Davidon-Fletcher-Powell method of optimization [36]. Since, in the KF approach, the observed values are from the complete population, the censored part is excluded (dropped) from the log-likelihood function. The survival proportions obtained from the probability distribution with the KF approach are presented for Diabetic and Non-Diabetic patients, i.e. the Diabetes Present and Diabetes Absent groups of CABG patients.

2. Methodology

For the estimation of survival proportions, Kaplan and Meier [11] proposed a method, later discussed by William [6] and Lawless [22], i.e.

$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right),$$

where $d_j$ and $n_j$ are the number of items failed (died individuals/patients) and the number of individuals at risk at time $t_j$ respectively, that is, the number of individuals survived and uncensored at time $t_j$. This method may be applied to both censored and uncensored data, see Lawless [22]. In the case of censored individuals (items) the analysis is performed on the IP. Khan, Saleem & Mahmud [1] proposed that the censored individuals may be taken into account. The inclusion of the split censored individuals, allocated proportionally to the known survived and died individuals respectively, makes the population complete. Thus the survival analysis may be performed on the CP constructed from its IP. We apply the Kaplan Meier method to the CP of the Diabetes Absent and Diabetes Present groups of CABG patients to obtain survival proportions, which are used as input in the DLM and KF equations/process. In this study the observed values (survival proportions) are denoted by $Y_t$, with realized value $y_t$ at time $t$. Harrison and Stevens [28] described the DLM, which may be reproduced as the following system of two equations:

Observation Equation:

$$Y_t = F_t\theta_t + v_t, \qquad v_t \sim N(0, V_t) \qquad (1)$$

System Equation:

$$\theta_t = G\theta_{t-1} + w_t, \qquad w_t \sim N(0, W_t) \qquad (2)$$

where $F_t$ and $\theta_t$ are of arbitrary dimensions. $Y_t$ is a scalar, $\theta_t$ is the vector of process parameters at time t, $F_t$ is the matrix of independent variables, known at time t, G is the known system matrix (identity matrix), and $v_t$ is the error term, the difference between the observed value $Y_t$ and the expected value $F_t\theta_t$ at time t. $V_t$ is the variance of $v_t$. It is assumed that $v_t$ has a Gaussian distribution with mean 0 and variance $V_t$. The system equation describes the change that occurs when the process parameter moves from its preceding value $\theta_{t-1}$ to its current value $\theta_t$, and $W_t$ is the variance of the disturbance term $w_t$. According to Harrison and Stevens (1976), it is assumed that the distribution of the parameter vector at time t = 0, i.e. prior to the first observation, is normal with mean $m_0$ and variance $C_0$, i.e. $\theta_0 \sim N(m_0, C_0)$. If the observed values $Y_1, \ldots, Y_t$ are described through the DLM, then the posterior distribution of the parameter vector $\theta_t$ is also normal with mean $m_t$ and variance $C_t$, i.e. $(\theta_t \mid Y_t) \sim N(m_t, C_t)$. The values of $m_t$ and $C_t$ are obtained recursively from: $\hat{Y}_t = F_t G m_{t-1}$; $e_t = Y_t - \hat{Y}_t$; $R_t = G C_{t-1} G' + W_t$; $Q_t = F_t R_t F_t' + V_t$; $A_t = R_t F_t' Q_t^{-1}$. The Kalman filter equations are $m_t = G m_{t-1} + A_t e_t$ and $C_t = R_t - A_t Q_t A_t'$ (for details see Harrison and Stevens [28]). $Q_t$ is the variance of $e_t$ and $A_t$ is a matrix which updates $m_t$ and $C_t$ at each time t recursively.
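As an illustration only, one pass of the Kalman filter recursion can be sketched in Python for the special case used in this study (scalar observation, G equal to the identity matrix). The function name `kalman_step`, the list-based matrices and the symbols `R`, `Q`, `A` and `e` follow the usual DLM notation and are assumptions of this sketch; the authors' own implementation is a FORTRAN subroutine that is not reproduced here.

```python
def kalman_step(m, C, F, y, V, W):
    """One DLM update with G = I and a scalar observation y.

    m: prior mean (list of n floats)
    C: prior covariance (n x n list of lists)
    F: regression row known at time t (list of n floats)
    V: observation variance (float)
    W: system covariance (n x n list of lists)
    Returns the posterior mean and covariance.
    """
    n = len(m)
    # Prior covariance of the parameter vector at time t: R = C + W (G = I)
    R = [[C[i][j] + W[i][j] for j in range(n)] for i in range(n)]
    # One-step forecast of the observation and the forecast error e
    y_hat = sum(F[i] * m[i] for i in range(n))
    e = y - y_hat
    # Forecast variance Q = F R F' + V
    RF = [sum(R[i][j] * F[j] for j in range(n)) for i in range(n)]
    Q = sum(F[i] * RF[i] for i in range(n)) + V
    # Gain vector A = R F' / Q
    A = [RF[i] / Q for i in range(n)]
    # Posterior mean and covariance
    m_new = [m[i] + A[i] * e for i in range(n)]
    C_new = [[R[i][j] - A[i] * Q * A[j] for j in range(n)] for i in range(n)]
    return m_new, C_new


# Hypothetical one-parameter example: prior N(0, 1), observation y = 1, V = 1, W = 0
m1, C1 = kalman_step([0.0], [[1.0]], [1.0], 1.0, 1.0, [[0.0]])
```

In this toy case the prior and the observation carry equal weight, so the posterior mean lies halfway between them and the posterior variance is halved.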

The KF equations of the Weibull probability distribution model are constructed by linearizing the survival function of the distribution with the double-log transformation. The parameters of the probability distribution are estimated at each time t by maximizing the log-likelihood function of the double-lognormal distribution (the transformed form of the Weibull distribution) through the Davidon-Fletcher-Powell method of optimization. For the entire system, the parametric space at each time point t is. Specification of the starting values of the parameters is a common difficulty in implementing the Kalman Filter. Practitioners have to check the sensitivity of the final results to different sets of assumed values (see Meinhold and Singpurwalla [31]). After obtaining the prior values of the parameters of the probability distribution at time t = 0, the subsequent values are obtained recursively by using the Kalman filter updating equations. The survival proportions for the complete population of the Diabetes Absent and Diabetes Present groups of CABG patients are used as observed values at time t for updating the time-varying parameters of the distribution. Since, in the Kalman filter approach, the observed values are from the complete population, the censored part is dropped from the log-likelihood function. To find the maximum likelihood estimates we take the negative log-likelihood function of the distribution. A subroutine that maximizes the log-likelihood function of each distribution along with the KF process is developed in a FORTRAN program. The subroutine, in conjunction with the DFP optimization method, is used to find the optimal initial estimates of the mean and variance parameters included in the model from the final iteration of the program. For the outside-sample period (forecasting), owing to the non-availability of dependent values, we stop the process of updating the mean parameters. The values of these optimal mean parameters therefore remain constant and are utilized for updating the variance parameters for the outside-sample period, using the KF equations. The survival proportions of these probability distributions are then estimated.
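The forecasting rule described above (mean parameters frozen, variance parameters still propagated) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' FORTRAN code: it assumes the Weibull survivor form exp(-alpha t^beta) with parameter vector (ln alpha, beta), G equal to the identity, and hypothetical numerical values; the name `forecast_survival` is invented for the sketch. The point forecast is obtained by back-transforming the mean of the double-log scale.

```python
import math


def forecast_survival(m, C, W, t_last, horizon):
    """Forecast survival proportions after the last observed time t_last.

    m: frozen mean parameters [ln(alpha), beta]
    C: 2 x 2 parameter covariance at t_last
    W: 2 x 2 system covariance added at every step
    Returns a list of (t, point forecast, variance of the linear predictor).
    """
    out = []
    for k in range(1, horizon + 1):
        t = t_last + k
        # No observation is available, so only the covariance is propagated
        C = [[C[i][j] + W[i][j] for j in range(2)] for i in range(2)]
        F = [1.0, math.log(t)]
        # Mean of ln(-ln Y_t) under the frozen mean parameters
        mean_star = F[0] * m[0] + F[1] * m[1]
        # Variance of F * theta (observation variance not included)
        var_star = sum(F[i] * C[i][j] * F[j] for i in range(2) for j in range(2))
        # Back-transform to a survival proportion in (0, 1)
        s_hat = math.exp(-math.exp(mean_star))
        out.append((t, s_hat, var_star))
    return out


# Hypothetical values: alpha = 0.02, beta = 1, 12 observed years, 3 forecast years
forecasts = forecast_survival(
    [math.log(0.02), 1.0],
    [[0.01, 0.0], [0.0, 0.01]],
    [[0.001, 0.0], [0.0, 0.001]],
    t_last=12,
    horizon=3,
)
```

With a positive shape parameter the point forecasts decrease with t, while the variance of the linear predictor grows at each step because no new observation arrives to shrink it.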

3. Application (Construction of KF Equations of Weibull Distribution)

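Since the values of the survival proportions (observed values) $Y_t$ lie in the interval (0, 1), the expected value $E(Y_t)$ of a probability distribution should also lie in the interval (0, 1). Keeping in view the natural process of deaths with the passage of time, it is assumed that $E(Y_t)$ as a function of t is monotonically decreasing. Meinhold and Singpurwalla [31] considered the quantity $\exp(-\alpha t^{\beta})$, which is a nonlinear, monotonically decreasing function of t and is the survival function $S(t)$ of the Weibull distribution. Moreover, the form $\exp(-\alpha_t t^{\beta_t})$ (where $\alpha_t$ and $\beta_t$ are the scale and shape time-varying parameters respectively in the KF approach) can be linearized by taking its double logarithm: $\ln[-\ln \exp(-\alpha_t t^{\beta_t})] = \ln\alpha_t + \beta_t \ln t$. The linear form is a requirement for filtering techniques. Thus, to implement KF, a random quantity $Y_t^{*} = \ln(-\ln Y_t)$ is defined, which requires that $Y_t^{*}$ has a Gaussian density with expectation $\ln\alpha_t + \beta_t \ln t$ and variance $V_t$. This implies that the random quantity $Y_t$ must have a double-lognormal distribution with pdf at $y_t$ of the form:

$$f(y_t) = \frac{1}{\sqrt{2\pi V_t}\; y_t\,(-\ln y_t)} \exp\left[-\frac{\left(\ln(-\ln y_t) - \ln\alpha_t - \beta_t \ln t\right)^2}{2V_t}\right], \qquad 0 < y_t < 1.$$

Now, setting $Y_t^{*} = \ln(-\ln Y_t)$, the observation equation is:

$$Y_t^{*} = \ln\alpha_t + \beta_t \ln t + v_t, \qquad v_t \sim N(0, V_t) \qquad (3)$$

The corresponding system equation is:

$$\begin{pmatrix} \ln\alpha_t \\ \beta_t \end{pmatrix} = \begin{pmatrix} \ln\alpha_{t-1} \\ \beta_{t-1} \end{pmatrix} + w_t \qquad (4)$$

Comparing Equations (3) and (4) with (1) and (2), we find that:

$Y_t = Y_t^{*}$, $F_t = (1, \; \ln t)$,

$\theta_t = (\ln\alpha_t, \; \beta_t)'$ and $G = I$ (identity matrix).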
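The double-log linearization can be checked numerically. The sketch below assumes the Weibull survivor function in the form exp(-alpha t^beta); the values of `alpha` and `beta`, and the helper names `double_log` and `design_row`, are hypothetical and chosen only for illustration. An ordinary least-squares line through the transformed points should recover ln(alpha) as intercept and beta as slope when the data are exactly Weibull.

```python
import math


def double_log(y):
    """Map a survival proportion y in (0, 1) to ln(-ln y)."""
    if not 0.0 < y < 1.0:
        raise ValueError("survival proportion must lie strictly in (0, 1)")
    return math.log(-math.log(y))


def design_row(t):
    """Regression row (1, ln t) of the linearized observation equation."""
    return [1.0, math.log(t)]


alpha, beta = 0.03, 1.2            # hypothetical scale and shape values
times = [1, 2, 3, 4, 5, 6]
surv = [math.exp(-alpha * t ** beta) for t in times]

x = [design_row(t)[1] for t in times]   # ln t
z = [double_log(s) for s in surv]       # ln(-ln S(t))

# Least-squares line z = intercept + slope * x
x_bar = sum(x) / len(x)
z_bar = sum(z) / len(z)
slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = z_bar - slope * x_bar
```

With noisy observed proportions the points no longer lie exactly on a line, which is where the filtering equations take over from a simple regression.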

To find maximum likelihood estimates we consider negative log-likelihood function say) of the double-lognormal distribution, given as:

where, and are observed values from CP and number of failures at time respectively and may

be obtained as:.

For derivation of and its partial derivatives, see Appendix A.

A FORTRAN subroutine that maximizes the log-likelihood function of the double-lognormal distribution together with the KF process is developed.

The subroutine, in conjunction with the DFP optimization method, is used to find the optimal initial estimates of the parameters included in the model from the final iteration of the program.

The optimal initial estimates of the parameters obtained by maximizing the log-likelihood function are presented in Table 1.
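For readers unfamiliar with the DFP method, a minimal pure-Python sketch of the quasi-Newton update is given here. It is not the authors' FORTRAN subroutine: the function name `dfp_minimize` is hypothetical, the line search is a simple Armijo backtracking rule rather than the cubic interpolation used in the paper, and the objective is a toy quadratic standing in for the negative log-likelihood.

```python
def dfp_minimize(f, grad, x0, tol=1e-6, max_iter=200):
    """Minimize f with the DFP inverse-Hessian update and backtracking."""
    n = len(x0)

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    def identity():
        return [[float(i == j) for j in range(n)] for i in range(n)]

    x, H = list(x0), identity()
    g = grad(x)
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        # Quasi-Newton search direction p = -H g
        p = [-dot(H[i], g) for i in range(n)]
        slope = dot(p, g)
        if slope >= 0.0:
            # Not a descent direction: restart from steepest descent
            H = identity()
            p = [-gi for gi in g]
            slope = -dot(g, g)
        # Armijo backtracking line search (swapped in for cubic interpolation)
        step, fx = 1.0, f(x)
        while step > 1e-12:
            x_new = [xi + step * pi for xi, pi in zip(x, p)]
            if f(x_new) <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 0.0:
            # DFP update: H + s s'/(s'y) - (H y)(H y)'/(y' H y)
            Hy = [dot(H[i], y) for i in range(n)]
            yHy = dot(y, Hy)
            H = [[H[i][j] + s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy
                  for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x


# Toy objective with analytic minimizer (1, -2)
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
x_min = dfp_minimize(f, grad, [0.0, 0.0])
```

The update is skipped whenever the curvature term s'y is not positive, which keeps the approximate inverse Hessian positive definite under an inexact line search.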

The results (survival proportions obtained by using the Weibull distribution and the KF approach at each time point t, as explained earlier) for the Diabetes Absent and Diabetes Present groups of CABG patients are presented in Table 2 and Table 3 respectively.

Table 1. The estimates of the parameters of the Weibull distribution and KF using data of the Diabetes Absent and Diabetes Present groups of CABG patients.

Table 2. Survival proportions (12 years estimated and 3 years forecasted) of the CP (complete population) of the diabetes absent group of CABG patients, obtained by the Kalman Filter approach.

4. Conclusion

The graphs of observed survival proportions from the complete population and expected survival proportions of and groups of CABG patients (Figure 1 & Figure 2) indicate that the behavior of from group is like linear throughout the sample period, whereas of the group is almost linear for the first 7 values and curved for the rest of values; due to more noises, however it remains around of the. This reflects that the complete population (forecasting) data has been modeled adequately. Kalman Filter smoothing approach is appropriate and forecast of and groups of CABG patients is reliable outside the sample observations.

Table 3. Survival proportions (12 years estimated and 3 years forecasted) of the CP (complete population) of the diabetes present group of CABG patients, obtained by the Kalman Filter approach.

Figure 1. Diabetes absent group.

Figure 2. Diabetes present group.

Acknowledgements

We are thankful to our reviewers, whose constructive criticism has resulted in a clearer presentation of our work and inclusion of additional useful reference material.

Cite this paper

M. Saleem, K. H. Khan, Nusrat Yasmin (2015) Estimation and Forecasting Survival of Diabetic CABG Patients (Kalman Filter Smoothing Approach). *American Journal of Computational Mathematics*, **05**, 405-413. doi: 10.4236/ajcm.2015.54035

References

- 1. Khan, K.H., Saleem, M. and Mahmud, Z. (2011) Survival Proportions of CABG Patients: A New Approach. International Journal of Computational Science and Mathematics (IJCSM), 3, 293-302.
- 2. Sorenson, H. (1985) Kalman Filtering: Theory and Application. IEEE Press, Los Alamitos.
- 3. Hansson, G.K. (2005) Inflammation, Atherosclerosis, and Coronary Artery Disease. The New England Journal of Medicine, 352, 1685-1695.
- 4. John, H.L. (2003) Hand Book of Patient Care in Cardiology Surgery. Lippincott Williams & Wilkins.
- 5. Sun, Z. and Hong, N. (2011) Coronary Computed Tomography Angiography in Coronary Artery Disease. World Journal of Cardiology, 3, 303-310. http://dx.doi.org/10.4330/wjc.v3.i9.303
- 6. Weintraub, W.S., Jones, E.L., Craver, J.M., et al. (1995) In-Hospital and Long-Term Outcome after Reoperative Coronary Artery Bypass Graft Surgery. Circulation, 92, II50-II57.
- 7. Goldstein, L.B., Adams, R., Alberts, M.J., Appel, L.J., Brass, C., Bushnell, A., Culebras, T., De Graba, P. and Guyton, J.R. (2006) American Heart Association; American Stroke Association Stroke Council. Primary Prevention of Ischemic Stroke. American Journal of Ophthalmology: American Heart Association, 142, 716. http://dx.doi.org/10.1016/j.ajo.2006.08.011
- 8. Jennifer, H.R. (2008) After Coronary Artery Bypass Graft Surgery-Recovering from Open Heart Surgery.
- 9. Heart and Stroke Foundation Canada (1997) Heart Disease and Stroke Statistics. Tipping the Scales of Progress: Heart Disease and Stroke in Canada.
- 10. American Heart Association Dallas, Texas (2007) Heart Disease and Stroke Statistics. Circulation, 115, 169-171.
- 11. Kaplan, E.L. and Meier, P. (1958) Nonparametric Estimations from Incomplete Observations. Journal of the American Statistical Association, 53, 457-481. http://dx.doi.org/10.1080/01621459.1958.10501452
- 12. William, S., Weintraub, M., Stephen, D., Clements, J.M., Van Thomas, C., Robert, A. and Guyton, N. (2003) Twenty Years Survival after Coronary Artery Surgery. American Heart Association, Dallas.
- 13. Saleem, M., Khan, K.H. and Mahmud, Z. (2014) Long Term Survival of CABG Patients in Age Groups Using Complete and Incomplete Populations: (A New Approach). International Journal of Scientific & Engineering Research, 5, 21-28.
- 14. Saleem, M., Khan, K.H. and Mahmud, Z. (2012) Survival Analysis of CABG Patients by Parametric Estimations in Modifiable Risk Factors - Hypertension and Diabetes. American Journal of Mathematics and Statistics, 2, 120-128. http://dx.doi.org/10.5923/j.ajms.20120205.04
- 15. Abernathy, R.B. (1998) The New Weibull Handbook. 3rd Edition, SAE Publications, Warrendale.
- 16. Bunday, B.D. and Al Mutwali, I.A. (1981) Direct Optimization for Calculation of Maximum Likelihood Estimates of Parameters of the Weibull Distribution. IEEE Transactions on Reliability, R-30, 367-369. http://dx.doi.org/10.1109/TR.1981.5221119
- 17. Cohen, A.C. (1965) Maximum Likelihood Estimation in the Weibull Distribution Based on Complete and on Censored Samples. Technometrics, 7, 579-588. http://dx.doi.org/10.1080/00401706.1965.10490300
- 18. Crow, L.H. (1982) Confidence Interval Procedures for the Weibull Process with Applications to Reliability Growth. Technometrics, 24, 67-72. http://dx.doi.org/10.1080/00401706.1982.10487711
- 19. Gross, A.J. and Clark, V. (1975) Survival Distribution: Reliability Applications in the Biomedical Sciences. Wiley, Hoboken.
- 20. Klein, P.J. and Moeschberger, L.M. (1997, 2003) Survival Analysis: Techniques for Censored and Truncated Data. Series: Statistics for Biology and Health, 2nd Edition, Springer, Berlin.
- 21. Lang, W. (2010) Mixed Effects Models for Complex Data. Math & Statistics Library, Stanford.
- 22. Lawless, J.F. (1982, 2003) Statistical Models and Methods for Lifetime Data. John Wiley and Sons, Inc., New York.
- 23. Kurlansky, P., Herbert, M., Prince, S. and Mack, M.J. (2015) Improved Long-Term Survival for Diabetic Patients with Surgical versus Interventional Revascularization. The Annals of Thoracic Surgery, 99, 1298-1305. http://dx.doi.org/10.1016/j.athoracsur.2014.11.035
- 24. Bain, L.J. and Englehardt, M. (1991) Statistical Analysis of Reliability and Life-Testing Models: Theory and Methods. 2nd Edition, Marcel Dekker, New York.
- 25. Khan, K.H. and Mahmud, Z. (1999) Weibull Distribution Model for the Breast Cancer Survival Data Using Maximum Likelihood Method. Journal of Research (Science), 10, 45-49.
- 26. Swaminathan, R. and Brenner, H. (1998, 2011) Statistical Methods for Cancer Survival Analysis.
- 27. Zhang, L.J. and Rosenberger, W.F. (2007) Response-Adaptive Randomization for Survival Trials: The Parametric Approach. Journal of the Royal Statistical Society: Series C (Applied Statistics), 56, 153-165. http://dx.doi.org/10.1111/j.1467-9876.2007.00571.x
- 28. Harrison, P.J. and Stevens, C.F. (1976) Bayesian Forecasting. Journal of the Royal Statistical Society, 3, 205-228.
- 29. Greg, W. and Gary, B. (2004) An Introduction to the Kalman Filter. University of North Carolina, Chapel Hill.
- 30. Frank, S.S. (2006) Autonomous Mobile Robots: Sensing, Control, Decision-Making and Application. CRC/Taylor & Francis, Boca Raton.
- 31. Meinhold, R.J. and Singpurwalla, N.D. (1987) A Kalman-Filter Smoothing Approach for Extrapolations in Certain Dose-Response, Damage-Assessment, and Accelerated-Life-Testing Studies. The American Statistician, 41, 101-106.
- 32. Anatoli, I., Kenneth, G.M. and James, W.V. (1983) Mortality and Aging in a Heterogeneous Population: A Stochastic Process Model with Observed and Unobserved Variables. Theoretical Population Biology, 27, 154-175.
- 33. Ludwig, F. (1994) Dynamic Modeling and Penalized Likelihood Estimation for Discrete Time Survival Data. Biometrika, 81, 317-330. http://dx.doi.org/10.1093/biomet/81.2.317
- 34. Kalbfleisch, J.D. and Prentice, R.L. (1980) The Statistical Analysis of Failure Time Data. John Wiley & Sons, Inc., Hoboken.
- 35. Meinhold, R.J. and Singpurwalla, N.D. (1983) Understanding the Kalman Filter. The American Statistician, 37, 123-127.
- 36. Fletcher, R. and Powell, M.J.D. (1963) A Rapidly Convergent Descent Method for Minimization. The Computer Journal, 6, 163-168. http://dx.doi.org/10.1093/comjnl/6.2.163

Appendix-A

The Double-Lognormal Distribution

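Consider the p.d.f. of the log-normal distribution:

$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \qquad x > 0,$$

where $\mu$ and $\sigma$ are parameters of the distribution.

Let $x = -\ln y$ and $dx = -\frac{1}{y}\,dy$, or $\left|\frac{dx}{dy}\right| = \frac{1}{y}$,

then,

$$f(y) = \frac{1}{y\,(-\ln y)\,\sigma\sqrt{2\pi}} \exp\left[-\frac{\left(\ln(-\ln y) - \mu\right)^2}{2\sigma^2}\right], \qquad 0 < y < 1.$$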
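The change of variables from the log-normal to the double-lognormal density can be verified numerically. The sketch below is an illustration with hypothetical values of `mu` and `sigma`; the helper names are invented for the example. It integrates the double-lognormal density over an interval with the midpoint rule and compares the result with the same probability computed from the normal law of ln(-ln Y).

```python
import math


def double_lognormal_pdf(y, mu, sigma):
    """Density of Y in (0, 1) when ln(-ln Y) is N(mu, sigma^2)."""
    z = math.log(-math.log(y))
    jacobian = 1.0 / (y * (-math.log(y)))
    gauss = math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return jacobian * gauss


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_between(a, b, mu, sigma):
    """P(a < Y < b) for 0 < a < b < 1, from the normal law of ln(-ln Y)."""
    za = (math.log(-math.log(a)) - mu) / sigma
    zb = (math.log(-math.log(b)) - mu) / sigma
    # ln(-ln y) is decreasing in y, so the larger y maps to the smaller z
    return normal_cdf(za) - normal_cdf(zb)


def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


mu, sigma = -1.0, 0.5   # hypothetical parameter values
numeric = midpoint_integral(lambda y: double_lognormal_pdf(y, mu, sigma), 0.2, 0.9)
exact = prob_between(0.2, 0.9, mu, sigma)
```

Agreement between `numeric` and `exact` confirms that the Jacobian factor 1/(y(-ln y)) has been carried through the transformation correctly.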

Let, we may write as:

To find maximum likelihood estimates, we consider negative log-likelihood function say) of the double- lognormal distribution, given as:

,

where, by excluding the censored part since observed values are from complete popula-

tion, are the number of failures (died) at time and may be obtained by replacing value of. We get as:

For partial derivatives, differentiating above equation with respect to and, we get

and