Open Journal of Statistics
Vol.06 No.01(2016), Article ID:63764,23 pages
10.4236/ojs.2016.61013
Transformation Models for Survival Data Analysis with Applications
Yang Liu1, Qiusheng Chen2, Xufeng Niu2
1Senior Biometrician, Merck Research Laboratories, Rahway, NJ, USA
2Department of Statistics, Florida State University, Tallahassee, FL, USA

Copyright © 2016 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/


Received 21 December 2015; accepted 22 February 20146; published 25 February 2016
ABSTRACT
When the event of interest never occurs for a proportion of subjects during the study period, survival models with a cure fraction are more appropriate in analyzing this type of data. Considering the non-linear relationship between response variable and covariates, we propose a class of generalized transformation models motivated by Zeng et al. [1] transformed proportional time cure model, in which fractional polynomials are used instead of the simple linear combination of the covariates. Statistical properties of the proposed models are investigated, including identifiability of the parameters, asymptotic consistency, and asymptotic normality of the estimated regression coefficients. A simulation study is carried out to examine the performance of the power selection procedure. The generalized transformation cure rate models are applied to the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES1) for the purpose of examining the relationship between survival time of patients and several risk factors.
Keywords:
Link Functions, Mixture Cure Rate Models, Noninformative Improper Priors, Proportional Hazards Models, Proportional Odds Models

1. Introduction
Survival data analysis is an important topic in statistics that focuses on analyzing the expected duration of time until one or more events occur, such as death or cancer in a targeted population. In a standard survival model, it is often assumed that all uncensored subjects will eventually experience the event of interest, which is described by a monotone decreasing survival function
. The function
goes to 0 when time t tends to infinity. Survival time T is a continuous nonnegative random variable representing the time of an event. The probability of a subject’s surviving till time t is given by
, where
is the distribution function with probability density
. The hazard function
is defined as the instantaneous failure rate at time t conditional on survival until time t or later. The cumulative hazard
is defined as
, which represents the total amount of risk up to time t. Usually covariates, such as gender, age,
weight, blood pressure, heart rate, stage of surgery, etc., are modeled through survival models. In this paper, we assume that the covariates are independent of time.
Cox [2] brought the idea of separating time t and individual covariate vector
in the hazard function, which led to the popular proportional hazard model with

where
was the baseline hazard function and
was a vector of regression coefficients.
However, in some situations, the event of interest never occurs for a significant proportion of subjects. For example, in a cancer clinical trial, the endpoint of interest is often recurrence. For some patients, the disease will never relapse after being treated. These patients are considered cured. Sometimes, subjects with long-term censored times can be viewed as “cured” as well. Survival models with a cure fraction are very popular in analyzing this type of cancer clinical trials.
Motivated by the transformed proportional time cure model introduced by Zeng et al. [1] , we propose a class of generalized transformation models to characterize the non-linear relationship between survival function
and relate covariates. Statistical properties of the proposed models are investigated, which include iden- tifiability, asymptotic consistency, and asymptotic normality of the estimated regression coefficients. Powers of fractional polynomials within the proposed models are selected based on the likelihood function. A simulation study is carried out to examine the performance of the power selection procedure. The generalized trans- formation cure rate models are applied to coronary heart disease and cancer related medical data from both observational cohort studies and clinical trials.
The first cure rate model is the mixture cure rate model proposed by Berkson and Gage [3] , which combines the cured and non-cured populations by using a summation function. In their model, the survival function for the entire population, denoted by
, is given by

where 



Even though the mixture model introduced by Berkson and Gage [3] , is attractive and widely used, it has several drawbacks. One of them is that the mixture model cannot have a proportional hazards structure if the covariates are modeled through
Yakovlev and Tsodikov [11] , Tsodikov [12] , Chen et al. [13] , and Zeng et al. [1] proposed and studied promotion time cure model. Instead of dividing the population into two sub-populations so that some subjects are long-term survivors with probability 
















The promotion time cure model avoids the drawbacks of a mixture model and has a proportional hazards structure through the cure rate parameter. Chen et al. [13] also proposed classes of noninformative and infor- mative priors for promotion time cure rate model that lead to proper posterior distributions.
The promotion time cure rate model and the mixture cure rate model are linked by a mathematical relation- ship, and can be rewritten in a uniform format. Zeng et al. [1] proposed a general promotion time cure model with transformation. Their model includes proportional hazards model and proportional odds model as special cases. To take into account the unknown and unobservable risk factor for each individual, they used a subject- specific frailty variable


Different parametric distributions may be applied to the frailty




As Zeng et al. [1] pointed out, (1.3) provides a very wide class of transformation cure models with the form:

where

When 

The proportional hazards model in (1.1) is a special case of the transformation families (1.5) and (1.6) corre- sponding to 



From model (1.4) the cure fraction is
where 
The covariates can be modeled through a known and strictly positive increasing link function 


In this paper, we extend the transformed proportional time cure model proposed by Zeng et al. [14] to a more general class of transformation models, in which fractional polynomials are used instead of the simple linear combination of the covariates. The statistical properties of our proposed models will be investigated. Estimation and model selection procedures will be discussed. The rest of the paper is organized as follows. In Section 2, we introduce the generalized transformation models and study the identifiability and asymptotic properties of the proposed models. In Section 3, simulation studies are conducted for the purpose of assessing the performance of the power selection procedure. In Section 4, the proposed models will be applied to some real datasets and compared with other models. Conclusions and some discussions are given in Section 5. Proofs of the theorems in Section 2 are provided in Appendix.
2. Proposed Models and Their Properties
In survival data analysis, the relationship between hazard rates and covariates is quite often nonlinear. Motivated by Zeng et al. [1] , we propose a generalized transformation cure model by using a general additive function of 

The additive models were introduced by Stone [15] , which is defined by





and







For some data sets, especially data from medical studies, fractional polynomials may give a better fit com- pared to the conventional polynomial. In our proposed models we use a fractional polynomial instead of 



where

dummy variables, and 

sidered in (2.2) when we assume that







In a typical survival analysis setting, survival times are often right censored, which means for some subjects we do not know when exactly the failures occurred, but we do know that the survival time is at least beyond some certain time point C. Suppose that there are n right censored subjects. For the ith individual the survival time and the fixed censoring time are denoted by 


The observed time point for the ith subject is





In a proportional hazard model, the regression coefficient 
In the model (1.4) with link function (2.2), if the parameter 

Given observations (




where 


The three pieces of products in (2.4) are for failures, censored cases, and subjects who never experience failure or censoring, respectively. The estimate of

2.1. Model Identifiability
For the statistical properties of our proposed models, we first discuss the identifiability of generalized trans- formation models. Suppose that we use models (1.4) and (1.5) with the link function defined in (2.2). The observed-data likelihood function of parameters 

The following two lemmas give sufficient conditions of identifiability to a more general class of transfor- mations that include the transformation (1.5) as a special case. Proofs of the lemmas are given in Appendix.
Lemma 1. If 
(G1) 


(G2) If

Then 



It can be shown that the transformation family given in (1.5) satisfies both conditions (G1) and (G2). Speci- fically, we have 

Other transformation families can also be considered as long as the conditions (G1) and (G2) hold. For example, the Box-Cox type transformation discussed in Zeng et al. [1] , also satisfies conditions (G1) and (G2) with 

Next, we consider the following function

where 




Function in (2.6) is a more general function than that defined in (2.2). The following lemma show that the parameters

Lemma 2. For the function 



Based on the results in Lemma 1 and Lemma 2, we have the following theorem on the identifiability of the generalized transformation models.
Theorem 1. For the generalized transformation models defined in (1.4) and (1.5) with the link function specified in (2.2), if



2.2. Estimation
Zeng et al. [1] discussed semiparametric transformation models for survival data with a cure fraction and estab- lished theorems describing the asymptotic properties of the maximum likelihood estimation of



To obtain consistency and asymptotic normality, we make the following assumptions:
(C1) The covariate X belongs to a compact set
(C2) The vector of regression coefficients 




(C3) 




(C4) Conditional on

(C5) The positive link function 

(C6) The transformation 




Under conditions (C1)-(C6), we can prove the following theorems.
Theorem 2. The maximum likelihood estimates 
Theorem 3. 
Sketched proofs of Theorem 2 and Theorem 3 are provided in Appendix.
3. Simulations
In this section, we conduct simulations to study the empirical properties of the generalized transformation models and to examine the performance of the proposed power selection procedure on generalized transfor- mation models. The model used in this simulation was given in Zeng et al. [1] and has a survival function of the form:

where 
For the purpose of illustration, only one continuous variable 



where 









Survival times of subjects with covariates 











with probability
Assume each subject being right-censored with a probability


The complete data set 

The whole population is categorized into three groups: right-censoring events when


The coefficients







Table 1 shows the power selection results under the proposed generalized transformation model based on 200 simulated data sets with q = 80% and sample sizes 2000 and 5000, respectively. The columns labeled “mean” are the average of the selected powers and the columns labeled “freq.” are the number of times of selecting the
Table 1. Results of power selection under the proposed generalized transformation model based on 200 simulated data sets with coefficients



true power in the 200 simulations. When the sample size is 5000, the power selection procedure work well. The accurate rates of choosing the true power are higher than 50% and the means of the selected powers are very close to the true value for most of the cases. For example, when 






Table 2 presents more results on power selection with 

In this simulation we assume that the probability of each subject being censored is q = 80%. In fact, the probability q basically does not affect the performance of the power selection procedure. When q takes different values while other factors in the simulation remain the same, the power selection results show a very similar pattern as that when q = 80%.
4. Applications
In this section, we will illustrate the applications of the proposed generalized transformation models and compare the proposed models with the Cox proportional hazards model and the Zeng et al. [1] transformation cure model by analyzing data from the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES1). The NHANES1 data set is from the Diverse Populations Collaboration (DPC), which is a pooled database contributed by a group of investigators to examine issues of heterogeneity of results in epidemiological studies. The database includes 21 observational cohorts studies, 3 clinical trials, and 3 national samples. In the dataset NHANES1, information for 14,407 individuals was collected in four cohorts from 1971 to 1992. In this analysis, we use data from two of the four cohorts, the black female cohort and the black male cohort. After dropping all missing observations, a total of 2027 patients remains in these two cohorts, including 1265 black females and 762 black males. Survival times of the 2027 patients are used as the response variable. The endpoint is the overall survival time collected in 1992. In the two cohorts 848 patients, about 40% of the total number of patients, died at the end of followup with a maximum survival life time of 7691 days. There were 1179 patients whose survival times were right censored, among them 115 patients had survival time longer than 7691 days. We consider these 115 patients as cured subjects.
Covariates selected by fitting the Cox model and using the stepwise backward elimination algorithm will be
Table 2. Results of power selection under the proposed generalized transformation model based on 200 simulated data sets with sample size n = 5000 and the probability of each subject being right-censored q = 80%.
included to compare different survival models. These covariates are Age, Systolic blood pressure (Sbp), Sex, Body Mass Index (BMI), Diabetes (Diab), and Coronary heart disease (Chd). Summary statistics of continuous covariates are list in Table 3. Diab and Chd are categorical and only take the values of 0 and 1 for absence and presence of the corresponding disease. Among the 2027 patients in the two cohorts, there were 121 of them having diabetes and 82 of them having coronary heart disease.
The results of the Cox proportional hazard model are summarized in Table 4. All covariates are highly signi- ficant at the 
A transformation of 
There are three continuous covariates in our analysis, Age, BMI, and Sbp. The main relationship of interest is between mortality and the factor BMI. In the next step, we will focus on choosing an appropriate power from the set A = (−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2) for BMI within our proposed models. To do so, we fit models
with link function



Table 3. Summary statistics of continuous covariates in the NHANES1 study.
Table 4.Fitted Cox proportional hazards model for the NHANES1 study.
Figure 1. Log-likelihood in Zeng et al. [1] model from transformation (1.5) with different γ for the NHANES1 study.
Table 5. Estimates of regression coefficients in Zeng et al. [1] model based on transformation class (1.5) with γ = 0 for the NHANES1 study.


In model (4.1), when we fix Age and Sbp, power 





The corresponding estimates of regression coefficients are listed in Table 6.
Now let us compare the Cox model, Zeng et al. [1] models, and the proposed models by using the Brier score. The Brier score was originally proposed by Brier [16] to verify the accuracy of weather forecasts and then
Figure 2. Log-likelihood and selected power in the proposed models from transformation (1.5) for the NHANES1 study. (a) Model (4.1), (b) Model (4.2), (c) Model (4.3), (d) Model (4.4).
Table 6. Estimates of regression coefficients in the proposed models (4.1) based on transformation class (1.5) with 

extended by May et al. [17] to survival models. The Brier score (BS) at time 

where n is the total sample size, 




Table 7. Brier scores for different survival models for the NHANES1 study.
It is obvious that the Brier score takes minimum value of 0 for perfect prediction of survival status and its range is from 0 to 1. The lower the value of the Brier score, the better the prediction. To compare the Cox model, Zeng et al. [1] models, and proposed models, we calculated the Brier scores at the first quartile Q1, median Q2, and last quartile Q3 of 848 uncensored survival times in the NHANES1 study. The results are summarized in Table 7. We can see that the proposed models has the smallest Brier scores at all there time points. For example, at the median uncensored survival time Q2 = 3894.5 days, the Brier score is 0.1396 for the Cox model. It is 0.1308 for Zeng et al. [1] model. The value of Brier score drops to 0.1297 for the selected proposed model, which indicates the chosen proposed model can well predict the survival outcome as the other two models, and sometimes better.
5. Conclusions and Discussion
In this paper, we proposed a class of generalized transformation models. Zeng et al. [14] introduced semi- parametric transformation models for survival data with a cure fraction, which included the commonly used proportional hazards cure rate models and proportional odds models as special cases. Similar to the structure suggested in Zeng et al. [1] , covariates related to the event of interest were modeled through a link function




The proposed generalized transformation models can be applied to a variety of survival data. Even though the cure models are motivated from clinical trials where the end point is not death, such as relapse-free survival time, it can be used to overall survival time as well. In this article, the applications of the proposed models are illu- strated by examining the relationship between the survival time of a patient and several risk factors based on two cohorts data from the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study. In terms of the Brier scores, the selected proposed model provides better fitting compared with the Cox proportional hazards model and the Zeng et al. [1] transformation cure model. It should be pointed out that even though the Brier score is commonly used in practice for model comparison, it has its own disadvantages. For instance, although the Brier score can be calculated at any arbitrary time point, but it dose not discriminate competing models over the whole time period. Other model comparison methodologies will be explored in our future study. For example, receiver operating characteristic (ROC) curves may be used to measure the diffe- rences of the models over all the relevant time periods.
Acknowledgements
The authors would like to thank Dr. Donglin Zeng from the University of North Carolina at Chapel Hill for sharing his original MATLAB code with us, and to thank the Diverse Populations Collaboration Group for providing data from their studies. The authors would also like to thank the Editor, the Associate Editor, and the referees for their insightful comments and suggestions that provide guidelines for the authors to revise the paper.
Cite this paper
YangLiu,QiushengChen,XufengNiu, (2016) Transformation Models for Survival Data Analysis with Applications. Open Journal of Statistics,06,133-155. doi: 10.4236/ojs.2016.61013
References
- 1. Zeng, D., Yin, G. and Ibrahim, J.G. (2006) Semiparametric Transformation Models for Survival Data with a Cure Fraction. Journal of the American Statistical Association, 101, 670-684.
http://dx.doi.org/10.1198/016214505000001122 - 2. Cox, D.R. (1972) Regression Models and Life-Tables. Journal of the Royal Statistical Society, 34, 187-220.
http://www.jstor.org/stable/2985181?seq=1#page_scan_tab_contents - 3. Berkson, J. and Cage, R.P. (1952) Survival Curve for Cancer Patients Following Treatment. Journal of the American Statistical Association, 47, 501-505.
http://dx.doi.org/10.1080/01621459.1952.10501187 - 4. Farewell, V.T. (1982) The Use of Mixtures Models for the Analysis of Survival Data with Long-Term Survivors. Biometrics, 38, 1041-1046.
http://dx.doi.org/10.2307/2529885 - 5. Gary, R.J. and Tsiatis, A.A. (1989) A Linear Rank Test for Use When the Main Interest Is in Differences in Cure Rates. Biometrics, 45, 899-904.
http://dx.doi.org/10.2307/2531691 - 6. Sposto, R., Sather, H.N. and Baker, S.A. (1992) A Comparison of Tests of the Difference in the Proportion of Patients Who Are Cured. Biometrics, 48, 87-99.
http://dx.doi.org/10.2307/2532741 - 7. Laska, E.M. and Meisner, M.J. (1992) Nonparametric Estimation and Testing in a Cure Model. Biometrics, 48, 1223-1234.
http://dx.doi.org/10.2307/2532714 - 8. Sy, J.P. and Taylor, J.M.G. (2000) Estimation in a Cox Proportional Hazards Cure Model. Biometrics, 56, 227-236.
http://dx.doi.org/10.1111/j.0006-341X.2000.00227.x - 9. Lu, W. and Ying, Z. (2004) On Semiparametric Transformation Cure Models. Biometrika, 91, 331-343.
http://dx.doi.org/10.1093/biomet/91.2.331 - 10. Ibrahim, J.G., Chen, M.-H. and Sinha, D. (2001) Bayesian Survival Analysis. Springer Series in Statistics, First Edition, Springer, New York.
http://dx.doi.org/10.1007/978-1-4757-3447-8 - 11. Yakovlev, A.Y. and Tsodikov, A.D. (1996) Stochastic Models of Tumor Latency and Their Biostatistical Application. First Edition, World Scientific, Singapore.
- 12. Tsodikov, A. (1998) A Proportional Hazards Model Taking Account of Long-Term Survivors. Biometrics, 54, 1508-1516.
http://dx.doi.org/10.2307/2533675 - 13. Chen, M.-H., Ibrahim, J.G. and Sinha, D. (1999) A New Bayesian Model for Survival Data with a Surviving Fraction. Journal of the American Statistical Association, 94, 909-919.
http://dx.doi.org/10.1080/01621459.1999.10474196 - 14. Royston, P. and Altman, D.G. (1994) Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modeling. Applied Statistics, 43, 429-467.
http://dx.doi.org/10.2307/2986270 - 15. Stone, C.J. (1985) Additive Regression and Other Nonparametric Models. Annals of Statistics, 13, 689-705.
http://dx.doi.org/10.1214/aos/1176349548 - 16. Brier, G.W. (1950) Verification of Forecasts Expressed in Terms of Probability. Monthly Weather Review, 78, 1-3.
http://dx.doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2 - 17. May, M., Roystion, P., Egger, M., Justice, A.C. and Sterne, J.A. (2004) Development and Validation of a Prognostic Model for Survival Time Data: Application to Prognosis of HIV Positive Patients Treated with Antiretroviral Therapy. Statistics in Medicine, 23, 2375-2398.
http://dx.doi.org/10.1002/sim.1825 - 18. van der Vaart, A.W. and Wellner, J.A. (1996) Weak Convergence and Empirical Processes. Springer Series in Statistics, First Edition, Springer, New York.
http://dx.doi.org/10.1007/978-1-4757-2545-2
Appendix: Proofs of the Main Results
In Appendix, we first prove the Lemmas on model identifiability listed in Section 2.1. Then we will show the asymptotic properties of the semi-parametric estimates in the proposed models under conditions (C1)-(C6) given in Section 2.2. Proofs of the Theorems 2 and 3 are similar to those of Theorems 1 and 2 in Zeng et al. [14] with some modifications.
a. Proofs of Model Identifiability
Proof of Lemma 1: Suppose that 


then we will have the following two equations about

The inverse function of 



We want to show that 









Suppose that 


Calculating the first and second order derivatives in both sides of (3), plugging in







Proof of Lemma 2: Suppose that




ample, and only consider


Without loss of generality, assume that 

Let 


Because the function in the left side on (A.6) is analytic in some interval








To prove the identifiability of the coefficient of a categorical covariate, for example




Thus all parameters in 
b. Proofs of Strong Consistency of the Maximum Likelihood Estimates
Let 
surable function

dependent right censored observations. For the ith observation, we have



In applications we may use
to differ the cured and uncured population. which will not affect the proof of consistency and asymptotic nor- mality of the maximum likelihood estimates.
The modified semi-parametric version observed-data likelihood function of parameters





reaches its maximum. The log likelihood function 

where 








by the method of Lagrange multipliers, where 

for

Equation (A.11) can be written as

for i 





Since 

from 


almost surely since 


pointwise. Notice that 



The structure of the limit function 

Lemma 3. Under conditions (C1)-(C6), 
(A.15)
Actually, the right hand side of Equation (A.14) converges to


Lemma 4. Under conditions (C1)-(C6), 

Proof: Because

Letting


We then calculate the right hand side of (A.17) by using conditional expectations.

where

Function 



stants 


(A.18), we then have
It can be shown that 


the continuity of 



which implies that 




Based on the results in Lemma 4, for a given M there exists 

any


A. W. and Wellner, J. A. [18] ) because 




Following the calculations in (A.18), we have 

Based on the expression of







where 



Then it is obviously


The calculation here is similar to that in (A.18) with 


Because 

For the strong convergency of the maximum likelihood estimates, we need to show that
Letting 



cludes 

Theorem 2. Under conditions (C1)-(C6),
based on the modified likelihood function are strongly consistent, that is 




Proof: First of all, we want to prove

by a direct calculation.
Because 

and
Thus from (A.26) we have



We have proved that any subsequence of




We also proved that 










c. Proofs of Asymptotic Normality of the Maximum Likelihood Estimates
We consider the likelihood function

and write 

Then the log likelihood function, denoted by

Lemma 5. For any 

where 



Proof: By Jensen inequality

From (A.29) we can derive a differential equation with

(1) 

(2)

(3) For 


Under conditions (1)-(3), 







Particularly, we can construct 





Define

We can show 

Let us consider a modified semiparametric version likelihood function,

where

For any 

we have

where 

Similar to the continuous case, now we can derive a differential equation with

(1)′



(2)′ The summation of 


(3)′ When 


Under conditions (1)′ -(3)′, 





Define

With such a function 

Now, let us consider functions h with bounded total variation such that 



where
Lemma 6. With the notations defined in (A.39), we have

where 

Proof: It follows from (A.32) and (A.38) that

We consider a class



Using the Taylor expansion at 

Lemma 7. The linear operator 

Proof: Decompose 



We can show 









with probability one, we have

which implies 

Theorem 3. Under condition (C1)-(C6), 

Proof: Because 


Immediately from (A.44) and (A.48), we have

Equation (A.49) holds uniformly for any 






































