Open Journal of Statistics
Vol.05 No.06(2015), Article ID:60841,13 pages
10.4236/ojs.2015.56064
Robust Differentiable Functionals for the Additive Hazards Model
Enrique E. Álvarez1, Julieta Ferrario2
1Schools of Economics and Engineering, Universidad Nacional de La Plata y CONICET, La Plata, Argentina
2Department of Mathematics, Universidad Nacional de La Plata, Buenos Aires, Argentina
Email: enriqueealvarez@fibertel.com.ar, jferrario@mate.unlp.edu.ar
Copyright © 2015 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/



Received 10 September 2015; accepted 27 October 2015; published 30 October 2015
ABSTRACT
In this article, we present a new family of estimators for the regression parameter β in the Additive Hazards Model which represents a gain in robustness not only against outliers but also against unspecific contamination schemes. They are consistent and asymptotically normal and, furthermore, they have a nonzero breakdown point. In Survival Analysis, the Additive Hazards Model proposes a hazard function of the form $\lambda(t \mid z) = \lambda_0(t) + \beta' z$, where $\lambda_0(t)$ is a common nonparametric baseline hazard function and z is a vector of independent variables. For this model, the seminal work of Lin and Ying (1994) develops an estimator for the regression parameter β which is asymptotically normal and highly efficient. However, a potential drawback of that classical estimator is that it is very sensitive to outliers. In an attempt to gain robustness, Álvarez and Ferrario (2013) introduced a family of estimators for β which were still highly efficient and asymptotically normal, but which also had bounded influence functions. Those estimators, which are developed using classical Counting Processes methodology, still retain the drawback of having a zero breakdown point.
Keywords:
Robust Estimation, Additive Hazards Model, Survival Analysis

1. Introduction
In Survival Analysis, a main goal is to model a random variable which is nonnegative and typically continuous and represents the waiting time until some event. A common method for collecting survival-type data consists in deciding on an observation window over which n individuals are followed. Naturally, some events may take longer to occur than the window length; also, some individuals can be lost from the sample for different reasons (such as changing hospitals in clinical studies). In those cases, instead of an event time a censoring time is observed. In that manner, at the end of the observation window, the researcher ends up with a sample of triplets $(T_i, \delta_i, Z_i)$, $i = 1, \dots, n$, where $T_i$ represents either the true duration or the censoring time, $\delta_i$ is the indicator that the observed time is uncensored, and $Z_i$ is a vector of individual covariates. Statistical models for that type of data are the main goal of the branch of statistics called Survival Analysis, and the relevant literature is by now enormous. Some classical textbook-length treatises are Kalbfleisch and Prentice (1980) [1], Fleming and Harrington (1991) [2], Andersen et al. (1993) [3] and Aalen et al. (2008) [4], among others.
A particular type of survival model with great appeal among practitioners focuses on the so-called hazard function $\lambda(t)$, which intuitively measures the instantaneous risk of the occurrence of the event at any given moment in time. While the most widespread model for the hazard function is the semiparametric Multiplicative Hazards Model due to Cox (1972) [5], a popular alternative for datasets without proportionality of hazards is the Additive Hazards Model (AHM) presented by Aalen (1980) [6]. With time-fixed covariates, the latter proposes that $\lambda(t \mid z) = \lambda_0(t) + \beta' z$, where $\beta$ is a vector of p nonnegative parameters. An estimation method for $\beta$ and the nonparametric baseline function $\lambda_0$ for this model was first described in a seminal article by Lin and Ying (1994). They proposed an estimating equation for the Euclidean parameter $\beta$ which was independent of $\lambda_0$ and which had the additional benefit of yielding an estimate in closed form, in addition to being consistent and asymptotically normal. Its drawback, however, lies in its sensitivity to outliers.
Within the Cox model, the potentially harmful effects of outliers were discussed by Kalbfleisch and Prentice (1980, ch. 5) [1] and Bednarski (1989) [7]. Robust alternatives were first introduced by Sasieni (1993a, 1993b) [8] [9], essentially by modifying Cox’s partial likelihood score function through the introduction of weight functions. Along the same line, important work was developed by Bednarski (1993) [10], who proposed estimators that were consistent and efficient not only at the model but also on small contaminated neighbourhoods. His estimators had the advantage of being Fréchet differentiable for a wide class of weight functions.
As for the Additive Hazards Model, the proposal of robust alternatives has received much less attention in the literature. In Álvarez and Ferrario (2013) [11], we introduced a family of estimators for the Euclidean parameter


where





where H is the noncontaminated distribution that belongs to the additive hazards family and Q represents a point mass at its argument. For the practitioner, estimators with bounded influence functions are of interest when (s)he seeks to guard against a very small proportion of outliers.
Apart from the fact that the contamination scheme above is very specific, a further drawback of the estimators presented in Álvarez and Ferrario (2013) [11] is that they have a zero breakdown point. Heuristically this means that just a small proportion of contamination, strategically located, is sufficient to render the estimators nonsensical. Different notions and measures of robustness and their implications are developed in many classical books, such as Maronna, Martin and Yohai (2006) [12], Huber and Ronchetti (2009) [13] and Hampel et al. (1986) [14].
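For reference, the two robustness notions contrasted here can be written as follows. This is a sketch in standard textbook notation (see e.g. Maronna, Martin and Yohai [12]); the symbols $T$ for the estimating functional, $\Delta_x$ for a point mass at $x$ and $\varepsilon^{*}$ for the breakdown point are notation assumed here, and the breakdown definition is simplified to the case in which "nonsensical" means an unbounded estimate.

```latex
% Influence function: effect of an infinitesimal point-mass contamination at x
\[
H_{\varepsilon,x} = (1-\varepsilon)\,H + \varepsilon\,\Delta_x ,
\qquad
\mathrm{IF}(x; T, H) \;=\; \lim_{\varepsilon \downarrow 0}
\frac{T(H_{\varepsilon,x}) - T(H)}{\varepsilon} .
\]
% Asymptotic breakdown point: largest contamination fraction under which
% the functional stays bounded over all contaminating distributions G
\[
\varepsilon^{*}(T, H) \;=\; \sup\Bigl\{ \varepsilon \in [0,1) :
\sup_{G}\, \bigl\lVert T\bigl((1-\varepsilon)H + \varepsilon G\bigr) \bigr\rVert < \infty \Bigr\} .
\]
```

A bounded influence function controls the first quantity for every $x$, whereas a zero breakdown point means that $\varepsilon^{*} = 0$, i.e. an arbitrarily small fraction of suitably placed contamination can already carry the functional away.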
In this article we propose a new family of robust estimators for the additive hazards model in a manner similar to Bednarski (1993) [10]. That is, we start from Lin and Ying’s estimating equation and modify it by introducing appropriate weight functions that retain consistency and asymptotic normality while improving robustness, in the sense that the resulting estimating functionals are Fréchet differentiable about small neighborhoods of the true model. That type of differentiability entails three important consequences: 1) that the proposed family of estimators has bounded influence functions; 2) that they have a strictly positive (nonzero) breakdown point; and 3) that consistency and asymptotic normality hold about small neighbourhoods of the true model for generic contamination schemes, i.e. our family of estimators resists not only the outlier-type contamination presented in (2) but also small deviations in the structure of the model itself. For instance, one could contaminate with a model which does not have additive hazards, or with a model in which T and

The advantage of the estimators we present in this paper over previous proposals arises whenever a dataset contains outliers. When a sample is contaminated by unusual observations, the classical estimator (Lin and Ying’s) rapidly becomes nonsensical (in that its value drifts away towards zero or infinity). The estimators in Álvarez and Ferrario [11], on the other hand, resist contamination by large times or large values of the covariates, but exhibit no advantage against more involved types of contamination. Here we develop a family of estimators that resist arbitrary contamination schemes.
This paper is organized as follows. In Section 2 we introduce the estimating method and we construct explicitly the Additive Hazards Family of distribution functions for survival data. In Subsection 2.2 we prove that our estimators are Fréchet differentiable. That entails asymptotic normality not only at true distributions in the additive hazards family, but also under contiguous alternatives. In order to assess the performance of the proposed method in small samples, Section 3 contains a small simulation study which serves two purposes: 1) it illustrates the improvement of our proposed estimators from the robustness point of view against the classical counterparts; and 2) it exhibits a nonzero breakdown point which is apparently fairly high. A simulation approach to the breakdown point is important because it is not feasible to compute it analytically. That is in part because the calculations involved are formidable, as they require identifying the worst possible contaminating distribution. But more importantly, it is because the breakdown point depends on the joint distribution of the triplet
2. Robust Differentiable Estimators
Let




so-called at risk process defined by

where for a column vector v, we denote the matrix


Using classical Counting Process theory, Lin and Ying prove that their

In order to propose a Fréchet differentiable alternative to the classical (Lin and Ying’s) estimator, we need to express the estimator as a functional of the joint empirical distribution function and we need to make explicit the structure of the Additive Hazards Family of distributions. We pursue this as follows.
2.1. Construction of the Additive Hazards Family
Event times: Let





Covariates: The covariates

Censoring: Conditional on Z, censoring and event times are independent, i.e.
Observed times: Due to censoring, the observed times are



the joint density of T and Z is

We now develop the joint bivariate distribution function

Censoring indicator: Let

Thus taking the derivative with respect to t we obtain

We say that a cumulative joint distribution function










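The ingredients of this construction (event times, covariates, censoring that is independent of the event time given Z, observed times and censoring indicator) can be mimicked numerically. The sketch below draws a sample from one member of the additive hazards family under assumptions chosen here purely for illustration: a constant baseline hazard, covariates uniform on [0, 1], and exponential censoring. With a constant baseline the hazard $\lambda_0 + \beta' z$ does not depend on t, so the event time is exponential with that rate. The function name and all default parameter values are hypothetical.

```python
import numpy as np

def simulate_additive_hazards(n, beta, baseline=1.0, censor_rate=0.5, seed=0):
    """Draw (observed time, censoring indicator, covariates) triplets.

    Assumptions (illustrative only): constant baseline hazard `baseline`,
    covariates i.i.d. Uniform(0, 1), censoring times Exponential(censor_rate)
    drawn independently of the event times.
    """
    rng = np.random.default_rng(seed)
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    z = rng.uniform(0.0, 1.0, size=(n, beta.size))

    rate = baseline + z @ beta          # additive hazard, constant in t
    if np.any(rate <= 0.0):
        raise ValueError("the hazard must be strictly positive")

    event_time = rng.exponential(1.0 / rate)             # latent event times
    censor_time = rng.exponential(1.0 / censor_rate, n)  # censoring times
    observed = np.minimum(event_time, censor_time)       # observed times
    delta = (event_time <= censor_time).astype(int)      # 1 = uncensored
    return observed, delta, z
```

For example, `simulate_additive_hazards(500, [1.0])` returns 500 triplets with a single covariate. Other baselines can be handled by inverting the cumulative hazard instead of using the exponential shortcut.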
Now we express the classical estimator



where we introduce the process


empirical distributions.
In Álvarez and Ferrario (2013) [11] we illustrated that the classical estimator is very sensitive to outliers and showed that its influence function is unbounded. In this article we propose an alternative family of estimators which is robust not only against outliers but also against unspecific contamination, in that the defining functional is not only continuous but also Fréchet differentiable. This entails a nonzero breakdown point and a bounded influence curve. As a reference, the implications and uses of Fréchet differentiable statistical functionals in Asymptotic Statistics and Robust Statistics are thoroughly presented in Bednarski et al. (1991) [17].
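As a reminder of why this type of differentiability is useful, the generic definition and its main asymptotic consequence can be sketched as follows. The notation is assumed here for illustration: $T$ denotes the estimating functional, $T'_H$ its derivative at $H$, $H_n$ the empirical distribution, and the supremum norm is used as the distance between distribution functions. The precise neighborhoods and the uniform version of the definition used in this paper are those of Subsection 2.2.

```latex
% Frechet differentiability of T at H with respect to the sup norm:
% T'_H is a continuous linear functional and the remainder is of smaller
% order than the distance between G and H.
\[
\bigl\lVert T(G) - T(H) - T'_H(G - H) \bigr\rVert
  \;=\; o\bigl(\lVert G - H \rVert_\infty\bigr)
\qquad \text{as } \lVert G - H \rVert_\infty \to 0 .
\]
% Since the empirical distribution satisfies
% ||H_n - H||_infty = O_P(n^{-1/2}), the estimator is asymptotically linear:
\[
\sqrt{n}\,\bigl(T(H_n) - T(H)\bigr)
  \;=\; \sqrt{n}\; T'_H(H_n - H) \;+\; o_P(1) .
\]
```

The second display is what delivers asymptotic normality, and the fact that the remainder is controlled for every nearby $G$, not only for point-mass contaminations, is what extends the result to generic small departures from the model.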
2.2. Fréchet Differentiability
In order to define contamination ε-neighborhoods, let



that G is in a neighborhood


point mass at some triplet

We propose here an estimating equation by introducing weight functions in the classical formulation, i.e.

where



Naturally, in the special case where


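As an illustration of how weight functions can enter the classical formulation, the sketch below solves a weighted estimating equation of the form $\sum_i \int W_i(t)\,(Z_i - \bar Z_W(t))\,\{dN_i(t) - Y_i(t)\,\beta' Z_i\,dt\} = 0$, where $\bar Z_W(t)$ is the weighted at-risk average of the covariates. Centering at $\bar Z_W(t)$ makes the baseline hazard cancel, just as in the unweighted case. Several choices are assumptions made here for simplicity and are not necessarily the family used in this paper: the weights take the product form $W(t, z) = w(z)\,\mathbf{1}\{t \le t_{\max}\}$, observed times have no ties, and the names `weighted_additive_hazards`, `weight_fn` and `t_max` are hypothetical.

```python
import numpy as np

def weighted_additive_hazards(times, events, covariates,
                              weight_fn=None, t_max=np.inf):
    """Weighted closed-form estimate of beta (illustrative sketch).

    weight_fn : maps the (n, p) covariate matrix to n nonnegative weights;
                None means all weights equal to 1.
    t_max     : time beyond which every weight is set to zero.
    """
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=float)
    z = np.asarray(covariates, dtype=float).reshape(len(t), -1)
    w = np.ones(len(t)) if weight_fn is None else np.asarray(weight_fn(z), dtype=float)

    order = np.argsort(t, kind="stable")
    t, d, z, w = t[order], d[order], z[order], w[order]
    n, p = z.shape

    A = np.zeros((p, p))
    b = np.zeros(p)
    prev = 0.0
    for k in range(n):
        lower, upper = prev, min(t[k], t_max)   # truncate the interval at t_max
        prev = t[k]
        w_risk, z_risk = w[k:], z[k:]           # risk set on (lower, t[k]]
        total = w_risk.sum()
        if upper <= lower or total <= 0.0:
            continue                            # zero weight: no contribution
        zbar = (w_risk[:, None] * z_risk).sum(axis=0) / total
        centered = z_risk - zbar
        A += (upper - lower) * ((w_risk[:, None] * centered).T @ centered)
        if t[k] <= t_max:                       # events after t_max get weight 0
            b += d[k] * w[k] * (z[k] - zbar)
    return np.linalg.solve(A, b)
```

With `weight_fn=None` and `t_max=np.inf` this reduces to the classical unweighted estimate. A weight function that vanishes outside a compact set, such as `lambda z: (np.abs(z[:, 0]) <= 5).astype(float)`, removes the contribution of extreme covariate values altogether.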
Let us denote by




where FD is a linear functional called the “Fréchet derivative”. Notice that we opt here for a uniform type of differentiability over

In order to avoid excessive notation we will in the sequel develop the proofs without censoring. Let


corresponding to the true value of the parameter

and a real function




by
In order for our family of estimators to be Fréchet differentiable we will need the following assumptions.
Assumptions
A1) For all



A2) All the functions in


Assumptions A1) and A2) ensure differentiability. The compactness assumption in A2) is needed to allow possibly adaptive choices of W based on some preliminary estimate of


We seek now a linear approximation of



For the first difference in functionals above, the following Lemma gives a linear approximation:
Lemma 1. Under assumptions A1) and A2),
where

Moreover,
As for the second difference in (11) we have:
Lemma 2. For any



where


Further, the following result gives a bound of


Lemma 3. Under assumptions A1) and A2) there are constants






At this point, for further results we need to add another assumption that guarantees the existence of the inverse of

A3) There is a pair of constants



Thus, the consistency of the estimator in a neighborhood of

Theorem 1. Let the family of functions





for all
Moreover, Fréchet differentiability is asserted as follows:
Theorem 2. Let



This implies that the Fréchet derivative of



In the following theorem we investigate convergence in distribution under contiguous alternatives to some distribution in the additive hazards family


Theorem 3. Let

for some constant

and





where



The result above implies that asymptotic normality holds not only under the true model but also under contiguous alternatives.
3. Simulations
In this section, we evaluate the performance of our proposed family of estimators via simulations. Specifically, we carry out three simulation experiments choosing for simplicity a single covariate






In the first simulation we study the behavior of our estimator, denoted “RD” (Robust Differentiable), for increasing sample sizes. We take


In the second simulation, we compare the classical estimator (LY), the bounded-influence-function (BIF) estimators proposed in Álvarez and Ferrario (2013) [11], and the ones proposed in this paper (RD). This is done under outlier-type contamination, where an increasing percentage of the sample was replaced by a large covariate value equal to 10. We take 100 replicates for a sample size of
results of this experiment. For the BIF estimators we take the weight function

Table 1. Classical vs. robust differentiable estimators in pure samples.
Table 2. Comparison of estimators with outliers.



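The outlier-type contamination used in the second simulation can be sketched as follows: a given fraction of the observations has its covariate replaced by the large value 10. The helper name `contaminate_covariate` is hypothetical, and two details are assumptions made here: the contaminated observations are chosen uniformly at random, and their observed times and censoring indicators are left unchanged.

```python
import numpy as np

def contaminate_covariate(z, fraction, outlier_value=10.0, seed=0):
    """Return a copy of z with a fraction of its rows set to outlier_value.

    z        : (n,) or (n, p) array of covariates (not modified in place)
    fraction : proportion of observations to contaminate, between 0 and 1
    Returns the contaminated copy and the indices that were replaced.
    """
    rng = np.random.default_rng(seed)
    z_cont = np.array(z, dtype=float, copy=True)
    n = z_cont.shape[0]
    m = int(round(fraction * n))                  # number of outliers
    idx = rng.choice(n, size=m, replace=False)    # rows to contaminate
    z_cont[idx] = outlier_value
    return z_cont, idx
```

Applying an estimator to the pure covariates and then to the output of `contaminate_covariate(z, 0.05)`, for an increasing sequence of fractions, gives the kind of comparison summarized in Table 2.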
Lastly, we carry out a third simulation experiment in order to detect what the breakdown points of the RD estimators may be under a different type of model departure. As a model for the contaminating distribution, we chose point masses on the line
Figure 1. Pure vs. contaminated sample. (a) Pure sample (b) Contaminated sample.
Table 3. Comparison of estimators under model contamination.
sample a size


Intuitively, the finite sample breakdown point of an estimator is the largest proportion of contaminated observations that a method can resist before the estimates become nonsensical, which usually means that the estimate drifts away towards zero or infinity, or in general towards the boundaries of the parameter space. Equivalently, its functional version is called the asymptotic breakdown point and it measures the largest proportion of contamination that a statistical functional could tolerate before becoming nonsensical in the same sense (see e.g. Maronna, Martin and Yohai 2006 [12] for formal definitions). It is noteworthy that, either in its finite sample or in its asymptotic version, calculating a breakdown point requires identifying the worst possible type of contamination. This would depend on the joint distribution of the triplet


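The simulation approach to the breakdown point can be illustrated with a generic sketch: replace an increasing number of observations by a fixed outlier and record the first contamination fraction at which the estimate moves away from its value on the pure sample by more than a chosen tolerance. This is only a sketch under stated assumptions. It uses a single, fixed contamination pattern rather than the worst possible one, so it can only suggest an upper bound for the true breakdown point, and the function name, the `outlier` value and the `tolerance` are hypothetical choices. It is shown on the sample mean and median because their behavior is well known.

```python
import numpy as np

def empirical_breakdown(estimator, sample, outlier, tolerance):
    """Smallest contamination fraction at which `estimator` breaks down.

    estimator : function mapping a 1-d array to a scalar estimate
    sample    : pure (uncontaminated) observations
    outlier   : value used to replace contaminated observations
    tolerance : largest deviation from the pure-sample estimate still
                considered sensible
    """
    x = np.asarray(sample, dtype=float)
    n = len(x)
    reference = estimator(x)              # estimate on the pure sample
    for k in range(1, n + 1):
        corrupted = x.copy()
        corrupted[:k] = outlier           # replace the first k observations
        value = estimator(corrupted)
        if not np.isfinite(value) or abs(value - reference) > tolerance:
            return k / n
    return 1.0

pure = np.arange(20.0)
frac_mean = empirical_breakdown(np.mean, pure, outlier=1e6, tolerance=100.0)
frac_median = empirical_breakdown(np.median, pure, outlier=1e6, tolerance=100.0)
print(frac_mean, frac_median)  # prints 0.05 0.5
```

The mean is carried away by a single outlier (1 out of 20 observations), while the median resists until half of the sample is replaced. The same loop can be applied to any regression estimator by passing a function that contaminates the triplets and returns one coordinate of the estimate.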
Acknowledgements
We thank the editor and the referee for their comments. This work has been financed in part by UNLP Grants PPID/X003 and PID/X719. Julieta Ferrario further wishes to thank Tadeusz Bednarski for generously sharing otherwise electronically unavailable manuscripts.
Cite this paper
Enrique E. Álvarez, Julieta Ferrario (2015) Robust Differentiable Functionals for the Additive Hazards Model. Open Journal of Statistics, 05, 631-644. doi: 10.4236/ojs.2015.56064
References
- 1. Kalbfleisch, J.D. and Prentice, R.L. (1980) The Statistical Analysis of Failure Time Data. Wiley, New York.
- 2. Fleming, T.R. and Harrington, D.P. (1991) Counting Processes and Survival Analysis. Wiley, New York.
- 3. Andersen, P.K., Borgan, O., Gill, R.D. and Keiding, N. (1993) Statistical Models Based on Counting Processes. Springer-Verlag, New York.
http://dx.doi.org/10.1007/978-1-4612-4348-9
- 4. Aalen, O.O., Borgan, O. and Gjessing, H.K. (2008) Survival and Event History Analysis. A Process Point of View. Springer, New York.
http://dx.doi.org/10.1007/978-0-387-68560-1
- 5. Cox, D.R. (1972) Regression Models and Life-Tables (with Discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 34, 187-220.
- 6. Aalen, O.O. (1980) A Model for Nonparametric Regression Analysis of Counting Processes. In: Klonecki, N., Koesh, A. and Rosinski, J., Eds., Lecture Notes in Statistics, 2th Edition, Springer, New York, 1-25.
http://dx.doi.org/10.1007/978-1-4615-7397-5_1
- 7. Bednarski, T. (1989) On Sensitivity of Cox’s Estimator. Statistics & Decisions, 7, 215-228.
http://dx.doi.org/10.1524/strm.1989.7.3.215
- 8. Sasieni, P. (1993) Maximum Weighted Partial Likelihood Estimators for the Cox Model. Journal of the American Statistical Association, 88, 144-152.
- 9. Sasieni, P. (1993) Some New Estimators for Cox Regression. Annals of Statistics, 21, 1721-1759.
http://dx.doi.org/10.1214/aos/1176349395
- 10. Bednarski, T. (1993) Robust Estimation in Cox’s Regression Model. Scandinavian Journal of Statistics, 20, 213-225.
- 11. Álvarez, E.E. and Ferrario, J. (2013) Robust Estimation in the Additive Hazards Model. Comm. Statist. Theory Methods, in press.
- 12. Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006) Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken.
- 13. Huber, P.J. and Ronchetti, E.M. (2009) Robust Statistics. Wiley, New York.
http://dx.doi.org/10.1002/9780470434697
- 14. Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986) Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
- 15. Ferrario, J. (2015) Estimación Robusta en Modelos de Supervivencia con Función de Hazard Aditiva. PhD Thesis, Universidad Nacional de La Plata, La Plata, Argentina.
- 16. Lin, D.Y. and Ying, Z. (1994) Semiparametric Analysis of the Additive Risk Model. Biometrika, 81, 61-71.
http://dx.doi.org/10.1093/biomet/81.1.61
- 17. Bednarski, T., Clarke, B.R. and Kolkiewicz, W. (1991) Statistical Expansions and Locally Uniform Fréchet Differentiability. Journal of the Australian Mathematical Society, 50, 88-97.
http://dx.doi.org/10.1017/S1446788700032572
Appendix
Proof of Lemma 1. Rearranging,

So that subtracting
Let







To simplify notation, let









where after distributing the inner brackets

A2), we can choose large enough values

the support of any function in

Take


where


where in order to simplify notation, we introduced the operator
Denote also the set




Since we chose the


In consequence,
Hence for all
which is bounded because of A1) and A2), i.e. for some constant,
Thus
so that






to claim that for some finite constants






focus on the terms

Similar calculations hold for the other
For the last assertion of the Lemma, substitute H by








Proof of Lemma 2. For any fixed


which is independent of
So subtracting,
Proof of Lemma 3. Express
where

arguments as in Lemma 1, relying on integration by parts and Assumptions A1)-A2), we see that all the terms
are
Proof of Theorem 1. By Lemmas 1 and 3, for all




Also by Lemma 2,





Take now


for some fixed



So for


Equation (18), if some



which implies that
Proof of Theorem 2. Since

Also by Lemmas 1 and 2 respectively, for all


Adding the above Equations in (19) we get
Note that by Theorem 1, for




Now since by Assumption A3)



Proof of Theorem 3. Decompose
and by Glivenko-Cantelli’s Theorem


Since
