Open Journal of Statistics
Vol.04 No.02(2014), Article ID:43291,18 pages
10.4236/ojs.2014.42012
On the Convergence of Observed Partial Likelihood under Incomplete Data with Two Class Possibilities
Tomoyuki Sugimoto
Department of Mathematical Sciences, Hirosaki University, Hirosaki, Japan
Email: tomoyuki@cc.hirosaki-u.ac.jp
Copyright © 2014 Tomoyuki Sugimoto. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In accordance of the Creative Commons Attribution License all Copyrights © 2014 are reserved for SCIRP and the owner of the intellectual property Tomoyuki Sugimoto. All Copyright © 2014 are guarded by law and by SCIRP as a guardian.


Received October 21, 2013; revised November 21, 2013; accepted November 28, 2013
ABSTRACT
In this paper, we discuss the theoretical validity of the observed partial likelihood (OPL) constructed in a Cox-type model under incomplete data with two class possibilities, such as missing binary covariates, a cure-mixture model or doubly censored data. A main result is establishing the asymptotic convergence of the OPL. To reach this result, as it is difficult to apply the standard tools of survival analysis, we develop tools for weak convergence based on partial-sum processes. The result of the asymptotic convergence shown here indicates that a suitable order of the number of Monte Carlo trials is less than the square of the sample size. In addition, using numerical examples, we investigate how the asymptotic properties discussed here behave in a finite sample.
Keywords:
Cox’s Regression Model; Logistic Regression Model; Incomplete Binary Data; Partial Likelihood; Partial-Sum Processes; Profile Likelihood
1. Introduction
Although the Cox model [1] is a standard tool for the analysis of time-to-event data, in practice analysts are often confronted with some problems in handling incomplete data beyond the right-censored form, such as interval-censored data [2], missing covariates [3-5] or (statistical) structural modelling. The inference for the Cox model under such cases of incomplete data can usually be performed based on the semiparametric profile likelihood (SPFL) [6] as a generalization of Cox’s partial likelihood [7]. On the other hand, as a substitute for the SPFL method, one can analyse the same data using the imputation method, which yields a sum of partial likelihoods. By describing the sum of all possible partial likelihoods more exactly, we can formulate the marginal of partial likelihoods [8], that is, the observed partial likelihood (OPL). In this area, the theory for the SPFL has been studied by many authors (e.g., [6,9]). However, to the best of our knowledge, there has not been much development in mathematical theory for the OPL.
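The idea of a "sum of all possible partial likelihoods" can be illustrated with a minimal Python sketch. It is not the exact OPL defined later in the paper: it assumes a single scalar binary covariate, no tied event times, and that each missing covariate equals 1 independently with a known probability `p`. The names `cox_partial_likelihood`, `exact_opl`, `p` and the use of `None` to mark a missing value are all hypothetical choices made for this illustration.

```python
import itertools
import math


def cox_partial_likelihood(times, events, covariates, beta):
    """Cox partial likelihood for one scalar covariate (assumes no tied times)."""
    value = 1.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations contribute no factor
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(beta * z) for t, z in zip(times, covariates) if t >= t_i
        )
        value *= math.exp(beta * covariates[i]) / risk_sum
    return value


def exact_opl(times, events, covariates, beta, p):
    """Weighted sum of partial likelihoods over every completion of the data.

    Assumption (illustrative only): a missing covariate (None) is 1 with
    probability p, independently of everything else.
    """
    missing = [i for i, z in enumerate(covariates) if z is None]
    total = 0.0
    # 2**len(missing) summands: this is what makes exact computation costly.
    for labels in itertools.product((0, 1), repeat=len(missing)):
        completed = list(covariates)
        weight = 1.0
        for idx, lab in zip(missing, labels):
            completed[idx] = lab
            weight *= p if lab == 1 else 1.0 - p
        total += weight * cox_partial_likelihood(times, events, completed, beta)
    return total
```

For example, `exact_opl([1, 2, 3, 4], [1, 1, 0, 1], [1, None, 0, None], beta=0.5, p=0.3)` sums over four completions. The number of summands doubles with each additional missing label, which is why exact evaluation quickly becomes infeasible.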
In this paper, we discuss the theoretical validity of the OPL which appears in a Cox-type model under incomplete data beyond the right-censored form. In this area, one advantage of the OPL is that the baseline hazard function, a nuisance parameter in a Cox-type model, is eliminated completely from the inferential likelihood. This yields a more stable computational system for optimization than that of the SPFL. For example, in a Cox-cure model, a computational process based on the EM algorithm to obtain the SPFL easily fails to converge if a suitable starting value is not provided (e.g., see [10]). The main disadvantages of the OPL are, for instance, that the exact computation requires a great length of time and that it is not clear how much the amount of computation can be reduced by the Monte Carlo (MC) method. However, even if the feasible number of MC trials is smaller than desirable to approximate the OPL, and hence the MC approximation is quite rough, it may be sufficient for a starting value in the computational process of the SPFL.
Generally, it is difficult to investigate computationally to what extent the MC approximations of the OPL are valid, since the exact computation requires a number of summands that becomes huge as the sample size and the amount of incomplete information in the data increase. For this reason, it is worth studying the OPL theoretically. However, it is not easy to complete such a study in one step, because the standard tools for studying asymptotic properties of Cox’s partial likelihood or the SPFL cannot be applied directly to the OPL. Therefore, in this paper, for the sake of simplicity, we focus on the OPL constructed from incomplete data in which two-class labels are unobserved. Typical cases of this type occur in a Cox-type model with incomplete data, such as missing binary covariates, a cure-mixture model or doubly censored data. As a main result, we establish the asymptotic convergence of the OPL and derive a limit form of the OPL. This result is a foundation or precondition for applying an infinite-dimensional Laplace approximation to the integral over the baseline hazard. Such a Laplace approximation method will yield another limit form of the OPL [11], which is useful in discussing the consistency and asymptotic normality of the estimators. However, that method is not convenient for showing the convergence of the OPL. For these reasons, it is also valuable to discuss the convergence of the OPL using the arguments employed in this paper.
A matter of practical interest concerns MC approximations of the OPL. Another significant point is that the convergence result for the exact OPL can easily be tailored to the context of MC approximations. Based on such an argument, we show that a suitable order of the number of MC trials is less than $O(n^2)$. Further, in Section 4 we investigate how the asymptotic properties discussed here behave in a finite sample.
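To make the notion of an MC approximation concrete, the following Python sketch averages Cox partial likelihoods over randomly completed data sets instead of enumerating every completion. It is a simplified illustration, not the estimator analysed in Section 4.1: it assumes one scalar binary covariate, no tied event times, and missing values drawn as independent Bernoulli(`p`) labels. The names `mc_opl`, `num_trials`, `p` and `seed` are hypothetical.

```python
import math
import random


def cox_partial_likelihood(times, events, covariates, beta):
    """Cox partial likelihood for one scalar covariate (assumes no tied times)."""
    value = 1.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations contribute no factor
        risk_sum = sum(
            math.exp(beta * z) for t, z in zip(times, covariates) if t >= t_i
        )
        value *= math.exp(beta * covariates[i]) / risk_sum
    return value


def mc_opl(times, events, covariates, beta, p, num_trials, seed=0):
    """Monte Carlo average of partial likelihoods over sampled completions.

    Assumption (illustrative only): each missing covariate (None) is drawn
    as 1 with probability p, independently across individuals and trials.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls are reproducible
    missing = [i for i, z in enumerate(covariates) if z is None]
    total = 0.0
    for _ in range(num_trials):
        completed = list(covariates)
        for idx in missing:
            completed[idx] = 1 if rng.random() < p else 0
        total += cox_partial_likelihood(times, events, completed, beta)
    return total / num_trials
```

Here the cost is governed by `num_trials` rather than by the number of possible completions. Reusing the same `seed` across values of `beta` keeps the sampled completions fixed, which tends to give a smoother approximate likelihood surface; this is a design choice of the sketch, not something prescribed by the paper.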
In Section 2 we formulate the OPL in incomplete data with two class possibilities, providing several examples of interest; in Section 3 we develop the tools to obtain the main result and show the convergence of the OPL, and in Section 4 we discuss the performances of MC approximations.
2. Observed Full and Partial Likelihoods
2.1. Notations and Motivated Examples
Let
and
be the observed survival time and right-censoring indicator of the
individual, where
are continuous random variables independent of
and
is the indicator function. Suppose that the individuals possess some difference between models or observations identified by the two classes. We define such a class variable by

In the case that
expresses the difference between models, assume that the distribution of
follows the proportional hazards model formulated as


where
is the baseline hazard function,
is the function given by

is the covariate vector
from the population of the class
, and
is the regression
coefficient vector
As usual, the information on
can be re-expressed using the counting processes

In this paper, we consider incomplete data where some of the
Each of these is used to construct the likelihoods. Further, if the event of

where






Let


possibilities, the observed full likelihood (OFL) can be generally written as

with the elements such that
and





is the survival function of the






In the following three examples, we show how the form of the OFL is related to the representative cases. Hereafter, we will often omit



Example: Missing Binary Covariates. Let us assume that


first covariate is binary and may be missing,


where

Using the binomial expansion, this can be rewritten as

Example: Cox Cure-Mixture Model. The Cox cure-mixture model [10,12-15] is presumed to hold the proportional hazards model for uncured individuals and to be zero-hazard for cured ones. That is, we assume


We observe that





note here that


Example: Doubly Censored Data. In doubly censored data [16,17], left-censored data may be included. Let



In the phenomenal meaning, the common model is assumed regardless of the type of observations, but we do not define







where note that





2.2. Observed Partial Likelihood
Let



where
Let





and
Using these expressions, let
where


be the n-dimensional version of Minkowski’s measure



where





and
which is


where






Remark 1. In (2.5), even if we consider a difference or quotient between








3. Convergence of the Observed Partial Likelihood
We now discuss how the mean of the log OPL converges to a deterministic function and present the main result, Theorem 1. The following conditions are assumed throughout this discussion.
Conditions A. Let









A1:


A2:
A4:
Condition A2 means






right-censoring time under a given







Theorem 1. Suppose that Conditions A are satisfied. Then, as


Theorem 1 is proved in Section 3.3. We prepare useful tools for the proof in Sections 3.1 and 3.2 below. In Section 3.1, we discuss a relation needed to show that two OPLs converge to the same limit, which determines a plan (Lemmas 1 and 2) for obtaining Theorem 1. In Section 3.2, following the plan, we provide a tool (Lemma 3) that gives weak convergence of all possible partial-sum processes.
3.1. Relations between Two Observed Partial Likelihoods
Note that the OPL is constructed by an integral on



be functions that exist around



Lemma 1. Suppose that

with probability 1. Then, as
where

(Proof of Lemma 1). Using Taylor expansions of

the difference between


where, with some


Let us assume that







are shown by

Applying (3.1) to the above inequality, this lemma is proved.
Using Lemma 1, for several patterns of





Lemma 2. Suppose that

where

Remark 2. For simplicity, letting
denote
as the area of

because it is always satisfied that
Thus, we show that the operators of


(Proof of Lemma 2). Note that
because








This shows

Therefore, condition (3.2) gives (3.1).
3.2. All Possible Partial-Sum Processes
Note that, by the portmanteau theorem, (3.2) is equivalent to, as
where

For simplicity, let






tions of more essential terms to consider in the convergence on

where examples of


can be regarded as


which is the conditional expectation of






So, letting






Example: Missing Binary Covariates. For simplicity, we assume that



while the expectations of terms which may form all possible partial-sums in



In these calculations, note that Bayes’ rule is used, such as
Example: Cox Cure-Mixture Model. In this model,

which may form all possible partial-sums are

where

On Weak Convergence. Let







Remark 3. For example, if

However, a result of interest here is whether



Incidentally, we cannot obtain the almost sure convergence in this problem, since
is always apart from zero.
Lemma 3. Let










(i) The class of functions

(ii)

(iii)

then, as
Lemma 3 is proved in Appendix A.1. The following examples show that the conditions needed in Lemma 3 are satisfied for

Example 1. Let



Example 2. Let





3.3. Proof of Theorem 1
Consider








i.e.


and
by Conditions A1 and A4, similar to the standard Cox model (see [19]). Thus,
is obtained as

using Lemma 3 in Example 1 and applying the strong law of large numbers (SLLN) to




by applying the SLLN on

using Lemma 3 in Example 2. Hence,



using Lemma 3 in Example 2 and the continuous mapping theorem for the logarithm function. For the latter application, note that


converges in probability to zero as
Applying the above three results to Lemmas 1 and 2, therefore, we obtain
respectively, so that we conclude

Although (3.4) shows that the limit of





In discussing a convergence about the form




Let

at arbitrary point

note that
and then
Lemma 4.

uniformly on
A proof of Lemma 4 is provided briefly in Appendix A.2 since it is similar to that of Lemma 3. Now, applying Lemma 4 to

Let








combination of this result and (3.4) yields

On a Limit Form. The result of (3.5) shows only that the limit of

Here we discuss a limit form of




sets of





Then,
Because of

so that, via the general binomial theorem, we can show that
Also,




In addition, (3.6) is derived in the case of
A limit function of

Corollary 1. If Theorem 1 holds, then a limit expression to which


4. Additional Considerations
4.1. Monte Carlo (MC) Approximations
Exact computation of the OPL usually takes a long time, so another subject of interest is the performance of its MC approximations. Let


We assign a point






Given fixed data










using

By the standard asymptotic theory, as


provided


To evaluate the quantity of



similar to the discussion for (3.6). As this result means that



using the delta method. Now consider the other aspect of (4.2) under

where






4.2. Numerical Examples
We investigate two aspects of finite-sample behaviour using the Cox cure-mixture model. One is how a relation such as (3.6) obtained as



Ovarian Cancer Data: For the first purpose, we use survival data of ovarian cancer patients [20]. We set the covariates as




Here, let

and










Simulated Data: For the second purpose, we prepare simulated data with












We use these to observe a better estimation performance than
Figure 2 shows simulated averages and standard errors (SEs) of 100 pairs of
computed at










Further, even if the approximations were reduced to



Figure 1. Plots of





we set


computed under these settings. Although





5. Concluding Remarks
A main result of this paper was to show the almost sure convergence of the OPL constructed in incomplete data with two class possibilities. To obtain this result, we discussed the principle of formulating this type of structure of the OPL, and then developed tools based on an argument using partial-sum processes. The resulting limit function of the OPL (Corollary 1) is the essential supremum of the partial likelihoods obtained from all the forms of complete data included in the incomplete data, which is similar to

Figure 2. Plots of averages (polygonal lines) and SEs (horizontal whiskers) of
Figure 3. Plots of averages of
Unfortunately, it will be difficult to show consistency and asymptotic normality of the maximum OPL estimator (MOPLE) using the limit function of the OPL provided in Corollary 1. However, if the consistency is achieved (as almost expected), the global essential maximum will be accomplished around true complete data under a true regression parameter. On the other hand, for the purpose of showing the consistency of the MOPLE, there will be other convenient limit expressions, although they are not discussed in this paper. A future paper on this topic will be based on an infinite-dimensional Laplace approximation to the integral over the baseline hazard function [11]. However, in applying such a Laplace approximation to the OPL, a precondition that the OPL converges to a deterministic function is necessary. Hence, in order to obtain this precondition, and because it is generally difficult to show the convergence result directly using the Laplace approximation, it is meaningful to discuss the asymptotic convergence of the OPL using the argument employed in this paper.
The results on the convergence of the exact OPL can easily be adapted to the context of MC approximations. For example, at the end of Section 4.1 we show that, by applying Theorem 1 and Corollary 1, the size of the MC error is less than






In future study, it is important to derive another expression of the limit function based on an infinite-dimensional Laplace approximation to the integral over the baseline hazard and then to discuss the consistency and asymptotic normality of the MOPLE, since the asymptotic convergence of the OPL is given in this paper. Further, it is an interesting issue how the discussion of the OPL for the binary class considered here could be extended to that under continuous class possibilities, such as the Cox frailty model.
Acknowledgements
The author is grateful to anonymous referees for their careful reading. This work is financially supported by JSPS KAKENHI grant number 23700336.
References
- 1. D. R. Cox, “Regression Models and Life Tables (with Discussion),” Journal of the Royal Statistical Society, Series B, Vol. 34, No. 2, 1972, pp. 187-220.
- 2. J. S. Kim, “Maximum Likelihood Estimation for the Proportional Hazards Model with Partly Interval-Censored Data,” Journal of the Royal Statistical Society, Series B, Vol. 65, No. 2, 2003, pp. 489-502. http://dx.doi.org/10.1111/1467-9868.00398
- 3. M. C. Paik and W.-Y. Tsai, “On Using the Cox Proportional Hazards Model with Missing Covariates,” Biometrika, Vol. 84, No. 3, 1997, pp. 579-593. http://dx.doi.org/10.1093/biomet/84.3.579
- 4. H. Y. Chen and R. J. A. Little, “Proportional Hazards Regression with Missing Covariates,” Journal of the American Statistical Association, Vol. 94, No. 447, 1999, pp. 896-908. http://dx.doi.org/10.1080/01621459.1999.10474195
- 5. A. H. Herring and J. G. Ibrahim, “Likelihood-Based Methods for Missing Covariates in the Cox Proportional Hazards Model,” Journal of the American Statistical Association, Vol. 96, No. 453, 2001, pp. 292-302. http://dx.doi.org/10.1198/016214501750332866
- 6. S. A. Murphy and A. W. van der Vaart, “On Profile Likelihood (with Discussion),” Journal of the American Statistical Association, Vol. 95, No. 450, 2000, pp. 449-465. http://dx.doi.org/10.1080/01621459.2000.10474219
- 7. D. R. Cox, “Partial Likelihood,” Biometrika, Vol. 62, No. 2, 1975, pp. 269-276. http://dx.doi.org/10.1093/biomet/62.2.269
- 8. R. Gill, “Marginal Partial Likelihood,” Scandinavian Journal of Statistics, Vol. 19, No. 2, 1992, pp. 133-137.
- 9. M. R. Kosorok, “Introduction to Empirical Processes and Semiparametric Inference,” Springer, Berlin, 2008. http://dx.doi.org/10.1007/978-0-387-74978-5
- 10. J. P. Sy and J. M. G. Taylor, “Estimation in a Cox Proportional Hazards Cure Model,” Biometrics, Vol. 56, No. 1, 2000, pp. 227-236. http://dx.doi.org/10.1111/j.0006-341X.2000.00227.x
- 11. T. Sugimoto, “A Large Sample Study of Marginal Partial Likelihood in a Cox Cure-Mixture Regression Model,” Unpublished.
- 12. A. Y. C. Kuk and C.-H. Chen, “A Mixture Model Combining Logistic Regression with Proportional Hazards Regression,” Biometrika, Vol. 79, No. 3, 1992, pp. 531-541. http://dx.doi.org/10.1093/biomet/79.3.531
- 13. Y. Peng and K. B. G. Dear, “A Nonparametric Mixture Model for Cure Rate Estimation,” Biometrics, Vol. 56, No. 1, 2000, pp. 237-243. http://dx.doi.org/10.1111/j.0006-341X.2000.00237.x
- 14. W. Lu and Z. Ying, “On Semiparametric Transformation Cure Models,” Biometrika, Vol. 91, No. 2, 2004, pp. 331-343. http://dx.doi.org/10.1093/biomet/91.2.331
- 15. T. Sugimoto, T. Hamasaki and M. Goto, “Estimation from Pseudo Partial Likelihood in a Semiparametric Cure Model,” Journal of the Japanese Society of Computational Statistics, Vol. 18, No. 1, 2005, pp. 33-46.
- 16. B. W. Turnbull, “Nonparametric Estimation of a Survivorship Function with Doubly Censored Data,” Journal of the American Statistical Association, Vol. 69, No. 345, 1974, pp. 169-173. http://dx.doi.org/10.1080/01621459.1974.10480146
- 17. B. W. Turnbull, “The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data,” Journal of the Royal Statistical Society, Series B, Vol. 38, No. 3, 1976, pp. 290-295.
- 18. J. D. Kalbfleisch and R. L. Prentice, “Marginal Likelihoods Based on Cox’s Regression and Life Model,” Biometrika, Vol. 60, No. 2, 1973, pp. 267-278. http://dx.doi.org/10.1093/biomet/60.2.267
- 19. P. K. Andersen and R. D. Gill, “Cox’s Regression Model for Counting Processes: A Large Sample Study,” Annals of Statistics, Vol. 10, No. 3, 1982, pp. 1100-1120. http://dx.doi.org/10.1214/aos/1176345976
- 20. D. Collett, “Modelling Survival Data in Medical Research,” 2nd Edition, Chapman & Hall/CRC, London, 2003.
- 21. A. W. van der Vaart and J. A. Wellner, “Weak Convergence and Empirical Processes,” Springer-Verlag, New York, 1996. http://dx.doi.org/10.1007/978-1-4757-2545-2






























































































































