
Applied Mathematics

2152-7385

Scientific Research Publishing

10.4236/am.2014.513200

AM-47985

Articles

Computer Science & Communications; Engineering; Physics & Mathematics

Wavelet Density Estimation of Censoring Data and Evaluate of Mean Integral Square Error with Convergence Ratio and Empirical Distribution of Given Estimator

Mahmoud Afshari¹*

Department of Statistics, College of Science, Persian Gulf University, Bushehr, Iran

* E-mail: afshar@pgu.ac.ir

Published 7 July 2014

Volume 5, Issue 13, pp. 2062-2072. Received 19 April 2014; revised 29 May 2014; accepted 12 June 2014

2014

This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

Wavelet theory has developed rapidly in recent mathematics and is significant both in theory and in application, with a wide range of uses in signal and image compression, signal analysis and engineering technology. In this paper, we use the wavelet method to estimate the density function for censored data. We evaluate the mean integrated squared error and the convergence rate of the given estimator. We also obtain the empirical distribution of the given estimator and verify the conclusions with two simulation examples.

Keywords: Wavelet Estimation; Censoring; Mean Integral Error; Convergence

1. Introduction

One type of data in which researchers are particularly interested concerns the time interval until the occurrence of a certain event, such as death. Any process of waiting for a specific event produces survival data. The survival function, denoted by $S(t)$, gives the proportion of individuals who survive beyond time $t$, measured from the base time at which they enter the experiment. Failure in survival analysis means the occurrence of the event we are waiting for. The point after which survival is measured is called the start time. The failure time is the time at which failure occurs for each individual; it is denoted by $T_i$ for $i = 1, \ldots, n$ and is measured from the base time up to the moment of failure. It is not always possible to observe the failure time of each individual; in such cases, censoring occurs. The rate of occurrence of an event (failure) in a short period of time, given that no failure occurred before that time, is the concept known as the hazard function in survival analysis. The hazard function for the failure time $T$ with density function $f$ is as follows:

$$\lambda(t) = \lim_{\Delta t \to 0^{+}} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}.$$
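As a rough numerical illustration of the survival and hazard concepts described above, the following Python sketch computes an empirical survival proportion and a discrete approximation of the conditional failure rate from uncensored failure times. The function names, the interval width `dt` and the toy data are assumptions made for illustration; they do not come from the paper.

```python
def empirical_survival(times, t):
    """Proportion of individuals whose failure time exceeds t."""
    return sum(1 for x in times if x > t) / len(times)


def discrete_hazard(times, t, dt):
    """Approximate the hazard at t by P(t <= T < t + dt | T >= t) / dt."""
    at_risk = sum(1 for x in times if x >= t)
    if at_risk == 0:
        return float("nan")  # nobody left at risk, the rate is undefined
    failed = sum(1 for x in times if t <= x < t + dt)
    return failed / at_risk / dt


# Toy failure times (hypothetical data, no censoring).
times = [1.0, 2.0, 2.5, 4.0, 6.0]
print(empirical_survival(times, 2.0))    # 3 of the 5 times exceed 2.0
print(discrete_hazard(times, 2.0, 1.0))  # 2 failures in [2, 3) among 4 at risk
```

With censored observations this naive count is no longer appropriate, which is the situation the rest of the paper addresses.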

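Wavelet theory was first proposed by Alfréd Haar [1] in 1910. He showed that a continuous function $f$ on $[0, 1]$ can be approximated as follows:

$$f(x) \approx c_0\, \varphi(x) + \sum_{j=0}^{J} \sum_{k=0}^{2^j - 1} d_{jk}\, \psi_{jk}(x) \qquad (1)$$

such that

$$\psi_{jk}(x) = 2^{j/2}\, \psi(2^j x - k), \qquad c_0 = \int_0^1 f(x)\, \varphi(x)\, dx, \qquad d_{jk} = \int_0^1 f(x)\, \psi_{jk}(x)\, dx.$$

The father wavelet $\varphi$ and the mother wavelet $\psi$ are the following:

$$\varphi(x) = \begin{cases} 1, & 0 \le x < 1 \\ 0, & \text{otherwise} \end{cases} \qquad \psi(x) = \begin{cases} 1, & 0 \le x < 1/2 \\ -1, & 1/2 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$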

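Definition 1-1: Assume that $\{\varphi_{jk};\ k \in \mathbb{Z}\}$ is an orthonormal basis for $V_j$, where $V_j$ contains all functions that are piecewise constant on intervals of length $2^{-j}$, which is twice the interval length of $V_{j+1}$.

The spaces $\{V_j\}$ are called a multiresolution analysis with scale function $\varphi$ if they satisfy the following conditions:

1- $V_j \subset V_{j+1}$, 2- $\bigcap_j V_j = \{0\}$, 3- $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$,

4- $f(x) \in V_j \Leftrightarrow f(2x) \in V_{j+1}$, 5- $f(x) \in V_0 \Rightarrow f(x - k) \in V_0$ for all $k \in \mathbb{Z}$.

6- $\varphi \in V_0$, on condition that $\{\varphi(x - k);\ k \in \mathbb{Z}\}$ is an orthonormal basis for $V_0$.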

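If we consider the scale function $\varphi$, then the projection of $f$ on the space $V_j$ is defined as

$$P_j f(x) = \sum_{k} c_{jk}\, \varphi_{jk}(x), \qquad c_{jk} = \int f(x)\, \varphi_{jk}(x)\, dx,$$

which is a function with resolution $2^{-j}$, and because

$$\lim_{j \to \infty} \| P_j f - f \|_2 = 0,$$

$P_j f$ is a good approximation of the function $f$ for large values of $j$.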
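To make this approximation property concrete, the sketch below projects a function on [0, 1) onto the space of functions that are piecewise constant on dyadic cells of length 2^(-j) and reports the largest pointwise error on a grid. It assumes the Haar scaling function, a midpoint-rule approximation of the cell averages and the test function f(x) = x^2; none of these choices come from the paper.

```python
def haar_projection(f, j, grid=64):
    """Cell averages of f on the 2**j dyadic cells of [0, 1).

    For the Haar scaling function, the projection onto V_j is the function
    that equals the average of f on each cell of length 2**(-j).
    """
    n_cells = 2 ** j
    width = 1.0 / n_cells
    averages = []
    for k in range(n_cells):
        left = k * width
        # Midpoint rule with `grid` points inside the k-th cell.
        points = [left + (i + 0.5) * width / grid for i in range(grid)]
        averages.append(sum(f(x) for x in points) / grid)
    return averages


def evaluate(averages, x):
    """Value of the piecewise constant projection at x in [0, 1)."""
    k = min(int(x * len(averages)), len(averages) - 1)
    return averages[k]


def sup_error(f, j, n_eval=1000):
    """Largest absolute error between f and its projection on a grid."""
    averages = haar_projection(f, j)
    xs = [(i + 0.5) / n_eval for i in range(n_eval)]
    return max(abs(f(x) - evaluate(averages, x)) for x in xs)


f = lambda x: x * x
for j in (1, 3, 6):
    print(j, sup_error(f, j))  # the error shrinks as the level j grows
```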

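Let the nested sequence of closed subspaces $\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots$ be a multiresolution approximation to $L^2(\mathbb{R})$. Define $W_j$ to be the orthogonal complement of $V_j$ in $V_{j+1}$.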

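The term wavelets is used to refer to a set of basis functions with a very special structure. This structure rests on a scaling function $\varphi$ and a mother wavelet $\psi$ such that $\{\varphi(x - k);\ k \in \mathbb{Z}\}$ forms an orthonormal basis for $V_0$ and $\{\psi(x - k);\ k \in \mathbb{Z}\}$ forms an orthonormal basis for $W_0$. Other wavelets in the basis are then generated by translations and dilations of the scaling function and the mother wavelet by using the relationships:

$$\varphi_{j_0 k}(x) = 2^{j_0/2}\, \varphi(2^{j_0} x - k), \qquad \psi_{jk}(x) = 2^{j/2}\, \psi(2^{j} x - k). \qquad (2)$$

Given the above wavelet basis, a function $f \in L^2(\mathbb{R})$ can be written as a formal expansion:

$$f(x) = \sum_{k} c_{j_0 k}\, \varphi_{j_0 k}(x) + \sum_{j \ge j_0} \sum_{k} d_{jk}\, \psi_{jk}(x) \qquad (3)$$

where

$$c_{j_0 k} = \int f(x)\, \varphi_{j_0 k}(x)\, dx, \qquad d_{jk} = \int f(x)\, \psi_{jk}(x)\, dx.$$

As for a general orthogonal series estimator, Daubechies [2], the density estimator based on a sample $X_1, \ldots, X_n$ can be written as:

$$\hat{f}(x) = \sum_{k} \hat{c}_{j_0 k}\, \varphi_{j_0 k}(x) + \sum_{j = j_0}^{J} \sum_{k} \hat{d}_{jk}\, \psi_{jk}(x) \qquad (4)$$

where the obvious coefficient estimators can be written:

$$\hat{c}_{j_0 k} = \frac{1}{n} \sum_{i=1}^{n} \varphi_{j_0 k}(X_i), \qquad \hat{d}_{jk} = \frac{1}{n} \sum_{i=1}^{n} \psi_{jk}(X_i). \qquad (5)$$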
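The empirical-coefficient idea behind the orthogonal series estimator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Haar scaling function, keeps only the scaling (father wavelet) terms at a single level `j` and omits the detail terms, and the function names and toy sample are hypothetical.

```python
import math


def haar_scaling_coefficients(sample, j):
    """Empirical coefficients c_hat[k] = (1/n) * sum_i phi_jk(X_i).

    For the Haar scaling function, phi_jk(x) = 2**(j/2) when
    k <= 2**j * x < k + 1 and 0 otherwise.
    """
    n = len(sample)
    scale = 2.0 ** (j / 2.0)
    coeffs = {}
    for x in sample:
        k = math.floor((2 ** j) * x)  # the only k with phi_jk(x) != 0
        coeffs[k] = coeffs.get(k, 0.0) + scale / n
    return coeffs


def haar_density_estimate(x, coeffs, j):
    """Evaluate f_hat(x) = sum_k c_hat[k] * phi_jk(x); one term is nonzero."""
    k = math.floor((2 ** j) * x)
    return coeffs.get(k, 0.0) * 2.0 ** (j / 2.0)


# Toy sample on [0, 1) (hypothetical data).
sample = [0.1, 0.2, 0.6, 0.9]
coeffs = haar_scaling_coefficients(sample, j=1)
print(haar_density_estimate(0.25, coeffs, j=1))
```

With the Haar scaling function this estimator reduces to a histogram with bin width 2^(-j); smoother wavelet families change the shape of the basis functions but not the way the coefficients are averaged over the sample.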

We divide the time axis into small intervals and record the number of events in each interval. We determine the number of events and the hazard function according to the observations. We then smooth them separately via linear wavelet density estimation over the whole time range, calculate the function estimator and evaluate its asymptotic distribution.

In this paper we obtain a density estimator for censored data by using the wavelet method, and we evaluate its mean integrated squared error, its convergence rate and its empirical distribution.

2. Estimator of Density by Using Wavelet Method

Wavelets can be used to analyze transient phenomena or functions that change rapidly. Unlike sine waves, they are symmetrical and of limited duration, so signals with abrupt changes are analyzed better. The close relationship between wavelet coefficients and certain function spaces, the orthogonality of wavelet bases and their other useful properties simplify the computational algorithms. As a result, numerous articles have been published on density function estimation. The mathematical theory of wavelets and its application in statistics have been studied as a technique for nonparametric curve estimation by Antoniadis [3] .

Afshari [4] -[6] studied the density function estimator, the derivative of the density function and the nonparametric regression function for mixing random variables. Donoho [7] , Kerkyacharian and Picard [8] , Mallat [9] and Meyer [10] , among others, have published work in this field. Hall and Patil [11] found a formula for the mean integrated squared error of nonlinear wavelet-based density estimators. Antoniadis et al. [12] obtained estimators of the density function and the hazard function for right-censored data with wavelets. In this section we obtain an estimator of the density function for censored data by using the wavelet method.

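Suppose $T_1, \ldots, T_n$ are the failure times of the units under study. They are non-negative, independent and identically distributed, with density function $f$ and distribution function $F$. Let $C_1, \ldots, C_n$ be the corresponding censoring times, which are non-negative, independent and identically distributed, with density function $g$ and distribution function $G$.

Assuming independence of the failure times and the censoring times, the observed random variable $X_i$, the indicator $\delta_i$ and the hazard function $\lambda$ are as shown below:

$$X_i = \min(T_i, C_i), \qquad \delta_i = I(T_i \le C_i), \qquad \lambda(t) = \frac{f(t)}{1 - F(t)},$$

where $I(A)$ is the indicator function of the event $A$. For censored data, if $G(t) < 1$ then we have the following:

$$\lambda(t) = \frac{f(t)\{1 - G(t)\}}{\{1 - F(t)\}\{1 - G(t)\}}.$$

We also define $L(t)$ and the subdensity $f^*(t)$ as follows:

$$L(t) = P(X_i \le t), \qquad f^*(t) = f(t)\{1 - G(t)\}, \qquad \text{so that} \qquad \lambda(t) = \frac{f^*(t)}{1 - L(t)}.$$

To estimate $f^*$, we divide the time axis into small intervals, record the number of events (0 or 1) in each interval, and then divide these values by the length of the intervals.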
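A minimal sketch of this binning step follows, assuming the usual right-censoring convention in which one observes the smaller of the failure and censoring times together with an indicator of whether the failure was observed. The helper names, the end point `tau`, the number of bins and the normalization by the sample size are illustrative assumptions rather than the paper's exact procedure.

```python
def observe(failure_times, censoring_times):
    """Return observed times and indicators of uncensored failures."""
    pairs = list(zip(failure_times, censoring_times))
    x = [min(t, c) for t, c in pairs]
    delta = [1 if t <= c else 0 for t, c in pairs]
    return x, delta


def binned_subdensity(x, delta, tau, n_bins):
    """Raw subdensity values: uncensored failures per bin / (n * bin width)."""
    n = len(x)
    width = tau / n_bins
    counts = [0] * n_bins
    for xi, di in zip(x, delta):
        if di == 1 and 0.0 <= xi < tau:
            k = min(int(xi / width), n_bins - 1)  # guard against rounding
            counts[k] += 1
    return [c / (n * width) for c in counts]


# Toy failure and censoring times (hypothetical data).
failure = [0.1, 0.8, 0.6, 0.7, 1.5]
censor = [0.5, 0.3, 0.9, 1.0, 0.9]
x, delta = observe(failure, censor)
print(x, delta)
print(binned_subdensity(x, delta, tau=1.0, n_bins=2))
```

The resulting bin values are noisy; they are the input that a wavelet smoother would then regularize.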

The estimation procedure can be summarized as follows:

Select the number of intervals, collect the observed failures in intervals of equal length, and apply wavelet estimation to the collected data. This gives an estimate of the subdensity: we calculate the wavelet coefficients of the collected data at the chosen decomposition level and then compute the estimate. The following notation is needed to give the details:

We compute the estimators on the finite interval in which. Note that if is the order statistic of the sequence then. In fact we suppose.

Suppose that is an integer that may depend on and the estimation points are as follows:

Suppose that and we divide the interval of the time axis into intervals of length

The -th interval is denoted by so: for.

Now we define the following indicator function, which indicates the number of uncensored failures in the time interval. We assume that is the ratio of observed failures in the interval; in other words:

Theorem 2-1: Suppose that the subdensity is a continuous function on and is m times differentiable; then if for, we have:

Proof: see [13] .

We smooth the data with an appropriate wavelet smoother to obtain the estimate of the subdensity.

We can write,

(6)

where,

The multiresolution analysis leads to an efficient tree-structured algorithm for decomposing functions in, given the theoretical scaling coefficients. However, these integral scaling coefficients are not readily available, and we need initial values for the fast wavelet transform. Antoniadis [4] suggested the following initial values:

As a result, a reasonable estimate of the projection of at resolution is:

(7)

We assume that the collected values, which are equal to the estimators of, are in the Sobolev space and that is regular of degree. To smooth the data with a better rate for the sample size and the sequence, we estimate the unknown function as follows:

(8)

That is, it is the orthogonal projection of onto the smoothing approximation space.

Theorem 2-2: Suppose that the subdensity is a continuous function on and is m times differentiable; then if for we have:

Proof: By using Theorem 2-1 we can write:

(9)

Since, , then and we can write as the following:

So Equation (9) can be written as follows:

(10)

By using Equation (1) we have:

(11)

By using Equations (10) and (11) we have:

By using Theorem 2-1 we can write the following:

Using the fact that is uniformly bounded on and, we have:

(12)

Since is regular of order we can write:

(13)

According to Equation (13), we can write:, which completes the proof.

3. Evaluate of Mean Integral Square Error with Convergence Ratio

In this section we evaluate the mean integrated squared error and investigate the convergence rate.

Definition 3-1: The mean integrated squared error (MISE) of a kernel estimator of a density function is given by

$$\mathrm{MISE} \sim (nh)^{-1} \kappa_1 + h^{2r} \kappa_2.$$

In this formula, $\sim$ denotes that the ratio of the left-hand and right-hand sides converges to 1 as $n \to \infty$, $n$ denotes the sample size, $h$ denotes the bandwidth of the kernel estimator, $r$ denotes the order of the kernel, and $\kappa_1$, $\kappa_2$ denote quantities that depend on the kernel and the unknown density.
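For reference, the general definition of the MISE of an estimator of a density $f$, and its standard decomposition into integrated variance and integrated squared bias, can be written as follows. The symbol $\hat{f}$ for the estimator is an assumption made here for illustration.

```latex
\mathrm{MISE}(\hat{f})
  = E \int \bigl\{ \hat{f}(x) - f(x) \bigr\}^{2} \, dx
  = \int \mathrm{Var}\bigl\{ \hat{f}(x) \bigr\} \, dx
  + \int \bigl\{ E\hat{f}(x) - f(x) \bigr\}^{2} \, dx .
```

The first integral is the variance part and the second is the squared-bias part; a smoothing parameter such as a bandwidth or a resolution level trades one against the other.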

Theorem 3-1: Suppose that the subdensity is a continuous function on and is m times differentiable; then if for and, then,

(14)

Proof:

(15)

By using Equation (15) and Theorem 2-2 for, we can write the following:

Because we can write the following:

(16)

(17)

So by using Equations (16) and (17), we can write:

(18)

To evaluate, we can write:

Also we can write:

then,

(19)

By using Theorem 2-1 and taking the expectation of Equation (19), we can write the following:

(20)

By using theorem (2-1) we have:

(21)

(22)

By using Equation (22) and the fact that is uniformly bounded, we can write the following:

The second part of Equation (20) can be written as follows:

By using, the proof is complete.

4. Empirical Distribution of the Proposed Estimator

In this section we investigate the empirical distribution of the estimator under some conditions.

Theorem 4-1: Suppose that the subdensity is a continuous function on and is m times differentiable, for, , , , then for interval, we have:

Proof:

By using theorems (2-1) and (2-2), we can write as the following:

(23)

(24)

So by using Equations (23) and (24) we can write the following:

We prove that II has an asymptotically normal distribution and that I and III tend to zero when

First, we show that I and III tend to zero when. According to Equation (24) we have:

(25)

By using Equation (23) we have:

So by using Equations (24) and (25), the terms I and III tend to zero when, and finally we have:

So we have:

(26)

Such that for each fixed, while, is defined as an independent and identically distributed random sample with the mean as follows:

By using the Cauchy-Schwarz inequality:

(27)

So we can write as the following:

Using the fact that is uniformly bounded and, , , we can write:

Thus, the expression in Equation (26) converges in and therefore in distribution.

Also by using Theorem (2-2), we have:

Thus we have:

We check the Lindeberg condition in order to prove that II is asymptotically normal. For this purpose, we set: and we show that

By using the Cauchy-Schwarz inequality:

, so we can write the following:

which completes the proof.

5. Simulation and Numerical Computation for Target Estimator

In this section we simulate, on data of size by using the Symmlet wavelet. We assess the convergence rate of the given estimator by computing the average mean square error of the estimators. We use software and a wavelet package for the simulation.
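The average mean square error used as the performance measure can be computed along the following lines. The paper does not spell out its exact formula, so this sketch assumes the common convention of averaging the squared error over a grid of evaluation points and then over simulation replications; the function names and the toy numbers are hypothetical.

```python
def mean_square_error(estimate, truth):
    """Mean of squared differences over the evaluation grid."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def average_mse(replications, truth):
    """Average of the per-replication mean square errors."""
    errors = [mean_square_error(est, truth) for est in replications]
    return sum(errors) / len(errors)


# True values on a 3-point grid and two simulated estimates (toy numbers).
truth = [1.0, 2.0, 3.0]
replications = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
print(average_mse(replications, truth))  # (0 + 2/3) / 2
```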

Example 1: We generate and from samples of size and with, , and for the optimal level.

Table 1 displays the average mean square errors of the subdensity function estimator for sample sizes and.

The panel in Figure 1 displays the wavelet estimator of the subdensity of observed failures for typical censored data. The solid line is the density estimator and the dotted line is the true density.

Example 2: Suppose that, where and. We generate from samples of size and with, , and.

Table 2 displays the average mean square errors of the subdensity function estimator for sample sizes and.

The panel in Figure 2 displays the wavelet estimator of the subdensity of observed failures for typical censored data. The solid line displays the subdensity estimate based on the actual data and the dotted line is the true density.

Table 1. The average mean square errors of the subdensity function estimator by the wavelet method

17.9	26.1	8
10.1	19.2	16
7.2	18.6	32

Table 2. The average mean square errors of the subdensity function estimator by the wavelet method

610	680	8
275	420	16
278	379	32

Figure 1. The wavelet subdensity estimator and the true density

Figure 2. The wavelet subdensity estimator and the true density

6. Conclusion

In this paper we obtained a density estimator for censored data by using the wavelet method and evaluated its mean integrated squared error. We showed that the convergence rate is acceptable and that the empirical distribution of the given estimator is normal under some conditions.

Acknowledgements

The support of the Research Committee of Persian Gulf University is gratefully acknowledged.

References

[1] Haar, A. (1910) Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69, 331-371.

[2] Daubechies, I. (1988) Orthonormal Bases of Compactly Supported Wavelets. Communications on Pure and Applied Mathematics, 41, 909-996.

[3] Antoniadis, A. (1996) Smoothing Noisy Data with Tapered Coiflets Series. Scandinavian Journal of Statistics, 23, 313-330.

[4] Afshari, M. (2013) A Fast Wavelet Algorithm for Analyzing of Signal Processing and Empirical Distribution of Wavelet Coefficients with Numerical Example and Simulation. Communications in Statistics-Theory and Methods, 42, 4156-4169.

[5] Afshari, M. (2014) Estimation of Hazard Function for Censoring Random Variable by Using Wavelet Decomposition and Evaluate of MISE, AMSE with Simulation. Journal of Data Analysis and Information Processing, 2, 1-5. http://dx.doi.org/10.4236/jdaip.2014.21001

[6] Afshari, M. (2008) Wavelet-Kernel Estimation of Regression Function for Uniformly Mixing Process. World Applied Sciences Journal, 4, 605-609.

[7] Donoho, D.L. and Johnstone, I.M. (1994) Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika, 81, 425-455. http://dx.doi.org/10.1093/biomet/81.3.425

[8] Kerkyacharian, G. and Picard, D. (1993) Density Estimation by Kernel and Probability. McGraw-Hill Science, New York, 327-336.

[9] Mallat, S.G. (1989) A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 674-693.

[10] Meyer, Y. (1990) Ondelettes et Opérateurs. Hermann, Paris.

[11] Hall, P. and Patil, P. (1995) Formula for Mean Integrated Squared Error of Non-Linear Wavelet-Based Density Estimators. Annals of Statistics, 23, 905-928. http://dx.doi.org/10.1214/aos/1176324628

[12] Antoniadis, A., Gregoire, G. and Nason, P. (1999) Density and Hazard Rate Estimation for Right Censored Data Using Wavelet Methods. Journal of the Royal Statistical Society, Series B, 61, 63-84.

[13] Vidakovic, B. (1999) Statistical Modeling by Wavelets. Wiley, New York. http://dx.doi.org/10.1002/9780470317020