Minimum Cramér-Von Mises (MCVM) distance estimation is extended to a simulated version, in which the model distribution function is replaced by a sample distribution function constructed from a simulated sample drawn from the model. The method does not require an explicit form for the model density function, so it can be used to fit many useful infinitely divisible distributions or mixture distributions without closed form densities, which are often encountered in actuarial science and finance. For these models likelihood estimation is difficult to implement, and simulated minimum Cramér-Von Mises (SMCVM) distance estimation can be used instead. Asymptotic properties of the SMCVM estimators are established. The new method appears to be more robust and more efficient than the method of moments (MM) for the models considered, which have more than two parameters. The method can be used as an alternative to simulated minimum Hellinger distance (SMHD) estimation, with a special feature: it can handle models with a discontinuity point at the origin that carries a probability mass, as in the compound Poisson distribution, where the SMHD method might not be suitable. Because the method is based on sample distribution functions instead of density estimates, it is also easier to implement than the SMHD method, but it might not be as efficient as the SMHD method for continuous models.
In actuarial science or finance, we often model losses or log-returns with distributions for which neither the distribution function nor the corresponding density function has a closed form expression, yet drawing random samples from them is not complicated. Likelihood methods are difficult to implement in such a situation.
For statistical inference with models of this type, we assume that we have independent and identically distributed (iid) observations $X_1, \cdots, X_n$ with the same distribution as $X$, whose model distribution and density functions are $F_\theta(u)$ and $f_\theta(u)$ respectively. Neither $F_\theta(u)$ nor $f_\theta(u)$ has a closed form expression, but often the moment generating function (mgf) $M_\theta(s)$ does. The vector of parameters of interest is
$$\theta = (\theta_1, \cdots, \theta_m)'.$$
The compound Poisson distribution used in actuarial science and the jump diffusion distribution used in finance are typical examples of these types of models. Furthermore, in many circumstances distributions derived from the increments of Lévy processes also display these characteristics, and it is of interest to make inferences for the vector of parameters. We shall illustrate the situation with Example 1 and Example 2 below.
Example 1
In this example, we consider the compound Poisson gamma distribution, which is commonly used in actuarial science. It arises from compound Poisson processes, which belong to the class of Lévy processes.
The compound Poisson gamma distribution is the distribution of a random variable $X$ representable as a random sum, i.e.,
$$X = \sum_{i=1}^{N} Y_i,$$
where the $Y_i$'s are iid with a common gamma distribution with density function
$$f_Y(y) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}, \quad y > 0,\ \alpha > 1,\ \beta > 0,$$
and moment generating function $M_Y(s) = \frac{1}{(1-\beta s)^{\alpha}}$. The random variable $N$ follows a Poisson distribution with parameter $\lambda > 0$, and the $Y_i$'s and $N$ are assumed to be independent.
Note that the moment generating function of X is
$$M_\theta(s) = e^{\lambda (M_Y(s) - 1)} \tag{1}$$
and from the mgf $M_\theta(s)$ the first three cumulants can be found; they are given by
$$c_1 = \alpha\beta\lambda, \quad c_2 = \alpha\beta^2\lambda(1+\alpha), \quad c_3 = \alpha\lambda(\alpha+1)(\alpha+2)\beta^3. \tag{2}$$
The vector of parameters is $\theta = (\alpha, \beta, \lambda)'$.
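The cumulants of the compound Poisson gamma model can be checked with a short derivation. The following is a sketch based only on the mgf in expression (1) and the standard moments of the gamma distribution; the symbol $K_\theta$ for the cumulant generating function is notation introduced here for illustration.

```latex
\begin{align*}
K_\theta(s) &= \log M_\theta(s) = \lambda\,(M_Y(s) - 1), \\
c_k &= \left.\frac{d^k K_\theta(s)}{ds^k}\right|_{s=0}
     = \lambda \left.\frac{d^k M_Y(s)}{ds^k}\right|_{s=0}
     = \lambda\, E(Y^k), \\
E(Y^k) &= \beta^k\,\alpha(\alpha+1)\cdots(\alpha+k-1)
  \quad\Longrightarrow\quad
  c_1 = \lambda\alpha\beta,\;
  c_2 = \lambda\alpha(\alpha+1)\beta^2,\;
  c_3 = \lambda\alpha(\alpha+1)(\alpha+2)\beta^3 .
\end{align*}
```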
It is not difficult to simulate from the distribution of X but the density function of X has no closed form, see Klugman et al. [
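As a minimal sketch of how such a simulation might look (not the authors' implementation), the random sum can be generated directly from its definition. The helper names `poisson_draw` and `compound_poisson_gamma`, the seed and the parameter values are assumptions made for illustration; Python's `random.gammavariate(alpha, beta)` takes a shape `alpha` and a scale `beta`, so its mean is `alpha * beta`.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw N ~ Poisson(lam) with Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def compound_poisson_gamma(rng, alpha, beta, lam, size):
    """Draw `size` values of X = Y_1 + ... + Y_N; X is exactly 0 when N = 0."""
    sample = []
    for _ in range(size):
        n_terms = poisson_draw(rng, lam)
        # shape alpha, scale beta: E(Y) = alpha * beta
        sample.append(sum(rng.gammavariate(alpha, beta) for _ in range(n_terms)))
    return sample

rng = random.Random(12345)  # illustrative seed
xs = compound_poisson_gamma(rng, alpha=2.0, beta=1.0, lam=3.0, size=20000)
sample_mean = sum(xs) / len(xs)                           # compare with alpha*beta*lam = 6
share_zero = sum(1 for x in xs if x == 0.0) / len(xs)     # compare with exp(-lam)
print(sample_mean, share_zero)
```

The share of exact zeros in the simulated sample approximates $e^{-\lambda}$, the probability mass that the model places at the origin.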
Lévy processes are also used in finance as alternative models to the classical Brownian motion. The distributions of the increments of these processes can be more flexible than the normal distribution: they can be asymmetric and have fatter tails. Consequently, they are more suitable for modeling log-returns of assets. The following double exponential jump diffusion distribution illustrates an alternative to the normal distribution, which is the distribution of the increments of a Brownian motion.
Example 2
The double exponential jump diffusion model is a special case of a larger class of jump diffusion models where the distribution for the jumps follows an asymmetric Laplace distribution instead of the classical normal distribution as in the classical jump-diffusion model introduced by Merton [
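$$X = Z + \sum_{i=1}^{N} Y_i.$$
The $Y_i$'s are iid with a common density and mgf given respectively by
$$f_Y(y; \omega, \eta) = \frac{1}{2\eta}\, e^{-\left|\frac{y-\omega}{\eta}\right|}, \quad -\infty < y < \infty,\ -\infty < \omega < \infty,\ \eta > 0, \tag{3}$$
$$M_Y(s) = \frac{e^{\omega s}}{1 - \eta^2 s^2}. \tag{4}$$
The distribution function of the double exponential distribution is
$$F_Y(y) = \frac{1}{2}\, e^{\frac{y-\omega}{\eta}} \ \text{for } y \le \omega \quad \text{and} \quad F_Y(y) = \frac{1}{2} + \frac{1}{2}\left(1 - e^{-\frac{y-\omega}{\eta}}\right) \ \text{for } y > \omega. \tag{5}$$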
Since this distribution function has an explicit expression, simulated samples from the double exponential distribution can be drawn using the inverse transform method.
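The inverse method can be sketched as follows, obtained by solving $F_Y(y) = u$ on each branch of the distribution function. The function names, seed and parameter values are assumptions chosen for illustration only.

```python
import math
import random

def laplace_cdf(y, omega, eta):
    """Distribution function of the double exponential (asymmetric location omega, scale eta)."""
    if y <= omega:
        return 0.5 * math.exp((y - omega) / eta)
    return 1.0 - 0.5 * math.exp(-(y - omega) / eta)

def laplace_quantile(u, omega, eta):
    """Inverse of laplace_cdf for 0 < u < 1."""
    if u <= 0.5:
        return omega + eta * math.log(2.0 * u)
    return omega - eta * math.log(2.0 * (1.0 - u))

def laplace_sample(rng, omega, eta, size):
    """Inverse transform sampling: apply the quantile function to uniform draws."""
    draws = []
    while len(draws) < size:
        u = rng.random()
        if u > 0.0:  # random() can return exactly 0.0, where the logarithm is undefined
            draws.append(laplace_quantile(u, omega, eta))
    return draws

rng = random.Random(2017)  # illustrative seed
ys = laplace_sample(rng, omega=0.5, eta=2.0, size=50000)
mean_y = sum(ys) / len(ys)                                # compare with omega
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)      # compare with 2 * eta**2
print(mean_y, var_y)
```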
Tsay [
$$E(Y) = \omega, \quad V(Y) = 2\eta^2. \tag{6}$$
It is assumed that the $Y_i$'s, $Z$ and $N$ are independent, that $N$ follows a Poisson distribution with parameter $\lambda$, and that $Z$ has a normal distribution $N(\mu, \sigma^2)$.
It is easy to see that the mgf of $X$ is given by
$$M_\theta(s) = e^{\mu s + \frac{1}{2}\sigma^2 s^2}\, e^{\lambda\left(\frac{e^{\omega s}}{1-\eta^2 s^2} - 1\right)}. \tag{7}$$
From $M_\theta(s)$, the first five cumulants can be found and they are given by
$$c_1 = \mu + \lambda\omega, \quad c_2 = \sigma^2 + \lambda(2\eta^2 + \omega^2), \quad c_3 = \lambda(6\eta^2\omega + \omega^3) \tag{8}$$
and
$$c_4 = \lambda(24\eta^4 + 12\eta^2\omega^2 + \omega^4), \quad c_5 = \lambda(120\eta^4\omega + 20\eta^2\omega^3 + \omega^5). \tag{9}$$
The vector of parameters is $\theta = (\mu, \sigma^2, \lambda, \omega, \eta)'$.
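A hedged sketch of how one might simulate from this model and compare the sample mean and variance with the first two cumulants in (8) is given below. The function names, the seed and the parameter values are assumptions made for illustration (they are not the values used in the paper), and the Poisson sampler is Knuth's multiplication method.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw N ~ Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def laplace_draw(rng, omega, eta):
    """One double exponential draw by the inverse transform method."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    if u <= 0.5:
        return omega + eta * math.log(2.0 * u)
    return omega - eta * math.log(2.0 * (1.0 - u))

def jump_diffusion_draw(rng, mu, sigma, lam, omega, eta):
    """One draw of X = Z + Y_1 + ... + Y_N with Z ~ N(mu, sigma^2)."""
    jumps = sum(laplace_draw(rng, omega, eta) for _ in range(poisson_draw(rng, lam)))
    return rng.gauss(mu, sigma) + jumps

mu, sigma, lam, omega, eta = 0.1, 0.5, 2.0, 0.3, 0.4  # illustrative values
rng = random.Random(8)
xs = [jump_diffusion_draw(rng, mu, sigma, lam, omega, eta) for _ in range(40000)]
mean_x = sum(xs) / len(xs)
var_x = sum((x - mean_x) ** 2 for x in xs) / len(xs)
c1 = mu + lam * omega                                  # first cumulant in (8)
c2 = sigma ** 2 + lam * (2 * eta ** 2 + omega ** 2)    # second cumulant in (8)
print(mean_x, c1, var_x, c2)
```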
For the models introduced in these examples, method of moments (MM) estimators lack robustness and might not even be efficient: for models with more than two parameters, MM estimators depend on polynomials of degree three or higher and hence are unstable in the presence of outliers. Estimators based on empirical characteristic function procedures such as the KL procedures of Feuerverger and McDunnough [
$$Q_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \left(F_n(x_i) - F_\theta(x_i)\right)^2 \tag{10}$$
or equivalently
$$Q_n(\theta) = \int_{-\infty}^{\infty} \left(F_n(u) - F_\theta(u)\right)^2 dF_n(u) \tag{11}$$
as given by Duchesne et al. [
the indicator function. Note that if it is easy to draw samples from $F_\theta(u)$, we can similarly construct the simulated sample distribution function $F_\theta^S(u)$ using $S$ observations drawn from $F_\theta(u)$ and minimize instead the objective function
$$Q_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \left(F_n(x_i) - F_\theta^S(x_i)\right)^2 \tag{12}$$
to obtain estimators. We shall call these estimators simulated MCVM (SMCVM) estimators, denote them by the vector $\hat{\theta}_S$, and call this version version S. The method is numerically relatively simple to implement using simplex direct search methods, which are derivative free. Packages like R already have built-in functions to minimize a function using the Nelder-Mead simplex method. The SMCVM method does not require a proxy model, unlike other simulated methods such as those based on indirect inference, see Garcia et al. [
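The simulated objective function can be illustrated with a small sketch for the compound Poisson gamma model. Several simplifications are assumptions made here to keep the example short: only $\lambda$ is estimated, with $\alpha$ and $\beta$ held fixed; a coarse grid search stands in for the Nelder-Mead simplex search; the "observed" data are themselves simulated; and the same seed is reused for every trial value of $\lambda$. All function names are hypothetical.

```python
import bisect
import math
import random

def poisson_draw(rng, lam):
    """Draw N ~ Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_model(lam, size, seed, alpha=2.0, beta=1.0):
    """Sorted compound Poisson gamma sample; the seed fixes the random stream."""
    rng = random.Random(seed)
    return sorted(
        sum(rng.gammavariate(alpha, beta) for _ in range(poisson_draw(rng, lam)))
        for _ in range(size)
    )

def ecdf(sorted_sample, x):
    """Sample distribution function: share of observations <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

def smcvm_objective(lam, data_sorted, sim_size, seed):
    """Q_n(theta): average squared gap between data and simulated sample distributions."""
    sim_sorted = simulate_model(lam, sim_size, seed)
    gaps = ((ecdf(data_sorted, x) - ecdf(sim_sorted, x)) ** 2 for x in data_sorted)
    return sum(gaps) / len(data_sorted)

data = simulate_model(3.0, 300, seed=1)     # stand-in for observed losses, true lam = 3
grid = [1.0 + 0.1 * k for k in range(41)]   # trial values of lam from 1.0 to 5.0
lam_hat = min(grid, key=lambda lam: smcvm_objective(lam, data, 3000, seed=99))
print(lam_hat)
```

Because the sample distribution functions are step functions, the objective is not smooth in $\lambda$, which is why a derivative-free search is the natural choice here.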
The paper is organized as follows. Following the approach in section 3 by Pakes and Pollard [
We can make use of the elegant results of Theorems 3.1 and 3.2 in section 3 of the paper by Pakes and Pollard [
For an element $x_l = (x_{l,1}, x_{l,2}, \cdots)'$ which belongs to $l^2$, define $\|x_l\| = \left(\sum_{i=1}^{\infty} x_{l,i}^2\right)^{1/2}$, assumed to be finite. Clearly, $\|\cdot\|$ is a norm for $l^2$ and it generalizes naturally the Euclidean norm. Also, a vector $u = (u_1, \cdots, u_p)'$ of finite dimension $p$, which belongs to a Euclidean space, can be viewed as an element of $l^2$ by writing $u = (u_1, \cdots, u_p, 0, 0, \cdots)'$. The space $l^2$ and the norm $\|\cdot\|$ have been studied in functional analysis or real analysis, see Davidson and Donsig [
For a matrix $A = (a_{ij})$, $i = 1, 2, \cdots$, $j = 1, 2, \cdots$ in $l^2$, define $\|A\| = \left(\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{ij}^2\right)^{1/2}$. With the space $l^2$, most of the results of their theorems in section 3 remain valid and only minor changes are needed.
For estimation, we assume that we have a random sample which consists of $n$ iid observations $X_1, \cdots, X_n$ from a continuous parametric family with distribution function $F_\theta(u)$. We also assume that $F_\theta$ has no closed form expression but that simulated samples can be drawn from $F_\theta(u)$. The commonly used sample distribution function is denoted by $F_n(u)$. The vector of parameters is denoted by
$$\theta = (\theta_1, \cdots, \theta_m)'.$$
Define the following vector of random functions
$$G_n(\theta) = \left(\frac{F_n(x_1) - F_\theta(x_1)}{\sqrt{n}}, \cdots, \frac{F_n(x_n) - F_\theta(x_n)}{\sqrt{n}}, 0, 0, \cdots\right)' \tag{13}$$
for version D, and it is easy to see that
$$\|G_n(\theta)\|^2 = \frac{1}{n}\sum_{i=1}^{n} \left(F_n(x_i) - F_\theta(x_i)\right)^2. \tag{14}$$
Equivalently,
$$\|G_n(\theta)\|^2 = \int_{-\infty}^{\infty} \left(F_n(u) - F_\theta(u)\right)^2 dF_n(u), \tag{15}$$
if $F_\theta(u)$ has support on the real line and
$$\|G_n(\theta)\|^2 = \int_{0}^{\infty} \left(F_n(u) - F_\theta(u)\right)^2 dF_n(u),$$
if $F_\theta(u)$ is the distribution of a nonnegative random variable. Using the set up given by section 3 in Pakes and Pollard [
For the simulated version of MCVM estimation, i.e., version S, define
$$G_n(\theta) = \left(\frac{F_n(x_1) - F_\theta^S(x_1)}{\sqrt{n}}, \cdots, \frac{F_n(x_n) - F_\theta^S(x_n)}{\sqrt{n}}, 0, 0, \cdots\right)' \tag{16}$$
with $F_\theta^S(u)$ being the sample distribution function based on a simulated sample of size $S$ drawn from $F_\theta(u)$. Then the SMCVM estimators given by the vector $\hat{\theta}_S$ are obtained by minimizing
$$\|G_n(\theta)\|^2 = \frac{1}{n}\sum_{i=1}^{n} \left(F_n(x_i) - F_\theta^S(x_i)\right)^2. \tag{17}$$
Clearly, both versions of MCVM estimation can be treated in a unified way using this set up, and we also have $\|G(\theta)\| < \infty$ in probability. For both versions, let
$$G(\theta) = \left(\frac{F_{\theta_0}(x_1) - F_\theta(x_1)}{\sqrt{n}}, \cdots, \frac{F_{\theta_0}(x_n) - F_\theta(x_n)}{\sqrt{n}}, 0, 0, \cdots\right)',$$
$$\|G(\theta)\|^2 = \int_{-\infty}^{\infty} \left(F_{\theta_0}(u) - F_\theta(u)\right)^2 dF_n(u),$$
and we have $F_n(u) \xrightarrow{p} F_{\theta_0}(u)$ and $\|G_n(\theta)\| - \|G(\theta)\| = o_p(1)$, with $o_p(1)$ being an expression which converges to 0 in probability.
We shall restate Theorem 3.1 given by Pakes and Pollard [
$$Q_n(\theta) = \|G_n(\theta)\|^2 \ \text{or} \ \|G_n(\theta)\|. \tag{18}$$
Theorem 1: Under the following conditions, the estimators given by the vector $\tilde{\theta}$ converge in probability to $\theta_0$, the vector of the true parameters, i.e., $\tilde{\theta} \xrightarrow{p} \theta_0$.
1) $\|G_n(\tilde{\theta})\| \le o_p(1) + \inf_{\theta \in \Omega} \|G_n(\theta)\|$, where $\Omega$ is the parameter space, assumed to be compact.
2) $\|G_n(\theta_0)\| = o_p(1)$.
3) $\sup_{\|\theta - \theta_0\| > \delta} \|G_n(\theta)\|^{-1} = O_p(1)$ for each $\delta > 0$, where $O_p(1)$ is an expression bounded in probability.
Clearly, the SMCVM estimators given by the vector $\hat{\theta}_S$, which minimizes $Q_n(\theta) = \|G_n(\theta)\|^2$, satisfy conditions 1) and 2) of Theorem 1, as $\|G_n(\theta)\|^2 \xrightarrow{p} 0$ only at $\theta = \theta_0$ if the parametric family is well parameterized, which is the case in general. Note that the summands in expression (10), equivalently the integrand in expression (11), are nonnegative and smaller than or equal to one. Therefore, in probability,
$$0 < Q_n(\theta) \le 1 \ \text{for} \ \|\theta - \theta_0\| > \delta, \ \delta > 0.$$
Condition 3) is therefore satisfied in general, which implies consistency for the SMCVM estimators, i.e., $\hat{\theta}_S \xrightarrow{p} \theta_0$. Note that since $Q_n(\theta)$ is always bounded, it is not surprising that it generates robust estimators. For more on robustness in the sense of bounded influence functions for the SMCVM estimators, see section (2.2.2). Also, observe that $\hat{\theta}_S$ remains consistent even if the parametric models are only hybrid, i.e., with some discontinuity points, as in the case of the compound Poisson models. We now turn to asymptotic normality for $\hat{\theta}_S$: we first discuss informally the arguments used to establish it, and the formal arguments then follow from the proofs of Theorem 3.3 by Pakes and Pollard [
Since $Q_n(\theta) = \|G_n(\theta)\|^2$ is not differentiable, the traditional Taylor expansion argument cannot be used to establish asymptotic normality of estimators obtained by minimizing $\|G_n(\theta)\|^2$. Here, we assume that $G(\theta)$ is differentiable with derivative matrix $\Gamma(\theta)$, meaning Fréchet differentiable with respect to the norm $\|\cdot\|$ for $l^2$; see Luenberger [
If the property of differentiability holds, then we can define the random function $Q_n^a(\theta)$ to approximate $Q_n(\theta)$, with
$$Q_n^a(\theta) = \|L_n(\theta)\|^2, \quad L_n(\theta) = G_n(\theta_0) + \Gamma(\theta_0)(\theta - \theta_0). \tag{19}$$
Let $\hat{\theta}_S$ and $\theta^*$ be the vectors which minimize $Q_n(\theta)$ and $Q_n^a(\theta)$ respectively. The idea behind the proof of asymptotic normality in Theorem 3.3 of Pakes and Pollard is that if the approximation of the original, nondifferentiable objective function $Q_n(\theta)$ by the differentiable one $Q_n^a(\theta)$ is of the right order, then $\hat{\theta}_S$ and $\theta^*$ are asymptotically equivalent, i.e., we have:
1) $\sqrt{n}(\hat{\theta}_S - \theta_0) = \sqrt{n}(\theta^* - \theta_0) + o_p(1)$ or, using equality in distribution, $\sqrt{n}(\hat{\theta}_S - \theta_0) \stackrel{d}{=} \sqrt{n}(\theta^* - \theta_0)$, and it is easy to see that $\theta^*$ can be expressed explicitly through $\theta^* - \theta_0 = -(\Gamma'\Gamma)^{-1}\Gamma' G_n(\theta_0)$, $\Gamma = \Gamma(\theta_0)$, since $L_n(\theta)$ is an affine transformation.
2) $Q_n(\hat{\theta}_S) = Q_n^a(\theta^*) + o_p(n^{-1})$, where $o_p(n^{-1})$ is an expression converging to 0 in probability at a faster rate than $n^{-1}$.
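The explicit form of the minimizer of $Q_n^a(\theta)$ follows from the usual least squares normal equations. The short derivation below is a sketch that assumes $\Gamma'\Gamma$ is invertible, which holds when $\Gamma$ has full rank $m$.

```latex
\begin{align*}
Q_n^a(\theta) &= \left\| G_n(\theta_0) + \Gamma(\theta - \theta_0) \right\|^2, \\
\frac{\partial Q_n^a(\theta)}{\partial \theta}
  &= 2\,\Gamma'\left( G_n(\theta_0) + \Gamma(\theta - \theta_0) \right) = 0, \\
\Gamma'\Gamma\,(\theta^* - \theta_0) &= -\,\Gamma' G_n(\theta_0)
  \quad\Longrightarrow\quad
  \sqrt{n}\,(\theta^* - \theta_0) = -(\Gamma'\Gamma)^{-1}\sqrt{n}\,\Gamma' G_n(\theta_0).
\end{align*}
```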
Note that the matrix $\Gamma(\theta_0)$ is of rank $m$, with $m$ columns but an infinite number of rows, and is given by
$$\Gamma = \Gamma(\theta_0) = \frac{1}{\sqrt{n}}(b_{ij}) \ \text{with} \ b_{ij} = -\frac{\partial F_{\theta_0}(x_i)}{\partial \theta_j}, \quad i = 1, \cdots, n, \ j = 1, \cdots, m$$
and
$$b_{ij} = 0, \quad i = n+1, \cdots, \ j = 1, \cdots, m.$$
An estimate of this matrix $\Gamma = \Gamma(\theta_0)$ is $\hat{\Gamma}_n$, defined by expression (34) in section (2.2); consequently, we can estimate $\frac{\partial F_{\theta_0}(x_i)}{\partial \theta_j}$ by $-\hat{b}_{ij}$, $i = 1, \cdots, n$, $j = 1, \cdots, m$, using the corresponding elements $\hat{\Gamma}_n(i,j)$ extracted from $\hat{\Gamma}_n$,
$$-\hat{b}_{ij} = -\sqrt{n}\,\hat{\Gamma}_n(i,j), \quad i = 1, \cdots, n, \ j = 1, \cdots, m. \tag{20}$$
Under these conditions, it suffices to work with $\theta^*$ and $Q_n^a(\theta^*)$ to derive the asymptotic distribution of $\hat{\theta}_S$. The regularity condition of their Theorem 3.3 which ensures that the approximation is of the right order, and which is the most difficult to check, is given as
$$\sup_{\|\theta - \theta_0\| \le \delta_n} \frac{\|G_n(\theta) - G(\theta) - G_n(\theta_0)\|}{n^{-1/2} + \|G_n(\theta)\| + \|G_n(\theta_0)\|} = o_p(1)$$
by Pakes and Pollard [
A slightly more stringent condition which obviously implies the above regularity condition is
$$\sup_{\|\theta - \theta_0\| \le \delta_n} \sqrt{n}\,\|G_n(\theta) - G(\theta) - G_n(\theta_0)\| = o_p(1). \tag{21}$$
For simulated methods for this condition to hold, in general independent samples for each θ cannot be used, see Pakes and Pollard [
In this section, we shall state Theorem 2 which is essentially Theorem (3.3) given by Pakes and Pollard [
Theorem 2
Let $\tilde{\theta}$ be a vector of consistent estimators for $\theta_0$, the unique vector which satisfies $G(\theta_0) = 0$.
Under the following conditions:
1) The parameter space $\Omega$ is compact.
2) $\|G_n(\tilde{\theta})\| \le o_p(n^{-1/2}) + \inf_{\theta \in \Omega} \|G_n(\theta)\|$.
3) $G(\cdot)$ is differentiable at $\theta_0$ with a derivative matrix $\Gamma = \Gamma(\theta_0)$ of full rank.
4) $\sup_{\|\theta - \theta_0\| \le \delta_n} \sqrt{n}\,\|G_n(\theta) - G(\theta) - G_n(\theta_0)\| = o_p(1)$ for every sequence $\{\delta_n\}$ of positive numbers which converges to zero.
5) $\sqrt{n}\,\|G_n(\theta_0)\| = O_p(1)$, where $O_p(1)$ is an expression bounded in probability.
6) $\theta_0$ is an interior point of the parameter space $\Omega$, assumed to be compact.
Then, we have the following representation, which will give the asymptotic distribution of $\tilde{\theta}$ in Corollary 1, i.e.,
$$\sqrt{n}(\tilde{\theta} - \theta_0) = -(\Gamma'\Gamma)^{-1}\sqrt{n}\,\Gamma' G_n(\theta_0) + o_p(1), \tag{22}$$
or equivalently, using equality in distribution,
$$\sqrt{n}(\tilde{\theta} - \theta_0) \stackrel{d}{=} -(\Gamma'\Gamma)^{-1}\sqrt{n}\,\Gamma' G_n(\theta_0). \tag{23}$$
The proofs of these results follow from the results used to prove Theorem 3.3 given by Pakes and Pollard [
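Therefore, for version D,
$$\sqrt{n}(\tilde{\theta} - \theta_0) \stackrel{d}{=} -(\Gamma'\Gamma)^{-1}\sqrt{n}\,\Gamma' G_n(\theta_0),$$
with $G_n(\theta)$ as defined by expression (13).
And for version S,
$$\sqrt{n}(\hat{\theta}_S - \theta_0) \stackrel{d}{=} -(\Gamma'\Gamma)^{-1}\sqrt{n}\,\Gamma' G_n(\theta_0),$$
with $G_n(\theta)$ as defined by expression (16).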
From the result of the Theorem, it is easy to see that we can obtain the main result of the following corollary which gives the asymptotic covariance matrix of the estimators.
Corollary 1.
Let $Y_n = \sqrt{n}\,\Gamma' G_n(\theta_0)$. If $Y_n \xrightarrow{L} N(0, V)$ and $\Gamma'\Gamma \xrightarrow{p} A$, where $A$ is of full rank and symmetric, then $\sqrt{n}(\tilde{\theta} - \theta_0) \xrightarrow{L} N(0, D)$ with
$$D = A^{-1} V A^{-1}. \tag{24}$$
The matrices $D$ and $V$ depend on $\theta_0$, and we adopt the notations $D = D(\theta_0)$, $V = V(\theta_0)$.
These results are proved by Pakes and Pollard [
$$g_n(\theta) = n\,\|G_n(\theta) - G(\theta) - G_n(\theta_0)\|^2, \quad n = 1, 2, \cdots$$
Assumption 1
1) As $n \to \infty$ and $\theta \to \theta_0$, for version S of CVM estimation
$$E_{|x}\left\{\left(F_{\theta_0}^S(x) - F_{\theta_0}(x)\right)\left(F_\theta^S(x) - F_\theta(x)\right)\right\} \to E_{|x}\left\{\left(F_{\theta_0}^S(x) - F_{\theta_0}(x)\right)^2\right\}, \tag{25}$$
where $E_{|x}\{\cdot\}$ is the conditional expectation, given $x$, of the expression inside the braces.
2) The sequence of functions $g_n^e(\theta)$ is differentiable with continuous partial derivatives, where $g_n^e(\theta) = E(g_n(\theta))$, the expectation is taken under $\theta_0$ and, using the usual conditioning argument, it can also be expressed as
$$g_n^e(\theta) = \int_{-\infty}^{\infty} \left( n E_{|x}\left\{\left(F_{\theta_0}^S(x) - F_{\theta_0}(x)\right)^2\right\} + n E_{|x}\left\{\left(F_\theta^S(x) - F_\theta(x)\right)^2\right\} - 2n E_{|x}\left\{\left(F_{\theta_0}^S(x) - F_{\theta_0}(x)\right)\left(F_\theta^S(x) - F_\theta(x)\right)\right\} \right) dF_{\theta_0}(x). \tag{26}$$
For condition 1) of Assumption 1 to hold, we cannot use independent simulated samples for different values of $\theta$ in version S of CVM estimation; otherwise
$E_{|x}\left\{\left(F_{\theta_0}^S(x) - F_{\theta_0}(x)\right)\left(F_\theta^S(x) - F_\theta(x)\right)\right\} = 0$ and $g_n^e(\theta)$ cannot converge to 0. This is why the same seed must be used to generate the random samples for different values of $\theta$.
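The role of the common seed can be illustrated numerically. The sketch below uses an exponential scale model, $X = \theta E$ with $E$ a unit exponential, purely as an assumed stand-in because it is cheap to simulate; it is not one of the models in this paper, and the function names are hypothetical. It compares how much the simulated distribution function at a fixed point fluctuates along a grid of $\theta$ values when the seed is shared and when it changes with $\theta$.

```python
import random

def simulated_cdf(point, theta, size, seed):
    """F_theta^S(point) for the stand-in model X = theta * E, E ~ Exp(1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(size) if theta * rng.expovariate(1.0) <= point)
    return hits / size

def total_variation(values):
    """Sum of absolute changes between neighbouring grid points."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

thetas = [1.0 + 0.01 * k for k in range(51)]
# Same seed for every theta: the simulated draws move smoothly with theta.
common = [simulated_cdf(1.0, t, 2000, seed=7) for t in thetas]
# A different seed for every theta: independent simulation noise at each grid point.
fresh = [simulated_cdf(1.0, t, 2000, seed=1000 + k) for k, t in enumerate(thetas)]

print(total_variation(common), total_variation(fresh))
```

With a shared seed the simulated distribution function is monotone in $\theta$ for this stand-in model, so neighbouring values of $\theta$ give nearly identical simulated samples; with independent seeds the simulation noise does not cancel between $\theta$ and $\theta_0$.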
We shall proceed to check the regularity conditions for both versions of MCVM estimation. Note that saying $\Gamma(\theta)$ is the derivative of $G(\theta)$ in $l^2$ means that $\Gamma = \Gamma(\theta_0)$ is the Fréchet derivative at $\theta = \theta_0$, with the property
$$\|G(\theta) - G(\theta_0) - \Gamma(\theta - \theta_0)\| = o(\|\theta - \theta_0\|).$$
As for the Euclidean space, the sufficient condition for differentiability here only requires the partial derivatives $\frac{\partial F_\theta(x)}{\partial \theta_j}$ to be continuous with respect to $\theta$. For the notion of derivative in Hilbert space, see the notion of Fréchet derivative in Luenberger [
We proceed to find the asymptotic distribution of $\sqrt{n}\,\Gamma' G_n(\theta_0)$. Using expression (22) and expression (23), we shall obtain the asymptotic covariance matrix for the MCVM estimators for both versions. For version D, the asymptotic covariance matrix has been obtained by Duchesne et al. [
$$R(F_n) = -\Gamma' G_n(\theta_0) = \int_{-\infty}^{\infty} \left(F_n(u) - F_{\theta_0}(u)\right) \frac{\partial F_{\theta_0}(u)}{\partial \theta}\, dF_n(u), \quad \frac{\partial F_{\theta_0}(u)}{\partial \theta} = \left.\frac{\partial F_\theta(u)}{\partial \theta}\right|_{\theta = \theta_0}$$
and consider the vector of influence functions
$$IC_1(x) = \left.\frac{\partial R(F_\epsilon)}{\partial \epsilon}\right|_{\epsilon = 0}, \quad F_\epsilon = F + \epsilon(\delta_x - F), \tag{27}$$
where $\delta_x$ is the degenerate distribution at the point $x$, $F = F_{\theta_0}$ and $0 \le \epsilon \le 1$. The influence function $IC_1(x)$ is bounded provided that $\frac{\partial F_{\theta_0}(u)}{\partial \theta}$, as a vector of functions of $u$, is bounded, which implies that the MCVM estimators are robust for version D. We shall assume implicitly that this property of bounded influence functions holds; we shall see that this also makes version S robust. Furthermore, based on standard results of robust estimation theory, the representations given by expressions (28) and (31) using influence functions are valid for the statistical functionals being considered. Now since $R(F_{\theta_0}) = 0$,
$$\sqrt{n}\,R(F_n) = -\sqrt{n}\,\Gamma' G_n(\theta_0) = \sqrt{n}\left(R(F_n) - R(F_{\theta_0})\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} IC_1(x_i) + o_p(1). \tag{28}$$
This is the influence function representation of $R(F_n) = -\Gamma' G_n(\theta_0)$ for version D, and we have $\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{L} N(0, D_1)$ with $D_1 = A^{-1} V_1 A^{-1}$ for version D, where $V_1$ is the covariance matrix of $IC_1(x)$ and $IC_1(x)$ is given by expression (2.15) in Duchesne et al. [
$$IC_1(x) = \int_{-\infty}^{\infty} \left(\delta_x(u) - F_{\theta_0}(u)\right) \frac{\partial F_{\theta_0}(u)}{\partial \theta}\, dF_{\theta_0}(u), \tag{29}$$
$E(IC_1(x)) = 0$, since $E_{\theta_0}(\delta_x(u)) = F_{\theta_0}(u)$.
Replacing $\frac{\partial F_{\theta_0}(u)}{\partial \theta}$ by $\frac{\partial F_{\hat{\theta}}(u)}{\partial \theta}$ and $F_{\theta_0}(u)$ by $F_n(u)$ in the above expression leads to approximating the vector $IC_1(x)$ by $\widetilde{IC}_1(x)$, with its elements given by
$$\widetilde{IC}_{1l}(x) = \frac{1}{n}\sum_{j=1}^{n} \left(I[x_j \ge x] - F_n(x_j)\right) \frac{\partial F_{\hat{\theta}}(x_j)}{\partial \theta_l}, \quad l = 1, \cdots, m.$$
An estimate for the covariance matrix $V_1$ can be defined as
$$\tilde{V}_1 = \frac{1}{n}\sum_{i=1}^{n} \widetilde{IC}_1(x_i)\, \widetilde{IC}_1(x_i)'. \tag{30}$$
Using $\tilde{V}_1$, an estimate for the asymptotic covariance matrix of $\hat{\theta}$ can be constructed, see expression (2.15) and expression (2.13) given by Duchesne et al. [
Note that the property of asymptotic normality continues to hold even if the parametric model fails to be continuous and is only hybrid, as in the compound Poisson gamma case. Using the arguments of the next paragraph to establish asymptotic normality, the same conclusion can be reached for version S. The derivation of the asymptotic covariance matrix $D_2$ for the SMCVM estimators is similar. We shall make use of the notion of bivariate statistical functional introduced by expression (1.6) given by Reid [
$$B(F_n, F_{\theta_0}^S) = -\Gamma' G_n(\theta_0) = \int_{-\infty}^{\infty} \left(F_n(u) - F_{\theta_0}^S(u)\right) \frac{\partial F_{\theta_0}(u)}{\partial \theta}\, dF_n(u).$$
We have a representation which is similar to the one given by expression (28) but uses both $IC_1(x)$ and $IC_2(y)$, with
$IC_2(y) = \left.\frac{\partial B(F_\epsilon, F_\tau)}{\partial \tau}\right|_{\epsilon = 0, \tau = 0}$, where $F_\epsilon$ is as defined by expression (27) and $F_\tau$ is similarly defined with $F_\tau = F + \tau(\delta_y - F)$, $\delta_y$ is the degenerate distribution at $y$ and $0 \le \tau \le 1$. Note that $IC_1(x)$ as given by expression (29) can also be recovered from the bivariate statistical functional with $IC_1(x) = \left.\frac{\partial B(F_\epsilon, F_\tau)}{\partial \epsilon}\right|_{\epsilon = 0, \tau = 0}$.
Based on the expression defining $B(F_n, F_{\theta_0}^S)$, we have $IC_2(y) = -IC_1(y)$, and $IC_1(x)$ is identical for versions D and S. Therefore, for version S, we have the representation
$$\sqrt{n}\,B(F_n, F_{\theta_0}^S) = -\sqrt{n}\,\Gamma' G_n(\theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} IC_1(x_i) + \frac{\sqrt{n}}{U}\sum_{i=1}^{U} IC_2(y_i) + o_p(1). \tag{31}$$
Note that the size of the random sample drawn from the model distribution is $U = \tau n$, and the $y_i$'s are iid with the same distribution as the $x_i$'s but are independent of the $x_i$'s, as the simulated sample is independent of the original sample represented by the data. Therefore,
$$\sqrt{n}\,\Gamma' G_n(\theta_0) \xrightarrow{L} N(0, V), \quad V = \left(1 + \frac{1}{\tau}\right) V_1. \tag{32}$$
It is also clear that the elements of $\Gamma'\Gamma$ are given by $a_{ij} = \int_{-\infty}^{\infty} \frac{\partial F_{\theta_0}(u)}{\partial \theta_i} \frac{\partial F_{\theta_0}(u)}{\partial \theta_j}\, dF_n(u)$, $i = 1, \cdots, m$, $j = 1, \cdots, m$, which converge in probability to the corresponding elements $\bar{a}_{ij}$ of the matrix $A$, with
$$\bar{a}_{ij} = \int_{-\infty}^{\infty} \frac{\partial F_{\theta_0}(u)}{\partial \theta_i} \frac{\partial F_{\theta_0}(u)}{\partial \theta_j}\, dF_{\theta_0}(u), \quad i = 1, \cdots, m, \ j = 1, \cdots, m, \tag{33}$$
i.e., $A = (\bar{a}_{ij})$, $i = 1, \cdots, m$, $j = 1, \cdots, m$.
The asymptotic covariance matrix of $\hat{\theta}_S$ can be estimated if we can estimate $\Gamma = \Gamma(\theta_0)$. Using a result given by Pakes and Pollard (p. 1043), an estimate for $\Gamma$ is the matrix
$$\hat{\Gamma}_n = \left[\frac{G_n(\hat{\theta}_S + \epsilon_n e_1) - G_n(\hat{\theta}_S)}{\epsilon_n}, \cdots, \frac{G_n(\hat{\theta}_S + \epsilon_n e_m) - G_n(\hat{\theta}_S)}{\epsilon_n}\right], \tag{34}$$
where $e_i = (0, 0, \cdots, 1, 0, \cdots, 0)'$ with 1 occurring at the $i$th entry of the vector $e_i$, and $\epsilon_n = n^{-\delta}$, $\delta \le \frac{1}{2}$; in general we can let $\delta = \frac{1}{2}$. Note that the columns of $\hat{\Gamma}_n$ estimate the corresponding columns of $\Gamma(\theta_0)$, whose elements depend on $\frac{\partial F_{\theta_0}(x_i)}{\partial \theta_j}$, $i = 1, \cdots, n$, $j = 1, \cdots, m$, as mentioned in section (2.2).
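The finite-difference construction of the estimate of $\Gamma$ can be sketched as below. This is a generic illustration under stated assumptions: `g` is any user-supplied function returning the first $n$ entries of $G_n(\theta)$ as a list (for version S it should reuse the same simulated random stream at every $\theta$), and the function name and arguments are hypothetical.

```python
def finite_difference_gamma(g, theta_hat, n, delta=0.5):
    """Estimate Gamma column by column with step eps_n = n ** (-delta).

    g         : callable mapping a parameter list to the list of entries of G_n(theta)
    theta_hat : list of parameter estimates
    n         : sample size used to set the step size
    Returns a list of rows, one row per entry of G_n and one column per parameter.
    """
    eps = n ** (-delta)
    base = g(list(theta_hat))
    columns = []
    for j in range(len(theta_hat)):
        shifted = list(theta_hat)
        shifted[j] += eps  # theta_hat + eps_n * e_j
        g_shifted = g(shifted)
        columns.append([(a - b) / eps for a, b in zip(g_shifted, base)])
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]
```

For example, with a linear map such as `g = lambda t: [2 * t[0] + t[1], -t[1], 3 * t[0]]` the function recovers the constant Jacobian up to rounding error, which is a convenient way to check an implementation before applying it to $G_n$.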
Therefore, using the results of Corollary 1, we have the asymptotic distribution for version S,
$$\sqrt{n}(\hat{\theta}_S - \theta_0) \xrightarrow{L} N(0, D_2) \ \text{with} \ D_2 = \left(1 + \frac{1}{\tau}\right) A^{-1} V_1 A^{-1}. \tag{35}$$
The factor $1 + \frac{1}{\tau}$ represents the loss of overall efficiency due to simulations and can be controlled if we let $\tau \ge 10$. This factor is identical to the one for the simulated unweighted minimum chi-square method or the one for the simulated quasi-likelihood method, see Pakes and Pollard [
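A short worked calculation shows the size of this loss. It uses only the factor $1 + \frac{1}{\tau}$ from expression (35); the comparison value $\tau = 1$ is included here purely for illustration.

```latex
\begin{align*}
\tau = 10:&\quad 1 + \tfrac{1}{\tau} = 1.1,
  \qquad \sqrt{1.1} \approx 1.049
  \quad \text{(asymptotic standard errors about 5\% larger than for version D)}, \\
\tau = 1:&\quad 1 + \tfrac{1}{\tau} = 2,
  \qquad \sqrt{2} \approx 1.414
  \quad \text{(asymptotic standard errors about 41\% larger)}.
\end{align*}
```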
Define $\widehat{IC}_1(x)$ with its elements given by
$$\widehat{IC}_{1l}(x) = \frac{1}{n}\sum_{j=1}^{n} \left(I[x_j \ge x] - F_n(x_j)\right)\left(-\hat{b}_{jl}\right), \quad l = 1, \cdots, m,$$
where the $\hat{b}_{jl} = -\widehat{\frac{\partial F_{\theta_0}(x_j)}{\partial \theta_l}}$, $j = 1, \cdots, n$, $l = 1, \cdots, m$, are as given by expression (20).
An estimate for $V_1$ for version S can then be defined as
$$\hat{V}_1 = \frac{1}{n}\sum_{i=1}^{n} \widehat{IC}_1(x_i)\, \widehat{IC}_1(x_i)'. \tag{36}$$
Consequently, an estimate $\hat{D}_2$ for $D_2$ can be defined as
$$\hat{D}_2 = \left(1 + \frac{1}{\tau}\right) \left(\hat{\Gamma}_n' \hat{\Gamma}_n\right)^{-1} \hat{V}_1 \left(\hat{\Gamma}_n' \hat{\Gamma}_n\right)^{-1}. \tag{37}$$
Clearly with D 2 ^ available, it will facilitate hypothesis testing for the parameters of the model.
The MM method consists of matching the empirical cumulants with their model counterparts to form estimating equations, whose solutions give the moment estimators. For the compound gamma model of Example 1 this leads to the system of equations
$$c_{1n} = \bar{X} = \lambda\alpha\beta, \quad s^2 = \lambda\alpha\beta^2(\alpha+1), \quad c_{3n} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^3 = \lambda\alpha\beta^3(\alpha+1)(\alpha+2).$$
The sample mean and variance are given respectively by $\bar{X}$ and $s^2$, and the moment estimators can be obtained explicitly. From these equations, let $r_{3n} = \frac{c_{3n}}{s^2} = \beta(\alpha+2)$ and $r_{2n} = \frac{s^2}{\bar{X}} = \beta(\alpha+1)$, which implies $\frac{r_{3n}}{r_{2n}} = \frac{\alpha+2}{\alpha+1}$; solving the last equation for $\alpha$ gives $\hat{\alpha}_M$, the MM estimator for $\alpha$, with $\hat{\alpha}_M = \frac{2r_{2n} - r_{3n}}{r_{3n} - r_{2n}}$. Since the parameter $\alpha \ge 1$, we might want to define the moment estimator as $\tilde{\alpha}_M = \max(\hat{\alpha}_M, 1)$. It is not difficult to obtain $\hat{\beta}_M = \frac{r_{2n}}{\hat{\alpha}_M + 1}$ and $\hat{\lambda}_M = \frac{\bar{X}}{\hat{\alpha}_M \hat{\beta}_M}$, the corresponding MM estimators for $\beta$ and $\lambda$; when we also consider the constraints imposed on $\beta$ and $\lambda$, this leads to defining
$$\tilde{\beta}_M = \max(\hat{\beta}_M, 0) \ \text{and} \ \tilde{\lambda}_M = \max(\hat{\lambda}_M, 0).$$
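The explicit moment estimators for the compound gamma model can be sketched as follows. The function names are hypothetical, the sample variance is computed with divisor $n$ (an assumption, since the divisor is not specified in the text), and the raw estimators are returned without the constraint adjustments.

```python
def mm_from_cumulants(c1, c2, c3):
    """Solve the three cumulant equations for (alpha, beta, lam)."""
    r2 = c2 / c1            # equals beta * (alpha + 1) under the model
    r3 = c3 / c2            # equals beta * (alpha + 2) under the model
    alpha = (2.0 * r2 - r3) / (r3 - r2)
    beta = r2 / (alpha + 1.0)
    lam = c1 / (alpha * beta)
    return alpha, beta, lam

def mm_compound_gamma(sample):
    """Raw MM estimators from data, using empirical central moments with divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    c2 = sum((x - mean) ** 2 for x in sample) / n
    c3 = sum((x - mean) ** 3 for x in sample) / n
    return mm_from_cumulants(mean, c2, c3)

# Check with exact model cumulants for alpha = 3, beta = 2, lam = 5:
# c1 = 30, c2 = 240, c3 = 2400.
print(mm_from_cumulants(30.0, 240.0, 2400.0))
```

Since the estimators involve a third-order sample moment and a ratio with $r_{3n} - r_{2n}$ in the denominator, a few extreme observations can move them substantially, which is the lack of robustness discussed in the text.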
For the KWT model, there are five parameters, so besides the first three empirical cumulants as defined above we also need the fourth and fifth empirical cumulants,
$$c_{4n} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^4 - 3s^4, \quad c_{5n} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^5 - 10\,c_{3n} s^2,$$
and matching $c_{1n} = c_1$, $c_{2n} = c_2$, $c_{3n} = c_3$, $c_{4n} = c_4$, $c_{5n} = c_5$ will give the moment estimators as in the previous example. It might be easier to let $\delta = \eta^2$; from these estimating equations, it is not difficult to see that the two equations $\frac{c_{3n}}{c_{4n}} = \frac{c_3}{c_4}$ and $\frac{c_{5n}}{c_{4n}} = \frac{c_5}{c_4}$ depend only on $\delta$ and $\omega$ and can be solved numerically to obtain the MM estimators for $\delta$ and $\omega$, denoted respectively by $\hat{\delta}_M$ and $\hat{\omega}_M$. Also, using the first three equations we obtain
$$\hat{\lambda}_M = \frac{c_{3n}}{\hat{\omega}_M^3 + 6\hat{\delta}_M \hat{\omega}_M}, \quad \hat{\sigma}_M^2 = c_{2n} - \hat{\lambda}_M\left(2\hat{\delta}_M + \hat{\omega}_M^2\right), \quad \hat{\mu}_M = c_{1n} - \hat{\lambda}_M \hat{\omega}_M.$$
We might want to redefine these MM estimators by imposing $\hat{\lambda}_M \ge 0$, $\hat{\sigma}_M^2 \ge 0$.
In the limited simulation study, we draw $M = 100$ samples, each of size $n = 1000$, and use $U = 10000$, i.e., $\tau = 10$.
For the overall asymptotic relative efficiency (ARE) for the compound gamma model we use
$$ARE = \frac{MSE(\hat{\lambda}_S) + MSE(\hat{\alpha}_S) + MSE(\hat{\beta}_S)}{MSE(\tilde{\lambda}) + MSE(\tilde{\alpha}) + MSE(\tilde{\beta})},$$
where the mean square errors (MSE) are estimated using random samples and displayed in
$$MSE(\hat{\pi}) = E(\hat{\pi} - \pi_0)^2.$$
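The way such an overall efficiency ratio can be computed from simulation replicates is sketched below. The function names and the dictionary layout are assumptions made for illustration, and the numbers in the usage lines are toy values, not results from the simulation study.

```python
def mean_square_error(estimates, true_value):
    """Monte Carlo estimate of E(pi_hat - pi_0)^2 over replicates."""
    return sum((est - true_value) ** 2 for est in estimates) / len(estimates)

def overall_are(smcvm_runs, mm_runs, true_params):
    """Ratio of summed MSEs: SMCVM estimators in the numerator, MM in the denominator.

    smcvm_runs, mm_runs : dicts mapping a parameter name to its list of estimates
    true_params         : dict mapping a parameter name to its true value
    """
    numerator = sum(mean_square_error(smcvm_runs[name], value)
                    for name, value in true_params.items())
    denominator = sum(mean_square_error(mm_runs[name], value)
                      for name, value in true_params.items())
    return numerator / denominator

# Toy replicates for a single parameter (purely illustrative numbers).
toy_are = overall_are({"lam": [2.0, 4.0]}, {"lam": [1.0, 5.0]}, {"lam": 3.0})
print(toy_are)
```

A value below one indicates a smaller total mean square error for the SMCVM estimators than for the MM estimators.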
The range of the parameters being considered is given by
$$2 \le \alpha \le 10, \quad 1 \le \lambda \le 10, \quad 1 \le \beta \le 10.$$
We find that the SMCVM method is more efficient than the MM method; the order of the ARE gained by using the SMCVM method is illustrated with the results displayed in
α⋱λ | 1.00 | 5.00 | 6.00 | 7.00 | 8.00 | 9.00 | 10.00 |
---|---|---|---|---|---|---|---|
2.00 | 0.4726 | 0.0105 | 0.4970 | 0.6418 | 0.7768 | 0.0030 | 0.2702 |
4.00 | 1.1277 | 0.1560 | 0.0348 | 0.0929 | 0.0393 | 0.0000 | 0.2973 |
6.00 | 1.0468 | 0.0396 | 0.0906 | 0.0070 | 0.0449 | 0.0592 | 0.02834 |
8.00 | 0.9032 | 0.0196 | 0.0351 | 0.0124 | 0.0553 | 0.0068 | 0.0032 |
10.00 | 0.8560 | 0.0352 | 0.3730 | 0.0896 | 0.0010 | 0.0179 | 0.0180 |
The overall efficiency used for comparisons is $ARE = \frac{MSE(\hat{\lambda}_S) + MSE(\hat{\alpha}_S) + MSE(\hat{\beta}_S)}{MSE(\tilde{\lambda}) + MSE(\tilde{\alpha}) + MSE(\tilde{\beta})}$.
λ⋱ω | 0.005 | 0.006 | 0.007 | 0.008 | 0.009 | 0.010 |
---|---|---|---|---|---|---|
0.002 | 0.00000 | 0.00123 | 0.00099 | 0.00069 | 0.00045 | 0.00029 |
0.004 | 0.00070 | 0.00041 | 0.00036 | 0.00022 | 0.00012 | 0.00010 |
0.006 | 0.00038 | 0.00021 | 0.00015 | 0.00007 | 0.00004 | 0.00001 |
0.008 | 0.00019 | 0.00016 | 0.00005 | 0.00001 | 0.00000 | 0.00000 |
0.010 | 0.00018 | 0.00008 | 0.00001 | 0.00000 | 0.00000 | 0.00000 |
The overall efficiency used for comparisons is $ARE = \frac{MSE(\hat{\mu}_S) + MSE(\hat{\sigma}_S) + MSE(\hat{\lambda}_S) + MSE(\hat{\omega}_S) + MSE(\hat{\eta}_S)}{MSE(\tilde{\mu}) + MSE(\tilde{\sigma}) + MSE(\tilde{\lambda}) + MSE(\tilde{\omega}) + MSE(\tilde{\eta})}$.
similar findings.
For the KWT model we use the corresponding asymptotic relative efficiency (ARE), defined as
$$ARE = \frac{MSE(\hat{\mu}_S) + MSE(\hat{\sigma}_S) + MSE(\hat{\lambda}_S) + MSE(\hat{\omega}_S) + MSE(\hat{\eta}_S)}{MSE(\tilde{\mu}) + MSE(\tilde{\sigma}) + MSE(\tilde{\lambda}) + MSE(\tilde{\omega}) + MSE(\tilde{\eta})}.$$
The mean square errors (MSE) are defined as in the case of the compound gamma model and are again estimated using simulated samples. The ARE is a ratio with the total of the mean square errors of the SMCVM estimators in the numerator and the total of the mean square errors of the MM estimators in the denominator.
The key findings are illustrated using
0 ≤ λ ≤ 0.010, 0.005 ≤ ω ≤ 0.010 and 0 ≤ μ ≤ 0.001, 0 ≤ σ ≤ 0.008. With the results displayed in
It appears that the SMCVM method has the potential to generate more efficient estimators than the MM method, especially for models with more than two parameters. Like the SMHD method, it is robust, and it is easier to implement than the SMHD method as it is based on the sample distribution function instead of density estimates. It can handle continuous models with a few discontinuity points carrying probability masses, where the SMHD method might not be suitable, but in general it might be less efficient than the SMHD method for continuous models.
The helpful comments of an anonymous referee and the support of the staff of OJS, which led to an improvement of the presentation of the paper, are gratefully acknowledged.
Luong, A. and Blier-Wong, C. (2017) Simulated Minimum Cramér-Von Mises Distance Estimation for Some Actuarial and Financial Models. Open Journal of Statistics, 7, 815-833. https://doi.org/10.4236/ojs.2017.75058
In this technical appendix, we shall prove that under the conditions of Assumption 1, condition 4) of Theorem 2 holds, i.e.,
$$\sup_{\|\theta - \theta_0\| \le \delta_n} \sqrt{n}\,\|G_n(\theta) - G(\theta) - G_n(\theta_0)\| = o_p(1),$$
i.e., $\sqrt{n}\,\|G_n(\theta) - G(\theta) - G_n(\theta_0)\| \xrightarrow{p} 0$ uniformly as $\theta \to \theta_0$ and $n \to \infty$.
Now define the sequence of functions $g_n(\theta) = n\,\|G_n(\theta) - G(\theta) - G_n(\theta_0)\|^2$; it suffices to show that
$g_n(\theta) \xrightarrow{p} 0$ uniformly as $\theta \to \theta_0$ and $n \to \infty$.
Using a Markov type inequality, for any $\epsilon > 0$, we have
$$P(g_n(\theta) \ge \epsilon) \le \frac{g_n^e(\theta)}{\epsilon},$$
with $g_n^e(\theta) = E(g_n(\theta))$ as given by expression (26).
Consequently, it suffices to have $g_n^e(\theta) \to 0$ uniformly as $\theta \to \theta_0$ and $n \to \infty$. Clearly, under Assumption 1 we have $g_n^e(\theta) \to 0$ pointwise, but we need to strengthen this to uniform convergence for $\{g_n^e(\theta)\}$. Therefore, it suffices to have equicontinuity for the sequence $\{g_n^e(\theta)\}$, as the domain of the sequence of functions is compact, see Rudin [
For the notion of stochastic equicontinuity, a stochastic version of equicontinuity, see Newey and McFadden [