Certain distributions do not have a closed-form density, yet it is simple to draw samples from them. For such distributions, simulated minimum Hellinger distance (SMHD) estimation appears to be useful. Since the method is distance-based, it is naturally robust. This paper is a follow-up to a previous paper in which the SMHD estimators were only shown to be consistent; here we establish their asymptotic normality. For any parametric family of distributions for which all positive integer moments exist, the asymptotic properties of the SMHD method indicate that the variance of the SMHD estimators attains the lower bound for simulation-based estimators, namely the inverse of the Fisher information matrix adjusted by a constant that reflects the loss of efficiency due to simulation. These features suggest that the SMHD method is applicable in many fields, such as finance or actuarial science, where distributions without closed-form densities are often encountered.
In actuarial science and finance, we often need to fit a continuous distribution to data. In several instances, although the distribution does not have a closed-form density, it is not complicated to simulate from it. Such a distribution may, for example, be infinitely divisible. New distributions can also be created by means of a mixing mechanism.
For those distributions, Luong and Bilodeau [
It is conjectured that, asymptotically, the SMHD estimators could attain the lower bound given by the Fisher information matrix adjusted by a factor which is a constant reflecting the loss of efficiency due to simulations from the parametric models. This constant can be expressed as $(1 + 1/\tau)$, with $\tau = U/n$ assumed to remain constant, where $n$ is the original sample size of the data and $U$ is the simulated sample size used to estimate the model density function or distribution. This factor also appears in various simulation-based estimation methods and reflects the loss of efficiency due to the model density or distribution having to be estimated from a sample simulated from the model distribution. Section 2 further discusses this factor, which appears in the simulated unweighted minimum chi-square method and the simulated quasi-likelihood method.
In this paper, which can be viewed as a follow-up to the previous paper, we show that, under some regularity conditions, the SMHD estimators indeed follow an asymptotic normal distribution, and that the asymptotic covariance matrix is given by the inverse of the Fisher information matrix adjusted by the constant $(1 + 1/\tau)$, as conjectured in Luong and Bilodeau [
SMHD estimators are fully efficient among the class of simulated estimators, just as the maximum likelihood (ML) estimators are in the classical set-up.
We shall closely follow the work of Tamura and Boos [
We shall call version D the version based on a parametric family having a closed-form density. We extend the results to a simulated version, version S, where the parametric family requires a density estimate using a random sample drawn from the parametric family as introduced in Luong and Bilodeau [
Furthermore, since minimum distance estimators are generally robust, SMHD estimators are suitable whenever robustness is needed, for instance when there is evidence that the data are contaminated.
In actuarial science and finance, there are many useful distributions with semi-heavy tails and no closed-form densities which satisfy the requirements needed for asymptotic efficiency of the SMHD estimators. For examples in actuarial science, see Klugman, Panjer and Wilmot [
To establish asymptotic normality, we shall also make use of Theorem 7.1, given by Newey and McFadden [
For count data, Luong, Bilodeau and Blier-Wong [
The paper is organized as follows. The classical version, version D, is re-examined in section 2. We also extract the relevant results given by Tamura and Boos [
In this section, we shall review some of the results already established by Tamura and Boos [
We shall define some notation before restating Theorem 4.1 given by Tamura and Boos [
We assume we have independent and identically distributed observations $X_1, \dots, X_n$ from a parametric family $\{f_\theta\}$, with $\theta = (\theta_1, \dots, \theta_m)'$, and the true vector of parameters is denoted by $\theta_0$. For version D as considered by Tamura and Boos [
$$Q_n(\theta) = \int_{-\infty}^{\infty} \left\{ [f_n(x)]^{1/2} - [f_\theta(x)]^{1/2} \right\}^2 dx, \qquad (1)$$
where
$$f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} \omega\!\left( \frac{x - x_i}{h_n} \right) \qquad (2)$$
is a kernel density estimate based on the sample with nonrandom bandwidth $h_n$, and $\omega$ is the kernel density used to obtain MHD estimators. This is version D, and we state the relevant results in this section.
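A minimal NumPy sketch of the kernel density estimate in Equation (2) follows. It uses the Epanechnikov kernel, which has compact support as condition 1 below requires; the kernel, the bandwidth $h_n = n^{-1/5}$, the evaluation grid and the seed are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: a density with compact support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kde(grid, sample, h):
    """f_n(x) = (1 / (n h)) * sum_i w((x - x_i) / h), evaluated at every grid point."""
    u = (grid[:, None] - sample[None, :]) / h  # shape (len(grid), n)
    return epanechnikov(u).sum(axis=1) / (sample.size * h)


rng = np.random.default_rng(2018)
n = 200
data = rng.standard_normal(n)
h_n = n ** (-1 / 5)               # illustrative: h_n -> 0 and n * h_n -> infinity
grid = np.linspace(-8.0, 8.0, 3201)
dx = grid[1] - grid[0]
f_n = kde(grid, data, h_n)
total_mass = f_n.sum() * dx       # Riemann-sum check that f_n integrates to about 1
```

Evaluating on a fixed grid keeps later integrals, such as Equation (1), simple Riemann sums.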
For version S, which we consider in section 3, the model density $f_\theta(x)$ is replaced by a density estimate $f_\theta^S(x)$ that is constructed similarly to $f_n(x)$ but using a random sample of size $U = \tau n$ drawn from $f_\theta(x)$ instead of the original sample given by the data. SMHD estimators are obtained by minimizing
$$Q_n^S(\theta) = \int_{-\infty}^{\infty} \left\{ [f_n(x)]^{1/2} - [f_\theta^S(x)]^{1/2} \right\}^2 dx \qquad (3)$$
and will be discussed in section 3.
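To make Equation (3) concrete, the sketch below evaluates $Q_n^S(\theta)$ on a grid for a stand-in normal location-scale model, chosen only because it is easy to simulate (the method itself targets models without closed-form densities). The simulated sample of size $U = \tau n$ is generated as $\mu + \sigma Z$ with $Z$ drawn from a fixed seed, so the same random numbers are reused for every $\theta$. The function names, kernel, bandwidth and seeds are hypothetical choices for illustration.

```python
import numpy as np


def kde(grid, sample, h):
    """Epanechnikov kernel density estimate evaluated on a grid."""
    u = (grid[:, None] - sample[None, :]) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return w.sum(axis=1) / (sample.size * h)


def smhd_objective(theta, data, tau, seed, grid, h):
    """Q_n^S(theta): squared Hellinger distance between f_n and f_theta^S on the grid."""
    mu, sigma = theta
    z = np.random.default_rng(seed).standard_normal(tau * data.size)  # U = tau * n draws
    simulated = mu + sigma * z          # sample from the stand-in model f_theta
    f_n = kde(grid, data, h)
    f_s = kde(grid, simulated, h)
    dx = grid[1] - grid[0]
    return np.sum((np.sqrt(f_n) - np.sqrt(f_s)) ** 2) * dx


rng = np.random.default_rng(1)
data = rng.standard_normal(100)         # observations generated under theta_0 = (0, 1)
grid = np.linspace(-10.0, 10.0, 1001)
h = data.size ** (-1 / 5)
q_true = smhd_objective((0.0, 1.0), data, tau=10, seed=7, grid=grid, h=h)
q_off = smhd_objective((1.0, 1.0), data, tau=10, seed=7, grid=grid, h=h)
```

The objective is expected to be noticeably smaller at the true parameter than at a shifted one; an SMHD estimate would be obtained by minimizing `smhd_objective` over $\theta$ with the seed held fixed.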
Let $s_\theta = (f_\theta)^{1/2}$, and denote the vector of its first partial derivatives by $\dot s_\theta$ and the matrix of its second partial derivatives by $\ddot s_\theta$. All the partial derivatives are assumed to be continuous with respect to $\theta$.
It is easy to see that, if we can interchange the order of integration and differentiation, $\int_{-\infty}^{\infty} \ddot f_\theta(x)\, dx$ is the $m \times m$ zero matrix, where $\ddot f_\theta$ is the matrix of second partial derivatives of $f_\theta$ (differentiate $\int_{-\infty}^{\infty} f_\theta(x)\, dx = 1$ twice). We implicitly assume that this requirement is met throughout.
From the definitions of $\dot s_\theta$ and $\ddot s_\theta$, and with the previous assumption, we have
$$\frac{\dot s_\theta}{2 (f_\theta)^{1/2}} = \frac{1}{4} \frac{\partial \log f_\theta}{\partial \theta} \qquad (4)$$
and
$$-\int_{-\infty}^{\infty} \ddot s_\theta\, [f_\theta(x)]^{1/2}\, dx = \int_{-\infty}^{\infty} \dot s_\theta\, \dot s_\theta'\, dx = \frac{I(\theta)}{4}, \qquad (5)$$
where $I(\theta)$ is the commonly used Fisher information matrix for ML estimation.
The following equalities can be used to derive Equations (4) and (5). By the rules of differentiation, we have
$$\dot s_\theta = \frac{1}{2 (f_\theta)^{1/2}} \frac{\partial f_\theta}{\partial \theta}$$
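and
$$\ddot s_\theta = -\frac{1}{4 (f_\theta)^{3/2}} \frac{\partial f_\theta}{\partial \theta} \frac{\partial f_\theta}{\partial \theta'} + \frac{1}{2 (f_\theta)^{1/2}} \frac{\partial^2 f_\theta}{\partial \theta\, \partial \theta'}.$$
Equations (4) and (5) then follow because
$$E_\theta\!\left[ \frac{1}{f_\theta} \frac{\partial^2 f_\theta}{\partial \theta\, \partial \theta'} \right] = \int_{-\infty}^{\infty} \frac{\partial^2 f_\theta(x)}{\partial \theta\, \partial \theta'}\, dx = 0$$
and
$$E_\theta\!\left[ \frac{\partial \log f_\theta}{\partial \theta} \frac{\partial \log f_\theta}{\partial \theta'} \right] = I(\theta),$$
assuming that the order of integration and differentiation can be interchanged, where $E_\theta[\cdot]$ denotes the expectation of the expression inside the brackets under $f_\theta$.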
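Equations (4) and (5) can be checked numerically on a model where everything is known in closed form. The sketch below assumes an exponential density $f_\lambda(x) = \lambda e^{-\lambda x}$ (our choice, with a single parameter, so $m = 1$), for which $I(\lambda) = 1/\lambda^2$; the derivative $\dot s_\lambda$ is computed by hand and the integral in Equation (5) is approximated by a Riemann sum.

```python
import numpy as np

lam = 2.0                               # hypothetical rate parameter of the exponential model
x = np.linspace(0.0, 40.0, 400_001)     # the tail beyond x = 40 is negligible for lam = 2
dx = x[1] - x[0]
f = lam * np.exp(-lam * x)

# s = f^{1/2} = lam^{1/2} exp(-lam x / 2); differentiating with respect to lam gives
# s_dot = (1 / (2 sqrt(lam)) - x sqrt(lam) / 2) * exp(-lam x / 2).
s_dot = (0.5 / np.sqrt(lam) - 0.5 * x * np.sqrt(lam)) * np.exp(-lam * x / 2)

integral = np.sum(s_dot**2) * dx        # approximates the integral of s_dot^2 in Equation (5)
fisher = 1.0 / lam**2                   # I(lam) for the exponential model
score = 1.0 / lam - x                   # d log f / d lam
lhs_eq4 = s_dot / (2.0 * np.sqrt(f))    # left-hand side of Equation (4)
```

Here `integral` should agree with `fisher / 4`, and `lhs_eq4` should coincide with `score / 4` at every grid point.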
Now, we consider the following expression,
$$\psi_\theta = -\left\{ \int_{-\infty}^{\infty} \ddot s_\theta\, [f_\theta(x)]^{1/2}\, dx \right\}^{-1} \frac{\dot s_\theta}{2 (f_\theta)^{1/2}}, \qquad (6)$$
as given by Tamura and Boos [
$$\psi_\theta = [I(\theta)]^{-1} \frac{\partial \log f_\theta}{\partial \theta}, \qquad (7)$$
where $I(\theta)$ is the Fisher information matrix and $\partial \log f_\theta / \partial \theta$ is the vector of score functions.
Tamura and Boos [
$$\rho_\theta = 4 [I(\theta)]^{-1} \dot s_\theta. \qquad (8)$$
With these equalities and simplifications, we shall restate Theorem 4.1 given by Tamura and Boos [
Theorem 1
Suppose there is a sequence of positive numbers $\{\alpha_n\}$ with $\alpha_n \to \infty$ as $n \to \infty$ such that conditions 1-8 below are met. Then the MHD estimators $\hat\theta$ obtained by minimizing Equation (1) have an asymptotic normal distribution and attain the Cramér-Rao lower bound based on the Fisher information matrix, i.e.,
$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{L} N\!\left(0, [I(\theta_0)]^{-1}\right),$$
and $\hat\theta$ is first-order efficient, i.e., asymptotically as efficient as $\hat\theta_{ML}$, the vector of classical ML estimators.
Here are the eight conditions to meet:
1) The kernel density $\omega$ used to construct the density estimate has compact support $W$, and the bandwidth $h_n$ satisfies $h_n + (n h_n)^{-1} \to 0$ as $n \to \infty$.
2) The parameter space $\Theta$ is compact and $\theta_0$ is an interior point.
3) The model is identifiable, i.e., if $\theta_1 \ne \theta_2$, then $f_{\theta_1} \ne f_{\theta_2}$.
4) $n \sup_{t \in W} \Pr\left[ |X - h_n t| > \alpha_n \right] \to 0$ as $n \to \infty$.
5) The ratio $\dfrac{\int_{-\alpha_n}^{\alpha_n} \left| \partial \log f_\theta(x) / \partial \theta \right| dx}{n^{1/2} h_n} \to 0$ as $n \to \infty$.
6) The sequence $M_n = \sup_{|x| \le \alpha_n} \sup_{t \in W} \dfrac{f_\theta(x + h_n t)}{f_\theta(x)}$ is bounded as $n \to \infty$.
7) The Fisher information matrix $I(\theta)$ exists, and we can interchange the order of differentiation and integration so that Equation (5) holds.
8) The function $s_\theta = (f_\theta)^{1/2}$ has a first partial derivatives vector $\dot s_\theta$ and a second partial derivatives matrix $\ddot s_\theta$, and all the partial derivatives are continuous with respect to $\theta$.
Conditions 1, 2 and 3 are standard and easily satisfied. Regarding condition 4, Tamura and Boos [
We follow these recommendations and show in section 3.2 that, for version S, the SMHD estimators given by the vector $\hat\theta_S$ which minimizes the objective function in Equation (3) have the following asymptotic normal distribution:
$$\sqrt{n}\,(\hat\theta_S - \theta_0) \xrightarrow{L} N\!\left(0, \left(1 + \frac{1}{\tau}\right) [I(\theta_0)]^{-1}\right).$$
The factor $(1 + 1/\tau)$ also appears in other simulated methods of inference. It is used to discount the efficiency of the unweighted minimum chi-square method to obtain the efficiency of its simulated version; see Pakes and Pollard [
Before we proceed, we would like to extract a few results given by Tamura and Boos [
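$$\sqrt{n}\,(\hat\theta - \theta) \overset{d}{=} \sqrt{n} \int_{-\infty}^{\infty} \rho_\theta \left\{ [f_n(x)]^{1/2} - [f_\theta(x)]^{1/2} \right\} dx \qquad (9)$$
$$\overset{d}{=} \sqrt{n} \int_{-\infty}^{\infty} \frac{\rho_\theta}{2 [f_\theta(x)]^{1/2}} \left[ f_n(x) - f_\theta(x) \right] dx \qquad (10)$$
$$\overset{d}{=} \sqrt{n} \int_{-\infty}^{\infty} \psi_\theta \left[ f_n(x) - f_\theta(x) \right] dx \qquad (11)$$
$$\overset{d}{=} \sqrt{n} \int_{-\infty}^{\infty} \psi_\theta \, d(F_n - F_\theta), \qquad (12)$$
where $F_n$ is the commonly used sample distribution function and $F_\theta$ is the model distribution function.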
Now, under the commonly used assumption $E_\theta\left[ \partial \log f_\theta(X) / \partial \theta \right] = 0$, which is justified if interchanging the order of integration and differentiation is permissible, the last equality can be re-expressed as
$$\sqrt{n}\,(\hat\theta - \theta) \overset{d}{=} \frac{1}{\sqrt{n}} [I(\theta)]^{-1} \sum_{i=1}^{n} \frac{\partial \log f_\theta(X_i)}{\partial \theta}, \qquad (13)$$
from which it is easy to see that $\hat\theta$ is asymptotically as efficient as $\hat\theta_{ML}$. Moreover, $\hat\theta$ is robust, whereas $\hat\theta_{ML}$ may not be. Using Equations (9)-(13), we also obtain the following equalities:
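$$2\sqrt{n} \int_{-\infty}^{\infty} \dot s_\theta \left\{ [f_n(x)]^{1/2} - [f_\theta(x)]^{1/2} \right\} dx \overset{d}{=} \sqrt{n} \int_{-\infty}^{\infty} \frac{\dot s_\theta}{[f_\theta(x)]^{1/2}} \left[ f_n(x) - f_\theta(x) \right] dx \qquad (14)$$
$$\overset{d}{=} \sqrt{n} \int_{-\infty}^{\infty} \frac{1}{2} \frac{\partial \log f_\theta(x)}{\partial \theta} \left[ f_n(x) - f_\theta(x) \right] dx \qquad (15)$$
$$\overset{d}{=} \frac{1}{2\sqrt{n}} \sum_{i=1}^{n} \frac{\partial \log f_\theta(X_i)}{\partial \theta}. \qquad (16)$$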
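The robustness remark made after Equation (13) can be illustrated with a small version D experiment. The sketch below assumes a normal location model with known unit scale, replaces 10% of the sample with gross outliers at $x = 10$, and minimizes Equation (1) by a simple grid search over $\mu$; the kernel, bandwidth, contamination scheme and grid search are illustrative assumptions, not the procedure used in the paper.

```python
import numpy as np


def kde(grid, sample, h):
    """Epanechnikov kernel density estimate evaluated on a grid."""
    u = (grid[:, None] - sample[None, :]) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return w.sum(axis=1) / (sample.size * h)


def normal_pdf(x, mu):
    """N(mu, 1) density, standing in for the closed-form model f_theta of version D."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)


rng = np.random.default_rng(3)
clean = rng.standard_normal(180)                    # true location mu_0 = 0
data = np.concatenate([clean, np.full(20, 10.0)])   # 10% gross outliers at x = 10
grid = np.linspace(-8.0, 16.0, 2401)
dx = grid[1] - grid[0]
f_n = kde(grid, data, h=data.size ** (-1 / 5))

candidates = np.linspace(-3.0, 3.0, 601)
q_values = [np.sum((np.sqrt(f_n) - np.sqrt(normal_pdf(grid, mu))) ** 2) * dx
            for mu in candidates]
mu_mhd = candidates[int(np.argmin(q_values))]       # MHD estimate of mu (version D)
mu_ml = data.mean()                                 # ML estimate of a normal location
```

The outliers pull the sample mean to roughly 1, while the MHD estimate is expected to stay near 0, because the Hellinger affinity gives almost no weight to regions where the fitted model has negligible density.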
Beran [
The equalities in Equations (9)-(16) will be used to establish asymptotic normality of SMHD estimators in section 3.2. We also need the notions of continuity in probability and differentiability in probability, which extend the corresponding notions of classical real analysis for nonrandom functions; they are presented in section 3.1.
These notions have been introduced and discussed for SMHD estimation for count data, see Luong, Bilodeau and Blier-Wong [
Definition 1 (Continuity in probability)
A sequence of random functions $\{g_n(\theta)\}$ is continuous in probability at $\theta^*$ if $g_n(\theta) \xrightarrow{p} g_n(\theta^*)$ whenever $\theta \to \theta^*$. Equivalently, for any $\varepsilon > 0$ and $\delta_1 > 0$, there exist $\delta > 0$ and $n_0$ such that
$$\Pr\left[ |g_n(\theta) - g_n(\theta^*)| \le \varepsilon \right] \ge 1 - \delta_1 \quad \text{for } n \ge n_0,$$
whenever $\|\theta - \theta^*\| \le \delta$. This can be viewed as a stochastic extension of the classical definition of continuity in real analysis.
It is well known that the supremum of a continuous function on a compact domain is attained at a point in the compact domain; see Davidson and Donsig [
In order to use Theorem 7.1 of Newey and McFadden [
$$S(\theta_0, \delta_n) = \{ \theta \mid \|\theta - \theta_0\| \le \delta_n \},$$
and we note that, as $n \to \infty$, $\delta_n \to 0$ and $S(\theta_0, \delta_n)$ shrinks to $\{\theta_0\}$.
Property 1
The random function $g_n(\theta)$, if continuous in probability and bounded in probability on a compact set $\Theta$, attains its supremum at a point of $\Theta$ in probability.
The justification of this property parallels the deterministic case, which is a classical result in real analysis. It suffices to pick a sequence $\{\theta_j\}$ in $\Theta$ such that $g_n(\theta_j) \xrightarrow{p} \sup_{\theta \in \Theta} g_n(\theta)$. Since $\Theta$ is compact, we can extract a subsequence $\{\theta_{j_k}\}$ of $\{\theta_j\}$ with $\theta_{j_k} \to \theta^{**} \in \Theta$. Then $g_n(\theta_{j_k}) \xrightarrow{p} g_n(\theta^{**})$, and hence $\sup_{\theta \in \Theta} g_n(\theta) \overset{p}{=} g_n(\theta^{**})$.
Besides the concept of continuity in probability, we also need the concept of differentiability in probability, which is given below.
Definition 2 (Differentiability in probability)
A sequence of random functions $\{g_n(\theta)\}$ is differentiable in probability with respect to $\theta$ at a point $\theta$ if
$$\lim_{\varepsilon \to 0} \frac{g_n(\theta + \varepsilon e_j) - g_n(\theta)}{\varepsilon} \overset{p}{=} v_n^{(j)}(\theta), \quad j = 1, 2, \dots, m,$$
where $e_j = (0, \dots, 0, 1, 0, \dots, 0)'$ has 1 in the $j$th entry only. We also require that $v_n^{(j)}(\theta)$ be continuous in probability for $j = 1, 2, \dots, m$.
We denote the derivatives vector by $v_n(\theta) = \left( v_n^{(1)}(\theta), \dots, v_n^{(m)}(\theta) \right)'$.
Definition 2 parallels the classical notion of differentiability in which each partial derivative of a nonrandom function is required to exist and to be continuous.
A similar notion of differentiability in probability has been used in the stochastic processes literature; see Gusak et al. [
Below are the assumptions needed to establish asymptotic normality of SMHD estimators in section 3.2; they appear to be satisfied in general.
For the simulated version, we implicitly assume that the sample size $U$ used to draw samples from the parametric family $\{f_\theta\}$ is proportional to the sample size $n$, i.e., $U = \tau n$. Moreover, the same seed is used across different values of $\theta$ to draw the simulated samples. Under these assumptions, $\{f_\theta^S\}$ can be viewed as a proxy model for $\{f_\theta\}$. However, unlike other methods of simulated inference, which require finding another parametric model different from $\{f_\theta\}$, the proxy model here is directly based on the parametric model itself.
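Why the same seed matters can be seen numerically. In the sketch below (a stand-in $N(\mu, 1)$ model, hypothetical helper names, Epanechnikov kernel), the change in $Q_n^S$ between $\mu$ and $\mu + \varepsilon$ is computed once with a common seed and once with fresh seeds, averaged over 20 repetitions. With a common seed the change should be of order $\varepsilon$, in line with differentiability in probability, whereas fresh seeds add simulation noise that does not vanish as $\varepsilon \to 0$.

```python
import numpy as np


def kde(grid, sample, h):
    """Epanechnikov kernel density estimate evaluated on a grid."""
    u = (grid[:, None] - sample[None, :]) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return w.sum(axis=1) / (sample.size * h)


def q_sim(mu, f_n, grid, h, n_sim, seed):
    """Q_n^S for a stand-in N(mu, 1) model simulated from the given seed."""
    simulated = mu + np.random.default_rng(seed).standard_normal(n_sim)
    f_s = kde(grid, simulated, h)
    return np.sum((np.sqrt(f_n) - np.sqrt(f_s)) ** 2) * (grid[1] - grid[0])


rng = np.random.default_rng(11)
data = rng.standard_normal(100)
tau, eps = 5, 1e-3
grid = np.linspace(-8.0, 8.0, 801)
h = data.size ** (-1 / 5)
f_n = kde(grid, data, h)
n_sim = tau * data.size                  # U = tau * n

same_seed, fresh_seeds = [], []
for k in range(20):
    base = q_sim(0.0, f_n, grid, h, n_sim, seed=k)
    same_seed.append(abs(q_sim(eps, f_n, grid, h, n_sim, seed=k) - base))
    fresh_seeds.append(abs(q_sim(eps, f_n, grid, h, n_sim, seed=1000 + k) - base))

mean_gap_same = float(np.mean(same_seed))     # expected to be of order eps
mean_gap_fresh = float(np.mean(fresh_seeds))  # dominated by simulation noise
```

Dividing the gaps by `eps` gives difference quotients: with a common seed they stay bounded, while with fresh seeds they grow without bound as `eps` shrinks.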
Assumption 1
The density of the parametric model has the continuity property $(f_\theta)^{1/2} \to (f_{\theta^*})^{1/2}$ whenever $\theta \to \theta^*$.
Assumption 2
The simulated counterpart has the continuity in probability property $(f_\theta^S)^{1/2} \xrightarrow{p} (f_{\theta^*}^S)^{1/2}$ whenever $\theta \to \theta^*$.
In general, assumption 2 is met if assumption 1 is.
Assumption 3
$(f_\theta)^{1/2}$ is differentiable with respect to $\theta$.
We also need the following assumption for applying Theorem 7.1 given by Newey and McFadden [
Assumption 4
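$(f_\theta^S)^{1/2}$, with the same seed used across different values of $\theta$, is differentiable in probability with the same derivatives vector as $(f_\theta)^{1/2}$, namely
$$\dot s_\theta = \left( \frac{\partial (f_\theta)^{1/2}}{\partial \theta_1}, \dots, \frac{\partial (f_\theta)^{1/2}}{\partial \theta_m} \right)'. \qquad (17)$$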
This assumption appears reasonable since $(f_\theta^S)^{1/2} \xrightarrow{p} (f_\theta)^{1/2}$ and $(f_\theta^S)^{1/2}$ is continuous in probability. Also, the partial derivatives in probability can be found using definition 2, which involves limits similar to those of the deterministic case in real analysis. In short, we assume that, since $(f_\theta)^{1/2}$ is differentiable, $(f_\theta^S)^{1/2}$ is differentiable in probability.
For version S, because the objective function to be minimized is nonsmooth, we will use Theorem 7.1 of Newey and McFadden [
The objective function $Q_n^S(\theta)$ is nonsmooth, and the estimators are given by the vector $\tilde\theta$ obtained by minimizing $Q_n^S(\theta)$. We consider the vector $\theta^*$ obtained by minimizing a smooth function $Q_n^a(\theta)$ which approximates $Q_n^S(\theta)$. We assume that $Q_n^S(\theta)$ is differentiable in probability at $\theta_0$, with derivatives vector $D_n(\theta_0)$.
Also, if $Q_n^S(\theta) \xrightarrow{p} Q(\theta)$, where $Q(\theta)$ is nonrandom, twice differentiable with $H = H(\theta_0) = \partial^2 Q(\theta_0) / \partial \theta\, \partial \theta'$, and attains its minimum at $\theta = \theta_0$, then we can define
$$Q_n^a(\theta) = Q_n^S(\theta_0) + [D_n(\theta_0)]'(\theta - \theta_0) + \frac{1}{2} (\theta - \theta_0)' H (\theta - \theta_0). \qquad (18)$$
The vector $\theta^*$ which minimizes $Q_n^a(\theta)$ can be obtained explicitly since $Q_n^a(\theta)$ is a quadratic function of $\theta$: $\theta^* = \theta_0 - H^{-1} D_n(\theta_0)$. Using equality in distribution, we have
$$\sqrt{n}\,(\theta^* - \theta_0) \overset{d}{=} -H^{-1} \sqrt{n}\, D_n(\theta_0). \qquad (19)$$
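Because $Q_n^a$ is quadratic, its minimizer follows from setting the gradient $D_n(\theta_0) + H(\theta - \theta_0)$ to zero. The small NumPy sketch below uses made-up values of $H$, $D_n(\theta_0)$ and $\theta_0$ (purely hypothetical numbers) to confirm that $\theta^* = \theta_0 - H^{-1} D_n(\theta_0)$ is the minimizer; solving the linear system instead of forming $H^{-1}$ explicitly is a standard numerical choice.

```python
import numpy as np

# Hypothetical ingredients of Equation (18): symmetric positive definite H,
# derivatives vector D_n(theta_0), the point theta_0 and the value Q_n^S(theta_0).
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
D = np.array([0.3, -0.2])
theta0 = np.array([1.0, 2.0])
q0 = 0.05


def q_a(theta):
    """Quadratic approximation Q_n^a(theta) of Equation (18)."""
    d = theta - theta0
    return q0 + D @ d + 0.5 * d @ H @ d


theta_star = theta0 - np.linalg.solve(H, D)   # theta* = theta_0 - H^{-1} D_n(theta_0)
gradient = D + H @ (theta_star - theta0)       # gradient of Q_n^a at theta*, should vanish
```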
If the remainder of the approximation is small, we also have
$$\sqrt{n}\,(\tilde\theta - \theta_0) \overset{d}{=} \sqrt{n}\,(\theta^* - \theta_0) \overset{d}{=} -H^{-1} \sqrt{n}\, D_n(\theta_0). \qquad (20)$$
Before defining the remainder term $R_n(\theta)$, we note that the approximation
$$Q_n^b(\theta) = Q_n^S(\theta_0) + [D_n(\theta_0)]'(\theta - \theta_0) + Q(\theta) - Q(\theta_0) \qquad (21)$$
can be viewed as equivalent to $Q_n^a(\theta)$: since $Q(\theta)$ is minimized at $\theta = \theta_0$, we have $\partial Q(\theta_0) / \partial \theta = 0$, so a second-order Taylor expansion gives $Q(\theta) - Q(\theta_0) \approx \frac{1}{2} (\theta - \theta_0)' H (\theta - \theta_0)$.
The remainder term is defined as
$$R_n(\theta) = \frac{\sqrt{n} \left\{ [Q_n^S(\theta) - Q_n^S(\theta_0)] - [D_n(\theta_0)]'(\theta - \theta_0) - [Q(\theta) - Q(\theta_0)] \right\}}{\|\theta - \theta_0\|}, \qquad (22)$$
and, for the approximation to be valid, we require that $\sup_{\|\theta - \theta_0\| \le \delta_n} |R_n(\theta)| \xrightarrow{p} 0$ as $n \to \infty$ and $\delta_n \to 0$.
The following Theorem 2 is essentially Theorem 7.1 given by Newey and McFadden [
Theorem 2
Suppose that $Q_n^S(\tilde\theta) \le \inf_{\theta \in \Theta} Q_n^S(\theta) + o_p(1/n)$, and
1) $Q(\theta)$ is minimized at $\theta = \theta_0$;
2) $\theta_0$ is an interior point of the compact parameter space $\Theta$;
3) $Q(\theta)$ is twice differentiable at $\theta = \theta_0$ with nonsingular matrix $H$;
4) $\sqrt{n}\, D_n(\theta_0) \xrightarrow{L} N(0, K)$;
5) $\sup_{\|\theta - \theta_0\| \le \delta_n} |R_n(\theta)| \xrightarrow{p} 0$ as $n \to \infty$ and $\delta_n \to 0$.
Then $\sqrt{n}\,(\tilde\theta - \theta_0) \xrightarrow{L} N(0, H^{-1} K H^{-1})$.
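As a quick consistency check of how Theorem 2 is used for SMHD estimation, the sketch below plugs $H = \frac{1}{2} I(\theta_0)$ (Equation (24)) and $K = (1 + 1/\tau)\, I(\theta_0)/4$, the limiting covariance of $\sqrt{n}\, D_n(\theta_0)$ implied by Equations (14)-(16) for the observed and simulated samples, into the sandwich form $H^{-1} K H^{-1}$. The Fisher information of a normal model with $\theta = (\mu, \sigma)$, $I = \mathrm{diag}(1/\sigma^2, 2/\sigma^2)$, and the values $\sigma = 2$, $\tau = 10$ are used only as an example.

```python
import numpy as np


def sandwich(H, K):
    """Asymptotic covariance H^{-1} K H^{-1} from Theorem 2."""
    H_inv = np.linalg.inv(H)
    return H_inv @ K @ H_inv


sigma, tau = 2.0, 10
fisher = np.diag([1.0 / sigma**2, 2.0 / sigma**2])  # normal model, theta = (mu, sigma)
H = 0.5 * fisher                                     # Equation (24): H = I / 2
K = (1.0 + 1.0 / tau) * fisher / 4.0                 # limiting covariance of sqrt(n) D_n
avar = sandwich(H, K)
target = (1.0 + 1.0 / tau) * np.linalg.inv(fisher)   # covariance in Equation (28)
```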
Conditions 1-3 of Theorem 2 are easily checked, and condition 4 is a consequence of the results already obtained for version D. Condition 5 is the most difficult to verify; because it involves technicalities, its verification is deferred to the end of this section.
Here, assuming all conditions can be verified, we apply Theorem 2 to SMHD estimation with $\tilde\theta = \hat\theta_S$.
Clearly, the objective function $Q_n^S(\theta)$ is as defined by Equation (3), and
$$Q_n^S(\theta) \xrightarrow{p} Q(\theta) = \int_{-\infty}^{\infty} \left\{ [f_{\theta_0}(x)]^{1/2} - [f_\theta(x)]^{1/2} \right\}^2 dx, \qquad (23)$$
since $(f_n)^{1/2} \xrightarrow{p} (f_{\theta_0})^{1/2}$ and $(f_\theta^S)^{1/2} \xrightarrow{p} (f_\theta)^{1/2}$ as $n \to \infty$.
Under the differentiability assumptions for $s_\theta$, the second derivatives matrix of $Q(\theta)$ at $\theta = \theta_0$ is
$$H = \frac{\partial^2 Q(\theta_0)}{\partial \theta\, \partial \theta'} = 2 \int_{-\infty}^{\infty} \dot s_{\theta_0}\, \dot s_{\theta_0}'\, dx = \frac{1}{2} I(\theta_0).$$
Therefore,
$$H^{-1} = 2 [I(\theta_0)]^{-1}. \qquad (24)$$
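Equation (24) can be checked on the exponential model $f_\lambda(x) = \lambda e^{-\lambda x}$ (our choice of example), where $Q$ has the closed form $Q(\lambda) = 2 - 4\sqrt{\lambda \lambda_0}/(\lambda + \lambda_0)$ and $I(\lambda) = 1/\lambda^2$. The sketch below approximates $H$ by a central second difference and compares it with $\frac{1}{2} I(\lambda_0)$; the values of $\lambda_0$ and the step size are illustrative.

```python
import numpy as np


def hellinger_q(lam, lam0):
    """Q(lam) = integral of (sqrt f_lam0 - sqrt f_lam)^2 for exponential densities, in closed form."""
    return 2.0 - 4.0 * np.sqrt(lam * lam0) / (lam + lam0)


lam0, step = 2.0, 1e-3
# Central second difference approximating H = d^2 Q / d lam^2 at lam = lam0.
h_numeric = (hellinger_q(lam0 + step, lam0) - 2.0 * hellinger_q(lam0, lam0)
             + hellinger_q(lam0 - step, lam0)) / step**2
h_theory = 0.5 / lam0**2   # H = I(lam0) / 2 with I(lam) = 1 / lam^2
```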
Using assumptions 1-4 and performing limit operations as when finding derivatives in real analysis, we conclude that $\sqrt{n}\, Q_n^S(\theta)$ is differentiable in probability at $\theta = \theta_0$ with derivatives vector $\sqrt{n}\, D_n(\theta_0)$, where
$$\sqrt{n}\, D_n(\theta_0) = -2\sqrt{n} \int_{-\infty}^{\infty} \left\{ [f_n(x)]^{1/2} - [f_{\theta_0}^S(x)]^{1/2} \right\} \dot s_{\theta_0}\, dx, \qquad (25)$$
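which can be re-expressed as
$$-\sqrt{n}\, D_n(\theta_0) = 2\sqrt{n} \int_{-\infty}^{\infty} \left\{ [f_n(x)]^{1/2} - [f_{\theta_0}(x)]^{1/2} \right\} \dot s_{\theta_0}\, dx - 2\sqrt{n} \int_{-\infty}^{\infty} \left\{ [f_{\theta_0}^S(x)]^{1/2} - [f_{\theta_0}(x)]^{1/2} \right\} \dot s_{\theta_0}\, dx. \qquad (26)$$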
Let us denote the first and second terms of Equation (26) by $Y_1$ and $Y_2$, respectively. Then $-\sqrt{n}\, D_n(\theta_0) = Y_1 - Y_2$, where $Y_1$ and $Y_2$ are independent since the simulated sample is independent of the original sample. Since $-\sqrt{n}\, D_n(\theta_0)$ has a limit distribution, it is bounded in probability, and $-\sqrt{n}\, D_n(\theta)$ is continuous in probability for $\theta \in S(\theta_0, \delta_n) = \{ \theta \mid \|\theta - \theta_0\| \le \delta_n \}$ for $n \ge n_0$, where $n_0$ is a positive integer, by invoking the Dominated Convergence Theorem if necessary.
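We have $-H^{-1} \sqrt{n}\, D_n(\theta_0) \overset{d}{=} H^{-1} Y_1 - H^{-1} Y_2$, and, using Equations (24)-(26), we can conclude
$$-H^{-1} \sqrt{n}\, D_n(\theta_0) \xrightarrow{d} N\!\left(0, \left(1 + \frac{1}{\tau}\right) [I(\theta_0)]^{-1}\right), \qquad (27)$$
which implies
$$\sqrt{n}\,(\hat\theta_S - \theta_0) \xrightarrow{d} N\!\left(0, \left(1 + \frac{1}{\tau}\right) [I(\theta_0)]^{-1}\right). \qquad (28)$$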
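The covariance in Equations (27) and (28) can be traced term by term. The derivation below is our own sketch at the level of rigor of Equations (14)-(16): it applies those equalities to the observed sample and, separately, to the simulated sample $Z_1^*, \dots, Z_U^*$ of size $U = \tau n$ drawn from $f_{\theta_0}$ (the notation $Z_j^*$ is introduced here only for this purpose), then uses the independence of $Y_1$ and $Y_2$ together with $H^{-1} = 2[I(\theta_0)]^{-1}$.

$$
\begin{aligned}
Y_1 &\overset{d}{=} \frac{1}{2\sqrt{n}} \sum_{i=1}^{n} \frac{\partial \log f_{\theta_0}(X_i)}{\partial \theta} \xrightarrow{L} N\!\left(0, \tfrac{1}{4} I(\theta_0)\right), \\
Y_2 &= \sqrt{\tfrac{n}{U}}\; 2\sqrt{U} \int_{-\infty}^{\infty} \left\{ [f_{\theta_0}^S(x)]^{1/2} - [f_{\theta_0}(x)]^{1/2} \right\} \dot s_{\theta_0}\, dx \overset{d}{=} \frac{1}{\sqrt{\tau}} \cdot \frac{1}{2\sqrt{U}} \sum_{j=1}^{U} \frac{\partial \log f_{\theta_0}(Z_j^*)}{\partial \theta} \xrightarrow{L} N\!\left(0, \tfrac{1}{4\tau} I(\theta_0)\right), \\
H^{-1}(Y_1 - Y_2) &\xrightarrow{L} N\!\left(0,\; 2[I(\theta_0)]^{-1} \left( \tfrac{1}{4} + \tfrac{1}{4\tau} \right) I(\theta_0)\, 2[I(\theta_0)]^{-1} \right) = N\!\left(0, \left(1 + \tfrac{1}{\tau}\right) [I(\theta_0)]^{-1}\right).
\end{aligned}
$$

The first summand, 1, comes from the data and the second, $1/\tau$, from the simulation; this is the origin of the factor $(1 + 1/\tau)$.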
Equations (27) and (28) suggest that SMHD estimators are highly efficient in large samples, since they attain the lower bound for simulated estimators. If possible, $\tau \ge 10$ should be used to keep the loss of efficiency due to simulation small, and the same seed should be used to generate the simulated samples across different values of $\theta$.
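The recommendation $\tau \ge 10$ follows from simple arithmetic on the factor in Equation (28). The short Python sketch below tabulates the variance factor $1 + 1/\tau$ and the implied asymptotic efficiency relative to ML, $\tau/(\tau + 1)$, for a few illustrative values of $\tau$.

```python
def efficiency_factor(tau):
    """Variance inflation (1 + 1/tau) of SMHD relative to the Fisher bound, Equation (28)."""
    return 1.0 + 1.0 / tau


def relative_efficiency(tau):
    """Asymptotic efficiency of SMHD relative to ML, i.e. tau / (tau + 1)."""
    return 1.0 / efficiency_factor(tau)


for tau in (1, 2, 5, 10, 20, 50):
    print(f"tau = {tau:>2}: variance factor {efficiency_factor(tau):.3f}, "
          f"relative efficiency {relative_efficiency(tau):.3f}")
```

With $\tau = 10$ the asymptotic variance is inflated by 10%, corresponding to an efficiency of about 0.909 relative to ML; larger $\tau$ reduces the loss further at the cost of more simulation.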
To assess the performance of SMHD estimators in finite samples, simulation studies based on the parametric family being considered are needed, since asymptotic theory, although quite general, might not apply to finite samples, especially when the sample size is $n \le 100$.
We now proceed to verify that $\sup_{\|\theta - \theta_0\| \le \delta_n} |R_n(\theta)| \xrightarrow{p} 0$, where
$$R_n(\theta) = \frac{\sqrt{n} \left\{ [Q_n^S(\theta) - Q_n^S(\theta_0)] - [D_n(\theta_0)]'(\theta - \theta_0) - [Q(\theta) - Q(\theta_0)] \right\}}{\|\theta - \theta_0\|}.$$
Once again, $S(\theta_0, \delta_n) = \{ \theta \mid \|\theta - \theta_0\| \le \delta_n \}$ is a shrinking compact set: as $n \to \infty$, $\delta_n \to 0$ and $S(\theta_0, \delta_n)$ shrinks to $\{\theta_0\}$.
In order to confirm that Theorem 2 is applicable, we need to study the properties of $R_n(\theta)$. Given that $R_n(\theta)$ can be defined at $\theta = \theta_0$ as $R_n(\theta_0) = 0$, we would like to establish the following for $R_n(\theta)$:
1) $R_n(\theta)$ is bounded in probability.
2) $R_n(\theta)$ is continuous in probability for all $\theta \in S(\theta_0, \delta_n)$, for $n \ge n_0$.
If conditions 1 and 2 hold, then, by Property 1, $\sup_{\|\theta - \theta_0\| \le \delta_n} |R_n(\theta)|$ is attained in probability at a point $\theta_n^* \in S(\theta_0, \delta_n)$ for $n \ge n_0$. It then follows that, for $n \ge n_0$, we have the following equality in probability:
$$\sup_{\|\theta - \theta_0\| \le \delta_n} |R_n(\theta)| \overset{p}{=} |R_n(\theta_n^*)|,$$
with $|R_n(\theta_n^*)| \xrightarrow{p} |R_n(\theta_0)| = 0$ as $\theta_n^* \to \theta_0$. Therefore, the use of Theorem 2 will be justified for SMHD estimators.
It remains to establish conditions 1 and 2 above.
Define $\sqrt{n}\, G_n(\theta) = \sqrt{n}\, [Q_n^S(\theta) - Q(\theta)]$. Then $\sqrt{n}\, G_n(\theta)$ is differentiable in probability at $\theta = \theta_0$ with derivatives vector $\sqrt{n}\, [D_n(\theta_0)]'$, since $\partial Q(\theta_0) / \partial \theta = 0$ and $Q(\theta_0) = 0$.
We also have $R_n(\theta) \xrightarrow{p} 0$ as $\theta \to \theta_0$. If we define $R_n(\theta_0) = 0$, we can extend $R_n(\theta)$ to be continuous in probability for $\theta \in S(\theta_0, \delta_n)$, including the point $\theta = \theta_0$, for $n \ge n_0$.
The expression $\sqrt{n}\, G_n(\theta) - \sqrt{n}\, G_n(\theta_0)$ can be assumed to be bounded in probability in a neighborhood of $\theta_0$, since it can be approximated by $\sqrt{n}\, [D_n(\theta_0)]'(\theta - \theta_0)$ and $\sqrt{n}\, D_n(\theta_0)$ has a limit distribution, as discussed earlier; see Equations (26) and (27).
Note that we can also write
$$\sqrt{n}\, G_n(\theta) - \sqrt{n}\, G_n(\theta_0) - \sqrt{n}\, [D_n(\theta_0)]'(\theta - \theta_0) = \sqrt{n} \left\{ [Q_n^S(\theta) - Q_n^S(\theta_0)] - [D_n(\theta_0)]'(\theta - \theta_0) - [Q(\theta) - Q(\theta_0)] \right\}.$$
Therefore, it is not difficult to see that the above expression is continuous in probability. As a result, $R_n(\theta)$ is bounded in probability and continuous in probability, and so is $|R_n(\theta)|$.
Hence, the use of Theorem 2 is justified for the SMHD estimators.
Moreover, SMHD estimators are robust as they are obtained by minimizing a distance; see Donoho and Liu [
The asymptotic properties established in this paper suggest that SMHD estimators are very efficient in large samples for parametric models for which all positive integer moments exist. For the subset of such models without closed-form densities, which are often encountered in finance and actuarial science, the asymptotic normality results suggest that SMHD estimators are well suited for large samples. For any parametric family lacking finite moments of all positive integer orders, SMHD estimators remain consistent and robust, but large-scale simulation studies appear to be necessary to assess their efficiency for the specific parametric model being considered.
The help received from the editorial staff of OJS in preparing a revised version of the paper is gratefully acknowledged.
The authors declare no conflicts of interest regarding the publication of this paper.
Luong, A. and Bilodeau, C. (2018) Asymptotic Normality Distribution of Simulated Minimum Hellinger Distance Estimators for Continuous Models. Open Journal of Statistics, 8, 846-860. https://doi.org/10.4236/ojs.2018.85056