Likelihood and Quadratic Distance Methods for the Generalized Asymmetric Laplace Distribution for Financial Data

doi:10.4236/ojs.2017.72025

Open Journal of Statistics
Vol.07 No.02(2017), Article ID:75963,22 pages

Andrew Luong


École d’actuariat, Université Laval, Canada

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: March 6, 2017; Accepted: April 27, 2017; Published: April 30, 2017

ABSTRACT

Maximum likelihood (ML) estimation for the generalized asymmetric Laplace (GAL) distribution, also known as the variance gamma distribution, using simplex direct search algorithms is investigated. In this paper, we use numerical direct search techniques for maximizing the log-likelihood to obtain ML estimators instead of using the traditional EM algorithm. The density function of the GAL is continuous but not differentiable with respect to the parameters, and the appearance of the Bessel function in the density makes it difficult to obtain the asymptotic covariance matrix for the entire GAL family. Using M-estimation theory, the properties of the ML estimators are investigated in this paper. The ML estimators are shown to be consistent for the GAL family, but their asymptotic normality can only be guaranteed for the asymmetric Laplace (AL) family. The asymptotic covariance matrix is obtained for the AL family, and it completes the results obtained previously in the literature. For the general GAL model, alternative methods of inference based on quadratic distances (QD) are proposed. The QD methods appear to be overall more efficient than likelihood methods in finite samples, using sample sizes $n \leq 5000$ and the range of parameters often encountered for financial data. The proposed methods only require that the moment generating function of the parametric model exists and has a closed-form expression, and they can be used for other models.

Keywords:

M-Estimators, Cumulant Generating Function, Chi-Square Tests, Generalized Hyperbolic Distribution, Simplex Pattern Search, Variance Gamma, Minimum Distance, Value at Risk, Entropic Value at Risk, European Call Option

1. Introduction

1.1. Generalized Asymmetric Laplace (GAL) Distribution

The generalized asymmetric Laplace (GAL) distribution is an infinitely divisible continuous distribution with four parameters given by

$β = {(θ, μ, σ, τ)}^{'} .$ (1)

The parameter $θ$ is a location parameter and $σ$ is a scale parameter. The parameter $μ$ can be viewed as the asymmetry parameter of the distribution, and $τ$ is the shape parameter which controls the thickness of the tails of the distribution. If $μ = 0$, the distribution is symmetric around $θ$; see Kotz et al. ([1], p. 180). The GAL distribution is flexible and can be used as an alternative to the four-parameter stable distribution. It has thicker tails than the normal distribution, but unlike the stable distribution, for which even the first positive moment might not exist, all of its positive integer moments exist. Its moment generating function is

$M (s) = \frac{e^{θ s}}{{(1 - μ s - \frac{1}{2} σ^{2} s^{2})}^{τ}}, σ, τ \geq 0, β = {(θ, μ, σ, τ)}^{'},$ (2)

$s$ must satisfy the inequality

$1 - \frac{1}{2} σ^{2} s^{2} - μ s > 0.$ (3)

The GAL distribution is also known as the variance gamma (VG) distribution. It was introduced by Madan and Seneta [2] and Madan et al. [3]. For the GAL distribution, we adopt the parameterisations used by Kotz et al. [1]. It is not difficult to relate them to the original parameterisation; see Seneta [4]. The commonly used parameterisations will be discussed in Section (1.2).

From the moment generating function, it is easy to see that the first four cumulants of the GAL distribution are given by

$c_{1} = θ + τ μ, c_{2} = τ σ^{2} + τ μ^{2},$ (4)

$c_{3} = 3 τ σ^{2} μ + 2 τ μ^{3}, c_{4} = 6 τ μ^{4} + 12 τ μ^{2} σ^{2} + 3 τ σ^{4} .$ (5)

Note that $c_{3} = 0$ if $μ = 0$, and $c_{3}$ can be positive or negative depending on the values of the parameters. Therefore, the GAL distribution can be symmetric or asymmetric. Furthermore, since $c_{4} > 0$, the tails of the GAL distribution are thicker than those of the normal distribution. These characteristics make the GAL distribution useful for modelling asset returns; see Seneta [4] for further discussion of financial modelling using the GAL distribution.

The moments can be obtained from the cumulants and are given below,

$\begin{array}{l} E (X) = c_{1}, \\ E {(X - E (X))}^{2} = c_{2}, \\ E {(X - E (X))}^{3} = c_{3}, \\ E {(X - E (X))}^{4} = 6 τ μ^{4} + 12 τ μ^{2} σ^{2} + 3 τ σ^{4} + 3 τ^{2} σ^{4} + 6 σ^{2} τ^{2} μ^{2} + 3 μ^{4} τ^{2} . \end{array}$
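As a quick numerical check of (4), (5) and the moment list above, the following minimal Python sketch (standard library only) computes the four cumulants and the corresponding central moments, using the standard relations $E {(X - E (X))}^{3} = c_{3}$ and $E {(X - E (X))}^{4} = c_{4} + 3 c_{2}^{2}$. The function names and the parameter values in the example are illustrative assumptions, not taken from the paper.

```python
def gal_cumulants(theta, mu, sigma, tau):
    """First four cumulants of GAL(theta, mu, sigma, tau), expressions (4)-(5)."""
    c1 = theta + tau * mu
    c2 = tau * sigma**2 + tau * mu**2
    c3 = 3 * tau * sigma**2 * mu + 2 * tau * mu**3
    c4 = 6 * tau * mu**4 + 12 * tau * mu**2 * sigma**2 + 3 * tau * sigma**4
    return c1, c2, c3, c4


def gal_moments(theta, mu, sigma, tau):
    """Mean, variance, third and fourth central moments built from the cumulants."""
    c1, c2, c3, c4 = gal_cumulants(theta, mu, sigma, tau)
    return c1, c2, c3, c4 + 3 * c2**2  # fourth central moment = c4 + 3 * c2^2


# Illustrative parameter values (hypothetical).
mean, var, m3, m4 = gal_moments(theta=0.0, mu=-0.1, sigma=1.0, tau=2.0)
skewness = m3 / var**1.5
excess_kurtosis = m4 / var**2 - 3.0  # equals c4 / c2^2, hence positive
```

The excess kurtosis is always $c_{4} / c_{2}^{2} > 0$, which is the "thicker than normal tails" property noted above.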

The GAL distribution belongs to the class of normal mean-variance mixture distributions, where the mixing variable follows a gamma distribution with shape parameter $τ$ and scale parameter equal to 1, i.e., with density function

$f_{w} (w) = \frac{1}{Γ (τ)} w^{τ - 1} e^{- w}, w, τ > 0$, where $Γ (\cdot)$ is the gamma function.

This leads to the following representation in distribution, using expression (4.1.10) in Kotz et al. ([1], p. 183),

$X \overset{d}{=} θ + μ Y + σ \sqrt{Y} Z,$ (6)

where

1) $Z \sim N (0, 1)$,

2) $Y \sim G (τ, 1)$ with the gamma density $f_{w}$ given above, independent of $Z$,

3) $θ, μ, σ, τ$ are parameters with $σ, τ > 0$.

The representation given by expression (6) is useful for simulating samples from a GAL distribution. Note that despite the simple closed-form expression for the moment generating function, the density function is rather complicated, as it depends on the modified Bessel function of the third kind with real index $λ$, i.e., $K_{λ} (u)$; see Kotz et al. ([1], p. 315) for various representations of the function $K_{λ} (\cdot)$. The density function will be introduced in Section (1.2). From the moment generating function of the GAL distribution, it is easy to see that the distribution is related to a Lévy process; see Podgorski and Wegener [5] for GAL processes.
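Representation (6) translates directly into a sampler. The sketch below is a minimal illustration, not the author's code: it assumes Python's `random.gammavariate(alpha, beta)` (shape `alpha`, scale `beta`) for $Y$ and `random.gauss` for $Z$; the function name `sample_gal` and the parameter values are hypothetical. The sample mean and variance can be compared with $c_{1}$ and $c_{2}$ from (4).

```python
import math
import random


def sample_gal(n, theta, mu, sigma, tau, rng=None):
    """Draw n GAL variates with the mixture representation (6):
    X = theta + mu*Y + sigma*sqrt(Y)*Z, Y ~ Gamma(tau, 1), Z ~ N(0, 1) independent."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        y = rng.gammavariate(tau, 1.0)  # shape tau, scale 1
        z = rng.gauss(0.0, 1.0)
        out.append(theta + mu * y + sigma * math.sqrt(y) * z)
    return out


rng = random.Random(2017)
xs = sample_gal(100_000, theta=0.05, mu=-0.2, sigma=1.0, tau=1.5, rng=rng)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
# Compare with c1 = theta + tau*mu = -0.25 and c2 = tau*(sigma^2 + mu^2) = 1.56 from (4).
```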

The GAL parametric family can be obtained as a limiting case of the generalized hyperbolic (GH) family, where the mixing random variable belongs to the generalized inverse Gaussian family; see McNeil et al. [6] for properties of the GH family. Note that the GAL family is nested within the bilateral gamma family, as the GAL random variable can be represented in distribution as

$X \overset{d}{=} θ + \frac{σ}{\sqrt{2}} (\frac{1}{κ} G_{1} - κ G_{2}),$ (7)

where $G_{1}$ and $G_{2}$ are independent random variables with a common gamma distribution with shape parameter $τ$ and scale parameter 1. Their common mgf is given by

$M_{G} (s) = \frac{1}{{(1 - s)}^{τ}}$; see expression (4.1.1) given by Kotz et al. ([1], p. 183).

If we introduce $κ$ using $μ = \frac{σ}{\sqrt{2}} (\frac{1}{κ} - κ)$, the GAL distribution can also be parameterised using the four equivalent parameters $θ, σ, κ, τ$.
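The map between $μ$ and $κ$ can be inverted by taking the positive root of $κ^{2} + \frac{\sqrt{2} μ}{σ} κ - 1 = 0$, which gives $κ = \frac{\sqrt{2} σ}{μ + \sqrt{2 σ^{2} + μ^{2}}}$. The sketch below (hypothetical helper names, standard library only) checks this inversion, checks numerically that representation (7), with $G_{1}, G_{2}$ iid gamma with shape $τ$ and scale 1, reproduces the mgf (2), and draws samples from (7).

```python
import math
import random


def kappa_from_mu(mu, sigma):
    """Positive root kappa of mu = sigma / sqrt(2) * (1/kappa - kappa)."""
    return math.sqrt(2.0) * sigma / (mu + math.sqrt(2.0 * sigma**2 + mu**2))


def mu_from_kappa(kappa, sigma):
    return sigma / math.sqrt(2.0) * (1.0 / kappa - kappa)


def mgf_gal(s, theta, mu, sigma, tau):
    """Moment generating function (2); requires 1 - mu*s - sigma^2 s^2 / 2 > 0."""
    return math.exp(theta * s) / (1.0 - mu * s - 0.5 * sigma**2 * s**2) ** tau


def mgf_bilateral(s, theta, kappa, sigma, tau):
    """mgf of representation (7) with G1, G2 iid Gamma(shape tau, scale 1)."""
    c = sigma / math.sqrt(2.0)
    return math.exp(theta * s) * (1.0 - c * s / kappa) ** (-tau) * (1.0 + c * kappa * s) ** (-tau)


def sample_gal_bilateral(n, theta, kappa, sigma, tau, rng):
    """Draw n GAL variates from (7): theta + sigma/sqrt(2) * (G1/kappa - kappa*G2)."""
    c = sigma / math.sqrt(2.0)
    return [theta + c * (rng.gammavariate(tau, 1.0) / kappa - kappa * rng.gammavariate(tau, 1.0))
            for _ in range(n)]
```

Multiplying the two gamma factors gives $1 - μ s - \frac{1}{2} σ^{2} s^{2}$ raised to $- τ$, which is why the common shape parameter must be $τ$.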

Moment estimation for the GAL family has been given by Podgorski and Wegener [5]. Maximum likelihood estimation for the GH family, with the parameter $τ$ fixed within some bounds, has been given by Protassov [7] and McNeil et al. ([6], p. 80). For ML estimation, they implicitly assumed that the mixing random variable $Y \sim Gamma (τ, \frac{ϕ}{2})$ (shape $τ$, scale $\frac{ϕ}{2}$), which implies the following form of the moment generating function for $X$,

$M (s) = \frac{e^{θ s}}{{(1 - \frac{ϕ}{2} (μ s + \frac{1}{2} σ^{2} s^{2}))}^{τ}}, ϕ > 0.$

From the above expression, it is easy to see that the parameter $ϕ > 0$ is redundant, since it enters only through the products $\frac{ϕ}{2} μ$ and $\frac{ϕ}{2} σ^{2}$, and the parameterisation using five parameters will introduce instability in the estimation process. It appears to be simpler to use the parameterisation given by Kotz et al. [1], or the parameterisation used by Madan and Seneta [2], Madan et al. [3] and Seneta [4], with only four parameters, by letting $ϕ = 2$.
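The redundancy of $ϕ$ can be seen numerically: because $ϕ$ enters the mgf only through $\frac{ϕ}{2} μ$ and $\frac{ϕ}{2} σ^{2}$, the points $(θ, μ, σ, τ, ϕ)$ and $(θ, \frac{ϕ μ}{2}, σ \sqrt{ϕ / 2}, τ, 2)$ give the same mgf and hence the same distribution. The helper names below are illustrative.

```python
import math


def mgf_five(s, theta, mu, sigma, tau, phi):
    """mgf of X when Y ~ Gamma(shape tau, scale phi/2): the five-parameter form."""
    return math.exp(theta * s) / (1.0 - phi / 2.0 * (mu * s + 0.5 * sigma**2 * s**2)) ** tau


def reduce_to_four(theta, mu, sigma, tau, phi):
    """Map (theta, mu, sigma, tau, phi) to the equivalent point with phi = 2."""
    return theta, phi * mu / 2.0, sigma * math.sqrt(phi / 2.0), tau, 2.0


a = (0.1, 0.2, 0.8, 1.3, 3.0)
b = reduce_to_four(*a)  # same distribution, four free parameters
```

Any two points on the curve $\{(\frac{2 μ'}{ϕ}, σ' \sqrt{2 / ϕ}, ϕ) : ϕ > 0\}$ are indistinguishable from data, which is the source of the instability mentioned above.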

Hu [8] advocated fitting the GAL distribution using the EM algorithm, but the drawback of this approach is the difficulty of obtaining the information matrix using the method of Louis [9]; see McLachlan and Krishnan [10] for a comprehensive review of the EM algorithm. The lack of a closed-form asymptotic covariance matrix for the estimators might create difficulties for hypothesis testing.

1.2. Some Properties of the GAL Distribution and Parameterisations

In this subsection, we first review a few parameterisations which are commonly used for the GAL distribution.

Definition 1 (GAL density)

From the GH density, the density function for the GAL distribution can be obtained and it can be expressed as

$f (x; θ, σ, μ, τ) = \frac{\sqrt{2} e^{\frac{μ}{σ} (\frac{x - θ}{σ})}}{σ \sqrt{π} Γ (τ)} {(\frac{1}{\sqrt{2 + {(\frac{μ}{σ})}^{2}}})}^{τ - 0.5} {(\frac{| x - θ |}{σ})}^{τ - 0.5} K_{τ - 0.5} (\sqrt{2 + {(\frac{μ}{σ})}^{2}} \frac{| x - θ |}{σ}) .$ (8)

The vector of parameters is $β = {(θ, μ, σ, τ)}^{'}$, and we shall call this parameterisation parameterisation 1. The density can be derived using the normal mean-variance mixture representation given by expression (6); see expression (3.30) given by McNeil et al. ([6], p. 78).
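Density (8) can be evaluated directly once $K_{λ}$ is available. To keep the sketch dependency-free, it computes $K_{ν} (z)$ from the standard integral representation $K_{ν} (z) = \int_{0}^{\infty} e^{- z \cosh t} \cosh (ν t) d t$ with the trapezoid rule; in practice a library routine such as `scipy.special.kv` would normally be used instead. The value at $x = θ$ uses the small-argument limit $u^{ν} K_{ν} (c u) \to Γ (ν) 2^{ν - 1} c^{- ν}$ for $ν > 0$, and infinity is returned when $τ \leq \frac{1}{2}$, where the density is unbounded at $θ$. Function names and tuning constants are assumptions.

```python
import math


def bessel_k(nu, z, t_max=25.0, steps=500):
    """Modified Bessel function K_nu(z), z > 0, from
    K_nu(z) = int_0^inf exp(-z*cosh(t)) * cosh(nu*t) dt (trapezoid rule; the
    integrand is smooth, even in t and decays very fast, so this is accurate)."""
    h = t_max / steps
    total = 0.5 * math.exp(-z)  # t = 0 endpoint, cosh(0) = 1
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return h * total


def gal_pdf(x, theta, mu, sigma, tau):
    """GAL density (8) under parameterisation 1."""
    nu = tau - 0.5
    c = math.sqrt(2.0 + (mu / sigma) ** 2)
    u = abs(x - theta) / sigma
    front = math.sqrt(2.0) * math.exp(mu / sigma * (x - theta) / sigma) / (
        sigma * math.sqrt(math.pi) * math.gamma(tau))
    if u == 0.0:
        if nu <= 0:
            return math.inf  # unbounded at theta when tau <= 1/2
        # limit of (u/c)^nu * K_nu(c*u) as u -> 0
        return front * c ** (-nu) * math.gamma(nu) * 2.0 ** (nu - 1) * c ** (-nu)
    return front * (u / c) ** nu * bessel_k(nu, c * u)


def integrate(f, a, b, n=2000):
    """Midpoint rule, used only for the checks below."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


total = integrate(lambda x: gal_pdf(x, 0.0, 0.1, 1.0, 2.0), -25.0, 25.0)
mean = integrate(lambda x: x * gal_pdf(x, 0.0, 0.1, 1.0, 2.0), -25.0, 25.0)
```

With $μ = 0, σ = 1, τ = 1$ this reduces to the double exponential density $\frac{1}{\sqrt{2}} e^{- \sqrt{2} | x |}$, which is a convenient check.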

Alternatively, by letting $μ = \frac{σ}{\sqrt{2}} (\frac{1}{κ} - κ)$ and keeping the other parameters as in parameterisation 1, we obtain the following expression for the density of a GAL distribution

$f (x; θ, σ, κ, τ) = \frac{\sqrt{2} e^{\frac{\sqrt{2}}{2} (\frac{1}{κ} - κ) (\frac{x - θ}{σ})}}{σ \sqrt{π} Γ (τ)} {(\frac{\sqrt{2}}{\frac{1}{κ} + κ})}^{τ - 0.5} {(\frac{| x - θ |}{σ})}^{τ - 0.5} K_{τ - 0.5} (\frac{1}{\sqrt{2}} (\frac{1}{κ} + κ) \frac{| x - θ |}{σ}) ,$ (9)

with the vector of parameters given by $β = {(θ, κ, σ, τ)}^{'}$. We shall call this parameterisation parameterisation 2; it is the one used by Kotz et al. ([1], p. 184).

Note that $θ$ and $σ$ are respectively the location and scale parameters under either parameterisation 1 or 2. Setting $θ = 0, σ = 1$, the standardized GAL density under parameterisation 2 has only two parameters and is given by

$f_{ε} (x; κ, τ) = \frac{\sqrt{2} e^{\frac{\sqrt{2}}{2} (\frac{1}{κ} - κ) x}}{\sqrt{π} Γ (τ)} {(\frac{\sqrt{2}}{\frac{1}{κ} + κ})}^{τ - 0.5} {(| x |)}^{τ - 0.5} K_{τ - 0.5} (\frac{1}{\sqrt{2}} (\frac{1}{κ} + κ) | x |)$

or equivalently, under parameterisation 1,

$f_{ε} (x; μ, τ) = \frac{\sqrt{2} e^{μ x}}{\sqrt{π} Γ (τ)} {(\frac{1}{\sqrt{2 + {(μ)}^{2}}})}^{τ - 0.5} {(| x |)}^{τ - 0.5} K_{τ - 0.5} (\sqrt{2 + {(μ)}^{2}} | x |)$ .

Following Kotz et al. [1], we only use these two parameterisations, but it is easy to see their relationships with parameterisation 3 used by Madan et al. [3] and Seneta [4]. With parameterisation 3, the mgf of the GAL distribution is

$M (s) = \frac{e^{c s}}{{(1 - θ^{'} ν s - \frac{1}{2} ν σ'^{2} s^{2})}^{1 / ν}},$ (10)

where the parameters are $θ^{'}, σ^{'}, ν, c$ with

$θ^{'} = \frac{μ}{ν}, {σ^{'}}^{2} = \frac{σ^{2}}{ν}, ν = 1 / τ, c = θ$ .

The first four moments under parameterisation 3, as given by Seneta ([4], p. 181), are

$\begin{array}{l} E (X) = c + θ^{'}, V (X) = {σ^{'}}^{2} + {θ^{'}}^{2} ν, \\ E {(X - E (X))}^{3} = 2 {θ^{'}}^{3} ν^{2} + 3 {σ^{'}}^{2} θ^{'} ν, \\ E {(X - E (X))}^{4} = 3 {σ^{'}}^{4} ν + 12 {σ^{'}}^{2} {θ^{'}}^{2} ν^{2} + 6 {θ^{'}}^{4} ν^{3} + 3 {σ^{'}}^{4} + 6 {σ^{'}}^{2} {θ^{'}}^{2} ν + 3 {θ^{'}}^{4} ν^{2} . \end{array}$
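The relationships between parameterisations 1 and 3 can be checked by converting parameters and comparing the moment formulas above with those obtained from the cumulants (4)-(5). A minimal sketch with illustrative function names:

```python
def to_param3(theta, mu, sigma, tau):
    """Map parameterisation 1 to (c, theta', sigma', nu) of (10)."""
    nu = 1.0 / tau
    return theta, mu / nu, sigma / nu**0.5, nu


def moments_param3(c, theta_p, sigma_p, nu):
    """Mean, variance, third and fourth central moments in parameterisation 3."""
    s2 = sigma_p**2
    mean = c + theta_p
    var = s2 + theta_p**2 * nu
    m3 = 2 * theta_p**3 * nu**2 + 3 * s2 * theta_p * nu
    m4 = (3 * s2**2 * nu + 12 * s2 * theta_p**2 * nu**2 + 6 * theta_p**4 * nu**3
          + 3 * s2**2 + 6 * s2 * theta_p**2 * nu + 3 * theta_p**4 * nu**2)
    return mean, var, m3, m4


def moments_param1(theta, mu, sigma, tau):
    """Same moments from the cumulants (4)-(5) in parameterisation 1."""
    c2 = tau * (sigma**2 + mu**2)
    c3 = 3 * tau * sigma**2 * mu + 2 * tau * mu**3
    c4 = 6 * tau * mu**4 + 12 * tau * mu**2 * sigma**2 + 3 * tau * sigma**4
    return theta + tau * mu, c2, c3, c4 + 3 * c2**2
```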

The GAL random variable can also be expressed as the difference of two independent gamma random variables, so it is nested inside the class of bilateral gamma random variables $Y$, which can be represented as

$Y \overset{d}{=} θ + G_{1} - G_{2}$ (11)

where $G_{1}$ and $G_{2}$ are independent gamma random variables with mgfs given respectively by $M_{G_{1}} (s) = \frac{1}{{(1 - β_{1} s)}^{α}}$ and $M_{G_{2}} (s) = \frac{1}{{(1 - β_{2} s)}^{α}}$. We obtain the GAL random variable by letting $α = τ$, $β_{1} = \frac{σ}{κ \sqrt{2}}$ and $β_{2} = \frac{κ σ}{\sqrt{2}}$.

The class of bilateral gamma distributions was introduced by Küchler and Tappe [11], who showed that the Esscher transform of a bilateral gamma distribution remains within this class. More specifically, let $Y^{E}$ be the random variable with mgf given by $M_{Y^{E}} (s) = \frac{M_{Y} (s + h)}{M_{Y} (h)}$. It is easy to see that $Y^{E} \overset{d}{=} θ + {\bar{G}}_{1} - {\bar{G}}_{2}$, where ${\bar{G}}_{1}$ and ${\bar{G}}_{2}$ are independent gamma random variables with common shape parameter $α$ ($α = τ$ for the GAL distribution) and scale parameters given respectively by ${\bar{β}}_{1} = \frac{β_{1}}{1 - β_{1} h}$ and ${\bar{β}}_{2} = \frac{β_{2}}{1 + β_{2} h}$, provided that $1 - β_{1} h > 0$ and $1 + β_{2} h > 0$.
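The Esscher transform with parameter $h$ has mgf $M_{Y} (s + h) / M_{Y} (h)$, and the bilateral gamma result can be verified numerically by comparing this ratio with the bilateral gamma mgf at the transformed scales. The sketch assumes the GAL case $α = τ$, $β_{1} = \frac{σ}{κ \sqrt{2}}$, $β_{2} = \frac{κ σ}{\sqrt{2}}$; the helper names are hypothetical.

```python
import math


def bilateral_gamma_mgf(s, theta, alpha, b1, b2):
    """mgf of Y = theta + G1 - G2, G1 ~ Gamma(alpha, scale b1), G2 ~ Gamma(alpha, scale b2)."""
    return math.exp(theta * s) * (1.0 - b1 * s) ** (-alpha) * (1.0 + b2 * s) ** (-alpha)


def esscher_scales(b1, b2, h):
    """Scale parameters after an Esscher transform with parameter h.
    Requires 1 - b1*h > 0 and 1 + b2*h > 0 so that M_Y(h) exists."""
    if not (1.0 - b1 * h > 0 and 1.0 + b2 * h > 0):
        raise ValueError("h outside the domain of the mgf")
    return b1 / (1.0 - b1 * h), b2 / (1.0 + b2 * h)


def gal_to_bilateral(sigma, kappa):
    """Scales giving GAL(theta, kappa, sigma, tau) when alpha = tau."""
    return sigma / (kappa * math.sqrt(2.0)), kappa * sigma / math.sqrt(2.0)


theta, tau, sigma, kappa, h = 0.02, 1.4, 0.9, 1.2, 0.3
b1, b2 = gal_to_bilateral(sigma, kappa)
nb1, nb2 = esscher_scales(b1, b2, h)  # risk-neutral style scales, same shape tau
```

Since the shape is unchanged, samples under the transformed law can be drawn with the same gamma-difference sampler, only with the new scales.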

For option pricing with the risk-neutral approach, this property is useful, as it is easy to simulate samples from a bilateral gamma distribution. The use of the Esscher transform to find risk-neutral parameters for option pricing in finance is due to the seminal work of Gerber and Shiu [12]. The Esscher transform risk-neutral parameters can also be interpreted as minimum entropy risk-neutral parameters; see Miyahara [13] for this interpretation, and Section (6) for more discussion of financial applications.

For numerical methods to find the estimators, the Nelder-Mead simplex method and related derivative-free simplex methods are recommended. Derivative-free simplex direct search methods are well described in Chapter 16 of the book by Bierlaire [14].
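To illustrate derivative-free ML estimation, the sketch below implements a basic Nelder-Mead simplex search in plain Python and uses it to maximize the AL log-likelihood ($τ = 1$) on simulated data. It is a minimal sketch, not the author's implementation: it uses only inside contractions, the standard coefficients (1, 2, 1/2, 1/2), a simple stopping rule on the spread of function values, and the reparameterisation $(θ, \ln σ, \ln κ)$ so that $σ, κ > 0$ without constraints. The AL density is taken from (9) with $τ = 1$, i.e., with $δ = \frac{1}{κ} - κ$; all names and settings are assumptions.

```python
import math
import random


def nelder_mead(f, x0, step=0.2, tol=1e-9, max_iter=2000):
    """Minimise f by a basic Nelder-Mead search; stops when the spread of f
    over the simplex falls below tol * (1 + |f_best|)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] <= tol * (1.0 + abs(values[0])):
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:  # shrink all vertices towards the current best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (v[j] - b) for j, b in enumerate(best)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


def al_negloglik(p, xs):
    """Negative AL log-likelihood, p = (theta, log sigma, log kappa), delta = 1/kappa - kappa."""
    theta, sigma, kappa = p[0], math.exp(p[1]), math.exp(p[2])
    gam, dlt = kappa + 1.0 / kappa, 1.0 / kappa - kappa
    r = math.sqrt(2.0) / 2.0
    s = sum(dlt * (x - theta) - gam * abs(x - theta) for x in xs)
    return -(len(xs) * math.log(math.sqrt(2.0) / (sigma * gam)) + r * s / sigma)


rng = random.Random(5)
theta0, sigma0, kappa0 = 0.0, 1.0, 0.8
c = sigma0 / math.sqrt(2.0)
# AL draws via (7) with tau = 1, i.e. G1, G2 ~ Exp(1)
xs = [theta0 + c * (rng.expovariate(1.0) / kappa0 - kappa0 * rng.expovariate(1.0))
      for _ in range(2000)]
start = [sorted(xs)[len(xs) // 2], 0.0, 0.0]
p_hat, _ = nelder_mead(lambda p: al_negloglik(p, xs), start)
theta_hat, sigma_hat, kappa_hat = p_hat[0], math.exp(p_hat[1]), math.exp(p_hat[2])
```

No derivatives of the log-likelihood are needed, which is what makes simplex methods attractive when the score functions are discontinuous.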

The paper is organized as follows. In Section 2, some submodels of the GAL family are introduced to highlight the difficulty of obtaining the asymptotic covariance matrix using classical likelihood theory. Asymptotic properties of the ML estimators are investigated in Section (3). The ML estimators for the GAL family are shown to be consistent for $τ > \frac{1}{2}$. For the special case $τ = 1$, which corresponds to the asymmetric Laplace (AL) model, we obtain the asymptotic covariance matrix in closed form using the approach based on M-estimation theory as given by Huber [15], which completes the missing components of expression (2) given by Kotz et al. ([16], p. 818). As an alternative to ML estimation, QD estimation based on matching cumulant generating functions is developed in Section (4) for the entire GAL family. The QD estimators are shown to be consistent and to follow an asymptotic normal distribution. The asymptotic covariance matrix can be obtained in closed form for the entire GAL family using QD methods, which makes testing for parameters easy to implement. Chi-square goodness-of-fit test statistics can also be constructed based on the distance function used to obtain the QD estimators. The methods are also general and can be applied to other models. Numerical issues and simulation illustrations are discussed in Section (5). A limited simulation study shows that the proposed QD estimators perform better overall than ML estimators for sample sizes $n \leq 5000$, using parameter values often encountered for financial data. Some applications drawn from finance are discussed in Section (6).

We shall consider first a few submodels of the GAL model to show the difficulties encountered when likelihood theory is used to obtain the asymptotic covariance matrix for ML estimators.

The difficulties are mainly due to the fact that the score functions, viewed as functions of the parameters, have a discontinuity point and fail to be differentiable. If the asymptotic covariance matrix for the ML estimators is derived based on likelihood theory, it will have missing components. This is the problem with expression (2) given by Kotz et al. ([16], p. 818) for the AL family, a subfamily of the GAL family. M-estimation theory will be used instead of likelihood theory to derive the asymptotic covariance matrix.

2. Some Subfamilies of the GAL Family

Example 1

Let $μ = 0, σ = 1, τ = 1$, so that the only parameter is the location parameter $θ$ and the family is symmetric around $θ$. Using the result $K_{\frac{1}{2}} (u) = \sqrt{\frac{π}{2 u}} e^{- u}$, the density function reduces to

$f (x, θ) = \frac{1}{2 s_{0}} e^{- \frac{| x - θ |}{s_{0}}}, s_{0} = \frac{1}{\sqrt{2}} .$

Equivalently,

$f (x, θ) = f_{0} (x - θ), f_{0} (x) = \frac{1}{\sqrt{2}} e^{- \sqrt{2} | x |}, - \infty < x < \infty .$

This is the well-known double exponential distribution, and the maximum likelihood estimator for $θ$ is the sample median. There is no Fisher information matrix available, as the score function is discontinuous with respect to the parameter $θ$. The asymptotic variance of the sample median can be found using M-estimation theory; see Huber [17] and Huber [15], and also Amemiya ([18], pp. 148-154) on the least absolute deviations (LAD) estimator. We shall use the same approach to derive the asymptotic covariance matrix of the ML estimators for the GAL distribution with $τ = 1$, which is the asymmetric Laplace (AL) distribution. The AL distribution is introduced below.
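For Example 1 the log-likelihood is $- \frac{n}{2} \ln 2 - \sqrt{2} \sum_{i} | x_{i} - θ |$, so maximizing it amounts to minimizing $\sum_{i} | x_{i} - θ |$, whose minimizer is the sample median. The short sketch below checks this on simulated data; simulating via a difference of two Exp(1) variables scaled by $\frac{1}{\sqrt{2}}$ is an assumption made for illustration, and the names are hypothetical.

```python
import math
import random


def loglik_example1(theta, xs):
    """Log-likelihood of Example 1: f(x, theta) = exp(-sqrt(2)|x - theta|) / sqrt(2)."""
    return sum(-0.5 * math.log(2.0) - math.sqrt(2.0) * abs(x - theta) for x in xs)


def sample_median(xs):
    s = sorted(xs)
    m = len(s) // 2
    return s[m] if len(s) % 2 else 0.5 * (s[m - 1] + s[m])


rng = random.Random(42)
# Double exponential draws with location 1 and scale 1/sqrt(2)
xs = [1.0 + (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0) for _ in range(101)]
theta_hat = sample_median(xs)
grid = [theta_hat + d for d in (-0.5, -0.1, -0.01, 0.01, 0.1, 0.5)]
```

With an odd sample size the log-likelihood is piecewise linear in $θ$ with a strict maximum at the median, so every grid point away from it gives a smaller value.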

Example 2

Using the density of the GAL distribution and setting $τ = 1$, we obtain the AL distribution with only three parameters: the location and scale parameters $θ$ and $σ$, and the asymmetry parameter $μ$. If parameterisation 2 is used, the density function $g (x; θ, σ, κ)$ of the AL distribution is based on the standardized AL density given by expression (4.1.31) in Kotz et al. ([1], p. 189), with

$\begin{array}{l} g (x; θ, σ, κ) = \frac{\sqrt{2}}{σ γ} \exp (\frac{\sqrt{2}}{2} δ (\frac{x - θ}{σ})) \exp (- \frac{\sqrt{2}}{2} γ (\frac{| x - θ |}{σ})), \\ γ = κ + \frac{1}{κ}, δ = \frac{1}{κ} - κ . \end{array}$

The AL family can be considered as a subfamily of the GAL family, and the score functions for this model are again discontinuous. We shall derive the asymptotic covariance matrix using M-estimation theory in Section (3.2) and complete expression (2) of Kotz et al. ([16], p. 818). The expression derived by those authors has missing components, as it is based on likelihood theory. Kotz et al. [16] used a different parameterisation, but it is equivalent to the one used in Kotz et al. ([1], p. 189), and it is not difficult to establish the links between these two parameterisations.

3. Maximum Likelihood Estimation for the GAL Family

3.1. Maximum Likelihood Estimation for the GAL Distribution

For consistency of the MLE, the following theorem, which is Theorem 2.5 given by Newey and McFadden ([19], p. 2131), is useful. We make the basic assumption that we have a random sample consisting of $n$ iid observations $X_{1}, \dots, X_{n}$ drawn from the GAL parametric family with density $f (x; β)$, where $β_{0}$ is the vector of the true parameters.

Theorem (Consistency)

Assume that:

1) If $β_{1} \neq β_{2}$ then $f (x; β_{1}) \neq f (x; β_{2})$ .

2) The parameter space $Θ$ is compact, $β_{0} \in Θ$ .

3) $f (x; β)$ is continuous with respect to $β$.

4) $E (\sup_{β \in Θ} | \ln f (x; β) |) < \infty$.

Under the conditions stated, the ML estimator (MLE) $\hat{β}$, obtained by maximizing the log-likelihood function $\ln L (β) = \sum_{i = 1}^{n} \ln f (x_{i}; β)$, is consistent: $\hat{β} \overset{p}{\to} β_{0}$.

One can see that the conditions for consistency are mild; condition 4) will be satisfied for the GAL family if $τ > \frac{1}{2}$, as the density function remains bounded. For $τ \leq \frac{1}{2}$, the density functions with $θ = 0, μ = 0$ tend to infinity as $x \to 0_{+}$; see Theorem 4.1.2 given by Kotz et al. ([1], pp. 190-192).

It might also be possible to prove consistency using the approach used to obtain Theorem 4 of Broniatowski et al. ([20], p. 2578).

For asymptotic normality, the situation is more complicated, as standard theory often requires the function $\ln L (β)$ to be twice differentiable with respect to $β$. The appearance of the Bessel function creates further complications, making it very difficult to establish asymptotic properties even with the use of M-estimation theory.

For the special case $τ = 1$, which corresponds to the AL distribution, the density function can be expressed without the Bessel function, and M-estimation theory can be used to find the asymptotic covariance matrix of the ML estimators. Asymptotic normality has been shown by Kotz et al. ([1], pp. 158-174), but the asymptotic covariance matrix of the ML estimators given there is still incomplete.

The formula (2.2) given by Kotz et al. ( [16] , p. 818) does not give the correct asymptotic covariance for the ML estimators. The complete formula for the asymptotic covariance matrix of the ML estimators can be obtained using M-estimation theory. An example is given at the end of section (3.2) which shows that one cannot recover the common asymptotic variance of the sample median using results in Kotz et al. ( [16] , p. 818).

M-estimation theory allows the score functions, viewed as functions of the parameters, to have a few points of discontinuity, and full differentiability with respect to $β$ can accordingly be replaced by one-sided differentiability. Amemiya ([18], p. 151) uses this approach. To establish asymptotic normality for the sample median, the sample median is viewed as a root of the estimating equation

$\frac{1}{n} \sum_{i = 1}^{n} ψ (x_{i}, θ) = 0$ ,

using the indicator function $I [\cdot]$,

$ψ (x, θ) = - I [θ < x] + I [θ > x]$, with $ψ (x, θ) = 0$ if $x = θ$.

The function $ψ (x, θ)$ is simply the one-sided derivative of $| x - θ |$ with respect to $θ$, and we adopt the notation $ψ (x, θ) = \frac{\partial | x - θ |}{\partial θ}$ in the sense of a one-sided derivative; see also Hogg et al. ([21], p. 538) on estimating equations based on the sign test. The probability that such a root exists tends to 1 as $n \to \infty$.

Another M-estimator for the location parameter $θ$ has been proposed by Huber ([17], pp. 232-233). It consists of estimating $θ$ by solving

$\frac{1}{n} \sum_{i = 1}^{n} ψ (x_{i}, θ) = 0$ with

$ψ (x, θ) = x - θ$ if $| x - θ | \leq k$, where $k > 0$ is a chosen constant,

$ψ (x, θ) = k \, \mathrm{sign} (x - θ)$ if $| x - θ | > k$.
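Because $\frac{1}{n} \sum_{i} ψ (x_{i}, θ)$ is continuous and nonincreasing in $θ$ for Huber's $ψ$, the estimating equation can be solved by bisection. The sketch below uses the clipped form $ψ = \max (- k, \min (k, x - θ))$; the function names and the choice of bisection are illustrative, not prescribed by the text.

```python
def huber_psi(u, k):
    """Huber's psi function: u clipped to the interval [-k, k]."""
    return max(-k, min(k, u))


def huber_location(xs, k, tol=1e-10):
    """Root of (1/n) * sum psi(x_i - theta) = 0, found by bisection.
    The left-hand side is continuous and nonincreasing in theta, nonnegative
    at min(xs) and nonpositive at max(xs), so a root lies in between."""
    def score(theta):
        return sum(huber_psi(x - theta, k) for x in xs) / len(xs)

    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [1.0, 2.0, 3.0, 4.0, 100.0]
theta_mean_like = huber_location(data, k=1e6)     # close to the sample mean, 22
theta_median_like = huber_location(data, k=1e-3)  # close to the sample median, 3
```

Large $k$ recovers the sample mean and small $k$ approaches the sample median, so $k$ trades efficiency under normality against robustness to outliers.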

For M-estimators based on $ψ (x, β)$, where $β$ is a vector of parameters, Huber [17] and Huber [15] generalized and relaxed the conditions for the classical Taylor expansion. The technical details can be found in his seminal paper and in Huber [15]; they can be summarized as follows. Suppose that the M-estimators $\hat{β}$ are given as the roots of the estimating equations

$\frac{1}{n} \sum_{i = 1}^{n} ψ (x_{i}, β) = 0$ . (12)

Under the following main conditions:

a) $\frac{1}{n} \sum_{i = 1}^{n} ψ (x_{i}, \hat{β}) \overset{p}{\to} 0$ , assuming $\hat{β} \overset{p}{\to} β_{0}$ has been shown,

b) $λ (β_{0}) = E_{β_{0}} (ψ (x, β_{0})) = 0$, where $λ (β) = E_{β_{0}} (ψ (x, β))$,

together with assumption N-3 given by Huber ([15], p. 132), and if $λ (β)$ is differentiable with respect to $β$, then we have the following representation:

$\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} ψ (x_{i}, β_{0}) = - Λ (β_{0}) \sqrt{n} (\hat{β} - β_{0}) + o_{p} (1)$,

where $Λ (β_{0}) = \frac{\partial λ (β)}{\partial β^{'}} \Big|_{β = β_{0}}$ and $o_{p} (1)$ is a term converging to 0 in probability.

Compared with the usual Taylor expansion, we only require $λ (β) = E_{β_{0}} (ψ (x, β))$ to be differentiable with respect to $β$. This differentiability condition is satisfied for the AL family. Note that if the score functions are indeed differentiable, then $- Λ (β_{0})$ is the Fisher information matrix.

For the technical details on how to verify condition N-3, see Hinkley and Revankar ([22], p. 7). Condition a) is usually verified by making use of the Lebesgue dominated convergence theorem (LDCT) as given by Rudin ([23], p. 321). It can become very technical to construct integrable functions bounding the score functions in order to check the sufficient conditions for the LDCT, but these conditions are expected to hold for the AL distribution, given the existence of all positive integer moments and assuming that the parameter space is compact. Essentially, we need to show that condition a) is met by showing the convergence in probability of the integrals

$\int_{- \infty}^{\infty} ψ (x, \hat{β}) d F_{n} (x) \overset{p}{\to} \int_{- \infty}^{\infty} ψ (x, β_{0}) d F_{β_{0}} (x) = E_{β_{0}} (ψ (x, β_{0})) = 0$, where $F_{n} (x)$ is the empirical distribution function; for the AL family, the score functions are given by expressions (14)-(16).

From the above representation, we then have

$\sqrt{n} (\hat{β} - β_{0}) \overset{L}{\to} N (0, {[Λ (β_{0})]}^{- 1} V_{β_{0}} (ψ (x, β_{0})) {({[Λ (β_{0})]}^{- 1})}^{'})$ .

The asymptotic covariance matrix of $\hat{β}$ is given by

$V (\hat{β}) = \frac{1}{n} ({[Λ (β_{0})]}^{- 1}) V_{β_{0}} (ψ (x, β_{0})) {({[Λ (β_{0})]}^{- 1})}^{'}$ , (13)

where $V_{β_{0}} (ψ (x, β_{0}))$ is the covariance matrix of the vector $ψ (x, β_{0})$, and $ψ (x, β_{0})$ is the vector of the true score functions, or of quasi-score functions if a proxy density function is used in place of the true density function.
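In the one-parameter case, (13) reduces to $V (\hat{θ}) = \frac{1}{n} E [ψ^{2}] / Λ^{2}$. The sketch below estimates this sandwich variance for Huber's location estimator with plug-in averages, $\hat{Λ} = - \frac{1}{n} \sum_{i} I [| x_{i} - \hat{θ} | < k]$ and $\hat{V} = \frac{1}{n} \sum_{i} ψ^{2}$. Here $ψ$ is not a score function, so $- Λ$ and $V$ genuinely differ. For standard normal data and $k = 1.345$, $n V (\hat{θ})$ should come out close to 1.05. Names and settings are assumptions.

```python
import random


def huber_psi(u, k):
    """Huber's psi: u clipped to [-k, k]."""
    return max(-k, min(k, u))


def huber_location(xs, k, tol=1e-10):
    """Bisection root of sum psi(x_i - theta) = 0 (nonincreasing in theta)."""
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(huber_psi(x - mid, k) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def huber_sandwich_variance(xs, k):
    """Plug-in version of (13) for one parameter: (1/n) * E[psi^2] / Lambda^2,
    where Lambda = d/dtheta E[psi(x - theta)] = -P(|x - theta| < k)."""
    n = len(xs)
    t = huber_location(xs, k)
    v_psi = sum(huber_psi(x - t, k) ** 2 for x in xs) / n
    lam = -sum(1.0 for x in xs if abs(x - t) < k) / n
    return t, v_psi / (n * lam**2)


rng = random.Random(123)
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
theta_hat, var_hat = huber_sandwich_variance(xs, k=1.345)
```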

Now based on M-estimation theory, we proceed to find $Λ (β_{0})$ and $V_{β_{0}} (ψ (x, β_{0}))$ for the AL distribution to obtain the asymptotic covariance matrix of the ML estimators in the following section.

3.2. Asymptotic Covariance Matrix for the AL Family

Kotz et al. [1] and Kotz et al. [16] have shown that the ML estimators for the AL distribution have an asymptotic normal distribution, but their asymptotic covariance matrix, given by expression (3.5.1) of Kotz et al. ([1], p. 158), which is identical to expression (2) given by Kotz et al. ([16], p. 818), is still incomplete. If M-estimation theory is used, then the asymptotic covariance matrix should be based on Corollary (3.2) as given by Huber ([15], p. 133); see also expression (12.18) given by Wooldridge ([24], p. 407).

Since

$\ln g (x; θ, σ, κ) = \ln \sqrt{2} - \ln σ - \ln γ + \frac{δ \sqrt{2}}{2} \frac{(x - θ)}{σ} - \frac{γ \sqrt{2}}{2} \frac{| x - θ |}{σ}$ ,

the following derivatives are the score functions of the AL distribution,

$ψ_{1} (x; θ, σ, κ) = \frac{\partial \ln g (x; θ, σ, κ)}{\partial θ} = - \frac{\sqrt{2} δ}{2 σ} - \frac{\sqrt{2} γ}{2 σ} v (x; θ),$ (14)

with $v (x; θ) = - I [x > θ] + I [x < θ]$ and $v (x; θ) = 0$ if $x = θ$.

$ψ_{2} (x; θ, σ, κ) = \frac{\partial \ln g (x; θ, σ, κ)}{\partial σ} = - \frac{1}{σ} - \frac{\sqrt{2} δ}{2} \frac{(x - θ)}{σ^{2}} + \frac{\sqrt{2} γ}{2} \frac{| x - θ |}{σ^{2}}$ , (15)

$ψ_{3} (x; θ, σ, κ) = \frac{\partial \ln g (x; θ, σ, κ)}{\partial κ} = - \frac{\frac{\partial γ}{\partial κ}}{γ} + \frac{\partial δ}{\partial κ} \frac{\sqrt{2}}{2} \frac{(x - θ)}{σ} - \frac{\sqrt{2}}{2} \frac{\partial γ}{\partial κ} \frac{| x - θ |}{σ}$ . (16)
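The score expressions can be checked against central finite differences of $\ln g$ at points $x \neq θ$, where the log-density is differentiable. The sketch assumes $γ = κ + \frac{1}{κ}$ and $δ = \frac{1}{κ} - κ$ (the convention under which the AL density agrees with (9) at $τ = 1$); the factor $\frac{\sqrt{2}}{2}$ multiplying $\frac{\partial δ}{\partial κ}$ comes from differentiating the log-density directly. Names are illustrative.

```python
import math

R = math.sqrt(2.0) / 2.0  # the factor sqrt(2)/2 in the AL log-density


def al_logpdf(x, theta, sigma, kappa):
    """Log AL density with gamma = kappa + 1/kappa, delta = 1/kappa - kappa."""
    gam, dlt = kappa + 1.0 / kappa, 1.0 / kappa - kappa
    return (math.log(math.sqrt(2.0)) - math.log(sigma) - math.log(gam)
            + R * dlt * (x - theta) / sigma - R * gam * abs(x - theta) / sigma)


def al_scores(x, theta, sigma, kappa):
    """Scores (14)-(16); v = -1 if x > theta, +1 if x < theta, 0 if x = theta."""
    gam, dlt = kappa + 1.0 / kappa, 1.0 / kappa - kappa
    dgam, ddlt = 1.0 - 1.0 / kappa**2, -1.0 - 1.0 / kappa**2
    v = -1.0 if x > theta else (1.0 if x < theta else 0.0)
    psi1 = -R * dlt / sigma - R * gam / sigma * v
    psi2 = -1.0 / sigma - R * dlt * (x - theta) / sigma**2 + R * gam * abs(x - theta) / sigma**2
    psi3 = -dgam / gam + R * ddlt * (x - theta) / sigma - R * dgam * abs(x - theta) / sigma
    return psi1, psi2, psi3


def numeric_scores(x, theta, sigma, kappa, h=1e-6):
    """Central finite differences of the log-density (valid away from x = theta)."""
    f = al_logpdf
    return ((f(x, theta + h, sigma, kappa) - f(x, theta - h, sigma, kappa)) / (2 * h),
            (f(x, theta, sigma + h, kappa) - f(x, theta, sigma - h, kappa)) / (2 * h),
            (f(x, theta, sigma, kappa + h) - f(x, theta, sigma, kappa - h)) / (2 * h))
```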

Let $β = {(θ, σ, κ)}^{'}$ and let $β_{0}$ be the vector of the true parameters. We first need to find the vector

$\begin{array}{l} λ (β) = {(λ_{1} (β), λ_{2} (β), λ_{3} (β))}^{'}, \\ λ_{i} (β) = E_{β_{0}} (ψ_{i} (x; θ, σ, κ)), i = 1, 2, 3. \end{array}$

Subsequently, we need to find the derivatives of these expressions with respect to $β$, evaluated at $β = β_{0}$, to obtain the matrix $- Λ (β_{0})$. The matrix $- Λ (β_{0})$ generalizes the Fisher information matrix.

It reduces to that matrix if the score functions $ψ_{i} (x; β), i = 1, 2, 3,$ are differentiable with respect to $β$. The elements of $Λ (β_{0})$ have closed-form expressions but are lengthy to display. To obtain $E_{β_{0}} (ψ_{i} (x; θ, σ, κ)), i = 1, 2, 3$, note that we have a location and a scale parameter. Consequently, it is simpler to first define the standardized AL density as the AL density with $θ = 0, σ = 1$, i.e.,

$g_{ε} (x; κ) = \frac{\sqrt{2}}{γ} \exp (\frac{\sqrt{2}}{2} δ x) \exp (- \frac{\sqrt{2}}{2} γ | x |)$, and the AL density with three parameters as

$g (x; θ, σ, κ) = \frac{1}{σ} g_{ε} (\frac{x - θ}{σ}; κ)$.

Making use of $g_{ε} (x; κ)$ ,

$E_{β_{0}} (v (x; θ)) = - \int_{θ}^{\infty} \frac{1}{σ_{0}} g_{ε} (\frac{x - θ_{0}}{σ_{0}}; κ_{0}) d x + \int_{- \infty}^{θ} \frac{1}{σ_{0}} g_{ε} (\frac{x - θ_{0}}{σ_{0}}; κ_{0}) d x$,

$E_{β_{0}} (v (x; θ)) = 2 G_{ε} (\frac{θ - θ_{0}}{σ_{0}}; κ_{0}) - 1$, (17)

$G_{ε} (x; κ)$ is the distribution function with density function $g_{ε} (x; κ)$ .

Similarly,

$E_{β_{0}} (| x - θ |) = \int_{θ}^{\infty} (x - θ) \frac{1}{σ_{0}} g_{ε} (\frac{x - θ_{0}}{σ_{0}}; κ_{0}) d x + \int_{- \infty}^{θ} (θ - x) \frac{1}{σ_{0}} g_{ε} (\frac{x - θ_{0}}{σ_{0}}; κ_{0}) d x$ . (18)

Therefore, $\frac{\partial E_{β_{0}} (| x - θ |)}{\partial θ}$ can be obtained by first evaluating the term

$\begin{matrix} \frac{\partial}{\partial θ} \int_{θ}^{\infty} (x - θ) \frac{1}{σ_{0}} g_{ε} (\frac{x - θ_{0}}{σ_{0}}; κ_{0}) d x = - \int_{θ}^{\infty} \frac{1}{σ_{0}} g_{ε} (\frac{x - θ_{0}}{σ_{0}}; κ_{0}) d x \\ = - [1 - G_{ε} (\frac{θ - θ_{0}}{σ_{0}}; κ_{0})], \end{matrix}$

using Leibniz's rule, taking into account that the lower limit of integration also depends on $θ$, and then evaluating, again with Leibniz's rule, the expression

$\frac{\partial}{\partial θ} \int_{- \infty}^{θ} (θ - x) \frac{1}{σ_{0}} g_{ε} (\frac{x - θ_{0}}{σ_{0}}; κ_{0}) d x = G_{ε} (\frac{θ - θ_{0}}{σ_{0}}; κ_{0})$ .

Consequently,

$\frac{\partial E_{β_{0}} (| x - θ |)}{\partial θ} = - 1 + 2 G_{ε} (\frac{θ - θ_{0}}{σ_{0}}; κ_{0}) .$

The elements of $Λ (β_{0})$ can be found subsequently by first forming

$λ_{1} (β; β_{0}) = \int_{- \infty}^{\infty} ψ_{1} (x, β) g (x; β_{0}) d x = - \frac{\sqrt{2}}{2} \frac{δ}{σ} - \frac{\sqrt{2}}{2} \frac{γ}{σ} E_{β_{0}} (v (x; θ))$ ,

$E_{β_{0}} (v (x; θ))$ is as given by expression (17). Also,

$\begin{matrix} λ_{2} (β; β_{0}) = \int_{- \infty}^{\infty} ψ_{2} (x, β) g (x; β_{0}) d x \\ = - \frac{1}{σ} - \frac{\sqrt{2}}{2} \frac{δ}{σ^{2}} (E_{β_{0}} (x) - θ) + \frac{\sqrt{2}}{2} \frac{γ}{σ^{2}} E_{β_{0}} (| x - θ |), \end{matrix}$

$E_{β_{0}} (| x - θ |)$ is as given by expression (18), $E_{β_{0}} (x) = θ_{0} + τ_{0} μ_{0}$ with $τ_{0} = 1$ using expression (4). With

$λ_{3} (β; β_{0}) = \int_{- \infty}^{\infty} ψ_{3} (x, β) g (x; β_{0}) d x$ or equivalently,

$λ_{3} (β; β_{0}) = - \frac{\frac{\partial γ}{\partial κ}}{γ} + \frac{\partial δ}{\partial κ} \frac{\sqrt{2}}{2 σ} (E_{β_{0}} (x) - θ) - \frac{\sqrt{2}}{2 σ} \frac{\partial γ}{\partial κ} E_{β_{0}} (| x - θ |),$

then the matrix $Λ (β_{0})$ can be obtained by differentiating with respect to $β$ the vector

$λ (β; β_{0}) = {(λ_{1} (β; β_{0}), λ_{2} (β; β_{0}), λ_{3} (β; β_{0}))}^{'}$ and set $β = β_{0}$ , i.e.,

$Λ (β_{0}) = \frac{\partial λ (β; β_{0})}{\partial β^{'}} \Big|_{β = β_{0}}$ .

Clearly, the elements of the matrix $Λ (β_{0})$ have closed-form expressions, but they are lengthy to display. Packages such as MATLAB or Mathematica can compute symbolic derivatives and can be used to obtain these elements. Substituting the ML estimator $\hat{β}$ for $β_{0}$ in $Λ (β_{0})$ yields an estimate of $Λ (β_{0})$.
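As a numerical alternative to symbolic software, the sketch below evaluates $λ (β; β_{0})$ in closed form and approximates $Λ (β_{0})$ by central finite differences. It assumes $δ = \frac{1}{κ} - κ$ as in (9) with $τ = 1$, uses $E_{β_{0}} (v (x; θ)) = 2 G_{ε} (\frac{θ - θ_{0}}{σ_{0}}; κ_{0}) - 1$, which follows from the definition of $v$ in (14), and computes $E_{β_{0}} (| x - θ |)$ from the identity $E | Z - a | = E (Z) - a + 2 \int_{- \infty}^{a} G_{ε} (z) d z$, integrated here in closed form for the AL law (this closed form is derived for the sketch and is not part of the original text). A basic sanity check is $λ (β_{0}; β_{0}) = 0$.

```python
import math

SQ2 = math.sqrt(2.0)
R = SQ2 / 2.0


def al_cdf_std(z, kappa):
    """G_eps: CDF of the standardized AL law ((9) with tau = 1, theta = 0, sigma = 1)."""
    k2 = kappa * kappa
    if z < 0:
        return k2 / (1.0 + k2) * math.exp(SQ2 * z / kappa)
    return 1.0 - math.exp(-SQ2 * kappa * z) / (1.0 + k2)


def al_abs_dev_std(a, kappa):
    """E|Z - a| for standardized AL Z, via E|Z - a| = E(Z) - a + 2 * int_{-inf}^a G_eps."""
    k2 = kappa * kappa
    mean_z = (1.0 / kappa - kappa) / SQ2
    left = kappa**3 / (SQ2 * (1.0 + k2))  # integral of G_eps over (-inf, 0]
    if a <= 0:
        int_g = left * math.exp(SQ2 * a / kappa)
    else:
        int_g = left + a - (1.0 - math.exp(-SQ2 * kappa * a)) / (SQ2 * kappa * (1.0 + k2))
    return mean_z - a + 2.0 * int_g


def lam(beta, beta0):
    """lambda(beta; beta0) = E_{beta0}[psi(x; beta)] for beta = (theta, sigma, kappa)."""
    theta, sigma, kappa = beta
    theta0, sigma0, kappa0 = beta0
    gam, dlt = kappa + 1.0 / kappa, 1.0 / kappa - kappa
    dgam, ddlt = 1.0 - 1.0 / kappa**2, -1.0 - 1.0 / kappa**2
    a = (theta - theta0) / sigma0
    e_v = 2.0 * al_cdf_std(a, kappa0) - 1.0                  # E v(x; theta)
    e_x = theta0 + sigma0 * (1.0 / kappa0 - kappa0) / SQ2    # E x = theta0 + mu0
    e_abs = sigma0 * al_abs_dev_std(a, kappa0)               # E |x - theta|
    l1 = -R * dlt / sigma - R * gam / sigma * e_v
    l2 = -1.0 / sigma - R * dlt / sigma**2 * (e_x - theta) + R * gam / sigma**2 * e_abs
    l3 = -dgam / gam + R * ddlt / sigma * (e_x - theta) - R * dgam / sigma * e_abs
    return [l1, l2, l3]


def big_lambda(beta0, h=1e-6):
    """Lambda(beta0): entry (i, j) approximates d lambda_i / d beta_j at beta = beta0."""
    out = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        up, dn = list(beta0), list(beta0)
        up[j] += h
        dn[j] -= h
        lu, ld = lam(up, beta0), lam(dn, beta0)
        for i in range(3):
            out[i][j] = (lu[i] - ld[i]) / (2.0 * h)
    return out


beta0 = (0.1, 0.9, 0.8)
Lambda0 = big_lambda(beta0)
```

Replacing `beta0` by the ML estimate gives the plug-in estimate of $Λ (β_{0})$ described above.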

Now we turn our attention to the matrix $Σ$, which is the covariance matrix of the vector of score functions $ψ (x, β_{0}) = {(ψ_{1} (x, β_{0}), ψ_{2} (x, β_{0}), ψ_{3} (x, β_{0}))}^{'}$. Using a different but equivalent parameterisation, this matrix has been obtained by Kotz et al. ([16], p. 818) and Kotz et al. ([1], p. 158), but its inverse does not give the asymptotic covariance matrix of the ML estimators, as claimed in their paper. It is not difficult to establish the relationships between the parameterisation used in Example 2 and the one used in the paper by Kotz et al. ([16], p. 818).

Note that the inverse of $Σ$ is not the asymptotic covariance matrix of the ML estimators because $Λ (β_{0}) = \frac{\partial λ (β; β_{0})}{\partial β^{'}} \Big|_{β = β_{0}}$ is not equal to $- Σ$ if the differentiability assumptions for the score functions do not hold; see Corollary (3.2) and Proposition (3.3) given by Huber ([15], p. 133).

The matrix $Σ$ can also be estimated by the following estimator

$\frac{1}{n} \sum_{i = 1}^{n} [ψ (x_{i}, \hat{β})] {[ψ (x_{i}, \hat{β})]}^{'}$ .

Let us consider the following location model with known $σ_{0}$ and check that expression (2.2) as given by Kotz et al. ([16], p. 818) has missing components. The density function is given by

$f (x; θ) = \frac{1}{σ_{0} \sqrt{2}} e^{- \frac{\sqrt{2}}{σ_{0}} | x - θ |}$ , or alternatively the density can also be expressed as

$f (x; θ) = f_{0} (x - θ), f_{0} (x) = \frac{1}{σ_{0} \sqrt{2}} e^{- \frac{\sqrt{2} | x |}{σ_{0}}}$ .

This subfamily corresponds to their parameterisation with $κ = 1$ . The sample median $\hat{θ}$ is the ML estimator of $θ$ , and their result would lead one to conclude that the asymptotic variance is given by ${(E {(\frac{\partial \ln f}{\partial θ})}^{2})}^{- 1} = \frac{σ_{0}^{2}}{2}$ , as indicated by case 1 in the table of their paper. On the other hand, it is known that the asymptotic distribution of the sample median is given by

$\sqrt{n} (\hat{θ} - θ_{0}) \overset{L}{\to} N (0, \frac{1}{4 {(f_{0} (0))}^{2}})$ ; see, for example, expression (2.4.19) given by Lehmann ( [25] , p. 81). For the location model being considered, we have

$\frac{1}{4 {(f_{0} (0))}^{2}} = \frac{1}{8 σ_{0}^{2}}$ . Clearly, $\frac{1}{8 σ_{0}^{2}} \neq \frac{σ_{0}^{2}}{2}$ but the correct asymptotic variance can

be obtained using expression (13).

For the general GAL distribution with four parameters, alternative methods of estimation based on quadratic distances (QD), which make use of the empirical cumulant generating function, will be introduced in the next section. The QD methods are motivated by empirical findings showing that, for finite sample sizes as large as n = 5000, the ML methods do not give good estimates of the shape parameter $τ$ and the scale parameter $σ$ , although they give good estimates of the other two parameters. However, the overall efficiency of ML methods lags behind that of QD methods in finite samples. Besides giving better estimates of $σ$ and $τ$ , QD methods can be used for parameter testing, since the asymptotic covariance matrix of the QD estimators can be obtained explicitly for the entire GAL family. The methods also provide a chi-square goodness-of-fit test statistic for the model being used. Therefore, it might be of interest to consider QD methods whenever ML methods have deficiencies.

4. Quadratic Distance Methods

General quadratic distance (QD) theory has been developed in Luong and Thompson [26] . However, if it is used for estimating the parameters of the GAL distribution, we need to specify a distance which can generate estimators with good efficiency. For applied work, it is also preferable to have methods which are relatively simple to implement numerically.

For financial data, observations are recorded as percentages, so they are small in magnitude. We recommend minimizing the following distance, which matches the empirical cumulant generating function $K_{n} (t)$ with its model counterpart $K_{β} (t)$ at the points

$t_{j}, j = 1, \dots, m = 20$

with

$t_{1} = 0.01, t_{2} = 0.02, \dots, t_{10} = 0.1, t_{11} = - 0.01, t_{12} = - 0.02, \dots, t_{20} = - 0.1.$ (19)

The choice of points above is based on empirical findings that, overall, the QD estimators are more efficient than the ML estimators for the range of parameters often encountered when modelling financial data, with finite sample sizes as large as n = 5000. Note that the set of points does not include the origin 0; since $K_{n} (0) = K_{β} (0) = 0$ for every sample and every $β$ , the origin carries no information about $β$ .

The empirical moment generating function and the empirical cumulant generating function are given, respectively, by

$M_{n} (s) = \frac{1}{n} \sum_{i = 1}^{n} e^{s X_{i}}$ and $K_{n} (s) = \log M_{n} (s)$ .

The model cumulant generating function is $K_{β} (t) = \log M_{β} (t)$ , with $M_{β} (t)$ being the model moment generating function as defined by expression (1). The proposed QD estimators, given by the vector $\tilde{β}$ , are obtained by minimizing with respect to $β$ the distance

$D (β) = \sum_{j = 1}^{20} {(K_{n} (t_{j}) - K_{β} (t_{j}))}^{2}$ . (20)
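A minimal pure-Python sketch of $K_{n}$ , $K_{β}$ and $D (β)$ over the points (19). Expression (1) is not reproduced in this section, so the code ASSUMES the Kotz et al. form $M_{β} (t) = e^{θ t} {(1 - μ t - σ^{2} t^{2} / 2)}^{- τ}$ with $β = (θ, μ, σ, τ)$ ; the function names are hypothetical.

```python
import math
from typing import List, Sequence

# The 20 points of expression (19): +/-0.01, +/-0.02, ..., +/-0.10 (origin excluded).
T_POINTS: List[float] = [k / 100 for k in range(1, 11)] + [-k / 100 for k in range(1, 11)]


def empirical_cgf(x: Sequence[float], t: float) -> float:
    """K_n(t) = log M_n(t), with M_n(t) = (1/n) * sum_i exp(t * x_i)."""
    return math.log(sum(math.exp(t * xi) for xi in x) / len(x))


def gal_cgf(t: float, beta: Sequence[float]) -> float:
    """Model cgf K_beta(t) for beta = (theta, mu, sigma, tau).

    ASSUMPTION: M(t) = exp(theta*t) * (1 - mu*t - sigma**2 * t**2 / 2) ** (-tau).
    """
    theta, mu, sigma, tau = beta
    u = -mu * t - 0.5 * sigma ** 2 * t ** 2  # the mgf base is 1 + u
    if u <= -1.0:
        return math.inf  # t lies outside the domain of the mgf
    return theta * t - tau * math.log1p(u)


def qd_distance(beta: Sequence[float], z_n: Sequence[float],
                t_points: Sequence[float] = T_POINTS) -> float:
    """D(beta) of expression (20): squared distance between z_n and z_beta."""
    return sum((k_n - gal_cgf(t, beta)) ** 2 for k_n, t in zip(z_n, t_points))
```

In use, $z_{n}$ is computed once, e.g. `z_n = [empirical_cgf(returns, t) for t in T_POINTS]` with `returns` a hypothetical data list, and `qd_distance` is then handed to a direct-search minimizer. Returning `inf` outside the domain of the mgf lets such a minimizer simply reject those trial points.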

Once the estimates are obtained, a goodness-of-fit test statistic with an asymptotic chi-square distribution with $r = 16$ degrees of freedom can also be constructed ( $r = m - 4$ , the 20 points minus the 4 estimated parameters). General QD theory can be used to derive the asymptotic covariance matrix of the QD estimators and the chi-square goodness-of-fit test statistic; both are given at the end of this section. Having the asymptotic covariance matrix of the QD estimators in closed form for the GAL family is useful for parameter testing.

For notation, define the vector based on observations

$z_{n} = {(K_{n} (t_{1}), \dots, K_{n} (t_{m}))}^{'}, m = 20$ .

Its model counterpart is the vector

$z_{β} = {(K_{β} (t_{1}), \dots, K_{β} (t_{m}))}^{'}$ .

Therefore,

$D (β) = {(z_{n} - z_{β})}^{'} (z_{n} - z_{β}) .$

Observe that the elements of the covariance matrix $V_{M}$ for the vector

$\sqrt{n} {(M_{n} (t_{1}), \dots, M_{n} (t_{m}))}^{'}$ are given by

$V_{M} (i, j) = M_{β} (t_{i} + t_{j}) - M_{β} (t_{i}) M_{β} (t_{j}), i = 1, \dots, 20, j = 1, \dots, 20.$

The elements of the approximate covariance matrix of $\sqrt{n} {(K_{n} (t_{1}), \dots, K_{n} (t_{m}))}^{'}$ , based on the delta method, are given by

$V_{K} (i, j) = (M_{β} (t_{i} + t_{j}) - M_{β} (t_{i}) M_{β} (t_{j})) / (M_{β} (t_{i}) M_{β} (t_{j})), i = 1, \dots, 20, j = 1, \dots, 20.$
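Both covariance matrices only need the model mgf at $t_{i}$ and at $t_{i} + t_{j}$ . A sketch under the same assumed Kotz et al. mgf form (expression (1) is not shown here); `gal_mgf` and `cgf_covariances` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def gal_mgf(t: float, beta: Sequence[float]) -> float:
    """ASSUMED Kotz et al. form of the GAL mgf, beta = (theta, mu, sigma, tau)."""
    theta, mu, sigma, tau = beta
    base = 1.0 - mu * t - 0.5 * sigma ** 2 * t ** 2
    if base <= 0.0:
        raise ValueError(f"t = {t} is outside the domain of the mgf")
    return math.exp(theta * t) * base ** (-tau)


def cgf_covariances(mgf: Callable[[float], float],
                    t_points: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """Return (V_M, V_K) for sqrt(n) * M_n(t_j) and sqrt(n) * K_n(t_j)."""
    m = [mgf(t) for t in t_points]
    idx = range(len(t_points))
    v_m = [[mgf(t_points[i] + t_points[j]) - m[i] * m[j] for j in idx] for i in idx]
    # delta method: d log M / d M = 1 / M, applied to both coordinates
    v_k = [[v_m[i][j] / (m[i] * m[j]) for j in idx] for i in idx]
    return v_m, v_k
```

A degenerate model with $μ = σ = 0$ is a point mass at $θ$ and must give zero covariances, which is a quick sanity check on any implementation.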

Under the regularity conditions given by Lemma (3.4.1) of Luong and Thompson ( [26] , p. 244), the QD estimators given by the vector $\tilde{β}$ are consistent. Clearly, we need to assume that on the restricted parameter space the model moment generating function and the covariance matrix $V_{M}$ given by expression (21) are well defined. Some modifications might be necessary if the methods are applied to other models. The conditions are met in general for the GAL distribution when used for modeling financial data. We then have

$\begin{array}{l} \sqrt{n} (\tilde{β} - β_{0}) \overset{L}{\to} N (0, V), \\ V = {(S^{'} S)}^{- 1} S^{'} V_{K} S {(S^{'} S)}^{- 1} . \end{array}$

The asymptotic covariance for the QD estimators is simply $\frac{1}{n} V$ .

All the expressions which form $V$ as given above are evaluated under the true vector of parameters $β_{0}$ , $β = {(θ, μ, σ, τ)}^{'} = {(β_{1}, β_{2}, β_{3}, β_{4})}^{'}$ and

$S = [\begin{matrix} \frac{\partial K_{β} (t_{1})}{\partial β_{1}} & \dots & \frac{\partial K_{β} (t_{1})}{\partial β_{4}} \\ ⋮ & ⋱ & ⋮ \\ \frac{\partial K_{β} (t_{m})}{\partial β_{1}} & \dots & \frac{\partial K_{β} (t_{m})}{\partial β_{4}} \end{matrix}]$ , $S^{'}$ is the transpose of $S$ .
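Under the assumed mgf $M_{β} (t) = e^{θ t} {(1 - μ t - σ^{2} t^{2} / 2)}^{- τ}$ the columns of $S$ have closed forms: with $b = 1 - μ t - σ^{2} t^{2} / 2$ , $\partial K / \partial θ = t$ , $\partial K / \partial μ = τ t / b$ , $\partial K / \partial σ = τ σ t^{2} / b$ and $\partial K / \partial τ = - \log b$ . The sketch builds $S$ and the sandwich $V$ ; helper names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gal_cgf(t: float, beta: Sequence[float]) -> float:
    """ASSUMED Kotz et al. GAL cgf, beta = (theta, mu, sigma, tau)."""
    theta, mu, sigma, tau = beta
    return theta * t - tau * math.log(1.0 - mu * t - 0.5 * sigma ** 2 * t ** 2)


def s_matrix(beta: Sequence[float], t_points: Sequence[float]) -> Matrix:
    """Row j holds the analytic gradient of K_beta(t_j) w.r.t. (theta, mu, sigma, tau)."""
    theta, mu, sigma, tau = beta
    rows = []
    for t in t_points:
        b = 1.0 - mu * t - 0.5 * sigma ** 2 * t ** 2
        rows.append([t, tau * t / b, tau * sigma * t ** 2 / b, -math.log(b)])
    return rows


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting (small matrices only)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0.0:
            raise ValueError("matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def qd_asymptotic_cov(s: Matrix, v_k: Matrix) -> Matrix:
    """V = (S'S)^{-1} S' V_K S (S'S)^{-1}; Cov(beta_tilde) is roughly V / n."""
    st = transpose(s)
    a = inverse(matmul(st, s))
    return matmul(matmul(matmul(a, st), matmul(v_k, s)), a)
```

One numerical caution: with all $| t_{j} | \leq 0.1$ the $μ$ column is close to $τ$ times the $θ$ column (since $b \approx 1$ ), so $S^{'} S$ can be badly conditioned. In practice a QR- or SVD-based routine (for example `numpy.linalg.lstsq` or `numpy.linalg.pinv`) is likely to be more stable than the explicit inverse used in this sketch.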

We also write $S = S (β_{0}), V_{K} = V_{K} (β_{0}), Σ_{2} = Σ_{2} (β_{0})$ to emphasize that these matrices depend on $β_{0}$ . The matrix $Σ_{2}$ is derived below. To construct a test statistic with a chi-square limiting distribution, use expression (3.4.2) given by Luong and Thompson ( [26] , p. 248) to obtain

$\sqrt{n} (z_{n} - z_{\tilde{β}}) \overset{L}{\to} N (0, Σ_{2})$ with $Σ_{2}$ , a covariance matrix which depends on $β_{0}$ and

$Σ_{2} = [I - S {(S^{'} S)}^{- 1} S^{'}] V_{K} [I - S {(S^{'} S)}^{- 1} S^{'}] .$ (21)
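A pure-Python sketch of expression (21), with hypothetical helper names. The matrix $I - S {(S^{'} S)}^{- 1} S^{'}$ projects onto the orthogonal complement of the columns of $S$ , so $Σ_{2} S = 0$ , which is a convenient numerical check on any implementation.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting (small matrices only)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0.0:
            raise ValueError("matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def sigma2(s: Matrix, v_k: Matrix) -> Matrix:
    """Sigma_2 = (I - H) V_K (I - H), with the hat matrix H = S (S'S)^{-1} S'."""
    st = transpose(s)
    h = matmul(matmul(s, inverse(matmul(st, s))), st)
    m = len(s)
    proj = [[float(i == j) - h[i][j] for j in range(m)] for i in range(m)]
    # I - H is symmetric, so no transpose is needed on the right-hand factor
    return matmul(matmul(proj, v_k), proj)
```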

In practice, $β_{0}$ needs to be replaced by $\tilde{β}$ , so that an estimate of $Σ_{2}$ can be defined as

${\tilde{Σ}}_{2} = Σ_{2} (\tilde{β})$ .

We need the Moore-Penrose (MP) generalized inverse of ${\tilde{Σ}}_{2}$ to construct a chi-square statistic. Since $I - S {(S^{'} S)}^{- 1} S^{'}$ is a projection matrix of rank $m - 4$ , the matrix $Σ_{2}$ is singular, so an ordinary inverse does not exist. The quadratic form constructed with the MP inverse follows a chi-square distribution asymptotically. Many computer packages provide prewritten functions to find the Moore-Penrose inverse of a matrix. It can also be computed easily using the spectral decomposition ${\tilde{Σ}}_{2} = P D P^{'}$ . The columns of the matrix $P$ are the eigenvectors of ${\tilde{Σ}}_{2}$ and $D$ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues of ${\tilde{Σ}}_{2}$ , given respectively by $λ_{i} \geq 0, i = 1, \dots, m$ . The matrix $P$ is orthogonal, with $P P^{'} = P^{'} P = I$ .

The Moore-Penrose inverse ${\tilde{Σ}}_{2}^{M P}$ can be obtained as

${\tilde{Σ}}_{2}^{M P} = P D^{-} P^{'}$ ,

where $D^{-}$ is the diagonal matrix constructed from the diagonal elements $λ_{i}, i = 1, \dots, 20$ of $D$ . The diagonal elements of $D^{-}$ are given by

$λ_{i}^{-} = \frac{1}{λ_{i}}$ if $λ_{i} > 0$ and $λ_{i}^{-} = 0$ if $λ_{i} = 0$ .
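The spectral recipe above can be sketched without external libraries using the cyclic Jacobi eigenvalue method (a swapped-in choice; any symmetric eigensolver would do, e.g. `numpy.linalg.eigh`). Computed eigenvalues are never exactly 0, so the sketch treats eigenvalues below a relative tolerance `rel_tol` (an illustrative value) as zero. Names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigen(a: Matrix, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of P) of a symmetric A, with A = P D P'."""
    n = len(a)
    a = [[float(v) for v in row] for row in a]
    p_mat = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(v * v for row in a for v in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the new a[p][q] is zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # P <- P J accumulates the eigenvectors
                    pkp, pkq = p_mat[k][p], p_mat[k][q]
                    p_mat[k][p], p_mat[k][q] = c * pkp - s * pkq, s * pkp + c * pkq
    return [a[i][i] for i in range(n)], p_mat


def mp_inverse(sigma: Matrix, rel_tol: float = 1e-10) -> Matrix:
    """Moore-Penrose inverse P D^- P' of a symmetric positive semi-definite matrix."""
    lam, p_mat = jacobi_eigen(sigma)
    top = max((abs(v) for v in lam), default=0.0)
    # eigenvalues that are zero up to rounding get lambda^- = 0
    lam_minus = [1.0 / v if v > rel_tol * top else 0.0 for v in lam]
    n = len(sigma)
    return [[sum(p_mat[i][k] * lam_minus[k] * p_mat[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]
```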

For discussions of the properties of the Moore-Penrose generalized inverse, see Theil ( [27] , pp. 273-274); also see expressions (4.3)-(4.6) given by Harville ( [28] , p. 504). For numerical computations using R, see Section 8.3 given by Fieller ( [29] , pp. 123-133). The chi-square test statistic for testing the null hypothesis that the observations are drawn from the GAL family can be based on the criterion function

$Q (β) = n {(z_{n} - z_{β})}^{'} (P D^{-} P^{'}) (z_{n} - z_{β}),$ (22)

$Q (\tilde{β}) = n {(z_{n} - z_{\tilde{β}})}^{'} (P D^{-} P^{'}) (z_{n} - z_{\tilde{β}}) \overset{L}{\to} χ^{2} (16)$ . (23)

The limiting distribution of the test statistic is chi-square with $r = 16$ degrees of freedom, based on Theorem 3.4.1 of Luong and Thompson ( [26] , p. 248). The test statistic can also be viewed as a generalized Pearson test statistic. The criterion function $Q (β)$ can also be used to find a good starting vector to initialize the algorithms for finding the QD estimators; see Section 3 of Andrews ( [30] , pp. 917-922) for more discussion and Section 5.2 of this paper.
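A sketch of the statistic (23), assuming $z_{n}$ , $z_{\tilde{β}}$ and the MP inverse of ${\tilde{Σ}}_{2}$ have already been computed. The critical value 26.296 is the tabulated 95th percentile of $χ^{2} (16)$ ; the function names are hypothetical.

```python
from typing import List, Sequence

# 95th percentile of the chi-square distribution with r = m - 4 = 16 degrees
# of freedom (standard table value).
CHI2_95_DF16 = 26.296


def qd_chi_square(z_n: Sequence[float], z_model: Sequence[float],
                  w_mp: List[List[float]], n: int) -> float:
    """Q = n (z_n - z_beta)' W (z_n - z_beta), with W the MP inverse of Sigma_2."""
    d = [a - b for a, b in zip(z_n, z_model)]
    m = len(d)
    return n * sum(d[i] * w_mp[i][j] * d[j] for i in range(m) for j in range(m))


def gal_fit_rejected(q: float, critical: float = CHI2_95_DF16) -> bool:
    """Reject the GAL null hypothesis at the 5% level when Q exceeds the critical value."""
    return q > critical
```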

5. Numerical Issues

5.1. Simple Moment Estimators

The simple approximate moment estimators proposed by Seneta [4] can be found explicitly and can be used as starting points for the numerical optimization to find the QD or ML estimators. Let the sample mean and the sample central moments be denoted by

${\hat{μ}}_{1} = \bar{X}, \quad {\hat{μ}}_{j} = \frac{1}{n} \sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{j}, j = 2, 3, 4.$

Equating them with their model counterparts and neglecting all the terms in ${θ^{'}}^{j}, j = 2, 3, 4$ yields the following system of estimating equations for moment estimation:

${\hat{μ}}_{1} = c + θ^{'}, {\hat{μ}}_{2} = {σ^{'}}^{2}, {\hat{μ}}_{3} = 3 {σ^{'}}^{2} θ^{'} ν, {\hat{μ}}_{4} = 3 {σ^{'}}^{4} ν + 3 {σ^{'}}^{4}$ .

The moment estimators are

${({\bar{σ}}^{'})}^{2} = {\hat{μ}}_{2}, \bar{ν} = \frac{\frac{{\hat{μ}}_{4}}{3} - {({\bar{σ}}^{'})}^{4}}{{({\bar{σ}}^{'})}^{4}}, {\bar{θ}}^{'} = \frac{{\hat{μ}}_{3}}{3 \bar{ν} {({\bar{σ}}^{'})}^{2}}, \bar{c} = {\hat{μ}}_{1} - {\bar{θ}}^{'} .$

When converted to the parameterization given by Kotz et al. [16] , the approximate moment estimators of $τ, σ^{2}, μ, θ$ are given respectively by

$\bar{τ} = \frac{1}{\bar{ν}}, {\bar{σ}}^{2} = {({\bar{σ}}^{'})}^{2} \bar{ν}, \bar{μ} = {\bar{θ}}^{'} \bar{ν}, \bar{θ} = \bar{c}$ .

The approximate moment estimators are not efficient, but they are simple and given explicitly. Therefore, they can be used as starting points for the numerical algorithms implementing QD or ML estimation. The moment estimators can also be checked to see whether they are appropriate as starting points; this is discussed in the next section.
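A sketch of Seneta's approximate moment estimators and their conversion to $(θ, μ, σ, τ)$ . It raises an error when $\bar{ν} \leq 0$ (no excess kurtosis), one simple way in which the moment estimates can fail to be usable as a starting vector; that check is our addition, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def seneta_moment_estimates(x: Sequence[float]) -> Tuple[float, float, float, float]:
    """Approximate moment estimators, returned as (theta, mu, sigma, tau)."""
    n = len(x)
    xbar = sum(x) / n  # mu_hat_1
    m2, m3, m4 = (sum((v - xbar) ** j for v in x) / n for j in (2, 3, 4))
    if m2 <= 0.0:
        raise ValueError("sample variance is zero")
    sigma_p2 = m2  # (sigma')^2
    nu = (m4 / 3.0 - sigma_p2 ** 2) / sigma_p2 ** 2
    if nu <= 0.0:
        # no excess kurtosis: tau = 1 / nu would be negative or infinite
        raise ValueError("moment estimates unusable as a starting vector (nu <= 0)")
    theta_p = m3 / (3.0 * nu * sigma_p2)  # theta'
    c = xbar - theta_p
    # conversion to the Kotz et al. parameterisation
    tau = 1.0 / nu
    sigma = math.sqrt(sigma_p2 * nu)
    mu = theta_p * nu
    theta = c
    return theta, mu, sigma, tau
```

Two identities follow directly from the conversion and are useful checks: $\bar{θ} + \bar{μ} \bar{τ} = \bar{X}$ and ${\bar{σ}}^{2} \bar{τ} = {\hat{μ}}_{2}$ .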

5.2. The Choice of an Initial Vector

Most algorithms return a local minimizer, whereas the vector of estimators is defined to be the global minimizer. Because of this limitation, some care is needed to ensure that the global minimizer is identified. In practice, it is important to test the algorithm with various starting vectors; see Andrews [30] . Andrews [30] has suggested that it is preferable to have the starting vector $β^{(0)} = {(θ^{(0)}, μ^{(0)}, σ^{(0)}, τ^{(0)})}^{'}$ close to the vector of estimators $\tilde{β}$ which globally minimizes the objective function. We might look for a different starting vector if the vector of moment estimators cannot be used to initialize the numerical algorithm.

The criterion function $Q (β)$ given by expression (22), which is used to construct the goodness-of-fit test, can also be used to select a good starting vector. To qualify as a suitable starting vector, $β^{(0)}$ is subjected to a screening test which checks whether

$Q (β^{(0)}) \leq χ_{0.95}^{2} (16)$ ,

where $χ_{0.95}^{2} (16)$ is the 95th percentile of the chi-square distribution with 16 degrees of freedom; see expression (3.5) given by Andrews ( [30] , p. 919). If $β^{(0)}$ passes the screening test, one might use $β^{(0)}$ as the starting vector for the numerical algorithm used to find the estimators; otherwise, another starting vector should be sought.
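The screening rule can be sketched as a loop over candidate starting vectors. Here `q_func` is a hypothetical callable that evaluates $Q (β)$ of expression (22), and the candidate list (for instance the moment estimates and a few perturbations of them) is left to the user.

```python
from typing import Callable, Iterable, Optional, Sequence

CHI2_95_DF16 = 26.296  # 95th percentile of chi-square(16), table value


def screen_start(candidates: Iterable[Sequence[float]],
                 q_func: Callable[[Sequence[float]], float],
                 critical: float = CHI2_95_DF16) -> Optional[Sequence[float]]:
    """Return the first candidate beta0 with Q(beta0) <= critical, or None."""
    for beta0 in candidates:
        q = q_func(beta0)
        # a NaN or infinite Q (e.g. mgf undefined at some t_j) fails automatically
        if q <= critical:
            return beta0
    return None
```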

5.3. A Limited Simulation Study

For financial data, observations are recorded as percentages, so they are small in magnitude. In this situation the values of $θ$ and $μ$ are near 0, and plausible ranges for $τ$ and $σ$ are $0 < τ \leq 10$ and $0 < σ \leq 0.1$ . For parameters in these ranges, we observe that the ML estimators of $τ$ and $σ$ do not perform well for sample sizes as large as $n = 5000$ . To compare QD methods with ML methods, the ratio of total mean square errors is used as a measure of overall relative efficiency. Because of limited computing capacity (only a laptop computer was available), we could only use M = 100 samples, each of size n = 5000.
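A study of this kind needs simulated GAL samples. The paper does not state its generator; one standard choice, used in this sketch, is the normal variance-mean mixture $X = θ + μ G + σ \sqrt{G} Z$ with $G \sim Gamma (τ, 1)$ and $Z \sim N (0, 1)$ independent. Its mgf is $e^{θ t} {(1 - μ t - σ^{2} t^{2} / 2)}^{- τ}$ (the Kotz et al. form, assumed here for expression (1)), its mean is $θ + μ τ$ and its variance is $τ (μ^{2} + σ^{2})$ . The function name is hypothetical.

```python
import math
import random
from typing import List, Sequence


def simulate_gal(beta: Sequence[float], n: int, seed: int = 0) -> List[float]:
    """Draw n GAL(theta, mu, sigma, tau) variates as theta + mu*G + sigma*sqrt(G)*Z.

    G ~ Gamma(shape=tau, scale=1) and Z ~ N(0, 1) are independent.
    """
    theta, mu, sigma, tau = beta
    rng = random.Random(seed)  # private generator: reproducible replications
    out = []
    for _ in range(n):
        g = rng.gammavariate(tau, 1.0)
        out.append(theta + mu * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0))
    return out
```

Each of the M replications would then be fitted by both methods, for example with seeds 0, ..., M - 1.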

The overall relative efficiency for comparisons is defined as the ratio

$\frac{TMSE (Q D)}{TMSE (M L)} = \frac{MSE (\tilde{θ}) + MSE (\tilde{μ}) + MSE (\tilde{σ}) + MSE (\tilde{τ})}{MSE (\hat{θ}) + MSE (\hat{μ}) + MSE (\hat{σ}) + MSE (\hat{τ})} .$
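The ratio is straightforward to compute from the $M$ replicated estimate vectors. A minimal sketch with hypothetical names, where each MSE is the Monte Carlo average of squared errors over the replications:

```python
from typing import Sequence


def tmse(estimates: Sequence[Sequence[float]], true_beta: Sequence[float]) -> float:
    """Sum over parameters of the Monte Carlo MSE across the M replications."""
    m = len(estimates)
    return sum(sum((est[k] - b) ** 2 for est in estimates) / m
               for k, b in enumerate(true_beta))


def relative_efficiency(qd_estimates: Sequence[Sequence[float]],
                        ml_estimates: Sequence[Sequence[float]],
                        true_beta: Sequence[float]) -> float:
    """TMSE(QD) / TMSE(ML); values below 1 favour the QD estimators."""
    return tmse(qd_estimates, true_beta) / tmse(ml_estimates, true_beta)
```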

The MSE and TMSE values which appear in Table 1 are estimated using the simulated samples. The results of the simulation study are lengthy, so we only extract the key findings, which are summarized in Table 1.

The study seems to indicate that, overall, ML methods are less efficient than QD methods, but ML methods are more efficient for estimating the first two parameters, namely $θ$ and $μ$ , for the AL family and for the entire GAL family in finite samples, where little is known about the asymptotic distributions of the ML estimators.

Table 1. Illustrations of simulation results.

(a) Overall relative efficiency: $\frac{TMSE (QD)}{TMSE (ML)} = 0.003$ .
(b) Overall relative efficiency: $\frac{TMSE (QD)}{TMSE (ML)} = 0.0207$ .
(c) Overall relative efficiency: $\frac{TMSE (QD)}{TMSE (ML)} = 8.357 \times 10^{- 5}$ .

6. Financial Applications

6.1. Option Pricing and Risk Neutral Parameters

Since options are tradable, risk neutral parameters are used for pricing. Risk neutral parameters are related to the physical parameters, which can be estimated using historical data. A set of risk neutral parameters can be obtained by using the Esscher transform change of measure (see Schoutens ( [31] , p. 77)), based on the seminal work of Gerber and Shiu [12] . They can also be viewed as minimum entropy risk neutral parameters; see Miyahara [13] . We keep the four historical parameters of the GAL distribution as risk neutral parameters but introduce an extra parameter $h^{*}$ , defined as the solution in $h$ of the following equation, where $r$ is the known risk free rate:

$r = \log M (h + 1) - \log M (h)$

where $M (s)$ is the moment generating function as given by expression (2).
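Because $\log M$ is convex, $\log M (h + 1) - \log M (h)$ is nondecreasing in $h$ , so $h^{*}$ can be found by bisection over the values of $h$ for which both $h$ and $h + 1$ lie in the domain of the mgf. A sketch, assuming the Kotz et al. mgf form $M (s) = e^{θ s} {(1 - μ s - σ^{2} s^{2} / 2)}^{- τ}$ (expression (2) is not reproduced here); names are hypothetical. For a symmetric model ( $θ = μ = 0$ ) and $r = 0$ the root is $h^{*} = - 1/2$ .

```python
import math
from typing import Sequence


def gal_cgf(s: float, beta: Sequence[float]) -> float:
    """ASSUMED Kotz et al. GAL cgf; +inf outside the domain of the mgf."""
    theta, mu, sigma, tau = beta
    u = -mu * s - 0.5 * sigma ** 2 * s ** 2
    return math.inf if u <= -1.0 else theta * s - tau * math.log1p(u)


def esscher_h(beta: Sequence[float], r: float, iters: int = 200) -> float:
    """Solve r = K(h + 1) - K(h) for h by bisection (K = log M is convex)."""
    theta, mu, sigma, tau = beta
    # the mgf exists for s strictly between the roots of 1 - mu*s - sigma^2 s^2 / 2
    root = math.sqrt(mu * mu + 2.0 * sigma * sigma)
    s_lo, s_hi = (-mu - root) / sigma ** 2, (-mu + root) / sigma ** 2
    eps = 1e-9 * (s_hi - s_lo)
    lo, hi = s_lo + eps, s_hi - 1.0 - eps  # both h and h + 1 must lie in the domain
    if hi <= lo:
        raise ValueError("mgf domain is too narrow for h and h + 1")

    def g(h: float) -> float:
        return gal_cgf(h + 1.0, beta) - gal_cgf(h, beta) - r

    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("no root of the Esscher equation in the mgf domain")
    for _ in range(iters):  # fixed count avoids stalling at floating-point resolution
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```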

Therefore, the risk neutral parameters are given by the vector

$β_{0}^{N} = {(h^{*}, θ_{0}, σ_{0}, μ_{0}, τ_{0})}^{'}$ .

The price of the asset is modeled as $S_{T} = S_{0} e^{X_{T}}$ , where:

a) $S_{0}$ is the initial asset price at time $t = 0$ ,

b) $X_{T} = \sum_{i = 1}^{T} R_{i}$ ,

c) the log returns $R_{i} = \log (S_{i}) - \log (S_{i - 1}), i = 1, \dots, T$ are i.i.d. as $R \sim G A L (β)$ with mgf $M_{β} (s)$ .

We also assume that $T$ is a positive integer.

For a European call option with initial price $S_{0}$ , strike price $K$ and interest rate $r$ , the price of the option is $e^{- r T} E ({(S_{T} - K)}_{+})$ , where ${(S_{T} - K)}_{+} = \max (S_{T} - K, 0)$ and the expectation is taken under the risk neutral parameters. Therefore, it is possible to use simulated samples from a bilateral gamma distribution to obtain an estimate of $E ({(S_{T} - K)}_{+})$ and price the option.
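Under the Esscher measure the one-period mgf becomes $M (s + h^{*}) / M (h^{*})$ . With the assumed Kotz et al. mgf form this is again GAL (our derivation): $θ$ and $τ$ are unchanged, $μ^{*} = (μ + σ^{2} h^{*}) / b$ and $σ^{*} = σ / \sqrt{b}$ , where $b = 1 - μ h^{*} - σ^{2} {h^{*}}^{2} / 2$ . A sum of $T$ i.i.d. GAL variates is GAL with $θ$ and $τ$ multiplied by $T$ . The sketch samples $X_{T}$ through the normal variance-mean mixture instead of as a difference of gamma variates; both give the same distribution. Here `h_star` is assumed to solve the equation $r = \log M (h + 1) - \log M (h)$ above, and all names and defaults are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def risk_neutral_gal(beta: Sequence[float], h: float) -> Tuple[float, float, float, float]:
    """Esscher-transformed parameters: M_Q(s) = M(s + h) / M(h) is again GAL."""
    theta, mu, sigma, tau = beta
    b = 1.0 - mu * h - 0.5 * sigma ** 2 * h ** 2
    if b <= 0.0:
        raise ValueError("h lies outside the domain of the mgf")
    return theta, (mu + sigma ** 2 * h) / b, sigma / math.sqrt(b), tau


def mc_call_price(beta: Sequence[float], h_star: float, r: float, T: int,
                  s0: float, strike: float, n_paths: int = 100_000,
                  seed: int = 0) -> float:
    """Monte Carlo estimate of exp(-rT) E_Q[(S_T - K)_+] with S_T = S_0 exp(X_T)."""
    theta_q, mu_q, sigma_q, tau_q = risk_neutral_gal(beta, h_star)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        # X_T ~ GAL(T*theta, mu, sigma, T*tau) under the risk neutral parameters
        g = rng.gammavariate(T * tau_q, 1.0)
        x_t = T * theta_q + mu_q * g + sigma_q * math.sqrt(g) * rng.gauss(0.0, 1.0)
        total += max(s0 * math.exp(x_t) - strike, 0.0)
    return math.exp(-r * T) * total / n_paths
```

By construction $E_{Q} (e^{R}) = M (h^{*} + 1) / M (h^{*}) = e^{r}$ , so with a zero strike the estimated price should be close to $S_{0}$ , a useful check of both $h^{*}$ and the sampler.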

Seneta ( [4] , pp. 182-184) has illustrated the use of the GAL family, with moment and ML methods, to analyze historical data from the Dow Jones Industrial Average and other indexes. QD methods can be considered as alternative methods for analyzing such financial data.

Besides option pricing, risk measures are used in finance and actuarial science. These measures depend on the underlying distribution, which is specified by a set of parameters. We briefly discuss these notions below. The inference techniques can also be applied to estimate the parameters from historical data and to quantify the level of risk incurred.

6.2. VaR, CVaR, EVaR Using the GAL Distribution

The Value at Risk at confidence level $1 - α$ of a continuous loss random variable $X$ with distribution function $F (x)$ and density function $f (x)$ is defined as the quantile

$V a R_{1 - α} (X) = F^{- 1} (1 - α)$

of the loss $X = - R$ , so that $P (X > V a R_{1 - α} (X)) = α$ is the probability that the loss incurred by the holder of a financial asset over one unit of time exceeds $V a R_{1 - α} (X)$ . The conditional value at risk is

$C V a R_{1 - α} (X) = \frac{1}{α} \int_{V a R_{1 - α} (X)}^{\infty} x f (x) d x$ ;

see Rockafellar and Uryasev [32] for this measure of risk. If the log return random variable $R$ follows a $G A L (θ, σ, μ, τ)$ distribution, i.e., $R \sim G A L (θ, σ, μ, τ)$ , then the loss random variable is $X \sim G A L (- θ, σ, - μ, τ)$ .
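Since the GAL distribution function involves the Bessel function in general, a Monte Carlo sketch is one practical route (our choice, not the paper's): simulate losses $X = - R$ and take empirical tail statistics. With $k = \lfloor α n \rfloor$ , VaR is estimated by the $(k + 1)$ -th largest loss and CVaR by the mean of the $k$ largest losses. The code orders parameters as $(θ, μ, σ, τ)$ , unlike the $(θ, σ, μ, τ)$ order used in this subsection, and samples $R$ through the normal variance-mean mixture under the assumed Kotz et al. mgf; names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def empirical_var_cvar(losses: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Empirical (VaR_{1-alpha}, CVaR_{1-alpha}) of a sample of losses."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    xs = sorted(losses, reverse=True)
    k = max(1, math.floor(alpha * len(xs) + 1e-9))  # size of the tail
    k = min(k, len(xs) - 1)
    return xs[k], sum(xs[:k]) / k


def simulate_gal_losses(beta: Sequence[float], n: int, seed: int = 0) -> List[float]:
    """Losses X = -R for R ~ GAL(theta, mu, sigma, tau) via the normal variance-mean mixture."""
    theta, mu, sigma, tau = beta
    rng = random.Random(seed)
    losses = []
    for _ in range(n):
        g = rng.gammavariate(tau, 1.0)
        r = theta + mu * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0)
        losses.append(-r)
    return losses
```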

Ahmadi-Javid [33] proposed a coherent measure of risk, the entropic value-at-risk (EVaR), using the Chernoff bound; see the seminal paper by Chernoff [34] for the bound. EVaR is defined implicitly through the moment generating function $M (z)$ of the loss. Since the moment generating function of the GAL distribution is relatively simple and does not involve the Bessel function, EVaR can also be computed easily. For more discussion of estimation and risk measures, see Toma and Dedu [35] .
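Following Ahmadi-Javid [33] , $E V a R_{1 - α} (X) = \inf_{z > 0} \{ z^{- 1} \log (M_{X} (z) / α) \}$ . The objective is convex in $1 / z$ (it is a perspective of the convex cumulant generating function), hence unimodal in $z$ , so a golden-section search over $(0, z_{max})$ suffices, where $z_{max}$ is the right end of the domain of $M_{X}$ . Under the assumed Kotz et al. form, $M_{X} (z) = e^{- θ z} {(1 + μ z - σ^{2} z^{2} / 2)}^{- τ}$ and $z_{max} = (μ + \sqrt{μ^{2} + 2 σ^{2}}) / σ^{2}$ . A sketch with hypothetical names; for a normal loss it should reproduce the closed form $m + s \sqrt{2 \log (1 / α)}$ .

```python
import math
from typing import Callable, Sequence


def evar(cgf: Callable[[float], float], alpha: float, z_max: float,
         iters: int = 200) -> float:
    """EVaR_{1-alpha}(X) = inf_{z > 0} (K_X(z) + log(1/alpha)) / z, by golden-section search."""
    def f(z: float) -> float:
        return (cgf(z) - math.log(alpha)) / z

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = z_max * 1e-12, z_max * (1.0 - 1e-12)
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return min(fc, fd)


def gal_loss_cgf(z: float, beta: Sequence[float]) -> float:
    """cgf of the loss X = -R, R ~ GAL(theta, mu, sigma, tau) (ASSUMED Kotz et al. mgf)."""
    theta, mu, sigma, tau = beta
    u = mu * z - 0.5 * sigma ** 2 * z ** 2
    return math.inf if u <= -1.0 else -theta * z - tau * math.log1p(u)


def gal_loss_z_max(beta: Sequence[float]) -> float:
    """Right end of the domain of M_X: positive root of 1 + mu*z - sigma^2 z^2 / 2."""
    theta, mu, sigma, tau = beta
    return (mu + math.sqrt(mu * mu + 2.0 * sigma * sigma)) / sigma ** 2
```

Usage: `evar(lambda z: gal_loss_cgf(z, beta), 0.01, gal_loss_z_max(beta))` for a fitted `beta`.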

7. Conclusion

As we have seen, in finite samples ML methods only offer good estimators for two of the four parameters of the GAL family. Asymptotic normality can only be guaranteed for the AL family, and the lack of a covariance matrix in closed form prevents hypothesis testing for the GAL family. Because of these restrictions, QD methods are developed as complementary methods to ML methods. The methods appear to be suitable for estimation and for parameter testing. They also produce a criterion function which, when evaluated at the QD estimates, gives a chi-square goodness-of-fit test statistic for the GAL model. The criterion function can also be used to select a starting vector close to the vector of QD estimators to start a numerical search algorithm. These last two features are not shared directly by ML methods and appear to be useful in applications.

Acknowledgements

The helpful and constructive comments of the referees, which led to an improvement of the presentation of the paper, and the support of the editorial staff of Open Journal of Statistics in processing the paper are gratefully acknowledged.

Cite this paper

Luong, A. (2017) Likelihood and Quadratic Distance Methods for the Generalized Asymmetric Laplace Distribution for Financial Data. Open Journal of Statistics, 7, 347-368. https://doi.org/10.4236/ojs.2017.72025

References

1. Kotz, S., Kozubowski, T.J. and Podgorski, K. (2001) The Laplace Distribution and Generalizations. Birkhauser, Boston.
https://doi.org/10.1007/978-1-4612-0173-1

2. Madan, D.B. and Seneta, E. (1990) The Variance Gamma (V.G.) Model for Share Market Returns. Journal of Business, 63, 511-524.
https://doi.org/10.1086/296519

3. Madan, D.B., Carr, P. and Chang, E.C. (1998) The Variance Gamma Process and Option Pricing. European Finance Review, 2, 79-105.

4. Seneta, E. (2004) Fitting the Variance Gamma Model to Financial Data. Journal of Applied Probability, 41, 177-187.
https://doi.org/10.1017/S0021900200112288

5. Podgorski, K. and Wegener, J. (2011) Estimation for Stochastic Models Driven by Laplace Motion. Communications in Statistics, Theory and Methods, 40, 3281-3302.
https://doi.org/10.1080/03610926.2010.499051

6. McNeil, A.J., Frey, R. and Embrechts, P. (2005) Quantitative Risk Management. Princeton University Press, Princeton.

7. Protassov, R.S. (2004) EM-Based Maximum Likelihood Parameter Estimation for Multivariate Generalized Hyperbolic Distributions with Fixed λ. Statistics and Computing, 14, 67-77.
https://doi.org/10.1023/B:STCO.0000009419.12588.da

8. Hu, W. (2005) Calibration of Multivariate Generalized Hyperbolic Distributions Using the EM Algorithm with Applications in Risk Management, Portfolio Optimization and Portfolio Credit Risk. Unpublished PhD Thesis, Department of Mathematics, The Florida State University, Tallahassee.

9. Louis, T.A. (1982) Finding the Observed Information Matrix When Using the EM Algorithm. Journal of the Royal Statistical Society Series B, 44, 226-233.

10. McLachlan, G.J. and Krishnan, T. (2008) The EM Algorithm and Extensions. 2nd Edition, Wiley, New York.
https://doi.org/10.1002/9780470191613

11. Küchler, U. and Tappe, S. (2008) Bilateral Gamma Distributions and Processes in Financial Mathematics. Stochastic Processes and Their Applications, 118, 261-283.

12. Gerber, H.U. and Shiu, E.S.W. (1994) Option Pricing by Esscher Transforms. Transactions of the Society of Actuaries, 46, 99-191.

13. Miyahara, Y. (2012) Option Pricing in Incomplete Markets: Modeling Based on Geometric Lévy Processes and Minimal Entropy Martingale Measures. Imperial College Press, London.

14. Bierlaire, M. (2006) Introduction à l’optimisation différentiable. Presses Polytechniques et Universitaires Romandes, Lausanne.

15. Huber, P. (1981) Robust Statistics. Wiley, New York.
https://doi.org/10.1002/0471725250

16. Kotz, S., Kozubowski, T.J. and Podgorski, K. (2002) Maximum Likelihood Estimation of Asymmetric Laplace Parameters. Annals of the Institute of Statistical Mathematics, 54, 816-826.
https://doi.org/10.1023/A:1022467519537

17. Huber, P. (1967) The Behaviour of Maximum Likelihood Estimates under Nonstandard Conditions. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley.

18. Amemiya, T. (1985) Advanced Econometrics. Harvard University Press, Cambridge.

19. Newey, W.K. and McFadden, D. (1994) Large Sample Estimation and Hypothesis Testing. In: Engle, R.F. and McFadden, D., Eds., Handbook of Econometrics, Vol. 4, North Holland, Amsterdam.

20. Broniatowski, M., Toma, A. and Vajda, I. (2012) Decomposable Pseudodistances and Application in Statistical Estimation. Journal of Statistical Planning and Inference, 142, 2574-2585.

21. Hogg, R., McKean, J.W. and Craig, A.T. (2013) Introduction to Mathematical Statistics. 7th Edition, Pearson, New York.

22. Hinkley, D.V. and Revankar, N.S. (1977) Estimation of the Pareto Law from Underreported Data: A Further Analysis. Journal of Econometrics, 5, 1-11.

23. Rudin, W. (1976) Principles of Mathematical Analysis. McGraw Hill, New York.

24. Wooldridge, J.M. (2010) Econometric Analysis of Cross Section and Panel Data. 2nd Edition, MIT Press, Cambridge.

25. Lehmann, E.L. (1999) Elements of Large Sample Theory. Springer, New York.
https://doi.org/10.1007/b98855

26. Luong, A. and Thompson, M.E. (1987) Minimum Distance Methods Based on Quadratic Distances for Transforms. Canadian Journal of Statistics, 15, 239-251.
https://doi.org/10.2307/3314914

27. Theil, H. (1971) Principles of Econometrics. Wiley, New York.

28. Harville, D.A. (1997) Matrix Algebra from a Statistician’s Perspective. Springer, New York.
https://doi.org/10.1007/b98818

29. Fieller, N. (2016) Basics of Matrix Algebra with R. Chapman and Hall, New York.

30. Andrews, D.W.K. (1997) A Stopping Rule for the Computation of the Generalized Method of Moments Estimators. Econometrica, 65, 913-931.
https://doi.org/10.2307/2171944

31. Schoutens, W. (2003) Lévy Processes in Finance: Pricing Financial Derivatives. Wiley, New York.
https://doi.org/10.1002/0470870230

32. Rockafellar, R.T. and Uryasev, S. (2002) Conditional Value-at-Risk for General Loss Distributions. Journal of Banking and Finance, 26, 1443-1471.

33. Ahmadi-Javid, A. (2012) Entropic Value-at-Risk: A New Coherent Risk Measure. Journal of Optimization Theory and Applications, 155, 1105-1123.
https://doi.org/10.1007/s10957-011-9968-2

34. Chernoff, H. (1952) A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on a Sum of Observations. Annals of Mathematical Statistics, 23, 497-507.
https://doi.org/10.1214/aoms/1177729330

35. Toma, A. and Dedu, S. (2014) Quantitative Techniques for Financial Risk Assessment: A Comparative Approach Using Different Risk Measures and Estimation Methods. Procedia Economics and Finance, 8, 712-719.
