Shrinkage Estimation in the Random Parameters Logit Model

doi:10.4236/ojs.2016.64056

Open Journal of Statistics
Vol. 06 No. 04 (2016), Article ID: 69976, 8 pages

Tong Zeng¹, R. Carter Hill²


¹Department of Applied Business Sciences and Economics, University of La Verne, La Verne, CA, USA

²Department of Economics, Louisiana State University, Baton Rouge, LA, USA

This work is licensed under the Creative Commons Attribution International License (CC BY).

http://creativecommons.org/licenses/by/4.0/

Received 10 May 2016; accepted 20 August 2016; published 23 August 2016

ABSTRACT

In this paper, we explore the properties of a positive-part Stein-like estimator, which is a stochastically weighted convex combination of a fully correlated parameter model estimator and an uncorrelated parameter model estimator, in the Random Parameters Logit (RPL) model. The results of our Monte Carlo experiments show that the positive-part Stein-like estimator provides smaller MSE than the pretest estimator in the fully correlated RPL model. Both outperform the fully correlated RPL model estimator and provide more accurate information on the share of the population putting a positive or negative value on the alternative attributes than the fully correlated RPL model estimates. The Monte Carlo mean estimates of direct elasticity with the pretest and positive-part Stein-like estimators are closer to the true value and have smaller standard errors than those with the fully correlated RPL model estimator.

Keywords:

Pretest Estimator, Stein-Rule Estimator, Positive-Part Stein-Like Estimator, Likelihood Ratio Test, Random Parameters Logit Model

1. Introduction

The random parameters logit (RPL) model is a generalization of the conditional logit model for multinomial choices. The conditional logit model is derived from the assumption that the errors in the underlying random utility functions for each choice alternative are statistically independent and identically distributed (iid) extreme value type I. This leads to the property known as the Independence of Irrelevant Alternatives (IIA): the ratio of the choice probabilities of any two alternatives remains constant no matter how many other choices there are. This is widely regarded as a very restrictive assumption.

The key feature of the RPL model is that response parameters can vary randomly, following a chosen distribution, across the population from which samples are drawn. The random coefficients capture individual heterogeneity and the model does not suffer from the independence of irrelevant alternatives assumption. The random coefficients can be correlated in the RPL model as generally expected in reality, because the unobservable preference of each individual is used to evaluate the attributes of all alternatives in each choice situation. Estimation is by maximum simulated likelihood (MSL), which is described by [1] .

In this paper we explore a problem that can exist in any correlated random parameters model. Let $y_i$ be an observable outcome variable from a density $f(y_i \mid x_i, \beta_i)$, where $x_i$ is a vector of K explanatory variables and $\beta_i$ are random parameters with mean $\bar\beta$ and covariance matrix $\Sigma$. Using MSL we estimate the population parameters $\bar\beta$ and $\Sigma$. Allowing the random parameters to be correlated introduces potentially many new parameters, the covariance terms, that are difficult to estimate.

Most applied researchers will test the significance of the covariance parameters before deciding to rely on the fully correlated random parameter model instead of the model in which the parameters are random but uncorrelated, so that the covariance matrix $\Sigma$ is diagonal. We explore whether a pretesting strategy improves postestimation inference. We also explore the use of a Stein-like shrinkage estimator as an alternative to pretesting. This estimator shrinks the estimates from the fully correlated parameter model towards the estimates of the uncorrelated random parameter model. In numerical experiments using the RPL model we find that both the pretest estimator and the shrinkage estimators have improved mean squared error (MSE) relative to the MSL estimator of the fully correlated parameter model. Last, we analyze the share of the population putting a positive or negative value on the alternative attributes, and the Monte Carlo mean estimates of direct elasticity, using the fully correlated RPL model estimates and the pretest and shrinkage estimates. Based on our Monte Carlo experiment results, the pretest and shrinkage estimates provide more accurate estimates of both quantities than the fully correlated RPL model estimates.

2. The Random Parameters Logit Model

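The RPL model is described in [2]. Consider individual n facing M alternatives. The random utility associated with alternative i is $U_{ni} = \beta_n' x_{ni} + \varepsilon_{ni}$, where $x_{ni}$ are K observed explanatory variables for alternative i and $\varepsilon_{ni}$ is an iid type I extreme value error which is independent of $\beta_n$ and $x_{ni}$. The random coefficients $\beta_n$ can be regarded as being composed of a mean $\bar\beta$ and deviations $v_n$, so that $\beta_n = \bar\beta + v_n$. The RPL model decomposes the unobserved part of the utility into the extreme value term $\varepsilon_{ni}$ and the random part $v_n' x_{ni}$. Conditional on $\beta_n$, the probability that individual n chooses alternative i is of the usual logistic form, $L_{ni}(\beta_n) = \exp(\beta_n' x_{ni}) / \sum_{j=1}^{M} \exp(\beta_n' x_{nj})$. Assume that $\beta_n$ is multivariate normal¹ with mean vector $\bar\beta$ and covariance matrix $\Sigma$ with elements $\sigma_{ij}$. Denoting the MVN density $f(\beta \mid \theta)$, where $\theta$ contains the unknown mean and covariance parameters, the probability that individual n chooses alternative i is

$$P_{ni} = \int L_{ni}(\beta)\, f(\beta \mid \theta)\, d\beta \qquad (1)$$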
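The integral in (1) has no closed form, which is why estimation relies on simulation. A minimal Python sketch of the idea follows: draw coefficient vectors from the normal mixing distribution, evaluate the logit formula at each draw, and average. The function names, the pseudo-random (rather than Halton) draws and all numerical values are illustrative assumptions, not the authors' NLOGIT implementation.

```python
import math
import random

def logit_probs(beta, X):
    """Conditional logit probabilities for one individual.
    X holds one attribute vector per alternative."""
    v = [sum(b * x for b, x in zip(beta, row)) for row in X]
    vmax = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(val - vmax) for val in v]
    total = sum(expv)
    return [e / total for e in expv]

def simulated_probs(mean, chol, X, n_draws=1000, seed=0):
    """Average the logit probabilities over draws beta = mean + chol z, z ~ N(0, I)."""
    rng = random.Random(seed)
    K = len(mean)
    acc = [0.0] * len(X)
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(K)]
        beta = [mean[k] + sum(chol[k][j] * z[j] for j in range(k + 1))
                for k in range(K)]
        p = logit_probs(beta, X)
        acc = [a + pi for a, pi in zip(acc, p)]
    return [a / n_draws for a in acc]

# Hypothetical example: K = 2 coefficients, M = 3 alternatives.
X = [[1.0, 0.5], [0.2, 1.5], [0.8, 0.8]]
mean = [1.0, -0.5]
chol = [[1.0, 0.0], [0.5, 0.8]]  # lower-triangular factor of the covariance
probs = simulated_probs(mean, chol, X)
```

Because each draw yields probabilities that sum to one, the simulated probabilities also sum to one across alternatives.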

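For estimation purposes we use Cholesky's decomposition and write the coefficient covariance matrix as $\Sigma = AA'$, where A is lower triangular. The parameter means and the elements of A are the objects of estimation. The parameters of the fully correlated RPL model (FCRPLM) are

$$\theta = (\bar\beta_1, \ldots, \bar\beta_K, a_{11}, \ldots, a_{KK}, a_{21}, \ldots, a_{K,K-1})' \qquad (2)$$

where $a_{kk}$ are the diagonal elements of A and $a_{ij}$, $i > j$, are the elements below the diagonal. If the random coefficients in the RPL model are uncorrelated, denoted UCRPLM, then $\theta$ is

$$\theta = (\bar\beta_1, \ldots, \bar\beta_K, a_{11}, \ldots, a_{KK})' \qquad (3)$$

where $a_{ij} = 0$ for $i > j$.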
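To make the parameter count concrete, the following Python sketch assembles a lower-triangular Cholesky factor from its diagonal and below-diagonal elements and forms the implied covariance matrix. The helper names and the numbers are hypothetical; the point is only that setting the K(K-1)/2 below-diagonal elements to zero yields a diagonal covariance matrix, which is the uncorrelated model.

```python
def build_cholesky(diag, below):
    """Assemble lower-triangular A from K diagonal elements and
    K(K-1)/2 below-diagonal elements listed row by row."""
    K = len(diag)
    if len(below) != K * (K - 1) // 2:
        raise ValueError("need K(K-1)/2 below-diagonal elements")
    A = [[0.0] * K for _ in range(K)]
    it = iter(below)
    for i in range(K):
        for j in range(i):
            A[i][j] = next(it)
        A[i][i] = diag[i]
    return A

def covariance(A):
    """Covariance matrix implied by the factor: A times its transpose."""
    K = len(A)
    return [[sum(A[i][k] * A[j][k] for k in range(K)) for j in range(K)]
            for i in range(K)]

# Hypothetical values for K = 3 random coefficients.
A_full = build_cholesky([1.0, 0.8, 0.6], [0.5, 0.3, 0.2])  # correlated model
A_diag = build_cholesky([1.0, 0.8, 0.6], [0.0, 0.0, 0.0])  # uncorrelated model
sigma_full = covariance(A_full)
sigma_diag = covariance(A_diag)
```

Estimating the elements of A rather than the covariance matrix itself guarantees that the implied covariance matrix is positive semi-definite for any parameter values.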

3. Stein-Like Shrinkage Estimation

Stein-rule estimators follow the work of [3] and [4] and combine sample information with non-sample information in a way that improves the precision of the estimation process and the quality of subsequent predictions. The Stein-rule estimator is a weighted average of the restricted and unrestricted estimators, the weight being a function of the magnitude of the test statistic used to test the restrictions.

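Following is the Stein-rule estimator, which dominates the maximum likelihood estimator (MLE) in linear regression under weighted quadratic loss with weight matrix W. Consider the model $y = X\beta + e$, where y is a $T \times 1$ random vector, X is a $T \times K$ matrix of rank K and e is a $T \times 1$ vector of random disturbances distributed as $N(0, \sigma^2 I_T)$. If $R\beta = r$ represents a set of J independent linear restrictions on $\beta$, the Stein-rule estimator that combines sample and non-sample information is:

$$\delta(b, s) = b^* + \left[1 - \frac{a\,s}{(Rb - r)'[R S^{-1} R']^{-1}(Rb - r)}\right](b - b^*) \qquad (4)$$

where $b = S^{-1}X'y$ is the unrestricted MLE and $b^* = b - S^{-1}R'[R S^{-1} R']^{-1}(Rb - r)$ is the restricted estimator, obtained by minimizing the sum of squared errors subject to the set of restrictions $R\beta = r$. Here $S = X'X$ with $s = (y - Xb)'(y - Xb)$.

Sufficient conditions for minimaxity, meaning that the estimator minimizes the maximum risk over the entire parameter space, are $J \geq 3$ restrictions and the scalar a chosen to lie within the interval:

$$0 \leq a \leq \frac{2}{T - K + 2}\left(\lambda_L^{-1}\,\mathrm{tr}\{[R S^{-1} R']^{-1} R S^{-1} W S^{-1} R'\} - 2\right) \qquad (5)$$

where $\lambda_L$ is the largest characteristic root of the matrix in braces. The estimator can be written as

$$\delta = b^* + \left(1 - \frac{c}{u}\right)(b - b^*) \qquad (6)$$

where $u = (Rb - r)'[R S^{-1} R']^{-1}(Rb - r) / (J\hat\sigma^2)$, with $\hat\sigma^2 = s/(T - K)$, is the test statistic for the hypothesis $R\beta = r$, and $c = a(T - K)/J$. If the data support the non-sample information then u will be small and a relatively large weight is placed on the restricted estimator. Conversely, if the data do not support the imposed restrictions, u will be large and the unrestricted estimator is more heavily weighted. When $u < c$, the weight $1 - c/u$ is negative, so the Stein estimator reverses the sign of the difference $b - b^*$; that is, the unrestricted estimate is shrunk beyond the hypothesis vector. The problem is resolved by the use of the "positive rule" estimator, which preserves the sign of the estimates and dominates the Stein-rule estimator over the entire parameter space.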

The positive-part Stein-like estimator is a stochastically weighted convex combination of the MLE from an unrestricted model and a restricted MLE subject to J constraints. In our case the unrestricted MLE comes from the FCRPLM estimates and the restricted MLE from the UCRPLM estimates:

$$\hat\theta^{+} = \hat\theta_{UC} + \left(1 - \frac{a}{u}\right) I_{[a,\infty)}(u)\,(\hat\theta_{FC} - \hat\theta_{UC}) \qquad (7)$$

where $\hat\theta_{FC}$ and $\hat\theta_{UC}$ are the FCRPLM and UCRPLM estimates and $I_{[a,\infty)}(u)$ is the indicator function, equal to one when $u \geq a$ and zero otherwise, of a test statistic u for the null hypothesis that the coefficient covariance matrix is diagonal, or equivalently that the Cholesky elements in A below the diagonal are zero. The scalar a controls the amount of shrinkage towards the UCRPLM estimates. The shrinkage estimator becomes the UCRPLM estimator when the test statistic u is less than the value of a. The larger the value of a, the more weight is given to the UCRPLM estimates. [5] show that if the number of constraints $J > 2$, then under information weighted quadratic loss the risk of the shrinkage estimator is smaller than the risk of the unrestricted maximum likelihood estimator for any a in the interval $[0, 2(J - 2)]$. Common choices for the shrinkage constant are and. In our case $J = K(K - 1)/2$ is the number of covariance terms constrained to zero when obtaining the UCRPLM estimates.
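The positive-part rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the argument layout (the UCRPLM vector padded with zeros for the covariance terms so both vectors have equal length) and the numbers are hypothetical, and the weight is taken to be max(0, 1 - a/u), which matches the statement that the estimator collapses to the UCRPLM estimates when u is less than a.

```python
def positive_part_stein(theta_full, theta_restricted, u, a):
    """Shrink the unrestricted estimates towards the restricted ones.

    theta_full       : estimates from the fully correlated model
    theta_restricted : estimates from the uncorrelated model, with zeros
                       in the positions of the restricted covariance terms
    u                : test statistic for the restrictions
    a                : shrinkage constant (non-negative)
    """
    if a < 0:
        raise ValueError("shrinkage constant must be non-negative")
    # Weight on the unrestricted estimates: 1 - a/u when u exceeds a, else 0.
    weight = 1.0 - a / u if u > a else 0.0
    return [r + weight * (f - r) for f, r in zip(theta_full, theta_restricted)]

# Hypothetical estimates: one mean, one diagonal element, one covariance term.
theta_fc = [1.0, 0.5, 0.3]
theta_uc = [0.8, 0.6, 0.0]

half_way = positive_part_stein(theta_fc, theta_uc, u=8.0, a=4.0)   # weight 0.5
collapsed = positive_part_stein(theta_fc, theta_uc, u=3.0, a=4.0)  # weight 0
```

With u = 8 and a = 4 the result lies halfway between the two sets of estimates; with u = 3 the restricted estimates are returned unchanged.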

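With test statistic u, the pretest estimator is:

$$\hat\theta_{PT} = I_{[0,c)}(u)\,\hat\theta_{UC} + I_{[c,\infty)}(u)\,\hat\theta_{FC} \qquad (8)$$

where $\hat\theta_{FC}$ and $\hat\theta_{UC}$ are the FCRPLM and UCRPLM estimates and c is the critical value of the chi-square distribution with J degrees of freedom and significance level $\alpha$. With the given degrees of freedom, the critical value is determined by the level of test significance $\alpha$, which is between 0 and 1. When $\alpha = 0$, the critical value is infinite, the null hypothesis is never rejected, and the pretest estimator becomes the UCRPLM estimator. When $\alpha = 1$, the critical value is zero and the pretest estimator is the FCRPLM estimator.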
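The pretest rule is an all-or-nothing choice between the two sets of estimates, which the following Python sketch makes explicit. It assumes a likelihood ratio statistic computed from two maximized log-likelihoods and J = 6 restrictions (the number of covariance terms for four random coefficients); the function names and log-likelihood values are hypothetical, and the critical values are the standard tabulated chi-square upper-tail points for 6 degrees of freedom.

```python
def lr_statistic(loglik_full, loglik_restricted):
    """Likelihood ratio statistic: twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_full - loglik_restricted)

def pretest(theta_full, theta_restricted, u, critical_value):
    """Keep the unrestricted estimates only if the restrictions are rejected."""
    if u >= critical_value:
        return list(theta_full)
    return list(theta_restricted)

# Tabulated chi-square critical values for 6 degrees of freedom,
# keyed by significance level.
CHI2_CRIT_DF6 = {0.25: 7.841, 0.05: 12.592, 0.01: 16.812}

# Hypothetical maximized log-likelihoods from one Monte Carlo sample.
u = lr_statistic(loglik_full=-250.0, loglik_restricted=-257.5)  # 15.0

theta_fc = [1.0, 0.5, 0.3]
theta_uc = [0.8, 0.6, 0.0]
at_5_percent = pretest(theta_fc, theta_uc, u, CHI2_CRIT_DF6[0.05])
at_1_percent = pretest(theta_fc, theta_uc, u, CHI2_CRIT_DF6[0.01])
```

In this example the restrictions are rejected at the 5% level but not at the 1% level, so a smaller significance level selects the uncorrelated model estimates.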

4. Monte Carlo Experiments

4.1. Design

In our experiments the number of choice alternatives is and the number of individuals is. Each individual is assumed to be observed once. The four explanatory variables for each individual and each alternative are generated from independent log-normal distributions. The coefficients for each individual are generated from multivariate normal distribution, with,. The variance of each random coefficient is,. The covariance elements,. The correlation takes the values 0, 0.2, 0.4, 0.6, 0.8. The values of and are held fixed over the Monte Carlo samples in each experiment. The choice probability for each individual is generated with the logit-smoothed accept-reject simulator suggested by [6] .

Our simulation and RPL model estimation were carried out in NLOGIT 5.0. Based on our Monte Carlo experiment results and on [7] and [8], we use 100 Halton draws to simulate choice probabilities during MSL estimation. The positive-part Stein-like and pretest estimators were calculated based on the likelihood ratio (LR), Lagrange multiplier (LM) and Wald test statistics with 25%, 5% and 1% significance levels. Because the empirical percentile values of the LR test are closer to the related critical values than those of the LM and Wald tests, we only provide the results based on the LR test statistic. Using Monte Carlo experiments to study the RPL model, especially with correlated parameters, is numerically challenging. Key elements that are worth mentioning are: 1) for the uncorrelated parameter model, conditional logit estimates were used as starting values; 2) for the correlated parameter model, the estimates from 1) were used as starting values; 3) samples for which convergence was not achieved were discarded; only 0.3% of the samples in our Monte Carlo experiments failed to converge.
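For readers unfamiliar with Halton draws, the Python sketch below generates a Halton sequence by the radical-inverse construction and maps it to standard normal draws through the inverse normal CDF. It is a generic illustration, not NLOGIT's internal routine; the choice of the primes 2, 3, 5 and 7 for four coefficients and the absence of any scrambling or discarded initial elements are assumptions.

```python
from statistics import NormalDist

def halton(index, base):
    """Radical inverse of a positive integer index in a prime base."""
    result, f = 0.0, 1.0 / base
    i = index
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result

def halton_draws(n_draws, bases=(2, 3, 5, 7)):
    """Return n_draws points in the open unit cube, one prime base per dimension."""
    return [tuple(halton(i, b) for b in bases) for i in range(1, n_draws + 1)]

def halton_normal_draws(n_draws, bases=(2, 3, 5, 7)):
    """Transform Halton points to standard normal draws via the inverse CDF."""
    inv = NormalDist().inv_cdf
    return [tuple(inv(u) for u in point) for point in halton_draws(n_draws, bases)]

uniform_points = halton_draws(100)
normal_points = halton_normal_draws(100)
```

Halton points cover the unit interval more evenly than pseudo-random draws, which is the usual motivation for using a modest number of them in simulated likelihood estimation.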

4.2. Results

To study how the pretest and shrinkage estimators reduce the estimation risk of the FCRPLM estimators, we calculate the MSEs of the estimated parameter means, variances and covariances with the pretest, shrinkage and FCRPLM estimators respectively. First, we compare the MSEs of the fully correlated estimators with those of the UCRPLM estimators, where MSE is the Monte Carlo average of the squared error loss. In Table 1, the MSEs of the UCRPLM estimators are all smaller than those of the FCRPLM estimators. The risk of the estimated parameter means with the FCRPLM is more than twice that of the UCRPLM. The MSEs of the estimated variances with the UCRPLM are about 25% of those with the FCRPLM. With nonzero correlation, the MSEs of the estimated covariance parameters based on the FCRPLM are much bigger than those based on the UCRPLM. When the correlation and 0.4, the ratios of MSEs of estimated covariance elements are relatively smaller compared to the results for higher correlations. This implies that when the specification error is small, the FCRPLM, which is the correct model, has a much larger relative MSE for parameter covariance elements than the UCRPLM.
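The MSE ratios reported in the tables can be computed as in the following Python sketch, which averages the squared error loss over Monte Carlo replications for each estimator and divides one average by the other. The helper names and the three toy replications are hypothetical and only show the arithmetic.

```python
def mse(estimates, truth):
    """Monte Carlo average of the squared error loss.
    estimates: one estimated parameter vector per replication."""
    total = 0.0
    for est in estimates:
        total += sum((e - t) ** 2 for e, t in zip(est, truth))
    return total / len(estimates)

def mse_ratio(estimates_a, estimates_b, truth):
    """Ratio below one means estimator A has smaller risk than estimator B."""
    return mse(estimates_a, truth) / mse(estimates_b, truth)

# Hypothetical estimates of two parameters over three replications.
truth = [1.0, 1.0]
uc_estimates = [[0.9, 1.0], [1.1, 1.0], [1.0, 1.0]]
fc_estimates = [[0.7, 1.2], [1.3, 0.8], [1.2, 1.0]]
ratio = mse_ratio(uc_estimates, fc_estimates, truth)  # about 0.067
```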

Table 1. The ratios of uncorrelated RPL model estimator MSE to the FCRPLM estimator MSE.

In Table 2, we compare the MSEs of the LR-based pretest and shrinkage estimators to those of the FCRPLM estimators. All Table 2 ratios are less than one, so the pretest and shrinkage estimators all perform better than the FCRPLM estimators. With a smaller level of test significance, the UCRPLM estimator is more frequently chosen as the pretest estimator and the pretest estimator has smaller MSE. However, compared to the shrinkage estimators, the LR based pretest estimators with have larger MSEs than the shrinkage estimators with the shrinkage constant, especially for the estimated covariance elements, which have the smallest ratio values.

The covariance elements reveal important information about the joint effect of alternative attributes on people's decisions. If two random coefficients are highly positively correlated with each other, it means people are attracted and motivated by both of the related attributes. In our Monte Carlo experiments, the shrinkage estimators with a higher shrinkage constant a outperform estimators with less shrinkage and most of the pretest estimators.

Since one of the advantages of the RPL model is that it provides information on the share of the population that places a positive or negative value on the alternative attributes, we also calculate the joint probability that the coefficients of the first two attributes are less than zero. Table 3 shows the share of the population putting a negative value on the attributes. Comparing the results with UCRPLM and FCRPLM estimates, the joint probabilities with FCRPLM estimates are closer to the true value but have larger MSEs, except for the. From Table 3, the pretest and shrinkage estimates reduce the MSE of the joint probability estimator compared to the FCRPLM estimates. Even though the bias of the joint probability with pretest and shrinkage estimates is higher than with the UCRPLM and FCRPLM estimates, the difference is small in magnitude.
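The share of the population with negative coefficients follows directly from the estimated normal distribution of the coefficients. The Python sketch below shows the one-coefficient case in closed form and approximates the joint probability for two correlated coefficients by simulation. The function names, the simulation approach and the parameter values are illustrative assumptions; the paper does not state how the joint probability was evaluated.

```python
import math
import random
from statistics import NormalDist

def share_negative(mean, sd):
    """Share of the population with a negative coefficient, beta ~ N(mean, sd^2)."""
    return NormalDist(mean, sd).cdf(0.0)

def joint_share_negative(means, sds, rho, n_draws=100_000, seed=0):
    """Simulated probability that two correlated normal coefficients are both negative."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        b1 = means[0] + sds[0] * z1
        # Second coefficient built from a Cholesky factor of the 2 x 2 correlation matrix.
        b2 = means[1] + sds[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        if b1 < 0.0 and b2 < 0.0:
            count += 1
    return count / n_draws

# Hypothetical zero-mean coefficients with correlation 0.5.
correlated = joint_share_negative([0.0, 0.0], [1.0, 1.0], rho=0.5)
independent = share_negative(0.0, 1.0) ** 2  # what a diagonal covariance implies
```

For zero means and correlation 0.5 the exact joint probability is 1/3, whereas the product of the marginal shares is 1/4, which illustrates how ignoring the covariance terms can bias this quantity.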

To analyze the sensitivity of the RPL model in response to a change in the level of an alternative attribute, we calculate the mean estimates of direct elasticity with the true parameters (Table 4) and the Monte Carlo mean estimates of direct elasticity based on the pretest, positive-part Stein-like and FCRPLM estimates (Table 5). Since the pretest estimator with a smaller level of test significance has smaller MSE, we use the pretest estimator with the 1% significance level. The first explanatory variable in each alternative is chosen to calculate the related mean estimates of direct elasticity.

Table 2. The ratios of LR-based pretest and shrinkage estimator MSE to the FCRPLM estimator MSE.

Table 3. The share of the population putting a negative value on the first two attributes of each alternative,.

Note: [ ] provides the MSE results, { } provides bias results.

Table 4. The mean estimates of direct elasticity with true parameters.

Comparing the results in Table 4 to Table 5, we find that the results with FCRPLM estimates are all higher than the true value. When the, the results with pretest and shrinkage estimators are closer to the true value than those based on the FCRPLM estimators. The shrinkage estimators with the larger shrinkage constant have smaller bias of the Monte Carlo mean direct elasticity estimates than the pretest estimates and shrinkage estimates with smaller shrinkage constant. At the same time, the shrinkage and pretest estimators have smaller standard error of the Monte Carlo mean direct elasticity estimates than the FCRPLM estimates. Based on our Monte Carlo experiment results, the shrinkage and pretest estimates will give more reliable mean direct elasticity estimates than the FCRPLM estimates, especially with a larger shrinkage constant.
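As a point of reference for the elasticity comparisons, the Python sketch below simulates the usual own-attribute (direct) elasticity of a mixed logit choice probability: the attribute level divided by the simulated probability, times the average over coefficient draws of the coefficient multiplied by L(1 - L), where L is the logit probability at that draw. Whether the authors computed their estimates in exactly this way is an assumption, and the function names and values are hypothetical.

```python
import math
import random

def logit_probs(beta, X):
    """Conditional logit probabilities; X holds one attribute vector per alternative."""
    v = [sum(b * x for b, x in zip(beta, row)) for row in X]
    vmax = max(v)
    expv = [math.exp(val - vmax) for val in v]
    total = sum(expv)
    return [e / total for e in expv]

def direct_elasticity(mean, chol, X, alt, attr, n_draws=2000, seed=0):
    """Elasticity of the probability of alternative `alt` with respect to
    its own attribute `attr`, averaged over simulated coefficient draws."""
    rng = random.Random(seed)
    K = len(mean)
    p_sum = 0.0  # accumulates L, the simulated choice probability
    d_sum = 0.0  # accumulates beta * L * (1 - L), the simulated derivative term
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(K)]
        beta = [mean[k] + sum(chol[k][j] * z[j] for j in range(k + 1))
                for k in range(K)]
        L = logit_probs(beta, X)[alt]
        p_sum += L
        d_sum += beta[attr] * L * (1.0 - L)
    return X[alt][attr] * d_sum / p_sum

# Hypothetical example: two coefficients, three alternatives.
X = [[1.0, 0.5], [0.2, 1.5], [0.8, 0.8]]
mean = [1.0, -0.5]
chol = [[1.0, 0.0], [0.5, 0.8]]
elasticity = direct_elasticity(mean, chol, X, alt=0, attr=0)
```

When the Cholesky factor is zero, so that the coefficients are fixed, the expression reduces to the familiar conditional logit elasticity, the coefficient times the attribute level times one minus the choice probability.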

Table 5. The Monte Carlo mean estimates of direct elasticity based on pretest, shrinkage and FCRPLM estimates.

Note: ( ) provides the standard error results.

5. Conclusion

According to our Monte Carlo experiment results, the UCRPLM estimators have smaller estimation risk than the FCRPLM estimators. The pretest and positive-part Stein-like estimators both perform better than the FCRPLM estimators. The positive-part Stein-like estimators with higher shrinkage constant a outperform those with a smaller one and the pretest estimators. Shrinkage estimation reduces the risk of the FCRPLM estimators by shrinking the FCRPLM estimates towards the UCRPLM estimates. Providing the information on the share of population putting a negative or positive value on the alternative attributes is one of the advantages of the RPL model. When the random coefficients are correlated with each other, the FCRPLM estimator of this quantity has a smaller bias and slightly larger MSE than the UCRPLM estimator. Based on our Monte Carlo experiments, the pretest and shrinkage estimates can reduce the MSEs of the estimated results of share of the population putting a positive or negative value on alternative attributes as well. The Monte Carlo mean estimates of direct elasticity based on the pretest and shrinkage estimators with a larger shrinkage constant are closer to the true value with smaller standard errors than those based on the FCRPLM estimators.

Cite this paper

Tong Zeng, R. Carter Hill (2016) Shrinkage Estimation in the Random Parameters Logit Model. Open Journal of Statistics, 06, 667-674. doi: 10.4236/ojs.2016.64056

References

1. Greene, W.H. (2012) Econometric Analysis. Pearson Education, Inc., NJ.

2. Train, K.E. (2009) Discrete Choice Methods with Simulation. Cambridge University Press, Cambridge.
http://dx.doi.org/10.1017/CBO9780511805271

3. James, W. and Stein, C.M. (1961) Estimation with Quadratic Loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 361-379. University of California Press, Berkeley.

4. Stein, C.M. (1956) Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954-1955, I, 197-206. University of California Press, Berkeley.

5. Kim, M. and Hill, R.C. (1995) Shrinkage Estimation in Nonlinear Regression: the Box-Cox Transformation. Journal of Econometrics, 66, 1-33.
http://dx.doi.org/10.1016/0304-4076(94)01606-Z

6. McFadden, D. (1989) A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration. Econometrica, 57, 995-1026.
http://dx.doi.org/10.2307/1913621

7. Bhat, C.R. (2001) Quasi-Random Maximum Simulated Likelihood Estimation of the Mixed Multinomial Logit Model. Transportation Research Part B, 35, 677-693.
http://dx.doi.org/10.1016/S0191-2615(00)00014-X

8. Zeng, T. (2016) Using Halton Sequences in Random Parameters Logit Models. Journal of Statistical and Econometric Methods, 5, 59-86.

NOTES

¹Other choices are possible. See Train (2009, 136).
