
To overcome the well-known multicollinearity problem, we propose a new Stochastic Restricted Liu Estimator for the logistic regression model. The new estimator is compared, in the mean square error matrix sense, with the Maximum Likelihood Estimator, the Liu Estimator, the Stochastic Restricted Maximum Likelihood Estimator and others. Finally, a numerical example and a Monte Carlo simulation are given to illustrate some of the theoretical results.

Consider the following multiple logistic regression model:

$$y_i = \pi_i + \varepsilon_i, \quad i = 1, \cdots, n, \tag{1.1}$$

where $y_i$ follows a Bernoulli distribution with parameter $\pi_i$ given by

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}, \tag{1.2}$$

where $\beta$ is a $(p+1) \times 1$ vector of coefficients, $x_i$ is the $i$th row of $X$, an $n \times (p+1)$ data matrix with $p$ explanatory variables, and the $\varepsilon_i$ are independent with mean zero and variance $\pi_i(1-\pi_i)$. Maximum likelihood is the most commonly used method for estimating the parameters, and the Maximum Likelihood Estimator (MLE) is defined as

$$\hat\beta_{MLE} = C^{-1} X' \hat W Z, \tag{1.3}$$

where $C = X'\hat W X$, $\hat W = \mathrm{diag}[\hat\pi_i(1-\hat\pi_i)]$ and $Z$ is the column vector whose $i$th element equals $\log(\hat\pi_i) + \frac{y_i - \hat\pi_i}{\hat\pi_i(1-\hat\pi_i)}$. The MLE is an asymptotically unbiased estimator of $\beta$, and its covariance matrix is

$$\mathrm{Cov}(\hat\beta_{MLE}) = (X'\hat W X)^{-1} = C^{-1}. \tag{1.4}$$
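As an illustration of Equations (1.3) and (1.4), the following is a minimal sketch of how the MLE and the matrix C could be computed by iteratively reweighted least squares. It is not the authors' code. The function name `logistic_mle`, the synthetic data and the stopping rule are assumptions made for this example, and the first term of the working response is taken to be the linear predictor, as in standard iteratively reweighted least squares.

```python
import numpy as np

def logistic_mle(X, y, n_iter=50, tol=1e-10):
    """Iteratively reweighted least squares for logistic regression.

    Returns beta_hat and C = X' W_hat X evaluated at beta_hat.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                        # linear predictor x_i' beta
        pi = 1.0 / (1.0 + np.exp(-eta))       # fitted probabilities, Eq. (1.2)
        w = pi * (1.0 - pi)                   # diagonal of W_hat
        z = eta + (y - pi) / w                # working response (assumed form of Z)
        C = X.T @ (w[:, None] * X)            # C = X' W_hat X
        beta_new = np.linalg.solve(C, X.T @ (w * z))   # Eq. (1.3)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Recompute C at the final estimate so that Cov(beta_hat) = C^{-1}, Eq. (1.4)
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    C = X.T @ ((pi * (1.0 - pi))[:, None] * X)
    return beta, C

# Synthetic data for illustration only (not the data used in the paper)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([0.5, 1.0, -1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_mle, C = logistic_mle(X, y)
cov_mle = np.linalg.inv(C)                    # estimated covariance of the MLE
```

When the columns of X are nearly collinear, C has eigenvalues close to zero and the diagonal of `cov_mle` becomes very large, which is the variance inflation discussed in the text.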

Multicollinearity inflates the variance of the MLE in logistic regression. Therefore, the MLE is no longer the best estimator of the parameters in the logistic regression model.

To overcome the problem of multicollinearity in logistic regression, many scholars have conducted extensive research. Schaffer et al. (1984) [

Some scholars also improve estimation by imposing restrictions on the unknown parameters of the model, which may be exact or stochastic. Where additional linear restrictions on the parameter vector are assumed to hold, Duffy and Santer (1989) [

In this article, we propose a new estimator, called the Stochastic Restricted Liu Estimator (SRLE), for the case where linear stochastic restrictions are available in addition to the logistic regression model. The article is structured as follows. Model specifications and the new estimator are presented in Section 2. Section 3 compares the mean square error matrices (MSEM) of the SRLE, the MLE and other estimators. Section 4 gives a numerical example. Section 5 uses a Monte Carlo simulation to verify the theoretical results.

For the unrestricted model given in Equation (1.1), the Logistic Liu Estimator (LLE) proposed by Liu (1993), Urgan and Tez (2008) and Mansson et al. (2012) is defined as

$$\hat\beta_{LLE} = Z_d \hat\beta_{MLE}, \tag{2.1}$$

where $0 < d < 1$ is a parameter and $Z_d = (C + I)^{-1}(C + dI)$. The bias vector and covariance matrix of the LLE are

$$\mathrm{Bias}(\hat\beta_{LLE}) = (Z_d - I)\beta = b_1, \tag{2.2}$$

$$\mathrm{Cov}(\hat\beta_{LLE}) = Z_d C^{-1} Z_d. \tag{2.3}$$
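To make the shrinkage in Equations (2.1) and (2.3) concrete, here is a minimal sketch assuming NumPy. The names `liu_matrix` and `liu_estimator`, and the ill-conditioned matrix C used below, are hypothetical and chosen only for illustration.

```python
import numpy as np

def liu_matrix(C, d):
    """Z_d = (C + I)^{-1} (C + d I), as defined after Eq. (2.1)."""
    I = np.eye(C.shape[0])
    return np.linalg.solve(C + I, C + d * I)

def liu_estimator(C, beta_mle, d):
    """Return the LLE of Eq. (2.1) and its covariance from Eq. (2.3)."""
    Zd = liu_matrix(C, d)
    beta_lle = Zd @ beta_mle
    cov_lle = Zd @ np.linalg.solve(C, Zd)     # Z_d C^{-1} Z_d
    return beta_lle, cov_lle

# Hypothetical ill-conditioned C (eigenvalues 0.01 and 3.99), for illustration only
C = np.array([[2.0, 1.99],
              [1.99, 2.0]])
beta_mle = np.array([1.0, -1.0])              # lies along the weak direction of C
beta_lle, cov_lle = liu_estimator(C, beta_mle, d=0.5)
```

Because C and I commute, an eigenvalue $\lambda$ of C maps to the eigenvalue $(\lambda + d)/(\lambda + 1)$ of $Z_d$. Directions with small $\lambda$, which are the ones affected by multicollinearity, are therefore shrunk the most, and $d = 1$ recovers the MLE.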

In addition to the sample model (1.1), suppose that prior information about $\beta$ is available in the form of a set of $q$ independent linear stochastic restrictions:

$$h = H\beta + v; \quad E(v) = 0, \quad \mathrm{Cov}(v) = \Psi, \tag{2.4}$$

where $H$ is a $q \times (p+1)$ matrix of known elements with full rank $q \le p+1$, $h$ is a $q \times 1$ stochastic known vector and $v$ is a $q \times 1$ random vector of disturbances with mean $0$ and dispersion matrix $\Psi$, which is assumed to be a known $q \times q$ positive definite matrix. Further, it is assumed that $v$ is stochastically independent of $\varepsilon^* = (\varepsilon_1, \cdots, \varepsilon_n)$, i.e. $E(\varepsilon^* v') = 0$.

For the restricted model specified by Equations (1.1) and (2.4), the Stochastic Restricted Maximum Likelihood Estimator (SRMLE) proposed by Varathan and Wijekoon (2015) and the SRLMLE proposed by Varathan and Wijekoon (2016) are given by

$$\hat\beta_{SRMLE} = \hat\beta_{MLE} + C^{-1}H'(\Psi + HC^{-1}H')^{-1}(h - H\hat\beta_{MLE}), \tag{2.5}$$

$$\hat\beta_{SRLMLE} = Z_d \hat\beta_{SRMLE}, \tag{2.6}$$

respectively. The bias vectors and covariance matrices of the SRMLE and SRLMLE are

$$\mathrm{Bias}(\hat\beta_{SRMLE}) = 0, \tag{2.7}$$

$$\mathrm{Bias}(\hat\beta_{SRLMLE}) = (Z_d - I)\beta = b_1, \tag{2.8}$$

$$\mathrm{Cov}(\hat\beta_{SRMLE}) = C^{-1} - C^{-1}H'(\Psi + HC^{-1}H')^{-1}HC^{-1} = A, \tag{2.9}$$

and

$$\mathrm{Cov}(\hat\beta_{SRLMLE}) = Z_d A Z_d, \tag{2.10}$$

respectively.
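Equations (2.5), (2.6), (2.9) and (2.10) can be sketched as follows, assuming NumPy. The function names `srmle` and `srlmle` and all numerical inputs are hypothetical; they only illustrate how the restriction updates the MLE and reduces its covariance.

```python
import numpy as np

def srmle(C, beta_mle, H, h, Psi):
    """SRMLE of Eq. (2.5) and its covariance A from Eq. (2.9)."""
    C_inv = np.linalg.inv(C)
    S = Psi + H @ C_inv @ H.T                 # q x q matrix Psi + H C^{-1} H'
    K = C_inv @ H.T @ np.linalg.inv(S)        # C^{-1} H' (Psi + H C^{-1} H')^{-1}
    beta = beta_mle + K @ (h - H @ beta_mle)
    A = C_inv - K @ H @ C_inv
    return beta, A

def srlmle(C, beta_mle, H, h, Psi, d):
    """SRLMLE of Eq. (2.6) with covariance Z_d A Z_d from Eq. (2.10)."""
    beta_sr, A = srmle(C, beta_mle, H, h, Psi)
    I = np.eye(C.shape[0])
    Zd = np.linalg.solve(C + I, C + d * I)
    return Zd @ beta_sr, Zd @ A @ Zd

# Hypothetical inputs, for illustration only
C = np.array([[2.0, 1.99],
              [1.99, 2.0]])
beta_mle = np.array([1.0, -1.0])
H = np.array([[1.0, -1.0]])                   # one restriction on beta_1 - beta_2
h = np.array([0.5])
Psi = np.array([[1.0]])

beta_sr, A = srmle(C, beta_mle, H, h, Psi)
beta_srl, cov_srl = srlmle(C, beta_mle, H, h, Psi, d=0.5)
```

In this toy case the restriction acts on exactly the direction in which C is nearly singular, so the trace of A is far smaller than the trace of $C^{-1}$.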

We propose the Mix Maximum Likelihood Estimator (MME) [

$$\hat\beta_{MME} = (C + H'\Psi^{-1}H)^{-1}(X'\hat W y + H'\Psi^{-1}h). \tag{2.11}$$

The bias vector and covariance matrix of the MME are $\mathrm{Bias}(\hat\beta_{MME}) = 0$ and

$$\mathrm{Cov}(\hat\beta_{MME}) = (C + H'\Psi^{-1}H)^{-1} = C^{-1} - C^{-1}H'(\Psi^{-1} + HC^{-1}H')^{-1}HC^{-1} = B.$$

In this paper, we propose a new estimator, named the Stochastic Restricted Liu Estimator (SRLE), which is defined as

$$\hat\beta_{SRLE} = Z_d \hat\beta_{MME}. \tag{2.12}$$

The bias vector and covariance matrix of the SRLE are

$$\mathrm{Bias}(\hat\beta_{SRLE}) = E(\hat\beta_{SRLE}) - \beta = (Z_d - I)\beta = b_1, \tag{2.13}$$

and

$$\mathrm{Cov}(\hat\beta_{SRLE}) = D(\hat\beta_{SRLE}) = Z_d B Z_d, \tag{2.14}$$

respectively.
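The two steps in Equations (2.11) and (2.12), forming the MME and then applying the Liu shrinkage, can be sketched as below, assuming NumPy. The function names `mme` and `srle` are hypothetical, the argument `r` stands for the vector $X'\hat W y$ in Equation (2.11), and the toy numbers are chosen only so that the result can be checked by hand.

```python
import numpy as np

def mme(C, r, H, h, Psi):
    """MME of Eq. (2.11); r stands for the vector X' W_hat y appearing there."""
    Psi_inv = np.linalg.inv(Psi)
    M = C + H.T @ Psi_inv @ H
    beta = np.linalg.solve(M, r + H.T @ Psi_inv @ h)
    return beta, np.linalg.inv(M)             # estimate and (C + H' Psi^{-1} H)^{-1}

def srle(C, r, H, h, Psi, d):
    """SRLE of Eq. (2.12) with covariance Z_d B Z_d from Eq. (2.14)."""
    beta_mme, B = mme(C, r, H, h, Psi)
    I = np.eye(C.shape[0])
    Zd = np.linalg.solve(C + I, C + d * I)
    return Zd @ beta_mme, Zd @ B @ Zd

# Toy inputs (hypothetical) so the arithmetic can be verified by hand
C = np.eye(2)
r = np.array([1.0, 2.0])
H = np.array([[1.0, 0.0]])
h = np.array([3.0])
Psi = np.array([[1.0]])

beta_mme, B = mme(C, r, H, h, Psi)            # (C + H'H) = diag(2, 1), rhs = (4, 2)
beta_srle, cov_srle = srle(C, r, H, h, Psi, d=0.5)
```

With $C = I$ and $d = 0.5$, $Z_d = 0.75\,I$, so the MME $(2, 2)'$ is shrunk to $(1.5, 1.5)'$ and the covariance $\mathrm{diag}(0.5, 1)$ is scaled by $0.75^2$.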

Now we will give a theorem and a lemma that will be used in the following paragraphs.

Theorem 2.1. [

Lemma 2.1. [

In this section, we compare the SRLE with the MLE, LLE, SRMLE and SRLMLE under the MSEM criterion.

First, the MSEM of an estimator $\hat\beta$ of $\beta$ is

$$\mathrm{MSEM}(\hat\beta) = \mathrm{Cov}(\hat\beta) + [\mathrm{Bias}(\hat\beta)][\mathrm{Bias}(\hat\beta)]', \tag{3.1}$$

where $\mathrm{Bias}(\hat\beta)$ is the bias vector and $\mathrm{Cov}(\hat\beta)$ is the dispersion matrix. For two given estimators $\hat\beta_1$ and $\hat\beta_2$, the estimator $\hat\beta_2$ is considered to be better than $\hat\beta_1$ under the MSEM criterion if and only if

$$\Delta(\hat\beta_1, \hat\beta_2) = \mathrm{MSEM}(\hat\beta_1) - \mathrm{MSEM}(\hat\beta_2) \ge 0, \tag{3.2}$$

that is, if and only if the difference is a non-negative definite matrix. The scalar mean square error (MSE) is defined as

$$\mathrm{MSE}(\hat\beta) = \mathrm{tr}(\mathrm{MSEM}(\hat\beta)). \tag{3.3}$$

Since superiority under the MSEM criterion implies superiority under the scalar MSE criterion, we only consider MSEM comparisons among the estimators.
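The definitions in Equations (3.1) to (3.3) translate directly into a few lines of code. The sketch below assumes NumPy; the function names and the small matrices are hypothetical and serve only to show that the matrix criterion is stricter than the scalar one.

```python
import numpy as np

def msem(cov, bias):
    """Mean square error matrix of Eq. (3.1)."""
    bias = np.asarray(bias, dtype=float).reshape(-1, 1)
    return cov + bias @ bias.T

def scalar_mse(cov, bias):
    """Scalar MSE of Eq. (3.3): the trace of the MSEM."""
    return float(np.trace(msem(cov, bias)))

def is_msem_superior(msem_1, msem_2, tol=1e-10):
    """True if estimator 2 is at least as good as estimator 1 in the sense of Eq. (3.2)."""
    delta = msem_1 - msem_2
    delta = (delta + delta.T) / 2.0           # symmetrise against round-off
    return bool(np.linalg.eigvalsh(delta).min() >= -tol)

# An unbiased estimator with larger variance versus a biased one with smaller variance
msem_1 = msem(2.0 * np.eye(2), [0.0, 0.0])
msem_2 = msem(np.eye(2), [0.5, 0.0])
better = is_msem_superior(msem_1, msem_2)

# A smaller trace alone does not imply MSEM superiority
msem_a = np.diag([3.0, 1.0])
msem_b = np.diag([1.0, 2.0])
trace_smaller = np.trace(msem_b) < np.trace(msem_a)
matrix_better = is_msem_superior(msem_a, msem_b)
```

In the second pair the trace of `msem_b` is smaller, yet the difference `msem_a - msem_b` has a negative eigenvalue, so the estimator is better in the scalar MSE sense without being better in the MSEM sense.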

In this section, we make the MSEM comparison between the MLE and the SRLE.

The MSEMs of the MLE and the SRLE are

$$\mathrm{MSEM}(\hat\beta_{MLE}) = C^{-1}, \tag{3.4}$$

and

$$\mathrm{MSEM}(\hat\beta_{SRLE}) = Z_d B Z_d + b_1 b_1', \tag{3.5}$$

respectively.

We now compare these two estimators under the MSEM criterion:

$$\Delta_1 = \mathrm{MSEM}(\hat\beta_{MLE}) - \mathrm{MSEM}(\hat\beta_{SRLE}) = C^{-1} - Z_d B Z_d - b_1 b_1' = C^{-1} - (Z_d B Z_d + b_1 b_1') = M_1 - N_1, \tag{3.6}$$

where $M_1 = C^{-1}$ and $N_1 = Z_d B Z_d + b_1 b_1'$. Obviously, $b_1 b_1'$ is a non-negative definite matrix, and $C^{-1}$ and $Z_d B Z_d$ are positive definite. Using Theorem 2.1, it is clear that $N_1$ is a positive definite matrix. By Lemma 2.1, if $\lambda_{\max}(N_1 M_1^{-1}) < 1$, where $\lambda_{\max}(N_1 M_1^{-1})$ is the largest eigenvalue of $N_1 M_1^{-1}$, then $M_1 - N_1$ is a positive definite matrix. Based on the above discussion, the following theorem can be proved.

Theorem 3.1. For the restricted model specified by Equations (1.1) and (2.4), the SRLE is superior to the MLE in the MSEM sense if and only if $\lambda_{\max}(N_1 M_1^{-1}) < 1$.
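The eigenvalue condition used in this comparison can be checked numerically. The following is a minimal sketch assuming NumPy, with a hypothetical helper `lambda_max_ratio` and made-up matrices M and N. It uses a Cholesky factor of M so that only a symmetric eigenvalue problem has to be solved.

```python
import numpy as np

def lambda_max_ratio(N, M):
    """Largest eigenvalue of N M^{-1} for symmetric N and positive definite M."""
    L = np.linalg.cholesky(M)                 # M = L L'
    T = np.linalg.solve(L, N)                 # L^{-1} N
    S = np.linalg.solve(L, T.T).T             # L^{-1} N L^{-T}, similar to N M^{-1}
    return float(np.linalg.eigvalsh((S + S.T) / 2.0).max())

# Hypothetical matrices, for illustration only
M = np.diag([2.0, 3.0])
N = np.array([[1.0, 0.5],
              [0.5, 1.0]])

lam = lambda_max_ratio(N, M)                          # below 1 for these values
diff_is_pd = np.linalg.eigvalsh(M - N).min() > 0      # M - N is positive definite

lam_big = lambda_max_ratio(4.0 * np.eye(2), M)        # equals 2, condition fails
diff_big_is_pd = np.linalg.eigvalsh(M - 4.0 * np.eye(2)).min() > 0
```

In an application, M and N would be replaced by the plug-in versions of $M_1$ and $N_1$, with $\beta$ in $b_1$ replaced by an estimate; that substitution is an assumption of this sketch rather than something prescribed in the text.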

The MSEM of the LLE is

$$\mathrm{MSEM}(\hat\beta_{LLE}) = Z_d C^{-1} Z_d + b_1 b_1'. \tag{3.7}$$

We now compare the LLE and the SRLE under the MSEM criterion:

$$\Delta_2 = \mathrm{MSEM}(\hat\beta_{LLE}) - \mathrm{MSEM}(\hat\beta_{SRLE}) = Z_d C^{-1} Z_d - Z_d B Z_d + b_1 b_1' - b_1 b_1' = Z_d D Z_d, \tag{3.8}$$

where $D = C^{-1}H'(\Psi^{-1} + HC^{-1}H')^{-1}HC^{-1}$. Obviously, $Z_d D Z_d$ is positive definite. Based on the above discussion, the following theorem can be proved.

Theorem 3.2. For the restricted model specified by Equations (1.1) and (2.4), the SRLE is always superior to the LLE in the MSEM sense.

The MSEM of the SRMLE is

$$\mathrm{MSEM}(\hat\beta_{SRMLE}) = A. \tag{3.9}$$

We now compare the SRMLE and the SRLE under the MSEM criterion:

$$\Delta_3 = \mathrm{MSEM}(\hat\beta_{SRMLE}) - \mathrm{MSEM}(\hat\beta_{SRLE}) = C^{-1} - C^{-1}H'(\Psi + HC^{-1}H')^{-1}HC^{-1} - Z_d B Z_d - b_1 b_1' = C^{-1} - [F + Z_d B Z_d + b_1 b_1'] = M_1 - N_3, \tag{3.10}$$

where $F = C^{-1}H'(\Psi + HC^{-1}H')^{-1}HC^{-1}$ and $N_3 = F + Z_d B Z_d + b_1 b_1'$. Obviously, $b_1 b_1'$ is a non-negative definite matrix, and $F$ and $Z_d B Z_d$ are positive definite. Using Theorem 2.1, it is clear that $N_3$ is a positive definite matrix. By Lemma 2.1, if $\lambda_{\max}(N_3 M_1^{-1}) < 1$, where $\lambda_{\max}(N_3 M_1^{-1})$ is the largest eigenvalue of $N_3 M_1^{-1}$, then $M_1 - N_3$ is a positive definite matrix. Based on the above discussion, the following theorem can be proved.

Theorem 3.3. For the restricted model specified by Equations (1.1) and (2.4), the SRLE is superior to the SRMLE in the MSEM sense if and only if $\lambda_{\max}(N_3 M_1^{-1}) < 1$.

The MSEM of the SRLMLE is

$$\mathrm{MSEM}(\hat\beta_{SRLMLE}) = Z_d A Z_d + b_1 b_1'. \tag{3.11}$$

Now, we consider the following difference:

$$\Delta_4 = \mathrm{MSEM}(\hat\beta_{SRLMLE}) - \mathrm{MSEM}(\hat\beta_{SRLE}) = Z_d A Z_d - Z_d B Z_d + b_1 b_1' - b_1 b_1' = Z_d D Z_d - Z_d F Z_d = M_4 - N_4, \tag{3.12}$$

where $M_4 = Z_d D Z_d$ and $N_4 = Z_d F Z_d$. Obviously, $D$, $M_4$ and $N_4$ are positive definite matrices. By Lemma 2.1, if $\lambda_{\max}(N_4 M_4^{-1}) < 1$, where $\lambda_{\max}(N_4 M_4^{-1})$ is the largest eigenvalue of $N_4 M_4^{-1}$, then $M_4 - N_4$ is a positive definite matrix. Based on the above discussion, the following theorem can be proved.

Theorem 3.4. For the restricted model specified by Equations (1.1) and (2.4), the SRLE is superior to the SRLMLE in the MSEM sense if and only if $\lambda_{\max}(N_4 M_4^{-1}) < 1$.

In this section, we consider the Iris data set from the UCI repository to illustrate our theoretical results.

A binary logistic regression model is fitted in which the dependent variable is coded as 0 if the plant is Iris-setosa and as 1 if the plant is Iris-versicolor. The explanatory variables are $x_1$: Sepal.Length; $x_2$: Petal.Length; and $x_3$: Petal.Width.

The sample consists of the first 80 observations. The correlation matrix can be seen in

From

1) As d increases, the MSE values of the LRE, SRRMLE, SRLRE, SRLMLE and SRLE decrease. 2) The MSE values of the MLE, SRMLE and MME do not change with d. 3) The new estimator is always superior to the other estimators.

To illustrate the above theoretical results, a Monte Carlo simulation is carried out. Following McDonald and Galarneau (1975) [

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p}, \quad i = 1, 2, \cdots, n, \quad j = 1, 2, \cdots, p, \tag{5.1}$$

where the $z_{ij}$ are pseudo-random numbers from the standard normal distribution and $\rho^2$ represents the correlation between any two explanatory variables.
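The generating scheme of Equation (5.1) can be sketched as follows, assuming NumPy. Note one assumption that departs from the equation as printed: the shared component is drawn as an extra column, indexed $p+1$, which is the usual form of this scheme and is what gives every pair of columns the correlation $\rho^2$. The function name `generate_regressors` and the sample size are hypothetical.

```python
import numpy as np

def generate_regressors(n, p, rho, rng):
    """Correlated regressors in the spirit of Eq. (5.1).

    Assumption: the shared component is an extra (p+1)-th column of z,
    so that every pair of columns has correlation rho**2.
    """
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

rng = np.random.default_rng(1)
X = generate_regressors(n=50_000, p=4, rho=0.9, rng=rng)
corr = np.corrcoef(X, rowvar=False)           # off-diagonal entries near 0.81
```

A large n is used here only so that the empirical correlations are close to $\rho^2 = 0.81$; the simulation in the text uses much smaller samples.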

In this section, we set $\rho$ to 0.70, 0.80 and 0.99 and $n$ to 20, 100 and 200, for models with two and four explanatory variables. The dependent variable $y_i$ in (1.1) is obtained from the Bernoulli($\pi_i$) distribution, where

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$

The parameter values $\beta_1, \cdots, \beta_p$ are chosen so that $\sum_{j=1}^{p} \beta_j^2 = 1$ and $\beta_1 = \cdots = \beta_p$. Further, for the Liu parameter $d$, some selected values are chosen so that $0 \le d \le 1$. Moreover, for the restriction, we choose

$$H = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}, \quad h = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} \quad \text{and} \quad \Psi = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{5.2}$$

The simulation is repeated 2000 times by generating new pseudo-random numbers and the simulated MSE values of the estimators are obtained using the following equation

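$$\widehat{\mathrm{MSE}}(\hat\beta^*) = \mathrm{Mean}\{\mathrm{tr}[\mathrm{MSEM}(\hat\beta, \beta)]\} = \frac{1}{2000} \sum_{r=1}^{2000} (\hat\beta_r - \beta)'(\hat\beta_r - \beta), \tag{5.3}$$

where $\hat\beta_r$ denotes the estimate obtained in the $r$th replication.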
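The averaging in Equation (5.3) amounts to a mean of squared error norms over replications. A minimal sketch, assuming NumPy and a hypothetical function name `simulated_mse`, with three made-up replications instead of 2000:

```python
import numpy as np

def simulated_mse(estimates, beta):
    """Average squared estimation error over replications, as in Eq. (5.3).

    estimates: array of shape (R, k), one estimate per replication.
    beta:      true parameter vector of length k.
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(beta, dtype=float)
    return float(np.mean(np.sum(err**2, axis=1)))

# Three made-up replications, for illustration only
estimates = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0]])
beta = np.array([0.5, 0.5])
mse_hat = simulated_mse(estimates, beta)      # each replication contributes 0.5
```

In the simulation, `estimates` would hold the 2000 replicated values of one estimator (for example the SRLE at a fixed d, n and $\rho$), and the function would be called once per estimator and setting.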

The results of the simulation are reported in Tables A3-A9 (Appendix A) and also displayed in Figures A1-A3 (Appendix B).

From Tables A3-A9, Figures A1-A3, we can conclude that:

1) The MSE values of all the estimators increase as $\rho$ increases; 2) the MSE values of all the estimators decrease as n increases; 3) the SRLE is always superior to the MLE, LLE, SRMLE and SRLMLE for all d, n and $\rho$.

In this paper, we proposed the Stochastic Restricted Liu Estimator (SRLE) for the logistic regression model when linear stochastic restrictions are available. In the MSEM sense, we obtained necessary and sufficient conditions, or sufficient conditions, for the SRLE to be superior to the MLE, LLE, SRMLE and SRLMLE, and verified its superiority using a Monte Carlo simulation. How to reduce the bias of the new estimator while guaranteeing that the mean square error does not increase is the focus of our next step.

This work was supported by the Natural Science Foundation of Henan Province of China (No. 152300410112).

Zuo, W.B. and Li, Y.L. (2018) A New Stochastic Restricted Liu Estimator for the Logistic Regression Model. Open Journal of Statistics, 8, 25-37. https://doi.org/10.4236/ojs.2018.81003

| | $x_1$ | $x_2$ | $x_3$ |
|---|---|---|---|
| $x_1$ | 1 | 0.833919 | 0.811755 |
| $x_2$ | 0.833919 | 1 | 0.97747 |
| $x_3$ | 0.811755 | 0.97747 | 1 |

| Estimator | k, d = 0 | k, d = 0.2 | k, d = 0.4 | k, d = 0.5 | k, d = 0.6 | k, d = 0.8 | k, d = 0.9 | k, d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 8.0221e+03 | 8.0221e+03 | 8.0221e+03 | 8.0221e+03 | 8.0221e+03 | 8.0221e+03 | 8.0221e+03 | 8.0221e+03 |
| LLE | 8.0221e+03 | 220.9556 | 140.9535 | 122.7412 | 110.4983 | 95.4561 | 90.5954 | 87.1297 |
| SRMLE | 102.4228 | 102.4228 | 102.4228 | 102.4228 | 102.4228 | 102.4228 | 102.4228 | 102.4228 |
| SRLMLE | 28.0970 | 33.8486 | 44.1570 | 51.0200 | 59.0222 | 78.4441 | 89.8639 | 101.1157 |
| SRLE | 0.7705 | 0.7863 | 0.9251 | 1.0406 | 1.1869 | 1.5716 | 1.8102 | 2.0511 |

| Estimator | d = 0 | d = 0.10 | d = 0.30 | d = 0.40 | d = 0.50 | d = 0.70 | d = 0.80 | d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 7.4662 | 7.4662 | 7.4662 | 7.4662 | 7.4662 | 7.4662 | 7.4662 | 7.4662 |
| LLE | 4.4468 | 4.6588 | 5.1626 | 5.3374 | 5.7376 | 6.3522 | 6.6154 | 7.4366 |
| SRMLE | 5.9236 | 5.9236 | 5.9236 | 5.9236 | 5.9236 | 5.9236 | 5.9236 | 5.9236 |
| SRLMLE | 4.7969 | 4.8832 | 5.0778 | 5.1698 | 5.4686 | 5.5636 | 5.7184 | 5.9218 |
| SRLE | 1.3450 | 1.3954 | 1.4974 | 1.5506 | 1.6109 | 1.7505 | 1.8285 | 1.9793 |

| Estimator | d = 0 | d = 0.10 | d = 0.30 | d = 0.40 | d = 0.50 | d = 0.70 | d = 0.80 | d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 9.1711 | 9.1711 | 9.1711 | 9.1711 | 9.1711 | 9.1711 | 9.1711 | 9.1711 |
| LLE | 4.8646 | 5.0099 | 5.7310 | 6.0395 | 6.6352 | 7.4975 | 7.7415 | 9.1088 |
| SRMLE | 6.4694 | 6.4694 | 6.4694 | 6.4694 | 6.4694 | 6.4694 | 6.4694 | 6.4694 |
| SRLMLE | 5.1932 | 5.3479 | 5.6561 | 5.6492 | 5.8148 | 6.1384 | 6.2194 | 6.5021 |
| SRLE | 1.3138 | 1.3630 | 1.4919 | 1.5626 | 1.6396 | 1.8170 | 1.9175 | 2.1239 |

| Estimator | d = 0 | d = 0.10 | d = 0.30 | d = 0.40 | d = 0.50 | d = 0.70 | d = 0.80 | d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 73.1647 | 73.1647 | 73.1647 | 73.1647 | 73.1647 | 73.1647 | 73.1647 | 73.1647 |
| LLE | 4.5979 | 5.5820 | 11.9735 | 17.0739 | 23.5922 | 38.2869 | 50.5697 | 71.6768 |
| SRMLE | 7.0724 | 7.0724 | 7.0724 | 7.0724 | 7.0724 | 7.0724 | 7.0724 | 7.0724 |
| SRLMLE | 5.9067 | 6.0027 | 6.2015 | 6.2544 | 6.4525 | 6.6168 | 6.8033 | 6.9489 |
| SRLE | 1.0659 | 1.0958 | 1.2590 | 1.3805 | 1.5313 | 1.9262 | 2.1691 | 2.7055 |

| Estimator | d = 0 | d = 0.10 | d = 0.30 | d = 0.40 | d = 0.50 | d = 0.70 | d = 0.80 | d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 5.0422 | 5.0422 | 5.0422 | 5.0422 | 5.0422 | 5.0422 | 5.0422 | 5.0422 |
| LLE | 4.7008 | 4.7702 | 4.7837 | 4.8357 | 4.8493 | 4.9529 | 4.9284 | 5.0390 |
| SRMLE | 4.9033 | 4.9033 | 4.9033 | 4.9033 | 4.9033 | 4.9033 | 4.9033 | 4.9033 |
| SRLMLE | 4.6768 | 4.6892 | 4.7607 | 4.7660 | 4.8208 | 4.8083 | 4.8644 | 4.8972 |
| SRLE | 1.3186 | 1.3234 | 1.3332 | 1.3395 | 1.3465 | 1.3552 | 1.3621 | 1.3726 |

| Estimator | d = 0 | d = 0.10 | d = 0.30 | d = 0.40 | d = 0.50 | d = 0.70 | d = 0.80 | d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 5.6054 | 5.6054 | 5.6054 | 5.6054 | 5.6054 | 5.6054 | 5.6054 | 5.6054 |
| LLE | 5.1351 | 5.1538 | 5.2317 | 5.2462 | 5.3589 | 5.5121 | 5.5192 | 5.6019 |
| SRMLE | 5.4591 | 5.4591 | 5.4591 | 5.4591 | 5.4591 | 5.4591 | 5.4591 | 5.4591 |
| SRLMLE | 5.1120 | 5.1369 | 5.2144 | 5.2191 | 5.2741 | 5.3336 | 5.3462 | 5.4271 |
| SRLE | 1.3596 | 1.3687 | 1.3845 | 1.3930 | 1.4041 | 1.4188 | 1.4303 | 1.4466 |

| Estimator | d = 0 | d = 0.10 | d = 0.30 | d = 0.40 | d = 0.50 | d = 0.70 | d = 0.80 | d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 5.2945 | 5.2945 | 5.2945 | 5.2945 | 5.2945 | 5.2945 | 5.2945 | 5.2945 |
| LLE | 5.1219 | 5.1233 | 5.1459 | 5.1748 | 5.1897 | 5.2604 | 5.2559 | 5.2669 |
| SRMLE | 5.2174 | 5.2174 | 5.2174 | 5.2174 | 5.2174 | 5.2174 | 5.2174 | 5.2174 |
| SRLMLE | 5.0657 | 5.0520 | 5.0930 | 5.1300 | 5.1753 | 5.1433 | 5.2052 | 5.2148 |
| SRLE | 1.2906 | 1.2930 | 1.2971 | 1.2995 | 1.3026 | 1.3063 | 1.3102 | 1.3163 |

| Estimator | d = 0 | d = 0.10 | d = 0.30 | d = 0.40 | d = 0.50 | d = 0.70 | d = 0.80 | d = 0.99 |
|---|---|---|---|---|---|---|---|---|
| MLE | 9.0269 | 9.0269 | 9.0269 | 9.0269 | 9.0269 | 9.0269 | 9.0269 | 9.0269 |
| LLE | 5.2509 | 5.4181 | 5.9620 | 6.2403 | 6.5520 | 7.4280 | 7.8798 | 9.1243 |
| SRMLE | 6.1827 | 6.1827 | 6.1827 | 6.1827 | 6.1827 | 6.1827 | 6.1827 | 6.1827 |
| SRLMLE | 5.9722 | 5.9833 | 6.0102 | 6.0147 | 6.0688 | 6.1128 | 6.1244 | 6.1267 |
| SRLE | 1.3862 | 1.4439 | 1.5812 | 1.6644 | 1.7576 | 1.9647 | 2.0818 | 2.3316 |