
Loss data structures in non-life insurance are increasingly complex, and correlation and heterogeneity are increasingly evident in them. Hierarchical models can break through the limitation of traditional ratemaking methods, which analyze only the loss data of a single policy period; at the same time, they improve prediction accuracy for data with complex structure. Using a hierarchical generalized linear model, this paper studies non-life ratemaking with multi-year loss data and carries out an empirical analysis on auto insurance data. The results show that, by incorporating random effects, the GLMM fits the data far better than the GLM: it reflects differences between risk individuals more effectively and also reveals the heterogeneity and correlation of an individual's losses across multiple insurance periods.

In the 1990s, the hierarchical model emerged as a new statistical analysis technique and came into wide use worldwide. A hierarchical model assigns its own probability sub-model to the model parameters, extending the standard linear model (Linear Model, LM), the generalized linear model (Generalized Linear Model, GLM) and the non-linear model (Non-linear Model). When these standard models are used for statistical analysis, the observations must in general be independent. Yet in some actuarial and statistical problems [

Two of the core topics in non-life actuarial science are pricing and reserving. For a property insurance company, competitiveness and profitability are closely tied to the soundness of its pricing. In the 1990s, British actuaries introduced the GLM into non-life insurance pricing; since then, the GLM has been widely used in non-life pricing practice in many countries with considerable success.

However, the GLM still has shortcomings. For example, when some levels of a categorical explanatory variable contain few observations, the standard errors of the corresponding parameter estimates become large. Moreover, the direct application of the GLM also faces the problem of too many parameters to estimate. To solve these problems, actuaries have incorporated credibility theory into the GLM framework, and a number of statistical models and methods have appeared [

The claims reserve is the largest liability on the balance sheet of a property insurance company. Its accurate assessment supports a correct judgment of the company's operating performance and solvency, so a reasonable valuation of this liability is of great significance to the development of a property insurance company.

Make the following assumptions: there are $m$ risk individuals, and the random variables $Y = (Y_{ij})$ $(i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n)$ represent the number or amount of claims incurred by the $i$-th risk individual in the $j$-th policy year. The GLMM framework consists of the following three parts:

1) Setting of the random part: Given the random effects $b = (b_1, b_2, \cdots, b_m)^{T}$, the observations $Y_{ij}$ are conditionally independent of each other and follow a distribution in the exponential dispersion family (EDF). The probability density function can then be written as:

$$f(y_{ij} \mid b_i, \beta, \varphi) = \exp\!\left(\frac{y_{ij}\theta_{ij} - \psi(\theta_{ij})}{\varphi} + c(y_{ij}, \varphi)\right), \quad j = 1, 2, \cdots, n \quad (1)$$

Here $\theta_{ij}$ is the natural parameter, $\psi(\cdot)$ and $c(\cdot)$ are known functions, and $\varphi$ is the scale parameter.

2) Setting of the systematic part: The relationship between the mean of the response variable and the explanatory variables is represented by a linear predictor, $\eta = X\beta + Zb$, where $X$ is the design matrix of the fixed effects, $Z$ is the design matrix of the random effects, and $\beta$ is the vector of fixed-effect parameters to be estimated.

3) Setting of the link function: $g(\mu) = X\beta + Zb$, where $g(\cdot)$ is a monotone differentiable function; common choices are the log, identity and logit links. Writing $g^{-1}(\cdot) = h(\cdot)$, the conditional mean can be expressed as $E(Y \mid b) = \mu = h(X\beta + Zb)$.
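The three components above can be made concrete with a small simulation. The following Python sketch (the paper itself works in R; all sizes and values here are illustrative) builds the linear predictor $\eta = X\beta + Zb$ with one random intercept per risk individual and applies the log link:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m risk individuals observed for n policy years, p fixed effects.
m, n, p = 5, 3, 2

X = rng.normal(size=(m * n, p))            # fixed-effects design matrix X
Z = np.kron(np.eye(m), np.ones((n, 1)))    # random-effects design Z: one intercept per individual
beta = np.array([0.3, -0.2])               # fixed effects beta (illustrative values)
b = rng.normal(scale=0.5, size=m)          # random effects b_i ~ N(0, 0.5^2)

eta = X @ beta + Z @ b                     # linear predictor eta = X beta + Z b
mu = np.exp(eta)                           # log link: mu = h(eta) = exp(eta)
y = rng.poisson(mu)                        # conditionally independent Poisson responses
```

With the log link, each individual's random intercept $b_i$ scales all of its expected claim counts by the common factor $\exp(b_i)$, which is what induces correlation among the observations of the same individual.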

The four basic assumptions in this structure:

1) Independence: conditional on the risk parameters, the claim numbers (or amounts) of a risk individual in different years do not interfere with one another; that is, they are conditionally independent.

2) Distribution: $Y_{ij} \mid u_i$ follows an exponential dispersion family distribution, with probability density function:

$$f_{Y_{ij} \mid u_i}(y; \vartheta_{ij}, \varphi) = \exp\!\left\{\frac{\omega_{ij}}{\varphi}\bigl(y\vartheta_{ij} - b(\vartheta_{ij})\bigr)\right\} c(y, \varphi) \quad (2)$$

Here $\omega_{ij} > 0$ is a known weight (constant), $\vartheta_{ij}$ and $\varphi$ are the natural and dispersion parameters respectively, and $b(\cdot)$ and $c(\cdot)$ are given functions.

3) Structure: $\mu$ and $X\beta + Zv$ are related through the link function; that is, $\mu = h(X\beta + Zv)$, or $g(\mu) = X\beta + Zv$. The cumulative effect $v$ is obtained from $u$ through a strictly monotone function, written $v = g_1(u)$.

4) Distribution of the risk parameters: In the HGLM, $u_i$ is a random risk parameter that depicts the heterogeneous risk characteristics of the different risk individuals $i$. It is assumed that $v_i$ follows an EDF distribution, written as:

$$f_{v_i}(\omega) = \exp\!\left\{\frac{1}{\lambda_i}\bigl(\psi_i\omega - b(\omega)\bigr)\right\} d(\psi_i, \lambda_i) \quad (3)$$

Above, $\psi_i$ is the hyperparameter and $\lambda_i$ is the dispersion parameter.
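To make assumptions 2) and 4) concrete, consider the Poisson-gamma case, the pairing used for model 4 below; the closed-form marginal is a standard result, stated here only for illustration. If $Y_{ij} \mid u_i \sim \mathrm{Poisson}(u_i\mu_{ij})$ and $u_i \sim \mathrm{Gamma}(\alpha, \alpha)$ with $E(u_i) = 1$, integrating out the risk parameter gives a negative binomial marginal distribution:

$$P(Y_{ij} = y) = \int_0^{\infty} \frac{(u\mu_{ij})^{y} e^{-u\mu_{ij}}}{y!} \cdot \frac{\alpha^{\alpha} u^{\alpha-1} e^{-\alpha u}}{\Gamma(\alpha)}\,du = \frac{\Gamma(y+\alpha)}{y!\,\Gamma(\alpha)} \left(\frac{\alpha}{\alpha+\mu_{ij}}\right)^{\alpha} \left(\frac{\mu_{ij}}{\alpha+\mu_{ij}}\right)^{y}$$

Its variance, $\mu_{ij} + \mu_{ij}^2/\alpha$, exceeds the Poisson variance $\mu_{ij}$, showing how the random risk parameter captures heterogeneity (overdispersion) across individuals.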

From the above, the connections and differences between the models can be summarized as follows:

1) Model structure: The GLMM is obtained by extending the GLM, introducing into its linear predictor a random effect assumed to follow a normal distribution; the HGLM is a still more general framework [

2) Theoretical calculation: Compared with the other two models, the GLM is relatively simple. In general, the GLM obtains the maximum likelihood estimates (MLE) of the parameters by maximizing the likelihood function; common algorithms include Fisher scoring and Newton-Raphson iteration.
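As an illustration of Fisher scoring, the following minimal Python sketch fits a log-link Poisson GLM by the scoring iteration (for the canonical log link, Fisher scoring coincides with Newton-Raphson); all names and data are illustrative:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fisher scoring for a log-link Poisson GLM: a minimal sketch."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # current fitted means
        W = mu                           # Fisher weights: Var(Y_i) = mu_i for Poisson
        # Scoring update: beta += (X' W X)^{-1} X' (y - mu)
        beta = beta + np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - mu))
    return beta

# Quick check on an intercept-only design, where the MLE is beta0 = log(mean(y)).
X = np.ones((6, 1))
y = np.array([0.0, 1, 2, 1, 3, 2])
beta_hat = poisson_irls(X, y)
```

For the intercept-only design the iteration converges to $\hat\beta_0 = \log \bar{y}$, which provides a quick correctness check.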

The data in this paper are taken from Sun Weiwei's paper [

1) Model 1

Assume that the number of claims follows a Poisson distribution with parameter $\mu_{ij}$. Ignoring both the heterogeneity in claim numbers captured by random effects and the correlation among the three years of claims, the model can be built as:

$$\log(\mu_{ij}) = \beta_0 + \beta_{1j} \times \mathrm{agecat}_{ij} + \beta_{2j} \times \mathrm{valuecat}_{ij}, \qquad \mu_{ij} = E(\mathrm{numclaims}_{ij})$$

$$\mathrm{numclaims}_{ij} \sim P(\mu_{ij}) \quad (4)$$

2) Model 2

It has the same form as model 1, but additionally takes into account the excess of zeros in the number of claims. Assume the number of claims follows a zero-inflated Poisson (ZIP) distribution; then the model can be built as:

$$\log(\mu_{ij}) = \beta_0 + \beta_{1j} \times \mathrm{agecat}_{ij} + \beta_{2j} \times \mathrm{valuecat}_{ij}, \qquad \mu_{ij} = E(\mathrm{numclaims}_{ij})$$

$$\mathrm{numclaims}_{ij} \sim \mathrm{ZIP}(\mu_{ij}) \quad (5)$$
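The ZIP distribution in model 2 mixes a point mass at zero with a Poisson component. A small Python sketch of its probability mass function (the standard mixture parameterisation, with the extra-zero probability here called `pi`) is:

```python
import numpy as np
from scipy.stats import poisson

def zip_pmf(y, mu, pi):
    """Zero-inflated Poisson pmf: with probability pi an extra zero,
    otherwise a Poisson(mu) draw. Parameter names are illustrative."""
    y = np.asarray(y)
    base = (1 - pi) * poisson.pmf(y, mu)
    return np.where(y == 0, pi + base, base)

# Probabilities over a grid wide enough to capture essentially all mass.
p = zip_pmf(np.arange(60), mu=1.2, pi=0.3)
```

The extra mass at zero, $P(Y=0) = \pi + (1-\pi)e^{-\mu}$, is what lets the ZIP model accommodate the excess of zero-claim policies; the mean drops to $(1-\pi)\mu$.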

3) Model 3

Based on model 1, this model takes into account the issues model 1 ignores. Under the GLMM framework, assume that the random effects are mutually independent and normally distributed [

$$\log(\mu_{ij}) = \beta_0 + \beta_{1j} \times \mathrm{agecat}_{ij} + \beta_{2j} \times \mathrm{valuecat}_{ij} + b_i, \qquad E(\mathrm{numclaims}_{ij} \mid b_i) = \mu_{ij}$$

$$\mathrm{numclaims}_{ij} \mid b_i \sim P(\mu_{ij})$$

$$b_i \sim N(0, \sigma^2) \quad (6)$$

4) Model 4

Make a further assumption: under the HGLM framework, the random effects $u_i$ reflect individual differences in claims and follow a gamma distribution with parameters $\alpha_{m4}$ and $\beta_{m4}$. Then the model can be built as:

$$\log(\mu_{ij}) = \beta_0 + \beta_{1j} \times \mathrm{agecat}_{ij} + \beta_{2j} \times \mathrm{valuecat}_{ij} + u_i, \qquad \mathrm{numclaims}_{ij} \mid u_i \sim P(\mu_{ij})$$

In this study, the R packages gamlss, lme4, glmmML and hglm are used. Model 1 is fitted by maximum likelihood, using Fisher scoring iteration with the dispersion parameter fixed at 1. In model 2, the parameters are estimated with the RS algorithm under the GAMLSS framework, converging in 12 iterations. Model 3 adopts Gauss-Hermite quadrature [
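The Gauss-Hermite approach for model 3 approximates each individual's marginal likelihood by numerical integration over the normal random intercept. A minimal Python sketch of the quadrature idea (not the R packages' actual implementation; names are illustrative):

```python
import numpy as np
from math import factorial

def marginal_loglik(y_i, eta_fixed, sigma, n_nodes=20):
    """Gauss-Hermite approximation to one individual's marginal log-likelihood
    under a Poisson GLMM with random intercept b_i ~ N(0, sigma^2)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    y = np.asarray(y_i)
    fact = np.array([factorial(int(k)) for k in y], dtype=float)
    total = 0.0
    for x, w in zip(nodes, weights):
        b = np.sqrt(2.0) * sigma * x                 # substitution for the N(0, sigma^2) density
        mu = np.exp(np.asarray(eta_fixed) + b)       # conditional means exp(eta + b)
        lik = np.prod(np.exp(-mu) * mu**y / fact)    # conditional Poisson likelihood
        total += (w / np.sqrt(np.pi)) * lik
    return np.log(total)

# Sanity check: with sigma = 0 this reduces to the ordinary Poisson log-likelihood.
ll = marginal_loglik([1, 0, 2], np.zeros(3), sigma=0.0)
```

Setting `sigma = 0` collapses the quadrature to the ordinary Poisson log-likelihood, a convenient correctness check before fitting real data.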

The above empirical analysis shows that the parameter estimates differ considerably across the models. In essence, if the random effects across the three consecutive policy years are excluded, a GLM should be built. Its AIC statistic is

| rate factor | parameters of model 1 | parameters of model 2 | parameters of model 3 |
|---|---|---|---|
| intercept term | −1.0225*** | 0.4382*** | −2.2621*** |
| agecat2 | −0.1793*** | −0.1117*** | −0.2231*** |
| agecat4 | −0.2636*** | −0.2008*** | −0.2649*** |
| agecat5 | −0.4320*** | −0.3333*** | −0.4520*** |
| agecat6 | −0.3520*** | −0.2472*** | −0.4037*** |
| agecat10 | −0.2294*** | −0.1774*** | −0.2186*** |
| valuecat3 | −0.1310 | 0.0382 | −0.1221 |
| valuecat4 | −0.8596*** | −0.7892** | −0.8213* |
| valuecat5 | −0.3604 | −0.2237 | −0.6511 |
| valuecat6 | −1.6236** | −1.5906** | −1.4762* |
| valuecat9 | −0.1855*** | −0.1418*** | −0.1990*** |
| AIC statistic | 169100 | 145475.0 | 81169 |

Note: Data are from calculations in R. *, ** and *** denote significance at the 5%, 1% and 0.1% levels, respectively.

169,100, whereas the AIC statistic under the ZIP distribution is 145,475.0 and the AIC statistic under the GLMM with Gauss-Hermite quadrature is 81,169. The GLMM therefore attains the lowest AIC, i.e. the best goodness of fit: judged by the AIC criterion, the GLMM is a substantial improvement.
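The AIC comparison can be reproduced schematically. In the sketch below the parameter counts and log-likelihoods are guesses chosen only so that the reported AIC values come out (the paper reports the AICs alone):

```python
def aic(loglik, k):
    """Akaike information criterion: AIC = 2k - 2 log L (lower is better)."""
    return 2 * k - 2 * loglik

# Hypothetical k and log L back-computed from the table's AIC values.
models = {
    "GLM (Poisson)": aic(-84539.0, 11),
    "GLM (ZIP)":     aic(-72725.5, 12),
    "GLMM":          aic(-40572.5, 12),
}
best = min(models, key=models.get)   # model with the lowest AIC
```

Since the AIC penalises each extra parameter by 2, the GLMM's far lower value cannot come from parameter count; it reflects a genuinely higher likelihood.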

In R, model 4 failed to converge after 10 iterations and was therefore eliminated. The valuecat and agecat rating factors in the sample data are multi-level categorical variables; excluding the base levels, 10 fixed-effect parameters must be estimated for each observation unit over the given period [

This study shows that the hierarchical framework has great advantages in non-life insurance ratemaking. It can handle the dependence across policy years for different risk individuals in a non-life insurer's claims data, analyze the relationship between loss observations within the same risk individual, and help actuaries deal with complex insurance data in practice. At the same time, the HGLM can handle hierarchically structured and longitudinal data, offering actuaries and related practitioners insight into non-life insurance data structures so that they can analyze them scientifically and interpret the results soundly. By means of a hierarchical generalized linear model, this paper has studied non-life ratemaking with multi-year loss data and carried out an empirical analysis on auto insurance data.

Miao, G.M. (2018) Application of Hierarchical Model in Non-Life Insurance Actuarial Science. Modern Economy, 9, 393-399. https://doi.org/10.4236/me.2018.93025