In recent years, numerous studies have sought to predict motor vehicle crashes on transportation facilities such as freeways and urban or rural highways. The accident process can be modeled successfully by assuming a dual-state data-generating process, in which road components such as intersections or road segments exist in one of two states: perfectly safe or unsafe. Zero-inflated regression models are usually applied when crash data contain a preponderance of excess zeros. This research investigates the factors that affect the frequency and severity of accidents on urban highways, using crash data from the urban highways of Mashhad, Iran, as a case study. Poisson, Negative binomial, Zero-inflated Poisson and Zero-inflated Negative binomial regression models are used to model accidents, with traffic flow and road geometry variables as the independent variables. In addition to identifying the factors that affect the probability of crash occurrence, we compare the models and evaluate the efficiency of Zero-inflated regression models against the traditional Poisson and Negative binomial models.
In recent years, numerous studies have sought to predict motor vehicle crashes on transportation facilities such as freeways and urban or rural highways [
This paper investigates the factors affecting the frequency and severity of accidents on urban highways, using crash data from the urban highways of Mashhad as a case study. The data were gathered by the Transportation and Traffic Organization of Mashhad, which used GIS to record the time and place of accidents accurately, with the assistance of police reports. Traffic flow and road geometry factors are used as the independent variables of the models. Such variables appear in the studies of many other researchers, particularly in modeling crashes on freeways and urban or rural highways [
The road geometry variables in this research are the number of lanes, horizontal curves and access roads. In some investigations, however, ADT or AADT per lane was used in modeling instead of ADT or AADT in order to account for the number of lanes [
The statistical models applied in this research to identify the factors affecting crash occurrence on urban highways are four regression models that are well known in crash modeling: Poisson, Negative Binomial (NB), Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB). The models are developed separately for accidents with property loss only and for more severe accidents, comprising injury and fatal accidents. Based on the modeling results, we examine the role of the independent variables in the probability of occurrence of each accident group. We then evaluate the goodness of fit of the models and compare them to find the best-fitting models for no-injury (property loss only) and more severe (injury or fatal) accidents on urban highways.
In this research, accidents on the urban highways of Mashhad are modeled with four regression models: Poisson, Negative binomial (NB), Zero-inflated Poisson (ZIP) and Zero-inflated Negative binomial (ZINB). Two groups of models were developed, one for accidents with property damage only and the other for more severe accidents (injury or fatal). The independent variables include traffic flow and geometry variables. Flow variables are traffic volume and speed; geometry variables are the number of lanes, horizontal curves and access roads. Beyond developing and analyzing the four statistical models separately, the specific contribution of this research is the separation of total traffic volume into passenger car volume, heavy vehicle volume (truck trailers, trucks, buses and minibuses) and light non-passenger car vehicle volume (taxis, pickups and motorcycles). This separation allows a closer look at the role of volume in the occurrence of property damage and injury or fatal accidents, and shows which part of the traffic contributes most to accident occurrence. Heavy vehicles are often blamed for crashes, but it remains to be established whether heavy vehicles, passenger cars, or light non-passenger car vehicles such as taxis and motorcycles play the key role in accident occurrence or frequency.
Accident data are usually two-level data. The first and main level is the road segment: the highway is divided into several segments. Segmentation can be based on different criteria. One option is segment length, that is, dividing the highway into equal-length segments. The problem with such a division is that a constant traffic volume, which is an important factor in this study, cannot be assigned to each segment. Segmentation is therefore based on total traffic volume. The second level is the time of day: traffic peak hours form the first sub-level, daytime non-peak hours the second and night non-peak hours the third.
SAS 9.1 was used for the statistical computations. The statistical analyses show which parameters affect accident occurrence and which have little effect. Another important aim of this research is to evaluate and compare the models in order to examine the efficiency of zero-inflated (ZI) models against the traditional Poisson and Negative binomial (NB) regression models for property damage and injury or fatal accidents on urban highways. To compare the Poisson and NB regression models, and likewise the ZIP and ZINB regression models, we use the significance of the dispersion parameter and the likelihood ratio (LR) test as criteria. The likelihood ratio test statistic is given by the following equation:

$$LR = -2\left[LL(\text{restricted}) - LL(\text{unrestricted})\right]$$

where the restricted model is the Poisson (or ZIP) model and the unrestricted model is the NB (or ZINB) model. This statistic has a Chi-squared distribution with degrees of freedom equal to the difference in the number of parameters of the two models. Goodness of fit is compared with the Akaike (AIC) and Bayesian (BIC) information criteria:

$$AIC = -2LL + 2k$$

$$BIC = -2LL + k\ln(n)$$

where LL is the log-likelihood, k the number of parameters and n the number of observations. The smaller the AIC, the better the model fits, and the model with the smallest AIC is the best-fitting one [
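In the Poisson regression model, the i-th observation of the dependent variable, $y_i$, follows a Poisson distribution with mean $\mu_i$:

$$P(y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \qquad \mu_i = \exp(x_i'\beta)$$

where $x_i$ is the vector of independent variables and $\beta$ the vector of regression coefficients.
In the Poisson model, the conditional variance is equal to the conditional mean:

$$\mathrm{Var}(y_i \mid x_i) = E(y_i \mid x_i) = \mu_i$$

The log-likelihood of the Poisson regression model is given by:

$$LL(\beta) = \sum_{i=1}^{n}\left[y_i x_i'\beta - \exp(x_i'\beta) - \ln(y_i!)\right]$$

To estimate the regression coefficients by the maximum likelihood method, the derivative of the log-likelihood with respect to the vector of coefficients, $\beta$, is set to zero:

$$\frac{\partial LL}{\partial \beta} = \sum_{i=1}^{n}\left(y_i - \mu_i\right)x_i = 0$$

The regression coefficients of the Poisson model cannot be obtained from a closed-form equation; instead, the Newton-Raphson iteration procedure is used to estimate the unknown parameters of the model [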
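As a minimal sketch of the quantities described in this subsection, the following Python functions evaluate the log-likelihood and its derivative for a Poisson regression, assuming the usual log link. The function names and the toy counts are hypothetical and are not taken from the Mashhad data. For a model with only a constant, the maximum likelihood estimate is the logarithm of the sample mean, which gives a simple check that the derivative vanishes there.

```python
import math

def poisson_loglik(beta, X, y):
    """Log-likelihood of a Poisson regression with log link (assumed form)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))  # linear predictor x_i'beta
        total += y_i * eta - math.exp(eta) - math.lgamma(y_i + 1)
    return total

def poisson_score(beta, X, y):
    """Derivative of the log-likelihood: sum over i of (y_i - mu_i) * x_i."""
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, x_i)))
        for j, x in enumerate(x_i):
            grad[j] += (y_i - mu) * x
    return grad

# Hypothetical toy data: constant-only design and five segment counts.
X = [[1.0] for _ in range(5)]
y = [0, 2, 1, 4, 3]

# With only a constant, the MLE is the log of the sample mean.
beta_hat = [math.log(sum(y) / len(y))]
score_at_mle = poisson_score(beta_hat, X, y)
loglik_at_mle = poisson_loglik(beta_hat, X, y)
```

With covariates there is no closed form, which is why an iterative procedure is needed.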
In negative binomial (NB) model
The conditional mean of
Investigators favor the NB distribution because the Poisson distribution constrains the mean and variance to be equal. In the NB distribution, the relationship between mean and variance is as follows:

$$\mathrm{Var}(y_i \mid x_i) = \mu_i + \alpha\mu_i^2$$

where $\alpha$ is the dispersion parameter. For $\alpha > 0$ the variance of the NB distribution is always greater than the mean, so it fits data whose variance exceeds the mean.
In the NB regression model, the dispersion parameter must be estimated in addition to the regression parameters.
For estimating
Another problem that accident data often exhibit is a preponderance of excess zeros: the number of zero observations is greater than expected under the Poisson and NB models. When the data contain excess zeros, a zero-inflated (ZI) distribution is used for the analysis.
The underlying assumption of ZI models is that entities (e.g., intersections, segments, crosswalks, etc.) exist in two states [
1. True-zero or inherently safe state. In recent years some authors have instead called it the “virtually safe state”, to avoid having to defend the notion that sites can be perfectly safe.
2. Non-zero state, in which accident counts follow the Poisson (ZIP) or NB (ZINB) distribution and may still happen to be zero in an observation period.
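The two-state assumption can be sketched as a mixture distribution. The Python snippet below is illustrative only: it assumes a zero-state probability `p_zero`, a count-state mean `mu`, and an NB parametrization with variance `mu + alpha * mu**2`. The parameter values are hypothetical. It shows how the zero state adds probability mass at zero on top of the zeros that the count state produces by chance.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of observing y events with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def nb_pmf(y, mu, alpha):
    """NB probability with mean mu and variance mu + alpha * mu**2 (assumed)."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zero_inflated_pmf(y, p_zero, count_pmf):
    """Mixture of a true-zero state (probability p_zero) and a count state."""
    base = (1.0 - p_zero) * count_pmf(y)
    return p_zero + base if y == 0 else base

# Hypothetical parameter values, for illustration only.
p_zero, mu, alpha = 0.3, 2.0, 0.8

zip_p0 = zero_inflated_pmf(0, p_zero, lambda k: poisson_pmf(k, mu))
zinb_p0 = zero_inflated_pmf(0, p_zero, lambda k: nb_pmf(k, mu, alpha))

# Both mixtures are proper distributions: the probabilities sum to one.
zip_total = sum(zero_inflated_pmf(k, p_zero, lambda j: poisson_pmf(j, mu))
                for k in range(60))
zinb_total = sum(zero_inflated_pmf(k, p_zero, lambda j: nb_pmf(j, mu, alpha))
                 for k in range(400))
```

In a regression setting, `p_zero` and `mu` would each depend on covariates through a link function rather than being fixed constants.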
First state happens with probability
Accident data are usually two-level data that the first level is often a specific segment of road and the other could be a specific period of year or hours of day. If number of crashes occurred on the i-th section in the
In ZIP model, the mean of Poisson distribution,
where
If the accident data suffer from overdispersion in addition to excess zeros, the ZINB regression model is applied, in the same way that the NB regression model replaces the Poisson model for overdispersed data. A count variable
ZIP and ZINB regression models are in fact a combination of a Poisson or NB model with a model for the zero data. Therefore, the log-likelihood function, which is set to estimate the regression coefficients β and
where
Usually for simplicity of estimation, the same variables of vector
For testing the relevance of using ZI models instead of Poisson and NB regression models, the Vuong statistic is used. If
where
where
Vuong statistic is asymptotically standard normally distributed, and if
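The Vuong comparison can be sketched as follows, assuming the commonly used form of the statistic: the square root of the sample size times the mean of the per-observation log-likelihood differences, divided by their standard deviation. The function names and the 1.96 threshold (the 5% two-sided normal critical value) are assumptions of this sketch; the two values passed to the decision rule are the Vuong statistics reported in this paper for the ZIP model.

```python
import math

def vuong_statistic(loglik_model_1, loglik_model_2):
    """Vuong statistic from per-observation log-likelihoods of two models."""
    m = [a - b for a, b in zip(loglik_model_1, loglik_model_2)]
    n = len(m)
    mean_m = sum(m) / n
    sd_m = math.sqrt(sum((v - mean_m) ** 2 for v in m) / n)
    return math.sqrt(n) * mean_m / sd_m

def vuong_decision(v, critical=1.96):
    """Compare the statistic with the standard normal critical value."""
    if v > critical:
        return "zero-inflated model preferred"
    if v < -critical:
        return "standard model preferred"
    return "inconclusive"

# Vuong statistics reported for ZIP against Poisson in this study.
decision_no_injury = vuong_decision(10.60)
decision_more_severe = vuong_decision(3.83)
```

Both reported values exceed 1.96, so under this rule the ZIP model would be preferred to the Poisson model for both accident groups.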
The accident data of the urban highways of Mashhad, Iran, were used for modeling. The data were collected by the Transportation and Traffic Organization of Mashhad with the help of GIS and police reports. Two groups of models are applied: one to the number of accidents with property loss only (no injury) and one to more severe accidents (injury or fatal), so accident statistics were collected in these two groups. Accident data are usually two-level data. The first and main level is the road segment, that is, the highway is divided into several parts; in this study, segmentation is based on total traffic volume. The second level is the time of day: traffic peak hours form the first sub-level, daytime non-peak hours the second and night non-peak hours the third. Traffic flow variables (volume and speed) and geometric variables (number of lanes, horizontal curves and access roads) are the independent variables of the models.
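The two-level structure can be made concrete with a small sketch. It assumes the counts reported for this data set (156 highway sections, three daily periods, years 2006 to 2009). The class and field names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

PERIODS = ("peak", "day_non_peak", "night_non_peak")  # second level
YEARS = range(2006, 2010)                             # 2006 to 2009
N_SECTIONS = 156                                      # first level

@dataclass(frozen=True)
class Observation:
    section: int   # highway section, defined by total traffic volume
    period: str    # daily period within the section
    year: int

observations = [
    Observation(section, period, year)
    for year, section, period in product(YEARS, range(1, N_SECTIONS + 1), PERIODS)
]

per_year = sum(1 for obs in observations if obs.year == 2006)
```

Each observation would then carry the accident counts and the flow and geometry variables for that section, period and year.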
To examine the role of different vehicle types in crash occurrence and severity, traffic is separated into passenger cars, heavy vehicles and light non-passenger car vehicles. Volume data broken down by vehicle type were not readily available from the Transportation and Traffic Organization of Mashhad. Once every few years, the Organization conducts a comprehensive survey in November and records the hourly volume of each vehicle type on a major part of the urban road network. Traffic conditions in Mashhad are normal in November. The models in this research do not consider seasonal changes in traffic, because volume statistics for different seasons were not available; the volume data relate only to different hours of the day. From these statistics one can determine the traffic composition, that is, the percentage of each vehicle type in the traffic, and this composition is assumed to hold in later years, whereas the total (equivalent) traffic volume is updated every year.
The Transportation and Traffic Organization of Mashhad publishes the total traffic volume in peak hours on urban roads, particularly highways, for future years. From this total equivalent volume, the passenger car equivalent factors of the vehicle types, the percentage of each type in the traffic composition and the ratio of non-peak to peak total volume, the volume of each vehicle type is obtained for the peak hour and for two hours representing daytime and night non-peak hours. The volume of passenger cars and the equivalent volumes of heavy vehicles and light non-passenger car vehicles are obtained for the same three hours. These calculations were carried out in Excel. The passenger car equivalent factors of the vehicle types are given in
The total volume (passenger car equivalent) is obtained by the following equation:
where Vte is the total equivalent volume and
As the numbers of vehicle types are counted hourly, these numbers are volumes in terms of vehicle per hour
Passenger car | Taxi | Pickup | Minibus | Intercity bus | Intracity bus | Bicycle and Motorcycle | Heavy vehicles |
---|---|---|---|---|---|---|---|
1 | 2 | 1 | 2 | 2.5 | 5 | 0.5 | 2.5 |
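The passenger car equivalent calculation can be sketched as a weighted sum of hourly counts, using the factors in the table above. The hourly counts below are hypothetical. The assignment of table categories to the three vehicle groups is an assumption based on the grouping described earlier in the paper (buses and minibuses with heavy vehicles; taxis, pickups and motorcycles with light non-passenger car vehicles).

```python
# Passenger car equivalent factors, as listed in the table above.
PCE = {
    "passenger_car": 1.0,
    "taxi": 2.0,
    "pickup": 1.0,
    "minibus": 2.0,
    "intercity_bus": 2.5,
    "intracity_bus": 5.0,
    "bicycle_motorcycle": 0.5,
    "heavy_vehicle": 2.5,
}

# Assumed mapping of vehicle types to the three groups used in the models.
VEHICLE_GROUPS = {
    "passenger_car": ("passenger_car",),
    "light_non_passenger": ("taxi", "pickup", "bicycle_motorcycle"),
    "heavy": ("minibus", "intercity_bus", "intracity_bus", "heavy_vehicle"),
}

def equivalent_volume(counts, vehicle_types=None):
    """Sum of hourly counts weighted by passenger car equivalent factors (pc/h)."""
    types = counts.keys() if vehicle_types is None else vehicle_types
    return sum(PCE[t] * counts.get(t, 0) for t in types)

# Hypothetical hourly counts (vehicles per hour) for one section.
hourly_counts = {
    "passenger_car": 1000, "taxi": 100, "pickup": 50, "minibus": 10,
    "intercity_bus": 4, "intracity_bus": 6, "bicycle_motorcycle": 80,
    "heavy_vehicle": 20,
}

total_equivalent = equivalent_volume(hourly_counts)
group_equivalent = {
    group: equivalent_volume(hourly_counts, types)
    for group, types in VEHICLE_GROUPS.items()
}
```

The three group volumes add up to the total equivalent volume, which is the property that lets the total be split into separate model variables.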
Statistics of the crashes and of the independent variables of the models from 2006 to 2009, for 156 sections of Mashhad urban highways with an overall length of 67.5 kilometers, in 1872 sub-sections (468 sub-sections per year, considering 3 daily periods), are presented in
This investigation explores the factors that affect the frequency and severity of crashes on urban highways by modeling no-injury and more severe (injury and fatal) accidents. The well-known Poisson, Negative binomial (NB), Zero-inflated Poisson (ZIP) and Zero-inflated Negative binomial (ZINB) models are used to develop two groups of models, for no-injury and for more severe accidents. The models are developed from the collected accident statistics and the data on the independent variables.
To estimate the regression coefficients by the maximum likelihood approach, Equation (6) is used. The unknown parameters of the model are estimated with the Newton-Raphson iteration method. We first take a vector of regression parameters as the initial estimate,
where
To evaluate the significance of the independent variables, the inverse of the negative Hessian obtained at the last iteration is taken as the asymptotic variance matrix. The variances of the estimates are its diagonal elements and the standard errors are their square roots. The t statistic for each parameter is the ratio of the parameter estimate to its standard error. If the p-value of a parameter is less than the required level of significance (0.05 or less), the corresponding variable is significant and stays in the model; otherwise it is dropped from the model [
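The estimation procedure described above can be sketched for a Poisson model with a constant and one covariate, assuming a log link. This is a minimal illustration, not the SAS routine used in the study. The data are hypothetical and the function name is invented for this example. The standard errors come from the inverse of the negative Hessian at the final iteration.

```python
import math

def fit_poisson_newton(x, y, tol=1e-10, max_iter=50):
    """Fit ln(mu_i) = b0 + b1 * x_i by Newton-Raphson (illustrative sketch)."""

    def information(b0, b1):
        # Negative Hessian of the Poisson log-likelihood at (b0, b1).
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        i00 = sum(mu)
        i01 = sum(mi * xi for mi, xi in zip(mu, x))
        i11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        return mu, i00, i01, i11

    # Initial estimate: constant-only fit, zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu, i00, i01, i11 = information(b0, b1)
        # Score vector: first derivatives of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        det = i00 * i11 - i01 * i01
        # Newton step: new estimate = old estimate + inverse information * score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (-i01 * g0 + i00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break

    # Asymptotic variances: diagonal of the inverse information matrix.
    mu, i00, i01, i11 = information(b0, b1)
    det = i00 * i11 - i01 * i01
    se0, se1 = math.sqrt(i11 / det), math.sqrt(i00 / det)
    return {"coef": (b0, b1), "se": (se0, se1),
            "t": (b0 / se0, b1 / se1), "fitted": mu}

# Hypothetical data: one covariate and five observed counts.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
result = fit_poisson_newton(x, y)
```

At convergence the score is zero, so the fitted means sum to the observed total whenever the model includes a constant.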
Variable | Mean | S.D. | Min | Max |
---|---|---|---|---|
Dependent variables | ||||
number of accidents with property loss only | 6.98 | 8.90 | 0 | 80 |
number of more severe accidents | 0.95 | 1.44 | 0 | 12 |
Traffic flow characteristics | ||||
Passenger cars volume (pc/h) | 1106 | 783 | 33 | 4603 |
Heavy vehicles volume (pc/h) | 570 | 442 | 29 | 2430 |
Light non-passenger car vehicles volume (pc/h) | 562 | 411 | 42 | 2245 |
Speed (km/h) | 55.3 | 18.7 | 0 | 100 |
Geometric characteristics | ||||
Number of lanes | 3.32 | 0.586 | 2 | 5 |
Number of horizontal curves | 0.564 | 0.718 | 0 | 4 |
Number of access roads | 0.974 | 1.419 | 0 | 8 |
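A quick check on the summary statistics above shows why overdispersion is a concern. The sketch below uses the reported means and standard deviations of the two dependent variables. The moment-based dispersion value assumes a variance of the form mean + alpha * mean², and it ignores the covariates, so it is only a rough indication and is not comparable to the regression estimates of the dispersion parameter.

```python
import math

# Mean and standard deviation of the dependent variables, from the table above.
summary = {
    "property_loss_only": (6.98, 8.90),
    "more_severe": (0.95, 1.44),
}

def dispersion_check(mean, sd):
    """Compare the sample variance with the mean, as a Poisson model would require."""
    variance = sd ** 2
    return {
        "variance": variance,
        "variance_to_mean": variance / mean,
        # Method-of-moments value, assuming variance = mean + alpha * mean**2.
        "moment_alpha": (variance - mean) / mean ** 2,
        # Share of zeros a single Poisson distribution with this mean implies.
        "poisson_zero_share": math.exp(-mean),
    }

checks = {name: dispersion_check(mean, sd) for name, (mean, sd) in summary.items()}
```

For both accident groups the variance is well above the mean, which is consistent with the need for the NB and ZINB models.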
These calculations were carried out in SAS; the estimated parameters and their significance for accidents with property loss only and for more severe accidents are presented in
To estimate the regression coefficients and the dispersion parameter, the Newton-Raphson iteration procedure is applied as in the Poisson model, but the process is more complicated. First of all, we put
where
where
the following iteration to estimating updated values of
where:
Variable | No injury accidents: parameter | No injury accidents: t-statistic | Injury or fatal accidents: parameter | Injury or fatal accidents: t-statistic |
---|---|---|---|---|
Poisson model | ||||
Constant | −0.1531 | −1.69 | −0.6315 | −2.16 |
Passenger car volume | 0.00028 | 15.32 | −0.00011 | −1.13 |
Heavy vehicle volume | 0.00018 | 0.82 | −0.00026 | −1.92 |
Light non-passenger car vehicle volume | 0.00054 | 13.10 | 0.00074 | 6.85 |
Traffic speed | 0.00687 | 8.05 | 0.00351 | 1.51 |
Number of lanes | 0.1901 | 8.74 | −0.1521 | −2.81 |
Number of horizontal curves | 0.3221 | 24.14 | 0.5268 | 14.31 |
Number of access roads | 0.1354 | 18.08 | 0.0781 | 3.43 |
Negative binomial model | ||||
Constant | −0.8012 | −3.11 | −0.9245 | −2.80 |
Passenger car volume | 0.00036 | 5.42 | −0.00011 | −1.15 |
Heavy vehicles volume | −0.00001 | −0.08 | −0.00019 | −1.27 |
Light non-passenger car vehicles volume | 0.00066 | 4.93 | 0.0010 | 6.30 |
Traffic speed | 0.01053 | 4.51 | 0.0043 | 1.43 |
Number of lanes | 0.2359 | 3.50 | −0.0971 | −1.20 |
Number of horizontal curves | 0.3715 | 6.90 | 0.5031 | 9.52 |
Number of access roads | 0.1743 | 6.75 | 0.09701 | 3.32 |
Dispersion parameter | 1.0921 | 17.67 | 0.7718 | 7.65 |
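The significance rule can be applied directly to the t-statistics in the table above. The sketch below assumes a two-sided test based on the standard normal distribution, which is an assumption of this example (with 1872 observations the normal and t distributions are practically identical). The values are the no-injury Poisson t-statistics from the table.

```python
import math

def two_sided_p_value(t):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# t-statistics of the Poisson model for no-injury accidents, from the table above.
poisson_no_injury_t = {
    "Constant": -1.69,
    "Passenger car volume": 15.32,
    "Heavy vehicle volume": 0.82,
    "Light non-passenger car vehicle volume": 13.10,
    "Traffic speed": 8.05,
    "Number of lanes": 8.74,
    "Number of horizontal curves": 24.14,
    "Number of access roads": 18.08,
}

significant = [
    name for name, t in poisson_no_injury_t.items()
    if two_sided_p_value(t) < 0.05
]
```

Under this rule, heavy vehicle volume is the only explanatory variable in this model that is not significant at the 5% level.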
The estimated values of
In ZI regression models, the link relations of distribution mean,
Variable | No injury accidents: parameter | No injury accidents: t-statistic | Injury or fatal accidents: parameter | Injury or fatal accidents: t-statistic |
---|---|---|---|---|
Zero-inflated Poisson (ZIP) model. Poisson part | ||||
Constant | 0.8001 | 8.32 | −0.559 | −1.62 |
Passenger car volume | 0.00022 | 10.80 | −0.00012 | −1.53 |
Heavy vehicle volume | 0.00014 | 3.57 | −0.00015 | −0.86 |
Light non-passenger car vehicle volume | 0.00041 | 10.84 | 0.00076 | 5.48 |
Traffic speed | 0.00392 | 5.04 | 0.0021 | 0.56 |
Number of lanes | 0.1012 | 4.34 | −0.0008 | −0.02 |
Number of horizontal curves | 0.2317 | 15.30 | 0.5012 | 11.88 |
Number of access roads | 0.0939 | 12.73 | 0.1075 | 3.59 |
Zero-inflated Negative binomial (ZINB) model. NB part | ||||
Constant | 0.2718 | 0.78 | −1.2654 | −4.15 |
Passenger car volume | 0.00028 | 4.12 | −0.00012 | −1.42 |
Heavy vehicle volume | 0.00009 | 0.67 | −0.00017 | −1.32 |
Light non-passenger car vehicle volume | 0.00064 | 5.09 | 0.0011 | 7.17 |
Traffic speed | 0.0084 | 3.84 | 0.0130 | 4.42 |
Number of lanes | 0.0569 | 0.90 | −0.0991 | −1.31 |
Number of horizontal curves | 0.3031 | 6.51 | 0.4467 | 8.69 |
Number of access roads | 0.1170 | 5.33 | 0.0656 | 2.41 |
Dispersion parameter | 0.7650 | 13.12 | 0.6535 | 7.14 |
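To read the coefficients in the table above, it can help to convert them into multiplicative effects. The sketch below assumes that the count part of each model uses a log link, so that a one-unit increase in a variable multiplies the expected count in the non-zero state by exp(coefficient). It uses three ZINB coefficients for no-injury accidents. It says nothing about the zero-state part of the model, whose estimates are not listed here.

```python
import math

# ZINB (NB part) coefficients for no-injury accidents, from the table above.
zinb_no_injury = {
    "Number of horizontal curves": 0.3031,
    "Number of access roads": 0.1170,
    "Traffic speed": 0.0084,
}

def rate_ratio(coefficient, units=1.0):
    """Multiplicative change in the expected count for a given increase."""
    return math.exp(coefficient * units)

ratios = {name: rate_ratio(beta) for name, beta in zinb_no_injury.items()}
percent_change = {name: 100.0 * (r - 1.0) for name, r in ratios.items()}

# Volume coefficients are per pc/h, so a larger increment is easier to read.
light_vehicle_per_100 = rate_ratio(0.00064, units=100.0)
```

For example, under this assumption one additional horizontal curve corresponds to roughly a 35% higher expected number of no-injury accidents in the count state, other variables held fixed.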
After modeling, the question is not only which variables are significant, that is, which variables have a considerable effect on the likelihood of property damage and more severe accidents, but also how well the models fit and how they compare. First, the Poisson and NB models are compared in terms of data dispersion, using the significance of the dispersion parameter in the NB model and the LR test. The Poisson and ZIP models are compared with the Vuong test, and the ZIP and ZINB models are compared through the significance of the dispersion parameter and the LR test; note that the result of the comparison between the Poisson and NB models cannot be extended to the ZI models. The second step is to evaluate and compare the goodness of fit of the models using the Akaike (AIC) or Bayesian (BIC) information criteria. The results of this second step often confirm those of the first. The results of the goodness-of-fit evaluation and comparison are presented in
After developing models for crashes with property damage only and for more severe crashes (injury and fatal), we analyze and discuss the results to see which variables play a considerable role in crash occurrence on urban highways and which do not. We then evaluate and compare the models to see which fit better for modeling the likelihood of no-injury and more severe accidents.
The results of the four regression models (Poisson, NB, ZIP and ZINB), including the regression parameters and the significance of the independent variables for both no-injury and more severe accidents, are presented in
The increase in likelihood of accidents with number of access roads, as is clear from results and reflected in
As the results show, the probability of accident occurrence increases with the number of horizontal curves, for both property damage accidents and more severe ones. This result is consistent with some studies [
Criterion | Poisson | Negative binomial | Zero-inflated Poisson | Zero-inflated Negative binomial |
---|---|---|---|---|
No injury accidents | ||||
Dispersion parameter alpha (t-statistic) | - | 1.0921 (17.67) | - | 0.7650 (13.12) |
−2 LL | 9067.2 | 5368.8 | 7710.2 | 5271.7 |
Vuong statistic | - | - | 10.60 | - |
AIC | 9083.2 | 5386.8 | 7742.2 | 5305.7 |
BIC | 9093.4 | 5398.3 | 7762.6 | 5327.3 |
Injury or fatal accidents | ||||
Dispersion parameter alpha (t-statistic) | - | 0.7718 (7.65) | - | 0.6535 (7.14) |
−2 LL | 2485.1 | 2346.5 | 2360.3 | 2320.1 |
Vuong statistic | - | - | 3.83 | - |
AIC | 2501.1 | 2364.5 | 2392.3 | 2354.1 |
BIC | 2511.3 | 2376.0 | 2412.7 | 2375.7 |
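The criteria in the table above can be recomputed from the reported −2 LL values, as a sketch. The parameter counts k are not stated in the table; the values used here (8, 9, 16 and 17) are inferred from the difference between each AIC and its −2 LL and should be read as an assumption. With n = 1872 observations, the reported BIC values appear to match a penalty of k·log10(n) rather than the natural logarithm of the usual definition, so the logarithm is left as a parameter. The chi-squared critical value 3.841 is the standard 5% value for one degree of freedom.

```python
import math

N_OBS = 1872  # number of observations (sub-sections over four years)

# (-2 LL, inferred number of parameters k) for the no-injury models.
no_injury = {
    "Poisson": (9067.2, 8),
    "NB": (5368.8, 9),
    "ZIP": (7710.2, 16),
    "ZINB": (5271.7, 17),
}
reported_bic = {"Poisson": 9093.4, "NB": 5398.3, "ZIP": 7762.6, "ZINB": 5327.3}

def aic(neg2ll, k):
    """Akaike information criterion: -2 LL + 2 k."""
    return neg2ll + 2 * k

def bic(neg2ll, k, n, log=math.log):
    """Bayesian information criterion: -2 LL + k * log(n)."""
    return neg2ll + k * log(n)

def lr_statistic(neg2ll_restricted, neg2ll_unrestricted):
    """Likelihood ratio statistic from two -2 LL values."""
    return neg2ll_restricted - neg2ll_unrestricted

aic_values = {m: aic(v, k) for m, (v, k) in no_injury.items()}
bic_log10 = {m: bic(v, k, N_OBS, log=math.log10) for m, (v, k) in no_injury.items()}
bic_natural = {m: bic(v, k, N_OBS) for m, (v, k) in no_injury.items()}

lr_poisson_vs_nb = lr_statistic(no_injury["Poisson"][0], no_injury["NB"][0])
lr_zip_vs_zinb = lr_statistic(no_injury["ZIP"][0], no_injury["ZINB"][0])
CHI2_CRITICAL_1DF = 3.841

best_by_aic = min(aic_values, key=aic_values.get)
best_by_bic = min(bic_natural, key=bic_natural.get)
```

Both LR statistics are far above the critical value, and ZINB has the smallest AIC and BIC under either logarithm, so the ranking of the models does not depend on which BIC penalty is used.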
Variable | Poisson | Negative binomial | Zero-inflated Poisson | Zero-inflated Negative binomial |
---|---|---|---|---|
No injury accidents | ||||
Passenger car volume | 0.00028* | 0.00036 | 0.00022 | 0.00028 |
Heavy vehicle volume | 0.00018 | −0.00001 | 0.00014 | 0.00009 |
Light non-passenger car vehicle volume | 0.00054 | 0.00066 | 0.00043 | 0.00064 |
Traffic speed | 0.00687 | 0.01053 | 0.00392 | 0.0084 |
Number of lanes | 0.1901 | 0.2359 | 0.1012 | 0.0567 |
Number of horizontal curves | 0.3221 | 0.3715 | 0.2317 | 0.3031 |
Number of access roads | 0.1354 | 0.1743 | 0.0939 | 0.1170 |
Injury or fatal accidents | ||||
Passenger car volume | −0.00011 | −0.00011 | −0.00012 | −0.00012 |
Heavy vehicle volume | −0.00026 | −0.00019 | −0.00015 | −0.00017 |
Light non-passenger car vehicle volume | 0.00074 | 0.0010 | 0.00076 | 0.0011 |
Traffic speed | 0.00351 | 0.0043 | 0.0021 | 0.0130 |
Number of lanes | −0.1521 | −0.0971 | −0.0008 | −0.0991 |
Number of horizontal curves | 0.5268 | 0.5031 | 0.5012 | 0.4467 |
Number of access roads | 0.0781 | 0.09701 | 0.1075 | 0.0656 |
*Highlighted Coefficients are significant at 5%.
likelihood of injury accidents for all highway segments. Their explanation was that, as curve density increases, individuals may adjust by driving more slowly to provide more time to process information and to increase their ability to negotiate the curves safely [
The results of modeling crashes presented in
The results show that traffic speed affects the occurrence of accidents with property damage: the likelihood of such accidents increases with speed. Speed does not, however, have much impact on the occurrence of injury or fatal accidents. Some researchers used the posted speed limit, or whether the speed limit was exceeded, as an independent variable and found that exceeding the speed limit increases the likelihood of injury accidents [
The results show that the volumes of passenger cars and of light non-passenger car vehicles increase the likelihood of no-injury (property damage only) accidents, whereas the volume of heavy vehicles has little effect on them. The volume of light (non-passenger car) vehicles also increases the likelihood of injury or fatal accidents, but the volumes of passenger cars and heavy vehicles have little impact on the likelihood of such crashes. Other researchers have reported different results. Chang (2005) concluded that conflicts between vehicles and the probability of accident occurrence increase with the number of vehicles and trucks. The first part of this conclusion is consistent with our findings, but the second is not [
After analyzing the modeling results, we turn to goodness-of-fit evaluation and comparison between the models. For accidents with property loss, as is clear from
For more severe accidents, as is clear from
In this research, we investigated the factors that affect the frequency and severity of accidents on urban highways. The statistical methodology employs four regression models that are well known in modeling highway accidents: Poisson, Negative binomial, Zero-inflated Poisson and Zero-inflated Negative binomial. Accident data from the urban highways of Mashhad, Iran, were used as a case study, with traffic flow and road geometry variables as the independent variables. To examine the role of traffic in accident occurrence and severity, the traffic volume was divided into the volumes of passenger cars, heavy vehicles and light non-passenger car vehicles.
We developed two groups of models, one for accidents with property damage only (no injury) and one for more severe accidents (injury or fatal). The likelihoods of both no-injury and more severe accidents increase with the number of horizontal curves and access roads. As speed and the number of lanes increase, the likelihood of no-injury accidents increases, but these variables have little effect on the likelihood of more severe accidents. The volumes of passenger cars and light (non-passenger car) vehicles increase the likelihood of no-injury accidents, whereas the volume of heavy vehicles has little effect on it. The volume of light vehicles also increases the likelihood of injury or fatal accidents, but the volumes of passenger cars and heavy vehicles have little effect on such accidents. Finally, the goodness-of-fit evaluation and comparison of the models show that the zero-inflated negative binomial regression model fits best, both for no-injury and for more severe accidents.