RANDOM FORESTS IN THE EVALUATION OF THREAT FOR PEDESTRIAN ACCIDENTS IN TOWNS
Marzena Nowakowska
Faculty of Management and Computer Modelling, Kielce University of Technology
25-345 Kielce, Al. 1000-lecia Państwa Polskiego 7, POLAND
phone:
The methodology; decision tree
[Figure: a decision tree. The tree root tests X1, the inner nodes test X2 and X3, and the remaining vertices are leaves. The path X1(z)=a, X2(z)=b, X3(z)=c ends in the leaf d1.]
- The C4.5 algorithm for data split:
- qualitative attributes
- the information gain criterion
- the identity test
- Probability leaves for the decision:
  a single-class leaf: X1(z)=a ∧ X2(z)=b ∧ X3(z)=c ⇒ Y = d1
  a probability leaf: X1(z)=a ∧ X2(z)=b ∧ X3(z)=c ⇒ {P(Y = d1), ..., P(Y = dk)}
- The variable importance ranking:
  $$\mathrm{Importance}(X_i) = \frac{AI(X_i)}{\max_j \{AI(X_j)\}}, \qquad AI(X_i) = SSE(Y \mid t) - \sum_{b \in B(X_i \mid t)} SSE(Y \mid t_b)$$
  where SSE(Y|t) and SSE(Y|t_b) are the sums of squared errors calculated before and after splitting the data set according to the variable X_i, respectively. Importance ∈ <0; 1>
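The scaling step of the importance ranking can be illustrated with a minimal sketch: each absolute importance AI(X_i) is divided by the largest one, so the most important variable gets the value 1. The function name and the AI values below are hypothetical and serve only as an illustration; they are not results from the study.

```python
def relative_importance(absolute_importance):
    """Scale each absolute importance AI(X_i) by the largest AI(X_j)."""
    largest = max(absolute_importance.values())
    if largest <= 0:
        raise ValueError("at least one variable needs a positive absolute importance")
    return {name: ai / largest for name, ai in absolute_importance.items()}


# Hypothetical AI values, for illustration only.
ai = {"X1": 12.0, "X2": 3.0, "X3": 6.0}
importance = relative_importance(ai)
print(importance)
```

With non-negative AI values, every scaled importance stays in the interval <0; 1>, which matches the range stated above.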
The methodology; bootstrapped tree ensemble
[Figure: the original data set is resampled into train data sets; one decision tree is built on each train data set, and the trees constitute the random forest.]
1. Bootstrapping the sample
2. Random selection of observations from each decision stratum
3. The investigation of all attributes in an exhaustive search
4. Bagging: averaging posterior probabilities for the decision (accident severity)
5. The average posterior probability of fatal or serious accident status is the accident threat
6. Tree quality measures: sensitivity SNS, specificity SPC, proportion correctly classified PCC, harmonic mean of sensitivity and specificity HMSS
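Steps 1, 2, 4 and 5 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the helper names, the toy records and the per-tree probabilities are hypothetical, and the quotas only mimic the 50% MA / 35% SA / 15% FA composition of the train sets.

```python
import random


def stratified_bootstrap(records, label_of, quotas, rng):
    """Draw quotas[label] records, with replacement, from each decision stratum."""
    strata = {}
    for record in records:
        strata.setdefault(label_of(record), []).append(record)
    sample = []
    for label, count in quotas.items():
        sample.extend(rng.choices(strata[label], k=count))
    return sample


def accident_threat(tree_posteriors):
    """Bagging: average the per-tree posterior probabilities of the FSA status."""
    return sum(tree_posteriors) / len(tree_posteriors)


rng = random.Random(0)
# Hypothetical toy records: (identifier, accident severity).
records = (
    [(i, "MA") for i in range(60)]
    + [(i, "SA") for i in range(60, 90)]
    + [(i, "FA") for i in range(90, 100)]
)
quotas = {"MA": 10, "SA": 7, "FA": 3}  # 50% / 35% / 15% of a 20-record train set
train = stratified_bootstrap(records, lambda r: r[1], quotas, rng)

# Hypothetical posterior probabilities of FSA returned by three trees.
threat = accident_threat([0.25, 0.5, 0.75])
print(len(train), threat)
```

Sampling inside each stratum fixes the class proportions of every train set, so the rare fatal accidents are not lost in some bootstrap draws.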
Investigated variables and their domains
- Pedestrian gender Pd_Gn: M, F
- Pedestrian intoxicated by alcohol/other substances Pd_BAC: Y, N
- Pedestrian behaviour Pd_Bhv: ImEnFrVh, ImEnBhVh, InCrRd*, PrRd*, OtPdBh*
- Pedestrian age group Pd_Ag: seven groups, 1 to 7, with boundaries at 7, 14, 20, 35, 50 and 65 years
- Driver gender Dr_Gn: M, F
- Driver intoxicated by alcohol/other substances Dr_BAC: Y, N
- Driver behaviour Dr_Bhv: ExSp, NtGvWay, InMnGrSp*, InMnSmSp*, InBhTwPd*, OtDrBv*
- Driving experience group Drvg: seven groups, 1 to 7, with boundaries at 4, 8, 12, 16, 21 and 26 years
- Vehicle type Vhl: BMM (bicycle, moped, motorcycle), Car (passenger car), HVh (bus, truck), OVh (other vehicle types)
- City size City: Small, Medium, Big, Very big, with boundaries at 20000, 50000 and 100000
- Accident severity Ac_Sv:
  pedestrian data set: 61.1% MA, 31.8% SA, 7.1% FA
  driver data set: 59.1% MA, 33.7% SA, 7.3% FA
  The final domain of Ac_Sv: MA, FSA = FA+SA
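The merging of the FA and SA categories into the single FSA category can be checked with a short sketch. The function name is hypothetical; the percentages are the severity shares quoted for the two data sets.

```python
def merge_severity(shares):
    """Collapse FA and SA into one FSA class; MA is left unchanged."""
    return {"MA": shares["MA"], "FSA": shares["FA"] + shares["SA"]}


pedestrian = merge_severity({"MA": 61.1, "SA": 31.8, "FA": 7.1})
driver = merge_severity({"MA": 59.1, "SA": 33.7, "FA": 7.3})

print({name: round(share, 1) for name, share in pedestrian.items()})
print({name: round(share, 1) for name, share in driver.items()})
```

After the merge, FSA covers 38.9% of the pedestrian data set and 41.0% of the driver data set, so the two decision classes are far less disproportionate than the original three.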
The statistics for the random forests
The guilty pedestrian forest
- The train bootstrapped sets:
  - number of observations: 774
  - fractions of the decision: 50% of MA, 50% of FSA (15%+35% of FA+SA)
- The number of observations in the test sets: from 1002 to 1029
- The measures of classification quality:

                 Train data sets               Test data sets
Specification    SNS    SPC    PCC    HMSS     SNS    SPC    PCC    HMSS
Min [%]          59.6   48.9   63.2   60.0     49.2   49.2   54.6   55.3
Max [%]          77.5   71.0   66.5   66.3     70.0   68.0   61.9   62.5
AMean [%]        65.9   64.5   65.2   64.7     57.9   58.6   58.2   57.8
HMean [%]        65.6   63.8   65.2   64.7     57.4   58.0   58.1   57.7

The guilty driver forest
- The train bootstrapped sets:
  - number of observations: 1220
  - fractions of the decision: 50% of MA, 50% of FSA (15%+35% of FA+SA)
- The number of observations in the test sets: from 1529 to 1573
- The measures of classification quality:

                 Train data sets               Test data sets
Specification    SNS    SPC    PCC    HMSS     SNS    SPC    PCC    HMSS
Min [%]          53.2   56.7   60.0   59.3     48.8   49.8   53.5   54.0
Max [%]          64.6   68.1   63.2   62.8     63.5   69.9   61.6   60.0
AMean [%]        59.9   62.6   61.2   61.0     54.5   59.3   57.3   56.3
HMean [%]        59.7   62.3   61.2   61.0     54.0   58.8   57.2   56.3

AMean is the arithmetic mean: AMean = Σz_i / n
HMean is the harmonic mean: HMean = n / Σ(1/z_i)
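The four quality measures and the two means used to summarise them over the trees can be sketched as follows. This is an illustration only: the confusion-matrix counts are hypothetical, and treating FSA as the positive class is an assumption, not something stated in the slides. Empty classes (division by zero) are not handled.

```python
def quality_measures(tp, fn, tn, fp):
    """Return SNS, SPC, PCC and HMSS (as fractions) from confusion-matrix counts."""
    sns = tp / (tp + fn)                    # sensitivity
    spc = tn / (tn + fp)                    # specificity
    pcc = (tp + tn) / (tp + fn + tn + fp)   # proportion correctly classified
    hmss = 2 * sns * spc / (sns + spc)      # harmonic mean of SNS and SPC
    return {"SNS": sns, "SPC": spc, "PCC": pcc, "HMSS": hmss}


def amean(values):
    """Arithmetic mean: sum(z_i) / n."""
    return sum(values) / len(values)


def hmean(values):
    """Harmonic mean: n / sum(1 / z_i); all values must be non-zero."""
    return len(values) / sum(1 / z for z in values)


# Hypothetical counts for one tree; FSA is assumed to be the positive class.
m = quality_measures(tp=60, fn=40, tn=80, fp=20)
print(m)
print(amean([0.5, 1.0]), hmean([0.5, 1.0]))
```

For positive values the harmonic mean never exceeds the arithmetic mean, and the gap between them grows with the spread of the values. This is why HMean is at most AMean in every column of the tables.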
The diagnostic of input variables
[Figure: importance and % of occurrence of the input variables. The guilty driver forest: Vhl, City, Dr_Gn, Dr_BAC, Drvg, Dr_Bhv. The guilty pedestrian forest: Vhl, City, Pd_Gn, Pd_BAC, Pd_Ag, Pd_Bhv.]
Bagging for pedestrian caused accident threat
[Figure: the average posterior probability of fatal or serious accidents by pedestrian age group (<0; 7), <7; 14), <14; 20), <20; 35), <35; 50), <50; 65), >=65) and pedestrian behaviour (ImEnFrVh, InCrRd, ImEnBhVh, PrRd). Panels: Very big city, Big city, Medium city, Small city.]
Bagging for pedestrian caused accident threat
[Figure: the differences in the pedestrian caused accident threat level for passenger cars (Car) and heavy vehicles, by pedestrian age group and pedestrian behaviour (ImEnFrVh, InCrRd, ImEnBhVh, PrRd). Panels: Very big city, Big city, Medium city, Small city.]
Bagging for driver caused accident threat
[Figure: the average posterior probability of fatal or serious accidents by driving experience (<0; 4), <4; 8), <8; 12), <12; 16), <16; 21), <21; 26), >=26) and driver behaviour (InBhTwPd, ExSp, NtGvWay, InMnGrSp, InMnSmSp). Panels: Very big city, Big city, Medium city, Small city.]
Bagging for driver caused accident threat
[Figure: the city size role in generating the threat, by driving experience group and driver behaviour (InBhTwPd, ExSp, NtGvWay, InMnGrSp, InMnSmSp). Series: Very big, Big, Medium, Small.]
Conclusions
- The most important factors influencing the threat in urban streets are connected with the age of the road user at fault
- The lowest threat level is for young pedestrians; it then increases up to the pedestrian age of 50, after which it varies
- The probability of fatal or serious accident status does not necessarily decrease as the driving experience increases
- In cities other than a very big one, the highest threat is caused by drivers with 4-8 years of driving experience; it then decreases up to 21 years and increases afterwards
- The city size is the second factor that plays an important role in the accident severity classification
- A very big city and cities of other sizes have different threat profiles
- The influence of the city size is different for guilty pedestrians and guilty drivers
- The behaviour of the road user at fault is the third most important factor in the accident severity classification
- The pedestrian threat differences are noticeable for the behaviours of youngsters and the older group
- The driver threat differences vary along with the driving experience group, the city size, and the driver behaviour
Summing up
- Random forests are a promising methodology for the accident severity classification of vehicle-pedestrian crashes in towns
- Stratified sampling should be applied when creating the bootstrapped train sets
- The modification of the target variable structure is recommended to overcome the negative influence of large disproportions in the accident severity distribution
- The outcome is specific to urban roads of a certain region
- Better results of the accident severity prediction are obtained for guilty pedestrians than for guilty drivers
- More factors influencing the threat classification were