SLIDE 1

RANDOM FORESTS IN THE EVALUATION OF THREAT FOR PEDESTRIAN ACCIDENTS IN TOWNS

Marzena Nowakowska
Faculty of Management and Computer Modelling, Kielce University of Technology
Al. 1000-lecia Państwa Polskiego 7, 25-345 Kielce, POLAND
phone: +48 41 34 24 437, e-mail address: spimn@tu.kielce.pl

Research issue

  • The random forest methodology in the investigation of road traffic safety: a decision tree bagging approach
  • The examination of the influence of selected factors on the threat on roads in towns; the threat is expressed by accident severity, i.e. the level of harm to a human casualty
  • The subject: separate and mutually independent data sets concerning the road users found guilty:
  • pedestrians involved in road accidents: 1625 observations
  • drivers in pedestrian accidents: 2519 observations
  • The data source: urban accidents in the Świętokrzyskie voivodship in the period 1999-2009

SLIDE 2

The methodology; decision tree

[Figure: a decision tree with the root testing X1, internal nodes testing X2 and X3, and leaves; the path X1(z)=a, X2(z)=b, X3(z)=c ends in the leaf d1]

  • The C4.5 algorithm for data split:
  • qualitative attributes
  • the information gain criterion
  • the identity test
  • Probability leaves for the decision
  • The variable importance ranking:

X1(z)=a ∧ X2(z)=b ∧ X3(z)=c ⇒ Y = d1

X1(z)=a ∧ X2(z)=b ∧ X3(z)=c ⇒ {P(Y = d1), ..., P(Y = dk)}

\[ \mathrm{Importance}(X_i) = \frac{AI(X_i)}{\max_j \{ AI(X_j) \}} \]

\[ AI(X_i) = SSE(Y \mid t) - \sum_{b=1}^{B(X_i \mid t)} SSE(Y \mid t_b) \]

where SSE(Y|t) and SSE(Y|t_b) are the sums of squared errors calculated before and after splitting the data set according to the variable X_i, respectively. Importance ∈ [0, 1]
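The split criterion and the importance ranking described above can be illustrated with a small sketch. It assumes that the identity test creates one branch per value of a qualitative attribute, and that the absolute importances AI are already available; the toy data and the AI values are hypothetical and do not come from the study.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction after splitting on every value of a qualitative
    attribute (assumed here: one branch per attribute value)."""
    branches = defaultdict(list)
    for v, y in zip(values, labels):
        branches[v].append(y)
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder


def normalised_importance(ai):
    """Importance(X_i) = AI(X_i) / max_j AI(X_j); the top variable gets 1."""
    top = max(ai.values())
    return {name: value / top for name, value in ai.items()}


# Hypothetical toy data: severity (MA / FSA) against pedestrian intoxication.
pd_bac = ["Y", "Y", "N", "N", "N", "N"]
ac_sv = ["FSA", "FSA", "MA", "MA", "MA", "FSA"]
gain = information_gain(pd_bac, ac_sv)

# Hypothetical absolute importances, not values from the study.
importance = normalised_importance({"Pd_Ag": 4.0, "City": 3.0, "Pd_Bhv": 2.0})
```

Dividing by the maximum keeps every importance in the interval from 0 to 1, which makes rankings from different trees comparable.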

The methodology; bootstrapped tree ensemble

[Figure: the original data set is resampled into train data sets; a decision tree is built on each train data set, and the trees constitute the random forest]

1. Bootstrapping the sample
2. Random selection of observations from each decision stratum
3. The investigation of all attributes in an exhaustive search
4. Bagging: averaging posterior probabilities for the decision (accident severity)
5. The average posterior probability of fatal or serious accident status is the accident threat
6. Tree quality measures: sensitivity SNS, specificity SPC, proportion correctly classified PCC, harmonic mean of sensitivity and specificity HMSS
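Steps 1, 2, 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the helper names are hypothetical, the strata are drawn with replacement, the observations never drawn are assumed to form the test set, and the per-tree probabilities are made-up numbers.

```python
import random


def stratified_bootstrap(rows, label_of, per_class, rng):
    """Draw per_class[label] rows with replacement from each decision stratum."""
    strata = {}
    for row in rows:
        strata.setdefault(label_of(row), []).append(row)
    sample = []
    for label, size in per_class.items():
        sample.extend(rng.choices(strata[label], k=size))
    return sample


def bagged_threat(tree_probabilities):
    """Average the per-tree posterior probabilities of the FSA class."""
    return sum(tree_probabilities) / len(tree_probabilities)


rng = random.Random(0)
# Hypothetical data: (observation id, severity) pairs, unbalanced decision.
rows = [(i, "MA") for i in range(60)] + [(i, "FSA") for i in range(60, 100)]
train = stratified_bootstrap(rows, lambda r: r[1], {"MA": 20, "FSA": 20}, rng)

# Assumption: observations never drawn into the train set form the test set.
drawn = {r[0] for r in train}
test = [r for r in rows if r[0] not in drawn]

# Hypothetical posteriors P(Y = FSA) returned by three trees for one case.
threat = bagged_threat([0.62, 0.55, 0.48])
```

Because the strata are sampled with replacement, the number of distinct observations in a train set varies from tree to tree, and so does the size of the remaining test set.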

SLIDE 3

Investigated variables and their domains

  • Pedestrian gender Pd_Gn: M, F
  • Pedestrian intoxicated by alcohol/other substances Pd_BAC: Y, N
  • Pedestrian behaviour Pd_Bhv: ImEnFrVh, ImEnBhVh, InCrRd*, PrRd*, OtPdBh*
  • Pedestrian age group Pd_Ag: groups 1-7 with boundaries at 7, 14, 20, 35, 50 and 65 years
  • Driver gender Dr_Gn: M, F
  • Driver intoxicated by alcohol/other substances Dr_BAC: Y, N
  • Driver behaviour Dr_Bhv: ExSp, NtGvWay, InMnGrSp*, InMnSmSp*, InBhTwPd*, OtDrBv*
  • Driving experience group Drvg: groups 1-7 with boundaries at 4, 8, 12, 16, 21 and 26 years
  • Vehicle type Vhl: BMM (bicycle, moped, motorcycle), Car (passenger car), HVh (bus, truck), OVh (other vehicle types)
  • City size City: Small, Medium, Big, Very big, with boundaries at 20000, 50000 and 100000

Accident severity Ac_Sv:

  • pedestrian data set: 61.1% MA, 31.8% SA, 7.1% FA
  • driver data set: 59.1% MA, 33.7% SA, 7.3% FA

The final domain of Ac_Sv: MA, FSA = FA+SA
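The grouping of the quantitative variables and the merging of the severity classes can be sketched as below. The boundaries are taken from the slide; the function names are hypothetical, and it is an assumption that every interval is closed on the left (so that 7 falls into the second age group) and that city size is measured by the number of inhabitants.

```python
from bisect import bisect_right

AGE_BOUNDS = [7, 14, 20, 35, 50, 65]        # Pd_Ag, groups 1-7 (years)
EXPERIENCE_BOUNDS = [4, 8, 12, 16, 21, 26]  # Drvg, groups 1-7 (years)
CITY_BOUNDS = [20000, 50000, 100000]        # assumed unit: inhabitants
CITY_LABELS = ["Small", "Medium", "Big", "Very big"]


def group_number(value, bounds):
    """1-based number of the left-closed interval that holds value."""
    return bisect_right(bounds, value) + 1


def city_size(population):
    """City size label for a given population (left-closed intervals assumed)."""
    return CITY_LABELS[bisect_right(CITY_BOUNDS, population)]


def final_severity(ac_sv):
    """Merge fatal (FA) and serious (SA) accidents into FSA; keep MA."""
    return "FSA" if ac_sv in ("FA", "SA") else "MA"


# Share of the merged FSA class in each data set, in per cent.
pedestrian_fsa = round(31.8 + 7.1, 1)
driver_fsa = round(33.7 + 7.3, 1)
```

After the merge the FSA class covers 38.9% of the pedestrian data set and 41.0% of the driver data set, which is far less unbalanced than the original 7% share of fatal accidents.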

The statistics for the random forests

AMean: the arithmetic mean, AMean = Σ z_i / n
HMean: the harmonic mean, HMean = n / Σ (1/z_i)

The guilty pedestrian forest

  • The train bootstrapped sets:
  • number of observations: 774
  • fractions of the decision: 50% of MA, 50% of FSA (15%+35% of FA+SA)
  • The number of observations in the test sets: from 1002 to 1029
  • The measures of classification quality:

Specification   Train data sets              Test data sets
                SNS    SPC    PCC    HMSS    SNS    SPC    PCC    HMSS
Min [%]         59.6   48.9   63.2   60.0    49.2   49.2   54.6   55.3
Max [%]         77.5   71.0   66.5   66.3    70.0   68.0   61.9   62.5
AMean [%]       65.9   64.5   65.2   64.7    57.9   58.6   58.2   57.8
HMean [%]       65.6   63.8   65.2   64.7    57.4   58.0   58.1   57.7

The guilty driver forest

  • The train bootstrapped sets:
  • number of observations: 1220
  • fractions of the decision: 50% of MA, 50% of FSA (15%+35% of FA+SA)
  • The number of observations in the test sets: from 1529 to 1573
  • The measures of classification quality:

Specification   Train data sets              Test data sets
                SNS    SPC    PCC    HMSS    SNS    SPC    PCC    HMSS
Min [%]         53.2   56.7   60.0   59.3    48.8   49.8   53.5   54.0
Max [%]         64.6   68.1   63.2   62.8    63.5   69.9   61.6   60.0
AMean [%]       59.9   62.6   61.2   61.0    54.5   59.3   57.3   56.3
HMean [%]       59.7   62.3   61.2   61.0    54.0   58.8   57.2   56.3
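The quality measures and the two means can be computed as in the sketch below. It assumes that FSA is treated as the positive class and MA as the negative class; the confusion counts are hypothetical and serve only to show the arithmetic.

```python
def quality_measures(tp, fn, tn, fp):
    """Return SNS, SPC, PCC and HMSS (as fractions) from a 2x2 confusion table.

    Assumption: FSA is the positive class and MA the negative class.
    """
    sns = tp / (tp + fn)                    # sensitivity
    spc = tn / (tn + fp)                    # specificity
    pcc = (tp + tn) / (tp + fn + tn + fp)   # proportion correctly classified
    hmss = 2 * sns * spc / (sns + spc)      # harmonic mean of SNS and SPC
    return sns, spc, pcc, hmss


def amean(z):
    """Arithmetic mean: sum of z_i divided by n."""
    return sum(z) / len(z)


def hmean(z):
    """Harmonic mean: n divided by the sum of 1/z_i."""
    return len(z) / sum(1 / v for v in z)


# Hypothetical confusion counts for a single tree.
sns, spc, pcc, hmss = quality_measures(tp=60, fn=40, tn=80, fp=20)
```

The harmonic mean never exceeds the arithmetic mean, so HMSS penalises a tree whose sensitivity and specificity differ strongly more than a plain average would.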
SLIDE 4

The diagnostic of input variables

[Figure: the guilty driver forest; importance and % of occurrence of the variables Vhl, City, Dr_Gn, Dr_BAC, Drvg, Dr_Bhv]

[Figure: the guilty pedestrian forest; importance and % of occurrence of the variables Vhl, City, Pd_Gn, Pd_BAC, Pd_Ag, Pd_Bhv]

Bagging for pedestrian caused accident threat

The average posterior probability of fatal or serious accidents by pedestrian age group and pedestrian behaviour

[Figure: one panel per city size (Very big city, Big city, Medium city, Small city); x-axis: pedestrian age group, from <0; 7) to >=65; y-axis: average posterior probability; series: ImEnFrVh, InCrRd, ImEnBhVh, PrRd]
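The threat profiles of this kind are group averages of the bagged posterior probabilities. A minimal sketch of that aggregation is given below; the function name and the listed probabilities are hypothetical and are not results from the study.

```python
from collections import defaultdict


def mean_threat_by_group(cases):
    """Average P(FSA) for every (age group, behaviour) combination."""
    totals = defaultdict(lambda: [0.0, 0])
    for age_group, behaviour, p_fsa in cases:
        cell = totals[(age_group, behaviour)]
        cell[0] += p_fsa   # running sum of probabilities
        cell[1] += 1       # number of cases in the cell
    return {key: s / n for key, (s, n) in totals.items()}


# Hypothetical bagged posteriors for five accidents.
cases = [
    ("<7; 14)", "ImEnFrVh", 0.30),
    ("<7; 14)", "ImEnFrVh", 0.40),
    (">=65", "InCrRd", 0.70),
    (">=65", "InCrRd", 0.60),
    (">=65", "PrRd", 0.50),
]
profile = mean_threat_by_group(cases)
```

Repeating the aggregation separately for each city size would give one profile per panel.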

SLIDE 5

Bagging for pedestrian caused accident threat

The differences in the pedestrian caused accident threat level for passenger cars and heavy vehicles

[Figure: one panel per city size (Very big city, Big city, Medium city, Small city); x-axis: pedestrian age group, shown separately for Car and Heavy vehicle; y-axis: threat level; series: ImEnFrVh, InCrRd, ImEnBhVh, PrRd]

Bagging for driver caused accident threat

The average posterior probability of fatal or serious accidents by driving experience and driver behaviour

[Figure: one panel per city size (Very big city, Big city, Medium city, Small city); x-axis: driving experience group, from <0; 4) to >=26; y-axis: average posterior probability; series: InBhTwPd, ExSp, NtGvWay, InMnGrSp, InMnSmSp]

SLIDE 6

Bagging for driver caused accident threat

The city size role in generating the threat by driving experience group and driver behaviour

[Figure: threat by driving experience group, from <0; 4) to >=26, repeated for the driver behaviours InBhTwPd, ExSp, NtGvWay, InMnGrSp, InMnSmSp; series: Very big, Big, Medium, Small]

Conclusions

  • The most important factors influencing the threat on urban streets are connected with the age of the road user at fault
      • The lowest threat level is for young pedestrians; it then increases up to the pedestrian age of 50, and varies afterwards
      • The probability of fatal or serious accident status does not necessarily decrease as the driving experience increases
      • In cities other than a very big one, the highest threat is caused by drivers with 4-8 years of driving experience; it then decreases up to 21 years, and increases afterwards
  • The city size is the second factor that plays an important role in the accident severity classification
      • A very big city and cities of other sizes have different threat profiles
      • The influence of the city size is different for guilty pedestrians and guilty drivers
  • The behaviour of the road user at fault is the third most important factor in the accident severity classification
      • The pedestrian threat differences are noticeable for the behaviours of youngsters and the older group
      • The driver threat differences vary with the driving experience group, the city size, and the driver behaviour

SLIDE 7

Summing up

  • Random forests are a promising methodology for the accident severity classification of vehicle-pedestrian crashes in towns
  • Stratified sampling should be applied when creating the bootstrapped train sets
  • Modifying the structure of the target variable is recommended to overcome the negative influence of large disproportions in the accident severity distribution
  • The outcome is specific to the urban roads of a certain region
  • Better accident severity prediction results are obtained for guilty pedestrians than for guilty drivers
  • More factors influencing the threat classification were identified for pedestrians at fault than for drivers at fault