Visualizing Crash Data Patterns
Peter Wagner, with Ragna Hoffmann, Marek Junghans, Andreas Leich, and Hagen Saul
German Aerospace Center (DLR), Institute of Transport Systems
32nd ICTCT Conference 2019, Warsaw, Poland, 25 October 2019
> Crash Data Patterns > Wagner et al • presentationWarsaw.pptx > 25 Oct 2019 DLR.de • Chart 1
Recently…
- I had to review a paper in which a CNN was used to predict the probability of crashes online
- Not all reviewers were happy with this paper, so an interesting discussion started between the reviewers and the editor
- One reviewer deemed this approach unworkable, since CNNs search for patterns:
- I hope everybody agrees with me that this reviewer is wrong
However, crashes are essentially rare events, and many are purely random (e.g. due to drunk drivers or drunk pedestrians), with no pattern at all.
CNN = Convolutional Neural Network. Picture taken from: https://www.superdatascience.com/blogs/the-ultimate-guide-to-convolutional-neural-networks-cnn
[Figure: share of BAC-related crashes (%) by time of day (h), broken down by weekday (Sun to Sat)]
Ironically…
- This example has a particularly strong pattern
- Berlin's crash database, 2001 to 2016: a factor of 100 between the best and the worst hour in the share of BAC-related crashes
- I will try to show that this is in fact among the strongest patterns in these data
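The "factor of 100" compares the hour with the highest share of BAC-related crashes to the hour with the lowest. The hourly Berlin counts are not reproduced in these slides, so the minimal sketch below applies the same max/min comparison to the four coarse time-of-day classes of the contingency table shown later in this talk (counts copied from that table). The helper names `bac_shares` and `bac_share_ratio` are hypothetical, chosen only for illustration.

```python
# Crash counts per time-of-day class, copied from the contingency table
# in this talk (BAC no / yes; classes: night, morning, afternoon, evening).
COUNTS = {
    "night":     {"no": 49343,  "yes": 9843},
    "morning":   {"no": 497179, "yes": 3573},
    "afternoon": {"no": 705124, "yes": 5193},
    "evening":   {"no": 287286, "yes": 11316},
}


def bac_shares(counts):
    """Share of BAC-related crashes in each class: yes / (yes + no)."""
    return {k: v["yes"] / (v["yes"] + v["no"]) for k, v in counts.items()}


def bac_share_ratio(counts):
    """Ratio of the highest to the lowest class share (hypothetical helper)."""
    shares = bac_shares(counts)
    return max(shares.values()) / min(shares.values())


if __name__ == "__main__":
    for period, share in bac_shares(COUNTS).items():
        print(f"{period:>9}: {100 * share:5.2f} %")
    print(f"max/min ratio: {bac_share_ratio(COUNTS):.1f}")
```

Run as a script, it prints the four class shares and a max/min ratio of about 23 (night vs. morning). Each class share is a count-weighted average of the hourly shares it contains, so aggregating into four classes can only narrow the spread; on the same data, an hour-by-hour comparison is expected to give a larger factor.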
The toolkit
Two main instruments
- Best introduced by way of an example: the crash database contains a lot of information; we pick only two attributes (plus the id):
- Constructing the contingency table (cross table) from these data
Id | Time-of-day (h) | BAC (yes/no)
1 | 17 ("2" / afternoon) |
2 | 22 ("3" / evening) | 1
… | … | …

BAC | Night (0) | Morning (1) | Afternoon (2) | Evening (3)
No | 49343 | 497179 | 705124 | 287286
Yes | 9843 | 3573 | 5193 | 11316
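The two steps on this slide (reduce each crash record to two attributes, then count every combination) can be sketched in a few lines of Python. The hour-to-class boundaries used here (night 0 to 5 h, morning 6 to 11 h, afternoon 12 to 17 h, evening 18 to 23 h) are an assumption, chosen only to agree with the two example rows (17 h is afternoon, 22 h is evening); the slides do not state them. The record layout `(crash_id, hour, bac)` and the helpers `time_class` and `contingency_table` are likewise hypothetical.

```python
from collections import Counter

CLASSES = ["Night (0)", "Morning (1)", "Afternoon (2)", "Evening (3)"]


def time_class(hour):
    """Map an hour of day (0-23) to a class index 0..3.

    ASSUMPTION: classes are 6-hour blocks starting at midnight; the
    boundaries actually used for the Berlin data are not given in the slides.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return hour // 6


def contingency_table(records):
    """Count crashes per (BAC, time class) cell.

    `records` is an iterable of (crash_id, hour, bac) tuples, bac a bool.
    Returns {"No": [4 counts], "Yes": [4 counts]}, one count per class.
    """
    counts = Counter((bac, time_class(hour)) for _id, hour, bac in records)
    # Counter returns 0 for combinations that never occur.
    return {
        label: [counts[(bac, c)] for c in range(len(CLASSES))]
        for label, bac in (("No", False), ("Yes", True))
    }


if __name__ == "__main__":
    # Toy records for illustration only (not the Berlin data).
    records = [(10, 3, True), (11, 8, False), (12, 17, False),
               (13, 22, True), (14, 23, False)]
    table = contingency_table(records)
    print("BAC  " + "  ".join(CLASSES))
    for label, row in table.items():
        print(f"{label:<4} " + "  ".join(str(n) for n in row))
```

Counting with `Counter` needs a single pass over the records; for data already held in a DataFrame, `pandas.crosstab` performs the same cross-tabulation.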