SLIDE 1 Smart Card Data in Public Transport
Paul Bouman
Also based on work of: Evelien van der Hurk, Timo Polman, Leo Kroon, Peter Vervest and Gábor Maróti
Department of Technology & Operations Management
Complexity in Public Transport: http://www.computr.eu
SLIDE 2
NETHERLANDS RAILWAYS (NS) IN NUMBERS
- 1.1 ⋅ 10^6 journeys per weekday
- 16.8 ⋅ 10^9 passenger km per year
- 97.4% train punctuality
- 1.5% cancelled trains
- 4800+ train services per weekday
- 3000+ train wagons/drivers
SLIDE 5 SMART CARD DATA (AUTOMATED FARE COLLECTION)
- Dutch “OV-chipkaart”
- Always both check-in and check-out
- Some differences between modalities
CardID   Location     Time    Type       …
543465   Harderwijk   13:42   CHECKOUT   …
654345   Amsterdam    13:43   CHECKIN    …
…        …            …       …          …
SLIDE 6 OVERVIEW
- Introduction
- Finding Passenger Routes
From smart card transactions to routes
- A Better Way to Measure Service Quality
New Service Indicators from the perspective of the passenger
Insight into passenger demand enables better service design
SLIDE 7
FROM SMART CARD AND TIMETABLE DATA TO PASSENGER ROUTES
SLIDE 8 PASSENGER ROUTE CHOICE
Knowledge of passenger route choice allows us to:
- Estimate demand for capacity
- Test assumptions on passenger behavior and route choice
- Perform hindsight analysis of passenger service (delays)
Until now:
- Surveys and panel data to deduce route choice
- Models for route choice: maximum utility, regret minimization,…
Now:
- We have both the smart card data and conductor checks to determine the routes used by a passenger. Why not use them?
SLIDE 9 PROBLEM OVERVIEW ROUTE DEDUCTION FROM AFC
- Which route (time, space, trains) did a passenger take?
[Figure: time-space diagram from Station A (platform i) to Station B (platform k), showing the check in (ci) and check out (co), each with time and station, the trains in between, and a conductor check.]
Step 1:
How can we find these route options?
SLIDE 11 FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK
[Figure: event-activity network built from the timetable, with one node per station and event time; stations Dt, Gvc and Ledn, times 9:09, 9:12, 9:16, 9:21 and 9:25.]
Problem: transferring to another train is “free”. However, most passengers will prefer to stay on the same train if a transfer saves only a small amount of time.
SLIDE 14 FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK
[Figure: event-activity network for Dt, Gvc and Ledn with event times 9:09, 9:12 and 9:16.]
[Figure: the same network with nodes marked D and A.]
Problem: whether a transfer is feasible may depend on the platforms of the trains.
SLIDE 19 FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK
[Figure: extended network for Dt, Gvc and Ledn (9:09, 9:12, 9:16) with nodes marked D and A and an additional node Gvc 9:13.]
SLIDE 21 TO SUMMARIZE
- We have a ‘Basic Event Activity Network’, where transfers are ‘free’, that contains an arc for each scheduled trip in the timetable
- We have an ‘Extended Event Activity Network’, where we can include penalties on the transfers and make sure that some slack time is included when the passenger makes a transfer.
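As a rough illustration of the extended network, the sketch below builds trip arcs, dwell arcs (staying on the same train) and penalised transfer arcs from a list of scheduled trips. The `Trip` record, the train names, the times, the 2-minute minimum transfer slack and the 5-minute penalty are all hypothetical values chosen for the example, not taken from the slides; the sketch also assumes a train visits each station at most once.

```python
from collections import namedtuple

# One scheduled trip of a train between two consecutive stops
# (times in minutes after midnight). Hypothetical record layout.
Trip = namedtuple("Trip", "train origin dep dest arr")


def build_extended_network(trips, min_transfer=2, transfer_penalty=5):
    """Return arcs (tail, head, cost, kind) of an extended event-activity network.

    Nodes are ("D" or "A", train, station, time). min_transfer and
    transfer_penalty are assumed parameters, in minutes.
    """
    arcs = []
    # Trip arcs: from the departure event to the arrival event of each trip.
    for t in trips:
        arcs.append((("D", t.train, t.origin, t.dep),
                     ("A", t.train, t.dest, t.arr),
                     t.arr - t.dep, "trip"))
    # Arcs from arrival events to later departure events at the same station.
    for a in trips:
        for b in trips:
            if a.dest != b.origin:
                continue
            wait = b.dep - a.arr
            tail = ("A", a.train, a.dest, a.arr)
            head = ("D", b.train, b.origin, b.dep)
            if a.train == b.train and wait >= 0:
                # Staying on the same train: no penalty.
                arcs.append((tail, head, wait, "dwell"))
            elif a.train != b.train and wait >= min_transfer:
                # Changing trains: needs slack time and costs a penalty.
                arcs.append((tail, head, wait + transfer_penalty, "transfer"))
    return arcs


# Hypothetical timetable fragment: T1 runs Dt -> Gvc -> Ledn, T2 and T3 run Gvc -> Ledn.
TRIPS = [
    Trip("T1", "Dt", 549, "Gvc", 556),
    Trip("T1", "Gvc", 558, "Ledn", 570),
    Trip("T2", "Gvc", 557, "Ledn", 572),
    Trip("T3", "Gvc", 561, "Ledn", 575),
]
ARCS = build_extended_network(TRIPS)
```

In this toy timetable the connection from T1 to T2 at Gvc is left out because only one minute is available, while the connection to T3 is kept with the penalty added to its cost.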
SLIDE 22 COMPUTING SHORTEST PATHS
- We use this procedure to obtain an Event-Activity Network $G = (V, A)$
- Every node in the graph has a time label
- If we assume every arc in the graph is associated with an action that takes a strictly positive amount of time, we have a Directed Acyclic Graph
- When constructing the graph, we can use the time indices to obtain a Topological Ordering of the graph:
  $V := \{v_1, v_2, \dots, v_n\}$ such that $\forall (v_i, v_j) \in A : i < j$
- We can use this to compute a Shortest Path Tree in $O(|A|)$ time. For repeated computations on 20-50k Origin/Destination pairs, this matters.
- Using different cost parameters for the different types of arcs, we can generate
paths that favor or avoid different types of routes.
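The shortest path computation on a time-ordered acyclic network can be sketched as follows. This is a minimal illustration, assuming nodes are `(station, time)` pairs and every arc goes strictly forward in time, so that sorting by time gives a topological order; the node names, times and costs are made up for the example.

```python
import math


def shortest_path_tree(nodes, arcs, source):
    """Single-source shortest paths in a DAG whose nodes carry time labels.

    nodes: iterable of (station, time) pairs
    arcs:  iterable of (tail, head, cost), each going forward in time
    Returns (dist, pred) dictionaries keyed by node.
    """
    # Sorting by time label is a topological order, because every arc
    # is assumed to take a strictly positive amount of time.
    order = sorted(nodes, key=lambda node: node[1])
    outgoing = {node: [] for node in order}
    for tail, head, cost in arcs:
        outgoing[tail].append((head, cost))

    dist = {node: math.inf for node in order}
    pred = {node: None for node in order}
    dist[source] = 0
    # One pass over the nodes; every arc is relaxed at most once.
    for node in order:
        if dist[node] == math.inf:
            continue  # not reachable from the source
        for head, cost in outgoing[node]:
            if dist[node] + cost < dist[head]:
                dist[head] = dist[node] + cost
                pred[head] = node
    return dist, pred


# Hypothetical example network (times in minutes after midnight).
NODES = [("Ledn", 575), ("Dt", 549), ("Gvc", 561), ("Gvc", 556), ("Ledn", 570)]
EDGES = [
    (("Dt", 549), ("Gvc", 556), 7),
    (("Gvc", 556), ("Gvc", 561), 5),
    (("Gvc", 556), ("Ledn", 570), 14),
    (("Gvc", 561), ("Ledn", 575), 14),
    (("Ledn", 570), ("Ledn", 575), 5),
]
DIST, PRED = shortest_path_tree(NODES, EDGES, ("Dt", 549))
```

Changing the `cost` values per arc type (for example adding a penalty on transfer arcs) is enough to make the same routine favor or avoid certain kinds of routes.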
SLIDE 23 PROBLEM OVERVIEW ROUTE DEDUCTION FROM AFC
- Now we have a set of possible routes.
[Figure: time-space diagram from Station A (platform i) to Station B (platform k), showing the check in (ci) and check out (co), each with time and station, the trains in between, and a conductor check.]
Step 2:
Which path should we choose, given the check in and check out?
SLIDE 25 METHOD
- Generate routes based either on
  – The basic Event-Activity Network
  – The extended Event-Activity Network
- From these, we pick one according to a fixed rule:
  1) First Departure (FD)
  2) Earliest Arrival (EA)
  3) Latest Arrival (LA)
  4) Least Transfers (LT)
  5) Maximum Path Length (MPL)
  6) Select Least Transfers Last Arrival (STA)
- We will validate the methodology using conductor checks:
  – Did we even find the correct route during route generation?
  – Does our rule pick the correct route?
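A fixed selection rule can be sketched as a sort key over the candidate routes that fit between check in and check out. The `Route` record, the feasibility filter and the tie-break of Least Transfers by earliest arrival are assumptions made for this illustration; only the four rules whose meaning follows directly from their names are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    departure: int  # minutes after midnight
    arrival: int    # minutes after midnight
    transfers: int


# Each rule maps a route to a key; the route with the smallest key is chosen.
RULES = {
    "FD": lambda r: r.departure,               # First Departure
    "EA": lambda r: r.arrival,                 # Earliest Arrival
    "LA": lambda r: -r.arrival,                # Latest Arrival
    "LT": lambda r: (r.transfers, r.arrival),  # Least Transfers (tie-break assumed)
}


def select_route(candidates, check_in, check_out, rule):
    """Pick one candidate that fits between check in and check out, or None."""
    feasible = [r for r in candidates
                if r.departure >= check_in and r.arrival <= check_out]
    if not feasible:
        return None
    return min(feasible, key=RULES[rule])


# Hypothetical candidates for one journey.
A = Route(departure=545, arrival=605, transfers=0)
B = Route(departure=548, arrival=595, transfers=1)
C = Route(departure=560, arrival=610, transfers=0)
CANDIDATES = [A, B, C]
```

Validation against conductor checks then amounts to comparing the selected route with the train number recorded by the conductor for the same card.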
SLIDE 26 DATA
– Smart card journeys: origin station, destination station, start time, end time, card id
– Timetable: departure time station, arrival time station, train number
– Conductor checks: card id, time, train number
General:
- 5 days
- Over 500,000 journeys
- For a significant number of journeys, we have a conductor check
- Full Dutch railway network of Netherlands Railways trains
SLIDE 27 RESULTS
Method                   Observed path generated
Basic Network            75.5%
[Extended Network]       92.3%

Rule                     Basic Network   [Extended Network]
First Departure          65%             86%
Earliest Arrival         67%             86%
Last Arrival             65%             86%
Least Transfers          68%             90%
[Maximum Path Length]    70%             92%
Selected Least T.        73%             95%

NB: These are percentages over the set of journeys for which the correct path was generated.
SLIDE 28 DISCUSSION OF RESULTS
- When constructing the Event-Activity Network, transfers matter for your success rate.
- Rules solely based on either the arrival or departure time are outperformed by those which include the number of transfers.
- The “Selected Least Transfers” rule performs best, as it combines the idea of minimizing transfers with the idea that a passenger will likely depart within 10 minutes of check in and check out within 10 minutes of arrival.
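One possible reading of the “Selected Least Transfers” idea is sketched below: first keep only routes that depart within 10 minutes after check in and arrive within 10 minutes before check out, then take the one with the fewest transfers. The tuple layout, the tie-break on latest arrival (suggested only by the rule's name) and the behaviour when no route passes the filter are assumptions of this sketch.

```python
def select_least_transfers(candidates, check_in, check_out, window=10):
    """Sketch of a 'Selected Least Transfers' style rule.

    candidates: iterable of (departure, arrival, transfers), times in minutes.
    Returns the chosen route, or None if no route passes the filter.
    """
    plausible = [
        route for route in candidates
        if 0 <= route[0] - check_in <= window      # departs soon after check in
        and 0 <= check_out - route[1] <= window    # arrives shortly before check out
    ]
    if not plausible:
        return None
    # Fewest transfers first; among those, the latest arrival (assumed tie-break).
    return min(plausible, key=lambda route: (route[2], -route[1]))


# Hypothetical candidates for a journey with check in 9:00 and check out 10:10.
STA_CANDIDATES = [
    (545, 605, 1),
    (548, 602, 0),
    (549, 608, 0),
    (555, 609, 0),  # departs 15 minutes after check in: filtered out
    (542, 595, 0),  # arrives 15 minutes before check out: filtered out
]
```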
SLIDE 29
MEASURING PASSENGER DELAYS
SLIDE 30 MEASURING SERVICE QUALITY
- Recall: punctuality score of NS is quite high (a little higher than 97%).
- This refers to ‘train punctuality’
Did the train arrive within five minutes of the timetable?
- Passengers mostly care whether they reach their destination in time. ‘Passenger punctuality’ would be a better indicator of service quality.
  – In case of transfers, small train delays can have a big impact on the passenger delays.
  – In some situations, delays can lead to additional transfer opportunities
- How can we measure this? (MSc. Thesis of Timo Polman)
- The indicator should be controllable (i.e. the cause can be determined), robust (e.g. not dependent on shopping behavior at a station) and simple.
SLIDE 31
METHODOLOGY
[Figure: flow diagram with inputs Smart Card Journey, Planned Timetable and Realised Timetable, and steps Generate planned route, Execute planned route, Calculate Delay.]
When the planned route is feasible, use that. If not, recalculate the route from the first point where it is infeasible.
SLIDE 33 SOME DELAY MEASURES
- Average Delay (Gross Passenger Delay Minutes / Lost Customer Hours)
- Relative delay: delay divided by planned journey time
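The two measures can be written down in a few lines. This is a sketch under stated assumptions: a journey is a hypothetical tuple `(planned_departure, planned_arrival, realised_arrival)` in minutes, early arrivals count as zero delay, and the average relative delay is taken as the mean of the per-journey ratios.

```python
def delay(journey):
    """Delay in minutes; arriving early is counted as zero (assumption)."""
    _, planned_arrival, realised_arrival = journey
    return max(0, realised_arrival - planned_arrival)


def relative_delay(journey):
    """Delay divided by the planned journey time."""
    planned_departure, planned_arrival, _ = journey
    return delay(journey) / (planned_arrival - planned_departure)


def average_delay(journeys):
    return sum(delay(j) for j in journeys) / len(journeys)


def average_relative_delay(journeys):
    return sum(relative_delay(j) for j in journeys) / len(journeys)


# Hypothetical journeys: 3 minutes late on a 30 minute trip, on time on a
# 20 minute trip, and 15 minutes late on a 60 minute trip.
SAMPLE = [(480, 510, 513), (480, 500, 500), (480, 540, 555)]
```

Note that the mean of the ratios generally differs from the ratio of the means, so the choice between them affects the reported relative delay.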
SLIDE 34 SENSITIVITY TO SELECTION RULES
                      Early Arr.   First Dep.   Last Arr.   Least Transfers   [Selected Least] Transfers
Average Delay (min)   2.23         2.23         2.22        2.13              1:45
Relative Delay        11.12        11.11        11.13       10.87             6.56
Missed Connections    1.09%        1.09%        1.10%       0.68%             0.39%
SLIDE 35
SENSITIVITY TO SELECTION RULES
[Figure: timeline from check in to check out.]
SLIDE 36
METRICS BASED ON EARLY ARRIVAL
Metric               Percentage
5 minutes delayed    8.47%
10 minutes delayed   3.86%
15 minutes delayed   2.05%
30 minutes delayed   0.48%
45 minutes delayed   0.12%
60 minutes delayed   0.07%

Metric                   Value
Average Delay            1.93 min
Average Travel Time      31.66 min
Average Relative Delay   9.44%
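Threshold metrics of this kind can be computed from a list of per-journey delays as sketched below. The function name and the sample delays are hypothetical, and reading “5 minutes delayed” as “delayed by at least 5 minutes” is an assumption.

```python
def delayed_share(delays, threshold):
    """Fraction of journeys with a delay of at least `threshold` minutes."""
    return sum(1 for d in delays if d >= threshold) / len(delays)


# Hypothetical delays in minutes for ten journeys.
DELAYS = [0, 2, 5, 7, 12, 31, 0, 1, 0, 4]
SHARES = {t: delayed_share(DELAYS, t) for t in (5, 10, 15, 30, 45, 60)}
```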
SLIDE 37 DISCUSSION
- The distinction between ‘what did the passenger want to do?’ (related to stated choice) and ‘what did the passenger do?’ (related to revealed choice) is important.
- Using the ‘Earliest Arrival’ rule, we are robust against late check outs due to, for example, shopping, but sensitive to strategic planning at the beginning of the journey (for example when early advice is given by smartphone)
SLIDE 38
DEMAND ANALYSIS
SLIDE 39
Smart card data
ACTIVITY BASED DEMAND
Trip-based Tour-based Activity-based models
SLIDE 40 FROM JOURNEYS TO ACTIVITY TIME INTERVALS
- Order the journeys of each individual smart card by date and time.
- An activity is detected if the destination of a journey equals the origin of the next journey made with the same card.
- The time and duration of the activity are based on the check out time of the first journey and the check in time of the next journey.

CardID    Check in     Check out    Origin      Destination
5345654   7/11 8:06    7/11 9:15    Utrecht     Delft
5345654   7/11 22:03   7/11 23:15   Delft       Utrecht
5345654   8/11 8:08    8/11 9:30    Utrecht     Delft
6345763   5/11 15:16   5/11 15:45   Groningen   Zwolle
6345763   6/11 9:15    6/11 10:15   Groningen   Zwolle
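The detection rule can be sketched on the example rows above. The tuple layout and function name are illustrative, and the year 2012 is an arbitrary assumption, since the slide only gives day and month.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# (card id, check in, check out, origin, destination); the year is assumed.
JOURNEYS = [
    ("5345654", datetime(2012, 11, 7, 8, 6), datetime(2012, 11, 7, 9, 15), "Utrecht", "Delft"),
    ("5345654", datetime(2012, 11, 7, 22, 3), datetime(2012, 11, 7, 23, 15), "Delft", "Utrecht"),
    ("5345654", datetime(2012, 11, 8, 8, 8), datetime(2012, 11, 8, 9, 30), "Utrecht", "Delft"),
    ("6345763", datetime(2012, 11, 5, 15, 16), datetime(2012, 11, 5, 15, 45), "Groningen", "Zwolle"),
    ("6345763", datetime(2012, 11, 6, 9, 15), datetime(2012, 11, 6, 10, 15), "Groningen", "Zwolle"),
]


def detect_activities(journeys):
    """Return (card id, station, start, end) for each detected activity."""
    activities = []
    # Order journeys per card by check in time.
    ordered = sorted(journeys, key=lambda j: (j[0], j[1]))
    for card, group in groupby(ordered, key=itemgetter(0)):
        trips = list(group)
        for previous, following in zip(trips, trips[1:]):
            # Activity: the next journey starts where the previous one ended.
            if previous[4] == following[3]:
                activities.append((card, previous[4], previous[2], following[1]))
    return activities


ACTIVITIES = detect_activities(JOURNEYS)
```

For the first card this yields an activity in Delft during the day and one in Utrecht overnight; the second card yields none, because its second journey does not start in Zwolle.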
SLIDE 43 FROM JOURNEYS TO ACTIVITIES AND TIME INTERVALS
- In order to simplify our data, we project the time intervals onto a modular ring consisting of 24 timeslots, generating a set of intervals per station.
- The time interval $x$ of an activity has a begin time $x_b$ and an end time $x_e$ on this modular ring
- The duration $x_d$ of a time interval is the clockwise distance between begin and end on the ring (and taken modulo 24)
- We define a parametric distance measure $d_\theta$ for the distance between two time intervals based on three parameters $\theta = (\theta_1, \theta_2, \theta_3)$ as follows:
  $$d_\theta(x, y) = \begin{cases} \theta_1 (x_d - y_d)^2 & \text{if } x_b = y_b \text{ or } x_e = y_e \\ \theta_2 (x_b - y_b)^2 & \text{if } x_d = y_d \\ \theta_3 (x_b - y_b + x_d - y_d)^2 & \text{otherwise} \end{cases}$$
- Partition our observations into k sets using the k-means algorithm.
SLIDE 44
$k$-MEANS CLUSTERING WITH $k = 3$, $\theta_1 = 1$, $\theta_2 = 1$, $\theta_3 = 2$
Data:
Entry   Start   End   Cluster
1       7       10    1
2       7       17    2
3       8       17    2
4       8       18    2
5       9       18    2
6       10      15    3
7       11      20    2

Centroids:
Cluster   Begin   End
1         7       10
2         8       17
3         10      15
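The assignment step of this clustering can be checked against the example data with a short sketch. It covers only the assignment of intervals to given centroids, not the full k-means iteration. Two points are assumptions of the sketch: the third case of the distance is taken as the squared sum of the begin and duration differences, and differences between begin times and durations are plain (non-circular) differences.

```python
SLOTS = 24  # number of timeslots on the modular ring


def duration(begin, end):
    """Clockwise distance from begin to end on the ring."""
    return (end - begin) % SLOTS


def distance(x, y, theta=(1, 1, 2)):
    """Parametric distance between two (begin, end) intervals."""
    x_b, x_e = x
    y_b, y_e = y
    x_d, y_d = duration(x_b, x_e), duration(y_b, y_e)
    theta_1, theta_2, theta_3 = theta
    if x_b == y_b or x_e == y_e:
        return theta_1 * (x_d - y_d) ** 2
    if x_d == y_d:
        return theta_2 * (x_b - y_b) ** 2
    # Grouping of this case is an assumption about the slide's formula.
    return theta_3 * ((x_b - y_b) + (x_d - y_d)) ** 2


def assign(points, centroids, theta=(1, 1, 2)):
    """Return the 1-based index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)),
            key=lambda c: distance(point, centroids[c], theta)) + 1
        for point in points
    ]


DATA = [(7, 10), (7, 17), (8, 17), (8, 18), (9, 18), (10, 15), (11, 20)]
CENTROIDS = [(7, 10), (8, 17), (10, 15)]
CLUSTERS = assign(DATA, CENTROIDS)
```

With these settings the assignment reproduces the cluster column of the example table.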
SLIDE 45 DATA AND ROBUSTNESS FRACTIONS
- Trick for speed up: if a certain time interval occurs more often, we can group them as a single datapoint and adapt the distance function accordingly.
  – It is very fast on large datasets (assigning clusters takes $O(nk)$ time; calculating a centroid is a bit more involved but still efficient enough)
- We repeat the k-means algorithm for the intervals at a station with:
  – Different random seeds
  – Different settings for the parameters
  – Different values for k
- For each centroid we calculate how often it occurs in the final output of each run of the algorithm. The average of this frequency over all stations is the Robustness Fraction
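The grouping trick can be illustrated as follows. The sketch collapses repeated intervals into counts and weights each distance by its count; that this is how the distance function is adapted is an assumption, and the helper names and the toy distance are hypothetical.

```python
from collections import Counter


def group_intervals(intervals):
    """Collapse repeated (begin, end) intervals into {interval: count}."""
    return Counter(intervals)


def weighted_cost(grouped, centroid, distance):
    """Total distance of all observations to a centroid.

    Each distinct interval is evaluated once and multiplied by how often
    it occurs, which gives the same total as summing over every observation.
    """
    return sum(count * distance(interval, centroid)
               for interval, count in grouped.items())


def toy_distance(x, y):
    """Stand-in distance used only for this example."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


OBSERVED = [(8, 17)] * 3 + [(9, 18)] * 2 + [(7, 10)]
GROUPED = group_intervals(OBSERVED)
```

Since there are at most 24 × 24 distinct intervals on the ring, the number of grouped datapoints stays small no matter how many journeys are observed.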
SLIDE 46
OUTPUT: ROBUSTNESS FRACTIONS
NB: This is based on urban public transport data, not Dutch Railways
SLIDE 47 LABELLING PROCEDURE
- Looking at the original chains of activities per individual card in the smart card
data, we can easily construct chains of time intervals.
- We label these time intervals according to a labelling procedure inspired by the observed robustness fractions.
- We will then analyze the frequencies of pairs and triplets of consecutive labels.
[Figure: labelling procedure by Time of Day and Duration.]
SLIDE 48
CONSECUTIVE PAIRS – OCCURRING FREQUENCIES
SLIDE 49
CONSECUTIVE TRIPLETS – OCCURRING FREQUENCIES
SLIDE 50 CONCLUSIONS AND FUTURE WORK
- The most prominent time patterns observed are associated with home-work
travel patterns, but we are also able to detect some less obvious patterns.
- For now, our method is still quite crude and only serves exploratory analysis.
- Interesting opportunities for future research:
  – Can we automatically construct labelling rules from our clustering output?
  – Can we also investigate “spatial” usage patterns using similar methods?
SLIDE 51
CONCLUDING REMARKS
- Although we have a lot of data, in order to generate valuable insights we need
to have a thorough understanding of the underlying processes.
- By itself, the stream of smart card transactions is not enough to gain this understanding.
  – For route choice, we need to validate our rules using an additional data set collected by the conductors.
  – For passenger punctuality and demand prediction, we need to understand how passengers plan their journeys.
- Information Systems + Human Behavior = Lots of Research Opportunities
SLIDE 55
QUESTIONS?
Questions? Suggestions? Thanks for your attention!
PBouman@rsm.nl