slide-1
SLIDE 1

Smart Card Data in Public Transport

Paul Bouman

Also based on work of: Evelien van der Hurk, Timo Polman, Leo Kroon, Peter Vervest and Gábor Maróti

Department of Technology & Operations Management

Complexity in Public Transport: http://www.computr.eu

slide-3
SLIDE 3

NETHERLANDS RAILWAYS (NS) IN NUMBERS

  • 1.1 million journeys per weekday
  • 16.8 billion passenger-kilometers per year
  • 97.4% train punctuality
  • 1.5% cancelled trains
  • 4800+ train services per weekday
  • 3000+ train wagons/drivers

slide-5
SLIDE 5

SMART CARD DATA (AUTOMATED FARE COLLECTION)

  • Dutch “OV-chipkaart”
  • Always both check-in and check-out
  • Some differences between modalities

CardID   Location     Time    Type       …
543465   Harderwijk   13:42   CHECKOUT   …
654345   Amsterdam    13:43   CHECKIN    …
…        …            …       …          …
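
As an illustration of how such transactions become journeys, here is a minimal sketch that pairs each check-in with the next check-out of the same card. It assumes all taps fall on one day; the `Tap` record and `to_journeys` function are hypothetical names, not part of the OV-chipkaart system, and the Utrecht check-out row is made-up data.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Tap:
    card_id: int
    location: str
    at: time   # assumption: all taps fall on the same day
    kind: str  # "CHECKIN" or "CHECKOUT"


def to_journeys(taps):
    """Pair each CHECKIN with the next CHECKOUT of the same card.

    Returns (card_id, origin, check_in, destination, check_out) tuples.
    """
    open_checkins = {}  # card_id -> pending CHECKIN tap
    journeys = []
    for tap in sorted(taps, key=lambda t: t.at):
        if tap.kind == "CHECKIN":
            open_checkins[tap.card_id] = tap
        elif tap.kind == "CHECKOUT" and tap.card_id in open_checkins:
            start = open_checkins.pop(tap.card_id)
            journeys.append((tap.card_id, start.location, start.at, tap.location, tap.at))
        # a CHECKOUT without a pending CHECKIN (e.g. a journey that started
        # before the data window) is ignored in this sketch
    return journeys


taps = [
    Tap(543465, "Harderwijk", time(13, 42), "CHECKOUT"),
    Tap(654345, "Amsterdam", time(13, 43), "CHECKIN"),
    Tap(654345, "Utrecht", time(14, 10), "CHECKOUT"),  # hypothetical row
]
print(to_journeys(taps))
```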

slide-6
SLIDE 6

OVERVIEW

  • Introduction
  • Finding Passenger Routes
    – From smart card transactions to routes
  • A Better Way to Measure Service Quality
    – New service indicators from the perspective of the passenger
  • Analyzing Demand
    – Insight into passenger demand enables better service design
  • Discussion, Conclusion
slide-7
SLIDE 7

FROM SMART CARD AND TIMETABLE DATA TO PASSENGER ROUTES

slide-8
SLIDE 8

PASSENGER ROUTE CHOICE

Knowledge of passenger route choice allows us to:

  • Estimate demand for capacity
  • Test assumptions on passenger behavior and route choice
  • Perform hindsight analysis of passenger service (delays)

Until now:

  • Surveys and panel data to deduce route choice
  • Models for route choice: maximum utility, regret minimization, …

Now:

  • We have both the smart card data and conductor checks to determine the routes used by a passenger. Why not use them?

slide-10
SLIDE 10

PROBLEM OVERVIEW ROUTE DEDUCTION FROM AFC

  • Which route (time, space, trains) did a passenger take?

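[Figure: a journey from Station A (platform i) to Station B (platform k) along a time axis. The check-in (ci) and check-out (co) each record a time and a station; in between lie the candidate trains, and a conductor check on board records the train.]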

Step 1:

How can we find these route options?

slide-13
SLIDE 13

FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK

[Figure: event-activity network with event nodes for Gvc (Den Haag Centraal), Ledn (Leiden Centraal) and Dt (Delft) at 9:09, 9:12, 9:16, 9:21 and 9:25.]

Problem: in this network, transferring to another train is “free”. However, most passengers prefer to stay on the same train if a transfer saves only a small amount of time.

slide-14
SLIDE 14

FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK

[Figure: reduced network with event nodes for Gvc, Ledn and Dt at 9:09, 9:12 and 9:16.]

slide-16
SLIDE 16

FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK

[Figure: the same network for Gvc, Ledn and Dt at 9:09, 9:12 and 9:16, now with nodes labelled D and A (departure and arrival).]

slide-18
SLIDE 18

FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK

[Figure: the network for Gvc, Ledn and Dt at 9:09, 9:12 and 9:16 with departure (D) and arrival (A) nodes.]

Problem: whether a transfer is feasible may depend on the platforms of the trains.

slide-19
SLIDE 19

FROM THE TIMETABLE TO AN EVENT ACTIVITY NETWORK

[Figure: the network with departure (D) and arrival (A) nodes for Gvc, Ledn and Dt at 9:09, 9:12 and 9:16, plus an additional event node at Gvc 9:13.]

slide-21
SLIDE 21

TO SUMMARIZE

  • We have a ‘Basic Event Activity Network’ in which transfers are ‘free’, containing an arc for each scheduled trip in the timetable.
  • We have an ‘Extended Event Activity Network’ in which we can include penalties on transfers and make sure that some slack time is included when the passenger makes a transfer.
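
The extended network can be sketched in a few lines. This is a minimal illustration, not the authors' construction: it assumes each train calls at a station at most once, models every event as a departure (D) or arrival (A) node, and uses hypothetical parameters `min_transfer` (required slack in minutes) and `transfer_penalty` (extra cost per transfer). The legs below are made up.

```python
from collections import defaultdict


def build_extended_network(legs, min_transfer=3, transfer_penalty=5):
    """Build the arcs of an extended event-activity network.

    legs: (train, from_station, dep, to_station, arr), times in minutes.
    Nodes are (train, station, time, "D" or "A"). Returns a list of
    (tail, head, cost, kind) arcs, kind in {"trip", "stay", "transfer"}.
    """
    arcs = []
    departures = defaultdict(list)  # station -> departure nodes
    arrivals = defaultdict(list)    # station -> arrival nodes
    for train, frm, dep, to, arr in legs:
        d_node, a_node = (train, frm, dep, "D"), (train, to, arr, "A")
        arcs.append((d_node, a_node, arr - dep, "trip"))
        departures[frm].append(d_node)
        arrivals[to].append(a_node)
    for station, arrival_nodes in arrivals.items():
        for a_node in arrival_nodes:
            a_train, _, t_arr, _ = a_node
            for d_node in departures[station]:
                d_train, _, t_dep, _ = d_node
                slack = t_dep - t_arr
                if d_train == a_train and slack >= 0:
                    # staying on board: cost is the dwell time only
                    arcs.append((a_node, d_node, slack, "stay"))
                elif d_train != a_train and slack >= min_transfer:
                    # transfer: waiting time plus a fixed penalty
                    arcs.append((a_node, d_node, slack + transfer_penalty, "transfer"))
    return arcs


# Hypothetical legs, times in minutes after some reference time
legs = [
    ("A", "Dt", 0, "Gvc", 9),
    ("A", "Gvc", 10, "Ledn", 22),
    ("B", "Gvc", 12, "Ledn", 25),
    ("C", "Gvc", 16, "Ledn", 30),
]
for arc in build_extended_network(legs):
    print(arc)
```

Setting both `min_transfer` and `transfer_penalty` to 0 makes transfers “free” again, as in the basic network.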

slide-22
SLIDE 22

COMPUTING SHORTEST PATHS

  • We use this procedure to obtain an Event-Activity Network $G = (V, A)$.
  • Every node in the graph has a time label.
  • If we assume that every arc in the graph is associated with an action that takes a strictly positive amount of time, we have a Directed Acyclic Graph.
  • When constructing the graph, we can use the time labels to obtain a topological ordering of the graph:

$V := \{v_1, v_2, \dots, v_n\}$ such that $\forall (v_i, v_j) \in A : i < j$

  • We can use this ordering to compute a shortest path tree in $O(|A|)$ time. For repeated computations on 20k to 50k origin/destination pairs, this matters.
  • Using different cost parameters for the different types of arcs, we can generate paths that favor or avoid different types of routes.
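
The shortest path computation over a topological ordering can be sketched as follows. This is a generic DAG shortest-path routine under the assumption stated above (every arc moves strictly forward in time); the function names and the tiny example network are hypothetical. The arc costs are where type-dependent parameters would enter, for example a higher cost on transfer arcs.

```python
import math
from collections import defaultdict


def dag_shortest_paths(nodes, arcs, source, time_of):
    """Single-source shortest paths in an event-activity network.

    arcs: (tail, head, cost) triples; time_of(node) returns the time label.
    Assumes every arc goes strictly forward in time, so sorting the nodes
    by time label yields a topological ordering of the DAG.
    """
    out = defaultdict(list)
    for tail, head, cost in arcs:
        out[tail].append((head, cost))
    nodes = sorted(nodes, key=time_of)  # topological order
    dist = {v: math.inf for v in nodes}
    pred = {v: None for v in nodes}
    dist[source] = 0
    for u in nodes:  # every arc is relaxed exactly once
        if dist[u] == math.inf:
            continue
        for v, cost in out[u]:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                pred[v] = u
    return dist, pred


def path_to(pred, target):
    """Walk predecessor links back to the source (target must be reachable)."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


# Hypothetical nodes (station, minute) and arcs (trip or waiting)
a, b, c, d = ("Gvc", 0), ("Gvc", 3), ("Ledn", 12), ("Ledn", 14)
arcs = [(a, c, 12), (a, b, 3), (b, d, 12), (c, d, 2)]
dist, pred = dag_shortest_paths([a, b, c, d], arcs, a, time_of=lambda v: v[1])
print(dist[d], path_to(pred, d))
```

The sort costs $O(|V| \log |V|)$; if nodes are created in time order during construction, it can be skipped and only the linear relaxation loop remains.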

slide-24
SLIDE 24

PROBLEM OVERVIEW ROUTE DEDUCTION FROM AFC

  • Now we have a set of possible routes.

[Figure: the same journey diagram from Station A (platform i) to Station B (platform k), with check-in (ci), check-out (co), the candidate trains and the conductor check.]

Step 2:

Which path should we choose, given the check in and check out?

slide-25
SLIDE 25

METHOD

  • Generate routes based on either
    – the basic Event-Activity Network, or
    – the extended Event-Activity Network.
  • From these, we pick one according to a fixed rule:
    1) First Departure (FD)
    2) Earliest Arrival (EA)
    3) Latest Arrival (LA)
    4) Least Transfers (LT)
    5) Maximum Path Length (MPL)
    6) Select Least Transfers Last Arrival (STA)
  • We validate the methodology using conductor checks:
    – Did we find the correct route during route generation at all?
    – Does our rule pick the correct route?
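
A minimal sketch of several of these rules, applied to candidate routes that fit between check-in (`ci`) and check-out (`co`). The `Route` record, the tie-breaking choices and the 10-minute window in the STA rule are assumptions (the window follows the discussion of results); Maximum Path Length is omitted because its exact definition is not given here. The candidates are made-up data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    departure: int  # departure of the first train, minutes
    arrival: int    # arrival of the last train, minutes
    transfers: int


# Each rule picks one route; ties are broken by list order unless noted.
def first_departure(routes, ci, co):
    return min(routes, key=lambda r: r.departure)


def earliest_arrival(routes, ci, co):
    return min(routes, key=lambda r: r.arrival)


def latest_arrival(routes, ci, co):
    return max(routes, key=lambda r: r.arrival)


def least_transfers(routes, ci, co):
    # tie-break on earliest arrival (assumption)
    return min(routes, key=lambda r: (r.transfers, r.arrival))


def selected_least_transfers(routes, ci, co, window=10):
    # Assumed reading of STA: keep routes departing within `window` minutes
    # after check-in and arriving within `window` minutes before check-out,
    # then take the fewest transfers, breaking ties by the latest arrival.
    plausible = [r for r in routes
                 if r.departure - ci <= window and co - r.arrival <= window]
    return min(plausible or routes, key=lambda r: (r.transfers, -r.arrival))


candidates = [Route(2, 40, 1), Route(5, 45, 0), Route(20, 55, 0), Route(8, 52, 1)]
for rule in (first_departure, earliest_arrival, latest_arrival,
             least_transfers, selected_least_transfers):
    print(rule.__name__, rule(candidates, ci=0, co=60))
```

On this example the rules disagree, which is exactly what the conductor checks are used to arbitrate.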

slide-26
SLIDE 26

DATA

  • Smart card data

– Origin station, destination station, start time, end time, card id

  • Realized timetable

– Departure time station, arrival time station, train number

  • Conductor checks

– Card id, time, train number

General:

  • 5 days
  • Over 500,000 journeys
  • For a significant number of journeys, we have a conductor check
  • The full Dutch railway network (Netherlands Railways trains)
slide-27
SLIDE 27

RESULTS

Path generation: share of journeys for which the observed path was generated

Network              Observed path generated
Basic network        75.5%
Ext. network         92.3%

Route selection: share of journeys for which the rule picks the observed path

Rule                 Basic network   Ext. network
First Departure      65%             86%
Earliest Arrival     67%             86%
Last Arrival         65%             86%
Least Transfers      68%             90%
Max. Path Length     70%             92%
Selected Least T.    73%             95%

NB: These are percentages over the set of journeys for which the correct path was generated.

slide-28
SLIDE 28

DISCUSSION OF RESULTS

  • When constructing the Event-Activity Network, the way transfers are modelled matters for the success rate.
  • Rules based solely on either the arrival or the departure time are outperformed by rules that also take the number of transfers into account.
  • The “Selected Least Transfers” rule performs best, as it combines the idea of minimizing transfers with the idea that a passenger will likely depart within 10 minutes of checking in and check out within 10 minutes of arriving.

slide-29
SLIDE 29

MEASURING PASSENGER DELAYS

slide-30
SLIDE 30

MEASURING SERVICE QUALITY

  • Recall: the punctuality score of NS is quite high (slightly above 97%).
  • This refers to ‘train punctuality’:
    – Did the train arrive within five minutes of the timetable?
  • Passengers mostly care whether they reach their destination on time. ‘Passenger punctuality’ would be a better indicator of service quality.
    – In case of transfers, small train delays can have a big impact on passenger delays.
    – In some situations, delays can even lead to additional transfer opportunities.
  • How can we measure this? (MSc thesis of Timo Polman)
  • The measure should be controllable (i.e. its cause can be determined), robust (e.g. it should not depend on shopping behavior at a station) and simple.

slide-32
SLIDE 32

METHODOLOGY

[Flow: Smart Card Journey + Planned Timetable → Generate planned route → Execute planned route (on the Realised Timetable) → Calculate Delay]

When the planned route is feasible in the realised timetable, we use it. If not, we recalculate the route from the first point where it becomes infeasible.
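
The execute-and-recalculate step can be illustrated with a simplified sketch. Instead of a full shortest-path recalculation, it assumes that when a planned connection is missed the passenger takes the earliest later train on the same station pair; `execute_route`, `min_transfer` and all timetable rows are hypothetical. The planned route lists one entry per train ride.

```python
def execute_route(planned, legs, min_transfer=3):
    """Follow a planned route through a (planned or realised) timetable.

    planned: list of (train, from_station, to_station), one entry per ride.
    legs: list of (train, from_station, dep, to_station, arr), in minutes.
    Returns the arrival time at the destination, or None if stranded.
    """
    times = {(t, f, to): (dep, arr) for t, f, dep, to, arr in legs}
    arrival = None
    for train, frm, to in planned:
        dep, arr = times[(train, frm, to)]
        if arrival is not None and dep < arrival + min_transfer:
            # planned connection infeasible: simplified re-planning that takes
            # the earliest leg on the same station pair that is still reachable
            options = [(d, a) for _, f, d, t2, a in legs
                       if f == frm and t2 == to and d >= arrival + min_transfer]
            if not options:
                return None
            dep, arr = min(options)
        arrival = arr
    return arrival


route = [("A", "Dt", "Gvc"), ("B", "Gvc", "Ledn")]
planned_tt = [("A", "Dt", 0, "Gvc", 9), ("B", "Gvc", 12, "Ledn", 25),
              ("C", "Gvc", 16, "Ledn", 30)]
realised_tt = [("A", "Dt", 0, "Gvc", 11),  # train A arrives 2 minutes late
               ("B", "Gvc", 12, "Ledn", 25), ("C", "Gvc", 16, "Ledn", 30)]
delay = execute_route(route, realised_tt) - execute_route(route, planned_tt)
print("passenger delay:", delay, "min")
```

Here train A is only 2 minutes late, so it counts as punctual under the five-minute criterion, yet the missed connection delays the passenger by 5 minutes.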

slide-33
SLIDE 33

SOME DELAY MEASURES

  • Average Delay (Gross Passenger Delay Minutes / Lost Customer Hours)
  • Relative delay: delay divided by planned journey time
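
These measures reduce to simple arithmetic once each journey has a planned and a realised arrival time. A minimal sketch, assuming delay is realised minus planned arrival (which may be negative if a passenger arrives early) and that “X minutes delayed” means a delay of at least X minutes; the function name and the journeys are hypothetical.

```python
def delay_measures(journeys, thresholds=(5, 10, 15, 30, 45, 60)):
    """journeys: (planned_departure, planned_arrival, realised_arrival), minutes."""
    delays = [real - p_arr for p_dep, p_arr, real in journeys]
    # relative delay: delay divided by planned journey time
    relative = [(real - p_arr) / (p_arr - p_dep) for p_dep, p_arr, real in journeys]
    n = len(journeys)
    return {
        "average_delay": sum(delays) / n,
        "average_relative_delay": sum(relative) / n,
        # share of journeys delayed by at least m minutes (assumed reading)
        "share_delayed": {m: sum(d >= m for d in delays) / n for m in thresholds},
    }


journeys = [(0, 30, 30), (0, 30, 36), (0, 60, 72)]
print(delay_measures(journeys, thresholds=(5, 10, 15)))
```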
slide-34
SLIDE 34

SENSITIVITY TO SELECTION RULES

Measure              Early Arr.   First Dep.   Last Arr.   Least Transfers   Sel. Least Transfers
Average delay        2.23 min     2.23 min     2.22 min    2.13 min          1:45 min
Relative delay       11.12        11.11        11.13       10.87             6.56
Missed connections   1.09%        1.09%        1.10%       0.68%             0.39%

slide-35
SLIDE 35

SENSITIVITY TO SELECTION RULES

[Figure: time axis (Time →) between check-in and check-out.]

slide-36
SLIDE 36

METRICS BASED ON EARLY ARRIVAL

Metric                 Percentage
5 minutes delayed      8.47%
10 minutes delayed     3.86%
15 minutes delayed     2.05%
30 minutes delayed     0.48%
45 minutes delayed     0.12%
60 minutes delayed     0.07%

Metric                   Value
Average Delay            1.93 min
Average Travel Time      31.66 min
Average Relative Delay   9.44%

slide-37
SLIDE 37

DISCUSSION

  • The distinction between ‘what did the passenger want to do?’ (related to stated choice) and ‘what did the passenger do?’ (related to revealed choice) is important.
  • Using the ‘Earliest Arrival’ rule, we are robust against late check-outs (for example due to shopping), but sensitive to strategic planning at the beginning of the journey (for example when a passenger receives early travel advice on a smartphone).

slide-38
SLIDE 38

DEMAND ANALYSIS

slide-39
SLIDE 39

ACTIVITY BASED DEMAND

[Figure: smart card data in relation to trip-based, tour-based and activity-based models.]

slide-40
SLIDE 40

FROM JOURNEYS TO ACTIVITY TIME INTERVALS

  • Order the journeys of each individual smart card by date and time.
  • An activity is detected if the destination of a journey equals the origin of the next journey of the same card.
  • The time and duration of the activity are based on the check-out time of the first journey and the check-in time of the next journey.

CardID    Check in     Check out    Origin      Destination
5345654   7/11 8:06    7/11 9:15    Utrecht     Delft
5345654   7/11 22:03   7/11 23:15   Delft       Utrecht
5345654   8/11 8:08    8/11 9:30    Utrecht     Delft
6345763   5/11 15:16   5/11 15:45   Groningen   Zwolle
6345763   6/11 9:15    6/11 10:15   Groningen   Zwolle
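
A minimal sketch of the activity detection rule, run on the rows of the table above. It assumes the dates are day/month and uses a placeholder year, since the table gives none; `Journey`, `parse` and `detect_activities` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Journey(NamedTuple):
    card: int
    check_in: datetime
    check_out: datetime
    origin: str
    destination: str


def parse(stamp, year=2000):
    # "7/11 8:06" read as day/month hour:minute; the year is a placeholder
    return datetime.strptime(f"{stamp} {year}", "%d/%m %H:%M %Y")


def detect_activities(journeys):
    """An activity is detected when a journey ends where the card's next
    journey starts; it lasts from that check-out to the next check-in."""
    by_card = defaultdict(list)
    for j in journeys:
        by_card[j.card].append(j)
    activities = []
    for card, trips in by_card.items():
        trips.sort(key=lambda j: j.check_in)  # order by date and time
        for prev, nxt in zip(trips, trips[1:]):
            if prev.destination == nxt.origin:
                activities.append((card, prev.destination, prev.check_out, nxt.check_in))
    return activities


rows = [
    (5345654, "7/11 8:06", "7/11 9:15", "Utrecht", "Delft"),
    (5345654, "7/11 22:03", "7/11 23:15", "Delft", "Utrecht"),
    (5345654, "8/11 8:08", "8/11 9:30", "Utrecht", "Delft"),
    (6345763, "5/11 15:16", "5/11 15:45", "Groningen", "Zwolle"),
    (6345763, "6/11 9:15", "6/11 10:15", "Groningen", "Zwolle"),
]
journeys = [Journey(c, parse(i), parse(o), org, dst) for c, i, o, org, dst in rows]
for card, station, start, end in detect_activities(journeys):
    print(card, station, start, end, end - start)
```

Card 5345654 yields an activity in Delft during the day and one in Utrecht overnight; card 6345763 yields none, because its second journey does not start in Zwolle.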

slide-43
SLIDE 43

FROM JOURNEYS TO ACTIVITIES AND TIME INTERVALS

  • In order to simplify our data, we project the time intervals onto a modular ring consisting of 24 timeslots, generating a set of intervals per station.
  • The time interval $x$ of an activity has a begin time $x_b$ and an end time $x_e$ on this modular ring.
  • The duration $x_d$ of a time interval is the clockwise distance between begin and end on the ring, taken modulo 24: $x_d = (x_e - x_b) \bmod 24$.
  • We define a parametric distance measure $d_\theta$ for the distance between two time intervals $x$ and $y$, based on three parameters $\theta = (\theta_1, \theta_2, \theta_3)$, as follows:

$$
d_\theta(x, y) =
\begin{cases}
\theta_1 \, |x_d - y_d|^2 & \text{if } x_b = y_b \text{ or } x_e = y_e,\\
\theta_2 \, |x_b - y_b|^2 & \text{if } x_d = y_d,\\
\theta_3 \, \left(|x_b - y_b| + |x_d - y_d|\right)^2 & \text{otherwise.}
\end{cases}
$$

  • We partition our observations into k sets using the k-means algorithm.
slide-44
SLIDE 44

$k$-MEANS CLUSTERING WITH $k = 3$, $\theta_1 = 1$, $\theta_2 = 1$, $\theta_3 = 2$

Data:

Entry   Start   End   Cluster
1       7       10    1
2       7       17    2
3       8       17    2
4       8       18    2
5       9       18    2
6       10      15    3
7       11      20    2

Centroids:

Cluster   Begin   End
1         7       10
2         8       17
3         10      15
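
The distance $d_\theta$ and the k-means assignment step can be checked against this example. A minimal sketch: intervals are (begin, end) hours, and `ring_gap` measures begin-time differences the short way round the 24-slot ring, which is an assumption (it makes no difference here because no interval wraps past midnight). The centroid update step, which the slides call more involved, is not sketched.

```python
def ring_gap(a, b, n=24):
    """Shortest distance between two slots on an n-slot ring (assumption)."""
    gap = abs(a - b) % n
    return min(gap, n - gap)


def d_theta(x, y, theta=(1, 1, 2)):
    """Distance between intervals x = (begin, end) and y = (begin, end)."""
    (xb, xe), (yb, ye) = x, y
    xd, yd = (xe - xb) % 24, (ye - yb) % 24  # clockwise durations
    t1, t2, t3 = theta
    if xb == yb or xe == ye:
        return t1 * (xd - yd) ** 2
    if xd == yd:
        return t2 * ring_gap(xb, yb) ** 2
    return t3 * (ring_gap(xb, yb) + abs(xd - yd)) ** 2


def assign(points, centroids, theta=(1, 1, 2)):
    """k-means assignment step: 1-based index of the nearest centroid."""
    return [1 + min(range(len(centroids)), key=lambda k: d_theta(p, centroids[k], theta))
            for p in points]


data = [(7, 10), (7, 17), (8, 17), (8, 18), (9, 18), (10, 15), (11, 20)]
centroids = [(7, 10), (8, 17), (10, 15)]
print(assign(data, centroids))
```

With $\theta = (1, 1, 2)$ this reproduces the Cluster column of the table: for instance, entry 7 (11 to 20) has the same 9-hour duration as centroid 2, so its distance is $\theta_2 \cdot 3^2 = 9$, far below the distance of 50 to centroid 3.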

slide-45
SLIDE 45

DATA AND ROBUSTNESS FRACTIONS

  • Speed-up trick: if a certain time interval occurs multiple times, we can group these occurrences into a single weighted data point and adapt the distance function accordingly.
  • Why k-means?
    – It is very fast on large datasets (assigning clusters takes $O(nk)$ time for $n$ intervals and $k$ clusters; calculating a centroid is a bit more involved but still efficient enough).
  • We repeat the k-means algorithm for the intervals at a station with:
    – different random seeds,
    – different settings for the parameters,
    – different values for k.
  • For each centroid, we calculate how often it occurs in the final output of each run of the algorithm. The average of this frequency over all stations is the Robustness Fraction.
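
A minimal sketch of the Robustness Fraction as described above. It assumes centroids are identified by exact (begin, end) equality and that a station where a centroid never appears contributes 0 to the average; the function name and the two stations' run outputs are hypothetical.

```python
from collections import Counter


def robustness_fractions(runs_per_station):
    """runs_per_station maps a station to a list of k-means runs, each run
    being the collection of (begin, end) centroids it produced.

    For every centroid: the fraction of runs at a station whose output
    contains it, averaged over all stations (assumption: a station where
    the centroid never appears contributes 0)."""
    per_station = []
    for runs in runs_per_station.values():
        counts = Counter(c for run in runs for c in set(run))
        per_station.append({c: n / len(runs) for c, n in counts.items()})
    centroids = set().union(*per_station)
    return {c: sum(f.get(c, 0.0) for f in per_station) / len(per_station)
            for c in centroids}


runs = {
    "station A": [[(7, 17), (7, 10)], [(7, 17)]],
    "station B": [[(7, 17)], [(8, 18)]],
}
print(robustness_fractions(runs))
```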

slide-46
SLIDE 46

OUTPUT: ROBUSTNESS FRACTIONS

NB: This is based on urban public transport data, not Dutch Railways

slide-47
SLIDE 47

LABELLING PROCEDURE

  • Looking at the original chains of activities per individual card in the smart card data, we can easily construct chains of time intervals.
  • We label these time intervals according to a labelling procedure inspired by the observed robustness fractions.
  • We then analyze the frequencies of pairs and triplets of consecutive labels.

[Figure: labelling of time intervals by time of day and duration.]
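
Counting consecutive pairs and triplets of labels is a sliding-window count over each card's chain. A minimal sketch; the actual labelling rules are not given here, so the labels below (time of day / duration) are hypothetical placeholders, as is the function name.

```python
from collections import Counter


def consecutive_label_counts(chains, n):
    """Count every run of n consecutive labels (pairs: n=2, triplets: n=3)
    across all per-card chains of labelled time intervals."""
    counts = Counter()
    for chain in chains:
        counts.update(tuple(chain[i:i + n]) for i in range(len(chain) - n + 1))
    return counts


# Hypothetical labelled chains, one per smart card
chains = [
    ["morning/long", "evening/short", "morning/long"],
    ["morning/long", "evening/short"],
]
print(consecutive_label_counts(chains, 2).most_common())
print(consecutive_label_counts(chains, 3).most_common())
```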

slide-48
SLIDE 48

CONSECUTIVE PAIRS – OCCURRING FREQUENCIES

slide-49
SLIDE 49

CONSECUTIVE TRIPLETS – OCCURRING FREQUENCIES

slide-50
SLIDE 50

CONCLUSIONS AND FUTURE WORK

  • The most prominent time patterns observed are associated with home-work

travel patterns, but we are also able to detect some less obvious patterns.

  • For now, our method is still quite crude and only suited to exploratory analysis.
  • Interesting opportunities for future research:
    – Can we automatically construct labelling rules from our clustering output?
    – Can we also investigate “spatial” usage patterns using similar methods?

slide-51
SLIDE 51

CONCLUDING REMARKS

slide-54
SLIDE 54

CONCLUDING REMARKS

  • Although we have a lot of data, in order to generate valuable insights we need

to have a thorough understanding of the underlying processes.

  • By itself, the stream of smart card transactions is not enough to gain this understanding.
    – For route choice, we need to validate our rules using an additional data set collected by the conductors.
    – For passenger punctuality and demand prediction, we need to understand how passengers plan their journeys.

  • Information Systems + Human Behavior = Lots of Research Opportunities
slide-55
SLIDE 55

QUESTIONS?

Questions? Suggestions? Thanks for your attention!

PBouman@rsm.nl