Analyzing Big Data From Complex Systems: Smart Cards in Urban - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Analyzing Big Data From Complex Systems:

Smart Cards in Urban Transportation Networks

Soong Moon Kang

School of Management University College London

smkang@ucl.ac.uk

The Institute for Korean Regional Studies Seoul National University

September 6, 2016

slide-2
SLIDE 2

Transport for London (TfL) Oyster Card

Wikicommons

  • Introduced in 2003
  • By June 2012:
  • More than 43 million cards issued
  • Used for more than 80% of all public transport journeys
slide-3
SLIDE 3

Agenda:

  • Study 1: Patterns of Urban Movement
  • Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions
  • Study 3: Extensions of the Study on the Patterns of Urban Movement
  • Study 4: Extensions of the Study on the Effects of Disruptions

  • Discussions
slide-4
SLIDE 4

Study 1: Patterns of Urban Movement

  • “Structure of Urban Movements: Polycentric Activity and Entangled Hierarchical Flows” PLoS ONE, January 7, 2011, 6(1):e15923. (with Camille Roth, Michael Batty and Marc Barthélémy)

slide-5
SLIDE 5

Data:

  • March 31, 2008 — April 6, 2008 (1 week)
  • 11.22 million journeys (trips)
  • 2.03 million individual users (IDs)
  • Information for each ID:
  • time and location of tap-in and tap-out

→ individual movements

slide-6
SLIDE 6

Descriptives:

Distribution of travel distances

[Figure: distribution of journeys compared with the distribution of distances between stations; 9.28 km is annotated on the plot]

  • The distribution of journey distances can be fitted with a negative binomial function
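As a rough illustration of fitting a negative binomial to journey distances, the sketch below uses the method of moments on distances rounded to whole kilometres. The sample distances, the 1 km rounding and the function names are assumptions made for illustration; the slides do not say how the fit was obtained.

```python
from math import exp, lgamma, log
from statistics import mean, pvariance


def fit_negative_binomial(counts):
    """Method-of-moments fit; returns (r, p) for
    P(K = k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k."""
    m = mean(counts)
    v = pvariance(counts)
    if v <= m:
        raise ValueError("negative binomial needs variance > mean")
    p = m / v
    r = m * m / (v - m)
    return r, p


def negative_binomial_pmf(k, r, p):
    """Probability of observing the integer value k (computed in log space)."""
    log_pmf = (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
               + r * log(p) + k * log(1 - p))
    return exp(log_pmf)


# Hypothetical journey distances, rounded to whole kilometres.
distances_km = [1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 9, 10, 12, 14, 15, 18, 22, 30]
r, p = fit_negative_binomial(distances_km)
fitted_mean = r * (1 - p) / p  # equals the sample mean by construction
```

The fitted `negative_binomial_pmf` can then be plotted against the empirical histogram to judge the quality of the fit by eye.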

slide-7
SLIDE 7

Descriptives:

Travel propensity

Random simulation (given in- and out-flow at stations) → null model of randomized journeys

Actual flow (wij) vs. random flow
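One common way to build such a null model is to shuffle destinations across journeys, which keeps every station's in-flow and out-flow fixed while randomizing who travels where. The sketch below assumes journeys are (origin, destination) pairs and compares actual with randomized flows as a ratio. The function names, the toy journeys and the use of a ratio are illustrative assumptions, not the method confirmed by the slides.

```python
import random
from collections import Counter


def randomized_flows(journeys, seed=0):
    """Shuffle destinations across journeys: every station keeps its
    out-flow (origins) and in-flow (destinations), but pairings are random.
    Simplification: a shuffled journey may start and end at the same station."""
    rng = random.Random(seed)
    origins = [o for o, _ in journeys]
    destinations = [d for _, d in journeys]
    rng.shuffle(destinations)
    return Counter(zip(origins, destinations))


def travel_propensity(journeys, n_runs=200):
    """Ratio of actual flow w_ij to its average over randomized runs."""
    actual = Counter(journeys)
    random_mean = Counter()
    for run in range(n_runs):
        for pair, w in randomized_flows(journeys, seed=run).items():
            random_mean[pair] += w / n_runs
    return {pair: w / random_mean[pair]
            for pair, w in actual.items() if random_mean[pair] > 0}


# Hypothetical journeys between three stations.
journeys = ([("A", "B")] * 6 + [("A", "C")] * 2 + [("B", "C")] * 3
            + [("C", "A")] * 4 + [("B", "A")] * 1)
propensity = travel_propensity(journeys)
```

A ratio above 1 for a pair means the pair carries more flow than the randomized journeys would suggest; a ratio below 1 means less.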

slide-8
SLIDE 8

Descriptives:

wij: flow of passengers between stations i and j

Flow distribution: normalized histogram of flows of individuals

power law with exponent ≈ 1.3 → strong heterogeneity of individual movements
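A power-law exponent of this kind can be estimated from a list of flows with the standard continuous maximum-likelihood formula. The sketch below is an assumption about how one might do it, not the procedure used in the study: it treats flows as continuous values above a cutoff `x_min` (a hypothetical parameter) and checks the estimator on synthetic data drawn with exponent 1.3.

```python
import random
from math import log


def power_law_exponent(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of gamma in P(w) ~ w**(-gamma)
    for w >= x_min: gamma = 1 + n / sum(ln(w / x_min))."""
    tail = [v for v in values if v >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("all values equal x_min; exponent undefined")
    return 1.0 + len(tail) / log_sum


# Synthetic flows drawn from a power law with exponent 1.3 (inverse transform).
rng = random.Random(42)
gamma_true = 1.3
flows = [(1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0)) for _ in range(20000)]
gamma_hat = power_law_exponent(flows)
```

Real flow counts are integers, so the continuous formula is only an approximation; a discrete estimator would be the more careful choice for small flows.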

slide-9
SLIDE 9

Descriptives:

Distribution of total flows: Zipf plot

for morning peak hours (7am – 10am)

  • Exponential decay → most of the total flow is concentrated on a few stations
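The concentration of flow on a few stations can be read off a rank-ordered (Zipf) list of station totals. The following sketch ranks stations by total flow and computes the share carried by the busiest ones; the station names and passenger totals are invented for illustration.

```python
def zipf_ranking(total_flows):
    """Return (rank, station, flow) tuples sorted by decreasing flow."""
    ordered = sorted(total_flows.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, name, flow) for rank, (name, flow) in enumerate(ordered, start=1)]


def top_k_share(total_flows, k):
    """Share of the total flow carried by the k busiest stations."""
    ranked = zipf_ranking(total_flows)
    total = sum(flow for _, _, flow in ranked)
    return sum(flow for _, _, flow in ranked[:k]) / total


# Hypothetical morning-peak totals (passengers) for six stations.
flows = {"S1": 9000, "S2": 4500, "S3": 2200, "S4": 1100, "S5": 600, "S6": 300}
share_top2 = top_k_share(flows, 2)
```

Plotting flow against rank on a semi-logarithmic scale would show an exponential decay as a straight line.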

slide-10
SLIDE 10

Polycenters:

Identifying polycenters:

  • 1. Arrange stations by decreasing order of inflow

→ defines centers by decreasing importance

  • 2. Account for geographical proximity

→ aggregate all stations within a given distance (1,500 meters) of the defined center

  • 3. Continue until a large percentage of the total flow is captured (60% of total flow)
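The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions: station coordinates are planar and given in meters, proximity is measured from the seed (highest-inflow) station of each center, and the names `find_polycenters`, `radius_m` and `flow_share` are hypothetical. The toy stations are invented for illustration.

```python
from math import hypot


def find_polycenters(stations, radius_m=1500.0, flow_share=0.60):
    """stations maps name -> (x_m, y_m, inflow). Returns a list of centers,
    each a dict with its seed station, member stations and total inflow."""
    total = sum(s[2] for s in stations.values())
    # Step 1: order stations by decreasing inflow.
    order = sorted(stations, key=lambda n: stations[n][2], reverse=True)
    assigned = set()
    centers = []
    captured = 0.0
    for seed in order:
        # Step 3: stop once the centers capture the target share of flow.
        if captured >= flow_share * total:
            break
        if seed in assigned:
            continue
        sx, sy, _ = stations[seed]
        # Step 2: aggregate unassigned stations within radius_m of the seed.
        members = [n for n in order if n not in assigned
                   and hypot(stations[n][0] - sx, stations[n][1] - sy) <= radius_m]
        assigned.update(members)
        inflow = sum(stations[n][2] for n in members)
        captured += inflow
        centers.append({"seed": seed, "members": members, "inflow": inflow})
    return centers


# Hypothetical stations: (x in meters, y in meters, inflow).
toy_stations = {
    "A": (0, 0, 500), "B": (1000, 0, 200), "C": (5000, 0, 300),
    "D": (5500, 0, 50), "E": (20000, 0, 100), "F": (30000, 0, 50),
}
centers = find_polycenters(toy_stations)
```

With the toy data, stations A and B merge into the first center and C and D into the second, after which the 60% threshold is passed and the procedure stops.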

slide-11
SLIDE 11

Polycenters:

Hierarchical organization

slide-12
SLIDE 12

Polycenters:

[Map labels: Western Stations, West End, Northern Stations, City, Docklands, West London, Museums, Government, Parliament, Mid-Town]

slide-13
SLIDE 13

Polycenters:

Anisotropy

  • Use the random simulation from travel propensity to study the relative orientation of incoming flow

→ if there is no bias, the flow is fully isotropic (= 1)

slide-14
SLIDE 14

Polycenters:

slide-15
SLIDE 15

Structure of Flows:

How flows from single stations (sources) go to centers

  • squares: sources (single stations)
  • circles: centers
  • grey: 20% of total inflow
  • red: 40% of total inflow
slide-16
SLIDE 16

Structure of Flows:

Proportion of links going from sources to centers (group)

  • For more than 80% of the sources, the most important link (1st link) connects to a center of Group I.
  • For more than 80% of the sources, the least important link (10th link) connects to a center of Group III.

[Figure legend: Group I, Group II, Group III]

slide-17
SLIDE 17

Study 1: Patterns of Urban Movement

  • Contributions:
  • application of complex systems analytical tools to a novel data set
  • a new approach to determine polycenters
  • attempt to model the hierarchical nature of urban movements

  • Limitations:
  • exploratory
  • naive
slide-18
SLIDE 18

Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions

  • “Predicting Traffic Volumes and Estimating the Effects of Shocks in Massive Transportation Systems” Proceedings of the National Academy of Sciences (PNAS) May 5, 2015, 112(18): 5643–5648. (with Ricardo Silva and Edoardo M. Airoldi)

→ Introducing statistical analysis into complex systems

slide-19
SLIDE 19

Data:

  • February 2011 — February 2012
  • 70 weekdays and 25 weekend days
  • 211 million journeys (trips)
  • 10.7 million individual users (IDs)

→ 1.71 journeys per user per day
→ 1.76 million users per day
→ 3 million journeys per day

  • 374 stations open during the period (Underground + Overground + DLR)

slide-20
SLIDE 20

Data:

  • Weekdays only
slide-21
SLIDE 21

Statistical Model:

Basic Idea:

slide-22
SLIDE 22

Statistical Model:

Basic Idea:

slide-23
SLIDE 23

Statistical Model:

Basic Idea:

slide-24
SLIDE 24

Statistical Model:

Basic Idea:

[Diagram: four data sources (Smart Card Data, Network Structure Data, Disruption Logs, Passenger Route Surveys) and two models (“Natural Regime” Model, “Disruption” Model)]

slide-25
SLIDE 25

“Natural Regime” Model:

[Diagram: four data sources (Smart Card Data, Network Structure Data, Disruption Logs, Passenger Route Surveys) and two models (“Natural Regime” Model, “Disruption” Model)]

slide-26
SLIDE 26

“Natural Regime” Model:

Basic Idea:

slide-27
SLIDE 27

“Natural Regime” Model:

Assessment:

  • Fivefold cross-validation (i.e., 14 days of test data for each fold): test whether the fine-grained model with 374 × 374 ≈ 140,000 components overfits compared with the fully aggregated (black-box) models, and under which conditions the model does better
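The numbers on this slide can be checked with a short sketch: 70 weekdays split into five folds gives 14 test days per fold, and 374 × 374 gives 139,876 components. The random assignment of days to folds, the function name `kfold_day_splits` and the seed are assumptions; the slides do not say how days were assigned.

```python
import random


def kfold_day_splits(n_days=70, k=5, seed=0):
    """Yield (train_days, test_days) index lists for k-fold cross-validation.
    Assumes n_days is divisible by k, as with 70 days and 5 folds."""
    days = list(range(n_days))
    random.Random(seed).shuffle(days)
    fold_size = n_days // k
    for f in range(k):
        test = sorted(days[f * fold_size:(f + 1) * fold_size])
        held_out = set(test)
        train = [d for d in range(n_days) if d not in held_out]
        yield train, test


n_components = 374 * 374  # 139,876, rounded to 140,000 on the slide
splits = list(kfold_day_splits())
```

Each day appears in exactly one test fold, so every weekday is predicted once by a model that never saw it during fitting.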

slide-28
SLIDE 28

“Disruption” Model:

[Diagram: four data sources (Smart Card Data, Network Structure Data, Disruption Logs, Passenger Route Surveys) and two models (“Natural Regime” Model, “Disruption” Model)]

slide-29
SLIDE 29

“Disruption” Model:

Basic Model:

slide-30
SLIDE 30

“Disruption” Model:

Results:

Average number of exits per minute at Victoria LU station on Tuesday, January 17, 2012. The blue curve represents the 1-min-ahead prediction under the natural regime using the tracking model. Given a disruption from 6:00 PM to 7:00 PM between Victoria station and Brixton station on the Victoria line:

  • blue horizontal line: the average expected exit rate given by the tracking model under the natural regime,
  • red horizontal line: the average observed exit count, and
  • black horizontal line: the prediction given by the disruption model
slide-31
SLIDE 31

“Disruption” Model:

Assessment:

(A) Relative errors for line segment events. The absolute error of the tracking model for line segment disruptions varies from 3.0 (all stations) to 12.2 (stations with 85 tap-outs per minute or more) persons per minute.

(B) Relative errors for station events. The absolute error varies from 3.5 (all stations) to 10.5 (stations with 75 tap-outs per minute or more) persons per minute.
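The distinction between absolute and relative error matters here because busy stations naturally have larger absolute errors. The sketch below shows one way to compute both, restricted to stations above a tap-out threshold. The definition of relative error as |predicted − observed| / observed, the function name and the numbers are assumptions for illustration, not the figures or metric from the study.

```python
def error_summary(observed, predicted, min_rate=0.0):
    """Mean absolute and mean relative error over station-level exit rates
    (persons per minute), restricted to stations with observed >= min_rate.
    Stations with an observed rate of zero are skipped."""
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if o >= min_rate and o > 0]
    if not pairs:
        raise ValueError("no stations above the threshold")
    abs_err = sum(abs(p - o) for o, p in pairs) / len(pairs)
    rel_err = sum(abs(p - o) / o for o, p in pairs) / len(pairs)
    return abs_err, rel_err


# Hypothetical exit rates (persons per minute) at four stations.
observed = [4.0, 10.0, 90.0, 120.0]
predicted = [5.0, 8.0, 96.0, 110.0]
all_abs, all_rel = error_summary(observed, predicted)
busy_abs, busy_rel = error_summary(observed, predicted, min_rate=85.0)
```

In this toy data the absolute error is larger for the busy stations while the relative error is smaller, which is why the two measures are worth reporting together.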

slide-32
SLIDE 32

Station Sensitivity Index:

How sensitive stations are to line closures:

Red dots: top 10% by number of tap-outs

slide-33
SLIDE 33
Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions

  • Contributions:
  • application of statistical and machine learning techniques to complex systems
  • good model to describe and predict the effects of disruptions
  • Limitation:
  • simplistic

slide-34
SLIDE 34

Study 3: Extensions of the Study on the Patterns of Urban Movement

  • with Michael Batty, Hae Ran Shin, Ricardo Silva and Chen Zhong

→ Introducing statistical analysis into the study of urban movement patterns

slide-35
SLIDE 35

Study 3a: Passenger Travel Distributions

Basic Idea:

slide-36
SLIDE 36

Study 3a: Passenger Travel Distributions

Basic Idea:

[Figure: frequency vs. distance distributions for Station A and Station B]

slide-37
SLIDE 37

Study 3a: Passenger Travel Distributions

Basic Idea:

[Figure: frequency vs. distance distributions for Station A and Station B]

slide-38
SLIDE 38

Study 3a: Passenger Travel Distributions

Some Research Questions:

  • Do travel distributions of the passengers entering specific stations reveal a more generic pattern?

→ “local” versus “global”

  • If a generic pattern exists, how does it relate to the urban geography?

→ “center” versus “periphery”

slide-39
SLIDE 39

Study 3b: Passenger Travel Distributions and Geographic Socio-Economic Characteristics

Basic Idea:

  • Correlate passenger travel distributions with geographic socio-economic characteristics such as income, education, age, employment and family composition.

slide-40
SLIDE 40

Study 3: Extensions of the Study on the Patterns of Urban Movement

  • Data:

→ London and Seoul
→ Major challenges:

  • Only one day of data from Seoul
  • Fine-grained socio-economic data for Seoul
slide-41
SLIDE 41

Study 4: Extensions of the Study on the Effects of Disruptions

  • with Ricardo Silva

→ Refining the statistical analyses
→ Ultimate goal: real-time assessment of effects of disruptions system-wide
slide-42
SLIDE 42

Study 4a: Probabilistic and Causal Approaches

Basic Idea:

slide-43
SLIDE 43

Study 4a: Probabilistic and Causal Approaches

Basic Ideas:

  • provide a full probabilistic model of movement inside the subway network system
  • estimate the distribution (instead of only the expectation) of travel times, link loads and exit numbers given a disruption

→ causal inference
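A small example may show why the full distribution is of interest rather than the expectation alone: a disruption can leave the typical journey unchanged while stretching the tail. The sketch below summarizes travel times by mean and percentiles using Python's `statistics.quantiles`; the travel times are invented and the function name is hypothetical.

```python
from statistics import mean, quantiles


def summarize_travel_times(minutes):
    """Return the mean together with the 10th, 50th and 90th percentiles."""
    deciles = quantiles(minutes, n=10)  # nine cut points: 10%, 20%, ..., 90%
    return {"mean": mean(minutes), "p10": deciles[0],
            "p50": deciles[4], "p90": deciles[8]}


# Hypothetical travel times (minutes) for one origin-destination pair.
normal_day = [21, 22, 22, 23, 23, 24, 24, 25, 25, 26]
disrupted = [21, 22, 22, 23, 23, 24, 24, 25, 45, 60]
summary_normal = summarize_travel_times(normal_day)
summary_disrupted = summarize_travel_times(disrupted)
```

In this toy data the median is identical on both days, while the 90th percentile and the mean rise under disruption, so a model of the expectation alone would blur which passengers are affected and by how much.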

slide-44
SLIDE 44

Study 4b: Passenger-level modeling

Basic Ideas:

  • model by taking into account the behaviour of individual travellers, instead of aggregated counts

  • collect fine-grained passenger movement data using mobile apps
slide-45
SLIDE 45

Discussion