
Analyzing Big Data From Complex Systems: Smart Cards in Urban Transportation Networks



  1. Analyzing Big Data From Complex Systems: Smart Cards in Urban Transportation Networks Soong Moon Kang School of Management University College London smkang@ucl.ac.uk The Institute for Korean Regional Studies Seoul National University September 6, 2016

  2. Transport for London (TfL) Oyster Card • Introduced in 2003 • By June 2012: - More than 43 million cards issued - Used for more than 80% of all public transport journeys

  3. Agenda: • Study 1: Patterns of Urban Movement • Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions • Study 3: Extensions of the Study on the Patterns of Urban Movement • Study 4: Extensions of the Study on the Effects of Disruptions • Discussions

  4. Study 1: Patterns of Urban Movement • “Structure of Urban Movements: Polycentric Activity and Entangled Hierarchical Flows”, PLoS ONE, January 7, 2011, 6(1): e15923 (with Camille Roth, Michael Batty and Marc Barthélémy)

  5. Data: • March 31, 2008 to April 6, 2008 (1 week) - 11.22 million journeys (trips) - 2.03 million individual users (IDs) • Information for each ID: - time and location of tap-in and tap-out → individual movements

  6. Descriptives: The distribution of travel distances can be fitted with a negative binomial function. [Figure: distribution of distances between stations and distribution of journeys; annotated value: 9.28 km]

  7. Descriptives: Travel propensity: actual flow (w_ij) vs. random flow. Random simulation (given in- and out-flow at stations) → null model of randomized journeys
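One possible way to build such a null model is sketched below. It is an assumption, not the procedure confirmed by the slides: trip destinations are shuffled across journeys, which keeps every station's out-flow and in-flow fixed while randomizing the origin-destination pairing. The function names and the toy journeys are hypothetical.

```python
import numpy as np

def flow_matrix(origins, destinations, n_stations):
    """Count journeys for each (origin, destination) pair: w[i, j]."""
    w = np.zeros((n_stations, n_stations), dtype=float)
    np.add.at(w, (origins, destinations), 1.0)
    return w

def randomized_flows(origins, destinations, n_stations, n_runs=200, seed=0):
    """Average flow matrix over journeys with shuffled destinations.

    Shuffling destinations leaves each station's total out-flow and
    in-flow unchanged; only the pairing of origins and destinations is
    random. Self-loops (i == j) are allowed in this simple sketch.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros((n_stations, n_stations), dtype=float)
    for _ in range(n_runs):
        shuffled = rng.permutation(destinations)
        total += flow_matrix(origins, shuffled, n_stations)
    return total / n_runs

# Hypothetical toy journeys between three stations (0, 1, 2).
origins = np.array([0, 0, 0, 1, 1, 2, 2, 2])
destinations = np.array([1, 1, 1, 0, 2, 0, 0, 1])

w = flow_matrix(origins, destinations, 3)
w_rand = randomized_flows(origins, destinations, 3)

# Propensity above 1 means the pair is travelled more than the null
# model expects; pairs never drawn in the simulation are left as NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    propensity = np.where(w_rand > 0, w / w_rand, np.nan)
```

The ratio of actual to randomized flow is one plausible reading of "travel propensity"; the slides do not give the exact definition.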

  8. Descriptives: Flow distribution: normalized histogram of flows of individuals (w_ij: flow of passengers between stations i and j). Power law with exponent ≈ 1.3 → strong heterogeneity of individual movements
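The slide does not say how the exponent was estimated. As an illustration only, a common choice is the continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x / x_min)). The sketch below applies it to synthetic data drawn with exponent 1.3; real flows are integer counts, so the continuous form is an approximation, and `x_min` and the sample size are assumptions.

```python
import numpy as np

def sample_power_law(alpha, x_min, size, rng):
    """Draw from p(x) proportional to x**(-alpha), x >= x_min, by inverse transform."""
    u = rng.random(size)  # uniform on [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def power_law_mle(x, x_min):
    """Continuous maximum-likelihood estimate of the exponent alpha."""
    x = np.asarray(x, dtype=float)
    x = x[x >= x_min]
    return 1.0 + x.size / np.sum(np.log(x / x_min))

rng = np.random.default_rng(0)
flows = sample_power_law(alpha=1.3, x_min=1.0, size=50_000, rng=rng)
alpha_hat = power_law_mle(flows, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

With an exponent this close to 1 the tail is extremely heavy, which is consistent with the "strong heterogeneity" noted on the slide.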

  9. Descriptives: Distribution of total flows: Zipf plot for morning peak hours (7am – 10am) • Exponential decay → most of the total flow is concentrated on a few stations
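A Zipf plot ranks stations by total flow and plots flow against rank. The minimal sketch below, using made-up flow values, also computes the cumulative share of flow to show how few stations are needed to reach a given fraction of the total.

```python
import numpy as np

def zipf_rank(total_flows):
    """Return ranks, flows sorted in decreasing order, and cumulative share."""
    flows = np.sort(np.asarray(total_flows, dtype=float))[::-1]
    ranks = np.arange(1, flows.size + 1)
    cumulative_share = np.cumsum(flows) / flows.sum()
    return ranks, flows, cumulative_share

# Hypothetical total flows for six stations.
ranks, flows, share = zipf_rank([5, 120, 30, 10, 60, 15])

# Smallest number of top-ranked stations covering at least half the flow.
n_half = int(np.searchsorted(share, 0.5) + 1)
```

Plotting `flows` against `ranks` on a logarithmic flow axis would show exponential decay as a straight line.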

  10. Polycenters: Identifying polycenters: 1. Arrange stations in decreasing order of inflow → defines centers by decreasing importance 2. Account for geographical proximity → aggregate all stations within a given distance (1,500 meters) into the defined center 3. Continue until a large percentage of the total flow is captured (60% of total flow)
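The three steps above can be sketched as a greedy procedure. Several details are assumptions not stated on the slide: distances are measured from each center's first (highest-inflow) station, coordinates are planar and in meters, and the loop stops as soon as the flow target is reached. The function name and toy data are hypothetical.

```python
import numpy as np

def find_polycenters(coords_m, inflow, radius_m=1500.0, flow_share=0.60):
    """Greedy grouping of high-inflow stations into centers.

    coords_m: (n, 2) station coordinates in meters (assumed planar).
    inflow:   (n,) inflow per station.
    Returns a list of centers; each is a list of station indices with
    the seed (highest-inflow) station first.
    """
    coords_m = np.asarray(coords_m, dtype=float)
    inflow = np.asarray(inflow, dtype=float)
    target = flow_share * inflow.sum()
    centers, captured = [], 0.0
    # Step 1: visit stations by decreasing inflow.
    for s in np.argsort(-inflow, kind="stable"):
        if captured >= target:
            break  # Step 3: enough of the total flow is captured.
        # Step 2: attach to the first center whose seed is within radius_m.
        for members in centers:
            seed = members[0]
            if np.linalg.norm(coords_m[s] - coords_m[seed]) <= radius_m:
                members.append(int(s))
                break
        else:
            centers.append([int(s)])  # the station seeds a new center
        captured += inflow[s]
    return centers

# Hypothetical stations along a line (meters) with their inflows.
coords = [(0, 0), (1000, 0), (5000, 0), (5500, 0), (20000, 0)]
inflow = [100, 90, 80, 10, 5]
centers_60 = find_polycenters(coords, inflow)                    # default 60%
centers_90 = find_polycenters(coords, inflow, flow_share=0.90)   # stricter target
```

Stations 0 and 1 are 1,000 meters apart, so they merge into one center; station 2 lies 5,000 meters away and only appears once the flow target is raised.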

  11. Polycenters: Hierarchical organization

  12. Polycenters: [Map labels: Northern Stations, West End, Western Stations, City, Docklands, West London, Museums, Parliament, Mid-Town, Government]

  13. Polycenters: Anisotropy - Use the random simulation from the travel propensity analysis to study the relative orientation of incoming flows (anisotropy) → with no directional bias, flows are fully isotropic (= 1)

  14. Polycenters:

  15. Structure of Flows: How flows from single stations (sources) go to centers - squares: sources (single stations) - grey: 20% of total inflow - circles: centers - red: 40% of total inflow

  16. Structure of Flows: Proportion of links going from sources to centers, by group (Group I, Group II, Group III). For more than 80% of the sources, the most important link (1st link) connects to a center of Group I. For more than 80% of the sources, the least important link (10th link) connects to a center of Group III.

  17. Study 1: Patterns of Urban Movement • Contributions: - application of complex systems analytical tools to a novel dataset - a new approach to determine polycenters - attempt to model the hierarchical nature of urban movements • Limitations: - exploratory - naive

  18. Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions • “Predicting Traffic Volumes and Estimating the Effects of Shocks in Massive Transportation Systems”, Proceedings of the National Academy of Sciences (PNAS), May 5, 2015, 112(18): 5643–5648 (with Ricardo Silva and Edoardo M. Airoldi) → Introducing statistical analysis into complex systems

  19. Data: • February 2011 to February 2012 - 70 weekdays and 25 weekend days - 211 million journeys (trips) - 10.7 million individual users (IDs) → 1.71 journeys per user per day → 1.76 million users per day → 3 million journeys per day - 374 stations open during the period (underground + overground + DLR)
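The per-day figures are mutually consistent if the 211 million journeys are counted over the 70 weekdays only, which is an assumption here rather than something the slide states. A quick arithmetic check, taking the 1.76 million users per day directly from the slide:

```python
journeys_total = 211e6   # journeys, from the slide
weekdays = 70            # assumed denominator (weekdays only)
users_per_day = 1.76e6   # from the slide, not derived here

journeys_per_day = journeys_total / weekdays
journeys_per_user_per_day = journeys_per_day / users_per_day

print(round(journeys_per_day / 1e6, 2))     # 3.01 (million journeys per day)
print(round(journeys_per_user_per_day, 2))  # 1.71
```

Dividing by all 95 days instead would give about 2.2 million journeys per day, which does not match the slide's figure.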

  20. Data: • Weekdays only

  21. Statistical Model: Basic Idea:

  22. Statistical Model: Basic Idea:

  23. Statistical Model: Basic Idea:

  24. Statistical Model: Basic Idea: [Diagram components: Smart Card Data; “Natural Regime” Model; Network Structure Data; “Disruption” Model; Disruption Logs; Passenger Route Surveys]

  25. “Natural Regime” Model: [Diagram components: Smart Card Data; “Natural Regime” Model; Network Structure Data; “Disruption” Model; Disruption Logs; Passenger Route Surveys]

  26. “Natural Regime” Model: Basic Idea:

  27. “Natural Regime” Model: Assessment: - Fivefold cross-validation (i.e., 14 days of test data for each fold): Test whether the fine-grained model with 374×374 ≅ 140,000 components overfits compared with the fully aggregated (black-box) models, and under which conditions it does better
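Splitting 70 weekdays into five folds gives the 14 test days per fold mentioned above. The sketch below shows one way to form such folds; whether the original study assigned days at random or in contiguous blocks is not stated, so the random assignment and the function name are assumptions.

```python
import numpy as np

def day_folds(n_days=70, n_folds=5, seed=0):
    """Randomly partition day indices into equally sized folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_days), n_folds)

folds = day_folds()
for k, test_days in enumerate(folds):
    # Train on the other four folds (56 days), test on this one (14 days).
    train_days = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # A model would be fitted on train_days and scored on test_days here.
```

Each day is used for testing exactly once, so the comparison between the fine-grained and aggregated models rests on all 70 days.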

  28. “Disruption” Model: [Diagram components: Smart Card Data; “Natural Regime” Model; Network Structure Data; “Disruption” Model; Disruption Logs; Passenger Route Surveys]

  29. “Disruption” Model: Basic Model:

  30. “Disruption” Model: Results: Average number of exits per minute at Victoria LU station on Tuesday, January 17, 2012. The blue curve represents the 1-min-ahead prediction under the natural regime using the tracking model. Given a disruption from 6:00 PM to 7:00 PM between Victoria station and Brixton station on the Victoria line: - blue horizontal line: the average expected exit rate given by the tracking model under the natural regime, - red horizontal line: the average observed exit count, and - black horizontal line: the prediction given by the disruption model

  31. “Disruption” Model: Assessment: (A) Relative errors for line segment events. The absolute error of the tracking model for line segment disruptions varies from 3.0 (all stations) to 12.2 (stations with 85 tap-outs per minute or more) persons per minute. (B) Relative errors for station events. The absolute error varies from 3.5 (all stations) to 10.5 (stations with 75 tap-outs per minute or more) persons per minute.

  32. Station Sensitivity Index: How sensitive stations are to line closures: Red dots: top 10% by number of tap-outs

  33. Study 2: Predicting Traffic Volumes and Estimating Effects of Disruptions • Contributions: - application of statistical and machine learning techniques to complex systems - good model to describe and predict the effects of disruptions • Limitation: - simplistic

  34. Study 3: Extensions of the Study on the Patterns of Urban Movement • with Michael Batty, Hae Ran Shin, Ricardo Silva and Chen Zhong → Introducing statistical analysis into the study of urban movement patterns

  35. Study 3a: Passenger Travel Distributions Basic Idea:

  36. Study 3a: Passenger Travel Distributions Basic Idea: [Figure: frequency vs. distance distributions for Station A and Station B]

  37. Study 3a: Passenger Travel Distributions Basic Idea: [Figure: frequency vs. distance distributions for Station A and Station B]

  38. Study 3a: Passenger Travel Distributions Some Research Questions: - Do the travel distributions of passengers entering specific stations reveal a more generic pattern? → “local” versus “global” - If a generic pattern exists, how does it relate to urban geography? → “center” versus “periphery”

  39. Study 3b: Passenger Travel Distributions and Geographic Socio-Economic Characteristics Basic Idea: - Correlate passenger travel distributions with geographic socio-economic characteristics such as income, education, age, employment and family composition.

  40. Study 3: Extensions of the Study on the Patterns of Urban Movement • Data: → London and Seoul → Major challenges: - Only one day of data from Seoul - Fine-grained socio-economic data for Seoul

  41. Study 4: Extensions of the Study on the Effects of Disruptions • with Ricardo Silva → Refining the statistical analyses → Ultimate goal: real-time, system-wide assessment of the effects of disruptions

  42. Study 4a: Probabilistic and Causal Approaches Basic Idea:

  43. Study 4a: Probabilistic and Causal Approaches Basic Ideas: - provide a full probabilistic model of movement inside the subway network system - estimate the distribution (instead of only the expectation) of travel times, link loads and exit numbers given a disruption → causal inference

  44. Study 4b: Passenger-level modeling Basic Ideas: - model by taking into account the behaviour of individual travellers, instead of aggregated counts - collect fine-grained passenger movement data using mobile apps

  45. Discussion
