SLIDE 1

PULSE: A Real Time System for Crowd Flow Prediction at Metropolitan Subway Stations

Ermal Toto

  • Prof. Elke A. Rundensteiner
  • Prof. Yanhua Li
SLIDE 2

Worcester Polytechnic Institute

Outline

  • Introduction
  • Challenges
  • State of the art
  • Proposed Solution
  • Experimental Evaluation

SLIDE 3

Urban Population Growth

  • 1960: 1 billion
  • 2014: 3.9 billion

United Nations. (2014). World Urbanization Prospects 2014: Highlights. United Nations Publications.

SLIDE 4

Public Transportation

  • Growing size and complexity of transportation networks
  • Vital to the urbanization process
  • Heterogeneous modes of transport
  • Need for better coordination

Annez, P. C., & Buckley, R. M. (2009). Urbanization and growth: setting the context. Urbanization and growth, 1, 1-45.

SLIDE 5

Subway Transaction Data Model

[Figure panels: Morning, Evening]

Each transaction record has the following fields:

  • ID
  • Timestamp
  • Location
  • Action
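
As a rough illustration of this data model, the four fields could be held in a small record type. This is only a sketch: the names `Transaction`, `Action`, `card_id`, `station`, and the `ENTER`/`EXIT` values are hypothetical, and pairing an entry with the next exit of the same card to form a trip is an assumption, not something the slide states.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    # Hypothetical action values; the slide only names the field.
    ENTER = "enter"
    EXIT = "exit"


@dataclass(frozen=True)
class Transaction:
    card_id: str         # "ID" on the slide
    timestamp: datetime  # "Timestamp"
    station: str         # "Location"
    action: Action       # "Action"


def pair_trips(transactions):
    """Pair each ENTER with the same card's next EXIT to form a trip (assumed rule)."""
    open_entries = {}
    trips = []
    for tx in sorted(transactions, key=lambda t: t.timestamp):
        if tx.action is Action.ENTER:
            open_entries[tx.card_id] = tx
        elif tx.card_id in open_entries:
            start = open_entries.pop(tx.card_id)
            trips.append((start, tx))
    return trips
```

Each resulting pair gives an origin, a destination, and a duration, which is the shape of data the later feature slides rely on.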
SLIDE 6

Outline

  • Introduction
  • Challenges
  • State of the art
  • Proposed Solution
  • Experimental Evaluation

SLIDE 7

Different Traffic Patterns

Different models are needed for different:

  • Stations.
  • Days of the week.
  • Prediction Horizons.
SLIDE 8

High Dimensionality Of Features

  • ML prediction models perform differently depending on traffic characteristics at a location and time.
  • Each location needs a different model.
  • Each model needs streams from multiple locations.

[Figure labels: Subway Stations, Bus Stations]

SLIDE 9

Problem statement.

  • To generate highly accurate localized predictions of human mobility, custom models are needed for each location. This process is complicated by the high dimensionality of data streams in urban transportation networks.

SLIDE 10

Outline

  • Introduction
  • Challenges
  • State of the art
  • Proposed Solution
  • Experimental Evaluation

SLIDE 11

Common Prediction Models

  • ARIMA

Stathopoulos, A., & Karlaftis, M. G. (2003). A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies, 11(2), 121-135.

[Figure: timeline t-3, t-2, t-1, t (current time), t+1 (prediction)]

  • Univariate method.
  • Can be made multivariate by handcrafting transfer functions that model the interaction between different variables in a regression model.
  • Does not scale to many nodes.

SLIDE 12

Common Prediction Models

  • Linear Models

Sun, H., Liu, H. X., Xiao, H., He, R. R., & Ran, B. (2003, January). Short term traffic forecasting using the local linear regression model. In 82nd Annual Meeting of the Transportation Research Board, Washington, DC.

  • K-Nearest Neighbors (KNN)

Clark, S. (2003). Traffic prediction using multivariate nonparametric regression. Journal of Transportation Engineering, 129(2), 161-168.

[Figure: Features (local streams, remote streams, weather, time information, etc.) feed a Prediction Model, which outputs the prediction at t+i]

  • Features are selected manually; therefore, the models are not scalable.

SLIDE 13

Common Prediction Models

  • Random Forest

Hamner, B. (2010, December). Predicting travel times with context-dependent random forests by modeling local and aggregate traffic flow. In Data Mining Workshops (ICDMW), 2010 IEEE International Conference on (pp. 1357-1359). IEEE.

  • Artificial Neural Networks (ANN)

Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2005). Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach. Transportation Research Part C: Emerging Technologies, 13(3), 211-234.

[Figure: Features (local streams, remote streams, weather, time information, etc.) feed a Prediction Model, which outputs the prediction at t+i]

  • Features are selected manually; therefore, the models are not scalable.

SLIDE 14

Generalized

SLIDE 15

Hand Crafted

SLIDE 16

Stream Selection

  • Local Streams
  • All Streams
  • Hand-picked streams.
  • Downstream traffic to upstream locations.
  • No automated methods for stream selection.

SLIDE 17

Outline

  • Introduction
  • Challenges
  • State of the art
  • Proposed Solution
  • Experimental Evaluation

SLIDE 18

PULSE: Framework

  • Multi-layered framework.
  • Customized models:
    ─ Temporal localization
    ─ Spatial localization
    ─ Prediction horizon

SLIDE 19

Streaming Features

  • Time Interval – The operating hours are divided into 15-minute time intervals [1 – 64].
  • Day of the week [1 – 7].
  • Weather – Temperature and humidity during each Time Interval.
  • Local Station arrivals and departures during each Time Interval.
  • Remote Station arrivals and departures during each Time Interval.
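
The two time features above can be derived from a transaction timestamp roughly as follows. This is a minimal sketch under stated assumptions: the slide gives 64 intervals of 15 minutes (16 operating hours) but not the opening time, so `SERVICE_START = 06:00` is a placeholder, and numbering Monday as day 1 is also assumed.

```python
from datetime import datetime, time

INTERVAL_MINUTES = 15
INTERVALS_PER_DAY = 64        # from the slide: intervals are numbered 1..64
SERVICE_START = time(6, 0)    # assumption: the slide does not give the opening time


def time_interval(ts: datetime, service_start: time = SERVICE_START) -> int:
    """Return the 1-based 15-minute interval index of ts within the service day."""
    minutes = (ts.hour - service_start.hour) * 60 + (ts.minute - service_start.minute)
    index = minutes // INTERVAL_MINUTES + 1
    if not 1 <= index <= INTERVALS_PER_DAY:
        raise ValueError(f"{ts} is outside the assumed operating hours")
    return index


def day_of_week(ts: datetime) -> int:
    """Return 1..7 with Monday = 1 (assumed numbering)."""
    return ts.isoweekday()
```

Arrival and departure counts per station can then be aggregated by `(day_of_week, time_interval)` to form the local and remote streams.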

SLIDE 20

Personality Features

  • Average number of arrivals at a station per 15-minute interval.
  • Average duration of trips arriving at a station (from any station).
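
A minimal sketch of how these two averages might be computed from aggregated data. The input tuple layouts and function names are hypothetical; intervals with no arrivals need an explicit zero-count row, otherwise the first average is biased upward.

```python
from collections import defaultdict
from statistics import mean


def average_arrivals_per_interval(arrival_counts):
    """arrival_counts: iterable of (station, day, interval, count) tuples."""
    per_station = defaultdict(list)
    for station, _day, _interval, count in arrival_counts:
        per_station[station].append(count)
    return {station: mean(counts) for station, counts in per_station.items()}


def average_trip_duration(trips):
    """trips: iterable of (origin, destination, duration_minutes) tuples.

    Durations are grouped by destination, regardless of origin.
    """
    per_destination = defaultdict(list)
    for _origin, destination, minutes in trips:
        per_destination[destination].append(minutes)
    return {station: mean(values) for station, values in per_destination.items()}
```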

SLIDE 21

Personality Features

  • Attrition Rate: the ratio of trips that are departing only (do not have a matching return trip).
  • Peak scores: capture peak traffic behaviors during mornings and evenings, for both arrivals and departures. They are defined by the number of local outliers.
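
The attrition rate could be computed along these lines. The slide does not say how a "matching return trip" is identified, so this sketch assumes a departure is matched when the same rider also has a trip arriving at the station on the same day; the tuple layout and function name are hypothetical.

```python
def attrition_rate(trips, station):
    """Share of departures from `station` with no matching return trip.

    trips: list of (rider_id, day, origin, destination) tuples.
    Assumption: a departure is matched when the same rider has a trip
    arriving at `station` on the same day.
    """
    returning = {(rider, day) for rider, day, _origin, dest in trips if dest == station}
    departures = [(rider, day) for rider, day, origin, _dest in trips if origin == station]
    if not departures:
        return 0.0
    unmatched = sum(1 for key in departures if key not in returning)
    return unmatched / len(departures)
```

Under this reading, a residential station with many commuters who leave and come back the same day would score low, while a station used mostly for one-way trips would score high.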

SLIDE 22

Stream Selection: TBSS

Horizon = 4

Time-Based Stream Selection (TBSS) is based on the assumption that future arrivals at a station will come from departures at other stations that are within the prediction horizon.

SLIDE 23

Stream Selection: TBSS

Horizon = 12

Time-Based Stream Selection (TBSS) is based on the assumption that future arrivals at a station will come from departures at other stations that are within the prediction horizon.
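
One way to read the TBSS assumption in code is sketched below. The function name, the `travel_minutes` lookup, the optional cap `k`, and the interpretation of the horizon as a number of 15-minute intervals are all assumptions made for illustration, not details confirmed by the slides.

```python
INTERVAL_MINUTES = 15  # interval length from the Streaming Features slide


def tbss(target, travel_minutes, horizon, k=None):
    """Select remote stations whose departures can reach `target` within the horizon.

    travel_minutes: dict mapping (origin, destination) -> typical travel time in
                    minutes (hypothetical input, e.g. estimated from past trips).
    horizon:        prediction horizon in 15-minute intervals (assumed unit).
    k:              optional cap on the number of selected streams, nearest first.
    """
    limit = horizon * INTERVAL_MINUTES
    reachable = [
        (minutes, origin)
        for (origin, destination), minutes in travel_minutes.items()
        if destination == target and origin != target and minutes <= limit
    ]
    reachable.sort()  # nearest stations first
    selected = [origin for _minutes, origin in reachable]
    return selected if k is None else selected[:k]
```

With a larger horizon, more distant stations fall within the limit, which matches the wider selection shown for Horizon = 12 compared with Horizon = 4.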

SLIDE 24

Stream Selection: FBSS

Flow-Based Stream Selection (FBSS) is based on the assumption that future arrivals at a target station will come from stations with high historical traffic to that station.
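
A minimal sketch of the FBSS idea: rank origin stations by their historical trip count into the target and keep the top `k`. The function name, the input layout, and the use of a fixed `k` as the selection parameter are assumptions for illustration.

```python
from collections import Counter


def fbss(target, trips, k):
    """Return the k origin stations with the most historical trips into `target`.

    trips: iterable of (origin, destination) pairs from historical data.
    """
    flows = Counter(
        origin
        for origin, destination in trips
        if destination == target and origin != target
    )
    return [origin for origin, _count in flows.most_common(k)]
```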

SLIDE 25

Stream Selection: FBSS

[Figure: panels "Departures 262018", "Departures 261017", "Arrivals 268006"; axes: Flow vs. Time Interval; series: Prediction, Actual]

SLIDE 26

Stream Selection: PFDBSSMin

SLIDE 27

Stream Selection

SLIDE 28

Model Selection – Temporal Localization

SLIDE 29

Model Selection – Temporal Localization

  • Personality Features are computed separately for weekdays and weekends.
  • Other temporal localizations exist:
    ─ Days and nights
    ─ Holidays, Fridays, etc.
  • Temporal localization is treated as if it were a different spatial location.
    ─ A location's behavior during weekends is assumed to have no relation to its behavior during weekdays, unless otherwise described by the personality features.

SLIDE 30

Model Selection – Spatial Localization

Brute Force:

  • 118 stations x 2 time periods x 6 prediction horizons x 118 TBSS values x 118 FBSS values x 118 PFBSS values ~ 2.3 billion models.
  • 1 to 15 seconds to train and test each model (6 seconds on average) ~ 443.7 years (for each ML method, therefore x 5).
  • Our system works in parallel and utilizes 30 cores, so it would only need 14.7 years :).
  • What about a network with 10K nodes?

[Figure labels: 126 million years, 14.7 years]
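
The brute-force estimate can be reproduced with a few lines of arithmetic. This sketch assumes a 365-day year and that each of the three stream-selection parameters ranges over as many values as there are stations, as in the 118-station case; the function and parameter names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumed 365-day year


def brute_force_cost(stations, time_periods=2, horizons=6,
                     seconds_per_model=6.0, cores=30):
    """Return (model_count, single_core_years, parallel_years) for an exhaustive search."""
    # TBSS, FBSS, and PFBSS are each assumed to take `stations` possible values.
    models = stations * time_periods * horizons * stations ** 3
    single_core_years = models * seconds_per_model / SECONDS_PER_YEAR
    return models, single_core_years, single_core_years / cores


models, serial_years, parallel_years = brute_force_cost(118)
print(f"{models:,} models, {serial_years:.1f} years on one core, "
      f"{parallel_years:.1f} years on 30 cores")
```

For 118 stations this gives 2,326,533,312 models, in line with the ~2.3 billion quoted above; the year figures land close to the slide's values, with small differences depending on rounding and year length. Because the count grows with the fourth power of the number of stations, exhaustive search quickly becomes infeasible for larger networks.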

SLIDE 31

Model Selection

  • After an initial set of models has been discovered, future models can be quickly looked up from existing models.
  • Further gradient search of the stream selection parameters only slightly improves performance when using a lookup model.
  • Post hoc: using KNN to decide the ML method (a classification task) gives 80% accuracy when trained on 50% of the discovered models and tested on the other 50%.

[Figure: Personality Features, Horizon, and Day of the Week feed a Classification Model Using KNN, which outputs the ML method: KNN, RF, ANN, LM, …]
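
The lookup step can be illustrated with a small nearest-neighbor classifier. This is a sketch, not the system's implementation: it uses plain Euclidean distance and a majority vote, and it assumes the feature vectors (personality features, horizon, day of the week) are numeric and already scaled to comparable ranges. The function name and `k=3` are illustrative choices.

```python
from collections import Counter
import math


def knn_choose_method(query, discovered, k=3):
    """Pick an ML method by majority vote among the k nearest discovered models.

    query:      numeric feature vector for the new station/horizon/day combination.
    discovered: list of (feature_vector, method_name) pairs from earlier searches.
    """
    nearest = sorted(discovered, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(method for _features, method in nearest)
    return votes.most_common(1)[0][0]
```

A call such as `knn_choose_method((0.12, 0.22), discovered)` returns the method label that dominates among the three most similar previously discovered models, so no new search is needed for that location.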

SLIDE 32

Outline

  • Introduction
  • Challenges
  • State of the art
  • Proposed Solution
  • Experimental Evaluation

SLIDE 33

Overall Results

SLIDE 34

Select Stations

SLIDE 35

Future Work

[Figure labels: Subway Stations, Bus Stations]

SLIDE 36

Questions?
