Bike share traffic predictions using machine learning Arnab Kumar - - PowerPoint PPT Presentation

SLIDE 1

Bike share traffic predictions using machine learning

Arnab Kumar Datta

SLIDE 2

Agenda

  • Introduction to bike-sharing
  • Motivation and vision
  • A short introduction to machine learning

  • Overview of software
  • Results
  • Conclusion
SLIDE 3

Bike-sharing

SLIDE 4

Why bike-sharing

SLIDE 5

The problem

SLIDE 6

Above: A customer reviews London’s bike-share system on the TripAdvisor website

SLIDE 7

Above: A customer reviews Washington’s bike-share system on the TripAdvisor website

SLIDE 8

Users currently have real-time systems that show station availability

SLIDE 9

The vision

“I will be downtown at 8 am on Monday. Will the bike station be full?”

SLIDE 10

Related work

  • Data Science for Social Good (predicting bike-share usage in Chicago’s Divvy bike system)
  • Jake VanderPlas (modelling the effects of weather on bike usage in Seattle)

SLIDE 11

Machine learning

SLIDE 12

Training set → Machine learning algorithm → Learned estimator
Test set → Learned estimator → Predictions for test set
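The workflow on this slide matches the fit/predict convention used by scikit-learn estimators: `fit` learns from the training set and `predict` produces values for new rows. As a minimal pure-Python sketch of that convention (not scikit-learn itself), the hypothetical `MeanByGroupEstimator` below predicts the average bike count seen in training for identical feature values and falls back to the overall training mean for unseen combinations. The rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


class MeanByGroupEstimator:
    """Hypothetical baseline estimator (not a scikit-learn class)."""

    def fit(self, X, y):
        # Group training targets by their exact feature combination.
        groups = defaultdict(list)
        for features, target in zip(X, y):
            groups[tuple(features)].append(target)
        # Learned state: one mean per combination, plus a global fallback.
        self.group_means_ = {key: mean(values) for key, values in groups.items()}
        self.global_mean_ = mean(y)
        return self  # scikit-learn convention: fit returns the estimator itself

    def predict(self, X):
        return [self.group_means_.get(tuple(f), self.global_mean_) for f in X]


# Made-up training rows: (day, time, location, weather) -> bikes available.
X_train = [
    ("Tuesday", "08:00", "Downtown", "Sunny"),
    ("Tuesday", "11:00", "Downtown", "Sunny"),
    ("Tuesday", "11:00", "Downtown", "Sunny"),
    ("Tuesday", "08:00", "Downtown", "Rainy"),
]
y_train = [11, 0, 2, 2]

estimator = MeanByGroupEstimator().fit(X_train, y_train)
X_test = [("Tuesday", "11:00", "Downtown", "Sunny"), ("Tuesday", "14:00", "Downtown", "Sunny")]
predictions = estimator.predict(X_test)
print(predictions)  # [1, 3.75]
```

A learned model such as a decision tree generalises to unseen feature combinations instead of falling back to a single global value.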

SLIDE 13

Training set

Day      Time      Location  Weather  Bikes
Tuesday  8:00 AM   Downtown  Sunny    11
Tuesday  11:00 AM  Downtown  Sunny    0
Tuesday  8:00 AM   Downtown  Rainy    2
Tuesday  11:00 AM  Downtown  Sunny    2
Tuesday  1:00 PM   Downtown  Sunny    1

SLIDE 14

Test set

Day      Time      Location  Weather  Bikes
Tuesday  8:00 AM   Downtown  Sunny    11
Tuesday  11:00 AM  Downtown  Sunny    1
Tuesday  8:00 AM   Downtown  Sunny    10
Tuesday  1:00 PM   Downtown  Sunny    2
Tuesday  2:00 PM   Downtown  Sunny    1

SLIDE 15

Software overview

SLIDE 16
SLIDE 17

Libraries used

  • Scikit-learn (machine learning algorithms)
  • Pybikes (data collection), used to collect data from the Washington bike-share system

SLIDE 18

Machine learning algorithms

SLIDE 19

Decision Trees

SLIDE 20

Figure: example decision tree that splits on weather (Sunny / Rainy) and then on time of day (Morning / Noon), with numeric values at the leaves
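A regression tree is grown by repeatedly choosing the split that most reduces the squared error of the targets, and it predicts the mean target of the leaf a sample ends up in. The sketch below assumes categorical features split by equality tests (feature == value), a hypothetical `max_depth` of 2 and made-up bike counts; `build_tree` and `predict_one` are hypothetical names. scikit-learn's `DecisionTreeRegressor` instead splits numeric features on thresholds, so categorical features have to be encoded first.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(samples, depth=0, max_depth=2):
    """samples: list of (features, target) pairs. Returns a nested dict."""
    targets = [t for _, t in samples]
    best = None
    if depth < max_depth and len(set(targets)) > 1:
        for col in range(len(samples[0][0])):
            for value in sorted({f[col] for f, _ in samples}):
                yes = [s for s in samples if s[0][col] == value]
                no = [s for s in samples if s[0][col] != value]
                if not yes or not no:
                    continue  # this test does not actually split the data
                cost = sse([t for _, t in yes]) + sse([t for _, t in no])
                if best is None or cost < best[0]:
                    best = (cost, col, value, yes, no)
    if best is None:
        return {"leaf": mean(targets)}  # leaf predicts the mean target
    _, col, value, yes, no = best
    return {
        "col": col,
        "value": value,
        "yes": build_tree(yes, depth + 1, max_depth),
        "no": build_tree(no, depth + 1, max_depth),
    }


def predict_one(tree, features):
    # Follow the equality tests down to a leaf.
    while "leaf" not in tree:
        tree = tree["yes"] if features[tree["col"]] == tree["value"] else tree["no"]
    return tree["leaf"]


# Made-up observations: ((weather, time of day), bikes available).
samples = [
    (("Sunny", "Morning"), 10), (("Sunny", "Morning"), 11),
    (("Sunny", "Noon"), 0), (("Sunny", "Noon"), 1),
    (("Rainy", "Morning"), 2), (("Rainy", "Morning"), 3),
    (("Rainy", "Noon"), 0), (("Rainy", "Noon"), 0),
]
tree = build_tree(samples)
print(predict_one(tree, ("Sunny", "Morning")))  # 10.5
```

On this data the first split is on time of day, because separating Morning from Noon removes more squared error than separating Sunny from Rainy.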

SLIDE 21

Random Forests

SLIDE 22

Random Forests

  • Lots of decision trees
  • Output is the average of the outputs of all trees in the forest
  • Adding more trees does not cause overfitting (note: RF can still overfit noisy datasets when there are too few trees!)
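A random forest trains each tree on a bootstrap sample (rows drawn with replacement) and, at every split, considers only a random subset of the features; the forest's prediction is the average of the trees' predictions. A self-contained sketch, assuming equality-split regression trees on categorical features, one randomly chosen feature per split (`max_features=1`), 50 trees and made-up data. The function names are hypothetical.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(samples, rng, max_features, depth=0, max_depth=3):
    """Regression tree on categorical features; each split looks at a random subset of features."""
    targets = [t for _, t in samples]
    best = None
    if depth < max_depth and len(set(targets)) > 1:
        n_features = len(samples[0][0])
        for col in rng.sample(range(n_features), max_features):
            for value in sorted({f[col] for f, _ in samples}):
                yes = [s for s in samples if s[0][col] == value]
                no = [s for s in samples if s[0][col] != value]
                if yes and no:
                    cost = sse([t for _, t in yes]) + sse([t for _, t in no])
                    if best is None or cost < best[0]:
                        best = (cost, col, value, yes, no)
    if best is None:
        return {"leaf": mean(targets)}
    _, col, value, yes, no = best
    return {
        "col": col,
        "value": value,
        "yes": build_tree(yes, rng, max_features, depth + 1, max_depth),
        "no": build_tree(no, rng, max_features, depth + 1, max_depth),
    }


def predict_tree(tree, features):
    while "leaf" not in tree:
        tree = tree["yes"] if features[tree["col"]] == tree["value"] else tree["no"]
    return tree["leaf"]


def fit_forest(samples, n_trees=50, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: as many rows as the original, drawn with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        forest.append(build_tree(bootstrap, rng, max_features))
    return forest


def predict_forest(forest, features):
    # The forest's output is the average of the individual trees' outputs.
    return mean(predict_tree(tree, features) for tree in forest)


# Made-up observations: ((weather, time of day), bikes available).
samples = [
    (("Sunny", "Morning"), 10), (("Sunny", "Morning"), 11),
    (("Sunny", "Noon"), 0), (("Sunny", "Noon"), 1),
    (("Rainy", "Morning"), 2), (("Rainy", "Morning"), 3),
    (("Rainy", "Noon"), 0), (("Rainy", "Noon"), 0),
]
forest = fit_forest(samples)
print(predict_forest(forest, ("Sunny", "Morning")), predict_forest(forest, ("Rainy", "Noon")))
```

Because every tree sees a slightly different sample and feature choice, their errors partly cancel when averaged.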

SLIDE 23

AdaBoost

SLIDE 24

AdaBoost

  • Analogy: a student preparing for a physics exam
  • Topics covered: classical physics, thermodynamics, electromagnetism, quantum physics
  • They start by doing a practice exam
  • They notice they didn’t do well on electromagnetism, so they concentrate on it (giving the other topics less attention) until they grasp it
  • They do another practice exam
  • Repeat… until it’s time for the exam
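In AdaBoost terms, each practice exam is a weak learner fitted to weighted training data, and revising the weak topic corresponds to raising the weights of the samples that learner predicted badly. scikit-learn's `AdaBoostRegressor` implements Drucker's AdaBoost.R2; the sketch below follows AdaBoost.R2 with the linear loss, assuming a weighted one-feature decision stump as the weak learner and using the sample weights directly rather than resampling. The function names and data are hypothetical.

```python
import math


def fit_stump(xs, ys, weights):
    """Weighted regression stump on one numeric feature: a threshold and two weighted means."""

    def weighted_mean(indices):
        total = sum(weights[i] for i in indices)
        return sum(weights[i] * ys[i] for i in indices) / total

    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [i for i, x in enumerate(xs) if x <= threshold]
        right = [i for i, x in enumerate(xs) if x > threshold]
        lm, rm = weighted_mean(left), weighted_mean(right)
        err = sum(w * (y - (lm if x <= threshold else rm)) ** 2 for x, y, w in zip(xs, ys, weights))
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    if best is None:  # every x is identical: predict a constant
        constant = weighted_mean(range(len(xs)))
        return lambda x: constant
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def adaboost_r2(xs, ys, n_rounds=10):
    """AdaBoost.R2 with linear loss; returns a list of (predict_fn, learner_weight)."""
    n = len(xs)
    weights = [1.0 / n] * n
    learners = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, weights)
        errors = [abs(y - stump(x)) for x, y in zip(xs, ys)]
        max_err = max(errors)
        if max_err == 0:  # perfect fit: keep this learner and stop
            learners.append((stump, 1.0))
            break
        losses = [e / max_err for e in errors]  # linear loss, scaled to [0, 1]
        avg_loss = sum(w * loss for w, loss in zip(weights, losses))
        if avg_loss >= 0.5:  # too weak: stop (keep it only if it is the first learner)
            if not learners:
                learners.append((stump, 1.0))
            break
        beta = avg_loss / (1 - avg_loss)
        learners.append((stump, math.log(1 / beta)))
        # Samples with small loss shrink the most, so badly predicted ones gain relative weight.
        weights = [w * beta ** (1 - loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learners


def predict(learners, x):
    """Weighted median of the learners' predictions (the AdaBoost.R2 combination rule)."""
    preds = sorted((fn(x), w) for fn, w in learners)
    half = 0.5 * sum(w for _, w in preds)
    cumulative = 0.0
    for value, w in preds:
        cumulative += w
        if cumulative >= half:
            return value


# Made-up data: hour of day -> bikes available at one station.
hours = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
bikes = [3, 9, 11, 10, 4, 2, 1, 2, 1, 0]
model = adaboost_r2(hours, bikes)
print(predict(model, 8), predict(model, 14))
```

Each stump's vote is weighted by log(1/β), so stumps with a low weighted loss count more in the final weighted median.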
SLIDE 25

Thesis contribution

SLIDE 26

Data collection using Pybikes

SLIDE 27

Feature selection

SLIDE 28
SLIDE 29
SLIDE 30
SLIDE 31

Why is the “epoch” so important?

It stands in for a time-related feature that the model does not otherwise account for.
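The “epoch” is presumably the observation time expressed as seconds since the Unix epoch (1970-01-01 00:00 UTC). Because it grows steadily, a tree can split on it to separate earlier from later periods, which lets it absorb trends or seasonal effects that no other feature captures. A sketch of deriving it next to the usual calendar features, assuming hypothetical feature names and a UTC timestamp (a local time zone would shift `hour` and possibly `weekday`):

```python
from datetime import datetime, timezone


def time_features(ts: datetime) -> dict:
    """Hypothetical time features for one observation.

    Pass a timezone-aware datetime; naive datetimes are interpreted as local time by timestamp().
    """
    return {
        "epoch": int(ts.timestamp()),  # seconds since 1970-01-01 00:00 UTC
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "month": ts.month,
    }


features = time_features(datetime(2015, 3, 2, 8, 0, tzinfo=timezone.utc))
print(features)
```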

SLIDE 32

Genetic algorithms

  • Hyperparameters: the algorithm’s configuration
  • A GA can be used to pick the “optimal” feature set, i.e. the one that gives the best prediction performance
  • GAs did not improve the accuracy over manually picked hyperparameters
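A genetic algorithm can search over feature sets encoded as bit masks (1 = feature used): it keeps the fittest masks, combines pairs of them by single-point crossover and flips random bits by mutation. A minimal sketch, assuming hypothetical feature names and a toy fitness function; in a real run the fitness would be something like the negated validation RMSE of a model trained on the selected features, which is far more expensive to evaluate.

```python
import random

FEATURES = ["epoch", "weekday", "hour", "temperature", "rain", "station_id"]  # hypothetical


def toy_fitness(mask):
    """Hypothetical stand-in for model quality: rewards some features, penalises feature count."""
    chosen = {name for name, bit in zip(FEATURES, mask) if bit}
    return 2 * ("hour" in chosen) + 2 * ("weekday" in chosen) + ("epoch" in chosen) - 0.1 * len(chosen)


def genetic_search(fitness, n_bits, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection with elitism: the better half survives unchanged and acts as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = genetic_search(toy_fitness, len(FEATURES))
print([name for name, bit in zip(FEATURES, best) if bit])
```

Keeping the better half unchanged guarantees that the best mask found so far is never lost between generations.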

SLIDE 33

Results

SLIDE 34

A customizable machine-learning package for predicting bike-share usage

SLIDE 35
SLIDE 36
SLIDE 37
SLIDE 38

Improvements on existing solutions?

SLIDE 39

Error metric: RMSE (root-mean-square error)
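RMSE is the square root of the mean squared difference between observed and predicted values, so it is expressed in bikes and penalises large errors more heavily than small ones. A direct implementation, with made-up values in the example:

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error: sqrt(mean((y_i - y_hat_i) ** 2))."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


print(round(rmse([11, 0, 2], [10, 1, 2]), 3))  # 0.816
```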

SLIDE 40

Improvement

Figure: error (RMSE, axis up to 7) of the Poisson model (DSSG), Decision Tree Regressor, Random Forest Regressor and AdaBoost Regressor

SLIDE 41

Further work

SLIDE 42

The vision

“I will be downtown at 8 am on Monday. Will the bike station be full?”

SLIDE 43
SLIDE 44
SLIDE 45
SLIDE 46