JONATHAN FÜRST1, MAURICIO FADEL ARGERICH1, KALYANARAMAN SHANKARI2, GÜRKAN SOLMAZ1, BIN CHENG1
JONATHAN.FUERST@NECLAB.EU, MAURICIO.FADEL@NECLAB.EU, BIN.CHENG@NECLAB.EU 1NEC LABS EUROPE, 2UC BERKELEY
APPLYING WEAK SUPERVISION TO MOBILE SENSOR DATA: EXPERIENCES WITH TRANSPORT MODE DETECTION
[Figure: data collected at different times (t0, t1) and locations (La, Lb) with varying data quality; external knowledge, labeled data, ML model]
Citizens get transport recommendations
City gets an aggregated view of transportation
Location of users (start, trajectory and end point) → TGGPS
Transport mode of user → Manually labeled? → Can we infer it?
[Figure: Individual Travel Insights; Overall Travel Insights]
1. Segmentation
2. Classification
Data has to be labeled manually
Training sets are small (guess why), so the model overfits
Data quality vs. battery and OS limitations

Data is labeled semi-automatically
Training set can be much larger, less overfitting
The more data, the less data quality is needed
[Figure: system pipeline with four stages: Smartphone Data Collection, Pre-processing Phase, Label & Training Phase, Classification Phase]
Components shown: user smartphone and APIs; filtering and resampling; trip segmentation (dwell-time heuristic); section candidate segmentation (accelerometer-supported walk point segmentation); labeling functions F1([s1, s2 ... si]), F2([s1, s2 ... si]), F3([s1, s2 ... si]), Fn([s1, s2 ... si]); learn generative model; train transport mode classifier; transport mode classification; re-segment sections based on classified modes; classified trips & sections
Sensing location using GPS
Accelerometer and barometer → high frequency
Highly optimized for battery consumption
BUT, sparse and noisy sensor data
1. Filtering and resampling
  Sparsity
  Location and activity data are not aligned
  No fixed sampling interval
2. Trip segmentation
  Dwell-time heuristics
3. Section segmentation
  Walk-point-based
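A dwell-time heuristic for trip segmentation could be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Fix` type, the `split_trips` name, and the thresholds `radius_m=100.0` and `dwell_s=300.0` are all assumptions chosen for the example. A trip is closed when the user leaves a spot after having stayed within `radius_m` of it for at least `dwell_s` seconds.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fix:
    """One location sample (hypothetical structure)."""
    t: float    # timestamp in seconds
    lat: float  # degrees
    lon: float  # degrees


def distance_m(a: Fix, b: Fix) -> float:
    """Equirectangular approximation, adequate for short distances."""
    r = 6371000.0
    x = math.radians(b.lon - a.lon) * math.cos(math.radians((a.lat + b.lat) / 2))
    y = math.radians(b.lat - a.lat)
    return r * math.hypot(x, y)


def split_trips(fixes: List[Fix],
                radius_m: float = 100.0,   # assumed dwell radius
                dwell_s: float = 300.0     # assumed minimum dwell time
                ) -> List[List[Fix]]:
    """Split a time-ordered list of fixes into trips at dwell points."""
    trips: List[List[Fix]] = []
    current: List[Fix] = []
    anchor: Optional[Fix] = None  # last fix at which the user was seen to move
    for fix in fixes:
        if anchor is None:
            anchor = fix
        elif distance_m(anchor, fix) > radius_m:
            # The user left the anchor area; if they had stayed there long
            # enough, the previous trip ended at the dwell.
            if fix.t - anchor.t >= dwell_s and current:
                trips.append(current)
                current = []
            anchor = fix
        current.append(fix)
    if current:
        trips.append(current)
    return trips
```

Because the check uses elapsed time since the anchor, the same rule also covers the sparse case in which the phone reports no fixes at all while the user is stationary.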
[Figure: location and activity timelines split into trips (Trip 1, Trip 2) and segments (Segment 1, Segment 2)]
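Walk-point-based section segmentation can be illustrated with a small sketch. It assumes, as walk-point approaches commonly do, that changes of transport mode are separated by stretches of walking. The activity label `"walking"`, the function name `split_sections`, and the noise threshold `min_walk=3` are hypothetical; the slides do not specify them.

```python
from itertools import groupby
from typing import List, Tuple


def split_sections(activities: List[str],
                   min_walk: int = 3  # assumed minimum length of a real walk
                   ) -> List[Tuple[int, int, bool]]:
    """Split one trip into sections at walking runs.

    Returns (first_index, last_index, is_walk) for each section.
    """
    # 1. Mark each sample as walking or not.
    flags = [a == "walking" for a in activities]

    # 2. Treat walking runs shorter than min_walk as noise.
    #    Iterate over a copy so the in-place edit does not disturb groupby.
    i = 0
    for is_walk, run in groupby(list(flags)):
        n = len(list(run))
        if is_walk and n < min_walk:
            flags[i:i + n] = [False] * n
        i += n

    # 3. Emit one section per run of identical flags.
    sections = []
    i = 0
    for is_walk, run in groupby(flags):
        n = len(list(run))
        sections.append((i, i + n - 1, is_walk))
        i += n
    return sections
```

The non-walking sections produced this way are the candidates that a classifier would later label as bike, car, or train.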
Labeling functions
Programmable functions
Use external knowledge
Cast a (noisy) vote on each data point
“if the maximum speed of a segment is less than 3 m/s, then it’s probably a walking segment”
“instead, if it’s higher than 3 m/s but less than 10 m/s, then it’s probably a bike segment”
...
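The two quoted rules can be written as labeling functions. The sketch below is a plain-Python illustration under assumptions: the constants `ABSTAIN`, `WALK`, `BIKE` and the function names are invented for the example, and a segment is represented simply as a list of speeds in m/s. Only the 3 m/s and 10 m/s thresholds come from the slide.

```python
from typing import Sequence

# Hypothetical label encoding; ABSTAIN means "this function casts no vote".
ABSTAIN, WALK, BIKE = -1, 0, 1


def lf_walk_speed(speeds: Sequence[float]) -> int:
    """Vote WALK if the segment's maximum speed is below 3 m/s."""
    if not speeds:
        return ABSTAIN
    return WALK if max(speeds) < 3.0 else ABSTAIN


def lf_bike_speed(speeds: Sequence[float]) -> int:
    """Vote BIKE if the maximum speed is above 3 m/s but below 10 m/s."""
    if not speeds:
        return ABSTAIN
    return BIKE if 3.0 < max(speeds) < 10.0 else ABSTAIN


def apply_lfs(segments, lfs=(lf_walk_speed, lf_bike_speed)):
    """Return one row of votes per segment, one column per labeling function."""
    return [[lf(seg) for lf in lfs] for seg in segments]
```

Each function either votes for one mode or abstains, so a single rule never has to cover every case.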
[Figure: labeling functions 1 to 3 applied to data points 1 to 4]
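To show how several noisy votes per data point can be turned into a single label, the sketch below uses a simple majority vote. This is a deliberately simpler stand-in, not the generative model named in the pipeline, which would additionally estimate how reliable each labeling function is. The `ABSTAIN` value and the function names are assumptions.

```python
from collections import Counter
from typing import List, Sequence

ABSTAIN = -1  # hypothetical encoding for "no vote"


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most frequent non-abstaining vote, or ABSTAIN on a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top, n), *rest = counts.most_common(2)
    if rest and rest[0][1] == n:
        return ABSTAIN  # two labels share the highest count
    return top


def label_all(label_matrix: List[Sequence[int]]) -> List[int]:
    """Apply majority_vote to each row (one row per data point)."""
    return [majority_vote(row) for row in label_matrix]
```

Data points that end up as `ABSTAIN` would simply be left out of the training set in this scheme.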
8 users collected data for 4 months: 300k data points
Features
  GPS location (through iOS and Android Location API)
  Accelerometer-based activity data (through Activity API)
Users partially labeled data using a visual labeling tool
4 transport modes: walk, bike, car, train
Train/test split: 50/50
Sensed speed (S)
Velocity (V), calculated from GPS
OpenStreetMap (OSM), to check train stops
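Two of these signals lend themselves to a short sketch: velocity derived from consecutive GPS fixes, and a proximity check against train stops. The code below is an illustration under assumptions. Fixes are taken to be `(t, lat, lon)` tuples, the `stops` list is a hypothetical set of coordinates that would have been extracted from OpenStreetMap beforehand, and `radius_m=150.0` is an arbitrary example threshold.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def velocity_mps(fix_a: Tuple[float, float, float],
                 fix_b: Tuple[float, float, float]) -> Optional[float]:
    """Average speed in m/s between two (t, lat, lon) fixes; None if dt <= 0."""
    dt = fix_b[0] - fix_a[0]
    if dt <= 0:
        return None
    return haversine_m(fix_a[1], fix_a[2], fix_b[1], fix_b[2]) / dt


def near_train_stop(lat: float, lon: float,
                    stops: Iterable[Tuple[float, float]],
                    radius_m: float = 150.0) -> bool:
    """True if any stop (lat, lon) lies within radius_m of the given point."""
    return any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m
               for s_lat, s_lon in stops)
```

A labeling function for the train mode could, for instance, vote only when both the start and the end of a segment are near a stop.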
Labeling function sources    Result
V                            64.00%
V+S                          70.35%
V+S+A                        72.40%
V+S+A+OSM                    74.10%
Source tags shown in the figure: (V) (V) (V) (A) (OSM) (S) (S)
Model        Result
GEN.MODEL    74.10%
WS-RF        80.20%
WS-NN        78.40%
SUP-RF       81.00%
Domain/Expert knowledge
Physical knowledge