Structured Deep Learning of Human Motion
Christian Wolf Fabien Baradel Natalia Neverova Julien Mille Graham W. Taylor Greg Mori
2
Gesture recognition
Pose estimation
Recognition of individual activities & interactions
Recognition of group activities
3
[Neverova, Wolf, Taylor, Nebout, CVIU 2017]
4
Joint positions (NYU Dataset)
Synthetic data (part segmentation)
Graham W. Taylor, University of Guelph, Canada
Natalia Neverova, PhD @ LIRIS, now at Facebook
Florian Nebout, Awabot
Christian Wolf, LIRIS, INSA-Lyon
5
[Fourure, Emonet, Fromont, Muselet, Tremeau, Wolf, BMVC 2017] Damien Fourure
6
Unconstrained internet/YouTube videos: no acquisition. E.g. YouTube-8M dataset: 7M videos, 4716 classes, ~3.4 labels per video, > 1 PB of data.
Videos with human activities, from YouTube: no acquisition. E.g. ActivityNet/Kinetics dataset: ~300k videos, 400 classes.
Human activities shot with depth sensors: acquisition is time consuming! E.g. NTU RGB-D dataset, MSR dataset, ChaLearn/Montalbano dataset, etc.
7
Deep Learning (Global)
(Mostly after 2012)
[Baccouche, Mamalet, Wolf, Garcia, Baskurt, HBU 2011] [Baccouche, Mamalet, Wolf, Garcia, Baskurt, BMVC 2012]
[Carreira and Zisserman, CVPR 2017] [Ji et al., ICML 2010]
8
9
[Neverova, Wolf, Taylor, Nebout, PAMI 2016] [Baradel, Wolf, Mille, Taylor, BMVC 2018]
10
11
12
Frame from the NTU RGB-D Dataset
13
Local representations
(Before 2012)
[Felzenszwalb et al., PAMI 2010]
Local appearance Deformation
14
Visual recognition
(activities, gestures,
Deep Learning
Structured and semi-structured models
Structured deep learning
Representation learning
Local context
Complex relationships,
Global context
15
[Johansson, Holsanova, Dewhurst, Holmqvist, 2012]
16
[Song et al., AAAI 2016]
Attention on joints
[Sharma et al., ICLR 2016] [Mnih et al., NIPS 2015]
Hard attention
Soft attention in feature maps
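The difference between the two attention types can be illustrated with a minimal sketch in plain Python. It is not taken from the cited papers: the function names are hypothetical, and hard attention is simplified here to an argmax, whereas published hard-attention models typically sample a location and train with reinforcement learning.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Soft attention: weighted average of all feature vectors.

    features: list of N feature vectors (one per location or joint)
    scores:   list of N unnormalized attention scores
    Every location contributes, so the result is differentiable
    with respect to the scores.
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


def hard_attention(features, scores):
    """Hard attention (simplified assumption): keep only the best location.

    The argmax is not differentiable, which is why hard attention
    usually needs a different training scheme than backpropagation alone.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return features[best], best
```

With equal scores, `soft_attention` reduces to plain average pooling; as one score dominates, it approaches the output of `hard_attention`.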
Local representations (before 2012)
Deep Learning (global) (mostly after 2012)
Deep Learning (attention maps) (~2016)
Deep Learning (local representations)
17
1. Learn where to attend
2. Learn how to track attended points
3. Learn how to recognize from a local distributed representation
[Baradel, Wolf, Mille, Taylor, CVPR 2018]
Recognize activity
18
RGB input video
Time
Feature space
3D global model: Inflated ResNet-50
Time
[Baradel, Wolf, Mille, Taylor, CVPR 2018]
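"Inflating" a 2D network into a 3D one is commonly done by repeating each 2D filter along the time axis and rescaling it, as described in [Carreira and Zisserman, CVPR 2017]. The sketch below shows this for a single kernel in plain Python. The function names are hypothetical and the slides do not specify how the ResNet-50 was inflated, so this is an illustration of the general idea only.

```python
def inflate_kernel(kernel_2d, time_depth):
    """Inflate a 2D kernel (H x W nested lists) into a 3D kernel (T x H x W).

    Each temporal slice is the 2D kernel divided by T, so that a video made
    of T identical frames gives the same response as the original 2D kernel
    gives on a single frame.
    """
    return [[[w / time_depth for w in row] for row in kernel_2d]
            for _ in range(time_depth)]


def response_2d(kernel, patch):
    """Response of a 2D kernel on an image patch of the same size."""
    return sum(k * p
               for k_row, p_row in zip(kernel, patch)
               for k, p in zip(k_row, p_row))


def response_3d(kernel, clip):
    """Response of a 3D kernel on a clip (T patches of the same size)."""
    return sum(response_2d(k_slice, frame)
               for k_slice, frame in zip(kernel, clip))


# A static clip: the same frame repeated T times.
kernel = [[1.0, 2.0], [3.0, 4.0]]
frame = [[1.0, 1.0], [1.0, 1.0]]
T = 4
static_clip = [frame for _ in range(T)]
inflated = inflate_kernel(kernel, T)
```

Here `response_3d(inflated, static_clip)` matches `response_2d(kernel, frame)`, which is the property that lets image-pretrained weights serve as an initialization for the video model.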
19
[Baradel, Wolf, Mille, Taylor, CVPR 2018]
Frame context
Hidden state from recurrent recognizers (workers)
"Differentiable crop" (Spatial Transformer Network)
Time
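A "differentiable crop" in the spirit of a Spatial Transformer Network resamples a window of the input with bilinear interpolation, so the output varies smoothly with the window position and size. The following is a minimal forward-pass sketch in plain Python. The parameterization (center, width, height) and the function names are assumptions for illustration, not the formulation used in the cited work, and no gradients are computed here.

```python
def bilinear_sample(image, x, y):
    """Sample a 2D grid (list of rows) at a continuous position (x, y)."""
    h, w = len(image), len(image[0])
    # Clamp to the valid range so crops may touch the border.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def differentiable_crop(image, cx, cy, width, height, out_size):
    """Resample the window centered at (cx, cy) onto an out_size x out_size grid.

    Because every output value is an interpolation of neighboring pixels,
    it changes continuously with cx, cy, width and height; in an autodiff
    framework this is what allows the crop parameters to be learned.
    """
    crop = []
    for i in range(out_size):
        v = (i + 0.5) / out_size - 0.5  # vertical offset in [-0.5, 0.5]
        row = []
        for j in range(out_size):
            u = (j + 0.5) / out_size - 0.5  # horizontal offset
            row.append(bilinear_sample(image, cx + u * width, cy + v * height))
        crop.append(row)
    return crop


# Toy "image" whose value is x + 10 * y, cropped around its center.
image = [[x + 10 * y for x in range(4)] for y in range(4)]
glimpse = differentiable_crop(image, cx=1.5, cy=1.5, width=2.0, height=2.0,
                              out_size=2)
```

Shifting `cx` or `cy` by a fraction of a pixel changes `glimpse` by a correspondingly small amount, unlike an integer-indexed crop.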
20
[Figure labels: RGB input video; 3D global model: Inflated ResNet-50; spatial attention process; unconstrained attention in feature space; distributed tracking/recognition; workers r1, r2, r3; h; time axis]
21
22
[Baradel, Wolf, Mille, Taylor (under review)]
CNN
Time
State-of-the-art results on two datasets, NTU and N-UCLA.
Larger difference between Glimpse Clouds and the global model on N-UCLA.
[Baradel, Wolf, Mille, Taylor, CVPR 2018]
23
[Baradel, Wolf, Mille, Taylor (under review)]
[Baradel, Wolf, Mille, Taylor, CVPR 2018]
24
[Baradel, Wolf, Mille, Taylor, BMVC 2018]
25
Border cells Head direction
26
27
[Cueva, Wei, ICLR 2018]
28
[Cueva, Wei, ICLR 2018]
29
30
31
32
33
Estimating semantics from low-level information (Vision & Learning)
Estimating causal relationships from data
Reasoning: Logic + Statistics
34
Fabien Baradel, PhD @ LIRIS, INSA-Lyon
Christian Wolf, INRIA Chroma
Julien Mille, LI, INSA VdL
Natalia Neverova, Facebook AI Research, Paris
Greg Mori, Simon Fraser University, Canada
[Baradel, Neverova, Wolf, Mille, Mori, ECCV 2018]
35
[Baradel, Neverova, Wolf, Mille, Mori, ECCV 2018]
36
[Baradel, Neverova, Wolf, Mille, Mori, ECCV 2018]
37
38
39
Something-Something dataset
VLOG dataset
EPIC-Kitchens dataset
40
– A cloud of unconstrained feature points
– Interactions between spatially well-defined objects