Structured Deep Learning of Human Motion (PowerPoint PPT Presentation)


SLIDE 1

Structured Deep Learning of Human Motion

Christian Wolf Fabien Baradel Natalia Neverova Julien Mille Graham W. Taylor Greg Mori

SLIDE 2

Gesture recognition

Deep Learning of Human Motion

Pose estimation. Recognition of individual activities & interactions. Recognition of group activities.

SLIDE 3

[Neverova, Wolf, Taylor, Nebout, CVIU 2017]

SLIDE 4

Combining real and simulated data

Joint positions (NYU Dataset) Synthetic data (part segmentation)

Graham W. Taylor, University of Guelph, Canada · Natalia Neverova, PhD @ LIRIS, now at Facebook · Florian Nebout, Awabot · Christian Wolf, LIRIS, INSA-Lyon

SLIDE 5

Semantic Segmentation with Grid Networks

[Fourure, Emonet, Fromont, Muselet, Trémeau, Wolf, BMVC 2017]

Damien Fourure, E. Fromont, R. Emonet, A. Trémeau, D. Muselet, C. Wolf
SLIDE 6

Activity recognition

Unconstrained internet/YouTube videos: no acquisition. E.g. YouTube-8M dataset: 7M videos, 4716 classes, ~3.4 labels per video, >1 PB of data.

Videos with human activities, from YouTube: no acquisition. E.g. ActivityNet/Kinetics datasets: ~300k videos, 400 classes.

Human activities shot with depth sensors: acquisition is time consuming! E.g. NTU RGB-D dataset, MSR dataset, ChaLearn/Montalbano dataset, etc.

SLIDE 7

Deep Learning is mostly based on global models.

Deep Learning (Global)

(Mostly after 2012)

[Baccouche, Mamalet, Wolf, Garcia, Baskurt, HBU 2011] [Baccouche, Mamalet, Wolf, Garcia, Baskurt, BMVC 2012]

[Carreira and Zisserman, CVPR 2017] [Ji et al., ICML 2010]

SLIDE 8

The role of articulated pose

Reading Writing

SLIDE 9

The role of articulated pose

Appearance is helpful Reading Writing

[Neverova, Wolf, Taylor, Nebout, PAMI 2016] [Baradel, Wolf, Mille, Taylor, BMVC 2018]

SLIDE 10

Context

We need to attend to places which are not always determined by pose

SLIDE 11

Context

We need to attend to places which are not always determined by pose

SLIDE 12

Context

Frame from the NTU RGB-D Dataset

SLIDE 13

Local representations

(Before 2012)

Images, objects and activities have often been represented as collections of local features, e.g. through DPMs (Deformable Part Models).

[Felzenszwalb et al., PAMI 2010]

Local appearance + deformation

SLIDE 14

Visual recognition (activities, gestures, objects)

Deep Learning + structured and semi-structured models → structured deep learning (representation learning)

Local context, global context, complex relationships

[Diagram: structured deep learning combining local features l1, l2, l4 with factors F, F2, F4]

SLIDE 15

Human attention: gaze patterns

[Johansson, Holsanova, Dewhurst, Holmqvist, 2012]

SLIDE 16

[Song et al., AAAI 2016]

Attention on joints

[Sharma et al., ICLR 2016] [Mnih et al., NIPS 2015]

Hard attention Soft attention in feature maps
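The hard/soft distinction can be illustrated with a minimal numpy sketch (the function names, joint features, and scores below are made up for the example, not the cited models): soft attention takes a differentiable softmax-weighted sum over per-joint features, while hard attention selects a single joint and is not differentiable.

```python
import numpy as np

def soft_attention(features, scores):
    """Soft attention: differentiable convex combination of joint features."""
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # softmax weights over joints
    return w @ features                # weighted sum, shape (d,)

def hard_attention(features, scores):
    """Hard attention: pick the single highest-scoring joint (non-differentiable)."""
    return features[np.argmax(scores)]

# Toy example: 4 joints with 3-dimensional features each (values are illustrative).
feats = np.array([[1., 0., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.],
                  [1., 1., 1.]])
scores = np.array([0.1, 2.0, 0.1, 0.1])   # joint 1 scores highest

soft = soft_attention(feats, scores)   # blend dominated by joint 1
hard = hard_attention(feats, scores)   # exactly joint 1's feature
```

With uniform scores, soft attention degenerates to average pooling over the joints, which is why it is often seen as a learnable generalization of global pooling.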

Timeline: local representations (before 2012) → deep learning, global (mostly after 2012) → deep learning, attention maps (~2016) → deep learning, local representations

SLIDE 17


Objective: fully trainable high-capacity local representations

1. Learn where to attend
2. Learn how to track attended points
3. Learn how to recognize from a local distributed representation

[Baradel, Wolf, Mille, Taylor, CVPR 2018]

Timeline: local representations (before 2012) → deep learning, global (mostly after 2012) → deep learning, attention maps (~2016) → deep learning, local representations

SLIDE 18

Attention in feature space

[Diagram: RGB input video over time → feature space; 3D global model: Inflated ResNet-50]

[Baradel, Wolf, Mille, Taylor, CVPR 2018]
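Attention in feature space can be sketched as softmax-weighted pooling over the spatial positions of a (C, H, W) feature tensor. This is a hedged toy version, not the CVPR 2018 model itself; the function name and all shapes are illustrative.

```python
import numpy as np

def attend(feature_map, attn_logits):
    """Pool a (C, H, W) feature map with a spatial soft-attention map."""
    a = np.exp(attn_logits - attn_logits.max())
    a /= a.sum()                                # (H, W) weights summing to 1
    return (feature_map * a).sum(axis=(1, 2))   # attended feature, shape (C,)

# Toy 2-channel, 3x3 feature map (values are illustrative).
fm = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
logits = np.zeros((3, 3))   # uniform attention == global average pooling
v = attend(fm, logits)
```

Uniform logits recover global average pooling; a sharply peaked map instead reads out the feature vector at a single spatial position, so the network can interpolate between global and local evidence.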

SLIDE 19

Unconstrained differentiable attention

[Baradel, Wolf, Mille, Taylor, CVPR 2018]

Frame context; hidden state from recurrent recognizers (workers); "differentiable crop" (Spatial Transformer Network)

Time
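The "differentiable crop" idea can be sketched with bilinear sampling on a regular grid, as in Spatial Transformer Networks: each output pixel is a smooth function of the crop's center and scale, so gradients can flow back into the attention process. The function name and parameters below are assumptions for illustration; the actual model operates on feature maps, not raw images.

```python
import numpy as np

def differentiable_crop(img, center_y, center_x, scale, out=5):
    """Bilinearly sample an out x out glimpse around (center_y, center_x)."""
    h, w = img.shape
    # Sampling grid in image coordinates, centred on the attended point.
    ys = np.clip(center_y + scale * np.linspace(-1, 1, out) * (h - 1) / 2, 0, h - 1)
    xs = np.clip(center_x + scale * np.linspace(-1, 1, out) * (w - 1) / 2, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # fractional parts = bilinear weights
    wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * img[np.ix_(y0, x0)]
            + (1 - wy) * wx * img[np.ix_(y0, x1)]
            + wy * (1 - wx) * img[np.ix_(y1, x0)]
            + wy * wx * img[np.ix_(y1, x1)])

img = np.arange(25, dtype=float).reshape(5, 5)
glimpse = differentiable_crop(img, center_y=2, center_x=2, scale=1.0, out=5)
```

At scale 1 centred on the image, the sampling grid hits the pixel centres exactly and the crop reproduces the input; shrinking the scale zooms the glimpse in on the attended point.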

SLIDE 20

Distributed recognition

[Diagram: RGB input video over time → unconstrained attention in feature space (spatial attention process) + 3D global model (Inflated ResNet-50) → distributed tracking/recognition by workers, whose outputs r1, r2, r3 are fused into a final hypothesis h]
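The worker scheme can be sketched as several simple recurrent accumulators, each integrating its own glimpse sequence, with the per-worker class distributions fused by averaging. The toy recurrence and the shared linear classifier are stand-ins for the recurrent recognizers of the actual model; all names and shapes are made up for the sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_workers(glimpse_seqs, weights):
    """Fuse class predictions from several recurrent 'workers'.

    glimpse_seqs: (n_workers, T, d) glimpse features, one sequence per worker
    weights:      (d, n_classes) shared linear classifier
    """
    preds = []
    for seq in glimpse_seqs:
        h = np.zeros(seq.shape[1])
        for x in seq:                 # toy recurrence: leaky integration
            h = 0.5 * h + 0.5 * x
        preds.append(softmax(h @ weights))
    return np.mean(preds, axis=0)     # fuse workers by averaging

rng = np.random.default_rng(0)
seqs = rng.normal(size=(3, 4, 8))     # 3 workers, 4 time steps, 8-dim glimpses
W = rng.normal(size=(8, 5))           # 5 activity classes
p = run_workers(seqs, W)              # fused class distribution
```

Averaging distributions keeps the fused output a valid distribution while letting each worker specialize on a different part of the glimpse cloud.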

SLIDE 21

Results

SLIDE 22

Dynamic visual attention

[Baradel, Wolf, Mille, Taylor, under review]

[Diagram: CNN features over time → recognition from an unstructured glimpse cloud]

State-of-the-art comparison

SOTA results on two datasets (NTU and N-UCLA); larger difference between glimpse clouds and the global model on N-UCLA.

[Baradel, Wolf, Mille, Taylor, CVPR 2018]

SLIDE 23

Results

[Baradel, Wolf, Mille, Taylor, under review]

Ablation study

[Baradel, Wolf, Mille, Taylor, CVPR 2018]

SLIDE 24

Pose conditioned attention

[Baradel, Wolf, Mille, Taylor, BMVC 2018]

SLIDE 25

AI vs. NI

2014 Nobel Prize in Medicine

Border cells; head-direction cells

SLIDE 26

AI vs. NI

2014 Nobel Prize in Medicine

SLIDE 27

AI vs. NI

2018: discovery of the same cell types in neural networks trained on similar tasks.

[Cueva, Wei, ICLR 2018]

SLIDE 28

AI vs. NI

Emergence of the different types of cells in the same order.

[Cueva, Wei, ICLR 2018]

SLIDE 29

Reasoning: what happened?

SLIDE 30

Human psychology

  • Daniel Kahneman (Nobel Prize in 2002)
  • Book: "Thinking, Fast and Slow"
SLIDE 31

Cognitive tasks

24*17 = ?

SLIDE 32

Two systems

System 1

  • Continuously monitors the environment (and the mind)
  • No specific attention
  • Continuously generates assessments / judgments without effort, even with little data; jumps to conclusions
  • Prone to errors; no capability for statistics

System 2

  • Receives questions or generates them
  • Directs attention and searches memory to find answers
  • Requires (possibly a lot of) effort
  • More reliable
SLIDE 33

Where is ML today?

Claim: AI requires a combination of

  • Extraction of high-level information from high-dimensional input (visual, audio, language): machine learning
  • High-level reasoning: compare, assess, focus attention, perform logical deductions

Roadmap:

Estimating semantics from low-level information (Vision & Learning) → estimating causal relationships from data → reasoning: logic + statistics

SLIDE 34

Object level Visual Reasoning

Fabien Baradel, PhD @ LIRIS, INSA-Lyon · Christian Wolf, INRIA Chroma · Julien Mille, LI, INSA VdL · Natalia Neverova, Facebook AI Research, Paris · Greg Mori, Simon Fraser University, Canada

[Baradel, Neverova, Wolf, Mille, Mori, ECCV 2018]

SLIDE 35

Object level Visual Reasoning

[Baradel, Neverova, Wolf, Mille, Mori, ECCV 2018]

SLIDE 36

Object level Visual Reasoning

[Baradel, Neverova, Wolf, Mille, Mori, ECCV 2018]

SLIDE 37

Learned interactions

Class: person-book interaction

SLIDE 38

Failure cases

SLIDE 39

Results

Something-Something dataset · VLOG dataset · EPIC-Kitchens dataset

SLIDE 40

Conclusion

  • We propose models which recognize activities from
    – a cloud of unconstrained feature points
    – interactions between spatially well-defined objects
  • Visual spatial attention is useful and competitive compared to pose
  • State-of-the-art performance on 5 datasets (NTU RGB-D, Northwestern-UCLA, VLOG, Something-Something, EPIC-Kitchens)
  • Reasoning is a key component of human cognition, and also important for AI systems