GESTURE RECOGNITION WITH 3D CNNS
Pavlo Molchanov, Xiaodong Yang, Shalini Gupta, Kihwan Kim, Stephen Tyree, Jan Kautz
4/6/2016
April 4-7, 2016 | Silicon Valley

AGENDA
• Motivation
• Problem statement
• Selecting the best classifier
• Online gesture detection and classification
• Demos

MOTIVATION

GESTURE IS A NATURAL FORM OF COMMUNICATION (source: photo.elsoar.com)

SAFE INTERFACES (source: bmw.com)

NEED FOR VIDEO RELAY SERVICES (source: http://relayservice.gov.au/)

GAMING (source: leapmotion)

PROBLEM STATEMENT

PROBLEM STATEMENT
• No special devices
• Single commodity sensor: Kinectv1, SoftKinetic
• Gesture recognition, skeleton tracking, gaze estimation, head tracking

PROBLEM STATEMENT
Understanding gesture concepts
• We do: Classifier → Thumb up; Classifier → Wave hand
• We don’t: Hand model fitting and tracking
*http://www.virtualrealityreviewer.com/leap-motion-enters-vr-new-software-product-accessory-preview-what%C2%B9s-next/

SELECTING THE BEST CLASSIFIER

SELECTING THE BEST CLASSIFIER
VIVA CHALLENGE 2015, organized by UCLA
• 19 classes, 8 subjects
• Driver and passenger
• RGB + Depth from Microsoft Kinect
• 885 gestures in total
• Gesture examples: Slide 2 fingers left, Zoom out, Rotate CCW

SELECTING THE BEST CLASSIFIER
3D Convolutional Neural Network
Input (RGB + Depth) → 4 × (3D convolution and max-pooling) → ReLU → ReLU → Softmax → Prediction
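
To make the "3D convolution and max-pooling" building block concrete, here is a minimal pure-Python sketch of one such stage on a toy clip. It is an illustration only, not the network from the talk: it assumes a single input channel, one fixed averaging kernel instead of learned filters, valid padding, stride 1, and a cubic pooling window. All function names are hypothetical.

```python
# Illustrative sketch only: one "3D convolution + ReLU + max-pooling" stage
# on a single-channel clip stored as nested lists volume[t][y][x].

def conv3d(volume, kernel):
    """Valid 3D cross-correlation (stride 1) of volume with kernel[dt][dy][dx]."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

def relu(volume):
    """Element-wise max(0, v)."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]

def max_pool3d(volume, size):
    """Non-overlapping max-pooling with a cubic window; any remainder is dropped."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [[[max(volume[t + dt][y + dy][x + dx]
                  for dt in range(size) for dy in range(size) for dx in range(size))
              for x in range(0, W - size + 1, size)]
             for y in range(0, H - size + 1, size)]
            for t in range(0, T - size + 1, size)]

def shape(volume):
    return (len(volume), len(volume[0]), len(volume[0][0]))

# Toy clip: 6 frames of 6x6 pixels, with a made-up 3x3x3 averaging kernel.
clip = [[[float(t + y + x) for x in range(6)] for y in range(6)] for t in range(6)]
kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]
features = max_pool3d(relu(conv3d(clip, kernel)), 2)
```

The key point is that the kernel slides along time as well as height and width, so each output value summarizes motion across neighbouring frames. In this toy setup the convolution shrinks the 6x6x6 clip to 4x4x4 and the pooling halves it to 2x2x2.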

SEGMENTED GESTURE CLASSIFICATION
Training: RGB + Depth → 3D CNN → error → back-propagation update

SELECTING THE BEST CLASSIFIER
First result (classification accuracy, higher is better)
• HON4D [1]: 58.7% on the testing set
• HOG [2]: 64.5% on the testing set
• 3D-CNN: 48.3% on the testing set, 99.9% on the training set
[1] Oreifej and Liu. HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences, CVPR, 2013.
[2] Ohn-Bar and Trivedi, IEEE Trans. on Intelligent Transportation Systems, 2014.

SELECTING THE BEST CLASSIFIER
• ImageNet: 1.5 M examples
• VIVA: 885 examples
Recent success in deep learning has benefited from large data.

SELECTING THE BEST CLASSIFIER
Training with data augmentation: RGB + Depth → data augmentation → 3D CNN → error → back-propagation update

SELECTING THE BEST CLASSIFIER
Data augmentation: generating new training data (original → augmented)
• Spatial geometric transformations
• Temporal augmentation
• Example: flip
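
The two augmentation families named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact procedure used for the challenge: the speed range of 0.8 to 1.2, the 50% flip probability, the nearest-neighbour resampling, and the `MIRRORED_LABEL` map are all hypothetical choices made for the example.

```python
import random

def flip_horizontal(clip):
    """Spatial transform: mirror every frame left-to-right.

    clip is a list of frames; each frame is a list of pixel rows.
    """
    return [[row[::-1] for row in frame] for frame in clip]

def resample_temporal(clip, n_frames):
    """Temporal augmentation: nearest-neighbour resampling to n_frames frames."""
    if n_frames == 1:
        return [clip[0]]
    last = len(clip) - 1
    return [clip[round(i * last / (n_frames - 1))] for i in range(n_frames)]

# Hypothetical label map: mirroring swaps direction-dependent gestures.
MIRRORED_LABEL = {"slide left": "slide right", "slide right": "slide left"}

def augment(clip, label, rng):
    """Return one randomly augmented (clip, label) pair."""
    if rng.random() < 0.5:                      # assumed flip probability
        clip = flip_horizontal(clip)
        label = MIRRORED_LABEL.get(label, label)
    scale = rng.uniform(0.8, 1.2)               # assumed range of speed change
    n_frames = max(2, round(len(clip) * scale))
    return resample_temporal(clip, n_frames), label

# Toy usage: a 4-frame clip of 2x3 "images".
toy_clip = [[[f, f + 1, f + 2], [f + 3, f + 4, f + 5]] for f in range(4)]
new_clip, new_label = augment(toy_clip, "slide left", random.Random(0))
```

One design point worth noting: a mirrored "slide left" looks like "slide right", so a flip has to remap direction-dependent labels rather than keep them.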

SELECTING THE BEST CLASSIFIER
• VIVA: 885 examples
• Augmented: 0.3 M examples

SELECTING THE BEST CLASSIFIER
Official challenge results (classification accuracy in %, higher is better)
• NVIDIA (3D-CNN), with data augmentation: 77.5
• NVIDIA (3D-CNN), no data augmentation: 48.3
• HOG+HOG2: 64.5
• HON4D: 58.7
• Dense Trajectories: 54
• HOG3D: 44.6
• Harris-3.5D: 36.4

SELECTING THE BEST CLASSIFIER
Speed (FPS, higher is better)
• NVIDIA (3D-CNN): 110, GPU +250, cuDNNv4 +400
• HOG+HOG2: 50
• HON4D: 25
• Dense Trajectories: 18
• HOG3D: 3
• Harris-3.5D: 0.2
Chart labels: CPU, GPU, cuDNNv4

SEGMENTED GESTURE CLASSIFICATION
Timeline: start of the gesture → end of the gesture → classification decision
Making the decision after the gesture ends introduces latency.

ONLINE GESTURE DETECTION AND CLASSIFICATION

ONLINE GESTURE CLASSIFICATION
Timeline: start of the gesture → classification decision → end of the gesture
Making the decision before the gesture ends improves feedback and user experience.

ONLINE GESTURE CLASSIFICATION
R3DCNN
• Input: 8 frames (video server)
• Local motion descriptor: 3D CNN
• Global motion descriptor: RNN, forward recurrence only
• Detection and classification: softmax
• Connectionist Temporal Classification (CTC), for training only
• 109M parameters
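
The data flow on this slide (8-frame clips, a local descriptor per clip, a forward-only recurrence, a softmax per step) can be sketched as below. This is a toy sketch under assumptions, not the R3DCNN itself: the clips are taken as non-overlapping, the 3D CNN is replaced by a hypothetical stand-in function, the recurrence is a plain tanh RNN step, and every weight is made up.

```python
import math

CLIP_LEN = 8  # frames per clip, as on the slide

def split_into_clips(frames, clip_len=CLIP_LEN):
    """Non-overlapping clips (an assumption); an incomplete trailing clip is dropped."""
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, clip_len)]

def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def run_online(frames, local_descriptor, w_in, w_rec, w_out):
    """Yield one class distribution per clip, using only past and current clips."""
    hidden = [0.0] * len(w_rec)
    for clip in split_into_clips(frames):
        feat = local_descriptor(clip)    # stand-in for the 3D CNN
        pre = [a + b for a, b in zip(matvec(w_in, feat), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]   # forward recurrence only
        yield softmax(matvec(w_out, hidden))

# Toy usage: each "frame" is a single brightness value; all weights are made up.
def toy_descriptor(clip):
    return [sum(clip) / len(clip), clip[-1] - clip[0]]

W_IN = [[0.5, -0.2], [0.1, 0.3]]
W_REC = [[0.1, 0.0], [0.0, 0.1]]
W_OUT = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
frames = [0.1 * i for i in range(20)]
predictions = list(run_online(frames, toy_descriptor, W_IN, W_REC, W_OUT))
```

Because the recurrence only looks backward, a prediction is available after every clip, which is what allows a decision before the gesture ends.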

ONLINE GESTURE CLASSIFICATION
Training loss function
• Labeling dynamic gestures is difficult
• Labeling per frame is ambiguous
• Loss function: per-frame negative log likelihood
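
For reference, a per-frame negative log likelihood can be written as below. This is a minimal sketch: averaging over frames (rather than summing) and the function name are assumptions of the example, not details from the talk.

```python
import math

def per_frame_nll(frame_probs, frame_labels):
    """Mean negative log likelihood of the labeled class over all frames.

    frame_probs[t][k] is the predicted probability of class k at frame t;
    frame_labels[t] is the class index assigned to frame t.
    """
    total = 0.0
    for probs, label in zip(frame_probs, frame_labels):
        total -= math.log(probs[label])
    return total / len(frame_labels)

# Toy usage: two frames, two classes.
loss = per_frame_nll([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

This loss needs a label for every frame, which is exactly where the ambiguity mentioned on the slide comes in: someone has to decide on which frame a gesture starts and ends.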

ONLINE GESTURE CLASSIFICATION
Training loss function
• Sequence-based training is the solution
• Example sequence: nothing, slide right, nothing, slide left, nothing
• Loss function: Connectionist Temporal Classification (CTC) by A. Graves et al.
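
The CTC loss scores a whole label sequence without per-frame labels by summing over every frame-level alignment that collapses to that sequence. Below is a minimal sketch of the standard CTC forward algorithm in plain Python. It assumes class index 0 is the blank symbol and works with raw probabilities rather than log-space, so it suits short toy sequences only; it is not the training code behind the talk.

```python
import math

def ctc_neg_log_likelihood(probs, target, blank=0):
    """Return -log p(target | probs) using the CTC forward algorithm.

    probs[t][k] is the softmax probability of class k at step t;
    target is the label sequence without blanks. Raises ValueError if
    no alignment is possible (for example, too few steps).
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n_states, n_steps = len(ext), len(probs)

    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, n_steps):
        prev, alpha = alpha, [0.0] * n_states
        for s in range(n_states):
            total = prev[s]                       # stay in the same state
            if s >= 1:
                total += prev[s - 1]              # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]

    # Valid paths end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)
    return -math.log(likelihood)

# Toy usage: 2 steps, classes {0: blank, 1: gesture}, uniform predictions.
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_neg_log_likelihood(uniform, [1])
```

In the toy case three alignments collapse to the single label (gesture-gesture, gesture-blank, blank-gesture), each with probability 0.25, so the likelihood is 0.75.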

ONLINE GESTURE CLASSIFICATION
Italian sign language recognition
• Chalearn2014 challenge, held in 2014
• RGBD videos of 20 Italian sign language gestures
• 13K gestures
• 20 subjects

ONLINE GESTURE CLASSIFICATION
Italian sign language recognition: classification accuracy (%)
• Pigou et al.*: 97.2
• 3D-CNN: 97.4
• 3D-CNN CTC: 98.2
35% improvement in accuracy, by seeing only 41% of the gesture. No pre- or post-processing.
*L. Pigou et al. Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video

ONLINE GESTURE CLASSIFICATION
Car interfaces
• In-house database
• Media player, navigation, phone
• 20 subjects, 25 gestures
• More information at CVPR2016
Chart values:
• Human: 88
• Ours: 84
• C3D: 79
• iDT: 73
• SNV: 71
• Two stream CNN: 66
• HOG+HOG2: 37

ONLINE GESTURE CLASSIFICATION
Latency is critical
Suitability of hardware for inference: CPU vs. GPU, for image classification and for video classification (figure)

ONLINE GESTURE CLASSIFICATION
Scalability
• NVIDIA TX1, for embedded solutions: a credit-card-sized GPU in your pocket
• Our R3DCNN takes only 30% of the GPU

CONTRIBUTIONS
• Data augmentation helps deep learning a lot (48.3% → 77.5% on VIVA)
• R3DCNNs perform best for sign language and gesture recognition
• CTC helps a lot with video sequence learning
• Scalable enough to run on NVIDIA TX1

Deep Learning, Data Augmentation, CTC

THANK YOU
JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join
