SLIDE 1

Deep Convolutional Poses for Human Interaction Recognition in Monocular Videos

Marcel Sheeny de Moraes

Supervisor: Neil Robertson

SLIDE 2

VIVA 15th June 2016 - Marcel Sheeny de Moraes

HERIOT-WATT

UNIVERSITY

/39

Outline

Introduction
Related Works
Methodology
Results
Conclusion and Future Works

Introduction Related Works Methodology Results Conclusion and Future Works

SLIDE 3

Outline

Introduction
Related Works
Methodology
Results
Conclusion and Future Works

SLIDE 4

Introduction

Human Interaction Recognition
Surveillance
Human-Computer Interaction
Automatic Video labeling

Example interactions shown: hand shake, high five, kicking.

SLIDE 5

Introduction

Goal of the project
To use human pose estimation to recognize human interactions in monocular (RGB) videos.

SLIDE 6

Outline

Introduction
Related Works
Methodology
Results
Conclusion and Future Works

SLIDE 7

Related Works

Park and Aggarwal (Multimedia Systems, 2004)
Ellipse and convex hull features.
Hierarchical Bayesian network.
9 types of interactions.
78% accuracy.

SLIDE 8

Related Works

Yun, et al. (CVPR 2012)
Depth camera (Kinect) to estimate the human pose.
6 features from the human pose.
Multiple Instance Learning.
8 types of interactions.
80% accuracy using 3 frames.
91% accuracy on the whole sequence.

SLIDE 9

Related Works

Hu, et al. (ECCV 2014)
Yun, et al. (2012) dataset.
Positive actor features.
Hidden Markov Model to classify.
76.2% accuracy per frame and 83.3% for the whole sequence.

Zhu, et al. (AAAI 2016)
Yun, et al. (2012) dataset.
Deep LSTM network to recognize the interaction using the human pose estimation.
90.41% accuracy.

SLIDE 10

Benchmark

Benchmark for the Two-Person Interaction dataset.

Method               Per frame   Whole sequence
Yun, et al. (2012)   80.30%      91.10%
Hu, et al. (2014)    76.1%       83.33%
Zhu, et al. (2015)   -           90.41%

SLIDE 11

Outline

Introduction
Related Works
Methodology
Results
Conclusion and Future Works

SLIDE 12

Methodology

SLIDE 13

Dataset

Two-person Interaction Detection Using Body-Pose Features (Yun, et al., 2012).
8 interactions: approaching, departing, pushing, kicking, punching, exchanging objects, hugging, and shaking hands.
282 samples of interactions.

Example frames shown: kicking, punching, hugging, shaking hands.

SLIDE 14

Methodology

SLIDE 15

Person Detection

“Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, by Ren, et al. (2015)
State-of-the-art method for object detection.
0.1 s to detect the person in each image.
Very deep convolutional neural network (VGG-16).

SLIDE 16

Person Detection Results

SLIDE 17

Methodology

SLIDE 18

Multi-Person Tracking

Kalman filter with a linear motion model.
The Hungarian algorithm is used to assign detections to predictions.
Thresholds are used to decide when tracks are new or lost.
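
A minimal sketch of the detection-to-track association step, under assumptions not stated on the slide: tracks are 2-D box centres, a constant-velocity step stands in for the Kalman prediction, the cost is Euclidean distance, and an exhaustive search over permutations replaces the Hungarian algorithm (both return a minimum-cost assignment, but exhaustive search is only practical for a handful of people). The names `predict`, `assign` and the `max_dist` threshold are hypothetical.

```python
import math
from itertools import permutations


def predict(track):
    """Constant-velocity prediction: position plus velocity for one time step."""
    (x, y), (vx, vy) = track["pos"], track["vel"]
    return (x + vx, y + vy)


def assign(predictions, detections, max_dist=50.0):
    """Match predictions to detections with minimum total Euclidean distance.

    Returns (matches, lost_tracks, new_detections). Pairs farther apart than
    max_dist are rejected: the track counts as lost, the detection as new.
    """
    n_t, n_d = len(predictions), len(detections)
    # Enumerate every one-to-one pairing of the smaller set into the larger one.
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p)) for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d))) for p in permutations(range(n_t), n_d))
    best_cost, best_pairs = math.inf, []
    for pairs in candidates:
        cost = sum(math.dist(predictions[t], detections[d]) for t, d in pairs)
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    # Threshold step: keep only pairs that are close enough.
    matches = [(t, d) for t, d in best_pairs
               if math.dist(predictions[t], detections[d]) <= max_dist]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    lost = [t for t in range(n_t) if t not in matched_t]
    new = [d for d in range(n_d) if d not in matched_d]
    return matches, lost, new


# Two existing tracks and three detections (made-up coordinates).
tracks = [{"pos": (100.0, 200.0), "vel": (5.0, 0.0)},
          {"pos": (400.0, 210.0), "vel": (-5.0, 0.0)}]
detections = [(396.0, 209.0), (104.0, 201.0), (900.0, 50.0)]
preds = [predict(t) for t in tracks]
matches, lost, new = assign(preds, detections)
print(matches, lost, new)
```

The third detection is matched to no track, so it would start a new track in this sketch.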

SLIDE 19

Methodology

SLIDE 20

Human Pose Estimation

“Convolutional Pose Machines”, by Wei, et al. (CVPR 2016)
12 hierarchical Deep Convolutional Neural Networks (DCNN).
12 different input sizes.
Current state of the art for human pose estimation.

PCKh @ 0.2 PC Benchmark using LSP dataset

Method                        Head  Shoulder  Elbow  Wrist  Hip   Knee  Ankle  Total  AUC
Pishchulin, et al., ICCV’13   87.2  56.7      46.7   38.9   61.0  57.5  52.7   57.1   35.8
Chen and Yuille, NIPS’14      91.8  78.2      71.8   65.5   73.3  70.2  63.4   73.4   40.1
Carreira, et al., CVPR’16     90.5  81.8      65.8   59.8   81.6  70.6  62.0   73.1   41.5
Fan, et al., CVPR’15          92.4  75.2      65.3   64.0   75.7  68.3  70.4   73.0   42.2
Tompson, et al., NIPS’14      90.6  79.2      67.9   63.4   69.5  71.0  64.2   72.3   47.3
Yang, et al., CVPR’16         90.6  78.1      73.8   68.8   74.8  69.9  58.9   73.6   39.3
Pishchulin, et al., CVPR’16   97.0  91.0      83.8   78.1   91.0  86.7  82.0   87.1   63.5
Wei, et al., CVPR’16          97.8  92.5      87.0   83.9   91.5  90.8  89.9   90.5   65.4

SLIDE 21

Human Pose Estimation

“Convolutional Pose Machines”, by Wei, et al (CVPR 2016)

SLIDE 22

Results

SLIDE 23

Results

https://www.youtube.com/watch?v=llLj50gE9GI

SLIDE 24

Methodology

SLIDE 25

Feature Extraction

6 types of features were used
XY Joint Position (XY)
Distances from Related Joints (DRJ)
Distances from One Joint (DOJ)
Absolute Difference (AD)
Joint Angles (JA)
Velocity (VEL)

SLIDE 26

Joint Position and Distance from Related Joints

Raw Joint Position (XY)

$F(j) = P(j)$

Distance from Related Joints (DRJ)

$F(j) = \lVert P_1(j) - P_2(j) \rVert$
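
A minimal sketch of the XY and DRJ features, assuming each pose is a list of 2-D joint coordinates in the same joint order for both persons and that the distance is Euclidean. The function names and the toy poses are hypothetical.

```python
import math


def xy_feature(pose):
    """XY: the raw joint coordinates, flattened into one feature vector."""
    return [c for joint in pose for c in joint]


def drj_feature(pose1, pose2):
    """DRJ: distance between the same joint of person 1 and person 2."""
    return [math.dist(p, q) for p, q in zip(pose1, pose2)]


# Toy poses with two joints each (made-up coordinates).
pose1 = [(0.0, 0.0), (3.0, 4.0)]
pose2 = [(3.0, 4.0), (3.0, 0.0)]
print(xy_feature(pose1))          # [0.0, 0.0, 3.0, 4.0]
print(drj_feature(pose1, pose2))  # [5.0, 4.0]
```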

SLIDE 27

Distance from One Joint and Absolute Difference

Distance from One Joint (DOJ)

$F(j_1, j_2) = \lVert P_1(j_1) - P_2(j_2) \rVert$

Absolute Difference (AD)

$F(j) = \lvert P_1(j) - P_2(j) \rvert$
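
A minimal sketch of the DOJ and AD features under two assumptions: DOJ is taken over every pair of one joint from person 1 and one joint from person 2, and AD is the per-coordinate absolute difference between matching joints. Poses are lists of 2-D joint coordinates; the function names and toy values are hypothetical.

```python
import math


def doj_feature(pose1, pose2):
    """DOJ: distance from each joint of person 1 to each joint of person 2."""
    return [math.dist(p, q) for p in pose1 for q in pose2]


def ad_feature(pose1, pose2):
    """AD: absolute difference per coordinate between matching joints."""
    return [abs(a - b) for p, q in zip(pose1, pose2) for a, b in zip(p, q)]


# Toy poses with two joints each (made-up coordinates).
pose1 = [(0.0, 0.0), (3.0, 4.0)]
pose2 = [(3.0, 4.0), (0.0, 0.0)]
print(doj_feature(pose1, pose2))  # [5.0, 0.0, 0.0, 5.0]
print(ad_feature(pose1, pose2))   # [3.0, 4.0, 3.0, 4.0]
```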

SLIDE 28

Joint Angles and Velocity

Joint Angles (JA)

$F(j_1, j_2) = \tan^{-1}\left(\frac{P_y(j_1) - P_y(j_2)}{P_x(j_1) - P_x(j_2)}\right)$

Velocity (VEL)

$F(j, t_1, t_2) = P(j, t_1) - P(j, t_2)$
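
A minimal sketch of the JA and VEL features, assuming a pose is a list of 2-D joint coordinates and a sequence is a list of poses indexed by frame. Note one swap: `math.atan2` is used in place of a plain inverse tangent of the ratio, so a zero horizontal difference does not cause a division by zero. The function names and toy values are hypothetical.

```python
import math


def joint_angle(pose, j1, j2):
    """JA: angle (radians) of the segment between joints j1 and j2."""
    dx = pose[j1][0] - pose[j2][0]
    dy = pose[j1][1] - pose[j2][1]
    return math.atan2(dy, dx)  # quadrant-aware, safe when dx == 0


def velocity(seq, j, t1, t2):
    """VEL: displacement of joint j between frames t1 and t2."""
    (x1, y1), (x2, y2) = seq[t1][j], seq[t2][j]
    return (x1 - x2, y1 - y2)


# Toy two-frame sequence with two joints per pose (made-up coordinates).
seq = [[(1.0, 1.0), (0.0, 0.0)],
       [(3.0, 5.0), (0.0, 0.0)]]
angle = joint_angle(seq[0], 0, 1)   # 45 degrees, i.e. pi / 4
vel = velocity(seq, 0, 1, 0)        # (2.0, 4.0)
print(angle, vel)
```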

SLIDE 29

Methodology

SLIDE 30

Interaction Classification

Support Vector Machine
Grid search to find the best parameters.
Selected parameters: C-SVC with a radial basis function (RBF) kernel, γ = 0.0625 and C = 8.
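
A minimal sketch of what the reported parameters mean, not the training code used in this work. It shows the RBF kernel with the slide's γ and a grid search over powers of two; the grid ranges are an assumption (a common LIBSVM-style convention), and `toy_score` is a placeholder that stands in for cross-validated accuracy and peaks at the slide's values by construction.

```python
import math


def rbf_kernel(x, y, gamma=0.0625):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma defaults to the slide's value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def grid_search(score, c_grid, gamma_grid):
    """Return the (C, gamma) pair with the highest score(C, gamma)."""
    return max(((c, g) for c in c_grid for g in gamma_grid),
               key=lambda cg: score(*cg))


# Assumed exponential grids: C in 2^-5..2^15, gamma in 2^-15..2^3.
C_GRID = [2.0 ** e for e in range(-5, 16)]
GAMMA_GRID = [2.0 ** e for e in range(-15, 4)]


def toy_score(c, gamma):
    """Placeholder for cross-validated accuracy (not real results)."""
    return -(math.log2(c) - 3) ** 2 - (math.log2(gamma) + 4) ** 2


best = grid_search(toy_score, C_GRID, GAMMA_GRID)
print(best)  # (8.0, 0.0625)
```

Both reported values are exact powers of two (C = 2^3, γ = 2^-4), which is consistent with a search over an exponential grid of this kind.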

SLIDE 31

Outline

Introduction
Related Works
Methodology
Results
Conclusion and Future Works

SLIDE 32

Results

2 methods of evaluation:
Per-frame evaluation.
Whole-sequence evaluation.
5-fold cross-validation to evaluate the results.
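
A minimal sketch of a 5-fold split over the 282 interaction samples of the dataset. It assumes contiguous folds over whole sequences with no shuffling; how the folds were actually formed is not stated on the slide, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(282, k=5))
print([len(test) for _, test in folds])  # [57, 57, 56, 56, 56]
```

Splitting by sequence rather than by frame keeps frames of one recording from appearing in both the training and test sets.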

SLIDE 33

Results

Number of frames normalization
Checked the influence of the number of frames on both evaluations.
Only the XY positions were used as features.
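
One way to bring sequences to a fixed number of frames is uniform resampling. The sketch below is an assumption about how such a normalization could be done, not necessarily the procedure used in this work; `resample_frames` is a hypothetical helper.

```python
def resample_frames(frames, n):
    """Pick n frames at evenly spaced positions (nearest index) from a sequence."""
    if n <= 0 or not frames:
        return []
    if n == 1:
        return [frames[len(frames) // 2]]  # single frame: take the middle one
    last = len(frames) - 1
    return [frames[round(i * last / (n - 1))] for i in range(n)]


# A 25-frame sequence, represented here by its frame indices.
sequence = list(range(25))
print(resample_frames(sequence, 9))  # [0, 3, 6, 9, 12, 15, 18, 21, 24]
```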

Values annotated on the plots: 9 frames, 87.56%; 13 frames, 80.67%.

SLIDE 34

Influence of each feature

XY: raw X and Y positions; DRJ: Distance from Related Joints; DOJ: Distance from One Joint; JA: Joint Angles; AD: Absolute Difference; VEL: Velocity.

SLIDE 35

Confusion Matrices

Per-frame confusion matrix: 81.75% accuracy.
Whole-sequence confusion matrix: 87.56% accuracy.

SLIDE 36

Results Comparison table

Method                                  Per frame   Whole sequence
Yun, et al. (2012)                      80.30%      91.10%
Hu, et al. (2014)                       76.1%       83.33%
Zhu, et al. (2015)                      -           90.41%
My method (uses only RGB information)   81.75%      87.56%

SLIDE 37

Outline

Introduction
Related Works
Methodology
Results
Conclusion and Future Works

SLIDE 38

Conclusions

This work used recent developments in DCNNs to achieve these results.

The method achieved 87.56% accuracy on whole sequences, only 3.54 percentage points below the method of Yun, et al. (91.10%), which uses a depth camera to estimate the human pose.

This work showed that estimating the human pose with a DCNN from an RGB camera to recognize the interaction between two persons can be as effective as using depth cameras.

SLIDE 39

Future Works

Track joints using temporal information.
Create more features based on the pose.
Hidden Markov Model.
Deep LSTM network.
Create a more robust approach for more than 2 persons.
