Deep Convolutional Poses for Human Interaction Recognition in Monocular Videos



slide-1
SLIDE 1

Deep Convolutional Poses for Human Interaction Recognition in Monocular Videos

Marcel Sheeny de Moraes

Supervisor: Neil Robertson

slide-2
SLIDE 2

VIVA 15th June 2016 - Marcel Sheeny de Moraes

HERIOT-WATT

UNIVERSITY


Outline

  • Introduction
  • Related Works
  • Methodology
  • Results
  • Conclusion and Future Works


slide-3
SLIDE 3


slide-4
SLIDE 4

Introduction

  • Human Interaction Recognition
  • Surveillance
  • Human-Computer Interaction
  • Automatic video labeling

Example interactions (images): Hand shake, High five, Kicking

slide-5
SLIDE 5

Introduction

  • Goal of the project
  • To use human pose estimation to recognize human interactions in monocular (RGB) videos.

slide-6
SLIDE 6


slide-7
SLIDE 7

Related Works

  • Park and Aggarwal (Multimedia Systems, 2004)
  • Ellipse and convex hull features.
  • Hierarchical Bayesian network.
  • 9 types of interactions.
  • 78% accuracy.

slide-8
SLIDE 8

Related Works

  • Yun, et al. (CVPR 2012)
  • Depth camera (Kinect) to estimate the human pose.
  • 6 features from the human pose.
  • Multiple Instance Learning.
  • 8 types of interactions.
  • 80% accuracy using 3 frames.
  • 91% accuracy on the whole sequence.

slide-9
SLIDE 9

Related Works

  • Hu, et al. (ECCV 2014)
  • Yun, et al. (2012) dataset.
  • Positive actor features.
  • Hidden Markov Model to classify.
  • 76.2% accuracy per frame and 83.3% for the whole sequence.

  • Zhu, et al. (AAAI 2016)
  • Yun, et al. (2012) dataset.
  • Deep LSTM network to recognize the interaction using the human pose estimation.
  • 90.41% accuracy.

slide-10
SLIDE 10

Benchmark

  • Benchmark for the Two-Person Interaction dataset.

Method               Per frame   Whole sequence
Yun, et al. (2012)   80.30%      91.10%
Hu, et al. (2014)    76.1%       83.33%
Zhu, et al. (2015)   -           90.41%

slide-11
SLIDE 11


slide-12
SLIDE 12

Methodology

slide-13
SLIDE 13

Dataset

  • Two-person Interaction Detection Using Body-Pose Features (Yun, et al., 2012).
  • 8 interactions: approaching, departing, pushing, kicking, punching, exchanging objects, hugging, and shaking hands.
  • 282 samples of interactions.

Example frames (images): Kicking, Punching, Hugging, Shaking Hands

slide-14
SLIDE 14


slide-15
SLIDE 15

Person Detection

  • “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, by Ren, et al. (2015)
  • State-of-the-art method for object detection.
  • 0.1 s to detect the person in each image.
  • Very Deep Convolutional Neural Network (VGG-16).

slide-16
SLIDE 16

Person Detection Results

slide-17
SLIDE 17


slide-18
SLIDE 18

Multi-Person Tracking

  • Kalman Filter with a linear motion model.
  • The Hungarian Algorithm is used to assign detections to predictions.
  • Threshold methods are used to decide new/lost tracks.
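The tracking loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the class `AxisKalman`, the function `assign`, the noise values `q` and `r`, and the gating distance `max_dist` are all hypothetical. To keep the example free of dependencies, an exhaustive search over pairings stands in for the Hungarian algorithm; it reaches the same minimum total cost but is only practical for a few people.

```python
import math
from itertools import permutations


class AxisKalman:
    """1-D constant-velocity Kalman filter; state is (position, velocity), dt = 1 frame."""

    def __init__(self, pos, q=0.01, r=1.0):
        self.p, self.v = float(pos), 0.0
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise (assumed values)

    def predict(self):
        (a, b), (c, d) = self.P
        self.p += self.v                   # x' = F x with F = [[1, 1], [0, 1]]
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]     # P' = F P F^T + q I
        return self.p

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                     # innovation covariance
        k0, k1 = a / s, c / s              # Kalman gain for H = [1, 0]
        y = z - self.p                     # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def assign(predictions, detections, max_dist=50.0):
    """Match predicted to detected centres by minimum total distance.

    Exhaustive search stands in for the Hungarian algorithm (assumption for brevity).
    Returns (matches, lost_tracks, new_detections).
    """
    n, m = len(predictions), len(detections)
    if n <= m:
        candidates = [list(zip(range(n), perm)) for perm in permutations(range(m), n)]
    else:
        candidates = [list(zip(perm, range(m))) for perm in permutations(range(n), m)]

    def total(pairs):
        return sum(math.dist(predictions[i], detections[j]) for i, j in pairs)

    best = min(candidates, key=total)
    # Threshold test: pairs that are too far apart are not accepted as matches.
    matches = [(i, j) for i, j in best
               if math.dist(predictions[i], detections[j]) <= max_dist]
    matched_tracks = {i for i, _ in matches}
    matched_dets = {j for _, j in matches}
    lost = [i for i in range(n) if i not in matched_tracks]
    new = [j for j in range(m) if j not in matched_dets]
    return matches, lost, new
```

In use, each track would hold one `AxisKalman` per image axis; every frame calls `predict()`, runs `assign` on the predicted and detected centres, calls `update()` for matched pairs, and uses the `lost` and `new` lists to end or start tracks.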

slide-19
SLIDE 19


slide-20
SLIDE 20

Human Pose Estimation

  • “Convolutional Pose Machines”, by Wei, et al. (CVPR 2016)
  • 12 hierarchical Deep Convolutional Neural Networks (DCNN).
  • 12 different input sizes.
  • Current state of the art for human pose estimation.

PCKh @ 0.2 PC benchmark using the LSP dataset

Method                       Head  Shoulder  Elbow  Wrist  Hip   Knee  Ankle  Total  AUC
Pishchulin, et al., ICCV’13  87.2  56.7      46.7   38.9   61.0  57.5  52.7   57.1   35.8
Chen and Yuille, NIPS’14     91.8  78.2      71.8   65.5   73.3  70.2  63.4   73.4   40.1
Carreira, et al., CVPR’16    90.5  81.8      65.8   59.8   81.6  70.6  62.0   73.1   41.5
Fan, et al., CVPR’15         92.4  75.2      65.3   64.0   75.7  68.3  70.4   73.0   42.2
Tompson, et al., NIPS’14     90.6  79.2      67.9   63.4   69.5  71.0  64.2   72.3   47.3
Yang, et al., CVPR’16        90.6  78.1      73.8   68.8   74.8  69.9  58.9   73.6   39.3
Pishchulin, et al., CVPR’16  97.0  91.0      83.8   78.1   91.0  86.7  82.0   87.1   63.5
Wei, et al., CVPR’16         97.8  92.5      87.0   83.9   91.5  90.8  89.9   90.5   65.4
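For readers unfamiliar with the metric in the table heading, the sketch below shows how a PCK-style score is usually computed: a joint counts as correct when its predicted position lies within a fraction (here 0.2) of a reference length from the ground truth. The reference length is the torso size for PCK and the head segment length for PCKh. The function name `pck` and its arguments are hypothetical, and the slides do not describe the exact evaluation protocol.

```python
import math


def pck(pred, truth, ref_length, alpha=0.2):
    """Fraction of joints predicted within alpha * ref_length of the ground truth.

    pred, truth: lists of (x, y) joint positions in the same order.
    ref_length:  normalising length in pixels (torso or head size, by convention).
    """
    threshold = alpha * ref_length
    hits = sum(math.dist(p, t) <= threshold for p, t in zip(pred, truth))
    return hits / len(truth)
```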

slide-21
SLIDE 21

Human Pose Estimation

  • “Convolutional Pose Machines”, by Wei, et al. (CVPR 2016)

slide-22
SLIDE 22

Results

slide-23
SLIDE 23

Results

https://www.youtube.com/watch?v=llLj50gE9GI

slide-24
SLIDE 24


slide-25
SLIDE 25

Feature Extraction

  • 6 types of features were used
  • XY Joint Position (XY)
  • Distances from Related Joints (DRJ)
  • Distances from One Joint (DOJ)
  • Absolute Difference (AD)
  • Joint Angles (JA)
  • Velocity (VEL)

slide-26
SLIDE 26

Joint Position and Distance from Related Joints

  • Raw Joint Position (XY)

$F(j) = P(j)$

  • Distance from Related Joints (DRJ)

$F(j) = \lVert P_1(j) - P_2(j) \rVert$

slide-27
SLIDE 27

Distance from One Joint and Absolute Difference

  • Distance from One Joint (DOJ)

$F(j_1, j_2) = \lVert P_1(j_1) - P_2(j_2) \rVert$

  • Absolute Difference (AD)

$F(j) = \lvert P_1(j) - P_2(j) \rvert$
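The position, distance and difference features from this slide and the previous one can be sketched in a few lines. This is only an illustration under stated assumptions: each pose is taken to be a list of (x, y) joint coordinates, the two poses belong to the two interacting people, distances are Euclidean, and the function names are hypothetical.

```python
import math


def xy(pose):
    """XY: raw joint coordinates flattened into one feature vector."""
    return [c for joint in pose for c in joint]


def drj(pose1, pose2):
    """DRJ: distance between the same joint on person 1 and person 2."""
    return [math.dist(a, b) for a, b in zip(pose1, pose2)]


def doj(pose1, pose2, j1):
    """DOJ: distance from joint j1 of person 1 to every joint of person 2."""
    return [math.dist(pose1[j1], b) for b in pose2]


def ad(pose1, pose2):
    """AD: per-coordinate absolute difference between corresponding joints."""
    return [abs(ca - cb) for a, b in zip(pose1, pose2) for ca, cb in zip(a, b)]
```

For example, with `pose1 = [(0, 0), (3, 4)]` and `pose2 = [(3, 4), (3, 0)]`, `drj(pose1, pose2)` gives one distance per joint, while `ad(pose1, pose2)` keeps the x and y differences separate.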

slide-28
SLIDE 28

Joint Angles and Velocity

  • Joint Angles (JA)

$F(j_1, j_2) = \tan^{-1}\left(\frac{P_y(j_1) - P_y(j_2)}{P_x(j_1) - P_x(j_2)}\right)$

  • Velocity (VEL)

$F(j, t_1, t_2) = P(j, t_1) - P(j, t_2)$
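A small sketch of the two features on this slide, assuming a pose is a list of (x, y) joints and a sequence is a list of poses indexed by frame. The function names are hypothetical. Note one deliberate substitution: `math.atan2` is used instead of a plain inverse tangent of the ratio, so that a zero horizontal difference does not cause a division by zero; the two agree whenever the horizontal difference is positive.

```python
import math


def joint_angle(pose, j1, j2):
    """JA: angle of the segment from joint j2 to joint j1, in radians."""
    dy = pose[j1][1] - pose[j2][1]
    dx = pose[j1][0] - pose[j2][0]
    return math.atan2(dy, dx)  # atan2 swapped in for tan^-1(dy / dx)


def velocity(sequence, j, t1, t2):
    """VEL: displacement of joint j between frames t1 and t2."""
    (x1, y1), (x2, y2) = sequence[t1][j], sequence[t2][j]
    return (x1 - x2, y1 - y2)
```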

slide-29
SLIDE 29


slide-30
SLIDE 30

Interaction Classification

  • Support Vector Machine
  • Grid search to find the best parameters.
  • Selected parameters: C-SVC with a radial basis function kernel, γ = 0.0625 and C = 8.
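To make the reported parameters concrete, the sketch below shows the radial basis function kernel with γ = 0.0625 and a generic grid search over powers of two. Both reported values are exact powers of two (0.0625 = 2^-4 and 8 = 2^3), which is consistent with such a grid, but the ranges in `C_GRID` and `GAMMA_GRID` and the `score` callback are assumptions; the slides do not state the search ranges or the SVM library used.

```python
import math


def rbf_kernel(x, y, gamma=0.0625):
    """K(x, y) = exp(-gamma * ||x - y||^2), the radial basis function kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical power-of-two search ranges (not taken from the slides).
C_GRID = [2.0 ** e for e in range(-5, 16)]
GAMMA_GRID = [2.0 ** e for e in range(-15, 4)]


def grid_search(score, c_grid, gamma_grid):
    """Return the (C, gamma) pair with the highest score.

    score(c, gamma) is supplied by the caller, for example the mean
    cross-validation accuracy of an SVM trained with those parameters.
    """
    return max(((c, g) for c in c_grid for g in gamma_grid),
               key=lambda pair: score(*pair))
```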

slide-31
SLIDE 31


slide-32
SLIDE 32

Results

  • 2 methods of evaluation:
  • Per frame evaluation.
  • Whole sequence evaluation.
  • 5-fold cross validation to evaluate the results.
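A minimal sketch of how a 5-fold split could be generated for the 282 interaction samples mentioned earlier in the talk. The function name `k_fold_indices` and the fixed `seed` are hypothetical. Splitting at the level of whole samples, so that frames from one clip never appear in both training and test sets, is an assumption here and is not stated in the slides.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for f in folds[:i] + folds[i + 1:] for idx in f]
        yield train, test
```

With 282 samples, two folds hold 57 samples and three hold 56, and every sample is used for testing exactly once; the reported accuracy would then be the average over the five test folds.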

slide-33
SLIDE 33

Results

  • Number of frames normalization
  • Check the influence of different numbers of frames for both evaluations.
  • Only XY position used as features.

Plot annotations: 9, 13; 87.56%, 80.67%
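The slides do not say how sequences of different lengths were brought to a common number of frames. One simple possibility, shown here purely as an assumption, is to sample frames at evenly spaced positions; the function name `resample` is hypothetical.

```python
def resample(frames, n):
    """Pick n frames at evenly spaced positions from a sequence (assumed scheme)."""
    if n == 1:
        return [frames[len(frames) // 2]]
    last = len(frames) - 1
    return [frames[round(i * last / (n - 1))] for i in range(n)]
```

A 25-frame clip reduced to 9 frames would keep every third frame, including the first and the last.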

slide-34
SLIDE 34

Influence of each feature

XY: raw X and Y positions, DRJ: Distance from Related Joints, DOJ: Distance from One Joint, JA: Joint Angles, AD: Absolute Difference, VEL: Velocity

slide-35
SLIDE 35

Confusion Matrices

Per frame confusion matrix: 81.75% accuracy
Whole sequence confusion matrix: 87.56% accuracy

slide-36
SLIDE 36

Results Comparison table

Method                                  Per frame   Whole sequence
Yun, et al. (2012)                      80.30%      91.10%
Hu, et al. (2014)                       76.1%       83.33%
Zhu, et al. (2015)                      -           90.41%
My Method (just uses RGB information)   81.75%      87.56%

slide-37
SLIDE 37


slide-38
SLIDE 38

Conclusions

  • This work used recent developments in DCNNs to achieve these results.
  • The method achieved 87.56% accuracy, only 3.54 percentage points below the method developed by Yun, et al., which uses a depth camera to estimate the human pose.
  • This work showed that retrieving the human pose with a DCNN from an RGB camera to recognize the interaction between two persons can be as effective as using depth cameras.

slide-39
SLIDE 39

Future Works

  • Track joints using temporal information.
  • Create more features based on the pose.
  • Hidden Markov Model.
  • Deep LSTM network.
  • Create a more robust approach for more than 2 persons.

slide-40
SLIDE 40

Thank you!