Gesture Recognition: Hand Pose Estimation (Adrian Spurr, Ubiquitous Computing Seminar FS2014)



slide-1
SLIDE 1

Adrian Spurr Ubiquitous Computing Seminar FS2014 27.05.2014

Gesture Recognition: Hand Pose Estimation

slide-2
SLIDE 2

What is hand pose estimation?

Input -> Computer-usable form

slide-3
SLIDE 3

  • Augmented Reality
  • Robot Control
  • Gaming
  • PC Control

slide-4
SLIDE 4

Data glove

  • Utilizes optical flex sensors to measure finger bending.
  • Advantages: high accuracy, can provide haptic feedback.
  • Disadvantages: invasive, long calibration time, unnatural feeling, heavily instrumented.

slide-5
SLIDE 5

Thanks to cheap depth cameras...

[Figure: RGB camera image next to depth camera image]

slide-6
SLIDE 6

...and the increase in GPU power

slide-7
SLIDE 7

Problems occurring

  • Noisy data
  • Segmentation
slide-8
SLIDE 8

Problems occurring

  • Self-occlusion and viewpoint change

slide-9
SLIDE 9

Problems occurring

  • 27 degrees of freedom per hand -> 280 trillion hand poses

slide-10
SLIDE 10

Problems occurring

  • Performance: for practical use, estimation must run in real time.

slide-11
SLIDE 11

Principle of operation

Algorithm

slide-12
SLIDE 12

Existing schools of thought

  • Model-based:
      • Keeps track of the current pose internally.
      • Updates the pose according to the current pose and the observation.
  • Discriminative:
      • Maps directly from observation to pose.
      • “Learns” from training data and applies that knowledge to unseen data.

slide-13
SLIDE 13

Short intro to Random Forests

  • Ensemble learning
  • Classification and regression
  • Consists of decision trees

[Figure: a decision tree]

slide-14
SLIDE 14

Short intro to Random Forests

Data in feature space

Features = «Properties» of data

slide-19
SLIDE 19

Building a classification tree

slide-22
SLIDE 22

Random feature sampling

Choose T_j which splits the data with maximum information gain.
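The selection rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming entropy-based information gain and axis-aligned threshold tests; the helper names (`entropy`, `information_gain`, `best_random_split`) and the toy data are hypothetical and not taken from the talk.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of integer class labels."""
    if len(labels) == 0:
        return 0.0
    counts = np.bincount(labels)
    probs = counts[counts > 0] / len(labels)
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(labels, go_left):
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    left, right = labels[go_left], labels[~go_left]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children

def best_random_split(X, y, n_candidates, rng):
    """Sample random (feature, threshold) candidates and keep the best one."""
    best = (None, None, -1.0)
    for _ in range(n_candidates):
        feature = int(rng.integers(X.shape[1]))       # random feature index
        threshold = float(rng.choice(X[:, feature]))  # threshold drawn from the data
        gain = information_gain(y, X[:, feature] < threshold)
        if gain > best[2]:
            best = (feature, threshold, gain)
    return best

# Toy data: feature 0 separates the classes, feature 1 does not.
rng = np.random.default_rng(0)
X = np.array([[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]])
y = np.array([0, 0, 1, 1])
feature, threshold, gain = best_random_split(X, y, n_candidates=20, rng=rng)
```

Only a random subset of candidate splits is evaluated at each node, which is what keeps the individual trees different from each other.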

slide-23
SLIDE 23

Bagging

slide-24
SLIDE 24

Prediction

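Bagging and prediction can be sketched together. The example below is a simplified illustration under stated assumptions: each "tree" is only a depth-1 stump with a median threshold, and the forest prediction is the average of the class distributions stored in the leaves that a sample reaches. All function names and the toy data are hypothetical.

```python
import numpy as np

def fit_stump(X, y, n_classes, rng):
    """Depth-1 tree: random feature, median threshold, class histogram per leaf."""
    feature = int(rng.integers(X.shape[1]))
    threshold = float(np.median(X[:, feature]))
    go_left = X[:, feature] <= threshold
    leaves = []
    for mask in (go_left, ~go_left):
        hist = np.bincount(y[mask], minlength=n_classes).astype(float)
        # Fall back to a uniform distribution if a leaf received no samples.
        leaves.append(hist / hist.sum() if hist.sum() > 0
                      else np.full(n_classes, 1.0 / n_classes))
    return feature, threshold, leaves

def fit_forest(X, y, n_trees, n_classes, rng):
    """Bagging: every tree is trained on its own bootstrap sample."""
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(len(X), size=len(X))  # draw with replacement
        forest.append(fit_stump(X[idx], y[idx], n_classes, rng))
    return forest

def predict_proba(forest, x):
    """Average the leaf distributions reached by x in every tree."""
    dists = [leaves[0] if x[f] <= t else leaves[1] for f, t, leaves in forest]
    return np.mean(dists, axis=0)

# Toy data: two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(0, 1, (20, 2)), rng.uniform(10, 11, (20, 2))])
y = np.repeat([0, 1], 20)
forest = fit_forest(X, y, n_trees=25, n_classes=2, rng=rng)
proba_low = predict_proba(forest, np.array([-1.0, -1.0]))
proba_high = predict_proba(forest, np.array([12.0, 12.0]))
```

A real forest would grow deeper trees and choose each split by information gain; the stump only keeps the sketch short.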

slide-25
SLIDE 25

RF for pose estimation

Why Random Forests?

  • Robust
  • Fast
  • Thoroughly studied

How should we use them?

  • Must choose what to split on.
  • What should the labels be?

slide-26
SLIDE 26

Advanced body pose recognition

[Shotton2011]

slide-27
SLIDE 27

Advanced body pose recognition

  • Discriminative approach.
  • Used in the Kinect.
  • First paper to use synthetic training data.
  • Basis for many later papers.

[Shotton2011]

slide-28
SLIDE 28

Creating synthetic data

[Shotton2011]

slide-29
SLIDE 29

Split function

d_I(x): Depth at position x

[Shotton2011]
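The formula on this slide was not preserved. As a hedged sketch, the depth comparison feature commonly associated with [Shotton2011] takes the difference of two depth probes whose pixel offsets u and v are divided by the depth at x, so the response does not depend on the distance to the camera. The code below assumes that form; the names `probe`, `depth_feature`, `split_goes_left`, and the constant `BACKGROUND_DEPTH` are illustrative.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed large constant for probes outside the image

def probe(depth, row, col):
    """Depth at (row, col), or a large constant if the probe leaves the image."""
    h, w = depth.shape
    if 0 <= row < h and 0 <= col < w:
        return float(depth[row, col])
    return BACKGROUND_DEPTH

def depth_feature(depth, pixel, u, v):
    """Difference of two probes; offsets u and v are scaled by 1 / depth at pixel."""
    row, col = pixel
    d = float(depth[row, col])
    r1, c1 = int(round(row + u[0] / d)), int(round(col + u[1] / d))
    r2, c2 = int(round(row + v[0] / d)), int(round(col + v[1] / d))
    return probe(depth, r1, c1) - probe(depth, r2, c2)

def split_goes_left(depth, pixel, u, v, tau):
    """Binary test at a tree node: compare the feature against threshold tau."""
    return depth_feature(depth, pixel, u, v) < tau

# Toy depth image: flat surface at depth 2 with one farther pixel.
depth = np.full((5, 5), 2.0)
depth[2, 4] = 4.0
inside = depth_feature(depth, (2, 2), u=(0, 4), v=(0, 0))     # probes (2, 4) and (2, 2)
outside = depth_feature(depth, (2, 2), u=(0, 100), v=(0, 0))  # first probe leaves the image
```

Each tree node stores its own (u, v, tau), so a single pixel is classified with a handful of depth lookups.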

slide-30
SLIDE 30

Joint prediction

[Shotton2011]

slide-31
SLIDE 31

Per-class accuracy vs. tree depth

  • Accuracy increases as the depth of the tree increases.
  • Overfitting occurs for 15k training images.
  • More training images lead to higher accuracy and less overfitting.

[Shotton2011]

slide-32
SLIDE 32

Negative Results

  • Failure due to self-occlusion:
  • Failure due to unseen pose:

[Shotton2011]

slide-33
SLIDE 33

Unresolved issues

  • To capture all possible poses, a huge amount of training data must be generated.
  • Training an RF on a big training set means more trees and deeper trees.
  • A large amount of memory is needed.

slide-34
SLIDE 34

Unresolved issues

  • Solution: Divide the training data into subsets and solve classification for each subset separately.

slide-35
SLIDE 35

Multi-layered Random Forest

  • Cluster the training data based on similarity.
  • Train one RF on the clusters and one RF for each cluster.
  • The first layer assigns the input to the proper cluster.
  • The second layer gives the final hand part label distribution.

[Keskin2012]

slide-36
SLIDE 36

Clustering training data

  • Cluster based on weighted differences.
  • Penalize differences in viewpoint and finger positions.
  • Label each cluster; the labels refer to hand shape.
  • Train a Random Forest on the clusters.

slide-37
SLIDE 37

Experts

  • Use hand part labels.
  • Train a separate Random Forest for each cluster.
  • Each forest is called an Expert.

slide-38
SLIDE 38

Two prediction methods

  • Global Expert Network:
      • Feed the input to the first-layer Random Forest, average the votes over the input, and get one hand shape label.
      • Feed the input to the corresponding expert to get the hand part distribution.

slide-39
SLIDE 39

Two prediction methods

  • Local Expert Network:
      • Feed the input to the first-layer Random Forest and get a hand shape label for each pixel.
      • Feed each pixel to its corresponding expert to get the hand part distribution.
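The difference between the two prediction methods can be sketched as follows. This is an illustration only: the forests are replaced by stand-in callables (`toy_shape_forest`, `make_toy_expert`) that return per-pixel probability arrays, and all names are hypothetical rather than taken from [Keskin2012].

```python
import numpy as np

def global_expert_network(pixels, shape_forest, experts):
    """One hand shape label for the whole image, then a single expert for all pixels."""
    shape_probs = shape_forest(pixels)                # (n_pixels, n_shapes)
    shape = int(np.argmax(shape_probs.mean(axis=0)))  # average the votes over all pixels
    return experts[shape](pixels)

def local_expert_network(pixels, shape_forest, experts):
    """One hand shape label per pixel, each pixel goes to its own expert."""
    shape_labels = np.argmax(shape_forest(pixels), axis=1)
    n_parts = experts[0](pixels[:1]).shape[1]
    out = np.zeros((len(pixels), n_parts))
    for k, expert in enumerate(experts):
        mask = shape_labels == k
        if mask.any():
            out[mask] = expert(pixels[mask])
    return out

# Stand-ins for trained forests (assumptions for this sketch only).
def toy_shape_forest(pixels):
    p1 = np.clip(pixels[:, 0], 0.0, 1.0)
    return np.stack([1.0 - p1, p1], axis=1)

def make_toy_expert(part):
    def expert(pixels):
        out = np.zeros((len(pixels), 2))
        out[:, part] = 1.0  # this expert always predicts the same hand part
        return out
    return expert

experts = [make_toy_expert(0), make_toy_expert(1)]
pixels = np.array([[0.1], [0.2], [0.9]])
gen_out = global_expert_network(pixels, toy_shape_forest, experts)
len_out = local_expert_network(pixels, toy_shape_forest, experts)
```

In the toy input the third pixel votes for a different hand shape than the other two: the global variant overrules it, the local variant sends it to a different expert.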

slide-40
SLIDE 40

Parts distribution to pose

  • The RDF (randomized decision forest) returns the hand part distribution.
  • Get the centre of each distribution by utilizing mean shift.
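A minimal sketch of the mean shift step, assuming a Gaussian kernel and 3D points weighted by the per-pixel probability of one hand part. The function name `mean_shift_mode`, the bandwidth, and the toy points are assumptions for illustration.

```python
import numpy as np

def mean_shift_mode(points, weights, bandwidth, start, n_iter=50, tol=1e-6):
    """Move `start` uphill to a mode of the weighted point density."""
    mode = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        sq_dist = np.sum((points - mode) ** 2, axis=1)
        kernel = weights * np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        new_mode = kernel @ points / kernel.sum()  # kernel-weighted mean
        if np.linalg.norm(new_mode - mode) < tol:
            return new_mode
        mode = new_mode
    return mode

# Toy data: four confident points around (1, 1, 1) and one weak outlier.
points = np.array([
    [1.1, 1.0, 1.0],
    [0.9, 1.0, 1.0],
    [1.0, 1.1, 1.0],
    [1.0, 0.9, 1.0],
    [5.0, 5.0, 5.0],
])
weights = np.array([1.0, 1.0, 1.0, 1.0, 0.05])
start = weights @ points / weights.sum()  # weighted mean as the starting point
joint = mean_shift_mode(points, weights, bandwidth=0.5, start=start)
```

Unlike a plain weighted mean, the mode is barely affected by the outlier, which is why mean shift suits noisy per-pixel label distributions.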

slide-41
SLIDE 41

American Sign Language

slide-42
SLIDE 42

First layer accuracy on ASL

  • 2-fold cross-validation: 97.8%
  • Confusion occurs for (m,n), (m,t) and (n,t)
slide-43
SLIDE 43

Confusions

  • Confusion occurs for (m,n), (m,t) and (n,t)

slide-44
SLIDE 44

Second layer accuracy

Q = Number of clusters

slide-45
SLIDE 45

Problems

  • It is not feasible to capture all possible variations of the hand with synthetic data.
  • Methods using only synthetic data suffer from synthetic-realistic discrepancies.
  • But: using realistic training data is expensive, because it must be labelled manually.

[Figure: real vs. synthetic data]

slide-46
SLIDE 46

Problems

  • Solution: Transductive learning.

slide-47
SLIDE 47

Transductive Random Forest

  • Transductive learning: learn from labelled data and transfer that knowledge to related unlabelled data.
  • Estimate the pose based on knowledge gained from both labelled and unlabelled data.

slide-48
SLIDE 48

Overview

slide-49
SLIDE 49

Training data

  • Training data consists of labelled real and synthetic data, and unlabelled real data.
  • Labelled elements are image patches, not pixels.
  • Each label is a tuple (a, p, v):
      • a = viewpoint
      • p = label of the closest joint
      • v = vector containing the positions of all joints

Example: a = «Front», p = «Thumb», v = (3x16) coordinates
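The label tuple could be represented as a small data structure. The sketch below is an assumption about layout, not the authors' code: the class name `PatchLabel` and its field names are hypothetical, and v is stored as a 16 x 3 array (16 joints, 3 coordinates each).

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class PatchLabel:
    """Label (a, p, v) of one labelled training patch."""
    viewpoint: str                # a, e.g. "Front"
    closest_joint: str            # p, e.g. "Thumb"
    joint_positions: np.ndarray   # v, assumed shape (16, 3)

    def __post_init__(self):
        self.joint_positions = np.asarray(self.joint_positions, dtype=float)
        if self.joint_positions.shape != (16, 3):
            raise ValueError("expected 16 joints with 3 coordinates each")

label = PatchLabel("Front", "Thumb", np.zeros((16, 3)))
```

Unlabelled real patches would simply carry no such label.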

slide-50
SLIDE 50

Quality Function

  • Randomly choose between the two:
      • Transductive term
      • Classification-regression term

slide-51
SLIDE 51

Quality Function

  • Q_a: Measures quality of split with respect to viewpoint a
  • Q_p: Measures quality of split with respect to joint label p
  • Q_v: Measures compactness of vote vector v
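The slides do not preserve how the three terms are combined. The following is a hedged sketch only: it assumes information gain for the two classification terms, an inverse-variance compactness score for the vote vectors, and a weighted sum controlled by two parameters `alpha` and `beta` in [0, 1]. These choices and all function names are assumptions for illustration.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of integer labels."""
    if len(labels) == 0:
        return 0.0
    counts = np.bincount(labels)
    probs = counts[counts > 0] / len(labels)
    return float(-np.sum(probs * np.log2(probs)))

def info_gain(labels, go_left):
    """Entropy reduction of a binary split (used for viewpoint a and joint label p)."""
    n = len(labels)
    left, right = labels[go_left], labels[~go_left]
    return entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

def compactness(votes, go_left):
    """Close to 1 when the vote vectors in both children are tightly clustered."""
    spread = 0.0
    for mask in (go_left, ~go_left):
        if mask.any():
            # Sum of per-dimension variances equals the trace of the covariance.
            spread += mask.mean() * votes[mask].var(axis=0).sum()
    return 1.0 / (1.0 + spread)

def combined_quality(a, p, votes, go_left, alpha, beta):
    """Assumed weighting: alpha favours viewpoint, beta favours joint label."""
    q_a = info_gain(a, go_left)
    q_p = info_gain(p, go_left)
    q_v = compactness(votes, go_left)
    return alpha * q_a + (1 - alpha) * beta * q_p + (1 - alpha) * (1 - beta) * q_v

# Toy node: the split separates viewpoints perfectly but not joint labels.
a = np.array([0, 0, 1, 1])
p = np.array([0, 1, 0, 1])
votes = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
go_left = np.array([True, True, False, False])
quality = combined_quality(a, p, votes, go_left, alpha=0.5, beta=0.5)
```

With such a weighting, nodes near the root can concentrate on viewpoint, deeper nodes on the joint label, and the deepest nodes on compact votes.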

slide-52
SLIDE 52

Quality Function Parameter

Measures the “purity” of the node with respect to either the viewpoint a, or the joint label p

slide-53
SLIDE 53

Quality Function

  • Q_t: Measures image similarity between real data patches
  • Q_u: Measures purity based on the association between the labelled and unlabelled data

slide-54
SLIDE 54

Kinematic Refinement

  • Hands are biomechanically constrained in the poses they can take.
  • Use this to our advantage.
  • Utilize kinematic refinement to enforce these constraints.

slide-55
SLIDE 55

Some results

slide-56
SLIDE 56

Joint prediction accuracy

slide-57
SLIDE 57

Estimating pose of two hands?

  • Just apply a single-hand pose estimator twice?
  • What if both hands are strongly interacting?
  • Additional occlusion must be accounted for.

slide-58
SLIDE 58

Dual hand pose estimation

  • Model-based approach.
  • Set up a parameter space representing all degrees of freedom for both hands.
  • Employ PSO (particle swarm optimization) to find the parameters that best fit the observation and the current configuration with respect to a cost function.

slide-59
SLIDE 59

Sample parameter space

  • x: Roll
  • y: Pitch
  • z: Yaw

slide-60
SLIDE 60

Cost function over parameter space

slide-61
SLIDE 61

Initialization

Random sample of n particles with random velocities.

slide-62
SLIDE 62

Iterating over parameter space

  • Update each particle's position according to its velocity.
  • Update each particle's velocity with regard to:
      • Current velocity
      • Local best position
      • Global best position

slide-63
SLIDE 63

Tracking

  • Use the RGB image to create a skin map.
  • Segment the depth image according to the skin map.
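The two steps can be illustrated with NumPy. The skin rule below is a crude RGB threshold heuristic chosen for this sketch only; it is not the skin detector used in the talk, and the function names `skin_map` and `segment_depth` are hypothetical.

```python
import numpy as np

def skin_map(rgb):
    """Rough boolean skin mask from an RGB image (illustrative heuristic only)."""
    r, g, b = (rgb[..., i].astype(int) for i in range(3))
    return ((r > 95) & (g > 40) & (b > 20)
            & (r > g) & (r > b)
            & (r - np.minimum(g, b) > 15))

def segment_depth(depth, mask, background=0.0):
    """Keep depth values on skin pixels and set everything else to `background`."""
    return np.where(mask, depth, background)

# Toy 1x2 image: one skin-coloured pixel, one blue pixel.
rgb = np.array([[[200, 120, 90], [30, 30, 200]]], dtype=np.uint8)
depth = np.array([[0.5, 0.7]])
mask = skin_map(rgb)
hand_depth = segment_depth(depth, mask)
```

The segmented depth image is then the observation that the hand model hypotheses are compared against.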

slide-64
SLIDE 64

Tracking

  • Cost function to optimize:
      • P(h): Penalizes invalid finger positions.
      • D(O,h,C): Penalizes discrepancies between hypothesis h and observation O.

slide-65
SLIDE 65

Applying PSO

  • Change each particle's velocity according to the best known position of particle i in generation k and the best known position of all particles in generation k.
  • Apply PSO for each observation O. Exploit temporal information by sampling particles around the previous hypothesis.
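A minimal PSO sketch ties the previous slides together: particles are sampled around the previous hypothesis, velocities are pulled towards each particle's own best position and the swarm's best position, and positions follow the velocities. The inertia weight `w`, the coefficients `c1` and `c2`, and the toy quadratic cost are assumptions; the actual tracker would evaluate the cost built from P(h) and D(O,h,C) instead.

```python
import numpy as np

def pso_minimize(cost, init, n_particles=30, n_generations=60, spread=1.0,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimization started around `init`."""
    rng = np.random.default_rng(seed)
    init = np.asarray(init, dtype=float)
    dim = len(init)
    # Sample particles around the previous hypothesis, with small random velocities.
    pos = init + spread * rng.standard_normal((n_particles, dim))
    vel = 0.1 * rng.standard_normal((n_particles, dim))
    best_pos = pos.copy()                             # best position per particle
    best_cost = np.array([cost(p) for p in pos])
    g = best_pos[np.argmin(best_cost)].copy()         # best position of the swarm
    for _ in range(n_generations):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        costs = np.array([cost(p) for p in pos])
        improved = costs < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = costs[improved]
        g = best_pos[np.argmin(best_cost)].copy()
    return g, float(best_cost.min())

# Toy stand-in for the tracking cost: squared distance to a hidden target pose.
target = np.array([1.0, -2.0, 0.5])

def toy_cost(h):
    return float(np.sum((h - target) ** 2))

previous_hypothesis = np.zeros(3)
hypothesis, final_cost = pso_minimize(toy_cost, previous_hypothesis)
```

For tracking, `pso_minimize` would be called once per frame, with the returned hypothesis passed as `init` for the next frame.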

slide-66
SLIDE 66

Some results

slide-67
SLIDE 67

Accuracy

slide-68
SLIDE 68

Future of Hand Pose estimation

  • Academically solved
  • Further research in areas of recovering more than pose, such as hand models or 3D skin models.
  • Including the RGB image for prediction increases accuracy.
  • Use of real data reduces synthetic-realistic discrepancies.

slide-69
SLIDE 69

Thank you for your attention!