Adrian Spurr Ubiquitous Computing Seminar FS2014 27.05.2014
Gesture Recognition: Hand Pose Estimation
What is hand pose estimation?
Input → Computer-usable form
Applications: Augmented Reality, Gaming, PC Control, Robot Control
Data glove
- Uses sensors to measure finger bending.
- Can provide haptic feedback.
- Drawbacks: long calibration time, unnatural feeling, heavily instrumented.
Camera-based alternatives: RGB camera, depth camera
Algorithm
Model-based (tracking): keeps track of the current pose internally and updates the pose according to the current pose and the new observation.
Discriminative: maps directly from the input to the pose; «learns» from training data and applies that knowledge to unseen data.
Processing
Random Forests: an ensemble learning method for classification and regression; consists of decision trees.
A decision tree:
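The traversal of a decision tree can be sketched as follows; the node structure, the hand-part labels and the thresholds are invented for illustration:

```python
# Minimal decision tree: internal nodes test one feature against a
# threshold; leaves store a class-label distribution.
# All structure and numbers here are illustrative, not from the slides.

def predict(node, x):
    """Walk the tree from the root to a leaf and return its distribution."""
    while "leaf" not in node:
        if x[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]

# Toy tree: split on feature 0 at 0.5, then on feature 1 at 0.5.
tree = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": {"palm": 0.9, "finger": 0.1}},
    "right": {
        "feature": 1, "threshold": 0.5,
        "left": {"leaf": {"palm": 0.2, "finger": 0.8}},
        "right": {"leaf": {"palm": 0.5, "finger": 0.5}},
    },
}

print(predict(tree, [0.2, 0.9]))  # → {'palm': 0.9, 'finger': 0.1}
```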
Data in feature space
Features = «Properties» of data
Choose the split U_k which splits the data with maximum information gain.
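Choosing the split by information gain can be sketched as follows; the candidate splits and the two-class labels are invented for illustration:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# Candidate splits U_k of the same parent set; pick the one with maximum gain.
parent = ["palm", "palm", "finger", "finger"]
splits = {
    "U_1": (["palm", "palm"], ["finger", "finger"]),  # perfect split
    "U_2": (["palm", "finger"], ["palm", "finger"]),  # useless split
}
best = max(splits, key=lambda k: information_gain(parent, *splits[k]))
print(best)  # → U_1
```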
Why Random Forests?
How should we use them?
[Shotton2011]
Discriminative approach. Used in the Kinect. First paper to use synthetic training data. Basis for many future papers.
d_I(x): depth of image I at position x [Shotton2011]
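The depth comparison feature of [Shotton2011] subtracts the depths at two probe offsets u and v around x, each normalized by d_I(x) so the feature is depth-invariant. A sketch on a made-up depth map; the offsets, the values, and the large out-of-image constant are illustrative choices:

```python
def depth(img, x):
    """d_I(x): depth of image img at pixel x = (row, col); a large
    constant is returned outside the image (background probe)."""
    r, c = x
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 1e6

def feature(img, x, u, v):
    """f(I, x) = d(x + u / d(x)) - d(x + v / d(x))  [Shotton2011]."""
    d = depth(img, x)
    probe = lambda off: (x[0] + int(off[0] / d), x[1] + int(off[1] / d))
    return depth(img, probe(u)) - depth(img, probe(v))

# Toy depth map (metres): a near 'hand' pixel surrounded by far background.
img = [
    [4.0, 4.0, 4.0],
    [4.0, 1.0, 4.0],
    [4.0, 4.0, 4.0],
]
print(feature(img, (1, 1), (0.0, 0.0), (1.0, 0.0)))  # → -3.0
```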
Training on more training images leads to higher accuracy and less overfitting.
Final prediction: average the distributions over all trees [Shotton2011].
Idea: split the training data into sets and run the classification for each set separately.
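Averaging the per-tree distributions can be sketched as follows; the labels and probabilities are invented:

```python
def forest_predict(tree_distributions):
    """Average the class distributions returned by the individual trees."""
    n = len(tree_distributions)
    avg = {}
    for dist in tree_distributions:
        for label, p in dist.items():
            avg[label] = avg.get(label, 0.0) + p / n
    return avg

# Three trees voting on hand-part labels for one pixel.
dists = [
    {"palm": 0.6, "thumb": 0.4},
    {"palm": 0.8, "thumb": 0.2},
    {"palm": 0.1, "thumb": 0.9},
]
print(forest_predict(dists))  # palm ≈ 0.5, thumb ≈ 0.5
```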
Cluster the training data; train a Random Forest for each cluster.
The first layer assigns the input to the proper cluster; the second layer gives the final hand part label distribution.
[Keskin2012]
Cluster based on weighted differences; penalize differences in viewpoint and finger positions. Label each cluster; the labels refer to hand shape. Train a Random Forest on the clusters.
Use the hand part labels. Train a separate Random Forest for each cluster. Each such forest is called an expert.
Global Expert Network:
Feed the input to the first-layer Random Forest and average over the whole input to get a single hand shape label.
Feed the input to the corresponding expert to get the hand part distribution.
Local Expert Network:
Feed the input to the first-layer Random Forest to get a hand shape label for each pixel.
Feed each pixel to its corresponding expert to get the hand part distribution.
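The routing difference between the two networks can be sketched as follows; the shape classifier and the experts below are toy stand-ins for trained forests, and all labels and numbers are invented:

```python
# Stand-ins for trained forests: a first-layer shape classifier and one
# expert (hand-part classifier) per hand shape cluster.

def shape_layer(pixel):
    """First-layer forest stub: returns a hand-shape label for one pixel."""
    return "open" if pixel < 0.5 else "fist"

experts = {
    "open": lambda pixel: {"finger": 0.9, "palm": 0.1},
    "fist": lambda pixel: {"finger": 0.1, "palm": 0.9},
}

def global_expert_network(pixels):
    """One shape decision for the whole image (majority vote over pixels),
    then every pixel goes to that single expert."""
    votes = [shape_layer(p) for p in pixels]
    shape = max(set(votes), key=votes.count)
    return [experts[shape](p) for p in pixels]

def local_expert_network(pixels):
    """Each pixel is routed to the expert of its own shape label."""
    return [experts[shape_layer(p)](p) for p in pixels]

pixels = [0.1, 0.2, 0.9]
print(global_expert_network(pixels))  # all pixels use the 'open' expert
print(local_expert_network(pixels))   # last pixel uses the 'fist' expert
```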
Q = Number of clusters
It is not feasible to capture all possible variations of the hand with synthetic data.
Methods using only synthetic data suffer from synthetic-realistic discrepancies.
But: using realistic training data is expensive, due to manually labelling it.
Solution: Transductive Learning.
Transductive learning: learn from labelled data and transfer the knowledge to related unlabelled data.
Estimate the pose based on knowledge gained from both labelled and unlabelled data.
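As a toy illustration of the transductive idea only (not the forest training of the actual method), labels can be propagated from labelled samples to related unlabelled samples by similarity; the 1-D feature and labels are invented:

```python
def propagate_labels(labelled, unlabelled):
    """Give each unlabelled sample the label of its nearest labelled sample.
    A toy stand-in for transduction; the paper instead couples labelled and
    unlabelled data inside the forest's training objective."""
    out = []
    for x in unlabelled:
        nearest = min(labelled, key=lambda lv: abs(lv[0] - x))
        out.append((x, nearest[1]))
    return out

# Labelled synthetic data and unlabelled real data on a 1-D 'feature'.
labelled = [(0.0, "open"), (1.0, "fist")]
unlabelled = [0.1, 0.9]
print(propagate_labels(labelled, unlabelled))  # → [(0.1, 'open'), (0.9, 'fist')]
```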
Training data consists of labelled real and synthetic data, and unlabelled real data.
Labelled elements are image patches, not pixels.
Each label is a tuple (a, p, v):
a = viewpoint
p = label of the closest joint
v = vector containing all joint positions
Example label: a = «Front», p = «Thumb», v = (3×16) joint coordinates
The training objective combines a transductive term and a classification-regression term.
The classification term measures the «purity» of the node with respect to either the viewpoint a or the joint label p; the transductive term relates the labelled and unlabelled data.
Just apply a single-hand pose estimator twice? What if both hands are strongly interacting? The additional occlusion must be accounted for.
Model-based approach: set up a parameter space representing all degrees of freedom of both hands. Employ Particle Swarm Optimization (PSO) to find the parameters best fitting the observed configuration with respect to a cost function.
x: roll, y: pitch, z: yaw
Initialization: random sample of n particles with random velocities.
Update each particle's position according to its velocity.
Update particle velocities with regard to:
- current velocity
- local best position
- global best position
Use RGB image to create skin map. Segment depth image according to skin map.
Cost function to optimize:
P(h): penalizes invalid finger positions.
D(O, h, C): penalizes discrepancies between the hypothesis h and the observation O.
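The combination of the two terms can be sketched as follows; the weight LAMBDA, both penalty stubs and the hypothesis encoding are placeholders, and the calibration C is omitted from the stub:

```python
LAMBDA = 10.0  # relative weight of the prior term; value is a placeholder

def D(observation, hypothesis):
    """Discrepancy stub: squared difference of two scalar 'depth' summaries,
    standing in for comparing a rendered hypothesis with the depth data."""
    return (observation - hypothesis["depth"]) ** 2

def P(hypothesis):
    """Prior stub: penalize an invalid (negative) 'finger angle'."""
    return max(0.0, -hypothesis["finger_angle"])

def cost(observation, hypothesis):
    """E = D(O, h) + lambda * P(h): the shape of the model-based objective."""
    return D(observation, hypothesis) + LAMBDA * P(hypothesis)

valid = {"depth": 1.0, "finger_angle": 0.2}
invalid = {"depth": 1.0, "finger_angle": -0.2}
print(cost(1.0, valid), cost(1.0, invalid))  # → 0.0 2.0
```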
Change particle velocity according to:
v_{k+1} = w · (v_k + c1 · r1 · (P_{k,i} − x_k) + c2 · r2 · (G_k − x_k))
P_{k,i} = best known position of particle i in generation k.
G_k = best known position of all particles in generation k.
Apply PSO for each observation O. Exploit temporal information by sampling particles around the previous hypothesis.
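The PSO loop can be sketched in one dimension; the parameters w, c1, c2, the search range and the toy objective are illustrative choices, not the paper's values:

```python
import random

def pso(cost, n_particles=20, generations=100, w=0.72, c1=1.5, c2=1.5, seed=0):
    """Minimal 1-D particle swarm: each velocity mixes inertia, a pull toward
    the particle's own best position P_i, and a pull toward the global best G."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    vs = [rng.uniform(-1.0, 1.0) for _ in range(n_particles)]
    pbest = list(xs)                 # P_i: best position seen by particle i
    gbest = min(pbest, key=cost)     # G: best position seen by the swarm
    for _ in range(generations):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vs[i] = w * (vs[i]
                         + c1 * r1 * (pbest[i] - xs[i])
                         + c2 * r2 * (gbest - xs[i]))
            xs[i] += vs[i]
            if cost(xs[i]) < cost(pbest[i]):
                pbest[i] = xs[i]
        gbest = min(pbest, key=cost)
    return gbest

# Toy objective with minimum at x = 2; stands in for the hand cost E(O, h).
best = pso(lambda x: (x - 2.0) ** 2)
print(best)  # should land near 2.0
```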
Possible extensions: stronger priors such as a hand model or 3D skin models.
Including the RGB image in the prediction increases accuracy. The use of real data reduces synthetic-realistic discrepancies.