Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences



slide-1
SLIDE 1

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences

Presentation of the thesis work of: Hedvig Sidenbladh, KTH
Thesis opponent: Prof. Bill Freeman, MIT

slide-2
SLIDE 2

Thesis supervisors

  • Prof. Jan-Olof Eklundh, KTH
  • Prof. Michael Black, Brown University
  • Dr. David Fleet, Xerox PARC
  • Prof. Dirk Ormoneit, Stanford University

Collaborators

slide-3
SLIDE 3

A vision of the future from the past.

New York World's Fair, 1939 (Westinghouse Historical Collection). Pictured: Elektro, Sparky.

slide-4
SLIDE 4
Applications of computers looking at people

  • Entertainment: motion capture for games, animation, and film.
  • Surveillance
  • Video search
  • Human-machine interaction
    – Robots
    – Intelligent rooms

slide-5
SLIDE 5

Technical Goal

Tracking a human in 3D

slide-6
SLIDE 6

Why is it hard?

The appearance of people can vary dramatically.

slide-7
SLIDE 7

Why is it hard?

People can appear in arbitrary poses. Structure is unobservable: it must be inferred from the visible parts.

slide-8
SLIDE 8

Why is it hard?

Geometrically under-constrained.

slide-9
SLIDE 9

One solution:

  • Use markers
  • Use multiple cameras

http://www.vicon.com/animation/

slide-10
SLIDE 10

State of the Art.

Bregler and Malik '98

  • Brightness constancy cue
    – Insensitive to appearance
  • Full-body required multiple cameras

  • Single hypothesis
slide-11
SLIDE 11

2D vs. 3D tracking

  • Artist's models...

slide-12
SLIDE 12

State of the Art.

Cham and Rehg '99

  • Single camera, multiple hypotheses
  • 2D templates (no drift but view dependent)

I(x, t) = I(x+u, 0) + η

slide-13
SLIDE 13

1999 state of the art

Pavlovic, Rehg, Cham, and Murphy, Intl. Conf. Computer Vision, 1999

slide-14
SLIDE 14

State of the Art.

Deutscher, North, Bascle, & Blake '00

  • Multiple hypotheses
  • Multiple cameras
  • Simplified clothing, lighting and background

slide-15
SLIDE 15

Note: we can fake it with clever system design

  • M. Krueger, “Artificial Reality”, Addison-Wesley, 1983.

slide-16
SLIDE 16

Game videos...

slide-17
SLIDE 17

  • Black background
  • No other people in camera
  • Display tells person what motion to do.
  • Person at known distance and position.

Decathlete 100m hurdles

slide-18
SLIDE 18

Performance specifications

  • No special clothing
  • Monocular, grayscale sequences (archival data)
  • Unknown, cluttered environment

Task: Infer 3D human motion from 2D image

slide-19
SLIDE 19

Bayesian formulation

p(model | cues) = p(cues | model) p(model) / p(cues)

  • 1. Need a constraining likelihood model that is also invariant to variations in human appearance.
  • 2. Need a prior model of how people move.
  • 3. Posterior probability: Need an effective way to explore the model space (very high dimensional) and represent ambiguities.
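
As a toy illustration of Bayes' rule as used on this slide (not the thesis implementation), the sketch below computes a posterior over a small discrete set of candidate models. The hypothesis names and all numbers are made up for illustration.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of candidate models.

    prior[m]      = p(model = m)
    likelihood[m] = p(cues | model = m)
    Returns p(model = m | cues) for every m.
    """
    unnormalized = {m: likelihood[m] * prior[m] for m in prior}
    evidence = sum(unnormalized.values())  # p(cues), the normalizing constant
    return {m: v / evidence for m, v in unnormalized.items()}


# Hypothetical example: two poses that explain the image almost equally well.
prior = {"arm_forward": 0.5, "arm_backward": 0.5}
likelihood = {"arm_forward": 0.30, "arm_backward": 0.20}
post = posterior(prior, likelihood)
```

Both hypotheses keep non-negligible probability, which is the kind of ambiguity the posterior representation has to preserve.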
slide-20
SLIDE 20

System components

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
    – Very general model
    – Very specific model
    – Example-based model

slide-21
SLIDE 21

System components

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
    – Very general model
    – Very specific model
    – Example-based model

slide-22
SLIDE 22

Simple Body Model

  • Limbs are truncated cones
  • Parameter vector of joint angles and angular velocities = φ

slide-23
SLIDE 23

Multiple Hypotheses

  • Posterior distribution over model parameters often multi-modal (due to ambiguities)
  • Represent whole distribution:
    – sampled representation
    – each sample is a pose
    – predict over time using a particle filtering approach

slide-24
SLIDE 24

Particle Filter

One time step (diagram labels: sample, sample, normalize):

  • Posterior: $p(\phi_{t-1} \mid \vec{I}_{t-1})$
  • Temporal dynamics: $p(\phi_t \mid \phi_{t-1})$
  • Likelihood: $p(I_t \mid \phi_t)$
  • Posterior: $p(\phi_t \mid \vec{I}_t)$

Problem: Expensive representation of posterior! Approaches to solve problem:

  • Lower the number of samples. (Deutscher et al., CVPR00)
  • Represent the space in other ways (Choo and Fleet, ICCV01)
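
The sample / weight / normalize loop of a particle filter can be sketched on a toy 1D state. This is a minimal illustration under stated assumptions, not the tracker from the thesis: the random-walk dynamics, the Gaussian likelihood, and all parameter values are placeholders chosen for the example.

```python
import math
import random


def particle_filter_step(particles, observation, rng,
                         dynamics_std=0.5, obs_std=1.0):
    """One sample / weight / normalize / resample step for a 1D state."""
    # Temporal dynamics: sample from p(phi_t | phi_{t-1}) (random walk here).
    predicted = [p + rng.gauss(0.0, dynamics_std) for p in particles]
    # Likelihood: weight each sample by p(I_t | phi_t) (Gaussian here).
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize
    # Resample so the set again represents the posterior with equal weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
true_state = 3.0
for _ in range(20):
    observation = true_state + rng.gauss(0.0, 1.0)
    particles = particle_filter_step(particles, observation, rng)
estimate = sum(particles) / len(particles)
```

The cost grows with the number of particles, since the likelihood is evaluated once per sample per frame; this is the expense the slide refers to.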
slide-25
SLIDE 25

System components

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
    – Very general model
    – Very specific model
    – Example-based model

slide-26
SLIDE 26

  • Changing background
  • Low contrast limb boundaries
  • Occlusion
  • Varying shadows
  • Deforming clothing

What do people look like? What do non-people look like?

slide-27
SLIDE 27

Edge Detection?

  • Probabilistic model?
  • Under/over-segmentation, thresholds, …

slide-28
SLIDE 28

Key Idea #1 (Likelihood)

  • 1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene.
  • 2. Compute various filter responses steered to the predicted orientation of the limb.
  • 3. Compute likelihood of filter responses using a statistical model learned from examples.

slide-29
SLIDE 29

Edge Filters

Normalized derivatives of Gaussians (Lindeberg, Granlund and Knutsson, Perona, Freeman & Adelson, …)

Edge filter response steered to limb orientation:

$f_e(\mathbf{x}, \theta, \sigma) = \sin\theta \, f_x(\mathbf{x}, \sigma) + \cos\theta \, f_y(\mathbf{x}, \sigma)$

Filter responses steered to arm orientation.
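
A steered edge response of this form can be sketched in plain Python. The sketch assumes that f_x and f_y are Gaussian derivative responses along image columns and rows, that "normalized" means scaling the first derivative by σ, and that the sine term multiplies f_x; the helper names and the test image are hypothetical.

```python
import math


def gaussian_kernels(sigma):
    """Sampled 1D Gaussian and its first derivative (radius of 3 sigma)."""
    radius = int(math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    g = [math.exp(-u * u / (2.0 * sigma ** 2)) for u in offsets]
    norm = sum(g)
    g = [v / norm for v in g]
    dg = [-u / sigma ** 2 * v for u, v in zip(offsets, g)]
    return g, dg


def convolve_at(image, row, col, k_rows, k_cols):
    """Separable 2D convolution evaluated at one pixel, clamping at borders."""
    height, width = len(image), len(image[0])
    radius = len(k_rows) // 2
    total = 0.0
    for i, a in enumerate(k_rows):
        r = min(max(row - (i - radius), 0), height - 1)
        for j, b in enumerate(k_cols):
            c = min(max(col - (j - radius), 0), width - 1)
            total += a * b * image[r][c]
    return total


def steered_edge_response(image, row, col, theta, sigma=1.0):
    """f_e = sin(theta) * f_x + cos(theta) * f_y at one pixel and scale."""
    g, dg = gaussian_kernels(sigma)
    # Scale normalization by sigma: an assumed reading of "normalized".
    f_x = sigma * convolve_at(image, row, col, g, dg)  # derivative along columns
    f_y = sigma * convolve_at(image, row, col, dg, g)  # derivative along rows
    return math.sin(theta) * f_x + math.cos(theta) * f_y


# Vertical step edge: dark left half, bright right half.
image = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
across = steered_edge_response(image, 8, 8, math.pi / 2)  # picks up f_x
along = steered_edge_response(image, 8, 8, 0.0)           # picks up f_y
```

On this image the response is strong when the filter is steered to pick up the horizontal intensity change and essentially zero otherwise.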
slide-30
SLIDE 30

Example Training Images

slide-31
SLIDE 31

Edge Distributions

Edge response steered to model edge:

$f_e(\mathbf{x}, \theta, \sigma) = \sin\theta \, f_x(\mathbf{x}, \sigma) + \cos\theta \, f_y(\mathbf{x}, \sigma)$

Similar to Konishi et al., CVPR 99

slide-32
SLIDE 32

Edge Likelihood Ratio

Figure panels: Edge response; Likelihood ratio
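
One way to obtain a likelihood ratio from examples is to learn two histograms of filter responses, one on limb edges and one elsewhere, and compare them. The sketch below is illustrative only: the binning, the pseudo-counts, and the synthetic training values are assumptions, not the distributions learned in the thesis.

```python
import math


def learn_histogram(responses, bins, lo, hi, pseudo_count=1.0):
    """Normalized histogram of filter responses over [lo, hi)."""
    counts = [pseudo_count] * bins  # pseudo-counts avoid zero probabilities
    width = (hi - lo) / bins
    for r in responses:
        index = min(max(int((r - lo) / width), 0), bins - 1)
        counts[index] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def log_likelihood_ratio(response, p_on, p_off, lo, hi):
    """log p_on(response) - log p_off(response) for one filter response."""
    bins = len(p_on)
    width = (hi - lo) / bins
    index = min(max(int((response - lo) / width), 0), bins - 1)
    return math.log(p_on[index]) - math.log(p_off[index])


# Synthetic training responses (made up): strong on limb edges, weak elsewhere.
on_edge = [0.7, 0.8, 0.9, 0.75, 0.85, 0.6]
off_edge = [0.05, 0.1, 0.15, 0.2, 0.1, 0.3]
p_on = learn_histogram(on_edge, bins=5, lo=0.0, hi=1.0)
p_off = learn_histogram(off_edge, bins=5, lo=0.0, hi=1.0)
strong = log_likelihood_ratio(0.8, p_on, p_off, 0.0, 1.0)
weak = log_likelihood_ratio(0.1, p_on, p_off, 0.0, 1.0)
```

A positive log ratio favors the hypothesis that the pixel lies on a limb edge; a negative one favors the background.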

slide-33
SLIDE 33

Other Cues

  • Ridges
  • Motion: I(x, t), I(x+u, t+1)

slide-34
SLIDE 34

Ridge Distributions

Ridge response steered to limb orientation:

$f_r(\mathbf{x}, \theta, \sigma) = \left| \sin^2\theta \, f_{xx}(\mathbf{x}, \sigma) + \cos^2\theta \, f_{yy}(\mathbf{x}, \sigma) - 2 \sin\theta \cos\theta \, f_{xy}(\mathbf{x}, \sigma) \right| - \left| \cos^2\theta \, f_{xx}(\mathbf{x}, \sigma) + \sin^2\theta \, f_{yy}(\mathbf{x}, \sigma) + 2 \sin\theta \cos\theta \, f_{xy}(\mathbf{x}, \sigma) \right|$

Ridge response only on certain image scales!
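
The steered ridge response can be evaluated directly from the three second derivatives at a pixel. The sketch below assumes θ is the limb direction measured from the x axis, so the first absolute value is the curvature across the limb and the second the curvature along it; the derivative values are hypothetical.

```python
import math


def ridge_response(f_xx, f_yy, f_xy, theta):
    """Steered ridge response from second derivatives at one pixel and scale."""
    s, c = math.sin(theta), math.cos(theta)
    across = s * s * f_xx + c * c * f_yy - 2.0 * s * c * f_xy
    along = c * c * f_xx + s * s * f_yy + 2.0 * s * c * f_xy
    return abs(across) - abs(along)


# Hypothetical second derivatives of a bright vertical bar: strong curvature
# across the bar (x direction), none along it (y direction).
f_xx, f_yy, f_xy = -2.0, 0.0, 0.0
aligned = ridge_response(f_xx, f_yy, f_xy, math.pi / 2)  # limb along y
misaligned = ridge_response(f_xx, f_yy, f_xy, 0.0)       # limb along x
```

The response is positive when the hypothesized limb orientation matches the bar and negative when it is perpendicular to it.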

slide-35
SLIDE 35

Motion distributions

Different underlying motion models

slide-36
SLIDE 36

Likelihood Formulation

  • Independence assumptions:
    – Cues: p(image | model) = p(cue1 | model) p(cue2 | model)
    – Spatial: p(image | model) = Π_{x∈image} p(image(x) | model)
    – Scales: p(image | model) = Π_{σ=1,...} p(image(σ) | model)
  • Combines cues and scales!
  • Simplification, in reality there are dependencies
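
Under these independence assumptions the joint likelihood is a product over cues, scales, and pixels, which is usually computed as a sum of logs. The sketch below shows that bookkeeping only; the nested dictionary layout and the probability values are made up for illustration.

```python
import math


def log_likelihood(cue_probabilities):
    """Sum log-probabilities over cues, scales, and pixels.

    cue_probabilities[cue][scale] is a list of per-pixel probabilities
    p(image(x) | model). Under the independence assumptions the joint
    likelihood is the product of all of them, i.e. the sum of their logs.
    """
    total = 0.0
    for per_scale in cue_probabilities.values():
        for per_pixel in per_scale.values():
            total += sum(math.log(p) for p in per_pixel)
    return total


# Made-up per-pixel probabilities for two cues at two scales.
probabilities = {
    "edge": {1: [0.9, 0.8], 2: [0.7, 0.6]},
    "ridge": {1: [0.5, 0.4], 2: [0.3, 0.2]},
}
combined = log_likelihood(probabilities)
```

Working in the log domain avoids numerical underflow when many small per-pixel probabilities are multiplied.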

slide-37
SLIDE 37

The power of cue combination

slide-38
SLIDE 38

Using edge cues alone

Edge cues

slide-39
SLIDE 39

Using ridge cues alone

Ridge cues

slide-40
SLIDE 40

Using flow cue alone

Flow cues

slide-41
SLIDE 41

Using edge, ridge, and motion cues together

Edge cues Ridge cues Flow cues

slide-42
SLIDE 42

Key Idea #2

p(image | foreground, background) ∝ p(foreground part of image | foreground) / p(foreground part of image | background)

Do not look in parts of the image considered background.

Figure label: Foreground part of image

slide-43
SLIDE 43

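Likelihood

$$
\begin{aligned}
p(\text{image} \mid \text{fore}, \text{back})
  &= \prod_{\text{fore pixels}} p(\text{image} \mid \text{fore}) \prod_{\text{back pixels}} p(\text{image} \mid \text{back}) \\
  &= \frac{\prod_{\text{all pixels}} p(\text{image} \mid \text{back}) \prod_{\text{fore pixels}} p(\text{image} \mid \text{fore})}{\prod_{\text{fore pixels}} p(\text{image} \mid \text{back})} \\
  &= \text{const} \times \frac{\prod_{\text{fore pixels}} p(\text{image} \mid \text{fore})}{\prod_{\text{fore pixels}} p(\text{image} \mid \text{back})}
\end{aligned}
$$

In the first line, the first product runs over the foreground pixels and the second over the background pixels.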
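
The claim that the full likelihood and the foreground-only ratio differ by a constant can be checked numerically. In the sketch below, the per-pixel probabilities and the two foreground masks are made up; the point is only that the difference does not depend on which pixels the model hypothesis marks as foreground.

```python
import math


def full_log_likelihood(p_fore, p_back, fore_mask):
    """log p(image | fore, back): foreground model on masked pixels,
    background model on all others."""
    return sum(math.log(f if m else b)
               for f, b, m in zip(p_fore, p_back, fore_mask))


def ratio_log_likelihood(p_fore, p_back, fore_mask):
    """Sum of log p(pixel | fore) / p(pixel | back) over foreground pixels only."""
    return sum(math.log(f / b)
               for f, b, m in zip(p_fore, p_back, fore_mask) if m)


# Made-up per-pixel probabilities under each model for a 5-pixel image.
p_fore = [0.6, 0.7, 0.2, 0.1, 0.5]
p_back = [0.3, 0.2, 0.8, 0.9, 0.4]
const = sum(math.log(b) for b in p_back)  # same for every hypothesis

mask_a = [True, True, False, False, False]
mask_b = [False, False, False, True, True]
diff_a = (full_log_likelihood(p_fore, p_back, mask_a)
          - ratio_log_likelihood(p_fore, p_back, mask_a))
diff_b = (full_log_likelihood(p_fore, p_back, mask_b)
          - ratio_log_likelihood(p_fore, p_back, mask_b))
```

Because the constant is shared by all hypotheses, ranking samples by the ratio gives the same ordering as ranking by the full likelihood, while touching only the foreground pixels.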

slide-44
SLIDE 44

System components

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
    – Very general model
    – Very specific model
    – Example-based model

slide-45
SLIDE 45

The Prior term

Bayesian formulation: p(model | cue) ∝ p(cue | model) p(model)

    – Need a constraining likelihood model that is also invariant to variations in human appearance
    – Need a good model of how people move

slide-46
SLIDE 46

Very general model

  • Constant velocity motions
  • Not constrained by how people tend to move.

slide-47
SLIDE 47

Constant velocity model

  • All DOF in the model parameter space, φ, independent
  • Angles are assumed to change with constant speed
  • Speed and position changes are randomly sampled from normal distribution
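
A constant velocity prior of this kind can be sketched as follows. Each degree of freedom is handled independently, angles advance by their velocities, and Gaussian noise is added to both. The noise levels and the two-joint state are hypothetical values chosen for the example.

```python
import random


def constant_velocity_sample(angles, velocities, rng,
                             angle_std=0.05, velocity_std=0.02):
    """Draw one sample of the next state under a constant velocity model.

    Each degree of freedom is treated independently: the angle advances by
    its velocity, and both receive zero-mean Gaussian noise.
    """
    new_velocities = [v + rng.gauss(0.0, velocity_std) for v in velocities]
    new_angles = [a + v + rng.gauss(0.0, angle_std)
                  for a, v in zip(angles, new_velocities)]
    return new_angles, new_velocities


rng = random.Random(1)
angles, velocities = [0.0, 1.0], [0.1, -0.2]  # hypothetical two-joint state
samples = [constant_velocity_sample(angles, velocities, rng)
           for _ in range(2000)]
mean_first_angle = sum(s[0][0] for s in samples) / len(samples)
```

On average the first angle moves from 0.0 to about 0.1, its velocity, with the spread of the samples set by the noise parameters.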

slide-48
SLIDE 48

Tracking an Arm

1500 samples, ~2 min/frame

Moving camera, constant velocity model

slide-49
SLIDE 49

Self Occlusion

1500 samples, ~2 min/frame

Constant velocity model

slide-50
SLIDE 50

System components

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
    – Very general model
    – Very specific model
    – Example-based model

slide-51
SLIDE 51

Very specific model

  • Only handles people walking.
  • Very powerful constraint on human motion.
slide-52
SLIDE 52

Models of Human Dynamics

  • Action-specific model - Walking
    – Training data: 3D motion capture data
    – From training set, learn mean cycle and common modes of deviation (PCA)

Figure panels: Mean cycle; Small noise; Large noise
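
Learning a mean cycle and its dominant mode of deviation can be sketched without any libraries by using power iteration for the first principal component. This is a simplified stand-in for a full PCA, and the three-sample "cycles" below are synthetic; real training data would be joint-angle curves from motion capture.

```python
import math


def mean_and_principal_mode(cycles, iterations=100):
    """Mean cycle and the dominant mode of deviation (first PCA component)."""
    n = len(cycles[0])
    mean = [sum(c[i] for c in cycles) / len(cycles) for i in range(n)]
    centered = [[c[i] - mean[i] for i in range(n)] for c in cycles]
    mode = [1.0] * n  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # Multiply by the (unnormalized) covariance without forming it.
        projections = [sum(x * m for x, m in zip(row, mode)) for row in centered]
        mode = [sum(p * row[i] for p, row in zip(projections, centered))
                for i in range(n)]
        length = math.sqrt(sum(m * m for m in mode))
        mode = [m / length for m in mode]
    return mean, mode


# Synthetic cycles: a base cycle plus deviations along one known direction.
base = [0.0, 1.0, 0.0]
direction = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
cycles = [[b + a * d for b, d in zip(base, direction)]
          for a in (-2.0, -1.0, 1.0, 2.0)]
mean_cycle, mode = mean_and_principal_mode(cycles)
alignment = abs(sum(m * d for m, d in zip(mode, direction)))
```

New cycles can then be generated as the mean cycle plus a small or large random multiple of the mode, which is one reading of the "small noise" and "large noise" panels.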

slide-53
SLIDE 53

Walking Person

Number of samples reduced from 15000 to 2500 by using the learned likelihood.

2500 samples, ~10 min/frame

Walking model

slide-54
SLIDE 54

No likelihood

  • How strong is the walking prior? (Or is our likelihood doing anything?)

slide-55
SLIDE 55

System components

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
    – Very general model
    – Very specific model
    – Example-based model

slide-56
SLIDE 56

Example-based model

  • Take lots of training data.
  • Use “snippets” of the data as models for how people are likely to move.
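
The idea of using snippets as a motion model can be illustrated with a deterministic nearest-neighbour lookup: match the recent motion against the training data and predict what followed the best match. This is a deliberate simplification; it swaps a probabilistic search for an exhaustive one, and the 1D sequence below is made up.

```python
def predict_from_snippets(history, training, snippet_length):
    """Predict the next pose value by matching recent history to training data.

    Finds the training snippet closest (sum of squared differences) to the
    last `snippet_length` values of `history` and returns the value that
    followed that snippet in the training sequence.
    """
    recent = history[-snippet_length:]
    best_error, best_next = None, None
    for start in range(len(training) - snippet_length):
        snippet = training[start:start + snippet_length]
        error = sum((a - b) ** 2 for a, b in zip(recent, snippet))
        if best_error is None or error < best_error:
            best_error = error
            best_next = training[start + snippet_length]
    return best_next


# Made-up 1D joint-angle sequence with a repeating pattern.
training = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0]
prediction = predict_from_snippets([0.1, 0.9, 2.1], training, snippet_length=3)
```

A sampling version would pick among several close matches with probability related to their match error instead of always taking the best one.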

slide-57
SLIDE 57

Example-based model

Ten samples from the prior, drawn using approximate probabilistic tree search.

slide-58
SLIDE 58

Tracking with only 300 particles.

Example-based motion prior. Smooth motion prior.

slide-59
SLIDE 59

Lessons Learned

  • Representation for probabilistic analysis.
    – Probabilistic (Bayesian) framework allows
      • Integration of information in a principled way
      • Modeling of priors
    – Particle filtering allows
      • Multi-modal distributions
      • Tracking with ambiguities and non-linear models
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
slide-60
SLIDE 60

Lessons Learned

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
    – Generic, learned, model of appearance
      • Combines multiple cues
      • Exploits work on image statistics
    – Use the 3D model to predict features
    – Model of foreground and background
      • Exploits the ratio between foreground and background likelihood
      • Improves tracking
  • Models for human motion (prior term).
slide-61
SLIDE 61

Lessons Learned

  • Representation for probabilistic analysis.
  • Models for human appearance (likelihood term).
  • Models for human motion (prior term).
    – Explored 3 different models; analyzed the tradeoffs among them.

slide-62
SLIDE 62

End

slide-63
SLIDE 63

Decathlete javelin throw

slide-64
SLIDE 64

Edges

slide-65
SLIDE 65

Bayesian Inference

Exploit cues in the images. Learn likelihood models: p(image cue | model)

Build models of human form and motion. Learn priors over model parameters: p(model)

Represent the posterior distribution: p(model | cue) ∝ p(cue | model) p(model)