Pictorial Structures Revisited: People Detection and Articulated Pose Estimation



SLIDE 1

Pictorial Structures Revisited: People Detection and Articulated Pose Estimation - CVPR 2009

Pictorial Structures Revisited: People Detection and Articulated Pose Estimation

Mykhaylo Andriluka, Stefan Roth, Bernt Schiele
Department of Computer Science, TU Darmstadt

SLIDE 2

Generic model for human detection and pose estimation

Human pose estimation

[Felzenszwalb&Huttenlocher, IJCV’05], [Ren et al., ICCV’05], [Sigal&Black, CVPR’06], [Zhang et al., CVPR’06], [Jiang&Martin, CVPR’08], [Ramanan, NIPS’06], [Ferrari et al., CVPR’08], [Ferrari et al., CVPR’09]

  • often rather simple appearance model
  • focus on finding optimal assembly of parts

People Detection

[Viola et al., ICCV’03], [Dalal&Triggs, CVPR’05], [Leibe et al., CVPR’05], [Andriluka et al., CVPR’08]

  • complex appearance model
  • no pose model or limited to walking motion

SLIDE 4

[Fischler&Elschlager, 1973]

Can we make the pictorial structures model effective for these tasks?

SLIDE 5

Can we make the pictorial structures model effective for these tasks?

Yes... if the model components are chosen right.

SLIDE 6

Pictorial Structures Model

  • Body is represented as a flexible configuration of body parts
  • $L = \{l_0, l_1, \ldots, l_N\}$: configuration of parts
  • $D = \{d_0, d_1, \ldots, d_N\}$: part evidence

$$p(L \mid D) \propto p(D \mid L)\, p(L)$$

where $p(L \mid D)$ is the posterior over body poses, $p(D \mid L)$ is the likelihood of observations, and $p(L)$ is the prior on body poses.

SLIDE 7

Pictorial Structures Model

Posterior marginals, computed with sum-product BP:

$$p(l_i \mid D) \propto \sum_{L \setminus l_i} p(L \mid D)$$

Pictorial structures allow exact and efficient inference.

  • tree-structured prior
  • independent part appearance model
  • discretized part locations
  • Gaussian pairwise part relationships

[Figure: tree of body parts with nodes labeled l1 to l10.]
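
The posterior marginals above can be computed exactly because the prior is a tree. The following is a minimal sketch of sum-product message passing on a tree with discretized states, not the authors' implementation. The function name `tree_marginals`, the table layout, and the requirement that every parent index is smaller than its children's indices are assumptions made for this example.

```python
def tree_marginals(unary, parent, pairwise):
    """Exact marginals p(l_i | D) on a tree via sum-product message passing.

    unary[i]    : list of K part likelihoods, one per discrete state of part i
                  (the root prior p(l_0) can be folded into unary[0])
    parent[i]   : index of the parent of part i, -1 for the root (part 0);
                  assumed: parent[i] < i for every non-root part
    pairwise[i] : K x K table, pairwise[i][a][b] = p(l_i = b | l_parent = a)
    """
    n, k = len(unary), len(unary[0])
    children = [[] for _ in range(n)]
    for i in range(1, n):
        children[parent[i]].append(i)

    def product(vectors):
        out = [1.0] * k
        for v in vectors:
            out = [a * b for a, b in zip(out, v)]
        return out

    # Upward pass (leaves to root): up[i] is the message from part i to its
    # parent, indexed by the parent's state.
    up = [None] * n
    for i in reversed(range(1, n)):
        belief = product([unary[i]] + [up[c] for c in children[i]])
        up[i] = [sum(pairwise[i][a][s] * belief[s] for s in range(k))
                 for a in range(k)]

    # Downward pass (root to leaves): down[c] is the message from the parent
    # of c to c, indexed by the state of c.
    down = [[1.0] * k for _ in range(n)]
    for i in range(n):
        for c in children[i]:
            rest = product([unary[i], down[i]]
                           + [up[s] for s in children[i] if s != c])
            down[c] = [sum(rest[a] * pairwise[c][a][s] for a in range(k))
                       for s in range(k)]

    # Combine local evidence with all incoming messages and normalize.
    marginals = []
    for i in range(n):
        m = product([unary[i], down[i]] + [up[c] for c in children[i]])
        z = sum(m)
        marginals.append([v / z for v in m])
    return marginals
```

Each message costs O(K²) in this naive form, so the whole pass is linear in the number of parts; the slide's Gaussian pairwise terms are what make faster message computation possible in practice.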

SLIDE 8

Can we make the pictorial structures model effective for these tasks?

So... what are the right components?

SLIDE 9

Model Components

[Diagram with two stages. Appearance Model: local features and AdaBoost produce the likelihood of part 1 through the likelihood of part N, each over orientation 1 through orientation K. Prior and Inference: part posteriors and the estimated pose.]

SLIDE 11

Likelihood Model

  • Build on recent advances in object detection:
    • state-of-the-art image descriptor: Shape Context [Belongie et al., PAMI’02; Mikolajczyk&Schmid, PAMI’05]
    • dense representation
    • discriminative model: AdaBoost classifier for each body part
  • Shape Context: 96 dimensions (4 angular, 3 radial, 8 gradient orientations)
  • Feature Vector: concatenate the descriptors inside part bounding box
    • head: 4032 dimensions
    • torso: 8448 dimensions
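
The dimensions quoted above can be checked with a few lines of arithmetic. This sketch only uses the numbers on the slide; the number of descriptors per bounding box is derived here by division and is not stated in the talk, and the helper name `descriptors_in_box` is made up for the example.

```python
ANGULAR_BINS = 4
RADIAL_BINS = 3
GRADIENT_ORIENTATIONS = 8

# One Shape Context descriptor: 4 * 3 * 8 = 96 dimensions.
DESCRIPTOR_DIM = ANGULAR_BINS * RADIAL_BINS * GRADIENT_ORIENTATIONS


def descriptors_in_box(feature_dim, descriptor_dim=DESCRIPTOR_DIM):
    """Number of concatenated descriptors implied by a feature vector length."""
    count, remainder = divmod(feature_dim, descriptor_dim)
    if remainder:
        raise ValueError("feature length is not a multiple of the descriptor size")
    return count


head_descriptors = descriptors_in_box(4032)   # 4032 / 96 = 42
torso_descriptors = descriptors_in_box(8448)  # 8448 / 96 = 88
```
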
SLIDE 12

Likelihood Model

  • Part likelihood derived from the boosting score:

$$\tilde{p}(d_i \mid l_i) = \max\left( \frac{\sum_t \alpha_{i,t}\, h_t(x(l_i))}{\sum_t \alpha_{i,t}},\ \varepsilon_0 \right)$$

where $l_i$ is the part location, $h_t$ is the decision stump output, $\alpha_{i,t}$ is the decision stump weight, and $\varepsilon_0$ is a small constant to deal with part occlusions.
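
The part likelihood on this slide is a normalized boosting score clipped from below. A minimal sketch, assuming stump outputs in {-1, +1} and an arbitrary illustrative value for the small constant (the talk does not give one); the function name `part_likelihood` is hypothetical.

```python
def part_likelihood(stump_outputs, stump_weights, eps0=1e-4):
    """Pseudo-likelihood of one part at one location.

    stump_outputs : outputs h_t(x(l_i)) of the decision stumps, assumed in {-1, +1}
    stump_weights : AdaBoost weights alpha_{i,t} of the stumps (positive)
    eps0          : small floor for occluded parts; 1e-4 is an assumed value
    """
    score = sum(a * h for a, h in zip(stump_weights, stump_outputs))
    normalized = score / sum(stump_weights)  # lies in [-1, 1]
    return max(normalized, eps0)
```

The floor keeps a weak or negative detector response from driving the posterior of a whole configuration to zero, which is how the constant helps with occluded parts.
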
SLIDE 13

Likelihood Model

[Figure: part likelihood maps for the torso, head, and upper leg of an input image, comparing [Ramanan, NIPS’06] with our part likelihoods.]

SLIDE 17

Kinematic Tree Prior

  • Represent pairwise part relations [Felzenszwalb & Huttenlocher, IJCV’05]

$$p(L) = p(l_0) \prod_{(i,j) \in E} p(l_i \mid l_j)$$

$$p(l_2 \mid l_1) = \mathcal{N}\big(T_{12}(l_2) \mid T_{21}(l_1),\ \Sigma_{12}\big)$$

[Figure: two parts l1 and l2, shown as part locations relative to the joint and as transformed part locations.]
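
The pairwise term above is a Gaussian in a transformed space where both parts predict the position of their shared joint. The sketch below illustrates the idea under simplifying assumptions that are not from the talk: a part state is (x, y, theta) with fixed scale, the transform maps a part to a 2D joint position only, and the covariance is diagonal. All function and parameter names are hypothetical.

```python
import math


def to_joint(part, offset):
    """Map a part state (x, y, theta) to the joint position it predicts.

    offset is the joint location in the part's own coordinate frame.
    """
    x, y, theta = part
    dx, dy = offset
    return (x + dx * math.cos(theta) - dy * math.sin(theta),
            y + dx * math.sin(theta) + dy * math.cos(theta))


def log_pairwise_prior(child, parent, child_offset, parent_offset, variances):
    """log N(T(child) | T(parent), diag(variances)) over the 2D joint position."""
    joint_from_child = to_joint(child, child_offset)
    joint_from_parent = to_joint(parent, parent_offset)
    logp = 0.0
    for a, b, var in zip(joint_from_child, joint_from_parent, variances):
        logp += -0.5 * math.log(2 * math.pi * var) - (a - b) ** 2 / (2 * var)
    return logp


# Torso and upper leg whose predicted joint positions coincide at (0, 10).
torso = (0.0, 0.0, 0.0)
upper_leg = (0.0, 20.0, 0.0)
aligned = log_pairwise_prior(upper_leg, torso, (0.0, -10.0), (0.0, 10.0), (4.0, 4.0))
```

Because the density is Gaussian in the joint coordinates, a configuration is penalized by how far apart the two predicted joint positions are, regardless of where the pair sits in the image.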

SLIDE 18

Kinematic Tree Prior

  • Prior parameters: $\{T_{ij}, \Sigma_{ij}\}$
  • Parameters of the prior are estimated with maximum likelihood

[Figure: the learned kinematic prior, shown as the mean pose and several independent samples. The surviving caption fragment reads: “Figure 2. (left) Kinematic prior learned on the multi-view and”.]
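
For a Gaussian, maximum likelihood estimation reduces to the sample mean and the sample covariance normalized by the number of samples. The sketch below shows that computation on generic vectors; treating the training vectors as relative joint measurements from annotated poses is an assumption for illustration, and `ml_gaussian` is a hypothetical name.

```python
def ml_gaussian(samples):
    """Maximum likelihood mean and covariance of a list of equal-length vectors."""
    n = len(samples)
    d = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    # ML covariance divides by n (not n - 1).
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / n
            for b in range(d)]
           for a in range(d)]
    return mean, cov


mean, cov = ml_gaussian([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
```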

SLIDE 19

Evaluation Scenarios

  1. Human Pose Estimation: “People” dataset [Ramanan, NIPS’06]
  2. Upper-body Pose Estimation: “Buffy” dataset [Ferrari et al., CVPR’08]
  3. Pedestrian Detection: “TUD Pedestrians” dataset [Andriluka et al., CVPR’08]

SLIDE 21

Scenario 1: Qualitative Results

[Figure: example poses estimated by our model next to those of [Ramanan, NIPS’06]. Each example carries a pair of scores out of 10: (a) 8/10, 0/10; (d) 7/10, 3/10; (g) 8/10, 7/10; (i) 8/10, 4/10; (k) 7/10, 3/10; (l) 6/10, 3/10.]

SLIDE 22

Scenario 1: Quantitative Results

Method                                                Torso  Upper legs  Lower legs  Upper arm  Forearm  Head  Total
[Ramanan, NIPS’06] 2nd parse                             52          30          29         17       13    37     27
Our inference, edge features from [Ramanan, NIPS’06]     63          48          37         26       20    45     37
Our part detectors (SC)                                  29          12          18          3        4    40     14
Our prior, our part detectors (SC)                       81          63          55         47       31    75     55
Our prior, our part detectors (SIFT)                     78          58          54         44       31    66     52

SC = Shape Context

SLIDE 28

Estimated upper-body poses

SLIDE 29

Quantitative Results

Method                    Torso  Upper arm  Lower arm  Head  Total
[Ferrari et al. CVPR’08]      -          -          -     -   57.9
detectors only             18.9        6.8        3.1  47.2   14.3
full model                 90.7       79.3       41.2  95.9   71.3

  • generic model
  • prior and appearance learned on the “People” dataset

SLIDE 31

Quantitative Results

Method                        Torso  Upper arm  Lower arm  Head  Total
[Ferrari et al. CVPR’08]          -          -          -     -   57.9
detectors only                 18.9        6.8        3.1  47.2   14.3
full model                     90.7       79.3       41.2  95.9   71.3
[Ferrari et al. CVPR’09]          -          -          -     -   72.2
full model, Buffy pose prior   90.7      81.35       46.5  95.5   73.5

  • specialized upper body prior
  • appearance learned on the “People” dataset

SLIDE 32

Typical Failure Cases

  • Foreshortening
  • Detections on other body parts
  • Part occlusion

SLIDE 34

People Detection: Results

  • Comparison with state of the art in people detection

[Plot: recall versus 1-precision, both axes from 0.0 to 1.0, for four detectors: our model, 8 parts, tree prior; our model, 8 parts, star prior; partISM detector [Andriluka et al., CVPR’08]; HOG (INRIA Training Set).]

[Figure: two panels labeled [Andriluka et al., CVPR’08] and This work.]
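
The detection comparison is reported as recall against 1-precision. As a hedged illustration of how such a curve is typically traced from scored detections (the matching rule that decides whether a detection is a true positive is outside this sketch and not described in the talk), with a hypothetical function name:

```python
def recall_vs_one_minus_precision(detections, num_ground_truth):
    """Trace curve points by sweeping the score threshold from high to low.

    detections       : list of (score, is_true_positive) pairs
    num_ground_truth : number of annotated people in the test set
    Returns a list of (1 - precision, recall) points.
    """
    points = []
    tp = fp = 0
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((fp / (tp + fp), tp / num_ground_truth))
    return points


curve = recall_vs_one_minus_precision(
    [(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
```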

SLIDE 35

Conclusion & Future Work

  • Success of pose estimation by “body part” detection
    • use well understood pose estimation framework (Pictorial Structures)
    • use appropriate representation for kinematic dependencies
    • use state of the art appearance representation (SIFT, SC) and classification (AdaBoost)
  • Next steps:
    • estimate poses in 3D
    • part occlusions
    • appearance constraints between parts

SLIDE 36

Thanks!

  • Acknowledgements:
    • Thanks to Krystian Mikolajczyk for image descriptors code
    • Thanks to Christian Wojek for AdaBoost code and helpful suggestions
    • Thanks to Deva Ramanan and Vittorio Ferrari for making their code and datasets publicly available
    • This work is partially funded by the German Research Foundation (DFG) through GRK 1362.
  • Code and pre-trained models will be available at:
    • http://www.mis.informatik.tu-darmstadt.de/code