Rich representations for learning visual recognition - PowerPoint PPT Presentation


Rich representations for learning visual recognition. Jitendra Malik, University of California at Berkeley.


slide-1
SLIDE 1

Rich representations for learning visual recognition

Jitendra Malik
University of California at Berkeley

slide-2
SLIDE 2

Detection can be very fast

  • On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006)
  • Comparable to synaptic delay in the retina, LGN, V1, V2, V4, IT pathway
  • Doesn’t rule out feedback, but shows that feed-forward only is very powerful
  • Detection and categorization are practically simultaneous (Grill-Spector & Kanwisher, 2005)

slide-3
SLIDE 3

Rolls et al. (2000)

slide-4
SLIDE 4

Some opinions…

  • A hierarchical, mostly feedforward network is the right model; the question is how to train it
  • Unsupervised, sparsity-encouraging techniques are promising for lower layers
  • But so far the success of this approach at the higher stages has not yet been demonstrated

slide-5
SLIDE 5

Insights from child development

  • Trying to learn object recognition from bounding boxes is like trying to learn language from a list of sentences.
  • The development of visual recognition, like language acquisition, benefits from supportive “scaffolding”.
  • Grouping and tracking can play an important role by helping solve the correspondence problem. In a machine vision system, we can “cheat” by supplying keypoint correspondences.

slide-6
SLIDE 6

Detecting and Segmenting People

Where are they? What are they wearing? What are they doing?

Jitendra Malik, UC Berkeley

This is joint work with L. Bourdev, S. Maji and T. Brox.

slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10

Trying to extract stick figures is hard (and unnecessary!)

  • Generalized cylinders (Marr & Nishihara, Binford)
  • Pictorial Structures (Felzenszwalb & Huttenlocher)

slide-11
SLIDE 11

All the wrong limbs…

slide-12
SLIDE 12

High-Level Computer Vision

slide-13
SLIDE 13

High-Level Computer Vision

  • Object Recognition

Image labels: person, person, van, dog

slide-14
SLIDE 14

High-Level Computer Vision

  • Object Recognition
  • Semantic Segmentation

Image labels: person, person, van, dog

slide-15
SLIDE 15

High-Level Computer Vision

  • Object Recognition
  • Semantic Segmentation
  • Pose Estimation

Image labels: facing the camera; in a back view; facing back, head to the right

slide-16
SLIDE 16

High-Level Computer Vision

  • Object Recognition
  • Semantic Segmentation
  • Pose Estimation
  • Action Recognition

Image labels: walking away; talking

slide-17
SLIDE 17

High-Level Computer Vision

  • Object Recognition
  • Semantic Segmentation
  • Pose Estimation
  • Action Recognition
  • Attribute Classification

Image labels: blue GMC van; man with glasses and a coat; elderly white man with a baseball hat; Entlebucher mountain dog

slide-18
SLIDE 18

High-Level Computer Vision

  • Object Recognition
  • Semantic Segmentation
  • Pose Estimation
  • Action Recognition
  • Attribute Classification

Image captions:

  • “A blue GMC van parked, in a back view”
  • “A man with glasses and a coat, facing back, walking away”
  • “An elderly man with a hat and glasses, facing the camera and talking”
  • “An Entlebucher mountain dog sitting in a bag”

slide-19
SLIDE 19

Person Detection is Challenging

  • Occlusion
  • Clothing
  • Articulation
  • No silhouette
  • Accessories
  • Viewpoint
  • Wrinkles

slide-20
SLIDE 20

How can we make the problem harder?

  • Solution: Severely limit the supervision

slide-21
SLIDE 21

The best approach in such a setup?

  • Divide-and-conquer: One global template + five parts
  • Positions and appearance of parts trained jointly (Latent SVM)
  • Mixture of models for various poses (standing, sitting, etc.) [Felzenszwalb et al., PAMI 2010]
  • Parts are not well localized and have large appearance variations

Figure labels: learned part; learned part location penalty; Part 2 fires on left torso… but sometimes on ½ of the head; Part 5 fires on one leg… or both legs

slide-22
SLIDE 22

Radical idea: What if, instead, we try to make the problem easier?

Keypoint labels: Nose, Right Shoulder, Left Shoulder, Right Elbow, Left Elbow

[Bourdev and Malik, ICCV 2009]

slide-23
SLIDE 23

Can we build upon the success of faces and pedestrians?

  • Both do template matching
  • Capture salient and common patterns
  • Are these the only two salient & common patterns?
  • But how are we going to create the training set?

slide-24
SLIDE 24

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-25
SLIDE 25

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-26
SLIDE 26

Examples of poselets

Patches are often far visually, but they are close semantically

slide-27
SLIDE 27

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-28
SLIDE 28

How do we train a poselet for a given pose configuration?

slide-29
SLIDE 29

Finding Correspondences

Given part of a human pose, how do we find a similar pose configuration in the training set?

slide-30
SLIDE 30

Finding Correspondences

Keypoint labels: Left Shoulder, Left Hip

We use keypoints to annotate the joints, eyes, nose, etc. of people

slide-31
SLIDE 31

Finding Correspondences

Residual Error
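
The slide does not say how the residual error is computed. As a minimal sketch, assume it is the normalized least-squares distance between two sets of 2D keypoints after the best similarity transform (translation, rotation, scale) has been applied. The function name `residual_error` and the normalization are assumptions made for illustration, not the authors' definition.

```python
def residual_error(seed, other):
    """Normalized least-squares residual after aligning `other` to `seed`.

    Both arguments are lists of (x, y) keypoints in the same order.
    Assumption: alignment is a 2D similarity transform, solved in closed
    form by treating points as complex numbers.
    """
    z = [complex(x, y) for x, y in other]
    w = [complex(x, y) for x, y in seed]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    zc = [p - z_mean for p in z]          # centering removes translation
    wc = [p - w_mean for p in w]
    # Complex scalar a encodes rotation and scale: minimizes sum |wc - a*zc|^2.
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / sum(abs(p) ** 2 for p in zc)
    err = sum(abs(q - a * p) ** 2 for p, q in zip(zc, wc))
    return err / sum(abs(q) ** 2 for q in wc)


seed = [(0, 0), (1, 0), (0, 2)]
# Same configuration rotated by 90 degrees, scaled by 2, shifted by (3, 4).
same_pose = [(3, 4), (3, 6), (-1, 4)]
# A different configuration: three collinear points.
other_pose = [(0, 0), (1, 0), (2, 0)]

print(residual_error(seed, same_pose))   # close to 0
print(residual_error(seed, other_pose))  # clearly larger
```

A low residual means the two people share the pose configuration inside the patch, even if the patches look different.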

slide-32
SLIDE 32

Training poselet classifiers

Residual Error: 0.15, 0.20, 0.10, 0.35, 0.15, 0.85

  1. Given a seed patch
  2. Find the closest patch for every other person
  3. Sort them by residual error
  4. Threshold them

slide-33
SLIDE 33

Training poselet classifiers

  1. Given a seed patch
  2. Find the closest patch for every other person
  3. Sort them by residual error
  4. Threshold them
  5. Use them as positive training examples for a classifier (HOG features, linear SVM)
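
Steps 3 and 4 can be sketched as follows, using the six residual errors shown on the previous slide. The person identifiers and the threshold of 0.5 are hypothetical; the slides do not state the threshold. The HOG and linear SVM training of step 5 is not shown.

```python
def select_positive_examples(residuals, threshold):
    """Sort candidate patches by residual error and keep those under the threshold."""
    ranked = sorted(residuals.items(), key=lambda item: item[1])
    return [person for person, error in ranked if error <= threshold]


# Residual errors from the slide; the keys are made-up identifiers.
residuals = {"p1": 0.15, "p2": 0.20, "p3": 0.10, "p4": 0.35, "p5": 0.15, "p6": 0.85}

positives = select_positive_examples(residuals, threshold=0.5)
print(positives)  # the patch with residual 0.85 is rejected
```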

slide-34
SLIDE 34

For a trained poselet we fit:

  • Expected person bounds
  • Foreground probability mask
  • Keypoint predictions (e.g. nose, right elbow, left knee)

slide-35
SLIDE 35

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-36
SLIDE 36

How do we find poselets?

  • Choose thousands of random windows, generate poselet candidates, train linear SVMs
  • Select a small set of poselets that are:
      • Individually effective
      • Complementary

slide-37
SLIDE 37

Selecting a small set of complementary poselets
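
The slide does not spell out the selection procedure. One plausible reading of "individually effective and complementary" is a greedy coverage scheme: repeatedly pick the candidate that detects the most training people not yet covered by the poselets already chosen. The sketch below assumes that reading; `greedy_select`, the candidate names, and the coverage sets are all hypothetical.

```python
def greedy_select(coverage, k):
    """Greedily pick up to k candidates that cover the most uncovered people.

    `coverage` maps a candidate poselet name to the set of training
    person ids it detects correctly.
    """
    remaining = dict(coverage)
    selected, covered = [], set()
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds anything new
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


coverage = {
    "frontal_face": {1, 2, 3, 4},
    "face_variant": {1, 2, 3},   # effective, but redundant with frontal_face
    "legs": {5, 6},
    "back_torso": {4, 7},
}
chosen = greedy_select(coverage, k=3)
print(chosen)
```

Here `face_variant` is never chosen: it is individually strong but adds no people beyond those `frontal_face` already covers.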

slide-38
SLIDE 38

Poselet Activations → Detections & Segmentations

slide-39
SLIDE 39

Creating Poselet Activation Vector

  • Step 1: Detect poselets in the image

slide-40
SLIDE 40

Creating Poselet Activation Vector

  • Step 2: Enhance their scores using context

slide-41
SLIDE 41

Creating Poselet Activation Vector

Two poselets refer to the same person if their keypoint predictions are consistent.

Figure labels: Not consistent; Consistent

  • Step 3: Cluster poselets of the same person together

slide-42
SLIDE 42

Creating Poselet Activation Vector

Figure: poselet activation vector (one score per poselet type) for Cluster1, Cluster2 and Cluster3; example scores: 0.32, 0.11, 0.95, 0.77, 0.08, 0.72, 0.41

  • Step 4: Collect the scores of all poselets in a cluster into a poselet activation vector

slide-43
SLIDE 43

Poselet Activation Vector

Figure: poselet activation vectors (one score per poselet type) for Cluster1, Cluster2 and Cluster3; example scores: 0.32, 0.11, 0.95, 0.77, 0.08, 0.72, 0.41

Figure labels: Front facing; Mostly facing right; Likely false positive

  • PAV provides a distributed representation of the pose and is the basis for poselet-based tasks
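
A minimal sketch of step 4: each cluster of activations becomes a fixed-length vector with one entry per poselet type. Taking the maximum when a type fires more than once, and using zero for types that did not fire, are assumptions made here for illustration; the type indices are hypothetical.

```python
def poselet_activation_vector(cluster, num_poselet_types):
    """Build a PAV from a cluster of (poselet_type, score) activations."""
    pav = [0.0] * num_poselet_types
    for poselet_type, score in cluster:
        # Assumption: keep the strongest activation of each type.
        pav[poselet_type] = max(pav[poselet_type], score)
    return pav


# Activations of one person hypothesis: (poselet type index, score).
cluster = [(0, 0.32), (3, 0.95), (3, 0.77), (5, 0.41)]
pav = poselet_activation_vector(cluster, num_poselet_types=6)
print(pav)
```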

slide-44
SLIDE 44
slide-45
SLIDE 45

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-46
SLIDE 46

Problem: The patch may have weak signal

  • Front and back look similar: a front face poselet can disambiguate them
  • Left or right leg? Location of pedestrian poselet can disambiguate
  • Face false positive: lack of head-and-shoulders poselet suggests a false positive

Solution: Enhance the poselet score using other consistent poselets

slide-47
SLIDE 47

Using context

  1. For each poselet activation on the training set:
      A. Find its label: True positive, False positive, Unknown
      B. Construct a feature vector from activations of other consistent poselets
  2. Train a linear classifier for each poselet
  3. Convert score to probability via logistic regression
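
The rescoring at test time can be sketched as below: build the context feature vector for one activation, apply a linear classifier, and squash the score with a logistic function. The weights, bias, the `x` coordinate, and the `near` consistency test are hypothetical stand-ins; in the talk, consistency is defined by keypoint predictions and the parameters are learned from the labeled training activations.

```python
import math


def context_features(target, activations, num_types, is_consistent):
    """Max score per poselet type among other activations consistent with `target`."""
    features = [0.0] * num_types
    for other in activations:
        if other is not target and is_consistent(target, other):
            t = other["type"]
            features[t] = max(features[t], other["score"])
    return features


def context_probability(features, weights, bias):
    """Linear classifier score mapped to a probability by a logistic function."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def near(a, b):
    # Hypothetical consistency test: horizontal positions within 20 pixels.
    return abs(a["x"] - b["x"]) < 20


acts = [
    {"type": 0, "score": 0.4, "x": 10},
    {"type": 1, "score": 0.9, "x": 12},
    {"type": 2, "score": 0.7, "x": 90},
]
feats = context_features(acts[0], acts, 3, near)
prob = context_probability(feats, weights=[0.0, 2.0, 1.0], bias=-1.0)
print(feats, round(prob, 3))
```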

slide-48
SLIDE 48

The effect of using context

ROC curves for three random poselets

Green: Context
Red: No context

slide-49
SLIDE 49

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-50
SLIDE 50

Object Detection with Poselets

  1. Detect poselets in the image
  2. Enhance their scores via context
  3. Cluster consistent ones into object hypotheses
      • The most salient poselet creates the first hypothesis
      • If a poselet is consistent with an existing hypothesis, it gets assigned to it
      • Otherwise it starts a new hypothesis
  4. Predict bounding box and score of the cluster

slide-51
SLIDE 51

Object Detection with Poselets

  1. Detect poselets in the image
  2. Enhance their scores via context
  3. Cluster consistent ones into object hypotheses
      • The most salient poselet creates the first hypothesis
      • If a poselet is consistent with an existing hypothesis, it gets assigned to it
      • Otherwise it starts a new hypothesis
  4. Predict bounding box and score of the cluster
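
The greedy clustering of step 3 can be sketched as follows. Activations are visited from most to least salient; each one joins the first hypothesis it is consistent with, or starts a new one. Comparing against the hypothesis seed only, the predicted `torso` position, and the 15-pixel tolerance are simplifying assumptions; the talk defines consistency through keypoint predictions.

```python
def cluster_activations(activations, is_consistent):
    """Greedily group poselet activations into object hypotheses."""
    hypotheses = []
    for act in sorted(activations, key=lambda a: a["score"], reverse=True):
        for hyp in hypotheses:
            if is_consistent(act, hyp[0]):  # compare with the hypothesis seed
                hyp.append(act)
                break
        else:
            hypotheses.append([act])        # nothing matched: new hypothesis
    return hypotheses


def same_person(a, b, max_dist=15.0):
    # Hypothetical test: predicted torso centers lie close together.
    (ax, ay), (bx, by) = a["torso"], b["torso"]
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= max_dist


activations = [
    {"name": "frontal_face", "score": 0.95, "torso": (100, 120)},
    {"name": "legs", "score": 0.60, "torso": (104, 125)},
    {"name": "profile", "score": 0.80, "torso": (300, 110)},
]
hypotheses = cluster_activations(activations, same_person)
for hyp in hypotheses:
    print([a["name"] for a in hyp])
```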

slide-52
SLIDE 52

Results

  • Best results on all PASCAL person detection competitions

          Poselets    Felzenszwalb et al.
  2010    48.5%       47.5%
  2009    48.3%       47.4%
  2008    54.1%       43.1%

slide-53
SLIDE 53

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-54
SLIDE 54

Segmenting people

slide-55
SLIDE 55
slide-56
SLIDE 56
slide-57
SLIDE 57
slide-58
SLIDE 58

Align poselet activations (1 of 3)

  • Threshold the mask of each poselet
  • Make boundary map of the image (Arbelaez et al.)
  • Align the poselet activations using this non-rigid deformation:

slide-59
SLIDE 59

Variational smoothing (2 of 3)

  • The initial object mask is smoothed by taking into account the predicted object boundary:

Smoothed object mask

slide-60
SLIDE 60

Refine via self-similarity (3 of 3)

Before refinement / After refinement

slide-61
SLIDE 61

Multi-object segmentation

Person and horse

slide-62
SLIDE 62

Multi-object segmentation

Person and bicycle

slide-63
SLIDE 63

Some segmentation results…

slide-64
SLIDE 64

Categories we are best in

slide-65
SLIDE 65

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-66
SLIDE 66

Male or female?

slide-67
SLIDE 67

How do we train attribute classifiers “in the wild”?

  • Effective prediction requires inferring the pose and camera view
  • Pose reconstruction is itself a hard problem, but we don’t need a perfect solution
  • We train attribute classifiers for each poselet
  • Poselets implicitly decompose the pose

slide-68
SLIDE 68

A gender classifier per poselet is much easier to train

slide-69
SLIDE 69

Poselets: general-purpose pose decomposition engine. Can be used any time separating pose from appearance is important.

Appearance is key for: / Pose is key for:

  • Attribute classification
  • Pose reconstruction
  • Action recognition

slide-70
SLIDE 70

Attribute Classification Overview

Given a test image → Poselet Activations

slide-71
SLIDE 71

Features

  • Pyramid HOG
  • LAB histogram
  • Skin features
      • Hands-skin
      • Legs-skin

Figure labels: Poselet patch; Skin mask; Arms mask; B .* C

Poselet Activations → Features
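
The figure label `B .* C` is an element-wise product of two masks. A minimal sketch, assuming `B` is the skin mask and `C` is the arms mask (the slide text does not confirm which letter labels which panel), with made-up 2×2 mask values:

```python
def elementwise_product(a, b):
    """Element-wise product of two equally sized 2D masks (lists of rows)."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical masks over a tiny 2x2 poselet patch.
skin_mask = [[0.9, 0.1],
             [0.8, 0.0]]   # per-pixel skin likelihood
arms_mask = [[1, 0],
             [1, 1]]       # 1 where the poselet expects an arm

hands_skin = elementwise_product(skin_mask, arms_mask)
print(hands_skin)
```

The product keeps skin evidence only where the poselet expects arms, which is one way a "hands-skin" feature could be formed.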

slide-72
SLIDE 72

Attribute Classification Overview

Poselet Activations → Features → Poselet-level Attribute Classifiers

slide-73
SLIDE 73

Attribute Classification Overview

Poselet Activations → Features → Poselet-level Attribute Classifiers → Person-level Attribute Classifiers

slide-74
SLIDE 74

Attribute Classification Overview

Poselet Activations → Features → Poselet-level Attribute Classifiers → Person-level Attribute Classifiers → Context-level Attribute Classifiers

slide-75
SLIDE 75

Is male

slide-76
SLIDE 76

Has long hair

slide-77
SLIDE 77

Wears a hat

slide-78
SLIDE 78

Wears glasses

slide-79
SLIDE 79

Wears long pants

slide-80
SLIDE 80

Wears long sleeves

slide-81
SLIDE 81
slide-82
SLIDE 82

Results – Average Precision

slide-83
SLIDE 83

Random 2% of the test set

slide-84
SLIDE 84

Agenda

  • Poselets
  • Training a poselet
  • Selecting a good set of poselets
  • Improving poselets with context
  • Detection with poselets
  • Segmentation
  • Attributes
  • Action Recognition

slide-85
SLIDE 85

Actions in still images…

  • have characteristic:
      • pose and appearance
      • interaction with objects and agents

slide-86
SLIDE 86

PASCAL VOC 2010 Action Classification

  • Action Classification: Predicting the action(s) being performed by a person in a still image. Bounding boxes are given.

Relatively small training data/classes

slide-87
SLIDE 87

Poselet selection and training

  • Restrict training examples to ones from the category

Figure labels (category: takingphoto): Examples from all actions; Examples from takingphoto

slide-88
SLIDE 88

Some discriminative poselets

slide-89
SLIDE 89

Spatial model of person-object interaction

slide-90
SLIDE 90

Action classification

  • One vs. all classifier
  • Image context
  • Inputs:
      • action context (9 dim)
      • bbox (4 dim)
      • poselet activation vector (~500 dim)
      • object activation vector (4 dim)
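
As a rough sketch of how these inputs could feed a one-vs-all classifier: concatenate the four parts into one feature vector and pick the action whose linear classifier scores highest. The concatenation order, the exact PAV length of 500 (the slide says roughly 500), the action names, and all weights are assumptions for illustration only.

```python
def build_feature(action_context, bbox, pav, object_activations):
    """Concatenate the four inputs named on the slide into one vector."""
    if len(action_context) != 9 or len(bbox) != 4 or len(object_activations) != 4:
        raise ValueError("unexpected input dimensions")
    return list(action_context) + list(bbox) + list(pav) + list(object_activations)


def predict_action(feature, classifiers):
    """One-vs-all: score every action's linear classifier, return the best."""
    scores = {
        action: sum(w * x for w, x in zip(weights, feature)) + bias
        for action, (weights, bias) in classifiers.items()
    }
    return max(scores, key=scores.get)


pav = [0.0] * 500
pav[7] = 0.9  # hypothetical: poselet type 7 fired strongly
feature = build_feature([0.0] * 9, [0.1, 0.2, 0.5, 0.8], pav, [0.0] * 4)

# Hypothetical classifiers: each is (weights, bias) over the 517-dim feature.
w_photo = [0.0] * 517
w_photo[9 + 4 + 7] = 2.0  # responds to poselet type 7
w_run = [0.0] * 517
classifiers = {"takingphoto": (w_photo, -0.5), "running": (w_run, 0.0)}

print(len(feature), predict_action(feature, classifiers))
```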

slide-91
SLIDE 91

Results on static action classification

slide-92
SLIDE 92

Feed-forward network

  • High-level questions: “is this a woman?” “is she running?”
  • Local pattern matching: “left half of head and shoulder”
  • Oriented gradients

slide-93
SLIDE 93

Feed-forward network

  • Lots of context, view independent
  • No context, view specific

slide-94
SLIDE 94

Poselets: general-purpose pose decomposition engine. Can be used any time separating pose from appearance is important.

Appearance is key for: / Pose is key for:

  • Attribute classification
  • Pose reconstruction
  • Action recognition