Deciphering the Face
Aleix M. Martinez
Computational Biology and Cognitive Science Lab
aleix@ece.osu.edu
[Diagram: the human face at the center of Human-Computer Interaction, Politics, Art, Sign Language, Cognitive Science, and Computer Vision]
Models of Face Perception
- Features: Shape vs. texture.
…
- 2D vs. 3D
- Form of the computational space: Continuous vs. Categorical
What we are going to show
- What is the form of the computational space in human face perception? Hybrid approach: Linear combination of continuous representations of categories.
  c1 · (category 1) + c2 · (category 2) + … + cn · (category n)
- What are the dimensions? Mostly configural.
- In computer vision we need precise, detailed detection of faces and facial features.
Identity
Same or different?
Identity, expression, gender, etc.
Dimensions of the Face Space
Same or different?
Configural processing
Form of the Computational Face Space
- Exemplar-based model
- Norm-based model
[Figure: exemplar cells, …, mid-level vision, low-level vision]
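As a rough illustration of the two candidate models, the sketch below contrasts them on toy feature vectors. Everything in it is an assumption made for illustration: the function names, the use of Euclidean distance, and the two-dimensional feature vectors are not taken from the slides. The exemplar-based function labels a face by its nearest stored exemplar, while the norm-based function codes a face by how far, and in which direction, it deviates from the mean (norm) face.

```python
import math


def exemplar_based_label(face, exemplars):
    """Return the label of the stored exemplar closest to `face`.

    `exemplars` is a list of (feature_vector, label) pairs (hypothetical format).
    """
    nearest = min(exemplars, key=lambda pair: math.dist(face, pair[0]))
    return nearest[1]


def norm_based_code(face, mean_face):
    """Code `face` relative to the norm: (distance, unit direction).

    The distance says how far the face is from the mean face; the direction
    says along which dimension of the face space it deviates.
    """
    diff = [f - m for f, m in zip(face, mean_face)]
    length = math.hypot(*diff)
    if length == 0.0:
        return 0.0, [0.0] * len(diff)
    return length, [d / length for d in diff]


# Toy data: two stored exemplars and a mean face at the origin (assumptions).
exemplars = [([0.0, 0.0], "neutral"), ([10.0, 0.0], "angry")]
label = exemplar_based_label([8.0, 1.0], exemplars)
distance, direction = norm_based_code([3.0, 4.0], [0.0, 0.0])
```

In this toy setting the exemplar-based model only needs stored examples, whereas the norm-based model needs an estimate of the mean face but yields a graded distance that can be read as intensity.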
Facial Expressions of Emotion
Muscle Positions Model
- Global shape (bone structure) determines identity – configural.
- But ONLY muscles are responsible for expression, interaction …
Configural Processing
Emotion perception in emotionless faces
[Figure: face images labeled Neutral, Angry, and Sad]
Neth & Martinez, JOV, 2009.
Stimuli
25% 50% 75% 100%
Neth & Martinez, JOV, 2009.
Experiment
Less, same, more.
Configural Processing
[Plot: Sad. Responses (Less, Same, More); vertical axis from 10 to 90; horizontal axis from -100% to 100% in steps of 25%; several conditions are marked with asterisks.]
Neth & Martinez, JOV, 2009.
Configural Processing
[Plot: Angry. Responses (Less, Same, More); vertical axis from 10 to 90; horizontal axis from -100% to 100% in steps of 25%; several conditions are marked with asterisks.]
Neth & Martinez, JOV, 2009.
Norm-based Face Space
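[Figure: multidimensional face space centered on the MEAN face, with a Sadness direction and an Anger direction marked at 25%, 50%, 75%, and 100%. Regions are labeled "+ density" and "- density"; the figure also carries the labels "Easier" and "More difficult".]
Neth & Martinez, JOV, 2009.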
Configural Processing
Neth & Martinez, JOV, 2009.
Computational Space
Neth & Martinez, Vision Research, 2010
Computational Space
Thinner face   Wider face
Neth & Martinez, Vision Research, 2010
American Gothic Illusion
Neth & Martinez, Vision Research, 2010
Why Configural Features?
15 x 10 pixels
Why Configural cues?
sad neutral angry
Neth & Martinez, Vision Research, 2010; Du & Martinez, 2011
Proposed Hybrid Model: Recognizing other emotion labels
c1 · (category 1) + c2 · (category 2) + … + cn · (category n)
Happily surprised   Angrily surprised
Martinez, CVPR, 2011
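The hybrid model describes a face as a linear combination of category representations, with coefficients c1 … cn. A minimal sketch of that idea follows. The feature vectors, the coefficient values, the threshold, and both function names are hypothetical and chosen only to make the arithmetic easy to follow; the slides do not specify how the coefficients are estimated.

```python
def linear_combination(coefficients, category_representations):
    """Return sum_i c_i * r_i, where r_i is the representation of category i."""
    names = list(coefficients)
    dim = len(category_representations[names[0]])
    combined = [0.0] * dim
    for name in names:
        c = coefficients[name]
        for k, value in enumerate(category_representations[name]):
            combined[k] += c * value
    return combined


def active_categories(coefficients, threshold=0.25):
    """Categories whose coefficient reaches the (hypothetical) threshold,
    strongest first."""
    active = [name for name, c in coefficients.items() if c >= threshold]
    return sorted(active, key=lambda name: coefficients[name], reverse=True)


# Toy category representations in a 2-D feature space (assumptions).
category_representations = {
    "happy": [1.0, 0.0],
    "surprised": [0.0, 2.0],
    "angry": [-1.0, 0.0],
}
# A compound expression: equal parts happy and surprised, no anger.
coefficients = {"happy": 0.5, "surprised": 0.5, "angry": 0.0}

combined = linear_combination(coefficients, category_representations)
labels = active_categories(coefficients)
```

Under these assumptions, a label such as "happily surprised" corresponds to a coefficient vector with two sizeable entries rather than to a new, separately learned category.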
Configural Processing = Precise detection of facial features
3,930 images; 4.2 pixels error (1.5%)
Ding & Martinez, PAMI, 2010
Face Detection
Features VS context
Observation: Most detections are near the correct location – they are not incorrect, they are imprecise.
Key idea: Use context information to train where not to detect faces and facial features.
Ding & Martinez, CVPR, 2008; PAMI, 2010
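One way to picture the "features vs. context" idea is to build the negative training set from windows that lie close to, but not on, the annotated feature location, so that a classifier learns where not to fire. The sketch below only generates such window centers. It is an illustration under stated assumptions, not the procedure from the cited papers: the function name, the square ring of offsets, and all parameter values are hypothetical.

```python
def context_sample_centers(cx, cy, min_offset, max_offset, step):
    """Return window centers around (cx, cy) to use as 'context' negatives.

    Centers are taken on a grid of spacing `step` and kept only if they are
    at least `min_offset` pixels away from the true location along x or y
    (Chebyshev distance), and at most `max_offset` pixels away.
    """
    centers = []
    for dx in range(-max_offset, max_offset + 1, step):
        for dy in range(-max_offset, max_offset + 1, step):
            if max(abs(dx), abs(dy)) >= min_offset:
                centers.append((cx + dx, cy + dy))
    return centers


# Hypothetical annotated eye center at (50, 40); the positive sample is the
# window centered there, the negatives are the surrounding context windows.
true_center = (50, 40)
negatives = context_sample_centers(50, 40, min_offset=2, max_offset=4, step=2)
```

With these settings the 5 x 5 grid of offsets yields 24 context centers, since only the true location itself is excluded.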
Subclass Discriminant Analysis
Between-subclass scatter matrix:
$$\Sigma_B = \sum_{i=1}^{C} \sum_{j=1}^{H_i} p_{ij} \, (\mu_{ij} - \mu)(\mu_{ij} - \mu)^T .$$
Basis vectors:
$$\Sigma_B V = \Sigma_X V \Lambda .$$
How many subclasses (H): Minimize the conflict, K.
Zhu & Martinez, PAMI, 2006
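To make the between-subclass scatter matrix concrete, the sketch below computes it for a toy data set in which the division of each class into subclasses is already given. It assumes that the weight of subclass j of class i is its share of the samples, n_ij / n, and that the scatter is measured around the global mean; the data layout and function names are hypothetical. The code is kept dependency-free for readability, and the generalized eigenvalue step that yields the basis vectors is left out (a linear-algebra library would normally handle both).

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def between_subclass_scatter(classes):
    """Between-subclass scatter matrix as a list of lists.

    `classes` is a list of classes; each class is a list of subclasses;
    each subclass is a list of sample vectors (hypothetical layout).
    """
    samples = [x for cls in classes for sub in cls for x in sub]
    n = len(samples)
    mu = mean_vector(samples)            # global mean
    d = len(mu)
    sigma_b = [[0.0] * d for _ in range(d)]
    for cls in classes:
        for sub in cls:
            p_ij = len(sub) / n          # assumed weight: fraction of samples
            mu_ij = mean_vector(sub)     # subclass mean
            diff = [a - b for a, b in zip(mu_ij, mu)]
            for r in range(d):
                for c in range(d):
                    sigma_b[r][c] += p_ij * diff[r] * diff[c]
    return sigma_b


# Toy data: class 0 has two subclasses, class 1 has one.
classes = [
    [[[-2.0, 0.0]], [[2.0, 0.0]]],
    [[[0.0, 1.0], [0.0, -1.0]]],
]
sigma_b = between_subclass_scatter(classes)
```

In this example the two subclasses of class 0 sit on opposite sides of the global mean, so all of the between-subclass scatter falls on the first dimension.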
Precise Detailed Detection
Error: 6.2 pixels (2%) vs. Manual: 4.2 pixels (1.5%)
Ding & Martinez, CVPR, 2008; PAMI, 2010
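Errors of this kind are typically reported as the average distance between detected and manually marked feature points, in pixels and as a percentage of a reference size. The sketch below shows one such computation. The slides do not say which reference size the percentages use, so the `face_size` argument, like the function names and the toy coordinates, is an assumption.

```python
import math


def mean_pixel_error(detected, manual):
    """Mean Euclidean distance (in pixels) between paired landmark lists."""
    distances = [math.dist(p, q) for p, q in zip(detected, manual)]
    return sum(distances) / len(distances)


def percent_error(mean_error, face_size):
    """Mean error as a percentage of a reference size in pixels (assumed)."""
    return 100.0 * mean_error / face_size


# Toy landmarks: one detection is exact, the other is off by 5 pixels.
detected = [(0.0, 0.0), (3.0, 4.0)]
manual = [(0.0, 0.0), (0.0, 0.0)]

error_px = mean_pixel_error(detected, manual)
error_pct = percent_error(error_px, face_size=250.0)
```

Comparing the detector's error with the error between human annotators, as the slide does, gives a sense of how close automatic detection is to the precision of manual marking.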
Detection + non-rigid SfM
Gotardo & Martinez, PAMI, 2011; Gotardo & Martinez, CVPR, 2011.
Take Home Messages
- What is the form of the computational space in human face perception? Linear combination of known categories.
  c1 · (category 1) + c2 · (category 2) + … + cn · (category n)
- What are the dimensions? Mostly configural.
- Precise detection of facial features.
CBCSL
Paulo Gotardo, Shichuan Du, Don Neth, Liya Ding, Onur Hamsici, Samuel Rivera, Fabian Benitez, Hongjun Jia, Di You.
National Institutes of Health
National Science Foundation