SLIDE 1

A Model of the Development of the Fusiform Face Area

Garrison W. Cottrell Gary's Unbelievable Research Unit (GURU) Computer Science and Engineering Department Institute for Neural Computation UCSD Collaborators, Past & Present: Ralph Adolphs, Luke Barrington, Serge Belongie, Kristin Branson, Tom Busey, Andy Calder, Eric Christiansen, Matthew Dailey, Piotr Dollar, Michael Fleming, AfmZakaria Haque, Janet Hsiao, Carrie Joyce, Brenden Lake, Kang Lee, Tim Marks, Joe McCleery, Janet Metcalfe, Jonathan Nelson, Nam Nguyen, Curt Padgett, Angelina Saldivar, Honghao Shan, Maki Sugimoto, Matt Tong, Brian Tran, Danke Xie, Keiji Yamada, Lingyun Zhang
SLIDE 2

Why model?

  • Models rush in where theories fear to tread.
  • Models can be manipulated in ways people cannot.
  • Models can be analyzed in ways people cannot.
SLIDE 3

Models rush in where theories fear to tread

  • Theories are high level descriptions of the processes underlying
behavior.
  • They are often not explicit about the processes involved.
  • They are difficult to reason about if no mechanisms are explicit -- they may
be too high level to make explicit predictions.
  • Theory formation itself is difficult.
  • Using machine learning techniques, one can often build a working model of a task for which we have no theories or algorithms (e.g., expression recognition).
  • A working model provides an “intuition pump” for how things might
work, especially if they are “neurally plausible” (e.g., development of face processing - Dailey and Cottrell).
  • A working model may make unexpected predictions (e.g., the
Interactive Activation Model and SLNT).
SLIDE 4

Models can be manipulated in ways people cannot

  • We can see the effects of variations in cortical architecture (e.g., split
(hemispheric) vs. non-split models (Shillcock and Monaghan word perception model)).
  • We can see the effects of variations in processing resources (e.g.,
variations in number of hidden units in Plaut et al. models).
  • We can see the effects of variations in environment (e.g., what if our
parents were cans, cups or books instead of humans? I.e., is there something special about face expertise versus visual expertise in general? (Sugimoto and Cottrell, Joyce and Cottrell)).
  • We can see variations in behavior due to different kinds of brain
damage within a single “brain” (e.g. Juola and Plunkett, Hinton and Shallice).
SLIDE 5

Models can be analyzed in ways people cannot

In the following, I specifically refer to neural network models.
  • We can do single unit recordings.
  • We can selectively ablate and restore parts of the network, even down
to the single unit level, to assess the contribution to processing.
  • We can measure the individual connections -- e.g., the receptive and
projective fields of a unit.
  • We can measure responses at different layers of processing (e.g., which level accounts for a particular judgment: perceptual, object, or categorization? (Dailey et al., J Cog Neuro 2002))
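Ablation of this sort is easy to sketch. The toy network below is hypothetical (random weights, not the model from this talk); lesioning is simulated by silencing chosen hidden units and comparing outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # input -> hidden weights (toy values)
W2 = rng.normal(size=(3, 8))   # hidden -> output weights

def forward(x, ablate=()):
    """Run the toy network, silencing ("ablating") the listed hidden units."""
    h = np.tanh(W1 @ x)
    h[list(ablate)] = 0.0
    return W2 @ h

x = rng.normal(size=4)
intact = forward(x)
lesioned = forward(x, ablate=[2, 5])
# The difference isolates what units 2 and 5 contributed to this output.
contribution = intact - lesioned
```

The same trick, run unit by unit, gives the single-unit "recordings" and lesion studies described above.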
SLIDE 6

How (I like) to build Cognitive Models

  • I like to be able to relate them to the brain, so “neurally
plausible” models are preferred -- neural nets.
  • The model should be a working model of the actual task,
rather than a cartoon version of it.
  • Of course, the model should nevertheless be simplifying
(i.e. it should be constrained to the essential features of the problem at hand):
  • Do we really need to model the (supposed) translation invariance
and size invariance of biological perception?
  • As far as I can tell, NO!
  • Then, take the model “as is” and fit the experimental data:
0 fitting parameters is preferred over 1, 2, or 3.
SLIDE 7

The other way (I like) to build Cognitive Models

  • Same as above, except:
  • Use them as exploratory models -- in domains where there is little direct data (e.g. no single cell recordings in infants or undergraduates) to suggest what we might find if we could get the data. These can then serve as “intuition pumps.”
  • Examples:
  • Why we might get specialized face processors
  • Why those face processors get recruited for other tasks
SLIDE 8

Outline

  • Review of our model of face and object processing
  • Some insights from modeling:
  • Does a specialized processor for faces need to be innately
specified?
  • Why is there a left-side face bias?
SLIDE 10

The Face Processing System

[Diagram: Pixel (Retina) Level → Gabor Filtering (Perceptual (V1) Level) → PCA (Object (IT) Level) → Neural Net (Category Level): Happy, Sad, Afraid, Angry, Surprised, Disgusted]
SLIDE 11

The Face Processing System

[Diagram: Pixel (Retina) Level → Gabor Filtering (Perceptual (V1) Level) → PCA (Object (IT) Level) → Neural Net (Category Level): Bob, Carol, Ted, Alice]
SLIDE 12

The Face Processing System

[Diagram: Pixel (Retina) Level → Gabor Filtering (Perceptual (V1) Level / Feature level) → PCA (Object (IT) Level) → Neural Net (Category Level): Bob, Carol, Ted, Cup, Can, Book]
SLIDE 13

The Face Processing System

[Diagram: Pixel (Retina) Level → Gabor Filtering (Perceptual (V1) Level) → LSF PCA and HSF PCA (Object (IT) Level) → Neural Net (Category Level): Bob, Carol, Ted, Cup, Can, Book]
SLIDE 14

The Gabor Filter Layer

  • Basic feature: the 2-D Gabor wavelet filter (Daugman, 85):
  • These model the processing in early visual areas
[Diagram: convolution with the Gabor filters → magnitudes → subsample in a 29x36 grid]
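A minimal sketch of such a filter bank in Python/NumPy (the frequencies, orientations, and envelope width below are illustrative choices, not the model's actual parameters): build complex Gabor kernels, convolve, keep only the response magnitudes, then subsample.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma, size):
    """Complex 2-D Gabor: a plane-wave carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated coordinate
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * freq * xr)

def gabor_magnitudes(image, freqs, thetas, sigma=2.0, size=9):
    """Filter at every frequency x orientation; return response magnitudes."""
    H, W = image.shape
    F = np.fft.fft2(image)
    responses = []
    for f in freqs:
        for th in thetas:
            kern = np.zeros((H, W), dtype=complex)
            kern[:size, :size] = gabor_kernel(f, th, sigma, size)
            resp = np.fft.ifft2(F * np.fft.fft2(kern))  # circular convolution
            responses.append(np.abs(resp))              # keep magnitudes only
    return np.stack(responses)                          # (n_filters, H, W)

img = np.random.default_rng(1).random((32, 32))
mags = gabor_magnitudes(img, freqs=[0.1, 0.25], thetas=[0, np.pi / 2])
grid = mags[:, ::4, ::4]    # coarse subsampling, standing in for the 29x36 grid
```

The talk's model uses several orientations at several scales subsampled on a 29x36 grid; the coarse `[::4, ::4]` step here just stands in for that.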
SLIDE 15

How to do PCA with a neural network

(Cottrell, Munro & Zipser, 1987; Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; O’Toole et al. 1991) A self-organizing network that learns whole-object representations (features, Principal Components, Holons, eigenfaces). [Diagram: Input from Perceptual Layer → Holons (Gestalt layer)]
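The idea can be sketched as a linear autoencoder trained by gradient descent: with a linear bottleneck, the hidden units come to span the top principal subspace, so reconstruction error approaches that of PCA. Data, sizes, and learning rate below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data with two dominant directions (so a 2-unit bottleneck suffices).
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 0.3, 0.3, 0.3])
X -= X.mean(axis=0)

k, lr = 2, 0.05
We = rng.normal(scale=0.1, size=(k, 5))    # encoder: input -> hidden
Wd = rng.normal(scale=0.1, size=(5, k))    # decoder: hidden -> output

for _ in range(4000):                      # plain full-batch gradient descent
    Z = X @ We.T                           # hidden ("holon") activations
    E = Z @ Wd.T - X                       # reconstruction error
    gWd = E.T @ Z / len(X)
    gWe = Wd.T @ E.T @ X / len(X)
    Wd -= lr * gWd
    We -= lr * gWe

ae_err = np.mean((X @ We.T @ Wd.T - X) ** 2)

# Compare against exact PCA reconstruction with the top k components.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:k].T @ Vt[:k]
pca_err = np.mean((X @ P - X) ** 2)
```

PCA is the optimum for linear reconstruction, so `pca_err` lower-bounds `ae_err`; after training, the network's error comes close to that bound.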
SLIDE 21

The “Gestalt” Layer: Holons

(Cottrell, Munro & Zipser, 1987; Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; O’Toole et al. 1991) A self-organizing network that learns whole-object representations (features, Principal Components, Holons, eigenfaces). [Diagram: Input from Perceptual Layer → Holons (Gestalt layer)]
SLIDE 22

Holons

  • They act like face cells (Desimone, 1991):
  • Response of single units remains strong despite occlusion (e.g., of the eyes)
  • Response drops off with rotation
  • Some fire to my dog’s face
  • A novel representation: Distributed templates --
  • each unit’s optimal stimulus is a ghostly looking face (template-
like),
  • but many units participate in the representation of a single face
(distributed).
  • For this audience: Neither exemplars nor prototypes!
  • Explain holistic processing:
  • Why? If stimulated with a partial match, the firing
represents votes for this template: Units “downstream” don’t know what caused this unit to fire. (more on this later…)
SLIDE 23

The Final Layer: Classification

(Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; Padgett & Cottrell 1996; Dailey & Cottrell, 1999; Dailey et al. 2002) The holistic representation is then used as input to a categorization network trained by supervised learning.
  • Excellent generalization performance demonstrates the sufficiency of the holistic representation for recognition.
[Diagram: Input from Perceptual Layer → Holons → Categories. Output: Cup, Can, Book, Greeble, Face, Bob, Carol, Ted, Happy, Sad, Afraid, etc.]
SLIDE 24

The Final Layer: Classification

  • Categories can be at different levels: basic, subordinate.
  • Simple learning rule (~delta rule). It says (mild lie here):
  • add inputs to your weights (synaptic strengths) when
you are supposed to be on,
  • subtract them when you are supposed to be off.
  • This makes your weights “look like” your favorite patterns
– the ones that turn you on.
  • When there are no hidden units => no back propagation of error.
  • When there are hidden units: we get task-specific features (most interesting when we use the basic/subordinate distinction)
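The rule can be sketched on a toy discrimination (made-up AND-style patterns, not the talk's stimuli). For a linear unit, the delta (LMS) rule literally adds the input to the weights when the target is on and subtracts it when the target is off, scaled by the error:

```python
import numpy as np

# Patterns with a bias input; the unit should be "on" only for the last one.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.0, 1.0])

w = np.zeros(3)
lr = 0.1
for _ in range(2000):
    for x, target in zip(X, t):
        y = w @ x                     # linear unit's output
        w += lr * (target - y) * x    # delta rule: add the input when the unit
                                      # should be on, subtract when it should
                                      # be off

preds = (X @ w > 0.5).astype(float)   # threshold the linear outputs
```

After training, the weights "look like" the pattern that turns the unit on (positive weights on both non-bias inputs), exactly as the slide says.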
SLIDE 25

Outline

  • Review of our model of face and object processing
  • Some insights from modeling:
  • Does a specialized processor for faces need to be innately
specified?
  • Why is there a left-side face bias?
SLIDE 26

Introduction

  • The brain appears to devote specialized resources to
face processing.
  • The issue: innate or learned?
  • Our approach: computational models guided by
neuropsychological and experimental data.
  • The model: competing neural networks + biologically
plausible task and input biases.
  • Results: interaction between face discrimination and
low visual acuity leads to networks specializing for face recognition.
  • No innateness necessary!
SLIDE 27

Step one: a model with parts

  • Independent networks compete to perform new tasks
  • A mediator rewards winners
  • The question: What might cause a specialized face processor?
[Diagram: Stimulus → Feature Extraction units → competing modules (Face Processing, Object Processing, ??) → Mediator → Decision]
SLIDE 28

Developmental biases in learning

  • The task: we have a strong need to discriminate between faces but not between baby bottles.
  • Mother’s face recognition at 4 days (Pascalis et al., 1995)
  • The input: low spatial frequencies - which tend to be more holistic in nature
  • Infant sensitivity to high spatial frequencies is low at birth
[Figure: infant contrast sensitivity, from Banks and Salapatek, 1981]
SLIDE 29

[Figure: Contrast-Sensitivity Function (CSF), Campbell & Robson (1968); x-axis: spatial frequency (cycles/degree), y-axis: sensitivity; resolution limit: 50 cpd]
SLIDE 30

Neural Network Implementation

  • Networks in competition
  • Output mixed by gate network
  • More error feedback to “winner”
  • Rich-get-richer effect
[Diagram: input stimulus → image preprocessing → high spatial frequency and low spatial frequency modules → outputs mixed by gate network (multiplicative connections)]
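A minimal mixture-of-experts sketch of this scheme (toy task and sizes, not the actual networks): a softmax gate mixes two linear experts, and because each expert's error gradient is scaled by its gate output, the current winner receives more error feedback, giving the rich-get-richer effect:

```python
import numpy as np

x = np.linspace(-1, 1, 64)           # toy 1-D inputs
t = np.abs(x)                        # target best split between two experts

rng = np.random.default_rng(0)
a = rng.normal(scale=0.5, size=2)    # expert slopes
b = np.zeros(2)                      # expert intercepts
c = rng.normal(scale=0.5, size=2)    # gate slopes
d = np.zeros(2)                      # gate intercepts
lr = 0.5

for _ in range(5000):
    y = np.outer(a, x) + b[:, None]              # expert outputs, shape (2, n)
    s = np.outer(c, x) + d[:, None]
    g = np.exp(s - s.max(axis=0))                # softmax gate (stabilized)
    g /= g.sum(axis=0)
    out = (g * y).sum(axis=0)                    # gated mixture
    err = out - t
    # Each expert's gradient is weighted by its gate output g_i, so the
    # "winner" gets most of the error feedback (rich get richer).
    ga = (err * g * x).sum(axis=1) * 2 / x.size
    gb = (err * g).sum(axis=1) * 2 / x.size
    # Gate gradient shifts responsibility toward the better expert.
    gs = err * g * (y - out)
    gc = (gs * x).sum(axis=1) * 2 / x.size
    gd = gs.sum(axis=1) * 2 / x.size
    a -= lr * ga; b -= lr * gb; c -= lr * gc; d -= lr * gd

loss = np.mean(err ** 2)
```

With gate inputs available, the gate learns to assign each half of the input range to a different expert, and the mixture fits a target no single linear expert could.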
SLIDE 31

Experimental methods

  • Image data: 12 faces, 12 books, 12 cups, 12 soda
cans, five examples each.
  • 8-bit grayscale, cropped and scaled to
64x64 pixels
SLIDE 32

Image Preprocessing

[Diagram: Filter Responses (512x5 elements) → dimensionality reduction (PCA x5) → Gabor Jet Pattern Vector (8x5 elements)]
SLIDE 33

Effects of filtering with different spatial frequencies

SLIDE 34

Task Manipulation

  • To investigate the question of task effects,
we trained our system in two conditions:
  • Superordinate four-way classification (book? face?)
  • Subordinate classification within one class; simple classification for others (book? John?)
[Diagram: network output units. Task 1 (Superordinate): Face, Can, Book, Cup; Task 2 (Subordinate): Bob, Carol, Ted, ..., Alice, plus Can, Book, Cup]
SLIDE 35

Input spatial frequency manipulation

  • To investigate the effects of spatial frequencies,
we trained our system in two conditions:
  • Each module receives the same full pattern vector
  • One module receives low spatial frequencies; the other receives high spatial frequencies
SLIDE 36

Conditions summary

  • Within the subordinate training condition, we also
manipulated which task was the one learned at the subordinate level: faces, cups, cans or books.
  • Thus we have a simple 2x2 design:
  • Two task conditions
  • Two input conditions
  • Within the subordinate task condition, there are four
cells for the four subordinate tasks
SLIDE 37

Measuring specialization

  • Train the network
  • Record how gate network outputs change with each pattern
[Diagram: gate outputs for two example patterns: (Net 1: 0.2, Net 2: 0.8) and (Net 1: 0.7, Net 2: 0.3)]
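That measurement is just an average of gate outputs within each stimulus class. A hypothetical helper, with illustrative numbers echoing the diagram:

```python
import numpy as np

def specialization(gate_outputs, labels):
    """Average gate output vector per stimulus class.

    gate_outputs: (n_patterns, n_modules) array of gate activations.
    labels: class label for each pattern (e.g. 'face', 'book').
    """
    gate_outputs = np.asarray(gate_outputs, dtype=float)
    labels = np.asarray(labels)
    return {lab: gate_outputs[labels == lab].mean(axis=0)
            for lab in dict.fromkeys(labels.tolist())}

# Illustrative numbers: module 2 "wins" the face patterns.
gates = [[0.2, 0.8], [0.3, 0.7], [0.7, 0.3], [0.6, 0.4]]
labels = ['face', 'face', 'book', 'book']
avg = specialization(gates, labels)
```

A module counts as specialized for a class when its average gate weight for that class is reliably higher than the other module's.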
SLIDE 38

Specialization Results

[Figure: gating unit average weight for Module 1 (Hi freq) and Module 2 (Lo freq), by TASK (four-way classification: Face, Book, Cup, Can?; book identification: Face, Cup, Can, Book1, Book2, ...?; face identification: Book, Cup, Can, Bob, Carol, Ted, ...?) and INPUT (all frequencies vs. Hi/Lo split)]
SLIDE 39

Why does this happen?

  • To investigate why the low spatial frequency network
always won the face task, we trained single networks on face identification or book identification
  • We measured how well these networks generalized to new
data.
  • The results show that low spatial frequencies generalize
better for face identification.
  • This means that a network with low spatial frequencies will also learn faster!
SLIDE 40

Results from a single network

SLIDE 41

Modeling prosopagnosia

  • Can “damage” the specialized network.
  • Damage to the high spatial frequency network degrades object classification.
  • Damage to the low spatial frequency network differentially degrades face identification.
[Figure: performance under lesioning; x axis is percent damage]
SLIDE 42

Conclusions so far…

  • There is a strong interaction between task and spatial
frequency in the degree of specialization for face processing.
  • The model suggests that the infant’s low visual acuity and the need to discriminate between faces but not other objects could “lock in” a special face processor early in development.
  • => General mechanisms (competition, known innate
biases) could lead to a specialized face processing “module”
  • No need for an innately-specified processor
SLIDE 43

Outline

  • Review of our model of face and object processing
  • Some insights from modeling:
  • Does a specialized processor for faces need to be innately
specified?
  • Why is there a left-side face bias?
SLIDE 44

Which of these two people looks the most like the middle one?

SLIDE 45

Why is that?

SLIDE 46

Modeling a split fovea

  • Four models for comparison:
  • No split
  • Split, early convergence of information
  • Split, intermediate: like early, but half the weights
  • Split, late convergence
SLIDE 47

Spatial Frequency bias

  • Two methods for biasing the spatial frequencies:
  • No bias
  • Biased “sigmoidally” - one side gets more LSF, the other
side gets more HSF
  • This corresponds to attentional filtering in the DFF (Double Filtering by Frequency) theory
SLIDE 48

Experiment 1 Training Data

  • Faces vary in expression
  • From CAFÉ dataset (California Facial Expressions)
SLIDE 49

Data Analysis

  • We will compare the three architectures on
performance
  • But most importantly, we will compare them on
the Left-Side bias effect:
  • Given a left-left face, and a right-right face, how
active is the correct output for that face?
  • Left Side Bias effect:
  • Activation(left,left) - Activation(right,right)
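The effect measure is a difference of correct-output activations, here averaged over a set of test faces (hypothetical helper, illustrative numbers):

```python
import numpy as np

def left_side_bias(act_left_left, act_right_right):
    """Mean activation of the correct output for left-left chimeric faces,
    minus that for right-right chimeric faces. Positive => left-side bias."""
    return float(np.mean(act_left_left) - np.mean(act_right_right))

# Illustrative activations for three test faces.
lsb = left_side_bias([0.9, 0.8, 0.85], [0.7, 0.6, 0.65])
```

A positive value means the network identifies a face more strongly from its left half than from its right half, which is the behavioral left-side bias.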
SLIDE 50

Experiment 1 results: Accuracy

  • Having a biased input reduces accuracy
  • Why?
SLIDE 51

Experiment 1 results: Left Side Bias

  • The Late and Intermediate architectures show an
LSB
SLIDE 52

Experiment 2 data: Greebles from San Francisco and Ketchikan, AK

  • Lighting is from morning to late afternoon at
different azimuths
  • Train on one, test on the other
SLIDE 53

Experiment 2 results: LSB in late and intermediate architectures

SLIDE 54

Experiment 3 Data: faces with different lighting

Yale face database
SLIDE 55

Experiment 3 Results: LSB in late and intermediate architectures

SLIDE 56

Early, Intermediate and Late architectures

  • The “intermediate” architecture = early with half the weights
  • LSF are more redundant, and can probably work well with
fewer weights - this hypothesis needs testing.
SLIDE 57

Wrap up

  • We are able to explain a variety of results in face
processing.
  • How a specialized area might arise for faces, and why low
spatial frequencies (LSF) appear to be important in face processing (specialization model: LSF -> better learning and generalization).
  • Why there might be a left-side bias in face recognition
  • And a whole lot more I didn’t talk about today!
SLIDE 58

END