Going after object recognition performance to discover how the ventral stream works - PowerPoint PPT Presentation

SLIDE 1

James DiCarlo MD, PhD

Professor of Neuroscience Head, Department of Brain and Cognitive Sciences Investigator, The McGovern Institute for Brain Research Massachusetts Institute of Technology, Cambridge MA, USA

Going after object recognition performance to discover how the ventral stream works.

“Invariance” is the crux problem.

Hierarchical, working system

SLIDE 2

Ventral visual stream

Systems neuroscience: the non-human primate model

SLIDE 3

Ventral visual stream

Systems neuroscience: the non-human primate model

Powerful set of visual features

SLIDE 4

Ventral visual stream

Systems neuroscience: the non-human primate model

Powerful set of visual features

SLIDE 5

Understanding the brain and discovering game-changing information processing technology are two sides of the same coin.

How the brain works

SLIDE 6

The convergence of three fields: computer science, neuroscience, and psychophysics, converging on how the brain works

When biological brains perform better than computers: new ideas, algorithm parameters, new phenomena

When computers perform as well as or better than biological brains: falsifiable hypotheses; attempt to test/falsify those hypotheses

SLIDE 7

Common physical source (object) leads to many images

Poggio, Ullman, Grossberg, Edelman, Biederman, etc. DiCarlo and Cox, TICS (2007); Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)

“Identity-preserving image variation”:

  • View: position, size, pose, illumination
  • Clutter, occlusion, illumination
  • Intraclass variation
  • Deformation, articulation

SLIDE 8

The convergence of three fields: computer science, neuroscience, and psychophysics, converging on how the brain works

New ideas, algorithm parameters, new phenomena

SLIDE 9
  • Examples:
  • Hubel & Wiesel (1962)
  • Fukushima (1980)
  • Perrett & Oram (1993)
  • Wallis & Rolls (1997)
  • LeCun et al. (1998)
  • Riesenhuber & Poggio (1999)
  • Serre, Kouh, et al. (2005)

Brain-inspired computer algorithms

  • 1. Selectivity (“AND”)
  • 2. Tolerance (“OR”)

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

FROM BIOLOGY:

  • Hierarchy
  • Spatially local filters
  • Convolution
  • Normalization
  • Threshold nonlinearity (NL)
  • Unsupervised learning
  • ...
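The “AND”/“OR” pairing can be illustrated with a toy 1-D sketch, assuming the common HMAX-style reading: a selective unit computes a normalized template match that is high only when all parts of the template are present, and a tolerant unit takes the max over positions. The function names and signals are hypothetical; real models work on 2-D images with many templates.

```python
import math
from typing import List


def template_match(patch: List[float], template: List[float]) -> float:
    """'AND'-like selectivity: normalized dot product (cosine similarity).

    Reaches 1.0 only when the patch matches every part of the template.
    """
    dot = sum(p * t for p, t in zip(patch, template))
    norm = math.sqrt(sum(p * p for p in patch)) * math.sqrt(sum(t * t for t in template))
    return dot / norm if norm > 0 else 0.0


def selective_layer(signal: List[float], template: List[float]) -> List[float]:
    """Apply the same template at every position (convolution-style weight sharing)."""
    k = len(template)
    return [template_match(signal[i:i + k], template) for i in range(len(signal) - k + 1)]


def tolerant_unit(responses: List[float]) -> float:
    """'OR'-like tolerance: max over positions, so the preferred feature
    drives the unit wherever it appears."""
    return max(responses)


template = [1.0, -1.0, 1.0]
centered = [0, 0, 1, -1, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 0, 1, -1, 1]    # same feature, different position
scrambled = [0, 0, 1, 1, 1, 0, 0, 0]   # same parts, wrong arrangement

for name, sig in [("centered", centered), ("shifted", shifted), ("scrambled", scrambled)]:
    print(name, round(tolerant_unit(selective_layer(sig, template)), 3))
```

The centered and shifted inputs both drive the tolerant unit to its maximum (tolerance to position), while the scrambled input, which contains the same parts in the wrong arrangement, stays well below it (selectivity).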

SLIDE 10

The convergence of three fields: computer science, neuroscience, and psychophysics, converging on how the brain works

Falsifiable hypotheses (e.g. HMAX); attempt to test/falsify those hypotheses

SLIDE 11

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

HMAX successes (~2005)

SLIDE 12

Serre, Oliva & Poggio 2007 (under limited human viewing conditions)

HMAX successes (~2007)

SLIDE 13

Circa 2007 performance

[Figure: performance of pixels, HMAX, and the IT population relative to human level]

SLIDE 14

~2008: But HMAX and other models failed to explain neurons

Representational similarity analysis

Kriegeskorte, Frontiers in Neuroscience (2009)

[Figure: representational similarity of the biological ventral stream compared with models of the ventral stream (HMAX model)]
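Representational similarity analysis compares a model and a brain area through their representational dissimilarity matrices (RDMs) rather than unit by unit. Below is a minimal stdlib sketch, assuming the common choices of 1 minus Pearson correlation as the dissimilarity between image responses and Spearman rank correlation to compare two RDMs; the slide does not specify which measures were used, and the function names and toy data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rdm(responses: Matrix) -> Matrix:
    """responses[i] is the population pattern (one value per unit) for image i.
    Entry (i, j) is 1 - Pearson correlation between patterns i and j."""
    n = len(responses)
    return [[1.0 - pearson(responses[i], responses[j]) for j in range(n)] for i in range(n)]


def upper_triangle(m: Matrix) -> List[float]:
    """Off-diagonal entries above the diagonal (an RDM is symmetric)."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def rdm_similarity(model: Matrix, neural: Matrix) -> float:
    """Spearman correlation between the off-diagonal entries of two RDMs."""
    a, b = upper_triangle(rdm(model)), upper_triangle(rdm(neural))
    return pearson(rank(a), rank(b))


neural = [[1, 2, 3], [2, 1, 0], [0, 1, 3], [3, 3, 1]]    # toy: 4 images x 3 sites
model = [[2 * v + 1 for v in row] for row in neural]      # rescaled copy, same geometry
print(rdm_similarity(model, neural))
```

Because RSA compares representational geometry, a model whose responses are a rescaled version of the neural responses scores a perfect match even though no individual unit has the same values as any recorded site.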

SLIDE 15

What went wrong?

Computer science, neuroscience, and psychophysics converging on how the brain works: falsifiable hypotheses; attempt to test/falsify those hypotheses; new ideas, algorithm parameters, new phenomena

The stringency of these “Brains vs. Machines” tests was far too weak.

SLIDE 16

~2008: Tests of performance were not stringent enough.

One problem was insufficient variation in the test sets: even simple “V1-like” models did well on them.

Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)

[Figure: Caltech 101 benchmark performance (%) of V1-like and SLF (~HMAX) models]

[Figure: animal vs. non-animal task from “HMAX 2.0” (Serre et al. PNAS 2007); performance (%) of humans and a V1-like model on head, close-body, medium-body, and far-body images]

Pinto, Majaj, Barhomi, Salomon, Cox, DiCarlo COSYNE 2010

SLIDE 17

[Figure: performance of pixels, V1-like, HMAX, and the IT population relative to human level]

SLIDE 18

Example object recognition task: “car detection”

Pinto, Cox & DiCarlo, PLoS Comp Biol (2008); Pinto, DiCarlo and Cox, ECCV (2008); Pinto, Doukhan, DiCarlo & Cox, PLoS Comp Biol (2009)

2009: More stringent but compact tests of “object recognition”

Image generation strategy:

SLIDE 19

Example object recognition task: “car detection”

Pinto, Cox & DiCarlo, PLoS Comp Biol (2008); Pinto, DiCarlo and Cox, ECCV (2008); Pinto, Doukhan, DiCarlo & Cox, PLoS Comp Biol (2009)

Image generation strategy:

  • Parametric control of task demand (esp. invariance)
  • Few images needed to bring computer vision features to their knees

[Figure: example “car” and not-“car” images with no variation, more variation, and lots of variation; basic car task at variation level 3 (n>100, n>700)]

2009: Toward more stringent tests of “object recognition”
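The parametric image-generation strategy can be sketched as a sampler that draws identity-preserving transformation parameters from ranges that widen with the variation level, where level 0 means no variation. This is a hypothetical sketch: the step sizes are illustrative defaults, symmetric uniform ranges are an assumption, and rendering the transformed 3-D object into an image is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class Transform:
    """Identity-preserving transformation for one rendered object (hypothetical units)."""
    dx: float            # horizontal shift, as a fraction of image size
    dy: float            # vertical shift, as a fraction of image size
    scale: float         # multiplicative size factor (1.0 = nominal size)
    rot_in_plane: float  # degrees
    rot_in_depth: float  # degrees


def sample_transform(level: int, rng: random.Random,
                     pos_step: float = 0.10, scale_step: float = 0.10,
                     rot_step: float = 15.0) -> Transform:
    """Draw parameters uniformly from symmetric ranges that grow linearly with level.

    With the default scale_step, scale stays positive for levels below 10.
    """
    if level < 0:
        raise ValueError("variation level must be >= 0")
    pos, scl, rot = pos_step * level, scale_step * level, rot_step * level
    return Transform(
        dx=rng.uniform(-pos, pos),
        dy=rng.uniform(-pos, pos),
        scale=1.0 + rng.uniform(-scl, scl),
        rot_in_plane=rng.uniform(-rot, rot),
        rot_in_depth=rng.uniform(-rot, rot),
    )


def make_task(level: int, n_per_class: int, seed: int = 0):
    """Balanced 'car' vs. 'not car' task: shuffled (label, transform) pairs at one level."""
    rng = random.Random(seed)
    items = [(label, sample_transform(level, rng))
             for label in ("car", "not car") for _ in range(n_per_class)]
    rng.shuffle(items)
    return items


task = make_task(level=3, n_per_class=5)
for label, t in task[:3]:
    print(label, round(t.rot_in_depth, 1))
```

A single integer level controls task difficulty, so the same object set yields a graded family of tasks, and a few hundred images per level are enough to compare representations.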

SLIDE 20

2010: Machines vs. human brains

Data merged here: 48 basic-level tasks (8 labels x 6 levels of variation)

[Figure a) “cars vs. planes” task: performance (%) vs. increasing composite variation (levels 0 to 6) for pixels, V1-like, SLF (~HMAX), SIFT, PHOG, PHOW, and Geometric Blur features, relative to chance; annotations “Machines beat humans!” and “Machines lose to humans”. Composite variation increases position (x- and y-axis), scale, in-plane rotation, and in-depth rotation together: rotations grow in 15° steps to 90°, and position and scale ranges grow in 10% or 20% steps to 60% or 120%. b) Controls: performance relative to pixels (%).]

Pinto, Barhomi, Cox & DiCarlo, WACV (2010)

SLIDE 21

[Figure: performance of pixels, V1-like, HMAX, and the IT population relative to human level]

SLIDE 22

[Figure: performance of pixels, V1-like, HMAX, and the IT population relative to human level]

SLIDE 23

[Figure: performance of pixels, V1-like, HMAX, and the IT population relative to human level]

SLIDE 24

[Figure: performance of pixels, V1-like, HMAX, and the IT population (simple decode) relative to human level]
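A “simple decode” can be read as a linear readout: a weighted sum of population responses followed by a threshold. The sketch below trains such a readout with the perceptron rule on toy data and scores it on held-out images. The perceptron is a stand-in chosen for brevity (the slide does not say which linear classifier was used), and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear readout: weighted sum of site responses plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_readout(X: List[Vector], y: List[int], lr: float = 0.1,
                  max_epochs: int = 100) -> Tuple[List[float], float]:
    """Perceptron training of a linear readout; labels are +1 / -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(X, y):
            if t * score(w, b, x) <= 0:   # misclassified (or on the boundary)
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                mistakes += 1
        if mistakes == 0:                 # training set separated: stop early
            break
    return w, b


def accuracy(w: Vector, b: float, X: List[Vector], y: List[int]) -> float:
    hits = sum((1 if score(w, b, x) > 0 else -1) == t for x, t in zip(X, y))
    return hits / len(y)


# Toy population responses (rows: images, columns: recording sites); +1 = "car".
X_train = [[1.0, 0.1, 0.5], [0.9, 0.2, 0.4], [0.1, 1.0, 0.5], [0.2, 0.8, 0.6]]
y_train = [1, 1, -1, -1]
X_test = [[0.8, 0.3, 0.5], [0.3, 0.9, 0.4]]   # held-out images
y_test = [1, -1]

w, b = train_readout(X_train, y_train)
print(accuracy(w, b, X_test, y_test))
```

The point of keeping the decoder this simple is that good held-out performance then reflects the population representation itself, not a powerful downstream classifier.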

SLIDE 25

[Figure: performance of pixels, V1-like, HMAX, the V4 population, and the IT population (simple decode) relative to human level]

SLIDE 26

[Figure: performance of pixels, V1-like, HMAX, the V4 population, the IT population (simple decode), Super Vision, Zeiler & Fergus, and HMO relative to human level; one entry marked “?”]

SLIDE 27

Neural population similarity of images along the ventral stream

[Figure a) Image-by-image population similarity for pixels, V1-like, V2-like, V4 neuronal units, IT neuronal units, and the HMO model; images grouped into 8 categories (animals, boats, cars, chairs, faces, fruits, planes, tables), 8 per category. b) Population similarity to IT (RDM correlation, 0.0 to 0.9) for pixels, V1-like, SIFT, HMAX, V2-like, HMO, V4 units, and IT units (split-half), under image generalization and object generalization; the explanatory power of the HMO model and other models is compared with the current maximum expected explanatory power (*).]

Yamins, Hong, Solomon, Seibert and DiCarlo (under review). Inspired by N. Kriegeskorte et al. (2008, 2009)
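The “current maximum expected explanatory power” is a noise ceiling: no model can be expected to match IT better than IT data match themselves across independent halves of the recordings. A minimal sketch of one common way to estimate such a ceiling is split-half reliability with the Spearman-Brown correction, shown here for one site's response profile across images. The slide applies the same idea to RDM correlations, and whether this exact correction was used there is an assumption.

```python
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def split_half_reliability(trials: List[List[float]]) -> float:
    """trials[k][i]: response to image i on repetition k (at least 2 repetitions).

    Averages even and odd repetitions separately, correlates the two mean
    profiles, then applies the Spearman-Brown correction r' = 2r / (1 + r)
    to estimate the reliability of the mean over all repetitions.
    """
    even, odd = trials[0::2], trials[1::2]
    n_images = len(trials[0])
    mean_even = [mean(rep[i] for rep in even) for i in range(n_images)]
    mean_odd = [mean(rep[i] for rep in odd) for i in range(n_images)]
    r = pearson(mean_even, mean_odd)
    return 2 * r / (1 + r)


# Toy data: 4 repetitions x 4 images for one recording site.
trials = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 2, 3, 4], [2, 1, 4, 3]]
print(split_half_reliability(trials))
```

The correction is needed because each half contains only half the repetitions, so the raw half-versus-half correlation underestimates how reliable the full-data average is.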

SLIDE 28

Predictions of single-site IT responses from the current best model

[Figure d) Response of two neural sites vs. prediction of the HMO model across images from eight categories (animals, boats, cars, chairs, faces, fruits, planes, tables); unit 1: r² = 0.48, unit 2: r² = 0.55]

The ability to predict IT responses to new images and new objects is dramatically better than that of previous models.

Yamins, Hong, Solomon, Seibert and DiCarlo (under review)
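Predicting a single site's responses from a model usually means fitting a linear mapping from model features to that site's responses on training images, then scoring the mapping on held-out images. The sketch below uses ridge regression, solved by Gaussian elimination, and reports explained variance on held-out images. Ridge, the penalty value, and the toy features are assumptions, not the procedure behind the r² values on the slide.

```python
from typing import List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge(F: Matrix, y: List[float], lam: float = 1e-3) -> List[float]:
    """Weights minimizing ||Xw - y||^2 + lam * ||w_features||^2, where X is F
    with a bias column appended; the bias weight is not penalized."""
    X = [list(row) + [1.0] for row in F]
    d = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) + (lam if (i == j and i < d - 1) else 0.0)
            for j in range(d)] for i in range(d)]
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)


def predict(w: List[float], F: Matrix) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(w, list(row) + [1.0])) for row in F]


def explained_variance(y_true: List[float], y_pred: List[float]) -> float:
    """1 - SS_res / SS_tot, computed on held-out images."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def site(f: List[float]) -> float:
    """Toy neural site that responds as 2*f0 - f1 + 0.5 to model features f."""
    return 2 * f[0] - f[1] + 0.5


F_train = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
F_test = [[3, 1], [0, 2], [2, 3]]   # held-out images
w = fit_ridge(F_train, [site(f) for f in F_train])
print(round(explained_variance([site(f) for f in F_test], predict(w, F_test)), 4))
```

Scoring only on held-out images (and, more stringently, held-out objects) is what makes the fit a prediction rather than a description of the training data.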

SLIDE 29

Basic bio-constrained model component inside HMO

Neural-like basic operations:

  • Filter
  • Threshold & saturate
  • Pool
  • Normalize
  • Hierarchical stacking (layers L1, L2, L3, ...; filters 1, 2, ..., k)

“Output” is thousands of visual features.

Hubel & Wiesel (1962); Fukushima (1980); Perrett & Oram (1993); Wallis & Rolls (1997); LeCun et al. (1998); Riesenhuber & Poggio (1999); Serre, Kouh, et al. (2005), etc.

Pinto, Doukhan, DiCarlo & Cox, PLoS Comp Biol (2009)
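The basic operations and their hierarchical stacking fit in a few lines. Below is a hypothetical 1-D, pure-Python sketch that assumes the order filter, threshold & saturate, pool, normalize. Max pooling and divisive L2 normalization across channels are illustrative choices; published models in this class vary the order, the pooling, and the normalization details. The filters are hand-picked, not learned or screened.

```python
from typing import List

Signal = List[List[float]]       # [channel][position]
FilterBank = List[Signal]        # [filter][input channel][tap]


def filter_bank(x: Signal, filters: FilterBank) -> Signal:
    """Valid 1-D cross-correlation of every filter with the multi-channel input."""
    k = len(filters[0][0])
    n_out = len(x[0]) - k + 1
    return [[sum(fc[j] * xc[i + j] for fc, xc in zip(f, x) for j in range(k))
             for i in range(n_out)] for f in filters]


def threshold_saturate(x: Signal, thr: float = 0.0, sat: float = 1.0) -> Signal:
    """Clip responses to [thr, sat]: a rectifying, saturating nonlinearity."""
    return [[min(max(v, thr), sat) for v in ch] for ch in x]


def pool(x: Signal, size: int = 2) -> Signal:
    """Max over non-overlapping windows of `size` positions (tolerance)."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)] for ch in x]


def normalize(x: Signal, eps: float = 1e-6) -> Signal:
    """Divide each position's channel vector by its L2 norm (divisive normalization)."""
    n = len(x[0])
    norms = [(sum(ch[i] ** 2 for ch in x) + eps) ** 0.5 for i in range(n)]
    return [[ch[i] / norms[i] for i in range(n)] for ch in x]


def layer(x: Signal, filters: FilterBank) -> Signal:
    return normalize(pool(threshold_saturate(filter_bank(x, filters))))


def stack(x: Signal, layers: List[FilterBank]) -> Signal:
    """Hierarchical stacking: each layer's output channels feed the next layer."""
    for filters in layers:
        x = layer(x, filters)
    return x


image = [[0, 1, 0, 0, 1, 1, 0, 0, 1, 0]]                      # 1 channel, 10 positions (toy)
L1 = [[[1, -1, 1]], [[-1, 1, -1]]]                             # 2 filters, 1 channel, 3 taps
L2 = [[[1, 1], [0, 0]], [[0, 0], [1, 1]], [[1, 0], [0, 1]]]    # 3 filters over 2 channels
features = stack(image, [L1, L2])
print(len(features), len(features[0]))  # prints: 3 1 (channels x positions)
```

Each layer repeats the same four operations, so depth comes only from stacking; the number of filters in the last layer sets how many output features the model produces.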

SLIDE 30

Exploration of basic model class

[Figure: performance of artificial visual features (% correct) plotted against their ability to predict IT responses (% variance explained, 0% to 50%); annotation: “We are optimizing this way”]

The better a model performs, the better it explains IT responses. (2013)

SLIDE 31

Today:

[Figure: performance of pixels, V1-like, HMAX, the V4 population, the IT population (simple decode), Super Vision, Zeiler & Fergus, and HMO relative to human level; two entries marked “?”]

SLIDE 32

Follow the performance trail...

Computer science, neuroscience, and psychophysics converging on how the brain works: falsifiable hypotheses; attempt to test/falsify those hypotheses; new ideas, algorithm parameters, new phenomena

The stringency of these tests is crucial. They must include “invariance”.

SLIDE 33

The power of stringent tests to elucidate biological brains

1)

  • Discover IT neuronal codes that can explain behavior
  • Demonstrate that other possible codes CANNOT
  • Demonstrate which computer vision features CANNOT

2)

  • Driving discovery (“learning?”) of new CV features
  • These are becoming more and more capable of explaining what the brain is doing

Dan Yamins, Ha Hong, Charles Cadieu

Dave Cox, Nicolas Pinto

Dan Yamins, Ha Hong, Ethan Solomon