James DiCarlo MD, PhD
Professor of Neuroscience; Head, Department of Brain and Cognitive Sciences; Investigator, The McGovern Institute for Brain Research; Massachusetts Institute of Technology, Cambridge MA, USA
Going after object recognition performance to discover how the ventral stream works
Invariance is the crux problem
Hierarchical, working system
Ventral visual stream
Powerful set of visual features
Poggio, Ullman, Grossberg, Edelman, Biederman, etc. DiCarlo and Cox, TICS (2007); Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)
View: position, size, pose, illumination
Clutter, occlusion, illumination
Intraclass
Deformation, articulation
“AND” “OR”
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
FROM BIOLOGY:
Serre Oliva & Poggio 2007 (under limited human viewing conditions)
Kriegeskorte, Frontiers in Neuroscience (2009)
Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)
SLF (~HMAX)
“HMAX 2.0”
(Serre et al. PNAS 2007)
Pinto, Majaj, Barhomi, Salomon, Cox, DiCarlo COSYNE 2010
Humans V1-like
Example object recognition task: “car detection”
Pinto, Cox & DiCarlo, PLoS Comp Biol (2008); Pinto, DiCarlo and Cox, ECCV (2008); Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009)
no variation
n>100 n>700 Basic car task, variation level: 3
Δ
Machines lose to humans
Machines beat humans!
Variation levels (rows in order of increasing composite variation):
position (x-axis)   position (y-axis)   scale   in-plane rotation   in-depth rotation
0%                  0%                  0%      0°                  0°
10%                 20%                 10%     15°                 15°
20%                 40%                 20%     30°                 30°
30%                 60%                 30%     45°                 45°
40%                 80%                 40%     60°                 60°
50%                 100%                50%     75°                 75°
60%                 120%                60%     90°                 90°
Figure axes: increasing composite variation (x) versus performance (%) (y, ticks from 50 to 100)
Figure: performance relative to Pixels (%), with chance marked, for Pixels, V1-like, SIFT, SLF, Geometric Blur, PHOG, and PHOW
Pinto, Barhomi, Cox & DiCarlo, WACV(2010)
SLF (~HMAX), PHOW, PHOG, SIFT
simple decode
IT neuronal units, V2-like, V4 neuronal units, V1-like, Pixels
Figure, panel a: IT neuronal units and HMO model, each plotted by image, with images grouped as Animals (8), Boats (8), Cars (8), Chairs (8), Faces (8), Fruits (8), Planes (8), Tables (8)
Figure, panel b: population similarity to IT (axis from 0.0 to 0.9) under image generalization and object generalization, for Pixels, V1-like, SIFT, HMAX, V2-like, HMO, V4 units, and IT units split-half (RDM correlation)
Legend: explanatory power of HMO model; current maximum expected explanatory power (*)
Yamins, Hong, Soloman, Seibert and DiCarlo (under review) Inspired by N. Kriegeskorte et al. (2008, 2009)
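The "population similarity to IT" measure on this slide is reported as an RDM (representational dissimilarity matrix) correlation, following Kriegeskorte et al. A minimal sketch of that kind of comparison is given here under assumptions the slides do not state: dissimilarity between two images is taken as 1 minus the Pearson correlation of their population response vectors, and two RDMs are compared by correlating their upper triangles. All function names and the random data are illustrative, not the authors' code or data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rdm(responses):
    """Build an RDM from a list of per-image population response vectors.

    Entry (i, j) is 1 - Pearson correlation between the responses to
    image i and image j (an assumed dissimilarity measure).
    """
    n = len(responses)
    return [[1.0 - pearson(responses[i], responses[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, flattened row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def rdm_similarity(rdm_a, rdm_b):
    """Similarity of two RDMs: correlation of their upper triangles."""
    return pearson(upper_triangle(rdm_a), upper_triangle(rdm_b))


# Illustrative random data: 8 images x 20 units per population.
random.seed(0)
it_like = [[random.gauss(0, 1) for _ in range(20)] for _ in range(8)]
noisy_copy = [[v + random.gauss(0, 0.1) for v in row] for row in it_like]
unrelated = [[random.gauss(0, 1) for _ in range(20)] for _ in range(8)]

sim_noisy = rdm_similarity(rdm(it_like), rdm(noisy_copy))
sim_unrelated = rdm_similarity(rdm(it_like), rdm(unrelated))
print("noisy copy vs. original:", sim_noisy)
print("unrelated vs. original:", sim_unrelated)
```

A split-half version of the same idea would compute `rdm_similarity` between RDMs built from two halves of the recorded trials, which gives a reference level for how similar any model could be expected to look.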
Categories: Animals, Boats, Cars, Chairs, Faces, Fruits, Planes, Tables
Unit 1: r² = 0.48
Unit 2: r² = 0.55
Yamins, Hong, Soloman, Seibert and DiCarlo (under review)
The ability to predict IT responses to new images and new objects is dramatically better than that of previous models.
Legend: response of neural site; prediction of HMO model
pool
Hubel & Wiesel (1962); Fukushima (1980); Perrett & Oram (1993); Wallis & Rolls (1997); LeCun et al. (1998); Riesenhuber & Poggio (1999); Serre, Kouh, et al. (2005), etc.
Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009)
Figure axes: (% correct) and (% variance explained); tick labels 0% and 50%
Exploration of basic model class
(2013)
simple decode
Dan Yamins, Ha Hong, Charles Cadieu, Dave Cox, Nicolas Pinto, Ethan Soloman