SLIDE 1

Visual deep learning models, in particular for face recognition, and models of invariant recognition in the ventral stream

Towards a theory of the above

Tomaso Poggio, CBMM, BCS, CSAIL, McGovern MIT

SLIDE 2

Plan

Recognition in visual cortex
DCLNs
Deep Face systems
iTheory

SLIDE 3

Second Annual NSF Site Visit, June 2–3, 2015

Theoretical/conceptual framework for vision

The first 100 ms of vision: feedforward and invariant (what, who, where).

Top-down processing is needed for the verification step and for more complex questions: generative models, probabilistic inference, top-down visual routines. Following this conceptual framework, we are working on:
1. a theory of invariance in cortical computation → i-theory
2. a generative approach, probabilistic in nature
3. visual routines, and how they may be learned.

SLIDE 4

Object recognition

SLIDE 5

Human Brain

– $10^{10}$–$10^{11}$ neurons (~1 million flies)
– $10^{14}$–$10^{15}$ synapses

Vision: what is where

Ventral stream in rhesus monkey

– ~$10^9$ neurons in the ventral stream ($350 \times 10^6$ in each hemisphere)
– ~$15 \times 10^6$ neurons in AIT (anterior inferotemporal) cortex

~200M in V1, ~200M in V2, 50M in V4

Van Essen & Anderson, 1990

SLIDE 6

Source: Lennie, Maunsell, Movshon

Vision: what is where

SLIDE 7

[software available online]

Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007

It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959: qual.; Fukushima, 1980: quant.; Oram & Perrett, 1993: qual.; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing & Koerner, 2003; LeCun et al., 1998: not bio; Amit & Mascaro, 2003: not bio; Hinton, LeCun, Bengio: not bio; Deco & Rolls, 2006…)

As a biological model of object recognition in the ventral stream (from V1 to PFC), it is perhaps the most quantitatively faithful to known neuroscience data.

Recognition in Visual Cortex: “classical model”, selective and invariant

SLIDE 8

Feedforward Models: “predict” rapid categorization (82% model vs. 80% humans)

Hierarchical feedforward models of the ventral stream

SLIDE 9

Why do these networks, including DCLNs, work so well? Models are not enough… we need a theory!

SLIDE 10

Plan

Recognition in visual cortex
DCLNs
Deep Face systems
iTheory

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

SLIDE 19

SLIDE 20

SLIDE 21

SLIDE 22

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

SLIDE 28

Invariance via pooling
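A minimal sketch of invariance via pooling, assuming a 1D signal, circular translations, one hypothetical template, and global max pooling as in HMAX-style models: the simple units (template responses at every position) are selective but shift with the input, while the complex unit that pools over positions gives the same output for the translated signal. The names `simple_cells` and `complex_cell` are illustrative only.

```python
import numpy as np

def simple_cells(signal: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Selective stage: dot product of the signal with the template at every circular position."""
    return np.array([np.dot(signal, np.roll(template, s)) for s in range(len(signal))])

def complex_cell(signal: np.ndarray, template: np.ndarray) -> float:
    """Invariant stage: max-pool the simple-cell responses over all positions."""
    return float(simple_cells(signal, template).max())

rng = np.random.default_rng(0)
template = np.zeros(32)
template[:4] = [1.0, -1.0, 1.0, -1.0]   # hypothetical edge-like template
signal = rng.normal(size=32)
shifted = np.roll(signal, 9)            # same pattern, translated by 9 positions

print(np.allclose(simple_cells(signal, template), simple_cells(shifted, template)))  # False: responses move
print(np.isclose(complex_cell(signal, template), complex_cell(shifted, template)))   # True: pooled output unchanged
```

Translating the input only permutes the simple-cell responses, so any pooling operation that ignores order (max here, or an average) is unaffected.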

SLIDE 29

SLIDE 30

SLIDE 31

SLIDE 32

SLIDE 33

SLIDE 34

SLIDE 35

SLIDE 36

SLIDE 37

SLIDE 38

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

SLIDE 43

SLIDE 44

New name for virtual examples

SLIDE 45

SLIDE 46

A poor man's regularization!

SLIDE 47

SLIDE 48

SLIDE 49

SLIDE 50

SLIDE 51

SLIDE 52

SLIDE 53

SLIDE 54

SLIDE 55

SLIDE 56

SLIDE 57

SLIDE 58

SLIDE 59

SLIDE 60

SLIDE 61

SLIDE 62

SLIDE 63

SLIDE 64

SLIDE 65

SLIDE 66

SLIDE 67

Mobileye

SLIDE 68

SLIDE 69

Plan

Recognition in visual cortex
DCLNs
Deep Face systems
iTheory

SLIDE 70

SLIDE 71

i-theory

Learning of Invariant & Selective Representations in Sensory Cortex

SLIDE 72

i-theory: exploring a new hypothesis

A main computational goal of the feedforward ventral stream hierarchy (and of vision) is to compute, for each incoming image, a representation that is invariant to transformations previously experienced in the visual environment.

SLIDE 73

Empirical demonstration: an invariant representation leads to lower sample complexity for a supervised classifier

Theorem (translation case). Consider a space of images of $d \times d$ pixels which may appear in any position within a window of size $rd \times rd$ pixels. The usual image representation yields a sample complexity (of a linear classifier) of order $m_{\text{image}} = O(r^2 d^2)$; the oracle (invariant) representation yields, because of much smaller covering numbers, a sample complexity of order

$$m_{\text{oracle}} = O(d^2) = \frac{m_{\text{image}}}{r^2}$$
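To make the gap concrete, the two orders can be evaluated for illustrative sizes. This is only arithmetic on the $O(\cdot)$ expressions with all constants dropped (an assumption; the theorem does not fix them), using hypothetical values $d = 32$ and $r = 4$:

```python
def sample_complexity_orders(d: int, r: int) -> tuple[int, int]:
    """Orders of growth from the translation-case theorem, constants dropped (illustrative only)."""
    m_image = (r * d) ** 2   # raw pixels of the rd x rd window: O(r^2 d^2)
    m_oracle = d ** 2        # invariant (oracle) representation: O(d^2)
    return m_image, m_oracle

m_image, m_oracle = sample_complexity_orders(d=32, r=4)
print(m_image, m_oracle, m_image // m_oracle)  # 16384 1024 16
```

The saving factor is $r^2$ regardless of $d$: the larger the window in which the object may appear, the more the invariant representation helps.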

SLIDE 74

An algorithm that learns in an unsupervised way to compute invariant representations

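[Figure: distribution $P(\nu)$ of the projections $\nu$; signature components $\mu^k$]

$$\mu^k_n(I) = \frac{1}{|G|} \sum_{i=1}^{|G|} \sigma(\langle I, g_i t^k \rangle + n\Delta)$$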
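A minimal NumPy sketch of this signature, under several assumptions not stated on the slide: the group $G$ is the set of cyclic translations of a 1D signal, $\sigma$ is a step function (so each $\mu^k_n$ is one point of an empirical cumulative histogram of the projections), $n$ runs over a symmetric range of integers, and the templates $t^k$ are random unit vectors. The function name `signature` and all parameter values are hypothetical.

```python
import numpy as np

def signature(I, templates, n_max=10, delta=0.1):
    """mu[k, n] = (1/|G|) * sum_i sigma(<I, g_i t^k> + n*delta), with G = cyclic shifts."""
    offsets = np.arange(-n_max, n_max + 1) * delta   # n*delta for n = -n_max, ..., n_max
    mu = np.empty((len(templates), len(offsets)))
    for k, t in enumerate(templates):
        # <I, g_i t^k> for every group element g_i (here: a shift by i positions)
        proj = np.array([np.dot(I, np.roll(t, i)) for i in range(len(I))])
        # step nonlinearity: fraction of projections with proj + n*delta >= 0
        mu[k] = [np.mean(proj + off >= 0) for off in offsets]
    return mu

rng = np.random.default_rng(1)
d, K = 64, 5
templates = rng.normal(size=(K, d))
templates /= np.linalg.norm(templates, axis=1, keepdims=True)
I = rng.normal(size=d)
I /= np.linalg.norm(I)

# A translated copy of the image yields the same signature.
print(np.allclose(signature(I, templates), signature(np.roll(I, 13), templates)))  # True
```

Shifting $I$ only permutes the set of projections $\{\langle I, g_i t^k\rangle\}$, so any statistic of that set (here, its empirical distribution) is invariant. The templates need not be transformed versions of $I$ itself.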

SLIDE 75

Invariant signature from a single image of a new object

SLIDE 76

We need only a finite number of projections, K, to distinguish among n images. This is similar in spirit to the Johnson-Lindenstrauss lemma.
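The Johnson-Lindenstrauss analogy can be illustrated with plain random projections. Note that this shows the JL lemma itself, not the i-theory result: K random Gaussian templates map each of n high-dimensional images to K numbers while approximately preserving all pairwise distances, so the images remain distinguishable. The sizes n, D and K below are hypothetical.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, D, K = 20, 4096, 400                            # n images of D pixels, K projections (illustrative)
images = rng.normal(size=(n, D))
templates = rng.normal(size=(K, D)) / np.sqrt(K)   # scaling keeps E||T x||^2 = ||x||^2
proj = images @ templates.T                        # each image -> K projection values

# Ratio of projected to original distance for every pair of images.
ratios = [np.linalg.norm(proj[a] - proj[b]) / np.linalg.norm(images[a] - images[b])
          for a, b in combinations(range(n), 2)]
print(f"distance ratios in [{min(ratios):.2f}, {max(ratios):.2f}]")
```

With K much smaller than D, all ratios stay close to 1, so no two images collapse onto the same projection vector.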

SLIDE 77

Local and global invariance: whole-parts theorem

[Figure: hierarchy of HW modules, layers l = 1 to 4]

SLIDE 78

biophysics: prediction

SLIDE 79

...

Basic machine: a HW module

(dot products and histograms/moments for image seen through RF)

The cumulative histogram (empirical cdf) can be computed as

$$\mu^k_n(I) = \frac{1}{|G|} \sum_{i=1}^{|G|} \sigma(\langle I, g_i t^k \rangle + n\Delta)$$

This maps directly into a set of simple cells with threshold $n\Delta$, and a complex cell indexed by $n$ and $k$ summating the simple cells.

The nonlinearity can be rather arbitrary for invariance provided it is stationary in time
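The claim that invariance does not depend on the particular nonlinearity can be checked numerically. The sketch below assumes cyclic translations as the group $G$ and a single random template. It builds the simple cells $\sigma(\langle I, g_i t\rangle + n\Delta)$ and a complex cell that averages them, for several choices of $\sigma$, including $\sigma(x) = x^2$, which turns the complex cell into a (shifted) second moment rather than a histogram bin. The chosen nonlinearities and parameter values are hypothetical.

```python
import numpy as np

def complex_cell(I, t, sigma, n=1, delta=0.1):
    """Average of simple-cell outputs sigma(<I, g_i t> + n*delta) over all cyclic shifts g_i."""
    simple = np.array([sigma(np.dot(I, np.roll(t, i)) + n * delta) for i in range(len(I))])
    return simple.mean()

nonlinearities = {
    "step": lambda x: float(x >= 0),   # threshold simple cell: histogram bin
    "relu": lambda x: max(x, 0.0),
    "tanh": np.tanh,
    "square": lambda x: x ** 2,        # moment instead of histogram bin
}

rng = np.random.default_rng(3)
I, t = rng.normal(size=48), rng.normal(size=48)
for name, sigma in nonlinearities.items():
    same = np.isclose(complex_cell(I, t, sigma), complex_cell(np.roll(I, 5), t, sigma))
    print(name, same)  # expected: True for every sigma
```

Because the complex cell pools over the whole group, a translation of the input only reorders the simple-cell outputs; the pointwise nonlinearity applied before pooling does not affect invariance, only which statistic of the projections is computed.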

SLIDE 80

Dendrites of a complex cell as simple cells…

Active properties in the dendrites of the complex cell

SLIDE 81

Plan

i-theory
DCLNs
equivalence to DCLNs, theory notes
Some predictions of i-theory
Deep Face systems