Dense Associative Memories and Deep Learning Dmitry Krotov IBM - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Dense Associative Memories and Deep Learning

Dmitry Krotov IBM Research MIT-IBM Watson AI Lab Institute for Advanced Study

slide-2
SLIDE 2

Architectures Learning Mechanisms

slide-3
SLIDE 3

What is associative memory?

[Figure: energy landscape with memories $\xi_1$, $\xi_2$, $\xi_3$, $\xi_4$]

slide-4
SLIDE 4

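Standard Associative Memory

$$E = -\sum_{i,j=1}^{N} \sigma_i T_{ij} \sigma_j, \qquad T_{ij} = \sum_{\mu=1}^{K} \xi^\mu_i \xi^\mu_j$$

  • $\sigma_i$: dynamical variables
  • $\xi^\mu_i$: memorized patterns

N: number of neurons; K: number of memories

Equivalently,

$$E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \Big)^2, \qquad K^{\max} \approx 0.14 N$$

Dense Associative Memory

$$E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \Big)^n, \qquad K^{\max} \approx \alpha_n N^{n-1}, \quad n \ge 2$$

n: power of the interaction vertex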
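The two ways of writing the standard (n = 2) energy on this slide can be checked numerically. The following is a minimal sketch, assuming binary ±1 patterns and states; the function names `dam_energy` and `hopfield_energy` are illustrative and do not come from the slides.

```python
import numpy as np

def dam_energy(sigma, xi, n=2):
    """E = -sum_mu (sum_i xi[mu, i] * sigma[i]) ** n."""
    overlaps = (xi @ sigma).astype(float)   # one overlap per memory, shape (K,)
    return -float(np.sum(overlaps ** n))

def hopfield_energy(sigma, xi):
    """E = -sum_ij sigma_i T_ij sigma_j with T_ij = sum_mu xi[mu, i] * xi[mu, j]."""
    T = xi.T @ xi
    return -float(sigma @ T @ sigma)

rng = np.random.default_rng(0)
N, K = 16, 3                                 # neurons, memories (toy sizes)
xi = rng.choice([-1, 1], size=(K, N))        # memorized patterns
sigma = rng.choice([-1, 1], size=N)          # an arbitrary network state

# For n = 2 the dense form reduces to the pairwise form.
print(np.isclose(dam_energy(sigma, xi, n=2), hopfield_energy(sigma, xi)))
```

Raising `n` leaves the code unchanged apart from the exponent, which is the only difference between the two models at the level of the energy function.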

slide-5
SLIDE 5

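$$\sigma_i^{(t+1)} = \mathrm{Sign}\Big[ \sum_{\mu=1}^{K} \Big( F\Big( \xi^\mu_i + \sum_{j \neq i} \xi^\mu_j \sigma_j^{(t)} \Big) - F\Big( -\xi^\mu_i + \sum_{j \neq i} \xi^\mu_j \sigma_j^{(t)} \Big) \Big) \Big]$$

$$\langle \xi^\mu_i \rangle = 0, \qquad \langle \xi^\mu_i \xi^\nu_j \rangle = \delta^{\mu\nu} \delta_{ij}$$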
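The update rule on this slide compares the energy with neuron i set to +1 against the energy with it set to −1. A minimal sketch of one asynchronous sweep follows, assuming F(x) = x^n, random ±1 memories, and a tie-break of Sign(0) = +1; these choices and the name `update_sweep` are assumptions made for illustration.

```python
import numpy as np

def update_sweep(sigma, xi, n):
    """One asynchronous sweep of the sign update over all neurons."""
    sigma = sigma.astype(float).copy()
    xi = xi.astype(float)
    for i in range(sigma.size):
        # rest[mu] = sum over j != i of xi[mu, j] * sigma[j]
        rest = xi @ sigma - xi[:, i] * sigma[i]
        # F(neuron i "on") - F(neuron i "off"), summed over memories, with F(x) = x ** n
        field = np.sum((xi[:, i] + rest) ** n - (-xi[:, i] + rest) ** n)
        sigma[i] = 1.0 if field >= 0 else -1.0   # Sign(0) taken as +1 here
    return sigma

rng = np.random.default_rng(0)
N, K, n = 64, 5, 3
xi = rng.choice([-1.0, 1.0], size=(K, N))   # random binary memories
noisy = xi[0].copy()
noisy[:8] *= -1                             # corrupt 8 of the 64 neurons
recovered = update_sweep(noisy, xi, n)
print(np.array_equal(recovered, xi[0]))
```

Updating neurons one at a time, each using the latest state of the others, is one common convention; a synchronous variant would compute all fields from the same state before flipping.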

slide-6
SLIDE 6

Pattern recognition with DAM

[Figure: a 28 × 28 image gives 784 visible neurons $v_i$; 10 classification neurons, $x_\alpha$ or $c_\alpha$]

slide-7
SLIDE 7

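$$\sigma_i^{(t+1)} = \mathrm{Sign}\Big[ \sum_{\mu=1}^{K} \Big( F\Big( \xi^\mu_i + \sum_{j \neq i} \xi^\mu_j \sigma_j^{(t)} \Big) - F\Big( -\xi^\mu_i + \sum_{j \neq i} \xi^\mu_j \sigma_j^{(t)} \Big) \Big) \Big]$$

$$c_\alpha = g\Big[ \beta \sum_{\mu=1}^{K} \Big( F\Big( -\xi^\mu_\alpha x_\alpha + \sum_{\gamma \neq \alpha} \xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N} \xi^\mu_i v_i \Big) - F\Big( \xi^\mu_\alpha x_\alpha + \sum_{\gamma \neq \alpha} \xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N} \xi^\mu_i v_i \Big) \Big) \Big]$$

  • ...output $c_\alpha$. The update...

$$g(x) = \tanh(x)$$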
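The classification output can be evaluated directly from the formula for c_α. The sketch below makes several assumptions that are not stated on the slide: the classification inputs start at x_α = −1 for every class, F is a rectified polynomial, and the three toy memories are hand-built orthogonal patterns rather than trained ones. The names `F` and `classification_output` are illustrative.

```python
import numpy as np

def F(x, n):
    """Assumed energy function: rectified polynomial max(x, 0) ** n."""
    return np.maximum(x, 0.0) ** n

def classification_output(v, xi_v, xi_c, n=3, beta=0.01):
    """c_alpha = tanh(beta * sum_mu [F(-xi_a x_a + rest) - F(xi_a x_a + rest)])."""
    n_classes = xi_c.shape[1]
    x = -np.ones(n_classes)          # assumed initial state of classification neurons
    visible = xi_v @ v               # sum_i xi[mu, i] * v[i], shape (K,)
    total = xi_c @ x                 # sum_gamma xi[mu, gamma] * x[gamma], shape (K,)
    c = np.empty(n_classes)
    for a in range(n_classes):
        others = total - xi_c[:, a] * x[a]          # sum over gamma != alpha
        first = -xi_c[:, a] * x[a] + others + visible
        second = xi_c[:, a] * x[a] + others + visible
        c[a] = np.tanh(beta * np.sum(F(first, n) - F(second, n)))
    return c

# Toy memories: three mutually orthogonal visible patterns, one per class.
xi_v = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
                 [1, -1, 1, -1, 1, -1, 1, -1],
                 [1, 1, -1, -1, 1, 1, -1, -1]], dtype=float)
xi_c = 2.0 * np.eye(3) - 1.0         # +1 for the memory's own class, -1 otherwise

c = classification_output(xi_v[1], xi_v, xi_c)
print(int(np.argmax(c)))
```

Presenting the second pattern as input drives the matching class output toward +1 and the other two toward −1, since only the matching memory has a large overlap with the visible neurons.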

slide-8
SLIDE 8

MNIST Dataset

$$\xi^\mu_i \in \mathcal{N}(0, 0.1)$$

[Figure: random memories become constructed memory vectors through training]

slide-9
SLIDE 9

Main question: What kind of representation of the data has the neural network learned?

slide-10
SLIDE 10

Features vs. prototypes in psychology and neuroscience

Feature-matching theory: Hubel, Wiesel, 1959
[Figure labels: electrical signal from brain; visual area of brain; recording electrode; stimulus]

Prototype theory: Solso, McCarthy, 1981; Wallis et al., Journal of Vision, 2008
[Figure label: training set]

slide-11
SLIDE 11

Feature to prototype transition

[Figure: panels for n = 2, 3, 20, 30 (power of the interaction vertex), running from feature detectors to prototype detectors; value scale from −1 to 1]

slide-12
SLIDE 12

Feature to prototype transition

[Figure: panels for n = 2, 3, 20, 30 (power of the interaction vertex); value scale from −1 to 1]

Percentages shown on the panels: 1.80%, 1.61%, 1.44%, 1.51%

Reference: 1.6% (Simard, Steinkraus, Platt, 2003)

slide-13
SLIDE 13

Duality with feed-forward nets

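Duality rule: $f(x) = F'(x)$, where $F$ is the energy function and $f$ is the activation function

Feed-forward network (visible neurons $v_i$, outputs $c_\alpha$):

$$c_\alpha = g\Big( \sum_{\mu=1}^{K} \xi^\mu_\alpha h_\mu \Big), \qquad h_\mu = f\Big( \sum_{i=1}^{N} \xi^\mu_i v_i \Big)$$

Dense Associative Memory (neurons $v_i$ and $c_\alpha$):

$$E = -\sum_{\mu=1}^{K} F\Big( \sum_{i=1}^{N} \xi^\mu_i v_i + \sum_{\alpha=1}^{10} \xi^\mu_\alpha c_\alpha \Big)$$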
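One way to see where the duality rule comes from is a first-order Taylor expansion of the classification update. The sketch below assumes that the classification inputs are initialised to $x_\alpha = -1$ and that the classification neurons contribute much less to the argument of $F$ than the visible neurons; neither assumption is stated on the slide.

```latex
% With x_\alpha = -1, the update reads
c_\alpha = g\Big[ \beta \sum_{\mu=1}^{K}
  \big( F(A_\mu + \xi^\mu_\alpha) - F(A_\mu - \xi^\mu_\alpha) \big) \Big],
\qquad
A_\mu = \sum_{\gamma \neq \alpha} \xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N} \xi^\mu_i v_i .

% For |\xi^\mu_\alpha| \ll |A_\mu|, expand to first order:
F(A_\mu + \xi^\mu_\alpha) - F(A_\mu - \xi^\mu_\alpha)
  \approx 2\, \xi^\mu_\alpha\, F'(A_\mu).

% Neglecting the classification term in A_\mu and writing f = F':
c_\alpha \approx g\Big[ 2\beta \sum_{\mu=1}^{K} \xi^\mu_\alpha\,
  f\Big( \sum_{i=1}^{N} \xi^\mu_i v_i \Big) \Big].
```

Up to the constant factor $2\beta$ inside $g$, this is the two-layer network with hidden units $h_\mu$, which is why the derivative of the energy function plays the role of the activation function.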

slide-14
SLIDE 14

Commonly used activation functions

[Plots of $f(x)$ against $x$]

$f(x) = \mathrm{ReLU}$ ($n = 2$, standard Hopfield net)
$f(x) = \mathrm{ReP}_{n-1}$ (general $n$, DAM)
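The correspondence between the power n and the activation function can be checked with a finite difference. This is a minimal sketch, assuming the energy function is a rectified polynomial normalised as F(x) = max(x, 0)^n / n so that its derivative is exactly ReP_{n−1}; the 1/n factor and the names `energy_F` and `rep` are assumptions for illustration.

```python
import numpy as np

def energy_F(x, n):
    """Assumed rectified polynomial energy, scaled so that F'(x) = ReP_{n-1}(x)."""
    return np.maximum(x, 0.0) ** n / n

def rep(x, k):
    """Rectified polynomial ReP_k(x) = max(x, 0) ** k; ReP_1 is the ReLU."""
    return np.maximum(x, 0.0) ** k

x = np.array([-2.0, -0.5, 0.5, 1.5, 3.0])   # sample points away from the kink at 0
h = 1e-5
for n in (2, 3, 20):
    # Central difference approximation of F'(x)
    numeric = (energy_F(x + h, n) - energy_F(x - h, n)) / (2 * h)
    print(n, np.allclose(numeric, rep(x, n - 1), rtol=1e-6))
```

For n = 2 the activation is the ordinary ReLU, and larger n gives progressively sharper rectified polynomials.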

slide-15
SLIDE 15

Question: Are there any tasks for which models with higher-order interactions perform better than models with quadratic interactions?

slide-16
SLIDE 16

Adversarial Inputs (n = 2)

[Figure labels: 2, 3]

$$v_i \to v_i - \frac{\partial C}{\partial v_i}$$
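The update v_i → v_i − ∂C/∂v_i is gradient descent on the input pixels rather than on the weights. The sketch below shows the mechanics only: the quadratic cost is a stand-in, not the classifier's objective, and the step size `eps`, the clipping to [−1, 1], and the finite-difference gradient are all assumptions added for illustration.

```python
import numpy as np

def numerical_gradient(cost, v, h=1e-5):
    """Central-difference estimate of dC/dv_i for each input component."""
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (cost(v + step) - cost(v - step)) / (2 * h)
    return grad

def adversarial_step(v, cost, eps=0.25):
    """v_i -> v_i - eps * dC/dv_i, with pixels kept in [-1, 1]."""
    return np.clip(v - eps * numerical_gradient(cost, v), -1.0, 1.0)

# Stand-in cost: squared distance to a hypothetical target input.
target = np.array([-1.0, -1.0, 1.0, 1.0])

def cost(v):
    return 0.5 * float(np.sum((v - target) ** 2))

v = np.array([1.0, -1.0, 1.0, -1.0])
v_new = adversarial_step(v, cost)
print(cost(v), cost(v_new))
```

With a real classifier, `cost` would measure how far the outputs are from a chosen wrong label, and the step would be repeated until the image crosses the decision boundary.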

slide-17
SLIDE 17

Adversarial Deformations in DAM

[Figure: $\log(C_\alpha)$ against the number of image updates for n = 2, 3, 20, 30, with curves $C_{\mathrm{1st}}$ and $C_{\mathrm{2nd}}$, the decision boundary marked, and digit images labeled along the trajectories]

slide-18
SLIDE 18

Question: Can we use Dense Associative Memories for classification of high-resolution images?

slide-19
SLIDE 19
  • VGG16 coupled to DAM
slide-20
SLIDE 20
  • Adversarial Inputs in the Image Domain
slide-21
SLIDE 21

Input transfer

  • made with n=2: classified by n=2, classified by n=8
  • made with n=8: classified by n=2, classified by n=8

slide-22
SLIDE 22
slide-23
SLIDE 23

Error rate of misclassification (axes: Generate, Classify)

         n=2     n=8
n=2      100%    32%
n=8      57%     100%

slide-24
SLIDE 24

Axes: generate, test

         n=2      n=3      n=20     n=30
n=2      98.9%    50.7%    9.07%    3.44%
n=3      33.9%    99%      8.71%    3.32%
n=20     45.3%    63.7%    98.9%    5.77%
n=30     37.6%    48.3%    56.9%    98.8%

slide-25
SLIDE 25

Results on ImageNet

Accuracy: 69%

slide-26
SLIDE 26

ImageNet errors

police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria
bell cote, bell cot

slide-27
SLIDE 27

$$E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \Big)^n$$

Dense Associative Memories

  • Large Capacity (Physics)
  • Feature to Prototype Transition (Psychology, Neuroscience)
  • No Adversarial Problems (Computer Science)