Dense Associative Memories and Deep Learning
Dmitry Krotov IBM Research MIT-IBM Watson AI Lab Institute for Advanced Study
Architectures; Learning Mechanisms
What is associative memory?
[Figure: energy landscape with memories $\xi^1$, $\xi^2$, $\xi^3$, $\xi^4$.]
$$E = -\sum_{i,j=1}^{N} \sigma_i T_{ij} \sigma_j, \qquad T_{ij} = \sum_{\mu=1}^{K} \xi^\mu_i \xi^\mu_j$$
[Figure labels: $\sigma_i$, $\xi^\mu_i$.]
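A minimal NumPy sketch of the pairwise energy above, assuming binary states $\sigma_i = \pm 1$ and random $\pm 1$ memories. The sizes K = 3 and N = 50 and the function names are illustrative and do not come from the slides. The sketch also checks numerically that the pairwise form equals minus the sum of squared overlaps between the state and each memory.

```python
import numpy as np

def hebbian_weights(memories):
    """T_ij = sum_mu xi^mu_i xi^mu_j, for memories of shape (K, N)."""
    return memories.T @ memories

def hopfield_energy(sigma, T):
    """E = -sum_{i,j} sigma_i T_ij sigma_j."""
    return -int(sigma @ T @ sigma)

rng = np.random.default_rng(0)
K, N = 3, 50                                   # illustrative sizes (assumption)
memories = rng.choice([-1, 1], size=(K, N))    # random +/-1 memories (assumption)
T = hebbian_weights(memories)

sigma = memories[0]                            # evaluate the energy at a stored memory
pairwise = hopfield_energy(sigma, T)
overlaps = memories @ sigma                    # sum_i xi^mu_i sigma_i for each mu
squared_overlap = -int(np.sum(overlaps ** 2))
print(pairwise, squared_overlap)               # the two forms agree
```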
$N$: number of neurons; $K$: number of memories
$$E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \Big)^n$$
$n$: power of the interaction vertex
$$E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \Big)^2, \qquad K^{\max} \approx 0.14N$$
$$K^{\max} \approx \alpha_n N^{n-1}, \qquad n \geq 2$$
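A sketch of the dense energy for a general power $n$, assuming $\pm 1$ states and memories and the plain power function as written above. The names `dam_energy` and `own_share` and the sizes K = 20, N = 100 are illustrative. When the state sits on a stored memory, `own_share` measures what fraction of the energy comes from that memory's own term; for even $n$ this fraction cannot decrease as $n$ grows, which gives some intuition for the larger capacity.

```python
import numpy as np

def dam_energy(sigma, memories, n):
    """E = -sum_mu (sum_i xi^mu_i sigma_i)^n, for memories of shape (K, N)."""
    overlaps = (memories @ sigma).astype(float)
    return -np.sum(overlaps ** n)

rng = np.random.default_rng(1)
K, N = 20, 100                                 # illustrative sizes (assumption)
memories = rng.choice([-1, 1], size=(K, N))    # random +/-1 memories (assumption)
sigma = memories[0]                            # state placed on a stored memory

def own_share(n):
    """Fraction of the energy contributed by the matching memory (even n only)."""
    return -float(N) ** n / dam_energy(sigma, memories, n)

share_2 = own_share(2)
share_4 = own_share(4)
print(share_2, share_4)
```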
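$$\sigma_i^{(t+1)} = \mathrm{Sign}\Bigg[ \sum_{\mu=1}^{K} \bigg( F\Big( \xi^\mu_i + \sum_{j \neq i} \xi^\mu_j \sigma_j^{(t)} \Big) - F\Big( -\xi^\mu_i + \sum_{j \neq i} \xi^\mu_j \sigma_j^{(t)} \Big) \bigg) \Bigg]$$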
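The update rule can be sketched as follows: neuron $i$ is set to whichever sign gives the lower energy with all other neurons held fixed. Several details here are assumptions that the slides do not specify: $F$ is taken to be the rectified polynomial $\max(x, 0)^n$, ties in the sign go to $+1$, neurons are updated one at a time in index order, and the sizes and function names are illustrative.

```python
import numpy as np

def F(x, n):
    """Rectified polynomial max(x, 0)^n (an assumed choice of F)."""
    return np.maximum(x, 0) ** n

def update_neuron(sigma, memories, i, n):
    """Return the new value of sigma_i given all other neurons."""
    rest = memories @ sigma - memories[:, i] * sigma[i]   # sum over j != i
    gap = np.sum(F(memories[:, i] + rest, n) - F(-memories[:, i] + rest, n))
    return 1 if gap >= 0 else -1                          # Sign, ties to +1 (assumption)

def recall(sigma, memories, n, sweeps=3):
    """Apply the update to every neuron in turn, for a few sweeps."""
    sigma = sigma.copy()
    for _ in range(sweeps):
        for i in range(sigma.size):
            sigma[i] = update_neuron(sigma, memories, i, n)
    return sigma

rng = np.random.default_rng(0)
K, N, n = 5, 100, 3                            # illustrative sizes (assumption)
memories = rng.choice([-1, 1], size=(K, N))
probe = memories[0].copy()
probe[:10] *= -1                               # corrupt 10 of the 100 neurons
restored = recall(probe, memories, n)
print(np.array_equal(restored, memories[0]))
```

With only a few stored memories, the term belonging to the closest memory dominates the sum, so the corrupted neurons are pulled back to the stored pattern.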
$$\langle \xi^\mu_i \rangle = 0, \qquad \langle \xi^\mu_i \xi^\nu_j \rangle = \delta^{\mu\nu} \delta_{ij}$$
[Figure: a 28 × 28 image supplies 784 visible neurons $v_i$; 10 classification neurons $x_\alpha$ or $c_\alpha$.]
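$$c_\alpha = g\Bigg[ \beta \sum_{\mu=1}^{K} \bigg( F\Big( -\xi^\mu_\alpha x_\alpha + \sum_{\gamma \neq \alpha} \xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N} \xi^\mu_i v_i \Big) - F\Big( \xi^\mu_\alpha x_\alpha + \sum_{\gamma \neq \alpha} \xi^\mu_\gamma x_\gamma + \sum_{i=1}^{N} \xi^\mu_i v_i \Big) \bigg) \Bigg]$$
$$g(x) = \tanh(x)$$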
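A sketch of how the classification outputs $c_\alpha$ could be computed from the visible neurons $v_i$. The following are assumptions not stated on the slides: every classification neuron starts at $x_\alpha = -1$, $F$ is the rectified polynomial $\max(x, 0)^n$, and the two hand-built memories, the value of $\beta$, and the function names are a toy illustration only.

```python
import numpy as np

def F(x, n):
    """Rectified polynomial max(x, 0)^n (an assumed choice of F)."""
    return np.maximum(x, 0.0) ** n

def classify(v, xi_visible, xi_class, n, beta):
    """v: (N,), xi_visible: (K, N), xi_class: (K, C). Returns c of shape (C,)."""
    x = -np.ones(xi_class.shape[1])              # assumed initial state x_alpha = -1
    own = xi_class * x                           # xi^mu_alpha x_alpha, shape (K, C)
    total = xi_visible @ v + own.sum(axis=1)     # all class and visible terms, shape (K,)
    rest = total[:, None] - own                  # drop the gamma = alpha term
    gap = F(-own + rest, n) - F(own + rest, n)   # difference of the two F terms
    return np.tanh(beta * gap.sum(axis=0))       # g = tanh, summed over memories

# Toy memories (assumption): two orthogonal visible patterns with one-hot style labels.
xi_visible = np.array([[1, 1, -1, -1], [1, -1, 1, -1]], dtype=float)
xi_class = np.array([[1, -1], [-1, 1]], dtype=float)

c_first = classify(xi_visible[0], xi_visible, xi_class, n=3, beta=0.01)
c_second = classify(xi_visible[1], xi_visible, xi_class, n=3, beta=0.01)
print(np.argmax(c_first), np.argmax(c_second))
```

With $x_\alpha = -1$ the expression has the same structure as the sign update for $\sigma_i$, with $\tanh$ in place of Sign so that the output is graded.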
$$\xi^\mu_i \sim \mathcal{N}(0, 0.1)$$
[Figure labels: random memories; training; constructed memory vectors.]
Solso, McCarthy, 1981; Wallis et al., Journal of Vision, 2008
Feature-matching theory; Prototype theory
Hubel, Wiesel, 1959
[Figure labels: electrical signal from brain; visual area; recording electrode; stimulus.]
[Figure: training set and learned memories for n = 2, n = 3, n = 20, n = 30, on a color scale from −1 to 1. Axis: power of the interaction vertex, running from feature detectors to prototype detectors.]
[Figure: error rates 1.80%, 1.61%, 1.44%, 1.51%, plotted against the power of the interaction vertex. Simard, Steinkraus, Platt, 2003: 1.6%.]
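$v_i \to h_\mu \to c_\alpha$
$$h_\mu = f\Big( \sum_{i=1}^{N} \xi^\mu_i v_i \Big), \qquad c_\alpha = g\Big( \sum_{\mu=1}^{K} \xi^\mu_\alpha h_\mu \Big)$$
[Figure labels: $v_i$, $c_\alpha$, $x_\alpha$.]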
$$E = -\sum_{\mu=1}^{K} F\Big( \sum_{i=1}^{N} \xi^\mu_i v_i + \sum_{\alpha=1}^{10} \xi^\mu_\alpha c_\alpha \Big)$$
Standard Hopfield net, $n = 2$: $f(x) = \mathrm{ReLU}$
DAM, general $n$: $f(x) = \mathrm{ReP}_{n-1}$
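The feed-forward form with these activation choices can be sketched as a network with one hidden layer. This sketch assumes that $\mathrm{ReP}_{k}(x)$ means the rectified polynomial $\max(x, 0)^k$, so that $k = 1$ is ReLU, and that $g = \tanh$. The toy memories and function names are illustrative and not taken from the slides.

```python
import numpy as np

def rep(x, power):
    """Rectified polynomial max(x, 0)^power; power = 1 is ReLU (assumed definition)."""
    return np.maximum(x, 0.0) ** power

def forward(v, xi_visible, xi_class, n):
    """v: (N,), xi_visible: (K, N), xi_class: (K, C). Returns c of shape (C,)."""
    h = rep(xi_visible @ v, n - 1)        # h_mu = f(sum_i xi^mu_i v_i), f = ReP_{n-1}
    return np.tanh(xi_class.T @ h)        # c_alpha = g(sum_mu xi^mu_alpha h_mu)

# Toy memories (assumption): two orthogonal visible patterns with one-hot style labels.
xi_visible = np.array([[1, 1, -1, -1], [1, -1, 1, -1]], dtype=float)
xi_class = np.array([[1, -1], [-1, 1]], dtype=float)
v = xi_visible[0]

c_relu = forward(v, xi_visible, xi_class, n=2)   # standard Hopfield case, f = ReLU
c_rep = forward(v, xi_visible, xi_class, n=3)    # dense case, f = ReP_2
print(c_relu, c_rep)
```

In this reading each memory $\xi^\mu$ acts as one hidden unit, and the power $n$ only changes the hidden activation function.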
Image update (figure labels: 2, 3): $v_i \to v_i - \dfrac{\partial C}{\partial v_i}$
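[Figure: $\log(C_\alpha)$ versus number of image updates (axis from 10 to 80), with curves $C_{\text{1st}}$ and $C_{\text{2nd}}$ and the decision boundary marked.]
[Figure: images for n = 2, n = 3, n = 20, n = 30 with digit labels 3, 8, 9, 5, 8, 8, 8, 3, 3, 3, 8, 3.]
[Figure: images made with n = 8, classified by n = 2 and classified by n = 8.]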
         n=2     n=8
n=2     100%     32%
n=8      57%    100%
         n=2      n=3      n=20     n=30
n=2     98.9%    50.7%    9.07%    3.44%
n=3     33.9%    99%      8.71%    3.32%
n=20    45.3%    63.7%    98.9%    5.77%
n=30    37.6%    48.3%    56.9%    98.8%
Accuracy: 69%
[Figure labels: "police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria"; "bell cote, bell cot".]
$$E = -\sum_{\mu=1}^{K} \Big( \sum_{i=1}^{N} \xi^\mu_i \sigma_i \Big)^n$$
Dense Associative Memories: Large Capacity; Feature to Prototype Transition; No Adversarial Problems
Physics; Computer Science; Psychology; Neuroscience