Energy Based Models
Volodymyr Kuleshov
Cornell Tech
Lecture 11
Announcements: Assignment 2 is due at midnight today! If submitting late, please mark it as such.
1 Energy-Based Models
2 Representation
3 Learning
A probability distribution p(x) must satisfy two requirements:
1. non-negative: p(x) ≥ 0
2. sum-to-one: ∑x p(x) = 1 (or ∫ p(x) dx = 1 for continuous x)
Any non-negative function g(x) can be normalized into a probability distribution by dividing by its total volume ∫ g(x) dx (a quick numerical check is sketched after the examples):
1. g(x) = exp(−x²/(2σ²)). Volume is: ∫ exp(−x²/(2σ²)) dx = √(2πσ²). → Gaussian
2. g(x) = exp(−λx) for x ≥ 0. Volume is: ∫₀^∞ exp(−λx) dx = 1/λ. → Exponential
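To make the volume idea concrete, here is a minimal numerical check (my own sketch, not from the slides; the value σ = 2 is arbitrary):

```python
import numpy as np
from scipy import integrate

sigma = 2.0

def g(x):
    # Unnormalized, non-negative function from example 1 above.
    return np.exp(-x**2 / (2 * sigma**2))

# "Volume" of g = its integral over the real line.
volume, _ = integrate.quad(g, -np.inf, np.inf)
print(volume, np.sqrt(2 * np.pi * sigma**2))   # both are approximately 5.013

# Dividing by the volume gives a valid density that integrates to 1.
p = lambda x: g(x) / volume
print(integrate.quad(p, -np.inf, np.inf)[0])   # approximately 1.0
```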
Pros:
1. extreme flexibility: can use pretty much any function fθ(x) you want (a minimal sketch follows this list)

Cons:
1. Sampling from pθ(x) is hard
2. Evaluating and optimizing likelihood pθ(x) is hard (learning is hard)
3. No feature learning (but can add latent variables)
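To illustrate the flexibility point, a minimal sketch (not from the lecture) of an energy-based model pθ(x) ∝ exp(fθ(x)) where fθ is a small, arbitrarily chosen neural network; all sizes and names here are illustrative. Note that only the unnormalized log-probability fθ(x) can be evaluated; the normalization constant Z(θ) = ∫ exp(fθ(x)) dx has no closed form, which is exactly why likelihoods and samples are hard to obtain.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny MLP f_theta: R^d -> R; any such function defines an EBM
# p_theta(x) = exp(f_theta(x)) / Z(theta).
d, h = 2, 16
theta = {
    "W1": rng.normal(scale=0.1, size=(h, d)),
    "b1": np.zeros(h),
    "w2": rng.normal(scale=0.1, size=h),
    "b2": 0.0,
}

def f(x, theta):
    # Unnormalized log-probability (negative energy) of a single point x.
    hidden = np.tanh(theta["W1"] @ x + theta["b1"])
    return theta["w2"] @ hidden + theta["b2"]

x = rng.normal(size=d)
print("f_theta(x) =", f(x, theta))
# Z(theta) has no closed form here, so p_theta(x) itself cannot be evaluated;
# only ratios p_theta(x) / p_theta(x') = exp(f_theta(x) - f_theta(x')) can.
```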
Example: the Gaussian N(µ, σ²) already has this form, with fθ(x) linear in (x, x²) and coefficients (µ/σ², −1/(2σ²)).
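Assuming the example above is the Gaussian written in energy-based form, a one-line check (my derivation, not from the slides):

    exp((µ/σ²)x − (1/(2σ²))x²) = exp(−(x² − 2µx)/(2σ²)) ∝ exp(−(x − µ)²/(2σ²)),

so dividing by the volume Z = exp(µ²/(2σ²)) √(2πσ²) recovers exactly N(µ, σ²).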
1 Energy-Based Models
2 Representation
3 Learning
(Figure: examples of paired variables X and Y, where Y is a "class" label such as "cat" for an input X.)
Xi: noisy pixels, Yi: "true" pixels. (Figure: a grid-structured Markov Random Field over Y1, …, Y9 and X1, …, X9, with each observed noisy pixel Xi attached to its latent pixel Yi and neighboring latent pixels Yi coupled to each other.)
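A minimal sketch of a denoising energy of this kind, assuming the common Ising-style form (the function name, the weights alpha and beta, and the 3×3 size are my own choices, not necessarily the slide's): unary terms tie each latent pixel Yi to its noisy observation Xi, pairwise terms tie neighboring latent pixels together, and p(y | x) ∝ exp(−E(y, x)).

```python
import numpy as np

def mrf_energy(y, x, alpha=1.0, beta=2.0):
    """Ising-style energy for binary image denoising.

    y, x: 2-D arrays with entries in {-1, +1}
          (y = latent "true" pixels, x = observed noisy pixels).
    alpha weights the data terms y_i * x_i; beta weights the smoothness
    terms y_i * y_j over horizontally / vertically adjacent pixels.
    Lower energy means higher probability: p(y | x) is proportional to exp(-E(y, x)).
    """
    unary = -alpha * np.sum(y * x)
    pairwise = -beta * (np.sum(y[:, :-1] * y[:, 1:])    # horizontal neighbors
                        + np.sum(y[:-1, :] * y[1:, :]))  # vertical neighbors
    return unary + pairwise

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(3, 3))    # a noisy 3x3 binary image, as in the figure
print(mrf_energy(x.copy(), x))           # y = x already has low unary energy
```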
(Figure: an energy-based model with observed units connected to a layer of hidden units 1, …, m.)
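Assuming this slide introduces the restricted Boltzmann machine (the standard energy-based model with one layer of hidden units), a minimal sketch of its joint unnormalized log-probability over binary visible units v and hidden units h follows; the sizes, weights W, and biases b, c are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                                # n visible units, m hidden units
W = rng.normal(scale=0.1, size=(n, m))     # visible-hidden couplings
b = np.zeros(n)                            # visible biases
c = np.zeros(m)                            # hidden biases

def rbm_log_prob_unnorm(v, h):
    # f(v, h) = b.v + c.h + v^T W h, so that p(v, h) is proportional to exp(f(v, h)).
    return b @ v + c @ h + v @ W @ h

v = rng.integers(0, 2, size=n)             # a binary visible configuration
h = rng.integers(0, 2, size=m)             # a binary hidden configuration
print(rbm_log_prob_unnorm(v, h))
# Summing exp(f(v, h)) over all h gives an (unnormalized) EBM over v alone,
# i.e. the hidden units act as latent variables / learned features.
```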
Deep Boltzmann machine. (Figure: visible units v connected to a stack of hidden layers h(1), h(2), h(3) through weight matrices W(1), W(2), W(3).)
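Following the figure, a minimal sketch of the corresponding deep Boltzmann machine energy, assuming the usual coupling of each layer only to its neighbors through W(1), W(2), W(3) (binary units, no bias terms, sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 6, 4, 2]                       # layer sizes for v, h(1), h(2), h(3)
W1, W2, W3 = (rng.normal(scale=0.1, size=(sizes[i], sizes[i + 1])) for i in range(3))

def dbm_log_prob_unnorm(v, h1, h2, h3):
    # f = v^T W(1) h(1) + h(1)^T W(2) h(2) + h(2)^T W(3) h(3);
    # p(v, h(1), h(2), h(3)) is proportional to exp(f),
    # and each layer interacts only with its neighboring layers.
    return v @ W1 @ h1 + h1 @ W2 @ h2 + h2 @ W3 @ h3

v, h1, h2, h3 = (rng.integers(0, 2, size=s) for s in sizes)
print(dbm_log_prob_unnorm(v, h1, h2, h3))
```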
1 Energy-Based Models
2 Representation
3 Learning
Pros:
1. can plug in pretty much any function fθ(x) you want

Cons:
1. Sampling is hard
2. Evaluating likelihood (learning) is hard
3. Feature learning is even harder
Strategies for the sampling step required during training (a sketch of the resulting update follows this list):
1. MCMC sampling from the model distribution pθ at each gradient step
2. (Persistent) contrastive divergence, a variant of MCMC sampling
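Both strategies target the same log-likelihood gradient of pθ(x) = exp(fθ(x)) / Z(θ), namely ∇θ log pθ(x_train) = ∇θ fθ(x_train) − E_{x∼pθ}[∇θ fθ(x)]. Below is a minimal sketch of one contrastive-divergence-style update; the model interface (grad_f, sample, theta) is hypothetical, not an API from the lecture.

```python
def cd_update(model, x_train, lr=1e-3, mcmc_steps=10):
    """One contrastive-divergence-style gradient step (sketch).

    model.grad_f(x): gradient of f_theta at x with respect to theta
    model.sample(init, steps): approximate sample from p_theta via MCMC
    (hypothetical interface; the exact procedure on the slides may differ).
    """
    # "Negative" sample drawn approximately from the current model;
    # persistent CD would instead continue a chain kept across updates.
    x_model = model.sample(init=x_train, steps=mcmc_steps)

    # grad log p_theta(x_train) is approximately
    # grad f_theta(x_train) - grad f_theta(x_model).
    grad = model.grad_f(x_train) - model.grad_f(x_model)

    # Gradient ascent on the log-likelihood.
    model.theta += lr * grad
    return model
```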
Detailed balance check, case P(x)Q(x′|x) / (P(x′)Q(x|x′)) > 1 and thus: the move x → x′ is accepted with probability P(x′)Q(x|x′) / (P(x)Q(x′|x)) while the reverse move x′ → x is always accepted, so both sides of the detailed balance condition P(x)Q(x′|x)A(x′|x) = P(x′)Q(x|x′)A(x|x′) equal P(x′)Q(x|x′), where A denotes the acceptance probability.
Metropolis–Hastings sampling: repeat the following (a runnable sketch follows this list):
1. Sample x′ ∼ Q(x′ | xt) from the proposal distribution
2. Set xt+1 = x′ with probability min(1, P(x′)Q(xt | x′) / (P(xt)Q(x′ | xt))); otherwise set xt+1 = xt
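A runnable sketch of this loop for an energy-based model (my own illustration). The important point is that the acceptance ratio only needs P(x′)/P(xt) = exp(f(x′) − f(xt)), so the intractable Z(θ) cancels; here a symmetric Gaussian random-walk proposal is assumed, so the Q terms cancel as well.

```python
import numpy as np

def metropolis_hastings(f, x0, n_steps=1000, step_size=0.5, rng=None):
    """Sample from p(x) proportional to exp(f(x)) with a Gaussian random-walk proposal.

    Only the ratio p(x') / p(x) = exp(f(x') - f(x)) is ever needed,
    so the partition function Z never has to be computed.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x_prop = x + step_size * rng.normal(size=x.shape)   # symmetric proposal Q
        accept_prob = min(1.0, np.exp(f(x_prop) - f(x)))    # Z and Q both cancel
        if rng.random() < accept_prob:
            x = x_prop                                       # accept: x_{t+1} = x'
        samples.append(x.copy())                             # reject: x_{t+1} = x_t
    return np.array(samples)

# Example: f(x) = -x^2 / 2, i.e. a standard Gaussian target.
samples = metropolis_hastings(lambda x: -0.5 * float(x @ x), x0=np.zeros(1), n_steps=5000)
print(samples.mean(), samples.std())   # roughly 0 and 1
```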
Z(θ): the partition function (normalizing constant) of pθ(x) = exp(fθ(x)) / Z(θ); it is what makes evaluating and optimizing the likelihood hard.
Pros:
1. Can plug in pretty much any function fθ(x) you want
2. Can be combined with other model families
3. Can be combined with ideas from graphical models

Cons:
1. Sampling is hard
2. Evaluating likelihood (learning) is hard
3. Feature learning is even harder