CS 4803 / 7643: Deep Learning Topics: Variational Auto-Encoders - - PowerPoint PPT Presentation



SLIDE 1

CS 4803 / 7643: Deep Learning

Dhruv Batra Georgia Tech

Topics:

– Variational Auto-Encoders (VAEs)
– Reparameterization trick

SLIDE 2

Administrivia

  • HW4 Grades Released

– Regrade requests close: 12/03, 11:55pm
– Please check solutions first!

  • Grade histogram: 7643

– Max possible: 100 (regular credit) + 40 (extra credit)

(C) Dhruv Batra 2

SLIDE 3

Administrivia

  • HW4 Grades Released

– Regrade requests close: 12/03, 11:55pm
– Please check solutions first!

  • Grade histogram: 4803

– Max possible: 100 (regular credit) + 40 (extra credit)

SLIDE 4

Recap from last time

SLIDE 5

Variational Autoencoders (VAE)

SLIDE 6

So far...

PixelCNNs define a tractable density function and directly optimize the likelihood of the training data. VAEs instead define an intractable density function with latent z, which cannot be optimized directly; we derive and optimize a lower bound on the likelihood instead.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 7

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 8

Autoencoders

[Figure: input data → encoder (4-layer conv) → features → decoder (4-layer upconv) → reconstructed input data]

L2 loss function between the input and its reconstruction. Train such that features can be used to reconstruct the original data. Doesn't use labels!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
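The pipeline above can be sketched in a few lines. This is a minimal illustrative stand-in, not the slides' model: single random linear layers replace the 4-layer conv/upconv networks, and no training step is shown — only the unsupervised L2 reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8 samples of 16-d input (stand-ins for images).
x = rng.normal(size=(8, 16))

# Linear encoder/decoder as placeholders for the conv/upconv nets.
W_enc = rng.normal(scale=0.1, size=(16, 4))   # 16-d input -> 4-d features
W_dec = rng.normal(scale=0.1, size=(4, 16))   # 4-d features -> 16-d reconstruction

z = x @ W_enc        # features
x_hat = z @ W_dec    # reconstructed input

# L2 reconstruction loss ||x - x_hat||^2, averaged over the batch.
# No labels appear anywhere -- training would be fully unsupervised.
loss = np.mean(np.sum((x - x_hat) ** 2, axis=1))
print(round(float(loss), 3))
```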

SLIDE 9

Autoencoders

[Figure: input data → encoder → features → decoder → reconstructed input data]

Autoencoders can reconstruct data and can learn features to initialize a supervised model. Features capture factors of variation in the training data. Can we generate new images from an autoencoder?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 10

Variational Autoencoders

Probabilistic spin on autoencoders - will let us sample from the model to generate data!

Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

Encoder: q_φ(z|x)    Decoder: p_θ(x|z)

SLIDE 11

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 12

Key problem

  • Computing the posterior P(z|x) is intractable

SLIDE 13

What is Variational Inference?

  • Key idea

– Reality is complex
– Can we approximate it with something “simple”?
– Just make sure the simple thing is “close” to the complex thing.

SLIDE 14

Intuition

SLIDE 15
  • Marginal likelihood – x is observed, z is missing:

The general learning problem with missing data


ll(θ : D) = log ∏_{i=1}^{N} P(x_i | θ) = ∑_{i=1}^{N} log P(x_i | θ) = ∑_{i=1}^{N} log ∑_z P(x_i, z | θ)

SLIDE 16

Jensen’s inequality

  • Use: log ∑_z P(z) g(z) ≥ ∑_z P(z) log g(z)

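This form of Jensen's inequality (log is concave, so log E[g] ≥ E[log g]) is easy to sanity-check numerically. A small sketch with a made-up discrete P and positive g (my own illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random distribution P(z) over 5 states and a positive function g(z).
P = rng.random(5)
P /= P.sum()
g = rng.random(5) + 0.1

lhs = np.log(np.sum(P * g))   # log E_P[g(z)]
rhs = np.sum(P * np.log(g))   # E_P[log g(z)]
print(lhs >= rhs)
```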

SLIDE 17

Applying Jensen’s inequality

  • Use: log ∑_z P(z) g(z) ≥ ∑_z P(z) log g(z)

SLIDE 18

Evidence Lower Bound

  • Define the potential function F(θ, Q):


ll(θ : D) ≥ F(θ, Q) = ∑_{i=1}^{N} ∑_z Q_i(z) log [ P(x_i, z | θ) / Q_i(z) ]
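The bound can be verified on a toy discrete model: for any Q it stays below the log-likelihood, and choosing Q equal to the true posterior P(z|x) makes it tight. A small numpy sketch with a made-up joint over 5 latent states (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy joint P(x, z | theta) for one observation x, latent z in {0,...,4}.
P_xz = rng.random(5) * 0.1 + 0.01
log_lik = np.log(P_xz.sum())          # log P(x) = log sum_z P(x, z)

def elbo(Q):
    # F(theta, Q) = sum_z Q(z) log [P(x, z) / Q(z)]
    return np.sum(Q * (np.log(P_xz) - np.log(Q)))

Q_uniform = np.full(5, 0.2)
Q_post = P_xz / P_xz.sum()            # the true posterior P(z | x)

print(elbo(Q_uniform) <= log_lik)         # bound holds for any Q
print(np.isclose(elbo(Q_post), log_lik))  # tight when Q = posterior
```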

SLIDE 19

ELBO: Factorization #1 (GMMs)


ll(θ : D) ≥ F(θ, Q) = ∑_{i=1}^{N} ∑_z Q_i(z) log [ P(x_i, z | θ) / Q_i(z) ]

SLIDE 20

ELBO: Factorization #2 (VAEs)


ll(θ : D) ≥ F(θ, Q) = ∑_{i=1}^{N} ∑_z Q_i(z) log [ P(x_i, z | θ) / Q_i(z) ]

SLIDE 21

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 22

Amortized Inference Neural Networks

SLIDE 23

VAEs


Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder

SLIDE 24

Variational Autoencoders

Probabilistic spin on autoencoders - will let us sample from the model to generate data!

Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

Encoder: q_φ(z|x)    Decoder: p_θ(x|z)

SLIDE 25

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 26

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 27

[Figure: input data → encoder network → sample z]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 28

[Figure: input data → encoder network → sample z → decoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 29

[Figure: input data → encoder network → sample z → decoder network → sample x|z]

Putting it all together: maximizing the likelihood lower bound

Make the approximate posterior distribution close to the prior. Maximize the likelihood of the original input being reconstructed.

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
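The two terms of the bound can be sketched numerically. The snippet below is an illustrative stand-in, not the slides' network: the "encoder outputs" and the linear "decoder" are made-up placeholders, L2 reconstruction stands in for the log-likelihood term, and the KL term uses the standard closed form for q(z|x) = N(μ, diag(σ²)) against the N(0, I) prior.

```python
import numpy as np

rng = np.random.default_rng(3)

x = rng.normal(size=16)                  # one input

# Hypothetical encoder outputs for q(z|x) = N(mu, diag(sigma^2)).
mu = rng.normal(scale=0.5, size=4)
log_var = rng.normal(scale=0.5, size=4)
sigma = np.exp(0.5 * log_var)

# Reparameterized sample z = mu + sigma * eps, eps ~ N(0, I).
eps = rng.normal(size=4)
z = mu + sigma * eps

# Hypothetical linear decoder producing x_hat from z.
W_dec = rng.normal(scale=0.1, size=(4, 16))
x_hat = z @ W_dec

# Negative lower bound = reconstruction term + KL(q(z|x) || N(0, I)).
recon = np.sum((x - x_hat) ** 2)
kl = 0.5 * np.sum(mu**2 + sigma**2 - log_var - 1.0)  # closed form for Gaussians
loss = recon + kl

print(kl >= 0.0)
```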

SLIDE 30

[Figure: sample z from prior → decoder network → sample x|z]

Use the decoder network and now sample z from the prior!

[Figure: data manifold for 2-d z; vary z1 / vary z2]

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

Variational Auto Encoders: Generating Data

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 31

[Figure: vary z1 (degree of smile) and z2 (head pose)]

Diagonal prior on z => independent latent variables. Different dimensions of z encode interpretable factors of variation.

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

Variational Auto Encoders: Generating Data

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 32

Plan for Today

  • VAEs

– Reparameterization trick

SLIDE 33

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 34

[Figure: input data → encoder network → sample z → decoder network → sample x|z]

Putting it all together: maximizing the likelihood lower bound

Make the approximate posterior distribution close to the prior. Maximize the likelihood of the original input being reconstructed.

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 35

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 36

Basic Problem


E_{z∼p_θ(z)}[f(z)]

SLIDE 37

Basic Problem

  • Goal


min_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 38

Basic Problem

  • Goal
  • Need to compute:


min_θ E_{z∼p_θ(z)}[f(z)]

∇_θ E_{z∼p_θ(z)}[f(z)]
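The difficulty is that θ parameterizes the sampling distribution itself, so the gradient cannot simply be moved inside the expectation (a short expansion, added here for clarity):

```latex
\nabla_\theta\, \mathbb{E}_{z\sim p_\theta(z)}[f(z)]
  = \nabla_\theta \int p_\theta(z)\, f(z)\, dz
  = \int \nabla_\theta p_\theta(z)\, f(z)\, dz
  \;\neq\; \mathbb{E}_{z\sim p_\theta(z)}\!\left[\nabla_\theta f(z)\right].
```

Since ∇_θ p_θ(z) is not itself a probability density, the last integral is no longer an expectation we can estimate by sampling from p_θ; the two options that follow are two ways of turning it back into one.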

SLIDE 39

Basic Problem

  • Need to compute:


∇_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 40

Example

SLIDE 41

Does this happen in supervised learning?

  • Goal


min_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 42

But what about other kinds of learning?

  • Goal


min_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 43

Two Options

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)

  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 44

Option 1

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)

SLIDE 45

Recall: Policy Gradients

∇_θ J(θ) = ∇_θ E_{τ∼p_θ(τ)}[R(τ)]
         = ∇_θ ∫ π_θ(τ) R(τ) dτ                      (expand the expectation)
         = ∫ ∇_θ π_θ(τ) R(τ) dτ                      (exchange integration and differentiation)
         = ∫ π_θ(τ) · (∇_θ π_θ(τ) / π_θ(τ)) · R(τ) dτ
         = ∫ π_θ(τ) ∇_θ log π_θ(τ) R(τ) dτ
         = E_{τ∼p_θ(τ)}[∇_θ log π_θ(τ) R(τ)]

using the identity ∇_θ log π_θ(τ) = ∇_θ π_θ(τ) / π_θ(τ).
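The same log-derivative identity gives a sampling-based score-function (REINFORCE) estimator for gradients of expectations. As a numeric sanity check (my own toy example, not from the slides): for f(z) = z² with z ∼ N(μ, 1), we have E[z²] = μ² + 1, so the true gradient w.r.t. μ is 2μ, and averaging score × f over samples recovers it.

```python
import numpy as np

rng = np.random.default_rng(4)

# Gradient of E_{z ~ N(mu, 1)}[z^2] w.r.t. mu; analytically it equals 2*mu.
mu = 1.0
z = rng.normal(loc=mu, scale=1.0, size=200_000)

f = z ** 2
score = z - mu                   # d/dmu log N(z; mu, 1)
grad_est = np.mean(score * f)    # score-function (REINFORCE) estimate

print(abs(grad_est - 2 * mu) < 0.1)
```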
SLIDE 46

Example

SLIDE 47

Mental Break!

  • VAE Demo

– https://www.siarez.com/projects/variational-autoencoder

SLIDE 48

Two Options

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 49

Option 2


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 50

Option 2


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 51

Reparameterization Intuition


z = μ + σ·ε_i,   ε_i ∼ p(ε)   (so that z ∼ N(μ, σ²))

Figure Credit: http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
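As a numeric sanity check of the trick (my own toy example, not from the slides): for f(z) = z² with z ∼ N(μ, 1), the true gradient ∇_μ E[z²] = 2μ, and writing z = μ + ε with ε ∼ N(0, 1) lets us differentiate straight through the sample, since d/dμ f(μ + ε) = f′(z) = 2z.

```python
import numpy as np

rng = np.random.default_rng(5)

# Gradient of E_{z ~ N(mu, 1)}[z^2] w.r.t. mu; analytically it equals 2*mu.
# Reparameterize: z = mu + eps, eps ~ N(0, 1), then differentiate the path.
mu = 1.0
eps = rng.normal(size=200_000)
z = mu + eps

grad_samples = 2 * z             # pathwise per-sample gradients d/dmu (z^2)
grad_est = np.mean(grad_samples)

print(abs(grad_est - 2 * mu) < 0.05)
```

In practice this pathwise estimator typically has much lower variance than the score-function estimator on the same problem, which is one reason VAEs use it.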

SLIDE 52

Reparameterization Intuition


Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder

SLIDE 53

Example

SLIDE 54

Two Options

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 55

Example


Figure Credit: http://gokererdogan.github.io/2016/07/01/reparameterization-trick/

SLIDE 56

Example


Figure Credit: http://gokererdogan.github.io/2016/07/01/reparameterization-trick/

SLIDE 57

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 58

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 59

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 60

[Figure: input data → encoder network → sample z]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 61

[Figure: input data → encoder network → sample z → decoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n