CS 4803 / 7643: Deep Learning
Dhruv Batra Georgia Tech
Topics:
– Variational Auto-Encoders (VAEs)
– Reparameterization trick
Administrivia:
– HW4 grades released
– Regrade requests close: 12/03, 11:55pm – Please check solutions first!
– Max possible: 100 (regular credit) + 40 (extra credit)
(C) Dhruv Batra 2
PixelCNNs define a tractable density function and directly optimize the likelihood of the training data:
$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$
VAEs instead define an intractable density function with latent z:
$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$
This cannot be optimized directly; instead, derive and optimize a lower bound on the likelihood.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
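The "tractable" half of this contrast can be made concrete: with the chain rule, every factor of the likelihood is directly computable. A toy sketch (plain numpy, with made-up conditional probabilities of my own choosing, not an actual PixelCNN):

```python
import numpy as np

# Tractable (PixelCNN-style) likelihood: chain rule over "pixels",
#   p(x) = prod_i p(x_i | x_1, ..., x_{i-1}),
# where every conditional is directly computable. Toy 3-pixel binary
# image with made-up conditionals (not a real PixelCNN).
def p_cond(xi, prev):
    # p(x_i = 1 | x_{<i}) rises with the fraction of previous pixels that are 1
    q = 0.2 + 0.3 * (sum(prev) / max(len(prev), 1))
    return q if xi == 1 else 1.0 - q

x = [1, 0, 1]
log_p, prev = 0.0, []
for xi in x:
    log_p += np.log(p_cond(xi, prev))   # accumulate log-likelihood factor by factor
    prev.append(xi)
```

Because every factor is a normalized conditional, the log-likelihood of any configuration can be evaluated (and maximized) exactly; no marginalization over hidden variables is needed.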
VAEs are a combination of the following ideas:
– Autoencoders
– Variational approximations / the variational lower bound
– Reparameterization trick
[Diagram: Input data x → Encoder → Features z → Decoder → Reconstructed input data x̂]
Encoder: 4-layer conv. Decoder: 4-layer upconv.
L2 loss function: ‖x − x̂‖²
Train such that features can be used to reconstruct the original data. Doesn’t use labels!
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Autoencoders can reconstruct data, and can learn features to initialize a supervised model. Features capture factors of variation in the training data. But can we generate new images from an autoencoder?
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
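The encoder–decoder pipeline above can be sketched end to end. A minimal sketch, assuming a single linear encoder layer and a single linear decoder layer trained by gradient descent on the L2 reconstruction loss (illustrative only; the slide's model uses 4-layer conv/upconv networks):

```python
import numpy as np

# Minimal autoencoder trained with the L2 reconstruction loss.
# No labels anywhere: the training target is the input itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # unlabeled input data
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder: 8-d input -> 3-d features
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder: 3-d features -> 8-d output

def l2_loss(X, W_enc, W_dec):
    X_hat = (X @ W_enc) @ W_dec               # encode, then reconstruct
    return float(((X - X_hat) ** 2).mean())

lr, losses = 0.1, []
for _ in range(300):
    Z = X @ W_enc                             # features
    X_hat = Z @ W_dec                         # reconstruction
    G = 2.0 * (X_hat - X) / X.size            # d(loss)/d(X_hat)
    grad_dec = Z.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
    losses.append(l2_loss(X, W_enc, W_dec))
```

Because the 3-d bottleneck is narrower than the 8-d input, the network is forced to learn features that compress the data well enough to reconstruct it.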
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
Encoder q_φ(z|x), decoder p_θ(x|z)
VAEs are a combination of the following ideas:
– Reality is complex
– Can we approximate it with something “simple”?
– Just make sure the simple thing is “close” to the complex thing.
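One standard way to make "close" precise is the KL divergence. A toy sketch (numbers of my own choosing): search a simple one-parameter family of distributions for the member with the smallest KL divergence to a "complex" target:

```python
import numpy as np

# "Close" made precise with the KL divergence KL(q || p).
# p is a "complex" discrete distribution over 4 outcomes; we search a
# simple one-parameter family q(a) for the member closest to p.
p = np.array([0.1, 0.4, 0.4, 0.1])             # the complex thing

def kl(q, p):
    return float(np.sum(q * np.log(q / p)))    # KL(q || p) >= 0, zero iff q == p

def q_family(a):
    # simple family: symmetric, controlled by one knob a in (0, 0.5)
    return np.array([a, 0.5 - a, 0.5 - a, a])

grid = np.linspace(0.01, 0.49, 1000)
best_a = min(grid, key=lambda a: kl(q_family(a), p))
```

Here the target happens to lie inside the simple family (a = 0.1), so the search drives the KL essentially to zero; in general the best simple q only gets as close as the family allows.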
$$\ell\ell(\theta : \mathcal{D}) = \log \prod_{i=1}^{N} P(x_i \mid \theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta) = \sum_{i=1}^{N} \log \sum_{z} P(x_i, z \mid \theta)$$
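For a discrete latent z, the inner sum just marginalizes z out of the joint. A toy two-component Gaussian mixture (numpy; the parameters are illustrative choices of mine, not from the lecture):

```python
import numpy as np

# log-likelihood of a latent-variable model: ll = sum_i log sum_z P(x_i, z | theta)
# Toy model: z in {0, 1} picks one of two unit-variance Gaussian components.
prior = np.array([0.3, 0.7])      # P(z | theta)
mu = np.array([-2.0, 2.0])        # component means

def gauss(x, m):                  # N(x; m, 1) density
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)

x = np.array([-1.9, 0.1, 2.2])    # observed data
# joint P(x_i, z | theta) for every i and z; marginalize z inside the log
joint = prior[None, :] * gauss(x[:, None], mu[None, :])
ll = float(np.log(joint.sum(axis=1)).sum())
```

With two latent states the sum over z is trivial; with a high-dimensional or continuous z this same marginalization is exactly what becomes intractable, motivating the lower bound that follows.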
$$\ell\ell(\theta : \mathcal{D}) \;\ge\; F(\theta, Q_i) = \sum_{i=1}^{N} \sum_{z} Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}$$
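The bound is easy to verify numerically for a single datapoint with a binary latent: the lower bound sits below log P(x) for an arbitrary Q(z), and is tight when Q(z) equals the exact posterior P(z|x). A small check (toy numbers of my own):

```python
import numpy as np

# Verify F(theta, Q) <= ll for one data point with binary latent z.
joint = np.array([0.06, 0.14])              # P(x, z=0), P(x, z=1) for a fixed x
ll = float(np.log(joint.sum()))             # log P(x) = log sum_z P(x, z)

def elbo(Q):
    return float(np.sum(Q * np.log(joint / Q)))

Q_bad = np.array([0.9, 0.1])                # an arbitrary choice of Q(z)
Q_star = joint / joint.sum()                # the true posterior P(z | x)

gap_bad = ll - elbo(Q_bad)                  # positive: the bound is strict
gap_star = ll - elbo(Q_star)                # zero: the bound is tight
```

The gap is exactly KL(Q ‖ P(z|x)), which is why tightening the bound over Q is the same thing as doing approximate posterior inference.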
VAEs are a combination of the following ideas:
Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
Encoder q_φ(z|x), decoder p_θ(x|z)
Encoder network Input Data
Putting it all together: maximizing the likelihood lower bound
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Encoder network Input Data
Putting it all together: maximizing the likelihood lower bound
Make approximate posterior distribution close to prior
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
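For a Gaussian encoder q(z|x) = N(μ, σ²) and a standard normal prior, this KL term has a well-known closed form, KL = ½(μ² + σ² − 1 − log σ²) per latent dimension. A quick sketch checking the formula against a Monte Carlo estimate (numbers of my own choosing):

```python
import numpy as np

# KL( N(mu, sigma^2) || N(0, 1) ) in closed form, per latent dimension:
#   0.5 * (mu^2 + sigma^2 - 1 - log sigma^2)
mu, sigma = 0.8, 0.5
kl_closed = 0.5 * (mu**2 + sigma**2 - 1.0 - np.log(sigma**2))

# Monte Carlo check: KL = E_q[ log q(z) - log p(z) ]
rng = np.random.default_rng(0)
z = mu + sigma * rng.normal(size=500_000)
log_q = -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
kl_mc = float((log_q - log_p).mean())
```

The closed form is what makes this term convenient: it can be differentiated analytically with respect to μ and σ, with no sampling needed for the KL part of the objective.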
Encoder network Sample z from Input Data
Putting it all together: maximizing the likelihood lower bound
Make approximate posterior distribution close to prior
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Encoder network Decoder network Sample z from Input Data
Putting it all together: maximizing the likelihood lower bound
Make approximate posterior distribution close to prior
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Encoder network Decoder network Sample z from Sample x|z from Input Data
Putting it all together: maximizing the likelihood lower bound
Make approximate posterior distribution close to prior. Maximize likelihood of original input being reconstructed.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
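Assembled, the per-example training objective is the reconstruction term plus the KL term (i.e., the negative of the lower bound). A minimal forward pass through this objective, assuming made-up fixed linear maps in place of the learned encoder and decoder networks:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                        # one input datapoint (4-d)

# "Encoder network": x -> (mu, log_var) of q(z|x); fixed random linear maps
# stand in for the learned network (illustrative assumption).
W_mu = rng.normal(size=(4, 2))
W_lv = rng.normal(scale=0.1, size=(4, 2))
mu, log_var = x @ W_mu, x @ W_lv

# Sample z from q(z|x) via the reparameterization z = mu + sigma * eps
eps = rng.normal(size=2)
z = mu + np.exp(0.5 * log_var) * eps

# "Decoder network": z -> reconstruction of x (again a stand-in linear map)
W_dec = rng.normal(size=(2, 4))
x_hat = z @ W_dec

recon = float(((x - x_hat) ** 2).sum())       # -log p(x|z) up to constants (Gaussian decoder)
kl = float(0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var))
neg_elbo = recon + kl                         # minimize this = maximize the lower bound
```

In a real VAE the same quantities are computed by learned networks, and `neg_elbo` is backpropagated through both, which is where the reparameterized sampling of z becomes essential.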
[Diagram: sample z from the prior p(z) → Decoder network → sample x|z from p_θ(x|z)]
Use decoder network. Now sample z from the prior!
[Figure: data manifold for 2-d z — vary z1 along one axis, z2 along the other]
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
[Figure: vary z1 (degree of smile), vary z2 (head pose)]
Diagonal prior on z => independent latent variables
Different dimensions of z encode interpretable factors of variation
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
– Reparameterization trick
VAEs are a combination of the following ideas:
Encoder network Decoder network Sample z from Sample x|z from Input Data
Putting it all together: maximizing the likelihood lower bound
Make approximate posterior distribution close to prior. Maximize likelihood of original input being reconstructed.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Putting it all together: maximizing the likelihood lower bound
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
aka REINFORCE (and variants)
aka “reparameterization trick”
$$
\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta \, \mathbb{E}_{\tau \sim p_\theta(\tau)}[R(\tau)] \\
&= \nabla_\theta \int \pi_\theta(\tau)\, R(\tau)\, d\tau && \text{(expand the expectation)} \\
&= \int \nabla_\theta \pi_\theta(\tau)\, R(\tau)\, d\tau && \text{(exchange integration and gradient)} \\
&= \int \pi_\theta(\tau)\, \frac{\nabla_\theta \pi_\theta(\tau)}{\pi_\theta(\tau)}\, R(\tau)\, d\tau \\
&= \int \pi_\theta(\tau)\, \nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\, d\tau && \text{(using } \nabla_\theta \log \pi_\theta(\tau) = \tfrac{\nabla_\theta \pi_\theta(\tau)}{\pi_\theta(\tau)}\text{)} \\
&= \mathbb{E}_{\tau \sim p_\theta(\tau)}\!\big[\nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\big]
\end{aligned}
$$
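The final identity is what makes the estimator usable: draw samples, then average ∇θ log πθ(τ) R(τ). A 1-d sanity check on a toy problem of my own choosing (x ∼ N(θ, 1), "reward" R(x) = x², so the true gradient of E[x²] = θ² + 1 is 2θ):

```python
import numpy as np

# Score-function (REINFORCE) estimate of d/d_theta E_{x ~ N(theta, 1)}[x^2].
# Analytically E[x^2] = theta^2 + 1, so the true gradient is 2 * theta.
theta = 1.5
rng = np.random.default_rng(0)
x = theta + rng.normal(size=200_000)          # samples from N(theta, 1)

# For a unit-variance Gaussian, grad_theta log N(x; theta, 1) = (x - theta),
# so the estimator just averages (x - theta) * R(x) over the samples.
grad_est = float(((x - theta) * x**2).mean())
grad_true = 2.0 * theta
```

The estimate is unbiased but noisy; its variance is what motivates the reparameterization trick as the preferred estimator when the sampling distribution is differentiable.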
– https://www.siarez.com/projects/variational-autoencoder
aka REINFORCE (and variants)
aka “reparameterization trick”
Figure Credit: http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder
aka “reparameterization trick”
Figure Credit: http://gokererdogan.github.io/2016/07/01/reparameterization-trick/
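Where REINFORCE differentiates the log-density, the reparameterization trick differentiates through the sample itself: write x = θ + ε with ε ∼ N(0, 1), so the randomness no longer depends on θ. The same toy problem as the REINFORCE check above (my own choice, not from the slides):

```python
import numpy as np

# Reparameterized (pathwise) estimate of d/d_theta E_{x ~ N(theta, 1)}[x^2].
# Write x = theta + eps, eps ~ N(0, 1): the sample is now a differentiable
# function of theta, so d(x^2)/d_theta = 2 * x can be averaged directly.
theta = 1.5
rng = np.random.default_rng(0)
eps = rng.normal(size=200_000)
x = theta + eps
grad_est = float((2.0 * x).mean())
grad_true = 2.0 * theta
```

For the same sample budget this pathwise estimator typically has far lower variance than the score-function one, which is why VAEs train with it.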
VAEs are a combination of the following ideas:
Putting it all together: maximizing the likelihood lower bound
[Diagram: Input data x → Encoder network q_φ(z|x) → sample z → Decoder network p_θ(x|z) → sample x|z]
Make approximate posterior distribution close to prior
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n