Amortised learning by wake-sleep



  1. Amortised learning by wake-sleep. Li Kevin Wenliang, Ted Moskovitz, Heishiro Kanagawa, Maneesh Sahani. Gatsby Unit, University College London.

  2. Three ways to update θ in a latent-variable model:
     • direct max-likelihood update: consistent! simple, direct! but the posterior expectation is intractable…
     • update in a VAE: approximate, biased…
     • amortised learning: consistent! simple, direct! agnostic to model structure and the type of z, and gives better-trained models

  3. Least-squares regression gives the conditional expectation (the identity is restated below).
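The equations on this slide did not survive extraction; the standard fact it appeals to, in notation supplied here, is:

```latex
% Least squares recovers the conditional expectation: over (square-integrable)
% functions g, the L2 risk is minimised by the conditional mean.
\[
  g^{\star} \;=\; \arg\min_{g}\; \mathbb{E}\left[ \lVert y - g(x) \rVert^{2} \right]
  \qquad\Longrightarrow\qquad
  g^{\star}(x) \;=\; \mathbb{E}\left[\, y \mid x \,\right].
\]
% Applied with targets y = \nabla_\theta \log p_\theta(x, z) and pairs
% (z, x) \sim p_\theta, the regression function is the posterior expectation
% \mathbb{E}_{p_\theta(z \mid x)}[\nabla_\theta \log p_\theta(x, z)]
% = \nabla_\theta \log p_\theta(x), by Fisher's identity.
```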

  4. How to estimate g?
     • define g(x) = E_{p_θ(z|x)}[∇_θ log p_θ(x, z)]
     • then ∇_θ log p_θ(x) = g(x)
     • in practice, draw sleep samples and solve the least-squares problem
     Algorithm:
       sleep: 1. (z_n, x_n) ∼ p_θ; 2. find ĝ by regression
       wake: 3. x_m ∼ D; 4. update θ by ĝ(x_m)
     Issues:
     • the regression target ∇_θ log p_θ(x_n, z_n) is high dimensional
     • computing it for all sleep samples can be slow
     (a minimal code sketch of this naive scheme follows below)
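A minimal sketch of this naive scheme, not the authors' code: the linear-Gaussian toy model (θ = (W, b)), the helper names, and the plain linear regression are assumptions made here for illustration.

```python
import jax
import jax.numpy as jnp

def log_joint(theta, x, z):
    """log p_theta(x, z) for the toy model, up to additive constants."""
    W, b = theta
    return -0.5 * jnp.sum(z ** 2) - 0.5 * jnp.sum((x - (W @ z + b)) ** 2)

def sleep_sample(theta, key, n):
    """Draw (z_n, x_n) ~ p_theta (the 'sleep' phase)."""
    W, b = theta
    kz, kx = jax.random.split(key)
    z = jax.random.normal(kz, (n, W.shape[1]))
    x = z @ W.T + b + jax.random.normal(kx, (n, W.shape[0]))
    return z, x

key = jax.random.PRNGKey(0)
dz, dx, n = 2, 5, 200
theta = (jax.random.normal(key, (dx, dz)), jnp.zeros(dx))

z, x = sleep_sample(theta, key, n)

# Regression targets: per-sample parameter gradients grad_theta log p(x_n, z_n).
grads = jax.vmap(lambda xi, zi: jax.grad(log_joint)(theta, xi, zi))(x, z)
targets = jnp.concatenate(
    [g.reshape(n, -1) for g in jax.tree_util.tree_leaves(grads)], axis=1)

# Plain linear least squares from x to the stacked gradients; ghat(x) ~ g(x).
X = jnp.concatenate([x, jnp.ones((n, 1))], axis=1)
coef, *_ = jnp.linalg.lstsq(X, targets)
ghat = lambda x_new: jnp.concatenate([x_new, jnp.ones(1)]) @ coef
```

Stacking one entry per parameter into the regression target makes the output dimension equal to the parameter count, which is exactly the cost the slide flags.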

  5. How to estimate the gradient more efficiently?
     • define f_{θ'}(x) = E_{p_θ(z|x)}[log p_{θ'}(x, z)], so that ∇_{θ'} f_{θ'}(x) |_{θ'=θ} = ∇_θ log p_θ(x)
     • suppose we estimate f by kernel ridge regression on the scalar targets log p_θ(x_n, z_n), giving f̂_θ
     • then auto-diff gives ĝ(x) = ∇_θ f̂_θ(x), an estimator of ∇_θ log p_θ(x) by kernel ridge regression
     Theorem: if ∇_θ log p_θ is in L²(p_θ) and the kernel is rich, then ĝ is a consistent estimator of ∇_θ log p_θ
     (a sketch of this estimator follows below)
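A sketch of the kernel-ridge-regression estimator under the same assumed toy model; rbf, grad_estimate, the lengthscale, and the ridge penalty are hypothetical choices, not the paper's code.

```python
import jax
import jax.numpy as jnp

def log_joint(theta, x, z):
    """log p_theta(x, z) for the toy model, up to additive constants."""
    W, b = theta
    return -0.5 * jnp.sum(z ** 2) - 0.5 * jnp.sum((x - (W @ z + b)) ** 2)

def rbf(A, B, ls=1.0):
    """Gaussian (RBF) kernel matrix between row-sets A and B."""
    d2 = jnp.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / ls ** 2)

def grad_estimate(theta, key, x_wake, n_sleep=500, lam=1e-3):
    """Estimate sum_m grad_theta log p_theta(x_m) via KRR + auto-diff."""
    W, b = theta
    kz, kx = jax.random.split(key)
    z = jax.random.normal(kz, (n_sleep, W.shape[1]))
    x = z @ W.T + b + jax.random.normal(kx, (n_sleep, W.shape[0]))
    # Sleep samples enter only as regression inputs/targets; stop gradients
    # so auto-diff differentiates log p_theta, not the sampling path.
    z, x = jax.lax.stop_gradient(z), jax.lax.stop_gradient(x)

    # KRR weights depend on the inputs only, hence are constant w.r.t. theta.
    K = rbf(x, x) + lam * jnp.eye(n_sleep)
    alpha = jnp.linalg.solve(K, rbf(x, x_wake))       # (n_sleep, n_wake)

    def fhat_sum(th):
        t = jax.vmap(lambda xi, zi: log_joint(th, xi, zi))(x, z)  # scalar targets
        return jnp.sum(t @ alpha)                     # sum of fhat over wake points

    return jax.grad(fhat_sum)(theta)
```

Because the KRR weights depend only on the inputs, f̂ is linear in the scalar targets log p_θ(x_n, z_n), so a single reverse-mode pass through jax.grad recovers the full parameter gradient.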

  6. Amortised learning by wake-sleep
       sleep: 1. (z_n, x_n) ∼ p_θ; 2. fit f̂_θ by kernel ridge regression
       wake: 3. x_m ∼ D; 4. update θ by ĝ(x_m) = ∇_θ f̂_θ(x_m)
     Assumptions:
     • easy to sample from p_θ
     • ∇_θ log p_θ(x, z) exists
     • the true gradient is in L²(p_θ)
     Non-assumptions:
     • a tractable posterior
     • the structure of p_θ
     • the type of z
     consistent! simple, direct!
     (an end-to-end sketch follows below)
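Putting the pieces together, a hypothetical end-to-end training loop; it reuses grad_estimate from the sketch above, and the dataset, step size, and batch size are placeholders chosen for illustration.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(1)
dz, dx = 2, 5
theta = (0.1 * jax.random.normal(key, (dx, dz)), jnp.zeros(dx))

# Stand-in dataset; in practice x_m ~ D are the observations.
data = jax.random.normal(jax.random.PRNGKey(2), (1000, dx))

lr, batch = 1e-2, 64
for step in range(100):
    key, k_sleep, k_wake = jax.random.split(key, 3)
    idx = jax.random.choice(k_wake, data.shape[0], (batch,))
    g = grad_estimate(theta, k_sleep, data[idx])   # ~= sum_m grad log p(x_m)
    # Gradient *ascent* on the estimated log-likelihood.
    theta = jax.tree_util.tree_map(lambda p, gi: p + lr * gi / batch, theta, g)
```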

  7. Experiments
     • Log-likelihood gradient estimation
     • Non-Euclidean latents
     • Dynamical models
     • Image generation
     • Non-negative matrix factorisation
     • Hierarchical models
     • Independent component analysis
     • Neural processes

  8. Experiment I: gradient estimation

  9. Experiment II: prior on the unit circle, z ∈ S¹

  10. Experiment III: dynamical model

  11. Experiment IV: sample quality

  12. Experiment IV: downstream tasks

  13. Amortised learning: consistent! simple, direct! Thank you!
