Amortised learning by wake-sleep. Li Kevin Wenliang, Ted Moskovitz, Heishiro Kanagawa, Maneesh Sahani. PowerPoint PPT Presentation.



SLIDE 1

Amortised learning by wake-sleep

Li Kevin Wenliang, Ted Moskovitz, Heishiro Kanagawa, Maneesh Sahani Gatsby Unit, University College London

SLIDE 2

Unlike the amortised posterior update in a VAE, amortised learning gives a direct maximum-likelihood update. It is agnostic to:

  • model structure
  • type of latent z

and gives better-trained models. The VAE update is approximate and biased, and the posterior is intractable; the amortised-learning update is simple, direct, and consistent!

SLIDE 3

Least-squares regression gives the conditional expectation: over all functions f, the minimiser of E[(f(y) − t)²] is f*(y) = E[t | y].
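A minimal numerical check of this fact (the polynomial model and all numbers are illustrative, not from the slides): regressing noisy targets t on y recovers E[t | y].

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint samples: t depends on y plus noise coming from a "latent" z.
y = rng.uniform(-1.0, 1.0, size=20_000)
z = rng.normal(size=y.shape)             # latent noise, unseen by the regressor
t = y**2 + z                             # target; E[t | y] = y**2

# Least-squares fit of t on polynomial features of y.
X = np.stack([np.ones_like(y), y, y**2], axis=1)
coef, *_ = np.linalg.lstsq(X, t, rcond=None)

# The fit approximates the conditional expectation E[t | y] = y**2.
pred = np.array([1.0, 0.5, 0.25]) @ coef
print(pred)                              # close to E[t | y=0.5] = 0.25
```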

SLIDE 4

How to estimate h(y) = E_{q_θ(z|y)}[∇_θ log q_θ(z, y)]?

  • define h as the posterior expectation of the gradient of the joint log-likelihood

Algorithm:

  • 1. (sleep) draw z_s, y_s ∼ q_θ
  • 2. find ĥ by regressing ∇_θ log q_θ(z_s, y_s) onto y_s
  • 3. (wake) draw y_n ∼ 𝒟
  • 4. update θ by ĥ(y_n)

  • then ĥ(y) ≈ E_{q_θ(z|y)}[∇_θ log q_θ(z, y)] = ∇_θ log q_θ(y)
  • in practice, draw sleep samples and solve the regression on them

Steps 1–2 are the sleep phase; steps 3–4 are the wake phase.

Issues:

  • ∇_θ log q_θ(z, y) is high dimensional
  • computing it for all sleep samples can be slow
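The four steps can be sketched on a toy linear-Gaussian model (the model z ∼ N(θ, 1), y | z ∼ N(z, 1), the parameter values, and the sample sizes below are my own illustration, not the paper's); here the exact marginal gradient is (y − θ)/2, so the regression can be checked directly.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3                              # current model parameter

# Toy model: z ~ N(theta, 1), y | z ~ N(z, 1).
# Joint gradient:    d/dtheta log q_theta(z, y) = z - theta.
# Marginal gradient: d/dtheta log q_theta(y)    = (y - theta) / 2.

# 1. sleep: draw (z_s, y_s) from the model.
S = 50_000
z_s = theta + rng.normal(size=S)
y_s = z_s + rng.normal(size=S)

# 2. regress the joint gradient onto y_s (linear least squares suffices here).
X = np.stack([np.ones(S), y_s], axis=1)
coef, *_ = np.linalg.lstsq(X, z_s - theta, rcond=None)
h_hat = lambda y: coef[0] + coef[1] * y  # amortised gradient estimator

# 3.-4. wake: evaluate on an "observed" y_n, compare with the exact gradient.
y_n = 1.0
print(h_hat(y_n), (y_n - theta) / 2)     # both near 0.35
```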

SLIDE 5
How to estimate ĥ more efficiently?

  • define ĝ_θ(y) as the kernel ridge regression estimate of E_{q_θ(z|y)}[log q_θ(z, y)], obtained by regressing the scalar log q_θ(z_s, y_s) onto y_s
  • suppose we estimate with kernel ridge regression; then ∇_θ ĝ_θ(y), computed by auto-diff with the sleep samples held fixed, is an estimator of ∇_θ log q_θ(y)
  • Theorem: if ∇_θ log q_θ(z, y) exists, is square-integrable under q_θ, and the kernel is rich, then ∇_θ ĝ_θ(y) is a consistent estimator of ∇_θ log q_θ(y)
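Kernel ridge regression is linear in its targets, so differentiating the fitted ĝ with respect to θ amounts to a KRR fit on the differentiated targets, and no explicit auto-diff is needed in a sketch. The toy model (z ∼ N(θ, 1), y | z ∼ N(z, 1)), the kernel, and the regularisation below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.3

# Toy model: z ~ N(theta, 1), y | z ~ N(z, 1); exact gradient (y - theta) / 2.
S = 1_000
z_s = theta + rng.normal(size=S)
y_s = z_s + rng.normal(size=S)

# KRR is linear in its targets, so the theta-gradient of the fitted g is a
# KRR fit to the theta-gradients of the targets:
#   d/dtheta log q_theta(z, y) = z - theta.
grad_targets = z_s - theta

def rbf(a, b, ell=1.5):
    """Gaussian kernel between two 1-D sample vectors."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

lam = 1e-2                               # ridge regularisation (scaled by S)
K = rbf(y_s, y_s)
alpha = np.linalg.solve(K + lam * S * np.eye(S), grad_targets)

def grad_hat(y):                         # estimator of d/dtheta log q_theta(y)
    return rbf(np.atleast_1d(y), y_s) @ alpha

print(grad_hat(1.0)[0], (1.0 - theta) / 2)   # both near 0.35
```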

SLIDE 6

Amortised learning by wake-sleep

  • 1. (sleep) draw z_s, y_s ∼ q_θ
  • 2. fit ĝ_θ by kernel ridge regression
  • 3. (wake) draw y_n ∼ 𝒟
  • 4. update θ by ĥ(y_n) = ∇_θ ĝ_θ(y_n)

Simple, direct, consistent!

Assumptions:

  • easy to sample from q_θ
  • ∇_θ log q_θ(y, z) exists
  • the true gradient function is in L²(q_θ)

Non-assumptions:

  • a tractable posterior
  • structure of q_θ
  • type of z
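End to end, the loop above can be sketched on a toy model (assumptions: a linear-Gaussian model z ∼ N(θ, 1), y | z ∼ N(z, 1), plain linear regression standing in for kernel ridge regression, and illustrative step sizes); θ climbs towards the maximum-likelihood estimate, which here is the sample mean of the data.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Observed" data from the toy model with true parameter 1.0:
# z ~ N(theta, 1), y | z ~ N(z, 1)  =>  y ~ N(theta, 2), so the MLE is mean(y).
y_data = 1.0 + np.sqrt(2.0) * rng.normal(size=2_000)

theta = -1.0                             # initial parameter
lr, S = 0.5, 2_000

for step in range(200):
    # Sleep: sample from the current model, regress the joint gradient on y_s.
    z_s = theta + rng.normal(size=S)
    y_s = z_s + rng.normal(size=S)
    X = np.stack([np.ones(S), y_s], axis=1)
    coef, *_ = np.linalg.lstsq(X, z_s - theta, rcond=None)
    # Wake: apply the amortised gradient to a minibatch of real data.
    y_n = rng.choice(y_data, size=256)
    theta += lr * np.mean(coef[0] + coef[1] * y_n)

print(theta, y_data.mean())              # theta settles near the MLE
```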
SLIDE 7

Experiments

  • Log likelihood gradient estimation
  • Non-Euclidean latent
  • Dynamical models
  • Image generation
  • Non-negative matrix factorisation
  • Hierarchical models
  • Independent component analysis
  • Neural processes

Simple, direct, consistent!

SLIDE 8

Experiment I: gradient estimation

SLIDE 9

Experiment II: prior on the unit circle

z ∈ 𝕊¹ (the latent lies on the unit circle)
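Because the sleep phase needs nothing but joint samples (z_s, y_s), a non-Euclidean latent changes nothing in the procedure. A toy illustration (the decoder y | z ∼ N(Wz, σ²I) and its weights are invented for this sketch, not the slide's model):

```python
import numpy as np

rng = np.random.default_rng(3)

# Latent on the unit circle: z = (cos a, sin a), a ~ Uniform[0, 2*pi).
a = rng.uniform(0.0, 2 * np.pi, size=1_000)
z_s = np.stack([np.cos(a), np.sin(a)], axis=1)

# Any decoder works; the sleep samples (z_s, y_s) feed the same regression
# onto y_s as in the Euclidean case.
W = rng.normal(size=(3, 2))              # illustrative decoder weights
y_s = z_s @ W.T + 0.1 * rng.normal(size=(1_000, 3))

print(np.allclose((z_s**2).sum(axis=1), 1.0))   # True: z stays on the circle
```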

SLIDE 10

Experiment III: dynamical model

SLIDE 11

Experiment IV: sample quality

SLIDE 12

Experiment IV: downstream tasks

SLIDE 13

Thank you!

Amortised learning: simple, direct, consistent!