Lecture 18: Black Box Variational Inference
SLIDE 1
Lecture 18: Black Box Variational Inference
Scribes: Niklas Smedemark-Margulies, Daniel Zeiberg
SLIDE 2
Recap: Stochastic Variational Inference

Objective: Evidence Lower Bound (ELBO)

    \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\lambda)}\left[ \log \frac{p(x,z)}{q(z;\lambda)} \right]

Natural Gradient: Coordinate-invariant notion of steepest descent. Computable in closed form for exponential family models:

    \tilde\nabla_\lambda \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\phi)}\left[ \eta_g(x, z, \alpha) \right] - \lambda,
    \qquad
    \eta_g(x, z, \alpha) = \Big( \alpha_1 + \sum_{n} t(x_n, z_n),\; \alpha_2 + N \Big)
SLIDE 3
Today: Black-Box Variational Inference

Problem 1: Closed-form updates in VBEM and SVI need to be re-derived for every new model.

Problem 2: Closed-form updates in practice require conjugate priors in order to work (which limits the number of models we can define).
SLIDE 4
Black-Box Variational Inference

Goal: Can we formulate general strategies for computing gradients of the ELBO?

    \nabla_\lambda \mathcal{L}(\lambda) = \nabla_\lambda \, \mathbb{E}_{q(z;\lambda)}\left[ \log \frac{p(x,z)}{q(z;\lambda)} \right]

Problem: How do we compute gradients of expectations?

Idea: Can we use Monte Carlo methods?
SLIDE 5
Gradients of Expectations

Problem: Compute the gradient with respect to some set of parameters \lambda:

    \nabla_\lambda \, \mathbb{E}_{q(z;\lambda)}\left[ f(z; \lambda) \right]
    \qquad \text{(any function } f \text{, any distribution } q(z;\lambda) \text{)}

Easy case: The distribution q(z) does not depend on \lambda:

    \nabla_\lambda \, \mathbb{E}_{q(z)}\left[ f(z;\lambda) \right]
    = \mathbb{E}_{q(z)}\left[ \nabla_\lambda f(z;\lambda) \right]
    \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda f(z^{(s)}; \lambda),
    \qquad z^{(s)} \sim q(z)
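The easy case can be checked numerically. A minimal NumPy sketch, using a toy choice of our own (not from the lecture): f(z; lam) = lam * z**2 with q(z) = Normal(0, 1), for which the true gradient is E[z**2] = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Easy case: q(z) = Normal(0, 1) does not depend on lam,
# f(z; lam) = lam * z**2, so grad_lam f(z; lam) = z**2 and
# grad_lam E_q[f] = E_q[z**2] = 1.
S = 100_000
z = rng.standard_normal(S)        # z^(s) ~ q(z)
grad_f = z**2                     # grad wrt lam of f(z^(s); lam)
grad_estimate = grad_f.mean()     # (1/S) sum_s grad_lam f(z^(s); lam)
print(grad_estimate)              # close to 1.0
```

The gradient moves inside the expectation precisely because the sampling distribution has no \lambda-dependence.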
SLIDE 6
Gradients of Expectations

Harder case: The distribution q(z;\lambda) does depend on \lambda:

    \nabla_\lambda \, \mathbb{E}_{q(z;\lambda)}\left[ f(z) \right]
    = \nabla_\lambda \int dz \, q(z;\lambda) \, f(z)
    = \int dz \, \big( \nabla_\lambda q(z;\lambda) \big) \, f(z)

What do we do here?
SLIDE 7
Gradients of Expectations

Williams 1992 (@ Northeastern) Trick (REINFORCE): Rewrite

    \nabla_\lambda q(z;\lambda) = q(z;\lambda) \, \nabla_\lambda \log q(z;\lambda)

Likelihood-Ratio Estimator:

    \nabla_\lambda \, \mathbb{E}_{q(z;\lambda)}\left[ f(z) \right]
    = \int dz \, \nabla_\lambda q(z;\lambda) \, f(z)
    = \int dz \, q(z;\lambda) \, \nabla_\lambda \log q(z;\lambda) \, f(z)
    = \mathbb{E}_{q(z;\lambda)}\left[ \nabla_\lambda \log q(z;\lambda) \, f(z) \right]

Can use MC approximation.
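The likelihood-ratio estimator above can be sketched in a few lines of NumPy. Toy setup of our own choosing (not the lecture's): q(z; lam) = Normal(lam, 1) and f(z) = z, so E_q[f] = lam and the true gradient is 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Score-function (REINFORCE) estimator of grad_lam E_{q(z;lam)}[f(z)].
# For q(z; lam) = Normal(lam, 1): grad_lam log q(z; lam) = z - lam.
lam = 2.0
S = 200_000
z = rng.normal(lam, 1.0, size=S)      # z^(s) ~ q(z; lam)
score = z - lam                       # grad_lam log Normal(z; lam, 1)
grad_estimate = np.mean(score * z)    # MC estimate of E_q[score * f(z)]
print(grad_estimate)                  # close to 1.0
```

Note the estimator only needs samples from q and the score function — no gradient of f, which is what makes the approach "black box".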
SLIDE 8
Black Box Variational Inference

Idea: Perform variational inference using the likelihood-ratio estimator with f(z;\lambda) = \log \frac{p(x,z)}{q(z;\lambda)}:

    \nabla_\lambda \mathcal{L}(\lambda)
    = \nabla_\lambda \, \mathbb{E}_{q(z;\lambda)}\left[ \log \frac{p(x,z)}{q(z;\lambda)} \right]
    = \mathbb{E}_{q(z;\lambda)}\left[ \nabla_\lambda \log q(z;\lambda) \log \frac{p(x,z)}{q(z;\lambda)} - \nabla_\lambda \log q(z;\lambda) \right]
    = \mathbb{E}_{q(z;\lambda)}\left[ \nabla_\lambda \log q(z;\lambda) \left( \log \frac{p(x,z)}{q(z;\lambda)} - 1 \right) \right]
SLIDE 9
Black Box Variational Inference

Observation: The expected value of \nabla_\lambda \log q(z;\lambda) is zero:

    \mathbb{E}_{q(z;\lambda)}\left[ \nabla_\lambda \log q(z;\lambda) \right]
    = \int dz \, q(z;\lambda) \, \nabla_\lambda \log q(z;\lambda)
    = \int dz \, \nabla_\lambda q(z;\lambda)
    = \nabla_\lambda \int dz \, q(z;\lambda)
    = \nabla_\lambda 1 = 0

Implication: We can add any constant a to the estimator:

    \nabla_\lambda \mathcal{L}(\lambda)
    = \mathbb{E}_{q(z;\lambda)}\left[ \nabla_\lambda \log q(z;\lambda) \left( \log \frac{p(x,z)}{q(z;\lambda)} - a \right) \right]
    \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q(z^{(s)};\lambda) \left( \log \frac{p(x,z^{(s)})}{q(z^{(s)};\lambda)} - a \right)
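Both claims on this slide are easy to verify numerically. A quick check with a toy Gaussian of our own choosing (not from the lecture): the score has mean zero, so shifting the log-weights by a constant leaves the gradient estimate (in expectation) unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# q(z; lam) = Normal(lam, 1); unnormalized log p(x, z) = -0.5*(z - 3)^2.
lam = 2.0
S = 500_000
z = rng.normal(lam, 1.0, size=S)
score = z - lam                                     # grad_lam log q(z; lam)
logw = -0.5 * (z - 3.0)**2 + 0.5 * (z - lam)**2     # log p(x,z) - log q(z;lam)

est_plain = np.mean(score * logw)                   # estimator with a = 0
est_shift = np.mean(score * (logw - 5.0))           # estimator with a = 5
print(score.mean())                                 # close to 0
print(est_plain, est_shift)                         # nearly equal
```

The constant a changes only the variance of the estimator, which is what the control-variate machinery below exploits.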
SLIDE 10
Black Box Variational Inference

Problem: REINFORCE-style estimators have high variance.

    \nabla_\lambda \mathcal{L}(\lambda)
    = \mathbb{E}_{q(z;\lambda)}\left[ \nabla_\lambda \log q(z;\lambda) \left( \log \frac{p(x,z)}{q(z;\lambda)} - a \right) \right],
    \qquad q(z;\lambda) = \mathrm{Norm}(z; \mu, \sigma^2)

[Figure: sketch of \log q(z;\lambda), \log p(x,z), and \nabla_\lambda \log q(z;\lambda) as functions of z — few samples land where all terms are large, many samples land where all terms are small.]
SLIDE 11
Exploiting Conditional Independence

    \nabla_\lambda \mathcal{L}(\lambda)
    = \mathbb{E}_{q}\left[ \nabla_\lambda \log q(z, \theta, \beta) \, \big( \log w - b \big) \right],
    \qquad \log w = \log \frac{p(y, z, \theta, \beta)}{q(z, \theta, \beta)}

    \log q(z, \theta, \beta)
    = \sum_{d,n} \log q(z_{dn} \mid \lambda_{dn}) + \sum_{d} \log q(\theta_d) + \sum_{k} \log q(\beta_k)

Idea: Can we use conditional independence to simplify expressions for gradients?

[Figure: plate-notation graphical model with local variables z_{dn} (parameters \lambda_{dn}) inside plates over n = 1, ..., N_d and d = 1, ..., D, and global variables \beta_k in a plate over k = 1, ..., K.]
SLIDE 12
Exploiting Conditional Independence

    \nabla_{\lambda_{dn}} \mathcal{L}
    = \mathbb{E}\left[ \nabla_{\lambda_{dn}} \log q(z, \theta, \beta) \, \big( \log w - b \big) \right]

Because q factorizes,

    q(z, \theta, \beta) = \prod_{d,n} q(z_{dn} \mid \lambda_{dn}) \prod_{d} q(\theta_d) \prod_{k} q(\beta_k),
    \qquad
    \log q(z, \theta, \beta) = \sum_{d,n} \log q(z_{dn} \mid \lambda_{dn}) + \sum_{d} \log q(\theta_d) + \sum_{k} \log q(\beta_k),

only one factor depends on \lambda_{dn}:

    \nabla_{\lambda_{dn}} \mathcal{L}
    = \mathbb{E}\left[ \nabla_{\lambda_{dn}} \log q(z_{dn} \mid \lambda_{dn}) \, \big( \log w - b \big) \right]

This is a much lower-dimensional estimator. Moreover, \log w only needs to retain the terms of the joint and of q in which z_{dn} appears:

    \log w = \log \frac{p(y_{dn}, z_{dn} \mid \theta_d, \beta)}{q(z_{dn} \mid \lambda_{dn})}
SLIDE 13
Control Variates

General Setup: Zero-mean terms preserve expected values:

    \mathbb{E}\left[ \hat f(z) \right]
    = \mathbb{E}\left[ f(z) - a \big( h(z) - \mathbb{E}[h(z)] \big) \right]
    = \mathbb{E}\left[ f(z) \right]

Goal: Choose a to minimize the variance:

    \mathrm{Var}\left[ \hat f(z) \right]
    = \mathrm{Var}\left[ f(z) \right] + a^2 \mathrm{Var}\left[ h(z) \right] - 2a \, \mathrm{Cov}\left[ f(z), h(z) \right]

    \frac{\partial}{\partial a} \mathrm{Var}\left[ \hat f(z) \right] = 0
    \;\Rightarrow\;
    a^* = \frac{\mathrm{Cov}\left[ f(z), h(z) \right]}{\mathrm{Var}\left[ h(z) \right]}
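The optimal coefficient a* can be estimated directly from samples. A small NumPy sketch with a toy pair of our own choosing (not from the lecture): z ~ Normal(0, 1), f(z) = z + z**2, h(z) = z with known E[h] = 0, so the control variate cancels the z part of f.

```python
import numpy as np

rng = np.random.default_rng(0)

# f_hat = f(z) - a*(h(z) - E[h(z)]) keeps the mean of f(z) but
# can shrink the variance when a = Cov[f, h] / Var[h].
S = 100_000
z = rng.standard_normal(S)
f, h = z + z**2, z

a = np.cov(f, h)[0, 1] / np.var(h)   # empirical a* = Cov[f,h] / Var[h]
f_hat = f - a * (h - 0.0)            # E[h] = 0 is known here

print(f.mean(), f_hat.mean())        # both close to E[f] = 1
print(f.var(), f_hat.var())          # variance drops (about 3 -> about 2)
```

Here a* is close to 1, so f_hat is essentially z**2 — same mean as f but without the extra variance contributed by the z term.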
SLIDE 14
Control Variates: BBVI

    \nabla_\lambda \mathcal{L}(\lambda)
    \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q(z^{(s)};\lambda) \left( \log \frac{p(x,z^{(s)})}{q(z^{(s)};\lambda)} - a \right)

with

    f(z) = \nabla_\lambda \log q(z;\lambda) \log \frac{p(x,z)}{q(z;\lambda)},
    \qquad
    h(z) = \nabla_\lambda \log q(z;\lambda),
    \qquad
    a = \frac{\mathrm{Cov}\left[ f(z), h(z) \right]}{\mathrm{Var}\left[ h(z) \right]},

where the covariance and variance are estimated from the same S Monte Carlo samples.
SLIDE 15
Black Box Variational Inference

Algorithm:
1. Initialize \lambda^{(1)} (randomly)
2. For t in 1, ..., T:
   - For s in 1, ..., S: sample z^{(s)} \sim q(z; \lambda^{(t-1)})
   - \hat\nabla_\lambda \mathcal{L}(\lambda) = \frac{1}{S} \sum_{s=1}^{S} \big( f(z^{(s)}) - \hat a \, h(z^{(s)}) \big)
   - \lambda^{(t)} = \lambda^{(t-1)} + \rho^{(t)} \hat\nabla_\lambda \mathcal{L}(\lambda)

Can replace vanilla SGD with more recent algorithms.
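The loop above can be sketched end to end on a toy problem. The model here is our own (not the lecture's): an unnormalized "joint" log p(x, z) = log Normal(z; 3, 1) and variational family q(z; lam) = Normal(lam, 1), so the ELBO is maximized at lam = 3. Gradients use the likelihood-ratio estimator with the batch mean of the log-weights as a crude baseline a.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z):
    return -0.5 * (z - 3.0) ** 2      # unnormalized log joint (toy)

def log_q(z, lam):
    return -0.5 * (z - lam) ** 2      # unnormalized log q(z; lam)

lam, rho, S, T = 0.0, 0.05, 100, 3000
for t in range(T):
    z = rng.normal(lam, 1.0, size=S)   # z^(s) ~ q(z; lam^(t-1))
    score = z - lam                    # grad_lam log q(z; lam)
    logw = log_p(z) - log_q(z, lam)    # log p(x,z) - log q(z; lam)
    a = logw.mean()                    # simple constant baseline
    grad = np.mean(score * (logw - a)) # MC estimate of the ELBO gradient
    lam = lam + rho * grad             # stochastic gradient ascent step

print(lam)                             # close to 3.0
```

Swapping the sample-mean baseline for the optimal control-variate coefficient of Slide 13 would reduce the gradient variance further.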
SLIDE 16
Improvements on SGA

Normal SGA: \lambda^{(t)} = \lambda^{(t-1)} + \rho^{(t)} \hat\nabla_\lambda \mathcal{L}(\lambda)

Problem 1: The estimator may be high-variance (even with control variates):

    \mathbb{E}\left[ \hat\nabla_\lambda \mathcal{L}(\lambda) \right] = \nabla_\lambda \mathcal{L}(\lambda)
    \quad \text{but} \quad
    \mathrm{Var}\left[ \hat\nabla_\lambda \mathcal{L}(\lambda) \right] \gg \| \nabla_\lambda \mathcal{L}(\lambda) \|^2

Problem 2: We have no way of computing the natural gradient — inverting H takes O(D^3) when \lambda \in \mathbb{R}^D:

    \lambda^{(t)} = \lambda^{(t-1)} - \rho^{(t)} H^{-1} \nabla_\lambda \mathcal{L}(\lambda),
    \qquad
    H_{ij} = \frac{\partial^2 \mathcal{L}}{\partial \lambda_i \, \partial \lambda_j}
SLIDE 17
Improvements on SGA: ADAM

Parameters: \beta_1, \beta_2, \epsilon, \rho^{(1)}, ..., \rho^{(T)}

1. Initialize \lambda^{(1)} (randomly), g^{(1)} = 0, m^{(1)} = 0, v^{(1)} = 0
2. For t in 1, ..., T:
   - g^{(t)} = \hat\nabla_\lambda \mathcal{L}(\lambda^{(t-1)})
   - m^{(t)} = \beta_1 m^{(t-1)} + (1 - \beta_1) g^{(t)}
   - \hat m^{(t)} = m^{(t)} / (1 - \beta_1^t)
   - v^{(t)} = \beta_2 v^{(t-1)} + (1 - \beta_2) \big( g^{(t)} \big)^2
   - \hat v^{(t)} = v^{(t)} / (1 - \beta_2^t)
   - \lambda^{(t)} = \lambda^{(t-1)} + \rho^{(t)} \hat m^{(t)} / \big( \sqrt{\hat v^{(t)}} + \epsilon \big)
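The ADAM update can be sketched directly from the slide. The toy objective and the hyperparameter values (standard defaults) are our own choices, not fixed by the slide; gradient *ascent* on L(lam) = -(lam - 3)^2 with exact gradient -2(lam - 3).

```python
# ADAM with bias-corrected first and second moments, as on the slide.
beta1, beta2, eps, rho = 0.9, 0.999, 1e-8, 0.1
lam, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    g = -2.0 * (lam - 3.0)                  # g^(t) = grad L(lam^(t-1))
    m = beta1 * m + (1 - beta1) * g         # first-moment running average
    v = beta2 * v + (1 - beta2) * g**2      # second-moment running average
    m_hat = m / (1 - beta1**t)              # bias corrections
    v_hat = v / (1 - beta2**t)
    lam = lam + rho * m_hat / (v_hat**0.5 + eps)

print(lam)                                  # close to 3.0
```

The m term is the "smooth over time steps" improvement and the 1/\sqrt{\hat v} term is the diagonal curvature approximation discussed on the later slides.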
SLIDE 18

SLIDE 19

SLIDE 20
SLIDE 21
Improvements on SGA

Normal SGA: \lambda^{(t)} = \lambda^{(t-1)} + \rho^{(t)} \hat\nabla_\lambda \mathcal{L}(\lambda)

Improvement 2: Smooth over time steps.

Problem:

    \mathbb{E}\left[ \hat\nabla_\lambda \mathcal{L}(\lambda) \right] = \nabla_\lambda \mathcal{L}(\lambda)
    \quad \text{but} \quad
    \mathrm{Var}\left[ \hat\nabla_\lambda \mathcal{L}(\lambda) \right] \gg \| \nabla_\lambda \mathcal{L}(\lambda) \|^2

Solution: Average the gradient estimates over time steps (the running-average moments in ADAM).
SLIDE 22
Black Box Variational Inference

Algorithm:
1. Initialize \lambda^{(1)} (randomly)
2. For t in 1, ..., T:
   - For s in 1, ..., S:
     - z^{(s)} \sim q(z; \lambda^{(t-1)})
     - g(z^{(s)}) = \nabla_\lambda \log q(z^{(s)}; \lambda^{(t-1)})
     - f(z^{(s)}) = g(z^{(s)}) \log \frac{p(x, z^{(s)})}{q(z^{(s)}; \lambda^{(t-1)})}
   - \bar f = \frac{1}{S} \sum_{s} f(z^{(s)}), \qquad \bar g = \frac{1}{S} \sum_{s} g(z^{(s)})
   - \hat a = \frac{\sum_{s} \big( f(z^{(s)}) - \bar f \big) \big( g(z^{(s)}) - \bar g \big)}{\sum_{s} \big( g(z^{(s)}) - \bar g \big)^2}
SLIDE 23
Improvements on SGA

Normal SGA (normal gradient, not natural gradient): \lambda^{(t)} = \lambda^{(t-1)} + \rho^{(t)} \hat\nabla_\lambda \mathcal{L}(\lambda)

Improvement 1: Approximate Natural Gradient

    \lambda^{(t)} = \lambda^{(t-1)} - \rho^{(t)} H^{-1} \nabla_\lambda \mathcal{L}(\lambda),
    \qquad
    H_{ij} = \frac{\partial^2 \mathcal{L}}{\partial \lambda_i \, \partial \lambda_j}

Problem: H^{-1} requires O(D^3) time for \lambda \in \mathbb{R}^D.

Approximation: Replace H with its diagonal,

    H \approx \mathrm{diag}(H),
    \qquad
    \mathrm{diag}(H)_{ii} = \frac{\partial^2 \mathcal{L}}{\partial \lambda_i^2}
    \approx \mathbb{E}\left[ \big( \hat\nabla_{\lambda_i} \mathcal{L}(\lambda) \big)^2 \right]^{1/2}