161-17 Stochastic Variational Inference I (Lecture) - PowerPoint PPT Presentation



slide-1
SLIDE 1: Lecture 161-17
Stochastic Variational Inference I
Scribes: Kaushal Panchi, Jay DeYoung
slide-2
SLIDE 2: Recap: Variational Inference

Goal: Approximate the posterior.

Generative model: $p(x, z, \beta) = p(x \mid z, \beta)\, p(z)\, p(\beta)$

Variational approximation: $q(z; \phi)\, q(\beta; \lambda) \approx p(z, \beta \mid x)$

Objective: Evidence Lower Bound (ELBO)
$$\mathcal{L}(\phi, \lambda) = \mathbb{E}_{q(z;\phi)\,q(\beta;\lambda)}\!\left[\log \frac{p(x, z, \beta)}{q(z;\phi)\,q(\beta;\lambda)}\right] = \log p(x) - \mathrm{KL}\big(q(z;\phi)\,q(\beta;\lambda) \,\big\|\, p(z, \beta \mid x)\big)$$
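As a sanity check on the identity $\mathcal{L} = \log p(x) - \mathrm{KL}$, here is a minimal sketch for a toy conjugate Gaussian model with a single latent $z$ and no $\beta$; the model, values, and tolerances are illustrative assumptions, not from the lecture:

```python
import math

# Toy model: z ~ N(0, 1), x | z ~ N(z, 1); variational family q(z) = N(m, s2).
# Then p(x) = N(x; 0, 2) and the exact posterior is N(x/2, 1/2).

def elbo(x, m, s2):
    """Analytic ELBO E_q[log p(x, z) - log q(z)] for the toy model."""
    e_log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    e_log_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return e_log_lik + e_log_prior + entropy

def log_evidence(x):
    """log p(x) = log N(x; 0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4.0

x = 1.3
# At the exact posterior q = N(x/2, 1/2) the KL term vanishes: ELBO = log p(x).
assert abs(elbo(x, x / 2, 0.5) - log_evidence(x)) < 1e-9
# Anywhere else the ELBO is a strict lower bound on log p(x).
assert elbo(x, 0.0, 1.0) < log_evidence(x)
```

The gap between `log_evidence(x)` and `elbo(x, m, s2)` is exactly the KL divergence from $q$ to the posterior, which is why maximizing the ELBO tightens the approximation.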
slide-3
SLIDE 3: Latent Dirichlet Allocation: Generative Model

Generative model:
$$\beta_k \sim \mathrm{Dirichlet}(\eta), \qquad \theta_d \sim \mathrm{Dirichlet}(\alpha)$$
$$z_{dn} \sim \mathrm{Discrete}(\theta_d), \qquad x_{dn} \mid z_{dn} = k \sim \mathrm{Discrete}(\beta_k)$$

$$p(x, z, \beta) = \left(\prod_{d=1}^{D} p(x_d \mid z_d, \beta)\, p(z_d)\right) \prod_{k=1}^{K} p(\beta_k), \qquad p(z_d) = \int d\theta_d\, p(z_d \mid \theta_d)\, p(\theta_d; \alpha)$$
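The generative process above can be sketched by ancestral sampling with NumPy; the topic count, document count, vocabulary size, and hyperparameter values below are arbitrary illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, V, N = 3, 5, 20, 10      # topics, docs, vocab size, words per doc
eta, alpha = 0.1, 0.5          # Dirichlet hyperparameters (illustrative)

beta = rng.dirichlet(np.full(V, eta), size=K)     # beta_k ~ Dirichlet(eta)
theta = rng.dirichlet(np.full(K, alpha), size=D)  # theta_d ~ Dirichlet(alpha)

docs = []
for d in range(D):
    z = rng.choice(K, size=N, p=theta[d])                # z_dn ~ Discrete(theta_d)
    x = np.array([rng.choice(V, p=beta[k]) for k in z])  # x_dn | z_dn=k ~ Discrete(beta_k)
    docs.append(x)

assert all(doc.shape == (N,) for doc in docs)
assert all((0 <= doc).all() and (doc < V).all() for doc in docs)
```

Note that $\theta_d$ is sampled explicitly here, whereas the joint $p(x,z,\beta)$ above integrates it out; the two views describe the same distribution over $(x, z, \beta)$.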
slide-4
SLIDE 4: LDA Global vs Local Parameters

Generative model: $p(x, z, \beta) = p(x \mid z, \beta)\, p(z)\, p(\beta)$
Variational approximation: $q(z; \phi)\, q(\beta; \lambda) \approx p(z, \beta \mid x)$

$$p(x, z, \beta) = \left(\prod_{d=1}^{D} p(x_d \mid z_d, \beta)\, p(z_d)\right) \prod_{k=1}^{K} p(\beta_k)$$
$$q(z; \phi)\, q(\beta; \lambda) = \left(\prod_{d=1}^{D} q(z_d; \phi_d)\right) \prod_{k=1}^{K} q(\beta_k; \lambda_k)$$

$\phi_d$: local parameters (only depend on doc $d$). $\lambda$: global parameters (depend on all docs).
slide-5
SLIDE 5: Stochastic Variational Inference

Problem: Wikipedia has 5M+ entries. If we wanted to do VBEM then we would need updates
$$\hat{\lambda} = \arg\max_{\lambda}\; \mathcal{L}(\phi, \lambda), \qquad \hat{\phi} = \arg\max_{\phi}\; \mathcal{L}(\phi, \lambda)$$
Each of these updates requires a full pass over the 5M+ documents.

Solution: Optimize $\lambda$ with stochastic gradient descent.
slide-6
SLIDE 6: Stochastic Gradient Descent

Idea: Optimize the objective with a noisy estimate of the gradient
$$\lambda^{(t+1)} = \lambda^{(t)} + \rho_t\, \hat{\nabla}_{\lambda} \mathcal{L}(\lambda)$$
where $\rho_t$ is the step size and $\hat{\nabla}_{\lambda} \mathcal{L}(\lambda)$ is an approximation of the gradient.

Requirement 1: The gradient estimate should be unbiased:
$$\mathbb{E}\big[\hat{\nabla}_{\lambda} \mathcal{L}(\lambda)\big] = \nabla_{\lambda} \mathcal{L}(\lambda)$$

Requirement 2: Robbins-Monro conditions:
$$\sum_{t=1}^{\infty} \rho_t = \infty \;\; (\text{infinite mean displacement}), \qquad \sum_{t=1}^{\infty} \rho_t^2 < \infty \;\; (\text{finite variance in displacement})$$
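A common step-size schedule satisfying both Robbins-Monro conditions is $\rho_t = (t + \tau)^{-\kappa}$ with $\tau \ge 0$ and $\kappa \in (0.5, 1]$. A minimal sketch of SGD ascent under this schedule; the toy objective, schedule constants, and noise level are illustrative assumptions, not from the lecture:

```python
import random

random.seed(0)

def rho(t, tau=10.0, kappa=0.6):
    # Robbins-Monro: sum rho_t diverges, sum rho_t^2 converges for 0.5 < kappa <= 1.
    return (t + tau) ** (-kappa)

# Maximize L(lam) = -0.5 * (lam - 3)^2 using a noisy but unbiased gradient.
lam = 0.0
for t in range(1, 20001):
    noisy_grad = -(lam - 3.0) + random.gauss(0.0, 1.0)  # E[estimate] = true gradient
    lam += rho(t) * noisy_grad

assert abs(lam - 3.0) < 0.2  # converges near the optimum despite the noise
```

The divergent sum lets the iterate travel arbitrarily far from a bad initialization, while the summable squares force the noise-driven jitter to die out.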
slide-7
SLIDE 7: Stochastic Variational Inference

Approximation: Compute the ELBO for a batch of docs.
$$\mathcal{L}(\lambda) = \max_{\phi}\; \mathcal{L}(\phi, \lambda) = \max_{\phi}\; \mathbb{E}_{q(z;\phi)\,q(\beta;\lambda)}\!\left[\log \frac{p(x, z, \beta)}{q(z;\phi)\,q(\beta;\lambda)}\right]$$
$$= \sum_{d=1}^{D} \max_{\phi_d}\; \mathbb{E}_{q(z_d;\phi_d)\,q(\beta;\lambda)}\!\left[\log \frac{p(x_d, z_d, \beta)}{q(z_d;\phi_d)}\right] - \mathbb{E}_{q(\beta;\lambda)}\big[\log q(\beta;\lambda)\big]$$
slide-8
SLIDE 8: Stochastic Variational Inference

Approximation: Compute the ELBO for a batch of docs. Rewriting the objective as a sum of per-document terms:
$$\mathcal{L}(\lambda) = \sum_{d=1}^{D} \left( \max_{\phi_d}\; \mathbb{E}_{q(z_d;\phi_d)\,q(\beta;\lambda)}\!\left[\log \frac{p(x_d, z_d, \beta)}{q(z_d;\phi_d)}\right] - \frac{1}{D}\,\mathbb{E}_{q(\beta;\lambda)}\big[\log q(\beta;\lambda)\big] \right)$$

Choose a batch of docs: $x_b \sim \mathrm{Uniform}(\{x^1, \ldots, x^D\})$
$$\hat{\mathcal{L}}(\lambda) = D\, \max_{\phi_b}\; \mathbb{E}_{q(z_b;\phi_b)\,q(\beta;\lambda)}\!\left[\log \frac{p(x_b, z_b, \beta)}{q(z_b;\phi_b)}\right] - \mathbb{E}_{q(\beta;\lambda)}\big[\log q(\beta;\lambda)\big]$$
(Assume we can do the inner maximization with VB.) Since $x_b$ is drawn uniformly, $\mathbb{E}\big[\hat{\mathcal{L}}(\lambda)\big] = \mathcal{L}(\lambda)$, so the estimate is unbiased.
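The key property of the batched objective is unbiasedness: scaling a uniformly sampled per-document term by $D$ makes its expectation equal the full sum over documents. A small sketch of just that scaling trick; the numbers stand in for per-document ELBO terms and are invented:

```python
import random

random.seed(1)

# Per-document contributions to a sum (stand-ins for per-doc ELBO terms).
per_doc = [0.7, -1.2, 3.4, 0.1, 2.2]
D = len(per_doc)
full_sum = sum(per_doc)

# Exact expectation of the estimator D * f(x_b) with x_b ~ Uniform over docs:
expectation = sum(D * f for f in per_doc) / D
assert abs(expectation - full_sum) < 1e-12  # unbiased by construction

# Monte Carlo check of the same fact with uniform sampling.
samples = [D * random.choice(per_doc) for _ in range(200000)]
mc = sum(samples) / len(samples)
assert abs(mc - full_sum) < 0.1
```

The same argument carries over when $f$ is the per-document ELBO term above: the estimator is noisy, but its mean is the full objective, which is exactly what SGD requires.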
slide-9
SLIDE 9: Stochastic Variational Inference

TL;DR: For conditionally conjugate exponential family models we can compute a natural gradient
$$\tilde{\nabla}_{\lambda} \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\phi)}\big[\eta_g(x, z, \alpha)\big] - \lambda$$
$$\eta_g(x, z, \alpha) = \Big(\alpha_1 + \textstyle\sum_{d} t(x_d, z_d),\;\; \alpha_2 + D\Big)$$
where the $t(x_d, z_d)$ are sufficient statistics. This yields gradient updates
$$\lambda^{(t)} = \lambda^{(t-1)} + \rho_t\, \tilde{\nabla}_{\lambda} \mathcal{L}(\lambda) = (1 - \rho_t)\,\lambda^{(t-1)} + \rho_t\, \hat{\lambda}_t, \qquad \hat{\lambda}_t = \mathbb{E}_{q(z;\phi)}\big[\eta_g(x, z, \alpha)\big]$$
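The update $\lambda^{(t)} = (1-\rho_t)\lambda^{(t-1)} + \rho_t \hat{\lambda}_t$ is a decaying weighted average of per-batch target parameters. A toy sketch of that loop; the "documents", their sufficient statistics, and the form $\hat{\lambda}_t = \eta + D\, t(x_b)$ are invented stand-ins for the conditionally conjugate intermediate estimate:

```python
import random

random.seed(2)

eta = 1.0                          # prior hyperparameter (illustrative)
doc_stats = [2.0, 4.0, 6.0, 8.0]   # per-document sufficient statistics (invented)
D = len(doc_stats)
target = eta + sum(doc_stats)      # what a full-data VB update would give

lam = eta
for t in range(1, 5001):
    rho = (t + 1.0) ** (-0.8)                # Robbins-Monro step size
    stat_b = random.choice(doc_stats)        # sample one "document" uniformly
    lam_hat = eta + D * stat_b               # intermediate global parameter
    lam = (1.0 - rho) * lam + rho * lam_hat  # natural-gradient step

assert abs(lam - target) / target < 0.25
```

Because each $\hat{\lambda}_t$ has expectation equal to the full-data update, the averaged iterate settles near the value batch VB would have computed, without ever touching all documents at once.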
slide-10
SLIDE 10: Natural Gradients: Coordinate Invariance

Example: Suppose we have two equivalent sets of coordinates $x$ and $\tilde{x}$:
$$\tilde{x}_1 = \frac{x_1}{\sigma_1}, \qquad \tilde{x}_2 = \frac{x_2}{\sigma_2}$$
$$L(x_1, x_2) = -\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right), \qquad L(\tilde{x}_1, \tilde{x}_2) = -(\tilde{x}_1^2 + \tilde{x}_2^2)$$
slide-11
SLIDE 11: Gradient Values Change with Coordinates

With $\tilde{x}_1 = x_1/\sigma_1$, $\tilde{x}_2 = x_2/\sigma_2$ and
$$L(x_1, x_2) = -\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right), \qquad L(\tilde{x}_1, \tilde{x}_2) = -(\tilde{x}_1^2 + \tilde{x}_2^2)$$
the gradients at the same point differ:
$$\frac{\partial L}{\partial x_1} = -\frac{2 x_1}{\sigma_1^2}, \qquad \frac{\partial L}{\partial \tilde{x}_1} = -2\tilde{x}_1 = -\frac{2 x_1}{\sigma_1}$$
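This coordinate dependence is easy to check numerically; the sketch below also verifies the chain-rule relation $\nabla_{\tilde{x}} L = J^\top \nabla_x L$ for the diagonal rescaling (the $\sigma$ values and evaluation point are arbitrary):

```python
import numpy as np

s = np.array([2.0, 5.0])   # sigma_1, sigma_2 (arbitrary)
x = np.array([1.0, 1.0])   # a point in the original coordinates
xt = x / s                 # the same point in rescaled coordinates

grad_x = -2.0 * x / s**2   # dL/dx_i  = -2 x_i / sigma_i^2
grad_xt = -2.0 * xt        # dL/dxt_i = -2 xt_i = -2 x_i / sigma_i

J = np.diag(s)             # Jacobian dx_i/dxt_j of x = sigma * xt

assert not np.allclose(grad_x, grad_xt)    # same point, different gradient values
assert np.allclose(grad_xt, J.T @ grad_x)  # but they obey the chain rule
```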
slide-12
SLIDE 12: Coordinate-Invariant Gradients

Idea: Can we define a notion of steepest descent that is invariant under coordinate transformations?
$$dx = \arg\min_{dx}\; dx^\top \nabla_x L(x) \quad \text{s.t.} \quad dx^\top dx \le \varepsilon^2$$
$$dL = dx^\top \nabla_x L = d\tilde{x}^\top \nabla_{\tilde{x}} L$$

Jacobian: matrix of partial derivatives
$$J = \begin{bmatrix} \frac{\partial x_1}{\partial \tilde{x}_1} & \cdots & \frac{\partial x_1}{\partial \tilde{x}_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial \tilde{x}_1} & \cdots & \frac{\partial x_n}{\partial \tilde{x}_n} \end{bmatrix}$$
slide-13
SLIDE 13: Coordinate-Invariant Gradients

Observation: Differentials and gradients can be transformed using the chain rule:
$$dx_i = \sum_j \frac{\partial x_i}{\partial \tilde{x}_j}\, d\tilde{x}_j, \quad \text{i.e.} \quad dx = J\, d\tilde{x}, \qquad dx^\top dx = d\tilde{x}^\top J^\top J\, d\tilde{x}$$
$$\frac{\partial L}{\partial \tilde{x}_i} = \sum_j \frac{\partial x_j}{\partial \tilde{x}_i}\, \frac{\partial L}{\partial x_j}, \quad \text{i.e.} \quad \nabla_{\tilde{x}} = J^\top \nabla_x$$
slide-14
SLIDE 14: Coordinate-Invariant Gradients

Example: Polar coordinates
$$x_1 = r \cos\theta, \qquad x_2 = r \sin\theta, \qquad J = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}$$
Distance metric $G = J^\top J$:
$$dx^\top dx = dx_1^2 + dx_2^2 = d\tilde{x}^\top J^\top J\, d\tilde{x} = \begin{bmatrix} dr & d\theta \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & r^2 \end{bmatrix} \begin{bmatrix} dr \\ d\theta \end{bmatrix} = dr^2 + r^2\, d\theta^2$$
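The metric $J^\top J = \mathrm{diag}(1, r^2)$ can be verified numerically for an arbitrary polar point (the values of $r$ and $\theta$ below are illustrative):

```python
import numpy as np

r, theta = 2.5, 0.7  # arbitrary polar point

# Jacobian of (x1, x2) = (r cos(theta), r sin(theta)) w.r.t. (r, theta)
J = np.array([[np.cos(theta), -r * np.sin(theta)],
              [np.sin(theta),  r * np.cos(theta)]])

G = J.T @ J
assert np.allclose(G, np.diag([1.0, r**2]))  # dx^T dx = dr^2 + r^2 dtheta^2
```

The off-diagonal terms cancel because the $r$ and $\theta$ directions are orthogonal, and $\cos^2\theta + \sin^2\theta = 1$ handles the diagonal.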
slide-15
SLIDE 15: Coordinate-Invariant Gradients

Assumption: $d\tilde{x}$ points along $\nabla_{\tilde{x}} L(\tilde{x})$. Using $\nabla_{\tilde{x}} L(\tilde{x}) = J^\top \nabla_x L(x)$ and $dx = J\, d\tilde{x}$:
$$dL(\tilde{x}) = d\tilde{x}^\top \nabla_{\tilde{x}} L(\tilde{x}) = \big(\nabla_{\tilde{x}} L(\tilde{x})\big)^\top \nabla_{\tilde{x}} L(\tilde{x}) = \big(\nabla_x L(x)\big)^\top J J^\top\, \nabla_x L(x) \;\neq\; \big(\nabla_x L(x)\big)^\top \nabla_x L(x)$$
So following the plain gradient, the change in the objective depends on the choice of coordinates.
slide-16
SLIDE 16: Coordinate-Invariant Gradients

Solution: Define the natural gradient
$$d\tilde{x} = \tilde{\nabla}_{\tilde{x}} L(\tilde{x}) = (J^\top J)^{-1}\, \nabla_{\tilde{x}} L(\tilde{x})$$
Then
$$dL(\tilde{x}) = \big(\tilde{\nabla}_{\tilde{x}} L(\tilde{x})\big)^\top \nabla_{\tilde{x}} L(\tilde{x}) = \big(\nabla_x L(x)\big)^\top J (J^\top J)^{-1} J^\top\, \nabla_x L(x) = \big(\nabla_x L(x)\big)^\top \nabla_x L(x)$$
since $J (J^\top J)^{-1} J^\top = I$ for invertible $J$. The change in the objective is independent of the choice of coordinates.
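The invariance claim can be checked numerically using the rescaling example from slide 10; the $\sigma$ values and evaluation point are arbitrary:

```python
import numpy as np

s = np.array([2.0, 5.0])   # sigma (arbitrary)
x = np.array([1.0, -0.5])  # evaluation point (arbitrary)

grad_x = -2.0 * x / s**2   # gradient in original coordinates
J = np.diag(s)             # Jacobian dx/dxt for xt = x / sigma
grad_xt = J.T @ grad_x     # gradient in rescaled coordinates (chain rule)

# Plain gradient direction: the predicted change depends on the coordinates.
assert not np.isclose(grad_xt @ grad_xt, grad_x @ grad_x)

# Natural gradient direction: (J^T J)^{-1} corrects the metric, and the
# predicted change matches the original-coordinate value exactly.
nat_xt = np.linalg.inv(J.T @ J) @ grad_xt
assert np.isclose(nat_xt @ grad_xt, grad_x @ grad_x)
```

Here $J^\top J$ plays the role of the metric $G$; in the variational setting of the next slide the Fisher information of $q(\beta;\lambda)$ takes its place.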
slide-17
SLIDE 17: Natural Gradients in Variational Inference

Distance metric: symmetric KL divergence
$$\mathrm{KL}^{\mathrm{sym}}(\lambda, \lambda') = \mathbb{E}_{q(\beta;\lambda)}\!\left[\log \frac{q(\beta;\lambda)}{q(\beta;\lambda')}\right] + \mathbb{E}_{q(\beta;\lambda')}\!\left[\log \frac{q(\beta;\lambda')}{q(\beta;\lambda)}\right]$$
$$d\lambda^\top G(\lambda)\, d\lambda = \mathrm{KL}^{\mathrm{sym}}(\lambda, \lambda + d\lambda), \qquad \tilde{\nabla}_{\lambda} \mathcal{L}(\lambda) = G(\lambda)^{-1}\, \nabla_{\lambda} \mathcal{L}(\lambda)$$
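For small $d\lambda$ the symmetric KL behaves like a quadratic form in the Fisher information. A numeric sketch for a Bernoulli $q(\beta;\lambda)$ with mean parameter $\lambda$, where the Fisher information is $1/(\lambda(1-\lambda))$; this particular family and the values below are illustrative choices, not from the lecture:

```python
import math

def kl_bern(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

lam, dlam = 0.3, 1e-3
sym_kl = kl_bern(lam, lam + dlam) + kl_bern(lam + dlam, lam)

fisher = 1.0 / (lam * (1.0 - lam))  # G(lambda) for Bernoulli(lambda)
quadratic = fisher * dlam**2        # dlam^T G(lambda) dlam

# The symmetric KL matches the Fisher quadratic form to leading order.
assert abs(sym_kl - quadratic) / quadratic < 0.05
```

This is why preconditioning by $G(\lambda)^{-1}$ makes the step size meaningful in distribution space rather than in the arbitrary parameterization of $q$.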
slide-18
SLIDE 18: Coordinate-Invariant Gradients

Example: Polar coordinates
$$x_1 = r\cos\theta, \qquad x_2 = r\sin\theta$$
$$J = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}, \qquad J^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\frac{\sin\theta}{r} & \frac{\cos\theta}{r} \end{bmatrix}$$