Lecture 22: Stochastic Computation Graphs (Scribe: Heiko Zimmermann) - PowerPoint PPT Presentation

slide-1
SLIDE 1 Lecture 22: Stochastic Computation Graphs

Scribe: Heiko Zimmermann
slide-2
SLIDE 2 Auto-encoding Variational Methods: General View
  • Importance-weighted autoencoders & auto-encoding SMC: replace $w$ with an unbiased estimator $\hat{Z}$, $\mathbb{E}[\hat{Z}] = p_\theta(x)$
    $\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}[\log w] \le \log p_\theta(x)$, $w = \frac{p_\theta(x, z)}{q_\phi(z \mid x)}$
  • Wake-sleep methods: replace the lower bound with an upper bound
    $\mathcal{U}(\theta, \phi) \ge \log p_\theta(x)$
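The ordering of the bounds on this slide can be checked numerically. The sketch below is not from the lecture: it assumes a toy model $p(z) = \mathrm{Normal}(0, 1)$, $p(x \mid z) = \mathrm{Normal}(z, 1)$ (so $p(x) = \mathrm{Normal}(0, 2)$ is known exactly) and uses the prior as proposal $q$. The names `log_weight` and `bound` are illustrative. With $K = 1$ the estimate is the standard lower bound; with $K > 1$ the averaged weight plays the role of $\hat{Z}$.

```python
import math
import random

def log_normal(v, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (v - mean) ** 2 / var

def log_weight(x, z, q_mean, q_var):
    """log w = log p(x, z) - log q(z | x) for the assumed toy model."""
    log_p = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
    return log_p - log_normal(z, q_mean, q_var)

def bound(x, q_mean, q_var, K, n_rep, rng):
    """Monte Carlo estimate of E[log (1/K) sum_k w_k]."""
    total = 0.0
    for _ in range(n_rep):
        lw = [log_weight(x, rng.gauss(q_mean, math.sqrt(q_var)), q_mean, q_var)
              for _ in range(K)]
        m = max(lw)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(v - m) for v in lw) / K)
    return total / n_rep

rng = random.Random(0)
x = 2.0
log_px = log_normal(x, 0.0, 2.0)                      # exact log marginal
elbo = bound(x, 0.0, 1.0, K=1, n_rep=4000, rng=rng)   # single-sample bound
iwae = bound(x, 0.0, 1.0, K=16, n_rep=4000, rng=rng)  # importance-weighted bound
print(elbo, iwae, log_px)
```

Under these assumptions the single-sample bound should sit well below the importance-weighted one, which in turn stays below the exact log marginal.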
slide-3
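SLIDE 3 Gradient Estimation in Variational Inference
Reinforce-style:
$\frac{d}{d\phi} \mathcal{L}(\theta, \phi) = \frac{d}{d\phi} \mathbb{E}_{q_\phi(z \mid x)}[\log w_{\theta,\phi}(x, z)]$, $w_{\theta,\phi}(x, z) = \frac{p_\theta(x, z)}{q_\phi(z \mid x)}$
$\approx \frac{1}{K} \sum_{k=1}^{K} \frac{d}{d\phi} \log q_\phi(z^k \mid x) \, \left(\log w_{\theta,\phi}(x, z^k) - b(x)\right)$, $z^k \sim q_\phi(z \mid x)$
Reparameterized (Normal: $z(\epsilon, x) = \mu_\phi(x) + \sigma_\phi(x)\, \epsilon$):
$\frac{d}{d\phi} \mathcal{L}(\theta, \phi) = \frac{d}{d\phi} \mathbb{E}_{p(\epsilon)}[\log w_{\theta,\phi}(x, \epsilon)]$, $w_{\theta,\phi}(x, \epsilon) = \frac{p_\theta(x, z(\epsilon, x))}{q_\phi(z(\epsilon, x) \mid x)}$
$\approx \frac{1}{K} \sum_{k=1}^{K} \frac{d}{d\phi} \log w_{\theta,\phi}(x, \epsilon^k)$, $\epsilon^k \sim p(\epsilon)$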
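A minimal sketch comparing the two estimators on a case where the exact gradient is known. Everything here is an assumption for illustration, not the lecture's model: $p(z) = \mathrm{Normal}(0, 1)$, $p(x \mid z) = \mathrm{Normal}(z, 1)$, $q_\phi(z \mid x) = \mathrm{Normal}(\phi, 1)$ and a single observation `X = 2.0`. For this choice the bound has gradient $x - 2\phi$ in closed form.

```python
import math
import random

X = 2.0  # assumed observation

def log_w(z, phi):
    """log w = log p(X, z) - log q_phi(z | X) for the assumed toy model."""
    log_p = (-0.5 * math.log(2 * math.pi) - 0.5 * z ** 2
             - 0.5 * math.log(2 * math.pi) - 0.5 * (X - z) ** 2)
    log_q = -0.5 * math.log(2 * math.pi) - 0.5 * (z - phi) ** 2
    return log_p - log_q

def mean_var(values):
    m = sum(values) / len(values)
    v = sum((g - m) ** 2 for g in values) / (len(values) - 1)
    return m, v

def gradient_samples(phi, K, rng, baseline=0.0):
    reinforce, reparam = [], []
    for _ in range(K):
        eps = rng.gauss(0.0, 1.0)
        z = phi + eps  # z(eps) = mu + sigma * eps with mu = phi, sigma = 1
        # Reinforce-style: d/dphi log q(z | x) = (z - phi) for unit variance.
        reinforce.append((z - phi) * (log_w(z, phi) - baseline))
        # Reparameterized: d/dphi log w(x, z(eps)) = -z + (X - z);
        # log q(z(eps) | x) does not depend on phi once eps is fixed.
        reparam.append(X - 2.0 * z)
    return reinforce, reparam

rng = random.Random(0)
phi = 0.0
rf, rp = gradient_samples(phi, 20000, rng)
rf_mean, rf_var = mean_var(rf)
rp_mean, rp_var = mean_var(rp)
exact = X - 2.0 * phi
print(exact, rf_mean, rp_mean, rf_var, rp_var)
```

Both sample means should land near the exact value, with the reparameterized samples showing a noticeably smaller variance in this toy setting.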
slide-4
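SLIDE 4 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Data: $q(x)$   Encoder: $q_\phi(y, z \mid x)$   Decoder: $p_\theta(x, y, z)$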
slide-5
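SLIDE 5 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x \to h_1 \to y$, with parameters $\phi_1$]
Not reparameterized: $y \sim \mathrm{Discrete}(h_1(x, \phi_1))$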
slide-6
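SLIDE 6 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x \to h_1 \to y$, with parameters $\phi_1$ and loss node $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$]
Not reparameterized: $y \sim \mathrm{Discrete}(h_1(x, \phi_1))$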
slide-7
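SLIDE 7 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\phi$ and noise $\epsilon$]
Reparameterized: $z = z(h_2(x, y, \phi_2), \epsilon)$, $\epsilon \sim p(\epsilon)$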
slide-8
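SLIDE 8 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$, noise $\epsilon$, and loss nodes including $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$]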
slide-9
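SLIDE 9 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$, noise $\epsilon$, and loss nodes]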
slide-10
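SLIDE 10 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$, noise $\epsilon$, and derivatives annotated on the edges]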
slide-11
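SLIDE 11 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$, noise $\epsilon$, and derivatives annotated on the edges]
Problem 1: Cannot take the derivative w.r.t. a discrete value.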
slide-12
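SLIDE 12 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$, noise $\epsilon$, and derivatives annotated on the edges]
Problem 1: Cannot take the derivative w.r.t. a discrete value.
Problem 2: Cannot take the derivative of a stochastic value.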
slide-13
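SLIDE 13 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
$\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y, z \mid x)}\left[\log \frac{p_\theta(x, y, z)}{q_\phi(y, z \mid x)}\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$ and noise $\epsilon$]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$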
slide-14
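SLIDE 14 Stochastic Computation Graphs
Example: Structured VAE for MNIST (3 lectures ago)
Written in terms of the loss nodes (each is a negative log term of the bound):
$-\nabla_{\theta,\phi} \mathcal{L}(\theta, \phi) = \nabla_{\theta,\phi} \mathbb{E}_{q(x)}\left[\mathbb{E}_{q_\phi(y \mid x)\, p(\epsilon)}\left[\ell_x + \ell_z + \ell_y\right]\right]$
Idea: Represent the loss as a stochastic computation graph.
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$ and noise $\epsilon$]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$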
slide-15
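SLIDE 15 Gradients of Stochastic Computation Graphs
Idea: Combine Reinforce-style and reparameterized gradients.
$\frac{d\mathcal{L}}{d\theta} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d\ell_x}{d\theta}\right]$
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$ and noise $\epsilon$]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$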
slide-16
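SLIDE 16 Gradients of Stochastic Computation Graphs
Idea: Combine Reinforce-style and reparameterized gradients.
$\frac{d\mathcal{L}}{d\theta} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d\ell_x}{d\theta}\right]$
Reparameterized:
$\frac{d\mathcal{L}}{d\phi_2} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d\ell_x}{d\phi_2} + \frac{d\ell_z}{d\phi_2}\right]$
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$ and noise $\epsilon$]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$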
slide-17
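SLIDE 17 Gradients of Stochastic Computation Graphs
Idea: Combine Reinforce-style and reparameterized gradients.
$\frac{d\mathcal{L}}{d\theta} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d\ell_x}{d\theta}\right]$
Reparameterized:
$\frac{d\mathcal{L}}{d\phi_2} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d\ell_x}{d\phi_2} + \frac{d\ell_z}{d\phi_2}\right]$
Reinforce-style:
$\frac{d\mathcal{L}}{d\phi_1} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d}{d\phi_1} \log q(y \mid h_1)\, (\ell_x + \ell_y + \ell_z)\right]$
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$ and noise $\epsilon$]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$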
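Combining the two estimators in one graph can be sketched on a toy problem with a known answer. The setup is an assumption for illustration and much simpler than the structured VAE: a discrete node $y \sim \mathrm{Bernoulli}(\sigma(\phi_1))$, a reparameterized node $z = \phi_2 + y + \epsilon$ with $\epsilon \sim \mathrm{Normal}(0, 1)$, and a single generic loss $(z - c)^2$ whose expectation is differentiated directly (no sign flip as in the bound above). The names `gradient_estimates` and `exact_gradients` are hypothetical.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gradient_estimates(phi1, phi2, c, n, rng):
    """Estimate d E[loss] / d phi1 (Reinforce-style) and / d phi2 (reparameterized)."""
    p = sigmoid(phi1)
    g1 = g2 = 0.0
    for _ in range(n):
        y = 1.0 if rng.random() < p else 0.0  # discrete: not reparameterized
        eps = rng.gauss(0.0, 1.0)
        z = phi2 + y + eps                    # reparameterized through eps
        loss = (z - c) ** 2
        g1 += (y - p) * loss   # d/dphi1 log Bernoulli(y; sigmoid(phi1)) = y - p
        g2 += 2.0 * (z - c)    # pathwise derivative d loss / d phi2
    return g1 / n, g2 / n

def exact_gradients(phi1, phi2, c):
    """Closed form from E[loss] = p (phi2 + 1 - c)^2 + (1 - p) (phi2 - c)^2 + 1."""
    p = sigmoid(phi1)
    d1 = p * (1.0 - p) * (2.0 * (phi2 - c) + 1.0)
    d2 = 2.0 * (phi2 - c + p)
    return d1, d2

rng = random.Random(0)
est1, est2 = gradient_estimates(0.0, 0.5, 2.0, 50000, rng)
ex1, ex2 = exact_gradients(0.0, 0.5, 2.0)
print(est1, ex1, est2, ex2)
```

The parameter upstream of the discrete node only receives a score-function term, while the parameter downstream of it is differentiated through the sampled path.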
slide-18
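SLIDE 18 Gradients of Stochastic Computation Graphs
Question: What happens if we add an extra edge?
$\frac{d\mathcal{L}}{d\phi_1} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d}{d\phi_1} \log q(y \mid h_1)\, (\ell_x + \ell_y + \ell_z)\right]$
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$ and noise $\epsilon$]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$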
slide-19
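SLIDE 19 Gradients of Stochastic Computation Graphs
Question: What happens if we add an extra edge?
$\frac{d\mathcal{L}}{d\phi_1} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\left[\frac{d}{d\phi_1} \log q(y \mid h_1)\, (\ell_x + \ell_y + \ell_z) + \frac{d\ell_x}{d\phi_1} + \frac{d\ell_z}{d\phi_1}\right]$
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$, noise $\epsilon$, and the extra edge]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$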
slide-20
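SLIDE 20 Gradients of Stochastic Computation Graphs
Question: What happens if we add an extra edge?
Idea: Sum over paths.
$\frac{d\mathcal{L}}{d\phi_1} = -\mathbb{E}_{q(x)\, q_\phi(y \mid x)\, p(\epsilon)}\Big[\underbrace{\frac{d}{d\phi_1} \log q(y \mid h_1)\, (\ell_x + \ell_y + \ell_z)}_{\text{paths through } y} + \underbrace{\frac{d\ell_x}{d\phi_1} + \frac{d\ell_z}{d\phi_1}}_{\text{paths around } y}\Big]$
[Graph: $x$, $h_1$, $y$, $h_2$, $z$, with parameters $\theta$, $\phi$, noise $\epsilon$, and the extra edge]
Loss nodes: $\ell_x = -\log p_\theta(x \mid y, z)$, $\ell_z = -\log \frac{p(z)}{q(z \mid h_2)}$, $\ell_y = -\log \frac{p(y)}{q(y \mid h_1)}$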
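The effect of an extra edge can be illustrated with a small sketch. The graph is an assumption, not the one drawn on the slide: $y \sim \mathrm{Bernoulli}(\sigma(\phi_1))$ and $z = \phi_1 + \phi_2 + y + \epsilon$, so $\phi_1$ reaches the generic loss $(z - c)^2$ both through the discrete node $y$ and around it. The function names are hypothetical.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def path_terms(phi1, phi2, c, n, rng):
    """Return (term for paths through y, term for paths around y) for d E[loss] / d phi1."""
    p = sigmoid(phi1)
    through = around = 0.0
    for _ in range(n):
        y = 1.0 if rng.random() < p else 0.0
        eps = rng.gauss(0.0, 1.0)
        z = phi1 + phi2 + y + eps  # extra edge: phi1 also feeds z directly
        loss = (z - c) ** 2
        through += (y - p) * loss  # blocked by y: Reinforce-style term
        around += 2.0 * (z - c)    # unblocked: reparameterized term
    return through / n, around / n

def exact_gradient(phi1, phi2, c):
    """Closed-form d E[loss] / d phi1 for the assumed graph."""
    p = sigmoid(phi1)
    m = phi1 + phi2 - c
    return p * (1.0 - p) * (2.0 * m + 1.0) + 2.0 * (m + p)

rng = random.Random(0)
through, around = path_terms(0.0, 0.5, 2.0, 50000, rng)
exact = exact_gradient(0.0, 0.5, 2.0)
print(through, around, through + around, exact)
```

Only the sum of both terms should match the closed form; dropping the path around $y$ leaves an estimator that is far from the exact gradient in this example.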
slide-21
SLIDE 21 Gradients of Stochastic Computation Graphs
General rules: We can compute derivatives of an expected loss.
  1. Sum derivatives over all paths from loss nodes to parameters.
  2. Stochastic nodes "block" gradients.
  3. Use the Reinforce-style derivative for blocked paths.
  4. Use the reparameterized derivative for other paths.
[Graph: stochastic nodes $v_1$, $v_2$, $v_3$; deterministic nodes $d_1$, $d_2$, $d_3$, $d_4$; parameters $\theta_1$, $\theta_2$, $\theta_3$; loss $L = \ell_1 + \ell_2 + \ell_3$]
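The rules above amount to a graph traversal, sketched here under stated assumptions. The adjacency list is a guess at the wiring of the structured VAE example (node names such as `phi1`, `h1`, `l_x` are illustrative), and `gradient_terms` is a hypothetical helper: starting from a parameter, it follows deterministic edges, stops at stochastic nodes (which need a Reinforce-style term), and collects loss nodes reached without being blocked (which get a reparameterized term).

```python
def gradient_terms(children, stochastic, losses, param):
    """Follow deterministic paths from `param`; stochastic nodes block the search."""
    score_nodes, pathwise_losses = set(), set()
    stack, seen = list(children.get(param, [])), set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in stochastic:
            score_nodes.add(node)      # rule 2 and 3: blocked, use Reinforce-style
            continue
        if node in losses:
            pathwise_losses.add(node)  # rule 4: unblocked path to a loss node
        stack.extend(children.get(node, []))
    return score_nodes, pathwise_losses

# Assumed wiring: z is a deterministic function of h2 and the noise eps.
children = {
    "x": ["h1", "h2"],
    "phi1": ["h1"],
    "h1": ["y", "l_y"],
    "y": ["h2", "l_x", "l_y"],
    "phi2": ["h2"],
    "h2": ["z", "l_z"],
    "eps": ["z"],
    "z": ["l_x", "l_z"],
    "theta": ["l_x"],
}
stochastic = {"y"}
losses = {"l_x", "l_y", "l_z"}

for param in ("theta", "phi1", "phi2"):
    print(param, gradient_terms(children, stochastic, losses, param))
```

For this assumed graph, `phi1` is blocked by `y`, while `phi2` and `theta` reach their loss nodes through deterministic paths only. The direct path from `phi1` to `l_y` is reported as unblocked; its term is the derivative of a log density under its own expectation and so averages to zero.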

slide-22
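SLIDE 22 Gradients of Stochastic Computation Graphs
$\frac{d}{d\theta_i} \mathbb{E}[L] = \mathbb{E}_{p(v_1, \ldots, v_n)}\Big[\underbrace{\sum_{j : \theta_i \prec v_j} \frac{d}{d\theta_i} \log p(v_j \mid \cdot)\, L}_{\text{sum over "blocked" paths}} + \underbrace{\sum_{k : \theta_i \prec \ell_k} \frac{d}{d\theta_i} \ell_k}_{\text{sum over "unblocked" paths}}\Big]$
[Graph: stochastic nodes $v_1$, $v_2$, $v_3$; deterministic nodes $d_1$, $d_2$, $d_3$, $d_4$; parameters $\theta_1$, $\theta_2$, $\theta_3$; loss $L = \ell_1 + \ell_2 + \ell_3$]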

slide-23
SLIDE 23 Gradients of Stochastic Computation Graphs
$\frac{d}{d\theta_i} \mathbb{E}[L] = \mathbb{E}_{p(v_1, \ldots, v_n)}\Big[\underbrace{\sum_{j : \theta_i \prec v_j} \frac{d}{d\theta_i} \log p(v_j \mid \cdot)\, L}_{\text{sum over "blocked" paths}} + \underbrace{\sum_{k : \theta_i \prec \ell_k} \frac{d}{d\theta_i} \ell_k}_{\text{sum over "unblocked" paths}}\Big]$

              Variational Inference (and max likelihood)    Policy Search (and model-based RL)
$\theta_i$    Generative / inference model parameters       Policy parameters (+ world model, in model-based RL)
$v_j$         Latent variables                              States and actions
$\ell_k$      Log incremental weights                       Incremental rewards
slide-24
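SLIDE 24 Credit Assignment in Stochastic Computation Graphs
$\frac{d}{d\theta_i} \mathbb{E}[L] = \mathbb{E}_{p(v_1, \ldots, v_n)}\Big[\underbrace{\sum_{j : \theta_i \prec v_j} \frac{d}{d\theta_i} \log p(v_j \mid \cdot)\, L}_{\text{sum over "blocked" paths}} + \underbrace{\sum_{k : \theta_i \prec \ell_k} \frac{d}{d\theta_i} \ell_k}_{\text{sum over "unblocked" paths}}\Big]$
Credit Assignment: The loss $L = \sum_k \ell_k$ depends on all variables $\{v_j\}$. If we sample $v^s \sim p(v)$, then some $v_j^s$ are likely "good" samples that increase $L^s$, whereas "bad" samples decrease $L^s$. As a result, bad samples may get "credit", because every term in the blocked-path sum is multiplied by the same total $L$.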
slide-25
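SLIDE 25 Credit Assignment in Stochastic Computation Graphs
$\frac{d}{d\theta_i} \mathbb{E}[L] = \mathbb{E}_{p(v_1, \ldots, v_n)}\Big[\underbrace{\sum_{j : \theta_i \prec v_j} \frac{d}{d\theta_i} \log p(v_j \mid \cdot)\, L}_{\text{sum over "blocked" paths}} + \underbrace{\sum_{k : \theta_i \prec \ell_k} \frac{d}{d\theta_i} \ell_k}_{\text{sum over "unblocked" paths}}\Big]$
Idea: Replace $L$ with a variable-specific loss $L_j$ such that the expected gradient remains the same but the variance is lower.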
slide-26
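SLIDE 26 Credit Assignment in Stochastic Computation Graphs
$\frac{d}{d\theta_i} \mathbb{E}[L] = \mathbb{E}_{p(v_1, \ldots, v_n)}\left[\sum_{j : \theta_i \prec v_j} \frac{d}{d\theta_i} \log p(v_j \mid \cdot)\, (Q_j - B_j) + \sum_{k : \theta_i \prec \ell_k} \frac{d}{d\theta_i} \ell_k\right]$
  1. Replace $L$ with a $Q_j$ for which $\mathbb{E}\left[\frac{d}{d\theta} \log p(v_j \mid \cdot)\, (L - Q_j)\right] = 0$ (Rao-Blackwellization).
  2. Add / subtract any baseline $B_j$ with $\mathbb{E}\left[\frac{d}{d\theta} \log p(v_j \mid \cdot)\, B_j\right] = 0$ (also known as a control variate).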
slide-27
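SLIDE 27 Baseline Functions
Definition: Any scalar function of a set of non-descendant variables $\{v_b\}$, i.e. such that $v_b \nsucc v_j$, is a baseline.
[Graph: stochastic nodes $v_1$, $v_2$, $v_3$; deterministic nodes $d_1$, $d_2$, $d_3$, $d_4$; parameters $\theta_1$, $\theta_2$, $\theta_3$]
A baseline $b(v_1, v_2)$ is constant relative to the expectation over $v_3$:
$\mathbb{E}_{p(v_1, v_2, v_3)}\left[\frac{d}{d\theta} \log p(v_3 \mid v_1, v_2)\, b(v_1, v_2)\right] = \mathbb{E}_{p(v_1, v_2)}\left[\mathbb{E}_{p(v_3 \mid v_1, v_2)}\left[\frac{d}{d\theta} \log p(v_3 \mid v_1, v_2)\right] b(v_1, v_2)\right] = 0$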
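That a baseline leaves the expected gradient unchanged while reducing variance can be checked with a short simulation. The setup is assumed for illustration: $y \sim \mathrm{Bernoulli}(\sigma(\phi))$, $z = y + \epsilon$, and a loss $(z - c)^2 + 10$ with a large constant offset. The baseline is the constant expected loss, which does not depend on the sampled $y$.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def score_gradient(phi, c, baseline, n, rng):
    """Mean and variance of Reinforce-style samples (y - p) * (loss - baseline)."""
    p = sigmoid(phi)
    samples = []
    for _ in range(n):
        y = 1.0 if rng.random() < p else 0.0
        z = y + rng.gauss(0.0, 1.0)
        loss = (z - c) ** 2 + 10.0       # offset inflates the variance
        samples.append((y - p) * (loss - baseline))
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var

phi, c = 0.0, 2.0
p = sigmoid(phi)
# Closed forms for the assumed model.
expected_loss = p * ((1.0 - c) ** 2 + 1.0) + (1.0 - p) * (c ** 2 + 1.0) + 10.0
exact = p * (1.0 - p) * (1.0 - 2.0 * c)

rng = random.Random(0)
mean_plain, var_plain = score_gradient(phi, c, 0.0, 50000, rng)
mean_base, var_base = score_gradient(phi, c, expected_loss, 50000, rng)
print(exact, mean_plain, mean_base, var_plain, var_base)
```

Both estimates should agree with the closed-form gradient, and the variance with the baseline should be much smaller because the constant offset no longer multiplies the score term.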
slide-28
SLIDE 28 Baseline and Critic Functions
Definition: Any scalar function of a set of non-descendant variables $\{v_b\}$, i.e. such that $v_b \nsucc v_j$, is a baseline.
[Graph: stochastic nodes $v_1$, $v_2$, $v_3$; deterministic nodes $d_1$, $d_2$, $d_3$, $d_4$; parameters $\theta_1$, $\theta_2$, $\theta_3$]
Cost-to-go: $L_j = L - \sum_{k < j} \ell_k = \sum_{k \ge j} \ell_k$
Baseline: loss before $j$; Critic: loss after $j$
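The split into loss before $j$ and cost-to-go can be written as two cumulative sums. This sketch assumes the loss terms are given as a list ordered so that index $k$ matches the ordering of the variables; `cost_to_go` and `loss_before` are illustrative names.

```python
def cost_to_go(losses):
    """L_j = sum of losses[k] for k >= j, computed with a reverse cumulative sum."""
    out, running = [], 0.0
    for value in reversed(losses):
        running += value
        out.append(running)
    out.reverse()
    return out

def loss_before(losses):
    """Sum of losses[k] for k < j: the part that cannot depend on v_j."""
    out, running = [], 0.0
    for value in losses:
        out.append(running)
        running += value
    return out

losses = [1.0, 2.0, 3.0]
total = sum(losses)
after = cost_to_go(losses)    # [6.0, 5.0, 3.0]
before = loss_before(losses)  # [0.0, 1.0, 3.0]
print(after, before)
```

At every index the two parts add up to the total loss, which is the identity $L_j = L - \sum_{k < j} \ell_k$.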
slide-33
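SLIDE 33 Baseline and Critic Functions
$p(A, B) = p(A \mid B)\, p(B)$
$\mathbb{E}_{p(v)}\left[\frac{d}{d\theta} \log p(v_j \mid \cdot)\, L_j\right] = \mathbb{E}_{p(v_{\le j})}\left[\frac{d}{d\theta} \log p(v_j \mid \cdot)\; \mathbb{E}_{p(v_{>j} \mid v_{\le j})}\left[L_j\right]\right]$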