Lecture 14: Latent Dirichlet Allocation (Scribes: Sara Taheri) - PowerPoint PPT Presentation




slide-1
SLIDE 1 Lecture 14: Latent Dirichlet Allocation
Scribes: Sara Taheri, [first name illegible] Chu
Exam: Tue 12 [month illegible]
slide-2
SLIDE 2 Midterm Exam
Format: Open book, focus on understanding.
Content: Everything up to Lecture 8 (EM). Lectures 9 & 10 (VBEM) will not be on the exam.
Grading: Typically designed to maximize spread. Average score: [illegible]5/100. Standard deviation: 10.
Preparation: Read lecture notes!
slide-3
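SLIDE 3 Latent Dirichlet Allocation: Generative Model
Generative model:
  $\beta_k \sim \text{Dirichlet}(\omega_1, \dots, \omega_V)$, $k = 1, \dots, K$ (topics shared between docs)
  $\theta_d \sim \text{Dirichlet}(\alpha_1, \dots, \alpha_K)$, $d = 1, \dots, D$
  $z_{dn} \sim \text{Discrete}(\theta_{d1}, \dots, \theta_{dK})$, $n = 1, \dots, N_d$
  $y_{dn} \mid z_{dn} = k \sim \text{Discrete}(\beta_{k1}, \dots, \beta_{kV})$
[Graphical model: plates over $K$ topics, $D$ documents, and $N_d$ words per document]
Mixture model for each document: each mixture component $\beta_k$ is a distribution over words, a.k.a. a "topic".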
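The generative model above can be sketched as a small sampler. This is a minimal illustration, not the lecture's own code: the function name `sample_lda_corpus`, the use of symmetric scalar concentrations `alpha` and `omega`, and the fixed document length `N_d` are assumptions made for brevity.

```python
import numpy as np

def sample_lda_corpus(D, N_d, K, V, alpha, omega, seed=0):
    """Draw a toy corpus from the LDA generative model.

    D, N_d, K, V: documents, words per document, topics, vocabulary size.
    alpha, omega: symmetric Dirichlet concentrations (scalars; an assumption).
    """
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(V, omega), size=K)    # (K, V): beta_k ~ Dirichlet(omega)
    theta = rng.dirichlet(np.full(K, alpha), size=D)   # (D, K): theta_d ~ Dirichlet(alpha)
    z = np.empty((D, N_d), dtype=int)
    y = np.empty((D, N_d), dtype=int)
    for d in range(D):
        z[d] = rng.choice(K, size=N_d, p=theta[d])     # z_dn ~ Discrete(theta_d)
        for n in range(N_d):
            y[d, n] = rng.choice(V, p=beta[z[d, n]])   # y_dn | z_dn = k ~ Discrete(beta_k)
    return beta, theta, z, y

beta, theta, z, y = sample_lda_corpus(D=5, N_d=20, K=3, V=10, alpha=0.5, omega=0.5)
```

Here `y[d, n]` is the vocabulary index of word `n` in document `d` and `z[d, n]` is its topic assignment.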
slide-4
SLIDE 4 Word Mixtures
Idea: Model a document as a mixture over topics (ignore word order).
  $z_n \sim \text{Disc}(\theta)$
  $y_n \mid z_n = k \sim \text{Disc}(\beta_k)$
  $p(y_n = v \mid z_n = k) = \beta_{kv}$
[Figure: a document and topics $\beta_1, \beta_2, \beta_3, \beta_4$]
slide-5
SLIDE 5 Word Mixtures
Question: Can this model learn topics $\beta_k$?
Problem: How do we know which words go together?
  $z_n \sim \text{Disc}(\theta)$
  $y_n \mid z_n = k \sim \text{Disc}(\beta_k)$
  $p(y_n = v \mid z_n = k) = \beta_{kv}$
[Figure: a document and topics $\beta_1, \beta_2, \beta_3, \beta_4$]
slide-6
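SLIDE 6 Latent Dirichlet Allocation (and PLSA)
Idea: Infer topics by modeling multiple documents.
  $\beta_k \sim p(\beta)$ (LDA)
  $y_{dn} \mid z_{dn} = k \sim \text{Disc}(\beta_k)$ (LDA & PLSA)
  $z_{dn} \sim \text{Disc}(\theta_d)$ (LDA & PLSA)
  $\theta_d \sim p(\theta)$ (LDA)
[Graphical model: topics $\beta_1, \dots, \beta_K$ (plate $K$); per-document $\theta_d$, $z_{dn}$, $y_{dn}$ (plate $D$)]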
slide-7
SLIDE 7 LDA: Intuition
Idea: Not all documents will contain the same words.
  • "genetic" is unlikely in a sports article
  • "innings" is unlikely in a science paper
Goal: Use $\theta_d$ (topic probs) to "cluster" documents.
  • Want the number of topics to be small (approximately 1-10) [annotation on $\theta_d$ partly illegible]
slide-8
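SLIDE 8 Dirichlet Distribution
Density:
  $p(\theta \mid \alpha_1, \dots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$
  $B(\alpha) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}$ ($\Gamma$: Gamma function)
Symmetric choice: $\alpha_1 = \dots = \alpha_K$
Mean (normalized weights): $E[\theta_k] = \frac{\alpha_k}{\sum_{l} \alpha_l}$
[Figure: densities for $\alpha = (0.1, 0.1, 0.1)$, $\alpha = (1.0, 1.0, 1.0)$, $\alpha = (10, 10, 10)$]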
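To make the effect of the concentration parameter concrete, the sketch below draws samples from symmetric Dirichlet distributions and summarizes them. The helper `dirichlet_summary`, the choice K = 3, and the sample count are illustrative assumptions, not part of the slides.

```python
import numpy as np

def dirichlet_summary(a, K=3, n_samples=5000, seed=0):
    """Sample from a symmetric Dirichlet(a, ..., a) and summarize the draws.

    Returns the empirical mean of theta (should be close to 1/K for every
    component) and the average size of the largest component, which is a
    rough measure of how concentrated each draw is on a single entry.
    """
    rng = np.random.default_rng(seed)
    samples = rng.dirichlet(np.full(K, a), size=n_samples)   # (n_samples, K)
    return samples.mean(axis=0), samples.max(axis=1).mean()

for a in (0.1, 1.0, 10.0):
    mean, avg_max = dirichlet_summary(a)
    print(f"alpha={a:5.1f}  mean={np.round(mean, 2)}  average largest weight={avg_max:.2f}")
```

The mean stays at the normalized weights regardless of the concentration, while small values of alpha push each individual draw toward a corner of the simplex and large values push draws toward the uniform vector.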
slide-9
SLIDE 9 Dirichlet Distribution
[Figure: Dirichlet densities for different concentrations; recoverable panel labels: $\alpha = 1.0$, $\alpha = 10.0$]
slide-10
SLIDE 10 LDA: Example of most frequent topics
Topics $\beta_1, \beta_2, \beta_3, \beta_4$, ordered so that $\theta_{d1} > \theta_{d2} > \theta_{d3} > \theta_{d4}$
slide-11
SLIDE 11 LDA: Example of most frequent topics
slide-12
SLIDE 12 LDA: Example of most frequent topics
slide-13
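SLIDE 13 LDA: Gibbs Sampling
Generative model:
  $\beta_k \sim \text{Dirichlet}(\omega_1, \dots, \omega_V)$
  $\theta_d \sim \text{Dirichlet}(\alpha_1, \dots, \alpha_K)$
  $z_{dn} \sim \text{Discrete}(\theta_{d1}, \dots, \theta_{dK})$
  $y_{dn} \mid z_{dn} = k \sim \text{Discrete}(\beta_{k1}, \dots, \beta_{kV})$
Gibbs sampler:
  $z_{dn} \sim p(z_{dn} \mid y_{dn}, \theta_d, \beta)$
  $\theta_d \sim p(\theta_d \mid y_d, z_d, \beta)$
  $\beta_k \sim p(\beta_k \mid y, z)$
Conditional independence:
  $y_d, z_d, \theta_d \perp y_{e \neq d}, z_{e \neq d}, \theta_{e \neq d} \mid \beta$ (documents independent given $\beta$)
  $\beta_k \perp \beta_{l \neq k} \mid y, z$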
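The update for the topic assignments can be written out explicitly: by Bayes' rule applied to the generative model, $p(z_{dn} = k \mid y_{dn} = v, \theta_d, \beta) \propto \theta_{dk} \beta_{kv}$. The sketch below implements that single step; the function name `sample_z` and the assumption that all documents have the same length (so `y` is a rectangular array) are choices made here for illustration.

```python
import numpy as np

def sample_z(y, theta, beta, rng):
    """One Gibbs update of all topic assignments given theta and beta.

    y: (D, N) integer word ids; theta: (D, K); beta: (K, V).
    Returns z of shape (D, N) and the conditional probabilities (D, N, K).
    """
    K = theta.shape[1]
    # probs[d, n, k] is proportional to theta[d, k] * beta[k, y[d, n]]
    probs = theta[:, None, :] * beta[:, y].transpose(1, 2, 0)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling, vectorized over all tokens
    u = rng.random(y.shape + (1,))
    z = (np.cumsum(probs, axis=-1) < u).sum(axis=-1)
    return np.minimum(z, K - 1), probs   # clip guards against rounding in the cumsum
```

Because each token's conditional depends only on its own word, its document's $\theta_d$, and $\beta$, all assignments can be resampled in parallel in this (uncollapsed) sampler.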
slide-14
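SLIDE 14 Gibbs Sampling: Sufficient Statistics
  $p(y, z, \theta, \beta \mid \alpha, \omega) = p(y \mid z, \beta) \, p(z \mid \theta) \, p(\theta \mid \alpha) \, p(\beta \mid \omega)$
  $p(y \mid z, \beta) = \prod_{d} \prod_{n} \prod_{k} \prod_{v} p(y_{dn} = v \mid z_{dn} = k, \beta)^{I[y_{dn} = v] \, I[z_{dn} = k]}$
    (products over documents, words, topics, and vocabulary)
  $= \prod_{d} \prod_{n} \prod_{k} \prod_{v} \beta_{kv}^{I[y_{dn} = v] \, I[z_{dn} = k]}$
  $= \exp\left[ \sum_{k} \sum_{v} \log \beta_{kv} \left( \sum_{d} \sum_{n} I[y_{dn} = v] \, I[z_{dn} = k] \right) \right]$
  $N_{kv} = \sum_{d} \sum_{n} I[y_{dn} = v] \, I[z_{dn} = k]$: number of times the word "v" appears in topic "k"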
slide-15
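SLIDE 15 Gibbs Sampling: Sufficient Statistics
  $p(y, z, \theta, \beta \mid \alpha, \omega) = p(y \mid z, \beta) \, p(z \mid \theta) \, p(\theta \mid \alpha) \, p(\beta \mid \omega)$
  $p(y \mid z, \beta) = \exp\left[ \sum_{k} \sum_{v} \log \beta_{kv} \, N_{kv} \right]$
  $p(z \mid \theta) = \prod_{d} \prod_{n} \prod_{k} p(z_{dn} = k \mid \theta_d)^{I[z_{dn} = k]}$
  $= \prod_{d} \prod_{n} \prod_{k} \theta_{dk}^{I[z_{dn} = k]}$
  $= \exp\left[ \sum_{d} \sum_{k} \log \theta_{dk} \left( \sum_{n} I[z_{dn} = k] \right) \right]$
  $N_{dk} = \sum_{n} I[z_{dn} = k]$: number of words in document "d" that belong to topic "k"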
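The two count matrices can be computed directly from the word ids and topic assignments. The sketch below is illustrative: the function name `count_statistics` and the rectangular `(D, N)` array layout for `y` and `z` are assumptions, not something fixed by the slides.

```python
import numpy as np

def count_statistics(y, z, K, V):
    """Sufficient statistics of the LDA likelihood.

    y, z: (D, N) integer arrays of word ids and topic assignments.
    Returns N_kv (K, V): times word v is assigned to topic k,
            N_dk (D, K): words in document d assigned to topic k.
    """
    D = y.shape[0]
    N_kv = np.zeros((K, V), dtype=int)
    N_dk = np.zeros((D, K), dtype=int)
    np.add.at(N_kv, (z, y), 1)                        # accumulates repeated (k, v) pairs
    doc_ids = np.broadcast_to(np.arange(D)[:, None], z.shape)
    np.add.at(N_dk, (doc_ids, z), 1)                  # accumulates repeated (d, k) pairs
    return N_kv, N_dk

y = np.array([[0, 1, 1], [2, 2, 0]])
z = np.array([[0, 0, 1], [1, 1, 0]])
N_kv, N_dk = count_statistics(y, z, K=2, V=3)
```

With these counts, the log-likelihood terms reduce to `(np.log(beta) * N_kv).sum()` and `(np.log(theta) * N_dk).sum()`, so the sampler never needs to revisit individual tokens to evaluate them.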
slide-16
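SLIDE 16 Gibbs Sampling: Sufficient Statistics
  $p(y, z, \theta, \beta \mid \alpha, \omega) = p(y \mid z, \beta) \, p(z \mid \theta) \, p(\theta \mid \alpha) \, p(\beta \mid \omega)$
  $p(y \mid z, \beta) = \exp\left[ \sum_{k} \sum_{v} \log \beta_{kv} \, N_{kv} \right]$
  $p(z \mid \theta) = \exp\left[ \sum_{d} \sum_{k} \log \theta_{dk} \, N_{dk} \right]$
  $p(\theta \mid \alpha) = \prod_{d} \text{Dirichlet}(\theta_d \mid \alpha) = \prod_{d} \frac{1}{B(\alpha)} \prod_{k} \theta_{dk}^{\alpha_k - 1}$
  $= \exp\left[ \sum_{d} \left( \sum_{k} \log \theta_{dk} \, (\alpha_k - 1) - \log B(\alpha) \right) \right]$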
slide-17
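SLIDE 17 Gibbs Sampling: Sufficient Statistics
  $p(y, z, \theta, \beta \mid \alpha, \omega) = p(y \mid z, \beta) \, p(z \mid \theta) \, p(\theta \mid \alpha) \, p(\beta \mid \omega)$
  $p(y \mid z, \beta) = \exp\left[ \sum_{k} \sum_{v} \log \beta_{kv} \, N_{kv} \right]$
  $p(z \mid \theta) = \exp\left[ \sum_{d} \sum_{k} \log \theta_{dk} \, N_{dk} \right]$
  $p(\theta \mid \alpha) = \exp\left[ \sum_{d} \left( \sum_{k} \log \theta_{dk} \, (\alpha_k - 1) - \log B(\alpha) \right) \right]$
  $p(\beta \mid \omega) = \exp\left[ \sum_{k} \left( \sum_{v} \log \beta_{kv} \, (\omega_{kv} - 1) - \log B(\omega_k) \right) \right]$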
slide-18
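SLIDE 18 Gibbs Sampling: Conjugacy
  $p(y, z, \theta, \beta \mid \alpha, \omega) = p(y \mid z, \beta) \, p(z \mid \theta) \, p(\theta \mid \alpha) \, p(\beta \mid \omega)$
  $p(y \mid z, \beta) = \exp\left[ \sum_{k} \sum_{v} \log \beta_{kv} \, N_{kv} \right]$
  $p(\beta \mid \omega) = \exp\left[ \sum_{k} \left( \sum_{v} \log \beta_{kv} \, (\omega_{kv} - 1) - \log B(\omega_k) \right) \right]$
  $p(y, \beta \mid z, \omega) = \exp\left[ \sum_{k} \left( \sum_{v} \log \beta_{kv} \, (N_{kv} + \omega_{kv} - 1) - \log B(\omega_k) \right) \right]$
  $= \exp\left[ \sum_{k} \left( \sum_{v} \log \beta_{kv} \, (\tilde{\omega}_{kv} - 1) - \log B(\tilde{\omega}_k) \right) \right] \exp\left[ \sum_{k} \left( \log B(\tilde{\omega}_k) - \log B(\omega_k) \right) \right]$
  $= p(\beta \mid y, z, \omega) \, p(y \mid z, \omega)$
  with $\tilde{\omega}_{kv} = N_{kv} + \omega_{kv}$ and $p(\beta_k \mid y, z, \omega) = \text{Dirichlet}(\beta_k ; \tilde{\omega}_k)$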
slide-19
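SLIDE 19 Gibbs Sampling: Conjugacy
  $p(y, z, \theta, \beta \mid \alpha, \omega) = p(y \mid z, \beta) \, p(z \mid \theta) \, p(\theta \mid \alpha) \, p(\beta \mid \omega)$
  $p(y \mid z, \beta) = \exp\left[ \sum_{k} \sum_{v} \log \beta_{kv} \, N_{kv} \right]$
  $p(\beta \mid \omega) = \exp\left[ \sum_{k} \left( \sum_{v} \log \beta_{kv} \, (\omega_{kv} - 1) - \log B(\omega_k) \right) \right]$
  $p(\beta \mid y, z, \omega) = \exp\left[ \sum_{k} \left( \sum_{v} \log \beta_{kv} \, (\tilde{\omega}_{kv} - 1) - \log B(\tilde{\omega}_k) \right) \right]$
  $p(y \mid z, \omega) = \exp\left[ \sum_{k} \left( \log B(\tilde{\omega}_k) - \log B(\omega_k) \right) \right]$
  $\tilde{\omega}_{kv} = N_{kv} + \omega_{kv}$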
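The conjugate update for the topics amounts to adding counts to the prior parameters. The sketch below samples $\beta$ from its Dirichlet posterior and evaluates $\log p(y \mid z, \omega)$ as a difference of log Beta functions. The names `log_B` and `beta_update` are hypothetical, and `omega` may be given either as one shared vector of length V or as a full (K, V) array; that flexibility is an assumption of this example.

```python
import numpy as np
from math import lgamma

def log_B(a):
    """log B(a) = sum_i log Gamma(a_i) - log Gamma(sum_i a_i) for a 1-D vector a."""
    return sum(lgamma(x) for x in a) - lgamma(float(np.sum(a)))

def beta_update(N_kv, omega, rng):
    """Sample beta ~ p(beta | y, z, omega) and compute log p(y | z, omega).

    N_kv: (K, V) counts; omega: prior parameters, shape (V,) or (K, V).
    """
    omega_full = np.broadcast_to(np.asarray(omega, dtype=float), N_kv.shape)
    omega_tilde = N_kv + omega_full                      # posterior parameters
    beta = np.vstack([rng.dirichlet(row) for row in omega_tilde])
    log_marginal = sum(log_B(wt) - log_B(w)
                       for wt, w in zip(omega_tilde, omega_full))
    return beta, omega_tilde, log_marginal
```

As a sanity check on the marginal: with one topic, two words, a uniform prior $\omega = (1, 1)$, and one occurrence of each word, $B(2, 2) / B(1, 1) = 1/6$, which matches the sequential predictive probability $\frac{1}{2} \cdot \frac{1}{3}$.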
slide-20
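SLIDE 20 Gibbs Sampling: Conjugacy
  $p(y, z, \theta, \beta \mid \alpha, \omega) = p(y \mid z, \beta) \, p(z \mid \theta) \, p(\theta \mid \alpha) \, p(\beta \mid \omega)$
  $p(z \mid \theta) = \exp\left[ \sum_{d} \sum_{k} \log \theta_{dk} \, N_{dk} \right]$
  $p(\theta \mid \alpha) = \exp\left[ \sum_{d} \left( \sum_{k} \log \theta_{dk} \, (\alpha_k - 1) - \log B(\alpha) \right) \right]$
  $p(\theta \mid z, \alpha) = \exp\left[ \sum_{d} \left( \sum_{k} \log \theta_{dk} \, (\alpha_k + N_{dk} - 1) - \log B(\alpha + N_d) \right) \right] = \prod_{d} \text{Dirichlet}(\theta_d \mid \tilde{\alpha}_d)$
  $p(z \mid \alpha) = \exp\left[ \sum_{d} \left( \log B(\tilde{\alpha}_d) - \log B(\alpha) \right) \right]$
  $\tilde{\alpha}_{dk} = \alpha_k + N_{dk}$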
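The corresponding update for the per-document topic proportions follows the same pattern. The sketch below computes the posterior parameters and the posterior mean, and draws $\theta_d$ from its Dirichlet posterior; the function names `theta_posterior` and `sample_theta` are illustrative assumptions.

```python
import numpy as np

def theta_posterior(N_dk, alpha):
    """Posterior parameters and mean of theta_d given the topic counts.

    N_dk: (D, K) counts; alpha: (K,) prior parameters.
    alpha_tilde[d, k] = alpha[k] + N_dk[d, k].
    """
    alpha_tilde = np.asarray(N_dk) + np.asarray(alpha, dtype=float)
    post_mean = alpha_tilde / alpha_tilde.sum(axis=1, keepdims=True)
    return alpha_tilde, post_mean

def sample_theta(alpha_tilde, rng):
    """Draw theta_d ~ Dirichlet(alpha_tilde_d) independently for each document."""
    return np.vstack([rng.dirichlet(a) for a in alpha_tilde])

alpha_tilde, post_mean = theta_posterior(np.array([[2, 1], [0, 3]]), [1.0, 1.0])
theta = sample_theta(alpha_tilde, np.random.default_rng(0))
```

The posterior mean uses the same normalized-weights formula as the Dirichlet prior mean, applied to the updated parameters.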
slide-21
SLIDE 21 Collapsed Gibbs Sampling
Gibbs updates (topic assignments):
  $z_{dn} \sim p(z_{dn} \mid y, z_{-dn})$, where $z_{-dn} = z \setminus \{z_{dn}\}$
Implementation requirement: need to compute marginals over $\theta$ and $\beta$:
  $p(y, z \mid \omega, \alpha) = \int d\theta \, d\beta \; p(y, z, \beta, \theta \mid \omega, \alpha)$
Idea: Exploit conjugacy:
  $p(y, z \mid \omega, \alpha) = p(y \mid z, \omega) \, p(z \mid \alpha)$
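A collapsed sweep can be sketched from the marginal above. Taking ratios of the Beta functions in $p(y \mid z, \omega) \, p(z \mid \alpha)$ gives the standard closed form $p(z_{dn} = k \mid y, z_{-dn}) \propto (N_{dk}^{-dn} + \alpha_k)(N_{kv}^{-dn} + \omega_v) / (N_{k}^{-dn} + \sum_{v'} \omega_{v'})$ with $v = y_{dn}$; this formula is not shown on the slide and is stated here as a known result. The code assumes a single prior vector `omega` shared by all topics and documents of equal length, and it recomputes counts from scratch for clarity rather than speed. All function names are hypothetical.

```python
import numpy as np
from math import lgamma

def log_B(a):
    return sum(lgamma(x) for x in a) - lgamma(float(np.sum(a)))

def log_joint(y, z, alpha, omega):
    """log p(y, z | omega, alpha) = log p(y | z, omega) + log p(z | alpha)."""
    K, V = len(alpha), len(omega)
    total = 0.0
    for k in range(K):
        N_k = np.bincount(y[z == k], minlength=V)      # word counts in topic k
        total += log_B(omega + N_k) - log_B(omega)
    for d in range(y.shape[0]):
        N_d = np.bincount(z[d], minlength=K)           # topic counts in document d
        total += log_B(alpha + N_d) - log_B(alpha)
    return total

def conditional(y, z, d, n, alpha, omega):
    """p(z_dn = k | y, z_{-dn}) for all k, using counts that exclude token (d, n)."""
    K = len(alpha)
    mask = np.ones(z.shape, dtype=bool)
    mask[d, n] = False
    v = y[d, n]
    N_dk = np.bincount(z[d][mask[d]], minlength=K)
    N_kv = np.array([np.sum((z == k) & (y == v) & mask) for k in range(K)])
    N_k = np.array([np.sum((z == k) & mask) for k in range(K)])
    p = (N_dk + alpha) * (N_kv + omega[v]) / (N_k + omega.sum())
    return p / p.sum()

def collapsed_gibbs_sweep(y, z, alpha, omega, rng):
    """Resample every z_dn once, in place, from its collapsed conditional."""
    D, N = y.shape
    for d in range(D):
        for n in range(N):
            z[d, n] = rng.choice(len(alpha), p=conditional(y, z, d, n, alpha, omega))
    return z
```

A practical implementation would keep `N_dk`, `N_kv`, and `N_k` as running counts and decrement and increment them around each update instead of recomputing them per token.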