Lecture 11: Latent Dirichlet Allocation (Scribes: Yuan, Andrea, Jordan)


slide-1
SLIDE 1 Lecture 11: Latent Dirichlet Allocation
Scribes: Yuan, Andrea, Jordan
Exam: Next week, on Wed (28 Feb)
Homework: Due next Fri (was Mon)
slide-2
SLIDE 2 Midterm Exam
Format: Open book, focus on understanding
Content: Everything up to Lecture 8 (EM); lectures 9 & 10 (VBEM, Exponential Families) will not be on the exam
Grading: Typically designed to maximize spread. Average score: [illegible]; standard deviation: 10
Preparation: Read lecture notes! (60 pages to date)
slide-3
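SLIDE 3 Word Mixtures
Idea: Model a document as a mixture over topics (ignore word order)
[Figure: a document and its topics]
Generative model:
  • $z_n \sim \text{Disc}(\theta)$
  • $y_n \mid z_n = k \sim \text{Disc}(\beta_k)$
  • $p(y_n = v \mid z_n = k) = \beta_{kv}$
Dimensions: $\beta \in \mathbb{R}^{K \times V}$, where $V$ is the number of words in the vocabulary and $K$ is the number of topics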
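The word mixture model above can be simulated in a few lines. This is a minimal sketch, not part of the lecture: the function name `sample_word_mixture`, the toy values of `theta` and `beta`, and the use of Python's `random` module are all assumptions made for illustration.

```python
import random

def sample_word_mixture(theta, beta, num_words, rng):
    """Draw z_n ~ Disc(theta), then y_n | z_n = k ~ Disc(beta[k])."""
    K, V = len(beta), len(beta[0])
    z, y = [], []
    for _ in range(num_words):
        k = rng.choices(range(K), weights=theta)[0]    # topic assignment z_n
        v = rng.choices(range(V), weights=beta[k])[0]  # word y_n from topic k
        z.append(k)
        y.append(v)
    return z, y

# Toy setup (hypothetical): K = 2 topics over a V = 4 word vocabulary.
rng = random.Random(0)
theta = [0.7, 0.3]
beta = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits words 0 and 1
        [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits words 2 and 3
z, y = sample_word_mixture(theta, beta, 20, rng)
```

Because the two toy topics have disjoint support, every sampled word reveals its topic, which makes the role of $\beta_{kv}$ easy to see.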

slide-4
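SLIDE 4 Word Mixtures
Question: Can this model learn the topics $\beta_k$?
Problem: How do we figure out which words go together?
Model: $z_n \sim \text{Disc}(\theta)$, $y_n \mid z_n = k \sim \text{Disc}(\beta_k)$, $p(y_n = v \mid z_n = k) = \beta_{kv}$
[Figure: topics $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$]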
slide-5
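SLIDE 5 Latent Dirichlet Allocation (and PLSA: Probabilistic Latent Semantic Analysis)
Idea: Infer topics by modeling multiple documents
  • $\beta_k \sim p(\beta)$ (LDA)
  • $y_{dn} \mid z_{dn} = k \sim \text{Disc}(\beta_k)$
  • $z_{dn} \sim \text{Disc}(\theta_d)$
  • $\theta_d \sim p(\theta)$ (LDA)
PLSA uses the same likelihood but treats $\beta_k$ and $\theta_d$ as parameters without priors.
[Figure: plate diagram with topics $\beta_1, \ldots, \beta_K$ (plate $K$) and, for each of $D$ documents, $\theta_d$ and assignments $z_{d1}, \ldots, z_{dN_d}$]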
slide-6
SLIDE 6 LDA: Intuition
Idea: Not all documents will contain the same words
  • "genetic" is unlikely in a sports article
  • "innings" is unlikely in a science paper
Goal: Use $\theta_d$ (topic probabilities) to "cluster" documents
slide-7
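SLIDE 7 LDA: Example of most frequent topics
[Figure: topics $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and per-document topic proportions $\theta_{d1}, \theta_{d2}, \ldots, \theta_{dK}$]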
slide-8
SLIDE 8 LDA: Example of most frequent topics
slide-9
SLIDE 9 LDA: Example of most frequent topics
slide-10
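SLIDE 10 LDA: Conditional Independence
Generative model:
  • $\beta_k \sim \text{Dirichlet}(\omega)$
  • $\theta_d \sim \text{Dirichlet}(\alpha)$
  • $z_{dn} \sim \text{Discrete}(\theta_d)$
  • $y_{dn} \mid z_{dn} = k \sim \text{Discrete}(\beta_k)$
[Figure: plate diagram with nodes $\omega$, $\beta_k$, $\alpha$, $\theta_d$, $z_{dn}$, $y_{dn}$ and plates $K$, $N_d$, $D$]
Conditional independence:
  • $\beta_k \perp \beta_{k' \neq k} \mid y, z$
  • $y_d, z_d \perp y_{d' \neq d}, z_{d' \neq d} \mid \beta$
  • $\theta_d \perp \theta_{d' \neq d} \mid y, \beta$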
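The LDA generative model can be written as ancestral sampling: draw the topics, then for each document draw its topic proportions, assignments, and words. The sketch below is an illustration only. The names `sample_dirichlet` and `generate_lda_corpus` are hypothetical, and sampling a Dirichlet by normalizing independent Gamma draws is a standard construction that the slides do not cover.

```python
import random

def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [g / total for g in draws]

def generate_lda_corpus(alpha, omega, doc_lengths, rng):
    """Ancestral sampling for LDA; returns (beta, theta, z, y)."""
    K, V = len(alpha), len(omega)
    beta = [sample_dirichlet(omega, rng) for _ in range(K)]  # beta_k ~ Dirichlet(omega)
    theta, z, y = [], [], []
    for N_d in doc_lengths:
        theta_d = sample_dirichlet(alpha, rng)               # theta_d ~ Dirichlet(alpha)
        z_d = rng.choices(range(K), weights=theta_d, k=N_d)  # z_dn ~ Discrete(theta_d)
        y_d = [rng.choices(range(V), weights=beta[k])[0]     # y_dn | z_dn = k ~ Discrete(beta_k)
               for k in z_d]
        theta.append(theta_d)
        z.append(z_d)
        y.append(y_d)
    return beta, theta, z, y

# Toy corpus (hypothetical sizes): K = 3 topics, V = 6 words, two documents.
rng = random.Random(0)
beta, theta, z, y = generate_lda_corpus([0.5, 0.5, 0.5], [1.0] * 6, [8, 5], rng)
```

Each document gets its own `theta_d` while all documents share `beta`, which is the structure behind the conditional independence statements: given `beta`, the documents are generated independently.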
slide-11
SLIDE 11 Algorithm 1: Collapsed Gibbs Sampling
Gibbs updates (topic assignments):
  $z_{dn} \sim p(z_{dn} \mid y, z_{-dn})$, where $z_{-dn} = z \setminus \{z_{dn}\}$
Implementation requirement: Need to compute marginals over $\theta$ and $\beta$:
  $p(y, z) = \int d\theta \, d\beta \; p(y, z, \beta, \theta)$
Conjugate priors:
  • $\beta_k \sim \text{Dirichlet}(\omega_1, \ldots, \omega_V)$
  • $\theta_d \sim \text{Dirichlet}(\alpha_1, \ldots, \alpha_K)$
slide-12
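SLIDE 12 Dirichlet Distribution
Density: $\text{Dirichlet}(\theta; \alpha) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1} = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$, where $\Gamma$ is the Gamma function and $B(\alpha) = \prod_k \Gamma(\alpha_k) / \Gamma(\sum_k \alpha_k)$
Mean: $E[\theta_k] = \alpha_k / \sum_l \alpha_l$, with $\sum_k \theta_k = 1$
[Figure: densities for $\alpha = (0.1, 0.1, 0.1)$, $\alpha = (1.0, 1.0, 1.0)$, and $\alpha = (10, 10, 10)$]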
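To make the Dirichlet density and mean concrete, here is a small sketch that evaluates both in log space. The function names are hypothetical, and computing $\log B(\alpha)$ with `math.lgamma` is an implementation choice for numerical stability rather than something taken from the slides.

```python
import math

def log_beta(alpha):
    """log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def dirichlet_log_pdf(theta, alpha):
    """log Dirichlet(theta; alpha) = sum_k (alpha_k - 1) log theta_k - log B(alpha)."""
    return sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta)) - log_beta(alpha)

def dirichlet_mean(alpha):
    """E[theta_k] = alpha_k / sum_l alpha_l."""
    total = sum(alpha)
    return [a / total for a in alpha]

# With alpha = (1, 1, 1) the density is constant on the simplex.
flat = dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
# With alpha = (10, 10, 10) the mass concentrates around the mean (1/3, 1/3, 1/3).
center = dirichlet_log_pdf([1 / 3, 1 / 3, 1 / 3], [10.0, 10.0, 10.0])
corner = dirichlet_log_pdf([0.8, 0.1, 0.1], [10.0, 10.0, 10.0])
mean = dirichlet_mean([10.0, 10.0, 10.0])
```

For $\alpha = (1, 1, 1)$ the constant density equals $\Gamma(3) = 2$, so `flat` is $\log 2$; for larger symmetric $\alpha$ the density near the center exceeds the density near a corner.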
slide-13
SLIDE 13 Dirichlet Distribution
$E[\theta_k] = \alpha_k / \sum_l \alpha_l$
slide-14
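SLIDE 14 Discrete / Multinomial Distribution
Joint distribution on topic assignments:
  $z_n \sim \text{Disc}(\theta_1, \ldots, \theta_K)$ for $n = 1, \ldots, N$, with counts $N_k = \sum_{n=1}^{N} I[z_n = k]$
  $p(z_1, \ldots, z_N; \theta) = \prod_{n=1}^{N} \prod_{k=1}^{K} \theta_k^{I[z_n = k]} = \prod_{k=1}^{K} \theta_k^{\sum_n I[z_n = k]} = \prod_{k=1}^{K} \theta_k^{N_k}$
Multinomial distribution:
  $p(N_1, \ldots, N_K \mid \theta, N) = \text{Mult}(N_1, \ldots, N_K; \theta, N) = \frac{N!}{\prod_k N_k!} \prod_{k=1}^{K} \theta_k^{N_k}$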
slide-15
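SLIDE 15 Conjugacy: Exponential Family Forms
Observation: The Dirichlet distribution is conjugate to the multinomial
Likelihood:
  $p(z \mid \eta) = h(z) \exp[\eta^\top t(z) - a(\eta)]$, with $h(z) = 1$, $t_k(z) = N_k$, $\eta_k = \log \theta_k$, $a(\eta) = 0$
  $p(z \mid \theta) = \prod_k \theta_k^{N_k} = \exp\left[\sum_k \log \theta_k \, N_k\right]$
Prior:
  $\text{Dirichlet}(\theta; \alpha) = \frac{1}{B(\alpha)} \prod_k \theta_k^{\alpha_k - 1}$
  $p(\eta; \lambda) = h(\eta) \exp[\eta^\top \lambda - a(\lambda)] = \exp\left[\sum_k (\alpha_k - 1) \log \theta_k - \log B(\alpha)\right]$, with $\lambda_k = \alpha_k - 1$, $a(\lambda) = \log B(\lambda + 1)$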
slide-16
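SLIDE 16 Conjugacy: Predictive and Posterior Distributions
Likelihood: $p(z \mid \eta) = \exp[\eta^\top t(z)]$, with $\eta_k = \log \theta_k$, $t_k(z) = N_k = \sum_{n=1}^{N} I[z_n = k]$
Prior: $p(\eta; \lambda) = \exp[\eta^\top \lambda - a(\lambda)]$, with $\lambda_k = \alpha_k - 1$
Posterior: $p(\eta \mid z) = p(\eta; \tilde\lambda)$, with $\tilde\lambda_k = \lambda_k + N_k$, i.e. $\tilde\alpha_k = \alpha_k + N_k$
Marginal: $p(z) = \int d\eta \; p(z \mid \eta) \, p(\eta) = \exp[a(\tilde\lambda) - a(\lambda)] = B(\tilde\alpha) / B(\alpha)$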
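The marginal $p(z) = B(\tilde\alpha) / B(\alpha)$ can be checked numerically. The sketch below compares it with a product of one-step predictive probabilities, $p(z_n = k \mid z_{<n}) = (\alpha_k + N_k^{<n}) / (\sum_l \alpha_l + n - 1)$. That predictive form is a standard consequence of Dirichlet-multinomial conjugacy and is assumed here rather than quoted from the slide; the function names are hypothetical.

```python
import math

def log_beta(alpha):
    """log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def log_marginal(z, alpha):
    """log p(z) = log B(alpha + N) - log B(alpha), with N_k = #{n : z_n = k}."""
    counts = [0] * len(alpha)
    for k in z:
        counts[k] += 1
    posterior = [a + c for a, c in zip(alpha, counts)]  # alpha-tilde
    return log_beta(posterior) - log_beta(alpha)

def log_marginal_sequential(z, alpha):
    """Same quantity as a product of one-step predictive probabilities."""
    counts = [0.0] * len(alpha)
    total = sum(alpha)
    logp = 0.0
    for n, k in enumerate(z):  # n assignments have been seen before this one
        logp += math.log((alpha[k] + counts[k]) / (total + n))
        counts[k] += 1
    return logp

z_demo = [0, 1, 0]
alpha_demo = [1.0, 1.0]
closed_form = log_marginal(z_demo, alpha_demo)
sequential = log_marginal_sequential(z_demo, alpha_demo)
```

For this toy input both routes give $\tfrac{1}{2} \cdot \tfrac{1}{3} \cdot \tfrac{2}{4} = \tfrac{1}{12}$, which matches $B(3, 2) / B(1, 1)$.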
slide-17
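SLIDE 17 LDA: Marginal over $\beta$ and $\theta$
Observation: $p(\beta, \theta)$ is conjugate to $p(y, z \mid \beta, \theta)$
  $p(z_d) = \frac{B(\tilde\alpha_d)}{B(\alpha)}$, with $\tilde\alpha_{dk} = \alpha_k + N_{dk}$, $N_{dk} = \sum_{n=1}^{N_d} I[z_{dn} = k]$
  $p(y \mid z) = \prod_{k=1}^{K} \frac{B(\tilde\omega_k)}{B(\omega)}$, with $\tilde\omega_{kv} = \omega_v + \sum_{d=1}^{D} \sum_{n=1}^{N_d} I[y_{dn} = v] \, I[z_{dn} = k]$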
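Putting the two marginals together gives the collapsed joint $\log p(y, z) = \sum_d [\log B(\tilde\alpha_d) - \log B(\alpha)] + \sum_k [\log B(\tilde\omega_k) - \log B(\omega)]$. The following sketch computes it from count tables. The function name `collapsed_log_joint` and the nested-list data layout are assumptions for illustration.

```python
import math

def log_beta(alpha):
    """log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def collapsed_log_joint(y, z, alpha, omega):
    """log p(y, z) with theta and beta integrated out.

    y[d][n] is the word id and z[d][n] the topic of token n in document d.
    """
    K, V = len(alpha), len(omega)
    N_kv = [[0] * V for _ in range(K)]  # topic-word counts over the corpus
    logp = 0.0
    for y_d, z_d in zip(y, z):
        N_dk = [0] * K                  # document-topic counts
        for v, k in zip(y_d, z_d):
            N_dk[k] += 1
            N_kv[k][v] += 1
        # log p(z_d) = log B(alpha + N_d) - log B(alpha)
        logp += log_beta([a + c for a, c in zip(alpha, N_dk)]) - log_beta(alpha)
    for k in range(K):
        # log p(y | z) adds log B(omega + N_k) - log B(omega) for each topic
        logp += log_beta([w + c for w, c in zip(omega, N_kv[k])]) - log_beta(omega)
    return logp

# One document, one token, one topic, two words: p(z) = 1 and p(y | z) = 1/2.
single = collapsed_log_joint([[0]], [[0]], [1.0], [1.0, 1.0])
# Two topics: p(z) = 1/2 and p(y | z) = 1/2, so p(y, z) = 1/4.
double = collapsed_log_joint([[0]], [[1]], [1.0, 1.0], [1.0, 1.0])
```

A topic with no assigned tokens contributes $\log B(\omega) - \log B(\omega) = 0$, so empty topics do not change the joint.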
slide-18
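SLIDE 18 LDA: Collapsed Gibbs Updates
Update for topic assignments:
  $p(z_{dn} = k \mid z_{-dn}, y_{-dn}, y_{dn} = v) \propto \tilde\alpha_{dk}^{-dn} \, \frac{\tilde\omega_{kv}^{-dn}}{\sum_{v'} \tilde\omega_{kv'}^{-dn}}$
  $\tilde\alpha_{dk}^{-dn} = \alpha_k + N_{dk}^{-dn}$, $\tilde\omega_{kv}^{-dn} = \omega_v + N_{kv}^{-dn}$
Sufficient statistics:
  $N_{dk}^{-dn} = \sum_{n' \neq n} I[z_{dn'} = k]$
  $N_{kv}^{-dn} = \sum_{(d', n') \neq (d, n)} I[y_{d'n'} = v] \, I[z_{d'n'} = k]$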
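A collapsed Gibbs sweep only needs the count tables: remove the current token from the counts, sample a new topic with weight $(\alpha_k + N_{dk}^{-dn})(\omega_v + N_{kv}^{-dn}) / \sum_{v'} (\omega_{v'} + N_{kv'}^{-dn})$, then add the token back. The sketch below is one possible implementation under that update. The function name, the random initialization, the extra per-topic total `N_k`, and the toy corpus are assumptions, not the lecture's reference code.

```python
import random

def collapsed_gibbs(docs, alpha, omega, num_sweeps, rng):
    """Collapsed Gibbs sampling for LDA; docs[d][n] is a word id in range(V)."""
    K, V = len(alpha), len(omega)
    omega_sum = sum(omega)
    # Random initial assignments (an assumption; other initializations work too).
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    N_dk = [[0] * K for _ in docs]       # document-topic counts
    N_kv = [[0] * V for _ in range(K)]   # topic-word counts
    N_k = [0] * K                        # tokens per topic, sum_v N_kv
    for d, doc in enumerate(docs):
        for n, v in enumerate(doc):
            k = z[d][n]
            N_dk[d][k] += 1
            N_kv[k][v] += 1
            N_k[k] += 1
    for _ in range(num_sweeps):
        for d, doc in enumerate(docs):
            for n, v in enumerate(doc):
                # Remove token (d, n) so the counts become the "-dn" statistics.
                k_old = z[d][n]
                N_dk[d][k_old] -= 1
                N_kv[k_old][v] -= 1
                N_k[k_old] -= 1
                # Unnormalized conditional p(z_dn = k | z_-dn, y).
                weights = [
                    (alpha[k] + N_dk[d][k]) * (omega[v] + N_kv[k][v]) / (omega_sum + N_k[k])
                    for k in range(K)
                ]
                k_new = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its new assignment.
                z[d][n] = k_new
                N_dk[d][k_new] += 1
                N_kv[k_new][v] += 1
                N_k[k_new] += 1
    return z, N_dk, N_kv

# Toy corpus (hypothetical): V = 4 words, K = 2 topics.
docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 2, 3]]
rng = random.Random(0)
z, N_dk, N_kv = collapsed_gibbs(docs, [0.5, 0.5], [0.1] * 4, 20, rng)
```

`rng.choices` normalizes the weights internally, so the proportionality constant in the update never has to be computed. Keeping `N_k` alongside `N_kv` avoids re-summing a row of the topic-word table for every token.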
slide-19
SLIDE 19