A Hierarchical Bayesian Language Model Based on Pitman-Yor Processes
Author: Yee Whye Teh, 2006
Reviewer: Xueqing Liu

Dirichlet Process (CRP) Recap
- Model the sequence of words as a sequence of customers entering a restaurant: $x_1, x_2, \dots$
- Model the vocabulary as the sequence of tables (dishes): $y_1, y_2, \dots$
- Let $c_k$ be the number of customers at table $k$, and $c_\cdot = \sum_{k=1}^{t} c_k$ the total over the $t$ occupied tables.
- The next customer chooses a table according to the customer counts: sit at table $k$ with probability $\frac{c_k}{\theta + c_\cdot}$, or open a new table with probability $\frac{\theta}{\theta + c_\cdot}$.
Dirichlet Process (CRP) Recap
[Figure: a restaurant with tables $y_1, y_2, y_3$ and customer counts $c_1 = 3$, $c_2 = 2$, $c_3 = 1$]
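The seating rule above can be simulated directly. Below is a minimal sketch; the function name, seed, and the choice theta = 2.0 are illustrative assumptions, not from the slides:

```python
import random

def crp_draw(counts, theta, rng):
    """One CRP step: return the table index for the next customer.
    counts[k] = customers at table k; returning len(counts) means
    the customer opens a new table."""
    total = sum(counts)
    # P(table k) = counts[k] / (theta + total); P(new) = theta / (theta + total)
    r = rng.random() * (theta + total)
    for k, c in enumerate(counts):
        r -= c
        if r < 0:
            return k
    return len(counts)

rng = random.Random(0)
counts = []  # customers per table
for _ in range(1000):
    k = crp_draw(counts, theta=2.0, rng=rng)
    if k == len(counts):
        counts.append(1)   # open a new table
    else:
        counts[k] += 1     # join an existing table
```

Because large tables are proportionally more likely to attract the next customer, a few tables end up with most of the customers ("rich-gets-richer").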
Pitman-Yor Process: a Generalization
- $G \sim \mathrm{PY}(d, \theta, G_0)$, with discount $0 \le d < 1$ and concentration $\theta$; $G$ generates the customer sequence, $G_0$ the dish (table) labels.
- Probability the next customer sits at table $y_k$: $\frac{c_k - d}{\theta + c_\cdot}$; opens a new table: $\frac{\theta + d\,t}{\theta + c_\cdot}$, with $t$ occupied tables.
- Assume a finite vocabulary set $W$ of size $V$.
- Power-law ("rich-gets-richer") behavior: the number of unique words scales as $O(\theta T^{d})$ in the total number of words $T$.
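The effect of the discount parameter shows up in how the number of occupied tables grows with the number of customers. A small simulation sketch (helper names and parameter values are assumptions for illustration):

```python
import random

def py_crp_draw(counts, d, theta, rng):
    """One Pitman-Yor CRP step: table k with probability
    (counts[k] - d) / (theta + n); a new table (returned as
    len(counts)) with probability (theta + d * t) / (theta + n)."""
    n = sum(counts)
    r = rng.random() * (theta + n)
    for k, c in enumerate(counts):
        r -= c - d
        if r < 0:
            return k
    return len(counts)

def n_tables(d, theta, n_customers, seed=0):
    """Seat n_customers and report how many tables end up occupied."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        k = py_crp_draw(counts, d, theta, rng)
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
    return len(counts)

# d > 0 gives power-law growth of the table count, O(theta * n^d);
# d = 0 recovers the Dirichlet process, where it grows only logarithmically.
many = n_tables(d=0.8, theta=1.0, n_customers=5000)
few = n_tables(d=0.0, theta=1.0, n_customers=5000)
```

With d = 0.8 the simulation produces hundreds of tables, while the d = 0 (Dirichlet process) run produces only a handful, which is the power-law behavior the slide refers to.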
Hierarchical Pitman-Yor Language Model
- $G_{\mathbf{u}} \sim \mathrm{PY}(d_{|\mathbf{u}|}, \theta_{|\mathbf{u}|}, G_{\pi(\mathbf{u})})$: draw the distribution (customer sequence) $G_{\mathbf{u}}$ for context $\mathbf{u}$ with base distribution $G_{\pi(\mathbf{u})}$, where $\pi(\mathbf{u}) = u_2 u_3 \dots u_{|\mathbf{u}|}$ drops the earliest word of the context.
- Consider $W = \{a, b, c\}$:

$G_\emptyset \sim \mathrm{PY}(d_0, \theta_0, G_0)$
$G_a \sim \mathrm{PY}(d_1, \theta_1, G_\emptyset)$, $G_b \sim \mathrm{PY}(d_1, \theta_1, G_\emptyset)$, $G_c \sim \mathrm{PY}(d_1, \theta_1, G_\emptyset)$
$G_{aa} \sim \mathrm{PY}(d_2, \theta_2, G_a)$, $G_{ba} \sim \mathrm{PY}(d_2, \theta_2, G_a)$, $G_{ca} \sim \mathrm{PY}(d_2, \theta_2, G_a)$
…
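The parent function pi(u), which drops the earliest word, determines the chain of restaurants consulted for any context. A small sketch (representing contexts as tuples is an illustrative choice):

```python
def parent(u):
    """pi(u): drop the earliest word of context u,
    e.g. ('c','a','c') -> ('a','c'). Returns None for the empty
    context, whose base distribution is G0 itself."""
    return u[1:] if u else None

# Chain of restaurants consulted for context u = (c, a, c):
chain = []
u = ('c', 'a', 'c')
while u is not None:
    chain.append(u)
    u = parent(u)
```

The chain walks from the trigram context down to the empty (unigram) context, mirroring the back-off structure of n-gram smoothing.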
Hierarchical CRP: an Example
- $W = \{a, b, c\}$; context $\mathbf{u} = cac$; what is the next word?
- The sequence $x_{u1}, x_{u2}, \dots$ is drawn from $G_{cac}$.
[Figure: the restaurant $G_{cac}$, still empty; $x_{u1}, x_{u2}, x_{u3}, \dots$ all unknown]
Hierarchical CRP: an Example
- $W = \{a, b, c\}$; context $\mathbf{u} = cac$; sequence $x_{u1}, x_{u2}, \dots$ drawn from $G_{cac}$.
- Customer $x_{u1}$ enters the empty restaurant $G_{cac}$, so it must open a new table; the table's dish is chosen by sending a customer to the parent restaurant $G_{ac}$. That restaurant is also empty, so the recursion continues through $G_c$ and $G_\emptyset$ down to the uniform base $G_0$.
[Figure: the chain $G_{cac} \to G_{ac} \to G_c \to G_\emptyset \to G_0(\text{uniform})$, each restaurant opening a new table; $G_0$ draws dish $a$]
Hierarchical CRP: an Example
- $W = \{a, b, c\}$; context $\mathbf{u} = cac$; sequence $x_{u1}, x_{u2}, \dots$ drawn from $G_{cac}$.
- The dish $a$ drawn from $G_0$ propagates back down the chain: each of $G_\emptyset, G_c, G_{ac}, G_{cac}$ now has one table serving $a$, so $x_{u1} = a$.
- For $x_{u2}$ at $G_{cac}$ (one customer seated): sit at the existing table with probability $\frac{1 - d_3}{\theta_3 + 1}$; open a new table with probability $\frac{\theta_3 + d_3}{\theta_3 + 1}$.
[Figure: $G_{cac}, G_{ac}, G_c, G_\emptyset, G_0$ each showing table $a$; $x_{u1} = a$, $x_{u2}, x_{u3}$ unknown]
Hierarchical CRP: an Example
- $W = \{a, b, c\}$; context $\mathbf{u} = cac$; sequence $x_{u1}, x_{u2}, \dots$ drawn from $G_{cac}$.
- Suppose $x_{u2}$ opens a new table at $G_{cac}$: a customer is sent to the parent $G_{ac}$ to pick the new table's dish.
- At $G_{ac}$ (one customer seated): sit at the existing $a$ table with probability $\frac{1 - d_2}{\theta_2 + 1}$; open a new table with probability $\frac{\theta_2 + d_2}{\theta_2 + 1}$.
[Figure: $G_{cac}$: $a\ ?$; $G_{ac}$: $a\ ?$; $G_c$: $a\ ?$; $G_\emptyset$: $a$; the new table's dish still undecided]
Hierarchical CRP: an Example
- $W = \{a, b, c\}$; context $\mathbf{u} = cac$; sequence $x_{u1}, x_{u2}, \dots$ drawn from $G_{cac}$.
- The customer sent to $G_{ac}$ sits at the existing $a$ table, so the new table at $G_{cac}$ also serves $a$: $x_{u2} = a$.
[Figure: $G_{cac}$ now has two tables, both serving $a$; $G_{ac}, G_c, G_\emptyset, G_0$ each show $a$; $x_{u1} = a$, $x_{u2} = a$]
Hierarchical CRP: an Example
- $W = \{a, b, c\}$; context $\mathbf{u} = cac$; sequence $x_{u1}, x_{u2}, \dots$ drawn from $G_{cac}$.
- $x_{u3}$ opens a new table at $G_{cac}$, and this time the recursion reaches the top: new tables are opened at $G_{ac}$, $G_c$, and $G_\emptyset$, and the base $G_0$ draws dish $b$.
[Figure: $G_{cac}$: $a\ a\ ?$; $G_{ac}$: $a\ ?$; $G_c$: $a\ ?$; $G_\emptyset$: $a\ ?$; $G_0$: $a\ b$]
Hierarchical CRP: an Example
- Seating probabilities for $x_{u3}$ at each level (counts taken before it is seated): at $G_{cac}$ (two tables, one customer each): $\frac{1 - d_3}{\theta_3 + 2}$, $\frac{1 - d_3}{\theta_3 + 2}$, new table $\frac{\theta_3 + 2 d_3}{\theta_3 + 2}$; at $G_{ac}$ (one table, two customers): $\frac{2 - d_2}{\theta_2 + 2}$, new table $\frac{\theta_2 + d_2}{\theta_2 + 2}$; at $G_c$: $\frac{1 - d_1}{\theta_1 + 1}$, new table $\frac{\theta_1 + d_1}{\theta_1 + 1}$; at $G_\emptyset$: $\frac{1 - d_0}{\theta_0 + 1}$, new table $\frac{\theta_0 + d_0}{\theta_0 + 1}$.
- Multiplying the "new table" branches at every level (and $G_0$'s uniform $1/3$ for $b$) gives the probability of this seating outcome with $x_{u3} = b$.
- $W = \{a, b, c\}$; context $\mathbf{u} = cac$; sequence $x_{u1}, x_{u2}, \dots$ drawn from $G_{cac}$.
[Figure: final state $x_{u1} x_{u2} x_{u3} = a\ a\ b$; $G_{cac}$ has tables $a, a, b$; $G_{ac}, G_c, G_\emptyset, G_0$ each show $a\ b$]
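The whole walkthrough can be sketched as a recursive generative procedure: opening a new table sends a customer to the parent restaurant, bottoming out in the uniform base. All names and the per-level parameter values below are illustrative assumptions:

```python
import random

def draw_word(u, restaurants, d, theta, vocab, rng):
    """Seat one customer in restaurant u of the hierarchical CRP.
    restaurants maps a context tuple to a list of (dish, count) tables.
    Opening a new table sends a customer to the parent restaurant;
    at the very top, the dish comes from the uniform base G0."""
    if u is None:                       # base distribution G0
        return rng.choice(vocab)
    tables = restaurants.setdefault(u, [])
    n = sum(c for _, c in tables)
    lvl = len(u)
    r = rng.random() * (theta[lvl] + n)
    for i, (dish, c) in enumerate(tables):
        r -= c - d[lvl]
        if r < 0:
            tables[i] = (dish, c + 1)   # sit at an existing table
            return dish
    par = u[1:] if u else None          # pi(u); the root's parent is G0
    dish = draw_word(par, restaurants, d, theta, vocab, rng)
    tables.append((dish, 1))            # open a new table serving that dish
    return dish

rng = random.Random(0)
d = [0.5, 0.5, 0.5, 0.5]       # assumed discounts per context length
theta = [1.0, 1.0, 1.0, 1.0]   # assumed concentrations per context length
restaurants = {}
words = [draw_word(('c', 'a', 'c'), restaurants, d, theta, ['a', 'b', 'c'], rng)
         for _ in range(3)]
```

The first draw necessarily recurses all the way to G0 (every restaurant starts empty), exactly as in the slides' animation.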
Hierarchical CRP: an Example
- The $G_{\mathbf{u}}$'s are marginalized out; represent the state by the seating arrangements $S_{\mathbf{u}}$, with $\mathbf{S} = \{S_{\mathbf{u}}\}$ and parameters $\Theta = \{\theta_m, d_m\}$.
- $p(w \mid \mathbf{u}, D) = \int p(w \mid \mathbf{u}, \mathbf{S}, \Theta)\, p(\mathbf{S}, \Theta \mid D)\, d(\mathbf{S}, \Theta)$; approximate the integral with posterior samples: $p(w \mid \mathbf{u}, D) \approx \frac{1}{M} \sum_{j=1}^{M} p(w \mid \mathbf{u}, \mathbf{S}^{(j)}, \Theta^{(j)})$.
- Recursively compute $p(w \mid \mathbf{u}, \mathbf{S}, \Theta)$:
$$p(w \mid \mathbf{u}, \mathbf{S}, \Theta) = \frac{c_{\mathbf{u}w\cdot} - d_{|\mathbf{u}|}\, t_{\mathbf{u}w\cdot}}{\theta_{|\mathbf{u}|} + c_{\mathbf{u}\cdot\cdot}} + \frac{\theta_{|\mathbf{u}|} + d_{|\mathbf{u}|}\, t_{\mathbf{u}\cdot\cdot}}{\theta_{|\mathbf{u}|} + c_{\mathbf{u}\cdot\cdot}}\, p(w \mid \pi(\mathbf{u}), \mathbf{S}, \Theta),$$
where $c_{\mathbf{u}wk}$ counts customers at table $k$ serving $w$ in restaurant $\mathbf{u}$, and $t_{\mathbf{u}w\cdot}$ counts tables serving $w$.
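The recursive predictive formula can be sketched in code. The statistics layout (per-context customer and table counters) and the toy seating state are my assumptions for illustration:

```python
def predict(w, u, stats, d, theta, V):
    """Recursive predictive probability p(w | u, S, Theta).
    stats[u] = (cw, tw): cw[w] = customers eating dish w, tw[w] = tables
    serving w in restaurant u. The recursion bottoms out in the uniform
    base distribution G0 = 1/V."""
    if u is None:
        return 1.0 / V
    cw, tw = stats.get(u, ({}, {}))
    c_total = sum(cw.values())
    t_total = sum(tw.values())
    th, disc = theta[len(u)], d[len(u)]
    par = u[1:] if u else None          # pi(u)
    back_off = (th + disc * t_total) / (th + c_total) \
        * predict(w, par, stats, d, theta, V)
    return (cw.get(w, 0) - disc * tw.get(w, 0)) / (th + c_total) + back_off

# Toy seating state: in context (a,), dish 'b' has 2 customers at 1 table;
# that table sent one customer up to the root restaurant.
stats = {('a',): ({'b': 2}, {'b': 1}), (): ({'b': 1}, {'b': 1})}
d, theta = [0.5, 0.5], [1.0, 1.0]
p = predict('b', ('a',), stats, d, theta, V=3)  # = 0.5 + 0.5 * 0.5 = 0.75
```

The probabilities over the vocabulary sum to one at every context, since the discounted mass is exactly redistributed through the back-off term.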
Inference with Gibbs Sampling
- Gibbs sampling over the seating arrangement: remove one customer $x_{\mathbf{u}c}$ from its table, then reseat it. The superscript $-\mathbf{u}c$ denotes counts computed with that customer removed.
- Sit at an existing table $k$:
$$p(k_{\mathbf{u}c} = k \mid \mathbf{S}^{-\mathbf{u}c}, \Theta) \propto \max\!\left(0,\; c^{-\mathbf{u}c}_{\mathbf{u}\, y_{\mathbf{u}c}\, k} - d_{|\mathbf{u}|}\right)$$
- Open a new table serving $y_{\mathbf{u}c}$:
$$p(k_{\mathbf{u}c} = k^{\mathrm{new}} \mid \mathbf{S}^{-\mathbf{u}c}, \Theta) \propto \left(\theta_{|\mathbf{u}|} + d_{|\mathbf{u}|}\, t^{-\mathbf{u}c}_{\mathbf{u}\cdot\cdot}\right) p\!\left(y_{\mathbf{u}c} \mid \pi(\mathbf{u}), \mathbf{S}^{-\mathbf{u}c}, \Theta\right)$$
- Both cases share the normalizer $\theta_{|\mathbf{u}|} + c^{-\mathbf{u}c}_{\mathbf{u}\cdot\cdot}$.
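Inside one Gibbs update, the reseating weights for the removed customer follow the two cases above. A minimal sketch (the helper name and the toy numbers are assumptions):

```python
def gibbs_reseat_weights(tables_w, t_total, d, theta, p_parent):
    """Unnormalized Gibbs weights for reseating a removed customer
    whose dish is w. tables_w: customer counts (with the customer
    already removed) at each table serving w; t_total: total occupied
    tables in the restaurant; p_parent: p(w | pi(u)) from the parent
    level. The max(0, .) handles a table emptied by the removal."""
    weights = [max(0.0, c - d) for c in tables_w]      # existing tables
    weights.append((theta + d * t_total) * p_parent)   # new table
    return weights

# One table serving w with 2 remaining customers, 3 occupied tables in
# total, and a parent probability of 0.2 (all toy numbers):
ws = gibbs_reseat_weights([2], t_total=3, d=0.5, theta=1.0, p_parent=0.2)
```

Normalizing the returned weights gives the conditional distribution from which the new seat is sampled.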
Experiments: Compared Models
- IKN: interpolated Kneser-Ney
- MKN: modified Kneser-Ney
- HPYLM: hierarchical Pitman-Yor language model, parameters inferred with the Gibbs sampler
- HPYCV: hierarchical Pitman-Yor language model, parameters obtained by cross-validation