Non-parametric Bayesian Statistics
Graham Neubig
2011-12-22

Overview:
- About Bayesian non-parametrics
- Basic theory
- Inference
Multinomial distribution:
P(X; θ) = ∏_i θ_i^{c_i}
where c_i is the count of outcome i in X.
Dirichlet prior (concentration α, base measure P_base):
P(θ; α, P_base) ∝ ∏_i θ_i^{α·P_base(x=i) − 1}
Dirichlet distribution examples (figure from Wikipedia):
- α = 15, P_base = {0.2, 0.47, 0.33}
- α = 9, P_base = {0.22, 0.33, 0.44}
- α = 10, P_base = {0.6, 0.2, 0.2}
- α = 14, P_base = {0.43, 0.14, 0.43}
Likelihood: P(X | θ) = ∏_{i=1}^{n} θ_{x_i} = ∏_i θ_i^{c_{x=i}}
Posterior: P(θ | X) ∝ ∏_i θ_i^{c_{x=i}} · θ_i^{α·P_base(x=i) − 1} = ∏_i θ_i^{c_{x=i} + α·P_base(x=i) − 1}
Normalization (two-outcome case): the integral
∫₀¹ θ₁^{α₁−1} (1 − θ₁)^{α₂−1} dθ₁
is evaluated by integration by parts, giving the Beta function B(α₁, α₂) = Γ(α₁)Γ(α₂) / Γ(α₁ + α₂).
Predictive distribution:
P(x_i = k | x_1, …, x_{i−1}) = (c_k + α·P_base(k)) / ((i − 1) + α)
where c_k counts the occurrences of k in x_1, …, x_{i−1}.
Example (α = 1, P_base uniform over 4 values, so P_base(k) = 0.25):
c = {0,0,0,0} → c = {1,0,0,0} → c = {1,1,0,0} → c = {2,1,0,0}
P(x₄ = 3 | x₁,…,x₃) = (0 + 1·0.25) / (3 + 1) = 0.063
c = {2,1,1,0}
P(x₅ = 1 | x₁,…,x₄) = (2 + 1·0.25) / (4 + 1) = 0.45
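The sequential updates above can be computed directly; in this sketch, α = 1 and the uniform base measure over 4 values match the example (the function name is ours):

```python
def predictive(counts, k, alpha, p_base):
    """P(x_next = k | counts) = (c_k + alpha * P_base(k)) / (n + alpha)"""
    n = sum(counts)
    return (counts[k] + alpha * p_base[k]) / (n + alpha)

alpha = 1.0
p_base = [0.25, 0.25, 0.25, 0.25]
p4 = predictive([2, 1, 0, 0], 2, alpha, p_base)   # P(x4 = 3 | x1..x3)
p5 = predictive([2, 1, 1, 0], 0, alpha, p_base)   # P(x5 = 1 | x1..x4)
```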
Chinese restaurant process:
P(sits at table i) ∝ c_i
P(sits at a new table) ∝ α
(Figure: customers seated at tables with counts 1, 2, 1, 3.)
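A minimal simulation of the Chinese restaurant process; the function name and random seed are illustrative choices, not from the slides:

```python
import random

def crp_sample(n_customers, alpha, rng):
    """Seat customers one at a time and return the table sizes."""
    tables = []                      # tables[i] = customers at table i
    for _ in range(n_customers):
        weights = tables + [alpha]   # P(table i) ∝ c_i, P(new table) ∝ alpha
        r = rng.uniform(0.0, sum(weights))
        i = 0
        for i, w in enumerate(weights):
            r -= w
            if r < 0:
                break
        if i == len(tables):         # sampled the "new table" slot
            tables.append(1)
        else:
            tables[i] += 1
    return tables

tables = crp_sample(10, 1.0, random.Random(0))
```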
P(Noun) = 4/10 = 0.4, P(Verb) = 4/10 = 0.4, P(Preposition) = 2/10 = 0.2
Sample: Verb Verb Prep. Noun Noun Prep. Noun Verb Verb Noun …
(Plot: empirical probabilities of Noun, Verb, and Prep. converge to the true values as the number of samples grows from 10⁰ to 10⁶.)
Sampling from a discrete distribution:
1. Calculate the sum z of the probabilities.
2. Generate a number from the uniform distribution over [0, z).
3. Iterate over all probabilities, subtracting each probability's value.
4. When the number drops below zero, return the current index as the answer.
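These steps can be sketched in Python; the toy weights {4, 4, 2} mirror the Noun/Verb/Prep. counts from the earlier slide:

```python
import random

def sample_index(probs, rng):
    z = sum(probs)                  # 1. calculate the sum of the probs
    r = rng.uniform(0.0, z)         # 2. uniform number over [0, z)
    for i, p in enumerate(probs):   # 3. subtract each prob. value in turn
        r -= p
        if r < 0:                   # 4. below zero: return current index
            return i
    return len(probs) - 1           # guard against floating-point rounding

rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10000):
    counts[sample_index([4, 4, 2], rng)] += 1
```

Note that the weights need not be normalized, since the uniform draw is scaled by their sum.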
Mother/Daughter, Mother/Son, Father/Daughter, Father/Son
Transition probabilities: PT(1|0), PT(2|1), PT(3|2), …
Emission probabilities: PE(the|1), PE(boats|2), PE(row|3), …
For each word:
1. Subtract the current tag's counts.
2. Calculate the probabilities of all possible tags.
3. Choose a new tag.
4. Add the new tag's counts.
1. Randomly initialize the tags.
2. For N iterations: sample all the tags, then save a sample of θ.
3. Average the parameter samples θ.
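A runnable sketch of this sampling loop, simplified to a unigram tag model rather than the full HMM on the slides; the function name, toy data, and the α/β smoothing values are our assumptions:

```python
import random

def gibbs_tags(words, n_tags, iters, alpha=0.1, beta=0.1, rng=random):
    """Collapsed Gibbs sampling for word tags (unigram model with
    Dirichlet-multinomial predictive probabilities)."""
    vocab_size = len(set(words))
    tags = [rng.randrange(n_tags) for _ in words]        # random initialization
    tag_count = [0] * n_tags
    tag_word = [dict() for _ in range(n_tags)]
    for w, t in zip(words, tags):
        tag_count[t] += 1
        tag_word[t][w] = tag_word[t].get(w, 0) + 1
    for _ in range(iters):
        for j, w in enumerate(words):
            t = tags[j]
            tag_count[t] -= 1                            # subtract current counts
            tag_word[t][w] -= 1
            probs = [(tag_count[k] + alpha) *            # all possible tag probs
                     (tag_word[k].get(w, 0) + beta) /
                     (tag_count[k] + beta * vocab_size)
                     for k in range(n_tags)]
            r = rng.uniform(0.0, sum(probs))             # choose a new tag
            for t, p in enumerate(probs):
                r -= p
                if r < 0:
                    break
            tags[j] = t                                  # add the new counts
            tag_count[t] += 1
            tag_word[t][w] = tag_word[t].get(w, 0) + 1
    return tags

tags = gibbs_tags("a a a b b b".split(), 2, 20, rng=random.Random(0))
```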
– If we want each word to have one POS tag, we can set the hyperparameter α to a small value, encouraging a sparse distribution.
(Plot: "6 Parts of Speech": probability by POS number 1–6, values roughly 0.1–0.2.)
(Plot: "20 Parts of Speech": probability by POS number 1–20, values roughly 0.02–0.06.)
(Plot: "100 Parts of Speech": probability by POS number 1–100, values roughly 0.01–0.02.)
(Plot: "1 Million Parts of Speech": probability by POS number up to 10⁶, values on the order of 10⁻⁶.)
P(y_i | y_{i−1}) = (c(y_{i−1}, y_i) + α·P_base(y_i)) / (c(y_{i−1}) + α)
With a uniform base P_base(y_i) = 1/N:
lim_{N→∞} Σ_{i=1}^{N} 1/N = 1
(N = number of parts of speech)
P(y_i | y_{i−1}) = (c(y_{i−1}, y_i) + α·P_base(y_i)) / (c(y_{i−1}) + α)
As N → ∞, P_base(y_i) → 0 for every particular tag, so:
P(y_i | y_{i−1}) = c(y_{i−1}, y_i) / (c(y_{i−1}) + α)   (existing tag)
P(y_i = new | y_{i−1}) = α / (c(y_{i−1}) + α)   (new tag)
Example: tags 1 and 2 each observed once after y_{i−1} = 1, so c(1,1) = 1, c(1,2) = 1, c(1) = 2.

N = 2:
P(y_i=1 | y_{i−1}=1) = (1 + α·1/2) / (2 + α)
P(y_i=2 | y_{i−1}=1) = (1 + α·1/2) / (2 + α)
P(y_i∉{1,2} | y_{i−1}=1) = α·0 / (2 + α)

N = 20:
P(y_i=1 | y_{i−1}=1) = (1 + α·1/20) / (2 + α)
P(y_i=2 | y_{i−1}=1) = (1 + α·1/20) / (2 + α)
P(y_i∉{1,2} | y_{i−1}=1) = α·18/20 / (2 + α)

N = ∞:
P(y_i=1 | y_{i−1}=1) = (1 + α·1/∞) / (2 + α) = 1 / (2 + α)
P(y_i=2 | y_{i−1}=1) = 1 / (2 + α)
P(y_i∉{1,2} | y_{i−1}=1) = α·1 / (2 + α)
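A numeric check of the N = 20 case; α = 1 is an assumed value here (the slide leaves α symbolic):

```python
def bigram_predictive(c_pair, c_prev, p_base, alpha):
    """(c(y_prev, y) + alpha * P_base(y)) / (c(y_prev) + alpha)"""
    return (c_pair + alpha * p_base) / (c_prev + alpha)

alpha = 1.0
p_seen = bigram_predictive(1, 2, 1 / 20, alpha)    # a tag seen once before
p_new = alpha * (18 / 20) / (2 + alpha)            # total mass on 18 unseen tags
```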
1. Remove the counts for the current tag.
2. Calculate the probabilities of the existing POS tags.
3. Calculate the probability of a new POS tag.
4. Pick a single value.
5. Add the new counts.
P(word) = (c(word) + α·P_base(word)) / (c(·) + α)
P_base(word) = P_len(4) · P_char(w) · P_char(o) · P_char(r) · P_char(d)
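A sketch of this character-based base measure; the length distribution and the uniform character probabilities are illustrative assumptions:

```python
def p_base_word(word, p_len, p_char):
    prob = p_len.get(len(word), 0.0)        # P(length)
    for ch in word:                         # times P(char) for each character
        prob *= p_char.get(ch, 0.0)
    return prob

p_len = {4: 0.2}                            # assumed P(len = 4)
p_char = {c: 1 / 26 for c in "abcdefghijklmnopqrstuvwxyz"}
pb = p_base_word("word", p_len, p_char)
```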
Transition prob.: P_T(y_i | y_{i−1}) ~ DP(α, P_T(y_i))
POS prob.: P_T(y_i) ~ DP(β, P_base(y_i))

c(y₁)=5, c(y₂)=0, c(y₃)=1
Dumb: c(y₁)=5, c(y₂)=0, c(y₃)=1, c(y₄)=1
Smart: c(y₁)=5, c(y₂)=1, c(y₃)=1
In one-by-one sampling, each variable's sample depends on the values of the others, so convergence can be slow.
(Lattice: states s0–s5 with edge probabilities p(s1|s0), p(s2|s0), p(s3|s1), p(s3|s2), p(s4|s1), p(s4|s2), p(s5|s3), p(s5|s4).)
(Same lattice, with one sampled path: s0 → s2 → s3 → s5.)
(Histogram: probability over the number of x=1 values.)
Transition prob.: P(y_i | y_{i−1}) = (c(y_{i−1}, y_i) + α·P_base(y_i)) / (c(y_{i−1}) + α)
Base prob.: P_base(y_i) = (c_base(y_i) + β·1/N) / (c_base(·) + β)
(Figure: the transition distributions P(y_i|y_{i−1}=1), P(y_i|y_{i−1}=2), P(y_i|y_{i−1}=3) all share the base distribution P_base(y_i).)
(Figure: one restaurant per context, each with its own tables and customer counts.)
Pitman-Yor process (discount d; t_x = number of tables serving x, t_· = total number of tables):
P(x_i) = (c(x_i) − d·t_{x_i} + d·t_· · P_base(x_i)) / c(·)
Example: P(x_i = 1) = (3 − d·2 + d·4·0.25) / 5
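A numeric check of this Pitman-Yor example; d = 0.5 is an assumed discount value (the slide leaves d symbolic):

```python
def pitman_yor_prob(c_x, t_x, t_total, c_total, d, p_base):
    """(c(x) - d*t_x + d*t_total*P_base(x)) / c(total)"""
    return (c_x - d * t_x + d * t_total * p_base) / c_total

p = pitman_yor_prob(c_x=3, t_x=2, t_total=4, c_total=5, d=0.5, p_base=0.25)
```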
Latent Dirichlet allocation over a collection of documents:
1. For each document, generate a multinomial topic distribution (with a Dirichlet prior).
2. Generate each word's topic from the topic distribution.
3. Generate each word from its topic's word distribution.
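A toy sketch of these three generative steps; the topics, vocabulary, and α value are illustrative assumptions:

```python
import random

def generate_document(n_words, topic_word, alpha, rng):
    n_topics = len(topic_word)
    # 1. topic distribution with a Dirichlet prior (normalized Gamma draws)
    g = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    theta = [x / sum(g) for x in g]
    words = []
    for _ in range(n_words):
        # 2. generate the word's topic from the topic distribution
        z = rng.choices(range(n_topics), weights=theta)[0]
        # 3. generate the word from that topic's word distribution
        dist = topic_word[z]
        w = rng.choices(list(dist), weights=list(dist.values()))[0]
        words.append(w)
    return words

topic_word = [{"ball": 0.5, "game": 0.5}, {"vote": 0.6, "law": 0.4}]
doc = generate_document(5, topic_word, alpha=0.5, rng=random.Random(0))
```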
(Figure: the bigram distributions P(w_i|w_{i−1}=1), P(w_i|w_{i−1}=2), P(w_i|w_{i−1}=3) share a unigram base distribution P_base(w_i).)
Learning a spoken language model: Speech → Acoustic Model → Phoneme Lattice → Language Model Learning.