  1. ANLP Lecture 9: Algorithms for HMMs. Sharon Goldwater, 4 Oct 2019

  2. Recap: HMM
  • Elements of HMM:
    – Set of states (tags)
    – Output alphabet (word types)
    – Start state (beginning of sentence)
    – State transition probabilities
    – Output probabilities from each state

  3. More general notation
  • Previous lecture:
    – Sequence of tags T = t_1 … t_n
    – Sequence of words S = w_1 … w_n
  • This lecture:
    – Sequence of states Q = q_1 … q_T
    – Sequence of outputs O = o_1 … o_T
    – So t is now a time step, not a tag! And T is the sequence length.

  4. Recap: HMM
  • Given a sentence O = o_1 … o_T with tags Q = q_1 … q_T, compute P(O,Q) as:
      P(O,Q) = ∏_{t=1}^{T} P(o_t | q_t) · P(q_t | q_{t-1})
  • But we want to find argmax_Q P(Q|O) without enumerating all possible Q
    – Use Viterbi algorithm to store partial computations.

  5. Today’s lecture
  • What algorithms can we use to
    – Efficiently compute the most probable tag sequence for a given word sequence?
    – Efficiently compute the likelihood for an HMM (probability it outputs a given sequence s)?
    – Learn the parameters of an HMM given unlabelled training data?
  • What are the properties of these algorithms (complexity, convergence, etc.)?

  6. Tagging example
      Words:          <s>  one  dog  bit  </s>
      Possible tags:  <s>  CD   NN   NN   </s>
                           NN   VB   VBD
                           PRP
      (tags for each word ordered by frequency)

  7. Tagging example
      Words:          <s>  one  dog  bit  </s>
      Possible tags:  <s>  CD   NN   NN   </s>
                           NN   VB   VBD
                           PRP
      (tags for each word ordered by frequency)
  • Choosing the best tag for each word independently gives the wrong answer (<s> CD NN NN </s>).
  • P(VBD|bit) < P(NN|bit), but choosing VBD may yield a better sequence (<s> CD NN VBD </s>)
    – because P(VBD|NN) and P(</s>|VBD) are high.

  8. Viterbi: intuition
      Words:          <s>  one  dog  bit  </s>
      Possible tags:  <s>  CD   NN   NN   </s>
                           NN   VB   VBD
                           PRP
      (tags for each word ordered by frequency)
  • Suppose we have already computed
    a) The best tag sequence for <s> … bit that ends in NN.
    b) The best tag sequence for <s> … bit that ends in VBD.
  • Then, the best full sequence would be either
    – sequence (a) extended to include </s>, or
    – sequence (b) extended to include </s>.

  9. Viterbi: intuition
      Words:          <s>  one  dog  bit  </s>
      Possible tags:  <s>  CD   NN   NN   </s>
                           NN   VB   VBD
                           PRP
      (tags for each word ordered by frequency)
  • But similarly, to get
    a) The best tag sequence for <s> … bit that ends in NN.
  • We could extend one of:
    – The best tag sequence for <s> … dog that ends in NN.
    – The best tag sequence for <s> … dog that ends in VB.
  • And so on…

  10. Viterbi: high-level picture
  • Intuition: the best path of length t ending in state q must include the best path of length t-1 to the previous state. (t is now a time step, not a tag.)

  11. Viterbi: high-level picture
  • Intuition: the best path of length t ending in state q must include the best path of length t-1 to the previous state. (t is now a time step, not a tag.) So,
    – Find the best path of length t-1 to each state.
    – Consider extending each of those by 1 step, to state q.
    – Take the best of those options as the best path to state q.

  12. Notation
  • Sequence of observations over time o_1, o_2, …, o_T
    – here, words in sentence
  • Vocabulary size V of possible observations
  • Set of possible states q_1, q_2, …, q_N (see note next slide)
    – here, tags
  • A, an NxN matrix of transition probabilities
    – a_ij: the prob of transitioning from state i to j. (JM3 Fig 8.7)
  • B, an NxV matrix of output probabilities
    – b_i(o_t): the prob of emitting o_t from state i. (JM3 Fig 8.8)

  13. Note on notation
  • J&M use q_1, q_2, …, q_N for the set of states, but also use q_1, q_2, …, q_T for the state sequence over time.
    – So, just seeing q_1 is ambiguous (though usually disambiguated from context).
    – I’ll instead use q_i for state names, and q_t for the state at time t.
    – So we could have q_t = q_i, meaning: the state we’re in at time t is q_i.

  14. HMM example w/ new notation
  [State diagram, adapted from Manning & Schuetze, Fig 9.2: Start goes to q_1; transitions q_1 → q_1 (.7), q_1 → q_2 (.3), q_2 → q_1 (.5), q_2 → q_2 (.5); q_1 emits x, y, z with probs .6, .1, .3 and q_2 emits them with probs .1, .7, .2]
  • States {q_1, q_2} (or {<s>, q_1, q_2})
  • Output alphabet {x, y, z}

  15. Transition and Output Probabilities
  • Transition matrix A: a_ij = P(q_j | q_i)
             q_1   q_2
      <s>    1     0
      q_1    .7    .3
      q_2    .5    .5
  • Output matrix B: b_i(o) = P(o | q_i) for output o
             x     y     z
      q_1    .6    .1    .3
      q_2    .1    .7    .2
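  These parameters are small enough to write out directly. A minimal Python encoding, to reuse in the sketches below (the dictionary layout and the names A, B, STATES are my own, not from the slides):

      # Transition probabilities: A[i][j] = a_ij = P(q_j | q_i).
      A = {
          '<s>': {'q1': 1.0, 'q2': 0.0},
          'q1':  {'q1': 0.7, 'q2': 0.3},
          'q2':  {'q1': 0.5, 'q2': 0.5},
      }

      # Output probabilities: B[i][o] = b_i(o) = P(o | q_i).
      B = {
          'q1': {'x': 0.6, 'y': 0.1, 'z': 0.3},
          'q2': {'x': 0.1, 'y': 0.7, 'z': 0.2},
      }

      STATES = ['q1', 'q2']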

  16. Joint probability of (states, outputs)
  • Let λ = (A, B) be the parameters of our HMM.
  • Using our new notation, given state sequence Q = (q_1 … q_T) and output sequence O = (o_1 … o_T), we have:
      P(O, Q | λ) = ∏_{t=1}^{T} P(o_t | q_t) · P(q_t | q_{t-1})

  17. Joint probability of (states, outputs)
  • Let λ = (A, B) be the parameters of our HMM.
  • Using our new notation, given state sequence Q = (q_1 … q_T) and output sequence O = (o_1 … o_T), we have:
      P(O, Q | λ) = ∏_{t=1}^{T} P(o_t | q_t) · P(q_t | q_{t-1})
  • Or: P(O, Q | λ) = ∏_{t=1}^{T} b_{q_t}(o_t) · a_{q_{t-1} q_t}

  18. Joint probability of (states, outputs)
  • Let λ = (A, B) be the parameters of our HMM.
  • Using our new notation, given state sequence Q = (q_1 … q_T) and output sequence O = (o_1 … o_T), we have:
      P(O, Q | λ) = ∏_{t=1}^{T} P(o_t | q_t) · P(q_t | q_{t-1})
  • Or: P(O, Q | λ) = ∏_{t=1}^{T} b_{q_t}(o_t) · a_{q_{t-1} q_t}
  • Example:
      P(O = (y, z), Q = (q_1, q_1) | λ) = b_1(y) · b_1(z) · a_{<s>,1} · a_{11} = (.1)(.3)(1)(.7)
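  The same product can be checked mechanically against the A and B dictionaries above (a sketch; joint_prob is an illustrative helper, not a function from the slides):

      def joint_prob(outputs, states, A, B, start='<s>'):
          """P(O, Q | lambda) = product over t of a_{q_{t-1} q_t} * b_{q_t}(o_t)."""
          p = 1.0
          prev = start
          for o, q in zip(outputs, states):
              p *= A[prev][q] * B[q][o]
              prev = q
          return p

      print(joint_prob(['y', 'z'], ['q1', 'q1'], A, B))  # (1)(.1)(.7)(.3) = 0.021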

  19. Viterbi: high-level picture
  • Want to find argmax_Q P(Q|O)
  • Intuition: the best path of length t ending in state q must include the best path of length t-1 to the previous state. So,
    – Find the best path of length t-1 to each state.
    – Consider extending each of those by 1 step, to state q.
    – Take the best of those options as the best path to state q.

  20. Viterbi algorithm
  • Use a chart to store partial results as we go
    – NxT table, where v(j,t) is the probability* of the best state sequence for o_1 … o_t that ends in state j.
  *Specifically, v(j,t) stores the max of the joint probability P(o_1 … o_t, q_1 … q_{t-1}, q_t = j | λ)

  21. Viterbi algorithm
  • Use a chart to store partial results as we go
    – NxT table, where v(j,t) is the probability* of the best state sequence for o_1 … o_t that ends in state j.
  • Fill in columns from left to right, with
      v(j,t) = max_{i=1}^{N} v(i, t-1) · a_ij · b_j(o_t)
  *Specifically, v(j,t) stores the max of the joint probability P(o_1 … o_t, q_1 … q_{t-1}, q_t = j | λ)

  22. Viterbi algorithm
  • Use a chart to store partial results as we go
    – NxT table, where v(j,t) is the probability* of the best state sequence for o_1 … o_t that ends in state j.
  • Fill in columns from left to right, with
      v(j,t) = max_{i=1}^{N} v(i, t-1) · a_ij · b_j(o_t)
  • Store a backtrace to show, for each cell, which state at t-1 we came from.
  *Specifically, v(j,t) stores the max of the joint probability P(o_1 … o_t, q_1 … q_{t-1}, q_t = j | λ)
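  A runnable sketch of this chart-filling procedure with backpointers, assuming the A, B, and STATES definitions above (function and variable names are my own; for simplicity it picks the best final state directly rather than transitioning to an explicit </s> state):

      def viterbi(outputs, states, A, B, start='<s>'):
          """Return the most probable state sequence for `outputs`, and its probability."""
          v = [{}]     # v[t][j]: prob of the best path for o_1 ... o_{t+1} ending in state j
          back = [{}]  # back[t][j]: the best previous state for that path
          # First column: transition out of the start state, then emit o_1.
          for j in states:
              v[0][j] = A[start][j] * B[j][outputs[0]]
              back[0][j] = start
          # Remaining columns, left to right: v(j,t) = max_i v(i,t-1) * a_ij * b_j(o_t).
          for t in range(1, len(outputs)):
              v.append({})
              back.append({})
              for j in states:
                  best_i = max(states, key=lambda i: v[t-1][i] * A[i][j])
                  v[t][j] = v[t-1][best_i] * A[best_i][j] * B[j][outputs[t]]
                  back[t][j] = best_i
          # Follow backpointers from the best final state.
          last = max(states, key=lambda j: v[-1][j])
          path = [last]
          for t in range(len(outputs) - 1, 0, -1):
              path.append(back[t][path[-1]])
          return list(reversed(path)), v[-1][last]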

  23. Example
  • Suppose O = xzy. Our initially empty table:
             o_1=x   o_2=z   o_3=y
      q_1
      q_2

  24. Filling the first column
             o_1=x   o_2=z   o_3=y
      q_1    .6
      q_2    0

      v(1,1) = a_{<s>,1} · b_1(x) = (1)(.6) = .6
      v(2,1) = a_{<s>,2} · b_2(x) = (0)(.1) = 0

  25. Starting the second column
             o_1=x   o_2=z   o_3=y
      q_1    .6
      q_2    0

      v(1,2) = max_{i=1}^{N} v(i,1) · a_{i1} · b_1(z)
             = max { v(1,1) · a_{11} · b_1(z) = (.6)(.7)(.3),
                     v(2,1) · a_{21} · b_1(z) = (0)(.5)(.3) }

  26. Starting the second column
             o_1=x   o_2=z   o_3=y
      q_1    .6      .126
      q_2    0

      v(1,2) = max_{i=1}^{N} v(i,1) · a_{i1} · b_1(z)
             = max { v(1,1) · a_{11} · b_1(z) = (.6)(.7)(.3),
                     v(2,1) · a_{21} · b_1(z) = (0)(.5)(.3) }
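  Running the sketch above on O = xzy reproduces these chart entries and fills in the columns the slides stop short of:

      path, p = viterbi(['x', 'z', 'y'], STATES, A, B)
      # First column:  v(1,1) = (1)(.6) = .6,  v(2,1) = (0)(.1) = 0
      # Second column: v(1,2) = max{(.6)(.7)(.3), (0)(.5)(.3)} = .126
      print(path, p)  # ['q1', 'q1', 'q2'] with probability ≈ 0.0265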
