  1. Latent dynamics workshop 2010. Decoding in Latent Conditional Models: A Practically Fast Solution for an NP-hard Problem. Xu Sun (孫栩), University of Tokyo. 2010.06.16

  2. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  3. Latent dynamics • Latent structures (latent dynamics here) are important in information processing – Natural language processing – Data mining – Vision recognition • Modeling latent dynamics: latent-dynamic conditional random fields (LDCRF)

  4. Latent dynamics • Latent structures (latent dynamics here) are important in information processing. Parsing: learn refined grammars with latent info. [Figure: parse tree of "He heard the voice." with nodes S, NP, VP, PRP, VBD, DT, NN]

  5. Latent dynamics • Latent structures (latent dynamics here) are important in information processing. Parsing: learn refined grammars with latent info. [Figure: the same parse tree with each node carrying a latent annotation: S-x, NP-x, VP-x, PRP-x, VBD-x, DT-x, NN-x, .-x]

  6. More common cases: linear-chain latent dynamics • The previous example is tree-structured • More common cases are linear-chain latent dynamics – Named entity recognition – Phrase segmentation – Word segmentation. [Figure: phrase segmentation of "These are her flowers." with labels seg seg seg noSeg; Sun+ COLING 08]

  7. A solution without latent annotation: latent-dynamic CRFs • Latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07] need no annotation of the latent info. [Figure: phrase segmentation of "These are her flowers." with labels seg seg seg noSeg; Sun+ COLING 08]

  8. Current problem & our target • A solution: latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07], with no need to annotate latent info • Current problem: inference (decoding) is an NP-hard problem • Our target: an almost exact inference method with fast speed

  9. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  10. Traditional methods • Traditional sequential labeling models – Hidden Markov Model (HMM) [Rabiner IEEE 89] – Maximum Entropy Model (MEM) [Ratnaparkhi EMNLP 96] – Conditional Random Fields (CRF) [Lafferty+ ICML 01] – Collins Perceptron [Collins EMNLP 02] • Arguably the most accurate one • Problem: not able to model latent structures; we will use it as one of the baselines

  11. Conditional random field (CRF) [Lafferty+ ICML 01] [Figure: linear-chain CRF with labels y_1 … y_n over observations x_1 … x_n] $P(y \mid x, \theta) = \frac{1}{Z(x, \theta)} \exp\big( \sum_k \theta_k F_k(y, x) \big)$ • Problem: CRF does not model latent info
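
To make the CRF formula concrete, here is a minimal brute-force sketch (the 3-position model, labels, and all scores are made up for illustration; a real implementation computes Z(x, θ) with the forward algorithm instead of enumeration):

```python
import itertools
import math

# Toy linear-chain model: per-position and transition scores stand in
# for sum_k theta_k * F_k(y, x). All numbers are illustrative.
LABELS = ["Seg", "noSeg"]
n = 3
score_unary = [{"Seg": 1.0, "noSeg": 0.2},
               {"Seg": 0.5, "noSeg": 0.8},
               {"Seg": 0.9, "noSeg": 0.1}]
score_pair = {("Seg", "Seg"): 0.3, ("Seg", "noSeg"): 0.1,
              ("noSeg", "Seg"): 0.2, ("noSeg", "noSeg"): 0.4}

def score(y):
    """Linear score theta . F(y, x) of a full label sequence y."""
    s = sum(score_unary[t][y[t]] for t in range(n))
    return s + sum(score_pair[(y[t - 1], y[t])] for t in range(1, n))

# Partition function Z(x, theta): sum over all label sequences.
Z = sum(math.exp(score(y)) for y in itertools.product(LABELS, repeat=n))
y = ("Seg", "Seg", "noSeg")
print("P(y | x) =", math.exp(score(y)) / Z)
```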

  12. Latent-Dynamic CRFs [Morency+ CVPR 07] [Figure: latent-dynamic CRF with labels y_1 … y_n on top of latent states h_1 … h_n over observations x_1 … x_n, contrasted with a conditional random field whose labels y_1 … y_n sit directly on x_1 … x_n]

  13. Latent-Dynamic CRFs [Morency+ CVPR 07] [Figure: the same LDCRF vs. CRF structures as on the previous slide] • We can think of an LDCRF (informally) as "CRF + unsupervised learning of latent info"

  14. Latent-Dynamic CRFs [Morency+ CVPR 07] $P(y \mid x, \theta) = \sum_{h \in H_y} P(h \mid x, \theta)$, where $H_y = \{ h : h_j \in H_{y_j}\ \forall j \}$ and $P(h \mid x, \theta) = \frac{1}{Z(x, \theta)} \exp\big( \sum_k \theta_k F_k(h, x) \big)$ • Good performance reports: outperforming HMM, MEMM, SVM, CRF, etc. – Syntactic parsing [Petrov+ NIPS 08] – Syntactic chunking [Sun+ COLING 08] – Vision object recognition [Morency+ CVPR 07; Quattoni+ PAMI 08]
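
The defining sum over H_y can be illustrated with another brute-force sketch (toy scores and names again; no real decoder enumerates latent paths like this): P(y | x) is the total probability of the latent paths whose states project onto y position by position.

```python
import itertools
import math

# Toy LDCRF over 3 positions (assumed scores): each label owns a disjoint
# set of latent states, and all numbers stand in for theta . F(h, x).
H = {"Seg": ["Seg-0", "Seg-1"], "noSeg": ["noSeg-0", "noSeg-1"]}
STATES = [s for hs in H.values() for s in hs]
n = 3
unary = {s: 0.1 * i for i, s in enumerate(STATES)}
pair = {(a, b): 0.05 * (i + j) for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)}

def path_score(h):
    return sum(unary[s] for s in h) + \
           sum(pair[(h[t - 1], h[t])] for t in range(1, n))

Z = sum(math.exp(path_score(h)) for h in itertools.product(STATES, repeat=n))

def prob_label_seq(y):
    """P(y | x) = sum of P(h | x) over latent paths with h_t in H(y_t)."""
    mass = sum(math.exp(path_score(h))
               for h in itertools.product(*(H[label] for label in y)))
    return mass / Z

print(prob_label_seq(("Seg", "Seg", "noSeg")))
```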

  15. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  16. Inference problem [Figure: LDCRF lattice with labels y_1 … y_n, latent states h_1 … h_n, observations x_1 … x_n] • Recent fast solutions are only approximation methods: – Best Hidden Path [Matsuzaki+ ACL 05] – Best Marginal Path [Morency+ CVPR 07] • Problem: exact inference (finding the label sequence with max probability) is NP-hard! – No fast solution exists

  17. Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05] [Figure: lattice of latent states Seg-0/1/2 and noSeg-0/1/2 over "These are her flowers ."]

  18. Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05] [Figure: the same lattice with the single best latent path highlighted] • Result: Seg Seg Seg noSeg Seg
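
A sketch of how BHP behaves, under assumed toy scores: plain Viterbi runs over the latent lattice and the single best latent path is projected onto its labels. It ignores the mass of all the other latent paths sharing the same label sequence, which is why BHP is only an approximation.

```python
# Best Hidden Path sketch (all scores below are illustrative stand-ins).
STATES = ["Seg-0", "Seg-1", "Seg-2", "noSeg-0", "noSeg-1", "noSeg-2"]
label_of = lambda s: s.split("-")[0]
n = 5  # "These are her flowers ."
unary = [{s: ((t + 1) * (i + 2)) % 5 * 0.3 for i, s in enumerate(STATES)}
         for t in range(n)]
pair = {(a, b): abs(i - j) * 0.1 for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)}

def viterbi():
    delta = [{s: unary[0][s] for s in STATES}]
    back = []
    for t in range(1, n):
        delta.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: delta[t - 1][p] + pair[(p, s)])
            delta[t][s] = delta[t - 1][prev] + pair[(prev, s)] + unary[t][s]
            back[t - 1][s] = prev
    last = max(STATES, key=lambda s: delta[n - 1][s])
    path = [last]
    for t in range(n - 2, -1, -1):   # follow back-pointers right to left
        path.append(back[t][path[-1]])
    return path[::-1]

best_h = viterbi()
print("BHP labels:", [label_of(s) for s in best_h])
```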

  19. Related work 2: Best marginal path (BMP) [Morency+ CVPR 07] [Figure: the same lattice of latent states over "These are her flowers ."]

  20. Related work 2: Best marginal path (BMP) [Morency+ CVPR 07] [Figure: the lattice annotated with a marginal probability for each latent state at each position] • Result: Seg Seg Seg noSeg Seg
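
A sketch of BMP under the same kind of toy model: at each position the posterior mass of the latent states sharing a label is pooled, and the top label is chosen per position independently. Marginals are computed by brute force here; in practice forward-backward provides them.

```python
import itertools
import math
from collections import defaultdict

# Best Marginal Path sketch (assumed toy scores).
STATES = ["Seg-0", "Seg-1", "noSeg-0", "noSeg-1"]
label_of = lambda s: s.split("-")[0]
n = 4
unary = {s: 0.2 * i for i, s in enumerate(STATES)}
pair = {(a, b): 0.1 * ((i + 2 * j) % 3) for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)}

def path_score(h):
    return sum(unary[s] for s in h) + \
           sum(pair[(h[t - 1], h[t])] for t in range(1, n))

# Brute-force posterior marginals, pooled by label at each position.
Z = 0.0
marg = [defaultdict(float) for _ in range(n)]
for h in itertools.product(STATES, repeat=n):
    w = math.exp(path_score(h))
    Z += w
    for t, s in enumerate(h):
        marg[t][label_of(s)] += w

bmp = [max(marg[t], key=marg[t].get) for t in range(n)]
print("BMP labels:", bmp)
```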

  21. Our target [Figure: LDCRF lattice with labels y_1 … y_n, latent states h_1 … h_n, observations x_1 … x_n] • 1) Exact inference 2) Comparable speed to existing approximation methods • Problem: exact inference (finding the label sequence with max probability) is NP-hard! – No fast solution existing • Challenge/difficulty: an exact & practically fast solution for an NP-hard problem

  22. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  23. Essential ideas [Sun+ EACL 09] • Fast & exact inference from a key observation – A key observation on the probability distribution – Dynamic top-n search – Fast decision on the optimal result from the top-n candidates

  24. Key observation • Natural problems (e.g., NLP problems) are not completely ambiguous • Normally, only a few result candidates are highly probable • Therefore, the probability distribution of latent models can be sharp

  25. Key observation • The probability distribution of latent models is sharp. Example for "These are her flowers .": – seg noSeg seg seg seg: P = 0.2 – seg seg seg noSeg seg: P = 0.3 – seg seg seg seg seg: P = 0.2 – seg seg noSeg noSeg seg: P = 0.1 – seg noSeg seg noSeg seg: P = … – … [Figure: bar chart of the candidate probabilities]

  26. Key observation • The probability distribution of latent models is sharp • Challenge: the number of probable candidates is unknown & changing • Need a method that can automatically adapt itself to different cases. Example for "These are her flowers .": – seg noSeg seg seg seg: P = 0.2 – seg seg seg noSeg seg: P = 0.3 – seg seg seg seg seg: P = 0.2 – seg seg noSeg noSeg seg: P = 0.1 – … • Compare: P(unknown) ≤ 0.2, since the four listed candidates already cover 0.8 of the probability mass

  27. A demo on the lattice [Figure: lattice of latent states Seg-0/1/2 and noSeg-0/1/2 over "These are her flowers ."]

  28. (1) Admissible heuristics for A* search [Figure: the lattice of latent states Seg-0/1/2 and noSeg-0/1/2 over "These are her flowers ." before heuristic computation]

  29. (1) Admissible heuristics for A* search [Figure: the lattice with a heuristic value h_ts attached to each state, computed by the Viterbi algorithm run right to left]

  30. (1) Admissible heuristics for A* search [Figure: the completed table of heuristic values h_ts over the lattice]
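
A sketch of step (1) under assumed toy scores: a right-to-left Viterbi pass fills best[t][s], the highest achievable score of any suffix starting in state s at position t. Because it never underestimates the best completion, it is an admissible heuristic for the A* search on the next slide.

```python
# Backward-Viterbi heuristic sketch (all scores are illustrative).
STATES = ["Seg-0", "Seg-1", "noSeg-0", "noSeg-1"]
n = 5
unary = [{s: 0.1 * (t + i) for i, s in enumerate(STATES)} for t in range(n)]
pair = {(a, b): 0.05 * abs(i - j) for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)}

def backward_viterbi():
    best = [{} for _ in range(n)]
    for s in STATES:
        best[n - 1][s] = 0.0          # nothing left to add at the last position
    for t in range(n - 2, -1, -1):    # sweep right to left
        for s in STATES:
            best[t][s] = max(pair[(s, nxt)] + unary[t + 1][nxt] + best[t + 1][nxt]
                             for nxt in STATES)
    return best

heuristic = backward_viterbi()
print(heuristic[0])                   # best possible completion from position 0
```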

  31. (2) Find the 1st latent path h1: A* search [Figure: the lattice with the highest-scoring latent path, found by A* search, highlighted]
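
Continuing the sketch, A* expands prefixes of latent paths with priority g + h, where g is the prefix score and h the backward-Viterbi bound. With an admissible h, the first complete path popped is guaranteed optimal (toy scores; the heuristic is recomputed inline so the snippet is self-contained).

```python
import heapq

# A* over the latent lattice, sketched with assumed toy scores.
STATES = ["Seg-0", "Seg-1", "noSeg-0", "noSeg-1"]
n = 5
unary = [{s: 0.1 * (t + i) for i, s in enumerate(STATES)} for t in range(n)]
pair = {(a, b): 0.05 * abs(i - j) for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)}

# Admissible heuristic from the previous sketch (right-to-left Viterbi).
best = [{} for _ in range(n)]
for s in STATES:
    best[n - 1][s] = 0.0
for t in range(n - 2, -1, -1):
    for s in STATES:
        best[t][s] = max(pair[(s, nxt)] + unary[t + 1][nxt] + best[t + 1][nxt]
                         for nxt in STATES)

def astar_best_path():
    # Min-heap keyed on -(g + h); heapq pops the largest g + h first.
    heap = [(-(unary[0][s] + best[0][s]), unary[0][s], (s,)) for s in STATES]
    heapq.heapify(heap)
    while heap:
        _, g, path = heapq.heappop(heap)
        t = len(path) - 1
        if t == n - 1:
            return path, g            # first complete pop is the optimum
        for nxt in STATES:
            g2 = g + pair[(path[-1], nxt)] + unary[t + 1][nxt]
            heapq.heappush(heap, (-(g2 + best[t + 1][nxt]), g2, path + (nxt,)))

h1, score1 = astar_best_path()
print("best latent path:", h1, "score:", score1)
```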

  32. (3) Get y1 & P(y1): forward-backward algo. [Figure: the lattice restricted to the latent states of the label sequence y1]

  33. (3) Get y1 & P(y1): forward-backward algo. [Figure: the lattice with the mass of y1 summed] • P(seg, noSeg, seg, seg, seg) = 0.2, i.e., P(y*) = 0.2 • P(unknown) = 1 − 0.2 = 0.8 • Is P(y*) > P(unknown)? Not yet, so the search continues
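
Step (3) as a sketch (toy scores): a forward sum restricted to the latent states of a fixed label sequence yields its exact probability, the unrestricted sum yields Z, and the exactness test compares P(y*) with the mass that could still hide in unseen label sequences.

```python
import math

# Toy model (assumed scores): each label owns two latent states.
H = {"Seg": ["Seg-0", "Seg-1"], "noSeg": ["noSeg-0", "noSeg-1"]}
STATES = [s for hs in H.values() for s in hs]
n = 5
unary = [{s: 0.1 * (t + i) for i, s in enumerate(STATES)} for t in range(n)]
pair = {(a, b): 0.05 * (i + j) for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)}

def forward_mass(allowed):
    """Sum of exp(path score) over latent paths with h_t in allowed[t]."""
    alpha = {s: math.exp(unary[0][s]) for s in allowed[0]}
    for t in range(1, n):
        alpha = {s: sum(a * math.exp(pair[(p, s)] + unary[t][s])
                        for p, a in alpha.items()) for s in allowed[t]}
    return sum(alpha.values())

Z = forward_mass([STATES] * n)                     # unrestricted: partition Z
y1 = ("Seg", "noSeg", "Seg", "Seg", "Seg")         # candidate from the A* path
p_y1 = forward_mass([H[label] for label in y1]) / Z
p_unknown = 1.0 - p_y1                             # mass not yet accounted for
print(p_y1, "done, exact" if p_y1 > p_unknown else "keep searching")
```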

  34. (4) Find the 2nd latent path h2: A* search [Figure: the lattice with the second-best latent path highlighted]

  35. (5) Get y2 & P(y2): forward-backward algo. [Figure: the lattice restricted to the latent states of y2]
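
Putting steps (1)-(5) together, a compact end-to-end sketch of the stopping logic (latent paths are produced in best-first order by brute-force sorting here, standing in for the A* enumeration; all scores are toy values): each new label sequence gets its exact probability, and the loop stops once the best P(y) found exceeds the mass remaining for unseen sequences, which certifies the result as exact.

```python
import itertools
import math

# Toy model (assumed scores), 4 positions, two latent states per label.
H = {"Seg": ["Seg-0", "Seg-1"], "noSeg": ["noSeg-0", "noSeg-1"]}
STATES = [s for hs in H.values() for s in hs]
label_of = lambda s: s.split("-")[0]
n = 4
unary = {s: 0.3 * i for i, s in enumerate(STATES)}
pair = {(a, b): 0.1 * ((i * j) % 4) for i, a in enumerate(STATES)
        for j, b in enumerate(STATES)}

def path_score(h):
    return sum(unary[s] for s in h) + \
           sum(pair[(h[t - 1], h[t])] for t in range(1, n))

# Latent paths in decreasing score order; the talk's method obtains these
# one at a time with A*, while this sketch just sorts a full enumeration.
paths = sorted(itertools.product(STATES, repeat=n), key=path_score, reverse=True)
Z = sum(math.exp(path_score(h)) for h in paths)

def prob_y(y):
    """Exact P(y | x): total mass of latent paths projecting onto y."""
    return sum(math.exp(path_score(h))
               for h in itertools.product(*(H[l] for l in y))) / Z

seen, best_y, best_p, covered = set(), None, 0.0, 0.0
for h in paths:
    y = tuple(label_of(s) for s in h)
    if y in seen:
        continue                         # this label sequence is already scored
    seen.add(y)
    p = prob_y(y)
    covered += p
    if p > best_p:
        best_y, best_p = y, p
    if best_p > 1.0 - covered:           # no unseen y can carry more mass,
        break                            # so best_y is provably the argmax
print(best_y, best_p)
```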
