

1. Attention Shifting for Parsing Speech. Keith Hall and Mark Johnson, Brown University. ACL 2004, July 22, 2004.

2. Attention Shifting
• Iterative best-first word-lattice parsing algorithm
• Posits complete syntactic analyses for each path of a word-lattice
• Goals of Attention Shifting
  – Improve accuracy of best-first parsing on word-lattices (Oracle Word Error Rate)
  – Improve efficiency of word-lattice parsing (number of parser operations)
  – Improve syntactic language modeling based on multi-stage parsing (Word Error Rate)
• Inspired by edge demeriting for efficient parsing (Blaheta & Charniak, ACL99)
7/22/2004 ACL04: Attention Shifting for Parsing Speech

3. Outline
• Syntactic language modeling
• Word-lattice parsing
• Multi-stage best-first parsing

4. Noisy Channel Model
[Diagram: Language Source → Noisy Channel → Output]
P(A, W) = P(A | W) P(W), where P(A | W) is the noise model and P(W) the language model
• Speech recognition: noise model = acoustic model
argmax_W P(W | A) = argmax_W P(A, W)
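The decision rule above can be sketched as a one-liner; the candidate list and its acoustic/language-model log-probabilities below are hypothetical stand-ins, not scores from any real recognizer:

```python
def decode(hyps):
    """Noisy-channel decoding: pick the word string W that maximizes
    P(A, W) = P(A | W) * P(W), i.e. the sum of the acoustic and
    language-model log-probabilities."""
    return max(hyps, key=lambda h: h["log_p_acoustic"] + h["log_p_lm"])

# Hypothetical scores for three candidate transcriptions.
hyps = [
    {"words": "the man is early",  "log_p_acoustic": -10.2, "log_p_lm": -12.1},
    {"words": "the mans surly",    "log_p_acoustic": -9.8,  "log_p_lm": -15.4},
    {"words": "duh man is surely", "log_p_acoustic": -11.0, "log_p_lm": -14.0},
]
best = decode(hyps)
```

Note that the second hypothesis has the best acoustic score alone; the language model overrides it, which is exactly the point of the decomposition.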

5. Syntactic Language Modeling
[Diagram: acoustic input o1, ..., on → n-best list extractor → word strings w1, ..., wn_1 through w1, ..., wn_m, each scored by the language model; example lattice over "the man is early" / "the mans surly" / "duh man surely"]
• Adding syntactic information to the context (conditioning information):
P(W) = ∏_k P(w_k | π(w_{k-1}, ..., w_1))
• n-best reranking
  – Select the n-best strings using some model (trigram)
  – Process each string independently
  – Select the string with the highest P(A, W)
• Charniak (ACL01), Chelba & Jelinek (CS&L00, ACL02), Roark (CL01)
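The n-best reranking scheme above can be sketched as a two-pass pipeline; `trigram_logprob` and `parser_logprob` below are toy stand-ins for the real trigram and syntactic language models:

```python
def rerank_nbest(hyps, n, trigram_logprob, parser_logprob):
    """Two-pass n-best reranking: shortlist the n best strings under a
    cheap model (trigram), then rescore each shortlisted string
    independently with the expensive syntactic language model and
    return the argmax of the combined score log P(A, W)."""
    shortlist = sorted(
        hyps,
        key=lambda h: h["log_p_acoustic"] + trigram_logprob(h["words"]),
        reverse=True)[:n]
    return max(shortlist,
               key=lambda h: h["log_p_acoustic"] + parser_logprob(h["words"]))

# Hypothetical stand-in scorers (not the real trigram or parser models).
trigram_logprob = lambda w: -float(len(w.split()))
parser_logprob = lambda w: 0.0 if "is" in w.split() else -5.0

hyps = [
    {"words": "the man is early",  "log_p_acoustic": -3.0},
    {"words": "the mans surly",    "log_p_acoustic": -1.0},
    {"words": "duh man is surely", "log_p_acoustic": -2.0},
]
best = rerank_nbest(hyps, 2, trigram_logprob, parser_logprob)
```

In this toy run the shortlist step prunes a hypothesis before the syntactic model ever sees it, which is the weakness of string-by-string reranking that lattice parsing is meant to avoid.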

6. Parsing a word-lattice
[Diagram: toy word-lattice over "the man/mans/man's/duh ... is ... early/surly/surely", with a parse tree (DT, NN, VB, VBZ, JJ, VP) built over its arcs]
• Compress the lattice with weighted FSM determinization and minimization (Mohri, Pereira & Riley, CS&L02)
• Use the compressed word-lattice graph as the parse chart
• Structure sharing due to the compressed lattice: VP → NN VB covers the string "man is"; VP → VBZ covers the string "mans"
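Using the lattice graph as the chart means edges are indexed by pairs of lattice nodes rather than string positions, so a constituent over a shared sub-path is built once and reused by every path through it. A minimal exhaustive sketch (the node numbers, lexicon, and grammar are illustrative, not the deck's grammar):

```python
def parse_lattice(arcs, lexicon, rules):
    """Chart parsing over a word-lattice: edges are (i, j, category)
    spans between lattice nodes; the chart is closed under the binary
    rules until no new edge can be added."""
    # Lexical edges: one per lattice arc, spanning (start_node, end_node).
    chart = {(i, j, cat) for i, j, word in arcs for cat in lexicon.get(word, ())}
    added = True
    while added:
        added = False
        for (i, k, a) in list(chart):
            for (k2, j, b) in list(chart):
                if k == k2 and (a, b) in rules:
                    edge = (i, j, rules[a, b])
                    if edge not in chart:
                        chart.add(edge)
                        added = True
    return chart

# Toy lattice: the paths "the man is early" and "the mans early" share
# nodes 1 and 3, so the VP edge over (1, 3) serves both paths at once.
arcs = [(0, 1, "the"), (1, 2, "man"), (2, 3, "is"),
        (1, 3, "mans"), (3, 4, "early")]
lexicon = {"the": {"DT"}, "man": {"NN"}, "is": {"VB"},
           "mans": {"VP"}, "early": {"JJ"}}
rules = {("NN", "VB"): "VP", ("DT", "NN"): "NP",
         ("VB", "JJ"): "VP", ("NP", "VP"): "S"}
chart = parse_lattice(arcs, lexicon, rules)
```

The binary rule (NN, VB) → VP rebuilds exactly the edge (1, 3, VP) that the lexical arc "mans" already introduced: one chart entry, two derivations, which is the structure sharing the slide describes.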

7. Word-lattice example
• "I WOULD NOT SUGGEST ANYONE MAKE A DECISION ON WHO TO VOTE FOR BASED ON A STUDY LIKE THIS" (160 arcs, 72 nodes)
[Diagram: the compressed word-lattice for this utterance, with weighted arcs for competing word hypotheses such as for/four/ford/fort and study/steady]
• Compressed NIST '93 HUB-1 lattices
  – average of 800 arcs/lattice (max 15,000 arcs)
  – average of 100 nodes/lattice (max 500 nodes)

8. Best-first Word-lattice Parsing
[Diagram: an agenda (priority queue) holding edges such as NP(0,4) 0.567 and NN(3,9) 0.550, feeding a chart built over the toy lattice; the grammar supplies the figure-of-merit]
• Bottom-up best-first PCFG parser
• Stack-based search technique based on a figure-of-merit
• Attempts to work on "likely" parts of the chart
• Ideal figure-of-merit: P(edge) = inside(edge) * outside(edge); details in (Hall & Johnson, ASRU03)
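The agenda-driven loop can be sketched as below. This is a minimal best-first variant of lattice chart parsing, not the deck's parser: the figure-of-merit here is just the edge's inside log-probability, a crude stand-in for the inside * outside estimate on the slide, and the grammar and arc weights are invented:

```python
import heapq

def best_first_parse(arcs, lexicon, rules, goal):
    """Best-first chart parsing over a lattice: a priority queue ordered
    by a figure-of-merit decides which edge enters the chart next, and
    parsing stops at the first goal edge popped, so unlikely chart
    regions may never be explored at all."""
    agenda = []  # entries: (-log-probability, start_node, end_node, category)
    for i, j, word, logp in arcs:
        for cat, rlogp in lexicon.get(word, []):
            heapq.heappush(agenda, (-(logp + rlogp), i, j, cat))
    chart = {}
    pops = 0
    while agenda:
        neg, i, j, cat = heapq.heappop(agenda)
        pops += 1
        if (i, j, cat) in chart:
            continue  # a better-scoring derivation of this edge was popped first
        chart[i, j, cat] = -neg
        if (i, j, cat) == goal:
            return -neg, pops
        # Combine the new edge with adjacent chart edges on either side.
        for (i2, j2, cat2), lp2 in list(chart.items()):
            if j == i2 and (cat, cat2) in rules:
                heapq.heappush(agenda, (neg - lp2, i, j2, rules[cat, cat2]))
            if j2 == i and (cat2, cat) in rules:
                heapq.heappush(agenda, (neg - lp2, i2, j, rules[cat2, cat]))
    return None, pops

# Toy weighted lattice: the arc "mans" is penalized, so best-first
# search finds the goal without ever popping it.
arcs = [(0, 1, "the", 0.0), (1, 2, "man", 0.0), (2, 3, "is", 0.0),
        (1, 3, "mans", -0.5), (3, 4, "early", 0.0)]
lexicon = {"the": [("DT", 0.0)], "man": [("NN", 0.0)], "is": [("VB", 0.0)],
           "mans": [("VP", 0.0)], "early": [("JJ", 0.0)]}
rules = {("NN", "VB"): "VP", ("DT", "NN"): "NP",
         ("VB", "JJ"): "VP", ("NP", "VP"): "S"}
score, pops = best_first_parse(arcs, lexicon, rules, (0, 4, "S"))
```

The early stop is also exactly why some lattice words end up with zero outside probability, the deficiency the next slides address.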

9. Word-lattice Parsing
[Diagram: word-lattice → compressed lattice → first stage (best-first PCFG parser, with an outside HMM) → syntactic category lattice → inside-outside pruning → local trees → second stage (lexicalized syntactic language model: the Charniak parser) → word string from the optimal parse]
• First stage: best-first bottom-up PCFG parser
• Second stage: Charniak language-model parser (Charniak, ACL01)
• Parsing from the lattice allows structure sharing
• Combines the search for candidate lattice paths with the search for candidate parses

10. Multi-stage Deficiency
• The first-stage PCFG parser selects parses for only a subset of word-lattice paths
• Lexicalized syntactic analysis is not performed on all of the word-lattice
• Covering the entire word-lattice requires excessive over-parsing
  – 100x over-parsing produces forests too large for the lexicalized parser
  – additional pruning is required, resulting in loss of lattice paths
• The attention-shifting algorithm addresses this coverage problem

11. Attention Shifting
[Flowchart: PCFG word-lattice parser → identify unused words → clear agenda / add edges for unused words → is agenda empty? no: repeat; yes: continue to multi-stage parsing]
• Iterative reparsing:
  1. Perform best-first PCFG parsing (over-parse as with normal best-first parsing)
  2. Identify words not covered by a complete parse (an unused word has 0 outside probability)
  3. Reset the parse agenda to contain the unused words
  4. If the agenda ≠ ∅, repeat
• Prune the chart using inside/outside pruning
• At most |A| iterations (|A| = number of arcs)
• Forces coverage of the word-lattice
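The four steps above can be sketched as a toy loop. `parse_pass` and the `covered` set are hypothetical stand-ins for the best-first parsing pass and the set of words with nonzero outside probability; the real algorithm reparses a chart, not a list of strings:

```python
def attention_shift(arcs, parse_pass, covered):
    """Iterative reparsing sketch: run a best-first pass, then reseed
    the agenda with exactly the arcs not yet covered by any complete
    parse, and repeat.  Each pass covers at least one new arc, so the
    loop runs at most |A| iterations."""
    agenda = set(arcs)
    iterations = 0
    while agenda and iterations < len(arcs):   # step 4 plus the |A| bound
        parse_pass(agenda)                     # step 1: best-first parsing
        agenda = {a for a in arcs if a not in covered}  # steps 2-3: unused words
        iterations += 1
    return iterations

# Hypothetical parser pass that covers one arc per call, mimicking a
# best-first search that leaves some words unexplored each time.
covered = set()
arcs = ["a", "b", "c"]
def parse_pass(agenda):
    covered.add(min(agenda))

n = attention_shift(arcs, parse_pass, covered)
```

With one new arc covered per pass, the loop hits the worst case of exactly |A| iterations before every word is covered.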

12. Experimental Setup
• PCFG parser trained on the Penn WSJ Treebank, f2-21, 24 (speech normalization via Roark's normalization)
  – Generated at most 30k local trees for the second-stage parser
• Lexicalized parser: Charniak's language-model parser (Charniak, ACL01)
  – trained on the parsed BLLIP99 corpus (30 million words of WSJ text)
  – BLLIP99 parsed using the Charniak string parser trained on the Penn WSJ Treebank

13. Evaluation
• Evaluation set: NIST '93 HUB-1
  – 213 utterances
  – professional readers reading WSJ text
• Word-lattices evaluated:
  – n-best word-lattices using the Chelba A* decoder (50-best paths)
  – compressed acoustic word-lattices
• Metrics
  – word-lattice accuracy (first-stage parser): Oracle Word Error Rate
  – word-string accuracy (multi-stage parser): Word Error Rate
  – efficiency: number of parser agenda operations
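The two accuracy metrics can be made concrete: word error rate is word-level edit distance (substitutions + insertions + deletions) divided by reference length, and the oracle WER of a hypothesis list is the minimum WER over its members. A minimal sketch (computing oracle WER over an n-best list rather than a full lattice):

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance between the
    hypothesis and the reference, divided by the reference length."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))          # DP row: distance to empty prefix
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            prev, d[j] = d[j], min(d[j] + 1,              # deletion
                                   d[j - 1] + 1,           # insertion
                                   prev + (r[i - 1] != h[j - 1]))  # substitution
    return d[-1] / len(r)

def oracle_wer(ref, hyps):
    """Oracle WER: the best WER attainable by any hypothesis in the list."""
    return min(wer(ref, h) for h in hyps)

w = wer("the man is early", "the mans surly")
o = oracle_wer("the man is early", ["the mans surly", "the man is early"])
```

Oracle WER measures how well the first stage preserves good paths, independently of whether the second stage actually selects them.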

14. Results: n-best word-lattices
• Charniak parser run on each of the n-best strings (reranking; 4x over-parsing)
• n-best word-lattice: pruned acoustic word-lattices containing only the n-best word-strings
• Oracle WER of the n-best lattices: 7.75

  Model               # edge pops   Oracle WER   WER
  n-best (Charniak)   2.5 million   7.75         11.8
  100x LatParse       3.4 million   8.18         12.0
  10x AttShift        564,895       7.78         11.9

15. Results: Acoustic word-lattices
• Compressed acoustic lattices

  Model           # edge pops   Oracle WER   WER
  acoustic lats   N/A           3.26         N/A
  100x LatParse   3.4 million   5.45         13.1
  10x AttShift    1.6 million   4.17         13.1

16. Conclusion
• Attention shifting
  – improves parsing efficiency
  – increases first-stage accuracy (correcting for best-first search errors)
  – does not improve multi-stage accuracy
• Pruning for the second-stage parser constrains the number of edges
• Useful for best-first word-lattice parsing
