 
Attention Shifting for Parsing Speech
Keith Hall and Mark Johnson
Brown University
ACL 2004, July 22, 2004
Attention Shifting
• Iterative best-first word-lattice parsing algorithm
• Posits a complete syntactic analysis for each path of a word-lattice
• Goals of Attention Shifting
– Improve accuracy of best-first parsing on word-lattices (Oracle Word Error Rate)
– Improve efficiency of word-lattice parsing (number of parser operations)
– Improve syntactic language modeling based on multi-stage parsing (Word Error Rate)
• Inspired by edge demeriting for efficient parsing (Blaheta & Charniak, ACL99)
7/22/2004 ACL04: Attention Shifting for Parsing Speech 1
Outline
• Syntactic language modeling
• Word-lattice parsing
• Multi-stage best-first parsing
Noisy Channel
[Figure: Language Source → Noisy Channel → Output; the source is modeled by the Language Model P(W), the channel by the Noise Model P(A | W)]
$P(A, W) = P(A \mid W)\,P(W)$
• Speech recognition: Noise model = Acoustic model
$\arg\max_W P(W \mid A) = \arg\max_W P(A, W)$
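The decision rule above follows from Bayes' rule: the acoustics $A$ are fixed for a given utterance, so the denominator $P(A)$ does not change which word string $W$ wins.

```latex
\begin{aligned}
\hat{W} &= \arg\max_W P(W \mid A) = \arg\max_W \frac{P(A, W)}{P(A)} \\
        &= \arg\max_W P(A, W) = \arg\max_W P(A \mid W)\, P(W)
\end{aligned}
```

In practice the product is usually maximized as a sum of log probabilities, $\log P(A \mid W) + \log P(W)$.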
Syntactic Language Modeling
[Figure: word-lattice (arcs such as the, man, mans, man's, duh, is, early, surly, surely) → n-best list extractor → strings w1, ..., wn1 through w1, ..., wnm → language model; acoustic observations o1, ..., on]
• Adding syntactic information to the context (conditioning information)
$P(W) = \prod_{i=1}^{k} P(w_i \mid \pi(w_{i-1}, \ldots, w_1))$
• n-best reranking
– Select n-best strings using some model (trigram)
– Process each string independently
– Select string with highest P(A, W)
• Charniak (ACL01), Chelba & Jelinek (CS&L00, ACL02), Roark (CL01)
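A minimal sketch of n-best reranking with a history-based model. The history map `pi` is hypothetical: it keeps only the previous two words (a trigram context, i.e. the non-syntactic baseline); a syntactic model would instead map the history to features of a partial parse. The toy probabilities and acoustic scores are invented for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

History = Tuple[str, ...]

def pi(history: Sequence[str]) -> History:
    """Hypothetical history map: keep only the last two words (trigram context)."""
    return tuple(history[-2:])

def log_p_words(words: Sequence[str], cond_prob: Callable[[str, History], float]) -> float:
    """log P(W) = sum_i log P(w_i | pi(w_{i-1}, ..., w_1))."""
    return sum(math.log(cond_prob(w, pi(words[:i]))) for i, w in enumerate(words))

def rerank(nbest: List[Tuple[List[str], float]],
           cond_prob: Callable[[str, History], float]) -> List[str]:
    """Score each (words, log P(A|W)) pair independently; keep the one maximizing log P(A, W)."""
    best_words, _ = max(nbest, key=lambda item: item[1] + log_p_words(item[0], cond_prob))
    return best_words

# Toy conditional probabilities (invented); unseen events get a small floor.
TOY: Dict[Tuple[History, str], float] = {
    ((), "the"): 0.5,
    (("the",), "man"): 0.4,
    (("the",), "mans"): 0.01,
    (("the", "man"), "is"): 0.5,
    (("man", "is"), "early"): 0.2,
    (("the", "mans"), "early"): 0.01,
}

def toy_prob(word: str, context: History) -> float:
    return TOY.get((context, word), 1e-4)

nbest = [
    (["the", "mans", "early"], -1.0),       # better (invented) acoustic score
    (["the", "man", "is", "early"], -1.5),
]
print(" ".join(rerank(nbest, toy_prob)))
```

Here the language model overrides the slightly better acoustic score of "the mans early", so the reranker prefers "the man is early".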
Parsing a Word-lattice
[Figure: compressed word-lattice (arcs include the/0, man/0, mans/0.510, man's/0.916, duh/1.223, is/0, early/0, surly/0.694, surely/0) with parse edges VP, VBZ, VB, JJ, NN, DT drawn over lattice nodes]
• Compress lattice with weighted FSM determinization and minimization (Mohri, Pereira, & Riley CS&L02)
• Use compressed word-lattice graph as the parse chart
• Structure sharing due to compressed lattice
– VP → NN VB covers string "man is"
– VP → VBZ covers string "mans"
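A minimal sketch, using a hypothetical toy lattice, of why lattice nodes as chart positions give structure sharing: a chart cell is indexed by (start node, end node), so one cell stands for every word sequence between those nodes. Cell (1, 3) below holds both VP analyses listed above, one over "man is" and one over "mans".

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Arc = Tuple[int, int, str]  # (from_node, to_node, word)

# Hypothetical toy lattice: "the man is early" and "the mans early" share nodes 1 and 3.
ARCS: List[Arc] = [
    (0, 1, "the"),
    (1, 2, "man"), (2, 3, "is"),   # two-word path from node 1 to node 3
    (1, 3, "mans"),                # one-word path from node 1 to node 3
    (3, 4, "early"),
]

def strings_between(arcs: List[Arc], start: int, end: int) -> Set[Tuple[str, ...]]:
    """Enumerate word sequences on paths from start to end (only sensible for tiny lattices)."""
    out: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for u, v, w in arcs:
        out[u].append((v, w))
    found: Set[Tuple[str, ...]] = set()

    def walk(node: int, prefix: Tuple[str, ...]) -> None:
        if node == end:
            found.add(prefix)
            return
        for v, w in out[node]:
            walk(v, prefix + (w,))

    walk(start, ())
    return found

# Chart cells are keyed by lattice nodes, not string positions, so both VP analyses
# land in the same cell even though they cover different words.
CHART: Dict[Tuple[int, int], List[str]] = {(1, 3): ["VP -> NN VB", "VP -> VBZ"]}

for (i, j), rules in CHART.items():
    print((i, j), rules, sorted(strings_between(ARCS, i, j)))
```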
Word-lattice Example
[Figure: word-lattice for the utterance "I WOULD NOT SUGGEST ANYONE MAKE A DECISION ON WHO TO VOTE FOR BASED ON A STUDY LIKE THIS" (160 arcs, 72 nodes), with competing arcs such as for/four/ford/form, based/base/bass, study/steady/sonat]
• Compressed NIST '93 HUB-1 lattices
– average of 800 arcs/lattice (max 15000 arcs)
– average of 100 nodes/lattice (max 500 nodes)
Best-first Word-lattice Parsing
[Figure: word-lattice with chart edges such as NP(0,4) 0.567 and NN(3,9) 0.550 waiting on the agenda (priority queue); the grammar and the FOM computation feed the agenda]
• Bottom-up best-first PCFG parser
• Stack-based search technique based on a figure-of-merit (FOM)
• Attempts to work on "likely" parts of the chart
• Ideal figure-of-merit: $P(\text{edge}) = \text{inside}(\text{edge}) \cdot \text{outside}(\text{edge})$, details in (Hall & Johnson ASRU03)
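A minimal sketch of agenda-driven best-first parsing over lattice nodes, assuming a tiny hypothetical grammar in Chomsky normal form with invented probabilities. A real FOM estimates outside(edge) (see Hall & Johnson ASRU03); here, as a labeled simplification, the FOM is the inside probability times a hypothetical per-label constant `outside_est` (default 1.0). The parser stops at the first complete parse popped, so it is not guaranteed to find the most probable one, and `pops` counts agenda pops in the spirit of the "# edge pops" efficiency measure.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, int, int]  # (label, start_node, end_node) over lattice nodes

# Hypothetical toy lattice and CNF grammar (probabilities invented for illustration).
ARCS = [(0, 1, "the"), (1, 2, "man"), (2, 3, "is"), (1, 3, "mans"), (3, 4, "early")]
LEX = {"the": [("DT", 1.0)], "man": [("NN", 0.9), ("VB", 0.1)],
       "mans": [("VBZ", 0.5), ("NNS", 0.5)], "is": [("VBZ", 1.0)], "early": [("JJ", 1.0)]}
BINARY = {("DT", "NN"): [("NP", 1.0)], ("VBZ", "JJ"): [("VP", 1.0)], ("NP", "VP"): [("S", 1.0)]}

def best_first_parse(arcs, lex, binary, start, end, goal="S", outside_est=None):
    """Pop edges in FOM order until a goal edge spanning start..end is popped."""
    outside_est = outside_est or {}
    agenda: list = []                  # max-heap via negated FOM; counter breaks ties
    counter = itertools.count()

    def push(edge: Edge, inside: float) -> None:
        fom = inside * outside_est.get(edge[0], 1.0)
        heapq.heappush(agenda, (-fom, next(counter), edge, inside))

    for u, v, word in arcs:            # seed the agenda with preterminal edges
        for tag, p in lex.get(word, []):
            push((tag, u, v), p)

    chart: Dict[Edge, float] = {}
    by_start: Dict[int, List[Edge]] = defaultdict(list)
    by_end: Dict[int, List[Edge]] = defaultdict(list)
    pops = 0
    while agenda:
        _, _, edge, inside = heapq.heappop(agenda)
        pops += 1
        if edge in chart:              # already popped once; keep the first copy
            continue
        chart[edge] = inside
        label, i, j = edge
        if edge == (goal, start, end):
            return inside, pops, chart
        by_start[i].append(edge)
        by_end[j].append(edge)
        for right in by_start[j]:      # popped edge as left child
            for parent, p in binary.get((label, right[0]), []):
                push((parent, i, right[2]), p * inside * chart[right])
        for left in by_end[i]:         # popped edge as right child
            for parent, p in binary.get((left[0], label), []):
                push((parent, left[1], j), p * chart[left] * inside)
    return 0.0, pops, chart

prob, pops, chart = best_first_parse(ARCS, LEX, BINARY, start=0, end=4)
print(prob, pops)  # → 0.9 7
```

Note that the competing edges over "mans" never leave the agenda: best-first search finds one complete parse and leaves those lattice words unanalyzed.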
Word-lattice Parsing
[Figure: word-lattice → compressed lattice → first stage: best-first PCFG parser (with outside HMM and syntactic category lattice) → inside-outside pruning → local trees → second stage: lexicalized syntactic language model (Charniak parser) → word string from optimal parse]
• First stage: best-first bottom-up PCFG parser
• Second stage: Charniak parser language model (Charniak ACL01)
• Parsing from the lattice allows structure sharing
• Combines search for candidate lattice paths with search for candidate parses
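Inside-outside pruning keeps a chart item only if its posterior, $\text{inside} \cdot \text{outside} / P(\text{root})$, reaches a threshold. A minimal sketch over a hypothetical packed forest (each item has incoming hyperedges with a rule probability and child items; word arcs are leaves). The forest, probabilities and threshold are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A hyperedge: (head_item, child_items, rule_prob). Word arcs are leaf items.
Hyperedge = Tuple[str, Tuple[str, ...], float]

def inside_outside(edges: List[Hyperedge], order: List[str], root: str):
    """order must list every item bottom-up (children before parents)."""
    incoming: Dict[str, List[Hyperedge]] = defaultdict(list)
    for e in edges:
        incoming[e[0]].append(e)
    inside: Dict[str, float] = {}
    for item in order:
        if not incoming[item]:
            inside[item] = 1.0                      # leaf: a word arc
            continue
        total = 0.0
        for _, kids, p in incoming[item]:
            prod = p
            for k in kids:
                prod *= inside[k]
            total += prod
        inside[item] = total
    outside: Dict[str, float] = {item: 0.0 for item in order}
    outside[root] = 1.0
    for item in reversed(order):                    # parents before children
        for _, kids, p in incoming[item]:
            for idx, k in enumerate(kids):
                rest = p
                for jdx, sib in enumerate(kids):
                    if jdx != idx:
                        rest *= inside[sib]
                outside[k] += outside[item] * rest
    return inside, outside

def prune(edges, order, root, threshold):
    """Keep items whose posterior inside*outside/P(root) reaches the threshold."""
    inside, outside = inside_outside(edges, order, root)
    z = inside[root]
    posterior = {i: inside[i] * outside[i] / z for i in order}
    kept = {i for i, post in posterior.items() if post >= threshold}
    return kept, posterior

ORDER = ["the", "man", "is", "mans", "early",
         "DT[0,1]", "NN[1,2]", "VBZ[2,3]", "VBZ[1,3]", "JJ[3,4]",
         "NP[0,2]", "VP[2,4]", "VP[1,4]", "S[0,4]"]
EDGES: List[Hyperedge] = [
    ("DT[0,1]", ("the",), 1.0), ("NN[1,2]", ("man",), 0.9),
    ("VBZ[2,3]", ("is",), 1.0), ("VBZ[1,3]", ("mans",), 0.5),
    ("JJ[3,4]", ("early",), 1.0),
    ("NP[0,2]", ("DT[0,1]", "NN[1,2]"), 1.0),
    ("VP[2,4]", ("VBZ[2,3]", "JJ[3,4]"), 1.0),
    ("VP[1,4]", ("VBZ[1,3]", "JJ[3,4]"), 1.0),
    ("S[0,4]", ("NP[0,2]", "VP[2,4]"), 1.0),
    ("S[0,4]", ("DT[0,1]", "VP[1,4]"), 0.1),
]
kept, posterior = prune(EDGES, ORDER, "S[0,4]", threshold=0.1)
print(sorted(set(ORDER) - kept))  # → ['VBZ[1,3]', 'VP[1,4]', 'mans']
```

In this toy forest the low-posterior analysis is pruned together with its word, so the lattice path through "mans" never reaches the second stage.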
Multi-stage Deficiency
• First-stage PCFG parser selects parses for a subset of word-lattice paths
• Lexicalized syntactic analysis is not performed on all of the word-lattice
• Covering the entire word-lattice requires excessive over-parsing
– 100X over-parsing produces forests too large for the lexicalized parser
– additional pruning is required, resulting in loss of lattice paths
• Attention shifting algorithm addresses the coverage problem
Attention Shifting PCFG Word-lattice Parser
• Iterative reparsing
1. Perform best-first PCFG parsing (over-parse as with normal best-first parsing)
2. Identify words not covered by a complete parse (an unused word has 0 outside probability)
3. Reset the parse agenda to contain the unused words
4. If Agenda ≠ ∅, repeat
• Prune chart using inside/outside pruning
• At most |A| iterations (|A| = number of arcs)
• Forces coverage of the word-lattice
[Figure: flowchart: Identify Unused Words → Clear Agenda / Add Edges for Unused Words → Is Agenda Empty? (no: repeat; yes: continue multi-stage parsing)]
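A minimal sketch of the control loop above, under loudly stated simplifications: the hypothetical `parse_from_agenda` is not a PCFG parser. It stands in for one best-first pass by returning the cheapest complete lattice path that uses at least one agenda arc, so "covered by a complete parse" becomes "lies on some returned path". Arc costs are assumed to be negative log probabilities, and every arc is assumed to lie on some start-to-end path (a trimmed lattice). Because each pass covers at least one previously unused arc, the loop runs at most |A| times.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[int, int, str, float]  # (from_node, to_node, word, cost = -log prob)

def viterbi_tables(arcs: List[Arc], nodes: List[int], start: int, end: int):
    """Best forward/backward costs and backpointers; nodes must be topologically ordered."""
    inf = float("inf")
    fwd: Dict[int, float] = {n: inf for n in nodes}
    bwd: Dict[int, float] = {n: inf for n in nodes}
    fbp: Dict[int, int] = {}   # node -> index of best incoming arc
    bbp: Dict[int, int] = {}   # node -> index of best outgoing arc
    fwd[start], bwd[end] = 0.0, 0.0
    for n in nodes:
        for idx, (u, v, _, c) in enumerate(arcs):
            if u == n and fwd[u] + c < fwd[v]:
                fwd[v], fbp[v] = fwd[u] + c, idx
    for n in reversed(nodes):
        for idx, (u, v, _, c) in enumerate(arcs):
            if v == n and c + bwd[v] < bwd[u]:
                bwd[u], bbp[u] = c + bwd[v], idx
    return fwd, bwd, fbp, bbp

def parse_from_agenda(arcs, nodes, start, end, agenda: Set[int]) -> Set[int]:
    """Hypothetical stand-in for one best-first pass seeded with the agenda:
    returns the arcs of the cheapest complete path through at least one agenda arc."""
    fwd, bwd, fbp, bbp = viterbi_tables(arcs, nodes, start, end)
    best = min(sorted(agenda), key=lambda a: fwd[arcs[a][0]] + arcs[a][3] + bwd[arcs[a][1]])
    path = {best}
    node = arcs[best][0]
    while node != start:           # walk back to the start node
        path.add(fbp[node])
        node = arcs[fbp[node]][0]
    node = arcs[best][1]
    while node != end:             # walk forward to the end node
        path.add(bbp[node])
        node = arcs[bbp[node]][1]
    return path

def attention_shift(arcs, nodes, start, end):
    """Repeat until every arc lies on some complete analysis."""
    all_arcs = set(range(len(arcs)))
    covered: Set[int] = set()
    agenda = set(all_arcs)          # first pass: ordinary parse seeded with every word
    iterations = 0
    while agenda:                   # step 4: repeat while the agenda is non-empty
        covered |= parse_from_agenda(arcs, nodes, start, end, agenda)  # step 1
        agenda = all_arcs - covered  # steps 2-3: unused words reseed the agenda
        iterations += 1
    return covered, iterations

# Hypothetical toy lattice with invented costs.
ARCS: List[Arc] = [
    (0, 1, "the", 0.0), (1, 2, "man", 0.0), (2, 3, "is", 0.0),
    (1, 3, "mans", 0.510), (1, 3, "man's", 0.916),
    (3, 4, "early", 0.0), (3, 4, "surly", 0.694),
]
covered, iterations = attention_shift(ARCS, [0, 1, 2, 3, 4], start=0, end=4)
print(len(covered), iterations)  # → 7 4
```

With this toy lattice the loop needs 4 passes to cover all 7 arcs. The sketch omits the inside/outside chart pruning that the real algorithm applies.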
Experimental Setup
• PCFG parser trained on Penn WSJ Treebank sections 2-21, 24 (speech-normalized via Roark's normalization)
– Generated at most 30k local trees for the second-stage parser
• Lexicalized parser: Charniak's language-model parser (Charniak ACL01)
– trained on the parsed BLLIP99 corpus (30 million words of WSJ)
– BLLIP99 parsed using the Charniak string parser trained on Penn WSJ
Evaluation
• Evaluation set: NIST '93 HUB-1
– 213 utterances
– Professional readers reading WSJ text
• Word-lattices evaluated on:
– n-best word-lattices using the Chelba A* decoder (50-best paths)
– compressed acoustic word-lattices
• Metrics
– Word-lattice accuracy (first-stage parser): Oracle Word Error Rate
– Word-string accuracy (multi-stage parser): Word Error Rate
– Efficiency: number of parser agenda operations
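Oracle word error rate is the WER of the lattice path closest to the reference transcription. A minimal sketch, assuming the lattice is given as word arcs over topologically ordered integer nodes: it runs the standard Levenshtein recurrence over lattice nodes instead of hypothesis positions, so it never enumerates paths. The toy lattice and references are hypothetical.

```python
from typing import List, Tuple

Arc = Tuple[int, int, str]  # (from_node, to_node, word)

def oracle_errors(arcs: List[Arc], nodes: List[int], start: int, end: int,
                  ref: List[str]) -> float:
    """Minimum edit distance between ref and any start-to-end path (nodes in topological order)."""
    m = len(ref)
    inf = float("inf")
    cost = {n: [inf] * (m + 1) for n in nodes}  # cost[node][j]: best alignment with ref[:j]
    cost[start][0] = 0
    for n in nodes:
        row = cost[n]
        for j in range(1, m + 1):               # deletion: skip a reference word
            row[j] = min(row[j], row[j - 1] + 1)
        for u, v, word in arcs:
            if u != n:
                continue
            nxt = cost[v]
            for j in range(m + 1):
                nxt[j] = min(nxt[j], row[j] + 1)  # insertion: arc word matches nothing
                if j > 0:                         # match (0) or substitution (1)
                    nxt[j] = min(nxt[j], row[j - 1] + (word != ref[j - 1]))
    return cost[end][m]

def oracle_wer(arcs: List[Arc], nodes: List[int], start: int, end: int,
               ref: List[str]) -> float:
    return oracle_errors(arcs, nodes, start, end, ref) / len(ref)

LATTICE: List[Arc] = [(0, 1, "the"), (1, 2, "man"), (2, 3, "is"), (1, 3, "mans"), (3, 4, "early")]
NODES = [0, 1, 2, 3, 4]
print(oracle_wer(LATTICE, NODES, 0, 4, "the men is early".split()))  # → 0.25
```

On a lattice with a single path this reduces to ordinary WER, which is the word-string metric used for the multi-stage parser.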
Results: n-best Word-lattices
• Charniak parser run on each of the n-best strings (reranking) (4X over-parsing)
• n-best word-lattice: pruned acoustic word-lattices containing only the n-best word-strings
• Oracle WER of n-best lattices: 7.75

Model               # edge pops   Oracle WER   WER
n-best (Charniak)   2.5 million   7.75         11.8
100x LatParse       3.4 million   8.18         12.0
10x AttShift        564,895       7.78         11.9
Results: Acoustic Word-lattices
• Compressed acoustic lattices

Model               # edge pops   Oracle WER   WER
acoustic lats       N/A           3.26         N/A
100x LatParse       3.4 million   5.45         13.1
10x AttShift        1.6 million   4.17         13.1
Conclusion
• Attention shifting
– Improves parsing efficiency
– Increases first-stage accuracy (correcting for best-first search errors)
– Does not improve multi-stage accuracy
  • Pruning for the second-stage parser constrains the number of edges
• Useful for best-first word-lattice parsing