Attention Shifting for Parsing Speech - Keith Hall and Mark Johnson - PowerPoint PPT Presentation
SLIDE 1

Attention Shifting for Parsing Speech

Keith Hall and Mark Johnson

Brown University

ACL 2004

July 22, 2004

SLIDE 2

Attention Shifting

  • Iterative best-first word-lattice parsing algorithm
  • Posits a complete syntactic analysis for each path of a word-lattice
  • Goals of Attention Shifting

    – Improve accuracy of best-first parsing on word-lattices
      (Oracle Word Error Rate)

    – Improve efficiency of word-lattice parsing
      (number of parser operations)

    – Improve syntactic language modeling based on multi-stage parsing
      (Word Error Rate)

  • Inspired by edge demeriting for efficient parsing
    (Blaheta & Charniak demeriting, ACL99)

7/22/2004 ACL04: Attention Shifting for Parsing Speech 1

SLIDE 3

Outline

  • Syntactic language modeling
  • Word-lattice parsing
  • Multi-stage best-first parsing

SLIDE 4

Noisy Channel

[Figure: Language Source → Noisy Channel → Language Output]

P(A, W) = P(A|W) P(W), where P(A|W) is the noise model and P(W) is the language model

  • Speech recognition: Noise model = Acoustic model

arg max_W P(W|A) = arg max_W P(A, W)
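As a toy illustration of this argmax (the candidate strings and scores below are invented for illustration, not taken from the paper), one can combine acoustic and language-model log probabilities and pick the word string that maximizes the joint score:

```python
# Hypothetical candidate transcriptions with illustrative log-probabilities
# from the acoustic model log P(A|W) and the language model log P(W).
candidates = {
    "the man is surely early": {"acoustic": -12.0, "language": -9.5},
    "duh man is surly early":  {"acoustic": -11.0, "language": -15.2},
}

def joint_log_prob(scores):
    # log P(A, W) = log P(A|W) + log P(W)
    return scores["acoustic"] + scores["language"]

# arg max_W P(W|A) = arg max_W P(A, W)
best = max(candidates, key=lambda w: joint_log_prob(candidates[w]))
```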

SLIDE 5

Syntactic Language Modeling

[Figure: the language model maps acoustic observations o1, ..., on to candidate word strings w1, ..., wn1 through w1, ..., wnm; an n-best list extractor selects paths from the word-lattice]

  • Adding syntactic information to context (conditioning information)

P(W) = ∏_{i=1}^{k} P(w_i | π(w_{i−1}, ..., w_1))

  • n-best reranking

    – Select n-best strings using some model (trigram)
    – Process each string independently
    – Select string with highest P(A, W)

  • Charniak (ACL01), Chelba & Jelinek (CS&L00,ACL02), Roark (CL01)
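The n-best reranking recipe above can be sketched in a few lines (the scorers and the 50-best cutoff below are placeholders standing in for a real trigram model and the syntactic model's P(A, W); the toy scores are illustrative):

```python
def rerank_nbest(hypotheses, cheap_score, full_score, n=50):
    """Select the n best strings under a cheap model (e.g. a trigram),
    rescore each string independently with the expensive model, and
    return the highest-scoring string."""
    shortlist = sorted(hypotheses, key=cheap_score, reverse=True)[:n]
    return max(shortlist, key=full_score)

# Toy log-probability scores (illustrative, not real model output)
trigram = {"the man is surly": -5.0, "the man is surely": -4.0, "duh man is surly": -9.0}
syntax  = {"the man is surly": -6.0, "the man is surely": -3.5, "duh man is surly": -8.0}
best = rerank_nbest(list(trigram), trigram.get, syntax.get)
```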

SLIDE 6

Parsing word-lattices

[Figure: compressed word-lattice for "the man is surely/surly early", with arcs labeled by words and negative log scores, and chart categories DT, NN, JJ, VB, VBZ, VP]

  • Compress lattice with weighted FSM determinization and minimization
    (Mohri, Pereira & Riley, CS&L02)

  • Use compressed word-lattice graph as the parse chart
  • Structure sharing due to compressed lattice

    VP → NN VB covers the string "man is"
    VP → VBZ covers the string "mans"
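A minimal sketch of the structure sharing this enables (toy arcs and a toy lexicon, not the paper's lattice or grammar): chart edges are keyed by lattice node spans, so a category built over a shared arc, here the "man" arc, is constructed once and reused by every path through it.

```python
# Toy lattice arcs: (start_node, end_node, word). Two paths ("the"/"duh")
# converge on the shared "man" arc.
arcs = [(1, 2, "the"), (1, 2, "duh"), (2, 3, "man"), (3, 4, "is")]
lexicon = {"the": "DT", "duh": "DT", "man": "NN", "is": "VBZ"}

chart = {}  # (start, end, category) -> number of arcs producing that edge
for start, end, word in arcs:
    key = (start, end, lexicon[word])
    chart[key] = chart.get(key, 0) + 1

# The NN edge over "man" exists once in the chart even though both the
# "the" path and the "duh" path pass through it.
```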

SLIDE 7

Word-lattice example

  • I WOULD NOT SUGGEST ANYONE MAKE A DECISION ON WHO TO VOTE FOR BASED ON A STUDY LIKE THIS (160 arcs, 72 nodes)

[Figure: full acoustic word-lattice for the utterance above, with arcs labeled by words and negative log acoustic scores]
  • compressed NIST ’93 HUB-1 lattices

    – average of 800 arcs/lattice (max 15,000 arcs)
    – average of 100 nodes/lattice (max 500 nodes)

SLIDE 8

Best-first Word-lattice Parsing

[Figure: best-first parsing over the compressed word-lattice; an agenda (priority queue) holds candidate edges such as NP(0,4) with figure-of-merit 0.567 and NN(3,9) with 0.550, scored using the grammar]

  • Bottom-up best-first PCFG parser
  • Stack-based search technique based on figure-of-merit
  • Attempts to work on “likely” parts of the chart
  • Ideal figure-of-merit: P(edge) = inside(edge) × outside(edge)

details in (Hall & Johnson ASRU03)
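The agenda-driven loop can be sketched as a generic best-first chart parser skeleton (this is not the authors' implementation; `expand` and `fom` below stand in for the grammar rules and the figure-of-merit):

```python
import heapq

def best_first_parse(initial_edges, expand, fom, max_pops=10000):
    """Repeatedly pop the highest figure-of-merit edge from the agenda,
    add it to the chart, and push any new edges it licenses.
    heapq is a min-heap, so FOM scores are negated."""
    agenda = [(-fom(e), e) for e in initial_edges]
    heapq.heapify(agenda)
    chart = set()
    for _ in range(max_pops):
        if not agenda:
            break
        _, edge = heapq.heappop(agenda)
        if edge in chart:
            continue
        chart.add(edge)
        for new_edge in expand(edge, chart):
            if new_edge not in chart:
                heapq.heappush(agenda, (-fom(new_edge), new_edge))
    return chart
```

With a one-rule toy grammar (NP → DT NN) as `expand`, popping the DT and NN edges produces the NP edge spanning both.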

SLIDE 9

Word-lattice Parsing

[Figure: two-stage architecture. First stage: word-lattice → compressed lattice → outside HMM → best-first PCFG parser → syntactic category lattice → inside-outside pruning → local trees. Second stage: lexicalized syntactic language model (Charniak parser) → word string from optimal parse]

  • First stage: best-first bottom-up PCFG parser
  • Second stage: Charniak parser language model (Charniak ACL01)

  • Parsing from lattice allows structure sharing
  • Combines search for candidate lattice paths with search for candidate parses

SLIDE 10

Multi-stage Deficiency

  • First-stage PCFG parser selects parses for a subset of word-lattice paths
  • Lexicalized syntactic analysis not performed on all of the word-lattice
  • Covering entire word-lattice requires excessive over-parsing

    – 100x over-parsing produces forests too large for the lexicalized parser
    – additional pruning required, resulting in loss of lattice paths

  • Attention shifting algorithm addresses the coverage problem

SLIDE 11

Attention Shifting

[Figure: flowchart: PCFG word-lattice parser → identify unused words → clear agenda and add edges for unused words → if agenda is not empty, repeat; otherwise continue multi-stage parsing]

  • Iterative reparsing:
    1. Perform best-first PCFG parsing (over-parse as with normal best-first parsing)
    2. Identify words not covered by a complete parse (an unused word has 0 outside probability)
    3. Reset the parse agenda to contain the unused words
    4. If the agenda is non-empty, repeat
  • Prune chart using inside/outside pruning
  • At most |A| iterations (|A| = number of arcs)
  • Forces coverage of word-lattice

SLIDE 12

Experimental Setup

  • PCFG Parser trained on Penn WSJ Treebank f2-21,24

(speech-normalization via Roark’s normalization)

– Generated at most 30k local-trees for second-stage parser

  • Lexicalized parser: Charniak’s Language Model Parser

(Charniak ACL01)

    – trained on parsed BLLIP99 corpus (30 million words of WSJ)
    – BLLIP99 parsed using the Charniak string parser trained on Penn WSJ

SLIDE 13

Evaluation

  • Evaluation set: NIST ’93 HUB-1

    – 213 utterances
    – professional readers reading WSJ text

  • Word-lattices evaluated on:

    – n-best word-lattices using Chelba A* decoder (50-best paths)
    – compressed acoustic word-lattices

  • Metrics

    – Word-lattice accuracy (first-stage parser): Oracle Word Error Rate
    – Word-string accuracy (multi-stage parser): Word Error Rate
    – Efficiency: number of parser agenda operations
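Word Error Rate is the word-level edit distance between hypothesis and reference, divided by the reference length; Oracle WER is the minimum WER over all paths in the lattice. A standard WER computation (a generic implementation, not the paper's evaluation scripts):

```python
def wer(reference, hypothesis):
    """Word error rate: minimum substitutions + insertions + deletions
    to turn the hypothesis into the reference, over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

def oracle_wer(reference, lattice_paths):
    # best achievable WER over the candidate paths in the lattice
    return min(wer(reference, path) for path in lattice_paths)
```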

SLIDE 14

Results: n-best word-lattices

  • Charniak parser run on each of the n-best strings (reranking)

(4X over-parsing)

  • n-best word-lattice: pruned acoustic word-lattices containing only the n-best word-strings

  • Oracle WER of n-best lattices: 7.75

    Model               # edge pops   Oracle WER   WER
    n-best (Charniak)   2.5 million   7.75         11.8
    100x LatParse       3.4 million   8.18         12.0
    10x AttShift        564,895       7.78         11.9

SLIDE 15

Results: Acoustic word-lattices

  • Compressed acoustic lattices

    Model           # edge pops   Oracle WER   WER
    acoustic lats   N/A           3.26         N/A
    100x LatParse   3.4 million   5.45         13.1
    10x AttShift    1.6 million   4.17         13.1

SLIDE 16

Conclusion

  • Attention shifting

    – Improves parsing efficiency
    – Increases first-stage accuracy (correcting for best-first search errors)
    – Does not improve multi-stage accuracy

  • Pruning for second-stage parser constrains number of edges
  • Useful for best-first word-lattice parsing
