Enriching confusion networks for post-processing
23/10/2017
Sahar Ghannay, Yannick Estève, Nathalie Camelin
LIUM, IICC, Le Mans University
SLSP 2017, Le Mans, France
Outline:
1. Introduction
2. Word embeddings
3. Similarity measure
4. Experiments
5. Conclusion
✤ Automatic speech recognition (ASR) errors are still unavoidable
✤ Impact of ASR errors on downstream tasks:
  ✦ Information retrieval
  ✦ Speech-to-speech translation
  ✦ Spoken language understanding
  ✦ Subtitling
  ✦ Etc.
✤ Detection and correction of ASR errors
  ✦ Improve recognition accuracy using post-processing of ASR outputs [S. Stoyanchev et al. 2012, E. Pincus et al. 2014]
  ✦ Decrease word error rate using confusion networks (CN) [L. Mangu et al. 2000]
  ✦ Correct erroneous words in CNs [Y. Fusayasu et al. 2015]
  ✦ Improve post-processing of ASR outputs using CNs to assist a human with post-edition
➡ Approach: CN enrichment
  ✦ Assumption: words in the same bin should be close in terms of acoustics and/or linguistics
  ✦ New similarity measure computed from acoustic and linguistic word embeddings
➡ Evaluation
  ✦ Predict potential ASR errors for rare words
  ✦ Enrich CNs to improve post-edition of automatic transcriptions
  ✦ Propose semantically relevant alternative words to ASR outputs for a Spoken Language Understanding (SLU) system
✤ f: speech segments → ℝⁿ is a function mapping speech into a continuous space in which words that sound similar are neighbors
✤ Successfully used in:
  ✦ Query-by-example search systems [Levin et al. 2013, Kamper et al. 2015]
  ✦ ASR lattice re-scoring [S. Bengio and Heigold 2014]
  ✦ ASR error detection [S. Ghannay et al. 2016]
[Architecture diagram] Approach inspired by [Bengio and Heigold 2014]:
  ✦ A CNN (convolution and max-pooling layers followed by fully connected layers and a softmax) maps the acoustic signal (filter bank features; 1 word = one 2300-d vector) to the acoustic signal embedding (s)
  ✦ A DNN with a lookup table maps the orthographic representation of a word (bag of letter n-grams: 10222 tri-, bi- and 1-grams) to orthographic embeddings, for the correct word (w+, output o+) and for a wrong word (w−, output o−)
  ✦ Training uses a triplet ranking loss:
      Loss = max(0, m − Simdot(s, w+) + Simdot(s, w−))
  ✦ This yields three representations: an orthographic embedding (o), an acoustic word embedding (a) and an acoustic signal embedding (s)
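As a toy illustration of this triplet ranking loss, a sketch in plain Python (here `dot` plays the role of Simdot; the margin m, the function names and the 2-d vectors are placeholder values, not those of the actual system):

```python
def dot(u, v):
    # Dot-product similarity Sim_dot between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def triplet_ranking_loss(s, w_pos, w_neg, m=1.0):
    # Loss = max(0, m - Sim_dot(s, w+) + Sim_dot(s, w-)):
    # push the signal embedding s closer to the embedding of the
    # correct word (w+) than to that of a wrong word (w-), by margin m.
    return max(0.0, m - dot(s, w_pos) + dot(s, w_neg))
```

When s already matches w+ better than w− by at least the margin, the loss is zero; otherwise the gap is penalized linearly.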
Linguistic embeddings and their combination [S. Ghannay et al. SLSP 2015, LREC 2016]:
  ✦ Embeddings considered: w2vf-deps [O. Levy et al. 2014], skip-gram [T. Mikolov et al. 2013] (estimating continuous representations from context windows w(i−2) … w(i+2)), and GloVe [J. Pennington et al. 2014] (building a co-occurrence matrix)
  ✦ Evaluation on ASR error detection, NLP tasks, and analogical and similarity tasks
➡ Combination of word embeddings through Principal Component Analysis (PCA) yields good results on analogical and similarity tasks
✤ Combined word embeddings: the three embeddings of each word (200-d each, for N words) are concatenated into a 600-d vector; PCA, computed from the correlation matrix of the concatenation, projects it onto a new coordinate system, keeping k = 200 dimensions
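The combination step can be sketched as follows (an illustrative reconstruction, not the authors' code: `combine_embeddings` is a hypothetical name, and the correlation-matrix PCA is realized here by standardizing the columns before an SVD):

```python
import numpy as np

def combine_embeddings(emb_list, k=200):
    # Concatenate the embeddings of each word (e.g. 3 x 200-d -> 600-d).
    X = np.concatenate(emb_list, axis=1)
    # Standardize each dimension so the PCA below is the one derived
    # from the correlation matrix of the concatenation.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Principal axes via SVD of the standardized data matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    # Project onto the new coordinate system, keeping k dimensions.
    return X @ Vt[:k].T
```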
✤ Enriching confusion networks by adding nearest neighbors
  ✦ Based on cosine similarities (ASim, LSim) computed from acoustic and linguistic embeddings
  ✦ Optimization of the λ interpolation value
LASimInter(λ, x, y) = (1 − λ) × LSim(x, y) + λ × ASim(x, y)
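A minimal sketch of this interpolated similarity and of the nearest-neighbor retrieval it supports (illustrative only: `la_sim_inter`, `nearest_neighbors` and the word-to-vector dictionaries `l_emb`/`a_emb` are hypothetical names, not from the talk):

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    num = sum(a * b for a, b in zip(u, v))
    return num / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def la_sim_inter(lam, x, y, l_emb, a_emb):
    # (1 - lambda) * LSim(x, y) + lambda * ASim(x, y), where LSim and ASim
    # are cosine similarities in the linguistic and acoustic spaces.
    return ((1 - lam) * cosine(l_emb[x], l_emb[y])
            + lam * cosine(a_emb[x], a_emb[y]))

def nearest_neighbors(word, vocab, lam, l_emb, a_emb, n=6):
    # Rank the other vocabulary words by interpolated similarity to `word`.
    scored = sorted(((la_sim_inter(lam, w, word, l_emb, a_emb), w)
                     for w in vocab if w != word), reverse=True)
    return [w for _, w in scored[:n]]
```

With λ = 0 the ranking is purely linguistic, with λ = 1 purely acoustic; intermediate values mix both, as in the `portables` example that follows.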
✤ Nearest neighbors of the hypothesis word "portables"

Nearest neighbors of the French word "portables", pronounced /pɔʁtabl/:

  LSim:       téléphones, ordinateurs, portable, portatif
              (telephones, computers, portable, portable)
              /telefɔn/, /ɔʁdinatœʁ/, /pɔʁtabl/, /pɔʁtatif/
  ASim:       portable, portant, portant, portait
              (portable, carrying, racks, carried)
              /pɔʁtabl/, /pɔʁtɑ̃/, /pɔʁtɑ̃/, /pɔʁtɛ/
  LASimInter: portable, portant, portatif, portait
              (portable, carrying, portable, carried)
              /pɔʁtabl/, /pɔʁtɑ̃/, /pɔʁtatif/, /pɔʁtɛ/
✤ Training data for the acoustic embeddings
  ✦ 488 hours of French broadcast news (ESTER1, ESTER2 and EPAC)
  ✦ Vocabulary: 45k words and classes of homophones
  ✦ Occurrences: 5.75 million
✤ Training data for the linguistic word embeddings: corpus of 2 billion words
  ✦ Articles from the French newspaper "Le Monde"
  ✦ French Gigaword corpus
  ✦ Articles provided by Google News
  ✦ Manual transcriptions of 400 hours of French broadcast news
✤ Experimental data
  ✦ ETAPE corpus of French broadcast news shows, transcribed by an ASR system
  ✦ Lists of substitution errors used to evaluate the confusion network (CN) enrichment approach

Description of the experimental corpus:

  Name   WER   Sub. err.   #sub. error pairs (ref, hyp)
  Train  25.3  10.3        30678
  Test   21.9  8.3         4678

[Figure: distribution of confusion network bins according to their sizes (1 to 6 and 7-12)]
✤ Two evaluation tasks
  ✦ Task 1: prediction of errors for rare words (a = ref, b = hyp)
  ✦ Task 2: post-processing of ASR errors (a = hyp, b = ref)
➡ Given a word pair (a, b) in a list L of m substitution errors, look for b in the list N of the n nearest words to a according to the similarity measure Γ: ASim, LSim, or LASimInter
✤ Evaluation score:
S(Γ, n) = ( Σ_{i=1..m} f(i, Γ, n) × #(aᵢ, bᵢ) ) / ( Σ_{i=1..m} #(aᵢ, bᵢ) )

where f(i, Γ, n) = 1 if bᵢ ∈ N(aᵢ, Γ, n), and 0 otherwise
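This score can be sketched as follows (an illustrative reimplementation; `score`, `pairs` and `neighbors` are hypothetical names for the substitution counts #(aᵢ, bᵢ) and the ranked neighbor lists N(a, Γ, ·)):

```python
def score(pairs, neighbors, n):
    # S(Gamma, n): weighted fraction of substitution pairs (a_i, b_i)
    # whose target b_i appears among the n nearest neighbors of a_i.
    #   pairs:     dict mapping (a_i, b_i) -> occurrence count #(a_i, b_i)
    #   neighbors: dict mapping a -> ranked neighbor list N(a, Gamma, .)
    hits = sum(count for (a, b), count in pairs.items()
               if b in neighbors.get(a, [])[:n])
    return hits / sum(pairs.values())
```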
✤ Prediction of potential errors for rare words
  ✦ List of rare words: 538 pairs of substitution errors
  ✦ Lists ListSimL, ListSimA and ListSimInter of nearest neighbors to the reference word (r)

[Figure: precision as a function of list size (1 to 30) for ListSimL, ListSimA and ListSimInter]
✤ The similarity LASimInter is used to enrich confusion network bins with the nearest neighbors of the hypothesis (hyp) word

        ListCN   ListEnrichCN
  P@6   0.17     0.21 (+23.5%)
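The enrichment step itself can be sketched as (illustrative; `enrich_bin` and the `neighbors` mapping are hypothetical names, with neighbors assumed to be ranked, e.g. by LASimInter):

```python
def enrich_bin(bin_words, neighbors, n=6):
    # Add, to a confusion-network bin, the n nearest neighbors of each
    # hypothesis word it contains, skipping words already present.
    enriched = list(bin_words)
    for w in bin_words:
        for cand in neighbors.get(w, [])[:n]:
            if cand not in enriched:
                enriched.append(cand)
    return enriched
```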
✤ The similarity LASimInter is used to expand the automatic transcriptions (1-best) provided to a spoken language understanding (SLU) system → build confusion networks

        Enrich1-best
  P@6   0.206
Conclusion
✤ Take benefit from linguistic and acoustic embeddings:
  ✦ Enrich confusion networks (CN)
  ➡ Improve post-processing
✤ Compute a similarity function LASimInter optimized for ASR errors
  ✦ Lists of nearest neighbors that are relevant both linguistically and acoustically
  ✦ Enrich CNs and increase the potential correction of erroneous words by 23%
  ✦ Propose 6 alternative words to 1-best hypotheses, carrying semantics to be exploited by the SLU module
➡ These alternatives contain the correct word in 20.6% of the cases