Summary of the REVERB challenge .. Reinhold Haeb Umbach, Keisuke - PowerPoint PPT Presentation

http://reverb2014.dereverberation.com/ Summary of the REVERB challenge .. Reinhold Haeb ‐ Umbach, Keisuke Kinoshita, Emanuel Habets International AudioLabs Erlangen Volker Leutnant Marc Delcroix, Paderborn Univ. Takuya Yoshioka, Tomohiro Nakatani NTT Corporation Walter Kellermann, Sharon Gannot Bhiksha Raj Armin Sehr Carnegie Mellon Univ. Bar ‐ Ilan Univ. Beuth Univ. of Roland Maas Applied Sciences Berlin Univ. of Erlangen ‐ Nuremberg

Outline - Motivation and design of the REVERB challenge - Summary of the participants’ systems - Result summary - The ASR results - The SE (Speech Enhancement) results - Concluding remarks 2

Motivation  Recently, substantial progress made in the field of reverberant speech signal processing, including - Single- and multi-channel de-reverberation techniques - ASR techniques robust to reverberation  Lack of common evaluation framework REVERB challenge to provide a common evaluation framework for both ASR and SE studies 3

Target acoustic scenarios - Reverberant - Moderate stationary noise (~ SNR* 20dB) - 1ch, 2ch and 8ch scenarios Fig: One of microphone arrays used * “S” includes direct signal and early reflections up to 50ms. 4

The challenge data (1/ 2) - Based on Wall Street Journal Cambridge (WSJCAM0) 5K task Real recordings (RealData) * 1 & simulated data (SimData) * 2 - (Development and evaluation sets provided) - RealData for validity assessment in real reverb conditions - SimData for experiments in various reverb conditions (A part of SimData simulates RealData in terms of the reverb time) - Text prompts used for both data were the same. - Clean and multi-condition (simulated) training data provided * 1 RealData is available from the LDC catalog as a part of MC-WSJ-AV corpus (since April 2014). * 2 Materials required to generate SimData is available on our webpage. The data will soon be available http://catalog.ldc.upenn.edu/LDC2014S03 through the LDC catalog. 5

The challenge data (2/ 2) - Acoustic conditions for SimData and RealData Reverb time (T 60 ) Distance between speaker and mic SimData 0.25s , 0.5s, 0.7s* near: 0.5m (Room1, 2, 3) far: 2.0m RealData 0.7s* near: ～ 1.0m far: > 2.5m * SimData room3 simulates RealData - Sound examples RealData (far) SimData (Room2, far) Male Female Male Female Clean/Headset Observed 6

The challenge tasks: ASR and SE - ASR task - Evaluation criterion: Word Error Rate (WER) - SE task - Objective evaluation criteria - Intrusive measure (that requires reference clean speech) - Cepstrum distance (CD) - Freq-weighted segmental SNR (FWsegSNR) - Log likelihood ratio (LLR) - PESQ (optional) - Non-intrusive measure - Speech-to-reverb modulation ratio (SRMR) - Subjective evaluation criteria (web-based MUSHRA test) - Perceived amount of reverberation - Overall quality (i.e.,artifacts, distortions, remaining reverb and etc) - Same test & training data provided for both tasks 7

Number of submissions - 27 participants (i.e., # of papers) - 18 submissions (incl. 49 systems) to the ASR task - 14 submissions (incl. 25 systems) to the SE task - Percentage of 1ch, 2ch and 8ch systems in each task - 8

Quick introduction to the submitted participants’ systems 9

A wide variety of approaches submitted Spatial filtering 1ch SE/FE Main focus of SE participants 10

A wide variety of approaches submitted Robust feature Spatial filtering 1ch SE/FE Decoding Extraction/ normalization AM LM Main focus of ASR participants 11

A wide variety of approaches submitted System Robust feature Spatial filtering 1ch SE/FE Decoding combination Extraction/ normalization AM LM Main focus of ASR participants 12

A wide variety of approaches submitted System Robust feature Spatial filtering 1ch SE/FE Decoding combination Extraction/ normalization AM LM Adapt. Main focus of ASR participants Submission ranges from 1ch/multi ‐ channel SE algorithms to the ASR back ‐ end algorithms. 13

Various approaches (1/ 4) System Robust feature Spatial filtering 1ch SE/FE Decoding combination Extraction/ normalization ‐ De ‐ reverb AM LM ‐ STFT domain ‐ Inverse filtering ‐ Linear prediction ‐ Correlation shaping Adapt. ‐ DOA detection based Beamformer ‐ Mask ‐ based approach ‐ Phase ‐ error filter ‐ Magnitude spec domain ‐ Estimation of nonnegative RIRs ‐ De ‐ noising (STFT, auditory ‐ feature domain) e.g., MVDR, delay ‐ sum, GSC, Mch ‐ WF. 14

Various approaches (2/ 4) System Robust feature Spatial filtering 1ch SE/FE Decoding combination Extraction/ normalization AM LM ‐ De ‐ reverb ‐ Power/magnitude/auditory spec domain e.g., Exponential RIR model, Adapt. Linear prediction, Non ‐ negative Matrix Fact./Deconv., DNN/DRNN/DAE based dereverb ‐ Cepstral domain e.g., Cepstral smoothing, ML ‐ based inverse filter estimation ‐ De ‐ noising e.g., SS, MMSE ‐ STSA. 15

Various approaches (3/ 4) System Robust feature Spatial filtering 1ch SE/FE Decoding combination Extraction/ normalization AM LM ‐ Robust features e.g., PLP, auditory/articulatory based features, modified cepstral features, Adapt. i ‐ vector, warped MVDR, etc... ‐ Normalizatoin e.g., CMS, VTLN, CMLLR, (H)LDA, 16

Various approaches (4/ 4) System Robust feature Spatial filtering 1ch SE/FE Decoding combination Extraction/ normalization ‐ Acoustic model AM LM ‐ GMM ‐ SGMM ‐ DNN ‐ LSTM Adapt. ‐ Adaptation ‐ MLLR ‐ DNN ‐ adaptation ‐ System combination ‐ Training ‐ ROVER ‐ Clean/multi ‐ condition ‐ Multi ‐ stream HMM ‐ SAT ‐ Decoding ‐ ML/MMI/bMMI ‐ Minimum Bayes risk dec. 17

Various approaches (4/ 4) System Robust feature Spatial filtering 1ch SE/FE Decoding combination Extraction/ normalization AM LM Adapt. 18

19 Now, the results... 

Results already publicly available - Results for the ASR task http://reverb2014.dereverberation.com/result_asr.html - Results for the SE task http://reverb2014.dereverberation.com/result_se.html Note: More results (detailed/new/updated results) are available in participants’ papers. 20

21 Let’s start with the ASR results... 

ASR results: baselines 100 HTK ‐ baseline (clean training) HTK ‐ baseline+CMLLR (clean training) HTK ‐ baseline WER (%) (multicondition training) 50 HTK ‐ baseline+CMLLR (multicondition training) Recognition of unprocessed 0 1ch observation Near Near Near Near Far Far Far Far Small room Mid. room Large room Large room SimData RealData 22

ASR results: at a glance - All the submitted WERs (everything mixed, not a fair comparison) 100 HTK ‐ baseline (clean training) HTK ‐ baseline+CMLLR (clean training) HTK ‐ baseline WER (%) (multicondition training) 50 HTK ‐ baseline+CMLLR (multicondition training) Clean/Headset WERs 0 Near Near Near Near Far Far Far Far Small room Mid. room Large room Large room SimData RealData 23

ASR results analysis with bubble chart - Relationship between (averaged) WER and # of mic., data and acoust. model The size of a circle indicates the # of systems in the corresponding category 24

ASR results analysis with bubble chart Results per 1ch, 2ch and 8ch systems More microphones lead to better performance 25

ASR results analysis with bubble chart Training data: “Clean” vs “multi-condition” vs “own data” ※ E.g., WSJ America, Data with different SNRs More training data (acoustic variety) lead to better performance 26

ASR results analysis with bubble chart GMM -HMM recognizers vs DNN -HMM recognizers - The top-performing systems often employ DNN-HMM - Resultant performance may differ due to the front-end proc. and the DNN config. etc 27

ASR results analysis: SimData vs RealData - Relationship between SimData scores and RealData scores SimData vs RealData SimData Room3 Far vs RealData Very strong correlation between SimData and RealData scores (Even stronger between SimData Room3 Far and RealData) 28

ASR results: Some remarks... - Strategies often present in the top-performing systems include: - Some kind of dereverberation (STFT/Amp spec/feature domain) - Linear Multi-ch filtering (MVDR, DS, etc) often for denoising - Strong backend (e.g., DNN-HMM recognizer, sophisticated adaptation, robust feature extraction, multi-condition training) - System combination - However, it’s hard to tell the exact impact of each SE/ASR technique. (It’s something we should discover at this workshop!) - Some more works required to achieve the clean/headset performance. (E.g., for RealData, the headset WER is roughly 60% of the best performing system.) 29

30 Now, the SE part... 

- An important question in the SE task- Most submissions managed to improve the objective measures (cf. webpage, presentations) , but how about their subjective qualities? 31

Summary of the REVERB challenge .. Reinhold Haeb Umbach, Keisuke - PowerPoint PPT Presentation

http://reverb2014.dereverberation.com/ Summary of the REVERB challenge .. Reinhold Haeb Umbach, Keisuke Kinoshita, Emanuel Habets International AudioLabs Erlangen Volker Leutnant Marc Delcroix, Paderborn Univ. Takuya Yoshioka, Tomohiro

Ac#ve Velocity Acous#c Absorber Why is the AVAA necessary

Long live robustness! Michael L. Seltzer Microsoft Research REVERB 2014 | May 10, 2014

VAST CHALLENGE 2017 Bianca Barnucz & Stephanie Wegscheidl OVERVIEW VAST Challenge

ReSAKSS DATA CHALLENGE Annual Newsletter www.resakss.org/challenge ReSAKSS DATA CHALLENGE ANNUAL

STEP CHALLENGE February 7 th March 8 th CHALLENGE OVERVIEW This Step Challenge is a fun

Michelin Challenge Bibendum 2014 CONTENT CHALLENGE BIBENDUM THINK & ACTION TANK TO

Ultimately our vision is about GRAND CHALLENGE using science to make a difference in the world.

New Challenge 10 New Challenge 10 June 1, 2007 Business environment Direction Challenge

Baldwin Space Summary October 25 1 Baldwin School Space Summary 2 Baldwin School Space Summary

Heat Program Challenge: Risk Perception Source: NOAA, ADHS Challenge: Risk Perception Source:

Arizona FAF$A Challenge Julie Sainz, M.Ed. Arizona FAF$A Challenge Project Manager Arizona

City of Santa Clara Challenge Team May 10, 2017 https://hkidsf.org/our-programs/challenge-team/

@ International KEYSTONE Challenge Track Conference Challenge Track Koice 11 12 May 2015

THIS IS WHERE CHANGE BEGINS - Worlds Challenge Challenge February 12, 2018 1 AGENDA

www.bpho.org.uk Oxford 24 th June 2014 Physics Challenge AS Challenge A2 Challenge

Smarter Cities Challenge Burlington, Vermont 2 | Smarter Cities Challenge Mission

1. The code that we have written is, as I pointed out already, a complete code that can be

Low Scale Baryogenesis from Hidden Bubble Collisions An ds ey Ka t{ work in progr et s w/ Toni

Visualization with scatterplots Kelly McConville Assistant Professor of Statistics DataCamp

Data Visualization with R Data Visualization with R Workshop Day 2 Workshop Day 2 Determining

Advanced Data Mining with Weka Class 5 Lesson 1 Invoking Python from Weka Peter Reutemann

Pathways for Discovery of Free Sofware Katherine Thornton, Morane Gruenpeter Wikidata for Digital

Objectives Chapter 23 Sorting To study and analyze time complexity of various sorting

Bubble Sort Algorithm for (int outer=0; outer<a.length-1; outer++) { for (int inner=0;

Sambuz

Useful Links

Newsletter

Mail Us

Summary of the REVERB challenge .. Reinhold Haeb Umbach, Keisuke - PowerPoint PPT Presentation

http://reverb2014.dereverberation.com/ Summary of the REVERB challenge .. Reinhold Haeb Umbach, Keisuke Kinoshita, Emanuel Habets International AudioLabs Erlangen Volker Leutnant Marc Delcroix, Paderborn Univ. Takuya Yoshioka, Tomohiro

Ac#ve Velocity Acous#c Absorber Why is the AVAA necessary

Long live robustness! Michael L. Seltzer Microsoft Research REVERB 2014 | May 10, 2014

VAST CHALLENGE 2017 Bianca Barnucz &amp; Stephanie Wegscheidl OVERVIEW VAST Challenge

ReSAKSS DATA CHALLENGE Annual Newsletter www.resakss.org/challenge ReSAKSS DATA CHALLENGE ANNUAL

STEP CHALLENGE February 7 th March 8 th CHALLENGE OVERVIEW This Step Challenge is a fun

Michelin Challenge Bibendum 2014 CONTENT CHALLENGE BIBENDUM THINK &amp; ACTION TANK TO

Ultimately our vision is about GRAND CHALLENGE using science to make a difference in the world.

New Challenge 10 New Challenge 10 June 1, 2007 Business environment Direction Challenge

Baldwin Space Summary October 25 1 Baldwin School Space Summary 2 Baldwin School Space Summary

Heat Program Challenge: Risk Perception Source: NOAA, ADHS Challenge: Risk Perception Source:

Arizona FAF$A Challenge Julie Sainz, M.Ed. Arizona FAF$A Challenge Project Manager Arizona

City of Santa Clara Challenge Team May 10, 2017 https://hkidsf.org/our-programs/challenge-team/

@ International KEYSTONE Challenge Track Conference Challenge Track Koice 11 12 May 2015

THIS IS WHERE CHANGE BEGINS - Worlds Challenge Challenge February 12, 2018 1 AGENDA

www.bpho.org.uk Oxford 24 th June 2014 Physics Challenge AS Challenge A2 Challenge

Smarter Cities Challenge Burlington, Vermont 2 | Smarter Cities Challenge Mission

1. The code that we have written is, as I pointed out already, a complete code that can be

Low Scale Baryogenesis from Hidden Bubble Collisions An ds ey Ka t{ work in progr et s w/ Toni

Visualization with scatterplots Kelly McConville Assistant Professor of Statistics DataCamp

Data Visualization with R Data Visualization with R Workshop Day 2 Workshop Day 2 Determining

Advanced Data Mining with Weka Class 5 Lesson 1 Invoking Python from Weka Peter Reutemann

Pathways for Discovery of Free Sofware Katherine Thornton, Morane Gruenpeter Wikidata for Digital

Objectives Chapter 23 Sorting To study and analyze time complexity of various sorting

Bubble Sort Algorithm for (int outer=0; outer&lt;a.length-1; outer++) { for (int inner=0;

Sambuz

Useful Links

Newsletter

Mail Us

VAST CHALLENGE 2017 Bianca Barnucz & Stephanie Wegscheidl OVERVIEW VAST Challenge

Michelin Challenge Bibendum 2014 CONTENT CHALLENGE BIBENDUM THINK & ACTION TANK TO

Bubble Sort Algorithm for (int outer=0; outer<a.length-1; outer++) { for (int inner=0;