SLIDE 1

29-30 January 2003 M4 Meeting, Sheffield 1

Meetings Research at ICSI

Barbara Peskin

reporting on work of:

Don Baron, Sonali Bhagat, Hannah Carvey, Rajdip Dhillon, Dan Ellis, David Gelbart, Adam Janin, Ashley Krupski, Nelson Morgan, Thilo Pfau, Elizabeth Shriberg, Andreas Stolcke, Chuck Wooters

International Computer Science Institute, Berkeley, CA

SLIDE 2

Overview

  • Automatic Speech Recognition (ASR) Research

– Baseline performance
– Language modeling exploration
– Far-field acoustics
– Speech activity detection

  • Sentence Segmentation & Disfluency Detection
  • Dialogue Acts: Annotation & Automatic Modeling
SLIDE 3

ASR Research: Baselines

Meeting data formed a track of NIST’s RT-02 evaluation

  • Eval data (and limited dev) was available from 4 sites

– test on 10-minute excerpts from 2 meetings from each site
– only 5 transcribed meetings for dev (not all sites represented)
– evaluation included both close-talking and table-top recordings
– close-talking test used hand-segmented turns; far-field used automatic chopping

  • We used a Switchboard-trained recognizer from SRI

– no Meeting data was used to train the models!
– waveforms were downsampled to 8 kHz (for telephone bandwidth)
– recognizer used gender-dependent models, feature normalization, VTLN, speaker adaptation (MLLR) and speaker-adaptive training (SAT), bigram lattice generation with trigram expansion, then interpolated class 4-gram LM N-best rescoring, … (fairly standard Hub 5 evaluation system)

SLIDE 4

Baselines (cont’d)

word error rates (WER) on Meeting track of RT-02:

  • Performance on close-talking mics quite comparable to SWB
  • Table just shows bottom-line numbers, but incremental improvements at each recognition stage parallel those on SWB
  • Overall, far-field WERs about twice as high as close-talking
  • CMU data worst for close-talking (they used lapel mics, not headsets), but the difference disappears on far-field

Data source ⇒        ICSI   CMU    LDC    NIST   all    SWB
close-talking mic    25.9   47.9   36.8   35.2   36.0   30.2
table-top mic *      53.6   64.5   69.7   61.6   61.6   n/a

* table-top mic system was somewhat simplified (bigram LM, etc.)
  – insufficient gains from full system to justify added complexity

SLIDE 5

A Language Modeling Experiment

Problem:

RT-02 recognizer does not speak the Meetings language (many OOV words, unknown n-grams, etc.)

Experiment:

– train Meeting LM on 270k words of data from 28 ICSI meetings (excluding RT-02’s dev & eval meetings)
– include all words from these meetings in recognizer’s vocabulary (~1200 new words)
– interpolate Meeting LM with SWB-trained LM
– choose interpolation weights by minimizing perplexity on 2 ICSI RT-02 dev meetings
– test on 2 ICSI eval meetings using simplified recognition protocol

         SWB LM   Interpolated LM
WER      30.6%    28.4%
OOV       1.5%     0.5%
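The weight-tuning step can be sketched in a few lines. The unigram tables and dev tokens below are toy stand-ins invented for illustration; the real system interpolated full SWB- and Meeting-trained n-gram LMs and tuned on the 2 ICSI dev meetings:

```python
import math

# Toy unigram LMs standing in for the SWB- and Meeting-trained models
# (all probabilities here are made up for illustration).
swb_lm     = {"the": 0.7, "meeting": 0.1, "agenda": 0.1, "phone": 0.1}
meeting_lm = {"the": 0.1, "meeting": 0.5, "agenda": 0.3, "phone": 0.1}

def interpolate(lm_a, lm_b, lam):
    """Linear interpolation: p(w) = lam*p_a(w) + (1-lam)*p_b(w)."""
    return {w: lam * lm_a[w] + (1 - lam) * lm_b[w] for w in lm_a}

def perplexity(lm, tokens):
    """Perplexity of a probability table on a token sequence."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))

# Stand-in for held-out dev data (the 2 ICSI RT-02 dev meetings).
dev_tokens = ["the", "meeting", "the", "agenda", "the"]

# Grid-search the weight on the Meeting LM that minimizes dev perplexity.
best_lam = min((l / 100 for l in range(101)),
               key=lambda l: perplexity(interpolate(meeting_lm, swb_lm, l),
                                        dev_tokens))
```

With each LM better on some of the dev tokens, the optimum lands strictly between 0 and 1, and the interpolated model beats both components.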

SLIDE 6

Far-Field Acoustics

  • Far-field performance was improved by applying Wiener filtering techniques developed for the Aurora program

– On RT-02 dev set, WER dropped 64.1% → 61.7%

  • Systematically addressed far-field acoustics using Digits Task

– Model as convolutive distortion (reverb) followed by additive distortion (background noise)
– For additive noise: used Wiener filtering approach, as above
– For reverb: used long-term log spectral subtraction (similar to CMS but with a longer window)
– See [D. Gelbart & N. Morgan, ICSLP-2002] for details

  • Also explored PZM (high-quality) vs “PDA” (cheap mic) performance

– “PDA” performance much worse, but the above techniques greatly reduced the difference
– Error rates roughly comparable after processing as above

WER on Mtg Digits   baseline   noise reducn   log spec subtr   both
near                 4.1%       3.6%           3.1%             2.7%
far                 26.3%      24.8%           8.2%             7.2%
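The two processing stages (Wiener-style noise reduction for additive noise, long-term log spectral subtraction for reverb) can be sketched as below; the array shapes, window length, and gain flooring are illustrative assumptions, not the Gelbart & Morgan implementation:

```python
import numpy as np

def noise_reduce(power_spec, noise_est, floor=0.01):
    """Wiener-style noise reduction sketch: attenuate each frequency bin
    by an SNR-based gain H = SNR/(SNR+1), with flooring to avoid
    musical noise.  power_spec: (n_frames, n_bins); noise_est: (n_bins,)."""
    snr = np.maximum(power_spec / noise_est - 1.0, floor)
    gain = snr / (snr + 1.0)
    return gain * power_spec

def long_term_log_spec_subtraction(log_spec, window=100):
    """Long-term log spectral subtraction sketch: subtract a long-window
    running mean of the log spectrum from each frame, like cepstral mean
    subtraction but with a longer window, so slowly varying convolutive
    distortion (channel, reverberation) cancels.
    log_spec: (n_frames, n_bins) log magnitude spectra."""
    out = np.empty_like(log_spec)
    for t in range(len(log_spec)):
        lo = max(0, t - window + 1)
        out[t] = log_spec[t] - log_spec[lo:t + 1].mean(axis=0)
    return out
```

A fixed channel response adds a constant offset in the log spectral domain, which the running-mean subtraction removes exactly.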

SLIDE 7

Speech Activity Detection

Detecting regions of speech activity is a challenge for Meeting data, even on close-talking channels (due to cross-talk, etc.)

  • Standard echo cancellation techniques ineffective (due to head movement)
  • We devised an algorithm which performs SAD on each close-talking channel, using information from all recorded channels

– First, detect speech region candidates on each channel separately, using a standard two-class HMM with minimum duration constraints
– Then compute cross-correlations between channels and threshold them to suppress detections due to cross-talk
– Key feature is normalization of energy features on each channel, not only by the channel minimum but also by the average across all channels

  • Greatly reduces error rates

– Frame error rate for speech/nonspeech detection: 18.6% → 13.7% → 12.0%
– WER for SWB-trained recognizer: within 10% (rel.) of the hand-segmented result (cf. unsegmented waveforms: 75% higher, largely due to cross-talk insertions)

Note: details can be found in [T. Pfau, D. Ellis, and A. Stolcke, ASRU-2001].
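A toy version of the two energy normalizations plus cross-talk suppression; a simple threshold stands in for the two-class HMM with duration constraints, and all constants are invented for illustration:

```python
import numpy as np

def speech_activity(frame_energy_db, thresh=6.0, xtalk_margin=3.0):
    """frame_energy_db: (n_channels, n_frames) log energies, one row per
    close-talking channel.  Sketch of the two normalizations the slide
    describes: each channel is normalized by its own minimum (channel
    floor) and compared against the per-frame average across channels;
    frames where another channel is much stronger are suppressed as
    cross-talk.  Thresholds here are toy values."""
    e = np.asarray(frame_energy_db, dtype=float)
    # channel-min normalization: energy above each channel's noise floor
    above_floor = e - e.min(axis=1, keepdims=True)
    # cross-channel normalization: energy relative to the frame average
    rel = e - e.mean(axis=0, keepdims=True)
    candidate = above_floor > thresh
    # suppress cross-talk: drop frames clearly dominated by another channel
    dominated = rel < (rel.max(axis=0, keepdims=True) - xtalk_margin)
    return candidate & ~dominated
```

In a two-channel example where speaker A's speech bleeds into B's microphone, the bleed clears the energy threshold but is suppressed by the cross-channel comparison.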

SLIDE 8

“Hidden Event” Modeling

  • Detect events implicit in the speech stream (e.g. sentence and topic breaks, disfluency locations, …) using prosodic & lexical cues

  • Developed by Shriberg, Stolcke, et al. at SRI (for topic and sentence segmentation of Broadcast News and Switchboard)

  • 3 main ingredients

– Hidden event language model built from n-grams over words and event labels
– Prosodic model built from features (phone & pause durations, pitch, energy) extracted within a window around each interword boundary; classifies via decision trees
– Model combination using an HMM defined from the hidden event LM and incorporating observation likelihoods for states from prosodic decision tree posteriors
  • Meetings work used parallel feature databases (true words, ASR output) to detect sentence boundaries and disfluencies

– for true words: LM better than prosody
– for recognized words: prosody better than LM
– combining models always helps, even when one is much better

Note: details can be found in [D. Baron, E. Shriberg, and A. Stolcke, ICSLP-2002].
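The model combination can be sketched as a small Viterbi decode over boundary events: transitions come from the hidden-event LM, and prosodic decision-tree posteriors become scaled state likelihoods by dividing out the event priors (the usual hybrid-HMM trick). The event inventory and all numbers below are invented for illustration:

```python
import numpy as np

EVENTS = ["no-boundary", "sentence-boundary"]   # toy event inventory

def viterbi_events(lm_trans, prosody_post, prior):
    """lm_trans[i, j]    = P(event j follows event i), from the hidden-event LM
    prosody_post[t, j] = tree posterior P(event j | prosodic features at t)
    prior[j]           = prior probability of event j
    Returns the most likely event sequence over the T interword boundaries."""
    likelihood = prosody_post / prior            # posterior/prior = scaled likelihood
    log_a = np.log(lm_trans)
    delta = np.log(prior) + np.log(likelihood[0])
    back = np.zeros((len(likelihood), len(prior)), dtype=int)
    for t in range(1, len(likelihood)):
        scores = delta[:, None] + log_a          # scores[i, j]: best path into j via i
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(likelihood[t])
    path = [int(delta.argmax())]
    for t in range(len(likelihood) - 1, 0, -1):  # backtrace
        path.append(int(back[t, path[-1]]))
    return [EVENTS[i] for i in reversed(path)]
```

With a boundary-averse LM and a strong prosodic cue at the middle boundary, the decode places a single sentence boundary there.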

SLIDE 9

Dialogue Acts

To understand what’s going on in a meeting, we need more than the words ⇒ DAs tell the role of an utterance in the discourse; use them to spot topic shifts, floor grabbing, agreement / disagreement, etc.

e.g.  Yeah. (as backchannel)
      Yeah. (as response)
      Yeah? (as question)
      Yeah! (as exclamation)

  • Hand labeling now with goal of automatic labeling later
  • Using set of 58 tags refined for this work, based on SWB-DAMSL conventions
  • Using cues from both prosody and words
  • Currently more than 20 meetings (over 20 hours of speech) hand labeled
  • Started work on automatic modeling (in collaboration with SRI)

A draft of the DA spec is available at our Meetings website: http://www.icsi.berkeley.edu/Speech/mr/

SLIDE 10

Summary

  • Meetings support an amazing range of speech & language research (nearly “ASR complete”)

  • We are just starting to tap some of the possibilities, including

– Automatically transcribing natural, spontaneous multi-party speech
– Enriching language models to handle new / specialized topics
– Detecting speech activity, segmenting the speech stream, labeling talkers
– Dealing with far-field acoustics
– Moving beyond the words to model
  • hidden events such as sentence breaks and disfluencies
  • dialogue acts and discourse structure
  • We look forward to continued collaboration with the M4 community to tackle the challenges posed by Meeting data