RESPITE: Tandem & multistream research

  1. RESPITE: Tandem & multistream research
     Dan Ellis, International Computer Science Institute, Berkeley CA <dpwe@icsi.berkeley.edu>
     Outline:
     1 Tandem & LVCSR
     2 Mutual information for multistream design
     3 Other multistream work at ICSI
     4 Other projects:
       • Meeting recorder
       • LabROSA
       • CRAC workshop
     ICSI: RESPITE progress - Dan Ellis 2000-09-15

  2. Recent Tandem work (outline item 1)
     [Figure: tandem system block diagram, with blocks labeled: Input sound; plp and msg features; Neural net classifier (one per stream); PCA orthog'n; Gauss mix models; HTK decoder; Words ("s ah t")]
     Relative improvements annotated on the diagram:
       - Combo over msg: +20%
       - Combo over plp: +20%
       - Combo over mfcc: +25%
       - Pre-nonlinearity over posteriors: +12%
       - KLT over direct: +8%
       - Combo-into-HTK over Combo-into-noway: +15%
       - NN over HTK: +15%
       - Tandem over HTK: +35%
       - Tandem over hybrid: +25%
       - Tandem combo over HTK mfcc baseline: +53%
     • Aurora 2000 (mismatched test conditions)
       - normalization much more important: online?
       - baseline WER ratio (smaller is better):

         System             Matched test   Medium mismatch   High mismatch
         plp, utt-norm      78%            69%               63%
         tandem, utt-norm   63%            73%               52%
         tandem, onl-norm   74%            81%               64%
         (Pratibha Jain, OGI)
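
The "pre-nonlinearity over posteriors" comparison concerns which network output is handed to the Gaussian-mixture stage. As a minimal sketch, assuming the net's final nonlinearity is a softmax (the slide does not name it), the pre-nonlinearity activations equal the log posteriors up to a per-frame constant. The function names and the example activations below are illustrative, not taken from the slides.

```python
import math

def softmax(activations):
    """Convert pre-nonlinearity activations to posterior probabilities."""
    peak = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def log_posteriors(activations):
    """Log of the softmax outputs for one frame."""
    return [math.log(p) for p in softmax(activations)]

# Hypothetical pre-nonlinearity outputs for one frame (4 phone classes).
frame = [2.0, -1.0, 0.5, -3.0]
posteriors = softmax(frame)

# Offset between each activation and its log posterior.
# All offsets are equal: they are the log of the softmax normalizer.
offsets = [a - lp for a, lp in zip(frame, log_posteriors(frame))]

print(posteriors)
print(offsets)
```

Under that assumption, using the pre-nonlinearity values amounts to feeding the later stages log-domain quantities, which are much less skewed than posteriors squeezed into the range 0 to 1.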

  3. Tandem for LVCSR
     • DARPA SPINE task (spont. noisy)
     • Collaboration with OGI & CMU
       - tandem needs GMM-HMM expertise!
     • Tight timescale
       - Tandem system not optimized, one stream
     • Evaluation submitted, results not yet official
       - unofficial WERs:

         MFCC/SPHINX                35%
         Tandem/SPHINX              30.1%
         full-up CMU (ROVER+MLLR)   26.5%
         CMU + Tandem (ROVER)       25.7%
     • Conclusions:
       - Tandem from CI labels still tractable for LV
       - improvements may not be so dramatic
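
The "not so dramatic" conclusion is easier to see as relative WER reduction. The sketch below applies the usual definition, (baseline - new) / baseline, to the unofficial numbers on the slide. The function name is illustrative.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline's word errors removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer

# Unofficial SPINE WERs (percent) quoted on the slide.
single_system = relative_wer_reduction(35.0, 30.1)  # MFCC/SPHINX -> Tandem/SPHINX
full_system = relative_wer_reduction(26.5, 25.7)    # full-up CMU -> CMU + Tandem

print(f"single system: {single_system:.1%}")
print(f"full system:   {full_system:.1%}")
```

Tandem features give roughly a 14% relative reduction over the MFCC system on their own, but only about 3% once added by ROVER to the full CMU system.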

  4. Current Tandem work
     • Aurora 2000: Cross-language
       - training Finnish & Italian systems
       - union of all phone sets?
       - clustering of cross-language phones
     • Other targets for neural net training
       - HMM states
       - articulatory targets
     • System variants
       - ‘mixture of posteriors’, Rottland & Rigoll (2000)
       [Figure: block diagram with blocks labeled: Input sound; Feature calculation; Speech features; Neural net classifier; Phone probabilities; mixture weights; sub-phone states; HMM decoder; Words ("s ah t")]
     • Transfer to DC

  5. Mutual Info for multistream design (outline item 2)
     • Combination best for complementary streams
     • Try to predict by looking at Mutual Information:
       - low classification MI implies different information
     • Can also use to choose combination point
       - feature combination (concatenation) for streams with interdependence (high feature MI)
       - else posterior (post-classifier) combination
     [Figure: two scatter plots, MSGa:14 and MSGa:2, each plotted against PLPa:2; annotations "Hx = 7.30, Hy = 6.99, MI = 0.52" and "Hx = 7.17, Hy = 6.88, MI = 0.03"]
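
The Hx, Hy and MI figures on the scatter plots are entropies and mutual information between two feature dimensions. The slides do not say how they were estimated. One common approach, sketched here as an assumption, is a histogram estimate with MI = Hx + Hy - Hxy, in bits. The bin count, function names and synthetic data are illustrative.

```python
import math
import random
from collections import Counter

def entropy_bits(counts, total):
    """Shannon entropy in bits of a histogram stored as a Counter."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mutual_information_bits(xs, ys, n_bins=16):
    """Histogram estimate of (Hx, Hy, MI) in bits, with MI = Hx + Hy - Hxy."""
    def binned(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
        return [min(int((v - lo) / width), n_bins - 1) for v in values]

    bx, by = binned(xs), binned(ys)
    n = len(xs)
    hx = entropy_bits(Counter(bx), n)
    hy = entropy_bits(Counter(by), n)
    hxy = entropy_bits(Counter(zip(bx, by)), n)
    return hx, hy, hx + hy - hxy

# Synthetic stand-ins for two feature dimensions (not the PLP/MSG data).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(20000)]
dependent = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in x]
independent = [rng.gauss(0, 1) for _ in x]

mi_dep = mutual_information_bits(x, dependent)[2]
mi_ind = mutual_information_bits(x, independent)[2]
print(mi_dep, mi_ind)
```

The dependent pair yields a clearly higher MI than the independent pair, which stays near zero. This is the contrast the two scatter plots illustrate. Histogram estimates are biased upward for small samples, so absolute values depend on the bin count.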

  6. MI for multistream: results
     [Figure: "Conditional Mutual Information between feature streams", CMI/bits (0 to 0.5) for pairs of PLPa, PLPb, MSGa, MSGb]

     Stream 1   Stream 2   Feature CMI   Classif CMI   FC WER ratio   PC WER ratio
     PLPa       PLPb       0.04          0.26          89.6%          97.6%
     MSGa       MSGb       0.21          0.25          85.8%          101.1%
     PLPa       MSGb       0.11          0.15          78.1%          86.3%
     PLPb       MSGa       0.09          0.24          87.5%          89.7%

     • Low Classif. CMI correlates with good pairs
     • PC vs. FC more complex than Feature CMI
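
The first conclusion can be checked directly against the table. The sketch below computes the Pearson correlation between classifier CMI and each WER ratio, and picks the pair with the lowest classifier CMI. With only four pairs this illustrates the slide's claim rather than testing it statistically. The variable names are illustrative.

```python
import math

# (stream 1, stream 2, feature CMI, classifier CMI, FC WER ratio %, PC WER ratio %)
rows = [
    ("PLPa", "PLPb", 0.04, 0.26, 89.6, 97.6),
    ("MSGa", "MSGb", 0.21, 0.25, 85.8, 101.1),
    ("PLPa", "MSGb", 0.11, 0.15, 78.1, 86.3),
    ("PLPb", "MSGa", 0.09, 0.24, 87.5, 89.7),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

classif_cmi = [r[3] for r in rows]
fc_wer = [r[4] for r in rows]
pc_wer = [r[5] for r in rows]

r_classif_fc = pearson(classif_cmi, fc_wer)
r_classif_pc = pearson(classif_cmi, pc_wer)

# Pair with the lowest classifier CMI.
best_pair = min(rows, key=lambda r: r[3])[:2]

print(best_pair, round(r_classif_fc, 2), round(r_classif_pc, 2))
```

The pair with the lowest classifier CMI, PLPa with MSGb, is also the pair with the lowest WER ratio under both feature combination (FC) and posterior combination (PC).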

  7. Other Multistream work (outline item 3): Multifeature combination (Mike Shire)
     • LDA design of condition-dependent features:
       [Figure: feature processing chains for train and test speech, with blocks labeled: Critical Band Power; Log; LDA filter; RASTA Filter; Scale + Smooth; LPC Cepstra. Accompanying plots of the LDA filters for clean, light and heavy conditions, against seconds and against Hz (dB)]
     • Combine various conditions, test on all:
       [Figure: "LDA-RASTA-PLP: Combining CLEAN and REVERB". Two panels, Frame Accuracy (%) and Word Error Rate (WER%), against the reverb/clean weighting (0.1 to 0.9), with curves labeled by T60 from 0.25 (mild) to 2.5 (severe)]

  8. ‘Oracle nets’ for FC multistream (Barry Chen)
     4 bands → 15 combinations (+priors):
     • (smoothed) ‘oracle’ choice halves WER
     • Can we train a net to make ‘oracle’ choice?
       - based on KL distance between posteriors?

     System                           Word Error Rate
     best net (4 band)                5.1%
     phone-smoothed oracle            2.7%
     KL oracle-net weighted streams   4.9%

     • Not much help in practice...
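
Two ingredients of this slide can be sketched concretely: the 15 combinations are the non-empty subsets of 4 bands (2^4 - 1), and a KL distance compares two posterior vectors. The slide does not say how the KL distance was used to weight streams, so the code only shows the quantities themselves. The band labels and example posteriors are hypothetical.

```python
import math
from itertools import combinations

bands = [1, 2, 3, 4]

# Every non-empty subset of the four bands: 2**4 - 1 = 15 combinations.
band_combinations = [
    subset
    for size in range(1, len(bands) + 1)
    for subset in combinations(bands, size)
]

def kl_divergence(p, q, floor=1e-12):
    """KL(p || q) in nats between two posterior vectors over the same classes."""
    return sum(
        pi * math.log(max(pi, floor) / max(qi, floor))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with zero probability contribute nothing
    )

# Hypothetical posteriors from two band-combination nets for one frame.
net_a = [0.7, 0.2, 0.1]
net_b = [0.1, 0.3, 0.6]

print(len(band_combinations))
print(kl_divergence(net_a, net_a), kl_divergence(net_a, net_b))
```

The divergence is zero when the two nets agree and grows as their posteriors diverge. KL is asymmetric, so a "distance" in practice is often the sum of both directions. Whether that was done here is not stated.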

  9. Other projects (outline item 4): Meeting recorder
     • ASR in conventional meeting environments
       - for transcription/summarization/retrieval
       - distant acoustics!
       - informal, overlapped speech (c/w ShATR)
     • Data collection:
       - wired room at ICSI
       - other systems at UW ...

  10. Meeting Recorder (cont’d)
      • Preliminary analysis
        - transcription & forced alignment (IBM)
        - ground truth in turns/overlaps
        - preliminary distant-mic recordings
      • Research areas
        - meeting dialog: overlaps, turns etc.
        - language modeling for meetings
        - feature design for distant acoustics
      • Future support
        - DARPA ‘ROAR’ program?

  11. LabROSA: The Laboratory for Recognition and Organization of Speech and Audio
      • New research group at Columbia University in the City of New York
        - existing EE dept. signal processing group
        - addition of speech/audio for true multimedia
      • Research: extracting information from sound
        - real-world ASR
        - higher-order: speaker ID, dialog structure
        - nonspeech: events, acoustic environment ID
      • Recruiting students
      http://www.ctr.columbia.edu/~dpwe/LabROSA/

  12. CRAC2001: “Consistent and Reliable Acoustic Cues for speech and sound analysis”
      http://www.ee.columbia.edu/CRAC2001/
      • RESPITE Contractual Obligation Workshop:
        - Identifying sources/info (CASA, BSS, SNR est)
        - Robust ASR (MD, MS, compensation)
        - Nonspeech, music applications
        - Psychoacoustics of perception in noise
        - Combinations
      • Satellite event at Eurospeech-2001, Aalborg
        - held on Sunday 2001-09-02 (day before) at Eurospeech location
        - separate registration
      • Workshop structure
        - lecture + posters, am + pm, discussion
        - limit to ~40 participants

  13. CRAC2001 (cont’d)
      • Organizing committee
        - Dan & Martin, co-chairs
        - Fred, Phil, Andy
        - Andrzej Drygajlo (EPFL) & H. Okuno (CASA)
      • Timetable:
        - CFP: imminent
        - Abstracts: April 30th, 2001
      • Actions:
        - help with publicity
        - plan your submission!
