RESPITE: Tandem & multistream research



SLIDE 1

ICSI: RESPITE progress - Dan Ellis 2000-09-15 - 1

RESPITE: Tandem & multistream research

Dan Ellis International Computer Science Institute, Berkeley CA <dpwe@icsi.berkeley.edu>

Outline

1 Tandem & LVCSR
2 Mutual information for multistream design
3 Other multistream work at ICSI
4 Other projects:

  • Meeting recorder
  • LabROSA
  • CRAC workshop

SLIDE 2

Recent Tandem work

  • Aurora 2000 (mismatched test conditions)
  • normalization much more important: online?
  • baseline WER ratio (smaller is better):

System              Matched test   Medium mismatch   High mismatch
plp, utt-norm       78%            69%               63%
tandem, utt-norm    63%            73%               52%
tandem, onl-norm    74%            81%               64%

(Pratibha Jain, OGI)

[Figure: block diagram of the tandem combination system. Labeled components: input sound; plp and msg feature streams, each feeding a neural net classifier with phone outputs (h#, pcl, bcl, tcl, dcl, ...); combination (+); PCA orthog'n; Gauss mix models; HTK decoder; words.]

Tandem combo over HTK mfcc baseline: +53%

  • Combo-into-HTK over Combo-into-noway: +15%
  • Combo over msg: +20%
  • NN over HTK: +15%
  • Combo over mfcc: +25%
  • Tandem over hybrid: +25%
  • Tandem over HTK: +35%
  • Combo over plp: +20%
  • KLT over direct: +8%
  • Pre-nonlinearity over posteriors: +12%
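The feature path named on this slide (pre-nonlinearity net outputs, stream combination, then KLT/PCA orthogonalization before the Gaussian mixture models) can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the function names, the array shapes, and the choice to combine streams by summing their pre-nonlinearity outputs are all assumptions.

```python
import numpy as np


def klt_orthogonalize(acts):
    """Decorrelate features with a KLT (PCA) estimated from the data itself.

    acts: array of shape (frames, phone_classes).
    Returns the centered data projected onto the covariance eigenvectors,
    ordered by decreasing variance.
    """
    centered = acts - acts.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return centered @ eigvecs[:, order]


def tandem_features(plp_acts, msg_acts):
    """Hypothetical tandem front end for two streams.

    plp_acts, msg_acts: pre-nonlinearity (pre-softmax) outputs of the two
    neural nets, each of shape (frames, phone_classes).
    ASSUMPTION: the streams are combined by simple summation.
    """
    combined = plp_acts + msg_acts
    return klt_orthogonalize(combined)
```

The decorrelated output is what would be passed to a diagonal-covariance Gaussian mixture system in place of conventional features.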

SLIDE 3

Tandem for LVCSR

  • DARPA SPINE task (spontaneous, noisy speech)
  • Collaboration with OGI & CMU
  • tandem needs GMM-HMM expertise!
  • Tight timescale
  • Tandem system not optimized, one stream
  • Evaluation submitted, results not yet official
  • unofficial WERs:

MFCC/SPHINX:                  35%
Tandem/SPHINX:                30.1%
full-up CMU (ROVER+MLLR):     26.5%
CMU + Tandem (ROVER):         25.7%

  • Conclusions:
  • Tandem trained on context-independent (CI) labels is still tractable for large vocabulary (LV)
  • improvements may not be so dramatic
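
To put the unofficial SPINE WERs in relative terms, the reduction can be computed directly from the four figures on this slide. The helper name below is hypothetical; the WER values are the ones quoted above.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline word error rate removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Unofficial WERs (percent) quoted on the slide.
mfcc_sphinx = 35.0
tandem_sphinx = 30.1
cmu_full = 26.5
cmu_plus_tandem = 25.7

tandem_gain = relative_wer_reduction(mfcc_sphinx, tandem_sphinx)
rover_gain = relative_wer_reduction(cmu_full, cmu_plus_tandem)

print(f"Tandem over MFCC baseline: {100 * tandem_gain:.1f}% relative")
print(f"Adding Tandem to full CMU system: {100 * rover_gain:.1f}% relative")
```

This gives about 14% relative for Tandem over the MFCC baseline and about 3% relative when Tandem is added to the full CMU system.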
SLIDE 4

Current Tandem work

  • Aurora 2000: Cross-language
  • training Finnish & Italian systems
  • union of all phone sets?
  • clustering of cross-language phones
  • Other targets for neural net training
  • HMM states
  • articulatory targets
  • System variants
  • ‘mixture of posteriors’
  • Transfer to DC

[Figure: block diagram of a tandem system variant. Labeled components: input sound; feature calculation; speech features; neural net classifier; phone probabilities (h#, pcl, bcl, tcl, dcl, ...); HMM decoder; words. Annotations: sub-phone states, mixture weights.]

Rottland & Rigoll (2000)

SLIDE 5

Mutual Info for multistream design

  • Combination best for complementary streams
  • Try to predict by looking at Mutual Information:
  • low classification MI implies different information
  • Can also use to choose combination point
  • feature combination (concatenation) for streams with interdependence (high feature MI)

  • else posterior (post-classifier) combination

[Figure: two panels of feature-element pairs, with axis labels PLPa:2, PLPa:2, MSGa:2, MSGa:14. Panel statistics: Hx = 7.30, Hy = 6.99, MI = 0.52; and Hx = 7.17, Hy = 6.88, MI = 0.03.]
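
Quantities like Hx, Hy and MI for a pair of feature elements can be estimated from a joint histogram. The sketch below is one simple way to do that; the function name, the bin count, and the histogram estimator itself are assumptions, not necessarily how the figures on this slide were computed.

```python
import numpy as np


def entropy_and_mi_bits(x, y, bins=16):
    """Histogram estimate of H(X), H(Y) and I(X;Y), all in bits.

    x, y: 1-D arrays of equal length (one feature element each, per frame).
    bins: number of histogram bins per axis (an arbitrary choice here).
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()      # joint probability table
    px = pxy.sum(axis=1)           # marginal of x
    py = pxy.sum(axis=0)           # marginal of y

    def entropy(p):
        p = p[p > 0]               # skip empty cells: 0 * log 0 = 0
        return float(-(p * np.log2(p)).sum())

    hx, hy, hxy = entropy(px), entropy(py), entropy(pxy.ravel())
    return hx, hy, hx + hy - hxy   # I(X;Y) = H(X) + H(Y) - H(X,Y)
```

A pair with high MI carries largely redundant information, while a pair with MI near zero is close to independent. Note that histogram estimates are biased upward when there are few samples per bin.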

SLIDE 6

MI for multistream: results

  • Low classification CMI correlates with good stream pairs
  • Choosing posterior combination (PC) vs. feature combination (FC) is more complex than feature CMI alone

Stream 1   Stream 2   Feature CMI   Classif CMI   FC WER ratio   PC WER ratio
PLPa       PLPb       0.04          0.26          89.6%          97.6%
MSGa       MSGb       0.21          0.25          85.8%          101.1%
PLPa       MSGb       0.11          0.15          78.1%          86.3%
PLPb       MSGa       0.09          0.24          87.5%          89.7%

[Figure: Conditional Mutual Information between feature streams. Streams PLPa, PLPb, MSGa, MSGb on both axes; scale in CMI/bits, 0.1 to 0.5.]
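
The claim that low classification CMI goes with good pairs can be checked against the four rows of the results table. The snippet below is only an illustrative check on those four data points; the variable names are hypothetical, and a correlation over four pairs is weak evidence at best.

```python
import numpy as np

# (stream 1, stream 2, feature CMI, classif CMI, FC WER ratio %, PC WER ratio %)
# Values copied from the results table on this slide.
PAIRS = [
    ("PLPa", "PLPb", 0.04, 0.26, 89.6, 97.6),
    ("MSGa", "MSGb", 0.21, 0.25, 85.8, 101.1),
    ("PLPa", "MSGb", 0.11, 0.15, 78.1, 86.3),
    ("PLPb", "MSGa", 0.09, 0.24, 87.5, 89.7),
]

classif_cmi = np.array([row[3] for row in PAIRS])
fc_wer = np.array([row[4] for row in PAIRS])
pc_wer = np.array([row[5] for row in PAIRS])

# Pair with the lowest classification CMI, and pair with the lowest FC WER ratio.
lowest_cmi_pair = min(PAIRS, key=lambda row: row[3])[:2]
best_fc_pair = min(PAIRS, key=lambda row: row[4])[:2]

# Pearson correlation between classification CMI and each WER ratio.
r_fc = np.corrcoef(classif_cmi, fc_wer)[0, 1]
r_pc = np.corrcoef(classif_cmi, pc_wer)[0, 1]

print(lowest_cmi_pair, best_fc_pair, round(r_fc, 2), round(r_pc, 2))
```

In this table the pair with the lowest classification CMI (PLPa with MSGb) is also the pair with the lowest WER ratio under both combination schemes.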

SLIDE 7

Other Multistream work: Multifeature combination (Mike Shire)

  • LDA design of condition-dependent features:
  • Combine various conditions, test on all:

[Figure: LDA filter design. Block labels: Critical Band Power, Log, LDA filter, Scale + Smooth, LPC Cepstra; Critical Band Power, Log, RASTA Filter; train speech, test speech. LDA filter plots against seconds and against Hz (dB) for clean, light and heavy conditions.]

[Figure: LDA-RASTA-PLP: Combining CLEAN and REVERB. Panels: Frame Accuracy (%) and Word Error Rate (WER%) against clean/reverb weighting (0.1 to 0.9), with T60 from 0.25 to 2.5 (mild to severe).]

SLIDE 8

‘Oracle nets’ for FC multistream (Barry Chen)

  • 4 bands → 15 combinations (+priors): (smoothed) ‘oracle’ choice halves WER

  • Can we train a net to make ‘oracle’ choice?
  • based on KL distance between posteriors?
  • Not much help in practice...

System                            Word Error Rate
best net (4 band)                 5.1%
phone-smoothed oracle             2.7%
KL oracle-net weighted streams    4.9%
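
The count of 15 combinations is the number of non-empty subsets of 4 bands (2^4 - 1), and the KL distance mentioned above compares two posterior distributions over phone classes. The sketch below illustrates both; the band names and function names are hypothetical, and how the KL values were actually used to weight streams is not specified on the slide.

```python
import math
from itertools import combinations

BANDS = ("band1", "band2", "band3", "band4")  # hypothetical band names


def band_subsets(bands):
    """All non-empty subsets of the bands: one candidate net per subset."""
    return [subset
            for size in range(1, len(bands) + 1)
            for subset in combinations(bands, size)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats between two posterior vectors over the same classes.

    eps guards against log(0); it is an implementation convenience.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


subsets = band_subsets(BANDS)
print(len(subsets))  # 15 non-empty subsets of 4 bands

# Two example posterior vectors (made-up numbers, for illustration only).
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))
```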

SLIDE 9

Other projects: Meeting recorder

  • ASR in conventional meeting environments
  • for transcription/summarization/retrieval
  • distant acoustics!
  • informal, overlapped speech (c/w ShATR)
  • Data collection:
  • wired room at ICSI
  • other systems at UW ...

SLIDE 10

Meeting Recorder (cont’d)

  • Preliminary analysis
  • transcription & forced alignment (IBM)
  • ground truth in turns/overlaps
  • preliminary distant-mic recordings
  • Research areas
  • meeting dialog: overlaps, turns etc.
  • language modeling for meetings
  • feature design for distant acoustics
  • Future support
  • DARPA ‘ROAR’ program?
SLIDE 11

LabROSA: The Laboratory for Recognition and Organization of Speech and Audio

  • New research group at Columbia University in the City of New York

  • existing EE dept. signal processing group
  • addition of speech/audio for true multimedia
  • Research: extracting information from sound
  • real-world ASR
  • higher-order: speaker ID, dialog structure
  • nonspeech: events, acoustic environment ID
  • Recruiting students

http://www.ctr.columbia.edu/~dpwe/LabROSA/

SLIDE 12

CRAC2001: “Consistent and Reliable Acoustic Cues for speech and sound analysis”

http://www.ee.columbia.edu/CRAC2001/

  • RESPITE Contractual Obligation Workshop:
  • Identifying sources/info (CASA, BSS, SNR est)
  • Robust ASR (MD, MS, compensation)
  • Nonspeech, music applications
  • Psychoacoustics of perception in noise
  • Combinations
  • Satellite event at Eurospeech-2001, Aalborg
  • held on Sunday 2001-09-02 (day before) at Eurospeech location

  • separate registration
  • Workshop structure
  • lecture + posters, am + pm, discussion
  • limit to ~ 40 participants
SLIDE 13

CRAC2001 (cont’d)

  • Organizing committee
  • Dan & Martin, co-chairs
  • Fred, Phil, Andy
  • Andrzej Drygajlo (EPFL) & H. Okuno (CASA)
  • Timetable:
  • CFP: imminent
  • Abstracts: April 30th, 2001
  • Actions:
  • help with publicity
  • plan your submission!