SLIDE 1

Microphone Array Processing : A Quick Update

Iain McCowan Guillaume Lathoud, Darren Moore, Olivier Masson

TAM May 2003 – p. 1/6

SLIDE 2

Outline

  • Speech enhancement
  • Speaker segmentation
  • Files available online

SLIDE 3

Speech enhancement

  • Improving enhancement in overlapping speech by post-filtering beamformer outputs

  • Beamformer outputs: $y_n(f)$ for each speaker location $n = 1, \dots, N$
  • Post-filter A (Wiener-like, $|S|^2 / |N|^2$):

$$\hat{y}_n(f) = \frac{|y_n(f)|^2}{\frac{1}{N-1} \sum_{m \neq n} |y_m(f)|^2} \, y_n(f) \qquad (1)$$
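
Post-filter A in equation (1) can be sketched as below. This is only an illustration, not the authors' implementation: the function name `post_filter_a`, the list-of-lists layout for the beamformer spectra, and the small `eps` guard against division by zero are all assumptions that do not appear on the slides.

```python
def post_filter_a(beams, eps=1e-12):
    """Apply the Wiener-like gain of Eq. (1) to every beamformer output.

    beams[n][k] is the complex value y_n(f_k) of beam n at frequency bin k.
    eps is an assumed guard against division by zero (not in the slides).
    """
    n_beams = len(beams)
    n_bins = len(beams[0])
    out = [[0j] * n_bins for _ in range(n_beams)]
    for k in range(n_bins):
        power = [abs(beams[n][k]) ** 2 for n in range(n_beams)]  # |y_n(f)|^2
        total = sum(power)
        for n in range(n_beams):
            # Mean power of the other N-1 beams, used as the |N|^2 estimate.
            interference = (total - power[n]) / (n_beams - 1)
            gain = power[n] / (interference + eps)
            out[n][k] = gain * beams[n][k]
    return out
```

The gain is unbounded, so a bin in which one beam clearly dominates is amplified rather than merely preserved; a practical system would probably limit or normalise it.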

  • Post-filter B (Binary Mask):

$$\hat{y}_n(f) = \begin{cases} y_n(f) & n = \arg\max_m y_m(f) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
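
The binary mask of equation (2) can be sketched as follows. The function name `post_filter_b` and the data layout are hypothetical, and the sketch assumes the arg max compares magnitudes $|y_m(f)|$, since the beamformer outputs are complex; the slide does not say so explicitly.

```python
def post_filter_b(beams):
    """Keep each frequency bin only in the beam where it is strongest (Eq. 2).

    beams[n][k] is the complex value y_n(f_k) of beam n at frequency bin k.
    Assumption: the arg max compares magnitudes |y_m(f)|.
    """
    n_beams = len(beams)
    n_bins = len(beams[0])
    out = [[0j] * n_bins for _ in range(n_beams)]  # every bin starts masked
    for k in range(n_bins):
        winner = max(range(n_beams), key=lambda m: abs(beams[m][k]))
        out[winner][k] = beams[winner][k]  # only the winning beam keeps the bin
    return out
```

Exactly one beam keeps each bin. When magnitudes tie, this sketch gives the bin to the lowest-indexed beam, which is an arbitrary choice.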

SLIDE 4

Speech enhancement

  • Subjectively, post-filter B leads to significant reduction in cross-talk level.

  • To verify, initial recognition experiments:
  • MONC (Multi-channel Overlapping Numbers Corpus, a re-recording of Numbers 95). Note: baseline lapel with no conflicting speech is 7.0% WER.

  • Word error rates (%) with overlapping speakers:

    Overlapping speakers   Lapel   Previous Array Best   Post-filter B
    One                    26.7    19.3                  12.2
    Two                    35.3    26.6                  15.8
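
To put these figures in perspective, the relative WER reduction of post-filter B over the previous best array result can be computed directly from the numbers above. The helper name `relative_reduction` and the dictionary layout are illustrative only.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent, going from baseline_wer to new_wer."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# WER values (%) copied from the slide.
wer = {
    "one overlapping speaker": {"lapel": 26.7, "previous_array_best": 19.3, "post_filter_b": 12.2},
    "two overlapping speakers": {"lapel": 35.3, "previous_array_best": 26.6, "post_filter_b": 15.8},
}

for condition, row in wer.items():
    gain = relative_reduction(row["previous_array_best"], row["post_filter_b"])
    print(f"{condition}: {gain:.1f}% relative reduction vs. previous array best")
```

This works out to roughly 37% with one overlapping speaker and 41% with two.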

SLIDE 5

Speaker Segmentation

  • Previously, presented work on segmenting using location features.

  • Since then...
  • Now doing clustering and segmentation using both location features and standard acoustic features across meetings.
  • Segment in terms of location and identity (cluster index) concurrently.
  • Using a multi-stream HMM to cluster in each space independently, while enforcing the same temporal segmentation.
  • Automatically converges to the correct number of locations and identities.
  • Initial results show high segmentation accuracy (≈ 95% frame accuracy).
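
For reference, frame accuracy is read here as the fraction of frames whose hypothesised label matches the reference label. The sketch below assumes that cluster indices have already been mapped onto reference labels; the slides do not describe how that mapping was done, and the function name `frame_accuracy` is hypothetical.

```python
def frame_accuracy(reference, hypothesis):
    """Fraction of frames whose hypothesised label equals the reference label.

    Assumes both sequences hold one label per frame and that cluster
    indices have already been mapped onto the reference labels.
    """
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(r == h for r, h in zip(reference, hypothesis))
    return correct / len(reference)
```

For example, `frame_accuracy([1, 1, 2, 2], [1, 1, 2, 1])` returns `0.75`.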

SLIDE 6

Files available online

  • Now appearing on mmm.idiap.ch
  • Beamformer outputs for Post-filter A and B for each seated speaker location (1-4). (Scripted Meeting set only).
  • Beamformer-B files have lower noise, though perhaps more distortion than Beamformer-A.
  • Beamformer outputs for whiteboard and presentation not yet available.
      • Current beamformers are too precise for the typical movement in these regions; investigating a minimum beam-width constraint or adaptive techniques.

  • Beamformer-B mix file available (BeamB-mix): simple sum of 4 speaker beamformers.
      • Remember, this does not yet cater for whiteboard or presentation speech.
      • Currently, a low-level buzz is apparent in this mix file; to be fixed.
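
The "simple sum" behind BeamB-mix can be pictured as a sample-by-sample addition of the four speaker beamformer signals. The sketch below is illustrative only: the function name `mix_beams` is hypothetical, and it assumes the signals are time-aligned, equal-length sample sequences. No scaling or clipping protection is applied.

```python
def mix_beams(beams):
    """Sum time-aligned beamformer signals sample by sample.

    beams is a list of equal-length sample sequences, one per speaker
    location. Assumption: no normalisation is applied after summing.
    """
    return [sum(samples) for samples in zip(*beams)]
```

Because `zip` stops at the shortest sequence, inputs of unequal length would be silently truncated, so lengths should be checked beforehand in real use.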
