Microphone Array Processing : A Quick Update - Iain McCowan

SLIDE 1

Microphone Array Processing : A Quick Update

Iain McCowan Guillaume Lathoud, Darren Moore, Olivier Masson

TAM May 2003 – p. 1/6

SLIDE 2

Outline

Speech enhancement
Speaker segmentation
Files available online


SLIDE 3

Speech enhancement

Improving enhancement in overlapping speech by post-filtering beamformer outputs

Beamformer outputs : $y_n(f)$ for each speaker location $n = 1, \dots, N$

Post-filter A (Wiener-like, $|S|^2 / |N|^2$)

\hat{y}_n(f) = \frac{|y_n(f)|^2}{\frac{1}{N-1} \sum_{m \neq n} |y_m(f)|^2} \, y_n(f)    (1)

Post-filter B (Binary Mask)

\hat{y}_n(f) = \begin{cases} y_n(f) & n = \arg\max_m y_m(f) \\ 0 & \text{otherwise} \end{cases}    (2)
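The two post-filters can be sketched per frame as below. This is only an illustrative reading of equations (1) and (2), not the authors' implementation: the function names, the nested-list layout `y[n][f]`, and the small `eps` guard are assumptions, and the arg max in (2) is assumed to be taken over magnitudes.

```python
def post_filter_a(y, eps=1e-12):
    """Wiener-like post-filter, Eq. (1).

    y[n][f] is the complex beamformer output for speaker location n at
    frequency bin f (one frame). Requires at least two locations.
    """
    n_beams, n_bins = len(y), len(y[0])
    out = [[0j] * n_bins for _ in range(n_beams)]
    for f in range(n_bins):
        power = [abs(y[n][f]) ** 2 for n in range(n_beams)]
        total = sum(power)
        for n in range(n_beams):
            # Mean power of the other beams: 1/(N-1) * sum over m != n.
            others = (total - power[n]) / (n_beams - 1)
            # eps is an added assumption that guards against division by zero.
            out[n][f] = power[n] / (others + eps) * y[n][f]
    return out


def post_filter_b(y):
    """Binary-mask post-filter, Eq. (2): keep the strongest beam per bin."""
    n_beams, n_bins = len(y), len(y[0])
    out = [[0j] * n_bins for _ in range(n_beams)]
    for f in range(n_bins):
        # Assumption: the arg max is taken over magnitudes |y_m(f)|.
        winner = max(range(n_beams), key=lambda m: abs(y[m][f]))
        out[winner][f] = y[winner][f]
    return out
```

Note that the gain in (1) is applied as written, so it is not limited to 1; whether the original system clipped it is not stated on the slide.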


SLIDE 4

Speech enhancement

Subjectively, post-filter B leads to a significant reduction in cross-talk level.

To verify, initial recognition experiments:
MONC (Multi-channel Overlapping Numbers Corpus, a re-recording of Numbers 95). Note : baseline lapel with no conflicting speech is 7.0% WER.

Word error rates (%) with overlapping speech :

Overlapping speakers   Lapel   Previous Array Best   Post-filter B
One                    26.7    19.3                  12.2
Two                    35.3    26.6                  15.8
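To put these figures in perspective, the relative WER reduction of Post-filter B can be computed from the numbers above. The dictionary keys and the helper name are hypothetical labels chosen for this sketch; only the WER values come from the slide.

```python
# Word error rates (%) as reported on the slide.
wer = {
    "one overlapping speaker": {
        "lapel": 26.7, "previous_array_best": 19.3, "post_filter_b": 12.2,
    },
    "two overlapping speakers": {
        "lapel": 35.3, "previous_array_best": 26.6, "post_filter_b": 15.8,
    },
}


def relative_reduction(baseline, improved):
    """Relative WER reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline


for condition, r in wer.items():
    vs_lapel = relative_reduction(r["lapel"], r["post_filter_b"])
    vs_array = relative_reduction(r["previous_array_best"], r["post_filter_b"])
    print(f"{condition}: {vs_lapel:.1f}% vs lapel, "
          f"{vs_array:.1f}% vs previous array best")
```

In both conditions this works out to a reduction of a little over half relative to the lapel microphone, and more than a third relative to the previous best array result.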


SLIDE 5

Speaker Segmentation

Previously, presented work on segmenting using location features.

Since then...
Now doing clustering and segmentation using both location features and standard acoustic features across meetings.

Segment in terms of location and identity (cluster index) concurrently.

Using multi-stream HMM to cluster in each space independently, but enforce same temporal segmentation.

Automatically converges to correct number of locations and identities.

Initial results show high segmentation accuracy (≈ 95% frame accuracy).
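One way to picture "independent spaces, shared segmentation" is a single Viterbi path over joint (location, identity) states, where each stream scores only its own half of the state. The sketch below is an illustration under strong assumptions, not the system described on the slide: it uses 1-D features, fixed Gaussian means, a fixed number of clusters, and a hypothetical `joint_viterbi` helper, and it omits the clustering and training steps entirely.

```python
import math
from itertools import product


def gauss_loglik(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def joint_viterbi(loc_obs, ac_obs, loc_means, id_means,
                  var=1.0, stay_logp=math.log(0.9)):
    """Decode one shared path over (location, identity) state pairs.

    Each stream scores only its own part of the joint state, so the two
    spaces are modelled separately, but the single path forces both
    streams to share the same segment boundaries.
    Needs at least two joint states.
    """
    states = list(product(range(len(loc_means)), range(len(id_means))))
    # Remaining probability mass is spread evenly over all other states.
    switch_logp = math.log((1 - math.exp(stay_logp)) / (len(states) - 1))

    def emit(t, state):
        loc, spk = state
        return (gauss_loglik(loc_obs[t], loc_means[loc], var)
                + gauss_loglik(ac_obs[t], id_means[spk], var))

    def trans(prev, cur):
        return stay_logp if prev == cur else switch_logp

    score = {s: emit(0, s) for s in states}  # uniform prior, constant dropped
    back = []
    for t in range(1, len(loc_obs)):
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[best_prev] + trans(best_prev, s) + emit(t, s)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

For example, `joint_viterbi(loc_obs, ac_obs, loc_means=[0.0, 5.0], id_means=[1.0, -1.0])` returns one `(location, identity)` pair per frame; a segment boundary is any frame where that pair changes.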


SLIDE 6

Files available online

Now appearing on mmm.idiap.ch

Beamformer outputs for Post-filter A and B for each seated speaker location (1-4). (Scripted Meeting set only).

Beamformer-B files have lower noise, though perhaps more distortion than Beamformer-A.

Beamformer outputs for whiteboard and presentation not yet available.
Current beamformers are too precise for the typical movement in these regions; investigating a minimum beam-width constraint or adaptive techniques.

Beamformer-B mix file available (BeamB-mix) - simple sum of 4 speaker beamformers.
Remember, this does not yet cater for whiteboard or presentation speech.
Currently, a low-level buzz is apparent in this mix file... to be fixed.
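The "simple sum" behind the mix file amounts to adding the beamformer signals sample by sample. A minimal sketch, assuming the four signals are time-aligned and of equal length; `mix_beams` is a hypothetical helper, not a script from the site, and no level normalisation is applied.

```python
def mix_beams(beams):
    """Sum time-aligned beamformer signals sample by sample.

    beams: list of equal-length sample sequences, one per speaker location.
    """
    if len({len(b) for b in beams}) != 1:
        raise ValueError("all beamformer signals must have the same length")
    return [sum(samples) for samples in zip(*beams)]
```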
