
Microphone Array Processing: A Quick Update (Iain McCowan)



  1. Microphone Array Processing: A Quick Update. Iain McCowan; Guillaume Lathoud, Darren Moore, Olivier Masson. TAM, May 2003.

  2. Outline
     • Speech enhancement
     • Speaker segmentation
     • Files available online

  3. Speech enhancement
     • Improving enhancement in overlapping speech by post-filtering the beamformer outputs.
     • Beamformer outputs: $y_n(f)$ for each speaker location $n = 1, \dots, N$.
     • Post-filter A (Wiener-like, $|S|^2 / |N|^2$):
       $$\hat{y}_n(f) = \frac{|y_n(f)|^2}{\frac{1}{N-1} \sum_{m \neq n} |y_m(f)|^2} \, y_n(f) \qquad (1)$$
     • Post-filter B (binary mask):
       $$\hat{y}_n(f) = \begin{cases} y_n(f) & n = \arg\max_m y_m(f) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
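The two post-filters (1) and (2) can be illustrated for a single frequency bin with a minimal sketch. It is not the authors' implementation: the function names and the small constant `EPS` are hypothetical, and the arg max in (2) is assumed to compare magnitudes, since the beamformer outputs are complex-valued.

```python
from typing import List, Sequence

EPS = 1e-12  # assumed regulariser against division by zero (not in the slides)


def post_filter_a(y: Sequence[complex]) -> List[complex]:
    """Wiener-like post-filter (1) for one frequency bin.

    y[n] is the beamformer output for speaker location n at this bin.
    """
    n_beams = len(y)
    if n_beams < 2:
        raise ValueError("need at least two beamformer outputs")
    power = [abs(v) ** 2 for v in y]
    total = sum(power)
    out = []
    for n, v in enumerate(y):
        # Mean power of the competing beams serves as the noise estimate.
        noise = (total - power[n]) / (n_beams - 1)
        out.append(power[n] / (noise + EPS) * v)
    return out


def post_filter_b(y: Sequence[complex]) -> List[complex]:
    """Binary-mask post-filter (2) for one frequency bin.

    Assumption: the arg max compares magnitudes |y_m(f)|.
    """
    winner = max(range(len(y)), key=lambda m: abs(y[m]))
    return [v if n == winner else 0j for n, v in enumerate(y)]
```

In practice each function would be applied to every frequency bin of every frame. Note that the gain in (1), as written, is a power ratio and is not bounded by one.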

  4. Speech enhancement
     • Subjectively, post-filter B leads to a significant reduction in cross-talk level.
     • To verify, initial recognition experiments:
       • MONC (Multi-channel Overlapping Numbers Corpus, a re-recording of Numbers 95). Note: the baseline lapel with no conflicting speech gives 7.0% WER.
       • Word error rates (%):

         Overlapping speakers   Lapel   Previous array best   Post-filter B
         One                    26.7    19.3                  12.2
         Two                    35.3    26.6                  15.8
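To put the word error rates in perspective, the relative reduction between two systems can be computed as below. This is only arithmetic on the figures from the slide; the function and dictionary key names are hypothetical.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, going from baseline_wer to new_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Word error rates (%) from the slide, keyed by number of overlapping speakers.
WER = {
    1: {"lapel": 26.7, "previous_array_best": 19.3, "post_filter_b": 12.2},
    2: {"lapel": 35.3, "previous_array_best": 26.6, "post_filter_b": 15.8},
}

for n_overlap, row in WER.items():
    gain = relative_reduction(row["previous_array_best"], row["post_filter_b"])
    print(f"{n_overlap} overlapping speaker(s): {gain:.1f}% relative reduction")
```

On these numbers, post-filter B reduces the error rate relative to the previous best array result by roughly 37% with one overlapping speaker and roughly 41% with two.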

  5. Speaker segmentation
     • Previously presented work on segmenting using location features.
     • Since then:
       • Now doing clustering and segmentation using both location features and standard acoustic features across meetings.
       • Segmenting in terms of location and identity (cluster index) concurrently.
       • Using a multi-stream HMM to cluster in each space independently while enforcing the same temporal segmentation.
       • Automatically converges to the correct number of locations and identities.
       • Initial results show high segmentation accuracy (≈ 95% frame accuracy).
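The slide does not say how frame accuracy is scored. A minimal sketch of one plausible reading, assuming each frame carries a (location, identity) label and that hypothesis cluster indices have already been mapped onto the reference labels, is:

```python
from typing import Hashable, Sequence


def frame_accuracy(reference: Sequence[Hashable],
                   hypothesis: Sequence[Hashable]) -> float:
    """Fraction of frames whose hypothesised label equals the reference label.

    Assumption: labels are directly comparable, i.e. cluster indices have
    already been mapped to reference locations and identities.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not reference:
        raise ValueError("no frames to score")
    correct = sum(r == h for r, h in zip(reference, hypothesis))
    return correct / len(reference)


# Hypothetical five-frame example with (location, identity) labels.
ref = [("seat1", "A")] * 3 + [("seat2", "B")] * 2
hyp = [("seat1", "A")] * 2 + [("seat2", "B")] * 3
print(frame_accuracy(ref, hyp))
```

Scoring the pair as a single label means a frame counts as correct only when both location and identity match, which is one way to reflect the shared temporal segmentation.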

  6. Files available online
     • Now appearing on mmm.idiap.ch.
     • Beamformer outputs for Post-filter A and B for each seated speaker location (1-4); Scripted Meeting set only.
     • Beamformer-B files have lower noise, though perhaps more distortion, than Beamformer-A.
     • Beamformer outputs for the whiteboard and presentation locations are not yet available.
       • Current beamformers are too precise for the typical movement in these regions; investigating a minimum beam-width constraint or adaptive techniques.
     • Beamformer-B mix file available (BeamB-mix): a simple sum of the 4 speaker beamformers.
       • Remember, this does not yet cater for whiteboard or presentation speech.
       • Currently, a low-level buzz is apparent in this mix file; to be fixed.
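The "simple sum" used for the mix file can be sketched as a sample-wise addition of the per-speaker beamformer signals. This is an illustration only: the function name is hypothetical, the signals are assumed to be time-aligned and of equal length, and no scaling or clipping protection is applied because the slide mentions none.

```python
from typing import List, Sequence


def mix_beamformers(channels: Sequence[Sequence[float]]) -> List[float]:
    """Sample-wise sum of time-aligned beamformer output signals."""
    lengths = {len(c) for c in channels}
    if len(lengths) != 1:
        raise ValueError("need at least one channel, all of equal length")
    return [sum(samples) for samples in zip(*channels)]


# Hypothetical three-sample signals for the four seated speaker locations.
beams = [
    [0.5, 0.0, 0.0],
    [0.0, 0.25, 0.0],
    [0.0, 0.0, -0.5],
    [0.25, 0.25, 0.25],
]
print(mix_beamformers(beams))
```

Because a plain sum can exceed the range of a single channel, a real mix might need rescaling before being written to a fixed-point audio file.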
