Meeting Recorder: Audio Processing, by Dan Ellis


  1. Meeting Recorder: Audio Processing
     Dan Ellis <dpwe@ee.columbia.edu>
     LabROSA, Columbia University and ICSI, Berkeley
     Outline
       1 ICSI Meeting Recorder
       2 Close-mics: cancellation & turn estimation
       3 Tabletop mics: turns & speaker location
       4 Visualization tools
       5 Future Work
     Meeting Audio - Dan Ellis 2002-08-29 - 1/11

  2. ICSI Meeting Recorder data (with UW, SRI, IBM, Columbia)
     • Microphones in conventional meetings
       - for summarization/retrieval/behavior analysis
       - informal, overlapped speech
     • Data collection (ICSI, UW, ...):
       - 100 hours collected, ongoing transcription
     • NSF ‘Mapping Meetings’ project
       - also interest from NIST, DARPA

  3. Data from the ICSI project
     [Figure: ICSI Meeting Recorder Room Audio Setup, 2000-05-05. Labeled components: wireless headsets and a lapel mic (TX1-TX5) with wireless RX (5 mics); ambient mics PZM1-PZM4; dummy PDA; computer headsets via Jimlets and the JimBox (PSU & breakout); Mackie mixer (Main L/R, Aux 1/2); A/D 1 and A/D 2; ADAT lightpipe; STUDI/O PCI card; audio PC.]
     Notes: 1. The JimBox and the Jimlets are the custom electronics manufactured at ICSI to interface PC-style headsets to pro-audio XLR.
     • 16 channels @ 16 kHz, 16 bit
     • Preprocessing
       - high-pass filter!
       - 64 sample skew!
     [Figure: Avg spec, 20s, mr-2000-11-02-1440-chanE (pzm); level / dB vs. freq / Hz (log axis)]

  4. Close-mic channels
     [Figure: mr-2000-06-30-1600; level/dB vs. time / secs (120-155 s) for Spkr A-E and the tabletop channel. Annotations: backchannel; floor seizure; (signals desire to regain floor?); speaker active; speaker B cedes floor; interruptions; breath noise; crosstalk]
     • Crosstalk
     • Speaker activity detection

  5. Impulse response coupling
     • Cross-correlation recovers impulse response
       [Figure: Example cross coupling response, chan3 to chan0; amplitude and E / dB (8 pt hann) vs. time / s]
     • Coupling to each mic gives motion (participant movement)
       [Figure: mr-2000-11-02-1440; panels: Spkr C (freq / chan), Coupling C → A (delay / samples), Coupling C → Tabletop (delay / samples), all vs. time / sec]
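As a rough illustration of how cross-correlation can recover a coupling impulse response (slide 5), the sketch below correlates a source channel with a second channel at a range of delays. It assumes the source is approximately white, which real speech is not, so a practical version would need whitening or averaging over time. The function name `estimate_coupling`, the synthetic response `true_h`, and all parameter values are hypothetical and not taken from the slides.

```python
import numpy as np

def estimate_coupling(src, dst, max_lag):
    """Estimate the src -> dst impulse response by cross-correlation.

    Assumption: src is roughly white, so the correlation of src[n] with
    dst[n + k], divided by the energy of src, approximates h[k].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    energy = np.dot(src, src)
    h = np.empty(max_lag)
    for k in range(max_lag):
        # Correlate src with dst taken k samples later.
        h[k] = np.dot(src[: n - k], dst[k:n]) / energy
    return h

# Synthetic check: white-noise source leaking into another channel
# through a made-up two-tap response.
rng = np.random.default_rng(0)
src = rng.standard_normal(20000)
true_h = np.zeros(32)
true_h[5] = 0.02
true_h[12] = -0.01
dst = np.convolve(src, true_h)[: len(src)]

est_h = estimate_coupling(src, dst, max_lag=32)
print("strongest tap at lag", int(np.argmax(np.abs(est_h))))
```

Tracking how the lag of the strongest tap drifts over successive analysis windows is one plausible way to read the "participant movement" plots, though the slides do not spell out the procedure.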

  6. Speaker Activity Detection (with Sam Keene)
     • Noisy crosstalk model: $m = C \cdot s + n$
     • Estimate subband $C_{xA}$ from A’s peak energy
       - i.e. ‘sparsity’ assumption
       - ... then linear inversion to recover speaker act.
     • 20 subband crosstalk gains for each spkr x mic
     [Figure: mr-2000-06-30-1600; panels chan0, chanB, chanA (frq chan vs. time/s, 125-160 s) with crosstalk gain panels 0B, 0A, B0, BA, A0, AB]
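The crosstalk model on slide 6 can be sketched for a single subband as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each speaker is assumed to have one close mic, the frames where a speaker's own mic is loudest are assumed to contain that speaker alone (the 'sparsity' assumption), and the inversion is an ordinary least-squares solve. The names `estimate_coupling_gains`, `recover_activity`, `top_frac`, the activity threshold, and the synthetic gains in `C_true` are all hypothetical.

```python
import numpy as np

def estimate_coupling_gains(mic_energy, top_frac=0.2):
    """Estimate C (mics x speakers) for one subband.

    mic_energy has shape (frames, mics), with mic a worn by speaker a.
    """
    frames, mics = mic_energy.shape
    n_top = max(1, int(top_frac * frames))
    C = np.eye(mics)
    for a in range(mics):
        # Frames where mic a is loudest: assumed to be speaker a alone.
        top = np.argsort(mic_energy[:, a])[-n_top:]
        ratios = mic_energy[top] / mic_energy[top, a][:, None]
        # The median is robust to the few frames with overlapped speech.
        C[:, a] = np.median(ratios, axis=0)
    return C

def recover_activity(mic_energy, C):
    """Invert m = C s for every frame (least squares), clipped at zero."""
    s, *_ = np.linalg.lstsq(C, mic_energy.T, rcond=None)
    return np.clip(s.T, 0.0, None)

# Synthetic data: three speakers taking turns, with short overlaps.
rng = np.random.default_rng(1)
frames = 300
active_true = np.zeros((frames, 3), dtype=bool)
active_true[0:110, 0] = True
active_true[100:210, 1] = True
active_true[200:300, 2] = True
s_true = active_true * rng.uniform(0.5, 1.5, size=(frames, 3))
C_true = np.array([[1.00, 0.10, 0.05],
                   [0.15, 1.00, 0.08],
                   [0.05, 0.12, 1.00]])
noise = 0.001 * np.abs(rng.standard_normal((frames, 3)))
mic_energy = s_true @ C_true.T + noise

C_est = estimate_coupling_gains(mic_energy)
s_est = recover_activity(mic_energy, C_est)
active_est = s_est > 0.25  # hypothetical activity threshold
```

The slide mentions 20 subband gains per speaker and mic, so in practice a routine like this would presumably run once per subband, with the per-band results combined into a single activity decision.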

  7. Tabletop mics: Turn detection
     • 4 mics ~ 1m separated along center of table
       - 3 timing differences
       - slight L/R offset to disambiguate
       [Figure: plan view of the mic layout; axes in metres]
     • Hi-res cross-correlation for timings
       - use normalized peak value for confidence
       - cluster results
     [Figure: mr-2000-11-02-1440: PZM xcorr lags; lag 3-4 / ms vs. lag 1-2 / ms, with clusters numbered 1-4, and 100xR skew/samps vs. time / s (0-300 s)]
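The timing step on slide 7 might look roughly like the sketch below: a normalized cross-correlation between two tabletop channels, with the peak location giving the lag and the peak height serving as a confidence score. The slides do not say how the "hi-res" lag is obtained, so the parabolic interpolation around the peak is an assumption, as are the function name `xcorr_lag` and the synthetic test signal. The 16 kHz rate used for the conversion to milliseconds is the sampling rate stated on slide 3.

```python
import numpy as np

def xcorr_lag(a, b, max_lag):
    """Return (lag, confidence) for two equal-length channels.

    lag is the delay of b relative to a in samples (fractional);
    confidence is the normalized correlation value at the peak.
    """
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    n = len(a)
    norm = np.sqrt(np.dot(a, a) * np.dot(b, b))
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            r[i] = np.dot(a[: n - k], b[k:])
        else:
            r[i] = np.dot(a[-k:], b[: n + k])
    r /= norm
    i = int(np.argmax(r))
    lag = float(lags[i])
    if 0 < i < len(r) - 1:
        # Parabolic interpolation through the peak and its neighbours
        # (an assumed way of getting sub-sample resolution).
        y0, y1, y2 = r[i - 1], r[i], r[i + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            lag += 0.5 * (y0 - y2) / denom
    return lag, float(r[i])

# Synthetic check: channel b is channel a delayed by 7 samples,
# with independent noise added to each channel.
rng = np.random.default_rng(2)
x = rng.standard_normal(4010)
a = x[10:4010] + 0.3 * rng.standard_normal(4000)
b = x[3:4003] + 0.3 * rng.standard_normal(4000)

lag, conf = xcorr_lag(a, b, max_lag=20)
lag_ms = 1000.0 * lag / 16000.0  # 16 kHz sampling
print(f"lag = {lag:.2f} samples ({lag_ms:.3f} ms), confidence = {conf:.2f}")
```

Low-confidence frames could then be discarded before clustering the pairwise lags, which is one reading of the "cluster results" bullet.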

  8. Speaker localization (with Huan Wei Hee)
     • Timing differences → speaker positions (x,y,z)
       - gradient descent on implied ∆ts
     [Figure: Inferred talker positions (x = mic); axes x / m, y / m, z / m]
     • Ambiguity:
       - mic positions not fixed
       - speaker motions
     • Iterative estimation of speaker, mic locations
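To make "gradient descent on implied ∆ts" (slide 8) concrete, the sketch below fits a talker position to three timing differences measured against a reference mic. It assumes the mic positions are known, which the slide explicitly flags as not guaranteed, and it works in path-length differences (343 m/s times ∆t, using the usual room-temperature speed of sound) to keep the scale in metres. The mic layout, talker position, starting point, step size, and function names are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def loss_and_grad(p, mics, delta_d):
    """Squared error between implied and observed path differences.

    delta_d[i] is the observed (distance to mic i+1) - (distance to mic 0).
    """
    diff = p - mics                       # (M, 3)
    dist = np.linalg.norm(diff, axis=1)   # (M,)
    unit = diff / dist[:, None]
    resid = (dist[1:] - dist[0]) - delta_d
    grad = 2.0 * np.sum(resid[:, None] * (unit[1:] - unit[0]), axis=0)
    return float(np.sum(resid ** 2)), grad

def locate(mics, delta_d, start, iters=2000, step=0.05):
    """Gradient descent with step halving so the loss never increases."""
    p = np.array(start, dtype=float)
    for _ in range(iters):
        loss, grad = loss_and_grad(p, mics, delta_d)
        t = step
        while t > 1e-8:
            new_loss, _ = loss_and_grad(p - t * grad, mics, delta_d)
            if new_loss < loss:
                break
            t *= 0.5
        else:
            break  # no improving step found: treat as converged
        p = p - t * grad
    return p

# Hypothetical layout: 4 mics about 1 m apart along the table centre,
# with a slight left/right offset.
mics = np.array([[-1.5, 0.05, 0.0],
                 [-0.5, -0.05, 0.0],
                 [0.5, 0.05, 0.0],
                 [1.5, -0.05, 0.0]])
talker = np.array([0.8, -1.0, 0.4])
true_dist = np.linalg.norm(talker - mics, axis=1)
delta_t = (true_dist[1:] - true_dist[0]) / SPEED_OF_SOUND  # 3 timing differences
delta_d = SPEED_OF_SOUND * delta_t

start = [0.0, -1.0, 0.5]
initial_loss, _ = loss_and_grad(np.array(start), mics, delta_d)
estimate = locate(mics, delta_d, start)
final_loss, _ = loss_and_grad(estimate, mics, delta_d)
print("estimated position:", np.round(estimate, 2))
```

Because the mics are nearly collinear, positions rotated about the table's long axis produce almost the same timing differences, so a small final loss does not by itself guarantee the right (y, z). That is consistent with the ambiguity the slide lists, and an iterative scheme that also updates mic positions would wrap a loop like this one.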

  9. Visualization: transPlotter
     • Speaker turn patterns are informative
     • Browser for ‘high-level’ view, quick examination
       - snack, iwidgets based
       - public release

  10. Meeting IR tool
     • IR on (ASR) transcripts from meetings
       - repurposed from Thisl project

  11. Future work
     • Speaker turns
       - evaluation of close-mic system
       - speaker characteristics for tabletop mics
     • Nonspeech events
       - unsupervised clustering of audio
       - finding the feature space...
     • Speech fragment recognition
       - missing-data recognition based on ‘good’ signal
       - recognition of overlapping voices
     • High-level browsing
       - the ‘meeting map’ concept
       - summarization
