Meeting Recorder: Audio Processing, Dan Ellis (PowerPoint presentation)

SLIDE 1

Meeting Audio - Dan Ellis 2002-08-29 - 1/11

Meeting Recorder: Audio Processing

Dan Ellis <dpwe@ee.columbia.edu> LabROSA, Columbia University and ICSI, Berkeley

Outline
  1. ICSI Meeting Recorder
  2. Close-mics: cancellation & turn estimation
  3. Tabletop mics: turns & speaker location
  4. Visualization tools
  5. Future work

SLIDE 2

ICSI Meeting Recorder data

(with UW, SRI, IBM, Columbia)

  • Microphones in conventional meetings
    • for summarization/retrieval/behavior analysis
    • informal, overlapped speech
  • Data collection (ICSI, UW, ...):
    • 100 hours collected, ongoing transcription
  • NSF ‘Mapping Meetings’ project
    • also interest from NIST, DARPA

SLIDE 3

Data from the ICSI project

  • 16 channels @ 16 kHz, 16 bit
  • Preprocessing
    • high-pass filter!
    • 64-sample skew!

[Figure: ICSI Meeting Recorder Room Audio Setup, 2000-05-05. Components shown: audio PC with STUDI/O PCI card; A/D 1 and A/D 2 (ADAT lightpipe); wireless RX and transmitters TX1-TX5; lapel mic; wireless headsets; computer headsets via the JimBox (PSU & breakout) and Jimlets; ambient mics PZM1-PZM4; dummy PDA; Mackie mixer (Main L/R, Aux 1/2).]

Note: the JimBox and the Jimlets are custom electronics manufactured at ICSI to interface PC-style headsets to pro-audio XLR.

[Figure: Avg spec, 20 s, mr-2000-11-02-1440-chanE (PZM); level / dB vs. freq / Hz on a log axis from 10^1 to 10^4 Hz.]
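
The two preprocessing fixes listed above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the first-order DC-blocking filter, its pole r = 0.995 (cutoff roughly 13 Hz at 16 kHz), and the hypothetical helpers `dc_block` and `remove_skew` are assumptions; the slide only says that a high-pass filter and a 64-sample skew correction were needed.

```python
import numpy as np

FS = 16000   # sample rate from the slide (16 kHz, 16 bit)
SKEW = 64    # inter-channel skew from the slide, in samples

def dc_block(x, r=0.995):
    """First-order DC-blocking high-pass: y[n] = x[n] - x[n-1] + r*y[n-1].

    Assumed pole r = 0.995 gives a cutoff near (1 - r) * FS / (2*pi), about 13 Hz.
    The slide gives no cutoff; this value is illustrative only.
    """
    y = np.empty(len(x))
    prev_x = 0.0
    prev_y = 0.0
    for n, xn in enumerate(x):
        prev_y = xn - prev_x + r * prev_y
        prev_x = xn
        y[n] = prev_y
    return y

def remove_skew(x, skew=SKEW):
    """Shift x earlier by `skew` samples (skew > 0 means x lags), zero-filling the gap."""
    out = np.zeros(len(x))
    if skew >= 0:
        out[:len(x) - skew] = x[skew:]
    else:
        out[-skew:] = x[:len(x) + skew]
    return out
```

For example, `remove_skew(dc_block(chan), 64)` would high-pass one channel and advance it by 64 samples; which channels need shifting, and in which direction, is not stated on the slide.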

SLIDE 4

Close-mic channels

  • Crosstalk
  • Speaker activity detection

[Figure: mr-2000-06-30-1600, 120-155 s: level / dB on close-mic channels Spkr A to Spkr E and the table-top mic, with "speaker active" regions marked. Annotations: breath noise; crosstalk; backchannel (signals desire to regain floor?); floor seizure; interruptions; speaker B cedes floor.]
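
A naive activity detector thresholds each close-mic channel's own level, and this is exactly what crosstalk breaks: a loud neighbour raises the level on a silent wearer's mic. The sketch below computes per-frame levels in dB and applies a fixed threshold; the frame sizes (25 ms frames, 10 ms hop at 16 kHz), the threshold, and the function names are illustrative assumptions.

```python
import numpy as np

def frame_level_db(x, frame=400, hop=160, floor_db=-100.0):
    """Per-frame RMS level in dB (assumed 25 ms frames, 10 ms hop at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    levels = np.empty(n_frames)
    for i in range(n_frames):
        seg = np.asarray(x[i * hop : i * hop + frame], dtype=float)
        rms = np.sqrt(np.mean(seg ** 2))
        levels[i] = max(floor_db, 20.0 * np.log10(rms + 1e-12))
    return levels

def naive_activity(levels_db, threshold_db):
    """Mark a frame active when its own channel exceeds a fixed threshold.

    Crosstalk from other talkers also pushes this level up, which is why a
    per-channel threshold alone is unreliable on close-mic recordings.
    """
    return levels_db > threshold_db
```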

SLIDE 5

Impulse response coupling

  • Cross-correlation recovers impulse response
  • Changes in the coupling to each mic reveal participant motion

[Figure: example cross-coupling response, chan3 to chan0 (E / dB, 8-pt Hann smoothing, vs. time / s over 0 to 0.07 s); coupling C → A and C → tabletop for Spkr C over 1020 to 1024.5 s of mr-2000-11-02-1440 (delay / samples vs. freq / chan and vs. time / sec), showing participant movement.]
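
When the talker's close-mic signal is roughly white, cross-correlating it with another mic and normalizing by its energy estimates the coupling impulse response between them. The sketch below assumes that whiteness (real speech would usually need prewhitening first, which is not shown) and signals of equal length; `xcorr_impulse_response` is a hypothetical name. Tracking the lag of the main peak over successive windows would show the delay changing as participants move.

```python
import numpy as np

def xcorr_impulse_response(src, mic, max_lag):
    """Estimate h[k], k = 0..max_lag-1, assuming mic ~= h convolved with src.

    r[k] = sum_n src[n] * mic[n + k] / sum_n src[n]**2 equals h[k] when src
    is white (uncorrelated from sample to sample); otherwise it is biased.
    """
    src = np.asarray(src, dtype=float)
    mic = np.asarray(mic, dtype=float)
    n = len(src)
    energy = np.dot(src, src)
    h = np.empty(max_lag)
    for k in range(max_lag):
        h[k] = np.dot(src[: n - k], mic[k:n]) / energy
    return h
```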

SLIDE 6

Speaker Activity Detection

(with Sam Keene)

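  • Noisy crosstalk model: $m = C \cdot s + n$
    • m: observed mic signals, s: speaker activity, C: crosstalk coupling, n: noise
    • estimate subband $C_{xA}$ (coupling of speaker A into mic x) from A’s peak-energy frames
    • i.e. a ‘sparsity’ assumption: when A is at its peak, A is the only talker
    • ... then linear inversion to recover speaker activity
  • 20 subband crosstalk gains for each speaker × mic pair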

[Figure: mr-2000-06-30-1600, 125-160 s: spectrograms (freq / chan vs. time / s) of chan0, chanA and chanB, plus estimated crosstalk gains per subband for each channel pair, in panels labelled 0B, 0A, B0, BA, A0, AB.]
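
The estimate-then-invert procedure can be sketched for a single subband as below; it would be repeated for each of the 20 subbands. The frame-selection rule (the top 5% of frames on a speaker's own close mic), the median ratio, fixing C[j, j] = 1, and the function names are assumptions made for illustration, and the noise term n is ignored in the inversion.

```python
import numpy as np

def estimate_coupling(M, top_frac=0.05):
    """Estimate crosstalk gains C (mics x speakers) for one subband.

    M: array (n_mics, n_frames) of subband energies; mic j is assumed to be
    the close mic of speaker j, so C[j, j] is fixed to 1.
    Sparsity assumption: in the frames where mic j is loudest, only
    speaker j talks, so M[i] / M[j] there approximates C[i, j].
    """
    n_mics, n_frames = M.shape
    k = max(1, int(top_frac * n_frames))
    C = np.eye(n_mics)
    for j in range(n_mics):
        peak = np.argsort(M[j])[-k:]            # frames of j's peak energy
        for i in range(n_mics):
            if i != j:
                C[i, j] = np.median(M[i, peak] / M[j, peak])
    return C

def recover_activity(C, M):
    """Least-squares inversion of m = C s (noise ignored), clipped at zero."""
    S, *_ = np.linalg.lstsq(C, M, rcond=None)
    return np.maximum(S, 0.0)
```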

SLIDE 7

Tabletop mics: Turn detection

  • 4 mics ~1 m apart along the center of the table
    • 3 timing differences
    • slight L/R offset to disambiguate
  • Hi-res cross-correlation for timings
    • use normalized peak value for confidence
    • cluster results

[Figure: mr-2000-11-02-1440: PZM xcorr lags. Scatter of lag 1-2 / ms against lag 3-4 / ms (mics numbered 4, 1, 2, 3 in the layout sketch); skew / samples and 100×R vs. time / s over 0 to 300 s.]
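
One way to get sub-sample ("hi-res") timings is to take the peak of the normalized cross-correlation and refine it with a parabola through the peak and its two neighbours; the normalized peak height then serves as the confidence. The parabolic refinement and the `tdoa` helper are assumptions, since the slide does not say how its high-resolution timings were obtained, and the clustering step is not shown.

```python
import numpy as np

def tdoa(x, y, max_lag):
    """Delay of y relative to x in (fractional) samples, plus a confidence.

    Peak of the normalized cross-correlation over |lag| <= max_lag
    (max_lag must be smaller than len(x)), refined by a parabolic fit.
    Confidence is the normalized peak value (1.0 = identical waveforms).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full = np.correlate(y, x, mode="full")    # index len(x) - 1 is lag 0
    zero = len(x) - 1
    lags = np.arange(-max_lag, max_lag + 1)
    r = full[zero + lags] / np.sqrt(np.dot(x, x) * np.dot(y, y))
    i = int(np.argmax(r))
    frac = 0.0
    if 0 < i < len(r) - 1:
        a, b, c = r[i - 1], r[i], r[i + 1]
        denom = a - 2 * b + c
        if denom != 0:
            frac = 0.5 * (a - c) / denom     # vertex of the fitted parabola
    return lags[i] + frac, r[i]
```

Dividing the returned lag by the 16 kHz sample rate and multiplying by 1000 gives milliseconds, the unit of the lag plots.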

SLIDE 8

Speaker localization

(with Huan Wei Hee)

  • Timing differences → speaker positions (x, y, z)
    • gradient descent on implied Δt's
  • Ambiguity:
    • mic positions not fixed
    • speaker motions
  • Iterative estimation of speaker and mic locations

[Figure: inferred talker positions (x = mic), axes x / m, y / m, z / m; mics numbered 4, 1, 2, 3.]
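
The speaker-position step can be sketched as gradient descent on the squared mismatch between measured and implied time differences. This assumes known mic positions (the iterative re-estimation of mic locations is omitted), a speed of sound of 343 m/s, and the hypothetical names `implied_dt` and `locate`. The code works in any number of dimensions; with all mics in the table plane, z is only determined up to sign, so a starting point above the table selects the upper solution.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed, room temperature)

def implied_dt(p, mics, pairs):
    """Time differences (s) implied by source position p for each mic pair (a, b)."""
    d = np.linalg.norm(mics - p, axis=1)
    return np.array([(d[a] - d[b]) / C_SOUND for a, b in pairs])

def locate(dt_meas, mics, pairs, p0, step=0.05, iters=5000):
    """Gradient descent on E(p) = sum_k r_k^2, with residuals r_k in metres."""
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(mics - p, axis=1)
        units = (p - mics) / d[:, None]      # gradient of each mic distance
        err = (implied_dt(p, mics, pairs) - dt_meas) * C_SOUND
        grad = np.zeros_like(p)
        for e, (a, b) in zip(err, pairs):
            grad += 2 * e * (units[a] - units[b])
        p -= step * grad
    return p
```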

SLIDE 9

Visualization: transPlotter

  • Speaker turn patterns are informative
  • Browser for ‘high-level’ view, quick examination
    • based on Snack and iwidgets
    • public release

SLIDE 10

Meeting IR tool

  • Information retrieval (IR) on (ASR) transcripts of meetings
  • repurposed from the Thisl project
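
As a rough illustration of retrieval over transcript segments, the sketch below ranks segments by cosine similarity of TF-IDF vectors. The slide does not describe the Thisl system's term weighting, so this scheme and the `build_index`/`search` names are assumptions.

```python
import math
from collections import Counter

def build_index(segments):
    """segments: list of token lists (e.g. ASR words for each meeting segment)."""
    df = Counter()
    for toks in segments:
        df.update(set(toks))
    n = len(segments)
    idf = {w: math.log(n / df[w]) for w in df}
    vecs = [{w: c * idf[w] for w, c in Counter(toks).items()} for toks in segments]
    return idf, vecs

def search(query, idf, vecs):
    """Rank segments by TF-IDF cosine similarity; returns (score, index), best first."""
    q = {w: c * idf.get(w, 0.0) for w, c in Counter(query).items()}
    qn = math.sqrt(sum(v * v for v in q.values())) or 1.0
    results = []
    for i, v in enumerate(vecs):
        vn = math.sqrt(sum(x * x for x in v.values())) or 1.0
        dot = sum(q[w] * v.get(w, 0.0) for w in q)
        results.append((dot / (qn * vn), i))
    return sorted(results, reverse=True)
```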

SLIDE 11

Future work

  • Speaker turns
    • evaluation of close-mic system
    • speaker characteristics for tabletop mics
  • Nonspeech events
    • unsupervised clustering of audio
    • finding the feature space...
  • Speech fragment recognition
    • missing-data recognition based on ‘good’ signal
    • recognition of overlapping voices
  • High-level browsing
    • the ‘meeting map’ concept
    • summarization
