Meeting Audio - Dan Ellis 2002-08-29 - 1/11
Meeting Recorder: Audio Processing
Dan Ellis <dpwe@ee.columbia.edu>
LabROSA, Columbia University and ICSI, Berkeley
Outline
1 ICSI Meeting Recorder
2 Close-mics: cancellation & turn estimation
3 Tabletop mics: turns &
ICSI Meeting Recorder data
(with UW, SRI, IBM, Columbia)
- Microphones in conventional meetings
- for summarization/retrieval/behavior analysis
- informal, overlapped speech
- Data collection (ICSI, UW, ...):
- 100 hours collected, ongoing transcription
- NSF ‘Mapping Meetings’ project
- also interest from NIST, DARPA
Data from the ICSI project
- 16 channels @ 16 kHz, 16 bit
- Preprocessing
- high-pass filter!
- 64 sample skew!
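The two preprocessing steps can be sketched as below. This is only an illustration, not the ICSI pipeline: the slide names neither the filter design nor which channels carry the 64-sample skew, so the one-pole DC-blocking filter, its coefficient `alpha`, and the helper names `high_pass` and `remove_skew` are all assumptions.

```python
def high_pass(x, alpha=0.995):
    """First-order DC-blocking high-pass (assumed design):
    y[n] = x[n] - x[n-1] + alpha * y[n-1]."""
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = sample - prev_x + alpha * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y


def remove_skew(x, skew=64):
    """Advance a late channel by `skew` samples, zero-padding the tail
    so the channel keeps its original length."""
    return list(x[skew:]) + [0.0] * min(skew, len(x))


# A constant (DC) input decays towards zero after filtering.
filtered = high_pass([1.0] * 5000)

# A channel that lags by 64 samples is realigned with the others.
aligned = remove_skew(list(range(100)), 64)
```

Which channels need the shift, and in which direction, would have to be checked against the actual recordings.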
[Figure: ICSI Meeting Recorder Room Audio Setup, 2000-05-05. Components shown: Audio PC with STUDI/O PCI card, A/D 1, A/D 2, Wireless RX, TX1-TX5, PZM1-PZM4, Dummy PDA, Mackie mixer (Main L/R, Aux 1/2), JimBox PSU & breakout, four Jimlets, ADAT lightpipe links. Inputs labeled: lapel mic, wireless headsets, computer headsets, ambient mics.]
Notes:
- 1. The JimBox and the Jimlets are the custom electronics manufactured at ICSI to interface PC-style headsets to pro-audio XLR.
[Figure: Avg spec, 20s, mr-2000-11-02-1440-chanE (pzm). Axes: freq / Hz, level / dB.]
Close-mic channels
- Crosstalk
- Speaker activity detection
[Figure: speaker activity and level/dB against time / secs (120 to 155) for mr-2000-06-30-1600. Channels: Spkr A, Spkr B, Spkr C, Spkr D, Spkr E, Table top. Annotations: breath noise, crosstalk, backchannel (signals desire to regain floor?), floor seizure, interruptions, speaker B cedes floor.]
Impulse response coupling
- Cross-correlation recovers impulse response
- Changes in the coupling to each mic reveal participant motion
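A minimal sketch of the cross-correlation idea: if the source channel is roughly white, the cross-correlation between it and a second channel, divided by the source energy, approximates the impulse response coupling the two. Real speech is not white, so this is an idealization. The function name `estimate_coupling` and the synthetic test response `true_h` are assumptions for illustration.

```python
import random


def estimate_coupling(source, coupled, max_lag):
    """Estimate h[0..max_lag-1] as xcorr(source, coupled) / energy(source).
    Assumes `coupled` is at least as long as `source`."""
    energy = sum(v * v for v in source)
    n = len(source)
    h = []
    for lag in range(max_lag):
        acc = 0.0
        for i in range(n - lag):
            acc += source[i] * coupled[i + lag]
        h.append(acc / energy)
    return h


# Synthetic check: white noise passed through a known short response.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(4000)]
true_h = [0.0, 0.0, 0.0, 0.5, 0.0, -0.2]
coupled = [
    sum(true_h[k] * source[n - k] for k in range(len(true_h)) if n - k >= 0)
    for n in range(len(source))
]
h_est = estimate_coupling(source, coupled, max_lag=8)
```

Repeating the estimate over successive time windows and comparing the results is one way the coupling could be tracked as participants move.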
[Figure: Example cross coupling response, chan3 to chan0. Axes: time / s, E / dB (8 pt hann).]
[Figure: mr-2000-11-02-1440, time 1020 to 1024.5 sec. Panels: Spkr C (freq / chan), Coupling C → A (delay / samples), Coupling C → Tabletop (delay / samples). Annotation: participant movement.]
Speaker Activity Detection
(with Sam Keene)
- Noisy crosstalk model: $m = C \cdot s + n$
- Estimate subband $C_{xA}$ from A’s peak energy
- i.e. ‘sparsity’ assumption
- ... then linear inversion to recover speaker activity
- 20 subband crosstalk gains for each spkr x mic
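The model m = C·s + n and its inversion can be sketched for a single subband as follows. The reading of the symbols is an assumption: m is taken as the per-frame energies at the mics, s as the speaker energies, C as the matrix of crosstalk gains, and n as noise (ignored here). The helper names, the `top_fraction` parameter, the 0.05 activity threshold, and the toy data are all hypothetical.

```python
def estimate_gain(mic_x, mic_a, top_fraction=0.2):
    """Estimate the gain from speaker A into mic x, using only the frames
    where A's own mic is loudest (sparsity: assume only A is talking there)."""
    n_top = max(1, int(len(mic_a) * top_fraction))
    order = sorted(range(len(mic_a)), key=lambda i: mic_a[i], reverse=True)
    top = order[:n_top]
    return sum(mic_x[i] for i in top) / sum(mic_a[i] for i in top)


def solve(C, m):
    """Solve C s = m by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(C[r]) + [m[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * s[c] for c in range(r + 1, n))
        s[r] = (aug[r][n] - tail) / aug[r][r]
    return s


# Toy data: two speakers, one subband, six frames, noise-free.
s_a = [1.0, 0.8, 0.0, 0.0, 0.9, 0.0]
s_b = [0.0, 0.0, 0.7, 1.0, 0.0, 0.0]
mic_a = [1.0 * a + 0.2 * b for a, b in zip(s_a, s_b)]
mic_b = [0.1 * a + 1.0 * b for a, b in zip(s_a, s_b)]

c_ab = estimate_gain(mic_a, mic_b)  # gain from speaker B into mic A
c_ba = estimate_gain(mic_b, mic_a)  # gain from speaker A into mic B
C_est = [[1.0, c_ab], [c_ba, 1.0]]

# Invert the model frame by frame and threshold to get activity flags.
active = [
    [v > 0.05 for v in solve(C_est, [ma, mb])]
    for ma, mb in zip(mic_a, mic_b)
]
```

With 20 subbands, the same estimate would be repeated per subband, giving one gain per speaker, mic, and subband.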
[Figure: subband energy (frq chan against time/s, 125 to 160) for mr-2000-06-30-1600 chan0, chanB, and chanA, with crosstalk gain panels labeled 0B, 0A, B0, BA, A0, AB.]
Tabletop mics: Turn detection
- 4 mics, ~1 m apart, along center of table
- 3 timing differences
- slight L/R offset to disambiguate
- Hi-res cross-correlation for timings
- use normalized peak value for confidence
- cluster results
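The timing step can be sketched as a normalized cross-correlation whose peak location gives the lag and whose peak value gives a confidence. The slide does not say how the high resolution is obtained; parabolic interpolation around the peak is used here as a stand-in, and the function name `xcorr_delay` and the synthetic signals are assumptions.

```python
import math
import random


def xcorr_delay(a, b, max_lag):
    """Return (delay, confidence): the delay of b relative to a in samples
    (fractional), and the normalized cross-correlation peak value."""
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    lags = list(range(-max_lag, max_lag + 1))
    r = []
    for lag in lags:
        acc = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                acc += a[i] * b[j]
        r.append(acc / norm)
    k = max(range(len(r)), key=lambda i: r[i])
    delay = float(lags[k])
    # Parabolic interpolation through the peak and its two neighbours.
    if 0 < k < len(r) - 1:
        denom = r[k - 1] - 2.0 * r[k] + r[k + 1]
        if denom != 0.0:
            delay += 0.5 * (r[k - 1] - r[k + 1]) / denom
    return delay, r[k]


# Synthetic check: mic2 hears the same noise 5 samples after mic1.
random.seed(1)
mic1 = [random.gauss(0.0, 1.0) for _ in range(2000)]
mic2 = [0.0] * 5 + mic1[:-5]
delay, confidence = xcorr_delay(mic1, mic2, max_lag=20)
delay_ms = 1000.0 * delay / 16000.0  # at the 16 kHz sampling rate
```

Estimates with a low confidence value could be discarded before clustering the lags from the three mic pairs.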
[Figure: mr-2000-11-02-1440: PZM xcorr lags. Scatter of lag 1-2 / ms against lag 3-4 / ms, with clusters numbered 1 to 4; companion panels show 100xR and skew/samps against time / s (50 to 300).]
Speaker localization
(with Huan Wei Hee)
- Timing differences → speaker positions (x,y,z)
- gradient descent on implied ∆t's
- Ambiguity:
- mic positions not fixed
- speaker motions
- Iterative estimation of speaker, mic locations
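A rough sketch of the position search: for a candidate talker position, compute the timing differences it implies at the mic pairs, compare them with the observed ones, and move the position downhill on the squared error. Everything specific here is an assumption: the mic coordinates, the speed of sound constant, the numerical gradient, and the step-halving rule. It fits the speaker position only; the iterative re-estimation of mic locations mentioned above is not shown.

```python
import math

C_SOUND = 343.0  # assumed speed of sound in m/s


def implied_dts(pos, mics, pairs):
    """Timing differences (seconds) implied by a source at `pos`."""
    d = [math.dist(pos, m) for m in mics]
    return [(d[j] - d[i]) / C_SOUND for i, j in pairs]


def cost(pos, mics, pairs, observed):
    """Squared error between implied and observed path differences (m^2)."""
    pred = implied_dts(pos, mics, pairs)
    return sum((C_SOUND * (p - o)) ** 2 for p, o in zip(pred, observed))


def localize(mics, pairs, observed, start, iters=500, step=0.05, eps=1e-6):
    """Gradient descent with a numerical gradient; the step is halved
    whenever a move fails to reduce the cost."""
    pos = list(start)
    best = cost(pos, mics, pairs, observed)
    for _ in range(iters):
        grad = []
        for k in range(3):
            hi = pos[:]
            lo = pos[:]
            hi[k] += eps
            lo[k] -= eps
            diff = cost(hi, mics, pairs, observed) - cost(lo, mics, pairs, observed)
            grad.append(diff / (2.0 * eps))
        trial = [p - step * g for p, g in zip(pos, grad)]
        c = cost(trial, mics, pairs, observed)
        if c < best:
            pos, best = trial, c
        else:
            step *= 0.5
    return pos, best


# Four mics along the table centre with a slight left/right offset.
mics = [(0.0, 0.05, 0.0), (1.0, -0.05, 0.0), (2.0, 0.05, 0.0), (3.0, -0.05, 0.0)]
pairs = [(0, 1), (1, 2), (2, 3)]  # three timing differences
true_pos = (1.2, 0.8, 0.4)
observed = implied_dts(true_pos, mics, pairs)

start = (1.5, 0.5, 0.3)
start_cost = cost(start, mics, pairs, observed)
estimate, final_cost = localize(mics, pairs, observed, start)
```

With only three timing differences from nearly collinear mics, several positions can explain the data about equally well, so a result like this is best read as one consistent solution rather than a unique fix.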
[Figure: Inferred talker positions (x = mic), talkers numbered 1 to 4. Axes: x / m, y / m, z / m.]
Visualization: transPlotter
- Speaker turn patterns are informative
- Browser for ‘high-level’ view, quick examination
- based on Snack and iwidgets
- public release
Meeting IR tool
- IR on (ASR) transcripts from meetings
- repurposed from the THISL project
Future work
- Speaker turns
- evaluation of close-mic system
- speaker characteristics for tabletop mics
- Nonspeech events
- unsupervised clustering of audio
- finding the feature space...
- Speech fragment recognition
- missing-data recognition based on ‘good’ signal
- recognition of overlapping voices
- High-level browsing
- the ‘meeting map’ concept
- summarization