The ICSI Meeting Corpus



SLIDE 1

29-30 January 2003 M4 Meeting, Sheffield 1

The ICSI Meeting Corpus

Barbara Peskin

[on behalf of ICSI’s MeetingRecorder Team]
International Computer Science Institute
Berkeley, CA

SLIDE 2

Basic Facts

  • 75 “natural” meetings collected at ICSI, 2000-2002
    – Regular weekly meetings of ICSI working teams (mostly)
    – 3-10 participants per meeting, averaging ~6
    – Roughly 1 hour each (17-103 minutes; 72 hours total)
    – 4 main meeting types, 53 unique talkers

  • Simultaneous multi-channel recordings, using both close-talking and far-field microphones

  • Audio only (no video), plus complete transcriptions
  • “Digits task”: a small-vocab read-speech subtask
  • Supports a wealth of research possibilities
  • Available through the LDC this summer (and to research partners soon, direct from us)
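
As a quick sanity check on the figures above, the totals imply an average meeting length a little under an hour, consistent with "roughly 1 hour each". The sketch below uses only the numbers quoted on this slide; the variable names are illustrative, and since "72 hours total" is a rounded figure the result is approximate.

```python
# Figures quoted on the slide (the total is rounded, so the average is approximate).
num_meetings = 75
total_hours = 72
shortest_min, longest_min = 17, 103

# Average meeting length implied by the totals.
avg_minutes = total_hours * 60 / num_meetings
print(f"average length: {avg_minutes:.1f} minutes")

# The implied average should fall inside the quoted range of durations.
assert shortest_min <= avg_minutes <= longest_min
```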

SLIDE 3

Recording Set-up

All meetings were recorded at ICSI in the same conference room, using the same set-up

  • Close-talking microphones for each speaker
    – Mostly head-mounted
    – Some lapel mics in early meetings

  • 6 tabletop microphones
    – 4 high-quality omnidirectional PZMs arrayed down the center of the table
    – 2 inexpensive microphone elements mounted on a “PDA mock-up”

  • All channels recorded separately and simultaneously
  • Collected at 48 kHz, downsampled on the fly to 16 kHz
  • Audio files are 16-bit linear, stored in compressed NIST SPHERE format
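
A NIST SPHERE file begins with a plain-text header: a `NIST_1A` line, a line giving the header size in bytes, then `name type value` fields terminated by `end_head`. The sketch below reads only that header, which is enough to confirm the sample rate and sample width described above. The function name and the example header are illustrative assumptions, not taken from the corpus; decoding the compressed audio payload would need a SPHERE-aware tool and is not attempted here.

```python
import io


def read_sphere_header(stream):
    """Parse the text header of a NIST SPHERE file from a seekable binary stream."""
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("missing NIST_1A magic line")
    header_size = int(stream.readline().decode("ascii").strip())

    # Re-read the whole header block, including the two lines consumed above.
    stream.seek(0)
    block = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in block.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, type_code, value = parts
        if type_code == "-i":      # integer field
            fields[name] = int(value)
        elif type_code == "-r":    # real-valued field
            fields[name] = float(value)
        else:                      # "-sN": string field of length N
            fields[name] = value
    return fields, header_size


# Made-up header for illustration only; real corpus files will carry other values.
example = (
    b"NIST_1A\n"
    b"   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 16000\n"
    b"sample_n_bytes -i 2\n"
    b"sample_count -i 160000\n"
    b"end_head\n"
).ljust(1024, b" ")

fields, header_size = read_sphere_header(io.BytesIO(example))
duration_s = fields["sample_count"] / fields["sample_rate"]
print(fields["sample_rate"], fields["sample_n_bytes"] * 8, duration_s)
```

For a file on disk, the same function can be called on a handle opened with `open(path, "rb")`; the audio data starts at byte offset `header_size`.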

SLIDE 4

Meeting Types, Meeting Participants

A few main meeting types with slowly changing mix of speakers, content

  • Meeting Recorder Project [29]
  • Robustness (signal processing for robust ASR) [23]
  • Even Deeper Understanding (natural language understanding) [15]
  • Network Services & Applications [3]
  • Other sporadic types, incl. 2 transcription team meetings [5]
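
Reading the bracketed numbers as meeting counts per type (an assumption, though they do sum to the 75 meetings quoted earlier), a short tally shows how the corpus divides across meeting types. The short labels are abbreviations chosen here for illustration.

```python
# Bracketed figures from the list above, interpreted as meetings per type.
meeting_counts = {
    "Meeting Recorder Project": 29,
    "Robustness": 23,
    "Even Deeper Understanding": 15,
    "Network Services & Applications": 3,
    "Other": 5,
}

total = sum(meeting_counts.values())
shares = {name: 100 * n / total for name, n in meeting_counts.items()}

for name, n in meeting_counts.items():
    print(f"{name:<34}{n:>3}  {shares[name]:5.1f}%")
print("total meetings:", total)
```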

53 unique talkers in the corpus

  • Speakers may appear in more than one meeting type
  • Significant proportion of non-native English speakers
  • Demographic info on sex, age, education level, dialect, etc. (all optional) collected on enrollment

  • For non-native speakers, info available on native tongue and time in English-speaking country

SLIDE 5

The Digits Task

At most meetings, participants read digit strings (similar to TIDIGITS) at start or end of meeting

  • Same speakers, same mics, same room as for spontaneous speech collection

  • Allows factorization of speech challenges offered by corpus:
    – Tackle spontaneous multi-party ASR using high-quality channel
    – Explore far-field acoustics on simpler speech task

  • Digits usually read by each speaker in turn, but there are some interesting exceptions:
    – Occasionally, digits read by all speakers simultaneously
    – Once, all speakers used same digits script and read in unison (more or less)

SLIDE 6

Transcriptions

All meetings are fully transcribed at the word level

  • Uses simple conventions, favoring standard orthography
  • Includes word fragments, mangled prons, disfluencies
  • Includes vocal (breath, laugh, …) and nonvocal (door slam, coffee mug clinks, mic noise, …) nonspeech sounds, and contextual comments

  • Produced from close-talking channels, permitting careful transcription of overlapping speech, soft-spoken backchannels, etc.

Transcripts were post-processed into a simple XML format

  • Headers include meeting date, time, participant, mic, etc. info (plus a free-form notes field)

  • XML format designed specifically for this corpus
  • Software provided for translating from our format to other common ones
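
Because the transcripts are time-aligned per speaker, overlapping speech can in principle be located by comparing segment times. The sketch below illustrates the idea on a made-up fragment: the element and attribute names (`Segment`, `StartTime`, `EndTime`, `Participant`), the speaker IDs, and the words are all assumptions for illustration. The actual schema is defined by the corpus's own DTD and should be checked there.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript fragment; names and content are illustrative only.
SAMPLE = """
<Meeting>
  <Transcript>
    <Segment StartTime="0.00" EndTime="2.50" Participant="spk_a">so shall we start</Segment>
    <Segment StartTime="2.10" EndTime="2.60" Participant="spk_b">uh-huh</Segment>
    <Segment StartTime="3.00" EndTime="5.00" Participant="spk_a">first item is the digits</Segment>
  </Transcript>
</Meeting>
"""


def overlapping_pairs(xml_text):
    """Return (speaker_1, speaker_2, overlap_start, overlap_end) for cross-speaker overlaps."""
    root = ET.fromstring(xml_text.strip())
    segments = sorted(
        (float(s.get("StartTime")), float(s.get("EndTime")), s.get("Participant"))
        for s in root.iter("Segment")
    )
    pairs = []
    for i, (start_i, end_i, who_i) in enumerate(segments):
        for start_j, end_j, who_j in segments[i + 1:]:
            if start_j >= end_i:
                break  # segments are sorted by start time, so no later one overlaps
            if who_i != who_j:
                pairs.append((who_i, who_j, start_j, min(end_i, end_j)))
    return pairs


pairs = overlapping_pairs(SAMPLE)
for who_1, who_2, start, end in pairs:
    print(f"{who_1} and {who_2} overlap from {start:.2f}s to {end:.2f}s")
```

Here the short backchannel from the second speaker falls inside the first speaker's turn, the kind of event that the close-talking channels make it possible to transcribe.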

SLIDE 7

Additional Information

  • Released corpus will also include
    – Speaker table of demographic info
    – Transcription guidelines
    – XML documentation, including our “Meetings DTD”

  • Additional annotation of Meeting subsets underway
    – Prosodic feature database
    – Dialogue act annotations

  • For further information, consult
    – ICSI’s many publications on Meetings work, incl.
        • corpus overview: A. Janin et al., Proc. ICASSP’03
        • research updates: N. Morgan et al., Proc. HLT-2001 and Proc. ICASSP’03
        • [see our website for other listings]
    – Our website: http://www.icsi.berkeley.edu/Speech/mr/