SLIDE 1

capturing natural interactions

Nick Campbell Trinity College, Dublin Clarin/FLaReNet Workshop@KTH November 26th, 2009

Thursday 26 November 2009

SLIDE 2

introduction

  • Speech recognition and synthesis technologies can now be considered mature, but their simple incorporation into speech-based human-computer interfaces reveals shortcomings in their capabilities.

  • Perhaps the biggest reason for this is that each technology was designed explicitly to convert between text and spoken modalities, without taking into consideration the complexities of human spoken interaction as the joint creation of mutually understood meaning.

  • If the goal of this meeting is to facilitate data collection to improve these interfaces and make them more intelligent and human-like (through a better understanding of human interaction and communication), then we might make a start by designing improved techniques for efficiently capturing, storing, annotating, and distributing large corpora of natural spoken interactions.

SLIDE 3

Overview of the Talk

  • Speech & Multimodal Databases
  • Annotating, Viewing & Distributing Data
  • Two-way Dissemination (crowd-sourcing)
  • plus, because I am now at Trinity, working again in the Humanities, a selection of 18th Century poetic thought (but with an engineering bias)!

SLIDE 4

Speech & Multimodal Databases

  • my primary interest: collecting natural speech - modelling human conversational interactions
  • age 12: WW, ’98: “we murder to dissect” . . . “quit your books; let Nature be your teacher”

  • important design notes for corpus gatherers!

SLIDE 5

THE TABLES TURNED, 1798 (William Wordsworth, Complete Poetical Works)

UP! up! my Friend, and quit your books;
Or surely you'll grow double:
Up! up! my Friend, and clear your looks;
Why all this toil and trouble?

The sun, above the mountain's head,
A freshening lustre mellow
Through all the long green fields has spread,
His first sweet evening yellow.

Books! 'tis a dull and endless strife:
Come, hear the woodland linnet,
How sweet his music! on my life,
There's more of wisdom in it.

And hark! how blithe the throstle sings!
He, too, is no mean preacher:
Come forth into the light of things,
Let Nature be your teacher.

She has a world of ready wealth,
Our minds and hearts to bless--
Spontaneous wisdom breathed by health,
Truth breathed by cheerfulness.

One impulse from a vernal wood
May teach you more of man,
Of moral evil and of good,
Than all the sages can.

Sweet is the lore which Nature brings;
Our meddling intellect
Mis-shapes the beauteous forms of things:--
We murder to dissect.

Enough of Science and of Art;
Close up those barren leaves;
Come forth, and bring with you a heart
That watches and receives.

SLIDE 6

multifaceted behaviour

  • by constraining a corpus, we limit the types of interaction that it can illustrate
  • only by releasing these constraints on participant behaviour can we gather a corpus that will teach us something new about human conversational interaction

SLIDE 7

dimensions of speech

SLIDE 8

example: what is a “turn”

SLIDE 9

SLIDE 10

contact management?

SLIDE 11

SLIDE 12

bias in corpora

  • the proposed ISO standard also illustrates the inherent bias in existing corpora:

  • e.g., Tables 1 & 2 in Annex F show considerable differences in “Contact Management” between corpora

  • “Our conclusion is that Contact Management could be considered as an ‘optional’ dimension, since this aspect of communication is not reflected in most existing dialogue act annotation schemes (6 out of 18). It was noticed, however, that for some types of dialogues, e.g. phone conversations or tele-conferences (as in the OVIS corpus), this aspect may be important.”

  • only 0.1% in AMI, vs 12.3% in OVIS .....
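
The comparison above comes down to the share of annotated units in each corpus that carry a tag in a given dimension. A minimal sketch in Python, assuming each corpus has been reduced to a flat list of dimension labels (one per annotated unit). The function name and the toy labels are invented for illustration; they are not the AMI or OVIS data, nor the method used in the ISO survey.

```python
from collections import Counter


def dimension_shares(labels):
    """Return each dimension's share of all labels, as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dim: 100.0 * n / total for dim, n in counts.items()}


# Toy data (hypothetical): one dimension label per annotated unit.
meeting_like = ["Task"] * 9 + ["Contact Management"] * 1
phone_like = ["Task"] * 6 + ["Contact Management"] * 4

for name, labels in [("meeting_like", meeting_like), ("phone_like", phone_like)]:
    share = dimension_shares(labels).get("Contact Management", 0.0)
    print(f"{name}: Contact Management = {share:.1f}%")
```

A large gap between two corpora on one dimension, as in the toy lists here, says as much about the recording situation as about the annotation scheme.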

SLIDE 13
  • Results from survey of dimensions and communicative functions in existing annotation schemas

SLIDE 14

Annotating, Viewing and Distributing New Data

There are presently several tools for manual annotation of data that each store the results in a prescribed format, easy for dissemination, but my experience of working with these and of talking with people who use them regularly is that the task is tedious, and the framework often restrictive. Rather than prescribe a standard at this time, we might benefit more from creating a support group whereby people who annotate data regularly can communicate and share samples, tools, and formats for rapid assisted evolution. My LREC 2010 paper (A Software Toolkit for Viewing Annotated Multimodal Data Interactively over the Web) may be relevant here.

SLIDE 15

SLIDE 16

A Software Toolkit for Viewing Annotated Multimodal Data Interactively over the Web

section headings (lrec-2010)

  • introduction
  • the freetalk multimodal corpus
  • assembling complex data
  • viewing complex data interactively
  • details of the software
  • downloading & use
  • summary & conclusion

SLIDE 17

flash-based data interface

SLIDE 18

flash movies & dataplots

  • we archive ALL originals, and link various derived annotations, data streams, and compressed video versions .... flash movie format (xxx.flv) appears to offer the most efficient service and access software ........

  • interactive pages at www.speech-data.jp

SLIDE 19

Two-Way Dissemination

By sharing a corpus, we stand to gain added annotation levels. We should also examine crowd-sourcing in this respect. As with our own FreeTalk corpus (www.speech-data.jp), by making the initial data public and co-operating worldwide with interested partners, the annotations can be grown as researchers with different interests contribute their own layers of knowledge. Since the world of multimodal corpora is still young, perhaps the most we might expect from this initial meeting is the opening up of channels whereby the exchange of sources and resources might take place.
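
One way to picture annotations that grow as different researchers contribute their own layers is a stand-off structure: the archived original is never modified, and each contribution is a separate, time-aligned layer keyed by name. The sketch below is a minimal Python illustration under that assumption. The class names, the layer names, and the "session-01" identifier are hypothetical, and this is not the FreeTalk corpus format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """One time-aligned label; times are seconds into the recording."""
    start: float
    end: float
    label: str


@dataclass
class Recording:
    """An archived original plus independently contributed annotation layers."""
    media_id: str
    layers: dict = field(default_factory=dict)  # layer name -> list of Span

    def add_layer(self, name, spans):
        # Refuse to overwrite, so one contributor cannot clobber another's layer.
        if name in self.layers:
            raise ValueError(f"layer {name!r} already exists")
        self.layers[name] = sorted(spans, key=lambda s: s.start)

    def labels_at(self, t):
        """Return {layer: [labels]} for every span covering time t."""
        return {
            name: [s.label for s in spans if s.start <= t < s.end]
            for name, spans in self.layers.items()
        }


# Hypothetical identifiers and labels, for illustration only.
rec = Recording(media_id="session-01")
rec.add_layer("transcript", [Span(0.0, 1.5, "hello"), Span(1.5, 3.0, "how are you")])
rec.add_layer("gesture", [Span(1.0, 2.0, "nod")])
print(rec.labels_at(1.2))
```

Because layers are only ever added, contributors with different interests can work in parallel without agreeing on a single annotation scheme first.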

SLIDE 20

a growing community

  • we don’t yet have clearly defined “interface standards” but we try to keep a flexible, open-minded approach

  • different people are working on the corpus, each from their own viewpoints, using different software and both ‘top-down’ (theory-driven) and ‘bottom-up’ (data-driven) approaches ....

  • we are hoping for a happy marriage of both

SLIDE 21

summary

  • we do not yet know how to properly create a ‘balanced’ and ‘representative’ speech corpus
  • we do not yet know how to integrate & manage complex multimodal data packages
  • we do not yet know the best ways to disseminate and share these types of data
  • so maybe it is a bit early to propose standards
  • but we can gain a lot by encouraging exchange and interchange of related annotations & data

SLIDE 22
  • thank you ....
