SLIDE 1

Spoken Language and HCI Grand Challenge

Multiple-Level Models for Multi-Modal Interaction

Martin Russell¹, Antje S. Meyer², Stephen Cox³, Alan Wing²

¹School of Engineering, University of Birmingham
²School of Psychology, University of Birmingham
³School of Computing, University of East Anglia

SLIDE 2

Outline of talk

  • Motivation for multi-modal interaction
  • Multiple-level representations to explain variability
  • Multiple-level representations to integrate modalities
  • Issues in combining modalities
  • Example: speech and gaze
  • Proposed research
  • Conclusions
SLIDE 3

Motivation

  • Linguistic utterances rarely unambiguous, but communication succeeds
    – Shared world knowledge
    – Common discourse model
    – Speech augmented with eye-gaze and gesture

SLIDE 4

Psycholinguistic perspective

  • In psycholinguistic theories, the processes of retrieving and combining words are far better described than the processes of using world and discourse knowledge, eye gaze or gestures

SLIDE 5

Computational perspective

  • Automatic spoken language processing lacks the knowledge and theory to explain ambiguity
    – Assumes a direct relationship between word sequences and acoustic signals
    – Variability treated as noise
  • No established framework to accommodate complementary modalities

SLIDE 6

Challenges

  • Psycholinguistics needs:
    – Better understanding of how speakers and listeners use eye gaze and gesture to augment the speech signal
  • Computational spoken language processing needs:
    – Better treatment of variability in spoken language
    – Better frameworks for augmenting speech with other modalities
  • Both need fruitful interaction between psycholinguistics and computational spoken language processing

SLIDE 7

Example: acoustic variability

  • Sources of acoustic variability are not naturally characterised in the acoustic domain:
    – Speech dynamics
    – Individual speaker differences
    – Speaking styles
    – …

SLIDE 8

A model of acoustic variability

  • Introduce an intermediate, ‘articulatory’ layer
  • Speech dynamics modelled as a trajectory in this layer
  • Trajectory mapped into acoustic space
  • Probabilities calculated in acoustic space (a minimal numerical sketch follows after the figure)

[Figure: word sequence W generates a ‘modelled’ articulatory trajectory, which an articulatory-to-acoustic mapping converts into synthetic acoustic features for comparison with the acoustic signal]
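To make the layered idea concrete, here is a minimal sketch in Python/NumPy. It assumes a straight-line articulatory trajectory per segment, a fixed linear articulatory-to-acoustic mapping and isotropic Gaussian observation noise; the mapping matrix, dimensions and all numbers are illustrative assumptions, not the model presented in the talk.

import numpy as np

# Minimal sketch of the two-level idea on this slide: a smooth trajectory in a
# hidden 'articulatory' layer is mapped into acoustic space, and the likelihood
# of the observed acoustic frames is computed there.
# Assumptions (not from the slides): linear trajectories, a linear
# articulatory-to-acoustic mapping M, and isotropic Gaussian noise.

def articulatory_trajectory(start, end, n_frames):
    """Straight-line trajectory between two articulatory targets (one row per frame)."""
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    return (1.0 - t) * start + t * end

def log_likelihood(acoustic_obs, artic_traj, M, b, noise_var):
    """Gaussian log-likelihood of acoustic frames given the mapped trajectory."""
    synthetic = artic_traj @ M.T + b      # map trajectory into acoustic space
    resid = acoustic_obs - synthetic      # modelling error, measured in the acoustic domain
    n = acoustic_obs.size
    return -0.5 * (np.sum(resid ** 2) / noise_var + n * np.log(2.0 * np.pi * noise_var))

# Toy dimensions: 3-D articulatory layer, 12-D acoustic space, 20 frames.
rng = np.random.default_rng(0)
M = rng.normal(size=(12, 3))              # in practice this mapping would be learned
b = np.zeros(12)
traj = articulatory_trajectory(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.5, -1.0]), 20)
obs = traj @ M.T + b + 0.1 * rng.normal(size=(20, 12))
print(log_likelihood(obs, traj, M, b, noise_var=0.01))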

SLIDE 9

Combining modalities

  • Examples:
    – Lip-shape correlates with speech at the acoustic level…
    – … but this is not the case in general
    – Correlation between speech and eye-movement (when it exists) is likely to be at the conceptual level

SLIDE 10

Multiple-level models

  • Different levels of representation are needed:
    – To model causes of variability in speech
    – To capture the relationship between speech and other modalities
  • Candidate formalisms already exist (see the sketch below):
    – Graphical models
    – Bayesian networks
    – Layered HMMs
    – …
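As one simple illustration of the kind of formalism listed above, the sketch below uses a single hidden, concept-level state sequence that emits a speech symbol and a gaze symbol at each step, assumed conditionally independent given the state. All probabilities are toy values chosen for the example, not parameters from the talk.

import numpy as np

# Toy two-state model: one hidden 'concept-level' state sequence generates two
# observation streams (discrete speech and gaze symbols), assumed conditionally
# independent given the state. All numbers are illustrative.

A  = np.array([[0.9, 0.1],        # state transition probabilities
               [0.2, 0.8]])
pi = np.array([0.5, 0.5])         # initial state distribution
B_speech = np.array([[0.7, 0.3],  # P(speech symbol | state)
                     [0.2, 0.8]])
B_gaze   = np.array([[0.6, 0.4],  # P(gaze symbol | state)
                     [0.1, 0.9]])

def forward_loglik(speech_obs, gaze_obs):
    """Scaled forward algorithm over the fused emission P(speech, gaze | state)."""
    alpha = pi * B_speech[:, speech_obs[0]] * B_gaze[:, gaze_obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for s, g in zip(speech_obs[1:], gaze_obs[1:]):
        alpha = (alpha @ A) * B_speech[:, s] * B_gaze[:, g]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

print(forward_loglik([0, 0, 1, 1], [0, 1, 1, 1]))  # joint log-likelihood of the two streams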

SLIDE 11

Example: speech and gaze

SLIDE 12

Results from ‘map task’ experiment

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 13

Results from map task

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 14

Results from map task

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 15

Results from map task

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 16

Object naming

[Figure (from ESRC project, Meyer & Wheeldon): timeline of object naming — gaze on the object accompanies planning to the phonological level; a time lag then precedes speech onset, during which phonetic and articulatory planning takes place, plus advance planning for the next object]

SLIDE 17

Object naming

SLIDE 18

Lessons from psychology

  • Gaze-to-speech lags

[Figure (panel a): speech-to-gaze lags (ms) plotted against repetition (1–16), for monosyllabic and disyllabic object names]

SLIDE 19

More lessons…

  • Gaze duration

[Figure (panel d): viewing times (ms) plotted against repetition (1–16)]

SLIDE 20

Speech and gaze

  • In general, a speaker who looks at an object might:
    a) Name the object
    b) Say something about the object
    c) Say something about a different topic altogether
    d) Say nothing at all
  • There will be a delay (200–300 ms for object naming) between finishing looking at an object and talking about it (see the sketch below)
  • The delay will be less if the object was discussed previously
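A minimal sketch of how such a lag could be exploited computationally: each naming onset is linked to the most recent fixation whose offset-to-onset lag falls inside a plausible window. The object names, times and the 150–600 ms window bounds below are illustrative assumptions; only the 200–300 ms figure comes from the slide.

# Toy fixation offsets and speech onsets (ms); in a real system these would
# come from an eye tracker and a speech recogniser.
fixations = [("cup", 1000), ("book", 2400), ("lamp", 4100)]        # (object, fixation_end_ms)
speech_onsets = [("cup", 1260), ("book", 2650), ("lamp", 4350)]    # (word, onset_ms)

MIN_LAG_MS, MAX_LAG_MS = 150, 600   # assumed search window around the 200-300 ms lag

def link_gaze_to_speech(fixations, speech_onsets):
    """For each speech onset, return the fixated object whose offset-to-onset lag
    falls inside the window, or None if no fixation qualifies."""
    links = []
    for word, onset in speech_onsets:
        candidates = [(obj, onset - end) for obj, end in fixations
                      if MIN_LAG_MS <= onset - end <= MAX_LAG_MS]
        best = min(candidates, key=lambda c: c[1])[0] if candidates else None
        links.append((word, best))
    return links

print(link_gaze_to_speech(fixations, speech_onsets))
# [('cup', 'cup'), ('book', 'book'), ('lamp', 'lamp')]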
SLIDE 21

Speech and gaze (continued)

  • Alternatively, gaze might provide an important cue for classifying the ‘state’ of a communication (e.g. a meeting), as in the sketch below:
    – Monologue (all eyes on one subject)
    – Discussion (eyes move between subjects)
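A minimal sketch of that second use of gaze, assuming each participant's current gaze target is already available from an eye tracker; the 0.8 concentration threshold and the sample window are illustrative assumptions, not values from the talk.

from collections import Counter

def classify_state(gaze_targets, threshold=0.8):
    """gaze_targets: (participant, target) samples over a short time window.
    Returns 'monologue' if most gaze converges on a single target, else 'discussion'."""
    counts = Counter(target for _, target in gaze_targets)
    top_share = counts.most_common(1)[0][1] / len(gaze_targets)
    return "monologue" if top_share >= threshold else "discussion"

window = [("p1", "speakerA"), ("p2", "speakerA"), ("p3", "speakerA"),
          ("p1", "speakerA"), ("p2", "speakerB")]
print(classify_state(window))   # 4/5 of samples on speakerA -> 'monologue'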

SLIDE 22

Proposed research

  • Goal: improved understanding of user goals and communication states through integration of speech, gaze and gesture
  • Integrated, multi-disciplinary project, involving psycholinguistics, speech and language processing, and mathematical modelling

SLIDE 23

Proposed research (1)

  • Experimental study of speech, gaze and gesture in referential communication and matching tasks, to determine:
    – How speakers’ and listeners’ gaze are coordinated spatially and in time
    – The functional significance of eye gaze and gesture information (by allowing or preventing mutual eye contact between the interlocutors)
    – The importance of temporal co-ordination of speaker and listener gaze

SLIDE 24

Proposed research (2)

  • Development of multiple-level computer models for integration of speech, gaze and gesture, for:
    – Improved understanding of user goals
    – Improved classification of communication states (meeting actions)

SLIDE 25

Summary

  • Speech in multi-modal interfaces
  • Multiple-level models for:
    – Characterising variability within a modality
    – Characterising relationships between modalities
  • Proposal for collaborative research in psycholinguistics and speech technology

SLIDE 26

CETaDL meeting room