SLIDE 1

Spoken Language and HCI Grand Challenge

Multiple-Level Models for Multi-Modal Interaction

Martin Russell¹, Antje S. Meyer², Stephen Cox³, Alan Wing²

¹School of Engineering, University of Birmingham
²School of Psychology, University of Birmingham
³School of Computing, University of East Anglia

SLIDE 2

Outline of talk

  • Motivation for multi-modal interaction
  • Multiple-level representations to explain variability
  • Multiple-level representations to integrate modalities
  • Issues in combining modalities
  • Example: speech and gaze
  • Proposed research
  • Conclusions
SLIDE 3

Motivation

  • Linguistic utterances rarely unambiguous, but communication succeeds
    – Shared world knowledge
    – Common discourse model
    – Speech augmented with eye-gaze and gesture

SLIDE 4

Psycholinguistic perspective

  • In psycholinguistic theories, the processes of retrieving and combining words are far better described than the processes of using world and discourse knowledge, eye gaze or gestures

SLIDE 5

Computational perspective

  • Automatic spoken language processing lacks the knowledge and theory to explain ambiguity
    – Assumes a direct relationship between word sequences and acoustic signals
    – Variability treated as noise
  • No established framework to accommodate complementary modalities

SLIDE 6

Challenges

  • Psycholinguistics needs:
    – Better understanding of how speakers and listeners use eye gaze and gesture to augment the speech signal
  • Computational spoken language processing needs:
    – Better treatment of variability in spoken language
    – Better frameworks for augmenting speech with other modalities
  • Both need fruitful interaction between psycholinguistics and computational spoken language processing

SLIDE 7

Example: acoustic variability

  • Sources of acoustic variability are not naturally characterised in the acoustic domain:
    – Speech dynamics
    – Individual speaker differences
    – Speaking styles
    – …

SLIDE 8

A model of acoustic variability

  • Introduce an intermediate, ‘articulatory’ layer
  • Speech dynamics modelled as a trajectory in this layer
  • Trajectory mapped into acoustic space
  • Probabilities calculated in acoustic space (a minimal numerical sketch follows after the figure)

[Figure: word sequence W generates a ‘modelled’ articulatory trajectory, which an articulatory-to-acoustic mapping converts into synthetic acoustic features for comparison with the acoustic signal]
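To make the layered idea concrete, here is a minimal sketch in Python/NumPy. It assumes a straight-line articulatory trajectory per segment, a fixed linear articulatory-to-acoustic mapping and isotropic Gaussian observation noise; the mapping matrix, dimensions and all numbers are illustrative assumptions, not the model presented in the talk.

import numpy as np

# Minimal sketch of the two-level idea on this slide: a smooth trajectory in a
# hidden 'articulatory' layer is mapped into acoustic space, and the likelihood
# of the observed acoustic frames is computed there.
# Assumptions (not from the slides): linear trajectories, a linear
# articulatory-to-acoustic mapping M, and isotropic Gaussian noise.

def articulatory_trajectory(start, end, n_frames):
    """Straight-line trajectory between two articulatory targets (one row per frame)."""
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    return (1.0 - t) * start + t * end

def log_likelihood(acoustic_obs, artic_traj, M, b, noise_var):
    """Gaussian log-likelihood of acoustic frames given the mapped trajectory."""
    synthetic = artic_traj @ M.T + b      # map trajectory into acoustic space
    resid = acoustic_obs - synthetic      # modelling error, measured in the acoustic domain
    n = acoustic_obs.size
    return -0.5 * (np.sum(resid ** 2) / noise_var + n * np.log(2.0 * np.pi * noise_var))

# Toy dimensions: 3-D articulatory layer, 12-D acoustic space, 20 frames.
rng = np.random.default_rng(0)
M = rng.normal(size=(12, 3))              # in practice this mapping would be learned
b = np.zeros(12)
traj = articulatory_trajectory(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.5, -1.0]), 20)
obs = traj @ M.T + b + 0.1 * rng.normal(size=(20, 12))
print(log_likelihood(obs, traj, M, b, noise_var=0.01))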

SLIDE 9

Combining modalities

  • Examples:
    – Lip-shape correlates with speech at the acoustic level…
    – … but this is not the case in general
    – Correlation between speech and eye-movement (when it exists) is likely to be at the conceptual level

SLIDE 10

Multiple-level models

  • Different levels of representation are needed:
    – To model causes of variability in speech
    – To capture the relationship between speech and other modalities
  • Candidate formalisms already exist (see the sketch below):
    – Graphical models
    – Bayesian networks
    – Layered HMMs
    – …
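As one simple illustration of the kind of formalism listed above, the sketch below uses a single hidden, concept-level state sequence that emits a speech symbol and a gaze symbol at each step, assumed conditionally independent given the state. All probabilities are toy values chosen for the example, not parameters from the talk.

import numpy as np

# Toy two-state model: one hidden 'concept-level' state sequence generates two
# observation streams (discrete speech and gaze symbols), assumed conditionally
# independent given the state. All numbers are illustrative.

A  = np.array([[0.9, 0.1],        # state transition probabilities
               [0.2, 0.8]])
pi = np.array([0.5, 0.5])         # initial state distribution
B_speech = np.array([[0.7, 0.3],  # P(speech symbol | state)
                     [0.2, 0.8]])
B_gaze   = np.array([[0.6, 0.4],  # P(gaze symbol | state)
                     [0.1, 0.9]])

def forward_loglik(speech_obs, gaze_obs):
    """Scaled forward algorithm over the fused emission P(speech, gaze | state)."""
    alpha = pi * B_speech[:, speech_obs[0]] * B_gaze[:, gaze_obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for s, g in zip(speech_obs[1:], gaze_obs[1:]):
        alpha = (alpha @ A) * B_speech[:, s] * B_gaze[:, g]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

print(forward_loglik([0, 0, 1, 1], [0, 1, 1, 1]))  # joint log-likelihood of the two streams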

SLIDE 11

Example: speech and gaze

SLIDE 12

Results from ‘map task’ experiment

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 13

Results from map task

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 14

Results from map task

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 15

Results from map task

[Figure: map-task gaze data for the instruction giver and the follower]

SLIDE 16

Object naming

[Figure (from ESRC project, Meyer & Wheeldon): timeline of object naming — gaze on the object accompanies planning to the phonological level; a time lag then precedes speech onset, during which phonetic and articulatory planning takes place, plus advance planning for the next object]

SLIDE 17

Object naming

SLIDE 18

Lessons from psychology

  • Gaze-to-speech lags

[Figure (panel a): speech-to-gaze lags (ms) plotted against repetition (1–16), for monosyllabic and disyllabic object names]

SLIDE 19

More lessons…

  • Gaze duration

[Figure (panel d): viewing times (ms) plotted against repetition (1–16)]

SLIDE 20

Speech and gaze

  • In general, a speaker who looks at an object might:
    a) Name the object
    b) Say something about the object
    c) Say something about a different topic altogether
    d) Say nothing at all
  • There will be a delay (200–300 ms for object naming) between finishing looking at an object and talking about it (see the sketch below)
  • The delay will be less if the object was discussed previously
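A minimal sketch of how such a lag could be exploited computationally: each naming onset is linked to the most recent fixation whose offset-to-onset lag falls inside a plausible window. The object names, times and the 150–600 ms window bounds below are illustrative assumptions; only the 200–300 ms figure comes from the slide.

# Toy fixation offsets and speech onsets (ms); in a real system these would
# come from an eye tracker and a speech recogniser.
fixations = [("cup", 1000), ("book", 2400), ("lamp", 4100)]        # (object, fixation_end_ms)
speech_onsets = [("cup", 1260), ("book", 2650), ("lamp", 4350)]    # (word, onset_ms)

MIN_LAG_MS, MAX_LAG_MS = 150, 600   # assumed search window around the 200-300 ms lag

def link_gaze_to_speech(fixations, speech_onsets):
    """For each speech onset, return the fixated object whose offset-to-onset lag
    falls inside the window, or None if no fixation qualifies."""
    links = []
    for word, onset in speech_onsets:
        candidates = [(obj, onset - end) for obj, end in fixations
                      if MIN_LAG_MS <= onset - end <= MAX_LAG_MS]
        best = min(candidates, key=lambda c: c[1])[0] if candidates else None
        links.append((word, best))
    return links

print(link_gaze_to_speech(fixations, speech_onsets))
# [('cup', 'cup'), ('book', 'book'), ('lamp', 'lamp')]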
SLIDE 21

Speech and gaze (continued)

  • Alternatively, gaze might provide an important cue for classifying the ‘state’ of a communication (e.g. a meeting), as in the sketch below:
    – Monologue (all eyes on one subject)
    – Discussion (eyes move between subjects)
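A minimal sketch of that second use of gaze, assuming each participant's current gaze target is already available from an eye tracker; the 0.8 concentration threshold and the sample window are illustrative assumptions, not values from the talk.

from collections import Counter

def classify_state(gaze_targets, threshold=0.8):
    """gaze_targets: (participant, target) samples over a short time window.
    Returns 'monologue' if most gaze converges on a single target, else 'discussion'."""
    counts = Counter(target for _, target in gaze_targets)
    top_share = counts.most_common(1)[0][1] / len(gaze_targets)
    return "monologue" if top_share >= threshold else "discussion"

window = [("p1", "speakerA"), ("p2", "speakerA"), ("p3", "speakerA"),
          ("p1", "speakerA"), ("p2", "speakerB")]
print(classify_state(window))   # 4/5 of samples on speakerA -> 'monologue'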

SLIDE 22

Proposed research

  • Goal: improved understanding of user goals and communication states through integration of speech, gaze and gesture
  • Integrated, multi-disciplinary project, involving psycholinguistics, speech and language processing, and mathematical modelling

SLIDE 23

Proposed research (1)

  • Experimental study of speech, gaze and gesture in referential communication and matching tasks, to determine:
    – How speakers’ and listeners’ gaze are coordinated spatially and in time
    – The functional significance of eye gaze and gesture information (by allowing or preventing mutual eye contact between the interlocutors)
    – The importance of temporal co-ordination of speaker and listener gaze

SLIDE 24

Proposed research (2)

  • Development of multiple-level computer models for integration of speech, gaze and gesture, for:
    – Improved understanding of user goals
    – Improved classification of communication states (meeting actions)

SLIDE 25

Summary

  • Speech in multi-modal interfaces
  • Multiple-level models for:
    – Characterising variability within a modality
    – Characterising relationships between modalities
  • Proposal for collaborative research in psycholinguistics and speech technology

SLIDE 26

CETaDL meeting room