SLIDE 1
Multi‐modal Sensing and Analysis of Poster Conversations Toward Smart Posterboard
Tatsuya Kawahara (Kyoto University, Japan) http://www.ar.media.kyoto‐u.ac.jp/crest/
Directions in Dialogue Research
(with engineering applications in mind)
- Speech‐only → Multi‐modal
- Dyadic → Multi‐party
- Human‐Machine Interface → Human‐Human Interaction
SLIDE 2
Human‐Machine Interface: constrained speech/dialogue
- task domain
- one sentence per turn
- clear articulation
Human‐Human Communication: natural speech/dialogue
- many sentences per turn
- backchannels
Project Overview
SLIDE 3
Problems
"Understanding" of human‐human speech communication
- Speaker Diarization
- Speech‐to‐Text (ASR)
- Dialogue Act (?)
- Comprehension level
- Interest level
Goal (Application Scenario)
Mining human interaction patterns
- A new indexing scheme for speech archives
– Review summary of Q&A
– Portions difficult for the audience to follow (for the presenter)
– Interesting spots (for third‐party viewers)
"People would be interested in what other people were interested in."
- A model of intelligent conversational agents (future topic)
SLIDE 4
From Content‐based Indexing to Interaction‐based Indexing
- Content‐based approach
– Try to understand & annotate the content of speech … ASR+NLP
– Actually hardly "understands"
- Interaction‐based approach
– Look into the reactions of listeners/audience, who do understand the content
– More oriented to the human cognitive process
From Content‐based Approach to Interaction‐based Approach
- Even if we do not understand the talk, we can find funny/important parts by observing the audience's laughing/nodding
- PageRank is determined by the number of links rather than by the content
SLIDE 5
System Overview
[Diagram: audio & video analysis feed speech recognition and interaction analysis; content analysis of the speech yields content‐based indexing, interaction analysis yields reaction‐based indexing, and both drive interactive presentation]
Multi‐modal Sensing & Analysis
[Diagram: signals (audio, video, motion) → behaviors (utterance, laughter, backchannel, nodding, gaze (head), pointing) → mental states (interest, courtesy, comprehension, attention)]
SLIDE 6
Methodology
- Gold‐standard: special devices worn by subjects
- Final system: distant microphones & cameras
- Milestones for high‐level annotation
– "Good reactions" → "attracted"
– Reactive tokens → interest level
– When & who asks questions → interest level
– Kind of questions → comprehension level
Multi‐modal Corpus of Poster Conversations
SLIDE 7
Why Poster Sessions?
- Norm in conferences & open‐houses
- Mixture of the characteristics of lectures and meetings
– One main speaker with a small audience
– Audience can ask questions/make comments at any time
– Real‐time feedback by the audience, including backchannels
– Standing & moving
- Controllable (knowledge/familiarity) and yet real
Multi‐modal Sensing Environment: IMADE room
- Audio: microphones mounted on the poster stand
- Video: cameras in the room
- Motion: accelerometers
- Eye‐gaze: eye‐tracking recorders
SLIDE 8
Multi‐modal Recording Setting
[Photos: video camera, motion‐capturing camera, distant microphone, microphone array]
[Photos: eye‐tracking recorder, accelerometer, motion‐capturing marker, wireless microphone]
SLIDE 9
Prototype of Smart Posterboard
- 65″ LCD screen + microphone array + cameras
- 19‐channel microphone array mounted on the LCD posterboard
- Pre‐amplifier & A/D converter
SLIDE 10
Corpus of Poster Conversations
- 31 sessions recorded; 4 used in this work
– One presenter (A) + an audience of two persons (B, C)
– Presentation of research, unfamiliar to the audience
– Each ~20 min.
- Annotation
– IPU, clause unit
– Fillers, backchannels (reactive tokens), laughter
- Non‐verbal behavior labels (almost automated!!)
– Nodding (non‐verbal backchannel) ← accelerometer (see the sketch below)
– Eye‐gaze (to the other person & the poster) ← eye‐tracking recorder
– Pointing (to the poster) ← motion capture
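The slides describe the nodding labels as almost automated from the accelerometer but do not specify the detection method. A minimal, purely illustrative sketch (the threshold and merging window are assumptions, not values from the slides):

```python
import numpy as np

def detect_nodding(accel, fs, threshold=0.5, min_gap=0.5):
    """Illustrative nodding detector from a head-worn accelerometer.

    accel: 1-D vertical-axis acceleration (gravity component removed)
    fs:    sampling rate in Hz
    Returns event times (sec.) where |acceleration| exceeds the
    threshold, merging hits closer together than min_gap seconds.
    """
    hit_times = np.flatnonzero(np.abs(accel) > threshold) / fs
    events = []
    for t in hit_times:
        if not events or t - events[-1] > min_gap:
            events.append(t)
    return events
```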
SLIDE 11
Detection of Interest Level with Reactive Tokens of Audience
[Diagram: multi‐modal sensing & analysis overview: signals (audio, video, motion) → behaviors → mental states]
SLIDE 12
Reactive Tokens of Audience
- Backchannels: short verbal responses made in real time
– Focus on non‐lexical kinds, e.g., "uh‐huh", "wow"
– Their syllabic & prosodic patterns change according to the state of mind [Ward 2004]
- → Audience's interest level
- → Interesting spots ("hot spots") in the session
Prosodic Features
- Duration
- F0 (maximum, range)
- Power (maximum)
- Normalized for each person (as in the sketch below)
– For each feature, compute the per‐person mean
– The mean is subtracted from the feature values
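A minimal sketch of this per‐person normalization, assuming each reactive token already carries a speaker label and raw measurements of the four features:

```python
import numpy as np

def normalize_per_person(features, speaker_ids):
    """Subtract each speaker's own mean from his/her feature values.

    features:    (n_tokens, n_features) array, e.g. columns =
                 [duration, F0 max, F0 range, max power]
    speaker_ids: length n_tokens, one speaker label per token
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        # subtract this person's mean from each of their feature values
        normalized[mask] = features[mask] - features[mask].mean(axis=0)
    return normalized
```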
SLIDE 13
Variation (SD) of Prosodic Features
- Tokens used for assessment have a large variation

Token | Count | Duration SD (sec.) | F0 max SD (Hz) | F0 range SD (Hz) | Power SD (dB)
[Non‐lexical & used for assessment]
ふーん (hu:N) | 114 | 0.44 | 22 | 38 | 4.3
へー (he:) | 78 | 0.54 | 34 | 41 | 5.4
あー (a:) | 59 | 0.37 | 35 | 39 | 6.4
はあ (ha:) | 55 | 0.24 | 35 | 36 | 6.3
ああ (aa) | 23 | 0.17 | 30 | 38 | 6.3
はー (ha:) | 21 | 0.65 | 32 | 30 | 4.8
[Lexical & used for acknowledgment]
うーん (u:N) | 544 | 0.27 | 27 | 35 | 4.6
うん (uN) | 356 | 0.15 | 25 | 30 | 4.9
はい (hai) | 188 | 0.19 | 28 | 24 | 5.8
ふん (huN) | 166 | 0.31 | 25 | 21 | 4.1
ええ (ee) | 38 | 0.1 | 31 | 37 | 5.5
Relationship with Interest Level (Subjective Evaluation)
- For each token (syllable pattern) and for each prosodic feature,
– pick up the top‐10 & bottom‐10 samples (largest & smallest values of the feature; see the sketch below)
- The audio file is segmented to cover the reactive token and its preceding clause
- Five subjects listen and evaluate the audience's state of mind
– 12 items, each evaluated on a 4‐point scale
– two for interest: 興味 (interest), 関心 (concern)
– two for surprise: 驚き (surprise), 意外 (unexpectedness)
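The top‐10/bottom‐10 sampling above is straightforward to express in code; a sketch, assuming the values of one prosodic feature for one token type are collected in a 1‐D array:

```python
import numpy as np

def extreme_samples(values, k=10):
    """Indices of the k largest and k smallest values of one prosodic
    feature, used to select stimuli for the subjective evaluation."""
    order = np.argsort(values)           # ascending order
    return order[-k:][::-1], order[:k]   # (top-k, bottom-k)
```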
SLIDE 14
Relationship with Interest Level (Subjective Evaluation)
There are particular combinations of syllabic & prosodic patterns which express interest & surprise (○: significant, p<0.05)

Reactive token | Prosody | Interest | Surprise
へー (he:) | duration | ○ | ○
へー (he:) | F0 max | ○ | ○
へー (he:) | F0 range | ○ | ○
へー (he:) | power | ○ | ○
あー (a:) | duration | | 
あー (a:) | F0 max | ○ | 
あー (a:) | F0 range | | 
あー (a:) | power | ○ | 
ふーん (fu:N) | duration | ○ | ○
ふーん (fu:N) | F0 max | | 
ふーん (fu:N) | F0 range | | 
ふーん (fu:N) | power | | 
Podspotter: Conversation Browser based on Audience's Reaction
- "Funny Spot" ← laughter
- "Interesting Spot" ← reactive tokens
- Demo
SLIDE 15
Third‐party Evaluation of Hot Spots
- Four subjects who had neither attended the presentation nor listened to the content
- Listen to the sequence of utterances (max. 20 sec.) which induced the laughter and/or reactive tokens
- Evaluate the spots
– Is a "Funny Spot" really funny?
– Is an "Interesting Spot" really interesting?
Third‐party Evaluation of Hot Spots
- "Funny Spot" ← laughter?
– Only half are funny; 35% are NOT funny
– Feeling funny largely depends on the person
– Laughter was often made to relax the audience
- "Interesting Spot" ← reactive tokens?
– Over 90% are interesting and useful for the subjects
SLIDE 16
Conclusions
- Non‐lexical reactive tokens with prominent prosody indicate the interest level.
- The spots detected based on reactive tokens are interesting for third‐party viewers.
- Laughter does not necessarily mean "funny".
Prediction of Turn‐taking with Eye‐gaze and Backchannel
SLIDE 17
Multi‐modal Sensing & Analysis
[Diagram: multi‐modal sensing & analysis overview: signals (audio, video, motion) → behaviors → mental states]
Prediction of Turn‐taking by Audience
- Questions & comments → interest level
– The audience asks more & better questions when more attracted
- Automated control to beamform microphones & steer cameras
– before someone in the audience actually speaks
- Intelligent conversational agent handling multiple partners
– wait for someone to speak OR continue to speak
SLIDE 18
Prediction of Turn‐taking by Audience
- When the turn is taken by (someone in) the audience
– Detection problem (→ recall & precision)
– Infrequent (~10%) compared with turn‐holding by the presenter
– More important and informative than the presenter's utterances
– Features: prosody of the presenter's utterance; backchannels, including nodding, of the audience; eye‐gaze information
- Who (in the audience) takes the turn
– Classification problem (→ accuracy)
– Using eye‐gaze & backchannel information
Relationship between Turn‐taking and Eye‐gaze
- Who takes the turn is correlated with eye‐gaze
- When turn‐yielding happens is correlated with gaze
[Figure: for each gaze target (presenter A, audience B/C), the distribution of turn outcomes (held by A, taken by B, taken by C)]
SLIDE 19
Relationship between Turn‐taking and Eye‐gaze: Duration (sec.)

Gaze | turn held by presenter | turn taken by B | turn taken by C
A gazed at B | 0.220 | 0.589 | 0.299
A gazed at C | 0.387 | 0.391 | 0.791
B gazed at A | 0.161 | 0.205 | 0.078
C gazed at A | 0.308 | 0.215 | 0.355
- The presenter gazed significantly longer at the person before yielding the turn to him/her
- No significant difference in eye‐gaze by the audience
Joint Eye‐gaze Event
[Diagram: four joint eye‐gaze events between presenter A and audience members B/C, coded by the presenter's gaze target (I: person, P: poster) and the audience member's gaze target (i: presenter, p: poster): Ii (mutual gaze), Ip, Pi, Pp (joint attention on the poster)]
SLIDE 20
Relationship between Turn‐taking and Joint Eye‐gaze Events

Event | turn held by presenter | turn taken by audience (self) | turn taken by audience (other)
Ii | 125 | 17 | 3
Ip | 320 | 71 | 26
Pi | 190 | 11 | 9
Pp | 2974 | 147 | 145
- Ii (mutual gaze) & Pi are not so frequent
- Ip: the presenter gazes at the person before giving the turn (a labeling sketch follows below)
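Assuming the event codes combine the presenter's gaze target (I: person, P: poster) with the audience member's (i: presenter, p: poster), as the diagram above suggests, labeling an event reduces to forming a two‐character code:

```python
def joint_gaze_event(presenter_gaze, audience_gaze):
    """Label a joint eye-gaze event between the presenter and one
    audience member. Ii = mutual gaze, Pp = joint attention on poster.

    presenter_gaze: 'person' or 'poster' (presenter's gaze target)
    audience_gaze:  'presenter' or 'poster' (audience member's target)
    """
    first = 'I' if presenter_gaze == 'person' else 'P'
    second = 'i' if audience_gaze == 'presenter' else 'p'
    return first + second
```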
Relationship between Turn‐taking and Backchannel (+ Eye‐gaze)
[Figure: counts of verbal backchannels and non‐verbal noddings per joint eye‐gaze event (Ii, Ip, Pi, Pp, total), separately for turn‐takers and non‐turn‐takers]
SLIDE 21
Features for Prediction of Turn‐taking (when & who)
- Prosodic features of the presenter's utterance
– F0 (mean, max, min), power (mean, max)
– Normalized for each speaker
- Backchannel features of the audience
– Verbal backchannels, non‐verbal nodding
- Eye‐gaze features
– Object: poster (P, p) or person (I, i)
– Joint eye‐gaze event: Ii, Ip, Pi, Pp
– Duration of the above (see the sketch after this list)
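The slides do not name the classifier used for the detection task. As a stand‐in, the sketch below assembles the three feature groups into one vector per presenter utterance and trains a logistic‐regression detector on synthetic placeholder data; the feature layout and classifier are assumptions, and the evaluation mirrors the recall/precision/F‐measure reported in the next table:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)

# One row per presenter utterance (placeholder data). Columns:
# [F0 mean, F0 max, F0 min, power mean, power max,   # prosody
#  verbal backchannels, noddings,                    # backchannel
#  Ii, Ip, Pi, Pp event flags, gaze duration]        # eye-gaze
X = rng.random((200, 12))
y = (rng.random(200) < 0.1).astype(int)  # turn taken by audience (~10%)

# class_weight='balanced' compensates for the infrequent positive class
clf = LogisticRegression(class_weight='balanced').fit(X, y)
pred = clf.predict(X)
p, r, f, _ = precision_recall_fscore_support(y, pred, average='binary',
                                             zero_division=0)
print(f"recall={r:.3f} precision={p:.3f} F-measure={f:.3f}")
```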
Prediction of Speaker Change (when the turn is taken)

Feature | Recall | Precision | F‐measure
Prosody | 0.667 | 0.178 | 0.280
Backchannel (BC) | 0.459 | 0.113 | 0.179
Eye‐gaze (gaze) | 0.461 | 0.216 | 0.290
Prosody + BC | 0.668 | 0.165 | 0.263
Prosody + gaze | 0.706 | 0.209 | 0.319
Prosody + BC + gaze | 0.678 | 0.189 | 0.294
- The prosody of the presenter and eye‐gaze are useful, while the backchannels of the audience are not.
SLIDE 22
Prediction of Next Speaker (who takes the turn)

Feature | Accuracy
Backchannel | 52.6%
Eye‐gaze object/event | 55.8%
Eye‐gaze object/event + duration | 66.4%
Combination of all of the above | 69.7%
- Eye‐gaze and backchannel are useful; eye‐gaze duration is the most effective.
Conclusions
- Eye‐gaze events and backchannels suggest who will ask questions/make comments.
– Interest level of the audience (?)
- Actual turn‐taking by the audience happens when the presenter gazes at the person.
– The presenter controls the turn‐taking
– Eye‐gaze and backchannels may trigger this by attracting the presenter's attention (?)
SLIDE 23
Relationship between Audience’s Feedback Behaviors and Question Type
Multi‐modal Sensing & Analysis
[Diagram: multi‐modal sensing & analysis overview: signals (audio, video, motion) → behaviors → mental states]
SLIDE 24
Prediction of the Kind of Questions asked by Audience
- Questions → comprehension & interest level
- Confirming questions
– Make sure of one's understanding of the explanation
– Can be answered simply by "YES/NO"
- Substantive questions
– Ask about what was not explained
– Cannot be answered by "YES/NO" only; extra explanation is needed
Relationship between Question Type and Backchannel
Frequency (per sec.) in the preceding explanation segment

Verbal backchannels | Confirming | Substantive
Turn‐taker | 0.034 | 0.063
Non‐turn‐taker | 0.041 | 0.038

Non‐verbal noddings | Confirming | Substantive
Turn‐taker | 0.111 | 0.127
Non‐turn‐taker | 0.109 | 0.132
SLIDE 25
Relationship between Question Type and Joint Eye‐gaze Event
Frequency (ratio) in the preceding explanation segment

Event | Confirming | Substantive
Ii | 0.053 | 0.015
Ip | 0.116 | 0.081
Pi | 0.060 | 0.035
Pp | 0.657 | 0.818
Conclusions
- The audience makes more verbal backchannels before asking substantive questions, while focusing on the poster.
– Confident in understanding & showing interest (?)
- The majority of turn‐taking signaled by the presenter's gazing is attributed to confirming questions.
– Grounding of understanding (?)
SLIDE 26
Smart Posterboard System
[Diagram: multi‐modal sensing & analysis overview: signals (audio, video, motion) → behaviors → mental states]
SLIDE 27
[Photo: 65″ LCD posterboard with 19‐ch. microphone array, cameras, and Kinect]
Smart Posterboard Demonstration Overview
- Offline diarization & browser with 19‐channel microphone array & 6 cameras
– Speech separation & enhancement: BSSA (Blind Spatial Subtraction Array)
– Voice activity detection
– Speaker localization (→ video; a localization sketch follows below)
– Speaker diarization
– Gaze (head direction) detection (→ video)
- Online tracking using Kinect
– Speaker localization & gaze (head direction) detection
– Speech enhancement
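The slides do not specify the speaker‐localization algorithm. GCC‐PHAT time‐difference‐of‐arrival estimation between microphone pairs is one common basis for microphone‐array localization, sketched here as an assumption:

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the time difference of arrival (TDOA) between two
    microphone signals with the phase transform (PHAT) weighting."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12   # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # TDOA in seconds
```

With a known array geometry, TDOAs from several microphone pairs can then be intersected to estimate the speaker's direction.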
SLIDE 28
Speech Separation & Enhancement: Blind Spatial Subtraction Array (BSSA)
[Diagram: the speech+noise input feeds two paths. Main path: delay‐and‐sum (DS) beamforming → estimated speech. Reference path: ICA with projection back (PB) and DS → estimated noise (separation of speech and noise). Post‐processing subtracts the estimated noise from the main‐path output via spectral subtraction (SS) or Wiener filtering (WF), followed by gain normalization, to yield the enhanced speech.]
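A minimal sketch of the SS post‐processing stage, operating on STFT frames of the main‐path output and the ICA‐estimated noise; the over‐subtraction factor and spectral floor are assumed parameters, not values from the slides:

```python
import numpy as np

def spectral_subtraction(Y, N_hat, beta=1.0, floor=0.05):
    """Subtract the estimated noise power spectrum from the
    delay-and-sum output and re-apply the noisy phase.

    Y, N_hat: complex STFTs (freq x frames) of the main-path output
              and the reference-path noise estimate
    """
    power = np.abs(Y) ** 2 - beta * np.abs(N_hat) ** 2
    power = np.maximum(power, floor * np.abs(Y) ** 2)  # spectral floor
    return np.sqrt(power) * np.exp(1j * np.angle(Y))
```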
Application Scenario
- Poster session archiving + browser (Demo)
– Interaction analysis
– Visualization and mining
- Review Q&A afterwards
- Extract segments people find interesting or difficult to understand
- Automated presentation system
– Switch slides according to the interest and knowledge level of the audience
– Answer questions
SLIDE 29
Staff who contributed to this demo
- Kyoto University
– Tony Tung, Hiromasa Yoshimoto, Randy Gomez, Soichiro Hayashi, Yuya Akita, Tatsuya Kawahara
- Nara Institute of Science & Technology
– Kodai Okamoto, Yuji Onuma, Noriyoshi Kamado, Ryoichi Miyazaki, Hiroshi Saruwatari