1
Generating output in the COMIC multimodal dialogue system
Mary Ellen Foster
School of Informatics, University of Edinburgh
W3C MMI Workshop, Sophia Antipolis, 20 July 2004
2
Overview
- The COMIC project and demonstrator
- Planning and generating output in COMIC:
  - Multimodal fission in COMIC
  - Planning text, gestures, and facial expressions
  - Speech synthesis and output coordination
- System evaluation
- Next steps for fission
3
COMIC: “COnversational Multimodal Interaction with Computers”
- EU FP5 project: March 2002 - February 2005
- Goal: apply results and models from cognitive psychology to multimodal dialogue
- Demonstrator: adds a multimodal dialogue interface to a CAD-like system for bathroom design
  1. Specify shape of bathroom
  (2. Place furniture)
  ➢ 3. Browse available tiles
4
Input processing and dialogue management
- Speech recognition and NLP
- Handwriting and (pen-)gesture recognition
- Multimodal fusion
- Dialogue manager
- Dialogue history manager, ontology manager
5
Fission and output processing
➢ Fission module (presentation planner)
- Speech synthesis (Edinburgh):
  - Surface realiser: OpenCCG (White, 2004)
  - Speech synthesiser: Festival, unit selection
- "Talking head" avatar
- Bathroom-design application
6
Sample interaction (browsing tiles)
COMIC: [Introduction] ... "Are you ready?"
User: "Yes."
COMIC: [Describes tiles on screen] ... "Please choose one."
User: "Show me this one." [Circles second design]
COMIC: [Chooses and describes tiles] ... "Do you want to see more modern designs?"
... etc. ...
7
Fission inputs and outputs
[Diagram] The fission module connects the dialogue manager to the output channels:
- From the dialogue manager: dialogue acts
- To the realiser and synthesiser: logical forms, canned text
- To the avatar: phonemes, emphasis commands, expressions, gaze directions
- To the ViSoft application (application commands): phase switches, tile-set choices, pointer commands
8
Fission tasks
- Content selection and structuring: elaborate the high-level specification from the dialogue manager
- Modality selection: decide on the content to be produced on each channel
- Output coordination: ensure the output is coordinated temporally and spatially
9
Sample output plan
DAM input: show(tileset21), describe(tileset21)
Node types: sequence, immediate command, sentence
Turn (sequence)
  - Choose tile set (immediate command)
  - Acknowledge (sentence): [nod] "Okay."
  - Describe tile set (sequence of sentences): "This design is classic." "It uses tiles from ..." [...]
10
Creating and executing an output plan
- Create the initial high-level structure based on the DAM specification
- Elaborate and then output children in order
- Planning and execution are interleaved: later children are prepared while output is being produced from earlier ones (sketched below)
- Goal: avoid adding to the (already non-trivial) latency
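A minimal Python sketch of this interleaved strategy (the node type and operations are invented for illustration, not COMIC's API): one thread elaborates children one step ahead while the main loop plays the already-elaborated output.

import threading
from queue import Queue

class PlanNode:
    """One child of the output plan; elaboration may be expensive
    (e.g. it may involve the realiser and synthesiser)."""
    def __init__(self, spec):
        self.spec = spec

    def elaborate(self):
        # Hypothetical: expand the specification into playable output.
        return f"output for {self.spec}"

def execute_plan(children):
    """Interleave planning and execution: elaborate child i+1 while
    child i is being played, hiding elaboration latency."""
    ready = Queue(maxsize=1)  # stay at most one child ahead

    def elaborator():
        for child in children:
            ready.put(child.elaborate())
        ready.put(None)  # sentinel: no more children

    threading.Thread(target=elaborator, daemon=True).start()
    while (item := ready.get()) is not None:
        play(item)

def play(item):
    # Stand-in for sending output to the avatar/application and
    # waiting for it to finish.
    print("playing:", item)

execute_plan([PlanNode("choose-tileset"), PlanNode("acknowledge"),
              PlanNode("describe-tileset")])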
11
Text planning with XSLT (non-canned text)
- Gather information from the system ontology; filter it based on the dialogue history; put it in order
- Combine adjacent messages when possible
- Create a logical form (with alternatives) for each message and send it to the realiser
Details: M. E. Foster and M. White. Techniques for text planning with XSLT. NLPXML-4 Workshop, 25 July 2004, Barcelona.
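The paper above gives the full pipeline; purely as a sketch of driving one XSLT step from Python (the stylesheet and parameter names here are hypothetical), lxml can apply a text-planning transform to a message document:

from lxml import etree

# Hypothetical files: a stylesheet implementing one planning step and
# an input document of messages gathered from the ontology.
transform = etree.XSLT(etree.parse("combine-messages.xsl"))
messages = etree.parse("messages.xml")

# Stylesheet parameters (e.g. a dialogue-history filter) are passed
# as quoted XPath strings.
logical_forms = transform(messages,
                          history=etree.XSLT.strparam("tileset21"))
print(str(logical_forms))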
12
Speech synthesis
- Voice: general-purpose unit selection, with in-domain recording scripts
- Realiser output includes intonation, but the current voice can't support it (stay tuned!)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE apml SYSTEM "apml.dtd">
<apml>
  <performative>
    <emphasis x-pitchaccent="Hstar">This</emphasis>
    <emphasis x-pitchaccent="Hstar">design</emphasis>
    is
    <emphasis x-pitchaccent="Hstar">classic</emphasis>
    <boundary type="LL"/> .
  </performative>
</apml>
13
Speech timing
- Speech timing determines presentation timing
- Coordination is achieved by adding labelled spans to the input of the speech module

Input:
<seg id="123">
  <speech>
    Hello <span label="ww"> world </span> .
  </speech>
</seg>
Output:
<speech id="123">
  <words>
    <word id="w0" start="0.018750" end="0.334000" content="Hello">
      <phoneme id="p0" start="0.018750" end="0.101750" content="h"/>
      <phoneme id="p1" start="0.101750" end="0.114000" content="@"/>
      <phoneme id="p2" start="0.114000" end="0.194563" content="l"/>
      <phoneme id="p3" start="0.194563" end="0.334000" content="ou"/>
    </word>
    <word id="w1" start="0.334000" end="0.819688" content="world">
      <phoneme id="p4" start="0.334000" end="0.445750" content="w"/>
      <phoneme id="p5" start="0.445750" end="0.511813" content="@@r"/>
      <phoneme id="p6" start="0.511813" end="0.577188" content="r"/>
      <phoneme id="p7" start="0.577188" end="0.730187" content="l"/>
      <phoneme id="p8" start="0.730187" end="0.819688" content="d"/>
    </word>
  </words>
  <spans>
    <span type="labelled" info="ww" start="w1" end="w1"/>
  </spans>
</speech>
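A minimal sketch (not the COMIC code) of consuming this timing output: each labelled span is resolved, via the word ids it refers to, to concrete start/end times that other channels can be scheduled against.

import xml.etree.ElementTree as ET

# Abbreviated timing output (phoneme elements omitted).
TIMING = """
<speech id="123">
  <words>
    <word id="w0" start="0.018750" end="0.334000" content="Hello"/>
    <word id="w1" start="0.334000" end="0.819688" content="world"/>
  </words>
  <spans>
    <span type="labelled" info="ww" start="w1" end="w1"/>
  </spans>
</speech>
"""

def span_times(xml_text):
    """Map each labelled span to (start, end) seconds, resolved
    through the word ids in its start/end attributes."""
    root = ET.fromstring(xml_text)
    words = {w.get("id"): w for w in root.iter("word")}
    return {
        span.get("info"): (float(words[span.get("start")].get("start")),
                           float(words[span.get("end")].get("end")))
        for span in root.iter("span")
    }

print(span_times(TIMING))
# {'ww': (0.334, 0.819688)} - output labelled "ww" should run while
# the word "world" is being spoken.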
14
Planning pointer “gestures”
- Mark NPs in the input with on-screen referents, and choose gestures and offsets for some subset of them
- Use the application screen state to find the objects
- Two versions: rule-based or corpus-based
- Evaluation (just completed): forced choice between the two versions; justify the choice where possible
Details: M. E. Foster. Corpus-based planning of deictic gestures in COMIC. INLG-04 (Student Session), Brockenhurst, 14-16 July 2004.
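The actual rules and corpus models are in the paper above; purely to illustrate the shape of a rule-based variant (the rules and data structures here are invented, not COMIC's), such a planner takes NPs marked with referents and points only when the referent is on screen and not already salient:

def plan_pointing_gestures(nps, screen_state, recently_pointed):
    """Illustrative rule-based deictic-gesture planner (invented
    rules): point at an NP's on-screen referent when it is visible
    and was not pointed at recently."""
    gestures = []
    for np in nps:
        referent = np.get("referent")
        if referent in screen_state and referent not in recently_pointed:
            gestures.append({"np": np["text"],
                             "point_at": screen_state[referent]})
    return gestures

screen = {"tileset21": (420, 310)}  # referent -> screen coordinates
nps = [{"text": "this design", "referent": "tileset21"},
       {"text": "the colours", "referent": None}]
print(plan_pointing_gestures(nps, screen, recently_pointed=set()))
# [{'np': 'this design', 'point_at': (420, 310)}]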
15
Facial expressions, gaze, and emphasis
- Expressions and gaze: only between sentences
- Phonemes: extracted from the speech-synthesiser timing
- Emphasis commands: based on pitch accents (sketched below)
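A minimal sketch of the last point, assuming the realiser marks accented words (as in the APML example on slide 12) and the synthesiser returns word timings (slide 13); the command format and function are invented:

def emphasis_commands(accented_words, word_times):
    """Turn pitch-accented words into timed avatar emphasis commands.
    accented_words: words the realiser marked with a pitch accent;
    word_times: word -> (start, end) seconds from the synthesiser."""
    return [{"cmd": "emphasis",
             "at": word_times[w][0], "until": word_times[w][1]}
            for w in accented_words if w in word_times]

times = {"This": (0.0, 0.2), "design": (0.2, 0.55),
         "is": (0.55, 0.65), "classic": (0.65, 1.1)}
print(emphasis_commands(["This", "design", "classic"], times))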
16
Output sequencing and coordination
- Sequences: traverse the subtree in order, waiting for any nodes that are not ready yet
- Immediate commands (expressions, gaze, screen-state changes): send the command, wait for a "done" report
- Sentences:
  - Send the text to the synthesiser (canned or via the realiser)
  - Send the timing to the avatar; prepare gestures
  - Send "go at time t" plus a concrete gesture schedule
(A sketch of this dispatch follows below.)
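A sketch of how such an executor might dispatch on node type; the channel API, message formats, and prepare_gestures stub are all invented for illustration, not COMIC's implementation:

import time

def prepare_gestures(node, timing):
    """Hypothetical stub: align the node's planned gestures with
    the word timings returned by the synthesiser."""
    return []

def execute(node, channels):
    """Illustrative executor for the three node types above."""
    if node["type"] == "sequence":
        # Traverse children in order, blocking until each is ready
        # (node["ready"] is assumed to be a threading.Event set by
        # the elaboration thread).
        for child in node["children"]:
            child["ready"].wait()
            execute(child, channels)
    elif node["type"] == "immediate":
        # Expressions, gaze, screen-state changes: fire, await "done".
        ch = channels[node["target"]]
        ch.send(node["command"])
        ch.wait_done()
    elif node["type"] == "sentence":
        # 1. Synthesise the text; 2. ship timing to the avatar and
        # build the gesture schedule; 3. start every channel at a
        # common wall-clock time t.
        timing = channels["synth"].synthesise(node["text"])
        channels["avatar"].send_timing(timing)
        schedule = prepare_gestures(node, timing)
        go_at = time.time() + 0.1  # small lead time
        for name in ("synth", "avatar", "app"):
            channels[name].send({"go_at": go_at, "gestures": schedule})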
17
System evaluation
- Subjects use the system for 15-20 minutes
- Conditions: full face or "zombie"
- Measures:
  - Recall of the information presented (task success)
  - Subjective user-satisfaction questionnaire
  - Objective measures from the log files
- Just completed (37 subjects); no results yet
- Evaluation of the room-drawing phase is pending
18
Next steps for fission
- Incorporate ideas from centering theory into text planning (Kibble & Power, 2000; Karamanis, 2003)
- Refer to a user model throughout the generation process (Moore et al., 2004)
- Holy grail: instance-based multimodal generation
  - Gather good instances by having users rate various combinations (as in the current gesture evaluation)
  - Use (upcoming) factored language models in OpenCCG to choose among cross-modal alternatives
19
W3C standards
Currently in use:
- XSLT, XPath: for text planning (see the NLPXML paper), plus many other stylesheets used internally
Possible additions:
- SMIL: not for serialisation; possibly for internal data structures
- SSML: if the synthesiser supports it
- EMMA for output? Find out more (EMMA for input? can't comment)
20