
Dialog Management and System Evaluation
EE596B/LING580K -- Conversational Artificial Intelligence
Hao Fang, University of Washington, 4/17/2018
Slides adapted from:
• Andrew Maas, Spring 2017, CS224S/LING285 Spoken Language Processing (Lectures 10 & 11)
• Gina-Anne Levow, Spring 2017, LING 575 Spoken Dialog Systems (Lectures 4 & 5)

Content Management

Dialog Manager
• Takes input from ASR/NLU components
• Communicates with backend databases & services
• Determines what the system does next (the dialog policy)
• Passes output to NLG/TTS modules
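The four responsibilities above can be sketched as a single `step` method. This is a minimal illustration, not any particular system's design: `NLUResult`, `SystemAction`, the `find_flight` intent, and the `backend` callable are all hypothetical names standing in for real NLU output, a real backend, and a real NLG interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NLUResult:
    """Hypothetical NLU output: an intent name plus extracted slot values."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


@dataclass
class SystemAction:
    """Hypothetical abstract action that would be handed to NLG/TTS."""
    act: str
    args: Dict[str, str] = field(default_factory=dict)


class DialogManager:
    def __init__(self, backend: Callable[[Dict[str, str]], List[str]]) -> None:
        self.backend = backend            # stands in for a database or web service
        self.state: Dict[str, str] = {}   # slots collected so far

    def step(self, nlu: NLUResult) -> SystemAction:
        # Input from ASR/NLU: merge any new slot values into the dialog state.
        self.state.update(nlu.slots)
        # Policy: decide what the system does next.
        if nlu.intent == "find_flight":
            if "dest" not in self.state:
                return SystemAction("request", {"slot": "dest"})
            # Backend: look up results for the collected constraints.
            flights = self.backend(self.state)
            return SystemAction("inform", {"flights": ", ".join(flights)})
        # Output: anything the policy cannot handle becomes a clarification act.
        return SystemAction("clarify")
```

The returned `SystemAction` is deliberately abstract; turning it into words is left to the NLG module.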

Dialog Policy

Dialog Policy
• Dialog Structure
• Dialog Initiative
• Conversational Grounding

Turn-taking
• Dialog is characterized by turn-taking.

Dialog Structure vs. Storytelling in Games
• Linear storytelling
  • A fixed chronological order
Figures from: https://www.gamecareerguide.com/features/882/nonlinear_narrative_in_games_.php?print=1

Dialog Structure vs. Storytelling in Games
• Nonlinear storytelling
  • Explore the world in any order
Figures from: https://www.gamecareerguide.com/features/882/nonlinear_narrative_in_games_.php?print=1

Dialog Structure vs. Storytelling in Games
• Other non-linear structures
Figures from: https://www.gamecareerguide.com/features/882/nonlinear_narrative_in_games_.php?print=1

Dialog Structure
• Three-act structure: Beginning, Middle, End

Dialog Structure
• Three-act structure
• Dialog Macrogame Theory (Mann 2002)
  • http://www-bcf.usc.edu/~billmann/dialogue/dtsite.htm
  • dialog as a sequence of games
  • 6 game acts
  • 15 frequently occurring games
[Figure: start and end of Game A. Bid of Start of Game A, answered by Accept Bid of Start or Reject Bid of Start; Bid of End of Game A, answered by Accept Bid of End or Reject Bid of End.]
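Reading the figure as six game acts (a bid to start or end a game, each accepted or rejected), a dialog can be replayed as a sequence of games. A minimal sketch under stated assumptions: the transition table is a simplification of the theory, and `game_a`/`game_b` are hypothetical game labels, not Mann's 15 named games.

```python
from enum import Enum
from typing import Dict, List, Tuple


class GameAct(Enum):
    # A bid to start or end a game, and the acceptance or rejection of that bid.
    BID_START = "bid of start"
    ACCEPT_START = "accept bid of start"
    REJECT_START = "reject bid of start"
    BID_END = "bid of end"
    ACCEPT_END = "accept bid of end"
    REJECT_END = "reject bid of end"


# (phase, act) -> next phase. An assumed simplification of the theory.
TRANSITIONS: Dict[Tuple[str, GameAct], str] = {
    ("idle", GameAct.BID_START): "start_bid",
    ("start_bid", GameAct.ACCEPT_START): "active",
    ("start_bid", GameAct.REJECT_START): "idle",
    ("active", GameAct.BID_END): "end_bid",
    ("end_bid", GameAct.ACCEPT_END): "idle",
    ("end_bid", GameAct.REJECT_END): "active",
}


def completed_games(moves: List[Tuple[str, GameAct]]) -> List[str]:
    """Replay (game, act) moves and return the games that were completed, in order.

    A game is completed when a bid to end it is accepted; an act that does
    not fit the game's current phase raises ValueError.
    """
    phase: Dict[str, str] = {}
    done: List[str] = []
    for game, act in moves:
        current = phase.get(game, "idle")
        nxt = TRANSITIONS.get((current, act))
        if nxt is None:
            raise ValueError(f"{act.value!r} is not allowed while {game!r} is {current}")
        phase[game] = nxt
        if act is GameAct.ACCEPT_END:
            done.append(game)
    return done
```

A rejected start bid simply returns the game to `idle`, so the dialog can move on to a different game.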

Dialog Structure
• Three-act structure
• Dialog Macrogame Theory (Mann 2002)
• Sounding Board (Fang et al. 2018)
  • social chat as a sequence of sub-dialogs
  • 3 stages
  • 10 coarse-grained actions
[Figure: sub-dialog stages and actions. Negotiation: Propose, Accept, Reject. Execution: Continue, Skip, Pause. Termination: Propose, Accept, Reject. Also: Backoff.]

Sub-dialog Cycle

Dialog Initiative
• Initiative: who has control of the conversation
System Initiative
• User knows what they can say
• System knows what user can say
• Simple to build
• OK for VERY simple tasks
  • Examples: entering a credit card, login name and password
User Initiative
• System is reactive but not proactive
• User knows what system can do
  • Examples: question answering, voice web search
• System doesn’t
  • ask questions back
  • engage in clarification dialog
  • engage in confirmation dialog

Mixed Initiative
• Normal human-human dialog
  • initiative shifts back and forth between participants
• Mix of control based on prompt type
  • Open prompt: “How may I help you?”
    • open-ended, user can respond in any way
  • Directive prompt: “Say yes to accept call, or no otherwise”
    • stipulates user response type
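The two prompt types differ in how the system may interpret the next reply. A minimal sketch, assuming replies arrive as plain text; `interpret` is a hypothetical helper, and the exact-match check stands in for real NLU.

```python
from typing import Optional, Set


def interpret(reply: str, allowed: Optional[Set[str]]) -> Optional[str]:
    """Interpret a user reply under a given prompt type.

    allowed=None models an open prompt: any reply is passed on for NLU.
    A set models a directive prompt: only the stipulated responses are
    accepted; anything else returns None, meaning the system must re-prompt.
    """
    text = reply.strip().lower()
    if allowed is None:
        return text
    return text if text in allowed else None


OPEN_PROMPT = ("How may I help you?", None)
DIRECTIVE_PROMPT = ("Say yes to accept call, or no otherwise", {"yes", "no"})
```

The directive prompt trades user freedom for a tiny, easily recognized response space, which is why it suits system-initiative turns.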

Conversational Grounding
• Communication is presumed to be joint & collaborative
  • speaker & hearer mutually believe the same thing
• Speaker tries to establish and add to
  • common ground
  • mutual belief
• Hearer must ground speaker’s utterances
  • indicate heard and understood
• Principle of Closure (Clark 1996; Norman 1988)
  • agents performing an action require evidence that they have succeeded in performing it

Principle of Closure
• Non-speech closure example
  • push elevator button
  • light turns on
• Grounding in HCI
  • Users are confused if the system fails to ground (Stifelman et al., 1993; Yankelovich et al., 1995)

A Human-Human Conversation

Sounding Board Conversation
• Indicate ASR/NLU errors
  • User: “What’s your opinion?”
  • Sounding Board: “I heard you asked: what’s your peanut? I’m not sure I know the answer…”
• Acknowledge user reaction
  • User: “That’s cool!” → Sounding Board: “I’m happy you feel this is cool! Have you read this news?”
  • User: “That’s sad.” → Sounding Board: “I’m sorry to make you sad! Do you want to talk about something else?”
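Both grounding moves in these exchanges can be sketched as a prefix added to the system's next utterance. This is a hypothetical helper, not Sounding Board's actual code: it assumes upstream components already provide an `understood` flag and a coarse `reaction` label (`"positive"`/`"negative"`), and the phrases are copied from the example exchanges.

```python
from typing import Optional

# Acknowledgments keyed by a detected user reaction (labels are assumptions).
REACTION_ACKS = {
    "positive": "I'm happy you feel this is cool!",
    "negative": "I'm sorry to make you sad!",
}


def ground_response(asr_text: str, understood: bool,
                    reaction: Optional[str], content: str) -> str:
    """Prefix the system content with a grounding move.

    - If the input could not be handled, echo what was heard so the user
      can spot an ASR/NLU error.
    - Otherwise, acknowledge the user's reaction (if any) before continuing.
    """
    if not understood:
        return f"I heard you asked: {asr_text}? I'm not sure I know the answer."
    ack = REACTION_ACKS.get(reaction or "")
    return f"{ack} {content}" if ack else content
```

Echoing the recognized text gives the user the evidence the Principle of Closure calls for, even when the system cannot answer.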

Conversational Implicature
• Meaning more than just the literal contribution
• Indirect speech acts
  • System: “How about we talk about movies?”
  • User: “OK uh I don’t watch movies very often.”
  • Literal reading: Continue; implied meaning: Switch Topic

Grice’s Maxims
• Quantity: be informative
• Quality: be truthful
• Relevance: be relevant
• Manner: be perspicuous

Dialog Manager Architectures

Example: A Trivial Airline Travel System
• Ask the user for a departure city
• Ask for a destination city
• Ask for a time
• Ask whether the trip is round-trip or not

Finite-state Dialog Manager
• System completely controls the conversation with the user
• It asks the user a series of questions
• Ignores (or misinterprets) anything the user says that is not a direct answer to the system’s questions
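A finite-state manager for the trivial airline example can be sketched as a fixed list of states, each asking one question. The city list, accepted times, and validators are illustrative assumptions; a real system would use a recognizer grammar per state.

```python
from typing import Callable, Dict, List, Tuple

KNOWN_CITIES = {"boston", "san francisco", "seattle"}  # assumed toy vocabulary

# Each state: (slot name, question, validator for a direct answer).
STATES: List[Tuple[str, str, Callable[[str], bool]]] = [
    ("origin", "What city are you leaving from?", lambda a: a in KNOWN_CITIES),
    ("dest", "Where are you going?", lambda a: a in KNOWN_CITIES),
    ("time", "What time would you like to leave?",
     lambda a: a in {"morning", "afternoon", "evening"}),
    ("round_trip", "Is this a round trip?", lambda a: a in {"yes", "no"}),
]


def run_fsa(answers: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Drive the finite-state dialog with a scripted list of user answers.

    Questions are asked in a fixed order; the system stays in the current
    state (re-asking) whenever the reply is not a direct, valid answer.
    Returns the filled slots and the list of system prompts issued.
    """
    filled: Dict[str, str] = {}
    prompts: List[str] = []
    replies = iter(answers)
    for slot, question, valid in STATES:
        while True:
            prompts.append(question)
            answer = next(replies, None)
            if answer is None:  # the user stopped answering
                return filled, prompts
            answer = answer.strip().lower()
            if valid(answer):
                filled[slot] = answer
                break
            # Anything else is ignored: ask the same question again.
    return filled, prompts
```

For example, the reply "I want to fly to Seattle in the morning" to "Where are you going?" is not a single valid item, so the question is repeated and the extra information is lost.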

System Initiative + Universals
• We can give users a little more flexibility by adding universals: commands you can say anywhere
• As if we augmented every state of the FSA with these
  • Help (AMAZON.HelpIntent)
  • Start Over (AMAZON.StartOverIntent)
  • Repeat (AMAZON.RepeatIntent)
• This describes many implemented systems
• But it still doesn’t allow the user much flexibility
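Universals amount to checking a small set of commands before the state-specific handling. A minimal sketch: the intent names are the Alexa built-ins listed on the slide, but the keyword-to-intent mapping, the two-question dialog, and the `step` function are hypothetical, and slot storage is omitted for brevity.

```python
from typing import Tuple

QUESTIONS = ["What city are you leaving from?", "Where are you going?"]

# Hypothetical keyword mapping onto the built-in intent names from the slide.
UNIVERSALS = {
    "help": "AMAZON.HelpIntent",
    "start over": "AMAZON.StartOverIntent",
    "repeat": "AMAZON.RepeatIntent",
}


def step(state: int, utterance: str) -> Tuple[int, str]:
    """Return (next_state, system_prompt) for one user turn."""
    intent = UNIVERSALS.get(utterance.strip().lower())
    # Universals are checked first, so they work in every state.
    if intent == "AMAZON.HelpIntent":
        return state, "Please answer: " + QUESTIONS[state]
    if intent == "AMAZON.StartOverIntent":
        return 0, QUESTIONS[0]
    if intent == "AMAZON.RepeatIntent":
        return state, QUESTIONS[state]
    # Not a universal: treat it as the answer to the current question.
    nxt = state + 1
    if nxt < len(QUESTIONS):
        return nxt, QUESTIONS[nxt]
    return nxt, "Thanks, searching for flights."
```

The dialog order itself is unchanged, which is why universals add only a little flexibility.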

Finite-state Dialog Manager
Advantages
• Straightforward to encode
• Clear mapping of interaction to model
• Well-suited to simple information access
Disadvantages
• Limited flexibility of interaction
  • constrained input – single item
  • fully system controlled
  • restrictive dialog structure & order
• Ill-suited to complex problem-solving

Frame-based Dialog Manager
FLIGHT FRAME:
  ORIGIN:
    CITY: Boston
    DATE: Tuesday
    TIME: morning
  DEST:
    CITY: San Francisco
  AIRLINE: …

Frame-based Dialog Manager
• Use the structure of the frame to guide dialogue
Slot         Question
ORIGIN       What city are you leaving from?
DEST         Where are you going?
DEPT DATE    What day would you like to leave?
DEPT TIME    What time would you like to leave?
AIRLINE      What is your preferred airline?

Frame-based Dialog Manager
• Mixed initiative
  • User can answer multiple questions at once
• System asks questions of the user, filling any slots that the user specifies
  • When the frame is filled, query the database
• If the user answers 3 questions at once, the system has to fill those slots and not ask these questions again!
• Avoids the strict ordering constraints of the finite-state architecture
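Frame filling with mixed initiative can be sketched with the slot/question table from the earlier slide. A minimal sketch, assuming a hypothetical NLU has already turned the user's utterance into a `{slot: value}` dict; `FlightFrame` is an illustrative name.

```python
from typing import Dict, Optional

# Slot order and questions follow the slide's slot/question table.
SLOT_QUESTIONS = [
    ("ORIGIN", "What city are you leaving from?"),
    ("DEST", "Where are you going?"),
    ("DEPT DATE", "What day would you like to leave?"),
    ("DEPT TIME", "What time would you like to leave?"),
    ("AIRLINE", "What is your preferred airline?"),
]


class FlightFrame:
    def __init__(self) -> None:
        self.slots: Dict[str, Optional[str]] = {name: None for name, _ in SLOT_QUESTIONS}

    def update(self, parsed: Dict[str, str]) -> None:
        """Fill every slot the user specified, in any order."""
        for name, value in parsed.items():
            if name in self.slots:
                self.slots[name] = value

    def next_question(self) -> Optional[str]:
        """Ask only about the first slot that is still empty."""
        for name, question in SLOT_QUESTIONS:
            if self.slots[name] is None:
                return question
        return None  # frame is filled: time to query the database
```

Because `next_question` only looks at empty slots, a user who gives origin, destination, and date in one turn is never asked those three questions again.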

Frame-based Dialog Manager
Advantages
• Relatively flexible input & orders
• Well-suited to complex information access
• Supports different types of initiative
Disadvantages
• Ill-suited to more complex problem-solving

Hierarchical Dialog Manager
• Master (Boss)
  • rank miniskills
  • long-term coherence
  • user engagement
• Miniskills (Minions)
  • greeting / goodbye / menu / topics
  • probe user personality
  • discuss a news article / movie
  • tell a fact / thought / advice / joke
  • ask / answer a question
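The master/miniskill split can be sketched as: every miniskill proposes a candidate, and the master ranks them. This is a hypothetical illustration. The `coherence` and `engagement` scores and the weighted-sum ranking are assumptions, not the ranking used by Sounding Board or any other system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    skill: str
    response: str
    coherence: float   # hypothetical score: fit with the conversation so far
    engagement: float  # hypothetical score: predicted user engagement


# A miniskill maps the user utterance to a candidate, or None if it does not apply.
Miniskill = Callable[[str], Optional[Candidate]]


def master(utterance: str, skills: List[Miniskill], w_coh: float = 0.5) -> Candidate:
    """Collect candidates from all miniskills and return the top-ranked one.

    The linear combination of coherence and engagement is an assumed
    ranking function for illustration only.
    """
    candidates = [c for c in (s(utterance) for s in skills) if c is not None]
    if not candidates:
        raise ValueError("no miniskill produced a response")
    return max(candidates,
               key=lambda c: w_coh * c.coherence + (1 - w_coh) * c.engagement)
```

Raising `w_coh` favors long-term coherence; lowering it favors whichever miniskill is predicted to keep the user engaged.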

Other Dialog Manager Architectures
• Classic AI Planning
• Information State (Markov Decision Process)
• Distributional (Neural Network)

Natural Language Generation

Natural Language Generation (NLG)
[Figure: Natural Language Understanding maps natural language to an abstract representation; Natural Language Generation maps an abstract representation back to natural language.]

NLG Modules
• Content planning
  • what to say
  • a module in dialog manager
• Language generation
  • how to say it
  • select syntactic structure and words
  • adjust prosody
[Figure: NLG pipeline: Content Planner → Sentence Planner → Surface Realizer → Prosody Assigner → TTS]

NLG Approaches
• Template-based generation
  • most common in practical systems
  • “What time do you want to leave CITY-ORIG?”
  • “How about we talk about TOPIC?”
• Neural sequence models
  • recent research interest
Figure from: Hannaneh Hajishirzi, EE 511 Winter 2018 – “Introduction to Statistical Learning”.
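Template-based generation fills placeholders such as CITY-ORIG and TOPIC with values chosen by the dialog manager. A minimal sketch using Python's standard `string.Template`; the placeholders are renamed to valid identifiers (`$city_orig`, `$topic`) and the dialog-act keys are hypothetical.

```python
import string
from typing import Dict

# The two slide templates, with $-placeholders for string.Template.
TEMPLATES: Dict[str, string.Template] = {
    "request_time": string.Template("What time do you want to leave $city_orig?"),
    "propose_topic": string.Template("How about we talk about $topic?"),
}


def realize(act: str, **slots: str) -> str:
    """Surface-realize a dialog act by filling its template.

    Template.substitute raises KeyError when a placeholder has no value,
    so missing content fails loudly instead of producing a broken sentence.
    """
    return TEMPLATES[act].substitute(**slots)
```

Templates are predictable and easy to debug, which is one reason they remain common in practical systems, at the cost of limited variety.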

System Evaluation

Motivation
• Goal: determine overall user satisfaction
• A metric to compare systems
  • can’t improve it if we don’t know where it fails
  • can’t decide between two systems without a goodness metric
• A metric as an input to reinforcement learning
  • automatically improve system performance via learning

Dialog System Evaluation
• Extrinsic Evaluation: embedded in some external task
• Intrinsic Evaluation: evaluating the component as such
• What constitutes success or failure for a dialog system?
User Satisfaction survey, adapted from (Walker et al. 2001):
TTS Performance      Was the system easy to understand?
ASR Performance      Did the system understand what you said?
Task Ease            Was it easy to find the message/flight/train you wanted?
Interaction Pace     Was the pace of interaction with the system appropriate?
User Expertise       Did you know what you could say at each point?
System Response      How often was the system sluggish and slow to reply to you?
Expected Behavior    Did the system work the way you expected it to?
Future Use           Do you think you’d use the system in future?
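To use the survey as a single metric, the item scores must be combined. A minimal sketch, assuming each item is answered on the same 1-5 Likert scale with higher meaning better, and that the overall score is simply their sum; `user_satisfaction` is a hypothetical helper.

```python
from typing import Dict, Tuple

# The eight survey dimensions from the User Satisfaction survey.
SURVEY_ITEMS = [
    "TTS Performance", "ASR Performance", "Task Ease", "Interaction Pace",
    "User Expertise", "System Response", "Expected Behavior", "Future Use",
]


def user_satisfaction(responses: Dict[str, int],
                      scale: Tuple[int, int] = (1, 5)) -> int:
    """Sum the per-item scores into one user satisfaction score.

    Assumes higher is better on every item; a negatively worded item
    (e.g. how often the system was sluggish) would need to be flipped first.
    """
    lo, hi = scale
    missing = [item for item in SURVEY_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing responses: {missing}")
    for item in SURVEY_ITEMS:
        if not lo <= responses[item] <= hi:
            raise ValueError(f"{item}: score {responses[item]} outside {scale}")
    return sum(responses[item] for item in SURVEY_ITEMS)
```

With eight items on a 1-5 scale, the summed score ranges from 8 to 40.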

PARADISE Framework
