A Framework for Learning Multimodal Clarification Strategies


slide-1
SLIDE 1

Motivation Framework The Data Collection Performance modelling Future work Summary

A Framework for Learning Multimodal Clarification Strategies

Verena Rieser (1), Ivana Kruijff-Korbayová (1), Oliver Lemon (2)

(1) Department of Computational Linguistics, Saarland University
(2) School of Informatics, University of Edinburgh

In affiliation with: TALK Project http://www.talk-project.org/

slide-2
SLIDE 2

Clarification Requests (CRs) in Spoken Dialogue Systems

System: What city are you leaving from?
User: Urbana Champaign.
System: Sorry, I’m not sure I understood what you said. Where are you leaving from?
User: Urbana Champaign.
System: I’m still having trouble understanding you. . . . What city are you leaving from?
User: Chicago.

[CMU Communicator – User-System]

→ The system performs badly and sounds quite artificial.

slide-3
SLIDE 3

CRs in Human-Human Dialogue

Cust: I guess getting a car in London will not do me much good in /uh/ Spain is that right?
Agent: I’m sorry? Getting a car . . . ?
Cust: Yeah I’ll need a car in Madrid.
Agent: OK.
Cust: I’ll be returning on Thursday the fifth.
Agent: The fifth of February?
Cust: /UHU/

[CMU Communicator – Human-Human]

→ How can these kinds of clarification strategies be transferred to dialogue systems?

slide-4
SLIDE 4

Outline

  • Motivation
      • Previous work
  • Framework
      • The Learning Approach
  • The Data Collection
      • Experimental Setup
      • Results from the WOZ study
  • Performance modelling
      • RL and Performance modelling
      • Dialogue costs and multimodality
      • Ambiguity and (sub-)task success
  • Future work
      • Policy Shaping
      • User-centred rewards

slide-16
SLIDE 16

Generating CRs in task-oriented dialogues

[Rieser and Moore] Implications for generating clarification requests in task-oriented dialogues, ACL-05.

  • Form-function mappings
    → We know how to generate surface forms of CRs once we have the functions.
  • Human decision making on function features was influenced by dialogue type, modality and channel quality.

For dialogue systems we still don’t know:
→ How to set the function features?
→ How do these strategies perform?

slide-18
SLIDE 18

Approach

Assumptions

  • Clarification strategies involve complex decision making over a variety of contextual factors and exhaustive planning towards maximising a desired outcome.

→ Apply reinforcement learning (RL) in the information state update (ISU) approach.

What is RL?

slide-19
SLIDE 19

Framework for learning multimodal CRs

Overall approach: MDP = (S, A, T, R)

  1. Collect data on possible strategies in a WOZ experiment.
     → Extract {A, S, R}
  2. Bootstrap an initial policy using supervised learning in the ISU approach.
     → Learn wizards’ decisions in context (T)
  3. Optimise the learnt policy for dialogue systems using RL:
     $\pi^*(s_i) \approx \arg\max_a E\big[\sum_{j \ge i} r(d, j) \mid s_i, a\big]$

→ How can we improve online reward measures r(d, j)?
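
The expectation in step 3 can be approximated from logged dialogues. The following is a minimal sketch, assuming each dialogue is available as a list of (state, action, reward) turns; it uses a plain every-visit Monte Carlo average, which is a stand-in chosen for illustration and not necessarily the learner used in this work. The state and action names and the reward values are hypothetical.

```python
from collections import defaultdict

def estimate_q(dialogues):
    """Monte Carlo estimate of Q(s_i, a): mean return-to-go over logged turns.

    `dialogues` is a list of dialogues; each dialogue is a list of
    (state, action, reward) turns. All names here are illustrative.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for dialogue in dialogues:
        return_to_go = 0.0
        # Walk backwards so return_to_go accumulates sum_{j >= i} r(d, j).
        for state, action, reward in reversed(dialogue):
            return_to_go += reward
            totals[(state, action)] += return_to_go
            counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}

def greedy_policy(q_values):
    """Pick, for every observed state, the action with the highest estimate."""
    best = {}
    for (state, action), value in q_values.items():
        if state not in best or value > best[state][1]:
            best[state] = (action, value)
    return {state: action for state, (action, _) in best.items()}

# Two toy dialogues with hypothetical states, actions and rewards.
toy_dialogues = [
    [("ambiguous", "ask_speech", -1.0), ("resolved", "play", 10.0)],
    [("ambiguous", "show_list", -2.0), ("resolved", "play", 5.0)],
]
q = estimate_q(toy_dialogues)
policy = greedy_policy(q)
```

In this toy data the spoken clarification ends up preferred in the "ambiguous" state only because its dialogue happened to earn the larger return; with real data the quality of the estimate depends entirely on how r(d, j) is defined.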

slide-21
SLIDE 21

The SAMMIE-2 Data Collection [1]

Figure: Multimodal Wizard-of-Oz data collection setup for an in-car music player application, using the Lane Change driving simulator. Top right: User, Top left: Wizard, Bottom: transcribers.

[1] SAMMIE stands for Saarbrücken Multimodal MP3 Player Interaction Experiment (for more details cf. [Kruijff-Korbayová et al.], ENLG 2005).

slide-22
SLIDE 22

Experimental Setup

6 wizards, 24 subjects

User:

  • User’s primary task is driving
  • Secondary MP3 selection task

Wizard:

  • Screen output options pre-computed, wizard freely talking
  • Wizard “sees what the system sees”

Introducing uncertainty:

  • Corrupted transcriptions by “word killer” agent (≈ acoustic problems)
  • Lexical and reference ambiguities by task and DB
  • Pop-up questionnaire window “CLARIE” agent
slide-24
SLIDE 24

Evaluation

  • 1772 turns and 17076 words.
  • 774 wizard turns, 10.2% CRs (from CLARIE)
  • User Satisfaction fairly high across wizards (15.0, δ=2.9, range 5 to 25)
  • Multimodality: “Most helpful” vs. distracting
slide-25
SLIDE 25

Corpus Requirements for Performance Modelling

  • “Costs” caused by multi-modal dialogue acts.
  • Vague task success due to non-directed task definition and high ambiguity.
  • In-car environment: cognitive workload on primary task.
  • Need to explore → online reward measure!
slide-27
SLIDE 27

Currently applied (ad hoc) Reward Measures

  • User satisfaction from questionnaires (offline)
    e.g. Final Reward = 14.94
  • Binary task success (online)
    e.g. Final Reward = +1 | -1
  • Cost function of filled and confirmed slot values, dialogue length etc. (online)
    e.g. Final Reward = (expected length) + (filled slots) + (retrieving info) + . . .
  • User satisfaction (US) as defined in PARADISE (online)
    e.g. Final Reward (US) = 0.47*(Mean Recognition Score) + 0.21*(Perception of task completion) + 0.15*(elapsed time)

→ Can we use existing (fine grained) evaluation schemes?
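
To make the last bullet concrete, here is a hedged sketch of the general PARADISE performance function, performance = α·N(task success) − Σ wᵢ·N(costᵢ), where N is z-score normalisation over a corpus. The corpus values, the weights and the measure names below are invented for illustration; in PARADISE the weights are obtained by regressing against user satisfaction ratings.

```python
from statistics import mean, pstdev

def z_normalise(value, sample):
    """Z-score of `value` relative to `sample` (0.0 if the sample has no spread)."""
    spread = pstdev(sample)
    return 0.0 if spread == 0 else (value - mean(sample)) / spread

def paradise_performance(success, costs, corpus, alpha, weights):
    """Performance = alpha * N(success) - sum_i w_i * N(cost_i).

    `corpus` maps each measure name to the values observed in a corpus,
    which are used only for normalisation. Weights are assumed inputs.
    """
    score = alpha * z_normalise(success, corpus["success"])
    for name, value in costs.items():
        score -= weights[name] * z_normalise(value, corpus[name])
    return score

# Hypothetical corpus statistics and weights, for illustration only.
corpus = {"success": [0.0, 1.0], "elapsed_time": [100.0, 200.0]}
reward = paradise_performance(
    success=1.0,
    costs={"elapsed_time": 100.0},
    corpus=corpus,
    alpha=0.5,
    weights={"elapsed_time": 0.25},
)
```

A successful dialogue that is also shorter than average scores above a merely successful one, which is the trade-off summarised later as "max TaskSuccess, min Costs".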

slide-29
SLIDE 29

RL and PARADISE

Performance modelling for RL in PARADISE [Walker], 2000.

UserSatisfaction(max TaskSuccess, min Costs)

slide-32
SLIDE 32

Dialogue costs and dialogue acts

PARADISE:

  • turn duration, elapsed time, number of turns, . . .

DATE:

  • accounts for relations between cost features and features indicating task success
  • multiple views on one turn: conversational domain, task/sub-task level, speech act

Example: For certain speech acts turn duration is positively related to US ([Walker and Passonneau], 2001)
→ present-info indicates task success

slide-38
SLIDE 38

Costs of Multimodal Dialogue Acts

ID  Utterance                         Speaker  Modality  Speech act
1   Please play “Nevermind”.          user     speech    request
2a  Does this list contain the song?  wizard   speech    request info
2b  [shows list with 20 DB matches]   wizard   graphic   present info
3a  Yes. It’s number 4.               user     speech    provide info
3b  [selects item 4]                  user     graphic   provide info

  • Simultaneous actions
  • Redundant actions
slide-44
SLIDE 44

Task success

PARADISE: AVM-style definition of task success

attribute        possible values                 info flow
<depart-city>    {Milano, Roma, Torino, Trento}  to agent
<arrival-city>   {Milano, Roma, Torino, Trento}  to agent
<depart-range>   {morning, evening}              to agent
<depart-time>    {6am, 8am, 6pm, 9pm}            to user

PROMISE: [Beringer et al.], 2002

  • information bits to measure (sub-)task success

Info bits are defined to describe when a task is completed.
Example: “Plan an evening watching TV”: film = [channel, time] ∨ [title, time] ∨ [title, channel] ∨ . . .
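
One way to read the PROMISE example is as a disjunction of alternative information-bit sets: the (sub-)task counts as completed once every bit of at least one alternative is known. The sketch below assumes that reading; the function name and the set representation are illustrative choices, not part of PROMISE itself.

```python
def task_completed(alternatives, known_bits):
    """True if every information bit of at least one alternative set is known."""
    return any(required <= known_bits for required in alternatives)

# Alternative sets taken from the slide's "film" example; the elided
# alternatives ("...") are deliberately not filled in here.
film = [
    frozenset({"channel", "time"}),
    frozenset({"title", "time"}),
    frozenset({"title", "channel"}),
]

only_title = task_completed(film, {"title"})              # one bit is not enough
title_and_time = task_completed(film, {"title", "time"})  # matches [title, time]
```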

slide-47
SLIDE 47

Ambiguity in PROMISE

“Your little brother likes to listen to heavy metal music. You want to build him a playlist including three metal songs. Make sure you have “Enter Sandman” on the playlist! Save the playlist under the name “heavy guys”.”

main task: makePlaylist
sub-tasks: search(item1), search(item2), search(item3), playlist(name), add(item1, name), add(item2, name), add(item3, name)
info-bits: item1 = [title: “Enter Sandman”], item2 = [title] ∨ [album, track] . . .

What to do when “Enter Sandman” has several matches in the DB?
How to measure task success online?

slide-52
SLIDE 52

Algorithm for flexible task success definition

  1. Extend the information bit set until the description is precise.
     Example: item1 = [title: “Enter Sandman”]
     If item1 has several matches in the DB: item1 = [title: “Enter Sandman”] ∧ [album]
     → Recursive online definition of task success based on ambiguity.
  2. Back off to evaluate final task success based on the “user’s goal”.
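
Step 1 can be sketched as a loop that keeps adding information bits while the database lookup is still ambiguous. This is only an illustration under stated assumptions: the database is a list of dictionaries, `user_goal` stands in for the answers that clarification requests would elicit, and the album names are invented.

```python
def refine_until_unique(database, constraints, extra_attributes, user_goal):
    """Add information bits until the description matches at most one DB entry.

    `user_goal` stands in for the answers a clarification request would
    elicit; in a real system they would come from the dialogue.
    """
    constraints = dict(constraints)
    matches = [row for row in database
               if all(row.get(k) == v for k, v in constraints.items())]
    for attribute in extra_attributes:
        if len(matches) <= 1:
            break
        # Still ambiguous: extend the info-bit set with one more attribute.
        constraints[attribute] = user_goal[attribute]
        matches = [row for row in matches
                   if row.get(attribute) == constraints[attribute]]
    return constraints, matches

# Hypothetical toy database in which two entries share a title.
songs = [
    {"title": "Enter Sandman", "album": "Album A"},
    {"title": "Enter Sandman", "album": "Album B"},
    {"title": "Other Song", "album": "Album A"},
]
bits, found = refine_until_unique(
    songs, {"title": "Enter Sandman"}, ["album"], {"album": "Album B"}
)
```

Here the title alone matches two entries, so the album bit is added and the final set `bits` becomes the online definition of success for this sub-task.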

slide-54
SLIDE 54

Policy shaping for immediate credit

Policy shaping: augment the underlying reward structure with a shaping function F (a bias reflecting prior knowledge).

M′ = (S, A, T, R + F)    (1)

  • Task success: give credit for every (grounded) information bit.
  • Multimodal cost function: F can be estimated with dynamic shaping.
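
As a rough illustration of the first bullet, the shaped reward R + F can add a fixed credit for each information bit that becomes grounded in a turn. The credit size, the per-turn base reward and the bit names below are assumptions made for the example; the slide does not specify them, and the dynamic estimation of F is not shown.

```python
def shaped_reward(base_reward, grounded_before, grounded_after, credit_per_bit=1.0):
    """Return R + F, where F credits each newly grounded information bit."""
    newly_grounded = grounded_after - grounded_before
    shaping = credit_per_bit * len(newly_grounded)
    return base_reward + shaping

# One turn in which the "title" bit becomes grounded; the base reward of
# -1.0 is a hypothetical per-turn cost.
r = shaped_reward(-1.0, {"artist"}, {"artist", "title"}, credit_per_bit=0.5)
```

The learner thus receives part of its credit immediately instead of waiting for the final task-success reward.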

slide-56
SLIDE 56

What we haven’t solved so far . . .

  • How to account for more user-centred reward measures?
  • What about more qualitative measures?
  • What about cognitive load while driving?

→ Can we utilise “emotions” as a continuous reward signal?

slide-57
SLIDE 57

Summary

Hypothesis

  • Multi-modal clarification strategies involve complex planning over a variety of contextual factors while maximising user satisfaction.

Method

  • Apply RL in the ISU approach and model user satisfaction by assigning continuous, local rewards in combination with “delayed” rewards.

Expected outcome

  • Learn a flexible, context-adaptive strategy for clarification subdialogues.
  • Define a portable online reward measure.
slide-58
SLIDE 58

In other words . . .

Asking the “right" clarification depends on the context and the reward as the “goal".

Figure: Performance modelling for multi-modal in-car dialogues

slide-59
SLIDE 59

In other words . . .

Asking the “right" clarification depends on the context and the reward as the “goal".

  • Help to accomplish the task!
  • Save costs!
  • Don’t distract the driver!
  • Don’t frustrate the driver!
slide-61
SLIDE 61

Papers associated with this talk:

  • Verena Rieser, Ivana Kruijff-Korbayová, Oliver Lemon. A Framework for Learning Multimodal Clarification Strategies. To be published in: Proceedings of SIGDIAL, 2005.
  • Ivana Kruijff-Korbayová, Nate Blaylock, Ciprian Gerstenberger, Verena Rieser, Tilman Becker, Michael Kaisser, Peter Poller, Jan Schehl. An Experimental Setup for Collecting Data for Adaptive Output Planning in a Multimodal Dialogue System. Proceedings of the European Natural Language Generation Workshop, 2005.
  • Verena Rieser and Johanna Moore. Implications for Generating Clarification Requests in Task-oriented Dialogues. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), 2005.

slide-62
SLIDE 62

Appendix

For Further Reading I

  • Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 1998.
  • Marilyn Walker and Rebecca Passonneau. DATE: A dialogue act tagging scheme for evaluation. Proceedings of the Human Language Technology Conference, 2001.
  • Nicole Beringer, Ute Kartal, Katerina Louka, Florian Schiel and Uli Türk. PROMISE: A Procedure for Multimodal Interactive System Evaluation. Proceedings of the Workshop Multimodal Resources and Multimodal Systems Evaluation, 2002.

slide-63
SLIDE 63

Appendix

For Further Reading II

  • Marilyn Walker. An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email. Journal of Artificial Intelligence Research, 2000.

slide-64
SLIDE 64

Appendix

Algorithm for flexible task success definition

U is the user input string
DB is the number of matches in the database

Initialize:
    task = makePlaylist
    makePlaylist = subtask(item1) ∧ . . . ∧ subtask(itemN)
    item1, . . . , itemN = alternativeSetList
    alternativeSetList = infoSet1 ∨ infoSet2 ∨ . . . ∨ infoSetN
    infoSet1, infoSet2, . . . , infoSetN = infoBit1 ∧ infoBit2 ∧ infoBitN

For every U:
    value = Parse(U)
    If (DB != 0):
        newSet = currentSet.add(infoBit)
        alternativeSetList.add(newSet)
    For every infoSet in alternativeSetList:
        try to instantiate infoSet
    currentUserGoal = infoSet instantiated

slide-65
SLIDE 65

Appendix

Outline

Implications for reward measures

slide-66
SLIDE 66

Appendix

Implications for a more informative reward

  • Hypothesis 1: Local reward measures lead to faster learning.
    → Filled slots as local and task success as final reward
  • Hypothesis 2: The reward measure is the place to incorporate complex domain knowledge.
    → Reflect the relation between costs and speech acts

slide-67
SLIDE 67

Appendix

Policy shaping

Policy shaping: augment the underlying reward structure with a shaping function F (a bias reflecting prior knowledge).

M′ = (S, A, T, R + F)    (2)

F can be estimated with dynamic shaping.

slide-68
SLIDE 68

Appendix

Reinforcement Learning (RL)

Figure: [Sutton and Barto], 1998.

The reward/performance function defines the “goal" of the RL agent.

slide-69
SLIDE 69

Appendix

MDP model for RL

  • Markov Decision Process:
    $\mathrm{MDP} = (S, A, T, R)$
  • Transition probability function:
    $P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}$
  • Reward signal:
    $R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}$
  • Optimal policy π*:
    $Q(s_i, a) \approx E\big[\sum_{j \ge i} r(d, j) \mid s_i, a\big]$

slide-70
SLIDE 70

Appendix

Major features of RL

  • Adaptation
  • Evaluative feedback
  • Delayed reinforcement
  • Exploitation vs. exploration
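
The exploitation vs. exploration trade-off in the last bullet is commonly handled with an ε-greedy rule; the sketch below shows that standard rule, which the slides do not name explicitly. The Q-values, state and action names are hypothetical.

```python
import random

def epsilon_greedy(q_values, state, actions, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda action: q_values.get((state, action), 0.0))

# Hypothetical value estimates for two clarification actions.
q_values = {("ambiguous", "ask_speech"): 9.0, ("ambiguous", "show_list"): 3.0}
actions = ["ask_speech", "show_list"]
always_greedy = epsilon_greedy(q_values, "ambiguous", actions, epsilon=0.0)
```

With `epsilon=0.0` the rule always takes the greedy action; raising ε makes the agent try less-valued clarification actions more often.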
slide-71
SLIDE 71

Appendix

Greedy actions

slide-72
SLIDE 72

Appendix

RL for dialogue systems

How does this work for us?