

SLIDE 1

Speech Processing 15-492/18-492

Spoken Dialog Systems Advanced Concepts in Dialog

SLIDE 2

Spoken Dialog Systems

  • Basic steps for machine conversation:
  • Take speech to text (ASR)
  • Extract meaning from text (Parsing)
  • Interpret meaning, decide what to do next (DM)
  • Generate text to convey decision (NLG)
  • Speak text (TTS)
  • When these work well, actual conversations are possible
  • Still can be too machine-like to be natural
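The five steps above can be sketched as a single turn through the pipeline. This is a toy illustration; the component functions below are invented stand-ins, not real Olympus modules.

```python
# Sketch of one turn of the spoken-dialog pipeline: ASR -> Parse -> DM -> NLG (-> TTS).
# Every component here is a toy stand-in for illustration only.

def recognize(audio):
    # ASR: speech to text (here we pretend the "audio" is already text)
    return audio

def parse(text):
    # Parsing: text to a meaning frame
    return {"place": "airport"} if "airport" in text else {}

def decide(frame, state):
    # DM: interpret the frame, update state, choose the next action
    if "place" in frame:
        state["destination"] = frame["place"]
        return "ask_origin", state
    return "ask_destination", state

def generate(action):
    # NLG: turn the decision into text (a real system would then send it to TTS)
    prompts = {"ask_origin": "Where are you leaving from?",
               "ask_destination": "Where do you want to go?"}
    return prompts[action]

def dialog_turn(audio, state):
    text = recognize(audio)
    frame = parse(text)
    action, state = decide(frame, state)
    return generate(action), state

reply, state = dialog_turn("I need to go to the airport", {})
```

When each stage behaves, the loop produces a coherent exchange; the rest of the lecture is about what happens when a stage does not.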

SLIDE 3

Example

S: Welcome to MyBus. Which itinerary do you want information for?
U: I need to go to the airport.
S: Where are you leaving from?
U: Downtown.
S: Just a minute. Let me check that for you.
S: There is a 28X leaving downtown at 7:03 PM. It will arrive at the airport at 7:37 PM.
S: You can say when is the next bus, when is the previous bus, start a new query, or goodbye.
U: When is the previous bus?
S: Just a second. There is a 28X leaving downtown at 4:20 PM. It will arrive at the airport at 4:56 PM.
S: You can say when is the next bus, when is the previous bus, start a new query, or goodbye.
U: Goodbye.
S: Thank you for using MyBus. Goodbye.

SLIDE 4

Spoken Dialog Systems

  • Of course, all those steps can be error-prone
  • In fact, rarely do all work 100% correctly
  • What happens when errors occur?
  • Degraded machine conversation
  • Errors can chain, compounding problems
  • Misrecognition → Bad parse → Wrong decision

SLIDE 5

Example

S: Welcome to MyBus.
S: Which itinerary do you want information for?
U: uh… I wanna go from Pittsburgh Airport to the Cathedral of Learning.
S: Which itinerary do you want information for?
U: I said I wanna go from Pittsburgh Airport to the Cathedral of Learning.
S: Which itinerary do you want information for?
U: What?
S: Which itinerary do you want information for?
U: Well… I wanna go from Pittsburgh Airport to the…
S: Which itinerary do you want information for?
S: Just a minute. Let me check that for you.
U: What?
S: Which itinerary do you want…
S: Which itinerary do you want information for?

SLIDE 6

Overview

  • What to do with imperfect ASR results?
  • Can anything useful be extracted, or is there no meaning at all in what was recognized?
  • How can a conversation that’s going badly recover and end up successful?
  • What approaches can be used for dialog decision making?

SLIDE 7

Handling Imperfect Recognition

  • Even with good recognition, you cannot blindly trust ASR output…
  • Confidence annotation: Helios
  • Given the current state of the dialog, how confident is the system that the input matches the user’s intention?
  • Logistic regression based on speech, parsing, dialog features
  • Training from corpus of transcribed data
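A rough sketch of how such a confidence annotator operates: combine per-utterance features with trained weights through a sigmoid. The feature names and weights below are invented for illustration; they are not the actual Helios model.

```python
import math

# Sketch of Helios-style confidence annotation: a logistic regression over
# speech, parsing, and dialog features. Feature names and weights here are
# illustrative placeholders, not trained Helios parameters.

WEIGHTS = {"asr_score": 2.0, "parse_coverage": 1.5, "expected_slot": 1.0}
BIAS = -2.5

def confidence(features):
    # weighted sum of features, squashed to a probability in (0, 1)
    z = BIAS + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# A clean utterance whose parse fills the slot the dialog state expected
good = confidence({"asr_score": 0.9, "parse_coverage": 1.0, "expected_slot": 1.0})
# A noisy utterance with a partial parse that does not match expectations
bad = confidence({"asr_score": 0.2, "parse_coverage": 0.3, "expected_slot": 0.0})
```

The weights would be fit on a corpus of transcribed dialogs labeled for whether the system's hypothesis matched what the user actually said.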

SLIDE 8

Grounding Concept Values

  • Grounding: process where conversation participants establish common understanding
  • For each understood concept, choose among 3 possible actions
  • Explicit confirmation: ask user a direct question, wait for a positive response before accepting
      “To the airport. Is this correct?”
  • Implicit confirmation: repeat what was understood, accept unless user indicates it was wrong
      “To the airport. Where are you leaving from?”
  • No action: silently accept without informing user
  • Best choice can be situationally dependent
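A minimal sketch of choosing among the three grounding actions, assuming each concept carries a confidence score in [0, 1]. The thresholds are illustrative; in practice they are tuned or learned, and, as noted above, the best choice can depend on the situation.

```python
# Sketch of threshold-based grounding: map a concept's confidence score
# to one of the three actions above. Thresholds are illustrative only.

def grounding_action(conf):
    if conf >= 0.9:
        return "no_action"         # very sure: silently accept
    if conf >= 0.6:
        return "implicit_confirm"  # fairly sure: "To the airport. Where are you leaving from?"
    return "explicit_confirm"      # unsure: "To the airport. Is this correct?"
```

Explicit confirmation costs an extra turn but is safest; silent acceptance is cheapest but risks acting on a misrecognition.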

SLIDE 9

Enabling Confirmation in Olympus

Special Rosetta template prompts:

$Rosetta::Templates::act{"implicit_confirm"} = {
    "origin_place" => "Leaving from <origin_place>.",
    …
};

Write Prompts → Create Policies → Attach Policies to Concepts

SLIDE 10

Enabling Confirmation in Olympus

Confirmation policies config file: (Configurations/DesktopSAPI/expl_impl.pol)

EXPLORATION_MODE=epsilon-greedy
EXPLORATION_PARAMETER=0.1

                 ACCEPT  EXPL_CONF  IMPL_CONF
INACTIVE            1        -          -
CONFIDENT           -        5         10
UNCONFIDENT         -       10          5
GROUNDED            1        -          -

Write Prompts → Create Policies → Attach Policies to Concepts

SLIDE 11

Enabling Confirmation in Olympus

When defining concepts in the dialog manager, indicate which policy to apply:

DEFINE_AGENCY( CPerformTask,
    DEFINE_CONCEPTS(
        INT_USER_CONCEPT(query_type, "impl")
        STRING_USER_CONCEPT(origin_place, "expl_impl")
        …
    )
    …
)

(each entry gives the concept name and the policy name)

Write Prompts → Create Policies → Attach Policies to Concepts

SLIDE 12

Handling Non-Understandings

  • No meaning can be extracted from user input

S: Where do you want to go?
U: (no parse)
S: ???

  • Many possible system responses:

S: Where do you want to go?
S: Could you repeat that?
S: For example, you can say, “Downtown”.
S: Which route are you looking for?
…

SLIDE 13

Non-Understanding Policies

  • Repeat question
      – May work if temporary channel issue caused ASR problems
      – Frustrating to user if continued
  • Provide example of what to say
      – Can assist unfamiliar users
      – Annoys users who already said the example and weren’t understood
  • Change topic
      – Gets user to talk about something else
      – Still have to get original question answered
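A handcrafted policy combining these strategies might escalate as consecutive non-understandings accumulate. The cutoffs and strategy names below are illustrative, not taken from any deployed system.

```python
# Sketch of a handcrafted non-understanding policy: escalate through the
# recovery strategies above as consecutive non-understandings pile up.
# The escalation order and cutoffs are illustrative choices.

def recovery_strategy(num_prev_nonu):
    if num_prev_nonu <= 1:
        return "repeat_question"  # cheap; works if it was a temporary channel issue
    if num_prev_nonu == 2:
        return "give_example"     # help users who don't know what to say
    return "change_topic"         # stop banging on the same question for now
```

Escalation avoids the failure mode of slide 5, where the system repeats the same prompt indefinitely.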

SLIDE 14

Non-Understanding Policies

  • Two types of policies:
  • Handcrafted/deterministic
      – Design a (small) space of dialog states
      – Set a utility for each action in each state
  • Data-driven
      – Learn optimal weights based on collected dialogs
      – Exploration/exploitation trade-off at runtime
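The data-driven side can be sketched as epsilon-greedy selection over learned per-action utilities, in the spirit of the EXPLORATION_MODE=epsilon-greedy / EXPLORATION_PARAMETER=0.1 setting shown earlier. The utility values below are invented for illustration.

```python
import random

# Sketch of the exploration/exploitation trade-off: epsilon-greedy choice
# over learned action utilities. The utilities here are illustrative; a
# real system estimates them from collected dialogs.

UTILITIES = {"repeat_question": 5.0, "give_example": 10.0, "change_topic": 2.0}

def choose_action(utilities, epsilon=0.1, rng=random):
    if rng.random() < epsilon:
        # explore: occasionally try a non-best action to keep learning
        return rng.choice(sorted(utilities))
    # exploit: take the action with the highest learned utility
    return max(utilities, key=utilities.get)
```

With epsilon = 0.1, about one decision in ten is exploratory, which keeps utility estimates from going stale as user behavior shifts.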
SLIDE 15

Using Non-Understanding Strategies

Create alternative versions of request template prompts:

"origin_place" => "Where are you leaving from?",

Write Prompts → Create Policies → Attach Policies to Agents

SLIDE 16

Using Non-Understanding Strategies

Create alternative versions of request template prompts:

"origin_place" => {
    "default"        => "Where are you leaving from?",
    "explain_more"   => "Right now, I need to know the stop or landmark from which you will be leaving.",
    "what_can_i_say" => "For example, you can say THE AIRPORT, or DOWNTOWN.",
},

Write Prompts → Create Policies → Attach Policies to Agents

SLIDE 17

Using Non-Understanding Strategies

Non-understanding policies config file: (Configurations/DesktopSAPI/repeat.pol)

EXPLORATION_MODE=greedy
STATE[FIRST_NONU]
    num_prev_nonu = 1
END
STATE[SUBSEQUENT_NONU]
    num_prev_nonu > 1
END

Write Prompts → Create Policies → Attach Policies to Agents

SLIDE 18

Using Non-Understanding Strategies

Non-understanding policies config file: (Configurations/DesktopSAPI/repeat.pol)

                  AREP  TYCS
FIRST_NONU          1     -
SUBSEQUENT_NONU     -     1

Write Prompts → Create Policies → Attach Policies to Agents

SLIDE 19

Using Non-Understanding Strategies

When defining subagents in the dialog manager, indicate which policy to apply:

DEFINE_SUBAGENTS(
    SUBAGENT(RequestQuery, CRequestQuery, "tycs")
    SUBAGENT(RequestOriginPlace, CRequestOriginPlace, "arep_tycs")
    …
)

(each entry gives the agent name, class name, and policy name)

Write Prompts → Create Policies → Attach Policies to Agents

SLIDE 20

Dialog Management

  • Fundamentally, it is a task that:
  • Given some input, decide what output should be generated
  • How can this decision be made?
  • Manually written rules
      – Olympus uses this approach in general – you write the dialog task specification
  • Or …?

SLIDE 21

Statistical Dialog Decision Making

  • POMDP: partially observable Markov decision process
  • Instead of a single explicit dialog state, track a probability distribution over all possible states
  • Update probability mass as dialog progresses
  • Choose next action based on likelihood of being in a given state
  • Probabilities generated using confidence scores from ASR, parser, etc.

SLIDE 22

POMDP Dialog Design

SLIDE 23

Hybrid Approach

  • Possible to improve a hand-generated dialog manager by also using a statistical approach
  • Take a “normal” set of dialog management rules
  • Treat these rules as a belief network
  • Add confidence scores as with a POMDP approach
  • At decision time, apply all information (original handwritten rules, plus state likelihood) to determine next state in dialog

SLIDE 24

Summary

  • Spoken dialog systems integrate several speech and language technologies to allow conversational machines
  • Many different application types:
      – Information giving, question answering, interactive machine personalities
  • Several types of dialog:
      – System initiative, mixed initiative, classification

SLIDE 25

Summary

  • Olympus: a complete spoken dialog framework
  • Provides all components needed to build an SDS
  • Modular, allows drop-in replacement of any component
  • RavenClaw: Olympus dialog manager
  • Plan-based hierarchical dialog structure
  • Dynamic task modification
  • Provides advanced dialog features
      – Help, Error handling/recovery, Confirmation strategies
  • Allows for more complex dialog tasks than simpler frameworks (like VoiceXML)

SLIDE 26

Summary

  • Error detection and recovery
  • Misunderstandings: Confirmation
      – Explicit vs. Implicit
  • Non-understandings: Recovery approaches
      – Mixture of several choices most helpful
      – Selecting which can be done heuristically or trained from data
  • Dialog decision making
  • Hand-crafted rules and heuristics
  • Statistical approaches