Speech Processing 15-492/18-492: Spoken Dialog Systems, Advanced Concepts in Dialog
Spoken Dialog Systems
- Basic steps for machine conversation:
  - Take speech to text (ASR)
  - Extract meaning from text (parsing)
  - Interpret meaning, decide what to do next (DM)
  - Generate text to convey decision (NLG)
  - Speak text (TTS)
- When these work well, actual conversations are possible
- Still can be too machine-like to be natural
Example
S: Welcome to MyBus. Which itinerary do you want information for?
U: I need to go to the airport.
S: Where are you leaving from?
U: Downtown.
S: Just a minute. Let me check that for you.
S: There is a 28X leaving downtown at 7:03 PM. It will arrive at the airport at 7:37 PM.
S: You can say when is the next bus, when is the previous bus, start a new query, or goodbye.
U: When is the previous bus?
S: Just a second. There is a 28X leaving downtown at 4:20 PM. It will arrive at the airport at 4:56 PM.
S: You can say when is the next bus, when is the previous bus, start a new query, or goodbye.
U: Goodbye.
S: Thank you for using MyBus. Goodbye.
Spoken Dialog Systems
- Of course, all those steps can be error-prone
- In fact, rarely do all work 100% correctly
- What happens when errors occur?
  - Degraded machine conversation
  - Errors can chain, compounding problems
  - Misrecognition → bad parse → wrong decision
Example
S: Welcome to MyBus.
S: Which itinerary do you want information for?
U: uh… I wanna go from Pittsburgh Airport to the Cathedral of Learning.
S: Which itinerary do you want information for?
U: I said I wanna go from Pittsburgh Airport to the Cathedral of Learning.
S: Which itinerary do you want information for?
U: What?
S: Which itinerary do you want information for?
U: Well… I wanna go from Pittsburgh Airport to the…
S: Which itinerary do you want information for?
S: Just a minute. Let me check that for you.
U: What?
S: Which itinerary do you want…
S: Which itinerary do you want information for?
Overview
- What to do with imperfect ASR results?
- Can anything useful be extracted, or is there no meaning at all in what was recognized?
- How can a conversation that’s going badly recover and end up successful?
- What approaches can be used for dialog decision making?
Handling Imperfect Recognition
- Even with good recognition, you cannot blindly trust ASR output…
- Confidence annotation: Helios
  - Given the current state of the dialog, how confident is the system that the input matches the user’s intention?
  - Logistic regression based on speech, parsing, and dialog features
  - Trained from a corpus of transcribed data
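As a rough illustration of the idea (not Helios itself), a logistic-regression confidence annotator combines per-turn features into a single probability. The feature names and weights below are invented for this sketch; the real annotator learns its weights from transcribed dialogs.

```python
import math

# Illustrative weights for a Helios-style confidence annotator.
# These names and values are assumptions for the sketch, not Helios's
# actual feature set; in practice the weights come from training data.
WEIGHTS = {
    "asr_score": 2.0,       # normalized recognizer score for the hypothesis
    "parse_coverage": 1.5,  # fraction of words covered by the parse
    "state_match": 1.0,     # 1.0 if the parsed concept is expected in this state
}
BIAS = -2.5

def confidence(features: dict) -> float:
    """P(input matches the user's intention) via the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# A cleanly recognized, in-grammar, expected answer scores high;
# a noisy, off-topic one scores low.
good = confidence({"asr_score": 0.9, "parse_coverage": 1.0, "state_match": 1.0})
bad = confidence({"asr_score": 0.2, "parse_coverage": 0.3, "state_match": 0.0})
```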
Grounding Concept Values
- Grounding: the process where conversation participants establish common understanding
- For each understood concept, choose among 3 possible actions:
  - Explicit confirmation: ask the user a direct question, wait for a positive response before accepting
    - “To the airport. Is this correct?”
  - Implicit confirmation: repeat what was understood, accept unless the user indicates it was wrong
    - “To the airport. Where are you leaving from?”
  - No action: silently accept without informing the user
- Best choice can be situationally dependent
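One simple way to pick among the three actions is to threshold the concept's confidence score. The thresholds below are illustrative assumptions for the sketch, not values prescribed by Olympus:

```python
# Threshold-based grounding action selection (illustrative sketch).
# Low confidence -> explicit confirmation, medium -> implicit, high -> accept.
EXPLICIT_THRESHOLD = 0.5   # below this, ask a direct yes/no question
IMPLICIT_THRESHOLD = 0.9   # below this, echo the value back implicitly

def grounding_action(conf: float) -> str:
    """Pick one of the three grounding actions for an understood concept."""
    if conf < EXPLICIT_THRESHOLD:
        return "explicit_confirm"   # "To the airport. Is this correct?"
    if conf < IMPLICIT_THRESHOLD:
        return "implicit_confirm"   # "To the airport. Where are you leaving from?"
    return "accept"                 # silently accept without informing the user
```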
Enabling Confirmation in Olympus
Special Rosetta template prompts:
$Rosetta::Templates::act{"implicit_confirm"} = {
  "origin_place" => "Leaving from <origin_place>.",
  …
}
Write Prompts → Create Policies → Attach Policies to Concepts
Enabling Confirmation in Olympus
Confirmation policies config file: (Configurations/DesktopSAPI/expl_impl.pol)
EXPLORATION_MODE=epsilon-greedy
EXPLORATION_PARAMETER=0.1

             ACCEPT   EXPL_CONF   IMPL_CONF
INACTIVE     1        -           -
CONFIDENT    -        5           10
UNCONFIDENT  -        10          5
GROUNDED     1        -           -

Write Prompts → Create Policies → Attach Policies to Concepts
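The epsilon-greedy exploration mode named in this config can be sketched as follows: with probability epsilon, try a random eligible action; otherwise take the highest-utility action for the current grounding state. The helper names are hypothetical; the utility table mirrors expl_impl.pol.

```python
import random

# Utility table transcribed from the expl_impl.pol example above.
UTILITIES = {
    "INACTIVE":    {"ACCEPT": 1},
    "CONFIDENT":   {"EXPL_CONF": 5, "IMPL_CONF": 10},
    "UNCONFIDENT": {"EXPL_CONF": 10, "IMPL_CONF": 5},
    "GROUNDED":    {"ACCEPT": 1},
}

def choose_action(state: str, epsilon: float = 0.1, rng=random) -> str:
    """Epsilon-greedy: explore a random eligible action with prob. epsilon,
    otherwise exploit the highest-utility action for this state."""
    actions = UTILITIES[state]
    if rng.random() < epsilon:
        return rng.choice(sorted(actions))   # explore
    return max(actions, key=actions.get)     # exploit
```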
Enabling Confirmation in Olympus
When defining concepts in the dialog manager, indicate which policy to apply:
DEFINE_AGENCY( CPerformTask,
  DEFINE_CONCEPTS(
    INT_USER_CONCEPT(query_type, "impl")
    STRING_USER_CONCEPT(origin_place, "expl_impl")
    …
  )
  …
)
Write Prompts → Create Policies → Attach Policies to Concepts
Each entry gives the concept name and the policy name to apply to it.
Handling Non-Understandings
- No meaning can be extracted from user input:

  S: Where do you want to go?
  U: (no parse)
  S: ???

- Many possible system responses:

  S: Where do you want to go?
  S: Could you repeat that?
  S: For example, you can say, “Downtown”.
  S: Which route are you looking for?
  …
Non-Understanding Policies
- Repeat question
  - May work if a temporary channel issue caused ASR problems
  - Frustrating to user if continued
- Provide example of what to say
  - Can assist unfamiliar users
  - Annoys users who already said the example and weren’t understood
- Change topic
  - Gets user to talk about something else
  - Still have to get the original question answered
Non-Understanding Policies
- Two types of policies:
  - Handcrafted/deterministic
    - Design a (small) space of dialog states
    - Set a utility for each action in each state
  - Data-driven
    - Learn optimal weights based on collected dialogs
    - Exploration/exploitation trade-off at runtime
Using Non-Understanding Strategies

Create alternative versions of request template prompts:

"origin_place" => "Where are you leaving from?",

Write Prompts → Create Policies → Attach Policies to Agents
Using Non-Understanding Strategies
Create alternative versions of request template prompts:
"origin_place" => {
  "default"        => "Where are you leaving from?",
  "explain_more"   => "Right now, I need to know the stop or landmark from which you will be leaving.",
  "what_can_i_say" => "For example, you can say THE AIRPORT, or DOWNTOWN.",
},
Write Prompts → Create Policies → Attach Policies to Agents
Using Non-Understanding Strategies
Non-understanding policies config file: (Configurations/DesktopSAPI/repeat.pol)
EXPLORATION_MODE=greedy

STATE[FIRST_NONU]
  num_prev_nonu = 1
END

STATE[SUBSEQUENT_NONU]
  num_prev_nonu > 1
END
Write Prompts → Create Policies → Attach Policies to Agents
Using Non-Understanding Strategies
Non-understanding policies config file: (Configurations/DesktopSAPI/repeat.pol)
                 AREP   TYCS
FIRST_NONU       1      -
SUBSEQUENT_NONU  -      1
Write Prompts → Create Policies → Attach Policies to Agents
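Under the greedy mode in repeat.pol, action selection reduces to a lookup: ask the user to repeat (AREP) on the first non-understanding, and tell the user what they can say (TYCS) on subsequent ones. A sketch with hypothetical helper names:

```python
# Utilities mirror the repeat.pol table: AREP for the first
# non-understanding, TYCS for every one after that.
UTILITIES = {
    "FIRST_NONU":      {"AREP": 1},
    "SUBSEQUENT_NONU": {"TYCS": 1},
}

def nonu_state(num_prev_nonu: int) -> str:
    """Map the count of previous non-understandings to a policy state."""
    return "FIRST_NONU" if num_prev_nonu == 1 else "SUBSEQUENT_NONU"

def nonu_action(num_prev_nonu: int) -> str:
    """Greedy selection: take the highest-utility action for the state."""
    actions = UTILITIES[nonu_state(num_prev_nonu)]
    return max(actions, key=actions.get)
```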
Using Non-Understanding Strategies
When defining subagents in the dialog manager, indicate which policy to apply:
DEFINE_SUBAGENTS(
  SUBAGENT(RequestQuery, CRequestQuery, "tycs")
  SUBAGENT(RequestOriginPlace, CRequestOriginPlace, "arep_tycs")
  …
)
Write Prompts → Create Policies → Attach Policies to Agents
Each entry gives the agent name, class name, and policy name.
Dialog Management
- Fundamentally, it is a task that:
  - Given some input, decides what output should be generated
- How can this decision be made?
  - Manually written rules
    - Olympus uses this approach in general: you write the dialog task specification
  - Or…?
Statistical Dialog Decision Making
- POMDP: partially observable Markov decision process
- Instead of a single explicit dialog state, track a probability distribution over all possible states
- Update probability mass as the dialog progresses
- Choose the next action based on the likelihood of being in a given state
- Probabilities generated using confidence scores from the ASR, parser, etc.
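The belief update at the heart of this can be sketched as a simple Bayes step, ignoring state transitions for clarity. The states, likelihoods, and threshold below are illustrative assumptions, not values from any deployed system:

```python
# Minimal belief-state update for a POMDP-style dialog manager.
# Observation likelihoods stand in for ASR/parser confidence scores.
def update_belief(belief, obs_likelihood):
    """Bayes update: b'(s) is proportional to P(obs | s) * b(s), renormalized."""
    unnormalized = {s: obs_likelihood.get(s, 0.0) * p for s, p in belief.items()}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

def best_action(belief, threshold=0.9):
    """Act on the most likely state; confirm first if belief is not yet strong."""
    state, prob = max(belief.items(), key=lambda kv: kv[1])
    return ("answer", state) if prob >= threshold else ("confirm", state)

# Uniform prior over two candidate destinations, then one utterance the
# recognizer scores as more likely to mean "airport".
belief = {"airport": 0.5, "downtown": 0.5}
belief = update_belief(belief, {"airport": 0.8, "downtown": 0.2})
```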
POMDP Dialog Design
Hybrid Approach
- Possible to improve a hand-generated dialog manager by also using a statistical approach
- Take a “normal” set of dialog management rules
- Treat these rules as a belief network
- Add confidence scores as with a POMDP approach
- At decision time, apply all information (original handwritten rules, plus state likelihoods) to determine the next state in the dialog
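One way to picture the hybrid idea: keep the handwritten rules, but weight each rule's proposed next state by the belief that its precondition actually holds. The rule set, state names, and scores below are invented for this sketch:

```python
# Hybrid decision sketch: handwritten rules scored by belief in their
# preconditions. All names and numbers here are illustrative assumptions.
def hybrid_next_state(rules, belief):
    """rules: list of (precondition_state, next_state) pairs.
    belief: P(precondition_state). Returns the best-supported next state."""
    scores = {}
    for precondition, next_state in rules:
        score = belief.get(precondition, 0.0)
        if score > scores.get(next_state, 0.0):
            scores[next_state] = score
    return max(scores, key=scores.get)

# Two handwritten rules, plus a belief state built from confidence scores.
rules = [("have_destination", "ask_origin"), ("have_nothing", "ask_destination")]
belief = {"have_destination": 0.7, "have_nothing": 0.3}
```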
Summary
- Spoken dialog systems integrate several speech and language technologies to allow conversational machines
- Many different application types:
  - Information giving, question answering, interactive machine personalities
- Several types of dialog:
  - System initiative, mixed initiative, classification
Summary
- Olympus: a complete spoken dialog framework
  - Provides all components needed to build an SDS
  - Modular, allows drop-in replacement of any component
- RavenClaw: the Olympus dialog manager
  - Plan-based hierarchical dialog structure
  - Dynamic task modification
  - Provides advanced dialog features
    - Help, error handling/recovery, confirmation strategies
  - Allows for more complex dialog tasks than simpler frameworks (like VoiceXML)
Summary
- Error detection and recovery
  - Misunderstandings
    - Confirmation: explicit vs. implicit
  - Non-understandings
    - Recovery approaches: a mixture of several choices is most helpful
    - Selecting which can be done heuristically or trained from data
- Dialog decision making
  - Handcrafted rules and heuristics
  - Statistical approaches