SLIDE 1
Speech Processing 15-492/18-492
Spoken Dialog Systems
Beyond basic dialogs
Building your own dialogs
SLIDE 2
Back-channeling
- Human response to speech
- Robots don’t really do this
- Uhms, errs: filler words
- Yeah, uh-huh, hm hm, right, okay
- Typically words *not* in the lexicon
- Prosody delivery is important
- Timing is important
SLIDE 3
Back-channel Example
H: It is like a party, like, “rave” type party or like
C: well, it’s someone’s house
H: yeah
C: there’s going to be, I mean there’s like, they’re going to be spinning. So, in that sense, maybe, but it’s just at someone’s house, like
H: yah-yeah
C: It’s in the middle of the night, that, too, but
(from Nigel Ward, UTEP)
SLIDE 4
Timing
- Replies happen before question ends
- Humans can guess when turn is ending
- Combination of semantics, prosody (and arrogance)
- Human-machine dialogs more restricted
SLIDE 5
Gesture and Gaze
- What you look at when talking
- What the machine should look at
- Talking to the machine vs talking to your friend
SLIDE 6
Laughter
- Most common non-verbal vocal production
- Should machines laugh?
- Yes, to fit in with the other participants
- Laughing takes different forms
- Near verbal (ha ha)
- Vocal but unlike speech
- Subvocal
- Overlaid on speech
SLIDE 7
Participant in Meeting
- Machine participants in meetings
- At least follow the speaker
- Know when to agree/laugh etc.
- Know when it can speak
  - Needs to watch how people interact
SLIDE 8
Machine assistant
- Needs to watch what you do
- When are you busy
- When are you interruptible
- What is the importance of the information
- (Cell phone just rings, no matter where you are)
- Look at human brain state
- Find when you are thinking
- Busy, thinking, dreaming
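The trade-off between how occupied the user is and how important the information is can be sketched as a simple comparison. This is only an illustration: the `UserState` levels, the 0 to 3 importance scale, and the `should_interrupt` helper are assumptions, not something the slides define.

```python
from enum import IntEnum


class UserState(IntEnum):
    # Hypothetical levels of how occupied the user is.
    IDLE = 0
    THINKING = 1
    BUSY = 2


def should_interrupt(state, importance):
    """Interrupt only when importance (assumed scale 0-3) exceeds the user's load."""
    return importance > int(state)
```

Under these assumptions a low-importance message reaches an idle user but waits while the user is busy, unlike a cell phone that rings regardless.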
SLIDE 9
How do humans interact with machines
- Look at human-human calls
- “Pretend” they are talking to a machine
- “Wizard of Oz” (WOZ)
- Have a human play a machine
- Need to constrain the human
  - Give them “robotic” voice
  - Constrain their options
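One way to constrain the wizard's options is to let them pick only from a fixed menu of canned replies. The sketch below assumes a hypothetical menu (`WIZARD_OPTIONS`) and helper (`wizard_reply`); the prompts are made up for illustration.

```python
# Hypothetical menu: the wizard may only choose one of these replies.
WIZARD_OPTIONS = {
    "1": "Where are you leaving from?",
    "2": "Where are you going?",
    "3": "Sorry, I did not understand that.",
}


def wizard_reply(choice):
    """Return the canned reply for a menu key; reject anything free-form."""
    if choice not in WIZARD_OPTIONS:
        raise ValueError(f"not an allowed option: {choice!r}")
    return WIZARD_OPTIONS[choice]
```

Because free-form text is rejected, the human wizard cannot be more flexible than the planned system would be.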
SLIDE 10
Building a New Dialog System
- What will it do?
- Write down a typical dialog
- No, *really* write down a typical dialog
- Write a second (simpler) one
- Look at human-human dialogs
- What information is being passed
- Can you avoid the hard ASR parts
  - (Avoid large numbers of names)
SLIDE 11
Breaking down the task
- What is the ontology
- What entity types must you deal with
  - e.g. buses, times, bus stops
- How will people say them
  - List *many* yourself and ask others
- How should your system say them
  - Consistently, and in a way that’s easy to recognize
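The ontology can be written down as a small table: each entity type lists its canonical values and the ways people might say them. This is a minimal sketch for a bus-information task; the entity values, the spoken variants, and the `normalize` helper are illustrative assumptions.

```python
# Hypothetical ontology: entity type -> canonical value -> spoken variants.
ONTOLOGY = {
    "route": {
        "61C": ["61c", "sixty one c", "the 61c"],
        "28X": ["28x", "twenty eight x", "the 28x"],
    },
    "stop": {
        "Forbes and Murray": [
            "forbes and murray",
            "murray and forbes",
            "forbes at murray",
        ],
    },
}


def normalize(entity_type, utterance):
    """Map something a person said to its canonical value, or None if unknown."""
    text = utterance.lower().strip()
    for canonical, variants in ONTOLOGY[entity_type].items():
        if text in variants:
            return canonical
    return None
```

The system would then always speak the canonical value, which keeps its own wording consistent however the user phrased it.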
SLIDE 12
Breaking down the task
- What is the flow of the dialog
- How should you order the questions
- Should you allow multiple orders
- Is this ordering reasonable for your users
  - Ask others, you are too close to the task
- Test with your written down dialogs
  - (You did write them down, didn’t you?)
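A default question order that still tolerates answers arriving in a different order can be sketched as slot filling: ask about the first slot that is still empty. The slot names and prompts here are assumptions chosen for illustration.

```python
# Hypothetical slots, listed in the default order the system asks about them.
QUESTION_ORDER = ["origin", "destination", "time"]

PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "time": "When do you want to travel?",
}


def next_prompt(filled):
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for slot in QUESTION_ORDER:
        if slot not in filled:
            return PROMPTS[slot]
    return None
```

If a user volunteers the time early, the system simply skips that question later instead of forcing its own order.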
SLIDE 13
Writing grammars
- Write grammars for each expected response
- Test them with multiple examples
- (Get others too if you can)
- Test it with text.
- ASR will have errors
- Test by typing first, easier to debug
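Testing a grammar with typed text might look like the sketch below. It stands in a regular expression for a real recognizer grammar and assumes a hypothetical time-of-day response; `TIME_GRAMMAR` and `parse_time` are illustrative names.

```python
import re

# Hypothetical grammar for the response to "When do you want to travel?"
TIME_GRAMMAR = re.compile(
    r"^(?:at |around )?"
    r"(?P<hour>1[0-2]|[1-9])"
    r"(?::(?P<minute>[0-5][0-9]))?"
    r" ?(?P<ampm>am|pm)$"
)


def parse_time(text):
    """Return (hour, minute, 'am' or 'pm') if the typed response matches, else None."""
    match = TIME_GRAMMAR.match(text.lower().strip())
    if match is None:
        return None
    return int(match["hour"]), int(match["minute"] or 0), match["ampm"]
```

Typing many variants at this function shows which phrasings the grammar misses before any recognition errors are added on top.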
SLIDE 14
Testing the dialog
- Check for one dialog you know works
- Test it in the system
- Modify your grammar/dialog accordingly
- Then try the variations
- Get others to test it
- Does it do the task you expect
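Checking the one dialog you know works can be automated by replaying the written-down dialog turn by turn. This is a sketch only: `toy_system` is a stand-in for a real dialog system, and the script contents are made up.

```python
def toy_system(state, user_text):
    """Stand-in dialog system (hypothetical): asks for origin, then destination."""
    if user_text is not None:
        if "origin" not in state:
            state["origin"] = user_text
        elif "destination" not in state:
            state["destination"] = user_text
    if "origin" not in state:
        return "Where are you leaving from?"
    if "destination" not in state:
        return "Where are you going?"
    return f"Looking up buses from {state['origin']} to {state['destination']}."


# The written-down dialog as (user turn, expected system reply) pairs.
SCRIPT = [
    (None, "Where are you leaving from?"),
    ("downtown", "Where are you going?"),
    ("the airport", "Looking up buses from downtown to the airport."),
]


def replay(system, script):
    """Return the index of the first turn whose reply differs, or None if all match."""
    state = {}
    for index, (user_text, expected) in enumerate(script):
        if system(state, user_text) != expected:
            return index
    return None
```

Rerunning the replay after each grammar or dialog change confirms the known-good dialog still works before trying variations.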
SLIDE 15
Help
- Try to be consistent and concise
- Give good examples of what to say
- Give multiple levels of help
- Nobody will listen ….
- Test your help advice
- Is it really useful?
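Multiple levels of help can be sketched as a list of prompts that grow more detailed each time the user asks again. The prompt wording and the `help_prompt` helper are assumptions for illustration.

```python
# Hypothetical help prompts, from most concise to most detailed.
HELP_LEVELS = [
    "You can say a bus number, like 61C.",
    "Say just the bus number. For example: 61C, or 28X.",
    "I can only give schedule times. Say a bus number such as 61C, or say 'start over'.",
]


def help_prompt(requests_so_far):
    """Return more detailed help on each request, staying at the last level."""
    level = min(requests_so_far, len(HELP_LEVELS) - 1)
    return HELP_LEVELS[level]
```

Starting with the shortest prompt respects the point that few users listen to long help, while repeated requests still reach the fuller explanation.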