Speech Processing 15-492/18-492 Spoken Dialog Systems Beyond basic - - PowerPoint PPT Presentation



SLIDE 1

Speech Processing 15-492/18-492

Spoken Dialog Systems

  • Beyond basic dialogs
  • Building your own dialogs

SLIDE 2

Back-channeling

  • Human response to speech
  • Robots don’t really do this
  • Uhms, errs: filler words
  • Yeah, uh-huh, hm hm, right, okay
  • Typically words *not* in the lexicon
  • Prosody delivery is important
  • Timing is important

SLIDE 3

Back-channel Example

H: It is like a party, like, “rave” type party or like
C: well, it’s someone’s house
H: yeah
C: there’s going to be, I mean there’s like, they’re going to be spinning. So, in that sense, maybe, but it’s just at someone’s house, like
H: yah-yeah
C: It’s in the middle of the night, that, too, but

(from Nigel Ward UTEP)

SLIDE 4

Timing

  • Replies happen before question ends
  • Humans can guess when turn is ending
  • Combination of semantics, prosody (and arrogance)
  • Human-machine dialogs more restricted

SLIDE 5

Gesture and Gaze

  • What you look at when talking
  • What the machine should look at
  • Talking to the machine vs talking to your friend

SLIDE 6

Laughter

  • Most common non-verbal vocal production
  • Should machines laugh?
  • Yes to fit in with the other participants
  • Laughing takes different forms
  • Near verbal (ha ha)
  • Vocal but unlike speech
  • Subvocal
  • Overlaid on speech

SLIDE 7

Participant in Meeting

  • Machine participants in meetings
  • At least follow the speaker
  • Know when to agree/laugh etc
  • Know when it can speak
      • Needs to watch how people interact

SLIDE 8

Machine assistant

  • Needs to watch what you do
  • When are you busy
  • When are you interruptible
  • What is the importance of the information
  • (Cell phone just rings, no matter where you are)
  • Look at human brain state
  • Find when you are thinking
  • Busy, thinking, dreaming
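
One way to read the bullets above is as a decision rule: interrupt only when the importance of the information outweighs the cost of breaking into what the user is doing. The sketch below is a minimal illustration under assumptions that are not from the lecture: the `UserState` names, the numeric costs in `INTERRUPT_COST`, and the `should_interrupt` function are all hypothetical.

```python
from enum import Enum


class UserState(Enum):
    """Hypothetical user states, following the slide's 'busy, thinking, dreaming'."""
    BUSY = "busy"
    THINKING = "thinking"
    DREAMING = "dreaming"


# Assumed cost of interrupting in each state (0.0 = free, 1.0 = never interrupt).
# These numbers are invented for illustration only.
INTERRUPT_COST = {
    UserState.BUSY: 0.9,
    UserState.THINKING: 0.6,
    UserState.DREAMING: 0.1,
}


def should_interrupt(state: UserState, importance: float) -> bool:
    """Interrupt only if the message importance (0.0 to 1.0) exceeds the state's cost."""
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be between 0.0 and 1.0")
    return importance > INTERRUPT_COST[state]
```

A cell phone that "just rings" corresponds to a cost of 0.0 for every state; the point of the sketch is only that the state and the importance both enter the decision.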

SLIDE 9

How do humans interact with machines

  • Look at human-human calls
  • “Pretend” they are talking to a machine
  • “Wizard of Oz” (WOZ)
  • Have a human play a machine
  • Need to constrain the human
      • Give them “robotic” voice
      • Constrain their options

SLIDE 10

Building a New Dialog System

  • What will it do?
  • Write down a typical dialog
  • No, *really* write down a typical dialog
  • Write a second (simpler) one
  • Look at human-human dialogs
  • What information is being passed
  • Can you avoid the hard ASR parts
      • (Avoid large numbers of names)

SLIDE 11

Breaking down the task

  • What is the ontology
  • What entity types must you deal with
      • e.g. Busses, times, bus stops
  • How will people say them
      • List *many* yourself and ask others
  • How should your system say them
      • Consistently, and in a way that’s easy to recognize
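
The three questions above (which entity types, how people say them, how the system says them) can be written down as a small data structure. The following is a minimal sketch, not the course's implementation: the `EntityType` class, its field names, and the bus stop value `FORBES_AT_MURRAY` with its phrasings are made-up examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntityType:
    """One entity type in the task ontology (hypothetical structure)."""
    name: str
    # Canonical value -> the many ways users might say it.
    user_forms: dict[str, list[str]] = field(default_factory=dict)
    # Canonical value -> the single form the system always uses when speaking.
    system_form: dict[str, str] = field(default_factory=dict)

    def normalize(self, utterance: str) -> str | None:
        """Map a user phrase to its canonical value, or None if it is unknown."""
        text = utterance.strip().lower()
        for canonical, forms in self.user_forms.items():
            if text in (form.lower() for form in forms):
                return canonical
        return None


# Made-up example entry for a bus stop.
bus_stop = EntityType(
    name="bus_stop",
    user_forms={
        "FORBES_AT_MURRAY": [
            "forbes and murray",
            "murray and forbes",
            "forbes at murray",
        ],
    },
    system_form={"FORBES_AT_MURRAY": "Forbes at Murray"},
)
```

Keeping many user forms but exactly one system form per value mirrors the slide: accept variety from users, speak consistently yourself.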

SLIDE 12

Breaking down the task

  • What is the flow of the dialog
  • How should you order the questions
  • Should you allow multiple orders
  • Is this ordering reasonable for your users
      • Ask others, you are too close to the task
  • Test with your written down dialogs
      • (You did write them down, didn’t you?)
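
One common way to have a default question order while still allowing multiple orders is slot filling: the system asks for the first missing slot, but accepts any slot the user volunteers. The sketch below assumes a hypothetical bus-information task; the slot names, prompts, and the functions `next_prompt` and `update` are illustrative and not taken from the lecture.

```python
from __future__ import annotations

# Hypothetical slots for a bus-information task, in the default question order.
SLOT_ORDER = ["origin", "destination", "time"]

PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "time": "When do you want to travel?",
}


def next_prompt(filled: dict[str, str]) -> str | None:
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for slot in SLOT_ORDER:
        if slot not in filled:
            return PROMPTS[slot]
    return None


def update(filled: dict[str, str], parsed: dict[str, str]) -> dict[str, str]:
    """Merge newly understood slot values, ignoring slots the task does not define."""
    known = {slot: value for slot, value in parsed.items() if slot in PROMPTS}
    return {**filled, **known}
```

If the user opens with the destination, `next_prompt` still asks for the origin next, so the written-down dialogs can be replayed in either order.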

SLIDE 13

Writing grammars

  • Write grammars for what response
  • Test them with multiple examples
  • (Get others too if you can)
  • Test it with text.
  • ASR will have errors
  • Test by typing first, easier to debug
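
Testing a grammar with typed text before adding ASR might look like the sketch below. It uses plain regular expressions as a stand-in for whatever grammar formalism the real system uses; the `TIME_GRAMMAR` rules, the tags, and `parse_time` are assumptions made for illustration.

```python
import re

# Hypothetical grammar for answers to "When do you want to travel?".
# Each rule pairs a pattern over typed text with a semantic tag.
TIME_GRAMMAR = [
    (re.compile(r"^(right )?now$"), "NOW"),
    (re.compile(r"^(at )?(?P<hour>1[0-2]|[1-9]) ?(?P<ampm>am|pm)$"), "CLOCK_TIME"),
    (re.compile(r"^(the )?next (bus|one)$"), "NEXT"),
]


def parse_time(text: str):
    """Return (tag, named groups) for the first matching rule, else None."""
    cleaned = text.strip().lower()
    for pattern, tag in TIME_GRAMMAR:
        match = pattern.match(cleaned)
        if match:
            return tag, match.groupdict()
    return None


# Typed test sentences: list many yourself, then ask others for more.
for sentence in ["at 5 pm", "Right now", "the next one", "tomorrow-ish"]:
    print(repr(sentence), "->", parse_time(sentence))
```

Sentences that come back as `None` show where the grammar needs another rule, and that is much easier to see with typed input than with recognition errors mixed in.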

SLIDE 14

Testing the dialog

  • Check for one dialog you know works
  • Test it in the system
  • Modify your grammar/dialog accordingly
  • Then try the variations
  • Get others to test it
  • Does it do the task you expect
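
The "one dialog you know works" can be kept as a script and replayed after every change to the grammar or dialog. The sketch below is one possible harness, assuming the system can be called as a function from typed user text to a reply; `replay`, `Script`, and `toy_system` are hypothetical names, and `toy_system` is only a stand-in for a real dialog system.

```python
from typing import Callable, List, Tuple

# A written-down dialog: (what the user types, what the system should reply).
Script = List[Tuple[str, str]]


def replay(script: Script, respond: Callable[[str], str]) -> List[str]:
    """Feed each user turn to the system and collect descriptions of mismatches."""
    failures = []
    for turn, (user_text, expected) in enumerate(script, start=1):
        actual = respond(user_text)
        if actual != expected:
            failures.append(f"turn {turn}: expected {expected!r}, got {actual!r}")
    return failures


def toy_system(user_text: str) -> str:
    """Stand-in for a real dialog system; purely illustrative."""
    if "bus" in user_text.lower():
        return "Where are you leaving from?"
    return "Sorry, I did not understand."


known_good = [
    ("I need a bus", "Where are you leaving from?"),
    ("hello", "Sorry, I did not understand."),
]
print(replay(known_good, toy_system) or "all turns matched")
```

Once the known-good script passes, variations can be added as further scripts, including ones written by other people.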

SLIDE 15

Help

  • Try to be consistent and concise
  • Give good examples of what to say
  • Give multiple levels of help
  • Nobody will listen ….
  • Test your help advice
  • Is it really useful?
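
Multiple levels of help can be as simple as a list of prompts that get more explicit each time the user asks again. This is a minimal sketch under assumptions: the prompt texts in `HELP_LEVELS` and the function `help_prompt` are invented examples, not prompts from a real system.

```python
# Hypothetical help prompts, from shortest to most explicit.
HELP_LEVELS = [
    "You can say a bus stop name.",
    "For example, say 'Forbes and Murray'.",
    "Say a street corner, such as 'Forbes and Murray', or say 'start over'.",
]


def help_prompt(requests_so_far: int) -> str:
    """Return the help text given how many times help was already requested.

    Later requests get more detailed help; the last level repeats once reached.
    """
    if requests_so_far < 0:
        raise ValueError("requests_so_far must be non-negative")
    level = min(requests_so_far, len(HELP_LEVELS) - 1)
    return HELP_LEVELS[level]
```

Keeping the first level short reflects the slide's warning that nobody will listen to long help messages; each level should still be tested with real users.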

SLIDE 16

SLIDE 17

SDS Architecture