Speech Processing 15-492/18-492: Spoken Dialog Systems, Conversing with Machines (PowerPoint presentation)

SLIDE 1

Speech Processing 15-492/18-492

Spoken Dialog Systems Conversing with machines

SLIDE 2

Spoken Dialog Systems

  • Not just ASR bolted onto TTS
  • Different styles of interaction
    • IVR/tree question/response systems
    • Mixed initiative systems
    • “How May I Help You?” open questions
    • True conversational machine-human interaction
  • Strings of characters to words

SLIDE 3

SDS Overview

  • Introduction
  • Building simple dialog systems
  • VoiceXML
    • A language for writing systems
  • Beyond tree-based systems
    • CMU’s Olympus systems
  • Real-world deployment considerations

SLIDE 4

SDS Applications

  • Information giving
    • Flights, buses, stocks, weather
    • Driving directions
    • News
  • Information navigators
    • Read your mail
    • Search the web
    • Answer questions
  • Provide personalities
    • Game characters (NPCs), toys, robots
  • Speech-to-speech translation
    • Cross-lingual interaction

SLIDE 5

Dialog Types

  • System initiative
    • Form-filling paradigm
    • Can switch language models at each turn
    • Can “know” what is likely to be said
  • Mixed initiative
    • Users can go where they like
    • System or user can lead the discussion
  • Classifying
    • Users can say what they like
    • But really only “N” operations are possible
    • E.g. AT&T’s “How May I Help You?”

SLIDE 6

System Initiative

  • Most common
  • Machine controls the call
  • Few choices in the dialog
  • Simple form filling:
    • “What is your bank account number?”
  • Advantages:
    • You know what users will say (sort of)
    • Hard for the user to get confused
    • Hard for the system to get confused
    • Easy to build
  • Disadvantages:
    • Limited flexibility in interaction
    • Fixed dialog structure
  • Most reliable, but many turns
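A system-initiative, form-filling dialog can be sketched as a fixed list of slots, each with its own prompt and its own small vocabulary, which mimics "switching language models at each turn". This is a minimal illustration, not the course's implementation: it assumes the recognizer has already produced text, and the slot names, prompts and vocabularies are hypothetical.

```python
# Minimal system-initiative (form-filling) dialog manager.
# Assumption: each user turn is already-recognized text; the slots,
# prompts and vocabularies below are hypothetical examples.

FORM = [
    # (slot name, prompt, words accepted at this turn only)
    ("departure", "Where are you leaving from?", {"downtown", "oakland", "airport"}),
    ("arrival", "Where are you going?", {"downtown", "oakland", "airport"}),
    ("time", "When do you want to travel?", {"morning", "afternoon", "evening"}),
]


def run_form(user_turns):
    """Ask for each slot in a fixed order, re-prompting on out-of-vocabulary replies.

    Returns (filled_slots, transcript). Raises StopIteration if the
    supplied user turns run out before the form is complete.
    """
    replies = iter(user_turns)
    filled = {}
    transcript = []
    for slot, prompt, vocab in FORM:
        while slot not in filled:
            transcript.append("SYSTEM: " + prompt)
            reply = next(replies).strip().lower()
            transcript.append("USER: " + reply)
            # Only this turn's vocabulary is active, like a per-turn grammar.
            if reply in vocab:
                filled[slot] = reply
            else:
                transcript.append("SYSTEM: Sorry, I didn't get that.")
    return filled, transcript


filled, transcript = run_form(["Oakland", "to the beach", "downtown", "evening"])
print(filled)
```

Because the system always knows which slot it just asked about, an unexpected answer such as "to the beach" is simply rejected and the same question is asked again. That is why this style is easy to build and hard to confuse, but it costs extra turns.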

SLIDE 7

System Initiative

  • Let’s Go Bus Information
    • 412 442 2000 (evenings)
    • Provides bus information for Pittsburgh East End (61x, 5[469]x)
  • Tell Me
    • Company getting others to build systems
    • Stocks, weather, entertainment
    • 1 800 555 8355

SLIDE 8

Mixed Initiative

  • User or system takes initiative
  • More interesting dialogs
  • Can “jump” through different parts of the dialog state
  • Advantages
    • More realistic dialog
    • Can do more complex tasks
  • Disadvantages
    • Can get confusing
    • Can miss important parts
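In a mixed-initiative dialog the user may fill any slot at any time, so the manager merges whatever it understood and only asks for what is still missing. The sketch below assumes a toy regular-expression "parser" and the same hypothetical bus-information slots as a simple example. Real systems (for instance, CMU's Olympus-based ones) use much richer grammars and dialog state tracking.

```python
import re

# Hypothetical vocabularies and prompts for a toy bus-information task.
PLACES = ("downtown", "oakland", "airport")
TIMES = ("morning", "afternoon", "evening")
PROMPTS = {
    "departure": "Where are you leaving from?",
    "arrival": "Where are you going?",
    "time": "When do you want to travel?",
}
PLACE_RE = "(" + "|".join(PLACES) + ")"


def extract_slots(utterance, expected=None):
    """Pull out any slots the user mentions, in any order (toy pattern matcher)."""
    text = utterance.lower()
    found = {}
    m = re.search(r"\bfrom (?:the )?" + PLACE_RE + r"\b", text)
    if m:
        found["departure"] = m.group(1)
    m = re.search(r"\bto (?:the )?" + PLACE_RE + r"\b", text)
    if m:
        found["arrival"] = m.group(1)
    for t in TIMES:
        if t in text:
            found["time"] = t
    # A bare answer ("Oakland") fills the slot the system just asked about.
    if not found and expected in PROMPTS:
        vocab = TIMES if expected == "time" else PLACES
        for word in vocab:
            if word in text:
                found[expected] = word
                break
    return found


def next_action(state, utterance, expected=None):
    """Merge what the user said into `state`, then choose the next question."""
    state.update(extract_slots(utterance, expected))
    for slot in PROMPTS:  # dicts keep insertion order (Python 3.7+)
        if slot not in state:
            return slot, PROMPTS[slot]
    return None, "Let me look that up for you."


state = {}
print(next_action(state, "I want to go to the airport tomorrow evening"))
print(next_action(state, "Oakland", expected="departure"))
```

The first utterance fills two slots at once, so the system skips straight to the one question it still needs. The flip side, as the slide warns, is that a loose parser can misassign or miss information.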

SLIDE 9

Vera

SLIDE 10

Classification Dialogs

  • Sort requests into one of N things
  • User says “anything” and system directs them
  • Receptionist
    “I have a problem with my bill”
    “What’s the area code for Miami?”
    “Did you know I can see the beach from here?”
  • Advantages
    • (Apparently) complex understanding
    • Solves a very common task
  • Disadvantages
    • Actually quite restrictive
    • Needs data to train from
    • Needs to be updated
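A classification dialog maps an open-ended utterance onto one of N destinations and falls back to a human when nothing matches. As the slide notes, real systems train this classifier from data. The sketch below replaces training with hand-written keyword sets, which are hypothetical, so that it stays self-contained. It routes the three receptionist examples from the slide.

```python
import re

# Hypothetical destinations and keyword sets; a deployed router would
# learn this mapping from transcribed calls instead.
ROUTES = {
    "billing": {"bill", "charge", "charged", "payment", "overcharged"},
    "directory": {"area", "code", "number", "listing"},
    "repair": {"broken", "static", "dead", "repair", "noise"},
}


def route(utterance, min_hits=1):
    """Pick the destination whose keywords overlap the utterance most.

    Returns "operator" when no destination reaches `min_hits` keywords.
    """
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {dest: len(words & keys) for dest, keys in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else "operator"


for said in ["I have a problem with my bill",
             "What's the area code for Miami",
             "Did you know I can see the beach from here"]:
    print(said, "->", route(said))
```

The user can say "anything", but the system only ever does one of N things. This is why such dialogs appear to show complex understanding while actually being quite restrictive, and why the keyword sets (or training data) need to be updated as requests change.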

SLIDE 11

Beyond Telephones

  • Telematics
    • Voice communication in cars
    • GPS, music selection, etc.
  • Robot interaction
    • Robot-robot and robot-human interaction
  • Animated talking head
    • Non-player characters, web agents
  • Speech-to-speech translation

SLIDE 12

Team Talk

  • Using speech to control multiple robots
  • Robots have names and distinct voices
  • They report to each other and to you in voice
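When several robots share one speech channel, each command has to be routed to the robot being addressed. The slide does not say how Team Talk does this. The sketch below assumes one simple convention, a leading "Name," vocative, with hypothetical robot names and a hypothetical "everyone" broadcast word.

```python
# Hypothetical robot names; the addressing convention is an assumption.
ROBOTS = ("alice", "bob", "clyde")
BROADCAST = ("everyone", "all robots")


def address(utterance, robots=ROBOTS):
    """Split 'Name, command' into (list of addressed robots, command).

    Returns an empty list when no known robot (or broadcast word) is named,
    leaving the caller to ask "Who is that for?" or use a default robot.
    """
    head, sep, rest = utterance.partition(",")
    name = head.strip().lower()
    if not sep:
        return [], utterance.strip()
    if name in BROADCAST:
        return list(robots), rest.strip()
    if name in robots:
        return [name], rest.strip()
    return [], utterance.strip()


print(address("Alice, go to the door"))
print(address("Everyone, stop"))
```

Giving each robot a distinct name (and, on output, a distinct voice) lets both the human and the other robots tell who is speaking to whom without a separate control interface.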

SLIDE 13

USI

  • Having lots of different interfaces is confusing
  • Try to have general expectations and discover
  • Try for some level of standardization
    • (like programming applications: the File menu)

SLIDE 14

True conversation

  • Requires more than just speech
  • Non-verbal noises: laughing, er, um, etc.
  • Eye gaze
  • Proper timing (not waiting 500ms before speaking)
  • Back-channeling
  • Movement
  • Talking about nothing
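A common simple way for a system to decide that the user has finished a turn is to wait for a fixed stretch of silence after speech. The sketch below shows why that threshold directly adds dead air before every system response. It assumes per-frame speech/non-speech flags from some voice activity detector (hypothetical here) and 10 ms frames. Shorter thresholds respond sooner but risk cutting users off during natural pauses.

```python
def end_of_turn(is_speech, frame_ms=10, silence_ms=500):
    """Return the time (ms) at which the user's turn is judged to be over.

    is_speech: per-frame voice-activity flags (assumed VAD output).
    The turn ends once `silence_ms` of continuous non-speech follows
    some speech. Leading silence is ignored. Returns None if the
    condition is never met.
    """
    needed = max(1, silence_ms // frame_ms)  # silent frames required
    heard_speech = False
    run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            run = 0
        elif heard_speech:
            run += 1
            if run >= needed:
                return (i + 1) * frame_ms
    return None


# 0.2 s of silence, 1 s of speech, then 1 s of silence (10 ms frames).
flags = [False] * 20 + [True] * 100 + [False] * 100
print(end_of_turn(flags))                   # speech ends at 1200 ms
print(end_of_turn(flags, silence_ms=200))
```

With the 500 ms threshold the system cannot start responding until half a second after the user stops, which human conversational partners do not do. More natural turn-taking needs extra cues such as prosody, back-channels or gaze.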

SLIDE 15

Roboreceptionist

  • Entrance to NSH
  • Keyboard (no ASR)
  • TTS, face, movement
  • Range finder to detect people
  • Significant background character
  • Mostly talks about nothing

SLIDE 16