Speech Processing 15-492/18-492 Spoken Dialog Systems SDS - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Speech Processing 15-492/18-492

Spoken Dialog Systems SDS components

slide-2
SLIDE 2

Spoken Dialog Systems

  • More than just ASR and TTS
  • Recognition
  • Parsing
  • Manipulation of utterances
  • Generation of new information
  • Text generation
  • Synthesis

slide-3
SLIDE 3

SDS Architecture

slide-4
SLIDE 4

SDS Internals

  • Parser
  • From words to structure
  • Dialog Manager
  • State of dialog (who is talking)
  • Direction of dialog (what next)
  • References, user profile etc.
  • Interaction of database/internet
  • Language Generation
  • From structure to words

slide-5
SLIDE 5

Parsing

  • Parsing of SPEECH not TEXT
  • Eh, I wanna go, wanna go to Boston tomorrow
  • If it's not too much trouble I'd be very grateful if one might be able to aid me in arranging my travel arrangements to Boston, Logan airport, at sometime tomorrow morning, thank you.
  • Boston, tomorrow

slide-6
SLIDE 6

Parsing: Output structure

“I wanna go to Boston, tomorrow”

  • Convert speech to structure
  • Sufficient for further processing/query
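
To make "convert speech to structure" concrete, here is a minimal sketch of turning the example utterance into a slot frame. The vocabularies `PLACES` and `DATES`, the slot names, and the function `utterance_to_frame` are assumptions made for illustration; they are not taken from the course system.

```python
import re

# Hypothetical vocabularies for this sketch; a real system would take
# these from its grammar files.
PLACES = {"boston", "pittsburgh"}
DATES = {"today", "tomorrow"}

def utterance_to_frame(utterance: str) -> dict:
    """Keep only the words that fill a slot and ignore everything else."""
    frame = {}
    for word in re.findall(r"[a-z']+", utterance.lower()):
        if word in PLACES:
            frame["destination"] = word.capitalize()
        elif word in DATES:
            frame["date"] = word
    return frame

print(utterance_to_frame("I wanna go to Boston, tomorrow"))
```

The disfluent variant "Eh, I wanna go, wanna go to Boston tomorrow" yields the same frame, which is all a later database query would need.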

slide-7
SLIDE 7

Phoenix Parser

[Place]
    (carnegie mellon university)
    (downtown)
    (robinson towne center)
    (the airport)
    (south hills junction)
    (mount oliver)
    (the south side)
    (oakland)
    (bloomfield)
    (polish hill)
    (the strip district)
    (the north side)
;

[NextBus]
    (*WHEN_IS *the next *BUS)
    (*WHEN_IS *the BUS after that *BUS)
WHEN_IS
    (when is)
    (when's)
BUS
    (bus)
    (one)
;

slide-8
SLIDE 8

Phoenix Parser

  • Parse what is important
  • Ignore other parts
  • Map known parts to useful information
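
The "parse what is important, ignore other parts" idea can be sketched as a small phrase spotter. This is not the Phoenix implementation; it is an illustrative matcher that reads the `*` prefix as "optional" and upper-case names as macros, which is how the [NextBus] grammar appears to use them. The names `MACROS`, `NEXT_BUS`, `match_at`, and `spot` are assumptions of this sketch.

```python
# Macros and patterns transcribed from the [NextBus] net in the grammar.
MACROS = {
    "WHEN_IS": [["when", "is"], ["when's"]],
    "BUS": [["bus"], ["one"]],
}
NEXT_BUS = [
    ["*WHEN_IS", "*the", "next", "*BUS"],
    ["*WHEN_IS", "*the", "BUS", "after", "that", "*BUS"],
]

def match_at(pattern, words, i):
    """Return the index just past the longest match at words[i:], or None."""
    if not pattern:
        return i
    token, rest = pattern[0], pattern[1:]
    optional = token.startswith("*")
    name = token.lstrip("*")
    best = None
    # A macro expands to its alternatives; a plain word matches itself.
    for alt in MACROS.get(name, [[name]]):
        j = i + len(alt)
        if words[i:j] == alt:
            end = match_at(rest, words, j)
            if end is not None and (best is None or end > best):
                best = end
    if optional:
        # An optional token may also be skipped entirely.
        end = match_at(rest, words, i)
        if end is not None and (best is None or end > best):
            best = end
    return best

def spot(patterns, utterance):
    """Return the first stretch of the utterance a pattern covers, else None."""
    words = utterance.lower().replace("?", "").replace(",", "").split()
    for i in range(len(words)):
        for pattern in patterns:
            end = match_at(pattern, words, i)
            if end is not None and end > i:
                return " ".join(words[i:end])
    return None

print(spot(NEXT_BUS, "um when is the next bus please"))
```

Words outside the matched span ("um", "please") are simply skipped, so the parser stays robust to fillers and politeness that carry no task information.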

slide-9
SLIDE 9

Parsing vs Language Model

  • Language Model
  • Model what actually gets said
  • Parsing
  • Extract the information you want
  • Models *can* be shared
  • Only accept things in the grammar
  • Can be over-limiting

slide-10
SLIDE 10

Dialog Manager

  • Maintain state
  • Where are we in the dialog
  • Whose turn is it
      • Waiting for speaker
      • Waiting for database query (stall user)
  • Deal with barge-in
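
One way to picture the state a dialog manager keeps is a small turn-tracking class. The sketch below is an assumption-laden illustration, not the course system: the state names, the `DialogManager` class, its methods, and the stall message are all invented here.

```python
from enum import Enum, auto

class Turn(Enum):
    SYSTEM_SPEAKING = auto()
    WAITING_FOR_SPEAKER = auto()
    WAITING_FOR_DATABASE = auto()

class DialogManager:
    """Tracks whose turn it is and reacts to barge-in (hypothetical API)."""

    def __init__(self):
        self.turn = Turn.SYSTEM_SPEAKING
        self.log = []

    def prompt_finished(self):
        # The system prompt has played out; the floor passes to the user.
        self.turn = Turn.WAITING_FOR_SPEAKER

    def user_spoke(self, utterance):
        if self.turn is Turn.SYSTEM_SPEAKING:
            # Barge-in: the user started talking over the prompt.
            self.log.append("barge-in: stop playback")
        self.log.append(f"heard: {utterance}")
        self.turn = Turn.WAITING_FOR_DATABASE
        # Stall the user while the query runs.
        return "One moment while I look that up."

    def database_returned(self, answer):
        self.turn = Turn.SYSTEM_SPEAKING
        return answer

dm = DialogManager()
print(dm.user_spoke("Boston"), dm.log)
```

Because the user speaks before `prompt_finished()` is called, the first log entry records a barge-in.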

slide-11
SLIDE 11

Language Generation

  • Query for flights to Boston
  • Template fill answer(s)
  • The next flight to DEST leaves at DEPART_TIME arriving at ARRIVE_TIME.
  • Templates may be much more complex
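
Template filling can be sketched with Python's standard `string.Template`. The template text is the one from the slide; the function name `fill` and the sample times are made up for illustration.

```python
from string import Template

# The slide's template, with its slot names written as $-placeholders.
TEMPLATE = Template(
    "The next flight to $DEST leaves at $DEPART_TIME arriving at $ARRIVE_TIME."
)

def fill(answer: dict) -> str:
    """Substitute a query answer into the template.

    substitute() raises KeyError when a slot is missing, so an incomplete
    answer is caught before it is spoken.
    """
    return TEMPLATE.substitute(answer)

# Sample values invented for this example.
print(fill({"DEST": "Boston", "DEPART_TIME": "07:45", "ARRIVE_TIME": "09:20"}))
```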

slide-12
SLIDE 12

Language Generation

  • Choose which template to use
  • Based on state, answer type
  • Natural variation
  • Statistical variation
  • Include <ssml> tags to help synthesis
  • Can <emph>emphasize</emph> parts
  • Can identify dates, numbers etc.
  • Humans like variation in the output
  • It is rare for a human to repeat things exactly
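
A rough sketch of choosing a template by answer type while avoiding an exact repeat of the previous prompt. The template wordings, the answer-type keys, and the `generate` function are assumptions for illustration. The `<emph>` tag follows the slide's notation; standard SSML spells the element `<emphasis>`.

```python
import random

# Hypothetical templates, grouped by answer type.
TEMPLATES = {
    "flight_found": [
        "The next flight to <emph>{dest}</emph> leaves at {time}.",
        "I have a flight to <emph>{dest}</emph> departing at {time}.",
    ],
    "no_flight": [
        "Sorry, I found no flights to {dest}.",
        "There are no flights to {dest} that I can see.",
    ],
}

def generate(answer_type, slots, last=None, rng=random):
    """Pick a template for the answer type, skipping the one used last."""
    candidates = TEMPLATES[answer_type]
    options = [t for t in candidates if t != last] or candidates
    template = rng.choice(options)
    # str.format ignores unused slots, so one slot dict serves all templates.
    return template, template.format(**slots)

slots = {"dest": "Boston", "time": "07:45"}
first, text = generate("flight_found", slots)
second, text_again = generate("flight_found", slots, last=first)
print(text)
print(text_again)
```

Passing the previously used template as `last` guarantees the second prompt is worded differently, which is the variation the slide says listeners expect.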

slide-13
SLIDE 13

Language Generation

  • Frame structures to (marked up) text
  • START: Pittsburgh
  • END: Boston
  • DATE: 20081028
  • TIME: 07:45
  • FLIGHT: US075
  • Can generate
  • I have US 075 leaving at 07:45 tomorrow
  • US Airways has a flight departing tomorrow at 07:45
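
The step from frame to text can be sketched as follows. The frame values come from the slide; the `AIRLINES` lookup, the helper names, and the assumption that "today" is 27 October 2008 (so that DATE 20081028 reads as "tomorrow") are illustrative choices.

```python
from datetime import date, datetime, timedelta

FRAME = {
    "START": "Pittsburgh",
    "END": "Boston",
    "DATE": "20081028",
    "TIME": "07:45",
    "FLIGHT": "US075",
}

# Hypothetical carrier-code table for this sketch.
AIRLINES = {"US": "US Airways"}

def spoken_date(yyyymmdd, today):
    """Render a date the way a person would say it relative to today."""
    day = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    if day == today:
        return "today"
    if day == today + timedelta(days=1):
        return "tomorrow"
    return f"on {day:%B} {day.day}"

def render(frame, today, variant=0):
    """Produce one of two wordings for the same frame."""
    when = spoken_date(frame["DATE"], today)
    carrier, number = frame["FLIGHT"][:2], frame["FLIGHT"][2:]
    if variant == 0:
        return f"I have {carrier} {number} leaving at {frame['TIME']} {when}"
    airline = AIRLINES.get(carrier, carrier)
    return f"{airline} has a flight departing {when} at {frame['TIME']}"

today = date(2008, 10, 27)  # assumed "now" for the example
print(render(FRAME, today, variant=0))
print(render(FRAME, today, variant=1))
```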

slide-14
SLIDE 14

Standardized things

  • Help
  • User should be able to get help at any time
  • Explain where they are and what they are expected to say (with explicit examples)
  • Errors
  • “I didn’t understand” …
  • Confirmation
  • Did you say “Boston”?

slide-15
SLIDE 15

Confirmation

  • Explicit confirmation
  • System: Where are you traveling to? User: Boston
  • System: Boston, did I get that right? User: Yes

slide-16
SLIDE 16

Confirmation

  • Implicit confirmation
  • System: Where are you traveling to? User: Boston
  • System: Boston, where … <can barge in>

slide-17
SLIDE 17

Confirmation

  • Explicit confirmation
  • Safe but slow
  • Implicit confirmation
  • Natural, but requires good support for barge-in
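
The trade-off between the two strategies can be sketched as a function that returns the prompts the system would speak. The function `confirm`, its parameters, and the follow-up question used in the example are hypothetical; choosing the strategy purely on barge-in support is a simplification for illustration.

```python
def confirm(slot_value, next_question, barge_in_supported):
    """Return the system prompts used to confirm one recognized value."""
    if barge_in_supported:
        # Implicit: echo the value and carry straight on. The user has to
        # interrupt (barge in) if the value is wrong.
        return [f"{slot_value}, {next_question}"]
    # Explicit: spend an extra turn on a yes/no check. Safe but slow.
    return [f"{slot_value}, did I get that right?", next_question]

# The follow-up question is invented for the example.
print(confirm("Boston", "when would you like to leave?", barge_in_supported=True))
print(confirm("Boston", "when would you like to leave?", barge_in_supported=False))
```

The explicit path costs one extra system turn per slot, which is where "safe but slow" comes from.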

slide-18
SLIDE 18

Grounding

  • Showing evidence the system understands
  • System: Where are you traveling to? User: Boston.
  • System: Right. Where ….
  • System: Boston, right. Where ….

slide-19
SLIDE 19

Designing Prompts

  • Constrain your questions:
  • How may I help you?
      • Long story reply
  • What bus number would you like schedules for?
      • Expect bus number replies
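
A constrained prompt pays off because the expected replies are easy to recognize. The sketch below pairs the slide's prompt with a small reply pattern; the route format (one or two digits plus an optional letter, such as "61C") and the names `BUS_NUMBER` and `parse_reply` are assumptions for illustration.

```python
import re

PROMPT = "What bus number would you like schedules for?"

# Hypothetical reply grammar: a route such as "61C" or "28", possibly
# surrounded by filler words.
BUS_NUMBER = re.compile(r"\b(\d{1,2}[A-Z]?)\b", re.IGNORECASE)

def parse_reply(reply):
    """Return the bus number in the reply, or None if there is none."""
    match = BUS_NUMBER.search(reply)
    return match.group(1).upper() if match else None

print(parse_reply("the 61c please"))
print(parse_reply("well I was hoping to get downtown somehow"))
```

The second reply is the kind of "long story" an open prompt such as "How may I help you?" invites, and the narrow pattern has nothing to latch onto.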

slide-20
SLIDE 20