Speech Processing 15-492/18-492: Spoken Dialog Systems, Tree-based Dialogs, VoiceXML (PowerPoint presentation)

SLIDE 1

Speech Processing 15-492/18-492

Spoken Dialog Systems: Tree-based Dialogs, VoiceXML

SLIDE 2

State-based Dialogs

  • Simple state-based dialog systems
    • Get name
    • Get account number
    • Get PIN
    • Present balance
    • Go back to start or exit
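
The flow above can be sketched as a finite-state machine: each state plays one prompt and picks the next state from the caller's answer. This is a minimal Python sketch; the state names and prompt wording are hypothetical, the answers are scripted, and no recognition or database lookup happens.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical states: (prompt to play, function mapping the answer to the next state).
STATES: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "get_name": ("What is your name?", lambda answer: "get_account"),
    "get_account": ("What is your account number?", lambda answer: "get_pin"),
    "get_pin": ("What is your PIN?", lambda answer: "present_balance"),
    "present_balance": (
        "Here is your balance. Would you like to start again?",
        # Go back to start or exit.
        lambda answer: "get_name" if answer == "yes" else "exit",
    ),
}


def run_dialog(answers: List[str]) -> List[str]:
    """Walk the states, consuming one scripted answer per turn; return the prompts played."""
    prompts: List[str] = []
    replies = iter(answers)
    state = "get_name"
    while state != "exit":
        prompt, next_state = STATES[state]
        prompts.append(prompt)
        state = next_state(next(replies, "no"))  # running out of input counts as "no"
    return prompts


print(len(run_dialog(["John Doe", "12345", "0000", "no"])))  # 4
```

Only the last state branches here; a tree-based system adds a branch wherever the answer changes what to ask next.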

SLIDE 3

State-based Dialogs

  • Get name:
    • "What is your name?"
    • ASR returns a name, which:
      • May be correct (in the database)
      • May be unknown (not in the database)
      • May not be a name at all ("What do I say?", "Help", "Repeat")
    • Should you echo the recognized name?
      • Confirmation (or not)
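
The three cases for a recognized name, and the echo/confirm choice, can be sketched as below. The name set and the list of meta-commands are hypothetical stand-ins for the real customer database and help vocabulary.

```python
from typing import Optional, Set

NAME_DB: Set[str] = {"john doe", "jane smith"}                 # hypothetical customer database
META_COMMANDS: Set[str] = {"what do i say", "help", "repeat"}  # utterances that are not names


def classify_name(hypothesis: str, db: Set[str] = NAME_DB) -> str:
    """Return 'meta' for help-style requests, else 'known' or 'unknown' against the database."""
    text = hypothesis.lower().strip(" ?.!")
    if text in META_COMMANDS:
        return "meta"
    return "known" if text in db else "unknown"


def confirmation_prompt(hypothesis: str, echo: bool) -> Optional[str]:
    """Echo the recognized name for explicit confirmation, or skip confirmation entirely."""
    return f"I heard {hypothesis}. Is that right?" if echo else None


print(classify_name("John Doe"), classify_name("Bob Roe"), classify_name("What do I say?"))
# known unknown meta
```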

SLIDE 4

State-based dialog

  • Get name
    • Check in database
    • Ask again if not found
    • Deal with help
  • Get account number
    • Check in database (with name)
  • Confirm account number and name
    • For security
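
One way to code this sequence: re-ask for the name until it is found, answer help requests, then check the account number against that name and confirm both. The account table, the retry limit, and the prompt wording are assumptions for illustration, not part of the slides.

```python
from typing import Dict, Iterator, List, Optional

ACCOUNTS: Dict[str, str] = {"john doe": "12345"}  # hypothetical: name -> account number
HELP_TEXT = "Please say your first and last name."
MAX_TRIES = 3  # assumed limit; help requests also count against it


def get_name(replies: Iterator[str], log: List[str]) -> Optional[str]:
    """Ask for a name until it is in the database, dealing with help along the way."""
    for _ in range(MAX_TRIES):
        log.append("What is your name?")
        reply = next(replies, "").lower()
        if reply == "help":
            log.append(HELP_TEXT)
            continue
        if reply in ACCOUNTS:
            return reply
        log.append("Sorry, I don't know that name.")  # ask again if not found
    return None


def login(replies: Iterator[str], log: List[str]) -> bool:
    """Get the name, check the account number together with that name, then confirm both."""
    name = get_name(replies, log)
    if name is None:
        return False
    log.append("What is your account number?")
    number = next(replies, "")
    if ACCOUNTS[name] != number:
        return False
    log.append(f"Account {number} for {name}, is that correct?")  # confirm for security
    return next(replies, "") == "yes"


log: List[str] = []
print(login(iter(["help", "John Doe", "12345", "yes"]), log), len(log))  # True 5
```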

SLIDE 5

State-based Interaction

  • Trees can get very large
    • Users can get lost easily
  • You want to minimize the number of turns
    • Faster throughput means more calls
    • Faster throughput means happier customers
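
A back-of-the-envelope Python sketch of both points. The numbers are assumptions, not figures from the slides: a full menu tree with the same number of options at every level, and a fixed time per turn.

```python
def tree_size(branching: int, depth: int) -> int:
    """States in a full menu tree: 1 + b + b**2 + ... + b**depth."""
    return sum(branching ** level for level in range(depth + 1))


def calls_per_hour(turns: int, seconds_per_turn: float) -> float:
    """Calls one phone line can complete per hour if every call takes `turns` turns."""
    return 3600 / (turns * seconds_per_turn)


print(tree_size(4, 3))          # 85 states for 4 options over 3 levels
print(calls_per_hour(6, 10.0))  # 60.0
print(calls_per_hour(4, 10.0))  # 90.0: two fewer turns, 50% more calls per line
```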

SLIDE 6

The level of help

  • First-time users *need* a successful call
    • Otherwise, they won't call back
  • Very helpful prompts are good at the start, but get annoying quickly
  • Designing prompts is a craft
    • What should you say so that it is understood?
    • How much should you tailor it to the user?
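
A common way to balance these points is tapered prompts: long, explicit prompts for new or struggling callers and short ones for experienced callers. This Python sketch assumes the system knows how many times the caller has phoned before; the threshold and prompt wording are hypothetical.

```python
PROMPTS = {
    "novice": ("Which bus line do you want? You can say a bus number, "
               "for example 61C, or say help at any time."),
    "expert": "Which bus?",
}
NOVICE_CALLS = 3  # hypothetical: callers with fewer previous calls get the long prompt


def choose_prompt(previous_calls: int, errors_this_call: int) -> str:
    """Verbose prompt for first-time or struggling callers, terse prompt otherwise."""
    if previous_calls < NOVICE_CALLS or errors_this_call > 0:
        return PROMPTS["novice"]
    return PROMPTS["expert"]


print(choose_prompt(previous_calls=10, errors_this_call=0))  # Which bus?
```

Falling back to the long prompt after an error keeps experienced callers fast without stranding them when recognition fails.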

SLIDE 7

VoiceXML

  • A W3C standard for voice browsing
  • XML-based “programming” language for speech
    • Output of synthesized (and recorded) speech
    • Recognition of speech and DTMF
    • Recording of spoken input
    • Telephony features

SLIDE 8

VoiceXML

  • ASR
    • From grammars (JSGF)
    • From trigrams
    • From “domain managers”
      • Credit card numbers
      • City, State
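
As an illustration of what a credit-card “domain manager” could add beyond a plain grammar, this Python sketch filters an n-best list of recognized digit strings with the Luhn (mod 10) checksum that payment card numbers carry. The n-best input format is an assumption; the slides do not say how domain managers are implemented.

```python
from typing import List, Optional


def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum the digits, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return total % 10 == 0


def pick_card_number(nbest: List[str]) -> Optional[str]:
    """Return the highest-ranked recognition hypothesis that passes the checksum, if any."""
    for hypothesis in nbest:
        if luhn_valid(hypothesis):
            return hypothesis
    return None


print(pick_card_number(["4111 1111 1111 1112", "4111 1111 1111 1111"]))
# 4111 1111 1111 1111
```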

SLIDE 9

VoiceXML

  • TTS
    • SSML markup
      • Choice of voice
      • Choice of language
      • Choice of how to pronounce things
      • Specify breaks, timing, emphasis
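
The SSML features listed above can be seen in a small generated document. This Python sketch uses SSML 1.0 elements (`<speak>`, `<voice>`, `<break>`, `<emphasis>`, `<sub>`); the function names, parameters, and example alias are hypothetical, and which voices and languages exist depends on the synthesizer.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def with_aliases(text: str, aliases: Dict[str, str]) -> str:
    """Escape text, then wrap each listed word in <sub> so it is spoken as its alias."""
    out = escape(text)
    for written, spoken in aliases.items():
        out = out.replace(escape(written),
                          f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>")
    return out


def build_ssml(first: str, second: str, lang: str = "en-US", gender: str = "female",
               pause_ms: int = 500, aliases: Optional[Dict[str, str]] = None) -> str:
    """Speak `first`, pause, then speak `second` with strong emphasis."""
    aliases = aliases or {}
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'    # language
        f"<voice gender={quoteattr(gender)}>"                                   # voice
        f"{with_aliases(first, aliases)}"                                       # pronunciation
        f'<break time="{int(pause_ms)}ms"/>'                                    # break / timing
        f'<emphasis level="strong">{with_aliases(second, aliases)}</emphasis>'  # emphasis
        "</voice></speak>"
    )


print(build_ssml("Take the 61C.", "It leaves in five minutes.",
                 aliases={"61C": "sixty-one C"}))
```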

SLIDE 10

Structure

<vxml version="1.0">
  <meta name="author" content="John Doe"/>
  <var name="hi" expr="'Hello World!'"/>
  <form>
    <block>
      <value expr="hi"/>
      <goto next="#say_goodbye"/>
    </block>
  </form>
  <form id="say_goodbye">
    <block>
      Goodbye!
    </block>
  </form>
</vxml>
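
To make the control flow of this example concrete, here is a toy Python interpreter for it. It is not the VoiceXML Form Interpretation Algorithm: it handles only `<var>`, `<form>`, `<block>`, `<value>` and same-document `<goto>`, treats each `expr` as a quoted string literal or a variable name instead of evaluating ECMAScript, and collects the output in a list instead of speaking it.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

HELLO_VXML = """<vxml version="1.0">
  <meta name="author" content="John Doe"/>
  <var name="hi" expr="'Hello World!'"/>
  <form>
    <block>
      <value expr="hi"/>
      <goto next="#say_goodbye"/>
    </block>
  </form>
  <form id="say_goodbye">
    <block>
      Goodbye!
    </block>
  </form>
</vxml>"""


def evaluate(expr: str, env: Dict[str, str]) -> str:
    """Simplification: a quoted string literal, or else the name of a variable."""
    if len(expr) >= 2 and expr[0] == expr[-1] and expr[0] in "'\"":
        return expr[1:-1]
    return env[expr]


def run_toy_vxml(source: str) -> List[str]:
    """Return what would be spoken, following <goto> between forms of one document."""
    root = ET.fromstring(source)
    env: Dict[str, str] = {}
    for var in root.findall("var"):
        env[var.get("name")] = evaluate(var.get("expr", "''"), env)
    forms = root.findall("form")
    by_id = {f.get("id"): f for f in forms if f.get("id")}
    spoken: List[str] = []
    form: Optional[ET.Element] = forms[0] if forms else None
    while form is not None:
        target: Optional[str] = None
        for block in form.findall("block"):
            if block.text and block.text.strip():
                spoken.append(block.text.strip())
            for child in block:
                if child.tag == "value":
                    spoken.append(evaluate(child.get("expr"), env))
                elif child.tag == "goto":
                    target = child.get("next")
                    break
                if child.tail and child.tail.strip():
                    spoken.append(child.tail.strip())
            if target is not None:
                break
        # Only "#form_id" targets are supported; with no <goto>, the toy simply stops.
        form = by_id[target.lstrip("#")] if target else None
    return spoken


print(run_toy_vxml(HELLO_VXML))  # ['Hello World!', 'Goodbye!']
```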

SLIDE 11

Basic Tags

  • <form id="xxxx">
  • <goto next="#xxx"/>
  • <field> gathers info from the user through speech or DTMF
  • <record> records data from the user
  • <subdialog> performs a subdialog

SLIDE 12

<field> tag

<form id="getBusNumber">
  <field name="BusNumber">
    <prompt>Which bus line do you want?</prompt>
    <grammar src="grams/bus.gram"/>
    <help>Please say your desired bus number, e.g. 61C</help>
  </field>
</form>

SLIDE 13

Flow of Control

  • Goto
    <goto next="#GetBusNumber"/>
    <goto next="Trains.vxml"/>
  • <if>
    <if cond="BusNumber == '501'">
      <prompt>Sorry, that bus no longer runs</prompt>
    <elseif cond="BusNumber == '56U'"/>
      <prompt>Sorry, it'll be a long wait</prompt>
    <else/>
      <prompt>One will be along shortly</prompt>
    </if>

SLIDE 14

Variables

  • <var name="var1" expr="'hello'"/>
    <prompt>I just wanted to say <value expr="var1"/></prompt>
    <assign name="var1" expr="'goodbye'"/>

SLIDE 15

Recognition Grammars

  • Speech Recognition Grammar Specification (SRGS)
    • Augmented BNF

    $order = I would like a $drink;
    $drink = coke | pepsi | mountain_dew;
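
For a grammar this small, the set of sentences it accepts can simply be enumerated. This Python sketch hand-translates the two rules into a dictionary; it is not an SRGS parser, and it only works for non-recursive rules with finitely many expansions.

```python
import itertools
from typing import Dict, List, Set

# Hand-translation of the two rules; "$name" tokens refer to other rules.
GRAMMAR: Dict[str, List[List[str]]] = {
    "order": [["I", "would", "like", "a", "$drink"]],
    "drink": [["coke"], ["pepsi"], ["mountain_dew"]],
}


def expand(rule: str, grammar: Dict[str, List[List[str]]] = GRAMMAR) -> Set[str]:
    """Every sentence the rule generates (a recursive rule would never terminate)."""
    sentences: Set[str] = set()
    for alternative in grammar[rule]:
        # For each token: its expansions if it is a $reference, else the literal word.
        choices = [sorted(expand(token[1:], grammar)) if token.startswith("$") else [token]
                   for token in alternative]
        for combination in itertools.product(*choices):
            sentences.add(" ".join(combination))
    return sentences


def accepts(utterance: str, rule: str = "order") -> bool:
    """True if the utterance is exactly one of the rule's sentences."""
    return utterance in expand(rule)


print(sorted(expand("order")))
# ['I would like a coke', 'I would like a mountain_dew', 'I would like a pepsi']
```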

SLIDE 16

VoiceXML Browsers

  • Compatibility
    • Not as compatible as one would like
    • <object> elements can differ between browsers (but are useful)
      • City, State recognizers
  • ECMAScript (JavaScript)

SLIDE 17

Beyond VoiceXML (in VoiceXML)

  • Mixing HTML/CGI scripts in VoiceXML
  • Use PHP to generate VoiceXML files
  • Use URLs (with ?...) to calculate/get data
    • http://weather.com?zip=“15213”
  • Use URLs to get waveforms
    • http://tts.com?text=“Hello World”
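
The slides mention PHP; the same idea in Python (swapped in to keep one language for the examples) is to build the VoiceXML page as a string, putting a URL with a query string into an `<audio>` element. `tts.com` is the slide's illustrative host, not a real API, and `vxml_for_audio` is a hypothetical name.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

TTS_BASE = "http://tts.com"  # illustrative host from the slide, not a real service


def vxml_for_audio(text: str) -> str:
    """Build a VoiceXML page whose <audio> is fetched from a TTS URL with a query string."""
    src = f"{TTS_BASE}?{urlencode({'text': text})}"  # "Hello World" -> "text=Hello+World"
    return (
        '<vxml version="1.0">\n'
        "  <form>\n"
        "    <block>\n"
        f"      <audio src={quoteattr(src)}/>\n"  # quoteattr escapes & and quotes for XML
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


print(vxml_for_audio("Hello World"))
```

Encoding the query with `urlencode` and the attribute with `quoteattr` keeps spaces, `&` and quotes in the text from breaking either the URL or the XML.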

SLIDE 18

VoiceXML future

  • N-gram Grammar Markup Language
    • Many browsers have their own extensions
  • Pronunciation Lexicon Markup Language
    • A way to add new items to the lexicon
    • Hard to find good standards
  • Call Control Markup Language
    • For management and logging of calls
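
An n-gram grammar replaces hand-written rules with word-sequence probabilities. As a sketch of where such numbers come from (not of the markup format itself), this Python code estimates unsmoothed maximum-likelihood trigram probabilities from a made-up three-sentence corpus; real language models add smoothing for unseen word sequences.

```python
from collections import Counter
from typing import List, Tuple


def train_trigrams(sentences: List[str]) -> Tuple[Counter, Counter]:
    """Count trigrams and their two-word histories, padding with <s> and </s>."""
    trigrams: Counter = Counter()
    histories: Counter = Counter()
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(len(words) - 2):
            history = (words[i], words[i + 1])
            trigrams[history + (words[i + 2],)] += 1
            histories[history] += 1
    return trigrams, histories


def trigram_prob(trigrams: Counter, histories: Counter, w1: str, w2: str, w3: str) -> float:
    """Maximum-likelihood P(w3 | w1 w2); unsmoothed, so unseen events get 0."""
    if histories[(w1, w2)] == 0:
        return 0.0
    return trigrams[(w1, w2, w3)] / histories[(w1, w2)]


corpus = ["I would like a coke", "I would like a pepsi", "I would like a coke"]
tri, hist = train_trigrams(corpus)
print(trigram_prob(tri, hist, "like", "a", "coke"))  # 0.6666666666666666
```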

SLIDE 19

Microsoft SALT

  • SALT tags
    • listen, dtmf, prompt, bind, grammar (plus SSML)
  • Designed for the desktop, not just the phone
  • Designed to be shared documents
    • Viewing (HTML) and speech (SALT)

SLIDE 20

Available Systems

  • Nuance
  • BeVocal
  • Tellme
    • Tellme Studio
  • OpenVXI/publicvoicexml.org
  • Many others

SLIDE 21
SLIDE 22

SDS Architecture