Speech Processing 15-492/18-492
Spoken Dialog Systems
Tree based dialogs
VoiceXML
State-based Dialogs
- Simple state-based dialog systems
- Get Name
- Get Account number
- Get PIN
- Present balance
- Go back to start or exit
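The sequence of states above can be sketched as a small transition table. This is only an illustrative sketch: the state names and the `walk_once` helper are assumptions made for the example, not part of the lecture.

```python
# Hypothetical state names mirroring the bullets above.
TRANSITIONS = {
    "get_name": "get_account_number",
    "get_account_number": "get_pin",
    "get_pin": "present_balance",
    "present_balance": "start_or_exit",
}


def next_state(state, user_choice=None):
    """Return the state that follows `state`.

    Only the final state branches: the caller either starts over or exits.
    """
    if state == "start_or_exit":
        return "get_name" if user_choice == "start" else "exit"
    return TRANSITIONS[state]


def walk_once(final_choice="exit"):
    """List the states visited in one pass through the dialog."""
    state = "get_name"
    path = [state]
    while state != "start_or_exit":
        state = next_state(state)
        path.append(state)
    # Apply the caller's choice at the last state.
    path.append(next_state(state, final_choice))
    return path


if __name__ == "__main__":
    print(" -> ".join(walk_once("exit")))
```

A real system would attach a prompt and a recognizer to each state; the table only captures the order of the turns.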
State-based Dialogs
- Get Name:
- What is your name?
  - ASR Name
  - May be correct (in the database)
  - May be unknown (not in database)
  - May not be a name (What do I say?/Help/Repeat)
  - Should you echo the recognized name?
  - Confirmation (or not)
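The three outcomes of the Get Name state can be sketched as a simple classification step. The name database, the command list, and the action labels below are all hypothetical, chosen only to illustrate the branching.

```python
# Hypothetical data: a real system would query its customer database.
KNOWN_NAMES = {"alice smith", "bob jones"}
COMMANDS = {"help", "repeat", "what do i say"}


def classify_name_result(asr_text):
    """Sort a recognized string into 'command', 'known', or 'unknown'."""
    text = asr_text.strip().lower().rstrip("?")
    if text in COMMANDS:
        return "command"      # not a name at all
    if text in KNOWN_NAMES:
        return "known"        # name is in the database
    return "unknown"          # looks like a name, but not in the database


def next_action(asr_text):
    """Map each outcome to what the dialog does next (labels are assumptions)."""
    return {
        "command": "give_help",
        "known": "confirm_name",   # echo the name back, or move on silently
        "unknown": "ask_again",
    }[classify_name_result(asr_text)]


if __name__ == "__main__":
    for heard in ["Alice Smith", "Zed Nobody", "What do I say?"]:
        print(heard, "->", next_action(heard))
```

Whether `confirm_name` echoes the name or skips confirmation is the design question the slide raises; the sketch leaves it open.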
State-based dialog
- Get name
- Check in database
- Ask again if not
- Deal with help
- Get account number
- Check in database (with name)
- Confirm account number and name
- For security
State-based Interaction
- Trees can get very large
- User can get lost easily
- You want to minimize the number of turns
- Faster throughput means more calls
- Faster throughput means happier customer
The level of help
- First-time users *need* a successful call
- Otherwise, they won't call back
- Having very helpful prompts is good
  - At the start; they get annoying quickly
- Designing prompts is a craft
- What should the system say that will be understood?
- How much should you tailor it to the user?
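One common way to balance helpfulness against annoyance is to vary the prompt with the caller's experience and recent failures. The sketch below assumes hypothetical prompt texts and an arbitrary threshold of three previous calls; neither comes from the lecture.

```python
# Hypothetical prompt wordings, from shortest to most helpful.
TERSE = "Which bus line?"
NORMAL = "Which bus line do you want?"
VERBOSE = "Please say the number of the bus line you want, for example 61C."


def choose_prompt(previous_calls, failed_attempts):
    """Pick a prompt for the current turn.

    previous_calls: how many times this caller has used the system before.
    failed_attempts: how many times this turn has already failed.
    """
    if failed_attempts >= 1 or previous_calls == 0:
        return VERBOSE   # first-time users and confused users get full help
    if previous_calls < 3:   # threshold is an assumption
        return NORMAL
    return TERSE         # regular callers get the short form


if __name__ == "__main__":
    print(choose_prompt(previous_calls=0, failed_attempts=0))
    print(choose_prompt(previous_calls=10, failed_attempts=0))
```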
VoiceXML
- A W3C standard for voice browsing
- XML-based “programming” language for speech
- Output synthesized (and recorded) speech
- Recognition of speech and DTMF
- Recording of spoken input
- Telephony features
VoiceXML
- ASR
- From Grammars (JSGF)
- From tri-grams
- From “Domain Managers”
  - Credit card numbers
  - City, State
VoiceXML
- TTS
- <ssml> markup
- Choice of voice
- Choice of language
- Choice of how to pronounce things
- Specify breaks, timing, emphasis
Structure
<vxml version="1.0">
  <meta name="author" content="John Doe"/>
  <var name="hi" expr="'Hello World!'"/>
  <form>
    <block>
      <value expr="hi"/>
      <goto next="#say_goodbye"/>
    </block>
  </form>
  <form id="say_goodbye">
    <block>
      Goodbye!
    </block>
  </form>
</vxml>
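Because a `<goto next="#...">` must name an existing form `id`, it can help to check a document before deploying it. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `unresolved_gotos` helper is hypothetical, and it only checks same-document targets in a document without XML namespaces.

```python
import xml.etree.ElementTree as ET

# The Hello World document from this slide (meta tag omitted for brevity).
DOC = """<vxml version="1.0">
  <var name="hi" expr="'Hello World!'"/>
  <form>
    <block>
      <value expr="hi"/>
      <goto next="#say_goodbye"/>
    </block>
  </form>
  <form id="say_goodbye">
    <block>Goodbye!</block>
  </form>
</vxml>"""


def unresolved_gotos(vxml_text):
    """Return the '#id' goto targets that match no <form id="..."> in the text."""
    root = ET.fromstring(vxml_text)
    form_ids = {form.get("id") for form in root.iter("form") if form.get("id")}
    missing = []
    for goto in root.iter("goto"):
        target = goto.get("next", "")
        # Targets without '#' point at other documents and are skipped here.
        if target.startswith("#") and target[1:] not in form_ids:
            missing.append(target)
    return missing


if __name__ == "__main__":
    print(unresolved_gotos(DOC))
```

Form ids are compared case-sensitively here, so a target such as `#Say_Goodbye` would be reported as unresolved.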
Basic Tags
- <form id="xxxx">
- <goto next="#xxx">
- <field> gather info from user through speech or DTMF
- <record> record data from the user
- <subdialog> performs some sub dialog
<field> tag
<form id="getBusNumber">
  <field name="BusNumber">
    <prompt>Which bus line do you want?</prompt>
    <grammar src="grams/bus.gram"/>
    <help>Please say your desired bus number, e.g. 61C</help>
  </field>
</form>
Flow of Control
- Goto
  <goto next="#getBusNumber"/>
  <goto next="Trains.vxml"/>
- <if cond="BusNumber == '501'">
    <prompt>Sorry that bus no longer runs</prompt>
  <elseif cond="BusNumber == '56U'"/>
    <prompt>Sorry it'll be a long wait</prompt>
  <else/>
    <prompt>One will be along shortly</prompt>
  </if>
Variables
- <var name="var1" expr="'hello'"/>
  <prompt>I just wanted to say <value expr="var1"/></prompt>
  <assign name="var1" expr="'goodbye'"/>
Recognition Grammars
- Speech Recognition Grammar Specification (SRGS)
- Augmented BNF
  $order = I would like a $drink
  $drink = coke | pepsi | mountain_dew
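To see what such a grammar accepts, the two rules above can be expanded into the full set of sentences they generate. This is a rough sketch, not an SRGS parser: the grammar is hand-encoded as a Python dictionary, the `expand` and `accepts` helpers are hypothetical, and only non-recursive grammars are handled.

```python
from itertools import product

# Hand-encoded version of the two rules; each rule lists its alternatives,
# and each alternative is a sequence of words or $rule references.
GRAMMAR = {
    "$order": [["I", "would", "like", "a", "$drink"]],
    "$drink": [["coke"], ["pepsi"], ["mountain_dew"]],
}


def expand(symbol, grammar=GRAMMAR):
    """Return every word sequence `symbol` can produce (no recursion support)."""
    if symbol not in grammar:
        return [[symbol]]  # a plain word expands to itself
    sentences = []
    for alternative in grammar[symbol]:
        parts = [expand(token, grammar) for token in alternative]
        # Combine one expansion of each token, in order.
        for combo in product(*parts):
            sentences.append([word for part in combo for word in part])
    return sentences


def accepts(utterance, grammar=GRAMMAR):
    """True if the whitespace-split utterance is generated by $order."""
    return utterance.split() in expand("$order", grammar)


if __name__ == "__main__":
    for sentence in expand("$order"):
        print(" ".join(sentence))
```

The comparison is case-sensitive and exact, which is stricter than a real recognizer; it is only meant to show that the grammar defines exactly three sentences.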
VoiceXML Browsers
- Compatibility
- Not as compatible as one would like
- <object> elements can be different (but useful)
  - City, State recognizers
- ECMAScript (JavaScript)
Beyond VoiceXML (in VoiceXML)
- Mixing HTML/CGI scripts in VoiceXML
- Use PHP to generate VoiceXML files
- Use URLs (with ?...) to calculate/get data
  - http://weather.com?zip="15213"
- Use URLs to get waveforms
  - http://tts.com?text="Hello World"
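The same ideas can be sketched in Python rather than PHP: build a query URL, and generate a VoiceXML page from a script. The helper names are hypothetical, the hosts are the illustrative ones from the slide, and nothing is actually fetched.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def data_url(base, **params):
    """Append URL-encoded query parameters to a base URL."""
    return base + "?" + urlencode(params)


def prompt_page(text):
    """Generate a one-form VoiceXML document that speaks `text`.

    ElementTree escapes characters such as '&' and '<' in the text.
    """
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    block.text = text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(data_url("http://weather.com", zip="15213"))
    print(data_url("http://tts.com", text="Hello World"))
    print(prompt_page("One will be along shortly"))
```

Note that `urlencode` writes the space in "Hello World" as `+` and does not add quotation marks around values; a server-side script would return the generated page with whatever content type the VoiceXML browser expects.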
VoiceXML future
- N-gram grammar Markup Language
- Many browsers have own extensions
- Pronunciation Lexicon Markup Language
- A way to add new items to the lexicon
- Hard to find good standards
- Call Control Markup Language
- For management and logging of calls
Microsoft SALT
- SALT tags
  - Listen, DTMF, prompt, bind, grammar (plus ssml)
- Designed for desktop, not just phone
- Designed to be shared documents
- Viewing (HTML) and Speech (SALT)
Available Systems
- Nuance
- Be-vocal
- Tell Me
- Tell-me studio
- OpenVXI/publicvoicexml.org
- Many others