SIV for VoiceXML 3.0: Language and Application Design

SLIDE 1

SIV for VoiceXML 3.0: Language and Application Design Considerations

Ken Rehor Cisco Systems, Inc.

krehor@cisco.com

March 05, 2009

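SLIDE 2

VoiceXML Application Architecture

Diagram: calls from the PSTN or a VoIP network enter through a VoIP gateway and reach the VoiceXML server, which provides ASR, TTS, SIV, audio, and DTMF resources. The server fetches VoiceXML over HTTP across the IP network from the VoiceXML application. A verification application with an SIV engine and a voiceprint database (VP DB) also sits on the IP network.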

SLIDE 3

SIV in VoiceXML 2.x

  • Server-side SIV processing
    – <record>
    – <field> with recordutterance
  • Language extensions
    – Nuance "voiceprint forms"
    – BeVocal

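SLIDE 4

VoiceXML 2.x SIV Integration

Diagram: calls from the PSTN or VoIP network reach the VoiceXML server, which fetches VoiceXML over HTTP across the IP network from the VoiceXML application. The verification application, with its SIV engine and voiceprint database (VP DB), is reached through <subdialog>, a <field> with recordutterance, or <record>.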
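Here is a minimal sketch of the calling side of the <subdialog> option. It assumes the verification application serves a VoiceXML dialog and returns a single decision variable. The URL http://verify.example.com/verify.vxml, the claimedId parameter, and the 'accepted' value are hypothetical. The deck does not define the interface between the two applications.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <!-- Hypothetical claimed identity; a real app would collect it first -->
    <var name="claimedId" expr="'4125551234'"/>
    <!-- Hand verification off to the separate verification application -->
    <subdialog name="siv" src="http://verify.example.com/verify.vxml">
      <param name="claimedId" expr="claimedId"/>
      <filled>
        <!-- siv.decision holds whatever the subdialog listed in <return namelist> -->
        <if cond="siv.decision == 'accepted'">
          <goto next="#main_menu"/>
        <else/>
          <prompt>Sorry, we could not verify your identity.</prompt>
          <exit/>
        </if>
      </filled>
    </subdialog>
  </form>
  <form id="main_menu">
    <block><prompt>Welcome back.</prompt></block>
  </form>
</vxml>
```

Keeping SIV behind <subdialog> leaves the main application engine-neutral. Only the verification application needs to know which SIV engine and voiceprint store are in use.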


SLIDE 6

Standard VoiceXML prompt/field model

  • Text-independent
    – <prompt> / <record>
    – Submit recording to application server
  • Text-dependent, Text-prompted
    – <prompt> / <field> (with recordutterance)
    – Submit utterance recording to application server

SLIDE 7

VoiceXML 2.x <record>

<form id="verify">
  <!-- could use external grammar -->
  <record name="utterance" maxtime="5s">
    <prompt>Say this digit sequence: one two three four five.</prompt>
    <noinput>I didn't hear anything, please try again.</noinput>
  </record>
  <block>
    <submit next="check_utterance.pl" enctype="multipart/form-data"
            method="post" namelist="utterance"/>
  </block>
</form>
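Before sending audio to the server, the browser can reject obviously short recordings using the <record> shadow variables defined in VoiceXML 2.0. In this sketch the 1500 ms threshold is an arbitrary assumption. check_utterance.pl is the server script named on the slide, and the deck does not specify its interface.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="verify">
    <record name="utterance" maxtime="5s">
      <prompt>Say this digit sequence: one two three four five.</prompt>
      <filled>
        <!-- utterance$.duration is the recording length in milliseconds -->
        <if cond="utterance$.duration &lt; 1500">
          <prompt>That was too short. Please say all five digits.</prompt>
          <!-- Clearing the item makes the FIA record again -->
          <clear namelist="utterance"/>
        </if>
      </filled>
    </record>
    <block>
      <submit next="check_utterance.pl" enctype="multipart/form-data"
              method="post" namelist="utterance"/>
    </block>
  </form>
</vxml>
```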

SLIDE 8

VoiceXML 2.1 <field>

<form id="verify">
  <field name="digitstring" type="digits">
    <prompt>Say this digit sequence: one two three four five.</prompt>
    <filled>
      <!-- if spoken digits match expected response, then process voice model -->
    </filled>
  </field>
</form>

SLIDE 9

VoiceXML 2.1 <field> with recordutterance

<form id="verify">
  <property name="recordutterance" value="true"/>
  <field name="digitstring" type="digits">
    <prompt>Say this digit sequence: one two three four five.</prompt>
    <filled>
      <!-- if spoken digits match expected response, then process voice model -->
    </filled>
  </field>
</form>
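This sketch shows one way to fill in the placeholder comment. It assumes the expected response is the literal string 12345 from the prompt. The dialog compares the recognized digits, then passes the recording that VoiceXML 2.1 exposes as application.lastresult$.recording to the server. The variable names utterance and digitstring are illustrative. check_utterance.pl is the server script from the <record> example, and its interface is not described in the deck.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="verify">
    <property name="recordutterance" value="true"/>
    <var name="utterance"/>
    <field name="digitstring" type="digits">
      <prompt>Say this digit sequence: one two three four five.</prompt>
      <filled>
        <if cond="digitstring == '12345'">
          <!-- recordutterance makes the audio of this recognition available -->
          <assign name="utterance" expr="application.lastresult$.recording"/>
          <submit next="check_utterance.pl" enctype="multipart/form-data"
                  method="post" namelist="utterance"/>
        <else/>
          <prompt>That did not match. Please try again.</prompt>
          <!-- Clearing the field makes the FIA visit it again -->
          <clear namelist="digitstring"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Checking the words before submitting keeps text-dependent verification from scoring audio that contains the wrong phrase. The recognizer and the SIV engine still see the same utterance.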

SLIDE 10

Security Concerns

SLIDE 11

Architecture / Security / Trust

  • One architecture may not be suitable for every use case
  • Some architectures may not support the level of (dis)trust required for a particular deployment

SLIDE 12

Security, Trust and Protocol Considerations in Distributed Voice Web Applications

Architecture options carry security implications

Diagram: components shown are the caller on the PSTN or IP network, the VoiceXML browser, an MRCP server, the ASR, TTS, and SIV engines, the Voice Web Application Server, an authentication web service, other application web services (Web Service interface), and two voice template databases. Content exchanged includes .wav audio, <vxml>, <grxml>, <ssml>, and voice templates. A "?" marks the open question of where voice templates should flow. Voice DEFF may be used between SIV components and services over a Web Service interface.

SLIDE 13

Diagram: the caller on the PSTN or IP network connects to the VoiceXML browser, whose MRCP client drives an MRCP server with ASR and TTS engines (<grxml>, <ssml>). The browser fetches <vxml> from the Voice Web Application Server, which hosts the SIV engine and the voice template database. Recorded audio (<record>, .wav) goes from the browser to the SIV engine.

  • SIV engine and database managed by app server
  • VoiceXML browser records the utterance and forwards it to the app server (typical scenario for VoiceXML 2.0/2.1)
  • Voice templates managed and stored locally by SIV engine
  • Audio stream vs. buffers: streaming handled by RTP? Buffers may be handled by an audio recorder function. Part of browser or MRCP engine?
  • Note: DTMF processing not shown

SLIDE 14

Diagram: the caller on the PSTN or IP network connects to the VoiceXML browser, whose MRCP client drives an MRCP server with ASR and TTS engines (<grxml>, <ssml>). The browser fetches <vxml> from the Voice Web Application Server, which hosts the SIV engine and the voice template database. Recorded audio (<record>, .wav) goes from the browser to the SIV engine. A second Voice Web Application Server, labeled Service Provider, is reached over IP.

  • SIV engine and database managed by app server
  • VoiceXML browser records the utterance and forwards it to the app server (typical scenario for VoiceXML 2.0/2.1)
  • Voice templates managed and stored locally by SIV engine
  • Note: DTMF processing not shown

SLIDE 15

Diagram: the caller on the PSTN or IP network connects to the VoiceXML browser, whose MRCP client drives an MRCP server hosting the SIV, ASR, and TTS engines (<grxml>, <ssml>, audio). The SIV engine keeps its own voice template database. The browser fetches <vxml> from the Voice Web Application Server.

  • SIV engine and database managed by MRCP server
  • Voice templates managed and stored locally by SIV engine
  • Audio stream vs. buffers: streaming handled by RTP? Buffers may be handled by an audio recorder function. Part of browser or MRCP engine?
  • Note: DTMF processing not shown
SLIDE 16

Diagram: the caller on the PSTN or IP network connects to the VoiceXML browser, whose MRCP client drives an MRCP server hosting the SIV, ASR, and TTS engines (<grxml>, <ssml>, audio). The browser fetches <vxml> from the Voice Web Application Server, which owns the voice template database. Voice templates travel between that database and the SIV engine.

  • SIV engine managed by MRCP server
  • SIV database managed by app server
  • Voice templates retrieved from database by app server
  • Voice model transmission managed by engine or MRCP server
  • Note: DTMF processing not shown

SLIDE 17

Diagram: the caller on the PSTN or IP network connects to the VoiceXML browser, whose MRCP client drives an MRCP server hosting the SIV, ASR, and TTS engines (<grxml>, <ssml>, audio). The browser fetches <vxml> from the Voice Web Application Server, which owns the voice template database. Voice templates pass from that database through the VoiceXML browser to the SIV engine.

  • SIV engine managed by MRCP server
  • SIV database managed by app server
  • Voice model transmission managed by VoiceXML browser
  • Voice templates retrieved from database by app server
  • Voice templates managed and stored locally by SIV engine
  • Note: DTMF processing not shown

SLIDE 18

SIV in VoiceXML 3.0

SLIDE 19

V3 Integration Requirements

  • Control multiple input resources
    – ASR and biometric engines
    – Simultaneously
    – Switch on a per-<field> or per-verification basis
  • Consistent with V3 overall design goals
  • Simplify integration, yet provide sufficient control
SLIDE 20

V3 Data, Event relationship between components

Diagram: a Resource Controller (an object with semantics similar to a form item) coordinates a prompt queue, an SSML/media player, and a set of input resources.
  • Prompt queue: SSML; FA commands from other resource controllers (Stop, Play audio, DTMF); mark data
  • SSML/media player: events mark, error, done
  • Commands and events on the player and input links: Stop, Play; Barge-in on/off; done
  • Inputs (Input 2, Input 3, …) receive audio: audio recorder, recognition device(s), verification, etc. Inputs are all session-level.
  • Resource operations: Add grammar(), Add voiceprint()

Recording types to consider:
  • <record>
  • Utterance recording
  • Whole-call recording (two-channel?)
  • Multi-turn recording (e.g. mixed-initiative recording)

SLIDE 21

SIV "Session"

  • Enrollment session or verification session
  • Verification process: an uninterrupted process spanning several dialog states (identified by a Session-ID) in which the results of each utterance are accumulated

Diagram: within one VoiceXML session, a single verification session spans several SIV dialogs.
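This VoiceXML 2.1 sketch shows how one verification session could span several dialog turns. It assumes a hypothetical server script, siv_session.pl. The script accepts a session ID plus one utterance, accumulates the score on the server, and replies with XML such as <result session="abc123" decision="continue"/>. The script name, the XML format, and the continue/accepted/rejected values are all assumptions for illustration.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope here; declare these in the application root document
       if the SIV dialogs live in separate documents -->
  <var name="sivSessionId" expr="''"/>
  <var name="sivDecision" expr="'continue'"/>
  <form id="siv_turn">
    <var name="result"/>
    <record name="utterance" maxtime="5s">
      <prompt>Please say your account number.</prompt>
    </record>
    <block name="turn">
      <!-- Hypothetical endpoint: adds this utterance to the session and
           returns the cumulative decision so far -->
      <data name="result" src="siv_session.pl" method="post"
            enctype="multipart/form-data" namelist="sivSessionId utterance"/>
      <assign name="sivSessionId"
              expr="result.documentElement.getAttribute('session')"/>
      <assign name="sivDecision"
              expr="result.documentElement.getAttribute('decision')"/>
      <if cond="sivDecision == 'continue'">
        <!-- Re-arm both form items to collect another utterance -->
        <clear namelist="utterance turn"/>
      <else/>
        <goto next="#done"/>
      </if>
    </block>
  </form>
  <form id="done">
    <block>
      <prompt>Verification result: <value expr="sivDecision"/>.</prompt>
    </block>
  </form>
</vxml>
```

The browser only carries the session ID between turns. Accumulating the per-utterance results stays with the server or engine, which matches the session notion on the slide.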

SLIDE 22

Define Data Model

  • Data passed to SIV engine
    – Environment
    – Properties
    – Attributes
    – Voice models
  • Data returned from SIV engine
    – Results specified as an EMMA result
    – Errors/info
  • Data used within SIV session
  • Associate SIV result with ASR result
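To make the "EMMA result" and "associate SIV result with ASR result" items concrete, here is one possible shape for a combined result. It assumes both engines report on the same utterance. The emma: elements and attributes come from EMMA 1.0. The siv namespace, its verification element, and all values are hypothetical placeholders, not engine output.

```xml
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:siv="http://example.com/siv">
  <!-- emma:group ties the ASR and SIV interpretations of one turn together -->
  <emma:group id="turn1">
    <emma:interpretation id="asr1"
        emma:medium="acoustic" emma:mode="voice"
        emma:confidence="0.92" emma:tokens="one two three four five"
        emma:signal="http://example.com/audio/turn1.wav">
      <digits>12345</digits>
    </emma:interpretation>
    <emma:interpretation id="siv1"
        emma:medium="acoustic" emma:mode="voice"
        emma:confidence="0.87"
        emma:signal="http://example.com/audio/turn1.wav">
      <!-- Hypothetical application namespace for the verification payload -->
      <siv:verification claimedId="4125551234" decision="accepted"/>
    </emma:interpretation>
  </emma:group>
</emma:emma>
```

Sharing the same emma:signal URI lets an application check that the recognized words and the verification result came from the same audio. This matters for text-dependent verification.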
SLIDE 23

Define event model

  • Combine references from:
    – VoiceXML Forum
    – MRCPv2
    – Engine vendors

SLIDE 24

VoiceXML and SIV Web Services

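SLIDE 25

VoiceXML 2.x/3.x SIV Integration via BIAS web service

Diagram: calls from the PSTN or VoIP network reach the VoiceXML browser, which fetches VoiceXML over HTTP from the VoiceXML application and captures audio with recordutterance or <record>. The application side reaches the verification application over the IP network through the BIAS (Biometric Identity Assurance Services) web service. The verification application uses BioAPI to reach its SIV engine and voiceprint database (VP DB).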

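SLIDE 26

VoiceXML 2.x/3.x SIV Integration via <subdialog>

Diagram: calls from the PSTN or VoIP network reach the VoiceXML browser, which fetches VoiceXML over HTTP from the VoiceXML application. For verification, the browser runs a <subdialog> whose VoiceXML is served over HTTP by the verification application. That dialog captures audio with recordutterance or <record>, and the verification application scores it with its SIV engine and voiceprint database (VP DB).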
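The verification application's side of the <subdialog> option might look like this VoiceXML 2.1 sketch. It assumes the caller passes a claimedId parameter and expects a decision variable back. verify_utterance.pl and its XML reply format (<result decision="accepted"/>) are hypothetical stand-ins for whatever fronts the SIV engine and voiceprint database.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="verify">
    <!-- Initialized from the caller's <param name="claimedId"> -->
    <var name="claimedId"/>
    <var name="decision" expr="'rejected'"/>
    <var name="result"/>
    <record name="utterance" maxtime="5s">
      <prompt>Please say: my voice is my password.</prompt>
    </record>
    <block>
      <!-- Hypothetical script: scores the recording against the voiceprint
           stored for claimedId and returns an XML result -->
      <data name="result" src="verify_utterance.pl" method="post"
            enctype="multipart/form-data" namelist="claimedId utterance"/>
      <assign name="decision"
              expr="result.documentElement.getAttribute('decision')"/>
      <!-- Hand the decision back to the calling application -->
      <return namelist="decision"/>
    </block>
  </form>
</vxml>
```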

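SLIDE 27

VoiceXML 3.0 SIV Integration

Diagram: calls from the PSTN or VoIP network reach the VoiceXML browser, which fetches VoiceXML over HTTP from the VoiceXML application and drives the SIV engine directly through BioAPI, MRCP, etc. The voiceprint database (VP DB) is also shown.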

  • V3 SIV native language features
  • Browser/Engine integration via BioAPI, MRCP, proprietary API, etc.
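SLIDE 28

VoiceXML 3.0 SIV Integration

Diagram: the VoiceXML browser fetches VoiceXML over HTTP from the VoiceXML application and drives its own SIV engine through BioAPI, MRCP, etc. A separate verification application, with a second SIV engine and the voiceprint database (VP DB), remains reachable over HTTP through <subdialog>.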

  • V3 SIV native language features
  • Browser/Engine integration via BioAPI, MRCP, proprietary API, etc.
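SLIDE 29

VoiceXML SIV Integration via BIAS web service or <subdialog>

Diagram: calls from the PSTN or VoIP network reach the VoiceXML browser, which fetches VoiceXML over HTTP from the VoiceXML application. The verification application, with the voiceprint database (VP DB), can be reached in one of two ways. The first is <subdialog>, capturing audio with recordutterance or <record>. The second is the BIAS web service. SIV engines appear on both the browser side and the verification application side.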

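SLIDE 30

VoiceXML Application Switching

Diagram: the VoiceXML browser fetches VoiceXML over HTTP from both the VoiceXML application and the verification application, moving between them with <subdialog>. Audio is captured with recordutterance or <record>. SIV engines appear with the browser and with the verification application, alongside the voiceprint database (VP DB).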

SLIDE 31

Pros and Cons of Native V3 SIV functions

SLIDE 32

V3 SIV Native Functions: Pros and Cons

SIV engines controlled directly by VoiceXML

Pros
  • V3 requirement: simultaneous processing
    – More than one input engine, e.g. ASR+Verify
  • Performance
    – Speed/responsiveness
    – Accuracy
  • Consistency of resource control
    – Aligned with other input resources
  • Benefit to app developers
    – Don't have to buy, install, or maintain an SIV engine
    – Shared resource on VXML platform
  • Benefit to platform vendors / service providers
    – Shared resource
    – Ease of deployment
    – Enhanced service offering

Cons
  • Not available today; need an interim solution
  • App concerns
    – Enables developers to do 'bad' things
  • Security concerns
    – Enables developers to do 'bad' things
  • Full portability still a long way off
    – Voice models, engine capabilities, results/errors, etc. are all proprietary
  • Platform integration not standard yet
    – MRCPv2 not sufficient; need more features (MRCPv3?)

SLIDE 33

Resource Control and Distributed Decision Making

Diagram: the VoiceXML browser acts as resource controller, running the VoiceXML document/script (<vxml>) fetched from the Voice Web Application, where app-specific decisions are made. Output goes through the TTS engine (<ssml>) and the audio player (.wav). Input comes from the ASR and DTMF engines (<grxml>), the SIV engine (voice model), and the audio recorder (.wav). Each engine returns results to the browser for input review.

Input
  • ASR
    – audio quality
    – confidence/threshold, or nomatch
    – result: word or phrase
  • DTMF
    – result: digit string, or nomatch

Application-specific Decisions
  • user selection
  • authentication
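The ASR confidence/threshold and nomatch items map onto existing VoiceXML 2.0 features. The confidencelevel property sets the threshold below which a result becomes nomatch. The field's confidence shadow variable supports an application-specific decision, such as confirming a borderline result. The 0.5 and 0.8 thresholds and the field names are illustrative assumptions.

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="account">
    <!-- Results scoring below 0.5 are reported as nomatch -->
    <property name="confidencelevel" value="0.5"/>
    <field name="acct" type="digits">
      <prompt>Please say your account number.</prompt>
      <nomatch>Sorry, I didn't get that. <reprompt/></nomatch>
    </field>
    <!-- Only visited when the accepted result is borderline -->
    <field name="confirm" type="boolean" cond="acct$.confidence &lt; 0.8">
      <prompt>I heard <value expr="acct"/>. Is that right?</prompt>
      <filled>
        <if cond="!confirm">
          <!-- Start over: collect the account number again -->
          <clear namelist="acct confirm"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```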