INCITS 456 Speaker Recognition Format for Raw Data Interchange

SLIDE 1

INCITS 456 Speaker Recognition Format for Raw Data Interchange (SIVR-1)

Judith A. Markowitz, PhD

  • J. Markowitz, Consultants

Chicago, IL www.jmarkowitz.com

W3C Workshop on SIV March 6, 2009

SLIDE 2

W3C SIV Workshop March 5 2009 Judith Markowitz

  • J. Markowitz, Consultants

What is INCITS 456?

  • Data interchange format
    □ Storage & exchange of speech/voice data
    □ CBEFF Biometric Data Block (BDB)
  • Draft standard
    □ First for speech/voice
    □ First in XML
  • Developed jointly by ANSI/INCITS M1 and the VoiceXML Forum's Speaker Biometrics Committee

SLIDE 3

What is INCITS 456?

  • Captures, stores, and exchanges RAW data
  • Does not capture features or models
  • Goal is to provide information that will enable the recipient to analyze the data:
    □ Audio format
    □ Input device and channel
    □ Speaker (sex, age), but not the identity claim
    □ Language/dialect

SLIDE 4

Uses of INCITS 456

  • Data sharing
  • Watch list creation
  • Internal system audit
  • Automatic reenrollment of users
  • Multi-biometric fusion
  • Product/algorithm testing
  • SIV registry/service

SLIDE 5

Structure of INCITS 456

Two levels

  • Session header: information that should not change during the session (EX: sex of speaker, date of session, device & channel, audio format…)
  • Instance header (Interaction Turn): information that changes from turn to turn of a dialogue (EX: utterance length, SNR, prompt, content…)
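As an illustration of the two levels, the sketch below models one session header that holds a list of per-turn instance headers. It is a hypothetical in-memory model in Python, not the INCITS 456 schema: the class names, the choice of fields, and the second utterance (`utt2.wav`, 2.2 s × 8000 Hz = 17600 samples, based on the enrollment dialogue that follows) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model for illustration only; not the normative INCITS 456 schema.


@dataclass
class InstanceHeader:
    """Per-turn values that may change from one dialogue turn to the next."""
    instance_number: int
    utterance_file: str
    length_in_samples: int
    snr_estimate: Optional[float] = None
    prompt: Optional[str] = None
    content: str = "Unknown"


@dataclass
class SessionHeader:
    """Values that stay fixed for the whole session, plus its per-turn instances."""
    start: str
    end: str
    speaker_sex: str
    input_device: str
    channel_type: str
    sampling_rate: int
    instances: List[InstanceHeader] = field(default_factory=list)

    def add_instance(self, utterance_file: str, length_in_samples: int,
                     **extra) -> InstanceHeader:
        # Number instances 1, 2, 3, ... in the order the turns occur.
        inst = InstanceHeader(len(self.instances) + 1, utterance_file,
                              length_in_samples, **extra)
        self.instances.append(inst)
        return inst


session = SessionHeader(
    start="2008-07-14T13:14-5:00",
    end="2008-07-14T13:16-5:00",
    speaker_sex="Female",
    input_device="Telephone",
    channel_type="DigitalNonVoIP",
    sampling_rate=8000,
)
session.add_instance("utt1.wav", 20000, snr_estimate=42.1, prompt="prompt1.wav")
session.add_instance("utt2.wav", 17600)  # hypothetical second password utterance
```

Storing the sampling rate once on the session and the lengths on each instance mirrors the split above: anything fixed for the call is recorded only once.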

SLIDE 6

Example: Enrollment

July 14, 2008, Chicago

IVR: Welcome to ABC Bank's VoiceSure enrollment system… Please say your account number.
Caller: 357128999
(ASR processes input)

SIV session begins at 1:14 Central Daylight Time

IVR: Thank you… Please say your password.
Caller: lollapalooza (2.5 seconds)
IVR: Please say your password again.
Caller: lollapalooza (2.2 seconds)

SIV session ends at 1:16 Central Daylight Time

SLIDE 7

Example: Enrollment

Session header (partial)

Date & time start: 2008-07-14T13:14-5:00
Date & time end: 2008-07-14T13:16-5:00
Channel: Digital NonVoIP, 300-3100 Hz
Audio format: ByteOrder=0xFF00, streaming, format=OGG Vorbis, mono, sampling rate=8000, bits per sample=8…
Speaker: female
Input device: telephone
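The start and end stamps carry a UTC offset written as -5:00 (Central Daylight Time). As a sketch, assuming every stamp has exactly the `YYYY-MM-DDTHH:MM` plus offset shape shown here, a recipient could recover the two-minute session length and the UTC start time with the Python standard library; the helper name `parse_slide_timestamp` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_slide_timestamp(text: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM' plus a UTC offset such as '-5:00' or '-05:00'.

    Hypothetical helper: it assumes exactly the shape used on this slide.
    """
    local_part, sign, offset = text[:16], text[16], text[17:]
    if sign not in "+-":
        raise ValueError(f"expected a +/- offset after the time: {text!r}")
    hours, minutes = (int(part) for part in offset.split(":"))
    delta = timedelta(hours=hours, minutes=minutes)
    tz = timezone(delta if sign == "+" else -delta)
    return datetime.strptime(local_part, "%Y-%m-%dT%H:%M").replace(tzinfo=tz)


start = parse_slide_timestamp("2008-07-14T13:14-5:00")
end = parse_slide_timestamp("2008-07-14T13:16-5:00")
print(end - start)                                 # 0:02:00
print(start.astimezone(timezone.utc).isoformat())  # 2008-07-14T18:14:00+00:00
```

The helper also accepts the zero-padded ISO 8601 offset (-05:00), so either spelling yields the same instant.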

SLIDE 8

Example: Enrollment

Instance 1 header (partial)

ASR used: No
Prompt used: prompt1.wav
Utterance: utt1.wav, 2.5 sec. (20000 samples), content unknown, Volume=68.5 dB, SNR 42.1
Quality rating: unknown
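The instance XML for this utterance records its length three ways: total samples, whole seconds, and leftover samples. Assuming the leftover is simply the samples beyond the last whole second (which is what the instance #1 figures imply: 2 × 8000 + 4000 = 20000), the conversion is integer division. The function names below are hypothetical.

```python
def split_length(length_in_samples: int, sampling_rate: int) -> tuple[int, int]:
    """Return (whole seconds, remainder samples) for an utterance.

    Assumes AudioFullSecondsOf / AudioRemainderSamples mean whole seconds
    and leftover samples; the function name is hypothetical.
    """
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return divmod(length_in_samples, sampling_rate)


def duration_seconds(length_in_samples: int, sampling_rate: int) -> float:
    """Exact duration in seconds, e.g. the 2.5 sec. shown for utt1.wav."""
    return length_in_samples / sampling_rate


full, remainder = split_length(20000, 8000)
print(full, remainder, duration_seconds(20000, 8000))  # 2 4000 2.5
```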

SLIDE 9

Code for Example (Session Header)

<Session FormatVersion="SIVR-1">
  <DateAndTime>
    <start>2008-07-14T13:14-5:00</start>
    <end>2008-07-14T13:16-5:00</end>
  </DateAndTime>
  <Channel>
    <Type>DigitalNonVoIP</Type>
    <CutoffTop>3100</CutoffTop>
    <CutoffBottom>300</CutoffBottom>
  </Channel>
  <!-- Audio Format -->
  <AudioFormatHeader>
    <ByteOrder>0xFF00</ByteOrder>
    <Streaming>0</Streaming>
    <HeaderSize>25</HeaderSize>
    <FileLengthInSamples>13600</FileLengthInSamples>
    <AudioFormat>OGG Vorbis</AudioFormat>
    <ChannelCount>1</ChannelCount>
    <SamplingRate>8000</SamplingRate>
    <BitsPerSample>8</BitsPerSample>
    <AudioFullSecondsOf>6</AudioFullSecondsOf>
    <AudioRemainderSamples>5600</AudioRemainderSamples>
  </AudioFormatHeader>

SLIDE 10

Code for Example (Session Header cont.)

  <Speaker>
    <SpeakerMF>Female</SpeakerMF>
  </Speaker>
  <InputDevice>
    <Type>Telephone</Type>
  </InputDevice>
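A recipient would typically read the session header first, since it fixes the audio format for every instance. The Python sketch below uses the standard-library `xml.etree.ElementTree` on a trimmed copy of the session-header code above; the selection of fields, the closing `</Session>` tag (the slides do not show one), and the `read_session_summary` name are assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the session header; the closing </Session> tag is an
# assumption added so the fragment parses on its own.
SESSION_XML = """
<Session FormatVersion="SIVR-1">
  <Channel>
    <Type>DigitalNonVoIP</Type>
    <CutoffTop>3100</CutoffTop>
    <CutoffBottom>300</CutoffBottom>
  </Channel>
  <AudioFormatHeader>
    <AudioFormat>OGG Vorbis</AudioFormat>
    <ChannelCount>1</ChannelCount>
    <SamplingRate>8000</SamplingRate>
    <BitsPerSample>8</BitsPerSample>
  </AudioFormatHeader>
  <Speaker>
    <SpeakerMF>Female</SpeakerMF>
  </Speaker>
  <InputDevice>
    <Type>Telephone</Type>
  </InputDevice>
</Session>
"""


def read_session_summary(xml_text: str) -> dict:
    """Pull the session-level fields needed before analyzing any instance audio."""
    root = ET.fromstring(xml_text.strip())
    return {
        "version": root.get("FormatVersion"),
        "band_hz": (int(root.findtext("Channel/CutoffBottom")),
                    int(root.findtext("Channel/CutoffTop"))),
        "format": root.findtext("AudioFormatHeader/AudioFormat"),
        "sampling_rate": int(root.findtext("AudioFormatHeader/SamplingRate")),
        "channels": int(root.findtext("AudioFormatHeader/ChannelCount")),
        "speaker": root.findtext("Speaker/SpeakerMF"),
        "device": root.findtext("InputDevice/Type"),
    }


summary = read_session_summary(SESSION_XML)
print(summary)
```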

SLIDE 11

Code for Example (Instance #1)

<Instance>
  <InstanceNumber>1</InstanceNumber>
  <ASRUsed>No</ASRUsed>
  <TypeOfPromptContent>String</TypeOfPromptContent>
  <StringPromptContent>URL EnrollPrompts/Prompt1.wav</StringPromptContent>
  <Utterance>
    <DataType>Pointer</DataType>
    <Data>20080714-3124554/Utt1.wav</Data>
    <FileLengthInSamples>20000</FileLengthInSamples>
    <AudioFullSecondsOf>2</AudioFullSecondsOf>
    <AudioRemainderSamples>4000</AudioRemainderSamples>
    <Content>Unknown</Content>
    <Volume>68.5</Volume>
    <SNREstimate>42.1</SNREstimate>
    <Quality>
      <Score>254</Score>
      <AlgorithmVendorID>0</AlgorithmVendorID>
      <AlgorithmID>0</AlgorithmID>
    </Quality>
  </Utterance>
</Instance>
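An application producing this record could build the instance element with `xml.etree.ElementTree`. The hypothetical `build_instance` helper below emits only a subset of the children shown above (no prompt or quality block), in the same order, and derives the whole-second and remainder fields with `divmod`, assuming they mean whole seconds plus leftover samples.

```python
import xml.etree.ElementTree as ET


def build_instance(number: int, data_pointer: str, length_in_samples: int,
                   sampling_rate: int, volume: float, snr: float) -> ET.Element:
    """Build an <Instance> element shaped like instance #1 (hypothetical helper).

    Element names are copied from the example; prompt and quality children
    are omitted for brevity.
    """
    full_seconds, remainder = divmod(length_in_samples, sampling_rate)
    inst = ET.Element("Instance")
    ET.SubElement(inst, "InstanceNumber").text = str(number)
    ET.SubElement(inst, "ASRUsed").text = "No"  # fixed here, as in the example
    utt = ET.SubElement(inst, "Utterance")
    ET.SubElement(utt, "DataType").text = "Pointer"
    ET.SubElement(utt, "Data").text = data_pointer
    ET.SubElement(utt, "FileLengthInSamples").text = str(length_in_samples)
    ET.SubElement(utt, "AudioFullSecondsOf").text = str(full_seconds)
    ET.SubElement(utt, "AudioRemainderSamples").text = str(remainder)
    ET.SubElement(utt, "Content").text = "Unknown"
    ET.SubElement(utt, "Volume").text = str(volume)
    ET.SubElement(utt, "SNREstimate").text = str(snr)
    return inst


instance = build_instance(1, "20080714-3124554/Utt1.wav", 20000, 8000, 68.5, 42.1)
print(ET.tostring(instance, encoding="unicode"))
```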

SLIDE 12

Implementation

  • Application generates format directly
  • The generated format can then become part of an EMMA tag

<emma:interpretation id="intp1" emma:medium="acoustic" emma:mode="voice" emma:function="verification">
  <DEFF uri="http://example.com/DEFF-docs/mydoc12345"/>
</emma:interpretation>
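The wrapping above can be produced with `xml.etree.ElementTree` by qualifying tag and attribute names with the EMMA 1.0 namespace URI (`http://www.w3.org/2003/04/emma`). This is a minimal sketch: `wrap_in_emma` is a hypothetical helper, and `<DEFF>` is copied from the example rather than being an EMMA element.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)  # serialize with the emma: prefix


def emma(name: str) -> str:
    """Qualify a local name with the EMMA namespace in ElementTree's {uri}name form."""
    return f"{{{EMMA_NS}}}{name}"


def wrap_in_emma(interp_id: str, deff_uri: str) -> ET.Element:
    """Wrap a pointer to an INCITS 456 document in an emma:interpretation."""
    interp = ET.Element(emma("interpretation"), {
        "id": interp_id,
        emma("medium"): "acoustic",
        emma("mode"): "voice",
        emma("function"): "verification",
    })
    # <DEFF uri=...> is taken from the example; it is not part of EMMA itself.
    ET.SubElement(interp, "DEFF", {"uri": deff_uri})
    return interp


elem = wrap_in_emma("intp1", "http://example.com/DEFF-docs/mydoc12345")
print(ET.tostring(elem, encoding="unicode"))
```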

SLIDE 13

Implementation

  • Or be used as a resource in an EMMA derivation

<emma:derivation>
  <emma:interpretation id="better">
    <emma:derived-from resource="http://www.INCITS456-1.txt" composite="false"/>
    ...
  </emma:interpretation>
</emma:derivation>
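The derivation above can be built with `xml.etree.ElementTree` as well. This sketch reproduces the example's nesting (the `emma:interpretation` sits inside `emma:derivation`) and its resource URL as given, and leaves out the elided content; the helper name `derivation_from_resource` is hypothetical.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)


def emma(name: str) -> str:
    """Qualify a local name with the EMMA namespace in ElementTree's {uri}name form."""
    return f"{{{EMMA_NS}}}{name}"


def derivation_from_resource(interp_id: str, resource: str) -> ET.Element:
    """Mirror the example: an emma:interpretation inside emma:derivation that
    names an INCITS 456 document as the resource it was derived from."""
    derivation = ET.Element(emma("derivation"))
    interp = ET.SubElement(derivation, emma("interpretation"), {"id": interp_id})
    # resource and composite are unprefixed attributes, as in the example.
    ET.SubElement(interp, emma("derived-from"),
                  {"resource": resource, "composite": "false"})
    return derivation


tree = derivation_from_resource("better", "http://www.INCITS456-1.txt")
print(ET.tostring(tree, encoding="unicode"))
```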