INCITS 456 Speaker Recognition Format for Raw Data Interchange - PowerPoint PPT Presentation

INCITS 456 Speaker Recognition Format for Raw Data Interchange (SIVR-1). Judith A. Markowitz, PhD, J. Markowitz, Consultants, Chicago, IL, www.jmarkowitz.com. W3C Workshop on SIV, March 6, 2009.


  1. INCITS 456 Speaker Recognition Format for Raw Data Interchange (SIVR-1). Judith A. Markowitz, PhD, J. Markowitz, Consultants, Chicago, IL, www.jmarkowitz.com. W3C Workshop on SIV, March 6, 2009.

  2. What is INCITS 456?
     • Data interchange format
     • Storage & exchange of speech/voice data
     • CBEFF Biometric Data Block (BDB)
     • Draft standard
     • First for speech/voice
     • First in XML
     • Developed jointly by ANSI/INCITS M1 and VoiceXML Forum's Speaker Biometrics Committee
     Judith Markowitz, W3C SIV Workshop, J. Markowitz, Consultants, March 5 2009

  3. What is INCITS 456?
     • Captures, stores, and exchanges RAW data
     • Does not capture features or models
     • Goal is to provide information that will enable recipient to analyze the data
       □ Audio format
       □ Input device and channel
       □ Speaker (sex, age) but not claim
       □ Language/dialect

  4. Uses of INCITS 456
     • Data sharing
     • Watch list creation
     • Internal system audit
     • Automatic reenrollment of users
     • Multi-biometric fusion
     • Product/algorithm testing
     • SIV registry/service

  5. Structure of INCITS 456
     Two levels
     • Session header: information that should not change during the session (EX: sex of speaker, date of session, device & channel, audio format…)
     • Instance header (Interaction Turn): information that changes from turn to turn of a dialogue (EX: utterance length, SNR, prompt, content…)
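
The two-level structure can be pictured as one session record that owns a list of per-turn records. The following is a minimal sketch in Python, not the standard's schema: the class and field names (`Session`, `Instance`, `speaker_sex`, and so on) are hypothetical and only mirror the examples the slide names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    """Instance header (interaction turn): values that change from turn to turn."""
    instance_number: int
    utterance_seconds: Optional[float] = None
    snr_estimate: Optional[float] = None
    prompt: Optional[str] = None
    content: Optional[str] = None


@dataclass
class Session:
    """Session header: values that should not change during the session."""
    speaker_sex: Optional[str] = None
    date: Optional[str] = None
    input_device: Optional[str] = None
    channel: Optional[str] = None
    audio_format: Optional[str] = None
    instances: List[Instance] = field(default_factory=list)


# Hypothetical usage: one session with two dialogue turns.
session = Session(speaker_sex="Female", date="2008-07-14",
                  input_device="Telephone", channel="DigitalNonVoIP",
                  audio_format="OGG Vorbis")
session.instances.append(Instance(1, utterance_seconds=2.5, snr_estimate=42.1))
session.instances.append(Instance(2, utterance_seconds=2.2))
```

Keeping the session-level fields out of `Instance` reflects the split described on the slide: anything that could differ between turns belongs in the instance record.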

  6. Example: Enrollment
     July 14, 2008, Chicago
     IVR: Welcome ABC Bank's VoiceSure enrollment system… Please say your account number.
     Caller: 357128999 (ASR processes input)
     SIV Session begins at 1:14 Central Daylight time
     IVR: Thank you… Please say your password
     Caller: lollapalooza (2.5 seconds)
     IVR: Please say your password again.
     Caller: lollapalooza (2.2 seconds)
     SIV Session ends at 1:16 Central Daylight time

  7. Example: Enrollment
     Session header (partial)
     Date & Time start: 2008-07-14T13:14-5:00
     Date & Time end: 2008-07-14T13:16-5:00
     Channel: Digital NonVoIP, 300-3100 Hz
     Audio format: ByteOrder=0xFF00, streaming, format=OGG Vorbis, mono, sampling rate=8000, bits per sample=8…
     Speaker: female
     Input Device: telephone

  8. Example: Enrollment
     Instance 1 header (partial)
     ASR used: No
     Prompt used: prompt1.wav
     Utterance: utt1.wav, 2.5 sec. (20000 samples), content unknown, Volume=68.5 dB, SNR 42.1
     Quality rating: unknown

  9. Code for Example (Session Header)
     <Session FormatVersion="SIVR-1">
       <DateAndTime>
         <start>"2008-07-14T13:11-5:00"</start>
         <end>"2008-07-14T13:14-5:00"</end>
       </DateAndTime>
       <Channel>
         <Type>"DigitalNonVoIP"</Type>
         <CutoffTop>3100</CutoffTop>
         <CutoffBottom>300</CutoffBottom>
       </Channel>
       <!-- Audio Format (slide callout) -->
       <AudioFormatHeader>
         <ByteOrder>0xFF00</ByteOrder>
         <Streaming>0</Streaming>
         <HeaderSize>25</HeaderSize>
         <FileLengthInSamples>13600</FileLengthInSamples>
         <AudioFormat>"OGG Vorbis"</AudioFormat>
         <ChannelCount>1</ChannelCount>
         <SamplingRate>8000</SamplingRate>
         <BitsPerSample>8</BitsPerSample>
         <AudioFullSecondsOf>6</AudioFullSecondsOf>
         <AudioRemainderSamples>5600</AudioRemainderSamples>
       </AudioFormatHeader>

  10. Code for Example (Session Header cont.)
      <Speaker>
        <SpeakerMF>"Female"</SpeakerMF>
      </Speaker>
      <InputDevice>
        <Type>"Telephone"</Type>
      </InputDevice>

  11. Code for Example (instance #1)
      <Instance>
        <InstanceNumber>1</InstanceNumber>
        <ASRUsed>"No"</ASRUsed>
        <TypeOfPromptContent>"String"</TypeOfPromptContent>
        <StringPromptContent>"URL EnrollPrompts/Prompt1.wav"</StringPromptContent>
        <Utterance>
          <DataType>"Pointer"</DataType>
          <Data>"20080714-3124554/Utt1.wav"</Data>
          <FileLengthInSamples>20000</FileLengthInSamples>
          <AudioFullSecondsOf>2</AudioFullSecondsOf>
          <AudioRemainderSamples>4000</AudioRemainderSamples>
          <Content>"Unknown"</Content>
          <Volume>68.5</Volume>
          <SNREstimate>42.1</SNREstimate>
          <Quality>
            <Score>254</Score>
            <AlgorithmVendorID>0</AlgorithmVendorID>
            <AlgorithmID>0</AlgorithmID>
          </Quality>
        </Utterance>
      </Instance>
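
A recipient could read the utterance length fields of an instance like this one with a few lines of Python. The sketch below uses the standard library's `xml.etree.ElementTree` on a trimmed copy of the instance. It assumes, based only on the values in this example and not on the text of the standard, that `FileLengthInSamples` equals `AudioFullSecondsOf` times the session's `SamplingRate` plus `AudioRemainderSamples`; the function name `utterance_duration` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Taken from the session header's <SamplingRate> element in this example.
SAMPLING_RATE = 8000

# Trimmed copy of the instance #1 example (only the length fields).
INSTANCE_XML = """
<Instance>
  <InstanceNumber>1</InstanceNumber>
  <Utterance>
    <FileLengthInSamples>20000</FileLengthInSamples>
    <AudioFullSecondsOf>2</AudioFullSecondsOf>
    <AudioRemainderSamples>4000</AudioRemainderSamples>
  </Utterance>
</Instance>
"""


def utterance_duration(instance, sampling_rate):
    """Return (duration in seconds, whether the three length fields agree).

    Assumption: total samples = full seconds * sampling rate + remainder.
    """
    utt = instance.find("Utterance")
    total = int(utt.findtext("FileLengthInSamples"))
    full_seconds = int(utt.findtext("AudioFullSecondsOf"))
    remainder = int(utt.findtext("AudioRemainderSamples"))
    consistent = total == full_seconds * sampling_rate + remainder
    return total / sampling_rate, consistent


instance = ET.fromstring(INSTANCE_XML)
seconds, consistent = utterance_duration(instance, SAMPLING_RATE)
print(seconds, consistent)  # prints "2.5 True"
```

The 2.5 seconds computed here matches the utterance length given for the first password utterance in the enrollment dialogue.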

  12. Implementation
      • Application generates format directly
      • The generated format can then become part of an EMMA tag
      <emma:interpretation id="intp1"
                           emma:medium="acoustic"
                           emma:mode="voice"
                           emma:function="verification">
        <DEFF uri="http://example.com/DEFF-docs/mydoc12345"/>
      </emma:interpretation>

  13. Implementation
      • Or be used as a resource in an EMMA derivation
      <emma:derivation>
        <emma:interpretation id="better">
          <emma:derived-from resource="http://www.INCITS456-1.txt" composite="false"/>
          .
          :
        </emma:interpretation>
      </emma:derivation>
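
An application that generates the format directly could also emit the EMMA wrapper programmatically. The following Python sketch builds an `emma:interpretation` that points at a stored document, using `xml.etree.ElementTree` and the EMMA 1.0 namespace. The `DEFF` element and its `uri` attribute are copied from the slide as-is; the helper name `wrap_in_emma` and the enclosing `emma:emma` root are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"  # EMMA 1.0 namespace
ET.register_namespace("emma", EMMA_NS)


def q(name):
    """Qualify a local name with the EMMA namespace (ElementTree notation)."""
    return f"{{{EMMA_NS}}}{name}"


def wrap_in_emma(doc_uri, interp_id="intp1"):
    """Build an EMMA document whose interpretation references doc_uri.

    Hypothetical helper: the DEFF element mirrors the slide, not a schema.
    """
    root = ET.Element(q("emma"), {"version": "1.0"})
    interp = ET.SubElement(root, q("interpretation"), {
        "id": interp_id,
        q("medium"): "acoustic",
        q("mode"): "voice",
        q("function"): "verification",
    })
    ET.SubElement(interp, "DEFF", {"uri": doc_uri})
    return root


emma_doc = wrap_in_emma("http://example.com/DEFF-docs/mydoc12345")
print(ET.tostring(emma_doc, encoding="unicode"))
```

Building the wrapper through an XML library rather than string concatenation avoids the unbalanced quotes and tags that are easy to introduce by hand.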
