EMMA: Extensible Multimodal Annotation markup language


SLIDE 1

James A. Larson EMMA 1

EMMA

Extensible Multimodal Annotation markup language: a canonical structure for semantic interpretations of a variety of inputs, including:

  • Speech
  • Natural language text
  • GUI
  • Ink
SLIDE 2

EMMA

Extensible Multimodal Annotation markup language: a canonical structure for semantic interpretations of a variety of inputs, including:

  • Speech
  • Natural language text
  • GUI
  • Ink

W3C standard: http://www.w3.org/2002/mmi/


SLIDE 3

EMMA

Represents user input; a vehicle for transmitting the user's intention throughout the application. Three components:

  • Data model
  • Interpretation
  • Annotation (main focus of standard)
SLIDE 4

General Annotations

  • Confidence
  • Timestamps
  • Alternative interpretations
  • Language
  • Medium (visual, acoustic, tactile)
  • Modality (voice, keys, photograph)
  • Function (dialog, recording, verification, …)

SLIDE 5

EMMA Example

"I want to go from Boston to Denver on March 11, 2003"

<emma:emma emma:version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <rdf:RDF>
    <!-- time stamp for result -->
    <rdf:Description rdf:about="#int1">
      <emma:absolute-timestamp emma:start="2003-03-26T0:00:00.15"
                               emma:end="2003-03-26T0:00:00.2"/>
    </rdf:Description>
    <!-- confidence score -->
    <rdf:Description rdf:about="#int1"
        emma:confidence="0.75"/>
    <!-- data model -->
    <rdf:Description rdf:about="#int1"
        emma:model="http://myserver/models/city.xml"/>
  </rdf:RDF>

  <emma:interpretation emma:id="int1">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03112003</date>
  </emma:interpretation>
</emma:emma>

An EMMA document combines the interpretation, its annotations, and a reference to the data model.
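The interpretation element can be consumed with any XML toolkit. A minimal sketch using Python's standard ElementTree, assuming just the interpretation portion of this slide's flight-query document (element names and the emma namespace as shown on the slide):

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"

# The interpretation portion of the slide's flight-query example.
doc = """\
<emma:emma emma:version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="int1">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03112003</date>
  </emma:interpretation>
</emma:emma>"""

root = ET.fromstring(doc)
interp = root.find(f"{{{EMMA_NS}}}interpretation")
# Child elements of the interpretation carry the application slots.
slots = {child.tag: child.text for child in interp}
print(slots)  # {'origin': 'Boston', 'destination': 'Denver', 'date': '03112003'}
```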

SLIDE 6

The same meaning with speech and mouse input

Speech:

<emma:interpretation medium="acoustic" mode="voice" id="int1">
  <origin>Boston</origin>
  <destination>Denver</destination>
  <date>03112008</date>
</emma:interpretation>

Mouse:

<emma:interpretation medium="tactile" mode="gui" id="int1">
  <origin>Boston</origin>
  <destination>Denver</destination>
  <date>03112008</date>
</emma:interpretation>

SLIDE 7

EMMA Annotations

  • Tokens of input: emma:tokens attribute
  • Reference to processing: emma:process attribute
  • Lack of input: emma:no-input attribute
  • Uninterpreted input: emma:uninterpreted attribute
  • Human language of input: emma:lang attribute
  • Reference to signal: emma:signal and emma:signal-size attributes
  • Media type: emma:media-type attribute
  • Confidence scores: emma:confidence attribute
  • Input source: emma:source attribute
  • Absolute timestamps: emma:start, emma:end attributes
  • Relative timestamps: emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start attributes
  • Duration of input: emma:duration attribute
  • Composite input and relative timestamps
  • Medium, mode, and function of user inputs: emma:medium, emma:mode, emma:function, emma:verbal attributes
  • Composite multimodality: emma:hook attribute
  • Cost: emma:cost attribute
  • Endpoint properties: emma:endpoint-role, emma:endpoint-address, emma:port-type, emma:port-num, emma:message-id, emma:service-name, emma:endpoint-pair-ref, emma:endpoint-info-ref attributes
  • Reference to emma:grammar element: emma:grammar-ref attribute
  • Dialog turns: emma:dialog-turn attribute
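Most of these annotations are plain XML attributes in the emma namespace, so reading them is mechanical. A minimal sketch in Python's standard ElementTree; the document fragment and its attribute values are illustrative, not taken from the standard's examples:

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma#"

# An interpretation carrying a few of the annotations listed above
# (confidence, absolute timestamps, tokens); values are illustrative.
doc = """\
<emma:emma emma:version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma#">
  <emma:interpretation emma:id="int1"
      emma:confidence="0.75"
      emma:start="1149773124516"
      emma:end="1149773126326"
      emma:tokens="flights to denver"/>
</emma:emma>"""

interp = ET.fromstring(doc).find(f"{{{EMMA_NS}}}interpretation")

def ann(name):
    """Read an emma:* annotation attribute from the interpretation."""
    return interp.get(f"{{{EMMA_NS}}}{name}")

confidence = float(ann("confidence"))
duration_ms = int(ann("end")) - int(ann("start"))
print(confidence, duration_ms)  # 0.75 1810
```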


SLIDE 8

Verification

Claiming to be 'charles foster kane', the user said 'rosebud', and the speaker verification engine accepted the claim with a confidence of 0.95.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp1"
      emma:duration="1810"
      emma:confidence="0.95"
      emma:process="file://myverifier"
      emma:signal="http://example.com/signals/sg23.bin"
      emma:medium="acoustic"
      emma:verbal="true"
      emma:mode="speech"
      emma:start="1149773124516"
      emma:uninterpreted="false"
      emma:function="verification"
      emma:dialog-turn="1"
      emma:end="1149773126326"
      emma:lang="en-US"
      emma:tokens="rosebud">
    <claim>charles foster kane</claim>
    <result>verified</result>
  </emma:interpretation>
</emma:emma>

If no ASR results are available, emma:tokens="rosebud" would be omitted.
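Downstream dialog logic would typically read the result and confidence annotations to decide whether to accept the claim. A minimal sketch, assuming a trimmed version of the verification document above and an application-chosen acceptance threshold of 0.9:

```python
import xml.etree.ElementTree as ET

# Namespace as written on this slide (note the trailing slash).
EMMA_NS = "http://www.w3.org/2003/04/emma/"

doc = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp1"
      emma:function="verification"
      emma:confidence="0.95"
      emma:tokens="rosebud">
    <claim>charles foster kane</claim>
    <result>verified</result>
  </emma:interpretation>
</emma:emma>"""

interp = ET.fromstring(doc).find(f"{{{EMMA_NS}}}interpretation")
confidence = float(interp.get(f"{{{EMMA_NS}}}confidence"))
# Accept only if the engine verified the claim with high confidence.
accepted = interp.findtext("result") == "verified" and confidence >= 0.9
print(accepted)  # True
```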
SLIDE 9

Identification

The user said 'rosebud' and the speaker identification engine identified the speaker as 'charles foster kane' with a confidence of 0.95.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp1"
      emma:duration="1810"
      emma:confidence="0.95"
      emma:process="file://myidentifier"
      emma:signal="http://example.com/signals/sg23.bin"
      emma:medium="acoustic"
      emma:verbal="true"
      emma:mode="speech"
      emma:start="1149773124516"
      emma:uninterpreted="false"
      emma:function="identification"
      emma:dialog-turn="1"
      emma:end="1149773126326"
      emma:lang="en-US"
      emma:tokens="rosebud">
    <result>charles foster kane</result>
  </emma:interpretation>
</emma:emma>

SLIDE 10

EMMA: fusion

Multiple sources of input:

  • Voice into a speaker verification engine
  • Dialog into a VoiceXML 2.x engine

The results of both engines are represented using EMMA. A merging engine combines these two results into a single result. The three engines may be:

  • Co-located at a single site or distributed across a network
  • Run in real time or in delayed time
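The merging step can be thought of as unification over the slot structures of the two EMMA results. A toy sketch in Python: representing interpretations as plain dicts, the conflict rule, and the noisy-OR confidence combination are all illustrative assumptions, not part of the EMMA standard.

```python
def unify(a, b):
    """Merge two interpretations' slots; fail on conflicting values."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # conflicting slot values: unification fails
        merged[key] = value
    return merged

# Two EMMA results for the same user turn, one per engine.
identification = {"result": "John Dow"}   # from speaker identification
dialog = {"result": "John Dow"}           # from the VoiceXML dialog

slots = unify(identification, dialog)
# Noisy-OR: independent agreeing sources raise the combined confidence.
confidence = 1 - (1 - 0.6) * (1 - 0.6)
print(slots, round(confidence, 2))  # {'result': 'John Dow'} 0.84
```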


SLIDE 11


EMMA: fusion

[Diagram: speech and keyboard input produce voice samples and a VoiceXML dialog; a VoiceXML engine and a speaker identification engine each emit EMMA, which a merging/unification engine combines into a single EMMA result for applications.]

SLIDE 12


EMMA: fusion

[Diagram: speech feeds a speech recognition engine (driven by a grammar plus semantic interpretation instructions) and keyboard input feeds a keyboard interpretation engine (driven by interpretation instructions); each emits EMMA, and a merging/unification engine combines them into a single EMMA result for applications.]

Interpretation (mode = "voice"):

<emma:interpretation id="interp1"
    emma:function="verification"
    emma:confidence="0.6">
  <result>John Dow</result>
</emma:interpretation>

SLIDE 13

Interpretation (mode = "voice"):

<emma:interpretation id="interp1"
    emma:function="verification"
    emma:confidence="0.6">
  <result>John Dow</result>
</emma:interpretation>


EMMA: fusion

[Diagram: the speech and keyboard EMMA results flow into the merging/unification engine and on to applications.]

Interpretation (mode = "text"):

<emma:interpretation id="interp1"
    emma:function="dialog"
    emma:confidence="0.6">
  <result>John Dow</result>
</emma:interpretation>


SLIDE 14

Interpretation (mode = "text"):

<emma:interpretation id="interp2"
    emma:function="dialog"
    emma:confidence="0.6">
  <result>John Dow</result>
</emma:interpretation>

Interpretation (mode = "voice"):

<emma:interpretation id="interp1"
    emma:function="identification"
    emma:confidence="0.6">
  <result>John Dow</result>
</emma:interpretation>


EMMA: fusion

[Diagram: the speech and keyboard EMMA results flow into the merging/unification engine, which emits the derived EMMA result below to applications.]

Interpretation (mode = "derived"):

<emma:interpretation id="interp3"
    emma:function="fusion"
    emma:confidence="0.7">
  <result>John Dow</result>
</emma:interpretation>

SLIDE 15

Summary

EMMA can be used for many types of data. EMMA captures information about each data type. EMMA information is used in various processing phases:

  • Interpretation and semantic processing
  • Fusion
  • Data transmission
