
SLIDE 1

Proposal of a Hierarchical Architecture for Multimodal Interactive Systems

Masahiro Araki*1, Tsuneo Nitta*2, Kouichi Katsurada*2, Takuya Nishimoto*3, Tetsuo Amakasu*4, Shinnichi Kawamoto*5
*1 Kyoto Institute of Technology, *2 Toyohashi University of Technology, *3 The University of Tokyo, *4 NTT Cyber Space Labs., *5 ATR

2007/11/16 W3C MMI ws

SLIDE 2

Outline

  • Background

– Introduction of the speech IF committee under ITSCJ
– Introduction to the Galatea toolkit

  • Problems of W3C MMI Architecture

– Modality Component is too large
– Fragile modality fusion and fission functionality
– How to deal with the user model?

  • Our Proposal

– Hierarchical MMI architecture
– “Convention over Configuration” in various layers


SLIDE 3

Background (1)

  • What is ITSCJ?

– Information Technology Standards Commission of Japan

  • under IPSJ (Information Processing Society of Japan)
  • Speech Interface Committee under ITSCJ

– Mission

  • Publish TS (Trial Standard) documents concerning multimodal dialogue systems


SLIDE 4

Background (2)

  • Theme of the committee

– Architecture of MMI system
– Requirements of each component

  • Future directions

– Guideline for implementing practical MMI systems
– Specify markup language


SLIDE 5

Our Aim

  • 1. Propose an MMI architecture that can be used for advanced MMI research

(W3C takes the practical point of view: mobile, accessibility)

  • 2. Examine the validity of the architecture through system implementation

(Galatea Toolkit)

  • 3. Develop a framework and release it as open source

(towards a de facto standard)


SLIDE 6

Galatea Toolkit (1)

  • Platform for developing MMI systems

– Speech recognition
– Speech synthesis
– Face image synthesis


SLIDE 7

Galatea Toolkit (2)

– ASR: Julian
– Dialogue Manager: Galatea DM
– TTS: Galatea talk
– Face: FSM


SLIDE 8

Galatea Toolkit (3)

[Architecture diagram]
– Dialogue Manager: Phoenix
– Agent Manager Macro Control Layer (AM-MCL)
– Agent Manager Direct Control Layer (AM-DCL)
– Modules: ASR (Julian), TTS (Galatea talk), Face (FSM)


SLIDE 9

Problems of W3C MMI (1)

  • The “size” of the Modality Component does not suit life-like agent control

[Diagram: Runtime Framework (Interaction Manager, Data Component, Delivery Context Component) over the Modality Component API, with a Speech Modality (ASR, TTS) and a Face Image Modality (FSM)]


SLIDE 10

Problems of W3C MMI (1)

  • Lip synchronization with speech output

[Diagram: (1) the Interaction Manager sends set Text="ohayou" to the Speech Modality (TTS); (2) phoneme durations (a[65], h[60], ...) are passed to the Face Image Modality; (3) the lip moving sequence is set in the FSM; (4) TTS starts]


SLIDE 11

Problems of W3C MMI (1)

  • Back-channeling mechanism

[Diagram: on a short pause detected during ASR, the Interaction Manager issues (1) a nod command to the Face Image Modality (FSM) and (2) set Text="hai" / start to the Speech Modality (TTS)]


SLIDE 12

Problems of W3C MMI (2)

  • Fragile modality fusion and fission functionality

[Diagram: the Speech Modality (ASR) yields “from here to there” while the Tactile Modality (touch sensor) yields point (120,139) and point (200,300)]

– How to define a multimodal grammar?
– Is simple unification enough?


SLIDE 13

Problems of W3C MMI (2)

  • Fragile modality fusion and fission functionality

[Diagram: the Interaction Manager sends “this is route map” to the Speech Modality (TTS) and SVG to the Graphic Modality (SVG Viewer)]

– Contents planning is suitable for adapting to various devices.


SLIDE 14

Problems of W3C MMI (3)

  • How to deal with the user model?

[Diagram: the user fails many times at the Speech Modality (ASR, TTS) and Face Image Modality (FSM)]

– Where is the user model information stored?

SLIDE 15

Solution

  • Back to the multimodal framework

– smaller modality components

  • Separate state transition descriptions

– task flow
– interaction flow
– modality fusion/fission

→ hierarchical architecture


SLIDE 16

Investigation procedure: Phase 1

use case analysis → requirements for overall system → working draft for MMI architecture


SLIDE 17

Use case analysis

Name | Input modality | Output modality
a. on-line shopping | mouse, speech | display, speech, animated agent
b. voice search | mouse, speech | display, speech
c. site search | mouse, speech, key | display, speech
d. interaction with robot | speech, image, sensor | speech, display
e. negotiation with interactive agent | speech | speech, face image agent
f. kiosk terminal | touch, speech | speech, display


SLIDE 18

Example of use case

Interaction with robot

User: “What is Kasuri?”
Robot: “Nishijin Kasuri is a traditional textile in Kyoto.”


SLIDE 19

Requirements

In common with W3C:

  • 1. general
  • 2. input modality
  • 3. output modality
  • 4. architecture, integration and synchronization point
  • 5. runtimes and deployments

Extensions:

  • 6. dialogue management
  • 7. handling of forms and fields
  • 8. connection with outside application
  • 9. user model and environment information
  • 10. from the viewpoint of the developer


SLIDE 20

[Hierarchical architecture diagram: control and commands flow down, events and results flow up]
– layer 6: application (data model, application logic, user model / device model)
– layer 5: task control (set/get, start dialogue, end event / result)
– layer 4: interaction control (integrated result / event, command)
– layer 3: modality integration (interpreted result / event, control / understanding)
– layer 2: modality component (ASR / touch, TTS / graphical component)
– layer 1: I/O device (ASR, pen / touch, audio output, graphics)
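The message flow above, commands descending and events/results ascending through the six layers, can be sketched as a chain of layer objects. All class and method names here are hypothetical; this is a minimal illustration of the layering idea, not the TS interface.

```python
class Layer:
    """One layer of the proposed hierarchy: commands descend, events ascend."""
    def __init__(self, name, below=None):
        self.name = name
        self.below = below            # next lower layer, if any
        self.above = None
        if below is not None:
            below.above = self

    def command(self, msg, trace=None):
        """Pass a command down toward layer 1; return the layers visited."""
        trace = [] if trace is None else trace
        trace.append(self.name)
        if self.below is not None:
            return self.below.command(msg, trace)
        return trace                  # layer 1 would drive the I/O device here

    def event(self, msg, trace=None):
        """Pass an event/result up toward layer 6."""
        trace = [] if trace is None else trace
        trace.append(self.name)
        if self.above is not None:
            return self.above.event(msg, trace)
        return trace                  # layer 6 would update the data model here

# Wire the six layers bottom-up
stack = Layer("layer 1: I/O device")
for name in ["layer 2: modality component", "layer 3: modality integration",
             "layer 4: interaction control", "layer 5: task control",
             "layer 6: application"]:
    stack = Layer(name, below=stack)
```

The point of the chain is that each layer only knows its immediate neighbours, which is what lets a layer's markup language or implementation be swapped without touching the rest.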

SLIDE 21

Investigation procedure: Phase 2

detailed analysis of use cases → requirements for each layer → publish trial standard → release reference implementation


SLIDE 22

Detailed use case analysis


SLIDE 23

Requirements of each layer

  • Clarify input/output with adjacent layers
  • Define events
  • Clarify inner-layer processing
  • Investigate markup language


SLIDE 24

1st layer: Input/Output module

  • Function

– Uni-modal recognition/synthesis module

  • Input module

– Input: (from outside) signal; (from 2nd layer) information used for recognition
– Output: (to 2nd layer) recognition result
– Example: ASR, touch input, face detection, ...

  • Output module

– Input: (from 2nd layer) output contents
– Output: (to outside) signal
– Example: TTS, face image synthesizer, Web browser, ...

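The input/output contract of this layer can be sketched as two small interfaces. The class names and the dummy engines are invented for illustration; real 1st-layer modules such as Julian or Galatea talk have their own APIs.

```python
class InputModule:
    """1st-layer input module: signal in (from outside), result out (to the 2nd layer)."""
    def recognize(self, signal, hints=None):
        raise NotImplementedError

class OutputModule:
    """1st-layer output module: contents in (from the 2nd layer), signal out."""
    def render(self, contents):
        raise NotImplementedError

class DummyASR(InputModule):
    """Stand-in for a recognizer such as Julian; just normalizes the 'signal'."""
    def recognize(self, signal, hints=None):
        return {"type": "recognition_result", "text": signal.strip().lower()}

class DummyTTS(OutputModule):
    """Stand-in for a synthesizer such as Galatea talk."""
    def render(self, contents):
        return f"<audio:{contents}>"
```

A 2nd-layer component would talk only to these narrow interfaces, which is what makes the wrapper layer described on the next slide possible.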

SLIDE 25

2nd layer: Modality component

  • Function

– wrapper that absorbs the differences of the 1st layer

ex) speech recognition component: grammar: SRGS, semantic analysis: SISR, result: EMMA

– provide multimodal synchronization

ex) TTS with lip synchronization (LS-TTS)

[Diagram: a 2nd-layer LS-TTS modality component wrapping the 1st-layer TTS and FSM Input/Output modules]
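The LS-TTS idea, a 2nd-layer component that coordinates the TTS and the face module so lips move in time with speech (the a[65], h[60], ... timing shown on slide 10), can be sketched as below. The phoneme durations and class names are fabricated placeholders, not the Galatea protocol.

```python
# Hypothetical phoneme durations, in the style of the a[65] h[60] ... sequence
FAKE_DURATIONS = {"ohayou": [("o", 65), ("h", 60), ("a", 70),
                             ("y", 55), ("o", 65), ("u", 80)]}

class LipSyncTTS:
    """2nd-layer sketch: wraps a TTS and a face module so that the lip
    moving sequence is scheduled from phoneme timing before speech starts."""
    def __init__(self, tts, face):
        self.tts, self.face = tts, face

    def speak(self, text):
        timing = FAKE_DURATIONS.get(text, [])
        self.face.set_lip_sequence(timing)   # (1) schedule lip movement
        return self.tts.start(text)          # (2) then start audio output

class RecordingFace:
    """Stand-in for the FSM face module; records what it was told to do."""
    def __init__(self):
        self.sequence = None
    def set_lip_sequence(self, timing):
        self.sequence = timing

class RecordingTTS:
    """Stand-in for the TTS module."""
    def start(self, text):
        return f"speaking:{text}"

ls = LipSyncTTS(RecordingTTS(), RecordingFace())
```

Because the synchronization lives inside one 2nd-layer component, the Interaction Manager above it never has to juggle the four-step message exchange from slide 10.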

SLIDE 26

3rd layer: Modality Fusion

  • Integration of input information

– Interpretation of sequential / simultaneous input
– Output the integrated result in EMMA format

[Diagram: the 2nd-layer Speech IMC and Touch IMC feed the 3rd-layer Modality Fusion, which outputs EMMA:]

<emma:sequence id="seq1">
  <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="speech">
    <action> move </action>
    <object> this </object>
    <destination> here </destination>
  </emma:interpretation>
  <emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink">
    <x>0.253</x> <y>0.124</y>
  </emma:interpretation>
  <emma:interpretation id="int3" emma:medium="tactile" emma:mode="ink">
    <x>0.866</x> <y>0.724</y>
  </emma:interpretation>
</emma:sequence>
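The "is simple unification enough?" question from slide 12 can be made concrete with a toy fusion routine that binds deictic expressions in the speech interpretation to time-ordered touch events. This is a sketch of one simple strategy under assumed data shapes, not the algorithm the TS would specify.

```python
def fuse(speech, touches):
    """Bind deictic slot values ('this', 'here', 'there') in the speech
    interpretation, in slot order, to touch events sorted by time."""
    deictic = {"this", "here", "there"}
    result = dict(speech)
    points = iter(sorted(touches, key=lambda t: t["time"]))
    for slot, value in speech.items():
        if value in deictic:
            result[slot] = next(points)["point"]
    return result

# "move this to here" plus two touches, as in the EMMA sequence above
speech = {"action": "move", "object": "this", "destination": "here"}
touches = [{"time": 1.0, "point": (0.253, 0.124)},
           {"time": 2.0, "point": (0.866, 0.724)}]
fused = fuse(speech, touches)
```

Even this toy shows why plain unification may not be enough: ordering, timing windows, and conflicting bindings all need policy beyond slot matching.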

SLIDE 27

3rd layer: Modality Fission

  • Rendering output information

– Synchronization of sequential / simultaneous output
– Coordination of output modality based on the access device

[Diagram: Modality Fission drives a Speech OMC, “I recommend ‘sushi dai’.”, and a Graphical OMC showing:]

Name | Price | Feature
Sushi dai | 3800 | good taste
kame | 3650 | good service
iwasa | 3500 | shellfish
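The coordination idea, choosing how to split one output across modalities based on the access device, can be sketched as below. The device profiles and the fallback rule are assumptions for illustration, not the TS format.

```python
def plan_output(recommendation, table, device):
    """Minimal fission sketch: speak the recommendation, and either show
    the table graphically or, with no display, read the top row aloud."""
    plan = []
    if device.get("tts"):
        plan.append(("speech", f'I recommend "{recommendation}".'))
    if device.get("display"):
        plan.append(("graphics", table))
    else:
        name, price, feature = table[0]
        plan.append(("speech", f"{name}: {price} yen, {feature}."))
    return plan

table = [("Sushi dai", 3800, "good taste"),
         ("kame", 3650, "good service"),
         ("iwasa", 3500, "shellfish")]
kiosk = plan_output("sushi dai", table, {"tts": True, "display": True})
phone = plan_output("sushi dai", table, {"tts": True, "display": False})
```

This is the "contents planning" point from slide 13: the same abstract content yields different modality plans per device.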

SLIDE 28

4th layer: Inner task control

  • Image

– a piece of dialogue at the client side

S: Please input member ID.
U: 2024
S: Please select food.
U: Meat
S: Is it OK?
U: Yes.

SLIDE 29

4th layer: Inner task control

  • Required functions

– Error handling

ex) check departure time < arrival time

– Default subdialogue

ex) confirmation, retry, ...

– Form filling algorithm

ex) Form Interpretation Algorithm (FIA)

– Slot update information

ex) processing a negative response to a confirmation request (“No, from Kyoto.”)

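The form-filling loop can be sketched in the spirit of VoiceXML's Form Interpretation Algorithm: repeatedly select the first unfilled field, prompt, and fill it. Field names and prompts below are illustrative, and the error handling is reduced to a re-prompt on empty input.

```python
def run_fia(form, answers):
    """Minimal FIA sketch: returns the sequence of prompts issued."""
    log = []
    while True:
        unfilled = [f for f in form if form[f]["value"] is None]
        if not unfilled:
            return log                    # all slots filled: form complete
        field = unfilled[0]               # select the first unfilled field
        log.append(form[field]["prompt"]) # prompt for it
        value = answers.pop(0)            # collect (simulated) user input
        if value:                         # empty input: re-prompt same field
            form[field]["value"] = value

form = {"member_id": {"prompt": "Please input member ID.", "value": None},
        "food":      {"prompt": "Please select food.",     "value": None}}
log = run_fia(form, ["2024", "Meat"])
```

Run against the slide-28 dialogue, the loop reproduces its prompt order; a real 4th layer would add confirmation subdialogues and the slot-update handling listed above.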

SLIDE 30

4th layer: Inner task control

[Interface diagram]
– From the 5th layer: initialize event, start dialogue (URI or code), data; to the 5th layer: end event (status)
– 4th-layer functions: FIA, input analysis (with error check), update data module, update user model, output contents
– To the 3rd layer (Modality Fusion / Modality Fission): initialize event, start input (with interruption), device information; from the 3rd layer: EMMA, end event (status)


SLIDE 31

5th layer: Task control

  • Image

– describes the overall task flow
– server-side controller

  • Possible markup languages

– SCXML
– Controller definition in the MVC model

  • entry points and their processing

– Script language on a Rails application framework

  • contains application logic (6th layer)
  • easy to prototype and customize


SLIDE 32

5th layer: Task control

[Interface diagram]
– 6th layer: data module, application logic, user model / device model (set/get, call)
– 5th-layer functions: state transition, conditional branch, event handling, subdialogue management
– To the 4th layer: initialize event, start dialogue (URI or code), data; from the 4th layer: end event (status), control

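The 5th-layer functions listed above (state transition, event handling, subdialogue management) amount to a state machine over subdialogues, SCXML-like in spirit. States, events, and the "dialogue://" scheme below are invented for illustration.

```python
class TaskControl:
    """Minimal 5th-layer sketch: a transition table keyed by
    (current state, end event from the 4th layer)."""
    TRANSITIONS = {
        ("login",   "end:success"): "order",
        ("login",   "end:failure"): "login",     # retry the login subdialogue
        ("order",   "end:success"): "confirm",
        ("confirm", "end:success"): "done",
    }

    def __init__(self):
        self.state = "login"

    def start_dialogue(self):
        # 'start dialogue (URI or code)' handed down to the 4th layer
        return f"dialogue://{self.state}"

    def on_end_event(self, status):
        # 'end event (status)' received back from the 4th layer
        key = (self.state, f"end:{status}")
        self.state = self.TRANSITIONS.get(key, self.state)
        return self.state

tc = TaskControl()
```

Writing the table declaratively is exactly what makes SCXML a candidate markup for this layer: the 4th layer runs each subdialogue, and the 5th layer only reacts to end events.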

SLIDE 33

6th layer: Application

  • Image

– Processing module outside of the dialogue system

  • accessed from various layers

  • Modules

– application logic

ex) DB access, Web API access

  • persist, update, delete, search of data

– user model / device model

  • persist the user’s information through sessions
  • manage device information defined in an ontology


SLIDE 34

Too many markup languages?

  • Does each level require a different markup language?

– No.
– The simple functionality of the 5th and 4th layers can be provided by a data-model approach (ex) Ruby on Rails)
– The default functions of the 3rd layer can be realized by a simple principle (ex) unification in modality fusion)
– 2nd-layer functions are task/domain independent

“Convention over Configuration”

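The data-model approach can be made concrete with a Rails-flavoured sketch: a default form-filling dialogue is derived from the data-model declaration alone, with no per-task dialogue markup. The field types and the prompt conventions are assumptions for illustration.

```python
def default_dialogue(model):
    """'Convention over Configuration' sketch: derive default prompts
    from field names and types, plus a conventional final confirmation."""
    prompts = []
    for field, ftype in model.items():
        label = field.replace("_", " ")
        if ftype == "choice":
            prompts.append(f"Please select {label}.")
        else:
            prompts.append(f"Please input {label}.")
    prompts.append("Is it OK?")       # conventional confirmation subdialogue
    return prompts

# Declaring the data model is the only task-specific artifact
order_model = {"member_id": "number", "food": "choice"}
dialogue = default_dialogue(order_model)
```

A developer then only writes configuration for the cases where the convention is wrong, which is the point of applying the principle at every layer.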

SLIDE 35

Summary

  • Problems of W3C MMI Architecture

– Modality Component
– Modality fusion and fission functionality
– User model

  • Our Proposal

– Hierarchical MMI architecture
– “Convention over Configuration” in various layers
