
Proposal of a Hierarchical Architecture for Multimodal Interactive Systems



  1. Proposal of a Hierarchical Architecture for Multimodal Interactive Systems. Masahiro Araki*1, Tsuneo Nitta*2, Kouichi Katsurada*2, Takuya Nishimoto*3, Tetsuo Amakasu*4, Shinnichi Kawamoto*5. *1 Kyoto Institute of Technology, *2 Toyohashi University of Technology, *3 The University of Tokyo, *4 NTT Cyber Space Labs., *5 ATR. 2007/11/16 W3C MMI ws

  2. Outline
     • Background
       – Introduction of speech IF committee under ITSCJ
       – Introduction to Galatea toolkit
     • Problems of W3C MMI Architecture
       – Modality Component is too large
       – Fragile modality fusion and fission functionality
       – How to deal with user model?
     • Our Proposal
       – Hierarchical MMI architecture
       – "Convention over Configuration" in various layers

  3. Background(1)
     • What is ITSCJ?
       – Information Technology Standards Commission of Japan
         • under IPSJ (Information Processing Society of Japan)
     • Speech Interface Committee under ITSCJ
       – Mission: publish TS (Trial Standard) documents concerning multimodal dialogue systems

  4. Background(2)
     • Theme of the committee
       – Architecture of MMI system
       – Requirements of each component
     • Future directions
       – Guideline for implementing practical MMI systems
       – Specify markup language

  5. Our Aim
     1. Propose an MMI architecture which can be used for advanced MMI research
        (W3C: from the practical point of view, i.e. mobile, accessibility)
     2. Examine the validity of the architecture through system implementation (Galatea Toolkit)
     3. Develop a framework and release it as open source towards a de facto standard

  6. Galatea Toolkit(1)
     • Platform for developing MMI systems
     • Speech recognition
     • Speech synthesis
     • Face image synthesis

  7. Galatea Toolkit(2)
     [Diagram: Dialogue Manager (Galatea DM) connected to ASR (Julian), TTS (Galatea talk), and Face (FSM)]

  8. Galatea Toolkit(3)
     [Diagram: Dialogue Manager (Phoenix) on top of the Agent Manager Macro Control Layer (AM-MCL) and the Agent Manager Direct Control Layer (AM-DCL), which controls ASR (Julian), TTS (Galatea talk), and Face (FSM)]

  9. Problems of W3C MMI(1)
     • The "size" of a Modality Component does not suit life-like agent control
     [Diagram: Runtime Framework (Interaction Manager, Data Component, Delivery Context Component) above the Modality Component API, with a Speech Modality (ASR, TTS) and a Face Image Modality (FSM)]

  10. Problems of W3C MMI(1)
     • Lip synchronization with speech output
     [Diagram: the Interaction Manager sets Text="ohayou" on the Speech Modality (TTS), obtains the phoneme timing sequence (o[65], h[60], a[65], ...), sets it as a lip-moving sequence on the Face Image Modality (FSM), and then starts both]
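The coordination on this slide can be sketched as an interaction-manager routine that queries TTS for phoneme timings and forwards them to the face module before starting playback. All class and method names below are hypothetical; this is a toy illustration of the event sequence, not the Galatea API, and it shows why the coordination is awkward when TTS and face sit behind one opaque Modality Component.

```python
# Hypothetical sketch of the slide's lip-sync sequence. The interaction
# manager must talk to TTS and the face synthesizer directly.

class TTS:
    def set_text(self, text):
        self.text = text

    def phoneme_timings(self):
        # Stand-in for the slide's "o[65] h[60] a[65] ..." duration list (ms).
        return [("o", 65), ("h", 60), ("a", 65)]

    def start(self):
        print(f"TTS: speaking {self.text!r}")

class FaceSynthesizer:
    def set_lip_sequence(self, timings):
        self.timings = timings

    def start(self):
        print(f"Face: lip movement for {len(self.timings)} phonemes")

def speak_with_lip_sync(tts, face, text):
    tts.set_text(text)                # 1. set the utterance
    timings = tts.phoneme_timings()   # 2. get phoneme durations
    face.set_lip_sequence(timings)    # 3. forward them to the face module
    tts.start()                       # 4. start both in lockstep
    face.start()

speak_with_lip_sync(TTS(), FaceSynthesizer(), "ohayou")
```

Every such cross-modality dependency ends up in the interaction manager, which is the "size" problem the previous slide raises.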

  11. Problems of W3C MMI(1)
     • Back channeling mechanism
     [Diagram: (1) set Text="hai" on the Speech Modality (TTS); (2) nod with a short pause on the Face Image Modality (FSM); then start]

  12. Problems of W3C MMI(2)
     • Fragile modality fusion and fission functionality
     [Diagram: the Speech Modality (ASR) yields "from here to there" while the Tactile Modality (touch sensor) yields point (120,139) and point (200,300)]
       – How to define a multimodal grammar?
       – Is simple unification enough?
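The slide's example ("from here to there" plus two touch points) can be made concrete with a toy unifier that binds deictic words in the speech result to time-ordered pointing events. The function and data shapes are hypothetical illustrations, not part of the W3C MMI or ITSCJ specifications; whether such simple unification is enough is exactly the slide's question.

```python
# Toy multimodal fusion: bind deictic tokens in a speech hypothesis to
# pointing events in order of occurrence. Illustrative only.

DEICTICS = {"here", "there", "this", "that"}

def fuse(speech_tokens, touch_points):
    """Unify deictic tokens with touch coordinates, first-come first-served."""
    points = iter(touch_points)
    bindings = {}
    for i, token in enumerate(speech_tokens):
        if token in DEICTICS:
            try:
                bindings[(i, token)] = next(points)
            except StopIteration:
                bindings[(i, token)] = None  # unbound deictic: fusion failure
    return bindings

result = fuse(["from", "here", "to", "there"], [(120, 139), (200, 300)])
# {(1, 'here'): (120, 139), (3, 'there'): (200, 300)}
```

Real fusion needs timestamps, confidence scores, and a declared multimodal grammar rather than positional matching, which is why the slide calls the W3C mechanism fragile.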

  13. Problems of W3C MMI(2)
     • Fragile modality fusion and fission functionality
     [Diagram: the Speech Modality (TTS) says "this is the route map" while the Graphic Modality (SVG Viewer) shows it]
       – Contents planning is suitable for adapting to various devices

  14. Problems of W3C MMI(3)
     • How to deal with the user model?
     [Diagram: ASR fails many times; where is the user model information stored?]

  15. Solution
     • Back to a multimodal framework
       – smaller modality components
     • Separate state transition descriptions
       – task flow
       – interaction flow
       – modality fusion/fission
     → hierarchical architecture

  16. Investigation procedure
     Phase 1: use case analysis → requirements for overall systems → working draft for MMI architecture

  17. Use case analysis
     Name                                  | Input modality        | Output modality
     a. on-line shopping                   | mouse, speech         | display, speech, animated agent
     b. voice search                       | mouse, speech         | display, speech
     c. site search                        | mouse, speech, key    | display, speech
     d. interaction with robot             | speech, image, sensor | speech, display
     e. negotiation with interactive agent | speech                | speech, face image
     f. kiosk terminal                     | touch, speech         | speech, display

  18. Example of use case: interaction with robot
     "What is Kasuri?"
     "Nishijin Kasuri is a traditional textile in Kyoto."

  19. Requirements
     1. general
     2. input modality (in common with W3C)
     3. output modality (in common with W3C)
     4. architecture, integration and synchronization point
     5. runtimes and deployments
     6. dialogue management (extension)
     7. handling of forms and fields
     8. connection with outside application
     9. user model and environment information
     10. from the viewpoint of developer

  20. [Diagram: hierarchical architecture]
     layer 6: data model (user model, application logic, device model); set/get, event/control
     layer 5: task control; control, event/result, command
     layer 4: interaction control; control, integrated result/event, command
     layer 3: modality integration (control/understanding); interpreted result, command, event
     layer 2: modality component (control/interpret); command, results, event
     layer 1: I/O device; ASR, pen/touch, TTS, graphical and audio output
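The layering above can be sketched as a stack in which each layer talks only to its immediate neighbours: commands flow down one layer at a time, results and events bubble back up. The class below is a hypothetical illustration of that discipline, not the trial standard's API.

```python
# Hypothetical sketch of the 6-layer stack: commands go down, results
# bubble back up. Layer names follow the slide; everything else is
# illustrative.

class Layer:
    def __init__(self, name, below=None):
        self.name = name
        self.below = below
        self.trace = []

    def command(self, cmd):
        """Pass a command down the stack; return the result bubbled up."""
        self.trace.append(("cmd", cmd))
        if self.below is None:            # layer 1: the actual I/O device
            result = f"done:{cmd}"
        else:
            result = self.below.command(cmd)
        self.trace.append(("result", result))
        return result

# Build the stack from layer 1 (I/O device) up to layer 6 (data model).
names = ["I/O device", "modality component", "modality integration",
         "interaction control", "task control", "data model"]
stack = None
for name in names:
    stack = Layer(name, below=stack)

print(stack.command("speak"))   # routed through all six layers
```

Because each layer sees only its neighbours, a layer can be swapped (e.g. a different fusion strategy at layer 3) without touching task flow at layer 5, which is the point of separating the state transition descriptions.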

  21. Investigation procedure
     Phase 2: detailed analysis of use cases → requirements for each layer → publish trial standard, release reference implementation

  22. Detailed use case analysis

  23. Requirements of each layer
     • Clarify input/output with adjacent layers
     • Define events
     • Clarify inner-layer processing
     • Investigate markup language

  24. 1st layer: Input/Output module
     • Function
       – Uni-modal recognition/synthesis module
     • Input module
       – Input: (from outside) signal; (from 2nd layer) information used for recognition
       – Output: (to 2nd layer) recognition result
       – Example: ASR, touch input, face detection, ...
     • Output module
       – Input: (from 2nd layer) output contents
       – Output: (to outside) signal
       – Example: TTS, face image synthesizer, Web browser, ...
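The input/output contracts listed on this slide can be written down as a minimal interface: an input module consumes an outside signal plus recognition resources from the 2nd layer and returns a result; an output module consumes contents and emits a signal. The Python abstract classes below are a hedged sketch of those contracts, with hypothetical names, not a defined API.

```python
from abc import ABC, abstractmethod

# Hypothetical interface sketch of the slide's 1st-layer contracts.

class InputModule(ABC):
    @abstractmethod
    def recognize(self, signal, resources):
        """signal: from outside; resources: from the 2nd layer
        (e.g. a grammar). Returns a recognition result for the 2nd layer."""

class OutputModule(ABC):
    @abstractmethod
    def render(self, contents):
        """contents: from the 2nd layer. Produces an outside signal."""

class EchoASR(InputModule):
    # Toy ASR stand-in: "recognizes" whatever the resource set allows.
    def recognize(self, signal, resources):
        return signal if signal in resources else None

asr = EchoASR()
print(asr.recognize("hello", {"hello", "bye"}))
```

Keeping layer 1 this thin is what lets the 2nd layer absorb engine-specific differences, as the next slide describes.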

  25. 2nd layer: Modality component
     • Function
       – Wrapper that absorbs the differences of the 1st layer
         ex) speech recognition component: grammar: SRGS; semantic analysis: SISR; result: EMMA
       – Provide multimodal synchronization
         ex) TTS with lip synchronization: a 2nd-layer LS-TTS modality component wraps the 1st-layer TTS and FSM Input/Output modules
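The LS-TTS example, a 2nd-layer component wrapping a 1st-layer TTS and face (FSM) pair, can be sketched as a thin wrapper that hides the lip-sync coordination behind a single call. The names here are hypothetical, and the toy omits the standard formats (SRGS grammars, SISR semantics, EMMA results) the real components would speak.

```python
# Hypothetical sketch of a 2nd-layer "LS-TTS" modality component wrapping
# 1st-layer TTS and face modules, hiding lip-sync coordination from the
# layers above. Illustrative only.

class ToyTTS:
    def synthesize(self, text):
        # Return (audio, phoneme timings); both values are stand-ins.
        return f"<audio:{text}>", [(ch, 60) for ch in text]

class ToyFace:
    def animate(self, timings):
        return f"<lips:{len(timings)} frames>"

class LipSyncTTS:
    """2nd-layer component: one call yields synchronized speech and lips."""
    def __init__(self, tts, face):
        self.tts, self.face = tts, face

    def speak(self, text):
        audio, timings = self.tts.synthesize(text)
        lips = self.face.animate(timings)
        return audio, lips

component = LipSyncTTS(ToyTTS(), ToyFace())
print(component.speak("hai"))
```

Contrast this with the slide-10 problem: there the interaction manager had to sequence TTS and face itself; here layer 3 simply asks the LS-TTS component to speak.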
