Part 3: Audio-Visual Child-Robot Interaction, by Petros Maragos (PowerPoint presentation)




Slide 1

Computer Vision, Speech Communication & Signal Processing Group, Intelligent Robotics and Automation Laboratory, National Technical University of Athens (NTUA), Greece

Robot Perception and Interaction Unit, Athena Research and Innovation Center (Athena RIC)

Petros Maragos

Tutorial at INTERSPEECH 2018, Hyderabad, India, 2 Sep. 2018

slides: http://cvsp.cs.ntua.gr/interspeech2018

Part 3: Audio-Visual Child-Robot Interaction

Slide 2


EU project BabyRobot: Experimental Setup Room

Slide 3


Interspeech 2018 Tutorial: Multimodal Speech & Audio Processing in Audio-Visual Human-Robot Interaction

Video: experiments with TD (typically developing) children

Slide 4


Perception System

Sense – Think – Act loop: the audio and visual streams feed the perception modules, whose outputs drive robot behavior generation (IrisTK) through the IrisBroker hub, with Wizard-of-Oz supervision.

Action branch (child's activity): Visual Gesture Recognition, Distant Speech Recognition, AV Localization & Tracking, Action Recognition, 3D Object Tracking.

Behavioral branch (child's behavioral state): Visual Emotion Recognition, Speech Emotion Recognition, Text Emotion Recognition, Behavioral Monitoring.

Slide 5

Experimental Setup: Hardware & Software

Slide 6


Action Branch: Developed Technologies

• 3D Object Tracking
• Multi-view Gesture Recognition
• Multi-view Action Recognition
• Speaker Localization and Distant Speech Recognition

Slide 7


Audio-Visual Localization Evaluation

• Track multiple persons using the Kinect skeleton.
• Select the person closest to the estimated auditory source position.
• Rcor: percentage of correct estimates (deviation from ground truth less than 0.5 m).

Results:
• Audio-only source localization: 45.5%
• Audio-visual localization: 85.6%
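The selection rule and the Rcor metric can be sketched in a few lines of Python (a minimal illustration under assumed 3D-point inputs in metres; function names are not from the actual Multi3 code):

```python
import math

def av_localize(audio_est, person_positions):
    """Audio-visual fusion: among Kinect-tracked person positions,
    pick the one closest to the audio source location estimate."""
    return min(person_positions, key=lambda p: math.dist(p, audio_est))

def rcor(estimates, ground_truth, thresh=0.5):
    """Rcor: percentage of estimates within `thresh` metres of ground truth."""
    hits = sum(1 for e, g in zip(estimates, ground_truth)
               if math.dist(e, g) < thresh)
    return 100.0 * hits / len(estimates)
```

Visual tracking thus acts as a spatial prior that corrects noisy audio-only estimates, which matches the jump from 45.5% to 85.6% Rcor reported on the slide.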

Slide 8


Multi-view Gesture Recognition

• Multiple views of the child's gesture from different sensors
• Fusion of the three sensors' decisions
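Decision-level fusion across the three sensors can be sketched as a majority vote over per-sensor labels, with a score-sum variant when per-class classifier scores are available (a hedged illustration, not the system's actual fusion code):

```python
from collections import Counter

def fuse_votes(votes):
    """Majority vote over per-sensor labels; ties go to the earliest sensor."""
    return Counter(votes).most_common(1)[0][0]

def fuse_scores(sensor_scores):
    """Sum per-class scores (e.g. classifier posteriors) across sensors
    and return the top-scoring class."""
    total = Counter()
    for scores in sensor_scores:
        total.update(scores)
    return max(total, key=total.get)
```

Score-sum fusion is usually preferable when the per-view classifiers output calibrated confidences, since it lets a very confident view outvote two uncertain ones.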

Slide 9


Gesture Recognition – Vocabulary

Nod, Greet, Come Closer, Sit, Stop, Point, Circle

Slide 10


Multi-view Gesture Recognition - Evaluation

• 7 classes: nod, greet, come closer, sit, stop, point, circle
• Average classification accuracy (%) for the employed gestures performed by 28 children (development corpus)
• Results for the five different features in both single- and multi-stream cases

Slide 11


Multi-view Gesture Recognition - Children vs. Adults

• Different training schemes: adult models, children models, mixed model
• Employed features: MBH

• A. Tsiami, P. Koutras, N. Efthymiou, P. Filntisis, G. Potamianos, P. Maragos, "Multi3: Multi-sensory Perception System for Multi-modal Child Interaction with Multiple Robots", Proc. ICRA, 2018.

Slide 12


Distant Speech Recognition System


• DSR model training and adaptation per Kinect (Greek models)

Collected Data

• "I think that you are hammering a nail"
• "I think that you are painting"
• "I think that it is the rabbit"
• "It relates to peace"

Slide 13


Spoken Command Recognition Evaluation


• TD (typically-developing) children data: 40 phrases
• Average word accuracy (WCOR) and sentence accuracy (SCOR) for the DSR task, per utterance set, for all adaptation choices
• 4-fold cross-validation
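Word and sentence accuracy can be computed roughly as follows (a sketch: word alignment here uses Python's difflib rather than the edit-distance scoring a standard ASR toolkit would apply, so WCOR ignores insertions):

```python
from difflib import SequenceMatcher

def wcor(refs, hyps):
    """Word correct rate: % of reference words also present, in order,
    in the hypothesis (substitutions/deletions count as errors)."""
    correct = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        m = SequenceMatcher(a=r, b=h, autojunk=False)
        correct += sum(block.size for block in m.get_matching_blocks())
        total += len(r)
    return 100.0 * correct / total

def scor(refs, hyps):
    """Sentence correct rate: % of utterances recognized exactly."""
    hits = sum(r.split() == h.split() for r, h in zip(refs, hyps))
    return 100.0 * hits / len(refs)
```

For 4-fold cross-validation as on the slide, these scores would be averaged over the four held-out child subsets.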
Slide 14


Spoken Command Recognition – Children vs Adults

• Different training schemes: adult models, children models, mixed model

Slide 15


Slide 16


Action Recognition – Vocabulary

Cleaning a window, Ironing a shirt, Digging a hole, Driving a bus, Painting a wall, Hammering a nail, Wiping the floor, Reading, Swimming, Working Out, Playing the guitar, Dancing

Slide 17


Multi-view Action Recognition – Evaluation

• 13 classes of pantomime actions
• Average classification accuracy (%) for the actions performed by 28 children (development corpus)
• Results for the five different features in both single- and multi-stream cases

• N. Efthymiou, P. Koutras, P. Filntisis, G. Potamianos, P. Maragos, "Multi-view Fusion for Action Recognition in Child-Robot Interaction", Proc. ICIP, 2018.

Slide 18


Multi-view Action Recognition – Children vs Adults

• Different training schemes: adult models, children models, mixed model
• Employed features: MBH

Slide 19


Child-Robot Interaction: TD video – Rock-Paper-Scissors

• A. Tsiami, P. Filntisis, N. Efthymiou, P. Koutras, G. Potamianos, P. Maragos, "Multi3: Multi-sensory Perception System for Multi-modal Child Interaction with Multiple Robots", Proc. ICRA, 2018.

Slide 20


Part 3: Conclusions

Synopsis:

• Data collection and annotation: 28 TD and 15 ASD children (+ 20 adults)
• Audio-visual localization and tracking
• 3D object tracking
• Multi-view gesture and action recognition
• Distant speech recognition
• Multimodal emotion recognition

Ongoing work:

• Evaluate the whole perception system with TD and ASD children
• Extend and develop methods for engagement and behavioral understanding

Tutorial slides: http://cvsp.cs.ntua.gr/interspeech2018
For more information, demos, and current results: http://cvsp.cs.ntua.gr and http://robotics.ntua.gr