Robot audition and its deployment



SLIDE 1

Honda Research Institute JP

Robot audition and its deployment

Kazuhiro Nakadai

Principal Researcher, Honda Research Institute Japan Co., Ltd.
Visiting Professor, Tokyo Institute of Technology
Visiting Professor, Waseda University

2nd Workshop on Alternative Sensing for Robot Perception: Beyond Laser and Vision

SLIDE 2

Outline

  • 1. Background of Robot Audition
  • 2. Introduction to Robot Audition Research
  • 3. Open Source Software for Robot Audition
  • 4. Deployment of Robot Audition
  • 5. Summary

SLIDE 3

Background

Robots as our partners → necessity of auditory processing → robot audition

Humanoid robot

  • Expected to interact with humans as a partner, e.g., in welfare, for companionship, in housekeeping, and as a news provider

Service, Interaction, Information, Entertainment…

SLIDE 4

Robot Audition When a robot listens to sound with its ears, ….

It should deal with the mixture of sounds.

Ego-noise Motors, self-voice

SLIDE 5

Robot Audition

  • Proposed by Prof. Okuno (Kyoto Univ. → Waseda Univ.) and Nakadai at AAAI-2000
    – http://winne.kuis.kyoto-u.ac.jp/SIG/
  • A research field bridging robotics, AI, and signal processing
  • Continuously expanding
    – Japan: Kyoto Univ., Honda RI, Tokyo Tech., ATR, AIST, Kumamoto Univ., Waseda Univ., etc.
    – Europe: CNRS-LAAS (France), INRIA (France), Univ. of Erlangen-Nuremberg (Germany), Ruhr-Universität Bochum (Germany), ITU (Turkey), Imperial College London (UK), etc.
    – North America: Sherbrooke Univ. (Canada), MERL (USA), Virginia Tech. (USA), Willow Garage (USA), etc.
    – Oceania: UTS (Australia)

SLIDE 6

Our Activities for Robot Audition

  • Organized sessions at the IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems (IROS 2005-2013)
    – Since 2014, robot audition has been registered as an official keyword in IEEE-RAS.
  • Special sessions at the IEEE Int’l Conf. on Acoustics, Speech and Signal Processing: ICASSP 2009 @ Taipei, Taiwan; ICASSP 2015 @ Brisbane, Australia
  • HARK tutorials (OSS): France: 2009, 2012, 2013; Korea: 2008; Japan: once a year since 2008
  • Migration to Texai at Willow Garage, 2010 @ Palo Alto, USA
  • International Workshop on Music Robot, 2010 @ Taipei, Taiwan

SLIDE 7

Outline

  • 1. Background of Robot Audition
  • 2. Introduction to Robot Audition Research
  • 3. Open Source Software for Robot Audition
  • 4. Deployment of Robot Audition
  • 5. Summary

SLIDE 8

A robot is surrounded by various noises:

  • Target speech
  • Directional noise
  • Ego-noise such as motion noise and the robot's own voice (near field, loud)
  • Reverberation (echo)
  • Diffuse noise (background noise, BGN; omni-directional)

Different characteristics → a one-by-one approach:

  • Sound Source Separation mainly for directional noise
  • Dereverberation
  • Ego-noise suppression

SLIDE 9

Sound Source Separation

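Sources are mixed (+) at the microphones into the input x(ω); the separation matrix W(ω) maps it to the output y(ω).

Separation process:

$$\mathbf{y}(\omega) = \mathbf{W}(\omega)\,\mathbf{x}(\omega)$$

Incremental SSS: update the separation matrix W to reduce the mixing cost J(W), where μ is the step-size parameter:

$$\mathbf{W}_{t+1} = \mathbf{W}_t - \mu\, J'(\mathbf{W}_t)$$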
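To make the incremental update concrete, here is a minimal Python sketch under stated assumptions, not HARK's GHDSS implementation. It keeps only a geometric-constraint cost J(W) = ‖W D − I‖²_F, where D is a hypothetical matrix of known transfer functions from each source to each microphone (GHDSS also combines a higher-order decorrelation term, omitted here), and it takes J′(W) as the complex gradient with respect to W*, i.e. (W D − I) D^H.

```python
# Minimal sketch of the incremental update W_{t+1} = W_t - mu * J'(W_t).
# ASSUMPTION: J(W) = ||W D - I||_F^2 (geometric-constraint cost only);
# D is a hypothetical M x N matrix of transfer functions (mics x sources).
# Matrices are plain nested lists of complex numbers.

def matmul(a, b):
    """Matrix product of nested lists a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conj_transpose(a):
    """Hermitian transpose a^H."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def residual(W, D):
    """E = W D - I (N x N): how far W D is from the identity."""
    E = matmul(W, D)
    return [[E[i][j] - (1 if i == j else 0) for j in range(len(E))] for i in range(len(E))]


def cost(W, D):
    """J(W): squared Frobenius norm of W D - I."""
    return sum(abs(e) ** 2 for row in residual(W, D) for e in row)


def gradient(W, D):
    """J'(W) taken as dJ/dW* = (W D - I) D^H, an N x M matrix."""
    return matmul(residual(W, D), conj_transpose(D))


def update(W, D, mu):
    """One incremental step W <- W - mu * J'(W)."""
    G = gradient(W, D)
    return [[W[i][j] - mu * G[i][j] for j in range(len(W[0]))] for i in range(len(W))]


# Usage: 3 mics, 2 sources, hypothetical transfer functions.
D = [[1 + 0j, 0.5j],
     [0.5 + 0j, 1 + 0j],
     [0.3 + 0j, 0.2 + 0.1j]]
W = [[0j] * 3 for _ in range(2)]  # N x M separation matrix, starting from zero
costs = []
for _ in range(200):
    costs.append(cost(W, D))
    W = update(W, D, mu=0.3)
print(f"J(W) after 200 updates: {cost(W, D):.2e}")
```

With a fixed μ the cost only shrinks if μ is small enough for the given D; a larger transfer-function gain would make the same μ diverge, which is the motivation for the adaptive step size on the next slide.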

SLIDE 10

Sound Source Separation with Adaptive Step-size Control

  • Fixed step-size: difficult to adapt to environmental changes such as robot motions and moving sources
  • Adaptive Step-size (AS) based on Newton’s method ⇒ GHDSS-AS [IEEE-TSLP Nakajima 10]

[Figure: separation depth (level [dB]) vs. time (number of updates / frames) on recorded sound, comparing GSS with a fixed μ (manually tuned, small value, large value) against an adaptively controlled μ]
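The benefit of adapting μ can be illustrated with a toy single-source version of the cost. The sketch below is an assumption-laden illustration, not the GHDSS-AS rule from [IEEE-TSLP Nakajima 10]: it picks μ with a Newton-style step that drives the first-order prediction of J to zero, and compares it with a fixed μ on a hypothetical transfer function d and a copy of it with 10× larger gain.

```python
# Sketch: fixed vs. adaptive step size for w <- w - mu * J'(w).
# ASSUMPTIONS: one source, cost J(w) = |w d - 1|^2 (geometric constraint only),
# and a Newton-style step size; d is a hypothetical transfer-function vector.

def cost(w, d):
    e = sum(wi * di for wi, di in zip(w, d)) - 1
    return abs(e) ** 2


def gradient(w, d):
    """dJ/dw* = (w d - 1) d^H."""
    e = sum(wi * di for wi, di in zip(w, d)) - 1
    return [e * di.conjugate() for di in d]


def adaptive_mu(w, d):
    """Pick mu so the first-order model J(w - mu G) ~ J(w) - 2 mu ||G||^2 reaches 0."""
    g2 = sum(abs(gi) ** 2 for gi in gradient(w, d))
    return cost(w, d) / (2 * g2) if g2 > 0 else 0.0


def run(d, steps, mu=None):
    """Gradient descent from w = 0; mu=None selects the adaptive step size."""
    w = [0j] * len(d)
    for _ in range(steps):
        step = adaptive_mu(w, d) if mu is None else mu
        w = [wi - step * gi for wi, gi in zip(w, gradient(w, d))]
    return cost(w, d)


quiet = [0.5 + 0j, 0.5j, 0.3 + 0j]  # hypothetical weak transfer function
loud = [5 + 0j, 5j, 3 + 0j]         # same geometry, 10x larger gain

for name, d in (("quiet", quiet), ("loud", loud)):
    print(f"{name}: fixed mu=0.5 -> J={run(d, 30, mu=0.5):.3g}, adaptive -> J={run(d, 30):.3g}")
```

In this toy case the fixed μ = 0.5 converges for `quiet` but diverges for `loud`, while the adaptive step halves the residual |w·d − 1| at every update in both cases, because it rescales itself by 1/‖d‖².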

SLIDE 11

Experiment with Texai [IEEE ICRA 2011]

Reverberant conference room (RT > 1 s), about 20 m × 10 m.

Video: http://www.youtube.com/watch?v=xpjPun7Owxg

[Figure: localization results for recorded sound, direction (degrees) vs. time (frames), with tracks for Talker1 to Talker4 and a garbage track]

SLIDE 12

Ego-noise suppression

The robot's own voice & motion noise are:

  • closer to the mics
  • higher in power

Example: Interactive Dancing Robot

Key idea: the robot knows what it utters and what kind of motions it makes.

Observation model with a known signal (the robot's utterance):

$$Y(\omega,f) = N(\omega,f) + \sum_{m=0}^{M} H(\omega,m)\,S(\omega,f-m)$$

where Y is the observation, S is the known signal (utterance), H(ω,m) is its transfer function over the current and M past frames, and N is the unknown noise signal.
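Because S is known, the known-signal term can be estimated and removed. The sketch below uses a normalized LMS (NLMS) adaptive filter in a single frequency bin. This is a simpler, swapped-in technique, not the semi-blind ICA named on the slide, and H_TRUE, the frame count, and the signals are hypothetical simulation values.

```python
import random

# Stand-in sketch: NLMS cancellation of a KNOWN signal in one frequency bin.
# Model: Y(f) = N(f) + sum_m H(m) S(f - m).
# NOTE: NLMS replaces the semi-blind ICA of the slide; all data are hypothetical.

def nlms_cancel(y, s, taps, mu=0.2, eps=1e-8):
    """Return (residual, h): residual estimates the unknown part N(f)."""
    h = [0j] * taps
    residual = []
    for f in range(len(y)):
        # known-signal history S(f), S(f-1), ..., S(f-taps+1)
        x = [s[f - m] if f - m >= 0 else 0j for m in range(taps)]
        e = y[f] - sum(hm * xm for hm, xm in zip(h, x))
        norm = sum(abs(xm) ** 2 for xm in x) + eps
        h = [hm + mu * e * xm.conjugate() / norm for hm, xm in zip(h, x)]
        residual.append(e)
    return residual, h


rng = random.Random(1)
FRAMES = 500
H_TRUE = [0.8 + 0.2j, -0.3j, 0.1 + 0j]  # hypothetical multi-frame transfer function
S = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(FRAMES)]        # robot's utterance
N = [0.1 * complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(FRAMES)]  # user speech + noise
Y = [N[f] + sum(H_TRUE[m] * S[f - m] for m in range(3) if f - m >= 0) for f in range(FRAMES)]

res, h_est = nlms_cancel(Y, S, taps=3)
print("estimated H:", [complex(round(v.real, 2), round(v.imag, 2)) for v in h_est])
```

After convergence the residual approximates N, i.e., the user's speech with the robot's own voice removed, which is what makes barge-in possible.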

  • Semi-blind ICA ⇒ barge-in-able robot
  • Template-based ego-motion noise suppression (posture noise)

[Neural Computation ‘12, IEEE IROS ‘09-’12]
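For template-based ego-motion noise suppression, here is a minimal sketch under explicit assumptions: templates pair a hypothetical joint-state vector with a noise power spectrum recorded offline, the closest template is chosen by Euclidean distance, and its spectrum is removed by power spectral subtraction with a floor. The cited papers may use different features, matching, and subtraction rules.

```python
import math

# Sketch of template-based ego-motion noise suppression.
# ASSUMPTIONS: templates pair a joint-state vector (e.g. angles/velocities) with a
# noise power spectrum recorded offline; matching is nearest-neighbour and removal
# is power spectral subtraction with a floor. All values below are hypothetical.

def nearest_template(joints, templates):
    """Return the noise spectrum whose recorded joint state is closest to `joints`."""
    return min(templates, key=lambda t: math.dist(joints, t[0]))[1]


def suppress(spectrum, joints, templates, floor=0.05):
    """Subtract the predicted ego-noise power, never going below floor * input."""
    noise = nearest_template(joints, templates)
    return [max(s - n, floor * s) for s, n in zip(spectrum, noise)]


templates = [
    ((0.0, 0.0), [1.0, 0.1, 0.1]),  # arm at rest
    ((1.0, 0.5), [0.2, 2.0, 0.2]),  # arm raised and moving
]
observed = [1.2, 2.5, 0.4]          # power spectrum while the joints are near (0.9, 0.6)
print(suppress(observed, (0.9, 0.6), templates))
```

The floor keeps over-subtraction from producing negative power when the template overestimates the noise.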

SLIDE 13

Missing-Feature-Theory-based Integration [ASRU 07]

Noisy/simultaneous speech → noise suppression → automatic speech recognition (ASR) → text

  • Mismatch between the two blocks: noise suppression outputs distorted speech, whereas ASR assumes clean speech or speech with known noise.
  • Missing Feature Theory (MFT) provides better integration.

SLIDE 14

Missing Feature Theory (MFT)

  • Normal ASR: all features x(i) of the corrupted sound at time t are matched against the acoustic model stored in the ASR, so features corrupted by separation cause a large error.
  • MFT-based ASR: a missing feature mask (MFM) marks the features that are missing (unreliable) due to separation, and only the reliable features are used for matching, which keeps the error small.

One of the most important issues is automatic MFM generation.
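A common way to use an MFM inside a recognizer is marginalization: unreliable features are simply left out of the likelihood. The sketch below assumes each acoustic model is a single diagonal Gaussian (a real ASR uses HMMs with Gaussian mixtures), and the two models "A" and "B" and the feature values are hypothetical.

```python
import math

# Sketch of MFT-based scoring by marginalization.
# ASSUMPTIONS: each acoustic model is a single diagonal Gaussian; the models,
# feature values and mask below are hypothetical.

def masked_log_likelihood(x, mask, mean, var):
    """Log-likelihood over reliable features only (mask 1); unreliable ones
    are marginalized out, i.e. they contribute nothing."""
    ll = 0.0
    for xi, mi, mu, v in zip(x, mask, mean, var):
        if mi:
            ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return ll


def recognize(x, mask, models):
    """Return the name of the model with the highest masked log-likelihood."""
    return max(models, key=lambda name: masked_log_likelihood(x, mask, *models[name]))


models = {"A": ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),   # (mean, variance)
          "B": ([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])}
x = [0.1, -0.1, 5.0]  # true class A, but feature 3 is corrupted by leakage
print("normal ASR:", recognize(x, [1, 1, 1], models))  # all features trusted
print("MFT ASR:   ", recognize(x, [1, 1, 0], models))  # feature 3 masked
```

With all features trusted, the corrupted third feature makes model B win; masking it lets the two reliable features decide in favour of A.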

SLIDE 15

An example of an automatically generated MFM

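[Figure: spectrograms and automatically generated MFMs (1 = reliable, 0 = unreliable) for speech captured from three simultaneous talkers on the left, center, and right, uttering "Arayuru genjitsu wo …" ("All realities …"), "Isshukan bakari …" ("For about a week …"), and "Terebi gemu ya pasokon de …" ("With video games and PCs …"). Speech regions pass the mask, while regions with leakage from other talkers are masked.]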
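Mask generation can be sketched as a per-bin reliability test. This is a hypothetical rule for illustration, not HARK's MFM generator: leakage into each separated output is approximated as a fraction `eta` of the other outputs' power in the same bin, and a bin is marked reliable (1) when its own power exceeds `threshold` times that leakage.

```python
# Sketch of automatic MFM generation from separated outputs.
# ASSUMPTIONS: leakage into output i is estimated as eta times the power of the
# other outputs in the same bin; a bin is reliable (1) when its own power exceeds
# threshold * leakage. eta, threshold and the data are hypothetical.

def generate_mfm(outputs, eta=0.25, threshold=2.0):
    """outputs: per-source lists of power values, one per time-frequency bin."""
    masks = []
    for i, own in enumerate(outputs):
        mask = []
        for b, power in enumerate(own):
            leak = eta * sum(out[b] for j, out in enumerate(outputs) if j != i)
            mask.append(1 if power > threshold * leak else 0)
        masks.append(mask)
    return masks


separated = [[4.0, 0.2, 3.0],   # source 1
             [0.5, 3.0, 2.8]]   # source 2
print(generate_mfm(separated))
```

Bins dominated by another talker (source 1, bin 2; source 2, bin 1) are masked, while bins where a source is strong on its own stay reliable.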

SLIDE 16

Outline

  • 1. Background of Robot Audition
  • 2. Introduction to Robot Audition Research
  • 3. Open Source Software for Robot Audition
  • 4. Deployment of Robot Audition
  • 5. Summary

SLIDE 17

Open Source Robot Audition Software HARK

HARK: HRI-JP Audition for Robots with Kyoto University, http://www.hark.jp/ ("hark" means "listen" in Old English)

  • Research use: free (commercial use: licensing)
  • Functions: microphone array processing, sound source localization, sound source separation, automatic speech recognition, dialog
  • Developed in collaboration between Kyoto Univ., HRI-JP, and Tokyo Tech.

SLIDE 18

History and Tutorials

1. Apr. 2008: first release (0.1.7)
   – 1st Tutorial: Nov. 17th, 2008, Kyoto University, Kyoto, Japan
   – 2nd Tutorial: Dec. 5th, 2008, KIST, Seoul, Korea
2. Nov. 2009: 1.0.0 pre-release
   – 3rd Tutorial: Nov. 20th, 2009, Keio University, Yokohama, Japan
   – 4th Tutorial: Dec. 5th, 2009, Univ. Pierre et Marie Curie, Paris, France
3. Nov. 2010: major version-up (1.0.0): performance, rich documents
   – 5th Tutorial: Nov. 20th, 2010, Kyoto University, Kyoto, Japan
4. Feb. 2012: version-up (1.1): performance, 64-bit processing, ROS
   – 6th Tutorial: Feb. 29th, 2012, Univ. Pierre et Marie Curie, Paris, France
   – 7th Tutorial: Mar. 9th, 2012, Nagoya University, Nagoya, Japan
5. Mar. 2013: version-up (1.7): Windows, Kinect, PSEye
   – 8th Tutorial: Mar. 19th, 2013, Kyoto University, Kyoto, Japan
6. Oct. 2013: major version-up (2.0): HARKDesigner, Microcone
   – 9th Tutorial: Oct. 2nd, 2013, LAAS-CNRS, Toulouse, France
   – 10th Tutorial: Dec. 5th, 2013, Waseda University, Tokyo, Japan
7. Nov. 2014: version-up (2.1)
   – 11th Tutorial: Nov. 21st, 2014, Waseda University, Tokyo, Japan
8. Nov. 2015: version-up (2.2) planned

SLIDE 19

Features in HARK (1)

  • GUI programming environment (HARK Designer)
    – Web-based programming environment (jQuery, node.js, HTML5)
    – Runs in Chrome/Safari/Firefox on Linux/Windows/Mac
    – Small overhead in module communication (frame-based processing) provided by FlowDesigner [Cote04]

[Figure: an example of a robot audition system built with HARK: (a) module network, (b) property setting]

SLIDE 20

Features in HARK (2)

  • Support for many multichannel sound input devices
    – ALSA-supported sound cards (e.g., RME), Kinect (4 mics), PlayStation Eye (4 mics), Microcone (7 mics)
  • Advanced signal processing technologies
    – Localization: GEVD/GSVD [Nakamura ’11], 3D localization
    – Separation: GHDSS [Nakajima ’09], HRLE [Nakajima ’10], etc.
  • Easy to install
    – Just use the package management tool “apt-get”!
  • Rich documentation
    – Manual and cookbook of over 300 pages in Japanese and English
  • Packages: ROS, OpenCV, Python, …

SLIDE 21

Outline

  • 1. Background of Robot Audition
  • 2. Introduction to Robot Audition Research
  • 3. Open Source Software for Robot Audition, HARK
  • 4. Deployment of Robot Audition
  • 5. Summary

SLIDE 22

Musical Robot [IEEE IROS 09 workshop on musical robots]

Human-robot interaction according to musical beats

  • Adaptive beat tracking
  • HRP-2 and Nao as thereminists
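Beat tracking starts from a tempo estimate. As a rough sketch, and not the adaptive beat tracker used with the robots, the code below picks the tempo from the autocorrelation of an onset-strength envelope within an assumed 60-180 BPM range; the synthetic envelope (one pulse every 0.5 s at 100 frames/s) is hypothetical.

```python
def estimate_tempo(onset, frame_rate, bpm_min=60, bpm_max=180):
    """Return the tempo [BPM] whose beat period maximizes the onset autocorrelation."""
    lag_min = int(round(60 * frame_rate / bpm_max))
    lag_max = int(round(60 * frame_rate / bpm_min))
    best_lag, best_score = None, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(onset[n] * onset[n + lag] for n in range(len(onset) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag


FRAME_RATE = 100                                  # onset frames per second (hypothetical)
onset = [0.0] * 1000                              # 10 s of onset strength
for start in range(0, len(onset), 50):            # one beat every 0.5 s
    for k, value in enumerate((1.0, 0.5, 0.25)):  # decaying attack
        if start + k < len(onset):
            onset[start + k] = value
print(estimate_tempo(onset, FRAME_RATE), "BPM")
```

For this envelope the best lag is 50 frames, i.e., 120 BPM. Lags at multiples of the beat period also correlate, which is why the BPM search range is restricted.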

SLIDE 23

SLAM-based Mic-Array Calibration

Issues in mic-array processing:

  • Microphone positions must be given, so measurements are required.
  • Recording must be synchronous, so a special multichannel A/D converter is necessary for an embedded mic array.

EKF-SLAM-based online mic-array calibration [Miura et al., IROS 2011] (Best Paper Finalist)

[Figure: average energy of the beam-forming output over the x-y plane (x [m], y [m]) before calibration, for the reference, and after calibration]
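The core of EKF-based calibration is to treat unknown array parameters as filter states and correct them from each new sound observation. The sketch below is a drastically simplified illustration under stated assumptions, not the EKF-SLAM formulation of [Miura et al., IROS 2011]: a reference mic at the origin, one mic with an unknown x-position `px`, an unknown clock offset `b` expressed in metres (speed of sound × delay), and a source whose positions are known. Since the state is static, only the EKF measurement update is used.

```python
import math

# Drastically simplified illustration of online mic calibration with an EKF.
# ASSUMPTIONS (not the cited method): reference mic at the origin, one mic on the
# x-axis at unknown px, unknown clock offset b in metres (c * delay), and a moving
# source at KNOWN positions. The state is static: measurement updates only.

def h(state, src):
    """Predicted range difference [m]: |src - mic| - |src - ref| + b."""
    px, b = state
    sx, sy = src
    return math.hypot(sx - px, sy) - math.hypot(sx, sy) + b


def jacobian(state, src):
    """Partial derivatives of h with respect to (px, b)."""
    px, _ = state
    sx, sy = src
    return [(px - sx) / math.hypot(sx - px, sy), 1.0]


def ekf_update(x, P, z, src, r):
    H = jacobian(x, src)
    PHt = [P[0][0] * H[0] + P[0][1] * H[1], P[1][0] * H[0] + P[1][1] * H[1]]
    s = H[0] * PHt[0] + H[1] * PHt[1] + r            # innovation variance
    K = [PHt[0] / s, PHt[1] / s]                     # Kalman gain
    innov = z - h(x, src)
    x = [x[0] + K[0] * innov, x[1] + K[1] * innov]
    HP = [H[0] * P[0][j] + H[1] * P[1][j] for j in range(2)]
    P = [[P[i][j] - K[i] * HP[j] for j in range(2)] for i in range(2)]  # (I - K H) P
    return x, P


true_state = (0.30, 0.05)                 # hypothetical ground truth (px, b)
x, P = [0.20, 0.0], [[0.01, 0.0], [0.0, 0.01]]
for k in range(108):                      # source circles the array three times
    ang = math.radians(10 * k)
    src = (2.0 * math.cos(ang), 2.0 * math.sin(ang))
    z = h(true_state, src)                # noiseless simulated measurement
    x, P = ekf_update(x, P, z, src, r=1e-4)
print(f"px = {x[0]:.3f} m, offset = {x[1]:.3f} m")
```

A single source direction only constrains a combination of `px` and `b`; it is the source moving around the array that makes both parameters observable.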

SLIDE 24

Application to UAV (SSL) [IROS 2014]

Quadrotor with 16 mics; SSL with iGSVD-MUSIC

  • Highly noisy sound sources (-15 dB) can be localized.

[Figure: localization results of the reference, SEVD-MUSIC, and iGSVD-MUSIC w/ CMS]
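For reference, standard SEVD-MUSIC for a single narrowband source can be sketched in plain Python. This is not iGSVD-MUSIC (no noise-correlation whitening and no CMS), and the array geometry, frequency, source angle, and noise level are hypothetical. With one source, the signal subspace is the principal eigenvector u of the correlation matrix, found here by power iteration, and the MUSIC pseudo-spectrum is 1/‖(I − u u^H) a(θ)‖².

```python
import cmath
import math
import random

C = 343.0                         # speed of sound [m/s]
MIC_X = [0.0, 0.05, 0.10, 0.15]   # hypothetical linear array, mic x-positions [m]
FREQ = 1000.0                     # analysed frequency bin [Hz]


def steering(theta_deg):
    """Far-field steering vector a(theta) for a plane wave from theta degrees."""
    c = math.cos(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * FREQ * x * c / C) for x in MIC_X]


def correlation(snapshots):
    """Spatial correlation matrix R = E[x x^H] estimated from snapshots."""
    m, k = len(snapshots[0]), len(snapshots)
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / k for j in range(m)]
            for i in range(m)]


def principal_eigvec(R, iters=50):
    """Power iteration: dominant eigenvector of a Hermitian PSD matrix."""
    u = [1 + 0j] * len(R)
    for _ in range(iters):
        v = [sum(R[i][j] * u[j] for j in range(len(R))) for i in range(len(R))]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        u = [vi / norm for vi in v]
    return u


def music_spectrum(R, angles):
    """MUSIC pseudo-spectrum assuming ONE source (noise subspace = complement of u)."""
    u = principal_eigvec(R)
    spectrum = []
    for th in angles:
        a = steering(th)
        proj = sum(ui.conjugate() * ai for ui, ai in zip(u, a))   # u^H a
        denom = sum(abs(ai) ** 2 for ai in a) - abs(proj) ** 2    # ||(I - u u^H) a||^2
        spectrum.append(1.0 / max(denom, 1e-12))
    return spectrum


rng = random.Random(0)
TRUE_DEG, SIGMA = 60.0, 0.1
a0 = steering(TRUE_DEG)
snapshots = []
for _ in range(200):
    s = cmath.exp(2j * math.pi * rng.random())  # unit-power source sample
    snapshots.append([ai * s + complex(rng.gauss(0, SIGMA), rng.gauss(0, SIGMA)) for ai in a0])

angles = list(range(181))
P = music_spectrum(correlation(snapshots), angles)
estimate = angles[max(range(len(P)), key=P.__getitem__)]
print("estimated direction:", estimate, "deg")
```

The scan covers only 0-180° because a linear array cannot tell front from back. Under strong, spatially coloured rotor noise this plain version degrades, which is the situation the whitening in GSVD-based variants targets.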

SLIDE 25

Application of HARK-Embedded [IROS ‘15]

Sound Source Localization and Visualization

  • 2D sound source localization using 1D sound source localization, motion, and gyro information
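The slide does not detail how the 1D estimates become 2D positions. One simple possibility, shown here purely as an assumption, is bearing-only triangulation: each 1D SSL result gives an azimuth relative to the device, motion and gyro data give the device pose, and two rays from different poses are intersected. The poses and source position below are hypothetical.

```python
import math


def triangulate(pose1, bearing1, pose2, bearing2):
    """Intersect two bearing rays.

    pose = (x, y, heading) in metres/radians, e.g. from motion + gyro integration;
    bearing = source azimuth relative to the heading, e.g. from 1D SSL.
    """
    x1, y1, h1 = pose1
    x2, y2, h2 = pose2
    d1 = (math.cos(h1 + bearing1), math.sin(h1 + bearing1))
    d2 = (math.cos(h2 + bearing2), math.sin(h2 + bearing2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]           # 2D cross product d1 x d2
    if abs(denom) < 1e-9:
        raise ValueError("rays are (nearly) parallel; move further before triangulating")
    # Solve p1 + t1 * d1 = p2 + t2 * d2 for t1 via cross products with d2.
    t1 = ((x2 - x1) * d2[1] - (y2 - y1) * d2[0]) / denom
    return x1 + t1 * d1[0], y1 + t1 * d1[1]


source = (2.0, 1.0)                  # hypothetical true source position [m]
pose_a = (0.0, 0.0, 0.0)             # start pose
pose_b = (1.0, 0.0, math.pi / 2)     # after moving 1 m and turning 90 degrees
bearing_a = math.atan2(source[1] - pose_a[1], source[0] - pose_a[0]) - pose_a[2]
bearing_b = math.atan2(source[1] - pose_b[1], source[0] - pose_b[0]) - pose_b[2]
print(triangulate(pose_a, bearing_a, pose_b, bearing_b))
```

The intersection becomes ill-conditioned when the two rays are nearly parallel, so in practice the device has to move or rotate enough between observations.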

SLIDE 26

Robot-audition-based IVI (in-vehicle infotainment) system [IROS ’15]

  • Talk-button-less, highly noise-robust voice recognition
  • Multi-party dialog with a robot agent
  • Hybrid of local and cloud services

SLIDE 27

Spatio-Temporal Analysis of Frog Chorus

  • Firefly (sound imaging device)
    – Takeshi Mizumoto, Ikkyu Aihara, Takuma Otsuka, Ryu Takeda, Kazuyuki Aihara, Hiroshi G. Okuno: Sound Imaging of Nocturnal Animal Calls in Their Natural Habitat, Journal of Comparative Physiology A, accepted 26 Apr. 2011. doi:10.1007/s00359-011-0652-7 (IF 1.8)
  • Discovery of a three-group chorus (tri-phase synchronization)
    – Ikkyu Aihara, Ryu Takeda, Takeshi Mizumoto, Takuma Otsuka, Toru Takahashi, Hiroshi G. Okuno, Kazuyuki Aihara: Complex and transitive synchronization in a frustrated system of calling frogs, Physical Review E, Vol. 83, Issue 3, 031913 (2011) [5 pages], 21 Mar. 2011. doi:10.1103/PhysRevE.83.031913 (IF 2.4)

[Figure: field recording site on Oki Island (rice field, footpath)]

SLIDE 28

Summary

  • Introduced an overview of robot audition
  • Introduced HARK, open-source software for robot audition
  • Introduced the deployment of robot audition technologies to robotics and other fields

SLIDE 29

Take-Home Message

  • Audio processing is as powerful as visual sensing, and it is essential to HRI and HMI.
  • Robot audition is a research field that develops techniques working in various real-world scenes.
  • If you are interested in robot audition, please join us (we continuously organize sessions at IROS) and try HARK: http://www.hark.jp/
