Audio recognition, context-awareness, and its applications




slide-1
SLIDE 1

Audio recognition, context-awareness, and its applications

Yoonchang Han

Co-founder & CEO, Cochlear.ai 26 March, 2018

slide-2
SLIDE 2

(Source: Softbank Pepper)

Rule-based methods Deep learning

slide-3
SLIDE 3

  • See: Computer vision
  • Understand language: Natural language processing
  • Listen: Speech recognition

(Source: Softbank Pepper)

slide-4
SLIDE 4

Taking an umbrella Closing the window

slide-5
SLIDE 5

(Audio source: http://www.freesound.org/people/Damiaan/)

Footstep sound: high heels

slide-6
SLIDE 6

(Source: BBC)

slide-7
SLIDE 7

Easy for humans, hard for machines

slide-8
SLIDE 8

Evolution of data processing techniques

Data → … → Prediction

  • Early days: Feature engineering
  • Traditional ML: Feature engineering + ML classifier
  • Deep learning: Deep learning

More human effort → More automatic, better performance

slide-9
SLIDE 9

  • To tackle each topic (make some “rules”)
  • To simulate how humans understand sound (and prepare data)

Domain knowledge

slide-10
SLIDE 10

Required domain knowledge

Signal processing, psychoacoustics, cognitive sciences, acoustics, music, machine learning

slide-11
SLIDE 11

“Modern” audio identification pipeline

Audio → Time-frequency representation → Neural network → Output

  • Objects in an image (butterfly, flower) ≈ instruments in a spectrogram (voice, violin, piano)
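The time-frequency step of this pipeline can be sketched in a few lines. The following is a minimal illustration, not the speaker's implementation: it computes a magnitude spectrogram with a naive DFT over Hann-windowed frames, using only the Python standard library. The frame length, hop size, sample rate, and the helper names (`spectrogram`, `sine`, `dominant_bin`) are all assumptions chosen for the example; the neural network stage is left out.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        # Keep only the non-negative frequency bins (0 .. frame_len // 2).
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def sine(freq_hz, sample_rate=8000, num_samples=512):
    """Synthetic test tone standing in for recorded audio."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def dominant_bin(spec):
    """Index of the frequency bin with the most energy, summed over time."""
    num_bins = len(spec[0])
    energy = [sum(frame[k] for frame in spec) for k in range(num_bins)]
    return max(range(num_bins), key=energy.__getitem__)


spec = spectrogram(sine(1000.0))
peak_bin = dominant_bin(spec)
peak_hz = peak_bin * 8000 / 64  # bin index times (sample_rate / frame_len)
print(len(spec), len(spec[0]), peak_bin, peak_hz)
```

The result is a 2-D array (time by frequency) that a network can treat much like an image, which is the analogy the slide draws. In practice an FFT library and a mel or log-frequency scale would likely replace the naive DFT used here.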

slide-12
SLIDE 12

“Machine listening” is the use of signal processing and machine learning for making sense of natural / everyday sounds, and recorded music.

  • Machine Listening Lab, Queen Mary University of London
slide-13
SLIDE 13

Voice: language, age, gender, emotion, health

Music: genre, mood, tempo, chord, pitch

slide-14
SLIDE 14

Voice

Music

Machine listening

Acoustic scenes: driving, bus, library, train, park, city centre, home, market, cafe

Acoustic events: water boil, glass break, car horn, footstep, knock, dog bark, gun shot, bird chirping, snoring, sneeze, crying, …

“Any” sound we hear every day

slide-15
SLIDE 15

Computer vision: Optical Character Recognition (OCR), facial recognition, object detection

Machine listening: voice recognition, music search, speaker identification, acoustic scene/event detection

(Sources: TensorFlow, Facebook, Microsoft, Apple, Shazam)

slide-16
SLIDE 16

Scene classification accuracy (IEEE DCASE): 76% in 2013, 92% in 2017

(Source: http://www.cs.tut.fi/sgn/arg/dcase2017/, http://c4dm.eecs.qmul.ac.uk/sceneseventschallenge/resultsSC.html)
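One way to read these two figures is in terms of error rate rather than accuracy: going from 76% to 92% accuracy means the error rate fell from 24% to 8%. The short calculation below works this out. It is only a rough reading, since it assumes the two challenge editions are comparable, which the slide does not state; the variable names are illustrative.

```python
# Accuracy figures quoted on the slide.
acc_2013 = 0.76
acc_2017 = 0.92

# Error rate is the complement of accuracy.
err_2013 = 1 - acc_2013
err_2017 = 1 - acc_2017

# Absolute gain in percentage points, and relative reduction in error.
abs_gain_points = (acc_2017 - acc_2013) * 100
rel_err_reduction = (err_2013 - err_2017) / err_2013

print(f"absolute gain: {abs_gain_points:.1f} points")
print(f"relative error reduction: {rel_err_reduction:.1%}")
```

Under that assumption, the 16-point gain corresponds to roughly two thirds of the 2013 errors being eliminated.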

slide-17
SLIDE 17

Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning

slide-18
SLIDE 18

Perceive → Think → Act

slide-19
SLIDE 19

Cat Five, Zero

slide-20
SLIDE 20

  • Know what it is (with input restriction)
  • Know what it is
  • Know what/where it is
  • Know what/where it is + why

Simple identification → Closer to human

slide-21
SLIDE 21

Sense (closed alpha release in April)

  • Acoustic event: Dog bark / Baby cry / Car horn / Snoring...
  • Music analysis: Genre / Mood / Key / Tempo
  • Speech analysis: Age / Gender / Emotion
  • Activity detection: Music, Speech, Others
  • Scene classification: Indoor / Outdoor / Vehicles

slide-22
SLIDE 22

Why do we need…

  • Activity detection
  • Unified model

slide-23
SLIDE 23

It is really challenging because…

  • Recording environment
  • Recording device
  • Noises
  • Local characteristics
  • Overlapped / polyphonic sounds

slide-24
SLIDE 24

Probability or saliency?

slide-25
SLIDE 25

Example: AI speakers

Simple voice control

  • “Alexa, turn on the light”
  • “Alexa, play dance music”
  • “Alexa, turn on TV”

IoT control tower with context-awareness

(footstep sound, door slam, cough → someone got back home, got a bad cold)

  • turn on light / TV
  • play suitable music
  • adjust room temperature warmer
  • ask to take cold medicine before sleep

(not just a pattern, there is a “reason”)
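The context-aware behaviour in the AI speaker example can be illustrated with a small rule table that maps detected sound events to an inferred context, and each context to actions. This is a hypothetical sketch, not the product's logic: the rule structure, the names `CONTEXT_RULES`, `ACTIONS`, `infer_contexts`, and `plan_actions`, and the pairing of particular actions with particular contexts are all assumptions; only the event, context, and action strings come from the slide.

```python
# Hypothetical rules: a context is inferred when all its required events are heard.
CONTEXT_RULES = [
    ({"footstep", "door slam"}, "someone got back home"),
    ({"cough"}, "got a bad cold"),
]

# Hypothetical pairing of each inferred context with actions from the slide.
ACTIONS = {
    "someone got back home": ["turn on light / TV", "play suitable music"],
    "got a bad cold": [
        "adjust room temperature warmer",
        "ask to take cold medicine before sleep",
    ],
}


def infer_contexts(events):
    """Return every context whose required events are all present."""
    detected = set(events)
    return [context for required, context in CONTEXT_RULES
            if required <= detected]


def plan_actions(events):
    """Pair each action with the context (the "reason") that triggered it."""
    return [(context, action)
            for context in infer_contexts(events)
            for action in ACTIONS[context]]


for reason, action in plan_actions(["footstep", "door slam", "cough"]):
    print(f"{action}  (because: {reason})")
```

Keeping the inferred context alongside each action reflects the slide's point that the response has a "reason" and is not just a pattern match. A real system would presumably learn or score these associations instead of hard-coding them.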

slide-26
SLIDE 26

Example: Humanoid robots

See things, understand speech

+

Listen to things other than voice, know who they are talking to

(Source: Atlas, Boston Dynamics)

slide-27
SLIDE 27

Example: Autonomous car

(Source: NVIDIA)

  • Outside: car horn (normal, air horn), siren (fire truck, police, ambulance)
  • Inside: music mood, snoring, baby, anomaly detection (malfunction warning)

slide-28
SLIDE 28

ATMO: Generative music for spatial atmo-sphere

Musician + AI researcher + Architect + Visual artist + Contemporary dancer

slide-29
SLIDE 29

Generative Music with contextual information

slide-30
SLIDE 30

Ambient music Background music

Generative Music with contextual information

slide-31
SLIDE 31

Analysis result: Typing on a rainy day…

Contextual Information

Typing… Reading a book… Raining outside…

slide-32
SLIDE 32

Microphone Speaker

slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35

contact@cochlear.ai