
Audio recognition, context-awareness, and its applications

Yoonchang Han, Co-founder & CEO, Cochlear.ai, 26 March 2018


  1. Audio recognition, context-awareness, and its applications. Yoonchang Han, Co-founder & CEO, Cochlear.ai. 26 March 2018

  2. Rule-based / Deep learning methods (Source: Softbank Pepper)

  3. See: Computer vision. Understand: Natural language processing. Listen: Speech recognition. (Source: Softbank Pepper)

  4. Taking an umbrella / Closing the window

  5. Footstep sound: High heels (Audio source: http://www.freesound.org/people/Damiaan/)

  6. (Source: BBC)

  7. Easy for humans, hard for machines

  8. Evolution of data processing techniques. Pipeline: Data → Feature engineering → ML classifier → Prediction. Stages: Early days → Traditional ML → Deep learning. Trend: from more human effort (feature engineering) to more automatic (deep learning), with better performance.

  9. Domain knowledge: to tackle each topic (make some “rules”); to simulate how humans understand sound (and prepare data)

  10. Required domain knowledge: Signal Processing, Cognitive Sciences, Music, Machine Learning, Psychoacoustics, Acoustics

  11. “Modern” audio identification pipeline: Audio → Time-frequency representation → Neural network → Output. Objects in an image ≈ instruments in a spectrogram (image: flower, butterfly; spectrogram: voice, piano, violin)

  12. “Machine listening” is the use of signal processing and machine learning for making sense of natural / everyday sounds, and recorded music. - Machine listening lab, Queen Mary, Univ. of London

  13. Voice: Age, Language, Gender, Emotion, Health, … Music: Genre, Mood, Chord, Pitch, Tempo, …

  14. Machine listening: “any” sound we hear every day. Acoustic scenes: bus, park, library, city centre, driving, train, home, market, cafe, … Acoustic events: glass break, knock, car horn, dog bark, footstep, water boil, gun shot, snoring, bird chirping, crying, sneeze, … Music. Voice.

  15. Computer vision: Optical Character Recognition (OCR), Facial recognition, Object detection. Machine listening: Voice recognition, Music search, Speaker identification, Acoustic scene/event detection. (Sources: Tensorflow, Facebook, Microsoft, Apple, Shazam)

  16. [Chart] Scene classification accuracy (IEEE DCASE): 76 % (2013), 92 % (2017) (Source: http://www.cs.tut.fi/sgn/arg/dcase2017/, http://c4dm.eecs.qmul.ac.uk/sceneseventschallenge/resultsSC.html)

  17. Deep Learning / Machine Learning / Artificial Intelligence

  18. Perceive → Think → Act

  19. Five, Zero Cat

  20. From simple identification to closer to human: Know what it is (with input restriction) → Know what it is → Know what/where it is → Know what/where it is + why

  21. Sense (closed alpha release in April). Activity detection: Music, Speech, Others. Music analysis: Genre / Mood / Key / Tempo. Speech analysis: Age / Gender / Emotion. Scene classification: Indoor / Outdoor / Vehicles. Acoustic event: Dog bark / Baby cry / Car horn / Snoring ...

  22. Why do we need… Activity detection / Unified model

  23. It is really challenging because of: Recording environment, Recording device, Noises, Local characteristics, Overlapped / Polyphonic sounds

  24. Probability or saliency?

  25. Example: AI speakers. Simple voice control: “Alexa, turn on the light”, “Alexa, play dance music”, “Alexa, turn on TV”. IoT control-tower with context-awareness (footstep sound, door slam, cough: someone got back home, got a bad cold): turn on light / TV, play suitable music, adjust room temperature warmer, ask to take cold medicine before sleep (not just a pattern, there is a “reason”)

  26. Example: Humanoid robots. See things, understand speech + listen to things other than voice, know who they talk to (Source: Atlas, Boston Dynamics)

  27. Example: Autonomous car. Outside: Car horn (normal, air horn), Siren (fire truck, police, ambulance). Inside: Music mood, snoring, baby, anomaly detection (malfunction warning). (Source: NVIDIA)

  28. ATMO: Generative music for spatial atmosphere. Architect, Musician + AI researcher, Visual artist, Contemporary dancer

  29. Generative music with contextual information

  30. Ambient music / Background music: Generative music with contextual information

  31. Analysis result: Typing on a rainy day… Contextual information: Typing…, Reading a book…, Raining outside…

  32. Microphone / Speaker

  33. contact@cochlear.ai
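
The pipeline on slide 11 (audio, then a time-frequency representation, then a neural network, then an output) can be sketched roughly as follows. This is only an illustration and not the presenter's implementation: the frame length, hop size, sample rate, and the `toy_network` stand-in (an untrained linear layer with random weights) are all assumptions, and the labels are borrowed from the slide's examples.

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=512):
    """Log-magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude of the real FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)

def toy_network(spec, weights):
    """Hypothetical stand-in for a trained network:
    average over time, apply one linear layer, then softmax."""
    pooled = spec.mean(axis=0)
    logits = pooled @ weights
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Audio: one second of a 440 Hz tone (assumed 16 kHz sample rate).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)

# Time-frequency representation.
spec = log_spectrogram(audio)

# "Neural network" and output: random, untrained weights, so the
# probabilities are meaningless and only show the data flow.
labels = ["voice", "piano", "violin"]
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.01, size=(spec.shape[1], len(labels)))
probs = toy_network(spec, weights)

# Frequency of the strongest bin, to check the spectrogram is sensible.
peak_hz = spec.mean(axis=0).argmax() * sample_rate / 1024
```

In a real system the random weights would be replaced by a trained model (for example a convolutional network that treats the spectrogram like an image, as the slide's analogy suggests), and a mel-scaled spectrogram is a common alternative input.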
