Inclusive Design: Deep Learning on Audio in Azure - PowerPoint PPT Presentation



SLIDE 1

Inclusive Design

Deep Learning on Audio in Azure

Swetha Machanavajhala, Software Engineer II
Xiaoyong Zhu, Senior Data Scientist

@swethaMVNV

SLIDE 2

Swetha Machanavajhala

True life-threatening incident

A carbon monoxide detector was beeping for a WEEK at Swetha’s house. Because she is deaf, she was unaware of it until a neighbor told her.

SLIDE 3
SLIDE 4
SLIDE 5

DISABILITY ≠ PERSONAL HEALTH CONDITION

DISABILITY = MISMATCHED HUMAN INTERACTIONS

SLIDE 6

Inclusive Design

SLIDE 7
SLIDE 8
SLIDE 9
SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14

Visualizing Sounds: React in a Second

SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18

Capturing loudness in real time
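The slides do not show how Hearing AI measures loudness. As a minimal sketch, assuming float samples normalized to [-1.0, 1.0], loudness can be approximated per frame as the RMS level in dBFS, 20·log10(RMS). The helpers `frame_loudness_dbfs` and `stream_loudness`, the 1024-sample frame, and the -120 dB floor are hypothetical choices, not the app's implementation; RMS level is also only a simple proxy, not a perceptual loudness measure.

```python
import numpy as np


def frame_loudness_dbfs(frame, floor_db=-120.0):
    """RMS level of one audio frame in dB relative to full scale (dBFS).

    Hypothetical helper. Assumes float samples normalized to [-1.0, 1.0],
    so a constant full-scale signal reads 0 dBFS and a full-scale sine about -3 dBFS.
    """
    rms = np.sqrt(np.mean(np.square(frame, dtype=np.float64)))
    if rms == 0.0:
        return floor_db  # silence: avoid log10(0)
    return max(20.0 * np.log10(rms), floor_db)


def stream_loudness(samples, frame_size=1024):
    """Yield one loudness value per non-overlapping frame, as a live meter would."""
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        yield frame_loudness_dbfs(samples[start:start + frame_size])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    quiet, loud = 0.01 * np.sin(2 * np.pi * 440 * t), 0.5 * np.sin(2 * np.pi * 440 * t)
    for name, sig in (("quiet", quiet), ("loud", loud)):
        levels = list(stream_loudness(sig))
        print(name, f"{np.mean(levels):.1f} dBFS over {len(levels)} frames")
```

Each yielded value could drive a level meter or color intensity on screen; at 16 kHz a 1024-sample frame updates the reading about 15 times per second.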

SLIDE 19
SLIDE 20

Currently…

SLIDE 21

Currently…

SLIDE 22

Hearing AI can transcribe phone calls

SLIDE 23

Hearing AI can do more…

SLIDE 24
SLIDE 25

Xiaoyong Zhu

Deep Learning for Audio in Azure

SLIDE 26

Landscape

Sound-based predictive maintenance

https://www.3dsig.com/

SLIDE 27

Landscape

SDK and product to turn machine sounds into actions

https://www.otosense.com/

SLIDE 28

Landscape

Enables OEMs to embed contextual awareness into devices.

https://www.audioanalytic.com/

SLIDE 29
SLIDE 30

Dataset

SLIDE 31
SLIDE 32

Convert 1-dimensional array to 2-dimensional matrix
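One common way to do this conversion, sketched here as an assumption since the slide gives no parameters, is to slice the waveform into overlapping frames (a 2-D matrix) and take the FFT of each windowed frame, which yields a spectrogram with frequency on one axis and time on the other. `frame_signal`, `power_spectrogram`, the Hann window, `frame_length=1024`, and `hop_length=512` are illustrative, hypothetical choices.

```python
import numpy as np


def frame_signal(y, frame_length=1024, hop_length=512):
    """Slice a 1-D waveform into a 2-D matrix of overlapping frames.

    Returns shape (n_frames, frame_length); row i starts at sample i * hop_length.
    """
    if len(y) < frame_length:
        y = np.pad(y, (0, frame_length - len(y)))  # pad short clips with zeros
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]


def power_spectrogram(y, n_fft=1024, hop_length=512):
    """Power spectrogram, shape (n_fft // 2 + 1, n_frames): frequency on rows, time on columns."""
    frames = frame_signal(y, n_fft, hop_length) * np.hanning(n_fft)  # taper each frame
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T


if __name__ == "__main__":
    sr = 16000
    y = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)  # 1 s, 1 kHz test tone
    S = power_spectrogram(y)
    print(S.shape)  # (frequency bins, frames)
```

With a 16,000-sample clip this gives 513 frequency bins by 30 frames; the mel and log steps on the next slide operate on this matrix.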

SLIDE 33

CNN on audio

  • Convert 1-dimensional data (the waveform) to 2-dimensional data
  • Input for CNNs: log-scaled mel-spectrogram
  • The mel scale approximates human pitch perception, so a mel-spectrogram is closer to what humans hear
  • Time by mel-spectrogram bands (X by Y axis), with 150 bands
  • Each pixel = amplitude at one time/frequency combination
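A minimal NumPy sketch of the log-scaled mel-spectrogram described above. Only `n_mels=150` comes from the slide; the HTK mel formula mel = 2595·log10(1 + f/700), unnormalized triangular filters, a Hann-windowed FFT with `n_fft=2048` and `hop_length=512`, and the 1e-10 floor before the dB conversion are assumptions. The function names are hypothetical; libraries such as librosa provide tested equivalents (for example `librosa.feature.melspectrogram`).

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale: roughly linear below ~1 kHz, logarithmic above."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1), evenly spaced in mel."""
    fmax = sr / 2.0 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)    # 0 at lo, 1 at center
        falling = (hi - fft_freqs) / (hi - center)   # 1 at center, 0 at hi
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr, n_fft=2048, hop_length=512, n_mels=150):
    """Log-scaled (dB) mel-spectrogram, shape (n_mels, n_frames): bands on Y, time on X."""
    n_frames = 1 + max(len(y) - n_fft, 0) // hop_length
    y = np.pad(y, (0, max(n_fft + (n_frames - 1) * hop_length - len(y), 0)))
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(y[idx] * np.hanning(n_fft), axis=1)) ** 2  # (frames, bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T                     # (n_mels, frames)
    return 10.0 * np.log10(np.maximum(mel, 1e-10))  # floor avoids log10(0)
```

For a 2-second clip at 22,050 Hz this yields a 150 × 83 matrix (bands × frames), i.e. a single-channel image a CNN can consume; each entry is the band energy in dB at one time/frequency cell.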

SLIDE 34
SLIDE 35

SLIDE 36

Selecting the right number of mel bands is important
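The slide does not say why the band count matters. One concrete effect, shown here under assumed settings (16 kHz audio, `n_fft=512`, HTK mel scale; `band_report` is a hypothetical helper): as `n_mels` grows, the lowest triangular filters become narrower than the FFT bin spacing and some cover no FFT bin at all, so those rows of the mel-spectrogram carry no information. Fewer bands avoid this but give coarser frequency resolution, so the count is a trade-off that depends on the sample rate and FFT size.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def band_report(sr, n_fft, n_mels):
    """Return (narrowest filter width in Hz, FFT bin spacing in Hz, number of empty filters).

    A filter is "empty" when no FFT bin falls strictly inside its (lo, hi) support,
    so it gets zero weight everywhere and its mel-spectrogram row is constant.
    """
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bins = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    lows, highs = edges[:-2], edges[2:]  # each triangle spans two mel steps
    empty = sum(not np.any((bins > lo) & (bins < hi)) for lo, hi in zip(lows, highs))
    return float((highs - lows).min()), sr / n_fft, empty


if __name__ == "__main__":
    for n in (40, 64, 128, 150, 256):
        narrowest, spacing, empty = band_report(16000, 512, n)
        print(f"{n:3d} bands: narrowest filter {narrowest:6.1f} Hz, "
              f"bin spacing {spacing} Hz, empty filters {empty}")
```

Under these settings 40 bands leave no filter empty while 256 bands do; a larger `n_fft` (finer bins) pushes that limit higher.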

SLIDE 37

Network Architecture

  • 11 convolutional layers
  • Faster convergence & better accuracy
    • Batch-normalization layers
  • Improving accuracy
    • Self-attention mechanism
    • Longer segments (2 secs)
    • Narrow mel bands
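The slide names a self-attention mechanism but not its form. A minimal sketch, assuming standard scaled dot-product self-attention over the time axis of the CNN feature map, with random matrices standing in for learned weights. `self_attention`, `segment_frames`, the 16 kHz sample rate, `n_fft=1024`, `hop_length=512`, and the feature sizes are hypothetical; `segment_frames` only relates the 2-second segment length to the number of time steps T that attention operates over.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention across time steps.

    h: (T, d) features, one row per time step (e.g. CNN output flattened over mel bands).
    w_q, w_k, w_v: (d, d_k) projections; learned in a real model, random in this sketch.
    Returns (T, d_k) context vectors and the (T, T) attention weights.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # each row sums to 1
    return weights @ v, weights


def segment_frames(sr=16000, seconds=2.0, n_fft=1024, hop_length=512):
    """Number of spectrogram frames (time steps T) in one fixed-length segment."""
    return 1 + (int(sr * seconds) - n_fft) // hop_length


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, d_k = segment_frames(), 64, 32
    h = rng.standard_normal((T, d))  # stand-in for CNN features of one 2 s segment
    context, weights = self_attention(h, *(rng.standard_normal((d, d_k)) for _ in range(3)))
    clip_embedding = context.mean(axis=0)  # pooled vector a classifier head could use
    print(T, context.shape, clip_embedding.shape)
```

Each time step's output is a weighted mix of all time steps in the segment, so a short event (a knock, a beep) can dominate the pooled embedding even when most of the 2 seconds is background.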

SLIDE 38
SLIDE 39

SLIDE 40

Intelligent Sound Prediction - Architecture

SLIDE 41

Demo: Hearing AI can recognize sounds

SLIDE 42
SLIDE 43

Performance

SLIDE 44
SLIDE 45

Artificial Intelligence proves sounds need not be heard!

  • Sound
  • Movement
  • Speech
  • Phone calls
  • Localization
  • Specific sounds

SLIDE 46