Inclusive Design
Deep Learning on Audio in Azure
Swetha Machanavajhala, Software Engineer II Xiaoyong Zhu, Senior Data Scientist
@swethaMVNV
Swetha Machanavajhala
A carbon monoxide detector was beeping for a WEEK at Swetha’s house. Since she is deaf, she was unaware until a neighbor informed her.
True life-threatening incident
DISABILITY
MISMATCHED HUMAN INTERACTIONS
Xiaoyong Zhu
https://www.3dsig.com/
https://www.otosense.com/
https://www.audioanalytic.com/
Convert 1-dimensional data to 2-dimensional data
Input for CNNs: log-scaled mel-spectrogram
Mel-spectrogram = what humans hear
Consider time by mel-spectrogram bands (X by Y axis)
150 bands
Each pixel = amplitude at a time/frequency combination
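The conversion from a 1-D waveform to a 2-D log-scaled mel-spectrogram can be sketched as follows. This is a minimal NumPy illustration, not the speakers' pipeline. Only the 150 mel bands and the 2-second segment length come from the slides. The sample rate, FFT size, hop length, Hann window, mel formula, and all function names are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; the slides do not name one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, center, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=22050, n_fft=2048, hop=512, n_mels=150):
    # Slice the waveform into overlapping, Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    # Power spectrum per frame, then pool FFT bins into mel bands.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    # Log scaling (dB); the small constant avoids log(0).
    return 10.0 * np.log10(mel + 1e-10).T  # shape: (n_mels, n_frames)

# A 2-second, 440 Hz test tone becomes a 150-band "image" for a CNN.
sr = 22050
t = np.arange(2 * sr) / sr
image = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(image.shape)  # (150, 83)
```

Each row of the result is one mel band and each column is one time frame, so every entry plays the role of a pixel whose value is the log amplitude at that time/frequency combination.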
11 convolutional layers
Faster convergence & better accuracy
Batch-normalization layers
Improving accuracy
Self-attention mechanism
Longer segments (2 secs)
Narrow mel-band
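Two of the building blocks named above, batch normalization and self-attention, can be illustrated in a few lines of NumPy. This is a hedged sketch of the generic techniques only. The slides do not say where these layers sit in the 11-layer network or how attention is applied, so the tensor layout, the scaled dot-product form of attention, and every name below are assumptions.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (batch, channels, freq, time) feature maps from a convolutional layer.
    # Normalize each channel with statistics taken over batch, freq, and time.
    # The learnable scale and shift are omitted (treated as 1 and 0).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, w_q, w_k, w_v):
    # h: (time, d) sequence of per-frame feature vectors.
    # Scaled dot-product attention: every frame attends to every other frame.
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

# Hypothetical shapes, chosen only for illustration.
rng = np.random.default_rng(0)
features = batch_norm(rng.normal(3.0, 2.0, size=(4, 8, 10, 12)))
frames = rng.normal(size=(20, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
attended, weights = self_attention(frames, w_q, w_k, w_v)
print(features.shape, attended.shape, weights.shape)
```

Batch normalization keeps the activations of each channel on a comparable scale, which is the usual explanation for faster convergence, while the attention weights let the model emphasize the time frames that matter most within a segment.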
Intelligent Sound Prediction - Architecture