
Revolutionary Voice Enhancement in Real-Time Communications with GPU. Davit Baghdasaryan, CEO, 2Hz; Arto Minasyan, CTO, 2Hz



  1. Revolutionary Voice Enhancement in Real-Time Communications with GPU. Davit Baghdasaryan, CEO, 2Hz; Arto Minasyan, CTO, 2Hz

  2. 2

  3. Mute Background Noises

  4. Voice Quality with Deep Learning
 •Mute Background Noise
 •Mute Everyone Except Me
 •Remove Room Echo
 •High Resolution Voice Everywhere

  5. Real-Time Noise Suppression with Deep Learning

  6. Traditional Noise Cancellation
 -Requires 2-4 mics
 -Runs on edge device
 -Cancels only limited noises
 -Outbound only

  7. Deep Learning powered Noise Cancellation
 -No dependency on mics
 -Bi-directional
 -Cancels all noise types
 -Runs everywhere - on device and in the cloud
 [Diagram: Background Noises and Clean Human Speeches are used to train krispNet, a Deep Neural Network]

  8. How to Measure Voice Quality?

  9. Industry Standards
 - Academia - PESQ, Subjective
 - Industry - 3QUEST (Speech MOS, Noise MOS, Global MOS)
 - Skype Audio Test and 3GPP TS 26.131 specifications

  10. Audio Lab

  11.

  12. krisp.ai

  13. Seamlessly Integrates in Conferencing Apps. Supports any Microphone or Headset

  14. krisp.ai: Best Product in Audio/Voice 2018

  15. Training and Inference

  16. Training Process

  17. Training Data
 - 2K distinct speakers - gender and age diverse distribution
 - >10K distinct noises - babble, construction, traffic, cafeteria, office, etc.
 - 2000+ hours
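The slides list the speech and noise corpora but do not say how the two are combined. A common way to build noisy/clean training pairs is to add a noise clip to a clean utterance at a chosen signal-to-noise ratio. The sketch below illustrates that general technique only; the function names, the synthetic signals, and the 5 dB target are assumptions for illustration, not details from the talk.

```python
import math
import random

def rms(signal):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mix has the requested SNR in dB."""
    if len(noise) < len(clean):
        # Loop the noise clip until it covers the whole utterance.
        repeats = len(clean) // len(noise) + 1
        noise = noise * repeats
    noise = noise[:len(clean)]
    # SNR_dB = 20 * log10(rms_clean / rms_noise), solved for the noise RMS.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [c + gain * n for c, n in zip(clean, noise)]

# Hypothetical stand-ins for one speech clip and one noise clip (not real data).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(8000)]

noisy = mix_at_snr(clean, noise, snr_db=5.0)

# Check the result: the residual (mix minus clean) is the scaled noise.
residual = [m - c for m, c in zip(noisy, clean)]
achieved_snr_db = 20 * math.log10(rms(clean) / rms(residual))
print(f"achieved SNR: {achieved_snr_db:.2f} dB")
```

In a real pipeline the mix would be the network input and the clean clip the training target; silent noise clips (RMS of zero) would need to be filtered out before scaling.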

  18. Training on GPUs
 - All in Python
 - Distributed TensorFlow
 - Multiple in-house NVIDIA 1080 Ti. Takes a full week.
 - p2.16xlarge in AWS. 16x NVIDIA K80

  19. Inference
 - Supports NVIDIA, Intel and ARM platforms
 - All in C/C++. Sometimes ASM
 - Smaller network (5x boost with some quality penalty)
 - TensorRT boosts ~2x

  20. Moving to the Cloud

  21. Server-side Noise Cancellation

  22. Latency Constraints
 - 200ms end to end latency
 - Codecs and other DSP (10-80ms)
 - Network (varies)
 - DNN (< 20ms): DNN Compute (< 5ms) plus DNN Algorithmic (15ms)
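The figures on this slide can be read as a simple budget. The sketch below assumes the components add up linearly and that the network gets whatever remains of the 200ms; the slide only says the network share "varies", so the subtraction is an illustrative reading rather than something the talk states.

```python
# Figures taken from the slide, in milliseconds.
END_TO_END_BUDGET_MS = 200
CODEC_DSP_MS = (10, 80)   # best case, worst case
DNN_COMPUTE_MS = 5        # slide states "< 5ms"; the upper bound is used here
DNN_ALGORITHMIC_MS = 15

def network_budget_ms(codec_dsp_ms):
    """Time left for the network after codecs/DSP and the DNN (assumed additive)."""
    dnn_total = DNN_COMPUTE_MS + DNN_ALGORITHMIC_MS
    return END_TO_END_BUDGET_MS - codec_dsp_ms - dnn_total

best_case = network_budget_ms(CODEC_DSP_MS[0])
worst_case = network_budget_ms(CODEC_DSP_MS[1])
print(best_case, worst_case)  # 170 100
```

Under these assumptions the DNN takes at most a tenth of the end-to-end budget, and the network has between 100ms and 170ms to work with.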

  23. How do you scale to 100K+ concurrent streams with such latency constraints? For example, Discord processes 2.5M concurrent audio streams.

  24. CPU Servers … GPU Servers 10x-20x less costly

  25. Scalability with Batching
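The slide gives no detail on how batching is done. The general idea is to collect one audio frame from each of many concurrent streams and run them through the network in a single call, which amortizes per-call overhead on a GPU. The sketch below shows only that grouping logic; `denoise_batch`, `process_pending`, the frame size, and the batch size are hypothetical names and values, and the "model" is a placeholder rather than a real network.

```python
from collections import deque

FRAME_SIZE = 4  # hypothetical samples per frame, kept tiny for readability

def denoise_batch(batch):
    """Placeholder for one batched DNN call; here it just halves every sample."""
    return [[0.5 * s for s in frame] for frame in batch]

def process_pending(pending, max_batch_size):
    """Drain queued (stream_id, frame) pairs using one model call per batch."""
    results = {}
    calls = 0
    while pending:
        take = min(max_batch_size, len(pending))
        items = [pending.popleft() for _ in range(take)]
        outputs = denoise_batch([frame for _, frame in items])
        calls += 1
        # Route each output frame back to the stream it came from.
        for (stream_id, _), out in zip(items, outputs):
            results[stream_id] = out
    return results, calls

# One frame from each of 10 hypothetical concurrent streams.
pending = deque((sid, [float(sid)] * FRAME_SIZE) for sid in range(10))
results, calls = process_pending(pending, max_batch_size=4)
print(calls)       # 3 model calls instead of 10
print(results[6])  # [3.0, 3.0, 3.0, 3.0]
```

A production server would also cap how long a frame may wait for its batch to fill, since waiting time counts against the latency budget.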

  26. Ultimate Quality [Diagram: an Audio Frame passes through Remove Noise, Remove Room Echo, and Expand Voice within 5ms, producing an HD Audio Frame]

  27. Maximum Quality and Scale with NVIDIA Tensor Cores

  28. TensorRT is pretty awesome [Bar chart: TensorFlow Batching vs. TensorRT Batching on P100, V100, K80, and T4; vertical axis 0 to 3000]

  29. T4 and V100 are both awesome [Bar chart: FP32 vs. FP16 on P100, V100, and T4; vertical axis 0 to 5000]

  30. Key Takeaways
 1. Voice Quality Enhancement is moving to the Cloud
 2. For large scale deployments we need GPUs
 3. T4 and V100 GPUs are most efficient for this

  31. Thank You! Booth #247
