Revolutionary Voice Enhancement in Real-Time Communications with GPU. Davit Baghdasaryan, CEO, 2Hz; Arto Minasyan, CTO, 2Hz

Sep 21, 2022

SLIDE 1

Revolutionary Voice Enhancement in Real-Time Communications with GPU

Davit Baghdasaryan, CEO, 2Hz Arto Minasyan, CTO, 2Hz

SLIDE 2

SLIDE 3

SLIDE 4

Mute Background Noises

SLIDE 5

Voice Quality with Deep Learning

Mute Background Noise 
Mute Everyone Except Me 
Remove Room Echo 
High Resolution Voice Everywhere

SLIDE 6

Real-Time Noise Suppression with Deep Learning

SLIDE 7

Traditional Noise Cancellation

Requires 2-4 mics 
Runs on edge device 
Cancels only limited noises 
Outbound only

SLIDE 8

Deep Learning powered Noise Cancellation

Train krispNet Deep Neural Network on Background Noises and Clean Human Speech

No dependency on mics 
Bi-directional 
Cancels all noise types 
Runs everywhere - on device and in the cloud

SLIDE 9

How to Measure Voice Quality?

SLIDE 10

Industry Standards

Academia - PESQ, Subjective 
Industry - 3QUEST (Speech MOS, Noise MOS, Global MOS) 
Skype Audio Test and 3GPP TS 26.131 specifications
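As a small illustration of the "Subjective" entry: a subjective Mean Opinion Score is the arithmetic mean of listener ratings on a 1-5 scale. The sketch below assumes that scale; the function name and the ratings are made up for illustration and are not measurements from this talk. Objective tools such as PESQ or 3QUEST estimate comparable scores algorithmically and are not reproduced here.

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Average listener ratings on the 1-5 opinion scale (hypothetical helper)."""
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values in 1..5")
    return mean(ratings)

# Made-up ratings for the same clip before and after noise suppression.
noisy_ratings = [2, 3, 2, 1, 2]
denoised_ratings = [4, 4, 5, 3, 4]

print("MOS before:", mean_opinion_score(noisy_ratings))
print("MOS after: ", mean_opinion_score(denoised_ratings))
```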

SLIDE 11

Audio Lab

SLIDE 12

SLIDE 13

krisp.ai

SLIDE 14

SLIDE 15

Seamlessly Integrates in Conferencing Apps 
Supports any Microphone or Headset

SLIDE 16

SLIDE 17

krisp.ai Best Product in Audio/Voice 2018

SLIDE 18

Training and Inference

SLIDE 19

Training Process

SLIDE 20

Training Data

2K distinct speakers - diverse gender and age distribution 
>10K distinct noises - babble, construction, traffic, cafeteria, office, etc. 
2000+ hours
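The slides do not say how clean speech and noise recordings are combined for training. One common approach, assumed here, is to mix a noise clip into a clean clip at a chosen signal-to-noise ratio, giving a (noisy input, clean target) pair. In this minimal sketch the function names, the 16 kHz sample rate, and the synthetic signals are all illustrative assumptions.

```python
import math
import random

def mean_power(signal):
    """Average squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB."""
    # Repeat or trim the noise clip so it covers the whole speech clip.
    reps = math.ceil(len(speech) / len(noise))
    noise = (noise * reps)[: len(speech)]
    # Gain that brings the noise to the power implied by the target SNR.
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

random.seed(0)
SAMPLE_RATE = 16000  # assumed; not stated in the slides
# Synthetic stand-ins for a clean speech clip and a shorter noise clip.
clean = [0.5 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE) for i in range(SAMPLE_RATE)]
noise = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE // 2)]

noisy = mix_at_snr(clean, noise, snr_db=5.0)  # network input; `clean` is the target
```

Varying the SNR and pairing each speaker with many different noises is one way a fixed pool of recordings could be stretched into a large number of training hours.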

SLIDE 21

Training on GPUs

All in Python 
Distributed TensorFlow 
Multiple in-house NVIDIA GTX 1080 Ti GPUs. Takes a full week. 
p2.16xlarge in AWS. 16x NVIDIA K80

SLIDE 22

Inference

Supports NVIDIA, Intel and ARM platforms 
All in C/C++. Sometimes ASM 
Smaller network (5x boost with some quality penalty) 
TensorRT boosts ~2x

SLIDE 23

Moving to the Cloud

SLIDE 24

Server-side Noise Cancellation

SLIDE 25

Latency Constraints

200ms end-to-end latency budget:

Codecs and other DSP (10-80ms) 
Network (varies) 
DNN Compute (< 5ms) 
DNN Algorithmic (15ms) 

DNN total (compute + algorithmic): < 20ms
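The budget can be checked with a little arithmetic. This sketch uses only the figures on the slide. The variable names are illustrative, and treating the "< 5ms" compute figure as its upper bound is an assumption.

```python
# Figures from the slide, in milliseconds.
END_TO_END_BUDGET_MS = 200
CODEC_DSP_MS = (10, 80)    # best case, worst case
DNN_COMPUTE_MS = 5         # upper bound taken from "< 5ms"
DNN_ALGORITHMIC_MS = 15

def network_headroom_ms(codec_dsp_ms):
    """Budget left for the network after codecs/DSP and the DNN are accounted for."""
    dnn_total = DNN_COMPUTE_MS + DNN_ALGORITHMIC_MS
    return END_TO_END_BUDGET_MS - codec_dsp_ms - dnn_total

best = network_headroom_ms(CODEC_DSP_MS[0])
worst = network_headroom_ms(CODEC_DSP_MS[1])
print(f"DNN total: {DNN_COMPUTE_MS + DNN_ALGORITHMIC_MS} ms")
print(f"Network headroom: {worst} to {best} ms")
```

Under these assumptions the DNN takes at most 20 ms of the 200 ms budget, leaving between 100 and 170 ms for the network depending on codec and DSP cost.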

SLIDE 26

How do you scale to 100K+ concurrent streams with such latency constraints?

Ex. Discord processes 2.5M concurrent audio streams

SLIDE 27

GPU Servers are 10x-20x less costly than CPU Servers

SLIDE 28

Scalability with Batching
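The slide gives no implementation details, so the following is only a sketch of the general idea of batching: frames waiting from many streams are grouped and passed to the model in a single call, instead of one call per stream. `denoise_batch` is a hypothetical stand-in for a batched DNN inference call, and the batch size is arbitrary.

```python
from collections import deque

def denoise_batch(frames):
    """Hypothetical stand-in for one batched DNN call: halves every sample."""
    return [[0.5 * x for x in frame] for frame in frames]

def process_in_batches(pending, max_batch):
    """Drain (stream_id, frame) pairs, making one model call per batch."""
    results, calls = {}, 0
    queue = deque(pending)
    while queue:
        # Take up to max_batch waiting frames, one per stream.
        batch = [queue.popleft() for _ in range(min(max_batch, len(queue)))]
        stream_ids = [sid for sid, _ in batch]
        outputs = denoise_batch([frame for _, frame in batch])
        calls += 1
        # Route each output frame back to the stream it came from.
        results.update(zip(stream_ids, outputs))
    return results, calls

# Ten streams, each with one 4-sample frame waiting.
pending = [(sid, [float(sid)] * 4) for sid in range(10)]
results, calls = process_in_batches(pending, max_batch=4)
print(calls, "model calls for", len(results), "streams")
```

On a GPU, a larger batch generally improves throughput per call, but frames must wait for the batch to fill, so in practice batch size has to be weighed against the compute latency budget.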

SLIDE 29

Ultimate Quality

Audio Frame 
Remove Noise 
Remove Room Echo 
Expand Voice HD 
Ultimate Quality Audio Frame

All stages together: 5ms

SLIDE 30

Maximum Quality and Scale with NVIDIA Tensor Cores

SLIDE 31

TensorRT is pretty awesome

[Chart: TensorFlow Batching vs. TensorRT Batching on P100, V100, K80 and T4; axis ticks 750 to 3000]

SLIDE 32

T4 and V100 are both awesome

[Chart: FP32 vs. FP16 on P100, V100 and T4; axis ticks 1250 to 5000]
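For context on the FP32 vs. FP16 comparison: half precision stores each value in 2 bytes instead of 4, at the cost of coarser rounding. This minimal sketch uses only Python's standard `struct` module. The helper names and the example value are illustrative, and it says nothing about the actual krispNet weights or GPU kernels.

```python
import struct

def to_fp16(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

weight = 0.1  # arbitrary example value
print("bytes per value:", struct.calcsize("<e"), "(FP16) vs", struct.calcsize("<f"), "(FP32)")
print("FP16 rounding error:", abs(to_fp16(weight) - weight))
print("FP32 rounding error:", abs(to_fp32(weight) - weight))
```

Halving the storage per value also halves memory traffic, and the Tensor Cores on V100 and T4 accelerate FP16 matrix math, which is consistent with those GPUs being singled out here.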

SLIDE 33

Key Takeaways

1. Voice Quality Enhancement is moving to the Cloud 
2. For large scale deployments we need GPUs 
3. T4 and V100 GPUs are most efficient for this

SLIDE 34

Thank You!

krisp.ai

Booth #247