OPPORTUNITIES AND CHALLENGES OF PARALLELIZING SPEECH RECOGNITION
Jike Chong, Gerald Friedland, Adam Janin, Nelson Morgan, Chris Oei
OUTLINE
Motivation
Improving Accuracy
Improving Throughput
Improving Latency
[Figure: meeting-analysis tasks built on the audio signal]
Audio Signal
Speech Recognition: "what was said"
Speaker Diarization: "who spoke when"
Speaker Attribution: "who said what"
Relevant Web Scraping: "what's relevant to this"
Summarization: "what are the main points"
Indexing, Search, Retrieval; Question Answering; other higher-level analysis
WFST Recognition Network
[Figure: the recognition network combines an HMM acoustic phone model (phones such as aa, hh, n), a pronunciation model (HOP = hh aa p, ON = aa n, POP = p aa p), and a bigram language model over words such as HOP, ON, POP, CAT, HAT, IN, THE.]
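To make the traversal concrete, here is a minimal Python sketch of token-passing Viterbi search over a toy bigram network built from the HOP/ON/POP lexicon above. Everything in it is an assumption for illustration: `build_network` and `viterbi_decode` are hypothetical helpers, the network is assembled directly instead of by WFST composition, each phone is a single self-looping state rather than a multi-state HMM, the bigram probabilities are invented, and the acoustic scores are toy values. The inner loop over active tokens and their outgoing arcs is one natural candidate for parallelization.

```python
import math
from collections import defaultdict

# Hypothetical toy decoder for illustration; not the recognizer described in the talk.

def build_network(lexicon, bigram):
    """Build a toy bigram recognition network.

    lexicon: word -> list of phones.
    bigram: (previous word, word) -> probability, with "<s>" as sentence start.
    States are "START" or (word, phone_index); arcs are
    (next_state, phone, output_word_or_None, cost).
    """
    arcs = defaultdict(list)
    for w, phones in lexicon.items():
        for i, ph in enumerate(phones):
            arcs[(w, i)].append(((w, i), ph, None, 0.0))  # self-loop: stay in this phone
            if i + 1 < len(phones):
                arcs[(w, i)].append(((w, i + 1), phones[i + 1], None, 0.0))
    for (prev, w), p in bigram.items():
        src = "START" if prev == "<s>" else (prev, len(lexicon[prev]) - 1)
        # Entering word w consumes its first phone and pays the LM cost -log P(w | prev).
        arcs[src].append(((w, 0), lexicon[w][0], w, -math.log(p)))
    finals = {(w, len(phones) - 1) for w, phones in lexicon.items()}
    return arcs, finals


def viterbi_decode(arcs, finals, frames, beam=10.0):
    """Token-passing Viterbi search with beam pruning.

    frames: one dict per frame mapping phone -> acoustic log-likelihood.
    Returns (cost, words) for the cheapest token in a word-final state.
    """
    tokens = {"START": (0.0, ())}
    for scores in frames:
        new_tokens = {}
        for state, (cost, words) in tokens.items():
            for dst, ph, word, arc_cost in arcs[state]:
                c = cost + arc_cost - scores[ph]
                if dst not in new_tokens or c < new_tokens[dst][0]:
                    new_tokens[dst] = (c, words + (word,) if word else words)
        # Beam pruning: drop tokens much worse than the best one at this frame.
        best = min(c for c, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    return min((t for s, t in tokens.items() if s in finals), default=None)


lexicon = {"HOP": ["hh", "aa", "p"], "ON": ["aa", "n"], "POP": ["p", "aa", "p"]}
bigram = {("<s>", "HOP"): 0.6, ("<s>", "POP"): 0.4, ("HOP", "ON"): 1.0,
          ("ON", "POP"): 1.0, ("POP", "ON"): 0.5, ("POP", "HOP"): 0.5}
phones = ["hh", "aa", "p", "n"]
observed = ["hh", "aa", "p", "aa", "n", "p", "aa", "p"]
# Toy acoustic scores: log-likelihood 0 for the observed phone, -5 for any other.
frames = [{ph: (0.0 if ph == o else -5.0) for ph in phones} for o in observed]
arcs, finals = build_network(lexicon, bigram)
cost, words = viterbi_decode(arcs, finals, frames)
print(words)  # ('HOP', 'ON', 'POP')
```

Raising `beam` keeps more competing hypotheses alive at each frame; lowering it trades accuracy for speed.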
Gaussian Mixture Model for One Phone State
[Figure: features from … are compared against each mixture component (computing the distance to each mixture component), and the per-component results are combined by computing a weighted sum.]
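The two steps in the figure (a distance to every mixture component, then a weighted sum) can be sketched as follows. This is a minimal illustration assuming diagonal covariances and evaluating the weighted sum in the log domain for numerical stability; `gmm_log_likelihood` and its argument layout are hypothetical, not taken from the talk.

```python
import math

# Hypothetical helper for illustration; assumes diagonal covariances.

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under one phone state's GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Step 1: variance-scaled squared distance from x to this component's mean.
        dist = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
        log_norm = -0.5 * sum(math.log(2 * math.pi * vi) for vi in var)
        log_terms.append(math.log(w) + log_norm - 0.5 * dist)
    # Step 2: weighted sum over components, computed stably in the log domain.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


weights = [0.3, 0.7]
means = [[0.0, 0.0], [1.0, -1.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
print(gmm_log_likelihood([0.5, -0.5], weights, means, variances))
```

In a recognizer this score is needed for every active phone state at every frame, so the per-component loop is a natural target for data-parallel hardware.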
[Figure: speaker diarization example with rows labeled audio track, segmentation, and clustering.]
[Figure: agglomerative speaker-diarization loop]
Initialization → (Re-)Training → (Re-)Alignment → Merge two clusters?
Yes: merge, then return to (Re-)Training
No: End
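The loop in the flowchart can be sketched as a toy agglomerative clusterer. Several simplifications are assumptions made here, not details from the talk: each cluster is a single diagonal Gaussian rather than a GMM, (re-)alignment reassigns fixed 20-frame blocks instead of running an HMM/Viterbi segmentation with a minimum duration, and the merge test is a BIC-style score with a hypothetical `penalty` weight. All function names are hypothetical.

```python
import math
import random

# Hypothetical toy diarizer for illustration; simplifications are noted in comments.

def fit(frames):
    """ML diagonal Gaussian (mean, variance per dimension), variance floored."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    var = [max(sum((f[j] - mean[j]) ** 2 for f in frames) / n, 1e-3) for j in range(d)]
    return mean, var

def loglik(f, model):
    """Log-density of one frame under a diagonal Gaussian."""
    mean, var = model
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(f, mean, var))

def data_loglik(frames):
    """Log-likelihood of frames under a Gaussian fitted to those same frames."""
    model = fit(frames)
    return sum(loglik(f, model) for f in frames)

def diarize(frames, n_init=4, block=20, n_iter=3, penalty=1.0):
    """Train, realign, and merge until no pair of clusters is worth merging."""
    n_blocks = math.ceil(len(frames) / block)
    blocks = [frames[i * block:(i + 1) * block] for i in range(n_blocks)]
    # Initialization: split the block sequence uniformly into n_init clusters.
    labels = [i * n_init // n_blocks for i in range(n_blocks)]
    while True:
        for _ in range(n_iter):
            ids = sorted(set(labels))
            # (Re-)Training: one Gaussian per cluster from its current blocks.
            models = {c: fit([f for b, l in zip(blocks, labels) if l == c for f in b])
                      for c in ids}
            # (Re-)Alignment (simplified): give each block to its best-scoring cluster.
            labels = [max(ids, key=lambda c: sum(loglik(f, models[c]) for f in b))
                      for b in blocks]
        ids = sorted(set(labels))
        members = {c: [f for b, l in zip(blocks, labels) if l == c for f in b] for c in ids}
        d = len(frames[0])
        best_score, best_pair = 0.0, None
        for i, ca in enumerate(ids):
            for cb in ids[i + 1:]:
                merged = members[ca] + members[cb]
                # BIC-style test: log-likelihood change from merging, plus the
                # complexity penalty saved by dropping one model's 2*d parameters.
                score = (data_loglik(merged) - data_loglik(members[ca])
                         - data_loglik(members[cb]) + penalty * d * math.log(len(merged)))
                if score > best_score:
                    best_score, best_pair = score, (ca, cb)
        if best_pair is None:
            # End: no merge helps; expand block labels back to one label per frame.
            return [l for b, l in zip(blocks, labels) for _ in b]
        ca, cb = best_pair
        labels = [ca if l == cb else l for l in labels]


# Synthetic example: speakers A and B take turns (A, B, A, B), 60 frames each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 0.0), (10.0, 10.0)]
frames, truth = [], []
for turn, (cx, cy) in enumerate(centers):
    for _ in range(60):
        frames.append((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)))
        truth.append(turn % 2)
labels = diarize(frames)
print(sorted(set(labels)))
```

The `penalty` weight controls how readily clusters merge: a larger value favors fewer speakers.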
[Figure: online diarization system ("who is speaking now"). Offline subsystem: the diarization engine (segmentation, clustering) runs on a history buffer of the audio signal, together with speaker mapping and MAP training. Online subsystem: an online decision on each 2.5 sec buffer.]
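The online side of the diagram can be sketched as picking, for each 2.5 sec buffer, the speaker model that best explains the buffered frames. This assumes single diagonal-Gaussian speaker models, a 10 ms frame rate, and mean-only MAP adaptation with a relevance factor of 16; `online_decision`, `map_adapt_mean`, and the constants are hypothetical names, not the system's actual interface.

```python
import math
from collections import deque

# Hypothetical sketch for illustration; frame rate and relevance factor are assumptions.
FRAME_RATE = 100                        # assumed 10 ms feature frames
BUFFER_FRAMES = int(2.5 * FRAME_RATE)   # the 2.5 sec buffer from the diagram

def loglik(f, model):
    """Log-density of one frame under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(f, mean, var))

def online_decision(buffer, models):
    """Return the speaker whose model best explains the buffered frames."""
    return max(models, key=lambda spk: sum(loglik(f, models[spk]) for f in buffer))

def map_adapt_mean(prior_mean, frames, relevance=16.0):
    """Mean-only MAP update: interpolate the data mean with the prior mean."""
    n = len(frames)
    return [(sum(f[j] for f in frames) + relevance * prior_mean[j]) / (n + relevance)
            for j in range(len(prior_mean))]


models = {"spk1": ([0.0, 0.0], [1.0, 1.0]), "spk2": ([3.0, 3.0], [1.0, 1.0])}
buffer = deque(maxlen=BUFFER_FRAMES)    # keeps only the most recent 2.5 sec
for t in range(300):
    buffer.append((3.0, 3.0) if t >= 100 else (0.0, 0.0))
print(online_decision(buffer, models))  # spk2
```

Because the deque drops old frames automatically, each decision looks only at the latest 2.5 sec, while the speaker models themselves would come from the slower offline subsystem.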
[Figure: "Online Diarization: DER/Core". Error % (diarization error rate) versus the number of cores dedicated to the offline subsystem (1 to 8, plus a 7+GPU configuration).]