DEEP LEARNING IN BUSINESS CONVERSATION ANALYSIS

slide-1
SLIDE 1

DEEP LEARNING IN BUSINESS CONVERSATION ANALYSIS

ANTHONY SCODARY, GRIDSPACE WONKYUM LEE, GRIDSPACE

slide-2
SLIDE 2

INTRO

“Which translation speech recognition so and so forth I mean there's a whole bunch of amazing applications that are made possible by deep learning and so internet service providers are using it for internal application development. And then lastly what you mentioned as cloud service providers and basically because of the adoption of gp use and because of the success of kuta and so many applications are now able to be accelerate on gp use so that we can extend the capabilities of moore's law so that we can continue. You'd have the benefits of of computing acceleration, which which in the cloud means reducing cost. And that's on the serve cloud service provider side of of the Internet company so that would be amazon web services as the Google compute cloud.”

slide-3
SLIDE 3

OVERVIEW

  • 1. Business Conversations
  • 2. Recognition
  • 3. Analysis
slide-4
SLIDE 4
  • 1. Business Conversations

DEEP LEARNING IN BUSINESS CONVERSATION ANALYSIS

slide-5
SLIDE 5

PROTOCOLS

SIGNAL PROCESSING

slide-6
SLIDE 6

PROTOCOLS

  • Symbol Set (Lexicon)
  • Rules (Syntax)
  • Meaning (Semantics)
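
The three layers above can be sketched with a toy protocol. Everything here (the four symbols, the verb-then-object rule, the resulting action dictionary) is invented for illustration and is not from the talk.

```python
# Hypothetical toy protocol: names and rules are assumptions for illustration.

# Symbol set (lexicon): the only tokens either endpoint may send.
LEXICON = {"OPEN", "CLOSE", "DOOR", "WINDOW"}
VERBS = {"OPEN", "CLOSE"}
OBJECTS = {"DOOR", "WINDOW"}


def is_well_formed(message: str) -> bool:
    """Rules (syntax): exactly two known symbols, a verb followed by an object."""
    tokens = message.split()
    if len(tokens) != 2 or not all(t in LEXICON for t in tokens):
        return False
    verb, obj = tokens
    return verb in VERBS and obj in OBJECTS


def interpret(message: str) -> dict:
    """Meaning (semantics): map a well-formed message to an action for the sink."""
    if not is_well_formed(message):
        raise ValueError(f"not a valid message: {message!r}")
    verb, obj = message.split()
    return {"target": obj.lower(), "state": "open" if verb == "OPEN" else "closed"}


print(is_well_formed("OPEN DOOR"))   # known symbols in a legal order
print(is_well_formed("DOOR OPEN"))   # known symbols, illegal order
print(interpret("CLOSE WINDOW"))
```

A message can fail at any layer: an unknown symbol fails the lexicon, a known symbol in the wrong position fails the syntax, and only messages that pass both have a meaning.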
slide-7
SLIDE 7

TYPES OF PROTOCOLS

SOURCE → MEDIUM → SINK

slide-8
SLIDE 8

TYPES OF PROTOCOLS: ENDPOINTS

SOURCE \ SINK   NATURE           MACHINE       HUMAN
NATURE          BIRD CALL        SEISMOGRAPH   GROWLING
MACHINE         ELECTRIC FENCE   TCP           FIRE ALARM
HUMAN           “SIT”            SIRI          SPEECH

slide-9
SLIDE 9

TYPES OF PROTOCOLS: H2H MEDIA

[Chart: human-to-human media plotted by BANDWIDTH against INFORMATION DENSITY. Points shown: EMAIL, SMS, VOICEMAIL, CHAT, MISSED CALL, POSTCARD, WAVING, SPEECH]

slide-10
SLIDE 10

WHY DO WE STILL TALK?

  • Fast
  • Innate
  • Layered
  • Synchronous
  • Dense in meaning
slide-11
SLIDE 11

ORGANIZATIONS

INTERNAL COMMUNICATION: Calls, Meetings, Hallway Chats, Documents, Email, Chat

EXTERNAL COMMUNICATION: Support Calls, In-Person Sales, SMS, Chat Support, Social Media, Email

slide-12
SLIDE 12

ORGANIZATIONS

INTERNAL COMMUNICATION: Calls, Meetings, Hallway Chats, Documents, Email, Chat

EXTERNAL COMMUNICATION: Support Calls, In-Person Sales, SMS, Chat Support, Social Media, Email

Mostly lost today

slide-13
SLIDE 13

THIS DATA MATTERS

slide-14
SLIDE 14

THIS DATA MATTERS

slide-15
SLIDE 15
  • 2. Recognition

DEEP LEARNING IN BUSINESS CONVERSATION ANALYSIS

slide-16
SLIDE 16

REAL-TIME CALL ANALYSIS

ASR DSP SCANNER CLASSIFIER

slide-17
SLIDE 17

ASR

Conventional ASR

Feature Extraction (MFCC) → Acoustic Model (GMM) → Lexicon → Language Model → “hello”

  • A combination of blocks, each designed by experts in a different field

GMM-HMM: 1980-2010

slide-18
SLIDE 18

ASR

Feature Extraction (MFCC) → Acoustic Model (GMM) → Lexicon → Language Model → “hello”

Lots of tuning to improve accuracy: robust features, speaker adaptation, application-specific LM

slide-19
SLIDE 19

ASR

Feature Extraction (MFCC) → Acoustic Model → Lexicon → Language Model → “hello”

Replacing the acoustic model with a deep neural net

DNN-HMM: 30-40% improvement (2011-2017)

slide-20
SLIDE 20

ASR

All-in-one Deep Learning Model → “hello”

In the near future: replacing the whole pipeline with one neural net

End-to-end ASR: active research in progress

slide-21
SLIDE 21

ASR HISTORY

[Chart: ASR error rate over the decades (in academia). Y-axis: WER (log scale). Successive eras: simple linear model (GMM), advanced linear model (GMM-SAT-DT), deep learning model, end-to-end deep learning (under development). A reference level marks “human parity”.]
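
WER, the metric on the chart's axis, is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch follows; the two example sentences are made up for illustration and are not data from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two substitutions and one insertion against 8 reference words.
reference = "adoption of gpus and the success of cuda"
hypothesis = "adoption of gp use and the success of kuta"
print(word_error_rate(reference, hypothesis))  # 0.375
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.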

slide-22
SLIDE 22

ASR CHALLENGES

“However, it’s still NOT easy in real-world business conversational voice”

Language Challenge

  • Domain-specific terminology (company name, product name, …)
  • Spontaneous speech (natural conversation)
  • Accent, dialect, mispronunciation

Acoustic Challenge

  • Noise (background, channel)
  • Acoustic effects (reverberation, Lombard effect)
  • Variability from speakers
  • Microphone displacement (near/far field)

slide-23
SLIDE 23

LARGE-SCALE DATA PROCESSING

Data is King!

  • General conversational data + in-domain data (training with in-domain data improves accuracy by 15-30%)
  • Simulated data with a variety of noise helps (improves accuracy by 10-15%)
  • Data collection with semi-supervised training helps
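
One common way to simulate noisy training data is to mix a noise recording into clean speech at a chosen signal-to-noise ratio. The sketch below assumes audio is a plain list of float samples; the function names and the sine/random stand-in signals are hypothetical, and this is not necessarily the augmentation pipeline the speakers used.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise ratio equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 20 * log10(speech_rms / noise_rms), solved for the noise level.
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Stand-in signals: a 440 Hz tone as "speech" and uniform random "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

# One clean utterance becomes several training examples at different SNRs.
augmented = {snr: mix_at_snr(speech, noise, snr) for snr in (20, 10, 5, 0)}
```

Lower SNR values produce harder examples; sampling the SNR and the noise type at random per utterance is a typical way to get variety.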

slide-24
SLIDE 24

LARGE-SCALE DATA PROCESSING

Multi-GPU Training

  • 4x Titan X with parallel training
  • One week for full training with 25k hours of audio
  • 80x faster than a 32-core CPU machine
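
A quick back-of-the-envelope reading of these figures, assuming "one week" means 7 × 24 wall-clock hours and that the 80x speedup applies to the whole training run (both are assumptions, not statements from the slide):

```python
AUDIO_HOURS = 25_000        # training set size from the slide
TRAINING_DAYS = 7           # "one week", assumed to be 7 full days
GPU_SPEEDUP_VS_CPU = 80     # from the slide, assumed to hold end to end

wall_clock_hours = TRAINING_DAYS * 24
# Hours of training audio covered per wall-clock hour, counting the
# dataset once regardless of how many passes the training makes.
audio_hours_per_wall_hour = AUDIO_HOURS / wall_clock_hours
# Equivalent training time on the 32-core CPU machine.
cpu_weeks = TRAINING_DAYS * GPU_SPEEDUP_VS_CPU / 7

print(wall_clock_hours)                     # 168
print(round(audio_hours_per_wall_hour, 1))  # 148.8
print(cpu_weeks)                            # 80.0
```

Under those assumptions the same run would take about 80 weeks, roughly a year and a half, on the CPU machine.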

slide-25
SLIDE 25

REAL-TIME ADAPTIVE PROCESSING

  • Online i-vector adaptation (5-10% improvement)
      • Speaker characteristics
      • Environmental noise
      • Accent & dialect
  • Context-based grammar adaptation (recognizes in-domain specific terms)

slide-26
SLIDE 26

STATE OF THE ART DEEP LEARNING MODEL

  • Time-delay neural network (TDNN)
  • Computation optimization (subsampling, bi-phone, etc.)
  • WFST framework for search

“Purely sequence-trained neural networks for ASR based on lattice-free MMI”, Interspeech 2016

WER: 5-6% (Capital Market Model), 12-15% (Customer Intelligence Model)
Real-time factor: 0.3-0.35
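
The real-time factor is processing time divided by audio duration, so a value below 1.0 means the recognizer keeps up with live audio. A small sketch; the helper names and the per-worker stream estimate are illustrative assumptions and ignore batching and scheduling overhead.

```python
import math


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rtf: float) -> float:
    """Time needed to decode a recording at a given RTF."""
    return audio_seconds * rtf


def max_live_streams(rtf: float) -> int:
    """Idealised count of live streams one worker can follow without falling behind."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return math.floor(1.0 / rtf)


# A 10-minute call at the slide's RTF range.
for rtf in (0.3, 0.35):
    print(rtf, processing_time(600, rtf), max_live_streams(rtf))
```

At an RTF of 0.3 a 10-minute call takes about 3 minutes of compute, which is what leaves headroom for real-time analysis downstream of the recognizer.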

slide-27
SLIDE 27

DEEP LEARNING IN BUSINESS CONVERSATION ANALYSIS

  • 3. Analysis
slide-28
SLIDE 28

IS TRANSCRIPTION REALLY WHAT YOU WANT ANYWAY?

slide-29
SLIDE 29

STUFF WITH ACTUAL USE TO COMPANIES

  • Prediction
  • Classification
  • Summarization
  • Entity Extraction
  • Anomaly Detection
slide-30
SLIDE 30

“ARTIFICIAL INTELLIGENCE”

slide-31
SLIDE 31

“ARTIFICIAL INTELLIGENCE”

ARITHMETIC, GRAPH SEARCH, CHESS, IMAGE RECOGNITION, CONVERSATION, EMOTION, CONSCIOUSNESS

“ABOVE THIS LINE, THIS SURELY IS ‘REAL’ INTELLIGENCE”

slide-32
SLIDE 32

“ARTIFICIAL INTELLIGENCE”

TECHNOLOGY REVOLUTION

WASTE OF MONEY AND TIME

slide-33
SLIDE 33

“ARTIFICIAL INTELLIGENCE”

We focus on the industry needs as an engineering task.

slide-34
SLIDE 34

ANALYSIS

  • 1. Speech is complex.

Let models decide what features matter for a task or application.

slide-35
SLIDE 35

ANALYSIS

  • 2. Speech is high dimensional.

Datasets must be large enough to train large models to match.

slide-36
SLIDE 36

ANALYSIS

  • 3. Conversational speech is noisy.

Large, well-augmented datasets are necessary to be robust.

slide-37
SLIDE 37

ANALYSIS

slide-38
SLIDE 38

ANALYSIS

slide-39
SLIDE 39

ANALYSIS

slide-40
SLIDE 40

ANALYSIS

...

slide-41
SLIDE 41

ANALYSIS

slide-42
SLIDE 42

ANALYSIS

One-hot (D dimensions): aardvark, zebra

ℝ³⁰⁰, ℝ⁴⁰
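
A one-hot code gives every vocabulary word its own axis, while an embedding stores a short dense vector per word. The sketch below uses a four-word vocabulary and a randomly initialised table as stand-ins (a trained model would learn the table); only the 300-dimensional size is taken from the slide.

```python
import random

# Tiny stand-in vocabulary; a real one-hot code has D = vocabulary size.
VOCAB = ["aardvark", "king", "queen", "zebra"]
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}
EMBEDDING_DIM = 300


def one_hot(word):
    """D-dimensional vector with a single 1.0 at the word's index."""
    vec = [0.0] * len(VOCAB)
    vec[WORD_TO_ID[word]] = 1.0
    return vec


def make_embedding_table(dim, seed=0):
    """One dense row per word; random values stand in for learned weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in VOCAB]


def embed(word, table):
    """Embedding lookup: select the word's row."""
    return table[WORD_TO_ID[word]]


def one_hot_times_table(vec, table):
    """The same lookup written as a vector-matrix product."""
    dim = len(table[0])
    return [sum(vec[i] * table[i][d] for i in range(len(vec))) for d in range(dim)]


table = make_embedding_table(EMBEDDING_DIM)
print(len(one_hot("zebra")), len(embed("zebra", table)))  # 4 300
# The lookup and the matrix product give the same vector.
print(embed("zebra", table) == one_hot_times_table(one_hot("zebra"), table))  # True
```

With a realistic vocabulary the one-hot vector has tens of thousands of dimensions, which is why the lookup form is used in practice.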

slide-43
SLIDE 43

ANALYSIS

KING QUEEN BROTHER SISTER MAN WOMAN
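
The word pairs on this slide are the usual illustration of vector arithmetic in an embedding space: the offset from MAN to WOMAN is roughly the offset from KING to QUEEN. The four-dimensional vectors below are hand-made for illustration (axes: person, royal, sibling, female), not learned embeddings.

```python
import math

# Hand-crafted toy vectors: (person, royal, sibling, female). Illustration only.
TOY_VECTORS = {
    "king":    [1.0, 1.0, 0.0, 0.0],
    "queen":   [1.0, 1.0, 0.0, 1.0],
    "brother": [1.0, 0.0, 1.0, 0.0],
    "sister":  [1.0, 0.0, 1.0, 1.0],
    "man":     [1.0, 0.0, 0.0, 0.0],
    "woman":   [1.0, 0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


print(analogy("king", "man", "woman"))     # queen
print(analogy("brother", "man", "woman"))  # sister
```

In learned embeddings the match is approximate rather than exact, which is why the nearest neighbour by cosine similarity is used instead of an equality test.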

slide-44
SLIDE 44

ANALYSIS

i have no political party actually ~~~‘democrat’

slide-45
SLIDE 45

ANALYSIS

slide-46
SLIDE 46

API

gridspace.com

slide-47
SLIDE 47

QUESTIONS?