Spoken Language Understanding on the Edge



SLIDE 1

Spoken Language Understanding on the Edge

Alaa Saade, Alice Coucke, Alexandre Caulier, Joseph Dureau, Adrien Ball, Théodore Bluche, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, Mael Primet
Snips, Paris
EMC2 Workshop @ NeurIPS 2019
November 13, Alexandre Caulier

SLIDE 2

Spoken language understanding system

[Diagram: Automatic Speech Recognition Engine (Acoustic model + Language model) → Natural Language Understanding Engine. Example: the acoustic model turns the audio into phones (t ɜ r n ɑ n ð ə l a ɪ t s ɪ n ð ə ˈl ɪ v ɪ ŋ r u m), decoding with the language model yields the text "Turn on the lights in the living room", and the NLU engine extracts Intent: SwitchLightOn, Slots: room: living room]
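The cascade in the diagram can be sketched as a composition of two stages. This is a minimal illustration, not the Snips engine: `ParsedQuery`, `toy_asr`, `toy_nlu` and `run_slu` are hypothetical names, the ASR stage is a stub returning a fixed transcript, and the NLU stage is a keyword matcher standing in for trained models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ParsedQuery:
    """Output of the NLU engine: one intent plus named slot values."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def toy_asr(audio: bytes) -> str:
    # Hypothetical stub: a real ASR engine would run the acoustic model
    # and decode with the language model. Here the transcript is fixed.
    return "turn on the lights in the living room"


def toy_nlu(text: str) -> ParsedQuery:
    # Hypothetical keyword matcher standing in for the trained NLU models.
    intent = "SwitchLightOn" if "turn on" in text else "Unknown"
    slots: Dict[str, str] = {}
    for room in ("living room", "kitchen", "bedroom"):
        if room in text:
            slots["room"] = room
    return ParsedQuery(intent, slots)


def run_slu(audio: bytes,
            asr: Callable[[bytes], str],
            nlu: Callable[[str], ParsedQuery]) -> ParsedQuery:
    """Cascade: audio -> transcript -> structured query."""
    return nlu(asr(audio))


result = run_slu(b"", toy_asr, toy_nlu)
print(result)  # ParsedQuery(intent='SwitchLightOn', slots={'room': 'living room'})
```

Passing the stages in as callables keeps the sketch independent of any particular acoustic, language or NLU model.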

Features

  • Tested and certified to run on 1GB RAM and a 1.4GHz CPU
  • Cloud independent - no remote processing
  • Private by Design - no user data can be collected
  • Accurate - on par with cloud-based solutions

SLIDE 3

Acoustic modeling

[Figure: a deep neural network maps the audio to probabilities over phones (/a/, /b/, /c/, /d/, /e/) at each point in time]

Challenges
  • Large deep learning models: computationally and memory intensive
  • Training data: 10K+ hours of in-domain audio with transcripts per language

Trade-off between accuracy and computational efficiency
  • Reduced model size (~10MB)
  • A few thousand hours of training data
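To make "probabilities over phones" concrete: the network emits one score vector per audio frame, a softmax turns each into a distribution over the phone inventory, and a naive best-path reading takes the most likely phone per frame and merges consecutive repeats. This is a sketch with made-up logits and the five-phone toy inventory from the figure; a real decoder searches these posteriors jointly with the language model instead of reading them greedily.

```python
import math
from typing import List

PHONES = ["a", "b", "c", "d", "e"]  # toy inventory, as in the figure


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_posteriors(frame_logits: List[List[float]]) -> List[List[float]]:
    """One probability distribution over PHONES per time frame."""
    return [softmax(frame) for frame in frame_logits]


def greedy_phones(posteriors: List[List[float]]) -> List[str]:
    """Most likely phone per frame, with consecutive repeats merged."""
    best = [PHONES[max(range(len(p)), key=p.__getitem__)] for p in posteriors]
    merged: List[str] = []
    for ph in best:
        if not merged or merged[-1] != ph:
            merged.append(ph)
    return merged


# Hypothetical network outputs: 4 frames (rows) x 5 phones (columns).
logits = [
    [2.0, 0.1, 0.0, 0.0, 0.0],
    [1.8, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.5, 0.0],
    [0.0, 0.0, 0.0, 0.2, 1.9],
]
post = frame_posteriors(logits)
print(greedy_phones(post))  # ['a', 'd', 'e']
```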

SLIDE 4


Assistant Contextualization

[Figure: Natural Language Understanding. From the sentence, a logistic regression predicts the Intent and a Conditional Random Field predicts the Slots]
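The two NLU components can be sketched at inference time. Assumptions: the feature weights, transition scores and BIO tag set below are hand-set for illustration (a trained model would learn them), the intent `SwitchLightOff` is hypothetical, and only decoding is shown: a logistic-regression softmax over bag-of-words scores for the intent, and Viterbi decoding of a linear-chain CRF for the slots.

```python
import math
from typing import Dict, List, Tuple

# Intent classification: multinomial logistic regression (inference only).
# Hypothetical hand-set weights: intent -> {word: weight}.
INTENT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "SwitchLightOn": {"turn": 1.0, "on": 1.5, "lights": 2.0},
    "SwitchLightOff": {"turn": 1.0, "off": 2.5, "lights": 2.0},
}


def intent_probs(tokens: List[str]) -> Dict[str, float]:
    """Softmax over linear bag-of-words scores, one score per intent."""
    scores = {name: sum(w.get(t, 0.0) for t in tokens)
              for name, w in INTENT_WEIGHTS.items()}
    m = max(scores.values())
    exps = {name: math.exp(s - m) for name, s in scores.items()}
    z = sum(exps.values())
    return {name: e / z for name, e in exps.items()}


# Slot filling: linear-chain CRF with BIO tags, decoded with Viterbi.
TAGS = ["O", "B-room", "I-room"]
# Hypothetical transition scores: entering I-room from O is heavily penalized.
TRANS: Dict[Tuple[str, str], float] = {(a, b): 0.0 for a in TAGS for b in TAGS}
TRANS[("O", "I-room")] = -10.0
TRANS[("B-room", "I-room")] = 1.0


def emission(token: str, tag: str) -> float:
    """Hypothetical per-token tag scores (a trained CRF uses feature weights)."""
    if token == "living":
        return 2.0 if tag == "B-room" else 0.0
    if token == "room":
        return 1.5 if tag in ("B-room", "I-room") else 0.0
    return 1.0 if tag == "O" else 0.0


def viterbi(tokens: List[str]) -> List[str]:
    """Highest-scoring tag sequence under emission + transition scores."""
    # best[tag] = (score of the best path ending in tag, that path)
    best = {t: (emission(tokens[0], t), [t]) for t in TAGS}
    for tok in tokens[1:]:
        new_best = {}
        for cur in TAGS:
            score, path = max(
                (best[prev][0] + TRANS[(prev, cur)], best[prev][1]) for prev in TAGS
            )
            new_best[cur] = (score + emission(tok, cur), path + [cur])
        best = new_best
    return max(best.values())[1]


def extract_slots(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Join B-/I- tagged tokens into slot values."""
    slots: Dict[str, str] = {}
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            slots[tag[2:]] = tok
        elif tag.startswith("I-") and tag[2:] in slots:
            slots[tag[2:]] += " " + tok
    return slots


tokens = "turn on the lights in the living room".split()
probs = intent_probs(tokens)
tags = viterbi(tokens)
print(max(probs, key=probs.get))    # SwitchLightOn
print(extract_slots(tokens, tags))  # {'room': 'living room'}
```

The transition scores are what let the CRF enforce structure across tokens: with `O -> I-room` penalized, "room" can only continue a slot that "living" opened.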

[Figure: Language Model. The probabilities over phones (/a/, /b/, /c/, /d/ over time) are decoded through a decoding graph into the sentence "Turn on the lights in the living room"]
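Why a language model restricted to the assistant's domain helps can be illustrated with a bigram model. This is a minimal sketch, not the decoding graph from the slide: a real system compiles the LM into a graph searched jointly with the phone probabilities, whereas here two hypothetical candidate transcripts are simply rescored. The training queries and the add-one smoothing are illustrative choices.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = {"</s>"}
        for s in sentences:
            words = ["<s>"] + s.split() + ["</s>"]
            vocab.update(words[1:])
            self.context_counts.update(words[:-1])
            self.bigram_counts.update(zip(words[:-1], words[1:]))
        # Known next-words plus one extra slot reserved for unseen words.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, sentence: str) -> float:
        """Sum of log P(word | previous word) over the sentence."""
        words = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(words[:-1], words[1:]):
            num = self.bigram_counts[(prev, cur)] + 1
            den = self.context_counts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


# Hypothetical in-domain training queries for a smart-lights assistant.
corpus = [
    "turn on the lights in the living room",
    "turn off the lights in the kitchen",
    "switch on the lights in the bedroom",
]
lm = BigramLM(corpus)

# Two acoustically similar candidate transcripts (hypothetical).
candidates = ["turn on the lights in the living room",
              "turn on the likes in the living room"]
best = max(candidates, key=lm.log_prob)
print(best)  # turn on the lights in the living room
```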

Approach: LM and NLU are consistent and contextualized
  • Out-of-vocabulary management
  • Lightweight models
  • On-device personalization
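One way to read "LM and NLU are consistent" is that both are derived from the same annotated queries, so every entity value the NLU can tag is also in the LM vocabulary; on-device personalization then amounts to adding values and regenerating. The sketch below assumes a simple `{slot}` template format and a hypothetical `expand_templates` helper; it illustrates the idea and is not the Snips training pipeline.

```python
import re
from itertools import product
from typing import Dict, List, Set

SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def expand_templates(templates: List[str], values: Dict[str, List[str]]) -> List[str]:
    """Fill every slot placeholder with every known value (Cartesian product)."""
    sentences = []
    for tpl in templates:
        names = SLOT_PATTERN.findall(tpl)
        for combo in product(*(values[n] for n in names)):
            s = tpl
            for name, val in zip(names, combo):
                s = s.replace("{" + name + "}", val, 1)
            sentences.append(s)
    return sentences


def vocabulary(sentences: List[str]) -> Set[str]:
    return {w for s in sentences for w in s.split()}


templates = ["turn on the lights in the {room}", "turn off the lights in the {room}"]
values = {"room": ["living room", "kitchen"]}

lm_corpus = expand_templates(templates, values)
print(len(lm_corpus))  # 4

# On-device personalization: the user adds a custom room name.
values["room"].append("man cave")
lm_corpus = expand_templates(templates, values)
print("cave" in vocabulary(lm_corpus))  # True
```

Because the same `values` table would drive both the NLU slot values and the LM corpus, a new room name cannot be known to one model and out of vocabulary for the other.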

SLIDE 5

Experimental setting

Datasets: audio utterances with transcripts and supervision, recorded in close-field and far-field conditions
  • 💢 Smart Lights Assistant: 1.8K utterances, 400 word pronunciations
  • 🎶 Music Assistant: 3K utterances, 178K word pronunciations

Methods
  • Snips: specialized for 💢 & 🎶, <100MB, real time on a Raspberry Pi 3
  • Google Speech-to-Text cloud services: one-size-fits-all engine

Metric: end-to-end score, the % of perfectly parsed queries, i.e. both intent and slots correct (e.g. Intent: SwitchLightOn, Slots: room: living room)

Open sourcing: benchmarks and datasets
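The end-to-end score can be computed as an exact-match rate over parsed queries. This sketch assumes a query counts as perfectly parsed only when its intent and its complete set of slot values match the reference; the `Parse` type, the intent `SwitchLightOff` and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Parse:
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def end_to_end_score(predicted: List[Parse], reference: List[Parse]) -> float:
    """Percentage of queries whose intent and slots both match exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(p.intent == r.intent and p.slots == r.slots
               for p, r in zip(predicted, reference))
    return 100.0 * hits / len(reference)


reference = [
    Parse("SwitchLightOn", {"room": "living room"}),
    Parse("SwitchLightOn", {"room": "kitchen"}),
    Parse("SwitchLightOff", {"room": "bedroom"}),
    Parse("SwitchLightOn", {}),
]
predicted = [
    Parse("SwitchLightOn", {"room": "living room"}),  # perfect
    Parse("SwitchLightOn", {"room": "kitchen"}),      # perfect
    Parse("SwitchLightOff", {"room": "bathroom"}),    # wrong slot value
    Parse("SwitchLightOff", {}),                      # wrong intent
]
print(end_to_end_score(predicted, reference))  # 50.0
```

A correct intent with a single wrong slot value scores zero for that query, which makes this metric stricter than intent accuracy alone.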

SLIDE 6

Benchmarks

End-to-end performance

🎶 Music Assistant, % of perfectly parsed queries by artist tier

           Tier 1 artists   Tier 2 artists   Tier 3 artists
           (1-1k)           (4.5k-5.5k)      (9k-10k)
  Snips    71 %             68 %             67 %
  Google   69 %             38 %             37 %

  • Snips: contextualized for 💢 & 🎶, <100MB, real time on a Raspberry Pi 3
  • Google: STT cloud service, one-size-fits-all engine
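Reading the table as differences makes the robustness gap explicit. The scores below are copied from the table; the snippet only computes the drop, in percentage points, from Tier 1 to Tier 3 artists.

```python
# End-to-end scores (%) from the Music Assistant table, Tier 1..3.
scores = {
    "Snips": [71, 68, 67],
    "Google": [69, 38, 37],
}

drops = {system: tiers[0] - tiers[-1] for system, tiers in scores.items()}
for system, drop in drops.items():
    print(f"{system}: {drop} point drop from Tier 1 to Tier 3")
# Snips: 4 point drop from Tier 1 to Tier 3
# Google: 32 point drop from Tier 1 to Tier 3
```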

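[Figure: bar chart of end-to-end performance (% of perfectly parsed queries, 0% to 100%) for the Smart Lights Assistant 💢 (400 word pronunciations) and the Music Assistant 🎶 (178K word pronunciations), Snips vs. Google; bar values shown: 48, 79, 69, 84]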

Questions?