Deep Learning for Dialog
Nate Kushman, Researcher, Microsoft Research Labs, Cambridge, UK
Basic Computer Science Research. Long-term goal: real-world impact.
https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/
Machine learning for synthesising source code: machine learning meets program structure
How can we use gradient descent to synthesize human-interpretable programs?
Can we use neural networks to guide a traditional program synthesizer?
Example problem from Gaunt et al. 2017
Representation Learning: Generative Models
Multi-Level VAE
Project Malmo:
- observation
- action
Agent Applications Services Infrastructure
Optimal medication outcomes require a concert of patient, practitioner and health-system insights and actions, in more timely and targeted ways: higher definition healthcare
Example: Customer Care Intelligence
Frictionless human-like conversations Seamless integration between human and AI agents
Why is Dialog Relevant for FinTech Services?
Deep learning progress, 2009–2016: first Speech, then Vision, then Natural Language.
Context is challenging for three reasons:
Long Distance Relationships
“I run Windows 10” … “I’m printing in PowerPoint” … Solution: “Upgrade VS240 driver”
Subtlety Matters
“The menu is below the button” “The button is above the menu” Followed by: “Now Click it”
Many Possible Combinations
Need either:
- Large amounts of data
- Manual engineering for each new domain
Neural Context Representations Symbolic Context Representations
Neural: a dense vector, e.g. [9.0, 8.3, 2.7, 6.2, 9.1, 8.9]
Symbolic: Domain: Technical Support; Intent: Projector Setup; Device: Epson VS240
Neural Context Representations:
- Struggle with long-distance relationships
- Excel at subtlety
- Require large amounts of data

Symbolic Context Representations:
- Great for long-distance relationships
- Struggle with subtlety of meaning
- Require engineering per domain
Symbolic is the dominant approach in real-world systems; neural is in practice used mostly for chit-chat.
Example data: DNA sequences. Task: classify each as junk or gene.
Each base is one-hot encoded (A = [1,0,0,0], G = [0,1,0,0], C = [0,0,1,0], T = [0,0,0,1]), so the sequence AGCGATGCGAT becomes an 11×4 binary matrix.
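The encoding above can be sketched in a few lines of plain Python (the `one_hot` helper and the fixed `AGCT` ordering are my illustration, not code from the talk):

```python
# One-hot encode a DNA sequence: each base becomes a 4-dimensional
# indicator vector, so an 11-base sequence yields an 11x4 binary matrix.
BASES = "AGCT"  # A=[1,0,0,0], G=[0,1,0,0], C=[0,0,1,0], T=[0,0,0,1]

def one_hot(sequence):
    """Return a list of 4-element indicator vectors, one per base."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence]

matrix = one_hot("AGCGATGCGAT")
```

This matrix is what the network consumes in place of the raw letters.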
An RNN processes the sequence one token at a time: at each step t, a neural network (NN) combines the previous hidden state Ht with the current one-hot input to produce Ht+1.

Ht – “an h-dimensional ‘compressed summary’ of t tokens”

The same NN, with the same weights, is applied at every position t-1, t, t+1, t+2, …, unrolled across the whole sequence.
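A minimal sketch of that recurrence, assuming a tanh nonlinearity and hand-rolled weight matrices (the names `rnn_step` and `run_rnn` are mine, not from the talk):

```python
import math

def rnn_step(W_h, W_x, h_prev, x_t):
    """One RNN step: combine the previous hidden state with the current
    one-hot input to produce the next hidden state (tanh nonlinearity)."""
    h_dim = len(h_prev)
    return [math.tanh(
                sum(W_h[i][j] * h_prev[j] for j in range(h_dim)) +
                sum(W_x[i][k] * x_t[k] for k in range(len(x_t))))
            for i in range(h_dim)]

def run_rnn(W_h, W_x, inputs, h0):
    """Unroll over a sequence: the SAME weights are reused at every step,
    and h carries a compressed summary of all tokens seen so far."""
    h = h0
    for x_t in inputs:
        h = rnn_step(W_h, W_x, h, x_t)
    return h
```

Real implementations add bias terms and train the weights by backpropagation through time; the control flow is the same.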
For natural language the tokens are words: a vocabulary running from Aardvark and Aardwolf to Zymurgy, with each word xt one-hot encoded over 10^4 – 10^5 dimensions.
An embedding layer then maps each sparse one-hot vector down to a dense vector of only 10^2 – 10^3 dimensions.
Three ways to use an RNN:
- 1. Sequence classification: read the whole sequence (“I am happy”) and classify its final state.
- 2. Next-token prediction: given “I am happy”, predict the shifted targets “am happy <end>” one position at a time.
- 3. Sequence-to-sequence: an Encoder compresses “I am happy” into a neural context; a Decoder, seeded with <start>, generates “That’s great <end>”.
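The sequence-to-sequence control flow can be sketched structurally; the `step_fn`/`emit_fn` stubs below stand in for trained RNN cells (this is my sketch of the dataflow, not a trained model):

```python
def encode(tokens, step_fn, h0):
    """Run the encoder over the input; the final state is the context."""
    h = h0
    for t in tokens:
        h = step_fn(h, t)
    return h

def decode(context, step_fn, emit_fn, max_len=10, end="<end>"):
    """Generate until <end>: each step updates the state, emits a token,
    and feeds that token back in on the next step."""
    h, token, out = context, "<start>", []
    while token != end and len(out) < max_len:
        h = step_fn(h, token)
        token = emit_fn(h)
        out.append(token)
    return out
```

The key property is that the decoder sees the input only through the fixed-size context vector.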
Input: Preceding dialog context Output: Next Utterance
Supervised Learning
Faster learning but requires a human to perform the task to provide the “correct” response
Reinforcement Learning: the same encoder-decoder, but the generated response (“You’re mean”) receives only a scalar reward at the end of the dialog, e.g. -10 for a bad outcome or +10 for a good one.
Hard to learn from, but only requires users to provide a rating at the end of a dialog.
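The two training signals can be contrasted as loss functions. This is a hedged, REINFORCE-style sketch of the RL case with my own function names; real systems add baselines and other variance-reduction tricks:

```python
import math

def supervised_loss(log_probs_of_correct_tokens):
    """Supervised: cross-entropy against a human-provided response,
    giving a training signal at every token."""
    return -sum(log_probs_of_correct_tokens)

def reinforce_loss(log_probs_of_sampled_tokens, dialog_reward):
    """RL: only a single scalar rating at the end of the dialog;
    REINFORCE scales the sampled tokens' log-probs by that reward."""
    return -dialog_reward * sum(log_probs_of_sampled_tokens)
```

The per-token signal is why supervised learning is faster; the single end-of-dialog scalar is why RL is harder to learn from.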
Language Understanding Dialog Management Natural Language Generation
Are there any action movies to see this weekend?
Domain: Movie Intent: Find Slot - Genre: “Action” Slot - Date: “this weekend” Query(LOCATION)
Where would you like to go? How about the Capital Theater?
Domain: Movie Intent: Find Slot - Genre: “Action” Slot - Date: “this weekend”
Language Understanding Dialog Management
Slot - Location: “Capital Theater” Domain: Movie Intent: Find Slot - Genre: “Action” Slot - Date: “this weekend” Slot – Location: “Capital”
Database
Language Understanding
Are there any action movies to see this weekend?
Domain: Movie Intent: Find Slot - Genre: “Action” Slot - Date: “this weekend”
Language Understanding
Find me a cheap Taiwanese restaurant in Oakland.
Domain: Restaurant Intent: Find Slot - Price: “cheap” Slot - Type: “Taiwanese” Slot – Loc: “Oakland”
Movie Domain vs. Restaurant Domain
- Ontology Based
- Pipeline Decision
Domain classification error rate: traditional (n-gram) 9.5% vs. RNN 2.5%.
Are there any action movies to see this weekend? → Domain: Movie
Domain: Movie Intent: Find Slot - Genre: “Action” Slot - Date: “this weekend”
Slot tagging error rate: traditional (SVM) 4.3% vs. RNN 3.4%.

Slot tagging assigns an IOB label to every word:
Are/<> there/<> any/<> action/B-Genre movies/I-Genre to/<> see/<> this/B-Date weekend?/I-Date
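The IOB tags can be turned back into slot values with a small decoder; `extract_slots` is my illustrative helper, not part of the system described in the talk:

```python
def extract_slots(words, tags):
    """Collect IOB-tagged spans into slot-name -> phrase pairs.
    B- starts a slot, I- continues it, anything else (<>) is outside."""
    slots, current_name, current_words = {}, None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if current_name:                       # close the previous span
                slots[current_name] = " ".join(current_words)
            current_name, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_name:
            current_words.append(word)             # extend the open span
        else:
            if current_name:                       # outside: close any span
                slots[current_name] = " ".join(current_words)
            current_name, current_words = None, []
    if current_name:                               # span running to the end
        slots[current_name] = " ".join(current_words)
    return slots
```

Applied to the example sentence above, this recovers the Genre and Date slots from the per-word tags.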
Language Understanding
A single joint RNN tags the slots and, after the <EOS> token, also predicts the domain and intent: Domain: Movie; Intent: Find; Slot - Genre: “Action”; Slot - Date: “this weekend”

Joint modeling error rate: separate RNNs 13.7% vs. joint RNN 13.4%.
Query(LOCATION) Domain: Movie Intent: Find Slot - Genre: “Action” Slot - Date: “this weekend”
Dialog Management
Slot - Location: “Capital Theater” Domain: Movie Intent: Find Slot - Genre: “Action” Slot - Date: “this weekend” Slot – Location: “Capital”
Natural Language Generation: given the dialog act Query(LOCATION), an RNN decoder generates the response token by token: <S> Where would you like to go?
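As a minimal stand-in for that generation step (real systems use a conditioned RNN decoder or templates with slot filling; the template table and `generate` helper here are my simplification):

```python
# Map a dialog act from the dialog manager to a surface realization.
TEMPLATES = {
    "Query(LOCATION)": "Where would you like to go?",
    "Inform(THEATER)": "How about the {theater}?",  # hypothetical act
}

def generate(dialog_act, **slots):
    """Fill the template for the given dialog act with slot values."""
    return TEMPLATES[dialog_act].format(**slots)
```

Template NLG is engineering-heavy per domain, which is exactly the cost the neural approaches later in the talk try to remove.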
Language Understanding Dialog Management Natural Language Generation
Are there any action movies to see this weekend?
Domain … Query…
Where would you like to go? How about the Capital Theater?
Domain: Movie …
Language Understanding Dialog Management
Slot… Domain: Movie …
Dialog Management Natural Language Generation
Are there any action movies to see this weekend?
Domain … Query…
Where would you like to go? How about the Capital Theater?
Domain: Movie …
Dialog Management
Slot… Domain: Movie …
Utterance Encoder Utterance Encoder
Natural Language Generation
Are there any action movies to see this weekend?
Domain … Query…
Where would you like to go? How about the Capital Theater?
Domain: Movie … Slot… Domain: Movie …
Utterance Encoder Utterance Encoder
Utterance Encoder Utterance Encoder Utterance Decoder
Neural Context Neural Context
Dialog Encoder
Neural Context Neural Context
Are there any action movies to see this weekend? Where would you like to go? How about the Capital Theater?
Utterance Encoder Utterance Encoder Utterance Decoder
Utterance Encoder Utterance Encoder Utterance Decoder

Pros:
- Can handle subtlety of meaning
- No manual engineering

Cons:
- Requires a large amount of training data
- Cannot handle long-distance dependencies
- Cannot interface with a database
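The hierarchical design above (utterance-level encoders feeding a dialog-level encoder) can be expressed structurally; the fold functions stand in for RNN cells and are my own stubs, not the paper's architecture:

```python
def encode_utterance(words, fold, init):
    """Utterance-level encoder: fold the words into one summary state."""
    state = init
    for w in words:
        state = fold(state, w)
    return state

def encode_dialog(utterances, utt_fold, dlg_fold, utt_init, dlg_init):
    """Dialog-level encoder: fold per-utterance summaries into a single
    neural context that is carried across turns."""
    context = dlg_init
    for utt in utterances:
        summary = encode_utterance(utt.split(), utt_fold, utt_init)
        context = dlg_fold(context, summary)
    return context
```

The two-level structure is what lets the dialog-level state stay small while each turn is summarized independently.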
[Williams et al. 2017]
Language Understanding Dialog Management Natural Language Generation
Are there any action movies to see this weekend?
Domain … Query…
Where would you like to go? How about the Capital Theater?
Domain: Movie …
Language Understanding Dialog Management
Slot… Domain: Movie …
Encoder
Encoder

Pros:
- Requires very little training data
- Can handle long-distance dependencies
- Can handle subtlety of meaning
- Very robust in practice

Cons:
- Requires significant manual engineering

Error rate: rule-based 66.7%, DNN 48.5%, HCN 44.4%.
Fact Attention
[Ghazvininejad et al. 2017]
Are there any action movies to see this weekend? Where would you like to go? How about the Capital Theater?
Encoder Utterance Decoder Utterance Encoder Fact Attention Utterance Encoder
Fact Attention
Pros:
- Requires no manual engineering
- Can handle subtlety of meaning

Cons:
- Requires a large amount of training data
- Cannot handle long-distance dependencies
- Still mostly for chit-chat
Encoder Utterance Decoder
[Dhingra et al. 2017]
Utterance Encoder
Are there any action movies to see this weekend? Where would you like to go? How about the Capital Theater?
Utterance Encoder
Time Location Genre Utterance Decoder Utterance Encoder
Pros:
- Can handle long-distance dependencies
- Can handle subtlety of meaning

Cons:
- Requires some manual engineering
- Not yet practical
Z Y X Utterance Decoder Utterance Encoder
Are there any action movies to see this weekend? Where would you like to go? How about the Capital Theater?
Utterance Encoder
Utterance Encoder

Question: How can we supervise these properties from naturally occurring data?
Possible answer: Vision.
- Focus on dialogs in a visual setting
- The relevant entities in the dialog will appear visually as well
Visual Latent Variables Attribute Latent Variables
Attribute Connectivity
Triangle Circle Purple Orange Entity Vectors Attribute Vectors Deconvolutional Network
Per-Entity Images → Observed Image (Purple Triangle, Orange Circle); FC Layers → Natural Language Description
[Nash et al. 2017]
Visual latent variables: attribute latent variables A1, A2, A4, A5. SimpleShapes dataset:
Accuracy on SimpleShapes: Generative Entity Networks 90%, CNN+RNN 72%, RNN 67%.