Deep Learning for Dialog
Nate Kushman, Researcher, Microsoft Research Labs


SLIDE 1

Nate Kushman Researcher Microsoft Research Labs

Deep Learning for Dialog

SLIDE 2

Microsoft Research Labs Cambridge, UK

 Basic Computer Science Research  Long-term goal: real-world impact

SLIDE 3

https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/

SLIDE 4

Machine Learning for Synthesising Source Code

How can we use gradient descent to synthesize human-interpretable programs? Can we use neural networks to guide a traditional program synthesizer?

Example problem from Gaunt et al. 2017

SLIDE 5

Representation Learning: Generative Models

[Graphical-model diagram for the Multi-Level VAE, with per-observation latent variables for each j ∈ H and group-level latent variables for each group H ∈ G]

Multi-Level VAE

SLIDE 6

Project Malmo:

[Agent loop: observation → action]

SLIDE 7

Agent Applications Services Infrastructure

Optimal medication outcomes require a concert of patient, practitioner, and health-system insights and actions, delivered in more timely and targeted ways: higher-definition healthcare.

SLIDE 8

Example: Customer Care Intelligence

• Frictionless human-like conversations
• Seamless integration between human and AI agents

Why is Dialog Relevant for FinTech Services?

SLIDES 9-14 (image-only slides)
SLIDE 15

[Timeline 2009-2016: deep learning advances in Speech]

SLIDE 16

[Timeline repeated: Speech]

SLIDE 17

[Timeline 2009-2016: Speech, Vision]

SLIDE 18

[Timeline 2009-2016: Speech, Vision, Natural Language]

SLIDE 19

Context is challenging for three reasons:

1. Long-distance relationships
   "I run Windows 10" ... "I'm printing in PowerPoint" ... → Solution: "Upgrade VS240 driver"

2. Subtlety matters
   "The menu is below the button" vs. "The button is above the menu", followed by: "Now click it"

3. Many possible combinations

Need either:
• Large amounts of data
• Manual engineering for each new domain

SLIDE 20

Neural Context Representations (e.g. a vector: 9.0 8.3 2.7 6.2 9.1 8.9)
• Excel at subtlety of meaning
• Struggle with long-distance relationships
• Require large amounts of data
• In practice used mostly for chit-chat

Symbolic Context Representations (e.g. Domain: Technical Support; Intent: Projector Setup; Device: Epson VS240)
• Great for long-distance relationships
• Struggle with subtlety of meaning
• Require engineering per domain
• Dominant approach in real-world systems

SLIDE 21

[One-hot encoding: each base of the sequence A G C G A T G C G A T maps to a 4-dimensional indicator vector]

Example data: DNA sequences. Task: classify each sequence as junk or gene.
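The one-hot scheme above can be sketched in a few lines of Python; the base ordering A, C, G, T is an arbitrary choice for illustration.

```python
# One-hot encode a DNA sequence: each base becomes a 4-dimensional
# indicator vector. The base ordering A, C, G, T is arbitrary.
BASES = "ACGT"

def one_hot(sequence):
    """Map a DNA string to a list of 4-dim indicator vectors."""
    vectors = []
    for base in sequence:
        vec = [0, 0, 0, 0]
        vec[BASES.index(base)] = 1
        vectors.append(vec)
    return vectors

encoded = one_hot("AGCGAT")
# 'A' -> [1, 0, 0, 0], 'G' -> [0, 0, 1, 0], ...
```

A classifier then consumes this list of vectors, one per position in the sequence.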

SLIDE 22

[RNN diagram: at each step, the network NN combines the one-hot input for the next base with the previous state H_t to produce H_t+1]

H_t: an h-dimensional "compressed summary" of the first t tokens.
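A toy sketch of this recurrence in plain Python: the weights and the 2-dimensional state are made up purely to show the shape of the computation (real systems learn the weights and use state vectors with hundreds of dimensions).

```python
import math

def rnn_step(h, x, W_h, W_x):
    """One RNN step: combine previous state h with input x to produce the
    new state -- the 'compressed summary' of one more token."""
    return [math.tanh(sum(W_h[i][j] * h[j] for j in range(len(h))) +
                      sum(W_x[i][j] * x[j] for j in range(len(x))))
            for i in range(len(h))]

# Toy 2-dim state over 4-dim one-hot inputs; these weights are made up.
W_h = [[0.5, -0.1], [0.2, 0.3]]
W_x = [[0.7, 0.0, -0.3, 0.1], [0.0, 0.4, 0.2, -0.5]]

h = [0.0, 0.0]                            # H_0: summary of zero tokens
for x in [[1, 0, 0, 0], [0, 0, 1, 0]]:    # bases A, G one-hot encoded
    h = rnn_step(h, x, W_h, W_x)          # H_t -> H_t+1, same weights each step
# h is now a 2-dim compressed summary of the sequence "A G"
```

Note that the same `W_h`/`W_x` are reused at every step, which is the weight sharing the next slides illustrate.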

SLIDE 23

[The same step applied at t+1 produces H_t+2]

SLIDE 24

[The recurrence chains: NN maps H_t to H_t+1 and then to H_t+2]

SLIDE 25

[Fully unrolled RNN: the same NN is applied at every position in the sequence]

SLIDE 26

[One-hot vocabulary vectors: Aardvark = (1 0 0 0 ... 0 0), Aardwolf = (0 1 0 0 ... 0 0), ..., Zymurgy = (0 0 0 0 ... 0 1)]

An embedding maps the one-hot input x_t from vocabulary size ~10^4-10^5 down to a dense vector of dimension ~10^2-10^3.
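Because x_t is one-hot, multiplying it by an embedding matrix just selects one row, so in practice the embedding is a table lookup. A sketch with a toy 3-word vocabulary and made-up 2-dimensional vectors:

```python
# Embedding as a lookup table: row i is the dense vector for word i.
# Vocabulary and values are toy stand-ins for a 10^4-10^5 word vocabulary
# mapped down to 10^2-10^3 dimensions.
vocab = {"aardvark": 0, "aardwolf": 1, "zymurgy": 2}
embedding = [[0.3, -1.2],
             [0.5,  0.9],
             [-0.7, 0.1]]

def embed(word):
    """Dense vector for a word: just a row lookup."""
    return embedding[vocab[word]]

# Equivalent to multiplying the word's one-hot vector by the matrix:
one_hot_aardwolf = [0, 1, 0]
via_matmul = [sum(one_hot_aardwolf[i] * embedding[i][d] for i in range(3))
              for d in range(2)]
assert via_matmul == embed("aardwolf")
```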

SLIDE 27
Three ways to use an RNN:
1. Sequence classification: e.g. classify "I am happy"
2. Next-token prediction: from "I am happy", predict "am happy <end>"
3. Sequence-to-sequence: encode "I am happy", decode "<start> That's great" into "That's great <end>"
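The next-token prediction setup just shifts the sequence by one position: the input at each step is one token, the target is the token that follows it. A minimal sketch:

```python
def next_token_pairs(tokens):
    """Build (input, target) pairs for next-token prediction by shifting
    the sequence one position, with start/end markers added."""
    padded = ["<start>"] + tokens + ["<end>"]
    return list(zip(padded[:-1], padded[1:]))

pairs = next_token_pairs(["I", "am", "happy"])
# [('<start>', 'I'), ('I', 'am'), ('am', 'happy'), ('happy', '<end>')]
```

The same shifting trick supplies the decoder targets in the sequence-to-sequence setup.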
SLIDE 28

[Encoder-decoder: the encoder reads "I am happy" into a neural context; the decoder generates "That's great <end>" from "<start> That's great"]

Input: preceding dialog context. Output: next utterance.

SLIDE 29

[Same encoder-decoder diagram: encoder reads "I am happy", decoder emits "<start> That's great <end>"]

Input: preceding dialog context. Output: next utterance.

Supervised Learning: faster learning, but requires a human to perform the task to provide the "correct" response.

SLIDE 30

[Encoder-decoder: encoder reads "I am happy", decoder emits "<start> You're mean <end>"]

Input: preceding dialog context. Output: next utterance.

Reinforcement Learning: hard to learn from, but only requires users to provide a rating (e.g. +10) at the end of a dialog.
SLIDE 31

Pipeline: Language Understanding → Dialog Management → Natural Language Generation

User: "Are there any action movies to see this weekend?"
Language Understanding → Domain: Movie; Intent: Find; Slot-Genre: "Action"; Slot-Date: "this weekend"
Dialog Management → Query(LOCATION), issued against a database
Natural Language Generation → "Where would you like to go?"
User: "How about the Capital Theater?"
Language Understanding → Slot-Location: "Capital Theater"
Updated state: Domain: Movie; Intent: Find; Slot-Genre: "Action"; Slot-Date: "this weekend"; Slot-Location: "Capital"

SLIDE 32

Language Understanding

"Are there any action movies to see this weekend?"
→ Domain: Movie; Intent: Find; Slot-Genre: "Action"; Slot-Date: "this weekend"

"Find me a cheap Taiwanese restaurant in Oakland."
→ Domain: Restaurant; Intent: Find; Slot-Price: "cheap"; Slot-Type: "Taiwanese"; Slot-Loc: "Oakland"

(Movie domain vs. restaurant domain)

• Ontology based
• Pipeline decision
SLIDE 33

Domain classification: "Are there any action movies to see this weekend?" → Domain: Movie

Error rate: traditional (n-gram) 9.5% vs. RNN 2.5%

SLIDE 34

Slot tagging: label each word with a BIO tag.

Are   there  any   action   movies   to   see   this    weekend?
<>    <>     <>    B-Genre  I-Genre  <>   <>    B-Date  I-Date

→ Domain: Movie; Intent: Find; Slot-Genre: "Action"; Slot-Date: "this weekend"

Error rate: traditional (SVM) 4.3% vs. RNN 3.4%
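Recovering slot values from B-/I- tags is a simple left-to-right scan: a B-X tag opens slot X, following I-X tags extend it. A sketch using the tags from the slide (`<>` marks "no slot"):

```python
def decode_bio(tokens, tags):
    """Collect slot values from BIO tags: B-X starts a slot X, I-X extends it."""
    slots = {}
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:                      # close any open slot
                slots[current_type] = " ".join(current_tokens)
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:                                     # "<>": outside any slot
            if current_type:
                slots[current_type] = " ".join(current_tokens)
            current_type, current_tokens = None, []
    if current_type:                              # flush a slot ending the sentence
        slots[current_type] = " ".join(current_tokens)
    return slots

tokens = "Are there any action movies to see this weekend?".split()
tags = ["<>", "<>", "<>", "B-Genre", "I-Genre", "<>", "<>", "B-Date", "I-Date"]
slots = decode_bio(tokens, tags)
# {'Genre': 'action movies', 'Date': 'this weekend?'}
```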

SLIDE 35

Joint model: a single RNN both tags the slots across the utterance and, at <EOS>, predicts the domain and intent (Domain: Movie; Intent: Find).

Error rate: separate RNNs 13.7% vs. joint RNN 13.4%

SLIDE 36

[Pipeline diagram repeated: Language Understanding → Dialog Management → Natural Language Generation]

SLIDE 37

Dialog Management: given the state (Domain: Movie; Intent: Find; Slot-Genre: "Action"; Slot-Date: "this weekend"), issue Query(LOCATION); once the user answers, add Slot-Location: "Capital Theater".

SLIDE 38

[Pipeline diagram repeated: Language Understanding → Dialog Management → Natural Language Generation]

SLIDE 39

Natural Language Generation: the decoder maps Query(LOCATION) to the utterance "<S> Where would you like to go?", one token at a time.
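Token-by-token generation is a greedy decoding loop: feed each predicted token back in until the model emits the end token. A sketch where `predict_next` is a hypothetical stand-in for the trained decoder network:

```python
def greedy_decode(predict_next, max_len=20):
    """Generate a reply token by token, feeding each output back as input."""
    tokens = ["<S>"]
    while len(tokens) < max_len:
        nxt = predict_next(tokens)
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens[1:]          # drop the start marker

# Hypothetical stand-in for a trained decoder: replays a canned reply.
reply = "Where would you like to go?".split()
def predict_next(tokens):
    i = len(tokens) - 1
    return reply[i] if i < len(reply) else "<end>"

generated = greedy_decode(predict_next)
# generated == ['Where', 'would', 'you', 'like', 'to', 'go?']
```

Real systems replace `predict_next` with a network conditioned on the dialog-management output (here, Query(LOCATION)).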

SLIDE 40

[Pipeline diagram repeated: Language Understanding → Dialog Management → Natural Language Generation]

SLIDE 41

[Pipeline with Language Understanding replaced by neural utterance encoders; Dialog Management and Natural Language Generation remain]

SLIDE 42

[Dialog Management also replaced: only Natural Language Generation remains hand-built, fed by the utterance encoders]

SLIDE 43

[Natural Language Generation replaced by an utterance decoder: the model is now utterance encoders plus an utterance decoder, end to end]

SLIDE 44

[End-to-end neural dialog model: utterance encoders turn each turn ("Are there any action movies to see this weekend?", "Where would you like to go?", "How about the Capital Theater?") into a neural context vector; a dialog encoder combines these across turns; an utterance decoder generates the next reply]
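The two-level structure can be sketched with any sequence encoder at both levels. In this toy version, mean-pooled made-up word vectors stand in for the utterance-level RNN, and the dialog encoder folds the per-utterance contexts into one state with a simple recurrence; all numbers are illustrative, not learned.

```python
def encode_utterance(words, word_vecs):
    """Toy utterance encoder: mean of word vectors (stand-in for an RNN)."""
    vecs = [word_vecs.get(w, [0.0, 0.0]) for w in words]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(2)]

def encode_dialog(utterances, word_vecs):
    """Dialog encoder: fold per-utterance contexts into one dialog state."""
    state = [0.0, 0.0]
    for utt in utterances:
        ctx = encode_utterance(utt.split(), word_vecs)
        # Toy recurrence: blend the previous dialog state with the new context.
        state = [0.5 * state[d] + 0.5 * ctx[d] for d in range(2)]
    return state

# Made-up 2-dim word vectors purely for illustration.
word_vecs = {"action": [1.0, 0.0], "movies": [0.8, 0.2], "go": [0.0, 1.0]}
dialog = ["Are there any action movies to see this weekend?",
          "Where would you like to go?"]
state = encode_dialog(dialog, word_vecs)   # 2-dim neural dialog context
```

The utterance decoder would then condition on `state` to generate the next turn.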

SLIDE 45

[Same end-to-end model]

Pros:
• Can handle subtlety of meaning
• No manual engineering
Cons:
• Requires a large amount of training data
• Cannot handle long-distance dependencies
• Cannot interface with a database

SLIDE 46

[Williams et al. 2017]

[Hybrid approach: the pipeline (Language Understanding → Dialog Management → Natural Language Generation) combined with a neural Encoder]

SLIDE 47

[Same hybrid diagram]

Pros:
• Requires very little training data
• Can handle long-distance dependencies
• Can handle subtlety of meaning
• Very robust in practice
Cons:
• Requires significant manual engineering

Error rate: rule-based 66.7%, DNN 48.5%, HCN 44.4%

SLIDE 48

Fact Attention [Ghazvininejad et al. 2017]

[Encoder-decoder dialog model with attention over a set of facts: utterance encoders and fact attention feed the utterance decoder]

SLIDE 49

[Same fact-attention diagram]

Pros:
• Requires no manual engineering
• Can handle subtlety of meaning
Cons:
• Requires a large amount of training data
• Cannot handle long-distance dependencies
• Still mostly for chit-chat

SLIDE 50

[Dhingra et al. 2017]

[Encoder-decoder dialog model: utterance encoders feed an utterance decoder]

SLIDE 51

[Model extended with explicit slots: Time, Location, Genre feed the utterance decoder alongside the utterance encoders]

SLIDE 52

[Same diagram]

Pros:
• Can handle long-distance dependencies
• Can handle subtlety of meaning
Cons:
• Requires some manual engineering
• Not yet practical

SLIDE 53

[Same diagram repeated]

SLIDE 54

[Same model with unnamed latent variables Z, Y, X in place of the hand-specified slots]

SLIDE 55

[Same diagram]

Question: how can we supervise these properties from naturally occurring data?
Possible answer: vision
• Focus on dialogs in a visual setting
• The relevant entities in the dialog will appear visually as well
SLIDE 56

[Nash et al. 2017]

[Generative Entity Networks architecture: visual latent variables and attribute latent variables, linked by attribute connectivity; entity vectors (purple triangle, orange circle) and attribute vectors feed a deconvolutional network that renders per-entity images combined into the observed image; FC layers produce the natural language description]

SLIDE 57

[SimpleShapes dataset: visual latent variables and attribute latent variables A1, A2, A4, A5]

SLIDE 58

Accuracy: Generative Entity Networks 90%, CNN+RNN 72%, RNN 67%

SLIDE 59