What's so Hard about Natural Language Understanding? Alan Ritter - PowerPoint PPT Presentation



SLIDE 1

What’s so Hard about Natural Language Understanding?

Alan Ritter Computer Science and Engineering The Ohio State University

Collaborators: Jiwei Li, Dan Jurafsky (Stanford), Bill Dolan, Michel Galley, Jianfeng Gao (MSR), Colin Cherry (Google), Jeniya Tabassum (Ohio State), Alexander Konovalov (Ohio State), Wei Xu (Ohio State), Brendan O'Connor (UMass)


SLIDE 3
SLIDE 4
SLIDE 5

Q: Why are we so good at Speech, MT (but bad at NLU)? A: People naturally translate and transcribe.

SLIDE 6

Q: Why are we so good at Speech, MT (but bad at NLU)? A: People naturally translate and transcribe.

Q: Large, End-to-End Datasets for NLU?
  • Web-scale Conversations?
  • Web-scale Structured Data?


SLIDE 8

Data-Driven Conversation

  • Twitter: ~500 million public SMS-style conversations per month
  • Goal: Learn conversational agents directly from massive volumes of data.


SLIDES 10-14

Noisy Channel Model

[Ritter, Cherry, Dolan EMNLP 2011]

Input:  Who wants to come over for dinner tomorrow?
Output: { Yum ! I } { want to } { be there } { tomorrow ! }
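The noisy channel setup scores a candidate response r for an input i by combining a "translation" model p(i | r), learned from aligned message-response pairs, with a response language model p(r). A minimal sketch of the scoring step, with all phrase tables and probabilities invented for illustration (not the paper's models):

```python
import math

# Toy "translation" model: p(input phrase | response phrase).
# Values are invented for illustration.
phrase_tm = {
    ("dinner tomorrow", "be there tomorrow !"): 0.4,
    ("dinner tomorrow", "i 'm busy"): 0.05,
}

# Toy response language model: p(response). Also invented.
response_lm = {
    "be there tomorrow !": 0.01,
    "i 'm busy": 0.02,
}

def noisy_channel_score(inp, response):
    """log p(inp | response) + log p(response), with a small floor
    for unseen pairs."""
    tm = phrase_tm.get((inp, response), 1e-6)
    lm = response_lm.get(response, 1e-6)
    return math.log(tm) + math.log(lm)

# Rank candidate responses for the input under the combined score.
candidates = ["be there tomorrow !", "i 'm busy"]
best = max(candidates, key=lambda r: noisy_channel_score("dinner tomorrow", r))
```

The channel model rewards responses that "explain" the input, which is what lets phrase-based SMT machinery generate replies rather than translations.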

SLIDE 15

Neural Conversation

[Sordoni et al. 2015] [Xu et al. 2016] [Wen et al. 2016] [Li et al. 2016] [Kannan et al. 2016] [Serban et al. 2016]


SLIDES 17-24

How old are you?
i 'm 16 .
16 ?
i don 't know what you 're talking about   [Bad Action]
you don 't know what you 're saying
i don 't know what you 're talking about   [Outcome]

Slide Credit: Jiwei Li

SLIDE 25

Deep Reinforcement Learning

[Li, Monroe, Ritter, Galley, Gao, Jurafsky EMNLP 2016]

Encoding: how | old | are | you → State

How old are you?

SLIDE 26

Deep Reinforcement Learning

[Li, Monroe, Ritter, Galley, Gao, Jurafsky EMNLP 2016]

Encoding: how | old | are | you → State
Decoding: EOS → I’m | 16 | . | EOS → Action

How old are you? i 'm 16 .

SLIDE 27

Learning: Policy Gradient

REINFORCE Algorithm (Williams, 1992)

Encoding: how | old | are | you → State
Decoding: EOS → I’m | 16 | . | EOS → Action (what we want to learn)

How old are you? i 'm 16 .
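In this view the sampled reply is an action, and REINFORCE nudges the policy parameters by reward times the gradient of the log-probability of the sampled action. A bandit-sized sketch of that update rule (the candidate responses and rewards below are invented; the real policy is a seq2seq model over word sequences):

```python
import math
import random

random.seed(0)

# A softmax policy over three canned responses; one logit each.
responses = ["i 'm 16 .", "i don 't know", "sixteen !"]
# Invented rewards: penalize the dull, conversation-killing reply.
reward = {"i 'm 16 .": 1.0, "i don 't know": -1.0, "sixteen !": 0.5}
theta = [0.0, 0.0, 0.0]

def probs(theta):
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [x / s for x in z]

lr = 0.1
for _ in range(500):
    p = probs(theta)
    # Sample an action (a reply) from the current policy.
    a = random.choices(range(3), weights=p)[0]
    r = reward[responses[a]]
    # REINFORCE: grad of log pi(a) wrt logit i is 1[i == a] - p_i.
    for i in range(3):
        theta[i] += lr * r * ((1.0 if i == a else 0.0) - p[i])

best = responses[max(range(3), key=lambda i: theta[i])]
```

After training, the policy concentrates on the highest-reward reply; the same update applies unchanged when the action space is a sequence of words.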

SLIDES 28-30

Q: Rewards?

A: Turing Test: Adversarial Learning (Goodfellow et al., 2014)

SLIDES 31-33

Adversarial Learning for Neural Dialogue

Real-world conversations feed a Response Generator, which generates a response; a Discriminator compares it against a sampled human response: Real or Fake?

(Alternate between training the Generator and the Discriminator)
REINFORCE Algorithm (Williams, 1992)

[Li, Monroe, Shi, Jean, Ritter, Jurafsky EMNLP 2016]
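The alternation can be sketched end to end at toy scale: a softmax generator over canned responses, a per-response logistic discriminator, and REINFORCE feeding the discriminator's "looks human" score back to the generator as reward. Everything below (responses, human data, learning rates) is invented for illustration, not the paper's setup:

```python
import math
import random

random.seed(0)

candidates = ["i 'm 16 .", "lol", "i don 't know"]
human = ["i 'm 16 .", "lol"]  # replies humans actually produce

g_theta = {c: 0.0 for c in candidates}   # generator logits
d_w = {c: 0.0 for c in candidates}       # discriminator weights

def softmax(theta):
    z = {c: math.exp(v) for c, v in theta.items()}
    s = sum(z.values())
    return {c: v / s for c, v in z.items()}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.1
for _ in range(2000):
    p = softmax(g_theta)
    fake = random.choices(candidates, weights=[p[c] for c in candidates])[0]
    real = random.choice(human)
    # Discriminator step: logistic update, real toward 1, fake toward 0.
    d_w[real] += lr * (1.0 - sigmoid(d_w[real]))
    d_w[fake] -= lr * sigmoid(d_w[fake])
    # Generator step: REINFORCE, reward = D's score with a 0.5 baseline.
    reward = sigmoid(d_w[fake]) - 0.5
    for c in candidates:
        g_theta[c] += lr * reward * ((1.0 if c == fake else 0.0) - p[c])

p_final = softmax(g_theta)
```

The response humans never produce is the only one the discriminator reliably flags as fake, so the generator learns to avoid it, which is exactly the pressure the adversarial reward is meant to apply.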

SLIDE 34

Adversarial Learning Improves Response Generation

Machine Evaluator: Adversarial Success (how often can you fool a machine)
  Adversarial Learning: 8.0%
  Standard Seq2Seq model: 4.9%

Human Evaluator (vs. vanilla generation model):
  Adversarial Win: 62%   Adversarial Lose: 18%   Tie: 20%

Slide Credit: Jiwei Li

[Bowman et al. 2016]

SLIDES 35-38

Q: Why are we so good at Speech, MT (but bad at NLU)? A: People naturally translate and transcribe.

Q: Large, End-to-End Datasets for NLU?
  • Web-scale Conversations?
  • Web-scale Structured Data?

Generates fluent open domain replies. Really Natural Language Understanding?

SLIDE 39

Learning from Distant Supervision [Mintz et al. 2009]

1) Named Entity Recognition
   Challenge: highly ambiguous labels
   [Ritter et al. EMNLP 2011]

2) Relation Extraction
   Challenge: missing data
   [Ritter et al. TACL 2013] [Konovalov et al. WWW 2017]

3) Time Normalization
   Challenge: diversity in noisy text
   [Tabassum, Ritter, Xu EMNLP 2016]

4) Event Extraction
   Challenge: lack of negative examples
   [Ritter et al. WWW 2015]

O(\theta) = \underbrace{\sum_{i}^{N} \log p_\theta(y_i \mid x_i)}_{\text{Log Likelihood}} \;-\; \underbrace{\lambda_U \, D(\tilde{p} \,\|\, \hat{p}^{\text{unlabeled}}_{\theta})}_{\text{Label regularization}}

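The distant-supervision objective pairs a log-likelihood term over distantly labeled examples with a label-regularization penalty that keeps the model's average prediction on unlabeled data close to an expected label distribution. A minimal numeric sketch, taking D as KL divergence and with every probability invented for illustration:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def objective(label_probs, prior, model_marginal, lam=1.0):
    """log-likelihood of distant labels minus lam * D(prior || model)."""
    log_lik = sum(math.log(p) for p in label_probs)
    return log_lik - lam * kl(prior, model_marginal)

# Probability the model assigns to the distant label of each of 3 examples:
label_probs = [0.9, 0.8, 0.7]
# Expected label distribution (e.g. share of mentions that are entities):
prior = [0.3, 0.7]
# Model's average predicted distribution on unlabeled data:
model_marginal = [0.5, 0.5]

o = objective(label_probs, prior, model_marginal, lam=0.1)
```

The regularizer pulls the unlabeled marginal toward the prior, which is what compensates for the noisy, skewed labels distant supervision produces.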

SLIDES 41-42

Time Normalization

Distant Supervision (no human labels or rules!)

[Tabassum, Ritter, Xu EMNLP 2016]

1 Jan 2016

State-of-the-art time resolvers: { TempEx, HeidelTime, SUTime, UWTime }

SLIDES 43-52

Distant Supervision Assumption

Mercury Transit: May 9, 2016
(candidate date resolutions for nearby tweets: 8 May, 9 May, 10 May)
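Under this assumption, a tweet that mentions a database event is presumed to refer to that event's date, so weak sentence-level tags can be read off the database with no human labels. A small sketch using the slide's Mercury-transit example (the DOM/DOW/MOY/tense field names mirror the slides; the function itself is illustrative, not the paper's code):

```python
from datetime import date

def sentence_tags(event_date, tweet_date):
    """Derive weak sentence-level tags from the database event date."""
    if event_date > tweet_date:
        tense = "Future"
    elif event_date < tweet_date:
        tense = "Past"
    else:
        tense = "Present"
    return {
        "DOM": event_date.day,             # day of month (1-31)
        "DOW": event_date.strftime("%a"),  # day of week (Mon-Sun)
        "MOY": event_date.strftime("%B"),  # month of year
        "TL": tense,                       # temporal relation to the tweet
    }

event = date(2016, 5, 9)                        # Mercury transit
tags = sentence_tags(event, date(2016, 5, 8))   # tweet posted the day before
```

A tweet written on 8 May about the transit thus gets Future / May / 9 / Mon as its weak labels, without anyone annotating the text.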

SLIDES 53-56

Multiple Instance Learning Tagger [Hoffmann et al. 2011]

[Event Database] → [ Mercury, 5/9/2016 ]

Words: w1 w2 w3 … wn
Word-Level Tags: z1 z2 z3 … zn, scored by a local classifier: exp(θ · f(wi, zi))
Sentence-Level Tags: t1 t2 t3 t4 (day of month 1-31, day of week Mon-Sun, month 1-12, Past/Present/Future)
Deterministic OR connects the word-level tags to the sentence-level tags.

Maximize Conditional Likelihood: \sum_z P(z, t \mid w, \theta)
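The deterministic-OR link is the multiple-instance part: a sentence-level tag is on exactly when at least one word-level tag predicts it. A tiny sketch of that aggregation (the example words and tags are invented for illustration):

```python
words = ["Mercury", "transit", "tomorrow", "!"]
# Word-level tag predictions z_i, e.g. from the local classifier:
word_tags = ["NA", "NA", "Future", "NA"]

def deterministic_or(word_tags, tagset):
    """Sentence-level tag t is active iff some word-level tag equals t."""
    return {t: any(z == t for z in word_tags) for t in tagset}

sentence_level = deterministic_or(word_tags, ["Past", "Present", "Future"])
```

Because only the sentence-level tags are (distantly) observed, learning must reason over which word produced each active tag, which is why the model sums over z in the conditional likelihood.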

SLIDE 57

Missing Data Problem

Sentence-Level Tags: TL = Future, MOY = May, DOM = 9, DOW = Mon

SLIDES 58-62

Missing Data Extension

Missing Data Problem in Distant Supervision [Ritter et al. TACL 2013]

Words w1 … wn with word-level tags z1 … zn feed aggregated sentence-level tags t1 … t4 (mentioned in text). A parallel set of tags t′1 … t′4 is implied by the event date in the [Event Database], and variables m1 … m4 encourage agreement between the two.

SLIDE 63

Example Tags

Word: Im  Hella  excited  for  tomorrow
Tag:  NA  NA     Future   NA   Future

Word: Thnks  for  a   Christmas  party  on  fri
Tag:  NA     NA   NA  December   NA     NA  Friday

SLIDES 64-65

Evaluation

17% increase in F-score over SUTime

SLIDE 66
SLIDES 67-71

Where can we find NLU? Follow the data!

Opportunistically Gathered Data:
  • Twitter Events (Time Normalization)
  • Billions of Internet Conversations

Design Models for the Data (rather than the other way around)

Thank You!