Human-AI Collaboration for Neural Text Generation with Interpretable Neural Networks - PowerPoint PPT Presentation



slide-1
SLIDE 1

Human-AI Collaboration 
 for Neural Text Generation
 with Interpretable Neural Networks

Sebastian Gehrmann


Thesis Defense

Oct 18, 2019

Committee Members
 Barbara Grosz Sasha Rush Stuart Shieber

slide-2
SLIDE 2

This is Jesse, a journalist. Jesse has a ton of work.

slide-3
SLIDE 3

Maybe AI can help reduce the workload? Introduce AI-reen, a text-generation model.

slide-4
SLIDE 4

Jesse could hand some of the workload to AI-reen. But in doing so, Jesse would give up her agency over that work.

slide-5
SLIDE 5

But AI-reen is biased and makes mistakes! Jesse still needs to provide oversight of its work.

slide-6
SLIDE 6

By collaborating with AI-reen, Jesse could gain
 the benefits of automation without losing her agency.

slide-7
SLIDE 7

Problem → Accepted solution

Accept · Explain Suggestion · Update Suggestion · Provide Feedback

slide-8
SLIDE 8

They want to collaboratively summarize a document.

Source

slide-9
SLIDE 9

Both have an idea how to summarize it.

slide-10
SLIDE 10

If AI-reen were human, it could communicate its reasoning. But its predictions are not interpretable.

slide-11
SLIDE 11

Even if it could explain its suggestion, 
 it can’t incorporate feedback from Jesse.

I picked this phrase, because… I don’t like it.

slide-12
SLIDE 12

Interpretability is necessary, but we also need controllability.

I picked this phrase, because… How about … instead?

slide-13
SLIDE 13

Let’s empower humans to collaborate with AI!

++

Summarization [EMNLP ’18]
 Data2Text [INLG ’18] Section Title Generation [NAACL ’19] TL;DR Generation [INLG ’19] Collaborative Semantic Inference [VAST ’19] Detecting Fake Text with GLTR [ACL Demo ’19] Automated Mediation [Behavior & Technology ’19] LSTMVis [InfoVis ’17] Phenotyping Saliency [PloS one, ’17] Seq2Seq-Vis [VAST ’18] Model Selection [DeepStruct ’19] Modeling Capacity [Formal Languages ’19]

slide-14
SLIDE 14

Outline

  • 1. Background: Sequence Modeling for NLP
  • 2. Incorporating Content Selection into a Summarization Model
  • 3. How to Understand Predictions?
  • 4. Collaborating with the Model to Summarize
slide-15
SLIDE 15

The small dog owns a yellow ball .

p(yt+1 | y1, …, yt)

y1 y2 y3 y4 y5 y6 y7 y8
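The factorization p(yt+1 | y1, …, yt) can be sketched with a toy model. This is an illustrative sketch only: the bigram table and the floor probability are invented for the example, and a real neural language model would condition on the full history rather than the previous token.

```python
# Invented bigram probabilities for illustration only; a real language
# model (RNN, Transformer) conditions on the entire history.
bigram = {
    ("<s>", "the"): 0.5,
    ("the", "small"): 0.4,
    ("small", "dog"): 0.6,
    ("dog", "owns"): 0.3,
}

def sequence_prob(tokens):
    """Score a sequence left-to-right as the product of the
    conditionals p(y_{t+1} | history), here truncated to bigrams."""
    prob = 1.0
    prev = "<s>"
    for tok in tokens:
        prob *= bigram.get((prev, tok), 1e-6)  # small floor for unseen pairs
        prev = tok
    return prob
```

Scoring "the small dog owns" multiplies the four conditionals one step at a time, exactly the chain-rule decomposition shown on the slide.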

slide-16
SLIDE 16

The small dog owns a yellow ball .

[Elman ’90, Hochreiter & Schmidhuber ’97]

y1 y2 y3 y4 y5 y6 y7 y8

slide-17
SLIDE 17

The small dog owns a yellow ball .

[Elman ’90, Hochreiter & Schmidhuber ’97]

y1 y2 y3 y4 y5 y6 y7 y8

slide-18
SLIDE 18

The small dog owns a yellow ball .

Next-word distribution over the vocabulary: large, small, child, dog, …

[Bengio ‘03]

y1 y2 y3 y4 y5 y6 y7 y8

slide-19
SLIDE 19

Source: The small dog owns a yellow ball . (x1 … x8)
Target: Der kleine Hund besitzt einen gelben Ball . (y1 … y8)

p(yt+1 | y1, …, yt) → p(yt+1 | y1, …, yt, x)
p(next word | Der kleine, The small dog …) = p(y3 | y1, y2, x)

slide-20
SLIDE 20

The small dog owns a yellow ball . (x1 … x8)

Der kleine y1 y2

Encoder Decoder

[Bahdanau et al. ’14, Sutskever et al. ’14]
slide-21
SLIDE 21

The small dog owns a yellow ball . (x1 … x8)

Der kleine y1 y2

p(at|x, y1:t)

Attention Encoder Decoder

[Bahdanau et al. ’14, Sutskever et al. ’14]
slide-22
SLIDE 22

The small dog owns a yellow ball . (x1 … x8)

Context = ∑s=1…S ast xs

Der kleine y1 y2

Encoder Decoder

[Bahdanau et al. ’14, Sutskever et al. ’14]
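The context vector above is a weighted sum of encoder states under attention weights. A minimal sketch with plain dot-product scoring follows; the `attention_context` helper and the toy vectors are illustrative assumptions, not the actual Bahdanau-style scoring function used in the model.

```python
import math

def attention_context(query, keys, values):
    """Toy dot-product attention: a_s = softmax_s(query . key_s),
    then context = sum_s a_s * value_s, the weighted sum from the slide."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context
```

With a query that strongly matches the first key, nearly all attention mass lands on the first source position and the context is dominated by its value vector.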
slide-23
SLIDE 23

The small dog owns a yellow ball . (x1 … x8)

Der kleine y1 y2

Encoder Decoder

Next-word distribution over the vocabulary: das, Hund, Kind, große, …

[Bahdanau et al. ’14, Sutskever et al. ’14]
slide-24
SLIDE 24

Consider an abstractive summarization problem with input x1, …, xS and summary y1, …, yT.

Train a summarizer to maximize p(y|x).

[Gehrmann, Deng, and Rush, EMNLP ’18]
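Maximizing p(y|x) factorizes over summary tokens, so training equivalently minimizes the summed negative log-likelihood of the gold tokens. A minimal sketch (the `sequence_nll` helper is an illustrative name, not from the thesis):

```python
import math

def sequence_nll(token_probs):
    """Training-objective sketch: maximizing p(y|x) = prod_t p(y_t | y_<t, x)
    is equivalent to minimizing the summed negative log-likelihood.
    token_probs holds the model's probability of each gold summary token."""
    return -sum(math.log(p) for p in token_probs)
```

A model that assigns probability 1 to every gold token incurs zero loss; lower token probabilities increase the loss additively in log space.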
slide-25
SLIDE 25

The small dog owns a yellow ball . (x1 … x8)

Dog owns (y1 y2)

Encoder Decoder

p(at|x, y1:t)

Attention

Copy distribution over source tokens: The, dog, a, ball, …

[Vinyals et al. ’15, Filippova et al. ’15, Gu et al. ’16, See et al. ’17]
slide-26
SLIDE 26

The copy mechanism uses a binary soft switch 
 that determines whether the model copies or generates.

p(yt+1 | x, y1:t) = p(zt=1 | x, y1:t) × p(yt+1 | zt=1, x, y1:t) + p(zt=0 | x, y1:t) × p(yt+1 | zt=0, x, y1:t)

slide-27
SLIDE 27

The copy mechanism uses a binary soft switch 
 that determines whether the model copies or generates.

p(yt+1 | x, y1:t) = p(zt=1 | x, y1:t) × p(yt+1 | zt=1, x, y1:t) + p(zt=0 | x, y1:t) × p(yt+1 | zt=0, x, y1:t)

where p(zt=1 | x, y1:t) = σ(W ht + b) and p(zt=0 | x, y1:t) = 1 − σ(W ht + b). The zt=1 branch is the standard model prediction; the zt=0 branch reuses the attention p(at | x, y1:t) as a copy distribution.
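The soft switch can be sketched as mixing a generation distribution with a copy distribution induced by attention over the source. `copy_mixture` and its argument names are hypothetical; `p_copy` stands in for p(zt=0 | x, y1:t), and the inputs are assumed already normalized.

```python
def copy_mixture(p_gen_vocab, copy_attn, src_tokens, p_copy):
    """Mix the two branches of the soft switch:
    p(y) = (1 - p_copy) * p_gen(y) + p_copy * (attention mass on source
    positions whose token equals y). All names are illustrative."""
    out = {w: (1 - p_copy) * p for w, p in p_gen_vocab.items()}
    for attn, tok in zip(copy_attn, src_tokens):
        # Attention mass on a source token adds to that word's probability,
        # letting the model emit out-of-vocabulary source words.
        out[tok] = out.get(tok, 0.0) + p_copy * attn
    return out
```

Because both branches are normalized and the switch weights sum to one, the mixture is again a valid distribution.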

slide-28
SLIDE 28

Just because a model can copy, should it?

slide-29
SLIDE 29

Summarizer Text Copy Mechanism

slide-30
SLIDE 30

Summarizer Text Copy Mechanism

slide-31
SLIDE 31

Summarizer Text Copy Mechanism Text Copy Mechanism

slide-32
SLIDE 32

Abstractive summarizers over-extract.

“Angela Merkel and her husband, chemistry professor Joachim Sauer, 
 are spotted on their annual easter trip
 to the island of ischia, near Naples. ”

slide-33
SLIDE 33

The model fails at content selection!

Consider content selection as word-level extractive summarization. Let t1, …, tS denote binary indicators of whether each source word is used in the summary.

Train a model to maximize p(t|x).

slide-34
SLIDE 34

How to generate supervised data?

The small dog owns a large yellow ball. 
 The big dog from next door chases the ball. Big dog chases small dog ’ s ball.
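One way to sketch the supervision step: derive the binary labels t1, …, tS by aligning summary tokens back to the source. The unigram-membership check below is a deliberate simplification; the actual procedure aligns the longest copied subsequences.

```python
def selection_labels(source, summary):
    """Derive word-level extraction labels t_1..t_S: 1 if the source token
    also appears in the summary, else 0. This unigram check is a
    simplification of the subsequence alignment used in practice."""
    summary_vocab = set(summary)
    return [1 if tok in summary_vocab else 0 for tok in source]
```

On the slide's example, "small", "dog", "ball", and "." are marked as selected because they reappear in the summary.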

slide-35
SLIDE 35

slide-36
SLIDE 36

slide-37
SLIDE 37

slide-38
SLIDE 38

slide-39
SLIDE 39

slide-40
SLIDE 40
slide-41
SLIDE 41

The small dog owns a large yellow ball. 
 The big dog from next door chases the ball.

t1, …, tS

Content selector model based on ELMo

slide-42
SLIDE 42

Control copied content with Bottom-Up Attention by restricting what can be copied to important content.

Source Masked Source Summary Content Selection Bottom-Up Attention

slide-43
SLIDE 43

Control copied content with Bottom-Up Attention by restricting what can be copied to important content.

Let qs denote the selection probability from the content selector, and let ϵ denote an importance threshold. Modify the copy attention such that

p(ãst | x, y1:t) = p(ast | x, y1:t) if qs > ϵ, and 0 otherwise.
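The thresholding step can be sketched directly. `bottom_up_mask` is a hypothetical helper name, and the final renormalization is an assumption about how the masked weights are turned back into a distribution.

```python
def bottom_up_mask(copy_attn, select_probs, eps=0.5):
    """Zero out copy attention a_s wherever the content selector's
    probability q_s falls at or below the threshold eps, then renormalize
    the surviving mass. (Renormalization is an assumption here.)"""
    masked = [a if q > eps else 0.0 for a, q in zip(copy_attn, select_probs)]
    z = sum(masked)
    return [m / z for m in masked] if z > 0 else masked
```

Positions the selector deems unimportant can no longer be copied; the remaining attention mass is redistributed over the important content.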
slide-44
SLIDE 44

The small dog owns a yellow ball . (x1 … x8)

Dog owns (y1 y2)

Encoder Decoder

p(ãt | x, y1:t)

Bottom-Up Attention

Copy distribution over source tokens: The, dog, a, ball, …

slide-45
SLIDE 45

+2 ROUGE: the improvements were consistent across both evaluated datasets.
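For context, the "+2 ROUGE" figure refers to n-gram-overlap metrics. A minimal unigram ROUGE-1 F1 can be sketched as follows; unlike the official ROUGE toolkit, this sketch does no stemming, stopword removal, or multi-reference handling.

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference
    token lists, combined into an F1 score. Simplified illustration only."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())     # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

Counter intersection clips each word's match count at its reference frequency, which is what distinguishes ROUGE-style overlap from raw word counting.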

slide-46
SLIDE 46

“Angela Merkel and her husband, chemistry professor Joachim Sauer, 
 are spotted on their annual easter trip
 to the island of ischia, near Naples. ” “Angela Merkel and her husband
 are spotted on their easter trip. ”

Without Bottom-Up With Bottom-Up

slide-47
SLIDE 47

There is still work to be done…

slide-48
SLIDE 48

Summarization models struggle in real-world scenarios! How do we make the generation of a summary collaborative?

slide-49
SLIDE 49

The Users of Interpretability and Collaboration

End User Trainer Architect

[Strobelt*, Gehrmann, et al., InfoVis ’17]
slide-50
SLIDE 50

The Target of Interpretability and Collaboration

Decision Model

[Gehrmann*, Strobelt*, et al., VAST ’19]

̂ y θ

slide-51
SLIDE 51

The Coupling of Model and Interface

Interactive Collaboration · Interactive Observation · Passive Observation

(a) Passive Observation (b) Interactive Observation (c) Interactive Collaboration

[Gehrmann*, Strobelt*, et al., VAST ’19]
slide-52
SLIDE 52

θ

(a) Passive Observation (b) Interactive Observation

[Wongsuphasawat et al., VAST ’17]
slide-53
SLIDE 53

̂ y

x

(b) Interactive Observation (c) Interactive Collaboration

[Strobelt*, Gehrmann*, et al., VAST ’18]
slide-54
SLIDE 54

[Seq2Seq-Vis screenshot: attention view for the German input "wir wollen heute mal richtig spass haben ." and its English translation, with top-k word predictions, sentence-swap comparison, and nearest-neighbor training examples.]

Collaborative Summarization?

̂ y

x
  • (b) Interactive Obervation
(c) Interactive Collaboration

θ

(a) Passive Obervation (b) Interactive Obervation [Gehrmann*, Strobelt*, et al., VAST ’19]
slide-55
SLIDE 55

Collaborative Summarization!

̂ y

x

(c) Interactive Collaboration

slide-56
SLIDE 56

p(y|x) with latent variable z

Train endpoint predictors (left/right): p(z|x) and p(z|x, y)

slide-57
SLIDE 57

p(y, z|x)

Train endpoint + path predictors: p(z|x) and p(z|x, y)

slide-58
SLIDE 58

p(zleft = 0)

slide-59
SLIDE 59

Scientists at NASA are one step closer to understanding how much water could have existed on primeval Mars. These new findings also indicate how primitive water reservoirs there could have evolved over billions of years, indicating that early oceans on the Red Planet might have held more water than Earth's Arctic Ocean, NASA scientists reveal in a study published Friday in the journal Science.

slide-60
SLIDE 60

p(z | x, y) : red
p(z) : blue
p(y | x, z)
p(ãst | x, y1:t)

slide-61
SLIDE 61

Blue: What do I want to use?
Red: What has been summarized, and where did it come from?
The Summary

slide-62
SLIDE 62
slide-63
SLIDE 63
slide-64
SLIDE 64
slide-65
SLIDE 65
slide-66
SLIDE 66
slide-67
SLIDE 67

CSI can make models collaborative

Leveraging the underlying structure of a problem can lead to better neural models. We can add interpretable and controllable latent variables that follow this structure. Exposing these variables in an interface lets end users manipulate the model's reasoning process, so they retain agency over the automated work.

slide-68
SLIDE 68

++

slide-69
SLIDE 69

++

TL;DR Generation · Section Title Generation

SELECTOR COMPRESSOR RANKER

Summarization

Source Masked Source Summary Content Selection Bottom-Up Attention

Data2Text

start_name end_name end_area centre city Eagle start_area ... <s> Near the city centre ... <s> Eagle is near the ... ...

Structured 
 Summarization

slide-70
SLIDE 70

++

slide-71
SLIDE 71

++

slide-72
SLIDE 72

++

slide-73
SLIDE 73

What are future opportunities?

slide-74
SLIDE 74

Language Model

Language Model

slide-75
SLIDE 75

NLP is exciting at the moment… and scary.

GPT-2 XL-Net NLU RoBERTa NLG

??? ???

slide-76
SLIDE 76

How do we use large LMs for controlled generation?

GPT-2

Data2Text Abstractive Summarization Captioning

slide-77
SLIDE 77

How can we detect and prevent biases in learned representations that influence downstream tasks?

Abstractive Summarization

slide-78
SLIDE 78

How do we evaluate generated natural language 
 at a time when content, and not lexical overlap, matters? How can we measure its effect on humans?

slide-79
SLIDE 79

Acknowledgements <3

slide-80
SLIDE 80

Acknowledgements <3

Many thanks to…

  • My Twitter followers
  • All members of the Harvard HCI and NLP groups
  • My friends and roommates
  • My American family
  • My German family
slide-81
SLIDE 81

We can achieve Human-AI collaboration 
 through interpretable and controllable neural models.

[Design-process diagram: Model Design and Visual & Interaction Design mutually inform each other in the collaborator (CSI) design process.]