
SLIDE 1

Human-AI Collaboration for Neural Text Generation with Interpretable Neural Networks

Sebastian Gehrmann 

Thesis Defense

Oct 18, 2019

Committee Members: Barbara Grosz, Sasha Rush, Stuart Shieber

SLIDE 2

This is Jesse, a journalist. Jesse has a ton of work.

SLIDE 3

Maybe AI can help reduce the workload? Introducing AI-reen, a text-generation model.

SLIDE 4

Jesse could give some of the workload to AI-reen. Doing so, Jesse would give up her agency over that work.

SLIDE 5

But AI-reen is biased and makes mistakes! Jesse still needs to provide oversight over its work.

SLIDE 6

By collaborating with AI-reen, Jesse could gain the benefits of automation without losing her agency.

SLIDE 7

[Diagram: collaboration loop between a Problem and an Accepted solution. Steps: Explain Suggestion, Provide Feedback, Update Suggestion, Accept]

SLIDE 8

They want to collaboratively summarize a document.

Source

SLIDE 9

Both have an idea of how to summarize it.

SLIDE 10

If AI-reen were human, it could communicate its reasoning. But its predictions are not interpretable.

???

SLIDE 11

Even if it could explain its suggestion, it can’t incorporate feedback from Jesse.

“I picked this phrase, because…”
“I don’t like it.”

SLIDE 12

Interpretability is necessary, but we also need controllability.

“I picked this phrase, because…”
“How about … instead?”

SLIDE 13

Let’s empower humans to collaborate with AI!


Summarization [EMNLP ’18]
Data2Text [INLG ’18]
Section Title Generation [NAACL ’19]
TL;DR Generation [INLG ’19]
Collaborative Semantic Inference [VAST ’19]
Detecting Fake Text with GLTR [ACL Demo ’19]
Automated Mediation [Behavior & Technology ’19]
LSTMVis [InfoVis ’17]
Phenotyping Saliency [PLoS ONE ’17]
Seq2Seq-Vis [VAST ’18]
Model Selection [DeepStruct ’19]
Modeling Capacity [Formal Languages ’19]

SLIDE 14

Outline

1. Background: Sequence Modeling for NLP
2. Incorporating Content Selection into a Summarization Model
3. How to Understand Predictions?
4. Collaborating with the Model to Summarize

SLIDE 15

The small dog owns a yellow ball .

? ? ? ? ? ? ?

$p(y_{t+1} \mid y_1, \ldots, y_t)$

y1 y2 y3 y4 y5 y6 y7 y8

SLIDE 16

The small dog owns a yellow ball .

[Elman ’90, Hochreiter & Schmidhuber ’97]

y1 y2 y3 y4 y5 y6 y7 y8

SLIDE 17

The small dog owns a yellow ball .

[Elman ’90, Hochreiter & Schmidhuber ’97]

y1 y2 y3 y4 y5 y6 y7 y8

SLIDE 18

The small dog owns a yellow ball .

[Figure: probability distribution p over candidate next words: large, small, child, dog, …]

[Bengio ’03]

y1 y2 y3 y4 y5 y6 y7 y8

SLIDE 19

Source: The small dog owns a yellow ball .
x1 x2 x3 x4 x5 x6 x7 x8

Target: Der kleine Hund besitzt einen gelben Ball .
y1 y2 y3 y4 y5 y6 y7 y8

$p(y_{t+1} \mid y_1, \ldots, y_t)$ becomes $p(y_{t+1} \mid y_1, \ldots, y_t, x)$

Example: p(next word | Der kleine, The small dog ...) is $p(y_3 \mid y_1, y_2, x)$

SLIDE 20

The small dog owns a yellow ball .
x1 x2 x3 x4 x5 x6 x7 x8

Der kleine y1 y2

Encoder Decoder

[Bahdanau et al. ’14, Sutskever et al. ’14]

SLIDE 21

The small dog owns a yellow ball .
x1 x2 x3 x4 x5 x6 x7 x8

Der kleine y1 y2

$p(a_t \mid x, y_{1:t})$

Attention Encoder Decoder

[Bahdanau et al. ’14, Sutskever et al. ’14]

SLIDE 22

The small dog owns a yellow ball .
x1 x2 x3 x4 x5 x6 x7 x8

Context: $\sum_{s=1}^{S} a_t^s x_s$

Der kleine y1 y2

Encoder Decoder

[Bahdanau et al. ’14, Sutskever et al. ’14]
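
The context vector on this slide is an attention-weighted sum of the encoder states. A minimal sketch, assuming dot-product scoring between the decoder state and each encoder state (the slides do not specify the scoring function, and the function names and toy vectors here are hypothetical):

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for a single decoder step.

    Assumption: scores are plain dot products; other scoring
    functions (e.g. a small feed-forward network) are also common.
    """
    scores = [
        sum(h * x for h, x in zip(decoder_state, x_s))
        for x_s in encoder_states
    ]
    weights = softmax(scores)  # a_t^s for s = 1..S
    dim = len(encoder_states[0])
    # context = sum over s of a_t^s * x_s
    context = [
        sum(w * x_s[d] for w, x_s in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three source positions with 2-dimensional states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
```

Source positions whose states align with the decoder state receive larger weights and therefore contribute more to the context.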

SLIDE 23

The small dog owns a yellow ball .
x1 x2 x3 x4 x5 x6 x7 x8

Der kleine y1 y2

Encoder Decoder

[Figure: probability distribution p over candidate next words: das, Hund, Kind, große, …]

[Bahdanau et al. ’14, Sutskever et al. ’14]

SLIDE 24

Consider an abstractive summarization problem, with input $x_1, \ldots, x_S$ and summary $y_1, \ldots, y_T$.

Train a summarizer to maximize $p(y \mid x)$.

[Gehrmann, Deng, and Rush, EMNLP ’18]
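
Assuming the summarizer is the autoregressive sequence model from the background slides, the objective factorizes over summary words by the chain rule. This is a standard identity, written in the notation of the earlier slides:

```latex
\log p(y \mid x) = \sum_{t=0}^{T-1} \log p(y_{t+1} \mid y_1, \ldots, y_t, x)
```

Each term is the decoder's next-word prediction, so maximizing $p(y \mid x)$ amounts to maximizing the log-probability of every reference summary word given its prefix and the input.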

SLIDE 25

The small dog owns a yellow ball .
x1 x2 x3 x4 x5 x6 x7 x8

Dog owns
y1 y2

Encoder Decoder

Attention: $p(a_t \mid x, y_{1:t})$

[Figure: probability distribution p over words: The, dog, a, ball, …]

[Vinyals et al. ’15, Filippova et al. ’15, Gu et al. ’16, See et al. ’17]

SLIDE 26

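The copy mechanism uses a binary soft switch $z_t$ that determines whether the model copies or generates.

$p(y_{t+1} \mid x, y_{1:t}) = p(z_t = 1 \mid x, y_{1:t}) \times p(y_{t+1} \mid z_t = 1, x, y_{1:t}) + p(z_t = 0 \mid x, y_{1:t}) \times p(y_{t+1} \mid z_t = 0, x, y_{1:t})$

[Figure: probability distribution p over words: das, Hund, Kind, große, …]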

SLIDE 27

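The copy mechanism uses a binary soft switch $z_t$ that determines whether the model copies or generates.

Annotations on the terms of the equation:
Switch probabilities: $\sigma(W h_t + b)$ and $1 - \sigma(W h_t + b)$
Generation distribution: standard model prediction
Copy distribution: reusing $p(a_t \mid x, y_{1:t})$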
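
The mixture can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis implementation: the sigmoid output is assumed to weight the generation branch (the slides do not say which branch it weights), `switch_logit` stands in for $W h_t + b$, and all names and toy numbers are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def copy_generate_mixture(gen_probs, attention, source_tokens, switch_logit):
    """Mix a generation distribution with a copy distribution.

    gen_probs:     dict word -> probability (standard model prediction)
    attention:     attention weights over source positions, summing to 1
    source_tokens: source words aligned with `attention`
    switch_logit:  scalar standing in for W h_t + b (hypothetical input)
    """
    # Assumption: sigmoid weights generation, 1 - sigmoid weights copying.
    p_generate = sigmoid(switch_logit)
    mixed = {word: p_generate * p for word, p in gen_probs.items()}
    # The copy distribution reuses attention: each source word receives
    # the attention mass of the positions where it occurs.
    for weight, token in zip(attention, source_tokens):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_generate) * weight
    return mixed


gen_probs = {"the": 0.5, "dog": 0.3, "a": 0.2}
source_tokens = ["the", "small", "dog"]
attention = [0.1, 0.2, 0.7]
mixed = copy_generate_mixture(gen_probs, attention, source_tokens, switch_logit=0.0)
```

In this toy case "small" is not in the generation vocabulary, yet it still receives probability mass through the copy branch, which is what lets the model emit source words it could not otherwise generate.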

SLIDE 28

Just because a model can copy, should it?

SLIDE 29

Summarizer Text Copy Mechanism

SLIDE 30

Summarizer Text Copy Mechanism

SLIDE 31

Summarizer Text Copy Mechanism Text Copy Mechanism

SLIDE 32

Abstractive summarizers over-extract.

“Angela Merkel and her husband, chemistry professor Joachim Sauer, are spotted on their annual easter trip to the island of ischia, near Naples.”

SLIDE 33

The model fails at content selection!

Consider content selection as word-level extractive summarization. Let $t_1, \ldots, t_S$ denote binary indicators of whether each source word is used in a summary. Train a model to maximize $p(t \mid x)$.

SLIDE 34

How to generate supervised data?