Human-AI Collaboration for Neural Text Generation with Interpretable Neural Networks
Sebastian Gehrmann
Thesis Defense
Oct 18, 2019
Committee Members Barbara Grosz Sasha Rush Stuart Shieber
This is Jesse, a journalist. Jesse has a ton of work.
Maybe AI can help reduce the workload? Introducing AI-reen, a text-generation model.
Jesse could give some of the workload to AI-reen. Doing so, Jesse would give up her agency over that work.
But AI-reen is biased and makes mistakes! Jesse still needs to provide oversight over its work.
By collaborating with AI-reen, Jesse could gain the benefits of automation without losing her agency.
Accepted solution
Explain Suggestion, Update Suggestion, Provide Feedback, Accept
Problem
They want to collaboratively summarize a document.
Source
Both have an idea how to summarize it.
If AI-reen were human, it could communicate its reasoning. But its predictions are not interpretable.
Even if it could explain its suggestion, it can’t incorporate feedback from Jesse.
AI-reen: "I picked this phrase, because…" Jesse: "I don’t like it."
Interpretability is necessary, but we also need controllability.
AI-reen: "I picked this phrase, because…" Jesse: "How about … instead?"
Let’s empower humans to collaborate with AI!
Summarization [EMNLP ’18]
Data2Text [INLG ’18]
Section Title Generation [NAACL ’19]
TL;DR Generation [INLG ’19]
Collaborative Semantic Inference [VAST ’19]
Detecting Fake Text with GLTR [ACL Demo ’19]
Automated Mediation [Behavior & Technology ’19]
LSTMVis [InfoVis ’17]
Phenotyping Saliency [PLoS ONE ’17]
Seq2Seq-Vis [VAST ’18]
Model Selection [DeepStruct ’19]
Modeling Capacity [Formal Languages ’19]
Outline
Language modeling: predict the next word given the previous words, p(y_{t+1} | y_1, …, y_t).
[Figure: the words "The small dog [...] a yellow ball ." aligned with positions y1 … y8; words still to be predicted are shown as "?".]
[Figure: the same sequence processed by a recurrent neural network.] [Elman ’90, Hochreiter & Schmidhuber ’97]
[Figure: the model outputs a distribution p over candidate next words (large, small, child, dog, …).] [Bengio ’03]
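The next-word distribution p(y_{t+1} | y_1, …, y_t) can be made concrete with a tiny example. This is a minimal sketch, assuming a count-based bigram model that conditions only on the last word; the slides use neural models, and the toy corpus and function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical); a real model would be trained on far more text.
corpus = [
    "the small dog owns a yellow ball .",
    "the small child owns a ball .",
    "the large dog owns a ball .",
]

# Count how often each word follows each previous word.
counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def next_word_distribution(prefix):
    """Approximate p(y_{t+1} | y_1..y_t) using only the last word y_t."""
    last = prefix[-1] if prefix else "<s>"
    total = sum(counts[last].values())
    return {w: c / total for w, c in counts[last].items()}

def sequence_log_prob(words):
    """Chain rule: log p(y) = sum over t of log p(y_{t+1} | y_1..y_t)."""
    log_p = 0.0
    for t, word in enumerate(words):
        dist = next_word_distribution(words[:t])
        log_p += math.log(dist[word])  # raises KeyError for unseen continuations
    return log_p

dist = next_word_distribution(["the"])
log_p = sequence_log_prob("the small dog owns a yellow ball .".split())
```

A neural language model replaces the count table with a learned function of the whole prefix, but the chain-rule decomposition stays the same.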
Conditioning on a source sequence x:
Source x1 … x8: The small dog [...] a yellow ball .
Target y1 … y8: Der kleine Hund besitzt einen gelben Ball .
p(y_{t+1} | y_1, …, y_t) becomes p(y_{t+1} | y_1, …, y_t, x).
Example: p(next word | Der kleine, The small dog...) = p(y_3 | y_1, y_2, x)
[Figure: encoder-decoder model. The encoder reads the source; the decoder has generated "Der kleine" (y1 y2).] [Bahdanau et al. ’14, Sutskever et al. ’14]
Attention: p(a_t | x, y_{1:t}), a distribution over source positions. [Bahdanau et al. ’14, Sutskever et al. ’14]
Context = \sum_{s=1}^{S} a_t^s x_s
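The context computation can be sketched in a few lines. This is a minimal illustration, assuming dot-product scoring between a decoder state and the encoder states; the scoring function, the vectors, and the function names are assumptions of this sketch, not taken from the slides.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(encoder_states, decoder_state):
    """Return (a_t, context), where context = sum_s a_t^s * x_s."""
    # Dot-product scoring is an assumption of this sketch.
    scores = [sum(q * k for q, k in zip(decoder_state, x_s))
              for x_s in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(a * x_s[d] for a, x_s in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Three source positions with 2-dimensional encoder states (made-up values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 0.0]  # all scores tie, so attention is uniform
weights, context = attention_context(encoder_states, decoder_state)
```

With a zero decoder state every source position receives the same weight, so the context is simply the average of the encoder states.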
[Figure: encoder-decoder model. With "Der kleine" (y1 y2) generated, the decoder outputs a distribution p over candidate next words (das, Hund, Kind, große, …).] [Bahdanau et al. ’14, Sutskever et al. ’14]
Consider an abstractive summarization problem, with input x_1, …, x_S and summary y_1, …, y_T. Train a summarizer to maximize p(y | x). [Gehrmann, Deng, and Rush, EMNLP ’18]
[Figure: encoder-decoder model with attention p(a_t | x, y_{1:t}) over the source words (The dog a ball); the decoder outputs a distribution p over words.] [Vinyals et al. ’15, Filippova et al. ’15, Gu et al. ’16, See et al. ’17]
The copy mechanism uses a binary soft switch z_t that determines whether the model copies or generates.
p(y_{t+1} | x, y_{1:t}) = p(z_t = 1 | x, y_{1:t}) × p(y_{t+1} | z_t = 1, x, y_{1:t}) + p(z_t = 0 | x, y_{1:t}) × p(y_{t+1} | z_t = 0, x, y_{1:t})
The switch probabilities are p(z_t = 1 | x, y_{1:t}) = σ(W h_t + b) and p(z_t = 0 | x, y_{1:t}) = 1 − σ(W h_t + b).
One of the two conditional distributions is the standard model prediction (generating); the other reuses the attention p(a_t | x, y_{1:t}) (copying).
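The mixture of copying and generating can be sketched as follows. This is a minimal illustration under stated assumptions: the sigmoid output is treated as the probability of copying (the slides do not say which switch value means "copy"), the copy distribution sums attention mass per source word, and all names and numbers are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mix_copy_and_generate(gen_dist, attention, source_tokens, switch_logit):
    """Combine a generation distribution with a copy distribution."""
    # switch_logit stands in for W h_t + b; treating the sigmoid output as
    # the copy probability is an assumption of this sketch.
    p_copy = sigmoid(switch_logit)
    # Copy distribution: attention mass summed per source word type.
    copy_dist = {}
    for weight, token in zip(attention, source_tokens):
        copy_dist[token] = copy_dist.get(token, 0.0) + weight
    vocab = set(gen_dist) | set(copy_dist)
    return {
        w: p_copy * copy_dist.get(w, 0.0) + (1.0 - p_copy) * gen_dist.get(w, 0.0)
        for w in vocab
    }

source_tokens = ["the", "dog", "a", "ball"]
attention = [0.1, 0.6, 0.1, 0.2]                 # p(a_t | x, y_{1:t}), made up
gen_dist = {"the": 0.5, "dog": 0.2, "cat": 0.3}  # standard prediction, made up
mixed = mix_copy_and_generate(gen_dist, attention, source_tokens, switch_logit=0.0)
```

Because both parts are probability distributions and the switch weights sum to one, the mixture is again a distribution. Words such as "ball" that are missing from the generation vocabulary can still receive probability through copying.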
Just because a model can copy, should it?
[Figure: Summarizer, Text, Copy Mechanism]
Abstractive summarizers over-extract.
“Angela Merkel and her husband, chemistry professor Joachim Sauer, are spotted on their annual easter trip to the island of ischia, near Naples. ”
The model fails at content selection!
Consider content selection as word-level extractive summarization. Let t_1, …, t_S denote binary indicators of whether each source word is used in a summary. Train a model to maximize p(t | x).
How to generate supervised data?
Source: The small dog owns a large yellow ball. The big dog from next door chases the ball.
Summary: Big dog chases small dog ’ s ball.
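One way to derive word-level tags from such a source/summary pair is to align the summary back to the source. This is a simplified sketch, assuming a greedy longest-match alignment with case-insensitive comparison; the exact alignment rule is not given in the slides, and the function name is hypothetical.

```python
def label_source_tokens(source, summary):
    """Return one 0/1 tag per source token (1 = token is reused in the summary)."""
    src = [w.lower() for w in source]   # case-insensitive matching is an assumption
    tgt = [w.lower() for w in summary]
    tags = [0] * len(src)
    i = 0
    while i < len(tgt):
        best_len, best_start = 0, -1
        # Find the longest source span that matches the summary from position i.
        for start in range(len(src)):
            length = 0
            while (start + length < len(src) and i + length < len(tgt)
                   and src[start + length] == tgt[i + length]):
                length += 1
            if length > best_len:  # strict ">" keeps the earliest match on ties
                best_len, best_start = length, start
        if best_len == 0:
            i += 1  # this summary word does not occur in the source
        else:
            for k in range(best_start, best_start + best_len):
                tags[k] = 1
            i += best_len
    return tags

source = ("The small dog owns a large yellow ball . "
          "The big dog from next door chases the ball .").split()
summary = "Big dog chases small dog ' s ball .".split()
tags = label_source_tokens(source, summary)
selected = [w for w, t in zip(source, tags) if t]
# selected == ['small', 'dog', 'ball', '.', 'big', 'dog', 'chases']
```

The resulting tags can serve as training targets for a sequence tagger that predicts, for each source word, whether it belongs in the summary.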
Content Selector Model based on ELMo
Control copied content with Bottom-Up Attention by restricting what can be copied to important content.
[Figure: Source → Content Selection → Masked Source → Bottom-Up Attention → Summary]
Let q_s denote the selection probability from the content selector for source word s. Let ϵ denote an importance threshold. Modify the copy attention such that
p(ã_t^s | x, y_{1:t}) = p(a_t^s | x, y_{1:t}) if q_s > ϵ, and 0 otherwise.
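The masking step can be sketched directly from the definition. This is a minimal illustration with made-up numbers; the function name is hypothetical, and the optional renormalization of the remaining attention mass is an assumption of this sketch rather than something stated on the slide.

```python
def bottom_up_attention(attention, selection_probs, epsilon, renormalize=True):
    """Zero out copy attention on source words whose q_s is not above epsilon."""
    masked = [a if q > epsilon else 0.0
              for a, q in zip(attention, selection_probs)]
    total = sum(masked)
    # Renormalizing so the masked attention sums to 1 is an assumption here.
    if renormalize and total > 0.0:
        masked = [m / total for m in masked]
    return masked

attention = [0.1, 0.6, 0.1, 0.2]          # p(a_t^s | x, y_{1:t}), made up
selection_probs = [0.05, 0.9, 0.1, 0.7]   # q_s from a content selector, made up
raw = bottom_up_attention(attention, selection_probs, epsilon=0.5, renormalize=False)
masked = bottom_up_attention(attention, selection_probs, epsilon=0.5)
# raw == [0.0, 0.6, 0.0, 0.2]
```

Only the second and fourth source words pass the threshold, so the model can copy from those positions alone.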
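[Figure: encoder-decoder model in which the attention over the source words (The dog a ball) is replaced by the Bottom-Up Attention p(ã_t | x, y_{1:t}); the decoder outputs a distribution p over words.]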
+2 ROUGE. The improvements were consistent across both evaluated datasets.
Without Bottom-Up: “Angela Merkel and her husband, chemistry professor Joachim Sauer, are spotted on their annual easter trip to the island of ischia, near Naples. ”
With Bottom-Up: “Angela Merkel and her husband are spotted on their easter trip. ”
There is still work to be done…
Summarization models struggle in real-world scenarios! How do we make the generation of a summary collaborative?
The Users of Interpretability and Collaboration
End User, Trainer, Architect [Strobelt*, Gehrmann, et al., InfoVis ’17]
The Target of Interpretability and Collaboration
Decision, Model [Gehrmann*, Strobelt*, et al., VAST ’19]
The Coupling of Model and Interface
(a) Passive Observation (b) Interactive Observation (c) Interactive Collaboration [Gehrmann*, Strobelt*, et al., VAST ’19]
(a) Passive Observation (b) Interactive Observation [Wongsuphasawat et al., VAST ’17]
(c) Interactive Collaboration [Strobelt*, Gehrmann*, et al., VAST ’18]
Collaborative Summarization?
[Figure: input x, model θ, prediction ŷ.] (a) Passive Observation (b) Interactive Observation [Gehrmann*, Strobelt*, et al., VAST ’19]
Collaborative Summarization!
(c) Interactive Collaboration
[Figure: "Train-Endpoint predictor" (Left/Right), p(y | x). Labels: z, forward, backward, p(z | x), p(z | x, y).]
[Figure: "Train-Endpoint + Path predictor", p(y, z | x). Labels: forward, backward, p(z | x), p(z | x, y), p(z_left = 0).]
Scientists at NASA are one step closer to understanding how much water could have existed on primeval Mars. These new findings also indicate how primitive water reservoirs there could have evolved over billions of years, indicating that early oceans on the Red Planet might have held more water than Earth's Arctic Ocean, NASA scientists reveal in a study published Friday in the journal Science.
Blue, p(z): What do I want to use?
Red, p(z | x, y): What has been summarized? Where did it come from?
The Summary: p(y | x, z)
Masked attention: p(ã_t^s | x, y_{1:t})
Collaborative Semantic Inference (CSI) can make models collaborative
Leveraging the underlying structure of a problem can lead to better neural models. We can add interpretable and controllable latent variables that follow this structure. Exposing the variables in an interface allows end users to manipulate the model reasoning process. This allows end users to retain agency over the automated process.
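Exposing a latent variable to the user can be illustrated with a toy example. This is a hypothetical sketch, assuming the latent variable z is a 0/1 selection over source words that a user may override before the summary is generated; the function names and values are invented for illustration and do not describe the actual interface.

```python
def apply_user_feedback(selection, user_edits):
    """Return a copy of the 0/1 selection z with the user's overrides applied."""
    updated = list(selection)
    for position, value in user_edits.items():
        updated[position] = value
    return updated

def allowed_source_words(source_tokens, selection):
    """Words the model may draw on, given the current selection z."""
    return [w for w, z in zip(source_tokens, selection) if z == 1]

source_tokens = ["Scientists", "at", "NASA", "are", "one", "step", "closer"]
model_selection = [1, 0, 1, 0, 0, 0, 0]  # hypothetical model proposal for z
user_edits = {0: 0, 6: 1}                # user deselects word 0 and selects word 6
final_selection = apply_user_feedback(model_selection, user_edits)
allowed = allowed_source_words(source_tokens, final_selection)
# allowed == ['NASA', 'closer']
```

The model's proposal stays visible in `model_selection`, while the user's changes determine what the generator is allowed to use, which is one way to keep the user in control of the automated process.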
TL;DR Generation, Section Title Generation
[Figure: SELECTOR, COMPRESSOR, RANKER]
Summarization
[Figure: Source → Content Selection → Masked Source → Bottom-Up Attention → Summary]
Data2Text
[Figure: start_name end_name end_area centre city Eagle start_area ... <s> Near the city centre ... <s> Eagle is near the ... ...]
Structured Summarization
[Figure: the three models above combined, with panels (a) to (h).]
What are future opportunities?
Language Model
NLP is exciting at the moment… and scary.
[Figure labels: GPT-2, XLNet, RoBERTa; NLU, NLG; ???]
How do we use large LMs for controlled generation?
GPT-2
Data2Text, Abstractive Summarization, Captioning
How can we detect and prevent biases in learned representations that influence downstream tasks, such as abstractive summarization?
How do we evaluate generated natural language at a time when content, not lexical overlap, matters? How can we measure its effect on humans?
Acknowledgements <3
Many thanks to…
We can achieve Human-AI collaboration through interpretable and controllable neural models.
[Figure: design processes for Visual & Interaction Design and Model Design. In b), the collaborator (CSI) design process, Visual & Interaction Design and Model Design inform each other.]