Building Adaptable and Scalable Natural Language Generation Systems - PowerPoint PPT Presentation



SLIDE 1

Building Adaptable and Scalable Natural Language Generation Systems

Yannis Konstas


SLIDE 3

Natural Language Generation is everywhere
(Machine Translation)

Input:

Ο πρόεδρος των ΗΠΑ Ντόναλντ Τραμπ γνωστοποίησε ότι δεν θα πάει στο ετήσιο δείπνο της Ένωσης Ανταποκριτών Λευκού Οίκου (WHCA) στα τέλη του Απριλίου.

Human: The president of the United States Donald Trump announced that he would not go to the annual dinner of the White House Correspondents' Association (WHCA) in late April.

Machine: The US president Donald Trump announced that it would go to the annual dinner of White House Correspondents Union (WHCA) in late April.

SLIDE 6

Natural Language Generation is everywhere
(Dialogue Systems)

What is the weather going to be like in Sitka
What is the weather going to be like in Chicago
No I meant Chicago
How about on Tuesday

SLIDE 7

Natural Language Generation is everywhere
(Conversational Agents) …or when things get too emotional

SLIDE 8

(Harsley et al., CSCW 2016)

Natural Language Generation is everywhere

(Educational Technology)

SLIDE 10

(Krause et al., CVPR 2017)

Natural Language Generation is everywhere

(Caption Generation)

Short caption: A man swinging a bat.
Paragraph caption: A baseball player is swinging a bat. He is wearing a red helmet and a white shirt. The catcher’s mitt is behind the batter.

SLIDE 11

Concept-to-Text, Text Summarization, Machine Translation, Dialogue Systems, Conversational Agents, Code to Language, Storytelling, Captions, Instructional Text, Educational Technology, Meaning Representations, Human-Robot Interaction


SLIDE 16

Natural Language Generation

  • Input: Computer-interpretable representation of the world
  • Select content
  • Organize content in a particular order
  • Decide how to verbalise content
  • Output: Text

Input —> Text

SLIDE 18

know I planet lazy inhabit man

        min  mean  max  mode
wind     10    15   20
dir                        W
temp     50    60   72
gust      5    10   13

public int TextWidth (string text) {
    TextBlock t = new TextBlock();
    t.Text = text;
    return (int) Math.Ceiling(t.ActualWidth);
}

20x + 5y = γ

Place the heineken block west of the mercedes block.

Overcast, with a high of 70. Moderate westerly winds, with gusts as high as 13 mph.
I know the planet is inhabited by a lazy man.
Tammy bought 20 apples and 5 oranges. How many fruits does she have now?
Get rendered width of string rounded up to the nearest integer.

High quality source code is often paired with high level summaries of the computation it performs, for example in code documentation or in descriptions posted in online forums.

高品質のソースコードは、コードドキュメントやオンラインフォーラムに掲載された説明など、実行する計算のハイレベルの要約と対になることがよくあります。

Machine Translation, Concept-to-Text, Human-Robot Interaction, Educational Technology, Meaning Representations, Code to Language

SLIDE 20

Existing Approaches

  • Rule-based frameworks
  • Modular architecture

Challenges

  • Expensive to build
  • Hard to deploy to new applications

Successes

SLIDE 22

Data-driven NLG

  • Learn generation process directly from data
  • Easier to build and maintain
  • Adapt to multiple domains

Challenges

  • Require large corpora - NLG is low-resourced
  • New machine learning model for every application
SLIDE 25

Outline

  • Neural Network architecture for NLG
  • Learn from different inputs
  • Address low-resource problem
  • Generic framework for scaling to large corpora without extra annotation
  • Collect large datasets from community-based platform
  • Adapt to two applications
  • Meaning Representations
  • Code to Language
SLIDE 26

Neural NLG

Joint work with Srinivasan Iyer, Mark Yatskar Luke Zettlemoyer, Yejin Choi

SLIDE 27

Overview

  • Sequence to sequence architecture
  • End-to-end model w/o intermediate representations
  • Linearisation of input to string
  • Pre-process
  • Paired Training
  • Scalable data augmentation
SLIDE 28

Meaning Representations

(Flanigan et al., NAACL 2016; Pourdamaghani and Knight, INLG 2016; Song et al., EMNLP 2016)

Input: Graph Structure
(Abstract Meaning Representation - AMR; Banarescu et al., 2013)

[AMR graph for: know :ARG0 I :ARG1 (planet :ARG1-of (inhabit :ARG0 (man :mod lazy)))]

I knew a planet that was inhabited by a lazy man.
I have known a planet that was inhabited by a lazy man.
I know a planet. It is inhabited by a lazy man.

(Konstas, Iyer, Yatskar, Choi, Zettlemoyer, ACL 2017, to appear)

SLIDE 34

Sequence to sequence model

Attention - Encoder - Decoder

input: know ARG0 I ARG1 ( planet ARG1-of inhabit …
output: <s> I know the planet of …

Candidate tokens per decoding step: {I, The, A, …}, {know, knew, planet, …}, {a, planet, man, …}, {inhabit, inhabited, was, …}

ŵ = argmax_w ∏_i p(w_i | w_<i, h(s))
SLIDE 36

Linearization

Graph —> Depth First Search

hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York

[AMR graph corresponding to the linearization above]

US officials held an expert group meeting in January 2002 in New York .
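The depth-first linearization above can be sketched in a few lines of Python. The nested-tuple graph encoding and the small example graph here are illustrative assumptions, not the deck's actual data structures.

```python
# A minimal sketch of depth-first linearization of an AMR-like graph,
# assuming a (concept, {role: child}) nested-tuple representation.
def linearize(node):
    """Flatten a graph node into a token string via depth-first search."""
    concept, edges = node
    parts = [concept]
    for role, child in edges.items():
        parts.append(":" + role)
        if isinstance(child, tuple):           # nested sub-graph
            parts.append("( " + linearize(child) + " )")
        else:                                  # leaf constant
            parts.append(child)
    return " ".join(parts)

amr = ("know", {
    "ARG0": "i",
    "ARG1": ("planet", {"ARG1-of": ("inhabit", {"ARG0": ("man", {"mod": "lazy"})})}),
})
print(linearize(amr))
# know :ARG0 i :ARG1 ( planet :ARG1-of ( inhabit :ARG0 ( man :mod lazy ) ) )
```

The parentheses mark where the traversal descends into a sub-graph, mirroring the bracketed strings on the slides.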

SLIDE 42

Encoding

Linearize —> RNN encoding

Input tokens: hold ARG0 ( person ARG0-of …
Hidden states: h1(s) h2(s) h3(s) h4(s) h5(s) (forward and backward, concatenated)

hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York

  • Token embeddings
  • Recurrent Neural Network (RNN)
  • Bi-directional RNN
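The bi-directional encoding step can be sketched with plain numpy: one recurrent pass forward, one backward, concatenated per position. The simple tanh recurrence and random weights are stand-ins for trained gated-RNN parameters, which the slides do not specify.

```python
import numpy as np

# Sketch of bi-directional RNN encoding over token embeddings.
rng = np.random.default_rng(0)
T, d, h = 5, 8, 4                      # tokens, embedding dim, hidden dim
x = rng.standard_normal((T, d))        # embeddings for: hold ARG0 ( person ARG0-of
Wf, Uf = rng.standard_normal((h, d)), rng.standard_normal((h, h))
Wb, Ub = rng.standard_normal((h, d)), rng.standard_normal((h, h))

def rnn(seq, W, U):
    states, s = [], np.zeros(h)
    for v in seq:
        s = np.tanh(W @ v + U @ s)     # simple (non-gated) recurrence
        states.append(s)
    return states

fwd = rnn(x, Wf, Uf)                   # left-to-right pass
bwd = rnn(x[::-1], Wb, Ub)[::-1]       # right-to-left pass, re-aligned
enc = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]   # h_j(s) per token
```

Each `enc[j]` plays the role of the state h_j(s) attended over during decoding.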
SLIDE 49

Decoding

RNN Encoding —> RNN Decoding (Beam search)

  • init with h(s)
  • softmax over the vocabulary at each step
  • p(w_i | w_<i, h(s))

Beam at step 1: w11: Holding; w12: Helds; w13: Hold; w14: US
Beam at step 2: w21: Hold a; w22: Hold the; w23: Held a; w24: Held the
Beam at step k: wk1: The US officials held; wk2: US officials held a; wk3: US officials hold the; wk4: US officials will hold a
Candidate next tokens per state: h1: {Holding, Held, US, …}; h2: {a, the, meeting, …}; h3: {US, person, expert, …}; hk: {meeting, meetings, meet, …}
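A toy version of the beam-search loop sketched above, assuming a `step(prefix)` function that returns next-token probabilities; the hard-coded toy distribution here stands in for the RNN decoder's softmax.

```python
import math

def beam_search(step, beam_size=2, max_len=3):
    beams = [([], 0.0)]                           # (token prefix, log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, p in step(prefix).items():
                candidates.append((prefix + [tok], score + math.log(p)))
        # keep only the beam_size highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy next-token distributions keyed by the prefix generated so far.
toy = {
    (): {"Held": 0.6, "Holding": 0.4},
    ("Held",): {"a": 0.7, "the": 0.3},
    ("Holding",): {"a": 0.5, "the": 0.5},
    ("Held", "a"): {"meeting": 0.9, "meet": 0.1},
    ("Held", "the"): {"meeting": 0.8, "meet": 0.2},
    ("Holding", "a"): {"meeting": 0.9, "meet": 0.1},
    ("Holding", "the"): {"meeting": 0.5, "meet": 0.5},
}
print(beam_search(lambda prefix: toy[tuple(prefix)]))
# ['Held', 'a', 'meeting']
```

Summing log-probabilities rather than multiplying probabilities keeps long hypotheses numerically stable, which matters at realistic sentence lengths.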
SLIDE 52

Attention

Decoder state h3 (after w2: held) with candidates: a, the, meeting, …
Source tokens hold ARG0 ( person ARG0-of with encoder states h1(s) … h5(s) and context vector c3

c_i = Σ_j a_ij h_j(s)
a_i = softmax( f(h(s), h_i) )
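The two attention equations can be sketched directly in numpy. Dot-product scoring is an assumption here, since the slides leave the score function f unspecified; the shapes are illustrative.

```python
import numpy as np

# Attention step: score the decoder state h_i against every encoder state
# h_j(s), softmax-normalise the scores into weights a_ij, then form the
# context c_i as the weighted sum of encoder states.
def attention(enc_states, dec_state):
    scores = enc_states @ dec_state              # f(h(s), h_i), one per source token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # a_i = softmax(...)
    context = weights @ enc_states               # c_i = sum_j a_ij h_j(s)
    return context, weights

enc = np.random.randn(5, 8)    # 5 source tokens (hold ARG0 ( person ARG0-of), dim 8
dec = np.random.randn(8)       # decoder state h_3
c3, a3 = attention(enc, dec)
```

Subtracting the max score before exponentiating is the standard numerically stable softmax; it does not change the weights.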
SLIDE 53

Attention

Alignment example
Source: hold ARG0 ( person role US official ) ARG1 ( meet expert group )
Output: US officials held an expert group meeting in January 2002

Decoder state h3 (after w2: held) with candidates: a, the, meeting, …
Source tokens hold ARG0 ( person ARG0-of with encoder states h1(s) … h5(s) and context vector c3

c_i = Σ_j a_ij h_j(s)
a_i = softmax( f(h(s), h_i) )
SLIDE 61

Pre-processing

Linearization —> Anonymization

hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1

Anonymized entities: country "United States" -> loc_0; 2002 -> year_0; 1 -> month_0; city "New York" -> loc_1

Original: US officials held an expert group meeting in January 2002 in New York .
Anonymized: loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
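The anonymization step can be sketched as a simple placeholder substitution that also keeps the reverse mapping for restoring entities after generation. The hand-built entity table below is an assumption for illustration; the slides do not show how entity types are recognised.

```python
# Replace named entities and dates with typed placeholders (loc_0, year_0, ...)
# and return a table for de-anonymizing the generated text afterwards.
def anonymize(tokens, entities):
    mapping, counters, out = {}, {}, []
    for tok in tokens:
        if tok in entities:
            etype = entities[tok]
            if tok not in mapping:                     # first mention: new placeholder
                idx = counters.get(etype, 0)
                counters[etype] = idx + 1
                mapping[tok] = f"{etype}_{idx}"
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out, {v: k for k, v in mapping.items()}     # placeholder -> surface form

tokens = "US officials held an expert group meeting in January 2002 in New_York .".split()
entities = {"US": "loc", "New_York": "loc", "2002": "year", "January": "month"}
anon, table = anonymize(tokens, entities)
print(" ".join(anon))
# loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
```

This reproduces the anonymized sentence on the slide, and `table` supports the inverse substitution at generation time.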

SLIDE 62

Experimental Setup

AMR LDC2015E86 (SemEval-2016 Task 8)

  • Hand annotated MR graphs: newswire, forums
  • ~16k training / 1k development / 1k test pairs

Train

  • Optimize cross-entropy loss

Evaluation

  • BLEU n-gram precision

(Papineni et al., ACL 2002)
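As a reminder of what BLEU measures, here is a toy modified n-gram precision, the quantity BLEU geometric-averages over n = 1..4 (brevity penalty omitted for brevity; the example sentences are taken from the slides).

```python
from collections import Counter

# Clipped n-gram precision: each candidate n-gram is credited at most as
# many times as it appears in the reference.
def ngram_precision(candidate, reference, n=1):
    cand = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref = Counter(zip(*[reference[i:] for i in range(n)]))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(cand.values()), 1)

cand = "US officials held a meeting".split()
ref = "US officials held an expert group meeting".split()
print(ngram_precision(cand, ref, n=1))   # 4 of 5 unigrams match -> 0.8
print(ngram_precision(cand, ref, n=2))   # 2 of 4 bigrams match -> 0.5
```

The clipping is what penalises repetition errors such as "held held" in the failure analysis that follows.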

SLIDE 66

First Attempt

BLEU: NNLG 22 vs. TreeToStr 23, TSP 22.4, PBMT 26.9

TreeToStr: Flanigan et al., NAACL 2016; TSP: Song et al., EMNLP 2016; PBMT: Pourdamaghani and Knight, INLG 2016

All systems use a Language Model trained on a very large corpus. We will emulate this via data augmentation. (Sennrich et al., ACL 2016)

SLIDE 71

What went wrong?

hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1

Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .

  • Repetition
  • Coverage

a) Sparsity (OOV@1: 44.26%, OOV@5: 74.85% of ~18000 token types)
b) Avg sent length: 20 words
c) Limited Language Modeling capacity

SLIDE 74

Data Augmentation

Original Dataset: ~16k graph-sentence pairs
Gigaword: ~183M sentences *only*
Sample sentences with vocabulary overlap

[Bar chart: OOV@1 and OOV@5 rates (%) for Original, Giga-200k, Giga-2M]

SLIDE 78

Data Augmentation

Generate from MR: graph -> text (Attention Encoder-Decoder)
Parse to MR: text -> graph (Attention Encoder-Decoder)

Re-train

SLIDE 79

Data Augmentation

Generate from Input: input -> text
Parse to Input: text -> input

SLIDE 87

Paired Training

Train MR Parser P on Original Dataset (graph, sentence) pairs
for i = 0 … N:
    S_i = Sample k·10^i sentences from Gigaword
    Parse S_i sentences with P
    Re-train MR Parser P on S_i (self-train Parser)
Train Generator G on S_N (graph, sentence) pairs
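The paired-training procedure can be sketched as a plain Python loop. The `train` and `fine_tune` functions and the stub "models" below are placeholders for the real seq2seq parser and generator, so only the control flow mirrors the slides.

```python
def paired_training(original_pairs, gigaword, train, fine_tune, k=2, n_rounds=2):
    parser = train(original_pairs)                     # Train MR Parser P on Original Dataset
    corpus = original_pairs
    for i in range(n_rounds):                          # for i = 0 ... N
        sample = gigaword[: k * 10 ** i]               # S_i = k * 10^i Gigaword sentences
        parsed = [(parser(s), s) for s in sample]      # Parse S_i with P
        parser = fine_tune(train(parsed), original_pairs)  # re-train P, fine-tune on original
        corpus = parsed
    return fine_tune(train(corpus), original_pairs)    # Train Generator G on S_N

# Stub "models" so the control flow can run end to end.
def train(pairs):
    def model(sentence):
        return f"graph({sentence})"                    # pretend parse
    model.n_pairs = len(pairs)
    return model

def fine_tune(model, pairs):
    return model                                       # placeholder for fine-tuning

gen = paired_training([("g", "s")], [f"sent{i}" for i in range(100)], train, fine_tune)
print(gen.n_pairs)   # generator trained on the final, largest sample
```

With k=2 and two rounds, the final sample holds k·10^1 = 20 automatically parsed pairs, illustrating the exponential sample growth of the slides.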

SLIDE 95

Training MR Parser

  • Sample S2 = 2M sentences from Gigaword
  • Parse S2 with P
  • Train P on S2 = 2M
  • Fine-tune P on Original Dataset

Fine-tune: init parameters from previous step and train on Original Dataset

SLIDE 97

Training MR Generator

  • Sample S3 = 2M sentences from Gigaword
  • Parse S3 with P
  • Train G on S3 = 2M
  • Fine-tune G on Original Dataset

Fine-tune: init parameters from previous step and train on Original Dataset

SLIDE 103

Final Results

BLEU: TreeToStr 23, TSP 22.4, PBMT 26.9, NNLG 22, NNLG-200k 27.4, NNLG-2M 32.3, NNLG-20M 34.06

TreeToStr: Flanigan et al., NAACL 2016; TSP: Song et al., EMNLP 2016; PBMT: Pourdamaghani and Knight, INLG 2016

SLIDE 105

How did we do?

hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1

Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: In January 2002 United States officials held a meeting of the group experts in New York .

Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus.

Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza .

Errors: Disfluency, Coverage

SLIDE 106

Adapt to other applications?

  • Structured input representation

Meaning Representation of Natural Language
Programming Language

SLIDE 107

Code to Language

Joint work with Srinivasan Iyer Luke Zettlemoyer, Alvin Cheung

SLIDE 109

Code to Language

Input: Source Code (SQL - C#)

(Summarizing Source Code using a Neural Attention Model. Iyer, Konstas, Cheung, Zettlemoyer, ACL 2016)

public int TextWidth (string text) {
    TextBlock t = new TextBlock();
    t.Text = text;
    return (int) Math.Ceiling(t.ActualWidth);
}

Output (Summary): Get rendered width of string rounded up to the nearest integer.

SELECT max(marks) FROM stud_records WHERE marks < (SELECT max(marks) FROM stud_records);

Output (Summary): How to find the second largest value from a table?

SLIDE 112

Input Representation

1) Code snippet —> Linearize (left-to-right)
2) Anonymize
3) Bag of Words Encoding

(Summarizing Source Code using a Neural Attention Model. Iyer, Konstas, Cheung, Zettlemoyer, ACL 2016)

SELECT max(marks) FROM stud_records WHERE marks < (SELECT max(marks) FROM stud_records);
SELECT max(col0) FROM tab0 WHERE col0 < (SELECT max(col1) FROM tab1);

How to find the second largest value from a table?

Encoder states over tokens SELECT max col0 FROM tab0: h1(s) … h5(s)
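The SQL anonymization step can be roughly sketched with regular expressions. The `schema` argument is an assumption for illustration, and unlike the slide (which numbers each occurrence separately, giving col0/col1 and tab0/tab1), this simplified version maps each unique name to a single placeholder.

```python
import re

# Replace column and table names in a SQL query with typed placeholders.
# A real pipeline would tokenise with a SQL parser rather than regexes.
def anonymize_sql(query, schema):
    out = query
    for i, col in enumerate(schema["columns"]):
        out = re.sub(rf"\b{col}\b", f"col{i}", out)    # column names -> colN
    for i, tab in enumerate(schema["tables"]):
        out = re.sub(rf"\b{tab}\b", f"tab{i}", out)    # table names -> tabN
    return out

q = "SELECT max(marks) FROM stud_records WHERE marks < (SELECT max(marks) FROM stud_records);"
print(anonymize_sql(q, {"columns": ["marks"], "tables": ["stud_records"]}))
# SELECT max(col0) FROM tab0 WHERE col0 < (SELECT max(col0) FROM tab0);
```

Anonymizing identifiers shrinks the open vocabulary of user-chosen names, the same sparsity fix applied to AMR entities earlier.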

SLIDE 113

Decoding with Attention

4) Bag of Words Encoding —> RNN Decoding
5) Attention directly on input embeddings

(Summarizing Source Code using a Neural Attention Model. Iyer, Konstas, Cheung, Zettlemoyer, ACL 2016)

Source tokens SELECT max col0 FROM tab0 with states h1(s) … h5(s) and context c3
Decoder state h3 (after w2: largest) with candidates: value, the, col_0, …

slide-114
SLIDE 114

Decoding with Attention

4) Bag of Words Encoding —>RNN Decoding 5) Attention directly on input embeddings

(Summarizing Source Code using a Neural Attention Model. Iyer, Konstas, Cheung, Zettlemoyer, ACL 2016)

SELECT max col0 FROM tab0

h2(s) h3(s) h4(s) h5(s) c3 h1(s) h3

value the col_0 …

w2: largest

[ ]

slide-115
SLIDE 115

Community-based Datasets

slide-116
SLIDE 116

Community-based Datasets

slide-117
SLIDE 117

Community-based Datasets

  • (Accepted Answer, Post title) pairs
  • ~33K SQL / 66k C# examples
slide-118
SLIDE 118

Results

PBMT: MOSES Phrase-based MT system SUM-NN: Rush et al, EMNLP 2015

slide-119
SLIDE 119

Results

BLEU 5.25 10.5 15.75 21 SQL C#

IR PBMT SUM-NN CODE-NN

PBMT: MOSES Phrase-based MT system SUM-NN: Rush et al, EMNLP 2015

slide-120
SLIDE 120

Results

BLEU 5.25 10.5 15.75 21 SQL C#

IR PBMT SUM-NN CODE-NN

PBMT: MOSES Phrase-based MT system SUM-NN: Rush et al, EMNLP 2015

slide-121
SLIDE 121

Human Evaluation Results

Naturalness 1.25 2.5 3.75 5 SQL C#

IR PBMT SUM-NN CODE-NN

PBMT: MOSES Phrase-based MT system SUM-NN: Rush et al, EMNLP 2015 Informativeness 1.25 2.5 3.75 5 SQL C#

slide-122
SLIDE 122

How did we do?

SELECT * FROM table ORDER BY Rand() LIMIT 10

Select random rows from mysql table How to get random rows from a mysql database?

Reference CODE-NN

slide-123
SLIDE 123

How did we do?

SELECT * FROM table ORDER BY Rand() LIMIT 10

Select random rows from mysql table How to get random rows from a mysql database?

Reference CODE-NN

foreach (string pTxt in xml.parent) { TreeNode parent = new TreeNode(); foreach (string cTxt in xml.child) { TreeNode child = new TreeNode(); parent.Nodes.Add(child ); } }

Adding childs to a treenode dynamically in C# How to get all child nodes in TreeView?

Reference CODE-NN

slide-124
SLIDE 124

Neural NLG Contributions

slide-125
SLIDE 125

Neural NLG Contributions

  • Adapt to multiple applications
  • Scale to very large corpora
  • Address low-resource problem
  • Paired training general technique
  • Train on noisy community-based datasets
slide-126
SLIDE 126

Future Work

slide-127
SLIDE 127

Educational Technology

(Koncel-Kedziorski, Konstas, Zettlemoyer, Hajishirzi. A Theme-Rewriting Approach for Generating Algebra Word Problems, EMNLP 2016)

Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice?

+ =

Joint work with Rik Koncel-Kedziorski Luke Zettlemoyer, Hannaneh Hajishirzi

slide-128
SLIDE 128

Educational Technology

(Koncel-Kedziorski, Konstas, Zettlemoyer, Hajishirzi. A Theme-Rewriting Approach for Generating Algebra Word Problems, EMNLP 2016)

Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice?

+ =

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

Syntactic, Semantic, Thematic rewriter

Joint work with Rik Koncel-Kedziorski Luke Zettlemoyer, Hannaneh Hajishirzi

slide-129
SLIDE 129

Educational Technology

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

+ =

Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice?

slide-130
SLIDE 130

Educational Technology

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

+ =

504 + x = 639

slide-131
SLIDE 131

Educational Technology

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

+ =

504 + x = 639 504 + x = 639

theme

Luke Skywalker blasters

slide-132
SLIDE 132

Educational Technology

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

+ =

504 + x = 639 504 + x = 639

theme

Luke Skywalker blasters

sOUT:

slide-133
SLIDE 133

Educational Technology

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

+ =

504 + x = 639 504 + x = 639

theme

Luke Skywalker blasters

sOUT: sIN:

slide-134
SLIDE 134

Educational Technology

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

+ =

504 + x = 639

sG:

504 + x = 639

theme

Luke Skywalker blasters

sOUT: sIN:

slide-135
SLIDE 135

Educational Technology

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

math problem

+ =

504 + x = 639

sG:

504 + x = 639

theme

Luke Skywalker blasters

sOUT: sIN:

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

f(LMIN, LMOUT, LMG)

slide-136
SLIDE 136

Educational Technology

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

theme

Luke Skywalker blasters

sOUT:

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

LMIN

slide-137
SLIDE 137

Educational Technology

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

theme

Luke Skywalker blasters

sOUT:

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

LMIN

504 + x = 639

sIN:

Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice? Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice?

LMOUT

slide-138
SLIDE 138

Educational Technology

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

theme

Luke Skywalker blasters

sOUT:

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

LMIN

504 + x = 639

sIN:

Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice? Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice?

LMOUT

sG:

slide-139
SLIDE 139

Educational Technology

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

theme

Luke Skywalker blasters

sOUT:

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

LMIN

504 + x = 639

sIN:

Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice? Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice?

LMOUT

Defense lawyer Thomas Olsson stated it was very tragic and a failure for Swedish law and

  • rder that the client Thomas Olsson was

representing had been kept in detention. The official alleged Karzai was reluctant to move against big drug lords in Karzai 's political power base. Defense lawyer Thomas Olsson stated it was very tragic and a failure for Swedish law and

  • rder that the client Thomas Olsson was

representing had been kept in detention. The official alleged Karzai was reluctant to move against big drug lords in Karzai 's political power base.

LMG

slide-140
SLIDE 140

Educational Technology

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

theme

Luke Skywalker blasters

sOUT:

Luke Skywalker uses the force to open the locked door that leads to the hangar. Then Han Solo runs past the spaceship in the hangar and blasted the two droids guarding it.

LMIN

504 + x = 639

sIN:

Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice? Bob has 639 sheep. Alice has 504 sheep. How many more sheep does Bob have than Alice?

LMOUT

sG:

Luke Skywalker has 639 blasters. Leia has 504 blasters. How many more blasters does Luke Skywalker have than Leia?

f(LMIN, LMOUT, LMG)

slide-141
SLIDE 141

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-142
SLIDE 142

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-143
SLIDE 143

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-144
SLIDE 144

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-145
SLIDE 145

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-146
SLIDE 146

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-147
SLIDE 147

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-148
SLIDE 148

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-149
SLIDE 149

Concept-to-Text

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

(Angeli et al. EMNLP 2010, Kim and Mooney COLING 2010)

slide-150
SLIDE 150

Concept-to-Text

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

slide-151
SLIDE 151

Concept-to-Text

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

(A Global Model for Concept-to-Text Generation. Konstas and Lapata, JAIR 2013; EMNLP 2013)

D precip1 cover1 dir [0,3] [3,13] temp1

wind_

[7,13] [13,18] [16,18] max

chance of rain

time mode time_

then becoming

  • vercast,

[3,4] [4,7] [3,7] max

with a high

  • f 45.

wind2_

wind2

wind3 min mean

calm to moderate

mode

northeast winds.

[13,14] [14,16] S1 S2 [0,13] [13,18]

slide-152
SLIDE 152

Concept-to-Text

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Encoder

Document Decoder

table

document

Document Planner

Attention

(Mei et al., NAACL 2016)

slide-153
SLIDE 153

Concept-to-Text

Chance of rain then becoming overcast, with a high of 45. Calm to moderate northeast winds.

time min mean max mode wind 12-3 3 5 7 wind 3-6 5 5 5 wind 6-9 5 6 7 dir 12-3 NW dir 3-6 NE dir 6-9 NE temp 12-9 40 42 45 precip 12-3 25 223 45 50 precip 3-6 15 30 50 precip 6-9 12 18 25 cover 12-3 50-75 cover c 3-6 50-75 cover 6-9

75-100

Encoder

Document Decoder

table

document

Document Planner

  • Document plan based on:
  • sequences of records
  • discourse relations

Attention

(Mei et al., NAACL 2016)

slide-154
SLIDE 154

Caption Generation

(Krause et al., CVPR 2017, Yatskar et al., CVPR 2016, Krishna et al., 2016)

hitting agent victim victim part tool place ballplayer baseball

  • baseball

bat baseball diamond wearing wearer clothing body part ballplayer red helmet head wearing wearer clothing body part ballplayer white shirt torso

slide-155
SLIDE 155

Caption Generation

(Krause et al., CVPR 2017, Yatskar et al., CVPR 2016, Krishna et al., 2016)

hitting agent victim victim part tool place ballplayer baseball

  • baseball

bat baseball diamond wearing wearer clothing body part ballplayer red helmet head

A baseball player is swinging a bat. He is wearing a red helmet and a white shirt. The catcher’s mitt is behind the batter.

wearing wearer clothing body part ballplayer white shirt torso

Encoder

Document Decoder

frames

document

Document Planner

Attention

slide-156
SLIDE 156

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

slide-157
SLIDE 157

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

slide-158
SLIDE 158

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

tell child lie that ARG0 ARG1 ARG0-of
slide-159
SLIDE 159

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

Graph-to-graph transformation

tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of
slide-160
SLIDE 160

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

Graph-to-graph transformation

tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of
slide-161
SLIDE 161

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

  • No Japanese AMR corpus

Graph-to-graph transformation

tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of
slide-162
SLIDE 162

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

  • No Japanese AMR corpus
  • MRS hand-crafted grammars (Minimal Recursion Semantics; Copestake et al., RLC 2006)

Joint work with Michael Wayne Goodman Graph-to-graph transformation

tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of
slide-163
SLIDE 163

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

  • No Japanese AMR corpus
  • MRS hand-crafted grammars (Minimal Recursion Semantics; Copestake et al., RLC 2006)

1) Parse to MRS from English

Joint work with Michael Wayne Goodman Graph-to-graph transformation

tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of
slide-164
SLIDE 164

Semantic-based Machine Translation

(Jones et al., COLING 2012)

The children told that lie

Source

その うそ は ⼦孑供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST

Target

  • No Japanese AMR corpus
  • MRS hand-crafted grammars (Minimal Recursion Semantics; Copestake et al., RLC 2006)

1) Parse to MRS from English 2) Generate Japanese from MRS

Joint work with Michael Wayne Goodman Graph-to-graph transformation

tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of
slide-165
SLIDE 165

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

slide-166
SLIDE 166

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

> I would like to follow up on my speech therapy treatment.

slide-167
SLIDE 167

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

> I would like to follow up on my speech therapy treatment.

treatment follow

therapy

I speech

slide-168
SLIDE 168

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

> I would like to follow up on my speech therapy treatment.

treatment follow

therapy

I speech

slide-169
SLIDE 169

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

> I would like to follow up on my speech therapy treatment. Patient #3245 Log:

You were admitted for acute subcortical cerebrovascular accident. […] Verbal impairment related to communication impairment was treated with speech therapy 3 months ago. [...]

treatment follow

therapy

I speech

slide-170
SLIDE 170

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

> I would like to follow up on my speech therapy treatment. Patient #3245 Log:

You were admitted for acute subcortical cerebrovascular accident. […] Verbal impairment related to communication impairment was treated with speech therapy 3 months ago. [...]

treatment follow

therapy

I speech treatment therapy

impair

verbal communicate

slide-171
SLIDE 171

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

> I would like to follow up on my speech therapy treatment. Patient #3245 Log:

You were admitted for acute subcortical cerebrovascular accident. […] Verbal impairment related to communication impairment was treated with speech therapy 3 months ago. [...]

treatment follow

therapy

I speech treatment therapy

impair

verbal communicate

see

log start 3 improve therapy

slide-172
SLIDE 172

Dialogue Systems

(Acharya et al., INLG 2016, Rieser et al., IEEE/ACM 2014)

> I would like to follow up on my speech therapy treatment. Patient #3245 Log:

You were admitted for acute subcortical cerebrovascular accident. […] Verbal impairment related to communication impairment was treated with speech therapy 3 months ago. [...]

< I can see in my logs, that we started improving verbal impairment due to the accident, with speech therapy 3 months ago. When would you like to book the next appointment?

treatment therapy

impair

verbal communicate

see

log start 3 improve therapy

slide-173
SLIDE 173

Summary

  • General data-driven approach for NLG
  • Facilitates the deployment to new domains
  • Integrates to existing systems and applications
slide-174
SLIDE 174

Summary

Thank You

  • General data-driven approach for NLG
  • Facilitates the deployment to new domains
  • Integrates to existing systems and applications