slide-1
SLIDE 1

A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

Wan-Ting Hsu, National Tsing Hua University
Chieh-Kai Lin, National Tsing Hua University

Project page

slide-2
SLIDE 2

Outline

  • Motivation
  • Our Method
  • Training Procedures
  • Experiments and Results
  • Conclusion



slide-4
SLIDE 4


Overview

Textual Media

People spent 12 hours per day consuming media in 2018. – eMarketer

https://www.emarketer.com/topics/topic/time-spent-with-media


slide-8
SLIDE 8


Overview

Text Summarization

  • To condense a piece of text to a shorter version while maintaining the important points

slide-9
SLIDE 9


Overview

Examples of Text Summarization

  • Article headlines
  • Meeting minutes
  • Movie/book reviews
  • Bulletins (weather forecasts/stock market reports)


slide-14
SLIDE 14


Overview

Automatic Text Summarization

  • To condense a piece of text to a shorter version while maintaining the important points

Extractive Summarization: select text from the article
Abstractive Summarization: generate the summary word-by-word

slide-15
SLIDE 15


Overview

Extractive Summarization

  • Select phrases or sentences from the source document
  • Shen, D.; Sun, J.-T.; Li, H.; Yang, Q.; and Chen, Z. 2007. Document summarization using conditional random fields. IJCAI 2007.
  • Kågebäck, M., Mogren, O., Tahmasebi, N., & Dubhashi, D. Extractive Summarization using Continuous Vector Space Models. EACL 2014.
  • Cheng, J., and Lapata, M. Neural summarization by extracting sentences and words. ACL 2016.
  • Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. AAAI 2017.

Representation

[Figure: sentence representations and scores for sentence 1, sentence 2, and sentence 3]
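The extractive paradigm shown on this slide (represent each sentence, score it, then select the best ones) can be illustrated with a toy scorer. The frequency-based scoring below is an illustrative stand-in for the learned neural scorers in the cited papers, not the paper's actual method:

```python
def extractive_summary(sentences, k=1):
    """Return the k highest-scoring sentences, in their original order."""
    # Document-level word frequencies act as a crude importance signal.
    freq = {}
    for sent in sentences:
        for w in sent.lower().split():
            freq[w] = freq.get(w, 0) + 1

    # Score a sentence by the average frequency of its words.
    def score(sent):
        words = sent.lower().split()
        return sum(freq[w] for w in words) / len(words)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]

doc = ["The artist painted two models.",
       "The artist has painted models before.",
       "Tickets cost five dollars."]
print(extractive_summary(doc, k=1))  # → ['The artist painted two models.']
```

Real extractors replace the frequency score with a sentence representation produced by a recurrent encoder, but the select-top-scoring-sentences step is the same.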

slide-16
SLIDE 16


Overview

Abstractive Summarization

  • Generate the summary word-by-word; generated words need not appear in the source document
  • Alexander M Rush, Sumit Chopra, and Jason Weston. A neural attention model for abstractive sentence summarization. EMNLP 2015.
  • Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Caglar Gulcehre, and Bing Xiang. Abstractive text summarization using sequence-to-sequence RNNs and beyond. CoNLL 2016.

  • Abigail See, Peter J Liu, and Christopher D Manning. Get to the point: Summarization with pointer-generator networks. ACL 2017.
  • Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive summarization. ICLR 2018.
  • Fan, Angela, David Grangier, and Michael Auli. Controllable abstractive summarization. arXiv preprint arXiv:1711.05217 (2017).

[Figure: Encoder → article representations → Decoder]

slide-17
SLIDE 17
  • Extractive summary (select sentences): important, correct; incoherent or not concise
  • Abstractive summary (generate word-by-word): readable, concise; may lose or mistake some facts
  • Unified summary: important, correct; readable, concise

Overview

Motivation

Extractive summary (not concise): Italian artist Johannes Stoetter has painted two naked women to look like a chameleon. The 37-year-old has previously transformed his models into frogs and parrots but this may be his most intricate and impressive artwork to date.


slide-20
SLIDE 20
  • Extractive summary (select sentences): important, correct; incoherent or not concise
  • Abstractive summary (generate word-by-word): readable, concise; may lose or mistake some facts
  • Unified summary: important, correct; readable, concise

Overview

Motivation

Extractive summary (not concise): Italian artist Johannes Stoetter has painted two naked women to look like a chameleon. The 37-year-old has previously transformed his models into frogs and parrots but this may be his most intricate and impressive artwork to date.

Abstractive summary (concise): Johannes Stoetter has previously transformed his models into frogs and parrots but this chameleon may be his most impressive artwork to date.

Justin Bieber

slide-21
SLIDE 21

Outline

  • Motivation
  • Our Method
  • Training Procedures
  • Experiments and Results
  • Conclusion



slide-25
SLIDE 25


Models

Extractor Abstracter

Method

Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. AAAI 2017.
Abigail See, Peter J Liu, and Christopher D Manning. Get to the point: Summarization with pointer-generator networks. ACL 2017.

Extractor: static sentence attention    Abstracter: dynamic word attention


slide-29
SLIDE 29

Combined Attention

Extractor: static sentence attention β    Abstracter: dynamic word attention α

Method

m: word index, n: sentence index, t: generated word index

Example: "Cindy is lucky. She won $1000. She is going to …" with sentence attentions β1, β2, β3 over the three sentences and word attentions α1 … α9 over their words.

slide-30
SLIDE 30


Method

Combined Attention

  • Our unified model combines sentence-level and word-level attentions to take advantage of both extractive and abstractive summarization approaches.

slide-31
SLIDE 31


Method

Combined Attention

  • The updated word attention is used for calculating the context vector and the final word distribution.
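A minimal sketch of how such an updated word attention can be formed and used: each word's attention is scaled by the attention of its containing sentence and renormalized, and the result weights the encoder states into the context vector. Variable names and toy values are illustrative, not taken from the released code:

```python
import numpy as np

def updated_word_attention(alpha, beta, sent_of_word):
    """alpha: (M,) word attention; beta: (N,) sentence attention;
    sent_of_word: (M,) index of the sentence containing each word."""
    scaled = alpha * beta[sent_of_word]  # attenuate words in unattended sentences
    return scaled / scaled.sum()         # renormalize to sum to 1

alpha = np.array([0.2, 0.3, 0.1, 0.4])   # word-level attention (one decoder step)
beta = np.array([0.9, 0.1])              # sentence-level attention
sent_of_word = np.array([0, 0, 1, 1])    # words 0-1 in sentence 0, words 2-3 in sentence 1

alpha_hat = updated_word_attention(alpha, beta, sent_of_word)
# Context vector: attention-weighted sum of (toy) encoder hidden states.
H = np.eye(4)                            # one hidden state per word
context = alpha_hat @ H
```

Note how word 3, although it had the highest raw attention (0.4), is suppressed because its sentence receives little sentence-level attention.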

slide-32
SLIDE 32


Method

Encourage Consistency

  • We propose a novel inconsistency loss function to ensure that our unified model is mutually beneficial to both extractive and abstractive summarization.

[Figure: the loss maximizes the multiplied attention (word attention × sentence attention) of the top K attended words]
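The slide describes maximizing the multiplied attention of the top K attended words at each decoder step. Below is a sketch of one plausible form of this loss (minimizing the negative log of the mean product); the exact weighting and the per-step handling of sentence attention in the paper may differ, and all names are illustrative:

```python
import numpy as np

def inconsistency_loss(word_attn, sent_attn, sent_of_word, K=2):
    """word_attn: (T, M) word attention per decoder step;
    sent_attn: (N,) sentence attention (held fixed across steps for simplicity);
    sent_of_word: (M,) sentence index of each word."""
    losses = []
    for alpha_t in word_attn:
        top = np.argsort(alpha_t)[-K:]  # indices of the top-K attended words
        # Mean of word-attention x sentence-attention products over the top-K words.
        mean_prod = np.mean(alpha_t[top] * sent_attn[sent_of_word[top]])
        losses.append(-np.log(mean_prod + 1e-12))
    return float(np.mean(losses))

sent_of_word = np.array([0, 0, 1, 1])
word_attn = np.array([[0.4, 0.4, 0.1, 0.1]])  # top words lie in sentence 0
consistent = np.array([0.9, 0.1])    # sentence attention agrees with word attention
inconsistent = np.array([0.1, 0.9])  # sentence attention points elsewhere
```

As intended, the loss is lower in the consistent case, so minimizing it pushes the two attention levels to agree.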

slide-33
SLIDE 33


Method

Encourage Consistency

  • Encourage consistency of the top K attended words at each decoder time step.

[Figure: example with K = 2 over three sentences; the inconsistency loss is lower when the top attended words fall inside the most-attended sentence (consistent) than when they do not (inconsistent)]

slide-34
SLIDE 34

Outline

  • Motivation
  • Our Method
  • Training Procedures
  • Experiments and Results
  • Conclusion


slide-35
SLIDE 35


Extractive Summarization: select sentences from the article
Abstractive Summarization: generate the summary word-by-word

Training Procedures

slide-36
SLIDE 36

Training Procedures


  • 3 types of loss functions:
  • 1. extractor loss
  • 2. abstracter loss + coverage loss
  • 3. inconsistency loss
slide-40
SLIDE 40


Training Procedures

Extractor Target

  • To extract sentences with high informativity: the extracted sentences should contain as much of the information needed to generate an abstractive summary as possible.
  • Ground-truth labels:
  • 1. Measure the informativity of each sentence in the article by computing the ROUGE-L recall score between the sentence and the reference abstractive summary.
  • 2. Select sentences in order from high to low informativity, adding one sentence at a time if the new sentence increases the informativity of all the selected sentences.

Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. AAAI 2017.
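The two-step labeling procedure above can be sketched as follows. `rouge_l_recall` is a minimal LCS-based stand-in for a full ROUGE-L implementation, and all function names are illustrative:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1],
                                                               dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_recall(candidate, reference):
    """LCS length divided by reference length (simplified ROUGE-L recall)."""
    ref = reference.split()
    return lcs_len(candidate.split(), ref) / len(ref)

def label_sentences(sentences, reference):
    """Greedily label sentences 1/0: visit sentences from most to least
    informative, keeping one only if it raises the recall of the selected set
    (here simply concatenated in selection order)."""
    order = sorted(range(len(sentences)),
                   key=lambda i: rouge_l_recall(sentences[i], reference),
                   reverse=True)
    labels, selected, best = [0] * len(sentences), [], 0.0
    for i in order:
        score = rouge_l_recall(" ".join(selected + [sentences[i]]), reference)
        if score > best:
            best, labels[i] = score, 1
            selected.append(sentences[i])
    return labels

sents = ["the cat sat", "dogs bark loudly", "the cat sat on the mat"]
ref = "the cat sat on the mat"
print(label_sentences(sents, ref))  # → [0, 0, 1]
```

The first sentence is not selected because, once the third (fully matching) sentence is in the set, adding it no longer increases the recall.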

slide-41
SLIDE 41


Combined Attention

Extractor Abstracter

static sentence attention; dynamic word attention. m: word index, n: sentence index, t: generated word index

Training Procedures


slide-42
SLIDE 42
  • 3 types of loss functions:
  • 1. extractor loss
  • 2. abstracter loss + coverage loss
  • 3. inconsistency loss

Training Procedures

Abigail See, Peter J Liu, and Christopher D Manning. Get to the point: Summarization with pointer-generator networks. ACL 2017

slide-44
SLIDE 44


Training Procedures

  • 1. Two-stage training
  • 2. End-to-end training without inconsistency loss
  • 3. End-to-end training with inconsistency loss
slide-45
SLIDE 45


Training Procedures

  • 1. Two-stage training
  • The extractor is used as a classifier to select sentences with high informativity and outputs only those sentences (= hard attention on the original article).
  • Simply combine the extractor and abstracter by feeding the extracted sentences to the abstracter.

[Figure: article → Extractor → extracted sentences → Abstracter → summary]

slide-46
SLIDE 46


Training Procedures

  • 2. End-to-end training without inconsistency loss
  • The sentence-level attention is soft attention and is combined with the word-level attention.
  • Minimize extractor loss and abstracter loss.

[Figure: article → Extractor + Abstracter (trained jointly) → summary]

slide-47
SLIDE 47


Training Procedures

  • 3. End-to-end training with inconsistency loss
  • The sentence-level attention is soft attention and is combined with the word-level attention.
  • Minimize extractor loss, abstracter loss, and inconsistency loss.

[Figure: article → Extractor + Abstracter (trained jointly with inconsistency loss) → summary]
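The three training procedures differ only in which loss terms are minimized. As a tiny sketch, the end-to-end objective can be viewed as a weighted sum of the four terms listed earlier; the weights below are purely illustrative, since the slides do not show the paper's actual settings:

```python
def unified_loss(l_ext, l_abs, l_cov, l_inc, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of extractor, abstracter, coverage, and inconsistency
    losses. Setting a weight to 0 recovers the ablations: e.g. zeroing the
    last weight gives end-to-end training without the inconsistency loss."""
    return sum(w * l for w, l in zip(weights, (l_ext, l_abs, l_cov, l_inc)))
```

Procedure 2 corresponds to `weights=(1, 1, 1, 0)`, and procedure 3 to a nonzero inconsistency weight.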

slide-48
SLIDE 48

Outline

  • Motivation
  • Our Method
  • Training Procedures
  • Experiments and Results
  • Conclusion


slide-49
SLIDE 49


Experiment

Dataset – CNN/DailyMail Dataset

Article-summary pairs: Train 287,113 / Validation 13,368 / Test 11,490 (…)
Average article length ≈ 766 words; average summary length ≈ 53 words

slide-51
SLIDE 51

51

Experiment

Results – Abstractive Summarization

slide-54
SLIDE 54


Experiment

Results – Inconsistency Rate R_inc

sentence attention and word attention at time step t

inconsistency step t_inc; inconsistency rate R_inc
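A sketch of the inconsistency rate R_inc under one plausible operationalization: a decoder step t is counted as an inconsistency step t_inc when the most-attended word does not lie in the most-attended sentence, and R_inc is the fraction of such steps. The paper's exact criterion (e.g. using the top-K words) may differ; names and values are illustrative:

```python
import numpy as np

def inconsistency_rate(word_attn, sent_attn, sent_of_word):
    """word_attn: (T, M) word attention; sent_attn: (T, N) sentence attention;
    sent_of_word: (M,) sentence index of each word. Returns the fraction of
    decoder steps whose top word falls outside the top sentence."""
    inconsistent = 0
    for alpha_t, beta_t in zip(word_attn, sent_attn):
        if sent_of_word[int(np.argmax(alpha_t))] != int(np.argmax(beta_t)):
            inconsistent += 1
    return inconsistent / len(word_attn)

sent_of_word = np.array([0, 0, 1, 1])
word_attn = np.array([[0.6, 0.2, 0.1, 0.1],   # top word is in sentence 0
                      [0.1, 0.1, 0.2, 0.6]])  # top word is in sentence 1
sent_attn = np.array([[0.8, 0.2],             # sentence 0 attended: consistent
                      [0.7, 0.3]])            # sentence 0 attended: inconsistent
print(inconsistency_rate(word_attn, sent_attn, sent_of_word))  # → 0.5
```

A model trained with the inconsistency loss should drive this rate down relative to the baseline.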


slide-58
SLIDE 58


Experiment

Results – Human Evaluation on MTurk

  • Informativity: how well does the summary capture the important parts of the article?
  • Conciseness: is the summary clear enough to explain everything without being redundant?
  • Readability: how well-written (fluent and grammatical) is the summary?

trap question (quality control)

slide-59
SLIDE 59

Experiment

Results – Human Evaluation
slide-60
SLIDE 60

Outline

  • Motivation
  • Our Method
  • Training Procedures
  • Experiments and Results
  • Conclusion


slide-61
SLIDE 61


Conclusion and Future work

Conclusion

  • We propose a unified model combining the strengths of extractive and abstractive summarization.
  • A novel inconsistency loss function is introduced to penalize inconsistency between the two levels of attention. The inconsistency loss enables extractive and abstractive summarization to be mutually beneficial.
  • By end-to-end training of our model, we achieve the best ROUGE scores while producing the most informative and readable summaries on the CNN/Daily Mail dataset, as confirmed by a solid human evaluation.

slide-62
SLIDE 62


Acknowledgements

Min Sun, Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang

slide-63
SLIDE 63

Q & A


Project page

  • Code
  • Test output
  • Supplementary material

https://hsuwanting.github.io/unified_summ/