Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks


1. Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks
Aishwarya Jadhav, Indian Institute of Science, Bangalore, India
Vaibhav Rajan, School of Computing, National University of Singapore

2. Extractive Summarization
• Select salient sentences from the input document to create a summary.
• Supervised extractive summarization for single-document inputs.
[Figure: INPUT: document with sentences S1, S2, ..., Sn; OUTPUT: summary S_{i1}, ..., S_{im}, with 1 ≤ i_k ≤ n]
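As a minimal sketch of this setting (the function name and scores below are illustrative, not part of SWAP-NET), producing the summary once per-sentence saliency scores exist is a top-m pick kept in document order:

```python
# Minimal sketch of the output side of extractive summarization: given
# per-sentence saliency scores (made up here), pick the m highest-scoring
# sentences and return them in document order.
def extract_summary(sentences, scores, m):
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:m]
    return [sentences[i] for i in sorted(top)]

doc = ["s1 ...", "s2 ...", "s3 ...", "s4 ..."]
print(extract_summary(doc, scores=[0.1, 0.7, 0.2, 0.9], m=2))
# -> ['s2 ...', 's4 ...']
```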

3. Our Contribution
SWAP-NET: a deep learning architecture for training an extractive summarizer.
• Unlike previous methods, SWAP-NET uses keywords for sentence selection.
• Predicts both important words and important sentences in the document.
• Two-level encoder-decoder attention model.
• Outperforms state-of-the-art extractive summarizers.
[Figure: INPUT: document with sentences S1, S2, ..., Sn; OUTPUT: summary S_{i1}, ..., S_{im}, with 1 ≤ i_k ≤ n]

4–7. Extractive Summarization Methods (shown as a progressive build in the deck)
Recent extractive summarization methods:
• NN (Cheng and Lapata, 2016): pre-trained word embeddings → sentence encoding w.r.t. the words in it → sentence encodings w.r.t. other sentences → sentence label prediction (with a decoder).
• SummaRuNNer (Nallapati et al., 2017): pre-trained word embeddings → word encodings w.r.t. other words → sentence encoding w.r.t. the words in it → sentence encodings w.r.t. other sentences → document encoding w.r.t. its sentences → sentence label prediction.
• Both assume that the saliency of a sentence s depends only on the salient sentences appearing before s.

Jianpeng Cheng and Mirella Lapata. 2016. Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.
Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In Association for the Advancement of Artificial Intelligence, pages 3075–3081.

8. Intuition Behind the Approach
Question: Which sentences should be considered salient (included in the summary)?
• Our hypothesis: the saliency of a sentence depends on both the salient sentences and the salient words appearing before it in the document.
• Similar to the graph-based models of Wan et al. (2007).
• Along with labelling sentences, we also label words to determine their saliency.
• Moreover, the saliency of a word depends on the preceding salient words and sentences.

Xiaojun Wan, Jianwu Yang, and Jianguo Xiao. 2007. Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 552–559.

9. Intuition Behind the Approach
Three types of interactions:
• Sentence–sentence interaction
• Word–word interaction
• Sentence–word interaction

10. Intuition: Interaction Between Sentences
A sentence should be salient if it is heavily linked with other salient sentences.
[Figure: graph over sentences S1–S3 and words V1–V6, highlighting sentence–sentence links]

11. Intuition: Interaction Between Words
A word should be salient if it is heavily linked with other salient words.
[Figure: graph over sentences S1–S3 and words V1–V6, highlighting word–word links]

12. Intuition: Word–Sentence Interaction
• A sentence should be salient if it contains many salient words.
• A word should be salient if it appears in many salient sentences.
[Figure: graph over sentences S1–S3 and words V1–V6, highlighting sentence–word links]

13. Intuition: Word–Sentence Interaction
Generate the extractive summary using both important words and important sentences.
[Figure: graph over S1–S3 and V1–V6 showing all three interaction types. Important sentence: S3. Important words: V2, V3]
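These three interactions can be illustrated with a toy mutual-reinforcement iteration in the spirit of Wan et al. (2007); this is a hypothetical sketch of the intuition only, not SWAP-NET's mechanism, and the incidence matrix below is made up:

```python
import numpy as np

# Toy mutual-reinforcement sketch: a sentence is salient if it contains
# salient words, and a word is salient if it appears in salient sentences.
# A[k, i] = 1 if word V(i+1) appears in sentence S(k+1) (assumed data).
A = np.array([[1, 1, 0, 0, 1, 0],
              [0, 1, 0, 1, 0, 1],
              [0, 1, 1, 1, 0, 0]])

s = np.ones(3) / 3   # sentence saliency scores
w = np.ones(6) / 6   # word saliency scores
for _ in range(50):  # iterate until (approximately) converged
    s = A @ w        # sentences reinforced by their words
    w = A.T @ s      # words reinforced by their sentences
    s /= s.sum()     # normalize to keep scores bounded
    w /= w.sum()

# Indices of the most salient sentence and the two most salient words:
print(np.argmax(s), np.argsort(w)[-2:])
```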

14. Keyword Extraction and Sentence Extraction
• Sentence-to-sentence interaction is modelled as sentence extraction.
• Word-to-word interaction is modelled as word extraction.
• For discrete sequences, pointer networks have been successfully used to learn to select positions from an input sequence.
• We use two pointer networks: one at the word level and another at the sentence level.

15. Pointer Network
Pointer network (Vinyals et al., 2015):
• Encoder-decoder architecture with attention.
• The attention mechanism is used to select one of the inputs at each decoding step.
• Thus, effectively pointing to an input.
[Figure: encoder states e1–e4 over inputs x1–x4, decoder states d1–d2; attention vectors point to output indices (R): 2, 3]

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.
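A minimal sketch of one pointer-network decoding step, assuming the additive attention of Vinyals et al. (2015); all tensor shapes and parameter names here are illustrative:

```python
import torch
import torch.nn.functional as F

# One decoding step of a pointer network: additive attention over the
# encoder states, where the attention distribution itself is the output,
# i.e. a pointer to an input position.
def pointer_step(enc, dec, W1, W2, v):
    # enc: (n, h) encoder states e1..en; dec: (h,) current decoder state
    scores = v @ torch.tanh(W1 @ enc.T + (W2 @ dec).unsqueeze(1))  # (n,)
    attn = F.softmax(scores, dim=0)  # distribution over input positions
    return attn, attn.argmax()       # "pointing" = picking an input index

h, n = 8, 4
enc, dec = torch.randn(n, h), torch.randn(h)
W1, W2, v = torch.randn(h, h), torch.randn(h, h), torch.randn(h)
attn, idx = pointer_step(enc, dec, W1, W2, v)
```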

16. Three Interactions
[Figure: sentence–sentence interactions are handled by a sentence-level pointer network and word–word interactions by a word-level pointer network; the sentence–word interaction is marked with a '?']

17. Three Interactions: SWAP-NET
[Figure: sentence-level pointer network (sentence–sentence), word-level pointer network (word–word), and a mechanism to combine word-level and sentence-level attentions and generate the summary (sentence–word)]

18. Questions
Q1: How can the two attentions be combined?
Q2: How can the summaries be generated considering both attentions?
[Figure: the combination mechanism from slide 17 with Q1 and Q2 marked '?']

19. Three Interactions: SWAP-NET
[Figure: recap of the interaction diagram from slide 16]

20. SWAP-NET Architecture: Word-Level Pointer Network
Similar to a pointer network:
• The word encoder is a bi-directional LSTM.
• The word-level decoder learns to point to important words.
[Figure: word encoder states E^W_1–E^W_5 over words w1–w5, and word decoder states D^W_1–D^W_3]
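A sketch of such a word encoder in PyTorch, with illustrative dimensions (the slide does not give SWAP-NET's actual hyperparameters):

```python
import torch
import torch.nn as nn

# Bi-directional LSTM word encoder, as described on the slide.
# Vocabulary size and dimensions are made up for illustration.
emb = nn.Embedding(num_embeddings=10000, embedding_dim=100)
encoder = nn.LSTM(input_size=100, hidden_size=64,
                  bidirectional=True, batch_first=True)

word_ids = torch.randint(0, 10000, (1, 5))  # one document, 5 words
enc_states, _ = encoder(emb(word_ids))      # (1, 5, 128)
# enc_states[:, i] plays the role of E^W_{i+1}: the state the word-level
# decoder attends over when pointing at important words.
```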

21. SWAP-NET Architecture: Word-Level Pointer Network
• The word attention gives the probability of word i at decoding step j.
• Purple line: the attention vector given as input to each decoding step.
• It is the sum of the word encodings weighted by the attention probabilities generated in the previous step, i.e. the input to step j is Σ_i α_{j−1}(i) · E^W_i, where α is the word attention.
[Figure: word attention over w1–w5 at each decoding step; the word attention vector is fed back into the decoder]
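A small sketch of that feedback step, with made-up shapes:

```python
import numpy as np

# Attention-vector feedback described above: the input to decoding step j
# is the sum of the word encodings weighted by the attention
# probabilities from step j-1. Shapes are illustrative.
E = np.random.randn(5, 128)                        # word encodings E^W_1..E^W_5
alpha_prev = np.array([0.1, 0.6, 0.1, 0.1, 0.1])   # word attention at step j-1
attn_vector = alpha_prev @ E                       # (128,) input to step j
```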

22. Three Interactions: SWAP-NET
[Figure: recap of the interaction diagram from slide 16]

23. SWAP-NET Architecture: Sentence-Level Hierarchical Pointer Network
A sentence is represented by the encoding of the last word of that sentence.
[Figure: sentence encoder states E^S_1–E^S_2 over sentences s1–s2 and sentence decoder states D^S_1–D^S_3, stacked on top of the word-level encoder and decoder over w1–w5]
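A sketch of this representation, assuming a 5-word document split into two sentences (the split itself is illustrative):

```python
import torch

# Each sentence is represented by the encoder state of its last word.
# Assumed split for illustration: s1 = w1..w3, s2 = w4 w5.
enc_states = torch.randn(1, 5, 128)       # word encodings E^W_1..E^W_5
last_word_idx = torch.tensor([2, 4])      # 0-based index of each sentence's last word
sent_repr = enc_states[0, last_word_idx]  # (2, 128): inputs to the sentence encoder
```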

24. SWAP-NET Architecture: Sentence-Level Hierarchical Pointer Network
• The sentence attention α^s gives the probability of sentence k at decoding step j.
• The attention vectors are sums of the sentence encodings weighted by the attention probabilities from the previous decoding step, i.e. the input to step j is Σ_k α^s_{j−1}(k) · E^S_k.
[Figure: sentence attention vector fed back into the sentence decoder, above the word-level network]

25. Combining Sentence Attention and Word Attention
Q1: How can the two attentions be combined?
[Figure: a document with three sentences S1–S3 and their words (drawn from V1–V6, some repeated)]
A document with three sentences and corresponding words is shown.

26. Sentence and Word Interactions
Possible solution, step 1: hold sentence processing; group all the words together and determine their saliency sequentially.
[Figure: the same document, with all words grouped ahead of the sentences]

27. Sentence and Word Interactions
Possible solution, step 2: using the output of step 1, i.e., the keywords, process the sentences to determine the salient sentences.
[Figure: the sentences processed after all the words]
INCOMPLETE SOLUTION: this method makes sentence processing depend on the words, but does not use the sentences when processing the words.

28. Sentence and Word Interactions
Solution: group each sentence with its own words, and process the groups sequentially.
[Figure: the document reordered as the words of S1, then S1, then the words of S2, then S2, and so on]

29. Sentence and Word Interactions
Step 1: hold sentence processing; determine the saliency of the words in S1 (the remaining groups follow the same alternating pattern; see the sketch below).
[Figure: the words of S1 processed first, with sentence processing on hold]
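A toy sketch of the resulting processing order; the word groupings below are assumed, since the slide figure does not fully specify them:

```python
# Processing order implied by slides 26-29: each sentence's words are
# handled first, then the sentence itself, before moving on to the next
# group. The sentence/word groupings here are purely illustrative.
doc = [("S1", ["V1", "V2", "V2", "V5"]),
       ("S2", ["V4", "V6", "V4"]),
       ("S3", ["V3", "V2"])]

order = []
for sent, words in doc:
    order.extend(words)  # step 1: decide saliency of the sentence's words
    order.append(sent)   # step 2: decide saliency of the sentence itself
print(order)
# ['V1', 'V2', 'V2', 'V5', 'S1', 'V4', 'V6', 'V4', 'S2', 'V3', 'V2', 'S3']
```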
