A Neural Attention Model for Abstractive Sentence Summarization


1. A Neural Attention Model for Abstractive Sentence Summarization. Alexander Rush, Sumit Chopra, Jason Weston. Facebook AI Research / Harvard SEAS.

2. Sentence Summarization. Source: "Russian Defense Minister Ivanov called Sunday for the creation of a joint front for combating global terrorism." Target: "Russia calls for joint front against terrorism." Summarization phenomena: generalization, deletion, paraphrase.

3. Types of Sentence Summary [not standardized]. Compressive: deletion only (a subset of the source words in order, e.g. from "Russian Defense Minister Ivanov called Sunday for the creation of a joint front for combating global terrorism."). Extractive: deletion and reordering. Abstractive: arbitrary transformation, e.g. "Russia calls for joint front against terrorism."

4. Elements of Human Summary [Jing 2002].
   Phenomenon                           Abstract  Compress  Extract
   (1) Sentence reduction                  ✓         ✓         ✓
   (2) Sentence combination                ✓         ✓         ✓
   (3) Syntactic transformation            ✓         ✓
   (4) Lexical paraphrasing                ✓
   (5) Generalization or specification     ✓
   (6) Reordering                          ✓                   ✓

5. Related Work: Ext/Abs Sentence Summary. Syntax-based [Dorr, Zajic, and Schwartz 2003; Cohn and Lapata 2008; Woodsend, Feng, and Lapata 2010]. Topic-based [Zajic, Dorr, and Schwartz 2004]. Machine translation-based [Banko, Mittal, and Witbrock 2000]. Semantics-based [Liu et al. 2015].

6. Related Work: Attention-Based Neural MT [Bahdanau, Cho, and Bengio 2014]. Uses attention ("soft alignment") over the source to determine the next word. Robust to longer sentences compared with encoder-decoder style models. No explicit alignment step; trained end-to-end.

7. A Neural Attention Model for Summarization. Question: can a data-driven model capture the abstractive phenomena necessary for summarization without explicit representations? Properties: uses a simple attention-based neural conditional language model; no syntax or other pipelining step, strictly data-driven; generation is fully abstractive.

8. Attention-Based Summarization (ABS). [Figure.]

9. Model

10. Summarization Model. Notation: x, the source sentence of length M; y, the summarized sentence of length N (we assume N is given), with M >> N.
Past work, noisy-channel summary [Knight and Marcu 2002]:
  argmax_y log p(y | x) = argmax_y log p(y) p(x | y)
Neural machine translation, direct neural-network parameterization:
  p(y_{i+1} | y_c, x; θ) ∝ exp(NN(x, y_c; θ))
where y_{i+1} is the current word and y_c is the context. Most neural MT is non-Markovian, i.e. y_c is the full history (RNN, LSTM) [Kalchbrenner and Blunsom 2013; Sutskever, Vinyals, and Le 2014; Bahdanau, Cho, and Bengio 2014].
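Under a Markov assumption where y_c is only the previous C words, the score of a full summary factorizes over these per-word conditionals, so training reduces to minimizing the negative log-likelihood over source-summary pairs with mini-batch SGD:

  log p(y | x; θ) ≈ Σ_{i=0}^{N-1} log p(y_{i+1} | y_c, x; θ)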

11. Feed-Forward Neural Language Model [Bengio et al. 2003]. [Network diagram: context words y_c → embeddings E → hidden layer U → output layer V.]
  ỹ_c = [E y_{i-C+1}, ..., E y_i],  h = tanh(U ỹ_c),  p(y_{i+1} | y_c, x; θ) ∝ exp(V h).

12. Feed-Forward Neural Language Model with a source encoder: the same network plus an encoder term W src(x, y_c) added to the output scores.
  ỹ_c = [E y_{i-C+1}, ..., E y_i],  h = tanh(U ỹ_c),  p(y_{i+1} | y_c, x; θ) ∝ exp(V h + W src(x, y_c)).
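To make these two equations concrete, here is a minimal numpy sketch of the conditional distribution. The dimensions, initialization, and function name are made up for illustration; src_score stands in for W src(x, y_c) from whichever encoder is plugged in below.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sizes, not the paper's settings.
V_SIZE, D, H, C = 1000, 64, 128, 5   # vocab, embedding dim, hidden dim, context size

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(D, V_SIZE))     # context-word embedding matrix
U = rng.normal(scale=0.1, size=(H, C * D))      # context-to-hidden weights
V = rng.normal(scale=0.1, size=(V_SIZE, H))     # hidden-to-output weights

def nnlm_dist(y_c, src_score):
    """p(y_{i+1} | y_c, x): feed-forward NLM plus an encoder term.

    y_c       : list of C context word ids
    src_score : length-V_SIZE vector W src(x, y_c) from the encoder
                (all zeros recovers the plain Bengio-style LM)
    """
    y_tilde = np.concatenate([E[:, w] for w in y_c])   # [E y_{i-C+1}, ..., E y_i]
    h = np.tanh(U @ y_tilde)                           # h = tanh(U y~_c)
    return softmax(V @ h + src_score)                  # ∝ exp(V h + W src(x, y_c))
```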

13. Source Model 1: Bag-of-Words (src1).
  x̃ = [F x_1, ..., F x_M],  p = [1/M, ..., 1/M]  [uniform distribution],  src1(x, y_c) = p^⊤ x̃.
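Continuing the sketch above (reusing rng, D, V_SIZE), a possible src1. Because p is fixed and uniform, this encoder ignores both word order and the decoder context y_c; the returned length-D vector would be mapped to vocabulary scores by a hypothetical W of shape (V_SIZE, D).

```python
F = rng.normal(scale=0.1, size=(D, V_SIZE))    # source-word embedding matrix

def src1_bow(x_ids):
    """Bag-of-words encoder: uniform average of source embeddings, p^T x~."""
    x_tilde = np.stack([F[:, w] for w in x_ids])   # M x D, rows [F x_1, ..., F x_M]
    p = np.full(len(x_ids), 1.0 / len(x_ids))      # uniform distribution over source
    return p @ x_tilde                             # p^T x~  (a length-D vector)
```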

14. Source Model 2: Convolutional Model. [Network diagram.]

15. Source Model 3: Attention-Based Model (src3).
  x̃ = [F x_1, ..., F x_M],  ỹ'_c = [G y_{i-C+1}, ..., G y_i],
  p ∝ exp(x̃ P ỹ'_c)  [attention distribution],
  x̄_i = Σ_{q=i-(Q-1)/2}^{i+(Q-1)/2} x̃_q / Q  for all i  [local smoothing],
  src3(x, y_c) = p^⊤ x̄.
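Continuing the numpy sketch, a hypothetical src3. G, P, and the window size Q are illustrative; at the source edges this sketch simply truncates the smoothing window and averages what remains, which is one of several reasonable boundary conventions.

```python
G = rng.normal(scale=0.1, size=(D, V_SIZE))        # summary-side embedding matrix
P = rng.normal(scale=0.1, size=(D, C * D))         # attention weight matrix

def src3_attention(x_ids, y_c, Q=3):
    """Attention-based encoder: soft alignment over the source,
    followed by local smoothing of the attended embeddings."""
    x_tilde = np.stack([F[:, w] for w in x_ids])           # M x D
    y_prime = np.concatenate([G[:, w] for w in y_c])       # C*D context embedding
    p = softmax(x_tilde @ P @ y_prime)                     # p ∝ exp(x~ P y~'_c)
    half = (Q - 1) // 2
    x_bar = np.stack([x_tilde[max(0, i - half):i + half + 1].mean(axis=0)
                      for i in range(len(x_ids))])         # local smoothing window
    return p @ x_bar                                       # src3 = p^T x_bar
```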

16. ABS Example. Decoding step by step: brackets mark the context window y_c, and the word after the arrow is the prediction y_{i+1} given source x. [⟨s⟩ Russia calls] → for

17. ABS Example: [⟨s⟩ Russia calls for] → joint

18. ABS Example: [⟨s⟩ Russia calls for joint] → front

19. ABS Example: ⟨s⟩ [Russia calls for joint front] → against

20. ABS Example: ⟨s⟩ Russia [calls for joint front against] → terrorism

21. ABS Example: ⟨s⟩ Russia calls [for joint front against terrorism] → .


22. Headline Generation Training Set [Graff et al. 2003; Napoles, Gormley, and Van Durme 2012]. Uses the Gigaword dataset:
   Total sentences              3.8 M
   Newswire services            7
   Source word tokens           119 M
   Source word types            110 K
   Average source length        31.3 tokens
   Summary word tokens          31 M
   Summary word types           69 K
   Average summary length       8.3 tokens
   Average overlap              4.6 tokens
   Average overlap in first 75  2.6 tokens
Compare with [Filippova and Altun 2013]: 250K compressive pairs (although Filippova et al. 2015 use 2 million). Training is done with mini-batch stochastic gradient descent.

23. Generation: Beam Search. Example beam hypotheses: "russia calls for joint", "defense minister calls joint", "joint front calls terrorism", "russia calls for terrorism", ... The Markov assumption allows for hypothesis recombination, as in the sketch below.
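A minimal sketch of such a decoder; the cond_dist callback, beam width, and recombination bookkeeping are assumptions for illustration. Because the model conditions only on the last C words, two hypotheses ending in the same context can be merged, keeping the higher-scoring one.

```python
import heapq

def beam_search(cond_dist, x, N, C=5, beam=5, bos=0):
    """Beam-search a length-N summary under a C-word-context model.

    cond_dist(y_c, x) -> length-V vector of next-word log-probabilities.
    """
    hyps = {(bos,) * C: (0.0, [])}                  # context -> (score, words so far)
    for _ in range(N):
        candidates = {}
        for ctx, (score, words) in hyps.items():
            logp = cond_dist(list(ctx), x)
            for w in np.argsort(logp)[-beam:]:       # expand top words only
                w = int(w)
                new_ctx = ctx[1:] + (w,)
                cand = (score + logp[w], words + [w])
                # hypothesis recombination: keep the best per context
                if new_ctx not in candidates or cand[0] > candidates[new_ctx][0]:
                    candidates[new_ctx] = cand
        # prune back down to the beam width
        hyps = dict(heapq.nlargest(beam, candidates.items(), key=lambda kv: kv[1][0]))
    return max(hyps.values())[1]
```

For instance, cond_dist could be lambda y_c, x: np.log(nnlm_dist(y_c, W @ src3_attention(x, y_c))) for a hypothetical output matrix W of shape (V_SIZE, D), tying this to the sketches above.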

24. Extension: Extractive Tuning. Low-dimensional word embeddings are unaware of exact matches. Log-linear parameterization:
  p(y | x; θ, α) ∝ exp(α^⊤ Σ_{i=0}^{N-1} f(y_{i+1}, x, y_c)).
Features f: (1) model score (neural model), (2) unigram overlap, (3) bigram overlap, (4) trigram overlap, (5) word out-of-order. Similar to the rare-word issue in neural MT [Luong et al. 2015]. α is estimated with MERT as a post-processing step (not end-to-end).
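A sketch of how the five features might be computed over whole sequences; the exact feature definitions (in particular the out-of-order count) are assumptions, not the paper's code.

```python
def tuning_features(y, x, model_score):
    """f(y, x): neural model score plus n-gram overlap and ordering features."""
    def ngrams(seq, n):
        return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
    feats = [model_score]                                  # (1) neural model score
    for n in (1, 2, 3):                                    # (2)-(4) n-gram overlap
        feats.append(len(ngrams(y, n) & ngrams(x, n)))
    # position of each source word (later occurrences overwrite earlier ones)
    pos = {w: i for i, w in enumerate(x)}
    out_of_order = sum(
        1
        for i in range(len(y)) for j in range(i + 1, len(y))
        if y[i] in pos and y[j] in pos and pos[y[i]] > pos[y[j]]
    )
    feats.append(out_of_order)                             # (5) word out-of-order
    return np.array(feats)

# The reranking score is then alpha @ tuning_features(y, x, model_score).
```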

25. Results
