Text Generation with Exemplar-based Adaptive Decoding

  1. Text Generation with Exemplar-based Adaptive Decoding Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das @NAACL June 4, 2019

  2. Outline ❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

  3. Conditioned Text Generation
     Source $x$: A Portuguese train derailed in Oporto on Wednesday, killing three people.
     Target $y$: Portuguese train derailed, killing three.

  4. Conditioned Text Generation
     Source $x$: A Portuguese train derailed in Oporto on Wednesday, killing three people.
     Encoder: $\mathrm{Enc}(x)$
     Attention + copy: $\alpha$
     Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}(x))$
     Target $y$: Portuguese train derailed, killing three.

  5. Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)
     Source $x$: A Portuguese train derailed in Oporto on Wednesday, killing three people
     Exemplar $z$ (retrieved training target): Two die in a Britain train collision
     Goal $y$: Three die in a Portuguese train derailment

  6. Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)
     Source $x$ (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people
     Exemplar $z$ (how to say it; retrieved training target): Two die in a Britain train collision
     Goal $y$: Three die in a Portuguese train derailment
     Motivation
     • Better performance (Cao et al., 2018; Zhang et al., 2018)
     • Diversity and interpretability (Guu et al., 2017; Wiseman et al., 2018)

  7. Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)
     Source $x$ (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people
     Exemplar $z$ (how to say it; retrieved training target): Two die in a Britain train collision
     Training pairs: $(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), \ldots$
     Goal $y$: Three die in a Portuguese train derailment

  8. Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)
     Source $x$ (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people
     Exemplar $z$ (how to say it): Two die in a Britain train collision, the training target of a training pair whose source is similar to $x$
     Training pairs: $(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)$
     Goal $y$: Three die in a Portuguese train derailment

  10. Status Quo (Guu et al., 2017; Cao et al., 2018)
      Source $x$: A Portuguese train derailed in Oporto on Wednesday, killing three people.
      Exemplar $z$: Two die in a Britain train collision.
      Encoder over the concatenation: $\mathrm{Enc}([x; z])$
      Goal $y$: Three die in a Portuguese train derailment

  11. Status Quo (Guu et al., 2017; Cao et al., 2018)
      Source $x$: A Portuguese train derailed in Oporto on Wednesday, killing three people.
      Exemplar $z$: Two die in a Britain train collision.
      Encoder over the concatenation: $\mathrm{Enc}([x; z])$
      Attention + copy: $\alpha$
      Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$
      Goal $y$: Three die in a Portuguese train derailment

  12. Status Quo (Guu et al., 2017; Cao et al., 2018)
      Source $x$: A Portuguese train derailed in Oporto on Wednesday, killing three people.
      Exemplar $z$: Two die in a Britain train collision.
      Encoder (what to say): $\mathrm{Enc}([x; z])$
      Attention + copy: $\alpha$
      Decoder (how to say it): $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$
      Goal $y$: Three die in a Portuguese train derailment

  13. Status Quo (Guu et al., 2017; Cao et al., 2018)
      Source $x$: A Portuguese train derailed in Oporto on Wednesday, killing three people.
      Exemplar $z$: Two die in a Britain train collision.
      Encoder over the concatenation: $\mathrm{Enc}([x; z])$
      Attention + copy: $\alpha$
      Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$
      Output: Britain train derailed, killing two.

  14. Overview
      Model: $P(y \mid \text{source } x, \text{exemplar } z)$
      Motivation
      • Encoder: what to say
      • Decoder: how to say it
      Method: Adaptive Decoding ($\mathrm{AdaDec}_z$)
      • Exemplar-specific decoder

  15. Overview
      Model: $P(y \mid \text{source } x, \text{exemplar } z)$
      Motivation
      • Encoder: what to say
      • Decoder: how to say it
      Method: Adaptive Decoding ($\mathrm{AdaDec}_z$)
      • Exemplar-specific decoder
      • Drop-in replacement in seq2seq

  16. Overview
      Model: $P(y \mid \text{source } x, \text{exemplar } z)$
      Motivation
      • Encoder: what to say
      • Decoder: how to say it
      Method: Adaptive Decoding ($\mathrm{AdaDec}_z$)
      • Exemplar-specific decoder
      • Drop-in replacement in seq2seq
      Experiments
      • Summarization
      • Data2text generation

  17. Outline ❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

  18. Adaptive Decoder ($\mathrm{AdaDec}_z$)
      Goal
      • Customized decoder for each exemplar.

  19. Adaptive Decoder ($\mathrm{AdaDec}_z$)
      Goal
      • Customized decoder for each exemplar.
      Key Points
      • Exemplar-informed interpolation of backbones.

  20. Adaptive Decoder ($\mathrm{AdaDec}_z$)
      Goal
      • Customized decoder for each exemplar.
      Key Points
      • Exemplar-informed interpolation of backbones.
      • Low-rank constraints by construction.

  21. Adaptive Decoder
      $(W_1, c_1)$, $(W_2, c_2)$, $(W_3, c_3)$

  22. Adaptive Decoder
      $(W_1, c_1)$, $(W_2, c_2)$, $(W_3, c_3)$
      $\mathrm{AdaDec}_z$: $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

  23. Adaptive Decoder
      $(W_1, c_1)$, $(W_2, c_2)$, $(W_3, c_3)$
      Exemplar $z$
      $\mathrm{AdaDec}_z$: $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

  24. Adaptive Decoder
      $(W_1, c_1)$, $(W_2, c_2)$, $(W_3, c_3)$
      $p = \mathrm{RNN}(z)$
      $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, \; p^\top c_2, \; p^\top c_3]^\top$
      $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
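
The interpolation on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the slide does not say which RNN computes $p$, so `encode_exemplar` below is a plain tanh RNN stand-in, and all sizes, names (`encode_exemplar`, `adaptive_weight`), and random parameters are hypothetical.

```python
import numpy as np

def encode_exemplar(z_embeddings, W_in, W_rec):
    """Stand-in for p = RNN(z): a plain tanh RNN (an assumption), final state returned."""
    p = np.zeros(W_rec.shape[0])
    for e in z_embeddings:
        p = np.tanh(W_in @ e + W_rec @ p)
    return p

def adaptive_weight(p, C, Ws):
    """Compute sigma_i = p^T c_i, then W = sum_i sigma_i * W_i."""
    sigma = C @ p                          # shape (m,): one coefficient per (W_i, c_i)
    W = np.tensordot(sigma, Ws, axes=1)    # shape (d, d): sigma-weighted sum of the W_i
    return sigma, W

# Toy sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
d, m, emb, length = 4, 3, 5, 6
z_embeddings = rng.normal(size=(length, emb))  # embedded exemplar tokens
W_in = 0.1 * rng.normal(size=(d, emb))
W_rec = 0.1 * rng.normal(size=(d, d))
C = rng.normal(size=(m, d))                    # rows are c_1, c_2, c_3
Ws = rng.normal(size=(m, d, d))                # W_1, W_2, W_3

p = encode_exemplar(z_embeddings, W_in, W_rec)
sigma, W = adaptive_weight(p, C, Ws)
```

Because `sigma` depends on the exemplar only through `p`, a different exemplar yields a different `W` while the `(W_i, c_i)` pairs stay shared.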

  26. Low-rank Constraints
      $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

  27. Low-rank Constraints
      $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
      Too many params!

  28. Low-rank Constraints
      $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
      $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top$

  29. Low-rank Constraints
      $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
      $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top$
      Each term $u_i v_i^\top$ has rank 1, so $\mathrm{rank}(W) \le 3$.

  30. Low-rank Constraints
      $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3 + \cdots$
      $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top + \cdots$
      With $m = d$ terms, each of rank 1, $\mathrm{rank}(W) \le d$.

  31. Low-rank Constraints
      $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3 + \cdots$
      $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top + \cdots$
      With $m = d$ terms, each of rank 1, $\mathrm{rank}(W) \le d$.
      Parameters: $O(d^3) \rightarrow O(d^2)$
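
A small NumPy check of the rank-1 construction and of the parameter count, offered as a sketch rather than the authors' code. It assumes square $d \times d$ matrices, stores the $u_i$ and $v_i$ as columns of hypothetical matrices `U` and `V`, and uses random values throughout.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
m = d                          # m = d rank-1 terms, as on the slide
U = rng.normal(size=(d, m))    # column i is u_i
V = rng.normal(size=(d, m))    # column i is v_i
sigma = rng.normal(size=m)     # exemplar-dependent coefficients
h = rng.normal(size=d)         # some decoder input vector

# W as the explicit sum of sigma_i * u_i v_i^T ...
W_sum = sum(sigma[i] * np.outer(U[:, i], V[:, i]) for i in range(m))
# ... equals U diag(sigma) V^T.
W_fact = (U * sigma) @ V.T

# W @ h can be computed without materializing W.
out_fast = U @ (sigma * (V.T @ h))

# Parameter counts: m full d x d matrices versus m pairs (u_i, v_i).
full_params = m * d * d        # d^3 when m = d
low_rank_params = m * 2 * d    # 2 d^2 when m = d
```

With `d = 8` the full version stores 512 weights and the rank-1 version 128, which is the $O(d^3) \rightarrow O(d^2)$ reduction in miniature.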

  32. Walkthrough
      Given source $x$, retrieve exemplar $z$ (a training target).

  33. Walkthrough
      Given source $x$, retrieve exemplar $z$ (a training target).
      $p = \mathrm{RNN}(z)$
      $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, \; p^\top c_2, \; p^\top c_3]^\top$

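  34. Walkthrough
      Given source $x$, retrieve exemplar $z$ (a training target).
      $p = \mathrm{RNN}(z)$
      $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, \; p^\top c_2, \; p^\top c_3]^\top$
      The coefficients $\sigma_1, \sigma_2, \sigma_3$ weight the sum that forms the decoder.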

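  35. Walkthrough
      Given source $x$, retrieve exemplar $z$ (a training target).
      $p = \mathrm{RNN}(z)$
      $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, \; p^\top c_2, \; p^\top c_3]^\top$
      The coefficients $\sigma_1, \sigma_2, \sigma_3$ weight the sum that forms $\mathrm{AdaDec}_z$, which decodes from the encoder $\mathrm{Enc}$ with attention $\alpha$.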

  36. Outline ❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

  37. Experiments: Summarization
      Datasets:
      • Gigaword (Rush et al., 2015)
      • New York Times (NYT) (Durrett et al., 2016)
      Implementation:
      • TF-IDF + cosine similarity for exemplar retrieval.
      • LSTM encoder/decoder.
      • Comparable implementation and tuning.
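
The retrieval step (TF-IDF plus cosine similarity) might look roughly like the following standard-library sketch. It is not the authors' pipeline: the tokenizer, the smoothed IDF formula, the helper names, and the three toy training pairs are all assumptions made for illustration. Only the query source and the exemplar "Two die in a Britain train collision" come from the slides; the training sources are invented.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and strip trailing commas and periods (a simplification).
    return [t.strip(".,").lower() for t in text.split() if t.strip(".,")]

def build_idf(sources):
    # Smoothed inverse document frequency over the training sources.
    n = len(sources)
    df = Counter(t for s in sources for t in set(tokenize(s)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def vectorize(text, idf):
    # Sparse TF-IDF vector; tokens unseen in training are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_exemplar(source, train_pairs):
    # Return the target of the training pair whose source is most similar.
    idf = build_idf([x for x, _ in train_pairs])
    q = vectorize(source, idf)
    scores = [cosine(q, vectorize(x, idf)) for x, _ in train_pairs]
    best = max(range(len(train_pairs)), key=scores.__getitem__)
    return train_pairs[best][1]

# Hypothetical training pairs (source, target).
train_pairs = [
    ("Two people were killed when a train collided with a car in Britain.",
     "Two die in a Britain train collision"),
    ("Stocks closed higher on Wall Street after a strong jobs report.",
     "Stocks rise on jobs report"),
    ("Heavy rain flooded several villages in the north on Friday.",
     "Floods hit northern villages"),
]
source = "A Portuguese train derailed in Oporto on Wednesday, killing three people."
exemplar = retrieve_exemplar(source, train_pairs)
```

Note that similarity is computed between sources, while the returned exemplar is the matched pair's target, matching the "Similar" and "Training target" labels on the earlier slides.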

  39. ROUGE scores on Gigaword test set
      [Bar chart. Systems: Seq2seq; AttExp (encoder and attention exemplar); AdaDec (adaptive decoding, this work); Cao, Full (rerank; Cao et al., 2018). Bar values as extracted, by metric: ROUGE-1: 37.3, 37.0, 36.0, 35.0; ROUGE-L: 34.7, 34.5, 33.2, 32.4; ROUGE-2: 19.0, 18.5, 17.1, 16.6.]

  42. ROUGE scores on NYT test set
      [Bar chart. Systems: Seq2seq; Paulus (Paulus et al., 2018); AttExp (encoder and attention exemplar); AdaDec (adaptive decoding). Bar values as extracted, by metric: ROUGE-1: 43.2, 42.9, 42.5, 41.9; ROUGE-2: 26.4, 26.0, 25.7, 25.1.]
