
More is Less? Non-parametric Language Models and Efficiency


  1. More is Less? Non-parametric Language Models and Efficiency. Graham Neubig, Carnegie Mellon University. Based on work by Junxian He w/ Taylor Berg-Kirkpatrick.

  2. Parametric Language Models. Parametric LM input: "<s>"

  3. Parametric Language Models. Parametric LM input: "<s>"; output: "I"

  4. Parametric Language Models. Parametric LM input: "<s> I"; output: "I"

  5. Parametric Language Models. Parametric LM input: "<s> I"; output: "I ordered"

  6. Parametric Language Models. Parametric LM input: "<s> I ordered"; output: "I ordered a"

  7. Parametric Language Models. Parametric LM input: "<s> I ordered a pizza with sauce ."; output: "I ordered a pizza with sauce . </s>"
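The build on the slides above, where each output token is fed back as the next input, can be sketched as a simple loop. This is only a toy illustration: the `NEXT_TOKEN` table is a hypothetical stand-in for a trained parametric LM, and it is not taken from the talk.

```python
# Hypothetical stand-in for a parametric LM: maps the last input token
# to the next output token (a real model would use the whole context).
NEXT_TOKEN = {
    "<s>": "I", "I": "ordered", "ordered": "a", "a": "pizza",
    "pizza": "with", "with": "sauce", "sauce": ".", ".": "</s>",
}

def generate(max_len=20):
    """Generate tokens one at a time, feeding each output back as input."""
    context = ["<s>"]          # the input side starts with the start symbol
    output = []
    while len(output) < max_len:
        token = NEXT_TOKEN[context[-1]]
        output.append(token)
        if token == "</s>":    # stop once the end symbol is produced
            break
        context.append(token)  # the output becomes part of the next input
    return output

print(" ".join(generate()))  # I ordered a pizza with sauce . </s>
```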

  8. Non-Parametric Language Models. Non-Parametric LM. Non-parametric datastore (typically training dataset).

  9. Non-Parametric Language Models. Non-Parametric LM. Prototype sentence: "I ordered a burger with fries". Non-parametric datastore (typically training dataset).

  10. Non-Parametric Language Models. Non-Parametric LM. Prototype sentence: "I ordered a burger with fries"; input: "<s>". Non-parametric datastore (typically training dataset).

  11. Non-Parametric Language Models. Non-Parametric LM. Prototype sentence: "I ordered a burger with fries"; input: "<s>"; output: "I". Non-parametric datastore (typically training dataset).

  12. Non-Parametric Language Models. Non-Parametric LM. Prototype sentence: "I ordered a burger with fries"; input: "<s> I ordered a pizza with sauce ."; output: "I ordered a pizza with sauce . </s>". Non-parametric datastore (typically training dataset).
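The prototype example on these slides can be mimicked with a few lines of code. This is a rough sketch under stated assumptions: the second datastore sentence, the `edit_prototype` helper, and the substitution table are all hypothetical, and the table merely stands in for whatever a learned model would do when conditioning on the prototype.

```python
# Toy datastore; on the slides this is typically the training dataset.
# The second sentence is a made-up filler entry.
DATASTORE = [
    "I ordered a burger with fries",
    "The weather is nice today",
]

def edit_prototype(prototype, substitutions):
    """Copy the prototype token by token, swapping any token that has a
    replacement in `substitutions`, then close the sentence."""
    tokens = [substitutions.get(tok, tok) for tok in prototype.split()]
    return tokens + [".", "</s>"]

prototype = DATASTORE[0]  # prototype sentence taken from the datastore
output = edit_prototype(prototype, {"burger": "pizza", "fries": "sauce"})
print(" ".join(output))  # I ordered a pizza with sauce . </s>
```

Most of the output is copied from the prototype, which is one way to picture how a datastore can take some of the load off the model itself.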

  16. A Human Processing Analogy. Our Language Ability/Knowledge, compared with Our Language Ability/Knowledge + Wikipedia/Google.

  22. A (Very Incomplete) Survey of Neural Non-parametric Models for NLP
      • Count-based non-parametric data stores:
          • Neural Machine Translation with External Phrase Memory. Tang et al. 2016.
          • Incorporating Discrete Translation Lexicons into Neural Machine Translation. Arthur et al. 2016.
          • Generalizing and Hybridizing Count-based and Neural Language Models. Neubig and Dyer 2017.
      • Bias probabilities based on count-based aggregation of retrieved sentences:
          • Guiding Neural Machine Translation with Retrieved Translation Pieces. Zhang et al. 2018.
          • Deep Weighted Averaging Classifiers. Card et al. 2019.
          • Nearest Neighbor Machine Translation. Khandelwal et al. 2020.
      • Attend to retrieved sentences:
          • Search Engine Guided Non-Parametric Neural Machine Translation. Gu et al. 2018.
          • Generating Sentences by Editing Prototypes. Guu et al. 2018.
          • REALM: Retrieval-Augmented Language Model Pre-Training. Guu et al. 2020.
      • Feed retrieved states into model:
          • Learning to Remember Rare Events. Kaiser et al. 2017.
      • Fine-tune models on retrieved sentences:
          • One Sentence One Model for Neural Machine Translation. Li et al. 2016.
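As a rough illustration of the "bias probabilities based on count-based aggregation" family in the survey, the sketch below mixes a model's next-token distribution with relative counts of tokens seen in retrieved examples. It is a generic linear interpolation, not the exact method of any paper listed; the function name and the mixing weight `lam` are assumptions made for this example.

```python
from collections import Counter

def interpolate(p_model, retrieved_next_tokens, lam=0.5):
    """Mix a parametric next-token distribution with count-based
    probabilities aggregated from retrieved examples.

    p_model: dict token -> probability from the parametric model
    retrieved_next_tokens: tokens observed in the retrieved examples
    lam: hypothetical weight given to the retrieved counts
    """
    counts = Counter(retrieved_next_tokens)
    total = sum(counts.values())
    if total == 0:             # nothing retrieved: fall back to the model
        return dict(p_model)
    vocab = set(p_model) | set(counts)
    return {
        w: (1 - lam) * p_model.get(w, 0.0) + lam * counts[w] / total
        for w in vocab
    }

p = interpolate({"pizza": 0.5, "burger": 0.5},
                ["pizza", "pizza", "salad", "pizza"])
print(sorted(p.items()))
```

With these toy numbers, "pizza" rises to 0.625 and "salad", which the model gave zero probability, receives 0.125, while the result still sums to one.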

  24. Non-Parametric Language Models: Pros/Cons
      Pros
      • Mitigate pressure on the parametric models
      • Improved interpretability in the modeling process
      Cons
      • In almost all cases, a large non-parametric datastore leads to significant issues with memory and speed efficiency at test time
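A back-of-the-envelope calculation gives a feel for the memory and speed concern listed under Cons. The sketch assumes a datastore of dense float32 vectors searched exhaustively; the entry count and dimension below are made-up illustrative values, not figures from the talk.

```python
def datastore_bytes(num_entries, dim, bytes_per_value=4):
    """Memory needed to hold one dense vector per datastore entry
    (4 bytes per value corresponds to float32)."""
    return num_entries * dim * bytes_per_value

def brute_force_multiply_adds(num_entries, dim):
    """Multiply-adds for one exhaustive dot-product search over the
    whole datastore (one query compared against every entry)."""
    return num_entries * dim

# Hypothetical sizes chosen only for illustration.
entries, dim = 100_000_000, 1024
print(datastore_bytes(entries, dim) / 1e9, "GB")
print(brute_force_multiply_adds(entries, dim), "multiply-adds per query")
```

Under these assumptions the vectors alone take roughly 410 GB, and every lookup at test time touches about 10^11 values unless some approximate search or compression is used.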

  25. A Concrete Example: Prototype-based Language Models (Guu et al. TACL 2018)
