 
More is Less? Non-parametric Language Models and Efficiency
Graham Neubig
Carnegie Mellon University
Based on work by Junxian He, with Taylor Berg-Kirkpatrick

Parametric Language Models

The parametric LM generates one token at a time; each predicted token is appended to the input for the next step.

Input:  <s> I ordered a pizza with sauce .
Output: I ordered a pizza with sauce . </s>
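
The token-by-token build on this slide can be sketched as a simple generation loop. This is only a toy illustration: the `NEXT_TOKEN` lookup table and the `generate` function are hypothetical stand-ins for a trained parametric model, which would instead compute a next-token distribution from its learned parameters and the whole prefix.

```python
# Hypothetical stand-in for a parametric LM: a fixed next-token table.
# A real model would score every vocabulary item given the full prefix.
NEXT_TOKEN = {
    "<s>": "I",
    "I": "ordered",
    "ordered": "a",
    "a": "pizza",
    "pizza": "with",
    "with": "sauce",
    "sauce": ".",
    ".": "</s>",
}


def generate(max_len=20):
    """Start from <s> and append one predicted token per step until </s>."""
    prefix = ["<s>"]
    while prefix[-1] != "</s>" and len(prefix) < max_len:
        prefix.append(NEXT_TOKEN[prefix[-1]])
    return prefix


tokens = generate()
print(" ".join(tokens))  # prints "<s> I ordered a pizza with sauce . </s>"
```

Each iteration corresponds to one slide build: the output so far becomes the input for the next prediction.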

Non-Parametric Language Models

The non-parametric LM additionally conditions on a prototype sentence drawn from a non-parametric datastore (typically the training dataset).

Prototype sentence: I ordered a burger with fries
Input:  <s> I ordered a pizza with sauce .
Output: I ordered a pizza with sauce . </s>
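
As a rough illustration of the datastore lookup only, the sketch below picks the stored sentence that shares the most tokens with a query. The Jaccard similarity, the `DATASTORE` contents (apart from the slide's burger sentence), and the function names are all assumptions made for this example; the slides do not specify how the prototype is selected.

```python
def jaccard(a, b):
    """Token-set overlap between two whitespace-tokenized sentences."""
    a_tokens, b_tokens = set(a.split()), set(b.split())
    return len(a_tokens & b_tokens) / len(a_tokens | b_tokens)


# Hypothetical datastore; in practice this is typically the training dataset.
DATASTORE = [
    "I ordered a burger with fries",
    "The meeting starts at noon",
    "She ordered tea",
]


def retrieve_prototype(query, datastore):
    """Return the stored sentence most similar to the query (assumed metric)."""
    return max(datastore, key=lambda sentence: jaccard(query, sentence))


prototype = retrieve_prototype("I ordered a pizza with sauce .", DATASTORE)
print(prototype)  # prints "I ordered a burger with fries"
```

Note that the lookup scans every stored sentence, so its cost grows with the size of the datastore.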

A Human Processing Analogy

• Our Language Ability/Knowledge
• Our Language Ability/Knowledge + Wikipedia/Google

A (Very Incomplete) Survey of Neural Non-parametric Models for NLP

• Count-based non-parametric data stores:
  • Neural Machine Translation with External Phrase Memory. Tang et al. 2016.
  • Incorporating Discrete Translation Lexicons into Neural Machine Translation. Arthur et al. 2016.
  • Generalizing and Hybridizing Count-based and Neural Language Models. Neubig and Dyer 2017.
• Bias probabilities based on count-based aggregation of retrieved sentences:
  • Guiding Neural Machine Translation with Retrieved Translation Pieces. Zhang et al. 2018.
  • Deep Weighted Averaging Classifiers. Card et al. 2019.
  • Nearest Neighbor Machine Translation. Khandelwal et al. 2020.
• Attend to retrieved sentences:
  • Search Engine Guided Non-Parametric Neural Machine Translation. Gu et al. 2018.
  • Generating Sentences by Editing Prototypes. Guu et al. 2018.
  • REALM: Retrieval-Augmented Language Model Pre-Training. Guu et al. 2020.
• Feed retrieved states into model:
  • Learning to Remember Rare Events. Kaiser et al. 2017.
• Fine-tune models on retrieved sentences:
  • One Sentence One Model for Neural Machine Translation. Li et al. 2016.
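
To make the second family above (biasing probabilities with counts aggregated from retrieved examples) more concrete, here is a minimal sketch of one common way to do it: linearly interpolating the model's distribution with a normalized count distribution. The function name, the weight `lam`, and the toy numbers are assumptions for illustration; none of the cited papers is being reproduced here.

```python
from collections import Counter


def interpolate(p_model, retrieved_next_tokens, lam=0.25):
    """Mix a parametric distribution with counts from retrieved examples.

    p_model: dict mapping token -> probability from the parametric model.
    retrieved_next_tokens: tokens observed in the retrieved examples.
    lam: assumed interpolation weight on the count-based distribution.
    """
    if not retrieved_next_tokens:
        return dict(p_model)  # nothing retrieved: fall back to the model
    counts = Counter(retrieved_next_tokens)
    total = sum(counts.values())
    vocab = set(p_model) | set(counts)
    return {
        w: (1 - lam) * p_model.get(w, 0.0) + lam * counts[w] / total
        for w in vocab
    }


# Toy numbers, chosen only for illustration.
p_model = {"pizza": 0.5, "burger": 0.3, "salad": 0.2}
retrieved = ["burger", "burger", "pizza", "burger"]
p_mixed = interpolate(p_model, retrieved)
```

With these toy inputs the retrieved evidence raises the probability of "burger" from 0.3 to 0.4125, while the mixture still sums to one.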

Non-Parametric Language Models: Pros/Cons

Pros
• Mitigate pressure on the parametric models
• Improved interpretability in the modeling process

Cons
• In almost all cases, a large non-parametric datastore leads to significant issues with memory and speed efficiency at test time
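
A back-of-envelope calculation gives a feel for the memory side of this trade-off. The sketch assumes a datastore that keeps one dense vector per entry; the entry count, vector dimension, and 2-byte precision below are hypothetical values picked for the arithmetic, not figures from the talk.

```python
def datastore_bytes(num_entries, dim, bytes_per_value):
    """Raw size of a datastore holding one dense vector per entry."""
    return num_entries * dim * bytes_per_value


# Hypothetical configuration: 100M entries, 1024-dim vectors, 2 bytes each.
size = datastore_bytes(num_entries=100_000_000, dim=1024, bytes_per_value=2)
print(f"{size / 1e9:.1f} GB")  # prints "204.8 GB"
```

The size scales linearly with the number of entries, and so does a naive exhaustive search over them at test time.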

A Concrete Example: Prototype-based Language Models (Guu et al. TACL 2018)