Transfer learning with neural language models (CS 685, Spring 2020)

SLIDE 1

Transfer learning with neural language models

CS 685, Spring 2020

Advanced Natural Language Processing

Mohit Iyyer

College of Information and Computer Sciences, University of Massachusetts Amherst

many slides from Jacob Devlin & Matt Peters

SLIDE 2

Stuff from last time…

  • Project proposals due 9/21, please use the Overleaf template
  • Still working on making the next homework computationally feasible on Colab, look out for it next week
  • Please ask other questions (about logistics / material / etc.) in the chatbox!


SLIDE 3

Do NNs really need millions of labeled examples?

  • Can we leverage unlabeled data to cut down on the number of labeled examples we need?

SLIDE 4

What is transfer learning?

  • In our context: take a network trained on a task for which it is easy to generate labels, and adapt it to a different task for which it is harder.
  • In computer vision: train a CNN on ImageNet, transfer its representations to every other CV task.
  • In NLP: train a really big language model on billions of words, transfer to every NLP task!


SLIDE 5

step 1 (unsupervised pretraining): a ton of unlabeled text → a huge self-supervised model
step 2 (supervised fine-tuning): labeled reviews from IMDB → a sentiment-specialized model
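
To make the two-step recipe concrete, here is a minimal sketch of step 2 in Python. The Hugging Face transformers library and the bert-base-uncased checkpoint are assumptions for illustration, standing in for whatever model step 1 produced; the slides do not prescribe a library or checkpoint.

```python
# Step 2 (supervised fine-tuning) on a toy batch of labeled reviews.
# Assumption: a generic pretrained checkpoint stands in for the output of step 1.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)          # 2 classes: negative / positive

reviews = ["A wonderful, moving film.", "Two hours I will never get back."]
labels = torch.tensor([1, 0])                   # toy IMDB-style sentiment labels

batch = tokenizer(reviews, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=labels)             # cross-entropy loss over the labels
out.loss.backward()                             # one fine-tuning step
optimizer.step()
```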

SLIDE 6

language models for transfer learning

Deep contextualized word representations. Peters et al., NAACL 2018

SLIDE 7

Previous methods (e.g., word2vec) represent each word type with a single vector

play = [0.2, -0.1, 0.5, ...]   bank = [-0.3, 1.4, 0.7, ...]   run = [-0.5, -0.3, -0.1, ...]

NNs are then used to compose those vectors over longer sequences
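
What "one vector per word type" looks like in code: a single fixed vector, looked up regardless of context. The tiny table below reuses the vectors from this slide purely for illustration; a real word2vec model would learn them from a large corpus.

```python
import numpy as np

# toy static embedding table: one fixed vector per word *type*
embeddings = {
    "play": np.array([0.2, -0.1, 0.5]),
    "bank": np.array([-0.3, 1.4, 0.7]),
    "run":  np.array([-0.5, -0.3, -0.1]),
}

# every occurrence of "play" gets the same vector, whatever the sentence
sent1 = "the new look play area is due to be completed".split()
sent2 = "the freshman completed the three point play".split()
vec1 = embeddings["play"]            # "play" in a playground sense
vec2 = embeddings["play"]            # "play" in a basketball sense
assert np.array_equal(vec1, vec2)    # identical: context is ignored
```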

SLIDE 8

The new-look play area is due to be completed by early spring 2010 .

Single vector per word

SLIDE 9

Gerrymandered congressional districts favor representatives who play to the party base .

Single vector per word

SLIDE 10

The freshman then completed the three-point play for a 66-63 lead .

Single vector per word

SLIDE 11

Nearest neighbors

play = [0.2, -0.1, 0.5, ...]

playing, plays, game, player, games, Play, played, football, players, multiplayer
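
Neighbor lists like this come from cosine similarity in the embedding space. A minimal sketch of how such a list is computed, with random toy vectors standing in for trained word2vec embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["play", "playing", "plays", "game", "player", "football", "bank", "run"]
# toy embedding matrix standing in for trained word2vec vectors (one row per word type)
E = rng.normal(size=(len(vocab), 50))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-normalize rows

def nearest_neighbors(word, k=5):
    q = E[vocab.index(word)]
    sims = E @ q                                # cosine similarity to every word
    order = np.argsort(-sims)                   # most similar first
    return [vocab[i] for i in order if vocab[i] != word][:k]

print(nearest_neighbors("play"))
```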

SLIDES 12-14

Multiple senses entangled

play = [0.2, -0.1, 0.5, ...]

Nearest neighbors: playing, plays, game, player, games, Play, played, football, players, multiplayer

(the neighbors mix VERB, NOUN, and ADJ uses of "play", all collapsed into a single vector)

SLIDES 15-17

Examples on iPad (no other slide text)

SLIDES 18-25

Deep bidirectional language model

[Figure, built up over eight slides: stacked forward and backward LSTM layers read the context "… download new games or play ??" and predict the missing word from both directions.]
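
A minimal PyTorch sketch of what this figure depicts, with toy dimensions; the real ELMo biLM adds character convolutions and two stacked LSTM layers per direction, so treat this only as the skeleton of the idea.

```python
# Sketch of a one-layer bidirectional language model: a forward LSTM predicts
# the next word, a backward LSTM predicts the previous word. Vocab size, hidden
# size, and the random token ids are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

V, d = 10_000, 256
emb = nn.Embedding(V, d)
fwd_lstm = nn.LSTM(d, d, batch_first=True)      # reads left-to-right
bwd_lstm = nn.LSTM(d, d, batch_first=True)      # reads right-to-left
out_proj = nn.Linear(d, V)                      # softmax layer over the vocabulary

tokens = torch.randint(0, V, (1, 6))            # stands in for "… download new games or play"
x = emb(tokens)

h_fwd, _ = fwd_lstm(x)                          # h_fwd[:, t] has seen tokens 0..t
h_bwd, _ = bwd_lstm(x.flip(1))
h_bwd = h_bwd.flip(1)                           # h_bwd[:, t] has seen tokens t..end

# forward LM: predict token t+1 from h_fwd[:, t]; backward LM: predict token t-1 from h_bwd[:, t]
loss = F.cross_entropy(out_proj(h_fwd[:, :-1]).reshape(-1, V), tokens[:, 1:].reshape(-1)) + \
       F.cross_entropy(out_proj(h_bwd[:, 1:]).reshape(-1, V), tokens[:, :-1].reshape(-1))
loss.backward()
```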

SLIDE 26

Use all layers of the language model

[Figure: biLSTM layers over "… games or play online via …"; the per-layer outputs are combined with weights 0.25, 0.60, and 0.15 to form ELMo (embeddings from language models) vectors.]

SLIDE 27

Learned task-specific combination of layers

[Figure: the same biLSTM layers over "… games or play online via …", but the mixing weights s1, s2, s3 are now learned per downstream task when forming the ELMo (embeddings from language models) vectors.]
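
In the ELMo paper this mixing is ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j}, where h_{k,j} is the layer-j biLM representation of token k, s holds one learned scalar per layer, and gamma is a learned task-specific scale. A minimal sketch with toy dimensions:

```python
# Task-specific combination of frozen biLM layers (dimensions are toy assumptions).
import torch
import torch.nn as nn

L, T, d = 3, 6, 512                       # layers, tokens, hidden size
layer_reps = torch.randn(L, T, d)         # h_{k,j}: frozen biLM outputs for one sentence

s = nn.Parameter(torch.zeros(L))          # learned per-layer scalars (e.g., -> 0.25, 0.60, 0.15)
gamma = nn.Parameter(torch.ones(1))       # learned task-specific scale

weights = torch.softmax(s, dim=0)                                   # normalize layer weights
elmo = gamma * (weights[:, None, None] * layer_reps).sum(dim=0)     # (T, d): one vector per token
```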

SLIDE 28

Contextual representations

ELMo representations are contextual: they depend on the entire sentence in which a word is used.

how many different embeddings does ELMo compute for a given word?

SLIDE 29

ELMo improves NLP tasks

SLIDE 30

Large-scale recurrent neural language models learn contextual representations that capture basic elements of semantics and syntax. Adding ELMo to existing state-of-the-art models provides a significant performance improvement on all NLP tasks.

SLIDE 31

[Figure: diagram labeled "FROM" and "TO"; no other slide text.]

SLIDE 32

SLIDE 33

  • Why not?

SLIDE 34

SLIDE 35

SLIDE 36

  • What are the pros and cons of increasing k?

SLIDE 37

SLIDE 38

  • This has since been shown to be unimportant (and can be removed, e.g., in RoBERTa).

SLIDES 39-42

SLIDE 43

  • More details next week!

SLIDES 44-49