Recurrent Neural Networks
CS 6956: Deep Learning for NLP
Overview
1. Modeling sequences
2. Recurrent neural networks: An abstraction
3. Usage patterns for RNNs
4. Bidirectional RNNs
5. A concrete example: The Elman RNN
6. The vanishing gradient problem
7. Long short-term memory units
Sequences abound in NLP
- Words are sequences of characters:
  S a l t L a k e C i t y
- Sentences are sequences of words:
  John lives in Salt Lake City
- Paragraphs are sequences of sentences:
  John lives in Salt Lake City. He enjoys hiking with his dog. His cat hates hiking.
- And so on… inputs are naturally sequences at different levels.
Outputs can also be sequences
- Part-of-speech tags form a sequence:
  John lives in Salt Lake City
  Noun Verb Preposition Noun Noun Noun
Even things that don't look like a sequence can be made to look like one
- Example: Named entity tags ("John" is a Person, "Salt Lake City" is a Location), written as a BIO tag sequence and sketched in code below:
  John  lives in Salt  Lake  City
  B-PER O     O  B-LOC I-LOC I-LOC
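To make the BIO encoding concrete, here is a minimal sketch (not from the slides) of turning labeled entity spans into one tag per token; the helper name `spans_to_bio` and the (start, end, label) span format are illustrative assumptions.

```python
def spans_to_bio(tokens, spans):
    """Convert labeled (start, end, label) spans over `tokens`
    into one BIO tag per token. `end` is exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label            # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label            # tokens inside the entity
    return tags

tokens = ["John", "lives", "in", "Salt", "Lake", "City"]
spans = [(0, 1, "PER"), (3, 6, "LOC")]        # John -> Person, Salt Lake City -> Location
print(list(zip(tokens, spans_to_bio(tokens, spans))))
# [('John', 'B-PER'), ('lives', 'O'), ('in', 'O'),
#  ('Salt', 'B-LOC'), ('Lake', 'I-LOC'), ('City', 'I-LOC')]
```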
And we can get very creative with such encodings
- Example: We can encode parse trees as a sequence of decisions needed to construct the tree (one illustrative encoding is sketched below).
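The slides do not specify which encoding is meant; one simple illustrative choice (an assumption here, not necessarily the one intended) is to linearize a bracketed constituency tree into a token sequence. The tree for the example sentence and the function name `linearize` are likewise assumptions.

```python
def linearize(tree):
    """Flatten a nested (LABEL, child, child, ...) constituency tree
    into a sequence of bracket/label/word tokens."""
    if isinstance(tree, str):                 # leaf: a word
        return [tree]
    label, *children = tree
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens

# An illustrative parse of "John lives in Salt Lake City"
tree = ("S",
        ("NP", "John"),
        ("VP", "lives",
         ("PP", "in", ("NP", "Salt", "Lake", "City"))))
print(linearize(tree))
# ['(S', '(NP', 'John', ')', '(VP', 'lives', '(PP', 'in',
#  '(NP', 'Salt', 'Lake', 'City', ')', ')', ')', ')']
```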
Natural question: How do we model sequential inputs and outputs?
More concretely, we need a mechanism that allows us to
1. Capture sequential dependencies between inputs
2. Model uncertainty over sequential outputs
Modeling sequences: The problem
Suppose we want to build a language model that computes the probability of sentences. We can write the probability as
P(y_1, y_2, ..., y_n) = ∏_t P(y_t | y_1, y_2, ..., y_{t-1})
Example: A Language model
It was a bright cold day in April.
- Probability of a word starting a sentence
- Probability of a word following "It"
- Probability of a word following "It was"
- Probability of a word following "It was a"
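A minimal sketch (not from the slides) of how these per-word factors multiply together under the chain rule; the function `cond_prob` is a hypothetical stand-in for whatever model supplies each conditional probability.

```python
import math

def sentence_log_prob(tokens, cond_prob):
    """Chain rule: log P(y_1..y_n) = sum_t log P(y_t | y_1..y_{t-1}).
    `cond_prob(word, history)` is assumed to return P(word | history)."""
    log_p = 0.0
    for t, word in enumerate(tokens):
        history = tuple(tokens[:t])           # the full prefix y_1..y_{t-1}
        log_p += math.log(cond_prob(word, history))
    return log_p

# Toy usage with a (hypothetical) uniform model over a 10-word vocabulary
uniform = lambda word, history: 1.0 / 10
print(sentence_log_prob(["It", "was", "a", "bright"], uniform))
```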
A history-based model
- Each token is dependent on all the tokens that came before it
  – Simple conditioning
  – Each P(x_i | …) is a multinomial probability distribution over the tokens
- What is the problem here?
  – How many parameters do we have?
  – Grows with the size of the sequence!
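To put a rough number on that growth (a back-of-the-envelope count, not from the slides): with K token types, there are K^{t-1} possible histories of length t-1, and each history needs its own multinomial over the K possible next tokens, so the table for P(y_t | y_1, ..., y_{t-1}) alone has about

K^{t-1} × (K - 1) ≈ K^t

free parameters, i.e. exponential in the position t.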
The traditional solution: Lose the history
Make a modeling assumption.
Example: The first-order Markov model assumes that
P(y_t | y_1, y_2, ..., y_{t-1}) = P(y_t | y_{t-1})
That is, the dependencies on everything before y_{t-1} are ignored. This allows us to simplify
P(y_1, y_2, ..., y_n) = ∏_t P(y_t | y_1, y_2, ..., y_{t-1})
into
P(y_1, y_2, ..., y_n) = ∏_t P(y_t | y_{t-1})
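Here is a minimal count-based sketch of such a first-order (bigram) model, assuming maximum-likelihood estimates from a toy corpus; the corpus, the `<s>` start symbol, and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """MLE bigram model: P(w | prev) = count(prev, w) / count(prev)."""
    pair_counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent               # <s> marks "start of sentence"
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[prev][word] += 1
    return {prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
            for prev, ctr in pair_counts.items()}

def sentence_prob(model, sent):
    """P(sentence) = prod_t P(y_t | y_{t-1}) under the Markov assumption."""
    p = 1.0
    tokens = ["<s>"] + sent
    for prev, word in zip(tokens, tokens[1:]):
        p *= model.get(prev, {}).get(word, 0.0)
    return p

corpus = [["It", "was", "a", "bright", "cold", "day", "in", "April"],
          ["It", "was", "a", "cold", "day"]]
model = train_bigram(corpus)
print(sentence_prob(model, ["It", "was", "a", "cold", "day", "in", "April"]))
```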
Example: Another language model
It was a bright cold day in April
- Probability of a word starting a sentence
- Probability of a word following "It"
- Probability of a word following "was"
- Probability of a word following "a"
If there are K tokens/states, how many parameters do we need? O(K^2)
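For a sense of scale (illustrative numbers, not from the slides): with a vocabulary of K = 10^4 word types, the first-order model needs about

K^2 = (10^4)^2 = 10^8

parameters, and this count stays fixed no matter how long the sentences get, unlike the full-history model's roughly K^t growth.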
Can we do better?
- Can we capture the meaning of the entire history without arbitrarily growing the number of parameters?
- Or equivalently, can we discard the Markov assumption?
- Can we represent arbitrarily long sequences as fixed-sized vectors?
  – Perhaps to provide features for subsequent classification
- Answer: Recurrent neural networks (RNNs)
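As a preview of where the deck is headed, here is a minimal sketch (my own, with arbitrary dimensions and random untrained weights) of the core idea: a recurrence h_t = f(h_{t-1}, x_t) that folds an arbitrarily long sequence into one fixed-size vector with a fixed number of parameters. The tanh update below is the Elman-style choice discussed later in the deck; other update functions are possible.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8                          # illustrative sizes

# Parameters: fixed in number, no matter how long the input sequence is
W_x = rng.normal(scale=0.1, size=(d_hidden, d_in))
W_h = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
b = np.zeros(d_hidden)

def encode(sequence):
    """Fold a sequence of input vectors into one fixed-size state vector."""
    h = np.zeros(d_hidden)                     # initial state
    for x in sequence:                         # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h

short = [rng.normal(size=d_in) for _ in range(3)]
longer = [rng.normal(size=d_in) for _ in range(50)]
print(encode(short).shape, encode(longer).shape)   # both (8,): same size either way
```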