Machine Learning for Computational Linguistics: Recurrent neural networks (RNNs)
Çağrı Çöltekin, University of Tübingen, Seminar für Sprachwissenschaft
July 5, 2016
Feed-forward networks
[Figure: a feed-forward network with inputs x1, x2, hidden units h1, h2 (activation f), outputs y1, y2 (activation g), bias units, and weight matrices W(1) and W(2)]
h = f(W(1) x)
y = g(W(2) h) = g(W(2) f(W(1) x))
- f() and g() are non-linear
functions, such as logistic sigmoid, tanh, or ReLU
- weights are updated using
backpropagation
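As a minimal illustration of the forward computation above, a NumPy sketch with made-up weights and input (the values are not from the slide; the network is 2-2-2 with no explicit bias terms):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up weights for a 2-2-2 network
W1 = np.array([[0.5, -0.3],
               [0.8,  0.2]])    # input  -> hidden, W(1)
W2 = np.array([[1.0,  0.4],
               [-0.6, 0.9]])    # hidden -> output, W(2)

x = np.array([1.0, 2.0])        # input vector

h = np.tanh(W1 @ x)             # h = f(W(1) x), here f = tanh
y = sigmoid(W2 @ h)             # y = g(W(2) h), here g = logistic sigmoid
print(h, y)
```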
Dense (word) representations
- Dense vector representations are useful for many ML
methods, particularly for neural networks
- Unlike sparse (one-of-K / one-hot) representations, dense
representations capture similarities and differences between words, as well as relations between them
- General-purpose word vectors can be trained with
unlabeled data
- They can also be trained for the task at hand
- Two methods to obtain (general purpose) dense
representations:
– global statistics over the complete data (e.g., SVD, GloVe)
– predicting the local context of words (e.g., word2vec)
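As a small illustration of the point about similarities (all vectors below are made up, not trained): with one-hot vectors every pair of distinct words looks equally unrelated, while dense vectors can place similar words close together:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# one-of-K (one-hot) vectors: all distinct words are equally dissimilar
cat_1hot, dog_1hot, car_1hot = np.eye(3)
print(cosine(cat_1hot, dog_1hot), cosine(cat_1hot, car_1hot))  # 0.0  0.0

# dense vectors (made-up values): related words get similar vectors
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.4])
car = np.array([-0.5, 0.9, 0.0])
print(cosine(cat, dog))   # close to 1: 'cat' and 'dog' are similar
print(cosine(cat, car))   # negative/low: 'cat' and 'car' are not
```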
Deep feed-forward networks
[Figure: a deep feed-forward network with inputs x1 … xm and several hidden layers]
- Deep neural networks (> 2 hidden
layers) have recently been successful in many tasks
- They are particularly useful in
problems where layers/hierarchies of features are useful
- Training deep networks with
backpropagation may result in vanishing or exploding gradients (see the sketch below)
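A rough sketch of the vanishing-gradient problem (random made-up weights, for illustration only): with sigmoid units, each layer multiplies the backpropagated error by the activation derivative (at most 0.25), so the gradient norm tends to shrink exponentially with depth:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_layers, width = 20, 10
grad = np.ones(width)                    # pretend error signal at the output

for layer in range(n_layers):
    W = rng.normal(scale=0.5, size=(width, width))  # made-up layer weights
    a = rng.normal(size=width)                      # made-up pre-activations
    s = sigmoid(a)
    grad = W.T @ (grad * s * (1 - s))    # chain rule through sigma and W
    print(layer, np.linalg.norm(grad))   # the norm typically decays towards 0
```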
Convolutional networks
[Figure: a 1-D convolutional network: inputs x1–x5 are transformed by convolution into h1–h5, then by pooling into h′1–h′3]
- Convolution transforms the input by replacing each input unit
with a weighted sum of its neighbors
- Typically it is followed by pooling
- CNNs are useful for detecting local features with some amount
of location invariance
- Sparse connectivity makes CNNs computationally efficient (see the sketch below)
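A minimal sketch of the convolution and pooling steps on a toy 1-D input (the filter weights are made up):

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])   # inputs x1..x5
w = np.array([0.5, 1.0, 0.5])             # one convolution filter (made up)

# convolution: each hidden unit is a weighted sum of a small neighbourhood
h = np.array([w @ x[i:i + 3] for i in range(len(x) - 2)])
print(h)

# max pooling over windows of 2: keeps the strongest local response,
# giving some invariance to where exactly the feature occurred
pooled = np.array([h[i:i + 2].max() for i in range(len(h) - 1)])
print(pooled)
```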
CNNs for NLP
[Figure: a CNN for sentence classification over the input "not really worth seeing": word vectors → convolution → feature maps → pooling → features → classifier]
Recurrent neural networks: motivation
- Feed-forward networks
– can only learn fixed input–output associations
– have no memory of earlier inputs: they cannot handle sequences
- Recurrent neural networks are the neural-network solution to sequence
learning
- This is achieved by recurrent loops in the network
Recurrent neural networks
[Figure: an RNN with inputs x1–x4, hidden units h1–h4 with recurrent connections, and output y]
- Recurrent neural networks are similar to the standard
feed-forward networks
- But they include loops that use the previous output of the
hidden layers as well as the current input
- Forward calculation is straightforward, but learning becomes
somewhat tricky (see the sketch below)
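A sketch of the forward computation (random made-up weights and inputs): the only difference from a feed-forward layer is that the previous hidden state is fed back in at every step:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, d_out = 4, 3, 2

W_xh = rng.normal(scale=0.1, size=(d_h, d_in))    # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))     # hidden -> hidden (the loop)
W_hy = rng.normal(scale=0.1, size=(d_out, d_h))   # hidden -> output

xs = rng.normal(size=(5, d_in))    # a toy sequence of 5 input vectors
h = np.zeros(d_h)                  # initial hidden state

for x in xs:
    h = np.tanh(W_xh @ x + W_hh @ h)   # new state uses the input and the old state

y = W_hy @ h                           # e.g., one output after the whole sequence
print(y)
```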
A simple version: SRNs
Elman (1990)
[Figure: an Elman network: input units and context units feed the hidden units, which feed the output units; the hidden activations are copied back into the context units at each step]
- The network keeps
previous hidden states (context units)
- The rest is just like a
feed-forward network
- Training is simple,
but SRNs cannot learn long-distance dependencies
Processing sequences with RNNs
- RNNs process sequences one unit at a time
- The earlier input affects the output through the recurrent
links
[Figure: the network processes the input "not really worth seeing" word by word; the hidden states h1–h4 carry the earlier words forward to the output y]
Learning in recurrent networks
[Figure: an RNN with input x, hidden state h(1), and output y(1); three weight matrices connect input to hidden, hidden to hidden (the recurrent link), and hidden to output]
- We need to learn three sets of
weights: the input-to-hidden, the recurrent (hidden-to-hidden), and the hidden-to-output weights
- Backpropagation in RNNs is at
first not obvious: it is not immediately clear how
errors should be propagated back through the recurrent connections
Unrolling a recurrent network
Backpropagation through time (BPTT)
[Figure: the network unrolled in time: inputs x(0) … x(t), hidden states h(0) … h(t), outputs y(0) … y(t)]
Note: the same weight matrices are reused (shared) at every time step.
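A minimal BPTT sketch under simplifying assumptions (made-up dimensions, random weights, a single scalar output and squared error at the last step): the forward pass stores all hidden states of the unrolled network, and the backward pass sums the gradients of the shared weights over all time steps:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_h, T = 3, 4, 6

W_xh = rng.normal(scale=0.5, size=(d_h, d_in))
W_hh = rng.normal(scale=0.5, size=(d_h, d_h))
w_hy = rng.normal(scale=0.5, size=d_h)     # scalar output for simplicity

xs = rng.normal(size=(T, d_in))
target = 1.0

# forward pass, keeping every hidden state of the unrolled network
hs = [np.zeros(d_h)]
for x in xs:
    hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))
y = w_hy @ hs[-1]
loss = 0.5 * (y - target) ** 2

# backward pass through time: W_xh and W_hh are shared across time steps,
# so their gradients are accumulated over all steps
dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
dw_hy = (y - target) * hs[-1]
dh = (y - target) * w_hy
for t in reversed(range(T)):
    da = dh * (1 - hs[t + 1] ** 2)    # back through the tanh non-linearity
    dW_xh += np.outer(da, xs[t])
    dW_hh += np.outer(da, hs[t])
    dh = W_hh.T @ da                  # pass the error one step back in time
print(loss, np.linalg.norm(dW_hh))
```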
RNN architectures
Many-to-many (e.g., POS tagging)
[Figure: an output y(i) is produced at every time step: x(0) … x(t) → h(0) … h(t) → y(0) … y(t)]
RNN architectures
Many-to-one (e.g., document classification)
[Figure: only a single output y(t) is produced after the whole sequence x(0) … x(t) has been read]
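A small sketch of the difference between the two architectures above (random made-up weights): the recurrence is identical, only where outputs are read off differs:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_h, d_out, T = 3, 4, 2, 5
W_xh = rng.normal(scale=0.3, size=(d_h, d_in))
W_hh = rng.normal(scale=0.3, size=(d_h, d_h))
W_hy = rng.normal(scale=0.3, size=(d_out, d_h))

xs = rng.normal(size=(T, d_in))
h, ys = np.zeros(d_h), []
for x in xs:
    h = np.tanh(W_xh @ x + W_hh @ h)
    ys.append(W_hy @ h)

print(np.array(ys).shape)   # many-to-many (e.g., POS tagging): one output per input
print(ys[-1])               # many-to-one (e.g., document classification): last output only
```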
RNN architectures
Many-to-many with a delay (e.g., machine translation)
[Figure: the outputs y(t−1), y(t) are produced only after the input sequence x(0) … x(t) has been read]
Bidirectional RNNs
[Figure: a bidirectional RNN: a chain of forward hidden states and a chain of backward hidden states over the inputs x(t−1), x(t), x(t+1) are combined to produce the outputs y(t−1), y(t), y(t+1)]
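A minimal sketch of the idea (random made-up weights): one RNN runs left-to-right, another right-to-left, and the output at each position can use both hidden states:

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_h, T = 3, 4, 5
xs = rng.normal(size=(T, d_in))   # a toy input sequence

def run_rnn(seq, W_xh, W_hh):
    h, states = np.zeros(W_hh.shape[0]), []
    for x in seq:
        h = np.tanh(W_xh @ x + W_hh @ h)
        states.append(h)
    return states

Wf_xh = rng.normal(scale=0.3, size=(d_h, d_in))   # forward RNN weights
Wf_hh = rng.normal(scale=0.3, size=(d_h, d_h))
Wb_xh = rng.normal(scale=0.3, size=(d_h, d_in))   # backward RNN weights
Wb_hh = rng.normal(scale=0.3, size=(d_h, d_h))

forward = run_rnn(xs, Wf_xh, Wf_hh)               # left to right
backward = run_rnn(xs[::-1], Wb_xh, Wb_hh)[::-1]  # right to left, re-aligned

# at position t the output can be computed from both states, e.g. by concatenation
combined = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
print(combined[0].shape)   # (8,) = forward state + backward state
```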
A short digression: language models
- Language models are useful in many NLP tasks
- A language model defines a probability distribution over
sequences of words
- An n-gram model approximates the probability of a sequence of
words as
P(w1, …, wm) ≈ ∏_{i=1}^{m} P(wi | wi−1, …, wi−(n−1))
- Conditional probabilities are estimated from an (unlabeled)
corpus
- Larger n-grams require lots of memory, and their
probabilities cannot be estimated reliably due to sparsity (see the toy sketch below)
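A toy sketch of a bigram (n = 2) model: maximum-likelihood estimates from a tiny made-up corpus, and the product above used to score a short sequence:

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the cat .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p(w, prev):
    """Maximum-likelihood estimate of P(w | prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

# P(the cat sat) ~ P(cat | the) * P(sat | cat), ignoring the first word
sentence = "the cat sat".split()
prob = 1.0
for prev, w in zip(sentence, sentence[1:]):
    prob *= p(w, prev)
print(prob)   # 0.5 * 0.5 = 0.25 for this toy corpus
```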
RNNs as language models
- RNNs can function as language models
- We can train RNNs using unlabeled data for this purpose
- During training the task of the RNN is to predict the next word
- Depending on the network configuration, an RNN can
learn dependencies over longer distances
- The resulting system can generate sequences
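A sketch of both points for a character-level model (the weights here are random and untrained, so the "generated" text is gibberish, but the mechanics are the same): the training targets are simply the next characters, and generation feeds the sampled output back in as the next input:

```python
import numpy as np

rng = np.random.default_rng(5)
text = "not really worth seeing"
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
V, d_h = len(chars), 16

W_xh = rng.normal(scale=0.1, size=(d_h, V))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
W_hy = rng.normal(scale=0.1, size=(V, d_h))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# training pairs: at every position the target is just the next character
pairs = [(text[i], text[i + 1]) for i in range(len(text) - 1)]
print(pairs[:3])

# generation: sample from the predicted distribution and feed it back in
h, c = np.zeros(d_h), text[0]
generated = c
for _ in range(20):
    x = np.zeros(V)
    x[idx[c]] = 1.0                      # one-hot input for the current character
    h = np.tanh(W_xh @ x + W_hh @ h)
    probs = softmax(W_hy @ h)            # distribution over the next character
    c = chars[rng.choice(V, p=probs)]
    generated += c
print(generated)
```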
RNNs as language models: a fun example
RNNs trained on character sequences generating Shakespeare:

PANDARUS:
Alas, I think he shall be come approached and the day When little srain would be attain'd into being never fed, And who is but a chain and subjects of his death, I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states.

DUKE VINCENTIO:
Well, your wit is in the care of side and that.

Second Lord:
They would be ruled after this chamber, and my fair nues begun out of the fact, to be conveyed, Whose noble souls I'll have the heart of the wars.

Clown:
Come, sir, I will make did behold your worship.
From Andrej Karpathy’s blog at http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Unstable gradients revisited
- We noted earlier that the gradients may vanish or explode
during backpropagation in deep networks
- This is especially problematic for RNNs since the effective
depth of the network can be extremely large
- Although RNNs can theoretically learn long-term
dependencies, in practice long-distance dependencies become a problem (see the sketch below)
- The most popular solution is to use gated recurrent
networks
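A rough sketch of why the gradients are unstable (made-up random recurrent matrices, and the activation derivative is ignored): during BPTT the error is multiplied by the recurrent weight matrix once per time step, so over long distances it either shrinks towards zero or blows up:

```python
import numpy as np

rng = np.random.default_rng(6)
d_h, steps = 8, 50

for scale in (0.1, 0.5):                 # 'small' vs 'large' recurrent weights
    W_hh = rng.normal(scale=scale, size=(d_h, d_h))
    g = np.ones(d_h)                     # pretend error signal
    for _ in range(steps):
        g = W_hh.T @ g                   # one step back through the recurrence
    print(scale, np.linalg.norm(g))      # typically: near 0 for 0.1, huge for 0.5
```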
LSTMs
- Long short-term memory models (LSTMs) are the most popular
‘gated’ RNNs
- They have been shown to perform well in many sequence
learning problems
- In essence they are similar to simple RNNs, but they can learn
longer dependencies
- Although LSTMs date back to the 1990s, they are very popular
at present, and many variants have been developed recently
LSTMs: the picture
Source: Christopher Olah’s blog http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTMs: the internal ‘cell’ memory
- C is a (long-term) internal
memory
- C is calculated at each time
step using relatively straightforward calculations
- C can hold information for
long stretches of time
LSTMs: step-by-step explanation
ft = σ(Wf [ht−1, xt])
it = σ(Wi [ht−1, xt])
C̃t = tanh(WC [ht−1, xt])
Ct = ft ⊗ Ct−1 + it ⊗ C̃t
ot = σ(Wo [ht−1, xt])
ht = ot ⊗ tanh(Ct)
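A direct NumPy transcription of these equations for a single time step, as a sketch (dimensions and weights are made up, and biases are omitted to match the formulas above); [ht−1, xt] is the concatenation of the previous hidden state and the current input, and ⊗ is element-wise multiplication:

```python
import numpy as np

rng = np.random.default_rng(7)
d_in, d_h = 3, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_C, W_o = (rng.normal(scale=0.3, size=(d_h, d_h + d_in)) for _ in range(4))

def lstm_step(x, h_prev, C_prev):
    z = np.concatenate([h_prev, x])   # [h_{t-1}, x_t]
    f = sigmoid(W_f @ z)              # forget gate f_t
    i = sigmoid(W_i @ z)              # input gate i_t
    C_tilde = np.tanh(W_C @ z)        # candidate cell state
    C = f * C_prev + i * C_tilde      # new cell state (element-wise products)
    o = sigmoid(W_o @ z)              # output gate o_t
    h = o * np.tanh(C)                # new hidden state
    return h, C

h, C = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # a toy sequence of 5 inputs
    h, C = lstm_step(x, h, C)
print(h, C)
```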
Summary, concluding remarks
- Recurrent neural networks include recurrent (loop) connections
- They keep a state, allowing the network to learn sequences
- RNNs are Turing-complete: they can approximate any
dynamical system
- Simple recurrent networks suffer from unstable
gradients, and cannot learn long-term dependencies
- LSTMs and other gated RNNs solve the long-term
dependency problem (to some extent)
Credits and References
- LSTM diagrams are from Christopher Olah’s blog at
http://colah.github.io/posts/2015-08-Understanding-LSTMs/, which is also a very good introduction to LSTMs
- Also see Andrej Karpathy’s blog at
http://karpathy.github.io/2015/05/21/rnn-effectiveness/, from which the Shakespeare example comes; it is also a very nicely written introduction to RNNs