SLIDE 1

LSTMs Overview

Subhashini Venugopalan

SLIDE 2

Neural Networks

[Diagram: feed-forward network: Input → Hidden → Hidden → Output (zt).]

SLIDE 3

WHY RNNs/LSTMs?

  • Accepts only fixed-size input, e.g., 224×224 images.
  • Performs a fixed number of computations (#layers).
  • Outputs a fixed-size vector.

[Diagram: the same feed-forward network: Input → Hidden → Hidden → Output (zt).]

These are limitations of vanilla Neural Networks. Can we operate over sequences of inputs?
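To make the fixed-size limitation concrete, here is a tiny illustrative sketch (NumPy, toy dimensions of my own choosing, not any network from the slides): a dense layer's weight matrix hard-codes its input size, so variable-length sequences simply do not fit.

```python
import numpy as np

# A vanilla feed-forward layer hard-codes its input size: the weight
# matrix below maps exactly 4 inputs to 10 hidden units, and nothing
# else fits without padding or cropping.
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 4))
b = np.zeros(10)

def feedforward(x):
    return np.tanh(W @ x + b)

h = feedforward(rng.standard_normal(4))
print(h.shape)  # (10,)
# A length-7 sequence of 4-dim inputs has no natural slot here; the
# network also always performs the same fixed number of computations.
```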

SLIDE 4

Recurrent Neural Networks

Image Credit: Chris Olah

They are networks with loops. [Elman ‘90]

SLIDE 5

Un-Roll The Loop

Image Credit: Chris Olah

Recurrent Neural Network “unrolled in time”

  • Each time step has a layer with the same weights.
  • The repeating layer/module is a sigmoid or a tanh.
  • Learns to model Pr(xt | x1, …, xt-1)
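The "unrolled in time" picture can be sketched as a loop that applies the same weights at every step (a toy NumPy illustration with made-up dimensions, not code from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
Wxh = rng.standard_normal((n_hid, n_in)) * 0.1   # shared across all time steps
Whh = rng.standard_normal((n_hid, n_hid)) * 0.1
b = np.zeros(n_hid)

def unroll(xs):
    """Apply the same tanh layer at every time step (the unrolled loop)."""
    h = np.zeros(n_hid)
    states = []
    for x in xs:                 # one 'layer' per time step, same weights
        h = np.tanh(Wxh @ x + Whh @ h + b)
        states.append(h)
    return states

states = unroll([rng.standard_normal(n_in) for _ in range(4)])
print(len(states), states[-1].shape)  # 4 (5,)
```

Each `h` depends on the whole history of inputs so far, which is what lets the network summarize a sequence.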
SLIDE 6

Simple RNNs

Image Credit: Chris Olah

[Diagram: in a simple RNN the repeating module is a single layer, a sigmoid or a tanh.]

SLIDE 7

Problems with Simple RNNs

  • Can’t seem to handle “long-term dependencies” in practice.
  • Gradients shrink as they pass back through the many layers (Vanishing Gradients).

[Hochreiter ‘91] [Bengio et al. ‘94]

Image Credit: Chris Olah
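A quick numerical illustration of the vanishing-gradient effect (a toy NumPy sketch with invented dimensions, not from the slides): each step of backprop-through-time multiplies the backward signal by another small factor, so the signal reaching early time steps decays geometrically.

```python
import numpy as np

# Each backward step through a tanh RNN multiplies the signal by
# diag(1 - h**2) @ Whh.T; when these factors have magnitude below 1
# the product decays geometrically, starving early time steps.
rng = np.random.default_rng(0)
n = 5
Whh = rng.standard_normal((n, n)) * 0.2
h = np.zeros(n)
grad = np.ones(n)
norms = []
for t in range(50):
    h = np.tanh(Whh @ h + rng.standard_normal(n))
    grad = (1 - h**2) * (Whh.T @ grad)   # one backward step through tanh
    norms.append(np.linalg.norm(grad))
print(norms[0], norms[-1])   # the norm shrinks dramatically over 50 steps
```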

SLIDE 8

Long Short Term Memory (LSTMs)

Image Credit: Chris Olah

[Hochreiter and Schmidhuber ‘97]

SLIDE 9

LSTM Unit

[Diagram: LSTM unit: inputs xt and ht-1 feed the Input, Forget, Output and Input Modulation gates around a central Memory Cell, producing ht.]

Memory Cell: core of the LSTM unit; encodes all inputs observed.

[Hochreiter and Schmidhuber ‘97] [Graves ‘13]

SLIDE 10

LSTM Unit

[Diagram: the same LSTM unit, highlighting the gates.]

Memory Cell: core of the LSTM unit; encodes all inputs observed.
Gates: Input, Output and Forget; each a sigmoid with values in [0,1].

[Hochreiter and Schmidhuber ‘97] [Graves ‘13]

SLIDE 11

LSTM Unit

[Diagram: the same LSTM unit, highlighting the additive update of the cell state.]

Update the Cell state: learns long-term dependencies.

[Hochreiter and Schmidhuber ‘97] [Graves ‘13]
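The gate structure above can be written out as one step of an LSTM, following the standard equations from Hochreiter and Schmidhuber ‘97 / Graves ‘13 (a minimal NumPy sketch; the stacked-weight layout and the dimensions are my own illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step with input (i), forget (f), output (o) and input
    modulation (g) gates, as on the slide. W maps [x; h_prev] to the
    four gate pre-activations stacked together."""
    n = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0*n:1*n])      # input gate:  what to write
    f = sigmoid(z[1*n:2*n])      # forget gate: what to keep in the cell
    o = sigmoid(z[2*n:3*n])      # output gate: what to expose
    g = np.tanh(z[3*n:4*n])      # input modulation: candidate values
    c = f * c_prev + i * g       # additive update of the memory cell
    h = o * np.tanh(c)           # hidden state read out of the cell
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The additive `c = f * c_prev + i * g` update is the key: gradients flow through the cell state without repeatedly squashing, which is why long-term dependencies survive.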

SLIDE 12

Can Model Sequences

  • Can handle longer-term dependencies
  • Overcomes the Vanishing Gradients problem
  • GRUs (Gated Recurrent Units) are a much simpler variant that also overcomes these issues.

[Diagram: a chain of LSTM units unrolled over time.]

[Cho et al. ‘14]
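For comparison, a GRU step [Cho et al. ‘14] needs only two gates and no separate memory cell (again a minimal NumPy sketch with illustrative dimensions, not the reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh):
    """One GRU step: two gates instead of three, no separate cell."""
    z = sigmoid(Wz @ np.concatenate([x, h_prev]))        # update gate
    r = sigmoid(Wr @ np.concatenate([x, h_prev]))        # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h_prev]))
    return (1 - z) * h_prev + z * h_tilde                # interpolate old/new

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
shape = (n_hid, n_in + n_hid)
Wz, Wr, Wh = (rng.standard_normal(shape) * 0.1 for _ in range(3))
h = np.zeros(n_hid)
h = gru_step(rng.standard_normal(n_in), h, Wz, Wr, Wh)
print(h.shape)  # (4,)
```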

SLIDE 13

Putting Things Together

Image Credit: Sutskever et al.

Encode a sequence of inputs to a vector: ht summarizes x1, …, xt. Decode from the vector to a sequence of outputs: Pr(xt | x1, …, xt-1).
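A toy sketch of the encode-then-decode idea (plain tanh recurrences stand in for the LSTMs, with a made-up 6-symbol vocabulary; illustrative only, not the Sutskever et al. model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Wenc = rng.standard_normal((n, 2 * n)) * 0.1
Wdec = rng.standard_normal((n, 2 * n)) * 0.1
Wout = rng.standard_normal((6, n)) * 0.1   # toy 6-symbol vocabulary

def encode(xs):
    h = np.zeros(n)
    for x in xs:                                  # fold the input sequence
        h = np.tanh(Wenc @ np.concatenate([x, h]))
    return h                                      # one vector summarizes it

def decode(h, steps):
    y = np.zeros(n)
    out = []
    for _ in range(steps):                        # emit one symbol per step
        y = np.tanh(Wdec @ np.concatenate([y, h]))
        logits = Wout @ y
        p = np.exp(logits) / np.exp(logits).sum() # Pr(x_t | x_1..x_{t-1})
        out.append(int(np.argmax(p)))
    return out

v = encode([rng.standard_normal(n) for _ in range(5)])
symbols = decode(v, 3)
print(symbols)
```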

SLIDE 14

SOLVE A WIDER RANGE OF PROBLEMS

  • Image Captioning: Vinyals et al. ‘15, Donahue et al. ‘15
  • Activity Recognition: Donahue et al. ‘15
  • Machine Translation: Sutskever et al. ‘14, Cho et al. ‘14
  • Speech Recognition: Graves & Jaitly ‘14
  • Video Description: V. et al. ‘15, Li et al. ‘15
  • Sequence to Sequence
  • VQA, POS tagging, ...

Image Credit: Andrej Karpathy

3 of 4 papers to be discussed this class

SLIDE 15

Resources

  • Graves’ paper - LSTMs explanation. Generating sequences with recurrent neural networks. Applications to handwriting and speech recognition.
  • Chris Olah’s blog - LSTM unit explanation.
  • Karpathy’s blog - Applications.
  • Tensorflow and Caffe - Code examples.
SLIDE 16

Sequence to Sequence Video to Text

Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

SLIDE 17

Objective

A monkey is pulling a dog’s tail and is chased by the dog.

SLIDE 18

Encode

Recurrent Neural Networks (RNNs) can map a vector to a sequence.

[Diagram: encoder-decoder variants:
English Sentence → RNN encoder → RNN decoder → French Sentence [Sutskever et al. NIPS’14]
Encode → RNN decoder → Sentence [Donahue et al. CVPR’15] [Vinyals et al. CVPR’15]
Encode → RNN decoder → Sentence [Venugopalan et al. NAACL’15]
RNN encoder → RNN decoder → Sentence [Venugopalan et al. ICCV’15] (this work)]

SLIDE 19

S2VT Overview

[Diagram: a two-layer LSTM stack unrolled over time. In the encoding stage CNN features of each frame are fed in; in the decoding stage the words “A man is talking ...” are emitted.]

Now decode it to a sentence!

Sequence to Sequence - Video to Text (S2VT)

  • S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, K. Saenko
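The two stages of S2VT can be caricatured as follows (a schematic NumPy sketch: a tanh layer stands in for each LSTM layer and random vectors stand in for CNN frame features; this is not the paper's implementation):

```python
import numpy as np

# Schematic of the S2VT loop: the same two-layer recurrent stack runs
# over every time step. During encoding it consumes frame features;
# during decoding it sees padding on the frame side and emits words.
rng = np.random.default_rng(0)
d = 8
W1 = rng.standard_normal((d, 2 * d)) * 0.1
W2 = rng.standard_normal((d, 2 * d)) * 0.1

def step(x, h, W):
    # stand-in tanh layer for one LSTM layer
    return np.tanh(W @ np.concatenate([x, h]))

frames = [rng.standard_normal(d) for _ in range(4)]   # CNN fc7 stand-ins
pad = np.zeros(d)
h1 = h2 = np.zeros(d)
words = []
for t in range(4 + 3):                  # 4 encode steps + 3 decode steps
    x = frames[t] if t < 4 else pad     # frames first, then padding
    h1 = step(x, h1, W1)
    h2 = step(h1, h2, W2)
    if t >= 4:                          # decoding stage: read out a word id
        words.append(int(np.argmax(h2)))
print(words)
```

The point of the shared stack is that one set of recurrent weights handles both watching the video and generating the sentence.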
SLIDE 20

Frames: RGB

CNN trained on 1000 categories. Forward propagate a frame; output: “fc7” features (activations before the classification layer).

fc7: 4096-dimension “feature vector”

  • 1. Train on Imagenet
  • 2. Take activations from the layer before classification
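The "take the layer before classification" recipe, in sketch form (a toy NumPy network with invented layer sizes; the real fc7 is 4096-dimensional in AlexNet/VGG):

```python
import numpy as np

# Toy stand-in for feature extraction: forward-propagate through the
# network but stop one layer short of the classifier, keeping the
# penultimate ('fc7'-style) activations as the feature vector.
rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((16, 32)) * 0.1,  # earlier layers (toy sizes)
    rng.standard_normal((8, 16)) * 0.1,   # 'fc7' stand-in (4096-dim in VGG)
    rng.standard_normal((5, 8)) * 0.1,    # classifier (1000-way on ImageNet)
]

def fc7_features(x):
    for W in layers[:-1]:           # skip the final classification layer
        x = np.maximum(0.0, W @ x)  # ReLU
    return x

feat = fc7_features(rng.standard_normal(32))
print(feat.shape)  # (8,)
```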
SLIDE 21

Frames: Flow

CNN (modified AlexNet) trained on 101 Action Classes. Forward propagate; output: “fc7” features (activations before the classification layer).

fc7: 4096-dimension “feature vector”

  • 1. Train CNN on Activity classes (UCF 101)
  • 2. Use optical flow to extract flow images. [T. Brox et al. ECCV ‘04]
  • 3. Take activations from the layer before classification

SLIDE 22

Dataset: YouTube

  • A man is walking on a rope.
  • A man is walking across a rope.
  • A man is balancing on a rope.
  • A man is balancing on a rope at the beach.
  • A man walks on a tightrope at the beach.
  • A man is balancing on a volleyball net.
  • A man is walking on a rope held by poles
  • A man balanced on a wire.
  • The man is balancing on the wire.
  • A man is walking on a rope.
  • A man is standing in the sea shore.
  • ~2000 clips
  • Avg. length: 11s per clip
  • ~40 sentences per clip
  • ~81,000 sentences
SLIDE 23

Results (YouTube)

METEOR scores:

Mean-Pool (VGG): 27.7
S2VT (randomized): 28.2
S2VT (RGB): 29.2
S2VT (RGB+Flow): 29.8

METEOR: MT metric. Considers alignment, paraphrases and similarity.

SLIDE 24

SLIDE 25

Evaluation: Movie Corpora

MPII-MD
  • MPII, Germany
  • DVS alignment: semi-automated and crowdsourced
  • 94 movies
  • 68,000 clips
  • Avg. length: 3.9s per clip
  • ~1 sentence per clip
  • 68,375 sentences

M-VAD
  • Univ. of Montreal
  • DVS alignment: automated speech extraction
  • 92 movies
  • 46,009 clips
  • Avg. length: 6.2s per clip
  • 1-2 sentences per clip
  • 56,634 sentences
SLIDE 26

Movie Corpus - DVS

Processed: Looking troubled, someone descends the stairs. Someone rushes into the courtyard. She then puts a head scarf on ...

SLIDE 27

Results (MPII-MD Movie Corpus)

METEOR scores:

Best Prior Work [Rohrbach et al. CVPR’15]: 5.6
Mean-Pool: 6.7
S2VT (RGB): 7.1

SLIDE 28

Results (M-VAD Movie Corpus)

METEOR scores:

Best Prior Work [Yao et al. ICCV’15]: 4.3
Mean-Pool: 6.1
S2VT (RGB): 6.7

SLIDE 29

M-VAD: https://youtu.be/pER0mjzSYaM

SLIDE 30
Discussion

  • What are the advantages/drawbacks of this approach?
    ○ End-to-end, annotations
  • Detaching recognition and generation.
  • Why only METEOR (not BLEU or other metrics)?
  • Domain adaptation, re-use RNNs (YouTube -> movies, activity recognition)
  • Languages other than English.
  • Features apart from Optical Flow, RGB; temporal representation.
SLIDE 31

SLIDE 32

Code and more examples http://vsubhashini.github.io/s2vt.html
