  1. ECE 6504: Deep Learning for Perception
  Dhruv Batra, Virginia Tech
  Topics:
  – Recurrent Neural Networks (RNNs)
  – BackProp Through Time (BPTT)
  – Vanishing / Exploding Gradients
  – [Abhishek:] Lua / Torch Tutorial

  2. Administrativia
  • HW3
  – Out today
  – Due in 2 weeks
  – Please please please please please start early
  – https://computing.ece.vt.edu/~f15ece6504/homework3/

  3. Plan for Today
  • Model
  – Recurrent Neural Networks (RNNs)
  • Learning
  – BackProp Through Time (BPTT)
  – Vanishing / Exploding Gradients
  • [Abhishek:] Lua / Torch Tutorial

  4. New Topic: RNNs
  Image Credit: Andrej Karpathy

  5. Synonyms
  • Recurrent Neural Networks (RNNs)
  • Recursive Neural Networks
  – General family; think graphs instead of chains
  • Types:
  – Long Short-Term Memory (LSTMs)
  – Gated Recurrent Units (GRUs)
  – Hopfield networks
  – Elman networks
  – …
  • Algorithms:
  – BackProp Through Time (BPTT)
  – BackProp Through Structure (BPTS)

  6. What’s wrong with MLPs?
  • Problem 1: Can’t model sequences
  – Fixed-sized inputs & outputs
  – No temporal structure
  • Problem 2: Pure feed-forward processing
  – No “memory”, no feedback
  Image Credit: Alex Graves (book)

  7. Sequences are everywhere …
  Image Credit: Alex Graves and Kevin Gimpel

  8. Even where you might not expect a sequence …
  Image Credit: Vinyals et al.

  9. Even where you might not expect a sequence …
  • Input ordering = sequence
  Image Credit: Ba et al.; Gregor et al.

  10. Image Credit: [Pinheiro and Collobert, ICML14]

  11. Why model sequences?
  Figure Credit: Carlos Guestrin

  12. Why model sequences?
  Image Credit: Alex Graves

  13. Name that model
  • Hidden states Y1 … Y5, each over {a, …, z}, with observed emissions X1 … X5
  • Answer: Hidden Markov Model (HMM)
  Figure Credit: Carlos Guestrin
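
To make the quiz concrete, here is a minimal forward-algorithm sketch of the model above; the sizes and the random parameters are placeholders I chose, not values from the lecture:

    import numpy as np

    # HMM for the quiz: hidden letters Y_t in {a..z}, observed inputs X_t.
    S = 26                                    # one hidden state per letter
    A = np.random.rand(S, S)
    A /= A.sum(1, keepdims=True)              # transition matrix P(Y_t | Y_{t-1})
    pi = np.full(S, 1.0 / S)                  # initial distribution P(Y_1)

    def forward(emission_probs):
        """emission_probs[t, s] = P(X_t | Y_t = s); returns P(X_1, ..., X_T)."""
        alpha = pi * emission_probs[0]
        for t in range(1, len(emission_probs)):
            alpha = (alpha @ A) * emission_probs[t]   # forward recursion
        return alpha.sum()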

  14. How do we model sequences?
  • No input
  Image Credit: Bengio, Goodfellow, Courville

  15. How do we model sequences?
  • With inputs
  Image Credit: Bengio, Goodfellow, Courville

  16. How do we model sequences?
  • With inputs and outputs
  Image Credit: Bengio, Goodfellow, Courville

  17. How do we model sequences?
  • With Neural Nets
  Image Credit: Alex Graves
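
As a sketch of what the figure depicts: one step of a vanilla (Elman-style) RNN cell in NumPy. The names (Wxh, Whh, Why) follow Karpathy's convention, and all sizes are arbitrary choices, not from the slide:

    import numpy as np

    H, D, O = 64, 32, 10                     # hidden, input, output sizes (arbitrary)
    Wxh = np.random.randn(H, D) * 0.01       # input-to-hidden weights
    Whh = np.random.randn(H, H) * 0.01       # hidden-to-hidden (recurrent) weights
    Why = np.random.randn(O, H) * 0.01       # hidden-to-output weights
    bh, by = np.zeros(H), np.zeros(O)

    def rnn_step(x, h_prev):
        h = np.tanh(Wxh @ x + Whh @ h_prev + bh)   # hidden state carries "memory"
        y = Why @ h + by                           # per-step output (e.g. scores)
        return h, y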

  18. How do we model sequences? • It’s a spectrum …
  – Input: no sequence; Output: no sequence. Example: “standard” classification / regression problems
  – Input: no sequence; Output: sequence. Example: Im2Caption
  – Input: sequence; Output: no sequence. Example: sentence classification, multiple-choice question answering
  – Input: sequence; Output: sequence. Example: machine translation, video captioning, open-ended question answering, video question answering
  Image Credit: Andrej Karpathy
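
Continuing the sketch from slide 17, the spectrum falls out of where outputs are read off the same cell:

    # Reuses rnn_step, H from the slide-17 sketch.
    def many_to_one(xs):                     # e.g. sentence classification
        h = np.zeros(H)
        for x in xs:
            h, y = rnn_step(x, h)
        return y                             # keep only the final output

    def many_to_many(xs):                    # e.g. per-step labeling / tagging
        h, ys = np.zeros(H), []
        for x in xs:
            h, y = rnn_step(x, h)
            ys.append(y)                     # one output per input timestep
        return ys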

  19. Things can get arbitrarily complex
  Image Credit: Herbert Jaeger

  20. Key Ideas
  • Parameter Sharing + Unrolling
  – Keeps the number of parameters in check
  – Allows arbitrary sequence lengths!
  • “Depth”
  – Measured in the usual sense of layers
  – Not unrolled timesteps
  • Learning
  – Is tricky even for “shallow” models due to unrolling
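
Continuing the sketch above, parameter sharing is easy to see numerically: the same weights serve every timestep, so the parameter count never grows with sequence length:

    # Reuses Wxh, Whh, Why, bh, by, D and many_to_one from the sketches above.
    n_params = Wxh.size + Whh.size + Why.size + bh.size + by.size
    for T in (5, 50, 500):                   # any sequence length unrolls fine
        xs = [np.random.randn(D) for _ in range(T)]
        _ = many_to_one(xs)                  # identical parameters for every T
    print(n_params)                          # independent of T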

  21. Plan for Today
  • Model
  – Recurrent Neural Networks (RNNs)
  • Learning
  – BackProp Through Time (BPTT)
  – Vanishing / Exploding Gradients
  • [Abhishek:] Lua / Torch Tutorial

  22. BPTT
  Image Credit: Richard Socher
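
The slide's derivation lives in the image; as a sketch of the idea, assuming the vanilla cell from slide 17, BPTT walks backward through the unrolled graph and sums every timestep's contribution into the shared weights:

    # Reuses Whh, Why, H from the slide-17 sketch. Shown for dWhh only, for brevity.
    def bptt_dWhh(xs, dys, hs):
        """xs: inputs; dys[t] = dLoss/dy_t; hs: hidden states from the forward
        pass, with hs[0] the initial state and hs[t+1] produced at step t."""
        dWhh = np.zeros_like(Whh)
        dh_next = np.zeros(H)                    # gradient arriving from h_{t+1}
        for t in reversed(range(len(xs))):
            dh = Why.T @ dys[t] + dh_next        # grads reaching h_t from y_t and h_{t+1}
            dpre = (1.0 - hs[t + 1] ** 2) * dh   # backprop through tanh
            dWhh += np.outer(dpre, hs[t])        # same Whh every step: gradients sum
            dh_next = Whh.T @ dpre               # send gradient back one timestep
        return dWhh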

  23. Illustration [Pascanu et al.]
  • Intuition: error surface of a single-hidden-unit RNN, with high-curvature walls
  • Solid lines: standard gradient descent trajectories
  • Dashed lines: gradient rescaled to fix the problem
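
The instability the figure illustrates can be seen numerically: each backward step of BPTT multiplies by the transposed recurrent matrix, so over T steps the signal scales roughly like the largest singular value raised to the T. A toy demonstration (sizes arbitrary, tanh ignored):

    import numpy as np

    n, T = 64, 50
    for scale in (0.5, 1.5):                 # contractive vs. expansive recurrence
        W = scale * np.eye(n)                # toy recurrent matrix
        g = np.ones(n)
        for _ in range(T):
            g = W.T @ g                      # one backward step through time
        print(scale, np.linalg.norm(g))      # ~7e-15 (vanished) vs. ~5e+9 (exploded)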

  24. Fix #1
  • Pseudocode
  Image Credit: Richard Socher
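
The pseudocode itself is in the slide image; assuming the fix being pictured is the rescaling from the previous slide, a standard sketch of gradient-norm clipping (per Pascanu et al.), with a threshold chosen here purely for illustration:

    import numpy as np

    def clip_gradient(g, threshold=5.0):     # threshold is a hyperparameter
        norm = np.linalg.norm(g)
        if norm > threshold:
            g = g * (threshold / norm)       # rescale onto the norm ball
        return g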

  25. Fix #2
  • Smart Initialization and ReLUs
  – [Socher et al. 2013]
  – A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, Le et al. 2015
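
A sketch of the Le et al. 2015 recipe (the "IRNN"), reusing Wxh and H from the slide-17 sketch: initialize the recurrent matrix to the identity, zero the bias, and use ReLU in place of tanh:

    Whh_id = np.eye(H)                       # identity init: h_t starts as a copy of h_{t-1}
    b_id = np.zeros(H)

    def irnn_step(x, h_prev):
        return np.maximum(0.0, Wxh @ x + Whh_id @ h_prev + b_id)   # ReLU, not tanh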
