DEEP LEARNING, FFR135 Artificial Neural Networks, Olof Mogren



SLIDE 1

DEEP LEARNING

FFR135, Artificial Neural Networks Olof Mogren

Chalmers University of Technology

October 2016

SLIDE 2

DEEP LEARNING

  • Artificial neural networks
  • Many layers of abstractions
  • Outperforms traditional methods in:
  • Image classification
  • Natural language processing
  • Machine translation
  • Sentiment analysis
  • Speech recognition
  • Reinforcement learning
SLIDE 3

SEMI-RECENT PROGRESS

  • 2006: Depth breakthrough:

layerwise pretrained Restricted Boltzmann Machines

  • GPUs
  • Practical use

Real applications from Google, Facebook, Tesla, Microsoft, Apple, and others!

A fast learning algorithm for deep belief nets; Hinton, Osindero, Teh; Neural Computation; 2006

SLIDE 4

PERCEPTRON

  • 1943, McCulloch & Pitts (neuron model)
  • 1958, Rosenblatt (perceptron)
  • Linear (binary) classification of inputs
  • Cannot learn non-linearly separable functions

(e.g. XOR)

[Diagram: inputs x1, x2, x3, x4 with weights w1, w2, w3, w4 feeding a single output unit y]
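As a minimal sketch of the unit in the diagram (numpy, with hand-picked rather than learned weights): a perceptron thresholds a weighted sum of its inputs. It can represent a linearly separable function such as AND, but no setting of w and b makes a single perceptron compute XOR.

```python
import numpy as np

def perceptron(x, w, b):
    # Threshold unit: fire iff the weighted input sum plus bias exceeds zero
    return 1 if np.dot(w, x) + b > 0 else 0

# Hand-picked (not learned) weights implementing AND, which is linearly separable
w_and, b_and = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w_and, b_and))  # only (1, 1) maps to 1
```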

SLIDE 5

MODELLING XOR

[Plot: the four XOR inputs over axes x0 and x1; the two classes cannot be separated by a single line]

SLIDE 6

MODELLING XOR

[Plots: XOR decomposed into two linearly separable problems, x0 ∧ ¬x1 and ¬x0 ∧ x1; XOR(x0, x1) = (x0 ∧ ¬x1) ∨ (¬x0 ∧ x1)]
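The decomposition above can be sketched as a two-layer network. The weights below are hand-picked for illustration (not learned): hidden unit 1 computes x0 ∧ ¬x1, hidden unit 2 computes ¬x0 ∧ x1, and the output unit ORs them, giving XOR.

```python
import numpy as np

def step(a):
    # Heaviside step activation, applied element-wise
    return (a > 0).astype(int)

# Hand-picked weights: row 1 of W1 computes x0 AND NOT x1,
# row 2 computes NOT x0 AND x1; the output unit ORs the hidden units.
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
b1 = np.array([-0.5, -0.5])
w2 = np.array([1.0, 1.0])
b2 = -0.5

def xor_net(x):
    h = step(W1 @ x + b1)        # two linearly separable subproblems
    return int(w2 @ h + b2 > 0)  # OR of the two hidden units

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x, dtype=float)))  # XOR truth table: 0, 1, 1, 0
```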

SLIDE 7

MULTI-LAYER PERCEPTRON

  • Combining layers lets us

represent non-linear functions

  • Each layer:
  • Linear transformation:

a = Wx + b

  • Non-linear (element-wise)

activation: h = g(a)

[Diagram: inputs → hidden layer → outputs]
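The per-layer recipe above (linear transformation a = Wx + b, then an element-wise non-linearity h = g(a)) can be sketched in a few lines of numpy. The sizes and the random weights are illustrative only; in practice the weights are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b, g):
    """One MLP layer: linear transformation a = Wx + b, then element-wise g."""
    return g(W @ x + b)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Illustrative sizes: 4 inputs, 3 hidden units, 2 outputs
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

x = rng.normal(size=4)
h = layer(x, W1, b1, sigmoid)   # hidden representation
y = layer(h, W2, b2, sigmoid)   # output
print(y.shape)  # (2,)
```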
SLIDE 8

MODELLING FUNCTIONS

  • Universal function approximation
  • Stacking layers: function composition
  • Apply error/loss function to output
  • Continuously differentiable; chain rule
  • Propagating errors (backpropagation)
  • (Mini-batch) Stochastic gradient descent

(SGD)

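The training steps listed above (forward pass, loss, gradients via the chain rule, mini-batch SGD updates) can be sketched on a toy problem. The task below, a linear model with squared-error loss, is an assumption for illustration; the mechanics are the same for deeper networks, where backpropagation supplies the gradients layer by layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task (illustrative): y = true_w . x + small noise
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(200, 2))
y = X @ true_w + 0.01 * rng.normal(size=200)

w = np.zeros(2)
lr, batch_size = 0.1, 20

# Mini-batch SGD on the squared-error loss L = mean((Xw - y)^2)
for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        err = X[b] @ w - y[b]              # forward pass and error
        grad = 2 * X[b].T @ err / len(b)   # gradient of the loss (chain rule)
        w -= lr * grad                     # parameter update

print(w)  # close to [2.0, -3.0]
```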

SLIDE 9

MOTIVATION OF DEPTH

  • More compact representation (exponentially)
  • There are Boolean functions that require:
  • a polynomial number of units in a deep architecture, but
  • an exponential number of units in a shallow architecture
  • E.g., the parity function (on n input bits):
  • efficiently represented with depth O(log n)
  • but requires O(2^n) gates in a depth-two circuit (Yao, 1985)

Exploring Strategies for Training Deep Neural Networks; Larochelle, Bengio, Louradour, Lamblin; JMLR 2009

SLIDE 10

LEARNING LEVELS OF REPRESENTATION

  • Each layer:

non-linear transformation of inputs: h = sigmoid(Wx + b)

  • Learning representations; abstractions
  • No feature engineering!
SLIDE 11

DISTRIBUTED REPRESENTATIONS

  • E.g.: big, yellow, Volkswagen
  • Non-distributed representations:

n binary parameters → n values

  • E.g.: Clustering, n-grams, decision trees, etc.
  • NNs learn distributed representations
  • Distributed representations:

n binary parameters → 2^n possible values

SLIDE 12

EXAMPLE: WORD EMBEDDINGS

  • Distributed representations for words
  • word2vec, GloVe, etc.
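A common way to use such embeddings is to compare words by cosine similarity. The tiny 4-dimensional vectors below are made up for illustration; real word2vec/GloVe vectors typically have 100-300 dimensions and are learned from large corpora.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings (made up for illustration, not learned)
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.7, 0.2, 0.1]),
    "apple": np.array([0.0, 0.1, 0.9, 0.8]),
}

print(cosine(emb["king"], emb["queen"]))  # high: related words
print(cosine(emb["king"], emb["apple"]))  # low: unrelated words
```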
SLIDE 13

DEEP LEARNING IN JAVASCRIPT

cs231n.stanford.edu playground.tensorflow.org

SLIDE 14

LEVELS OF ABSTRACTIONS

SLIDE 15

Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 7, 27 Jan 2016

Convolution Layer

32x32x3 image (width 32, height 32, depth 3)

SLIDE 16

Convolution Layer

32x32x3 image, 5x5x3 filter

Convolve the filter with the image i.e. “slide over the image spatially, computing dot products”

SLIDE 17

Convolution Layer

32x32x3 image, 5x5x3 filter

Convolve the filter with the image i.e. “slide over the image spatially, computing dot products” Filters always extend the full depth of the input volume

SLIDE 18

Convolution Layer

32x32x3 image, 5x5x3 filter

1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
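That one-number computation, and the sliding of the filter over all valid positions, can be sketched directly in numpy (random image and filter for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.normal(size=(32, 32, 3))   # 32x32x3 input volume
filt = rng.normal(size=(5, 5, 3))      # 5x5x3 filter (full input depth)
bias = 0.1

# One output number: dot product between the filter and a 5x5x3 chunk
# of the image (a 75-dimensional dot product), plus the bias.
chunk = image[0:5, 0:5, :]
value = float(np.sum(chunk * filt) + bias)

# Sliding the filter over all 28x28 valid positions gives the activation map.
amap = np.empty((28, 28))
for i in range(28):
    for j in range(28):
        amap[i, j] = np.sum(image[i:i+5, j:j+5, :] * filt) + bias
print(amap.shape)  # (28, 28)
```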

SLIDE 19

Convolution Layer

32x32x3 image, 5x5x3 filter

convolve (slide) over all spatial locations => activation map of size 28x28x1

SLIDE 20

Convolution Layer

32x32x3 image, 5x5x3 filter

convolve (slide) over all spatial locations => activation maps

consider a second, green filter

SLIDE 21

Convolution Layer

For example, if we had 6 5x5 filters, we'll get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6!

SLIDE 22

Preview: a ConvNet is a sequence of convolution layers, interspersed with activation functions.

[Diagram: 32x32x3 input → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6]

SLIDE 23

Preview: a ConvNet is a sequence of convolutional layers, interspersed with activation functions.

[Diagram: 32x32x3 input → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → ...]

SLIDE 24

[Figure: example 5x5 filters (32 total)]

We call the layer convolutional because it is related to convolution of two signals: element-wise multiplication and sum of a filter and the signal (image).

  • one filter => one activation map
SLIDE 25

[Figure: preview of a full ConvNet pipeline]

SLIDE 26

A closer look at spatial dimensions:

32x32x3 image, 5x5x3 filter

convolve (slide) over all spatial locations => activation map of size 28x28x1

SLIDE 27

A closer look at spatial dimensions:

7x7 input (spatially), assume 3x3 filter


SLIDE 31

7x7 input (spatially), assume 3x3 filter => 5x5 output

SLIDE 32

7x7 input (spatially), 3x3 filter applied with stride 2


SLIDE 34

7x7 input (spatially), 3x3 filter applied with stride 2 => 3x3 output!

SLIDE 35

7x7 input (spatially), 3x3 filter applied with stride 3?

SLIDE 36

7x7 input (spatially), 3x3 filter applied with stride 3?

doesn't fit! cannot apply a 3x3 filter on a 7x7 input with stride 3.

SLIDE 37

Output size: (N - F) / stride + 1

e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\
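A small helper makes the formula concrete. The `pad` parameter anticipates the zero-padding used on the following slides (general form (N + 2P - F) / stride + 1); with `pad=0` it reduces to the formula above, and a non-fitting stride raises an error, like 3x3 on 7x7 with stride 3.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution: (N + 2P - F) / stride + 1."""
    span = n + 2 * pad - f
    if span % stride != 0:
        raise ValueError(
            f"{f}x{f} filter with stride {stride} does not fit {n}x{n} input (pad {pad})"
        )
    return span // stride + 1

print(conv_output_size(7, 3, stride=1))  # 5
print(conv_output_size(7, 3, stride=2))  # 3
```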

SLIDE 38

In practice: common to zero-pad the border

e.g. 7x7 input, 3x3 filter applied with stride 1, pad with a 1 pixel border => what is the output?

(recall:) (N - F) / stride + 1

SLIDE 39

In practice: common to zero-pad the border

e.g. 7x7 input, 3x3 filter applied with stride 1, pad with a 1 pixel border => 7x7 output!

SLIDE 40

In practice: common to zero-pad the border

e.g. 7x7 input, 3x3 filter applied with stride 1, pad with a 1 pixel border => 7x7 output!

In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding of (F-1)/2 (this preserves the spatial size):
e.g. F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3

SLIDE 41

Remember: e.g. a 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially (32 -> 28 -> 24 -> ...). Shrinking too fast is not good and doesn't work well.

[Diagram: 32x32x3 input → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → ...]

SLIDE 42

Examples time. Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2. Output volume size: ?

SLIDE 43

Examples time. Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2. Output volume size: (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

SLIDE 44

Examples time. Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2. Number of parameters in this layer?

SLIDE 45

Examples time. Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2. Number of parameters in this layer? Each filter has 5*5*3 + 1 = 76 params (+1 for the bias) => 76*10 = 760
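The parameter count generalizes to any conv layer: weights per filter times the number of filters, plus one bias each. A short helper reproducing the arithmetic above:

```python
def conv_layer_params(filter_h, filter_w, in_depth, num_filters):
    """Learnable parameters in a conv layer: weights plus one bias per filter."""
    per_filter = filter_h * filter_w * in_depth + 1  # +1 for the bias
    return per_filter * num_filters

print(conv_layer_params(5, 5, 3, 10))  # 760
```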

SLIDE 46

(btw, 1x1 convolution layers make perfect sense)

[Diagram: 56x56x64 input → 1x1 CONV with 32 filters → 56x56x32] (each filter has size 1x1x64, and performs a 64-dimensional dot product)

SLIDE 47

Pooling layer

  • makes the representations smaller and more manageable
  • operates over each activation map independently:
SLIDE 48

MAX POOLING

Single depth slice (4x4), x along width, y along height:

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

max pool with 2x2 filters and stride 2 =>

6 8
3 4
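A compact numpy sketch of 2x2 max pooling with stride 2, run on the same 4x4 depth slice as the example above:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on one depth slice (H and W must be even)."""
    h, w = x.shape
    # Group into non-overlapping 2x2 blocks, take the max of each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool_2x2(x))  # [[6 8], [3 4]]
```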

SLIDE 49

Fully Connected Layer (FC layer)

  • Contains neurons that connect to the entire input volume, as in ordinary neural networks

SLIDE 50

DROPOUT

  • During training:
  • For each post-activation hi, with probability p set hi = 0
  • Redundancy
  • Equivalent to learning an ensemble of networks

Improving neural networks by preventing co-adaptation of feature detectors; Hinton, Srivastava, Krizhevsky, Sutskever, Salakhutdinov; 2012; arXiv:1207.0580
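A minimal dropout sketch. The 1/(1-p) rescaling during training ("inverted dropout") is a common implementation detail not stated on the slide; it keeps the expected activation the same at training and test time, so test-time inference needs no change.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p, training=True):
    """Inverted dropout: zero each activation with probability p during
    training, scaling survivors by 1/(1-p); identity at test time."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

h = np.ones(10)
print(dropout(h, p=0.5))                   # roughly half the entries zeroed
print(dropout(h, p=0.5, training=False))   # unchanged at test time
```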

SLIDE 51

BATCH NORMALIZATION

  • For each mini-batch:
  • Normalize the inputs to every layer to zero mean, unit variance
  • Helps with internal covariate shift

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift; Ioffe, Szegedy; arXiv:1502.03167
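A sketch of the per-batch normalization step, assuming inputs of shape (batch, features). The scale gamma and shift beta are learned parameters in the full method; they are fixed constants here for illustration.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch to zero mean and unit variance,
    then apply scale (gamma) and shift (beta); eps avoids division by zero."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # skewed inputs
out = batch_norm(x)
print(out.mean(axis=0).round(6))  # ~0 per feature
print(out.std(axis=0).round(3))   # ~1 per feature
```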

SLIDE 52

RESIDUAL CONNECTIONS

[Diagram: input x passes through weight layer, ReLU, weight layer to produce F(x); an identity shortcut adds x, giving F(x) + x, followed by a final ReLU]

Deep Residual Learning for Image Recognition; He, Zhang, Ren, Sun; arXiv:1512.03385
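The block in the diagram can be sketched as follows (fully connected weight layers of equal width, standing in for the conv layers of the paper; random weights for illustration). The layers learn the residual F(x), and the shortcut adds x back unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda a: np.maximum(a, 0.0)

def residual_block(x, W1, W2):
    """Residual block: output relu(F(x) + x), where F is weight layer,
    ReLU, weight layer, and x flows through an identity shortcut."""
    f = W2 @ relu(W1 @ x)   # F(x)
    return relu(f + x)      # shortcut addition, then final ReLU

d = 8  # illustrative width; input and output dims must match for the shortcut
W1, W2 = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))
x = rng.normal(size=d)
print(residual_block(x, W1, W2).shape)  # (8,)
```

Note that with all-zero weights the block reduces to relu(x): the identity is easy to represent, which is part of why very deep residual nets remain trainable.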

SLIDE 53

DEEPER AND DEEPER

  • 1998: LeNet-5; 3 layers
  • 2012: AlexNet; 8 layers
  • 2014: GoogLeNet; 22 layers (illustration)
  • 2015: Residual Nets; 152 layers
  • “Surpassed” human performance in 2015
SLIDE 54

DEPTH DEVELOPMENT

ImageNet classification top-5 error (%):

ILSVRC'10: 28.2 (shallow)
ILSVRC'11: 25.8 (shallow)
ILSVRC'12, AlexNet: 16.4 (8 layers)
ILSVRC'13: 11.7 (8 layers)
ILSVRC'14, VGG: 7.3 (19 layers)
ILSVRC'14, GoogLeNet: 6.7 (22 layers)
ILSVRC'15, ResNet: 3.57 (152 layers)

Kaiming He, Xiangyu Zhang, Shaoqing Ren & Jian Sun. "Deep Residual Learning for Image Recognition". arXiv 2015.

SLIDE 55

NON-CONVEX OPTIMIZATION

  • Loss function non-convex
  • Low-D: local minima dominate
  • High-D: saddle points dominate
  • Local minima are close to the global minimum

  • Convexity not needed

The loss surfaces of multilayer networks; Choromanska et al.; AISTATS 2015
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization; Dauphin et al.; NIPS 2014

Yoshua Bengio

SLIDE 56

SEQUENCE MODELLING

(Illustration: Andrej Karpathy)

SLIDE 57

SENTIMENT ANALYSIS

[Diagram: inputs x(t-2), x(t-1), x(t) fed through a chain of LSTM cells; the final state feeds a classification layer]

  • Binary sequence classification
SLIDE 58

NEURAL MACHINE TRANSLATION, NMT

[Diagram: an encoder reads the input sequence x1, x2, x3; a decoder emits the output sequence y1, y2, y3]

Sequence to sequence learning with neural networks; Sutskever, Vinyals, Le; NIPS 2014
Neural machine translation by jointly learning to align and translate; Bahdanau, Cho, Bengio; ICLR 2015

SLIDE 59

RECENT ADVANCES IN NMT

  • Subwords (BPE) (Sennrich et al., ACL 2016)
  • 8-layer deep LSTM model
  • Quantized weights ∈ {−1, 0, +1}
  • Downpour SGD: parallel training
  • 8 GPUs, one host
  • Human evaluation:

results comparable to human translators!

Google’s neural machine translation system: Bridging the gap between human and machine translation; Yonghui Wu et al.; arXiv 1609.08144

SLIDE 60

CAPTION GENERATION


SLIDE 61

http://mogren.one/