

SLIDE 1

Lecture 13: Introduction to Deep Learning

Aykut Erdem
March 2016, Hacettepe University

SLIDE 2

Last time… Computational Graph

[Figure: computational graph. Inputs x and W feed a * node that produces the scores s; the scores feed the hinge loss, W also feeds the regularizer R, and a + node sums the two into the total loss L]

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 3

Last time… Training Neural Networks

Mini-batch SGD loop:
  1. Sample a batch of data
  2. Forward prop it through the graph, get the loss
  3. Backprop to calculate the gradients
  4. Update the parameters using the gradient
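Below is a minimal, self-contained Python/NumPy sketch of this loop, using the linear classifier + hinge loss + regularizer from the computational graph on the previous slide. The data, sizes, and hyperparameters are made up for illustration; this is not code from the lecture.

```python
import numpy as np

# Illustrative mini-batch SGD on a linear hinge-loss (multiclass SVM) classifier.
rng = np.random.default_rng(0)
N, D, C = 512, 32, 10                  # examples, input dim, classes (made up)
X = rng.normal(size=(N, D))            # fake data
y = rng.integers(0, C, size=N)         # fake labels
W = 0.01 * rng.normal(size=(D, C))     # parameters
lr, reg, batch = 1e-2, 1e-3, 64        # made-up hyperparameters

for step in range(100):
    # 1. Sample a batch of data
    idx = rng.integers(0, N, size=batch)
    xb, yb = X[idx], y[idx]

    # 2. Forward prop it through the graph, get the loss
    s = xb @ W                                      # scores, shape (batch, C)
    correct = s[np.arange(batch), yb][:, None]
    margins = np.maximum(0.0, s - correct + 1.0)    # hinge loss terms
    margins[np.arange(batch), yb] = 0.0
    loss = margins.sum() / batch + reg * np.sum(W * W)

    # 3. Backprop to calculate the gradients
    mask = (margins > 0).astype(float)
    mask[np.arange(batch), yb] = -mask.sum(axis=1)
    dW = xb.T @ mask / batch + 2.0 * reg * W

    # 4. Update the parameters using the gradient
    W -= lr * dW
    if step % 20 == 0:
        print(step, loss)
```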

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 4

This week

  • Introduction to Deep Learning
  • Deep Convolutional Networks
  • Brief Overview of other Deep Networks


SLIDE 5

Deep Learning


SLIDE 6

Synonyms

  • Representation Learning
  • Deep (Machine) Learning
  • Deep Neural Networks
  • Deep Unsupervised Learning
  • Simply: Deep Learning


slide by Dhruv Batra

SLIDE 7

Recap: 1 Layer Neural Network

  • 1 Neuron
  • Takes input x
  • Outputs y

  • ~Logistic Regression!
  • Gradient Descent

[Figure: a single “neuron”: input x feeds a weighted sum Σ, whose output passes through a nonlinearity to give y]

f(x | w, b) = wᵀx - b = w₁x₁ + w₂x₂ + w₃x₃ - b
y = τ(f(x))

where τ is a nonlinearity such as sigmoid, tanh, or rectified linear (ReLU).
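As a toy illustration (values made up, not from the lecture), the neuron is just a dot product, a bias, and a nonlinearity:

```python
import numpy as np

# Toy single neuron: f(x | w, b) = w.T x - b, then y = tau(f(x)).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rectified_linear(z):               # a.k.a. ReLU
    return np.maximum(0.0, z)

w = np.array([0.5, -1.0, 2.0])         # made-up weights w1, w2, w3
b = 0.1                                # made-up bias
x = np.array([1.0, 2.0, 0.5])          # made-up input

f = w @ x - b                          # w1*x1 + w2*x2 + w3*x3 - b
for tau in (sigmoid, np.tanh, rectified_linear):
    print(tau.__name__, tau(f))        # y = tau(f(x)) under each nonlinearity
```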


slide by Yisong Yue

SLIDE 8

Recap: 2 Layer Neural Network

  • 2 Layers of Neurons
  • 1st Layer takes input x
  • 2nd Layer takes output of 1st layer
  • Can approximate arbitrary functions, provided the hidden layer is large enough: a “fat” 2-layer network

[Figure: input x feeds a hidden layer of Σ neurons whose non-linear outputs feed an output neuron producing y]
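A minimal sketch of this architecture in the same spirit (sizes and random weights are illustrative; tanh stands in for the hidden nonlinearity):

```python
import numpy as np

# Toy 2-layer network: a non-linear hidden layer feeding one output neuron.
rng = np.random.default_rng(0)
D, H = 3, 100                          # input dim, hidden width (made up);
                                       # a huge H gives the "fat" 2-layer network

W1, b1 = rng.normal(size=(H, D)), np.zeros(H)   # 1st layer: takes input x
w2, b2 = rng.normal(size=H), 0.0                # 2nd layer: takes 1st layer's output

x = rng.normal(size=D)
h = np.tanh(W1 @ x + b1)               # hidden layer activations (non-linear!)
y = w2 @ h + b2                        # network output
print(y)
```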


slide by Yisong Yue

SLIDE 9

Deep Neural Networks

  • Why prefer deep over a “fat” 2-layer network?
  • Deep models can be compact
  • (an equivalent “fat” model can be exponentially large)

Image Source: http://blog.peltarion.com/2014/06/22/deep-learning-and-deep-neural-networks-in-synapse/

slide by Yisong Yue

SLIDE 10

Original Biological Inspiration

  • David Hubel & Torsten Wiesel discovered “simple cells” and “complex cells” in 1959

  • Some cells activate for simple patterns
  • E.g., lines at certain angles
  • Some cells activate for more complex patterns
  • Appear to take activations of simple cells as input

Image Sources: https://cms.www.countway.harvard.edu/wp/wp-content/uploads/2013/09/0002595_ref.jpg, https://cognitiveconsonance.files.wordpress.com/2013/05/c_fig5.jpg

slide by Yisong Yue

SLIDE 11

SLIDE 12

Early Hierarchical Feature Models for Vision

  • Hubel & Wiesel [60s]: simple & complex cells architecture
  • Fukushima’s Neocognitron [70s]

slide by Joan Bruna


figures from Yann LeCun’s CVPR plenary

SLIDE 13
Early Hierarchical Feature Models for Vision

  • Yann LeCun’s early ConvNets [80s]
  • Used for character recognition
  • Trained with backpropagation

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 14

Deep Learning pre-2012

  • Despite their very competitive performance, deep learning architectures were not widespread before 2012.
  • State-of-the-art in handwritten pattern recognition [LeCun et al. ’89, Ciresan et al. ’07, etc.]

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 15

Deep Learning pre-2012

  • Despite their very competitive performance, deep learning architectures were not widespread before 2012.
  • Face detection [Vaillant et al. ’93, ’94; Osadchy et al. ’03, ’04, ’07]

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 16

Deep Learning pre-2012

  • Despite their very competitive performance, deep learning architectures were not widespread before 2012.
  • Scene parsing [Farabet et al. ’12, ’13]

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary


SLIDE 18

ImageNet

  • Object recognition competition (2012)
  • 1.5 million labeled training examples
  • ≈1000 classes

http://www.image-net.org/

[Figure: example images with labels: leopard, mushroom, mite]

slide by Yisong Yue

SLIDE 19

Deep Learning: Golden Age in Vision

  • 2012-2014 ImageNet results
  • 2015 results: MSRA under 3.5% error (using a CNN with 150 layers!)

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 20

Traditional Machine Learning

  • VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”
  • SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈd ē p\
  • NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 21

It’s an old paradigm

  • The first learning machine: the Perceptron
  • Built at Cornell in 1960
  • The Perceptron was a linear classifier on top of a simple feature extractor
  • The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching.
  • Designing a feature extractor requires considerable effort by experts.

y = sign( Σᵢ₌₁ᴺ Wᵢ Fᵢ(X) + b )

[Figure: input X → feature extractor F → weighted sum with weights Wᵢ → y]
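A toy sketch of this setup (the feature extractor, weights, and input below are made-up stand-ins, not the original machine’s):

```python
import numpy as np

# y = sign( sum_i W_i * F_i(X) + b ): a linear classifier on top of a
# hand-designed feature extractor F.
def F(X):
    # hand-crafted features: the raw inputs plus one simple product feature
    return np.array([X[0], X[1], X[0] * X[1]])

W = np.array([0.7, -0.2, 1.5])         # made-up learned weights
b = -0.3                               # made-up learned bias

X = np.array([0.5, 2.0])               # made-up input
y = np.sign(W @ F(X) + b)              # prediction: +1 or -1
print(y)
```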


slide by Marc’Aurelio Ranzato, Yann LeCun

SLIDE 22

Hierarchical Compositionality

  • VISION: pixels → edge → texton → motif → part → object
  • SPEECH: sample → spectral band → formant → motif → phone → word
  • NLP: character → word → NP/VP/.. → clause → sentence → story

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 23

Building A Complicated Function

Given a library of simple functions, compose them into a complicated function

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 24

Building A Complicated Function

Given a library of simple functions, compose them into a complicated function

Idea 1: Linear Combinations

  • Boosting
  • Kernels

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 25

Building A Complicated Function

Given a library of simple functions, compose them into a complicated function

Idea 2: Compositions

  • Deep Learning
  • Grammar models
  • Scattering transforms…

slide by Marc’Aurelio Ranzato, Yann LeCun



SLIDE 27

Deep Learning = Hierarchical Compositionality

[Figure: a hierarchy of learned features producing the prediction “car”]

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 28

Deep Learning = Hierarchical Compositionality

[Figure: Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”]

Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 29
The Mammalian Visual Cortex is Hierarchical

  • The ventral (recognition) pathway in the visual cortex

[picture from Simon Thorpe]

slide by Marc’Aurelio Ranzato, Yann LeCun



SLIDE 31

Traditional Machine Learning (more accurately)

  • VISION: image → SIFT/HOG (fixed) → K-Means/pooling (unsupervised, “learned”) → classifier (supervised) → “car”
  • SPEECH: audio → MFCC (fixed) → Mixture of Gaussians (unsupervised, “learned”) → classifier (supervised) → \ˈd ē p\
  • NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (unsupervised, “learned”) → classifier (supervised) → “+”

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 32

Deep Learning = End-to-End Learning

  • VISION: image → SIFT/HOG → K-Means/pooling → classifier → “car”
  • SPEECH: audio → MFCC → Mixture of Gaussians → classifier → \ˈd ē p\
  • NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic → n-grams → classifier → “+”

…with the whole pipeline now “learned”, end to end.

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 33
Deep Learning = End-to-End Learning

  • A hierarchy of trainable feature transforms
  • Each module transforms its input representation into a higher-level one
  • High-level features are more global and more invariant
  • Low-level features are shared among categories

[Figure: a stack of Trainable Feature-Transform / Classifier modules with Learned Internal Representations between them]
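A minimal sketch of such a stack (layer sizes and the ReLU nonlinearity are illustrative choices, not taken from the slide):

```python
import numpy as np

# A hierarchy of trainable feature transforms: each layer maps its input
# representation to a higher-level one; the last layer acts as the classifier.
rng = np.random.default_rng(0)
sizes = [784, 256, 64, 10]             # input -> low-level -> high-level -> scores
layers = [(0.01 * rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    for i, (W, b) in enumerate(layers):
        x = W @ x + b                  # trainable feature transform
        if i < len(layers) - 1:
            x = np.maximum(0.0, x)     # nonlinearity between transforms
    return x                           # class scores from the final layer

scores = forward(rng.normal(size=784))
print(scores.shape)                    # (10,)
```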

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 34
“Shallow” vs Deep Learning

  • “Shallow” models: hand-crafted feature extractor (fixed) → “simple” trainable classifier (learned)
  • Deep models: a stack of Trainable Feature-Transform / Classifier modules with Learned Internal Representations

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 35

Next lecture: Deep Convolutional Nets
