CS7015 (Deep Learning) : Lecture 1
(Partial/Brief) History of Deep Learning Mitesh M. Khapra
Department of Computer Science and Engineering Indian Institute of Technology Madras
Acknowledgements
Most of this material is based on the
Module 1.1
1871-1873: Reticular theory
1888-1891: Neuron Doctrine
1906: Nobel Prize
1950: Synapse
Module 2
1847: Gradient Descent
1943: MP Neuron
1957-1958: Perceptron
1965-1968: MLP
1969: Limitations
1969-1986: AI Winter
1986: Backpropagation
1989: UAT
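The 1957-1958 perceptron entry above names a concrete algorithm: Rosenblatt's error-driven learning rule. As an illustrative aside (not from the original slides), it can be sketched in a few lines of plain Python; the function name and the toy AND-style dataset are assumptions for illustration.

```python
# Perceptron learning rule (Rosenblatt-style sketch): on each misclassified
# example, nudge the weights toward the correct side of the hyperplane.

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """samples: list of feature tuples; labels: +1 / -1."""
    n = len(samples[0])
    w = [0.0] * n          # weights
    b = 0.0                # bias
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation >= 0 else -1
            if pred != y:  # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Linearly separable toy data: an AND-like function on {0,1}^2.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
w, b = train_perceptron(X, Y)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1 for x in X]
print(preds)  # → [-1, -1, -1, 1]
```

On separable data like this the rule provably converges in a finite number of updates; the 1969 "Limitations" entry refers to data (such as XOR) where no such hyperplane exists.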
Module 3
1991-1993: Very Deep Learner
2006-2009: Unsupervised Pre-Training
2009: Handwriting
2010: Speech; Record on MNIST
2011: Visual Pattern Recognition
2012-2016: Success on ImageNet
Module 4
1959: Hubel and Wiesel experiment
1980: Neocognitron
1989: CNN
1998: LeNet-5
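The CNN and LeNet-5 entries above rest on one core operation. As a hedged illustration (not code from the lecture), a minimal "valid" 2D convolution, implemented as cross-correlation the way most deep-learning libraries do, can be written in plain Python; the toy image and kernel below are assumptions.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image, no padding."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge detector applied to a tiny image with a dark/bright boundary.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
k = [[-1, 1],
     [-1, 1]]
print(conv2d(img, k))  # → [[0, 2, 0], [0, 2, 0]]
```

The strongest responses line up with the edge column, which is the intuition behind the learned feature detectors in the 1989-1998 CNNs listed above.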
Module 5
1983: Nesterov
2011: Adagrad
2012: RMSProp
2015: Adam / BatchNorm
2016: Eve
2018: Beyond Adam
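The optimizer entries above name concrete update rules. As a hedged sketch (not from the slides), Adam combines a momentum-style first-moment estimate with an RMSProp-style second-moment estimate, plus bias correction; the single-parameter function below uses the default hyperparameters from the Adam paper, while the function name and toy objective are illustrative.

```python
import math

def adam_minimize(grad, x0, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function given its gradient, using Adam updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g         # first moment (momentum-like)
        v = beta2 * v + (1 - beta2) * g * g     # second moment (RMSProp-like)
        m_hat = m / (1 - beta1 ** t)            # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # settles near the minimum at x = 3
```

Dividing by the second-moment estimate makes the step size roughly invariant to the gradient's scale, which is the common thread running from Adagrad through RMSProp to Adam.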
Module 6
1982: Hopfield
1986: Jordan
1990: Elman
1991: RL-Attention
1991-1994: RNN drawbacks
1997: LSTMs
2014: Seq2Seq-Attention
Module 7
2015: DQNs
2015: AlphaGo
2016: Poker
2017: Dota 2
Module 8
Module 9
∗ https://arxiv.org/pdf/1710.05468.pdf
Source: https://www.cbinsights.com/blog/deep-learning-ai-startups-market-map-company-list/
[1] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[2] W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. 1943.
[3] A. G. Ivakhnenko and V. G. Lapa. Cybernetic predicting devices. 1965.
[4] M. Minsky and S. Papert. Perceptrons. 1969.
[5] 762–770, 1981.
[6] McClelland, editors, Parallel Distributed Processing, volume 1, pages 318–362. MIT Press, 1986.
[7] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
[8] Ruslan Salakhutdinov and Geoffrey Hinton. An efficient learning procedure for deep Boltzmann machines. Neural Comput., 24(8):1967–2006, August 2012.
[9] Alex Graves and Jürgen Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. In D. Koller, Inc., 2009.
[10]
[11] Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. Deep big simple neural nets excel on handwritten digit
[12] Dan C. Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. CoRR, abs/1202.2745, 2012.
[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[14] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013.
[15] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[16] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[18]
[19] Biological Cybernetics, 36(4):193–202, 1980.
[20] code recognition. Neural Computation, 1(4):541–551, 1989.
[21] 86(11):2278–2324, November 1998.
[22] Sciences, 79:2554–2558, 1982.
[23] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[24] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[25] Matej Moravcík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael H. Bowling. DeepStack: Expert-level artificial intelligence in no-limit poker. CoRR, abs/1701.01724, 2017.
[26] Tomas Mikolov, Martin Karafiát, Lukás Burget, Jan Cernocký, and Sanjeev Khudanpur. Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pages 1045–1048, 2010.
[27] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Skip-thought vectors. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 3294–3302, 2015.
[28] Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. Character-aware neural language models. CoRR, abs/1508.06615, 2015.
[29] Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag., 29(6):82–97, 2012.
[30] Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pages 6645–6649, 2013.
[31] Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 577–585, 2015.
[32] Hasim Sak, Andrew W. Senior, Kanishka Rao, and Françoise Beaufays. Fast and accurate recurrent neural network acoustic models for speech September 6-10, 2015, pages 1468–1472, 2015.
[33] Nal Kalchbrenner and Phil Blunsom. Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1700–1709, 2013.
[34] Kyunghyun Cho, Bart van Merrienboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group
[35] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014.
[36] Sébastien Jean, KyungHyun Cho, Roland Memisevic, and Yoshua Bengio. On using very large target vocabulary for neural machine Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1–10, 2015.
[37] Çağlar Gülçehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. On using monolingual corpora in neural machine translation. CoRR, abs/1503.03535, 2015.
[38] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 3104–3112, 2014.
[39] Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings pages 1412–1421, 2015.
[40] Hao Zheng, Yong Cheng, and Yang Liu. Maximum expected likelihood estimation for zero-resource neural machine translation. In Proceedings 4251–4257, 2017.
[41] Yong Cheng, Qian Yang, Yang Liu, Maosong Sun, and Wei Xu. Joint training for pivot-based neural machine translation. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 3974–3980, 2017.
[42] Yun Chen, Yang Liu, Yong Cheng, and Victor O. K. Li. A teacher-student framework for zero-resource neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1925–1935, 2017.
[43] Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman-Vural, and Kyunghyun Cho. Zero-resource translation with multi-lingual neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 268–277, 2016.
[44] Lifeng Shang, Zhengdong Lu, and Hang Li. Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1577–1586, 2015.
[45] Oriol Vinyals and Quoc V. Le. A neural conversational model. CoRR, abs/1506.05869, 2015.
[46] Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. In Proceedings of the SIGDIAL 2015 Conference, The 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2-4 September 2015, Prague, Czech Republic, pages 285–294, 2015.
[47] Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander H. Miller, Arthur Szlam, and Jason Weston. Evaluating prerequisite qualities for learning end-to-end dialog systems. CoRR, abs/1511.06931, 2015.
[48] Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. Towards AI-complete question answering: A set of prerequisite toy tasks. CoRR, abs/1502.05698, 2015.
[49] Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. A hierarchical latent variable encoder-decoder model for generating dialogues. CoRR, abs/1605.06069, 2016.
[50] Antoine Bordes and Jason Weston. Learning end-to-end goal-oriented dialog. CoRR, abs/1605.07683, 2016.
[51] Iulian Vlad Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Mudumba, Alexandre de Brébisson, Jose Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, and Yoshua Bengio. A deep reinforcement learning chatbot. CoRR, abs/1709.02349, 2017.
[52] Karl Moritz Hermann, Tomás Kociský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 1693–1701, 2015.
[53] Danqi Chen, Jason Bolton, and Christopher D. Manning. A thorough examination of the CNN/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, 2016.
[54] Caiming Xiong, Victor Zhong, and Richard Socher. Dynamic coattention networks for question answering. CoRR, abs/1611.01604, 2016.
[55] Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. CoRR, abs/1611.01603, 2016.
[56] Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. Gated-attention readers for text comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1832–1846, 2017.
[57] Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. Gated self-matching networks for reading comprehension and question 30 - August 4, Volume 1: Long Papers, pages 189–198, 2017.
[58] Minghao Hu, Yuxing Peng, and Xipeng Qiu. Mnemonic reader for machine comprehension. CoRR, abs/1705.02798, 2017.
[59] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3431–3440, 2015.
[60] Ming Liang and Xiaolin Hu. Recurrent convolutional neural network for object recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3367–3375, 2015.
[61] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell., 39(6):1137–1149, 2017.
[62] Sean Bell, C. Lawrence Zitnick, Kavita Bala, and Ross B. Girshick. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. CoRR, abs/1512.04143, 2015.
[63] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. CoRR, abs/1612.08242, 2016.
[64] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object detection via region-based fully convolutional networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 379–387, 2016.
[65] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2980–2988, 2017.
[66] Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixé, Daniel Cremers, and Luc Van Gool. One-shot video object pages 5320–5329, 2017.
[67] Janghoon Choi, Junseok Kwon, and Kyoung Mu Lee. Visual tracking by reinforced decision making. CoRR, abs/1702.06291, 2017.
[68] Sangdoo Yun, Jongwon Choi, Youngjoon Yoo, Kimin Yun, and Jin Young Choi. Action-decision networks for visual tracking with deep reinforcement learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 1349–1358, 2017.
[69] Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Tracking the untrackable: Learning to track multiple cues with long-term dependencies. CoRR, abs/1701.01909, 2017.
[70] Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). CoRR, abs/1412.6632, 2014.
[71] Junhua Mao, Xu Wei, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan L. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In The IEEE International Conference on Computer Vision (ICCV), December 2015.
[72] Ryan Kiros, Ruslan Salakhutdinov, and Richard S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. CoRR, abs/1411.2539, 2014.
[73] Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell, and Kate Saenko. Long-term recurrent convolutional networks for visual recognition and description. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 2625–2634, 2015.
[74] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3156–3164, 2015.
[75] Andrej Karpathy and Fei-Fei Li. Deep visual-semantic alignments for generating image descriptions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3128–3137, 2015.
[76] Hao Fang, Saurabh Gupta, Forrest N. Iandola, Rupesh Kumar Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, and Geoffrey Zweig. From captions to visual concepts and back. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1473–1482, 2015.
[77] Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, and Ram Nevatia. ABC-CNN: An attention based convolutional neural network for visual question answering. CoRR, abs/1511.05960, 2015.
[78] Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. CoRR, abs/1411.4389, 2014.
[79] Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond J. Mooney, and Kate Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31 - June 5, 2015, pages 1494–1504, 2015.
[80] Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui. Jointly modeling embedding and translation to bridge video and language. CoRR, abs/1505.01861, 2015.
[81] Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher J. Pal, Hugo Larochelle, and Aaron C. Courville. Describing videos by exploiting temporal structure. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 4507–4515, 2015.
[82] Anna Rohrbach, Marcus Rohrbach, Wei Qiu, Annemarie Friedrich, Manfred Pinkal, and Bernt Schiele. Coherent multi-sentence video description with variable level of detail. In Pattern Recognition - 36th German Conference, GCPR 2014, Münster, Germany, September 2-5, 2014, Proceedings, pages 184–195, 2014.
[83] Linchao Zhu, Zhongwen Xu, Yi Yang, and Alexander G. Hauptmann. Uncovering temporal context for video question and answering. CoRR, abs/1511.04670, 2015.
[84] Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Tim Lillicrap. A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 4974–4983, 2017.
[85] Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Kate Saenko. Learning to reason: End-to-end module networks for visual question answering. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 804–813, 2017.
[86] Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross B. Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 1988–1997, 2017.
[87] Hedi Ben-younes, Rémi Cadène, Matthieu Cord, and Nicolas Thome. MUTAN: Multimodal Tucker fusion for visual question answering. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2631–2639, 2017.
[88] Mateusz Malinowski, Marcus Rohrbach, and Mario Fritz. Ask your neurons: A neural-based approach to answering questions about images. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 1–9, 2015.
[89] Vahid Kazemi and Ali Elqursh. Show, ask, attend, and answer: A strong baseline for visual question answering. CoRR, abs/1704.03162, 2017.
[90] Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. MovieQA: Understanding stories in movies through question-answering. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 4631–4640, 2016.
[91] Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, and Min Sun. Leveraging video descriptions to learn video question answering. CoRR, abs/1611.04021, 2016.
[92] Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron C. Courville, and Christopher Joseph Pal. A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 7359–7368, 2017.
[93] Zhou Zhao, Qifan Yang, Deng Cai, Xiaofei He, and Yueting Zhuang. Video question answering via hierarchical spatio-temporal attention August 19-25, 2017, pages 3518–3524, 2017.
[94] Youngjae Yu, Hyungjin Ko, Jongwook Choi, and Gunhee Kim. End-to-end concept word detection for video captioning, retrieval, and question 3261–3269, 2017.
[95] Hongyang Xue, Zhou Zhao, and Deng Cai. The forgettable-watcher model for video question answering. CoRR, abs/1705.01253, 2017.
[96] Amir Mazaheri, Dong Zhang, and Mubarak Shah. Video fill in the blank with merging LSTMs. CoRR, abs/1610.04062, 2016.
[97] Tommy Chheng. Video summarization using clustering.
[98] Muhammad Ajmal, Muhammad Husnain Ashraf, Muhammad Shakir, Yasir Abbas, and Faiz Ali Shah. Video summarization: Techniques and Proceedings, pages 1–13, 2012.
[99] Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman. Video summarization with long short-term memory. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VII, pages 766–782, 2016.
[100] Zhong Ji, Kailin Xiong, Yanwei Pang, and Xuelong Li. Video summarization with attention-based encoder-decoder networks. CoRR, abs/1708.09545, 2017.
[101] Rameswar Panda, Niluthpol Chowdhury Mithun, and Amit K. Roy-Chowdhury. Diversity-aware multi-video summarization. IEEE Trans. Image Processing, 26(10):4712–4724, 2017.
[102] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. CoRR, abs/1312.6114, 2013.
[103] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 2672–2680, 2014.
[104] Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, and Jeff Clune. Plug & play generative networks: Conditional iterative generation of images in latent space. CoRR, abs/1612.00005, 2016.
[105] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. CoRR, abs/1710.10196, 2017.
[106] Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alexander Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv, 2016.
[107] Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.
[108] Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Koray Kavukcuoglu, Oriol Vinyals, and Alex Graves. Conditional image generation with PixelCNN decoders. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4790–4798. Curran Associates, Inc., 2016.
[109] Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P. Kingma. PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517, 2017.