

SLIDE 1

Lecture 13: Introduction to Deep Learning

Aykut Erdem
March 2016, Hacettepe University

SLIDE 2

Last time… Computational Graph

[Figure: computational graph. Inputs x and W feed a * node that produces the scores s; the scores feed the hinge loss, W also feeds the regularizer R, and a + node sums the two into the total loss L]

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 3

Last time… Training Neural Networks

Mini-batch SGD loop:
  1. Sample a batch of data
  2. Forward prop it through the graph, get the loss
  3. Backprop to calculate the gradients
  4. Update the parameters using the gradient
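Below is a minimal, self-contained Python/NumPy sketch of this loop, using the linear classifier + hinge loss + regularizer from the computational graph on the previous slide. The data, sizes, and hyperparameters are made up for illustration; this is not code from the lecture.

```python
import numpy as np

# Illustrative mini-batch SGD on a linear hinge-loss (multiclass SVM) classifier.
rng = np.random.default_rng(0)
N, D, C = 512, 32, 10                  # examples, input dim, classes (made up)
X = rng.normal(size=(N, D))            # fake data
y = rng.integers(0, C, size=N)         # fake labels
W = 0.01 * rng.normal(size=(D, C))     # parameters
lr, reg, batch = 1e-2, 1e-3, 64        # made-up hyperparameters

for step in range(100):
    # 1. Sample a batch of data
    idx = rng.integers(0, N, size=batch)
    xb, yb = X[idx], y[idx]

    # 2. Forward prop it through the graph, get the loss
    s = xb @ W                                      # scores, shape (batch, C)
    correct = s[np.arange(batch), yb][:, None]
    margins = np.maximum(0.0, s - correct + 1.0)    # hinge loss terms
    margins[np.arange(batch), yb] = 0.0
    loss = margins.sum() / batch + reg * np.sum(W * W)

    # 3. Backprop to calculate the gradients
    mask = (margins > 0).astype(float)
    mask[np.arange(batch), yb] = -mask.sum(axis=1)
    dW = xb.T @ mask / batch + 2.0 * reg * W

    # 4. Update the parameters using the gradient
    W -= lr * dW
    if step % 20 == 0:
        print(step, loss)
```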

slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 4

This week

  • Introduction to Deep Learning
  • Deep Convolutional Networks
  • Brief Overview of other Deep Networks


SLIDE 5

Deep Learning


SLIDE 6

Synonyms

  • Representation Learning
  • Deep (Machine) Learning
  • Deep Neural Networks
  • Deep Unsupervised Learning
  • Simply: Deep Learning


slide by Dhruv Batra

SLIDE 7

Recap: 1 Layer Neural Network

  • 1 Neuron
  • Takes input x
  • Outputs y

  • ~Logistic Regression!
  • Gradient Descent

[Figure: a single “neuron”: input x feeds a weighted sum Σ, whose output passes through a nonlinearity to give y]

f(x | w, b) = wᵀx - b = w₁x₁ + w₂x₂ + w₃x₃ - b
y = τ(f(x))

where τ is a nonlinearity such as sigmoid, tanh, or rectified linear (ReLU).
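As a toy illustration (values made up, not from the lecture), the neuron is just a dot product, a bias, and a nonlinearity:

```python
import numpy as np

# Toy single neuron: f(x | w, b) = w.T x - b, then y = tau(f(x)).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rectified_linear(z):               # a.k.a. ReLU
    return np.maximum(0.0, z)

w = np.array([0.5, -1.0, 2.0])         # made-up weights w1, w2, w3
b = 0.1                                # made-up bias
x = np.array([1.0, 2.0, 0.5])          # made-up input

f = w @ x - b                          # w1*x1 + w2*x2 + w3*x3 - b
for tau in (sigmoid, np.tanh, rectified_linear):
    print(tau.__name__, tau(f))        # y = tau(f(x)) under each nonlinearity
```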


slide by Yisong Yue

SLIDE 8

Recap: 2 Layer Neural Network

  • 2 Layers of Neurons
  • 1st Layer takes input x
  • 2nd Layer takes output of 1st layer
  • Can approximate arbitrary functions, provided the hidden layer is large enough: a “fat” 2-layer network

[Figure: input x feeds a hidden layer of Σ neurons whose non-linear outputs feed an output neuron producing y]
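A minimal sketch of this architecture in the same spirit (sizes and random weights are illustrative; tanh stands in for the hidden nonlinearity):

```python
import numpy as np

# Toy 2-layer network: a non-linear hidden layer feeding one output neuron.
rng = np.random.default_rng(0)
D, H = 3, 100                          # input dim, hidden width (made up);
                                       # a huge H gives the "fat" 2-layer network

W1, b1 = rng.normal(size=(H, D)), np.zeros(H)   # 1st layer: takes input x
w2, b2 = rng.normal(size=H), 0.0                # 2nd layer: takes 1st layer's output

x = rng.normal(size=D)
h = np.tanh(W1 @ x + b1)               # hidden layer activations (non-linear!)
y = w2 @ h + b2                        # network output
print(y)
```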


slide by Yisong Yue

SLIDE 9

Deep Neural Networks

  • Why prefer deep over a “fat” 2-layer network?
  • Deep models can be compact
  • (an equivalent “fat” model can be exponentially large)

Image Source: http://blog.peltarion.com/2014/06/22/deep-learning-and-deep-neural-networks-in-synapse/

slide by Yisong Yue

SLIDE 10

Original Biological Inspiration

  • David Hubel & Torsten Wiesel discovered “simple cells” and “complex cells” in 1959

  • Some cells activate for simple patterns
  • E.g., lines at certain angles
  • Some cells activate for more complex patterns
  • Appear to take activations of simple cells as input

Image Sources: https://cms.www.countway.harvard.edu/wp/wp-content/uploads/2013/09/0002595_ref.jpg, https://cognitiveconsonance.files.wordpress.com/2013/05/c_fig5.jpg

slide by Yisong Yue

SLIDE 11

SLIDE 12

Early Hierarchical Feature Models for Vision

  • Hubel & Wiesel [60s]: simple & complex cells architecture
  • Fukushima’s Neocognitron [70s]

slide by Joan Bruna


figures from Yann LeCun’s CVPR plenary

SLIDE 13
Early Hierarchical Feature Models for Vision

  • Yann LeCun’s early ConvNets [80s]
  • Used for character recognition
  • Trained with backpropagation

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 14

Deep Learning pre-2012

  • Despite their very competitive performance, deep learning architectures were not widespread before 2012.
  • State-of-the-art in handwritten pattern recognition [LeCun et al. ’89, Ciresan et al. ’07, etc.]

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 15

Deep Learning pre-2012

  • Despite their very competitive performance, deep learning architectures were not widespread before 2012.
  • Face detection [Vaillant et al. ’93, ’94; Osadchy et al. ’03, ’04, ’07]

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 16

Deep Learning pre-2012

  • Despite their very competitive performance, deep learning architectures were not widespread before 2012.
  • Scene parsing [Farabet et al. ’12, ’13]

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary


SLIDE 18

ImageNet

  • Object recognition competition (2012)
  • 1.5 million labeled training examples
  • ≈1000 classes

http://www.image-net.org/

[Figure: example images with labels: leopard, mushroom, mite]

slide by Yisong Yue

SLIDE 19

Deep Learning: Golden Age in Vision

  • 2012-2014 ImageNet results
  • 2015 results: MSRA under 3.5% error (using a CNN with 150 layers!)

slide by Joan Bruna

figures from Yann LeCun’s CVPR plenary

SLIDE 20

Traditional Machine Learning

  • VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”
  • SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈd ē p\
  • NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 21

It’s an old paradigm

  • The first learning machine: the Perceptron
  • Built at Cornell in 1960
  • The Perceptron was a linear classifier on top of a simple feature extractor
  • The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching.
  • Designing a feature extractor requires considerable effort by experts.

y = sign( Σᵢ₌₁ᴺ Wᵢ Fᵢ(X) + b )

[Figure: input X → feature extractor F → weighted sum with weights Wᵢ → y]
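A toy sketch of this setup (the feature extractor, weights, and input below are made-up stand-ins, not the original machine’s):

```python
import numpy as np

# y = sign( sum_i W_i * F_i(X) + b ): a linear classifier on top of a
# hand-designed feature extractor F.
def F(X):
    # hand-crafted features: the raw inputs plus one simple product feature
    return np.array([X[0], X[1], X[0] * X[1]])

W = np.array([0.7, -0.2, 1.5])         # made-up learned weights
b = -0.3                               # made-up learned bias

X = np.array([0.5, 2.0])               # made-up input
y = np.sign(W @ F(X) + b)              # prediction: +1 or -1
print(y)
```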


slide by Marc’Aurelio Ranzato, Yann LeCun

SLIDE 22

Hierarchical Compositionality

  • VISION: pixels → edge → texton → motif → part → object
  • SPEECH: sample → spectral band → formant → motif → phone → word
  • NLP: character → word → NP/VP/.. → clause → sentence → story

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 23

Building A Complicated Function

Given a library of simple functions, compose them into a complicated function

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 24

Building A Complicated Function

Given a library of simple functions, compose them into a complicated function

Idea 1: Linear Combinations

  • Boosting
  • Kernels

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 25

Building A Complicated Function

Given a library of simple functions, compose them into a complicated function

Idea 2: Compositions

  • Deep Learning
  • Grammar models
  • Scattering transforms…

slide by Marc’Aurelio Ranzato, Yann LeCun



SLIDE 27

Deep Learning = Hierarchical Compositionality

[Figure: a hierarchy of learned features producing the prediction “car”]

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 28

Deep Learning = Hierarchical Compositionality

[Figure: Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”]

Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 29
The Mammalian Visual Cortex is Hierarchical

  • The ventral (recognition) pathway in the visual cortex

[picture from Simon Thorpe]

slide by Marc’Aurelio Ranzato, Yann LeCun



SLIDE 31

Traditional Machine Learning (more accurately)

  • VISION: image → SIFT/HOG (fixed) → K-Means/pooling (unsupervised, “learned”) → classifier (supervised) → “car”
  • SPEECH: audio → MFCC (fixed) → Mixture of Gaussians (unsupervised, “learned”) → classifier (supervised) → \ˈd ē p\
  • NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (unsupervised, “learned”) → classifier (supervised) → “+”

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 32

Deep Learning = End-to-End Learning

  • VISION: image → SIFT/HOG → K-Means/pooling → classifier → “car”
  • SPEECH: audio → MFCC → Mixture of Gaussians → classifier → \ˈd ē p\
  • NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic → n-grams → classifier → “+”

…with the whole pipeline now “learned”, end to end.

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 33
Deep Learning = End-to-End Learning

  • A hierarchy of trainable feature transforms
  • Each module transforms its input representation into a higher-level one
  • High-level features are more global and more invariant
  • Low-level features are shared among categories

[Figure: a stack of Trainable Feature-Transform / Classifier modules with Learned Internal Representations between them]
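A minimal sketch of such a stack (layer sizes and the ReLU nonlinearity are illustrative choices, not taken from the slide):

```python
import numpy as np

# A hierarchy of trainable feature transforms: each layer maps its input
# representation to a higher-level one; the last layer acts as the classifier.
rng = np.random.default_rng(0)
sizes = [784, 256, 64, 10]             # input -> low-level -> high-level -> scores
layers = [(0.01 * rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    for i, (W, b) in enumerate(layers):
        x = W @ x + b                  # trainable feature transform
        if i < len(layers) - 1:
            x = np.maximum(0.0, x)     # nonlinearity between transforms
    return x                           # class scores from the final layer

scores = forward(rng.normal(size=784))
print(scores.shape)                    # (10,)
```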

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 34
“Shallow” vs Deep Learning

  • “Shallow” models: hand-crafted feature extractor (fixed) → “simple” trainable classifier (learned)
  • Deep models: a stack of Trainable Feature-Transform / Classifier modules with Learned Internal Representations

slide by Marc’Aurelio Ranzato, Yann LeCun


SLIDE 35

Next lecture: Deep Convolutional Nets
