Lecture 13:
−Introduction to Deep Learning −Deep Convolutional Neural Networks
Aykut Erdem
November 2016 Hacettepe University
Administrative: Assignment 3 is out!
− It is due November 30, 2016
− You will implement a 2-layer Neural Network
2
3
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
[Figure: computational graph of a linear classifier - x and W feed a multiply node (*) producing scores s; the hinge loss and a regularization term R are summed (+) into the total loss L. Backprop walks the graph backward, multiplying the upstream gradient by each node's "local gradient", turning activations into gradients.]
4
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
5
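The "local gradient" bookkeeping in the figure can be checked numerically. A minimal sketch, assuming the classic toy graph f(x, y, z) = (x + y) * z rather than the slide's SVM graph:

```python
# Backprop through a tiny computational graph: f(x, y, z) = (x + y) * z.
# Each node only needs its local gradient; the chain rule multiplies
# the upstream gradient by the local one.
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y          # add node
f = q * z          # multiply node

# backward pass (df/df = 1 at the output)
df_dq = z          # local gradient of * w.r.t. q
df_dz = q          # local gradient of * w.r.t. z
df_dx = df_dq * 1  # the add node routes the gradient through unchanged
df_dy = df_dq * 1
```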
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
6
− Yann LeCun, Yoshua Bengio and Geoff Hinton
7
8
9
10
Psychological Review, Vol. 65, 1958
11
(Rumelhart, Hinton, Williams, 1986)
(Werbos, 1988)
(…, 1989)
Long Short-Term Memory (LSTM) (Schmidhuber, 1997)
12
13
Adapted from Joan Bruna
14
Neural networks learn suitable features (weights) from data, but typically require large-scale supervised learning to achieve good results.
15
Science, Vol. 313, 28 July 2006.
16
17
Image classification
Easiest classes vs. Hardest classes
[Figure: example top-5 outputs - Scale, T-shirt, Steel drum, Drumstick, Mud turtle; Scale, T-shirt, Giant panda, Drumstick, Mud turtle]
1K categories
error
2012 Teams              %Error
Supervision (Toronto)   15.3
ISI (Tokyo)             26.1
VGG (Oxford)            26.9
XRCE/INRIA              27.0
UvA (Amsterdam)         29.6
INRIA/LEAR              33.4
pooling layers)
dropout
18
CNN based, non-CNN based
19
Robotics
- Amodei et al., "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin", CoRR 2015
- M.-T. Luong et al., "Effective Approaches to Attention-based Neural Machine Translation", EMNLP 2015
- "…Driving Cars", CoRR 2016
- "…deep neural networks and tree search", Nature 529, 2016
- "…supervision: Learning to Grasp from 50K Tries and 700 Robot Hours", ICRA 2015
- "…reveals new insights into the genetic determinants of disease", Science 347, 2015
- "…Groove: Generation of Realistic Accompaniments from Single Song Recordings", IJCAI 2015
Applications: Speech Recognition, Machine Translation, Self-Driving Cars, Game Playing, Genomics, Audio Generation
[Figure: sequence-to-sequence translation - "I am a student" → "Je suis étudiant"]
And many more… 20
21
22
Slide credit: Neil Lawrence
23
Year | Breakthrough in AI | Dataset (first available) | Algorithm (first proposed)
1994 | Human-level spontaneous speech recognition | Spoken Wall Street Journal articles and other texts (1991) | Hidden Markov Model (1984)
1997 | IBM Deep Blue defeated Garry Kasparov | 700,000 Grandmaster chess games, aka "The Extended Book" (1991) | Negascout planning algorithm (1983)
2005 | Google's Arabic- and Chinese-to-English translation | 1.8 trillion tokens from Google Web and News pages (collected in 2005) | Statistical machine translation algorithm (1988)
2011 | IBM Watson became the world Jeopardy! champion | 8.6 million documents from Wikipedia, Wiktionary, and Project Gutenberg (updated in 2010) | Mixture-of-Experts (1991)
2014 | Google's GoogLeNet object classification at near-human performance | ImageNet corpus of 1.5 million labeled images and 1,000 object categories (2010) | Convolutional Neural Networks (1989)
2015 | Google's DeepMind achieved human parity in playing 29 Atari games by learning general control from video | Arcade Learning Environment dataset of over 50 Atari games (2013) | Q-learning (1992)
Average no. of years to breakthrough: 3 years (datasets), 18 years (algorithms)
Table credit: Quant Quanto
24
25
26
Networks from Overfitting”, JMLR Vol. 15, No. 1,
27
In ICML 2015
28
29
30
slide by Dhruv Batra
31
slide by Dhruv Batra
[Figure: traditional pattern recognition - fixed hand-crafted features feed a learned classifier]
SPEECH: \ˈdēp\ → MFCC (fixed) → your favorite classifier (learned)
VISION: → SIFT/HOG (fixed) → your favorite classifier (learned) → "car"
NLP: "This burrito place is yummy and fun!" → Bag-of-words (fixed) → your favorite classifier (learned) → "+"
slide by Marc’Aurelio Ranzato, Yann LeCun
32
The Perceptron: a linear classifier on top of a simple feature extractor, whose design requires considerable efforts by experts.
y = sign( Σ_{i=1..N} W_i F_i(X) + b ),  where the F_i(X) come from the feature extractor and the W_i are trainable weights.
33
slide by Marc’Aurelio Ranzato, Yann LeCun
Feature hierarchies by domain:
VISION: pixels → edge → texton → motif → part
SPEECH: sample → spectral band → formant → motif → phone → word
NLP: character → word → NP/VP/.. → clause → sentence → story
slide by Marc’Aurelio Ranzato, Yann LeCun
34
Given a library of simple functions, compose them into a complicated function.
slide by Marc’Aurelio Ranzato, Yann LeCun
35
Given a library of simple functions, compose them into a complicated function.
Idea 1: Linear Combinations: f(x) = Σ_i α_i g_i(x)
slide by Marc’Aurelio Ranzato, Yann LeCun
36
Given a library of simple functions, compose them into a complicated function.
Idea 2: Compositions: f(x) = g_1(g_2(… g_n(x) …))
slide by Marc’Aurelio Ranzato, Yann LeCun
37
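The two ideas can be contrasted in a few lines; the library of simple functions g1, g2, g3 below is hypothetical:

```python
import math

# A small "library" of simple functions (hypothetical examples).
g1 = lambda x: 2 * x
g2 = lambda x: x + 3
g3 = math.sin

# Idea 1: linear combination f(x) = a1*g1(x) + a2*g2(x) + a3*g3(x)
def linear_combo(x, a=(1.0, 0.5, 2.0)):
    return a[0] * g1(x) + a[1] * g2(x) + a[2] * g3(x)

# Idea 2: composition f(x) = g3(g2(g1(x))) -- depth instead of breadth
def composition(x):
    return g3(g2(g1(x)))
```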
“car”
slide by Marc’Aurelio Ranzato, Yann LeCun
39
Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
“car”
slide by Marc’Aurelio Ranzato, Yann LeCun
40
Sparse DBNs [Lee et al. ICML ‘09] - Figure courtesy: Quoc Le
41
slide by Dhruv Batra
42
slide by Dhruv Batra
43
[Figure: pipelines with a learned (unsupervised) mid-level stage between fixed features and the supervised classifier]
SPEECH: \ˈdēp\ → MFCC (fixed) → Mixture of Gaussians (unsupervised, "learned") → classifier (supervised)
VISION: → SIFT/HOG (fixed) → K-Means/pooling (unsupervised, "learned") → classifier (supervised) → "car"
NLP: "This burrito place is yummy and fun!" → Parse Tree Syntactic, n-grams (fixed) → (unsupervised, "learned") → classifier (supervised) → "+"
slide by Marc’Aurelio Ranzato, Yann LeCun
44
slide by Marc’Aurelio Ranzato, Yann LeCun
45
Deep learning: a cascade of trainable feature transforms, each mapping its input into a higher-level one.
Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → … (Learned Internal Representations)
slide by Marc’Aurelio Ranzato, Yann LeCun
46
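The stacked trainable transforms above can be sketched as a tiny 2-layer network (as in the assignment). A minimal sketch; the weights below are hypothetical placeholders, not learned values:

```python
# A minimal 2-layer network: each layer is a trainable feature
# transform (linear map + nonlinearity); stacking them gives "depth".
def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, b, x):
    # y_i = sum_j W[i][j] * x[j] + b[i]
    return [sum(wi[j] * x[j] for j in range(len(x))) + bi
            for wi, bi in zip(W, b)]

def two_layer_net(x, W1, b1, W2, b2):
    h = relu(linear(W1, b1, x))   # learned internal representation
    return linear(W2, b2, h)      # output scores

# toy weights (would normally be learned by gradient descent)
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
b2 = [0.1]
scores = two_layer_net([2.0, 1.0], W1, b1, W2, b2)
```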
Deep architecture: Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier (Learned Internal Representations)
Traditional architecture: hand-crafted Feature Extractor (fixed) → “Simple” Trainable Classifier (learned)
slide by Marc’Aurelio Ranzato, Yann LeCun
47
48
slide by Dhruv Batra
49 Image credit: Moontae Lee
slide by Geoff Hinton
Distributed representations: each concept is represented by many neurons, and each neuron participates in the representation of many concepts.
50
Local Distributed
slide by Geoff Hinton
Image credit: Moontae Lee
51
Scene Classification (e.g., bedroom, mountain)
slide by Bolei Zhou
52
slide by Yisong Yue
53
54
slide by Yisong Yue
55
slide by Yisong Yue
56
slide by Yisong Yue
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
57
A 32x32x3 image: width 32, height 32, depth 3
58
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image and a 5x5x3 filter. Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”
59
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image and a 5x5x3 filter. Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”. Filters always extend the full depth of the input volume.
60
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image; each position gives 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
61
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image; convolve (slide) over all spatial locations → one 28x28 activation map
62
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image; convolving each filter (slide) over all spatial locations gives a 28x28 activation map per filter
63
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
Convolution Layer: for a 32x32x3 image, six filters produce six 28x28 activation maps.
We stack these up to get a “new image” of size 28x28x6!
64
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
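The sliding dot product described above can be sketched naively in pure Python; the constant-valued image and filter are placeholders for real data:

```python
# Naive convolution (really cross-correlation, as in CNN libraries):
# slide one 5x5x3 filter over a 32x32x3 input, taking a 75-dim dot
# product (+ bias) at each spatial location -> one 28x28 activation map.
H = W = 32; D = 3; F = 5
image = [[[0.1 for _ in range(D)] for _ in range(W)] for _ in range(H)]
filt  = [[[0.2 for _ in range(D)] for _ in range(F)] for _ in range(F)]
bias  = 0.5

out_size = H - F + 1                      # (32 - 5)/1 + 1 = 28
amap = [[bias + sum(image[r+i][c+j][d] * filt[i][j][d]
                    for i in range(F) for j in range(F) for d in range(D))
         for c in range(out_size)]
        for r in range(out_size)]
# stacking 6 such filters would give a 28x28x6 output volume
```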
32x32x3 → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6
65
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → CONV, ReLU → …
66
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
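A quick parameter count for the two CONV layers above (K filters of FxFxD weights, plus one bias each); the helper name conv_params is ours, not a library function:

```python
# Parameters in a CONV layer: K filters of size FxFxD, plus one bias each.
def conv_params(K, F, D):
    return K * (F * F * D + 1)

p1 = conv_params(6, 5, 3)    # 6 filters of 5x5x3  -> 6 * 76  parameters
p2 = conv_params(10, 5, 6)   # 10 filters of 5x5x6 -> 10 * 151 parameters
```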
[From recent Yann LeCun slides]
67
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
[From recent Yann LeCun slides]
68
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
(32 total)
We call the layer convolutional because it is related to convolution of two signals:
elementwise multiplication and sum of a filter and the signal (image)
69
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
70
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image; convolve (slide) over all spatial locations → one 28x28 activation map
71
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
72
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
73
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
74
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
75
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
76
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
77
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
78
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
79
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
80
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
81
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
82
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output?
83
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output?
7x7 output!
84
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => 7x7 output!
In general it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2 (this preserves size spatially). e.g. F = 3 => zero pad with 1; F = 5 => zero pad with 2; F = 7 => zero pad with 3
85
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
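The padding rule can be checked in a few lines; conv_out is a small helper of ours implementing out = (N - F + 2P)/S + 1:

```python
# Output size with zero-padding: out = (N - F + 2P)/S + 1.
# Padding with P = (F - 1)//2 at stride 1 preserves spatial size.
def conv_out(n, f, stride=1, pad=0):
    assert (n - f + 2 * pad) % stride == 0, "filter does not fit"
    return (n - f + 2 * pad) // stride + 1

same3 = conv_out(7, 3, pad=1)   # F = 3, P = 1 -> still 7
same5 = conv_out(7, 5, pad=2)   # F = 5, P = 2 -> still 7
same7 = conv_out(7, 7, pad=3)   # F = 7, P = 3 -> still 7
```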
32x32x3 → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → CONV, ReLU → …
86
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
(No padding, no strides) Convolving a 3 × 3 kernel over a 4 × 4 input using unit strides (i.e., i = 4, k = 3, s = 1 and p = 0).
Image credit: Vincent Dumoulin and Francesco Visin
87
Image credit: Vincent Dumoulin and Francesco Visin
88
89
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
90
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
91
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
92
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
93
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
Common settings: K = (powers of 2, e.g. 32, 64, 128, 512)
94
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
56x56x64 → 1x1 CONV with 32 filters → 56x56x32 (each filter has size 1x1x64, and performs a 64-dimensional dot product)
95
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
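A sketch of why a 1x1 CONV is a per-pixel, across-depth dot product; a 4x4 spatial size stands in for the slide's 56x56, and the constant values are placeholders:

```python
# A 1x1 convolution is a per-pixel dot product across depth:
# each filter has size 1x1x64, so depth changes (64 -> 32)
# while the spatial size does not.
H = W = 4                  # small stand-in for the slide's 56x56
D_in, D_out = 64, 32
x = [[[1.0] * D_in for _ in range(W)] for _ in range(H)]
filters = [[0.5 / D_in] * D_in for _ in range(D_out)]  # 32 filters, 1x1x64

y = [[[sum(f[d] * x[i][j][d] for d in range(D_in)) for f in filters]
      for j in range(W)]
     for i in range(H)]
```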
96
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
97
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
98
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image; 1 number: the result of taking a dot product between the filter and this part of the image (i.e. 5*5*3 = 75-dimensional dot product)
99
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
32x32x3 image; 1 number: the result of taking a dot product between the filter and this part of the image (i.e. 5*5*3 = 75-dimensional dot product). It’s just a neuron with local connectivity...
100
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
An activation map is a 28x28 sheet of neuron outputs: “5x5 filter” -> “5x5 receptive field for each neuron”
101
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
E.g. with 5 filters, the CONV layer consists of neurons arranged in a 3D grid (28x28x5); there will be 5 different neurons all looking at the same region in the input volume.
102
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
103
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
104
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
Sigmoids have a nice interpretation as a saturating “firing rate” of a neuron, but 3 problems:
1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered
3. …
105
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
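The saturation problem can be seen numerically; a minimal sketch comparing sigmoid and ReLU gradients:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)       # peaks at 0.25, vanishes for large |x|

def relu_grad(x):
    return 1.0 if x > 0 else 0.0   # no saturation for positive inputs

g_sat  = sigmoid_grad(10.0)    # nearly zero: a "saturated" neuron
g_relu = relu_grad(10.0)       # still 1.0
```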
[LeCun et al., 1991]
106
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
[Krizhevsky et al., 2012]
107
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
108
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
109
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
max pool with 2x2 filters and stride 2
input (4x4):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
output (2x2):
6 8
3 4
110
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
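The pooling step can be sketched directly on a 4x4 example (matching the slide's, assuming its bottom rows read 3 2 1 0 / 1 2 3 4):

```python
# 2x2 max pooling with stride 2: keep the max of each 2x2 block,
# halving the spatial size and discarding the rest.
x = [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]]

pooled = [[max(x[2*i][2*j], x[2*i][2*j+1],
               x[2*i+1][2*j], x[2*i+1][2*j+1])
           for j in range(2)]
          for i in range(2)]
```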
111
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
112
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
113
Neural Networks
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
114
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson