CS535: Deep Learning
- 1. Introduction
Winter 2018 Fuxin Li
With materials from Pierre Baldi, Geoffrey Hinton, Andrew Ng, Honglak Lee, Aditya Khosla, Joseph Lim
Cutting Edge of Machine Learning: Deep Learning in Neural Networks
Engineering applications: understanding images, speech, and text
ImageNet: images of different sizes (avg. 482x415), color
Deep networks won ILSVRC 2014; super-human performance in 2015
Sources: Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks; Lee et al., Deeply-Supervised Nets, 2014; Szegedy et al., Going Deeper with Convolutions, ILSVRC 2014; Sanchez & Perronnin, CVPR 2011; http://www.clarifai.com/; Benenson, http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
Deep Learning
Deep Architectures for Protein Contact Map Prediction. Bioinformatics, 28, 2449-2457, 2012.
Images/video: label "Motorcycle", suggest tags, image search, …
Audio: speech recognition, music classification, speaker identification, …
Text: web search, anti-spam, machine translation, …
(Supervised) Machine learning: Find 𝒈, so that 𝒈(𝒀) ≈ 𝒁
[Diagram: image → ML → “motorcycle”]
Linear classifier: g(𝐲) = 𝐱⊤𝐲 + 𝑐, with parameter vector 𝐱 [d x 1], input 𝐲 [d x 1], scalar result [1 x 1], and scalar bias 𝑐.
We usually refer to (𝐱, 𝑐) jointly as w.
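A minimal NumPy sketch of this linear classifier (all numbers here are made up for illustration):

```python
import numpy as np

# Parameter vector x is [d x 1], input y is [d x 1], bias c is a
# scalar, and the result is a single number thresholded into a class.
x = np.array([0.5, -1.0, 2.0])   # parameters (illustrative values)
c = 0.1                          # bias
y = np.array([1.0, 2.0, 0.5])    # one input example

score = x @ y + c                # g(y) = x^T y + c
label = 1 if score >= 0 else -1  # threshold to a class in {-1, +1}
print(score, label)              # -0.4, so the example is labeled -1
```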
Learning minimizes the empirical loss
min_𝐱 Σ_{j=1}^{n} M(z_j, g(𝐲_j; 𝐱))
as a proxy for the expected loss
min_𝐱 E[M_d(z, g(𝐲; 𝐱))]
where M_d is the 0-1 loss:
M_d(z, g(y)) = { 1 if z ≠ g(y); 0 if z = g(y) }
Common surrogate losses (different ways to penalize an error):
M_j = |z_j − 𝐱⊤𝐲_j|  (absolute loss)
M_j = (z_j − 𝐱⊤𝐲_j)²  (squared loss)
M_j = log(1 + exp(−z_j g(𝐲_j))), z ∈ {−1, 1}  (logistic loss)
M_j = max(0, 1 − z_j g(𝐲_j))  (hinge loss)
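These four losses are easy to sketch in NumPy; the example values below are illustrative (z is the label, s = 𝐱⊤𝐲 the classifier score):

```python
import numpy as np

# One-example versions of the surrogate losses listed above.
def absolute_loss(z, s):
    return abs(z - s)

def squared_loss(z, s):
    return (z - s) ** 2

def logistic_loss(z, s):          # z must be in {-1, +1}
    return np.log(1.0 + np.exp(-z * s))

def hinge_loss(z, s):             # z must be in {-1, +1}
    return max(0.0, 1.0 - z * s)

# Correct side of the boundary but inside the margin:
z, s = 1, 0.5
print(absolute_loss(z, s), squared_loss(z, s),
      logistic_loss(z, s), hinge_loss(z, s))
```

Note that the hinge loss is exactly zero once an example is confidently correct (z·s ≥ 1), while the logistic loss only approaches zero.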
To represent interesting functions (data that are not linearly separable), map the input to nonlinear features, e.g. degree-2 polynomial features:
[y_1, y_2, …, y_d] ↦ [y_1², y_2², …, y_d², y_1y_2, y_1y_3, …, y_{d−1}y_d]
or use an RBF kernel:
K(𝐲, 𝐲_j) = exp(−γ||𝐲_j − 𝐲||²),  g(𝐲) = Σ_j β_j K(𝐲, 𝐲_j)
Random features: cos(𝐱⊤𝐲 + c), 𝐱 ∼ N(0, 2γI), c ∼ U[0, 2π]
Neural network unit: sigmoid(𝐱⊤𝐲 + c), with 𝐱 optimized
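The random-feature construction can be sketched in NumPy (a rough illustration of Rahimi–Recht random Fourier features; the dimensions, γ, and the sampling variance 2γ that matches the RBF kernel above are assumptions of this sketch):

```python
import numpy as np

# Random Fourier features approximating K(y, y') = exp(-gamma*||y-y'||^2).
rng = np.random.default_rng(0)
gamma, d, D = 0.5, 4, 20000                # input dim d, D random features

# Each row of W is one random direction x ~ N(0, 2*gamma*I); this
# variance makes E[cos(x^T(y - y'))] equal the RBF kernel above.
W = rng.normal(0.0, np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)    # phases c ~ U[0, 2*pi]

def features(y):
    return np.sqrt(2.0 / D) * np.cos(W @ y + b)

y1 = rng.normal(size=d)
y2 = rng.normal(size=d)
exact = np.exp(-gamma * np.sum((y1 - y2) ** 2))
approx = features(y1) @ features(y2)       # inner product of random features
print(exact, approx)                       # the two values should be close
```

A linear classifier trained on these fixed random features behaves like a kernel SVM; a neural network instead optimizes the directions 𝐱 themselves.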
SVM: still linear, just in the (nonlinear) feature space. Both kernel machines and neural networks are "universal approximators": they can approximate the true function with a low error.
You see this: But the camera sees this:
[Diagram: raw image pixels fed directly to a learning algorithm; motorbikes vs. "non"-motorbikes plotted in (pixel 1, pixel 2) space]
[Diagram: instead of raw pixels, compute a feature representation first, e.g. "Does it have handlebars? Wheels?"; in (handlebars, wheels) feature space the motorbikes and "non"-motorbikes become separable for the learning algorithm]
Hand-designed features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
Feature hierarchy: pixels → edges → object parts (combinations of edges) → object models
The high and low tides of neural networks
Perceptron (Frank Rosenblatt): [Diagram: inputs D0, D1, D2 → input layer → output layer → destinations]
Activation function: threshold. Learning: update the weights whenever the output is wrong.
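The perceptron's error-driven update can be sketched on a made-up, linearly separable toy problem:

```python
import numpy as np

# Toy data (illustrative): two points per class in 2D.
Y = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
Z = np.array([1, 1, -1, -1])      # labels in {-1, +1}

w = np.zeros(2)                   # weights
c = 0.0                           # bias
for epoch in range(10):
    for y, z in zip(Y, Z):
        pred = 1 if w @ y + c >= 0 else -1
        if pred != z:             # update only on mistakes
            w += z * y            # w <- w + z*y
            c += z                # bias gets the same correction

preds = [1 if w @ y + c >= 0 else -1 for y in Y]
print(preds)                      # matches Z once the data are separated
```

On separable data this loop is guaranteed to converge; on non-separable data (e.g. XOR) it never does, which is one reason interest in the perceptron collapsed.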
Backpropagation (Rumelhart, Hinton, Williams 1986):
Feed the input vector forward through the hidden layers
Compare outputs with correct answer to get error signal
Back-propagate error signal to get derivatives for learning
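The three steps above can be sketched for one hidden layer (all sizes, data, the 0.5 learning rate, sigmoid units, and squared error are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

y = rng.normal(size=3)            # input vector (made up)
target = np.array([1.0])          # correct answer
W1 = rng.normal(size=(4, 3))      # input -> hidden weights
W2 = rng.normal(size=(1, 4))      # hidden -> output weights

errs = []
for step in range(200):
    h = sigmoid(W1 @ y)           # forward through the hidden layer
    out = sigmoid(W2 @ h)         # network output
    err = out - target            # compare with the correct answer
    errs.append(float(np.abs(err[0])))
    # backward: propagate the error signal to get derivatives
    delta2 = err * out * (1.0 - out)
    delta1 = (W2.T @ delta2) * h * (1.0 - h)
    W2 -= 0.5 * np.outer(delta2, h)
    W1 -= 0.5 * np.outer(delta1, y)

print(errs[0], errs[-1])          # the error shrinks as training proceeds
```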
Algorithm                                              Error Rate (%)
Linear classifier (perceptron)                         12.0
K-nearest-neighbors                                     5.0
Boosting                                                1.26
SVM                                                     1.4
Neural Network                                          1.6
Convolutional Neural Networks                           0.95
With automatic distortions + ensemble + many tricks     0.23
Algorithm                                              Accuracy (%)
SVM with Pyramid Matching Kernel (2005)                58.2
Spatial Pyramid Matching (2006)                        64.6
SVM-KNN (2006)                                         66.2
Sparse Coding + Pyramid Matching (2009)                73.2
SVM Regression w/ object proposals (2010)              81.9
Group-Sensitive MKL (2009)                             84.3
Deep Learning (pretrained on ImageNet) (2014)          91.4
~80% is widely considered to be the limit on this dataset
Convolution: e.g. the Sobel filter
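A rough NumPy sketch of filtering with the horizontal Sobel kernel (the tiny image is made up; the loop computes cross-correlation, which is what most deep-learning "convolutions" actually do):

```python
import numpy as np

# Horizontal Sobel kernel: responds strongly to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Toy image: left side dark, right side bright -> one vertical edge.
image = np.zeros((5, 5))
image[:, 2:] = 1.0

def conv2d_valid(img, kernel):
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # slide the kernel over the image and sum the products
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

response = conv2d_valid(image, sobel_x)
print(response)   # strong response (4.0) near the edge, 0 in flat regions
```

A CNN learns the entries of kernels like `sobel_x` from data instead of fixing them by hand.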
Hierarchical visual processing: instead of hand-designing the filters, learn them from data.
[Diagram: CNN feature maps shrinking 224 x 224 → 224 x 224 → 112 x 112 → 56 x 56 → 28 x 28 → 14 x 14 → 7 x 7, ending in class predictions: Airplane, Dog, Car, SUV, Minivan, Sign, Pole, …]
Zeiler and Fergus 2014
Hinton, Osindero & Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18, pp. 1527-1554, 2006.
International Conference on Machine Learning, 2010.
Hinton et al. Improving neural networks by preventing co-adaptation of feature detectors. arXiv, 2012.