Convolutional Neural Networks
Kaitlin Palmer, San Diego State University
Outline
- What are Convolutional Neural Networks (CNN)
- Why use a CNN
- Typical Layout
  – Kernel Size
  – Stride Size/Padding
  – Pooling
- Keras Implementation
ISS tracking Data: https://www.nasa.gov/pdf/686319main_AP_ED_Stats_RadarData.pdf
319 × 280 × 3 = 267,960 floating-point operations for the convolution (two multiplications and one addition per kernel application, i.e. per output pixel); an equivalent fully connected layer would need billions of parameters, making convolution roughly 4 billion times more efficient at representing the same transformation.
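A quick back-of-envelope check of these numbers in Python (the 320 × 280 image size and the 2-element edge-detection kernel are assumptions taken from the standard version of this example):

```python
# Illustrative assumptions: a 320x280 grayscale image and a 2-element kernel.
in_h, in_w = 320, 280            # input image size (assumed)
out_h, out_w = 319, 280          # output size after a "valid" 1x2 convolution
kernel_size = 2                  # two weights in the kernel

# Convolution: 2 multiplications + 1 addition per output pixel
conv_ops = out_h * out_w * 3
print(conv_ops)                  # 267,960 floating-point operations

# Equivalent fully connected layer: one weight per (input pixel, output pixel) pair
dense_weights = (in_h * in_w) * (out_h * out_w)
print(dense_weights)             # ~8.0 billion weights

# The convolution represents the same linear map with only `kernel_size` weights
print(dense_weights / kernel_size)   # ~4 billion times fewer parameters
```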
Andrew Ng 2017
Yann LeCun: http://yann.lecun.com/exdb/lenet/stroke-width.html
LeCun et al. 1998, Gradient-Based Learning Applied to Document Recognition
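A LeNet-style network of this kind can be sketched in Keras, in line with the Keras implementation named in the outline (a minimal sketch; the layer sizes follow the commonly quoted LeNet-5 configuration and are not taken from these slides):

```python
# Minimal LeNet-style CNN sketch in Keras (layer sizes are illustrative assumptions).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),                      # grayscale digit image
    layers.Conv2D(6, kernel_size=5, activation="tanh"),  # convolution
    layers.AveragePooling2D(pool_size=2),                # subsampling / pooling
    layers.Conv2D(16, kernel_size=5, activation="tanh"),
    layers.AveragePooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),              # 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```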
Backpropagation through a strided convolution. Let $G_{j,n,o} = \frac{\partial}{\partial Z_{j,n,o}} J(V, K)$ be the change in the loss with respect to the feature map. The derivatives with respect to the kernel are

$$g(G, V, t)_{j,k,l,m} = \frac{\partial}{\partial K_{j,k,l,m}} J(V, K) = \sum_{n,o} G_{j,n,o}\, V_{k,\,(n-1)t+l,\,(o-1)t+m},$$

and backpropagation through the hidden layer (the derivatives with respect to the input) is

$$h(K, G, t)_{j,k,l} = \frac{\partial}{\partial V_{j,k,l}} J(V, K) = \sum_{\substack{m,n \text{ s.t.}\\ (m-1)t+n=k}} \; \sum_{\substack{o,q \text{ s.t.}\\ (o-1)t+q=l}} \; \sum_{r} K_{r,j,n,q}\, G_{r,m,o}.$$
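These sums can be written out directly. A minimal NumPy sketch, assuming a single example, "valid" boundaries, stride t, and the 0-indexed equivalents of the formulas above:

```python
import numpy as np

# V: input feature map (C_in, H, W); K: kernel (C_out, C_in, kH, kW); t: stride.
def conv_forward(V, K, t):
    C_out, C_in, kH, kW = K.shape
    _, H, W = V.shape
    oH, oW = (H - kH) // t + 1, (W - kW) // t + 1
    Z = np.zeros((C_out, oH, oW))
    for j in range(C_out):
        for k in range(oH):
            for l in range(oW):
                # Z_{j,k,l} = sum_{m,n,o} V_{m, kt+n, lt+o} K_{j,m,n,o}
                Z[j, k, l] = np.sum(V[:, k*t:k*t+kH, l*t:l*t+kW] * K[j])
    return Z

def conv_backward(V, K, G, t):
    """G = dJ/dZ; returns dJ/dK and dJ/dV by direct summation."""
    C_out, C_in, kH, kW = K.shape
    dK, dV = np.zeros_like(K), np.zeros_like(V)
    _, oH, oW = G.shape
    for j in range(C_out):
        for n in range(oH):
            for o in range(oW):
                patch = V[:, n*t:n*t+kH, o*t:o*t+kW]
                dK[j] += G[j, n, o] * patch                          # derivative w.r.t. the kernel
                dV[:, n*t:n*t+kH, o*t:o*t+kW] += G[j, n, o] * K[j]   # backprop through the hidden layer
    return dK, dV
```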
https://sthalles.github.io/deep_segmentation_network/
Locally connected layer, where each output location has its own weights in the 6-D tensor $x$:

$$Z_{j,k,l} = \sum_{m,n,o} \left[ V_{m,\,k+n-1,\,l+o-1}\; x_{j,k,l,m,n,o} \right]$$

Fully connected layer · Locally connected layer (patch size 2) · Convolutional layer
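A minimal NumPy sketch of such a locally connected ("unshared") layer, assuming "valid" boundaries and the 0-indexed form of the equation above:

```python
import numpy as np

# V: input (C_in, H, W); x: weights (C_out, oH, oW, C_in, kH, kW) -- a 6-D tensor
# with a separate patch of weights for every output location.
def locally_connected(V, x):
    C_out, oH, oW, C_in, kH, kW = x.shape
    Z = np.zeros((C_out, oH, oW))
    for j in range(C_out):
        for k in range(oH):
            for l in range(oW):
                # Z_{j,k,l} = sum_{m,n,o} V_{m,k+n,l+o} x_{j,k,l,m,n,o}
                Z[j, k, l] = np.sum(V[:, k:k+kH, l:l+kW] * x[j, k, l])
    return Z

# Example: a 3-channel 8x8 input, 2x2 patches, 4 output channels.
rng = np.random.default_rng(0)
V = rng.standard_normal((3, 8, 8))
x = rng.standard_normal((4, 7, 7, 3, 2, 2))
print(locally_connected(V, x).shape)   # (4, 7, 7)
```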
Tiled convolution is a compromise between locally connected layers and a convolutional layer: the kernel changes as we move through space, cycling through a set of kernels, but memory size is increased only by a factor of the size of that kernel set.
Comparison: locally connected layer (patch size 2) · tiled convolution (t = 2) · traditional convolution (equivalent to tiled convolution with t = 1)
$$Z_{j,k,l} = \sum_{m,n,o} V_{m,\,k+n-1,\,l+o-1}\; K_{j,m,n,o,\,k\%t+1,\,l\%t+1}$$

where $\%$ denotes the modulo operation used to cycle through the set of $t \times t$ kernels.
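A corresponding NumPy sketch of tiled convolution (0-indexed, "valid" boundaries, unit stride assumed):

```python
import numpy as np

# V: input (C_in, H, W); K: kernel set (C_out, C_in, kH, kW, t, t).
# The kernel used at an output location cycles with k % t and l % t.
def tiled_conv(V, K):
    C_out, C_in, kH, kW, t, _ = K.shape
    _, H, W = V.shape
    oH, oW = H - kH + 1, W - kW + 1
    Z = np.zeros((C_out, oH, oW))
    for j in range(C_out):
        for k in range(oH):
            for l in range(oW):
                Z[j, k, l] = np.sum(V[:, k:k+kH, l:l+kW] * K[j, :, :, :, k % t, l % t])
    return Z

# With t = 1 this reduces to ordinary (traditional) convolution.
```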
Single channel
(Position, rotation, scale example image: www.riotgames.com)
Multi-channel
Single channel vs. multi-channel
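To illustrate the difference in Keras: a Conv2D kernel spans every input channel, so going from a single-channel to a multi-channel input changes the number of weights, not the output shape (a minimal sketch; the layer sizes are arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

single = keras.Sequential([keras.Input(shape=(28, 28, 1)),   # single-channel input
                           layers.Conv2D(8, kernel_size=3)])
multi = keras.Sequential([keras.Input(shape=(28, 28, 3)),    # multi-channel (e.g. RGB) input
                          layers.Conv2D(8, kernel_size=3)])

# single: 8 * (3*3*1) + 8 = 80 weights; multi: 8 * (3*3*3) + 8 = 224 weights
print(single.count_params(), multi.count_params())
print(single.output_shape, multi.output_shape)    # both (None, 26, 26, 8)
```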
utdallas.edu
https://www.youtube.com/watch?v=IOHayh06LJ4
Tootell et al., Proceedings of the National Academy of Sciences, Feb 1998, 95(3): 811-817; DOI: 10.1073/pnas.95.3.811
– 2D structure of V1 and the retina: light in the lower half of the visual field produces activation in the lower half of V1; neural networks likewise operate on 2-dimensional feature maps
– 'Simple cells' – approximately a linear function of the image in a small, spatially localized receptive field (cf. detector units)
– 'Complex cells' – invariant to small shifts in position (cf. pooling layers over spatial locations; maxout units as well)
– 'Grandmother cells' – the concept of cells that respond to your grandmother regardless of location and scale
– A documented example, the 'Halle Berry neuron' (Quiroga et al. 2005), fires for a photograph, a drawing, or the written name
– Fovea – a small high-resolution region surrounded by low-resolution vision
– Quick eye movements ('saccades') glimpse the relevant parts of a scene (cf. the Hermann grid optical illusion)
– NNs, by contrast, receive high resolution everywhere
– Objects, relationships between objects, 3D geometric information
and record output
Model the cell's response to an image $I$ as a weighted sum over its receptive field:

$$s(I) = \sum_{y \in \mathcal{Y}} \sum_{z \in \mathcal{Z}} w(y, z)\, I(y, z)$$
Gating term: 𝛽 – magnitude of the response; 𝛾 – how quickly the receptive field falls off. The sinusoid term controls how the cell responds to light along the y′ axis. Location terms: 𝜐 – radians from the horizontal; translation and rotation of y and z.
Location parameters: $y_0$, $z_0$, $\upsilon$. Gaussian scale parameters: $\gamma_y$, $\gamma_z$. Sinusoid parameters: $g$, $\varrho$.
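A NumPy sketch of a Gabor-style receptive field built from these parameters, together with the weighted-sum response above (the exact translation/rotation convention for the primed coordinates is an assumption):

```python
import numpy as np

# Gabor-style receptive-field weight with the parameters named above
# (the translation/rotation convention for y', z' is an assumption).
def gabor_weight(y, z, beta, gamma_y, gamma_z, g, phase, y0, z0, upsilon):
    # Translate and rotate the coordinates by upsilon radians from the horizontal
    yp = (y - y0) * np.cos(upsilon) + (z - z0) * np.sin(upsilon)
    zp = -(y - y0) * np.sin(upsilon) + (z - z0) * np.cos(upsilon)
    # Gaussian gating term (beta: magnitude, gammas: how quickly the field falls off)
    # multiplied by a sinusoid along the y' axis (g: frequency, phase: offset)
    return beta * np.exp(-gamma_y * yp**2 - gamma_z * zp**2) * np.cos(g * yp + phase)

# Response as a weighted sum over the image: s(I) = sum_y sum_z w(y, z) I(y, z)
yy, zz = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
w = gabor_weight(yy, zz, beta=1.0, gamma_y=0.02, gamma_z=0.02,
                 g=0.5, phase=0.0, y0=16, z0=16, upsilon=np.pi / 4)
I = np.random.default_rng(0).random((32, 32))    # a stand-in image
s = np.sum(w * I)
```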