Neural Network Basics Part II
Content
- Image-to-image
- Why fully convolutional?
- Fully Convolutional Networks (FCN)
- Up-sampling
- Network architecture
- Recurrent Neural Networks
- Sequence data and representation
- RNNs model: Forward & backward
- Different types of RNNs
- LSTM unit
- Deep Learning Frameworks
- Deep learning frameworks & popularity
- Data representation
- Typical training steps
- Model convertors
- Standard model format
Image-to-Image
Why fully convolutional?
- Detection: image → deep CNNs → class + bounding box (input → output)
  - One-stage: YOLO, SSD, …
  - Two-stage: Faster R-CNN, …
Why fully convolutional?
- Graphics & …: image → deep CNNs → image / volume / 3D mesh / … (input → output)
  - Up to ~'15: AlexNet, VGG, … with fully connected layers ✖ (their fixed-size vector output does not fit such targets)
Fixed input size in NNs with FC layers
- Fully connected layers in VGG-16:
[Figure: image (224, 224, 3) → conv layers & pooling layers → feature map (7, 7, 512) → flatten → vector (1, 7 × 7 × 512) → fully connected layer with weight matrix (7 × 7 × 512, 4096) → output.]
- The fully connected layer computes $f(X) = W^\top X + b$: the shape of $W$ fixes the length of the flattened input vector, and hence the input image size (see the sketch below).
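The constraint is easy to see in a few lines of numpy (a minimal sketch; the shapes follow the VGG-16 example above, and the weights are random placeholders):

```python
import numpy as np

# FC weights sized for a 7x7x512 feature map, as in VGG-16 (random placeholder).
W_fc = np.random.randn(7 * 7 * 512, 4096)

feat = np.random.randn(7, 7, 512)        # feature map from a 224x224 input
out = feat.reshape(1, -1) @ W_fc         # OK: (1, 25088) @ (25088, 4096)

feat_big = np.random.randn(8, 8, 512)    # feature map from a larger input
# feat_big.reshape(1, -1) @ W_fc         # would fail: (1, 32768) vs. (25088, 4096)
```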
Fully Connected vs. Fully Convolutional
| | Fully connected | Fully convolutional |
| --- | --- | --- |
| Input size | ✖ Fixed | ✔ Any |
| Computation | ✖ Intensive | ✔ Less intensive |
| Spatial information | ✖ Lost | ✔ Preserved |
Computation in AlexNet:
- Weights: Conv layers ~10% : FC layers ~90%
- Computation: Conv layers ~90% : FC layers ~10%
Spatial information:
- Conv layers: Volume → Volume
- FC layers: Volume → Vector
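This is also why FC layers can be "convolutionalized", as in FCN: the same weights, viewed as a kernel spanning the whole feature map, give identical outputs and can then slide over larger inputs. A minimal numpy sketch (shapes from the VGG-16 example; weights random):

```python
import numpy as np

feat = np.random.randn(7, 7, 512)
W = np.random.randn(7 * 7 * 512, 4096)

fc_out = feat.reshape(-1) @ W        # FC layer: (25088,) @ (25088, 4096)

# The same weights reshaped into a 7x7x512 -> 4096 kernel, applied at one position:
kernel = W.reshape(7, 7, 512, 4096)
conv_out = np.tensordot(feat, kernel, axes=([0, 1, 2], [0, 1, 2]))

assert np.allclose(fc_out, conv_out)
# On a larger feature map, this kernel would slide and produce a spatial map
# of scores instead of a single vector.
```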
Fully Convolutional Networks
- J. Long, et al., Fully Convolutional Networks for Semantic Segmentation, 2014
Questions: 1. How to up-sample? 2. How to recover the original size?
How to do up-sampling?
- Interpolation
  - Nearest-neighbor interpolation
  - Linear interpolation
  - Bi-linear interpolation
  - Bi-cubic interpolation
- Drawbacks of the interpolation variants (see the sketch below)
  - Manual feature engineering
  - Nothing for the network to learn
[Figure: nearest-neighbor interpolation picks pixel $(i, j)$, $(i, j{+}1)$, $(i{+}1, j)$, or $(i{+}1, j{+}1)$ depending on whether the fractional offsets $v$ and $w$ are below or above 0.5; linear interpolation blends the two neighbors $(x_a, y_a)$ and $(x_b, y_b)$ around the target point $(x, y)$.]
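For comparison, here is what fixed up-sampling looks like in code: a minimal nearest-neighbor up-sampler in numpy. The rule is hand-designed, so there are no weights to learn:

```python
import numpy as np

def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    # Repeat each source pixel factor x factor times.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(upsample_nearest(x, 2))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```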
How to do up-sampling?
- Padding with zeros/Un-pooling
Matthew D. Zeiler, et al., Visualizing and Understanding Convolutional Networks, 2013
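A hedged sketch of the idea (2×2 max-unpooling with recorded "switches"; shapes and names are illustrative, not Zeiler et al.'s code):

```python
import numpy as np

def unpool_2x2(pooled: np.ndarray, switches: np.ndarray) -> np.ndarray:
    # Put each pooled value back where its max came from; zeros elsewhere.
    out = np.zeros((pooled.shape[0] * 2, pooled.shape[1] * 2))
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            di, dj = switches[i, j]      # position of the max within its 2x2 window
            out[2 * i + di, 2 * j + dj] = pooled[i, j]
    return out

pooled = np.array([[9.0]])
switches = np.array([[(1, 0)]])          # the max sat at row 1, col 0 of its window
print(unpool_2x2(pooled, switches))      # [[0. 0.]
                                         #  [9. 0.]]
```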
How to do up-sampling?
- Transpose convolution
Transpose Convolution
- Convolutions:
  - Input: an $n \times n$ feature map, e.g. $4 \times 4$
  - Kernel: $f \times f$ with padding $p$ and stride $s$, e.g. a $3 \times 3$ kernel, padding 0, stride 1
  - Output: a $2 \times 2$ feature map
  - Output size: $\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1$ (see the sketch below)
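As a quick check of the formula (a one-liner, using integer division for the floor):

```python
def conv_out_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

assert conv_out_size(4, 3, p=0, s=1) == 2      # the 4x4 -> 2x2 example above
assert conv_out_size(224, 3, p=1, s=1) == 224  # 'same' 3x3 convolution
```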
Transpose Convolution
- Going backward through a convolution
- Written as matrix products: convolution $Y = CX$, transpose convolution $X = C^\top Y$
Transpose Convolution
- Convolution matrix for $Y = CX$
  - Kernel $(3, 3)$:
    1 4 1
    1 4 3
    3 3 1
  - Padding 0, stride 1
  - Convolution matrix $C$ $(4, 16)$: each of the 4 rows holds the 9 kernel weights at one sliding-window position over the flattened input, with zeros elsewhere, e.g. row 1 = (1 4 1 0 1 4 3 0 3 3 1 0 0 0 0 0)
Transpose Convolution
- Flatten the input matrix
  - $(4, 4)$ → $(16, 1)$
  - Input matrix $(4, 4)$:
    4 5 8 7
    1 8 8 8
    3 6 6 4
    6 5 7 8
  - Flattened input matrix $X$ $(16, 1)$
Transpose Convolution
- Perform 'convolution' and resize
  - $CX = Y$: the $(4, 16)$ matrix times the $(16, 1)$ vector gives the output $(4, 1)$ = (122, 148, 126, 134)ᵀ
  - Resize to the output $Y$ $(2, 2)$:
    122 148
    126 134
Transpose Convolution
- Perform the transpose convolution (see the sketch below)
  - Convolution: $Y = CX$; transpose convolution: $X = C^\top Y$
  - Input $Y$ $(2, 2)$:
    2 1
    4 4
    → flatten → $(4, 1)$
  - Transposed convolution matrix $C^\top$ $(16, 4)$
  - Output $(16, 1)$ → resize → output $X$ $(4, 4)$
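The whole worked example fits in a short numpy sketch (the kernel, input, and matrix shapes are exactly those above):

```python
import numpy as np

kernel = np.array([[1, 4, 1],
                   [1, 4, 3],
                   [3, 3, 1]])

# Convolution matrix C (4, 16): one row per sliding-window position.
C = np.zeros((4, 16))
for r, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    patch = np.zeros((4, 4))
    patch[i:i + 3, j:j + 3] = kernel
    C[r] = patch.reshape(-1)

x = np.array([[4, 5, 8, 7],
              [1, 8, 8, 8],
              [3, 6, 6, 4],
              [6, 5, 7, 8]])
y = (C @ x.reshape(16, 1)).reshape(2, 2)
print(y)        # [[122. 148.]
                #  [126. 134.]]  -- the convolution, Y = CX

z = np.array([[2, 1],
              [4, 4]])
x_up = (C.T @ z.reshape(4, 1)).reshape(4, 4)   # transpose convolution, X = C^T Y
print(x_up.shape)                              # (4, 4): 2x2 up-sampled to 4x4
```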
Transpose Convolution in Caffe
- Forward of an ordinary convolution: im2col
Transpose Convolution in Caffe
- Forward of the transpose convolution: col2im
[Figure: the transposed weight matrix $W^\top$ times the column buffer is reshaped by col2im into a $C_{out} \times H' \times W'$ feature map.]
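Caffe's buffers are more general (multi-channel, strides, padding), but the two core reshapes can be sketched for a single channel, stride 1, no padding (illustrative, not Caffe's actual code):

```python
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    # One column per kernel position: (k*k, num_positions).
    h, w = x.shape
    cols = [x[i:i + k, j:j + k].reshape(-1)
            for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.stack(cols, axis=1)

def col2im(cols: np.ndarray, k: int, h: int, w: int) -> np.ndarray:
    # The reverse scatter: overlapping patch entries are summed back into place.
    x = np.zeros((h, w))
    idx = 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            x[i:i + k, j:j + k] += cols[:, idx].reshape(k, k)
            idx += 1
    return x

cols = im2col(np.arange(16.0).reshape(4, 4), 3)   # (9, 4)
print(col2im(cols, 3, 4, 4).shape)                # (4, 4)
```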
Transpose Convolution in Caffe
- Backward
  - Loss function: $L$
  - Forward of the layer (in Caffe's buffers): $Y_l = \mathrm{col2im}(K_l^\top \cdot X_l)$, with kernel matrix $K_l$
  - $\frac{\partial L}{\partial Y_{l-1}} = \frac{\partial L}{\partial Y_l} \cdot \frac{\partial Y_l}{\partial Y_{l-1}} = \frac{\partial L}{\partial Y_l} \cdot \frac{\partial Y_l}{\partial X_l} = C_l^\top \cdot \frac{\partial L}{\partial Y_l}$ (the input $X_l$ of layer $l$ is the output $Y_{l-1}$ of layer $l-1$)
  - This has exactly the form of a transpose convolution: $X = C^\top Y$
Transpose Convolution in Caffe
- 'deconv_layer.cpp'
  - implements $X = C^\top Y$
Original size
[Figure: feature-map resolutions through the network: $\frac{H}{2} \times \frac{W}{2}$ → $\frac{H}{4} \times \frac{W}{4}$ → $\frac{H}{8} \times \frac{W}{8}$ → $\frac{H}{16} \times \frac{W}{16}$, then up-sampled back through $\frac{H}{8} \times \frac{W}{8}$ → $\frac{H}{4} \times \frac{W}{4}$ → $\frac{H}{2} \times \frac{W}{2}$ to the original size.]
Network architectures for Image-to-image
- Encoder-decoder
Edgar Simo-Serra, et al., Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup, 2016
Network architectures for Image-to-image
- Encoder-decoder + skip connections
Olaf Ronneberger, et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015
Summary for Image-to-Image
- Why fully convolutional?
- Analysis of fully connected layers
- Fully convolutional vs. fully connected
- Fully Convolutional Networks (FCN)
- Up-sampling
- Interpolation
- Un-pooling
- Transpose convolution
- Theory
- Implementation in Caffe
- Network architecture
- Encoder-decoder
- Encoder-decoder + skip connections
Recurrent Neural Networks
Examples of Sequence data
- Speech recognition: "The quick brown fox jumped over the lazy dog."
- Sentiment classification: "There is nothing to like in this movie."
- Machine translation: "The quick brown fox jumped over the lazy dog." → "快速的棕色狐狸跳过懒狗。"
- Named entity recognition: "Yesterday, John met Merry."
- …
Sequence data: ✓ elements from a list ✓ elements arranged in order
One-hot representation
input: Harry Potter and Hermione Granger invented a new spell.
Vocabulary: 10,000 words, indexed e.g. a (1), aaron (2), …, and (367), …, harry (4,075), …, potter (6,830), …, zulu (10,000). Each word becomes a 10,000-dimensional vector that is all zeros except for a single 1 at the word's index (see the sketch below).
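A minimal sketch with a toy vocabulary (the 10,000-word case works the same way; this index mapping is made up):

```python
import numpy as np

vocab = {"a": 0, "aaron": 1, "and": 2, "harry": 3, "potter": 4}

def one_hot(word: str, vocab_size: int = len(vocab)) -> np.ndarray:
    v = np.zeros(vocab_size)
    v[vocab[word]] = 1.0
    return v

print(one_hot("harry"))   # [0. 0. 0. 1. 0.]
```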
Why not standard networks?
Problems:
- Inputs, outputs can be different lengths in different examples.
- Doesn't share features learned across different positions of text.
Recurrent Neural Networks
- Forward propagation
[Figure: an unrolled RNN. At each time step, the cell takes the input $x^{\langle t \rangle}$ and the previous hidden state $a^{\langle t-1 \rangle}$ (starting from $a^{\langle 0 \rangle}$) and emits the prediction $\hat{y}^{\langle t \rangle}$ and the next state $a^{\langle t \rangle}$, up to $x^{\langle T_x \rangle}$ and $\hat{y}^{\langle T_y \rangle}$. Example: in "Teddy Roosevelt was a great President." vs. "Teddy bears are on sale!", the word "Teddy" can only be labeled using context from other time steps.]
Recurrent Neural Networks
- RNN cell
[Figure: a single RNN cell maps $a^{\langle t-1 \rangle}$ and $x^{\langle t \rangle}$ to $a^{\langle t \rangle}$ and $\hat{y}^{\langle t \rangle}$.]
$a^{\langle t \rangle} = g_1(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$, with $g_1$ typically tanh or ReLU
$\hat{y}^{\langle t \rangle} = g_2(W_{ya} a^{\langle t \rangle} + b_y)$, with $g_2$ typically sigmoid
A minimal sketch of one cell step follows.
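```python
import numpy as np

# One RNN cell step, directly following the two equations above.
# Dimensions are illustrative; g1 = tanh, g2 = sigmoid.
def rnn_cell_forward(a_prev, x_t, W_aa, W_ax, W_ya, b_a, b_y):
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)      # a<t>
    y_t = 1.0 / (1.0 + np.exp(-(W_ya @ a_t + b_y)))      # y_hat<t>
    return a_t, y_t

n_a, n_x = 8, 5                      # hidden units, input features (illustrative)
rng = np.random.default_rng(0)
params = (rng.normal(size=(n_a, n_a)),   # W_aa
          rng.normal(size=(n_a, n_x)),   # W_ax
          rng.normal(size=(1, n_a)),     # W_ya
          np.zeros(n_a),                 # b_a
          np.zeros(1))                   # b_y
a, y_hat = rnn_cell_forward(np.zeros(n_a), rng.normal(size=n_x), *params)
```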
Recurrent Neural Networks
- Backward propagation through time
[Figure: the same unrolled RNN as before; the loss gradients flow backward through every time step, hence "backpropagation through time".]
Loss function: $\mathcal{L}(\hat{y}, y) = \sum_{t=1}^{T_y} \mathcal{L}^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle})$
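Reusing rnn_cell_forward from the sketch above, the forward pass over a whole sequence and the summed loss look like this (binary cross-entropy assumed for $\mathcal{L}^{\langle t \rangle}$; an illustration, not the slide's code):

```python
import numpy as np

def rnn_forward_loss(xs, ys, a0, params):
    # xs: list of x<t>, ys: list of binary labels y<t>, a0: initial state a<0>.
    a, loss = a0, 0.0
    for x_t, y_t in zip(xs, ys):
        a, y_hat = rnn_cell_forward(a, x_t, *params)   # same weights at every t
        loss += float(-(y_t * np.log(y_hat) + (1 - y_t) * np.log(1 - y_hat)))
    return loss   # L = sum over t of L<t>(y_hat<t>, y<t>)
```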
Different types of RNNs
[Figure: RNN architectures for $T_x = T_y$ (inputs and outputs of equal length) and for $T_x \neq T_y$.]
Vanishing gradient with RNNs
The cat, which already ate the food, was full. The cats, which already ate the food, were full.
[Figure: an unrolled RNN over the sentence; the gradient from the late time step ("was"/"were") must travel back through many steps to reach "cat"/"cats" and vanishes along the way.]
Solutions to vanishing gradient
- GRU – Gated Recurrent Unit
- TCN – Temporal Convolutional Networks
- LSTM – Long Short-Term Memory unit
Summary for RNNs
- What is sequence data?
- One-hot representation for words in a vocabulary
- Why not standard networks?
- RNNs
- Forward
- RNN cell
- Backward
- Different types of RNNs
- Solutions to vanishing gradient
Deep Learning Frameworks
Deep Learning Frameworks
- Popular frameworks
- Less frequently used frameworks
[Figure: framework logos with their language bindings: (Python, C, Java, Go), (Python, with backends supporting other languages), (Python), (C++, Python, Matlab), (Python), (Python, C++), (Python, R, Julia, Scala, Go, Javascript and more), (Matlab), (Python, C++, C#), (Python).]
Popularity
- Google Trends for the search term '[name] github'
- Baidu Index for the search term '[name] + github'
[Figure: trend curves comparing TensorFlow, PyTorch, Caffe, and Keras.]
How do frameworks represent data?
- Tensors/Blobs
- Input/internal data: $N \times C \times H \times W$ (batch size $N$, channel $C$, height $H$, width $W$)
- Convolution kernel: $OC \times IC \times KH \times KW$ (output channel $OC$, input channel $IC$, kernel height $KH$, kernel width $KW$)
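For example, under this NCHW convention (a shape-only sketch; the sizes are illustrative):

```python
import numpy as np

x = np.zeros((8, 3, 224, 224))   # N x C x H x W: 8 RGB images, 224x224
w = np.zeros((64, 3, 3, 3))      # OC x IC x KH x KW: 64 filters over 3 channels
print(x.shape, w.shape)
```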
Typical Training Steps
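The slide's diagram is not reproduced here; as a rough stand-in, the usual steps are: prepare the data, define the model, pick a loss and an update rule, then loop over forward pass, loss, backward pass, and parameter update. A self-contained numpy sketch with a tiny linear model (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                 # 1. prepare data
true_w = np.array([1.0, -2.0, 3.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(4)                               # 2. define/initialize the model
lr = 0.1                                      # 3. choose loss (MSE) and step size
for step in range(100):                       # 4. training loop
    y_hat = X @ w                             #    forward pass
    loss = np.mean((y_hat - y) ** 2)          #    compute the loss
    grad = 2 * X.T @ (y_hat - y) / len(y)     #    backward pass (gradient)
    w -= lr * grad                            #    parameter update
print(loss, w)                                # 5. evaluate
```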
Converting Between Frameworks
- Develop in one framework, deploy in another
- https://github.com/ysh329/deep-learning-model-convertor
Standard Format for Models
- Intermediate representation
- Native support for most popular frameworks
- Convert models from one format to another
- Model visualization (MMdnn only)
Main References
- Mitra, et al., SIGGRAPH Asia 2018 Course notes
- Naoki Shibuya, Up-sampling with Transposed Convolution, towardsdatascience.com, 2017
- J. Long, et al., Fully Convolutional Networks for Semantic Segmentation, arXiv:1411.4038v2, 2015
- Andrew Ng, Recurrent Neural Networks, Coursera
- More references in the notes of each slide.
Thank You!☺
DOWNLOADS at http://vcc.szu.edu.cn