SLIDE 1

Neural Network Basics Part II

冯远滔

SLIDE 2

Content

  • Image-to-image
      • Why fully convolutional?
      • Fully Convolutional Networks (FCN)
      • Up-sampling
      • Network architecture
  • Recurrent Neural Networks
      • Sequence data and representation
      • RNN model: forward & backward
      • Different types of RNNs
      • LSTM unit
  • Deep Learning Frameworks
      • Deep learning frameworks & popularity
      • Data representation
      • Typical training steps
      • Model converters
      • Standard model format

SLIDE 3

Image-to-Image

SLIDE 4

Why fully convolutional?

  • Detection:

[Figure: image (input) → deep CNNs → class + bounding box (output)]

One-stage: YOLO, SSD, …  Two-stage: Faster R-CNN, …

SLIDE 5

Why fully convolutional?

  • Graphics & …:

[Figure: image (input) → deep CNNs → ? → output: image, volume, 3D mesh, …]

~2015: AlexNet, VGG, … with fully connected layers

SLIDE 6

Fixed input size in NNs with FC layers

  • Fully connected layers in VGG-16:

[Figure: image (224, 224, 3) → conv layers & pooling layers → feature map (7, 7, 512) → flatten → vector (1, 7 × 7 × 512) → fully connected layer with weights (7 × 7 × 512, 4096) → … → output. Each fully connected layer computes Y = W^T X + b, so the weight shapes fix the input size.]
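A minimal sketch of this size constraint (PyTorch is an assumed stand-in; the slide names no framework): the flattened feature map must match the first dimension of the FC weight matrix.

```python
import torch
import torch.nn as nn

# Tail of VGG-16: the first FC layer expects exactly 7*7*512 = 25088 inputs,
# so only 224x224 images produce a feature map it can consume.
fc = nn.Linear(7 * 7 * 512, 4096)

feat = torch.randn(1, 512, 7, 7)        # feature map from a 224x224 image
out = fc(feat.flatten(start_dim=1))     # OK: (1, 25088) -> (1, 4096)

feat_bad = torch.randn(1, 512, 14, 14)  # feature map from a 448x448 image
# fc(feat_bad.flatten(start_dim=1))     # shape error: 100352 != 25088
```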
SLIDE 7

Fully Connected vs. Fully Convolutional

                        Fully connected   Fully convolutional
  Input size            ✖ Fixed           ✔ Any
  Computation           ✖ Intensive       ✔ Less intensive
  Spatial information   ✖ Lost            ✔ Preserved

Computation in AlexNet:

  • Weights: conv layers ~10% vs. FC layers ~90%
  • Computation: conv layers ~90% vs. FC layers ~10%

Spatial information:

  • Conv layers: Volume -> Volume
  • FC layers: Volume -> Vector
SLIDE 8

Fully Convolutional Networks


  • J. Long, et al, Fully Convolutional Networks for Semantic Segmentation, 2014

Questions: 1. How is up-sampling done? 2. How is the original size recovered?

SLIDE 9

How to do up-sampling?

  • Interpolation
      • Nearest-neighbor interpolation
      • Linear interpolation
      • Bi-linear interpolation
      • Bi-cubic interpolation
  • Drawbacks of the interpolation variants (see the sketch after the figure below)
      • Manual feature engineering
      • Nothing for the network to learn

[Figure: nearest-neighbor rounding cases (u ≷ 0.5, v ≷ 0.5 select pixel (i, j), (i, j+1), (i+1, j), or (i+1, j+1)) and linear interpolation of (x, y) between (x_a, y_a) and (x_b, y_b)]
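A short sketch of fixed-formula up-sampling, here via PyTorch's F.interpolate as one concrete realization:

```python
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Fixed-formula up-sampling: the interpolation weights are hand-designed,
# so there is nothing for the network to learn here.
nearest = F.interpolate(x, scale_factor=2, mode="nearest")    # (1, 1, 8, 8)
bilinear = F.interpolate(x, scale_factor=2, mode="bilinear",
                         align_corners=False)                 # (1, 1, 8, 8)
```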

SLIDE 10

How to do up-sampling?

  • Padding with zeros/Un-pooling


Matthew D. Zeiler, et al, Visualizing and Understanding Convolutional Networks, 2013
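A small sketch of un-pooling with remembered max locations ("switches"), using PyTorch's MaxUnpool2d as one concrete realization:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # remember the "switches"
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 1, 4, 4)
y, switches = pool(x)      # (1, 1, 2, 2) plus the argmax locations
up = unpool(y, switches)   # (1, 1, 4, 4): maxima restored, zeros elsewhere
```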

SLIDE 11

How to do up-sampling?

  • Transpose convolution


SLIDE 12

Transpose Convolution

  • Convolution:
      • Input (n × n): a 4 × 4 feature map
      • Kernel (f × f), padding p, stride s: a 3 × 3 kernel, 0 padding, 1 stride
      • Output: a 2 × 2 feature map
      • Output size: ⌊(n + 2p − f) / s⌋ + 1
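As a quick check, the output-size formula in code, together with its transpose-convolution counterpart s·(n − 1) + f − 2p (the standard form for dilation 1 and no output padding; not stated on the slide):

```python
import math

def conv_out(n, f, p, s):
    """Convolution output size: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

def tconv_out(n, f, p, s):
    """Transpose convolution (dilation 1, no output padding): s*(n-1) + f - 2p."""
    return s * (n - 1) + f - 2 * p

print(conv_out(4, 3, 0, 1))   # 2: the 4x4 -> 2x2 example above
print(tconv_out(2, 3, 0, 1))  # 4: the transpose maps 2x2 back to 4x4
```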

SLIDE 13

Transpose Convolution

  • Going backward through a convolution

Convolution as a matrix product: Y = CX;  transpose convolution: X = C^T Y

SLIDE 14

Transpose Convolution

  • Convolution matrix C for Y = CX
      • Kernel 3 × 3
      • Padding 0
      • Stride 1

[Figure: the 3 × 3 kernel (rows 1 4 1, 1 4 3, 3 3 1 on the slide) is unrolled into the (4, 16) convolution matrix C: each of the 4 rows holds the 9 kernel weights at one sliding position, with zeros elsewhere]

SLIDE 15

Transpose Convolution

  • Flatten the input matrix
      • (4, 4) -> (16, 1)

[Figure: the (4, 4) input matrix (rows 4 5 8 7, 1 8 8 8, 3 6 6 4, 6 5 7 8) is flattened into the (16, 1) vector X]

SLIDE 16

Transpose Convolution

  • Perform the 'convolution' and resize
      • CX = Y
      • Resize Y: (4, 1) -> (2, 2)

[Figure: the (4, 1) output [112, 148, 126, 134]^T is resized into the (2, 2) output Y]

SLIDE 17

Transpose Convolution

  • Perform the transpose convolution
      • Convolution: CX = Y
      • Transpose convolution: X = C^T Y

[Figure: the (4, 1) input Y = [2, 1, 4, 4]^T is multiplied by the (16, 4) transposed convolution matrix C^T; the (16, 1) result is resized into the (4, 4) output X]
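A NumPy sketch of the C / C^T picture from the last four slides (kernel and input values here are illustrative, not the slide's):

```python
import numpy as np

def conv_matrix(kernel, in_size):
    """Unroll an (f, f) kernel into the (out*out, in*in) convolution matrix C
    for stride 1, padding 0, so conv(x) == (C @ x.ravel()).reshape(out, out)."""
    f, n = kernel.shape[0], in_size
    out = n - f + 1
    C = np.zeros((out * out, n * n))
    for i in range(out):
        for j in range(out):
            for ki in range(f):
                for kj in range(f):
                    C[i * out + j, (i + ki) * n + (j + kj)] = kernel[ki, kj]
    return C

k = np.arange(1, 10).reshape(3, 3)      # illustrative 3x3 kernel
C = conv_matrix(k, 4)                   # shape (4, 16)
x = np.random.rand(4, 4)
y = (C @ x.ravel()).reshape(2, 2)       # convolution as a matrix multiply
x_up = (C.T @ y.ravel()).reshape(4, 4)  # transpose convolution: 2x2 -> 4x4
```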

SLIDE 18

Transpose Convolution in Caffe

  • Forward: im2col


SLIDE 19

Transpose Convolution in Caffe

  • Forward: col2im

[Figure: the GEMM result is reassembled by col2im into the C_out × H′ × W′ output feature map]
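A minimal single-channel im2col sketch (the real Caffe routine also handles channels, stride, and padding), showing how convolution becomes one matrix multiply:

```python
import numpy as np

def im2col(x, f):
    """Lay out every (f, f) patch of x as a column (stride 1, padding 0)."""
    n = x.shape[0]
    out = n - f + 1
    cols = np.empty((f * f, out * out))
    for i in range(out):
        for j in range(out):
            cols[:, i * out + j] = x[i:i + f, j:j + f].ravel()
    return cols

x = np.random.rand(4, 4)
k = np.random.rand(3, 3)
y = (k.ravel() @ im2col(x, 3)).reshape(2, 2)  # convolution as a single GEMM
```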

SLIDE 20

Transpose Convolution in Caffe

  • Backward
    • Loss function: L
    • Forward: Y_l = col2im(K_l^T · im2col(X_l))
    • Backward: ∂L/∂Y_{l−1} = ∂L/∂Y_l · ∂Y_l/∂Y_{l−1} = C_l^T · ∂L/∂Y_l

With the convolution of layer l written as Y_l = C_l X_l, the backward pass is itself a transpose convolution: X = C^T Y.

SLIDE 21

Transpose Convolution in Caffe

  • 'deconv_layer.cpp'
  • implements X = C^T Y

SLIDE 22

Original size

[Figure: feature-map resolutions through the network: H/2 × W/2 → H/4 × W/4 → H/8 × W/8 → H/16 × W/16 → H/8 × W/8 → H/4 × W/4 → H/2 × W/2, up-sampling back toward the original size]

SLIDE 23

Network architectures for Image-to-image

  • Encoder-decoder


Edgar Simo-Serra, et al, Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup, 2016

SLIDE 24

Network architectures for Image-to-image

  • Encoder-decoder + skip connections


Olaf Ronneberger, et al, U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015
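To make the skip connection concrete, a minimal PyTorch sketch (an illustration of the pattern, not the paper's architecture):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level encoder-decoder with a U-Net-style skip connection."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Conv2d(3, ch, 3, padding=1)
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Conv2d(ch, ch, 3, padding=1)
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)  # learned up-sampling
        self.dec = nn.Conv2d(2 * ch, 1, 3, padding=1)      # after concatenation

    def forward(self, x):
        e = torch.relu(self.enc(x))                # H x W
        m = torch.relu(self.mid(self.down(e)))     # H/2 x W/2
        u = self.up(m)                             # back to H x W
        return self.dec(torch.cat([u, e], dim=1))  # skip: concat encoder features

out = TinyUNet()(torch.randn(1, 3, 64, 64))        # (1, 1, 64, 64)
```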

SLIDE 25

Summary for Image-to-Image

  • Why fully convolutional?
      • Analysis of fully connected layers
      • Fully convolutional vs. fully connected
  • Fully Convolutional Networks (FCN)
  • Up-sampling
      • Interpolation
      • Un-pooling
      • Transpose convolution: theory and implementation in Caffe
  • Network architecture
      • Encoder-decoder
      • Encoder-decoder + skip connections


SLIDE 26

Recurrent Neural Networks

SLIDE 27

Examples of Sequence data

  • Speech recognition
  • Sentiment classification
  • Machine translation
  • Named entity recognition

27 "The quick brown fox jumped over the lazy dog." "There is nothing to like in this movie." "The quick brown fox jumped over the lazy dog." "快速的棕色狐狸跳过 懒狗。" "Yesterday, John met Merry." "Yesterday, John met Merry."

Sequence data: ✓ Elements from a list ✓ Arrange elements in order

SLIDE 28

One-hot representation


input: Harry Potter and Hermione Granger invented a new spell.

Vocabulary (10,000 words): a (1), aaron (2), …, and (367), …, harry (4,075), …, potter (6,830), …, zulu (10,000). Each word is represented by a 10,000-dimensional vector that is all zeros except for a 1 at the word's vocabulary index.
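A minimal sketch of the encoding (indices as on the slide):

```python
import numpy as np

# Toy slice of the slide's 10,000-word vocabulary (1-based indices).
vocab = {"a": 1, "aaron": 2, "and": 367, "harry": 4075, "potter": 6830, "zulu": 10000}

def one_hot(word, vocab, size=10000):
    """10,000-dim vector with a single 1 at the word's vocabulary index."""
    v = np.zeros(size)
    v[vocab[word] - 1] = 1
    return v

x = [one_hot(w, vocab) for w in "harry potter and".split()]  # one vector per word
```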

SLIDE 29

Why not standard networks?


Problems:

  • Inputs, outputs can be different lengths in different examples.
  • Doesn't share features learned across different positions of text.
SLIDE 30

Recurrent Neural Networks

  • Forward propagation

[Figure: RNN unrolled in time: starting from a<0>, each RNN cell takes the input x<t> and the previous state a<t−1>, emits a prediction ŷ<t>, and passes a<t> on, through x<T_x> and ŷ<T_y>]

Example of context dependence: "Teddy Roosevelt was a great President." vs. "Teddy bears are on sale!"

SLIDE 31

Recurrent Neural Networks

  • RNN cell (see the sketch below)

[Figure: one RNN cell maps (a<t−1>, x<t>) to (a<t>, ŷ<t>)]

a<t> = g1(W_aa · a<t−1> + W_ax · x<t> + b_a),   g1: tanh/ReLU
ŷ<t> = g2(W_ya · a<t> + b_y),                   g2: sigmoid
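The two equations in NumPy (sizes are assumptions for the sketch):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def rnn_cell(a_prev, x_t, Waa, Wax, Wya, ba, by):
    """One step: a<t> = tanh(Waa a<t-1> + Wax x<t> + ba);
    yhat<t> = sigmoid(Wya a<t> + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    y_t = sigmoid(Wya @ a_t + by)
    return a_t, y_t

n_a, n_x, n_y = 8, 4, 1                 # assumed sizes for the sketch
Waa, Wax = np.random.randn(n_a, n_a), np.random.randn(n_a, n_x)
Wya = np.random.randn(n_y, n_a)
ba, by = np.zeros(n_a), np.zeros(n_y)
a, y = rnn_cell(np.zeros(n_a), np.random.randn(n_x), Waa, Wax, Wya, ba, by)
```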

SLIDE 32

Recurrent Neural Networks

  • Backward propagation through time

[Figure: the same unrolled RNN; gradients from the loss flow backward through every time step]

Loss function: L(ŷ, y) = Σ_{t=1..T_y} L<t>(ŷ<t>, y<t>)

SLIDE 33

Different types of RNNs

[Figure: RNN architectures with T_x = T_y and with T_x ≠ T_y]

SLIDE 34

Vanishing gradient with RNNs

"The cat, which already ate the food, was full." vs. "The cats, which already ate the food, were full."

[Figure: unrolled RNN: the early word ("cat"/"cats") must determine a much later output ("was"/"were"), but gradients vanish across the many intervening time steps]

SLIDE 35

Solution to vanishing gradient

  • GRU – Gated Recurrent Unit
  • TCN – Temporal Convolutional Network
  • LSTM – Long Short-Term Memory unit (see the sketch below)

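The deck lists the LSTM unit as a topic; a hedged sketch of the standard formulation (shapes and weight packing are assumptions) showing the gates and the additive cell-state update that lets gradients survive long ranges:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_cell(a_prev, c_prev, x_t, W, b):
    """Standard LSTM step (a sketch): forget/update/output gates control
    the cell state c<t>, whose additive update eases gradient flow."""
    z = W @ np.concatenate([a_prev, x_t]) + b     # all four gates at once
    n = len(a_prev)
    f, u, o = (sigmoid(z[i * n:(i + 1) * n]) for i in range(3))
    c_tilde = np.tanh(z[3 * n:4 * n])
    c_t = f * c_prev + u * c_tilde                # additive memory update
    a_t = o * np.tanh(c_t)
    return a_t, c_t

n_a, n_x = 8, 4                                   # assumed sizes
W = np.random.randn(4 * n_a, n_a + n_x) * 0.1
b = np.zeros(4 * n_a)
a, c = lstm_cell(np.zeros(n_a), np.zeros(n_a), np.random.randn(n_x), W, b)
```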

SLIDE 36

Summary for RNNs

  • What is sequence data?
  • One-hot representation for words in a vocabulary
  • Why not standard networks?
  • RNNs
      • Forward propagation and the RNN cell
      • Backward propagation through time
  • Different types of RNNs
  • Solutions to the vanishing gradient


SLIDE 37

Deep Learning Frameworks

SLIDE 38

Deep Learning Frameworks

  • Popular frameworks
  • Less frequently used frameworks

[Figure: framework logos with their supported languages: (Python, C, Java, Go), (Python, with backend support for other languages), (Python), (C++, Python, Matlab), (Python), (Python, C++), (Python, R, Julia, Scala, Go, Javascript and more), (Matlab), (Python, C++, C#), (Python)]

SLIDE 39

Popularity


  • Google trends for search items '[name] github'
  • Baidu index for search item '[name] + github'

[Charts: Google Trends and Baidu Index popularity curves for TensorFlow, PyTorch, Caffe, and Keras]

SLIDE 40

How do frameworks represent data?

  • Tensors/Blobs

  • Input/internal data: N × C × H × W (batch size N, channels C, height H, width W)
  • Convolution kernels: OC × IC × KH × KW (output channels, input channels, kernel height, kernel width)
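The same layout in PyTorch (Caffe blobs use the same N × C × H × W order):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 224, 224)   # N x C x H x W: batch, channels, height, width
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
print(conv.weight.shape)          # OC x IC x KH x KW -> (64, 3, 3, 3)
print(conv(x).shape)              # (8, 64, 224, 224)
```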

SLIDE 41

Typical Training Steps

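A minimal sketch of the typical loop (fetch a batch, forward pass, loss, backward pass, weight update), in PyTorch with stand-in model and data:

```python
import torch

# Hypothetical model/data stand-ins; the loop structure is the point.
model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(3):
    for x, y in [(torch.randn(4, 10), torch.randint(0, 2, (4,)))]:
        optimizer.zero_grad()        # 1. clear old gradients
        loss = loss_fn(model(x), y)  # 2. forward pass + loss
        loss.backward()              # 3. back-propagate
        optimizer.step()             # 4. update weights
```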

SLIDE 42

Converting Between Frameworks

  • Develop in one framework, deploy in another
  • https://github.com/ysh329/deep-learning-model-convertor


SLIDE 43

Standard Format for Models


  • Intermediate representation
  • Native support for most popular frameworks
  • Convert models from one format to another
  • Model visualization (MMdnn only)
SLIDE 44

Main Reference

  • Mitra, et al, SIGGRAPH Asia 2018 Course Notes
  • Naoki Shibuya, Up-sampling with Transposed Convolution, towardsdatascience.com, 2017
  • J. Long, et al, Fully Convolutional Networks for Semantic Segmentation, arXiv:1411.4038v2, 2015
  • Andrew Ng, Recurrent Neural Networks, Coursera
  • More references in the notes of each slide.


SLIDE 45

Thank You!☺

DOWNLOADS at http://vcc.szu.edu.cn