  1. Neural Network Basics Part II 冯远滔

  2. Content
     • Image-to-image: why fully convolutional? Fully Convolutional Networks (FCN); up-sampling; network architecture
     • Recurrent Neural Networks: sequence data and representation; the RNN model, forward & backward; different types of RNNs; the LSTM unit
     • Deep Learning Frameworks: frameworks & popularity; data representation; typical training steps; model converters; standard model format

  3. Image-to-Image

  4. Why fully convolutional? • Detection: input image -> deep CNNs -> class + bounding box. One-stage detectors: YOLO, SSD, …; two-stage detectors: Faster R-CNN, …

  5. Why fully convolutional? • Graphics & beyond: input image -> deep CNNs -> output image, volume, or 3D mesh. The ~2015 networks (AlexNet, VGG, …) end in fully connected layers ✖ and cannot produce such image-like outputs.

  6. Fixed input size in NNs with FC layers • The first fully connected layer in VGG-16: image (224, 224, 3) -> conv & pooling layers -> feature map (7, 7, 512) -> flatten -> row vector X of shape (1, 7 × 7 × 512) -> fully connected layer f(X) = XW + b, with W of shape (7 × 7 × 512, 4096) -> output vector.
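
To make the size constraint concrete, here is a minimal NumPy sketch; the shapes are assumptions mirroring the VGG-16 numbers above, not the actual VGG code. The FC weight matrix is sized for exactly one flattened feature-map length, so any other input resolution breaks the matrix product.

```python
import numpy as np

# First FC layer of a VGG-16-like network: expects a flattened 7 x 7 x 512 feature map.
W = np.random.randn(7 * 7 * 512, 4096)      # fixed weight shape (25088, 4096)
b = np.zeros(4096)

feat_224 = np.random.randn(7, 7, 512)       # feature map produced by a 224 x 224 x 3 image
y = feat_224.reshape(1, -1) @ W + b         # OK: (1, 25088) @ (25088, 4096) -> (1, 4096)

feat_448 = np.random.randn(14, 14, 512)     # feature map produced by a 448 x 448 x 3 image
# feat_448.reshape(1, -1) @ W               # fails: (1, 100352) does not match (25088, 4096)
```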

  7. Fully Connected vs. Fully Convolutional
                            Fully connected     Fully convolutional
     Input size             ✖ Fixed             ✔ Any
     Computation            ✖ Intensive         ✔ Less intensive
     Spatial information    ✖ Lost              ✔ Preserved
     In AlexNet: conv layers hold roughly 10% of the weights but account for roughly 90% of the computation; FC layers hold roughly 90% of the weights but only about 10% of the computation.
     Spatial information: conv layers map volume -> volume; FC layers map volume -> vector.
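
As an illustration of "convolutionalizing" an FC layer, the trick fully convolutional networks rely on, here is a hedged PyTorch sketch; the layer shapes are assumptions matching the VGG-16 example above, not the paper's code. A 7 × 7 convolution with 4096 output channels computes the same function as the FC layer on a 7 × 7 × 512 feature map, but keeps working, and preserves spatial layout, on larger inputs.

```python
import torch
import torch.nn as nn

fc = nn.Linear(7 * 7 * 512, 4096)                       # original fully connected layer

conv = nn.Conv2d(512, 4096, kernel_size=7)              # equivalent fully convolutional layer
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))  # reuse the same parameters
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 7, 7)                           # feature map from a 224 x 224 input
assert torch.allclose(conv(x).flatten(), fc(x.flatten(1)).flatten(), atol=1e-3)

x_big = torch.randn(1, 512, 14, 14)                     # larger input, no retraining needed
print(conv(x_big).shape)                                # torch.Size([1, 4096, 8, 8]): a spatial map of scores
```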

  8. Fully Convolutional Networks • Questions: 1. How to up-sample? 2. How to recover the original size? (J. Long, et al., Fully Convolutional Networks for Semantic Segmentation, 2014)

  9. How to do up-sampling? • Interpolation: nearest-neighbor, linear, bi-linear, bi-cubic. [Figure: a point between the grid positions (i, j), (i+1, j), (i, j+1), (i+1, j+1) is assigned a value from its neighbours according to the fractional offsets v and w.] • Drawbacks of the interpolation variants: manual feature engineering, nothing for the network to learn.
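
A minimal NumPy sketch of the simplest variant, nearest-neighbor up-sampling by an integer factor; note there are no learnable parameters here, which is exactly the drawback listed above.

```python
import numpy as np

def nearest_upsample(x, factor=2):
    """Nearest-neighbor up-sampling: every pixel is simply repeated factor x factor times."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

x = np.array([[1., 2.],
              [3., 4.]])
print(nearest_upsample(x))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```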

  10. How to do up-sampling? • Padding with zeros / un-pooling (Matthew D. Zeiler, et al., Visualizing and Understanding Convolutional Networks, 2013)

  11. How to do up-sampling? • Transpose convolution

  12. Transpose Convolution • A convolution: input n × n (here a 4 × 4 feature map), kernel f × f (here 3 × 3) with padding p = 0 and stride s = 1, output a 2 × 2 feature map. • Output size: floor((n + 2p − f) / s) + 1.
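
The output-size formula as a one-line Python helper (a small sketch; integer division implements the floor):

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

assert conv_output_size(n=4, f=3, p=0, s=1) == 2   # the 4x4 input, 3x3 kernel example above
```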

  13. Transpose Convolution • Going backward through a convolution: if the convolution computes Y = CX, the transpose convolution computes X = C^T Y.

  14. Transpose Convolution • The convolution matrix for Y = CX: kernel (3, 3), padding 0, stride 1. With the 4 × 4 input flattened into positions 0…15, each of the 4 output positions contributes one row of the convolution matrix C of shape (4, 16); the row holds the nine kernel values at that position and zeros everywhere else.

  15. Transpose Convolution • Flatten the input matrix: the 4 × 4 input (positions 0…15) becomes the flattened column vector X of shape (16, 1).

  16. Transpose Convolution • Perform the 'convolution' as a matrix product and resize: CX = Y gives a (4, 1) vector, which is resized to the 2 × 2 output feature map Y.

  17. Transpose convolution • Perform the transpose convolution: from CX = Y, take X = C^T Y. The transposed convolution matrix C^T has shape (16, 4); multiplying it with the (4, 1) input Y gives a (16, 1) vector, which is resized to the 4 × 4 output X.
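
The whole matrix view can be sketched in a few lines of NumPy. This is a hedged illustration, not the slide's exact numbers: the kernel and input values are random placeholders, and the helper only handles stride 1 with no padding.

```python
import numpy as np

def conv_matrix(kernel, in_size, out_size):
    """Build the (out_size^2, in_size^2) matrix C so that C @ X.flatten() equals
    the valid convolution of the in_size x in_size input X with the kernel."""
    f = kernel.shape[0]
    C = np.zeros((out_size * out_size, in_size * in_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = np.zeros((in_size, in_size))
            patch[i:i + f, j:j + f] = kernel          # kernel placed at output position (i, j)
            C[i * out_size + j] = patch.flatten()
    return C

kernel = np.random.rand(3, 3)
X = np.random.rand(4, 4)
C = conv_matrix(kernel, in_size=4, out_size=2)        # shape (4, 16)

Y = (C @ X.flatten()).reshape(2, 2)                   # convolution:           Y = C X
X_up = (C.T @ Y.flatten()).reshape(4, 4)              # transpose convolution: X' = C^T Y, 2x2 -> 4x4
```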

  18. Transpose Convolution in Caffe • Forward: im2col

  19. Transpose Convolution in Caffe • Forward: col2im, which assembles the C_out × H′ × W′ output feature map from the column buffer.

  20. Transpose Convolution in Caffe • Backward: with loss function L and layer l computing Y_l from its input X_l = Y_{l−1} via im2col/col2im and the convolution matrix C_l, the chain rule gives ∂L/∂Y_{l−1} = (∂L/∂Y_l) · (∂Y_l/∂Y_{l−1}), and the gradient with respect to the layer input is ∂L/∂X_l = C_l^T · (∂L/∂Y_l), which is exactly a transpose convolution X = C^T Y.
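
A quick PyTorch check of that identity (a hedged sketch, not Caffe code): the gradient of a convolution with respect to its input is the transpose convolution of the output gradient with the same kernel, which is why a deconvolution layer can reuse the convolution layer's backward computation as its forward pass.

```python
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(1, 1, kernel_size=3, bias=False)
x = torch.randn(1, 1, 4, 4, requires_grad=True)

y = conv(x)                                  # forward convolution: Y = C X  (output is 2x2)
grad_y = torch.randn_like(y)
y.backward(grad_y)                           # autograd computes dL/dX = C^T dL/dY

x_grad_via_transpose = F.conv_transpose2d(grad_y, conv.weight)
assert torch.allclose(x.grad, x_grad_via_transpose, atol=1e-5)
```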

  21. Transpose Convolution in Caffe • 'deconv_layer.cpp' implements X = C^T Y.

  22. Original size • [Figure: the feature maps shrink to H/2 × W/2, H/4 × W/4, H/8 × W/8, and H/16 × W/16 through the network; up-sampling must bring them back to the original H × W.]

  23. Network architectures for Image-to-image • Encoder-decoder (Edgar Simo-Serra, et al., Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup, 2016)

  24. Network architectures for Image-to-image • Encoder-decoder + skip connections (Olaf Ronneberger, et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015)

  25. Summary for Image-to-Image • Why fully convolutional? (analysis of fully connected layers; fully convolutional vs. fully connected) • Fully Convolutional Networks (FCN) • Up-sampling (interpolation; un-pooling; transpose convolution: theory and implementation in Caffe) • Network architecture (encoder-decoder; encoder-decoder + skip connections)

  26. Recurrent Neural Networks

  27. Examples of Sequence data • Speech recognition: "The quick brown fox jumped over the lazy dog." • Sentiment classification: "There is nothing to like in this movie." • Machine translation: "The quick brown fox jumped over the lazy dog." -> "快速的棕色狐狸跳过懒狗。" • Named entity recognition: "Yesterday, John met Merry." -> "Yesterday, John met Merry." (names identified) • … Sequence data: ✓ elements from a list ✓ elements arranged in order

  28. One-hot representation • Input: "Harry Potter and Hermione Granger invented a new spell." • Vocabulary of 10,000 words, indexed e.g. a = 1, aaron = 2, …, and = 367, …, harry = 4,075, …, potter = 6,830, …, zulu = 10,000. • Each word of the input is represented by a 10,000-dimensional one-hot vector: all zeros except a single 1 at that word's vocabulary index (e.g. "harry" has its 1 at position 4,075).
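
A small Python sketch of building one-hot vectors; the word-to-index mapping below is a made-up fragment (0-based indices), not the real 10,000-word vocabulary.

```python
import numpy as np

vocab_size = 10_000
word_to_index = {"a": 0, "aaron": 1, "and": 366, "harry": 4074, "potter": 6829, "zulu": 9999}

def one_hot(word):
    """Return a vocab_size-dimensional vector with a single 1 at the word's index."""
    v = np.zeros(vocab_size)
    v[word_to_index[word]] = 1.0
    return v

x_harry = one_hot("harry")   # all zeros except one position
```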

  29. Why not standard networks? Problems: • Inputs and outputs can have different lengths in different examples. • A standard network does not share features learned across different positions of the text.

  30. Recurrent Neural Networks • Forward propagation: a chain of identical RNN cells, one per time step. Starting from a<0>, the cell at time step t takes the input x<t> and the previous hidden state a<t−1> and produces the hidden state a<t> and the prediction ŷ<t>, up to x<T_x> and ŷ<T_y>. Example: "Teddy Roosevelt was a great President." vs. "Teddy bears are on sale!"

  31. Recurrent Neural Networks • The RNN cell: a<t> = g1(W_aa a<t−1> + W_ax x<t> + b_a), with g1 typically tanh or ReLU; ŷ<t> = g2(W_ya a<t> + b_y), with g2 e.g. sigmoid.
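
Those two equations as a NumPy sketch; the weight shapes, sizes, and random initialization are placeholders, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_a, n_x, n_y = 64, 10_000, 1              # hidden-state, input, and output sizes (placeholders)
W_aa = np.random.randn(n_a, n_a) * 0.01
W_ax = np.random.randn(n_a, n_x) * 0.01
W_ya = np.random.randn(n_y, n_a) * 0.01
b_a, b_y = np.zeros(n_a), np.zeros(n_y)

def rnn_cell(a_prev, x_t):
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)   # a<t> = g1(W_aa a<t-1> + W_ax x<t> + b_a)
    y_t = sigmoid(W_ya @ a_t + b_y)                   # y<t> = g2(W_ya a<t> + b_y)
    return a_t, y_t

# Unrolling: the same weights are shared at every time step.
a = np.zeros(n_a)
for x_t in np.random.randn(8, n_x):                   # T_x = 8 inputs (random stand-ins for one-hot vectors)
    a, y_hat = rnn_cell(a, x_t)
```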

  32. Recurrent Neural Networks • Backward propagation through time: gradients flow back through the chain of RNN cells across all time steps. Loss function: L(ŷ, y) = Σ_{t=1}^{T_y} L<t>(ŷ<t>, y<t>).

  33. Different types of RNNs • Architectures with T_x = T_y • Architectures with T_x ≠ T_y

  34. Vanishing gradient with RNNs • "The cat, which already ate the food, was full." / "The cats, which already ate the food, were full." The correct verb form depends on a word many time steps earlier; in a basic RNN the gradients shrink as they are propagated back through many time steps, so such long-range dependencies are hard to learn.
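
A toy numeric illustration of the effect (made-up numbers, ignoring the nonlinearity, whose derivative would only shrink the product further): back-propagating through many time steps multiplies many copies of the recurrent Jacobian, so when its norm is below 1 the gradient reaching early time steps decays exponentially.

```python
import numpy as np

np.random.seed(0)
W_aa = np.random.randn(16, 16) * 0.05      # recurrent weights with small norm
grad = np.ones(16)                         # gradient arriving at the last time step

for t in range(50):                        # push it back through 50 time steps
    grad = W_aa.T @ grad

print(np.linalg.norm(grad))                # vanishingly small: the early steps barely learn
```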

  35. Solution to vanishing gradient • GRU – Gated Recurrent Unit • TCN – Temporal Convolutional Network • LSTM – Long Short-Term Memory unit

  36. Summary for RNNs • What is sequence data? • One-hot representation for words in a vocabulary • Why not standard networks? • RNNs: forward propagation; the RNN cell; backward propagation through time • Different types of RNNs • Solutions to vanishing gradients

  37. Deep Learning Frameworks

  38. Deep Learning Frameworks • Popular frameworks [logos omitted; supported languages: Python | C++, Python, Matlab | Python, C, Java, Go | Python, with backends supporting other languages] • Less frequently used frameworks [logos omitted; supported languages: Python, C++ | Python | Python | Python, C++, C# | Python, R, Julia, Scala, Go, Javascript and more | Matlab]
