

SLIDE 1

CENG5030 Part 2-1: Introduction to Convolutional Neural Network

Bei Yu

(Latest update: March 4, 2019)

Spring 2019

SLIDE 2

Overview

CNN Architecture Overview
CNN Energy Efficiency
CNN on Embedded Platform

SLIDE 4

CNN Architecture Overview

◮ Convolution Layer
◮ Rectified Linear Unit (ReLU)
◮ Pooling Layer
◮ Fully Connected Layer

[Figure: CNN pipeline: CONV → ReLU (max(0, x)) → POOL → CONV → ReLU → POOL → ... → FC → Hotspot / Non-hotspot]
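As a rough sketch (not from the slides), one CONV → ReLU → POOL stage of such a pipeline can be written in NumPy; the input shape and kernel below are made up for illustration:

```python
import numpy as np

def conv_valid(x, k):
    # Single-channel 2-D sliding-window convolution, "valid" region only (no padding).
    m = k.shape[0]
    h, w = x.shape
    return np.array([[np.sum(x[i:i + m, j:j + m] * k)
                      for j in range(w - m + 1)]
                     for i in range(h - m + 1)])

def relu(x):
    return np.maximum(x, 0.0)          # max(0, x), applied elementwise

def maxpool2x2(x):
    h, w = x.shape                     # assumes h and w are even
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.default_rng(0).random((10, 10))   # toy single-channel input
k = np.ones((3, 3)) / 9.0                       # toy averaging kernel
y = maxpool2x2(relu(conv_valid(x, k)))          # CONV -> ReLU -> POOL
```

The FC stage at the end would flatten `y` and apply an affine map, as covered on a later slide.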

SLIDE 5

Convolution Layer

Convolution operation:

$$(I \otimes K)(x, y) = \sum_{i=1}^{c} \sum_{j=1}^{m} \sum_{k=1}^{m} I(i,\ x - j,\ y - k)\, K(j, k)$$

where $c$ is the number of input channels and $m \times m$ is the kernel size.
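A direct (naive) transcription of this triple sum, shifted to 0-based indices and restricted to output positions where every index stays in range; the array shapes are my own choice for illustration:

```python
import numpy as np

def conv2d(I, K):
    # I: input of shape (c, H, W); K: kernel of shape (m, m), shared across
    # all c channels exactly as in the formula above (0-indexed here).
    c, H, W = I.shape
    m = K.shape[0]
    out = np.zeros((H - m + 1, W - m + 1))
    for x in range(m - 1, H):            # positions where x - j stays valid
        for y in range(m - 1, W):
            s = 0.0
            for i in range(c):
                for j in range(m):
                    for k in range(m):
                        s += I[i, x - j, y - k] * K[j, k]
            out[x - m + 1, y - m + 1] = s
    return out
```

Note that real CNN layers typically use a separate kernel slice per input channel, K(i, j, k); the slide's formula shares one m × m kernel across all channels, and the sketch follows the slide.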

[Figure: CNN pipeline: CONV → ReLU (max(0, x)) → POOL → CONV → ReLU → POOL → ... → FC → Hotspot / Non-hotspot, with the convolution layers highlighted]

SLIDE 6

Convolution Layer (cont.)

Effect of different convolution kernel sizes:

[Figure: feature maps with (a) 7 × 7, (b) 5 × 5, (c) 3 × 3 kernels]

Kernel Size   Padding   Test Accuracy
7 × 7         3         87.50%
5 × 5         2         93.75%
3 × 3         1         96.25%
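Each padding in the table pairs with its kernel so that the output keeps the input's spatial size at stride 1, i.e. pad = (k − 1)/2. A quick check using the standard output-size formula:

```python
def conv_output_size(n, k, pad, stride=1):
    # Spatial output size of a convolution: floor((n + 2*pad - k) / stride) + 1.
    return (n + 2 * pad - k) // stride + 1

# Each (kernel, padding) pair from the table preserves the input size
# (64 is an arbitrary example input width):
for k, pad in [(7, 3), (5, 2), (3, 1)]:
    assert conv_output_size(64, k, pad) == 64
```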

SLIDE 7

Rectified Linear Unit

[Figure: CNN pipeline: CONV → ReLU (max(0, x)) → POOL → CONV → ReLU → POOL → ... → FC → Hotspot / Non-hotspot, with the ReLU stages highlighted]

◮ Alleviates overfitting with sparse feature maps
◮ Avoids the gradient vanishing problem

Activation Function   Expression                      Validation Loss
ReLU                  max{x, 0}                       0.16
Sigmoid               1 / (1 + exp(−x))               87.0
TanH                  (exp(2x) − 1) / (exp(2x) + 1)   0.32
BNLL                  log(1 + exp(x))                 87.0
WOAF                  NULL                            87.0
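The four expressions in the table, written out directly in NumPy (the helper names are mine):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)                        # max{x, 0}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                  # 1 / (1 + exp(-x))

def tanh(x):
    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)

def bnll(x):
    return np.log(1.0 + np.exp(x))                   # log(1 + exp(x))
```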

SLIDE 8

Pooling Layer

[Figure: CNN pipeline: CONV → ReLU (max(0, x)) → POOL → CONV → ReLU → POOL → ... → FC → Hotspot / Non-hotspot, with the pooling stages highlighted]

◮ Extracts local statistical attributes of regions in the feature map

(a) Max pooling:

 1  2  3  4
 5  6  7  8      MAXPOOL      6   8
 9 10 11 12      ------->    14  16
13 14 15 16

(b) Average pooling:

 1  2  3  4
 5  6  7  8      AVEPOOL     3.5   5.5
 9 10 11 12      ------->   11.5  13.5
13 14 15 16
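The 4 × 4 example above can be reproduced with a small NumPy helper (non-overlapping 2 × 2 windows; the naming is mine):

```python
import numpy as np

def pool2x2(x, mode="max"):
    # Split x (H x W, with H and W even) into 2x2 blocks, then reduce each block.
    H, W = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(1, 17).reshape(4, 4)      # the matrix from the slide
# pool2x2(x, "max") -> [[ 6,  8], [14, 16]]
# pool2x2(x, "ave") -> [[ 3.5,  5.5], [11.5, 13.5]]
```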

SLIDE 9

Pooling Layer (cont.)

◮ Translation invariant
◮ Dimension reduction

Effect of pooling methods:

Pooling Method   Kernel   Test Accuracy
Max              2 × 2    96.25%
Ave              2 × 2    96.25%
Stochastic       2 × 2    90.00%

SLIDE 10

Fully Connected Layer

◮ The fully connected layer transforms high-dimensional feature maps into a flattened vector.
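A minimal sketch of this flatten-then-affine step (the dimensions below are illustrative, not those of the network in the figure):

```python
import numpy as np

def fully_connected(feature_maps, W, b):
    # Flatten (C, H, W) feature maps into one vector, then apply y = W v + b.
    v = feature_maps.reshape(-1)
    return W @ v + b

rng = np.random.default_rng(0)
fmaps = rng.random((8, 4, 4))            # toy 8-channel 4x4 feature maps
W = rng.random((10, 8 * 4 * 4))          # 10 output units
b = np.zeros(10)
y = fully_connected(fmaps, W, b)         # y.shape == (10,)
```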

[Figure: CNN pipeline: CONV → ReLU (max(0, x)) → POOL → CONV → ReLU → POOL → ... → FC → Hotspot / Non-hotspot, with the FC stage highlighted]

SLIDE 11

Fully Connected Layer (cont.)

◮ A percentage of nodes are dropped out (i.e., set to zero)
◮ Avoids overfitting

[Figure: convolutional hidden layers followed by fully connected layers (16x16x32 → 2048 → 512)]

Effect of dropout ratio:

[Plot: test accuracy (%) vs. dropout ratio, accuracy axis from 90.00 to 100.00]
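One common formulation is inverted dropout (my own sketch; the slides do not specify a variant): each node is zeroed with probability p during training, and the survivors are rescaled so the expected activation is unchanged at test time:

```python
import numpy as np

def dropout(x, p, rng, train=True):
    # Training: zero each element with probability p, scale the rest by 1/(1-p).
    # Inference: identity (no nodes are dropped).
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
y = dropout(np.ones(10_000), p=0.5, rng=rng)   # values are either 0.0 or 2.0
```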

SLIDE 13

Overview

CNN Architecture Overview
CNN Energy Efficiency
CNN on Embedded Platform

SLIDE 14

Computer Vision

◮ Humans use their eyes and their brains to visually sense the world.
◮ Computers use their cameras and computation to visually sense the world.

Jian Sun, “Introduction to Computer Vision and Deep Learning”.

SLIDE 15

A Few More Core Problems

[Figure: core problems (Classification, Detection, Segmentation) across input granularities (Image, Region, Pixel, Sequence/Video)]

SLIDE 16

A Bit of History

Jian Sun, “Introduction to Computer Vision and Deep Learning”.

SLIDE 17

Winter of Neural Networks (mid ’90s – 2006)

◮ The rise of SVMs and random forests
◮ No theory to play with
◮ Lack of training data
◮ Benchmarks are insensitive
◮ Difficulties in optimization
◮ Hard to reproduce results

Curse

“Deep neural networks are no good and could never be trained.”

SLIDE 18

Renaissance of Deep Learning (2006 – )

◮ A fast learning algorithm for deep belief nets [Hinton et al. 2006]
◮ Data + Computing + Industry Competition
◮ NVIDIA's GPUs, Google Brain (16,000 CPUs)
◮ Speech: Microsoft [2010], Google [2011], IBM
◮ Image: AlexNet, 8 layers [Krizhevsky et al. 2012] (top-5 error: 26.2% → 15.3%)

SLIDE 19

Slide Credit: He et al. (MSRA)

AlexNet, 8 layers (ILSVRC 2012):
11x11 conv, 96, /4, pool/2
5x5 conv, 256, pool/2
3x3 conv, 384
3x3 conv, 384
3x3 conv, 256, pool/2
fc, 4096
fc, 4096
fc, 1000

Revolution of Depth

SLIDE 20

Slide Credit: He et al. (MSRA)

[Figure: architecture diagrams of AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); GoogLeNet, 22 layers (ILSVRC 2014) with its inception modules and auxiliary softmax classifiers]

Revolution of Depth

SLIDE 21

Slide Credit: He et al. (MSRA)

[Figure: architecture diagrams of AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); ResNet, 152 layers (ILSVRC 2015)]

Revolution of Depth

SLIDE 23

Some Recent Classification Architectures

◮ AlexNet (Krizhevsky, Sutskever, and E. Hinton 2012) 233MB
◮ Network in Network (Lin, Chen, and Yan 2013) 29MB
◮ VGG (Simonyan and Zisserman 2015) 549MB
◮ GoogLeNet (Szegedy, Liu, et al. 2015) 51MB
◮ ResNet (He et al. 2016) 215MB
◮ Inception-ResNet (Szegedy, Vanhoucke, et al. 2016) 23MB
◮ DenseNet (Huang et al. 2017) 80MB
◮ Xception (Chollet 2017) 22MB
◮ MobileNetV2 (Sandler et al. 2018) 14MB
◮ ShuffleNet (Zhang et al. 2018) 22MB
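These sizes are consistent with dense float32 storage: roughly 4 bytes per parameter. A quick sanity check, using the commonly cited ~61M parameters for AlexNet (my figure, not from the slides):

```python
def model_size_mib(n_params, bytes_per_param=4):
    # Model file size for dense float32 weights, in MiB.
    return n_params * bytes_per_param / 2**20

size = model_size_mib(61_000_000)   # ~233 MiB, matching AlexNet in the list
```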

SLIDE 24

1. Alfredo Canziani, Adam Paszke, and Eugenio Culurciello (2017). “An analysis of deep neural network models for practical applications”. In: arXiv preprint.

SLIDE 25

Convolutional Neural Network (CNN)

SLIDE 26

Overview

CNN Architecture Overview
CNN Energy Efficiency
CNN on Embedded Platform

SLIDE 27

When Machine Learning Meets Hardware

The convolution layer is one of the most expensive layers:

◮ Computation pattern
◮ Emerging challenges: more and more end-point devices with limited memory
  ◮ Cameras
  ◮ Smartphones
  ◮ Autonomous driving
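Why the convolution layer dominates: its multiply-accumulate count scales with the kernel size, both channel counts, and the output resolution. A sketch (the example shape is an AlexNet-like first layer, my assumption, not from the slides):

```python
def conv_layer_cost(c_in, c_out, k, h_out, w_out):
    # MACs and parameter count for one k x k convolution layer (bias ignored).
    macs = c_in * k * k * c_out * h_out * w_out
    params = c_in * k * k * c_out
    return macs, params

macs, params = conv_layer_cost(c_in=3, c_out=96, k=11, h_out=55, w_out=55)
# ~105M MACs but only ~35K parameters: compute-heavy, parameter-light,
# which is exactly what strains embedded devices.
```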

SLIDE 28

Source: https://basicmi.github.io/Deep-Learning-Processor-List/

SLIDE 29

Flexibility vs. Efficiency

[Figure: spectrum of hardware platforms, trading off Flexibility against Power/Performance Efficiency]
