Lecture 6: Convolutional NN, Princeton University COS 495


SLIDE 1

Deep Learning Basics Lecture 6: Convolutional NN

Princeton University COS 495 Instructor: Yingyu Liang

SLIDE 2

Review: convolutional layers

SLIDE 3

Convolution: two dimensional case

Input:            Kernel/filter:
a b c d           w x
e f g h           y z
i j k l

Feature map (first entries): wa + bx + ey + fz,  bw + cx + fy + gz, …
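The sliding-window computation above can be sketched in a few lines of NumPy (the numeric input and kernel are arbitrary stand-ins for the symbolic a..l and w..z; as in deep learning libraries, the kernel is not flipped, i.e. this is cross-correlation):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2D convolution as on the slide: slide the kernel over the
    input and sum the elementwise products over each window."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Numeric stand-ins for the symbolic input and kernel above:
x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 10., 11., 12.]])
k = np.array([[1., 0.],
              [0., 1.]])
fmap = conv2d_valid(x, k)   # a 2x3 feature map
```

A 3x4 input with a 2x2 kernel yields a 2x3 feature map, matching the slide's layout.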

SLIDE 4

Convolutional layers

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

The same weights are shared for all output nodes: $m$ output nodes, $n$ input nodes, kernel size $k$.
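The saving from weight sharing can be made concrete with a quick parameter count (a toy 1D sketch; the sizes are arbitrary and biases are ignored):

```python
# Parameter count: fully connected vs. convolutional layer (1D case, no biases).
n_in, n_out, k = 100, 96, 5   # input nodes, output nodes, kernel size

fc_params = n_in * n_out      # every output node has its own n_in weights
conv_params = k               # one kernel of k weights, shared by all output nodes
```

With these sizes the fully connected layer needs 9600 weights, the convolutional layer only 5.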

SLIDE 5

Terminology

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 6

Case study: LeNet-5

SLIDES 7-9

LeNet-5

  • Proposed in "Gradient-based learning applied to document recognition", by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998
  • Applies convolution to 2D images (MNIST) and trains with backpropagation
  • Structure: 2 convolutional layers (with pooling) + 3 fully connected layers
  • Input size: 32x32x1
  • Convolution kernel size: 5x5
  • Pooling: 2x2
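These sizes determine the feature-map shapes through the whole network; a quick check with the standard output-size formula (`conv_out` is just an illustrative helper name):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

c1 = conv_out(32, 5)            # 5x5 conv, stride 1: 32 -> 28
s2 = conv_out(c1, 2, stride=2)  # 2x2 pool, stride 2: 28 -> 14
c3 = conv_out(s2, 5)            # 5x5 conv: 14 -> 10
s4 = conv_out(c3, 2, stride=2)  # 2x2 pool: 10 -> 5
flat = s4 * s4 * 16             # 16 maps of 5x5 -> 400 inputs to the first FC layer
```

The final 5x5x16 = 400 matches the 400x120 weight matrix of the first fully connected layer shown later.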
SLIDE 10

LeNet-5

Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

SLIDES 11-17

LeNet-5, layer by layer (same figure, one layer highlighted per slide):

  • Convolution C1 – filter: 5x5, stride: 1x1, #filters: 6
  • Pooling S2 – 2x2, stride: 2
  • Convolution C3 – filter: 5x5x6, stride: 1x1, #filters: 16
  • Pooling S4 – 2x2, stride: 2
  • Fully connected C5 – weight matrix: 400x120
  • Fully connected F6 – weight matrix: 120x84
  • Output layer – weight matrix: 84x10

SLIDE 18

Software platforms for CNN

List last updated in April 2016; check online for more recent ones.

SLIDE 19

Platform: Marvin (marvin.is)


SLIDE 21

LeNet in Marvin: convolutional layer

SLIDE 22

LeNet in Marvin: pooling layer

SLIDE 23

LeNet in Marvin: fully connected layer

SLIDE 24

Platform: Caffe (caffe.berkeleyvision.org)

SLIDE 25

LeNet in Caffe

SLIDE 26

Platform: Tensorflow (tensorflow.org)


SLIDE 29

Others

  • Theano – CPU/GPU symbolic expression compiler in Python (from the MILA lab at the University of Montreal)

  • Torch – provides a Matlab-like environment for state-of-the-art machine learning algorithms in Lua

  • Lasagne – a lightweight library to build and train neural networks in Theano

  • See: http://deeplearning.net/software_links/
SLIDE 30

Optimization: momentum

SLIDE 31

Basic algorithms

  • Minimize the (regularized) empirical loss

    $\hat{L}(\theta) = \frac{1}{n} \sum_{t=1}^{n} \ell(\theta, x_t, y_t) + R(\theta)$

    where the hypothesis is parametrized by $\theta$

  • Gradient descent

    $\theta_{t+1} = \theta_t - \eta_t \nabla \hat{L}(\theta_t)$

SLIDE 32

Mini-batch stochastic gradient descent

  • Instead of one data point, work with a small batch of $b$ points

    $(x_{tb+1}, y_{tb+1}), \ldots, (x_{tb+b}, y_{tb+b})$

  • Update rule

    $\theta_{t+1} = \theta_t - \eta_t \nabla \Big[ \frac{1}{b} \sum_{1 \le j \le b} \ell(\theta_t, x_{tb+j}, y_{tb+j}) + R(\theta_t) \Big]$

SLIDE 33

Momentum

  • Drawback of SGD: progress can be slow when the gradient is small
  • Observation: when the gradient is consistent across consecutive steps, we can take larger steps
  • Metaphor: a marble rolling down a gentle slope
SLIDE 34

Momentum

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Contour: loss function. Path: SGD with momentum. Arrows: stochastic gradients.

SLIDE 35

Momentum

  • Work with a small batch of $b$ points $(x_{tb+1}, y_{tb+1}), \ldots, (x_{tb+b}, y_{tb+b})$
  • Keep a momentum variable $v_t$, and set a decay rate $\alpha$
  • Update rule

    $v_t = \alpha v_{t-1} - \eta_t \nabla \Big[ \frac{1}{b} \sum_{1 \le j \le b} \ell(\theta_t, x_{tb+j}, y_{tb+j}) + R(\theta_t) \Big]$

    $\theta_{t+1} = \theta_t + v_t$

SLIDE 36

Momentum

  • Keep a momentum variable $v_t$, and set a decay rate $\alpha$
  • Update rule

    $v_t = \alpha v_{t-1} - \eta_t \nabla \Big[ \frac{1}{b} \sum_{1 \le j \le b} \ell(\theta_t, x_{tb+j}, y_{tb+j}) + R(\theta_t) \Big]$

    $\theta_{t+1} = \theta_t + v_t$

  • Practical guide: $\alpha$ is set to 0.5 until the initial learning stabilizes and is then increased to 0.9 or higher.
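The momentum update, including the 0.5-to-0.9 decay-rate schedule from the practical guide, can be sketched on a toy quadratic loss (step size, step counts, and the switch-over point are arbitrary choices):

```python
def momentum_step(theta, v, grad, eta, alpha):
    """One momentum update: v <- alpha * v - eta * grad;  theta <- theta + v."""
    v = alpha * v - eta * grad
    return theta + v, v

theta, v = 0.0, 0.0
for t in range(200):
    alpha = 0.5 if t < 50 else 0.9          # practical guide: 0.5 early, then 0.9
    grad = 2.0 * (theta - 3.0)              # gradient of the toy loss (theta - 3)^2
    theta, v = momentum_step(theta, v, grad, eta=0.05, alpha=alpha)
```

Because successive gradients point the same way on this slope, the velocity v accumulates and the iterate reaches the minimizer theta = 3 faster than plain gradient descent with the same step size.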