Neural Networks Part 3, Yingyu Liang (yliang@cs.wisc.edu), Computer Sciences Department, University of Wisconsin-Madison



SLIDE 1

Neural Networks

Part 3

Yingyu Liang (yliang@cs.wisc.edu), Computer Sciences Department, University of Wisconsin-Madison

SLIDE 2

Convolutional neural networks

  • Strong empirical application performance
  • Convolutional networks: neural networks that use convolution in place of general matrix multiplication in at least one of their layers, i.e., with a specific kind of weight matrix π‘Š:

β„Ž = 𝜎(π‘Šπ‘₯ + 𝑏)

SLIDE 3

Convolution

SLIDE 4

Convolution: discrete version

  • Given arrays 𝑣 and π‘₯, their convolution is a function 𝑠
  • Written as 𝑠 = 𝑣 βˆ— π‘₯, i.e., 𝑠_𝑑 = (𝑣 βˆ— π‘₯)_𝑑
  • When 𝑣_π‘Ž or π‘₯_{π‘‘βˆ’π‘Ž} is not defined, it is assumed to be 0

𝑠_𝑑 = βˆ‘_{π‘Ž=βˆ’βˆž}^{+∞} 𝑣_π‘Ž π‘₯_{π‘‘βˆ’π‘Ž}
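As a sketch, the sum above can be written directly in pure Python (the function name `conv1d` and the example values are illustrative, not from the slides):

```python
def conv1d(v, x):
    """Full discrete convolution: s[t] = sum over a of v[a] * x[t - a].
    Indices outside either array are treated as 0, as stated above."""
    n = len(v) + len(x) - 1
    s = [0] * n
    for t in range(n):
        for a in range(len(v)):
            if 0 <= t - a < len(x):
                s[t] += v[a] * x[t - a]
    return s

# Convolving with [0, 1] shifts the array by one position.
print(conv1d([1, 2, 3], [0, 1]))  # [0, 1, 2, 3]
```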

SLIDE 5

Illustration 1

𝑣 = [a, b, c, d, e, f], kernel π‘₯ = [z, y, x]

Flipped kernel (x_3, x_2, x_1) aligned over (v_2, v_3, v_4):

𝑠_3 = xb + yc + zd

SLIDE 6

Illustration 1

Flipped kernel now aligned over (v_3, v_4, v_5):

𝑠_4 = xc + yd + ze

SLIDE 7

Illustration 1

Flipped kernel now aligned over (v_4, v_5, v_6):

𝑠_5 = xd + ye + zf

SLIDE 8

Illustration 1: boundary case

Kernel partially off the edge (x_3, x_2 over v_5, v_6); missing entries count as 0:

𝑠_6 = xe + yf

SLIDE 9

Illustration 1 as matrix multiplication

The convolution equals multiplication by a banded (Toeplitz) matrix:

| y z 0 0 0 0 |   | a |
| x y z 0 0 0 |   | b |
| 0 x y z 0 0 | Γ— | c |
| 0 0 x y z 0 |   | d |
| 0 0 0 x y z |   | e |
| 0 0 0 0 x y |   | f |
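The banded matrix can also be built programmatically; a sketch for a length-3 kernel in pure Python (numeric values are hypothetical):

```python
def conv_same_matrix(kernel, n):
    """n x n banded matrix whose row i computes x*v[i-1] + y*v[i] + z*v[i+1]
    for a length-3 kernel (x, y, z); entries off the band are 0."""
    x, y, z = kernel
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            M[i][i - 1] = x
        M[i][i] = y
        if i + 1 < n:
            M[i][i + 1] = z
    return M

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * u for m, u in zip(row, v)) for row in M]

v = [1, 2, 3, 4, 5, 6]
M = conv_same_matrix([10, 20, 30], len(v))
print(matvec(M, v))  # [80, 140, 200, 260, 320, 170]
```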

SLIDE 10

Illustration 2: two dimensional case

Input (3Γ—4):      Kernel (2Γ—2):
a b c d           w x
e f g h           y z
i j k l

First output: wa + bx + ey + fz

SLIDE 11

Illustration 2

Input (3Γ—4):      Kernel (2Γ—2):
a b c d           w x
e f g h           y z
i j k l

Sliding one step right: wa + bx + ey + fz, then bw + cx + fy + gz

SLIDE 12

Illustration 2

Input (3Γ—4):      Kernel, or filter (2Γ—2):
a b c d           w x
e f g h           y z
i j k l

Feature map: wa + bx + ey + fz, bw + cx + fy + gz
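The sliding-window rule above can be sketched as a "valid" 2D convolution in pure Python (element-wise products summed without flipping the kernel, matching the slide's wa + bx + ey + fz; numeric values are illustrative):

```python
def conv2d_valid(inp, kernel):
    """Slide the kernel over every position where it fits entirely,
    summing element-wise products into the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[di][dj] * inp[i + di][j + dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(inp[0]) - kw + 1)]
            for i in range(len(inp) - kh + 1)]

inp = [[1, 2, 3, 4],      # a b c d
       [5, 6, 7, 8],      # e f g h
       [9, 10, 11, 12]]   # i j k l
kernel = [[1, 10],        # w x
          [100, 1000]]    # y z
fmap = conv2d_valid(inp, kernel)
# Top-left entry is w*a + x*b + y*e + z*f.
assert fmap[0][0] == 1*1 + 10*2 + 100*5 + 1000*6
```

A 3Γ—4 input with a 2Γ—2 kernel yields a 2Γ—3 feature map.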

SLIDE 13

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Fully connected layer: 𝑛 Γ— π‘œ edges (𝑛 output nodes, π‘œ input nodes)

SLIDE 14

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Convolutional layer: ≀ 𝑛 Γ— 𝑙 edges (𝑛 output nodes, π‘œ input nodes, kernel size 𝑙)
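A quick count with hypothetical sizes makes the saving concrete:

```python
# Hypothetical layer sizes: o input nodes, n output nodes, kernel size l.
o, n, l = 1000, 1000, 3

fully_connected_edges = n * o   # every output connects to every input
convolutional_edges = n * l     # each output sees at most l inputs
print(fully_connected_edges, convolutional_edges)  # 1000000 3000
```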

SLIDE 15

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Multiple convolutional layers: larger receptive field

SLIDE 16

Advantage: parameter sharing

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

The same kernel is used repeatedly. E.g., the black edges all carry the same weight in the kernel.

SLIDE 17

Advantage: equivariant representations

  • Equivariant: transforming the input transforms the output in the same way
  • Example: the input is an image and the transformation is a shift
  • Convolution(shift(input)) = shift(Convolution(input))
  • Useful when we care only about whether a pattern exists, rather than where it is located

SLIDE 18

Pooling

  • Summarizing the input (e.g., outputting the max of the input)

Figure from Deep Learning, by Goodfellow, Bengio, and Courville
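A minimal max-pooling sketch (names and values are illustrative):

```python
def max_pool(v, size=2, stride=2):
    """Summarize each window of the input by its maximum."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, stride)]

v = [1, 5, 2, 4, 3, 3]
print(max_pool(v))  # [5, 4, 3]

# Rearranging values inside a window leaves the summary unchanged,
# which is what lets pooling induce invariance.
assert max_pool([5, 1, 2, 4, 3, 3]) == max_pool(v)
```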

SLIDE 19

Advantage

Induce invariance

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 20

Motivation from neuroscience

  • David Hubel and Torsten Wiesel studied the early visual system (V1, the primary visual cortex) and won a Nobel Prize for this work
  • V1 properties:
    • 2D spatial arrangement
    • Simple cells: inspire convolution layers
    • Complex cells: inspire pooling layers
SLIDE 21

Variants of convolution and pooling

SLIDE 22

Variants of convolutional layers

  • Multi-dimensional convolution
  • Input and kernel can be 3D
  • E.g., images have (width, height, RGB channels)
  • Multiple kernels lead to multiple feature maps (also called channels)
  • A mini-batch of images is 4D: (image_id, width, height, RGB channels)
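Shape bookkeeping for this 4D layout (all sizes hypothetical; assumes a "valid" convolution, where the spatial size shrinks by kernel_size βˆ’ 1):

```python
batch, width, height, channels = 32, 28, 28, 3   # (image_id, width, height, RGB channels)
kernel_size, num_kernels = 5, 6                  # each 3D kernel spans all input channels

out_w = width - kernel_size + 1
out_h = height - kernel_size + 1
output_shape = (batch, out_w, out_h, num_kernels)  # one feature map per kernel
print(output_shape)  # (32, 24, 24, 6)
```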

SLIDE 23

Variants of convolutional layers

  • Padding: valid (output computed only where the kernel fully overlaps the input)

𝑣 = [a, b, c, d, e, f], π‘₯ = [z, y, x]; last output: xd + ye + zf

SLIDE 24

Variants of convolutional layers

  • Padding: same (input zero-padded so the output length equals the input length)

𝑣 = [a, b, c, d, e, f], π‘₯ = [z, y, x]; boundary output: xe + yf

SLIDE 25

Variants of convolutional layers

  • Stride

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 26

Variants of pooling

  • Stride and padding

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 27

Variants of pooling

  • Max pooling: 𝑧 = max{𝑦_1, 𝑦_2, …, 𝑦_𝑙}
  • Average pooling: 𝑧 = mean{𝑦_1, 𝑦_2, …, 𝑦_𝑙}
  • Others, like max-out
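The two formulas differ only in the summary operation, which a sketch can take as a parameter (values are illustrative):

```python
def pool1d(v, size, stride, op):
    """Pooling with a configurable window size, stride, and summary operation."""
    return [op(v[i:i + size]) for i in range(0, len(v) - size + 1, stride)]

v = [1, 5, 2, 4, 3, 3]
print(pool1d(v, 2, 2, max))                        # max pooling: [5, 4, 3]
print(pool1d(v, 2, 2, lambda w: sum(w) / len(w)))  # average pooling: [3.0, 3.0, 3.0]
```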
SLIDE 28

Case study: LeNet-5

SLIDE 31

LeNet-5

  • Proposed in β€œGradient-based learning applied to document recognition”, by Yann LeCun, LΓ©on Bottou, Yoshua Bengio, and Patrick Haffner, Proceedings of the IEEE, 1998

  • Apply convolution on 2D images (MNIST) and use backpropagation
  • Structure: 2 convolutional layers (with pooling) + 3 fully connected layers
  • Input size: 32x32x1
  • Convolution kernel size: 5x5
  • Pooling: 2x2
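The sizes on the following slides can be traced layer by layer (a sketch; the helper names are illustrative):

```python
def conv_out(size, kernel=5, stride=1):
    """Output size of a 'valid' convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output size of a pooling layer."""
    return (size - window) // stride + 1

s = 32              # input: 32x32x1
s = conv_out(s)     # C1: 28x28, 6 feature maps
s = pool_out(s)     # S2: 14x14
s = conv_out(s)     # C3: 10x10, 16 feature maps
s = pool_out(s)     # S4: 5x5
flat = s * s * 16   # 400 values feed the fully connected layers: 400 -> 120 -> 84 -> 10
print(flat)         # 400
```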
SLIDE 32

LeNet-5

Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner


SLIDE 34

LeNet-5

Filter: 5x5, stride: 1x1, #filters: 6

SLIDE 35

LeNet-5

Pooling: 2x2, stride: 2

SLIDE 36

LeNet-5

Filter: 5x5x6, stride: 1x1, #filters: 16

SLIDE 37

LeNet-5

Pooling: 2x2, stride: 2

SLIDE 38

LeNet-5

Weight matrix: 400x120

SLIDE 39

LeNet-5

Weight matrices: 120x84, then 84x10