

slide-1
SLIDE 1

CSC 411: Lecture 11: Neural Networks II

Class based on Raquel Urtasun & Rich Zemel's lectures

Sanja Fidler

University of Toronto

March 2, 2016

Urtasun, Zemel, Fidler (UofT) CSC 411: 11-Neural Networks II March 2, 2016 1 / 55

slide-2
SLIDE 2

Today

Deep learning for Object Recognition


slide-5
SLIDE 5

Neural Nets for Object Recognition

People are very good at recognizing shapes

- Intrinsically difficult; computers are bad at it

Why is it difficult?

slide-6
SLIDE 6

Why is it a Problem?

Difficult scene conditions [From: Grauman & Leibe]


slide-7
SLIDE 7

Why is it a Problem?

Huge within-class variations. Recognition is mainly about modeling variation. [Pic from: S. Lazebnik]


slide-8
SLIDE 8

Why is it a Problem?

Tons of classes [Biederman]



slide-13
SLIDE 13

Neural Nets for Object Recognition

People are very good at recognizing shapes

- Intrinsically difficult; computers are bad at it

Some reasons why it is difficult:

- Segmentation: real scenes are cluttered
- Invariances: we are very good at ignoring all sorts of variations that do not affect shape
- Deformations: natural shape classes allow variations (faces, letters, chairs)
- A huge amount of computation is required


slide-19
SLIDE 19

How to Deal with Large Input Spaces

How can we apply neural nets to images?

Images can have millions of pixels, i.e., x is very high dimensional

How many parameters do I have? Prohibitive to have fully-connected layers

What can we do? We can use a locally connected layer

slide-20
SLIDE 20

Locally Connected Layer

Example: 200x200 image, 40K hidden units, filter size 10x10 → 4M parameters

[Ranzato]

Note: this parameterization is good when the input image is registered (e.g., face recognition).
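As a sanity check on the counts above (assuming each of the 40K hidden units looks at its own 10x10 patch, with no weight sharing), a quick back-of-the-envelope in Python; the convolutional count uses the 100-filter example from a later slide:

```python
# Parameter counts for a 200x200 input image (illustrative numbers from the slides).
image_pixels = 200 * 200          # 40,000 inputs
hidden_units = 40_000             # roughly one unit per spatial location
filter_size = 10 * 10             # each unit sees a 10x10 patch

# Fully connected: every hidden unit connects to every pixel.
fc_params = image_pixels * hidden_units          # 1.6 billion

# Locally connected: every unit has its own private 10x10 filter.
local_params = hidden_units * filter_size        # 4 million

# Convolutional: one shared 10x10 filter per feature map, 100 filters.
conv_params = 100 * filter_size                  # 10,000

print(fc_params, local_params, conv_params)
```

The jump from 1.6B to 4M to 10K parameters is the whole motivation for local connectivity and weight sharing.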


slide-22
SLIDE 22

When Will this Work?

This is good when the input is (roughly) registered

slide-23
SLIDE 23

General Images

The object can be anywhere

[Slide: Y. Zhu]



slide-26
SLIDE 26

Locally Connected Layer

Stationarity? Statistics are similar at different locations

Example: 200x200 image, 40K hidden units, filter size 10x10 → 4M parameters

[Ranzato]

slide-27
SLIDE 27

The Replicated Feature Approach

Adopt an approach apparently used in monkey visual systems: use many different copies of the same feature detector.

- Copies have slightly different positions.
- Could also replicate across scale and orientation (tricky and expensive).
- Replication reduces the number of free parameters to be learned.

Use several different feature types, each with its own replicated pool of detectors.

- Allows each patch of image to be represented in several ways.

The red connections all have the same weight.

slide-28
SLIDE 28

Convolutional Neural Net

Idea: statistics are similar at different locations (LeCun 1998)

Connect each hidden unit to a small input patch and share the weights across space

This is called a convolution layer, and the network is a convolutional network

slide-29
SLIDE 29

Convolutional Layer

[Ranzato]

h_j^n = max(0, sum_{k=1}^K h_k^{n-1} * w_{jk}^n)

where * denotes spatial convolution and the max implements a ReLU.
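A minimal NumPy sketch of this layer: a "valid"-mode correlation of each filter against all input maps, summed over the input channels k, then the ReLU max(0, ·). Shapes and names are illustrative, not from the slides:

```python
import numpy as np

def conv_layer(h_prev, w):
    """h_prev: (K, H, W) input maps; w: (J, K, F, F) filters.
    Returns (J, H-F+1, W-F+1) maps: h_j = max(0, sum_k h_k * w_jk)."""
    J, K, F, _ = w.shape
    _, H, W = h_prev.shape
    out = np.zeros((J, H - F + 1, W - F + 1))
    for j in range(J):
        for y in range(H - F + 1):
            for x in range(W - F + 1):
                patch = h_prev[:, y:y+F, x:x+F]     # all K input channels
                out[j, y, x] = np.sum(patch * w[j]) # correlate and sum over k
    return np.maximum(0.0, out)                     # ReLU

h = np.random.randn(3, 8, 8)      # K=3 input maps
w = np.random.randn(4, 3, 3, 3)   # J=4 filters of size 3x3
print(conv_layer(h, w).shape)     # (4, 6, 6)
```

Strictly this is cross-correlation rather than a flipped convolution, which is what most deep learning libraries compute as well.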


slide-35
SLIDE 35

Convolutional Layer

Learn multiple filters. E.g.: 200x200 image, 100 filters, filter size 10x10 → 10K parameters

[Ranzato]


slide-40
SLIDE 40

Convolutional Layer

Figure: Left: a convolutional net; right: each neuron computes a linear function followed by an activation function

Hyperparameters of a convolutional layer:

- The number of filters (controls the depth of the output volume)
- The stride: how many units apart we apply a filter spatially (controls the spatial size of the output volume)
- The size w × h of the filters

[http://cs231n.github.io/convolutional-networks/]
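These hyperparameters determine the output volume. With input width W, filter size F, stride S, and zero padding P, the standard spatial-size formula is (W − F + 2P)/S + 1; a small helper (the function name is mine) to check a configuration:

```python
def conv_output_size(w_in, f, stride=1, pad=0):
    """Spatial output size of a conv (or pooling) layer: (W - F + 2P)/S + 1."""
    size, rem = divmod(w_in - f + 2 * pad, stride)
    if rem != 0:
        raise ValueError("filter does not tile the input evenly with this stride")
    return size + 1

# 200x200 image, 10x10 filter, stride 1, no padding -> 191x191 output
print(conv_output_size(200, 10))                   # 191
# stride 2 with padding 4: (200 - 10 + 8)/2 + 1
print(conv_output_size(200, 10, stride=2, pad=4))  # 100
```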

slide-41
SLIDE 41

MLP vs ConvNet

Figure: Top: MLP, bottom: Convolutional neural network

[http://cs231n.github.io/convolutional-networks/]


slide-42
SLIDE 42

Pooling Layer

By “pooling” (e.g., taking the max of) filter responses at different locations, we gain robustness to the exact spatial location of features.

[Ranzato]

slide-43
SLIDE 43

Pooling Options

- Max pooling: return the maximum of the arguments
- Average pooling: return the average of the arguments
- Other types of pooling exist
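A quick NumPy sketch of both options on non-overlapping P×P pools (assuming the input sides divide evenly by P; the helper name is mine):

```python
import numpy as np

def pool2d(x, p, mode="max"):
    """Non-overlapping PxP pooling over a 2D map whose sides divide by P."""
    h, w = x.shape
    blocks = x.reshape(h // p, p, w // p, p)   # group into PxP tiles
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))            # average pooling

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 4., 1., 1.]])
print(pool2d(x, 2, "max"))    # [[4. 8.] [4. 1.]]
print(pool2d(x, 2, "mean"))   # [[2.5 6.5] [1. 1.]]
```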


slide-47
SLIDE 47

Pooling

Figure: Left: pooling; right: max pooling example

Hyperparameters of a pooling layer:

- The spatial extent F
- The stride

[http://cs231n.github.io/convolutional-networks/]

slide-48
SLIDE 48

Pooling Layer: Receptive Field Size

[Ranzato]

If convolutional filters have size K×K and stride 1, and the pooling layer has pools of size P×P, then each unit in the pooling layer depends on a patch (at the input of the preceding conv layer) of size (P+K−1)×(P+K−1).
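The (P+K−1) claim can be verified empirically: perturb each input pixel and see whether the first pooled unit changes. This sketch uses hypothetical sizes K=3 and P=2, an all-ones conv filter with stride 1, and max pooling:

```python
import numpy as np

K, P, N = 3, 2, 8                     # filter size, pool size, input width

def pooled_unit(x):
    """Value of pooling unit (0,0): max over a PxP block of KxK window sums."""
    conv = np.array([[x[i:i+K, j:j+K].sum()        # stride-1 conv, all-ones filter
                      for j in range(N - K + 1)]
                     for i in range(N - K + 1)])
    return conv[:P, :P].max()                      # first PxP pool

base = np.zeros((N, N))
affected = set()
for i in range(N):
    for j in range(N):
        bump = base.copy()
        bump[i, j] = 1.0                           # perturb one input pixel
        if pooled_unit(bump) != pooled_unit(base):
            affected.add((i, j))

side = max(i for i, _ in affected) + 1
print(side, P + K - 1)                             # both are 4
```

The affected pixels form exactly a 4×4 = (P+K−1)×(P+K−1) patch in the top-left corner of the input.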


slide-52
SLIDE 52

Backpropagation with Weight Constraints

It is easy to modify the backpropagation algorithm to incorporate linear constraints between the weights.

To constrain w1 = w2, we need Δw1 = Δw2.

We compute the gradients as usual, and then modify them so that they satisfy the constraints:

compute ∂E/∂w1 and ∂E/∂w2

use ∂E/∂w1 + ∂E/∂w2 for both w1 and w2

So if the weights started off satisfying the constraints, they will continue to satisfy them.

This is the intuition behind backprop with shared weights. In practice, write down the equations and compute the derivatives (it's a nice exercise; do it at home).
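A tiny numeric illustration (my own example, not from the slides): tie two weights in y = w1·x1 + w2·x2 with squared error, update both with the summed gradient, and the constraint w1 = w2 is preserved while the error goes to zero:

```python
# E = 0.5 * (w1*x1 + w2*x2 - t)^2, with the constraint w1 == w2.
x1, x2, t = 1.0, 2.0, 3.0
w1 = w2 = 0.5                      # start off satisfying the constraint
lr = 0.1

for _ in range(20):
    err = w1 * x1 + w2 * x2 - t
    g1, g2 = err * x1, err * x2    # ordinary gradients dE/dw1, dE/dw2
    g = g1 + g2                    # shared gradient for the tied weight
    w1 -= lr * g
    w2 -= lr * g                   # identical update keeps w1 == w2

print(w1 == w2, abs(w1 * x1 + w2 * x2 - t) < 1e-6)   # True True
```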

slide-53
SLIDE 53

Now let’s make this very deep to get a real state-of-the-art object recognition system


slide-54
SLIDE 54

Convolutional Neural Networks (CNN)

Remember from your image processing / computer vision course about filtering?


slide-55
SLIDE 55

Convolutional Neural Networks (CNN)

If our filter is [−1, 1], you get a vertical edge detector


slide-56
SLIDE 56

Convolutional Neural Networks (CNN)

Now imagine we want to have many filters (e.g., vertical, horizontal, corners, one for dots). We will use a filterbank.

slide-57
SLIDE 57

Convolutional Neural Networks (CNN)

So applying a filterbank to an image yields a cube-like output, a 3D matrix in which each slice is an output of convolution with one filter. We apply an activation function on each hidden unit (typically a ReLU).




slide-60
SLIDE 60

Convolutional Neural Networks (CNN)

Do some additional tricks. A popular one is called max pooling. Any idea why you would do this? To get invariance to small shifts in position.

slide-61
SLIDE 61

Convolutional Neural Networks (CNN)

Now add another “layer” of filters. For each filter again do convolution, but this time with the output cube of the previous layer.


slide-62
SLIDE 62

Convolutional Neural Networks (CNN)

Keep adding a few layers. Any idea what’s the purpose of more layers? Why can’t we just have a full bunch of filters in one layer?


slide-63
SLIDE 63

Convolutional Neural Networks (CNN)

In the end, add one or two fully (or densely) connected layers. In these layers we don't do convolution; we just take a dot product between the “filter” and the output of the previous layer.

slide-64
SLIDE 64

Convolutional Neural Networks (CNN)

Add one final layer: a classification layer. Each dimension of its output vector tells us the probability of the input image being of a certain class.
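Such a layer is typically a linear map followed by a softmax, which turns raw scores into probabilities that sum to one. A minimal sketch (the class names and scores are hypothetical):

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # subtract max for numerical stability; same result
    e = np.exp(z)
    return e / e.sum()

classes = ["dog", "cat", "boat"]
scores = np.array([2.0, 1.0, 0.1])     # output of the last linear layer
probs = softmax(scores)
print(classes[int(np.argmax(probs))])  # dog
print(round(float(probs.sum()), 6))    # 1.0
```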

slide-65
SLIDE 65

Convolutional Neural Networks (CNN)

The trick is to not hand-fix the weights, but to train them. Train them such that when the network sees a picture of a dog, the last layer will say “dog”.


slide-66
SLIDE 66

Convolutional Neural Networks (CNN)

Or when the network sees a picture of a cat, the last layer will say “cat”.


slide-67
SLIDE 67

Convolutional Neural Networks (CNN)

Or when the network sees a picture of a boat, the last layer will say “boat”... The more pictures the network sees, the better.


slide-68
SLIDE 68

Classification

Once trained, we feed in an image or a crop, run it through the network, and read out the class with the highest probability in the last (classification) layer.

slide-69
SLIDE 69

Example

[http://cs231n.github.io/convolutional-networks/]


slide-70
SLIDE 70

Architecture for Classification

input → CONV → LOCAL CONTRAST NORM → MAX POOLING → CONV → LOCAL CONTRAST NORM → MAX POOLING → CONV → CONV → CONV → MAX POOLING → FULLY CONNECTED → FULLY CONNECTED → LINEAR → category prediction

Krizhevsky et al., “ImageNet Classification with Deep CNNs”, NIPS 2012

[Ranzato]

slide-71
SLIDE 71

Architecture for Classification

Total number of parameters: 60M
(per layer: 4M, 16M, 37M, 442K, 1.3M, 884K, 307K, 35K)

Total number of flops: 832M
(per layer: 4M, 16M, 37M, 74M, 224M, 149M, 223M, 105M)

Krizhevsky et al., “ImageNet Classification with Deep CNNs”, NIPS 2012

[Ranzato]
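The per-layer numbers on this slide are consistent with the stated totals; a quick arithmetic check (K and M as 1e3 and 1e6):

```python
K, M = 1e3, 1e6
params = [4*M, 16*M, 37*M, 442*K, 1.3*M, 884*K, 307*K, 35*K]
flops  = [4*M, 16*M, 37*M, 74*M, 224*M, 149*M, 223*M, 105*M]
print(round(sum(params) / M))   # 60 (million)
print(round(sum(flops) / M))    # 832 (million)
```

Note how the parameters are dominated by the fully connected layers while the flops are dominated by the convolutions.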

slide-72
SLIDE 72

ImageNet

ImageNet, the biggest dataset for object classification: http://image-net.org/

1000 classes, 1.2M training images, 150K test images

slide-73
SLIDE 73

The 2012 Computer Vision Crisis

(Classification) (Detection)


slide-74
SLIDE 74

So Neural Networks are Great

So networks turn out to be great. Everything is deep, even if it's shallow!

Companies are leading the competitions, as they have more computational power.

At this point Google, Facebook, Microsoft, Baidu “steal” most neural network professors/students from academia.

slide-75
SLIDE 75

So Neural Networks are Great

But to train the networks you need quite a bit of computational power (e.g., GPU farm). So what do you do?


slide-76
SLIDE 76

So Neural Networks are Great

Buy even more.


slide-77
SLIDE 77

So Neural Networks are Great

And train more layers: 16 instead of 7 before; 144 million parameters.

Figure: K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014

slide-78
SLIDE 78

150 Layers!

Networks are now at 150 layers. They use skip connections with a special form. In fact, they don't fit on this screen. Amazing performance! A lot of “mistakes” are due to wrong ground truth.

[He, K., Zhang, X., Ren, S. and Sun, J. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015]

slide-79
SLIDE 79

Results: Object Classification

Slide: R. Liao. Paper: [He, K., Zhang, X., Ren, S. and Sun, J. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015]

slide-80
SLIDE 80

Results: Object Detection

Slide: R. Liao. Paper: [He, K., Zhang, X., Ren, S. and Sun, J. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015]


slide-84
SLIDE 84

What do CNNs Learn?

Figure: Filters in the first convolutional layer of Krizhevsky et al


slide-85
SLIDE 85

What do CNNs Learn?

Figure: Filters in the second layer

[http://arxiv.org/pdf/1311.2901v3.pdf]


slide-86
SLIDE 86

What do CNNs Learn?

Figure: Filters in the third layer

[http://arxiv.org/pdf/1311.2901v3.pdf]


slide-87
SLIDE 87

What do CNNs Learn?

[http://arxiv.org/pdf/1311.2901v3.pdf]



slide-94
SLIDE 94

How to Train Good CNNs

- Normalize your data (standard trick: subtract the mean, divide by the standard deviation)
- Augment your data (add image flips, rotations, etc.)
- Keep training data balanced
- Shuffle data before batching
- In training: random initialization of weights with proper variance
- Monitor your loss function and accuracy (performance) on a validation set
- If your labeled image dataset is small: pre-train your CNN on a large dataset (e.g., ImageNet), and fine-tune on your dataset

[Slide: Y. Zhu; check tutorial slides and code: http://www.cs.utoronto.ca/~fidler/teaching/2015/CSC2523.html]
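The first two tricks are one-liners in NumPy. A sketch with toy images (per-pixel statistics over the training set are also common; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(100, 32, 32)).astype(np.float32)  # toy "images"

# Normalize: subtract the training-set mean, divide by its standard deviation.
mean, std = train.mean(), train.std()
train_norm = (train - mean) / std

# Augment: add horizontal flips, doubling the training set.
train_aug = np.concatenate([train_norm, train_norm[:, :, ::-1]])
print(train_aug.shape)   # (200, 32, 32)
```

The same mean and std must then be applied to validation and test images.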

slide-95
SLIDE 95

Tricking a Neural Net

Read about it here (and try it!): https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture

Watch: https://www.youtube.com/watch?v=M2IebCN9Ht4

slide-96
SLIDE 96

More on NNs

Figure: Generate images: http://arxiv.org/pdf/1511.06434v1.pdf


slide-97
SLIDE 97

More on NNs

Generate text: https://vimeo.com/146492001, https://github.com/karpathy/neuraltalk2, https://github.com/ryankiros/visual-semantic-embedding


slide-98
SLIDE 98

More on NNs

Figure: Compose music: https://www.youtube.com/watch?v=0VTI1BBLydE


slide-99
SLIDE 99

Links

- NNs for computer vision: https://github.com/kjw0612/awesome-deep-vision
- Recurrent neural networks: https://github.com/kjw0612/awesome-rnn
- Lots of code, models, tutorials: https://github.com/carpedm20/awesome-torch
- More links on our class webpage