Introduction to Deep Learning - A. G. Schwing & S. Fidler (PowerPoint PPT Presentation)



SLIDE 1

Introduction to Deep Learning

  • A. G. Schwing & S. Fidler

University of Toronto, 2015

SLIDE 2

Outline
1. Universality of Neural Networks
2. Learning Neural Networks
3. Deep Learning
4. Applications
5. References

SLIDE 3

What are neural networks? Let’s ask

  • Biological
  • Computational

[Diagram: feed-forward network with an input layer (Inputs #1 to #4), one hidden layer, and an output layer]

SLIDE 4

What are neural networks? ...Neural networks (NNs) are computational models inspired by biological neural networks [...] and are used to estimate or approximate functions... [Wikipedia]

SLIDE 5

What are neural networks? Origins: traced back to threshold logic [W. McCulloch and W. Pitts, 1943] and the perceptron [F. Rosenblatt, 1958]
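For context, a perceptron is just a thresholded weighted sum trained with a mistake-driven update. A minimal sketch (ours, plain Python; not from the slides) of Rosenblatt's update rule:

    def perceptron_train(data, epochs=10, lr=1.0):
        # data: list of (features, label) pairs with label in {-1, +1}
        n = len(data[0][0])
        w, b = [0.0] * n, 0.0
        for _ in range(epochs):
            for x, y in data:
                pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
                if pred != y:  # update only on mistakes
                    w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                    b += lr * y
        return w, b

    # Tiny linearly separable example (AND-like data)
    data = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
    print(perceptron_train(data))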

SLIDE 6

What are neural networks? Use cases:
  • Classification
  • Playing video games
  • Captcha
  • Neural Turing Machine (e.g., learning how to sort) [Alex Graves]

http://www.technologyreview.com/view/532156/googles-secretive-deepmind-startup-unveils-a-neural-turing-machine/

SLIDE 8

What are neural networks? Example: input x, parameters w1, w2, b.
[Diagram: x ∈ R → hidden unit h1 (weight w1, bias b ∈ R) → output f (weight w2)]

SLIDE 9

How to compute the function? Forward propagation/pass, inference, prediction:
  • Given input x and parameters w, b
  • Compute (latent variables/) intermediate results in a feed-forward manner
  • Until we obtain the output function f
[Diagram: x ∈ R → hidden unit h1 (weight w1, bias b ∈ R) → output f (weight w2)]

SLIDE 13

How to compute the function? Example: input x, parameters w1, w2, b.
[Diagram: x ∈ R → hidden unit h1 (weight w1, bias b ∈ R) → output f (weight w2)]
h1 = σ(w1 · x + b)
f = w2 · h1
Sigmoid function: σ(z) = 1/(1 + exp(−z))
For x = ln 2, b = ln 3, w1 = 2, w2 = 2: h1 = ? f = ?
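To make the forward pass concrete, here is a minimal Python sketch of this one-hidden-unit network (the function names are ours, not from the slides):

    import math

    def sigmoid(z):
        # Sigmoid activation: squashes z into (0, 1)
        return 1.0 / (1.0 + math.exp(-z))

    def forward(x, w1, w2, b):
        # Forward pass of the one-hidden-unit network from the slide
        h1 = sigmoid(w1 * x + b)
        f = w2 * h1
        return h1, f

    # Slide values: x = ln 2, b = ln 3, w1 = 2, w2 = 2
    h1, f = forward(x=math.log(2), w1=2.0, w2=2.0, b=math.log(3))
    print(h1, f)  # w1*x + b = ln 12, so h1 = 12/13 and f = 24/13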

SLIDE 15

How to compute the function? Given the parameters, what is f for x = 0, 1, 2, ...? f = w2 · σ(w1 · x + b)

[Plot: f = w2 · σ(w1 · x + b) over x ∈ [−5, 5]]

SLIDE 21

Let’s mess with the parameters:
[Diagram: x ∈ R → hidden unit h1 (weight w1, bias b ∈ R) → output f (weight w2)]
h1 = σ(w1 · x + b)
f = w2 · h1
σ(z) = 1/(1 + exp(−z))
Two experiments: w1 = 1.0 with b changing, and b = 0 with w1 changing.

[Plots of f vs. x: left, w1 = 1.0 with b = −2, 0, 2; right, b = 0 with w1 = 0, 0.5, 1.0, 100]

Keep in mind the step function.
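A small sketch (ours, plain Python) that reproduces the second experiment and shows why the step function matters: as w1 grows, the sigmoid approaches a step at x = 0.

    import math

    def f(x, w1, w2=1.0, b=0.0):
        # f = w2 * sigma(w1 * x + b), as defined on the slide
        return w2 / (1.0 + math.exp(-(w1 * x + b)))

    for w1 in (0.0, 0.5, 1.0, 100.0):
        print(w1, [round(f(x, w1), 3) for x in (-1.0, -0.1, 0.0, 0.1, 1.0)])
    # w1 = 0 gives a constant 0.5; w1 = 100 is nearly a step function at x = 0.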

SLIDE 22

How to use neural networks for binary classification?
  • Feature/measurement: x
  • Output: how likely is the input to be a cat?

[Plot: classifier output f vs. feature x]

SLIDE 28

How to use neural networks for binary classification? Now the feature/measurement x is shifted. Output: how likely is the input to be a cat?

[Plots: f vs. x for the previous features (left) and the shifted features (right)]

Learning/Training means finding the right parameters.

SLIDE 31

So far we are able to scale and translate sigmoids. How well can we approximate an arbitrary function? With this simple model we are obviously not going very far:
  • Features are good → a simple classifier suffices
  • Features are noisy → a more complex classifier is needed

[Plots: f vs. x with good features (left) and noisy features (right)]

How can we generalize?

SLIDE 33

Let’s use more hidden variables:

[Diagram: x ∈ R feeds two hidden units h1 (weight w1, bias b1) and h2 (weight w3, bias b2); the output f combines them with weights w2 and w4]

h1 = σ(w1 · x + b1)
h2 = σ(w3 · x + b2)
f = w2 · h1 + w4 · h2
Combining two step functions gives a bump.

[Plot: f vs. x showing a bump rising from 1 to 2]

w1 = −100, b1 = 40, w3 = 100, b2 = 60, w2 = 1, w4 = 1
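A quick check of this bump in Python (our sketch, using the slide's parameter values):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def f(x):
        h1 = sigmoid(-100.0 * x + 40.0)   # w1 = -100, b1 = 40: steps down near x = 0.4
        h2 = sigmoid(100.0 * x + 60.0)    # w3 = 100, b2 = 60: steps up near x = -0.6
        return 1.0 * h1 + 1.0 * h2        # w2 = 1, w4 = 1

    for x in (-1.0, -0.6, 0.0, 0.4, 1.0):
        print(x, round(f(x), 3))
    # f is about 2 inside (-0.6, 0.4) and about 1 outside: a bump of height 1.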

SLIDE 34

So let’s simplify:

[Diagram: the two-hidden-unit network (h1, h2 with biases b1, b2 and weights w1 to w4) collapsed into a single unit: f = Bump(x1, x2, h)]

We simplify a pair of hidden nodes to a “bump” function that starts at x1, ends at x2, and has height h.

SLIDE 36

Now we can represent “bumps” very well. How can we generalize?

[Diagram: five “bump” units Bump(0.0, 0.2, h1), Bump(0.2, 0.4, h2), Bump(0.4, 0.6, h3), Bump(0.6, 0.8, h4), Bump(0.8, 1.0, h5) feeding the output f]
[Plot: target function vs. bump approximation over x ∈ [0, 1]]

More bumps give a more accurate approximation. This corresponds to a single-layer network, as the sketch below illustrates.
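A sketch of the idea (ours, plain Python): build a Bump from two steep sigmoids and sum five of them over [0, 1]. The target function below is a stand-in we assume for illustration, since the slide's target is not given.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def bump(x, x1, x2, h, steep=100.0):
        # Bump(x1, x2, h): height h between x1 and x2, from two steep sigmoids
        return h * (sigmoid(steep * (x - x1)) - sigmoid(steep * (x - x2)))

    def target(x):
        # Stand-in target function (assumed, for illustration only)
        return 0.2 + 0.4 * x * x + 0.3 * x * math.sin(15.0 * x)

    edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    bins = list(zip(edges, edges[1:]))
    # Set each bump's height to the target value at its bin center
    heights = [target((a + b) / 2.0) for a, b in bins]

    def approx(x):
        return sum(bump(x, a, b, h) for (a, b), h in zip(bins, heights))

    for x in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(round(target(x), 3), round(approx(x), 3))
    # At bin centers the two columns match by construction;
    # more, narrower bumps improve the fit everywhere else.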

SLIDE 38

Universality: theoretically we can approximate an arbitrary function. So we can learn a really complex cat classifier. Where is the catch?
  • Complexity: we might need quite a few hidden units
  • Overfitting: we might just memorize the training data

SLIDE 40

Generalizations are possible to:
  • include more input dimensions
  • capture more output dimensions
  • employ multiple layers for more efficient representations
See http://neuralnetworksanddeeplearning.com/chap4.html for a great read!

SLIDE 42

How do we find the parameters to obtain a good approximation? How do we tell a computer to do that? Intuitive explanation:
  • Compute the approximation error at the output
  • Propagate the error back by computing the individual contributions of the parameters to the error

[Fig. from H. Lee]

SLIDE 45

Example for backpropagation of error:
  • Target function: 5x²
  • Approximation: f(x, w)
  • Domain of interest: x ∈ {0, 1, 2, 3}
  • Error: e(w) = Σ_{x∈{0,1,2,3}} (5x² − f(x, w))²
Program of interest: min_w e(w) = min_w Σ_{x∈{0,1,2,3}} ℓ(x, w), where ℓ(x, w) = (5x² − f(x, w))² is the per-example loss.
How to optimize? Gradient descent

SLIDE 47

Gradient descent: min_w e(w)
Algorithm: start with w0, t = 0
1. Compute the gradient gt = ∂e/∂w evaluated at w = wt
2. Update wt+1 = wt − η · gt
3. Set t ← t + 1
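A minimal sketch of this loop in Python for the slide's error e(w) = Σ_{x∈{0,1,2,3}} (5x² − f(x, w))². The slides leave f unspecified, so purely for illustration we assume a one-parameter model f(x, w) = w · x², whose optimum is w = 5:

    xs = [0.0, 1.0, 2.0, 3.0]

    def f(x, w):
        # Assumed model (not from the slides): f(x, w) = w * x^2
        return w * x * x

    def grad_e(w):
        # de/dw = sum_x 2 * (5x^2 - f(x, w)) * (-df/dw), with df/dw = x^2
        return sum(2.0 * (5.0 * x * x - f(x, w)) * (-(x * x)) for x in xs)

    w, eta = 0.0, 0.005          # initial w0 and step size eta
    for t in range(100):
        w = w - eta * grad_e(w)  # wt+1 = wt - eta * gt
    print(w)                     # approaches the optimum w = 5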

SLIDE 52

Chain rule is important to compute gradients:
min_w e(w) = min_w Σ_{x∈{0,1,2,3}} ℓ(x, w), with loss ℓ(x, w) = (5x² − f(x, w))²
Loss function ℓ(x, w) could be:
  • Squared loss
  • Log loss
  • Hinge loss
Derivatives:
∂e(w)/∂w = Σ_{x∈{0,1,2,3}} ∂ℓ(x, w)/∂w = Σ_{x∈{0,1,2,3}} (∂ℓ(x, w)/∂f) · (∂f(x, w)/∂w)

SLIDE 53

Slightly more complex example: a composite function represented as a directed acyclic graph
ℓ(x, w) = f1(w1, f2(w2, f3(. . .)))
[Diagram: chain f1(w1, f2) ← f2(w2, f3) ← f3(. . .); edge gradients ∂f1/∂w1, ∂f1/∂f2, (∂f1/∂f2)·(∂f2/∂w2), (∂f1/∂f2)·(∂f2/∂f3)]

Repeated application of the chain rule gives efficient computation of all gradients.
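A tiny sketch of this idea (ours; f1, f2, f3 below are hypothetical choices, since the slide leaves them abstract): do one forward pass storing intermediates, then reuse ∂f1/∂f2 for every gradient deeper in the chain.

    import math

    # Hypothetical chain: l(x, w) = f1(w1, f2(w2, f3(x)))
    # with f1(w1, a) = w1 * a, f2(w2, b) = sigma(w2 * b), f3(x) = x^2.
    def sigma(z):
        return 1.0 / (1.0 + math.exp(-z))

    def forward_backward(x, w1, w2):
        # Forward pass: compute and store intermediate values
        f3 = x * x
        f2 = sigma(w2 * f3)
        f1 = w1 * f2
        # Backward pass: deeper gradients reuse df1/df2 (chain rule)
        df1_dw1 = f2
        df1_df2 = w1
        df2_dw2 = f2 * (1.0 - f2) * f3   # sigma'(z) = sigma(z) * (1 - sigma(z))
        df1_dw2 = df1_df2 * df2_dw2
        return f1, df1_dw1, df1_dw2

    print(forward_backward(x=1.5, w1=2.0, w2=0.5))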

SLIDE 55

Backpropagation doesn’t work well for deep sigmoid networks:
  • Diffusion of the gradient signal (multiplication of many small numbers)
  • Attraction of many local minima (random initialization is very far from good points)
  • Requires a lot of training samples
  • Needs significant computational power
Solution: a two-step approach
1. Greedy layer-wise pre-training
2. Full fine-tuning at the end

SLIDE 56

Why go deep?
  • Representation efficiency (fewer computational units for the same function)
  • Hierarchical representation (non-local generalization)
  • Combinatorial sharing (re-use of earlier computation)
  • Works very well

[Fig. from H. Lee]

SLIDE 58

To obtain more flexibility/non-linearity we use additional function prototypes:
  • Sigmoid
  • Rectified linear unit (ReLU)
  • Pooling
  • Dropout
  • Convolutions

SLIDE 59

Convolutions: what do the numbers mean? See Sanja’s lecture 14 for the answers...

[Fig. adapted from A. Krizhevsky]

SLIDE 61

[Figure: fconv(image, filter) = filtered image]

SLIDE 62

Max Pooling: what is happening here?

[Fig. adapted from A. Krizhevsky]
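A sketch of both operations (ours, assuming NumPy is available): a "valid" convolution of an image with a small filter, followed by non-overlapping 2 x 2 max pooling, which keeps only the largest response in each block.

    import numpy as np

    def conv2d(image, kernel):
        # 'Valid' cross-correlation, as used in convolution layers
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.empty((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool(fmap, size=2):
        # Keep the maximum of each non-overlapping size x size block
        h, w = fmap.shape[0] // size, fmap.shape[1] // size
        return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

    img = np.arange(36, dtype=float).reshape(6, 6)
    edge = np.array([[1.0, -1.0]])   # toy horizontal-difference (edge) filter
    print(max_pool(conv2d(img, edge)))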

SLIDE 67

Rectified Linear Unit (ReLU)
  • Drops information if smaller than zero
  • Fixes the problem of vanishing gradients to some degree
Dropout
  • Drops information at random
  • A kind of regularization, enforcing redundancy
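Both are one-liners in practice; here is a NumPy sketch (ours; the "inverted" dropout scaling is a common convention we assume, not necessarily the slides'):

    import numpy as np

    def relu(z):
        # Zero out everything below zero
        return np.maximum(z, 0.0)

    def dropout(a, p=0.5, training=True, seed=0):
        # Randomly zero activations with probability p during training;
        # rescale survivors so the expected activation is unchanged
        if not training:
            return a
        mask = np.random.default_rng(seed).random(a.shape) >= p
        return a * mask / (1.0 - p)

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(z))           # [0.  0.  0.  0.5 2. ]
    print(dropout(relu(z)))  # about half the units zeroed, survivors rescaled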

SLIDE 68

A famous deep learning network called “AlexNet.” The network won the ImageNet competition in 2012.
  • How many parameters?
  • Given an image, what is happening?
  • Inference time: about 2 ms per image when processing many images in parallel on the GPU
  • Training time: a few days given a single recent GPU

[Fig. adapted from A. Krizhevsky]

SLIDE 69

Demo

SLIDE 70

Neural networks have been used for many applications:
  • Classification and recognition in computer vision
  • Text parsing in natural language processing
  • Playing video games
  • Stock market prediction
  • Captcha
Demos: Russ’ website, Antonio’s Places website

SLIDE 71

Classification in Computer Vision: ImageNet Challenge

http://deeplearning.cs.toronto.edu/

Since it’s the end of the semester, let’s find the beach...

SLIDE 72

Classification in Computer Vision: ImageNet Challenge

http://deeplearning.cs.toronto.edu/

A place to maybe prepare for exams...

SLIDE 73

Links:
  • Tutorials: http://deeplearning.net/tutorial/deeplearning.pdf
  • Toronto demo by Russ and students: http://deeplearning.cs.toronto.edu/
  • MIT demo by Antonio and students: http://places.csail.mit.edu/demo.html
  • Honglak Lee: http://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf
  • Yann LeCun: http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf
  • Richard Socher: http://lxmls.it.pt/2014/socher-lxmls.pdf

SLIDE 74

Videos:
  • Video games: https://www.youtube.com/watch?v=mARt-xPablE
  • Captcha: http://singularityhub.com/2013/10/29/tiny-ai-startup-vicarious-says-its-solved-captcha/ and https://www.youtube.com/watch?v=lge-dl2JUAM#t=27
  • Stock exchange: http://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Applications/stocks.html
