Lecture 4: Backpropagation and Neural Networks part 1
Fei-Fei Li & Andrej Karpathy & Justin Johnson
13 Jan 2016


SLIDE 1

Lecture 4: Backpropagation and Neural Networks part 1

SLIDE 2

Administrative

A1 is due Jan 20 (Wednesday). ~150 hours left.
Warning: Jan 18 (Monday) is a holiday (no class / office hours).
Also note: lectures are non-exhaustive. Read the course notes for completeness.
I’ll hold make-up office hours on Wed Jan 20, 5pm @ Gates 259.

SLIDE 3

Where we are...

scores function; SVM loss; data loss + regularization
want: the gradient of the loss with respect to the weights W
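For reference, the pieces these labels refer to, written out (these are the standard forms from the earlier lectures; the margin of 1 and the generic regularizer R(W) are the course's usual choices):

s = f(x; W) = W x                               (scores function)
L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)     (SVM data loss)
L = (1/N) Σ_i L_i + λ R(W)                      (data loss + regularization)

want: ∇_W L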

SLIDE 4

Optimization

(image credits to Alec Radford)

SLIDE 5

Gradient Descent

Numerical gradient: slow :(, approximate :(, easy to write :)
Analytic gradient: fast :), exact :), error-prone :(

In practice: derive the analytic gradient, then check your implementation with the numerical gradient.
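A minimal sketch of such a check in numpy (the centered-difference formula and the relative-error comparison are the usual recipe, not something prescribed on the slide):

import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # centered-difference approximation of df/dx, one coordinate at a time
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h; fp = f(x)
        x[i] = old - h; fm = f(x)
        x[i] = old
        grad[i] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# example: the analytic gradient of f(x) = sum(x**2) is 2x; compare the two
x = np.random.randn(3, 4)
analytic = 2 * x
numeric = numerical_gradient(lambda z: np.sum(z ** 2), x)
rel_error = np.abs(analytic - numeric) / np.maximum(1e-8, np.abs(analytic) + np.abs(numeric))
print(rel_error.max())   # should be tiny (around 1e-10 to 1e-7)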

SLIDE 6

Computational Graph

x, W → [*] → s (scores) → [hinge loss] → data loss
W → [R] → regularization loss
data loss + regularization loss → [+] → L

SLIDE 7

Convolutional Network (AlexNet): input image, weights → ... → loss

SLIDE 8

Neural Turing Machine: input tape → ... → loss

SLIDE 9

Neural Turing Machine

SLIDES 10-21

e.g. x = -2, y = 5, z = -4
Want: the gradients ∂f/∂x, ∂f/∂y, ∂f/∂z, computed one node at a time using the chain rule.
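A tiny version of this walkthrough in code, assuming the circuit on these slides is the usual f(x, y, z) = (x + y) * z example from this lecture (the intermediate name q is mine):

# forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass, walking the graph in reverse
df_dz = q            # local gradient of f = q*z w.r.t. z  ->  3
df_dq = z            # -> -4
df_dx = df_dq * 1.0  # chain rule through q = x + y  -> -4
df_dy = df_dq * 1.0  # -> -4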

SLIDES 22-27

A single gate f inside the circuit: in the forward pass it computes its output from its input activations; in the backward pass it multiplies the gradient arriving from above by its “local gradient” to produce the gradients on its inputs.
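In symbols, for a gate z = f(x, y) sitting inside a circuit whose final output is the loss L:

∂L/∂x = (∂z/∂x) · (∂L/∂z)    and    ∂L/∂y = (∂z/∂y) · (∂L/∂z)

i.e. [gradient on an input] = [local gradient] × [gradient flowing in from above].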

SLIDES 28-36

Another example: a larger circuit that ends in a sigmoid, stepped through backward one gate at a time.

SLIDES 37-41

Another example (continued). At each gate, [gradient on an input] = [local gradient] x [its gradient from above]:

(-1) * (-0.20) = 0.20
[1] x [0.2] = 0.2 and [1] x [0.2] = 0.2 (the add gate passes the same gradient to both inputs)
x0: [2] x [0.2] = 0.4,  w0: [-1] x [0.2] = -0.2

SLIDES 42-43

sigmoid function: σ(x) = 1 / (1 + e^(-x)), with dσ/dx = (1 - σ(x)) σ(x)

The chain of gates at the end of this circuit can be collapsed into a single sigmoid gate. Its local gradient at the forward output 0.73 is (0.73) * (1 - 0.73) = 0.2, the same value obtained by backpropagating through the individual gates.
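A quick numeric check of the values on these slides. The input values below (w0 = 2, x0 = -1, w1 = -3, x1 = -2, w2 = -3) are an assumption; they are consistent with the local gradients and the 0.73 forward value quoted above:

import numpy as np

w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0
s = w0*x0 + w1*x1 + w2              # 1.0
sigma = 1.0 / (1.0 + np.exp(-s))    # ~0.73, the forward output

# local gradient of the sigmoid gate: dsigma/ds = (1 - sigma) * sigma
ds = (1 - sigma) * sigma            # ~0.2, matching (0.73) * (1 - 0.73) = 0.2

# gradients on the inputs of the multiply gates, as on the earlier slides
dx0 = w0 * ds                       # [2]  x [0.2] =  0.4
dw0 = x0 * ds                       # [-1] x [0.2] = -0.2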

SLIDE 44

Patterns in backward flow

add gate: gradient distributor (passes the incoming gradient unchanged to all of its inputs)
max gate: gradient router (routes the full gradient to the input that was the max, zero to the others)
mul gate: gradient… “switcher”? (each input receives the incoming gradient scaled by the other input’s value)

SLIDE 45

Gradients add at branches: when a variable feeds into several parts of the circuit, the gradients flowing back along those branches are summed.

SLIDE 46

Implementation: forward/backward API

Graph (or Net) object. (Rough pseudo code)

SLIDE 47

Implementation: forward/backward API

(x, y, z are scalars; a single multiply gate *, with inputs x, y and output z)
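A minimal sketch of a gate that implements this forward/backward API (the class and attribute names are illustrative, not the slide's exact code):

class MultiplyGate(object):
    def forward(self, x, y):
        # compute the output and remember the inputs; they are needed in backward
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # chain rule: gradient on each input = local gradient * upstream gradient dz
        dx = self.y * dz
        dy = self.x * dz
        return dx, dy

# usage: z = x * y, then backprop an upstream gradient of 1.0
gate = MultiplyGate()
z = gate.forward(3.0, -4.0)     # -12.0
dx, dy = gate.backward(1.0)     # dx = -4.0, dy = 3.0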

SLIDE 49

Example: Torch Layers

SLIDE 50

Example: Torch Layers

SLIDE 51

Example: Torch MulConstant: initialization, forward(), backward()
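Torch layers are written in Lua, so here is only a rough Python paraphrase of what a MulConstant-style layer does: the constant is fixed at initialization, forward scales the input by it, and backward scales the incoming gradient by it:

class MulConstant(object):
    def __init__(self, constant):
        self.constant = constant            # set once, at initialization

    def forward(self, x):
        return x * self.constant            # output = input * constant

    def backward(self, grad_output):
        # local gradient of (x * c) w.r.t. x is c, times the upstream gradient
        return grad_output * self.constant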

SLIDE 52

Example: Caffe Layers

SLIDE 53

Caffe Sigmoid Layer

In the backward pass, the local sigmoid gradient is multiplied by top_diff (chain rule).

SLIDE 54

Gradients for vectorized code

(x, y, z are now vectors)
The “local gradient” of a gate f is now a Jacobian matrix: the derivative of each element of z with respect to each element of x.
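Written out, for a node z = f(x) with vector input and output and a scalar loss L at the end of the graph:

∂L/∂x = (∂z/∂x)ᵀ (∂L/∂z),   where ∂z/∂x is the Jacobian with entries (∂z/∂x)_{ij} = ∂z_i/∂x_j.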

SLIDES 55-56

Vectorized operations

f(x) = max(0, x) (elementwise)
4096-d input vector → 4096-d output vector

Q: what is the size of the Jacobian matrix?

SLIDE 57

Vectorized operations

f(x) = max(0, x) (elementwise)
4096-d input vector → 4096-d output vector

Q: what is the size of the Jacobian matrix? [4096 x 4096!]
Q2: what does it look like?

SLIDE 58

Vectorized operations

f(x) = max(0, x) (elementwise)
100 4096-d input vectors → 100 4096-d output vectors

In practice we process an entire minibatch (e.g. 100) of examples at one time, i.e. the Jacobian would technically be a [409,600 x 409,600] matrix :\
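In code that Jacobian is never formed. Because max(0, x) acts elementwise, its Jacobian is diagonal (1 where x > 0, else 0), so the backward pass is just an elementwise mask; a sketch:

import numpy as np

x = np.random.randn(100, 4096)        # a minibatch of 100 4096-d vectors
out = np.maximum(0, x)                # forward: elementwise ReLU

dout = np.random.randn(*out.shape)    # some upstream gradient dL/dout
dx = dout * (x > 0)                   # backward: elementwise mask; no [409,600 x 409,600] matrix needed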

SLIDE 59

Assignment: Writing SVM/Softmax
Stage your forward/backward computation!

E.g. for the SVM: compute the margins as an explicit intermediate stage, then backprop through each stage in turn.
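A sketch of what that staging might look like in numpy (the staging, the variable names, and the assumption that W is D x C with scores = X.dot(W) are mine; regularization is omitted):

import numpy as np

def svm_loss_staged(W, X, y):
    N = X.shape[0]
    # forward pass, staged into named intermediates
    scores = X.dot(W)                                     # stage 1: (N, C)
    correct = scores[np.arange(N), y][:, None]
    margins = np.maximum(0, scores - correct + 1.0)       # stage 2
    margins[np.arange(N), y] = 0
    loss = margins.sum() / N                              # stage 3

    # backward pass, undoing the stages in reverse with the chain rule
    dmargins = np.ones_like(margins) / N
    dscores = (margins > 0) * dmargins
    dscores[np.arange(N), y] -= dscores.sum(axis=1)       # gradient through the -correct term
    dW = X.T.dot(dscores)
    return loss, dW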

SLIDE 60

Summary so far

  • neural nets will be very large: no hope of writing down the gradient formula by hand for all parameters
  • backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
  • implementations maintain a graph structure, where the nodes implement the forward() / backward() API
  • forward: compute the result of an operation and save any intermediates needed for gradient computation in memory
  • backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs

SLIDE 61

SLIDES 62-65

Neural Network: without the brain stuff

(Before) Linear score function: f = W x
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)

x (3072) → W1 → h (100) → W2 → s (10 class scores)

SLIDE 66

Neural Network: without the brain stuff

(Before) Linear score function: f = W x
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))
SLIDE 67

Full implementation of training a 2-layer Neural Network needs ~11 lines:

from @iamtrask, http://iamtrask.github.io/2015/07/12/basic-python-network/
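A sketch in the same spirit (not the exact code from that post): a tiny 2-layer network with sigmoid activations, trained by backprop on a toy dataset with a fixed step size of 1:

import numpy as np

X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]], dtype=float)
y = np.array([[0],[1],[1],[0]], dtype=float)
W1 = 2 * np.random.random((3, 4)) - 1
W2 = 2 * np.random.random((4, 1)) - 1
for _ in range(10000):
    h   = 1 / (1 + np.exp(-X.dot(W1)))     # forward: hidden layer
    out = 1 / (1 + np.exp(-h.dot(W2)))     # forward: output
    dout = (y - out) * out * (1 - out)     # backward through the output sigmoid (squared-error loss)
    dh   = dout.dot(W2.T) * h * (1 - h)    # backward through the hidden sigmoid
    W2  += h.T.dot(dout)                   # parameter updates (step size 1)
    W1  += X.T.dot(dh)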

SLIDE 68

Assignment: Writing a 2-layer Net. Stage your forward/backward computation!

SLIDES 69-73

sigmoid activation function

SLIDES 74-75

Be very careful with your brain analogies. Biological neurons:

  • Many different types
  • Dendrites can perform complex non-linear computations
  • Synapses are not a single weight but a complex non-linear dynamical system
  • Rate code may not be adequate

[Dendritic Computation. London and Hausser]

SLIDE 76

Activation Functions

Sigmoid
tanh: tanh(x)
ReLU: max(0, x)
Leaky ReLU: max(0.1x, x)
Maxout
ELU
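The first few written out as code, for reference (the 0.1 slope for Leaky ReLU is the value on the slide; Maxout and ELU are parameterized differently and are omitted here):

import numpy as np

def sigmoid(x):    return 1.0 / (1.0 + np.exp(-x))
def tanh(x):       return np.tanh(x)
def relu(x):       return np.maximum(0.0, x)
def leaky_relu(x): return np.maximum(0.1 * x, x)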

SLIDE 77

Neural Networks: Architectures

“Fully-connected” layers
“2-layer Neural Net”, or “1-hidden-layer Neural Net”
“3-layer Neural Net”, or “2-hidden-layer Neural Net”

SLIDE 78

Example Feed-forward computation of a Neural Network

We can efficiently evaluate an entire layer of neurons.
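A sketch of that computation for a small 3-layer net with sigmoid activations (the layer sizes here are arbitrary): each layer is evaluated with a single matrix multiply plus a bias, rather than neuron by neuron:

import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))          # activation function (sigmoid)

x = np.random.randn(3, 1)                        # input vector
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

h1  = f(W1.dot(x) + b1)       # first hidden layer: all of its neurons at once
h2  = f(W2.dot(h1) + b2)      # second hidden layer
out = W3.dot(h2) + b3         # output scores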

SLIDE 79

SLIDE 80

Setting the number of layers and their sizes

more neurons = more capacity

SLIDE 81

Do not use the size of the neural network as a regularizer. Use stronger regularization instead (e.g. a larger L2 penalty λ).

(you can play with this demo over at ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)

SLIDE 82

Summary

  • we arrange neurons into fully-connected layers
  • the abstraction of a layer has the nice property that it allows us to use efficient vectorized code (e.g. matrix multiplies)
  • neural networks are not really neural
  • neural networks: bigger = better (but might have to regularize more strongly)

SLIDE 83

Next Lecture: More than you ever wanted to know about Neural Networks and how to train them.

SLIDE 84

For a complex graph with inputs x and outputs y:

reverse-mode differentiation: if you want the effect of many things on one thing
forward-mode differentiation: if you want the effect of one thing on many things