Introduction to Neural Networks: I2DL, Prof. Niessner, Prof. Leal-Taixé (PowerPoint PPT Presentation)



SLIDE 1

Introduction to Neural Networks

SLIDE 2

Lecture 2 Recap

SLIDE 3

Linear Regression


= a supervised learning method to find a linear model of the form

Goal: find a model that explains a target y given the input x:

\hat{y}_i = \theta_0 + \sum_{k=1}^{d} x_{ik}\theta_k = \theta_0 + x_{i1}\theta_1 + x_{i2}\theta_2 + \dots + x_{id}\theta_d

where \theta_0 is the bias term.
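A minimal NumPy sketch of this prediction (the array sizes and parameter values are illustrative, not from the slides):

    import numpy as np

    # Linear model: y_hat_i = theta_0 + sum_k x_ik * theta_k
    n, d = 100, 3                          # n samples, d features (illustrative)
    X = np.random.randn(n, d)              # inputs x_i
    theta_0 = 0.5                          # bias term
    theta = np.array([1.0, -2.0, 0.3])     # weights theta_1..theta_d

    y_hat = theta_0 + X @ theta            # predictions, shape (n,)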

SLIDE 4

Logistic Regression

  • Loss function
  • Cost function


Minimization

Loss: \mathcal{L}(y_i, \hat{y}_i) = -[y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i)]

Cost: C(\theta) = -\sum_{i=1}^{n} [y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i)]

where \hat{y}_i = \sigma(x_i \theta)
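A minimal NumPy sketch of this cost (the data and parameters are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.random.randn(5, 2)              # 5 samples, 2 features (illustrative)
    theta = np.array([0.7, -1.2])
    y = np.array([1, 0, 1, 1, 0])          # binary targets

    y_hat = sigmoid(x @ theta)             # predicted probabilities, in (0, 1)
    cost = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))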

SLIDE 5

Linear vs Logistic Regression


Logistic regression (y=1 / y=0): predictions are guaranteed to be within [0,1]. Linear regression: predictions can exceed the range of the training samples.

→ in the case of classification into [0,1] this becomes a real issue

SLIDE 6

How to Obtain the Model?


[Diagram: data points x, labels (ground truth) y, model parameters θ; estimation ŷ, loss function, optimization.]

SLIDE 7

Linear Score Functions

  • Linear score function as seen in linear regression

s_j = \sum_k W_{j,k} \, x_k, \qquad \mathbf{s} = \mathbf{W}\mathbf{x} \quad (matrix notation)
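A minimal NumPy sketch of the score function s = Wx, using CIFAR-10-like shapes (10 classes, 32 × 32 × 3 = 3072 inputs; the random values are illustrative):

    import numpy as np

    W = np.random.randn(10, 3072) * 0.01   # weight matrix, one row per class
    x = np.random.rand(3072)               # one flattened input image
    s = W @ x                              # 10 class scores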

SLIDE 8

Linear Score Functions on Images

  • Linear score function s = Wx

[Figures: on CIFAR-10; on ImageNet.]


Source: Li/Karpathy/Johnson

SLIDE 9

Linear Score Functions?


[Figure: logistic regression on data where linear separation is impossible!]

SLIDE 10

Linear Score Functions?

  • Can we make linear regression better?

– Multiply with another weight matrix \mathbf{W}_2:

\hat{\mathbf{s}} = \mathbf{W}_2 \cdot \mathbf{s} = \mathbf{W}_2 \cdot \mathbf{W} \cdot \mathbf{x}

  • The operation is still linear:

\widehat{\mathbf{W}} = \mathbf{W}_2 \cdot \mathbf{W}, \qquad \hat{\mathbf{s}} = \widehat{\mathbf{W}}\mathbf{x}

  • Solution → add non-linearity!

SLIDE 11

Neural Network

  • Linear score function s = Wx
  • Neural network is a nesting of ‘functions’

– 2-layers: s = W_2 \max(0, W_1 x)
– 3-layers: s = W_3 \max(0, W_2 \max(0, W_1 x))
– 4-layers: s = W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x)))
– 5-layers: s = W_5 \sigma(W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x))))
– … up to hundreds of layers
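A minimal NumPy sketch of the 2-layer case s = W_2 max(0, W_1 x), using the layer sizes from the later slides (16384 → 1000 → 10; the random values are illustrative):

    import numpy as np

    x  = np.random.rand(16384)                # input (128 x 128 image, flattened)
    W1 = np.random.randn(1000, 16384) * 0.01  # first-layer weights
    W2 = np.random.randn(10, 1000) * 0.01     # second-layer weights

    h = np.maximum(0, W1 @ x)                 # hidden layer with max(0, .) non-linearity
    s = W2 @ h                                # output scores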

SLIDE 12

Introduction to Neural Networks

SLIDE 13

History of Neural Networks


Source: http://beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html

SLIDE 14

Neural Network


[Figure: logistic regression vs. neural networks.]

SLIDE 15

Neural Network

  • Non-linear score function s = … \max(0, W_1 x)


[Figure: on CIFAR-10, visualizing the activations of the first layer.]

Source: ConvNetJS

SLIDE 16

Neural Network

1-layer network: s = Wx
(input x: 128 × 128 = 16384, output s: 10)

Why is this structure useful?

2-layer network: s = W_2 \max(0, W_1 x)
(input x: 128 × 128 = 16384, hidden layer: 1000, output s: 10)

SLIDE 17

Neural Network


2-layer network: s = W_2 \max(0, W_1 x)
(input layer x: 128 × 128 = 16384, hidden layer: 1000, output layer s: 10)

SLIDE 18

Net of Artificial Neurons

[Diagram: inputs x_1, x_2, x_3 feed a net of artificial neurons; each neuron computes f(W_{i,j} x + b_{i,j}), from f(W_{0,0} x + b_{0,0}) through f(W_{2,0} x + b_{2,0}).]

SLIDE 19

Neural Network


Source: https://towardsdatascience.com/training-deep-neural-networks-9fdb1964b964

SLIDE 20

Activation Functions

Sigmoid: \sigma(x) = \frac{1}{1 + e^{-x}}
tanh: \tanh(x)
ReLU: \max(0, x)
Leaky ReLU: \max(0.1x, x)
Maxout: \max(w_1^T x + b_1, \; w_2^T x + b_2)
ELU: f(x) = x \text{ if } x > 0; \; \alpha(e^{x} - 1) \text{ if } x \le 0
Parametric ReLU: \max(\alpha x, x)
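Minimal NumPy sketches of these activation functions (the default alpha values are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        return np.maximum(0.0, x)

    def leaky_relu(x):
        return np.maximum(0.1 * x, x)

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    def prelu(x, alpha=0.25):          # alpha is learned in practice
        return np.maximum(alpha * x, x)

    # tanh is available directly as np.tanh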

SLIDE 21

Neural Network

s = W_3 \cdot (W_2 \cdot (W_1 \cdot x))


Why activation functions? Simply concatenating linear layers would be so much cheaper...
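A minimal NumPy sketch of why: without non-linearities, stacked linear layers collapse into a single matrix (the sizes are illustrative):

    import numpy as np

    x  = np.random.rand(256)
    W1 = np.random.randn(64, 256)
    W2 = np.random.randn(32, 64)
    W3 = np.random.randn(10, 32)

    s_deep  = W3 @ (W2 @ (W1 @ x))      # "3-layer" purely linear network
    W_tilde = W3 @ W2 @ W1              # one equivalent matrix
    s_flat  = W_tilde @ x
    print(np.allclose(s_deep, s_flat))  # True: the same function, one layer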

SLIDE 22

Neural Network


Why organize a neural network into layers?

SLIDE 23

Biological Neurons


Credit: Stanford CS 231n

SLIDE 24

Biological Neurons


Credit: Stanford CS 231n

SLIDE 25

Artificial Neural Networks vs Brain

Artificial neural networks are inspired by the brain, but they are not even close in terms of complexity! The comparison is great for the media and news articles, however...

SLIDE 26

Artificial Neural Network

[Diagram: inputs x_1, x_2, x_3 feed a net of artificial neurons; each neuron computes f(W_{i,j} x + b_{i,j}), from f(W_{0,0} x + b_{0,0}) through f(W_{2,0} x + b_{2,0}).]

SLIDE 27

Neural Network

  • Summary

– Given a dataset with ground truth training pairs [x_i; y_i],
– Find optimal weights W using stochastic gradient descent, such that the loss function is minimized

  • Compute gradients with backpropagation (use batch mode; more later)

  • Iterate many times over training set (SGD; more later)

SLIDE 28

Computational Graphs

SLIDE 29

Computational Graphs

  • Directional graph
  • Matrix operations are represented as compute nodes.
  • Vertex nodes are variables or operators like +, -, *, /, log(), exp() …
  • Directional edges show the flow of inputs to vertices

SLIDE 30

Computational Graphs

  • f(x, y, z) = (x + y) \cdot z

[Graph: a sum node computes x + y; a mult node multiplies the result by z to give f(x, y, z).]

SLIDE 31

Evaluation: Forward Pass

  • f(x, y, z) = (x + y) \cdot z

Initialization: x = 1, y = -3, z = 4
sum: d = x + y = -2
mult: f = d \cdot z = -8
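The same forward pass as a minimal Python sketch:

    # f(x, y, z) = (x + y) * z
    x, y, z = 1.0, -3.0, 4.0
    d = x + y          # sum node:  d = -2
    f = d * z          # mult node: f = -8
    print(d, f)        # -2.0 -8.0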
SLIDE 32

Computational Graphs

  • Why discuss compute graphs?
  • Neural networks have complicated architectures

s = W_5 \sigma(W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x))))

  • Lots of matrix operations!
  • Represent NNs as computational graphs!

SLIDE 33

Computational Graphs

A neural network can be represented as a computational graph...

– it has compute nodes (operations)
– it has edges that connect nodes (data flow)
– it is directional
– it can be organized into ‘layers’

SLIDE 34

Computational Graphs


z_l^{(2)} = \sum_j x_j \, w_{jl}^{(2)} + b_l^{(2)}

a_l^{(2)} = f(z_l^{(2)})

z_l^{(3)} = \sum_j a_j^{(2)} \, w_{jl}^{(3)} + b_l^{(3)}

[Diagram: inputs x_1, x_2, x_3 (plus a bias unit +1) feed hidden units z_1^{(2)}, z_2^{(2)}, z_3^{(2)} through weights w_{jl}^{(2)} and biases b_l^{(2)}; the activations a_l^{(2)} = f(z_l^{(2)}) feed the next layer z_1^{(3)}, z_2^{(3)} through weights w_{jl}^{(3)} and biases b_l^{(3)}.]
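A minimal NumPy sketch of these layer equations (the layer sizes and random values are illustrative):

    import numpy as np

    f = np.tanh                      # some activation function f

    x  = np.random.rand(3)           # inputs x_1..x_3
    W2 = np.random.randn(3, 3)       # weights w_jl^(2), shape (inputs, units)
    b2 = np.random.randn(3)          # biases b_l^(2)
    W3 = np.random.randn(3, 2)       # weights w_jl^(3)
    b3 = np.random.randn(2)          # biases b_l^(3)

    z2 = x @ W2 + b2                 # z_l^(2) = sum_j x_j w_jl^(2) + b_l^(2)
    a2 = f(z2)                       # a_l^(2) = f(z_l^(2))
    z3 = a2 @ W3 + b3                # z_l^(3) = sum_j a_j^(2) w_jl^(3) + b_l^(3)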

SLIDE 35

Computational Graphs


  • From a set of neurons to a Structured Compute Pipeline

[Szegedy et al., CVPR’15] Going Deeper with Convolutions

SLIDE 36

Computational Graphs

  • The computations of neural networks have further meanings:

– The multiplication of W_i and x: encodes the input information
– The activation function: selects the key features


Source: https://www.zybuluo.com/liuhui0803/note/981434

SLIDE 37

Computational Graphs

  • The computations of neural networks have further meanings:

– The convolutional layers: extract useful features with shared weights


Source: https://www.zcfy.cc/original/understanding-convolutions-colah-s-blog

SLIDE 38

Computational Graphs

  • The computations of neural networks have further meanings:

– The convolutional layers: extract useful features with shared weights


Source: https://www.zybuluo.com/liuhui0803/note/981434

SLIDE 39

Loss Functions

SLIDE 40

What’s Next?


[Diagram: Inputs → Neural Network → Outputs vs. Targets. Are these reasonably close?]

We need a way to describe how close the network's outputs (= predictions) are to the targets!
SLIDE 41


Idea: calculate a ‘distance’ between prediction and target!

[Figure: a large distance between prediction and target means a bad prediction; a small distance means a good prediction.]

What’s Next?

SLIDE 42

Loss Functions

  • A function to measure the goodness of the predictions (or equivalently, the network's performance). Intuitively, ...

– a large loss indicates bad predictions/performance (→ performance needs to be improved by training the model)
– the choice of the loss function depends on the concrete problem or the distribution of the target variable

SLIDE 43

Regression Loss

  • L1 Loss:

L(\mathbf{y}, \hat{\mathbf{y}}; \theta) = \frac{1}{n} \sum_{i}^{n} \|y_i - \hat{y}_i\|_1

  • MSE Loss:

L(\mathbf{y}, \hat{\mathbf{y}}; \theta) = \frac{1}{n} \sum_{i}^{n} \|y_i - \hat{y}_i\|_2^2
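Minimal NumPy sketches of both regression losses (the values are illustrative):

    import numpy as np

    def l1_loss(y, y_hat):
        return np.mean(np.abs(y - y_hat))

    def mse_loss(y, y_hat):
        return np.mean((y - y_hat) ** 2)

    y     = np.array([1.0, 2.0, 3.0])
    y_hat = np.array([1.1, 1.8, 3.5])
    print(l1_loss(y, y_hat), mse_loss(y, y_hat))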

SLIDE 44

Binary Cross-Entropy

  • Loss function for binary (yes/no) classification

[Figure: Yes! (0.8) vs. No! (0.2).] The network predicts the probability of the input belonging to the "yes" class!

L(\mathbf{y}, \hat{\mathbf{y}}; \theta) = -\sum_{i=1}^{n} [y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i)]

SLIDE 45

Cross-Entropy

= loss function for multi-class classification

[Figure: dog (0.1), rabbit (0.2), duck (0.7), …]

L(\mathbf{y}, \hat{\mathbf{y}}; \theta) = -\sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \cdot \log \hat{y}_{ik}

This generalizes the binary case from the slide before!
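A minimal NumPy sketch of the multi-class cross-entropy, using the class probabilities from the figure above (y is one-hot encoded):

    import numpy as np

    y     = np.array([[0, 0, 1]])        # one sample, true class: duck
    y_hat = np.array([[0.1, 0.2, 0.7]])  # predicted: dog, rabbit, duck
    loss  = -np.sum(y * np.log(y_hat))
    print(loss)                          # -log(0.7), roughly 0.357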

SLIDE 46

More General Case

  • Ground truth: y
  • Prediction: ŷ
  • Loss function: L(y, ŷ)
  • Motivation:

– minimize the loss <=> find better predictions
– predictions are generated by the NN
– find better predictions <=> find better NN

SLIDE 47


Initially (t1)

[Figure: prediction vs. targets: bad prediction! Loss plotted over training time.]

SLIDE 48


During Training… (t2)

[Figure: prediction vs. targets: bad prediction! Loss plotted over training time.]

SLIDE 49


During Training… (t3)

[Figure: prediction vs. targets: bad prediction! Loss plotted over training time.]

SLIDE 50


Training Curve

[Figure: loss plotted over training time.]

SLIDE 51


How to Find a Better NN?

Plotting loss curves against model parameters:

[Figure: loss L(\mathbf{y}, f_\theta(\mathbf{x})) plotted against the model parameters \theta.]

SLIDE 52

How to Find a Better NN?

Optimization! We train compute graphs with some optimization techniques!

  • Loss function: L(\mathbf{y}, \hat{\mathbf{y}}) = L(\mathbf{y}, f_\theta(\mathbf{x}))
  • Neural network: f_\theta(\mathbf{x})
  • Goal:

– minimize the loss w.r.t. \theta

SLIDE 53

How to Find a Better NN?

Gradient Descent

[Figure: loss L(\mathbf{y}, f_\theta(\mathbf{x})) plotted against the parameters \theta.]

  • Minimize L(\mathbf{y}, f_\theta(\mathbf{x})) w.r.t. \theta
  • In the context of NNs, we use gradient-based optimization
SLIDE 54

How to Find a Better NN?

  • Minimize L(\mathbf{y}, f_\theta(\mathbf{x})) w.r.t. \theta

\theta^* = \arg\min_\theta \, L(\mathbf{y}, f_\theta(\mathbf{x}))

Gradient: \nabla_\theta L(\mathbf{y}, f_\theta(\mathbf{x}))

Update step: \theta \leftarrow \theta - \alpha \, \nabla_\theta L(\mathbf{y}, f_\theta(\mathbf{x})), where \alpha is the learning rate.
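A minimal sketch of this update rule on a toy quadratic loss (the loss, starting point, and learning rate are illustrative, not from the slides):

    import numpy as np

    t     = np.array([3.0, -1.0])     # minimum of the toy loss ||theta - t||^2
    theta = np.zeros(2)               # initial parameters
    alpha = 0.1                       # learning rate

    for _ in range(100):
        grad = 2.0 * (theta - t)      # gradient of the toy loss w.r.t. theta
        theta = theta - alpha * grad  # gradient descent step
    print(theta)                      # converges towards t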

SLIDE 55

How to Find a Better NN?

  • Given inputs x and targets y
  • Given a one-layer NN with no activation function:

f_\theta(\mathbf{x}) = \mathbf{W}\mathbf{x}, \quad \theta = \mathbf{W} \quad (\text{later: } \theta = \{\mathbf{W}, \mathbf{b}\})

  • Given the MSE loss:

L(\mathbf{y}, \hat{\mathbf{y}}; \theta) = \frac{1}{n} \sum_{i}^{n} \|y_i - \hat{y}_i\|_2^2

SLIDE 56

How to Find a Better NN?

  • Given inputs x and targets y
  • Given a one-layer NN with no activation function
  • Given the MSE loss:

L(\mathbf{y}, \hat{\mathbf{y}}; \theta) = \frac{1}{n} \sum_{i}^{n} \|y_i - \mathbf{W} x_i\|_2^2

[Compute graph: x → multiply by W → \hat{y} → loss L against y, with the gradient flowing backwards.]

SLIDE 57

How to Find a Better NN?

  • Given inputs x and targets y
  • Given a one-layer NN with no activation function:

f_\theta(\mathbf{x}) = \mathbf{W}\mathbf{x}, \quad \theta = \mathbf{W}

  • Given the MSE loss:

L(\mathbf{y}, \hat{\mathbf{y}}; \theta) = \frac{1}{n} \sum_{i}^{n} \|\mathbf{W} x_i - y_i\|_2^2

  • Gradient: \nabla_\theta L(\mathbf{y}, f_\theta(\mathbf{x})) = \frac{2}{n} \sum_{i}^{n} (\mathbf{W} x_i - y_i) \cdot x_i^T
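A minimal NumPy sketch that checks this analytic gradient against a finite-difference approximation (toy sizes, illustrative data):

    import numpy as np

    n, d_in, d_out = 8, 3, 2
    X = np.random.randn(n, d_in)            # inputs x_i as rows
    Y = np.random.randn(n, d_out)           # targets y_i as rows
    W = np.random.randn(d_out, d_in)

    def loss(W):
        R = X @ W.T - Y                     # residuals W x_i - y_i
        return np.mean(np.sum(R ** 2, axis=1))

    grad = (2.0 / n) * (X @ W.T - Y).T @ X  # (2/n) sum_i (W x_i - y_i) x_i^T

    eps = 1e-6                              # finite-difference check, one entry
    W2 = W.copy(); W2[0, 0] += eps
    print(grad[0, 0], (loss(W2) - loss(W)) / eps)  # should roughly match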

SLIDE 58

How to Find a Better NN?

  • Given inputs x and targets y
  • Given a multi-layer NN with many activations:

s = W_5 \sigma(W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x))))

  • Gradient descent for L(\mathbf{y}, f_\theta(\mathbf{x})) w.r.t. \theta:

– Need to propagate gradients from the end back to the first layer (W_1).

SLIDE 59

How to Find a Better NN?

  • Given inputs x and targets y
  • Given a multi-layer NN with many activations

[Compute graph: x → multiply by W_1 → max(0, ·) → multiply by W_2 → \hat{y} → loss L against y, with the gradient flowing backwards.]

SLIDE 60

How to Find a Better NN?

  • Given inputs x and targets y
  • Given a multi-layer NN with many activations:

s = W_5 \sigma(W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x))))

  • Gradient descent solution for L(\mathbf{y}, f_\theta(\mathbf{x})) w.r.t. \theta:

– Need to propagate gradients from the end back to the first layer (W_1)

  • Backpropagation: use the chain rule to compute gradients

– Compute graphs come in handy, as sketched below!
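A minimal NumPy sketch of backpropagation through the simpler 2-layer net s = W_2 max(0, W_1 x) with an MSE-style loss (toy sizes, illustrative values):

    import numpy as np

    x  = np.random.randn(4)
    y  = np.random.randn(2)
    W1 = np.random.randn(3, 4) * 0.1
    W2 = np.random.randn(2, 3) * 0.1

    # forward pass through the compute graph
    h_pre = W1 @ x                   # first multiply node
    h     = np.maximum(0, h_pre)     # max(0, .) node
    s     = W2 @ h                   # second multiply node
    loss  = np.sum((s - y) ** 2)     # loss node

    # backward pass: chain rule, node by node
    ds     = 2.0 * (s - y)           # dloss/ds
    dW2    = np.outer(ds, h)         # dloss/dW2
    dh     = W2.T @ ds               # dloss/dh
    dh_pre = dh * (h_pre > 0)        # gradient through max(0, .)
    dW1    = np.outer(dh_pre, x)     # dloss/dW1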

SLIDE 61

How to Find a Better NN?

  • Why gradient descent?

– Easy to compute using compute graphs

  • Other methods include

– Newton's method
– L-BFGS
– Adaptive moments
– Conjugate gradient

SLIDE 62

Summary

  • Neural Networks are computational graphs
  • Goal: for a given training set, find optimal weights
  • Optimization is done using gradient-based solvers

– Many options (more in the next lectures)

  • Gradients are computed via backpropagation

– Nice because we can easily modularize complex functions

SLIDE 63

Next Lectures

  • Next Lecture:

– Backpropagation and optimization of neural networks

  • Check for updates on website/moodle regarding exercises

SLIDE 64

See you next week 

SLIDE 65

Further Reading

  • Optimization:

– http://cs231n.github.io/optimization-1/
– http://www.deeplearningbook.org/contents/optimization.html

  • General concepts:

– Pattern Recognition and Machine Learning, C. Bishop
– http://www.deeplearningbook.org/
