

SLIDE 1

Lecture 10 Recap

  • Prof. Leal-Taixé and Prof. Niessner

SLIDE 2

LeNet

  • Digit recognition: 10 classes
  • Conv -> Pool -> Conv -> Pool -> Conv -> FC
  • As we go deeper: width and height shrink, the number of filters grows
  • ~60k parameters (see the sketch below)
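
Below is a minimal sketch of this pattern in PyTorch, not the slide's own code: the layer sizes follow the classic LeNet-5 description and are assumptions here, but the parameter count does come out to roughly 60k.

```python
# Minimal LeNet-style sketch: Conv -> Pool -> Conv -> Pool -> Conv -> FC.
# Filter counts follow the classic LeNet-5 description (an assumption),
# input is a 1x32x32 grayscale digit image.
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),    # 32x32x1 -> 28x28x6
            nn.AvgPool2d(2),                              # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # -> 10x10x16
            nn.AvgPool2d(2),                              # -> 5x5x16
            nn.Conv2d(16, 120, kernel_size=5), nn.Tanh()  # -> 1x1x120
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes)                    # 10 digit classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(sum(p.numel() for p in LeNet().parameters()))      # ~62k parameters
```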

SLIDE 3

AlexNet

[Krizhevsky et al. 2012]

  • Softmax for 1000 classes
SLIDE 4

VGGNet

[Simonyan and Zisserman 2014]

  • Striving for simplicity
  • CONV = 3x3 filters with stride 1, same convolutions
  • MAXPOOL = 2x2 filters with stride 2

SLIDE 5

VGGNet

[Figure: VGG architecture; Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2]

SLIDE 6

VGGNet

  • Conv -> Pool -> Conv -> Pool -> Conv -> FC
  • As we go deeper: width and height shrink, the number of filters grows
  • Called VGG-16: 16 layers that have weights
  • Large (138M parameters), but its simplicity makes it appealing (see the stage sketch below)
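
A sketch of the repeated VGG stage in PyTorch: stacks of same-3x3 convolutions followed by a 2x2 max pool. The channel progression (3 -> 64 -> 128) follows VGG-16; the helper name `vgg_stage` and showing only the first two stages are choices made here for illustration.

```python
# One VGG "stage": same-3x3 convolutions (stride 1, padding 1) + ReLU,
# closed by a 2x2 max pool with stride 2 that halves H and W.
import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),  # "same" conv
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves width/height
    return nn.Sequential(*layers)

stem = nn.Sequential(vgg_stage(3, 64, 2),    # 224x224x3  -> 112x112x64
                     vgg_stage(64, 128, 2))  # 112x112x64 -> 56x56x128
```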

SLIDE 7

The problem of depth

  • As we add more and more layers, training becomes harder
  • Vanishing and exploding gradients
  • How can we train very deep nets?

SLIDE 8

Residual block

  • Two layers (input, linear, non-linearity):

x_L = f(W_L x_{L-1} + b_L)
x_{L+1} = f(W_{L+1} x_L + b_{L+1})

SLIDE 9

Residual block

  • Two layers

[Figure: residual block with a main path (two linear layers) from x_{L-1} through x_L to x_{L+1}, plus a skip connection from the input x_{L-1}]

SLIDE 10

Residual block

  • Two layers
  • Plain: x_{L+1} = f(W_{L+1} x_L + b_{L+1})
  • Residual: x_{L+1} = f(W_{L+1} x_L + b_{L+1} + x_{L-1})

SLIDE 11

Residual block

  • Two layers
  • Usually use a same convolution, since we need the same dimensions
  • Otherwise we need to convert the dimensions with a matrix of learned weights or with zero padding (see the sketch below)

[Figure: skip connection from x_{L-1} added (+) before x_{L+1}]
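
A minimal PyTorch sketch of the block just described: two same-3x3 convolutions on the main path, the skip connection added before the final non-linearity, and a learned 1x1 projection standing in for the "matrix of learned weights" when channel counts differ. The class name and layer sizes are illustrative assumptions.

```python
# Residual block sketch: out = f(conv(f(conv(x))) + skip(x)).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # same conv
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # Identity skip if dimensions already match, otherwise a learned
        # 1x1 projection converts the channel count.
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1))

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.skip(x))  # add skip before non-linearity
```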

SLIDE 12

Why do ResNets work?

  • The identity is easy for the residual block to learn
  • Guaranteed it will not hurt performance; it can only improve

[Figure: residual block as a small NN with a skip connection from x_{L-1} added (+) before x_{L+1}]

SLIDE 13

1x1 convolution

[Figure: a 5x5 image convolved with a 1x1 kernel of value 2]

What is the output size?

SLIDE 14

1x1 convolution

[Figure: the 1x1 kernel (value 2) applied to one entry of the 5x5 image: 5 * 2 = 10]
SLIDE 15

1x1 convolution

[Figure: the 1x1 kernel (value 2) applied entry-wise: 1 * 2 = 2; the full 5x5 output is the input with every value doubled]
SLIDE 16

1x1 convolution

[Figure: 5x5 input image and the resulting 5x5 output with every value doubled]

  • For one kernel (filter), it keeps the spatial dimensions and just scales the input by a number

SLIDE 17

Using 1x1 convolutions

  • Use it to shrink the number of channels
  • Further adds a non-linearity → one can learn more complex functions (see the sketch below)

[Figure: 32x32x200 input → Conv 1x1x200 (32 filters) + ReLU → 32x32x32 output]
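
A quick sketch of this channel reduction in PyTorch; the tensor sizes match the slide (200 channels shrunk to 32 at 32x32 resolution), everything else is illustrative.

```python
# Shrinking channels with a 1x1 convolution + ReLU:
# spatial size unchanged, channels 200 -> 32.
import torch
import torch.nn as nn

x = torch.randn(1, 200, 32, 32)                        # NCHW input
shrink = nn.Sequential(nn.Conv2d(200, 32, kernel_size=1), nn.ReLU())
print(shrink(x).shape)                                 # torch.Size([1, 32, 32, 32])
```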

SLIDE 18

Inception layer

  • Tired of choosing filter sizes?
  • Use them all!
  • All same convolutions
  • 3x3 max pooling with stride 1
SLIDE 19

Inception layer: computational cost

[Figure: 32x32x200 input → Conv 1x1 (16 filters) + ReLU → 32x32x16 → Conv 5x5 (92 filters) + ReLU → 32x32x92]

Multiplications: 1·1·200·32·32·16 + 5·5·16·32·32·92 ≈ 40 million

Reduction of multiplications by ~1/10 compared to applying the 5x5 convolution directly (worked out below)
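
The arithmetic behind these numbers, spelled out in Python. The ~471M baseline for the direct 5x5 convolution is computed here rather than taken from the slide; each output value of a KxK convolution over C_in channels costs K·K·C_in multiplications.

```python
# Cost of the direct 5x5 convolution vs. the 1x1 bottleneck version.
direct = 5 * 5 * 200 * 32 * 32 * 92           # 5x5 conv straight on 200 channels
bottleneck = (1 * 1 * 200 * 32 * 32 * 16      # 1x1 conv down to 16 channels
              + 5 * 5 * 16 * 32 * 32 * 92)    # 5x5 conv up to 92 channels
print(f"{direct:,} vs {bottleneck:,}")        # 471,040,000 vs 40,960,000, ~1/10
```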

SLIDE 20

Inception layer

SLIDE 21

Semantic Segmentation (FCN)

[Long et al. 2015] Fully Convolutional Networks for Semantic Segmentation (FCN)

SLIDE 22

Transfer learning

[Figure: network trained on ImageNet; early layers FROZEN, last layers re-TRAINed on a new dataset with C classes]

Donahue 2014, Razavian 2014
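
A minimal transfer-learning sketch in PyTorch following the figure: freeze the pretrained feature extractor and train only a new head. The choice of torchvision's resnet18 as the ImageNet-pretrained backbone is an assumption for illustration.

```python
# Transfer learning: frozen ImageNet backbone, new trainable head for C classes.
import torch.nn as nn
from torchvision import models

C = 10                                           # classes in the new dataset
model = models.resnet18(pretrained=True)         # trained on ImageNet
for p in model.parameters():
    p.requires_grad = False                      # FROZEN feature extractor
model.fc = nn.Linear(model.fc.in_features, C)    # new last layer: TRAINed
```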

SLIDE 23

Now you are:

  • Ready to perform image classification on any dataset
  • Ready to design your own architecture
  • Ready to deal with other problems, such as semantic segmentation (Fully Convolutional Networks)

SLIDE 24

Recurrent Neural Networks

SLIDE 25

RNNs are flexible

Classic neural networks for image classification

SLIDE 26

RNNs are flexible

Image captioning

SLIDE 27

RNNs are flexible

Language recognition

SLIDE 28

RNNs are flexible

Machine translation

SLIDE 29

RNNs are flexible

Event classification

SLIDE 30

Basic structure of an RNN

  • Multi-layer RNN

[Figure: inputs, hidden states, and outputs of a multi-layer RNN]

SLIDE 31

Basic structure of an RNN

  • Multi-layer RNN
  • The hidden state will have its own internal dynamics → a more expressive model!

[Figure: inputs, hidden states, and outputs of a multi-layer RNN]

SLIDE 32

Basic structure of an RNN

  • We want to have a notion of "time" or "sequence"

[Figure: the hidden state is computed from the current input and the previous hidden state]

[Christopher Olah] Understanding LSTMs

SLIDE 33

Basic structure of an RNN

  • We want to have a notion of "time" or "sequence"

[Figure: hidden-state update, with the parameters to be learned highlighted]

SLIDE 34

Basic structure of an RNN

  • We want to have a notion of "time" or "sequence"

[Figure: hidden state and output; note: non-linearities ignored for now]

SLIDE 35

Basic structure of an RNN

  • We want to have a notion of "time" or "sequence"
  • Same parameters for each time step = generalization!

[Figure: hidden state and output]

SLIDE 36

Basic structure of an RNN

  • Unrolling RNNs: the hidden state is the same

[Christopher Olah] Understanding LSTMs

SLIDE 37

Basic structure of an RNN

  • Unrolling RNNs

[Christopher Olah] Understanding LSTMs

SLIDE 38

Basic structure of an RNN

  • Unrolling RNNs as feedforward nets: the weights are the same! (see the sketch below)

[Figure: unrolled RNN as a feedforward net; the same weights w1, w2, w3, w4 are reused at time steps t, t+1, t+2 for the inputs x¹ₜ, x²ₜ, ...]
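
A small NumPy sketch of this unrolling, with the non-linearity omitted as on the earlier slides. The names theta_c, theta_x, theta_h and all dimensions are illustrative assumptions; the point is that the same three matrices are reused at every time step.

```python
# Vanilla RNN unrolled over a sequence: same parameters at each step.
import numpy as np

rng = np.random.default_rng(0)
theta_c = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden weights
theta_x = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights
theta_h = rng.normal(size=(2, 4)) * 0.1   # hidden-to-output weights

A = np.zeros(4)                           # initial hidden state A_0
for x_t in rng.normal(size=(5, 3)):       # a sequence of 5 input vectors
    A = theta_c @ A + theta_x @ x_t       # same weights at every time step
    h = theta_h @ A                       # output at this time step
```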

SLIDE 39

Backprop through an RNN

  • Unrolling RNNs as feedforward nets
  • Chain rule, all the way back to t = 0
  • Add the derivatives at different time steps for each weight

SLIDE 40

Long-term dependencies

"I moved to Germany … so I speak German fluently"

SLIDE 41

Long-term dependencies

  • Simple recurrence: let us forget the input

A_t = θ^t A_0

The same weights are multiplied over and over again

SLIDE 42

Long-term dependencies

  • Simple recurrence: A_t = θ^t A_0
  • What happens to small weights? Vanishing gradient
  • What happens to large weights? Exploding gradient

SLIDE 43

Long-term dependencies

  • Simple recurrence: A_t = θ^t A_0
  • If θ admits an eigendecomposition θ = Q Λ Q^⊤: the diagonal of Λ holds the eigenvalues, and Q is the matrix of eigenvectors

SLIDE 44

Long-term dependencies

  • Simple recurrence: A_t = θ^t A_0
  • If θ admits an eigendecomposition with orthogonal Q, we can simplify the recurrence:

A_t = Q Λ^t Q^⊤ A_0

SLIDE 45

Long-term dependencies

  • Simple recurrence: A_t = Q Λ^t Q^⊤ A_0
  • What happens to eigenvalues with magnitude less than one? Vanishing gradient
  • What happens to eigenvalues with magnitude larger than one? Exploding gradient → gradient clipping (numeric illustration below)
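
A tiny NumPy illustration of this effect; the eigenvalues 0.9 and 1.1 are values chosen here for illustration.

```python
# A_t = theta^t A_0: eigenvalues below 1 vanish, above 1 explode as t grows.
import numpy as np

A0 = np.ones(2)
theta = np.diag([0.9, 1.1])                  # eigenvalues 0.9 and 1.1
for t in (10, 50, 100):
    print(t, np.linalg.matrix_power(theta, t) @ A0)
# 0.9**100 ≈ 2.7e-5 (vanishing), 1.1**100 ≈ 1.4e4 (exploding)
```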

SLIDE 46

Long-term dependencies

  • Simple recurrence: A_t = θ^t A_0
  • Let us just make a matrix with eigenvalues = 1
  • This allows the cell to maintain its "state"

slide-47
SLIDE 47

Va Vanishing gradient

  • 1. From the weights
  • 2. From the activation functions (tanh)

At = θtA0

  • Prof. Leal-Taixé and Prof. Niessner

47

SLIDE 48

Vanishing gradient

  • 1. From the weights: A_t = θ^t A_0
  • 2. From the activation functions (tanh): the derivative of tanh is at most 1

SLIDE 49

Long Short-Term Memory

Hochreiter and Schmidhuber 1997

SLIDE 50

Long Short-Term Memory Units

  • A simple RNN has tanh as its non-linearity

SLIDE 51

Long Short-Term Memory Units

  • LSTM

SLIDE 52

Long Short-Term Memory Units

  • Key ingredients
  • Cell = transports the information through the unit

SLIDE 53

Long Short-Term Memory Units

  • Key ingredients
  • Cell = transports the information through the unit
  • Gate = removes or adds information to the cell state (sigmoid)

SLIDE 54

LSTM: step by step

  • Forget gate: decides when to erase the cell state
  • Sigmoid = output between 0 (forget) and 1 (keep)

SLIDE 55

LSTM: step by step

  • Input gate: decides which values will be updated
  • New cell state candidate: output from a tanh, in (-1, 1)

SLIDE 56

LSTM: step by step

  • Element-wise operations

SLIDE 57

LSTM: step by step

  • Output gate: decides which values will be outputted
  • Output from a tanh, in (-1, 1)

SLIDE 58

LSTM: step by step

  • Forget gate: f_t = σ(θ_xf x_t + θ_hf h_{t-1} + b_f)
  • Input gate: i_t = σ(θ_xi x_t + θ_hi h_{t-1} + b_i)
  • Output gate: o_t = σ(θ_xo x_t + θ_ho h_{t-1} + b_o)
  • Cell update: g_t = tanh(θ_xg x_t + θ_hg h_{t-1} + b_g)
  • Cell: C_t = f_t ⊙ C_{t-1} + i_t ⊙ g_t
  • Output: h_t = o_t ⊙ tanh(C_t)
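
One LSTM step written out in NumPy following the equations above; the function name, the parameter dictionary p, and its key names are illustrative assumptions.

```python
# Single LSTM step: sigmoid gates, tanh candidate, element-wise products.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, p):
    f = sigmoid(p["xf"] @ x + p["hf"] @ h_prev + p["bf"])  # forget gate
    i = sigmoid(p["xi"] @ x + p["hi"] @ h_prev + p["bi"])  # input gate
    o = sigmoid(p["xo"] @ x + p["ho"] @ h_prev + p["bo"])  # output gate
    g = np.tanh(p["xg"] @ x + p["hg"] @ h_prev + p["bg"])  # candidate values
    C = f * C_prev + i * g                                 # cell update
    h = o * np.tanh(C)                                     # output
    return h, C
```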

SLIDE 59

LSTM: vanishing gradients?

  • Along the cell state path:
  • 1. From the weights: the forget gate can stay ≈ 1 for important information
  • 2. From the activation functions: the cell state passes through an identity function

SLIDE 60

Long Short-Term Memory Units

  • A highway for the gradient to flow

SLIDE 61

RNNs in Computer Vision

  • Caption generation

Xu et al. 2015

SLIDE 62

RNNs in Computer Vision

  • Caption generation
  • Focus is shifted to different parts of the image

Xu et al. 2015

SLIDE 63

RNNs in Computer Vision

  • Instance segmentation

Romera-Paredes et al. 2015

SLIDE 64

Final exam

SLIDE 65

Final exam

  • Multiple-choice questions
  • A series of questions with free-form answers
  • There can be questions related to the exercises → if you did the exercises, it will be easier for you to answer them

SLIDE 66

Final exam

  • Must-know topics:

– Basics of ML → from linear classifiers to NNs
– Optimization schemes (not necessary to know all the formulas, but have a good understanding of the differences between them and their behavior)
– Backpropagation: concept, math; hint: be fluent at computing backprop by hand
– Loss functions and activation functions
– CNNs: convolution, backprop
– RNNs, LSTMs

SLIDE 67

Admin

  • Exam date: July 16th at 08:00
  • There will NOT be a retake exam
  • No cheat sheet or calculator during the exam

SLIDE 68

Next semesters: new DL courses

SLIDE 69

Deep Learning at TUM

  • Keep expanding the courses on Deep Learning
  • This Introduction to Deep Learning course is the basis for a series of advanced DL lectures on different topics
  • Advanced topics are typically only for Master students

SLIDE 70

Deep Learning at TUM

  • Intro to Deep Learning
  • DL for Physics (Thuerey)
  • DL for Vision (Niessner, Leal-Taixé)
  • DL for Medical Applications (Menze)
  • DL in Robotics (Bäuml)
  • Machine Learning (Günnemann)

SLIDE 71

Advanced DL for Computer Vision

  • Deep Learning for Vision (WS18/19): syllabus

– Advanced architectures, e.g. Siamese neural networks
– Variational Autoencoders
– Generative models, e.g. GANs
– Multi-dimensional CNNs
– Bayesian Deep Learning

SLIDE 72

Advanced DL for Computer Vision

  • Deep Learning for Vision (WS18/19)

– 2V + 5P
– Must have attended the Intro to DL
– The practical part is a project that will last the whole semester
– Please do not sign up unless you are willing to spend a lot of time on the project!

SLIDE 73

Detection, Segmentation and Tracking

  • New lecture (Prof. Leal-Taixé, SS19)

– Must have attended the Intro to DL
– Common detection and segmentation frameworks (YOLO, Faster R-CNN, Mask R-CNN)
– Extension to videos → tracking
– One project that will last the whole semester

SLIDE 74

Thank you

Visual Computing (Niessner):
  • 3D scanning
  • DL in 3D understanding
  • 3D reconstruction

Dynamic Vision and Learning (Leal-Taixé):
  • Video segmentation
  • Object tracking
  • Camera localization