Deep Learning for Graphics: Neural Network Basics


slide-1
SLIDE 1

Niloy Mitra Iasonas Kokkinos Paul Guerrero Vladimir Kim Kostas Rematas Tobias Ritschel UCL UCL/Facebook UCL Adobe Research U Washington UCL

Deep Learning for Graphics

EG Course Deep Learning for Graphics

Neural Network Basics

slide-2
SLIDE 2

EG Course Deep Learning for Graphics

Timetable

Timetable (topics × presenters): rows are Introduction, Theory, NN Basics, Supervised Applications, Data, Unsupervised Applications, Beyond 2D, and Outlook; columns are Niloy, Iasonas, Paul, Vova, Kostas, Tobias, and Ersin, with an X marking who presents each part.

slide-3
SLIDE 3

Introduction to Neural Networks

slide-4
SLIDE 4

EG Course Deep Learning for Graphics

A parametric function f_θ : X → Y, where θ are the function parameters (these are learned), X is the source domain, and Y is the target domain.

Image Classification: X = images (dimension given by the image size), Y = class labels (dimension given by the class count).

Image Synthesis: X = latent variables (dimension given by the latent variable count), Y = images (dimension given by the image size).

Goal: Learn a Parametric Function

slide-5
SLIDE 5

EG Course Deep Learning for Graphics

Each data point, plotted by its two feature coordinates, has a class label:

Machine Learning 101: Linear Classifier

slide-6
SLIDE 6

EG Course Deep Learning for Graphics

  • Sec. 15.2.3

Nonlinear decision boundaries

slide-7
SLIDE 7

EG Course Deep Learning for Graphics

Given a library of simple functions Compose into a complicated function

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Building A Complicated Function

slide-8
SLIDE 8

EG Course Deep Learning for Graphics

Given a library of simple functions Compose into a complicated function

Idea 1: Linear Combinations

  • Boosting
  • Kernels

Building A Complicated Function

slide-9
SLIDE 9

EG Course Deep Learning for Graphics

Given a library of simple functions Compose into a complicated function

Idea 2: Compositions

  • Decision Trees
  • Deep Learning

Building A Complicated Function

slide-10
SLIDE 10

EG Course Deep Learning for Graphics

Given a library of simple functions Compose into a complicated function

Idea 2: Compositions

  • Decision Trees
  • Grammar models
  • Deep Learning

Building A Complicated Function

slide-11
SLIDE 11

EG Course Deep Learning for Graphics

Sigmoidal activation

basic building block

‘Neuron’: Cascade of Linear and Nonlinear Function

slide-12
SLIDE 12

EG Course Deep Learning for Graphics

Sigmoidal (“logistic”), Step (“perceptron”), Hyperbolic tangent, Rectified Linear Unit (ReLU)

(the plots show each function and its derivative)

Image Credit: Olivier Grisel and Charles Ollion

Activation functions
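As a reference, the activations listed above can be sketched in NumPy together with their derivatives (the function names here are illustrative helpers, not from the slides):

```python
import numpy as np

# The activation functions from the slide, with their derivatives.
def step(x):                      # "perceptron" unit
    return (np.asarray(x) > 0).astype(float)

def sigmoid(x):                   # "logistic" unit
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

def tanh_grad(x):                 # derivative of the hyperbolic tangent
    return 1.0 - np.tanh(x) ** 2

def relu(x):                      # Rectified Linear Unit
    return np.maximum(0.0, x)

def relu_grad(x):
    return (np.asarray(x) > 0).astype(float)   # exactly 0 or 1
```

Note that the sigmoid derivative never exceeds 0.25, which is the root of the vanishing-gradients discussion later in the course.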

slide-13
SLIDE 13

EG Course Deep Learning for Graphics

Output units (e.g. class labels: apple, orange)

Fixed mapping through non-adaptive, hand-coded features

Input units (e.g. pixels)

Slide credit: G. Hinton

Perceptrons (60’s)

XOR: perceptron killer

slide-14
SLIDE 14

EG Course Deep Learning for Graphics

input vector

hidden layers

outputs

Slide credit: G. Hinton

Multi-Layer Perceptrons (~1985)

slide-15
SLIDE 15

EG Course Deep Learning for Graphics

  • Sec. 15.2.3

This is what the hidden layers should be doing!

Reminder: Non-linear decision boundaries

slide-16
SLIDE 16

EG Course Deep Learning for Graphics

Evolution of isocontours as parameters change

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

Nonlinear mapping

slide-17
SLIDE 17

EG Course Deep Learning for Graphics

Non-linearly separable data Data mapped to learned space Decision function

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

From non-separable to linearly separable

slide-18
SLIDE 18

EG Course Deep Learning for Graphics

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

Linearizing a 2D classification task (4 hidden layers)

slide-19
SLIDE 19

EG Course Deep Learning for Graphics

Points in 1D, Decision in 2D

Linearization: may need higher dimensions

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

slide-20
SLIDE 20

EG Course Deep Learning for Graphics

Linearization: may need higher dimensions

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

slide-21
SLIDE 21

EG Course Deep Learning for Graphics

Linearization: may need higher dimensions

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

slide-22
SLIDE 22

EG Course Deep Learning for Graphics

Intuition: learn “dictionary” for objects “Distributed representation”: represent (and classify) objects by mixing & mashing reusable parts

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Hidden Layers: intuitively, what do they do?

slide-23
SLIDE 23

EG Course Deep Learning for Graphics

“car”

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Deep Learning = Hierarchical Compositionality

slide-24
SLIDE 24

EG Course Deep Learning for Graphics

Trainable Classifier Low-Level Feature Mid-Level Feature High-Level Feature

“car”

Deep Learning = Hierarchical Compositionality

slide-25
SLIDE 25

EG Course Deep Learning for Graphics

MLP Demo: playground.tensorflow.org

slide-26
SLIDE 26

Training and Optimization

slide-27
SLIDE 27

EG Course Deep Learning for Graphics

Old (80’s): Back-propagation algorithm, Stochastic Gradient Descent, Momentum, “weight decay”. New (last 5-6 years): Dropout, ReLUs, Batch Normalization, Residual Networks.

Neural Network Training: Old & New Tricks

slide-28
SLIDE 28

EG Course Deep Learning for Graphics

Our network implements a parametric function y = f_θ(x). During training, we search for parameters θ that minimize a loss L(θ). Example: L2 regression loss given target pairs (x_i, y_i): L(θ) = Σ_i ||f_θ(x_i) − y_i||².

Training Goal

slide-29
SLIDE 29

EG Course Deep Learning for Graphics

Initialize: θ⁰. Update: θ^(t+1) = θ^(t) − η ∇L(θ^(t))

For a convex function, gradient descent with a suitable step size always converges.

Gradient Descent Minimization Method
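The initialize/update rule above in a minimal sketch, on a convex quadratic loss (the target and step size are illustrative):

```python
import numpy as np

# Gradient descent on the convex loss L(theta) = 0.5 * ||theta - target||^2,
# whose gradient is simply (theta - target).
target = np.array([3.0, -2.0])

def loss_grad(theta):
    return theta - target

theta = np.zeros(2)                          # Initialize: theta^0
eta = 0.1                                    # step size
for _ in range(100):
    theta = theta - eta * loss_grad(theta)   # Update: theta <- theta - eta * grad
```

For this convex loss any sufficiently small step size converges; the non-convex losses of neural networks come with no such guarantee.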

slide-30
SLIDE 30

EG Course Deep Learning for Graphics

Empirically, the local minima reached from different initializations are almost equally good. How this can happen is a central research topic. On to the gradients!

Multiple Local Minima, based on initialization

slide-31
SLIDE 31

EG Course Deep Learning for Graphics

Forward Backward

All you need is gradients

slide-32
SLIDE 32

EG Course Deep Learning for Graphics

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Chain Rule

slide-33
SLIDE 33

EG Course Deep Learning for Graphics

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Chain Rule

slide-34
SLIDE 34

EG Course Deep Learning for Graphics

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

‘Another Brick in the Wall’

slide-35
SLIDE 35

EG Course Deep Learning for Graphics

Composition of differentiable blocks:

Toy example: single sigmoidal unit
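The composition can be made concrete in a few lines of NumPy: a linear block followed by a sigmoid, differentiated block by block with the chain rule (the values of w, b, and x are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])          # input
w = np.array([0.5, -0.3])         # weights
b = 0.1                           # bias

# forward: linear block, then nonlinearity
z = w @ x + b
y = sigmoid(z)

# backward: chain rule, one block at a time
dy_dz = y * (1.0 - y)             # derivative of the sigmoid block
dy_dw = dy_dz * x                 # dz/dw = x
dy_db = dy_dz * 1.0               # dz/db = 1
```

A finite-difference check confirms the analytic gradients.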

slide-36
SLIDE 36

EG Course Deep Learning for Graphics

Computation graph & automatic differentiation

Slide Credit: Justin Johnson

slide-37
SLIDE 37

EG Course Deep Learning for Graphics

input vector

hidden layers

outputs

Multi-Layer Perceptrons

Slide Credit: G. Hinton

slide-38
SLIDE 38

EG Course Deep Learning for Graphics

input vector

hidden layers

outputs

Back-propagate error signal to get derivatives for learning

Compare outputs with correct answer to get error signal

Multi-Layer Perceptrons

Slide Credit: G. Hinton

slide-39
SLIDE 39

EG Course Deep Learning for Graphics

Back-propagation Algorithm

slide-40
SLIDE 40

EG Course Deep Learning for Graphics

Our network implements a parametric function y = f_θ(x). During training, we search for parameters θ that minimize a loss L(θ). Example: L2 regression loss given target pairs (x_i, y_i): L(θ) = Σ_i ||f_θ(x_i) − y_i||².

Training Goal

slide-41
SLIDE 41

EG Course Deep Learning for Graphics

Inputs Outputs Hidden layer Parameters:

A Neural Network for Multi-way Classification

slide-42
SLIDE 42

EG Course Deep Learning for Graphics

Inputs Outputs Hidden layer

A Neural Network in Forward Mode

slide-43
SLIDE 43

EG Course Deep Learning for Graphics

Inputs Outputs Hidden layer

A Neural Network in Forward Mode

slide-44
SLIDE 44

EG Course Deep Learning for Graphics

Inputs Hidden layer Outputs

A Neural Network in Forward Mode

slide-45
SLIDE 45

EG Course Deep Learning for Graphics

Inputs Hidden layer Outputs

A Neural Network in Forward Mode

slide-46
SLIDE 46

EG Course Deep Learning for Graphics

Outputs

Ground truth

Objective for linear regression

slide-47
SLIDE 47

EG Course Deep Learning for Graphics

Outputs

Softmax unit

Ground truth

‘Cross-entropy’ loss

Objective for multi-class classification
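A sketch of the softmax unit and the cross-entropy loss (the max subtraction is the usual numerical-stability trick; the names and score values are illustrative):

```python
import numpy as np

def softmax(scores):
    # subtract the max so exp() cannot overflow; the result is unchanged
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def cross_entropy(scores, target_class):
    # negative log-probability assigned to the ground-truth class
    p = softmax(scores)
    return -np.log(p[target_class])

scores = np.array([2.0, 1.0, -1.0])   # raw network outputs
loss = cross_entropy(scores, 0)       # ground truth: class 0
```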

slide-48
SLIDE 48

EG Course Deep Learning for Graphics

Network output: y = f_θ(x). Loss (prediction error): L(θ). What we need to compute for gradient descent: the gradient of the loss with respect to the parameters, ∇_θ L.

Neural network in forward mode: recap

slide-49
SLIDE 49

EG Course Deep Learning for Graphics

Outputs

A Neural Network in Backward Mode

slide-50
SLIDE 50

EG Course Deep Learning for Graphics

Hidden layer Outputs This we want ?

A Neural Network in Backward Mode

slide-51
SLIDE 51

EG Course Deep Learning for Graphics

Hidden layer Outputs This we want ?

A Neural Network in Backward Mode

slide-52
SLIDE 52

EG Course Deep Learning for Graphics

Linear Layer in Forward Mode: All For One

slide-53
SLIDE 53

EG Course Deep Learning for Graphics

Linear Layer in Backward Mode: One From All

slide-54
SLIDE 54

EG Course Deep Learning for Graphics

Linear Layer Parameters in Backward: 1-to-1

slide-55
SLIDE 55

EG Course Deep Learning for Graphics

Outputs This we want Hidden layer This we have This we computed

A Neural Network in Backward Mode

slide-56
SLIDE 56

EG Course Deep Learning for Graphics

Hidden layer Outputs This we want This we have This we computed

A Neural Network in Backward Mode

slide-57
SLIDE 57

EG Course Deep Learning for Graphics

Hidden layer Outputs

A Neural Network in Backward Mode

slide-58
SLIDE 58

EG Course Deep Learning for Graphics

Hidden layer Outputs

A Neural Network in Backward Mode

slide-59
SLIDE 59

EG Course Deep Learning for Graphics

Old (80’s): Back-propagation algorithm, Stochastic Gradient Descent, Momentum, “weight decay”. New (last 5-6 years): Dropout, ReLUs, Batch Normalization.

Neural Network Training: Old & New Tricks

slide-60
SLIDE 60

EG Course Deep Learning for Graphics

Gradient descent needs the (l,k,m) elements of the gradient of the training objective, which sums a per-sample loss and a per-layer regularization over all N samples. Back-prop gives the gradient for the i-th example, so if N = 10⁶, we would need to run back-prop 10⁶ times to update W once!

Training Objective for N training samples

slide-61
SLIDE 61

EG Course Deep Learning for Graphics

Batch: [1..N]. Minibatch: B elements b(1), b(2), …, b(B), sampled from [1..N]. Epoch: N samples, i.e. N/B minibatches. Gradient: computed over the full batch. Noisy (‘stochastic’) gradient: computed over a minibatch only.

Stochastic Gradient Descent (SGD)
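The batch/minibatch distinction in code, on a toy least-squares problem (the data sizes, learning rate, and B are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, B = 1000, 32                              # batch size N, minibatch size B
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(N, 2))
y = X @ true_w                               # noise-free linear targets

w = np.zeros(2)
for step in range(500):
    batch = rng.integers(0, N, size=B)       # b(1)..b(B) sampled from [1..N]
    Xb, yb = X[batch], y[batch]
    grad = 2.0 / B * Xb.T @ (Xb @ w - yb)    # noisy ('stochastic') gradient
    w -= 0.05 * grad                         # one cheap update per minibatch
```

Each update costs B back-props instead of N, which is why SGD scales to large datasets.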

slide-62
SLIDE 62

EG Course Deep Learning for Graphics

Code example

Gradient Descent vs Stochastic Gradient Descent


slide-63
SLIDE 63

EG Course Deep Learning for Graphics

Back-prop on a minibatch, plus ‘‘weight decay’’: the regularizer adds a term to the noisy (‘stochastic’) gradient that shrinks each weight toward zero at every update.

Regularization in SGD: Weight Decay

slide-64
SLIDE 64

EG Course Deep Learning for Graphics

Learning rate

slide-65
SLIDE 65

EG Course Deep Learning for Graphics

Gradient Descent

slide-66
SLIDE 66

EG Course Deep Learning for Graphics

e.g.

(S)GD with adaptable stepsize

slide-67
SLIDE 67

EG Course Deep Learning for Graphics

Main idea: retain the long-term trend of the updates, drop the oscillations (the plots compare (S)GD with (S)GD + momentum).

(S)GD with momentum
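The velocity idea in a minimal sketch, on a badly conditioned quadratic where plain GD oscillates along the steep axis (lr and mu are typical illustrative values):

```python
import numpy as np

# (S)GD with momentum on f(theta) = 0.5 * (10 * theta_0^2 + theta_1^2):
# the velocity accumulates the long-term trend and damps oscillations.
def grad(theta):
    return np.array([10.0, 1.0]) * theta

theta = np.array([1.0, 1.0])
velocity = np.zeros(2)
lr, mu = 0.05, 0.9
for _ in range(200):
    velocity = mu * velocity - lr * grad(theta)   # accumulate the trend
    theta = theta + velocity
```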

slide-68
SLIDE 68

EG Course Deep Learning for Graphics

Code example


Multi-layer perceptron classification

slide-69
SLIDE 69

EG Course Deep Learning for Graphics

  • Nesterov’s Accelerated Gradient (NAG)
  • R-prop
  • AdaGrad
  • RMSProp
  • AdaDelta
  • Adam

Step-size Selection & Optimizers: research problem

slide-70
SLIDE 70

EG Course Deep Learning for Graphics

Old (80’s): Back-propagation algorithm, Stochastic Gradient Descent, Momentum, “weight decay”. New (last 5-6 years): Dropout, ReLUs, Batch Normalization.

Neural Network Training: Old & New Tricks

slide-71
SLIDE 71

EG Course Deep Learning for Graphics

Linearization: may need higher dimensions

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

slide-72
SLIDE 72

EG Course Deep Learning for Graphics

Reminder: Overfitting, in images

Classification Regression

just right

slide-73
SLIDE 73

EG Course Deep Learning for Graphics

Per-sample loss Per-layer regularization

Previously: l2 Regularization

slide-74
SLIDE 74

EG Course Deep Learning for Graphics

Each sample is processed by a ‘decimated’ neural net. The decimated nets are distinct classifiers, but they should all do the same job.

Dropout
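A sketch of the dropout mechanics (this uses the now-common ‘inverted’ scaling at training time, so the test-time deterministic approximation of the next slides reduces to the identity; p and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, train=True):
    if not train:
        return h                       # test time: deterministic, no noise
    mask = rng.random(h.shape) >= p    # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)        # rescale so the expected value matches

h = np.ones(10000)
h_train = dropout(h, p=0.5, train=True)   # a 'decimated' copy of the features
h_test = dropout(h, p=0.5, train=False)
```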

slide-75
SLIDE 75

EG Course Deep Learning for Graphics

‘Feature noising’

Dropout block

slide-76
SLIDE 76

EG Course Deep Learning for Graphics

Test time: Deterministic Approximation

slide-77
SLIDE 77

EG Course Deep Learning for Graphics

Dropout Performance

slide-78
SLIDE 78

EG Course Deep Learning for Graphics

Old (80’s): Back-propagation algorithm, Stochastic Gradient Descent, Momentum, “weight decay”. New (last 5-6 years): Dropout, ReLUs, Batch Normalization.

Neural Network Training: Old & New Tricks

slide-79
SLIDE 79

EG Course Deep Learning for Graphics

Sigmoidal (“logistic”) Rectified Linear Unit (RELU)

‘Neuron’: Cascade of Linear and Nonlinear Function

slide-80
SLIDE 80

EG Course Deep Learning for Graphics

Outputs

Gradient signal from above scaling: <1 (actually <0.25)

Reminder: a network in backward mode

slide-81
SLIDE 81

EG Course Deep Learning for Graphics

Gradient signal from above scaling: <1 (actually <0.25)

Do this 10 times and the updates in the first layers become minimal: the top layer knows what to do, but the lower layers “don’t get it”. With sigmoidal units, the signal is not getting through!

Vanishing Gradients Problem

slide-82
SLIDE 82

EG Course Deep Learning for Graphics

Scaling: {0,1} Gradient signal from above

Vanishing Gradients Problem: ReLU Solves It

slide-83
SLIDE 83

EG Course Deep Learning for Graphics

Old (80’s): Back-propagation algorithm, Stochastic Gradient Descent, Momentum, “weight decay”. New (last 5-6 years): Dropout, ReLUs, Batch Normalization.

Neural Network Training: Old & New Tricks

slide-84
SLIDE 84

EG Course Deep Learning for Graphics

(Images of the same scene at 10 am, 2 pm, and 7 pm: the input statistics drift over time.)

External Covariate Shift: your input changes

slide-85
SLIDE 85

EG Course Deep Learning for Graphics

  • Make each patch have zero mean:
  • Then make it have unit variance:

This removes a photometric transformation I → aI + b.

“Whitening”: Set Mean = 0, Variance = 1

slide-86
SLIDE 86

EG Course Deep Learning for Graphics

Neural network activations during training: moving target

Internal Covariate Shift

slide-87
SLIDE 87

EG Course Deep Learning for Graphics

Whiten-as-you-go:

Batch Normalization
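‘Whiten-as-you-go’ in code: normalize each feature over the current minibatch, then re-scale with the learned parameters gamma and beta (eps and the values below are illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # whitened activations
    return gamma * x_hat + beta              # learned scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```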

slide-88
SLIDE 88

EG Course Deep Learning for Graphics

Batch Normalization: used in all current systems

slide-89
SLIDE 89

Convolutional Neural Networks

slide-90
SLIDE 90

EG Course Deep Learning for Graphics

Example: 200x200 image, 40K hidden units → ~2B parameters!!!

  • Spatial correlation is local
  • Waste of resources
  • We do not have enough training samples anyway…

Fully-connected Layer

slide-91
SLIDE 91

EG Course Deep Learning for Graphics

Example: 200x200 image 40K hidden units Filter size: 10x10 4M parameters

Locally-connected Layer

Note: This parameterization is good when input image is registered (e.g., face recognition).

slide-92
SLIDE 92

EG Course Deep Learning for Graphics

Note: This parameterization is good when input image is registered (e.g., face recognition). Example: 200x200 image 40K hidden units Filter size: 10x10 4M parameters

Locally-connected Layer

slide-93
SLIDE 93

EG Course Deep Learning for Graphics

Share the same parameters across different locations (assuming input is stationary): Convolutions with learned kernels

Convolutional Layer
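Weight sharing can be made explicit with a minimal single-channel ‘valid’ convolution (most frameworks actually implement cross-correlation, as here; the edge kernel below is illustrative):

```python
import numpy as np

# 2D 'valid' convolution: the same learned kernel is applied at every
# spatial location, so the parameter count is just the kernel size.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# a vertical-edge kernel, applied to an image that is dark on the left
# and bright on the right
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
image = np.zeros((8, 8))
image[:, 4:] = 1.0
response = conv2d(image, edge_kernel)   # fires only where the edge is
```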

slide-94
SLIDE 94

EG Course Deep Learning for Graphics

Convolutional Layer

slide-95
SLIDE 95

EG Course Deep Learning for Graphics

Convolutional Layer

slide-96
SLIDE 96

EG Course Deep Learning for Graphics

Convolutional Layer

slide-97
SLIDE 97

EG Course Deep Learning for Graphics

Convolutional Layer

slide-98
SLIDE 98

EG Course Deep Learning for Graphics

Convolutional Layer

slide-99
SLIDE 99

EG Course Deep Learning for Graphics

Convolutional Layer

slide-100
SLIDE 100

EG Course Deep Learning for Graphics

Convolutional Layer

slide-101
SLIDE 101

EG Course Deep Learning for Graphics

Convolutional Layer

slide-102
SLIDE 102

EG Course Deep Learning for Graphics

Convolutional Layer

slide-103
SLIDE 103

EG Course Deep Learning for Graphics

Convolutional Layer

slide-104
SLIDE 104

EG Course Deep Learning for Graphics

Convolutional Layer

slide-105
SLIDE 105

EG Course Deep Learning for Graphics

Convolutional Layer

slide-106
SLIDE 106

EG Course Deep Learning for Graphics

Convolutional Layer

slide-107
SLIDE 107

EG Course Deep Learning for Graphics

Convolutional Layer

slide-108
SLIDE 108

EG Course Deep Learning for Graphics

Convolutional Layer

slide-109
SLIDE 109

EG Course Deep Learning for Graphics

Convolutional Layer

slide-110
SLIDE 110

EG Course Deep Learning for Graphics

Fully-connected layer

# of parameters: K²

slide-111
SLIDE 111

EG Course Deep Learning for Graphics

# of parameters: size of the window

Convolutional layer

slide-112
SLIDE 112

EG Course Deep Learning for Graphics

image * kernel = response, e.g. a 3×3 edge kernel whose rows are (−1 0 1).

Convolutional layer

slide-113
SLIDE 113

EG Course Deep Learning for Graphics

Code example

Learning an edge filter


slide-114
SLIDE 114

EG Course Deep Learning for Graphics

Learn multiple filters.

E.g.: 200x200 image 100 Filters Filter size: 10x10 10K parameters

Convolutional layer

slide-115
SLIDE 115

EG Course Deep Learning for Graphics

Conv. layer: each input feature map h₁ⁿ⁻¹, h₂ⁿ⁻¹, h₃ⁿ⁻¹ is convolved with a learned kernel, and the results are summed into the output feature maps h₁ⁿ and h₂ⁿ.

Convolutional layer

slide-116
SLIDE 116

EG Course Deep Learning for Graphics

Each input feature map h₁ⁿ⁻¹, h₂ⁿ⁻¹, h₃ⁿ⁻¹ is convolved with a learned kernel, and the results are summed into the output feature maps h₁ⁿ and h₂ⁿ.

Convolutional layer

slide-117
SLIDE 117

EG Course Deep Learning for Graphics

Each input feature map h₁ⁿ⁻¹, h₂ⁿ⁻¹, h₃ⁿ⁻¹ is convolved with a learned kernel, and the results are summed into the output feature maps h₁ⁿ and h₂ⁿ.

Convolutional layer

slide-118
SLIDE 118

EG Course Deep Learning for Graphics

Pooling layer

slide-119
SLIDE 119

EG Course Deep Learning for Graphics

Pooling layer
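A minimal max-pooling sketch over non-overlapping 2×2 windows (the input sides are assumed divisible by the window size):

```python
import numpy as np

# Max pooling: keep the maximum of each window, halving the spatial
# resolution and enlarging the receptive field of the layers above.
def max_pool2d(x, size=2):
    h, w = x.shape
    x = x.reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
y = max_pool2d(x)   # 4x4 input -> 2x2 output
```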

slide-120
SLIDE 120

EG Course Deep Learning for Graphics

Pooling layer: receptive field size

slide-121
SLIDE 121

EG Course Deep Learning for Graphics

Pooling layer: receptive field size

slide-122
SLIDE 122

EG Course Deep Learning for Graphics

Receptive field

slide-123
SLIDE 123

EG Course Deep Learning for Graphics

Receptive field: layer 1

slide-124
SLIDE 124

EG Course Deep Learning for Graphics

Receptive field: layer 2

slide-125
SLIDE 125

EG Course Deep Learning for Graphics

Receptive field: layer 3

slide-126
SLIDE 126

EG Course Deep Learning for Graphics

Receptive field: layer 4

slide-127
SLIDE 127

EG Course Deep Learning for Graphics

Receptive field: layer 5

slide-128
SLIDE 128

EG Course Deep Learning for Graphics

Receptive field: layer 6

slide-129
SLIDE 129

EG Course Deep Learning for Graphics

Receptive field: layer 7

slide-130
SLIDE 130

EG Course Deep Learning for Graphics

Receptive field: layer 8

slide-131
SLIDE 131

Modern Architectures

slide-132
SLIDE 132

EG Course Deep Learning for Graphics

INPUT 32×32 → C1: feature maps 6@28×28 (convolutions) → S2: feature maps 6@14×14 (subsampling) → C3: feature maps 16@10×10 (convolutions) → S4: feature maps 16@5×5 (subsampling) → C5: layer 120 (full connection) → F6: layer 84 (full connection) → OUTPUT 10 (Gaussian connections)

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. 1998.

https://www.youtube.com/watch?v=FwFduRA_L6Q

CNNs, late 1980’s: LeNet

slide-133
SLIDE 133

EG Course Deep Learning for Graphics

deep learning = neural networks (+ big data + GPUs) + a few more recent tricks!

What happened in between?

slide-134
SLIDE 134

EG Course Deep Learning for Graphics

Code example

Convolutional Network and Filter Visualizations


slide-135
SLIDE 135

EG Course Deep Learning for Graphics

Parameter Initialization

  • All-zero initialization: all parameters get the same gradient and the same updates
  • Random initialization: sometimes used in practice, but the variance of the output depends on the number of inputs, which may cause instability early on
  • Kaiming initialization: scale the standard deviation according to the number of inputs n (the fan-in), e.g. std = sqrt(2/n) for ReLU layers
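The effect of the fan-in scaling can be checked numerically (the sizes and the ReLU-style sqrt(2/fan_in) factor are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 1000
x = rng.normal(size=fan_in)                   # unit-variance inputs

# naive random init: weight std 1, ignoring the fan-in
W_naive = rng.normal(size=(2000, fan_in))
# Kaiming-style init: std scaled down by sqrt(2 / fan_in)
W_kaiming = W_naive * np.sqrt(2.0 / fan_in)

naive_out = W_naive @ x       # output variance grows with the fan-in
kaiming_out = W_kaiming @ x   # output variance stays near 2
```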

slide-136
SLIDE 136

Code examples

Parameter initialization


slide-137
SLIDE 137

EG Course Deep Learning for Graphics

AlexNet: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6): 84-90 (2017).

CNNs, 2012

slide-138
SLIDE 138

EG Course Deep Learning for Graphics

Karen Simonyan, Andrew Zisserman (=Visual Geometry Group) Very Deep Convolutional Networks for Large-Scale Image Recognition, arxiv, 2014.

CNNs, 2014: VGG

slide-139
SLIDE 139

EG Course Deep Learning for Graphics

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich Going Deeper with Convolutions, CVPR 2015

CNNs, 2014: GoogLeNet

slide-140
SLIDE 140

EG Course Deep Learning for Graphics

ResNet Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition CVPR 2016

CNNs, 2015: ResNet

slide-141
SLIDE 141

EG Course Deep Learning for Graphics

  • Deeper networks can cover more complex problems

– Increasingly large receptive field size & rich patterns

The Deeper, the Better

slide-142
SLIDE 142

EG Course Deep Learning for Graphics

  • From 2 to 10: 2010-2012

– ReLUs – Dropout – …

Going Deeper

slide-143
SLIDE 143

EG Course Deep Learning for Graphics

  • From 10 to 20: 2015

– Batch Normalization

Going Deeper

slide-144
SLIDE 144

EG Course Deep Learning for Graphics

  • From 20 to 100/1000

– Residual networks

Going Deeper

slide-145
SLIDE 145

EG Course Deep Learning for Graphics

  • Plain nets: stacking 3x3 conv layers
  • 56-layer net has higher training error and test error than 20-layer net

Plain network: deeper is not necessarily better

slide-146
SLIDE 146

EG Course Deep Learning for Graphics

  • Naïve solution

– If the extra layers implement an identity mapping, the training error cannot increase

Residual Network

slide-147
SLIDE 147

EG Course Deep Learning for Graphics

  • Goal: estimate the update (the residual) between an original image and a changed image

The network only models the residual; preserving the base information, it can treat the perturbation separately.

Residual Modelling: Basic Idea in Image Processing

slide-148
SLIDE 148

EG Course Deep Learning for Graphics

  • Plain block

– Difficult to make identity mapping because of multiple non-linear layers

Residual Network

slide-149
SLIDE 149

EG Course Deep Learning for Graphics

  • Residual block

– If identity were optimal, it is easy to set the weights to 0 – If the optimal mapping is close to identity, it is easier to learn the small fluctuations – Appropriate for treating the perturbation while keeping the base information

Residual Network
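The identity argument in code: a two-layer residual block y = x + W2·ReLU(W1·x), where zero weights give exactly the identity mapping (weights and input are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    return x + W2 @ relu(W1 @ x)      # the skip connection adds the input back

x = np.array([1.0, -2.0, 3.0])

# zero weights: the block reduces to the identity mapping
y_id = residual_block(x, np.zeros((3, 3)), np.zeros((3, 3)))

# identity weights: the block adds relu(x) on top of x
y_relu = residual_block(x, np.eye(3), np.eye(3))
```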

slide-150
SLIDE 150

EG Course Deep Learning for Graphics

  • Deeper ResNets have lower training error

Residual Network: deeper is better

slide-151
SLIDE 151

EG Course Deep Learning for Graphics

Residual Network: deeper is better

slide-152
SLIDE 152

EG Course Deep Learning for Graphics

CNNs, 2017: DenseNet

Densely Connected Convolutional Networks, CVPR 2017. Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger. Recently proposed; better performance/parameter ratio.

slide-153
SLIDE 153

Image-to-Image

slide-154
SLIDE 154

EG Course Deep Learning for Graphics

Image-to-image

  • So far we mapped an image to a number or label
  • In graphics, output often is “richer”:

– An image – A volume – A 3D mesh – …

  • Architectures

– Encoder-Decoder – Skip connections

slide-155
SLIDE 155

EG Course Deep Learning for Graphics

FCNN

Fully-convolutional Neural Networks

slide-156
SLIDE 156

EG Course Deep Learning for Graphics

FCNN

Fully-convolutional Neural Networks

slide-157
SLIDE 157

EG Course Deep Learning for Graphics

FCNN

Fully-convolutional Neural Networks

slide-158
SLIDE 158

EG Course Deep Learning for Graphics

FCNN

Fully-convolutional Neural Networks

slide-159
SLIDE 159

EG Course Deep Learning for Graphics

FCNN

Fast (shared convolutions) Simple (dense)

Fully-convolutional Neural Networks

slide-160
SLIDE 160

EG Course Deep Learning for Graphics

FCNN

Fast (shared convolutions) Simple (dense) Low resolution

32-fold decimation 224x224 to 7x7

Fully Convolutional Neural Networks in Practice

slide-161
SLIDE 161

EG Course Deep Learning for Graphics

https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807

Receptive field arithmetic

slide-162
SLIDE 162

EG Course Deep Learning for Graphics

downsample ×2, then convolve; ‘implant’ the filter in image coordinates: a filter ‘à trous’ (with holes)

  • S. Mallat, An introduction to wavelets, 1989

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille

Atrous convolution

slide-163
SLIDE 163

EG Course Deep Learning for Graphics

  • F. Yu, V. Koltun, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016

Atrous convolution = Dilated Convolution

slide-164
SLIDE 164

EG Course Deep Learning for Graphics

Graphics: Multiresolution

slide-165
SLIDE 165

EG Course Deep Learning for Graphics

Encoder-decoder

Space Space Features

slide-166
SLIDE 166

EG Course Deep Learning for Graphics

Interpretation

  • Turns image into vector
  • This vector is a very compact and abstract “code”
  • Turns code back into image
slide-167
SLIDE 167

EG Course Deep Learning for Graphics

Encoder-decoder

Learning to simplify. Simo-Serra et al. 2016

slide-168
SLIDE 168

EG Course Deep Learning for Graphics

Code example

Colorization Network


slide-169
SLIDE 169

EG Course Deep Learning for Graphics

Up-sampling

  • We saw

– … how to keep resolution – … how to reduce it with pooling

  • But how to increase it again?
  • Options

– Interpolation – Padding (insert zeros) – Transpose convolutions
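Two of the listed options, sketched in 1D (a transposed convolution can be viewed as zero insertion followed by a learned convolution; the helper names are illustrative):

```python
import numpy as np

def upsample_nearest(x, factor=2):
    # interpolation (nearest neighbour): repeat each sample
    return np.repeat(x, factor)

def upsample_zeros(x, factor=2):
    # padding: interleave zeros between the samples
    out = np.zeros(len(x) * factor)
    out[::factor] = x
    return out

x = np.array([1.0, 2.0, 3.0])
```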

slide-170
SLIDE 170

EG Course Deep Learning for Graphics

Encoder-decoder + Skip connections

  • 1st: Reduce resolutions as before
  • 2nd: Increase resolution
  • Transposed convolutions

U-Net: Convolutional Networks for Biomedical Image Segmentation. Ronneberger et al. 2015

slide-171
SLIDE 171

EG Course Deep Learning for Graphics

Encoder-decoder with skip connections

Features Skip link Space Space

slide-172
SLIDE 172

EG Course Deep Learning for Graphics

Interpretation

  • Turns image into vector
  • Turns vector back into image
  • At every step of increasing the resolution, check back with the input to preserve details
  • Familiar trick to graphics people

– (Haar) wavelet – Residual coding – Pyramidal schemes (Laplacian pyramid, etc.)

slide-173
SLIDE 173

Deep Learning Frameworks

slide-174
SLIDE 174

EG Course Deep Learning for Graphics

Main frameworks: PyTorch (Python), TensorFlow (Python, C++, Java), Caffe (C++, Python, Matlab), Keras (Python, with backends supporting other languages).

Currently less frequently used: several others, shown on the slide as logos with their languages: (Python, C++, C#), (Python, C++, and others), (Matlab), (Python, Java, Scala), (Python), (Python, C++), (Python).

slide-175
SLIDE 175

EG Course Deep Learning for Graphics

Popularity

Google Trends for search terms: “[name] tutorial” Google Trends for search terms: “[name] github”

slide-176
SLIDE 176

EG Course Deep Learning for Graphics

Typical Training Steps

for i = 1 .. max_iterations
    input, ground_truth = load_minibatch(data, i)
    output = network_evaluate(input, parameters)
    loss = compute_loss(output, ground_truth)
    # gradients of loss with respect to parameters
    gradients = network_backpropagate(loss, parameters)
    parameters = optimizer_step(parameters, gradients)

slide-177
SLIDE 177

EG Course Deep Learning for Graphics

Tensors

  • Frameworks typically represent data as tensors
  • Examples:

4D input data: B × C × H × W (batches B, feature channels C, spatial height H, spatial width W)

4D convolution kernel: OC × IC × KH × KW (output channels OC, input channels IC, kernel height KH, kernel width KW)
slide-178
SLIDE 178

EG Course Deep Learning for Graphics

What Does a Deep Learning Framework Do?

  • Tensor math
  • Common network operations/layers
  • Gradients of common operations
  • Backpropagation
  • Optimizers
  • GPU implementations of the above
  • usually: data loading, network parameter saving/loading
  • sometimes: distributed computing
slide-179
SLIDE 179

EG Course Deep Learning for Graphics

Automatic Differentiation & the Computation Graph

parameters = (weight, bias)
output = σ(weight * input + bias)
loss = (output - ground_truth)^2
# gradients of loss with respect to parameters
gradients = backpropagate(loss, parameters)

Forward pass: the framework records the computation graph (input and weight feed *, bias feeds +, σ gives output, which is compared with ground_truth and squared to give loss), with intermediate values p1, p2, p3.

Backward pass: ∂loss/∂output is propagated back through the same graph (∂loss/∂p3, ∂loss/∂p2, ∂loss/∂p1), yielding ∂loss/∂weight and ∂loss/∂bias.

Since loss is a scalar, the gradients are the same size as the parameters.

slide-180
SLIDE 180

EG Course Deep Learning for Graphics

Automatic Differentiation & the Computation Graph

Each block g computes: outputs = forward(inputs, parameters). In the backward pass it computes: ∂loss/∂inputs, ∂loss/∂parameters = backward(∂loss/∂outputs).

slide-181
SLIDE 181

EG Course Deep Learning for Graphics

Static vs Dynamic Computation Graphs

  • Static analysis allows optimizations and distributing workload
  • Dynamic graphs make data-driven control flow easier
  • In static graphs, the graph is usually defined in a separate ‘language’
  • Static graphs have less support for debugging

Static Dynamic

define once, evaluate during training define implicitly by running operations, a new graph is created in each evaluation

Static:
    x = Variable()
    loss = if_node(x < parameter[0], x + parameter[0], x - parameter[1])
    for i = 1 .. max_iterations
        x = data()
        run(loss)
        backpropagate(loss, parameters)

Dynamic:
    for i = 1 .. max_iterations
        x = data()
        if x < parameter[0]
            loss = x + parameter[0]
        else
            loss = x - parameter[1]
        backpropagate(loss, parameters)

slide-182
SLIDE 182

EG Course Deep Learning for Graphics

Tensorflow

  • Currently the largest community
  • Static graphs (dynamic graphs are in development: Eager Execution)
  • Good support for deployment
  • Good support for distributed computing
  • Typically slower than the other three main frameworks on a single GPU
slide-183
SLIDE 183

EG Course Deep Learning for Graphics

PyTorch

  • Fast growing community
  • Dynamic graphs
  • Distributed computing is in development (some support is already available)
  • Intuitive code, easy to debug, and good for experimenting with less traditional architectures due to dynamic graphs

  • Very Fast
slide-184
SLIDE 184

EG Course Deep Learning for Graphics

Keras

  • A high-level interface for various backends (Tensorflow, CNTK, Theano)
  • Intuitive high-level code
  • Focus on optimizing time from idea to code
  • Static graphs
slide-185
SLIDE 185

EG Course Deep Learning for Graphics

Caffe

  • Created earlier than Tensorflow, PyTorch or Keras
  • Less flexible and less general than the other three frameworks
  • Static graphs
  • Legacy - to be replaced by Caffe2: focus is on performance and deployment

– Facebook’s platform for Detectron (Mask-RCNN, DensePose, …)

slide-186
SLIDE 186

EG Course Deep Learning for Graphics

Converting Between Frameworks

  • Example: develop in one framework, deploy in another
  • Currently: a large range of converters, but no clear standard
  • Standardized model formats are in development

(Converter matrix: for each source/target pair among tensorflow, pytorch, keras, caffe, caffe2, CNTK, chainer, and mxnet, the table lists the available converters, e.g. MMdnn, ONNX, nn_tools, pytorch2keras, nn-transfer, caffe-tensorflow, and crosstalk; many pairs have no converter at all.)

  • from https://github.com/ysh329/deep-learning-model-convertor
slide-187
SLIDE 187

EG Course Deep Learning for Graphics

MMdnn

  • Standard format for models: a common intermediate representation, but no clear standard yet
  • Native support in development for Pytorch, Caffe2, Chainer, CNTK, and MxNet
  • Converter in development for Tensorflow; converters available for several frameworks

slide-188
SLIDE 188

EG Course Deep Learning for Graphics

The end