

slide-1
SLIDE 1

Deep Learning

Europython 2016 - Bilbao

  • G. French

University of East Anglia

Image montages from http://www.image-net.org

slide-2
SLIDE 2

Focus: Mainly image processing

slide-3
SLIDE 3

This talk is more about the principles and the maths than code. Got to fit this into 1 hour!

slide-4
SLIDE 4

What we’ll cover

slide-5
SLIDE 5

Theano

What it is and how it works

What is a neural network?

The basic model; the multi-layer perceptron

Convolutional networks

Neural networks for computer vision

slide-6
SLIDE 6

Lasagne

The Lasagne neural network library

Notes for building neural networks

A few tips on building and training neural networks

OxfordNet / VGG and transfer learning

Using a convolutional network trained by the VGG group at Oxford University and re-purposing it for your needs

slide-7
SLIDE 7

Talk materials

slide-8
SLIDE 8

Github Repo (originally for PyData London):

https://github.com/Britefury/deep-learning-tutorial-pydata2016

The notebooks are viewable on Github

slide-9
SLIDE 9

Intro to Theano and Lasagne slides: https://speakerdeck.com/britefury

https://speakerdeck.com/britefury/intro-to-theano-and-lasagne-for-deep-learning

slide-10
SLIDE 10

Amazon AMI (use a GPU machine)

AMI ID: ami-e0048af7
AMI Name: Britefury deep learning - Ubuntu-14.04 Anaconda2-4.0.0 Cuda-7.5 cuDNN-5 Theano-0.8 Lasagne Fuel

slide-11
SLIDE 11

ImageNet

slide-12
SLIDE 12

Image classification dataset

slide-13
SLIDE 13

~1,000,000 images; ~1,000 classes. Ground truths prepared manually through Amazon Mechanical Turk.

slide-14
SLIDE 14

ImageNet Top-5 challenge: you score if the ground truth class is one of your top 5 predictions.
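To make the scoring rule concrete, a minimal NumPy sketch (hypothetical helper; class_probs is a predicted probability vector, true_class the ground-truth index):

import numpy as np

def top5_correct(class_probs, true_class):
    # indices of the five highest-probability classes
    top5 = np.argsort(class_probs)[-5:]
    return true_class in top5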

slide-15
SLIDE 15

ImageNet in 2012 Best approaches used hand-crafted features (SIFT, HOGs, Fisher vectors, etc) + classifier Top-5 error rate: ~25%

slide-16
SLIDE 16

Then the game changed.

slide-17
SLIDE 17

Krizhevsky, Sutskever and Hinton; ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky12]. Top-5 error rate of ~15%.

slide-18
SLIDE 18

In the last few years, more modern networks have achieved better results still [Simonyan14, He15] Top-5 error rates of ~5-7%

slide-19
SLIDE 19

I hope this talk will give you an idea of how!

slide-20
SLIDE 20

Theano

slide-21
SLIDE 21

Neural network software comes in two flavours: neural network toolkits, and expression compilers.

slide-22
SLIDE 22

Neural network toolkits: specify the structure of the neural network in terms of layers.

slide-23
SLIDE 23

Expression compilers: lower level; describe the mathematical expressions behind the layers. More powerful and flexible.

slide-24
SLIDE 24

Theano An expression compiler

slide-25
SLIDE 25

Write NumPy-style expressions; Theano compiles them to either C (CPU) or CUDA (NVIDIA GPU).
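As a minimal sketch of that idea (standard Theano API; the expression itself is just an arbitrary example):

import theano
import theano.tensor as T

# declare symbolic variables
x = T.vector('x')
w = T.vector('w')

# build an expression, NumPy-style
y = T.dot(w, x) ** 2

# compile it to native code (C on the CPU, CUDA on the GPU if configured)
f = theano.function([x, w], y)

print(f([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))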

slide-26
SLIDE 26

Intro to Theano and Lasagne slides: https://speakerdeck.com/britefury

https://speakerdeck.com/britefury/intro-to-theano-and-lasagne-for-deep-learning

slide-27
SLIDE 27

There is much more to Theano. For more information: http://deeplearning.net/tutorial and http://deeplearning.net/software/theano

slide-28
SLIDE 28

There are others. TensorFlow – developed by Google – is gaining popularity fast.

slide-29
SLIDE 29

What is a neural network?

slide-30
SLIDE 30

Multiple layers; data propagates through the layers and is transformed by each one.

slide-31
SLIDE 31

Neural network image classifier

Inputs → Hidden → Hidden → Outputs (class probabilities), e.g.:

P(cat) = 0.003, P(dog) = 0.002, P(car) = 0.005, P(bananas) = 0.9

slide-32
SLIDE 32

Neural network

Inputs → input layer → hidden layer 0 → hidden layer 1 → ⋯ → output layer → outputs

slide-33
SLIDE 33

Single layer of a neural network

f(x)

Diagram: input vector, weighted connections, bias, activation function / non-linearity, layer activation

slide-34
SLIDE 34

x = input (M-element vector)
y = output (N-element vector)
W = weights parameter (N x M matrix)
b = bias parameter (N-element vector)
f = non-linearity (a.k.a. activation function); normally ReLU but can be tanh or sigmoid

y = f(Wx + b)

slide-35
SLIDE 35

In a nutshell: y = f(Wx + b)
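A minimal NumPy sketch of a single layer, assuming W and b already hold learned values and ReLU is the chosen non-linearity:

import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def dense_layer(x, W, b):
    # y = f(Wx + b): matrix multiply, add bias, apply the non-linearity
    return relu(W.dot(x) + b)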

slide-36
SLIDE 36

Repeat for each layer

Input vector → f(Wx + b) → hidden layer 0 activation → f(Wx + b) → hidden layer 1 activation → f(Wx + b) → ⋯ → final layer activation (output)

slide-37
SLIDE 37

In mathematical notation:

y_0 = f(W_0 x + b_0)
y_1 = f(W_1 y_0 + b_1)
⋯
y_L = f(W_L y_(L-1) + b_L)

slide-38
SLIDE 38

As a classifier

Image pixels (input vector) → hidden layer activations → final layer activation with softmax non-linearity → class probabilities, e.g.:

P(cat) = 0.003, P(dog) = 0.002, P(car) = 0.005, P(bananas) = 0.9

slide-39
SLIDE 39

Summary: a neural network is built from layers, each of which is a matrix multiplication, then a bias addition, then a non-linearity.

slide-40
SLIDE 40

Training a neural network

slide-41
SLIDE 41

Learn values for the parameters W and b (for each layer). Use back-propagation.

slide-42
SLIDE 42

Initialise weights randomly (more on this later) Initialise biases to 0

slide-43
SLIDE 43

For each example x_train from the training set, evaluate the network prediction y_pred given the training input x = x_train. Measure the cost c (error): the difference between y_pred and the ground truth output y_train.

slide-44
SLIDE 44

Classification (which of these categories best describes this?)

Final layer: softmax as the non-linearity f; output is a vector of class probabilities

Cost: negative log-likelihood / categorical cross-entropy

slide-45
SLIDE 45

Regression (quantify something; real-valued output). Final layer: no non-linearity / identity as f. Cost: sum of squared differences.

slide-46
SLIDE 46

Reduce the cost c (also known as the loss) using gradient descent.

slide-47
SLIDE 47

Compute the derivative (gradient) of the cost w.r.t. the parameters (all W and b).

slide-48
SLIDE 48

Theano performs symbolic differentiation for you!

dCdW = theano.grad(cost, W)

(Other toolkits – such as Torch and TensorFlow – can also do this.)

slide-49
SLIDE 49

Update parameters:

W_i' = W_i - γ dc/dW_i
b_i' = b_i - γ dc/db_i

γ = learning rate

slide-50
SLIDE 50

Randomly split the training set into mini-batches of ~100 samples. Train on a mini-batch in a single step. The mini-batch cost is the mean of the costs of the samples in the mini-batch.

slide-51
SLIDE 51

Training on mini-batches means that ~100 samples are processed in parallel – very good for GPUs, which do lots of operations in parallel.
slide-52
SLIDE 52

A pass over all examples in the training set is called an epoch. Run multiple epochs (often 200-300).

slide-53
SLIDE 53

Summary – to train a neural network:

  • Take a mini-batch of training samples
  • Evaluate (run/execute) the network
  • Measure the average error/cost across the mini-batch
  • Use gradient descent to modify parameters to reduce the cost
  • REPEAT ABOVE UNTIL DONE
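A minimal NumPy-style sketch of that loop (plain SGD; gradients is a hypothetical helper standing in for the back-propagation step, so this is only an outline):

import numpy as np

def train(params, X_train, y_train, learning_rate=0.1,
          batch_size=100, num_epochs=300):
    n = len(X_train)
    for epoch in range(num_epochs):
        order = np.random.permutation(n)              # shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]     # take a mini-batch
            grads = gradients(params, X_train[idx], y_train[idx])
            for p, g in zip(params, grads):
                p -= learning_rate * g                # gradient descent step
    return params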

slide-54
SLIDE 54

Multi-layer perceptron

slide-55
SLIDE 55

The simplest network architecture; nothing we haven't seen so far. Uses only fully-connected / dense layers.

slide-56
SLIDE 56

Dense layer: each unit is connected to all units in the previous layer.

slide-57
SLIDE 57

(Obligatory) MNIST example: 2 hidden layers, both 256 units. After 300 iterations over the training set: 1.83% validation error.

Architecture: input 784 (28x28 images) → hidden 256 → hidden 256 → output 10
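A sketch of this architecture using standard Lasagne layers (a minimal outline, not the exact code used for the quoted result):

from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import rectify, softmax

# 28x28 input images; dense layers flatten the input automatically
net = InputLayer(shape=(None, 1, 28, 28))
net = DenseLayer(net, num_units=256, nonlinearity=rectify)   # hidden 256
net = DenseLayer(net, num_units=256, nonlinearity=rectify)   # hidden 256
net = DenseLayer(net, num_units=10, nonlinearity=softmax)    # output 10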

slide-58
SLIDE 58

MNIST is quite a special case: digits are nicely centred within the image and scaled to approximately the same size.

slide-59
SLIDE 59

The fully connected networks so far have a weakness: No translation invariance; learned features are position dependent

slide-60
SLIDE 60

For more general imagery this requires a training set large enough to see all features in all possible positions… and a network with enough units to represent all of this…

slide-61
SLIDE 61

Convolutional networks

slide-62
SLIDE 62

Convolution: slide a convolution kernel over an image; multiply the image pixels by the kernel values and sum.
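A naive NumPy sketch of a single-channel 'valid' convolution, just to make the operation concrete (real implementations are far more optimised):

import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the image patch by the kernel and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out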

slide-63
SLIDE 63

Convolutions are often used for feature detection.

slide-64
SLIDE 64

A brief detour…

slide-65
SLIDE 65

Gabor filters

slide-66
SLIDE 66

Back on track to… Convolutional networks

slide-67
SLIDE 67

Recap: FC (fully-connected) layer

f(x)

Diagram: input vector, weighted connections, bias, activation function (non-linearity), layer activation

slide-68
SLIDE 68

Convolutional layer

Each unit only connected to units in its neighbourhood

slide-69
SLIDE 69

Convolutional layer

Weights are shared: red weights have the same value, as do greens… and yellows.

slide-70
SLIDE 70

The values of the weights form a convolution kernel. For practical computer vision, more than one kernel must be used to extract a variety of features.

slide-71
SLIDE 71

Convolutional layer

Different weight-kernels: the output is an image with multiple channels.

slide-72
SLIDE 72

Note: each kernel connects to pixels in ALL channels of the previous layer.

slide-73
SLIDE 73

Still y = f(Wx + b), as convolution can be expressed as multiplication by a weight matrix.

slide-74
SLIDE 74

Down-sampling: in typical networks for computer vision we need to shrink the resolution after a layer by some constant factor. Use max-pooling or striding.

slide-75
SLIDE 75

Down-sampling: max-pooling ‘layer’ [Ciresan12]. Take the maximum value from each 2x2 pooling region (p x p in the general case). Down-samples the image by a factor of p. Operates on channels independently.
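A minimal NumPy sketch of 2x2 max-pooling on a single channel (assumes the height and width are divisible by 2):

import numpy as np

def max_pool_2x2(channel):
    h, w = channel.shape
    # group pixels into 2x2 blocks and take the maximum of each block
    return channel.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))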

slide-76
SLIDE 76

Down-sampling: striding. Can also down-sample using strided convolution: generate output for 1 in every n pixels. Faster; can work as well as max-pooling.

slide-77
SLIDE 77

Example: A Simplified LeNet [LeCun95] for MNIST digits

slide-78
SLIDE 78

Simplified LeNet for MNIST digits

Input: 1 x 28 x 28
Conv: 20 5x5 kernels → 20 x 24 x 24
Maxpool 2x2 → 20 x 12 x 12
Conv: 50 5x5 kernels → 50 x 8 x 8
Maxpool 2x2 → 50 x 4 x 4
(flatten and) fully connected → 256
Fully connected → 10 (output)
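A sketch of the same simplified LeNet with standard Lasagne layers (an outline under the shapes above, not the exact training code):

from lasagne.layers import InputLayer, Conv2DLayer, MaxPool2DLayer, DenseLayer
from lasagne.nonlinearities import rectify, softmax

net = InputLayer(shape=(None, 1, 28, 28))                     # 1 x 28 x 28
net = Conv2DLayer(net, num_filters=20, filter_size=(5, 5),
                  nonlinearity=rectify)                       # -> 20 x 24 x 24
net = MaxPool2DLayer(net, pool_size=(2, 2))                   # -> 20 x 12 x 12
net = Conv2DLayer(net, num_filters=50, filter_size=(5, 5),
                  nonlinearity=rectify)                       # -> 50 x 8 x 8
net = MaxPool2DLayer(net, pool_size=(2, 2))                   # -> 50 x 4 x 4
net = DenseLayer(net, num_units=256, nonlinearity=rectify)    # flatten + FC 256
net = DenseLayer(net, num_units=10, nonlinearity=softmax)     # FC 10 output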

slide-79
SLIDE 79

After 300 iterations over the training set: 99.21% validation accuracy

Model                            Error
FC64                             2.85%
FC256--FC256                     1.83%
20C5--MP2--50C5--MP2--FC256      0.79%

slide-80
SLIDE 80

What about the learned kernels?

Image taken from [Krizhevsky12] (ImageNet dataset, not MNIST) – the learned kernels resemble Gabor filters.

slide-81
SLIDE 81

Image taken from [Zeiler14]

slide-82
SLIDE 82

Image taken from [Zeiler14]

slide-83
SLIDE 83

Lasagne

slide-84
SLIDE 84

Specifying your network as mathematical expressions is powerful but low-level

slide-85
SLIDE 85

Lasagne is a neural network library built on Theano.

It makes building networks with Theano much easier.

slide-86
SLIDE 86

Provides an API for:

  • constructing the layers of a network
  • getting Theano expressions representing output, loss, etc.
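A minimal sketch of that workflow (standard Lasagne/Theano calls; net is assumed to be the output layer of a network built as above, with its input layer's input_var used for the compiled function):

import theano
import theano.tensor as T
import lasagne

targets = T.ivector('targets')

# Theano expression for the network output (class probabilities)
prediction = lasagne.layers.get_output(net)

# loss expression: categorical cross-entropy averaged over the mini-batch
loss = lasagne.objectives.categorical_crossentropy(prediction, targets).mean()

# gradient-descent updates for all trainable parameters
params = lasagne.layers.get_all_params(net, trainable=True)
updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)

# compile a training function
input_var = lasagne.layers.get_all_layers(net)[0].input_var
train_fn = theano.function([input_var, targets], loss, updates=updates)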

slide-87
SLIDE 87

Lasagne is quite a thin layer on top of Theano, so understanding Theano is helpful. On the plus side, implementing custom layers, loss functions, etc. is quite doable.

slide-88
SLIDE 88

Intro to Theano and Lasagne slides: https://speakerdeck.com/britefury

https://speakerdeck.com/britefury/intro-to-theano-and-lasagne-for-deep-learning

slide-89
SLIDE 89

Notes for building and training neural networks

slide-90
SLIDE 90

Neural network architecture (OxfordNet / VGG style)

slide-91
SLIDE 91

# Layer
Input: 3 x 224 x 224 (RGB image, zero-mean)
1: 64C3
2: 64C3
   MP2
3: 128C3
4: 128C3
   MP2

Early part: blocks consisting of a few convolutional layers, often 3x3 kernels, followed by down-sampling (max-pooling or striding).

64C3 = 3x3 conv, 64 filters; MP2 = max-pooling, 2x2

slide-92
SLIDE 92

(Same layer table as above.)

Notation: 64C3 = convolutional layer with 64 3x3 filters; MP2 = max-pooling, 2x2

slide-93
SLIDE 93

(Same layer table as above.)

Note: after down-sampling, double the number of convolutional filters.

slide-94
SLIDE 94

(Layer table as above, now followed by FC256 and FC10.)

Later part: after the blocks of convolutional and down-sampling layers come fully-connected (a.k.a. dense) layers.

slide-95
SLIDE 95

(Same layer table as above.)

Notation: FC256 = fully-connected layer with 256 channels

slide-96
SLIDE 96

(Same layer table as above.)

Overall: the convolutional layers detect features in various positions throughout the image.

slide-97
SLIDE 97

(Same layer table as above.)

Overall: the fully-connected / dense layers use the features detected by the convolutional layers to produce the output.
slide-98
SLIDE 98

Could also look at architectures developed by others, e.g. Inception by Google, or ResNets by Microsoft, for inspiration.

slide-99
SLIDE 99

Batch normalization

slide-100
SLIDE 100

Batch normalization [Ioffe15] is recommended in most cases; it is necessary for deeper networks (> 8 layers).

slide-101
SLIDE 101

Speeds up training: the cost drops faster per epoch, although epochs take longer (~2x in my experience). Can also reach lower error rates.

slide-102
SLIDE 102

Layers can magnify or shrink magnitudes of values. Multiple layers can result in exponential increase/decrease. Batch normalisation maintains constant scale throughout network

slide-103
SLIDE 103

Insert into convolutional and fully-connected layers after the matrix multiplication/convolution, before the non-linearity.

slide-104
SLIDE 104

Lasagne batch normalization inserts itself into a layer before the non-linearity, so it's nice and easy to use:

lyr = lasagne.layers.batch_norm(lyr)

slide-105
SLIDE 105

DropOut

slide-106
SLIDE 106

Normally necessary for training (turned off at predict/test time). Reduces over-fitting.

slide-107
SLIDE 107

Over-fitting is a well-known problem in machine learning that affects neural networks particularly. A model over-fits when it is very good at correctly predicting samples in the training set but fails to generalise to samples outside it.

slide-108
SLIDE 108

DropOut [Hinton12]: during training, randomly choose units to ‘drop out’ by setting their output to 0, with probability p, usually around 0.5 (compensate by multiplying values by 1 / (1 - p)).

slide-109
SLIDE 109

During test/predict: Run as normal (DropOut turned off)

slide-110
SLIDE 110

Normally applied after the later, fully-connected layers:

lyr = lasagne.layers.DenseLayer(lyr, num_units=256) lyr = lasagne.layers.DropoutLayer(lyr, p=0.5)

slide-111
SLIDE 111

Dropout OFF

Input layer Hidden layer 0 Output layer

slide-112
SLIDE 112

Dropout ON (1)

Input layer Hidden layer 0 Output layer

slide-113
SLIDE 113

Dropout ON (2)

Input layer Hidden layer 0 Output layer

slide-114
SLIDE 114

Turning on a different subset of units for each sample: causes units to learn more robust features that cannot rely on the presence of other specific features to cover for flaws

slide-115
SLIDE 115

Dataset augmentation

slide-116
SLIDE 116

Reduce over-fitting by enlarging the training set: artificially modify existing training samples to make new ones.

slide-117
SLIDE 117

For images: Apply transformations such as move, scale, rotate, reflect, etc.
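A minimal sketch of such transformations for a single-channel image using NumPy and SciPy (the ranges and choice of transformations are only illustrative):

import numpy as np
from scipy.ndimage import rotate, shift

def augment(image):
    # random horizontal reflection
    if np.random.rand() < 0.5:
        image = image[:, ::-1]
    # random shift of up to +/- 2 pixels in each direction
    dy, dx = np.random.randint(-2, 3, size=2)
    image = shift(image, (dy, dx), mode='nearest')
    # random rotation of up to +/- 10 degrees
    image = rotate(image, np.random.uniform(-10, 10), reshape=False, mode='nearest')
    return image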

slide-118
SLIDE 118

Data standardisation

slide-119
SLIDE 119

Neural networks train more effectively when the training data has zero mean and unit variance.

slide-120
SLIDE 120

Standardise the input data. In the case of regression, standardise the output data too (don't forget to invert the standardisation of network predictions!).

slide-121
SLIDE 121

Standardisation: extract samples into an array. In the case of images, extract all pixels from all samples, keeping the R, G & B channels separate. Compute the distribution and standardise.

slide-122
SLIDE 122

Either: zero the mean and scale the std-dev to 1, per channel (RGB for images):

x' = (x - μ_x) / σ_x
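A minimal NumPy sketch of per-channel standardisation, assuming a batch of RGB images with shape (N, 3, H, W):

import numpy as np

def standardise(images):
    # per-channel mean and std-dev over all samples and pixels
    mean = images.mean(axis=(0, 2, 3), keepdims=True)
    std = images.std(axis=(0, 2, 3), keepdims=True)
    return (images - mean) / std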

slide-123
SLIDE 123

When training goes wrong and what to look for

slide-124
SLIDE 124

Loss becomes NaN (ensure you track the loss after each epoch so you can watch for this!)

slide-125
SLIDE 125

Classification error rate is equivalent to a random guess (it's not learning).

slide-126
SLIDE 126

Learns to predict a constant value; optimises the constant value for the best loss. A constant value is a local minimum that the network won't get out of (neural networks ‘cheat’ like crazy!).

slide-127
SLIDE 127

Neural networks (most) often DON’T learn what you want or expect them to

slide-128
SLIDE 128

Local minima will be the bane of your existence

slide-129
SLIDE 129

Designing a computer vision pipeline

slide-130
SLIDE 130

Simple problems may be solved with just a neural network

slide-131
SLIDE 131

Not sufficient for more complex problems (neural networks aren’t a silver bullet; don’t believe the hype)

slide-132
SLIDE 132

Theoretically possible to use a single network for a complex problem if you have enough training data (often an impractical amount)

slide-133
SLIDE 133

For more complex problems, the problem should be broken down

slide-134
SLIDE 134

Example: identifying right whales, by Felix Lau; 2nd place in the Kaggle competition. http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html

slide-135
SLIDE 135

Identifying right whales, by Felix Lau The first naïve solution – training a classifier to identify individuals – did not work well

slide-136
SLIDE 136

Region-based saliency map revealed that the network had ‘locked on’ to features in the ocean shape rather than the whales

slide-137
SLIDE 137

Lau’s solution: Train a localiser neural network to locate the whale in the image

slide-138
SLIDE 138

Lau’s solution: Train a keypoint finder neural network to locate two keypoints on the whale’s head to identify its orientation

slide-139
SLIDE 139

Lau’s solution: train a classifier neural network on oriented and cropped whale head images.

slide-140
SLIDE 140

OxfordNet / VGG and transfer learning

slide-141
SLIDE 141

Using a pre-trained network

slide-142
SLIDE 142

Use Oxford VGG-19, the 19-layer model: a 1000-class image classifier, trained on ImageNet.

slide-143
SLIDE 143

Can download CC-licensed weights (in Caffe format) from:

http://www.robots.ox.ac.uk/~vgg/research/very_deep/

The GitHub repo contains code that downloads a Python version from:

http://s3.amazonaws.com/lasagne/recipes/pretrained/imagenet/vgg19.pkl

slide-144
SLIDE 144

VGG models are simple but effective. They consist of: 3x3 convolutions, 2x2 max pooling, and fully-connected layers.

slide-145
SLIDE 145

# Layer
Input: 3 x 224 x 224 (RGB image, zero-mean)
1: 64C3
2: 64C3
   MP2
3: 128C3
4: 128C3
   MP2
5: 256C3
6: 256C3
7: 256C3
8: 256C3
   MP2
9: 512C3
10: 512C3
11: 512C3
12: 512C3
    MP2
13: 512C3
14: 512C3
15: 512C3
16: 512C3
    MP2
17: FC4096 (dropout 50%)
18: FC4096 (dropout 50%)
19: FC1000 soft-max

slide-146
SLIDE 146

Exercise / Demo Classifying an image with VGG-19

slide-147
SLIDE 147

Transfer learning (network re-use)

slide-148
SLIDE 148

Training a neural network is notoriously data-hungry. Preparing training data with ground truths is expensive and time consuming.

slide-149
SLIDE 149

What if we don’t have enough training data to get good results?

slide-150
SLIDE 150

The ImageNet dataset is huge: millions of images with ground truths.

What if we could somehow use it to help us with a different task?

slide-151
SLIDE 151

Good news: we can!

slide-152
SLIDE 152

Transfer learning Re-use part (often most) of a pre-trained network for a new task

slide-153
SLIDE 153

Example; can re-use part of VGG-19 net for: Classifying images with classes that weren’t part of the original ImageNet dataset

slide-154
SLIDE 154

Example; can re-use part of the VGG-19 net for: localisation (find the location of an object in an image); segmentation (find the exact boundary around an object in an image).
slide-155
SLIDE 155

Transfer learning, how to: take an existing network such as VGG-19.

slide-156
SLIDE 156

(The full VGG-19 layer table from above: layers 1-16 convolutional + max-pooling, 17-18 FC4096 with 50% dropout, 19 FC1000 soft-max.)

slide-157
SLIDE 157

# Layer
9: 512C3
10: 512C3
11: 512C3
12: 512C3
    MP2
13: 512C3
14: 512C3
15: 512C3
16: 512C3
    MP2

Remove the last layers, e.g. the fully-connected ones (just 17, 18, 19; the earlier layers are hidden here for brevity!)

slide-158
SLIDE 158

# Layer
9: 512C3
10: 512C3
11: 512C3
12: 512C3
    MP2
13: 512C3
14: 512C3
15: 512C3
16: 512C3
    MP2
17: FC1024 (drop 50%)
18: FC21 soft-max

Build new, randomly initialised layers to replace them (the number of layers created and their sizes are only for illustration here).

slide-159
SLIDE 159

Transfer learning, training: train the network with your training data, only learning parameters for the new layers.
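A rough Lasagne sketch of this step (pretrained_top is assumed to be the last kept layer of the pre-trained network; the new layer sizes just mirror the illustration above):

import lasagne
from lasagne.layers import DenseLayer, DropoutLayer
from lasagne.nonlinearities import rectify, softmax

# new, randomly initialised layers replacing the removed ones
net = DenseLayer(pretrained_top, num_units=1024, nonlinearity=rectify)
net = DropoutLayer(net, p=0.5)
net = DenseLayer(net, num_units=21, nonlinearity=softmax)

# train only the parameters belonging to the new layers at first
all_params = lasagne.layers.get_all_params(net, trainable=True)
old_params = lasagne.layers.get_all_params(pretrained_top, trainable=True)
new_params = [p for p in all_params if p not in old_params]
# ...build the loss as usual and pass new_params to lasagne.updates;
# for fine-tuning afterwards, pass all_params instead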

slide-160
SLIDE 160

Transfer learning: fine-tuning After learning parameters for the new layers, fine-tune by learning parameters for the whole network to get better accuracy

slide-161
SLIDE 161

Result: a nice shiny network with good performance, trained with much less of our own training data.

slide-162
SLIDE 162

Some cool work in the field that might be of interest

slide-163
SLIDE 163

Visualizing and understanding convolutional networks [Zeiler14] Visualisations of responses of layers to images

slide-164
SLIDE 164

Visualizing and understanding convolutional networks [Zeiler14]

Image taken from [Zeiler14]

slide-165
SLIDE 165

Visualizing and understanding convolutional networks [Zeiler14]

Image taken from [Zeiler14]

slide-166
SLIDE 166

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen15]. Generate images that are unrecognizable to human eyes but are recognized by the network.

slide-167
SLIDE 167

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen15]

Image taken from [Nguyen15]

slide-168
SLIDE 168

Learning to generate chairs with convolutional neural networks [Dosovitskiy15]. A network in reverse: orientation, design, colour, etc. parameters as input; rendered images as output / training images.

slide-169
SLIDE 169

Learning to generate chairs with convolutional neural networks [Dosovitskiy15]

Image taken from [Dosovitskiy15]

slide-170
SLIDE 170

A Neural Algorithm of Artistic Style [Gatys15]. Take an OxfordNet model [Simonyan14] and extract texture features from one of the convolutional layers, given a target style / painting as input. Use gradient descent to iterate the photo – not the weights – so that its texture features match those of the target image.

slide-171
SLIDE 171

A Neural Algorithm of Artistic Style [Gatys15]

Image taken from [Gatys15]

slide-172
SLIDE 172

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Nets [Radford15]. Train two networks: one given random parameters to generate an image, another to discriminate between a generated image and one from the training set.
slide-173
SLIDE 173

Generative Adversarial Nets [Radford15]

Images of bedrooms generated using the neural net. Image taken from [Radford15].

slide-174
SLIDE 174

Generative Adversarial Nets [Radford15]

Image taken from [Radford15]

slide-175
SLIDE 175

Hope you’ve found it helpful!

slide-176
SLIDE 176

Thank you!

slide-177
SLIDE 177

References

slide-178
SLIDE 178

[Dosovitskiy15] Dosovitskiy, Springenberg and Brox; Learning to generate chairs with convolutional neural networks, arXiv preprint, 2015

slide-179
SLIDE 179

[Gatys15] Gatys, Ecker, Bethge; A Neural Algorithm of Artistic Style, arXiv:1508.06576, 2015

slide-180
SLIDE 180

[He15a] He, Zhang, Ren and Sun; Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, arXiv 2015

slide-181
SLIDE 181

[He15b] He, Kaiming, et al. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).

slide-182
SLIDE 182

[Hinton12] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever and R. R. Salakhutdinov; Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.

slide-183
SLIDE 183

[Ioffe15] Ioffe, S.; Szegedy, C. (2015). “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". ICML 2015, arXiv:1502.03167

slide-184
SLIDE 184

[Jones87] Jones, J.P.; Palmer, L.A. (1987). "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex". J. Neurophysiol 58 (6): 1233–1258

slide-185
SLIDE 185

[Lin13] Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).

slide-186
SLIDE 186

[Nesterov83] Nesterov, Y. A method of solving a convex programming problem with convergence rate O(1/sqr(k)). Soviet Mathematics Doklady, 27:372–376 (1983).

slide-187
SLIDE 187

[Radford15] Radford, Metz, Chintala; Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, arXiv:1511.06434, 2015

slide-188
SLIDE 188

[Sutskever13] Sutskever, Ilya, et al. On the importance of initialization and momentum in deep learning. Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.

slide-189
SLIDE 189

[Simonyan14] K. Simonyan and A. Zisserman; Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556, 2014

slide-190
SLIDE 190

[Wang14] Wang, Dan, and Yi Shang. "A new active labeling method for deep learning." Neural Networks (IJCNN), 2014 International Joint Conference on. IEEE, 2014.

slide-191
SLIDE 191

[Zeiler14] Zeiler and Fergus; Visualizing and understanding convolutional networks, Computer Vision - ECCV 2014