Lecture 14 - 29 Feb 2016 - PowerPoint PPT Presentation

Fei-Fei Li & Andrej Karpathy & Justin Johnson


SLIDE 1

Lecture 14 - 29 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson

SLIDE 2

Administrative

  • Everyone should be done with Assignment 3 now
  • Milestone grades will go out soon

SLIDE 3

Last class

Segmentation, Spatial Transformer, Soft Attention

SLIDE 4

Videos

SLIDE 5

ConvNets for images

SLIDE 6

Feature-based approaches to Activity Recognition


Dense trajectories and motion boundary descriptors for action recognition, Wang et al., 2013
Action Recognition with Improved Trajectories, Wang and Schmid, 2013 (code available!)

SLIDE 7

Dense trajectories and motion boundary descriptors for action recognition Wang et al., 2013

  • Detect feature points
  • Track features with optical flow
  • Extract HOG/HOF/MBH features in the (stabilized) coordinate system of each tracklet

SLIDE 8

[J. Shi and C. Tomasi, “Good features to track,” CVPR 1994] [Ivan Laptev 2005]

detected feature points

Dense trajectories and motion boundary descriptors for action recognition Wang et al., 2013

SLIDE 9

Dense trajectories and motion boundary descriptors for action recognition Wang et al., 2013

Track each keypoint using optical flow.

[G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” 2003] [T. Brox and J. Malik, “Large displacement optical flow: Descriptor matching in variational motion estimation,” 2011]

SLIDE 10

Dense trajectories and motion boundary descriptors for action recognition, Wang et al., 2013

Extract features in the local coordinate system of each tracklet. Accumulate into histograms, separately according to multiple spatio-temporal layouts.

SLIDE 11

Case Study: AlexNet

[Krizhevsky et al. 2012]

Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4 => Output volume [55x55x96]

Q: What if the input is now a small chunk of video, e.g. [227x227x3x15]?

SLIDE 12

Case Study: AlexNet

[Krizhevsky et al. 2012]

Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4 => Output volume [55x55x96]

Q: What if the input is now a small chunk of video, e.g. [227x227x3x15]?
A: Extend the convolutional filters in time and perform spatio-temporal convolutions! E.g. filters can be 11x11xT, where T = 2..15.
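To make the answer concrete, here is a minimal numpy sketch of one spatio-temporal filter sliding over a video chunk (naive loops, valid convolution, stride 1 in time; the function name and shapes are illustrative, not from the slides):

    import numpy as np

    def conv3d_single_filter(video, w, b=0.0, stride=4):
        # video: (H, W, C, T), e.g. (227, 227, 3, 15); w: (k, k, C, t), e.g. (11, 11, 3, 3)
        H, W, C, T = video.shape
        k, _, _, t = w.shape
        H_out = (H - k) // stride + 1
        W_out = (W - k) // stride + 1
        T_out = T - t + 1  # stride 1 along time
        out = np.zeros((H_out, W_out, T_out))
        for i in range(H_out):
            for j in range(W_out):
                for f in range(T_out):
                    patch = video[i*stride:i*stride+k, j*stride:j*stride+k, :, f:f+t]
                    out[i, j, f] = np.sum(patch * w) + b  # dot product of filter and patch
        return out

    clip = np.random.randn(227, 227, 3, 15)
    w = np.random.randn(11, 11, 3, 3)              # an 11x11xT filter with T = 3
    print(conv3d_single_filter(clip, w).shape)     # (55, 55, 13)

With 96 such filters, CONV1 would produce a 55x55x13x96 activation volume instead of AlexNet's 55x55x96.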

SLIDE 13

Spatio-Temporal ConvNets

[3D Convolutional Neural Networks for Human Action Recognition, Ji et al., 2010]

SLIDE 14

Spatio-Temporal ConvNets

Sequential Deep Learning for Human Action Recognition, Baccouche et al., 2011

SLIDE 15

Spatio-Temporal ConvNets

[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]

Slow fusion (spatio-temporal convolutions throughout the network) worked best.

SLIDE 16

Spatio-Temporal ConvNets

[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]

Learned filters on the first layer

SLIDE 17

Spatio-Temporal ConvNets

[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]

1 million videos, 487 sports classes

SLIDE 18

Spatio-Temporal ConvNets

[Large-scale Video Classification with Convolutional Neural Networks, Karpathy et al., 2014]

The motion information didn’t add all that much...

SLIDE 19

Spatio-Temporal ConvNets

SLIDE 20

Spatio-Temporal ConvNets

[Learning Spatiotemporal Features with 3D Convolutional Networks, Tran et al. 2015]

3D VGGNet, basically.

SLIDE 21

Spatio-Temporal ConvNets

[Two-Stream Convolutional Networks for Action Recognition in Videos, Simonyan and Zisserman 2014]

[T. Brox and J. Malik, “Large displacement optical flow: Descriptor matching in variational motion estimation,” 2011]

(of VGGNet fame)

SLIDE 22

Spatio-Temporal ConvNets

[Two-Stream Convolutional Networks for Action Recognition in Videos, Simonyan and Zisserman 2014]

[T. Brox and J. Malik, “Large displacement optical flow: Descriptor matching in variational motion estimation,” 2011]

Two-stream version works much better than either alone.
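The fusion itself is simple: the paper combines the streams late, e.g. by averaging their class scores. A minimal sketch of that averaging step (function names are illustrative):

    import numpy as np

    def softmax(s):
        e = np.exp(s - s.max())
        return e / e.sum()

    def two_stream_fuse(spatial_scores, temporal_scores):
        # Late fusion: average the class posteriors of the RGB (spatial) stream
        # and the stacked-optical-flow (temporal) stream.
        return 0.5 * (softmax(spatial_scores) + softmax(temporal_scores))

    fused = two_stream_fuse(np.random.randn(101), np.random.randn(101))  # e.g. 101 classes
    print(fused.argmax(), fused.sum())   # predicted class; probabilities sum to 1.0

(The paper also reports training an SVM on the stacked softmax scores, which works slightly better than plain averaging.)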

SLIDE 23

Long-time Spatio-Temporal ConvNets

All 3D ConvNets so far use local motion cues to gain extra accuracy (roughly half a second of temporal context).

Q: What if the temporal dependencies of interest are much, much longer, e.g. several seconds separating event 1 from event 2?

SLIDE 24

Sequential Deep Learning for Human Action Recognition, Baccouche et al., 2011

Long-time Spatio-Temporal ConvNets

(This paper was way ahead of its time. Cited 65 times.)

SLIDE 25

Sequential Deep Learning for Human Action Recognition, Baccouche et al., 2011

Long-time Spatio-Temporal ConvNets

LSTM way before it was cool (This paper was way ahead of its time. Cited 65 times.)

SLIDE 26

Long-time Spatio-Temporal ConvNets

[Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al., 2015]

SLIDE 27

Long-time Spatio-Temporal ConvNets

[Beyond Short Snippets: Deep Networks for Video Classification, Ng et al., 2015]

SLIDE 28

Summary so far

We looked at two types of architectural patterns:

  1. Model temporal motion locally (3D CONV)
  2. Model temporal motion globally (LSTM / RNN)

+ Fusions of both approaches at the same time.

SLIDE 29

Summary so far

We looked at two types of architectural patterns:

  1. Model temporal motion locally (3D CONV)
  2. Model temporal motion globally (LSTM / RNN)

+ Fusions of both approaches at the same time.


There is another (cleaner) way!

SLIDE 30

3D CONVNET on video: finite temporal extent (neurons are a function of only finitely many video frames in the past).

RNN on video: infinite (in theory) temporal extent (neurons are a function of all video frames in the past).
SLIDE 31

Long-time Spatio-Temporal ConvNets

[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]

Beautiful: All neurons in the ConvNet are recurrent. Only requires (existing) 2D CONV routines. No need for 3D spatio-temporal CONV.

SLIDE 32

Long-time Spatio-Temporal ConvNets

[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]

Normal ConvNet: Convolution Layer

SLIDE 33

Long-time Spatio-Temporal ConvNets

[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]

Diagram: layer N -> CONV -> layer N+1, with layer N+1 at the previous timestep fed back through a second CONV: an RNN-like recurrence (GRU)

SLIDE 34

Long-time Spatio-Temporal ConvNets

[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]

Recall: RNNs

Vanilla RNN / LSTM / GRU

SLIDE 35

Long-time Spatio-Temporal ConvNets

[Delving Deeper into Convolutional Networks for Learning Video Representations, Ballas et al., 2016]

Recall: RNNs

GRU

Matrix multiply => CONV
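Concretely: take the standard GRU update and replace each matrix multiply with a convolution, so the hidden state stays a spatial feature map. A minimal numpy/scipy sketch ('same' padding, biases omitted, names illustrative):

    import numpy as np
    from scipy.signal import correlate2d

    def conv2d(x, w):
        # 'same' 2D convolution: x is (C_in, H, W), w is (C_out, C_in, k, k)
        out = np.zeros((w.shape[0], x.shape[1], x.shape[2]))
        for o in range(w.shape[0]):
            for i in range(x.shape[0]):
                out[o] += correlate2d(x[i], w[o, i], mode='same')
        return out

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def conv_gru_step(x, h_prev, Wz, Uz, Wr, Ur, W, U):
        # Every matrix multiply of a GRU becomes a convolution, so h keeps the
        # (C, H, W) layout of the CONV feature map it gates.
        z = sigmoid(conv2d(x, Wz) + conv2d(h_prev, Uz))          # update gate
        r = sigmoid(conv2d(x, Wr) + conv2d(h_prev, Ur))          # reset gate
        h_tilde = np.tanh(conv2d(x, W) + conv2d(r * h_prev, U))  # candidate state
        return (1.0 - z) * h_prev + z * h_tilde

    C, H, W_sz, k = 8, 14, 14, 3
    params = [np.random.randn(C, C, k, k) * 0.01 for _ in range(6)]
    h = np.zeros((C, H, W_sz))
    for _ in range(5):                   # five frames' worth of feature maps
        h = conv_gru_step(np.random.randn(C, H, W_sz), h, *params)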

SLIDE 36

3D CONVNET on video: finite temporal extent (neurons are a function of only finitely many video frames in the past).

RNN on video: infinite (in theory) temporal extent (neurons are a function of all video frames in the past).
SLIDE 37

RNN CONVNET on video: infinite (in theory) temporal extent (neurons are a function of all video frames in the past); i.e. we obtain:

SLIDE 38

Summary

  • You think you need a Spatio-Temporal Fancy Video ConvNet
  • STOP. Do you really?
  • Okay fine: do you want to model:
    ○ local motion? (use 3D CONV), or
    ○ global motion? (use LSTM).
  • Try out using Optical Flow in a second stream (can work better sometimes)
  • Try out GRU-RCN! (imo best model)

SLIDE 39

Unsupervised Learning

SLIDE 40

Unsupervised Learning Overview

  • Definitions
  • Autoencoders
    ○ Vanilla
    ○ Variational
  • Adversarial Networks

SLIDE 41

Supervised vs Unsupervised

Supervised Learning

Data: (x, y)
x is data, y is label

Goal: Learn a function to map x -> y

Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.

SLIDE 42

Supervised vs Unsupervised

Supervised Learning

Data: (x, y)
x is data, y is label

Goal: Learn a function to map x -> y

Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.

Unsupervised Learning

Data: x
Just data, no labels!

Goal: Learn some structure of the data

Examples: Clustering, dimensionality reduction, feature learning, generative models, etc.

SLIDE 43

Unsupervised Learning

  • Autoencoders
    ○ Traditional: feature learning
    ○ Variational: generate samples
  • Generative Adversarial Networks: generate samples
SLIDE 44

Autoencoders

Diagram: Input data x -> Encoder -> Features z

SLIDE 45

Autoencoders

Diagram: Input data x -> Encoder -> Features z

Encoder:
Originally: Linear + nonlinearity (sigmoid)
Later: Deep, fully-connected
Later: ReLU CNN

SLIDE 46

Autoencoders

Diagram: Input data x -> Encoder -> Features z

Originally: Linear + nonlinearity (sigmoid)
Later: Deep, fully-connected
Later: ReLU CNN

z usually smaller than x (dimensionality reduction)

SLIDE 47

Autoencoders

Diagram: Input data x -> Encoder -> Features z -> Decoder -> Reconstructed input data x̂

SLIDE 48

Autoencoders

Diagram: Input data x -> Encoder -> Features z -> Decoder -> Reconstructed input data x̂

Originally: Linear + nonlinearity (sigmoid)
Later: Deep, fully-connected
Later: ReLU CNN (upconv)

Example: Encoder: 4-layer conv; Decoder: 4-layer upconv

SLIDE 49

Autoencoders

Diagram: Input data x -> Encoder -> Features z -> Decoder -> Reconstructed input data x̂

Originally: Linear + nonlinearity (sigmoid)
Later: Deep, fully-connected
Later: ReLU CNN (upconv)

Train for reconstruction with no labels!

Encoder / decoder sometimes share weights. Example: dim(x) = D, dim(z) = H, We: H x D, Wd: D x H = We^T
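A minimal numpy sketch of this one-layer tied-weight setup with an L2 reconstruction loss (biases omitted; D and H are the example's dimensions):

    import numpy as np

    D, H = 784, 128                      # dim(x) = D, dim(z) = H
    We = np.random.randn(H, D) * 0.01    # encoder weights; decoder reuses We.T

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    x = np.random.randn(D)               # input data
    z = sigmoid(We @ x)                  # encoder: features
    x_hat = We.T @ z                     # decoder with tied weights Wd = We.T
    loss = np.sum((x_hat - x) ** 2)      # L2 reconstruction loss; no labels needed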

SLIDE 50

Autoencoders

Diagram: Input data x -> Encoder -> Features z -> Decoder -> Reconstructed input data x̂, with a loss function (often L2) between x and x̂

Train for reconstruction with no labels!

SLIDE 51

Autoencoders

Diagram: Input data x -> Encoder -> Features z -> Decoder -> Reconstructed input data x̂

After training, throw away decoder!

SLIDE 52

Autoencoders

Diagram: Input data x -> Encoder -> Features z -> Classifier -> Predicted label ŷ, trained with a loss function (softmax, etc.) against the true label y (plane, dog, deer, bird, truck, ...)

Use encoder to initialize a supervised model

Train for final task (sometimes with small data)

Fine-tune encoder jointly with classifier

SLIDE 53

Autoencoders: Greedy Training

Hinton and Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks”, Science 2006

In the mid-2000s, layer-wise pretraining with Restricted Boltzmann Machines (RBMs) was common. Training deep nets was hard in 2006!

SLIDE 54

Autoencoders: Greedy Training

Hinton and Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks”, Science 2006

In the mid-2000s, layer-wise pretraining with Restricted Boltzmann Machines (RBMs) was common. Training deep nets was hard in 2006!

Not common anymore: with ReLU, proper initialization, batchnorm, Adam, etc., you can easily train from scratch.

SLIDE 55

Autoencoders

Diagram: Input data x -> Encoder -> Features z -> Decoder -> Reconstructed input data x̂

Autoencoders can reconstruct data and can learn features to initialize a supervised model. Can we generate images from an autoencoder?

SLIDE 56

Variational Autoencoder

A Bayesian spin on an autoencoder - lets us generate data! Assume our data is generated like this:

Sample z from the true prior, then sample x from the true conditional.

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
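In the paper's notation (a reconstruction of the formulas the slide shows graphically):

\[
z \sim p_{\theta^*}(z), \qquad x \sim p_{\theta^*}(x \mid z)
\]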

SLIDE 57

Variational Autoencoder

A Bayesian spin on an autoencoder! Assume our data is generated like this:

Sample z from the true prior, then sample x from the true conditional.

Intuition: x is an image, z gives class, orientation, attributes, etc.

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

SLIDE 58

Variational Autoencoder

A Bayesian spin on an autoencoder! Assume our data is generated like this:

Sample z from the true prior, then sample x from the true conditional.

Intuition: x is an image, z gives class, orientation, attributes, etc.

Problem: Estimate the parameters without access to the latent states!

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

SLIDE 59

Variational Autoencoder

Prior: Assume p(z) is a unit Gaussian

Kingma and Welling, ICLR 2014

SLIDE 60

Variational Autoencoder

Prior: Assume p(z) is a unit Gaussian

Conditional: Assume p(x|z) is a diagonal Gaussian; predict its mean and variance with a neural net

Kingma and Welling, ICLR 2014

SLIDE 61

Variational Autoencoder

Diagram: Latent state z -> Decoder network (parameters θ) -> mean μx and (diagonal) covariance Σx of x

Prior: Assume p(z) is a unit Gaussian

Conditional: Assume p(x|z) is a diagonal Gaussian; predict its mean and variance with a neural net

Kingma and Welling, ICLR 2014

SLIDE 62

Variational Autoencoder

Diagram: Latent state z -> Decoder network (parameters θ) -> mean μx and (diagonal) covariance Σx of x

Prior: Assume p(z) is a unit Gaussian

Conditional: Assume p(x|z) is a diagonal Gaussian; predict its mean and variance with a neural net (fully-connected or upconvolutional)

Kingma and Welling, ICLR 2014

SLIDE 63

Variational Autoencoder: Encoder

By Bayes' rule the posterior is:

p(z|x) = p(x|z) p(z) / p(x)

Kingma and Welling, ICLR 2014

SLIDE 64

Variational Autoencoder: Encoder

By Bayes' rule the posterior is:

p(z|x) = p(x|z) p(z) / p(x)

p(x|z): use decoder network =)   p(z): Gaussian =)   p(x): intractable integral =(

Kingma and Welling, ICLR 2014

SLIDE 65

Variational Autoencoder: Encoder

By Bayes' rule the posterior is:

p(z|x) = p(x|z) p(z) / p(x)

p(x|z): use decoder network =)   p(z): Gaussian =)   p(x): intractable integral =(

Diagram: Data point x -> Encoder network (parameters φ) -> mean μz and (diagonal) covariance Σz of z

Kingma and Welling, ICLR 2014

SLIDE 66

Variational Autoencoder: Encoder

By Bayes' rule the posterior is:

p(z|x) = p(x|z) p(z) / p(x)

p(x|z): use decoder network =)   p(z): Gaussian =)   p(x): intractable integral =(

Approximate the posterior with an encoder network: Data point x -> Encoder network (parameters φ) -> mean μz and (diagonal) covariance Σz of z

Kingma and Welling, ICLR 2014

SLIDE 67

Variational Autoencoder: Encoder

By Bayes' rule the posterior is:

p(z|x) = p(x|z) p(z) / p(x)

p(x|z): use decoder network =)   p(z): Gaussian =)   p(x): intractable integral =(

Approximate the posterior with an encoder network (fully-connected or convolutional): Data point x -> Encoder network (parameters φ) -> mean μz and (diagonal) covariance Σz of z

Kingma and Welling, ICLR 2014

SLIDE 68

Variational Autoencoder

Diagram: Data point x

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

SLIDE 69

Variational Autoencoder

Diagram: Data point x -> Encoder network -> mean μz and (diagonal) covariance Σz of z

Kingma and Welling, ICLR 2014

SLIDE 70

Variational Autoencoder

Diagram: Data point x -> Encoder network -> μz, Σz -> sample z from N(μz, Σz)

Kingma and Welling, ICLR 2014

SLIDE 71

Variational Autoencoder

Diagram: Data point x -> Encoder network -> μz, Σz -> sample z from N(μz, Σz) -> Decoder network -> mean μx and (diagonal) covariance Σx of x

Kingma and Welling, ICLR 2014

SLIDE 72

Variational Autoencoder

Diagram: Data point x -> Encoder network -> μz, Σz -> sample z from N(μz, Σz) -> Decoder network -> μx, Σx -> sample reconstructed x̂ from N(μx, Σx)

Kingma and Welling, ICLR 2014

SLIDE 73

Variational Autoencoder

Diagram: Data point x -> Encoder network -> μz, Σz (should be close to the prior) -> sample z -> Decoder network -> μx, Σx (should be close to the data x) -> sample reconstructed x̂

Training like a normal autoencoder: reconstruction loss at the end, regularization toward the prior in the middle.

Kingma and Welling, ICLR 2014

SLIDE 74

Variational Autoencoder: Generate Data!

After the network is trained: sample z from the prior

SLIDE 75

Variational Autoencoder: Generate Data!

After the network is trained: sample z from the prior -> Decoder network -> μx, Σx

SLIDE 76

Variational Autoencoder: Generate Data!

After the network is trained: sample z from the prior -> Decoder network -> μx, Σx -> sample generated x̂ from N(μx, Σx)

SLIDE 77

Variational Autoencoder: Generate Data!

After the network is trained: sample z from the prior -> Decoder network -> μx, Σx -> sample generated x̂ from N(μx, Σx)

SLIDE 78

Variational Autoencoder: Generate Data!

After the network is trained: sample z from the prior -> Decoder network -> μx, Σx -> sample generated x̂ from N(μx, Σx)

SLIDE 79

Variational Autoencoder: Generate Data!

After the network is trained: sample z from the prior -> Decoder network -> μx, Σx -> sample generated x̂ from N(μx, Σx)

Diagonal prior on z => independent latent variables

SLIDE 80

Variational Autoencoder: Math

Maximum Likelihood? Maximize the likelihood of the dataset.

Kingma and Welling, ICLR 2014

SLIDE 81

Variational Autoencoder: Math

Maximum Likelihood? Maximize the likelihood of the dataset. Maximize the log-likelihood instead because sums are nicer.

Kingma and Welling, ICLR 2014

SLIDE 82

Variational Autoencoder: Math

Maximum Likelihood? Maximize the likelihood of the dataset. Maximize the log-likelihood instead because sums are nicer. Marginalize the joint distribution.

Kingma and Welling, ICLR 2014

SLIDE 83

Variational Autoencoder: Math

Maximum Likelihood? Maximize the likelihood of the dataset. Maximize the log-likelihood instead because sums are nicer. Marginalizing yields an intractable integral =(
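Reconstructing the equations in the paper's notation (they appear as images on the slides):

\[
\theta^* = \arg\max_\theta \prod_{i=1}^{N} p_\theta\big(x^{(i)}\big)
= \arg\max_\theta \sum_{i=1}^{N} \log p_\theta\big(x^{(i)}\big),
\qquad
p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz
\]

The integral over all latent codes z is what makes the marginal likelihood intractable.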

SLIDE 84 - SLIDE 89

Variational Autoencoder: Math

(Slides 84-89 step through the derivation of the variational lower bound, equation by equation; the derivation appears below.)
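The derivation is the standard one from the paper, reconstructed here in Kingma and Welling's notation, with q_phi(z|x) the encoder's approximation to the posterior:

\[
\begin{aligned}
\log p_\theta(x)
&= \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x)\big] \\
&= \mathbb{E}_{z}\left[\log \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(z \mid x)}
      \cdot \frac{q_\phi(z \mid x)}{q_\phi(z \mid x)}\right] \\
&= \mathbb{E}_{z}\big[\log p_\theta(x \mid z)\big]
   - D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big)
   + D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big) \\
&\ge \mathbb{E}_{z}\big[\log p_\theta(x \mid z)\big]
   - D_{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big)
\end{aligned}
\]

The last KL term is unknown but non-negative, so dropping it leaves a tractable lower bound on log p_theta(x).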

SLIDE 90

Variational Autoencoder: Math

“Elbow”

SLIDE 91

Variational Autoencoder: Math

“Elbow”

SLIDE 92

Variational Autoencoder: Math

Variational lower bound (“elbow”, i.e. the ELBO)

SLIDE 93

Variational Autoencoder: Math

Variational lower bound (“elbow”)
Training: Maximize the lower bound

SLIDE 94

Variational Autoencoder: Math

Variational lower bound (“elbow”)
Training: Maximize the lower bound
First term: reconstruct the input data

SLIDE 95

Variational Autoencoder: Math

Variational lower bound (“elbow”)
Training: Maximize the lower bound
First term: reconstruct the input data
Second term: latent states should follow the prior

SLIDE 96

Variational Autoencoder: Math

Variational lower bound (“elbow”)
Training: Maximize the lower bound
First term: reconstruct the input data, sampling with the reparam. trick (see paper)
Second term: latent states should follow the prior
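A minimal sketch of that reparameterized sampling step (names are illustrative): writing z = mu + sigma * eps with eps ~ N(0, I) moves the randomness into eps, so gradients can flow through mu and log_var.

    import numpy as np

    def sample_z(mu, log_var):
        # z = mu + sigma * eps, eps ~ N(0, I); differentiable in mu and log_var
        eps = np.random.randn(*mu.shape)
        return mu + np.exp(0.5 * log_var) * eps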

SLIDE 97

Variational Autoencoder: Math

Variational lower bound (“elbow”)
Training: Maximize the lower bound
First term: reconstruct the input data, sampling with the reparam. trick (see paper)
Second term: latent states should follow the prior; everything is Gaussian, so it has a closed form solution!
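For two Gaussians the KL term is available in closed form; for a diagonal posterior against the unit Gaussian prior:

\[
D_{KL}\big(\mathcal{N}(\mu, \sigma^2 I)\,\|\,\mathcal{N}(0, I)\big)
= \frac{1}{2}\sum_{j=1}^{H}\big(\mu_j^2 + \sigma_j^2 - 1 - \log \sigma_j^2\big)
\]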

SLIDE 98

Autoencoder Overview

  • Traditional Autoencoders
    ○ Try to reconstruct input
    ○ Used to learn features, initialize supervised model
    ○ Not used much anymore
  • Variational Autoencoders
    ○ Bayesian meets deep learning
    ○ Sample from model to generate images

SLIDE 99

Generative Adversarial Nets

Diagram: Random noise z

Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014

Can we generate images with less math?

SLIDE 100

Generative Adversarial Nets

Diagram: Random noise z -> Generator -> Fake image x

Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014

Can we generate images with less math?

SLIDE 101

Generative Adversarial Nets

Diagram: Random noise z -> Generator -> Fake image x -> Discriminator -> y: real or fake?

Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014

Can we generate images with less math?

SLIDE 102

Generative Adversarial Nets

Diagram: Random noise z -> Generator -> Fake image x -> Discriminator -> y: real or fake? The discriminator also sees real images x. Fake examples: from the generator. Real examples: from the dataset.

Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014

Can we generate images with less math?

SLIDE 103

Generative Adversarial Nets

Diagram: Random noise z -> Generator -> Fake image x -> Discriminator -> y: real or fake? The discriminator also sees real images x. Fake examples: from the generator. Real examples: from the dataset.

Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014

Can we generate images with less math? Train the generator and discriminator jointly; after training, it is easy to generate images.
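The joint training is the minimax game from the paper:

\[
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
\]

The discriminator D pushes the value up (classify real vs. fake correctly) while the generator G pushes it down (fool D); in practice the paper trains G to maximize log D(G(z)) instead, which gives stronger early gradients.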

SLIDE 104

Generative Adversarial Nets

Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014

Generated samples, with the nearest neighbor from the training set shown for comparison

SLIDE 105

Generative Adversarial Nets

Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014

Generated samples (CIFAR-10), with the nearest neighbor from the training set shown for comparison

SLIDE 106

Generative Adversarial Nets: Multiscale

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015

Generate low-res

SLIDE 107

Generative Adversarial Nets: Multiscale

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015

Generate low-res -> Upsample

SLIDE 108

Generative Adversarial Nets: Multiscale

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015

Generate low-res -> Upsample -> Generate delta, add

SLIDE 109

Generative Adversarial Nets: Multiscale

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015

Generate low-res -> Upsample -> Generate delta, add -> Upsample

SLIDE 110

Generative Adversarial Nets: Multiscale

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015

Generate low-res -> Upsample -> Generate delta, add -> Upsample -> Generate delta, add

SLIDE 111

Generative Adversarial Nets: Multiscale

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015

Generate low-res -> Upsample -> Generate delta, add -> Upsample -> Generate delta, add -> Upsample

SLIDE 112

Generative Adversarial Nets: Multiscale

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015

Generate low-res -> Upsample -> Generate delta, add -> Upsample -> Generate delta, add -> Upsample -> Generate delta, add -> Done!

SLIDE 113

Generative Adversarial Nets: Multiscale


Discriminators work at every scale!

Denton et al, NIPS 2015

SLIDE 114

Generative Adversarial Nets: Multiscale

Train a separate model per class on CIFAR-10

Denton et al, NIPS 2015

SLIDE 115

Generative Adversarial Nets: Simplifying

Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016

Generator: an upsampling network with fractionally-strided convolutions
Discriminator: a convolutional network
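As a quick sanity check on why fractionally-strided ("transposed") convolutions upsample, here is the standard output-size formula (the specific values below are illustrative):

    def upconv_output_size(h_in, kernel, stride, pad):
        # spatial size produced by a transposed convolution
        return stride * (h_in - 1) + kernel - 2 * pad

    # A DCGAN-style generator doubles resolution at each layer:
    print(upconv_output_size(4, kernel=4, stride=2, pad=1))   # 8
    print(upconv_output_size(8, kernel=4, stride=2, pad=1))   # 16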

SLIDE 116

Generative Adversarial Nets: Simplifying


Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016

Generator

SLIDE 117

Generative Adversarial Nets: Simplifying


Radford et al, ICLR 2016

Samples from the model look amazing!

SLIDE 118

Generative Adversarial Nets: Simplifying


Radford et al, ICLR 2016

Interpolating between random points in latent space

SLIDE 119

Generative Adversarial Nets: Vector Math

Samples from the model: smiling woman, neutral woman, neutral man

Radford et al, ICLR 2016

SLIDE 120

Generative Adversarial Nets: Vector Math

Samples from the model: smiling woman, neutral woman, neutral man. Average their z vectors, then do arithmetic.

Radford et al, ICLR 2016

SLIDE 121

Generative Adversarial Nets: Vector Math

Samples from the model: smiling woman - neutral woman + neutral man = smiling man. Average the z vectors, then do arithmetic.

Radford et al, ICLR 2016
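A sketch of the z-space arithmetic (random stand-ins here; in practice each term is the average z of a few samples that show the attribute):

    import numpy as np

    dim_z = 100
    z_smiling_woman = np.random.randn(3, dim_z).mean(axis=0)
    z_neutral_woman = np.random.randn(3, dim_z).mean(axis=0)
    z_neutral_man   = np.random.randn(3, dim_z).mean(axis=0)

    z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
    # decoding z_new with the trained generator should render a smiling man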

SLIDE 122

Generative Adversarial Nets: Vector Math

Radford et al, ICLR 2016

Glasses man - no-glasses man + no-glasses woman

SLIDE 123

Generative Adversarial Nets: Vector Math

Radford et al, ICLR 2016

Glasses man - no-glasses man + no-glasses woman = woman with glasses

SLIDE 124

Putting everything together

Diagram: Variational Autoencoder (x -> Encoder -> μz, Σz -> sample z -> Decoder -> μx, Σx -> x̂), trained with a pixel loss

Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016

SLIDE 125

Putting everything together

Diagram: Variational Autoencoder (x -> Encoder -> μz, Σz -> sample z -> Decoder -> μx, Σx -> x̂) with a pixel loss, plus a discriminator network that classifies x̂ as real or generated

Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016

SLIDE 126

Putting everything together

Diagram: Variational Autoencoder with a pixel loss, a discriminator network (real or generated), and a pretrained AlexNet

Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016

SLIDE 127

Putting everything together

Diagram: Variational Autoencoder with a pixel loss, a discriminator network (real or generated), and a pretrained AlexNet extracting features xf of the real image and x̂f of the reconstructed image

Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016

SLIDE 128

Putting everything together

Diagram: Variational Autoencoder with a pixel loss, a discriminator network (real or generated), and a pretrained AlexNet; an L2 loss matches the features xf of the real image to the features x̂f of the reconstructed image

Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016
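Schematically, the model trains against three signals at once (the lambda weights and this summary are an interpretation of the figure, not notation from the paper):

\[
\mathcal{L} =
\lambda_{\text{pix}}\,\|\hat{x} - x\|^2
+ \lambda_{\text{feat}}\,\|\Phi(\hat{x}) - \Phi(x)\|^2
+ \lambda_{\text{adv}}\,\mathcal{L}_{\text{adv}}
\]

where Phi denotes the pretrained AlexNet features, on top of the usual KL regularizer on z from the variational autoencoder.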

SLIDE 129

Putting everything together

Dosovitskiy and Brox, “Generating Images with Perceptual Similarity Metrics based on Deep Networks”, arXiv 2016

Samples from the model, trained on ImageNet
SLIDE 130

Recap

  • Videos
  • Unsupervised learning
    ○ Autoencoders: traditional / variational
    ○ Generative Adversarial Networks
  • Next time: Guest lecture from Jeff Dean