Unsupervised Learning of Visual Structure Using Predictive Generative Networks (PowerPoint presentation)

William Lotter, Gabriel Kreiman & David Cox, Harvard University, Cambridge, USA. Article overview by Ilya Kuzovkin, Computational Neuroscience Seminar, University of Tartu.


SLIDE 1

Unsupervised Learning of Visual Structure Using Predictive Generative Networks

William Lotter, Gabriel Kreiman & David Cox
Harvard University, Cambridge, USA

Article overview by Ilya Kuzovkin
Computational Neuroscience Seminar, University of Tartu, 2015

SLIDE 2

The idea of predictive coding in neuroscience

SLIDES 3-5

“state-of-the-art deep learning models rely on millions of labeled training examples to learn”

“in contrast to biological systems, where learning is largely unsupervised”

“we explore the idea that prediction is not only a useful end-goal, but may also serve as a powerful unsupervised learning signal”

SLIDE 6

PART I THE IDEA OF PREDICTIVE ENCODER

"prediction may also serve as a powerful unsupervised learning signal"

SLIDE 7

PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012)

vs.

SLIDES 8-13

AUTOENCODER

input → “bottleneck” → output (Reconstruction of the input)

Can we do prediction?
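The bottleneck idea is easy to sketch in code. Below is a minimal linear autoencoder in NumPy (a toy illustration of the reconstruction objective, not the paper's model; the data sizes and learning rate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))            # 100 inputs, 8 dimensions each

# Encoder and decoder weights; the 3-unit hidden layer is the "bottleneck"
W_enc = rng.normal(scale=0.5, size=(8, 3))
W_dec = rng.normal(scale=0.5, size=(3, 8))

def mse(a, b):
    return ((a - b) ** 2).mean()

lr, losses = 0.1, []
for _ in range(300):
    H = X @ W_enc                        # compress through the bottleneck
    X_hat = H @ W_dec                    # reconstruct the input
    losses.append(mse(X_hat, X))
    grad = 2 * (X_hat - X) / X.size      # d(MSE)/d(X_hat)
    dW_dec = H.T @ grad
    dW_enc = X.T @ (grad @ W_dec.T)
    W_dec -= lr * dW_dec
    W_enc -= lr * dW_enc
```

Swapping the reconstruction target `X` for the next frame of a sequence turns the same setup into prediction, which is the step the deck takes next.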

SLIDE 14

PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012)

vs.

SLIDES 15-18

RECURRENT NEURAL NETWORK
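A single step of the LSTM used as the recurrent core can be written out in NumPy (the standard LSTM cell equations; the sizes here are arbitrary for illustration, not the paper's 1024 units):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates computed from input x and previous hidden state h."""
    z = x @ W + h @ U + b                          # all four gates in one matmul
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input / forget / output gates
    g = np.tanh(g)                                 # candidate cell update
    c = f * c + i * g                              # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = rng.normal(scale=0.1, size=(n_in, 4 * n_hid))
U = rng.normal(scale=0.1, size=(n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for t in range(5):                                 # unroll over a short sequence
    x = rng.normal(size=n_in)
    h, c = lstm_step(x, h, c, W, U, b)
```

The gated cell state is what lets the network carry information across the 5-15 frames it sees before predicting.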

SLIDES 19-24

PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012)

vs.

Encoder: 2x { Convolution, ReLU, Max-pooling }
Recurrent core: Long Short-Term Memory (LSTM), 5-15 steps, 1024 units
Decoder: 2-layer NN, upsampling, ReLU, Convolution
Training: MSE loss, RMSprop optimizer, LR 0.001
Implementation: Keras (http://keras.io)
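The training recipe on the slide pairs an MSE loss with RMSprop at learning rate 0.001. The RMSprop rule itself is short enough to sketch (the generic update, not Keras's exact implementation):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """RMSprop: divide each step by a running RMS of recent gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize the MSE-like objective f(w) = w^2, starting from w = 1.0
w, cache = 1.0, 0.0
for _ in range(2000):
    grad = 2 * w                 # df/dw
    w, cache = rmsprop_step(w, grad, cache)
```

Because each step is normalized by the gradient's recent magnitude, RMSprop takes roughly constant-size steps, which tends to be well suited to recurrent nets.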

SLIDE 25

SLIDE 26

PART II ADVERSARIAL LOSS

"the generator is trained to maximally confuse the adversarial discriminator"

SLIDE 27
SLIDES 28-29

vs.

Encoder: 2x { Convolution, ReLU, Max-pooling }
Recurrent core: Long Short-Term Memory (LSTM), 5-15 steps, 1568 units, plus a fully connected layer
Decoder: 2-layer NN, upsampling, ReLU, Convolution
Training: MSE loss, RMSprop optimizer, LR 0.001

SLIDES 30-32

MSE loss

SLIDES 33-36

3 FC layers (ReLU, ReLU, softmax)

"trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator"

AL loss to train PGN (MSE loss + AL loss)
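The discriminator objective quoted above is the usual two-class cross-entropy: real frames are pushed toward label 1 and generated frames toward label 0, while the generator is trained to make the discriminator output 1 on its frames. A minimal sketch with made-up probabilities (generic GAN-style losses, not the authors' code):

```python
import numpy as np

def bce(p, y, eps=1e-12):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

# Hypothetical discriminator outputs: probability that each frame is "real"
p_real = np.array([0.9, 0.8, 0.95])   # on ground-truth frames
p_fake = np.array([0.1, 0.2, 0.05])   # on the generator's frames

# Discriminator loss: real frames -> 1, generated frames -> 0
d_loss = bce(p_real, np.ones(3)) + bce(p_fake, np.zeros(3))

# Generator ("adversarial") loss: make the discriminator call fakes real
g_loss = bce(p_fake, np.ones(3))
```

With the numbers above the discriminator is winning, so its loss is small and the generator's adversarial loss is large; training pushes the two in opposite directions.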

SLIDE 37

“with adversarial loss alone the generator easily found solutions that fooled the discriminator, but did not look anything like the correct samples”

MSE model: fairly faithful to the identities of the faces, but produces blurred versions.

Combined AL/MSE model: tends to underfit the identity towards a more average face.

SLIDE 38

PART III INTERNAL REPRESENTATIONS AND LATENT VARIABLES

"we are interested in understanding the representations learned by the models"

SLIDES 39-40

PGN model → LSTM activities → L2 regression → value of a latent variable
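The decoding step, an L2-regularized linear regression from LSTM activities to a latent variable, has a closed form. A toy sketch with synthetic activities standing in for the trained PGN's features:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 50))                 # stand-in for LSTM activity vectors
w_true = rng.normal(size=50)
y = H @ w_true + 0.1 * rng.normal(size=200)    # a latent variable, e.g. pan angle

# Ridge (L2) regression: w = (H^T H + lam * I)^-1 H^T y
lam = 1.0
w = np.linalg.solve(H.T @ H + lam * np.eye(50), H.T @ y)

# Fraction of the latent variable's variance the activities explain
y_hat = H @ w
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

A high R² here means the latent variable is linearly decodable from the hidden activities, which is the sense in which the PGN is said to have learned it.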

SLIDE 41

MULTIDIMENSIONAL SCALING

“An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible.”
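Classical (Torgerson) MDS is a few lines of linear algebra: double-center the squared distance matrix and keep the top eigenvectors. A sketch of that one standard variant (the slides do not say which MDS implementation was used):

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Embed points from a pairwise distance matrix D, preserving distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_dims]      # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Points that truly live in 2-D are recovered exactly (up to rotation/reflection)
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 2))
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
Y = classical_mds(D, 2)
D_rec = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
```

Applied to distances between internal representations, the 2-D embedding is what lets the slides visualize how the models organize the face space.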

SLIDE 42

PART IV USEFULNESS OF PREDICTIVE LEARNING

"representations trained with a predictive loss outperform other models of comparable complexity in a supervised classification problem"

SLIDES 43-44

THE TASK: 50 randomly generated faces (12 angles each). Generative model → internal representation → SVM → identify the face class.

Models compared:
  • Encoder-LSTM-Decoder to predict next frame (PGN)
  • Encoder-LSTM-Decoder to predict last frame (AE LSTM dynamic)
  • Encoder-LSTM-Decoder on frames made into static movies (AE LSTM static)
  • Encoder-FC-Decoder with #weights as in LSTM (AE FC #weights)
  • Encoder-FC-Decoder with #units as in LSTM (AE FC #units)
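The readout protocol (feed a model's internal representation to an SVM and ask for the face's identity) can be mimicked with a tiny linear SVM trained by sub-gradient descent on the hinge loss. A binary toy version with synthetic clusters standing in for two identities (the paper uses a multi-class SVM over 50 faces):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic "identities": clusters of 10-D representation vectors
X = np.vstack([rng.normal(-1.0, 0.3, size=(50, 10)),
               rng.normal(+1.0, 0.3, size=(50, 10))])
y = np.array([-1] * 50 + [+1] * 50)

# Linear SVM: minimize lam/2 * ||w||^2 + mean hinge loss
w, b, lr, lam = np.zeros(10), 0.0, 0.01, 0.01
for _ in range(200):
    viol = y * (X @ w + b) < 1                             # margin violations
    grad_w = lam * w - (viol[:, None] * y[:, None] * X).mean(axis=0)
    grad_b = -(viol * y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = (np.sign(X @ w + b) == y).mean()
```

The point of the comparison is that the representation, not the classifier, does the work: the same SVM readout scores higher on features learned with a predictive loss.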
SLIDE 45