SLIDE 1
  • 8. Other Deep Architectures

CS 519 Deep Learning, Winter 2018 Fuxin Li

With materials from Zsolt Kira and Ian Goodfellow

SLIDE 2

A brief overview of other architectures

  • Unsupervised Architectures
  • Deep Belief Networks
  • Autoencoders
  • GANs
  • Temporal Architectures
  • Recurrent Neural Networks (RNN)
  • LSTM
  • We will cover these topics carefully later
  • Right now this is just a brief overview, in case you are tempted to use them in your project

SLIDE 3

Unsupervised Deep Learning

  • CNNs are most successful when there are a lot of training examples
  • What can we do if we do not have any training examples?
  • Or have very few of them?
SLIDE 4

Remember PCA: Characteristics and Limitations

  • Easy: can be computed via eigendecomposition
  • Select the first K components based on how much variance is captured
  • Bases are orthogonal
  • Optimal under some assumptions (Gaussian)
  • Those assumptions are almost never true in real data
SLIDE 5

PCA as a β€œneural network”

[Figure: input vector → code → reconstructed vector]

  • PCA goal:
  • Minimize reconstruction error

$$\min_{\mathbf{W}} \sum_{j=1}^{n} \big\| \mathbf{y}_j - \mathbf{W}\mathbf{W}^\top \mathbf{y}_j \big\|^2$$

where $\mathbf{W}^\top \mathbf{y}_j$ is the code for input $\mathbf{y}_j$ and $\mathbf{W}\mathbf{W}^\top \mathbf{y}_j$ is its reconstruction.

SLIDE 6

Generalize PCA to a multi-layer nonlinear network

[Figure: input vector → many encoding layers → code → many decoding layers → output vector]

  • Deep Autoencoder (see the sketch below)
  • Same as other NNs (linear transform + nonlinearity + linear transform, etc.)
  • Only difference is that after decoding, it strives to reconstruct the original input
  • Can have convolutional/fully-connected/sparse versions
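A minimal sketch of a fully-connected deep autoencoder, assuming PyTorch (layer sizes here are illustrative, not the slides'):

```python
import torch
import torch.nn as nn

# Encoder compresses the input down to a small code; the decoder mirrors it.
encoder = nn.Sequential(
    nn.Linear(3072, 1024), nn.ReLU(),
    nn.Linear(1024, 256),                 # the "code"
)
decoder = nn.Sequential(
    nn.Linear(256, 1024), nn.ReLU(),
    nn.Linear(1024, 3072),                # back to input size
)
autoencoder = nn.Sequential(encoder, decoder)

opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
x = torch.rand(64, 3072)                  # a batch of flattened 32x32x3 images
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(x), x)   # reconstruction error
    loss.backward()
    opt.step()
```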

SLIDE 7

Krizhevsky’s deep autoencoder

[Figure: encoder layer sizes 8192 → 4096 → 2048 → 1024 → 512, down to a 256-bit binary code]

The encoder has about 67,000,000 parameters. It takes a few days on a GTX 285 GPU to train on two million images (the Tiny Images dataset).

SLIDE 8

Reconstructions of 32x32 color images from 256-bit codes

SLIDE 9

[Figure: images retrieved using 256-bit codes vs. images retrieved using Euclidean distance in pixel intensity space]

SLIDE 10

[Figure: another query; images retrieved using 256-bit codes vs. images retrieved using Euclidean distance in pixel intensity space]

SLIDE 11

Generative Adversarial Networks

SLIDE 12

Generative Adversarial Networks

  • Cost for the discriminator:
  • Standard cross-entropy loss, with everything from the data distribution $p_{\text{data}}$ labeled 1, and everything from the generator $G$ labeled 0
  • Cost for the generator:
  • Try to generate examples that "fool" the discriminator (both costs are written out below)
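In the standard notation of Goodfellow et al. (2014), the two costs can be written as:

```latex
% Discriminator: cross-entropy, real data labeled 1, generated data labeled 0
J^{(D)} = -\,\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
          \;-\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% Generator (non-saturating variant): make the discriminator call fakes real
J^{(G)} = -\,\mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big]
```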
SLIDE 13

DCGAN

SLIDE 14

Samples of DCGAN-generated images

SLIDE 15

DCGAN representations

SLIDE 16

Text-to-Image with GANs

SLIDE 17

Text-to-Image with GANs

SLIDE 18

Problems

SLIDE 19

Problems

SLIDE 20

iGAN

https://www.youtube.com/watch?v=9c4z6YsBGQ0

SLIDE 21
Recurrent Neural Networks (RNNs)

  • Temporal, sequences
  • Tied weights across time steps (see the sketch below)
  • Some additional variants: Recursive Autoencoders, Long Short-Term Memory (LSTM)
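A minimal NumPy sketch of the tied-weight idea (sizes and names are illustrative): the same weight matrices are applied at every time step, so the network handles sequences of any length:

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.standard_normal((64, 64)) * 0.1   # hidden-to-hidden, shared across time
W_x = rng.standard_normal((64, 32)) * 0.1   # input-to-hidden, shared across time

def rnn_forward(xs):
    """xs: sequence of length-32 input vectors; returns the final hidden state."""
    h = np.zeros(64)
    for x in xs:                            # one step per sequence element,
        h = np.tanh(W_h @ h + W_x @ x)      # reusing the same (tied) weights
    return h

h_final = rnn_forward([rng.standard_normal(32) for _ in range(10)])
```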

SLIDE 22

Machine Translation

  • Have to look at the entire sentence (or many sentences)
SLIDE 23

Image Captioning

SLIDE 24

Restricted Boltzmann Machines

  • Generative version of the encoder
  • Binary-valued hidden variables
  • Defines probabilities such as $P(h_j \mid \mathbf{y})$ and $P(y_i \mid \mathbf{h})$ (written out below)
  • You can generate samples of the observed variables from the hidden ones
  • Think of it as an extension of probabilistic PCA
  • Only if you are into generative models (PGM class)
  • Unsupervised pre-training method to train it (Hinton, Salakhutdinov 2006)
  • Convolutional and fully connected versions available
  • Doesn't perform very well.
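In standard RBM notation (binary units, weights $W$, hidden biases $b_j$, visible biases $c_i$; this notation is assumed, not taken from the slide), the conditionals are:

```latex
P(h_j = 1 \mid \mathbf{y}) = \sigma\!\Big(b_j + \sum_i W_{ij}\, y_i\Big),
\qquad
P(y_i = 1 \mid \mathbf{h}) = \sigma\!\Big(c_i + \sum_j W_{ij}\, h_j\Big)
```

where $\sigma$ is the logistic sigmoid; alternating the two conditionals (Gibbs sampling) is how samples of the observed variables are generated from the hidden ones.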
SLIDE 25

Fooling a deep network (Szegedy et al. 2013)

  • Optimize a perturbation $\Delta x$ of the image $x$ to maximize the network's prediction $f_c$ for a class $c$:

$$\max_{\Delta x} \; f_c(x + \Delta x) \;-\; \mu\, \|\Delta x\|^2$$

(Szegedy et al. 2013, Goodfellow et al. 2014, Nguyen et al. 2015)

[Figure: image + 0.03·Δx = adversarial image; a Giant Panda (99.32% confidence) becomes a Goldfish (95.15% confidence) or a Shark (93.89% confidence)]
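One of the cited follow-ups, Goodfellow et al. (2014), crafts such a perturbation in a single gradient step (the fast gradient sign method). A minimal PyTorch sketch, assuming a trained classifier `model` (all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, target_class, eps=0.03):
    """One-step targeted perturbation in the style of Goodfellow et al. (2014).

    x: a 1 x C x H x W image tensor; target_class: the class to push toward.
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), torch.tensor([target_class]))
    loss.backward()
    # Stepping against the loss gradient (sign only, budget eps per pixel)
    # pushes the prediction toward target_class while keeping ||Δx||_∞ ≤ eps.
    return (x - eps * x.grad.sign()).detach()
```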