

SLIDE 1: Autoencoders

Prof. Leal-Taixé and Prof. Niessner

SLIDE 2: Machine learning

Supervised learning:
  • Labels or target classes
  • Goal: learn a mapping from input to label
  • Classification, regression

SLIDE 3: Machine learning

Supervised learning: [Figure: training images labeled DOG / CAT]

SLIDE 4: Machine learning

Unsupervised learning:
  • No label or target class
  • Find out properties of the structure of the data
  • Examples: clustering (k-means), PCA


SLIDE 7: Unsupervised learning with autoencoders

SLIDE 8: Autoencoders

  • Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data

SLIDE 9: Autoencoders

  • From an input image to a feature representation (bottleneck layer)
  • Encoder: a CNN in our case

[Diagram: input image x → Conv layers → latent code z]

SLIDE 10: Autoencoders

  • Why do we need this dimensionality reduction?
  • To capture the patterns: the most meaningful factors of variation in our data
  • Other dimensionality reduction methods?

SLIDE 11: Autoencoder: training

[Diagram: input image → Conv (encoder) → Transposed Conv (decoder) → output image]

Reconstruction loss (e.g., L1 or L2) between input and output
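This training setup can be sketched end to end in a toy example. The sketch below, assuming nothing beyond NumPy, uses a *linear* autoencoder (single-matrix encoder and decoder instead of the conv/transposed-conv stacks on the slide), trained by gradient descent on the L2 reconstruction loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8-D that actually live on a 2-D subspace.
basis = rng.normal(size=(2, 8))
x = rng.normal(size=(200, 2)) @ basis              # inputs x, dim(x) = 8

# Linear autoencoder with a 2-D bottleneck: z = x @ W_e, x' = z @ W_d
W_e = rng.normal(scale=0.1, size=(8, 2))           # encoder
W_d = rng.normal(scale=0.1, size=(2, 8))           # decoder

def recon_loss(x, W_e, W_d):
    """L2 reconstruction loss between input x and reconstruction x'."""
    return np.mean((x - x @ W_e @ W_d) ** 2)

loss_before = recon_loss(x, W_e, W_d)
lr = 0.01
for _ in range(500):
    z = x @ W_e                                    # encode
    err = z @ W_d - x                              # x' - x
    W_d -= lr * z.T @ err / len(x)                 # gradient step, decoder
    W_e -= lr * x.T @ (err @ W_d.T) / len(x)       # gradient step, encoder

loss_after = recon_loss(x, W_e, W_d)
```

Note that no labels appear anywhere: the input itself is the target.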

SLIDE 12: Autoencoder: training

Latent space z with dim(z) < dim(x); input x, reconstruction x'.

[Figure: input images and their reconstructions]

SLIDE 13: Autoencoder: training

  • No labels required
  • We can use unlabeled data to first get its structure

[Diagram: latent space z with dim(z) < dim(x); input x, reconstruction x']

SLIDE 14: Autoencoder: use cases

[Figure: embedding of MNIST digits in the latent space]

SLIDE 15: Autoencoder for pre-training

  • Test case: medical applications based on CT images
    – Large set of unlabeled data
    – Small set of labeled data
  • We cannot simply take a network pre-trained on ImageNet. Why?
  • The image features are different: CT vs. natural images

SLIDE 16: Autoencoder for pre-training

  • Test case: medical applications based on CT images
    – Large set of unlabeled data
    – Small set of labeled data
  • What we can do: pre-train our network using an autoencoder to "learn" the type of features present in CT images

SLIDE 17: Autoencoder for pre-training

  • Step 1: Unsupervised training with autoencoders

[Diagram: input → encoder → decoder → reconstruction]

SLIDE 18: Autoencoder for pre-training

  • Step 2: Supervised training with the labeled data

[Diagram: input → encoder → reconstruction; throw away the decoder]

SLIDE 19: Autoencoder for pre-training

  • Step 2: Supervised training with the labeled data

[Diagram: input x → encoder → z → prediction y vs. ground-truth label y*; loss and backprop as always]
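The two-step recipe can be sketched with stand-in arrays instead of CT images. Everything here is hypothetical scaffolding: the matrix W_e plays the role of the encoder learned in step 1, and in step 2 we keep it, discard the decoder, and fit only a small head on the labeled subset (by least squares, purely for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the encoder learned in step 1 (unsupervised, unlabeled data).
W_e = rng.normal(size=(8, 2))

# Step 2: keep the encoder, throw away the decoder, attach a supervised head.
x_labeled = rng.normal(size=(50, 8))               # the small labeled set
y_star = (x_labeled[:, 0] > 0).astype(float)       # toy ground-truth labels y*

z = x_labeled @ W_e                                  # frozen encoder features
W_head, *_ = np.linalg.lstsq(z, y_star, rcond=None)  # train only the head
y_pred = z @ W_head                                  # predictions y
```

In the setting on the slide, the head and (optionally) the encoder would instead be trained jointly with backprop, "as always".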

SLIDE 20: Why use autoencoders?

  • Use 1: pre-training, as mentioned before
    – Image → same image reconstructed
    – Use the encoder as a "feature extractor"
  • Use 2: pixel-wise predictions
    – Image → semantic segmentation
    – Low-resolution image → high-resolution image
    – Image → depth map

SLIDE 21: Autoencoders for pixel-wise predictions

SLIDE 22: Semantic segmentation (FCN)

  • Recall Fully Convolutional Networks

[Long et al. 2015] "Fully Convolutional Networks for Semantic Segmentation" (FCN)

Can we do better?

SLIDE 23: SegNet

Badrinarayanan et al. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation". TPAMI 2016

SLIDE 24: SegNet

  • Encoder: normal convolutional filters + pooling
  • Decoder: upsampling + convolutional filters


SLIDE 26: SegNet

  • Encoder: normal convolutional filters + pooling
  • Decoder: upsampling + convolutional filters
  • The convolutional filters in the decoder are learned via backprop; their goal is to refine the upsampling

SLIDE 27: Recall transposed convolution

  • Transposed convolution: unpooling + convolution filter (learned)
  • Also called up-convolution (never "deconvolution")

[Diagram: 3x3 input → 5x5 output]
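The 3x3 → 5x5 example on the slide can be reproduced directly. A minimal NumPy transposed convolution (no padding, stride 1): each input pixel "stamps" a scaled copy of the kernel into the output and overlaps are summed, so a 3x3 input with a 3x3 kernel gives (3-1)+3 = 5 per side:

```python
import numpy as np

def conv_transpose2d(inp, kernel, stride=1):
    """Transposed convolution: each input pixel stamps a scaled copy
    of the kernel into the output; overlapping stamps are summed."""
    h, w = inp.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i*stride:i*stride+k, j*stride:j*stride+k] += inp[i, j] * kernel
    return out

x = np.arange(9, dtype=float).reshape(3, 3)   # 3x3 input
kernel = np.ones((3, 3))                      # in practice, learned by backprop
y = conv_transpose2d(x, kernel)               # 5x5 output
```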

SLIDE 28: SegNet

  • Encoder: normal convolutional filters + pooling
  • Decoder: upsampling + convolutional filters
  • Softmax layer: the output of the softmax classifier is a K-channel image of probabilities, where K is the number of classes

SLIDE 29: Upsampling

SLIDE 30: Types of upsampling

  • 1. Interpolation

SLIDE 31: Types of upsampling

  • 1. Interpolation

[Figure: original image vs. nearest-neighbor, bilinear, and bicubic interpolation. Image: Michael Guerzhoy]
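Nearest-neighbor interpolation, the simplest of the three, just repeats each pixel; bilinear and bicubic instead take weighted averages of neighboring pixels. A NumPy sketch of 2x nearest-neighbor upsampling:

```python
import numpy as np

x = np.array([[1., 2.],
              [3., 4.]])

# Nearest-neighbor upsampling by a factor of 2: repeat every pixel
# twice along each axis.
up = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
# up == [[1, 1, 2, 2],
#        [1, 1, 2, 2],
#        [3, 3, 4, 4],
#        [3, 3, 4, 4]]
```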

SLIDE 32: Types of upsampling

  • 1. Interpolation: few artifacts

[Image: Michael Guerzhoy]

SLIDE 33: Types of upsampling

  • 2. Fixed unpooling (followed by convolutions); efficient

A. Dosovitskiy et al. "Learning to Generate Chairs, Tables and Cars with Convolutional Networks". TPAMI 2017

SLIDE 34: Types of upsampling

  • 3. Unpooling "à la DeconvNet": keep the locations where the max came from

SLIDE 35: Types of upsampling

  • 3. Unpooling "à la DeconvNet"
  • Now: the convolutional filters are learned. In DeconvNet: we convolve with the transpose of the learned filter
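Unpooling "à la DeconvNet" can be sketched in NumPy: the pooling stage remembers where each max came from, and the unpooling stage places each pooled value back at exactly that location, filling the rest with zeros (the convolutions that follow are then learned to fill in the gaps):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k-by-k max pooling that also remembers where each max came from."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)   # flat index of each max
    for i in range(0, h, k):
        for j in range(0, w, k):
            block = x[i:i+k, j:j+k]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            pooled[i // k, j // k] = block[r, c]
            idx[i // k, j // k] = (i + r) * w + (j + c)
    return pooled, idx

def max_unpool(pooled, idx, out_shape):
    """Unpooling: place each value back at its max location, zeros elsewhere."""
    out = np.zeros(np.prod(out_shape))
    out[idx.ravel()] = pooled.ravel()
    return out.reshape(out_shape)

x = np.array([[1., 5., 2., 1.],
              [3., 4., 0., 6.],
              [7., 0., 1., 1.],
              [2., 2., 3., 0.]])
pooled, idx = max_pool_with_indices(x)
restored = max_unpool(pooled, idx, x.shape)
```

Unlike fixed unpooling, the restored map keeps each maximum at its original position, which preserves the structure details mentioned on the next slide.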

SLIDE 36: Types of upsampling

  • 3. Unpooling "à la DeconvNet": keeps the details of the structures

SLIDE 37: U-Net, or skip connections in autoencoders

SLIDE 38: Skip connections

  • U-Net: pass the low-level information from the encoder to the decoder, alongside the high-level information (recall ResNet)

O. Ronneberger et al. "U-Net: Convolutional Networks for Biomedical Image Segmentation". MICCAI 2015

SLIDE 39: Skip connections

  • U-Net, zoomed in: skip connections append (concatenate) encoder features to the decoder
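The "append" here is a channel-wise concatenation. A minimal sketch with stand-in feature maps (shapes chosen arbitrarily):

```python
import numpy as np

# Stand-in feature maps in (channels, height, width) layout.
decoder_feat = np.zeros((32, 16, 16))   # upsampled high-level features
encoder_feat = np.ones((32, 16, 16))    # low-level features from the skip

# U-Net skip connection: concatenate along the channel axis; the next
# (learned) convolution then sees both low- and high-level information.
merged = np.concatenate([decoder_feat, encoder_feat], axis=0)
```

This differs from ResNet, where the skip is an element-wise *sum* rather than a concatenation.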

SLIDE 40: Skip connections

  • Concatenation connections

C. Hazirbas et al. "Deep depth from focus". ACCV 2018
SLIDE 41: Skip connections

  • Widely used in autoencoders
  • At what levels the skip connections are needed depends on your problem

SLIDE 42: Autoencoders in vision

SLIDE 43: SegNet

[Figure: SegNet segmentation results]

SLIDE 44: SegNet

[Figure: input, ground truth, and SegNet prediction]

SLIDE 45: Monocular depth

  • Unsupervised monocular depth estimation

R. Garg et al. "Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue". ECCV 2016
SLIDE 46: Image super-resolution

  • Image in low resolution → image in high resolution
  • Problem: the content of the image needs to pass through the network (skip connections [2] or other strategies [1])

[1] C. Dong et al. "Image Super-Resolution Using Deep Convolutional Networks". TPAMI 2015
[2] X.-J. Mao et al. "Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections". NIPS 2016

SLIDE 47: Image super-resolution

  • Why not learn only the residual? Much easier!

J. Kim et al. "Accurate Image Super-Resolution Using Very Deep Convolutional Networks". CVPR 2016
SLIDE 48: Image synthesis

  • Semantic segmentation image → real image

Q. Chen and V. Koltun. "Photographic Image Synthesis with Cascaded Refinement Networks". ICCV 2017
SLIDE 49: Image synthesis

  • Semantic segmentation image → real image
  • No GANs?
SLIDE 50: Image synthesis

  • Several works show that one can use a perceptual loss to achieve high-quality results
  • We cannot use the L2 loss, as it could penalize realistic results (black car vs. white car)
  • The perceptual loss measures the "content of the image"

A. Dosovitskiy and T. Brox. "Generating Images with Perceptual Similarity Metrics based on Deep Networks". NIPS 2016
SLIDE 51: Perceptual loss and style transfer

SLIDE 52: Content loss

  • Content loss (also called perceptual loss or feature reconstruction loss)
  • Use a network to compute the loss

Gatys et al. "A Neural Algorithm of Artistic Style". arXiv:1508.06576 (2015)

SLIDE 53: Content loss

  • 1. Take a VGG network trained for image classification
  • 2. Pass the generated image and the ground truth through the network
  • 3. Compare the feature maps

[Equation: L2 distance between the feature maps of the generated image and of the ground-truth image at layer j, normalized by the feature map size (channels, height, width)]
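The three steps reduce to a short formula. A NumPy sketch of the feature reconstruction loss at one layer j, with random arrays standing in for the VGG feature maps of the generated and ground-truth images:

```python
import numpy as np

def content_loss(phi_gen, phi_gt):
    """Feature reconstruction loss at layer j: squared L2 distance between
    the two feature maps, normalized by the map size C * H * W."""
    c, h, w = phi_gen.shape
    return np.sum((phi_gen - phi_gt) ** 2) / (c * h * w)

rng = np.random.default_rng(0)
phi_gen = rng.normal(size=(64, 28, 28))   # stand-in for VGG features, layer j
phi_gt = rng.normal(size=(64, 28, 28))

loss = content_loss(phi_gen, phi_gt)      # > 0 for different feature maps
```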

SLIDE 54: Content loss

  • Intuition: if there was a car in the original image, we want "similar" features triggered for the generated image
  • This means we want to "roughly see a car" in the generated image too (but, e.g., color does not matter)

SLIDE 55: Style transfer

  • The content loss was originally introduced for style transfer [1]

[1] Gatys et al. "A Neural Algorithm of Artistic Style". arXiv:1508.06576 (2015). Image: J. Johnson

SLIDE 57: Style transfer

  • Content loss: feature representation similarity
  • Style loss: comparing Gram matrices of the features at layer j

J. Johnson et al. "Perceptual Losses for Real-Time Style Transfer and Super-Resolution". ECCV 2016

SLIDE 58: Style loss

  • 1. Take a VGG network trained for image classification
  • 2. Pass the generated image and the ground truth through the network
  • 3. Compute the Gram matrices at a certain layer, comparing channels c and c'

SLIDE 59: Style loss

  • Intuition: the Gram matrix captures which features tend to activate together
  • In practice: vectorize the feature maps to size C x (HW)
  • This loss preserves the stylistic features but not the content
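The vectorization to C x (HW) and the Gram matrix look like this in NumPy. Entry G[c, c'] compares channels c and c' over all spatial positions, which discards the spatial layout (the content) and keeps the feature co-activation statistics (the style); the normalization shown is one common choice, papers differ on it:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map. G[c, c2] measures how
    strongly channels c and c2 activate together, over all positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)   # vectorize to C x (HW)
    return f @ f.T / (c * h * w)     # normalize by the map size

rng = np.random.default_rng(0)
phi = rng.normal(size=(16, 8, 8))    # stand-in for VGG features at layer j
g = gram_matrix(phi)                 # 16 x 16, symmetric
```

The style loss then compares the Gram matrices of the generated and style images, e.g. with a squared Frobenius norm.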

SLIDE 60: Style transfer

  • Start with a white-noise image

SLIDE 61: Style transfer

[Figure: results with more weight on the content loss vs. more weight on the style loss. Image: Johnson/Fei-Fei/Yeung]

SLIDE 62: Style transfer

  • The aforementioned method is slow: it requires many forward/backward passes through VGG
  • Fast neural style transfer: train a neural network to do the transfer (one network per style)

J. Johnson et al. "Perceptual Losses for Real-Time Style Transfer and Super-Resolution". ECCV 2016
SLIDE 63: Fast style transfer

  • Training: use multiple content images; use the style image to compute the loss

[Diagram: content loss and style loss]

SLIDE 64: Fast style transfer

  • Training: use multiple content images; use the style image to compute the loss
  • Test: one forward pass is enough!

SLIDE 65: Other uses of autoencoders

  • Anomaly detection. For example: C. Baur et al. "Deep Autoencoding Models for Unsupervised Anomaly Segmentation in Brain MR Images". MICCAI 2018
  • Deep multimodal autoencoders: mix the representations of several sources (audio and video)

SLIDE 66: Next lecture

  • Next lecture on Monday the 3rd
  • Make sure you are working on your projects!
  • Group #2 presenting this Friday!