SLIDE 1

Disentanglement of Visual Concepts from Classifying and Synthesizing Scenes

Bolei Zhou The Chinese University of Hong Kong

SLIDE 2

Representation Learning

The purpose of representation learning: “To identify and disentangle the underlying explanatory factors hidden in the observed milieu of low-level sensory data.”

Bengio, et al. Representation Learning: A review and new perspectives.

SLIDE 3

Sources of Deep Representations

  • Image Classification: scene recognition, object recognition
  • Self-Supervised Learning: colorization (ECCV’16 and CVPR’17), audio prediction (ECCV’16)
  • Image Generation

SLIDE 4

Outline

  • Disentanglement of Concepts from Classifying Scenes
  • Sanity Check Experiment: Mixture of MNIST
  • Disentanglement of Visual Concepts from Synthesizing Scenes
  • Future Directions

SLIDE 5

My Previous Talks

  • On the importance of single units

CVPR’18 Tutorial talk: https://www.youtube.com/watch?v=1aSS5GEH58U

  • Interpretable representation learning for visual intelligence

MIT thesis defense: https://www.youtube.com/watch?v=J7Zz_33ZeJc

SLIDE 6

Neural Networks for Scene Classification

Demo: http://places2.csail.mit.edu/demo.html
Code: https://github.com/CSAILVision/places365

SLIDE 7

What are the internal units doing when classifying scenes?

A Convolutional Neural Network (CNN) maps an input image to a scene prediction, e.g. Cafeteria (0.9). Along the way, individual units emerge as concept detectors:

  • Unit 2 at Layer 4: Lamp
  • Unit 22 at Layer 5: Face
  • Unit 42 at Layer 3: Trademark
  • Unit 57 at Layer 4: Windows

SLIDE 8

What is a unit doing? Visualize the unit:

  • Deconvolution [Zeiler et al., ECCV’14; Girshick et al., CVPR’14]
  • Back-propagation [Simonyan et al., ICLR’15; Springenberg et al., ICLR’15; Selvaraju et al., ICCV’17]
  • Image Synthesis [Nguyen et al., NIPS’16; Dosovitskiy et al., CVPR’16; Mahendran et al., CVPR’15]

SLIDE 9

Data-Driven Visualization

For each unit at Layer 5, retrieve its top activated images (Unit 1, Unit 2, Unit 3, ...), as sketched below.

Code: https://github.com/metalbubble/cnnvisualizer
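
As a sketch of this data-driven approach: record each unit's maximal response over the dataset, then keep the highest-scoring images per unit. The names `model`, `model.layer5`, and `loader` below are placeholders, not the released tool's API.

    import torch

    # Rank dataset images by each unit's maximal activation (sketch).
    # `model`, `model.layer5`, and `loader` stand in for a trained
    # network and an image DataLoader.
    activations = []

    def record(module, inputs, output):
        # output: (batch, units, H, W) -> per-image max response per unit
        activations.append(output.detach().amax(dim=(2, 3)))

    handle = model.layer5.register_forward_hook(record)
    with torch.no_grad():
        for images, _ in loader:
            model(images)
    handle.remove()

    scores = torch.cat(activations)            # (num_images, num_units)
    top_images = scores[:, 0].topk(9).indices  # top activated images for Unit 1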

SLIDE 10

Annotating the Interpretation of Units

Amazon Mechanical Turk task:

Word/description to summarize the images: ______ (example answer: “Lamp”)

Which category does the description belong to?

  • Scene
  • Region or surface
  • Object
  • Object part
  • Texture or material
  • Simple elements or colors

[Zhou, Khosla, Lapedriza, Oliva, Torralba. ICLR 2015]

SLIDE 11

Interpretable Representations for Objects and Scenes

  • AlexNet on ImageNet: 59 units at conv5 emerge as object detectors (e.g., tie, bird, dog)
  • AlexNet on Places: 151 units at conv5 emerge as object detectors (e.g., building, windows, baseball field, face)

SLIDE 12

Network Dissection

[Zhou*, Bau*, et al. TPAMI’18, CVPR 2017]

Interpretable units at conv5 (IoU scores ranging from 0.16 down to 0.12):

  • road: conv5 unit 107 (object)
  • car: conv5 unit 79 (object)
  • waffled: conv5 unit 252 (texture)
  • grid: conv5 unit 191 (texture)
  • honeycombed: conv5 unit 41 (texture)
  • mountain: conv5 unit 144 (object)
  • grass: conv5 unit 88 (object)
  • paisley: conv5 unit 229 (texture)

Concepts detected across the layer include: water, tree, grass, plant, car, windowpane, sea, airplane, mountain, skyscraper, ceiling, building, dog, person, road, painting, stove, bed, chair, horse, floor, house, sky, track, bus, waterfall, sink, cabinet, shelf, pool table, sidewalk, book, ball pit, mountain snowy, street, skyscraper, pantry, building facade, hair, wheel, head, screen, shop window, crosswalk, food, wood, lined, dotted, studded, banded, zigzagged, honeycombed, grid, paisley, potholed, meshed, swirly, spiralled, freckled, sprinkled, fibrous, waffled, pleated, grooved, cracked, chequered, cobwebbed, matted, stratified, perforated, woven, and red.

In total: 32 objects, 6 scenes, 6 parts, 2 materials, 25 textures, and 1 color.

Network Dissection quantifies the interpretability of networks.

SLIDE 13

Evaluating Units via Semantic Segmentation

Unit 1: top activated images from the testing dataset. Top concept: Lamp, with Intersection over Union (IoU) = 0.23.

Testing dataset: 60,000 images annotated with 1,200 concepts.
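
The IoU score compares the unit's thresholded activation map against the annotated concept mask. A minimal sketch, assuming both maps are already at a common resolution (Network Dissection derives the threshold from a top quantile of the unit's activations over the dataset):

    import numpy as np

    def unit_concept_iou(act_map, concept_mask, threshold):
        """IoU between a unit's thresholded activation map and a binary
        concept segmentation mask of the same spatial size."""
        unit_mask = act_map > threshold
        intersection = np.logical_and(unit_mask, concept_mask).sum()
        union = np.logical_or(unit_mask, concept_mask).sum()
        return intersection / union if union > 0 else 0.0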

SLIDE 14

Layer 5, unit 79: car (object), IoU = 0.13. Layer 5, unit 107: road (object), IoU = 0.15.

118 of the 256 units cover 72 unique concepts.

SLIDE 15

Comparing unit interpretability across architectures (AlexNet, VGG, GoogLeNet, ResNet), with example detectors for House and Airplane.

SLIDE 16

More Results in the TPAMI Extension Paper

Interpreting Deep Visual Representations via Network Dissection: https://arxiv.org/pdf/1711.05611.pdf

  • Comparison of different network architectures
  • Comparison of supervision (supervised vs. self-supervised)

SLIDE 17

Sanity Check Experiment for Disentanglement

  • How to quantitatively evaluate the solution reached by a CNN?
  • What are the hidden factors in object recognition and scene recognition?

SLIDE 18

Sanity Check Experiment for Disentanglement

A controlled classification experiment: Mixture of MNIST.

Take the 10 digits from MNIST and define each class as a pairwise combination of digits: Class 1 = (3, 6), Class 2 = (0, 2), Class 3 = (4, 5), ..., Class N.

With Wentao Zhu (PKU).
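
A hedged sketch of how such a dataset might be composed; the talk does not specify the exact layout, so the side-by-side placement and canvas size below are assumptions:

    import itertools, random
    import numpy as np
    from torchvision.datasets import MNIST

    # Sketch of a Mixture-of-MNIST sample generator. Each class is an
    # unordered pair of digits; placing the two digits side by side on a
    # 28x56 canvas is an assumption, not a detail from the talk.
    mnist = MNIST(root="data", train=True, download=True)
    digits = {d: [] for d in range(10)}
    for img, label in zip(mnist.data.numpy(), mnist.targets.numpy()):
        digits[label].append(img)

    classes = list(itertools.combinations(range(10), 2))  # 45 digit pairs

    def make_sample(class_idx):
        a, b = classes[class_idx]
        if random.random() < 0.5:
            a, b = b, a                      # randomize left/right order
        canvas = np.zeros((28, 56), dtype=np.uint8)
        canvas[:, :28] = random.choice(digits[a])
        canvas[:, 28:] = random.choice(digits[b])
        return canvas, class_idx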

SLIDE 19

Solving Mixture of MNIST

Task: classify a given image into one of the 45 classes.

  • Training data: 20,000 images
  • Accuracy on the validation set: 91.7%

A simple convnet for classification: Layer 1 (10 units), Layer 2 (20 units), Layer 3 (10 units), global average pooling, then a softmax over the 45 classes.
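
Only the channel counts, the global average pooling, and the 45-way softmax come from the slide; the kernel sizes, strides, and ReLUs in this sketch are assumptions:

    import torch.nn as nn

    # Sketch matching the slide's description of the simple convnet.
    model = nn.Sequential(
        nn.Conv2d(1, 10, kernel_size=5, stride=2), nn.ReLU(),   # Layer 1: 10 units
        nn.Conv2d(10, 20, kernel_size=5, stride=2), nn.ReLU(),  # Layer 2: 20 units
        nn.Conv2d(20, 10, kernel_size=3), nn.ReLU(),            # Layer 3: 10 units
        nn.AdaptiveAvgPool2d(1),     # global average pooling
        nn.Flatten(),
        nn.Linear(10, 45),           # logits for the 45-way softmax
    )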

SLIDE 20

Digit Detectors Emerge from Solving Mixture of MNIST

Unit 03 emerges as a detector for digit 0 (top activated images and activation maps). Precision: @100 = 1.00, @300 = 1.00, @500 = 1.00, @700 = 0.99; precision at recall 0.25 = 0.99, at recall 0.50 = 0.98, at recall 0.75 = 0.90.
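
Precision@K here reads as: among the K images that activate the unit most strongly, the fraction that actually contain the target digit. A small sketch with hypothetical per-image arrays:

    import numpy as np

    def precision_at_k(unit_scores, contains_digit, k):
        """Fraction of the k images that most activate the unit which
        truly contain the target digit (e.g. digit 0 for Unit 03).
        `unit_scores`: per-image activation strengths (hypothetical).
        `contains_digit`: per-image boolean ground truth (hypothetical)."""
        top_k = np.argsort(unit_scores)[::-1][:k]
        return contains_digit[top_k].mean()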

SLIDE 21

Digit Detectors Emerge from Solving Mixture of MNIST

Two metrics for unit importance: alignment score and ablation effect
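
The ablation effect can be estimated by silencing one unit and measuring the drop in validation accuracy; `model.conv3` and `evaluate` below are placeholders, and this is a sketch of the idea rather than the exact metric from the talk:

    import torch

    def ablation_effect(model, unit, evaluate):
        """Accuracy drop when one conv3 unit is silenced during the
        forward pass. `evaluate(model)` is a placeholder that returns
        validation accuracy."""
        base_acc = evaluate(model)

        def silence(module, inputs, output):
            output[:, unit] = 0       # zero the unit's whole feature map
            return output

        handle = model.conv3.register_forward_hook(silence)
        ablated_acc = evaluate(model)
        handle.remove()
        return base_acc - ablated_acc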

SLIDE 22

Dropout Affects the Unit as Digit Detector

Compared settings: baseline vs. baseline with dropout on conv3.
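
In PyTorch terms, this variant is roughly a one-line change after conv3; the channel-wise form and the rate are assumptions:

    import torch.nn as nn

    # Assumed placement and rate: channel-wise dropout right after conv3.
    conv3_block = nn.Sequential(
        nn.Conv2d(20, 10, kernel_size=3), nn.ReLU(),
        nn.Dropout2d(p=0.5),   # zeroes entire feature maps during training
    )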

SLIDE 23

Dropout Affects the Unit as Digit Detector

Compared settings: baseline vs. baseline with dropout on conv3.

SLIDE 24

Layer Width Affects the Unit as Digit Detector

  • Wider network performs better at disentanglement
  • Less reliance on single units

Compared settings: baseline vs. baseline with the number of units at conv3 tripled.

SLIDE 25

Layer Width Affects the Unit as Digit Detector

  • Wider layer performs better at disentanglement
  • Less reliance on single units

Compared settings: baseline vs. baseline with the number of units at conv3 tripled.

SLIDE 26

Wider layer + Dropout

Compared settings: baseline vs. baseline with a wider layer vs. baseline with a wider layer + dropout.

SLIDE 27

Usefulness Experiment

  • Treat 8 and 9 as redundant digits (shown at random across all classes)
  • Effective digits: 0-7
  • Number of classes: 28 (all pairwise combinations of the 8 effective digits)

SLIDE 28

Deep Neural Networks for Synthesizing Scenes

Generative Adversarial Networks

  • Goodfellow, et al. NIPS’14
  • Radford, et al. ICLR’15
  • T. Karras, et al. 2017
  • A. Brock, et al. 2018

SLIDE 29

Example synthesized images from T. Karras et al. 2017.

SLIDE 30

How to Add or Modify Contents?

Input: random noise. Output: synthesized image. Goal: add trees or add domes to the synthesized scene.

SLIDE 31

Understanding the Internal Units in GANs

What are they doing?

David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, J. Tenenbaum, W. Freeman, A. Torralba. GAN Dissection: Visualizing and Understanding GANs. ICLR’19. https://arxiv.org/pdf/1811.10597.pdf

Input: random noise. Output: synthesized image.

SLIDE 32

Framework of GAN Dissection

David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, J. Tenenbaum, W. Freeman, A. Torralba. GAN Dissection: Visualizing and Understanding GANs. https://arxiv.org/pdf/1811.10597.pdf

SLIDE 33

Units Emerge as Drawing Objects

  • Unit 365 draws trees
  • Unit 43 draws domes
  • Unit 14 draws grass
  • Unit 276 draws towers

SLIDE 34

Manipulating the Images

Unit 4 draws lamps. Left: synthesized images; right: the same images with unit 4 removed.
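
The intervention is conceptually simple: zero the unit's feature map at an intermediate generator layer and re-synthesize. A sketch, in which `generator`, its `layer4` attribute, the unit index, and the latent dimension are placeholders:

    import torch

    # Sketch of the GAN Dissection intervention: silence one unit in an
    # intermediate generator layer, then run the generator again.
    def remove_unit(module, inputs, output):
        output[:, 4] = 0              # ablate unit 4 (the lamp unit here)
        return output

    z = torch.randn(1, 128)           # latent code; dimension is assumed
    with torch.no_grad():
        original = generator(z)
        handle = generator.layer4.register_forward_hook(remove_unit)
        edited = generator(z)         # same scene with the lamps removed
        handle.remove()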

SLIDE 35

Interactive Image Manipulation

All the code and paper are available at http://gandissect.csail.mit.edu

SLIDE 36

Latest Work on Using GAN to Manipulate Real Image

  • Challenge: inverting the hidden code z for any given real image

Input: hidden code z. Output: synthesized image.
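
One common approach to inversion (a sketch of the general idea, not necessarily the method in this work) is to optimize z directly so that the synthesized image matches the target; `generator` and `load_image` are placeholders:

    import torch
    import torch.nn.functional as F

    # Invert a real image by optimizing the latent code (sketch).
    target = load_image()                    # hypothetical loader, (1, 3, H, W)
    z = torch.randn(1, 128, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=0.01)

    for step in range(500):
        optimizer.zero_grad()
        loss = F.mse_loss(generator(z), target)  # pixel reconstruction loss
        loss.backward()
        optimizer.step()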

SLIDE 37

Future Directions

Interpretable Deep Learning connects to:

  • Generalization & Overfitting
  • Plasticity & Transfer Learning
  • GAN & Deep RL
  • Defense and Attack with Adversarial Samples
  • Network Compression

SLIDE 38

Why Care About Interpretability?

From the ‘Alchemy’ of deep learning to the ‘Chemistry’ of deep learning: toward scientific understanding.