slide-1
SLIDE 1

Deep Transfer Learning for Visual Analysis

Yu-Chiang Frank Wang, Associate Professor

  • Dept. Electrical Engineering, National Taiwan University

Taipei, Taiwan

2018/5/19 2nd AII Workshop

slide-2
SLIDE 2

Trends of Deep Learning

2

slide-3
SLIDE 3

https://techcrunch.com/2017/02/08/udacity-open-sources-its-self-driving-car-simulator-for-anyone-to-use/
https://googleblog.blogspot.tw/2014/04/the-latest-chapter-for-self-driving-car.html

Transfer Learning: What, When, and Why? (cont’d)

  • A practical example

3

slide-4
SLIDE 4

Recent Research Focuses on Transfer Learning

  • CVPR 2018

Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation

  • AAAI 2018

Order-Free RNN with Visual Attention for Multi-Label Classification

  • CVPR 2018

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

  • CVPRW 2018

Unsupervised Deep Transfer Learning for Person Re-Identification

4

slide-5
SLIDE 5

Detach & Adapt – Beyond Image Style Transfer

  • Faceapp – Putting a smile on your face!
  • Deep learning for representation disentanglement
  • Interpretable deep feature representation

Input

  • Mr. Takeshi Kaneshiro

5

slide-6
SLIDE 6

Detach & Adapt – Beyond Image Style Transfer

  • Cross-domain image synthesis, manipulation & translation

Disentangle "smile" from Photo
Disentangle "smile" from Cartoon

[Figure: attribute transfer shown with and without supervision]

Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018 6

slide-7
SLIDE 7

Detach & Adapt – Beyond Image Style Transfer

  • Cross-domain image synthesis, manipulation & translation [CVPR’18]

Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018

[Figure: attribute manipulation with and without supervision]

7

slide-8
SLIDE 8

Example Results

  • Face
  • Photo & Sketch
  • Conditional Unsupervised Image Translation

[Figure: unpaired training data, without label supervision]

Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018 8

slide-9
SLIDE 9

Comparisons

The first four columns concern Cross-Domain Image Translation; the last two concern Representation Disentanglement. Per the slide's notes, the translation-only methods cannot disentangle the image representation, and infoGAN / AC-GAN cannot translate images across domains; "—" marks those non-applicable cells.

| Method | Unpaired Training Data | Multi-domains | Bi-direction | Joint Representation | Unsupervised | Interpretability of Disentangled Factor |
|---|---|---|---|---|---|---|
| Pix2pix | X | X | X | X | — | — |
| CycleGAN | O | X | O | X | — | — |
| StarGAN | O | O | O | X | — | — |
| UNIT | O | X | O | O | — | — |
| DTN | O | X | X | O | — | — |
| infoGAN | — | — | — | — | O | X |
| AC-GAN | — | — | — | — | X | O |
| CDRD (Ours) | O | O | O | O | Partially | O |

9

slide-10
SLIDE 10

Recent Research Focuses on Transfer Learning

  • CVPR 2018

Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation

  • AAAI 2018

Order-Free RNN with Visual Attention for Multi-Label Classification

  • CVPR 2018

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

  • CVPRW 2018

Unsupervised Deep Transfer Learning for Person Re-Identification

10

slide-11
SLIDE 11

Multi-Label Classification for Image Analysis

  • Prediction of multiple object labels from an image
  • Learning across image and semantics domains
  • No object detectors available
  • Desirable to exploit label co-occurrence information

Labels: Person Table Sofa Chair TV Lights Carpet …
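The label co-occurrence information mentioned above can be collected directly from a binary image-label matrix. A minimal NumPy sketch, using a made-up 4-image, 5-label matrix (the labels are purely illustrative):

```python
import numpy as np

# Hypothetical binary label matrix: 4 images x 5 labels
# (columns: person, table, sofa, chair, tv)
Y = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
])

# Co-occurrence counts: C[i, j] = number of images containing both labels i and j
C = Y.T @ Y

# Normalize each row by the label's own count to get P(label j | label i)
P = C / np.maximum(C.diagonal()[:, None], 1)
```

Here `P[0, 1]` estimates how often "table" appears given "person"; a classifier can use such statistics to re-weight unlikely label combinations.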

11

slide-12
SLIDE 12
  • Canonical-Correlated Autoencoder (C2AE) [Wang et al., AAAI 2017]
  • Unique integration of autoencoder & deep canonical correlation analysis (DCCA)
  • Autoencoder: label embedding + label recovery + label co-occurrence
  • DCCA: joint feature & label embedding
  • Can handle missing labels during learning

DNN for Multi-Label Classification

[Model diagram: the feature space and the label space are each embedded into a shared latent space, from which the label space is recovered; example labels: Clouds, Lake, Ocean, Water, Sky, Sun, Sunset]

Y.-C. F. Wang et al., Learning Deep Latent Spaces for Multi-Label Classification, AAAI 2017 12
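The joint feature-label embedding idea above can be sketched with linear maps standing in for the deep encoders and decoder. This is only a shape-level illustration: the weights are random (not trained), and a simple L2 alignment replaces the actual DCCA correlation objective:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_feat, d_label, d_latent = 8, 16, 5, 4

X = rng.normal(size=(n, d_feat))                          # image features
Y = (rng.random(size=(n, d_label)) > 0.5).astype(float)   # binary labels

# Linear stand-ins for the deep feature encoder, label encoder, label decoder
Wx = rng.normal(size=(d_feat, d_latent))
We = rng.normal(size=(d_label, d_latent))
Wd = rng.normal(size=(d_latent, d_label))

Zx, Zy = X @ Wx, Y @ We          # embed both domains into the latent space

# Align the two embeddings (stand-in for the DCCA correlation objective)
align_loss = np.mean((Zx - Zy) ** 2)

# Autoencoder branch: recover the labels from the label embedding
recon_loss = np.mean((Y @ We @ Wd - Y) ** 2)

# At test time labels are predicted by decoding the *feature* embedding
Y_pred = Zx @ Wd
```

Decoding the feature embedding through the label decoder is what lets the model predict labels for images alone, and the label autoencoder branch is where co-occurrence structure and missing labels are handled.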

slide-13
SLIDE 13

Order-Free RNN with Visual Attention for Multi-Label Classification [AAAI’18]

Y.-C. F. Wang et al., Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018

  • Visual Attention for MLC [Wang et al., AAAI’18]
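The core attention step can be sketched as dot-product attention over CNN feature-map regions, with the RNN hidden state as the query. All names and sizes below are illustrative stand-ins, not the paper's exact architecture:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(1)

# A 7x7 spatial grid of 512-d CNN features, flattened to 49 regions
feats = rng.normal(size=(49, 512))
# Hypothetical RNN hidden state acting as the attention query
h = rng.normal(size=(512,))

scores = feats @ h        # relevance of each region at the current step
alpha = softmax(scores)   # attention weights over the 49 regions
context = alpha @ feats   # attended visual context fed back to the RNN
```

At each decoding step the RNN emits one label and re-attends, so different labels can be grounded in different image regions.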

13

slide-14
SLIDE 14

Order-Free RNN with Visual Attention for Multi-Label Classification

Y.-C. F. Wang et al., Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018

  • Experiments
  • NUS-WIDE: 269,648 images with 81 labels
  • MS-COCO: 82,783 images with 80 labels
  • Quantitative Evaluation

[Result tables: NUS-WIDE and MS-COCO]
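Multi-label results on these benchmarks are commonly reported with example-based F1, among other metrics. A minimal sketch of that score (the arrays are made-up toy predictions, not results from the paper):

```python
import numpy as np

def example_f1(y_true, y_pred):
    """Per-example F1 over binary label vectors, averaged over images."""
    tp = (y_true * y_pred).sum(axis=1)
    prec = tp / np.maximum(y_pred.sum(axis=1), 1)
    rec = tp / np.maximum(y_true.sum(axis=1), 1)
    f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
    return f1.mean()

y_true = np.array([[1, 0, 1], [0, 1, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])
score = example_f1(y_true, y_pred)   # 5/6: one image perfect, one misses a label
```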

14

slide-15
SLIDE 15

Order-Free RNN with Visual Attention for Multi-Label Classification

Y.-C. F. Wang et al., Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018

  • Qualitative Evaluation

Example images in MS-COCO with the associated attention maps
Incorrect predictions with reasonable visual attention

15

slide-16
SLIDE 16

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs [CVPR’18]

  • Utilizing structured knowledge graphs for modeling label dependency
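A deliberately simplified sketch of how a label-relation graph can propagate beliefs: each label's score is updated by averaging over its graph neighbors. The actual model learns this propagation with a neural network; the graph, labels, and scores below are hypothetical:

```python
import numpy as np

labels = ["dog", "cat", "pet", "grass"]

# Hypothetical adjacency of a label-relation graph (symmetric, self-loops)
A = np.array([
    [1, 0, 1, 1],   # dog - pet, dog - grass
    [0, 1, 1, 0],   # cat - pet
    [1, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

# Initial per-label beliefs from the image classifier
b = np.array([0.9, 0.1, 0.2, 0.6])

# One round of propagation: each label averages its neighbors' beliefs
b_new = A @ b / A.sum(axis=1)
```

After one step, "pet" (initially 0.2) is boosted by its strong neighbor "dog". This dependency structure is also what allows scores for unseen labels in the zero-shot setting, since they sit in the same graph as seen ones.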

16

slide-17
SLIDE 17
  • Our Proposed Network

17

slide-18
SLIDE 18
  • Our Proposed Network

18

slide-19
SLIDE 19

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

  • Experiments
  • NUS-WIDE: 269,648 images with 1000 labels
  • MS-COCO: 82,783 images with 80 labels
  • Quantitative Evaluation
  • ML vs. ML-ZSL vs. Generalized ML-ZSL
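The difference between the ML-ZSL and generalized ML-ZSL protocols is only in which labels are ranked at test time. A small sketch with a hypothetical seen/unseen split and random per-label scores:

```python
import numpy as np

all_labels = np.arange(10)
seen = all_labels[:7]      # labels with training images
unseen = all_labels[7:]    # zero-shot labels

# Hypothetical per-label scores for a single test image
scores = np.random.default_rng(3).random(10)

# ML-ZSL: rank only the unseen labels
zsl_ranking = unseen[np.argsort(-scores[unseen])]
# Generalized ML-ZSL: rank seen and unseen labels jointly (harder)
gzsl_ranking = all_labels[np.argsort(-scores)]
```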

19

slide-20
SLIDE 20

Recent Research Focuses on Transfer Learning

  • CVPR 2018

Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation

  • AAAI 2018

Order-Free RNN with Visual Attention for Multi-Label Classification

  • CVPR 2018

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

  • CVPRW 2018

Unsupervised Deep Transfer Learning for Person Re-Identification

20

slide-21
SLIDE 21

Introduction: Person re-identification

Person re-identification: the system must match the appearance of a person of interest across non-overlapping cameras.

[Figure: the same person captured by Camera #1, Camera #2, Camera #3, and Camera #4]
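The test-time matching step (not the adaptation network itself) can be sketched as nearest-neighbor retrieval over appearance embeddings. The gallery, query, and noise level below are hypothetical:

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

rng = np.random.default_rng(2)

# Hypothetical appearance embeddings of 100 gallery identities
gallery = l2_normalize(rng.normal(size=(100, 128)))
# Same person as gallery entry 42, seen from another camera (small perturbation)
query = l2_normalize(gallery[42] + 0.02 * rng.normal(size=128))

# Rank gallery identities by cosine similarity to the query
sims = gallery @ query
ranking = np.argsort(-sims)
```

Rank-1 accuracy, the standard re-ID metric, is the fraction of queries whose true identity is `ranking[0]`.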

21

slide-22
SLIDE 22

[Architecture diagram; mathematical symbols were garbled in extraction: a source dataset with identity labels and a target dataset without labels are mapped by a shared latent encoder into a common latent space; a latent decoder reconstructs the inputs and a classifier predicts identities, trained jointly with several losses (classification among them) — labeled "Adaptation & Re-ID Network"]

22

slide-23
SLIDE 23

Testing Scenario

23

slide-24
SLIDE 24

Comparisons with Recent Re-ID Methods

24

slide-25
SLIDE 25

Recent Research Focuses on Transfer Learning

  • AAAI 2018

Order-Free RNN with Visual Attention for Multi-Label Classification

  • CVPR 2018

Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation

  • CVPR 2018

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

  • CVPRW 2018

Unsupervised Deep Transfer Learning for Person Re-Identification

25

slide-26
SLIDE 26

Other Ongoing Research Topics

  • Take a Deep Look from a Single Image
  • Single-Image 3D Object Model Prediction
  • Completing Videos from a Deep Glimpse

26

slide-27
SLIDE 27

3D Shape Estimation from a Single 2D Image

  • Recovering Shape from a Single Image
  • Supervised Setting
  • Input image and its ground truth 3D voxel available for training
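When voxel ground truth is available, predictions are commonly scored by intersection-over-union between occupancy grids. A minimal sketch on toy 4x4x4 grids (the shapes are illustrative, not results from the work):

```python
import numpy as np

def voxel_iou(pred, gt):
    """Intersection-over-Union between two binary occupancy grids."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

gt = np.zeros((4, 4, 4), dtype=bool)
gt[1:3, 1:3, 1:3] = True        # a 2x2x2 ground-truth block
pred = np.zeros_like(gt)
pred[1:3, 1:3, 1:4] = True      # prediction overshoots by one voxel layer
iou = voxel_iou(pred, gt)       # 8 shared voxels / 12 in the union = 2/3
```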

27

slide-28
SLIDE 28

3D Shape Estimation from a Single 2D Image

  • Recovering Shape from a Single Image
  • Semi-Supervised Setting
  • Input image and its ground truth 2D mask available for training
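One standard way to supervise 3D shape with only a 2D mask is a silhouette-projection loss: project the predicted occupancy grid along the viewing axis and compare against the mask. A toy sketch (axis-aligned projection, made-up shapes; a real pipeline would use a differentiable camera projection):

```python
import numpy as np

# A predicted 4x4x4 occupancy grid (a solid block, for illustration)
voxels = np.zeros((4, 4, 4))
voxels[1:3, 1:3, 0:4] = 1.0

# Project along the depth axis: a pixel is occupied if any voxel
# along its ray is occupied
silhouette = voxels.max(axis=2)

# The ground-truth 2D mask is the only supervision signal
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

mask_loss = np.mean((silhouette - mask) ** 2)
```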

28

slide-29
SLIDE 29

3D Shape Estimation from a Single 2D Image

  • Example Results

29

slide-30
SLIDE 30

3D Shape Estimation from a Single 2D Image

  • Example Results

[Figure: predicted 3D chair models rendered at varying poses]

30

slide-31
SLIDE 31

Recent Research Focuses

  • Take a Deep Look from a Single Image
  • Single-Image 3D Object Model Prediction
  • Completing Videos from a Deep Glimpse

31

slide-32
SLIDE 32

What’s Video Completion?

32

slide-33
SLIDE 33

From Video Synthesis to Completion

  • Our Proposed Network: Stochastic & Recurrent Conditional-GAN (SR-cGAN)
  • Combines a variational autoencoder, recurrent neural nets, and a GAN
  • Input: non-consecutive frames of interest
  • Output: a video sequence (more than one possible output)

Three Stages in Learning

  • 1. Learning frame-based representation
  • 2. Learning video-based representation
  • 3. Learning video representation conditioned on input anchor frames

[Architecture diagram: a Temporal Encoder processes the input frames, a Temporal Generator synthesizes the output sequence, and a discriminator distinguishes real from synthesized inputs (Real/Fake)]
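The input/output contract above can be illustrated with the most trivial completion baseline: given two anchor frames, fill the gap by linear interpolation. SR-cGAN instead samples diverse, learned completions; this sketch (with made-up frame sizes and anchors) only shows what "completing a video from anchor frames" means:

```python
import numpy as np

T, H, W = 8, 16, 16
anchors = {0: np.zeros((H, W)),   # anchor frame at t = 0
           7: np.ones((H, W))}    # anchor frame at t = 7

# Trivial baseline: linearly blend between the two anchor frames
video = np.empty((T, H, W))
for t in range(T):
    w = t / (T - 1)
    video[t] = (1 - w) * anchors[0] + w * anchors[7]
```

The anchors are reproduced exactly at their time steps, and every in-between frame is a blend; a generative completion model keeps the first property while producing realistic, possibly multi-modal motion in between.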

33

slide-34
SLIDE 34

Video Synthesis

Shape Motion KTH

MUG

34

slide-35
SLIDE 35

Video Completion – Example Results

[GIF examples on KTH and Shape Motion: input anchor frames and the synthesized output videos, with frame indices overlaid]

35

slide-36
SLIDE 36

Video Completion - Stochasticity

[GIF example: the same anchor frames (input) produce outputs with different motion, illustrating the model's stochasticity]

36

slide-37
SLIDE 37

Video Interpolation & Prediction

  • Interpolation
  • Input: 2 anchor frames, fixed at t = 1 and t = 8
  • Output: 8 frames
  • Prediction
  • Input: 6 anchor frames, fixed at t = 1–6
  • Output: 16 frames
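The two evaluation protocols differ only in which time steps are given as anchors. A small sketch of the corresponding anchor masks (boolean over output time steps, 0-indexed here):

```python
import numpy as np

# Interpolation: 8 output frames, anchors at t = 1 and t = 8 (1-indexed)
interp_anchor = np.zeros(8, dtype=bool)
interp_anchor[[0, 7]] = True

# Prediction: 16 output frames, anchors at t = 1..6 (1-indexed)
pred_anchor = np.zeros(16, dtype=bool)
pred_anchor[:6] = True
```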

37

slide-38
SLIDE 38

Summary

  • Deep Transfer Learning for Visual Analysis
  • Multi-Label Classification for Image Analysis
  • Detach and Adapt – Beyond Image Style Transfer
  • Single-Image 3D Object Model Prediction
  • Completing Videos from a Deep Glimpse

Person Table Sofa Chair TV Lights Carpet …

38

slide-39
SLIDE 39

For More Information…

  • Vision and Learning Lab at NTUEE (http://vllab.ee.ntu.edu.tw/)

39

slide-40
SLIDE 40

Thank You!

40