Deep Transfer Learning for Visual Analysis


  1. Deep Transfer Learning for Visual Analysis. Yu-Chiang Frank Wang, Associate Professor, Dept. of Electrical Engineering, National Taiwan University, Taipei, Taiwan. 2nd AII Workshop, 2018/5/19

  2. Trends of Deep Learning

  3. Transfer Learning: What, When, and Why? (cont’d) • A practical example: transferring driving behavior learned in a simulator to real-world self-driving cars. https://techcrunch.com/2017/02/08/udacity-open-sources-its-self-driving-car-simulator-for-anyone-to-use/ https://googleblog.blogspot.tw/2014/04/the-latest-chapter-for-self-driving-car.html

  4. Recent Research Focuses on Transfer Learning • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification

  5. Detach & Adapt – Beyond Image Style Transfer • Faceapp – Putting a smile on your face! • Deep learning for representation disentanglement • Interpretable deep feature representation (Figure: example input photo of Mr. Takeshi Kaneshiro.)

  6. Detach & Adapt – Beyond Image Style Transfer • Cross-domain image synthesis, manipulation & translation (Figure: disentangling the “smile” attribute from the photo domain and the cartoon domain, with and without supervision.) Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018

  7. Detach & Adapt – Beyond Image Style Transfer • Cross-domain image synthesis, manipulation & translation [CVPR’18] (Figure: attribute-conditioned synthesis with and without supervision.) Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018

  8. Example Results • Face: photo & sketch domains • Conditional unsupervised image translation with unpaired data and without label supervision Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018

  9. Comparisons (Table: methods compared on cross-domain image translation properties – unpaired training data, multi-domains, bi-direction, unsupervised – and representation disentanglement properties – joint representation, interpretability of the disentangled factor. Translation methods Pix2pix, CycleGAN, StarGAN, UNIT, and DTN cannot disentangle the image representation; InfoGAN and AC-GAN cannot translate images across domains; CDRD (ours) supports all properties, with partial interpretability of the disentangled factor.)

  10. Recent Research Focuses on Transfer Learning • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification

  11. Multi-Label Classification for Image Analysis • Prediction of multiple object labels from an image • Learning across image and semantics domains • No object detectors available • Desirable to be able to exploit label co-occurrence information (Example labels: person, table, sofa, chair, TV, lights, carpet, …)
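
Since the slides emphasize exploiting label co-occurrence, the relevant statistics can be estimated directly from a binary training label matrix; a minimal sketch (the matrix `Y` and the label set are illustrative, not from the slides):

```python
import numpy as np

# Binary label matrix: one row per image, one column per label
# (columns: person, table, chair).
Y = np.array([
    [1, 0, 1],   # image 0: person, chair
    [1, 1, 1],   # image 1: person, table, chair
    [0, 1, 0],   # image 2: table
], dtype=float)

# Co-occurrence counts: C[i, j] = number of images containing both label i and j.
C = Y.T @ Y

# Conditional probability P(label j | label i), one simple way to expose
# dependencies such as "chair frequently appears together with person".
P = C / np.maximum(C.diagonal()[:, None], 1.0)
print(C)
print(P)
```

Richer models (such as C2AE on the next slide) learn these dependencies jointly with the feature embedding rather than from raw counts.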

  12. DNN for Multi-Label Classification • Canonical-Correlated Autoencoder (C2AE) [Wang et al., AAAI 2017] • Unique integration of autoencoder & deep canonical correlation analysis (DCCA) • Autoencoder: label embedding + label recovery + label co-occurrence • DCCA: joint feature & label embedding • Can handle missing labels during learning (Figure: feature space and label space – with labels such as clouds, lake, ocean, water, sky, sun, sunset – mapped into a joint latent space.) Y.-C. F. Wang et al., Learning Deep Latent Spaces for Multi-Label Classification, AAAI 2017
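
DCCA replaces the linear maps of classical canonical correlation analysis with deep networks. As intuition for the joint feature-and-label embedding, here is a linear CCA sketch on toy data (a simplified stand-in for illustration, not the C2AE model itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples whose 4-dim features X and 3-dim labels Y
# share a 2-dim latent factor Z, plus small noise.
n = 100
Z = rng.normal(size=(n, 2))
X = Z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))
Y = Z @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(n, 3))

def cca_correlations(X, Y, eps=1e-8):
    """Canonical correlations between X and Y via whitening + SVD."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # C^(-1/2) for a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(T, compute_uv=False)  # descending correlations

rho = cca_correlations(X, Y)
print(rho)  # leading correlations are high because of the shared factor
```

C2AE optimizes an analogous correlation objective over deep encoders, together with a decoder that recovers the label vector.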

  13. Order-Free RNN with Visual Attention for Multi-Label Classification [AAAI’18] • Visual attention for multi-label classification [Wang et al., AAAI’18] Y.-C. F. Wang et al., Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018
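
Visual attention of this kind generally scores each spatial cell of a CNN feature map against a query (for example the RNN state while emitting the next label) and pools a context vector with softmax weights; a generic sketch, not the exact AAAI’18 architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)

# Feature map from a CNN: 7x7 spatial grid with 16 channels,
# flattened to 49 regions of dimension 16 (sizes are illustrative).
features = rng.normal(size=(49, 16))
# Query vector, e.g. the RNN hidden state before predicting the next label.
query = rng.normal(size=(16,))

scores = features @ query    # relevance of each region to the query
alpha = softmax(scores)      # attention weights, sum to 1
context = alpha @ features   # attended context vector, shape (16,)

print(alpha.sum(), context.shape)
```

The attention maps shown on the next slide visualize exactly these per-region weights, reshaped back to the spatial grid.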

  14. Order-Free RNN with Visual Attention for Multi-Label Classification • Experiments • NUS-WIDE: 269,648 images with 81 labels • MS-COCO: 82,783 images with 80 labels • Quantitative evaluation on MS-COCO and NUS-WIDE Y.-C. F. Wang et al., Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018

  15. Order-Free RNN with Visual Attention for Multi-Label Classification • Qualitative evaluation: example images in MS-COCO with the associated attention maps; incorrect predictions still show reasonable visual attention Y.-C. F. Wang et al., Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018

  16. Multi-Label Zero-Shot Learning with Structured Knowledge Graphs [CVPR’18] • Utilizing structured knowledge graphs for modeling label dependency
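
A structured knowledge graph over labels lets belief propagate from observed labels to related ones, including unseen labels in the zero-shot setting. One simple form is a normalized-adjacency multiplication per propagation step; this is a generic message-passing sketch (the graph, labels, and mixing weight are illustrative, not the CVPR’18 model):

```python
import numpy as np

# Label graph over {person, chair, table, dog}: edges connect related labels.
labels = ["person", "chair", "table", "dog"]
A = np.array([
    [0, 1, 1, 1],   # person - chair, table, dog
    [1, 0, 1, 0],   # chair  - person, table
    [1, 1, 0, 0],   # table  - person, chair
    [1, 0, 0, 0],   # dog    - person
], dtype=float)

# Row-normalized adjacency: each node averages its neighbors' messages.
A_hat = A / A.sum(axis=1, keepdims=True)

# Initial label beliefs from a classifier (only "chair" seen confidently).
h = np.array([0.1, 0.9, 0.2, 0.0])

# One propagation step: labels related to "chair" (table, person) gain belief.
h_next = 0.5 * h + 0.5 * (A_hat @ h)
print(h_next)
```

Stacking several such steps, with learned rather than fixed update functions, gives the gated graph networks typically used for this task.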

  17. • Our Proposed Network

  18. • Our Proposed Network

  19. Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • Experiments • NUS-WIDE: 269,648 images with 1,000 labels • MS-COCO: 82,783 images with 80 labels • Quantitative evaluation: ML vs. ML-ZSL vs. generalized ML-ZSL

  20. Recent Research Focuses on Transfer Learning • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification

  21. Introduction: Person Re-Identification • Person re-identification task: the system needs to match appearances of a person of interest across non-overlapping cameras (Figure: views from Camera #1 through Camera #4.)
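
At test time, re-identification usually reduces to nearest-neighbor search over appearance embeddings: a query detection from one camera is ranked against a gallery of detections from the other cameras. A minimal sketch (the encoder is replaced by random illustrative vectors):

```python
import numpy as np

rng = np.random.default_rng(2)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Gallery: embeddings of detections from other cameras (8-dim, illustrative).
gallery = normalize(rng.normal(size=(5, 8)))

# Query: the same person as gallery index 1, seen from a new camera,
# simulated here by perturbing that embedding slightly.
query = normalize(gallery[1] + 0.05 * rng.normal(size=8))

# Rank the gallery by cosine similarity to the query.
sims = gallery @ query
ranking = np.argsort(-sims)
print(ranking[0])  # index of the best match
```

The adaptation network on the next slide exists precisely to make such embeddings comparable when the target cameras provide no identity labels.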

  22. Adaptation & Re-ID Network (Architecture figure: a latent encoder and decoder map a labeled source dataset and an unlabeled target dataset into a shared latent space; training combines a classification loss on labeled source identities with reconstruction and adaptation losses on the unlabeled target data.)

  23. Testing Scenario

  24. Comparisons with Recent Re-ID Methods

  25. Recent Research Focuses on Transfer Learning • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification

  26. Other Ongoing Research Topics • Take a Deep Look from a Single Image • Single-Image 3D Object Model Prediction • Completing Videos from a Deep Glimpse

  27. 3D Shape Estimation from A Single 2D Image • Recovering Shape from a Single Image • Supervised Setting • Input image and its ground truth 3D voxel available for training

  28. 3D Shape Estimation from A Single 2D Image • Recovering Shape from a Single Image • Semi-Supervised Setting • Input image and its ground truth 2D mask available for training
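
Voxel-based shape prediction, as in the supervised setting above, is commonly scored by intersection-over-union between the predicted occupancy grid and the ground-truth grid. A sketch of this standard metric (the grids here are tiny illustrative examples):

```python
import numpy as np

def voxel_iou(pred, gt, threshold=0.5):
    """IoU between a predicted occupancy grid (probabilities) and a binary GT grid."""
    p = pred >= threshold
    g = gt.astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union else 1.0

# Tiny 2x2x2 example: the prediction recovers 2 of the 3 occupied voxels
# and adds 1 spurious voxel.
gt = np.zeros((2, 2, 2))
gt[0, 0, 0] = gt[0, 1, 0] = gt[1, 0, 0] = 1
pred = np.zeros((2, 2, 2))
pred[0, 0, 0] = 0.9
pred[0, 1, 0] = 0.8
pred[1, 1, 1] = 0.7   # false positive

print(voxel_iou(pred, gt))  # intersection 2 / union 4 = 0.5
```

In the semi-supervised setting with only 2D masks, the analogous overlap is measured on rendered silhouettes instead of voxels.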

  29. 3D Shape Estimation from A Single 2D Image • Example Results

  30. 3D Shape Estimation from A Single 2D Image • Example Results (Figure: chair reconstructions under varying pose.)

  31. Recent Research Focuses • Take a Deep Look from a Single Image • Single-Image 3D Object Model Prediction • Completing Videos from a Deep Glimpse

  32. What’s Video Completion?

  33. From Video Synthesis to Completion • Our proposed network: Stochastic & Recurrent Conditional GAN (SR-cGAN), combining a variational autoencoder, recurrent neural nets, and a GAN • Input: non-consecutive frames of interest (anchor frames) • Output: a video sequence (more than one possible output) • Three stages in learning: 1. learning frame-based representation; 2. learning video-based representation; 3. learning video representation conditioned on the input anchor frames (Figure: temporal encoder and temporal generator, with a discriminator judging synthesized vs. real sequences.)

  34. Video Synthesis • Datasets: KTH, Shape Motion, MUG

  35. Video Completion – Example Results (Figures: on Shape Motion, anchor frames 6, 7, 11, 12, 14, and 15 are given as input and the full sequence is synthesized; on KTH, anchor frames 2, 3, 7, 9, 12, and 14 are used.)

  36. Video Completion – Stochasticity (Figure: from the same anchor frames 3, 5, 8, 12, 13, and 14, the model synthesizes multiple output videos with different motion.)

  37. Video Interpolation & Prediction • Interpolation • Input: 2 anchor frames, fixed at t = 1 and t = 8 • Output: 8 frames • Prediction • Input: 6 anchor frames, fixed at t = 1–6 • Output: 16 frames
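
As a point of reference for the interpolation setting, the simplest completion baseline is linear blending between two anchor frames; unlike the SR-cGAN above, it is deterministic and cannot produce multiple plausible motions:

```python
import numpy as np

def linear_interpolate(frame_a, frame_b, num_between):
    """Naive completion: linearly blend pixels between two anchor frames."""
    frames = []
    for i in range(1, num_between + 1):
        t = i / (num_between + 1)
        frames.append((1 - t) * frame_a + t * frame_b)
    return frames

# Anchor frames at t=1 and t=8 (tiny 4x4 grayscale images, illustrative),
# with 6 frames to fill in between.
f1 = np.zeros((4, 4))
f8 = np.ones((4, 4))
between = linear_interpolate(f1, f8, 6)
print(len(between), between[0][0, 0])  # 6 frames; first blend weight is 1/7
```

A learned generator replaces this pixel blend with motion that stays on the manifold of realistic frames, which is what separates the slide's results from such a baseline.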

  38. Summary • Deep Transfer Learning for Visual Analysis • Multi-Label Classification for Image Analysis • Detach and Adapt – Beyond Image Style Transfer • Single-Image 3D Object Model Prediction • Completing Videos from a Deep Glimpse

  39. For More Information… • Vision and Learning Lab at NTUEE (http://vllab.ee.ntu.edu.tw/)

  40. Thank You!
