  1. Addressing Data Scarcity In Deep Learning Anima Anandkumar & Zachary Lipton

  2. DATA AUGMENTATION • To improve generalization, augment the training data. • In computer vision: simple techniques such as rotation, cropping, and additive noise. • In speech recognition: additive background noise and spectral transforms. • Can we do better with more sophisticated approaches?
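The simple vision augmentations listed above can be sketched in a few lines of NumPy. This is a minimal illustration with assumed, illustrative parameters (crop size, noise level), not the pipeline used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=24, noise_std=0.05):
    """Apply three simple augmentations: random 90-degree rotation,
    random crop, and additive Gaussian noise."""
    # Random rotation by a multiple of 90 degrees.
    img = np.rot90(img, k=rng.integers(0, 4))
    # Random crop of size (crop, crop).
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    img = img[top:top + crop, left:left + crop]
    # Additive Gaussian noise, clipped back to [0, 1].
    return np.clip(img + rng.normal(0.0, noise_std, img.shape), 0.0, 1.0)

batch = rng.random((8, 32, 32, 3))            # toy batch of 32x32 RGB images
augmented = np.stack([augment(x) for x in batch])
print(augmented.shape)                         # (8, 24, 24, 3)
```

Each pass through the data yields a different random variant of every image, which is what makes augmentation act as an effective enlargement of the training set.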

  3. PREDICTIVE VS GENERATIVE MODELS • Predictive models p(y | x): impressive gains with deep learning. • Generative models p(x | y): far more challenging. • Information is lost (the dimensionality of y is much smaller than that of x). • Need to model latent variations.
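Bayes' rule is what connects the two views: p(y | x) ∝ p(x | y) p(y), so a generative model induces a predictive one. A toy sketch with an assumed 1-D Gaussian class-conditional model (purely illustrative, not the models from the talk):

```python
import numpy as np

# Toy 1-D generative model: each class y renders x ~ N(mu_y, 1).
mus = np.array([-2.0, 2.0])          # class means (assumed, illustrative)
prior = np.array([0.5, 0.5])         # p(y)

def log_p_x_given_y(x, mu):
    return -0.5 * (x - mu) ** 2      # log N(x; mu, 1) up to a constant

def p_y_given_x(x):
    """Bayes' rule: p(y | x) proportional to p(x | y) p(y)."""
    logp = log_p_x_given_y(x, mus) + np.log(prior)
    w = np.exp(logp - logp.max())    # softmax, numerically stable
    return w / w.sum()

posterior = p_y_given_x(1.5)
print(posterior)                     # mass concentrates on class 1
```

The hard part in practice is exactly what the slide notes: for high-dimensional x, p(x | y) must model all the latent variation that the predictive direction can simply discard.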

  4. DATA AUGMENTATION 1: MIXED REALITY GAN • GAN merits: captures the statistics of natural images; learnable. • GAN perils: quality of generated images is not high; introduces artifacts. • Synthetic data merits: high-quality rendering; full annotation for free; can generate infinite data. • Synthetic data perils: domain mismatch; rendering targets visual appeal, not classification. Our GAN-based framework, Mr.GAN, narrows the gap between synthetic and real data.

  5. MIXED-REALITY GENERATIVE ADVERSARIAL NETWORKS (MR-GAN) Tan Nguyen, Hao Chen, Zachary Lipton, Leo Dirac, Stefano Soatto, A.

  6. MIXED-REALITY GENERATIVE ADVERSARIAL NETWORKS (MR-GAN) • Two domains, X and Y. • CycleGAN transforms from domain X to Y and vice versa. • Enforces cycle consistency: F(G(X)) ≈ X. • MR-GAN: a progressive CycleGAN.
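The cycle-consistency term amounts to an L1 penalty ||F(G(x)) − x||₁. A toy NumPy sketch with assumed linear stand-ins for the two generators (in the actual model G and F are deep networks trained adversarially):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two CycleGAN generators (assumed linear maps;
# in the real model G and F are deep networks).
A = rng.normal(size=(4, 4))
G = lambda x: x @ A                    # G: X -> Y
F = lambda y: y @ np.linalg.inv(A)     # F: Y -> X (exact inverse in this toy)

def cycle_consistency_loss(x):
    """L1 cycle-consistency: || F(G(x)) - x ||_1, averaged over the batch."""
    return np.abs(F(G(x)) - x).mean()

x = rng.normal(size=(8, 4))
print(cycle_consistency_loss(x))       # near 0: F inverts G exactly here
```

In training, this loss is added to the adversarial losses of both generators so that translated images remain faithful to their source content.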

  7. MIXED-REALITY GENERATIVE ADVERSARIAL NETWORKS (MR-GAN)

  8. CLASSIFICATION RESULTS ON CIFAR-DA4 • 1 million synthetic images from 3D models; 0.25% real images from CIFAR-100; 4 classes. • Improvement over training on real data alone: • Real + Synthetic: 5.43% (Stage 0) • Real + CycleGAN: 5.09% (Stage 1) • Real + Mr.GAN: 8.85% (Stage 2)

  9. THE REAL, THE SYNTHETIC AND THE REFINED (image panels: Real, Synthetic, Refined Synthetic, Refined Real) Mr.GAN pushes both real and synthetic images closer to one another.

  10. (R-S): SYNTHETIC -> REFINED SYNTHETIC (image panels: Synthetic, Refined Synthetic)

  11. (R-S): REAL -> REFINED REAL (image panels: Real, Refined Real)

  12. PREDICTIVE VS GENERATIVE MODELS • Predictive p(y | x) vs. generative p(x | y): one model to do both? • SOTA prediction comes from CNN models. • What class of p(x | y) yields CNN models for p(y | x)?

  13. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM) Nhat Ho, Tan Nguyen, Ankit Patel, A., Michael Jordan, Richard Baraniuk

  14. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM) Design joint priors for latent variables based on reverse-engineering CNN predictive architectures. (Figure: object category → latent variables → intermediate variables → rendering → image.)

  15. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)

  16. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)

  17. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)

  18. STATISTICAL GUARANTEES FOR THE LD-DRM • The training loss of the CNN is equivalent to the likelihood of the LD-DRM. • Generalization for prediction depends on the generative model. • Generalization is better when the number of active rendering paths is minimized. • Rendering path normalization: a new form of regularization that improves performance significantly.

  19. SEMI-SUPERVISED LEARNING RESULTS (Tables: error rate (%) on CIFAR-10 and CIFAR-100.) LD-DRM achieves results comparable to state-of-the-art SSL methods.

  20. DATA AUGMENTATION 3: SYMBOLIC EXPRESSIONS • Goal: learn a domain of functions (sin, cos, log, add, …). • Training on numerical input-output pairs alone does not generalize. • Solution: data augmentation with symbolic expressions, which efficiently encode relationships between functions. • Design networks to use both symbolic and numerical data.

  21. COMMON STRUCTURE: TREES • Symbolic expression trees and function evaluation trees. • Decimal trees: encode numbers via their decimal representation (numerical). • Trees can encode any expression, function evaluation, and number.
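The shared tree representation can be sketched with a tiny `Node` class. This is an assumed, illustrative encoding (class and label names are not from the paper), showing how a symbolic identity and a decimal number fit the same format:

```python
# Minimal sketch of a shared tree format for expressions and numbers.
class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def __repr__(self):
        if not self.children:
            return str(self.op)
        return f"{self.op}({', '.join(map(repr, self.children))})"

# Symbolic expression tree for sin(x)^2 + cos(x)^2.
lhs = Node("add",
           Node("pow", Node("sin", Node("x")), Node("2")),
           Node("pow", Node("cos", Node("x")), Node("2")))

# Decimal tree: encode 1.0 digit by digit so numbers share the tree format.
rhs = Node("decimal", Node("1"), Node("."), Node("0"))

equation = Node("equal", lhs, rhs)
print(equation)
```

Because expressions, evaluations, and numbers all live in one tree format, a single tree-structured network can be trained on symbolic and numerical data jointly.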

  22. STRUCTURE : TREE LSTM
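A Tree LSTM processes such trees bottom-up, combining each node's input with its children's states. Below is a minimal child-sum Tree-LSTM cell in NumPy (random untrained weights; forget-gate weights shared across children for brevity; the talk's exact architecture may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Random weights for the input (i), output (o), forget (f), and update (u) gates.
W = {g: rng.normal(scale=0.1, size=(d, d)) for g in "iofu"}
U = {g: rng.normal(scale=0.1, size=(d, d)) for g in "iofu"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tree_lstm_node(x, children):
    """Combine a node input x with the (h, c) states of its children."""
    h_sum = sum((h for h, _ in children), np.zeros(d))
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum)
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum)
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum)
    # One forget gate per child (weights shared across children here).
    c = i * u + sum((sigmoid(W["f"] @ x + U["f"] @ h) * ch_c
                     for h, ch_c in children), np.zeros(d))
    return o * np.tanh(c), c

# Evaluate a two-leaf tree bottom-up: leaves first, then the root.
leaf_a = tree_lstm_node(rng.normal(size=d), [])
leaf_b = tree_lstm_node(rng.normal(size=d), [])
root_h, root_c = tree_lstm_node(rng.normal(size=d), [leaf_a, leaf_b])
print(root_h.shape)  # (8,)
```

The root's hidden state summarizes the whole expression and can feed a classifier (e.g., for equation verification) or a regression head (for numerical evaluation).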

  23. RESULTS: EQUATION COMPLETION & FUNCTION EVAL

  24. RESULTS: EQUATION VERIFICATION Generalization to unseen depth

  25. RESULTS SUMMARIZED • Vastly improved numerical evaluation: 90% over a function-fitting baseline. • Generalization to verifying symbolic equations of greater depth: • LSTM (symbolic): 76.40% • TreeLSTM (symbolic): 93.27% • TreeLSTM (symbolic + numeric): 96.17% • Combining symbolic and numerical data improves generalization on both tasks, symbolic and numerical evaluation.

  26. CONCLUSION Data scarcity needs to be addressed in a number of ways: • Collection: active learning and partial feedback • Aggregation: crowdsourcing models • Augmentation: graphics rendering + GANs; semi-supervised learning; symbolic expressions

  27. Thank you
