

  1. CS839 Special Topics in AI: Deep Learning. Learning with Less Supervision. Sharon Yixuan Li, University of Wisconsin-Madison, October 29, 2020

  2. Overview • Weakly Supervised Learning • Flickr100M • JFT300M (Google) • Instagram3B (Facebook) • Data augmentation • Human heuristics • Automated data augmentation • Self-supervised Learning • Pretext tasks (rotation, patches, colorization etc.) • Invariant vs. Covariant learning • Contrastive learning based framework (current SoTA)

  3. Part I: Weakly Supervised Learning

  4. Model Complexity Keeps Increasing. [Figure: LeNet (LeCun et al. 1998) vs. ResNet (He et al. 2016); modern networks have >100 million parameters.]

  5. [Sun et al. 2017]

  6. Challenge: Limited labeled data. ImageNet (~1M images) took on the order of a thousand annotation hours; scaling up 1000x to 1B images would take on the order of a million annotation hours. [Deng et al. 2009]

  7. Training at Scale: levels of supervision. [Figure: fully supervised data (ImageNet, clean labels such as CAT, DOG), weakly supervised data (Instagram/Flickr hashtags such as #CAT), and unsupervised data (crawled web images).]

  8. Training at Scale: hashtag supervision is noisy. [Figure: examples of noisy data with non-visual labels (e.g. #LOVE), incorrect labels, and missing labels (e.g. an image tagged #CAT #DOG #HUSKY).]

  9. Flickr 100M [Joulin et al. 2015]

  10. JFT 300M [Sun et al. 2017]

  11. Can we use billions of images with hashtags for pre-training? [Mahajan et al. 2018]

  12. Hashtag Selection: 1.5K hashtags (synonyms of ImageNet labels), ~1B images; 17K hashtags (synonyms of nouns in WordNet), ~3B images. [Mahajan et al. 2018]

  13. Network Architecture and Capacity: ResNeXt-101 32xCd. [Figure: number of parameters and FLOPs grow with the group width C (4, 8, 16, 32, 48).] [Xie et al. 2016]

  14. Largest Weakly Supervised Training: 3.5B public Instagram images, 17K unique labels, a large-capacity model (ResNeXt-101 32x48), distributed training on 350 GPUs, reaching 85.1% top-1 accuracy. [Mahajan et al. 2018]
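
The recipe above treats hashtags as (noisy) training labels. A minimal sketch of what that looks like in training code, assuming a PyTorch/torchvision setup; the vocabulary size, architecture, and the multi-label binary cross-entropy objective are illustrative assumptions, not the exact configuration used by Mahajan et al.:

```python
import torch
import torch.nn as nn
import torchvision

NUM_HASHTAGS = 17000  # hypothetical hashtag vocabulary size

# Image classifier with a hashtag-prediction head.
model = torchvision.models.resnext101_32x8d(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_HASHTAGS)

# Each image may carry several hashtags, so treat it as a
# multi-label problem with a per-hashtag sigmoid (one common choice).
criterion = nn.BCEWithLogitsLoss()

def training_step(images, hashtag_targets, optimizer):
    """images: (B, 3, 224, 224); hashtag_targets: (B, NUM_HASHTAGS) in {0, 1}."""
    logits = model(images)
    loss = criterion(logits, hashtag_targets.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```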

  15. Results

  16. Transfer Learning Performance. Target task: ImageNet. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

  17. Transfer Learning Performance. Target task: ImageNet. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

  18. Transfer Learning Performance. Target task: ImageNet. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

  19. Transfer Learning Performance. Target tasks: ImageNet, CUB-2011, and Places-365. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.
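
A minimal sketch of the transfer step itself: take a pretrained backbone, swap in a new classification head for the target task, and either linear-probe or fine-tune. The class count is a placeholder, and an ImageNet-pretrained ResNet-50 stands in here for the hashtag-pretrained model:

```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

NUM_TARGET_CLASSES = 200  # e.g. a fine-grained target task such as CUB

# Pretrained backbone (stand-in for a weakly supervised model).
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Option 1: linear probe -- freeze the backbone, train only a new head.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_TARGET_CLASSES)

# Option 2: full fine-tuning -- leave requires_grad=True everywhere and
# train the whole network on the target dataset with a smaller learning rate.
```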

  20. Models are surprisingly robust to label "noise". Dataset: IG-1B-17k. Network: ResNeXt-101 32x16.

  21. Effect of Model Capacity: matching hashtags to the target task helps (1.5K tags). Target task: ImageNet-1K.

  22. BiT Transfer [Kolesnikov et al. 2020]

  23. Part II: Data Augmentation

  24. Data Augmentation. [Figure: augmented variants of a "Quokka" image. Figure credit: https://github.com/aleju/imgaug]

  25. Data Augmentation. [Figure: standard training pipeline; an image and its label ("cat") are loaded and fed to a CNN.]

  26. Data Augmentation. [Figure: the same pipeline with a transformation function (TF) applied to the data before it reaches the CNN.]

  27. Data Augmentation. Transformation functions (TFs) change the pixels without changing the labels. Training on transformed data improves generalization, and the technique is very widely used.

  28. Examples of Transformation Functions (TFs): color jitter, horizontal flip, and random crop applied to an original image.
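
These TFs map directly onto standard library calls. A minimal sketch using torchvision (the dataset directory is a placeholder):

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# Heuristic augmentation pipeline: random crop, horizontal flip, color jitter.
train_tfs = T.Compose([
    T.RandomResizedCrop(224),          # random crop
    T.RandomHorizontalFlip(p=0.5),     # horizontal flip
    T.ColorJitter(0.4, 0.4, 0.4),      # color jitter
    T.ToTensor(),
])

# Only the pixels are transformed; the labels stay the same.
train_set = ImageFolder("train_dataset_dir", transform=train_tfs)
```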

  29. Heuristic Data Augmentation. [Figure: a human expert hand-designs TF sequences (TF 1 ... TF L, e.g. rotation, flip) that turn data into augmented data.]

  30. Heuristic Data Augmentation. How can we automatically learn the compositions and parameterizations of TFs? [Figure: the same human-designed TF-sequence pipeline as before.]

  31. TANDA: Transformation Adversarial Networks for Data Augmentations. [Figure: a generator (LSTM) produces TF sequences (TF 1 ... TF L, e.g. rotation, flip) that map data to augmented data.] [Ratner et al. 2017]

  32. TANDA: Transformation Adversarial Networks for Data Augmentations. [Figure: the generator (LSTM) produces TF sequences; a discriminator tries to tell whether a sample is real or augmented.] [Ratner et al. 2017]

  33. TANDA: Transformation Adversarial Networks for Data Augmentations. [Figure: TANDA vs. heuristic augmentation on CIFAR-10, ACE (F1 score), and medical imaging, with gains of +2.1%, +1.4, and +3.4%; generated MNIST samples shown.] [Ratner et al. 2017]

  34. AutoAugment [Cubuk et al. 2018]

  35. AutoAugment. [Figure: a controller (RNN) produces TF sequences; a discriminator judges whether data is real or augmented.] [Cubuk et al. 2018]

  36. AutoAugment. [Figure: a controller (RNN) produces TF sequences; the augmented data trains an end model, whose validation accuracy R is fed back as the reward.] State-of-the-art performance on various benchmarks; however, the computational cost is very high. [Cubuk et al. 2018]
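
At training time, an AutoAugment-style policy is just a list of sub-policies, each a short sequence of (operation, probability, magnitude) steps sampled per image. A minimal sketch with made-up sub-policies (not the published ones):

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Hypothetical learned policy: each sub-policy is two (op, prob, magnitude) steps.
POLICY = [
    [("rotate", 0.7, 15), ("color", 0.3, 1.5)],
    [("solarize", 0.5, 128), ("autocontrast", 0.9, None)],
]

OPS = {
    "rotate": lambda img, m: img.rotate(m),
    "color": lambda img, m: ImageEnhance.Color(img).enhance(m),
    "solarize": lambda img, m: ImageOps.solarize(img, m),
    "autocontrast": lambda img, m: ImageOps.autocontrast(img),
}

def apply_policy(img: Image.Image) -> Image.Image:
    # Pick one sub-policy per image and apply each op with its probability.
    sub_policy = random.choice(POLICY)
    for op_name, prob, magnitude in sub_policy:
        if random.random() < prob:
            img = OPS[op_name](img, magnitude)
    return img
```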

  37. RandAugment. [Figure: the AutoAugment setup, a controller (RNN) producing TF sequences whose augmented data trains an end model with validation accuracy R as the reward.] [Cubuk et al. 2019]

  38. RandAugment: (1) random sampling over the transformation functions, and (2) grid search over the parameters of each transformation. Outperforms AutoAugment. [Figure: randomly sampled TFs (TF 1 ... TF L) map data to augmented data.] [Cubuk et al. 2019]
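
A minimal sketch of the RandAugment idea: drop the learned controller, sample N transformations uniformly at random, and apply them all at one shared magnitude M, so only (N, M) remain to grid-search. The operation pool here is abbreviated and illustrative; torchvision also ships a ready-made RandAugment transform:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Abbreviated pool of transformation functions; magnitude m in [0, 10] is rescaled per op.
TF_POOL = [
    lambda img, m: img.rotate(30 * m / 10),
    lambda img, m: ImageEnhance.Contrast(img).enhance(1 + 0.9 * m / 10),
    lambda img, m: ImageEnhance.Brightness(img).enhance(1 + 0.9 * m / 10),
    lambda img, m: ImageOps.posterize(img, max(1, 8 - int(m * 4 / 10))),
]

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    """Apply n uniformly sampled TFs, all at the shared magnitude m."""
    for tf in random.choices(TF_POOL, k=n):
        img = tf(img, m)
    return img
```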

  39. Adversarial AutoAugment. [Figure: an adversarial controller (RNN) produces TF sequences and is rewarded for maximizing the end model's training loss, while the end model is trained to minimize it.] 12x reduction in computing cost on ImageNet compared to AutoAugment; 1.36% top-1 error on CIFAR-10 (new SoTA). [Zhang et al. 2019]

  40. Uncertainty-based sampling augmentation. Users provide transformation functions (TFs, e.g. rotate, invert, cutout, mixup); from K randomly sampled compositions of TFs, the model selects the ones that provide the most information during training. No policy learning required. [Wu et al. 2020]
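
Two of the user-provided TFs named on this slide are easy to sketch: mixup blends pairs of images and their label vectors, and cutout masks out a random square. A minimal NumPy sketch; the beta parameter and patch size are illustrative choices:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha: float = 0.2):
    """Blend two examples; y1, y2 are one-hot label vectors."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutout(img, size: int = 16):
    """Zero out a random size x size square; img has shape (H, W, C)."""
    h, w = img.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)
    top, bottom = max(0, cy - size // 2), min(h, cy + size // 2)
    left, right = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()
    out[top:bottom, left:right] = 0
    return out
```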

  41. Empirical results: state-of-the-art quality. Improves existing methods across domains; SoTA on CIFAR-10, CIFAR-100, and SVHN; 84.54% on CIFAR-100 using Wide-ResNet-28-10, outperforming RandAugment (Cubuk et al. '19) by 1.24%; +0.28 pts. accuracy on a text classification problem. [Figure: results on CIFAR-10, CIFAR-100, and SVHN.]

  42. Check out the blog post series "Automating the Art of Data Augmentation": Part I: Overview; Part II: Practical Methods; Part III: Theory; Part IV: New Direction.

  43. Part III: Self-supervised Learning

  44. Source: Yann LeCun’s talk

  45. What if we could get labels for free for unlabeled data, and train on an unsupervised dataset in a supervised manner?

  46. Pretext Tasks

  47. Rotation [Gidaris et al. 2018]

  48. Rotation [Gidaris et al. 2018]

  49. Rotation [Gidaris et al. 2018]
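
A minimal sketch of the rotation pretext task of Gidaris et al.: rotate each unlabeled image by 0, 90, 180, or 270 degrees and train a 4-way classifier to predict the rotation, so the labels come for free (the ResNet-18 backbone is an illustrative choice):

```python
import torch
import torch.nn as nn
import torchvision

# 4-way classifier over rotations {0, 90, 180, 270} degrees.
backbone = torchvision.models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 4)
criterion = nn.CrossEntropyLoss()

def rotation_batch(images: torch.Tensor):
    """images: (B, C, H, W). Returns rotated copies and their free labels."""
    rotated, labels = [], []
    for k in range(4):                       # k quarter-turns
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

def pretext_loss(images: torch.Tensor) -> torch.Tensor:
    x, y = rotation_batch(images)
    return criterion(backbone(x), y)
```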

  50. Patches [Doersch et al., 2015]

  51. Colorization [Zhang et al. 2016] http://richzhang.github.io/colorization/

  52. Pretext Invariant Representation Learning (PIRL) [Misra et al. 2019]

  53. Pretext Invariant Representation Learning (PIRL). [Figure: an image and its transformed version form a positive pair; other images form negative pairs.] [Misra et al. 2019]

  54. SimCLR [Chen et al. 2020]

  55. SimCLR [Chen et al. 2020]

  56. SimCLR [Chen et al. 2020]
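
At the core of SimCLR is the NT-Xent contrastive loss: two augmented views of the same image are pulled together, while all other images in the batch act as negatives. A minimal sketch, assuming z1 and z2 are the projection-head outputs for the two views of one batch:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """z1, z2: (B, D) projections of two augmented views of the same batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # For row i, the positive is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```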

  57. Data Augmentation is the key [Chen et al. 2020]

  58. Unsupervised learning benefits more from bigger models [Chen et al. 2020]

  59. Summary • Weakly Supervised Learning • Flickr100M • JFT300M (Google) • Instagram3B (Facebook) • Data augmentation • Human heuristics • Automated data augmentation • Unsupervised Learning • Pretext tasks (rotation, patches, colorization etc.) • Invariant vs. Covariant learning • Contrastive learning based framework (current SoTA)

  60. Questions?
