


  1. TASK2VEC: Task Embedding for Model Recommendation (https://arxiv.org/abs/1902.03545). Subhransu Maji, College of Information and Computer Sciences, University of Massachusetts, Amherst (http://people.cs.umass.edu/smaji). February 19, 2019 @ ICERM, Brown University

  2. Task Embedding for Model Recommendation. Alessandro, Michael, Rahul, Avinash, Subhransu, Charless, Stefano, Pietro.
  Task = {dataset, labels, loss}. Questions we would like to answer: What are similar tasks? What architecture should I use? What pre-training dataset? What hyperparameters? Do I need more training data? How difficult is this task? If we had a universal vectorial representation of tasks, we could frame all sorts of interesting CV engineering problems as meta-learning problems.

  3. Model recommendation.
  Brute force. Input: Task = (dataset, loss). For each feature extractor architecture F in the zoo: 1. train a classifier on F(dataset); 2. compute validation performance. Output: the best-performing model.
  Task recommendation. Input: Task = (dataset, loss). 1. Compute the task embedding t = E(Task). 2. Predict the best extractor F = M(t) with a trained recommender. 3. Train a classifier on F(dataset). 4. Compute validation performance. Output: the best-performing model, after training only one classifier instead of one per extractor.
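The two procedures on slide 3 can be sketched in a few lines. This is a hedged stand-in, not the paper's implementation: `zoo`, `train_and_eval`, `embed`, and `recommend` are hypothetical placeholders for the extractor zoo, the classifier-training loop, the TASK2VEC embedding E, and the recommender M.

```python
def brute_force_select(zoo, task, train_and_eval):
    """Train a classifier on top of every extractor; keep the lowest error."""
    errors = {name: train_and_eval(F, task) for name, F in zoo.items()}
    return min(errors, key=errors.get)

def task2vec_select(zoo, task, embed, recommend, train_and_eval):
    """Embed the task once, ask the recommender, train only the chosen model."""
    t = embed(task)                               # 1. t = E(Task)
    name = recommend(t)                           # 2. F = M(t)
    return name, train_and_eval(zoo[name], task)  # 3.-4. train and validate
```

The point of the second routine is cost: the embedding and the recommender prediction replace a full training run per extractor in the zoo.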

  4. Task embedding using Fisher Information.
  1. Given a task, train a classifier with the task loss on features from a generic "probe network".
  2. Compute gradients of the probe network parameters w.r.t. the task loss.
  3. Use statistics of the probe network parameter gradients as a fixed-dimensional task embedding.
  Intuition: the Fisher Information Matrix F captures the sensitivity of task performance to small perturbations of the parameters of the probe network:
  E_{x ∼ p̂} KL( p_{θ₀}(y|x) ‖ p_θ(y|x) ) = δθ · F · δθ + o(δθ²)

  5. Properties of TASK2VEC embedding.
  Dataset: (xᵢ, yᵢ), i = 1…n, yᵢ ∈ {0, 1}
  Two-layer network: x → φ(x)
  Classifier: pᵢ = σ(wᵀφ(xᵢ))
  FIM for the cross-entropy loss for the last layer:
  F_w = (1/N) Σᵢ pᵢ(1 − pᵢ) φ(xᵢ)φ(xᵢ)ᵀ
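The closed-form FIM above is easy to compute directly. A minimal numpy sketch, where the feature matrix and weights are random stand-ins (the slide's φ would come from a probe network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fim_last_layer(Phi, w):
    """F_w = (1/N) sum_i p_i (1 - p_i) phi_i phi_i^T for a logistic head.

    Phi: (N, d) matrix whose rows are phi(x_i); w: (d,) last-layer weights.
    """
    p = sigmoid(Phi @ w)              # p_i = sigma(w^T phi(x_i))
    weights = p * (1.0 - p)           # per-example curvature weight
    return (Phi * weights[:, None]).T @ Phi / len(Phi)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 5))       # stand-in probe features
w = rng.normal(size=5)
F = fim_last_layer(Phi, w)
```

Since every weight pᵢ(1 − pᵢ) is positive, F is a symmetric positive semi-definite d×d matrix, which is what makes it usable as a (flattened) task embedding.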

  6. Properties of TASK2VEC embedding.
  Dataset: (xᵢ, yᵢ), i = 1…n, yᵢ ∈ {0, 1}; classifier: pᵢ = σ(wᵀφ(xᵢ))
  FIM for the cross-entropy loss for the last layer: F_w = (1/N) Σᵢ pᵢ(1 − pᵢ) φ(xᵢ)φ(xᵢ)ᵀ
  1. Invariance to label space
  2. Encodes task difficulty
  3. Encodes task domain
  4. Encodes useful features for the task
  Representative domain embedding: D = (1/N) Σᵢ φ(xᵢ)φ(xᵢ)ᵀ
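The contrast between the two second moments on this slide can be demonstrated on toy data: D ignores the labels entirely, while the pᵢ(1 − pᵢ) weights in F shrink toward zero once the classifier is confident, so F reflects task difficulty. A sketch under the assumption that φ is just the raw input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def task_and_domain(Phi, w):
    """Return (F, D): label-aware FIM and label-free domain embedding."""
    p = sigmoid(Phi @ w)
    F = (Phi * (p * (1 - p))[:, None]).T @ Phi / len(Phi)  # task embedding
    D = Phi.T @ Phi / len(Phi)                             # domain embedding
    return F, D

rng = np.random.default_rng(1)
Phi = rng.normal(size=(500, 3))
# Same domain, two regimes: a maximally uncertain classifier (w = 0, so
# p = 0.5 everywhere) vs. a saturated, confident one.
F_hard, D_hard = task_and_domain(Phi, np.zeros(3))
F_easy, D_easy = task_and_domain(Phi, 10.0 * np.ones(3))
```

D is identical in both regimes, while the FIM of the uncertain ("hard") task has a much larger norm than that of the confident ("easy") one, previewing the difficulty result on slide 21.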

  7. Properties of TASK2VEC embedding. 1. Binary tasks on the unit square, i.e., each tile is a task. 2. 10 random ReLU features, i.e., φᵢ = max(0, aᵢx + bᵢy + cᵢ). 3. t-SNE to map the 10×10 FIM to 2D.

  8. Properties of TASK2VEC embedding. 1. Binary tasks on the unit square, i.e., each tile is a task. 2. 10 random ReLU features, i.e., φᵢ = max(0, aᵢx + bᵢy + cᵢ). 3. t-SNE to map the 10×10 FIM to 2D. (Shown: polynomial degree 3.)
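The toy setup of slides 7-8 can be reproduced end to end with numpy. This is a hedged sketch: the labelings, the quick gradient-descent fit, and the choice to compare raw embeddings (instead of running t-SNE) are all my stand-ins for the slide's visualization.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))                  # rows hold (a_i, b_i, c_i)

def relu_features(X):
    """X: (N, 2) points in the unit square -> (N, 10) random ReLU features."""
    H = np.hstack([X, np.ones((len(X), 1))])  # append 1 so c_i acts as bias
    return np.maximum(0.0, H @ A.T)

def fit_logistic(Phi, y, steps=200, lr=0.5):
    """Crude gradient descent on the cross-entropy loss of a logistic head."""
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Phi @ w)))
        w -= lr * Phi.T @ (p - y) / len(y)
    return w

def task_embedding(X, y):
    """Flattened 10x10 last-layer FIM of a logistic fit: one vector per task."""
    Phi = relu_features(X)
    w = fit_logistic(Phi, y)
    p = 1.0 / (1.0 + np.exp(-(Phi @ w)))
    F = (Phi * (p * (1 - p))[:, None]).T @ Phi / len(Phi)
    return F.ravel()

X = rng.uniform(size=(400, 2))                # shared domain: the unit square
emb_a = task_embedding(X, (X[:, 0] > 0.5).astype(float))  # split on x
emb_b = task_embedding(X, (X[:, 1] > 0.5).astype(float))  # split on y
```

Different labelings of the same points yield different embedding vectors; on the slide, mapping these 100-D vectors to 2D with t-SNE is what arranges similar tasks near each other.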

  9. Robust Fisher computation.
  1. For realistic CV tasks we want to use deep CNNs (e.g., ResNet) as the probe network and estimate the FIM for all of the parameters.
  2. Challenge: the FIM can be hard to estimate (noisy loss landscape, high dimensions, small training sets).
  3. Robust FIM estimation: 1. restrict the FIM to its diagonal; 2. restrict it further to a single value per filter (CNN layer); 3. robust estimation via perturbation, i.e., estimate the covariance Λ of a Gaussian perturbation of the parameters (the condition the optimal Λ satisfies distinguishes informative parameters from a "trivial embedding").
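The first two compression steps on slide 9 are simple to state in code. A minimal sketch, assuming per-example parameter gradients are already available (here random stand-ins; in practice they would come from backpropagation through the probe network):

```python
import numpy as np

def diagonal_fim(grads):
    """Diagonal FIM estimate: the mean squared gradient of each parameter.

    grads: (num_examples, num_params) per-example gradient matrix.
    """
    return np.mean(grads ** 2, axis=0)

def per_filter_fim(diag, filter_sizes):
    """Collapse the diagonal to one value per filter by averaging its entries."""
    out, start = [], 0
    for size in filter_sizes:
        out.append(diag[start:start + size].mean())
        start += size
    return np.array(out)

rng = np.random.default_rng(2)
grads = rng.normal(size=(64, 12))            # 64 examples, 12 parameters
emb = per_filter_fim(diagonal_fim(grads), [4, 4, 4])  # three 4-param filters
```

The result is one nonnegative number per filter, so the embedding dimension is the number of filters in the probe network rather than the number of parameters squared.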

  10. Similarity measures on the space of tasks. Task = {dataset, labels, loss}. (Illustration: Task A vs. Task B.)

  11. Similarity measures on the space of tasks: domain similarity. ("Unbiased look at dataset bias", Torralba and Efros, CVPR 2011)

  12. Similarity measures on the space of tasks: domain similarity; range / label similarity (e.g., taxonomic distance). https://www.pinterest.com/pin/520799144386337065/

  13. Similarity measures on the space of tasks: domain similarity; range / label similarity (e.g., taxonomic distance); transfer "distance", i.e., fine-tune on task a followed by task b. ("Taskonomy: Disentangling Task Transfer Learning", Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese, CVPR 2018)

  14. Distance measures on the TASK2VEC embedding: symmetric distance; asymmetric "distance".
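A sketch of the two distances from slide 14, following the paper's use of a cosine distance after elementwise normalization; the asymmetric "distance" discounts how far the source task is from a trivial embedding t₀. The `alpha` value and the toy vectors are assumptions for illustration.

```python
import numpy as np

def cosine_dist(u, v):
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def d_sym(fa, fb):
    """Symmetric distance: cosine after elementwise normalization by fa + fb."""
    z = fa + fb
    return cosine_dist(fa / z, fb / z)

def d_asym(fa, fb, f0, alpha=0.15):
    """Asymmetric 'distance' from task a to task b; f0 is a trivial embedding."""
    return d_sym(fa, fb) - alpha * d_sym(fa, f0)

fa = np.array([1.0, 2.0, 3.0])   # toy task embeddings (diagonal FIM values)
fb = np.array([1.1, 2.1, 2.9])
f0 = np.array([0.1, 0.1, 0.1])   # stand-in trivial embedding
```

The normalization makes d_sym(f, f) exactly zero (each normalized vector is 0.5 everywhere), and the alpha term makes transfer "cheaper" out of tasks that are far from trivial.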

  15. MODEL2VEC: joint embedding of tasks and models.
  1. So far we have been associating models (feature extractors) with the tasks they are trained on.
  2. What about: 1. legacy / black-box feature extractors, e.g., SIFT, HOG, Fisher vectors? 2. models of different complexity trained on the same dataset?
  3. MODEL2VEC: jointly embed feature extractors (encoded as one-hot vectors) and tasks such that similarity reflects a meta-task objective. This needs training data.

  16. Task Zoo • Tasks [1460] • iNaturalist [207] • CUB 200 [25] • iMaterialist [228] • DeepFashion [1000]



  19. Task Zoo • Tasks [1460] • iNaturalist [207] • CUB 200 [25] • iMaterialist [228] • DeepFashion [1000] • A few tasks have more than 10K training samples, but most have 100-1000 samples

  20. Experiment: TASK2VEC recapitulates the iNaturalist taxonomy. Task embedding cosine similarity; ResNet trained on ImageNet as the probe network. (Figure: similarity matrix grouped by taxon: plants, reptiles, birds, mammals, insects.)

  21. Experiment: TASK2VEC norm encodes task difficulty. ResNet trained on ImageNet as the probe network.

  22. Experiment: TASK2VEC vs. DOMAIN2VEC. (Figure: task embeddings vs. domain embeddings, colored by iNaturalist taxa (n): Actinopterygii, Amphibia, Arachnida, Aves, Fungi, Insecta, Mammalia, Mollusca, Plantae, Protozoa, Reptilia; and iMaterialist attributes (m): Category, Color, Gender, Material, Neckline, Pants, Pattern, Shoes.)

  23. Task and Feature Zoo • Tasks [1460] • iNaturalist [207] • CUB 200 [25] • iMaterialist [228] • DeepFashion [1000] • Feature Zoo [156 experts]: ResNet-34 pretrained on ImageNet, followed by fine-tuning on the tasks with enough examples

  24. The Matrix: tasks × feature extractors.

  25. The Matrix: tasks × experts (iNaturalist + CUB).

  26. The ImageNet expert is usually good, but on many tasks the best expert handily outperforms the ImageNet expert.

  27. Data efficiency of TASK2VEC. (Figure legend: ImageNet fixed, Task2Vec fixed, brute force fixed, ImageNet fine-tune, Task2Vec fine-tune.)

  28. Choice of distance for TASK2VEC: relative error increase over the oracle (best choice).

  29. Choice of the probe network for TASK2VEC: relative error increase over the oracle (best choice).

  30. Thank you! "Task2Vec: Task Embedding for Meta-Learning", Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona (https://arxiv.org/abs/1902.03545)
