TASK2VEC: Task Embedding for Model Recommendation - PowerPoint PPT Presentation



SLIDE 1

TASK2VEC: Task Embedding for Model Recommendation

Subhransu Maji College of Information and Computer Sciences University of Massachusetts, Amherst http://people.cs.umass.edu/smaji February 19, 2019 @ ICERM, Brown University https://arxiv.org/abs/1902.03545

SLIDE 2

Task Embedding for Model Recommendation

Alessandro, Michael, Rahul, Avinash, Subhransu, Charless, Stefano, Pietro


Task = {dataset, labels, loss}

What are similar tasks? What architecture should I use? What pre-training dataset? What hyperparameters? Do I need more training data? How difficult is this task? …

If we have a universal vectorial representation of tasks, we can frame all sorts of interesting CV engineering problems as meta-learning problems.

SLIDE 3

Model recommendation


[Figure: feature extractor zoo feeding a recommender]

Brute force:

Input: Task = (dataset, loss)
For each feature extractor architecture F:

  • 1. Train a classifier on F(dataset)
  • 2. Compute validation performance

Output: best performing model

TASK2VEC recommendation:

Input: Task = (dataset, loss)

  • 1. Compute task embedding t = E(Task)
  • 2. Predict best extractor F = M(t)
  • 3. Train a classifier on F(dataset)
  • 4. Compute validation performance

Output: best performing model
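The contrast between the two strategies above can be sketched in code. All names here (`zoo`, `train_eval`, `embed_task`, `meta_model`) are illustrative placeholders, not the paper's API:

```python
def brute_force_recommend(task, zoo, train_eval):
    """Train and validate a classifier on top of every extractor in the zoo."""
    return max(zoo, key=lambda f: train_eval(f, task))

def task2vec_recommend(task, embed_task, meta_model):
    """One embedding, one prediction: t = E(Task), then F = M(t)."""
    t = embed_task(task)   # task embedding
    return meta_model(t)   # predicted best extractor

# Toy usage with stand-in callables:
zoo = ["resnet_birds", "resnet_generic"]
scores = {"resnet_birds": 0.9, "resnet_generic": 0.7}
best = brute_force_recommend("bird-task", zoo, lambda f, t: scores[f])
```

The point of the embedding route is cost: brute force needs one training run per extractor for every new task, while the embedding amortizes that to a single embedding step plus one prediction.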

SLIDE 4

Task embedding using Fisher Information

  • 1. Given a task, train a classifier with the task loss on features from a generic "probe network"
  • 2. Compute gradients of the probe network parameters w.r.t. the task loss
  • 3. Use statistics of the probe parameter gradients as the fixed-dimensional task embedding

Intuition: F captures how sensitive the task performance is to small perturbations δθ of the probe network parameters:

E_x∼p̂ KL( p_θ₀(y|x) ‖ p_θ(y|x) ) = δθ · F · δθ + o(δθ²)
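As a sanity check on the quadratic expansion above, the toy example below (a 1-D logistic model on made-up data, not from the slides) verifies that the average KL between the perturbed and unperturbed model shrinks quadratically in the perturbation size:

```python
import numpy as np

# Toy check: for p_θ(y=1|x) = σ(θx), halving δθ should roughly
# quarter the average KL divergence, as the Fisher expansion predicts.
rng = np.random.default_rng(1)
x = rng.normal(size=200)   # made-up inputs
theta = 0.7                # probe parameter

def bernoulli_kl(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def avg_kl(delta):
    p = 1 / (1 + np.exp(-(theta + delta) * x))  # perturbed model
    q = 1 / (1 + np.exp(-theta * x))            # probe model
    return bernoulli_kl(p, q).mean()

ratio = avg_kl(0.02) / avg_kl(0.01)  # should be close to 4
```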

SLIDE 5

Properties of TASK2VEC embedding

Dataset: (xᵢ, yᵢ), i = 1 … n, yᵢ ∈ {0, 1}
Classifier: two-layer network x → φ(x) with pᵢ = σ(wᵀ φ(xᵢ))
FIM for the cross-entropy loss, restricted to the last layer:

F_w = (1/N) Σᵢ pᵢ (1 − pᵢ) φ(xᵢ) φ(xᵢ)ᵀ
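A minimal numerical check of this formula (made-up data, numpy only): the closed form matches the definition of the FIM as the expected outer product of the score ∇_w log p(y|x) = (y − p)φ(x), with the expectation over y ∼ Bernoulli(pᵢ) computed exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 4
phi = rng.normal(size=(N, d))        # features φ(x_i)
w = rng.normal(size=d)
p = 1 / (1 + np.exp(-phi @ w))       # p_i = σ(wᵀ φ(x_i))

# Closed form from the slide: F_w = (1/N) Σ_i p_i (1-p_i) φ(x_i) φ(x_i)ᵀ
outer = phi[:, :, None] * phi[:, None, :]          # (N, d, d) outer products
F_closed = ((p * (1 - p))[:, None, None] * outer).mean(axis=0)

# Definition: E_y[g gᵀ] with g = ∇_w log p(y|x) = (y - p) φ(x),
# enumerating y ∈ {0, 1} with probabilities p and 1 - p.
F_def = np.zeros((d, d))
for y, prob in [(1.0, p), (0.0, 1 - p)]:
    g = (y - p)[:, None] * phi                     # per-example scores
    F_def += (prob[:, None, None] * (g[:, :, None] * g[:, None, :])).sum(0)
F_def /= N
```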

SLIDE 6
  • 1. Invariance to label space
  • 2. Encodes task difficulty
  • 3. Encodes task domain
  • 4. Encodes useful features for the task

Properties of TASK2VEC embedding

Dataset: (xᵢ, yᵢ), i = 1 … n, yᵢ ∈ {0, 1}
Classifier: two-layer network x → φ(x) with pᵢ = σ(wᵀ φ(xᵢ))
FIM for the cross-entropy loss, restricted to the last layer:

F_w = (1/N) Σᵢ pᵢ (1 − pᵢ) φ(xᵢ) φ(xᵢ)ᵀ

Representative domain embedding:

D = (1/N) Σᵢ φ(xᵢ) φ(xᵢ)ᵀ
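The contrast with the FIM-based embedding can be seen in a short sketch (made-up data): D ignores the labels entirely, while the task embedding changes when the label-fitted weights change:

```python
import numpy as np

rng = np.random.default_rng(2)
phi = rng.normal(size=(100, 3))        # features φ(x_i) of a shared domain

def domain_embedding(phi):
    """D = (1/N) Σ φ(x_i) φ(x_i)ᵀ — depends only on the inputs."""
    return (phi[:, :, None] * phi[:, None, :]).mean(0)

def task_embedding(phi, w):
    """Last-layer FIM — also depends on labels, via the fitted weights w."""
    p = 1 / (1 + np.exp(-phi @ w))     # classifier head fit to this task
    return ((p * (1 - p))[:, None, None] * (phi[:, :, None] * phi[:, None, :])).mean(0)

# Two tasks over the same inputs, i.e. different label-fitted weights:
w_a = np.array([1.0, 0.0, 0.0])
w_b = np.array([0.0, 2.0, -1.0])
D = domain_embedding(phi)              # identical for both tasks
```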

SLIDE 7

Properties of TASK2VEC embedding

  • 1. Binary tasks on the unit square, i.e., each tile is a task
  • 2. 10 random ReLU features, i.e., φᵢ = max(0, aᵢx + bᵢy + cᵢ)
  • 3. t-SNE to map the 10×10 FIM to 2D

SLIDE 8

Properties of TASK2VEC embedding

  • 1. Binary tasks on the unit square, i.e., each tile is a task
  • 2. 10 random ReLU features, i.e., φᵢ = max(0, aᵢx + bᵢy + cᵢ)
  • 3. t-SNE to map the 10×10 FIM to 2D

Polynomial degree 3

SLIDE 9


Robust Fisher Computation

  • 1. For realistic CV tasks we want to use deep CNNs (e.g., ResNet) and estimate the FIM for all of their parameters.
  • 2. Challenge: the FIM can be hard to estimate (noisy loss landscape, high dimensions, small training sets).
  • 3. Robust FIM:
  • 1. Restrict it to a diagonal
  • 2. Restrict it to a single value per filter (CNN layer)
  • 3. Robust estimation via perturbation: estimate Λ of a Gaussian perturbation; the optimal Λ satisfies a closed-form condition

"Trivial Embedding"
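The "single value per filter" reduction above can be sketched as follows; shapes and names are illustrative, not the paper's implementation:

```python
import numpy as np

def per_filter_fim(diag_fim, num_filters):
    """Average a layer's per-weight diagonal Fisher values over each filter,
    so the layer contributes one scalar per filter to the embedding."""
    return diag_fim.reshape(num_filters, -1).mean(axis=1)

# e.g., a hypothetical conv layer with 64 filters of shape 3x3x3:
layer_fim = np.abs(np.random.default_rng(3).normal(size=64 * 3 * 3 * 3))
embedding_part = per_filter_fim(layer_fim, num_filters=64)
```

Besides shrinking the embedding, averaging over each filter also smooths out the noise in the per-weight estimates.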

SLIDE 10


Similarity measures on the space of tasks

[Figure: two example tasks, Task A and Task B]

Task = {dataset, labels, loss}

SLIDE 11

Similarity measures on the space of tasks

Domain similarity


Unbiased Look at Dataset Bias, Torralba and Efros, CVPR 2011

SLIDE 12

Similarity measures on the space of tasks

Domain similarity
Range / label similarity

  • e.g., Taxonomic distance


https://www.pinterest.com/pin/520799144386337065/

SLIDE 13

Similarity measures on the space of tasks

Domain similarity
Range / label similarity

  • e.g., Taxonomic distance

Transfer “distance”

  • Fine-tune on task a followed by task b


Taskonomy: Disentangling Task Transfer Learning, Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese, CVPR 2018

SLIDE 14

Distance measures on TASK2VEC embedding

Symmetric distance
Asymmetric "distance"
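A hedged sketch of what these two distances can look like, assuming diagonal FIM embeddings and an elementwise-normalized cosine distance (a simplification of the paper's choices; α and the trivial embedding t0 are stand-ins):

```python
import numpy as np

def d_cos(a, b):
    """Cosine distance between two vectors."""
    return 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def d_sym(ta, tb):
    """Symmetric distance: cosine on elementwise-normalized embeddings."""
    s = ta + tb
    return d_cos(ta / s, tb / s)

def d_asym(ta, tb, t0, alpha=0.15):
    """Asymmetric "distance": discount by the distance to a trivial
    (uninformative) embedding t0; alpha is a tunable weight."""
    return d_sym(ta, tb) - alpha * d_sym(ta, t0)

ta = np.array([0.9, 0.1, 0.5])   # toy diagonal-FIM embeddings
t0 = np.array([0.2, 0.2, 0.2])
```

The asymmetry matters for transfer: going from an easy task to a hard one is not the same as the reverse, which a plain metric cannot express.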


SLIDE 15

MODEL2VEC: Joint embedding of tasks and models

  • 1. So far we have been associating models (feature extractors) with the tasks they are trained on.
  • 2. What about:
  • 1. legacy / black-box feature extractors? E.g., SIFT, HOG, Fisher vectors
  • 2. models of different complexity trained on the same dataset?
  • 3. MODEL2VEC: jointly embed feature extractors (encoded as one-hot vectors) and tasks such that similarity reflects a meta-task objective.
  • 1. Needs training data
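The joint-embedding idea can be sketched with a toy fit (made-up data): learn one vector per model so that embedding similarity predicts transfer performance. Dot-product similarity, the MSE loss, and all numbers below are illustrative stand-ins, not the paper's objective:

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.normal(size=(5, 8))                 # 5 task embeddings (given, fixed)
perf = rng.uniform(0.5, 1.0, size=(5, 3))   # observed task x model accuracies

M = np.zeros((3, 8))                        # one learned vector per model

def mse():
    """Error of predicting performance as similarity T @ M.T."""
    return float(((T @ M.T - perf) ** 2).mean())

before = mse()
for _ in range(500):
    err = T @ M.T - perf                    # (5, 3) prediction errors
    M -= 0.01 * (err.T @ T) / T.shape[0]    # (scaled) gradient step on the MSE
```

This is where the "needs training data" caveat bites: fitting the model vectors requires observed (task, model) performance pairs, i.e. the matrix of transfer experiments.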


SLIDE 16

Task Zoo

  • Tasks [1460]
  • iNaturalist [207]
  • CUB 200 [25]
  • iMaterialist [228]
  • DeepFashion [1000]


SLIDE 19

Task Zoo

  • Tasks [1460]
  • iNaturalist [207]
  • CUB 200 [25]
  • iMaterialist [228]
  • DeepFashion [1000]
  • A few tasks have > 10K training samples, but most have 100-1000 samples

SLIDE 20

Experiment: TASK2VEC recapitulates the iNaturalist taxonomy

[Figure: task-embedding cosine similarity matrix, with clusters for plants, reptiles, birds, mammals, and insects]

ResNet trained on ImageNet as the probe network

SLIDE 21

Experiment: TASK2VEC norm encodes task difficulty

ResNet trained on ImageNet as probe network


SLIDE 22

Experiment: TASK2VEC vs DOMAIN2VEC

[Figure: task embeddings vs domain embeddings; categories span iNaturalist classes (Actinopterygii, Amphibia, Arachnida, Aves, Fungi, Insecta, Mammalia, Mollusca, Plantae, Protozoa, Reptilia) and iMaterialist attributes (Category, Color, Gender, Material, Neckline, Pants, Pattern, Shoes)]

SLIDE 23

Task and Feature Zoo

  • Tasks [1460]
  • iNaturalist [207]
  • CUB 200 [25]
  • iMaterialist [228]
  • DeepFashion [1000]

  • Feature Zoo [156 experts]
  • ResNet-34 pretrained on ImageNet, followed by fine-tuning on tasks with enough examples

[Figure: feature extractor zoo feeding a recommender]

SLIDE 24

The Matrix

[Figure: performance matrix over feature extractors and tasks]

SLIDE 25

The Matrix

[Figure: performance matrix over experts and tasks, iNaturalist + CUB]

SLIDE 26

The ImageNet expert is usually good, but on many tasks the best expert handily outperforms it.

SLIDE 27

Data efficiency of TASK2VEC

[Figure legend: ImageNet fixed, ImageNet finetune, Task2vec finetune, Task2vec fixed, Brute force fixed]

SLIDE 28

Choice of distance for TASK2VEC

Relative error increase over the oracle (best choice)

SLIDE 29

Choice of the probe network for TASK2VEC

Relative error increase over the oracle (best choice)

SLIDE 30

Thank you!

Task2Vec: Task Embedding for Meta-Learning, Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona (https://arxiv.org/abs/1902.03545)
