SLIDE 1

Hierarchically Structured Meta-learning

Huaxiu Yao¹,², Ying Wei², Junzhou Huang¹, Zhenhui Li²
¹Pennsylvania State University  ²Tencent AI Lab

Oral: Thu Jun 13th 09:35 -- 09:40 AM @ Room 103
Poster: Thu Jun 13th 06:30 -- 09:00 PM @ Pacific Ballroom #183

SLIDE 2

Gradient-based Meta-learning (MAML [1])

Is global initialization enough?

[1] Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017. http://people.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf
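To make the single-global-initialization idea concrete, here is a minimal sketch of gradient-based meta-learning on scalar linear-regression tasks. It uses the first-order approximation of MAML rather than the full second-order meta-gradient, and the toy tasks, loss, and step sizes are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss_grad(theta, x, y):
    """Gradient of the MSE for an illustrative linear model y_hat = theta * x."""
    return np.mean(2 * (theta * x - y) * x)

def maml_first_order(theta0, tasks, alpha=0.01, beta=0.1):
    """One meta-update of first-order MAML: adapt per task with one inner
    gradient step, then move the single global initialization using the
    gradients at the adapted parameters."""
    meta_grad = 0.0
    for x_tr, y_tr, x_val, y_val in tasks:
        theta_i = theta0 - alpha * task_loss_grad(theta0, x_tr, y_tr)  # inner step
        meta_grad += task_loss_grad(theta_i, x_val, y_val)             # first-order outer grad
    return theta0 - beta * meta_grad / len(tasks)

# Toy tasks: y = w * x with different slopes w, split into train/validation halves.
tasks = []
for w in [1.0, 2.0, 3.0]:
    x = rng.normal(size=20)
    tasks.append((x[:10], w * x[:10], x[10:], w * x[10:]))

theta = 0.0
for _ in range(100):
    theta = maml_first_order(theta, tasks)
```

After meta-training, the single initialization `theta` settles between the task slopes: one global starting point compromises across all tasks, which motivates the question above.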

SLIDE 3

Task-specific Meta-learning (MT-Net [2])

Should the initialization be tailored to each task?

[2] Lee, Yoonho, and Seungjin Choi. "Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace." International Conference on Machine Learning. 2018.

SLIDE 4

Human Beings: Knowledge Organization and Reuse

[3] Gershman, Samuel J., David M. Blei, and Yael Niv. "Context, learning, and extinction." Psychological review 117.1 (2010): 197. [4] Gershman, Samuel J., et al. "Statistical computations underlying the dynamics of memory updating." PLoS computational biology 10.11 (2014): e1003939.

[Figure: contexts across which humans organize and reuse knowledge, e.g., store, play, supermarket, reading]

SLIDE 5

Our Solution: Hierarchically Structured Meta-learning

  • Organize tasks by hierarchical clustering
  • Adapt the global initialization to each cluster of tasks

Balance between generalization and customization
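The cluster-then-adapt idea above can be sketched as follows. This is an illustrative simplification, not the paper's exact clustering and gating architecture: embed each task, soft-assign the embedding to clusters, and mix cluster-specific gates that transform the global initialization:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cluster_adapted_init(theta0, task_embedding, centers, gates):
    """Adapt the global initialization theta0 to a task's cluster:
    soft-assign the task to clusters by similarity to cluster centers,
    then apply the mixture of cluster-specific element-wise gates."""
    scores = np.array([-np.sum((task_embedding - c) ** 2) for c in centers])
    p = softmax(scores)                                  # soft cluster assignment
    C = sum(p_k * C_k for p_k, C_k in zip(p, gates))     # mixed gate
    return C * theta0                                    # cluster-tailored initialization

theta0 = np.ones(4)                                      # hypothetical global initialization
centers = [np.zeros(2), np.ones(2)]                      # hypothetical cluster centers
gates = [np.full(4, 0.5), np.full(4, 1.5)]               # hypothetical per-cluster gates

near_c0 = cluster_adapted_init(theta0, np.array([0.1, 0.0]), centers, gates)
near_c1 = cluster_adapted_init(theta0, np.array([0.9, 1.0]), centers, gates)
```

Tasks near different cluster centers receive visibly different initializations, while all of them remain anchored to the shared `theta0`: customization on top of generalization.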

SLIDE 6

HSML: Optimization

  • Incrementally increase the number of clusters as tasks arrive sequentially.
  • Criterion for adding a cluster: evaluate the average loss over Q epochs.

Overall optimization problem
Extension to continual adaptation
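The cluster-growth criterion can be sketched as a small rule; the loss threshold and cluster cap here are hypothetical knobs, not the paper's values:

```python
def maybe_add_cluster(recent_avg_losses, num_clusters, max_clusters, threshold):
    """Sketch of incremental cluster growth: as tasks arrive sequentially,
    track the average meta-loss over the last Q epochs; if it stays above
    a threshold (the existing clusters no longer fit the new tasks) and the
    cap is not reached, add one cluster."""
    if num_clusters >= max_clusters:
        return num_clusters
    if sum(recent_avg_losses) / len(recent_avg_losses) > threshold:
        return num_clusters + 1
    return num_clusters
```

For example, with a threshold of 0.5, persistently high losses over the last Q epochs trigger growth (`maybe_add_cluster([0.9, 0.8, 0.85], 2, 4, 0.5)` returns 3), while low losses leave the structure unchanged.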

SLIDE 7

Analysis

  • For each task 𝒯ᵢ ∼ ℰ, the training and testing samples are drawn i.i.d. from 𝒯ᵢ.
  • The initialization of HSML with K clusters can be represented as θ₀ᶜ = Σ_{k=1}^{K} C_k θ₀, where C_k is the learned transformation for cluster k.
  • Following [5], assume the loss ℒ ∈ [0, 1] is smooth with a Lipschitz Hessian, and that the step size at step v decays as β_v = c/v for a sufficiently small constant c over V total steps.
  • Under these assumptions, the generalization error of the base learner on task 𝒯ᵢ is bounded by ϑ(𝒯ᵢ, θ₀), a quantity that grows with the norm of the empirical loss gradient at the initialization, ‖∇ℒ_{Sᵢ}(θ₀)‖.
  • MAML can be regarded as the special case of HSML in which every cluster transformation is the identity: C_k = I for all k.
  • After proving that there exist {C_k}_{k=1}^{K} such that ‖∇ℒ_{Sᵢ}(θ₀ᶜ)‖ ≤ ‖∇ℒ_{Sᵢ}(θ₀)‖, we conclude that HSML achieves a tighter generalization bound than MAML.

[5] Kuzborskij, Ilja, and Christoph Lampert. "Data-Dependent Stability of Stochastic Gradient Descent." International Conference on Machine Learning. 2018.

SLIDE 8

Experiments: Toy Regression

Data

  • 4 function families: Sinusoid, Line, Cubic, Quadratic
  • K-shot: K samples are used as the training set of each task

Base model

  • a 2-layer fully connected network with 40 neurons per layer
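A task sampler for this setup might look as follows; the parameter and input ranges are illustrative, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# The four function families from the toy regression benchmark.
FAMILIES = {
    "sin":       lambda x, a, b: a * np.sin(x + b),
    "line":      lambda x, a, b: a * x + b,
    "cubic":     lambda x, a, b: a * x ** 3 + b,
    "quadratic": lambda x, a, b: a * x ** 2 + b,
}

def sample_task(k_shot):
    """Draw one K-shot regression task: pick a family at random, draw its
    parameters, and generate K (x, y) training samples."""
    name = rng.choice(list(FAMILIES))
    a, b = rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)
    x = rng.uniform(-5.0, 5.0, size=k_shot)
    return name, x, FAMILIES[name](x, a, b)

name, x, y = sample_task(5)
```

Because the family label is latent at meta-test time, a globally shared initialization must average over all four shapes, which is exactly the structure HSML's clusters can recover.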
SLIDE 9

Experiments: Toy Regression

Quantitative results

Method                          5-shot         10-shot
Global shared (MAML)            2.205 ± 0.121  0.761 ± 0.068
Task-specific (MUMOMAML [6])    1.096 ± 0.085  0.256 ± 0.028
Our method (HSML)               0.856 ± 0.073  0.161 ± 0.021

  • Comparison on regression MSEs
  • Comparison in the continual adaptation scenario

[6] Vuorio, Risto, Shao-Hua Sun, Hexiang Hu, and Joseph J. Lim. "Toward Multimodal Model-Agnostic Meta-Learning." arXiv preprint arXiv:1812.07172 (2018).

SLIDE 10

Experiments: Toy Regression

Qualitative results

  • Cluster assignment interpretation
  • Regression results
SLIDE 11

Experiments: Few-shot Classification

Data

  • 4 image classification datasets: Bird, Texture, Aircraft, Fungi
  • 5-way, 1-shot

Base model

  • a convolutional network with 4 convolution blocks
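Episode construction for the 5-way 1-shot setting can be sketched as follows; the toy dataset and query count are illustrative:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=15):
    """Build one N-way K-shot episode: pick n_way classes, then k_shot
    support images and q_queries query images per class. `dataset` maps a
    class name to a list of image ids."""
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(dataset[cls], k_shot + q_queries)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query

# Hypothetical stand-in dataset: 10 classes with 30 image ids each.
toy = {f"class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(10)}
support, query = sample_episode(toy)
```

In the heterogeneous setting of slide 11, each episode additionally comes from one of the four datasets, so episodes cluster naturally by source, which is the structure HSML exploits.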
SLIDE 12

Experiments: Few-shot Classification

Quantitative results

  • Comparison on accuracy
  • Comparison in the continual adaptation scenario

Method                          Bird     Texture  Aircraft  Fungi
Global shared (MAML)            53.94%   31.66%   51.37%    42.12%
Task-specific (MUMOMAML [6])    56.82%   33.81%   53.14%    42.22%
Our method (HSML)               60.98%   35.01%   57.38%    44.02%

[6] Vuorio, Risto, Shao-Hua Sun, Hexiang Hu, and Joseph J. Lim. "Toward Multimodal Model-Agnostic Meta-Learning." arXiv preprint arXiv:1812.07172 (2018).

SLIDE 13

Experiments: Few-shot Classification

Qualitative results

  • Cluster assignment interpretation

SLIDE 14

Conclusions

  • HSML simultaneously customizes task knowledge and preserves knowledge generalization via the hierarchical clustering structure.
  • Experiments demonstrate the effectiveness and interpretability of HSML in both toy regression and few-shot classification problems.

SLIDE 15

THANK YOU

Oral: Thu Jun 13th 09:35 -- 09:40 AM @ Room 103
Poster: Thu Jun 13th 06:30 -- 09:00 PM @ Pacific Ballroom #183