SLIDE 1

Hierarchically Structured Meta-learning

Huaxiu Yao¹,², Ying Wei², Junzhou Huang¹, Zhenhui Li²
¹Pennsylvania State University  ²Tencent AI Lab

Oral: Thu Jun 13th 09:35 -- 09:40 AM @ Room 103
Poster: Thu Jun 13th 06:30 -- 09:00 PM @ Pacific Ballroom #183

SLIDE 2

Gradient-based Meta-learning (MAML [1])

Is global initialization enough?

[1] Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017. http://people.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf
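To make the single-global-initialization idea concrete, here is a minimal sketch of gradient-based meta-learning on scalar linear-regression tasks. It uses the first-order approximation of MAML rather than the full second-order meta-gradient, and the toy tasks, loss, and step sizes are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss_grad(theta, x, y):
    """Gradient of the MSE for an illustrative linear model y_hat = theta * x."""
    return np.mean(2 * (theta * x - y) * x)

def maml_first_order(theta0, tasks, alpha=0.01, beta=0.1):
    """One meta-update of first-order MAML: adapt per task with one inner
    gradient step, then move the single global initialization using the
    gradients at the adapted parameters."""
    meta_grad = 0.0
    for x_tr, y_tr, x_val, y_val in tasks:
        theta_i = theta0 - alpha * task_loss_grad(theta0, x_tr, y_tr)  # inner step
        meta_grad += task_loss_grad(theta_i, x_val, y_val)             # first-order outer grad
    return theta0 - beta * meta_grad / len(tasks)

# Toy tasks: y = w * x with different slopes w, split into train/validation halves.
tasks = []
for w in [1.0, 2.0, 3.0]:
    x = rng.normal(size=20)
    tasks.append((x[:10], w * x[:10], x[10:], w * x[10:]))

theta = 0.0
for _ in range(100):
    theta = maml_first_order(theta, tasks)
```

After meta-training, the single initialization `theta` settles between the task slopes: one global starting point compromises across all tasks, which motivates the question above.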

SLIDE 3

Task-specific Meta-learning (MT-Net [2])

Should the initialization be tailored to each task?

[2] Lee, Yoonho, and Seungjin Choi. "Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace." International Conference on Machine Learning. 2018.

SLIDE 4

Human Beings: Knowledge Organization and Reuse

[3] Gershman, Samuel J., David M. Blei, and Yael Niv. "Context, learning, and extinction." Psychological review 117.1 (2010): 197. [4] Gershman, Samuel J., et al. "Statistical computations underlying the dynamics of memory updating." PLoS computational biology 10.11 (2014): e1003939.

[Figure: contexts across which humans organize and reuse knowledge, e.g., store, play, supermarket, reading]

SLIDE 5

Our Solution: Hierarchically Structured Meta-learning

  • Organize tasks by hierarchical clustering
  • Adapt the global initialization to each cluster of tasks

Balance between generalization and customization
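The cluster-then-adapt idea above can be sketched as follows. This is an illustrative simplification, not the paper's exact clustering and gating architecture: embed each task, soft-assign the embedding to clusters, and mix cluster-specific gates that transform the global initialization:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cluster_adapted_init(theta0, task_embedding, centers, gates):
    """Adapt the global initialization theta0 to a task's cluster:
    soft-assign the task to clusters by similarity to cluster centers,
    then apply the mixture of cluster-specific element-wise gates."""
    scores = np.array([-np.sum((task_embedding - c) ** 2) for c in centers])
    p = softmax(scores)                                  # soft cluster assignment
    C = sum(p_k * C_k for p_k, C_k in zip(p, gates))     # mixed gate
    return C * theta0                                    # cluster-tailored initialization

theta0 = np.ones(4)                                      # hypothetical global initialization
centers = [np.zeros(2), np.ones(2)]                      # hypothetical cluster centers
gates = [np.full(4, 0.5), np.full(4, 1.5)]               # hypothetical per-cluster gates

near_c0 = cluster_adapted_init(theta0, np.array([0.1, 0.0]), centers, gates)
near_c1 = cluster_adapted_init(theta0, np.array([0.9, 1.0]), centers, gates)
```

Tasks near different cluster centers receive visibly different initializations, while all of them remain anchored to the shared `theta0`: customization on top of generalization.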

SLIDE 6

HSML: Optimization

  • Incrementally increase the number of clusters as tasks arrive sequentially.
  • Criterion for adding a cluster: evaluate the average loss over Q epochs.

Overall optimization problem
Extension to continual adaptation
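The cluster-growth criterion can be sketched as a small rule; the loss threshold and cluster cap here are hypothetical knobs, not the paper's values:

```python
def maybe_add_cluster(recent_avg_losses, num_clusters, max_clusters, threshold):
    """Sketch of incremental cluster growth: as tasks arrive sequentially,
    track the average meta-loss over the last Q epochs; if it stays above
    a threshold (the existing clusters no longer fit the new tasks) and the
    cap is not reached, add one cluster."""
    if num_clusters >= max_clusters:
        return num_clusters
    if sum(recent_avg_losses) / len(recent_avg_losses) > threshold:
        return num_clusters + 1
    return num_clusters
```

For example, with a threshold of 0.5, persistently high losses over the last Q epochs trigger growth (`maybe_add_cluster([0.9, 0.8, 0.85], 2, 4, 0.5)` returns 3), while low losses leave the structure unchanged.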

SLIDE 7

Analysis

  • For each task 𝒯ᵢ ∼ ℰ, the training and testing samples are drawn i.i.d. from 𝒯ᵢ.
  • The initialization of HSML with K clusters can be represented as θ₀ᶜ = Σ_{k=1}^{K} C_k θ₀, where C_k is the learned transformation for cluster k.
  • Following [5], assume the loss ℒ ∈ [0, 1] is smooth with a Lipschitz Hessian, and that the step size at step v decays as β_v = c/v for a sufficiently small constant c over V total steps.
  • Under these assumptions, the generalization error of the base learner on task 𝒯ᵢ is bounded by ϑ(𝒯ᵢ, θ₀), a quantity that grows with the norm of the empirical loss gradient at the initialization, ‖∇ℒ_{Sᵢ}(θ₀)‖.
  • MAML can be regarded as the special case of HSML in which every cluster transformation is the identity: C_k = I for all k.
  • After proving that there exist {C_k}_{k=1}^{K} such that ‖∇ℒ_{Sᵢ}(θ₀ᶜ)‖ ≤ ‖∇ℒ_{Sᵢ}(θ₀)‖, we conclude that HSML achieves a tighter generalization bound than MAML.

[5] Kuzborskij, Ilja, and Christoph Lampert. "Data-Dependent Stability of Stochastic Gradient Descent." International Conference on Machine Learning. 2018.

SLIDE 8

Experiments: Toy Regression

Data

  • 4 function families: Sinusoid, Line, Cubic, Quadratic
  • K-shot: K samples are used as the training set of each task

Base model

  • a 2-layer fully connected network with 40 neurons per layer
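A task sampler for this setup might look as follows; the parameter and input ranges are illustrative, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# The four function families from the toy regression benchmark.
FAMILIES = {
    "sin":       lambda x, a, b: a * np.sin(x + b),
    "line":      lambda x, a, b: a * x + b,
    "cubic":     lambda x, a, b: a * x ** 3 + b,
    "quadratic": lambda x, a, b: a * x ** 2 + b,
}

def sample_task(k_shot):
    """Draw one K-shot regression task: pick a family at random, draw its
    parameters, and generate K (x, y) training samples."""
    name = rng.choice(list(FAMILIES))
    a, b = rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)
    x = rng.uniform(-5.0, 5.0, size=k_shot)
    return name, x, FAMILIES[name](x, a, b)

name, x, y = sample_task(5)
```

Because the family label is latent at meta-test time, a globally shared initialization must average over all four shapes, which is exactly the structure HSML's clusters can recover.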
SLIDE 9

Experiments: Toy Regression

Quantitative results

Method                          5-shot         10-shot
Global shared (MAML)            2.205 ± 0.121  0.761 ± 0.068
Task-specific (MUMOMAML [6])    1.096 ± 0.085  0.256 ± 0.028
Our method (HSML)               0.856 ± 0.073  0.161 ± 0.021

  • Comparison on regression MSEs
  • Comparison in the continual adaptation scenario

[6] Vuorio, Risto, Shao-Hua Sun, Hexiang Hu, and Joseph J. Lim. "Toward Multimodal Model-Agnostic Meta-Learning." arXiv preprint arXiv:1812.07172 (2018).

SLIDE 10

Experiments: Toy Regression

Qualitative results

  • Cluster assignment interpretation
  • Regression results
SLIDE 11

Experiments: Few-shot Classification

Data

  • 4 image classification datasets: Bird, Texture, Aircraft, Fungi
  • 5-way, 1-shot

Base model

  • a convolutional network with 4 convolution blocks
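Episode construction for the 5-way 1-shot setting can be sketched as follows; the toy dataset and query count are illustrative:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=15):
    """Build one N-way K-shot episode: pick n_way classes, then k_shot
    support images and q_queries query images per class. `dataset` maps a
    class name to a list of image ids."""
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(dataset[cls], k_shot + q_queries)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query

# Hypothetical stand-in dataset: 10 classes with 30 image ids each.
toy = {f"class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(10)}
support, query = sample_episode(toy)
```

In the heterogeneous setting of slide 11, each episode additionally comes from one of the four datasets, so episodes cluster naturally by source, which is the structure HSML exploits.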
SLIDE 12

Experiments: Few-shot Classification

Quantitative results

  • Comparison on accuracy
  • Comparison in the continual adaptation scenario

Method                          Bird     Texture  Aircraft  Fungi
Global shared (MAML)            53.94%   31.66%   51.37%    42.12%
Task-specific (MUMOMAML [6])    56.82%   33.81%   53.14%    42.22%
Our method (HSML)               60.98%   35.01%   57.38%    44.02%

[6] Vuorio, Risto, Shao-Hua Sun, Hexiang Hu, and Joseph J. Lim. "Toward Multimodal Model-Agnostic Meta-Learning." arXiv preprint arXiv:1812.07172 (2018).

SLIDE 13

Experiments: Few-shot Classification

Qualitative results

  • Cluster assignment interpretation

SLIDE 14

Conclusions

  • HSML simultaneously customizes task knowledge and preserves knowledge generalization via the hierarchical clustering structure.
  • Experiments demonstrate the effectiveness and interpretability of HSML in both toy regression and few-shot classification problems.

SLIDE 15

THANK YOU

Oral: Thu Jun 13th 09:35 -- 09:40 AM @ Room 103
Poster: Thu Jun 13th 06:30 -- 09:00 PM @ Pacific Ballroom #183