SLIDE 1

MetaFun: Meta-Learning with Iterative Functional Updates

Jin Xu, Jean-Francois Ton, Hyunjik Kim, Adam R. Kosiorek, Yee Whye Teh

37th International Conference on Machine Learning

SLIDES 2-4

Supervised Meta-Learning

SLIDES 5-10

Encoder-Decoder Approaches to Supervised Meta-Learning

What is learning? What is meta-learning? (In encoder-decoder approaches like CNP [1].)

"A model of learning": the encoder-decoder network itself acts as a model of learning — producing predictions from a context is learning, and training this model across tasks is meta-learning.

SLIDES 11-12

Encoder-Decoder Approaches to Supervised Meta-Learning

Encoder: summarises the context into a representation.
Decoder: predicts conditioned on the representation.
Both are parameterised by neural networks.
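To make the encoder-decoder picture concrete, here is a minimal CNP-style sketch: the encoder embeds each context pair and mean-pools into a single representation r, and the decoder predicts at target inputs conditioned on r. The layer sizes, the use of mean pooling, and the tiny randomly initialised MLPs are illustrative assumptions, not the architecture from the paper.

import numpy as np

def mlp(sizes, rng):
    # Small randomly initialised MLP; returns a forward function (illustration only, no training).
    params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(h):
        for i, (W, b) in enumerate(params):
            h = h @ W + b
            if i < len(params) - 1:
                h = np.tanh(h)
        return h
    return forward

rng = np.random.default_rng(0)
dim_x, dim_y, dim_r = 1, 1, 32
encoder = mlp([dim_x + dim_y, 64, dim_r], rng)   # summarises one (x_i, y_i) pair
decoder = mlp([dim_x + dim_r, 64, dim_y], rng)   # predicts conditioned on the representation

def cnp_predict(x_context, y_context, x_target):
    # Encoder: pool per-point embeddings into one representation (order of points does not matter).
    r = encoder(np.concatenate([x_context, y_context], axis=-1)).mean(axis=0)
    # Decoder: predict at every target input conditioned on the same representation r.
    r_tiled = np.tile(r, (x_target.shape[0], 1))
    return decoder(np.concatenate([x_target, r_tiled], axis=-1))

xc, yc = rng.standard_normal((5, dim_x)), rng.standard_normal((5, dim_y))
print(cnp_predict(xc, yc, rng.standard_normal((3, dim_x))).shape)   # (3, 1)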

SLIDES 13-14

Incorporating Inductive Biases into Deep Learning Models

Example: an image classifier (input image → "Dog") uses convolutional structure as an inductive bias. What are good inductive biases for "a model of learning"?

SLIDES 15-16

MetaFun Overview

What is a better form of set representation? What are good inductive biases/structures for the encoder?

SLIDES 17-25

MetaFun Overview

Set representation: Euclidean space (a fixed-dimensional vector, as in [1][2][7]) vs. function space (e.g. a Hilbert space) — a functional representation.

Desiderata for the encoder:

  • Permutation invariance: a permutation of the data points should not change the set representation.

  • Flexible capacity: a fixed-dimensional representation can be limiting for large set sizes [4] and often leads to underfitting [3].

  • Within-context and context-target interaction: self-attention modules [6] or relation networks [9] can model interaction within the context, but not context-target interaction.
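For reference, the permutation-invariance requirement on an encoder E of a context set C = {(x_i, y_i)}_{i=1}^N, and the fixed-dimensional sum-pooling form used in [1][2][7], can be written as follows (standard definitions, e.g. [5][7]):

    E\big(\{(x_i, y_i)\}_{i=1}^{N}\big) = E\big(\{(x_{\pi(i)}, y_{\pi(i)})\}_{i=1}^{N}\big) \quad \text{for every permutation } \pi

    r = \rho\Big(\sum_{i=1}^{N} \phi(x_i, y_i)\Big) \in \mathbb{R}^{d}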

SLIDES 26-28

MetaFun Overview

Encoders with an iterative structure: learning to update a representation with feedback is easier than learning the representation directly. An iterative structure may be a good inductive bias for "the model of learning", since learning algorithms are often iterative (e.g. gradient descent).

SLIDE 29

MetaFun

SLIDES 30-32

MetaFun and Functional Gradient Descent

For supervised learning problems, the objective function often has the form of a sum of per-datapoint losses over the context. Solving it by iterative optimisation over the model parameters gives gradient descent; solving it by iterative optimisation directly over the predictor in function space gives functional gradient descent.
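The equations on these slides did not survive extraction; the standard forms they refer to are, for a context C = {(x_i, y_i)} with per-point loss \ell and step size \alpha:

    L = \sum_{(x_i, y_i) \in C} \ell\big(f(x_i), y_i\big)

Gradient descent on the parameters \theta of f_\theta:

    \theta^{(t+1)} = \theta^{(t)} - \alpha \sum_{i} \nabla_\theta\, \ell\big(f_{\theta^{(t)}}(x_i), y_i\big)

Functional gradient descent on f itself, e.g. in an RKHS with kernel k:

    f^{(t+1)}(\cdot) = f^{(t)}(\cdot) - \alpha \sum_{i} \frac{\partial \ell\big(f^{(t)}(x_i), y_i\big)}{\partial f^{(t)}(x_i)}\, k(\cdot, x_i)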

SLIDES 33-41

MetaFun and Functional Gradient Descent

Building up one MetaFun iteration by analogy with a functional gradient step:

  • Evaluate the functional representation at the context inputs.

  • Local update function: compute an update at each context point (the learned counterpart of the per-point loss gradient).

  • Functional pooling: combine the local updates into an update defined over the whole input space.
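The "?" on these build slides marked the parts of the functional gradient step that MetaFun replaces with learned components. Schematically (a reconstruction of the equations lost in extraction, following the steps listed above), with functional representation r^{(t)} at iteration t:

    u_i = u\big(x_i, y_i, r^{(t)}(x_i)\big)
    (local update function, a neural network, in place of the loss gradient)

    \Delta r^{(t)}(\cdot) = \sum_{i \in C} k(\cdot, x_i)\, u_i
    (functional pooling; k is a deep kernel or attention weights)

    r^{(t+1)}(\cdot) = r^{(t)}(\cdot) - \alpha\, \Delta r^{(t)}(\cdot)
    (apply the functional update)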

SLIDES 42-45

MetaFun

MetaFun iteration: local update function → functional pooling → apply functional updates. The representation after the last iteration is the final functional representation, which is passed to the decoder for prediction.

The functional representation with an iterative encoder satisfies all three desiderata:
Permutation invariance ✔  Flexible capacity ✔  Within-context and context-target interaction ✔
Both the within-context interaction and the interaction between context and target are considered when updating the representation at each iteration.
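A minimal numpy sketch of the iteration above, using an RBF kernel on learned features as the deep kernel and small randomly initialised MLPs for the learned components. Dimensions, the number of iterations, the zero initial representation, and these particular parameterisations are assumptions for illustration, not the configuration used in the paper.

import numpy as np

rng = np.random.default_rng(0)
dim_x, dim_y, dim_r, T, alpha = 1, 1, 16, 3, 0.5

def mlp(sizes):
    # Small randomly initialised MLP (illustration only, no training).
    params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(h):
        for i, (W, b) in enumerate(params):
            h = np.tanh(h @ W + b) if i < len(params) - 1 else h @ W + b
        return h
    return forward

feature   = mlp([dim_x, 32, 8])                       # phi(x): features for the deep kernel
update_fn = mlp([dim_x + dim_y + dim_r, 32, dim_r])   # u(x_i, y_i, r(x_i)): local update function
decoder   = mlp([dim_r, 32, dim_y])                   # maps the final r(x*) to a prediction

def kernel(xa, xb):
    # Deep kernel: RBF on the learned features phi(.).
    fa, fb = feature(xa), feature(xb)
    return np.exp(-((fa[:, None, :] - fb[None, :, :]) ** 2).sum(-1))

def metafun_predict(x_context, y_context, x_target):
    x_all = np.concatenate([x_context, x_target], axis=0)
    r_all = np.zeros((x_all.shape[0], dim_r))      # r^(0) evaluated at context and target inputs
    K = kernel(x_all, x_context)                   # pooling weights k(., x_i) for i in the context
    for _ in range(T):
        r_context = r_all[:len(x_context)]
        u = update_fn(np.concatenate([x_context, y_context, r_context], axis=-1))  # local updates
        r_all = r_all - alpha * (K @ u)            # functional pooling + apply functional update
    return decoder(r_all[len(x_context):])         # predict at the target inputs

xc, yc = rng.uniform(-5, 5, (10, dim_x)), rng.standard_normal((10, dim_y))
print(metafun_predict(xc, yc, np.linspace(-5, 5, 7).reshape(-1, dim_x)).shape)   # (7, 1)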

SLIDES 46-53

MetaFun for Classification

MetaFun iteration: local update function → functional pooling (deep kernels or attention modules) → apply functional updates.

Local update function:

  • Regression: an MLP on the concatenation of the inputs.

  • Classification: a specially structured update function (shown on the slide) that incorporates the label information into the network structure rather than concatenating the label to the inputs, and naturally integrates within-class and between-class interaction.
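One way to read "incorporate label information into the network structure" is to let the label route class-specific updates instead of being concatenated to the inputs. The sketch below is only an illustration of that idea under assumed shapes (one representation slot per class, two small update networks); it is not the paper's exact parameterisation.

import numpy as np

rng = np.random.default_rng(0)
num_classes, dim_r = 5, 8   # assumption: the representation keeps one dim_r slot per class

def mlp(sizes):
    # Small randomly initialised MLP (illustration only, no training).
    params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(h):
        for i, (W, b) in enumerate(params):
            h = np.tanh(h @ W + b) if i < len(params) - 1 else h @ W + b
        return h
    return forward

u_pos = mlp([dim_r, 32, dim_r])   # update for the slot of the point's own class
u_neg = mlp([dim_r, 32, dim_r])   # update for the slots of every other class

def local_update(r_i, y_i):
    # r_i: (num_classes, dim_r) representation evaluated at x_i; y_i: integer class label.
    onehot = np.eye(num_classes)[y_i][:, None]
    # The label selects which learned update each class slot receives, giving
    # within-class and between-class interaction without concatenating the label.
    return onehot * u_pos(r_i) + (1.0 - onehot) * u_neg(r_i)

print(local_update(np.zeros((num_classes, dim_r)), y_i=2).shape)   # (5, 8)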

SLIDES 54-58

MetaFun and Gradient-Based Meta-Learning

Comparison with Model-Agnostic Meta-Learning (MAML) [8]. During the meta-training phase, MAML finds a good initialisation from related tasks; at test time, it runs a few gradient descent steps from the learned initialisation on the context of a new task.

Viewed through the MetaFun iteration (local update function → functional pooling → apply functional updates):

  • MAML: local updates follow the gradient; per-point updates are combined by SumPooling (permutation-invariant).

  • MetaFun: the local update function is parameterised by neural networks; updates are combined by FunPooling.
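The correspondence drawn here can be written out. MAML's inner loop on a new task's context is a sum-pooled gradient update on the parameters; MetaFun's iteration is the analogous update on the functional representation, with both the local update and the pooling learned:

    \text{MAML:}\quad \theta^{(t+1)} = \theta^{(t)} - \alpha \sum_{i \in C} \nabla_\theta\, \ell\big(f_{\theta^{(t)}}(x_i), y_i\big)
    (local updates follow the gradient; the sum over the context is permutation-invariant SumPooling)

    \text{MetaFun:}\quad r^{(t+1)}(\cdot) = r^{(t)}(\cdot) - \alpha \sum_{i \in C} k(\cdot, x_i)\, u\big(x_i, y_i, r^{(t)}(x_i)\big)
    (the local update function u is parameterised by neural networks; the kernel or attention weights k give FunPooling)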

SLIDES 59-61

MetaFun and Gradient-Based Meta-Learning

1D sinusoid regression tasks. MetaFun: smooth updates that match the ground truth well across the whole period. MAML: non-smooth updates and weaker predictions, especially on the left side where there are no context points.
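For context, 1D sinusoid regression is the standard toy meta-learning benchmark from the MAML paper [8]: each task is a sinusoid with its own amplitude and phase, and the context is a handful of points from it. The sampler below follows the ranges in that paper; whether this deck used exactly the same ranges is an assumption.

import numpy as np

def sample_sinusoid_task(rng, num_context=5, num_target=50):
    # One task: y = A * sin(x - phase), with ranges as in the MAML paper [8].
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    f = lambda x: amplitude * np.sin(x - phase)
    x_context = rng.uniform(-5.0, 5.0, (num_context, 1))
    x_target = np.linspace(-5.0, 5.0, num_target).reshape(-1, 1)
    return (x_context, f(x_context)), (x_target, f(x_target))

rng = np.random.default_rng(0)
(xc, yc), (xt, yt) = sample_sinusoid_task(rng)
print(xc.shape, yc.shape, xt.shape, yt.shape)   # (5, 1) (5, 1) (50, 1) (50, 1)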

SLIDES 62-64

Large-Scale Few-shot Classification

miniImageNet (without data augmentation)
Model                          1-shot           5-shot
LEO [9]                        61.76 ± 0.08%    77.59 ± 0.12%
MetaFun (deep kernel version)  61.16 ± 0.15%    78.20 ± 0.16%
MetaFun (attention version)    62.12 ± 0.30%    77.78 ± 0.12%

miniImageNet (with data augmentation)
Model                          1-shot           5-shot
LEO                            63.97 ± 0.20%    79.49 ± 0.70%
MetaOptNet-SVM [10]            64.09 ± 0.62%    80.00 ± 0.45%
MetaFun (deep kernel version)  63.39 ± 0.15%    80.81 ± 0.10%
MetaFun (attention version)    64.13 ± 0.13%    80.82 ± 0.17%

tieredImageNet (without data augmentation)
Model                          1-shot           5-shot
LEO                            66.33 ± 0.05%    81.44 ± 0.09%
MetaOptNet-SVM                 65.81 ± 0.74%    81.75 ± 0.58%
MetaFun (deep kernel version)  67.27 ± 0.20%    83.28 ± 0.12%
MetaFun (attention version)    67.72 ± 0.14%    82.81 ± 0.15%

We demonstrate that encoder-decoder-style meta-learning methods like conditional neural processes can also achieve state-of-the-art results on large-scale few-shot classification benchmarks. The key ingredients are the functional set representation and the iterative structure of the encoder.

SLIDE 65

Thank you!

jin.xu@stats.ox.ac.uk   @jinxu06   (code available here)

SLIDE 66

References

[1] Garnelo, Marta, et al. "Conditional Neural Processes." International Conference on Machine Learning. 2018.
[2] Garnelo, Marta, et al. "Neural Processes." arXiv preprint arXiv:1807.01622 (2018).
[3] Kim, Hyunjik, et al. "Attentive Neural Processes." International Conference on Learning Representations. 2019.
[4] Wagstaff, Edward, et al. "On the Limitations of Representing Functions on Sets." International Conference on Machine Learning. 2019.
[5] Bloem-Reddy, Benjamin, and Yee Whye Teh. "Probabilistic Symmetries and Invariant Neural Networks." Journal of Machine Learning Research 21(90):1-61, 2020.
[6] Lee, Juho, et al. "Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks." International Conference on Machine Learning. 2019.
[7] Zaheer, Manzil, et al. "Deep Sets." Advances in Neural Information Processing Systems. 2017.
[8] Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." International Conference on Machine Learning. 2017.
[9] Rusu, Andrei A., et al. "Meta-Learning with Latent Embedding Optimization." International Conference on Learning Representations. 2019.
[10] Lee, Kwonjoon, et al. "Meta-Learning with Differentiable Convex Optimization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.