Meta-Learning of Structured Representation by Proximal Mapping



  1. Meta-Learning of Structured Representation by Proximal Mapping Mao Li, Yingyi Ma, Xinhua Zhang University of Illinois at Chicago

  2. Motivation
  Goal of meta-learning: extract prior structures from a set of tasks that allow efficient learning of new tasks.
  Examples of structural regularities:
  • Instance level
    - Input layers: transformations beyond group-based diffeomorphisms
    - Within layers: sparsity, disentanglement, spatial invariance, structured gradients accounting for data covariance, manifold smoothness
    - Between layers: equivariance, contractivity, robustness under dropout and adversarial perturbations of preceding nodes
  • Batch/dataset level
    - Multi-view, multi-modality, multi-domain
    - Diversity, fairness, privacy, causal structure

  3. Existing Approaches: Data Augmentation
  [Figure: original data → augmented data → training data]
  √ boosts prediction performance
  × unclear whether the improvement is due to the learned representation or to a better classifier

  4. Existing Approaches: Auto-encoder
  [Figure: input → encoder → latent representation → decoder → reconstruction; the latent representation also feeds downstream tasks]
  √ learns the most salient features
  × usually used only as an initialization for a subsequent supervised task
  × not amenable to end-to-end learning
  Our goal: learn representations that explicitly encode structural priors in an end-to-end fashion.

  5. Existing Approaches: Regularization
  √ simple and efficient
  × contention for weights between the regularizer and the supervised objective

  6. Proposed Method
  Morph a representation z towards a structured one by the proximal mapping
      P_R(z) = argmin_x ½‖x − z‖² + R(x),
  where the regularizer R promotes the desired structure. Here z is a mini-batch or a single example: a mini-batch plays the role of a task in meta-learning, and the proximal mapping acts as a task-specific base learner.
  Embed the proximal mapping as a layer into deep networks.
  Advantages:
  + decouples regularization from supervised learning
  + extends meta-learning to unsupervised base learners
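As a concrete, standard instance (my own illustration, not from the slides): the ℓ1 regularizer R(x) = λ‖x‖₁, which promotes sparsity, gives the proximal mapping in closed form as elementwise soft-thresholding:

```python
import numpy as np

def prox_l1(z, lam):
    """Proximal mapping of lam * ||x||_1:
    argmin_x 0.5 * ||x - z||^2 + lam * ||x||_1.
    The closed form is elementwise soft-thresholding:
    entries with |z_i| <= lam become exactly zero,
    so the mapping promotes sparsity."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([1.5, -0.2, 0.7, -3.0])
x = prox_l1(z, 0.5)   # -> [1.0, 0.0, 0.2, -2.5]: small entries zeroed
```

The morphed x stays close to z (the first term) while acquiring the structure R encodes (the second term); other regularizers swap in the same way.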

  7. Proposed Method
  Morph a representation z towards a structured one by the proximal mapping, with a regularizer that promotes the desired structure.
  [Figure: representation before and after the mapping]
  L: graph Laplacian (for smoothness on a manifold)
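The graph-Laplacian case also has a closed form. A minimal numpy sketch, assuming the regularizer λ tr(XᵀLX) over a mini-batch whose rows are examples (the weight λ = 0.5 and the chain graph below are my own illustration): the minimizer of ½‖X − Z‖² + λ tr(XᵀLX) is X = (I + 2λL)⁻¹Z.

```python
import numpy as np

def prox_laplacian(Z, L, lam):
    """Prox of lam * tr(X^T L X): minimizes
    0.5 * ||X - Z||^2 + lam * tr(X^T L X),
    whose closed form is X = (I + 2*lam*L)^{-1} Z.
    It pulls each example towards its graph neighbors,
    smoothing the representation over the manifold."""
    n = Z.shape[0]
    return np.linalg.solve(np.eye(n) + 2.0 * lam * L, Z)

# Chain graph over 3 examples: 0 -- 1 -- 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A        # unnormalized graph Laplacian
Z = np.array([[0.0], [4.0], [0.0]])   # a spiky, non-smooth signal
X = prox_laplacian(Z, L, 0.5)         # -> [[1.], [2.], [1.]]: smoothed
```

Because the solve is differentiable in Z (and in any parameters of L), the mapping can sit inside a network and be trained end-to-end.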

  8. MetaProx for Multi-view Learning
  In multi-view learning, observations are available as pairs of views {x_i, y_i}.
  Figure 1: training framework of MetaProx (feature extractors f and g for views x and y, a proximal layer on the extracted features, and supervised predictors h on the morphed features)

  9. MetaProx for Multi-view Learning
  ① Feature extraction: f and g map views x and y to their feature representations.

  10. MetaProx for Multi-view Learning
  ② Proximal mapping: promote high correlation between the features of the two views.

  11. MetaProx for Multi-view Learning
  ③ Supervised task: apply the supervised predictor h to the morphed features.

  12. MetaProx for Multi-view Learning
  ③ Supervised task: optimize over the variables highlighted in red in Figure 1.
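The correlation structure promoted in step ② can be quantified as in CCA. The sketch below (my own numpy illustration, not the paper's implementation) computes the sum of canonical correlations between two mini-batches of features, the quantity a correlation-promoting proximal layer would increase while keeping the features close to their inputs:

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** -0.5) @ V.T

def total_correlation(F, G, eps=1e-6):
    """Sum of canonical correlations between feature batches
    F, G of shape (n, d). eps regularizes the covariances."""
    n = len(F)
    F = F - F.mean(axis=0)
    G = G - G.mean(axis=0)
    Cff = F.T @ F / n + eps * np.eye(F.shape[1])
    Cgg = G.T @ G / n + eps * np.eye(G.shape[1])
    Cfg = F.T @ G / n
    T = inv_sqrt(Cff) @ Cfg @ inv_sqrt(Cgg)
    # singular values of the whitened cross-covariance are the
    # canonical correlations, each in [0, 1]
    return np.linalg.svd(T, compute_uv=False).sum()
```

Identical views score close to d (every canonical correlation near 1), while independent views score near 0; the proximal layer morphs the features of the two views towards each other in this sense.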

  13. Experiment Results
  Multi-view image classification
  - Dataset: a subset of Sketchy (20 classes); each example is a (photo, sketch) pair with a class label, e.g. 'butterfly', 'cat'
  [Table: test accuracy for image classification]

  14. Experiment Results
  Cross-lingual word embedding (English, German)
  - Datasets: WS353, SimLex999
  - Metric: Spearman's correlation between the rankings produced by the model and by humans
  Table 1: Spearman's correlation for word similarities
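The evaluation metric can be sketched as follows: a minimal implementation assuming no tied values (SciPy's `spearmanr` handles ties via average ranks), with hypothetical scores for illustration:

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation between two score vectors:
    the Pearson correlation of their ranks. Assumes no ties."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return (ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb))

model_scores = [0.9, 0.1, 0.5, 0.3]   # hypothetical model similarity scores
human_scores = [10.0, 1.0, 7.0, 2.0]  # hypothetical human ratings
rho = spearman(model_scores, human_scores)  # -> 1.0: identical rankings
```

The metric depends only on the rankings, so the model's scores need not be on the same scale as the human ratings.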

  15. At the poster: more details and discussions. Thanks!
  Note: MetaProx is distinct from "Efficient Meta Learning via Minibatch Proximal Update" (NeurIPS 2019) and "Meta-Learning with Implicit Gradients" (NeurIPS 2019), which use proximal updates for optimization; MetaProx uses the proximal mapping for modeling.
