Optimization-Based Meta-Learning (CS 330)


1. Optimization-Based Meta-Learning. CS 330.

2. Course Reminders. HW1 due next Wednesday (9/30). Project guidelines are posted; start forming groups and formulating ideas. Guest lecture by Matt Johnson on Monday!

3. Plan for Today. Recap: the meta-learning problem & black-box meta-learning. Optimization-based meta-learning (part of Homework 2!): overall approach; comparison of optimization-based vs. black-box; challenges & solutions; case study of land cover classification (time permitting). Goals by the end of lecture: the basics of optimization-based meta-learning techniques (& how to implement them), and the trade-offs between black-box and optimization-based meta-learning.

4. Problem Settings Recap. Multi-task learning: solve multiple tasks 𝒯_1, ..., 𝒯_T at once, min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i). Transfer learning: solve target task 𝒯_b after solving source task 𝒯_a by transferring knowledge learned from 𝒯_a. The meta-learning problem: given data from 𝒯_1, ..., 𝒯_n, quickly solve a new task 𝒯_test. In transfer learning and meta-learning it is generally impractical to access data from prior tasks; in all settings, the tasks must share structure.
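As a minimal illustration (my own sketch, not from the slides), the multi-task objective above optimizes a single shared parameter vector against every task's loss; `loss_fns` and `task_datasets` are assumed placeholders:

```python
def multi_task_loss(theta, loss_fns, task_datasets):
    # min_theta sum_{i=1}^T L_i(theta, D_i): one shared parameter vector theta
    # is trained jointly against each task's loss on that task's data.
    return sum(L_i(theta, D_i) for L_i, D_i in zip(loss_fns, task_datasets))
```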

5. Example Meta-Learning Problem: 5-way, 1-shot image classification (MiniImagenet). Given 1 example of each of 5 classes, classify new examples; the classes are held out from the meta-training classes. Image classification can be replaced with any ML problem: regression, language generation, skill learning, etc.
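For concreteness, here is a minimal sketch (my own, not the course code) of how an N-way, K-shot episode might be sampled; the `data_by_class` dict mapping each class to its list of examples is an assumed input format:

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way, K-shot episode from a dict: class -> list of examples."""
    classes = random.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + n_query)
        # K labeled examples per class to adapt on; the rest are for evaluation.
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```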

6. Black-Box Adaptation. General form: a network f_θ maps the task training set D_i^tr to task-specific parameters φ_i = f_θ(D_i^tr), which are used to predict y^ts = f_black-box(D_i^tr, x^ts). + expressive; - a challenging optimization problem. How else can we represent φ_i? What if we treat it as an optimization procedure?
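A minimal JAX sketch (my own illustration; the tiny pooling architecture is an assumption) of black-box adaptation: the support set is pooled into task parameters φ_i, which condition the prediction on the query inputs:

```python
import jax.numpy as jnp

def blackbox_adapt(theta, support_x, support_y, query_x):
    """phi_i = f_theta(D_i^tr): pool the support set into a task embedding,
    then condition the predictions y^ts on it."""
    w_embed, w_out = theta
    # Encode each (x, y) support pair and average-pool into phi_i.
    pairs = jnp.concatenate([support_x, support_y], axis=-1)
    phi_i = jnp.tanh(pairs @ w_embed).mean(axis=0)
    # Predict y^ts = f(phi_i, x^ts) for each query input x^ts.
    phi_tiled = jnp.broadcast_to(phi_i, (query_x.shape[0], phi_i.shape[0]))
    return jnp.concatenate([query_x, phi_tiled], axis=-1) @ w_out
```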

7. Plan for Today. Recap: the meta-learning problem & black-box meta-learning. Optimization-based meta-learning (part of Homework 2!): overall approach; comparison of optimization-based vs. black-box; challenges & solutions; case study of land cover classification (time permitting).

8. Black-Box Adaptation vs. Optimization-Based Adaptation. (Figure: the black-box network f_θ consumes D_i^tr to produce φ_i, which is used to predict y^ts from x^ts.)

9. Black-Box Adaptation vs. Optimization-Based Adaptation. (Figure: φ_i is produced from D_i^tr via a gradient ∇_θ L rather than a black-box network.) Key idea: embed optimization inside the inner learning process. Why might this make sense?

10. Recall: Fine-Tuning. Starting from pre-trained parameters θ, fine-tune on the training data D^tr for the new task (typically for many gradient steps): φ ← θ - α ∇_θ L(θ, D^tr). Fine-tuning is less effective with very small datasets. [Universal Language Model Fine-Tuning for Text Classification. Howard & Ruder, '18]
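A minimal JAX sketch (my own, with an assumed loss_fn(params, batch)) of this fine-tuning update, applied for one or a few gradient steps:

```python
import jax

def fine_tune(theta, d_train, loss_fn, alpha=0.01, num_steps=1):
    """phi <- theta - alpha * grad_theta L(theta, D^tr), repeated num_steps times."""
    phi = theta
    for _ in range(num_steps):
        grads = jax.grad(loss_fn)(phi, d_train)
        phi = jax.tree_util.tree_map(lambda p, g: p - alpha * g, phi, grads)
    return phi
```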

11. Optimization-Based Adaptation. Fine-tuning (at test time, from pre-trained parameters on the new task's training data): φ ← θ - α ∇_θ L(θ, D^tr). Meta-learning: min_θ Σ_{task i} L(θ - α ∇_θ L(θ, D_i^tr), D_i^ts). Key idea: over many tasks, learn a parameter vector θ that transfers via fine-tuning. [Finn, Abbeel, Levine. Model-Agnostic Meta-Learning. ICML 2017]
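A minimal JAX sketch (my own, again assuming a loss_fn(params, data)) of this per-task meta-objective; because jax.grad differentiates through the inner gradient step, the resulting meta-gradient contains the second-order terms discussed two slides below:

```python
import jax

def maml_task_loss(theta, d_train, d_test, loss_fn, alpha=0.01):
    """L(theta - alpha * grad_theta L(theta, D_i^tr), D_i^ts) for a single task."""
    grads = jax.grad(loss_fn)(theta, d_train)
    phi_i = jax.tree_util.tree_map(lambda p, g: p - alpha * g, theta, grads)
    return loss_fn(phi_i, d_test)

# Differentiating w.r.t. theta gives the (second-order) MAML meta-gradient.
maml_meta_grad = jax.grad(maml_task_loss)
```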

12. Optimization-Based Adaptation: min_θ Σ_{task i} L(θ - α ∇_θ L(θ, D_i^tr), D_i^ts), where θ is the parameter vector being meta-learned and φ_i* is the optimal parameter vector for task i. This is Model-Agnostic Meta-Learning (MAML). [Finn, Abbeel, Levine. Model-Agnostic Meta-Learning. ICML 2017]

13. Optimization-Based Adaptation. Key idea: acquire φ_i through optimization. General algorithm: 1. Sample task T_i (or a minibatch of tasks). 2. Sample disjoint datasets D_i^tr, D_i^test from D_i. 3. Compute φ_i: the black-box approach computes φ_i ← f_θ(D_i^tr), while the optimization-based approach optimizes φ_i ← θ - α ∇_θ L(θ, D_i^tr). 4. Update θ using ∇_θ L(φ_i, D_i^test); for the optimization-based approach this brings up second-order derivatives. Do we need to compute the full Hessian? (-> whiteboard) Do we get higher-order derivatives with more inner gradient steps? A sketch of this meta-training loop follows below.
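A minimal JAX sketch (my own) of steps 1-4 for a minibatch of tasks; `tasks` is assumed to be a list of (D_i^tr, D_i^test) pairs and loss_fn an assumed per-task loss. Calling jax.grad through the inner update is what introduces the second-order terms:

```python
import jax

def maml_meta_train_step(theta, tasks, loss_fn, alpha=0.01, beta=0.001):
    """One meta-training step over a minibatch of tasks (steps 1-4 above)."""
    def task_loss(params, task):
        d_train, d_test = task  # step 2: disjoint D_i^tr, D_i^test
        grads = jax.grad(loss_fn)(params, d_train)                                 # inner gradient
        phi_i = jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)  # step 3
        return loss_fn(phi_i, d_test)

    def meta_loss(params):
        return sum(task_loss(params, task) for task in tasks) / len(tasks)

    meta_grads = jax.grad(meta_loss)(theta)  # step 4: includes second-order terms
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, theta, meta_grads)
```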

14. Plan for Today. Recap: the meta-learning problem & black-box meta-learning. Optimization-based meta-learning (part of Homework 2!): overall approach; comparison of optimization-based vs. black-box; challenges & solutions; case study of land cover classification (time permitting).

15. Optimization vs. Black-Box Adaptation. Black-box adaptation, general form: y^ts = f_black-box(D_i^tr, x^ts). Model-agnostic meta-learning: MAML can be viewed as a computation graph with an embedded gradient operator mapping D_i^tr and x^ts to y^ts. Note: you can mix & match components of the computation graph, e.g. learn the initialization but replace the gradient update with a learned network f(θ, D_i^tr, ∇_θ L) [Ravi & Larochelle, ICLR '17, which actually precedes MAML]. This computation-graph view of meta-learning will come back again!
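A minimal JAX sketch (entirely my own illustration) of the mix-and-match idea: keep a meta-learned initialization θ, but let a small learned network transform the inner gradient instead of applying the fixed step θ - α ∇_θ L. The elementwise two-layer update network here is an assumption for illustration, not the Ravi & Larochelle architecture (which uses an LSTM-based learner):

```python
import jax
import jax.numpy as jnp

def learned_update_adapt(theta, update_params, d_train, loss_fn):
    """phi_i = theta + u_psi(theta, grad_theta L): a learned, elementwise update rule."""
    w1, w2 = update_params  # tiny per-parameter network, meta-learned alongside theta
    grads = jax.grad(loss_fn)(theta, d_train)

    def apply_update(p, g):
        feats = jnp.stack([p, g], axis=-1)  # condition the update on (parameter, gradient)
        delta = jnp.tanh(feats @ w1) @ w2   # learned per-parameter step
        return p + jnp.squeeze(delta, axis=-1)

    return jax.tree_util.tree_map(apply_update, theta, grads)
```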

16. Optimization vs. Black-Box Adaptation. How well can learning procedures generalize to similar, but extrapolated tasks? (Figure: performance vs. task variability on Omniglot image classification, comparing MAML with SNAIL and MetaNetworks.) Does this structure come at a cost? [Finn & Levine, ICLR '18]
