CS 330
Optimization-Based Meta-Learning
Course Reminders: HW1 due next Wednesday (9/30). Project guidelines posted — start forming groups & formulating ideas. Guest lecture by Matt Johnson on Monday!

Plan for Today:
Recap
Optimization Meta-Learning
Part of Homework 2!
Goals for the end of lecture:
Multi-Task Learning: Solve multiple tasks 𝒯_1, ⋯, 𝒯_T at once:

min_θ ∑_{i=1}^{T} ℒ_i(θ, 𝒟_i)

Transfer Learning: Solve target task 𝒯_b after solving source task 𝒯_a, by transferring knowledge learned from 𝒯_a.

The Meta-Learning Problem: Given data from tasks 𝒯_1, …, 𝒯_n, quickly solve a new task 𝒯_test.

In all settings: tasks must share structure. In transfer learning and meta-learning: generally impractical to access prior tasks.
Given 1 example of each of 5 classes, classify new examples: 5-way, 1-shot image classification (MiniImagenet).
[Figure: meta-training on training classes, meta-testing on held-out classes.]
Can replace image classification with regression, language generation, skill learning, or any ML problem.
Black-box adaptation, general form:

φ_i = f_θ(D^tr_i)
y^ts = f_black-box(D^tr_i, x^ts)

+ expressive
How else can we represent φ_i? What if we treat it as an optimization procedure?
[Computation graph: D^tr_i is fed through a gradient operator ∇_θℒ, rather than a black-box network f_θ, to produce φ_i, which predicts y^ts from x^ts.]
Key idea: embed optimization inside the inner learning process. Why might this make sense?
Fine-tuning: start from pre-trained parameters and take gradient steps on the training data for the new task (typically many gradient steps). (Universal Language Model Fine-Tuning for Text Classification. Howard, Ruder. '18)
Fine-tuning is less effective with very small datasets.
Key idea: Over many tasks, learn a parameter vector θ that transfers via fine-tuning [at test time].
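The fine-tuning recipe can be sketched in a few lines. Everything here (the linear model, synthetic data, step size) is an illustrative assumption, not the lecture's setup:

```python
import numpy as np

# Minimal sketch of fine-tuning: start from pre-trained parameters and
# take gradient steps on the new task's (very small) training set.

rng = np.random.default_rng(0)

def loss(w, X, y):
    return np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

# "New task": targets come from a weight vector near the pre-trained one
w_pretrained = np.array([1.0, -2.0, 0.5])
w_task = w_pretrained + np.array([0.1, -0.1, 0.2])

X = rng.normal(size=(4, 3))   # only 4 examples: a very small dataset
y = X @ w_task

# Fine-tune: phi <- phi - alpha * grad L(phi, D_tr), repeated
phi = w_pretrained.copy()
alpha = 0.05
for _ in range(100):
    phi = phi - alpha * grad(phi, X, y)

print(loss(phi, X, y) < loss(w_pretrained, X, y))  # fine-tuning reduced the loss
```

The catch the slide points at: with so few examples, plain fine-tuning can overfit or barely move; meta-learning asks how to choose the initialization so that these few steps work well.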
Finn, Abbeel, Levine. Model-Agnostic Meta-Learning. ICML 2017
φ_i: parameter vector for task i. θ: parameter vector being meta-learned.

General Algorithm:
1. Sample task 𝒯_i (or a mini-batch of tasks)
2. Sample disjoint datasets D^tr_i, D^test_i from 𝒟_i
3. Optimize φ_i ← θ − α∇_θℒ(θ, D^tr_i)
4. Update θ using ∇_θℒ(φ_i, D^test_i)
—> brings up second-order derivatives

Black-box approach: compute φ_i with a network. Optimization-based approach, key idea: acquire φ_i through optimization.

Do we get higher-order derivatives with more inner gradient steps? Do we need to compute the full Hessian?
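To make the second-order term concrete, here is a sketch of one exact MAML meta-gradient on a toy scalar model, where every derivative has a closed form (the data, model, and step size are illustrative assumptions), checked against finite differences:

```python
import numpy as np

# Toy model f(x) = theta * x with squared error: all derivatives are closed-form.

def loss(theta, x, y):
    return np.mean((theta * x - y) ** 2)

def grad(theta, x, y):
    return np.mean(2 * x * (theta * x - y))

def hess(x):
    # second derivative of the inner loss; constant for this model
    return np.mean(2 * x ** 2)

def maml_meta_grad(theta, alpha, x_tr, y_tr, x_ts, y_ts):
    """d/dtheta of L(phi, D_test) with phi = theta - alpha * grad L(theta, D_tr).
    The chain rule multiplies by dphi/dtheta = 1 - alpha * Hessian:
    this is the second-order term the slides refer to."""
    phi = theta - alpha * grad(theta, x_tr, y_tr)
    return grad(phi, x_ts, y_ts) * (1.0 - alpha * hess(x_tr))

rng = np.random.default_rng(0)
x_tr, x_ts = rng.normal(size=10), rng.normal(size=10)
y_tr, y_ts = 3.0 * x_tr, 3.0 * x_ts   # the task: regress onto slope 3
theta, alpha = 0.5, 0.1

def outer(t):  # meta-objective as a plain function of theta, for checking
    phi = t - alpha * grad(t, x_tr, y_tr)
    return loss(phi, x_ts, y_ts)

eps = 1e-5
fd = (outer(theta + eps) - outer(theta - eps)) / (2 * eps)
exact = maml_meta_grad(theta, alpha, x_tr, y_tr, x_ts, y_ts)
print(abs(exact - fd) < 1e-6)  # analytic meta-gradient matches finite differences
```

Dropping the `(1 - alpha * hess)` factor gives the first-order approximation discussed later; with more inner steps the factor becomes a product of such terms, but no derivatives beyond second order appear.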
MAML can be viewed as a computation graph with an embedded gradient operator, matching the black-box general form: y^ts = f_black-box(D^tr_i, x^ts).
Note: Can mix & match components of the computation graph.
E.g., learn the initialization but replace the gradient update with a learned network, φ_i = f(θ, D^tr_i, ∇_θℒ). (Ravi & Larochelle, ICLR '17; actually precedes MAML)

This computation-graph view of meta-learning will come back again!
Finn & Levine, ICLR '18
Does this structure come at a cost?
For a sufficiently deep network, the MAML function can approximate any function of (D^tr_i, x^ts), under mild assumptions.
Why is this interesting? MAML has the benefit of inductive bias without losing expressive power.
Idea: Automatically learn the inner vector learning rate, tune the outer learning rate. (Li et al. Meta-SGD, Behl et al. AlphaMAML)
Idea: Decouple the inner learning rate and BN statistics per step. (Antoniou et al. MAML++)
Idea: Optimize only a subset of the parameters in the inner loop. (Zhou et al. DEML, Zintgraf et al. CAVIA)
Idea: Introduce context variables for increased expressive power. (Finn et al. bias transformation, Zintgraf et al. CAVIA)
Takeaway: a range of simple tricks can help optimization significantly.
Challenge: Backpropagating through the inner optimization is compute- & memory-intensive.

Idea: [Crudely] approximate dφ_i/dθ as the identity. (Finn et al. first-order MAML '17, Nichol et al. Reptile '18)
Surprisingly works for simple few-shot problems, but (anecdotally) not for more complex meta-learning problems.

Idea: Only optimize the last layer of weights: ridge regression, logistic regression (Bertinetto et al. R2-D2 '19), support vector machine (Lee et al. MetaOptNet '19)
—> leads to a closed form or convex optimization on top of meta-learned features

Idea: Derive the meta-gradient using the implicit function theorem. (Rajeswaran, Finn, Kakade, Levine. Implicit MAML '19)
—> compute the full meta-gradient without differentiating through the optimization path
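As a sketch of the closed-form last-layer idea (R2-D2-style; the feature map, regularizer, and data here are illustrative assumptions, not the papers' architectures), ridge regression on top of fixed features solves the inner loop exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
W_backbone = rng.normal(size=(5, 16))  # stand-in for a meta-learned backbone

def features(x):
    return np.tanh(x @ W_backbone)

lam = 0.1  # ridge regularizer

# Few-shot task: tiny support set, larger query set
X_tr, y_tr = rng.normal(size=(4, 5)), rng.normal(size=4)
X_ts = rng.normal(size=(10, 5))

Phi = features(X_tr)                                      # (4, 16)
# Closed-form inner loop: w = (Phi^T Phi + lam I)^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(16), Phi.T @ y_tr)
y_pred = features(X_ts) @ w                               # query predictions

# Sanity check: the ridge objective's gradient vanishes at the solution,
# so no iterative inner optimization (and no unrolled backprop) is needed
residual = Phi.T @ (Phi @ w - y_tr) + lam * w
print(np.allclose(residual, 0))
```

Because `w` is a differentiable closed-form function of the features, the meta-gradient flows through one linear solve instead of an unrolled optimization path.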
Can we compute the meta-gradient without differentiating through the optimization path?
Idea: Derive the meta-gradient using the implicit function theorem. (Rajeswaran, Finn, Kakade, Levine. Implicit MAML)
Trades off memory and computation; allows second-order optimizers in the inner loop. A recent development (NeurIPS '19), thus all the typical caveats with recent work.
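Sketching the implicit-MAML result (using λ for the proximal regularization strength): the inner loop solves a regularized problem, and the implicit function theorem yields a meta-gradient that depends only on the solution φ*, not on the path taken to reach it:

```latex
\phi^\star(\theta) = \arg\min_{\phi}\; \mathcal{L}(\phi, \mathcal{D}^{tr}_i) + \frac{\lambda}{2}\,\lVert\phi - \theta\rVert^2
\qquad\Rightarrow\qquad
\frac{d\phi^\star}{d\theta} = \Big(I + \tfrac{1}{\lambda}\,\nabla^2_{\phi}\mathcal{L}(\phi^\star, \mathcal{D}^{tr}_i)\Big)^{-1}
```

The inverse is never formed explicitly; it is applied approximately (e.g., via conjugate gradient), which is where the memory/computation trade-off comes from.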
Idea: Progressive neural architecture search + MAML. (Kim et al. Auto-Meta)
MAML with the basic architecture: 63.11% on MiniImagenet 5-way 5-shot; MAML + Auto-Meta: 74.65%.
Key idea: Acquire φ_i through optimization.
Takeaways: Construct a bi-level optimization problem. Model-agnostic (applies to any architecture).
Link: https://arxiv.org/abs/2004.13390 (CVPR 2020 EarthVision Workshop)
Applications: global urban planning, climate change research.
Challenges: Labeling data is expensive; different regions look different & have different land use proportions.
Datasets: DeepGlobe (Demir et al. 2018), SEN12MS (Schmitt et al. 2019).

Different tasks: different regions of the world (e.g., croplands from four countries).
Goal: Segment/classify images from a new region with a small amount of data.

SEN12MS (Schmitt et al. 2019): geographic metadata provided; example 2-way 2-shot classification task.
Without geographic metadata, clustering was used to guess the region; example 1-shot segmentation task.
Setup. Meta-training data: {𝒟_1, …, 𝒟_T}. At meta-test time: a small amount of data 𝒟^tr_j from the new region (the meta-test training set / meta-test support set).

Compare:
- MAML on the meta-training data {𝒟_1, …, 𝒟_T}, adapt with 𝒟^tr_j
- Pre-train on the pooled meta-training data 𝒟_1 ∪ … ∪ 𝒟_T, fine-tune on 𝒟^tr_j
- Random init: train from scratch on 𝒟^tr_j

Results on the SEN12MS and DeepGlobe datasets; more visualizations and analysis in the paper!
Goals for the end of lecture:
Monday: Guest lecture from Matt Johnson on automatic differentiation
Wednesday: Non-parametric few-shot learners, comparison of approaches
Week 4: Advanced (but important!) meta-learning topics
Week 5: Start of reinforcement learning topics [project proposals due]