SLIDE 1

TASKNORM: Rethinking Batch Normalization for Meta-Learning

John Bronskill* (University of Cambridge), Jonathan Gordon* (University of Cambridge), James Requeima (University of Cambridge, Invenia Labs), Sebastian Nowozin (Microsoft Research), Richard E. Turner (University of Cambridge, Microsoft Research)

Department of Engineering, University of Cambridge

Paper: *Bronskill, J., *Gordon, J., Requeima, J., Nowozin, S., and Turner, R.E. "TaskNorm: Rethinking Batch Normalization for Meta-Learning." Proceedings of the 37th International Conference on Machine Learning, PMLR 119 (2020). *Equal contribution.
Code: https://github.com/cambridge-mlg/cnaps

SLIDE 2

TaskNorm: Batch Normalization for Meta-Learning with Images

  • We demonstrate the significant effect of batch normalization (BN) on meta-learning image classification accuracy and training efficiency.
  • We identify issues with the transductive BN schemes used in well-known meta-learning algorithms.
  • We introduce TASKNORM, a normalization algorithm tailored to the meta-learning setting that improves both image classification accuracy and training efficiency.

SLIDE 3

Meta-Learning

SLIDE 4

Meta-Learning

➢ Early Machine Learning: Learn model based on engineered features

SLIDE 5

Meta-Learning

➢ Early Machine Learning: Learn model based on engineered features
➢ Deep learning: Jointly learn features and model

SLIDE 6

Meta-Learning

➢ Early Machine Learning: Learn model based on engineered features
➢ Deep learning: Jointly learn features and model
➢ Meta-Learning: Jointly learn features, model, and algorithm[1]

[1] Hospedales, Timothy, et al. "Meta-learning in neural networks: A survey." arXiv preprint arXiv:2004.05439 (2020).

SLIDE 7

Meta-Learning

➢ Early Machine Learning: Learn model based on engineered features
➢ Deep learning: Jointly learn features and model
➢ Meta-Learning: Jointly learn features, model, and algorithm[1]

[1] Hospedales, Timothy, et al. "Meta-learning in neural networks: A survey." arXiv preprint arXiv:2004.05439 (2020).
[2] Sergey Levine & Chelsea Finn, Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning: https://metalearning-cvpr2019.github.io/assets/CVPR_2019_Metalearning_Tutorial_Chelsea_Finn.pdf

Given a task distribution, learn a new task efficiently.[2]

SLIDE 8

Meta-Learning

➢ Early Machine Learning: Learn model based on engineered features
➢ Deep learning: Jointly learn features and model
➢ Meta-Learning: Jointly learn features, model, and algorithm[1]
➢ Focus on utilizing meta-learning in the few-shot classification scenario

[1] Hospedales, Timothy, et al. "Meta-learning in neural networks: A survey." arXiv preprint arXiv:2004.05439 (2020).
[2] Sergey Levine & Chelsea Finn, Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning: https://metalearning-cvpr2019.github.io/assets/CVPR_2019_Metalearning_Tutorial_Chelsea_Finn.pdf

Given a task distribution, learn a new task efficiently.[2]

SLIDE 9

Few-Shot Meta-Training / Meta-Testing

SLIDE 10

Few-Shot Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: a task τ consists of a context set (D_τ) of labeled examples and a target set (T_τ) of examples to be predicted.]

SLIDE 11

Few-Shot Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: meta-train task 1, with a context set D_1 and target set T_1 of timepiece images labeled stopwatch, meter, watch, and clock.]

SLIDE 12

Few-Shot Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: during meta-training, the meta-learner is fed the context images and labels of task 1.]

SLIDE 13

Few-Shot Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: the meta-learner outputs parameters for a learner adapted to task 1.]

SLIDE 14

Few-Shot Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: the adapted learner makes predictions for the target images of task 1.]

SLIDE 15

Few-Shot Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: a loss compares the predictions with the target labels and is used to train the meta-learner.]

SLIDE 16

Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: a second meta-train task, with a context set D_2 and target set T_2 of Omniglot Aramaic characters.]

SLIDE 17

Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: the same meta-learner, learner, and loss pipeline is applied to task 2.]

SLIDE 18

Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: meta-training iterates in this way over many tasks (task 1, task 2, ...).]

SLIDE 19

Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: at meta-test time, a held-out task arrives: a context set D*_1 of labeled traffic signs (speed, stop, no trucks, curve) and target images whose labels are unknown.]

SLIDE 20

Meta-Training / Meta-Testing

Hugo Larochelle, Generalizing From Few Examples With Meta-Learning: https://www.dropbox.com/s/sm68skkkbxbob0i/metalearning.pdf?dl=0

[Figure: the trained meta-learner adapts the learner using the meta-test context set, and the adapted learner predicts labels for the meta-test target images.]

SLIDE 21

Batch Normalization

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

SLIDE 22

Batch Normalization

➢ Goal: Normalize each training batch so that it has:
  • zero mean
  • unit variance

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

SLIDE 23

Batch Normalization

➢ Goal: Normalize each training batch so that it has:
  • zero mean
  • unit variance

➢ Accelerates neural network training by:
  • Allowing the use of higher learning rates.
  • Decreasing the sensitivity to network initialization.

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

SLIDE 24

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:

SLIDE 25

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:
⓪ $B = \{x_1, x_2, \ldots, x_n\}$   # a mini-batch

SLIDE 26

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:
⓪ $B = \{x_1, x_2, \ldots, x_n\}$   # a mini-batch
① $\mu_B = \frac{1}{n} \sum_{j=1}^{n} x_j$   # compute batch mean

SLIDE 27

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:
⓪ $B = \{x_1, x_2, \ldots, x_n\}$   # a mini-batch
① $\mu_B = \frac{1}{n} \sum_{j=1}^{n} x_j$   # compute batch mean
② $\sigma_B^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \mu_B)^2$   # compute batch variance

SLIDE 28

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:
⓪ $B = \{x_1, x_2, \ldots, x_n\}$   # a mini-batch
① $\mu_B = \frac{1}{n} \sum_{j=1}^{n} x_j$   # compute batch mean
② $\sigma_B^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \mu_B)^2$   # compute batch variance
③ $x_j' = \gamma \, \frac{x_j - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$   # normalize; $\gamma, \beta$ are learned; $\epsilon$ is a small constant to avoid division by 0

SLIDE 29

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:
⓪ $B = \{x_1, x_2, \ldots, x_n\}$   # a mini-batch
① $\mu_B = \frac{1}{n} \sum_{j=1}^{n} x_j$   # compute batch mean
② $\sigma_B^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \mu_B)^2$   # compute batch variance
③ $x_j' = \gamma \, \frac{x_j - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$   # normalize; $\gamma, \beta$ are learned; $\epsilon$ is a small constant to avoid division by 0
④ Accumulate moving averages of $\mu_B, \sigma_B^2$ over all batches as $\mu_r, \sigma_r^2$.

SLIDE 30

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:
⓪ $B = \{x_1, x_2, \ldots, x_n\}$   # a mini-batch
① $\mu_B = \frac{1}{n} \sum_{j=1}^{n} x_j$   # compute batch mean
② $\sigma_B^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \mu_B)^2$   # compute batch variance
③ $x_j' = \gamma \, \frac{x_j - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$   # normalize; $\gamma, \beta$ are learned; $\epsilon$ is a small constant to avoid division by 0
④ Accumulate moving averages of $\mu_B, \sigma_B^2$ over all batches as $\mu_r, \sigma_r^2$.

Inference:
$x_j' = \gamma \, \frac{x_j - \mu_r}{\sqrt{\sigma_r^2 + \epsilon}} + \beta$   # use the moving averages to normalize

SLIDE 31

“Conventional” Batch Normalization Algorithm

Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

Training:
⓪ $B = \{x_1, x_2, \ldots, x_n\}$   # a mini-batch
① $\mu_B = \frac{1}{n} \sum_{j=1}^{n} x_j$   # compute batch mean
② $\sigma_B^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \mu_B)^2$   # compute batch variance
③ $x_j' = \gamma \, \frac{x_j - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$   # normalize; $\gamma, \beta$ are learned; $\epsilon$ is a small constant to avoid division by 0
④ Accumulate moving averages of $\mu_B, \sigma_B^2$ over all batches as $\mu_r, \sigma_r^2$.

Inference:
$x_j' = \gamma \, \frac{x_j - \mu_r}{\sqrt{\sigma_r^2 + \epsilon}} + \beta$   # use the moving averages to normalize

We call the mean and variance of a batch its moments.
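
As a concrete reference for the rest of the talk, here is a minimal single-channel sketch of the algorithm above in Python/NumPy. It is an illustration under our own naming, not the paper's code: real implementations vectorize over channels and learn γ and β by backprop, and the momentum value here is just illustrative.

```python
import numpy as np

class ConventionalBatchNorm:
    """Single-channel sketch of conventional batch normalization
    (Ioffe & Szegedy, 2015)."""

    def __init__(self, eps=1e-5, momentum=0.1):
        self.gamma, self.beta = 1.0, 0.0    # learned scale and shift
        self.mu_r, self.var_r = 0.0, 1.0    # running (moving-average) moments
        self.eps, self.momentum = eps, momentum

    def train_step(self, x):
        mu_b, var_b = x.mean(), x.var()     # steps 1-2: batch moments
        # step 4: accumulate moving averages of the batch moments
        self.mu_r = (1 - self.momentum) * self.mu_r + self.momentum * mu_b
        self.var_r = (1 - self.momentum) * self.var_r + self.momentum * var_b
        # step 3: normalize with the *batch* moments
        return self.gamma * (x - mu_b) / np.sqrt(var_b + self.eps) + self.beta

    def inference(self, x):
        # normalize with the stored *running* moments
        return self.gamma * (x - self.mu_r) / np.sqrt(self.var_r + self.eps) + self.beta
```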

SLIDE 32

How should batch normalization for meta-learning work?

SLIDE 33

How should batch normalization for meta-learning work?

➢ First idea: Use conventional batch normalization (CBN):

SLIDE 34

How should batch normalization for meta-learning work?

➢ First idea: Use conventional batch normalization (CBN):
  • Meta-Training: Normalize with the computed batch moments (μ_BN, σ²_BN).

[Figure: meta-training; the incoming context activations and the incoming target activations are both normalized with (μ_BN, σ²_BN).]

SLIDE 35

How should batch normalization for meta-learning work?

➢ First idea: Use conventional batch normalization (CBN):
  • Meta-Training: Normalize with the computed batch moments (μ_BN, σ²_BN).
  • Meta-Testing: Normalize with the running averages of the moments (μ_r, σ²_r) that were computed during meta-training.

[Figure: meta-training normalizes context and target activations with (μ_BN, σ²_BN); meta-testing normalizes both with the running moments (μ_r, σ²_r).]
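
In terms of the ConventionalBatchNorm sketch from the batch-normalization slides, this first idea amounts to the usage below (synthetic data, our naming). The point is that the running moments pool statistics across all meta-training tasks, so a meta-test task with different statistics is normalized poorly.

```python
import numpy as np

rng = np.random.default_rng(0)
bn = ConventionalBatchNorm()  # sketch class defined earlier

# Meta-training: each task's activations can have a very different scale,
# yet all of them are folded into a single pair of running moments.
for task_scale in (0.1, 1.0, 10.0):            # three synthetic "tasks"
    _ = bn.train_step(task_scale * rng.standard_normal(32))

# Meta-testing: a novel task is normalized with moments pooled across
# unrelated training tasks, so its normalized activations need not come
# out zero-mean / unit-variance.
out = bn.inference(5.0 * rng.standard_normal(32))
print(out.mean(), out.std())
```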

SLIDE 36

Classification Accuracy (%) of Model-Agnostic Meta-Learning (MAML)

  • on the Omniglot and miniImageNet datasets

Configuration                 CBN
Omniglot 5-way, 1-shot        20.1±0.0
Omniglot 5-way, 5-shot        20.0±0.0
Omniglot 20-way, 1-shot        5.0±0.0
Omniglot 20-way, 5-shot        5.0±0.0
miniImageNet 5-way, 1-shot    20.1±0.0
miniImageNet 5-way, 5-shot    20.2±0.0

These results are terrible: the classification accuracy is no better than chance.

SLIDE 37

MAML uses Transductive Batch Normalization (TBN)

SLIDE 38

MAML uses Transductive Batch Normalization (TBN)

  • TBN ignores the running moments (μ_r, σ²_r).
  • It uses the computed moments (μ_BN, σ²_BN) to normalize during both meta-training and meta-testing.

[Figure: during both meta-training and meta-testing, context and target activations are normalized with computed moments (μ_BN, σ²_BN).]
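
In code, TBN collapses the two branches of conventional BN into one rule (a single-channel sketch in our naming, not MAML's implementation): whatever set is being normalized supplies its own moments, in both phases.

```python
import numpy as np

def transductive_batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Single-channel sketch of TBN: the current batch's moments are used at
    meta-training and meta-testing alike; running moments are never consulted."""
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta
```

Applied to a target set, every normalized output depends on the moments of the whole target set; this coupling between target examples is exactly what makes the scheme transductive.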

SLIDE 39

MAML uses Transductive Batch Normalization (TBN)

Configuration                 CBN        TBN
Omniglot 5-way, 1-shot        20.1±0.0   98.4±0.7
Omniglot 5-way, 5-shot        20.0±0.0   99.2±0.2
Omniglot 20-way, 1-shot        5.0±0.0   90.9±0.5
Omniglot 20-way, 5-shot        5.0±0.0   96.6±0.2
miniImageNet 5-way, 1-shot    20.1±0.0   45.5±1.8
miniImageNet 5-way, 5-shot    20.2±0.0   59.7±0.9

The TBN accuracies are what we would expect for MAML.

  • TBN ignores the running moments (μ_r, σ²_r).
  • It uses the computed moments (μ_BN, σ²_BN) to normalize during both meta-training and meta-testing.

SLIDE 40

Transductive vs Non-Transductive

SLIDE 41

Transductive vs Non-Transductive

Non-Transductive: $q(y_1^* \mid x_1^*, D_1^*)$

[Figure: a context set D_τ of labeled traffic signs (speed, stop, no trucks, curve) and two target inputs x_1, x_2 with predicted labels y_1, y_2.]

At meta-test time, the prediction of a label y_j* for an input x_j* is conditioned only on x_j* and the context set D_τ*.

SLIDE 42

Transductive vs Non-Transductive

Non-Transductive: $q(y_1^* \mid x_1^*, D_1^*)$

Transductive: $q(y_1^* \mid x_1^*, x_2^*, D_1^*)$

[Figure: the same context set of labeled traffic signs (speed, stop, no trucks, curve) with target inputs x_1, x_2; in the transductive case the prediction y_1 also depends on the other target input x_2.]

Non-transductive: at meta-test time, the prediction of a label y_j* for an input x_j* is conditioned only on x_j* and the context set D_τ*.

Transductive: at meta-test time, the prediction of a label y_j* for an input x_j* is conditioned on all x* in the target set as well as the context set D_τ*.

SLIDE 43

Transductive Batch Normalization Issues

Note: Under normal circumstances, at meta-test time we have no control over the makeup of the target set in terms of the relative proportions of the true labels, as these are unknown. There are two key issues with TBN:

SLIDE 44

Transductive Batch Normalization Issues

Note: Under normal circumstances, at meta-test time we have no control over the makeup of the target set in terms of the relative proportions of the true labels, as these are unknown. There are two key issues with TBN:

1. Transductive learning is sensitive to the target-set distribution seen during meta-training and will fail if required to make good predictions:
  • One example at a time (e.g. online learning).
  • When the target set contains a class balance different from meta-training.
  • While respecting certain privacy constraints.
SLIDE 45

Transductive Batch Normalization Issues

Note: Under normal circumstances, at meta-test time we have no control over the makeup of the target set in terms of the relative proportions of the true labels, as these are unknown. There are two key issues with TBN:

1. Transductive learning is sensitive to the target-set distribution seen during meta-training and will fail if required to make good predictions:
  • One example at a time (e.g. online learning).
  • When the target set contains a class balance different from meta-training.
  • While respecting certain privacy constraints.

2. Transductive learners have more information available to them at prediction time, which may lead to unfair comparisons.

SLIDE 46

Transductive Batch Normalization Issues (cont'd)

➢ TBN accuracy degrades significantly when predictions are made one example at a time (streaming) or one class at a time (class imbalance).

Configuration                 CBN        TBN        TBN (1 example)  TBN (1 class)
Omniglot 5-way, 1-shot        20.1±0.0   98.4±0.7   21.6±1.3         21.6±1.3
Omniglot 5-way, 5-shot        20.0±0.0   99.2±0.2   22.0±0.5         23.2±0.5
Omniglot 20-way, 1-shot        5.0±0.0   90.9±0.5    3.7±0.2          3.7±0.2
Omniglot 20-way, 5-shot        5.0±0.0   96.6±0.2    5.5±0.2         14.5±0.3
miniImageNet 5-way, 1-shot    20.1±0.0   45.5±1.8   26.9±1.5         26.9±1.5
miniImageNet 5-way, 5-shot    20.2±0.0   59.7±0.9   30.3±0.7         27.2±0.6

(1 example) = tested one example at a time; (1 class) = tested one class at a time.

SLIDE 47

Need to Rethink Normalization for Meta-Learning

  • For MAML, CBN doesn't work, and TBN has potentially unwanted side effects.
SLIDE 48

Need to Rethink Normalization for Meta-Learning

  • For MAML, CBN doesn't work, and TBN has potentially unwanted side effects.
  • There are other non-transductive normalization schemes, including Instance Normalization[1] (IN), Layer Normalization[2] (LN), and Group Normalization[3], but they don't work well in the few-shot classification setting.

Configuration                 CBN        TBN        LN         IN
Omniglot 5-way, 1-shot        20.1±0.0   98.4±0.7   83.0±1.3   87.4±1.2
Omniglot 5-way, 5-shot        20.0±0.0   99.2±0.2   91.0±0.8   93.9±0.5
Omniglot 20-way, 1-shot        5.0±0.0   90.9±0.5   78.1±0.7   80.4±0.7
Omniglot 20-way, 5-shot        5.0±0.0   96.6±0.2   92.3±0.2   92.9±0.2
miniImageNet 5-way, 1-shot    20.1±0.0   45.5±1.8   41.2±1.6   40.7±1.7
miniImageNet 5-way, 5-shot    20.2±0.0   59.7±0.9   52.8±0.9   54.3±0.9

[1] Ulyanov et al. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
[2] Ba et al. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
[3] Wu et al. "Group normalization." Proceedings of the European Conference on Computer Vision (ECCV), 2018.

SLIDE 49

Desiderata for Meta-Learning Normalization

SLIDE 50

Desiderata for Meta-Learning Normalization

1. Improves speed and stability of training without harming test performance (accuracy or log-likelihood).

SLIDE 51

Desiderata for Meta-Learning Normalization

1. Improves speed and stability of training without harming test performance (accuracy or log-likelihood).
2. Works well across a range of context set sizes.

SLIDE 52

Desiderata for Meta-Learning Normalization

1. Improves speed and stability of training without harming test performance (accuracy or log-likelihood).
2. Works well across a range of context set sizes.
3. Is non-transductive, thus supporting inference at meta-test time in a variety of circumstances.

SLIDE 53

A Few Principles

SLIDE 54

A Few Principles

➢ Data is i.i.d. only within a task τ, but not across tasks.
  • Hence, the normalization statistics μ, σ should be local at the task level.
SLIDE 55

A Few Principles

➢ Data is i.i.d. only within a task τ, but not across tasks.
  • Hence, the normalization statistics μ, σ should be local at the task level.
➢ To avoid being transductive, normalization of the target set T_τ should only have access to:
  1. the context set D_τ, and
  2. the single example being predicted, x_j*.

SLIDE 56

MetaBN

➢ Simple idea inspired by the previous principles:
  • Use the batch statistics from the context set to normalize both the context set and the target set.

[Figure: the context moments (μ_BN, σ²_BN) are computed from the incoming context activations and used to normalize them.]

SLIDE 57

MetaBN

➢ Simple idea inspired by the previous principles:
  • Use the batch statistics from the context set to normalize both the context set and the target set.

[Figure: the context moments (μ_BN, σ²_BN) are computed from the incoming context activations and used to normalize both the context activations and the target activations.]
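
A single-channel sketch of the idea in our naming (the paper applies it per channel of a convolutional network):

```python
import numpy as np

def meta_batch_norm(context, target, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of MetaBN: moments come from the context set only, so each
    target prediction depends on just the context set and that target point."""
    mu_bn, var_bn = context.mean(), context.var()  # context ("batch") moments

    def normalize(a):
        return gamma * (a - mu_bn) / np.sqrt(var_bn + eps) + beta

    return normalize(context), normalize(target)
```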

SLIDE 58

MetaBN

➢ MetaBN works well, but:
  • Classification accuracy suffers when the context set is small (a poor estimate of the true statistics).
  • It doesn't leverage information from the target example under test.
SLIDE 59

TASKNORM

SLIDE 60

TASKNORM

[Figure: the starting point is the MetaBN diagram; the context moments (μ_BN, σ²_BN) normalize both the context activations and the target activations.]

SLIDE 61

TASKNORM

[Figure: additional moments (μ_+, σ²_+) are computed from the context and from the target; they are blended with the context moments (μ_BN, σ²_BN) using weights α and 1 - α to form (μ_TN, σ²_TN), which normalize both the context and target activations.]

SLIDE 62

TASKNORM

$\mu_{TN} = \alpha\,\mu_{BN} + (1 - \alpha)\,\mu_+$

$\sigma_{TN}^2 = \alpha \left( \sigma_{BN}^2 + (\mu_{BN} - \mu_{TN})^2 \right) + (1 - \alpha) \left( \sigma_+^2 + (\mu_+ - \mu_{TN})^2 \right)$

[Figure: the context moments (μ_BN, σ²_BN) and the additional moments (μ_+, σ²_+) are blended with weights α and 1 - α into (μ_TN, σ²_TN), which normalize both the context and target activations.]
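
The variance blend is not ad hoc: it is the moment-matched variance of a two-component mixture. A quick check (our derivation, not from the slides), using $\mu_{TN} = \alpha\mu_{BN} + (1-\alpha)\mu_+$:

$$
\alpha\left(\sigma_{BN}^2 + (\mu_{BN} - \mu_{TN})^2\right) + (1-\alpha)\left(\sigma_+^2 + (\mu_+ - \mu_{TN})^2\right)
= \underbrace{\alpha\left(\sigma_{BN}^2 + \mu_{BN}^2\right) + (1-\alpha)\left(\sigma_+^2 + \mu_+^2\right)}_{\mathbb{E}[x^2]\ \text{of the mixture}} - \mu_{TN}^2,
$$

so $\sigma_{TN}^2 = \mathbb{E}[x^2] - \mu_{TN}^2 \ge 0$ is always a valid variance.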

SLIDE 63

TASKNORM

$\mu_{TN} = \alpha\,\mu_{BN} + (1 - \alpha)\,\mu_+$

$\sigma_{TN}^2 = \alpha \left( \sigma_{BN}^2 + (\mu_{BN} - \mu_{TN})^2 \right) + (1 - \alpha) \left( \sigma_+^2 + (\mu_+ - \mu_{TN})^2 \right)$

$\alpha = \mathrm{SIGMOID}\left(\mathrm{SCALE} \cdot |D_\tau| + \mathrm{OFFSET}\right), \quad 0 \le \alpha \le 1$

SCALE and OFFSET are learned during training.

[Figure: the same blending diagram as the previous slides.]
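
Putting the pieces together, here is a minimal sketch of the TASKNORM-I computation for activations of shape (examples, features), with the feature axis standing in for the spatial locations that instance normalization would pool over in a convolutional network. All names are ours; in the paper, SCALE and OFFSET are trained by backprop alongside γ and β.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def task_norm(context, target, scale, offset, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of TASKNORM-I: blend the context ("BN") moments with per-example
    instance ("+") moments, weighted by a learned function of context-set size."""
    alpha = sigmoid(scale * len(context) + offset)  # blend weight, 0 <= alpha <= 1
    mu_bn, var_bn = context.mean(), context.var()   # context ("BN") moments

    def normalize(x):                               # x: one example's activations
        mu_p, var_p = x.mean(), x.var()             # secondary ("+", IN-style) moments
        mu_tn = alpha * mu_bn + (1 - alpha) * mu_p
        var_tn = (alpha * (var_bn + (mu_bn - mu_tn) ** 2)
                  + (1 - alpha) * (var_p + (mu_p - mu_tn) ** 2))
        return gamma * (x - mu_tn) / np.sqrt(var_tn + eps) + beta

    out_context = np.stack([normalize(x) for x in context])
    out_target = np.stack([normalize(x) for x in target])
    return out_context, out_target

# Example: a 5-example context set and a 3-example target set with 16 features.
rng = np.random.default_rng(0)
ctx_out, tgt_out = task_norm(rng.standard_normal((5, 16)),
                             rng.standard_normal((3, 16)),
                             scale=0.1, offset=-1.0)
```

With a large context set, α approaches 1 and this reduces to MetaBN; with a tiny context set it leans on the per-example instance moments, matching the learned behaviour shown on the following slides.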

SLIDE 64

Learned Alpha (α) vs Context Set Size (|D_τ|)

Each curve is the learned value of α in the first TASKNORM of each of the four ResNet-18 layers.

[Figure: learned α as a function of context set size for the four ResNet-18 layers.]

SLIDE 65

Learned Alpha (α) vs Context Set Size (|D_τ|)

Each curve is the learned value of α in the first TASKNORM of each of the four ResNet-18 layers. When the context set size is small (< 30), TASKNORM learns to use a blend of the BN and IN moments.

[Figure: for small context sets, the curves show TASKNORM using a mix of BN and IN moments.]

SLIDE 66

Learned Alpha (α) vs Context Set Size (|D_τ|)

Each curve is the learned value of α in the first TASKNORM of each of the four ResNet-18 layers. When the context set size is small (< 30), TASKNORM learns to use a blend of the BN and IN moments. When the context set size is large (> 30), TASKNORM learns to use only the BN moments.

[Figure: the curves rise toward α = 1 (BN moments only) as the context set grows.]

SLIDE 67

SCALE × (Context Set Size) + OFFSET vs Context Set Size

The slopes are non-zero, indicating that the optimal value of α is a function of the context set size.

SLIDE 68

TASKNORM Fixes the Transductive Issue in MAML

Configuration                 CBN        TBN        TBN (1 example)  TBN (1 class)  TaskNorm
Omniglot 5-way, 1-shot        20.1±0.0   98.4±0.7   21.6±1.3         21.6±1.3       94.4±0.8
Omniglot 5-way, 5-shot        20.0±0.0   99.2±0.2   22.0±0.5         23.2±0.5       98.6±0.2
Omniglot 20-way, 1-shot        5.0±0.0   90.9±0.5    3.7±0.2          3.7±0.2       90.0±0.5
Omniglot 20-way, 5-shot        5.0±0.0   96.6±0.2    5.5±0.2         14.5±0.3       96.3±0.2
miniImageNet 5-way, 1-shot    20.1±0.0   45.5±1.8   26.9±1.5         26.9±1.5       42.4±1.7
miniImageNet 5-way, 5-shot    20.2±0.0   59.7±0.9   30.3±0.7         27.2±0.6       58.7±0.9

TASKNORM accuracy approaches that of TBN.

SLIDE 69

TaskNorm Fixes the Transductive Issue in MAML

Configuration                 CBN        TBN        TBN (1 example)  TBN (1 class)  TaskNorm   TaskNorm (1 example)  TaskNorm (1 class)
Omniglot 5-way, 1-shot        20.1±0.0   98.4±0.7   21.6±1.3         21.6±1.3       94.4±0.8   94.4±0.8              94.4±0.8
Omniglot 5-way, 5-shot        20.0±0.0   99.2±0.2   22.0±0.5         23.2±0.5       98.6±0.2   98.6±0.2              98.6±0.2
Omniglot 20-way, 1-shot        5.0±0.0   90.9±0.5    3.7±0.2          3.7±0.2       90.0±0.5   90.0±0.5              90.0±0.5
Omniglot 20-way, 5-shot        5.0±0.0   96.6±0.2    5.5±0.2         14.5±0.3       96.3±0.2   96.3±0.2              96.3±0.2
miniImageNet 5-way, 1-shot    20.1±0.0   45.5±1.8   26.9±1.5         26.9±1.5       42.4±1.7   42.4±1.7              42.4±1.7
miniImageNet 5-way, 5-shot    20.2±0.0   59.7±0.9   30.3±0.7         27.2±0.6       58.7±0.9   58.7±0.9              58.7±0.9

TASKNORM accuracy approaches that of TBN, and it does not change when tested one example or one class at a time.

SLIDE 70

Meta-Dataset[1]: a Multi-Task, Few-Shot Benchmark

[1] Triantafillou, Eleni, et al. "Meta-Dataset: A dataset of datasets for learning to learn from few examples." arXiv preprint arXiv:1903.03096 (2019).

Datasets: ImageNet, Omniglot, Aircraft, Birds, DTD, Quick Draw, Fungi, VGG Flower, Traffic Signs, MSCOCO

SLIDE 71

Meta-Dataset[1]: a Multi-Task, Few-Shot Benchmark

[1] Triantafillou, Eleni, et al. "Meta-Dataset: A dataset of datasets for learning to learn from few examples." arXiv preprint arXiv:1903.03096 (2019).

Datasets: ImageNet, Omniglot, Aircraft, Birds, DTD, Quick Draw, Fungi, VGG Flower, Traffic Signs, MSCOCO (Traffic Signs and MSCOCO are entirely held out)

SLIDE 72

Meta-Dataset Classification Accuracy Using ProtoNets[1]

[Table: per-dataset accuracy, split into held-out classes and held-out datasets. TASKNORM with Instance Normalization is best on 10 of 13 datasets, and TaskNorm achieves the highest overall rank of all methods, including Transductive Batch Norm (TBN).]

Legend: TBN = Transductive Batch Norm; CBN = Conventional Batch Norm; BRN = Batch Renormalization; LN = Layer Normalization; IN = Instance Normalization; RN = Reptile Norm; MetaBN = Meta Batch Norm; TaskNorm-L = TaskNorm with LN; TaskNorm-I = TaskNorm with IN; TaskNorm-r = TaskNorm with running moments

[1] Snell, Jake, Kevin Swersky, and Richard Zemel. "Prototypical networks for few-shot learning." Advances in Neural Information Processing Systems. 2017.

SLIDE 73

Meta-Dataset Classification Accuracy Using CNAPs[1]

[Table: per-dataset accuracy, split into held-out classes and held-out datasets. TASKNORM with Instance Normalization is best on 11 of 13 datasets, and TaskNorm achieves the highest overall rank of all methods, including Transductive Batch Norm (TBN).]

Legend: TBN = Transductive Batch Norm; CBN = Conventional Batch Norm; BRN = Batch Renormalization; LN = Layer Normalization; IN = Instance Normalization; RN = Reptile Norm; MetaBN = Meta Batch Norm; TaskNorm-L = TaskNorm with LN; TaskNorm-I = TaskNorm with IN; TaskNorm-r = TaskNorm with running moments; Baseline = No Normalization

[1] Requeima, James, et al. "Fast and flexible multi-task classification using conditional neural adaptive processes." Advances in Neural Information Processing Systems. 2019.

SLIDE 74

Meta-Dataset Training Curves

[Figure: training curves on Meta-Dataset.]

TaskNorm-I converges the fastest.

SLIDE 75

Thanks for watching!

  • Paper: https://arxiv.org/pdf/2003.03284.pdf
  • Code: https://github.com/cambridge-mlg/cnaps