Parameter-Efficient Transfer Learning for NLP (PowerPoint presentation)


SLIDE 1

Parameter-Efficient Transfer Learning for NLP

N. Houlsby, A. Giurgiu*, S. Jastrzębski*, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly
SLIDE 2

Imagine doing Transfer Learning for NLP

Ingredients:

  • A large pretrained model (BERT)
  • Fine-tuning

2/5

SLIDE 3

Imagine doing Transfer Learning for NLP

[Diagram: a separate fine-tuned copy of BERT for each task (BERT + Task 1, BERT + Task 2, ..., BERT + Task N). Problem for large N.]

2/5

SLIDE 4

Imagine doing Transfer Learning for NLP

[Diagram: a single shared BERT with a small adapter added per task (Task 1 + Adapter 1, Task 2 + Adapter 2, ..., Task N + Adapter N).]

2/5
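The storage saving behind the per-task adapter idea can be made concrete with back-of-the-envelope arithmetic. The sizes below are illustrative assumptions (roughly BERT-base scale), not the paper's exact numbers:

```python
# Illustrative parameter accounting for N tasks.
# Hypothetical sizes, not the paper's exact figures.
base = 110_000_000            # roughly BERT-base-sized pretrained model
adapter_per_task = 3_000_000  # small adapter modules added for one task

def stored_params(n_tasks, full_finetune):
    """Total parameters that must be stored to serve n_tasks tasks."""
    if full_finetune:
        # Naive fine-tuning: one full model copy per task.
        return n_tasks * base
    # Adapters: one shared frozen model plus tiny per-task additions.
    return base + n_tasks * adapter_per_task

n = 20
print(stored_params(n, full_finetune=True))   # 2_200_000_000
print(stored_params(n, full_finetune=False))  # 170_000_000
```

With 20 tasks the full-fine-tuning cost grows linearly in full model copies, while the adapter cost is dominated by the single shared model, which is the "Problem for large N" the previous slide points at.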

SLIDE 5

Solution

BERT + Adapters

  • Solution: Train tiny adapter modules at each layer

3/5
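"Train tiny adapter modules at each layer" implies that the pretrained weights stay frozen and only the small inserted modules (and typically the layer norms and the task head) receive gradient updates. A minimal sketch of that split, with a hypothetical 12-layer model:

```python
# Sketch of which parameter groups are updated when adapters are used.
# The layer structure here is hypothetical; in practice one would set
# requires_grad=False on the frozen groups in a deep-learning framework.
params = []
for layer in range(12):                       # e.g. 12 transformer layers
    params.append((f"layer{layer}.attention", "frozen"))     # pretrained
    params.append((f"layer{layer}.ffn", "frozen"))           # pretrained
    params.append((f"layer{layer}.adapter", "trainable"))    # tiny, new
params.append(("task_head", "trainable"))     # small per-task classifier

trainable = [name for name, state in params if state == "trainable"]
print(len(trainable))  # 13: one adapter per layer plus the task head
```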


SLIDE 8

BERT + Adapters

[Diagram label: "Bottleneck" (each adapter is a bottleneck module).]

3/5
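The bottleneck structure on the slide can be sketched in a few lines: the adapter projects the d-dimensional hidden state down to a small bottleneck, applies a nonlinearity, projects back up, and adds a skip connection. All sizes and the near-zero initialization below are illustrative assumptions; with near-zero weights the module starts out close to the identity, so inserting it does not disturb the pretrained network.

```python
# Sketch of a bottleneck adapter (hypothetical sizes, plain Python).
import math
import random

random.seed(0)

def linear(x, w, b):
    """y = x @ w + b for a single vector x."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def adapter(x, w_down, b_down, w_up, b_up):
    """Bottleneck adapter: x + up(gelu(down(x)))."""
    h = linear(x, w_down, b_down)                 # d -> m (bottleneck)
    h = [hi * 0.5 * (1 + math.erf(hi / math.sqrt(2))) for hi in h]  # GELU
    u = linear(h, w_up, b_up)                     # m -> d
    return [xi + ui for xi, ui in zip(x, u)]      # skip connection

d, m = 8, 2  # hidden size d, bottleneck size m; m << d => few parameters
w_down = [[random.gauss(0, 0.01) for _ in range(m)] for _ in range(d)]
b_down = [0.0] * m
w_up = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(m)]
b_up = [0.0] * d

x = [random.gauss(0, 1) for _ in range(d)]
y = adapter(x, w_down, b_down, w_up, b_up)
# Near-zero initialization keeps the adapter close to the identity:
print(max(abs(a - b) for a, b in zip(x, y)) < 1e-2)  # True
```

The parameter count per adapter is roughly 2·d·m, which is why a small m keeps the per-task additions tiny relative to the d·d matrices of the host model.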

SLIDE 9

Results on GLUE Benchmark

4/5


SLIDE 13

Results on GLUE Benchmark

[Plot annotations: "Fewer parameters, degraded performance" vs. "Fewer parameters, similar performance" (adapters).]

4/5

SLIDE 14

Results on GLUE Benchmark

  • 0.4% accuracy drop for a 96.4% reduction in the # of parameters/task

4/5

SLIDE 15

Conclusions

  • 1. If we move towards a single-model future, we need to improve the parameter-efficiency of transfer learning
  • 2. We propose a module that drastically reduces the # of params/task for NLP, e.g. by 30x at only a 0.4% accuracy drop

Related work (@ ICML): “BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning”, A. Stickland & I. Murray

Please come to our poster today at 6:30 PM (#102)

5/5