Parameter-Efficient Transfer Learning for NLP



  1. Parameter-Efficient Transfer Learning for NLP. N. Houlsby, A. Giurgiu*, S. Jastrzębski*, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly

  2. Imagine doing Transfer Learning for NLP. Ingredients: ● a large pretrained model (BERT) ● fine-tuning

  3. Imagine doing Transfer Learning for NLP. Ingredients: ● a large pretrained model (BERT) ● fine-tuning. [Diagram: a separate fine-tuned copy of BERT for each task (BERT → Task 1, BERT → Task 2, ..., BERT → Task N); a problem for large N.]

  4. Imagine doing Transfer Learning for NLP. Ingredients: ● a large pretrained model (BERT) ● fine-tuning. [Diagram: one shared BERT with a small per-task adapter (BERT + Adapter 1 → Task 1, BERT + Adapter 2 → Task 2, ..., BERT + Adapter N → Task N).]
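
The setup this diagram describes, one shared frozen BERT plus tiny trainable per-task weights, can be sketched in plain PyTorch. This is a minimal sketch, not the authors' code; identifying the per-task parameters by the substrings "adapter" and "classifier" in their names is an illustrative assumption.

```python
import torch.nn as nn

def select_trainable(model: nn.Module):
    """Freeze the shared pretrained weights; keep only the per-task
    parameters (adapter modules and the task head) trainable, so each
    new task adds a tiny set of weights on top of one shared BERT."""
    for name, param in model.named_parameters():
        # Assumption: per-task parameters are identifiable by name.
        param.requires_grad = ("adapter" in name) or ("classifier" in name)
    return [p for p in model.parameters() if p.requires_grad]
```

An optimizer built over `select_trainable(model)` then updates only the per-task weights, which is exactly why N tasks no longer require N full copies of BERT.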

  5–8. BERT + Adapters ● Solution: train tiny adapter modules, with a bottleneck architecture, at each layer.
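
For concreteness, a minimal PyTorch sketch of the bottleneck adapter these slides describe. It is a sketch, not the authors' implementation; the bottleneck size and the GELU nonlinearity are illustrative choices (the paper inserts two such modules into each transformer layer and initializes them near the identity).

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: project the hidden state down to a small
    bottleneck, apply a nonlinearity, project back up, and add a
    residual connection."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()
        # Near-zero init keeps the module close to the identity
        # at the start of training.
        nn.init.normal_(self.down.weight, std=1e-3)
        nn.init.zeros_(self.down.bias)
        nn.init.normal_(self.up.weight, std=1e-3)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))
```

The parameter count per adapter is roughly 2 · hidden_dim · bottleneck_dim; with hidden_dim = 1024 and bottleneck_dim = 64, that is about 131k weights, which is where the small per-task footprint comes from.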

  9–13. Results on GLUE Benchmark. [Plot: accuracy vs. number of parameters trained per task. Adapters: fewer parameters, similar performance. Fine-tuning baseline restricted to fewer parameters: fewer parameters, degraded performance.]

  14. Results on GLUE Benchmark: a 0.4% accuracy drop for a 96.4% reduction in the number of parameters per task.
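
As a sanity check on these numbers (assuming BERT-large at roughly 340M parameters; the exact count is an assumption here, not stated on the slide):

```python
BERT_LARGE_PARAMS = 340_000_000  # approximate; assumption for illustration
REDUCTION = 0.964                # from the slide

per_task = BERT_LARGE_PARAMS * (1 - REDUCTION)
print(f"~{per_task / 1e6:.1f}M trained parameters per task, "
      f"vs ~{BERT_LARGE_PARAMS / 1e6:.0f}M for a full fine-tuned copy")
# ~12.2M vs ~340M per task, consistent with the roughly 30x
# reduction quoted in the conclusions
```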

  15. Conclusions: 1. If we move towards a single-model future, we need to improve the parameter efficiency of transfer learning. 2. We propose a module that drastically reduces the number of parameters per task for NLP, e.g. by 30x at only a 0.4% accuracy drop. Related work (@ ICML): "BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning", A. Stickland & I. Murray. Please come to our poster today at 6:30 PM (#102).
