Overcoming Multi-Model Forgetting (Y. Benyahia, K. Yu, K. Bennani-Smires, M. Jaggi, A. Davison, M. Salzmann, C. Musat)

SLIDE 1

Overcoming Multi-Model Forgetting

  • Y. Benyahia, K. Yu, K. Bennani-Smires, M. Jaggi, A. Davison, M. Salzmann, C. Musat

SLIDE 2

The Weight Sharing

In one of the first NAS papers using Reinforcement Learning, Zoph et al. (Google) used more than 800 GPUs in parallel for two weeks.

Weight sharing was introduced in NAS to speed up the process: Efficient Neural Architecture Search (ENAS, Pham et al.).
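The weight-sharing idea can be sketched as several candidate "child" architectures drawing their layer weights from one shared bank instead of owning private copies. This is a minimal illustration, not the ENAS implementation; the names (`shared`, `child_forward`, the op labels) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared parameter bank: every candidate architecture reads its layer
# weights from here, so no candidate owns a private copy.
shared = {
    "op_a": rng.standard_normal((8, 8)),
    "op_b": rng.standard_normal((8, 8)),
}

def child_forward(arch, x):
    """A 'child model' is just a sequence of op names; each op reuses
    the shared weights, so training one child updates them for all."""
    for op in arch:
        x = np.tanh(shared[op] @ x)
    return x

x = rng.standard_normal(8)
y_a = child_forward(["op_a", "op_b"], x)   # candidate A
y_b = child_forward(["op_b", "op_b"], x)   # candidate B

# Both candidates use shared["op_b"]: a gradient step taken while
# training A would also change B's effective weights, which is the
# root cause of multi-model forgetting discussed next.
print(y_a.shape, y_b.shape)
```

Evaluating a new candidate therefore costs only a forward pass through the shared weights, which is why ENAS is orders of magnitude cheaper than training each architecture from scratch.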

SLIDE 3

Assumptions

Our hypothesis:

  • 1. Weight sharing can negatively affect the trained architectures.
  • 2. If confirmed, this can lead to a wrong evaluation of candidates in NAS, making the evaluation phase closer to random.

SLIDE 4

Multi-Model Forgetting

When several models share weights, sequentially training one model degrades the performance of the previously trained ones: we call this multi-model forgetting.

SLIDE 5

Study of Weight-Sharing

Simple scenario of two models sharing parameters: model 1 with parameters (θ1, θs) and model 2 with parameters (θ2, θs), where θs is shared.

Assume that we have access to the optimal parameters of the first model.

Maximizing the posterior distribution of the second model's parameters, given the data and the first model's optimum, yields a loss with three parts:

  • Cross-entropy loss
  • L2 regularization
  • Weight importance (how much each shared weight matters to the first model)
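The shape of the resulting objective, cross-entropy plus an importance-weighted quadratic penalty anchoring the shared weights to the first model's optimum, can be sketched as below. This is a simplified scalar illustration under assumed names (`wpl_style_loss`, `omega`), with the importance vector taken as diagonal Fisher-style values; it is not the paper's exact derivation.

```python
import numpy as np

def wpl_style_loss(ce_loss, theta_s, theta_s_opt, omega, alpha=1.0):
    """Cross-entropy for model 2 plus a quadratic penalty that anchors
    the shared weights theta_s to their model-1 optimum theta_s_opt,
    weighted per-parameter by an importance vector omega."""
    penalty = 0.5 * alpha * np.sum(omega * (theta_s - theta_s_opt) ** 2)
    return ce_loss + penalty

theta_s = np.array([1.0, -0.5, 2.0])      # current shared weights
theta_opt = np.array([0.8, -0.5, 1.0])    # shared weights at model-1 optimum
omega = np.array([10.0, 0.1, 5.0])        # importance of each shared weight

loss = wpl_style_loss(ce_loss=0.7, theta_s=theta_s, theta_s_opt=theta_opt, omega=omega)
# Important weights (large omega) dominate the penalty; unimportant
# ones stay nearly free to move, so model 2 can still learn.
print(round(loss, 4))  # prints 3.4
```

The design choice is the same as in elastic-weight-consolidation-style regularizers: protect only the shared weights that matter for the already-trained model, instead of freezing all of them.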

SLIDE 6

Experiments on Two Models

  • WPL (Weight Plasticity Loss) reduces multi-model forgetting
  • WPL has a minimal effect on the learning of the second model

SLIDE 7

ENAS on PTB (Penn Treebank)

SLIDE 8

Summing up

To recap, our main contributions are:

  • 1. Weight sharing negatively impacts NAS
  • 2. Weight sharing can cause the search phase in NAS to become closer to random
  • 3. WPL reduces multi-model forgetting

Pacific Ballroom #19 (6:30pm - 9pm)