

SLIDE 1

A quantile-based approach for hyperparameter transfer learning

David Salinas², Huibin Shen¹, Valerio Perrone¹

¹Amazon Research   ²NAVER LABS Europe, work done at Amazon

December 11, 2019

David Salinas, Huibin Shen, Valerio Perrone (Amazon Berlin) A quantile-based approach for hyperparameter transfer learning December 11, 2019 1 / 8

SLIDE 2

Transfer learning setting

Assume many HP evaluations {x_i^l, y_i^l}_{i=0}^{n_l} are available from previous datasets l:

- x_i^l ∈ R^d is a hyperparameter configuration
- y_i^l ∈ R is the objective to be minimized

Can we use them to speed up the tuning of a new dataset?


SLIDE 3

Transfer learning

Difficulties:

- Scales of the objectives y_i^l may vary significantly across tasks
- Noise may not be Gaussian
- Many observations: hard to apply (approximate) GPs

[Figure: objective value (log scale, spanning several orders of magnitude) vs. log number of gradient updates for the datasets electricity, exchange-rate, m4-Daily, m4-Hourly, m4-Monthly, m4-Quarterly, m4-Weekly, m4-Yearly, solar, traffic, and wiki-rolling.]


SLIDE 4

Gaussian Copula transform

If only every y^l were Gaussian... Apply the change of variable ψ = Φ⁻¹ ∘ F, where Φ is the Gaussian CDF and F is the marginal CDF of y^l (approximated with the empirical CDF). Setting z^l = ψ(y^l), every z^l becomes standard Gaussian: z^l ∼ N(0, 1).
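As a concrete illustration, the transform ψ = Φ⁻¹ ∘ F can be sketched in a few lines of NumPy/SciPy. The rank-based empirical-CDF estimator below, with the common n + 1 scaling that keeps Φ⁻¹ finite, is an assumption for illustration, not necessarily the exact estimator used in the paper:

```python
import numpy as np
from scipy.stats import norm

def copula_transform(y):
    """psi = Phi^{-1} o F: map objective values to standard-normal scores.

    F is approximated by the empirical CDF via ranks; dividing by
    (n + 1) keeps F strictly inside (0, 1), so Phi^{-1} stays finite.
    """
    y = np.asarray(y, dtype=float)
    ranks = np.argsort(np.argsort(y)) + 1    # ranks 1 .. n
    u = ranks / (len(y) + 1)                 # empirical CDF values in (0, 1)
    return norm.ppf(u)                       # inverse Gaussian CDF

# Objectives on wildly different scales land on the same N(0, 1) scale:
z = copula_transform([0.01, 0.5, 3.0, 120.0])
```

The transform is monotone, so it preserves the ranking of configurations while discarding the task-specific scale.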


SLIDE 5

Transfer learning

Parametric prior:

- Regress z(x) ≈ N(µθ(x), σθ(x))
- Parameters θ are learned with MLE on the evaluations
- Joint learning, as θ is tied across tasks (only possible because the z have comparable scales across tasks l)

Two HPO strategies:

- Thompson sampling with N(µθ(x), σθ(x))
- Gaussian Copula Process with the prior N(µθ(x), σθ(x))
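A minimal sketch of the Thompson-sampling strategy, with hypothetical `mu_theta`/`sigma_theta` standing in for the learned parametric prior (in practice these would be the MLE-fitted predictors tied across tasks, not the toy functions below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned prior N(mu_theta(x), sigma_theta(x));
# the real mu_theta / sigma_theta would be fit by MLE on the transfer tasks.
def mu_theta(x):
    return (x - 0.3) ** 2            # toy predicted mean of z(x)

def sigma_theta(x):
    return 0.1 + 0.05 * np.abs(x)    # toy predictive std, strictly positive

def thompson_next(candidates):
    """Sample z ~ N(mu_theta(x), sigma_theta(x)) for each candidate x
    and return the candidate with the smallest sampled score."""
    samples = rng.normal(mu_theta(candidates), sigma_theta(candidates))
    return candidates[np.argmin(samples)]

x_next = thompson_next(np.linspace(0.0, 1.0, 100))
```

Sampling from the prior (rather than minimizing its mean) keeps exploration alive: candidates with high predictive uncertainty still get picked occasionally.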


SLIDE 6

Results

Evaluate on 3 blackboxes with precomputed evaluations (MLP [Klein 18], DeepAR [Salinas 17], XGBoost)

blackbox | # datasets | # hyperparameters | # evaluations | objectives
---------|------------|-------------------|---------------|--------------------
DeepAR   | 11         | 6                 | ∼ 220         | quantile loss, time
FCNET    | 4          | 9                 | 62208         | MSE, time
XGBoost  | 9          | 9                 | 5000          | 1-AUC

SLIDE 7

Results

[Figure: normalized distance to the minimum (log scale) vs. iteration (20–100) on fcnet, DeepAR, and xgboost, comparing RS, GP, ABLR, WS-best, auto-range-gp, CTS, and GCP.]


SLIDE 8

Results

Because all objectives are centered Gaussian, we can easily combine them! Multi-objective: optimize the accuracy/time trade-off with z_error(x) + z_runtime(x). More at our poster!
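Since every transformed objective lives on the same N(0, 1) scale, summing them is well defined. A small sketch, where the error/runtime values at four configurations are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

def to_gaussian(y):
    # Empirical-CDF ranks followed by the inverse Gaussian CDF (psi = Phi^{-1} o F).
    ranks = np.argsort(np.argsort(y)) + 1
    return norm.ppf(ranks / (len(y) + 1))

# Hypothetical evaluations of both objectives at four configurations:
error   = np.array([0.12, 0.08, 0.30, 0.05])      # validation error
runtime = np.array([450.0, 900.0, 1600.0, 60.0])  # training time (s)

# Both scores are now comparable, so a plain sum encodes the trade-off:
score = to_gaussian(error) + to_gaussian(runtime)
best = int(np.argmin(score))  # configuration 3: lowest error AND fastest
```

Without the transform, summing a raw error (≈ 0.1) and a raw runtime (≈ hundreds of seconds) would let the runtime dominate entirely; the copula scores put both objectives on an equal footing.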
