Deep Learning Hyperparameter Optimization with Competing Objectives - - PowerPoint PPT Presentation



SLIDE 1

Deep Learning Hyperparameter Optimization with Competing Objectives

GTC 2018 - S8136 Scott Clark scott@sigopt.com

SLIDE 2

OUTLINE

  • 1. Why is Tuning Models Hard?
  • 2. Common Tuning Methods
  • 3. Deep Learning Example
  • 4. Tuning Multiple Metrics
  • 5. Multi-metric Optimization Examples
SLIDE 3

Deep Learning / AI is extremely powerful. Tuning these systems is extremely non-intuitive.

SLIDE 4

Photo: Joe Ross

SLIDE 5

TUNABLE PARAMETERS IN DEEP LEARNING


SLIDE 10

Photo: Tammy Strobel

SLIDE 11

STANDARD METHODS FOR HYPERPARAMETER SEARCH

SLIDE 12

STANDARD TUNING METHODS

Parameter Configuration

  • Weights
  • Thresholds
  • Window sizes
  • Transformations

Search strategies: Grid Search, Random Search, Manual Search

ML / AI Model (Training Data, Cross Validation, Testing Data)
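Grid and random search over the same evaluation budget can be sketched as follows; the objective function here is a hypothetical stand-in for a real training-and-validation run, and the parameter names and ranges are illustrative, not from the talk:

```python
import itertools
import random

# Hypothetical objective: validation accuracy as a function of learning
# rate and dropout, standing in for a real training-and-evaluation run.
def validation_accuracy(lr, dropout):
    return 1.0 - abs(lr - 0.01) * 10 - abs(dropout - 0.3)

# Grid Search: exhaustively evaluate a fixed lattice of configurations.
lrs = [0.001, 0.01, 0.1]
dropouts = [0.1, 0.3, 0.5]
grid_best = max(itertools.product(lrs, dropouts),
                key=lambda cfg: validation_accuracy(*cfg))

# Random Search: spend the same budget on uniformly sampled configurations.
random.seed(0)
samples = [(10 ** random.uniform(-3, -1), random.uniform(0.1, 0.5))
           for _ in range(9)]
random_best = max(samples, key=lambda cfg: validation_accuracy(*cfg))

print("grid best:", grid_best)
print("random best:", random_best)
```

Both strategies evaluate nine configurations; the difference is only in how the candidates are chosen.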

SLIDE 13

OPTIMIZATION FEEDBACK LOOP

The ML / AI model (Training Data, Cross Validation, Testing Data) reports an Objective Metric to the REST API, which returns new configurations, yielding Better Results.
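The feedback loop can be sketched as a toy suggest/evaluate/observe cycle; the `suggest` and `train_and_evaluate` functions below are illustrative placeholders (a random sampler and a synthetic metric), not the actual SigOpt REST API:

```python
import random

# Minimal sketch of the feedback loop, with a toy random "optimizer"
# standing in for a real service such as SigOpt's REST API.
def suggest():
    # New configuration to try (ranges are illustrative).
    return {"lr": 10 ** random.uniform(-4, -1),
            "batch_size": random.choice([16, 32, 64])}

def train_and_evaluate(config):
    # Placeholder for training the model and measuring the objective
    # metric via cross validation; peaks at lr = 0.01.
    return -abs(config["lr"] - 0.01)

random.seed(1)
best_config, best_metric = None, float("-inf")
for _ in range(20):                      # fixed evaluation budget
    config = suggest()                   # get a new configuration
    metric = train_and_evaluate(config)  # observe the objective metric
    if metric > best_metric:             # track the best result so far
        best_config, best_metric = config, metric
print(best_config, best_metric)
```

A real optimizer would use the observed metrics to steer later suggestions rather than sampling independently.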

SLIDE 14

DEEP LEARNING EXAMPLE

SLIDE 15
SIGOPT + MXNET

  • Classify movie reviews using a CNN in MXNet

https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/

SLIDE 16

TEXT CLASSIFICATION PIPELINE

The ML / AI Model (MXNet) is trained on Training Text and validated on Testing Text; Validation Accuracy feeds the REST API, which returns Hyperparameter Configurations and Feature Transformations, yielding Better Results.

SLIDE 17
STOCHASTIC GRADIENT DESCENT

  • Comparison of several RMSProp SGD parametrizations

SLIDE 18

ARCHITECTURE PARAMETERS

SLIDE 19

MULTIPLICATIVE TUNING SPEED UP

SLIDE 20

SPEED UP #1: CPU -> GPU

SLIDE 21

SPEED UP #2: RANDOM/GRID -> SIGOPT

SLIDE 22

CONSISTENTLY BETTER AND FASTER

SLIDE 23

TUNING MULTIPLE METRICS

What if we want to optimize multiple competing metrics?

  • Complexity Tradeoffs
    ○ Accuracy vs Training Time
    ○ Accuracy vs Inference Time
  • Business Metrics
    ○ Fraud Accuracy vs Money Lost
    ○ Conversion Rate vs LTV
    ○ Engagement vs Profit
    ○ Profit vs Drawdown

SLIDE 24

PARETO OPTIMAL

What does it mean to optimize two metrics simultaneously? Pareto efficiency or Pareto optimality is a state of allocation of resources from which it is impossible to reallocate so as to make any one individual or preference criterion better off without making at least one individual or preference criterion worse off.
SLIDE 25

PARETO OPTIMAL

What does it mean to optimize two metrics simultaneously?

The red points lie on the Pareto Efficient Frontier; they strictly dominate all of the grey points. You can do no better in one metric without sacrificing performance in the other. Point N is Pareto optimal compared to Point K.
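Under this definition, the Pareto-efficient set of a finite candidate list can be computed directly; the points below are made-up (accuracy, negated training time) pairs, so that higher is better in both coordinates:

```python
# Candidate configurations as (accuracy, -training_time) pairs, so that
# "higher is better" in both coordinates. Values are made up.
points = [(0.90, -120), (0.85, -40), (0.92, -300), (0.80, -30), (0.88, -40)]

def dominates(a, b):
    # a strictly dominates b: at least as good in every metric,
    # strictly better in at least one.
    return all(x >= y for x, y in zip(a, b)) and a != b

# The Pareto Efficient Frontier: points that no other point dominates.
frontier = [p for p in points if not any(dominates(q, p) for q in points)]
print(sorted(frontier))
```

Here (0.85, -40) falls off the frontier because (0.88, -40) matches its training time with higher accuracy; every remaining point trades one metric for the other.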

SLIDE 26

PARETO EFFICIENT FRONTIER

The goal is to have the best set of feasible solutions to select from. After optimization, the expert picks one or more of the red points from the Pareto Efficient Frontier to further study or put into production.

SLIDE 27

TOY EXAMPLE

SLIDE 28

MULTI-METRIC OPTIMIZATION

SLIDE 29

DEEP LEARNING EXAMPLES

SLIDE 30

MULTI-METRIC OPT IN DEEP LEARNING

https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/

SLIDE 31

DEEP LEARNING TRADEOFFS

  • Deep Learning pipelines are time consuming and expensive to run
  • Application and deployment conditions may make certain configurations less desirable
  • Tuning for both accuracy and complexity metrics like training or inference time allows the expert to make the best decision for production

SLIDE 32
STOCHASTIC GRADIENT DESCENT

  • Comparison of several RMSProp SGD parametrizations
  • Different configurations converge differently

SLIDE 33

TEXT CLASSIFICATION PIPELINE

The ML / AI Model (MXNet) is trained on Training Text and validated on Testing Text; Validation Accuracy and Training Time feed the REST API, which returns Hyperparameter Configurations and Feature Transformations, yielding Better Results.
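Reporting both metrics per configuration might look like the sketch below; `train_and_validate` is a hypothetical stand-in for the MXNet pipeline, with a placeholder accuracy formula and real wall-clock timing:

```python
import time

# Record both objective metrics (validation accuracy and wall-clock
# training time) for each configuration. train_and_validate is a
# hypothetical stand-in for the MXNet pipeline.
def train_and_validate(config):
    start = time.perf_counter()
    # ... train the CNN and evaluate on the validation split here ...
    accuracy = 1.0 - abs(config["lr"] - 0.01)   # placeholder metric
    training_time = time.perf_counter() - start
    return {"accuracy": accuracy, "training_time": training_time}

observation = train_and_validate({"lr": 0.02})
print(observation)
```

Each observation carries both values, so the optimizer (or a later analysis) can trace out the accuracy-vs-training-time frontier.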

SLIDE 34

FINDING THE FRONTIER

SLIDE 35

SEQUENCE CLASSIFICATION PIPELINE

The ML / AI Model (TensorFlow) is trained on Training Sequences and validated on Testing Sequences; Validation Accuracy and Inference Time feed the REST API, which returns Hyperparameter Configurations and Feature Transformations, yielding Better Results.

SLIDE 36

TEXT CLASSIFICATION PIPELINE

SLIDE 37

FINDING THE FRONTIER

SLIDE 38

FINDING THE FRONTIER

SLIDE 39

LOAN CLASSIFICATION PIPELINE

The ML / AI Model (LightGBM) is trained on Training Data and validated on Testing Data; Validation AUCPR and Avg $ Lost feed the REST API, which returns Hyperparameter Configurations and Feature Transformations, yielding Better Results.

SLIDE 40

GRID SEARCH CAN MISLEAD

  • The best grid search point (w.r.t. accuracy) loses >$35 per transaction
  • The best grid search point (w.r.t. loss) has 70% accuracy
  • Points on the Pareto Frontier give the user more information about what is possible and more control over trade-offs

SLIDE 41

DISTRIBUTED TRAINING/SCHEDULING

  • SigOpt serves as a distributed scheduler for training models across workers
  • Workers access the SigOpt API for the latest parameters to try for each model
  • Enables easy distributed training of non-distributed algorithms across any number of models
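A minimal sketch of this scheduling pattern, assuming a toy `next_suggestion` function in place of the real SigOpt API and a placeholder training function; each worker thread trains one non-distributed model per suggestion:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# A central scheduler hands out configurations and workers evaluate them
# in parallel, mimicking workers polling a service (e.g. the SigOpt API)
# for the latest parameters to try. Function names are illustrative.
def next_suggestion():
    return {"lr": 10 ** random.uniform(-4, -1)}

def train_model(config):
    # Placeholder for training one non-distributed model.
    return config, -abs(config["lr"] - 0.01)

random.seed(0)
suggestions = [next_suggestion() for _ in range(8)]

# Each worker thread trains one model; results flow back to the scheduler.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_model, suggestions))

best_config, best_metric = max(results, key=lambda r: r[1])
print(best_config, best_metric)
```

Because each training run is independent, the same pattern scales to any number of worker processes or machines.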
SLIDE 42

TAKEAWAYS

One metric may not paint the whole picture

  • Think about metric trade-offs in your model pipelines
  • Optimizing for the wrong thing can be very expensive

Not all optimization strategies are equal

  • Pick an optimization strategy that gives the most flexibility
  • Different tools enable you to tackle new problems
SLIDE 43

Questions?

contact@sigopt.com https://sigopt.com @SigOpt