Accelerating Model Development by Reducing Operational Barriers



SLIDE 1

Accelerating Model Development by Reducing Operational Barriers

Patrick Hayes, Cofounder & CTO, SigOpt
Talk ID: S9556

SLIDE 2

Accelerate and amplify the impact of modelers everywhere

SLIDE 3

SLIDE 4

SigOpt automates experimentation and optimization

  • Data Preparation: transformation, labeling, pre-processing, pipeline development, feature engineering, feature stores
  • Experimentation, Training, Evaluation: notebook & model framework; experimentation & model optimization (insights, tracking, collaboration; model search, hyperparameter tuning; resource scheduler, management)
  • Model Deployment: validation, serving, deploying, monitoring, managing inference, online testing
  • Hardware Environment: on-premise, hybrid, multi-cloud

SLIDE 5

Hyperparameter Optimization

Approaches to model tuning and hyperparameter search: grid search, random search, Bayesian optimization, evolutionary algorithms, and deep learning architecture search.
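The approaches above differ in how they pick the next configuration to try. Below is a minimal, self-contained sketch of the two simplest, grid search and random search, against a toy objective; the objective function, parameter names, and ranges are hypothetical stand-ins for a real model's validation metric, not anything from the talk.

```python
import itertools
import random

def evaluate(params):
    # Toy objective standing in for a model's validation metric;
    # it peaks at lr=0.01, batch_size=64 (hypothetical values).
    lr, bs = params["lr"], params["batch_size"]
    return -((lr - 0.01) ** 2) - ((bs - 64) / 64) ** 2

def grid_search(grid):
    # Exhaustively evaluate every combination in the grid.
    keys = sorted(grid)
    best = None
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(bounds, n_trials, seed=0):
    # Sample each parameter uniformly from its range, n_trials times.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "lr": rng.uniform(*bounds["lr"]),
            "batch_size": rng.randint(*bounds["batch_size"]),
        }
        score = evaluate(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

grid_best = grid_search({"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]})
rand_best = random_search({"lr": (0.001, 0.1), "batch_size": (32, 128)}, n_trials=9)
```

Bayesian optimization (SigOpt's core method) replaces the uniform sampling above with a model of the objective that chooses each next point based on all past observations.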

SLIDE 6

How it works: Seamlessly tune any model

On the user's side: training data, testing data, the ML, DL, or simulation model, and its evaluation or backtest. SigOpt never accesses your data or models.
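The black-box contract behind this slide can be sketched in a few lines: only parameter assignments cross into your code, and only a metric value crosses back out. The function body and names here are illustrative stand-ins, not SigOpt's actual API.

```python
def objective(assignments):
    """Everything inside this function -- data, model, training,
    evaluation or backtest -- stays on your side of the boundary.
    Only `assignments` (parameter values) come in, and only a
    scalar metric value goes back out to the optimizer."""
    lr = assignments["learning_rate"]
    # ... load training/testing data, build the ML/DL/simulation
    # model, train it, then evaluate or backtest ...
    return -((lr - 0.01) ** 2)  # toy validation metric

# The only two payloads the optimizer ever sees:
suggestion = {"learning_rate": 0.05}
metric = objective(suggestion)
```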

SLIDE 7

How it Works: Seamless implementation for any stack

1. Install SigOpt
2. Create experiment
3. Parameterize model
4. Run optimization loop
5. Analyze experiments

https://bit.ly/sigopt-notebook
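The five steps can be sketched as a suggest-evaluate-report loop. In real usage, step 1 is installing the `sigopt` package and the loop talks to SigOpt's hosted optimizer over its REST API; to keep this sketch runnable offline, a small in-process stub with random suggestions stands in for the service, so every class and method name below is a hypothetical stand-in rather than SigOpt's actual client interface.

```python
import random

class StubOptimizer:
    """In-process stand-in for the hosted optimization service,
    using random suggestions so the loop shape runs offline."""

    def __init__(self, parameters):
        self.parameters = parameters
        self.observations = []

    def suggest(self):
        # Deterministic per-round seed keeps the sketch reproducible.
        rng = random.Random(len(self.observations))
        return {p["name"]: rng.uniform(p["bounds"]["min"], p["bounds"]["max"])
                for p in self.parameters}

    def observe(self, assignments, value):
        self.observations.append((assignments, value))

    def best(self):
        return max(self.observations, key=lambda obs: obs[1])

# Step 2 -- create an experiment: declare tunable parameters.
experiment = StubOptimizer(parameters=[
    dict(name="learning_rate", bounds=dict(min=1e-4, max=1e-1)),
    dict(name="momentum", bounds=dict(min=0.0, max=0.99)),
])

# Step 3 -- parameterize the model: read hyperparameters from assignments.
def train_and_evaluate(assignments):
    # Toy stand-in for "train the model, return a validation metric".
    return (-abs(assignments["learning_rate"] - 0.01)
            - abs(assignments["momentum"] - 0.9))

# Step 4 -- run the optimization loop: suggest, evaluate, report.
for _ in range(20):
    assignments = experiment.suggest()
    value = train_and_evaluate(assignments)
    experiment.observe(assignments, value)

# Step 5 -- analyze: inspect the best observed configuration.
best_assignments, best_value = experiment.best()
```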


SLIDE 13

Benefits: Better, Cheaper, Faster Model Development

90% Cost Savings: maximize utilization of compute
https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/

10x Faster Time to Tune: less expert time per model
https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/

Better Performance: no free lunch, but optimize any model
https://arxiv.org/pdf/1603.09441.pdf

SLIDE 14

Overview of Features Behind SigOpt

Enterprise Platform: reproducibility; intuitive web dashboards; cross-team permissions and collaboration; infrastructure agnostic; REST API; model agnostic; black-box interface that doesn't touch data; libraries for Python, Java, R, and MATLAB

Experiment Insights: advanced experiment visualizations; organizational experiment analysis; parameter importance analysis

Optimization Engine: multimetric optimization; continuous, categorical, or integer parameters; constraints and failure regions; up to 10k observations and 100 parameters; multitask optimization and high parallelism; conditional parameters

(A key on the original slide marks the features where SigOpt is the only HPO solution with that capability.)

SLIDE 15

Applied deep learning introduces unique challenges

SLIDE 16

Failed observations, constraints, uncertainty, competing objectives, lengthy training cycles, cluster orchestration

sigopt.com/blog

SLIDE 17

How do you more efficiently tune models that take days (or weeks) to train?

SLIDE 18

AlexNet to AlphaGo Zero: 300,000x Increase in Compute

(Chart: training compute in petaflop/s-days, on a log scale from 0.00001 to 10,000, by year, 2012 to 2019; milestone models labeled below.)

  • AlexNet
  • Dropout
  • Visualizing and Understanding Conv Nets
  • DQN
  • VGG
  • Seq2Seq
  • GoogleNet
  • DeepSpeech2
  • ResNets
  • Xception
  • Neural Architecture Search
  • Neural Machine Translation
  • AlphaZero
  • AlphaGo Zero
  • TI7 Dota 1v1

SLIDE 19

Speech recognition, deep reinforcement learning, computer vision

SLIDE 20

Training ResNet-50 on ImageNet takes 10 hours. Tuning 12 parameters requires at least 120 distinct models. That equals 1,200 hours, or 50 days, of training time.
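The slide's arithmetic checks out directly:

```python
hours_per_model = 10   # one ResNet-50 training run on ImageNet
models_needed = 120    # at least 120 distinct models for 12 parameters

total_hours = models_needed * hours_per_model  # 1,200 hours of training
total_days = total_hours / 24                  # 50 days, run sequentially
```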

SLIDE 21

Running optimization tasks in parallel is critical to tuning expensive deep learning models

SLIDE 22

Multiple users, concurrent optimization experiments, concurrent model configuration evaluations, multiple GPUs per model


Complexity of Deep Learning DevOps

(Diagram: complexity grows from training one model with no optimization, to a basic case, to an advanced case.)

SLIDE 23

Cluster Orchestration

1. Spin up and share training clusters
2. Schedule optimization experiments
3. Integrate with the optimization API
4. Monitor experiments and infrastructure

SLIDE 24

Problems: infrastructure, scheduling, dependencies, code, monitoring.
Solution: SigOpt Orchestrate is a CLI for managing training infrastructure and running optimization experiments.

SLIDE 25

How it Works

SLIDE 26

Seamless Integration into Your Model Code

SLIDE 27

Easily Define Optimization Experiments

SLIDE 28

Easily Kick Off Optimization Experiment Jobs

SLIDE 29

Check the Status of Active and Completed Experiments

SLIDE 30

View Experiment Logs Across Multiple Workers

SLIDE 31

Track Metadata and Monitor Your Results

SLIDE 32

Automated Cluster Management

SLIDE 33

Training ResNet-50 on ImageNet takes 10 hours. Tuning 12 parameters requires at least 120 distinct models. That equals 1,200 hours, or 50 days, of training time. Training on 20 machines in parallel cuts wall-clock time from 50 days to 2.5 days.
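Assuming perfect parallelism, with one model per machine and evaluations distributed evenly, the slide's wall-clock figure follows directly:

```python
total_hours = 120 * 10  # 120 models, 10 training hours each
machines = 20

# Independent evaluations can run concurrently, so wall-clock time
# divides by the number of machines (under perfect parallelism).
wall_clock_hours = total_hours / machines  # 60 hours
wall_clock_days = wall_clock_hours / 24    # 2.5 days
```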

SLIDE 34

Failed observations, constraints, uncertainty, competing objectives, lengthy training cycles, cluster orchestration

sigopt.com/blog

SLIDE 35

Thank you!

Email patrick@sigopt.com for additional questions.

Try SigOpt Orchestrate: https://sigopt.com/orchestrate
Free access for academics & nonprofits: https://sigopt.com/edu
Solution-oriented program for the enterprise: https://sigopt.com/pricing
Leading applied optimization research: https://sigopt.com/research
... and we're hiring! https://sigopt.com/careers