SLIDE 1

Fast and Easy Hyper-Parameter Grid Search for Deep Learning

GTC 2016 | Mark Whitney, Rescale

SLIDE 2

Overview

  • Hyper-parameter optimization intro
  • Intro to training on Rescale
  • Random sampling demo
  • Advanced optimization workflows
SLIDE 3

Image Classification

Labeled training images

Train model on GPU-accelerated cluster

[Diagram: trained network (input → conv → conv → pool → fully conn → softmax) classifying an input image as CAT; the model is defined with a neural network library:]

model.add(Convolution2D(128, 3, 3))
model.add(Dropout(0.4))
...

Model definition


SLIDE 6

NN Hyper-Parameter Optimization

[Diagram: five candidate architectures built from input → conv → conv → pool → fully conn → softmax, with variants adding extra conv/pool or fully connected layers]

Which one is best???

SLIDE 7

Hyper-Parameter Examples

  • Learning rates
  • Convolution kernel size
  • Convolution filter counts
  • Pooling sizes
  • Dropout fraction
  • Number of convolutional and dense layers
  • Training epochs
  • Image preprocessing parameters
  • Thorough list in [Bengio 2012] (a toy search-space sketch follows)
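
For concreteness, such a search space can be declared as ranges and discrete choices. A toy sketch in Python; the names, distributions, and values are hypothetical, not a Rescale format:

# Hypothetical search-space declaration; names and ranges are illustrative.
search_space = {
    'learning_rate':     ('log-uniform', 1e-4, 1e-1),
    'conv_kernel_size':  ('choice', [3, 5, 7]),
    'conv_filter_count': ('choice', [64, 128, 256]),
    'pool_size':         ('choice', [2, 3]),
    'dropout':           ('uniform', 0.2, 0.6),
    'num_conv_layers':   ('choice', [2, 3, 4]),
    'training_epochs':   ('choice', [20, 50, 100]),
}
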
SLIDE 8

NN Hyper-Parameter Optimization

  • Large set of candidate architectures
  • Search the space with many GPUs to find the most accurate model

[Diagram: candidate architectures (input → conv → conv → pool → fully conn → softmax and variants) fanned out across GPU-accelerated clusters]

SLIDE 9

GPU and HPC on Rescale

  • Founded by aerospace engineers for cloud simulation
  • On-demand hardware
    – GPU (K40s, K80s soon)
    – InfiniBand
    – Integrated with 30 datacenters globally
  • Optimized software
    – Automotive
    – Aerospace
    – Life science
    – Machine learning
  • 120 packages available
SLIDE 14

Basic Model Training

  • Upload dataset to cloud staging storage
  • Optionally start cluster to preprocess data, transfer data back to staging
  • Start GPU cluster, train model using definition and dataset
  • On completion of training, retrieve model

[Diagram: Rescale staging storage feeding the preprocessing cluster and the GPU training cluster]

model:add(nn.SpatialConvolution(128, 3, 3))
model:add(nn.ReLU(true))
model:add(nn.Dropout(0.4))
...

[Diagram: the trained network (input → conv → conv → pool → fully conn → softmax) returned to Rescale staging storage]

SLIDE 15

Parallel Hyper-Parameter Search

[Diagram: five candidate architectures (input → conv → conv → pool → fully conn → softmax and variants) to be trained in parallel]

SLIDE 16

Parallel Hyper-Parameter Search

[Diagram: a model definition template plus parameter ranges feed the search algorithm (grid, Monte Carlo, or black-box optimization); it emits model definitions with concrete parameters for parallelized training over the preprocessed data, training results flow back, and the best model and its accuracy are reported]
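
The fan-out/fan-in pattern behind the parallelized-training step might look like the sketch below. It reuses the submit/wait calls shown on the Using Optimization SDK slide later in the deck; samples, training_cmd, update_model_template, extract_results, and the per-run result files are assumed here for illustration.

import optimization_sdk as rescale

# Fan out: submit one training run per sampled parameter set.
runs = []
for i, X in enumerate(samples):
    script = update_model_template(X)
    runs.append(rescale.submit(training_cmd,
                               input_files=[script],
                               output_files=['result_%d.txt' % i],  # assumed per-run result file
                               var_values=X))

# Fan in: wait for every run, then keep the most accurate model.
results = []
for i, (X, run) in enumerate(zip(samples, runs)):
    run.wait()
    with open('result_%d.txt' % i) as f:
        validation_error, test_error = extract_results(f)
    results.append((validation_error, X))

best_error, best_params = min(results, key=lambda r: r[0])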

SLIDE 17

Monte Carlo/Grid Search: Templated Model Definition

model.add(Convolution2D(${conv_filter_count1},
                        ${conv_kernel_size1}, ${conv_kernel_size1},
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(${conv_filter_count2},
                        ${conv_kernel_size2}, ${conv_kernel_size2}))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(${pool_size}, ${pool_size})))
model.add(Dropout(${dropout}))
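
The ${...} placeholders above follow Python's built-in string.Template syntax, so injection can be sketched in a couple of lines. Whether the platform uses string.Template internally is an assumption; model_template (the template text above as a string) and the sampled values are illustrative:

from string import Template

# Sampled values for the ${...} placeholders (illustrative).
params = {'conv_filter_count1': 128, 'conv_kernel_size1': 3,
          'conv_filter_count2': 64,  'conv_kernel_size2': 3,
          'pool_size': 2, 'dropout': 0.4}

# model_template holds the template text as a string.
model_source = Template(model_template).substitute(params)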

SLIDE 18

Demo: Monte Carlo Keras MNIST training

model.add(Convolution2D(${conv_filter_count1},
                        ${conv_kernel_size1}, ${conv_kernel_size1}, ...

[Diagram: the template and sampling engine dispatching sampled model definitions to GPU nodes]
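
The sampling side of the demo can be a few lines of Python. A minimal sketch with illustrative ranges rather than the demo's actual settings; sample_params and the trial count are assumptions:

import random

# Draw one random parameter set per trial; ranges are illustrative only.
def sample_params():
    return {
        'conv_filter_count1': random.choice([32, 64, 128, 256]),
        'conv_kernel_size1':  random.choice([3, 5, 7]),
        'dropout':            round(random.uniform(0.2, 0.6), 2),
    }

samples = [sample_params() for _ in range(25)]  # one training run per sample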

SLIDE 20

Parameter search on Rescale

User provides...

  • Templated model
  • Model training and evaluation
  • Parameter ranges/choices
  • Training dataset

Rescale does...

  • Sample and inject parameters (see the grid sketch after this list)
  • Provision GPU training nodes
  • Configure training libraries
  • Load balance for training
  • Summarize results
  • Transfer tools for big datasets
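
For the grid-search case specifically, sampling is just the cross product of the per-parameter choices. A minimal sketch with illustrative ranges:

from itertools import product

# Illustrative grid; every combination of choices becomes one training run.
grid = {
    'conv_filter_count1': [64, 128, 256],
    'conv_kernel_size1':  [3, 5],
    'dropout':            [0.25, 0.5],
}

names = list(grid)
combinations = [dict(zip(names, values))
                for values in product(*grid.values())]  # 3 * 2 * 2 = 12 runs
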
SLIDE 21

Custom Optimizations

[Diagram: black-box optimization packages (SMAC, Spearmint, SciPy.optimize) drive the Optimization SDK, whose optimization workflow engine submits templated/parameterized models to GPU clusters]

model.add(Convolution2D(${conv_filter_count1},
                        ${conv_kernel_size1}, ${conv_kernel_size1}, ...

SLIDE 22

Using Optimization SDK

import optimization_sdk as rescale
from scipy.optimize import minimize

def update_model_template(X):
    ...

def objective(X):
    # inject parameter values into the model template
    script = update_model_template(X)
    # submit training command to run
    run = rescale.submit(training_cmd,
                         input_files=[script],
                         output_files=[output_file],
                         var_values=X)
    # wait for training to complete
    run.wait()
    with open(output_file) as f:
        validation_error, test_error = extract_results(f)
    run.report({'valerr': validation_error, 'testerr': test_error})
    return validation_error

# optimizer calls objective
minimize(objective, method='Nelder-Mead', ...)

SLIDE 27

Example: Torch7 CIFAR10

  • SMAC optimizer: [Frank Hutter, Holger Hoos, and Kevin Leyton-Brown]
  • Network-in-Network model: [Min Lin, Qiang Chen, Shuicheng Yan]
  • Implementation: [https://github.com/szagoruyko/cifar.torch]

    – NIN + BatchNormalization + Dropout

SLIDE 30

Candidate Parameter Variations

  • Learning (6 params)
    – Learning rate
    – Decays
    – Momentum
    – Batch size
  • Regularization (6 params)
    – Dropouts
    – Batch normalization
    – Pool sizes
  • Structural (3 params)
    – # of NiN blocks
    – # of mlpconv layers per block
    – # of conv filters per layer

[Diagram: two NiN blocks (conv → mlpconv → mlpconv → pooling + dropout), annotating the inner conv layers, NiN blocks, and convolutional filters]
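
As an illustration of how the structural parameters drive model construction, the sketch below builds a variable number of NiN blocks. It is Keras-style Python for readability (the CIFAR10 example itself uses Torch7); the function name and the fixed kernel/dropout values are assumptions:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Activation

# The three structural hyper-parameters become loop bounds and layer widths.
def build_nin_model(num_blocks, mlpconv_per_block, filters_per_layer):
    model = Sequential()
    for block in range(num_blocks):
        if block == 0:
            model.add(Convolution2D(filters_per_layer, 3, 3,
                                    input_shape=(3, 32, 32)))  # CIFAR10 images
        else:
            model.add(Convolution2D(filters_per_layer, 3, 3))
        model.add(Activation('relu'))
        for _ in range(mlpconv_per_block):
            # 1x1 convolutions form the "mlpconv" layers of a NiN block
            model.add(Convolution2D(filters_per_layer, 1, 1))
            model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.4))
    return model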

SLIDE 31

Optimization Results

  • Best performing:
    – Convolutional filters: 192 -> 330
  • Structure-based optimization:
    – 8.1% -> 7.3% test error
  • 10% reduction in error
  • 150 parameter combinations
  • 814 GPU hours


SLIDE 32

Large Scale Learning on Public Cloud

  • Validate everything early
  • Overprovision GPUs, cull bad nodes (see the sketch after this list)
    – Start N% more GPUs than you need
    – Check interconnect perf, check GPU perf
    – Kill off slow or malfunctioning nodes
  • Allow easy restart, reload optimization state
  • Integration tests ensuring hardware/ML library compatibility
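
One way to implement the overprovision-and-cull advice: time a fixed benchmark on every node before training starts and keep only the fastest fraction. A minimal sketch, where ./gpu_benchmark is a hypothetical per-node benchmark command and SSH access to the nodes is assumed:

import subprocess
import time

def benchmark_seconds(host):
    # Time a fixed workload on one node; slow or broken nodes stand out.
    start = time.time()
    subprocess.run(['ssh', host, './gpu_benchmark'], check=True)
    return time.time() - start

def cull_slow_nodes(hosts, keep_fraction=0.9):
    # Keep the fastest fraction of the overprovisioned pool.
    ranked = sorted(hosts, key=benchmark_seconds)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]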

SLIDE 33

DNN Optimization on Rescale

https://platform.rescale.com
mark@rescale.com