Fast and Easy Hyper-Parameter Grid Search for Deep Learning
GTC 2016
Mark Whitney, Rescale
Overview
- Hyper-parameter optimization intro
- Intro to training on Rescale
- Random sampling demo
- Advanced optimization workflows
Image Classification
Labeled training images
Train model on GPU-accelerated cluster
[Diagram: trained network classifying an input image as "CAT": input -> conv -> conv -> pool -> fully conn -> softmax]
Neural Network Library
model.add(Convolution2D(128, 3, 3))
model.add(Dropout(0.4))
...
Model definition
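Filled out, a fragment like the one above becomes a small but complete convnet. A minimal sketch in the Keras 1.x-era API used here; the layer sizes, channels-first input shape, and optimizer choice are illustrative, not taken from the talk:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Flatten, Dense, Dropout

model = Sequential()
# Two conv layers, a pooling layer, then the fully connected head
model.add(Convolution2D(128, 3, 3, input_shape=(3, 32, 32)))  # assumes channels-first RGB input
model.add(Activation('relu'))
model.add(Convolution2D(128, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))  # class probabilities, e.g. "CAT"

model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])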
NN Hyper-Parameter Optimization
[Diagram: several candidate network architectures varying conv/pool and fully connected layers around a base of input -> conv -> conv -> pool -> fully conn -> softmax]
Which one is best???
Hyper-Parameter Examples
- Learning rates
- Convolution kernel size
- Convolution kernel filters
- Pooling sizes
- Dropout fraction
- Number of convolutional and dense layers
- Training epochs
- Image preprocessing parameters
- Thorough list in [Bengio 2012]; a small grid over a few of these is sketched below
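A grid over a few of these knobs can be written down directly. A minimal Python sketch; the parameter names and ranges are illustrative:

from itertools import product

# Illustrative ranges/choices for a few hyper-parameters
search_space = {
    'learning_rate': [0.1, 0.01, 0.001],
    'conv_kernel_size': [3, 5],
    'dropout': [0.25, 0.5],
    'n_epochs': [20, 50],
}

# Full grid: the Cartesian product of all choices (3*2*2*2 = 24 runs)
keys = list(search_space)
grid = [dict(zip(keys, values)) for values in product(*search_space.values())]
print(len(grid), grid[0])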
NN Hyper-Parameter Optimization
- Large set of candidate architectures
- Search the space in parallel on many GPUs; keep the most accurate
[Diagram: candidate architectures varying conv/pool and fully connected layers around a base of input -> conv -> conv -> pool -> fully conn -> softmax]
GPU-accelerated clusters
GPU and HPC on Rescale
- Founded by aerospace engineers for cloud simulation
- On-demand hardware
  – GPU (K40s, K80s soon)
  – InfiniBand
  – Integrated with 30 datacenters globally
- Optimized software
  – Automotive
  – Aerospace
  – Life science
  – Machine learning
- 120 packages available
Basic Model Training
- Upload dataset to cloud staging storage
- Optionally start cluster to preprocess data, transfer data back to staging
- Start GPU cluster, train model using definition and dataset
- On completion of training, retrieve model
model:add(nn.SpatialConvolution(3, 128, 3, 3))
model:add(nn.ReLU(true))
model:add(nn.Dropout(0.4))
...
[Diagram: Rescale staging storage feeding a preprocessing cluster and a GPU training cluster, which returns the trained network: input -> conv -> conv -> pool -> fully conn -> softmax]
Parallel Hyper-Parameter Search
[Diagram: model def template + parameter ranges -> search algorithm (grid, Monte Carlo, black-box opt) -> model defs with injected params -> parallelized training on GPU clusters over preprocessed data -> training results -> best model and accuracy]
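The fan-out shown above can be sketched in plain Python. Here a hypothetical train_and_evaluate stands in for one training run on a GPU node (it is not part of Rescale's API), and a thread pool stands in for the cluster:

import concurrent.futures
import random

def train_and_evaluate(params):
    # Stand-in for one training run: in the real workflow this would
    # submit a job and return its validation accuracy. Faked here.
    return params, random.random()

# Monte Carlo sample of candidate hyper-parameter settings
candidates = [
    {'conv_filter_count1': random.choice([64, 128, 192]),
     'dropout': random.uniform(0.2, 0.6)}
    for _ in range(8)
]

# Fan out the trainings in parallel, then keep the best result
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_and_evaluate, candidates))

best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)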
Monte Carlo/Grid Search: Templated Model Definition
model.add(Convolution2D(${conv_filter_count1},
                        ${conv_kernel_size1}, ${conv_kernel_size1},
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(${conv_filter_count2},
                        ${conv_kernel_size2}, ${conv_kernel_size2}))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(${pool_size}, ${pool_size})))
model.add(Dropout(${dropout}))
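Python's string.Template performs exactly this ${...} substitution, so a sampling engine only needs to draw values and substitute them. A minimal sketch, not the actual Rescale engine; the parameter names and ranges are illustrative:

import random
from string import Template

template = Template(
    "model.add(Convolution2D(${conv_filter_count1}, "
    "${conv_kernel_size1}, ${conv_kernel_size1}))\n"
    "model.add(Dropout(${dropout}))"
)

# Draw one Monte Carlo sample from illustrative ranges
sample = {
    'conv_filter_count1': random.choice([64, 128, 192]),
    'conv_kernel_size1': random.choice([3, 5]),
    'dropout': round(random.uniform(0.2, 0.6), 2),
}

print(template.substitute(sample))  # concrete model definition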
Demo: Monte Carlo Keras MNIST training
[Diagram: template and sampling engine injecting sampled values into the templated model definition, then dispatching runs to GPU nodes]
Parameter search on Rescale
User provides...
- Templated model
- Model training and evaluation
- Parameter ranges/choices
- Training dataset
Rescale does...
- Sample and inject parameters
- Provision GPU training nodes
- Configure training libraries
- Load balance for training
- Summarize results
- Transfer tools for big datasets
Custom Optimizations
[Diagram: optimization workflow engine connecting black-box optimization packages (SMAC, Spearmint, SciPy.optimize) through the Optimization SDK to the templated/parameterized model and GPU clusters]
Using Optimization SDK

import optimization_sdk as rescale
from scipy.optimize import minimize

def update_model_template(X):
    ...

def objective(X):
    script = update_model_template(X)  # inject parameter values into template
    run = rescale.submit(training_cmd,
                         input_files=[script],
                         output_files=[output_file],
                         var_values=X)  # submit training cmd to run
    run.wait()  # wait for training to complete
    with open(output_file) as f:
        validation_error, test_error = extract_results(f)
    run.report({'valerr': validation_error, 'testerr': test_error})
    return validation_error

minimize(objective, method='Nelder-Mead', ...)  # optimizer calls objective
Example: Torch7 CIFAR10
- SMAC optimizer: [Frank Hutter, Holger Hoos, and Kevin Leyton-Brown]
- Network-in-Network model: [Min Lin, Qiang Chen, Shuicheng Yan]
- Implementation: [https://github.com/szagoruyko/cifar.torch]
– NiN + BatchNormalization + Dropout
Candidate Parameter Variations
- Learning (6 params)
  – Learning rate
  – Decays
  – Momentum
  – Batch size
- Regularization (6 params)
  – Dropouts
  – Batch normalization
  – Pool sizes
- Structural (3 params, expanded into a layer list in the sketch below)
  – # of NiN blocks
  – # of mlpconv layers per block
  – # of conv filters per layer
[Diagram: two NiN blocks, each conv -> mlpconv -> mlpconv -> pooling + dropout, annotated with inner conv layers, NiN blocks, and convolutional filters]
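The structural parameters read as a recipe for generating the network. A hypothetical sketch that expands them into an ordered layer list; the string layer names are descriptive only, not a real library API:

def build_nin_spec(n_blocks, mlpconvs_per_block, n_filters):
    # Expand structural hyper-parameters into an ordered layer list
    layers = []
    for _ in range(n_blocks):
        layers.append('conv(%d filters)' % n_filters)
        for _ in range(mlpconvs_per_block):
            layers.append('mlpconv(%d filters)' % n_filters)
        layers.append('pooling + dropout')
    layers.append('softmax')
    return layers

print(build_nin_spec(n_blocks=2, mlpconvs_per_block=2, n_filters=192))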
Optimization Results
- Best performing change: convolutional filters 192 -> 330
- Structure-based optimization: 8.1% -> 7.3% test error
- ~10% relative reduction in error
- 150 parameter combinations
- 814 GPU hours
Large Scale Learning on Public Cloud
- Validate everything early
- Overprovision GPUs, cull bad nodes (a culling sketch follows this list)
  – Start N% more GPUs than you need
  – Check interconnect perf, check GPU perf
  – Kill off slow or malfunctioning nodes
- Allow easy restart, reload optimization state
- Integration tests ensuring hardware/ML library compatibility
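The overprovision-and-cull step can be sketched in a few lines, assuming a fixed GPU benchmark has already been timed on each node; the node names and slack threshold are illustrative:

def cull_slow_nodes(benchmark_secs, n_needed, slack=1.25):
    # benchmark_secs: node -> time for a fixed GPU benchmark (lower is better)
    # Keep the fastest n_needed nodes, dropping any node more than
    # `slack` times slower than the median, even if fewer remain.
    ranked = sorted(benchmark_secs, key=benchmark_secs.get)
    times = sorted(benchmark_secs.values())
    median = times[len(times) // 2]
    healthy = [n for n in ranked if benchmark_secs[n] <= slack * median]
    return healthy[:n_needed]

nodes = {'gpu-01': 10.2, 'gpu-02': 10.5, 'gpu-03': 19.8, 'gpu-04': 10.1}
print(cull_slow_nodes(nodes, n_needed=3))  # drops the slow gpu-03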