Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks


SLIDE 1

ORNL is managed by UT-Battelle for the US Department of Energy

Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks

Steven R. Young Research Scientist Oak Ridge National Laboratory

SLIDE 2

2 Adapting DL to New Data

Overview

  • Deep Learning in Science
  • Challenges
  • Tools
  • Next Steps
SLIDE 3

Deep Learning for Science Applications

Commercial Applications: State of the Art Results

  • Examples: Object Recognition, Face Recognition
  • Characteristics

– Data is easy to collect
– Inexpensive labels

Science Applications: Challenging New Domains

  • Examples: Material Science, High Energy Physics
  • Characteristics

– Data is difficult to collect
– Few labels available
SLIDE 4

Problem: Adaptability Challenge

  • Premise: For every data set, there exists a corresponding neural network that performs ideally with that data
  • What’s the ideal neural network architecture (i.e., hyper-parameters) for a particular data set?
  • Widely-used approach: intuition

1. Pick some deep learning software (Caffe, Torch, Theano, etc.)
2. Design a set of parameters that defines your deep learning network
3. Try it on your data
4. If it doesn’t work as well as you want, go back to step 2 and try again.

SLIDE 5

The Challenge

[Figure: Deep Learning Toolbox. Layer types: Input, Convolutional, Pooling, Fully Connected, Output. Training hyper-parameters: Learning Rate, Batch Size, Momentum, Weight Decay.]

SLIDE 6

The Challenge

[Figure: Deep Learning Toolbox. Blocks shown: Input, Convolutional, Convolutional, Output. Training hyper-parameters: Learning Rate, Batch Size, Momentum, Weight Decay.]

SLIDE 7

Hyper-parameter Selection

  • Manual search, guess and check

– Requires domain knowledge

  • Grid search

– Number of configurations grows exponentially with the dimension of the hyper-parameter space
– Doesn’t exploit low effective dimension for discovery

  • Random search

– By itself, not adaptive (no use of information from previous experiments)
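
The contrast between grid search and random search can be sketched in a few lines of Python. The hyper-parameter names and candidate values below are illustrative assumptions, not values from the talk.

```python
import itertools
import random

# Hypothetical search space (names and values are illustrative only).
SPACE = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 64, 128],
    "momentum": [0.0, 0.5, 0.9],
    "weight_decay": [0.0, 1e-4, 1e-3],
}

def grid_size(space):
    """Number of configurations a full grid search must evaluate."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size

def grid_search(space):
    """Yield every combination of values (exhaustive)."""
    names = list(space)
    for combo in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, combo))

def random_search(space, n_trials, seed=0):
    """Yield n_trials independent random configurations.

    Each draw ignores the results of earlier trials, which is the
    'not adaptive' limitation of plain random search.
    """
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {name: rng.choice(values) for name, values in space.items()}
```

With three candidate values for each of four hyper-parameters the grid already holds 3^4 = 81 configurations, and every additional parameter multiplies that count again.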

SLIDE 8

What can we do with Titan?

18,688 GPUs

SLIDE 9

MENNDL: Multi-node Evolutionary Neural Networks for Deep Learning

  • Evolutionary algorithm as a solution for searching hyper-parameter space for deep learning

– Focus on Convolutional Neural Networks
– Evolve only the topology with the EA; train each network with a typical SGD process
– Generally: Provide scalability and adaptability for many data sets and compute platforms

  • Leverage more GPUs; ORNL’s Titan has 18,688 GPUs

– Next generation, Summit, will have increased GPU capability

  • Provide the ability to apply DL to new datasets quickly

– Climate science, material science, physics, etc.
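
A generic evolutionary search over network topologies might look like the sketch below. It is not MENNDL's implementation: the gene names, value ranges, selection scheme, and mutation rate are all assumptions, and `toy_fitness` is a placeholder for "train the network with SGD and return its accuracy".

```python
import random

# Hypothetical gene ranges; the talk does not list MENNDL's actual ones.
GENE_CHOICES = {
    "num_conv_layers": [1, 2, 3, 4],
    "kernel_size": [3, 5, 7, 9],
    "num_filters": [16, 32, 64],
}

def random_individual(rng):
    return {gene: rng.choice(choices) for gene, choices in GENE_CHOICES.items()}

def mutate(individual, rng, rate=0.3):
    """Return a copy with each gene resampled with probability `rate`."""
    child = dict(individual)
    for gene, choices in GENE_CHOICES.items():
        if rng.random() < rate:
            child[gene] = rng.choice(choices)
    return child

def crossover(a, b, rng):
    """Uniform crossover: take each gene from one parent at random."""
    return {gene: rng.choice((a[gene], b[gene])) for gene in GENE_CHOICES}

def evolve(fitness, pop_size=8, generations=5, seed=0):
    """Evolve topologies; `fitness` stands in for SGD training plus evaluation."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the fitter half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(ind):
    # Placeholder score, NOT a real accuracy: prefers 3 layers and kernel size 5.
    return -abs(ind["num_conv_layers"] - 3) - abs(ind["kernel_size"] - 5)

best = evolve(toy_fitness)
```

Because the fitter half of each generation is carried over unchanged, the best score in the population never decreases from one generation to the next.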

SLIDE 10

Designing the Genetic Code

  • Goal: facilitate complete network definition exploration
  • Each population member is a network which has a genome with sets of genes

– Fixed width set of genes corresponds to a layer

  • Layers contain multiple distinct parameters

– Restrict layer types based on section (feature extraction or classification)

  • Minor guided design in network, otherwise we attempt to fully encompass all layer types

[Figure: Population (group of networks), Individual (network), Feature Layers, Classification Layers, Parameters.]
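
One way to picture such a genome is sketched below. The field names, the `enabled` flag, and the split of layer types between the two sections are assumptions made for illustration; the slide only states that each layer is a fixed-width set of genes and that layer types are restricted by section.

```python
from dataclasses import dataclass
from typing import List

# Layer types allowed in each section (an assumed split for illustration).
FEATURE_TYPES = ("conv", "pool")
CLASSIFIER_TYPES = ("fully_connected",)

@dataclass
class LayerGene:
    """One fixed-width gene set describing a single layer."""
    layer_type: str
    enabled: bool      # hypothetical flag so a genome can encode fewer layers
    kernel_size: int   # unused by fully connected layers
    num_outputs: int   # filters for conv, units for fully connected

@dataclass
class Genome:
    feature_layers: List[LayerGene]
    classification_layers: List[LayerGene]

    def validate(self):
        """Enforce the per-section restriction on layer types."""
        for gene in self.feature_layers:
            if gene.layer_type not in FEATURE_TYPES:
                raise ValueError(f"{gene.layer_type} not allowed in feature section")
        for gene in self.classification_layers:
            if gene.layer_type not in CLASSIFIER_TYPES:
                raise ValueError(f"{gene.layer_type} not allowed in classification section")

    def active_layers(self):
        """Layers that would actually be built, in order."""
        genes = self.feature_layers + self.classification_layers
        return [g for g in genes if g.enabled]
```

Keeping every gene set the same width means mutation and crossover can treat all layers uniformly, at the cost of carrying fields that some layer types ignore.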

SLIDE 11

MENNDL: Communication

[Figure: The genetic algorithm master holds the population (genes: network parameters) and exchanges messages over MPI with workers, one per node. For each of Network 1 through Network N the labels are: parameters, model predictions, performance metrics. Fitness metric: accuracy.]
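
The master/worker pattern can be illustrated with the sketch below. It uses a thread pool from Python's standard library as a stand-in for MPI so that it runs on a single machine; the function names are hypothetical, and the scoring formula is a placeholder rather than a real training result.

```python
from concurrent.futures import ThreadPoolExecutor

def worker_evaluate(network_params):
    """Stand-in for a worker node, which would build and train the network.

    Returns a performance metric. The score is a placeholder formula,
    not the outcome of any training run.
    """
    accuracy = 1.0 / (1.0 + abs(network_params["kernel_size"] - 5))
    return {"params": network_params, "accuracy": accuracy}

def master_evaluate(population, num_workers=4):
    """Hand one network to each worker and gather the fitness values."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in the same order as the population.
        results = list(pool.map(worker_evaluate, population))
    return results

population = [{"kernel_size": k} for k in (3, 5, 7, 9)]
results = master_evaluate(population)
best = max(results, key=lambda r: r["accuracy"])
```

Since each network is evaluated independently, the only communication is parameters going out and metrics coming back, which is why the approach spreads naturally across many nodes.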

SLIDE 12

Hyper-parameter Values vs Performance

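  • Currently testing and evaluating (T&E) the latest code, which changes all possible parameters (e.g., number of layers, layer types, etc.)
  • Using just 4 nodes
  • Accuracy: from 27% to 65%

[Figure: hyper-parameter values vs. performance. Axis label: Accuracy. Series label: Evolved.]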

SLIDE 13

MINERvA

SLIDE 14

MINERvA Vertex Segment Classification

Goal: Classify which segment the vertex is located in.

Challenge: Events can have very different characteristics.

SLIDE 15

Benefit of Parallelization

[Figure: MINERvA dataset. Panels labeled 12 hours, 6 hours, 2 hours.]

SLIDE 16

Unusual layers

  • Second convolution layer has a kernel size of 29
  • Followed by MAX Pooling layer with kernel size of 19
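
To get a feel for how much such large kernels shrink a feature map, the standard output-size formula can be applied. The input width of 127, the stride of 1, and the absence of padding are assumptions; the slide gives only the two kernel sizes.

```python
def output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (one dimension)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Assumed values: input width 127, stride 1, no padding.
after_conv = output_size(127, kernel_size=29)          # 127 - 29 + 1 = 99
after_pool = output_size(after_conv, kernel_size=19)   # 99 - 19 + 1 = 81
```

Under these assumptions the two layers together remove 46 positions from each spatial dimension, far more than the 3x3 or 5x5 kernels commonly chosen by hand.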
SLIDE 17

Unusual Layers (limited training examples)

SLIDE 18

Current Status

  • Scaled to 15,000 nodes of Titan
  • 460,000 Networks evaluated in 24 hours
  • Expanding to more complex topologies
  • Evaluating on a wide range of science datasets
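
The first two figures imply a rough per-node throughput. The arithmetic below assumes all 15,000 nodes were busy for the full 24 hours and that each node evaluated one network at a time; neither assumption is stated on the slide.

```python
nodes = 15_000
networks = 460_000
hours = 24

networks_per_node = networks / nodes                    # about 30.7
minutes_per_network = hours * 60 / networks_per_node    # about 47
networks_per_second = networks / (hours * 3600)         # about 5.3 across the machine
```

Under those assumptions each node finished a network roughly every 47 minutes.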
SLIDE 19

Acknowledgements

  • Gabriel Perdue (FNAL) and Sohini Upadhyay (University of Chicago)
  • Adam Terwilliger (Grand Valley State University) and David Isele (University of Pennsylvania)
  • Robert Patton, Seung-Hwan Lim, Thomas Karnowski, and Derek Rose (ORNL)

SLIDE 20

Questions