slide-1
SLIDE 1

Asynchronous Hyperparameter Tuning and Ablation Studies with Apache Spark

sinash@kth.se

Sina Sheikholeslami

Distributed Computing Group, KTH Royal Institute of Technology

October 16 2019 CASTOR Software Days 2019

@cutlash

slide-2
SLIDE 2

The Machine Learning System

October 16 2019 Sina Sheikholeslami - KTH Royal Institute of Technology 2

[Diagram: the machine learning system. Components: Dataset, Machine Learning Model, Optimizer. Workflow stages: Problem Definition, Data Preparation, Model Selection, Model Training, Evaluate (repeat if needed).]

slide-3
SLIDE 3

Artificial Neural Networks

[Diagram: a neural network with an Input Layer, a Hidden Layer, and an Output Layer.]

slide-4
SLIDE 4

How We Study the Brain

  • Early 19th century: ablative brain surgeries by Jean Pierre Flourens (1794 - 1867)

slide-5
SLIDE 5

Ablation for Machine Learning?

[Diagram: the machine learning system from before. Components: Dataset (with columns area, rooms, floors, price), Machine Learning Model, Optimizer. Workflow stages: Problem Definition, Data Preparation, Model Selection, Model Training, Evaluate (repeat if needed).]

slide-6
SLIDE 6

Talk of the Town

“Too frequently, authors propose many tweaks absent proper ablation studies … Sometimes just one of the changes is actually responsible for the improved results … this practice misleads readers to believe that all of the proposed changes are necessary.”

(Lipton & Steinhardt, “Troubling Trends in Machine Learning Scholarship”)

slide-7
SLIDE 7

Example: Layer Ablation (1/6)

The Base Model

Accuracy: 78%

slide-8
SLIDE 8

Example: Layer Ablation (2/6)

Accuracy: 73%

slide-9
SLIDE 9

Example: Layer Ablation (3/6)

The Base Model

slide-10
SLIDE 10

Example: Layer Ablation (4/6)

Accuracy: 67%

slide-11
SLIDE 11

Example: Layer Ablation (5/6)

The Base Model

slide-12
SLIDE 12

Example: Layer Ablation (6/6)

Accuracy: 63%

slide-13
SLIDE 13

Ablation Study

[Diagram: a loop between the Machine Learning System and the Ablation component. Labels on the loop: New Dataset / Model Configuration, Evaluate.]
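
The ablation loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Maggy's implementation: `evaluate` is a hypothetical stand-in for training and scoring a model, and the component names and their contributions are made-up integers (percentage points) chosen so that the script runs.

```python
from typing import Callable, Dict, List

# Hypothetical base configuration: the components of the system under study.
BASE_COMPONENTS: List[str] = ["layer_a", "layer_b", "layer_c"]

# Made-up contribution of each component, in percentage points (illustration only).
TOY_CONTRIBUTION: Dict[str, int] = {"layer_a": 5, "layer_b": 10, "layer_c": 2}


def evaluate(components: List[str]) -> int:
    """Stand-in for 'train the model and measure accuracy' (returns a toy score)."""
    return 50 + sum(TOY_CONTRIBUTION[c] for c in components)


def ablation_study(
    components: List[str], evaluate_fn: Callable[[List[str]], int]
) -> Dict[str, int]:
    """Evaluate the base system, then every variant with one component removed."""
    results = {"base": evaluate_fn(components)}
    for removed in components:
        variant = [c for c in components if c != removed]
        results[f"without {removed}"] = evaluate_fn(variant)
    return results


results = ablation_study(BASE_COMPONENTS, evaluate)
for name, score in results.items():
    print(f"{name}: {score}")
```

Comparing each "without ..." score against the base score shows how much the removed component contributes in this toy setting.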

slide-14
SLIDE 14

Hyperparameter Tuning

[Diagram: a loop between the Machine Learning System and the Hyperparameter Tuner. Labels on the loop: New Hyperparameter Values, Evaluate.]
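
The tuning loop has the same shape as the ablation loop: propose new values, evaluate, repeat. A minimal sketch using random search follows. The tuner choice, the hyperparameter names (`learning_rate`, `num_layers`), their ranges, and the toy objective are all assumptions made for illustration; they do not come from the talk.

```python
import random
from typing import Dict, Tuple


def evaluate(learning_rate: float, num_layers: int) -> float:
    """Toy objective standing in for validation accuracy (higher is better)."""
    return -1000 * (learning_rate - 0.01) ** 2 - 0.1 * abs(num_layers - 3)


def random_search(num_trials: int, seed: int = 0) -> Tuple[Dict, float]:
    """Sample hyperparameter values at random and keep the best trial."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform sample
            "num_layers": rng.randint(1, 6),             # inclusive bounds
        }
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(num_trials=20)
print(best_params, best_score)
```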

slide-15
SLIDE 15

System Experimentation (Search)

[Diagram: a loop between the Machine Learning System and the Global Experiment Controller. Labels on the loop: New Trial, Evaluate.]

slide-16
SLIDE 16

Better Parallel

  • Ability to train better models faster
  • Ability to modify and inspect models more easily

(“Parallel Training” - by Maxim Melnikov)

slide-17
SLIDE 17

Parallelization in Practice

(TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.)

[Diagram labels: Machine Learning, Deep Learning, Parallel Processing]

slide-18
SLIDE 18

Hopsworks

Open-source Platform for Data-intensive AI

slide-19
SLIDE 19

Hopsworks

Open-source Platform for Data-intensive AI

What is Hopsworks? https://tinyurl.com/y4ze79d4

slide-20
SLIDE 20

ML/DL in Hopsworks

[Diagram: Data Pipelines (Ingest & Prep), Feature Store, Machine Learning Experiments (Ablation Studies, Hyperparameter Tuning), Data Parallel Training, Model Serving]

Bottleneck, due to

  • iterative nature
  • human interaction
slide-21
SLIDE 21

Spark and Bulk Synchronous Parallel Model

[Diagram: bulk synchronous execution in Spark. The Driver runs three stages of N tasks each (Task11, Task12, Task13, ..., Task1N; Task21, ..., Task2N; Task31, ..., Task3N). Every stage ends at a Barrier and yields metrics (Metrics1, Metrics2, Metrics3). HDFS also appears in the diagram.]

slide-22
SLIDE 22

Example: Synchronous Hyperparameter Search

[Diagram: the same three-stage layout (Driver, HDFS, Task11 ... Task1N, Task21 ... Task2N, Task31 ... Task3N, a Barrier after every stage, Metrics1, Metrics2, Metrics3), with each stage additionally marked "Wasted Compute".]
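
One source of wasted compute in the synchronous layout is idle time at the barriers: a stage lasts as long as its slowest trial. The small simulation below compares that with a schedule in which each worker takes a new trial as soon as it is free. It is a sketch under assumptions: the trial durations are made up, and it models only idle time, not the compute spent on trials that could have been stopped early.

```python
import heapq
from typing import List, Tuple


def synchronous_schedule(durations: List[int], num_workers: int) -> Tuple[int, int]:
    """Run trials in stages of `num_workers`; every stage ends at a barrier."""
    makespan, idle = 0, 0
    for start in range(0, len(durations), num_workers):
        stage = durations[start:start + num_workers]
        stage_time = max(stage)  # the barrier waits for the slowest trial
        makespan += stage_time
        # Workers that finish early (or got no trial) idle until the barrier.
        idle += stage_time * num_workers - sum(stage)
    return makespan, idle


def asynchronous_schedule(durations: List[int], num_workers: int) -> Tuple[int, int]:
    """Each worker takes the next trial as soon as it becomes free."""
    free_at = [0] * num_workers  # min-heap of times at which workers become free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    makespan = max(free_at)
    idle = makespan * num_workers - sum(durations)
    return makespan, idle


# Made-up trial durations in arbitrary time units.
durations = [9, 2, 3, 1, 8, 2, 2, 1, 7, 1, 3, 2]
sync_result = synchronous_schedule(durations, num_workers=4)
async_result = asynchronous_schedule(durations, num_workers=4)
print("synchronous  (makespan, idle):", sync_result)
print("asynchronous (makespan, idle):", async_result)
```

With these durations and four workers, the synchronous schedule needs 24 time units with 55 worker-units idle, while the asynchronous one needs 12 time units with 7 idle.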

slide-23
SLIDE 23

Critical Requirements

  • Parallel execution of trials
  • Support for early stopping of trials
  • Support for global control of the experiment
  • Resilience to stragglers
  • Simple, “Unified” User & Developer API
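
To make the early stopping requirement concrete, here is a simplified median-based stopping check. The rule itself is an assumption chosen for illustration (the slide does not name a specific rule), and the function name, the `min_trials` parameter, and the metric values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def should_stop_early(
    current: List[float], others: Dict[str, List[float]], min_trials: int = 3
) -> bool:
    """Return True if the running trial looks worse than its peers.

    `current` holds the metrics reported so far by the running trial (higher is
    better); `others` maps trial ids to the metric histories of other trials.
    """
    step = len(current) - 1
    if step < 0:
        return False  # nothing reported yet
    # Metrics that other trials reported at the same step.
    peers = [history[step] for history in others.values() if len(history) > step]
    if len(peers) < min_trials:
        return False  # not enough evidence to decide
    # Stop if the best value seen so far is still below the peers' median.
    return max(current) < median(peers)


# Made-up metric histories for three other trials.
others = {
    "t1": [0.50, 0.60, 0.70],
    "t2": [0.40, 0.55, 0.65],
    "t3": [0.45, 0.58, 0.66],
}
print(should_stop_early([0.30, 0.35], others))  # weak trial
print(should_stop_early([0.50, 0.62], others))  # competitive trial
```

A check like this needs a component with a global view of all trials, which is where the requirement for global control of the experiment comes in.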

slide-24
SLIDE 24

Maggy

An Open-source Framework for Asynchronous Computation on top of Apache Spark

slide-25
SLIDE 25

Key Idea: Long Running Tasks

[Diagram: the Driver and N long-running tasks (Task11, Task12, Task13, ..., Task1N) in a single stage with one Barrier at the end. Metrics and New Trial messages are exchanged between the tasks and the Driver.]
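
The long-running task idea can be imitated on one machine with threads and queues. In this sketch, threads stand in for Spark tasks and `queue.Queue` stands in for the communication between tasks and the driver; both are assumptions made for illustration and say nothing about how Maggy implements this. The objective is a toy function.

```python
import queue
import threading
from typing import List, Tuple


def worker(worker_id: int, trials: queue.Queue, results: queue.Queue) -> None:
    """Long-running task: keeps taking new trials until it receives a sentinel."""
    while True:
        trial = trials.get()
        if trial is None:  # sentinel: the experiment is over
            break
        metric = -(trial["x"] - 3) ** 2  # toy objective, best at x == 3
        results.put((worker_id, trial, metric))


def run_experiment(xs: List[int], num_workers: int = 3) -> List[Tuple[int, dict, int]]:
    """Driver: hand out one trial per worker, then a new trial per reported metric."""
    trials, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, trials, results))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()

    pending = iter(xs)
    in_flight = 0
    for _ in range(num_workers):  # seed each worker with an initial trial
        x = next(pending, None)
        if x is not None:
            trials.put({"x": x})
            in_flight += 1

    finished = []
    while in_flight:
        finished.append(results.get())  # a worker reported its metric
        in_flight -= 1
        x = next(pending, None)
        if x is not None:  # answer with a new trial, with no barrier in between
            trials.put({"x": x})
            in_flight += 1

    for _ in threads:  # tell every worker to shut down
        trials.put(None)
    for thread in threads:
        thread.join()
    return finished


finished = run_experiment(list(range(7)))
best = max(finished, key=lambda result: result[2])
print("best trial:", best[1], "metric:", best[2])
```

No worker waits for the others between trials; the only synchronization point is the shutdown at the end of the experiment.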

slide-26
SLIDE 26

Maggy Core Architecture

slide-27
SLIDE 27

Back to Ablation

slide-28
SLIDE 28

LOCO: Leave One Component Out

  • A simple, “natural” ablation policy: an implementation of an ablator
  • Currently supports feature ablation and layer ablation
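
One way to picture "a policy as an implementation of an ablator" is an abstract base class with a LOCO subclass that enumerates the trials. The class and method names below are hypothetical and chosen for illustration; they are not Maggy's developer API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional

Trial = Dict[str, Optional[str]]


class Ablator(ABC):
    """Hypothetical policy interface: decides which trials an ablation study runs."""

    @abstractmethod
    def trials(self) -> Iterator[Trial]:
        """Yield one trial description per experiment to run."""


class LOCO(Ablator):
    """Leave One Component Out: the base trial plus one trial per component."""

    def __init__(self, features: List[str], layers: List[str]) -> None:
        self.features = features
        self.layers = layers

    def trials(self) -> Iterator[Trial]:
        # Base trial: nothing is removed, used as the reference point.
        yield {"ablated_feature": None, "ablated_layer": None}
        for feature in self.features:
            yield {"ablated_feature": feature, "ablated_layer": None}
        for layer in self.layers:
            yield {"ablated_feature": None, "ablated_layer": layer}


# Made-up feature and layer names.
policy = LOCO(features=["area", "rooms"], layers=["dense_two", "dense_three"])
all_trials = list(policy.trials())
for trial in all_trials:
    print(trial)
```

A LOCO study over f features and l layers therefore runs 1 + f + l trials, which are independent of each other and can be executed in parallel.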

slide-29
SLIDE 29

Feature Ablation

  • Uses the Feature Store to access the dataset metadata
  • Generates Python callables that, once called, return modified datasets
  • Removes one feature at a time

[Example: a dataset with the columns area, rooms, floors, price becomes rooms, floors, price after ablating the feature area.]
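
The "callable that returns a modified dataset" idea can be sketched with a closure. This is a minimal illustration, assuming an in-memory dictionary in place of the Feature Store; `load_dataset`, `make_dataset_generator`, and the data values are hypothetical.

```python
from typing import Callable, Dict, List

Dataset = Dict[str, List[float]]


def load_dataset() -> Dataset:
    """Stand-in for reading a training dataset (the values are made up)."""
    return {
        "area": [50.0, 80.0],
        "rooms": [2.0, 3.0],
        "floors": [1.0, 2.0],
        "price": [100.0, 180.0],
    }


def make_dataset_generator(ablated_feature: str, label: str = "price") -> Callable[[], Dataset]:
    """Return a callable that, once called, yields the dataset without one feature."""
    if ablated_feature == label:
        raise ValueError("the label column cannot be ablated")

    def generate() -> Dataset:
        data = load_dataset()  # loading is deferred until the callable runs
        if ablated_feature not in data:
            raise KeyError(f"unknown feature: {ablated_feature}")
        return {name: column for name, column in data.items() if name != ablated_feature}

    return generate


# One generator per feature: each trial receives its own callable.
generators = {f: make_dataset_generator(f) for f in ["area", "rooms", "floors"]}
print(list(generators["area"]().keys()))  # ['rooms', 'floors', 'price']
```

Returning a callable rather than the dataset itself means the data is only materialized inside the trial that needs it.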

slide-30
SLIDE 30

Layer Ablation

  • Uses a base model function
  • Generates Python callables that, once called, return modified models
  • Uses the model configuration to find and remove layer(s)
  • Removes one layer at a time (or one layer group at a time)
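
The same closure pattern works for layers. In this sketch a model is represented as a plain list of layer configurations so that the example runs without a deep learning framework; that representation, the layer names, and `make_model_generator` are assumptions for illustration only.

```python
from typing import Callable, Dict, List

ModelConfig = List[Dict[str, object]]


def base_model() -> ModelConfig:
    """Hypothetical base model function: returns the full model configuration."""
    return [
        {"name": "dense_one", "units": 64},
        {"name": "dense_two", "units": 32},
        {"name": "dense_three", "units": 32},
        {"name": "output", "units": 1},
    ]


def make_model_generator(
    base_model_fn: Callable[[], ModelConfig], ablated_layers: List[str]
) -> Callable[[], ModelConfig]:
    """Return a callable that builds the base model without the given layer(s)."""
    ablated = set(ablated_layers)

    def generate() -> ModelConfig:
        config = base_model_fn()
        missing = ablated - {layer["name"] for layer in config}
        if missing:
            raise ValueError(f"layers not found in the base model: {sorted(missing)}")
        # Use the configuration to find the named layers and drop them.
        return [layer for layer in config if layer["name"] not in ablated]

    return generate


without_one_layer = make_model_generator(base_model, ["dense_two"])
without_layer_group = make_model_generator(base_model, ["dense_two", "dense_three"])
print([layer["name"] for layer in without_one_layer()])
print([layer["name"] for layer in without_layer_group()])
```

With a real framework, removing a layer may also require checking that the shapes of the neighboring layers still match; this sketch leaves that out.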

slide-31
SLIDE 31

Ablation User & Developer API

(Example Notebook Available!)

slide-32
SLIDE 32

User API: Initialize the Study and Add Features

slide-33
SLIDE 33

User API: Define Base Model

slide-34
SLIDE 34

User API: Setup Model Ablation

slide-35
SLIDE 35

User API: Wrap the Training Function

slide-36
SLIDE 36

User API: Lagom!

slide-37
SLIDE 37

Developer API: Policy Implementation (1/2)

slide-38
SLIDE 38

Developer API: Policy Implementation (2/2)

slide-39
SLIDE 39

Hyperparameter Tuning: User API

slide-40
SLIDE 40

Hyperparameter Tuning: Developer API

slide-41
SLIDE 41

Maggy is Open-source

  • Code Repository: https://github.com/logicalclocks/maggy
  • API Documentation: https://maggy.readthedocs.io/en/latest/

slide-42
SLIDE 42

Next Steps

  • More Ablators
  • More Tuners
  • Support for More Frameworks

slide-43
SLIDE 43

Thank you!

@logicalclocks @hopsworks GitHub

  • https://github.com/hopshadoop/maggy
  • https://maggy.readthedocs.io/en/latest/
  • https://logicalclocks.com/whitepapers/

Thanks to the entire Logical Clocks Team, especially:

  • Moritz Meister @morimeister
  • Jim Dowling @jim_dowling
  • Robin Andersson @robzor92
  • Kim Hammar @KimHammar1
  • Alex Ormenisan @alex_ormenisan

(Example Notebook Available!)

sinash@kth.se
@cutlash