High Performance Machine Learning: Advances, Challenges and Opportunities
Eduardo Rodrigues Lecture @ ERAD-RS - April 11th, 2019
IBM Research
Artificial Intelligence
Deep Blue (1997)
AI and Machine Learning
[Diagram: Machine Learning (ML) as a subset of Artificial Intelligence (AI)]
Jeopardy (2011)
Debater
https://www.youtube.com/watch?v=UeF_N1r91RQ
Machine Learning is becoming central to not just many, but all industries
◮ Nine out of 10 executives from around the world describe AI as important to solving their organizations' strategic challenges.
◮ Over the next decade, AI enterprise software revenue will grow from $644 million to nearly $39 billion.
◮ Services-related revenue should reach almost $150 billion.
AI identifies which primates could be carrying the Zika virus
Biophysics-Inspired AI Uses Photons to Help Surgeons Identify Cancer
IBM takes on Alzheimer’s disease with machine learning
Seismic Facies Segmentation Using Deep Learning
Crop detection
Automatic Citrus Tree Detection from UAV Images
Agropad
https://www.youtube.com/watch?v=UYVc0TeuK-w
HPC and ML/AI
◮ As data abounds, deeper and more complex models are developed
◮ These models have many parameters and hyperparameters to tune
◮ A cycle of train, test and adjust is done many times before good results can be achieved
◮ Speeding up this exploratory cycle improves productivity
◮ Parallel execution is the solution
Basics: deep learning sequential execution
Training basics
◮ loop over mini-batches and epochs
◮ forward propagation
◮ compute loss
◮ backward propagation (gradients)
◮ update parameters

L = (1/N_bs) Σ_i L_i,  with gradients ∂L_i/∂W_n
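The sequential loop above can be sketched in a few lines. This is a minimal illustration, not the lecture's actual code: the linear model, toy data and hyperparameters are assumptions chosen only to show the epoch/mini-batch structure.

```python
import numpy as np

# Minimal sketch of sequential training: loop over epochs and mini-batches,
# forward propagation, loss, backward propagation (gradient), update.
# Model, data and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                 # toy dataset
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + 0.01 * rng.normal(size=256)

lr, batch_size, epochs = 0.1, 32, 20          # N_bs = 32

def run_training(w):
    for _ in range(epochs):                            # loop over epochs
        for s in range(0, len(X), batch_size):         # loop over mini-batches
            xb, yb = X[s:s + batch_size], y[s:s + batch_size]
            err = xb @ w - yb                          # forward propagation
            loss = np.mean(err ** 2)                   # L = (1/N_bs) sum_i L_i
            grad = 2 * xb.T @ err / len(xb)            # backward: dL/dW
            w = w - lr * grad                          # update parameters
    return w

w = run_training(np.zeros(4))
```

Each update uses only one mini-batch's gradient, which is what the parallelization strategies below distribute across devices.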
Parallel execution
single node - multi-GPU system
There are many ways to divide the deep neural network. The most common strategy is to divide mini-batches across GPUs:
◮ The model is replicated across GPUs
◮ Data is divided among them
◮ Two possible approaches: non-overlapping division, or shuffled division
◮ Each GPU computes forward, cost and mini-batch gradients
◮ Gradients are then averaged and stored in a shared space (visible to all GPUs)
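The data-parallel scheme above can be simulated sequentially. In this sketch each "GPU" is just a function call on one shard of the mini-batch; the toy model and shard sizes are assumptions for illustration.

```python
import numpy as np

# Data-parallel sketch: the parameters w are replicated on every "GPU",
# the mini-batch is divided (non-overlapping) across them, each computes
# its shard's gradient, and the gradients are averaged in a shared space.
# The toy linear model and sizes are illustrative assumptions.

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

n_gpus = 4
w = np.zeros(3)  # replicated parameters (identical on every GPU)

def local_gradient(xs, ys, w):
    """Forward pass, cost and mini-batch gradient on one GPU's shard."""
    err = xs @ w - ys
    return 2 * xs.T @ err / len(xs)

# non-overlapping division of the mini-batch across GPUs
shards = zip(np.array_split(X, n_gpus), np.array_split(y, n_gpus))
grads = [local_gradient(xs, ys, w) for xs, ys in shards]

avg_grad = np.mean(grads, axis=0)   # averaged gradient ("shared space")
w = w - 0.1 * avg_grad              # identical update on every replica
```

With equal shard sizes, the average of the per-GPU gradients equals the full mini-batch gradient, so the replicas stay in sync.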
Parallelization strategies
multi-node
One can use a similar strategy with multiple nodes, but it requires communication across nodes. Two strategies:
◮ Asynchronous
◮ Synchronous
Synchronous
◮ Can be implemented with high-efficiency protocols
◮ No need to exchange variables
◮ Faster in terms of time to quality
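A classic example of such a high-efficiency protocol is the ring all-reduce. The sketch below simulates it sequentially in plain Python; real systems use MPI or NCCL collectives, and the worker count and gradient values here are illustrative assumptions.

```python
import numpy as np

# Ring all-reduce simulation: after a reduce-scatter phase and an
# allgather phase (n-1 steps each), every worker holds the element-wise
# sum of all workers' gradient buffers. Each buffer has one chunk per
# worker. Sizes and values are illustrative assumptions.

def ring_allreduce(bufs):
    """Sum buffers across workers; bufs[i] must have len(bufs) chunks."""
    n = len(bufs)
    bufs = [np.array(b, dtype=float) for b in bufs]
    # reduce-scatter: after n-1 steps, worker i holds the full sum
    # of chunk (i+1) % n
    for step in range(n - 1):
        snap = [b.copy() for b in bufs]       # all sends happen "at once"
        for i in range(n):
            c = (i - step) % n                # chunk worker i forwards
            bufs[(i + 1) % n][c] += snap[i][c]
    # allgather: circulate the completed chunks for n-1 more steps
    for step in range(n - 1):
        snap = [b.copy() for b in bufs]
        for i in range(n):
            c = (i + 1 - step) % n
            bufs[(i + 1) % n][c] = snap[i][c]
    return bufs

grads = [np.array([1.0, 2.0, 3.0]),
         np.array([10.0, 20.0, 30.0]),
         np.array([100.0, 200.0, 300.0])]
reduced = ring_allreduce(grads)  # every worker: [111.0, 222.0, 333.0]
```

Each worker only ever exchanges one chunk per step with its ring neighbor, which is why this pattern uses bandwidth so efficiently.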
DDL - Distributed Deep Learning
◮ We use a mesh/torus-like reduction
◮ Earlier dimensions need more bandwidth to transfer
◮ Later dimensions need less bandwidth to transfer
Hierarchical communication (1)
Hierarchical communication (2)
Reduce example
This shows a single example of a communication pattern that benefits from hierarchical communication: more bandwidth is required at the beginning, and progressively less bandwidth is required as the reduction proceeds.
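The hierarchical reduce pattern can be sketched as a two-level sum: first reduce within each node (the early, bandwidth-hungry stage), then reduce the per-node partial sums across nodes (the later stage, which moves far less data). The topology sizes and gradient shapes below are illustrative assumptions.

```python
import numpy as np

# Hierarchical reduction sketch: intra-node reduce first (many buffers,
# high bandwidth demand), then inter-node reduce (few buffers, low
# bandwidth demand). Sizes are illustrative assumptions.

rng = np.random.default_rng(2)
nodes, gpus_per_node, grad_len = 3, 4, 8
# grads[node][gpu] is one GPU's local gradient
grads = rng.normal(size=(nodes, gpus_per_node, grad_len))

# stage 1: intra-node reduce (gpus_per_node buffers summed per node)
node_sums = grads.sum(axis=1)

# stage 2: inter-node reduce (only `nodes` buffers cross the network)
total = node_sums.sum(axis=0)

avg_grad = total / (nodes * gpus_per_node)  # averaged gradient for the update
```

Only `nodes` vectors travel over the slower inter-node links, matching the "earlier dimensions need more bandwidth, later dimensions need less" observation above.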
Seismic Segmentation Models based on DNNs
A symbiotic partnership
◮ Deep Neural Networks have become the main tool for visual recognition
◮ They have also been used by seismologists to help interpret seismic data
◮ Relevant training examples may be sparse
◮ Training these models may take very long
◮ Parallel execution speeds up training
Seismic Segmentation Models based on DNNs
Challenges
◮ Current deep learning models (AlexNet, VGG, Inception) do not fit the task well
◮ They are too big
◮ Little data (compared to traditional visual recognition tasks)
◮ Data pre-processing forces the model's input to be smaller
◮ Parallel execution strategies proposed in the literature are not appropriate
What is the recommendation?
Traditional technique
Traditional technique pitfalls
Key assumptions are:
◮ the full batch is very large
◮ the effective minibatch is still a small fraction of the full batch
A hidden assumption is that small full batches don't need to run in parallel.
Not only ImageNet can benefit from parallel execution
weak scaling, strong scaling
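Assuming the usual definitions in distributed deep learning (not spelled out on the slide), the two modes differ in how the mini-batch is configured as GPUs are added. This sketch is an illustrative assumption, with a hypothetical base batch size:

```python
# Strong vs. weak scaling of the mini-batch (standard definitions,
# assumed here): strong keeps the global batch fixed and splits it;
# weak keeps the per-GPU batch fixed, so the global batch grows.

def batch_per_gpu(mode, n_gpus, base_batch=256):
    if mode == "strong":       # fixed global batch, split across GPUs
        return base_batch // n_gpus
    if mode == "weak":         # fixed per-GPU batch, global batch grows
        return base_batch
    raise ValueError(mode)

strong = [batch_per_gpu("strong", n) for n in (2, 4, 8)]  # [128, 64, 32]
weak = [batch_per_gpu("weak", n) for n in (2, 4, 8)]      # [256, 256, 256]
```

Under strong scaling each GPU does less work per step; under weak scaling the effective batch (and thus the optimization behavior) changes with the GPU count, which is why the charts below compare both time and accuracy.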
[Chart: time to run 200 epochs, execution time (s) vs. number of GPUs (2, 4, 8), strong vs. weak scaling]
[Chart: intersection over union (IOU) vs. epochs (up to 200), strong and weak scaling with 2, 4 and 8 GPUs]
[Chart: time to reach 60% IOU, execution time (s) vs. number of GPUs (2, 4, 8), strong vs. weak scaling]
[Chart: intersection over union (IOU) vs. epochs (up to 2000), strong and weak scaling with 2, 4 and 8 GPUs]
Motivation
◮ End-users must specify several parameters in their job submissions to the queue system, e.g.: number of processors, queue/partition, memory requirements, other resource requirements
◮ Those parameters have a direct impact on the job turnaround time and, more importantly, on the total system utilization
◮ Frequently, end-users are not aware of the implications of the parameters they use
◮ The system log keeps valuable information that can be leveraged to improve parameter choice
Related work
◮ Karnak has been used in XSEDE to predict waiting time and runtime
◮ Useful for users to plan their experiments
◮ The method may not apply well to other job parameters, for example memory requirements
[Diagram: a query point q, its neighborhood E(q), and labeled neighbors a-e in the feature space (fa, fb)]
Memory requirements
◮ System owner wants to maximize utilization
◮ Users may not specify memory precisely
◮ Log data can provide training examples for a machine learning approach to predicting memory requirements
◮ This can be seen as a supervised learning task
◮ We have a set of features (e.g. user id, cwd, command parameters, submission time, etc.)
◮ We want to predict memory requirements (label)
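As one concrete instance of this supervised formulation, the sketch below predicts a job's memory label from submission features with an instance-based learner (nearest neighbors by feature overlap, echoing the query-neighborhood diagram). The jobs, features and similarity measure are fabricated illustrations, not real log data or the talk's actual method.

```python
from collections import Counter

# Instance-based sketch: predict a job's peak memory (label) from
# job-submission features using k nearest neighbors under a crude
# feature-overlap similarity. All records below are fabricated.

history = [
    # (user, command, hour-of-day) -> observed peak memory in GB
    (("alice", "blast", 9), 16),
    (("alice", "blast", 14), 16),
    (("bob",   "bwa",   9), 64),
    (("bob",   "bwa",   22), 64),
    (("alice", "bwa",   10), 32),
]

def similarity(a, b):
    """Count matching categorical features (a crude similarity)."""
    return sum(x == y for x, y in zip(a, b))

def predict_memory(query, k=3):
    nearest = sorted(history, key=lambda rec: -similarity(query, rec[0]))[:k]
    labels = [mem for _, mem in nearest]
    return Counter(labels).most_common(1)[0][0]  # majority label

predict_memory(("alice", "blast", 10))  # -> 16
```

A real predictor would use many more features from the log (cwd, command parameters, submission time) and a better-calibrated distance, but the supervised structure is the same.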
The Wisdom of Crowds
There are many learning algorithms available, e.g. classification trees, neural networks, instance-based learners, etc. Instead of relying on a single algorithm, we aggregate the predictions of several methods. "Aggregating the judgment of many consistently beats the accuracy of the average member of the group."
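The simplest aggregation is a majority vote ("mode") across the learners' predictions. In this sketch the three predictors are stand-ins for, e.g., a classification tree, a neural network and an instance-based learner; their hard-coded predictions are illustrative assumptions.

```python
import numpy as np
from collections import Counter

# Aggregate several learners' predictions by majority vote (mode).
# One row per learner, one column per job; values are predicted
# memory classes (e.g. GB buckets). All numbers are illustrative.

predictions = np.array([
    [4, 8, 16, 4],    # learner A (e.g. classification tree)
    [4, 8,  8, 4],    # learner B (e.g. neural network)
    [8, 8, 16, 2],    # learner C (e.g. instance-based learner)
])

def vote(preds):
    """Majority vote across learners for each job (column)."""
    return [Counter(col).most_common(1)[0][0] for col in preds.T]

vote(predictions)  # -> [4, 8, 16, 4]
```

The "poll" variant compared on the next slide would aggregate differently (for instance weighting each learner's confidence); this sketch covers only the mode.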
Comparison between mode and poll
[Chart: prediction performance in the x86 system: accuracy per segment (1-4) for mode vs. poll, values ranging from 0.602 to 0.909]
Is the singularity really near?
Nick Bostrom - Superintelligence
Yuval Noah Harari - 21 Lessons for the 21st Century
Employment
Flexibility and care
Kai-Fu Lee - AI Super-powers - China, Silicon Valley and the New World Order
Knowledge
https://xkcd.com/1838/
http://tylervigen.com/view_correlation?id=359
http://tylervigen.com/view_correlation?id=1703
https://xkcd.com/552/
Judea Pearl - The Book of Why
Pedro Domingos - The Master Algorithm
IBM Cloud
IBM to launch AI research center in Brazil
HPML 2019
High Performance Machine Learning Workshop
@ IEEE/ACM CCGrid - Cyprus
http://hpml2019.github.io