Overviews and practical reports
Justin Clarke, Cecilia Ferrando, William Rebelsky
Trends
The growing amount of data and the need for machine learning data processing challenge us to advance systems. One trillion IoT devices expected by
Source: https://mkomo.com/cost-per-gigabyte-update
Source: Forbes.com
ML researchers should be involved in system design.
Four key questions:
1. How can an ML program be distributed over a cluster?
2. How can ML computation be bridged with inter-machine communication?
3. How can such communication be performed?
4. What should be communicated between machines?
○ Infinitely fast networks
○ All machines process at the same rate
○ No additional users/background tasks
Two ways to speed up distributed ML:
○ Improve convergence rate (fewer iterations to reach a given accuracy)
○ Improve throughput (shorter per-iteration time)
Machine Learning programs:
○ ML programs are robust to minor errors in intermediate steps
○ Parameters depend not only on the data but also on each other
○ Not all parameters converge in the same number of iterations
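The error-tolerance property can be illustrated with a toy sketch (not from the paper): gradient descent on f(x) = x² still reaches the optimum even when every gradient is corrupted by bounded noise.

```python
import random

def noisy_gd(steps=200, lr=0.1, noise=0.05, seed=0):
    """Minimize f(x) = x^2 using gradients corrupted by bounded noise."""
    rng = random.Random(seed)
    x = 5.0
    for _ in range(steps):
        grad = 2 * x + rng.uniform(-noise, noise)  # erroneous intermediate step
        x -= lr * grad
    return x

# Despite per-step errors, the iterate still lands near the optimum x* = 0.
assert abs(noisy_gd()) < 0.1
```

This tolerance is what lets distributed ML systems relax strict consistency in exchange for throughput.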
Traditional programs:
○ Every intermediate step must be atomically correct
○ Speed of execution
○ Ease of programmability
○ Correctness of solution
Atomic correctness of every intermediate step is not necessary in ML
Petuum provides:
○ A ready-to-run set of ML-workhorse implementations (e.g., MCMC)
○ An ML distributed cluster OS that supports the above implementations
○ Compute prioritization of parameters (e.g., give more resources to the parameters that need them)
○ Workload balancing using slow-worker agnosticism
○ Structure-Aware Parallelization (SAP) for scheduling, prioritization, and load balancing
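A minimal sketch of the prioritization idea (a toy illustration, not Petuum's actual scheduler): each iteration, spend the update budget on the k parameters whose gradients are currently largest, so slow-converging parameters receive more work as fast ones finish.

```python
def prioritized_descent(curvatures, k=2, steps=400, lr=0.1):
    """Toy prioritization: only the k parameters with the largest
    gradient magnitudes are updated in each iteration."""
    params = [1.0] * len(curvatures)
    for _ in range(steps):
        grads = [2 * c * p for c, p in zip(curvatures, params)]
        # give resources to the parameters that currently need them most
        order = sorted(range(len(params)), key=lambda i: -abs(grads[i]))
        for i in order[:k]:
            params[i] -= lr * grads[i]
    return params

# Parameters with very different convergence speeds all reach the optimum.
params = prioritized_descent([5.0, 1.0, 0.5, 0.1])
assert all(abs(p) < 0.05 for p in params)
```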
Bulk Synchronous Parallel (BSP):
○ Workers wait at the end of each iteration until everyone is finished
○ Issue: don’t get the p-fold speedup
■ The synchronization barrier suffers from stragglers
■ The synchronization barrier can take longer than the iteration itself
Asynchronous execution:
○ Workers continue iterating and sending updates without waiting for others to finish
○ Issue: less progress per iteration
■ Information becomes stale
■ In the limit, errors can cause slow or incorrect convergence
Stale Synchronous Parallel (SSP):
○ Workers who get more than s iterations ahead of any other worker are stopped
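A toy simulation of this stale-synchronous idea (a sketch, not the actual implementation): workers advance at different speeds, and any worker more than s iterations ahead of the slowest one is blocked until the slowest catches up.

```python
def ssp_run(speeds, total_iters=20, staleness=2):
    """Each tick, a worker makes `speed` progress on its current iteration,
    but only if it is at most `staleness` iterations ahead of the slowest."""
    clock = [0] * len(speeds)       # completed iterations per worker
    progress = [0.0] * len(speeds)  # partial work on the current iteration
    while min(clock) < total_iters:
        for w, speed in enumerate(speeds):
            if clock[w] - min(clock) >= staleness:
                continue  # blocked at the staleness bound
            progress[w] += speed
            if progress[w] >= 1.0:
                progress[w] = 0.0
                clock[w] += 1
    return clock

# Fast workers never run more than `staleness` iterations ahead.
clocks = ssp_run([1.0, 0.5, 0.25])
assert max(clocks) - min(clocks) <= 2
```

Setting the staleness to 0 recovers BSP-style lockstep, while a very large staleness approaches fully asynchronous execution.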
○ Take advantage of the idea that in fully connected layers the top layers account for 90% of the parameters, but only 10% of the backpropagation cost
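A back-of-the-envelope sketch (with hypothetical, roughly AlexNet-like layer sizes) of why fully connected layers dominate parameter counts but not compute:

```python
def conv_stats(c_in, c_out, k, h_out, w_out):
    """Parameters and multiply-accumulates for a k x k convolution layer."""
    params = c_in * c_out * k * k
    macs = params * h_out * w_out  # weights reused at every output position
    return params, macs

def fc_stats(n_in, n_out):
    """Parameters and MACs for a fully connected layer."""
    params = n_in * n_out
    macs = params  # each weight used exactly once per input
    return params, macs

conv_p, conv_m = conv_stats(256, 256, 3, 13, 13)  # a late conv layer
fc_p, fc_m = fc_stats(9216, 4096)                 # a first fc layer

# The fc layer has ~64x the parameters, yet fewer MACs than the conv layer.
assert fc_p > 50 * conv_p
assert conv_m > fc_m
```

This asymmetry is what makes it attractive to treat communication of the top (fully connected) layers differently from the convolutional backpropagation work.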
○ Naive synchronization is not instantaneous
○ Gradient updates can be decomposed so that only S(K+D) elements are transmitted instead of K·D
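The decomposition works because, for many models, the gradient of a K×D weight matrix on one sample is a rank-1 outer product u·vᵀ; sending the two vectors (K+D numbers) lets the receiver rebuild all K·D entries. A minimal sketch with illustrative values:

```python
def outer(u, v):
    """Receiver-side reconstruction of the K x D gradient from its factors."""
    return [[ui * vj for vj in v] for ui in u]

K, D = 4, 6
u = [0.5, -1.0, 2.0, 0.25]  # e.g. per-output error signal (length K)
v = [1.0, 2, 3, 4, 5, 6]    # e.g. input features (length D)

gradient = outer(u, v)

# Transmitting the factors costs K + D numbers instead of K * D.
assert len(u) + len(v) == 10                 # what is sent
assert sum(len(r) for r in gradient) == 24   # what is reconstructed
assert gradient[2][4] == u[2] * v[4]
```

With S samples per communication round, S(K+D) values are sent instead of K·D, a large saving whenever S(K+D) ≪ K·D.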
PROS:
1. Enough relevant background to explain the issues to people not already in the field
2. Strong justification for ML researchers being involved in designing the systems
3. Separating the issues into 4 major questions allows more directed research moving forward
CONS:
1. Section 4: Petuum
   a. Claims close to p-fold speedup but doesn’t show data
   b. Basic implementation that “might become the foundation of an ML distributed cluster operating system”
2. Inconsistent specificity: the ML models are stated carefully, but the solutions only in general terms
   a. “Continuous communication can be achieved by a rate limiter in the SSP implementation”
   b. “SSP with properly selected staleness values”
(Diagram: machine learning tasks and infrastructure, from datacenters to the edge)
Inference on the edge to avoid latency
“HOW DOES FACEBOOK RUN INFERENCE AT THE EDGE?”
EDGE HARDWARE LIMITATIONS (low performance)
SOFTWARE LIMITATIONS (diversity)
OPTIMIZATION
Mobile inference runs on old CPU cores (chart: distribution of CPU core design years)
There is no “standard” mobile SoC The most common SoC has
GPUs? “Holistic” optimization? DSPs?
GPUs? Only 20% of mobile SoCs have a GPU 3x more powerful than their CPUs (Apple devices stand out)
DSPs? Digital Signal Processors (co-processors) have little support for vector structures; programmability is an issue; only available on 5% of SoCs
Performance variability
Caffe2: broad support and CNN optimization; accelerates AI from research to production
Caffe2 Runtime
Two in-house libraries:
NNPACK:
○ 32-bit floating-point precision
○ Winograd transform
○ Fast Fourier transform
○ High performance for convolution
QNNPACK:
○ 8-bit fixed-point precision
○ Augments NNPACK for low-intensity CNNs (grouped 1x1 and depthwise convolution)
Optimized convolution for mobile CPUs
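As a sketch of why the Winograd transform helps, here is the 1D case F(2,3) (Lavin–Gray minimal filtering): two outputs of a 3-tap convolution computed with 4 multiplications instead of 6. Libraries such as NNPACK apply the 2D analogue to 3x3 convolutions.

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap correlation with 4 multiplications."""
    # transform the filter (can be precomputed once per filter)
    gt = [g[0],
          (g[0] + g[1] + g[2]) / 2,
          (g[0] - g[1] + g[2]) / 2,
          g[2]]
    # transform the 4-sample input tile
    dt = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    m = [a * b for a, b in zip(gt, dt)]  # only 4 multiplications
    # inverse transform back to two outputs
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

def direct(d, g):
    """Direct 3-tap correlation: 6 multiplications for two outputs."""
    return [sum(g[k] * d[i + k] for k in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.25]
assert all(abs(a - b) < 1e-9 for a, b in zip(winograd_f23(d, g), direct(d, g)))
```

The filter transform is amortized across the whole image, and the 2D version F(2x2, 3x3) reduces multiplications per tile from 36 to 16, a 2.25x arithmetic saving.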
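The 8-bit fixed-point scheme can be sketched as affine quantization (a generic illustration of the approach, not any library's exact kernels): real values are mapped to unsigned 8-bit integers via a scale and a zero point.

```python
def quantize(xs, num_bits=8):
    """Map floats to unsigned num_bits integers with a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(-lo / scale)  # the integer that represents real 0.0
    q = [min(qmax, max(qmin, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

vals = [-1.0, -0.5, 0.0, 0.25, 1.5]
q, scale, zp = quantize(vals)
restored = dequantize(q, scale, zp)

# All codes fit in 8 bits and the round-trip error is within one step.
assert all(0 <= qi <= 255 for qi in q)
assert all(abs(a - b) <= scale for a, b in zip(vals, restored))
```

Storing weights and activations in 8 bits cuts memory traffic roughly 4x versus float32 and maps onto fast integer SIMD instructions on mobile CPUs.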
Performance variability
Challenges:
10-20 FPS)
inferences per second
Explored solution:
Hexagon)
DSPs and CPUs
Key DNN models:
○ Compact image representation
○ Channel pruning
○ Quantization
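Channel pruning can be sketched as magnitude-based filter selection (a toy illustration; real criteria vary): rank a layer's output channels by the L1 norm of their weights and drop the weakest ones.

```python
def prune_channels(filters, keep_ratio=0.5):
    """Keep the output channels whose weights have the largest L1 norms."""
    norms = [sum(abs(w) for w in f) for f in filters]
    k = max(1, int(len(filters) * keep_ratio))
    keep = sorted(range(len(filters)), key=lambda i: -norms[i])[:k]
    return sorted(keep)

# Four output channels with flattened weights; two are near zero.
filters = [
    [0.9, -0.8, 0.7],    # strong channel
    [0.01, 0.02, 0.0],   # weak channel -> pruned
    [0.5, 0.4, -0.6],    # strong channel
    [0.03, -0.01, 0.0],  # weak channel -> pruned
]
assert prune_channels(filters) == [0, 2]
```

Removing a channel shrinks both the model and the activation tensor fed to the next layer, which compounds the savings.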
PROS:
1. Detailed analysis of the current state of edge inference and its challenges
2. Mobile hardware shortcomings stimulate research in the optimization of edge inference
3. Clear directions on how to advance edge inference
CONS:
1. Despite the large amount of resources at Facebook, there is still a lot of work to do
2. Fragmentation of smartphone hardware/software means a holistic approach to optimizing edge inference is not possible
3. Benchmarking against this paper’s results is hard because “it would require a fleet of devices” (due to performance variability)
4. Even with fragmentation, “common denominator” solutions might still be possible without overly trading off efficiency
Facebook edge inference: edge inference challenges and proposed solutions at Facebook
Berkeley view: directions in systems, architectures, and security
Strategies and principles: ML systems design principles