Joseph E. Gonzalez
- Asst. Professor, UC Berkeley
jegonzal@cs.berkeley.edu
Learning Systems
Research at the Intersection of
Machine Learning & Data Systems
How can machine learning techniques be used to address systems challenges?
How can systems techniques be used to address machine learning challenges?
Systems are getting increasingly complex:
Ø Resource disaggregation → growing diversity of system configurations and the freedom to add resources as needed
Ø New pricing models → dynamic pricing and the potential to bid for different types of resources
Ø Data-centric workloads → performance depends on the interaction between system, algorithms, and data
PARIS
Performance Aware Runtime Inference System
Ø What vm-type should I use to run my experiment?
Neeraja Yadwadkar, Bharath Hariharan, Randy Katz
[Figure: grid of AWS instance types, t2.nano through g2.8xlarge]
54 instance types
Ø Answer: it is workload specific and depends on cost & runtime goals
Ø The best VM type depends on the workload as well as cost & runtime goals
[Figure: runtime vs. price for each VM type]
Which VM will cost me the least? Is m1.small, the cheapest per hour, really the cheapest to run?
[Figure: job cost = price per hour × runtime, for each VM type]
Comparing job cost requires accurate runtime prediction.
Ø Goal: Predict the runtime of workload w on VM type v
Ø Challenge: How do we model workloads and VM types?
Ø Insight:
Ø Extensive benchmarking to model relationships between VM types
Ø Costly but run once for all workloads
Ø Lightweight workload “fingerprinting” on a small set of test VMs
Ø Generalize workload performance to other VMs
Ø Results: runtime prediction with 17% relative RMSE (baseline: 56%)
[Figure: extensive benchmarking across vm1, vm2, …, vm100, plus lightweight workload fingerprinting]
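The insight above can be sketched in a few lines. This toy uses benchmark-ratio scaling over a hypothetical benchmark table, whereas the real system fits a learned regression model; all VM names and numbers below are made up for illustration.

```python
# Toy sketch of the PARIS approach: one-time benchmarking across VM
# types plus a lightweight per-workload fingerprint from a few
# reference VMs. Here we use simple benchmark-ratio scaling in place
# of the learned model. All numbers are hypothetical.

# One-time, workload-independent benchmarking (seconds on a reference
# benchmark suite). Costly, but run once for all workloads.
BENCHMARK_RUNTIME = {
    "m4.large": 100.0,
    "m4.xlarge": 55.0,
    "c4.xlarge": 48.0,
    "r3.large": 95.0,
}

def predict_runtime(fingerprint, target_vm):
    """Predict a workload's runtime on target_vm from its fingerprint:
    cheap runtime measurements on a small set of reference VMs."""
    estimates = [
        runtime * BENCHMARK_RUNTIME[target_vm] / BENCHMARK_RUNTIME[ref_vm]
        for ref_vm, runtime in fingerprint.items()
    ]
    return sum(estimates) / len(estimates)  # average the scaled estimates

# Fingerprint: the workload ran on just two reference VM types.
fingerprint = {"m4.large": 220.0, "r3.large": 210.0}
print(round(predict_runtime(fingerprint, "c4.xlarge"), 1))
```

The benchmarking table is built once for all workloads; only the two fingerprint runs are per-workload, which is what keeps the scheme lightweight.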
Hemingway*
Modeling Throughput and Convergence for ML Workloads
Ø What is the best algorithm and level of parallelism for an ML task?
Ø Trade-off: Parallelism, Coordination, & Convergence
Ø Research challenge: Can we model this trade-off explicitly?
*follow-up work to Shivaram’s Ernest paper
Shivaram Venkataraman, Xinghao Pan, Zi Zheng
L(i, p): loss as a function of iterations i and cores p
I(p): iterations per second as a function of cores p
[Figure: the ML metric (loss vs. iterations) and the systems metric (iterations/sec vs. cores)]
We can estimate I from data on many systems; we can estimate L from data for our problem.
loss(t, p) = L(t · I(p), p)
Which algorithm will give the best result?
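A sketch of how the combined model could be used to pick the level of parallelism. The particular forms of I(p) and L(i, p) below are invented placeholders for the models fitted from data.

```python
# Toy use of the Hemingway-style model: loss after t seconds at
# parallelism p is L(t * I(p), p). The functional forms of I and L
# below are invented stand-ins for models estimated from data.

def I(p):
    # Iterations/sec: scales sublinearly in cores due to coordination.
    return 10.0 * p / (1.0 + 0.05 * p)

def L(i, p):
    # Loss after i iterations: more parallelism needs more iterations
    # to reach the same loss (mini-batch / asynchrony effects).
    return 1.0 / (1.0 + i / (1.0 + 0.1 * p))

def loss_at_time(t, p):
    return L(t * I(p), p)

# Choose the core count that minimizes loss under a 60-second budget:
best_p = min(range(1, 65), key=lambda p: loss_at_time(60.0, p))
print(best_p)
```

With these toy curves an intermediate core count wins: too few cores is slow, too many hurts convergence, which is exactly the trade-off the model makes explicit.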
Deep Code Completion
Neural architectures for reasoning about programs
Ø Goals:
Ø Smart naming of variables and routines
Ø Learn coding styles and patterns
Ø Predict large code fragments
Ø Char and Symbol LSTMs
Ø Programs are more tree shaped…
Xin Wang, Chang Liu, Dawn Song

def fib(x):
    if x < 2:
        return x
    else:
        y = fib(x - 1)
        return y + fib(x - 2)
[Figure: parse tree of the fib example]
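The tree-shaped structure is easy to see with Python's standard `ast` module, which parses the fib example into the kind of tree a tree-structured model would traverse:

```python
import ast

# "Programs are more tree shaped": parsing the fib example yields a
# syntax tree, not a flat token sequence; a Tree-LSTM can follow this
# structure directly instead of reading characters left to right.
source = """
def fib(x):
    if x < 2:
        return x
    else:
        y = fib(x - 1)
        return y + fib(x - 2)
"""

def shape(node):
    """Render only the node-type skeleton of the syntax tree."""
    children = [shape(c) for c in ast.iter_child_nodes(node)]
    name = type(node).__name__
    return f"{name}({', '.join(children)})" if children else name

tree = ast.parse(source)
print(shape(tree))
```

The printed skeleton nests `If`, `Assign`, and `Return` nodes under the `FunctionDef`, with dependencies flowing both up and down the tree rather than left to right.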
Ø Char and Symbol LSTMs
Ø Exploring Tree LSTMs
Ø Issue: dependencies flow in both directions
Kai Sheng Tai, Richard Socher, Christopher D. Manning. “Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks.” (ACL 2015)

Deep Code Completion
Neural architectures for reasoning about computer programs
Ø Currently studying Char-LSTM and Tree-LSTM on benchmark C++ and JavaScript code
Ø Plan to extend Tree-LSTM with downward information flow
[Figure: vanilla LSTM vs. Tree-LSTM architectures]

Fun Code Sample Generated by Char-LSTM
[Figure: a code prefix and the generated code sample]
For now, the neural network can learn some code patterns, such as matching parentheses and if-else blocks, but the variable-naming problem remains unsolved. *This model is trained on LeetCode OJ code submissions from GitHub.
How can machine learning techniques be used to address systems challenges?
How can systems techniques be used to address machine learning challenges?
Systems for Machine Learning
[Diagram: Big Data → Training → Big Model]
Timescale: minutes to days
Systems: offline and batch optimized
Heavily studied … the primary focus of ML research
Splash
CoCoA
Please make a logo!
Temgine
A Scalable Multivariate Time Series Analysis Engine
Francois Belletti, Evan Sparks, Xin Wang
[Figure: three sensors sampled irregularly vs. regularly at t0 … t6. Regularly sampled series are easy to align (just sort); irregularly sampled series are difficult to align.]
Challenge:
Ø Estimate second-order statistics
Ø e.g. auto-correlation, auto-regressive models, …
Ø for high-dimensional & irregularly sampled time series
Solution:
Define an operator DAG (as in TensorFlow), then rely on query optimization to derive an efficient execution plan.
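To make the estimation challenge concrete, here is a toy second-order estimator for irregular samples: rather than aligning on a grid, it averages products of sample pairs whose time gap is close to the target lag. The binning scheme is purely illustrative, not Temgine's actual operator.

```python
import math, random

# Toy estimator of an autocovariance-like second-order statistic from
# an *irregularly* sampled series: average products of sample pairs
# whose time gap falls within a tolerance of the target lag, instead
# of resampling onto a regular grid.

def autocovariance_at_lag(times, values, lag, tol):
    mean = sum(values) / len(values)
    prods = [
        (values[i] - mean) * (values[j] - mean)
        for i in range(len(times))
        for j in range(len(times))
        if abs((times[j] - times[i]) - lag) <= tol
    ]
    return sum(prods) / len(prods)

# An irregularly sampled sine wave with period 2*pi.
random.seed(0)
times = sorted(random.uniform(0, 40) for _ in range(400))
values = [math.sin(t) for t in times]

# Covariance is positive at a full period and negative at a half period.
print(autocovariance_at_lag(times, values, 2 * math.pi, 0.1) > 0)
print(autocovariance_at_lag(times, values, math.pi, 0.1) < 0)
```

The naive pair enumeration here is quadratic; an operator-DAG formulation lets a query optimizer choose a far cheaper execution strategy for the same statistic at scale.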
[Diagram: Big Data → Learning (Training) → Big Model → Inference → Application; the application sends queries and receives decisions]
Timescale: ~10 milliseconds
Systems: online and latency optimized
Less studied …
Why is this challenging?
Need to render low-latency (< 10 ms) predictions for complex models under heavy load, with system failures.
[Figure: queries are joined with features (SELECT * FROM users JOIN items, click_logs, pages WHERE …) and evaluated by models to produce top-K predictions]
Claim: the next big area in scalable ML systems
Feedback
[Diagram: the same pipeline with a feedback loop, where decisions made by the application generate feedback that flows back into training]
Timescale: hours to weeks
Issues: no standard solutions … implicit feedback, sample bias, …
Why is this challenging?
Ø Exposes the system to feedback loops
Ø Must address the explore-exploit trade-off in real time
Ø Adversarial feedback
Ø Opportunities for multi-task learning and anomaly detection
Ø Need to address temporal variation
Ø Model time directly? When do we forget the past?
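The explore-exploit point can be illustrated with the simplest possible policy: epsilon-greedy over simulated feedback. The reward rates and constants below are synthetic.

```python
import random

# Toy sketch of the explore-exploit trade-off in the feedback loop:
# an epsilon-greedy policy mostly serves the best-known model (arm)
# but occasionally explores, so feedback keeps arriving for every arm.

def epsilon_greedy(true_rates, steps=20000, eps=0.1, seed=1):
    rng = random.Random(seed)
    counts = [0] * len(true_rates)
    value = [0.0] * len(true_rates)   # running reward estimate per arm
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(true_rates))   # explore
        else:
            arm = value.index(max(value))          # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        value[arm] += (reward - value[arm]) / counts[arm]  # running mean
    return value.index(max(value))

# The policy identifies the best arm from (noisy) feedback alone.
print(epsilon_greedy([0.2, 0.5, 0.7]))
```

Without the exploration term the policy can lock onto a mediocre model forever, which is precisely the feedback-loop failure mode noted above.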
Responsive (~10 ms) and Adaptive (~1 second)
Techniques we are studying (or should be …):
multi-task learning, anytime inference, adaptive batching, approximate caching, model switching, meta-policy RL, load shedding, model compression, online ensemble learning
Prediction Serving
Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael Franklin, Ion Stoica
[Diagram: the feedback loop splits the model into slow-changing parameters, updated by batch training, and fast-changing parameters, updated online]
Hybrid Offline + Online Learning
Update the user weights online; update the feature functions offline using batch solvers.
Common modeling structure: f(x; θ)ᵀ w_u
[Figure: the same structure arises in matrix factorization (items × users), deep learning (features of the input), and ensemble methods]
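A minimal sketch of this hybrid structure: a frozen feature function f(x; θ) (here just a random projection standing in for the expensive batch-trained component) with per-user weights w_u updated by cheap online SGD steps. All dimensions and rates below are made up.

```python
import random

# Hybrid offline + online learning sketch: prediction = f(x; theta) . w_u.
# theta is updated only by offline batch training (held fixed here);
# the per-user weights w_u are cheap to update online.

random.seed(0)
D, K = 5, 3  # input dimension, feature dimension
theta = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]

def features(x):
    """f(x; theta): shared features, frozen between batch retrains."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in theta]

def predict(w_u, x):
    return sum(w * f for w, f in zip(w_u, features(x)))

def online_update(w_u, x, y, lr=0.02):
    """Cheap per-user SGD step on squared error; theta stays fixed."""
    err = predict(w_u, x) - y
    return [w - lr * err * f for w, f in zip(w_u, features(x))]

# Recover one user's true weights from a stream of feedback.
true_w = [1.0, -2.0, 0.5]
w_u = [0.0] * K
for _ in range(3000):
    x = [random.gauss(0, 1) for _ in range(D)]
    y = sum(t * f for t, f in zip(true_w, features(x)))
    w_u = online_update(w_u, x, y)
print([round(w, 2) for w in w_u])
```

Each online step touches only K numbers per user, which is why partial updates can run orders of magnitude faster than retraining the feature functions.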
Clipper Online Learning for Recommendations
(Simulated News Rec.)
[Figure: error vs. number of examples]
Partial updates: 0.4 ms. Retraining: 7.1 seconds.
>4 orders of magnitude faster adaptation
Clipper Serves Predictions across ML Frameworks
[Diagram: applications (content rec., fraud detection, personal asst., robotic control, machine translation) sit above Clipper, which manages models created in frameworks such as VW and Caffe]
Clipper Architecture
[Diagram: applications issue Predict and Observe calls to Clipper through an RPC/REST interface; Clipper dispatches over RPC to model wrappers (MW), each hosting a model in its native framework, e.g. Caffe or VW]

Model Abstraction Layer
Provide a common interface to models while bounding latency and maximizing throughput (adaptive batching, approximate caching).

Model Selection Layer
Improve accuracy through ensembles (anytime predictions).
A single page load may generate many queries
Adaptive Batching to Improve Throughput
Ø Optimal batch size depends on:
Ø hardware configuration
Ø model and framework
Ø system load
Clipper solution: be as slow as allowed …
Ø The application specifies a latency objective
Ø Clipper uses a TCP-like tuning algorithm to increase the batch size until latency reaches the objective
Ø Why batching helps: it enables hardware acceleration and amortizes system overhead
[Figure: throughput (queries per second) and latency (ms) vs. batch size for a TensorFlow conv. net on a GPU, marking the latency deadline and the optimal batch size]
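The TCP-like tuning idea can be sketched as an additive-increase / multiplicative-decrease (AIMD) loop. The latency model and all constants below are invented for illustration.

```python
# AIMD batch-size tuning sketch, in the spirit of TCP congestion
# control: grow the batch until measured latency hits the application's
# objective, then back off multiplicatively.

LATENCY_SLO_MS = 20.0

def measured_latency_ms(batch_size):
    # Hypothetical cost model: fixed per-batch overhead (amortized by
    # batching) plus a per-query cost.
    return 5.0 + 0.25 * batch_size

def tune_batch_size(steps=200, additive=1, backoff=0.9):
    batch = 1
    for _ in range(steps):
        if measured_latency_ms(batch) <= LATENCY_SLO_MS:
            batch += additive                     # additive increase
        else:
            batch = max(1, int(batch * backoff))  # multiplicative decrease
    return batch

batch = tune_batch_size()
print(batch, measured_latency_ms(batch))
```

The controller hovers just below the largest batch the SLO permits, which maximizes throughput while keeping latency within the objective.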
Approximate Caching to Reduce Latency
Ø Opportunity for caching: popular items may be evaluated frequently
Ø Need for approximation: high-dimensional, continuous-valued queries have a low cache hit rate
Clipper solution: approximate caching, using locality-sensitive hash functions
[Figure: a bag-of-words model sees exact cache hits, but near-duplicate image queries would miss; hashing them to the same bucket turns misses into approximate hits, at the cost of some cache-hit error]
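A toy version of such a cache, using random-hyperplane (SimHash) signatures as the locality-sensitive hash; the dimensions and bit counts are arbitrary.

```python
import random

# Toy approximate cache with locality-sensitive hashing: random-
# hyperplane (SimHash) signatures map nearby high-dimensional queries
# to the same cache key, so a near-duplicate query becomes a hit.

random.seed(0)
D, BITS = 16, 8
hyperplanes = [[random.gauss(0, 1) for _ in range(D)] for _ in range(BITS)]

def lsh_key(x):
    """Sign pattern of x against the random hyperplanes."""
    return tuple(sum(h * xi for h, xi in zip(hp, x)) >= 0 for hp in hyperplanes)

cache = {}
model_calls = 0

def predict_with_cache(x, model):
    global model_calls
    key = lsh_key(x)
    if key not in cache:        # miss: run the (expensive) model
        model_calls += 1        # counter tracks real model evaluations
        cache[key] = model(x)
    return cache[key]           # hit: approximate cached answer

q = [random.gauss(0, 1) for _ in range(D)]
nearby = [v + 1e-6 for v in q]  # a near-duplicate query

predict_with_cache(q, sum)       # first query: cache miss, model runs
predict_with_cache(nearby, sum)  # near-duplicate: cache hit, no model run
print(model_calls)
```

The hit on the near-duplicate returns a slightly stale answer, which is the deliberate trade: a small cache-hit error in exchange for skipping a full model evaluation.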
Anytime Predictions
[Diagram: Clipper combines a slow-changing model (e.g. in Caffe) with a fast-changing linear model; the application expects an answer within 20 ms, and some component predictions may not return in time]
Solution: replace a missing prediction with an estimator, its expected value E[f(x)]
[Diagram: ensemble prediction]
w_scikit · f_scikit(x) + w_Caffe · f_Caffe(x) + w_TF · E_X[f_TF(X)]
(the scikit-learn and Caffe models returned in time; the TensorFlow model missed the deadline, so its output is replaced by its expected value)
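A minimal sketch of this fallback rule; the weights and expected values below are made-up numbers.

```python
# Anytime-prediction sketch: the ensemble is a weighted sum of
# component models, and any component that misses the latency deadline
# contributes its precomputed expected value E[f(X)] instead.

WEIGHTS = {"scikit": 0.5, "caffe": 0.3, "tf": 0.2}
EXPECTED = {"scikit": 0.0, "caffe": 0.1, "tf": 0.4}  # precomputed E[f(X)]

def anytime_ensemble(outputs):
    """outputs: model name -> prediction, or None if it missed the deadline."""
    return sum(
        w * (outputs[name] if outputs[name] is not None else EXPECTED[name])
        for name, w in WEIGHTS.items()
    )

# Every model returned in time:
full = anytime_ensemble({"scikit": 1.0, "caffe": 0.8, "tf": 0.6})
# The slow model timed out; substitute E[f_tf(X)] and answer on time:
partial = anytime_ensemble({"scikit": 1.0, "caffe": 0.8, "tf": None})
print(round(full, 2), round(partial, 2))
```

The degraded answer differs from the full one only by the timed-out model's deviation from its mean, so the deadline is always met at a bounded cost in accuracy.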
Comparison to TensorFlow Serving
Takeaway: Clipper is able to match the average latency of TensorFlow Serving while reducing tail latency (2x) and improving throughput (2x)
Evaluation of Throughput Under Heavy Load
[Figure: accuracy vs. throughput (queries per second)]
Takeaway: Clipper is able to gracefully degrade accuracy to maintain availability under heavy load.
Improved Prediction Accuracy (ImageNet)
Clipper builds an ensemble from a sequence of pre-trained models:

System       Model          Error Rate   #Errors
Caffe        VGG            13.05%       6525
Caffe        LeNet          11.52%       5760
Caffe        ResNet          9.02%       4512
TensorFlow   Inception v3    6.18%       3088
Clipper      Ensemble        5.86%       2930

5.2% relative improvement in prediction accuracy!
Clipper: a prediction serving system that spans multiple ML frameworks and is designed to
Ø simplify model serving
Ø bound latency and increase throughput
Ø enable real-time learning and personalization across machine learning frameworks
Joseph E. Gonzalez
773 Soda Hall jegonzal@cs.berkeley.edu
Graduate student collaborators on this work:
Daniel Crankshaw, Xin Wang, Ankur Dave, Neeraja Yadwadkar, Xinghao Pan, Wenting Zheng, Francois Belletti
Real-time, Intelligent, and Secure Systems Lab
AMP Lab: from batch data to advanced analytics
RISE Lab: from live data to real-time decisions
Goal
Real-time decisions:
Ø decide in ms
Ø on the current state, as data arrives
Ø with strong security: privacy, confidentiality, and integrity
Real-time, Intelligent, and Secure Systems Lab
Learn More:
https://ucbrise.github.io/cs294-rise-fa16/
Security: Protecting Models
Data is a core asset & models capture the value in data
Ø Expensive: many engineering & compute hours to develop
Ø Models can reveal private information about the data
How do we protect models from being stolen?
Ø Prevent them from being copied off devices (DRM? SGX?)
Ø Defend against active-learning attacks on decision boundaries
How do we identify when models have been stolen?
Ø Watermarks in decision boundaries?
Data is a core asset & models capture the value in data Ø Expensive: many engineering & compute hours to develop Ø Models can reveal private information about the data How do we protect models from being stolen? Ø Prevent them from being copied from devices (DRM? SGX?) Ø Defend against active learning attacks on decision boundaries How do we identify when models have been stolen? Ø Watermarks in decision boundaries?