Research at the intersection of AI + Systems
Joseph E. Gonzalez
Assistant Professor, UC Berkeley
jegonzal@cs.berkeley.edu
Looking Back on AI Systems
Going back to when I started graduate school …
The machine learning community has had an evolving focus on AI Systems (2006–2017):
Fast Algorithms → Distributed Algorithms → Deep Learning Frameworks → ML for Systems
Integration of Communities
Machine Learning Frameworks
Big Data → Training (Learning) → Big Model
The focus of AI Systems research has been on model training.
Big Data → Training → Big Model, enabled by:
Distributed Dataflow Systems, Stochastic Optimization, GPU / TPU Acceleration, Deep Learning (CNN/RNN), Domain Specific Languages (TensorFlow), Symbolic Methods
Enabling Machine Learning and Systems Innovations
Splash, CoCoA, VW, rllab
But models must also drive actions. The full pipeline spans two phases:
Learning: Big Data → Training → Big Model
Prediction: Big Model → Application → Decision / Query
Goal: ~10 ms latency under heavy load
Complicated by deep learning → new ML algorithms and systems
Models are getting more complex:
Ø 10s of GFLOPs per prediction [1]
Ø Recurrent nets
Must support low-latency, high-throughput serving workloads
Deployed on the critical path:
Ø Maintain latency goals under heavy load
[1] Deep Residual Learning for Image Recognition. He et al. CVPR 2016.
Using specialized hardware for predictions
Google Translate serving: 140 billion words a day [1] on 82,000 GPUs running 24/7
[1] https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html
Designed New Hardware! Tensor Processing Unit (TPU)
“If each of the world’s Android phones used the new Google voice search for just three minutes a day, these engineers realized, the company would need twice as many data centers.” – Wired
Prediction-Serving Challenges
Ø Large and growing ecosystem of ML models and frameworks (VW, Caffe, ...)
Ø Support low-latency, high-throughput serving workloads
Ø Wide range of applications and frameworks
One-Off Systems for High-Value Tasks
Problems:
Ø Expensive to build and maintain
Ø Requires AI + Systems expertise
Ø Tightly coupled model, framework, and application
Ø Difficult to update models and add new frameworks
Prediction Serving is an Open Problem
Ø Computationally challenging
Ø Need low latency & high throughput
Ø No standard technology or abstractions for serving models
Low Latency Prediction Serving System [NSDI’17]
Prediction Cascades
Learning to make fast predictions
[Work in Progress]
Daniel Crankshaw, Xin Wang, Ion Stoica, Giulio Zhou, Alexey Tumanov, Corey Zumar, Yika Luo
Low Latency Prediction Serving System
Clipper: a middle layer for prediction serving
Ø Common abstraction across a wide range of applications and frameworks (VW, Caffe, ...)
Ø System optimizations
Clipper Decouples Applications and Models
Applications issue Predict / Observe calls through an RPC/REST interface; Clipper forwards them over RPC to model containers (MC) hosting framework-specific models (e.g., Caffe).
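For illustration, a minimal sketch of an application-side call against such a REST predict endpoint (the host, route, and JSON schema are assumptions for exposition, not Clipper's exact API):

import requests

def predict(app_name, x, host="localhost:1337"):
    """Send one query to the serving layer and return its prediction."""
    resp = requests.post(
        f"http://{host}/{app_name}/predict",  # assumed route
        json={"input": x},                    # assumed request schema
        timeout=0.3,                          # serving SLOs are tens of ms
    )
    resp.raise_for_status()
    return resp.json()["output"]              # assumed response schema

# The application never touches the model or its framework directly:
# label = predict("image-classifier", pixel_values)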
Clipper Architecture
Applications → Predict / Observe → RPC/REST Interface
Model Selection Layer: combine predictions across frameworks
Model Abstraction Layer: provide a common interface and system optimizations
RPC → Model Containers (MC), e.g., Caffe
Model Abstraction Layer
Provides a common API with model isolation, caching, and optimized batching.
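To make the common API concrete, a minimal sketch of what a model container interface might look like (the class and method names are illustrative assumptions, not Clipper's actual container API):

from abc import ABC, abstractmethod
from typing import Any, List

class ModelContainer(ABC):
    """Wraps one model from any framework behind a uniform batch API."""

    @abstractmethod
    def predict_batch(self, inputs: List[Any]) -> List[Any]:
        """Return one prediction per input. Called over RPC by the
        abstraction layer, which handles caching and batching."""

class SKLearnContainer(ModelContainer):
    """Example container for a fitted scikit-learn estimator."""

    def __init__(self, model):
        self.model = model

    def predict_batch(self, inputs):
        return self.model.predict(inputs).tolist()

Keeping each container behind RPC means a misbehaving model cannot take down the serving layer (model isolation).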
A single page load may generate many queries
Batching to Improve Throughput
Ø Optimal batch size depends on:
  Ø hardware configuration
  Ø model and framework
  Ø system load
Ø Why batching helps: throughput-optimized frameworks amortize overheads, so throughput grows with batch size
[Figure: throughput vs. batch size for a throughput-optimized framework]
Clipper Solution: adaptively trade off latency and throughput
Ø Increase batch size until the latency objective is exceeded (Additive Increase)
Ø If latency exceeds the SLO, cut the batch size by a fraction (Multiplicative Decrease)
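A small sketch of this additive-increase/multiplicative-decrease loop (the step size and backoff fraction are illustrative choices):

class AIMDBatchSizer:
    """Adapt batch size to maximize throughput under a latency SLO."""

    def __init__(self, slo_ms, step=1, backoff=0.9):
        self.slo_ms = slo_ms
        self.step = step          # additive increase
        self.backoff = backoff    # multiplicative decrease
        self.batch_size = 1

    def update(self, observed_latency_ms):
        if observed_latency_ms > self.slo_ms:
            # Latency exceeded the SLO: cut the batch size by a fraction.
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        else:
            # Under the SLO: probe for more throughput.
            self.batch_size += self.step

# After each batch completes:
#   sizer.update(measured_latency_ms)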
[Figure: Adaptive batching vs. no batching. Throughput (QPS), P99 latency (ms), and batch size for Random Forest (SKLearn), Linear SVM (PySpark), Linear SVM (SKLearn), Kernel SVM (SKLearn), and Log Regression (SKLearn).]
Overhead of the decoupled architecture
Compared Clipper's decoupled design (RPC/REST interface + model containers) against TensorFlow-Serving's integrated design.
[Figure: throughput (QPS, higher is better) and P99 latency (ms, lower is better)]
The decentralized system matches the performance of the centralized design.
Model Selection Layer
Combine predictions across frameworks.
Model selection in practice:
Ø Periodic retraining: Version 1 → Version 2 → Version 3
Ø Experiment with new models and frameworks (e.g., Caffe)
Example: models predict "CAT", "DJ", "CAT", "CAT" → return "CAT"; when models disagree → UNSURE
Selection Policy: Estimate Confidence
The selection policy can calibrate confidence: when all (or nearly all) models agree, report the prediction as confident; otherwise flag it as unsure.
[Figure: Top-5 error rate on ImageNet; bar width is the percentage of the query workload. Ensemble: 0.0586. 4-agree: 0.0469 (confident) / 0.3182 (unsure). 5-agree: 0.0327 (confident) / 0.1983 (unsure). Lower is better.]
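A sketch of an agreement-based policy in the spirit of the 4-agree/5-agree variants above (the sentinel and tie-breaking are assumptions):

from collections import Counter

UNSURE = object()  # sentinel for low-confidence predictions

def select(predictions, k):
    """Return the majority label if at least k of the models agree,
    otherwise flag the query as unsure.

    predictions: one label per model, e.g. ["CAT", "CAT", "CAT", "CAT", "DOG"]
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= k else UNSURE

# With 5 models, k=5 is the "5-agree" policy and k=4 the "4-agree" policy:
#   select(["CAT"] * 5, k=5)            -> "CAT"
#   select(["CAT"] * 4 + ["DOG"], k=5)  -> UNSURE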
Open Research Questions
Ø Efficient execution of complex model compositions
Ø Optimal batching to achieve end-to-end latency goals
Ø Automatic model failure identification and correction
  Ø Use anomaly detection techniques to identify model failures
Ø Prediction serving on the edge
  Ø Allowing models to span cloud and edge infrastructure
http://clipper.ai
Low Latency Prediction Serving System [NSDI’17]
Prediction Cascades
Learning to make fast predictions.
[Work in Progress]
[Figure: accuracy (56.6, 69.8, 73.3, 76.2, 77.4, 78.3) vs. relative cost (0.08, 0.15, 0.31, 0.33, 0.67, 1.0) for models of increasing complexity.]
Model costs are increasing much faster than gains in accuracy: the accuracy gains are small but significant, while the cost gap spans an order of magnitude.
Prediction Cascades
Simple models for simple tasks
Daniel Crankshaw, Xin Wang, Alexey Tumanov, Yika Luo
Query → Simple Model (fast): return Prediction, or "I don't know" → Accurate Model (slow) → Prediction
Combine fast (inaccurate) models with slow (accurate) models to maximize accuracy while reducing computational cost.
https://arxiv.org/abs/1706.00885
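A minimal sketch of such a two-stage cascade with a confidence-gated fallback (the threshold and the (label, confidence) model interface are assumptions):

def cascade_predict(x, fast_model, slow_model, threshold=0.9):
    """Answer with the fast model when it is confident; otherwise
    fall back to the slow, accurate model."""
    label, confidence = fast_model(x)   # cheap, e.g. a small CNN
    if confidence >= threshold:
        return label                    # most queries stop here
    return slow_model(x)                # expensive, e.g. ResNet152

The threshold sets the accuracy/cost trade-off: raising it routes more queries to the slow model.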
[Figure: cascade accuracy stays at 78.3 while relative cost falls from 1.0 to 0.63.]
37% reduction in runtime at no loss in accuracy.
Ø Cascades within a Model
Rather than pairing a separate simple model with ResNet152, insert gates between the network's own blocks:
Query → Conv → Gate → Conv → Gate → ... → Conv → FC → Prediction
Each gate decides whether the remaining computation is needed, letting easy queries skip blocks (Skip Blocks).
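A sketch of a gated residual block in this spirit, using PyTorch (the gate design is an illustrative assumption; the actual work evaluates ConvGate and RNNGate variants):

import torch.nn as nn

class GatedBlock(nn.Module):
    """Wraps one residual block with a cheap gate that can skip it."""

    def __init__(self, block, channels):
        super().__init__()
        self.block = block                 # an ordinary residual block
        self.gate = nn.Sequential(         # tiny "is this block needed?" net
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Hard per-query decision, shown for batch size 1; training uses
        # a differentiable relaxation or reinforcement learning instead.
        if self.gate(x).item() < 0.5:
            return x                       # easy input: skip the block
        return self.block(x)               # hard input: run the full block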
Cascading reduces computational cost
[Figure: average network depth with gating]
ResNet110: No Gate 110.0, ConvGate 67.08, RNNGate 67.72 (~40% reduction)
ResNet74: No Gate 74.0, ConvGate 54.16, RNNGate 54.69 (~28% reduction)
ResNet38: No Gate 38.0, ConvGate 35.82, RNNGate 34.31 (~10% reduction)
Similar gains on larger models
[Figure: number of layers skipped per query; easy images skip more, difficult images skip less.]
Future Directions for Cascades
Ø Using reinforcement learning techniques to reduce gating costs
Ø Query triage during load spikes → forcing fractions
Ø Irregular execution:
  Ø complicates batching
  Ø issues for parallel execution
Low Latency Prediction Serving System [NSDI’17]
Prediction Cascades
Simple models for simple tasks
[Work in Progress]
Other AI Systems Projects in RISE
Ø Managing the Machine Learning Lifecycle
Ø Distributed Python for Reinforcement Learning
We are developing new technologies that enable applications to make low-latency, intelligent decisions on live data with strong security guarantees.
Joseph E. Gonzalez jegonzal@cs.berkeley.edu