DtCraft: A High-performance Distributed Execution Engine at Scale

DtCraft: A High-performance Distributed Execution Engine at Scale
Dr. Tsung-Wei Huang
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA

Outline
- Streamline the cluster programming
- DtCraft system
- Leverage your time to produce promising results
- Hands-on examples
- Q&A

Motivation: A “Hard-coded” Distributed Timer
- General design partitions
- Logical, physical, or hierarchical partitions
- Design data are stored in shared storage (e.g., NFS, GPFS)
- Non-blocking IO
- Single-server multiple-client model
- Event-driven programming
- Server is the centralized coordinator
- Serialization/Deserialization
- Clients exchange boundary timing with the server
[Figure: a two-level hierarchical design with three partitions (top level, M1, and M2), given by design teams.]
Huang et al., “A Distributed Timing Analysis Framework for Large Designs,” IEEE/ACM DAC, 2016

What Does “Productivity” Mean?
- Programming language
- Transparency
- Performance

Our Solution: DtCraft
- A unified engine to streamline cluster programming
- Built completely from the ground up using C++17
- Spares you the pain of DevOps
[Figure: layered architecture. Top: high-level C++17-based Stream Graph API. Middle: network programming, event-driven reactor, resource control, I/O stream, serialization. Bottom: DtCraft kernel (master, agents, executors).]
T.-W. Huang, C.-X. Lin, and M. D. F. Wong, “DtCraft: A distributed execution engine for compute-intensive applications,” IEEE/ACM ICCAD, 2017

System Architecture
- Express your parallelism in our stream graph model
- Generic dataflow at any granularity
- Deliver transparent concurrency through the kernel
- Automatic workload distribution and message passing
DtCraft website: http://dtcraft.web.engr.illinois.edu/

Stream Graph Programming Model
- A general representation of a dataflow
- Abstraction over computation and communication
- Analogous to the assembly line model
- Vertex storage: the goods store
- Stream processing unit: the independent workers
[Figure: a stream graph with vertices A and B connected by data streams. Each stream has an ostream end on the sending vertex, a data stream buffer, and an istream end on the receiving vertex, where a compute unit processes the generated data.]
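The assembly-line analogy above can be sketched in plain C++17. This is a minimal single-process model written for illustration only: it is not the DtCraft API, and the names StreamGraph, vertex, stream, send, run, and sum_through_stream are all hypothetical. Vertices act as storage, each stream owns a buffer, and a callback plays the role of the worker that runs on the receiving side.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-process model of a stream graph; NOT the DtCraft API.
struct StreamGraph {
  // A vertex is a storage slot (the "goods store") that callbacks may update.
  struct Vertex {
    std::string name;
    std::vector<int> store;
  };

  // The "worker": invoked on the receiving vertex for every streamed item.
  using Callback = std::function<void(Vertex&, int)>;

  // A stream is a buffered one-way channel from one vertex to another.
  struct Stream {
    std::size_t from;
    std::size_t to;
    std::deque<int> buffer;
    Callback on_data;
  };

  std::vector<Vertex> vertices;
  std::vector<Stream> streams;

  // Adds a vertex and returns its index.
  std::size_t vertex(std::string name) {
    vertices.push_back({std::move(name), {}});
    return vertices.size() - 1;
  }

  // Adds a stream from -> to with its receiver callback and returns its index.
  std::size_t stream(std::size_t from, std::size_t to, Callback cb) {
    streams.push_back({from, to, {}, std::move(cb)});
    return streams.size() - 1;
  }

  // Producer side (the "ostream" end): enqueue one item on stream s.
  void send(std::size_t s, int value) {
    assert(s < streams.size());
    streams[s].buffer.push_back(value);
  }

  // Consumer side (the "istream" end): drain buffers until nothing is left.
  void run() {
    bool progressed = true;
    while (progressed) {
      progressed = false;
      for (std::size_t i = 0; i < streams.size(); ++i) {
        while (!streams[i].buffer.empty()) {
          int value = streams[i].buffer.front();
          streams[i].buffer.pop_front();
          streams[i].on_data(vertices[streams[i].to], value);
          progressed = true;
        }
      }
    }
  }
};

// Builds A -> B, streams 1..n from A, and returns the sum stored at B.
int sum_through_stream(int n) {
  StreamGraph g;
  std::size_t A = g.vertex("A");
  std::size_t B = g.vertex("B");
  std::size_t AB = g.stream(A, B, [](StreamGraph::Vertex& b, int v) {
    b.store.push_back(v);  // B's worker files each item into B's storage
  });
  for (int i = 1; i <= n; ++i) g.send(AB, i);
  g.run();
  int sum = 0;
  for (int v : g.vertices[B].store) sum += v;
  return sum;
}
```

Calling sum_through_stream(4) pushes 1, 2, 3, 4 through the A to B stream and adds up what B stored. A real engine would place A and B on different machines and move the buffer over the network; the sketch keeps only the shape of the model.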

Outline
- Streamline the cluster programming
- DtCraft system
- Leverage your time to produce promising results
- Hands-on examples
- Q&A

Write a DtCraft Application
- Step 1: Decide the stream graph of your application
- Step 2: Specify the data types to stream
- Step 3: Define the stream computation callback
- Step 4: Attach resources on vertices (optional)
- Step 5: Submit: ./submit --master=host hello-world
[Figure: vertices A and B, each holding "String str; int id;", connected by an ostream/istream pair. A runs in Container 1 (1 CPU / 4GB RAM) and B runs in Container 2 (1 CPU / 8GB RAM).]

Feedback Control Flow Example: Concurrent Ping-pong
- Each end keeps sending a binary value to the other end
- Iteration breaks when one end has received a hundred 1s
Step 1: Stream graph: A and B are connected in both directions; each message is a random ‘1’ or ‘0’, and the loop breaks when a counter reaches ≥ 100
Step 2: Data type streamed in each direction: bool flag;
Step 3: A → B callback (runs on istream B):
  [=] (auto& B, auto& is) {
    Extract bool from is;
    if received 100: close;
    else send bool to A;
  }
Step 3: B → A callback (runs on istream A):
  [=] (auto& A, auto& is) {
    Extract bool from is;
    if received 100: close;
    else send bool to B;
  }
Step 4: A’s resource: 1 CPU / 1 GB RAM; B’s resource: 1 CPU / 1 GB RAM
Step 5: ./submit --master=127.0.0.1 ping-pong
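The ping-pong protocol above can be simulated in a single process to check its termination logic. The sketch below is an assumption-laden model, not DtCraft code: End, PingPongResult, and run_ping_pong are hypothetical names, the two inboxes stand in for the two input streams, and a seeded std::mt19937 replaces whatever random source the real example uses.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <random>

// One end of the ping-pong. Hypothetical model, not a DtCraft type.
struct End {
  std::deque<bool> inbox;  // stands in for this end's input stream
  int ones_received = 0;   // how many '1' flags this end has seen
  bool closed = false;     // set once the end has seen `limit` ones
};

struct PingPongResult {
  int ones_at_a;
  int ones_at_b;
  long messages;  // total flags sent by both ends
};

// Simulates the feedback loop until one end has received `limit` ones.
PingPongResult run_ping_pong(std::uint32_t seed, int limit = 100) {
  assert(limit > 0);
  std::mt19937 rng(seed);
  auto random_flag = [&rng] { return (rng() & 1u) != 0; };

  End a, b;
  long messages = 0;

  // Kick-off: each end sends one random flag to the other.
  a.inbox.push_back(random_flag());
  b.inbox.push_back(random_flag());
  messages += 2;

  // Stream callback: extract the flag, count it, then close or reply.
  auto on_receive = [&](End& self, End& peer) {
    bool flag = self.inbox.front();
    self.inbox.pop_front();
    if (flag) ++self.ones_received;
    if (self.ones_received >= limit) {
      self.closed = true;  // break the iteration
      return;
    }
    peer.inbox.push_back(random_flag());  // send a fresh flag back
    ++messages;
  };

  // Alternate between the two callbacks until either end closes.
  while (!a.closed && !b.closed) {
    if (!b.inbox.empty()) on_receive(b, a);
    if (!b.closed && !a.inbox.empty()) on_receive(a, b);
  }
  return {a.ones_received, b.ones_received, messages};
}
```

For any seed, exactly one end reaches the limit and closes while the other stays below it. In the distributed version the two callbacks run concurrently in separate containers, so the interleaving differs, but the stop condition is the same.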

g++ ping-pong.cpp -lDtCraft -o ping-pong
- ~40 lines of code
- Single program
- Sequential flow
- Fully distributed
- Simple syntax
- Resource control
- Built-in serialization
- Asynchronous IO
- Multi-threaded
- Isolation
- … and more
~$ ./submit ping-pong
or
~$ ./ping-pong

Distributed Timing Analysis using DtCraft
- Two-level hierarchical design (three partitions): top-level, M1, and M2
- Three timer vertices
- One user vertex
- Four Linux containers
- Six input/output streams
Timing API commands: report_at, report_slew, report_rat, remove_gate, insert_gate, power_gate, insert_net, connect_pin, ...
[Figure: the user vertex issues timing and optimization commands to the three timer vertices, which exchange boundary timing with one another.]
Each container has one OpenTimer program operating on one design hierarchy.

Exchange Timing Data: Delay, Slew, etc.
- DtCraft: in-context streaming with < 30 lines
- Existing framework: out-of-context streaming takes > 300 lines and many extra files (Extra.pb.h, Extra.pb.cpp, …, Source.cpp)
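To make the serialization step concrete, here is a minimal hand-rolled round trip for one boundary-timing record in C++17. It is only a sketch of what a built-in serializer has to do on the programmer's behalf: the BoundaryTiming struct, its field names, and the put/get/serialize/deserialize helpers are hypothetical and come from neither DtCraft nor OpenTimer. It also assumes sender and receiver share the same byte order and floating-point format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical record; field names are illustrative only.
struct BoundaryTiming {
  std::string pin;   // boundary pin name, e.g. "M1:PO1"
  double delay = 0;  // delay value at the pin
  double slew = 0;   // slew value at the pin
};

using Buffer = std::vector<unsigned char>;

// Appends the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void put(Buffer& out, const T& value) {
  static_assert(std::is_trivially_copyable_v<T>, "raw copy needs a POD-like type");
  const auto* p = reinterpret_cast<const unsigned char*>(&value);
  out.insert(out.end(), p, p + sizeof(T));
}

// Reads a trivially copyable value at `offset` and advances the offset.
template <typename T>
T get(const Buffer& in, std::size_t& offset) {
  static_assert(std::is_trivially_copyable_v<T>, "raw copy needs a POD-like type");
  assert(offset + sizeof(T) <= in.size());
  T value;
  std::memcpy(&value, in.data() + offset, sizeof(T));
  offset += sizeof(T);
  return value;
}

// Layout: 4-byte name length, name bytes, 8-byte delay, 8-byte slew.
Buffer serialize(const BoundaryTiming& t) {
  Buffer out;
  put(out, static_cast<std::uint32_t>(t.pin.size()));
  out.insert(out.end(), t.pin.begin(), t.pin.end());
  put(out, t.delay);
  put(out, t.slew);
  return out;
}

// Inverse of serialize(); asserts if the buffer is too short.
BoundaryTiming deserialize(const Buffer& in) {
  std::size_t offset = 0;
  BoundaryTiming t;
  const auto len = get<std::uint32_t>(in, offset);
  assert(offset + len <= in.size());
  t.pin.assign(reinterpret_cast<const char*>(in.data() + offset), len);
  offset += len;
  t.delay = get<double>(in, offset);
  t.slew = get<double>(in, offset);
  return t;
}
```

A sender would call serialize() before writing to its output stream and the receiver would call deserialize() on the bytes it reads. Writing and maintaining this pair for every message type is the kind of effort the slide counts against the existing framework.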

Deploy the Distributed Timer in One Line
- DtCraft: only three lines for resource control in Linux containers (Container 1, Container 2, Container 3), then submit with ~$ ./submit --master=127.0.0.1 binary
- Existing framework: duplicate the code for each partition (Top.cpp, M1.cpp, M2.cpp) and wrap up with submission scripts

Comparison with the Hard-coded Method
- 17× fewer lines of code
- 33% from message passing
- 67% from boilerplate code
- 7-11% performance loss
- Transparent concurrency
- API cost
The potential productivity gain is tremendous!
[Charts: development time in weeks, DtCraft vs. hard-coded; runtime on 40 AWS nodes for small, medium, and large designs, DtCraft vs. hard-coded.]

Getting Involved with DtCraft
- GitHub: https://github.com/twhuang-uiuc/DtCraft
- Star our project to receive updates
- MIT license
- Open to collaboration!
[Figure: DtCraft surrounded by the keywords cluster computing, scalability, groovy API, security, and productivity.]

Distributed Online Machine Learning
- One image stream generator
- One image label classifier/trainer
[Figure: a stream generator (data source) sends an image stream to an online image label classifier (DNN classifier).]

Only 60 lines of code to create distributed ML with streaming

Thank you!
Tsung-Wei Huang
twh760812@gmail.com
http://web.engr.illinois.edu/~thuang19/
