DtCraft: A High-performance Distributed Execution Engine at Scale

SLIDE 1

DtCraft: A High-performance Distributed Execution Engine at Scale

Dr. Tsung-Wei Huang
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign, IL, USA

SLIDE 2

Outline

  • Streamline the cluster programming
      • DtCraft system
  • Leverage your time to produce promising results
      • Hands-on examples
  • Q&A

SLIDE 3

Motivation: A “Hard-coded” Distributed Timer

  • General design partitions
      • Logical, physical, or hierarchical partitions
      • Design data are stored in shared storage (e.g., NFS, GPFS)
  • Single-server multiple-client model
      • Server is the centralized coordinator
      • Clients exchange boundary timing with the server

[Figure: a hierarchical design with a top level and two hierarchy blocks, M1 and M2. Port labels: PI1, PI2, PI3, PO1, M1:PI1, M1:PI2, M1:PO1, M2:PI1, M2:PI2, M2:PO1. Instance labels: I1, G1, H1.]

Three partitions: top-level, M1, and M2 (given by design teams)

  • Non-blocking IO
  • Event-driven programming
  • Serialization/Deserialization

Huang et al., “A Distributed Timing Analysis Framework for Large Designs,” IEEE/ACM DAC, 2016

SLIDE 4

What does “Productivity” Mean?

  • Programming language
  • Transparency
  • Performance

SLIDE 5

Our Solution: DtCraft

  • A unified engine to streamline cluster programming
      • Completely built from the ground up using C++17
  • Save your time from the pain of DevOps

[Figure: DtCraft components: high-level C++17-based stream graph API; DtCraft kernel (master, agents, executors); network programming; I/O stream; event-driven reactor; resource control; serialization.]

T.-W. Huang, C.-X. Lin, and M. D. F. Wong, “DtCraft: A distributed execution engine for compute-intensive applications,” IEEE/ACM ICCAD, 2017

SLIDE 6

System Architecture

  • Express your parallelism in our stream graph model
      • Generic dataflow at any granularity
  • Deliver transparent concurrency through the kernel
      • Automatic workload distribution and message passing

DtCraft website: http://dtcraft.web.engr.illinois.edu/

SLIDE 7

Stream Graph Programming Model

  • A general representation of a dataflow
      • Abstraction over computation and communication
  • Analogous to the assembly line model
      • Vertex storage → goods store
      • Stream processing unit → independent workers

[Figure: a stream graph with two vertices, A and B, joined by data streams A→B and B→A. Each stream has an ostream end and an istream end. Labels: "generate data", "compute unit", "buffer".]
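To make the assembly-line analogy concrete, here is a minimal single-process sketch of a vertex that only stores data and a stream that carries items and runs a computation as each one arrives. Every name in it (`Vertex`, `Stream`, `on_data`, `run_pipeline`) is hypothetical and chosen for illustration; it is not the DtCraft API and involves no networking.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical in-process model of a stream graph; NOT the DtCraft API.
// A vertex plays the role of the "goods store": it only holds state.
struct Vertex {
  std::string name;
  std::vector<int> storage;  // data the vertex has accumulated
};

// A stream plays the role of the "worker": it carries items from one
// vertex to another and runs a callback on each item as it is delivered.
struct Stream {
  Vertex* from;
  Vertex* to;
  std::deque<int> buffer;                     // in-flight data
  std::function<void(Vertex&, int)> on_data;  // stream computation

  void send(int item) { buffer.push_back(item); }

  // Deliver everything currently buffered; returns the number delivered.
  std::size_t drain() {
    std::size_t delivered = 0;
    while (!buffer.empty()) {
      const int item = buffer.front();
      buffer.pop_front();
      on_data(*to, item);
      ++delivered;
    }
    return delivered;
  }
};

// Build A -> B, push the values 1..count through, and return B's storage.
std::vector<int> run_pipeline(int count) {
  Vertex a{"A", {}};
  Vertex b{"B", {}};
  Stream ab{&a, &b, {}, [](Vertex& dst, int item) {
    dst.storage.push_back(item * 2);  // the "compute unit" doubles each item
  }};
  for (int i = 1; i <= count; ++i) ab.send(i);  // A generates data
  ab.drain();
  return b.storage;
}
```

Calling `run_pipeline(3)` sends 1, 2, 3 from A and leaves the doubled values in B's storage. Keeping state in the vertex and behavior in the stream mirrors the split between the goods store and the workers described on the slide.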

SLIDE 8

Outline

  • Streamline the cluster programming
      • DtCraft system
  • Leverage your time to produce promising results
      • Hands-on examples
  • Q&A

SLIDE 9

Write a DtCraft Application

  • Step 1: Decide the stream graph of your application
  • Step 2: Specify the data types to stream
  • Step 3: Define the stream computation callback
  • Step 4: Attach resources on vertices (optional)
  • Step 5: Submit

[Figure: vertices A and B connected by a stream with an ostream end and an istream end. Data types on each side: String str; int id; Containers: Container 1 (1 CPU / 4GB RAM), Container 2 (1 CPU / 8GB RAM).]

./submit –master=host hello-world
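Step 2 asks for the data types to stream, and the figure shows a string and an integer on each side. As a rough illustration of the work that built-in serialization takes off the programmer's hands, the sketch below hand-encodes such a pair into a byte buffer and back. The `Message` struct, the function names, and the length-prefixed layout are all assumptions made for this example; they do not describe DtCraft's actual wire format or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical message type mirroring the two fields shown in the figure.
struct Message {
  std::string str;
  int id;
};

// Assumed layout: 4-byte length, string bytes, 4-byte id (host byte order).
std::vector<std::uint8_t> serialize(const Message& m) {
  const std::uint32_t len = static_cast<std::uint32_t>(m.str.size());
  const std::int32_t id = m.id;
  std::vector<std::uint8_t> out(sizeof(len) + len + sizeof(id));
  std::memcpy(out.data(), &len, sizeof(len));
  std::memcpy(out.data() + sizeof(len), m.str.data(), len);
  std::memcpy(out.data() + sizeof(len) + len, &id, sizeof(id));
  return out;
}

// Decode a buffer produced by serialize(); returns false if it is too short.
bool deserialize(const std::vector<std::uint8_t>& in, Message& m) {
  std::uint32_t len = 0;
  std::int32_t id = 0;
  if (in.size() < sizeof(len)) return false;
  std::memcpy(&len, in.data(), sizeof(len));
  const std::uint64_t needed = static_cast<std::uint64_t>(len) + sizeof(id);
  if (in.size() - sizeof(len) < needed) return false;
  m.str.assign(reinterpret_cast<const char*>(in.data()) + sizeof(len), len);
  std::memcpy(&id, in.data() + sizeof(len) + len, sizeof(id));
  m.id = id;
  return true;
}
```

A round trip such as `deserialize(serialize(msg), copy)` should reproduce both fields. A real system also has to handle byte order, versioning, and partial reads from a socket, which is the kind of boilerplate the deck argues an engine should absorb.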

SLIDE 10

Feedback Control Flow Example

  • Concurrent Ping-pong
      • Each end keeps sending a binary value to the other end
      • The iteration stops once one end has received a hundred 1s

[Figure: vertices A and B send each other ‘1’ or ‘0’ (random); break at counter ≥ 100.]

Step 1: stream graph (A and B, one stream in each direction)

Step 2: bool flag; (on both streams)

Step 3: A→B stream callback (on B's istream)

    [=] (auto& B, auto& is) {
      Extract bool from is;
      if received 100: close;
      else send bool to A;
    }

Step 3: B→A stream callback (on A's istream)

    [=] (auto& A, auto& is) {
      Extract bool from is;
      if received 100: close;
      else send bool to B;
    }

Step 4: A’s resource 1 CPU / 1 GB RAM
Step 4: B’s resource 1 CPU / 1 GB RAM

Step 5: ./submit –master=127.0.0.1 ping-pong
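The termination rule can be checked in isolation with a small single-process simulation. This is only a sketch under simplifying assumptions: the two ends take turns instead of running concurrently, nothing goes over a network, and `PingPongResult` and `simulate_ping_pong` are hypothetical names that are not part of DtCraft.

```cpp
#include <random>

// Hypothetical single-process simulation of the ping-pong loop; NOT DtCraft code.
struct PingPongResult {
  int ones_at_a;  // number of 1s received by A
  int ones_at_b;  // number of 1s received by B
  long messages;  // total messages delivered
};

// The ends alternate (A sends first). Each message is a random bit, and the
// loop stops as soon as either end has received `limit` ones.
PingPongResult simulate_ping_pong(unsigned seed, int limit = 100) {
  std::mt19937 rng(seed);
  std::bernoulli_distribution coin(0.5);
  PingPongResult r{0, 0, 0};
  bool to_b = true;  // direction of the next message
  while (r.ones_at_a < limit && r.ones_at_b < limit) {
    const bool flag = coin(rng);  // the "bool flag" from Step 2
    ++r.messages;
    if (to_b) {
      r.ones_at_b += flag ? 1 : 0;  // B's callback counts the 1s it receives
    } else {
      r.ones_at_a += flag ? 1 : 0;  // A's callback does the same
    }
    to_b = !to_b;
  }
  return r;
}
```

Because only one counter can change per message, exactly one end reaches the limit when the loop exits, which corresponds to the "close" branch in the callbacks above.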

SLIDE 11

g++ ping-pong.cpp -lDtCraft -o ping-pong

  • ~40 lines of code
  • Single program
  • Sequential flow
  • Fully distributed
  • Simple syntax
  • Resource control
  • Built-in serialization
  • Asynchronous IO
  • Multi-threaded
  • Isolation

… and more

~$ ./ping-pong
or
~$ ./submit ping-pong

SLIDE 12

Distributed Timing Analysis using DtCraft

  • Two-level hierarchical design (three partitions)

[Figure: the hierarchical design (top level, M1, M2, with ports PI1, PI2, PI3, PO1, M1:PI1, M1:PI2, M1:PO1, M2:PI1, M2:PI2, M2:PO1) mapped onto three Timer vertices and one User vertex. The User vertex runs an optimization program that calls the timer API: report_at, report_slew, report_rat, remove_gate, insert_gate, power_gate, insert_net, connect_pin, ... Stream labels: boundary timing, timing commands.]

  • Three timer vertices
  • One user vertex
  • Four Linux containers
  • Six input/output streams

  • Top-level
  • M1
  • M2

Each container has one OpenTimer operating on one design hierarchy

SLIDE 13

Exchange Timing Data – Delay, Slew, etc.

DtCraft

  • In-context streaming with < 30 lines

Existing framework

  • Out-of-context streaming takes > 300 lines
  • Many extra files: Extra.pb.h, Extra.pb.cpp, …, Source.cpp

SLIDE 14

Deploy the Distributed Timer in One Line

DtCraft

  • ~$ ./submit –master=127.0.0.1 binary
  • Only three lines for resource control in Linux container (Container 1, Container 2, Container 3)

Existing framework

  • Top.cpp, M1.cpp, M2.cpp
  • Duplicate the code for each partition
  • Wrap up with submission scripts

SLIDE 15

Comparison with the Hard-coded Method

  • 17× fewer lines of code
      • 33% from message passing
      • 67% from boilerplate code
  • 7-11% performance loss
      • Transparent concurrency
      • API cost

[Chart: Runtime (40 AWS nodes) for small, medium, and large designs, DtCraft vs. hard-coded; axis ticks at 2000, 4000, 6000.]

[Chart: Development time in # weeks, DtCraft vs. hard-coded; axis ticks at 5, 10, 15.]

The potential productivity gain is tremendous!

SLIDE 16

Getting Involved with DtCraft

  • GitHub: https://github.com/twhuang-uiuc/DtCraft
  • Star our project to receive updates
  • MIT license
  • Open to collaboration!

[Figure: DtCraft and cluster computing: scalability, security, productivity, groovy API.]

SLIDE 17

Distributed Online Machine Learning

  • One image stream generator
  • One image label classifier/trainer

[Figure: a data source (stream generator) sends an image stream to a DNN classifier (online image label classifier).]

SLIDE 18

Only 60 lines of code to create distributed ML with streaming

SLIDE 19

Thank you!

Tsung-Wei Huang
twh760812@gmail.com
http://web.engr.illinois.edu/~thuang19/