SLIDE 1

Concurrency-Enhancing Transformations for Asynchronous Behavioral Specifications: A Data-Driven Approach

John Hansen and Montek Singh

University of North Carolina Chapel Hill, NC, USA

SLIDE 2

&MAIN : main proc (IN? chan <<byte, byte, byte, byte, byte, byte>> & OUT! chan byte).
begin
  &ContextCHAN1 : chan <<byte, byte, byte, byte, byte>>
  &ContextCHAN2 : chan <<byte, byte, byte, byte>>
  &ContextCHAN3 : chan <<byte, byte, byte>>
  &ContextCHAN4 : chan <<byte, byte>>
  &ContextCHAN5 : chan <<byte>>
|
  ( contextproc1(IN, ContextCHAN1)
  || contextproc2(ContextCHAN1, ContextCHAN2)
  || contextproc3(ContextCHAN2, ContextCHAN3)
  || contextproc4(ContextCHAN3, ContextCHAN4)
  || contextproc5(ContextCHAN4, ContextCHAN5)
  || contextproc6(ContextCHAN5, OUT)
  )
end

&contextproc1 = proc (IN? chan <<byte, byte, byte, byte, byte, byte>> & OUT! chan <<byte, byte, byte, byte, byte>>).
begin context : var <<a: byte, b: byte, c: byte, d: byte, e: byte, f: byte>> |
  forever do IN?context; OUT!<<c, d, e, f, a * b>> od
end

&contextproc2 = proc (IN? chan <<byte, byte, byte, byte, byte>> & OUT! chan <<byte, byte, byte, byte>>).
begin context : var <<c: byte, d: byte, e: byte, f: byte, ...>> |
  forever do IN?context; OUT!<<e, f, ...>> od
end

&contextproc3 = proc (IN? chan <<byte, byte, byte, byte>> & ...).
begin context : var <<e: byte, f: byte, g: byte, h: byte>> | ...

Most high-level async tools are syntax-directed (Haste/Balsa)
These tools are inadequate for designing high-speed circuits
Need better tool support!

Straightforward spec ➜ slow circuit
Fast circuits require significant effort

Introduction: Motivation

&MAIN : main proc (IN? chan <<byte, byte, byte, byte, byte, byte>> & OUT! chan byte).
begin a, b, c, d, e, f, g, h, i, j, k : var byte |
  forever do
    IN?<<a, b, c, d, e, f>>;
    g := a * b; h := c * d; i := e * f;
    j := g + h; k := i * j;
    OUT!k
  od
end

~100 lines (transformed code) vs. ~10 lines (original spec)

SLIDE 3

Our Contribution

“Source-to-Source Compiler”

Rewrites specs to enhance concurrency
Fully-automated and integrated into the Haste flow
Arsenal of several powerful optimizations:
  parallelization, pipelining, arithmetic opt., communication opt.

Benefits:

Up to 59x speedup (throughput) of implementation...
... 290x speedup with arithmetic optimization
Or: reduces design effort by up to 95% (lines of code)

With our method: high performance with low design effort
Without our method: high performance requires significant effort!

SLIDE 4


Our Contribution

Our tool integrated as “preprocessor” to Haste compiler

leverages Haste compilation and backend

Behavioral Spec ➜ Compiler ➜ Handshake Graph ➜ TechMap ➜ Netlist

Original Haste Flow

Preprocessor passes: Parallelize, Pipeline, Arithmetic Opt., Communication Opt.

SLIDE 5


Our Contribution

4 concurrency-enhancing optimizations:

Parallelization

remove unnecessary sequencing

Pipelining

allow overlapped execution

Arithmetic Optimization

decompose/restructure long-latency operations

Channel Communication Optimization

re-ordering for increased concurrency


Parallelization: e:=a+b; f:=c+d ➜ e:=a+b || f:=c+d
Arithmetic opt.: k:=e*f*g*h ➜ k:=(e*f)*(g*h)
Pipelining: e:=a+b || f:=c+d; g:=f+1; h:=g*2

SLIDE 6

Our Contribution

Benefits of automatic code rewriting:

Eases burden on designer

allows focus on functionality instead of performance
greater readability ➜ less chance of bugs

Step towards design space exploration

selectively apply optimizations where needed... ... based on a cost function (speed/energy/area)

Backwards compatible with legacy code

simply recompile for high-speed implementation


&ContextCHAN1 : chan ...
&ContextCHAN2 : chan ...
&ContextCHAN3 : chan ...
&ContextCHAN4 : chan ...
&ContextCHAN5 : chan ...
...
contextproc1(IN, ContextCHAN1)
|| contextproc2(ContextCHAN1, ContextCHAN2)
|| contextproc3(ContextCHAN2, ContextCHAN3)
|| contextproc4(ContextCHAN3, ContextCHAN4)
|| contextproc5(ContextCHAN4, ContextCHAN5)
|| contextproc6(ContextCHAN5, OUT)

&contextproc1 = proc (IN? chan ... & OUT! chan ...).
begin context : var <<...>> |
  forever do IN?context; OUT!<<c, d, e, f, a * b>> od
end
&contextproc2 = ...
&contextproc3 = ...
...
&contextproc6 = ...

Transformed Code

forever do
  IN?<<a,b,c,d,e,f>>;
  g := a * b; h := c * d; i := e * f;
  j := g + h; k := i * j;
  OUT!k
od

Designer’s Code

SLIDE 7

Solution Domain: Class of Specifications

Input Domain: Requires “slack-elastic” specifications

Spec must be tolerant of additional slack on channels
Formally: deadlock-free, restriction on probes, ... [Manohar/Martin98]

Output: Produces “data-driven” specifications

Pipelined: data drives computation, not control-dominated
Preserves top-level system topology, including cycles
Replaces each module with a parallelized+pipelined version

Correctness model (slack elasticity):

spec maintains original token order per channel

no guarantees about relative token order across channels

SLIDE 8

Solution Domain: Target Architectures


Breaks down each module into smaller parts
Can handle arbitrary topologies

SLIDE 9

Talk Outline

Previous Work and Background
Basic Approach
Advanced Techniques
Results
Conclusion

SLIDE 10

Previous Work

“Spatial Computation” [Budiu 03]

Convert ANSI C programs to dataflow hardware
Spec language has inherent limitations:
  cannot model channel communication
  no fork-join type of concurrency

Data-Driven Compilation [Taylor 08, Plana 05]

New data-driven specification language
“Push” instead of “pull” components
Designer must still be skillful at writing highly concurrent specs
  Our approach effectively automates this by code rewriting

SLIDE 11

Previous Work

Peephole Optzn/Resynthesis [Chelcea/Nowick 02, Plana 05]

improve concurrency at circuit and handshake levels
do not target higher-level (system-wide) concurrency

CHP Specifications [Teifel 04, Wong 01]

translate CHP specs into pipelined implementations

Balsa/Haste ⇄ CDFG Conversion [Nielsen 04, Jensen 07]

main goal is to leverage synchronous tools for resource sharing
some peephole optimizations only

SLIDE 12

Background: Haste Language

Key language constructs:

channel reads / writes

IN?x / OUT!y

assignments

a := expr

sequential / parallel composition

A ; B / A || B

conditionals

if C then X else Y fi

loops

forever do ... od / for / while

&fifo = proc(IN? chan byte & OUT! chan byte).
begin & x: var byte ff |
  forever do IN?x; x:=x+1; OUT!x od
end

SLIDE 13

Background: Haste Compilation

&fifo = proc(IN? chan byte & OUT! chan byte).
begin & x: var byte ff |
  forever do IN?x; OUT!x od
end


Behavioral Spec ➜ Compiler ➜ Handshake Graph ➜ TechMap ➜ Netlist

A syntax-directed design flow for rapid development

SLIDE 14

Background: Haste Limitations

forever do
  IN?a; b:=f1(a); c:=f2(b); d:=f3(c); OUT!f4(d)
od


straightforward coding ➜ long critical cycles ➜ poor performance

SLIDE 15

Talk Outline

Introduction
Background
Basic Approach
Advanced Techniques
Results
Conclusion

SLIDE 16

Four step method:

  • 1. Input a behavioral specification
  • 2. Perform parallelization on statements
  • 3. Create a pipeline stage for each group of parallel statements
  • 4. Produce new code incorporating these optimizations

Basic Approach: Overview


proc(IN? chan byte & OUT! chan byte).
forever do
  IN?a;
  1: b:=a*2;  2: c:=b+5;  3: d:=a+b;
  4: e:=c+d;  5: f:=d*3;  6: g:=f+e;
  OUT!g
od

forever do (IN?a; OUT!<<a,a*2>>) od
...
forever do (IN?<<a,b>>; OUT!<<b+5,a+b>>) od
...
forever do (IN?<<a,b,c>>; OUT!<<c+d,d*3>>) od
...
forever do (IN?<<e,f>>; OUT!<<e+f>>) od

SLIDE 17

Increases instruction-level concurrency

statements are re-ordered or parallelized

Parallelizing Transformation

proc(IN? chan byte & OUT! chan byte).
forever do
  IN?a;
  b:=a*2;
  (c:=b+5 || d:=a+b);
  (e:=c+d || f:=d*3);
  g:=f+e;
  OUT!g
od

proc(IN? chan byte & OUT! chan byte).
forever do
  IN?a;
  1: b:=a*2;  2: c:=b+5;  3: d:=a+b;
  4: e:=c+d;  5: f:=d*3;  6: g:=f+e;
  OUT!g
od

Original Example

(c:=b+5 || d:=a+b); (e:=c+d || f:=d*3);

Reduced Latency!

SLIDE 18

forever do
  IN?a;
  1: b:=a*2;  2: c:=b+5;  3: d:=a+b;
  4: e:=c+d;  5: f:=d*3;  6: g:=f+e;
  OUT!g
od

forever do
  IN?a;
  b:=a*2;
  (c:=b+5 || d:=a+b);
  (e:=c+d || f:=d*3);
  g:=f+e;
  OUT!g
od

Parallelizing Transformation


Algorithm:
  1. Generate a dependence graph
  2. Perform a topological sort (group parallelizable statements)
  3. Sequence the parallel groupings
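The three steps can be sketched in Python (illustrative, not the paper's tool; the encoding of a statement as a (target, used-variables) pair is an assumption):

```python
# Illustrative sketch of the parallelization pass: build a flow-dependence
# graph over statements, then give each statement the earliest "level"
# after everything it depends on. Statements sharing a level compose
# with `||`. (Anti/output dependences are ignored here, which is safe
# only when every statement writes a distinct variable, as in the
# slide's running example.)
from collections import defaultdict

def parallelize(stmts):
    """stmts: list of (target_var, set_of_used_vars) in program order.
    Returns groups of targets that may execute in parallel, in sequence."""
    writer, deps = {}, {}
    for i, (target, uses) in enumerate(stmts):
        deps[i] = {writer[v] for v in uses if v in writer}  # flow deps
        writer[target] = i
    level = {}
    for i in range(len(stmts)):        # program order = topological order
        level[i] = 1 + max((level[d] for d in deps[i]), default=-1)
    groups = defaultdict(list)
    for i, lvl in level.items():
        groups[lvl].append(stmts[i][0])
    return [groups[lvl] for lvl in sorted(groups)]

# Running example: b:=a*2; c:=b+5; d:=a+b; e:=c+d; f:=d*3; g:=f+e
example = [("b", {"a"}), ("c", {"b"}), ("d", {"a", "b"}),
           ("e", {"c", "d"}), ("f", {"d"}), ("g", {"f", "e"})]
print(parallelize(example))   # [['b'], ['c', 'd'], ['e', 'f'], ['g']]
```

The resulting grouping matches the slide's parallelized code: (c||d) then (e||f).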

SLIDE 19

Parallelizing: What About Cycles?

Cycles are collapsed into atomic nodes
Parallelization is performed recursively

SLIDE 20

Pipelining Transformation

proc(IN? chan byte & OUT! chan byte).
forever do
  IN?a;
  1: b:=a*2;  2: c:=b+5;  3: d:=a+b;
  4: e:=c+d;  5: f:=d*3;  6: g:=f+e;
  OUT!g
od

Original Example

Allows execution to overlap
Control is distributed

Stage1 (IN? chan byte & OUT! chan byte).
forever do IN?a; OUT!<<a,a*2>> od
...
Stage2 (IN? chan byte & OUT! chan byte).
forever do IN?<<a,b>>; OUT!<<a,b,b+5>> od
...

Increased Throughput


SLIDE 21

Pipelining Transformation

Challenge: Modifying the flow of data

How to communicate data?

data needs to flow through channels, not variables

Which data to communicate?

transmit only necessary data (i.e., live values) to save area; we call this the context

SLIDE 22

Pipelining Transformation

Three step solution:

Compute IN-set:

all values produced in or prior to a stage

Compute OUT-set:

all values consumed in later stages

Compute context:

all values produced in or prior to this stage that are consumed in later stages

IN_1 = VAR_1;  IN_x = IN_{x-1} + VAR_x
OUT_N = Ø;  OUT_x = OUT_{x+1} + VAR_{x+1}
context_x = IN_x ∩ OUT_{x-1}
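As a sketch, the per-channel context can be computed from each stage's produced and consumed variable sets (Python; the indexing convention here is mine: the channel between stage i and stage i+1 carries everything produced at or before stage i that some later stage still consumes):

```python
# Sketch of the context computation: for each inter-stage channel, send
# exactly the live values -- produced upstream of the cut, consumed
# downstream of it. The (defs, uses) encoding of a stage is illustrative.
def contexts(stages):
    """stages: list of (defs, uses) sets in pipeline order.
    Returns one live-value set per channel between adjacent stages."""
    result = []
    for cut in range(1, len(stages)):
        produced = set().union(*(d for d, _ in stages[:cut]))
        consumed = set().union(*(u for _, u in stages[cut:]))
        result.append(produced & consumed)
    return result

# Stages of the running example: (IN?a; b:=a*2) | c:=b+5 | d:=a+b |
# e:=c+d | f:=d*3 | (g:=f+e; OUT!g)
stages = [({"a", "b"}, {"a"}), ({"c"}, {"b"}), ({"d"}, {"a", "b"}),
          ({"e"}, {"c", "d"}), ({"f"}, {"d"}), ({"g"}, {"e", "f"})]
live = contexts(stages)
# live == [{a,b}, {a,b,c}, {c,d}, {d,e}, {e,f}] -- the values each
# stage sends on the "Source to Source" slide.
```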


SLIDE 23

Pipelining: Connecting the Stages

Each stage is connected by communicating values across channels using channel actions

Connect a stage x with its successor

communicate the values contained in contextx+1

SLIDE 24

Pipelining: Source to Source

forever do (IN?a; OUT!<<a,a*2>>) od
...
forever do (IN?<<a,b>>; OUT!<<a,b,b+5>>) od
...
forever do (IN?<<a,b,c>>; OUT!<<c,a+b>>) od
...
forever do (IN?<<c,d>>; OUT!<<d,c+d>>) od
...
forever do (IN?<<d,e>>; OUT!<<e,d*3>>) od
...
forever do (IN?<<e,f>>; OUT!<<f+e>>) od

forever do
  IN?a;
  b:=a*2; c:=b+5; d:=a+b;
  e:=c+d; f:=d*3; g:=f+e;
  OUT!g
od


Form a new module for each stage

SLIDE 25

Pipelining: Reducing Control Overheads

Single large cycle ➜ several smaller cycles ➜ higher throughput

SLIDE 26

Stage 1 (IN? chan byte & OUT! chan byte).
forever do IN?a; OUT!<<a,a*2>> od
...
Stage 2 (IN? chan byte & OUT! chan byte).
forever do IN?<<a,b>>; OUT!<<b+5,a+b>> od
...


Combining Parallelization and Pipelining

proc(IN? chan byte & OUT! chan byte).
forever do
  IN?a;
  1: b:=a*2;  2: c:=b+5;  3: d:=a+b;
  4: e:=c+d;  5: f:=d*3;  6: g:=f+e;
  OUT!g
od

Original Example

Gain benefits of both optimizations

SLIDE 27

Talk Outline

Introduction
Background
Basic Approach
Advanced Techniques
  Arithmetic Optimization
  Handling Conditionals and Loops
  Communication Optimization
Results
Conclusion

SLIDE 28

Arithmetic Optimization

Perform parallelization and pipelining at a sub-statement level
3 specific optimizations:
  Balancing Expression Trees
  Expression Pipelining
  Operator Pipelining

SLIDE 29

Balancing Expression Trees

Restructures expressions into balanced trees

Essentially: parallelize at level of sub-expressions

Example:
  Original: q := a+b+c+d  (3 sequential additions; latency of 3 additions)
  Balanced: q := (a+b)+(c+d)  (2 parallel additions in sequence with a third; latency of 2 additions)

Reduced Latency
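A minimal sketch of tree balancing for an associative operator chain (Python; the nested-tuple expression encoding is my own):

```python
# Restructure a left-leaning chain like a+b+c+d into a balanced binary
# tree, cutting the critical path from n-1 sequential ops to ceil(log2(n)).
# Only valid for associative operators such as +.
def balance(terms, op="+"):
    if len(terms) == 1:
        return terms[0]
    mid = len(terms) // 2
    return (op, balance(terms[:mid], op), balance(terms[mid:], op))

def depth(expr):
    """Latency in operator levels along the critical path."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max(depth(expr[1]), depth(expr[2]))

left_assoc = ("+", ("+", ("+", "a", "b"), "c"), "d")   # q := a+b+c+d
balanced = balance(["a", "b", "c", "d"])               # q := (a+b)+(c+d)
print(depth(left_assoc), depth(balanced))   # 3 2
```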

SLIDE 30

Expression Pipelining

Decompose complex expressions into simpler ones

Essentially: pipelining at the expression level


Example:

Original

q:=a*b*c-d

Decomposed

q1:= a*b; q2:= q1*c; q:= q2-d

Reduced Cycle Time ➜ Higher Throughput
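The decomposition can be sketched as a tiny rewriter (Python; the temporary naming q1, q2, ... follows the slide, except that the final assignment keeps a numbered temporary rather than the slide's q; everything else is illustrative):

```python
# Expression pipelining: split a chained expression such as a*b*c-d into
# one two-operand assignment per pipeline stage, introducing temporaries.
def pipeline_expr(terms, ops):
    """terms: operands in order; ops: operators between them.
    Returns one simple assignment per stage."""
    stmts, current = [], terms[0]
    for i, (op, term) in enumerate(zip(ops, terms[1:]), start=1):
        temp = f"q{i}"
        stmts.append(f"{temp} := {current} {op} {term}")
        current = temp
    return stmts

print(pipeline_expr(["a", "b", "c", "d"], ["*", "*", "-"]))
# ['q1 := a * b', 'q2 := q1 * c', 'q3 := q2 - d']
```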

SLIDE 31

Operator Pipelining

Decompose a long-latency arithmetic operation into smaller pieces

Essentially: pipelining at the operator level


a := b + c  (one 64-bit addition)
  ➜  a1:=b1+c1;  a2:=b2+c2;  a3:=b3+c3;  a4:=b4+c4  (four 16-bit additions)
      with ripple carries: a2+=carry; a3+=carry; a4+=carry
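Functionally the decomposition computes the same sum; a Python model of the 64-bit to 4x16-bit split with ripple carries (illustrative, not circuit code):

```python
# Model of operator pipelining for addition: a 64-bit add performed as
# four 16-bit adds, each stage forwarding a carry to the next. In the
# circuit, each chunk addition becomes its own short pipeline stage.
def chunked_add(b, c, chunks=4, width=16):
    mask = (1 << width) - 1
    a, carry = 0, 0
    for i in range(chunks):
        bi = (b >> (i * width)) & mask
        ci = (c >> (i * width)) & mask
        s = bi + ci + carry          # 16-bit add plus incoming carry
        carry = s >> width           # carry out to the next stage
        a |= (s & mask) << (i * width)
    return a

x, y = 0xDEADBEEFCAFEF00D, 0x0123456789ABCDEF
assert chunked_add(x, y) == (x + y) & (2**64 - 1)
```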

SLIDE 32

Original:
  if a>b then y:=a; x:=y-b else y:=b; x:=y-a fi

Conditional Assignment:
  y := if a>b then a else b;
  x := if a>b then y-b else y-a

Conditional Constructs

Several options for handling conditionals

Conditional Assignment Late Decision (speculation) Early Decision


[Diagrams: Early Decision steers tokens through a split/merge controlled by the boolean; Late Decision forks to both branches and the boolean selects at the join (speculation).]

SLIDE 33

Handling Loops

Challenge: Significant performance bottleneck

Circuit-level pipelining cannot speed up single-token loops
Each loop acts as a single unpipelined high-latency stage

Our approach

Use parallelization + arithmetic optimization to lower loop latency
  decrease in latency = increase in overall throughput
Use loop unrolling to further help with parallelization
Transform into “multi-token” loops
  plan to incorporate in future [Gill, Hansen, Singh 06]

SLIDE 34

Communication Optimization

Challenge: Channel actions complicate optimizations

Unlike other statements, channel actions are tricky to reorder:
  besides dependencies within the module...
  ...also dependencies and synchronization with other modules

Solution:

Conservative approach: strictly maintain order of channel actions
Our proposed approach: safely re-order channel actions
  introduced a constraint to guarantee safety
  benefit: can lead to higher concurrency

SLIDE 35

Communication Optimization


Example: Benefit of reordering channel actions

M (Original):
forever do
  A?a; B?b; C?c;
  disc := b*b - 4*a*c;
  X!disc; Y!(2*a)
od

M (Optimized):
forever do
  (A?a || B?b || C?c);
  ( Y!(2*a) || disc := b*b - 4*a*c );
  X!disc
od

Outputs are produced earlier!

SLIDE 36

Communication Optimization

Module 1: receive a; send b      Module 2: send a; receive b

With the original order, the channel actions succeed.
Challenge: arbitrary re-orderings can introduce deadlock!

36

slide-37
SLIDE 37

Module 2 reordered: receive b; send a

Communication Optimization

Module 1: receive a; send b

After reordering, deadlock is introduced!
Challenge: arbitrary re-orderings can introduce deadlock!

SLIDE 38

Communication Optimization


Systematic approach for determining legal re-orderings:

Build a directed graph

  • 1. Make a node for each channel
  • 2. Add edges for data dependence
  • 3. Add edges for sequencing

A new sequencing is legal if the graph does not contain a cycle
  (cycle = deadlock)
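A sketch of the legality check (Python; the edge construction in the example follows slides 36-37, the code itself is illustrative):

```python
# Legality check for re-ordered channel actions: build a directed graph
# with a node per channel and edges for data dependence and sequencing
# (an edge u -> v means the action on u must complete before the action
# on v). A cycle means the proposed ordering can deadlock.
from collections import defaultdict

def is_legal(nodes, edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2          # DFS colors for cycle detection
    color = dict.fromkeys(nodes, WHITE)
    def has_cycle(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and has_cycle(v)):
                return True
        color[u] = BLACK
        return False
    return not any(color[n] == WHITE and has_cycle(n) for n in nodes)

# Slide 36: module 1 does "receive a; send b", module 2 "send a;
# receive b" -> both modules order a before b: legal.
assert is_legal(["a", "b"], [("a", "b"), ("a", "b")])
# Slide 37: module 2 re-ordered to "receive b; send a" adds b -> a,
# closing a cycle: illegal (deadlock).
assert not is_legal(["a", "b"], [("a", "b"), ("b", "a")])
```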

SLIDE 39

Talk Outline

Introduction
Background
Basic Approach
Advanced Techniques
Results
Conclusion

SLIDE 40

Experimentation Setup

Our approach implemented in Java

integrated as pre-processor into Haste flow

Simulation performed using Haste design flow

8 non-trivial examples

includes straight-line code, conditionals, and loops

Evaluated:

throughput, latency, area, design effort

SLIDE 41

[Chart: Parallelization. Throughput gains of 1.0x to 2.1x across add, comm, fir, ode, root, quad, tea1, tea2.]

[Chart: Parallelization+Pipelining. Throughput gains of 1.0x to 59.2x across the same benchmarks.]

[Chart: Arithmetic Pipelining. Additional throughput improvement of up to 5.2x for add, comm, fir, shown for 32-, 16-, 8-, and 4-bit operand chunks.]
2x throughput gain through parallelization
59x throughput gain through pipelining
5.2x additional throughput gain through arithmetic pipelining (overall: 290x)

SLIDE 42

Latency, Circuit Area, and Effort

[Chart: Latency, pipelined vs. parallel+pipelined, ranging from 0.1x to 6.2x across add, comm, fir, ode, root, quad, tea1, tea2.]

[Chart: Area Overhead, 0.9x to 1.4x across the benchmarks.]

[Chart: Reduction in Designer Effort, 19% to 95% across the benchmarks.]

Latency generally reduced by parallelization, and increased by pipelining
Area increases with depth of pipelining
Design effort ~20-95% lower

SLIDE 43

Conclusion

Developed a source-to-source compilation approach:

Powerful set of optimizations:
  parallelization & pipelining
  arithmetic & communication optimization
Throughput speedup of up to 59x
  ... up to 290x with arithmetic optimization
Or: 95% design effort reduction

Integrated into Haste design flow

Future Work:

full dataflow implementation
explore slack matching issues
loop pipelining
large example (simple processor)
