A Graphical Dataflow Programming Approach To High Performance Computing


SLIDE 1

A Graphical Dataflow Programming Approach To High Performance Computing

Somashekaracharya G. Bhaskaracharya, National Instruments Bangalore

SLIDE 2

Outline

  • Graphical Dataflow Programming
  • LabVIEW – Introduction and Demo
  • LabVIEW Compiler (under the hood)
  • Multicore Programming in LabVIEW
  • Polyhedral Compilation of Graphical Dataflow Programs

SLIDE 3

Evolution of Programming Languages

Binary → Assembly → Text-based (Fortran, Pascal) → C/C++ → C#, Java, Python, Ruby → LabVIEW

SLIDE 4

Graphical Dataflow vs. Imperative Programs

Imperative Programming

  • Computation specified as sequence of statements
  • Each statement changes the program state

// s = ut + 0.5*a*t*t
double displacement_in_time_t(double time, double initial_velocity, double acceleration) {
    double displacement = initial_velocity * time;
    displacement += 0.5 * acceleration * time * time;
    return displacement;
}

SLIDE 5

Graphical Dataflow vs. Imperative Programs

Imperative Programming

  • Computation specified as sequence of statements
  • Each statement changes the program state

// s = ut + 0.5*a*t*t
double displacement_in_time_t(double time, double initial_velocity, double acceleration) {
    double displacement = initial_velocity * time;
    displacement += 0.5 * acceleration * time * time;
    return displacement;
}

Graphical dataflow programming

  • No notion of statements
  • No fixed relative execution order
  • Referential transparency
SLIDE 6

Dataflow Execution Semantics

  • Interconnected set of nodes that represent specific computations
  • Nodes consume input data to produce output data
  • Nodes are ready to fire as soon as data is available on all of their inputs (a minimal sketch of this firing rule follows)
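
A minimal C sketch of the firing rule (illustrative only; this is not LabVIEW's actual scheduler, and the node names are hypothetical):

#include <stdio.h>
#include <stdbool.h>

typedef struct {
    const char *name;
    int inputs_needed;   /* number of input terminals       */
    int inputs_ready;    /* tokens that have arrived so far */
    bool fired;
} Node;

/* Deliver one input token; the node fires once all inputs are present. */
void deliver(Node *n) {
    n->inputs_ready++;
    if (n->inputs_ready == n->inputs_needed && !n->fired) {
        n->fired = true;
        printf("%s fires\n", n->name);
    }
}

int main(void) {
    Node multiply = {"Multiply", 2, 0, false};
    Node add = {"Add", 2, 0, false};

    deliver(&multiply);   /* one operand arrived: Multiply not ready yet  */
    deliver(&multiply);   /* both operands arrived: "Multiply fires"      */
    deliver(&add);
    deliver(&add);        /* "Add fires" once both of its inputs are here */
    return 0;
}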
SLIDE 7

Inherent Parallelism Of Dataflow Programs

Partially ordered program specification

  • Sequentiality enforced through data dependences

Possible orderings of node execution:

Strictly Sequential

  • Multiply < Square < TernaryMultiply < Add
  • Square < TernaryMultiply < Multiply < Add
  • Square < Multiply < TernaryMultiply < Add
SLIDE 8

Inherent Parallelism Of Dataflow Programs

Partially ordered program specification

  • Sequentiality enforced through data dependences
  • Compiler determines the granularity of parallelism

Possible orderings of node execution:

Strictly Sequential

  • Multiply < Square < TernaryMultiply < Add
  • Square < TernaryMultiply < Multiply < Add
  • Square < Multiply < TernaryMultiply < Add

Exploiting inherent parallelism (the first of these orderings is sketched below)

  • (Multiply || Square) < TernaryMultiply < Add
  • (Multiply || (Square < TernaryMultiply)) < Add
  • Square < (Multiply || TernaryMultiply) < Add
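
For instance, the ordering (Multiply || Square) < TernaryMultiply < Add can be expressed with OpenMP sections. A sketch, assuming the diagram computes s = u*t + 0.5*a*t*t as in the earlier displacement example (the input values are hypothetical):

#include <stdio.h>

int main(void) {
    double u = 2.0, a = 9.8, t = 3.0;   /* hypothetical inputs */
    double ut = 0.0, t2 = 0.0;

    /* Multiply || Square: no dependence between them */
    #pragma omp parallel sections
    {
        #pragma omp section
        { ut = u * t; }                  /* Multiply */
        #pragma omp section
        { t2 = t * t; }                  /* Square   */
    }
    double half_at2 = 0.5 * a * t2;      /* TernaryMultiply: needs Square */
    double s = ut + half_at2;            /* Add: needs both predecessors  */
    printf("s = %f\n", s);
    return 0;
}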
SLIDE 9

Memory Allocation in Graphical Dataflow

  • Valid to substitute an expression with its value at any point in program execution

Programmer’s perspective of memory allocation: each new output value in a new memory location

SLIDE 10

Memory Allocation in Graphical Dataflow

  • Valid to substitute an expression with its value at any point in program execution
  • Copy-avoidance strategies to reduce memory overhead
  • Output data is inplace to input data wherever possible (a C sketch follows)

Programmer’s perspective of memory allocation: each new output value in a new memory location. After copy-avoidance, only 3 memory allocations are needed.
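
A C sketch of the inplaceness idea (illustrative only; in LabVIEW the copy-avoidance decision is made by the compiler, and the functions here are hypothetical):

#include <stddef.h>

/* The increment writes its output into the very buffer that held its
 * input ("output is inplace to input"), so no new allocation is made. */
void increment_inplace(double *a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        a[i] += 1.0;
}

/* A literal reading of pure dataflow would allocate a fresh buffer: */
void increment_copy(const double *in, double *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] + 1.0;
}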

SLIDE 11

Copy-avoidance and Execution Schedule

  • TernaryMultiply < Multiply
  • Destructive update of MEM2
  • Pending read of MEM2
  • Cannot exploit parallelism
SLIDE 12

Copy-avoidance and Execution Schedule

With the copy-avoidance (destructive update of MEM2):
  • TernaryMultiply < Multiply
  • Destructive update of MEM2 while a read of MEM2 is pending
  • Cannot exploit parallelism

With no destructive update of MEM2, any order is valid:
  • TernaryMultiply < Multiply
  • TernaryMultiply || Multiply
  • TernaryMultiply > Multiply

Strong interplay between copy-avoidance, clumping and scheduling

SLIDE 13

Outline

  • Graphical Dataflow Programming
  • LabVIEW – Introduction and Demo
  • LabVIEW Compiler (under the hood)
  • Multicore Programming in LabVIEW
  • Polyhedral Compilation of Graphical Dataflow Programs

SLIDE 14

LabVIEW

  • Platform for graphical dataflow programming
  • Owned by National Instruments
  • G dataflow programming language
  • Editor, compiler, runtime and debugger
  • Supported on Windows, Linux, Mac
  • PowerPC and Intel architectures, FPGA

Measurement • Control • I/O • Deployable Math and Analysis • User Interface • Technology Integration

SLIDE 15

Scalable: From Kindergarten to Rocket Science

SLIDE 16

LabVIEW Program

  • LabVIEW program
  • Front Panel + Block Diagram
SLIDE 17

G Programming Language

  • Data types
  • Built-in types: integer and floating-point types, Boolean, string, etc.
  • Aggregate types: arrays, clusters, classes
  • Data manipulation through a built-in collection of primitives
  • Numeric palette (add, multiply, divide, subtract, etc.)
  • Array palette (Build array, Index array, Concatenate array, Decimate array, etc.)
SLIDE 18

G Programming Language – Control Constructs

  • Case Structure
  • One or more diagrams (cases)
  • Value wired to selector terminal for switching
  • Boolean, string, integer, or enumerated type (a C analogue follows)
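
A C analogue of a case structure (hypothetical function; in G, each case is a separate sub-diagram rather than a branch of a switch):

/* The value wired to the selector terminal picks exactly one case. */
double scale(int mode, double x) {
    switch (mode) {               /* selector terminal */
    case 0:  return x;            /* "0" case          */
    case 1:  return 2.0 * x;      /* "1" case          */
    default: return 0.0;          /* default case      */
    }
}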
SLIDE 19

G Programming Language – Control Constructs

Loop structures

  • While loop
  • Timed loop
  • For loop
  • LoopMax and LoopIndex boundary nodes
  • Loop carried data through shift registers
  • Tunnels (with optional indexing)

Shift registers to propagate data across iterations (a C analogue follows the tunnel notes below)

Unindexed tunnels propagate the same data every iteration

Indexed tunnels:
  • Array auto-indexing
  • Auto-accumulate iteration outputs
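
A C analogue of a for loop with a shift register (hypothetical example, not generated LabVIEW code):

double running_sum(const double *x, int n) {
    double shift = 0.0;        /* initial value wired to the left shift-register terminal */
    for (int i = 0; i < n; ++i)
        shift = shift + x[i];  /* loop-carried data: iteration i feeds iteration i + 1    */
    return shift;              /* value at the right terminal after the final iteration   */
}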
SLIDE 20

Outline

  • Graphical Dataflow Programming
  • LabVIEW – Introduction and Demo
  • LabVIEW Compiler (under the hood)
  • Multicore Programming in LabVIEW
  • Polyhedral Compilation of Graphical Dataflow Programs

SLIDE 21

LabVIEW Compiler

[Figure: a block diagram fed through the compiler yields raw x86 machine code. Excerpt:]

mov byte ptr [esi+29h],0
mov eax,dword ptr [esi+18h]
mov ebp,dword ptr [esi+14h]
mov dword ptr [esi+0Ch],eax
cmp byte ptr [esi+2Ah],1
je 0ABFFE0F
...
call SubrVIExit (24D6450h)
...

SLIDE 22

LabVIEW Compiler

  • Abstracts the complexities of programming
  • Memory management
  • Thread allocation
  • Language syntax
  • Edit-time semantic analysis
  • Compile on Load/Run/Save
SLIDE 23

Optimizing the LabVIEW Compiler

DataFlow Intermediate Representation (DFIR)

  • High-level graph-based representation
  • Preserves execution semantics, dataflow, parallelism, and structure hierarchy

  • Developed internally at NI

[Compilation pipeline: Block Diagram → DFIR (transforms) → Target Machine Code]

SLIDE 24

Optimizing the LabVIEW Compiler

DataFlow Intermediate Representation (DFIR)

  • High-level graph-based representation
  • Preserves execution semantics, dataflow, parallelism, and structure hierarchy

  • Developed internally at NI

Low-Level Virtual Machine (LLVM)

  • Low-level sequential representation
  • Knowledge of target machine characteristics
  • 3rd party, Open Source

[Compilation pipeline: Block Diagram → DFIR (transforms) → LLVM (transforms) → Target Machine Code]

SLIDE 25

What does DFIR look like?

SLIDE 26

DFIR Decomposition Transforms

  • Lowering high-level nodes and constructs into equivalent lower-level nodes

Feedback Node Decomposition

SLIDE 27

DFIR Optimization Transforms

Common Sub-expression Elimination


SLIDE 28

DFIR Optimization Transforms

Common Sub-expression Elimination

SLIDE 29

DFIR Optimization Transforms

Common Sub-expression Elimination; Unreachable Code Elimination (a combined C analogue follows)
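
A combined C analogue of the two transforms (hypothetical function; the LabVIEW transforms operate on DFIR graphs, not C text):

/* Before the transforms: */
int f(int a, int b) {
    int x = (a + b) * 2;   /* (a + b) * 2 computed here ...        */
    int y = (a + b) * 2;   /* ... and again: common sub-expression */
    if (0)                 /* condition is always false, so the    */
        x = 42;            /* assignment is unreachable            */
    return x + y;
}

/* After common-sub-expression and unreachable-code elimination
 * (conceptually): */
int f_opt(int a, int b) {
    int t = (a + b) * 2;   /* computed once */
    return t + t;
}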

SLIDE 30

DFIR Optimization Transforms

Loop Invariant Code Motion


SLIDE 31

DFIR Optimization Transforms

Loop Invariant Code Motion

SLIDE 32

DFIR Optimization Transforms

Loop Invariant Code Motion; Constant Folding

SLIDE 33

DFIR Optimization Transforms

Loop Invariant Code Motion; Constant Folding; Dead Code Elimination (a combined C analogue follows)
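
A combined C analogue of the three transforms (hypothetical function):

/* Before the transforms: */
void scale_all(double *a, int n) {
    for (int i = 0; i < n; ++i) {
        double k = 2.0 * 3.14;   /* constant-foldable and loop-invariant */
        double dead = k + 1.0;   /* never used: dead code                */
        a[i] = a[i] * k;
    }
}

/* After loop-invariant code motion, constant folding and dead-code
 * elimination (conceptually): */
void scale_all_opt(double *a, int n) {
    const double k = 6.28;       /* folded, then hoisted out of the loop */
    for (int i = 0; i < n; ++i)
        a[i] = a[i] * k;
}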

SLIDE 34

Outline

  • Graphical Dataflow Programming
  • LabVIEW – Introduction and Demo
  • LabVIEW Compiler (under the hood)
  • Multicore Programming in LabVIEW
  • Polyhedral Compilation of Graphical Dataflow Programs

SLIDE 35

Task Parallelism

  • Divide application into independent tasks
  • Tasks mapped to separate processors
SLIDE 36

Task Parallelism

  • Divide application into independent tasks
  • Tasks mapped to separate processors
  • Traditional text-based languages have sequential syntax
  • Difficult to visualize and organize in parallel form
  • Parallelism is more evident in graphical dataflow programs
  • Tasks as parallel sections of code on LabVIEW block diagram
  • No need to manage threads or their synchronization
SLIDE 37

Task Parallelism – An Example

  • Independent data acquisition tasks
  • Can be executed concurrently on a multicore processor

SLIDE 38

Task Parallelism – An Example With Pitfalls

  • Independent data acquisition tasks
  • Can be executed concurrently on a multicore processor
  • Tasks not truly parallel
  • Digital task depends on analog task

To maximize task parallelism, avoid unnecessary resource sharing

SLIDE 39

Multi-threaded LabVIEW Execution Environment

  • LabVIEW compiler identifies clumps
  • Parallel sections of code on block diagram
SLIDE 40

Multi-threaded LabVIEW Execution Environment

  • LabVIEW compiler identifies clumps
  • Parallel sections of code on block diagram
  • LabVIEW runtime maintains pool of execution threads
  • Pool size at least as large as the number of cores
  • While execution is sequential, some threads sleep
  • Idle threads are woken up as the degree of parallelism increases
SLIDE 41

Multi-threaded LabVIEW Execution Environment

  • LabVIEW compiler identifies clumps
  • Parallel sections of code on block diagram
  • LabVIEW runtime maintains pool of execution threads
  • Pool size at least as large as the number of cores
  • While execution is sequential, some threads sleep
  • Idle threads are woken up as the degree of parallelism increases
  • Threads cooperatively multitask across clumps
  • Clumps yield periodically to the scheduler
  • Waiting clumps get a chance to run
SLIDE 42

Data Parallelism

  • Split large dataset into smaller chunks
  • Operate on smaller chunks in parallel
  • Individual results are combined to obtain final result
SLIDE 43

Data Parallelism

  • Split large dataset into smaller chunks
  • Operate on smaller chunks in parallel
  • Individual results are combined to obtain final result
  • No data parallelism
  • Inefficient use of resources
SLIDE 44

Data Parallelism

  • Split large dataset into smaller chunks
  • Operate on smaller chunks in parallel
  • Individual results are combined to obtain final result
  • No data parallelism
  • Inefficient use of resources
  • Large dataset broken up into 4 subsets
  • Each core is engaged
  • Improved execution speed (an OpenMP sketch of the pattern follows)
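
An OpenMP stand-in for the pattern the diagram expresses graphically (a sketch; LabVIEW achieves this with parallel sub-diagrams, not OpenMP):

/* The iteration range is split into chunks across the cores, and the
 * reduction clause combines the per-core partial sums into the result. */
double parallel_sum(const double *x, long n) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i)
        total += x[i];
    return total;
}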
SLIDE 45

Data Parallelism in LabVIEW

  • Standard matmul operation in LabVIEW
  • No data parallelism being exploited
  • Long execution time for large datasets
SLIDE 46

Data Parallelism in LabVIEW

  • Standard matmul operation in LabVIEW
  • No data parallelism being exploited
  • Long execution time for large datasets
  • Data parallel matmul
  • Matrix1 divided into two halves
  • Concurrent matmul with each half
  • Individual results combined
SLIDE 47

Data Parallelism in LabVIEW

  • Standard matmul operation in LabVIEW
  • No data parallelism being exploited
  • Long execution time for large datasets
  • Data parallel matmul
  • Matrix1 divided into two halves
  • Concurrent matmul with each half
  • Individual results combined (a C sketch of the decomposition follows)
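
A C sketch of the row-wise decomposition (hypothetical helper; row-major n x n matrices assumed):

/* Computes rows [row_begin, row_end) of C = A * B. The two calls below
 * write disjoint rows of C, so they can safely run on separate cores. */
void matmul_rows(const double *A, const double *B, double *C,
                 int row_begin, int row_end, int n) {
    for (int i = row_begin; i < row_end; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Concurrent halves, one per core:
 *   matmul_rows(A, B, C, 0,     n / 2, n);   -- first half of Matrix1
 *   matmul_rows(A, B, C, n / 2, n,     n);   -- second half of Matrix1
 * The two halves of C together form the combined result. */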
SLIDE 48

Data Parallelism in the Real World

  • Matrix-vector multiplication in a real-time HPC application, e.g. a control system
  • Sensor measurements as vector input, on a per-loop basis
  • Matrix-vector result used to control actuators
  • Matrix-vector computation on 8 cores
SLIDE 49

Data Parallelism in the Real World

  • Matrix-vector multiplication in a real-time HPC application, e.g. a control system
  • Sensor measurements as vector input, on a per-loop basis
  • Matrix-vector result used to control actuators
  • Matrix-vector computation on 8 cores

LabVIEW program for plasma control in the ASDEX tokamak

  • Germany’s most advanced nuclear fusion platform
  • Compute-intensive matrix operations on an oct-core server
  • Real-time constraint of maintaining a 1 ms control loop

“In the first design stage... with LabVIEW, we obtained a 20X processing speedup on an octal-core processor machine over a single-core processor, while reaching our 1 ms control loop requirement.” -- Louis Giannone, lead researcher

SLIDE 50

Structured Grids

Near-neighbor dependences in time-iterated stencil computations

for (t = 1; t < T; ++t)
    for (i = 1; i < N; ++i)
        for (j = 1; j < N; ++j)
            grid[t][i][j] = f(grid[t-1][i-1][j], grid[t-1][i+1][j],
                              grid[t-1][i][j-1], grid[t-1][i][j+1]);

SLIDE 51

Structured Grids

Near-neighbor dependences in time-iterated stencil computations

  • Split into sub-grids
  • Compute them independently

for (t = 1; t < T; ++t)
    for (i = 1; i < N; ++i)
        for (j = 1; j < N; ++j)
            grid[t][i][j] = f(grid[t-1][i-1][j], grid[t-1][i+1][j],
                              grid[t-1][i][j-1], grid[t-1][i][j+1]);

SLIDE 52

Structured Grids

Near-neighbor dependences in time-iterated stencil computations

  • Split into sub-grids
  • Compute them independently
  • Each icon mapped to a separate core
  • Feedback nodes represent data exchange (an OpenMP sketch follows the loop nest below)

for (t = 1; t < T; ++t)
    for (i = 1; i < N; ++i)
        for (j = 1; j < N; ++j)
            grid[t][i][j] = f(grid[t-1][i-1][j], grid[t-1][i+1][j],
                              grid[t-1][i][j-1], grid[t-1][i][j+1]);
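
An OpenMP stand-in for the decomposition (a sketch: f is assumed to be an averaging stencil since the slide leaves it unspecified, the bounds are adjusted so all accesses stay in range, and only two time planes are kept instead of a full grid[T] array):

double f(double w, double e, double s, double n) {
    return 0.25 * (w + e + s + n);
}

void stencil(int T, int N, double cur[N][N], double next[N][N]) {
    for (int t = 1; t < T; ++t) {
        #pragma omp parallel for   /* rows split into bands, one band per core */
        for (int i = 1; i < N - 1; ++i)
            for (int j = 1; j < N - 1; ++j)
                next[i][j] = f(cur[i-1][j], cur[i+1][j],
                               cur[i][j-1], cur[i][j+1]);
        /* The implicit barrier here is where neighbouring sub-grids see
         * each other's boundary values -- the role the feedback nodes
         * play on the diagram. */
        #pragma omp parallel for
        for (int i = 1; i < N - 1; ++i)
            for (int j = 1; j < N - 1; ++j)
                cur[i][j] = next[i][j];
    }
}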

SLIDE 53

Pipelining

  • Divide inherently serial task into concrete stages
  • Execute stages in assembly-line fashion
  • No pipelining
  • Poor throughput
SLIDE 54

Pipelining

  • Divide inherently serial task into concrete stages
  • Execute stages in assembly-line fashion
  • No pipelining
  • Poor throughput
  • Pipelined execution
  • Improved throughput
SLIDE 55

Pipelining in LabVIEW

  • Sequential task in a loop, with 4 stages
  • Typical of streaming applications
  • FFTs manipulated one step at a time
SLIDE 56

Pipelining in LabVIEW

  • Sequential task in a loop, with 4 stages
  • Typical of streaming applications
  • FFTs manipulated one step at a time
  • Feedback nodes to separate pipeline stages

SLIDE 57

Pipelining in LabVIEW

  • Sequential task in a loop, with 4 stages
  • Typical of streaming applications
  • FFTs manipulated one step at a time
  • Feedback nodes to separate pipeline stages
  • Pipelined execution through shift registers
  • Each stage can be mapped to a separate core (a C analogue follows)
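
A sequential C analogue of the pipelined loop (the stage functions are hypothetical): the variables s1..s3 act like the shift registers between stages. Each statement reads only values produced in the previous iteration, which is exactly why, with one core per stage, the four stages could run concurrently.

#include <stdio.h>

double stage1(double x) { return x + 1.0; }
double stage2(double x) { return x * 2.0; }
double stage3(double x) { return x - 3.0; }
double stage4(double x) { return x / 4.0; }

int main(void) {
    double s1 = 0.0, s2 = 0.0, s3 = 0.0;  /* the "shift registers" between stages */
    for (int k = 0; k < 8; ++k) {
        double out = stage4(s3);  /* consumes what stage 3 produced last iteration */
        s3 = stage3(s2);
        s2 = stage2(s1);
        s1 = stage1((double)k);   /* a new sample enters the pipeline   */
        if (k >= 3)               /* the pipe is full after 3 iterations */
            printf("result for sample %d: %f\n", k - 3, out);
    }
    return 0;
}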

SLIDE 58

Pipelining – Important Concerns

Pipeline stages must be well-balanced

LabVIEW has built-in timing primitives for benchmarking

SLIDE 59

Pipelining – Important Concerns

Pipeline stages must be well-balanced

LabVIEW has built-in timing primitives for benchmarking

Avoid large data transfers between stages, across cores

  • Cores may not share cache
  • Data size could exceed cache size
SLIDE 60

Parallel For Loop for Iteration Parallelism

  • Concurrent execution of the iterations of a for loop in multiple threads
  • Greater CPU utilization

Auto-parallelization of for loop

SLIDE 61

Parallel For Loop for Iteration Parallelism

  • Concurrent execution of the iterations of a for loop in multiple threads
  • Greater CPU utilization

Auto-parallelization of for loop

SLIDE 62

Parallel For Loop for Iteration Parallelism

  • Concurrent execution of the iterations of a for loop in multiple threads
  • Greater CPU utilization

Auto-parallelization of for loop

  • Compiler generates multiple parallel loop instances
  • Each parallel loop instance represents an independently schedulable clump (an OpenMP analogue follows)
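
An OpenMP analogue of the parallel for loop (a sketch; LabVIEW generates clumps rather than OpenMP threads):

/* Each iteration is independent, so the runtime can hand different
 * chunks of the index range to different loop instances. */
void square_elements(double *a, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] = a[i] * a[i];
}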

SLIDE 63

Configuring Iteration Parallelism

SLIDE 64

Configuring Iteration Parallelism

Automatic iteration partitioning

  • Initial chunks of iterations are large (reduces scheduling overhead)
  • Chunk size gradually decreases (better load balancing)

SLIDE 65

Configuring Iteration Parallelism

Automatic iteration partitioning

  • Initial chunks of iterations are large (reduces scheduling overhead)
  • Chunk size gradually decreases (better load balancing)

Customized iteration partitioning

  • Wire a chunk size, or an array of chunk sizes, to the C terminal (an OpenMP analogue of the automatic policy follows)
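
An OpenMP analogue of the partitioning policies (a sketch; the mapping to LabVIEW's scheduler is approximate):

/* schedule(guided) mimics the automatic policy: chunks start large and
 * shrink as the loop nears completion. A fixed chunk size, like a single
 * value wired to the C terminal, corresponds to schedule(dynamic, chunk). */
void process(double *a, int n) {
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < n; ++i)
        a[i] = a[i] + 1.0;
}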

SLIDE 66

Iteration Parallelism – When to Use?

Loop must produce same result regardless of order of execution of iterations

Data carried across iterations through shift registers

SLIDE 67

Iteration Parallelism – When to Use?

Loop must produce same result regardless of order of execution of iterations

Data carried across iterations through shift registers

for (int i = 1; i < N; ++i)
    for (int j = 1; j < N; ++j)
        a[i][j] = a[i-1][j] + 1;

Can any loop be parallelized here?

SLIDE 68

Iteration Parallelism – When to Use?

Loop must produce same result regardless of order of execution of iterations

Data carried across iterations through shift registers

for (int i = 1; i < N; ++i)
    for (int j = 1; j < N; ++j)
        a[i][j] = a[i-1][j] + 1;

Can any loop be parallelized here?

SLIDE 69

Iteration Parallelism – When to Use?

Loop must produce same result regardless of order of execution of iterations

Data carried across iterations through shift registers

One iteration should not depend on the results of another

  • Writing A[i-1] in iteration i-1
  • Reading A[i-1] in iteration i

LabVIEW automatically does cross-iteration dependence analysis

  • VI breaks if dependences are violated

for (int i = 1; i < N; ++i)
    for (int j = 1; j < N; ++j)
        a[i][j] = a[i-1][j] + 1;

Can any loop be parallelized here? (The answer is sketched below.)
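
A sketch of the answer (using OpenMP to mark the parallel loop):

/* The dependence (write a[i][j], read a[i-1][j]) is carried only by the
 * outer i loop; iterations of the inner j loop touch disjoint elements,
 * so the inner loop is parallel. */
void update(int N, double a[N][N]) {
    for (int i = 1; i < N; ++i) {      /* must stay sequential */
        #pragma omp parallel for
        for (int j = 1; j < N; ++j)    /* safe to parallelize  */
            a[i][j] = a[i-1][j] + 1;
    }
}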

SLIDE 70

Outline

  • Graphical Dataflow Programming
  • LabVIEW – Introduction and Demo
  • LabVIEW Compiler (under the hood)
  • Multicore Programming in LabVIEW
  • Polyhedral Compilation of Graphical Dataflow Programs

SLIDE 71

Parallel For Loop Limitations

None of these loops can be parallelized

Loop-nest is inner parallel

SLIDE 72

Parallel For Loop Limitations

None of these loops can be parallelized

Loop-nest is inner parallel

SLIDE 73

Parallel For Loop Limitations

None of these loops can be parallelized

Loop-nest is inner parallel

SLIDE 74

Parallel For Loop Limitations

Loop skewing exposes the hidden parallelism

None of these loops can be parallelized

Loop-nest is inner parallel

SLIDE 75

Polyhedral Model - A Short Overview

  • Abstract mathematical representation
  • Convenient to reason about complex program transformations
  • Static Control Parts (SCoP), typically affine loop-nests
  • e.g. stencil computations, linear algebra kernels
SLIDE 76

Polyhedral Model - A Short Overview

  • Abstract mathematical representation
  • Convenient to reason about complex program transformations
  • Static Control Parts (SCoP), typically affine loop-nests
  • e.g. stencil computations, linear algebra kernels
SLIDE 77

Polyhedral Model - A Short Overview

  • Dynamic instances of a statement
  • Integer points inside a polyhedron
  • Iteration domain as a conjunction of affine inequalities involving surrounding loop iterators and global parameters (a worked example follows)

Figure: polyhedral representation of a loop-nest in geometrical and linear-algebraic form
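
A worked example, using the loop nest from the earlier parallel-for slides. For the statement S: a[i][j] = a[i-1][j] + 1 inside for (int i = 1; i < N; ++i) and for (int j = 1; j < N; ++j), the iteration domain is the set of integer points

    D_S = { (i, j) in Z^2 : 1 <= i <= N-1, 1 <= j <= N-1 }

i.e. the conjunction of the affine inequalities i - 1 >= 0, (N-1) - i >= 0, j - 1 >= 0 and (N-1) - j >= 0 over the loop iterators (i, j) and the global parameter N. For a fixed N, these inequalities bound a square polyhedron whose integer points are exactly the dynamic instances of S.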

SLIDE 78

Polyhedral model - a brief overview

  • A multi-dimensional affine schedule
  • Specifies the order in which the integer points need to be scanned
  • Maps each integer point to a multi-dimensional logical timestamp (think: hours, minutes, seconds)

Schedule of the statement instances is given by theta(i, j) = (i, j)

SLIDE 79

Polyhedral model - a brief overview

  • Array access information also encoded; accesses must be affine
  • Polyhedral optimizer/parallelizer:
  • Analyzes the dependences
  • Picks a schedule that does not violate dependences, using a cost model
  • PLuTo: minimize dependence distances in the transformed space
  • Optimizes parallelism and locality simultaneously
SLIDE 80

Polyhedral model - a brief overview

  • Array access information also encoded; accesses must be affine
  • Polyhedral optimizer/parallelizer:
  • Analyzes the dependences
  • Picks a schedule that does not violate dependences, using a cost model
  • PLuTo: minimize dependence distances in the transformed space
  • Optimizes parallelism and locality simultaneously

Schedule of the statement instances is given by theta(i, j) = (i, j)

SLIDE 81

Polyhedral model - a brief overview

  • Array access information also encoded; accesses must be affine
  • Polyhedral optimizer/parallelizer:
  • Analyzes the dependences
  • Picks a schedule that does not violate dependences, using a cost model
  • PLuTo: minimize dependence distances in the transformed space
  • Optimizes parallelism and locality simultaneously

Schedule of the statement instances is given by theta(i, j) = (i, j). New schedule: theta(i, j) = (i+j, j) (loop form sketched below)
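
In loop form, a sketch of what the skewed schedule means (assuming the classic two-way dependence pattern that motivates skewing; the array b is hypothetical):

/* theta(i, j) = (i, j): original order. Neither loop is parallel,
 * since point (i, j) needs both (i-1, j) and (i, j-1). */
for (int i = 1; i < N; ++i)
    for (int j = 1; j < N; ++j)
        b[i][j] = b[i-1][j] + b[i][j-1];

/* theta(i, j) = (i+j, j): skewed order. The outer loop scans wavefronts
 * t = i + j; all points on a wavefront are mutually independent, so the
 * inner loop is parallel. */
for (int t = 2; t <= 2 * (N - 1); ++t) {
    int jlo = (t - (N - 1) > 1) ? t - (N - 1) : 1;
    int jhi = (t - 1 < N - 1) ? t - 1 : N - 1;
    #pragma omp parallel for
    for (int j = jlo; j <= jhi; ++j) {
        int i = t - j;
        b[i][j] = b[i-1][j] + b[i][j-1];
    }
}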

SLIDE 82

Polyhedral compilation - some related work

Polyhedral compilation of imperative programs

  • Extract polyhedral representation, e.g. Clan (Cedric Bastoul et al.)
  • Polyhedral transformation, e.g. PLuTo (Uday Bondhugula et al.)
  • Generate transformed code, e.g. CLooG (Cedric Bastoul et al.)
  • Polyhedral compilation in production compilers, e.g. IBM-XL, RSTREAM

Polyhedral compilation of graphical dataflow programs?

  • Polyhedral extraction from dataflow programs
  • Synthesizing dataflow programs from polyhedral representation
SLIDE 83

Extracting Polyhedral Representation

  • Identifying statement analogues
  • Relating array accesses to a particular array allocation
  • Execution schedule depends on the actual inplaceness strategy

SLIDE 84

Static Control Dataflow Diagram (SCoD)

  • Canonical form of dataflow program
  • Inplaceness patterns that facilitate polyhedral extraction
  • No new memory allocation for array data inside the SCoD
  • Similarities with SCoP
  • All computation nodes are functional
  • Maximal dataflow diagram with countable loop constructs
  • Loop bounds and conditionals depend on parameters that are invariant for the diagram

SLIDE 85

SCoD – Destructive Updates

  • At most one destructive update of array data
SLIDE 86

Compute-dags as Statement Analogues

  • A schedule of nodes exists such that no array copy is needed
  • Hint: schedule all array reads ahead of the array write
  • SCoD as a sequence of computations that over-write incoming array data
  • Compute-dags can be identified to serve as statement analogues

SLIDE 87

Compute-dags as Statement Analogues

  • A path exists from all nodes in the compute-dag to the root
SLIDE 88

Iteration Domain of Statement Analogues

SLIDE 89

Determining Schedule of Statement Analogues

SLIDE 90

Analyzing Accesses of Statement Analogues

SLIDE 91

The PolyGLoT framework

SLIDE 92

Experimental evaluation

  • Implemented benchmarks from the Polybench suite in LabVIEW
  • PolyGLoT as a separate transform pass in the LabVIEW desktop compiler
  • Uses PLuTo as the polyhedral optimizer (locality transformations + parallelization)
  • Dual-socket Intel(R) Xeon(R) CPU E5606 (2.13 GHz) machine with 8 cores, 24 GB RAM, 8 MB L3 cache

SLIDE 93

Experimental evaluation

  • lv-parallel: LabVIEW production compiler, with parallelization
  • pg-par: LabVIEW compiler with PolyGLoT enabled for auto-parallelization
  • pg-loc-par: LabVIEW compiler with PolyGLoT enabled for auto-parallelization + locality optimization
  • Mean speed-up of 2.30× with pg-loc-par over lv-parallel
SLIDE 94

Summary

  • Graphical dataflow programming
  • Simple, intuitive and accessible to novice programmers
  • Well-suited for exploiting and expressing parallelism
  • Used by scientists and engineers in various domains
  • Optimizing and parallelizing LabVIEW compiler
  • Clumps of independently schedulable sections of code
  • Task parallelism, data parallelism, pipelining, etc.
  • Parallel for loop for cross-iteration parallelism
  • Polyhedral model for complex program transformations
SLIDE 95

Thanks!

Questions?