A Multi-Paradigm Approach to High-Performance Scientific Programming - PowerPoint PPT Presentation



SLIDE 1

A Multi-Paradigm Approach to High-Performance Scientific Programming

Pritish Jetley Parallel Programming Laboratory

SLIDE 2

What will the language of tomorrow look like?

  • Language support for modularity
  • Abstraction → Productivity
  • Runtime assistance
  • No sacrifice of performance

SLIDE 3

The future is now...

  • Charm++
  • PGAS (UPC, CAF, X10, Chapel, Fortress, ...)
  • MPI/PGAS hybrids

SLIDE 4

...or is it?

  • How abstract can languages be?
  • Can we reconcile program & language semantics?
  • Can we express algorithms naturally?

SLIDE 5

Our premise

  • Productivity comes from abstractions
  • Specialization of abstractions also yields better parallel performance
    – e.g. relaxed semantics in Global Arrays

SLIDE 6

Our approach

  • Plurality
  • Specialization
  • Interoperability
SLIDE 7

Our agenda

  • A complete set of incomplete, interoperable languages
  • Abstract, specialized languages
  • Completeness through interoperation

SLIDE 8

This talk

  • Productive message-driven programming (Charj)
  • Static data flow (Charisma)
  • Generative recursion (Divcon)
  • Tree-based algorithms (Distree)
  • Disciplined sharing of global data (MSA)

SLIDE 9

Productive message-driven programming with Charj

SLIDE 10

Charj

  • Charm++/Java = Charj
  • Keep the good bits of Charm++:
    – Overdecomposition onto migratable objects
    – Message-driven execution
    – Asynchrony
    – Intelligent runtime system (load balancing, message combination, etc.)
  • But use a source-to-source compiler to address its drawbacks

SLIDE 11

Compiler intervention for productivity

  • Automatically determines parallel interfaces

    // foo.ci
    entry void bar();
    // foo.h
    void bar();
    // foo.cpp
    void Foo::bar() { ... }

    // foo.cj
    void bar();

SLIDE 12

Compiler intervention for productivity

  • Automatically generates per-entry (de)serialization code

    class Particle {
      Vec3 position, accel, vel;
      Real mass, charge;
    }

    class Compute {
      void pairwise(Array<Particle> first, Array<Particle> second) {
        // only uses Particle position, charge
      }
    }

SLIDE 13

Compiler intervention for productivity

  • Semantic checking and type safety

    w.foo(); // “plain”: asynchronous
    x.foo(); // local: preempts
    y.foo(); // sync: blocks
    z.foo(); // array: multiple invocations

    // foo.ci
    readonly int n;
    // foo.cpp
    int n;
    ...
    n = 17; // bug (?)

SLIDE 14

Compiler intervention for productivity

  • Simple optimizations such as live variable analysis
    – Minimize checkpoint footprint
    – Find pertinent data to be offloaded to the GPU

SLIDE 15

Charm++ workflow

SLIDE 16

Charj workflow

SLIDE 17

Static data flow programming with Charisma

SLIDE 18

Expressive scope of Charisma

  • Structured grid methods
  • Wavefront computations
  • Dense linear algebra
  • Permutation
  • Multigrid (MG)
SLIDE 19

Charisma

  • Salient features
    – Object-oriented
    – Programmer decomposes work
    – Global view of data and control
    – Publish-consume model for data dependencies
    – Separation of parallel structure & serial code
    – Compiled into a message-driven Charm++ specification

SLIDE 20

A Charisma program

  • Orchestrates the interactions of collections of objects
SLIDE 21

Indexed collections of objects

  • Objects encapsulate data and work
    – Explicit specification of grain size and locality
    – Allows for adaptive overlap of communication/computation
    – Load balancing, checkpointing, etc.
  • Unit of work is a method invocation
SLIDE 22

Objects communicate by publishing and consuming values

SLIDE 23

Communication between objects

  • Method invocations publish and consume values
  • Publish-consume pattern → data dependencies
  • Parsed by the compiler to generate code

    (p) obj1.foo();   // publishes p
    obj2.bar(p);      // consumes p
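In shared memory, the publish-consume dependency above can be sketched with `std::promise`/`std::future`. This is only an illustrative analogue, not the Charisma runtime, which compiles the pattern into Charm++ message sends; the function name is hypothetical.

```cpp
#include <future>

// Illustrative shared-memory analogue of Charisma's publish-consume
// pattern: the "publish" side sets a value through a promise, and the
// "consume" side blocks on the matching future until it arrives.
int publish_consume() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    p.set_value(42);    // (p) obj1.foo();  -- publish
    return f.get();     // obj2.bar(p);     -- consume
}
```

The compiler's job in Charisma is to discover exactly this producer-to-consumer edge from the orchestration code and generate the message-driven plumbing.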
SLIDE 24

Parallelism across objects is specified via the foreach construct

SLIDE 25

Object parallelism

  • Invoke foo() on all objects in collection A

    ispace S = {0 : N-1 : 1};
    foreach (x, y in S * S) {
      A[x,y].foo();
    }

  • The ispace construct gives an index space
SLIDE 26

Section communication

  • Dense linear algebra (e.g. LU)

(figure: blocked LU steps – factorize, tri-solve, update – over matrix regions I, II, III)

SLIDE 27

LU in Charisma

    for (K = 0; K < N/g; K++) {
      ispace Trailing = {K+1 : N/g-1};
      // factorize diagonal block, and mcast
      (d) A[K,K].factorize();
      // update active panels, and mcast
      foreach (j in Trailing) {
        (c[j]) A[K,j].utri(d);   // row
        (r[j]) A[j,K].ltri(d);   // column
      }
      // trailing matrix update
      foreach (i, j in Trailing * Trailing) {
        A[i,j].update(r[i], c[j]);
      }
    }
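As a sequential reference point for the blocked Charisma LU above, here is a minimal unpivoted LU sketch (illustrative only, not the generated code): with one-element blocks, the factorize / tri-solve / update phases collapse into a single triple loop.

```cpp
#include <vector>

// In-place LU factorization without pivoting (Doolittle form).
// After the call, A holds U on and above the diagonal and the
// multipliers of L (unit diagonal implied) below it.
void lu_inplace(std::vector<std::vector<double>>& A) {
    const std::size_t n = A.size();
    for (std::size_t k = 0; k < n; k++) {           // "factorize" step at A[k][k]
        for (std::size_t i = k + 1; i < n; i++) {
            A[i][k] /= A[k][k];                     // "tri-solve" on the panel
            for (std::size_t j = k + 1; j < n; j++)
                A[i][j] -= A[i][k] * A[k][j];       // trailing-matrix update
        }
    }
}
```

The Charisma version distributes exactly these three phases over a 2-D collection of blocks, with multicasts carrying d, r[j], and c[j] between them.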

SLIDE 28

Others too...

  • Blelloch (work-efficient) scan
  • Multigrid (MG)
  • Pipelining (Gauss-Seidel)
  • Scatter-gather, reductions, multicasts (OpenAtom)
  • Other dense linear algebra (Gaussian elimination, forward/backward substitution, etc.)
  • Molecular dynamics (MD)
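For concreteness, here is a sequential sketch of the Blelloch work-efficient exclusive scan named above (an assumption of this note: input length is a power of two). The two passes are what Charisma would express as dependent foreach phases.

```cpp
#include <vector>

// Blelloch work-efficient exclusive prefix scan, in place.
// Assumes a.size() is a power of two.
void blelloch_scan(std::vector<int>& a) {
    const std::size_t n = a.size();
    // Up-sweep (reduce): build partial sums in place.
    for (std::size_t d = 1; d < n; d *= 2)
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d)
            a[i] += a[i - d];
    // Down-sweep: turn the partial sums into an exclusive scan.
    a[n - 1] = 0;
    for (std::size_t d = n / 2; d >= 1; d /= 2)
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            int t = a[i - d];
            a[i - d] = a[i];
            a[i] += t;
        }
}
```

Each inner loop is fully parallel, which is why the algorithm maps naturally onto a foreach over an index space.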
SLIDE 29

Expressing generative recursion with Divcon

SLIDE 30

Generative Recursion

  • Elegant
  • Intuitive
  • Implicit, tree-structured parallelism

SLIDE 31

Examples

  • Sorting, Closest pair
  • Convex hull, Delaunay triangulation
  • Adaptive quadrature, etc.
SLIDE 32

Recursive structure:

    f(A) = g(f(A1), f(A2), ..., f(An))

expressed as a let-expression over partitions pi of A:

    let A1 = f(p1(A)),
        A2 = f(p2(A)),
        ...
        An = f(pn(A))
    in g(A1, A2, ..., An);

SLIDE 33

Data movement from A to the sub-problems A1, A2, ..., An:

  • memcpy in shared-memory systems
  • Network communication in distributed memory!
SLIDE 34

Quicksort

    Array<int> qsort(Array<int> A) {
      if (A.length() <= THRESH)
        return seq_sort(A);
      Array<int> LT, EQ, GT;
      int pivot = A[rand(0, A.length())];
      (LT, EQ, GT) = {partition(A, pivot)};
      return concat(qsort(LT), EQ, qsort(GT));
    }
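A sequential C++ analogue of the Divcon quicksort makes the generative structure explicit (a sketch; the threshold value and the deterministic middle pivot are assumptions, not the Divcon defaults):

```cpp
#include <algorithm>
#include <vector>

// Three-way quicksort mirroring the Divcon version: partition into
// LT / EQ / GT around a pivot, recurse on LT and GT, then concatenate.
std::vector<int> qsort3(std::vector<int> a) {
    const std::size_t kThresh = 8;          // grain-size threshold (assumed)
    if (a.size() <= kThresh) {
        std::sort(a.begin(), a.end());      // seq_sort at the leaves
        return a;
    }
    int pivot = a[a.size() / 2];
    std::vector<int> lt, eq, gt;
    for (int v : a) {
        if (v < pivot)       lt.push_back(v);
        else if (v == pivot) eq.push_back(v);
        else                 gt.push_back(v);
    }
    std::vector<int> out = qsort3(std::move(lt));
    out.insert(out.end(), eq.begin(), eq.end());
    std::vector<int> hi = qsort3(std::move(gt));
    out.insert(out.end(), hi.begin(), hi.end());
    return out;
}
```

In Divcon the two recursive calls run as independent tasks; the partition step is where the redistribution cost discussed on the next slides arises.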

SLIDE 35

Significant redistribution costs

(figure: redistribution of array elements around a pivot)

SLIDE 36

Parallel execution

(figure: recursion tree – each level redistributes data into LT/GT sub-arrays until sequential sort is reached at the leaves)

SLIDE 37

Delayed data redistribution

  • Amortize redistribution costs over several recursive invocations
  • Reduces communication
  • But lowers concurrency

SLIDE 38

Best of both worlds

(flowchart: on receiving partition data, adaptive grain-size control decides whether to delay the shuffle – if a shuffle is feasible, shuffle the partition data; otherwise continue serial computation on the leaves)

SLIDE 39

Allows consolidation

  • Redistribution delay → several (new) arrays distributed across the same section of containers
  • If operation-issuing tasks are kept on the same PE, issued operations may be consolidated
  • Consolidated operations are applied together on the target arrays

SLIDE 40

Allows consolidation

(figure: quicksort performance on 256 BG/P cores)

SLIDE 41

A framework for expressing tree-based algorithms

SLIDE 42

Tree-based algorithms

  • Structural (as opposed to generative) recursion
  • N-body codes, granular dynamics, SPH, ...
  • Distributed tree + recursive traversal procedure

SLIDE 43

Data decomposition

  • Spatial entities
  • Compact spatial partitioning of data over chares
SLIDE 44

Distributed tree

  • Global, distributed tree
  • Spatial entities
  • Compact spatial partitions

SLIDE 45

“Chunked” distribution of data

(figure: global tree distributed in chunks across TreePieces)

SLIDE 46

Algorithm comprises concurrent traversals on pieces

  • Visitor + Iterator pattern
  • Visitor defines
    – node()
    – localLeaf()
    – remoteLeaf()
  • Iterate over nodes using a traversal
    – Order decided by the traversal
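The Visitor + Iterator split can be sketched as follows (illustrative names, not the Distree API): the traversal owns the visiting order, while the visitor decides per node whether to descend and what to do at leaves.

```cpp
// Minimal sketch of a traversal parameterized by a visitor.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
};

struct SumVisitor {
    int sum = 0;
    // Return true to "open" the node, i.e. descend into its children.
    bool node(const Node*) { return true; }
    void leaf(const Node* n) { sum += n->value; }
};

template <typename Visitor>
void traverse(Node* n, Visitor& v) {   // top-down, depth-first order
    if (!n) return;
    if (!n->left && !n->right) { v.leaf(n); return; }
    if (!v.node(n)) return;            // visitor pruned this subtree
    traverse(n->left, v);
    traverse(n->right, v);
}
```

In the framework the same shape holds over the distributed tree, with localLeaf()/remoteLeaf() distinguishing leaves that live on this PE from those fetched over the network.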

SLIDE 47

Traversal with reuse

(figure: PE1/PE2 – a traversal on a TreePiece requests a remote node; the software cache sends a request message to the owner TreePiece if the data is not present locally, the owner responds with a subtree, and a callback fires – immediately if the data is already local)

SLIDE 48

Barnes-Hut control flow

    for (int iteration = 0; iteration < parameters.nIterations; iteration++) {
      // decompose particles onto tree pieces
      decomposerProxy.decompose(universe, CkCallbackResumeThread());
      // build local trees & submit to framework
      treePieceProxy.build(CkCallbackResumeThread());
      // merge trees
      mdtHandle.syncToMerge(CkCallbackResumeThread());
      ...
    }

SLIDE 49

Barnes-Hut control flow

    for (int iteration = 0; iteration < parameters.nIterations; iteration++) {
      ...
      // initialize traversals
      topdown.synch(mdtHandle, CkCallbackResumeThread());
      bottomup.synch(mdtHandle, CkCallbackResumeThread());
      // start gravity and SPH computations
      treePieceProxy.gravity(CkCallback(CkReductionTarget(gravityDone), thisProxy));
      treePieceProxy.sph(CkCallback(CkReductionTarget(sphDone), thisProxy));
      // done with traversal
      topdown.done(CkCallbackResumeThread());
      bottomup.done(CkCallbackResumeThread());
      ...
    }

SLIDE 50

Barnes-Hut control flow

    for (int iteration = 0; iteration < parameters.nIterations; iteration++) {
      ...
      // integrate particle trajectories
      treePieceProxy.integrate(CkCallbackResumeThread((void *&)result));
      // delete distributed tree
      mdtHandle.syncToDelete(CkCallbackResumeThread());
    }

SLIDE 51

Visitor code

    class BarnesHutVisitor {
      bool node(const Node *n) {
        bool doOpen = open(leaf_, n);
        if (!doOpen) {
          gravity(n);
          return false;
        }
        return true;
      }
      ...
    };

SLIDE 52

Visitor code

    class BarnesHutVisitor {
      void localLeaf(Key sourceKey, const Particle *sources, int nSources) {
        gravity(sources, nSources);
      }
      void remoteLeaf(Key sourceKey, const RemoteParticle *sources, int nSources) {
        gravity(sources, nSources);
      }
    };
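The open() test used by node() above is not shown in the slides; a common Barnes-Hut opening criterion (an assumption here, not necessarily the one this code uses) compares a node's size-to-distance ratio against an accuracy parameter theta:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Open (descend into) a tree node when it subtends too large an angle
// from the target leaf: size/distance > theta. Smaller theta = more
// accurate, more node openings.
bool open(const Vec3& leafCenter, const Vec3& nodeCenter,
          double nodeSize, double theta = 0.5) {
    double dx = nodeCenter.x - leafCenter.x;
    double dy = nodeCenter.y - leafCenter.y;
    double dz = nodeCenter.z - leafCenter.z;
    double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return nodeSize / dist > theta;
}
```

When open() returns false, the node's multipole moments stand in for its particles and gravity(n) is applied directly, which is what makes the traversal O(N log N).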

SLIDE 53

Distributed Shared Array programming with MSA

slide-54
SLIDE 54

54

Multiphase Shared Arrays (MSA)

  • Disciplined shared address space abstraction
  • Dynamic modes of operation: Read-only, Write-exclusive, Accumulate

slide-55
SLIDE 55

55

MSA Model

(figure: mapping of the shared array across PEs 0–3)

slide-56
SLIDE 56

56

Parallel histogramming in MSA

  • Two MSAs, A and Bins
  • A in read mode, Bins in accumulate mode

    MSA1D<double> A;
    MSA1D<int> Bins;
    MSA1D::Read rd = A.getInitialRead();
    MSA1D::Accum acc = Bins.getInitialAccum();
    for (int x = myStart; x < myStart + myNumElts; x++) {
      acc(getBin(rd.get(x))) += 1;
    }
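The phase discipline above can be illustrated with a plain sequential histogram (a sketch with assumed names, not the MSA API): one read-only pass over the data, with the bins touched only through increments, which is exactly what accumulate mode licenses concurrently.

```cpp
#include <vector>

// Sequential analogue of the MSA histogram: read pass over `a`,
// accumulate pass into `bins`. In MSA, many workers run this loop
// concurrently and the runtime merges the += contributions.
std::vector<int> histogram(const std::vector<double>& a,
                           int nBins, double lo, double hi) {
    std::vector<int> bins(nBins, 0);
    for (double x : a) {
        int b = static_cast<int>((x - lo) / (hi - lo) * nBins);
        if (b == nBins) b = nBins - 1;   // clamp the upper edge into the last bin
        bins[b] += 1;                    // "accumulate" mode: increments only
    }
    return bins;
}
```

Because accumulation is commutative and associative, workers need no locks; the mode change (read → accumulate) is the only synchronization point.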

slide-57
SLIDE 57

57

Compiler and Runtime optimizations

  • Strip mining (Charj)
  • Bipartite graph-based optimal placement
  • Message combining
slide-58
SLIDE 58

58

Conclusion

  • Ecosystem of specialized languages
    – Productivity and performance
    – Higher-level constructs
  • Common runtime substrate for interoperability
    – Completeness of expression