

SLIDE 1

Towards High-Level Execution Primitives for And-parallelism: Preliminary Results

Amadeo Casas¹, Manuel Carro², Manuel Hermenegildo¹,²

¹University of New Mexico (USA)   ²Technical University of Madrid (Spain)

CICLOPS’07 - September 8th

Casas, Carro, Hermenegildo (UNM, UPM) Towards High-Level Execution Primitives. . . CICLOPS’07 - September 8th 1 / 1

SLIDE 2

Introduction

Introduction and motivation

Parallelism is (finally!) becoming mainstream thanks to multicore architectures – even on laptops!

Declarative languages are interesting for parallelization:

◮ The program is close to the problem description.
◮ The notion of control provides more flexibility.
◮ Amenability to semantics-preserving automatic parallelization.

There is significant previous work in logic and functional programming. Two objectives in this work:

◮ A new, efficient, and more flexible approach for exploiting (unrestricted) and-parallelism in LP.
◮ Taking advantage of new automatic parallelization for LP.


SLIDE 4

Introduction

Types of parallelism in LP

Two main types:

◮ Or-parallelism: explores alternative computation branches in parallel.
◮ And-parallelism: executes procedure calls in parallel.
  ⋆ Traditional parallelism: parbegin-parend, loop parallelization, divide-and-conquer, etc.
  ⋆ Often marked with the &/2 operator: fork-join nested parallelism.

Example (QuickSort: sequential and parallel versions)

Sequential:

    qsort([], []).
    qsort([X|L], R) :-
        partition(L, X, SM, GT),
        qsort(GT, SrtGT),
        qsort(SM, SrtSM),
        append(SrtSM, [X|SrtGT], R).

Parallel:

    qsort([], []).
    qsort([X|L], R) :-
        partition(L, X, SM, GT),
        qsort(GT, SrtGT) & qsort(SM, SrtSM),
        append(SrtSM, [X|SrtGT], R).

We will focus on and-parallelism.

◮ Need to detect independent tasks.
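Outside LP, the same fork-join pattern can be mimicked with explicit futures. Below is a minimal Python sketch (an analogy only, not the Ciao machinery): the GT half is forked as a separate task while the current thread sorts the SM half, mirroring qsort(GT, SrtGT) & qsort(SM, SrtSM):

```python
from concurrent.futures import ThreadPoolExecutor

def qsort(lst, pool=None):
    """Quicksort; with a pool, the GT half is sorted in a parallel task."""
    if not lst:
        return []
    x, rest = lst[0], lst[1:]
    sm = [e for e in rest if e < x]    # partition(L, X, SM, GT)
    gt = [e for e in rest if e >= x]
    if pool is None:                   # sequential version
        return qsort(sm) + [x] + qsort(gt)
    fut_gt = pool.submit(qsort, gt)    # "&": fork qsort(GT, SrtGT)
    srt_sm = qsort(sm)                 # current thread runs qsort(SM, SrtSM)
    return srt_sm + [x] + fut_gt.result()  # join the forked half

with ThreadPoolExecutor(max_workers=4) as pool:
    print(qsort([3, 1, 4, 1, 5, 9, 2, 6], pool))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

The independence condition holds trivially here: sm and gt are disjoint lists, so the two tasks share no data.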

SLIDE 5

Introduction

Background: parallel execution and independence

Correctness: same results as the sequential execution.
Efficiency: execution time ≤ that of the sequential program (no slowdown), assuming parallel execution has no overhead.

The same computation in three paradigms:

    Imperative:      s1: Y := W+2;   s2: X := Y+Z;
    Functional:      (+ (+ W 2) Z)
    CLP:             Y = W+2, X = Y+Z

And in LP:

    main :-
        p(X),       % s1
        q(X),       % s2
        write(X).

    p(X) :- X = [1,2,3].

    q(X) :- X = [], large_computation.
    q(X) :- X = [1,2,3].

Fundamental issue: p affects q (it prunes q's choices).

◮ Running q ahead of p is speculative.

Independence: correctness + efficiency.


SLIDE 6

Introduction

Related work and proposed solution

Versions of and-parallelism previously implemented: &-Prolog, &-ACE, AKL, Andorra-I, ... They rely on complex low-level machinery:

◮ Each agent: new WAM instructions, goal stack, parcall frames, markers, etc.

Current implementation for shared-memory multiprocessors:

◮ Each agent: sequential Prolog machine + goal list + (mostly) Prolog code.

Approach: raise components to the source language level:

◮ Prolog-level: goal publishing, goal searching, goal scheduling, “marker” creation (through choice-points), ...
◮ C-level: low-level threading, locking, stack management, sharing of memory, untrailing, ...

→ Simpler machinery and more flexibility.


SLIDE 7

Introduction

Ciao and CiaoPP

Ciao: a new-generation multi-paradigm language.

◮ Supports ISO-Prolog (as a library).
◮ Predicates, functions (including laziness), constraints, higher-order, objects, tabling, etc.
◮ Parallel, concurrent, and distributed execution primitives.

Preprocessor / environment (CiaoPP):

◮ Infers many properties such as types, pointer aliasing, non-failure, determinacy, termination, data sizes, cost, etc.
◮ Performs automatic verification of program assertions (and bug detection if assertions are proved false).
◮ Performs automatic parallelization and automatic granularity control.

SLIDE 8

Automatic Parallelization

CDG-based automatic parallelization

Conditional Dependency Graph: [TOPLAS’99, JLP’99]

◮ Vertices: possible sequential tasks (statements, calls, etc.).
◮ Edges: conditions needed for independence (e.g., variable sharing).

Local or global analysis removes checks in the edges. Annotation converts the graph back to (now parallel) source code.

    foo(...) :- g1(...), g2(...), g3(...).

[Figure: the CDG for foo — vertices g1, g2, g3; initial edges labelled icond(1−2), icond(1−3), icond(2−3). Local/global analysis and simplification leaves a single condition, test(1−3).]

Annotation produces:

    ( test(1-3) -> ( g1, g2 ) & g3
    ; g1, ( g2 & g3 )
    )

Alternative (unconditional) annotation:

    g1, ( g2 & g3 )
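To make the graph construction concrete, here is a hedged Python sketch (a hypothetical helper, not CiaoPP code): each goal is paired with the variables analysis says it may still touch, and an icond edge is kept exactly for the pairs whose variable sets may intersect:

```python
from itertools import combinations

# foo(...) :- g1(...), g2(...), g3(...)
# Each goal paired with the variables it may still share after analysis.
goals = [("g1", {"X"}), ("g2", {"Y"}), ("g3", {"X", "Z"})]

def cdg_edges(goals):
    """Pairs of goals that need a runtime independence check (icond)."""
    return [(a, b) for (a, va), (b, vb) in combinations(goals, 2)
            if va & vb]  # possible variable sharing => keep the condition

print(cdg_edges(goals))  # [('g1', 'g3')] -- only g1 and g3 may share X
```

Pairs with provably disjoint variable sets drop out entirely, which is the "local/global analysis and simplification" step of the figure.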


SLIDE 9

Flexible Parallelism Primitives

An alternative, more flexible source code annotation

The classical parallelism operator &/2 provides nested fork-join. However, more flexible constructions can be used to denote parallelism:

◮ G &> HG — schedules goal G for parallel execution and continues executing the code after G &> HG.
  ⋆ HG is a handler which contains / points to the state of goal G.
◮ HG <& — waits for the goal associated with HG to finish.
  ⋆ At that point the goal has produced a solution; bindings for the output variables are available.

Operator &/2 can then be written as:

    A & B :- A &> H, call(B), H <&.

Optimized deterministic versions: &!>/2 and <&!/1.
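In mainstream terms (a rough analogy assuming deterministic goals; the real primitives also manage Prolog bindings and backtracking), G &> H resembles submitting a task and receiving a future, and H <& resembles waiting on that future:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def fork(goal, *args):
    """G &> H : publish goal G for parallel execution, return a handler."""
    return pool.submit(goal, *args)

def join(handler):
    """H <& : wait for the goal behind the handler to produce its result."""
    return handler.result()

def conj(a, b):
    """A & B :- A &> H, call(B), H <&."""
    h = fork(a)          # A &> H
    rb = b()             # call(B) in the current thread
    return join(h), rb   # H <&

print(conj(lambda: sum(range(10)), lambda: 2 ** 5))  # (45, 32)
```

Note how conj/2 mirrors the defining clause above: fork the left conjunct, run the right one locally, then join.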


SLIDE 10

Flexible Parallelism Primitives

Expressing more parallelism

More parallelism can be exploited with these primitives. Take the sequential code below (dependency graph: b(X) depends on a(X,Z), and d(Y,Z) depends on both a(X,Z) and c(Y)) and three possible parallelizations:

Sequential:

    p(X,Y,Z) :-
        a(X,Z),
        b(X),
        c(Y),
        d(Y,Z).

Restricted IAP (two alternatives):

    p(X,Y,Z) :-
        a(X,Z) & c(Y),
        b(X) & d(Y,Z).

    p(X,Y,Z) :-
        c(Y) & (a(X,Z), b(X)),
        d(Y,Z).

Unrestricted IAP:

    p(X,Y,Z) :-
        c(Y) &> Hc,
        a(X,Z),
        b(X) &> Hb,
        Hc <&,
        d(Y,Z),
        Hb <&.

In this case, the unrestricted parallelization is at least as good (time-wise) as any restricted one, assuming no overhead.
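The claim can be checked with a back-of-the-envelope schedule computation. Assuming hypothetical task durations (chosen purely for illustration), unlimited processors, and zero overhead, the critical path of the unrestricted schedule is never longer:

```python
# Hypothetical durations for the goals of p/3.
t = {"a": 1, "b": 3, "c": 2, "d": 1}

# Restricted IAP: (a & c), then (b & d) -- a barrier between the stages.
restricted = max(t["a"], t["c"]) + max(t["b"], t["d"])

# Unrestricted IAP: d may start as soon as a and c are done (it needs Z
# and Y); b only has to finish before p/3 returns.
start_d = max(t["a"], t["c"])
unrestricted = max(start_d + t["d"],  # critical path ending in d
                   t["a"] + t["b"])   # critical path a -> b

print(restricted, unrestricted)  # 5 4
```

Here the fork-join barrier forces d to wait for b, while the unrestricted schedule lets them overlap, hence 4 time units instead of 5.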


SLIDE 11

Shared-Memory Implementation

Low-level support

Low-level parallelism primitives:

    apll:push_goal(+Goal, +Det, -Handler)
    apll:find_goal(-Handler)
    apll:goal_available(+Handler)
    apll:retrieve_goal(+Handler, -Goal)
    apll:goal_finished(+Handler)
    apll:set_goal_finished(+Handler)
    apll:waiting(+Handler)

Synchronization primitives:

    apll:suspend
    apll:release(+Handler)
    apll:release_some_suspended_thread
    apll:enter_mutex(+Handler)
    apll:enter_mutex_self
    apll:release_mutex(+Handler)
    apll:release_mutex_self
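The intent behind these primitives can be sketched in Python (a simplified model, not the actual C implementation: one shared goal list protected by a lock, with a condition variable playing the role of suspend/release):

```python
import threading
from collections import deque

class GoalList:
    """Shared work pool behind push_goal / find_goal / suspend / release."""
    def __init__(self):
        self.goals = deque()
        self.lock = threading.Lock()                  # enter/exit mutex
        self.nonempty = threading.Condition(self.lock)

    def push_goal(self, goal):
        """push_goal followed by release_some_suspended_thread."""
        with self.nonempty:
            self.goals.append(goal)
            self.nonempty.notify()                    # wake one suspended agent

    def find_goal(self, timeout=None):
        """find_goal; suspends while no work is available."""
        with self.nonempty:
            while not self.goals:
                if not self.nonempty.wait(timeout):   # suspend
                    return None                       # timed out: no work
            return self.goals.popleft()               # retrieve_goal

gl = GoalList()
gl.push_goal(lambda: 6 * 7)
print(gl.find_goal()())  # 42
```

The mutex/condition pairing is the same discipline the Prolog-level algorithms on the next slides follow: take the lock, test for work, and either run a goal or suspend.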


SLIDE 12

Shared-Memory Implementation

Prolog-level algorithms (I)

Thread creation:

    create_agents(0) :- !.
    create_agents(N) :-
        N > 0,
        conc:start_thread(agent),
        N1 is N - 1,
        create_agents(N1).

    agent :-
        apll:enter_mutex_self,
        ( find_goal_and_execute -> true
        ; apll:exit_mutex_self,
          apll:suspend
        ),
        agent.

High-level goal publishing:

    Goal &!> Handler :-
        apll:push_goal(Goal, det, Handler),
        apll:release_some_suspended_thread.


SLIDE 13

Shared-Memory Implementation

Prolog-level algorithms (II)

Performing goal joins:

    Handler <&! :-
        apll:enter_mutex_self,
        ( apll:goal_available(Handler) ->
            apll:retrieve_goal(Handler, Goal),
            apll:exit_mutex_self,
            call(Goal)
        ; apll:exit_mutex_self,
          perform_other_work(Handler)
        ).

    perform_other_work(Handler) :-
        apll:enter_mutex_self,
        ( apll:goal_finished(Handler),
          apll:exit_mutex_self
        ; ( find_goal_and_execute -> true
          ; apll:exit_mutex_self,
            apll:suspend
          ),
          perform_other_work(Handler)
        ).


SLIDE 14

Shared-Memory Implementation

Prolog-level algorithms (III)

Search for parallel goals:

    find_goal_and_execute :-
        apll:find_goal(Handler),
        apll:exit_mutex_self,
        apll:retrieve_goal(Handler, Goal),
        call(Goal),
        apll:enter_mutex(Handler),
        apll:set_goal_finished(Handler),
        ( apll:waiting(Handler) ->
            apll:release(Handler)
        ; true
        ),
        apll:exit_mutex(Handler).


SLIDE 15

(Preliminary) Performance Results

(Preliminary) performance results for restricted and-parallelism (I)

Speedups (Seq. = sequential execution; 1–8 = number of processors):

    Benchmark       Seq.     1     2     3     4     5     6     7     8
    AIAKL           1.00  0.97  1.77  1.66  1.67  1.67  1.67  1.67  1.67
    Ann             1.00  0.98  1.86  2.65  3.37  4.07  4.65  5.22  5.90
    Boyer           1.00  0.32  0.64  0.95  1.21  1.32  1.47  1.57  1.64
    BoyerGC         1.00  0.90  1.74  2.57  3.15  3.85  4.39  4.78  5.20
    Deriv           1.00  0.32  0.61  0.86  1.09  1.15  1.30  1.55  1.75
    DerivGC         1.00  0.91  1.63  2.37  3.05  3.69  4.21  4.79  5.39
    FFT             1.00  0.61  1.08  1.30  1.63  1.65  1.67  1.68  1.70
    FFTGC           1.00  0.98  1.76  2.14  2.71  2.82  2.99  3.08  3.37
    Fibonacci       1.00  0.30  0.60  0.94  1.25  1.58  1.86  2.22  2.50
    FibonacciGC     1.00  0.99  1.95  2.89  3.84  4.78  5.71  6.63  7.57
    Hanoi           1.00  0.67  1.31  1.82  2.32  2.75  3.20  3.70  4.07
    HanoiDL         1.00  0.47  0.98  1.51  2.19  2.62  3.06  3.54  3.95
    HanoiGC         1.00  0.89  1.72  2.43  3.32  3.77  4.17  4.41  4.67
    MMatrix         1.00  0.91  1.74  2.55  3.32  4.18  4.83  5.55  6.28
    Palindrome      1.00  0.44  0.77  1.09  1.40  1.61  1.82  2.10  2.23
    PalindromeGC    1.00  0.94  1.75  2.37  2.97  3.30  3.62  4.13  4.46
    QuickSort       1.00  0.75  1.42  1.98  2.44  2.84  3.07  3.37  3.55
    QuickSortDL     1.00  0.71  1.36  1.95  2.26  2.76  2.96  3.18  3.32
    QuickSortGC     1.00  0.94  1.78  2.31  2.87  3.19  3.46  3.67  3.75
    Takeuchi        1.00  0.23  0.46  0.68  0.91  1.12  1.32  1.49  1.72
    TakeuchiGC      1.00  0.88  1.61  2.16  2.62  2.63  2.63  2.63  2.63


SLIDE 16

(Preliminary) Performance Results

(Preliminary) performance results for restricted and-parallelism (II)

[Speedup plots, 1–8 processors: (a) Boyer-Moore, with and without granularity control; (b) Fast-Fourier Transform, with and without granularity control; (c) Fibonacci, with and without granularity control; (d) QuickSort, plain, with difference lists, and with granularity control.]


SLIDE 17

(Preliminary) Performance Results

Restricted vs. unrestricted and-parallelism (I)

Speedups on 1–8 processors:

    Benchmark    And-P            1     2     3     4     5     6     7     8
    FibFunGC     Restricted    1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
                 Unrestricted  0.99  1.95  2.89  3.84  4.78  5.71  6.63  7.57
    TakeuchiGC   Restricted    0.88  1.61  2.16  2.62  2.63  2.63  2.63  2.63
                 Unrestricted  0.88  1.62  2.39  3.33  4.04  4.47  5.19  5.72
    FFTGC        Restricted    0.98  1.76  2.14  2.71  2.82  2.99  3.08  3.37
                 Unrestricted  0.98  1.82  2.31  3.01  3.12  3.26  3.39  3.63
    Hamming      Restricted    0.93  1.13  1.52  1.52  1.52  1.52  1.52  1.52
                 Unrestricted  0.93  1.15  1.64  1.64  1.64  1.64  1.64  1.64
    WMS2         Restricted    0.99  1.01  1.01  1.01  1.01  1.01  1.01  1.01
                 Unrestricted  0.99  1.10  1.10  1.10  1.10  1.10  1.10  1.10


SLIDE 18

(Preliminary) Performance Results

Restricted vs. unrestricted and-parallelism (II)

[Speedup plots, 1–8 processors, restricted vs. unrestricted versions: (e) FFT; (f) Hamming; (g) FibFun; (h) Takeuchi.]


SLIDE 19

Conclusions

Conclusions and future work

A new implementation approach for exploiting and-parallelism:

◮ Simpler machinery.
◮ More flexibility.

Preliminary results:

◮ Reasonable speedups are achievable.
◮ The additional overhead makes granularity control necessary.

Unrestricted and-parallelism:

◮ Provides better observed speedups!

Currently working on:

◮ Improving the implementation.
◮ Developing compile-time (automatic) parallelizers for this approach [LOPSTR’07].
