
The old challenge: How to support users?



  1. The old challenge: How to support users?
     mirko.rahn@itwm.fraunhofer.de, Dagstuhl, November 2017

  2. CC-HPC@itwm.fraunhofer.de
     • What we do: Holistic optimization, dealing with many-core (structured) SMP machines (Xeon, Phi), large machines (100–500 DSM nodes), and very large machines (1000–5000 DSM nodes).
     • Staff: 1/3 computer scientists, 1/3 mathematicians, 1/3 physicists, 1/3 else.
     • Hardware: Many cores (10⁴–10⁶ threads), multiple levels of memory (tape, cold disk, spinning disk, SSD, NVRAM, DRAM, HBM, cache levels 3, 2, 1, SIMD registers).
     • Costs: Computation is for free; data transfer dominates energy, latency, and throughput.
     • Software, high level: Asynchronous communication, task-based programming (abstraction, load balancing, time skewing), frameworks & DSLs.
     • Software, mid level: Hybrid process/thread models, multi-level cache blocking, zero-indirection memory layouts, zero-copy dependency management.
     • Software, low level: SIMD intrinsics, coroutines (not threads), CAS & lock-free data structures.

  3. Parallel programming: Shining reality.
     16 nodes → 1584 nodes ⇒ 1 h → 1 min.
     Problem size 1000³ ⇒ each of the 512 · 3 · 28 = 43008 cores holds about 28.5³ points.
     8th-order operator ⇒ halo of 8 points on each side ⇒ ((28.5 − 16) / 28.5)³ ≈ 8.5% inner points.
     ⇒ Latency is what matters.
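     Spelling out the arithmetic (the 8th-order operator needs a halo of 8 points on each side, hence the 16 in the formula):

     \[
     \frac{1000^3}{512 \cdot 3 \cdot 28} \approx 23252 \approx 28.5^3,
     \qquad
     \left(\frac{28.5 - 2 \cdot 8}{28.5}\right)^3 \approx 0.085 \approx 8.5\%.
     \]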

  4. Parallel programming: Reality.

     First attempt:

         Server:
             write /global/config
             broadcast "setup" to all clients
         Client:
             on "setup": read /global/config

     Some clients might fail: the distributed file system violates POSIX!?

     Second attempt:

         Server:
             write /global/config
             fsync /global
             broadcast "setup" to all clients
         Client:
             on "setup": read /global/config

     Stalls the cluster for minutes! Still some clients might fail, because their local view of the metadata is not updated.

     Third attempt:

         Server:
             write /global/config
             broadcast "setup" to all clients
         Client:
             on "setup":
                 while (! exists /global/config): sleep 1
                 read /global/config

     Typically works, so this is industrial production code.
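     A minimal C++ sketch of the third variant's client side (the function name is hypothetical; the path and the one-second poll interval are from the slide):

         #include <chrono>
         #include <filesystem>
         #include <fstream>
         #include <iterator>
         #include <string>
         #include <thread>

         // Client side: poll until the server's write becomes visible on
         // this node, then read the configuration. Works around the
         // missing POSIX visibility guarantees of the distributed FS.
         std::string read_config_when_visible (std::filesystem::path const& path)
         {
             while (!std::filesystem::exists (path))
             {
                 std::this_thread::sleep_for (std::chrono::seconds (1));
             }
             std::ifstream stream (path);
             return { std::istreambuf_iterator<char> (stream)
                    , std::istreambuf_iterator<char>()
                    };
         }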

  5. Parallel programming: Distribute data. Beginner.

         class EqualDistribution
         {
             size_t size_per_rank();
         };

         EqualDistribution distribution (size, nProc);
         offset_t begin = iProc * distribution.size_per_rank(); // type error!?
         transfer (begin, distribution.size_per_rank());

     BROKEN: 12 elements on 5 ranks ⇒ 3 elements per rank.

         ### ### ##? ##? ##?

     ⇒ There is no “size per rank”!

         class ContiguousDistribution
         {
             offset_t begin (rank_t);
             size_t size (rank_t i) = begin (i+1) - begin (i);
             // needs: size_t operator- (offset_t, offset_t)
         };

     Discipline! Programmer’s discipline! Teacher’s discipline!
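     A runnable sketch of ContiguousDistribution, assuming the usual balanced block formula begin(i) = i · size / nProc (the slide fixes only the interface):

         #include <cassert>
         #include <cstddef>

         using rank_t = std::size_t;
         using offset_t = std::size_t;

         // Balanced contiguous block distribution: rank i owns
         // [begin(i), begin(i+1)). Block sizes differ by at most one
         // element and sum to exactly 'size' -- no phantom elements.
         class ContiguousDistribution
         {
         public:
             ContiguousDistribution (std::size_t size, std::size_t nProc)
                 : _size (size), _nProc (nProc)
             {}

             offset_t begin (rank_t i) const
             {
                 // note: i * _size may overflow for very large problems
                 return static_cast<offset_t> (i * _size / _nProc);
             }
             std::size_t size (rank_t i) const
             {
                 return begin (i + 1) - begin (i);
             }

         private:
             std::size_t _size;
             std::size_t _nProc;
         };

         int main()
         {
             ContiguousDistribution distribution (12, 5);
             std::size_t total (0);
             for (rank_t i (0); i < 5; ++i)
             {
                 total += distribution.size (i); // sizes are 2,2,3,2,3
             }
             assert (total == 12); // every element owned exactly once
         }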

  6. Parallel programming: Distributed termination detection. Junior.

     Each process i ∈ {0, …, P−1}: on init: t_i = r_i = 0; on send: t_i = t_i + 1; on recv: r_i = r_i + 1.

     Termination detection uses:

         bool Comm::messages_in_flight()
         {
             long d = t - r; // note: signed!!
             long D;
             MPI_Allreduce (&d, &D, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
             return D != 0;
         }

         while (c.messages_in_flight()) ... // global operation vs. resource utilization

     CORRECT! But does not scale! Also, it mixes payload messages with control messages. (This is the state of the art in 2017.)

  7. Parallel programming: Distributed termination detection. Professional.

     Solved, e.g. Friedemann Mattern: Algorithms for distributed termination detection, 1987.

     ATTENTION: Inconsistent cuts are possible!

     Termination detection at scale: Asynchronously compute Σ_{i=0..P−1} t_i and Σ_{i=0..P−1} r_i, twice each. Termination ⟺ all four values are equal.

     Library? Interface? Transformation? Language construct?
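     A minimal sketch of the "compute both sums twice" rule, assuming MPI and the counters t and r from the previous slide (a real implementation would run the two reduction waves in the background instead of waiting inline):

         #include <mpi.h>

         extern long t; // messages sent by this process (see previous slide)
         extern long r; // messages received by this process

         // Double counting: two waves, each summing (t, r) over all
         // processes. A single wave may observe an inconsistent cut; the
         // computation has terminated iff all four values are equal.
         bool terminated()
         {
             long local[2], first[2], second[2];
             MPI_Request request;

             local[0] = t; local[1] = r;
             MPI_Iallreduce (local, first, 2, MPI_LONG, MPI_SUM,
                             MPI_COMM_WORLD, &request);
             MPI_Wait (&request, MPI_STATUS_IGNORE);

             // ... application keeps sending and receiving in between ...

             local[0] = t; local[1] = r;
             MPI_Iallreduce (local, second, 2, MPI_LONG, MPI_SUM,
                             MPI_COMM_WORLD, &request);
             MPI_Wait (&request, MPI_STATUS_IGNORE);

             return first[0] == first[1]
                 && second[0] == second[1]
                 && first[0] == second[0];
         }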

  8. Parallel programming: GASPI/GPI: Notifications. Professional.

     Fine-grained synchronization: a remote notification attached to a single message.
     • write_notify (source, destination, notification)
     • waitsome (set of notifications)

     Structured stencil with double buffering:

         while (! done)
         {
             write_notify_to_all_neighbours();
             compute_inner_region();
             while (! all neighbour data received)
             {
                 process (neighbour = wait_some (unprocessed neighbours));
             }
         }

     Communication and computation happen at the same time. Requires a lot of programming discipline. Synthesis!
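     A hedged sketch of one iteration in terms of the GASPI C API (gaspi_write_notify, gaspi_notify_waitsome, and gaspi_notify_reset are from the GASPI specification; segment ids, offsets, sizes, and the helper functions are placeholder assumptions):

         #include <GASPI.h>

         void compute_inner_region();                      // placeholder
         void process_boundary (gaspi_notification_id_t);  // placeholder

         // One stencil iteration: post all halo writes, overlap them with
         // the inner compute, then process each neighbour as its data
         // arrives.
         void iterate (gaspi_rank_t const* neighbours, gaspi_number_t num_neighbours)
         {
             for (gaspi_number_t n (0); n < num_neighbours; ++n)
             {
                 // Assumption: notification id n is agreed per neighbour
                 // pair; segment 0 / offset 0 / 4096 bytes stand in for
                 // the real halo layout.
                 gaspi_write_notify ( 0, 0              // local segment, offset
                                    , neighbours[n]
                                    , 0, 0              // remote segment, offset
                                    , 4096              // size in bytes
                                    , n, 1              // notification id, value
                                    , 0, GASPI_BLOCK    // queue, timeout
                                    );
             }

             compute_inner_region();                     // overlaps communication

             for (gaspi_number_t received (0); received < num_neighbours; ++received)
             {
                 gaspi_notification_id_t id;
                 gaspi_notification_t value;
                 gaspi_notify_waitsome (0, 0, num_neighbours, &id, GASPI_BLOCK);
                 gaspi_notify_reset (0, id, &value);     // consume the notification
                 process_boundary (id);                  // neighbour 'id' is complete
             }
         }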

  9. Parallel programming: GASPI/GPI: Notifications. Senior/Library writer.

     Zero-copy unstructured nearest neighbor:

         while (tile = lock_unlocked_and_ready_tile())
         {
             process (tile);
             publish_progress (tile); // update ready flags of neighbors
             unlock (tile);
         }

     Task-based middleware:

         while (! done)
         {
             task = get_ready_task(); // blocking, maybe busy
             process (task);
             publish_progress (task); // might enable other tasks
         }

     Dynamic communication patterns. Debugging often a nightmare. Interface design!
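     The consumer loop above presupposes a ready queue. A minimal single-node sketch of such a middleware, assuming a dependency-counter scheme (the slide does not fix one):

         #include <condition_variable>
         #include <mutex>
         #include <queue>
         #include <vector>

         struct Task
         {
             int unmet_dependencies;        // guarded by TaskPool::_guard
             std::vector<Task*> successors; // tasks enabled by this one
         };

         class TaskPool
         {
         public:
             // Blocks until some task has all dependencies satisfied.
             Task* get_ready_task()
             {
                 std::unique_lock<std::mutex> lock (_guard);
                 _ready_available.wait (lock, [&] { return !_ready.empty(); });
                 Task* task (_ready.front());
                 _ready.pop();
                 return task;
             }

             // Called after process(task): decrement successor counters
             // and move newly enabled tasks into the ready queue.
             void publish_progress (Task* task)
             {
                 std::lock_guard<std::mutex> lock (_guard);
                 for (Task* successor : task->successors)
                 {
                     if (--successor->unmet_dependencies == 0)
                     {
                         _ready.push (successor);
                         _ready_available.notify_one();
                     }
                 }
             }

         private:
             std::mutex _guard;
             std::condition_variable _ready_available;
             std::queue<Task*> _ready;
         };

     On a cluster, publish_progress would additionally update remote ready flags, e.g. via write_notify.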

  10. Egor’s tool: Almost ready for the tool chain.

      Implements GPI on top of threads rather than processes.

          WARNING: ThreadSanitizer: data race (pid=4141)
            Read of size 4 at 0x7f42f5ffc024 by thread T4 (mutexes: write M101):
              #0 dump main.c:9 (exe+0x00000006b9bd)
              #1 main main.c:41 (exe+0x00000006be3f)
              #2 operator() /devel/src/gpi/gpi_detail/GlobalState.cpp:50 (exe+0x0000000787e4)
              #3 execute_native_thread_routine /src/gcc-4.8.1/x86_64-unknown-linux-gnu/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:84
            Previous write of size 1 at 0x7f42f5ffc027 by thread T38:
              #0 memcpy /src/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:577 (exe+0x000000028090)
              #1 operator() /devel/src/gpi/gpi_detail/GlobalState.cpp:188 (exe+0x0000000781c2)
              #2 gpi_detail::Executor::threadMain() /devel/src/gpi/gpi_detail/Executor.cpp:23 (exe+0x00000007f869)
              #3 execute_native_thread_routine /src/gcc-4.8.1/x86_64-unknown-linux-gnu/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:84

  11. Parallel programming: Intended data race: Weak minimum. Senior.

      Let M(t) = min_{i = 0, …, P−1} f_i(t), where f_i is only known to process i and t is a point in time. To compute M(t), a barrier is required ⇒ not possible at scale. Note: The barrier latency is not the problem; the problem is the accumulation of imbalances.

      Additional knowledge: All f_i are strictly increasing ⇒ M(t) is strictly increasing.

      Easier to compute: a strictly increasing, eventually consistent weak minimum W(t) ≤ M(t):
      • Publish f_i(t) asynchronously. (Publish wave.)
      • Reduce all values upon request. (Reduction wave.)
      • No synchronization between the waves ⇒ data race. The race is okay as long as each f_i(t) is read and written atomically.

      Latency stays the same, but the work can be done asynchronously, and therefore the imbalances are smeared out.

      Detect the race, prove the algorithm correct with the race!
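      A shared-memory sketch of the two waves, assuming P processes that can read each other's published slots (std::atomic stands in for the atomic-access requirement; in GPI the slots would live in a global segment):

          #include <algorithm>
          #include <array>
          #include <atomic>
          #include <cstddef>

          constexpr std::size_t P = 4; // placeholder process count

          // One published slot per process; each f_i only increases.
          std::array<std::atomic<long>, P> published {};

          // Publish wave: process i writes its current f_i(t), no handshake.
          void publish (std::size_t i, long f_i)
          {
              published[i].store (f_i, std::memory_order_relaxed);
          }

          // Reduction wave: read all slots without synchronizing against
          // the publishers. Every load sees *some* past value
          // f_i(t') <= f_i(t), hence the result W(t) <= M(t), and W only
          // increases over repeated calls.
          long weak_minimum()
          {
              long w = published[0].load (std::memory_order_relaxed);
              for (std::size_t i (1); i < P; ++i)
              {
                  w = std::min<long> (w, published[i].load (std::memory_order_relaxed));
              }
              return w;
          }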

  12. Parallel programming: Alltoall. Library writer.

          @iProc:
              forall other ranks:
                  async_write (data[other] -> other);
              wait_for_local_completion();
              barrier();
              work (received_data);

      BROKEN: Local completion plus barrier ⇒ all data has been sent. Unknown whether or not data has been received! Works fine on InfiniBand (non-overtaking) but fails on Cray and TCP Ethernet.

  13. Olaf’s tool: Not ready for the tool chain.

  14. Parallel programming: Alltoall.

          @iProc:
              forall other ranks:
                  async_write_notify (data[other] -> other, notify: data from iProc);
              outstanding_messages = nProc;
              while (outstanding_messages --> 0)
              {
                  sender = wait_for_notification();
                  partial_work (received_data[sender]);
              }
              work (received_data);

      • CORRECT. No wait. No barrier. Better overlap. Partial work possible.
      • nProc many notifications → if a memory scaling issue kicks in, trade memory for latency.
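      For comparison, a hedged GASPI sketch of this pattern (the segment layout, with rank i's block at offset i * block, and the helper functions are assumptions; the notification id carries the sender's rank, as on the slide):

          #include <GASPI.h>

          void partial_work (gaspi_rank_t sender); // placeholder
          void work();                             // placeholder

          void alltoall (gaspi_rank_t iProc, gaspi_rank_t nProc, gaspi_size_t block)
          {
              for (gaspi_rank_t other (0); other < nProc; ++other)
              {
                  if (other == iProc) continue;
                  gaspi_write_notify ( 0, other * block // local segment, offset
                                     , other
                                     , 1, iProc * block // remote segment, offset
                                     , block
                                     , iProc, 1         // notification id = sender
                                     , 0, GASPI_BLOCK   // queue, timeout
                                     );
              }

              // nProc - 1 notifications are expected; each one enables
              // partial work on exactly that sender's block.
              for (int outstanding (nProc - 1); outstanding > 0; --outstanding)
              {
                  gaspi_notification_id_t sender;
                  gaspi_notification_t value;
                  gaspi_notify_waitsome (1, 0, nProc, &sender, GASPI_BLOCK);
                  gaspi_notify_reset (1, sender, &value);
                  partial_work (sender);
              }
              work(); // all blocks received
          }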

  15. Parallel programming: Reality. Undergraduate using tools.

      [Plot: gitfan parallel efficiency, normalized to 4 nodes (y axis 0.75 to 1.25), over #nodes x #threads from 4x16 to 48x16.]

      Legacy symbolic linear algebra now parallel. Tools help!
