Argobots and its Application to Charm++
Sangmin Seo
Assistant Computer Scientist, Argonne National Laboratory
sseo@anl.gov
April 19, 2016
Charm++ Workshop 2016
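    /* out[][]: name assumed; the original assignment target was lost */
    int in[1000][1000], out[1000][1000];

    void petsc_voodoo(int x);

    /* Outer loop: every iteration calls into a library routine. */
    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        petsc_voodoo(i);
    }

    /* The library routine opens its own, nested, parallel region. */
    void petsc_voodoo(int x)
    {
        #pragma omp parallel for
        for (j = 0; j < 1000; j++)
            out[x][j] = cosine(in[x][j]);
    }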
Why does traditional OpenMP perform so poorly on nested parallel loops? The compiler cannot analyze petsc_voodoo to determine whether the function might ever block or yield, so it has to assume that it might. Each nested work unit therefore needs its own stack, and creating additional Pthreads for each level of nesting is the simplest way to provide one.
[Figure: Time (s, 0.0 to 3.5) versus number of OMP threads or Argobots ULTs/tasks in the inner loop (1 to 35), comparing GCC/pthreads, GCC/Argobots ULTs, and GCC/Argobots tasks. Lower is better.]
[Figure: ULTs and kernel threads running on eight cores, with context switches between threads.]
[Figure: Time (ns, log scale from 1 to 100000) versus number of threads (1 to 2048), comparing pthread and ULT (Argobots). Measured using LMbench3.]
– Execution Streams guarantee progress
– Work Units execute to completion
– Consistency domains
– Software can manage consistency
– High-level languages/libraries such as OpenMP, Charm++ have more information about the user application (data locality, dependencies)
– Enables dynamism, but always managed by high-level systems
[Figure: Argobots between the programming models (MPI, OpenMP, Charm++, PaRSEC, …) and the processor cores. Each Execution Stream runs lightweight work units, User-Level Threads (U) and Tasklets (T), taken from private pools and a shared pool.]
(http://collab.cels.anl.gov/display/argobots/)
* Team members: Sangmin Seo, Abdelhalim Amer, Pavan Balaji (ANL), Laxmikant Kale, Prateek Jindal (UIUC)
Execution Streams (ES)
– Sequential instruction stream
– Mapped efficiently to a hardware resource
– Implicitly managed progress semantics
User-Level Threads (ULTs)
– Independent execution units in user space
– Associated with an ES when running
– Yieldable and migratable
– Can make blocking calls
Tasklets
– Atomic units of work
– Asynchronous completion via notifications
– Not yieldable; migratable only before execution
– Cannot make blocking calls
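The tasklet properties listed above (atomic, not yieldable, no blocking calls) mean that a tasklet can be represented by little more than a function pointer and an argument, with no private stack. The following is a minimal single-threaded sketch of that idea. It is a toy model, not the Argobots API: the names tasklet_t, pool_t, pool_push, es_run, and add_one are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the Argobots API: a tasklet is only a
 * function pointer plus an argument, so it needs no stack of its own. */
typedef struct {
    void (*fn)(void *);
    void *arg;
} tasklet_t;

#define POOL_CAP 64

typedef struct {
    tasklet_t items[POOL_CAP];
    size_t head, tail;              /* FIFO: pop at head, push at tail */
} pool_t;

/* Append a tasklet. Returns 0 on success, -1 if the pool is full. */
static int pool_push(pool_t *p, void (*fn)(void *), void *arg) {
    assert(p != NULL && fn != NULL);
    if (p->tail - p->head == POOL_CAP) return -1;
    p->items[p->tail % POOL_CAP] = (tasklet_t){fn, arg};
    p->tail++;
    return 0;
}

/* One toy execution stream: run every queued tasklet to completion,
 * in order. Returns the number of tasklets executed. */
static size_t es_run(pool_t *p) {
    size_t ran = 0;
    while (p->head != p->tail) {
        tasklet_t t = p->items[p->head % POOL_CAP];
        p->head++;
        t.fn(t.arg);                /* runs to completion, never yields */
        ran++;
    }
    return ran;
}

/* Example tasklet body: increment the integer it was given. */
static void add_one(void *arg) { ++*(int *)arg; }
```

Pushing add_one twice with a pointer to the same counter and then calling es_run executes both tasklets on the caller's stack and leaves the counter at 2. A ULT, by contrast, must keep a stack so that it can be suspended and resumed.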
[Figure: Argobots Execution Model. Execution streams ES1 through ESn each run a scheduler (Sched) over pools that hold ULTs (U), tasklets (T), events (E), and stacked schedulers (S).]
– Stackable scheduler with pluggable strategies
– Mutex, condition variable, barrier, future
– Communication triggers
[Figure: a scheduler (Sched) and its pools of ULTs (U), tasklets (T), events (E), and stacked schedulers (S), illustrating yield() and yield_to(target).]
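The two calls named on this slide differ in where control goes: yield() hands the core back to the scheduler, while yield_to(target) switches straight to another ULT without passing through the scheduler. The sketch below imitates that difference with POSIX ucontext on Linux/glibc. It is a toy under stated assumptions, not Argobots code: toy_yield, toy_yield_to, ult_a, ult_b, and run_demo are hypothetical names, and the round-robin scheduler is an illustration only.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NUM_ULTS   2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;                  /* scheduler context        */
static ucontext_t ult_ctx[NUM_ULTS];          /* one context per toy ULT  */
static char ult_stack[NUM_ULTS][STACK_SIZE];  /* one private stack each   */
static int  ult_done[NUM_ULTS];
static int  current_ult;                      /* index of the running ULT */
static char trace[32];                        /* execution-order log      */

static void log_step(char c) {
    size_t n = strlen(trace);
    assert(n + 1 < sizeof trace);
    trace[n] = c;
    trace[n + 1] = '\0';
}

/* yield(): save the running ULT and return to the scheduler. */
static void toy_yield(void) {
    swapcontext(&ult_ctx[current_ult], &sched_ctx);
}

/* yield_to(target): switch directly to another ULT, skipping the scheduler. */
static void toy_yield_to(int target) {
    int self = current_ult;
    current_ult = target;
    swapcontext(&ult_ctx[self], &ult_ctx[target]);
}

static void ult_a(void) {
    log_step('a');
    toy_yield_to(1);            /* hand the core straight to ULT 1 */
    log_step('A');
    ult_done[0] = 1;
}

static void ult_b(void) {
    log_step('b');
    toy_yield();                /* go back to the scheduler */
    log_step('B');
    ult_done[1] = 1;
}

/* Toy round-robin scheduler; logs 's' each time it dispatches a ULT. */
static const char *run_demo(void) {
    void (*body[NUM_ULTS])(void) = { ult_a, ult_b };
    trace[0] = '\0';
    for (int i = 0; i < NUM_ULTS; i++) {
        ult_done[i] = 0;
        int rc = getcontext(&ult_ctx[i]);
        assert(rc == 0);
        ult_ctx[i].uc_stack.ss_sp = ult_stack[i];
        ult_ctx[i].uc_stack.ss_size = STACK_SIZE;
        ult_ctx[i].uc_link = &sched_ctx;   /* resume scheduler on return */
        makecontext(&ult_ctx[i], body[i], 0);
    }
    for (;;) {
        int ran = 0;
        for (int i = 0; i < NUM_ULTS; i++) {
            if (ult_done[i]) continue;
            ran = 1;
            current_ult = i;
            log_step('s');
            swapcontext(&sched_ctx, &ult_ctx[i]);
        }
        if (!ran) break;
    }
    return trace;
}
```

In the returned trace, "b" follows "a" with no "s" in between, because yield_to reached ULT 1 without a scheduler dispatch; every other resumption is preceded by "s". Each toy ULT needs its own stack for this to work, which is the cost that tasklets avoid.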
[Figure: Create/join time per ULT (cycles, log scale from 10 to 10000) versus number of Execution Streams (workers, 1 to 72), comparing Qthreads, MassiveThreads (H), MassiveThreads (W), Argobots (ULT), and Argobots (Tasklet).]
Charm++ with Argobots
– Mini-apps and real-world applications
– Charm++ model
– Charm++ infrastructure
– Intelligent runtime
– Converse runtime (threading, messaging, scheduler)
– Argobots (ULTs, tasks, scheduling, etc.)
– Communication libraries (MPI, uGNI, PAMI, Verbs)
* Team members: Laxmikant Kale, Jonathan Lifflander, Prateek Jindal (UIUC)
[Figure: Execution streams ES0 through ESn-1, each running a scheduler (Sched) that handles events (E). Argobots is connected to the NRM through a socket. Programming model runtimes and applications (Charm++, CilkBot, PaRSEC, MPI+Argobots) supply callback functions.]
1. [Argobots] Connect to NRM using a socket on ABT_init()
2. [Runtimes/applications] Register callback functions for shrink/expand events
3. [Runtimes/applications] Deregister callback functions when they terminate
4. [Argobots] Disconnect from NRM on ABT_finalize()
Names shown on the slide: ABT_ENV_POWER_EVENT_HOSTNAME, ABT_ENV_POWER_EVENT_PORT, ABT_event_add_callback(), ABT_event_del_callback()
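The register/deregister steps above can be pictured as a small callback table that is consulted whenever a shrink or expand event arrives. The sketch below is a toy model of that pattern. It does not reproduce the signatures of ABT_event_add_callback() or ABT_event_del_callback(), which the slide does not give: toy_add_callback, toy_del_callback, toy_ask, keep_es0, and the EV_* constants are hypothetical, as is the policy that an event is allowed only when at least one callback is registered and all of them agree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a shrink/expand callback table. */
typedef enum { EV_SHRINK, EV_EXPAND } toy_event_t;

/* A callback returns nonzero to allow the event for the given ES rank. */
typedef int (*toy_event_cb)(void *user_arg, int es_rank);

#define MAX_CALLBACKS 8

typedef struct {
    int in_use;
    toy_event_t kind;
    toy_event_cb cb;
    void *user_arg;
} cb_slot_t;

static cb_slot_t cb_table[MAX_CALLBACKS];

/* Register a callback. Returns its id (>= 0), or -1 if the table is full. */
static int toy_add_callback(toy_event_t kind, toy_event_cb cb, void *user_arg) {
    assert(cb != NULL);
    for (int id = 0; id < MAX_CALLBACKS; id++) {
        if (!cb_table[id].in_use) {
            cb_table[id] = (cb_slot_t){1, kind, cb, user_arg};
            return id;
        }
    }
    return -1;
}

/* Deregister a callback by id. Returns 0 on success, -1 on a bad id. */
static int toy_del_callback(int id) {
    if (id < 0 || id >= MAX_CALLBACKS || !cb_table[id].in_use) return -1;
    cb_table[id].in_use = 0;
    return 0;
}

/* Ask every callback registered for `kind`. Returns 1 only if at least
 * one callback is registered and all of them allow the event. */
static int toy_ask(toy_event_t kind, int es_rank) {
    int asked = 0;
    for (int id = 0; id < MAX_CALLBACKS; id++) {
        if (!cb_table[id].in_use || cb_table[id].kind != kind) continue;
        asked++;
        if (!cb_table[id].cb(cb_table[id].user_arg, es_rank)) return 0;
    }
    return asked > 0;
}

/* Example runtime policy: any ES may be stopped except rank 0. */
static int keep_es0(void *user_arg, int es_rank) {
    (void)user_arg;
    return es_rank != 0;
}
```

A runtime would call toy_add_callback(EV_SHRINK, keep_es0, NULL) at startup and toy_del_callback with the returned id when it terminates; in between, toy_ask(EV_SHRINK, 2) is allowed and toy_ask(EV_SHRINK, 0) is refused.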
[Figure: ES0, ES1, and ES2, each running a scheduler (Sched) that can check and handle events (E). Arrows numbered 1 to 7 correspond to the steps below and end with a response to the NRM.]
1. ES1 picks up an event that requests ES2 to be stopped
2. Ask the runtime, using callbacks, whether ES2 can be stopped
3. If OK, mark ES2 as needing to stop, so that it can be stopped the next time the scheduler on ES2 checks events
4. Notify the runtime that ES2 will be stopped
5. Create a ULT on ES0
6. When the scheduler on ES2 stops, ES2 is terminated
7. After ES2 is terminated, the ULT frees ES2 and sends a response to NRM
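The shrink sequence is essentially a small state machine for the ES being removed: running, marked to stop, terminated, freed. The single-threaded sketch below walks through those transitions. It is an illustration under stated assumptions, not Argobots code: es_state_t, handle_shrink, scheduler_check, cleanup_try_free, and the two sample callbacks are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the shrink steps for one execution stream. */
typedef enum {
    ES_RUNNING,
    ES_STOP_REQUESTED,
    ES_TERMINATED,
    ES_FREED
} es_state_t;

typedef struct { es_state_t state; } toy_es_t;

typedef int  (*shrink_ask_cb)(int es_rank);     /* nonzero: may be stopped */
typedef void (*shrink_notify_cb)(int es_rank);  /* told a stop is coming   */

/* Steps 1 to 4: handle a shrink event aimed at es[rank].
 * Returns 1 if the stop was scheduled, 0 if it was refused. */
static int handle_shrink(toy_es_t *es, int rank,
                         shrink_ask_cb ask, shrink_notify_cb notify) {
    assert(es != NULL && ask != NULL && notify != NULL);
    if (es[rank].state != ES_RUNNING) return 0;
    if (!ask(rank)) return 0;                /* step 2: ask the runtime   */
    es[rank].state = ES_STOP_REQUESTED;      /* step 3: mark for stopping */
    notify(rank);                            /* step 4: notify runtime    */
    return 1;
}

/* Step 6: the scheduler on es[rank] checks events and stops if marked. */
static void scheduler_check(toy_es_t *es, int rank) {
    if (es[rank].state == ES_STOP_REQUESTED)
        es[rank].state = ES_TERMINATED;
}

/* Steps 5 and 7: a clean-up work unit on another ES frees es[rank] once
 * it has terminated. Returns 1 when the response to NRM could be sent. */
static int cleanup_try_free(toy_es_t *es, int rank) {
    if (es[rank].state != ES_TERMINATED) return 0;
    es[rank].state = ES_FREED;
    return 1;
}

/* Sample runtime callbacks: rank 0 may never be stopped. */
static int last_notified = -1;
static int allow_all_but_es0(int rank) { return rank != 0; }
static void record_notify(int rank) { last_notified = rank; }
```

The clean-up step cannot succeed until the target's own scheduler has observed the mark, which mirrors the ordering of steps 6 and 7: the ES is freed from another ES only after it has terminated.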
[Figure: ES0 and ES1 running schedulers (Sched) that handle events (E), with ES2 being created. Arrows numbered 1 to 5 correspond to the steps below and end with a response to the NRM.]
1. ES0 picks up an event that requests the creation of ES2
2. Ask the runtime, using callbacks, whether ES2 can be created
3. If OK, invoke a callback function so that the runtime creates ES2
4. Create ES2
5. Send a response to NRM
External Connections
[Figure: the Argobots execution model (ES1 through ESn) and the projects that connect to it: MPI+Argobots (ULTs and ESs over MPI); Charm++ (applications, Charm++, Argobots runtime, communication libraries); CilkBots (a Cilk "worker" mapped to an Argobots ES, with RWS and fused ULTs 1 through N); PaRSEC; OpenMP; Mercury RPC (origin and target RPC processes); OmpSs; and GridFTP, Kokkos, RAJA, ROSE, TASCEL, XMP, etc.]
Group Lead
– Pavan Balaji (computer scientist and group lead)
Current Staff Members
– Abdelhalim Amer (postdoc)
– Yanfei Guo (postdoc)
– Rob Latham (developer)
– Lena Oden (postdoc)
– Ken Raffenetti (developer)
– Sangmin Seo (assistant computer scientist)
– Min Si (postdoc)
– Min Tian (visiting scholar)
Past Staff Members
– Antonio Pena (postdoc)
– Wesley Bland (postdoc)
– Darius T. Buntinas (developer)
– James S. Dinan (postdoc)
– David J. Goodell (developer)
– Huiwei Lu (postdoc)
– Yanjie Wei (visiting scholar)
– Yuqing Xiong (visiting scholar)
– Jian Yu (visiting scholar)
– Junchao Zhang (postdoc)
– Xiaomin Zhu (visiting scholar)
Advisory Board
– Pete Beckman (senior scientist)
– Rusty Lusk (retired, STA)
– Marc Snir (division director)
– Rajeev Thakur (deputy director)
Current and Recent Students
– Ashwin Aji (Ph.D.)
– Abdelhalim Amer (Ph.D.)
– Alex Brooks (Ph.D.)
– Adrian Castello (Ph.D.)
– Dazhao Cheng (Ph.D.)
– James S. Dinan (Ph.D.)
– Piotr Fidkowski (Ph.D.)
– Priyanka Ghosh (Ph.D.)
– Sayan Ghosh (Ph.D.)
– Ralf Gunter (B.S.)
– Jichi Guo (Ph.D.)
– Yanfei Guo (Ph.D.)
– Marius Horga (M.S.)
– John Jenkins (Ph.D.)
– Feng Ji (Ph.D.)
– Ping Lai (Ph.D.)
– Palden Lama (Ph.D.)
– Yan Li (Ph.D.)
– Huiwei Lu (Ph.D.)
– Jintao Meng (Ph.D.)
– Ganesh Narayanaswamy (M.S.)
– Qingpeng Niu (Ph.D.)
– Ziaul Haque Olive (Ph.D.)
– David Ozog (Ph.D.)
– Renbo Pang (Ph.D.)
– Sreeram Potluri (Ph.D.)
– Li Rao (M.S.)
– Gopal Santhanaraman (Ph.D.)
– Thomas Scogland (Ph.D.)
– Min Si (Ph.D.)
– Brian Skjerven (Ph.D.)
– Rajesh Sudarsan (Ph.D.)
– Lukasz Wesolowski (Ph.D.)
– Shucai Xiao (Ph.D.)
– Chaoran Yang (Ph.D.)
– Boyu Zhang (Ph.D.)
– Xiuxia Zhang (Ph.D.)
– Xin Zhao (Ph.D.)