Outline 1) Introduction 8) Application Design Object Design - PowerPoint PPT Presentation

Generated Classes
• CProxy_YourClassName
  • The type of the proxy handle returned by the constructor
  • For use in method invocations
• CBase_YourClassName
  • YourClassName should inherit from this

Hello World Example
• hello.cpp file
    #include <stdio.h>
    #include "hello.decl.h"
    class MyMain : public CBase_MyMain {
     public:
      MyMain(CkArgMsg* m) {
        CkPrintf("Hello World!\n");
        CkExit();
      };
    };
    #include "hello.def.h"
• hello.ci file
    mainmodule hello {
      mainchare MyMain {
        entry MyMain(CkArgMsg* m);
      };
    };

Charm Interface: Modules
• Charm++ programs are organized as a collection of modules
• Each module defines one or more chares
• The module that contains the mainchare is declared as the mainmodule
• Each module, when compiled, generates two files: MyModule.decl.h and MyModule.def.h
• .ci file
    module MyModule {
      // ... chare definitions ...
    };

Charm Interface: Chares
• Chares are parallel objects that are managed by the RTS
• Each chare has a set of entry methods, which are asynchronous methods that may be invoked remotely
• The following code, when compiled, generates a C++ class CBase_MyChare that encapsulates the RTS object
• This generated class is extended and implemented in the .C file
• .ci file
    chare MyChare {
      // ... entry method declarations ...
    };
• .C file
    class MyChare : public CBase_MyChare {
      // ... entry method definitions ...
    };

Charm Interface: Entry Methods
• Entry methods are C++ methods that can be remotely and asynchronously invoked by another chare
• .ci file
    entry MyChare(); /* constructor entry method */
    entry void foo();
    entry void bar(int param);
• .C file
    MyChare::MyChare() { /* ... constructor code ... */ }
    void MyChare::foo() { /* ... code to execute ... */ }
    void MyChare::bar(int param) { /* ... code to execute ... */ }

Charm Interface: mainchare
• Execution begins with the mainchare's constructor
• The mainchare's constructor takes a pointer to the system-defined class CkArgMsg
• CkArgMsg contains argv and argc
• The mainchare will typically create some additional chares

Creating a Chare
• A chare declared as chare MyChare {...}; can be instantiated by the following call:
    CProxy_MyChare::ckNew(... constructor arguments ...);
• To communicate with this chare in the future, a proxy to it must be retained:
    CProxy_MyChare proxy = CProxy_MyChare::ckNew(arg1);

Chare Proxies
• A chare's own proxy can be obtained through a special variable thisProxy
• Chare proxies can also be passed so chares can learn about others
• In this snippet, MyChare learns about a chare instance main, and then invokes a method on it:
• .ci file
    entry void foobar2(CProxy_Main main);
• .C file
    void MyChare::foobar2(CProxy_Main main) {
      main.foo();
    }

Charm Termination
• There is a special system call CkExit() that terminates the parallel execution on all processors (it only needs to be called on one processor) and performs the requisite cleanup
• The traditional exit() is insufficient because it only terminates one process, not the entire parallel job (and will cause a hang)
• CkExit() should be called when you can safely terminate the application (you may want to synchronize before calling it)

Chare Creation Example: .ci file
    mainmodule MyModule {
      mainchare Main {
        entry Main(CkArgMsg* m);
      };
      chare Simple {
        entry Simple(int x, double y);
      };
    };

Chare Creation Example: .C file
    #include "MyModule.decl.h"
    class Main : public CBase_Main {
     public:
      Main(CkArgMsg* m) {
        CkPrintf("Hello World!\n");
        double pi = 3.1415;
        CProxy_Simple::ckNew(12, pi);
      };
    };
    class Simple : public CBase_Simple {
     public:
      Simple(int x, double y) {
        CkPrintf("From chare on %d Area of a circle of radius %d is %g\n", CkMyPe(), x, y*x*x);
        CkExit();
      };
    };
    #include "MyModule.def.h"

Asynchronous Methods
• Entry methods are invoked by performing a C++ method call on a chare's proxy
    CProxy_MyChare proxy = CProxy_MyChare::ckNew(/* ... constructor arguments ... */);
    proxy.foo();
    proxy.bar(5);
• The foo and bar methods will then be executed with the arguments, wherever the created chare, MyChare, happens to live
• The policy is one-at-a-time scheduling (that is, one entry method on one chare executes on a processor at a time)

Asynchronous Methods
• Method invocation is not ordered (between chares, entry methods on one chare, etc.)!
• For example, if a chare executes this code:
    CProxy_MyChare proxy = CProxy_MyChare::ckNew();
    proxy.foo();
    proxy.bar(5);
• These prints may occur in any order
    void MyChare::foo() {
      CkPrintf("foo executes\n");
    }
    void MyChare::bar(int param) {
      CkPrintf("bar executes\n");
    }

Asynchronous Methods
• For example, if a chare invokes the same entry method twice:
    proxy.bar(7);
    proxy.bar(5);
• These may be delivered in any order
    void MyChare::bar(int param) {
      CkPrintf("bar executes with %d\n", param);
    }
• Output:
    bar executes with 5
    bar executes with 7
  OR
    bar executes with 7
    bar executes with 5

Asynchronous Example: .ci file
    mainmodule MyModule {
      mainchare Main {
        entry Main(CkArgMsg* m);
      };
      chare Simple {
        entry Simple(double y);
        entry void findArea(int radius, bool done);
      };
    };

Does this program execute correctly?
    struct Main : public CBase_Main {
      Main(CkArgMsg* m) {
        CProxy_Simple sim = CProxy_Simple::ckNew(3.1415);
        for (int i = 1; i < 10; i++) sim.findArea(i, false);
        sim.findArea(10, true);
      };
    };
    struct Simple : public CBase_Simple {
      double y;
      Simple(double pi) { y = pi; }
      void findArea(int r, bool done) {
        CkPrintf("Area of a circle of radius %d is %f\n", r, y*r*r);
        if (done) CkExit();
      }
    };

Data types and entry methods
• You can pass basic C++ types to entry methods (int, char, bool)
• C++ STL data structures can be passed
• Arrays of basic data types can also be passed like this:
• .ci file:
    entry void foobar(int length, int data[length]);
• .C file
    void MyChare::foobar(int length, int* data) {
      // ... foobar code ...
    }

ReadOnly Variables
• Global constants
• Initialized in the mainchare
.ci file:
    readonly int foo;
    readonly CProxy_Main mainProxy;
.C file: at global scope
    int foo;
    CProxy_Main mainProxy;
.C file: inside the mainchare's constructor
    foo = 2;
    mainProxy = thisProxy;

Collections of Objects: Concepts
• Objects can be grouped into indexed collections
• Basic examples
  • Matrix block
  • Chunk of unstructured mesh
  • Portion of distributed data structure
  • Volume of simulation space
• Advanced examples
  • Abstract portions of computation
  • Interactions among basic objects or underlying entities

Collections of Objects
• Structured: 1D, 2D, . . . , 6D
• Unstructured: Anything hashable
• Dense
• Sparse
• Static - all created at once
• Dynamic - elements come and go

Declaring a Chare Array
• .ci file:
    array [1D] foo {
      entry foo(); // constructor
      // ... entry methods ...
    };
    array [2D] bar {
      entry bar(); // constructor
      // ... entry methods ...
    };
• .C file:
    struct foo : public CBase_foo {
      foo() { }
      foo(CkMigrateMessage*) { }
      // ... entry methods ...
    };
    struct bar : public CBase_bar {
      bar() { }
      bar(CkMigrateMessage*) { }
    };

Constructing a Chare Array
• Constructed much like a regular chare
• The size of each dimension is passed to the constructor
• Dimensional parameters are placed after other constructor arguments
    CProxy_foo::ckNew(…, 10);
    CProxy_bar::ckNew(…, 5, 5);
• The proxy may be retained:
    CProxy_foo myFoo = CProxy_foo::ckNew(…, 10);
• The proxy represents the entire array, and may be indexed to obtain a proxy to an individual element in the array
    myFoo[4].invokeEntry();

thisIndex
• 1D: thisIndex returns the index of the current chare array element
• 2D: thisIndex.x and thisIndex.y return the indices of the current chare array element
.ci file:
    array [1D] foo {
      entry foo();
    }
.C file:
    struct foo : public CBase_foo {
      foo() {
        CkPrintf("array index = %d\n", thisIndex);
      }
    };

Chare Array: Hello Example
    mainmodule arr {
      mainchare MyMain {
        entry MyMain(CkArgMsg*);
      }
      array [1D] hello {
        entry hello(int);
        entry void printHello();
      }
    }

Chare Array: Hello Example
    #include "arr.decl.h"
    struct MyMain : CBase_MyMain {
      MyMain(CkArgMsg* msg) {
        int arraySize = atoi(msg->argv[1]);
        CProxy_hello p = CProxy_hello::ckNew(arraySize, arraySize);
        p[0].printHello();
      }
    };
    struct hello : CBase_hello {
      hello(int n) : arraySize(n) { }
      void printHello() {
        CkPrintf("PE[%d]: hello from p[%d]\n", CkMyPe(), thisIndex);
        if (thisIndex == arraySize - 1) CkExit();
        else thisProxy[thisIndex + 1].printHello();
      }
      int arraySize;
    };
    #include "arr.def.h"

Hello World Array Projections Timeline View
• Add "-tracemode projections" to the link line to enable tracing
• Run the Projections tool to load trace log files and visualize performance
• arrayHello on BG/Q, 16 nodes, mode c16, 1024 elements (4 per process)

Collections of Objects: Runtime Service
• System knows how to 'find' objects efficiently: (collection, index) → processor
• Applications can specify a mapping or use simple runtime-provided options (e.g. blocked, round-robin)
• Distribution can be static or dynamic!
• Key abstraction: application logic doesn't change, even though performance might

Collections of Objects: Runtime Service
• Can develop and test logic in objects separately from their distribution
• Separation in time: make it work, then make it fast
• Division of labor: domain specialist writes object code, computationalist writes mapping
• Portability: different mappings for different systems, scales, or configurations
• Shared progress: improved mapping techniques can benefit existing code

Collective Communication Operations
• Point-to-point operations involve only two objects
• Collective operations involve a collection of objects
  • Broadcast: calls a method in each object of the array
  • Reduction: collects a contribution from each object of the array
• A spanning tree is used to send/receive data
[Figure: spanning tree connecting elements A through G]

Broadcast
• A message to each object in a collection
• The chare array proxy object is used to perform a broadcast
• It looks like a function call to the proxy object
• From the main chare:
    CProxy_Hello helloArray = CProxy_Hello::ckNew(helloArraySize);
    helloArray.foo();
• From a chare array element that is a member of the same array:
    thisProxy.foo();
• From any chare that has a proxy p to the chare array:
    p.foo();

Reduction
• Combines a set of values: sum, max, concat
• Usually reduces the set of values to a single value
• Combination of values requires an operator
• The operator must be commutative and associative
• Each object calls contribute in a reduction

Reduction: Example
    mainmodule reduction {
      mainchare Main {
        entry Main(CkArgMsg* msg);
        entry [reductiontarget] void done(int value);
      };
      array [1D] Elem {
        entry Elem(CProxy_Main mProxy);
      };
    }

Reduction: Example
    #include "reduction.decl.h"
    const int numElements = 49;
    class Main : public CBase_Main {
     public:
      Main(CkArgMsg* msg) {
        CProxy_Elem::ckNew(thisProxy, numElements);
      }
      void done(int value) {
        CkPrintf("value: %d\n", value);
        CkExit();
      }
    };
    class Elem : public CBase_Elem {
     public:
      Elem(CProxy_Main mProxy) {
        int val = thisIndex;
        CkCallback cb(CkReductionTarget(Main, done), mProxy);
        contribute(sizeof(int), &val, CkReduction::sum_int, cb);
      }
    };
    #include "reduction.def.h"
Output:
    value: 1176
    Program finished.

Outline
1) Introduction
  • Execution Model
2) Hello World
3) Benefits of Charm++
4) Charm++ Basics
  • Object Collections
5) Overdecomposition
6) Migratability
  • Checkpointing and Resilience
7) Structured Dagger
8) Application Design
  • Object Design
9) Performance Tuning
10) Using Dynamic Load Balancing
11) Interoperability
12) Debugging
13) Further Optimizations

Task Parallelism with Objects
• Divide-and-conquer
• Each object recursively creates n objects that divide the problem into subproblems
• Each object then waits for all n objects to finish and then may 'combine' the responses
• At some point the recursion stops (at the bottom of the tree), and some sequential kernel is executed
• Then the result is propagated upward in the tree recursively
• Examples: fibonacci, quicksort, . . .

Fibonacci Example
• Each Fib object is a task that performs one of two actions:
  • Creates two new Fib objects to compute fib(n - 1) and fib(n - 2) and then waits for the response, adding up the two responses when they arrive
    • After both arrive, sends a response message with the result to the parent object
    • Or prints the value and exits if it is the root
  • If n = 1 or n = 0 (passed down from the parent) it sends a response message with n back to the parent object

Fibonacci Execution
[Figure: step-by-step expansion of the call tree for fib(5) into fib(4) and fib(3), continuing down to the fib(1) and fib(0) leaves]

Object-based Overdecomposition
• Charm++ philosophy:
  • Let the programmer decompose their work and data into coarse-grained entities
• It is important to understand what we mean by coarse-grained entities
  • You don't write sequential programs that some system will auto-decompose
  • You don't write programs where there is one object for each float
  • You consciously choose a grainsize, BUT choose it independent of the number of processors, or parameterize it, so you can tune later

Amdahl's Law and Grainsize
• Original "law":
  • If a program has a K% sequential section, then speedup is limited to $\frac{100}{K}$
    • If the rest of the program is parallelized completely
• Grainsize corollary:
  • If any individual piece of work is > K time units, and the sequential program takes $T_{seq}$,
    • Speedup is limited to $\frac{T_{seq}}{K}$
• So:
  • Examine performance data via histograms to find the sizes of remappable work units
  • If some are too big, change the decomposition method to make smaller units

Quick Example: Crack Propagation
• Decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right). The middle area contains cohesive elements. Both decompositions obtained using METIS.
• Pictures: S. Breitenfeld and P. Geubelle

Overdecomposition and Grainsize
• Common misconception: overdecomposition must be expensive
• Grainsize (working definition): the amount of computation per potentially parallel event (task creation, enqueue/dequeue, messaging, locking, etc.)

Grainsize and Overhead
• What is the ideal grainsize?
• Should it depend on the number of processors?
Ignoring overhead: $T_p = \max\left\{ g, \frac{T_1}{p} \right\}$
With an overhead $v$ for each of the $T_1/g$ messages: $T_p = \max\left\{ g, \frac{T_1 + \frac{T_1}{g} v}{p} \right\} = \max\left\{ g, \frac{T_1}{p}\left(1 + \frac{v}{g}\right) \right\}$
v: overhead per message; T_p: completion time on p processors; T_1: total sequential computation time; g: grainsize (computation per message)

Grainsize and Scalability

Grainsize Study for Jacobi3D
[Figure: Jacobi3D running on JYC using 64 cores on 2 nodes, 2048x2048x2048 total problem size; timestep (sec) vs. number of points per chare, from 4K to 128M]

Grainsize Study for Stencil Computation
• Blue Waters (JYC), 2 nodes, 32 cores each
[Figure: time step (sec) vs. number of chares per core (1 to 16384) on 64 cores, for problem sizes 2048x2048x2048 (50% mem), 2048x2048x1024, 2048x1024x1024, 1024x1024x1024 and 512x1024x1024]
Typically, having tens of chares per core is adequate (although reasoning should be based on computation per message)

Grainsize and Load Balancing
• How Much Balance Is Possible?
• Solution: Split compute objects that may have too much work, using a heuristic based on the number of interacting atoms

Grainsize For Extreme Scaling
• Strong scaling is limited by expressed parallelism
  • Minimum iteration time is limited by the lengthiest computation
  • Largest grains set a lower bound
• 1-away generalized to k-away provides fine granularity control

NAMD: 2-AwayX Example

Rules of thumb for grainsize
• Make it as small as possible, as long as it amortizes the overhead
• More specifically, ensure:
  • Average grainsize is greater than kv (say 10v)
  • No single grain should be allowed to be too large
    • Must be smaller than $\frac{T}{p}$, but actually we can express it as:
    • Must be smaller than kmv (say 100v)
• Important corollary:
  • You can be close to the optimal grainsize without having to think about p, the number of processors
  • kv < g < mkv (10v < g < 100v)

Grainsize for Fibonacci Example
• Set a sequential threshold in the computational tree
  • Past this threshold (i.e. when n < threshold), instead of constructing two new chares, compute the Fibonacci number sequentially
[Figure: fib(5) splits into fib(4) and fib(3); fib(4) splits into fib(3) and fib(2); the fib(3) and fib(2) nodes are computed sequentially]
• Setting the grainsize limit at 4 (which is too small, but good for illustration)
  • The internal nodes of the tree do very little work, but
  • The coarser grains now amortize the cost of the fine-grained chares


Object Serialization Using PUP: The Pack/UnPack Framework

The PUP Process

PUP Usage Sequence
• Migration out:
  • ckAboutToMigrate
  • Sizing
  • Packing
  • Destructor
• Migration in:
  • Migration constructor
  • UnPacking
  • ckJustMigrated

Writing a PUP routine
    class MyChare : public CBase_MyChare {
      int a;
      float b;
      char c;
      float localArray[SIZE];
    };
    void pup(PUP::er &p) {
      p | a;
      p | b;
      p | c;
      p(localArray, SIZE);
    }

Writing a PUP routine
    class MyChare : public CBase_MyChare {
      int heapArraySize;
      float* heapArray;
      MyClass* pointer;
    };
    void pup(PUP::er &p) {
      p | heapArraySize;
      if (p.isUnpacking()) {
        heapArray = new float[heapArraySize];
      }
      p(heapArray, heapArraySize);
      bool isNull = !pointer;
      p | isNull;
      if (!isNull) {
        if (p.isUnpacking())
          pointer = new MyClass();
        p | *pointer;
      }
    }

PUP: Pitfalls
• If variables are added to an object, update the PUP routine
• If the object allocates data on the heap, copy it recursively, not just the pointer
• Remember to allocate memory while unpacking
• Sizing, Packing, and Unpacking must scan the variables in the same order
• Test PUP routines with +balancer RotateLB

Fault Tolerance in Charm++/AMPI
• Four approaches:
  • Disk-based checkpoint/restart
  • In-memory double checkpoint/restart
  • Experimental: Proactive object evacuation
  • Experimental: Message-logging for scalable fault tolerance
• Common features:
  • Easy checkpoint
  • Migrate-to-disk leverages object-migration capabilities
  • Based on dynamic runtime capabilities
  • Can be used in concert with load-balancing schemes

Checkpointing to the file system: Split Execution
• The common form of checkpointing
  • The job runs for 5 hours, then will continue at the next allocation another day!
• The existing Charm++ infrastructure for chare migration helps
  • Just "migrate" chares to disk
• The call to checkpoint the application is made in the main chare at a synchronization point
    CkCallback cb(CkIndex_Hello::SayHi(), helloProxy);
    CkStartCheckpoint("log", cb);
    > ./charmrun hello +p4 +restart log

Code to Use Load Balancing
• Write a PUP method to serialize the state of a chare
• Insert an if (myLBStep) AtSync(); call at a natural barrier
• Implement ResumeFromSync() to resume execution
  • Typically, ResumeFromSync contributes to a reduction

Using the Load Balancer
• Link an LB module
  • -module <strategy>
  • RefineLB, NeighborLB, GreedyCommLB, others
  • EveryLB will include all load balancing strategies
• Compile-time option (specify default balancer)
  • -balancer RefineLB
• Runtime option
  • +balancer RefineLB


Chares are reactive
• The way we described Charm++ so far, a chare is a reactive entity:
  • If it gets this method invocation, it does this action
  • If it gets that method invocation, then it does that action
  • But what does it do?
• In typical programs, chares have a life-cycle
• How to express the life-cycle of a chare in code?
  • Only when it exists
    • i.e. some chares may be truly reactive, and the programmer does not know the life cycle
  • But when it exists, its form is:
    • Computations depend on remote method invocations, and completion of other local computations
    • A DAG (Directed Acyclic Graph)!

Fibonacci Example
    mainmodule fib {
      mainchare Main {
        entry Main(CkArgMsg* m);
      };
      chare Fib {
        entry Fib(int n, bool isRoot, CProxy_Fib parent);
        entry void respond(int value);
      };
    };

Fibonacci Example
    class Main : public CBase_Main {
     public:
      Main(CkArgMsg* m) {
        CProxy_Fib::ckNew(atoi(m->argv[1]), true, CProxy_Fib());
      }
    };
    class Fib : public CBase_Fib {
     public:
      CProxy_Fib parent;
      bool isRoot;
      int n, result, count;  // n is kept as a member so respond() can test it
      Fib(int n_, bool isRoot_, CProxy_Fib parent_)
        : parent(parent_), isRoot(isRoot_), n(n_), result(0), count(2) {
        if (n < 2) respond(n);
        else {
          CProxy_Fib::ckNew(n - 1, false, thisProxy);
          CProxy_Fib::ckNew(n - 2, false, thisProxy);
        }
      }
      void respond(int val);
    };
    void Fib::respond(int val) {
      result += val;
      if (--count == 0 || n < 2) {
        if (isRoot) {
          CkPrintf("Fibonacci number is: %d\n", result);
          CkExit();
        } else {
          parent.respond(result);
          delete this;
        }
      }
    }

Consider Fibonacci Chare
• The Fibonacci chare gets created
• If it's not a leaf,
  • It fires two chares
  • When both children return results (by calling respond):
    • It can compute its result and send it up, or print it
  • But in our example, this logic is hidden in the flags and counters . . .
    • This is simple for this simple example, but . . .
  • Let's look at how this would look with a little notational support

Structured Dagger: The when construct
• The when construct
  • Declare the actions to perform when a message is received
  • In sequence, it acts like a blocking receive
    entry void someMethod() {
      when entryMethod1(parameters) { /* block2 */ }
      when entryMethod2(parameters) { /* block3 */ }
    };

Structured Dagger: The serial construct
• The serial construct
  • A sequential block of C++ code in the .ci file
  • The keyword serial means that the code block will be executed without interruption/preemption, like an entry method
  • Syntax: serial <optionalString> { /* C++ code */ }
  • The <optionalString> is used for identifying the serial for performance analysis
  • Serial blocks can access all members of the class they belong to
• Examples (.ci file):
    entry void method1(parameters) {
      serial {
        thisProxy.invokeMethod(10);
        callSomeFunction();
      }
    };
    entry void method2(parameters) {
      serial "setValue" {
        value = 10;
      }
    };

Structured Dagger: Sequence
    entry void someMethod() {
      serial { /* block1 */ }
      when entryMethod1(parameters) serial { /* block2 */ }
      when entryMethod2(parameters) serial { /* block3 */ }
    };
• Sequence
  • Sequentially execute /* block1 */
  • Wait for entryMethod1 to arrive; if it has not, return control back to the Charm++ scheduler, otherwise, execute /* block2 */
  • Wait for entryMethod2 to arrive; if it has not, return control back to the Charm++ scheduler, otherwise, execute /* block3 */

Structured Dagger: The when construct
• Execute /* further code */ when myMethod arrives
    when myMethod(int param1, int param2)
      { /* further code */ }
• Execute /* further code */ when myMethod1 and myMethod2 arrive
    when myMethod1(int param1, int param2),
         myMethod2(bool param3)
      { /* further code */ }
• Which is almost the same as this:
    when myMethod1(int param1, int param2) {
      when myMethod2(bool param3)
        { /* further code */ }
    }

Structured Dagger: Boilerplate
• Structured Dagger can be used in any entry method (except for a constructor)
  • Can be used in a mainchare, chare, or array
• For any class that has Structured Dagger in it you must insert:
  • The Structured Dagger macro: [ClassName]_SDAG_CODE

Structured Dagger: Declaration Syntax
The .ci file:
    [mainchare,chare,array] MyFoo {
      entry void method(/* parameters */) {
        // ... structured dagger code here ...
      };
      // ...
    }
The .cpp file:
    class MyFoo : public CBase_MyFoo {
      MyFoo_SDAG_CODE /* insert SDAG macro */
     public:
      MyFoo() { }
    };

Fibonacci with Structured Dagger
    chare Fib {
      entry Fib(int n, bool isRoot, CProxy_Fib parent);
      entry void calc(int n) {
        if (n < THRESHOLD) serial { respond(seqFib(n)); }
        else {
          serial {
            CProxy_Fib::ckNew(n - 1, false, thisProxy);
            CProxy_Fib::ckNew(n - 2, false, thisProxy);
          }
          when response(int val)
            when response(int val2)
              serial { respond(val + val2); }
        }
      };
      entry void response(int);
    };

Fibonacci with Structured Dagger
    #include "fib.decl.h"
    #define THRESHOLD 10
    class Main : public CBase_Main {
     public:
      Main(CkArgMsg* m) { CProxy_Fib::ckNew(atoi(m->argv[1]), true, CProxy_Fib()); }
    };
    class Fib : public CBase_Fib {
     public:
      Fib_SDAG_CODE
      CProxy_Fib parent; bool isRoot;
      Fib(int n, bool isRoot_, CProxy_Fib parent_) : parent(parent_), isRoot(isRoot_)
        { calc(n); }
      int seqFib(int n) { return (n < 2) ? n : seqFib(n - 1) + seqFib(n - 2); }
      void respond(int val) {
        if (!isRoot) {
          parent.response(val);
          thisProxy.ckDestroy();
        } else {
          CkPrintf("Fibonacci number is: %d\n", val);
          CkExit();
        }
      }
    };
    #include "fib.def.h"

Structured Dagger: The when construct
• What is the sequence?
    when myMethod1(int param1, int param2) {
      when myMethod2(bool param3),
           myMethod3(int size, int arr[size]) /* sdag block1 */
      when myMethod4(bool param4) /* sdag block2 */
    }
• Sequence:
  • Wait for myMethod1; upon arrival, execute the body of myMethod1
  • Wait for myMethod2 and myMethod3; upon arrival of both, execute /* sdag block1 */
  • Wait for myMethod4; upon arrival, execute /* sdag block2 */
• Question: if myMethod4 arrives first, what will happen?

Structured Dagger: The when construct
• The when clause can wait on a certain reference number
• If a reference number is specified for a when, the first parameter for the when must be the reference number
• Semantic: the when will "block" until a message arrives with that reference number
    when method1[100](int ref, bool param1)
      /* sdag block */
    serial {
      proxy.method1(200, false); /* will not be delivered to the when */
      proxy.method1(100, true);  /* will be delivered to the when */
    }

Structured Dagger: The if-then-else construct
• The if-then-else construct:
  • Same as the typical C if-then-else semantics and syntax
    if (thisIndex.x == 10) {
      when method1[block](int ref, bool someVal) /* code block1 */
    } else {
      when method2(int payload) serial {
        // ... some C++ code
      }
    }

Structured Dagger: The for construct
• The for construct:
  • Defines a sequenced for loop (like a sequential C for loop)
  • Once the body for the ith iteration completes, the i + 1 iteration is started
    for (iter = 0; iter < maxIter; ++iter) {
      when recvLeft[iter](int num, int len, double data[len])
        serial { computeKernel(LEFT, data); }
      when recvRight[iter](int num, int len, double data[len])
        serial { computeKernel(RIGHT, data); }
    }
• iter must be defined in the class as a member
    class Foo : public CBase_Foo {
     public: int iter;
    };

Structured Dagger: The while construct
• The while construct:
  • Defines a sequenced while loop (like a sequential C while loop)
    while (i < numNeighbors) {
      when recvData(int len, double data[len]) {
        serial { /* do something */ }
        when method1() /* block1 */
        when method2() /* block2 */
      }
      serial { i++; }
    }
