Composable Parallel Libraries in Charm++



SLIDE 1

Composable Parallel Libraries in Charm++

Phil Miller, Laxmikant V. Kalé∗

Parallel Programming Laboratory Department of Computer Science University of Illinois at Urbana-Champaign

∗{mille121, kale}@illinois.edu

SIAM PP12: 15 February 2012

Miller, Kalé (PPL, UIUC) Composable Parallel Libraries in Charm++ SIAM PP12: 15 February 2012 1 / 15

SLIDE 2

Charm++

Programming Model

Object-based: Express logic via indexed collections of interacting objects (both data and tasks)
Over-decomposed: Expose more parallelism than available processors
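To make over-decomposition concrete, here is a minimal sketch in plain C++ (not Charm++ code) of an indexed collection with more objects than processors. The helper names `blockMapPE` and `objectsPerPE` and the contiguous block mapping are assumptions made for illustration; the slides do not say which mapping the runtime uses, and a real runtime is free to move objects later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Charm++ API): place object `index` of
// `numObjects` on one of `numPEs` processors using contiguous blocks.
std::size_t blockMapPE(std::size_t index, std::size_t numObjects,
                       std::size_t numPEs) {
  assert(numPEs > 0 && index < numObjects);
  std::size_t perPE = (numObjects + numPEs - 1) / numPEs;  // ceiling division
  return index / perPE;
}

// Count how many objects each processor hosts under that mapping.
std::vector<std::size_t> objectsPerPE(std::size_t numObjects,
                                      std::size_t numPEs) {
  std::vector<std::size_t> counts(numPEs, 0);
  for (std::size_t i = 0; i < numObjects; ++i) {
    counts[blockMapPE(i, numObjects, numPEs)]++;
  }
  return counts;
}
```

With 64 objects on 4 processors, `objectsPerPE(64, 4)` gives 16 objects per processor, so each processor always has other objects to run while one of them waits for data.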

SLIDE 3

Charm++

Programming Model

Message-Driven: Trigger computation by invoking remote entry methods
Non-blocking, Asynchronous: Implicitly overlapped data transfer
Runtime-Assisted: scheduling, observation-based adaptivity, load balancing, composition, etc.
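The message-driven, non-blocking style can be illustrated with a toy single-process scheduler in plain C++. This is only a sketch under stated assumptions: `ToyScheduler`, `send`, `run`, and `demoTrace` are hypothetical names and not the Charm++ API. It shows that an invocation is queued rather than executed in place, so the sender continues immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a message-driven scheduler (hypothetical, single process).
class ToyScheduler {
 public:
  // Asynchronous "send": enqueue the invocation and return immediately.
  void send(std::function<void()> entryMethod) {
    pending_.push(std::move(entryMethod));
  }

  // Deliver queued messages one at a time until none remain.
  // Returns the number of messages delivered.
  std::size_t run() {
    std::size_t delivered = 0;
    while (!pending_.empty()) {
      std::function<void()> msg = std::move(pending_.front());
      pending_.pop();
      msg();  // the handler may call send() again
      ++delivered;
    }
    return delivered;
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Object A invokes an entry method on object B and keeps working.
std::vector<std::string> demoTrace() {
  ToyScheduler sched;
  std::vector<std::string> trace;
  sched.send([&] {
    trace.push_back("A.start");
    sched.send([&] { trace.push_back("B.recv"); });
    trace.push_back("A.continues");  // the sender was not blocked
  });
  sched.run();
  return trace;
}
```

`demoTrace()` records `A.start`, `A.continues`, `B.recv`, in that order: B's handler runs only when the scheduler picks up its message, after A's handler has finished.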

SLIDE 6

Charm++

Capabilities

◮ Promotes natural expression of parallelism
◮ Supports modularity
◮ Overlaps communication and computation
◮ Automatically balances load
◮ Automatically handles heterogeneous systems
◮ Adapts to reduce energy consumption
◮ Tolerates component failures

For more info

http://charm.cs.illinois.edu/why/

SLIDE 7

Separation of Concerns

Application developers focus on their algorithms and data

Libraries should

◮ not tie users’ hands
◮ share resources seamlessly
◮ overlap
◮ manage their own performance

Strong runtime makes it possible!

SLIDE 10

LU: Capabilities

Composable library

◮ Modular program structure
◮ Seamless execution structure (interleaved modules)

Block-centric

◮ Algorithm from a block’s perspective
◮ Agnostic of processor-level considerations

Separation of concerns

◮ Domain specialist codes algorithm
◮ Systems specialist codes tuning, resource mgmt, etc.

                       Lines of Code         Module-specific
Module                 CI     C++    Total   Commits
Factorization          517    419    936     472/572 (83%)
Mem. Aware Sched.      9      492    501     86/125 (69%)
Mapping                10     72     82      29/42 (69%)

SLIDE 11

LU: Capabilities

Flexible data placement

◮ Don’t mind client’s layout - transposition is cheap
◮ Variations don’t impose on client
◮ Can improve performance [1]

Memory-constrained dynamic lookahead

[1] Lifflander et al., IPDPS 2012

SLIDE 12

LU: Performance

Weak Scaling: (N such that matrix fills 75% memory)

[Figure: Total TFlop/s (axis ticks 0.1 to 100) vs. Number of Cores (axis ticks 128 to 8192). Series: Theoretical peak on XT5; Weak scaling on XT5. Percentage labels on the plot: 67%, 67.4%, 67.4%, 67.1%, 66.2%, 65.7%]
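The weak-scaling rule "N such that matrix fills 75% memory" can be worked out with a short calculation. The sketch below assumes a dense N x N matrix of 8-byte doubles and takes the total memory as a parameter; the function name `matrixDimension` is hypothetical, and the slides do not give the memory sizes actually used.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Largest N such that an N x N matrix of doubles fits within `fraction`
// of `totalMemoryBytes` (assumes dense storage, 8 bytes per entry).
std::uint64_t matrixDimension(std::uint64_t totalMemoryBytes, double fraction) {
  assert(fraction > 0.0 && fraction <= 1.0);
  double budgetBytes = fraction * static_cast<double>(totalMemoryBytes);
  auto n = static_cast<std::uint64_t>(
      std::floor(std::sqrt(budgetBytes / sizeof(double))));
  // Step down if floating-point rounding overshot the budget.
  while (n > 0 && static_cast<double>(n) * n * sizeof(double) > budgetBytes) {
    --n;
  }
  return n;
}
```

For example, with 1 GiB of total memory, `matrixDimension(1ull << 30, 0.75)` returns 10033. Because storage grows as N squared, quadrupling the total memory under this rule roughly doubles N.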

SLIDE 13

LU: Performance

... and strong scaling too! (N=96,000)

[Figure: Total TFlop/s (axis ticks 0.1 to 100) vs. Number of Cores (axis ticks 128 to 8192). Series: Theoretical peak on XT5; Weak scaling on XT5; Theoretical peak on BG/P; Strong scaling on BG/P. Percentage labels on the plot: 60.3%, 45%, 40.8%, 31.6%]

SLIDE 14

Parallel IO

MPI-IO is selfish, still demands dedicated nodes
Overlap IO in-line with the application!

SLIDE 15

Parallel IO

Architecture

SLIDE 16

Parallel IO

Implementation notes

Forward data to selected processors for stripe-disjoint access
Buffer to write whole stripes (not in results shown)

SLIDE 17

Parallel IO

Implementation

void Manager::write(Token token, const char *data, size_t bytes, size_t offset) {
  Options &opts = files[token].opts;
  do {
    // Stripe that contains the current file offset
    size_t stripe = offset / opts.peStripe;
    // Processor responsible for that stripe
    int pe = opts.basePE + stripe * opts.skipPEs;
    // Send at most what remains of the current stripe
    size_t bytesToSend = min(bytes, opts.peStripe - offset % opts.peStripe);
    thisProxy[pe].write_forwardData(token, data, bytesToSend, offset);
    data += bytesToSend;
    offset += bytesToSend;
    bytes -= bytesToSend;
  } while (bytes > 0);
}
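The stripe-splitting loop can be tried outside the runtime with a standalone sketch. Here `StripeOptions`, `Piece`, and `splitWrite` are hypothetical stand-ins: the field names `peStripe`, `basePE`, and `skipPEs` follow the slide, but the struct layout is assumed, and the remote call is replaced by recording each piece. Unlike the slide's `do ... while`, this version uses `while`, so a zero-byte request produces no pieces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One forwarded piece of a write request (hypothetical type).
struct Piece {
  int pe;              // destination processor
  std::size_t offset;  // file offset of this piece
  std::size_t bytes;   // length of this piece
};

// Stand-in for the slide's Options; field names follow the slide,
// the layout is an assumption.
struct StripeOptions {
  std::size_t peStripe;  // bytes of the file handled per stripe
  int basePE;            // processor that owns stripe 0
  int skipPEs;           // processor stride between consecutive stripes
};

// Split [offset, offset + bytes) at stripe boundaries and record which
// processor each piece would be forwarded to.
std::vector<Piece> splitWrite(const StripeOptions &opts, std::size_t bytes,
                              std::size_t offset) {
  assert(opts.peStripe > 0);
  std::vector<Piece> pieces;
  while (bytes > 0) {
    std::size_t stripe = offset / opts.peStripe;
    int pe = opts.basePE + static_cast<int>(stripe) * opts.skipPEs;
    std::size_t n = std::min(bytes, opts.peStripe - offset % opts.peStripe);
    pieces.push_back({pe, offset, n});
    offset += n;
    bytes -= n;
  }
  return pieces;
}
```

With `peStripe = 100`, `basePE = 2`, and `skipPEs = 4`, a 250-byte write at offset 50 splits into three pieces: 50 bytes to PE 2, 100 bytes to PE 6, and 100 bytes to PE 10. No two processors ever receive data for the same stripe, which is the stripe-disjoint access the implementation notes describe.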

SLIDE 18

Parallel IO

SLIDE 19

Conclusion

Parallel libraries needn’t be call and return
Need to respect resource bounds
Applications can find other work to do
Let developers fully utilize system resources
