Charm++ 2007 Kathy Yelick
Compilation Techniques for Partitioned Global Address Space Languages
Katherine Yelick
U.C. Berkeley and Lawrence Berkeley National Lab
http://titanium.cs.berkeley.edu
http://upc.lbl.gov
HPC Programming: Where
Slide source: Horst Simon and John Shalf, LBNL/NERSC
[Figure: log-scale plot, y-axis 100 to 1E+12, x-axis years 1993 to 2014; series: SUM, #1, #500]
Slide source Horst Simon, LBNL
Global address space
Joint work with Dan Bonachea
[Figure: Bandwidth (MB/s, axis 100 to 900) vs. Size (bytes, 10 to 1,000,000); series: GASNet put (nonblock), MPI Flood]
[Figure: Relative BW (GASNet/MPI, axis 1.0 to 2.4) vs. Size (bytes, 10 to 10000000)]
Joint work with Paul Hargrove and Dan Bonachea
(up is good)
NERSC Jacquard machine with Opteron processors
[Figure: 8-byte Roundtrip Latency. y-axis: Roundtrip Latency (usec), 5 to 25; platforms: Elan3/Alpha, Elan4/IA64, Myrinet/x86, IB/G5, IB/Opteron, SP/Fed; series: MPI ping-pong, GASNet put+sync; data labels as extracted: 14.6 6.6 22.1 9.6 6.6 4.5 9.5 18.5 24.2 13.5 17.8 8.3]
Joint work with UPC Group; GASNet design by Dan Bonachea
[Figure: Flood Bandwidth for 2MB messages. y-axis: Percent HW peak (BW in MB), 0% to 100%; platforms: Elan3/Alpha, Elan4/IA64, Myrinet/x86, IB/G5, IB/Opteron, SP/Fed; data labels as extracted: 1504 630 244 857 225 610 1490 799 255 858 228 795]
Joint work with UPC Group; GASNet design by Dan Bonachea
[Figure: Flood Bandwidth for 4KB messages. y-axis: Percent HW peak, 0% to 100%; platforms: Elan3/Alpha, Elan4/IA64, Myrinet/x86, IB/G5, IB/Opteron, SP/Fed; data labels as extracted: 547 420 190 702 152 252 750 714 231 763 223 679]
Joint work with UPC Group; GASNet design by Dan Bonachea
Joint work with Chris Bell, Rajesh Nishtala, Dan Bonachea
[Figure: Best MFlop rates for all NAS FT Benchmark versions. y-axis: MFlops per Thread, 100 to 1100; configurations: Myrinet 64, InfiniBand 256, Elan3 256, Elan3 512, Elan4 256, Elan4 512; series: Best NAS Fortran/MPI, Best MPI (always Slabs), Best UPC (always Pencils)]
Joint work with Jimmy Su
[Figure: speedup (axis 1 to 1.6) vs. matrix number (1 to 22); series: average speedup, maximum speedup]
Joint work with Jimmy Su
Joint work with Amir Kamil and Jimmy Su
Joint work with Amir Kamil and Jimmy Su
Joint work with Amir Kamil
1 Line counts do not include the reachable portion of the
Joint work with Amir Kamil
Joint work with Amir Kamil
[Figure: Local Qualification Inference. y-axis: % of Declarations, 10 to 100; x-axis: Benchmark (3d-fft, amr-poisson, amr-gas, gsrb, lu-fact, pi, pps, sample-sort, demv, spmv); series: Old Constraint-Based (LQI), Thread-Aware Pointer Analysis, Hierarchical Pointer Analysis]
Joint work with Amir Kamil
[Figure: Private Qualification Inference. y-axis: % of Declarations, 10 to 100; x-axis: Benchmark (3d-fft, amr-poisson, amr-gas, gsrb, lu-fact, pi, pps, sample-sort, demv, spmv); series: Old Type-Based SQI, Thread-Aware Pointer Analysis]
Joint work with Amir Kamil
AMR Titanium work by Tong Wen and Philip Colella
automatically
Work by Tong Wen and Philip Colella; communication optimizations joint with Jimmy Su
[Figure: Lines of Code (axis 5000 to 30000), Titanium vs. C++/F/MPI (Chombo); legend as extracted: AMRElliptic AMRTools Util Grid AMR Array]
[Figure: Speedup. y-axis: speedup, 10 to 80; x-axis: #procs (16, 28, 36, 56, 112); series: Ti, Chombo]
Joint work with Tong Wen, Jimmy Su, Phil Colella
Joint work with Ed Givelberg, Armando Solar-Lezama, Charlie Peskin, Dave McQueen
8000 Fortran 4000 Titanium
[Figure: time (secs, axis 10 to 50) vs. procs (1 to 128); series: 256^3 on Power3/Colony, 512^3 on Power3/Colony, 512^2x256 on Pent4/Myrinet]
[Figure: time (secs, axis 0.5 to 2) vs. procs (1 to 128); series: 128^3 on Power4/Federation, 256^3 on Power4/Federation]
Joint work with Ed Givelberg, Armando Solar-Lezama, Charlie Peskin, Dave McQueen
Joint work with Parry Husbands
Joint work with Parry Husbands
Joint work with Parry Husbands
Joint work with Parry Husbands
[Figure: X1 UPC vs. MPI/HPL. y-axis: GFlop/s, 200 to 1400; x-axis labels as extracted: 60, X1/64, X1/128; series: MPI/HPL, UPC]
[Figure: Opteron cluster UPC vs. MPI/HPL. y-axis: GFlop/s, 50 to 200; x-axis: Opt/64; series: MPI/HPL, UPC]
[Figure: Altix UPC vs. MPI/HPL. y-axis: GFlop/s, 20 to 160; x-axis: Alt/32; series: MPI/HPL, UPC]
Joint work with Parry Husbands
[Figure: UPC vs. ScaLAPACK. y-axis: GFlops, 20 to 80; x-axis: 2x4 proc grid, 4x4 proc grid; series: ScaLAPACK, UPC]