The HPC Challenge Benchmarks and the PMaC project



SLIDE 1

The HPC Challenge Benchmarks and the PMaC project

  • Certificates of relevance for benchmarks
      – Do they cover a useful performance space?
      – Do they enable reasoning about expected application performance?
  • How to practically measure memory access patterns in nature
  • Useful performance taxonomy

Components of a Performance Prediction Framework

  • Machine Profile - characterizations of the rates at which a machine can (or is projected to) carry out fundamental operations, abstracted from the particular application
  • Application Signature - detailed summaries of the fundamental operations to be carried out by the application, independent of any particular machine

Combine Machine Profile and Application Signature using:

  • Convolution Method - algebraic mapping of the application signature onto the machine profile to calculate a performance prediction

SLIDE 2

MAPS Data

MAPS: a memory bandwidth benchmark that measures memory rates (MB/s) for different levels of cache (L1, L2, L3, main memory) and different access patterns (stride-one and random)

[Figure: MAPS bandwidth plot with annotated regions: stride-one access, L1 cache; stride-one access, L1/L2 cache; random access, L1/L2 cache; stride-one access, L2 cache/main memory.]
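The kind of sweep MAPS performs can be sketched as follows. This is a minimal illustration, not the MAPS code: the function names `measure_rate` and `maps_like_sweep` and the buffer sizes are hypothetical, and in pure Python interpreter overhead dominates, so the reported MB/s only illustrate the procedure (vary the working-set size, then walk it in stride-one and in random order).

```python
import random
import time
from array import array


def measure_rate(buf, order, repeats=3):
    """Return the best observed read rate in MB/s for the given index order."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        total = 0.0
        for i in order:
            total += buf[i]
        best = min(best, time.perf_counter() - start)
    bytes_read = len(order) * buf.itemsize
    # Guard against a zero timer reading on very small buffers.
    return bytes_read / max(best, 1e-12) / 1e6


def maps_like_sweep(sizes_in_elements, seed=0):
    """Measure stride-one and random read rates for each working-set size."""
    rng = random.Random(seed)
    results = {}
    for n in sizes_in_elements:
        buf = array("d", [1.0]) * n          # n doubles, 8 bytes each
        stride_one = list(range(n))          # 0, 1, 2, ...
        shuffled = stride_one[:]
        rng.shuffle(shuffled)                # same indices, random order
        results[n] = {
            "stride_one": measure_rate(buf, stride_one),
            "random": measure_rate(buf, shuffled),
        }
    return results


profile = maps_like_sweep([1 << 10, 1 << 14])
for size, rates in profile.items():
    print(size, rates)
```

A compiled implementation with sizes chosen to straddle each cache level would be needed to see the cache plateaus the benchmark is designed to expose.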

Convolutions

MetaSim trace collected on Cobalt60 simulating SC45 memory structure

Basic-Block Number   Procedure Name   # Memory References   L1 hit rate   L2 hit rate   Random ratio   Memory Bandwidth
5247                 Walldst          2.22E11               97.28         99.99         0.00           8851
10729                Poorgrd          4.90E08               88.97         92.29         0.20           1327
8649                 Ucm6             1.81E10               92.01         97.07         0.23           572

$$\text{Memory time} = \sum_{i=1}^{n} \frac{\text{MemOps}_{BB_i}}{\text{MemRate}_{BB_i}}$$
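The memory-time sum can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `BasicBlock` class, the block names, and all numbers are hypothetical, and the rate is expressed in memory references per second so that the division yields seconds directly.

```python
from dataclasses import dataclass


@dataclass
class BasicBlock:
    name: str
    mem_ops: float   # memory references issued by this basic block
    mem_rate: float  # references per second the machine sustains for this block's access pattern


def memory_time(blocks):
    """Sum MemOps / MemRate over all basic blocks (result in seconds)."""
    return sum(b.mem_ops / b.mem_rate for b in blocks)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
blocks = [
    BasicBlock("block_a", mem_ops=2.0e9, mem_rate=1.0e9),  # contributes 2.0 s
    BasicBlock("block_b", mem_ops=5.0e8, mem_rate=1.0e8),  # contributes 5.0 s
]
total_seconds = memory_time(blocks)
print(total_seconds)
```

Note that the slow block dominates the total even though it issues fewer references, which is why the per-block rate matters more than the raw reference count.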

SLIDE 3

Results: Predictions for AVUS (Cobalt60)

AVUS TI-05 standard data set on 64 CPUs

System                        Actual time (s)   Predicted time (s)   % Error
NAVO IBM PWR3 (Habu)          8,601             11,180               +30%
NAVO IBM PWR4 (Marcellus)     4,375             4,445                +2%
ASC HP SC45                   3,334             2,688                -19%
ARL IBM PWR3 (Brainerd)       10,675            10,385               -3%
MHPCC IBM PWR4 (Hurricane)    4,932             4,258                -14%
MHPCC IBM PWR3 (Tempest)      8,354             9,488                +14%
NAVO IBM PWR4+ (Romulus)      3,272             3,239                -1%

A single time, with no % error, is listed for ARL Linux Networx Xeon Cluster (3,459) and ARL IBM PWR4 (Shelton) (6,192).
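The % Error figures are consistent with a signed relative error, (predicted - actual) / actual. The sketch below assumes that definition (it is inferred from the numbers, not stated on the slide) and recomputes it for the systems that list both times; the function name `percent_error` is hypothetical.

```python
def percent_error(predicted, actual):
    """Signed relative error of the prediction, in percent."""
    return 100.0 * (predicted - actual) / actual


# (actual time in s, predicted time in s) as listed on the slide
runs = {
    "NAVO IBM PWR3 (Habu)": (8601, 11180),
    "NAVO IBM PWR4 (Marcellus)": (4375, 4445),
    "ASC HP SC45": (3334, 2688),
    "ARL IBM PWR3 (Brainerd)": (10675, 10385),
    "MHPCC IBM PWR4 (Hurricane)": (4932, 4258),
    "MHPCC IBM PWR3 (Tempest)": (8354, 9488),
    "NAVO IBM PWR4+ (Romulus)": (3272, 3239),
}

for system, (actual, predicted) in runs.items():
    print(f"{system}: {percent_error(predicted, actual):+.0f}%")
```

A positive value means the model predicted a longer run than was measured; a negative value means it predicted a shorter one.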

Spatial and Temporal Locality

How could one quantify the spatial and temporal locality in a real code?

$$\text{SpatialScore}(N) = \frac{\sum_{i=1}^{N} \left( \text{Refs}_{\text{Stride } i} / i \right)}{\text{Total Refs}}$$

$$\text{TemporalScore}(N) = \frac{\text{Observed Reuse}}{\text{Total Refs} - \text{Spatial Refs}}$$
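A rough sketch of how the two scores could be computed from an address trace follows. The slide does not define the terms precisely, so several assumptions are made here and should be read as such: a reference's stride is taken against the immediately preceding reference, "Spatial Refs" are references with a stride between 1 and N, and "Observed Reuse" counts non-spatial references to an address that has appeared earlier in the trace. The function name `locality_scores` is hypothetical.

```python
def locality_scores(trace, n):
    """Return (spatial_score, temporal_score) for a list of integer addresses."""
    total_refs = len(trace)
    if total_refs == 0:
        return 0.0, 0.0

    stride_counts = {}   # stride i (1..n) -> number of references with that stride
    spatial_refs = 0
    observed_reuse = 0
    seen = set()
    prev = None

    for addr in trace:
        stride = abs(addr - prev) if prev is not None else None
        if stride is not None and 1 <= stride <= n:
            # Assumption: stride is measured against the previous reference only.
            stride_counts[stride] = stride_counts.get(stride, 0) + 1
            spatial_refs += 1
        elif addr in seen:
            # Assumption: any earlier occurrence counts as reuse.
            observed_reuse += 1
        seen.add(addr)
        prev = addr

    # Each stride-i reference is weighted by 1/i, as in the SpatialScore formula.
    spatial = sum(count / i for i, count in stride_counts.items()) / total_refs
    non_spatial = total_refs - spatial_refs
    temporal = observed_reuse / non_spatial if non_spatial else 0.0
    return spatial, temporal


streaming = list(range(100))   # pure stride-one walk
hot_spot = [5] * 10            # the same address over and over
print(locality_scores(streaming, n=8))
print(locality_scores(hot_spot, n=8))
```

Under these assumptions a stride-one walk scores near 1 spatially and 0 temporally, while repeated access to one address does the opposite.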

SLIDE 4

It’s Harder Than it Looks

Where does one plot RandomAccess?

    for (i = 0; i < N; i++) {
        add = random_number;
        table[add] ^= random_number;
    }

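[Figure: temporal and spatial locality axes, each running to 1, with the position of RandomAccess marked "?". Candidate views of the update: Update (design goal); Load + Store (temporal); Load + Store (spatial); Two loads + Store.]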
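One way to see why the placement is ambiguous is to log the references the update loop actually issues. The sketch below is an illustration only: it assumes each `^=` update is a load followed by a store to the same address, draws the address and the XOR value independently, and uses hypothetical names (`random_access_trace`, `table_size`, `updates`).

```python
import random


def random_access_trace(table_size, updates, seed=0):
    """Run the RandomAccess-style update loop and record each memory reference."""
    rng = random.Random(seed)
    table = [0] * table_size
    trace = []
    for _ in range(updates):
        add = rng.randrange(table_size)   # random table index
        value = rng.getrandbits(32)       # random value to XOR in
        trace.append(("load", add))       # read table[add]
        table[add] ^= value
        trace.append(("store", add))      # write table[add] back
    return table, trace


table, trace = random_access_trace(table_size=1 << 10, updates=1000)

# References that touch the same address as the reference just before them.
immediate_reuse = sum(1 for a, b in zip(trace, trace[1:]) if a[1] == b[1])
print(len(trace), immediate_reuse)
```

Counted this way, at least half of the references reuse the previous address, even though successive updates land on unrelated table entries; whether the benchmark looks "temporal" or "random" depends on which of these views the metric takes.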

HPC Challenge Benchmarks on axes of spatial and temporal locality

[Figure: Streams, Random Access, AVUS, NAS CG C, FFT, and HPL plotted on axes of spatial and temporal locality.]