Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT
HPC Challenge BOF, SC06, Class 2 Submission
November 14, 2006
Brad Chamberlain, Steve Deitz, Mary Beth Hribar, Wayne Wong
Chapel Team, Cray Inc.
HPCC BOF, SC06
Overview
Chapel: Cray’s HPCS language
Our approach to the HPC Challenge codes:
- performance-minded
- clear, intuitive, readable
- general across types, problem parameters, and modular boundaries
Code Size Summary
[Bar chart: SLOC of the reference vs Chapel versions of STREAM Triad, Random Access, and FFT, each broken down into problem size (common), results and output, verification, initialization, kernel declarations, and kernel computation. The Chapel versions range from 86 to 156 SLOC; the reference versions range from 433 to 1668 SLOC.]
Chapel Code Size Summary
[Bar chart: SLOC of the Chapel versions alone (86 to 156 SLOC each for STREAM Triad, Random Access, and FFT), broken down into problem size (common), results and output, verification, initialization, kernel declarations, and kernel computation.]
Chapel Code Size Summary
[Bar chart: static lexical tokens of the Chapel versions (593 to 1299 tokens each for STREAM Triad, Random Access, and FFT), broken down into problem size (common), results and output, verification, initialization, kernel declarations, and kernel computation.]
STREAM Triad Overview
const ProblemSpace: domain(1) distributed(Block) = [1..m];
var A, B, C: [ProblemSpace] elemType;

A = B + alpha * C;
- ProblemSpace: declare a 1D arithmetic domain (a first-class index set) and specify its distribution across locales L0..L4
- A, B, C: use the domain to declare distributed arrays
- Express the computation using promoted scalar operators and whole-array references ⇒ parallel computation
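As a rough serial sketch of what the triad statement computes (plain Python standing in for Chapel; the values of m, alpha, and the array elements here are illustrative, not the benchmark's):

```python
# Serial sketch of the STREAM Triad kernel: A = B + alpha * C.
# In Chapel the single whole-array statement runs in parallel over the
# Block-distributed ProblemSpace; a list comprehension stands in here.
m = 8                 # illustrative problem size (the benchmark uses a huge m)
alpha = 3.0
B = [1.0] * m
C = [2.0] * m
A = [B[i] + alpha * C[i] for i in range(m)]
# Each element is 1.0 + 3.0 * 2.0 = 7.0
```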
Random Access Overview
[i in TableSpace] T(i) = i;

forall block in subBlocks(updateSpace) do
  for r in RAStream(block.numIndices, block.low) do
    T(r & indexMask) ^= r;
- Initialize the table using a forall expression
- Express table updates using forall- and for-loops
- The random stream is expressed modularly using an iterator
iterator RAStream(numvals, start: randType = 0): randType {
  var val = getNthRandom(start);
  for i in 1..numvals {
    getNextRandom(val);
    yield val;
  }
}
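For illustration, the iterator's shape can be mimicked with a Python generator. Note that the state update below is a stand-in xorshift step, not HPCC's actual random-number recurrence, since the slide elides getNextRandom and getNthRandom; table_size and the start value are likewise made up:

```python
MASK64 = (1 << 64) - 1

def ra_stream(numvals, start=1):
    """Generator analog of the RAStream iterator: yields numvals values,
    advancing 64-bit state each step (stand-in xorshift update, not HPCC's)."""
    val = start & MASK64
    for _ in range(numvals):
        val ^= (val << 13) & MASK64
        val ^= val >> 7
        val ^= (val << 17) & MASK64
        yield val

# Table initialization and XOR updates, mirroring the slide (serially):
table_size = 16                  # a power of two, as in the benchmark
index_mask = table_size - 1
T = list(range(table_size))      # [i in TableSpace] T(i) = i;
for r in ra_stream(64):
    T[r & index_mask] ^= r       # T(r & indexMask) ^= r;
```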
FFT Overview (radix 4)
for i in [2..log2(numElements)) by 2 {
  const m = span*radix, m2 = 2*m;
  forall (k,k1) in (Adom by m2, 0..) {
    var wk2 = …, wk1 = …, wk3 = …;
    forall j in [k..k+span) do
      butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);
    wk1 = …; wk3 = …; wk2 *= 1.0i;
    forall j in [k+m..k+m+span) do
      butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);
  }
  span *= radix;
}

def butterfly(wk1, wk2, wk3, inout A: [1..radix]) { … }
- Parallelism expressed using nested forall-loops
- Support for complex and imaginary math simplifies the FFT arithmetic
- Generic arguments allow the butterfly routine to be called with complex, real, or imaginary twiddle factors
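To make the butterfly step concrete, here is a hedged Python sketch of a 4-point DFT, the combine step at the heart of each radix-4 stage. This is the textbook definition, not the Chapel code's actual butterfly, whose twiddle computations the slide elides:

```python
import cmath

def dft4(x):
    """Direct 4-point DFT from the definition X[k] = sum_n x[n] e^(-2*pi*i*k*n/4)."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
            for k in range(4)]

def butterfly4(a0, a1, a2, a3):
    """The same 4-point transform with only additions and a factor of -1j,
    echoing the slide's use of purely imaginary twiddles (wk2 *= 1.0i)."""
    s0, s1 = a0 + a2, a0 - a2
    s2, s3 = a1 + a3, a1 - a3
    return [s0 + s2, s1 - 1j * s3, s0 - s2, s1 + 1j * s3]
```

A radix-4 stage applies this 4-point transform to strided slices of the data (the slide's A[j..j+3*span by span]), scaling each leg by the stage's twiddle factors.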
Chapel Compiler Status
All codes compile and run with our current Chapel compiler
- focus to date has been on:
  - prototyping Chapel, not performance
  - targeting a single locale
- platforms: Linux, Cygwin (Windows), Mac OS X, SunOS, …
No meaningful performance results yet
- written report contains performance discussions for our codes
Upcoming milestones
- December 2006: limited release to HPLS team
- 2007: work on distributed-memory execution and optimizations
- SC07: intend to have publishable performance results for HPCC'07
Summary
Have expressed HPCC codes attractively
- clear, concise, general
- express parallelism, compile and execute correctly on one locale
- benefit from Chapel’s global-view parallelism
- utilize generic programming and modern SW Engineering principles
Our written report contains:
- complete source listings
- detailed walkthroughs of our solutions as Chapel tutorial
- performance notes for our implementations
Report and presentation available at our website:
http://chapel.cs.washington.edu
We’re interested in your feedback:
chapel_info@cray.com
Backup Slides
Compact High-Level Code…
[Bar charts: lines of code for the NAS Parallel Benchmarks CG, EP, FT, MG, and IS, comparing F+MPI (C+MPI for IS) versions against ZPL versions, each broken down into communication, declarations, and computation.]
…need not perform poorly
[Performance charts: C/Fortran + MPI versions vs ZPL versions of CG, EP, FT, MG, and IS.]
See also Rice University’s recent D-HPF work…