

SLIDE 1

Application Accelerators: Deus ex machina?

CCGSC, Flat Rock, North Carolina

Jeffrey S. Vetter

Oak Ridge National Laboratory and Georgia Institute of Technology

SLIDE 2

Highlights

  • Background and motivation
    – Current trends in architectures favor two strategies: homogeneous multicore and application accelerators
    – The correct architecture for an application can provide astounding results
  • Challenges to adopting application accelerators
    – Performance prediction
    – Productive software systems
  • Solutions from Siskiyou
    – Modeling assertions
    – Multi-paradigm procedure call

SLIDE 3

The Drama

  • Years of prosperity
    – Increasing large-scale parallelism
    – Increasing number of transistors
    – Increasing clock speed
    – Stable programming models and languages
  • Notable constraints force a new utility function for architectures
    – Signaling
    – Power
    – Heat / thermal envelope
    – Packaging
    – Memory, I/O, interconnect latency and bandwidth
    – Instruction-level parallelism
    – Market trends favor ‘good enough’ computing (The Economist)

SLIDE 4

Current Approaches to Continue Improving Performance

  • Chip multiprocessors
    – Homogeneous multicore: Intel, AMD, IBM
  • Application accelerators to augment general-purpose multicores

SLIDE 5

Results from Initial Multicores Provide Performance Boost

(Charts: DGEMM and POP results on initial multicore processors.)

SLIDE 6

Quad … Kilo-core chips are on the way!

  • 4-core chips coming
  • 8-core chips likely
  • ??
  • Rapport
    – Rapport currently offers a 256-core chip
    – Planning a 1024-core chip (Kilocore™) in 2007
    – Targeted at mobile and other consumer applications

SLIDE 7

Enter Application Accelerators

  • Optional hardware installed to accelerate applications beyond the performance of the general-purpose processor

Comparison (type, clock frequency, power usage, single/double-precision speed, typical size, cooling):

  – Intel Woodcrest Dual Core: CPU, 3.0 GHz, 80 W, ~48 GFLOPS / ~24 GFLOPS, CPU socket, heatsink + fan
  – NVIDIA Quadro FX 4500 GPU: accelerator card, 470 MHz, 110 W, 180 GFLOPS / NA, PCIe / MXM1 card, heatsink + fan
  – NVIDIA GeForce 6600 GPU: accelerator card, 350 MHz, 30 W, 20 GFLOPS / NA, PCIe / MXM1 card, HS-only or HS+fan
  – IBM Cell Processor: CPU, 3.2 GHz, 100 W, 256 GFLOPS / 25 GFLOPS, CPU socket, heatsink + fan
  – ClearSpeed Avalon: accelerator card, 250 MHz, 20 W, 50 GFLOPS / 50 GFLOPS, PCI-X card, HS-only
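One way to read this comparison is performance per watt. A quick sketch using the single-precision speeds and power draws transcribed from the table (the CPU figure uses the ~48 GFLOPS value):

```python
# Single-precision GFLOPS per watt for the devices in the comparison.
# (gflops, watts) pairs are transcribed from the slide's table.
devices = {
    "Intel Woodcrest (CPU)": (48, 80),
    "NVIDIA Quadro FX 4500": (180, 110),
    "NVIDIA GeForce 6600":   (20, 30),
    "IBM Cell":              (256, 100),
    "ClearSpeed Avalon":     (50, 20),
}
# Rank devices by efficiency, best first.
for name, (gflops, watts) in sorted(devices.items(),
                                    key=lambda kv: kv[1][0] / kv[1][1],
                                    reverse=True):
    print(f"{name}: {gflops / watts:.2f} GFLOPS/W")
```

The accelerators deliver several times the CPU's performance per watt, which previews the power and thermal arguments made later in the deck.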

SLIDE 8

For Example … Graphics Cards

SLIDE 9

For Example … STI Cell

SLIDE 10

For Example … ClearSpeed

SLIDE 11

For Example … FPGAs

SLIDE 12

AMD Torrenza Ecosystem

SLIDE 13

Architectures that Match Application Requirements can offer Impressive/Astounding Performance Benefits

  • Geo-registration on GPU
    – 700x speedup over commodity processor
  • Numerous FPGA results on integer, logic, flop applications
    – 40x on Smith-Waterman
    – 10x speedup on MD
  • HPCC RandomAccess on Cray X1E
    – 7 GUPS on 512 MSPs
    – 32 GUPS on 64,000 procs

Molecular dynamics runtimes by system:
  – Cell PPE: 0.425 s
  – MTA-2 w/ 32 procs: ~0.035 s
  – 2.2 GHz Opteron: 0.125 s
  – Cell w/ 8 SPEs: 0.013 s
  – GPU (7900GT): 0.012 s

(Chart: video imagery geo-registration, 2k x 2k output: time (seconds) vs. input image size for CPU P4 2.4 GHz, GPU GeForce 6600, and GPU QuadroFX 4500, with and without readback.)

(Chart: arbitrary kernel, 32-bit, 4-color 64x64 image: time (sec) vs. kernel size for CPU P4 (debug), CPU P4 (opt), Cell SPE, GeForce 6600, and QuadroFX 4500.)

SLIDE 14

Disruptive Technologies and the S-Curve

  • Déjà vu?
    – Floating Point Systems accelerator (1970–80s)
    – Weitek coprocessors (1980s)
  • Some differences …
    – Flops are free
    – Power and thermal envelopes are constraining designs

SLIDE 15

Significant Hurdles to Adoption for Accelerators (and Multicores?)

  • Performance prediction
    – Should my organization purchase an accelerator?
    – What will be the performance improvement on my application workload with the accelerator?
    – Is the accelerator working as we expect?
    – How can I optimize my application for the accelerator?
  • Productive software systems
    – Do I have to rewrite my application for each accelerator?
    – How stable is the performance across systems?

SLIDE 16

Performance Modeling

SLIDE 17

Modeling Assertions Introduction

  • We need new application performance modeling techniques for HPC to tackle scale and architectural diversity
    – Performance modeling is quite useful at many stages in the architecture and application development process
  • Existing approaches
    – Manual: application driven
    – Automated: target architecture driven
    – Black-box schemes: accurate, but applicability to a range of applications and systems is unknown
  • Goals
    – Aim to combine analytical and empirical schemes
    – A framework for systematic model development: performance engineering of applications
    – Modular
    – Hierarchical
    – Separate application and system variables
    – Based on ‘user’ or ‘code developer’ input (no magical solution)
    – Scalable to future application and system configurations

SLIDE 18

Symbolic Performance Models with MA

  • Advantages over traditional modeling techniques
    – Modularity, portability, and extensibility
    – Parameterized, symbolic models are evaluated with Matlab and Octave
  • Construct, validate, and project application requirements as a function of input parameters

Model development cycle:
  – Declare important application variables
  – Declare important application operations
  – Annotate code with the MA API
  – Validate modeling assertions empirically at runtime
  – Incrementally refine the model based on error rates by adding and modifying variable and operation declarations
  – Terminate when the model is representative and the error level is acceptable

Modeling Assertion (MA) = Empirical data + Symbolic modeling
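The validation step pairs each symbolic prediction with an empirical count. A minimal Python sketch of that idea follows; the function name and tolerance are illustrative assumptions, not the actual MA API:

```python
# Sketch of a modeling assertion: compare a symbolic prediction against an
# empirically measured count and report an error rate with a PASS/FAIL
# verdict, mirroring the validation-output format used by the MA toolset.
# assert_model and its tolerance are illustrative assumptions.

def assert_model(name, kind, predicted, measured, tolerance=0.01):
    error = (predicted - measured) / predicted
    verdict = "PASS" if abs(error) <= tolerance else "FAIL"
    return f"{name}: {kind}:{predicted}:{measured}:{error:.3}: {verdict}"

# A hypothetical flop assertion for a loop doing 4 flops per element over
# na/num_proc_cols elements (in the style of the 'flopzeta' annotation):
na, num_proc_cols = 14000, 4
print(assert_model("flopzeta", "ma_flop", 4 * na // num_proc_cols, 14000))
```

When the measured count drifts from the prediction, the error rate drives the refinement loop above: the developer adjusts variable and operation declarations until the assertions pass.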

SLIDE 19

MA Framework

  • MA API in C (for Fortran & C applications with MPI); classes of API calls currently implemented and tested:
    – ma(f)_subroutine_start/end
    – ma(f)_loop_start/end
    – ma(f)_flop_start/stop
    – ma(f)_heap/stack_memory
    – ma(f)_mpi_xxxx
    – ma(f)_set/unset_tracing
  • Toolchain: source code annotation; runtime system generates trace files; post-processing toolset (in Java) produces model validation, a control-flow model, and a symbolic model

Example control-flow model:

    main () {
      .....
      loop (NAME = conj_loop) (COUNT = niter) {
        loop (NAME = norm_loop) (COUNT = l2npcols) {
          mpi_irecv (NAME = nrecv) (SIZE = dp * 2);
          mpi_send (NAME = nsend) (SIZE = dp * 2);

Example symbolic model:

    send = niter*(l2npcols*(dp*2)+l2npcols*(dp)+
           cgitmax*(l2npcols*(dp*na/num_proc_cols)+dp*na/num_proc_cols+
           l2npcols*(dp)+l2npcols*(dp))+
           l2npcols*(dp*na/num_proc_cols)+dp*na/num_proc_cols+l2npcols*(dp))

SLIDE 20

Example with MA Annotation

Input parameters: na, nonzer, niter, and nprocs. Derived parameters: nz, num_proc_cols, l2npcols, and dp (size of REAL).

    call maf_def_variable_int('na',na)
    call maf_def_variable_int('nonzer',nonzer)
    .....
    call maf_def_variable_assign_int('num_proc_cols',
   >     '2^ceil(log(nprocs)/(2*log(2)))',num_proc_cols)
    .....
    ! Start of a loop, with loop count; end markers are used for validation
    call maf_loop_start('conj_loop','niter',niter)
    do it = 1, niter
      .....
      ! Marker for floating-point operation count
      call maf_flop_start('flopzeta','4*na/num_proc_cols',
   >       4*na/num_proc_cols)
      do j=1, lastcol-firstcol+1
        norm_temp1(1) = norm_temp1(1) + x(j)*z(j)
        norm_temp1(2) = norm_temp1(2) + z(j)*z(j)
      enddo
      call maf_flop_stop('flopzeta')
      .....
    call maf_loop_end('conj_loop',it-1)
    .....
    ! Markup for subroutine invocation
    call maf_subroutine_start('conj_grad')
    ......
    call ma_loop_start('cj_matvec','l2npcols',l2npcols)
    do i = l2npcols, 1, -1
      ! MA MPI API call
      call maf_mpi_irecv('l2rcv','dp*na/num_proc_cols',
   >       dp*naa/npcols,l2npcols)
      call mpi_irecv( q(reduce_recv_starts(i)),
   >       reduce_recv_lengths(i),
   >       dp_type, ......
    ......
    call maf_subroutine_end('conj_grad')

Resulting symbolic model for the volume of data sent:

    send = niter*(l2npcols*(dp*2)+l2npcols*(dp)+
           cgitmax*(l2npcols*(dp*na/num_proc_cols)+
           dp*na/num_proc_cols+l2npcols*(dp)+l2npcols*(dp))+
           l2npcols*(dp*na/num_proc_cols)+
           dp*na/num_proc_cols+l2npcols*(dp))
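Parameterized models like this are normally evaluated in Matlab or Octave, but the same projection can be done in plain Python. In the sketch below, dp=8, niter=15, and cgitmax=25 are assumed values for illustration, and the derived-parameter formula for num_proc_cols follows the definition used in the annotation:

```python
import math

# Evaluate the symbolic send-volume model in plain Python (the MA toolchain
# evaluates such models in Matlab/Octave). Parameter defaults are assumptions
# for illustration, not NAS CG class definitions.

def send_volume(na, niter, nprocs, cgitmax=25, dp=8):
    # Derived parameters, per the annotated definition:
    # num_proc_cols = 2^ceil(log2(nprocs)/2), l2npcols = log2(num_proc_cols)
    num_proc_cols = 2 ** math.ceil(math.log2(nprocs) / 2)
    l2npcols = int(math.log2(num_proc_cols))
    return niter * (l2npcols * (dp * 2) + l2npcols * dp
                    + cgitmax * (l2npcols * (dp * na // num_proc_cols)
                                 + dp * na // num_proc_cols
                                 + l2npcols * dp + l2npcols * dp)
                    + l2npcols * (dp * na // num_proc_cols)
                    + dp * na // num_proc_cols + l2npcols * dp)

# Projected per-task send volume (bytes) as the process count grows:
for nprocs in (4, 16, 64, 256):
    print(nprocs, send_volume(na=14000, niter=15, nprocs=nprocs))
```

This is exactly the kind of scaling projection the framework uses: per-task communication volume shrinks as num_proc_cols grows, without ever running the application at those scales.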

SLIDE 21

Example Model Validation

Model validation output:

    pq: ma_flop:7000:7000:0.0: PASS=50: FAIL=0
    cj_sumred: ma_loop:1:1:0.0: PASS=50: FAIL=0
    l4rcv: ma_mpi_irecv:8:8:0.0: PASS=50: FAIL=0
    l4snd: ma_mpi_send:8:8:0.0: PASS=50: FAIL=0
    sumred: ma_flop:1:1:0.0: PASS=50: FAIL=0
    floprhopq: ma_flop:21001:21001:0.0: PASS=50: FAIL=0
    cj_rho: ma_loop:1:1:0.0: PASS=50: FAIL=0
    l5rcv: ma_mpi_irecv:8:8:0.0: PASS=50: FAIL=0
    l5snd: ma_mpi_send:8:8:0.0: PASS=50: FAIL=0
    flopbeta: ma_flop:7002:7001:1.426E-4: PASS=6: FAIL=44
    flopnzx: ma_flop_start:3503:4347:-0.194: PASS=0: FAIL=2

(Chart: measured vs. predicted number of floating-point operations for NAS CG and SP across problem instances S, W, A, B, and C.)

Problem instances:
  – NAS CG: Class S: na=1400, nonzer=7; Class W: na=7000, nonzer=8; Class A: na=14000, nonzer=11; Class B: na=75000, nonzer=13; Class C: na=150000, nonzer=15
  – NAS SP: Class S: problem_size=7; Class W: problem_size=36; Class A: problem_size=64; Class B: problem_size=102; Class C: problem_size=162

SLIDE 22

Computation Distribution

  • Runtime distribution across loop blocks in NAS SP and CG
    – Generated using symbolic models
    – Vary important parameters, such as number of processors and application parameters
  • Unlike CG, there is not a single hotspot in SP

(Charts: percentage of floating-point operations by loop block vs. number of processors: blocks l1–l16 of z_solve for SP, and blocks l1–l11 for CG.)

SLIDE 23

MPI Message Distribution Analysis

  • CG
    – 65% of messages in CG are 8 bytes
    – Remaining are over 37 Kbytes
  • SP
    – 95% of messages in SP are ~28 Kbytes
    – Remaining are 50–64 Kbytes
  • Conclusion: CG requires a low-latency network

(Chart: speedup of NAS CG and SP on the ORNL Cray XT3 system, 1 to 1024 processors.)
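The low-latency conclusion for CG follows from a simple first-order cost model: transfer time is a fixed per-message latency plus size over bandwidth, so tiny messages are dominated by latency. A sketch, where the latency and bandwidth values are assumptions for illustration, not measured XT3 parameters:

```python
# Why small messages imply a latency-bound workload: model transfer time
# as latency + size/bandwidth. Both network parameters below are assumed
# values for illustration only.

LATENCY_S = 5e-6         # 5 microseconds per message (assumed)
BANDWIDTH_BPS = 2e9      # 2 GB/s link bandwidth (assumed)

def transfer_time(size_bytes):
    return LATENCY_S + size_bytes / BANDWIDTH_BPS

def latency_fraction(size_bytes):
    """Share of the transfer time spent in fixed per-message latency."""
    return LATENCY_S / transfer_time(size_bytes)

# CG's dominant 8-byte messages vs. SP's ~28 KB messages:
print(f"8 B:   {latency_fraction(8):.1%} of time in latency")
print(f"28 KB: {latency_fraction(28 * 1024):.1%} of time in latency")
```

Under these assumed parameters the 8-byte messages spend essentially all of their time in latency, while SP's larger messages are mostly bandwidth-bound, which is why the two benchmarks stress different network properties.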
SLIDE 24

Sensitivity of SP Calculations

Sensitivity of workload requirements with respect to the SP input parameter problem_size.

(Chart: FP operations, average memory, and messages sent (bytes) for problem_size from 162 to 1620.)

SLIDE 25

Modeling Assertions with Accelerators

  • The MA framework provides information on computational intensity and data movement that is critical for mapping applications to accelerators
  • MA is providing insight into DOE applications for acceleration
    – Biomolecular application: AMBER
    – Climate modeling: POP

SLIDE 26

Mapping Amber Kernel to FPGAs

jac Amber8 benchmark profile:
  – List time (% of nonbond) = 4.72 (5.19)
  – Direct Ewald time = 70.82
  – Recip Ewald time = 14.76
  – Total Ewald time (% of nonbond) = 86.23 (94.81)
  – FFT time (% of Recip) = 4.76 (32.24)

(Call graph of the sander kernel with per-node invocation counts, covering main, sander, runmd, force, ewald_force, get_nb_energy, short_ene, do_pmesh_kspace, nonbond_list, and the FFT routines across ew_fft.f, ew_recip.f, ew_direct.f, vec_lib.f, ew_force.f, ew_box.f, ew_setup.f, and pub_fft.f.)

Obtained 3x application speedup on FPGA using HLL on SRC 6C MapStation.

SLIDE 27

MPPS: Multi-Paradigm Programming System

SLIDE 28

Multi-Paradigm Computing

  • Several vendors are designing, even now building, multi-paradigm systems
    – Along with general-purpose microprocessors, a multi-paradigm system may include:
      • FPGAs
      • Highly multi-threaded processors (MTA)
      • Graphics processors
      • Physics processors
      • Digital signal processors
    – Vendors include: IBM, SGI, Cray, SRC, ClearSpeed, Linux Networx

(Diagram: nodes of commodity processors augmented with vector processors, GPUs, and FPGAs.)

Legend: P: commodity processor; V: vector processor; GPU: graphics processing unit; FPGA: field programmable gate array

SLIDE 29

Multi-Paradigm Computing Challenges

  • Multi-paradigm systems offer lots of performance potential, but…
  • …it is challenging to realize that potential
    – Different APIs, different tools, different assumptions!
    – Different ISAs, SDKs
    – Explicit data movement
    – Simplistic scheduling
    – Static binding to available resources

SLIDE 30

MPPS Basis: Multi-Paradigm Procedure Call (MPPC)

  • Multi-Paradigm Procedure Calls
    – Adopt the highly successful RPC approach
    – Open protocol for communication within the infrastructure
  • MPPC runtime system
    – Runtime agent to manage access to each device
    – Directory service for dynamic discovery of devices and their status
    – Local service OS on devices (if possible)
  • Support for defining adaptive policies for scheduling application requests onto computing devices
    – Simple policies built in
    – Custom policies can be driven by automated administration and performance tools

(Diagram: applications, libraries, and tools sit atop MPPC, which targets FPGA, MTA, GPU, Cell, …)
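The runtime pieces just listed (a directory service, per-device agents, and pluggable scheduling policies) can be sketched in miniature. Everything below is illustrative, with toy device names and relative speeds; it is not the actual MPPC implementation:

```python
# Illustrative sketch of the MPPC runtime idea: a directory service tracks
# which devices can serve which operations, and a scheduling policy picks
# a target for each call. Device names and speeds are assumed toy values.

class Directory:
    """Dynamic registry of devices and the operations they can serve."""
    def __init__(self):
        self.devices = {}                 # device name -> {op: relative speed}

    def register(self, name, ops):
        self.devices[name] = ops

    def candidates(self, op):
        return [(n, ops[op]) for n, ops in self.devices.items() if op in ops]

def fastest_device(candidates):
    """A simple built-in policy: pick the highest relative speed."""
    return max(candidates, key=lambda c: c[1])[0]

def mppc_call(directory, op, policy=fastest_device):
    # A real runtime would also marshal arguments, start the device, and
    # wait on completion; here we only resolve where the call would run.
    chosen = policy(directory.candidates(op))
    return f"{op} -> {chosen}"

d = Directory()
d.register("cpu0", {"DGEMM": 1.0, "FFT": 1.0})
d.register("gpu0", {"DGEMM": 6.0})
d.register("fpga0", {"FFT": 4.0})
print(mppc_call(d, "DGEMM"))   # routed to the fastest registered device
print(mppc_call(d, "FFT"))
```

Because the policy is a plain function, a custom one driven by administration or performance tools could be swapped in without touching the application, which is the adaptivity the slide describes.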

SLIDE 31

Compiler Support for MPPS

  • Pragmas identify regions of code to accelerate
    – Built on Open64
    – Similar to OpenMP analysis
  • Extracts code for the device service
    – Device code compiled separately with the device-specific SDK
  • Replaces original code with an MPPC call
    – Marshals data; starts, then waits on the device

SLIDE 32

Summary

  • Accelerators will continue to gain market share in one form or another
    – Expansion slots
    – On-chip accelerators that are used as necessary
  • Software systems that can mask the complexity will become much more important
    – Multi-Paradigm Programming System
    – Automated generation of MPPC calls
  • Performance modeling and analysis will become critical for procurements, validation, and optimization
    – Modeling assertions

SLIDE 33

Acknowledgements and More Info

This research was sponsored by the Office of Mathematical, Information, and Computational Sciences, Office of Science, U.S. Department of Energy under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

  • http://www.csm.ornl.gov/ft
  • vetter@computer.org

SLIDE 34

Bonus Slides

SLIDE 35

Performance Stability

SPE optimizations, runtime (sec, log scale):
  – Original: 0.681 s
  – Fast cosine: 0.248 s
  – Fast exp/sqrt: 0.064 s
  – SIMD: 0.047 s

SLIDE 36

Performance Stability (2)

  • HPC Challenge ratio of Optimized over Baseline

(Chart: ratios for HPL, RA, PTRANS, FFT, and STREAMS; labeled values include 1.0, 1.1, 2.9, and 1.5.)

SLIDE 37

MPI Symbolic Models

Error rate for MPI message sizes and count = 0%

(Chart: message size (bytes) and message count per MPI task for the NAS MPI CG and SP benchmarks, for 4 to 1600 processors.)

SLIDE 38

Sensitivity Analysis: Data Generated by Symbolic Models

  • Application input parameters:
    – na (array size)
    – nonzer (number of nonzero elements)
  • Question: which parameter influences the workload, and how?
  • MA models generated the required information efficiently
  • Observation: the nonzer parameter has a huge impact on computation requirements
  • Also identified that nonzer has no impact on MPI communication

(Charts: FP operations and LS operations vs. na, and vs. nonzer.)

SLIDE 39

MPPS Research Directions

  • Integration with Modeling Assertions
    – MA models can help MPPC make better scheduling decisions
    – MPPC behavior can be fed back to improve models that are multi-paradigm aware
  • Multi-operation scheduling
    – Instead of MPPC_FFT, MPPC_DGEMM granularity, turn over larger sequences of work to the MPPC infrastructure
    – More optimization opportunities
    – More scheduling burden on the MPPC infrastructure

SLIDE 40

MPPC API

    int main( int argc, char* argv[] )
    {
        MPI_Init( &argc, &argv );
        MPPC_Init();
        ...
        MPPC_DGEMM( a, b, s, z );
        ...
        MPPC_ZDFFT( u, v, n );
        ...
        MPPC_Finalize();
        MPI_Finalize();
        return 0;
    }

Mapping, data marshaling, and scheduling of the specific multi-paradigm device are hidden from the user.

Automated static analysis and profile-directed feedback can hide conversion of applications to MPPC and optimize series of MPPC routines.