Texas Learning and Computation Center
Alliance Performance Expedition Workshop March 14, 2002 Lennart Johnsson
Application and Platform Adaptive Scientific Software
Lennart Johnsson, Dragan Mirkovic
University of Houston
UHFFT Library: FFT Code Generator; Library of FFT Modules; Initialization Routines; Execution Routines; Utilities
Algorithms: Mixed-Radix (Cooley-Tukey), Prime Factor Algorithm, Split-Radix Algorithm, Rader's Algorithm
Code generator: Initializer (Algorithm Abstraction), Optimizer, Scheduler, Unparser
Legend: fixed library code, generated code
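To make the mixed-radix (Cooley-Tukey) entry concrete, here is a minimal Python sketch of a transform driven by a factorization of the input length. It is not UHFFT code: the function names `dft` and `mixed_radix_fft` and the `plan` argument are hypothetical, and a naive DFT stands in for the generated FFT modules.

```python
import cmath
from math import prod

def dft(x):
    """Naive O(n^2) DFT; stands in for a small generated FFT module."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def mixed_radix_fft(x, plan):
    """Cooley-Tukey FFT of x; `plan` lists factors whose product is len(x)."""
    n = len(x)
    if prod(plan) != n:
        raise ValueError("plan must factor the transform length")
    if len(plan) == 1:
        return dft(x)
    r = plan[0]       # radix of this stage
    m = n // r        # length of each sub-transform
    # r sub-transforms over the decimated inputs x[q], x[q + r], ...
    subs = [mixed_radix_fft(x[q::r], plan[1:]) for q in range(r)]
    out = [0j] * n
    for k2 in range(m):
        # multiply by twiddle factors, then combine with a size-r DFT
        t = [subs[q][k2] * cmath.exp(-2j * cmath.pi * q * k2 / n)
             for q in range(r)]
        col = dft(t)
        for k1 in range(r):
            out[k2 + m * k1] = col[k1]
    return out
```

Calling `mixed_radix_fft(x, (8, 4))` and `mixed_radix_fft(x, (2, 2, 2, 4))` on the same 32-point input gives the same result up to rounding; only the sequence of stage sizes, and therefore the memory access pattern, differs.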
Installation
Input parameters: system specifics, user options
UHFFT code generator
Library of FFT modules, performance database

Run-time
Input parameters: size, dim., …
Initialization: select best plan (factorization)
Execution: calculate one
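The run-time step "select best plan (factorization)" can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `PERF_DB` table, its made-up costs, and the additive cost model are hypothetical and are not taken from UHFFT's actual performance database.

```python
# Hypothetical performance database: cost per call (arbitrary units) of the
# FFT module of each size. The numbers are invented for illustration.
PERF_DB = {2: 1.0, 3: 1.8, 4: 2.2, 5: 3.5, 8: 5.0, 16: 11.0}

def plans(n, sizes):
    """Yield every ordered factorization of n into available module sizes."""
    if n == 1:
        yield ()
        return
    for r in sizes:
        if n % r == 0:
            for rest in plans(n // r, sizes):
                yield (r,) + rest

def estimated_cost(n, plan, db):
    """Assumed model: a stage of radix r calls its module n // r times."""
    return sum((n // r) * db[r] for r in plan)

def best_plan(n, db=PERF_DB):
    """Return the candidate factorization with the lowest estimated cost."""
    candidates = list(plans(n, sorted(db)))
    if not candidates:
        raise ValueError(f"no plan for size {n} with the available modules")
    return min(candidates, key=lambda p: estimated_cost(n, p, db))
```

This toy model assigns the same cost to every ordering of the same factors, so it cannot tell orderings apart; a database that records measurements per module and access pattern would be needed to rank them.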
Processor | Clock frequency | Peak performance | Cache structure
Alpha EV67/68 | 833 MHz | 1.66 GFlops | L1: 64K+64K, L2: 4M
AMD Athlon | 1.4 GHz | 1.4 GFlops | L1: 64K+64K, L2: 256K
MIPS R1x000 | 500 MHz | 1 GFlop | L1: 32K+32K, L2: 4M
HP PA 8x00 | 750 MHz | 3 GFlops | L1: 1.5M + 0.75M
IBM Power3/4 | 375 MHz | 1.5 GFlops | L1: 64K+32K, L2: 1-16M
Intel Itanium | 800 MHz | 3.2 GFlops | L1: 16K+16K, L2: 92K, L3: 2-4M
PowerPC G4 | 867 MHz | 867 MFlops | L1: 32K+32K, L2: 256K, L3: 1-2M
Intel Pentium IV | 1.8 GHz | 1.8 GFlops | L1: 8K+8K, L2: 256K
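One way to read the processor data above is to divide peak performance by clock frequency, which gives the number of floating-point operations per cycle that each peak figure implies. The short Python sketch below does that arithmetic; the list layout and the function name `flops_per_cycle` are illustrative choices, and the numbers are copied from the figures above.

```python
# (processor, clock in MHz, peak in MFlops), copied from the data above
PROCESSORS = [
    ("Alpha EV67/68", 833, 1660),
    ("AMD Athlon", 1400, 1400),
    ("MIPS R1x000", 500, 1000),
    ("HP PA 8x00", 750, 3000),
    ("IBM Power3/4", 375, 1500),
    ("Intel Itanium", 800, 3200),
    ("PowerPC G4", 867, 867),
    ("Intel Pentium IV", 1800, 1800),
]

def flops_per_cycle(clock_mhz, peak_mflops):
    """Floating-point operations per clock cycle implied by the peak rate."""
    return peak_mflops / clock_mhz

for name, clock, peak in PROCESSORS:
    print(f"{name:18s} {flops_per_cycle(clock, peak):.2f} flops/cycle")
```

The ratios fall into three groups (about 1, 2, and 4 operations per cycle), which is why a high clock frequency alone does not determine the peak rate.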
[Figure: MFLOPS (axis range 50 to 350) for different factorization plans. Plan labels as extracted: 16 2 8 4 4 8 2 2 2 4 2 4 2 4 2 2 2 2 2 2]
[Figure: MFLOPS (axis range 340 to 430) versus plan. The plan labels are the 24 orderings of the factors 5, 7, 8, 9, in this order: 9 5 8 7, 7 9 5 8, 5 7 8 9, 5 8 7 9, 8 9 5 7, 8 5 7 9, 8 7 9 5, 9 7 5 8, 5 9 8 7, 8 7 5 9, 9 5 7 8, 9 7 8 5, 5 8 9 7, 5 7 9 8, 7 8 9 5, 7 5 8 9, 8 5 9 7, 9 8 5 7, 7 8 5 9, 7 5 9 8, 8 9 7 5, 9 8 7 5, 7 9 8 5, 5 9 7 8]
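The plan labels in the figure above are orderings of the four factors 5, 7, 8 and 9. As a small illustration, the Python sketch below enumerates those orderings and checks that the factors are pairwise coprime, which is the condition the prime factor algorithm needs. The helper name `pairwise_coprime` is hypothetical, and reading the labels as one transform size is an inference from the extracted tick labels.

```python
from itertools import permutations
from math import gcd, prod

FACTORS = (5, 7, 8, 9)

def pairwise_coprime(factors):
    """True if no two factors share a common divisor greater than 1."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(factors)
               for b in factors[i + 1:])

# Every ordering of the factors is a candidate plan for the same size.
candidate_plans = list(permutations(FACTORS))
transform_size = prod(FACTORS)
```

Each of the `len(candidate_plans)` orderings computes the same transform of length `transform_size`; the figure suggests that their measured MFLOPS nevertheless differ, which is the reason for choosing a plan by measurement.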
800 Mflops peak PFA sizes
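For readers unfamiliar with the prime factor algorithm (PFA), the following Python sketch shows the two-factor case under the usual textbook formulation: for coprime n1 and n2, an input index map and a Chinese-remainder output map turn a length n1*n2 transform into a two-dimensional DFT with no twiddle factors. The names `dft` and `pfa_fft` are hypothetical, a naive DFT stands in for optimized modules, and nothing here is taken from the UHFFT source. The modular inverse via `pow(a, -1, m)` requires Python 3.8 or later.

```python
import cmath
from math import gcd

def dft(x):
    """Naive O(n^2) DFT; stands in for a small optimized FFT module."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def pfa_fft(x, n1, n2):
    """Two-factor prime factor FFT for coprime n1, n2 with len(x) == n1 * n2."""
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1 * n2 with coprime n1, n2")
    # Input index map: (a, b) -> (n2 * a + n1 * b) mod n
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Size-n2 DFTs along rows, then size-n1 DFTs along columns; no twiddles.
    grid = [dft(row) for row in grid]
    cols = [dft([grid[a][k2] for a in range(n1)]) for k2 in range(n2)]
    # Output index map (Chinese remainder theorem):
    # the unique k with k % n1 == k1 and k % n2 == k2.
    u = n2 * pow(n2, -1, n1)
    v = n1 * pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(u * k1 + v * k2) % n] = cols[k2][k1]
    return out
```

Compared with a mixed-radix step, the PFA trades the twiddle-factor multiplications for index permutations, which is why it applies only to sizes whose factors are pairwise coprime.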