 
FPL 2006

Comparing FPGAs, GPUs and the PS2 using a unified source description

L. W. Howes, P. Price, O. Mencer, O. Beckmann, O. Pell
Department of Computing, Imperial College London
August 28, 2006

Outline
  Motivation: The Future?, Accelerators, Benefits
  Technology
  Related Work
  Implementation: ASC targets, Targeting the architectures, Example, Limitations
  Results
  Conclusions

Motivation: Graphics Processing Units - the future?

  [Graph not reproduced.] Thanks to Mark Harris of NVIDIA for this graph.

Motivation: Comparing Accelerators

  - Different characteristics
      - Applications
      - Accelerators
  - As a result, accelerators match some applications better than others
  - Wish to learn which accelerator is best
      - Experiment fairly
      - A single representation

Motivation: Development

  - Heterogeneous architectures
  - A variety of programming methodologies
  - Even high-level languages require low-level knowledge
  - Development becomes slow and expensive
  - Use a single source description

  [Diagram: A Stream Compiler (ASC) targeting the GPU, FPGA and PS2]

Motivation: Single Source Benefits

  - Fair comparison of performance on different architectures
      - May need architecture-specific optimisations
  - Easier development for multiple architectures
      - Could use architecture-specific optimisations
  - Allow integration of multiple accelerators into a project, sharing the performance gain

  [Diagram: A Stream Compiler (ASC) targeting the GPU, FPGA and PS2]

Target: FPGAs

  - Flexible
  - Highly parallel
  - Generally considered to be very difficult to program

Target: GPUs

  - Highly parallel
  - Widespread and used to accelerate graphics processing, largely for games
  - Relatively low cost
  - Recently being investigated for general purpose computation

  [Diagram: Host -> Vertex Processors (6 x VP) -> Rasterisation -> Fragment Processors (16 x FP, with Texture Cache) -> DRAM (4 banks)]

Target: PS2

  - Core MIPS processor
  - Programmable vector units with local memory
  - Large install base
  - The real benefit: a step towards Cell

  [Diagram: blocks attached to a 2.4 Gb/s bus]
    MIPS CPU (EE Core): FPU, 16KB I cache, 8KB D cache
    Vector Unit 0 (VU0): 4KB Data, 4KB Code
    Vector Unit 1 (VU1): 16KB Data, 16KB Code
    Graphics Synthesiser (GS)
    Scratch Pad: 32KB
    Vector Unit Interface 0 (VIF0), Vector Unit Interface 1 (VIF1), Graphics Interface (GIF)
    10 Channel DMA Controller (DMAC), Memory Interface, I/O Interface

Related Work

  McCool et al.; SIGGRAPH 2002
    Shader Metaprogramming
  Cope et al.; FPT 2005
    Have GPUs made FPGAs redundant in the field of Video Processing?
  Cornwall et al.; IPDPS 2006
    Automatically Translating a General Purpose C++ Image Processing Library for GPUs
  Trancoso et al.; DSD 2005
    Exploring Graphics Processor Performance for General Purpose Applications
  Pavan Tumati; Undergraduate Thesis, Univ. Illinois
    Sony Playstation-2 VPU: A Study on the Feasibility of Utilizing Gaming Vector Hardware for Scientific Computing

A Stream Compiler - ASC

  - Generates stream architectures for FPGAs
  - C++ object-oriented approach to development
  - Combines algorithm, architecture and arithmetic levels into a single tool

ASC Compilation

  - Map a data-flow graph directly to hardware
  - High throughput, low clock frequency

  [Diagram: data-flow graph taking word1..word4 and key[x]..key[x+5] as inputs, combining them through *, + and XOR nodes, and producing word1..word4 as outputs]

ASC for other architectures

  - ASC code represents the data flow of a program
  - The ASC data flow can be implemented for various architectures

  [Diagram: ASC Code feeds ASC, ASC GPU and ASC PS2; the resulting Accelerated Application reaches the GPU, FPGA and PS2 through a Runtime API]

Targeting the PS2

  - PS2 vector units take entire data flow
  - Input data is split into blocks
  - Data is fed to vector units to process each block in turn
  - Makes use of operations on vector registers
  - Can use both vector units to improve parallelism

  [Diagram: ASC - PS2 executable]
    Dataflow Graph -> AST -> PS2 ASM
    -> Data Management, Vector Unit Programs, Calling Program
    -> Final Combined Executable -> PS2 Emotion Engine

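The blocking scheme above can be sketched on a host CPU as follows. This is only an illustration, not ASC's PS2 back end: the names `Block`, `split_into_blocks` and `process_blocks`, the round-robin assignment of blocks to units, and the stand-in arithmetic are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one block of the input stream.
struct Block {
    std::size_t begin;  // first element of the block
    std::size_t end;    // one past the last element
    int unit;           // vector unit the block is assigned to (0 or 1 on the PS2)
};

// Split a stream of `length` elements into blocks of at most `block_size`
// elements and hand them out to the units in round-robin order (an assumption).
std::vector<Block> split_into_blocks(std::size_t length,
                                     std::size_t block_size,
                                     int num_units) {
    assert(block_size > 0 && num_units > 0);
    std::vector<Block> blocks;
    int unit = 0;
    for (std::size_t b = 0; b < length; b += block_size) {
        blocks.push_back({b, std::min(b + block_size, length), unit});
        unit = (unit + 1) % num_units;
    }
    return blocks;
}

// Process each block in turn. The loop body is a stand-in for the whole
// data flow that a vector unit program would execute on its block.
void process_blocks(std::vector<float>& data, const std::vector<Block>& blocks) {
    for (const Block& blk : blocks) {
        for (std::size_t i = blk.begin; i < blk.end; ++i) {
            data[i] = data[i] * 2.0f + 1.0f;
        }
    }
}
```

Calling `split_into_blocks(10, 4, 2)` yields three blocks, the last one shorter than the others. A real implementation would pick the block size to fit the vector unit data memory and would overlap DMA transfers with computation.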
Targeting the GPU

  - Split data flow at various points and divide into computation kernels separated by intermediate arrays
      - Split at points of data reuse
      - Split where kernel complexity would be high
  - Uses the OpenGL Shading Language (GLSL) to program the GPU

  [Diagram: ASC - GPU executable]
    Dataflow Graph -> AST -> GLSL Code
    -> OpenGL Libraries, C++ and GLSL, Calling Program
    -> Final Executable -> GPU Hardware

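As a rough CPU-side illustration of kernels separated by intermediate arrays, the stream from the example slide can be evaluated in three passes, each producing one full array before the next pass starts. This is a sketch under stated assumptions, not the GLSL that ASC generates: `fetch`, `run_kernel` and `run_three_pass` are hypothetical names, and reads before the start of a stream are assumed to return zero.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Array = std::vector<float>;

// Read element i - delay of an array. Assumption: reads before the start
// of the array return 0.0f, much like a clamped texture lookup.
float fetch(const Array& a, std::size_t i, std::size_t delay) {
    return i >= delay ? a[i - delay] : 0.0f;
}

// One pass: evaluate the kernel independently for every output element,
// the way a fragment program is run once per output pixel.
Array run_kernel(std::size_t len, const std::function<float(std::size_t)>& kernel) {
    Array out(len);
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = kernel(i);
    }
    return out;
}

// The data flow is split wherever a value is reused with a delay, so each
// reused stream becomes an intermediate array written by an earlier pass.
Array run_three_pass(const Array& input) {
    const std::size_t n = input.size();
    Array temporary = run_kernel(n, [&](std::size_t i) {
        return fetch(input, i, 0) + fetch(input, i, 2);
    });
    Array intermediate = run_kernel(n, [&](std::size_t i) {
        return fetch(temporary, i, 0) + fetch(temporary, i, 2);
    });
    return run_kernel(n, [&](std::size_t i) {
        return fetch(input, i, 0) + fetch(intermediate, i, 3) + fetch(temporary, i, 4);
    });
}
```

Because each kernel only reads arrays completed by earlier passes, every element of a pass can be computed in parallel, which is the property the fragment processors rely on.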
Example: ASC code targeting the GPU

    STREAM_START;

    HWfloat input(IN);
    HWfloat temporary(TMP);
    HWfloat intermediate(TMP);
    HWfloat output(OUT);

    STREAM_LOOP(40);

    temporary = input + prev(input,2);
    intermediate = temporary + prev(temporary,2);
    output = input + prev(intermediate,3)
                   + prev(temporary,4);

    STREAM_END_GLSL;

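To make the meaning of `prev` concrete, here is a plain C++ reference loop for the same arithmetic. It is a sketch, not ASC output: `prev_at` and `reference_stream` are hypothetical names, and it assumes that `prev(x, n)` is the value of `x` from `n` iterations earlier and that reads before the first iteration give zero, which the slide does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: value of stream `s` from n iterations before index i.
// Assumption: reads before the start of the stream yield 0.0f.
static float prev_at(const std::vector<float>& s, std::size_t i, std::size_t n) {
    return i >= n ? s[i - n] : 0.0f;
}

// Single-loop reference for the example stream: one iteration per input element.
std::vector<float> reference_stream(const std::vector<float>& input) {
    const std::size_t len = input.size();
    std::vector<float> temporary(len), intermediate(len), output(len);
    for (std::size_t i = 0; i < len; ++i) {
        temporary[i] = input[i] + prev_at(input, i, 2);
        intermediate[i] = temporary[i] + prev_at(temporary, i, 2);
        output[i] = input[i] + prev_at(intermediate, i, 3)
                             + prev_at(temporary, i, 4);
    }
    return output;
}
```

A loop like this is useful as a golden model when checking the results produced on an accelerator, provided the boundary assumption matches the target's behaviour.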