SLIDE 1

"On the Efficacy of a Fused CPU + GPU Processor (or APU) for Parallel Computing"

Paper: Mayank Daga et al.
Presentation: Liesl Wigand

SLIDE 2

General: an APU is an Accelerated Processing Unit

  • Combines CPU and GPU onto a single die
  • Best of both worlds - with some problems
  • AMD Fusion is the focus of this paper
  • Other APUs:

    ○ IBM CELL
    ○ Intel HD Graphics: Core i3, i5 and i7 contain graphics cores
    ○ Project Denver: NVIDIA teaming with Windows 8

SLIDE 3

General: AMD Fusion: Zacate, also known as AMD E-350

  • Radeon HD 6310 GPU
  • 80 GPU cores
  • 1.6 GHz CPU
  • 500 MHz GPU
  • 2 Compute Units (also called SIMDs or SMs)
  • 192 MB Memory
SLIDE 4

Amdahl's Law

  • An estimate of parallel speed-up

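$S = \frac{1}{s + p/N}$, where $s$ is the serial fraction, $p = 1 - s$ is the parallel fraction, and $N$ is the number of processors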

  • This assumes:

    ○ Same processor capabilities
    ○ Constant workload
    ○ Little or no overhead

  • Underestimates
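
As a minimal sketch of the estimate above, assuming the standard form of Amdahl's Law, S = 1 / (s + p/N), with serial fraction s, parallel fraction p = 1 - s, and N identical processors. The function name `amdahl_speedup` and the 10% serial workload are illustrative and do not come from the paper.

```python
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Ideal speed-up under Amdahl's Law: S = 1 / (s + p/N), with p = 1 - s."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Hypothetical workload: 10% serial, 90% parallel.
    for n in (1, 2, 80, 10_000):
        print(f"N = {n:>6}: S = {amdahl_speedup(0.1, n):.3f}")
```

For this workload the speed-up approaches but never reaches 1/s = 10, however many processors are added.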
SLIDE 5

Altered Amdahl

  • $S' = \frac{1}{s + p' + o}$
  • p' is the parallel fraction, assuming core capabilities vary
  • p/N is gone: It's not the number of cores, it's how you use them
  • Added overhead value o

    ○ Data transfer
    ○ Thread set up and rejoining

  • No more handy reduction to 1/s for large N
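
The bullets above can be sketched numerically. This assumes the altered formula has the form S' = 1 / (s + p' + o), which is a reading of this slide rather than a quotation from the paper. Here s, p' and o are all taken as fractions of the original single-core run time, and every name and number below is hypothetical.

```python
def altered_speedup(serial: float, parallel_accel: float, overhead: float) -> float:
    """Assumed form S' = 1 / (s + p' + o).

    serial         -- s, time fraction that stays serial
    parallel_accel -- p', time fraction the parallel part takes on the accelerator
    overhead       -- o, time fraction spent on data transfer and thread set-up
    """
    if min(serial, parallel_accel, overhead) < 0.0:
        raise ValueError("fractions must be non-negative")
    denominator = serial + parallel_accel + overhead
    if denominator == 0.0:
        raise ValueError("at least one fraction must be positive")
    return 1.0 / denominator


if __name__ == "__main__":
    # Hypothetical numbers: the same kernel with and without transfer overhead.
    print(altered_speedup(0.10, 0.02, 0.00))  # no overhead
    print(altered_speedup(0.10, 0.02, 0.08))  # overhead four times the compute
```

Under this reading, overhead is a constant term in the denominator, so it caps the speed-up in the same way the serial fraction does.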
SLIDE 6

Tested:

  • Tested against:
    ○ AMD Radeon HD 5870 (a good GPU)
    ○ AMD Radeon HD 5450 (a GPU similar to the APU)

  • Tested using:

    ○ Bandwidth Test
    ○ Fast Fourier Transform
    ○ Molecular Dynamics
    ○ Scan
    ○ Reduction
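
Scan and Reduction are the two simplest of these kernels to state. The following is a sequential reference sketch in plain Python, not the benchmark code used in the paper. It assumes addition as the operator and an inclusive scan, and the function names are illustrative.

```python
from itertools import accumulate
from operator import add


def reduction(values):
    """Combine all elements into a single value (here: their sum)."""
    total = 0
    for v in values:
        total += v
    return total


def inclusive_scan(values):
    """Prefix sums: element i of the result is the sum of values[0..i]."""
    return list(accumulate(values, add))


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(reduction(data))       # 14
    print(inclusive_scan(data))  # [3, 4, 8, 9, 14]
```

The last element of the inclusive scan equals the reduction, which is a convenient sanity check when comparing a parallel version against this reference.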

SLIDE 7

Results

  • Data transfer benefits: Good.
  • Actual processing times: So-so.
SLIDE 8

So...

  • Clearly, not having to go over PCIe helps
  • But:

    ○ Internal memory is slow
    ○ Small data-transfer sizes are slow

Improvements needed:

  • Better memory merging
  • Improved on-die GPU
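
One common way to reason about why small data-transfer sizes are slow is a fixed-latency-plus-bandwidth model: each transfer pays a constant start-up cost before any bytes move. The sketch below assumes that model. The latency and peak-bandwidth figures are hypothetical placeholders, not measurements from the paper.

```python
def transfer_time(size_bytes: float, latency_s: float, peak_bw: float) -> float:
    """Time for one transfer: fixed start-up latency plus size / peak bandwidth."""
    return latency_s + size_bytes / peak_bw


def effective_bandwidth(size_bytes: float, latency_s: float, peak_bw: float) -> float:
    """Bytes per second actually achieved for a transfer of the given size."""
    return size_bytes / transfer_time(size_bytes, latency_s, peak_bw)


if __name__ == "__main__":
    LATENCY_S = 10e-6   # hypothetical 10 microsecond start-up cost
    PEAK_BW = 5e9       # hypothetical 5 GB/s peak bandwidth
    for size in (1024, 1024**2, 64 * 1024**2):
        bw = effective_bandwidth(size, LATENCY_S, PEAK_BW)
        print(f"{size:>10} bytes: {bw / PEAK_BW:6.1%} of peak")
```

Under these assumed numbers a 1 KiB transfer reaches only a few percent of peak bandwidth, while a 64 MiB transfer gets close to it.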
SLIDE 9
SLIDE 10

References

AMD Fusion; http://sites.amd.com/us/fusion/apu/Pages/fusion.aspx
IBM CELL; http://www.research.ibm.com/cell/
Project Denver; http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/
Intel HD Graphics; http://ark.intel.com/products/50072/Intel-Core-i5-2540M-Processor-(3M-Cache-2_60-GHz)
Error; http://browse.deviantart.com/?q=error%20message&order=9&offset=0#/d196lsa