Accelerating The Cloud with Heterogeneous Computing, Sahil Suneja et al. - PowerPoint PPT Presentation





SLIDE 1

Accelerating The Cloud with Heterogeneous Computing

Sahil Suneja, Elliott Baron, Eyal de Lara, Ryan Johnson

SLIDE 2

GPGPU Computing


• Data Parallel Tasks
  • Apply a fixed operation in parallel to each element of a data array
• Examples
  • Bioinformatics
  • Data Mining
  • Computational Finance
• NOT Systems Tasks
  • High-latency memory copying
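As a concrete (hypothetical) illustration of the data-parallel pattern above, the sketch below applies one fixed operation independently to every element of an array; on a GPU, each application would map to its own thread. The function names are illustrative, not from the talk.

```python
def apply_elementwise(op, data):
    # Stand-in for a GPU kernel launch: each application of `op` is
    # independent, so every element could be handled by its own thread.
    return [op(x) for x in data]

# One fixed operation (squaring) applied across the whole array.
squares = apply_elementwise(lambda x: x * x, [1, 2, 3, 4])
print(squares)  # [1, 4, 9, 16]
```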

SLIDE 3

Game Changer – On-Chip GPUs

• Processors combining CPU/GPU on one die
  • AMD Fusion APU, Intel Sandy/Ivy Bridge
• Share Main Memory
• Very Low Latency
• Energy Efficient


SLIDE 4

Accelerating The Cloud

• Use GPUs to accelerate Data Parallel Systems Tasks
  • Better Performance
  • Offload CPU for other tasks
  • No Cache Pollution
  • Better Energy Efficiency (Silberstein et al., SYSTOR 2011)
• Cloud Environment particularly attractive
  • Hybrid CPU/GPU will make it to the data center
  • GPU cores likely underutilized
  • Useful for Common Hypervisor Tasks

SLIDE 5

Data Parallel Cloud Operations

• Memory Scrubbing
• Batch Page Table Updates
• Memory Compression
• Virus Scanning
• Memory Hashing
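To see why an operation like memory scrubbing is data parallel, consider this minimal Python sketch (the names and the 4 KB page size are assumptions, not from the slides): every byte of a freed page is zeroed independently of every other byte, so a GPU could assign one thread per word.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (common on x86)

def scrub_page(page):
    # Zero every byte before the page is reused. No byte depends on any
    # other, so each iteration could run as a separate GPU thread.
    for i in range(len(page)):
        page[i] = 0
    return page

freed_page = bytearray(b"\xff" * PAGE_SIZE)
scrub_page(freed_page)
print(all(b == 0 for b in freed_page))  # True
```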


SLIDE 6

Hardware Management

• Complications
  • Different Privilege Levels
  • Multiple Users
• Requirements
  • Performance Isolation
  • Memory Protection

SLIDE 7

Hardware Management

• Management Policies
  • VMM Only
  • Time Multiplexing
  • Space Multiplexing

SLIDE 8

Memory Access

• All Tasks mentioned assume GPU can Directly Access Main (CPU) Memory
• Many require Write Access
• Currently, CPU <-> GPU copying required!
  • Even though both share Main Memory
• Makes some tasks infeasible on GPU, others less efficient

SLIDE 9

Case Study – Page Sharing

• “De-duplicate” Memory
• Hashing identifies sharing candidates
• Remove all but one physical copy
• Heavy on CPU
• Scanning Frequency ∝ Sharing Opportunities
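A rough sketch of the hash-then-share idea, with illustrative names (this is not the authors' implementation): hash every page independently (the data-parallel step a GPU could accelerate), group pages with identical hashes as sharing candidates, then keep one physical copy per group.

```python
import hashlib

def find_sharing_candidates(pages):
    # Hash each page independently -- the data-parallel step.
    # Maps content hash -> indices of pages with that content.
    by_hash = {}
    for idx, page in enumerate(pages):
        digest = hashlib.sha256(page).hexdigest()
        by_hash.setdefault(digest, []).append(idx)
    return by_hash

def deduplicate(pages):
    # Collapse each group of identical pages to one physical copy.
    mapping = {}
    for indices in find_sharing_candidates(pages).values():
        keeper = indices[0]
        for idx in indices:
            mapping[idx] = keeper  # idx now shares keeper's copy
    return mapping

pages = [b"\x00" * 4096, b"abcd" * 1024, b"\x00" * 4096]
print(deduplicate(pages))  # pages 0 and 2 collapse to one copy
```

In a real hypervisor the kept page would also be marked copy-on-write so a later store to a shared page triggers a private copy.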

SLIDE 10

Memory Hashing Evaluation

[Chart: Running Time (CPU vs. GPU), Fusion and Discrete platforms; y-axis: Time (s)]

SLIDE 11

Conclusion/Summary

• Hybrid CPU/GPU Processors Are Here
• Get Full Benefit in Data Centres
  • Accelerate and Offload Administrative Tasks
  • Need to Consider Effective Management and Remedy Memory Access Issues
• Memory Hashing Example Shows Promise
  • Over an Order of Magnitude Faster

SLIDE 12

Extra Slides

SLIDE 13

Memory Hashing Evaluation

[Chart: Running Time (Memory vs. Kernel), Fusion and Discrete platforms; y-axis: Time (ms)]

SLIDE 14

CPU Overhead

• Measure performance degradation of a CPU-heavy program
• Hashing via CPU = 50% Overhead
• Hashing via GPU = 25% Overhead
  • Without Memory Transfers = 11% Overhead