Memory Management Strategies in CPU/GPU Database Systems: A Survey
SLIDE 1

Memory Management Strategies in CPU/GPU Database Systems: A Survey

Iya Arefyeva, David Broneske, Gabriel Campero Durand, Marcus Pinnecke, Gunter Saake Presenter: Marten Wallewein-Eising

Workgroup Databases and Software Engineering


SLIDE 2

Motivation

[Image: Summit supercomputer, Oak Ridge National Laboratory]

GPGPU is an essential technology nowadays:

  • 3 of the top 5 systems on the TOP500 list (June 2018) are powered by GPUs
  • 56% of the flops on the list come from GPU acceleration

Arefyeva et al., Memory Management Strategies in CPU/GPU Database Systems: A Survey

SLIDE 3

Motivation

A modern server can have hundreds of gigabytes of RAM, or even more (e.g. up to 64 TB on the IBM Power System E980). Top workstation GPUs, on the other hand:

  • Nvidia Tesla V100
  • AMD Radeon Vega Frontier Edition

(both released in June 2017) have only 16 GB of memory.

[Pictures from ibm.com and nvidia.com]

SLIDE 4

Motivation

  • GPU memory size is not enough to store all the data
  • No shared memory between a CPU and a GPU
  • Data has to be transferred over the PCI-E bus
  • Bandwidth is increasing over the years, but the limitations have to be considered until they are eliminated

[Chart: memory bandwidth of Nvidia Tesla V100]

SLIDE 5

Motivation

An ideal GPU memory management model should be able to:

  1. allow for GPU memory oversubscription
  2. minimize the idle time of the GPU by overlapping transfers and computations
  3. avoid unnecessary transfers
  4. keep the data coherent

SLIDE 6

Strategies [1]

  • Programmer-managed GPU memory:
    ○ divide-and-conquer approach
    ○ Unified Memory
  • Pinned host memory:
    ○ mapped memory
    ○ Unified Virtual Addressing

SLIDE 7

Divide-and-conquer

The data is split into chunks small enough for the GPU memory; the output is merged on the CPU.

  • Serial processing: chunks are transferred and processed one after another
  • Asynchronous processing: transfers and computations are overlapped; highest speedup when transfer time is ≤ processing time

Used by He et al. [2] and Wang et al. [3]
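The asynchronous variant can be sketched with CUDA streams. The chunk size, the double-buffering scheme, and the `process` kernel below are illustrative assumptions, not code from the surveyed systems:

```cuda
// Sketch: divide-and-conquer with transfers overlapping kernel execution.
#include <cuda_runtime.h>

#define CHUNK (1 << 20)   // elements per chunk (illustrative)
#define NSTREAMS 2        // double buffering

__global__ void process(const int *in, int *out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2;   // placeholder per-element work
}

void run(const int *h_in, int *h_out, size_t total) {
    // h_in/h_out must be pinned (cudaHostAlloc) for真 async overlap
    cudaStream_t streams[NSTREAMS];
    int *d_in[NSTREAMS], *d_out[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc((void**)&d_in[s], CHUNK * sizeof(int));
        cudaMalloc((void**)&d_out[s], CHUNK * sizeof(int));
    }
    for (size_t off = 0, c = 0; off < total; off += CHUNK, ++c) {
        int s = (int)(c % NSTREAMS);
        size_t n = (total - off < CHUNK) ? total - off : CHUNK;
        // transfer/kernel/copy-back of chunk c overlap with chunk c-1
        cudaMemcpyAsync(d_in[s], h_in + off, n * sizeof(int),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(unsigned)((n + 255) / 256), 256, 0, streams[s]>>>(
            d_in[s], d_out[s], n);
        cudaMemcpyAsync(h_out + off, d_out[s], n * sizeof(int),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // CPU-side merge of h_out would follow here
    for (int s = 0; s < NSTREAMS; ++s) {
        cudaFree(d_in[s]); cudaFree(d_out[s]); cudaStreamDestroy(streams[s]);
    }
}
```

Note that the host buffers must themselves be pinned for the copies to run asynchronously; with pageable memory the runtime falls back to synchronous transfers and the overlap is lost.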

SLIDE 8

Mapped memory

Direct access to data in host memory with implicit data transfers:

  1. A block of pinned (page-locked) host memory is allocated
  2. Data is copied to this memory block
  3. Accessed data is transferred to the GPU memory during processing

Used in Bakkum et al. [4] and Yuan et al. [5]
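The three steps above map directly onto the CUDA runtime API; the `increment` kernel and the sizes here are illustrative, not from the cited systems:

```cuda
// Sketch of mapped (zero-copy) memory: the GPU reads host memory on demand.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void increment(int *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // each access is served over PCIe
}

int main() {
    size_t n = 1 << 20;
    int *h_data, *d_data;
    cudaSetDeviceFlags(cudaDeviceMapHost);            // enable mapped memory
    // 1. allocate a pinned, mapped block of host memory
    cudaHostAlloc((void**)&h_data, n * sizeof(int), cudaHostAllocMapped);
    // 2. produce the data in this block on the CPU
    for (size_t i = 0; i < n; ++i) h_data[i] = (int)i;
    // 3. accessed data is transferred implicitly during kernel execution
    cudaHostGetDevicePointer((void**)&d_data, h_data, 0);
    increment<<<(unsigned)((n + 255) / 256), 256>>>(d_data, n);
    cudaDeviceSynchronize();
    printf("%d\n", h_data[0]);                        // no explicit copy back
    cudaFreeHost(h_data);
    return 0;
}
```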

SLIDE 9

Unified Virtual Addressing*

  • Makes usage of mapped memory more convenient
  • GPU and CPU share address space: identical pointers for pinned host memory
  • Data can be directly transferred between two GPUs

* Supported starting from CUDA 4.0; requires a Fermi-class GPU with compute capability ≥ 2.0. Used in Appuswamy et al. [6]
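A minimal sketch of the convenience UVA adds: the pinned host pointer itself is valid on the device, and `cudaMemcpyDefault` lets the runtime infer copy directions from the addresses. The `touch` kernel is an illustrative placeholder:

```cuda
// Sketch: with UVA the same pointer is used on CPU and GPU.
#include <cuda_runtime.h>

__global__ void touch(int *p, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] *= 2;    // p is the very pointer the CPU uses
}

int main() {
    size_t n = 1 << 20;
    int *h;
    cudaHostAlloc((void**)&h, n * sizeof(int), cudaHostAllocMapped);
    for (size_t i = 0; i < n; ++i) h[i] = (int)i;
    // no cudaHostGetDevicePointer needed: identical pointers under UVA
    touch<<<(unsigned)((n + 255) / 256), 256>>>(h, n);
    cudaDeviceSynchronize();
    // cudaMemcpyDefault: direction inferred from the unified address space
    int *d;
    cudaMalloc((void**)&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyDefault);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```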

SLIDE 10

Unified Memory*

Automatically manages device memory allocations and data transfers:

  • single pointer for both GPU and CPU memories
  • migrates the data (accessed pages) between GPU and CPU
  • makes programming easier, but does not always lead to performance improvements [7] [8]

* Supported starting from CUDA 6.0; requires compute capability ≥ 3.0.
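The single-pointer model can be sketched with `cudaMallocManaged`; the `add_one` kernel and sizes are illustrative assumptions:

```cuda
// Sketch of Unified Memory: one managed pointer, pages migrate on access.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void add_one(int *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // accessed pages migrate to the GPU
}

int main() {
    size_t n = 1 << 20;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int));   // single pointer, both sides
    for (size_t i = 0; i < n; ++i) data[i] = 0;  // pages resident on the CPU
    add_one<<<(unsigned)((n + 255) / 256), 256>>>(data, n);
    cudaDeviceSynchronize();                     // required before CPU reads
    printf("%d\n", data[0]);                     // pages migrate back on touch
    cudaFree(data);
    return 0;
}
```

The convenience is visible in the absence of any explicit `cudaMemcpy`; whether this wins on performance depends on the access pattern, as [7] and [8] observe.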

SLIDE 11

Properties

[Diagram: UM, UVA, mapped memory, and divide-and-conquer placed along two axes]

  • Data location: main memory vs. GPU memory
  • Memory oversubscription: yes vs. no

(UVA = Unified Virtual Addressing, UM = Unified Memory)

SLIDE 12

Comparison

[Diagram: UM, UVA, mapped memory, and divide-and-conquer ranked by performance]

Performance criteria:

  • overlapped transfers and executions
  • fast memory accesses
  • no unnecessary data transfers

SLIDE 13

Comparison

[Diagram: UM, UVA, mapped memory, and divide-and-conquer ranked by convenience]

Convenience criteria:

  • no explicit allocations and transfers
  • no coherence problems
  • unified address space

SLIDE 14

When to use?

Divide-and-conquer:
  + GPU operations are read-only
  + long processing time
  − GPU changes data (requires synchronization)
  − accessing a small portion of the data (unnecessary transfers)

Unified Memory:
  + repeated accesses by one device
  + data changed by both devices
  − the same data often accessed by both devices

Mapped memory:
  + data is big
  + not all elements are accessed
  − repeated accesses by GPU

Unified Virtual Addressing:
  + data is big
  + not all elements are accessed (for data in the main memory)
  − repeated accesses by GPU (for data in the main memory)

SLIDE 15

Thank you!

Questions?

You are also welcome to send questions/suggestions to iya.arefyeva@ovgu.de

SLIDE 16

References

1. Kim, Y., Lee, J., Jo, J.E., Kim, J.: GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management. In: HPCA, IEEE (2014) 546-557
2. He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query coprocessing on graphics processors. TODS 34(4) (2009) 21
3. Wang, K., Zhang, K., Yuan, Y., Ma, S., Lee, R., Ding, X., Zhang, X.: Concurrent analytical query processing with GPUs. PVLDB 7(11) (2014) 1011-1022
4. Bakkum, P., Chakradhar, S.: Efficient data management for GPU databases. Technical report, High Performance Computing on Graphics Processing Units (2012)
5. Yuan, Y., Lee, R., Zhang, X.: The Yin and Yang of processing data warehousing queries on GPU devices. PVLDB 6(10) (2013) 817-828
6. Appuswamy, R., Karpathiotakis, M., Porobic, D., Ailamaki, A.: The case for heterogeneous HTAP. In: CIDR (2017)
7. Negrut, D., Serban, R., Li, A., Seidl, A.: Unified Memory in CUDA 6: A brief overview and related data access/transfer issues. Technical Report TR-2014-09, University of Wisconsin-Madison (2014)
8. Landaverde, R., Zhang, T., Coskun, A.K., Herbordt, M.: An investigation of unified memory access performance in CUDA. In: HPEC, IEEE (2014) 1-6