Memory Management Strategies in CPU/GPU Database Systems: A Survey
SLIDE 1

Memory Management Strategies in CPU/GPU Database Systems: A Survey

Iya Arefyeva, David Broneske, Gabriel Campero Durand, Marcus Pinnecke, Gunter Saake Presenter: Marten Wallewein-Eising

Workgroup Databases and Software Engineering


SLIDE 2

Motivation

[Image: Summit supercomputer, Oak Ridge National Laboratory]

GPGPU is an essential technology nowadays:

  • 3 of the top 5 systems on the TOP500 list (June 2018) are powered by GPUs
  • 56% of the flops on the list come from GPU acceleration

Arefyeva et al., Memory Management Strategies in CPU/GPU Database Systems: A Survey

SLIDE 3

Motivation

A modern server can have hundreds of gigabytes of RAM, or even more (e.g. up to 64 TB on the IBM Power System E980). Top workstation GPUs, on the other hand:

  • Nvidia Tesla V100
  • AMD Radeon Vega Frontier Edition

(both released in June 2017) have only 16 GB of memory.

[Pictures from ibm.com and nvidia.com]

SLIDE 4

Motivation

  • GPU memory size is not enough to store all the data
  • No shared memory between a CPU and a GPU
  • Data has to be transferred over the PCI-E bus
  • Bandwidth is increasing over the years, but the limitations have to be considered until they are eliminated

[Chart: memory bandwidth of Nvidia Tesla V100]

SLIDE 5

Motivation

An ideal GPU memory management model should be able to:

  1. allow for GPU memory oversubscription
  2. minimize the idle time of the GPU by overlapping transfers and computations
  3. avoid unnecessary transfers
  4. keep the data coherent

SLIDE 6

Strategies [1]

  • Programmer-managed GPU memory:
    ○ divide-and-conquer approach
    ○ Unified Memory
  • Pinned host memory:
    ○ mapped memory
    ○ Unified Virtual Addressing

SLIDE 7

Divide-and-conquer

The data is split into chunks small enough for the GPU memory; the output is merged on the CPU.

  • Serial processing: chunks are transferred and processed one after another
  • Asynchronous processing: transfers and computations are overlapped; highest speedup when transfer time is ≤ processing time

Used by He et al. [2] and Wang et al. [3]
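The asynchronous variant can be sketched with CUDA streams. The chunk size, the double-buffering scheme, and the `process` kernel below are illustrative assumptions, not code from the surveyed systems:

```cuda
// Sketch: divide-and-conquer with transfers overlapping kernel execution.
#include <cuda_runtime.h>

#define CHUNK (1 << 20)   // elements per chunk (illustrative)
#define NSTREAMS 2        // double buffering

__global__ void process(const int *in, int *out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2;   // placeholder per-element work
}

void run(const int *h_in, int *h_out, size_t total) {
    // h_in/h_out must be pinned (cudaHostAlloc) for真 async overlap
    cudaStream_t streams[NSTREAMS];
    int *d_in[NSTREAMS], *d_out[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc((void**)&d_in[s], CHUNK * sizeof(int));
        cudaMalloc((void**)&d_out[s], CHUNK * sizeof(int));
    }
    for (size_t off = 0, c = 0; off < total; off += CHUNK, ++c) {
        int s = (int)(c % NSTREAMS);
        size_t n = (total - off < CHUNK) ? total - off : CHUNK;
        // transfer/kernel/copy-back of chunk c overlap with chunk c-1
        cudaMemcpyAsync(d_in[s], h_in + off, n * sizeof(int),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(unsigned)((n + 255) / 256), 256, 0, streams[s]>>>(
            d_in[s], d_out[s], n);
        cudaMemcpyAsync(h_out + off, d_out[s], n * sizeof(int),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // CPU-side merge of h_out would follow here
    for (int s = 0; s < NSTREAMS; ++s) {
        cudaFree(d_in[s]); cudaFree(d_out[s]); cudaStreamDestroy(streams[s]);
    }
}
```

Note that the host buffers must themselves be pinned for the copies to run asynchronously; with pageable memory the runtime falls back to synchronous transfers and the overlap is lost.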

SLIDE 8

Mapped memory

Direct access to data in host memory with implicit data transfers:

  1. A block of pinned (page-locked) host memory is allocated
  2. Data is copied to this memory block
  3. Accessed data is transferred to the GPU memory during processing

Used in Bakkum et al. [4] and Yuan et al. [5]
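The three steps above map directly onto the CUDA runtime API; the `increment` kernel and the sizes here are illustrative, not from the cited systems:

```cuda
// Sketch of mapped (zero-copy) memory: the GPU reads host memory on demand.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void increment(int *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // each access is served over PCIe
}

int main() {
    size_t n = 1 << 20;
    int *h_data, *d_data;
    cudaSetDeviceFlags(cudaDeviceMapHost);            // enable mapped memory
    // 1. allocate a pinned, mapped block of host memory
    cudaHostAlloc((void**)&h_data, n * sizeof(int), cudaHostAllocMapped);
    // 2. produce the data in this block on the CPU
    for (size_t i = 0; i < n; ++i) h_data[i] = (int)i;
    // 3. accessed data is transferred implicitly during kernel execution
    cudaHostGetDevicePointer((void**)&d_data, h_data, 0);
    increment<<<(unsigned)((n + 255) / 256), 256>>>(d_data, n);
    cudaDeviceSynchronize();
    printf("%d\n", h_data[0]);                        // no explicit copy back
    cudaFreeHost(h_data);
    return 0;
}
```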

SLIDE 9

Unified Virtual Addressing*

  • Makes usage of mapped memory more convenient
  • GPU and CPU share address space: identical pointers for pinned host memory
  • Data can be directly transferred between two GPUs

* Supported starting from CUDA 4.0; requires a Fermi-class GPU with compute capability ≥ 2.0. Used in Appuswamy et al. [6]
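A minimal sketch of the convenience UVA adds: the pinned host pointer itself is valid on the device, and `cudaMemcpyDefault` lets the runtime infer copy directions from the addresses. The `touch` kernel is an illustrative placeholder:

```cuda
// Sketch: with UVA the same pointer is used on CPU and GPU.
#include <cuda_runtime.h>

__global__ void touch(int *p, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] *= 2;    // p is the very pointer the CPU uses
}

int main() {
    size_t n = 1 << 20;
    int *h;
    cudaHostAlloc((void**)&h, n * sizeof(int), cudaHostAllocMapped);
    for (size_t i = 0; i < n; ++i) h[i] = (int)i;
    // no cudaHostGetDevicePointer needed: identical pointers under UVA
    touch<<<(unsigned)((n + 255) / 256), 256>>>(h, n);
    cudaDeviceSynchronize();
    // cudaMemcpyDefault: direction inferred from the unified address space
    int *d;
    cudaMalloc((void**)&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyDefault);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```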

SLIDE 10

Unified Memory*

Automatically manages device memory allocations and data transfers:

  • single pointer for both GPU and CPU memories
  • migrates the data (accessed pages) between GPU and CPU
  • makes programming easier, but does not always lead to performance improvements [7] [8]

* Supported starting from CUDA 6.0; requires compute capability ≥ 3.0.
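The single-pointer model can be sketched with `cudaMallocManaged`; the `add_one` kernel and sizes are illustrative assumptions:

```cuda
// Sketch of Unified Memory: one managed pointer, pages migrate on access.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void add_one(int *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // accessed pages migrate to the GPU
}

int main() {
    size_t n = 1 << 20;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int));   // single pointer, both sides
    for (size_t i = 0; i < n; ++i) data[i] = 0;  // pages resident on the CPU
    add_one<<<(unsigned)((n + 255) / 256), 256>>>(data, n);
    cudaDeviceSynchronize();                     // required before CPU reads
    printf("%d\n", data[0]);                     // pages migrate back on touch
    cudaFree(data);
    return 0;
}
```

The convenience is visible in the absence of any explicit `cudaMemcpy`; whether this wins on performance depends on the access pattern, as [7] and [8] observe.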

SLIDE 11

Properties

[Diagram: UM, UVA, mapped memory, and divide-and-conquer placed along two axes]

  • Data location: main memory vs. GPU memory
  • Memory oversubscription: yes vs. no

(UVA = Unified Virtual Addressing, UM = Unified Memory)

SLIDE 12

Comparison

[Diagram: UM, UVA, mapped memory, and divide-and-conquer ranked by performance]

Performance criteria:

  • overlapped transfers and executions
  • fast memory accesses
  • no unnecessary data transfers

SLIDE 13

Comparison

[Diagram: UM, UVA, mapped memory, and divide-and-conquer ranked by convenience]

Convenience criteria:

  • no explicit allocations and transfers
  • no coherence problems
  • unified address space

SLIDE 14

When to use?

Divide-and-conquer:
  + GPU operations are read-only
  + long processing time
  − GPU changes data (requires synchronization)
  − accessing a small portion of the data (unnecessary transfers)

Unified Memory:
  + repeated accesses by one device
  + data changed by both devices
  − the same data often accessed by both devices

Mapped memory:
  + data is big
  + not all elements are accessed
  − repeated accesses by GPU

Unified Virtual Addressing:
  + data is big
  + not all elements are accessed (for data in the main memory)
  − repeated accesses by GPU (for data in the main memory)

SLIDE 15

Thank you!

Questions?

You are also welcome to send questions/suggestions to iya.arefyeva@ovgu.de

SLIDE 16

References

1. Kim, Y., Lee, J., Jo, J.E., Kim, J.: GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management. In: HPCA, IEEE (2014) 546-557
2. He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query coprocessing on graphics processors. TODS 34(4) (2009) 21
3. Wang, K., Zhang, K., Yuan, Y., Ma, S., Lee, R., Ding, X., Zhang, X.: Concurrent analytical query processing with GPUs. PVLDB 7(11) (2014) 1011-1022
4. Bakkum, P., Chakradhar, S.: Efficient data management for GPU databases. Technical report, High Performance Computing on Graphics Processing Units (2012)
5. Yuan, Y., Lee, R., Zhang, X.: The Yin and Yang of processing data warehousing queries on GPU devices. PVLDB 6(10) (2013) 817-828
6. Appuswamy, R., Karpathiotakis, M., Porobic, D., Ailamaki, A.: The case for heterogeneous HTAP. In: CIDR (2017)
7. Negrut, D., Serban, R., Li, A., Seidl, A.: Unified Memory in CUDA 6: A brief overview and related data access/transfer issues. Technical Report TR-2014-09, University of Wisconsin-Madison (2014)
8. Landaverde, R., Zhang, T., Coskun, A.K., Herbordt, M.: An investigation of unified memory access performance in CUDA. In: HPEC, IEEE (2014) 1-6