Dynamic Data Structures for the GPU
John Owens
Child Family Professor of Engineering & Entrepreneurship
Department of Electrical & Computer Engineering, UC Davis
Joint work with Martin Farach-Colton
CUDA Programming Model (SPMD + SIMD)
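Host: copy data to the GPU, launch kernels; copy results back
A kernel runs as a grid of thread blocks
Each block runs on one “core” (SM)
A block is a group of threads that can cooperate with each other by:
  sharing data through shared memory
Threads in different blocks cannot cooperate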
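To make the indexing concrete, here is a hypothetical CPU emulation of a one-dimensional launch in Python (the names `launch`, `vector_add_kernel`, and `vector_add` are illustrative, not a CUDA API). Every (block, thread) pair runs the same kernel body, SPMD style, and computes its global index the way CUDA code does with `blockIdx.x * blockDim.x + threadIdx.x`. On a GPU the blocks would run concurrently on SMs rather than in a loop.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Run `kernel` once per (block, thread) pair, sequentially.

    On a real GPU, blocks run concurrently on different SMs and the
    threads of a block run in parallel; this loop only emulates the indexing.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    # Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    if i < n:  # threads of the last block may fall past the end of the data
        out[i] = a[i] + b[i]


def vector_add(a, b, block_dim=4):
    n = len(a)
    out = [0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    launch(vector_add_kernel, grid_dim, block_dim, a, b, out, n)
    return out
```

For example, `vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])` launches two blocks of four threads; three threads of the second block fall past the end and do nothing.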
[Figure: the host launches Kernel 1 and Kernel 2 on the device; each kernel runs as a grid of blocks (Grid 1: Block (0, 0) through Block (2, 1)); a single block, Block (1, 1), is expanded into its 2D array of threads, Thread (0, 0) through Thread (4, 2).]
Level | Computation | Memory
Global | Kernels | DRAM (12 GB)
Per-block | Blocks (MIMD within a kernel) (~15) | L2 cache (1.57 MB)
Per-warp | Warps (MIMD within a block) | Shared/L1 cache (48 kB/SM × 15 SMs = 720 kB)
Per-thread | Threads (32-wide SIMD within a warp) (≥30k) | Registers (64k/SM × 4 B/register = 262 kB/SM; × 15 SMs = 3.93 MB)
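The capacity figures listed above can be rechecked with a few lines of Python. Two assumptions here are not on the slide: each SM holds 2048 resident threads (15 × 2048 = 30,720, consistent with "≥30k"), and "1.57 MB" of L2 means 1.5 MiB. The constant names are illustrative.

```python
# Assumptions (not from the slide): 2048 resident threads per SM, and
# "1.57 MB" of L2 meaning 1.5 MiB = 1,572,864 bytes.
NUM_SMS = 15
THREADS_PER_SM = 2048                      # assumed occupancy limit
TOTAL_THREADS = NUM_SMS * THREADS_PER_SM   # 30,720 threads

REG_BYTES_PER_SM = 64 * 1024 * 4           # 64k registers x 4 B = 262,144 B
SHARED_KB_PER_SM = 48                      # shared/L1 per SM, in kB
L2_BYTES = 3 * 512 * 1024                  # 1.5 MiB


def capacities():
    """Return device-wide totals and per-thread shares."""
    return {
        "reg_bytes_total": REG_BYTES_PER_SM * NUM_SMS,              # 3,932,160 B
        "shared_kb_total": SHARED_KB_PER_SM * NUM_SMS,              # 720 kB
        "reg_bytes_per_thread": REG_BYTES_PER_SM / THREADS_PER_SM,  # 128 B
        "l2_bytes_per_thread": L2_BYTES / TOTAL_THREADS,            # 51.2 B
    }
```

Under these assumptions, each thread's share of its SM's register file (128 B) is larger than its share of the entire L2 (about 51 B).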
Use the memory hierarchy effectively, e.g., thread coarsening
Coalesce memory accesses (threads in a warp should access neighboring locations in memory)
Cache per thread is tiny: 1.57 MB L2 / 30k threads ≈ 51 B/thread
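A simplified model of coalescing, as a sketch: assume (hypothetically) that a warp's loads are served in aligned 128-byte segments and that cost is the number of distinct segments touched. Real hardware has more detailed rules (cache lines, sectors), so this only illustrates why neighboring addresses matter; it is not a performance model. `segments_touched` is an illustrative name.

```python
WARP_SIZE = 32
SEGMENT_BYTES = 128  # assumed transaction granularity


def segments_touched(base_addr, stride_elems, elem_bytes=4, warp_size=WARP_SIZE):
    """Count distinct aligned segments read when lane t of a warp loads
    address base_addr + t * stride_elems * elem_bytes.

    Assumes each element lies entirely within one segment (true for
    4-byte elements at 4-byte-aligned addresses).
    """
    segments = set()
    for lane in range(warp_size):
        addr = base_addr + lane * stride_elems * elem_bytes
        segments.add(addr // SEGMENT_BYTES)
    return len(segments)
```

With 4-byte elements, unit stride from an aligned base touches 1 segment, a misaligned base touches 2, and a stride of 32 elements touches 32, one per lane.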
[Figure: multisplit in four stages: initial key distribution, warp-level reordering, block-level reordering, multisplit result; keys plotted by bucket.]
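Multisplit, the operation pictured above, reorders keys so that all keys in bucket 0 precede those in bucket 1, and so on. A minimal sequential sketch of the result, assuming a stable split and a caller-supplied `bucket_of` function (a hypothetical name), uses the classic histogram, exclusive-scan, scatter formulation. It produces the final ordering only; the warp- and block-level reordering stages shown in the figure are not modeled.

```python
def multisplit(keys, num_buckets, bucket_of):
    """Stable multisplit: keys of bucket 0 first, then bucket 1, and so on,
    preserving input order within each bucket."""
    # 1. Histogram: how many keys fall into each bucket.
    counts = [0] * num_buckets
    for k in keys:
        counts[bucket_of(k)] += 1
    # 2. Exclusive prefix sum: starting offset of each bucket in the output.
    offsets = [0] * num_buckets
    running = 0
    for b in range(num_buckets):
        offsets[b] = running
        running += counts[b]
    # 3. Scatter: each key goes to the next free slot of its bucket.
    out = [None] * len(keys)
    for k in keys:
        b = bucket_of(k)
        out[offsets[b]] = k
        offsets[b] += 1
    return out
```

For example, splitting `[5, 2, 9, 4, 7, 0, 3]` into 3 buckets by `k % 3` yields `[9, 0, 3, 4, 7, 5, 2]`. On a GPU, each of the three steps would be parallelized across threads.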
Tero Karras. Maximizing parallelism in the construction of BVHs, octrees, and k-d trees. In High-Performance Graphics, HPG ’12, pages 33–37, June 2012.
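Karras's construction assigns one thread per internal node of a binary radix tree over sorted keys (such as Morton codes), and each thread finds its node's key range and split point independently of all others. The sketch below follows that per-node procedure in plain Python, run sequentially, assuming sorted, unique 32-bit keys; the function names are illustrative.

```python
def _clz32(x):
    """Leading zeros of a nonzero 32-bit unsigned integer."""
    return 32 - x.bit_length()


def _delta(keys, i, j):
    """Common-prefix length of keys[i] and keys[j]; -1 if j is out of range."""
    if j < 0 or j >= len(keys):
        return -1
    return _clz32(keys[i] ^ keys[j])


def build_internal_node(keys, i):
    """Children of internal node i as ('leaf', idx) or ('node', idx).
    Depends on no other node, so all i in range(len(keys) - 1) could run in parallel."""
    # Direction of the node's key range: +1 (extends right) or -1 (left).
    d = 1 if _delta(keys, i, i + 1) - _delta(keys, i, i - 1) > 0 else -1
    delta_min = _delta(keys, i, i - d)
    # Exponential search for an upper bound on the range length.
    l_max = 2
    while _delta(keys, i, i + l_max * d) > delta_min:
        l_max *= 2
    # Binary search for the far end j of the range.
    l, t = 0, l_max // 2
    while t >= 1:
        if _delta(keys, i, i + (l + t) * d) > delta_min:
            l += t
        t //= 2
    j = i + l * d
    # Binary search for the split position gamma within the range.
    delta_node = _delta(keys, i, j)
    s, div = 0, 2
    while True:
        t = (l + div - 1) // div  # ceil(l / div)
        if _delta(keys, i, i + (s + t) * d) > delta_node:
            s += t
        if t == 1:
            break
        div *= 2
    gamma = i + s * d + min(d, 0)
    left = ("leaf", gamma) if min(i, j) == gamma else ("node", gamma)
    right = ("leaf", gamma + 1) if max(i, j) == gamma + 1 else ("node", gamma + 1)
    return left, right
```

For the keys `[1, 2, 4, 5, 19, 24, 25, 30]`, node 0 is the root over all eight keys and splits between keys 3 and 4 into internal nodes 3 and 4.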
GPU
parallelizations?
parallel-friendly updates?
Saman Ashkiani, Shengren Li, Martin Farach-Colton, Nina Amenta, and John D. Owens. GPU COLA: A dynamic dictionary data structure for the GPU. February 2016. Unpublished.
independently)
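The COLA reference above concerns a dynamic dictionary built from sorted levels. As a simplified, hypothetical sketch (not the paper's GPU implementation): level i is either empty or holds a sorted run of `batch_size * 2**i` keys, inserting a batch merges full levels like incrementing a binary counter, and a lookup binary-searches every full level. Deletions and lookahead pointers are omitted; the class and method names are illustrative.

```python
import bisect
import heapq


class COLA:
    """Level i is None (empty) or a sorted list of batch_size * 2**i keys."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.levels = []

    def insert_batch(self, batch):
        if len(batch) != self.batch_size:
            raise ValueError("batch must contain exactly batch_size keys")
        carry = sorted(batch)
        i = 0
        # Like a binary counter: merge through full levels until an empty one.
        while i < len(self.levels) and self.levels[i] is not None:
            carry = list(heapq.merge(self.levels[i], carry))
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = carry

    def contains(self, key):
        # Each full level is searched on its own with a binary search.
        for level in self.levels:
            if level is not None:
                j = bisect.bisect_left(level, key)
                if j < len(level) and level[j] == key:
                    return True
        return False
```

After inserting batches `[5, 1]`, `[3, 8]`, and `[2, 7]` with `batch_size=2`, `levels` is `[[2, 7], [1, 3, 5, 8]]`.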
[Figure: a trie with a Root, C nodes (each holding a bitmap and subtries or keys), and S nodes; keys shown: 0110, 0101, 0010, 0001, 1001.]
(set not list)
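The trie pictured above uses C nodes (a bitmap plus subtries) and S nodes (keys). A hypothetical sequential sketch of such a bitmap-indexed trie, assuming 2 key bits are consumed per level starting from the low bits: a C node stores only its occupied children, and a child's array position is the popcount of the bitmap below its slot. Concurrency, which a GPU version would need, is not modeled; all names are illustrative.

```python
BITS_PER_LEVEL = 2               # assumption: 2 key bits per trie level
FANOUT = 1 << BITS_PER_LEVEL


class SNode:
    """Leaf node holding a single key."""
    def __init__(self, key):
        self.key = key


class CNode:
    """Internal node: bitmap of occupied slots plus a compact child array."""
    def __init__(self):
        self.bitmap = 0
        self.children = []

    def pos(self, slot):
        # Compact-array index = number of occupied slots below `slot`.
        return bin(self.bitmap & ((1 << slot) - 1)).count("1")


def _slot(key, level):
    return (key >> (level * BITS_PER_LEVEL)) & (FANOUT - 1)


def insert(node, key, level=0):
    """Insert key into the subtrie rooted at C node `node` (set semantics)."""
    slot = _slot(key, level)
    bit = 1 << slot
    p = node.pos(slot)
    if not node.bitmap & bit:            # empty slot: add a leaf
        node.bitmap |= bit
        node.children.insert(p, SNode(key))
        return
    child = node.children[p]
    if isinstance(child, CNode):
        insert(child, key, level + 1)
    elif child.key != key:               # collision: push both keys down a level
        sub = CNode()
        insert(sub, child.key, level + 1)
        insert(sub, key, level + 1)
        node.children[p] = sub
    # otherwise the key is already present and is stored only once


def contains(node, key, level=0):
    slot = _slot(key, level)
    if not node.bitmap & (1 << slot):
        return False
    child = node.children[node.pos(slot)]
    if isinstance(child, CNode):
        return contains(child, key, level + 1)
    return child.key == key
```

Inserting a key that is already present changes nothing, so the structure holds a set of keys.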