SLIDE 1

Introduction GiST Hardware Abstraction Layer Evaluation Summary

GiST Scan Acceleration Using Coprocessors

Felix Beier, Torsten Kilias, Kai-Uwe Sattler

Ilmenau University of Technology

05/21/2012

SLIDE 2

Outline

1. Introduction
2. GiST Hardware Abstraction Layer
3. Evaluation
4. Summary

SLIDE 3

Co-processing Index Searches

Image sources: [1] (ray tracing), [5] (collision detection)

• Ray tracing: many independent point queries
• Collision detection (spatial join): many independent range queries
• Utilization of massive parallelism offered by modern coprocessors
→ Special index structures carefully tuned for specific hardware

SLIDE 4

Index Frameworks

Image sources: [3], [2]

• Various applications require specialized index structures
• Scientific data
  • Enormous data volumes
  • Unknown data characteristics
  • Costly prototyping
  • Gap: scientists vs. system developers
→ Rapid index development with frameworks like GiST

SLIDE 5

GiST - Generalized Search Tree

• Framework for implementation of height-balanced search trees
  • Implements common tree operations (insertions, deletions, node splits, height-balancing)
  • Developer specifies key data type and type-specific operations
  • Lookup predicate returns
    • false: entry cannot be found in child subtree
    • true: entry may be found in child subtree
• Example: R-tree
  • Key type: minimum bounding rectangles
  • Predicate: rectangle intersection test
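The R-tree example can be illustrated with a minimal Python sketch. The names `Rect`, `intersects`, and `matching_children` are hypothetical and not part of any GiST API. The sketch only shows how an intersection test on bounding rectangles decides which child subtrees may contain matching entries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle (hypothetical key type)."""
    lo: Tuple[float, ...]  # lower corner, one coordinate per dimension
    hi: Tuple[float, ...]  # upper corner, one coordinate per dimension


def intersects(key: Rect, query: Rect) -> bool:
    """Lookup predicate: False means the child subtree cannot contain a match,
    True means it may contain one."""
    return all(k_lo <= q_hi and q_lo <= k_hi
               for k_lo, k_hi, q_lo, q_hi
               in zip(key.lo, key.hi, query.lo, query.hi))


def matching_children(node_keys: List[Rect], query: Rect) -> List[int]:
    """Return the slot indices of all children whose key passes the predicate."""
    return [slot for slot, key in enumerate(node_keys) if intersects(key, query)]


# One node with three slots and a single range query.
node = [Rect((0, 0), (2, 2)), Rect((3, 3), (5, 5)), Rect((1, 1), (4, 4))]
print(matching_children(node, Rect((1.5, 1.5), (2.5, 2.5))))  # [0, 2]
```

The same predicate works for any number of dimensions, so it also covers the 3-D case without changes.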

[Figure: GiST tree with internal nodes (directory; keys key1, key2, ...) and leaf nodes (linked list). Source: [4]]

SLIDE 6

Motivation

• Combining the best of both worlds
  • Extensibility of GiST framework
  • Performance improvements through co-processing
• Challenges
  • Finding fine-grained parallel algorithms
  • Utilizing hardware capabilities
  • Out-of-core implementation
  • Consideration of co-processing overheads

SLIDE 7

Framework Design

• Applications issue stream of query batches
• Iterator for matching leaf nodes
• Grouping queries to node batches for better locality
• Specialized scan implementation for various (co)processors
• Automatic scheduling to best execution unit
• Out-of-core implementation
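The grouping step can be sketched as follows. This is an illustration under assumptions, not the framework's actual code: the function name `group_into_node_batches` and the `(query_id, node_id)` pair format are hypothetical. The idea is that all queries that reached the same node are collected, so each node is scanned once per batch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_into_node_batches(
    matches: Iterable[Tuple[int, str]],
) -> Dict[str, List[int]]:
    """Group (query_id, node_id) pairs so that each node is scanned once
    for all queries that reached it."""
    batches: Dict[str, List[int]] = defaultdict(list)
    for query_id, node_id in matches:
        batches[node_id].append(query_id)
    return dict(batches)


# Hypothetical result of scanning the root: query 0 matched N1 and N2, and so on.
root_matches = [(0, "N1"), (0, "N2"), (1, "N1"), (2, "N3"), (3, "N1")]
for node_id, query_ids in group_into_node_batches(root_matches).items():
    print(node_id, query_ids)
# N1 [0, 1, 3]
# N2 [0]
# N3 [2]
```

Each resulting (node, query list) pair is one unit of work that could be handed to an execution unit.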

[Figure: index framework architecture with memory layer, execution layer (CPUs, GPUs, ...), and control layer. Labels: application, buffer pool, node buffer, query stream, worker threads 1 to T, scheduler, matching inner nodes, matching leaf nodes, result sets.]

SLIDE 8

Scan Parallelization

• Inter-Node Parallelization
  • Independent node batches
  • Pipelining

[Figure: tree layers 1, 2, ... from the root through child nodes N1 ... Nm ... Nk; each node has its own query queue.]

• Intra-Node Parallelization
  • Independent predicate tests
  • SIMD features

[Figure: nodes i, j, and k, each with its child nodes and a query queue; annotation: max matching child nodes.]
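The two levels of parallelism can be sketched in Python. This is an illustration only: the names `overlaps`, `scan_node`, and `scan_batches` are hypothetical, 1-D intervals stand in for bounding rectangles, and a thread pool is used to show the structure rather than to gain speed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Interval = Tuple[float, float]  # 1-D stand-in for a bounding rectangle
Task = Tuple[List[Interval], List[Interval]]  # (node keys, query batch)


def overlaps(key: Interval, query: Interval) -> bool:
    """Predicate test; each call is independent of all others."""
    return key[0] <= query[1] and query[0] <= key[1]


def scan_node(task: Task) -> List[Tuple[int, int]]:
    """Intra-node work: test every (query, slot) pair of one node batch.
    The tests share no state, so SIMD hardware could run them in lockstep."""
    keys, queries = task
    return [(q, s) for q, query in enumerate(queries)
            for s, key in enumerate(keys) if overlaps(key, query)]


def scan_batches(tasks: List[Task], workers: int = 4) -> List[List[Tuple[int, int]]]:
    """Inter-node work: node batches are independent and can be scanned concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan_node, tasks))


tasks = [
    ([(0, 2), (3, 5)], [(1, 1), (4, 6)]),  # node i with two queries
    ([(0, 9)], [(2, 3)]),                  # node j with one query
]
print(scan_batches(tasks))  # [[(0, 0), (1, 1)], [(0, 0)]]
```

Each result pair is (query index, slot index); the matching slots identify the child nodes that the query visits on the next layer.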

SLIDE 9

GPU Implementation - Processing Model

• Nvidia CUDA implementation
• Multiple cores per GPU device
  • Execution of independent subtasks without synchronization
  • Subtask = (node, query batch) pair
• Multiple thread processors per GPU core
  • Execution of data parallel instructions by separate threads
  • Synchronization possible
  • Instruction = predicate test

SLIDE 10

GPU Implementation - Memory Hierarchy

• Global main memory on device
  • Explicit transfer from host memory via PCIe bus
  • Caching of node and query data to avoid transfer overhead
  • Scan preparation phase to determine input data offsets
• Shared memory on each die
  • Two orders of magnitude faster than global memory
  • Software-controlled cache for scan data
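The slides do not say how the scan preparation phase computes its offsets. One common approach, assumed here purely for illustration, is an exclusive prefix sum over the sizes of the subtasks, which gives each subtask its start position in one flat, contiguous buffer. The function name `exclusive_offsets` and the sizes are hypothetical.

```python
from itertools import accumulate
from typing import List


def exclusive_offsets(sizes: List[int]) -> List[int]:
    """Start offset of each subtask's data in one flat, contiguous buffer
    (exclusive prefix sum of the sizes)."""
    return list(accumulate(sizes, initial=0))[:-1]


# Hypothetical numbers of queries per node batch.
queries_per_batch = [64, 128, 32]
print(exclusive_offsets(queries_per_batch))  # [0, 64, 192]
```

With such offsets, every subtask can locate its input without coordinating with other subtasks.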

SLIDE 11

Setup

• 3-D R-tree implementation
• Generated index nodes
• Intel Xeon X5690 CPU
• Nvidia Tesla C2050 GPU
→ Where shall a scan be executed?
→ How can the performance be improved with hybrid processing?

SLIDE 12

Tree Parameters

• Generated index nodes
• Generated predicates
• Full processor utilization
• Overheads included for GPU measurements
• speedup = CPU time / GPU time

[Figure: speedup as a function of the number of queries per task (64 to 512) and the number of slots (32 to 128); speedup scale from 0.4 to 1.8.]
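The speedup definition translates directly into a decision rule: a scan is worth sending to the GPU only when the speedup exceeds 1. The sketch below assumes hypothetical function names and made-up timings; it does not reproduce measurements from the evaluation.

```python
def speedup(cpu_time: float, gpu_time: float) -> float:
    """speedup = CPU time / GPU time, where GPU time includes transfer overheads."""
    if gpu_time <= 0:
        raise ValueError("gpu_time must be positive")
    return cpu_time / gpu_time


def preferred_unit(cpu_time: float, gpu_time: float) -> str:
    """Pick the GPU only when it is actually faster (speedup > 1)."""
    return "GPU" if speedup(cpu_time, gpu_time) > 1.0 else "CPU"


# Hypothetical timings in milliseconds, not measurements from the evaluation.
print(speedup(9.0, 6.0), preferred_unit(9.0, 6.0))  # 1.5 GPU
print(speedup(3.0, 6.0), preferred_unit(3.0, 6.0))  # 0.5 CPU
```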

SLIDE 13

Workload Simulation

• CPU for small batches
• GPU for large batches
• How do batch sizes change when queries are streamed through the tree?
• Simulation for full R-tree
  • 96 slots per node
  • 5 layers → 8 billion indexed entries
  • 10,000 root queries
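The figure of 8 billion entries follows from the tree parameters if "full" is taken to mean that every node on each of the 5 layers uses all 96 slots. Under that assumption the tree indexes 96^5 entries. The function name below is hypothetical.

```python
def indexed_entries(slots_per_node: int, layers: int) -> int:
    """Entries in a completely filled tree: every node on every layer uses all slots."""
    return slots_per_node ** layers


total = indexed_entries(96, 5)
print(f"{total:,}")  # 8,153,726,976
```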

SLIDE 14

Workload Simulation - Parameter Correlation

[Figure: "96 Slots, 25% Selectivity, 10000 Root Queries, 512 MaxBatchSize". x-axis: correlation; y-axes: relative runtime and runtime improvement, both in percent. Series: CPU Only, GPU Only, Real, Ideal, Ideal vs CPU, Ideal vs GPU, Ideal vs Real, Real vs CPU, Real vs GPU.]

SLIDE 15

Workload Simulation - Parameter Selectivity

[Figure: "96 Slots, 25% Duplicates, 10000 Root Queries, 512 MaxBatchSize". x-axis: selectivity; y-axes: relative runtime and runtime improvement, both in percent. Series: CPU Only, GPU Only, Real, Ideal, Ideal vs CPU, Ideal vs GPU, Ideal vs Real, Real vs CPU, Real vs GPU.]

SLIDE 16

Conclusion & Outlook

• Conclusion
  • Extended GiST with hardware abstraction layer
  • Performance improvements are possible
  • Overheads are not negligible!
• Next steps
  • Prototype improvements
  • Specialization for other tree types
  • Full support for all GiST operations

SLIDE 17

References

[1] http://en.wikipedia.org/wiki/File:Ray_trace_diagram.svg.
[2] Science and Technology Review, June 2007. "Virtual Dams Subjected to Strong Earthquakes".
[3] Chourasia, A., Olsen, K., Cui, Y., Lee, K., Zhou, J., Ely, G., Small, P., Roten, D., Day, S., Maechling, P., Jordan, T., Panda, D. K., and Levesque, J. Ground motion visualization of M8 earthquake simulation using height field. In SciDAC (2011). Available at http://www.mcs.anl.gov/uploads/cels/papers/scidac11/.
[4] Hellerstein, J. M., Naughton, J. F., and Pfeffer, A. Generalized Search Trees for Database Systems. In VLDB (1995).
[5] Kavan, L., and Zara, J. Fast Collision Detection for Skeletally Deformable Models. Computer Graphics Forum 24, 3 (2005), 363–372.

SLIDE 18

Discussion

Thanks for your attention! Questions?
