

SLIDE 1

HardBD 2017, San Diego, April 22, 2017

Are Databases Fit for Hybrid Workloads on GPUs?

A Storage Engine’s Perspective

Marcus Pinnecke, David Broneske, Gabriel Campero Durand, Gunter Saake

Database and Software Engineering Group

University of Magdeburg

SLIDE 2

[Figure: design-space diagram spanning transactional workloads (OLTP-optimized) to analytical workloads (OLAP-optimized) on one axis, and main-processor-only to co-processor-only execution on the other; HTAP-optimized and co-processor-accelerated systems sit in between. Moving along the workload axis requires physical record layout re-organization in the storage engine; moving along the processor axis requires compute device re-assignment.]

Hybrid Transactional/Analytical Processing (HTAP)


HTAP database systems run both OLTP and OLAP workloads

  • HyPer, Peloton, HANA, …

The benefit is greater business value through:

  • lower latency for analysis
  • less synchronization effort

Related challenges:

  • different data access patterns
  • adapting the record layout (NSM, DSM, …)
  • interference between query types
  • conflicting optimization goals
  • different kinds of parallelism
  • hot and cold data
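The NSM/DSM distinction behind the layout challenge can be illustrated with a small sketch. This is plain Python, with table contents invented for illustration; `nsm` and `dsm` are just names for the two physical organizations:

```python
# Logical relation: item(id, name, price)
rows = [(1, "bolt", 0.10), (2, "nut", 0.05), (3, "screw", 0.07)]

# NSM (N-ary Storage Model): tuples stored contiguously -> cheap
# point access for OLTP-style "fetch the whole record" queries.
nsm = list(rows)

# DSM (Decomposition Storage Model): one array per attribute -> cheap
# sequential scans for OLAP-style "aggregate one column" queries.
dsm = {
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "price": [r[2] for r in rows],
}

# OLTP access: materialize the full record with id == 2.
record_nsm = next(r for r in nsm if r[0] == 2)      # one contiguous read
record_dsm = tuple(col[1] for col in dsm.values())  # one read per column

# OLAP access: sum all prices.
total_dsm = sum(dsm["price"])                       # one contiguous scan
total_nsm = sum(r[2] for r in nsm)                  # strided over tuples

assert record_nsm == record_dsm == (2, "nut", 0.05)
assert abs(total_nsm - total_dsm) < 1e-9
```

Both layouts hold the same data; only the access cost differs, which is why an HTAP engine has to adapt the layout to the workload mix.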
SLIDE 3



Database Systems on Heterogeneous Platforms

Heterogeneous systems use co-processors

  • host (CPU) and device (e.g., GPU)
  • CoGaDB, GPUTx, Ocelot, …

The benefit is exploiting additional compute capacity:

  • overcome the limitations of the power wall
  • special jobs for specialized processors

Related challenges:

  • data transfer costs for I/O
  • different programming models
  • device limitations (e.g., memory capacity)
  • data and operator placement
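The transfer-cost challenge can be captured in a back-of-envelope model. The function and all numbers below are illustrative, not measurements from the talk:

```python
def offload_wins(n_bytes, link_gbps, host_s, device_s):
    """True if device compute time plus transfer time over the
    host-device link beats host-only execution time."""
    transfer_s = n_bytes / (link_gbps * 1e9)
    return transfer_s + device_s < host_s

# Example (invented numbers): 60M prices * 8 B over a ~12 GB/s PCIe
# link, host scan 300 ms, device reduction 10 ms -> transfer ~40 ms,
# so offloading wins.
assert offload_wins(60_000_000 * 8, 12, host_s=0.30, device_s=0.01)

# With a fast host scan (30 ms) the same transfer is no longer
# amortized and the data should stay on the host.
assert not offload_wins(60_000_000 * 8, 12, host_s=0.03, device_s=0.01)
```

The crossover point is exactly what data and operator placement decisions have to estimate at runtime.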
SLIDE 4

Motivation

SLIDE 5


Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?


We evaluate at the intersection of HTAP database systems and heterogeneous database systems, on a TPC-C benchmark dataset, measuring the effort of three queries:

  • "OLTP" query (materialization): select * from customers where … (materializes 150 customers)
  • "HTAP" query (aggregation of some): select sum(c_bought_item.price) from customers ⨝ … ⨝ item where … (aggregates 150 items)
  • "OLAP" query (aggregation of all): select sum(price) from item where true (aggregates all items)

SLIDE 6

Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?


Setup: TPC-C benchmark; customer record 96 B (21 fields), item record 20 B + 8 B (4 fields + price field). System configuration: operator-at-a-time processing with late materialization. Host: max. 8 threads, blockwise partitioning. Device: optimized parallel reduction kernel (≥ 1024 blocks of 512 threads each), final reduction on 1 block of 1024 threads. Effort for join processing not included.
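The two-stage reduction scheme described in this setup can be sketched in plain Python as a stand-in for the CUDA kernel; the block count below is illustrative, not the actual launch configuration:

```python
def block_reduce(values, n_blocks):
    """Stage 1: each 'block' sums its own contiguous partition
    (on the device, all blocks run in parallel)."""
    size = -(-len(values) // n_blocks)  # ceiling division
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]

def two_stage_sum(values, n_blocks=1024):
    """Blockwise partial sums, then a final reduction over the
    partials (on the device: a single final block)."""
    partials = block_reduce(values, n_blocks)
    return sum(partials)

# Invented price data, just to check the scheme is sum-preserving.
prices = [0.01 * i for i in range(10_000)]
assert abs(two_stage_sum(prices) - sum(prices)) < 1e-6
```

The point of the two stages is that stage 1 exposes massive intra-query parallelism, which is what makes the device competitive for the "aggregation of all" query below.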

[Chart: throughput (records/s) of the "OLTP" materialization query (materialize 150 customers) over the number of records in the customer table (5M–85M), comparing row-store vs. column-store on the host, each single- and multi-threaded; higher values are better.]

SLIDE 7


Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?



[Chart: throughput (records/s) of the "HTAP" query (sum the prices of 150 items) over the number of records in the item table (10M–60M), comparing row-store vs. column-store on the host, each single- and multi-threaded; higher values are better.]

SLIDE 8


Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?



[Chart: throughput (records/s) of the "OLAP" query (sum all prices in the item table) over the number of records in the item table (5M–65M), comparing row-store vs. column-store on the host, each single- and multi-threaded; higher values are better.]

SLIDE 9

Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?



[Chart: throughput (records/s) of the "OLAP" query (sum all prices in the item table) over the number of records in the item table (5M–65M), comparing the multi-threaded column-store on the host against the column-store on the device (transfer costs to the device excluded); higher values are better.]

SLIDE 10

Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?


Best-performing configuration per query:

  • "OLTP" query, materialize 150 customers: row-store on the host (inter-query parallelism)
  • "HTAP" query, aggregate 150 items: column-store on the host (inter-query parallelism)
  • "OLAP" query, aggregate all items: column-store on the device (intra-query parallelism)

Moving between these optima requires the storage engine to change the record layout (row-store to column-store), and the query engine to switch the form of parallelism (inter-query to intra-query) and to transition between compute devices (host to device).

SLIDE 11



To take advantage, we need to hybridize along both dimensions, combining the best of both worlds.

SLIDE 12

Contribution

SLIDE 13


Survey and Classification of SOTA Engines

Taxonomy

We defined a fine-grained set of concepts, and the relations between them, in order to classify systems:

  • Unified terms to compare different systems
  • Overview of the design space
  • Possibility to assess adaptability w.r.t. transitions

Taxonomy dimensions of a storage engine:

  • Layout Handling: Built-In vs. Emulated
  • Layout Flexibility: Inflexible vs. Flexible (Weak vs. Strong); Constrained vs. Unconstrained
  • Layout Adaptability: Static vs. Responsive
  • Data Location: Static Target (Host-Memory-Only, Device-Memory-Only) vs. Mixed Locality (Centralized, Distributed)
  • Fragment Linearization: Direct Linearization vs. Emulated Linearization
  • Fragment Scheme: Thin Fragments vs. Fat Fragments
  • Layouts: Single Layout (NSM-Fixed, DSM-Fixed) vs. Multi Layout (Variable NSM/DSM, Partially NSM-Emulated, Partially DSM-Emulated, Replication-Based, Delegation-Based)

Core concepts

  • Layout: division of a relation into fragments.
  • Fragment: "gapless" region of data.
  • Tuplet: the portion of a tuple that falls into a fragment.
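A minimal sketch of these three concepts; all names, field lists, and data below are invented for illustration:

```python
table_fields = ["c_id", "c_name", "c_balance", "c_since"]
tuples = [(1, "Ada", 10.0, 1998), (2, "Bob", 5.5, 2003)]

# A layout divides the relation: each inner list is the set of fields
# one fragment holds (here: hot fields grouped, cold fields separated).
layout = [["c_id", "c_balance"], ["c_name", "c_since"]]

def fragments(layout, fields, tuples):
    """Materialize each fragment as a gapless region of tuplets:
    one tuplet per tuple, holding just that fragment's fields."""
    idx = {f: i for i, f in enumerate(fields)}
    return [
        [tuple(t[idx[f]] for f in frag_fields) for t in tuples]
        for frag_fields in layout
    ]

frags = fragments(layout, table_fields, tuples)
assert frags[0] == [(1, 10.0), (2, 5.5)]            # hot fragment
assert frags[1] == [("Ada", 1998), ("Bob", 2003)]   # cold fragment
```

With this vocabulary, pure NSM is one fragment holding all fields, pure DSM is one fragment per field, and everything in between is a mixed layout.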
SLIDE 14


Survey and Classification of SOTA Engines

Classification

Both HTAP systems and co-processor systems are classified by: OLAP or OLTP vs. OLAP and OLTP support, co-processor support, multiple-layout support, advanced layout flexibility, adaptive layouts during runtime, and mixed NSM/DSM support.

SLIDE 15


Survey and Classification of SOTA Engines

View on Results

[Figure: the surveyed systems (PAX, Fractured Mirrors, HYRISE, ES2, GPUTx, H2O, HyPer, CoGaDB, L-Store, Peloton; published 2002–2016) plotted by layout adaptability (static inflexible, weak flexible, strong flexible, responsive during runtime), restrictions to fragments, compute device support (host, host/device, device), and workload type support (HTAP, OLAP); the "Heterogeneous HTAP" and "Device HTAP" regions of the design space are marked.]
SLIDE 16

Future Work

SLIDE 17


Self-managing, multi-model HTAP database system on CPUs/GPUs, built from three components:

Query Engine (codename Vector Pipes)

  • micro-batch query execution
  • UDFs as first-class citizens
  • multi-query execution

Storage Engine (codename Grid Store)

  • adaptive HTAP storage
  • heterogeneous platforms
  • advanced tile-based design

Optimizer (codename Alfred)

  • self-managing component
  • learning on event streams

Related systems: Peloton, L-Store, CoGaDB, …; X100, PIPES, MapReduce, …

SLIDE 18

Grid Store: A Storage Engine for Heterogeneous HTAP

Wrap-Up and Outlook. Feedback is welcome.

Challenges

HTAP:

  • different data access patterns
  • adapting the record layout (NSM, DSM, …)
  • interference between query types
  • conflicting optimization goals
  • different kinds of parallelism
  • hot and cold data parts

Heterogeneous:

  • data transfer costs for I/O
  • different programming models
  • device limitations (e.g., memory capacity)
  • data and operator placement

SLIDE 19


Grid Store: A Storage Engine for Heterogeneous HTAP

Wrap-Up and Outlook. Feedback is welcome.

[Figure: example table layout in Grid Store, an unconstrained, strong flexible, responsive layout with data placement: attributes A–H are partitioned into numbered grids holding cold data (compressed), hot data on the device (NSM or DSM), and hot data on the host (NSM or DSM); reasonability currently under study.]

Tables are built from arbitrary tiles (grids), which…

  • … may live on host or device (or both)
  • … are self-contained (NSM/DSM/indexed/…)
  • … are mutable w.r.t. access patterns and shape

Table layouts respond online to forecasted workload changes

  • shrink or enlarge grids, compress/uncompress
  • move grids (temporarily or permanently) to a platform

Backed by a flexible query engine (Vector Pipes) and an adaptive, self-learning optimizer performing real-time analytics on the system event stream (Alfred).
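A minimal sketch of such a tile-based table; Grid Store is future work in this talk, so every name, attribute, and value here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Grid:
    columns: list             # which attributes this grid covers
    rows: range               # which tuples it covers
    layout: str = "DSM"       # "NSM" or "DSM" -- each grid is self-contained
    location: str = "host"    # "host", "device", or "both"
    compressed: bool = False  # cold grids may be compressed

@dataclass
class GridTable:
    grids: list = field(default_factory=list)

    def move(self, grid, target):
        """React to a forecasted workload change by re-placing one grid."""
        grid.location = target

# One table, three grids: hot NSM data on the device, a hot DSM column
# on the host, and a compressed cold region.
t = GridTable([
    Grid(["A", "B"], range(0, 1000), layout="NSM", location="device"),
    Grid(["C"], range(0, 1000), layout="DSM", location="host"),
    Grid(["A", "B", "C"], range(1000, 5000), compressed=True),
])

# Forecast says column C will be scanned heavily: move it to the GPU.
t.move(t.grids[1], "device")
assert t.grids[1].location == "device"
```

Each grid carries its own layout and placement, so responding to a forecast is a per-grid decision rather than a whole-table reorganization.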
