PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU - - PowerPoint PPT Presentation

pg strom
SMART_READER_LITE
LIVE PREVIEW

PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU - - PowerPoint PPT Presentation

PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> Self Introduction Name: KaiGai Kohei Company: NEC Mission: Software


slide-1
SLIDE 1

PG-Strom

Query Acceleration Engine of PostgreSQL Powered by GPGPU

NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>

slide-2
SLIDE 2

Self Introduction

▌Name: KaiGai Kohei ▌Company: NEC ▌Mission: Software architect & Intrepreneur ▌Background:

 Linux kernel development (2003~?)  PostgreSQL development (2006~)  SAP alliance (2011~2013)  PG-Strom development & productization (2012~)

▌PG-Strom Project:

 In-company startup of NEC  Also, an open source software project

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.2

slide-3
SLIDE 3

What is PG-Strom

▌An Extension of PostgreSQL ▌Off-loads CPU intensive SQL workloads to GPU processors ▌Major Features

① Automatic and just-in-time GPU code generation from SQL ② Asynchronous and concurrent query executor

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.3

database Query Executor Query Planner Custom Executor Custom Planner GPU code

  • n the fly

SQL command Async- Execution PG-Strom Query Frontend

slide-4
SLIDE 4

Concept ▌No Pain

 Looks like a traditional PostgreSQL database from standpoint of applications, thus, we can utilize existing tools, drivers, applications.

▌No Tuning

 Massive computing capability by GPGPU kills necessity of database tuning by human. It allows engineering folks to focus on the task only human can do.

▌No Complexity

 No need to export large data to external tools from RDBMS, because its computing performance is sufficient to run the workloads nearby data.

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.4

slide-5
SLIDE 5

Technology Trend

▌Movement to CPU/GPU integrated architecture rather than multicore CPU ▌Free lunch for SW by HW evolution will finish soon  Unless software is not designed to utilize GPU capability, unable to pull-out the full hardware capability.

DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL

  • Page. 5

SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013

slide-6
SLIDE 6

Background: How SQL is executed

▌Planner constructs query execution plan based on cost estimation ▌SQL never defines how to execute the query, but what shall be returned

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.6

postgres# EXPLAIN SELECT cat, avg(x) FROM t0 NATURAL JOIN t1 WHERE t0.z like '%abc%' GROUP BY cat; QUERY PLAN

  • HashAggregate (cost=6629.88..6629.89 rows=1 width=12)

Group Key: t0.cat

  • > Hash Join (cost=1234.00..6619.77 rows=2020 width=12)

Hash Cond: (t0.aid = t1.aid)

  • > Seq Scan on t0 (cost=0.00..5358.00 rows=2020 width=16)

Filter: (z ~~ '%abc%'::text)

  • > Hash (cost=734.00..734.00 rows=40000 width=4)
  • > Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4)

(8 rows)

slide-7
SLIDE 7

Background: Custom-Plan Interface

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.7

Aggregate

SELECT cat, avg(x) FROM t1, t2 WHERE t1.id = t2.id AND y > 100 GROUP BY cat;

Scan on t1 Scan on t2 Join t1 t2 key: cat

  • Hash Join
  • Merge Join
  • Nested Loop
  • Custom Join
  • Seq Scan
  • Index Scan
  • Index-Only Scan
  • Tid Scan
  • Custom Scan

IndexScan on t1 y > 100 “BulkLoad” on t1 “GpuHashJoin” t1.id = t2.id

slide-8
SLIDE 8

PG-Strom Features

▌Logics

 GpuScan ... Parallel evaluation of scan qualifiers  GpuHashJoin ... Parallel multi-relational join  GpuPreAgg ... Two phase aggregation  GpuSort ... GPU + CPU Hybrid Sorting  GpuNestedLoop (in develop)

▌Data Types

 Integer, Float, Date/Time, Numeric, Text

▌Function and Operators

 Equality and comparison operators  Arithmetic operators and mathematical functions  Aggregates: count, min/max, sum, avg, std, var, corr, regr

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.8

slide-9
SLIDE 9

Automatic GPU code generation

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.9

postgres=# SET pg_strom.show_device_kernel = on; postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10; QUERY PLAN

  • Custom Scan (GpuScan) on public.t0 (cost=500.00..357569.35 rows=6666683 width=77)

Output: id, cat, aid, bid, cid, did, eid, x, y, z Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision) Features: likely-tuple-slot Kernel Source: #include "opencl_common.h“ : static pg_bool_t gpuscan_qual_eval(__private cl_int *errcode, __global kern_parambuf *kparams, __global kern_data_store *kds, __global kern_data_store *ktoast, size_t kds_index) { pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0); pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index); pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index); return pgfn_float8lt(errcode, pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0); }

slide-10
SLIDE 10

Implementation (1/3) – GpuScan

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.10

Table

DMA Send DMA Recv DMA Send DMA Recv DMA Send DMA Recv

Execution of auto-generated GPU code Result Output Stream Input Stream Chunk

(16~64MB)

slide-11
SLIDE 11

PostgreSQL PG-Strom

Software Architecture

DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL

  • Page. 11

GPU Code Generator

Storage

Storage Manager

Shared Buffer

Query Parser Query Optimizer Query Executor

SQL Query Breaks down the query to parse tree Makes query execution plan Run the query Custom-Plan APIs

GpuScan GpuHashJoin GpuPreAgg GPU Program Manager PG-Strom OpenCL Server Message Queue GpuSort

※ Current version based on OpenCL

slide-12
SLIDE 12

Implementation (2/3) – GpuHashJoin

PG-Strom Preview Feb-2015

  • Page. 12

Inner relation Outer relation Inner relation Outer relation

Hash Table Hash Table

Next stage Next stage

CPU just references materialized results Hash-Table Search by CPU Sequential Materialization by CPU Parallel Materialization Parallel Hash-Table Search

vanilla Hash-Join GpuHashJoin

slide-13
SLIDE 13

Benchmark result (1/2) – simple tables join

▌Benchmark Query: SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....]; ▌Environment:

 t0 has 100million rows (13GB), t1-t9 has 40,000 rows for each, all-data pre-loaded  CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.13 18.19 19.45 21.04 23.66 26.69 37.64 43.22 49.57 56.38 64.27 87.73 109.73 132.21 155.10 179.62 207.85 233.31 263.51 0.00 50.00 100.00 150.00 200.00 250.00 300.00 2 3 4 5 6 7 8 9 10

Query Response Time [sec] number of tables joined

Simple Tables Join Benchmark

PG-Strom PostgreSQL

slide-14
SLIDE 14

Implementation (3/3) – GpuPreAgg

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.14

Table

1st Stage Reduction 2nd Stage Reduction

Chunk

(16~64MB)

slide-15
SLIDE 15

Benchmark result (2/2) – Star Schema Model

▌40 typical reporting queries ▌100GB of retail / start-schema data, all pre-loaded ▌Environment

 CPU: Xeon E5-2670v3(12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.15 0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00 1800.00 2000.00 Q.01 Q.02 Q.03 Q.04 Q.05 Q.06 Q.07 Q.08 Q.09 Q.10 Q.11 Q.12 Q.13 Q.14 Q.15 Q.16 Q.17 Q.18 Q.19 Q.20 Q.21 Q.22 Q.23 Q.24 Q.25 Q.26 Q.27 Q.28 Q.29 Q.30 Q.31 Q.32 Q.33 Q.34 Q.35 Q.36 Q.37 Q.38 Q.39 Q.40

Query Response Time [sec]

Typical Reporting Queries on Retail / Star-Schema Data

PG-Strom PostgreSQL

slide-16
SLIDE 16

Expected Scenario – Reduction of ETL

▌ETL – Its design is human centric task ▌Replication – much automatous task

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.16

ERP CRM SCM BI OLTP database OLAP database

ETL

OLAP Cubes Master / Fact Tables

BI

Replication

Replica of Master / Fact Tables

Optimized to transaction workloads Optimized to analytic workloads Sufficient to analytic workloads also

PG-Strom

slide-17
SLIDE 17

Direction of PG-Strom

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.17

slide-18
SLIDE 18

Development Plan

 Migration of OpenCL to CUDA  Add support of GpuNestedLoop  Add support multi-functional kernel  Standardization of custom-join interface  ...and more...?

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.18

Short term target: PostgreSQL v9.5 timeline (2015) Middle term target: PostgreSQL v9.6 timeline (2016) Current version: PG-Strom β + PostgreSQL v9.5devel

 Integration with funnel executor  Investigation to SSD/NvRAM utilization  Custom-sort/aggregate interface  Add support for spatial data types (?)

slide-19
SLIDE 19

Let’s try – Deployment on AWS

The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL

  • Page. 19

Search by “strom” !

AWS GPU Instance (g2.2xlarge) CPU Xeon E5-2670 (8 xCPU) RAM 15GB GPU NVIDIA GRID K2 (1536core) Storage 60GB of SSD Price $0.898/hour, $646.56/mon

(*) Price for on-demand instance

  • n Tokyo region at Nov-2014
slide-20
SLIDE 20

Welcome your involvement!

▌How to be involved?

 as a user  as a developer  as a business partner

▌Source code

 https://github.com/pg-strom/devel

▌Contact US

 e-mail: kaigai@ak.jp.nec.com  twitter: @kkaigai

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.20

check it out!

...or, catch me in the Convention Center

slide-21
SLIDE 21
slide-22
SLIDE 22