PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU - - PowerPoint PPT Presentation
PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU - - PowerPoint PPT Presentation
PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> Self Introduction Name: KaiGai Kohei Company: NEC Mission: Software
Self Introduction
▌Name: KaiGai Kohei ▌Company: NEC ▌Mission: Software architect & Intrepreneur ▌Background:
Linux kernel development (2003~?) PostgreSQL development (2006~) SAP alliance (2011~2013) PG-Strom development & productization (2012~)
▌PG-Strom Project:
In-company startup of NEC Also, an open source software project
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.2
What is PG-Strom
▌An Extension of PostgreSQL ▌Off-loads CPU intensive SQL workloads to GPU processors ▌Major Features
① Automatic and just-in-time GPU code generation from SQL ② Asynchronous and concurrent query executor
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.3
database Query Executor Query Planner Custom Executor Custom Planner GPU code
- n the fly
SQL command Async- Execution PG-Strom Query Frontend
Concept ▌No Pain
Looks like a traditional PostgreSQL database from standpoint of applications, thus, we can utilize existing tools, drivers, applications.
▌No Tuning
Massive computing capability by GPGPU kills necessity of database tuning by human. It allows engineering folks to focus on the task only human can do.
▌No Complexity
No need to export large data to external tools from RDBMS, because its computing performance is sufficient to run the workloads nearby data.
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.4
Technology Trend
▌Movement to CPU/GPU integrated architecture rather than multicore CPU ▌Free lunch for SW by HW evolution will finish soon Unless software is not designed to utilize GPU capability, unable to pull-out the full hardware capability.
DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL
- Page. 5
SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013
Background: How SQL is executed
▌Planner constructs query execution plan based on cost estimation ▌SQL never defines how to execute the query, but what shall be returned
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.6
postgres# EXPLAIN SELECT cat, avg(x) FROM t0 NATURAL JOIN t1 WHERE t0.z like '%abc%' GROUP BY cat; QUERY PLAN
- HashAggregate (cost=6629.88..6629.89 rows=1 width=12)
Group Key: t0.cat
- > Hash Join (cost=1234.00..6619.77 rows=2020 width=12)
Hash Cond: (t0.aid = t1.aid)
- > Seq Scan on t0 (cost=0.00..5358.00 rows=2020 width=16)
Filter: (z ~~ '%abc%'::text)
- > Hash (cost=734.00..734.00 rows=40000 width=4)
- > Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4)
(8 rows)
Background: Custom-Plan Interface
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.7
Aggregate
SELECT cat, avg(x) FROM t1, t2 WHERE t1.id = t2.id AND y > 100 GROUP BY cat;
Scan on t1 Scan on t2 Join t1 t2 key: cat
- Hash Join
- Merge Join
- Nested Loop
- Custom Join
- Seq Scan
- Index Scan
- Index-Only Scan
- Tid Scan
- Custom Scan
IndexScan on t1 y > 100 “BulkLoad” on t1 “GpuHashJoin” t1.id = t2.id
PG-Strom Features
▌Logics
GpuScan ... Parallel evaluation of scan qualifiers GpuHashJoin ... Parallel multi-relational join GpuPreAgg ... Two phase aggregation GpuSort ... GPU + CPU Hybrid Sorting GpuNestedLoop (in develop)
▌Data Types
Integer, Float, Date/Time, Numeric, Text
▌Function and Operators
Equality and comparison operators Arithmetic operators and mathematical functions Aggregates: count, min/max, sum, avg, std, var, corr, regr
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.8
Automatic GPU code generation
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.9
postgres=# SET pg_strom.show_device_kernel = on; postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10; QUERY PLAN
- Custom Scan (GpuScan) on public.t0 (cost=500.00..357569.35 rows=6666683 width=77)
Output: id, cat, aid, bid, cid, did, eid, x, y, z Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision) Features: likely-tuple-slot Kernel Source: #include "opencl_common.h“ : static pg_bool_t gpuscan_qual_eval(__private cl_int *errcode, __global kern_parambuf *kparams, __global kern_data_store *kds, __global kern_data_store *ktoast, size_t kds_index) { pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0); pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index); pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index); return pgfn_float8lt(errcode, pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0); }
Implementation (1/3) – GpuScan
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.10
Table
DMA Send DMA Recv DMA Send DMA Recv DMA Send DMA Recv
Execution of auto-generated GPU code Result Output Stream Input Stream Chunk
(16~64MB)
PostgreSQL PG-Strom
Software Architecture
DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL
- Page. 11
GPU Code Generator
Storage
Storage Manager
Shared Buffer
Query Parser Query Optimizer Query Executor
SQL Query Breaks down the query to parse tree Makes query execution plan Run the query Custom-Plan APIs
GpuScan GpuHashJoin GpuPreAgg GPU Program Manager PG-Strom OpenCL Server Message Queue GpuSort
※ Current version based on OpenCL
Implementation (2/3) – GpuHashJoin
PG-Strom Preview Feb-2015
- Page. 12
Inner relation Outer relation Inner relation Outer relation
Hash Table Hash Table
Next stage Next stage
CPU just references materialized results Hash-Table Search by CPU Sequential Materialization by CPU Parallel Materialization Parallel Hash-Table Search
vanilla Hash-Join GpuHashJoin
Benchmark result (1/2) – simple tables join
▌Benchmark Query: SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....]; ▌Environment:
t0 has 100million rows (13GB), t1-t9 has 40,000 rows for each, all-data pre-loaded CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.13 18.19 19.45 21.04 23.66 26.69 37.64 43.22 49.57 56.38 64.27 87.73 109.73 132.21 155.10 179.62 207.85 233.31 263.51 0.00 50.00 100.00 150.00 200.00 250.00 300.00 2 3 4 5 6 7 8 9 10
Query Response Time [sec] number of tables joined
Simple Tables Join Benchmark
PG-Strom PostgreSQL
Implementation (3/3) – GpuPreAgg
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.14
Table
1st Stage Reduction 2nd Stage Reduction
Chunk
(16~64MB)
Benchmark result (2/2) – Star Schema Model
▌40 typical reporting queries ▌100GB of retail / start-schema data, all pre-loaded ▌Environment
CPU: Xeon E5-2670v3(12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.15 0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 1400.00 1600.00 1800.00 2000.00 Q.01 Q.02 Q.03 Q.04 Q.05 Q.06 Q.07 Q.08 Q.09 Q.10 Q.11 Q.12 Q.13 Q.14 Q.15 Q.16 Q.17 Q.18 Q.19 Q.20 Q.21 Q.22 Q.23 Q.24 Q.25 Q.26 Q.27 Q.28 Q.29 Q.30 Q.31 Q.32 Q.33 Q.34 Q.35 Q.36 Q.37 Q.38 Q.39 Q.40
Query Response Time [sec]
Typical Reporting Queries on Retail / Star-Schema Data
PG-Strom PostgreSQL
Expected Scenario – Reduction of ETL
▌ETL – Its design is human centric task ▌Replication – much automatous task
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.16
ERP CRM SCM BI OLTP database OLAP database
ETL
OLAP Cubes Master / Fact Tables
BI
Replication
Replica of Master / Fact Tables
Optimized to transaction workloads Optimized to analytic workloads Sufficient to analytic workloads also
PG-Strom
Direction of PG-Strom
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.17
Development Plan
Migration of OpenCL to CUDA Add support of GpuNestedLoop Add support multi-functional kernel Standardization of custom-join interface ...and more...?
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.18
Short term target: PostgreSQL v9.5 timeline (2015) Middle term target: PostgreSQL v9.6 timeline (2016) Current version: PG-Strom β + PostgreSQL v9.5devel
Integration with funnel executor Investigation to SSD/NvRAM utilization Custom-sort/aggregate interface Add support for spatial data types (?)
Let’s try – Deployment on AWS
The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL
- Page. 19
Search by “strom” !
AWS GPU Instance (g2.2xlarge) CPU Xeon E5-2670 (8 xCPU) RAM 15GB GPU NVIDIA GRID K2 (1536core) Storage 60GB of SSD Price $0.898/hour, $646.56/mon
(*) Price for on-demand instance
- n Tokyo region at Nov-2014
Welcome your involvement!
▌How to be involved?
as a user as a developer as a business partner
▌Source code
https://github.com/pg-strom/devel
▌Contact US
e-mail: kaigai@ak.jp.nec.com twitter: @kkaigai
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.20
check it out!