How to run SQL queries on TBs of data using GPUs (Jake Wheat, Lead Architect, SQream Technologies)



Slide 1

@sqreamtech SIL7138

How to run SQL queries on TBs of data using GPUs

Jake Wheat Lead Architect, SQream Technologies

Slide 2

How to run SQL queries on TBs of data using GPUs

1. A toy SQL query engine 2. Support wide range of SQL queries 3. Support larger data 4. Optimise

Slide 3

Hypothetical set of physical operator primitives

Example SQL                                          | Physical operator                     | Implementation
select a+b, c * 5 from t                             | select (a.k.a. project/extend/rename) | thrust::transform
select a, count(*), sum(b), avg(b) from t group by a | stream aggregate                      | thrust::reduce_by_key
select a, b from t where a > 0.5                     | filter                                | thrust::remove_if
select distinct a from t                             | stream distinct                       | thrust::unique
select a, b, c, d from t order by a, b               | sort                                  | thrust::sort
select * from t union all select * from u            | union all                             |
select * from t inner join u using (a)               | sort merge join (smj)                 | simple implementation: thrust::upper_bound, thrust::lower_bound, unnest, gather

Slide 4

A more complete engine

  • non indexed nested loop join (ninlj)
  • not matching
  • outer join
  • stream union distinct
  • distinct aggregates
  • grouping sets
  • window functions
Slide 5

Running on bigger data

Slide 6

Queries on data which doesn't fit in the GPU memory

Chunk the data: a 1 TB table doesn't fit on the GPU, so stream it through device memory 4 MB at a time, producing a 4 MB result chunk for each input chunk and assembling the full 1 TB result on the host.

[Diagram: 1 TB input ("didn't fit") split into 4 MB chunks fed through the GPU; 4 MB result chunks collected into the 1 TB result]

Slide 7

Queries on data which doesn't fit in the GPU memory

Use an external sorting algorithm. Extend the idea to other operators too, for instance:

  • NINLJ
  • SMJ
  • window functions
  • distinct aggregates (select a, count(distinct b), count(b) from t group by a)
  • outer joins
Slide 8

Queries on data which doesn't fit in the GPU memory

Use external sorting algorithms and a variety of spools.

[Diagram: a 1 TB input sorted 4 MB at a time on the GPU, with the sorted runs written to a GPU/CPU spool and merged by an external sort]

Slide 9

Optimising: some non-GPU specific ideas

  • Use good benchmarking and profiling tools
  • Take advantage of columnar - don't read columns which aren't needed
  • Compressed data – helps with both disk I/O and PCI bus I/O
  • Make sure we use good I/O patterns: big reads and writes, work with the Linux FS cache effectively
  • Use large host memory buffers
  • Use AST optimisations (e.g. push predicates closer to the tablescans)
  • Use a cost based optimizer
Slide 10

Some GPU-specific ideas

  • Use CUDA-specific profiling tools in addition to your regular C++ ones (or at least have a way to enable cudaDeviceSynchronize calls all over your code)

  • GPU task/queue
  • Combine tasks on the GPU
  • Use large chunks
  • Rechunking
  • Reduce or avoid host ↔ device transfers
  • Optimise GPU code
Slide 11

GPU task/queue

1. Allocate all the GPU memory we will need
2. Upload the data to the GPU
3. Run some kernels on this data
4. Download the results to host
5. Release the allocated GPU memory

Slide 12

GPU task/queue

Benefits

  • Can help support concurrency in a single query (to hide PCI transfer latency)
  • Can help support concurrent queries on a single GPU with some basic fairness property (a long-running query won't starve out a short-running query)

  • GPU memory usage is very predictable
  • No inter-task co-ordination needed
  • No chance of deadlock on GPU resources
  • Can be simply extended to distribute tasks over multiple GPUs
Slide 13

Short GPU task concept, concurrency

[Diagram: several host workers submit to a shared GPU task queue, which is drained by per-GPU pools of GPU workers (GPU 1, GPU 2, ...)]

Slide 14

Combine tasks on the GPU where possible

select a, b+c as d from t where b > 5 order by a

Logical plan:
  TableScan a,b,c → Transform d:=b+c → RemoveIf b>5 → Sort by a → Sort Merge

Direct (one host round trip per operator):
  TableScan a,b,c → to device → Transform d:=b+c → to host → to device → RemoveIf b>5 → to host → to device → Sort by a → to host → Sort Merge

Combined (one round trip for the whole pipeline):
  TableScan a,b,c → to device → Transform d:=b+c → RemoveIf b>5 → Sort by a → to host → Sort Merge

Slide 15

Use larger chunks of data

1. transfer 10,000 rows to device
2. transform d := b + c
3. remove_if b > 5
4. sort by a
5. transfer results to host

versus:

1. transfer 10,000,000 rows to device
2. transform d := b + c
3. remove_if b > 5
4. sort by a
5. transfer results to host

Slide 16

Rechunking

Table scans: optimised disk I/O sizes are often much bigger than the GPU can handle.

Chunk size       | IO reads | GPU process
Small            | slow     | good
Big              | fast     | too big to fit
Big, then split  | fast     | good

Chunk size

Slide 17

Rechunking

Output of remove_if, join, reduce/reduce_by_key, etc.: if these produce output chunks with very few rows, collecting those rows back into big chunks can improve the performance of later operations (e.g. sort).

Slide 18

Rechunking

High selectivity: input → remove_if → sort: fast
Low selectivity:  input → remove_if → sort: slow
                  input → remove_if → rechunk → sort: fast

Slide 19

Rechunking

Reducing PCI transfer amounts in NINLJ:

for each chunk in table A:
    for each chunk in table B:
        load A, B to the GPU, join them

Table A size: 10,000,000,000 rows
Table B size: 10,000,000 rows

At a chunk size of 1 million rows you will upload 110 billion rows to the device. If you increase it to 20 million rows, you will upload only 15 billion rows to the device.

110B / 15B: potentially more than 7x faster

Slide 20

Rechunking

Reducing PCI transfer amounts in NINLJ

[Diagram: with small chunks the table splits into 9 chunks, looped and loaded 9 times; with bigger chunks it splits into 3 chunks, looped and loaded only 3 times]

Slide 21

Reduce / avoid HOST ↔ DEVICE transfers

  • NINLJ: keep the small table on the GPU for the whole join
  • Chunk skipping: data is inserted into the database ordered by a record timestamp, and the storage layer preserves this order (or inserts the data in sorted order). A typical use case is 24 months of data in the table, but queries that only want to summarize a particular day, week or month.

Slide 22

Optimising Kernels

Three example bottlenecks from SQream:

  • reduce_by_key
  • multikey sort
  • internals of the join
Slide 23

Strings are very difficult to deal with

Big-data specific issues:

  • compression
  • can be large per record

Non big-data specific GPU issues:

  • variable-length data
  • collations
  • Unicode: ICU doesn't run on GPUs

Good solutions: we're still looking

Slide 24

Find out more about SQream’s high performance GPU-driven database software

sqream.com