Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny - PowerPoint PPT Presentation

Parallel Query Execution in POLARDB for MySQL Øystein Grøvlen Benny Wang Alibaba Cloud

Agenda What is Parallel Query? • Parallel Query Design • How to use Parallel Query • Parallel Query Performance • Future Work •

What is Parallel Query? Parallel Query is an innovative method to accelerate MySQL queries from Alibaba Cloud. • Traditionally, 1 MySQL query runs with just 1 thread, and can not take advantage of multiple cores on modern processors. • Parallel Query takes advantage of modern processors to distribute work across many or all available cores: • 8 parallel threads can be up to 8 times faster • 32 parallel threads can be up to 32 times faster

Why Parallel Query? • 2003: CPUs stopped getting faster • 2004-2019 focus on more cores, sockets. • PQ lets MySQL take advantage of last 15 years of progress.

How to Use Parallel Query Parallel Query runs against your existing InnoDB data. No data extraction to another system is required. No query modifications are required. Parallel Query within InnoDB (no extraction needed) is an amazing feature exclusive to Alibaba Cloud

Query with Parallelism SELECT count(*) FROM production.product; Serial execution plan: SQL Thread 1: Scan, Count Client 1 active thread 63 idle threads .

Parallel Execution Plan Thread 1: Scan, Count Thread 2: Scan, Count With 64 parallel Thread 3: Scan, Count threads, each thread SQL Thread 4: Scan, Count Sum Client does < 2% of the work. Thread 5: Scan, Count Thread 6: Scan, Count Thread 7: Scan, Count . . . Thread 64: Scan, Count

Parallel Query Architecture " ! ⑅ LEADER … ⑅ T T Gather … … MESSAGE QUEUE " " " " " ! ! ! ! ! … … ⑅ WORKERs ⑅ ⑅ ⑅ ⑅ … ⑅ … … … … ⑅ ⑅ ⑅ ⑅ T T T T T T T T T T T T T T T … … … … … PARALLEL SCAN

Partitioning 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

Partitioning InnoDB partitions the B-tree 2 partitions 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Partition 1 Partition 2

Partitioning Workers see only one partition (at a time) 2 partitions 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Partition 1 Partition 2

Partitioning 2 nd level of B-tree may be used to split into many partitions 6 partitions 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Part. 5 Part. 4 Part. 6 Part. 1 Part. 3 Part. 2

Partitioning • Server will normally request 100 partitions per worker thread • “Fast” workers may process more partitions than “slow” workers • Partitions of more equal size • When finished with one partition, a worker may be automatically attached to a new partition.

Parallel Query Sort Parallel threads run local sort SELECT col1, col2, col3 FROM t1 ORDER BY 1,2; Thread 1: Scan, Sort Thread 2: Scan, Sort Thread 3: Scan, Sort Merge SQL Thread 4: Scan, Sort Sort Client Thread 5: Scan, Sort Thread 6: Scan, Sort 1. Parallel data access (table scan or index) Thread 7: Scan, Sort 2. Parallel order by of the data handled by each worker . . . Thread 64: Scan, Sort 3. Final merge sort of the results and return to client.

Parallel Query GROUP BY Parallel threads run local group SELECT col1, col2, SUM (col3) FROM t1 GROUP BY 1,2; Thread 1: Scan, Group Thread 2: Scan, Group Thread 3: Scan, Group Merge SQL Thread 4: Scan, Group Groups Client Thread 5: Scan, Group 1. Parallel data access (table scan or index) Thread 6: Scan, Group Thread 7: Scan, Group 2. Parallel group by of the data handled by each worker . . . 3. Final merge of the local group by and return results Thread 64: Scan, Group DISTINCT operation will be similar to GROUP BY.

Parallel Query Nested-Loops JOIN Parallel scan and join SELECT * FROM t1 JOIN t3 ON t1.id = t3. id ; Thread 1: Scan, Join Thread 2: Scan, Join Thread 3: Scan, Join SQL Thread 4: Scan, Join Merge Client Thread 5: Scan, Join 1. Parallel data access (table scan or index) of driving table Thread 6: Scan, Join 2. Parallel join of the local data handled by each worker Thread 7: Scan, Join . . . 3. Final merge of the and return to client Thread 64: Scan, Join

Parallel Query Usage System variables to control parallel query To enable parallel execution for a session: set max_parallel_degree = n Maximum n worker threads will be used. MySQL may still decide to not use parallelization. If so, parallel execution may be forced with set force_parallel_mode = on

Parallel Query Usage Hints to control parallel query To force parallel query execution for a single query: SELECT /*+ PARALLEL () */ * FROM ... To force the use of a specific number of worker threads, n : SELECT /*+ PARALLEL (n) */ * FROM ... More hints are on the way.

Parallel Query Resource control Control number of parallel workers To control worker threads for parallel query: set max_parallel_workers = n Only maximum n worker threads will be allowed in total. MySQL will decide to whether to use parallel query and how many workers for current query. total number_of_workers <= max_parallel_workers; actual_workers_number <= max_data_partitions; actual_workers_number <= max_parallel_degree;

Parallel Query Resource control Monitor the running status of parallel query mysql> select * from performance_schema.events_parallel_query_current\G *************************** 2. row *************************** *************************** 1. row *************************** THREAD_ID: 95 THREAD_ID: 94 PARENT_THREAD_ID: 94 PARENT_THREAD_ID: 0 PARALLEL_TYPE: WORKER PARALLEL_TYPE: GATHER EVENT_ID: 2 EVENT_ID: 11 END_EVENT_ID: NULL END_EVENT_ID: NULL EVENT_NAME: parallel_query EVENT_NAME: parallel_query STATE: COMPLETED STATE: COMPLETED PLANNED_DOP: 0 PLANNED_DOP: 16 ACTUAL_DOP: 0 ACTUAL_DOP: 16 NUMBER_OF_PARTITIONS: 0 NUMBER_OF_PARTITIONS: 36 PARTITIONED_OBJECT: PARTITIONED_OBJECT: t1 ROWS_SCANED: 718 ROWS_SCANED: 10189 ROWS_SENT: 8 ROWS_SENT: 77 ROWS_SORTED: 0 ROWS_SORTED: 0 EXECUTION_TIME: 423644016 EXECUTION_TIME: 435373818 NESTING_EVENT_ID: 1 NESTING_EVENT_ID: 9 NESTING_EVENT_TYPE: STATEMENT NESTING_EVENT_TYPE: STATEMENT **************************** 3. row***************************

Parallel Query Resource control Constrain total memory To control maximum memory consumed by parallel query: set query_memory_hard_limit = n set query_memory_soft_limit = n If the total memory usage is above query_memory_hard_limit , the parallel query will be aborted and an error will be reported. If the total memory usage is above query_memory_soft_limit , parallel query execution will not be chosen. The following memory buffers we considered mainly • Internal temporary table • Sort buffer • Join cache

Parallel Query Resource control Constrain total memory - example mysql> set global query_memory_soft_limit = 99 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> set join_buffer_size = 100 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> select x.b from t1 x, t1 y where x.a = y.a; … mysql> show warnings; +---------+-------+----------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +---------+-------+----------------------------------------------------------------------------------------------------------------+ | Warning | 13250 | Could not choose parallel plan due to total query memory temporarily overflow, consider increasing @@query_memory_soft_limit value. | +---------+-------+----------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) n set, 1 warning (0.00 sec)

Parallel Query Resource control Control total memory - example mysql> set global query_memory_hard_limit = 99 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> set join_buffer_size=100 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> select x.b from t1 x, t1 y where x.a = y.a; ERROR 13249 (HY000): Failed to allocate memory for query due to overflow, consider increasing @@query_memory_hard_limit value.

Parallel Query Resource control Monitor memory status SHOW GLOBAL STATUS WHERE variable_name = ‘total_running_parallel_workers’; SHOW GLOBAL STATUS WHERE variable_name = ‘total_used_query_memory';

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny - PowerPoint PPT Presentation

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny Wang Alibaba Cloud Agenda What is Parallel Query? Parallel Query Design How to use Parallel Query Parallel Query Performance Future Work Agenda

Performance Guide for MySQL Cluster Mikael Ronstrm, Ph.D Senior MySQL Architect Sun

MySQL Replication Update MySQL 5.5 (GA) & MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

MySQL Proxy meets: binlogs Jan Kneschke MySQL Enterprise Tools mailto: jan@mysql.com What is

MySQL Proxy Making MySQL more flexible Jan Kneschke jan@mysql.com MySQL Proxy proxy-servers

MySQL Group Replication & MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

PolarDB Cloud Native DB @ Alibaba Lixun Peng Inaam Rana Alibaba Cloud Team Agenda

MySQL Cluster und MySQL Proxy Alles Online Diese Slides gibt es auch unter:

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager My

PHP and MySQL Dr. E. Benoist Winter Term 2006-2007 PHP and MySQL 1 PHP and MySQL Introduction

More on gdb for MySQL DBAs or Using gdb to study MySQL internals and as a last resort Valerii

POLARDB for MyRocks Extending shared storage to MyRocks Zhang, Yuan Alibaba Cloud Apr, 2018

Percona MySQL About Me Qunar.com DB Director

Forecasting MySQL Scalability Baron Schwartz O'Reilly MySQL Conference & Expo 2011

PHP + MySQL MySQL on the command line is great and all well not its not really that great

Repetition Structures Joan Boone jpboone@email.unc.edu Summer 2020 Slide 1 Topics Part 1

301AA - Advanced Programming Lecturer: Andrea Corradini andrea@di.unipi.it

The Future of Software Privacy concerns commerce, security, cryptography, www.digicash.com

OptiML: An Implicitly Parallel Domain-Specific Language for ML Arvind K. Sujeeth, HyoukJoong Lee,

T OP INCOMES IN G ERMANY , 1871-2013 Charlotte Bartels (DIW/SOEP) First WID.World conference

Module 5.2 Thread Execusion Efficiency Performance Impact of Control Divergence Objective

Control Unit (single cycle implementation) Control unit sends control signals to data path

Chapter 4 MARIE: An Introduction to a Simple Computer 2 4.8 MARIE 4.13 A Discussion on

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny - PowerPoint PPT Presentation

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny Wang Alibaba Cloud Agenda What is Parallel Query? Parallel Query Design How to use Parallel Query Parallel Query Performance Future Work Agenda

Performance Guide for MySQL Cluster Mikael Ronstrm, Ph.D Senior MySQL Architect Sun

MySQL Replication Update MySQL 5.5 (GA) &amp; MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

MySQL Proxy meets: binlogs Jan Kneschke MySQL Enterprise Tools mailto: jan@mysql.com What is

MySQL Proxy Making MySQL more flexible Jan Kneschke jan@mysql.com MySQL Proxy proxy-servers

MySQL Group Replication &amp; MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

PolarDB Cloud Native DB @ Alibaba Lixun Peng Inaam Rana Alibaba Cloud Team Agenda

MySQL Cluster und MySQL Proxy Alles Online Diese Slides gibt es auch unter:

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Reducing Risk When Upgrading Your MySQL Environment Kenny Gryp MySQL Practice Manager My

PHP and MySQL Dr. E. Benoist Winter Term 2006-2007 PHP and MySQL 1 PHP and MySQL Introduction

More on gdb for MySQL DBAs or Using gdb to study MySQL internals and as a last resort Valerii

POLARDB for MyRocks Extending shared storage to MyRocks Zhang, Yuan Alibaba Cloud Apr, 2018

Percona MySQL About Me Qunar.com DB Director

Forecasting MySQL Scalability Baron Schwartz O'Reilly MySQL Conference &amp; Expo 2011

PHP + MySQL MySQL on the command line is great and all well not its not really that great

Repetition Structures Joan Boone jpboone@email.unc.edu Summer 2020 Slide 1 Topics Part 1

301AA - Advanced Programming Lecturer: Andrea Corradini andrea@di.unipi.it

The Future of Software Privacy concerns commerce, security, cryptography, www.digicash.com

OptiML: An Implicitly Parallel Domain-Specific Language for ML Arvind K. Sujeeth, HyoukJoong Lee,

T OP INCOMES IN G ERMANY , 1871-2013 Charlotte Bartels (DIW/SOEP) First WID.World conference

Module 5.2 Thread Execusion Efficiency Performance Impact of Control Divergence Objective

Control Unit (single cycle implementation) Control unit sends control signals to data path

Chapter 4 MARIE: An Introduction to a Simple Computer 2 4.8 MARIE 4.13 A Discussion on

MySQL Replication Update MySQL 5.5 (GA) & MySQL 5.6.2 (Dev. Milestone) Lars Thalmann

MySQL Group Replication & MySQL InnoDB Cluster Production Ready? Kenny Gryp MySQL Practice

Forecasting MySQL Scalability Baron Schwartz O'Reilly MySQL Conference & Expo 2011