What's New in Alibaba's X-DB SQL Engine | Min Qiu, Alibaba Group



SLIDE 1

Min Qiu, Alibaba Group Santa Clara, California | April 23rd – 25th, 2018

What’s New in Alibaba’s X-DB SQL Engine

SLIDE 2

Agenda

  • Introduction to X-DB
  • Features in X-DB SQL Engine
  • Query Plan Cache
  • Remote Execution
  • Distributed SQL Processing
  • Future Work
SLIDE 3

Introduction to X-DB

SLIDE 4

What’s X-DB

  • Alibaba's home-grown distributed database
  • MySQL-compatible syntax/interfaces/protocols
  • High performance at low cost
  • Horizontal scalability with high availability and fault tolerance
  • Strong data consistency guarantee
  • Globally distributed
  • SW/HW co-designed
SLIDE 5

X-DB Architecture

[Architecture diagram: three availability zones (AZ1, AZ2, AZ3). Apps connect through XDriver to an X-Server in each AZ; each X-Server contains the SQL/KV Engine (X-Engine), a Coordinator, the Partition Service, GMS, and IO layers, connected by X-Paxos and X-RPC over a Distributed FS.]

  • Key Components
  • HA (X-Paxos)
  • Global Meta Management (GMS)
  • Storage Engine (X-Engine)
  • Distributed SQL Engine
  • Features
  • Self-contained
  • Decoupling storage and compute, allowing separate scaling
  • Multiple replicas, allowing reads from multiple replicas

SLIDE 6

Features in X-DB SQL Engine

SLIDE 7

Features in X-DB SQL Engine

  • Window Function
  • Global Sequence
  • Query Plan Cache
  • Distributed Query Processing
SLIDE 8

Query Plan Cache

What and Why?

  • The plan is cached to skip compilation effort in subsequent runs
  • Parameter binding is probably still necessary
  • Good for short-running queries
  • Good for query plans that are insensitive to bind parameters

SLIDE 9

Query Plan Cache

How?

  • Options
  • Full-text match (case sensitive)

select * from t where c = 1  !=  select * from t where c = 2

  • Parameterized SQL template, i.e. Prepared Statement (PS)

select * from t where c = ?

  • Our solution: an extension of the existing PS solution
  • Queries from Alibaba's online systems fall into a limited number of patterns
  • No extra work is required to parameterize the SQL template
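As a toy illustration (not X-DB's actual code), keying the cache on a parameterized template lets `c = 1` and `c = 2` share one cached plan, where full-text matching would store two. The `parameterize` helper and `PlanCache` class here are hypothetical:

```python
import re

def parameterize(sql: str) -> str:
    """Replace string and numeric literals with '?' placeholders."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return sql

class PlanCache:
    """Toy plan cache keyed on the parameterized SQL template."""
    def __init__(self):
        self.plans = {}

    def get_plan(self, sql: str) -> str:
        key = parameterize(sql)
        if key not in self.plans:
            self.plans[key] = f"PLAN<{key}>"  # stand-in for a compiled plan
        return self.plans[key]

cache = PlanCache()
p1 = cache.get_plan("select * from t where c = 1")
p2 = cache.get_plan("select * from t where c = 2")
# Full-text matching would create two entries; the template key shares one.
```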
SLIDE 10

Query Plan Cache

  • Problems with MySQL PS
  • PS objects are saved per client connection, posing an OOM risk
  • No cache invalidation mechanism
  • PS only saves parsing time; optimization is still needed
  • Our Enhancement
  • Cache PS and plan-related objects in worker threads
  • JOIN
  • best_ref
  • QEP_TAB
  • Cache invalidation when the capacity limit is hit
SLIDE 11

Query Plan Cache

Implementation

  • Generate the query plan with the first run's parameters
  • Good for cases where the plan is insensitive to input parameters
  • Performance regression is possible if the cached plan is suboptimal for specific parameters
  • Parameter substitution
  • New parameter values must be mapped to the correct locations in the relevant data structures
  • Decouple execution from optimization
  • Cache Management
  • A new system variable plan_cache_size controls the cache size on each worker thread
  • Use LRU to evict entries when the memory limit is hit
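The LRU policy above can be sketched as follows; the `LruPlanCache` class is hypothetical, and it caps entry count rather than memory for brevity:

```python
from collections import OrderedDict

class LruPlanCache:
    """Per-worker-thread plan cache with LRU eviction, capped by a
    plan_cache_size-style limit (entry count here; X-DB's limit is memory-based)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, template: str):
        plan = self.entries.get(template)
        if plan is not None:
            self.entries.move_to_end(template)   # mark as most recently used
        return plan

    def put(self, template: str, plan) -> None:
        self.entries[template] = plan
        self.entries.move_to_end(template)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict least recently used

cache = LruPlanCache(capacity=2)
cache.put("select * from t where c = ?", "plan-a")
cache.put("select * from u where d = ?", "plan-b")
cache.get("select * from t where c = ?")             # touch plan-a
cache.put("select * from v where e = ?", "plan-c")   # evicts plan-b
```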
SLIDE 12

Query Plan Cache

Something interesting

  • Disable the optimization against const tables
  • Prevent an "Impossible Plan" from being generated at the first run
  • Disallow the JOIN_TYPE from being set to ref for a between predicate

create table t (c1 INT primary key, c2 INT);
insert into t values (1, 1);
prepare stmt from 'select * from t where c1 = ?';
set @a=NULL; execute stmt using @a;
set @a=1;    execute stmt using @a;

create table t (c1 INT primary key, c2 INT);
insert into t values (1, 1), (2, 2);
prepare stmt from 'select * from t where c1 between ? and ?';
set @a=1; set @b=1; execute stmt using @a, @b;
set @a=1; set @b=2; execute stmt using @a, @b;

SLIDE 13

Query Plan Cache

Plan Cache Invalidation

  • Cache the schema version (i.e. V1) in the PS when generating the query plan
  • The schema version changes (i.e. to V2) as DDL is applied
  • Raise a flag when a different schema version is detected at runtime
  • Automatic re-prepare when invalidated, transparent to the user
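The invalidation flow can be sketched like this (all class and method names are hypothetical; X-DB performs the equivalent check inside the server at runtime):

```python
class CachedPlan:
    def __init__(self, plan: str, schema_version: int):
        self.plan = plan
        self.schema_version = schema_version

class Session:
    """Toy model of transparent re-prepare on schema change."""
    def __init__(self, catalog: dict):
        self.catalog = catalog        # maps table -> current schema version
        self.cache = {}

    def prepare(self, table: str) -> CachedPlan:
        plan = CachedPlan(f"plan({table})", self.catalog[table])
        self.cache[table] = plan
        return plan

    def execute(self, table: str) -> str:
        plan = self.cache.get(table) or self.prepare(table)
        if plan.schema_version != self.catalog[table]:
            plan = self.prepare(table)   # invalidated: re-prepare transparently
        return plan.plan

catalog = {"t": 1}
s = Session(catalog)
s.execute("t")        # compiles with schema version 1
catalog["t"] = 2      # DDL bumps the schema version
s.execute("t")        # stale version detected, plan rebuilt transparently
```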

SLIDE 14

Performance Evaluation

Configuration

  • Sysbench
  • oltp_read_only
  • select_random_points
  • 100 Tables
  • 100K records per table
  • 500 Client connections
SLIDE 15

Distributed Query Processing


SLIDE 16

Remote Execution

Architecture

  • SQL Request Routing/Forwarding
  • Handle incorrect SQL routing
  • Partition Location Identification
  • Pass results back from the remote node
  • External dependencies
  • RPC service
  • GMS/LMS
SLIDE 17

Remote Execution

Execution Scheduling

  • Promise/Future async paradigm
  • CONTEXT
  • Client communication protocol
  • THD context
  • Privilege check skipped on node B
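A minimal sketch of the Promise/Future paradigm, using Python's `concurrent.futures` as a stand-in for X-RPC; the `remote_execute` function is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def remote_execute(node: str, sql: str) -> list:
    """Stand-in for an X-RPC call that executes SQL on a remote node."""
    return [f"{node}:{sql}:row0"]

# Promise/Future style: the coordinator submits the request and stays free
# to do other work until the result is actually consumed.
executor = ThreadPoolExecutor(max_workers=4)
future = executor.submit(remote_execute, "nodeB", "select * from t")
# ... coordinator-side work could proceed here ...
rows = future.result()   # block only when the result is needed
executor.shutdown()
```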
SLIDE 18

Remote Execution

Exception handling

  • Only one forwarding hop is allowed
  • Unsupported SQL requests are prevented from execution
  • Queries that touch data across multiple nodes
  • Transactions across multiple nodes
  • Error message/code is relayed back to the node that initiated the remote execution
  • Perform a refresh operation if the error is caused by an out-of-date location cache
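The refresh-on-stale-cache behavior might look like this sketch (class and function names are hypothetical, with plain dicts standing in for the GMS/LMS roles):

```python
class StaleLocationError(Exception):
    """Raised when the executing node no longer holds the target partition."""

class LocationCache:
    def __init__(self, gms: dict):
        self.gms = gms               # authoritative partition -> node map (GMS role)
        self.local = dict(gms)       # possibly stale local copy (LMS role)

    def lookup(self, partition: str) -> str:
        return self.local[partition]

    def refresh(self) -> None:
        self.local = dict(self.gms)  # force re-sync from GMS

def execute(partition, cache, run_on):
    """Forward once; on a stale-cache error, refresh and retry exactly once."""
    try:
        return run_on(cache.lookup(partition))
    except StaleLocationError:
        cache.refresh()
        return run_on(cache.lookup(partition))

gms = {"t1p1": "node1"}
cache = LocationCache(gms)
gms["t1p1"] = "node2"                # the partition moved; the local copy is stale

def run_on(node: str) -> str:
    if node != gms["t1p1"]:
        raise StaleLocationError(node)
    return f"ok@{node}"
```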
SLIDE 19

Distributed Execution

Architecture

  • Query Coordinator
  • Accepts SQL requests
  • Generates the distributed query plan
  • Splits the execution into pipelines (stages)
  • Schedules stage execution
  • Query Worker
  • Executes a pipeline with the given control information
  • Redistributes data to the next stage
  • Scheduler
  • Tracks pipeline dependency relations
SLIDE 20

Distributed Execution

Query Plan on Coordinator

  • Generate a single-host query plan first
  • Identify the node under which an Exchange is inserted, if necessary
  • Join
  • Sort
  • Group By
  • Record the location where the Exchange node is inserted (index into the QEP_TAB array)
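Recording Exchange insertion points against a QEP_TAB-like array could be sketched as below; the operator names and the trigger set are illustrative, not MySQL's actual representation:

```python
# Toy single-host plan as an ordered, QEP_TAB-like array of operator names.
EXCHANGE_TRIGGERS = {"JOIN", "SORT", "GROUP_BY"}

def mark_exchange_points(qep_tabs: list) -> list:
    """Return the indexes at which an Exchange node would be inserted,
    i.e. under operators that may need data from other nodes."""
    return [i for i, op in enumerate(qep_tabs) if op in EXCHANGE_TRIGGERS]

plan = ["TABLE_SCAN t1", "JOIN", "GROUP_BY", "PROJECT"]
points = mark_exchange_points(plan)   # only the indexes are recorded
```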

SLIDE 21

Distributed Execution

Query Plan on Coordinator

  • An mm_tree is built against the query condition for partition pruning
  • e.g. only t1p1, t1p3, t2p2, t2p4 are left after partition pruning
  • The partition location cache at the LMS tells which hosts hold those partitions
  • The cache might be out of sync, which is detected at execution time
  • The LMS is forced to refresh from the GMS when an out-of-sync entry is found
  • Table access operations are dispatched to the corresponding hosts by RPC
SLIDE 22

Distributed Execution

Modification on optimizer

  • Disable "const table" optimization when necessary
  • Perform the "lock table" operation as late as possible
  • Currently tables are locked between prepare and optimize
  • Avoid diving into the storage engine during optimization
  • Row number estimates
  • Records-in-range estimates
  • Use statistics stored in the global catalog for cost estimation
SLIDE 23

Distributed Execution

Terminology

  • Pipeline
  • A segment (set of operations) of the query plan tree
  • Receives input from the upstream pipeline and generates output for the downstream pipeline
  • The unit of scheduling and execution
  • Stage
  • The process in which a pipeline execution is performed
  • Task
  • A subset of a stage, which deals with one partition of the data
  • A stage might consist of multiple tasks
SLIDE 24

Distributed Execution

Pipeline Tree Generation

  • The Exchange node is the pipeline boundary
  • A pair of In/Out nodes corresponds to each Exchange
  • Exchange Out node at the producer side
  • Exchange In node at the consumer side
  • Pipeline dependency relationships are represented by the edges of the tree
  • The pipeline tree is the input to the scheduler
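A sketch of cutting a plan tree into a pipeline tree at Exchange boundaries; the `PlanNode` class and the list-based representation are hypothetical simplifications of X-DB's real structures:

```python
class PlanNode:
    def __init__(self, name: str, children=()):
        self.name = name
        self.children = list(children)

def build_pipeline_tree(root: PlanNode):
    """Split a plan tree at Exchange nodes. Returns (pipelines, edges):
    pipelines[i] lists the operator names in pipeline i, and each edge
    (i, j) says pipeline i produces data that pipeline j consumes."""
    pipelines, edges = [], []

    def walk(node: PlanNode, pid: int) -> None:
        if node.name == "Exchange":
            child_pid = len(pipelines)
            pipelines.append([])
            edges.append((child_pid, pid))   # producer feeds consumer
            for c in node.children:
                walk(c, child_pid)           # producer side: a fresh pipeline
        else:
            pipelines[pid].append(node.name)
            for c in node.children:
                walk(c, pid)

    pipelines.append([])
    walk(root, 0)
    return pipelines, edges

# A join whose right input arrives through an Exchange (shuffle/broadcast):
plan = PlanNode("Join", [
    PlanNode("Scan t1"),
    PlanNode("Exchange", [PlanNode("Scan t2")]),
])
pipelines, edges = build_pipeline_tree(plan)
```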
SLIDE 25

Distributed Execution

Query Plan on Worker

  • The plan on a worker must be the same as that on the coordinator
  • The pipeline generated on the coordinator must match the same segment on the worker
  • The full SQL statement is compiled only once on a worker
  • The plan is cached on the worker with a globally unique job ID as its label

SLIDE 26

Distributed Execution

Same environment on Coordinator and Workers

  • The environment on the coordinator at plan-generation time must be restored on the workers
  • Several categories of environment/context information:
  • All system variables used by the optimizer
  • All session variables used by the optimizer
  • Statistics used by the optimizer
  • table::file::stats.records
  • Constant values in cost_model objects
SLIDE 27

Distributed Execution

Two Level Scheduler

  • Stage-level scheduling
  • Satisfies the dependency restrictions of the pipeline tree
  • Bottom-up
  • JOIN::exec() is modified to be reentrant
  • Allows executing a segment of the QEP_TAB array with given start and end indexes
  • Join_buffer/Temp_table must be ready before the segment starts to run
  • Task-level scheduling
  • A task is divided into a set of sub-tasks
  • Sub-tasks are scheduled to run in parallel
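The bottom-up stage scheduling can be sketched as a dependency-counting loop over the pipeline tree (all names hypothetical; X-DB's scheduler naturally tracks more state than this):

```python
def schedule_stages(num_pipelines: int, edges):
    """Bottom-up stage scheduling: a pipeline runs only after every pipeline
    it consumes from has finished. edges: (producer, consumer) pairs."""
    consumers = {i: [] for i in range(num_pipelines)}
    pending = {i: 0 for i in range(num_pipelines)}   # unfinished producers
    for producer, consumer in edges:
        consumers[producer].append(consumer)
        pending[consumer] += 1

    order = []
    ready = [i for i in range(num_pipelines) if pending[i] == 0]
    while ready:
        stage = ready.pop(0)        # leaves (bottom of the tree) run first
        order.append(stage)
        for c in consumers[stage]:
            pending[c] -= 1
            if pending[c] == 0:     # all inputs ready: stage can be scheduled
                ready.append(c)
    return order

# Pipeline 0 consumes from pipelines 1 and 2 (two Exchange inputs):
order = schedule_stages(3, [(1, 0), (2, 0)])
```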
SLIDE 28

Distributed Execution

Examples

  • Without Data Exchange
  • Simple queries without group by/order by/subquery
  • Partition key is a prefix of the group by list
  • Partition key is a prefix of the order by list
  • Partition-aware Join

[Diagram: Stage 0 tasks are pushed down to each partition, a partition-aware join runs per task, and a Merge step combines the task results.]

SLIDE 29

Distributed Execution

Examples

  • With Data Exchange
  • Broadcast
  • T1 has 3 partitions on 3 nodes
  • T2 has 1 partition on 1 node
  • T2 is small

[Diagram: T2 is broadcast to each of the three nodes holding T1's partitions.]

SLIDE 30

Distributed Execution

Examples

  • With Data Exchange
  • Shuffle
  • T1 has 4 partitions on 2 nodes
  • T2 has 2 partitions on 2 nodes

[Diagram: each partition is redistributed into sub-partitions (T1P1_1 through T1P4_2, T2P1_1 through T2P2_2) so that matching pieces land on the same node.]

SLIDE 31

Future Work

SLIDE 32

Future Work

  • Distributed Processing Enhancement
  • Support distributed query processing requiring data redistribution
  • Advanced Scheduling Options
  • Parallel scheduling for independent stages
  • Location/Resource based scheduling
  • Load Balancer
  • Admission Control
  • Enhancements to the optimizer
  • Data Redistribution selection
  • Automatic Parallelism selection
  • Enhancements to the executor
  • Hash Join
  • Sort Merge Join
SLIDE 33

About me

Min Qiu

  • Staff Engineer at Alibaba Database Kernel Team
  • Tech Lead of SQL Engine Team
  • Contact: min.qiu@alibaba-inc.com
  • LinkedIn: https://www.linkedin.com/in/min-qiu-088b5516/
SLIDE 34

Thank You!