Scaling the Relational Database for the Cloud Age Sumedh Pathak, - - PowerPoint PPT Presentation

scaling the relational database for the cloud age
SMART_READER_LITE
LIVE PREVIEW

Scaling the Relational Database for the Cloud Age Sumedh Pathak, - - PowerPoint PPT Presentation

Scaling the Relational Database for the Cloud Age Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018 @moss_toss | @citusdata About Me Co-Founder & VP Engineering at Citus Data Amazon Shopping


slide-1
SLIDE 1

Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018

Scaling the Relational Database for the Cloud Age

@moss_toss | @citusdata

slide-2
SLIDE 2

About Me

  • Co-Founder & VP Engineering at Citus Data
  • Amazon Shopping Cart


(former)

  • Amazon Supply Chain & Order Fulfillment (former)
  • Stanford Computer Science
Sumedh Pathak | Citus Data | DataEngConf 2018
slide-3
SLIDE 3
slide-4
SLIDE 4

Why RDBMS?

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-5
SLIDE 5

Streaming Storage Map Reduce NoSQL SQL Database

SELECT ...

Application Application

Sumedh Pathak | Citus Data | DataEngConf 2018

Because your architecture could be simpler

slide-6
SLIDE 6

An RDBMS is a general-purpose
 data platform

Sumedh Pathak | Citus Data | DataEngConf 2018

Fast writes Real-time & bulk High throughput High concurrency Data consistency Query optimizers

slide-7
SLIDE 7

PostgreSQL

MySQL MongoDB SQL Server + Oracle

Source: Hacker News, https://news.ycombinator.com


Startups Are Choosing Postgres

% database job posts mentioning each database, across 20K+ job posts

slide-8
SLIDE 8

Data Trends in the “Cloud Age”

  • Explosion of Data
  • Higher Volume
  • Higher Velocity

** SCALE **

slide-9
SLIDE 9

but RDBMS’s 
 don’t scale!

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-10
SLIDE 10

but RDBMS’s don’t scale RDBMS’s are hard to scale

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-11
SLIDE 11

What exactly needs to Scale?

  • Tables (Data)
  • Partitioning, Co-location, Reference Tables
  • SQL (Reads)
  • How do we express and optimize distributed SQL
  • Transactions (Writes)
  • Cross Shard updates/deletes, Global Atomic Transactions
Sumedh Pathak | Citus Data | DataEngConf 2018

1 2 3

slide-12
SLIDE 12

Scaling Tables

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-13
SLIDE 13

Data Partitioning

  • Pick a column
  • Date
  • Id (customer_id, cart_id)
  • Pick a method
  • Hash
  • Range
slide-14
SLIDE 14

Partition data across nodes

R1 R2 R3 R4 R5 R6 R7 Coordinator Node

Worker Nodes

Shards

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-15
SLIDE 15

Worker → RDBMS, Shard → Table

slide-16
SLIDE 16

Reference Tables

N1 N1 N1 N1 Copies of same table

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator Node

Worker Nodes

slide-17
SLIDE 17

Co-Location

R1 R2 R3 R4 S1 S2 S3 S4 Explicit Co-Location API.
 E.g. Partition by Tenant

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator Node

Worker Nodes

slide-18
SLIDE 18

What about Foreign Keys?

slide-19
SLIDE 19

The key to scaling tables...

  • Use relational databases as a building block
  • Understand semantics of application—to be smart about

partitioning

  • Multi-tenant applications
slide-20
SLIDE 20

Scaling SQL

slide-21
SLIDE 21

FROM table R SELECT x Projectx(R) R’ WHERE f(x) Filterf(x)(R) R’ … JOIN … R × S R’

SQL ↔ Relational Algebra

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-22
SLIDE 22

FROM sharded_table Collect(R1,R2,...) R

Distributed Relational Algebra

R1 R2 C

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-23
SLIDE 23

Commutativity

A + B = B + A

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-24
SLIDE 24

Projectx(Collect(R1,R2,...)) = Collect(Projectx(R1), Projectx(R2)...)

Commutative property

R1 R2 C R1 R2 C Px Px Px

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-25
SLIDE 25

Distributivity

A*(B + C) = A*B + A*C

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-26
SLIDE 26

Collect(R1,R2,...) x Collect(S1,S2,...) = Collect(R1× S1,R2× S2,...)

Distributive property

R1 R2 C × C S1 S2 R1 R2 C × S1 S2 ×

X = Join Operator

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-27
SLIDE 27

Associativity

A + B + C = (A + B) + C = A + (B + C)

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-28
SLIDE 28

SUM(x)(Collect(R1,R2,...)) = SUM(Collect(SUM(R1), SUM(R2)...))

Associative property

R1 R2 C R1 R2 C

Sumx Sumx Sumx

Sumedh Pathak | Citus Data | DataEngConf 2018

Sumx

slide-29
SLIDE 29

SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND

  • rders.date >= '2012-01-01'

AND nation.region = 'Asia';

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-30
SLIDE 30 Sumedh Pathak | Citus Data | DataEngConf 2018
slide-31
SLIDE 31 Sumedh Pathak | Citus Data | DataEngConf 2018

Volcano style processing

Data flows from
 bottom
 to top

slide-32
SLIDE 32 Sumedh Pathak | Citus Data | DataEngConf 2018
slide-33
SLIDE 33 Sumedh Pathak | Citus Data | DataEngConf 2018

Parallelize Aggregate Push Joins & Filters below collect. Run in parallel across all nodes Filters & Projections done before Join

slide-34
SLIDE 34 Sumedh Pathak | Citus Data | DataEngConf 2018
slide-35
SLIDE 35

SELECT sum(intermediate_col) 
 FROM <concatenated results>; SELECT sum(price) FROM orders_2 JOIN nation_2
 ON (orders_2.name = nation_2.name) WHERE

  • rders_2.date >= '2017-01-01'

AND nation_2.region = 'Asia'; SELECT sum(price) FROM orders_2 JOIN nation_1
 ON (orders_2.name = nation_1.name) WHERE

  • rders_2.date >= '2017-01-01'

AND nation_2.region = 'Asia';

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-36
SLIDE 36

Executing Distributed SQL

SQL database

  • rders_1

nation_1

  • rders

nation

SELECT sum(price) FROM orders_2 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM orders_1 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM <results>;

SQL database

  • rders_2

nation_1

slide-37
SLIDE 37

The key to scaling SQL...

  • New relational algebra operators for

distributed processing

  • Relational Algebra Properties to optimize

tree: Commutativity, Associativity, & Distributivity

  • Map / Reduce operators
Sumedh Pathak | Citus Data | DataEngConf 2018
slide-38
SLIDE 38

Scaling Transactions

slide-39
SLIDE 39

Money Transfer, as an example

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-40
SLIDE 40

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

Coordinator

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-41
SLIDE 41

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’;

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator

slide-42
SLIDE 42

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT;

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator

slide-43
SLIDE 43

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

Coordinator

BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’;

Sumedh Pathak | Citus Data | DataEngConf 2018

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’;

slide-44
SLIDE 44

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT

Sumedh Pathak | Citus Data | DataEngConf 2018

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’;

Coordinator

slide-45
SLIDE 45

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator

slide-46
SLIDE 46

What happens during PREPARE?

State of transaction stored

  • n a durable store

Locks are
 maintained

&

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-47
SLIDE 47

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4 Coordinator

PREPARE TRANSACTION ‘citus_...98’; ROLLBACK TRANSACTION ‘citus_... 98’;

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-48
SLIDE 48

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator

slide-49
SLIDE 49

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’;

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator

slide-50
SLIDE 50

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’;

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator

slide-51
SLIDE 51

A1 A2

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT;

A3 A4

COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’;

Sumedh Pathak | Citus Data | DataEngConf 2018

Coordinator

slide-52
SLIDE 52

But.... Deadlocks!

slide-53
SLIDE 53

Example

// SESSION 1

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’)

// SESSION 2

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-54
SLIDE 54

Example

// SESSION 1

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’)

// SESSION 2

BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’)

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-55
SLIDE 55

Example

// SESSION 1

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’)

UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’;

// SESSION 2

BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’)

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-56
SLIDE 56

Example

// SESSION 1

BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’)

UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’;

// SESSION 2

BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’)

UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-57
SLIDE 57

How do Relational DB’s solve this?

S1 S2

  • Construct a

Directed Graph

  • Each node is a

session/transaction

  • Edge represents a

wait on a lock

Waiting

  • n ‘Bob’

Waiting

  • n ‘Alice’
slide-58
SLIDE 58

S1 S2 S1 S2 S2 S1 S2 Waits on S1 S1 Waits on S2

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-59
SLIDE 59

S1 S2 S1 S2 S2 S1 S2 Waits on S1 S1 Waits on S2

Distributed
 Deadlock Detector

S1 S2 S2 S1

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-60
SLIDE 60

keys to scaling transactions

  • 2PC to ensure atomic transactions across nodes
  • Deadlock Detection—to scale complex & concurrent

transaction workloads

  • MVCC
  • Failure Handling

4

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-61
SLIDE 61

It’s 2018. Distributed can be Relational

  • Scale tables—via sharding
  • Scale SQL—via distributed relational algebra
  • Scale transactions—via 2PC & Deadlock Detection
Sumedh Pathak | Citus Data | DataEngConf 2018

Now, how do we implement all of this?

slide-62
SLIDE 62

Postgres

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-63
SLIDE 63

PostgreSQL

Planner Executor Custom scan Commit / abort

Extension (.so)

Access methods Foreign tables Functions

... ... ... ... ... ... ...

CREATE EXTENSION ...

Sumedh Pathak | Citus Data | DataEngConf 2018
slide-64
SLIDE 64

PostgreSQL PostgreSQL PostgreSQL

shards shards shard shards shards shard

SELECT … FROM distributed_table …

SELECT … FROM shard… SELECT … FROM shard…

Citus

slide-65
SLIDE 65

PostgreSQL

Citus PostGIS PL/Python JSONB 2PC Replication ... Sequences Indexes Full-Text Search Transactions … dblink

Sumedh Pathak | Citus Data | DataEngConf 2018

Rich ecosystem

  • f tooling,

data types,
 & extensions in PostgreSQL

slide-66
SLIDE 66

66

PostgreSQL can be
 extended into an all-purpose distributed RDBMS

slide-67
SLIDE 67

67

I believe the future of distributed databases is relational.

slide-68
SLIDE 68

Thank you!

Sumedh Pathak sumedh@citusdata.com @moss_toss | @citusdata | citusdata.com

DataEngConf 2018