scaling the relational database for the cloud age
play

Scaling the Relational Database for the Cloud Age Sumedh Pathak, - PowerPoint PPT Presentation

Scaling the Relational Database for the Cloud Age Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018 @moss_toss | @citusdata About Me Co-Founder & VP Engineering at Citus Data Amazon Shopping


  1. Scaling the Relational Database for the Cloud Age Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018 @moss_toss | @citusdata

  2. About Me • Co-Founder & VP Engineering at Citus Data • Amazon Shopping Cart 
 (former) • Amazon Supply Chain & Order Fulfillment (former) • Stanford Computer Science Sumedh Pathak | Citus Data | DataEngConf 2018

  3. Why RDBMS? Sumedh Pathak | Citus Data | DataEngConf 2018

  4. Because your architecture could be simpler SELECT ... SQL Application Database Map Streaming Storage NoSQL Application Reduce Sumedh Pathak | Citus Data | DataEngConf 2018

  5. An RDBMS is a general-purpose 
 data platform Fast writes High throughput Data consistency Real-time & bulk High concurrency Query optimizers Sumedh Pathak | Citus Data | DataEngConf 2018

  6. Startups Are Choosing Postgres % database job posts mentioning each database, across 20K+ job posts PostgreSQL MySQL MongoDB SQL Server + Oracle Source: Hacker News, https://news.ycombinator.com 


  7. Data Trends in the “Cloud Age” - Explosion of Data - Higher Volume - Higher Velocity ** SCALE **

  8. but RDBMS’s 
 don’t scale! Sumedh Pathak | Citus Data | DataEngConf 2018

  9. but RDBMS’s don’t scale RDBMS’s are hard to scale Sumedh Pathak | Citus Data | DataEngConf 2018

  10. What exactly needs to Scale? - Tables (Data) 1 - Partitioning, Co-location, Reference Tables - SQL (Reads) 2 - How do we express and optimize distributed SQL - Transactions (Writes) 3 - Cross Shard updates/deletes, Global Atomic Transactions Sumedh Pathak | Citus Data | DataEngConf 2018

  11. Scaling Tables Sumedh Pathak | Citus Data | DataEngConf 2018

  12. Data Partitioning - Pick a column - Date - Id (customer_id, cart_id) - Pick a method - Hash - Range

  13. Partition data across nodes Coordinator Node Shards Worker R 1 R 2 R 3 R 4 R 5 R 6 R 7 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018

  14. Worker → RDBMS, Shard → Table

  15. Reference Tables Coordinator Node Copies of same table Worker N 1 N 1 N 1 N 1 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018

  16. Co-Location Coordinator Node Explicit Co-Location API. 
 E.g. Partition by Tenant Worker R 1 S 1 R 2 S 2 R 3 S 3 R 4 S 4 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018

  17. What about Foreign Keys?

  18. The key to scaling tables... - Use relational databases as a building block - Understand semantics of application—to be smart about partitioning - Multi-tenant applications

  19. Scaling SQL

  20. SQL ↔ Relational Algebra FROM table R Project x (R) � R’ SELECT x Filter f(x) (R) � R’ WHERE f(x) R × S � R’ … JOIN … Sumedh Pathak | Citus Data | DataEngConf 2018

  21. Distributed Relational Algebra Collect(R 1 ,R 2 ,...) � R FROM sharded_table C R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018

  22. Commutativity A + B = B + A Sumedh Pathak | Citus Data | DataEngConf 2018

  23. Commutative property Project x (Collect(R 1 ,R 2 ,...)) = Collect(Project x (R 1 ), Project x (R 2 )...) P x C C P x P x R 1 R 2 R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018

  24. Distributivity A*(B + C) = A*B + A*C Sumedh Pathak | Citus Data | DataEngConf 2018

  25. Distributive property Collect(R 1 ,R 2 ,...) x Collect(S 1 ,S 2 ,...) = Collect(R 1 × S 1 ,R 2 × S 2, ...) × C C C × × R 1 S 1 R 2 S 2 R 1 S 1 R 2 S 2 X = Join Operator Sumedh Pathak | Citus Data | DataEngConf 2018

  26. Associativity A + B + C = (A + B) + C = A + (B + C) Sumedh Pathak | Citus Data | DataEngConf 2018

  27. Associative property SUM(x)(Collect(R 1 ,R 2 ,...)) = SUM(Collect(SUM(R 1 ), SUM(R 2 )...)) Sum x Sum x C C Sum x Sum x R 1 R 2 R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018

  28. SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND orders.date >= '2012-01-01' AND nation.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018

  29. Sumedh Pathak | Citus Data | DataEngConf 2018

  30. Volcano style processing Data flows from 
 bottom 
 to top Sumedh Pathak | Citus Data | DataEngConf 2018

  31. Sumedh Pathak | Citus Data | DataEngConf 2018

  32. Parallelize Aggregate Push Joins & Filters below collect. Run in parallel across all nodes Filters & Projections done before Join Sumedh Pathak | Citus Data | DataEngConf 2018

  33. Sumedh Pathak | Citus Data | DataEngConf 2018

  34. SELECT sum(intermediate_col) 
 FROM <concatenated results>; SELECT sum(price) SELECT sum(price) FROM orders_2 JOIN nation_2 
 FROM orders_2 JOIN nation_1 
 ON (orders_2.name = ON (orders_2.name = nation_2.name) nation_1.name) WHERE WHERE orders_2.date >= '2017-01-01' orders_2.date >= '2017-01-01' AND AND nation_2.region = 'Asia'; nation_2.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018

  35. Executing Distributed SQL SELECT sum(price) FROM <results>; orders SELECT sum(price) SELECT sum(price) FROM orders_1 o JOIN nation_1 nation FROM orders_2 o JOIN nation_1 ON (o.name = n.name) ON (o.name = n.name) WHERE o.date >= '2017-01-01' WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; AND n.region = 'Asia'; orders_1 orders_2 nation_1 nation_1 SQL database SQL database

  36. The key to scaling SQL... - New relational algebra operators for distributed processing - Relational Algebra Properties to optimize tree: Commutativity, Associativity, & Distributivity - Map / Reduce operators Sumedh Pathak | Citus Data | DataEngConf 2018

  37. Scaling Transactions

  38. Money Transfer, as an example BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Sumedh Pathak | Citus Data | DataEngConf 2018

  39. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  40. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  41. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  42. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; BEGIN; UPDATE accounts SET balance = UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; balance - 100 WHERE id = ‘ALICE’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  43. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; BEGIN; Coordinator UPDATE accounts SET balance = BEGIN; balance + 100 WHERE id = ‘BOB’; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  44. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  45. What happens during PREPARE? Locks are 
 State of transaction stored maintained on a durable store & Sumedh Pathak | Citus Data | DataEngConf 2018

  46. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator ROLLBACK TRANSACTION ‘citus_... PREPARE TRANSACTION ‘citus_...98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  47. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  48. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator COMMIT PREPARED ‘citus_... COMMIT PREPARED ‘citus_... 98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  49. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator COMMIT PREPARED ‘citus_... COMMIT PREPARED ‘citus_... 98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend