Databases and Stream Processing: A Future of Consolidation


  1. Databases and Stream Processing: A Future of Consolidation. Ben Stopford, Office of the CTO, Confluent

  2. Marc Andreessen: Software is Eating the World

  3. Weak Form: Companies are USING MORE SOFTWARE. Strong Form: Companies are BECOMING SOFTWARE.

  4. Loan Application Using Software. [Diagram: BORROWER, APPLICATION FORM, CREDIT OFFICER, RISK OFFICER, LOAN OFFICER; steps 1 to 6 ending in APPROVE / DENY]

  5. Loan Application in Software. [Diagram: BORROWER, LOAN APP UI, CREDIT SERVICE, RISK SERVICE, CRM SERVICE; steps 1 to 3 ending in APPROVE / DENY]

  6. Using Software: Classic Three-Tier Architecture. [Diagram: USER, UI, SERVICE, DATABASE]

  7. Becoming Software: Services Talking To Each Other With APIs. [Diagram: four SERVICE boxes calling one another]

  8. REQUESTING A RIDE. [Diagram: CUSTOMER and DRIVER each produce BUSINESS EVENTS, feeding GEOSPATIAL MATCHING and ROUTE RE-PLANNING]

  9. Evolution of software systems: Monolith, Distributed Monolith, Microservices, Event-Driven Microservices. [Diagram: UIs, Apps and Services multiplying at each stage, with Kafka in the event-driven stage; axis from User Centric to Software Centric, Increasing Complexity]

  10. THE USER OF THE SOFTWARE IS MORE SOFTWARE

  11. What does this mean for databases?

  13. We have hundreds of databases...

  14. We have hundreds of databases... FUNDAMENTAL ASSUMPTION: DATA IS PASSIVE

  15. Databases are designed to help you !

  16. Unless there is a user and UI waiting, why should it be synchronous?

  17. The Alternative: Event Streams

  18. Stream Processors are built for Asynchronicity

  19. Stream Processors have a different interaction model. TRADITIONAL DATABASE: Active Query, Passive Data (SELECT * FROM DB_TABLE, run against a DB Table). EVENT STREAM PROCESSING: Active Data, Passive Query (CREATE TABLE AS SELECT * FROM EVENT_STREAM, fed by an Event Stream).

  20. Streams or Tables?

  21. An Event records the fact that something happened: a good was sold; an invoice was issued; a payment was made; a new customer registered.

  22. Events are state changes; they carry intent. State: Bob works at Google. Event: Bob moved from Google to Amazon.

  23. Streams record exactly what happened; Tables represent current state. Where you have been vs. where you are now. Payments you made vs. your account balance.

  24. Streams: a sequence of moves (1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. d3 Nf6 5. Nbd2). Tables: the position of each piece.

  25. Streams = INSERT only (immutable, append-only). Tables = INSERT, UPDATE, DELETE (mutable, primary key).

  26. A stream can be considered as an immutable, append-only table

  27. Stream Processors Communicate Through Streams. [Diagram: INPUT STREAMS, SP, OUTPUT STREAMS]

  28. But internally they use tables: CREATE TABLE credit_scores AS SELECT user, updateScore(p.amount)… [Diagram: Payments Stream, Credit Score Table, Credit Score Stream]

  29. Duality: Streams record history; Tables represent state. Stream to table: projection (Group By Key, SUM, COUNT). Table to stream: table changes. *See Streams and Tables: Two Sides of the Same Coin, M. Sax et al., BIRTE ’18

  30. Similar to a materialized view in a database. STREAM PROCESSOR: Payments Stream, Credit Score Table, Credit Score Stream, APP (asynchronous, push query). ACTIVE DATABASE: Payments Table, Credit Score Table (synchronous, pull query).
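
The stream/table duality in slides 23 to 30 can be sketched in a few lines of Python. This is an illustration only, not ksqlDB or Kafka Streams code: the scoring rule `update_score`, the base score of 600, and the sample payments are all invented for the example. It folds a payments stream into a credit score table and emits every table change as a changelog stream, then shows that replaying the changelog rebuilds the table.

```python
from typing import Dict, Iterable, Iterator, Tuple

Payment = Tuple[str, float]  # (user, amount)


def update_score(score: int, amount: float) -> int:
    """Hypothetical scoring rule, invented for this sketch."""
    return score + int(amount // 10)


def materialize(
    payments: Iterable[Payment], table: Dict[str, int], base: int = 600
) -> Iterator[Tuple[str, int]]:
    """Fold the stream into `table` (stream -> table) and yield each
    change to the table as a changelog record (table -> stream)."""
    for user, amount in payments:
        table[user] = update_score(table.get(user, base), amount)
        yield user, table[user]


# Assumed sample data: an append-only stream of payment events.
payments = [("bob", 50.0), ("jill", 120.0), ("bob", 30.0)]

credit_scores: Dict[str, int] = {}
changelog = list(materialize(payments, credit_scores))

# Replaying the changelog (last value per key wins) rebuilds the table.
rebuilt = dict(changelog)
```

The table keeps only the latest score per user, while the changelog keeps every intermediate value, which is the same relationship as account balance versus payments made.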

  31. Joins

  32. Joining a stream with a table. [Diagram: each event in the Orders stream triggers a Lookup Customer against the Table of Customers (with Primary Key)]

  33. Joining two streams. [Diagram: Orders stream (Bob’s Order, Jill’s Order); Payments stream (Bob’s Payment, Jill’s Payment); orders.join(payments)]

  34. Joining two streams. [Diagram: Bob’s Order, Jill’s Order; Bob’s Payment, Jill’s Payment; orders.join(payments)]

  35. Joining two streams. [Diagram: Bob’s Order, Jill’s Order; Bob’s Payment, Jill’s Payment; orders.join(payments)]

  36. Joining two streams. [Diagram: Jill’s Order, Bob’s Order; Jill’s Payment, Bob’s Payment; orders.join(payments)]

  37. Joining two streams. [Diagram: Key-value store; Jill’s Order, Bob’s Order; Jill’s Payment, Bob’s Payment]

  38. Joining two streams. [Diagram: Key-value store; Jill’s Order, Bob’s Order; Jill’s Payment, Bob’s Payment]

  39. Joining two streams. [Diagram: Key-value store; Jill’s Order, Bob’s Order; Jill’s Payment, Bob’s Payment]

  40. Joining two streams. [Diagram: Jill’s Order, Bob’s Order; Jill’s Payment, Bob’s Payment]

  41. Joining two streams. [Diagram: Bob’s Order, Jill’s Order; Jill’s Payment, Bob’s Payment]

  42. Joining two streams. [Diagram: Bob’s Order, Jill’s Order; Jill’s Payment, Bob’s Payment]

  43. Joining two streams. [Diagram: Bob’s Order, Jill’s Order; Jill’s Payment, Bob’s Payment]

  44. Joining two streams. [Diagram: Bob’s Order, Jill’s Order; Bob’s Payment, Jill’s Payment]

  45. Streams represent history => Cartesian Product. Orders Stream: (101, Boots), (200, Hat), (101, Boots2), (105, Pants), (200, Hat2). Payments Stream: (101, $50), (200, $10), (105, $3), (200, $12), (101, $60). Join Output (Stream).

  46. Joining Streams to Streams: use a time window. Orders Stream: (101, Boots), (200, Hat), (101, Boots2), (105, Pants), (200, Hat2). Payments Stream: (101, $50), (200, $10), (105, $3), (200, $12), (101, $60). Join Output (Stream).

  47. Tools for correlating recent events in time

  48. More advanced temporal functions. [Diagram: Orders and Page Visits correlated within a Session; Join Output (Stream)]

  49. Late and out-of-order data. [Diagram: Orders and Page Visits assigned to Window 1 and Window 2; Join Output (Stream)]

  50. Stream processors provide tools that handle asynchronicity, leverage time and focus on ‘now’
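
The effect of the time window in slides 45 and 46 can be sketched as follows. The order and payment values come from the slides, but the timestamps and the 5-second window are assumptions added for the example, and the code is a plain Python illustration rather than the Kafka Streams join API. Each side is buffered per key (the role the key-value store plays in slides 37 to 39) and every arriving event probes the other side's buffer.

```python
from collections import defaultdict

# (key, value, timestamp in seconds); timestamps are assumed for the sketch.
orders = [(101, "Boots", 0), (200, "Hat", 1), (101, "Boots2", 20),
          (105, "Pants", 21), (200, "Hat2", 22)]
payments = [(101, 50, 2), (200, 10, 3), (105, 3, 23),
            (200, 12, 24), (101, 60, 25)]


def windowed_join(orders, payments, window):
    """Join orders to payments on key when their timestamps are at most
    `window` seconds apart. Buffers are never evicted in this sketch; a
    real stream processor would expire entries older than the window."""
    events = [(ts, "order", k, v) for k, v, ts in orders]
    events += [(ts, "payment", k, v) for k, v, ts in payments]
    events.sort(key=lambda e: e[0])  # process in timestamp order

    order_buf = defaultdict(list)    # key -> [(ts, item)]
    payment_buf = defaultdict(list)  # key -> [(ts, amount)]
    out = []
    for ts, side, key, value in events:
        if side == "order":
            order_buf[key].append((ts, value))
            for pts, amount in payment_buf[key]:
                if ts - pts <= window:
                    out.append((key, value, amount))
        else:
            payment_buf[key].append((ts, value))
            for ots, item in order_buf[key]:
                if ts - ots <= window:
                    out.append((key, item, value))
    return out


unbounded = windowed_join(orders, payments, float("inf"))  # per-key Cartesian product
recent = windowed_join(orders, payments, 5)                # only events close in time
```

With no window, every order for a key pairs with every payment for that key; with the window, each order pairs only with the payment that arrived shortly after it.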

  51. Data Placement

  52. Layered storage model. [Diagram: Storage (Kafka) beneath a ‘caching’ streaming layer; the Stream Processor reads via the network from the stream’s P2 and from the table’s P2]

  53. Partitioned Data (Fact-Fact joins), KTable / TABLE. Partitioned Storage (Kafka): SP 1 holds P1 (2 GB); SP 2 holds P2 (3 GB); SP 3 holds P3 (5 GB); SP 4 holds P4 (2 GB).

  54. Broadcast Data (Fact-Dimension Joins), GlobalKTable. 2 + 3 + 5 + 2 = 12 GB on every task: Stream Task 1 (P1) holds 12 GB; Stream Task 2 (P2) holds 12 GB; Stream Task 3 (P3) holds 12 GB; Stream Task 4 (P4) holds 12 GB.

  55. Architecturally there are parallels, e.g. Data Warehousing. [Diagram: FACTS, DIMS, ETL, REPORTING]
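
The storage arithmetic behind slides 53 and 54 can be checked with a short sketch. The partition sizes are the ones on the slides; the function names and the use of `zlib.crc32` as the partitioning hash are assumptions made for illustration (Kafka's own partitioner uses a different hash).

```python
import zlib

# Partition sizes in GB, taken from the slides.
partition_sizes_gb = {"P1": 2, "P2": 3, "P3": 5, "P4": 2}


def partitioned_footprint(sizes):
    """Partitioned table: each stream processor holds only its own partition."""
    return dict(sizes)


def broadcast_footprint(sizes):
    """Broadcast table: every stream task holds a full copy of all partitions."""
    total = sum(sizes.values())
    return {partition: total for partition in sizes}


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in partitioner: a stable hash of the key modulo the partition
    count. Using the same function for both inputs keeps matching keys
    on the same processor, which a partitioned join relies on."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


per_node_partitioned = partitioned_footprint(partition_sizes_gb)
per_node_broadcast = broadcast_footprint(partition_sizes_gb)
cluster_total_partitioned = sum(per_node_partitioned.values())
cluster_total_broadcast = sum(per_node_broadcast.values())
```

Under these numbers the partitioned layout stores 12 GB across the cluster, while the broadcast layout stores 12 GB per task, 48 GB in total. That is the price paid for joining on any key without repartitioning.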

  56. Interaction Model

  57. Stream Processors Continuously Process Input to Output. [Diagram: INPUT STREAMS, SP, OUTPUT STREAMS]

  58. TRADITIONAL DATABASE: Active Query, Passive Data (SELECT * FROM DB_TABLE, run against a DB Table). EVENT STREAM PROCESSING: Active Data, Passive Query (CREATE TABLE AS SELECT * FROM EVENT_STREAM, fed by an Event Stream).

  59. Stream Processors are Push Queries; Databases are Pull Queries. Push: as Payments arrive, the APP is sent "Ben’s credit score is 670", "Ben’s credit score is 710", "Ben’s credit score is 695", ... Pull: the APP asks "What is Ben’s credit score now?" and receives 695.

  60. Hybrid stream processors provide both interaction models. [Diagram: Payments Stream into ksqlDB, which Summarizes & Materializes Credit Scores; one APP receives the Credit Scores Stream, another APP sends a Query against Credit Scores]

  61. Unified Model For: 1. The Asynchronous and the Synchronous 2. Interaction with Active or Passive Data

  62. Unified interaction model. Standard Database Query: earliest to now. [Timeline: The Past, Now, The Future]

  63. Unified interaction model. Standard Stream Processing Query: now to forever. [Timeline: The Past, Now, The Future]

  64. Unified interaction model. ‘Dashboard query’: earliest to forever. [Timeline: The Past, Now, The Future]

  65. Unified Interaction Model: earliest to now; earliest to forever; now to forever. [Timeline: The Past, Now, The Future]

  66. PUSH: SELECT user, credit_score FROM orders WHERE ROWKEY = 'bob' EMIT CHANGES; PULL: SELECT user, credit_score FROM orders WHERE ROWKEY = 'bob';

  67. Asynchronous => Pipelines. [Diagram: Transactions, APP, a chain of three SQL stages, APP; Joins/aggregation/time-handling]

  68. Other important variants. Stream processors are often programming frameworks today: Storm, Flink, Kafka Streams. Today we have active databases that include change streams: Mongo, Couchbase, RethinkDB.
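
The three query shapes in slides 62 to 66 (earliest to now, now to forever, earliest to forever) can be sketched over a single append-only log. The `EventLog` class and its method names are hypothetical, invented for this illustration; they are not the ksqlDB API. The credit scores reuse the values from slide 59.

```python
from typing import Any, Callable, List


class EventLog:
    """Hypothetical append-only log supporting three interaction models."""

    def __init__(self) -> None:
        self._events: List[Any] = []
        self._subscribers: List[Callable[[Any], None]] = []

    def append(self, event: Any) -> None:
        """Store the event and push it to every live subscriber."""
        self._events.append(event)
        for callback in list(self._subscribers):
            callback(event)

    def pull(self) -> List[Any]:
        """Earliest to now: a finite snapshot, like a database query."""
        return list(self._events)

    def push(self, callback: Callable[[Any], None]) -> None:
        """Now to forever: only future events, like a stream processor."""
        self._subscribers.append(callback)

    def dashboard(self, callback: Callable[[Any], None]) -> None:
        """Earliest to forever: replay history, then keep following."""
        for event in self._events:
            callback(event)
        self._subscribers.append(callback)


log = EventLog()
log.append(670)
log.append(710)

pushed: List[int] = []
dash: List[int] = []
log.push(pushed.append)     # subscribes from "now"
log.dashboard(dash.append)  # replays 670, 710, then subscribes

log.append(695)
snapshot = log.pull()
```

The pull query terminates with whatever exists at call time, the push query sees only what arrives after it subscribes, and the dashboard query combines both.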

  69. As Software Eats the World

  70. THE USER OF THE SOFTWARE IS MORE SOFTWARE

  71. We need: Asynchronous + Synchronous; Active + Passive

  72. We still need all of these

  73. So is the traditional perception of “a database” enough?

  74. Ben Stopford, Confluent, @benstopford, ben@confluent.io
