Databases and Stream Processing: A Future of Consolidation


slide-1
SLIDE 1

Databases and Stream Processing:

A Future of Consolidation

Ben Stopford Office of the CTO, Confluent

slide-2
SLIDE 2

Marc Andreessen: Software is Eating the World

slide-3
SLIDE 3

Strong Form

Companies are BECOMING SOFTWARE

Weak Form

Companies are USING MORE SOFTWARE

slide-4
SLIDE 4

Loan Application Using Software

  1. BORROWER
  2. APPLICATION FORM
  3. CREDIT OFFICER
  4. RISK OFFICER
  5. LOAN OFFICER
  6. APPROVE / DENY

slide-5
SLIDE 5

Loan Application in Software

  1. BORROWER
  2. LOAN APP UI, CREDIT SERVICE, RISK SERVICE, CRM SERVICE
  3. APPROVE / DENY

slide-6
SLIDE 6

Using Software: Classic Three-Tier Architecture

USER UI SERVICE DATABASE

slide-7
SLIDE 7

Becoming Software: Services Talking To Each Other With APIs

SERVICE SERVICE SERVICE SERVICE

slide-8
SLIDE 8

REQUESTING A RIDE

DRIVER, CUSTOMER

GEOSPATIAL MATCHING, ROUTE RE-PLANNING, BUSINESS EVENTS

slide-9
SLIDE 9

Evolution of software systems

Monolith → Distributed Monolith → Microservices → Event-Driven Microservices

Increasing Complexity

User Centric → Software Centric

[Diagram labels: App, Apps, UI, Service, Kafka]

slide-10
SLIDE 10

IS MORE SOFTWARE THE USER OF THE SOFTWARE

slide-11
SLIDE 11

What does this mean for databases?

slide-12
SLIDE 12

slide-13
SLIDE 13

We have hundreds of databases...

slide-14
SLIDE 14

We have hundreds of databases...

FUNDAMENTAL ASSUMPTION:

DATA IS PASSIVE

slide-15
SLIDE 15

Databases are designed to help you!

slide-16
SLIDE 16

Unless there is a user and UI waiting, why should it be synchronous?

slide-17
SLIDE 17

The Alternative: Event Streams

slide-18
SLIDE 18

Stream Processors are built for Asynchronicity

slide-19
SLIDE 19

Stream Processors have a different interaction model

TRADITIONAL DATABASE

SELECT * FROM DB_TABLE

Active Query, Passive Data (DB Table)

EVENT STREAM PROCESSING

CREATE TABLE AS SELECT * FROM EVENT_STREAM

Active Data, Passive Query (Event Stream)

slide-20
SLIDE 20

Streams or Tables?

slide-21
SLIDE 21

An Event records the fact that something happened

  • A good was sold
  • An invoice was issued
  • A payment was made
  • A new customer registered

slide-22
SLIDE 22

Events are state changes; they carry intent

State:

Bob works at Google

Event:

Bob moved from Google to Amazon
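
The difference between state and event can be sketched in a few lines of Python. This is an illustration only: the `MovedEmployer` type and the `apply` function are hypothetical names invented here, not part of any product mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovedEmployer:
    """Event: a person moved from one employer to another (hypothetical type)."""
    person: str
    from_company: str
    to_company: str


def apply(state: dict, event: MovedEmployer) -> dict:
    """Return the new state (person -> current employer) after one event."""
    new_state = dict(state)
    new_state[event.person] = event.to_company
    return new_state


state = {"Bob": "Google"}                          # State: Bob works at Google
event = MovedEmployer("Bob", "Google", "Amazon")   # Event: Bob moved from Google to Amazon
state = apply(state, event)
print(state)  # {'Bob': 'Amazon'}
```

The event keeps the "from" side of the move, which the resulting state no longer contains; that extra information is the intent the slide refers to.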

slide-23
SLIDE 23

Streams record exactly what happened

Tables represent current state

  • Where you have been vs. Where you are now
  • Payments you made vs. Your account balance

slide-24
SLIDE 24

Streams: A sequence of moves

  1. e4 e5
  2. Nf3 Nc6
  3. Bc4 Bc5
  4. d3 Nf6
  5. Nbd2

Tables: Position of each piece

slide-25
SLIDE 25

Streams = INSERT only

Immutable, append-only

Tables = INSERT, UPDATE, DELETE

Mutable, Primary Key

slide-26
SLIDE 26

A stream can be considered as an immutable, append-only table

slide-27
SLIDE 27

Stream Processors Communicate Through Streams

INPUT STREAMS OUTPUT STREAMS

SP

slide-28
SLIDE 28

But internally they use tables

Payments Stream

CREATE TABLE credit_scores AS SELECT user, updateScore(p.amount)…

Credit Score Table

Credit Score Stream
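
A rough Python sketch of the idea on this slide: a processor reads a payments stream, keeps a table internally, and writes a stream of updated scores. The body of `updateScore` is elided on the slide, so the `update_score` rule below, the starting score of 600, and the sample payments are all invented for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

Payment = Tuple[str, float]      # (user, amount)
ScoreUpdate = Tuple[str, int]    # (user, new credit score)


def update_score(current: int, amount: float) -> int:
    """Hypothetical scoring rule, invented for this sketch only."""
    return min(850, current + int(amount // 10))


def credit_score_processor(payments: Iterable[Payment],
                           initial_score: int = 600) -> Iterator[ScoreUpdate]:
    """Consume a payments stream, keep a per-user table, emit a score stream."""
    table: Dict[str, int] = {}   # internal state: the "Credit Score Table"
    for user, amount in payments:
        table[user] = update_score(table.get(user, initial_score), amount)
        yield user, table[user]  # output: the "Credit Score Stream"


payments = [("bob", 50.0), ("jill", 120.0), ("bob", 30.0)]
updates = list(credit_score_processor(payments))
print(updates)  # [('bob', 605), ('jill', 612), ('bob', 608)]
```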

slide-29
SLIDE 29

Duality

Streams record history

Tables represent state

  • Stream to table: projection (Group By Key, SUM, COUNT)
  • Table to stream: table changes

*See Streams and Tables: Two Sides of the Same Coin, M. Sax et al., BIRTE ’18
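
Both directions of the duality can be sketched in plain Python. This is a toy model with made-up account data, not how any particular stream processor implements it: a stream is aggregated into a table (GROUP BY key, SUM), and each change to that table is emitted as a new stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (account, amount)


def stream_to_table(payments: Iterable[Record]) -> Dict[str, int]:
    """Aggregate a stream into a table: GROUP BY account, SUM(amount)."""
    table: Dict[str, int] = defaultdict(int)
    for account, amount in payments:
        table[account] += amount
    return dict(table)


def table_changes(payments: Iterable[Record]) -> List[Record]:
    """Emit every change to the table as a stream of (key, new value)."""
    table: Dict[str, int] = defaultdict(int)
    changes: List[Record] = []
    for account, amount in payments:
        table[account] += amount
        changes.append((account, table[account]))
    return changes


payments = [("alice", 10), ("bob", 5), ("alice", -3)]
balances = stream_to_table(payments)
changelog = table_changes(payments)
print(balances)   # {'alice': 7, 'bob': 5}
print(changelog)  # [('alice', 10), ('bob', 5), ('alice', 7)]

# Replaying the changelog (last value per key wins) rebuilds the table.
rebuilt = dict(changelog)
```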

slide-30
SLIDE 30

Similar to a materialized view in a database

STREAM PROCESSOR

  • Asynchronous
  • Push query

ACTIVE DATABASE

  • Synchronous
  • Pull query

[Diagram labels: Payments Table, Credit Score Table; Payments Stream, Credit Score Stream, Credit Score Table; APP]
slide-31
SLIDE 31

Joins

slide-32
SLIDE 32

Joining a stream with a table

  • Orders: Lookup Customer
  • Customers: Table of Customers (with Primary Key)
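
A minimal Python sketch of a stream-table join, assuming an inner join and made-up sample data: each order event triggers a lookup in the customers table by primary key. The function name and record layout are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Table of customers keyed by primary key (customer id). Sample data is made up.
customers: Dict[int, str] = {1: "Bob", 2: "Jill"}

# Stream of orders: (order id, customer id, item).
orders = [(100, 1, "Hat"), (101, 2, "Boots"), (102, 3, "Pants")]


def join_stream_with_table(order_stream: Iterable[Tuple[int, int, str]],
                           customer_table: Dict[int, str]
                           ) -> Iterator[Tuple[int, str, str]]:
    """For each order event, look up the customer by key (inner join)."""
    for order_id, customer_id, item in order_stream:
        name = customer_table.get(customer_id)
        if name is not None:      # drop orders with no matching customer
            yield order_id, name, item


enriched = list(join_stream_with_table(orders, customers))
print(enriched)  # [(100, 'Bob', 'Hat'), (101, 'Jill', 'Boots')]
```

Only the stream side drives output here: a change to the table affects later orders but emits nothing by itself.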

slide-33
SLIDE 33

Joining two streams

orders.join(payments)

Orders / Payments: Bob’s Order, Bob’s Payment, Jill’s Payment, Jill’s Order

slide-34
SLIDE 34

Joining two streams

orders.join(payments)

Bob’s Order, Bob’s Payment, Jill’s Payment, Jill’s Order

slide-35
SLIDE 35

Joining two streams

orders.join(payments)

Bob’s Order, Bob’s Payment, Jill’s Payment, Jill’s Order

slide-36
SLIDE 36

Joining two streams

orders.join(payments)

Bob’s Payment, Bob’s Order, Jill’s Payment, Jill’s Order

slide-37
SLIDE 37

Key-value store Bob’s Order Bob’s Payment Jill’s Payment Jill’s Order

Joining two streams

slide-38
SLIDE 38

Key-value store Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Joining two streams

slide-39
SLIDE 39

Key-value store Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Joining two streams

slide-40
SLIDE 40

Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Joining two streams

slide-41
SLIDE 41

Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Joining two streams

slide-42
SLIDE 42

Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Joining two streams

slide-43
SLIDE 43

Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Joining two streams

slide-44
SLIDE 44

Jill’s Payment Jill’s Order Bob’s Payment Bob’s Order

Joining two streams

slide-45
SLIDE 45

Streams represent history → Cartesian Product

Orders Stream: (200, Hat2), (101, Boots2), (105, Pants), (101, Boots), (200, Hat)

Payments Stream: (101, $60), (105, $3), (200, $12), (101, $50), (200, $10)

Join Output (Stream)

slide-46
SLIDE 46

Joining Streams to Streams

Orders Stream: (200, Hat2), (101, Boots2), (105, Pants), (101, Boots), (200, Hat)

Payments Stream: (101, $60), (105, $3), (200, $12), (101, $50), (200, $10)

Join Output (Stream)

Use time window
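
The effect of a time window on a stream-stream join can be sketched as follows. The keys and values follow the slide, but the timestamps (and therefore which payment pairs with which order) are invented for this sketch. The nested loop is only for clarity; a real stream processor would buffer recent events per key instead.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, int, str]  # (timestamp in seconds, key, value)

# Timestamps are assumptions made for this example.
orders: List[Event] = [(0, 200, "Hat"), (1, 101, "Boots"), (2, 105, "Pants"),
                       (60, 101, "Boots2"), (61, 200, "Hat2")]
payments: List[Event] = [(3, 200, "$10"), (4, 101, "$50"), (5, 105, "$3"),
                         (62, 200, "$12"), (63, 101, "$60")]


def join(orders: List[Event], payments: List[Event],
         window: Optional[int] = None) -> List[Tuple[int, str, str]]:
    """Join on key; if window is set, also require |t_order - t_payment| <= window."""
    out = []
    for t_o, key_o, item in orders:
        for t_p, key_p, amount in payments:
            if key_o == key_p and (window is None or abs(t_o - t_p) <= window):
                out.append((key_o, item, amount))
    return out


unwindowed = join(orders, payments)            # every order x every payment per key
windowed = join(orders, payments, window=10)   # only events close together in time
print(len(unwindowed))  # 9
print(windowed)
# [(200, 'Hat', '$10'), (101, 'Boots', '$50'), (105, 'Pants', '$3'),
#  (101, 'Boots2', '$60'), (200, 'Hat2', '$12')]
```

Without the window, keys 200 and 101 each produce 2 x 2 matches, which is the Cartesian product the previous slide warns about.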

slide-47
SLIDE 47

Tools for correlating recent events in time

slide-48
SLIDE 48

More advanced temporal functions

[Diagram: Page Visits and Orders joined within a Session, producing Join Output (Stream)]
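
One common temporal function is the session window, which groups events separated by less than some inactivity gap. The sketch below is a simplified, batch-style illustration with made-up timestamps; `sessionize` and its `gap` parameter are hypothetical names.

```python
from typing import List


def sessionize(timestamps: List[int], gap: int) -> List[List[int]]:
    """Group event timestamps into sessions split by more than `gap` of inactivity."""
    sessions: List[List[int]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)   # continues the current session
        else:
            sessions.append([t])     # inactivity gap exceeded: start a new session
    return sessions


page_visits = [0, 20, 45, 300, 310, 900]   # seconds; made-up data
print(sessionize(page_visits, gap=60))
# [[0, 20, 45], [300, 310], [900]]
```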

slide-49
SLIDE 49

Late and out-of-order data

[Diagram: Page Visits and Orders joined across Window 1 and Window 2, producing Join Output (Stream)]
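
One way to handle late and out-of-order events is to keep each window open for a grace period after it ends. The sketch below assumes tumbling windows and uses the highest event time seen so far as "stream time"; the function, its parameters, and the data are illustrative assumptions, not a description of a specific product's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def windowed_counts(event_times: Iterable[int], size: int, grace: int
                    ) -> Tuple[Dict[int, int], List[int]]:
    """Count events per tumbling window of `size`, processed in arrival order.

    A window [start, start + size) accepts events until the highest event
    time seen so far (the stream time) reaches start + size + grace.
    Returns (counts keyed by window start, dropped late events).
    """
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[int] = []
    stream_time = 0
    for t in event_times:
        stream_time = max(stream_time, t)
        start = (t // size) * size
        if stream_time >= start + size + grace:
            dropped.append(t)        # window already closed
        else:
            counts[start] += 1       # on time, or late but within grace
    return dict(counts), dropped


# Event times in arrival order: 8 arrives out of order, 3 arrives too late.
counts, dropped = windowed_counts([1, 5, 12, 8, 17, 3], size=10, grace=5)
print(counts)   # {0: 3, 10: 2}
print(dropped)  # [3]
```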

slide-50
SLIDE 50

Stream processors provide tools that handle asynchronicity, leverage time and focus on ‘now’

slide-51
SLIDE 51

Data Placement

slide-52
SLIDE 52

Layered storage model

  • Storage (Kafka)
  • Stream Processor: read via network
  • ‘Caching’ in streaming layer: from stream’s P2, from table’s P2

slide-53
SLIDE 53

Partitioned Data (Fact-Fact joins)

Storage (Kafka) partition | Stream processor | Partitioned KTable / TABLE
P1 | SP 1 | 2 GB
P2 | SP 2 | 3 GB
P3 | SP 3 | 5 GB
P4 | SP 4 | 2 GB
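
Partitioned joins rely on both inputs being partitioned by the same key in the same way, so matching records land on the same processor. A minimal Python sketch of that idea follows; CRC32 is a stand-in hash chosen for this example (not the hash Kafka uses), and the sample records are made up.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NUM_PARTITIONS = 4  # matches P1..P4 on the slide

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic hash partitioner (CRC32 is an assumption of this sketch)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition(records: Iterable[Record],
              num_partitions: int = NUM_PARTITIONS) -> Dict[int, List[Record]]:
    """Split records into partitions by key."""
    parts: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        parts[partition_for(key, num_partitions)].append((key, value))
    return parts


orders = [("bob", "Hat"), ("jill", "Boots")]
payments = [("bob", "$10"), ("jill", "$50")]
order_parts, payment_parts = partition(orders), partition(payments)

# Records with the same key land in the same partition number on both sides,
# so each stream processor can join its own partitions locally.
for key, _ in orders:
    p = partition_for(key)
    print(key, "->", "partition", p)
```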

slide-54
SLIDE 54

Broadcast Data (Fact-Dimension Joins)

Partition | Stream task | GlobalKTable
P1 | Stream Task 1 | 12 GB
P2 | Stream Task 2 | 12 GB
P3 | Stream Task 3 | 12 GB
P4 | Stream Task 4 | 12 GB

GlobalKTable: 2 + 3 + 5 + 2 = 12 GB on every task
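
The storage arithmetic behind the two placement strategies, using the partition sizes from the slides, can be worked through in a few lines of Python. The variable names are illustrative.

```python
partition_sizes_gb = [2, 3, 5, 2]   # P1..P4, from the slides
num_tasks = len(partition_sizes_gb)

# Partitioned table: each task holds only its own partition.
partitioned_per_task = partition_sizes_gb
partitioned_total = sum(partitioned_per_task)

# Broadcast table: every task holds a full copy of all partitions.
broadcast_per_task = [sum(partition_sizes_gb)] * num_tasks
broadcast_total = sum(broadcast_per_task)

print(partitioned_total)    # 12
print(broadcast_per_task)   # [12, 12, 12, 12]
print(broadcast_total)      # 48
```

Broadcasting trades storage (48 GB in total instead of 12 GB) for the ability to join on any key without repartitioning, which is why it suits smaller dimension tables.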

slide-55
SLIDE 55

Architecturally there are parallels, e.g. Data Warehousing

ETL, FACTS, DIMS, REPORTING

slide-56
SLIDE 56

Interaction Model

slide-57
SLIDE 57

Stream Processors Continuously Process Input to Output

INPUT STREAMS OUTPUT STREAMS

SP

slide-58
SLIDE 58

TRADITIONAL DATABASE

SELECT * FROM DB_TABLE

Active Query, Passive Data (DB Table)

EVENT STREAM PROCESSING

CREATE TABLE AS SELECT * FROM EVENT_STREAM

Active Data, Passive Query (Event Stream)

slide-59
SLIDE 59

Databases are Pull Queries

APP: What is Ben’s credit score now? 695

Stream Processors are Push Queries

Payments → APP:

  • Ben’s credit score is 670
  • Ben’s credit score is 710
  • Ben’s credit score is 695
  • ...
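
The two interaction styles can be contrasted with a toy Python class. This is a sketch under stated assumptions: `CreditScores` and its `pull`, `push`, and `update` methods are hypothetical names, and the score values are the ones shown on the slide.

```python
from typing import Callable, Dict, List


class CreditScores:
    """Toy materialized table supporting both pull and push access."""

    def __init__(self) -> None:
        self._table: Dict[str, int] = {}
        self._subscribers: Dict[str, List[Callable[[int], None]]] = {}

    def pull(self, user: str) -> int:
        """Pull query: return the current value once."""
        return self._table[user]

    def push(self, user: str, callback: Callable[[int], None]) -> None:
        """Push query: invoke callback on every future change for this user."""
        self._subscribers.setdefault(user, []).append(callback)

    def update(self, user: str, score: int) -> None:
        """Apply a new score and notify push subscribers."""
        self._table[user] = score
        for callback in self._subscribers.get(user, []):
            callback(score)


scores = CreditScores()
received: List[int] = []
scores.push("ben", received.append)   # subscribe before any updates arrive
for s in (670, 710, 695):
    scores.update("ben", s)

print(received)            # [670, 710, 695]
print(scores.pull("ben"))  # 695
```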

slide-60
SLIDE 60

Hybrid stream processors provide both interaction models

[Diagram: in ksqlDB, the Payments Stream is summarized and materialized into Credit Scores; one APP receives the Credit Scores Stream, another APP queries Credit Scores]

slide-61
SLIDE 61

Unified Model For:

  1. The Asynchronous and the Synchronous
  2. Interaction with Active or Passive Data

slide-62
SLIDE 62

Unified interaction model

The Past | Now | The Future

Earliest to now: Standard Database Query

slide-63
SLIDE 63

Unified interaction model

The Past | Now | The Future

Now to forever: Standard Stream Processing Query

slide-64
SLIDE 64

Unified interaction model

The Past | Now | The Future

Earliest to forever: ‘Dashboard query’

slide-65
SLIDE 65

Unified Interaction Model

The Past | Now | The Future

  • Earliest to now
  • Now to forever
  • Earliest to forever
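
The three query shapes can be illustrated over a toy append-only log in Python. The `EventLog` class and its method names are invented for this sketch; they simply mirror the three ranges named on the slide.

```python
from typing import Callable, List


class EventLog:
    """Toy append-only log supporting the three query shapes."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._listeners: List[Callable[[str], None]] = []

    def append(self, event: str) -> None:
        self._events.append(event)
        for listener in self._listeners:
            listener(event)

    def earliest_to_now(self) -> List[str]:
        """Database-style query: a finite snapshot of the past."""
        return list(self._events)

    def now_to_forever(self, on_event: Callable[[str], None]) -> None:
        """Stream-processing-style query: only future events."""
        self._listeners.append(on_event)

    def earliest_to_forever(self, on_event: Callable[[str], None]) -> None:
        """'Dashboard query': replay the past, then keep receiving."""
        for event in self._events:
            on_event(event)
        self._listeners.append(on_event)


log = EventLog()
log.append("e1")
log.append("e2")

future: List[str] = []
dashboard: List[str] = []
log.now_to_forever(future.append)
log.earliest_to_forever(dashboard.append)
log.append("e3")
snapshot = log.earliest_to_now()

print(snapshot)   # ['e1', 'e2', 'e3']
print(future)     # ['e3']
print(dashboard)  # ['e1', 'e2', 'e3']
```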

slide-66
SLIDE 66

PUSH

SELECT user, credit_score FROM orders WHERE ROWKEY = 'bob' EMIT CHANGES;

PULL

SELECT user, credit_score FROM orders WHERE ROWKEY = 'bob';

slide-67
SLIDE 67

Asynchronous => Pipelines

Transactions Joins/aggregation/time-handling APP

SQL SQL SQL

APP

slide-68
SLIDE 68

Other important variants

  • Stream processors are often programming frameworks today
      ○ Storm
      ○ Flink
      ○ Kafka Streams

  • Today we have active databases that include change streams:
      ○ Mongo
      ○ Couchbase
      ○ RethinkDB

slide-69
SLIDE 69

As Software Eats the World

slide-70
SLIDE 70

IS MORE SOFTWARE THE USER OF THE SOFTWARE

slide-71
SLIDE 71

We need:

  • Asynchronous + Synchronous
  • Active + Passive

slide-72
SLIDE 72

We still need all of these

slide-73
SLIDE 73

So is the traditional perception of “a database” enough?

slide-74
SLIDE 74

Ben Stopford

Confluent @benstopford ben@confluent.io