Push vs. Pull: The Future of Real-Time Databases in the Cloud



slide-1
SLIDE 1

Push vs. Pull

The Future of Real-Time Databases in the Cloud

Wolfram Wingerath

ww@baqend.com

December 10, SCDM 2018, Seattle

www.baqend.com

slide-2
SLIDE 2

About me

Wolfram Wingerath

PhD Thesis & Research, Distributed Systems Engineer

Research:

  • Real-Time Databases
  • Stream Processing
  • NoSQL & Cloud Databases

Practice (www.baqend.com):

  • Backend-as-a-Service
  • Web Caching
  • Real-Time Database
  • …

slide-3
SLIDE 3

Outline

Push-Based Data Access: Why Real-Time Databases?

  • A Small History Lesson
  • The Problem With Traditional Databases
  • Real-Time Databases to the Rescue!

Real-Time Databases: System survey

Discussion: What are the bottlenecks?

Future Directions: Scalability & Use Cases

slide-4
SLIDE 4

A Short History of Data Management

Hot Topics Through the Ages

[Timeline figure, 1970 to today]

Eras: Relational Databases, Active Databases, CEP & Streams, Big Data & NoSQL, Stream Processing, Real-Time Databases

Milestones shown: Relational Model, Ingres, System R, Triggers, Entity-Relationship Model, SQL Standard, PostgreSQL, HiPAC, Starburst, Rapide, Telegraph, STREAM, Aurora & Borealis, GFS, MapReduce, Bigtable, Dynamo, Spark, Storm, Flink, Samza, RethinkDB, Meteor, Firebase, Baqend

slide-5
SLIDE 5

Traditional Databases

The Problem: No Request – No Data!

What's the current state?

Periodic polling for query result maintenance:
→ inefficient
→ slow

slide-6
SLIDE 6

Real-Time Databases

Always Up-to-Date With Database State

Real-time queries for query result maintenance:
→ efficient
→ fast

slide-7
SLIDE 7

Real-Time Query Maintenance

Matching Every Query Against Every Update

→ Potential bottlenecks:

  • Number of queries
  • Write throughput
  • Query complexity

Similar processing for:

  • Triggers
  • ECA rules
  • Materialized views
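
The matching workload named on this slide can be made concrete with a minimal sketch. It assumes queries are plain predicate functions held in memory; the names `queries` and `matchWrite` are hypothetical and not taken from any of the systems discussed. Each write is checked against every registered query, which is why the number of queries, the write throughput, and the query complexity all add to the cost.

```javascript
// Hypothetical in-memory registry: query id -> predicate over a document.
const queries = new Map([
  ['q1', doc => doc.game === 'baccarat'],
  ['q2', doc => doc.score > 95],
]);

// Returns the ids of all queries that the written object satisfies.
// The loop visits every query for every write, so the cost per write
// grows linearly with the number of registered queries.
function matchWrite(afterImage) {
  const matching = [];
  for (const [id, predicate] of queries) {
    if (predicate(afterImage)) matching.push(id);
  }
  return matching;
}

const hits = matchWrite({ name: 'Joy', game: 'baccarat', score: 100 });
```

Here `hits` contains both query ids, because the written object satisfies both predicates.
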
slide-8
SLIDE 8

Outline

Push-Based Data Access: Why Real-Time Databases?

Real-Time Databases: System survey

  • Meteor
  • RethinkDB
  • Parse
  • Firebase
  • Others

Discussion: What are the bottlenecks?

Future Directions: Scalability & Use Cases

slide-9
SLIDE 9

Real-Time Databases

slide-10
SLIDE 10

Meteor

Overview:

  • JavaScript framework for interactive apps and websites
    → MongoDB under the hood
    → Real-time result updates, full MongoDB expressiveness
  • Open-source: MIT license
  • Managed service: Galaxy (Platform-as-a-Service)

History:

  • 2011: Skybreak is announced
  • 2012: Skybreak is renamed to Meteor
  • 2015: Managed hosting service Galaxy is announced

slide-11
SLIDE 11

Live Queries

Poll-and-Diff

  • Change monitoring: app servers detect relevant changes
    → incomplete in multi-server deployment
  • Poll-and-diff: queries are re-executed periodically
    → staleness window
    → does not scale with queries

[Figure labels: app server, monitor incoming writes, CRUD, forward CRUD, repeat query every 10 seconds]
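
Poll-and-diff can be sketched in a few lines. This is an illustration under stated assumptions, not Meteor's implementation: `diffResults`, `startPolling`, and `runQuery` are hypothetical names, documents are assumed to carry a unique `id`, and two versions of a document are compared through their JSON serialization, which is a simplification.

```javascript
// Compare two result snapshots by id and derive change events.
function diffResults(previous, current) {
  const prevById = new Map(previous.map(doc => [doc.id, doc]));
  const currById = new Map(current.map(doc => [doc.id, doc]));
  const events = [];
  for (const [id, doc] of currById) {
    if (!prevById.has(id)) {
      events.push({ type: 'add', doc });
    } else if (JSON.stringify(prevById.get(id)) !== JSON.stringify(doc)) {
      events.push({ type: 'change', doc });
    }
  }
  for (const [id, doc] of prevById) {
    if (!currById.has(id)) events.push({ type: 'remove', doc });
  }
  return events;
}

// Re-execute the query periodically and report only the differences.
// Writes that happen between two polls stay invisible until the next
// poll (the staleness window), and every live query adds another
// periodic query to the database load.
function startPolling(runQuery, onEvents, intervalMs = 10000) {
  let previous = [];
  return setInterval(async () => {
    const current = await runQuery();
    const events = diffResults(previous, current);
    previous = current;
    if (events.length > 0) onEvents(events);
  }, intervalMs);
}
```

The default interval of 10 seconds mirrors the figure label; a shorter interval narrows the staleness window but increases the polling load proportionally.
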

slide-12
SLIDE 12

Oplog Tailing

Basics: MongoDB Replication

  • Oplog: rolling record of data modifications
  • Master-slave replication: secondaries subscribe to oplog

[Figure: MongoDB cluster (3 shards) with Primary A, Primary B, Primary C and Secondary C1, Secondary C2, Secondary C3; labels: write operation, propagate change, apply]

slide-13
SLIDE 13

Oplog Tailing

Tapping into the Oplog

[Figure: MongoDB cluster (3 shards) with Primary A, Primary B, Primary C and two app servers; labels: CRUD, oplog broadcast, monitor oplog, query (when in doubt), push relevant events]

slide-14
SLIDE 14

Oplog Tailing

Oplog Info Is Incomplete

Baccarat players sorted by high-score:

  1. { name: "Joy", game: "baccarat", score: 100 }
  2. { name: "Tim", game: "baccarat", score: 90 }
  3. { name: "Lee", game: "baccarat", score: 80 }

Partial update from oplog:

{ name: "Bobby", score: 500 } // game: ???

What game does Bobby play?

→ if baccarat, he takes first place!
→ if something else, nothing changes!
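
The Bobby example can be expressed as a small decision function. This is a sketch, assuming the oplog entry only carries the fields that were written; `classifyPartialUpdate` is a hypothetical helper, not part of MongoDB or Meteor.

```javascript
// Decide whether a partial update affects a query that filters on `field`.
// Returns 'match', 'no-match', or 'unknown' when the update does not
// contain the queried field at all.
function classifyPartialUpdate(update, field, expected) {
  if (!(field in update)) return 'unknown'; // full document must be fetched
  return update[field] === expected ? 'match' : 'no-match';
}

const partial = { name: 'Bobby', score: 500 }; // game is not part of the update
const verdict = classifyPartialUpdate(partial, 'game', 'baccarat');
```

Since `game` is absent from the update, `verdict` is `'unknown'`, and the app server has to query the database ("query when in doubt") before it can tell whether the ranking changes.
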

slide-15
SLIDE 15

Oplog Tailing

Tapping into the Oplog

  • Every Meteor server receives all DB writes through oplogs
    → does not scale

Bottleneck!

[Figure: MongoDB cluster (3 shards) with Primary A, Primary B, Primary C and two app servers; labels: CRUD, oplog broadcast, monitor oplog, query (when in doubt), push relevant events]

slide-16
SLIDE 16

RethinkDB

Overview:

  • "MongoDB done right": comparable queries and data model, but also:
    → Push-based queries (filters only)
    → Joins (non-streaming)
    → Strong consistency: linearizability
  • JavaScript SDK (Horizon): open-source, as managed service
  • Open-source: Apache 2.0 license

History:

  • 2009: RethinkDB is founded
  • 2012: RethinkDB is open-sourced under AGPL
  • 2016, May: first official release of Horizon (JavaScript SDK)
  • 2016, October: RethinkDB announces shutdown
  • 2017: RethinkDB is relicensed under Apache 2.0

slide-17
SLIDE 17

RethinkDB

Changefeed Architecture

  • Range-sharded data
  • RethinkDB proxy: support node without data
    • Client communication
    • Request routing
    • Real-time query matching
  • Every proxy receives all database writes
    → does not scale

Bottleneck!

[Figure labels: App server, App server, RethinkDB proxy, RethinkDB proxy, RethinkDB storage cluster]

William Stein, RethinkDB versus PostgreSQL: my personal experience (2017) http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.html (2017-02-27)

Daniel Mewes, Comment on GitHub issue #962: Consider adding more docs on RethinkDB Proxy (2016) https://github.com/rethinkdb/docs/issues/962 (2017-02-27)

slide-18
SLIDE 18

Parse

Overview:

  • Backend-as-a-Service for mobile apps
    → MongoDB: largest deployment world-wide
    → Easy development: great docs, push notifications, authentication, …
    → Real-time updates for most MongoDB queries
  • Open-source: BSD license
  • Managed service: discontinued

History:

  • 2011: Parse is founded
  • 2013: Parse is acquired by Facebook
  • 2015: more than 500,000 mobile apps reported on Parse
  • 2016, January: Parse shutdown is announced
  • 2016, March: Live Queries are announced
  • 2017: Parse shutdown is finalized

slide-19
SLIDE 19

Parse

LiveQuery Architecture

  • LiveQuery Server: no data, real-time query matching
  • Every LiveQuery Server receives all database writes
    → does not scale

Bottleneck!

Illustration taken from: http://parseplatform.github.io/docs/parse-server/guide/#live-queries (2017-02-22)

slide-20
SLIDE 20

Firebase

Overview:

  • Real-time state synchronization across devices
  • Simplistic data model: nested hierarchy of lists and objects
  • Simplistic queries: mostly navigation/filtering
  • Fully managed, proprietary
  • App SDK for app development, mobile-first
  • Google services integration: analytics, hosting, authorization, …

History:

  • 2011: chat service startup Envolve is founded
    → was often used for cross-device state synchronization
    → state synchronization is separated (Firebase)
  • 2012: Firebase is founded
  • 2014: Firebase is acquired by Google

slide-21
SLIDE 21

Firebase

Real-Time State Synchronization

  • Tree data model: application state ~ JSON object
  • Subtree synching: push notifications for specific keys only
    → flat structure for fine granularity
    → limited expressiveness!

Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)

slide-22
SLIDE 22

Firebase

Query Processing in the Client

  • Push notifications for specific keys only
  • Order by a single attribute
  • Apply a single filter on that attribute
  • Non-trivial query processing in client
    → does not scale!

Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)

Jacob Wenger, on the Firebase Google Group (2015) https://groups.google.com/forum/#!topic/firebase-talk/d-XjaBVL2Ko (2017-02-27)
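
The consequence of the single-attribute restriction can be illustrated with plain arrays. This sketch does not use the Firebase SDK; `serverQuery` is a hypothetical stand-in for what the service evaluates (one ordering attribute, one filter on that attribute), and `clientQuery` shows the remaining work that ends up in the client.

```javascript
// Stand-in for the server side: order by one attribute and apply one
// range filter on that same attribute.
function serverQuery(items, attribute, minValue) {
  return items
    .filter(item => item[attribute] >= minValue)
    .sort((a, b) => a[attribute] - b[attribute]);
}

// Any further condition is evaluated in the client, after the
// intermediate result has already been transferred.
function clientQuery(items) {
  const byScore = serverQuery(items, 'score', 90);
  return byScore.filter(item => item.game === 'baccarat');
}

const players = [
  { name: 'Joy', game: 'baccarat', score: 100 },
  { name: 'Tim', game: 'poker', score: 95 },
  { name: 'Lee', game: 'baccarat', score: 80 },
];
const result = clientQuery(players);
```

Two of the three records are sent to the client, but only one (Joy) survives the second filter. With large intermediate results this transfer and filtering cost falls on every client.
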

slide-23
SLIDE 23

Firebase

Hard Scaling Limits

Firebase, Choose a Database: Cloud Firestore or Realtime Database (2018) https://firebase.google.com/docs/database/rtdb-vs-firestore (2018-03-10)

“Scale to around 100,000 concurrent connections and 1,000 writes/second in a single database. Scaling beyond that requires sharding your data across multiple databases.”

Bottleneck!

slide-24
SLIDE 24

Firebase

Firestore: New Model

[Figure labels: collections, documents, references]

Illustration taken from: Todd Kerpelman, Cloud Firestore for Realtime Database Developers (2017) https://firebase.googleblog.com/2017/10/cloud-firestore-for-rtdb-developers.html (2018-03-10)

slide-25
SLIDE 25

Firebase

Firestore: New Model

[Figure labels: tree-like structure, finer access granularity]

Illustration taken from: Todd Kerpelman, Cloud Firestore for Realtime Database Developers (2017) https://firebase.googleblog.com/2017/10/cloud-firestore-for-rtdb-developers.html (2018-03-10)

slide-26
SLIDE 26

Firebase

Firestore: Summary

  • More specific data selection
  • Logical AND for some filter combinations

… But:

  • Still limited expressiveness:
    • No logical OR
    • No logical AND for many filter combinations
    • No content-based search (regex, full-text search)
  • Still limited write throughput:
    • 500 writes/s per collection
    • 1 write/s per document

Firebase, Firestore: Quotas and Limits (2018) https://firebase.google.com/docs/firestore/quotas (2018-03-10)

slide-27
SLIDE 27

Honorable Mentions

Other Systems With Real-Time Features

slide-28
SLIDE 28

Outline

Push-Based Data Access: Why Real-Time Databases?

Real-Time Databases: System survey

Discussion: What are the bottlenecks?

  • System Classification:
    • Databases
    • Real-Time Databases
    • Stream Management
    • Stream Processing
  • Side-by-Side Comparison

Future Directions: Scalability & Use Cases

slide-29
SLIDE 29

Wrapup & Discussion

slide-30
SLIDE 30

Data Management Overview

DBMS vs. Real-Time DB vs. Stream Management

[Figure: spectrum from pull-based to push-based]

  • Database Management: static collections
  • Real-Time Databases: evolving collections
  • Data Stream Management: persistent/ephemeral streams

slide-31
SLIDE 31

Real-Time Database Comparison

Column groups (scaling mechanism): Poll-and-Diff, Log Tailing, Unknown, 2-D Partitioning

Marks per row, left to right (✓ = yes, ✗ = no); parenthesized notes are cell annotations:

  • Write Scalability: ✓ ✗ ✗ ✗ ✗ ✓
  • Read Scalability: ✗ ✓ ✓ ✓ ? (100k connections)
  • Composite Filters (AND/OR): ✓ ✓ ✓ ✓ (AND in Firestore)
  • Sorted Queries: ✓ ✓ ✓ ✗ (single attribute)
  • Limit: ✓ ✓ ✓ ✗ ✓ ✓
  • Offset: ✓ ✓ ✗ ✗ (value-based)
  • Self-Maintaining Queries: ✓ ✓ ✗ ✗ ✗ ✓
  • Event Stream Queries: ✓ ✓ ✓ ✓ ✓ ✓

slide-32
SLIDE 32

Outline

Push-Based Data Access: Why Real-Time Databases?

Real-Time Databases: System survey

Discussion: What are the bottlenecks?

Future Directions: Scalability & Use Cases

  • Performance & Scalability
  • Query Expressiveness
  • Use Cases
  • Real-Time Apps
  • Query Caching
  • Summary

slide-33
SLIDE 33

Making Real-Time Databases Scale

slide-34
SLIDE 34

Baqend Real-Time Queries

Real-Time Decoupled

[Figure labels: App Server, Pub-Sub, Pub-Sub, "Keeps data up-to-date!"]

slide-35
SLIDE 35

Baqend Real-Time Queries

Filter Queries: Distributed Query Matching

Two-dimensional partitioning:

  • by Query
  • by Object

→ scales with queries and writes

Implementation:

  • Apache Storm
  • Topology in Java
  • MongoDB query language
  • Pluggable query engine

[Figure labels: Subscription!, Write op!, Match!]
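
One way to picture two-dimensional partitioning is a grid of matching nodes. The following is a sketch under assumptions, not Baqend's Storm topology: the grid size, the `hash` function, and all function names are hypothetical. Rows stand for query partitions and columns for object partitions, so each node only holds a subset of the queries and only sees a subset of the writes.

```javascript
// Hypothetical grid dimensions.
const QUERY_PARTITIONS = 3;  // rows
const OBJECT_PARTITIONS = 4; // columns

// Simple deterministic string hash (illustrative only).
function hash(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

// A subscription is registered on every node of one row.
function nodesForQuery(queryId) {
  const row = hash(queryId) % QUERY_PARTITIONS;
  return Array.from({ length: OBJECT_PARTITIONS }, (_, col) => [row, col]);
}

// A write is sent to every node of one column.
function nodesForWrite(objectId) {
  const col = hash(objectId) % OBJECT_PARTITIONS;
  return Array.from({ length: QUERY_PARTITIONS }, (_, row) => [row, col]);
}

// Exactly one node sees both a given query and a given object.
function matchingNode(queryId, objectId) {
  return [hash(queryId) % QUERY_PARTITIONS, hash(objectId) % OBJECT_PARTITIONS];
}
```

Under this scheme, adding rows spreads the queries over more nodes and adding columns spreads the writes over more nodes, which is the intuition behind "scales with queries and writes".
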

slide-36
SLIDE 36

Baqend Real-Time Queries

Staged Real-Time Query Processing

Change notifications go through up to 4 query processing stages:

  1. Filter queries: track matching status → before- and after-images
  2. Sorted queries: maintain result order
  3. Joins: combine maintained results
  4. Aggregations: maintain aggregations

[Figure labels: Filtering, Ordering, Joins, Aggregation; each stage emits an "Event!"]
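
The first stage can be sketched as a comparison of matching status before and after a write. This is a minimal illustration, assuming a query is a predicate function and that a missing image is represented by `null`; `filterStage` and the event names are hypothetical.

```javascript
// Derive the change event for one query from the before-image and the
// after-image of a written object. Returns null if the write is
// irrelevant to the query.
function filterStage(predicate, beforeImage, afterImage) {
  const wasMatch = beforeImage !== null && predicate(beforeImage);
  const isMatch = afterImage !== null && predicate(afterImage);
  if (!wasMatch && isMatch) return 'add';
  if (wasMatch && isMatch) return 'change';
  if (wasMatch && !isMatch) return 'remove';
  return null;
}

// Example query: products that are in stock.
const inStock = product => product.stock > 0;
const soldOut = filterStage(inStock, { stock: 3 }, { stock: 0 });
```

A product whose stock drops from 3 to 0 leaves the result, so `soldOut` is `'remove'`. Later stages (ordering, joins, aggregations) would consume such events rather than raw writes.
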

slide-37
SLIDE 37

Baqend Real-Time Queries

Low Latency + Linear Scalability

[Figure panels: Linear Scalability, Stable Latency Distribution]

Quaestor: Query Web Caching for Database-as-a-Service Providers. VLDB '17

slide-38
SLIDE 38

Programming Real-Time Queries

JavaScript API

    var query = DB.Tweet.find()
      .matches('text', /my filter/)
      .descending('createdAt')
      .offset(20)
      .limit(10);

    // Static Query
    query.resultList(result => ...);

    // Real-Time Query
    query.resultStream(result => ...);

slide-39
SLIDE 39
slide-40
SLIDE 40

Problem: Slow Websites

Two Bottlenecks: Latency and Processing

[Figure labels: High Latency, Processing Overhead]

slide-41
SLIDE 41

Solution: Global Caching

Fresh Data From Distributed Web Caches

[Figure labels: Low Latency, Less Processing]

slide-42
SLIDE 42

New Caching Algorithms

Solve Consistency Problem

[Figure: bit array 1 0 1 1 0 0 1]

slide-43
SLIDE 43

InvaliDB

Invalidating DB Queries

How to detect changes to query results:
"Give me the most popular products that are in stock."

[Figure labels: Add, Change, Remove]

slide-44
SLIDE 44

Summary

Real-Time Databases: Major Challenges

  • Scalability:
    • Handle increasing throughput
    • Handle additional queries
  • Expressiveness:
    • Content-based search? Composite filters?
    • Ordering? Limit? Offset?
  • Legacy support:
    • Real-time queries for existing databases?
    • Decouple OLTP from real-time workloads?

slide-45
SLIDE 45

Our Related Publications

Book, Papers, Articles & Tutorials:

  • Real-Time & Stream Data Management: Push-Based Data in Research & Practice. Springer 2019
  • Real-Time Data Management for Big Data. EDBT 2018
  • Scalable Push-Based Real-Time Queries on Top of Pull-Based Databases. PhD thesis, Wolfram Wingerath, 2018
  • Low Latency for Cloud Data Management. PhD thesis, Felix Gessert, 2018
  • Quaestor: Query Web Caching for Database-as-a-Service Providers. VLDB '17
  • NoSQL Database Systems: A Survey and Decision Guidance. SummerSOC '16
  • Real-time stream processing for Big Data. it - Information Technology 58 (2016)
  • The Case For Change Notifications in Pull-Based Databases. BTW '17

Blog Posts:

  • Real-Time Databases Explained: Why Meteor, RethinkDB, Parse and Firebase Don't Scale. Baqend Tech Blog (2017): https://medium.com/p/822ff87d2f87

Learn more at blog.baqend.com!

slide-46
SLIDE 46

Thank you

@baqendcom

wingerath@informatik.uni-hamburg.de

Blog: blog.baqend.com

Slides: slides.baqend.com