The Case for Change Notifications in Pull-Based Databases

SLIDE 1

The Case for Change Notifications in Pull-Based Databases

Wolfram Wingerath, Felix Gessert, Steffen Friedrich, Erik Witt and Norbert Ritter

Wolfram Wingerath
wingerath@informatik.uni-hamburg.de
March 6th, 2017, Stuttgart
SLIDE 2

Traditional Databases

No Request? No Data!

Query maintenance: periodic polling → inefficient → slow

What's the current state?
SLIDE 3

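Ideal: Push-Based Data Access

Self-Maintaining Results

Find people in Room B:

```javascript
db.User.find()
  .equal('room', 'B')  // people in room B
  .ascending('name')   // ordered by name
  .limit(3)            // first three entries
  .streamResult()      // subscribe to a result that is kept up to date
```

Figure: rooms A, B and C on an x/y plane with people in them; the ranked result (1., 2., 3.) lists entries such as Wolle (22/8) and Erik (5/10).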
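The self-maintaining result described above can be illustrated with a minimal in-memory sketch. It is not the SDK's implementation: the class `SelfMaintainingQuery` and its `apply` method are hypothetical, and the sketch assumes that every write delivers the object's full new state (or `null` for a delete).

```javascript
// Hypothetical sketch of a self-maintaining query result (not an SDK API).
// It keeps every matching object and derives the sorted, limited result from them.
class SelfMaintainingQuery {
  constructor(predicate, compare, limit) {
    this.predicate = predicate; // matching condition, e.g. room === 'B'
    this.compare = compare;     // sort order, e.g. ascending by name
    this.limit = limit;         // size of the visible result, e.g. 3
    this.matches = new Map();   // id -> object for ALL matching objects
  }

  // Apply one write: `after` is the new object, or null if it was deleted.
  apply(id, after) {
    if (after !== null && this.predicate(after)) {
      this.matches.set(id, after);
    } else {
      this.matches.delete(id); // no longer (or never) matching
    }
    return this.result();
  }

  // Current result: all matches, sorted, truncated to the limit.
  result() {
    return [...this.matches.values()].sort(this.compare).slice(0, this.limit);
  }
}

// Usage: people in room 'B', ascending by name, limit 3 (illustrative data).
const inRoomB = new SelfMaintainingQuery(
  (person) => person.room === "B",
  (a, b) => a.name.localeCompare(b.name),
  3
);
inRoomB.apply(1, { name: "Wolle", room: "B" });
inRoomB.apply(2, { name: "Erik", room: "B" });
inRoomB.apply(3, { name: "Anna", room: "A" });
inRoomB.apply(4, { name: "Bea", room: "B" });  // result: Bea, Erik, Wolle
inRoomB.apply(2, { name: "Erik", room: "A" }); // Erik leaves: Bea, Wolle
inRoomB.apply(3, { name: "Anna", room: "B" }); // Anna enters: Anna, Bea, Wolle
```

Keeping all matches, not just the top three, is what lets the result refill itself when an entry leaves; re-sorting on every write keeps the sketch short, whereas a real system would maintain the order incrementally.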
SLIDE 4

Real-Time Databases

SLIDE 5

Firebase

Overview:
  • Real-time state synchronization across devices
  • Simplistic data model: nested hierarchy of lists and objects
  • Simplistic queries: mostly navigation/filtering
  • Fully managed, proprietary
  • App SDK for app development, mobile-first
  • Google services integration: analytics, hosting, authorization, …

History:
  • 2011: chat service startup Envolve is founded
    → was often used for cross-device state synchronization
    → state synchronization is separated (Firebase)
  • 2012: Firebase is founded
  • 2013: Firebase is acquired by Google
SLIDE 6

Firebase

Real-Time State Synchronization

Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)

  • Tree data model: application state ≈ JSON object
  • Subtree synching: push notifications for specific keys only
    → flat structure for fine granularity
    → limited expressiveness!
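Subtree synching can be sketched with a tree store whose listeners subscribe to paths. This is a hypothetical illustration, not the Firebase SDK: the class `TreeStore` and its `set`, `get` and `on` methods are assumptions made for the example.

```javascript
// Hypothetical sketch of subtree synchronization in a JSON-like tree
// (not the Firebase SDK). A listener on a path is notified when a write
// touches that path, one of its ancestors, or one of its descendants.
class TreeStore {
  constructor() {
    this.root = {};
    this.listeners = []; // entries: { keys: string[], callback }
  }

  static keysOf(path) {
    return path.split("/").filter((key) => key.length > 0);
  }

  get(path) {
    let node = this.root;
    for (const key of TreeStore.keysOf(path)) {
      if (node === null || typeof node !== "object" || !(key in node)) return undefined;
      node = node[key];
    }
    return node;
  }

  set(path, value) {
    const keys = TreeStore.keysOf(path);
    if (keys.length === 0) throw new Error("cannot overwrite the root in this sketch");
    let node = this.root;
    for (const key of keys.slice(0, -1)) {
      if (node[key] === null || typeof node[key] !== "object") node[key] = {};
      node = node[key];
    }
    node[keys[keys.length - 1]] = value;

    // Notify listeners whose path agrees with the written path on the shorter
    // of the two lengths (ancestor, same node, or descendant).
    for (const { keys: listenKeys, callback } of this.listeners) {
      const n = Math.min(listenKeys.length, keys.length);
      if (listenKeys.slice(0, n).every((key, i) => key === keys[i])) {
        callback(this.get("/" + listenKeys.join("/")));
      }
    }
  }

  on(path, callback) {
    this.listeners.push({ keys: TreeStore.keysOf(path), callback });
  }
}

// Usage: only writes inside /rooms/B reach the listener.
const store = new TreeStore();
const events = [];
store.on("/rooms/B", (snapshot) => events.push(snapshot));
store.set("/rooms/B/wolle", { x: 22, y: 8 });
store.set("/rooms/A/erik", { x: 5, y: 10 }); // other subtree: no event
```

A subscription can only be expressed as a path: there is no way to ask for, say, all people with x > 10 across all rooms. Flattening the tree makes paths finer-grained, but it does not add query power.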
SLIDE 7

Firebase

Query Processing in the Client

Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)

  • Push notifications for specific keys only
  • Order by a single attribute
  • Apply a single filter on that attribute
  • Non-trivial query processing in the client
    → does not scale!

Jacob Wenger, on the Firebase Google Group (2015) https://groups.google.com/forum/#!topic/firebase-talk/d-XjaBVL2Ko (2017-02-27)
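The consequence of "order by a single attribute, filter on that attribute only" can be emulated in a few lines. The function `serverQuery` is hypothetical and only models the restriction described above; it is not the Firebase API, and the todo data is illustrative.

```javascript
// Hypothetical emulation of the restriction (not the Firebase SDK): the server
// orders by ONE attribute and applies a range filter on THAT attribute only.
function serverQuery(items, orderKey, { startAt, endAt } = {}) {
  return items
    .filter((item) =>
      (startAt === undefined || item[orderKey] >= startAt) &&
      (endAt === undefined || item[orderKey] <= endAt))
    .sort((a, b) => (a[orderKey] < b[orderKey] ? -1 : a[orderKey] > b[orderKey] ? 1 : 0));
}

// Illustrative data.
const todos = [
  { name: "report", creator: "Bob", status: "active", deadline: 3 },
  { name: "slides", creator: "Bob", status: "done", deadline: 1 },
  { name: "review", creator: "Ann", status: "active", deadline: 2 },
];

// "Todos created by Bob AND with status active, ordered by deadline":
// the server can only evaluate creator === "Bob" ...
const transferred = serverQuery(todos, "creator", { startAt: "Bob", endAt: "Bob" });
// ... so the second condition and the ordering run in the client.
const result = transferred
  .filter((todo) => todo.status === "active")
  .sort((a, b) => a.deadline - b.deadline);
```

Every object that fails the client-side condition is still transferred and processed on the device, and that work repeats on every update; with large or fast-changing collections, this is the part that does not scale.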
SLIDE 8

Meteor

Overview:
  • JavaScript framework for interactive apps and websites
    • MongoDB under the hood
    • Real-time result updates, full MongoDB expressiveness
  • Open-source: MIT license
  • Managed service: Galaxy (Platform-as-a-Service)

History:
  • 2011: Skybreak is announced
  • 2012: Skybreak is renamed to Meteor
  • 2015: Managed hosting service Galaxy is announced
SLIDE 9

Live Queries

Poll-and-Diff

  • Change monitoring: app servers detect relevant changes
    → incomplete in multi-server deployment
  • Poll-and-diff: queries are re-executed periodically
    → staleness window
    → does not scale with queries

Figure: each app server forwards CRUD operations to the database and monitors its own incoming writes; in addition, it polls the database every 10 seconds.
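Poll-and-diff, as described above, can be sketched as a periodic re-execution followed by a comparison of the old and new results. This is a hypothetical illustration, not Meteor's implementation; `executeQuery` is an assumed async function that returns the full query result as objects with an `id` field.

```javascript
// Hypothetical sketch of poll-and-diff (not Meteor's implementation).
// Compare two query results by id and derive change events.
function diffResults(previous, current) {
  const before = new Map(previous.map((doc) => [doc.id, doc]));
  const after = new Map(current.map((doc) => [doc.id, doc]));
  const events = [];
  for (const [id, doc] of after) {
    if (!before.has(id)) events.push({ type: "add", id });
    // JSON.stringify is a cheap deep comparison; it is sensitive to key order.
    else if (JSON.stringify(before.get(id)) !== JSON.stringify(doc)) events.push({ type: "change", id });
  }
  for (const id of before.keys()) {
    if (!after.has(id)) events.push({ type: "remove", id });
  }
  return events;
}

// Re-execute the query every intervalMs. Changes made between two polls stay
// invisible until the next poll (staleness window), and every registered
// query costs one full re-execution per interval.
function pollAndDiff(executeQuery, onEvents, intervalMs = 10000) {
  let previous = [];
  let running = false;
  const timer = setInterval(async () => {
    if (running) return; // skip a tick while the previous poll is in flight
    running = true;
    try {
      const current = await executeQuery();
      const events = diffResults(previous, current);
      previous = current;
      if (events.length > 0) onEvents(events);
    } catch (err) {
      console.error("poll failed:", err);
    } finally {
      running = false;
    }
  }, intervalMs);
  return () => clearInterval(timer); // call to stop polling
}
```

The default interval of 10 seconds mirrors the figure; shortening it narrows the staleness window but multiplies the load on the database by the number of live queries.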
SLIDE 10

Oplog Tailing

Basics: MongoDB Replication

  • Oplog: rolling record of data modifications
  • Master-slave replication: secondaries subscribe to the oplog

Figure: MongoDB cluster with 3 shards (primaries A, B and C); a write operation is applied at primary C, and the change is propagated to its secondaries C1, C2 and C3, which apply it.
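The rolling oplog and the secondaries that tail it can be modeled as a capped log plus a per-secondary read position. This is a simplified, hypothetical sketch: the entry fields do not reproduce MongoDB's oplog format, and, unlike real oplog update entries, every non-delete entry here carries the full document.

```javascript
// Hypothetical sketch of a rolling oplog tailed by secondaries. Entry fields
// are illustrative; they do not reproduce MongoDB's oplog format, and here
// every non-delete entry carries the full document.
class Oplog {
  constructor(capacity) {
    this.capacity = capacity; // rolling: only the newest `capacity` entries are kept
    this.entries = [];        // { ts, op, id, doc }, ordered by ts
    this.nextTs = 1;
  }

  append(op, id, doc) {
    this.entries.push({ ts: this.nextTs++, op, id, doc });
    if (this.entries.length > this.capacity) this.entries.shift(); // drop the oldest
  }

  // All entries after position `ts`; fails if needed entries have rolled off.
  since(ts) {
    const oldest = this.entries.length > 0 ? this.entries[0].ts : this.nextTs;
    if (ts + 1 < oldest) throw new Error("position rolled off the oplog: full resync needed");
    return this.entries.filter((entry) => entry.ts > ts);
  }
}

class Secondary {
  constructor() {
    this.data = new Map(); // id -> document
    this.lastTs = 0;       // position of the last applied entry
  }

  // Pull new entries and apply them in order.
  tail(oplog) {
    for (const entry of oplog.since(this.lastTs)) {
      if (entry.op === "delete") this.data.delete(entry.id);
      else this.data.set(entry.id, entry.doc);
      this.lastTs = entry.ts;
    }
  }
}

// Usage: a capped oplog with room for 3 entries (illustrative).
const oplog = new Oplog(3);
const secondary = new Secondary();
oplog.append("insert", "a", { score: 1 });
oplog.append("update", "a", { score: 2 });
secondary.tail(oplog);               // applies ts 1 and 2
oplog.append("insert", "b", { score: 3 });
oplog.append("insert", "c", { score: 4 });
oplog.append("delete", "a");         // ts 1 and 2 roll off the log
secondary.tail(oplog);               // still in range: applies ts 3 to 5
```

Because the log is rolling, a secondary that falls too far behind cannot catch up from the oplog alone and needs a full resync.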
SLIDE 11

Oplog Tailing

Tapping into the Oplog

  • Every Meteor server receives all DB writes through oplogs
    → does not scale

Figure: MongoDB cluster with 3 shards (primaries A, B and C); app servers send CRUD operations to the cluster, monitor the oplog broadcast, query the database when in doubt, and push relevant events to clients.

Bottleneck!
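Why broadcasting the oplog to every app server does not scale can be made concrete with a back-of-the-envelope model. The function `broadcastLoad` and the numbers below are hypothetical; the model only assumes that live queries are spread evenly over the app servers while every server receives every write.

```javascript
// Hypothetical back-of-the-envelope model of oplog broadcast: queries are
// spread over the app servers, but every server still receives every write.
function broadcastLoad(writesPerSecond, totalQueries, servers) {
  const queriesPerServer = totalQueries / servers;
  return {
    writesPerServer: writesPerSecond, // unchanged by adding servers
    evaluationsPerServer: writesPerSecond * queriesPerServer, // write-query checks per second
  };
}

// Illustrative numbers: 10,000 writes/s and 1,000 live queries.
const oneServer = broadcastLoad(10000, 1000, 1);
const tenServers = broadcastLoad(10000, 1000, 10);
```

Adding servers divides the matching work, but each server still has to ingest the full write stream, so the sustainable write rate of the whole deployment is capped by what a single server can consume.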
SLIDE 12

Oplog Tailing

Oplog Info is Incomplete

Baccarat players sorted by high score:
  1. { name: "Joy", game: "baccarat", score: 100 }
  2. { name: "Tim", game: "baccarat", score: 90 }
  3. { name: "Lee", game: "baccarat", score: 80 }

Partial update from oplog:

{ name: "Bobby", score: 500 } // game: ???

What game does Bobby play?
  → if baccarat, he takes first place!
  → if something else, nothing changes!
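The problem with Bobby's partial update can be expressed as three-valued matching: a filter evaluated against an incomplete document may be true, false, or undecidable. This is a hypothetical sketch; `matchPartial`, `resolveMatch` and the `fetchFullDocument` callback are illustrative names, and only equality filters are handled.

```javascript
// Hypothetical sketch: three-valued matching of an equality filter against a
// partial update such as { name: "Bobby", score: 500 }.
// Returns true, false, or "unknown" if a filtered field is missing.
function matchPartial(filter, partialDoc) {
  let missing = false;
  for (const [field, expected] of Object.entries(filter)) {
    if (!(field in partialDoc)) missing = true;
    else if (partialDoc[field] !== expected) return false; // definitely no match
  }
  return missing ? "unknown" : true;
}

// If undecided, read the full document ("query when in doubt").
// `fetchFullDocument` is an assumed function, e.g. a read from the primary;
// a field missing from the full document counts as no match.
function resolveMatch(filter, partialDoc, fetchFullDocument) {
  const status = matchPartial(filter, partialDoc);
  return status === "unknown" ? matchPartial(filter, fetchFullDocument()) === true : status;
}

// Usage with the example above.
const baccarat = { game: "baccarat" };
const bobby = { name: "Bobby", score: 500 };
const undecided = matchPartial(baccarat, bobby); // "unknown"
const decided = resolveMatch(baccarat, bobby, () => ({ name: "Bobby", game: "baccarat", score: 500 })); // true
```

Every "unknown" costs an extra database read, which is exactly the "query (when in doubt)" path in the oplog tailing figure.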
SLIDE 13

RethinkDB

Overview:
  • "MongoDB done right": comparable queries and data model, but also:
    • Push-based queries (filters only)
    • Joins (non-streaming)
    • Strong consistency: linearizability
  • JavaScript SDK (Horizon): open-source, as managed service
  • Open-source: Apache 2.0 license

History:
  • 2009: RethinkDB is founded
  • 2012: RethinkDB is open-sourced under AGPL
  • 2016, May: first official release of Horizon (JavaScript SDK)
  • 2016, October: RethinkDB announces shutdown
  • 2017: RethinkDB is relicensed under Apache 2.0
SLIDE 14

RethinkDB

Changefeed Architecture

William Stein, RethinkDB versus PostgreSQL: my personal experience (2017) http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.html (2017-02-27)

Figure: app servers connect to RethinkDB proxies, which sit in front of the RethinkDB storage cluster.

  • Range-sharded data
  • RethinkDB proxy: support node without data
    • Client communication
    • Request routing
    • Real-time query matching
  • Every proxy receives all database writes
    → does not scale

Bottleneck!

Daniel Mewes, Comment on GitHub issue #962: Consider adding more docs on RethinkDB Proxy (2016) https://github.com/rethinkdb/docs/issues/962 (2017-02-27)
SLIDE 15

Parse

Overview:
  • Backend-as-a-Service for mobile apps
    • MongoDB: largest deployment world-wide
    • Easy development: great docs, push notifications, authentication, …
    • Real-time updates for most MongoDB queries
  • Open-source: BSD license
  • Managed service: discontinued

History:
  • 2011: Parse is founded
  • 2013: Parse is acquired by Facebook
  • 2015: more than 500,000 mobile apps reported on Parse
  • 2016, January: Parse shutdown is announced
  • 2016, March: Live Queries are announced
  • 2017: Parse shutdown is finalized
SLIDE 16

Parse

LiveQuery Architecture

Illustration taken from: http://parseplatform.github.io/docs/parse-server/guide/#live-queries (2017-02-22)

  • LiveQuery Server: no data, real-time query matching
  • Every LiveQuery Server receives all database writes
    → does not scale

Bottleneck!
SLIDE 17

Comparison by Real-Time Query

Why Complexity Matters

Queries with increasingly complex matching conditions and ordering, compared for Firebase, Meteor, RethinkDB and Parse:
  • Todos created by "Bob", ordered by deadline
  • Todos created by "Bob" AND with status equal to "active"
  • Todos with "work" in the name
  • Todos with "work" in the name, ordered by deadline
  • Todos with "work" in the name AND status of "active", ordered by deadline AND then by the creator's name
SLIDE 18

Quick Comparison

DBMS vs. RT DB vs. DSMS vs. Stream Processing

| | Database Management | Real-Time Databases | Data Stream Management | Stream Processing |
|---|---|---|---|---|
| Data | persistent collections | persistent collections | persistent/ephemeral streams | persistent/ephemeral streams |
| Processing | one-time | one-time + continuous | continuous | continuous |
| Access | random | random + sequential | sequential | sequential |
| Streams | structured | structured | structured | structured, unstructured |
SLIDE 19

Discussion

Common Issues

Every database with real-time features suffers from several of these problems:
  • Expressiveness:
    • Queries
    • Data model
    • Legacy support
  • Performance:
    • Latency & throughput
    • Scalability
  • Robustness:
    • Fault-tolerance, handling malicious behavior etc.
  • Separation of concerns:
    → Availability: will a crashing real-time subsystem take down primary data storage?
    → Consistency: can real-time be scaled out independently from primary storage?
SLIDE 20

Engineering Efforts:

Add-On Real-Time Queries

SLIDE 21

InvaliDB

External Query Maintenance

Figure: InvaliDB maintaining queries outside the database, connected through two Pub-Sub components.
SLIDE 22

InvaliDB

Change Notifications

Query: SELECT * FROM posts WHERE title LIKE '%NoSQL%' ORDER BY year DESC

Example object: { title: "SQL", year: 2016 }

Notification types: add, changeIndex, change, remove
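The four notification types can be derived from an object's position in the sorted result before and after a write. This is a hypothetical sketch, not InvaliDB's implementation: it keeps the whole result in memory, re-sorts it on every write, breaks ties by a numeric `id`, and approximates `LIKE '%NoSQL%'` with a case-sensitive `includes`.

```javascript
// Hypothetical sketch (not InvaliDB's implementation): classify a write for the
// sorted query  title LIKE '%NoSQL%' ORDER BY year DESC  (ties broken by id).
const matches = (post) => post !== null && post.title.includes("NoSQL"); // case-sensitive here
const byYearDesc = (a, b) => b.year - a.year || a.id - b.id;

// `result` holds the matching posts in order; `after` is the new state of
// post `id`, or null if it was deleted.
function applyWrite(result, id, after) {
  const oldIndex = result.findIndex((post) => post.id === id);
  const next = result.filter((post) => post.id !== id);
  if (matches(after)) next.push(after);
  next.sort(byYearDesc);
  const newIndex = next.findIndex((post) => post.id === id);

  let type = null; // null: the write does not affect this query
  if (oldIndex === -1 && newIndex !== -1) type = "add";
  else if (oldIndex !== -1 && newIndex === -1) type = "remove";
  else if (oldIndex !== -1 && oldIndex !== newIndex) type = "changeIndex";
  else if (oldIndex !== -1) type = "change";
  return { result: next, type };
}

// Usage: a sequence of writes and the notifications they trigger (illustrative).
const types = [];
let result = [];
for (const [id, after] of [
  [1, { id: 1, title: "NoSQL Databases", year: 2015 }],          // add
  [2, { id: 2, title: "NoSQL Survey", year: 2017 }],             // add (to the top)
  [1, { id: 1, title: "NoSQL Databases", year: 2018 }],          // changeIndex (1 → 0)
  [2, { id: 2, title: "SQL", year: 2016 }],                      // remove
  [1, { id: 1, title: "NoSQL Databases, 2nd ed.", year: 2018 }], // change
]) {
  const outcome = applyWrite(result, id, after);
  result = outcome.result;
  types.push(outcome.type);
}
```

The fourth write turns post 2 into the example object { title: "SQL", year: 2016 }: it no longer contains "NoSQL", so subscribers receive a remove notification.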
SLIDE 23

InvaliDB

Filter Queries: Distributed Query Matching

Two-dimensional partitioning:
  • by query
  • by object
→ scales with queries and writes

Implementation:
  • Apache Storm
  • Topology in Java
  • MongoDB query language
  • Pluggable query engine

Figure: an incoming write operation ("Write op!") is matched against the partitioned queries ("Match!").
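One way to realize two-dimensional partitioning is a grid of matching nodes: node (q, o) holds the queries of query partition q and checks the writes of object partition o, so every (query, object) pair meets on exactly one node. The sketch below is a hypothetical, single-process illustration of that idea (`MatchingGrid` and the string hash are assumptions), not InvaliDB's Storm topology.

```javascript
// Hypothetical sketch of two-dimensional partitioning for query matching.
// Node (q, o) stores the queries of query partition q and matches them
// against writes of object partition o.
function hash(str) {
  let h = 0;
  for (const ch of str) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

class MatchingGrid {
  constructor(queryPartitions, objectPartitions) {
    this.Q = queryPartitions;
    this.O = objectPartitions;
    // nodes[q][o] = { queries: Map<queryId, predicate>, evaluations: number }
    this.nodes = Array.from({ length: this.Q }, () =>
      Array.from({ length: this.O }, () => ({ queries: new Map(), evaluations: 0 }))
    );
  }

  // A subscription is replicated to all O nodes of its query partition.
  subscribe(queryId, predicate) {
    const q = hash(queryId) % this.Q;
    for (const node of this.nodes[q]) node.queries.set(queryId, predicate);
  }

  // A write is routed to all Q nodes of its object partition; each query is
  // therefore evaluated exactly once. Returns the ids of matching queries.
  write(objectId, doc) {
    const o = hash(objectId) % this.O;
    const matched = [];
    for (let q = 0; q < this.Q; q++) {
      const node = this.nodes[q][o];
      for (const [queryId, predicate] of node.queries) {
        node.evaluations++;
        if (predicate(doc)) matched.push(queryId);
      }
    }
    return matched;
  }
}
```

Each query is stored O times and each write visits Q nodes, so per-node matching work is roughly (queries / Q) × (writes / O): query partitions can be added as the number of queries grows, and object partitions as the write rate grows.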
SLIDE 24

InvaliDB

Staged Real-Time Query Processing

Change notifications go through up to 4 query processing stages:
  1. Filter queries: track matching status → before- and after-images
  2. Sorted queries: maintain result order
  3. Joins: combine maintained results
  4. Aggregations: maintain aggregations

Figure: processing pipeline Filtering → Ordering → Joins → Aggregation, with each stage emitting events to the next.
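The first and last stages can be sketched as composable functions: the filter stage compares the matching status of the before- and after-image, and an aggregation stage updates a running count and sum from the resulting events. This is a hypothetical illustration that skips the sorting and join stages; `filterStage` and `aggregationStage` are assumed names, and the player data reuses the baccarat example.

```javascript
// Hypothetical two-stage pipeline (not InvaliDB's implementation).
// Stage 1 (filter): compare the matching status of the before- and after-image.
function filterStage(predicate) {
  return (before, after) => {
    const was = before !== null && predicate(before);
    const is = after !== null && predicate(after);
    if (!was && is) return { type: "add", after };
    if (was && !is) return { type: "remove", before };
    if (was && is) return { type: "change", before, after };
    return null; // write is irrelevant to this query
  };
}

// Stage 4 (aggregation): maintain count and sum of `field` from filter events,
// without re-reading the matching objects.
function aggregationStage(field) {
  const state = { count: 0, sum: 0 };
  return (event) => {
    if (event?.type === "add") { state.count += 1; state.sum += event.after[field]; }
    if (event?.type === "remove") { state.count -= 1; state.sum -= event.before[field]; }
    if (event?.type === "change") { state.sum += event.after[field] - event.before[field]; }
    return { ...state };
  };
}

// Usage: number of baccarat players and their total score (illustrative data).
const isBaccarat = filterStage((player) => player.game === "baccarat");
const scoreStats = aggregationStage("score");
const onWrite = (before, after) => scoreStats(isBaccarat(before, after));

onWrite(null, { name: "Joy", game: "baccarat", score: 100 });
onWrite(null, { name: "Tim", game: "baccarat", score: 90 });
onWrite(null, { name: "Ann", game: "poker", score: 70 });  // filtered out
onWrite({ name: "Tim", game: "baccarat", score: 90 },
        { name: "Tim", game: "baccarat", score: 95 });      // change
const stats = onWrite({ name: "Joy", game: "baccarat", score: 100 },
                      { name: "Joy", game: "poker", score: 100 }); // leaves the result
```

The aggregation never touches the underlying data; it relies entirely on the before- and after-images delivered by the filter stage, which is why tracking both images in stage 1 matters.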
SLIDE 25

InvaliDB

Low Latency + Linear Scalability
SLIDE 26

Research in Hamburg

SLIDE 27

Delivering Dynamic Content

Two Bottlenecks: Latency and Processing

Figure: high latency and processing time.
SLIDE 28

Solution: Global Caching

Fresh Data from Ubiquitous Web Caches

Low latency, less processing
SLIDE 29

Caching Dynamic Content

Now Feasible: Invalidating Updated Queries
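Invalidating updated queries can be sketched as a query-result cache whose entries are dropped whenever a change notification arrives for that query. This is a minimal, hypothetical sketch: `QueryCache` is an assumed name, `executeQuery` is synchronous for brevity, and it models a single local cache rather than globally distributed web caches, without expiration or handling of races between reads and notifications.

```javascript
// Hypothetical sketch: a cache for query results whose entries are dropped
// when a change notification reports that the query's result changed.
class QueryCache {
  constructor(executeQuery) {
    this.executeQuery = executeQuery; // assumed function: queryKey -> result
    this.entries = new Map();         // queryKey -> cached result
  }

  get(queryKey) {
    if (!this.entries.has(queryKey)) {
      this.entries.set(queryKey, this.executeQuery(queryKey)); // miss: run the query
    }
    return this.entries.get(queryKey);
  }

  // Wire this to the notification stream (add, change, changeIndex, remove).
  onChangeNotification(queryKey) {
    this.entries.delete(queryKey); // the next read fetches a fresh result
  }
}

// Usage with an illustrative in-memory "database".
let executions = 0;
const posts = [{ title: "NoSQL", year: 2016 }];
const cache = new QueryCache(() => {
  executions += 1;
  return posts.filter((post) => post.title.includes("NoSQL"));
});
cache.get("nosql-posts");
cache.get("nosql-posts");                  // served from the cache
posts.push({ title: "NoSQL Survey", year: 2017 });
cache.onChangeNotification("nosql-posts"); // e.g. an "add" notification
const fresh = cache.get("nosql-posts");
```

Without change notifications, a cache for query results could only guess when an entry becomes stale; with them, only the queries whose results actually changed are invalidated.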
SLIDE 30

Wrap-up

  • Push-based data access
    • Natural for many applications
    • Hard to implement on top of traditional (pull-based) databases
  • Real-time databases
    • Natively push-based
    • Not legacy-compatible
    • Barely scalable
  • InvaliDB
    • Add-on push-based queries
    • Database-independent
    • Linear scalability
    • Filter, sorting, joins, aggregations
SLIDE 31

Questions?