Push vs. Pull
The Future of Real-Time Databases in the Cloud
Wolfram Wingerath
ww@baqend.com
December 10, SCDM 2018, Seattle
www.baqend.com
About me: Wolfram Wingerath
PhD Thesis & Research
Research:
Distributed Systems Engineer
Practice: Backend-as-a-Service, Web Caching, Real-Time Database, …
Traditional Databases
Rescue!
Push-Based Data Access: Why Real-Time Databases?
Real-Time Databases: System survey
Discussion: What are the bottlenecks?
Future Directions: Scalability & Use Cases
Hot Topics Through the Ages (timeline, 1970 to today)
Eras: Relational Databases, Active Databases, CEP & Streams, Big Data & NoSQL, Stream Processing, Real-Time Databases
Milestones: Relational Model, Ingres, System R, Triggers, Entity-Relationship Model, SQL Standard, PostgreSQL, HiPAC, Starburst, Rapide, STREAM, Telegraph, Aurora & Borealis, GFS, MapReduce, Bigtable, Dynamo, Spark, Storm, Flink, Samza, RethinkDB, Meteor, Firebase, Baqend
What’s the current state?
Periodic polling for query result maintenance:
→ inefficient
→ slow
Real-time queries for query result maintenance:
→ efficient
→ fast
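To make the pull/push contrast concrete, here is a minimal sketch using a hypothetical in-memory `Store` (not any real database API). The pull-based client pays for a full query on every poll and sees stale data in between; the push-based client registers its query once and is called back on each matching write. The push path is deliberately simplified and only reports writes that match, not removals.

```javascript
// Hypothetical in-memory store, only for illustrating pull vs. push.
class Store {
  constructor() {
    this.docs = new Map();     // id -> document
    this.subscribers = [];     // push-based listeners: { predicate, onMatch }
    this.queriesExecuted = 0;  // full query evaluations triggered by pulls
  }

  // Pull: evaluate the query against all documents right now.
  query(predicate) {
    this.queriesExecuted++;
    return [...this.docs.values()].filter(predicate);
  }

  // Push: register the query once; the store calls back on matching writes.
  subscribe(predicate, onMatch) {
    this.subscribers.push({ predicate, onMatch });
  }

  write(doc) {
    this.docs.set(doc.id, doc);
    // Simplification: only writes that match are pushed (no removals).
    for (const { predicate, onMatch } of this.subscribers) {
      if (predicate(doc)) onMatch(doc);
    }
  }
}

const store = new Store();
const isActive = (doc) => doc.active;

// Pull-based client: a timer would normally repeat this periodically.
let polledResult = store.query(isActive);

// Push-based client: learns about each matching write immediately.
const pushedIds = [];
store.subscribe(isActive, (doc) => pushedIds.push(doc.id));

store.write({ id: "a", active: true });
// pushedIds already contains "a", while polledResult is still stale
// until the next poll executes the full query again.
const staleLength = polledResult.length;
polledResult = store.query(isActive);
```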
Matching Every Query Against Every Update
Potential bottlenecks:
Similar processing for:
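A naive real-time query engine checks every incoming write against every registered query, so the work per write grows with the number of queries and the total work with queries × writes. The sketch below is a hypothetical `NaiveMatcher` that counts its predicate evaluations to make that cost visible; the queries themselves are made up for illustration.

```javascript
// Naive real-time query matching: every incoming write is checked
// against every registered query (cost grows with queries x writes).
class NaiveMatcher {
  constructor() {
    this.queries = new Map(); // queryId -> predicate
    this.evaluations = 0;     // number of predicate checks performed
  }

  register(queryId, predicate) {
    this.queries.set(queryId, predicate);
  }

  // Returns the ids of all queries the written document matches.
  onWrite(doc) {
    const matched = [];
    for (const [queryId, predicate] of this.queries) {
      this.evaluations++;
      if (predicate(doc)) matched.push(queryId);
    }
    return matched;
  }
}

const matcher = new NaiveMatcher();
// 100 hypothetical queries: "score >= i".
for (let i = 0; i < 100; i++) {
  matcher.register(`q${i}`, (doc) => doc.score >= i);
}
// 50 writes; each one is evaluated against all 100 queries.
let lastMatches = [];
for (let w = 0; w < 50; w++) {
  lastMatches = matcher.onWrite({ id: `d${w}`, score: w });
}
```

Doubling either the number of queries or the write rate doubles the matching work, which is why this step is a natural bottleneck candidate.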
Real-Time Databases: System survey
Meteor
Overview:
JavaScript framework for interactive apps and websites
MongoDB under the hood
Real-time result updates, full MongoDB expressiveness
Managed service: Galaxy (Platform-as-a-Service)
History:
Change monitoring: app servers detect relevant changes → incomplete in multi-server deployment
Poll-and-diff: app servers repeat each query every 10 seconds → staleness window → does not scale with queries
(Figure: two app servers forward CRUD operations to the database; each monitors incoming writes and repeats its queries every 10 seconds.)
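Poll-and-diff can be sketched as: re-run the query, compare the new result with the previous one, and derive add/change/remove events from the difference. The functions `diffResults` and `pollOnce` below are hypothetical helpers, not Meteor's implementation; in a deployment, a timer (e.g. every 10 seconds) would call `pollOnce`, and anything that happens between two polls is only noticed at the next one.

```javascript
// Poll-and-diff sketch: re-run the query periodically and diff the new
// result against the previous one to derive change events.
function diffResults(previous, current) {
  const prevById = new Map(previous.map((d) => [d.id, d]));
  const currById = new Map(current.map((d) => [d.id, d]));
  const events = [];
  for (const [id, doc] of currById) {
    if (!prevById.has(id)) {
      events.push({ type: "add", id });
    } else if (JSON.stringify(prevById.get(id)) !== JSON.stringify(doc)) {
      // Shallow comparison via JSON; assumes a stable key order.
      events.push({ type: "change", id });
    }
  }
  for (const id of prevById.keys()) {
    if (!currById.has(id)) events.push({ type: "remove", id });
  }
  return events;
}

// One polling round: execute the query, diff, remember the new result.
function pollOnce(state, runQuery) {
  const current = runQuery();
  const events = diffResults(state.lastResult, current);
  state.lastResult = current;
  return events;
}

let table = [{ id: 1, name: "a" }, { id: 2, name: "b" }];
const state = { lastResult: [] };
const runQuery = () => table.filter((d) => d.name !== "x");

const firstEvents = pollOnce(state, runQuery);   // two "add" events
table = [{ id: 1, name: "A" }, { id: 3, name: "c" }];
const secondEvents = pollOnce(state, runQuery);  // change 1, add 3, remove 2
```

Every poll executes the full query even if nothing changed, so the load grows with the number of active queries.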
Master-slave replication: secondaries subscribe to the oplog
(Figure: MongoDB cluster with 3 shards (Primaries A, B, C); a write operation on Primary C is propagated as a change to Secondaries C1, C2, C3, which apply it.)
(Figure: app servers send CRUD operations to the MongoDB cluster (3 shards, Primaries A, B, C); the oplogs are broadcast to every app server, which monitors them, queries the database when in doubt, and pushes relevant events to its clients.)
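The oplog broadcast can be sketched as follows: every oplog entry is delivered to every app server, and each server checks it against the queries of its own clients. The `AppServer` class, `broadcast` function, and the `{ op, doc }` entry shape are hypothetical simplifications (real MongoDB oplog entries look different). Note that servers without any matching query still process every entry.

```javascript
// Oplog tailing sketch: every app server receives the full oplog broadcast
// and checks each entry against the queries of its own clients.
class AppServer {
  constructor(name) {
    this.name = name;
    this.queries = new Map();   // queryId -> predicate over full documents
    this.entriesProcessed = 0;  // every oplog entry reaches every server
  }

  subscribe(queryId, predicate) {
    this.queries.set(queryId, predicate);
  }

  // Returns the ids of queries for which an event must be pushed.
  onOplogEntry(entry) {
    this.entriesProcessed++;
    const relevant = [];
    for (const [queryId, predicate] of this.queries) {
      if (predicate(entry.doc)) relevant.push(queryId);
    }
    return relevant;
  }
}

// Broadcast: every entry goes to every app server.
function broadcast(servers, entry) {
  return servers.map((server) => server.onOplogEntry(entry));
}

const servers = [new AppServer("s1"), new AppServer("s2"), new AppServer("s3")];
servers[0].subscribe("highScores", (doc) => doc.score > 100);

let relevantForS1 = 0;
for (let i = 0; i < 10; i++) {
  const perServer = broadcast(servers, { op: "i", doc: { id: i, score: i * 20 } });
  relevantForS1 += perServer[0].length;
}
// Each server processed all 10 entries, although only s1 has a query.
```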
Example query: Baccarat players sorted by high score
Partial update from the oplog:
{ name: "Bobby", score: 500 } // game: ???
→ if baccarat, he takes first place!
→ if something else, nothing changes!
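The Bobby example shows why a partial update can force an extra database query: the update lacks the `game` field the query filters on, so the app server cannot decide relevance on its own. A minimal sketch, assuming a hypothetical `fetchFullDoc` lookup keyed by `name` (a real system would use the document id):

```javascript
// A partial update that lacks a field the query depends on cannot be
// classified locally: the outcome is "unknown" and the server must query.
function classifyPartialUpdate(update, field, value) {
  if (!(field in update)) return "unknown";
  return update[field] === value ? "match" : "no-match";
}

// "Query (when in doubt)": resolve unknown cases with an extra lookup.
// fetchFullDoc is a hypothetical stand-in for a database read.
function isRelevant(update, fetchFullDoc, stats) {
  const verdict = classifyPartialUpdate(update, "game", "baccarat");
  if (verdict !== "unknown") return verdict === "match";
  stats.extraQueries++; // one additional database round trip
  const full = { ...fetchFullDoc(update.name), ...update };
  return classifyPartialUpdate(full, "game", "baccarat") === "match";
}

const stats = { extraQueries: 0 };
const db = { Bobby: { name: "Bobby", game: "baccarat", score: 120 } };
const bobbyUpdate = { name: "Bobby", score: 500 };

const bobbyRelevant = isRelevant(bobbyUpdate, (name) => db[name], stats);
// A complete update can be decided without touching the database.
const pokerRelevant = isRelevant({ name: "Al", game: "poker", score: 1 }, (name) => db[name], stats);
```

With many such updates and many app servers, these fallback queries add further load on the database.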
Every app server receives all DB writes through the oplogs → does not scale
Bottleneck!
RethinkDB
Overview:
“MongoDB done right”: comparable queries and data model, but also:
Push-based queries (filters only)
Joins (non-streaming)
Strong consistency: linearizability
JavaScript SDK (Horizon): open-source, as managed service
Open-source: Apache 2.0 license
History:
William Stein, RethinkDB versus PostgreSQL: my personal experience (2017) http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.html (2017-02-27)
(Figure: app servers connect to RethinkDB proxies, which hold no data, in front of the RethinkDB storage cluster.)
The proxies receive all database writes → does not scale
Daniel Mewes, Comment on GitHub issue #962: Consider adding more docs on RethinkDB Proxy (2016) https://github.com/rethinkdb/docs/issues/962 (2017-02-27)
Bottleneck!
Parse
Overview:
Backend-as-a-Service for mobile apps
MongoDB: largest deployment world-wide
Easy development: great docs, push notifications, authentication, …
Real-time updates for most MongoDB queries
Open-source: BSD license
Managed service: discontinued
History:
Live Queries are announced
Illustration taken from: http://parseplatform.github.io/docs/parse-server/guide/#live-queries (2017-02-22)
LiveQuery Server: no data, real-time query matching
All database writes go to the LiveQuery Server → does not scale
Bottleneck!
Firebase
Overview:
Real-time state synchronization across devices
Simplistic data model: nested hierarchy of lists and objects
Simplistic queries: mostly navigation/filtering
Fully managed, proprietary
App SDK for app development, mobile-first
Google services integration: analytics, hosting, authorization, …
History:
→ was often used for cross-device state synchronization
→ state synchronization is separated (Firebase)
Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)
Subtree syncing: push notifications for specific keys only
→ flat structure for fine granularity
→ limited expressiveness!
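Subtree syncing can be illustrated with key paths: a listener on a path is notified when a write lands inside its subtree or overwrites one of its ancestors. This is a hypothetical sketch of the idea, not Firebase's implementation or SDK. It also shows the limitation: the only thing a listener can express is a path, so a content-based query such as "all messages mentioning X" has no path to listen on.

```javascript
// Subtree syncing sketch (hypothetical, not Firebase's implementation):
// clients listen on key paths; a write notifies a listener if it touches
// the listener's subtree.
function splitPath(path) {
  return path.split("/").filter((segment) => segment.length > 0);
}

function isPrefix(prefix, path) {
  return prefix.length <= path.length && prefix.every((seg, i) => seg === path[i]);
}

function affectedListeners(listenerPaths, writePath) {
  const w = splitPath(writePath);
  return listenerPaths.filter((p) => {
    const l = splitPath(p);
    // Write inside the listener's subtree, or write replaces an ancestor.
    return isPrefix(l, w) || isPrefix(w, l);
  });
}

const listeners = ["/chats/room1", "/chats/room2", "/users/alice"];
const deepWrite = affectedListeners(listeners, "/chats/room1/messages/m7");
const ancestorWrite = affectedListeners(listeners, "/chats");
const unrelatedWrite = affectedListeners(listeners, "/users/bob");
```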
Illustration taken from: Frank van Puffelen, Have you met the Realtime Database? (2016) https://firebase.googleblog.com/2016/07/have-you-met-realtime-database.html (2017-02-27)
Queries: specific keys only, a single attribute, a single filter on that attribute
→ does not scale!
Jacob Wenger, on the Firebase Google Group (2015) https://groups.google.com/forum/#!topic/firebase-talk/d-XjaBVL2Ko (2017-02-27)
Firebase, Choose a Database: Cloud Firestore or Realtime Database (2018) https://firebase.google.com/docs/database/rtdb-vs-firestore (2018-03-10)
“Scale to around 100,000 concurrent connections and 1,000 writes/second in a single database. Scaling beyond that requires sharding your data across multiple databases.”
Bottleneck!
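The quoted limits imply a simple back-of-the-envelope estimate of how many Realtime Database instances a workload needs. The sketch below assumes the approximate limits from the quote (around 100,000 concurrent connections and 1,000 writes/second per database) and, optimistically, that load can be split evenly across shards; the function name is hypothetical.

```javascript
// Back-of-the-envelope estimate based on the quoted per-database limits
// (approximate: around 100,000 connections and 1,000 writes/second).
const MAX_CONNECTIONS_PER_DB = 100000;
const MAX_WRITES_PER_SEC_PER_DB = 1000;

// Assumes connections and writes can be spread evenly over shards,
// which is exactly what manual sharding makes the developer responsible for.
function databasesNeeded(concurrentConnections, writesPerSecond) {
  return Math.max(
    1,
    Math.ceil(concurrentConnections / MAX_CONNECTIONS_PER_DB),
    Math.ceil(writesPerSecond / MAX_WRITES_PER_SEC_PER_DB)
  );
}

const connectionBound = databasesNeeded(250000, 500);  // 3 databases
const writeBound = databasesNeeded(50000, 4500);       // 5 databases
```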
Illustration taken from: Todd Kerpelman, Cloud Firestore for Realtime Database Developers (2017) https://firebase.googleblog.com/2017/10/cloud-firestore-for-rtdb-developers.html (2018-03-10)
(Figure: Cloud Firestore data model with collections, documents, and references.)
Illustration taken from: Todd Kerpelman, Cloud Firestore for Realtime Database Developers (2017) https://firebase.googleblog.com/2017/10/cloud-firestore-for-rtdb-developers.html (2018-03-10)
Tree-like structure, finer access granularity
… But:
Limited expressiveness
Limited write throughput:
Firebase, Firestore: Quotas and Limits (2018) https://firebase.google.com/docs/firestore/quotas (2018-03-10)
Discussion: What are the bottlenecks?
From pull-based to push-based data management:
Database Management: static collections (pull-based)
Real-Time Databases: evolving collections
Data Stream Management: persistent/ephemeral streams (push-based)
Comparison of real-time query approaches (Poll-and-Diff, Log Tailing, Unknown, 2-D Partitioning) by:
Write Scalability
Read Scalability (?; 100k connections)
Composite Filters (AND/OR) (AND in Firestore)
Sorted Queries (single attribute)
Limit
Offset (value-based)
Self-Maintaining Queries
Event Stream Queries
Future Directions: Scalability & Use Cases
(Figure: app servers publish subscriptions and write operations via pub-sub to a grid of matching nodes; matches are published back via pub-sub to the app servers.)
Two-dimensional partitioning:
→ scales with queries and writes
Implementation:
Pluggable query engine
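Two-dimensional partitioning can be sketched as a grid of matching nodes: queries are hash-partitioned into rows, writes into columns. A subscription is placed on every node of its row, a write is sent to every node of its column, so each (query, write) pair meets on exactly one node and each node only handles a fraction of the queries and of the writes. The `Grid` class and its string hash are hypothetical illustrations of the technique, not the actual implementation.

```javascript
// Two-dimensional partitioning sketch: node (q, w) holds the queries of
// query partition q and sees only the writes of write partition w, so
// every (query, write) pair is matched by exactly one node.
function hash(str) {
  let h = 0;
  for (const ch of str) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

class Grid {
  constructor(queryPartitions, writePartitions) {
    this.Q = queryPartitions;
    this.W = writePartitions;
    this.nodes = [];
    for (let q = 0; q < this.Q; q++) {
      this.nodes.push([]);
      for (let w = 0; w < this.W; w++) {
        this.nodes[q].push({ queries: new Map(), writesSeen: 0 });
      }
    }
  }

  // A subscription goes to every node of its query partition (one row).
  subscribe(queryId, predicate) {
    const q = hash(queryId) % this.Q;
    for (const node of this.nodes[q]) node.queries.set(queryId, predicate);
  }

  // A write goes to every node of its write partition (one column);
  // returns the ids of all matching queries.
  write(doc) {
    const w = hash(doc.id) % this.W;
    const matches = [];
    for (let q = 0; q < this.Q; q++) {
      const node = this.nodes[q][w];
      node.writesSeen++;
      for (const [queryId, predicate] of node.queries) {
        if (predicate(doc)) matches.push(queryId);
      }
    }
    return matches;
  }
}

const grid = new Grid(3, 4);
for (let i = 0; i < 30; i++) grid.subscribe(`q${i}`, (doc) => doc.value > i);

let matchCount = 0;
for (let w = 0; w < 40; w++) {
  matchCount += grid.write({ id: `d${w}`, value: w }).length;
}
```

Adding rows spreads more queries, adding columns spreads more writes; the result equals what a single naive matcher would compute.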
Change notifications go through up to 4 query processing stages:
1. Filter queries: track matching status → before- and after-images
2. Sorted queries: maintain result order
3. Joins: combine maintained results
4. Aggregation
(Figure: pipeline Filtering → Ordering → Joins → Aggregation; each stage emits events.)
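For the filter stage, comparing the before-image and the after-image of a written document is enough to tell what happened to the query result. A minimal sketch, with a hypothetical `filterEvent` function; `null` stands for a missing image (insert or delete):

```javascript
// Filter stage sketch: compare whether the document matched the query
// before the write (before-image) and after it (after-image).
function filterEvent(predicate, beforeImage, afterImage) {
  const matchedBefore = beforeImage !== null && predicate(beforeImage);
  const matchesAfter = afterImage !== null && predicate(afterImage);
  if (!matchedBefore && matchesAfter) return "add";     // enters the result
  if (matchedBefore && matchesAfter) return "change";   // stays, but changed
  if (matchedBefore && !matchesAfter) return "remove";  // leaves the result
  return null;                                          // irrelevant write
}

const inStock = (product) => product.inStock;
const inserted = filterEvent(inStock, null, { id: 1, inStock: true });
const updated = filterEvent(inStock, { id: 1, inStock: true }, { id: 1, inStock: true, price: 9 });
const soldOut = filterEvent(inStock, { id: 1, inStock: true }, { id: 1, inStock: false });
const irrelevant = filterEvent(inStock, { id: 1, inStock: false }, null);
```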
(Figures: Linear Scalability; Stable Latency Distribution)
Quaestor: Query Web Caching for Database-as-a-Service Providers, VLDB ’17
var query = DB.Tweet.find()
  .matches('text', /my filter/)
  .descending('createdAt')
  .offset(20)
  .limit(10);

// Static query: fetch the result once
query.resultList(result => ...);

// Real-time query: keep receiving the up-to-date result
query.resultStream(result => ...);
Two Bottlenecks: Latency and Processing
High Latency, Processing Overhead
Fresh Data From Distributed Web Caches
Low Latency, Less Processing
Solve Con…
How to detect changes to query results: “Give me the most popular products that are in stock.”
Change types: add, change, remove
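For a sorted, limited query like the products example, a single write can push an item into the top k, move it within, or push another item out. The sketch below naively recomputes the top k before and after each write to derive add/change/remove events; `topK` and `applyWrite` are hypothetical helpers, and a real engine would maintain the ordered result incrementally instead of re-sorting.

```javascript
// Sorted, limited query sketch: "the most popular products that are in
// stock" (filter: inStock, order: popularity desc, limit k).
function topK(products, k) {
  return [...products.values()]
    .filter((p) => p.inStock)
    .sort((a, b) => b.popularity - a.popularity || a.id.localeCompare(b.id))
    .slice(0, k)
    .map((p) => p.id);
}

// Apply one write and report how the top-k result changed.
function applyWrite(products, k, product) {
  const before = topK(products, k);
  products.set(product.id, product);
  const after = topK(products, k);
  const events = [];
  after.forEach((id, pos) => {
    if (!before.includes(id)) events.push({ type: "add", id, pos });
  });
  for (const id of before) {
    if (!after.includes(id)) events.push({ type: "remove", id });
  }
  if (before.includes(product.id) && after.includes(product.id)) {
    events.push({ type: "change", id: product.id, pos: after.indexOf(product.id) });
  }
  return events;
}

const products = new Map([
  ["p1", { id: "p1", popularity: 10, inStock: true }],
  ["p2", { id: "p2", popularity: 8, inStock: true }],
  ["p3", { id: "p3", popularity: 5, inStock: true }],
  ["p4", { id: "p4", popularity: 20, inStock: false }],
]);
const k = 2;
const restocked = applyWrite(products, k, { id: "p4", popularity: 20, inStock: true });
const boosted = applyWrite(products, k, { id: "p1", popularity: 30, inStock: true });
const soldOut = applyWrite(products, k, { id: "p4", popularity: 20, inStock: false });
```

Note how a write to p4 alone removes p2 from the result: with a limit, changes to one item can affect others.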
Scalability:
Handle increasing throughput
Handle additional queries
Expressiveness:
Content-based search? Composite filters? Ordering? Limit? Offset?
Legacy Support:
Real-time queries for existing databases? Decouple OLTP from real-time workloads?
Quaestor: Query Web Caching for Database-as-a-Service Providers. VLDB ’17
NoSQL Database Systems: A Survey and Decision Guidance. SummerSOC ’16
Real-time stream processing for Big Data. it - Information Technology 58 (2016)
Real-Time Databases Explained: Why Meteor, RethinkDB, Parse and Firebase Don't Scale. Baqend Tech Blog (2017): https://medium.com/p/822ff87d2f87
The Case For Change Notifications in Pull-Based Databases. BTW ’17
Learn more at blog.baqend.com!
Real-Time & Stream Data Management: Push-Based Data in Research & Practice. Springer 2019
Real-Time Data Management for Big Data. EDBT 2018
Scalable Push-Based Real-Time Queries on Top of Pull-Based Databases. PhD thesis, Wolfram Wingerath, 2018
Low Latency for Cloud Data Management. PhD thesis, Felix Gessert, 2018
@baqendcom
wingerath@informatik.uni-hamburg.de Blog: blog.baqend.com Slides: slides.baqend.com