Towards Automated Polyglot Persistence
Michael Schaarschmidt, Felix Gessert, Norbert Ritter
gessert@informatik.uni-hamburg.de
Polyglot Persistence
Current best practice
Application Layer
Billing Data, Nested Application Data, Session data, Search Index, Files, Friend network, Cached data & metrics, Recommendation Engine
Amazon Elastic MapReduce, Google Cloud Storage
Research Question:
Can we automate the mapping problem?
(data → database)
Vision
Schemas can be annotated with requirements
- Write Throughput > 10,000 RPS
- Read Availability > 99.9999%
- Scans = true
- Full-Text-Search = true
- Monotonic Read = true
Schema DBs Tables Fields
Vision
The Polyglot Persistence Mediator chooses the database
Figure labels: Application, Annotated Schema (Latency < 30ms), Data and Operations, Polyglot Persistence Mediator, Database Metrics, db1, db2, db3
Towards Automated Polyglot Persistence
Necessary steps
Goal:
- Extend classic workload management to polyglot persistence
- Leverage heterogeneous (NoSQL) databases
1. Requirements: Tenant specifies requirements as Service-Level-Agreements
2. Resolution: Find or provision a suitable combination of databases
3. Mediation: Mediate data and database operations
Service Level Agreements
Expressing application requirements
Functional Service Level Objectives
- Guarantee a "feature"
- Determined by database system
- Examples: transactions, join
Non-Functional Service Level Objectives
- Guarantee a certain quality of service (QoS)
- Determined by database system and service provider
- Examples:
- Continuous: response time (latency), throughput
- Binary: elasticity, read-your-writes
Service Level Agreements
Refining the utility of each SLO
Utility expresses the "value" of a continuous non-functional requirement:
$f_{utility}(metric) \rightarrow [0,1]$
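A minimal sketch of such a utility function, assuming a piecewise-linear shape (the slides do not specify one). The function name `linear_utility` and the `best`/`worst` thresholds are hypothetical and only illustrate mapping a measured metric into [0, 1].

```python
def linear_utility(value: float, best: float, worst: float) -> float:
    """Map a metric value to a utility in [0, 1].

    Assumed shape: utility is 1.0 at `best`, 0.0 at `worst`, and linearly
    interpolated in between. Works for metrics where lower is better
    (best < worst, e.g. latency) and where higher is better
    (best > worst, e.g. availability).
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    fraction = (worst - value) / (worst - best)
    # Clamp so values beyond either threshold stay inside [0, 1].
    return max(0.0, min(1.0, fraction))


# Hypothetical latency SLO: full utility at 10 ms, none at 50 ms.
latency_utility_30ms = linear_utility(30, best=10, worst=50)
```

A tenant could refine the utility of each SLO by choosing different thresholds or a different curve; the linear form is only the simplest choice.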
SLA Example
For MongoDB
Functional Requirements: Scan Queries, Conditional Updates, Transactions, Query by Example, Joins, Analytics
Non-Functional Requirements: Elasticity, Consistency, Read-Latency, Write-Latency, Write-Throughput, Scalability of Data Volume, Read Scalability, Read-Availability, Write-Availability, Durability, Write Scalability
Step I - Requirements
Expressing the application's needs
1. Define schema: the tenant defines a schema (Database, Table, Field)
2. Annotate: the tenant annotates the schema with his requirements
Schema hierarchy: Database, Table, Field (elements can be annotated; continuous annotations are inherited)
Annotations:
- Continuous non-functional, e.g. write latency < 15ms
- Binary functional, e.g. atomic updates
- Binary non-functional, e.g. read-your-writes
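One way to picture an annotated schema is as a small tree of schema elements that carry annotation objects. The sketch below is illustrative only: the class names (`ContinuousAnnotation`, `BinaryAnnotation`, `SchemaElement`) and their fields are assumptions, not the prototype's actual data model. The element names come from the ranking example later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ContinuousAnnotation:
    """Continuous non-functional requirement, e.g. write latency < 15 ms."""
    metric: str
    comparator: str  # "<" or ">"
    threshold: float
    unit: str = ""


@dataclass
class BinaryAnnotation:
    """Binary requirement that a database either supports or does not."""
    name: str
    functional: bool  # True: e.g. atomic updates; False: e.g. read-your-writes


Annotation = Union[ContinuousAnnotation, BinaryAnnotation]


@dataclass
class SchemaElement:
    """A database, table, or field that may carry annotations."""
    name: str
    kind: str  # "database", "table", or "field"
    annotations: List[Annotation] = field(default_factory=list)
    children: List["SchemaElement"] = field(default_factory=list)


# One table annotated with the three annotation kinds named on the slide.
customers = SchemaElement(
    name="Customers",
    kind="table",
    annotations=[
        ContinuousAnnotation("write latency", "<", 15, "ms"),
        BinaryAnnotation("atomic updates", functional=True),
        BinaryAnnotation("read-your-writes", functional=False),
    ],
    children=[SchemaElement("UserName", "field")],
)
shop = SchemaElement("ECommerceDB", "database", children=[customers])
```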
Step II - Resolution
Finding the best database
The Provider resolves the requirements
- RANK: scores available database systems
- Routing Model: defines the optimal mapping from schema elements to databases
Inputs: annotated schema, capabilities for available DBs, metrics
1. Find optimal RANK(schema_root, DBs) through recursive descent using annotated schema and metrics
2a. If unsatisfiable: either refuse or provision new DB
2b. Generates routing model
Routing Model:
- Route schema_element → db
- Transform db-independent to db-specific operations
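The resolution steps can be sketched as a recursive function over the schema tree. This is one possible reading of the slide, not the authors' RANK implementation. The classes `Database` and `Element`, the use of a mean over utilities as the score, and the choice to stop descending once an element with continuous annotations has been routed are all simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Database:
    name: str
    features: Set[str]         # supported binary capabilities
    metrics: Dict[str, float]  # measured metrics, e.g. {"latency_ms": 10}


@dataclass
class Element:
    name: str
    binary: Set[str] = field(default_factory=set)
    # metric name -> utility function returning a value in [0, 1]
    continuous: Dict[str, Callable[[float], float]] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


def rank(element: Element, dbs: List[Database], routing: Dict[str, str]) -> bool:
    """Fill `routing` for `element` and its subtree; False if unsatisfiable."""
    # Binary requirement: exclude databases that do not support it.
    candidates = [db for db in dbs if element.binary <= db.features]
    if not candidates:
        return False  # the provider would refuse or provision a new database

    if element.continuous:
        # Continuous requirements: score every candidate (assumed: mean utility).
        def score(db: Database) -> float:
            utils = [f(db.metrics[m]) for m, f in element.continuous.items()]
            return sum(utils) / len(utils)

        routing[element.name] = max(candidates, key=score).name
        return True

    if not element.children:
        routing[element.name] = candidates[0].name  # nothing left to rank
        return True

    # No continuous annotation here: recursive descent to the children.
    return all(rank(child, candidates, routing) for child in element.children)


dbs = [
    Database("A", {"linearizability"}, {"latency_ms": 10}),
    Database("B", set(), {"latency_ms": 1}),
]
root = Element("db", binary={"linearizability"}, children=[
    Element("table", continuous={"latency_ms": lambda ms: 1.0 if ms <= 20 else 0.0}),
])
routing: Dict[str, str] = {}
satisfiable = rank(root, dbs, routing)
```

In this toy run the faster database "B" is excluded by the binary requirement on the root before any scoring happens, so the table is routed to "A".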
Step II - Resolution
Ranking algorithm by example
Schema:
- ECommerceDB (database)
- Customers (table)
- UserName (String)
- ShoppingBasket (List<String>)
Schema annotations: Linearizability, Availability, Read latency
RANK algorithm, no annotation: recursive descent to child
DBs = { MongoDB, Riak, Cassandra, CouchDB, Redis, MySQL, S3, HBase }
RANK algorithm, binary requirement:
1. Exclude DBs that do not support it
2. Recursive descent
RANK algorithm, continuous requirement: for all databases calculate $db \rightarrow f_{utility}(db.availability)$

Database   Availability   Utility
MongoDB    99%            0.8
Redis      95%            0.05
MySQL      94%            0.04
HBase      99.9%          0.9
RANK algorithm, continuous requirement: for all databases calculate $db \rightarrow f_{utility}(db.latency)$

Database   Latency   Utility
MongoDB    10ms      1
Redis      1ms       1
MySQL      40ms      0.2
HBase      50ms      0.1
RANK algorithm, binary requirement (continued):
3. Pick DB with best total score and add it to routing model

Database   Score
MongoDB    0.9
Redis      0.525
MySQL      0.12
HBase      0.5
Routing Model: Customers → MongoDB
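The scores in the example are consistent with an unweighted mean of the availability and latency utilities, for instance (0.8 + 1) / 2 = 0.9 for MongoDB. The slides do not state the aggregation rule, so the mean used below is an inference from the numbers. The utility values themselves are copied from the slide tables.

```python
# (availability utility, latency utility) per database, as listed on the slides.
utilities = {
    "MongoDB": (0.8, 1.0),
    "Redis": (0.05, 1.0),
    "MySQL": (0.04, 0.2),
    "HBase": (0.9, 0.1),
}

# Assumption: total score = unweighted mean of the per-requirement utilities.
scores = {db: sum(u) / len(u) for db, u in utilities.items()}

# Pick the database with the best total score and add it to the routing model.
best = max(scores, key=scores.get)
routing_model = {"Customers": best}
```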
Step III - Mediation
Routing data and operations
The PPM (Polyglot Persistence Mediator) routes data
- Operation Rewriting: translates from abstract to database-specific operations
- Runtime Metrics: Latency, availability, etc. are reported to the resolver
- Primary Database Option: All data periodically gets materialized to designated database
Mediation flow:
1. The Application sends CRUD, queries, transactions, etc. to the Polyglot Persistence Mediator
2. The Polyglot Persistence Mediator routes them to db1, db2, db3; it uses the Routing Model, triggers periodic materialization, and reports metrics
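The mediation step can be illustrated with a toy mediator that looks up the target database in a routing model, rewrites an abstract `put`/`get` into a backend-specific call, and records a latency sample. Everything here is hypothetical: the two in-memory backends stand in for real database clients, and the class and method names are not taken from the ORESTES prototype.

```python
import time
from typing import Any, Dict, List, Tuple


class KeyValueBackend:
    """Stand-in for a key-value store with its own operation names."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value

    def fetch(self, key: str) -> Any:
        return self.data.get(key)


class DocumentBackend:
    """Stand-in for a document store with a different API."""

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[str, Any]] = {}

    def upsert(self, collection: str, doc_id: str, doc: Any) -> None:
        self.collections.setdefault(collection, {})[doc_id] = doc

    def find_by_id(self, collection: str, doc_id: str) -> Any:
        return self.collections.get(collection, {}).get(doc_id)


class Mediator:
    """Routes abstract put/get operations according to a routing model."""

    def __init__(self, routing_model: Dict[str, str], backends: Dict[str, Any]) -> None:
        self.routing_model = routing_model  # schema element -> database name
        self.backends = backends
        self.metrics: List[Tuple[str, str, float]] = []  # (db, operation, seconds)

    def put(self, element: str, key: str, value: Any) -> None:
        db_name = self.routing_model[element]
        backend = self.backends[db_name]
        start = time.perf_counter()
        # Operation rewriting: abstract put -> database-specific call.
        if isinstance(backend, KeyValueBackend):
            backend.set(f"{element}:{key}", value)
        else:
            backend.upsert(element, key, value)
        self.metrics.append((db_name, "put", time.perf_counter() - start))

    def get(self, element: str, key: str) -> Any:
        db_name = self.routing_model[element]
        backend = self.backends[db_name]
        start = time.perf_counter()
        if isinstance(backend, KeyValueBackend):
            result = backend.fetch(f"{element}:{key}")
        else:
            result = backend.find_by_id(element, key)
        self.metrics.append((db_name, "get", time.perf_counter() - start))
        return result


mediator = Mediator(
    routing_model={"Article": "db1", "Counter": "db2"},
    backends={"db1": DocumentBackend(), "db2": KeyValueBackend()},
)
mediator.put("Article", "42", {"title": "Hello"})
mediator.put("Counter", "42", 7)
```

In a real system the recorded samples would be reported to the resolver as runtime metrics; here they are only collected in a list.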
Evaluation: News Article
Prototype built on ORESTES
Scenario: news articles with impression counts
Objectives: low-latency top-k queries, high-throughput counts, article-queries
Data: Article, Counter
Mediator (figure annotations):
- Counter updates kill performance
- No powerful queries
Found Resolution
- Article (ID, Title, …, Imp.): Document
- Impressions (Imp., ID): Sorted Set
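The found resolution splits the data: article bodies live in a document store and impression counts in a sorted set. The sketch below simulates both with plain Python dictionaries, so it shows only the data layout and the three operations from the objectives. The function names are hypothetical, and a real sorted set would keep its members ordered on every update instead of sorting at query time.

```python
import heapq
from typing import Dict, List

articles: Dict[str, dict] = {}    # stands in for the document store
impressions: Dict[str, int] = {}  # stands in for the sorted set (member -> score)


def put_article(article_id: str, title: str) -> None:
    """Article query side: store the document and register a zero counter."""
    articles[article_id] = {"id": article_id, "title": title}
    impressions.setdefault(article_id, 0)


def count_impression(article_id: str) -> None:
    """High-throughput side: only the counter is touched, not the document."""
    impressions[article_id] = impressions.get(article_id, 0) + 1


def top_k(k: int) -> List[dict]:
    """Top-k query: rank by impression count, then fetch the documents."""
    ids = heapq.nlargest(k, impressions, key=impressions.get)
    return [articles[i] for i in ids]


put_article("a", "Alpha")
put_article("b", "Beta")
put_article("c", "Gamma")
for article_id in ["a", "a", "a", "b", "c", "c"]:
    count_impression(article_id)
most_viewed = [article["title"] for article in top_k(2)]
```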
Challenges & Future Work
- Workload Management: during mediation actively schedule requests based on requirements
- Ranking: Predict future metrics from historic ones (time-series analysis) or from performance models
- Database selection: minimize $P(SLA\ violation) \cdot penalty$ (e.g. through reinforcement learning)
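The database-selection objective, minimizing the probability of an SLA violation times its penalty, can be illustrated with a direct argmin over given estimates. The probabilities and penalties below are made-up placeholder values, and the sketch does not cover the reinforcement learning mentioned on the slide. It only shows what the objective compares.

```python
from typing import Dict


def expected_penalty(p_violation: float, penalty: float) -> float:
    """Expected cost of an SLA violation: P(SLA violation) * penalty."""
    return p_violation * penalty


def select_database(candidates: Dict[str, Dict[str, float]]) -> str:
    """Return the candidate with the smallest expected penalty."""
    return min(candidates, key=lambda name: expected_penalty(**candidates[name]))


# Hypothetical estimates per database.
candidates = {
    "db1": {"p_violation": 0.01, "penalty": 100.0},
    "db2": {"p_violation": 0.05, "penalty": 10.0},
    "db3": {"p_violation": 0.20, "penalty": 10.0},
}
chosen = select_database(candidates)
```

Note that the database with the lowest violation probability ("db1") is not chosen here, because its penalty is higher.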
- Meta-DBaaS: Mediate over DBaaS-systems and factor in their SLAs
- Live Migration: Enable requirement changes
- Requirements: collect library of common ones
- Utility: Provide intuitive, visual "knobs" for developers
Summary
- (Manual) Polyglot Persistence is a reality, but difficult and error-prone
- Polyglot Persistence Mediator: SLA-driven, fine-grained selection of database systems
1. Let the tenant define his requirements
2. Choose or provision a database based on that
3. Route data and operations according to that mapping
Requirements → Resolution → Mediation