Towards Automated Polyglot Persistence: presentation by Michael Schaarschmidt, Felix Gessert, Norbert Ritter



slide-1
SLIDE 1

Michael Schaarschmidt, Felix Gessert, Norbert Ritter

gessert@informatik.uni-hamburg.de

Towards Automated Polyglot Persistence

slide-2
SLIDE 2

Polyglot Persistence

Current best practice

Application Layer

  • Billing Data
  • Nested Application Data
  • Session data
  • Search Index
  • Files
  • Friend network
  • Cached data & metrics
  • Recommendation Engine

Services shown: Amazon Elastic MapReduce, Google Cloud Storage

slide-3
SLIDE 3

Polyglot Persistence

Current best practice

Research Question:

Can we automate the mapping problem (data → database)?

slide-4
SLIDE 4

Vision

Schemas can be annotated with requirements

  • Write Throughput > 10,000 RPS
  • Read Availability > 99.9999%
  • Scans = true
  • Full-Text-Search = true
  • Monotonic Read = true

Schema levels that can carry annotations: DBs, Tables, Fields

slide-5
SLIDE 5

Vision

The Polyglot Persistence Mediator chooses the database

[Figure: the Application sends data and operations to the Polyglot Persistence Mediator, which uses the annotated schema (e.g. Latency < 30ms) and database metrics to choose among db1, db2, db3]

slide-6
SLIDE 6

Towards Automated Polyglot Persistence

Necessary steps

Goal:

  • Extend classic workload management to polyglot persistence
  • Leverage heterogeneous (NoSQL) databases

Steps:

  • 1. Requirements: Tenant specifies requirements as Service-Level-Agreements
  • 2. Resolution: Find or provision a suitable combination of databases
  • 3. Mediation: Mediate data and database operations

slide-7
SLIDE 7

Service Level Agreements

Expressing application requirements

Functional Service Level Objectives

  • Guarantee a "feature"
  • Determined by database system
  • Examples: transactions, join

Non-Functional Service Level Objectives

  • Guarantee a certain quality of service (QoS)
  • Determined by database system and service provider
  • Examples:
  • Continuous: response time (latency), throughput
  • Binary: Elasticity, Read-your-writes

slide-8
SLIDE 8

Service Level Agreements

Refining the utility of each SLO

Utility expresses the "value" of a continuous non-functional requirement:

$f_{\mathit{utility}}(\mathit{metric}) \rightarrow [0, 1]$
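
The utility mapping above can be illustrated with a small sketch. The slides do not specify the shape of the function, so the piecewise-linear form, the name `make_linear_utility`, and the 10 ms / 60 ms thresholds are all assumptions chosen for illustration.

```python
def make_linear_utility(worst: float, best: float):
    """Return f_utility(metric) -> [0, 1], linear between `worst` and `best`.

    Works for "lower is better" metrics (best < worst, e.g. latency) and
    "higher is better" metrics (best > worst, e.g. availability).
    """
    if worst == best:
        raise ValueError("worst and best must differ")

    def utility(metric: float) -> float:
        fraction = (metric - worst) / (best - worst)
        return min(1.0, max(0.0, fraction))  # clamp into [0, 1]

    return utility


# Hypothetical SLO: 10 ms or less is ideal, 60 ms or more has no value.
latency_utility = make_linear_utility(worst=60.0, best=10.0)
print(latency_utility(35.0))
```

A step function or a sigmoid would fit the same signature; the only property taken from the slide is that the result lies in [0, 1].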

slide-9
SLIDE 9

SLA Example

For MongoDB

Functional Requirements: Scan-Queries, Conditional Updates, Transactions, Query by Example, Joins, Analytics

Non-Functional Requirements: Elasticity, Consistency, Read-Latency, Write-Latency, Write-Throughput, Scalability of Data Volume, Read Scalability, Read-Availability, Write-Availability, Durability, Write Scalability

slide-10
SLIDE 10

Step I - Requirements

Expressing the application's needs

  • 1. Define schema: the Tenant defines the schema (Database → Table → Fields)
  • 2. Annotate: the Tenant annotates the schema with his requirements

Annotations:

  • Continuous non-functional, e.g. write latency < 15ms
  • Binary functional, e.g. atomic updates
  • Binary non-functional, e.g. read-your-writes

A Field inherits continuous annotations from its annotated Table.
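
The two steps above (define a schema, then annotate it) might be modeled as follows. This is a minimal sketch, not the authors' data model: the class `SchemaElement`, its field names, and the rule that a child's own annotation overrides an inherited one are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SchemaElement:
    name: str
    kind: str  # "database", "table" or "field"
    continuous: Dict[str, float] = field(default_factory=dict)  # e.g. {"write_latency_ms": 15.0}
    binary: Set[str] = field(default_factory=set)                # e.g. {"read_your_writes"}
    children: List["SchemaElement"] = field(default_factory=list)
    # Excluded from repr/eq to avoid infinite recursion through parent <-> children.
    parent: Optional["SchemaElement"] = field(default=None, repr=False, compare=False)

    def add(self, child: "SchemaElement") -> "SchemaElement":
        child.parent = self
        self.children.append(child)
        return child

    def effective_continuous(self) -> Dict[str, float]:
        """Continuous annotations including those inherited from ancestors."""
        inherited = self.parent.effective_continuous() if self.parent else {}
        return {**inherited, **self.continuous}  # assumption: own values win


# 1. Define schema
db = SchemaElement("ECommerceDB", "database")
# 2. Annotate: write latency < 15 ms on the table, read-your-writes on a field
customers = db.add(SchemaElement("Customers", "table", continuous={"write_latency_ms": 15.0}))
user_name = customers.add(SchemaElement("UserName", "field", binary={"read_your_writes"}))
print(user_name.effective_continuous())
```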
slide-12
SLIDE 12

Step II - Resolution

Finding the best database

  • The Provider resolves the requirements
  • RANK: scores available database systems
  • Routing Model: defines the optimal mapping from schema elements to databases

Resolution steps (the Provider knows the capabilities of the available DBs):

  • 1. Find optimal RANK(schema_root, DBs) through recursive descent using annotated schema and metrics
  • 2a. If unsatisfiable: either refuse or provision a new DB
  • 2b. Generate the routing model

Routing Model:

  • Route schema_element → db
  • Transform db-independent to db-specific operations
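
The resolution steps can be sketched as a recursive descent over the annotated schema. This is a simplified reading of the slides, not the authors' RANK implementation: the dictionary layout, the use of an unweighted mean as the total score, and raising an error for the "refuse" case are assumptions.

```python
from statistics import mean
from typing import Dict, Iterator


def subtree(element: dict) -> Iterator[dict]:
    """Yield the element and all of its descendants."""
    yield element
    for child in element.get("children", []):
        yield from subtree(child)


def rank(element: dict, dbs: Dict[str, dict], routing: Dict[str, str]) -> None:
    """Fill `routing` with schema_element -> db entries."""
    if not element.get("binary") and not element.get("continuous"):
        # No annotation: recursive descent to the children.
        for child in element.get("children", []):
            rank(child, dbs, routing)
        return

    nodes = list(subtree(element))
    # Binary requirements: exclude databases that do not support them.
    required = set().union(*(n.get("binary", set()) for n in nodes))
    candidates = {name: db for name, db in dbs.items() if required <= db["features"]}
    if not candidates:
        # Step 2a, "refuse" branch; provisioning a new DB is out of scope here.
        raise LookupError(f"unsatisfiable requirements for {element['name']}")

    def total_score(db: dict) -> float:
        # Continuous requirements: apply each utility function to the DB's metric.
        utilities = [fn(db["metrics"][metric])
                     for n in nodes
                     for metric, fn in n.get("continuous", {}).items()]
        return mean(utilities) if utilities else 1.0

    # Step 2b: the best-scoring candidate goes into the routing model.
    routing[element["name"]] = max(candidates, key=lambda name: total_score(candidates[name]))


# Hypothetical databases and schema, for illustration only.
dbs = {
    "db1": {"features": {"linearizability"}, "metrics": {"read_latency_ms": 10.0}},
    "db2": {"features": set(), "metrics": {"read_latency_ms": 1.0}},
    "db3": {"features": {"linearizability"}, "metrics": {"read_latency_ms": 40.0}},
}
schema = {"name": "ECommerceDB", "children": [
    {"name": "Customers", "binary": {"linearizability"}, "children": [
        {"name": "UserName",
         "continuous": {"read_latency_ms": lambda ms: 1.0 if ms <= 10 else 0.2}},
    ]},
]}
routing: Dict[str, str] = {}
rank(schema, dbs, routing)
print(routing)
```

In this toy run db2 has the lowest latency but is excluded by the binary requirement, so the choice falls between db1 and db3.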

slide-13
SLIDE 13

Step II - Resolution

Ranking algorithm by example

Schema:

ECommerceDB (database)
  Customers (Table)
    ShoppingBasket (List<String>)
    UserName (String)

Schema Annotations: Linearizability, Availability, Read latency

RANK Algorithm: No annotation → recursive descent to child

DBs = { MongoDB, Riak, Cassandra, CouchDB, Redis, MySQL, S3, HBase }

slide-14
SLIDE 14

Step II - Resolution

Ranking algorithm by example

RANK Algorithm: Binary requirement →

  • 1. Exclude DBs that do not support it
  • 2. Recursive descent

DBs = { MongoDB, Riak, Cassandra, CouchDB, Redis, MySQL, S3, HBase }

slide-15
SLIDE 15

Step II - Resolution

Ranking algorithm by example

RANK Algorithm: Continuous requirement → for all databases, calculate $db \rightarrow f_{\mathit{utility}}(db.\mathit{availability})$

Database   Availability   Utility
MongoDB    99%            0.8
Redis      95%            0.05
MySQL      94%            0.04
HBase      99.9%          0.9

slide-16
SLIDE 16

Step II - Resolution

Ranking algorithm by example

RANK Algorithm: Continuous requirement → for all databases, calculate $db \rightarrow f_{\mathit{utility}}(db.\mathit{latency})$

Database   Availability   Utility   Latency   Utility
MongoDB    99%            0.8       10ms      1
Redis      95%            0.05      1ms       1
MySQL      94%            0.04      40ms      0.2
HBase      99.9%          0.9       50ms      0.1

slide-17
SLIDE 17

Step II - Resolution

Ranking algorithm by example

RANK Algorithm: Binary requirement →

  • 1. Exclude DBs that do not support it
  • 2. Recursive descent
  • 3. Pick DB with best total score and add it to routing model

DB        Score
MongoDB   0.9
Redis     0.525
MySQL     0.12
HBase     0.5

slide-18
SLIDE 18

Step II - Resolution

Ranking algorithm by example

Routing Model: Customers → MongoDB
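
The scores in the example are consistent with an unweighted mean of the availability and latency utilities for each database. The slides do not state the aggregation rule, so treating it as a plain mean is an assumption; the utility values below are copied from the two tables.

```python
from statistics import mean

# Utilities read off the availability and latency tables in the example.
availability_utility = {"MongoDB": 0.8, "Redis": 0.05, "MySQL": 0.04, "HBase": 0.9}
latency_utility = {"MongoDB": 1.0, "Redis": 1.0, "MySQL": 0.2, "HBase": 0.1}

# Assumed aggregation: unweighted mean of the two utilities per database.
scores = {
    db: mean([availability_utility[db], latency_utility[db]])
    for db in availability_utility
}
best = max(scores, key=scores.get)

for db, score in scores.items():
    print(f"{db:8s} {score:.3f}")
print("Routing Model: Customers ->", best)
```

Up to floating-point rounding this reproduces the listed scores (0.9, 0.525, 0.12, 0.5) and selects MongoDB for Customers.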

slide-19
SLIDE 19

Step III - Mediation

Routing data and operations

  • The Polyglot Persistence Mediator (PPM) routes data
  • Operation Rewriting: translates from abstract to database-specific operations
  • Runtime Metrics: Latency, availability, etc. are reported to the resolver
  • Primary Database Option: All data periodically gets materialized to a designated database

Mediation flow:

  • 1. The Application sends CRUD, queries, transactions, etc. to the Polyglot Persistence Mediator
  • 2. The Mediator routes them to db1, db2, db3

Polyglot Persistence Mediator:

  • Uses Routing Model
  • Triggers periodic materialization
  • Reports metrics
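
The mediation flow might look roughly like this in code. The sketch is illustrative and not the ORESTES-based prototype: the class `Mediator`, the handler signature, and the in-memory `fake_document_db` backend are hypothetical, and periodic materialization is left out.

```python
import time
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, dict], object]


class Mediator:
    def __init__(self, routing_model: Dict[str, str], backends: Dict[str, Handler]):
        self.routing_model = routing_model  # schema_element -> db
        self.backends = backends            # db -> handler that rewrites and executes
        self.metrics: List[Tuple[str, str, float]] = []  # (db, operation, seconds)

    def execute(self, schema_element: str, operation: str, payload: dict):
        db = self.routing_model[schema_element]      # 2. route
        started = time.perf_counter()
        result = self.backends[db](operation, payload)
        # Runtime metrics that would be reported back to the resolver.
        self.metrics.append((db, operation, time.perf_counter() - started))
        return result


store: Dict[str, dict] = {}


def fake_document_db(operation: str, payload: dict):
    """Stand-in backend: maps abstract operations onto a dict."""
    if operation == "insert":
        store[payload["id"]] = payload
        return payload["id"]
    if operation == "get":
        return store.get(payload["id"])
    raise ValueError(f"unsupported operation: {operation}")


ppm = Mediator({"Customers": "db1"}, {"db1": fake_document_db})
ppm.execute("Customers", "insert", {"id": "c1", "UserName": "alice"})  # 1. CRUD
print(ppm.execute("Customers", "get", {"id": "c1"}))
```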
slide-20
SLIDE 20

Evaluation: News Article

Prototype built on ORESTES

Scenario: news articles with impression counts

Objectives: low-latency top-k queries, high-throughput counts, article-queries

Schema elements: Article, Counter

slide-21
SLIDE 21

Evaluation: News Article

[Figure: Mediator]

slide-22
SLIDE 22

Evaluation: News Article

[Figure: Mediator]

Counter updates kill performance

slide-23
SLIDE 23

Evaluation: News Article

[Figure: Mediator]

slide-24
SLIDE 24

Evaluation: News Article

[Figure: Mediator]

No powerful queries

slide-25
SLIDE 25

Evaluation: News Article

Found Resolution:

  • Article (ID, Title, …, Imp.) → Document
  • Impression counts (Imp., ID) → Sorted Set
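
The found resolution splits the data between a document representation and a sorted set. The sketch below mimics that split with plain Python structures: the `dict` and `Counter` are stand-ins for a document store and a sorted-set store, and the function names are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List

articles: Dict[str, dict] = {}    # stand-in for the document store
impressions: Counter = Counter()  # stand-in for the sorted set (article ID -> count)


def put_article(article_id: str, title: str) -> None:
    articles[article_id] = {"id": article_id, "title": title}


def count_impression(article_id: str) -> None:
    # High-throughput counter updates touch only the counter structure.
    impressions[article_id] += 1


def top_k(k: int) -> List[dict]:
    # Top-k query: rank by count first, then fetch the matching documents.
    ids = heapq.nlargest(k, impressions, key=impressions.get)
    return [articles[article_id] for article_id in ids]


put_article("a", "Article A")
put_article("b", "Article B")
put_article("c", "Article C")
for article_id in ["a", "c", "a", "b", "c", "a"]:
    count_impression(article_id)
print([article["title"] for article in top_k(2)])
```

Keeping the counters out of the documents means an impression never rewrites an article, which is the motivation the preceding slides give for not using a single store.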

slide-26
SLIDE 26

Challenges & Future Work

  • Workload Management: during mediation actively schedule requests based on requirements
  • Ranking: Predict future metrics from historic ones (time-series analysis) or from performance models
  • Database selection: minimize $P(\mathit{SLA\ violation}) \cdot \mathit{penalty}$ (e.g. through reinforcement learning)
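
The database selection objective can be made concrete with a small sketch. Summing the expected penalty over several SLOs, the function names, and all probabilities and penalties below are assumptions; the slides only name the product of violation probability and penalty, and estimating the probabilities (for example through reinforcement learning) is not covered here.

```python
from typing import Dict


def expected_penalty(violation_probability: Dict[str, float],
                     penalty: Dict[str, float]) -> float:
    """Sum of P(SLA violation) * penalty over all SLOs (assumed additive)."""
    return sum(violation_probability[slo] * penalty[slo] for slo in penalty)


def select_database(estimates: Dict[str, Dict[str, float]],
                    penalty: Dict[str, float]) -> str:
    """Pick the database with the lowest expected penalty."""
    return min(estimates, key=lambda db: expected_penalty(estimates[db], penalty))


# Hypothetical penalties per violated SLO and estimated violation probabilities.
penalty = {"read_latency": 100.0, "availability": 1000.0}
estimates = {
    "db1": {"read_latency": 0.20, "availability": 0.001},
    "db2": {"read_latency": 0.01, "availability": 0.05},
}
print(select_database(estimates, penalty))
```

Here db1 violates the latency SLO more often, yet it is preferred because availability violations carry the larger penalty.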

slide-27
SLIDE 27

Challenges & Future Work

  • Meta-DBaaS: Mediate over DBaaS-systems and factor in their SLAs
  • Live Migration: Enable requirement changes
  • Requirements: collect library of common ones
  • Utility: Provide intuitive, visual "knobs" for developers

slide-28
SLIDE 28

Summary

  • (Manual) Polyglot Persistence is a reality - but difficult and error-prone
  • Polyglot Persistence Mediator: SLA-driven, fine-grained selection of database systems

Steps:

  • 1. Requirements: Let the tenant define his requirements
  • 2. Resolution: Choose or provision a database based on that
  • 3. Mediation: Route data and operations according to that mapping

slide-29
SLIDE 29

Thank you.

gessert@informatik.uni-hamburg.de Orestes.info Baqend.com