Cloud Data Management, Felix Gessert, December 18, 2018, Universität

Low Latency for Cloud Data Management Felix Gessert December 18, 2018, Universität Hamburg, DBIS Group


Web Performance: An Open Challenge With a Huge Impact
• 100 ms faster → +1% revenue (= $1.7 billion)
• 500 ms faster → +20% ad sales (= $19 billion)
Sources: Greg Linden. Make Data Useful. 2006. Tammy Everts. Time Is Money: The Business Value of Web Performance. O’Reilly Media, 2016.

Cloud-Based Web Applications: Three Sources of Page Load Time
1. Backend Processing
2. Network Delays
3. Frontend Rendering

Latency is the Problem: Throughput vs. Latency
[Figure: two plots of page load time (ms). Left: load time against bandwidth in MBit/s (1 to 10, at 60 ms latency). Right: load time against latency in ms (240 down to 0, at 5 MBit/s bandwidth).]
Source: Mike Belshe. More Bandwidth Doesn’t Matter (much). Technical report, Google Inc., 2010.

Latency is the Problem: Throughput vs. Latency
• 2 × throughput = same load time
• ½ latency ≈ ½ load time
Source: Mike Belshe. More Bandwidth Doesn’t Matter (much). Technical report, Google Inc., 2010.
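The comparison can be illustrated with a toy model. The sketch below is not the measurement behind the slide: it assumes load time is a number of sequential round trips plus a pure transfer time, and the page size, round-trip count, and function name are hypothetical values chosen only for illustration.

```python
def page_load_time_ms(round_trips, latency_ms, page_kbit, bandwidth_mbit_s):
    """Toy model: sequential round trips plus raw transfer time.

    kbit / (Mbit/s) yields milliseconds, so no unit conversion is needed.
    """
    transfer_ms = page_kbit / bandwidth_mbit_s
    return round_trips * latency_ms + transfer_ms


# Hypothetical page: 30 sequential round trips, 2500 kbit of payload.
baseline = page_load_time_ms(30, 60, 2500, 5)           # 60 ms latency, 5 MBit/s
double_bandwidth = page_load_time_ms(30, 60, 2500, 10)  # 2 x throughput
half_latency = page_load_time_ms(30, 30, 2500, 5)       # 1/2 latency

print(baseline, double_bandwidth, half_latency)
```

Under these assumptions the round-trip term dominates, so halving latency shortens the load time far more than doubling bandwidth does.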

Problem Statement: Four Challenges
1. Latency of Dynamic Data
2. Direct Client Access
3. Transaction Abort Rates
4. Polyglot Persistence

Problem Statement: Research Question
How can the latency of retrieving dynamic data from cloud services be minimized in an application- and database-independent way while maintaining strict consistency guarantees?

Problem Statement: Four Challenges
1. Latency
2. Direct Access
3. Transactions
4. Polyglot Persistence

Outline
1. Background & Motivation: End-to-End Latency in Cloud-based Architectures
2. Cloud Data Management Middleware: Providing Modern NoSQL Systems as a Low-Latency DBaaS
3. Caching Dynamic Data: Cache Sketches: Solving Staleness of Reads and Queries
4. Outlook and Summary: The Future of Polyglot Data Management in the Cloud

Why is end-to-end latency an open problem?

Background & Motivation
[Diagram labels: Cloud Data Management, Frontend, Network]

Background & Motivation Cloud Data Management Frontend Network State of the Art: Problems:  Web interface based on HTML, files,  No direct access or integration APIs, and application logic to data management  Performance defined by critical  Storage maintained manually rendering path 13

Background & Motivation
State of the Art:
• Service interfaces use REST & HTTP
• Web caching can reduce end-to-end latency
Problem:
• Web caching not compatible with consistent dynamic data

Background & Motivation Cloud Data Management Frontend Network State of the Art: Problems:  SaaS , PaaS , and IaaS models  Combination of cloud services  Scalability & multi-tenancy entails high latency  Common application building blocks often re-implemented 13

Background & Motivation Cloud Data Management Frontend Network State of the Art: Problems:  Scalability & high availability through  Lack of common data management NoSQL systems abstractions  Sharding & replication  DBaaS model not supported  Database-as-a-Service (DBaaS) model  Polyglot persistence manual & error-prone 13

Background & Motivation: Data Management
• Critical Data → RDBMS
• Nested Data → Document Store
• Volatile Data → Key-Value Store
• Large Data Sets → Wide-Column Store
• Static Files → Distributed File System
Challenge: How to tackle the mapping problem (data → database)?

Data Management: Functional Requirements, Techniques, and Non-Functional Requirements [GWFR16]
Functional Requirements: Scan Queries, ACID Transactions, Conditional or Atomic Writes, Joins, Sorting, Filter Queries, Full-text Search, Aggregation and Analytics
Techniques:
• Sharding: Range-Sharding, Hash-Sharding, Entity-Group Sharding, Consistent Hashing, Shared-Disk
• Replication: Commit/Consensus Protocol, Synchronous, Asynchronous, Primary Copy, Update Anywhere
• Storage Management: Logging, Update-in-Place, Caching, In-Memory Storage, Append-Only Storage
• Query Processing: Global Secondary Indexing, Local Secondary Indexing, Query Planning, Analytics Framework, Materialized Views
Non-Functional Requirements: Data Scalability, Write Scalability, Read Scalability, Elasticity, Consistency, Write Latency, Read Latency, Write Throughput, Read Availability, Write Availability, Durability

NoSQL Decision Tree [GWFR16]
Access: Fast Lookups
• Volume RAM: Redis, Memcache (example application: Cache)
• Volume Unbounded, CAP = AP: Cassandra, Riak, Voldemort, Aerospike (example application: Shopping-basket)
• Volume Unbounded, CAP = CP: HBase, MongoDB, CouchBase, DynamoDB (example application: Order History)
Access: Complex Queries
• Volume HDD-Size, Consistency = ACID: RDBMS, Neo4j, RavenDB, MarkLogic (example application: OLTP)
• Volume HDD-Size, Consistency = Availability: CouchDB, MongoDB, SimpleDB (example application: Website)
• Volume Unbounded, Query Pattern = Ad-hoc: MongoDB, RethinkDB, HBase, Accumulo, ElasticSearch, Solr (example application: Social Network)
• Volume Unbounded, Query Pattern = Analytics: Hadoop, Spark, Parallel DWH, Cassandra, HBase, Riak, MongoDB (example application: Big Data)

Unified Data Management API [GFW+14]
Service models: Backend-as-a-Service, Database-as-a-Service
Capabilities: Object Persistence, Transaction Processing, Data Validation, Query Support, Partial Updates, Code Execution, Schema Management, Indexing & Configuration, Access Control

How can cloud data management be unified & combined with low latency?

Orestes: Goals
A Data Management Middleware for Low Latency
• Database Independence
• DBaaS & BaaS Functionality
• Scalable, Available, Multi-Tenant
• Low Latency with Tunable Consistency

Orestes: Concept Overview [GBR14, GB13]
• Heterogeneous Data Stores: Unmodified Database Systems
• DBaaS/BaaS Middleware: Scalable Data Management Platform (Multi-Tenancy, Scaling, Caching, Failover, …), Data and Default Modules
• Unified REST API
• Web Caching for Low Latency
• Web and Mobile Applications

How can dynamic data be accelerated through web caching?

The Web's Caching Model [GSW+15]
Expiration-Based Caches (Browser Caches, Forward Proxies, ISP Caches):
• An object x is considered fresh for TTL_x seconds
• Server assigns TTLs for each object
Invalidation-Based Caches (Content Delivery Networks, Reverse Proxies):
• Expose object eviction operation to the server
[Diagram: the request path runs from the client through expiration-based caches and invalidation-based caches to the server/DB. Caches answer with cache hits; the server sends invalidations and objects.]
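The two cache types can be contrasted in a few lines. This is a minimal sketch under stated assumptions: the origin is a plain dictionary, time is passed in explicitly, and the class and method names (`ExpirationCache`, `InvalidationCache`, `purge`) are hypothetical, not taken from the presentation or any real cache API.

```python
class ExpirationCache:
    """Serves an object until its TTL runs out; the server cannot evict it."""

    def __init__(self, origin, ttl):
        self.origin = origin      # dict standing in for the server/DB
        self.ttl = ttl            # seconds an object is considered fresh
        self.entries = {}         # key -> (value, expires_at)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]       # hit: fresh by TTL, even if the origin changed
        value = self.origin[key]  # miss or expired: fetch from the origin
        self.entries[key] = (value, now + self.ttl)
        return value


class InvalidationCache(ExpirationCache):
    """Additionally exposes an eviction operation to the server."""

    def purge(self, key):
        self.entries.pop(key, None)


origin = {"x": 1}
browser = ExpirationCache(origin, ttl=60)
cdn = InvalidationCache(origin, ttl=60)

first = (browser.get("x", now=0), cdn.get("x", now=0))
origin["x"] = 2                      # write at the server
cdn.purge("x")                       # the server can only reach the CDN
stale = browser.get("x", now=10)     # browser still serves the old value
fresh = cdn.get("x", now=10)         # CDN refetches after the purge
after_ttl = browser.get("x", now=61)
```

The browser cache keeps returning the old value until the TTL expires, which is the staleness problem that expiration-based caching alone cannot avoid.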

Web Caching for Data Management: Overview of Cache Sketch Method [GSW+15, GSW+17]
[Diagram labels: data cached for a fixed TTL; without Cache Sketch: stale cached data; invalidation cache: invalidate; expiration cache: validate freshness; server: add to Cache Sketch; compact Cache Sketch, drawn as the cell rows 0 1 0 1 1 0 and 2 3 0 4 0 1.]

The Cache Sketch Approach [GSW+15]
• Client Cache Sketch: a Bloom filter (drawn as 10101010) that answers "Needs revalidation?" to minimize staleness. It is loaded at connect, periodically every Δ seconds, and at transaction begin.
• Server Cache Sketch: a counting Bloom filter (drawn as 10201040) over non-expired object keys that answers "Needs invalidation?" to minimize invalidations. Expirations and writes are reported to it.
• Numbered labels in the diagram: 1 Initialization from Cache, 2 Δ-Atomic Consistency, 3 Cache-Aware Transactions, 4 Invalidation Minimization
[Diagram: the request path runs from the client through expiration-based caches and invalidation-based caches to the server/DB. Caches answer with cache hits; the server sends invalidations and objects.]

Cache Sketch: Main Properties [GSW+15, GSW+17]
To ensure Δ-atomicity, the Cache Sketch at time t contains key(x) of every object x that was written before it expired in all caches.
[Timeline figure with labels: reads r(x), a write w(x), the times t_1, t_2, t_3 and t_1 + TTL, the TTL, the retrieved Cache Sketch c_t, the staleness bound Δ, and the timespan for which x is in c_t.]
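The containment property can be made concrete with a small bookkeeping sketch. It is an illustration under assumptions, not the implementation from the cited papers: it uses exact dictionaries instead of a Bloom filter, explicit timestamps, and hypothetical names (`ServerCacheSketch`, `on_read`, `on_write`, `snapshot`).

```python
class ServerCacheSketch:
    """Exact-set stand-in for the server-side Cache Sketch (no Bloom filter)."""

    def __init__(self):
        self.expires_at = {}   # key -> latest time until which a cache may hold it
        self.stale_until = {}  # key -> time until which the key stays in the sketch

    def on_read(self, key, now, ttl):
        # The server hands out the object with a TTL; caches may keep it
        # until now + ttl, so remember the latest such expiration.
        self.expires_at[key] = max(self.expires_at.get(key, 0.0), now + ttl)

    def on_write(self, key, now):
        # Written before it expired in all caches: the key has to stay in
        # the sketch until the last cached copy has expired.
        expiry = self.expires_at.get(key, 0.0)
        if expiry > now:
            self.stale_until[key] = expiry

    def snapshot(self, now):
        # Cache Sketch at time `now`: keys whose stale copies may still be cached.
        return {key for key, until in self.stale_until.items() if until > now}


sketch = ServerCacheSketch()
sketch.on_read("x", now=0, ttl=60)   # r(x) at t_1 = 0, cacheable until 60
sketch.on_write("x", now=10)         # w(x) at t_2 = 10, before expiration
sketch.on_write("y", now=10)         # y was never cached, so it is not added

during = sketch.snapshot(now=20)
after = sketch.snapshot(now=61)
```

A client that consults a snapshot and revalidates every contained key cannot read a copy that was already stale when the snapshot was taken, so the snapshot's age bounds the staleness.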

Cache Sketch: Construction [GSW+15, GSW+17]
To ensure compactness, the Cache Sketch stores n keys in a Bloom filter with m bits, k hash functions, and a false positive rate of $f \approx \left(1 - e^{-kn/m}\right)^{k}$.
Example: 20 000 entries & 5% false positives → 11 KB in size.
[Diagram: on a GET request the Client Cache Sketch runs find(key), hashing the key with the k hash functions h_1 ... h_k into the m Bloom filter bits (drawn as 1 0 0 1 1 0 1 1). If the bits are 1 (yes), the key is revalidated. Otherwise (no), the request goes to the cache and results in a hit or a miss.]
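The lookup described above can be sketched as follows. This is a minimal Bloom filter under assumptions, not the authors' implementation: the k positions are derived from one SHA-256 digest by double hashing, and the parameters (m = 8192, k = 4, 1000 keys) and names (`BloomFilter`, `lookup`) are hypothetical and chosen for illustration only.

```python
import hashlib
import math


def false_positive_rate(n, m, k):
    """f ~ (1 - exp(-k*n/m))**k for n keys, m bits, and k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit halves of a digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def lookup(key, sketch):
    """All k bits set: revalidate at the server. Otherwise use the cache."""
    return "revalidate" if key in sketch else "cache"


sketch = BloomFilter(m=8192, k=4)
stale_keys = [f"obj-{i}" for i in range(1000)]
for key in stale_keys:
    sketch.add(key)

expected_rate = false_positive_rate(n=1000, m=8192, k=4)
```

A Bloom filter has no false negatives, so every key that was added is revalidated. A false positive only costs an unnecessary revalidation and never returns stale data.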
