Multi-Tenancy & Isolation Bogdan Munteanu - Dropbox

Overview • What is Edgestore? • Workloads & API • Multi-tenancy & Isolation • Lessons Learned

What is Edgestore • Distributed Metadata Store built on top of MySQL • Highly Available, Scalable, Durable • Abstract away sharding and caching • Reduce operational burden • Flexible schemas • Multi-Region Setup

Architecture

Architecture cont’d • 2048 Shards • 8 Shards per Engine (and MySQL cluster) • 1 Master - 2 Slaves (semi-sync) • Multi-region setup

MySQL vs. Edgestore
MySQL, Team table:
| Id | Company | Size |
| 1 | Expedia | 5000 |
| 2 | NatGeo | 500 |
| 3 | Intuit | 2000 |
| 4 | Spotify | 600 |
MySQL, second table:
| Id | Email | Name | Type |
| 1 | jondoe@ | Jon | Free |
| 2 | jenny@ | Jenny | Pro |
Edgestore, Edgedata table:
| Schema | Id | Data | Edge type | Gid | Data |
| Team | 10:1 | Company:Expedia; Size:5000 | Photo | Entity ? | Name:SF.jpg; Size:64 |
| Team | 20:1 | Company:NatGeo; Size:500 | Photo | Entity ? | Name:Hawaii; Size:64 |
| Team | 30:7 | Company:Intuit; Size:2000 | Photo | Entity ? | Name:Tahoe.jpg; Size:128 |
| Team | 35:3 | Company:Spotify; Size:600 | Photo | Entity ? | Name:Office.jpg; Size:1024 |
| User | 15:1 | Email:jondoe@; Name:Jon; Type:Free | | | |
| User | 20:2 | Email:jenny@; Name:Jenny; Type:Pro | | | |

Shard the table
Shard 1:
| Schema | Id | Data |
| Team | 10:1 | Company:Expedia; Size:5000 |
| Team | 20:4 | Company:NatGeo; Size:500 |
Shard 2:
| Schema | Id | Data |
| Team | 30:1 | Company:Intuit; Size:2000 |
| User | 40:2 | Email:jondoe@; Name:Jon; Type:Free |
Shard n:
| Schema | Id | Data |
| Team | 50:2 | Company:Spotify; Size:600 |
| User | 60:1 | Email:jenny@; Name:Jenny; Type:Pro |

Restricted API • Create/Update/Delete • single and batch • Compare and Set semantics • Reads: • Read(Id, ) • List(Id, *) • Count(Id, *) • List(Id, condition=[equals, prefix, range]) • ReadLog(Id) • ListLog(Id, *) • Acquire Read/Write Lock • Commit/Rollback • Strong consistency semantics
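The "Compare and Set semantics" bullet can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the slides do not show Edgestore's actual API, so the class name, the method names, and the use of a per-row version counter are all hypothetical.

```python
class InMemoryStore:
    """Toy stand-in for a metadata store; NOT Edgestore's real API."""

    def __init__(self):
        # Maps row id -> (version, data). A missing row has version 0.
        self._rows = {}

    def read(self, row_id):
        return self._rows.get(row_id, (0, None))

    def compare_and_set(self, row_id, expected_version, new_data):
        """Write only if the row is still at the version the caller read."""
        version, _ = self.read(row_id)
        if version != expected_version:
            return False  # someone else wrote first; caller must re-read
        self._rows[row_id] = (version + 1, new_data)
        return True


store = InMemoryStore()
first = store.compare_and_set("10:1", 0, {"Company": "Expedia", "Size": 5000})
stale = store.compare_and_set("10:1", 0, {"Company": "Expedia", "Size": 1})
```

The second write is rejected because it still expects version 0, which is the property that lets concurrent writers detect lost updates.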

Workloads • 10 million QPS • 600k Writes / second • 9.4 million Reads / second • 90% of Reads are cache hits • 1.5 million QPS to Engine fleet
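The Engine fleet figure follows from the other numbers on this slide. The worked arithmetic below assumes that every write and every read that misses the cache reaches the Engine fleet; the slide does not state this explicitly.

```python
# Figures taken from the slide.
total_qps = 10_000_000
writes_per_sec = 600_000
reads_per_sec = 9_400_000
cache_hit_percent = 90

# Assumption: only cache misses and writes reach the Engine fleet.
cache_misses_per_sec = reads_per_sec * (100 - cache_hit_percent) // 100
engine_qps = writes_per_sec + cache_misses_per_sec

print(cache_misses_per_sec)  # reads that miss the cache
print(engine_qps)            # roughly the 1.5 million QPS quoted on the slide
```

Under that assumption the Engine fleet sees about 1.54 million QPS, which rounds to the 1.5 million quoted.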

Workloads cont’d • Batch Size 1 to 10000 • Some read requests can return 1 row • Some can return 100000 rows • Rows range from a few bytes to several MB • 500+ unique Schemas

Engine • Proto -> SQL Query • Query Result -> Proto • Connection Pooling • Control / Reduce load to MySQL

Workloads cont’d • High QPS • Write / Read • Large / expensive requests: • Write - large transactions • Read - large number of rows, or large rows • Multi-Read / Multi-Write

Single Request - 1 token [Diagram: Engine with Request Handler and Resource Pool]

Batch (parallel) Request - n tokens [Diagram: Engine with Request Handler and Resource Pool; one goroutine per Id (Id1, Id2, Id3)]

Batch (sequential) Request - n tokens [Diagram: Engine with Request Handler and Resource Pool; chunks Id 1 - Id 10, Id 11 - Id 20, Id 21 - Id 30]
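The three request shapes (single, parallel batch, sequential batch) can be sketched against a counting token pool. This is a minimal sketch under stated assumptions: `TokenPool`, `fetch`, and the helper names are hypothetical, threads stand in for the goroutines in the diagrams, and treating each sequential chunk as one token is a guess at what "n tokens" means there.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TokenPool:
    """Hypothetical resource pool: one token per unit of work sent to MySQL."""

    def __init__(self, size):
        self._tokens = threading.Semaphore(size)
        self._lock = threading.Lock()
        self.acquired = 0  # total tokens handed out so far

    def run(self, work, ids):
        with self._tokens:  # blocks while the pool is exhausted
            with self._lock:
                self.acquired += 1
            return work(ids)


def fetch(ids):
    # Stand-in for a MySQL query.
    return [f"row-{i}" for i in ids]


def single(pool, row_id):
    return pool.run(fetch, [row_id])  # 1 token


def batch_parallel(pool, ids):
    # One worker (and one token) per Id, all in flight at once.
    with ThreadPoolExecutor(max_workers=len(ids)) as executor:
        chunks = list(executor.map(lambda i: pool.run(fetch, [i]), ids))
    return [row for chunk in chunks for row in chunk]


def batch_sequential(pool, ids, chunk_size=10):
    # One token per chunk, chunks processed one after another.
    rows = []
    for start in range(0, len(ids), chunk_size):
        rows.extend(pool.run(fetch, ids[start:start + chunk_size]))
    return rows


pool = TokenPool(size=4)
single_rows = single(pool, 1)
after_single = pool.acquired
parallel_rows = batch_parallel(pool, [1, 2, 3])
after_parallel = pool.acquired
sequential_rows = batch_sequential(pool, list(range(1, 31)))
after_sequential = pool.acquired
```

A parallel batch holds its tokens simultaneously, so it can starve other callers for a short burst, whereas a sequential batch holds one token at a time for longer.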

More Isolation breakdowns • Type of Traffic: • Live traffic: Front Ends - user traffic, sync related traffic • Offline traffic: Scripts / Async processing / Offline processing • Type of Request: • Write (Insert, Delete, Update, Create Ids, Acquire Read/Write Locks) • Read (Single read, multi read, list, count, listLog)

Layer Resource Pools [Diagram: in the Engine, the Request Handler draws from four Resource Pools: Write Live, Read Live, Write Offline, Read Offline]
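Combining the two breakdowns (traffic type and request type) gives the four layers. A minimal sketch of that routing follows; the layer name `write_live` appears in the tool output later in the deck, while the other three names, the request-type strings, and the function name are assumptions made by analogy.

```python
# Request types as listed on the "More Isolation breakdowns" slide;
# the exact string spellings are hypothetical.
WRITE_REQUESTS = {"insert", "delete", "update", "create_ids", "acquire_lock"}
READ_REQUESTS = {"read", "multi_read", "list", "count", "list_log"}


def layer_for(request_type, is_live):
    """Pick the resource-pool layer for a request."""
    if request_type in WRITE_REQUESTS:
        kind = "write"
    elif request_type in READ_REQUESTS:
        kind = "read"
    else:
        raise ValueError(f"unknown request type: {request_type}")
    traffic = "live" if is_live else "offline"
    return f"{kind}_{traffic}"


print(layer_for("update", is_live=True))
print(layer_for("list", is_live=False))
```

Keeping a separate pool per layer means an offline backfill that exhausts its own pool cannot consume tokens reserved for live user traffic.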

Breakdown by tenant • What is a tenant? • Source Machine Tag (e.g. front-end) • Source ServiceName (e.g. FileSync) • Source Schema (e.g. Team) • Source Handler (e.g. Thumbnail generator) • Source Script (e.g. backfill-albums)

Examples • “frontend:www:TeamEvent” • “async-worker:async_task_wrapper:Contacts” • “email:emailservice.py:UserEmailEvent” • “taskrunner-node-quota:update_team_usage.py:User”
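The example keys all have three colon-separated parts. A minimal parsing sketch is shown below, assuming the parts are the source machine tag, the calling service / handler / script, and the schema; the field names and the strict three-part rule are assumptions, not something the slides specify.

```python
from typing import NamedTuple


class Tenant(NamedTuple):
    machine_tag: str  # e.g. "frontend"
    caller: str       # service, handler, or script name
    schema: str       # e.g. "TeamEvent"


def parse_tenant(key):
    """Split a tenant key such as 'frontend:www:TeamEvent' into its parts."""
    parts = key.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated parts, got {len(parts)}")
    return Tenant(*parts)


tenant = parse_tenant("taskrunner-node-quota:update_team_usage.py:User")
print(tenant.machine_tag, tenant.caller, tenant.schema)
```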

[Diagram of resources. Engine: CPU, Memory, Network, Storage, Disk IO. MySQL: CPU / Disk IO, Threads connected, Threads running, Semi-sync]

Resources • QPS is not a good metric, as requests vary considerably • # Connections used (mapping to token resource pool) • connections used * time • Total pool of 200 connections = 200 * 60 = 12000 connection seconds / min: • 1 connection held for 1 min = 60 connection seconds / min • 60 connections held for 1 second = 60 connection seconds / min
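The connection-seconds arithmetic can be written out as a short sketch. The function names are hypothetical; the numbers are the ones from the slide.

```python
def capacity_conn_seconds(pool_size, window_seconds=60):
    """Total connection-seconds available in one window."""
    return pool_size * window_seconds


def used_fraction(holds, pool_size, window_seconds=60):
    """holds is a list of (connections, seconds_held) pairs for one tenant."""
    used = sum(connections * seconds for connections, seconds in holds)
    return used / capacity_conn_seconds(pool_size, window_seconds)


capacity = capacity_conn_seconds(200)           # 200 * 60
one_long = used_fraction([(1, 60)], 200)        # 1 connection for a minute
many_short = used_fraction([(60, 1)], 200)      # 60 connections for a second
print(capacity, one_long, many_short)
```

Both usage patterns cost the same 60 connection seconds, which is why this metric compares tenants more fairly than QPS or a raw connection count.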

Write Live - 1 minute snapshot
| Tenant | ConnSec Used | Connections | Errors |
| frontend:rpc:User | 20% | 5 | 0 |
| frontend:www:FileId | 3% | 90 | 0 |
| taskrunner:growth:team_quota | 0.5% | 4 | 0 |
| email:UserEmail | 1% | 1 | 0 |
| Total | 24.5% | 100 | 0 |

[Chart: Percentage (0 to 100) over Time, 10:00 to 10:05]

Throttle mechanism • Auto-throttle heuristics based on history of resource usage per tenant • No predefined quota • Steady state usage by tenant varies wildly: 0.001% - 20% • Triggering event -> find “bad” tenant -> decide how much to throttle them -> throttle “bad” tenant • Disabled the auto-throttling mechanism • We have learned a lot

[Diagram: timer across an Engine connection, with steps labeled Start, Acquire Lock, Read, Write, Commit]

Resources • Used Time -> Execution Time • Bytes In/Out

Write Live - 1 minute snapshot
| Tenant | Used | Execution | MB Read | Conns | Errors |
| frontend:rpc:User | 20% | 1% | 1 | 5 | 0 |
| frontend:www:FileId | 3% | 3% | 30 | 90 | 0 |
| taskrunner:growth:team_quota | 0.5% | 0.5% | 5 | 4 | 0 |
| email:UserEmail | 1% | 0.5% | 4 | 1 | 0 |
| Total | 24.5% | 5% | 40 | 100 | 0 |

Layer: write_live, NumTenants: 360
Throttle Controls: State: steady, TokensPrimaryPool: 300, TokensThrottledPool: 0
Throttled Tenants: []
Period 1:
| Used | Idle |Execution| Conns | Errors | Size(MB) | Tenants
| 6.48% | 60.15% | 2.58% | 34057 | 0 | 19 | Aggregated stats
Top 5 Sources sorted by Used:
| 0.79% | 94.11% | 0.36% | 423 | 0 | 0 | offline:bluemail:Email
| 0.76% | 93.68% | 0.20% | 437 | 0 | 0 | frontend:rpc:UserEntity
| 0.45% | 19.92% | 0.08% | 4922 | 0 | 4 | cape-sfj:cape_dispatcher:CursorEntity
| 0.42% | 52.50% | 0.07% | 2783 | 0 | 0 | filejournal:fj_server_bin:FileID
| 0.36% | 93.80% | 0.02% | 252 | 0 | 0 | frontend:www:ActivityEntity

Layer: write_live, NumTenants: 360
Throttle Controls: State: steady, TokensPrimaryPool: 300, TokensThrottledPool: 0
Throttled Tenants: []
Period 2:
| Used | Idle |Execution| Conns | Errors | Size(MB) | Tenants
| 100% | 60.15% | 52.58% | 34057 | 20000 | 19 | Aggregated stats
Top 5 Sources sorted by Used:
| 93.79% | 0.11% | 50.36% | 600 | 300 | 0 | offline:bluemail:Email
| 0.76% | 93.68% | 0.20% | 437 | 254 | 0 | frontend:rpc:UserEntity
| 0.45% | 19.92% | 0.08% | 4922 | 1293 | 4 | cape-sfj:cape_dispatcher:CursorEntity
| 0.42% | 52.50% | 0.07% | 2783 | 2913 | 0 | filejournal:fj_server_bin:FileID
| 0.36% | 93.80% | 0.02% | 252 | 23 | 0 | frontend:www:ActivityEntity

edgestore_throttle --tenant=offline:bluemail:Email --tokens=30 --host=abc-de-fg --layer=write_live

Layer: write_live, NumTenants: 360
Throttle Controls: State: throttled, TokensPrimaryPool: 270, TokensThrottledPool: 30
Throttled Tenants: [offline:bluemail:Email]
Period 3:
| Used | Idle |Execution| Conns | Errors | Size(MB) | Tenants
| 16.20% | 60.15% | 7.58% | 34057 | 1900 | 19 | Aggregated stats
Top 5 Sources sorted by Used:
| 10.79% | 0.11% | 5.36% | 600 | 1900 | 0 | offline:bluemail:Email
| 0.76% | 93.68% | 0.20% | 437 | 0 | 0 | frontend:rpc:UserEntity
| 0.45% | 19.92% | 0.08% | 4922 | 0 | 4 | cape-sfj:cape_dispatcher:CursorEntity
| 0.42% | 52.50% | 0.07% | 2783 | 0 | 0 | filejournal:fj_server_bin:FileID
| 0.36% | 93.80% | 0.02% | 252 | 0 | 0 | frontend:www:ActivityEntity
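The state change visible in the output (300 primary tokens and 0 throttled before the command, 270 and 30 after) can be modeled with a small sketch. The class and method names are hypothetical, and the behavior on unthrottle (returning all tokens to the primary pool once no tenant is throttled) is an assumption; the slides only show the throttle step.

```python
class LayerThrottle:
    """Hypothetical model of one layer's primary and throttled token pools."""

    def __init__(self, layer, total_tokens):
        self.layer = layer
        self.primary_tokens = total_tokens
        self.throttled_tokens = 0
        self.throttled_tenants = []

    @property
    def state(self):
        return "throttled" if self.throttled_tenants else "steady"

    def throttle(self, tenant, tokens):
        """Carve tokens out of the primary pool and confine tenant to them."""
        if tokens <= 0 or tokens > self.primary_tokens:
            raise ValueError("tokens must fit inside the primary pool")
        self.primary_tokens -= tokens
        self.throttled_tokens += tokens
        if tenant not in self.throttled_tenants:
            self.throttled_tenants.append(tenant)

    def unthrottle(self, tenant):
        # Assumption: tokens return to the primary pool once nobody is throttled.
        self.throttled_tenants.remove(tenant)
        if not self.throttled_tenants:
            self.primary_tokens += self.throttled_tokens
            self.throttled_tokens = 0

    def pool_for(self, tenant):
        return "throttled" if tenant in self.throttled_tenants else "primary"


controls = LayerThrottle("write_live", total_tokens=300)
controls.throttle("offline:bluemail:Email", tokens=30)
print(controls.state, controls.primary_tokens, controls.throttled_tokens)
```

Confining the noisy tenant to a small dedicated pool caps its usage while every other tenant keeps the remaining 270 tokens, which matches the drop in errors for the other tenants in Period 3.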

Impact • Reduce MTTR • Availability event: • 1. Detection • 2. Investigation • 3. Containment • 4. Short term fix • 5. Long term fix

Findings • Expensive queries • Abusable APIs • Query optimizer • Inconsistencies • Insufficient documentation • Bugs • Perf optimization

Lessons Learned • There is such a thing as automating too soon • Silently throttling is bad • Throttling should be a temporary state • Not having pre-defined quotas works • Multiple Isolation breakdowns (by user, by table, by tenant, by request type (Read/Write), by traffic type (Live vs Offline)) • 1 deployment to rule them all works: found issues with API, bugs, poorly documented client, best practices
Throttle mechanism • Auto-throttle heuristics • Manual throttle using a throttle tool: Query / Throttle / Unthrottle • Aggregate tool: queries and filters all engines to isolate the error and limit blast radius while investigating, root causing and fixing the underlying problem • There was a time when we shut down scripts manually, not knowing who was causing the problem • Future work (in progress)

What’s next • Control Plane “brain” • continuously query all Engines • automatically throttle tenants when system is degraded • detecting trends • Per logical micro-shard (and per Id) granularity for throttling

Credits • Zviad Metreveli • Rati Gelashvili • Robert Verkuil • Alex Degtiar • Jonathan Lee
