

SLIDE 1

Targeted Resource Management in Multi-tenant Distributed Systems

Jonathan Mace (Brown University), Peter Bodik (MSR Redmond), Rodrigo Fonseca (Brown University), Madanlal Musuvathi (MSR Redmond)

SLIDE 2

Resource Management in Multi-Tenant Systems

SLIDES 3-10

Resource management failures in multi-tenant systems are common and costly:

  • April 2011 – Amazon EBS Failure: a failure in one availability zone cascades to the shared control plane, causing thread pool starvation for all zones
  • Aug. 2012 – Azure Storage Outage: an aggressive background task responds to increased hardware capacity with a deluge of warnings and logging
  • Nov. 2014 – Visual Studio Online outage: a code change increases database usage, shifting the bottleneck to an unmanaged application-level lock
  • 2014 – Communication with Cloudera: shared storage layer bottlenecks circumvent the resource management layer

The consequences: degraded performance, violated SLOs, and system outages.

SLIDES 11-12

[Diagram: the OS and hypervisor isolate tenants into containers / VMs, but shared systems (storage, database, queueing, etc.) sit outside that isolation boundary.]

SLIDES 13-17

The resource manager monitors the resource usage of each tenant in near real-time and actively schedules tenants and activities. High-level, centralized policies:

  • encapsulate resource management logic;
  • are built on abstractions not specific to a resource type or system;
  • achieve different goals: guarantee average latencies, fair-share a resource, etc.

SLIDES 18-23

Example: the Hadoop Distributed File System (HDFS). The HDFS NameNode serves filesystem metadata operations such as Rename; HDFS DataNodes provide replicated block storage and serve operations such as Read.

SLIDES 24-34

[Animation: requests from multiple tenants flow through the HDFS NameNode and DataNodes; the per-tenant request counts shown (e.g., 500 requests versus 12) illustrate how a heavy workload contends with a light one.]

SLIDES 35-41

[Diagram: a single machine hosts an HBase RegionServer, an HDFS DataNode, a MapReduce Shuffler, and a Hadoop YARN NodeManager running MapReduce tasks in YARN containers, all sharing the local storage; the same stack repeats on every machine.]

SLIDES 42-47

Goals

  • Coordinated control across processes, machines, and services
  • Handle both system-level and application-level resources
  • Principals: tenants and background tasks
  • Real-time and reactive
  • Efficient: only control what is needed

SLIDE 48

Architecture

SLIDES 49-50

[Diagram: tenant requests entering the system.]

SLIDES 51-53

Workflows

Purpose: identify requests from different users and background activities (e.g., all requests from a tenant over time, or data balancing in HDFS). A workflow is the unit of resource measurement, attribution, and enforcement; it tracks a request across varying levels of granularity and is orthogonal to threads, processes, network flows, etc.

SLIDES 54-60

Resources

Purpose: cope with the diversity of resources. What we need:

  • 1. Identify overloaded resources. Slowdown: the ratio of how slow the resource is now compared to its baseline performance with no contention.
  • 2. Identify culprit workflows. Load: the fraction of current utilization that we can attribute to each workflow.

SLIDES 61-64

Concretely, for a queue-like resource:

  • Slowdown = (queue time + execute time) / execute time. E.g., 100 ms of queueing plus 10 ms of execution gives a slowdown of (100 + 10) / 10 = 11.
  • Load = time spent executing. E.g., 10 ms of execution gives a load of 10.
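To make the accounting concrete, here is a minimal Java sketch of per-resource slowdown and load tracking under these definitions. It is an illustration only; the class and method names are hypothetical, not Retro's actual API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-resource accounting. Slowdown aggregates
// (queue time + execute time) / execute time over an interval;
// load attributes execute time to each workflow.
class ResourceStats {
    private final LongAdder queueNanos = new LongAdder();
    private final LongAdder executeNanos = new LongAdder();
    private final Map<String, LongAdder> loadByWorkflow = new ConcurrentHashMap<>();

    // Called by resource instrumentation when an operation completes.
    void record(String workflowId, long queuedNanos, long executedNanos) {
        queueNanos.add(queuedNanos);
        executeNanos.add(executedNanos);
        loadByWorkflow.computeIfAbsent(workflowId, id -> new LongAdder())
                      .add(executedNanos);
    }

    // Slowdown over the current interval; 1.0 means no contention.
    double slowdown() {
        long exec = executeNanos.sum();
        return exec == 0 ? 1.0 : (queueNanos.sum() + exec) / (double) exec;
    }

    // Fraction of this resource's utilization attributed to one workflow.
    double loadFraction(String workflowId) {
        LongAdder mine = loadByWorkflow.get(workflowId);
        long total = executeNanos.sum();
        return (total == 0 || mine == null) ? 0.0 : mine.sum() / (double) total;
    }
}

With the example above, record("w1", 100_000_000, 10_000_000) followed by slowdown() returns 11.0.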

SLIDES 65-74

Control Points

Goal: enforce resource management decisions. Control points are decoupled from resources: they rate-limit workflows and are agnostic to the underlying implementation, e.g., a token bucket or a priority queue.
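For instance, a control point could be backed by a token bucket whose rate the controller adjusts. The following is a minimal, self-contained Java sketch under that assumption, not the paper's implementation:

// Minimal per-workflow token bucket for a control point (illustrative).
// rate is tokens (requests) per second; capacity bounds bursts.
class TokenBucket {
    private final double capacity;
    private double rate;
    private double tokens;
    private long lastRefillNanos = System.nanoTime();

    TokenBucket(double rate, double capacity) {
        this.rate = rate;
        this.capacity = capacity;
        this.tokens = capacity;
    }

    // Called when the central controller applies a new rate.
    synchronized void setRate(double newRate) { rate = newRate; }

    // True if the request may proceed; otherwise the caller queues or delays it.
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + rate * (now - lastRefillNanos) / 1e9);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}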

SLIDES 75-79

Retro's architecture has three parts:

1. Pervasive measurement: resource usage is aggregated locally, then reported centrally once per second.
2. Centralized controller: policies run in a continuous control loop over a global, abstracted view of the system, exposed through the Retro controller API.
3. Distributed enforcement: enforcement is coordinated across control points using a distributed token bucket.

Together, these form a "control plane" for resource management. Because policies see a global, abstracted view of the system, they are easier to write and reusable.
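One plausible coordination step for the distributed token bucket: the controller periodically splits each workflow's global rate across control points in proportion to their recently observed demand. A minimal Java sketch under that assumption (the class name and the proportional-share rule are illustrative, not necessarily Retro's exact mechanism):

import java.util.HashMap;
import java.util.Map;

// Splits a workflow's global rate among control points proportionally
// to each point's recent demand (requests/sec). Illustrative only.
class DistributedRateAllocator {
    static Map<String, Double> allocate(double globalRate, Map<String, Double> demand) {
        double total = demand.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> rates = new HashMap<>();
        for (Map.Entry<String, Double> e : demand.entrySet()) {
            double share = (total == 0)
                ? globalRate / demand.size()   // no demand observed: split evenly
                : globalRate * e.getValue() / total;
            rates.put(e.getKey(), share);
        }
        return rates;
    }
}

Each control point would then call setRate on its local token bucket with its assigned share.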

SLIDES 80-85

Example: LatencySLO policy

  • H: high-priority workflows, each with a latency guarantee such as "200 ms average request latency"
  • L: low-priority workflows, which may use spare capacity

The policy monitors latencies, attributes interference, and throttles the interfering low-priority workflows.

SLIDES 86-94

The LatencySLO policy in pseudocode. It selects the high-priority workflow W with the worst performance, weights low-priority workflows by their interference with W, and throttles the low-priority workflows proportionally to their weight:

foreach candidate in H
    miss[candidate] = latency(candidate) / guarantee[candidate]
W = candidate in H with max miss[candidate]
foreach rsrc in resources()               // importance of each resource to W
    importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))
foreach lopri in L                        // low-priority workflow interference
    interference[lopri] = Σ_rsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)
foreach lopri in L                        // normalize interference
    interference[lopri] /= Σ_k interference[k]
foreach lopri in L
    if miss[W] > 1                        // throttle
        scalefactor = 1 - α * (miss[W] - 1) * interference[lopri]
    else                                  // release
        scalefactor = 1 + β
    foreach cpoint in controlpoints()     // apply new rates
        set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri))
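As a worked example of the throttle step (with illustrative numbers, not from the paper): if α = 1, the worst high-priority workflow misses its guarantee by 50% (miss[W] = 1.5), and a low-priority workflow accounts for half of the interference (interference = 0.5), then scalefactor = 1 - 1 × (1.5 - 1) × 0.5 = 0.75, so that workflow's rate at every control point is cut by 25%.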

SLIDES 95-98

Other types of policy:

  • Bottleneck Fairness: detect the most overloaded resource and fair-share it between the tenants using it
  • Dominant Resource Fairness: estimate demands and capacities from measurements

These policies are concise and not system-specific; any resource can be the bottleneck (the policy doesn't care which).

SLIDE 99

Evaluation

SLIDES 100-106

Instrumentation

Retro is implemented in Java as an instrumentation library plus a central controller. To enable Retro in a system:

  • Propagate the workflow ID within the application (like X-Trace or Dapper)
  • Instrument resources with wrapper classes

Overheads: resource instrumentation is automatic using AspectJ; overall, 50-200 lines per system are needed to modify RPCs; Retro's overhead on throughput and latency is at most 1-2%.
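A minimal sketch of what workflow-ID propagation can look like within a process (thread-local context plus explicit transfer to worker threads); the class is hypothetical and stands in for the X-Trace/Dapper-style metadata propagation the slide refers to:

// Hypothetical workflow-ID propagation; not Retro's actual API.
// Across processes, the ID would additionally be serialized into RPC headers.
final class WorkflowContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String workflowId) { CURRENT.set(workflowId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }

    // Wrap a task so the submitting thread's workflow ID is restored
    // in the worker thread that eventually runs it.
    static Runnable wrap(Runnable task) {
        final String id = get();
        return () -> {
            set(id);
            try {
                task.run();
            } finally {
                clear();
            }
        };
    }
}

Instrumented resources (the wrapper classes above) would read WorkflowContext.get() to attribute queue and execute time to the current workflow.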

SLIDES 107-114

Experiments

Policies were evaluated for a mixture of systems, workflows, and resources, with results on clusters of up to 200 nodes.

  • Systems: HDFS, HBase, YARN MapReduce, ZooKeeper
  • Workflows: MapReduce jobs (HiBench), HBase (YCSB), HDFS clients, background data replication, background heartbeats
  • Resources: CPU, disk, network (all systems); locks, queues (HDFS, HBase)
  • Policies: LatencySLO, Bottleneck Fairness, Dominant Resource Fairness

See the paper for full experiment results; this talk covers the LatencySLO policy results.

SLIDES 115-127

[Figure: SLO-normalized latency (log scale, 0.1-1000) over 30 minutes for six workflows (HDFS read 8k, HBase read 1 cached row, HBase read 1 row, HBase cached table scan, HDFS mkdir, HBase table scan) relative to the SLO target, alongside the slowdown over time of the contended resources: disk, CPU, HDFS NameNode lock, HDFS NameNode queue, and HBase queue.]

SLIDES 128-139

[Figure: the same experiment with the SLO policy enabled. SLO-normalized latencies now stay close to the SLO target, with values around 0.2-1 for the HBase table scan, HDFS mkdir, and HBase cached table scan workflows.]

SLIDES 140-145

Conclusion

Retro provides centralized resource management for shared distributed systems. It is comprehensive, covering resources, processes, tenants, and background tasks, and it offers abstractions for writing concise, general-purpose policies:

  • Workflows
  • Resources (slowdown, load)
  • Control points

http://cs.brown.edu/~jcmace