
AHAB: Data-Driven Virtual Cluster Hunting



  1. AHAB: Data-Driven Virtual Cluster Hunting
     Johannes Zerwas*, Patrick Kalmbach*, Carlo Fuerst°, Arne Ludwig°, Andreas Blenk*, Wolfgang Kellerer*, Stefan Schmid^
     *Technical University of Munich, Germany; °Technical University of Berlin, Germany; ^University of Vienna, Austria
     Chair of Communication Networks, Department of Electrical and Computer Engineering, Technical University of Munich
     IFIP Networking 2018, Zurich, Switzerland

  2. Context
     [Figure: VMs 1 to N of different tenants sharing a data center network]
     • Increased use of data-intensive applications in shared data centers
     • Many provider-tenant interfaces neglect the network as a resource
     • Problems:
       − Unpredictable application performance
       − Limited applicability of cloud
       − Inefficiencies in production data centers
     • Solution: network-aware abstraction, the Virtual Cluster (ACM SIGCOMM 2011)
     Johannes Zerwas (TUM)

  3. Background: Virtual Cluster Abstraction
     [Figure: example physical cluster as a tree; link labels show used/total BUs, all unused (0/8, 0/4, 0/2)]
     Physical Cluster
     • Compute Units (CUs)
     • Bandwidth Units (BUs)
     • Tree-like topology (abstracted from Fat-Tree)
     Virtual Cluster (VC)
     • Number of VMs (N)
     • Size of VMs (S)
     • Bandwidth (B)
     • Lifetime given resource fulfillment

  4. Background: Virtual Cluster Abstraction
     [Figure: the same tree with one VC embedded; link labels now read 0/8, 1/4, and 0/2, 1/2, 1/2, 1/2. Footprint F = 6, Utilization U = 9/32]
     Physical Cluster
     • Compute Units (CUs)
     • Bandwidth Units (BUs)
     • Tree-like topology (abstracted from Fat-Tree)
     Virtual Cluster (VC)
     • Number of VMs (N)
     • Size of VMs (S)
     • Bandwidth (B)
     • Lifetime given resource fulfillment
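The bandwidth a virtual cluster needs on a single link can be sketched as follows. This assumes the hose-model rule from the Oktopus work cited on the Context slide (ACM SIGCOMM 2011): a link that separates m of the N VMs from the rest has to carry min(m, N − m) · B. The function name `vc_link_demand` and the example values are illustrative and do not come from the slides.

```python
def vc_link_demand(vms_below: int, n_vms: int, bandwidth: int) -> int:
    """BUs a VC with n_vms VMs of bandwidth B needs on one link.

    Assumption (hose model, as in Oktopus): traffic crossing the link is
    bounded by the smaller of the two VM groups that the link separates.
    """
    if not 0 <= vms_below <= n_vms:
        raise ValueError("vms_below must be between 0 and n_vms")
    return min(vms_below, n_vms - vms_below) * bandwidth


# Illustrative values only: N = 3 VMs, B = 1 BU per VM.
print(vc_link_demand(1, 3, 1))  # host link with one VM below it -> 1
print(vc_link_demand(2, 3, 1))  # switch uplink with two VMs below it -> 1
print(vc_link_demand(3, 3, 1))  # all VMs below the link, nothing crosses -> 0
```

Packing all VMs of a cluster under one switch keeps the demand on higher links at zero, which is why single-request algorithms tend to minimize the footprint.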

  5. Problem: Resource Fragmentation
     Existing allocation algorithms focus on a single request:
     ▪ Oktopus (ACM SIGCOMM 2011)
     ▪ Kraken (IEEE/ACM TON 2018)
     [Figure: tree topologies at times t = 1 and t = 2 with used/total labels, showing fragmentation of resources]
     Contribution 1: TETRIS - Sacrifice the footprint
     Contribution 2: AHAB - Admission Control

  6. TETRIS: Sacrifice Footprint for Fragmentation
     Choose hosts with max. ratio of residual resources
     [Figure: tree topology at time t = 1 with used/total labels; hosts are annotated with their ratio, e.g. (4 − 2)/(4 − 1) = 2/3]

  7. TETRIS: Sacrifice Footprint for Fragmentation
     Choose hosts with max. ratio of residual resources
     [Figure: tree topology at times t = 1 and t = 2 with used/total labels; host ratios are 1/1]
     Resources still usable
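The selection rule on the two TETRIS slides can be sketched roughly as below. The slides only state "choose hosts with max. ratio of residual resources"; this sketch assumes the ratio is residual CUs divided by residual BUs, which the slides do not confirm. `Host`, `pick_host`, and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_cu: int  # residual compute units
    free_bu: int  # residual bandwidth units on the host's uplink


def pick_host(hosts: List[Host], vm_size: int, vm_bw: int) -> Optional[Host]:
    """Return the feasible host with the largest residual ratio, or None.

    Assumption: ratio = free_cu / free_bu. The real TETRIS definition
    may differ; only the "pick the maximum" step is taken from the slides.
    """
    if vm_size <= 0 or vm_bw <= 0:
        raise ValueError("vm_size and vm_bw must be positive")
    # Feasible hosts have free_bu >= vm_bw > 0, so the division is safe.
    feasible = [h for h in hosts if h.free_cu >= vm_size and h.free_bu >= vm_bw]
    if not feasible:
        return None
    return max(feasible, key=lambda h: h.free_cu / h.free_bu)


hosts = [Host("a", 2, 3), Host("b", 3, 3), Host("c", 1, 4)]
chosen = pick_host(hosts, vm_size=1, vm_bw=1)
print(chosen.name if chosen else "no feasible host")
```

Under this assumed ratio, the rule prefers hosts whose compute would otherwise be stranded behind scarce bandwidth, even if that spreads a cluster over more links.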

  8. Algorithm Evaluation
     ▪ Baseline: OKTOPUS (ACM SIGCOMM 2011), KRAKEN (IEEE/ACM TON 2018)
     ▪ Physical Cluster: Fat-Tree with k=12, 8 CUs and 8 BUs
     ▪ Performance metrics: CU utilization, avg. VC footprint
     ▪ Virtual Cluster Requests:
       ▪ 1000 requests per run with varying arrival rates
       ▪ Num. VMs, size of VMs, and BW similar to traces from Google & Microsoft
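To get a feel for the size of the evaluated cluster, the snippet below computes the element counts of a standard three-tier k-ary fat-tree. It assumes the evaluation uses that standard construction and that the 8 CUs are per host; neither detail is spelled out on the slide. `fat_tree_sizes` is an illustrative helper.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts of a standard three-tier k-ary fat-tree (k even)."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,
        "hosts": k * half * half,          # k/2 hosts per edge switch
    }


sizes = fat_tree_sizes(12)
print(sizes["hosts"])      # 432 hosts for k = 12
print(sizes["hosts"] * 8)  # 3456 CUs in total, assuming 8 CUs per host
```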

  9. TETRIS Evaluation
     +5% utilization, +10% footprint

  10. TETRIS Evaluation
      [Figure: plot over Num. VMs, Size VMs (CU), and Bandwidth (BU) with values 1, 2, 4, 8]
      Next step: add admission control

  11. AHAB: The Case for Data-Driven Admission Control
      • Data-Driven Decision
      • Leverage Knowledge
      • Monte Carlo Tree Search

  12. AHAB: The Case for Data-Driven Admission Control
      [Figure: empty cluster (labels 0/4, 0/2) and request 1 with the two options accept and reject]

  13. AHAB: The Case for Data-Driven Admission Control
      [Figure: decision tree with accept and reject branches for request 1 and the requests that follow; a path is scored by its utilization (example value: 12)]

  14. AHAB: The Case for Data-Driven Admission Control
      [Figure: decision tree with accept and reject branches; parameters are the num. of requests / sequence and the num. of sequences]
      • Accept branch: A = 12; reject branch: B = 9
      • A > B? Then "accept"
      • Works with every VC embedding algorithm (Oktopus, Kraken, Tetris)
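The accept/reject comparison on the AHAB slides can be illustrated with a toy sketch. It is not the authors' implementation: it uses flat Monte Carlo rollouts instead of a full tree search, models the cluster as a single pool of CUs, and ignores topology, bandwidth, and lifetimes. `rollout`, `admit`, and `sample_future` are hypothetical names.

```python
import random
from typing import Callable, List


def rollout(free_cu: int, future: List[int]) -> int:
    """Admit future requests first-fit and return the CUs they occupy."""
    used = 0
    for size in future:
        if size <= free_cu - used:
            used += size
    return used


def admit(free_cu: int, request: int,
          sample_future: Callable[[random.Random], List[int]],
          num_sequences: int = 50, seed: int = 0) -> bool:
    """Accept `request` only if sampled futures score higher with it (A > B)."""
    if request > free_cu:
        return False  # cannot be embedded at all
    rng = random.Random(seed)
    score_accept = 0  # A: utilization summed over the sampled sequences
    score_reject = 0  # B: same sequences, but without the request
    for _ in range(num_sequences):
        future = sample_future(rng)
        score_accept += request + rollout(free_cu - request, future)
        score_reject += rollout(free_cu, future)
    return score_accept > score_reject


# A small request now blocks a larger one later: reject.
print(admit(4, 3, lambda rng: [4]))  # False
# The remaining capacity is still filled afterwards: accept.
print(admit(4, 3, lambda rng: [1]))  # True
```

Both branches are scored on the same sampled sequences, so the comparison reflects the effect of the single request and not sampling noise. In practice `sample_future` would draw request sizes from historical data, and the rollout would call the chosen embedding algorithm.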

  15. AHAB improves utilization
      +10% utilization, -25% footprint

  16. Why is AHAB better?
      [Figure: two panels, Kraken and AHAB(Kraken), each over Num. VMs, Size VMs (CU), and Bandwidth (BU) with values 1, 2, 4, 8; annotations: "Small VMs, Large BW" and "Large VMs, Small BW"]

  17. Why is AHAB better?
      [Figure: acceptance ratio over Size VM / BW for Kraken & Tetris and for AHAB]
      AHAB accepts more valuable requests

  18. Optimization Opportunities
      • Trade-off: utilization vs. computations
      • Use ML for speed-up

  19. Summary
      ▪ TETRIS sacrifices footprint to increase utilization
      ▪ AHAB employs a data-driven approach for admission control
      ▪ AHAB evaluates the impact of a single request on future requests
      ▪ AHAB's approach also applies to other use cases
      ▪ Future work: use ML to predict AHAB's decisions

  20. Thank you! Questions?
