AHAB: Data-Driven Virtual Cluster Hunting


SLIDE 1

Johannes Zerwas*, Patrick Kalmbach*, Carlo Fuerst°, Arne Ludwig°, Andreas Blenk*, Wolfgang Kellerer*, Stefan Schmid^
*Technical University of Munich, Germany
°Technical University of Berlin, Germany
^University of Vienna, Austria

AHAB: Data-Driven Virtual Cluster Hunting

IFIP Networking 2018, Zurich, Switzerland

Chair of Communication Networks, Department of Electrical and Computer Engineering, Technical University of Munich

SLIDE 2

  • Increased use of data-intensive applications in shared data centers
  • Many provider-tenant interfaces neglect network as a resource
  • Problems:

− Unpredictable application performance
− Limited applicability of cloud
− Inefficiencies in production data centers

  • Solution: Network-aware abstraction - Virtual Cluster (ACM SIGCOMM 2011)

2 Johannes Zerwas (TUM)

Context

[Figure: tenant VMs (VM 1, VM 2, VM 3, ..., VM N); the network between them is marked "?"]

SLIDE 3

[Figure: tree-shaped physical cluster with all resources unused (labels 0/4, 0/8, 0/2)]

Physical Cluster

  • Compute Units (CUs)
  • Bandwidth Units (BUs)
  • Tree-like topology (abstracted from Fat-Tree)

Virtual Cluster (VC)

  • Number of VMs (N)
  • Size of VMs (S)
  • Bandwidth (B)
  • Lifetime given resource fulfillment

Background: Virtual Cluster Abstraction

Legend: figure labels show used/total BUs
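The virtual cluster parameters N, S and B can be sketched as a small data structure. The link-demand rule below is not on the slides; it assumes the hose model of the Virtual Cluster abstraction (Oktopus, ACM SIGCOMM 2011), where a link that separates m of the N VMs from the rest needs min(m, N - m) * B bandwidth units. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCluster:
    """A VC request <N, S, B>; field names are illustrative only."""
    n_vms: int      # N: number of VMs
    vm_size: int    # S: compute units (CUs) per VM
    bandwidth: int  # B: bandwidth units (BUs) per VM

    def compute_demand(self) -> int:
        """Total CUs needed by the whole cluster."""
        return self.n_vms * self.vm_size

    def link_demand(self, vms_below: int) -> int:
        """BUs needed on a link with `vms_below` of the N VMs beneath it.

        ASSUMPTION: hose model, i.e. traffic across the link is bounded
        by the smaller of the two VM groups it separates.
        """
        if not 0 <= vms_below <= self.n_vms:
            raise ValueError("vms_below must be between 0 and N")
        return min(vms_below, self.n_vms - vms_below) * self.bandwidth
```

For example, a request with N=4, S=1, B=2 needs 4 CUs in total, 2 BUs on the uplink of a host holding one VM, and nothing on a link that has all four VMs below it.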

SLIDE 4

[Figure: physical cluster after embedding one VC, labeled used/total BUs]

Footprint F=6

Utilization U=9/32

Background: Virtual Cluster Abstraction

SLIDE 5

[Figure: three snapshots of the physical cluster with used/total resource labels]

Existing allocation algorithms focus on a single request:
▪ Oktopus (ACM SIGCOMM 2011)
▪ Kraken (IEEE/ACM TON 2018)

Problem: Resource Fragmentation

Fragmentation of resources

Contribution 1: TETRIS - Sacrifice the footprint

Contribution 2: AHAB - Admission Control

SLIDE 6

[Figure: snapshots of the physical cluster with used/total resource labels, before and after embedding request 1]

Choose hosts with max. ratio of residual resources

TETRIS: Sacrifice Footprint for Fragmentation

[Figure: hosts annotated with their ratio of residual resources, e.g. (4 − 2) / (4 − 1) = 2/3]
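The selection rule "choose hosts with max. ratio of residual resources" can be sketched as a ranking step. The slide only shows the arithmetic (4 − 2) / (4 − 1) = 2/3; which quantities form the numerator and denominator is an assumption here (residual uplink BUs over residual CUs), and the `Host` type and `rank_hosts` helper are hypothetical names, not taken from the TETRIS implementation.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Host:
    name: str
    cu_total: int
    cu_used: int
    bu_total: int  # capacity of the host's uplink
    bu_used: int

    def residual_ratio(self) -> Fraction:
        # ASSUMPTION: ratio = free uplink BUs / free CUs.
        # The slides do not define the ratio explicitly.
        return Fraction(self.bu_total - self.bu_used,
                        self.cu_total - self.cu_used)


def rank_hosts(hosts):
    """Return hosts that still have free CUs, highest residual ratio first."""
    candidates = [h for h in hosts if h.cu_used < h.cu_total]
    return sorted(candidates, key=Host.residual_ratio, reverse=True)
```

Under this assumed definition, a host with 2 of 4 BUs and 1 of 4 CUs in use gets the ratio 2/3 shown on the slide, and hosts without free CUs are never considered.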

SLIDE 7

Choose hosts with max. ratio of residual resources

TETRIS: Sacrifice Footprint for Fragmentation

[Figure: snapshots of the physical cluster over time t with used/total resource labels for requests 1 and 2]

Resources still usable

SLIDE 8

▪ Baseline: OKTOPUS (ACM SIGCOMM 2011), KRAKEN (IEEE/ACM TON 2018)
▪ Physical Cluster: Fat-Tree with k=12, 8 CUs and 8 BUs
▪ Performance metrics: CU utilization, avg. VC footprint
▪ Virtual Cluster Requests:
  ▪ 1000 / run with varying arrival rates
  ▪ Num. VMs, size of VMs, BW similar to traces from Google & Microsoft

Algorithm Evaluation
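To give a sense of scale for the k=12 setup, the sizes of a standard k-ary fat-tree can be computed directly. This is a sketch using the textbook fat-tree formulas; the evaluation's tree is an abstraction of the fat-tree, so its exact node counts may differ, and reading "8 CUs" as a per-host capacity is an assumption.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts of a standard k-ary fat-tree (k must be even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "core_switches": half * half,        # (k/2)^2
        "aggregation_switches": k * half,    # k/2 per pod
        "edge_switches": k * half,           # k/2 per pod
        "hosts": k * half * half,            # k^3 / 4
    }


sizes = fat_tree_sizes(12)
# ASSUMPTION: 8 CUs per host, as one reading of "8 CUs and 8 BUs".
total_cus = sizes["hosts"] * 8
```

With k=12 this yields 432 hosts and, under the per-host assumption, 3456 CUs in total.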

SLIDE 9

TETRIS Evaluation

+5% utilization +10% footprint

SLIDE 10

TETRIS Evaluation

[Figure: results by Num. VMs, Size VMs (CU) and Bandwidth (BU) in {1, 2, 4, 8}]

Add Admission Control

SLIDE 11

AHAB: The Case for Data-Driven Admission Control

  • Leverage Knowledge
  • Monte Carlo Tree Search
  • Data-Driven Decision

SLIDE 12

[Figure: unused physical cluster (used/total labels) and the decision for request 1: accept or reject]

AHAB: The Case for Data-Driven Admission Control

SLIDE 13

[Figure: search tree over accept/reject decisions for future requests; the "accept" branch reaches utilization = 12]

AHAB: The Case for Data-Driven Admission Control

SLIDE 14

[Figure: search tree comparing the two branches: A ("accept") reaches utilization = 12, B ("reject") reaches utilization = 9. Since A > B, the decision is "accept".]

Works with every VC embedding algorithm (Oktopus, Kraken, Tetris)

AHAB: The Case for Data-Driven Admission Control

  • Num. requests / sequence
  • Num. sequences
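The accept/reject comparison can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions and not the authors' implementation: it uses flat Monte Carlo rollouts instead of a full Monte Carlo Tree Search, models the data center as a single pool of CUs without topology or lifetimes, and samples future requests uniformly from a list of past request sizes. The function names and defaults are hypothetical; `num_sequences` and `num_requests` mirror the two parameters named on the slide.

```python
import random
from statistics import mean


def rollout(free_cus, future):
    """Greedily admit each future request that still fits; return CUs used."""
    used = 0
    for size in future:
        if size <= free_cus - used:
            used += size
    return used


def ahab_decide(request, free_cus, history,
                num_sequences=50, num_requests=10, rng=None):
    """Return "accept" or "reject" for `request` (a CU demand).

    ASSUMPTION: utilization is the number of CUs in use at the end
    of a sampled sequence; the real system embeds VCs in a tree.
    """
    if request > free_cus:
        return "reject"  # the request cannot be embedded at all
    rng = rng or random.Random(0)
    accept_scores, reject_scores = [], []
    for _ in range(num_sequences):
        future = [rng.choice(history) for _ in range(num_requests)]
        # Branch A: take the request now, then serve the sampled future.
        accept_scores.append(request + rollout(free_cus - request, future))
        # Branch B: skip the request and keep the capacity for the future.
        reject_scores.append(rollout(free_cus, future))
    return "accept" if mean(accept_scores) >= mean(reject_scores) else "reject"
```

With 4 free CUs and a history consisting only of 4-CU requests, a 3-CU request is rejected because accepting it blocks every later request; with a history of 1-CU requests, the same request is accepted. The embedding step inside `rollout` is where any VC embedding algorithm (Oktopus, Kraken, Tetris) would be plugged in.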

SLIDE 15

AHAB improves utilization

+10% utilization

  • 25% footprint

SLIDE 16

[Figure: two panels, AHAB(Kraken) and Kraken, by Num. VMs, Size VMs (CU) and Bandwidth (BU) in {1, 2, 4, 8}; annotations: "Small VMs, Large BW" and "Large VMs, Small BW"]

Why is AHAB better?

SLIDE 17

Why is AHAB better?

[Figure: two panels, "Kraken & Tetris" and "AHAB"; axes: Size VM / BW and Acceptance Ratio]

AHAB accepts more valuable requests

SLIDE 18

Optimization Opportunities

Trade-off: utilization vs. computations

Use ML for speed-up

SLIDE 19

▪ TETRIS sacrifices footprint to increase utilization
▪ AHAB employs a data-driven approach for admission control
▪ AHAB evaluates the impact of a single request on future requests
▪ AHAB's approach also applies to other use cases
▪ Future work: use ML to predict AHAB's decisions

Summary

SLIDE 20

Thank you! Questions?
