Mage: Online and Interference-Aware Scheduling for Multi-Scale Heterogeneous Systems (PowerPoint PPT Presentation)



SLIDE 1

Mage: Online and Interference-Aware Scheduling for Multi-Scale Heterogeneous Systems

Francisco Romero1 and Christina Delimitrou2

1Stanford University, 2Cornell University

PACT – Session 4a – November 2, 2018

SLIDE 2

Motivation

  • Heterogeneity is becoming more prevalent
  • Different server generations
  • Advanced management features, e.g., power management
  • Allows systems to better match applications to the underlying hardware
  • Challenge: how do we maximize application performance while maintaining high resource utilization?

[Diagram: a server with one big core, two small cores, and memory; App 1 and App 2 are mapped onto the cores]

SLIDE 3

Prior Work

System          | Heterogeneous Clusters | Heterogeneous CMPs
Paragon         | ✓                      | ❌
Whare-map       | ✓                      | ❌
Bubble-flux     | ✓                      | ❌
Composite cores | ❌                     | ✓
HASS            | ❌                     | ✓
PIE             | ❌                     | ✓

SLIDE 4

The Problem with “Sum of Schedulers”

[Diagram: a "sum of schedulers" (separate heterogeneous cluster scheduler and heterogeneous CMP scheduler) contrasted with a joint heterogeneous cluster + CMP scheduler built on exhaustive search]

  • Sum of schedulers: suboptimal performance; revisits several scheduling decisions
  • Exhaustive search: high overhead; not scalable

Need a data-driven approach to avoid exhaustive search
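The scaling problem with exhaustive search can be made concrete with a back-of-the-envelope count (an illustration, not a figure from the talk): even restricting attention to one-to-one assignments of n applications onto n distinct cores, the search space already grows as n!.

```python
from math import factorial

# Back-of-the-envelope count of the placement space an exhaustive
# scheduler would search: assigning n applications one-to-one to n
# distinct cores admits n! orderings, before even considering
# co-scheduling effects or per-server choices.
def one_to_one_placements(n_apps: int) -> int:
    return factorial(n_apps)

print(one_to_one_placements(3))    # 6 placements for a 3-app example
print(one_to_one_placements(16))   # ~2.1e13 for a 16-core CMP
```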

SLIDE 5

Mage

  • Tiered runtime scheduler that considers inter- and intra-server heterogeneity jointly
  • Leverages fast, online data mining to quickly explore the space of application placements
  • Lightweight application monitoring and rescheduling
  • Heterogeneous CMPs: 38% average improvement over a greedy scheduler
  • Heterogeneous cluster: 30% average improvement over a greedy scheduler, and 11% average improvement over a heterogeneity- and interference-aware scheduler

SLIDE 6

Mage Master and Mage Agents

[Diagram: a server with a big core, two small cores, and memory; a Mage agent runs per server and reports to the Mage master]

Mage Master
  • Runs inference
  • Makes optimal application-to-resource scheduling decisions
  • Decides when applications should be migrated/rescheduled

Mage Agent
  • Monitors the performance of all scheduled applications
  • Notifies the master when QoS violations occur
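A minimal sketch of this master/agent split follows; the class and method names, QoS targets, and MIPS samples are illustrative assumptions, not the paper's actual interfaces.

```python
# Minimal sketch of the Mage master/agent split; names and thresholds
# are illustrative assumptions, not the paper's actual interfaces.
class MageMaster:
    """Cluster-wide scheduler: receives QoS-violation notifications and
    decides whether to migrate/reschedule the offending application."""
    def __init__(self):
        self.pending = []   # violations awaiting a rescheduling decision

    def notify_qos_violation(self, app, measured, target):
        self.pending.append((app, measured, target))

class MageAgent:
    """Per-server monitor: compares each app's measured performance
    against its QoS target and escalates violations to the master."""
    def __init__(self, master, qos_targets):
        self.master = master
        self.qos_targets = qos_targets   # app -> minimum acceptable MIPS

    def monitor(self, samples):
        # samples: app -> measured MIPS over the last interval
        for app, mips in samples.items():
            target = self.qos_targets[app]
            if mips < target:
                self.master.notify_qos_violation(app, mips, target)

master = MageMaster()
agent = MageAgent(master, {"app1": 100.0, "app2": 80.0})
agent.monitor({"app1": 120.0, "app2": 60.0})  # app2 misses its target
```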

SLIDE 7

Application Arrival and Initial Scheduling

[Diagram: a server with a big core, two small cores, and memory; per-server Mage agents and the Mage master handle the newly arrived application]

SLIDE 8

What we want

Applications × application-to-resource placements:

          P1        P2       …      P6
App1   MIPS1,1   MIPS1,2    …    MIPS1,6
App2   MIPS2,1   MIPS2,2    …    MIPS2,6
App3   MIPS3,1   MIPS3,2    …    MIPS3,6

(Each column Pj is a complete placement, e.g., P1 = App1:Core1, App2:Core2, App3:Core3; P6 = App1:Core3, App2:Core2, App3:Core1.)

✓ Heterogeneous resources that benefit an application
✓ Performance impact of co-scheduling applications

How can Mage quickly and accurately generate this matrix?

SLIDE 9

Collaborative Filtering

  • Use Singular Value Decomposition (SVD) with PQ-reconstruction via Stochastic Gradient Descent (SGD) to uncover:
  • Heterogeneous resources that benefit individual applications
  • Interference that can be tolerated between applications

[Diagram: a sparse app-to-resource utility matrix is decomposed with SVD into U, Σ, V; SGD then reconstructs the missing entries, yielding a dense utility matrix]
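The SVD/SGD pipeline above can be sketched as a small PQ-reconstruction: a sparse utility matrix is factored into low-rank matrices trained by SGD on the observed cells only. The rank, learning rate, and toy MIPS values below are assumptions for illustration, not the paper's parameters.

```python
import numpy as np

def pq_reconstruct(R, mask, k=2, lr=0.01, reg=0.05, epochs=4000, seed=0):
    """Complete a sparse utility matrix R (apps x placements) via
    low-rank factorization R ~= P @ Q.T, trained with SGD on the
    observed cells only (Paragon-style collaborative filtering)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    P = 0.1 * rng.standard_normal((n_rows, k))
    Q = 0.1 * rng.standard_normal((n_cols, k))
    observed = list(zip(*np.nonzero(mask)))
    for _ in range(epochs):
        for i, j in observed:
            err = R[i, j] - P[i] @ Q[j]
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * P[i] - reg * Q[j])
    return P @ Q.T  # dense estimate of every app-to-placement utility

# Toy 3 apps x 4 placements matrix; zeros mean "not measured".
R = np.array([[4., 2., 0., 0.],
              [0., 0., 3., 9.],
              [4., 0., 0., 8.]])
est = pq_reconstruct(R, mask=(R > 0))
```

The observed cells are reproduced closely, and the previously empty cells receive estimates consistent with the learned low-rank structure.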

SLIDE 10
Contentious Kernel Profiling

Contentious kernels (Cont. Kernel 1 [Network], Cont. Kernel 2 [CPU], …, Cont. Kernel n [Cache]) each stress a different shared resource:

         Kernel 1   Kernel 2   …   Kernel n
App1     MIPS1,1    MIPS1,2    …      ?
App2        ?          ?       …   MIPS2,n
App3     MIPS3,1       ?       …      ?

The sparse matrix is completed to:

         Kernel 1   Kernel 2   …   Kernel n
App1     MIPS1,1    MIPS1,2    …   MIPS1,n
App2     MIPS2,1    MIPS2,2    …   MIPS2,n
App3     MIPS3,1    MIPS3,2    …   MIPS3,n

[Diagram: Core1, Core2, and Core3 sharing memory]

Common reference point for the sensitivity of new applications to interference in shared resources

SLIDE 11

Co-Scheduling Sensitivity

[Diagram: a server with a big core, two small cores, and memory]

SLIDE 12

Co-Scheduling Sensitivity

Placements P1–P6 (the six assignments of App1–App3 to Core1–Core3):
P1 = App1:Core1, App2:Core2, App3:Core3
P2 = App1:Core1, App2:Core3, App3:Core2
P3 = App1:Core2, App2:Core1, App3:Core3
P4 = App1:Core2, App2:Core3, App3:Core1
P5 = App1:Core3, App2:Core1, App3:Core2
P6 = App1:Core3, App2:Core2, App3:Core1

          P1        P2        P3       P4   P5      P6
App1   MIPS1,1   MIPS1,2      ?        ?    ?       ?
App2   MIPS2,1      ?         ?        ?    ?    MIPS2,6
App3   MIPS3,1      ?      MIPS3,3     ?    ?       ?

("?" marks placements with no measurement.)

SLIDE 13

          P1        P2        P3        P4        P5        P6
App1   MIPS1,1   MIPS1,2   MIPS1,3   MIPS1,4   MIPS1,5   MIPS1,6
App2   MIPS2,1   MIPS2,2   MIPS2,3   MIPS2,4   MIPS2,5   MIPS2,6
App3   MIPS3,1   MIPS3,2   MIPS3,3   MIPS3,4   MIPS3,5   MIPS3,6

(P1–P6: the six complete assignments of App1–App3 to Core1–Core3, from App1:Core1/App2:Core2/App3:Core3 through App1:Core3/App2:Core2/App3:Core1.)

Co-Scheduling Sensitivity

Profile of the impact of co-scheduling applications on all combinations of resources
SLIDE 14

Initial Application Placement

[Diagram: the Mage master places the applications onto the big and small cores; a Mage agent monitors each server]

SLIDE 15

Runtime Monitoring and Rescheduling

[Diagram: a server with a big core, two small cores, and memory; Mage agents report to the Mage master]

Rescheduling actions, ordered from least to most invasive:
  • Increase resources locally
  • Migrate from a smaller core to a bigger core
  • Migrate across servers
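The escalation order can be sketched as a simple decision ladder. The ordering (least to most invasive) is from the slide; the predicate names are hypothetical.

```python
# Decision ladder for Mage's rescheduling tiers; the ordering (least to
# most invasive) is from the slide, the predicate names are hypothetical.
def choose_action(can_grow_locally, bigger_core_free, other_server_free):
    """Return the least invasive rescheduling action currently possible."""
    if can_grow_locally:
        return "increase resources locally"
    if bigger_core_free:
        return "migrate to bigger core"
    if other_server_free:
        return "migrate across servers"
    return "wait for resources to free up"
```

Trying the cheapest fix first keeps rescheduling overhead low: a local resource increase disturbs nothing else, while a cross-server migration touches the most state.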

SLIDE 16

Evaluation

  • Workloads
    ○ Single- and multi-threaded benchmark suites
    ○ Latency-critical, interactive services
  • Execution scenarios
    ○ Simulated heterogeneous 16-core CMP
    ○ Real 40-server heterogeneous cluster
    ○ Real cluster with core-level heterogeneity using power management (DVFS)
  • Comparison schedulers
    ○ Greedy, Smallest-First, Mage-Static, PIE [ISCA’12], Paragon [ASPLOS’13]

SLIDE 17

Low Error and Scheduling Overhead

Mage has low initial scheduling overhead and low estimation error

  • Reduces the need to adjust scheduling decisions frequently during application lifetime

[Plots: estimation error (%) vs. application mix for the heterogeneous CMP and for the heterogeneous cluster (with and without DVFS); initial scheduling overhead (seconds) vs. application mix for the CMP and the cluster + DVFS]

SLIDE 18

Versus Greedy

Mage outperforms the Greedy scheduler by only allocating the necessary resources to meet an application’s QoS

[Plots: geometric-mean speedup over Greedy vs. application mix for the heterogeneous CMP, the heterogeneous cluster, and the heterogeneous cluster + DVFS]

SLIDE 19

Versus Smallest-First

Mage outperforms the Smallest-First scheduler by not exacerbating contention in shared resources

[Plots: geometric-mean speedup over Smallest-First vs. application mix for the heterogeneous CMP, the heterogeneous cluster, and the heterogeneous cluster + DVFS]

SLIDE 20

Versus Mage-Static

Mage outperforms Mage-Static by rescheduling applications that were mispredicted or that exhibit diurnal patterns

[Plots: geometric-mean speedup over Mage-Static vs. application mix for the heterogeneous CMP, the heterogeneous cluster, and the heterogeneous cluster + DVFS]

SLIDE 21

Versus Paragon+PIE and Paragon+Paragon

Mage outperforms Paragon+PIE and Paragon+Paragon by having a global view of resource availability and per-application resource requirements

[Plots: geometric-mean speedup vs. application mix on the heterogeneous cluster + DVFS, versus Paragon+PIE and versus Paragon+Paragon]

SLIDE 22

Sensitivity to Heterogeneity Increase

  • As the degree of heterogeneity increases, the benefits of using Mage also increase
  • Results are consistent for heterogeneous CMPs as well
  • Scheduling overhead remains minimal as the degree of heterogeneity increases
SLIDE 23

Conclusion

  • Heterogeneity is becoming more prevalent; we need a scheduler that can match applications to their resource needs
  • Mage is a tiered scheduler that bridges the gap between CMP- and cluster-level heterogeneous scheduling
  • Mage leverages a novel staged, parallel SGD algorithm to quickly and accurately classify applications
  • Mage is lightweight and scalable
  • Mage outperforms heterogeneity-agnostic schedulers and the sum of CMP- and cluster-level schedulers

SLIDE 24

Thank you! Questions?

faromero@stanford.edu

SLIDE 25

Backup

SLIDE 26

Versus Paragon

[Plot: geometric-mean speedup over Paragon vs. application mix on the heterogeneous cluster]

SLIDE 27

Versus PIE

[Plot: geometric-mean speedup over PIE vs. application mix on the heterogeneous CMP]

SLIDE 28

Partial Interference Sensitivity – SGD Step 2

          P1        P2        P3        P6
App1   MIPS1,1   MIPS1,2      ?         ?
App2   MIPS2,1      ?         ?      MIPS2,6
App3   MIPS3,1      ?      MIPS3,3      ?

(Columns: four of the six placements; P1 = App1:Core1/App2:Core2/App3:Core3, P2 = App1:Core1/App2:Core3/App3:Core2, P3 = App1:Core2/App2:Core1/App3:Core3, P6 = App1:Core3/App2:Core2/App3:Core1; "?" marks unmeasured entries.)

Solution: Run SGD without those columns, and add them in afterwards

SLIDE 29

Partial Interference Sensitivity – SGD Step 2

          P1        P2        P3        P6
App1   MIPS1,1   MIPS1,2      ?         ?
App2   MIPS2,1      ?         ?      MIPS2,6
App3   MIPS3,1      ?      MIPS3,3      ?

(Columns: four of the six placements; P1 = App1:Core1/App2:Core2/App3:Core3, P2 = App1:Core1/App2:Core3/App3:Core2, P3 = App1:Core2/App2:Core1/App3:Core3, P6 = App1:Core3/App2:Core2/App3:Core1; "?" marks unmeasured entries.)

Solution: Run SGD without those columns, and add them in afterwards

(Result: matrix ASGD1)

SLIDE 30

Partial Interference Sensitivity – SGD Step 2

          P1        P2        P3        P6
App1   MIPS1,1   MIPS1,2   MIPS1,3   MIPS1,6
App2   MIPS2,1   MIPS2,2   MIPS2,3   MIPS2,6
App3   MIPS3,1   MIPS3,2   MIPS3,3   MIPS3,6

(Columns: P1 = App1:Core1/App2:Core2/App3:Core3, P2 = App1:Core1/App2:Core3/App3:Core2, P3 = App1:Core2/App2:Core1/App3:Core3, P6 = App1:Core3/App2:Core2/App3:Core1.)

Solution: Run SGD without those columns, and add them in afterwards

(ASGD1 → ASGD2)

SLIDE 31

Complete Placements – SGD Step 3

          P4 (App1:Core2, App2:Core3, App3:Core1)   P5 (App1:Core3, App2:Core1, App3:Core2)
App1      [minASGD2, maxASGD2]                      [minASGD2, maxASGD2]
App2      [minASGD2, maxASGD2]                      [minASGD2, maxASGD2]
App3      [minASGD2, maxASGD2]                      [minASGD2, maxASGD2]

Populate remaining columns with results from Partial Placements

ASGD2

SLIDE 32

Complete Placements – SGD Step 3

          P4 (App1:Core2, App2:Core3, App3:Core1)   P5 (App1:Core3, App2:Core1, App3:Core2)
App1      MIPS1,4                                   MIPS1,5
App2      MIPS2,4                                   MIPS2,5
App3      MIPS3,4                                   MIPS3,5

Select column from ASGD3 with highest geometric mean for scheduling

(ASGD2 → ASGD3)
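The final selection step ("highest geometric mean") can be sketched as follows, assuming a dense apps × placements matrix of positive MIPS estimates; the 3×3 toy values are illustrative, not from the paper.

```python
import numpy as np

# Final selection step: pick the placement (column) whose per-application
# MIPS estimates have the highest geometric mean.  Toy values below.
def best_placement(utility):
    """utility: apps x placements matrix of positive MIPS estimates."""
    geomean = np.exp(np.log(utility).mean(axis=0))  # per-column gmean
    return int(np.argmax(geomean))

A = np.array([[4.0, 2.0, 8.0],
              [3.0, 9.0, 2.0],
              [5.0, 4.0, 3.0]])
col = best_placement(A)  # column 1 wins: gmean(2, 9, 4) ~ 4.16
```

Computing the geometric mean in log space avoids overflow and weights each application's speedup multiplicatively, so one badly served application drags down the whole placement.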