Mage: Online and Interference-Aware Scheduling for Multi-Scale Heterogeneous Systems
Francisco Romero1 and Christina Delimitrou2
1Stanford University, 2Cornell University
PACT – Session 4a – November 2, 2018
Heterogeneity is prevalent in both CMPs and clusters (e.g., power management). Schedulers must match applications to the underlying hardware. How can we guarantee application performance and maintain high resource utilization?
[Diagram: heterogeneous CMP with a big core, small cores, and memory; App 1 and App 2 placed on cores]
System          | Heterogeneous Clusters | Heterogeneous CMPs
Paragon         | ✓                      | ❌
Whare-map       | ✓                      | ❌
Bubble-flux     | ✓                      | ❌
Composite cores | ❌                      | ✓
Hass            | ❌                      | ✓
PIE             | ❌                      | ✓
Prior work targets either heterogeneous cluster scheduling or heterogeneous CMP scheduling, not both. Mage is a combined heterogeneous cluster + CMP scheduler that approximates an exhaustive search over application-to-resource placements, achieving an 11% average improvement over a heterogeneity- and interference-aware scheduler.
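To see why an exhaustive search over placements is impractical, note that assigning n applications one-to-one to n cores yields n! placements. A minimal sketch (function name and inputs are illustrative, not from Mage):

```python
from itertools import permutations

def enumerate_placements(apps, cores):
    """Naively enumerate every one-to-one application-to-core placement.

    For n applications on n cores there are n! placements, which is why
    measuring every placement directly does not scale.
    """
    return [dict(zip(apps, perm)) for perm in permutations(cores, len(apps))]

placements = enumerate_placements(["App1", "App2", "App3"],
                                  ["Core1", "Core2", "Core3"])
print(len(placements))  # 3! = 6
```

Even this toy 3-app example produces 6 placements; real mixes of tens of applications across cluster and CMP resources make direct search intractable.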
[Diagram: Mage architecture; a Master coordinates per-machine Agents on a heterogeneous CMP with a big core, small cores, and memory]

Mage Master: makes the resource scheduling decision and determines which applications should be migrated/rescheduled.
Mage Agent: monitors all scheduled applications and notifies the Master when QoS violations occur.
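The Master/Agent split can be sketched as a simple monitoring loop. This is a hypothetical illustration; the QoS threshold, function names, and reporting scheme are assumptions, not Mage's real API:

```python
# Hypothetical sketch of the Master/Agent division of labor.
QOS_TARGET = 0.95  # assumed: fraction of isolated performance an app must retain

def agent_check(app_perf, isolated_perf):
    """Agent side: flag a QoS violation when measured performance drops
    below the target fraction of the app's isolated performance."""
    return app_perf / isolated_perf < QOS_TARGET

def master_decide(violations):
    """Master side: collect violation reports and pick which apps to
    migrate/reschedule (here, simply every violating app)."""
    return [app for app, violated in violations.items() if violated]

violations = {"App1": agent_check(80.0, 100.0),   # 0.80 < 0.95: violation
              "App2": agent_check(98.0, 100.0)}   # 0.98 >= 0.95: ok
print(master_decide(violations))  # ['App1']
```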
Application-to-resource utility matrix (columns are placements):

Placement | App1:Core1 App2:Core2 App3:Core3 | App1:Core1 App2:Core3 App3:Core2 | … | App1:Core3 App2:Core2 App3:Core1
App1      | MIPS1,1                          | MIPS1,2                          | … | MIPS1,6
App2      | MIPS2,1                          | MIPS2,2                          | … | MIPS2,6
App3      | MIPS3,1                          | MIPS3,2                          | … | MIPS3,6
Mage must uncover:
✓ Heterogeneous resources that benefit an application
✓ Performance impact of co-scheduling applications
[Diagram: sparse app-to-resource utility matrix → SVD → decomposed matrices → SGD → reconstructed (dense) utility matrix]
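The SVD+SGD pipeline above amounts to low-rank matrix completion: factor the sparse utility matrix into per-application and per-placement latent factors, fit them with stochastic gradient descent on the observed entries, and multiply the factors back to fill in the blanks. A minimal sketch; the matrix values, rank, learning rate, and iteration count are illustrative, not Mage's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

R = np.array([[4., 0., 2., 0.],   # sparse app-to-placement utility matrix;
              [0., 5., 0., 1.],   # 0 marks an unobserved entry
              [2., 0., 0., 9.]])
mask = R > 0
k, lr, reg = 2, 0.01, 0.01        # latent rank, learning rate, regularization

P = 0.1 * rng.standard_normal((R.shape[0], k))  # per-application factors
Q = 0.1 * rng.standard_normal((R.shape[1], k))  # per-placement factors

for _ in range(5000):             # SGD over the observed entries only
    for i, j in zip(*np.nonzero(mask)):
        err = R[i, j] - P[i] @ Q[j]
        P[i] += lr * (err * Q[j] - reg * P[i])
        Q[j] += lr * (err * P[i] - reg * Q[j])

R_hat = P @ Q.T                   # dense reconstructed utility matrix
print(np.abs(R_hat[mask] - R[mask]).max() < 0.5)
```

The unobserved entries of R_hat are the scheduler's estimates for placements that were never measured directly.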
Sparse utility matrix (columns grouped by contended resource):

     | [Network] | [CPU]   | … | [Cache]
App1 | MIPS1,1   | MIPS1,2 | … | ?
App2 | ?         | ?       | … | MIPS2,n
App3 | MIPS3,1   | ?       | … | ?

Reconstructed utility matrix:

     | [Network] | [CPU]   | … | [Cache]
App1 | MIPS1,1   | MIPS1,2 | … | MIPS1,n
App2 | MIPS2,1   | MIPS2,2 | … | MIPS2,n
App3 | MIPS3,1   | MIPS3,2 | … | MIPS3,n
[Diagram: microbenchmarks contending on Core1, Core2, Core3, and memory]
Common reference point for the sensitivity of new applications to interference of shared resources
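Because every application is profiled against the same microbenchmarks, each one gets a comparable sensitivity fingerprint, and a new application can be matched against previously seen ones. A hypothetical sketch (the fingerprint values and the use of cosine similarity are illustrative assumptions, not Mage's stated classification method):

```python
import math

# Each app's sensitivity to the shared-resource microbenchmarks
# (e.g., [core, core, memory] pressure) forms a fingerprint vector.
known = {"App1": [0.9, 0.2, 0.1],
         "App2": [0.1, 0.1, 0.8]}
new_app = [0.85, 0.25, 0.15]

def cosine(a, b):
    """Cosine similarity between two sensitivity fingerprints."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

closest = max(known, key=lambda name: cosine(known[name], new_app))
print(closest)  # 'App1': the new app behaves most like App1
```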
Placements: P1 = App1:Core1, App2:Core2, App3:Core3; P2 = App1:Core1, App2:Core3, App3:Core2; P3 = App1:Core2, App2:Core1, App3:Core3; P4 = App1:Core2, App2:Core3, App3:Core1; P5 = App1:Core3, App2:Core1, App3:Core2; P6 = App1:Core3, App2:Core2, App3:Core1

Sparse utility matrix (observed entries only):

     | P1      | P2      | P3      | P4 | P5 | P6
App1 | MIPS1,1 | MIPS1,2 | ?       | ?  | ?  | ?
App2 | MIPS2,1 | ?       | ?       | ?  | ?  | MIPS2,6
App3 | MIPS3,1 | ?       | MIPS3,3 | ?  | ?  | ?

Reconstructed utility matrix:

     | P1      | P2      | P3      | P4      | P5      | P6
App1 | MIPS1,1 | MIPS1,2 | MIPS1,3 | MIPS1,4 | MIPS1,5 | MIPS1,6
App2 | MIPS2,1 | MIPS2,2 | MIPS2,3 | MIPS2,4 | MIPS2,5 | MIPS2,6
App3 | MIPS3,1 | MIPS3,2 | MIPS3,3 | MIPS3,4 | MIPS3,5 | MIPS3,6
Profile of the impact of co-scheduling applications on all combinations
[Diagram: Mage Master coordinating Agents across multiple heterogeneous servers]
Rescheduling actions, from least to most invasive:
1. Adjust resources locally
2. Migrate from a smaller core to a bigger core
3. Migrate across servers
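The escalation order above can be sketched as a simple ladder: try the cheapest remedy first and only escalate if it has already been attempted. A hypothetical sketch; the action names and the "escalate after one try" policy are illustrative assumptions:

```python
# Rescheduling actions ordered least to most invasive.
ACTIONS = ["adjust_resources_locally",
           "migrate_small_core_to_big_core",
           "migrate_across_servers"]

def next_action(tried):
    """Return the least invasive action not yet attempted, or None
    once every rung of the ladder has been exhausted."""
    for action in ACTIONS:
        if action not in tried:
            return action
    return None

print(next_action(set()))                         # adjust_resources_locally
print(next_action({"adjust_resources_locally"}))  # migrate_small_core_to_big_core
```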
○ Applications: single- and multi-threaded benchmark suites; latency-critical, interactive services
○ Platforms: simulated heterogeneous 16-core CMP; real 40-server heterogeneous cluster; real cluster with core-level heterogeneity using power management (DVFS)
○ Baselines: Greedy, Smallest-First, Mage-Static, PIE [ISCA’12], Paragon [ASPLOS’13]
Mage has low initial scheduling overhead and low estimation error
[Plots: estimation error (%) vs. application mix for the heterogeneous CMP and for the heterogeneous cluster with and without DVFS; initial scheduling overhead (sec) vs. application mix for the CMP, cluster, and cluster + DVFS]
Mage outperforms the Greedy scheduler by only allocating the necessary resources to meet an application’s QoS
[Plots: geometric-mean speedup vs. application mix for the heterogeneous CMP, the heterogeneous cluster, and the heterogeneous cluster + DVFS]
Mage outperforms the Smallest-First scheduler by not exacerbating contention in shared resources
[Plots: geometric-mean speedup vs. application mix for the heterogeneous CMP, the heterogeneous cluster, and the heterogeneous cluster + DVFS]
Mage outperforms Mage-Static by rescheduling applications that were mispredicted or that exhibit diurnal patterns
[Plots: geometric-mean speedup vs. application mix for the heterogeneous CMP, the heterogeneous cluster, and the heterogeneous cluster + DVFS]
Mage outperforms Paragon+PIE and Paragon+Paragon by having a global view of resource availability and per-application resource requirements
[Plots: geometric-mean speedup vs. application mix on the heterogeneous cluster + DVFS]
Conclusions:
○ Mage matches applications to their resource needs via unified CMP- and cluster-level heterogeneous scheduling
○ Collaborative filtering accurately classifies applications
○ Mage outperforms cluster-level schedulers

Questions: faromero@stanford.edu
Backup: some placements have no observed entries at all, leaving entire columns unknown:

     | App1:Core1 App2:Core2 App3:Core3 | App1:Core1 App2:Core3 App3:Core2 | App1:Core2 App2:Core1 App3:Core3 | App1:Core3 App2:Core2 App3:Core1
App1 | MIPS1,1                          | MIPS1,2                          | ?                                | ?
App2 | MIPS2,1                          | ?                                | ?                                | MIPS2,6
App3 | MIPS3,1                          | ?                                | MIPS3,3                          | ?

Solution: Run SGD without those columns, and add them in afterwards
     | App1:Core1 App2:Core2 App3:Core3 | App1:Core1 App2:Core3 App3:Core2 | App1:Core2 App2:Core1 App3:Core3 | App1:Core3 App2:Core2 App3:Core1
App1 | MIPS1,1                          | MIPS1,2                          | MIPS1,3                          | MIPS1,6
App2 | MIPS2,1                          | MIPS2,2                          | MIPS2,3                          | MIPS2,6
App3 | MIPS3,1                          | MIPS3,2                          | MIPS3,3                          | MIPS3,6
     | App1:Core2 App2:Core3 App3:Core1 | App1:Core3 App2:Core1 App3:Core2
App1 | [minASGD2, maxASGD2]             | [minASGD2, maxASGD2]
App2 | [minASGD2, maxASGD2]             | [minASGD2, maxASGD2]
App3 | [minASGD2, maxASGD2]             | [minASGD2, maxASGD2]
Populate remaining columns with results from Partial Placements
     | App1:Core2 App2:Core3 App3:Core1 | App1:Core3 App2:Core1 App3:Core2
App1 | MIPS1,4                          | MIPS1,5
App2 | MIPS2,4                          | MIPS2,5
App3 | MIPS3,4                          | MIPS3,5
Select column from ASGD3 with highest geometric mean for scheduling
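The final selection step picks the placement (column) of the reconstructed utility matrix whose per-application MIPS have the highest geometric mean. A minimal sketch; the matrix values are illustrative:

```python
import math

utility = {  # placement -> per-app estimated MIPS (illustrative values)
    "App1:Core1 App2:Core2 App3:Core3": [4.0, 5.0, 2.0],
    "App1:Core2 App2:Core1 App3:Core3": [3.0, 6.0, 4.0],
    "App1:Core3 App2:Core2 App3:Core1": [2.0, 2.0, 9.0],
}

def gmean(xs):
    """Geometric mean; using log-sum avoids overflow on large products."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

best = max(utility, key=lambda col: gmean(utility[col]))
print(best)  # 'App1:Core2 App2:Core1 App3:Core3'
```

The geometric mean rewards placements that help every application rather than one that boosts a single application at the others' expense.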