Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores

Kenzo Van Craeynest (+), Shoaib Akram (+), Wim Heirman (+), Aamer Jaleel (*), Lieven Eeckhout (+)
(+) Ghent University; (*) VSSAD, Intel Corporation
PACT 2013, Edinburgh, September 11th, 2013

Single-ISA heterogeneous multi-cores
Multiple core types
– representing different power/performance trade-offs: big high-performance cores (B) and small power-efficient cores (S)
Well-established power benefits
– [Kumar et al., MICRO’03, ISCA’04]
Commercial examples
– big.LITTLE, Kal-El

Prior Work: Put the Thread That Will Benefit the Most on the Big Core
Many different scheduling techniques
– Static scheduling: Chen and John, DAC’08
– Sampling-based scheduling: Kumar et al., ISCA’04; Patsilaras et al., TACO’12
– Proxies for performance: memory-dominance (Becchi et al., JILP’08; Koufaty et al., EuroSys’10; Shelepov et al., OS Review’09); age-based scheduling (Lakshminarayana et al., SC’09)
– Model-based scheduling: Van Craeynest et al., ISCA’12; Lukefahr et al., MICRO’12

Traditional Scheduling Can Be Suboptimal
[Figure: execution timelines for one big core (B) and three small cores (S); x-axis: execution time]

Threads Pinned on Small Cores Determine Performance
[Figure: normalized run-time (0% to 100%) for three configurations: 4S (4x small), 4B (4x big), 1B3S (1x big, 3x small)]

Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores
Scheduling methodologies that aim to improve fairness
– Equal-time scheduling
– Equal-progress scheduling
Will show that fairness-aware scheduling
– Significantly improves fairness, allowing QoS, accounting, …
– Significantly reduces run-time for many multi-threaded applications over state-of-the-art throughput-optimizing scheduling

Fairness for Heterogeneous Multi-Cores
The slowdown of thread $i$ is
$$\text{slowdown} = S_i = \frac{T_{\mathrm{het},i}}{T_{\mathrm{big},i}}$$
where $T_{\mathrm{het},i}$ is the number of cycles to execute the thread on the heterogeneous multi-core and $T_{\mathrm{big},i}$ is the number of cycles to execute the thread in isolation on a big core.
A schedule is fair if the slowdown of all running threads is the same:
$$\text{fairness} = 1 - c_S = 1 - \frac{\sigma_S}{\mu_S} = 1 - \frac{\mathrm{std\_dev}(S)}{\mathrm{avg}(S)}$$
where $c_S$ is the coefficient of variation of the slowdowns, a measure of unfairness.
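The fairness metric above can be computed directly from per-thread cycle counts. The following is a minimal sketch, not code from the talk: the function names `slowdown` and `fairness` are hypothetical, and using the population standard deviation (rather than the sample one) is an assumption, since the slide does not say which is meant.

```python
from statistics import mean, pstdev


def slowdown(t_het, t_big):
    """Slowdown S_i = T_het,i / T_big,i for one thread."""
    return t_het / t_big


def fairness(slowdowns):
    """Fairness = 1 - (std_dev(S) / avg(S)), i.e. 1 minus the coefficient of variation."""
    mu = mean(slowdowns)
    sigma = pstdev(slowdowns)  # assumption: population standard deviation
    return 1.0 - sigma / mu


if __name__ == "__main__":
    # Hypothetical cycle counts (T_het, T_big) for four threads.
    cycles = [(300, 100), (250, 100), (200, 100), (250, 100)]
    s = [slowdown(t_het, t_big) for t_het, t_big in cycles]
    print(s, fairness(s))
```

When all threads see the same slowdown the standard deviation is zero and fairness is 1; the more the slowdowns diverge, the lower the value.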

Experimental Setup
Simulated hardware (small and big cores differ only in microarchitecture)
– µarch: in-order (small), out-of-order (big)
– issue width: 4-wide
– clock frequency: 2.6 GHz
– cache hierarchy: 32 KB (p) / 256 KB (p) / 16 MB (s)
Sniper
– parallel, hardware-validated x86-64 multi-core simulator
Multi-threaded and multi-programmed workloads
– SPEC CPU2006, PARSEC and MapReduce

Achieving Fairness: Equal-time Scheduling
– Each thread runs for the same amount of time on each core type
– Can be implemented with minor changes to a round-robin scheduler
Example schedule for four threads (t0 to t3) on a 1B3S system, one column per time slice:
B: t0 t1 t2 t3 t0 t1 t2 t3 t0 t1 t2 t3
S: t1 t0 t0 t0 t3 t3 t3 t2 t2 t2 t1 t1
S: t2 t2 t1 t1 t1 t0 t0 t0 t3 t3 t3 t2
S: t3 t3 t3 t2 t2 t2 t1 t1 t1 t0 t0 t0
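The round-robin idea can be illustrated with a minimal sketch. It assumes a single big core (as in a 1B3S system) and simply rotates the big core among the threads each time slice; the function names `equal_time_schedule` and `big_core_share` are hypothetical, and the sketch does not reproduce how the slide assigns threads to individual small cores.

```python
def equal_time_schedule(num_threads, num_slices):
    """Return, per time slice, (thread on the big core, threads on small cores).

    Assumption: exactly one big core; the remaining threads run on small cores.
    """
    schedule = []
    for q in range(num_slices):
        big = q % num_threads  # rotate the big core round-robin
        small = [t for t in range(num_threads) if t != big]
        schedule.append((big, small))
    return schedule


def big_core_share(schedule, num_threads):
    """Count how many time slices each thread spent on the big core."""
    counts = [0] * num_threads
    for big, _ in schedule:
        counts[big] += 1
    return counts


if __name__ == "__main__":
    sched = equal_time_schedule(num_threads=4, num_slices=12)
    print([big for big, _ in sched])
    print(big_core_share(sched, 4))
```

Over any whole number of rotations every thread receives the same number of big-core slices, which is the equal-time property.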

Optimizing for Fairness Reduces Run-time for Homogeneous Multi-Threaded Workloads
[Figure: results for a 1B3S system]

Equal-Time Doesn’t Guarantee Equal-Progress
Some threads experience a larger slowdown than others
– Equal time on different core types ≠ equal progress
– Therefore fairness is not guaranteed
[Figure: execution timelines for one big core (B) and three small cores (S), distinguishing time running on the big core from time running on a small core; x-axis: execution time]

Achieving Fairness: Equal-progress Fairness-Aware Scheduling
– Guarantee that all threads make the same progress compared to their big-core performance
– Continuously monitor fairness and adjust the schedule to achieve fairness
$$S_i = \frac{T_{\mathrm{het},i}}{T_{\mathrm{big},i}} = \frac{T_{\mathrm{big},i} + T_{\mathrm{small},i}}{T_{\mathrm{big},i} + T_{\mathrm{small},i} / R_i}$$
where $S_i$ is the overall slowdown of thread $i$ and $R_i$ is its performance ratio between the big and the small core. Dividing $T_{\mathrm{small},i}$ by $R_i$ scales the execution time on the small core to big-core time. In the right-hand fraction, $T_{\mathrm{big},i}$ and $T_{\mathrm{small},i}$ denote the time thread $i$ has spent on the big and small core types.
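A minimal sketch of how the slowdown estimate could drive a scheduling decision follows. The formula is the one on the slide; the policy of giving the big core to the thread with the largest estimated slowdown is an assumption made for illustration, and the names `estimated_slowdown` and `pick_for_big_core` are hypothetical.

```python
def estimated_slowdown(t_big, t_small, ratio):
    """S_i = (T_big + T_small) / (T_big + T_small / R_i).

    t_big, t_small: time the thread has run on big and small cores so far.
    ratio: estimated big-to-small performance ratio R_i for this thread.
    """
    big_equivalent = t_big + t_small / ratio
    if big_equivalent == 0:
        return 1.0  # thread has not run yet; treat it as not slowed down
    return (t_big + t_small) / big_equivalent


def pick_for_big_core(threads):
    """Return the id of the thread with the largest estimated slowdown.

    threads: dict mapping thread id -> (t_big, t_small, ratio).
    Assumption: the most slowed-down thread is the one moved to the big core.
    """
    return max(threads, key=lambda tid: estimated_slowdown(*threads[tid]))


if __name__ == "__main__":
    # Hypothetical per-thread statistics.
    stats = {"a": (50, 50, 2.0), "b": (50, 50, 3.0)}
    print({tid: estimated_slowdown(*v) for tid, v in stats.items()})
    print(pick_for_big_core(stats))
```

A thread that has only run on the big core has slowdown 1, while a thread that has only run on a small core has slowdown equal to its ratio, so threads with a high ratio fall behind faster when left on small cores.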

Estimating the Performance Ratio
Proposed 3 methods for estimating R_i
– sampling-based
– history-based
– model-based
[Figure: one timeline per method showing when R_i is obtained; the sampling-based timeline alternates sampling and symbiosis phases, the history-based timeline includes a sampling phase, and the model-based timeline uses PIE]

Performance Impact Estimation (PIE) [Van Craeynest et al., ISCA’12]
1. Determine where the application spends its execution time
2. Use the change in exposed MLP to predict the change in CPI_mem
3. Use the change in exposed ILP to predict the change in CPI_base
[Figure: CPI_big on the big core (B) and CPI_small on the small core (S), related through the MLP change (MLP_big, MLP_small) and the ILP change (ILP_big, ILP_small)]

Fairness-aware Scheduling Across Configurations for Multi-Programmed Workloads
[Figure, two panels: normalized throughput (axis 0.9 to 1.3) and fairness (axis 0% to 100%) for pinned, throughput-optimized, equal-time and equal-progress scheduling across the 1B1S, 1B3S, 3B1S, 1B7S and 7B1S configurations]
Fairness enables QoS, cycle-accounting, abstraction of heterogeneity, …

Optimizing Fairness Reduces Run-time for Homogeneous Multi-Threaded Workloads

Optimizing for Fairness Reduces Run-time for Heterogeneous Multi-Threaded Workloads
– Heterogeneous applications: threads can have different performance ratios
– Equal-time scheduling does not result in a fair schedule
– Equal-progress scheduling greatly reduces run-time over both throughput-optimized and equal-time scheduling for heterogeneous multi-threaded applications

Fairness-aware Scheduling Across Configurations for Homogeneous Multi-Threaded Workloads

Conclusions and Contributions
Proposed fairness-optimizing scheduling
– Two methods: equal-time and equal-progress
Multi-programmed workloads
– Achieves an average fairness of 86% for a 1B3S system while staying within 3.6% of the performance of throughput-optimizing scheduling
– Allows for QoS, cycle-accounting, etc. in heterogeneous systems
Multi-threaded workloads
– Unfair scheduling results in no performance benefit from heterogeneity
– Threads running on a big core wait at barriers for threads running on small cores
– Average 14% (and up to 25%) performance improvement over pinned scheduling

Questions?
