Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores. Kenzo Van Craeynest+, Shoaib Akram+, Wim Heirman+, Aamer Jaleel*, Lieven Eeckhout+ (+ Ghent University, * VSSAD, Intel Corporation). PACT 2013, Edinburgh, September 11th 2013.


SLIDE 1

Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores

PACT 2013 - Edinburgh- September 11th 2013

Kenzo Van Craeynest+ Shoaib Akram+ Wim Heirman + Aamer Jaleel* Lieven Eeckhout+

+ Ghent University * VSSAD, Intel Corporation

SLIDE 2

Single-ISA heterogeneous multi-cores

Multiple core types

– representing different power/performance trade-offs

Well-established power benefits

– [Kumar et al., MICRO’03, ISCA’04]

Commercial examples

– big.LITTLE, Kal-El

[Figure: a multi-core chip mixing big (B) high-performance cores with small (S) power-efficient cores]

Kenzo Van Craeynest 3/1/16 2

SLIDE 3

Prior Work: Put the Thread That Will Benefit the Most on the Big Core

Many different scheduling techniques

– Static scheduling

Chen and John, DAC’08

– Sampling-based scheduling

Kumar et al., ISCA’04; Patsilaras et al., TACO’12

– Proxies for performance

Memory dominance (Becchi et al., JILP’08; Koufaty et al., EuroSys’10; Shelepov et al., OS Review’09)

Age-based scheduling (Lakshminarayana et al., SC’09)

– Model-based scheduling

Van Craeynest et al., ISCA’12; Lukefahr et al., MICRO’12

[Figure: which thread (?) should run on the big core (B) and which on the small core (S)]

SLIDE 4

Intel Information Technology

, FOR INTERNAL USE ONLY

Traditional Scheduling can be Suboptimal

[Figure: execution timeline of threads on cores S, B, S, S; axis: execution time]

SLIDE 5

Threads pinned on Small Cores Determine Performance

[Figure: normalized run-time (axis 0% to 100%) for three configurations: 4S (4x small), 4B (4x big) and 1B3S (1x big, 3x small)]

SLIDE 6

Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores

Scheduling methodologies that aim to improve fairness

– Equal-time scheduling

– Equal-progress scheduling

Will show that Fairness-Aware Scheduling

– Significantly improves fairness

  • Allowing QoS, accounting,…

– Significantly reduces run-time for many multi-threaded applications

  • Over state-of-the-art throughput-optimizing scheduling

SLIDE 7

Fairness for Heterogeneous Multi-Cores

Slowdown of thread i:

$$\text{slowdown} = S_i = \frac{T_{het,i}}{T_{big,i}}$$

– $T_{het,i}$: number of cycles to execute a thread on a heterogeneous multi-core

– $T_{big,i}$: number of cycles to execute a thread in isolation on big core

Schedule is fair if slowdown of all running threads is the same

$$\text{fairness} = 1 - c_S = 1 - \frac{\sigma_S}{\mu_S} = 1 - \frac{\mathrm{std\_dev}(S)}{\mathrm{avg}(S)}$$

– $c_S$: coefficient of variation of the slowdowns, a measure of unfairness
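
As a rough illustration of the fairness metric on this slide, the following Python sketch computes per-thread slowdowns and then fairness as one minus the coefficient of variation. The cycle counts are made-up numbers, the function names are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the slide does not say which variant is meant.

```python
from statistics import mean, pstdev

def slowdowns(t_het, t_big):
    """Per-thread slowdown: cycles on the heterogeneous system / cycles alone on a big core."""
    return [h / b for h, b in zip(t_het, t_big)]

def fairness(s):
    """1 - coefficient of variation of the slowdowns (population std dev assumed)."""
    return 1.0 - pstdev(s) / mean(s)

# Made-up cycle counts: every thread is slowed down by exactly 1.5x.
t_big = [100, 200, 150, 120]
t_het = [150, 300, 225, 180]
fair = fairness(slowdowns(t_het, t_big))

# One thread is not slowed down at all while three are slowed down 2x.
unfair = fairness([1.0, 2.0, 2.0, 2.0])
```

When all slowdowns are equal the standard deviation is zero and fairness is 1; the more the slowdowns differ, the lower the value.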

SLIDE 8

Experimental Setup

Sniper:

– parallel, hardware-validated x86-64 multi-core simulator

Multi-threaded and multi-programmed workloads

– SPEC CPU2006, PARSEC and MapReduce

Simulated hardware

                   small               big
issue width        4-wide              4-wide
clock frequency    2.6 GHz             2.6 GHz
cache hierarchy    32KB (p) / 256 KB (p) / 16MB (s)
µarch              in-order            out-of-order

SLIDE 9

Achieving Fairness: Equal-time Scheduling

– Each thread runs for same amount of time on each core type

– Can be implemented with minor changes to a Round-robin scheduler

[Figure: round-robin schedule in which threads t0 to t3 rotate over three small cores (S) and one big core (B)]
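
One way to picture equal-time scheduling is a round-robin rotation in which every thread moves to the next core at each time slice. The sketch below is only an illustration under simple assumptions (as many threads as cores, fixed-length slices, no migration cost); `equal_time_schedule` is a hypothetical name, not something taken from the talk.

```python
from collections import Counter

def equal_time_schedule(threads, cores, num_slices):
    """For each time slice, return a list of (core type, thread) pairs.

    Threads are shifted by one core per slice, so over len(cores) slices
    every thread visits every core exactly once.
    """
    n = len(threads)
    return [
        [(cores[c], threads[(c + s) % n]) for c in range(len(cores))]
        for s in range(num_slices)
    ]

cores = ["S", "S", "S", "B"]          # a 1B3S system
threads = ["t0", "t1", "t2", "t3"]
schedule = equal_time_schedule(threads, cores, num_slices=4)

# Count how many slices each thread spent on the big core.
big_slices = Counter(t for slot in schedule for kind, t in slot if kind == "B")
```

After four slices each thread has had exactly one slice on the big core and three on small cores, which is the equal-time property.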

SLIDE 10

Optimizing for Fairness Reduces Run-time for Homogeneous Multi-Threaded Workloads

1B3S system

SLIDE 11

Equal-Time Doesn’t Guarantee Equal-Progress

[Figure: execution timeline of threads on cores S, B, S, S; legend: running on small core, running on big core; axis: execution time]

Some threads experience a larger slowdown than others

– Equal time on different core types ≠ equal progress

– Therefore fairness is not guaranteed
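
A small numeric example may help here. Assume two threads receive the same split of time between big and small cores, but benefit differently from the big core. The slowdown is computed as total time divided by the estimated big-core-only time, where small-core time is scaled down by the thread's big-to-small performance ratio. The ratios 2.0 and 1.25 are invented for illustration.

```python
def slowdown(t_big, t_small, ratio):
    """Time on the heterogeneous system divided by the estimated big-core-only time."""
    return (t_big + t_small) / (t_big + t_small / ratio)

# Both threads get the same split: 50 time units on the big core, 50 on a small core.
s_a = slowdown(50, 50, ratio=2.0)    # thread A runs 2x faster on the big core
s_b = slowdown(50, 50, ratio=1.25)   # thread B gains only 1.25x from the big core
```

Thread A ends up with a slowdown of about 1.33 and thread B with about 1.11, so equal time did not produce equal progress.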

SLIDE 12

Achieving Fairness: Equal-progress Fairness-Aware Scheduling

– Guarantee that all threads make the same progress compared to their big-core performance

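– Continuously monitor fairness and adjust schedule to achieve fairness

$$S_i = \frac{T_{het,i}}{T_{big,i}} = \frac{T_{big,i} + T_{small,i}}{T_{big,i} + T_{small,i} / R_i}$$

– $S_i$: overall slowdown of the thread

– $R_i$: performance ratio between big and small core

– $T_{small,i} / R_i$: scale execution time on small core

In the rightmost expression, $T_{big,i}$ and $T_{small,i}$ are the times thread i has run on the big core and on the small core.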
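
The monitoring loop described on this slide could look roughly like the sketch below. It assumes one plausible greedy policy (at each time slice the threads with the highest estimated slowdown get the big cores) and assumes the performance ratios are known exactly; both are simplifications for illustration, and `equal_progress` is a hypothetical name.

```python
def slowdown(t_big, t_small, ratio):
    """Slowdown estimate; small-core time is scaled by the performance ratio."""
    total = t_big + t_small
    return 1.0 if total == 0 else total / (t_big + t_small / ratio)

def equal_progress(ratios, num_big, num_slices):
    """Greedy sketch: each slice, the most slowed-down threads run on the big cores."""
    n = len(ratios)
    t_big = [0] * n
    t_small = [0] * n
    for _ in range(num_slices):
        s = [slowdown(t_big[i], t_small[i], ratios[i]) for i in range(n)]
        # Threads that are furthest behind get the big cores for this slice.
        on_big = set(sorted(range(n), key=lambda i: s[i], reverse=True)[:num_big])
        for i in range(n):
            if i in on_big:
                t_big[i] += 1
            else:
                t_small[i] += 1
    return [slowdown(t_big[i], t_small[i], ratios[i]) for i in range(n)]

ratios = [3.0, 2.5, 2.0, 2.0]   # invented big-to-small performance ratios
final = equal_progress(ratios, num_big=1, num_slices=1000)
spread = max(final) - min(final)
```

With these ratios the slowdowns converge to nearly the same value. Note that a thread that never runs on the big core has a slowdown of exactly its ratio, so a thread with a very low ratio may stay below the others no matter how the big core is shared.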

SLIDE 13

Estimating the Performance Ratio

– Proposed 3 methods

  • sampling-based

  • history-based

  • model-based

[Figure: timelines for the three methods, showing sampling and symbiosis phases, the resulting Ri estimates, and PIE]

SLIDE 14

Performance Impact Estimation (PIE)

[Figure: PIE quantities on the big core B (CPIbig, MLPbig, ILPbig) and on the small core S (CPIsmall, MLPsmall, ILPsmall), linked by the estimated ILP change and MLP change]

1. Determine where application spends its execution time
2. Use change in MLP exposed to predict change in CPImem
3. Use change in ILP exposed to predict change in CPIbase

[Van Craeynest et al., ISCA’12]
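
The three steps can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the PIE model itself: the split of CPI into a base and a memory component follows the slide, but the two scaling factors (`ilp_ratio`, `mlp_ratio`) are hypothetical inputs here, whereas the real model derives the ILP and MLP changes from measurements on the running core. All numbers are invented.

```python
def predict_cpi(cpi_base, cpi_mem, ilp_ratio, mlp_ratio):
    """Predict CPI on the other core type from CPI components measured on this one.

    ilp_ratio: assumed factor applied to the base component (step 3)
    mlp_ratio: assumed factor applied to the memory component (step 2)
    """
    return cpi_base * ilp_ratio + cpi_mem * mlp_ratio

# Step 1: split the CPI measured on the big core into base and memory components.
cpi_base_big, cpi_mem_big = 0.5, 1.0
cpi_big = cpi_base_big + cpi_mem_big

# Steps 2 and 3: scale each component by its (hypothetical) change factor.
cpi_small = predict_cpi(cpi_base_big, cpi_mem_big, ilp_ratio=2.0, mlp_ratio=1.5)

# With equal clock frequencies, the performance ratio is the ratio of the CPIs.
ratio = cpi_small / cpi_big
```

The resulting ratio is the kind of per-thread estimate that a model-based scheduler would use in place of sampling on both core types.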

SLIDE 15

Fairness-aware Scheduling Across Configurations for Multi-Programmed Workloads

[Figure: normalized throughput (axis 0.9 to 1.3) and fairness (axis 0% to 100%) for the 1B1S, 1B3S, 3B1S, 1B7S and 7B1S configurations; schedulers compared: pinned, throughput-optimized, equal-time, equal-progress]

QoS, cycle-accounting, abstraction of heterogeneity,…

SLIDE 16

Optimizing Fairness Reduces Run-time for Homogeneous Multi-Threaded Workloads

SLIDE 17

Optimizing for Fairness Reduces Run-time for Heterogeneous Multi-Threaded Workloads

– Heterogeneous applications

  • Threads can have different performance ratios

  • Equal-time scheduling does not result in a fair schedule

– Equal progress greatly reduces run-time over throughput-optimized AND equal-time scheduling for heterogeneous multi-threaded applications

SLIDE 18

Fairness-aware Scheduling Across Configurations for Homogeneous Multi-Threaded Workloads

SLIDE 19

Conclusions and Contributions

Proposed Fairness-optimizing scheduling

– Two methods: equal-time and equal-progress

Multi-program workloads

– Achieves average fairness of 86% for a 1B3S system while within 3.6% performance of throughput-optimizing scheduling

– Allows for QoS, cycle-accounting, etc. in heterogeneous systems

Multi-threaded workloads

– Unfair performance results in no performance benefits from heterogeneity

– Threads running on a big core wait at barriers for threads running on small cores

– Average 14% (and up to 25%) performance improvement over pinned scheduling
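
The barrier argument in the bullets above can be made concrete with a toy model: in a barrier-synchronized phase, the phase ends when the slowest thread arrives. The core speeds and work amounts below are invented, migration overhead is ignored, and `phase_time` is a hypothetical helper.

```python
def phase_time(work, speeds):
    """Time for one barrier-delimited phase: every thread waits for the slowest one."""
    return max(w / s for w, s in zip(work, speeds))

work = [100, 100, 100, 100]   # identical work per thread (homogeneous workload)
small, big = 1.0, 2.0         # hypothetical core speeds, work units per time unit

t_4s = phase_time(work, [small] * 4)              # four small cores
t_1b3s = phase_time(work, [big] + [small] * 3)    # pinned: one big, three small
t_4b = phase_time(work, [big] * 4)                # four big cores

# Equal-time sharing: each thread spends 1/4 of its time on the big core.
shared_speed = 0.25 * big + 0.75 * small
t_shared = phase_time(work, [shared_speed] * 4)
```

In this model the pinned 1B3S system finishes no sooner than four small cores (100 time units each), while sharing the big core evenly brings the phase time down to 80.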

SLIDE 20

Questions?
