POWER-AWARE JOB SCHEDULING: Maximizing Data Center Performance

POWER-AWARE JOB SCHEDULING
Maximizing Data Center Performance Under Strict Power Budget
Osman Sarood, Akhil Langer, Abhishek Gupta, Laxmikant Kale
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
29th April 2014

Major Challenges to Achieve Exascale [1]
- Energy and Power Challenge
- Memory and Storage Challenge
- Concurrency and Locality Challenge
- Resiliency Challenge
[1] Kogge, Peter, et al. "Exascale computing study: Technology challenges in achieving exascale systems." (2008).
3/4/15 Power-Aware Job Scheduling 2

Major Challenges to Achieve Exascale [1]
Power consumption for Top500
Exascale in 20 MW!
[1] Kogge, Peter, et al. "Exascale computing study: Technology challenges in achieving exascale systems." (2008).

Data Center Power
How is the power need of a data center calculated?
- Using the Thermal Design Power (TDP) of the nodes
However, TDP is hardly ever reached!
Solution
- Constrain the power consumption of nodes
- Overprovisioning: use more nodes than a conventional data center for the same power budget
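As a rough illustration of the overprovisioning arithmetic, the Python sketch below counts how many nodes fit under a fixed power budget when each node is provisioned at TDP versus at a lower CPU cap. All wattages are hypothetical placeholders chosen for illustration, not measurements from the talk, and the per-node cost (CPU power plus a fixed base power) is an assumed simplification.

```python
def max_nodes(power_budget_w: float, cpu_power_w: float, base_power_w: float) -> int:
    """Largest n such that n * (cpu_power_w + base_power_w) <= power_budget_w."""
    per_node_w = cpu_power_w + base_power_w
    if per_node_w <= 0:
        raise ValueError("per-node power must be positive")
    return int(power_budget_w // per_node_w)


# Hypothetical values, for illustration only.
BUDGET_W = 3000.0  # total power budget
BASE_W = 20.0      # assumed fixed per-node power outside the CPU
TDP_W = 95.0       # CPU provisioned at its thermal design power
CAP_W = 45.0       # CPU capped well below TDP

conventional = max_nodes(BUDGET_W, TDP_W, BASE_W)
overprovisioned = max_nodes(BUDGET_W, CAP_W, BASE_W)
print(conventional, overprovisioned)  # prints "26 46"
```

Under these assumed numbers the capped configuration powers 46 nodes where TDP-based provisioning allows 26; whether the extra nodes pay off depends on how each application responds to the lower cap.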

Distribution of Node Power Consumption
Power distribution for BG/Q processor on Mira
- 76% by CPU/Memory
- No good mechanism for controlling other power domains
Pie Chart: Sean Wallace, Measuring Power Consumption on IBM Blue Gene/Q

Constraining CPU/Memory Power
Intel Sandy Bridge
- Running Average Power Limit (RAPL) library
  - Measure and set CPU/memory power
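The slide names a RAPL library without showing its interface. As a stand-in, the Python sketch below reads and sets a package power cap through the Linux powercap sysfs files (`energy_uj`, `constraint_0_power_limit_uw`). Using sysfs, the default zone path, and the helper names are all assumptions made for illustration; this is not the interface used in the talk.

```python
from pathlib import Path

# Assumed location of the first CPU package zone; it differs between systems.
DEFAULT_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def read_energy_joules(zone: Path = DEFAULT_ZONE) -> float:
    """Cumulative energy counter of the zone, converted from microjoules."""
    return int((zone / "energy_uj").read_text()) / 1e6


def read_power_cap_watts(zone: Path = DEFAULT_ZONE) -> float:
    """Current long-term power limit of the zone, converted from microwatts."""
    return int((zone / "constraint_0_power_limit_uw").read_text()) / 1e6


def set_power_cap_watts(watts: float, zone: Path = DEFAULT_ZONE) -> None:
    """Write a new long-term power limit; this usually requires root."""
    microwatts = int(round(watts * 1e6))
    (zone / "constraint_0_power_limit_uw").write_text(str(microwatts))


def average_power_watts(energy_start_j: float, energy_end_j: float, seconds: float) -> float:
    """Average power between two energy readings (counter wraparound is ignored)."""
    return (energy_end_j - energy_start_j) / seconds
```

Measuring power this way means sampling `read_energy_joules` twice and dividing the difference by the elapsed time; a production tool would also need to handle the counter wrapping around.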

Application Performance with Power
Configuration: (n x p_c, p_m)
- p_c: CPU power cap
- p_m: memory power cap
Observations
- Application performance does not improve proportionately with an increase in the power cap
- It is better to run on a larger number of nodes, each capped at a lower power level
[Figure: Performance of LULESH at different configurations; labeled points (12x44,18) and (20x32,10)]

Problem Statement
Maximizing Data Center Performance Under Strict Power Budget
Data center capabilities and job features
- Power capping ability
- Overprovisioning
- Moldability (optional)
- Malleability (optional)
  - Charm++
  - Dynamic MPI

Power Aware Resource Manager (PARM)
- Job Profiler
  - PASS model
- Scheduler
- Job Queue
- Execution Framework
  - Shrink/expand jobs
  - Apply power caps
- Triggers
  - Job arrival
  - Job termination

JOB PROFILER
- Measure job performance at various scales and CPU power caps
- Power Aware Strong Scaling (PASS) Model
  - Predict job performance at any (n, p)

Power Aware Strong Scaling (PASS) Model
Time as a function of power and number of nodes
Time vs Scale (Downey's strong scaling):
$t = F(n, A, \sigma)$
- n: number of nodes
- A: average parallelism
- σ: duration of parallelism A
Time vs Frequency:
$t(f) = \begin{cases} \dfrac{W_{cpu}}{f} + T_{mem}, & \text{for } f < f_h \\ T_h, & \text{for } f \ge f_h \end{cases}$
- W_cpu: CPU work
- T_mem: memory work
- T_h: minimum execution time
Frequency vs Power:
$p = p_{core} + \sum_{i=1}^{3} g_i L_i + g_m M + p_{base}$
- p_core: core power
- g_i: cost of a level-i cache access
- L_i: number of level-i accesses
- g_m: cost of a memory access
- M: number of memory accesses
- p_base: idle power (the base/static package power consumption)
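The time-versus-frequency and frequency-versus-power relations of the PASS model can be sketched in Python as follows. The function and argument names are illustrative, the numeric inputs in the usage lines are made up, and Downey's scaling function F(n, A, σ) is deliberately left out because its form is not given on the slide.

```python
from typing import Sequence


def exec_time(f: float, w_cpu: float, t_mem: float, f_h: float, t_h: float) -> float:
    """t(f): W_cpu / f + T_mem below the threshold f_h, the constant T_h at or above it."""
    if f <= 0:
        raise ValueError("frequency must be positive")
    if f < f_h:
        return w_cpu / f + t_mem
    return t_h


def package_power(p_core: float, g: Sequence[float], l: Sequence[float],
                  g_m: float, m: float, p_base: float) -> float:
    """p = p_core + sum_i g_i * L_i + g_m * M + p_base, with one (g_i, L_i) pair per cache level."""
    if len(g) != len(l):
        raise ValueError("need one cost per cache level")
    cache_term = sum(g_i * l_i for g_i, l_i in zip(g, l))
    return p_core + cache_term + g_m * m + p_base


# Made-up inputs, for illustration only.
slow = exec_time(1.0, w_cpu=2.0, t_mem=1.0, f_h=2.0, t_h=2.0)
fast = exec_time(2.5, w_cpu=2.0, t_mem=1.0, f_h=2.0, t_h=2.0)
power = package_power(10.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 0.5, 8.0, 20.0)
print(slow, fast, power)
```

Raising the frequency past f_h buys no further reduction in time, which is one way to read the observation that performance does not grow in proportion to the power cap.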

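Scheduler: Integer Linear Program Formulation
Objective function (maximize):
$\sum_{j \in J} \sum_{n \in N_j} \sum_{p \in P_j} w_j \, s_{j,n,p} \, x_{j,n,p}$
Select one resource combination per job:
$\sum_{n \in N_j} \sum_{p \in P_j} x_{j,n,p} \le 1 \quad \forall j \in I$
$\sum_{n \in N_j} \sum_{p \in P_j} x_{j,n,p} = 1 \quad \forall j \in I$
(Both selection constraints appear with the same index set I here; they presumably apply to different job sets, for example jobs that may wait versus jobs that must keep running.)
Bounding total nodes:
$\sum_{j \in J} \sum_{p \in P_j} \sum_{n \in N_j} n \, x_{j,n,p} \le N$
Bounding power consumption:
$\sum_{j \in J} \sum_{n \in N_j} \sum_{p \in P_j} \bigl( n \, (p + W_{base}) \bigr) \, x_{j,n,p} \le W_{max}$
Disable malleability (optional):
$\sum_{n \in N_j} \sum_{p \in P_j} n \, x_{j,n,p} = n_j \quad \forall j \in I$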
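To make the formulation concrete, here is a small Python sketch that solves a toy instance by exhaustive enumeration instead of an ILP solver. It covers only the objective, the at-most-one-configuration choice, the node bound, and the power bound; the equality and malleability constraints are left out. The job data, weights, and limits are invented for illustration.

```python
from itertools import product


def schedule(jobs, total_nodes, w_base, w_max):
    """Pick at most one (n, p) per job to maximize sum of w_j * s_{j,n,p}.

    jobs maps a job name to (weight, {(n, p): speedup}).
    Returns (best objective value, {job: chosen (n, p)}).
    """
    names = list(jobs)
    # None means the job stays unscheduled (all of its x_{j,n,p} are 0).
    options = [[None] + list(jobs[name][1]) for name in names]
    best_value, best_choice = 0.0, {}
    for combo in product(*options):
        chosen = {name: cfg for name, cfg in zip(names, combo) if cfg is not None}
        nodes = sum(n for n, _ in chosen.values())
        power = sum(n * (p + w_base) for n, p in chosen.values())
        if nodes > total_nodes or power > w_max:
            continue  # violates the node bound or the power bound
        value = sum(jobs[name][0] * jobs[name][1][cfg] for name, cfg in chosen.items())
        if value > best_value:
            best_value, best_choice = value, chosen
    return best_value, best_choice


# Invented toy instance: two jobs, 6 nodes, 20 W base power, 320 W budget.
toy_jobs = {
    "A": (1.0, {(2, 30): 1.0, (4, 30): 1.75, (4, 50): 2.25}),
    "B": (1.0, {(2, 30): 1.0, (4, 30): 1.5}),
}
value, choice = schedule(toy_jobs, total_nodes=6, w_base=20, w_max=320)
print(value, choice)
```

In this instance, running job A at its highest cap leaves too little power for job B, so the best plan gives both jobs the lower cap. Enumeration grows exponentially with the number of jobs, which is why a real scheduler would hand the same model to an ILP solver.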

Scheduler: Objective Function
- Maximizing throughput makes ILP optimization infeasible
- Maximize the sum of power-aware speedups of the selected jobs:
$s_{j,n,p} = \dfrac{t_{j,\min(N_j),\min(P_j)}}{t_{j,n,p}}$
where s_{j,n,p} is the speedup of job j executing on n nodes at CPU power cap p, relative to its execution on the fewest nodes at the lowest power cap.
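A minimal Python sketch of the power-aware speedup, assuming a job's execution times are available as a table keyed by (n, p) and that the baseline configuration (min N_j, min P_j) is present in that table. The timings are invented.

```python
def power_aware_speedup(times):
    """Map {(n, p): execution time} to {(n, p): s}, with s = t_baseline / t.

    The baseline is the time on the fewest nodes at the lowest power cap.
    """
    n_min = min(n for n, _ in times)
    p_min = min(p for _, p in times)
    baseline = times[(n_min, p_min)]  # KeyError if the baseline was not profiled
    return {cfg: baseline / t for cfg, t in times.items()}


# Invented timings (seconds) for one job.
job_times = {(2, 30): 100.0, (4, 30): 50.0, (4, 50): 40.0}
speedups = power_aware_speedup(job_times)
print(speedups)
```

Because every job is normalized to its own smallest configuration, speedups of different jobs become comparable terms in a single objective.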

Experimental Setup
Applications
- Memory-intensive: Jacobi and Wave2D
- Computation-intensive: LeanMD
- Mixed: AMR and Lulesh
Job Dataset
- β corresponds to CPU sensitivity
- SetL: mix of apps with average β = 0.1
- SetH: mix of apps with average β = 0.27
Testbed
- 38-node Intel Sandy Bridge
- 6 physical cores, 16 GB RAM
- Power capping using RAPL
- CPU power cap range: [25-95] W
Power Budget
- CPU power levels = {30, 32, 34, 39, 45, 55} W
- Node power consumption = 116 W
- Power budget = 3000 W
- Number of nodes in a traditional data center = 28

Estimating Performance using PASS
[Figure: power-aware speedup versus CPU power (W) for LeanMD, AMR, Lulesh, Wave2D, and Jacobi2D; accompanying table: Model Parameters]

PARM Performance Results
Description
- noMM: without Malleability and Moldability
- noSE: with Moldability but no Malleability
- wSE: with Moldability and Malleability
Performance
- 32% improvement with noMM over SLURM
- 13.9% improvement with noSE over noMM
- 7.5% improvement with wSE over noSE
- 1.7X improvement in throughput
[Figure: Average completion times (mins) for SetL and SetH under SLURM, noMM, noSE, and wSE]
