

SLIDE 1

RUBIK: FAST ANALYTICAL POWER MANAGEMENT

FOR LATENCY-CRITICAL SYSTEMS

HARSHAD KASTURE, DAVIDE BARTOLINI, NATHAN BECKMANN, DANIEL SANCHEZ

MICRO 2015

SLIDE 2

Motivation

• Low server utilization in today’s datacenters results in resource and energy inefficiency
• Stringent latency requirements of user-facing services are a major contributing factor
• Power management for these services is challenging
  ◦ Strict requirements on tail latency
  ◦ Inherent variability in request arrival and service times
• Rubik uses statistical modeling to adapt to short-term variations
  ◦ Respond to abrupt load changes
  ◦ Improve power efficiency
  ◦ Allow colocation of latency-critical and batch applications

SLIDE 3

Understanding Latency-Critical Applications

[Diagram: a client request enters the datacenter at a root node and fans out to leaf nodes, each backed by its own back-end servers]


SLIDE 6

Understanding Latency-Critical Applications

• The few slowest responses determine user-perceived latency
• Tail latency (e.g., 95th / 99th percentile), not mean latency, determines performance

[Diagram: the fan-out tree with per-node response-time annotations (1 ms); the slowest leaf gates the overall reply]
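A minimal sketch (not from the talk; all numbers made up) of why the tail, not the mean, governs user-perceived latency under fan-out:

```python
import random
import statistics

random.seed(0)

# Simulate 10,000 leaf-node service times (exponential: mostly fast,
# with a long tail of stragglers).
leaf_latencies = [random.expovariate(1.0) for _ in range(10_000)]

def percentile(data, p):
    """Nearest-rank p-th percentile."""
    s = sorted(data)
    k = min(len(s) - 1, max(0, round(p / 100 * len(s)) - 1))
    return s[k]

mean = statistics.mean(leaf_latencies)
p99 = percentile(leaf_latencies, 99)

# A root node that fans out to 100 leaves waits for the slowest one,
# so its typical latency tracks the leaves' tail, not their mean.
root_latencies = [max(random.sample(leaf_latencies, 100))
                  for _ in range(1_000)]
root_mean = statistics.mean(root_latencies)

print(f"leaf mean={mean:.2f}  leaf p99={p99:.2f}  root mean={root_mean:.2f}")
```

Even though the average leaf is fast, the root's typical latency lands near the leaves' 99th percentile.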

SLIDE 7

Prior Schemes Fall Short

• Traditional DVFS schemes (cpufreq, TurboBoost, …)
  ◦ React to coarse-grained metrics like processor utilization
  ◦ Oblivious to short-term performance requirements
• Power management for embedded systems (PACE, GRACE, …)
  ◦ Do not consider queuing
• Schemes designed specifically for latency-critical systems (PEGASUS [Lo ISCA’14], Adrenaline [Hsu HPCA’15])
  ◦ Rely on application-specific heuristics
  ◦ Too conservative

SLIDE 8

Insight 1: Short-Term Load Variations

• Latency-critical applications have significant short-term load variations
• PEGASUS [Lo ISCA’14] uses feedback control to adapt the frequency setting to diurnal load variations
  ◦ Deduce server load from observed request latency
  ◦ Cannot adapt to short-term variations

[Figure: load trace for moses, showing short-term variations]

SLIDE 9

Insight 2: Queuing Matters!

• Tail latency is often determined by queuing, not the length of individual requests
• Adrenaline [Hsu HPCA’15] uses application-level hints to distinguish long requests from short ones
  ◦ Long requests boosted (sped up)
  ◦ Frequency settings must be conservative to handle queuing

[Figure: latency breakdown for moses]

SLIDE 10

Rubik Overview

• Use queue length as a measure of instantaneous system load
• Update frequency whenever queue length changes
• Adapt to short-term load variations

[Figure: core activity, queue length, and core frequency over time; Rubik raises frequency as the queue grows and drops to idle when it drains]
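The event-driven policy above can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: `required_frequency` is a toy stand-in for Rubik's statistical model, and the DVFS states are made up.

```python
# Hypothetical sketch of Rubik's control loop: the core frequency is
# re-evaluated on every arrival/departure, i.e., whenever the queue
# length changes.

FREQ_STEPS_GHZ = [1.2, 1.6, 2.0, 2.4]  # assumed available DVFS states

def required_frequency(queue_len: int) -> float:
    """Toy policy: deeper queues demand higher frequency.
    Rubik derives this from its statistical model instead."""
    if queue_len == 0:
        return FREQ_STEPS_GHZ[0]              # idle: lowest state
    idx = min(queue_len, len(FREQ_STEPS_GHZ)) - 1
    return FREQ_STEPS_GHZ[idx]

class Core:
    def __init__(self):
        self.queue_len = 0
        self.freq = FREQ_STEPS_GHZ[0]

    def on_arrival(self):
        self.queue_len += 1
        self.freq = required_frequency(self.queue_len)

    def on_departure(self):
        self.queue_len -= 1
        self.freq = required_frequency(self.queue_len)

core = Core()
core.on_arrival(); core.on_arrival(); core.on_arrival()
print(core.freq)   # deeper queue -> higher frequency
core.on_departure(); core.on_departure(); core.on_departure()
print(core.freq)   # queue drained -> back to lowest state
```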

SLIDE 11

Goal: Reshaping Latency Distribution

[Figure: probability density of response latency]

SLIDE 12

Key Factors in Setting Frequencies

• Distribution of cycle requirements of individual requests
  ◦ Larger variance → more conservative frequency setting
• How long has a request spent in the queue?
  ◦ Longer wait times → higher frequency
• How many requests are queued waiting for service?
  ◦ Longer queues → higher frequency

SLIDE 13

There’s Math!

Condition the per-request cycle distribution $S$ on the $\omega$ cycles the head request has already executed, then convolve to get the aggregate cycle demand of $i$ queued requests:

$$P[S_0 \ge c] = P[S \ge c + \omega \mid S \ge \omega] = \frac{P[S \ge c + \omega]}{P[S \ge \omega]}$$

$$P[S_i] = P[S_{i-1}] * P[S] = \underbrace{P[S_0] * P[S] * \cdots * P[S]}_{i\ \text{times}}$$

Then set the core frequency so every queued request meets the latency target $L$:

$$f = \max_{i = 0 \ldots N} \frac{c_i^L}{t_i - m_i}$$

where $c_i^L$ is the target-tail cycle demand through request $i$, $t_i$ its remaining time budget, and $m_i$ its projected memory stall time (which does not scale with core frequency).

[Figure: cycle-demand distributions for $S$, $S_0$, and $S_i$]
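A minimal numerical sketch of these equations (my reconstruction, not the paper's code), representing $S$ as a discrete PMF; the toy PMF and the time budget `t`, stall time `m` are made up:

```python
import numpy as np

def condition_on_executed(pmf, omega):
    """Residual distribution: P[S0 = c] = P[S = c + omega] / P[S >= omega]."""
    tail = pmf[omega:].sum()
    return pmf[omega:] / tail

def tail_cycles(pmf, percentile):
    """Smallest c with P[S <= c] >= percentile (the c^L of the slide)."""
    cdf = np.cumsum(pmf)
    return int(np.searchsorted(cdf, percentile))

# Toy PMF over cycle counts 0..9 (uniform, for illustration only).
S = np.full(10, 0.1)

# Head request already ran 4 cycles: its residual distribution S0.
S0 = condition_on_executed(S, 4)

# Aggregate demand of the head plus 2 more queued requests:
# P[S_2] = P[S_0] * P[S] * P[S]  (discrete convolutions).
S2 = np.convolve(np.convolve(S0, S), S)

c_L = tail_cycles(S2, 0.99)   # 99th-percentile cycle demand
t, m = 20.0, 2.0              # time budget and memory stall time (made up)
f = c_L / (t - m)             # required frequency: f = c^L / (t - m)
print(c_L, round(f, 3))
```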

SLIDE 14

Efficient Implementation

• Pre-computed tables store most of the required quantities
  ◦ Table contents are independent of system load!
• Implemented as a software runtime
• Hardware support: fast per-core DVFS, performance counters for CPI stacks

[Tables: target-tail entries c0…c15 and m0…m15, one row per bucket of ω (ω = 0, < 25th pct, < 50th pct, < 75th pct, otherwise); updated periodically, read on each request arrival/departure]
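The fast path can be sketched as a pure table lookup (a hypothetical illustration: the bucket boundaries and all table values below are made up, not the paper's):

```python
# The heavy statistics are precomputed offline into per-queue-position
# tables (c_i, m_i), so each arrival/departure only does a lookup and a max.

# C_TABLE[bucket][i]: tail cycle demand with i+1 requests queued;
# M_TABLE[bucket][i]: projected memory stall time; one row per bucket
# of omega (cycles the head request has already executed).
C_TABLE = [[5, 9, 13, 17], [4, 8, 12, 16], [3, 7, 11, 15]]
M_TABLE = [[1, 2, 3, 4],   [1, 2, 3, 4],   [1, 2, 3, 4]]
OMEGA_BUCKETS = [0, 100, 400]  # bucket lower bounds, in cycles

def omega_bucket(omega):
    """Index of the last bucket whose lower bound is <= omega."""
    b = 0
    for j, lo in enumerate(OMEGA_BUCKETS):
        if omega >= lo:
            b = j
    return b

def pick_frequency(omega, queue_len, budgets):
    """f = max_i c_i / (t_i - m_i), with every term read from the tables."""
    b = omega_bucket(omega)
    return max(C_TABLE[b][i] / (budgets[i] - M_TABLE[b][i])
               for i in range(queue_len))

f = pick_frequency(omega=150, queue_len=3, budgets=[10.0, 20.0, 30.0])
print(round(f, 3))
```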

SLIDE 15

Evaluation

• Microarchitectural simulations using zsim
  ◦ Power model tuned to a real system
• Compare Rubik against two oracular schemes:
  ◦ StaticOracle: pick the lowest static frequency that meets latency targets for a given request trace
  ◦ AdrenalineOracle: assume oracular knowledge of long and short requests, use offline training to pick frequencies for each

[Diagram: simulated chip with six cores (Core 0–5) and a shared L3]
  • Westmere-like OOO cores
  • Fast per-core DVFS
  • CPI stack counters
  • Pin threads to cores
SLIDE 16

Evaluation

• Five diverse latency-critical applications
  ◦ xapian (search engine)
  ◦ masstree (in-memory key-value store)
  ◦ moses (statistical machine translation)
  ◦ shore-mt (OLTP)
  ◦ specjbb (Java middleware)
• For each application, the latency target is set at the tail latency achieved at nominal frequency (2.4 GHz) at 50% utilization

SLIDE 17

Tail Latency

[Figure: tail latency results]



SLIDE 21

Core Power Savings

• All three schemes save significant power at low utilization
• Rubik performs best, reducing core power by up to 66%
• Rubik’s relative savings increase as short-term adaptation becomes more important
• Rubik saves significant power even at high utilization
  ◦ 17% on average, and up to 34%

SLIDE 22

Real Machine Power Savings

• V/F transition latencies of >100 µs even with integrated voltage controllers
  ◦ Likely due to inefficiencies in firmware
• Rubik successfully adapts to higher V/F transition latencies

SLIDE 23

Static Power Limits Efficiency

[Figure: datacenter utilization over time, with latency-critical machines left partly idle while batch machines are provisioned separately]

SLIDE 24

RubikColoc: Colocation Using Rubik

[Diagram: RubikColoc runs latency-critical and batch applications on the same machine with a statically partitioned LLC; Rubik sets the latency-critical cores’ frequencies]

SLIDE 25

RubikColoc Savings

• RubikColoc saves significant power and resources over a segregated datacenter baseline
  ◦ At high load: 17% reduction in datacenter power consumption and 19% fewer machines
  ◦ At low load: 31% reduction in datacenter power consumption and 41% fewer machines

SLIDE 26

Conclusions

• Rubik uses fine-grained power management to reduce active core power consumption by up to 66%
• Rubik uses statistical modeling to account for various sources of uncertainty, and avoids application-specific heuristics
• RubikColoc uses Rubik to colocate latency-critical and batch applications, reducing datacenter power consumption by up to 31% while using up to 41% fewer machines

SLIDE 27

THANKS FOR YOUR ATTENTION! QUESTIONS?