Interposed Proportional Sharing for a Storage Service Utility - PowerPoint PPT Presentation
Interposed Proportional Sharing for a Storage Service Utility
Wei Jin, Jasleen Kaur (UNC-Chapel Hill), Jeff Chase (Duke University)
Resource Sharing in Utilities

[Figure: clients direct request flows to a shared service, e.g., a storage array. Benefits: resource efficiency, adaptivity, surge protection, robustness, "pay as you grow", economy of scale, aggregation.]

- Resource sharing offers important benefits.
- But sharing must be "fair" to protect users.
- Shared services often have contractual performance targets for groups of clients or requests.
- Service Level Agreements, or SLAs.
Goals

- Performance isolation
– Localize the damage from unbudgeted demand surges.
- Differentiated service quality
– Offer predictable, configurable performance (e.g., mean response time) for stable request streams.
- Non-invasive
– External control of a "black box" or "black cloud"
– Generalize to a range of services
– No changes to service structure or implementation
Interposed Request Scheduling I

[Figure: a scheduler, e.g., in a router, sits between the clients and the shared service, e.g., a storage array.]

– Intercept and throttle or reorder requests on the path between the clients and the service [e.g., Lumb03].
– Build the scheduler into network switching components, or into the clients (e.g., servers in a utility data center).
– Manage request traffic rather than request execution.
Alternative Approaches

- Extend the scheduler for each resource in a service.
– Cello, Xen, VMware, Resource Containers, etc.
– Precise but invasive, and must coordinate schedulers to manage sharing of an aggregate resource (server, array).
- Facade [Lumb03] uses Earliest Deadline First in an interposed request scheduler to meet response time targets.
– Does not provide isolation, though priority can help.
– Can admission control make isolation unnecessary?
- SLEDS [Chambliss03] is a per-client network storage controller using leaky-bucket rate throttling.
– Flows cannot exceed the configured rate even if resources are idle.
Proportional Sharing

- Each flow f is assigned a weight φ_f.
- Allocate resources among active flows in proportion to their weights.
– Work-conserving: allocate surplus proportionally.
- Fairness
– Lag is the difference in weighted work done on behalf of a pair of flows.
– Prove a constant worst-case bound on lag for any pair of flows that are active over any interval.
– "Use it or lose it": no penalty for consuming surplus resources.
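The lag definition above can be sketched in a few lines: the difference in weighted (weight-normalized) work received by two flows over an interval. The function names and values here are illustrative, not from the talk.

```python
def normalized_work(work_done, weight):
    """Work received by a flow over an interval, scaled by its share."""
    return work_done / weight

def lag(work_f, phi_f, work_g, phi_g):
    """Lag between flows f and g: difference in weighted work done."""
    return normalized_work(work_f, phi_f) - normalized_work(work_g, phi_g)

# Two backlogged flows with shares 0.25 and 0.75: if each receives
# service exactly in proportion to its weight, the lag is zero.
print(lag(25, 0.25, 75, 0.75))  # 0.0
```

A fair scheduler bounds this quantity by a constant for any pair of flows active over any interval.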
Weights as Shares

- Weights define a configured or assured service rate.
– Adjust weights to meet performance targets.
- Idealize weights as shares of the service's capacity to serve requests.
– Normalize weights to sum to one.
- For network services, your mileage may vary.
– Delivered service rate depends on request distribution, cross-talk, hot spots, etc.
– Premise: behavior is sufficiently regular to adjust weights under feedback control.
Interposed Request Scheduling II

[Figure: a depth-D scheduler, e.g., in a router, sits in front of the shared service, e.g., a storage array.]

– Dispatch/issue up to D requests or D units of work.
– Issue requests to respect the weights assigned to each flow.
– Choose D to balance server utilization and tight resource control.
– Request concurrency is defined/controlled by the server.
Overview

- Background on proportional share scheduling
– Virtual Clock [Zhang90]
– Weighted Fair Queuing [Demers89]
– Start-time Fair Queuing, or SFQ [Goyal97]
- New depth-controlled variants for interposed scheduling
– Why SFQ is not sufficient: concurrency.
– New algorithm: SFQ(D)
– Refinement: FSFQ(D)
- Decentralized throttling with Request Windows (RW)
- Proven fairness results and experimental evaluation
A Request Flow

[Figure: a timeline of requests p_f^0, p_f^1, p_f^2 with arrival times A(p_f^0), A(p_f^1), A(p_f^2) and costs c_f^0 = 10, c_f^1 = 5, c_f^2 = 10.]

Consider a flow f of service requests.
– Could be packets, CPU demands, I/Os, requests for a service.
– Each request has a distinct arrival time (serialize arrivals).
– Each request has a cost: packet length, service duration, etc.
Request Costs

- Can apply to any service if we can estimate the cost of each request.
- Relatively easy to estimate cost for block storage.
- Fairness results are relative to the estimated costs; they are only as accurate as the estimates.
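As a hedged illustration of why block storage costs are relatively easy to estimate: service time is roughly a fixed per-I/O overhead plus transfer time proportional to request size. The constants below are illustrative assumptions, not measurements from the talk.

```python
def estimated_cost(bytes_requested, per_io_overhead_ms=5.0,
                   bandwidth_bytes_per_ms=50_000.0):
    """Estimated service time in ms: fixed positioning overhead
    (seek/rotation, assumed here) plus size-proportional transfer time."""
    return per_io_overhead_ms + bytes_requested / bandwidth_bytes_per_ms

# A 100 KB read under these assumed constants:
print(estimated_cost(100_000))  # 7.0 (ms)
```

Any cost model of this kind only needs to be accurate in a relative sense, since the fairness bounds are stated in terms of the estimated costs.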
A Flow with a Share

[Figure: requests p_f^0, p_f^1, p_f^2 with costs 10, 5, 10, showing arrival and dispatch times.]

Consider a sequential unit resource: capacity is 1 unit of work per time unit.
– Suppose flow f has a configured share of 50% (φ_f = 0.5).
– f is assured T units of service in T/φ_f units of real time.
– How to implement shares/weights in an interposed request scheduler?
Virtual Clock

Each arriving request is tagged with a start (eligible) time and a finish time:

S(p_f^i) = F(p_f^{i-1})
F(p_f^i) = S(p_f^i) + c_f^i / φ_f

[Figure: for costs 10, 5, 10 and φ_f = 0.5: S(p_f^0) = 0, F(p_f^0) = 20; S(p_f^1) = 20, F(p_f^1) = 30; S(p_f^2) = 30, F(p_f^2) = 50.]

View the tags as a virtual clock for each flow.
Each request advances the flow's clock by the amount of real time until its next request must be served.
If the flow completes work at its configured service rate, then virtual time ≈ real time. [Zhang90]
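The tagging rule above can be sketched directly; this minimal version reproduces the example's tags for costs 10, 5, 10 with φ_f = 0.5. The function name is illustrative.

```python
def virtual_clock_tags(costs, phi):
    """Virtual Clock tags for one flow's requests, in arrival order:
    S(p_i) = F(p_{i-1}), with S(p_0) = 0, and F(p_i) = S(p_i) + c_i/phi."""
    tags, finish_prev = [], 0.0
    for c in costs:
        start = finish_prev           # S(p_i) = F(p_{i-1})
        finish = start + c / phi      # F(p_i) = S(p_i) + c_i / phi
        tags.append((start, finish))
        finish_prev = finish
    return tags

print(virtual_clock_tags([10, 5, 10], 0.5))
# [(0.0, 20.0), (20.0, 30.0), (30.0, 50.0)]
```

Each request's finish tag is exactly the flow's clock after charging the request at its configured rate, which is what makes the tags usable as a per-flow virtual clock.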
Sharing with Virtual Clock

The Virtual Clock scheduler [Zhang90] orders the requests/packets by their virtual clock tags.
This example:
– shows two flows, each at φ = 50%
– assumes both flows are active and backlogged
What if a flow does not consume its configured share?

[Figure: two backlogged flows with request costs 5, 10, 10 and 8, 10, 5; their virtual and real times advance together.]
Virtual Clock is Unfair

A scheduler is work-conserving if the resource is never left idle while a request is queued awaiting service.
Virtual Clock is work-conserving, but it is unfair: an active flow is penalized for consuming idle resources. The lag is unbounded: we really want a "use it or lose it" policy.

[Figure: while one flow is inactive, the other flow consumes the idle capacity; its virtual clock runs ahead, and it is later penalized unfairly.]
Weighted Fair Queuing

Define a system virtual time v(t), which advances with the progress of the active flows:

dv/dt = C / Σ φ_i  (summing over the active flows i)

– Less competition speeds up v(t); more slows it down.
Advance the (lagging) clock of a newly active flow to the system virtual time, so it relinquishes its claim to the resources it left idle:

S(p_f^i) = max( v(A(p_f^i)), F(p_f^{i-1}) )
F(p_f^i) = S(p_f^i) + c_f^i / φ_f

How to maintain v(t)?
– Too fast? Reverts to FIFO.
– Too slow? Reverts to Virtual Clock.
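The WFQ start-tag rule can be sketched as two small functions; the names and values are illustrative. The key difference from Virtual Clock is the max with the system virtual time at arrival, which forgives a flow for time it spent idle.

```python
def wfq_start_tag(v_at_arrival, finish_prev):
    """S(p_i) = max(v(A(p_i)), F(p_{i-1})): a returning flow's clock
    is advanced to the system virtual time, dropping its stale claim."""
    return max(v_at_arrival, finish_prev)

def wfq_finish_tag(start, cost, phi):
    """F(p_i) = S(p_i) + c_i / phi."""
    return start + cost / phi

# A flow went idle with last finish tag 30; it returns when the system
# virtual time has advanced to 50, so its clock jumps forward to 50.
s = wfq_start_tag(50.0, 30.0)
print(s, wfq_finish_tag(s, 10, 0.5))  # 50.0 70.0
```

Under Virtual Clock the same flow would have kept start tag 30 and been favored for its idle period; under WFQ it competes from v(t) onward.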
Start-Time Fair Queuing (SFQ)

SFQ derives v(t) from the start tag of the request in service: use the resource itself to drive the global clock.
– Order requests by start tag [Goyal97].
– Cheap to compute v(t).
– Fair even if the capacity (service rate) C varies.
– Lag between two backlogged flows is bounded by:

c_f^max / φ_f + c_g^max / φ_g

[Figure: the same two-flow example; the virtual clock is derived from the active flow, so the returning flow is not penalized for its idle period.]
SFQ for Interposed Scheduling?

[Figure: a depth-D SFQ scheduler in front of the storage service.]

Challenge: concurrency.
– Up to D requests are "in service" concurrently.
– SFQ virtual time v(t) is no longer uniquely defined.
– Direct adaptation: Min-SFQ(D) takes the minimum tag of the requests in service.
Min-SFQ is Unfair

[Figure: two flows with weights φ_f = 0.25 and φ_g = 0.75 issuing requests of cost 6. (1) Green has insufficient concurrency in its request stream; (2) then a request burst for Green arrives.]

Green is active enough to retain its virtual clock, but lags arbitrarily far behind. Purple starves until Green's virtual clock catches up.
Problem: v(t) advances with the slowest active flow; clock skew causes the algorithm to degrade to Virtual Clock, which is unfair.
SFQ(D)

Solution: take v(t) from the clocks of the backlogged flows.
– Take v(t) as the minimum tag of the queued requests awaiting dispatch.
– (The start tag of the request that will issue next.)
– Implementation: take v(t) from the last issued request.
– Equivalent to scheduling the sequence of issue slots with SFQ.
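A minimal SFQ(D) dispatcher can be sketched as follows, under stated assumptions: requests carry a flow id and a cost, v(t) is taken from the start tag of the last issued request, and at most `depth` requests are outstanding at once. Class and method names are illustrative.

```python
import heapq

class SFQD:
    """Sketch of SFQ(D): order queued requests by start tag, issue up
    to D concurrently, and drive v(t) from the last issued request."""
    def __init__(self, weights, depth):
        self.phi = weights                       # flow id -> weight
        self.depth = depth                       # D: max outstanding requests
        self.v = 0.0                             # system virtual time
        self.last_finish = {f: 0.0 for f in weights}
        self.queue = []                          # heap of (start, seq, flow, cost)
        self.outstanding = 0
        self.seq = 0                             # FIFO tie-breaker for equal tags

    def arrive(self, flow, cost):
        start = max(self.v, self.last_finish[flow])   # SFQ start-tag rule
        self.last_finish[flow] = start + cost / self.phi[flow]
        heapq.heappush(self.queue, (start, self.seq, flow, cost))
        self.seq += 1

    def dispatch(self):
        """Issue queued requests while an issue slot is free."""
        issued = []
        while self.queue and self.outstanding < self.depth:
            start, _, flow, cost = heapq.heappop(self.queue)
            self.v = start           # v(t) = start tag of last issued request
            self.outstanding += 1
            issued.append((flow, cost))
        return issued

    def complete(self):
        self.outstanding -= 1        # a completion frees one issue slot

# Two flows with 3:1 weights, unit-cost requests, depth 4: the first
# dispatch fills the issue slots in a 3:1 ratio.
sched = SFQD({'f': 0.75, 'g': 0.25}, depth=4)
for _ in range(4):
    sched.arrive('f', 1)
for _ in range(4):
    sched.arrive('g', 1)
print(sched.dispatch())  # [('f', 1), ('g', 1), ('f', 1), ('f', 1)]
```

Dispatching from the heap in start-tag order is exactly "scheduling the sequence of issue slots with SFQ"; the FSFQ(D) refinement discussed later replaces the simple sequence tie-breaker with a second pair of adjusted tags.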
SFQ(D) Lag Bounds

[Figure: requests dispatch and complete with up to D outstanding; apply the SFQ bounds to the issued requests.]

SFQ lag bounds apply to requests issued under SFQ(D). From this we can derive the lag bound for requests completed under SFQ(D).
The lag between two backlogged flows f and g is bounded by:

(D + 1) · ( c_f^max / φ_f + c_g^max / φ_g )
Refining SFQ(D)

- SFQ(D) virtual time advances monotonically, but advances at most once per request issue.
- Bursts of requests may receive the same start tag, including requests from active flows that are "ahead".
- To be fair, the scheduler should bias against flows that hold more than their share of issue slots.
- Four-tag Start-time Fair Queuing (FSFQ(D)) is a refinement to SFQ(D).
– Break ties with a second pair of "adjusted" tags derived from Min-SFQ(D).
Request Windows: Motivation

- SFQ(D) and FSFQ(D) assume a central point of control over the request flows.
– Designed to reside within a service switch, e.g., a network storage router.
– Single point of complexity and vulnerability.
- Any central scheduler requires log(F) overhead to select the next request.
- Throttling can improve delay bounds by reserving issue slots.
Request Windows

[Figure: each flow's window reserves a share of the server's depth D.]

Reserve slots per flow (a Request Window) based on the flow's share.
Limit each flow to its share of the total depth D allowed into the system from all flows:

n_f = D · φ_f / Σ φ_i  (for each flow f, summing over all flows i)
Behavior of Request Windows

Is RW work-conserving? It does allow a flow to exceed its configured service rate under light load.
– The window constrains the outstanding requests, not the rate.
– Per-flow issue rate increases with service rate.
– Balance tight control with concurrency under light load.
Theorem: the lag between any two persistently backlogged flows is bounded by 2D for a FIFO server.
Experiments

- Implemented an NFS proxy for interposed request scheduling.
– Extends the Anypoint [Yocum03] redirecting switch prototype.
– SFQ(D), FSFQ(D), EDF in about 1000 lines of code.
- Implemented a disk array simulator.
- Used the prototype to validate the simulator for random read workloads (fstress load generator [Anderson02]).
- Simulated random read workloads with varying depth, arrival rate, and shares.
Performance Isolation with FSFQ

[Figure: throughput (IOPS) and response time (seconds) over time for FSFQ(16) and RW(16). Blue: 480 IOPS, 67% share. Red: ON/OFF 0/120 IOPS in 10-second intervals, 33% share. Server saturation at 500 IOPS.]
EDF Alone is Not Sufficient

[Figure: throughput (IOPS) and response time under EDF. Blue: 480 IOPS, target = 1 second. Red: ON/OFF 0/120 IOPS in 10-second intervals, target = 10 ms. Server saturation at 500 IOPS.]
Preview

- Flow f issues requests at a fixed arrival rate.
- Competitor g increases its request rate on the X-axis.
- Plot the mean response time for f on the Y-axis.
- Evaluate performance isolation and work conservation.

[Figure: as flow g's request rate grows, flow f's response time curve shows work conservation, predictability, and isolation, while flow g's own response time degrades.]
SFQ and FSFQ

[Figure: Client #1 response time (ms) vs. Client #2 request rate (IOPS), for SFQ(8) and FSFQ(8) at 2:1 and 8:1 weights.]

- f's response time stabilizes at a level determined by its weight.
- f's response time improves when g's load is low.
- FSFQ improves fairness modestly.
Other results

- g's response time degrades without bound as its load exceeds its share.
- When f generates low load, response times improve for both flows, and the stable level is less sensitive to weight.
FSFQ and RW

[Figure: response time for f (ms) vs. request rate for g (IOPS), for FSFQ(32) and RW(32) at 1:1, 2:1, and 8:1 weights.]

- As expected, RW isolates f more effectively than FSFQ because it limits the ability of g to consume slots left idle by f.

Other results
- FSFQ(32) is similar to RW(32) with f at 240 IOPS.
- FSFQ(32) is less effective than FSFQ(8).
Effect of Depth

[Figure: response time for f (ms) vs. request rate for g (IOPS), for FSFQ(64), RW(64), FSFQ(16), and RW(16) at 2:1 weights.]

- Increasing D weakens control.
- RW offers tighter control than FSFQ.
- FSFQ uses surplus resources more aggressively.
Summary of Results

- Interposed request scheduling with *SFQ and RW offers acceptable performance isolation and is non-invasive.
– Predictable, configurable differentiated service.
– With larger systems, depth must increase. The algorithms are fair and isolating even with high D, but cannot support tight response time bounds.
– In a work-conserving system, a flow with low utilization of its share experiences weaker isolation.
– FSFQ(D) yields modest improvements over SFQ(D).
– RW(D) offers stronger isolation than *SFQ, but is "less work-conserving" (more like a reservation).
Further Study

- How precisely can we estimate costs?
– Workload crosstalk, e.g., disk arm movement.
- Assumes internally balanced load.
– Internal bottlenecks can slow the service rate and "bleed over" into other shares.
– May need some component-local status/control if/when significant load imbalances exist (e.g., Stonehenge).
- Explore hybrids of *SFQ(D) and RW(D) for varying balances of decentralization and control.
– The degree of control is reduced as we increase parallelism within the cloud.
- Sizing shares for response-time SLAs.
http://issg.cs.duke.edu/publications/shares-sigmet04.pdf (enhanced/corrected version of the paper)
http://www.cs.duke.edu/~chase
Effect of Depth for a Low-Demand Flow

[Figure: response time for f (ms) vs. request rate for g (IOPS), for FSFQ(64), RW(64), FSFQ(16), and RW(16) at 2:1 weights.]

- Increasing D weakens control.
- f's response times increase; g's response times decrease.
- RW offers tighter control than FSFQ.
- FSFQ uses surplus resources more aggressively.