Supporting Time-Sensitive Applications on a Commodity OS

  • A. Goel, L. Abeni, C. Krasic, J. Snow and J. Walpole

Proceedings of the USENIX 5th Symposium on Operating Systems Design and Implementation (OSDI)

December 2002

Introduction

  • Multimedia applications are time-sensitive

– Ex: periodic execution with low jitter (e.g. soft modem)
– Ex: quick response to external event (e.g. frame capture in videoconference)

  • OS must allocate resources at appropriate times
  • Needs:

– High-precision timing facility
– Well-designed preemptible kernel
– Appropriate scheduling

  • Most commodity OSes don’t provide these (Windows, Linux)
  • Special OS enhancements can support real-time

– But they target hard real-time, so the performance of non-real-time applications suffers

Approach

1) Firm timers for efficient, high-resolution timing
2) Fine-grained kernel preemptibility
3) Priority and reservation-based CPU scheduling

  • Integrate into the Linux kernel: Time-Sensitive Linux (TSL)
  • Show benefits for real-time applications without degrading performance of other apps

Outline

  • Introduction (done)
  • Related Work (next)
  • Requirements
  • Implementation
  • Evaluation
  • Conclusions

Related Work

  • Illustration of real-time implementation difficulties [6,15,16]
  • Mathematical real-time scheduling [10,19]

– But ignores practical issues such as non-preemptibility

  • Practical real-time scheduling [12,17,22]

– But performance of non-real-time suffers

  • Real-time, micro-kernel-like systems [4]

– But hard timers add more overhead

  • New OSes [9]

– But different API, so hard to port apps

Time-Sensitive Requirements

  • Kernel latency: the time from when an event needs to be handled until the actual dispatch
  • Need: Timing Mechanism, Responsive Kernel, CPU Scheduling Algorithm


Timer Mechanism

  • Timer inaccuracy is the largest contributor to kernel latency
  • Can use:

– One-shot timer: on x86, use the on-chip CPU Advanced Programmable Interrupt Controller (APIC). Needs to be reprogrammed each time.
– Soft timer: check for expired timers at strategic locations, reducing the number of interrupts

  • Solution: combine the two; the result is called firm timers

Responsive Kernel

  • Even if the timer is accurate, kernel latency might still not be low if the kernel cannot respond

– (Traditionally, a thread in the kernel runs until done)

  • Solution: reduce size of non-preemptible regions

CPU Scheduling Algorithm

  • Need to schedule the right process as quickly as possible
  • Solutions:

– Priority-based scheduler: pre-assign priorities and schedule in that order
– Proportion-period scheduler: schedule with an upper bound on delay

Misc

  • Note, any one alone is not sufficient!

– High-resolution timer doesn’t help if the kernel is not preemptible
– Responsive kernel is not useful without accurate timing

  • Note, tasks may not be independent:

– X server operates (and is scheduled) in FIFO order
– Video application with higher priority than X server will have priority inversion (waiting on low priority) (will address)

Outline

  • Introduction (done)
  • Related Work (done)
  • Requirements (done)
  • Implementation

– Firm Timers (next)
– Fine-Grained Preemptibility
– CPU Scheduling

  • Evaluation
  • Conclusions

Periodic Timers

  • Commodity OSes implement timing with periodic timers.

– Ex: on Intel x86, interrupts generated with the Programmable Interval Timer (PIT)
– Ex: period is 10 ms on Linux, thus that is the max timer latency

  • Can reduce latency by reducing the period, but that adds more interrupt overhead
  • Instead, move to one-shot timer
  • Ex: two tasks, period 5 and 7 ms, timer period 1 ms, 35 ms running time

– Periodic: 35 interrupts generated
– One-shot: 11 interrupts generated (5, 7, 10, 14 …)
– Plus, one-shot timer reduces timer latency
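
The interrupt counts in the example above can be checked with a short sketch. The function names are made up for illustration; the only inputs are the numbers on the slide (task periods of 5 and 7 ms, a 1 ms periodic tick, 35 ms of running time).

```python
def periodic_interrupts(tick_ms: int, run_ms: int) -> int:
    """A periodic timer fires once per tick, whether or not any task is due."""
    return run_ms // tick_ms


def one_shot_interrupts(periods_ms: list[int], run_ms: int) -> list[int]:
    """Return the sorted expiry times at which a one-shot timer must fire.

    Tasks whose expiries coincide (here, at 35 ms) share a single interrupt.
    """
    expiries = set()
    for period in periods_ms:
        expiries.update(range(period, run_ms + 1, period))
    return sorted(expiries)


if __name__ == "__main__":
    fires = one_shot_interrupts([5, 7], run_ms=35)
    print(periodic_interrupts(1, 35))  # 35
    print(len(fires), fires[:4])       # 11 [5, 7, 10, 14]
```

The one-shot timer only fires when some task actually needs to wake up, which is where the drop from 35 to 11 interrupts comes from.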


Firm Timer Design

  • One-shot timer costs: timer reprogramming and fielding timer interrupts

– Reprogramming cost has decreased in modern hardware (P2+)

  • PIT on x86 required slow out instructions over the bus
  • Newer APIC resides on CPU chip

– Thus, the remaining cost is the interrupt cost

  • Reduce by soft timers

– Poll for expired timers at strategic points where a context switch is occurring

  • Ex: system call, interrupt, exception return
  • Two new problems: poll cost and added timer latency
  • Can solve 2nd problem with timer overshoot

– Provides upper bound on latency
– Tradeoff between accuracy and overhead

  • Overshoot of 0 gives hard timers; a large overshoot gives soft timers
  • At 100 MHz, theoretical accuracy of 10 nanoseconds

Firm Timer Implementation

  • Timer queue for each processor, sorted by expiry
  • When timer expires

– Execute callback function for each expired timer
– Reprogram APIC

  • Global overshoot value (but could be done per timer)
  • Accessible through: nanosleep(), pause(), setitimer(), select() and poll()
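
A minimal user-space sketch of the firm-timer idea, under stated assumptions: this is a toy model, not the TSL kernel code. The class and method names (FirmTimerQueue, next_hard_interrupt, run_expired) are hypothetical, and time is just an integer count of microseconds. It keeps one expiry-sorted queue, programs the hard (one-shot) interrupt for expiry plus overshoot, and lets soft-timer check points run anything that has already expired.

```python
import heapq
from typing import Callable, Optional


class FirmTimerQueue:
    """Toy model of a firm-timer queue (hypothetical names, not kernel code)."""

    def __init__(self, overshoot_us: int) -> None:
        self.overshoot_us = overshoot_us  # one global overshoot value
        self._heap: list = []             # entries: (expiry_us, seq, callback)
        self._seq = 0                     # tie-breaker so callbacks are never compared

    def add(self, expiry_us: int, callback: Callable[[], None]) -> None:
        heapq.heappush(self._heap, (expiry_us, self._seq, callback))
        self._seq += 1

    def next_hard_interrupt(self) -> Optional[int]:
        """Time the one-shot hardware timer would be programmed for."""
        if not self._heap:
            return None
        return self._heap[0][0] + self.overshoot_us

    def run_expired(self, now_us: int) -> int:
        """Run callbacks for every timer with expiry <= now; return how many ran.

        In this model the same routine serves soft-timer check points
        (system call, interrupt, exception return) and the hard-timer handler.
        """
        fired = 0
        while self._heap and self._heap[0][0] <= now_us:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            fired += 1
        return fired
```

If a soft check at time 120 finds the timer that expired at 100, the hard interrupt planned for 150 (with a 50 microsecond overshoot) is never needed; if no check happens in that window, the hard interrupt bounds the latency at the overshoot value.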

Outline

  • Introduction (done)
  • Related Work (done)
  • Requirements (done)
  • Implementation

– Firm Timers (done)
– Fine-Grained Preemptibility (next)
– CPU Scheduling

  • Evaluation
  • Conclusions

Reasons Scheduler Cannot Run

  • Interrupts disabled

– Hopefully, short

  • Another thread in critical region
  • Commodity OSes have no preemption for entire kernel period

– Ex: when interrupt fires or for duration of system call
– Unless known it will be long (ex: disk I/O)
– Preemption latency under Linux can be 30 ms

Enabling More Preemption

1) Add more preemption points

– Must be done manually

2) Allow preemption anytime not using shared data structures

– Protect shared structures with locks
– Can still result in long latencies

  • Combining 1) and 2) works best

– (Done by Robert Love [11])
– (Authors evaluated in [1])

Outline

  • Introduction (done)
  • Related Work (done)
  • Requirements (done)
  • Implementation

– Firm Timers (done)
– Fine-Grained Preemptibility (done)
– CPU Scheduling (next)

  • Evaluation
  • Conclusions

CPU Scheduling

  • Priority CPU scheduling is simple, POSIX compliant

– But assumes applications are well-behaved

  • So, combine with proportion-period on top to give protection

Proportion-Period CPU Scheduling

  • For a single independent task, assign it the highest priority

– Misbehaving task can consume “too much”
– Use temporal protection

  • Proportion-period provides this by allocating a fixed CPU amount each period

– Task executes as “real-time” (highest priority) for time Q every T
– Period determined by application requirements (Ex: 30 ms for video)

  • Implemented using Earliest Deadline First (EDF)
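
To make "time Q every T, implemented with EDF" concrete, here is a small discrete-time simulation. It is a sketch under simplifying assumptions (integer ticks, budget refilled at the start of each period, unused budget dropped), not the TSL scheduler; the Task fields and the simulate_edf name are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    q: int              # reserved execution time per period (Q), in ticks
    t: int              # period (T), in ticks
    remaining: int = 0  # budget left in the current period
    deadline: int = 0   # end of the current period


def simulate_edf(tasks: list[Task], ticks: int) -> list[str]:
    """Return which task (or 'idle') ran in each tick."""
    timeline = []
    for now in range(ticks):
        for task in tasks:
            if now % task.t == 0:  # new period: refill the budget
                task.remaining = task.q
                task.deadline = now + task.t
        ready = [task for task in tasks if task.remaining > 0]
        if ready:
            # EDF: run the ready task whose deadline is nearest.
            current = min(ready, key=lambda task: task.deadline)
            current.remaining -= 1
            timeline.append(current.name)
        else:
            timeline.append("idle")
    return timeline


if __name__ == "__main__":
    tasks = [Task("A", q=2, t=5), Task("B", q=1, t=4)]
    print("".join(name[0] for name in simulate_edf(tasks, 20)))
```

With Q/T of 2/5 and 1/4 the total reservation is 65%, so every task receives its full Q in every period and the remaining ticks are left over for non-real-time work. A task that wants more than its Q simply stops being eligible until its next period, which is the temporal protection the slide refers to.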

Priority CPU Scheduling

  • Priority inversion occurs when an application has multiple tasks that are interdependent

– Example: Video application uses X
– Video has the highest priority since it is time-sensitive
– Sends frame to X server and blocks
– X server may be preempted by another medium-priority task, hence delaying the Video client

  • To solve, use highest-locking priority (HLP) [19], in which a task inherits priority when using a shared resource

– Example: display is a shared resource, so X server gets the highest priority of its blocking clients
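
The X server example can be sketched as follows. This models only what the slide describes (the resource owner runs at the highest priority among itself and its blocked clients); it is not the protocol definition from [19], and the Resource class, pick_next function, and the priority numbers are all hypothetical. Larger numbers mean higher priority.

```python
class Resource:
    """Shared resource whose owner inherits the priority of blocked clients."""

    def __init__(self, owner_name: str, owner_base_priority: int) -> None:
        self.owner_name = owner_name
        self.owner_base_priority = owner_base_priority
        self.blocked_client_priorities: list[int] = []

    def block_client(self, priority: int) -> None:
        self.blocked_client_priorities.append(priority)

    def unblock_client(self, priority: int) -> None:
        self.blocked_client_priorities.remove(priority)

    def effective_owner_priority(self) -> int:
        return max([self.owner_base_priority, *self.blocked_client_priorities])


def pick_next(runnable: dict[str, int]) -> str:
    """Priority scheduler: run the runnable task with the highest priority."""
    return max(runnable, key=runnable.get)


if __name__ == "__main__":
    display = Resource("X server", owner_base_priority=10)
    # No inheritance yet: the medium-priority task wins over the X server.
    print(pick_next({"X server": display.effective_owner_priority(), "medium": 50}))
    display.block_client(90)  # the video client sends a frame and blocks
    print(pick_next({"X server": display.effective_owner_priority(), "medium": 50}))
```

Once the video client blocks on the display, the X server runs at priority 90 and the medium-priority task can no longer delay the frame.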

Outline

  • Introduction (done)
  • Related Work (done)
  • Requirements (done)
  • Implementation (done)
  • Evaluation (next)
  • Conclusions

Evaluation

1) Behavior of time-sensitive applications running on TSL
2) The overheads of TSL

  • Setup:

– Software

  • Linux 2.4.16
  • Robert Love’s lock-breaking preemptible kernel patch
  • Proportion-period scheduler

– Hardware

  • 1.5 GHz Intel P4 with 512 MB RAM

Latency in Micro Benchmarks

  • Test low-level components of kernel latency: timer, preemption and scheduling

– Time-sensitive process that sleeps for a specified amount of time (using nanosleep())
– Results: 10 ms in standard Linux, a few microseconds in TSL

  • Test preemption latency under loads

– Results: Linux worst case is 100 ms (when copying data from kernel to user space), but typically less than 10 ms and hidden by timer latency. TSL is 1 ms. (Result details in [1])
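
The sleep-based micro benchmark can be approximated from user space. The sketch below is an assumption about the general shape of such a test, not the authors' benchmark: it uses Python's time.sleep() in place of a direct nanosleep() call, and the function name is invented. The numbers it prints depend entirely on the machine and kernel it runs on.

```python
import time


def measure_sleep_latency_us(requested_s: float, samples: int) -> list[float]:
    """Return, per sample, how much longer than requested each sleep took (microseconds)."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_s)
        elapsed = time.perf_counter() - start
        latencies.append((elapsed - requested_s) * 1e6)
    return latencies


if __name__ == "__main__":
    results = measure_sleep_latency_us(requested_s=0.001, samples=100)
    print(f"max wakeup latency: {max(results):.0f} us")
```

On a kernel with a 10 ms periodic tick, a 1 ms request of this kind would be expected to overshoot by up to a full tick, which is the effect the slide reports for standard Linux.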


Latency in Real Applications

  • Tested two applications:

– mplayer: an open-source audio/video player
– Proportion-period scheduler: a kernel-level “application”

Mplayer Details

  • Synchronizes audio and video using time-stamps
  • Audio card used as timing source
  • When video frame decoded, time stamp compared with audio clock.

– If late, then play
– If early, then sleep for the time difference, then play

  • If kernel is not responsive or has coarse timing, there will be poor audio/video synch and high inter-frame display jitter
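
The play-or-sleep decision described above can be written as two small functions. This is a sketch of the logic on the slide, not mplayer's source; the function names and the wakeup_latency_s parameter (how late the kernel wakes the player after its sleep) are assumptions made for illustration.

```python
def frame_delay_s(frame_timestamp_s: float, audio_clock_s: float) -> float:
    """How long to sleep before displaying a decoded frame (0.0 if already late)."""
    return max(0.0, frame_timestamp_s - audio_clock_s)


def display_skew_s(frame_timestamp_s: float, audio_clock_s: float,
                   wakeup_latency_s: float) -> float:
    """Audio/video skew: actual display time minus the frame's time stamp."""
    delay = frame_delay_s(frame_timestamp_s, audio_clock_s)
    shown_at = audio_clock_s + delay + wakeup_latency_s
    return shown_at - frame_timestamp_s


if __name__ == "__main__":
    # Frame is 40 ms early; the kernel wakes the player 10 ms late.
    print(display_skew_s(1.040, 1.000, wakeup_latency_s=0.010))
```

For an early frame the skew equals the wakeup latency, so coarse timers or a non-preemptible kernel show up directly as audio/video skew.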

Testing MPlayer

  • Compare Linux with TSL under:

– Non-kernel CPU load: run user-level stress test
– Kernel CPU load: large (8 MB) memory buffer copied to a file (one write() call), 90% in kernel mode
– File-system load: large directory (Linux source, 13000 files, 180 MB data, ext2) copied (via DMA) recursively and flushed

  • For each test, run mplayer for 100 seconds at real-time priority

Non-kernel CPU Load : Linux

  • 5 ms to 50 ms skew when X server runs at normal priority

(With X server at real-time priority: 250 microseconds (note y-axis)) (This config used for all others)

Non-Kernel CPU Load : TSL

Kernel CPU Load : Linux

(90 msec for Linux, since done in non-preemptible section)


Kernel CPU Load : TSL

(Skew improves to less than 400 microseconds)

File System Load : Linux

(Skew often low, but as high as 120 msec)

File System Load : TSL

(Skew less than 500 microseconds, often lower)

Comparison with Real-Time Kernel

  • Linux-SRT [6] includes finer-grained timers and a reservation scheduler

– (See figure 5a, 5b, 5c)

  • Non-kernel CPU load: skew typically less than 2 ms, but as high as 7 ms (compare w/ TSL of 250 microsec)
  • Kernel CPU load: worst case was 60 ms (compare w/ TSL of 400 microsec)
  • File-system load: worst case was 30 ms (compare w/ TSL of 500 microsec)
  • Shows real-time scheduling and more precise timers are insufficient. A responsive kernel is also required.

Non-Kernel CPU Load : TSL

(Much lower, but can still be 35 msec)

Proportion-Period Scheduler

  • Simultaneously ran 2 time-sensitive apps with proportions of 40% and 20% and periods of 8192 microsec and 512 microsec
  • Each process reads the time via gettimeofday() and records it in an array
  • Measure performance by comparing the differences between array entries with the period
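
The measurement described above reduces to comparing successive time stamps with the period. A minimal sketch, assuming time stamps are integer microseconds; the function name and the sample values are invented for illustration and are not measured data.

```python
def max_deviation_us(timestamps_us: list[int], period_us: int) -> int:
    """Largest absolute difference between an observed wakeup interval and the period."""
    if len(timestamps_us) < 2:
        raise ValueError("need at least two time stamps")
    intervals = [later - earlier
                 for earlier, later in zip(timestamps_us, timestamps_us[1:])]
    return max(abs(interval - period_us) for interval in intervals)


if __name__ == "__main__":
    # Made-up wakeup times for a task with an 8192 microsecond period.
    recorded = [0, 8192, 16400, 24576]
    print(max_deviation_us(recorded, period_us=8192))  # 16
```

The maximum, rather than the mean, is the interesting number here because it bounds how late the task can be in any period.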


Maximum Deviation

Deviations low. Higher when load is high. Maximum gives you bounds. Example: soft modem needs CPU every 4 to 16 ms, so could be supported.

System Overhead

  • Costs of executing code at newly inserted preemption points
  • Costs of executing firm timers

Cost of Preemption

  • Memory access test (sequentially access 128 MB array), fork test (create 512 processes) and file-system access test (copy 2 MB buffers to 8 MB file)

– Designed in [1], should be worst case

  • Tests hit additional preemption checks
  • Measure ratio of completion time under TSL / Linux
  • Result (overhead): memory 0.42% ± 0.18%, fork 0.53% ± 0.06%, file system had no overhead

Firm Timers

  • Firm timers use hard and soft timers. Costs:

– Hard timers only: interrupt handling and cache pollution
– Hard and soft timers in common: manipulating timers in the queue and executing preemption for the expired thread
– Soft timers only: checking for expired timers

Firm Timers : Setup

  • Timer process: a time-sensitive periodic task that wakes up via setitimer() call, measures time, goes to sleep
  • Throughput process: povray, a ray-tracing program rendering the skyvase benchmark; measure elapsed time
  • Run timer with 10 ms period since that is supported by Linux

Firm Timer Overhead

(Different overshoot values. Run 8 times w/ 95% confidence intervals) (Only small decrease in overhead with larger overshoot)


Firm Timer Overhead

(Larger decrease in overhead since more timers) (Linux slower with 500 since it synchronizes 500 procs. Artifact of setup)

Firm Timer Overhead High Frequencies

(Compare with hard timers only since Linux cannot do 1 ms)

Discussion

  • Firm timers have lower overhead when soft-timer checks find timers
  • Firm timers have higher overhead when soft-timer checks find nothing and the timer goes off

– From their work, firm timers have lower overhead when more than 2.1% of timer checks find a timer
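
One way to see where a break-even percentage like this can come from is a simple cost model. The model below is an assumption made for illustration, not the authors' analysis: every soft-timer check costs a fixed amount, and every check that finds an expired timer saves one hard interrupt. The cost values in the example are hypothetical placeholders, not measurements from the paper.

```python
def break_even_fraction(check_cost_us: float, interrupt_cost_us: float) -> float:
    """Fraction of checks that must find a timer for soft checks to pay off."""
    return check_cost_us / interrupt_cost_us


def net_saving_us(checks: int, hit_fraction: float,
                  check_cost_us: float, interrupt_cost_us: float) -> float:
    """Interrupt cost avoided by hits minus the cost of performing all checks."""
    saved = checks * hit_fraction * interrupt_cost_us
    spent = checks * check_cost_us
    return saved - spent


if __name__ == "__main__":
    # Hypothetical costs: 0.1 us per check, 5 us per hard interrupt.
    print(break_even_fraction(0.1, 5.0))
    print(net_saving_us(1000, hit_fraction=0.05,
                        check_cost_us=0.1, interrupt_cost_us=5.0))
```

Under this model the threshold is just the ratio of check cost to interrupt cost, so a low break-even point corresponds to checks that are very cheap relative to an interrupt.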

Conclusions

  • TSL can support applications needing fine-grained resource allocation and low latency response

– Firm timers for accurate timing
– Fine-grained kernel preemptibility for improving kernel response
– Proportion-period scheduling for providing precise allocation of tasks

  • Variations of less than 400 microseconds under heavy CPU and file system load

  • Overhead is low