QUESTIONS? David.Snowdon@nicta.com.au http://ertos.nicta.com.au - PowerPoint PPT Presentation

RUN TIME PREDICTION OF PERFORMANCE AND ENERGY WHEN FREQUENCY SCALING David Snowdon, Stefan Petters and Gernot Heiser David.Snowdon@nicta.com.au The imagination driving Australia’s ICT fu


  1. RUN TIME PREDICTION OF PERFORMANCE AND ENERGY WHEN FREQUENCY SCALING
     David Snowdon, Stefan Petters and Gernot Heiser
     David.Snowdon@nicta.com.au
     The imagination driving Australia’s ICT fu

  2. RUN TIME PREDICTION OF PERFORMANCE AND ENERGY WHEN FREQUENCY SCALING
     David Snowdon, Stefan Petters and Gernot Heiser
     David.Snowdon@nicta.com.au
     ➀ Motivation: problems with DVFS
     ➁ Modelling performance and energy
     ➂ Evaluation
     ➃ Future work

  3. MOTIVATION
     ➜ Embedded systems are often restricted by battery life.
     ➜ Total system energy consumption.
     Our work looks at effective DVFS in real systems.

  4. MOTIVATION
     Theory: $E \propto V^2$
     [Plot: normalised total energy vs. CPU frequency (MHz), 50 to 500 MHz]

  5. MOTIVATION
     Practice: (PXA255 based system)
     [Plot: normalised total energy vs. CPU frequency (MHz), 50 to 500 MHz]

  6. MOTIVATION
     Why?:

  7. MOTIVATION
     Why?: Simple models
     ➜ $P \propto fV^2$
     ➜ $T \propto \frac{1}{f}$
     ➜ $V = F(f)$ and $F$ monotonically increasing
     Modern systems aren’t simple!
     ➜ Varying number of switches (workload specific!)
     ➜ Multiple frequency domains
     ➜ Frequency independent (static) power

  8. MOTIVATION
     Why?: Simple models
     ➜ $P \propto fV^2$
     ➜ $T \propto \frac{1}{f}$
     ➜ $V = F(f)$ and $F$ monotonically increasing
     Modern systems aren’t simple!
     ➜ Varying number of switches (workload specific!)
     ➜ Multiple frequency domains
     ➜ Frequency independent (static) power
     We want to be able to deal with these nuances.
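As a short worked step, the "Theory: $E \propto V^2$" claim follows directly from the three simple-model assumptions listed on the slide. This is a sketch of the textbook argument, not a derivation taken from the slides themselves:

```latex
E = P \cdot T \;\propto\; f V^2 \cdot \frac{1}{f} = V^2,
\qquad V = F(f),\ F \text{ monotonically increasing}
\;\Rightarrow\; E \text{ falls as } f \text{ is lowered.}
```

Under these assumptions the lowest frequency always minimises energy, which is the prediction that the measured PXA255 curve contradicts.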

  9. EXECUTION TIME MODEL
     ➜ Simple execution time model: $T \propto \frac{1}{f_{cpu}}$
     ➜ i.e. Constant cycles
     ➜ Problem: Ignores execution time independent of CPU-clock

  10. EXECUTION TIME MODEL
     ➜ Simple execution time model: $T \propto \frac{1}{f_{cpu}}$
     ➜ i.e. Constant cycles
     ➜ Problem: Ignores execution time independent of CPU-clock
     [Plots: normalised cycles vs. CPU frequency (MHz), 50 to 500 MHz, for the bitcnt and gzip benchmarks]
     Implication:
     ➜ Memory-bound performance is less dependent on CPU frequency

  11. EXECUTION TIME MODEL
     ➜ Task: predict the execution time of a workload in an arbitrary system configuration
     ➜ Low overhead, cross-architectural, dynamic applications

  12. EXECUTION TIME MODEL
     ➜ Task: predict the execution time of a workload in an arbitrary system configuration
     ➜ Low overhead, cross-architectural, dynamic applications
     $T = \frac{C_{cpu}}{f_{cpu}} + \frac{C_{bus}}{f_{bus}} + \frac{C_{mem}}{f_{mem}} + \frac{C_{io}}{f_{io}} + \dots$

  13. EXECUTION TIME MODEL
     ➜ Task: predict the execution time of a workload in an arbitrary system configuration
     ➜ Low overhead, cross-architectural, dynamic applications
     $T = \frac{C_{cpu}}{f_{cpu}} + \frac{C_{bus}}{f_{bus}} + \frac{C_{mem}}{f_{mem}} + \frac{C_{io}}{f_{io}} + \dots$
     $C_x$: characterise a priori, or online using performance counters
     $C_{bus} = \alpha_1 PMC_1 + \alpha_2 PMC_2 + \dots$
     $C_{mem} = \beta_1 PMC_1 + \beta_2 PMC_2 + \dots$
     ($C_{cpu}$ inferred from the other results)

  14. EXECUTION TIME MODEL
     ➜ Task: predict the execution time of a workload in an arbitrary system configuration
     ➜ Low overhead, cross-architectural, dynamic applications
     $T = \frac{C_{cpu}}{f_{cpu}} + \frac{C_{bus}}{f_{bus}} + \frac{C_{mem}}{f_{mem}} + \frac{C_{io}}{f_{io}} + \dots$
     $C_x$: characterise a priori, or online using performance counters
     $C_{bus} = \alpha_1 PMC_1 + \alpha_2 PMC_2 + \dots$
     $C_{mem} = \beta_1 PMC_1 + \beta_2 PMC_2 + \dots$
     ($C_{cpu}$ inferred from the other results)
     ➜ 2-parameter: avg 1.7%, max 7%
     ➜ CPU frequency only: avg 10%, max 36%
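The execution time model above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the workload cycle counts and the two setpoints are all hypothetical, and `infer_cpu_cycles` is one plausible reading of "$C_{cpu}$ inferred from the other results" (subtract the time spent in the other domains from the measured CPU cycle count).

```python
def cycles_from_counters(coeffs, pmcs):
    """C_x = coeff_1 * PMC_1 + coeff_2 * PMC_2 + ... for one clock domain."""
    return sum(c * p for c, p in zip(coeffs, pmcs))


def predict_time(cycles, freqs_hz):
    """T = C_cpu/f_cpu + C_bus/f_bus + C_mem/f_mem + ... in seconds."""
    return sum(cycles[d] / freqs_hz[d] for d in cycles)


def infer_cpu_cycles(total_cpu_cycles, other_cycles, freqs_hz):
    """Assumption: the cycle counter ticks at f_cpu for the whole run, so
    C_cpu is what remains after removing time spent in other domains."""
    stall_s = sum(other_cycles[d] / freqs_hz[d] for d in other_cycles)
    return total_cpu_cycles - stall_s * freqs_hz["cpu"]


# Hypothetical workloads: cycles attributed to each clock domain.
cpu_bound = {"cpu": 1.0e9, "mem": 0.0}
mem_bound = {"cpu": 2.0e8, "mem": 1.0e8}

# Hypothetical setpoints: only the CPU clock changes.
slow = {"cpu": 100e6, "mem": 100e6}
fast = {"cpu": 400e6, "mem": 100e6}

for name, workload in (("cpu_bound", cpu_bound), ("mem_bound", mem_bound)):
    speedup = predict_time(workload, slow) / predict_time(workload, fast)
    print(name, round(speedup, 2))
```

With these made-up numbers the CPU-bound workload speeds up by the full factor of 4, while the memory-bound one speeds up by only 2, which is the behaviour the bitcnt and gzip plots illustrate.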

  15. POWER MODEL
     ➜ Simple CMOS model: $P \propto fV^2$
     Problems:
     ➜ System power
     ➜ Static power and leakage
     ➜ Multiple frequency/voltage domains
     ➜ Temperature dependence
     ➜ Conversion inefficiencies

  16. POWER MODEL
     ➜ Simple CMOS model: $P \propto fV^2$
     Problems:
     ➜ System power
     ➜ Static power and leakage
     ➜ Multiple frequency/voltage domains
     ➜ Temperature dependence
     ➜ Conversion inefficiencies
     A (slightly more) realistic model:
     $P = \sum_{n=0}^{N} C_n f_n V_n^2 + P_{static}$
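A minimal sketch of the multi-domain power model, assuming each domain is described by a `(C_n, f_n, V_n)` tuple. The function name and the example capacitances, frequencies and voltages are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def system_power(domains, p_static):
    """P = sum_n C_n * f_n * V_n^2 + P_static, in watts.

    domains: iterable of (c_farads, f_hz, v_volts) tuples, one per
    frequency/voltage domain (hypothetical representation).
    """
    return sum(c * f * v ** 2 for c, f, v in domains) + p_static


# Hypothetical two-domain system: a CPU core and a memory bus.
example_domains = [
    (1.0e-9, 400e6, 1.3),   # core: 0.4 * 1.69  = 0.676 W
    (0.5e-9, 100e6, 3.3),   # bus:  0.05 * 10.89 = 0.5445 W
]
example_power = system_power(example_domains, p_static=0.2)
```

Because the bus term does not depend on the core voltage, lowering the core setpoint alone cannot reduce it, which is one reason the single-term CMOS model misjudges total system power.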

  17. POWER MODEL
     The interaction of run-time and static power:
     ➜ Dynamic energy increases as frequency increases
     ➜ Static energy decreases as frequency increases
     $E_{total} = P_{dyn} \Delta t + P_{static} \Delta t$
     [Plot: energy vs. CPU frequency, showing $E_{total}(f)$, $E_{dyn}(f)$ and $E_{static}(f)$]
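The trade-off between dynamic and static energy can be illustrated with a small sweep over setpoints. Everything numeric here is an assumption made for the sketch: the effective capacitance, static power, workload cycle counts and the frequency/voltage table are invented, and dynamic power is taken as a single $C f V^2$ term.

```python
C_EFF = 1.0e-9      # hypothetical effective switched capacitance (F)
P_STATIC = 0.3      # hypothetical static power (W)
C_CPU = 4.0e8       # hypothetical CPU-domain cycles for the workload
C_MEM = 1.0e8       # hypothetical memory-domain cycles
F_MEM = 100e6       # memory clock held fixed (Hz)

# Hypothetical (f_cpu in Hz, V_cpu in volts) setpoints.
SETPOINTS = [(100e6, 1.0), (200e6, 1.0), (300e6, 1.1), (400e6, 1.3)]


def energy_breakdown(f_cpu, v_cpu):
    """Return (E_dyn, E_static, E_total) in joules for one setpoint."""
    dt = C_CPU / f_cpu + C_MEM / F_MEM          # run time from the time model
    e_dyn = C_EFF * f_cpu * v_cpu ** 2 * dt     # P_dyn * dt
    e_static = P_STATIC * dt                    # P_static * dt
    return e_dyn, e_static, e_dyn + e_static


# The setpoint with the lowest total energy is neither the slowest
# nor the fastest one for these numbers.
best_f, best_v = min(SETPOINTS, key=lambda s: energy_breakdown(*s)[2])
```

For these values the minimum falls at the 200 MHz setpoint: below it the static term dominates because the run takes longer, and above it the rising voltage makes the dynamic term dominate.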

  18. POWER MODEL
     Power/Energy model principles:
     ➜ Events each use an amount of energy
     ➜ An event may use energy in more than one voltage domain
     For our system:
     $E_{events} = V_{cpu}^2 (\alpha_0 PMC_0 + \dots + \alpha_m PMC_m) + \beta_0 PMC_0 + \dots + \beta_m PMC_m$

  19. POWER MODEL
     Power/Energy model principles:
     ➜ Events each use an amount of energy
     ➜ An event may use energy in more than one voltage domain
     ➜ Clock cycles count as events
     For our system:
     $E_{events} = V_{cpu}^2 (\alpha_0 PMC_0 + \dots + \alpha_m PMC_m) + \beta_0 PMC_0 + \dots + \beta_m PMC_m$
     $E_{freqs} = V_{cpu}^2 (\gamma_1 f_{cpu} + \gamma_2 f_{bus} + \gamma_3 f_{mem}) \Delta t + (\gamma_4 f_{cpu} + \gamma_5 f_{bus} + \gamma_6 f_{mem}) \Delta t$

  20. POWER MODEL
     Power/Energy model principles:
     ➜ Events each use an amount of energy
     ➜ An event may use energy in more than one voltage domain
     ➜ Clock cycles count as events
     ➜ Static power models power not related to events or voltages.
     ➜ Constant IO power for the benchmarks tested.
     For our system:
     $E_{events} = V_{cpu}^2 (\alpha_0 PMC_0 + \dots + \alpha_m PMC_m) + \beta_0 PMC_0 + \dots + \beta_m PMC_m$
     $E_{freqs} = V_{cpu}^2 (\gamma_1 f_{cpu} + \gamma_2 f_{bus} + \gamma_3 f_{mem}) \Delta t + (\gamma_4 f_{cpu} + \gamma_5 f_{bus} + \gamma_6 f_{mem}) \Delta t$
     $E_{static} = P_{static} \Delta t$
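The three energy terms can be combined into one predictor. The sketch below assumes the total is simply their sum, which the slides imply but do not state outright; the function and parameter names are hypothetical, and the coefficient values in the example are arbitrary rather than fitted.

```python
def dot(coeffs, values):
    """Plain dot product of two equal-length sequences."""
    return sum(c * v for c, v in zip(coeffs, values))


def predict_energy(v_cpu, pmcs, alpha, beta, freqs, gamma_v, gamma,
                   p_static, dt):
    """E = E_events + E_freqs + E_static (assumed to add up).

    pmcs:    performance counter values PMC_0..PMC_m
    alpha:   per-event coefficients scaled by V_cpu^2
    beta:    per-event coefficients independent of V_cpu
    freqs:   (f_cpu, f_bus, f_mem)
    gamma_v: (gamma_1, gamma_2, gamma_3), scaled by V_cpu^2
    gamma:   (gamma_4, gamma_5, gamma_6), independent of V_cpu
    """
    v2 = v_cpu ** 2
    e_events = v2 * dot(alpha, pmcs) + dot(beta, pmcs)
    e_freqs = (v2 * dot(gamma_v, freqs) + dot(gamma, freqs)) * dt
    e_static = p_static * dt
    return e_events + e_freqs + e_static


# Arbitrary illustrative values, chosen for easy mental arithmetic.
example_energy = predict_energy(
    v_cpu=2.0, pmcs=[10, 20], alpha=[1.0, 0.5], beta=[0.1, 0.2],
    freqs=[3.0, 2.0, 1.0], gamma_v=[1.0, 1.0, 1.0],
    gamma=[0.5, 0.5, 0.5], p_static=1.5, dt=2.0,
)
```

Here the event term contributes 85, the clock term 54 and the static term 3, giving 142 in the example's arbitrary units.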

  21. POWER MODEL
     Parameter selection:
     ➜ Systematically picking the best model for N counters
     ➜ Least-squares regression finds the coefficients
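One way to read "systematically picking the best model for N counters" is an exhaustive search: fit every subset of N counters by least squares and keep the one with the smallest residual. The slides do not say that this exact procedure was used, so treat the following as a hedged, standard-library-only sketch; the counter names and sample data are invented, and a subset with collinear columns would make the solver divide by zero.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Fit y ~ rows . coeffs via the normal equations (X^T X) c = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def best_subset(samples, y, n_counters):
    """Try every subset of n_counters; return (names, coeffs, sse)."""
    names = sorted(samples[0])
    best = None
    for subset in combinations(names, n_counters):
        rows = [[1.0] + [s[name] for name in subset] for s in samples]
        coeffs = least_squares(rows, y)          # coeffs[0] is the intercept
        sse = sum((t - sum(c * v for c, v in zip(coeffs, r))) ** 2
                  for r, t in zip(rows, y))
        if best is None or sse < best[2]:
            best = (subset, coeffs, sse)
    return best


# Invented data: energy depends only on the hypothetical "dcache_miss" counter.
samples = [
    {"dcache_miss": 1, "instr": 5, "tlb_miss": 2},
    {"dcache_miss": 2, "instr": 3, "tlb_miss": 7},
    {"dcache_miss": 3, "instr": 8, "tlb_miss": 1},
    {"dcache_miss": 4, "instr": 1, "tlb_miss": 6},
    {"dcache_miss": 5, "instr": 4, "tlb_miss": 3},
]
energy = [2 + 3 * s["dcache_miss"] for s in samples]
chosen, coeffs, sse = best_subset(samples, energy, n_counters=1)
```

On hardware with only two performance counters, as on the evaluated platform, a search of this kind is what decides which events are worth programming into them.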

  22. EVALUATION
     ➜ Typical embedded platform (PLEB 2, XScale based)
     ➜ Cycle counter, 2 performance counters, 13 events

  23. EVALUATION
     ➜ Typical embedded platform (PLEB 2, XScale based)
     ➜ Cycle counter, 2 performance counters, 13 events
     ➜ 37 benchmarks run to completion at each setpoint for all frequency settings
     ➜ 22 frequency setpoints with different $f_{cpu}$, $f_{bus}$ and $f_{mem}$
     ➜ Voltage varied to three settings for each frequency

  24. EVALUATION
     ➜ Typical embedded platform (PLEB 2, XScale based)
     ➜ Cycle counter, 2 performance counters, 13 events
     ➜ 37 benchmarks run to completion at each setpoint for all frequency settings
     ➜ 22 frequency setpoints with different $f_{cpu}$, $f_{bus}$ and $f_{mem}$
     ➜ Voltage varied to three settings for each frequency
     ➜ Measurements: Cycles, Frequencies, Performance counters, Energy
     ➜ Benchmarks were partitioned for calibration and validation
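The calibration/validation split and the average/maximum error figures can be sketched as follows. The slides do not say how the 37 benchmarks were divided, so the even random split, the fixed seed and the benchmark names here are assumptions made purely for illustration.

```python
import random


def partition(benchmarks, calibration_fraction=0.5, seed=0):
    """Split benchmark names into (calibration, validation) lists.

    The 50/50 ratio and the seeded shuffle are assumptions of this sketch.
    """
    names = sorted(benchmarks)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * calibration_fraction)
    return names[:cut], names[cut:]


def error_summary(predicted, measured):
    """Return (average, maximum) relative error over paired samples."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors), max(errors)


# 37 placeholder benchmark names, matching the count on the slide.
calibration, validation = partition([f"bench{i}" for i in range(37)])

# Invented predictions and measurements for two validation runs.
avg_err, max_err = error_summary([1.1, 2.0], [1.0, 2.0])
```

Coefficients would be fitted on the calibration set only, and the reported average and maximum errors computed on the validation set, so that the figures reflect workloads the model has not seen.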

  25. EVALUATION
     [Figure: R² values and candidate model terms]
     R²: 0.92, 0.96, 0.98, 0.98, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 1, 1, 1, 1, 1, 1, 1, 1
     Terms: (Intercept), v*v*fcpu, v*fcpu, fcpu, v*v*fbus, fbus, v*v*fmem, fmem, v*v*PMC0/t through v*v*PMC13/t, PMC0/t through PMC13/t
