Evaluation of resource arbitration methods for multi-core real-time systems



SLIDE 1

Computer Science 12 Design Automation for Embedded Systems

Evaluation of resource arbitration methods for multi-core real-time systems

Timon Kelter, Tim Harde, Peter Marwedel

Department of Computer Science, TU Dortmund, Germany

Heiko Falk

Institute of Embedded Systems / Real-Time Systems, Ulm University, Germany

Paper presentation at WCET Workshop 2013, Paris

SLIDE 2

Kelter, Harde, Marwedel and Falk: “Evaluation of resource arbitration methods […]”, Computer Science 12 Design Automation for Embedded Systems

Predictability for Multicore-Platforms

Timing influence of parallel task execution

  • Major problem: Contention on shared resources
  • Option 1: Reduce sharing / duplicate resources

→ Wastes economic potential; some communication is unavoidable

  • Option 2: Provide deterministic and analyzable arbitration

→ Needs new analysis methods

[Figure: timeline labeled basic block runtime, memory access, arbitration delay]

→ Local bounds for arbitration delay of individual accesses

SLIDE 4

Outline

1) System model
2) Arbitration methods
3) Analysis framework
4) Benchmark setup
5) Evaluation
6) Summary

SLIDE 5

System model

[Figure: block diagram of the platform]

  • Core 1 … Core N, each with: ARM7TDMI core, I-SPM, D-SPM, I-Cache, D-Cache, bridge
  • Shared bus with configurable arbitration
  • Behind the shared bus: BootROM, L2 I-Cache, L2 D-Cache, I-RAM (cached), I-RAM (uncached), D-RAM (cached), D-RAM (uncached)

Implemented in CoMET/Virtualizer [8] → Flexible experimentation platform

SLIDE 6

Bus arbitration methods

  • “Classic” methods (Utilization)
      • Fixed Priority (PRIO): priority value $p_i$ for each core $i$ (non-preemptable access)
      • Fair (Round-Robin) (FAIR)
  • Time-triggered methods (Predictability)
      • Time-Division Multiple Access (TDMA): $n$ slots of length $l$, owner core $o_j$ for each slot $j$
      • Priority Division (PD): $n$ slots of length $l$, priorities $p_{ij}$ for core $i$ in slot $j$

[Figure: example schedules with four slots. TDMA: $o_1 = 1$, $o_2 = 2$, $o_3 = 3$, $o_4 = 4$. PD: $p_{11} = \max$, $p_{22} = \max$, $p_{33} = \max$, $p_{44} = \max$]

→ Comparison of achievable WCET, ACET and bus utilization
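The four policies can be sketched as grant functions that pick one core among the currently requesting ones. This is a minimal illustration, not the platform's implementation: cores and slots are numbered from 0, a larger priority value is assumed to win, ties are not resolved, and the function names are hypothetical. Because accesses are non-preemptable, such a function would only be consulted while the bus is idle.

```python
from typing import Optional, Sequence, Set


def slot_index(cycle: int, n_slots: int, l: int) -> int:
    """Index j of the slot that is active in the given cycle."""
    return (cycle % (n_slots * l)) // l


def grant_prio(requests: Set[int], prio: Sequence[int]) -> Optional[int]:
    """PRIO: the requesting core with the highest fixed priority wins."""
    return max(requests, key=lambda c: prio[c], default=None)


def grant_fair(requests: Set[int], last: int, n: int) -> Optional[int]:
    """FAIR: first requester after the core that was served last."""
    for step in range(1, n + 1):
        core = (last + step) % n
        if core in requests:
            return core
    return None


def grant_tdma(requests: Set[int], cycle: int,
               owner: Sequence[int], l: int) -> Optional[int]:
    """TDMA: only the owner of the active slot may be granted."""
    core = owner[slot_index(cycle, len(owner), l)]
    return core if core in requests else None


def grant_pd(requests: Set[int], cycle: int,
             p: Sequence[Sequence[int]], l: int) -> Optional[int]:
    """PD: p[i][j] is the priority of core i in slot j; highest wins."""
    j = slot_index(cycle, len(p[0]), l)
    return max(requests, key=lambda c: p[c][j], default=None)
```

The difference between TDMA and PD shows up when the slot owner is idle: `grant_tdma` leaves the bus unused, while `grant_pd` hands the slot to another requesting core.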

SLIDE 8

Memory hierarchy analysis options

  • Employed approach: Generalized combined analysis ([4], aiT)

[Figure: analysis phases: CFG Reconstruction, Value Analysis, Combined Microarchitectural Analysis, Path Analysis]

  • Per-core CFG-based data flow analysis
  • Memory accesses are handled by hierarchical state update
  • Each stage may forward or handle (e.g. guaranteed cache hit)
  • Timing information is exchanged along with general access information

[Figure: state update hierarchy for Core 1: Pipeline State Update, L1 Cache State Update, Shared Bus State Update, L2 Cache State Update, L2 Cache State Merge]
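The "forward or handle" idea can be illustrated with a chain of stages that pass an access downwards until one of them serves it, accumulating time on the way. This is a simplified concrete sketch rather than the abstract analysis itself; the `Stage` class, the `resolve` function and all latency values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    latency: int                    # cycles this stage adds (hypothetical)
    handles: Callable[[int], bool]  # True if the stage serves the address


def resolve(stages: List[Stage], address: int) -> Tuple[str, int]:
    """Walk the hierarchy until a stage handles the access.

    Returns the handling stage and the time accumulated up to it, so the
    timing information travels together with the access itself.
    """
    elapsed = 0
    for stage in stages:
        elapsed += stage.latency
        if stage.handles(address):
            return stage.name, elapsed
    raise ValueError("no stage handled the access")


l1_hits = {0x100}  # addresses assumed to be guaranteed L1 hits
hierarchy = [
    Stage("L1 cache", 1, lambda a: a in l1_hits),
    Stage("shared bus", 2, lambda a: False),  # the bus only forwards
    Stage("L2 cache", 3, lambda a: True),
]
```

Here `resolve(hierarchy, 0x100)` stops at the L1 cache, while any other address is forwarded over the bus to the L2 cache and picks up the latency of every stage it passes.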

SLIDE 9

Shared Bus Analysis

  • What is the “state” for the shared bus?

→ Approximation of the current position in the cyclic schedule

  • Position: Offset from the beginning of the last TDMA period
  • Abstraction: Set of offsets

[Figure: timeline of one TDMA period with the Core 1, Core 2, Core 3 and Core 4 slots at times $x$, $x + 1 \cdot l$, $x + 2 \cdot l$, $x + 3 \cdot l$, $x + 4 \cdot l$; the slot boundaries correspond to the offsets $1 \cdot l$, $2 \cdot l$, $3 \cdot l$]

Abstract bus state: offsets $O_b^{in} \subseteq \{0, \dots, n \cdot l - 1\}$, mapped by the transfer function to $O_b^{out}$
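A small sketch of this abstraction, assuming that slot $j$ belongs to core $j$ and that cores are numbered from 0 (the function names are illustrative only):

```python
from typing import Iterable, Set


def abstract(times: Iterable[int], n: int, l: int) -> Set[int]:
    """Map absolute points in time to offsets within the TDMA period n*l."""
    period = n * l
    return {t % period for t in times}


def owner_of(offset: int, l: int) -> int:
    """Core whose slot contains the offset (slot j is owned by core j)."""
    return offset // l


def join(a: Set[int], b: Set[int]) -> Set[int]:
    """Merge the states of two control-flow paths by set union."""
    return a | b
```

With $n = 4$ and $l = 3$ the times 2, 14 and 26 all collapse to offset 2, which is what keeps the abstract state finite regardless of how long the program runs.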

SLIDE 10

Shared Bus Analysis (TDMA & PD)

  • Transfer function for the shared bus state?
  • Pipeline analysis passes in access $a_i$ with spent time $T_{a_i}$ since $a_{i-1}$
  • Forwarding to next stages yields post-bus runtime $D$

$$O_b^{i+1} = \bigcup_{o \in O_b^i,\ t \in T_{a_i}} \{\Phi_c((o + t) \bmod (n \cdot l))\} \oplus D$$

$$\Phi_c^{TDMA}(o) = \begin{cases} \{o\} & \text{if } o \in \omega_{must} \\ \{\lfloor \omega_{must} \rfloor\} & \text{else} \end{cases}$$

  • First case: grant immediately
  • Second case: wait for grant window begin

$$\Phi_c^{PD}(o) = \begin{cases} \{o\} \oplus \{0, \dots, m_{max} - 1\} & \text{if } o \in \omega_{must} \\ \varphi_c(\omega(o) \to \omega_{must}) \cup \{\lfloor \omega_{must} \rfloor\} & \text{if } \exists\, \omega_{must} \\ \emptyset & \text{else} \end{cases}$$

  • First case: grant, with possible lower prio access
  • Second case: wait for “own” slot, collect “may”-slot offsets
  • Third case: no “own” slot exists → not boundable
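The TDMA case can be sketched in a few lines. The sketch assumes a single grant window per period, given as a Python `range` of offsets (the slides do not spell out how $\omega_{must}$ is computed), and it wraps the result back into the period so that the new state is again a set of offsets. All names are illustrative.

```python
from typing import Iterable, Set


def phi_tdma(o: int, must: range) -> Set[int]:
    """Offset at which a request arriving at offset o is granted."""
    if o in must:
        return {o}        # inside the grant window: grant immediately
    return {must.start}   # otherwise wait for the window to begin


def transfer(offsets: Iterable[int], spent: Iterable[int],
             post_bus: Iterable[int], must: range,
             n: int, l: int) -> Set[int]:
    """Offsets after access a_i, given the offsets after a_(i-1)."""
    period = n * l
    result = set()
    for o in offsets:
        for t in spent:                      # time spent since a_(i-1)
            for g in phi_tdma((o + t) % period, must):
                for d in post_bus:           # post-bus runtime D
                    result.add((g + d) % period)  # wrap-around assumed
    return result
```

For $n = 4$, $l = 3$ and a grant window covering offsets 3 to 5, a request arriving at offset 7 has just missed the window and waits `(3 - 7) % 12`, that is 8 cycles, for the next one.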

SLIDE 15

Pessimistic Analyses (PRIO & FAIR)

  • Local bounds for PRIO & FAIR: Need all parallel access interleavings (parallel analysis)
  • → Revert to worst-case assumptions in per-core analysis
  • Arbitration delay bound function analogous to $\Phi_c$
  • → Transfer & Meet (set union) functions for DFA

$$\Phi_c^{PRIO}(o) = \begin{cases} \{o\} \oplus \{0, \dots, m_{max} - 1\} & \text{if } c \text{ is max prio core} \\ \emptyset & \text{else} \end{cases}$$

Analogous to PD cases

$$\Phi_c^{FAIR}(o) = \{o\} \oplus \{0, \dots, (n - 1) \cdot m_{max} - 1\}$$

Single access from every other core at max
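Both bound functions translate directly into set arithmetic. In this sketch $\oplus$ is taken to be element-wise addition of two sets, $m_{max}$ is passed in as a plain number, and the empty set stands for "no bound available"; the function names are illustrative.

```python
from typing import Iterable, Set


def oplus(a: Iterable[int], b: Iterable[int]) -> Set[int]:
    """Element-wise sum of two sets of offsets."""
    return {x + y for x in a for y in b}


def phi_prio(o: int, is_max_prio: bool, m_max: int) -> Set[int]:
    """PRIO: only the highest-priority core gets a local bound."""
    if is_max_prio:
        return oplus({o}, range(m_max))
    return set()  # empty set: delay cannot be bounded locally


def phi_fair(o: int, n: int, m_max: int) -> Set[int]:
    """FAIR: at most one access from each of the other n - 1 cores."""
    return oplus({o}, range((n - 1) * m_max))
```

For example, with $n = 4$ and $m_{max} = 3$ a request at offset 5 under FAIR may be granted anywhere from offset 5 to offset 13, so the delay bound grows linearly with the number of cores.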

SLIDE 17

Benchmarking Method

  • Standard multicore benchmarks (SPEC, EEMBC, BDTI):
      • Unpredictable behavior of the required OS/middleware software stack
  • Aggregate known single-thread benchmarks (MRTC / UTDSP / MiBench / MediaBench / DSPStone, 110 benchmarks in total)
      • Allocate single-thread task to each single-thread core
      • How to form balanced task sets?
  • Parametrization:
      • Minimal slot length $l = m_{max}$
      • Memory access times: 1 cycle (L1), 3 cycles (L2)
      • Map (only) global variables to shared memory (→ IO devices)

SLIDE 18

WCET Evaluation (Maximum Overestimation)

[Figure: plot of the maximum WCET overestimation, annotated as follows]

  • Higher overestimation due to accesses in other cores' slots
  • Linear increase due to worst-case assumption

SLIDE 19

ACET Evaluation (Baseline: 1-Core, FAIR)

[Figure: plot of the ACET results, annotated as follows]

  • Extremely low overhead for FAIR / PRIO
  • Unacceptable overhead for rising core numbers
  • Scales better than TDMA

SLIDE 20

Total Bus Utilization Results

[Figure: plot of the total bus utilization, annotated as follows]

  • FAIR/PRIO: Almost linear scaling
  • Less steep increase for PD
  • TDMA: Approximately constant!

SLIDE 21

Summary / Future Work

  • Combined state-based analysis framework for shared resources
  • Evaluation of arbitration policies for a configurable multi-core ARM platform
  • TDMA incurs serious ACET overhead with rising core count
  • PD can balance WCET, ACET and resource utilization
  • FAIR/PRIO provide unmatched utilization
  • Extensions:
      • Optimization of TDMA / PD schedules
      • Extension of state-based approach to true parallel analysis
      • Analysis of dependent / cooperative threads

SLIDE 22

References

  • [1] Hermann Kopetz and Günther Bauer. The time-triggered architecture. Proceedings of the IEEE, 91(1):112–126, 2003.
  • [2] Christoph Cullmann, Christian Ferdinand, Gernot Gebhard, Daniel Grund, Claire Maiza, Jan Reineke, Benoît Triquet, Simon Wegener, and Reinhard Wilhelm. Predictability Considerations in the Design of Multi-Core Embedded Systems. Ingénieurs de l’Automobile, 807:36–42, September 2010.
  • [3] Benjamin Lesage, Damien Hardy, and Isabelle Puaut. WCET Analysis of Multi-Level Set-Associative Data Caches. In Proceedings of WCET Workshop 2009.
  • [4] Marc Langenbach, Stephan Thesing, and Reinhold Heckmann. Pipeline Modeling for Timing Analysis. In Proceedings of the 9th International Symposium on Static Analysis (SAS '02), Manuel V. Hermenegildo and German Puebla (Eds.). Springer-Verlag, London, UK, 294–309, 2002.
  • [5] Timon Kelter, Heiko Falk, Peter Marwedel, Sudipta Chattopadhyay, and Abhik Roychoudhury. Bus-Aware Multicore WCET Analysis through TDMA Offset Bounds. In Proceedings of ECRTS 2011, 3–12.
  • [6] Sudipta Chattopadhyay, Lee Kee Chong, Abhik Roychoudhury, Timon Kelter, Peter Marwedel, and Heiko Falk. A Unified WCET Analysis Framework for Multi-core Platforms. IEEE Real-Time and Embedded Technology and Applications Symposium 2012, 99–108.
  • [7] Mingsong Lv, Wang Yi, Nan Guan, and Ge Yu. Combining Abstract Interpretation with Model Checking for Timing Analysis of Multicore Software. In Proceedings of RTSS 2010, 339–349.
  • [8] Synopsys Inc. CoMET system engineering IDE. http://www.synopsys.com/Systems/VirtualPrototyping/Pages/CoMET-METeor.aspx