Contention in Multicore Hardware Shared Resources: Understanding of the State of the Art

SLIDE 1

CONTENTION IN MULTICORE HARDWARE SHARED RESOURCES: UNDERSTANDING OF THE STATE OF THE ART

Gabriel Fernandez1, Jaume Abella2, Eduardo Quiñones2, Christine Rochange3, Tullio Vardanega4 and Francisco J. Cazorla2,4

14th International Workshop on Worst‐Case Execution Time Analysis (WCET 2014)

SLIDE 2

Multicores: benefits and challenges

  • Multicores

– Allow higher “guaranteed performance”

  • Guaranteed as opposed to average‐case

– Interference on execution time and WCET due to contention in the access to HW shared resources

  • Challenges timing analysis
  • Higher impact than in single‐core
  • Contention in multicores has been deeply studied by the research community

– Different approaches taken to contention

  • At different levels of abstraction

– The solution space is difficult to fully understand

SLIDE 3

Motivation of this work

  • Provide a sensible taxonomy of the SoA techniques

– Identifying ‘families’ of techniques
– Singling out representative works for each class

  • Without seeking absolutely exhaustive coverage
  • Review each family

– Seeking overlaps and gaps with others
– Understanding assumptions and challenges of use
– Gauging confidence in WCET bounds and assurance guarantees for industrial use

  • Capture cross‐cutting techniques

SLIDE 4

Taxonomy

[Figure: taxonomy of approaches to handling contention]

  • Handling Contention

– System Centric

  • Time Analysis Frameworks
  • Task Assignment and Scheduling (Contention aware / Contention oblivious)

– WCET Centric

  • Joint Analysis
  • Independent Analysis

– Architecture Centric
– COTS Centric

  • Figure labels: Bottom‐up / Top‐down; Idealistic‐innovative / Practical‐pragmatic

SLIDE 5

System‐centric

  • Handling Contention

– System Centric

  • Time Analysis Frameworks
  • Task Assignment and Scheduling (Contention aware / Contention oblivious)

SLIDE 6

Timing analysis frameworks

  • Assume replicated on‐chip resources

– SW on core suffers no parallel contention

  • Model off‐chip shared resources in isolation

– Provide worst‐case access timing bounds
– Contention captured compositionally: off‐chip contention in the presence of co‐runners

  • TDMA arbiter

– Co‐running tasks do not affect one another’s execution time
– Worst‐case alignment of the requests in the TDMA

  • Dynamic arbiter

– Co‐running tasks do affect one another’s execution time
– Focus on deriving bounds for the number of accesses per task in a given period of time
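
The contrast between the two arbiter types can be sketched in a few lines of Python. This is a minimal illustration, not the model used by any specific framework. The function names, the slot-based TDMA model (one slot of `slot` cycles per core, and a request must fit entirely inside its owner's slot) and the round-robin interference bound are all assumptions made for the example.

```python
def tdma_worst_case_delay(n_cores: int, slot: int, access: int) -> int:
    """Brute-force the worst-case delay (wait + service) of one request.

    Assumed model: core 0 owns cycles [0, slot) of every period of
    n_cores * slot cycles; a request of `access` cycles may only start
    if it completes before the slot ends.
    """
    if not 0 < access <= slot:
        raise ValueError("access must fit in one slot")
    period = n_cores * slot
    worst = 0
    for arrival in range(period):          # every alignment within a period
        if arrival <= slot - access:       # still fits in the current slot
            start = arrival
        else:                              # must wait for the next own slot
            start = period
        worst = max(worst, start + access - arrival)
    return worst


def round_robin_interference_bound(own_accesses: int,
                                   corunner_accesses: list[int],
                                   access: int) -> int:
    """Upper bound on extra cycles under a round-robin (dynamic) arbiter.

    Assumption: each own request waits for at most one request per
    co-runner, and a co-runner cannot interfere more often than it
    accesses the resource in the considered time window.
    """
    return sum(min(own_accesses, other) * access
               for other in corunner_accesses)


# TDMA: the bound depends only on the arbiter configuration.
print(tdma_worst_case_delay(n_cores=4, slot=10, access=4))      # 37
# Dynamic arbiter: the bound depends on the co-runners' access counts.
print(round_robin_interference_bound(100, [20, 500, 0], 4))     # 480
```

Under this assumed TDMA model the brute-force result coincides with the closed form (n_cores − 1) · slot + 2 · access − 1, which contains no term describing the co-runners. The round-robin bound, by contrast, needs their access counts as input.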

SLIDE 7

Task allocation and scheduling

  • Contention oblivious

– The WCET of all tasks is given as input

  • WCET bounds may be determined before decisions are made on task mapping and on scheduling

– Escape circularity in the mutual dependence between WCET analysis and schedulability analysis

  • Contention aware

– Focus on the shared last‐level cache
– Benefit from HW techniques for cache partitioning

  • Or allocate program data to different pages

– Assume partitioned scheduling and augment assignment with colouring
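
Page colouring can be illustrated with a small sketch. It assumes a physically indexed, set-associative last-level cache. The cache geometry, the function names and the even split of colours among cores are hypothetical choices for the example, not taken from the surveyed works.

```python
def num_colours(cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Number of page colours of a physically indexed set-associative cache."""
    way_bytes = cache_bytes // ways        # bytes covered by one way
    return max(1, way_bytes // page_bytes)


def page_colour(phys_addr: int, cache_bytes: int, ways: int,
                page_bytes: int) -> int:
    """Colour of the physical page containing phys_addr."""
    return (phys_addr // page_bytes) % num_colours(cache_bytes, ways, page_bytes)


def assign_colours(n_colours: int, cores: list[str]) -> dict[str, list[int]]:
    """Split the colours evenly among cores (partitioned scheduling assumed)."""
    share = n_colours // len(cores)
    return {core: list(range(i * share, (i + 1) * share))
            for i, core in enumerate(cores)}


# Hypothetical 1 MiB, 16-way last-level cache with 4 KiB pages.
LLC, WAYS, PAGE = 1 << 20, 16, 4096
print(num_colours(LLC, WAYS, PAGE))                 # 16
print(page_colour(0x12345000, LLC, WAYS, PAGE))     # 5
print(assign_colours(16, ["core0", "core1"]))
```

Pages of different colours map to disjoint groups of cache sets, so giving each core its own colours keeps tasks on different cores from evicting one another's lines.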

SLIDE 8

WCET‐centric

  • Handling Contention

– WCET Centric

  • Joint Analysis
  • Independent Analysis

SLIDE 9

Including contention costs in WCETs

  • Stall times integrated in the ILP formulation used to derive WCETs (IPET method)

– Worst‐case memory instruction latencies
– Worst‐case number of L2 cache misses

  • Two philosophies to capture worst cases

– Contextual

  • The set of concurrent threads/tasks is known at analysis time ➙ joint analysis

– Universal

  • Concurrent tasks are unknown ➙ independent analysis
  • Needs hardware/software support
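
One common way to write an IPET formulation with stall times folded into the block costs is sketched below. The symbols are assumptions introduced for illustration: $B$ is the set of basic blocks, $x_i$ the execution count of block $i$, $f_e$ the traversal count of control-flow edge $e$, $c_i$ the contention-free cost of block $i$, $m_i$ its number of accesses to the shared resource, $\ell^{\mathrm{wc}}$ the worst-case access latency, and $n_h$ the iteration bound of the loop with header $h$. Virtual edges into the entry block and out of the exit block are assumed so that flow conservation holds everywhere.

```latex
\begin{align*}
\text{WCET} \;=\; \max \;& \sum_{i \in B} \bigl(c_i + m_i \, \ell^{\mathrm{wc}}\bigr)\, x_i \\
\text{s.t. } & x_{\mathrm{entry}} = 1, \\
& \sum_{e \in \mathrm{in}(i)} f_e \;=\; x_i \;=\; \sum_{e \in \mathrm{out}(i)} f_e
  \quad \forall i \in B, \\
& x_h \;\le\; n_h \cdot \sum_{e \in \mathrm{entry}(h)} f_e
  \quad \text{for every loop header } h, \\
& x_i,\; f_e \in \mathbb{Z}_{\ge 0}.
\end{align*}
```

In a contextual (joint) analysis $\ell^{\mathrm{wc}}$ would be derived from the known co-runners. In a universal (independent) analysis it has to hold for any co-runner, which is where hardware/software support comes in.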

SLIDE 10

Joint analysis of concurrent tasks

  • Approach A

– Iterative computation of interferences

  • Approach B

– Timed automata + model checking

[Figure labels: low‐level analysis of Task A, Task B and Task C; analysis of possible interferences; tasks schedule; WCET of Task A; private resources; shared resources; show: WCET(A) < x]
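
The iterative flavour of joint analysis (Approach A) can be sketched as a fixed-point computation. This is a simplified illustration under stated assumptions, not the algorithm of a particular paper. The `Task` fields, the overlap test on execution windows and the per-co-runner interference bound `min(accesses) * latency` are hypothetical modelling choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    core: int        # core the task is statically mapped to
    start: int       # release time taken from the known schedule
    c_iso: int       # WCET in isolation (from low-level analysis)
    accesses: int    # bound on shared-resource accesses


def joint_wcets(tasks: dict[str, Task], latency: int) -> dict[str, int]:
    """Iterate WCETs until the set of overlapping tasks stops changing."""
    wcet = {name: t.c_iso for name, t in tasks.items()}
    while True:
        updated = {}
        for name, t in tasks.items():
            extra = 0
            for other, u in tasks.items():
                if u.core == t.core:
                    continue                  # same core: no parallel contention
                overlaps = (t.start < u.start + wcet[other]
                            and u.start < t.start + wcet[name])
                if overlaps:
                    extra += min(t.accesses, u.accesses) * latency
            updated[name] = t.c_iso + extra
        if updated == wcet:                   # fixed point reached
            return wcet
        wcet = updated


tasks = {
    "A": Task(core=0, start=0, c_iso=100, accesses=10),
    "B": Task(core=1, start=110, c_iso=50, accesses=5),
    "C": Task(core=2, start=90, c_iso=30, accesses=20),
}
print(joint_wcets(tasks, latency=2))   # {'A': 130, 'B': 70, 'C': 60}
```

In this example Task A does not overlap Task B at first. Interference from Task C stretches A's execution window until it reaches B's release time, so a second iteration is needed. This is the circular dependence that the iteration resolves.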

SLIDE 11

Independent analysis

  • No assumption on the concurrent workload

– Independent of task assignment and scheduling

  • Requires hardware/software support

– To derive worst‐case latencies and worst‐case behaviours
– Examples include

  • Partitioned caches: eliminate impact from concurrent tasks
  • Static bus arbiters: make it possible to derive worst‐case latencies
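
The effect of cache partitioning on analysability can be shown with a toy model of a single cache set under LRU replacement. The class and function names and the access traces are hypothetical. The sketch only illustrates that, with a private partition, the miss count of task A no longer depends on what task B does.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()

    def access(self, tag: str) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)      # evict least recently used
        self.lines[tag] = None
        return False


def misses_shared(trace, ways, task="A"):
    """Misses of `task` when all tasks share one set of `ways` ways."""
    cache = LRUSet(ways)
    return sum(1 for owner, tag in trace
               if not cache.access(tag) and owner == task)


def misses_partitioned(trace, ways_per_task, task="A"):
    """Misses of `task` when every task owns a private group of ways."""
    caches = {name: LRUSet(w) for name, w in ways_per_task.items()}
    return sum(1 for owner, tag in trace
               if not caches[owner].access(tag) and owner == task)


# Task A loops over two lines; task B streams through new lines.
alone = [("A", f"a{i % 2}") for i in range(6)]
mixed = []
for i in range(6):
    mixed.append(("A", f"a{i % 2}"))
    mixed += [("B", f"b{3 * i + k}") for k in range(3)]

print(misses_shared(alone, ways=4))                    # 2
print(misses_shared(mixed, ways=4))                    # 6
print(misses_partitioned(mixed, {"A": 2, "B": 2}))     # 2
```

With the shared set, task B turns every access of task A into a miss. With two private ways, task A sees only its two cold misses whether or not task B runs, which is what lets an independent analysis ignore the concurrent workload.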

SLIDE 12

Architecture‐Centric

  • Handling Contention

– Architecture Centric

SLIDE 13

Hardware support for handling contention

  • Bound contention impact on access time to hardware shared resources

– TTA (<‘00), PRET (’06), CompSOC (‘09), MERASA (‘07), …

  • Time composability

– WCET estimates

  • The execution time of a task varies under different workloads; its WCET estimate does not

– Execution time

  • Same execution time under any workload
  • Time composability is achieved by ‘resource reservation’ ➙ performance degradation
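
The two notions of time composability on this slide can be stated a little more formally. The notation is an assumption introduced here: $\tau$ is the task under analysis, $\mathcal{W}$ the set of admissible co-running workloads, $et(\tau, W)$ the execution time of $\tau$ when run against workload $W$, and $\widehat{\mathrm{WCET}}(\tau)$ the WCET estimate.

```latex
\begin{align*}
\text{Composable WCET estimates:} \quad
  & \forall W \in \mathcal{W}: \; et(\tau, W) \;\le\; \widehat{\mathrm{WCET}}(\tau) \\
\text{Composable execution time:} \quad
  & \forall W, W' \in \mathcal{W}: \; et(\tau, W) \;=\; et(\tau, W')
\end{align*}
```

The second property is strictly stronger: it forces the hardware to behave as if the worst-case contention were always present, which is where the performance degradation comes from.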

SLIDE 14

Hardware support for handling contention

  • Bound contention impact on access time to hardware shared resources

– Indirectly: bandwidth guarantees
– Directly: access time guarantees

  • Type of resources

– Stateless (e.g. bus): access policy
– Stateful (e.g. cache): partition to prevent task interaction

  • NoC

SLIDE 15

COTS

  • Handling Contention

– COTS Centric

SLIDE 16

Challenge

  • Time analyzability properties of real COTS multicores

– No assumptions can be made
– Analyze hardware shared resources
– Analyze their impact on execution time
– Bounds derived by ad‐hoc experiments

  • Understanding timing behavior of hardware shared resources

– The way they challenge timing analyzability

  • Software cache partitioning on ARM A9

SLIDE 17

Critique

SLIDE 18

System‐centric

  • Time Analysis frameworks: assumptions

– One shared resource, blocking and no split
– Program broken down into superblocks with resource usage bounds per block
– Dynamic arbiters

  • WCET estimate dependent on co‐runners: this can be tightened but it is no longer time composable

  • Task assignment and scheduling

– Static task‐to‐CPU assignment determines opponents

  • This is good but not enough unless you have a viable technique to avoid exploring the space of all possible contentions
  • Static over‐provisioning is never good news and may defeat the purpose

SLIDE 19

WCET‐centric techniques

  • Assumptions

– Independent analysis: static (boundable) arbitration of shared resources
– Joint analysis: one task per core, schedule known

  • Limits

– Independent analysis: pessimism (blind estimation of contention)
– Joint analysis: not time composable; complexity (state explosion)

SLIDE 20

Architecture‐centric

  • Will the proposed designs ever see the silicon?

– Applies to all hardware designs ;‐)
– Cache partitioning mechanisms: won battle
– Proposed changes are ‘simple’

  • Timing Anomalies

– Design hardware that prevents appearance of TA

SLIDE 21

COTS‐centric

  • Architectural support for isolation or controlled contention

– Not fully adopted!

  • This generates uncertainty

– Build confidence arguments in accordance with requirements and practices of the application domain
– How safety assurance relates to stipulating bounds on execution time

SLIDE 22

Concluding remarks

  • More understanding of existing techniques is needed

– Do they form a consistent picture from which a user can choose sensibly?

  • What is the top priority for the industrial user?

– Question for the audience

  • Seeking time composability vs. guaranteed performance

– First negatively affects the second
– Not possible in the single‐core sense ➙ compositional

SLIDE 23

Work mainly funded by …
