SLIDE 1

GEN-F178-3 (GEN-SCI-029)

MULTICORE SHARED MEMORY INTERFERENCE ANALYSIS THROUGH HARDWARE PERFORMANCE COUNTERS

Alfonso Mascareñas González, Youcef Bouchebaba, Luca Santinelli

SLIDE 2

PLAN

  • 1. Objectives
  • 2. Background
  • 3. Multicore device
  • 4. Measurement framework
  • 5. Task design
  • 6. Statistical application
  • 7. Results
  • 8. Conclusions

SLIDE 3

OBJECTIVES

  • Design and validate a measurement framework based on Performance Monitor Hardware (PMH)
  • Analyze memory interference within a multicore system
  • Check the applicability of probabilistic WCET (pWCET) analysis to the obtained results

SLIDE 4

BACKGROUND

  • Critical applications: must meet timing constraints
  • Single-core vs. multicore processor systems
  • Multicore systems:
      + Throughput
      + SWaP (Size, Weight and Power)
  • Predictability: interference within the whole platform increases
  • Timing analysis: from task Worst-Case Execution Time (WCET) to task probabilistic WCET (pWCET)

Figure: single-core system (core, cache, interconnection, memory) vs. multicore system (core 1 and core 2 with their caches, shared cache, interconnection, memory).

SLIDE 5

MULTICORE DEVICE: OVERVIEW

Keystone II TCI6630K2L

  • 2 ARM cores @ 1.2GHz
  • 4 DSP cores @ 1.2GHz
  • L1, L2 cache memories
  • MSM SRAM and DDR3 memories

SLIDE 6

MULTICORE DEVICE: MEMORY ORGANIZATION

Core   L1P    L1D    L2
ARM1   32KB   32KB   1MB (shared by ARM1 and ARM2)
ARM2   32KB   32KB   1MB (shared by ARM1 and ARM2)
DSP1   32KB   32KB   1MB
DSP2   32KB   32KB   1MB
DSP3   32KB   32KB   1MB
DSP4   32KB   32KB   1MB

Shared memories: 2MB MSM, 2GB DDR

SLIDE 7

MEASUREMENT FRAMEWORK

  • Performance Monitor Hardware (PMH):
      • Coprocessors
      • Performance Monitor Unit (PMU): 6 general-purpose counters + 1 dedicated cycle counter
  • Start-read access pattern:
      1. Select the counter
      2. Select the event
      3. Enable the counter
      4. Reset the counter
      5. Read the current counter value (first read)
      6. Run the critical task
      7. Read the current counter value (second read) and compute the difference

Events (~80):
  • L1 data cache refill
  • L1 data cache access
  • Mispredicted branch speculatively executed
  • Execution cycles
  • L2 data cache access
  • L2 data cache refill
  • L2 data cache write-back
  • Bus access
  • Data memory access
  • …
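
The start-read access pattern can be sketched as follows. This is a minimal illustration in Python, not the framework's code: `time.perf_counter_ns` stands in for a PMU counter read (an assumption, since the real PMU is reached through coprocessor registers), and `read_counter`, `measure` and `critical_task` are hypothetical names. Steps 1 to 4 have no equivalent for a free-running clock, so they appear only as a comment.

```python
import time


def read_counter():
    """Stand-in for a PMU counter read (assumption: a monotonic ns clock)."""
    return time.perf_counter_ns()


def measure(task, *args):
    """Apply the start-read pattern to one run of `task`.

    Returns (task result, counter delta).
    """
    # Steps 1-4 (select counter, select event, enable, reset) would be
    # register writes on real hardware; a free-running clock needs none.
    first = read_counter()          # step 5: first read
    result = task(*args)            # step 6: run the critical task
    second = read_counter()         # step 7: second read ...
    return result, second - first   # ... and take the difference


def critical_task(n):
    """Hypothetical workload: sum of the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, delta = measure(critical_task, 10_000)
    print(result, delta)
```

Keeping the two reads as close as possible to the task keeps the measurement overhead itself out of the reported difference.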

SLIDE 8

TASK DESIGN

  • The real-time applications:
      • Critical task: the one under observation. Three stress levels to choose from (safety1, safety2, safety3)
      • Non-critical tasks: act as the memory stress source
  • Task contents:
      • Loops
      • Simple operations
      • Matrices: main memory-demanding source
  • Tasks execute continuously. They are deployed as follows:
      ▪ Critical task on 1 ARM core
      ▪ Non-critical tasks on 1 ARM core and 4 DSPs
  • ARM cores are managed by PikeOS
  • DSPs run fully bare metal
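
A memory-stressing task with a configurable footprint could look like the sketch below. It is only an illustration of the idea: the slides do not show the task code, `make_buffer` and `stress_pass` are hypothetical names, the 64-byte stride is an assumed cache-line size, and Python gives no control over cache behaviour, so this shows the access pattern rather than its timing.

```python
def make_buffer(size_bytes):
    """Allocate the working set of the task (hypothetical helper)."""
    return bytearray(size_bytes)


def stress_pass(buffer, stride=64):
    """Read-modify-write one byte every `stride` bytes; return the touch count.

    stride=64 is an assumed cache-line size, not a value from the slides.
    """
    touched = 0
    for offset in range(0, len(buffer), stride):
        buffer[offset] = (buffer[offset] + 1) & 0xFF
        touched += 1
    return touched


if __name__ == "__main__":
    # Footprints in KB, matching the memory sizes reported in the results.
    for size_kb in (8, 32, 128, 512, 2048):
        buf = make_buffer(size_kb * 1024)
        print(size_kb, stress_pass(buf))
```

Running `stress_pass` in an endless loop would mimic a task that executes continuously.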

SLIDE 9

STATISTICAL APPLICATION: pWCET & EVT

  • MBTA = Measurement-Based Timing Analysis
  • MBPTA = Measurement-Based Probabilistic Timing Analysis
  • EVT = Extreme Value Theory

Figure: the measurements feed MBTA, which yields a relative WCET, and, through EVT, MBPTA, which yields a relative pWCET.

Hypotheses to fulfill:
  1. Stationarity
  2. Short- or long-range independence
  3. Maximum Domain of Attraction (MDA)
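
As an illustration of how EVT turns measurements into a pWCET estimate, here is a minimal sketch. The slides do not state which extreme-value distribution or estimator is used; the block-maxima approach, the Gumbel distribution and the method-of-moments fit below are all assumptions, and the sample data is synthetic. The sketch also skips the hypothesis checks, which would have to pass before the estimate means anything.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def block_maxima(samples, block_size):
    """Split samples into consecutive full blocks; keep each block's maximum."""
    n_blocks = len(samples) // block_size
    return [max(samples[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit; returns (location, scale)."""
    scale = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    location = statistics.mean(maxima) - EULER_GAMMA * scale
    return location, scale


def pwcet(location, scale, exceedance_prob):
    """Value exceeded with probability `exceedance_prob` (per block)."""
    # Inverse of the Gumbel CDF evaluated at 1 - exceedance_prob.
    return location - scale * math.log(-math.log1p(-exceedance_prob))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic cycle counts, for illustration only (not measured data).
    samples = [140_000 + random.expovariate(1 / 500) for _ in range(2000)]
    maxima = block_maxima(samples, block_size=50)
    loc, scale = fit_gumbel_moments(maxima)
    print(round(pwcet(loc, scale, 1e-9)))
```

Note that the probability here is per block of measurements, not per single run; converting between the two depends on the block size.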

SLIDE 10

SCENARIOS: DESIGN

Four possible scenarios:

  • 1. Critical task analysis
  • 2. Critical task + ARM non-critical task analysis
  • 3. Critical task + DSPs non-critical task analysis
  • 4. Critical task + ARM and DSPs non-critical task analysis

Figure: core allocation per scenario. Scenario 1: ARM critical task only. Scenario 2: ARM critical task + ARM non-critical task. Scenario 3: ARM critical task + 4 DSP non-critical tasks. Scenario 4: ARM critical task + ARM non-critical task + 4 DSP non-critical tasks.
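
The four scenarios can be written down as a small configuration table, for example to drive a measurement campaign. This is a sketch only: the names `SCENARIOS` and `interfering_cores` are hypothetical, and the choice of ARM1 as the core hosting the critical task is an assumption, since the slides only say "one ARM".

```python
# Assumption: the critical task runs on ARM1; the slides only say "one ARM".
SCENARIOS = {
    1: {"critical": "ARM1", "non_critical": []},
    2: {"critical": "ARM1", "non_critical": ["ARM2"]},
    3: {"critical": "ARM1", "non_critical": ["DSP1", "DSP2", "DSP3", "DSP4"]},
    4: {"critical": "ARM1",
        "non_critical": ["ARM2", "DSP1", "DSP2", "DSP3", "DSP4"]},
}


def interfering_cores(scenario):
    """Number of cores running non-critical tasks in the given scenario."""
    return len(SCENARIOS[scenario]["non_critical"])


if __name__ == "__main__":
    for number in sorted(SCENARIOS):
        print(number, interfering_cores(number))
```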

SLIDE 11

SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)

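Figure: execution-cycle plots. Left panel: memory usage = 32KB (L1-L2), annotated values 36000 cycles and 38498. Right panel: memory usage = 128KB (L2), annotated values 139000 cycles and 140957.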

SLIDE 12

SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)

Figure: execution-cycle plots. Left panel: memory usage = 512KB (L2). Right panel: memory usage = 2MB (DDR). Annotated value: 542626 cycles.

SLIDE 13

SCENARIO 1 SUMMARY: EXECUTION CYCLES

Figure: summary plots for Safety1 and Safety3.

SLIDE 14

SCENARIO 2 RESULTS: EXECUTION CYCLES (SAFETY1)

Figure: execution-cycle plots with non-critical task memory usage = 2MB. Left panel: memory usage = 128KB (DDR). Right panel: memory usage = 2MB (DDR).

SLIDE 15

SCENARIO 2 SUMMARY: EXECUTION CYCLES

Safety1, non-critical task memory usage = 2MB

Memory Size (KB)   Mean Overhead (%)   Max Overhead (%)
8                  0.185               11.241
32                 7.362               13.735
128                21.228              45.112
512                10.72               23.481
2048               4.091               4.363
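
The slides do not define how the mean and max overhead are computed. One plausible definition, given here purely as an assumption, is the relative increase of the mean (respectively maximum) cycle count with interference over the same statistic measured in isolation. The function name `overhead_percent` and the numbers in the usage example are made up for illustration and are not measured data.

```python
import statistics


def overhead_percent(isolated, interfered):
    """Return (mean overhead %, max overhead %) of `interfered` vs `isolated`.

    Assumed definition: relative increase of the mean (resp. maximum)
    cycle count with respect to the isolated run.
    """
    mean_iso = statistics.mean(isolated)
    max_iso = max(isolated)
    mean_ovh = 100.0 * (statistics.mean(interfered) - mean_iso) / mean_iso
    max_ovh = 100.0 * (max(interfered) - max_iso) / max_iso
    return mean_ovh, max_ovh


if __name__ == "__main__":
    # Made-up cycle counts, for illustration only.
    isolated = [1000, 1010, 990, 1000]
    interfered = [1200, 1250, 1180, 1210]
    print(overhead_percent(isolated, interfered))
```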

SLIDE 16

SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)

Figure: execution-cycle plots with memory usage = 8MB and non-critical task memory usage = 12MB. Left panel: 0 DSPs. Right panel: 1 DSP.

SLIDE 17

SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)

Figure: execution-cycle plots with memory usage = 8MB and non-critical task memory usage = 12MB. Left panel: 2 DSPs. Right panel: 3 DSPs.

SLIDE 18

SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)

Figure: execution-cycle plot with memory usage = 8MB and non-critical task memory usage = 12MB, 4 DSPs.

SLIDE 19

SCENARIO 3 SUMMARY: EXECUTION CYCLES

Safety1. Data caches have been turned off.

Cores         Mean Overhead (%)   Max Overhead (%)
ARM           -                   -
ARM + 1DSP    0.299               0.363
ARM + 2DSP    0.659               1.105
ARM + 3DSP    1.854               5.769
ARM + 4DSP    4.514               14.991

SLIDE 20

PREDICTABILITY

EVT application to the different scenarios

  • Hypothesis check
  • Inverse Cumulative Distribution Function (ICDF)
  • Pay attention to its convergence

Figure: ICDF plot for memory usage = 128KB.
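
An empirical version of the ICDF can be computed directly from the measured samples. The sketch below assumes that ICDF denotes the exceedance function 1 - CDF, as is common in pWCET work; the slides do not spell this out. The second function shows one simple way to watch convergence, which is an interpretation of the slide rather than the authors' procedure: recompute the estimate on growing prefixes of the trace and see whether it stabilises. Both function names are hypothetical.

```python
from bisect import bisect_right


def empirical_exceedance(samples):
    """Return [(value, fraction of samples strictly greater than value)]."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(v, (n - bisect_right(ordered, v)) / n)
            for v in sorted(set(ordered))]


def exceedance_by_prefix(samples, threshold, prefix_sizes):
    """Exceedance estimate at `threshold` for growing prefixes of the trace."""
    estimates = []
    for size in prefix_sizes:
        prefix = samples[:size]
        above = sum(1 for x in prefix if x > threshold)
        estimates.append(above / len(prefix))
    return estimates


if __name__ == "__main__":
    # Made-up cycle counts, for illustration only.
    trace = [139000, 139500, 140957, 139200, 139100, 140000]
    print(empirical_exceedance(trace))
    print(exceedance_by_prefix(trace, 139400, [2, 4, 6]))
```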

SLIDE 21

CONCLUSIONS

  • Measurement based on Performance Monitor Hardware works successfully
  • EVT can successfully predict the outcome
  • The best placement strategy is:
      1. The critical task on one ARM core
      2. Non-critical tasks on the DSPs (resource access arbitration may be used if needed)
      3. Non-critical tasks on the second ARM core (main interference source)
