
Efficient Utilization of Scratch-Pad Memory in Preemptive Multi-Task Systems
Hiroyuki Tomiyama, Ritsumeikan University
http://hiroyuki.tomiyama-lab.org/


  1. Motivation
     - Memory is one of the most energy-hungry subsystems in embedded systems
       - Up to 50% of total energy
     - Cache improves energy efficiency by reducing off-chip memory accesses
     - Cache is still energy hungry because of
       - Tag comparison
       - Automatic replacement mechanism
       - Parallel accesses to multiple ways (in high-performance cache)
     - Use of SPM instead of (in addition to) cache
       - Normalized read energy (calculated by CACTI 5.0)

         Memory                   SPM    Direct-mapped cache   2-way cache   4-way cache
         Normalized read energy   1      1.56                  1.93          2.54

     - SPM is energy efficient but small

  2. Overview
     - How to efficiently utilize SPM in the presence of multiple tasks?
       - For simplicity, this talk focuses on instruction memory
       - Static data is OK, but stack and heap data need special care
     - Outline
       - SPM partitioning and code allocation for
         - Non-preemptive multi-task systems [Takase, Tomiyama and Takada, VLSI-DAT 2009]
         - Preemptive multi-task systems [Takase, Tomiyama and Takada, DATE 2010]
       - Code layout for inter-task interference minimization [Gauthier, Ishihara, Takase, Tomiyama and Takada, CASES 2010]
     - Main contributors
       - Hideki Takase (Ph.D. Candidate, Nagoya University)
       - Lovic Gauthier (Associate Professor, Kyushu University)

     SPM Allocation Principle
     - Which memory objects should be placed in SPM?
       - Memory objects can be functions (procedures), basic blocks, or other granularity
       - For simplicity, we consider functions as memory objects
     - Knapsack problem
       - x_i: 1 if the i-th function is placed in SPM; otherwise 0
       - fetch_i: number of accesses to the i-th function (obtained by profiling)
       - size_i: code size of the i-th function

         Maximize     \sum_i fetch_i \times x_i
         Subject to   \sum_i size_i \times x_i \le SPMsize
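The knapsack formulation above can be illustrated with a small dynamic-programming sketch. This is not the solver used in the talk (the experiments use an ILP solver); the function name `select_functions` and the profile numbers are hypothetical, chosen only to show how x_i follows from fetch_i, size_i and SPMsize.

```python
from typing import List, Tuple


def select_functions(fetch: List[int], size: List[int],
                     spm_size: int) -> Tuple[int, List[int]]:
    """0/1 knapsack: maximize sum(fetch[i] * x[i])
    subject to sum(size[i] * x[i]) <= spm_size."""
    n = len(fetch)
    # best[i][c] = max number of SPM fetches using the first i functions
    # within capacity c (same unit as size[], e.g. bytes).
    best = [[0] * (spm_size + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(spm_size + 1):
            best[i][c] = best[i - 1][c]          # function i-1 stays in main memory
            if size[i - 1] <= c:                 # function i-1 fits: try placing it
                cand = best[i - 1][c - size[i - 1]] + fetch[i - 1]
                if cand > best[i][c]:
                    best[i][c] = cand
    # Backtrack to recover the 0/1 decision variables x[i].
    x = [0] * n
    c = spm_size
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            x[i - 1] = 1
            c -= size[i - 1]
    return best[n][spm_size], x


# Hypothetical profile: access counts and code sizes of four functions.
fetch = [900, 400, 350, 100]
size = [600, 300, 200, 100]
total, x = select_functions(fetch, size, spm_size=700)
print(total, x)
```

With this made-up profile, the first and last functions are selected: together they fill the 700-unit SPM and capture more fetches than any other feasible combination.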

  3. Task Execution Model
     - Task states
       - Dormant / Ready / Running
       - Transitions: Dormant -> Ready (activate), Ready -> Running (dispatch), Running -> Dormant (terminate)
     - Task scheduling policy
       - All tasks are periodic and independent
       - Fixed-priority-based scheduling
         - The highest-priority task among ready tasks gets dispatched when the CPU becomes available
       - Periods and priorities of tasks are statically decided
       - No task preemption

     SPM Partitioning and Code Allocation
     - SPM partitioning
       - Assignment of SPM address space to tasks
     - Code allocation
       - Assignment of memory objects to SPM
     - Three methods
       - Spatial method
       - Temporal method
       - Hybrid method
     [Figure: execution timelines of Task1, Task2 and Task3 for the three methods; legend: task arrival, task is running, MM-SPM copy]
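The non-preemptive task execution model described in this section can be sketched as a small simulation. This is a minimal illustration, not the RTOS behavior from the experiments: the function name `simulate_nonpreemptive`, the tuple layout, the convention that a smaller number means higher priority, and the task parameters are all assumptions.

```python
from typing import List, Tuple

# A task is (name, period, exec_time, priority); a smaller priority
# number means a higher priority (an assumption of this sketch).
Task = Tuple[str, int, int, int]


def simulate_nonpreemptive(tasks: List[Task], horizon: int) -> List[Tuple[int, str]]:
    """Return (dispatch_time, task_name) pairs for all jobs activated
    in [0, horizon), under fixed-priority scheduling without preemption."""
    # Every periodic activation (Dormant -> Ready), ordered by time.
    releases = sorted(
        (t, prio, name, c)
        for name, period, c, prio in tasks
        for t in range(0, horizon, period)
    )
    ready: List[Tuple[int, int, str, int]] = []
    schedule: List[Tuple[int, str]] = []
    now, idx = 0, 0
    while idx < len(releases) or ready:
        # Activate every job whose release time has passed.
        while idx < len(releases) and releases[idx][0] <= now:
            t, prio, name, c = releases[idx]
            ready.append((prio, t, name, c))
            idx += 1
        if not ready:
            now = releases[idx][0]   # CPU idles until the next activation
            continue
        ready.sort()                 # highest priority first, then earliest release
        prio, t, name, c = ready.pop(0)
        schedule.append((now, name)) # Ready -> Running (dispatch)
        now += c                     # runs to completion: no preemption
    return schedule


tasks = [("T1", 5, 2, 1), ("T2", 20, 7, 2)]
schedule = simulate_nonpreemptive(tasks, horizon=20)
print(schedule)
```

In this example the second job of T1 is activated at time 5 while T2 is running, but it is dispatched only at time 9, when T2 terminates and the CPU becomes available.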

  4. Spatial Method
     - SPM space is exclusively partitioned and assigned to tasks
       - No transfer necessary between SPM and main memory
     - Effective for large SPM
     - ILP formulation of simultaneous partitioning and allocation
       - func_{i,j}: j-th function of the i-th task
       - x_{i,j}: 1 if func_{i,j} is placed in SPM
       - period_i: period of the i-th task
       - hyperperiod: least common multiple of the periods
     [Figure: SPM region]

     Temporal Method
     - Running task may use the entire SPM space
       - When dispatched, code is transferred from main memory to SPM
     - Effective for small SPM
     - ILP formulation of simultaneous partitioning and allocation
       - Eoverhead_{i,j}: energy overhead for transfer of func_{i,j}
       - y_{i,j}: 1 if func_{i,j} is placed in SPM
     [Figure: SPM region]
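The hyperperiod defined for the spatial method can be computed as follows. The sketch also derives how often each task is activated per hyperperiod; using that count to weight a task's per-activation fetch counts is an assumption made here for illustration, since the ILP formulas themselves are not reproduced in this transcript. The function names and the example periods are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List, Tuple


def hyperperiod(periods: List[int]) -> int:
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def activations_per_hyperperiod(periods: List[int]) -> Tuple[int, List[int]]:
    """Return the hyperperiod and the number of activations of each task in it."""
    hyper = hyperperiod(periods)
    return hyper, [hyper // p for p in periods]


# Hypothetical periods of three tasks (same time unit for all).
periods = [10, 20, 50]
hyper, counts = activations_per_hyperperiod(periods)
print(hyper, counts)

# Assumed weighting: fetches of a function per hyperperiod =
# fetches per activation * activations of its task per hyperperiod.
fetch_per_activation = [120, 300, 40]   # hypothetical, one function per task
weighted = [f * n for f, n in zip(fetch_per_activation, counts)]
print(weighted)
```

Weighting by activations makes functions of frequently running tasks more attractive for SPM placement than their per-activation profile alone would suggest.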

  5. Hybrid Method
     - Mixture of spatial and temporal approaches
       - More flexible than the two approaches
     - Partition the SPM space into two regions
       - Spatial region
       - Temporal region
     - Spatial region is further partitioned and assigned to tasks statically
     [Figure: execution timeline of Task1, Task2 and Task3 with the SPM split into a spatial region and a temporal region; legend: task arrival, task is running, MM-SPM copy]

     Hybrid Method
     - ILP formulation
       - Partitioning of SPM into a spatial region and a temporal one
       - Partitioning of the spatial region among tasks
       - Code allocation for the temporal region

  6. Experimental Setup and Tools
     - Simulator: SimpleScalar/ARM
       - An instruction-set simulator of the ARM7TDMI microprocessor
     - Compiler: arm-linux-gcc 2.95.2
     - ILP solver: GNU GLPK 4.23
     - Memory configurations:
       - On-chip: 16 KBytes 4-way cache + 4K / 8K / 12K / 16 KBytes SPM
         - Energy model: CACTI 4.2
       - Off-chip main memory: Mobile DDR SDRAM
         - Energy model: Micron System-Power Calculator
     - Benchmark task sets (from the MiBench suite)
       - TasksetA: bf / tiff2rgba
       - TasksetB: cjpeg / crc / qsort / tiff2rgba
       - TasksetC: bitcnts / cjpeg / ispell / rawcaudio / sha
       - TasksetD: bitcnts / bf / crc / dijkstra / ispell / qsort / rawcaudio / sha
       - TasksetE: bitcnts / bf / cjpeg / crc / dijkstra / ispell / qsort / rawcaudio / sha / tiff2rgba

     Experimental Procedure
     [Figure: experimental procedure]

  7. Results: TasksetE (10 tasks)
     [Figure: stacked bar chart of energy (mJ, axis 0.0 to 80.0) for Std, Spt, Tmp and Hyb at SPM sizes 4k, 8k, 12k and 16k; components: cache hit, cache miss, SPM hit, overhead; annotated reduction: -47.2 %]
     - Std: simple spatial method where SPM is partitioned equally among all tasks
     - Spt: spatial method, Tmp: temporal method, Hyb: hybrid method

     Results: TasksetA (2 tasks)
     [Figure: stacked bar chart of energy (mJ, axis 0.0 to 4.0) for Std, Spt, Tmp and Hyb at SPM sizes 4k, 8k, 12k and 16k; components: cache hit, cache miss, SPM hit, overhead; annotated reduction: -28.4 %]

  8. Results: TasksetA / TasksetC / TasksetE
     [Figure: stacked bar chart of normalized energy consumption (axis 0.0 to 1.2) at SPM sizes 4K, 8K, 12K and 16K for setA, setC and setE; bar labels: Std, Prd, Rgn, Hyb; components: cache hit, cache miss, SPM hit, overhead]
     - The hybrid approach is consistently good
     - Increasing the SPM size is not always effective

     Preemptive Multi-Task Systems
     - Task states
       - Dormant / Ready / Running
       - Transitions: Dormant -> Ready (activate), Ready -> Running (dispatch), Running -> Ready (preempted), Running -> Dormant (terminate)
     - Task scheduling policy
       - All tasks are periodic and independent
       - Fixed-priority preemptive scheduling
       - Periods and priorities of tasks are statically decided
       - A higher-priority task preempts a lower-priority task under execution

  9. SPM Partitioning and Code Allocation
     - Spatial method
       - Same as non-preemptive systems
     - Temporal method
     - Hybrid method

     Temporal Method
     - Running task may use the entire SPM space
     - Program code is transferred at most twice per execution
       1. When the task gets started
       2. When a higher-priority task is completed and a preempted task resumes execution
          - The contents of the preempted task need to be restored into SPM

  10. Temporal Method: ILP Formulation
     - Eoverhead_{i,j}: energy consumption of transferring func_{i,j}
     - SPMsize_tmp_i: amount of SPM space that task i can use
     - y_{i,j}: 1 if func_{i,j} is placed in SPM

     Hybrid Method
     - Mixture of the two methods
       - At compile time, SPM is partitioned by the spatial method
       - At run time, a higher-priority task may preempt not only the CPU but also the SPM space of lower-priority tasks
     - Reduces the overhead of high-priority tasks
     [Figure: four execution timelines of Task1, Task2 and Task3; legend: task arrival, task is running, MM-SPM copy; annotations: "Task1 preempts SPM spaces of Task2 and 3", "The contents of SPM are restored"]

  11. Hybrid Method: ILP Formulation
     - SPMsize_spt_i
       - SPM size statically assigned to task i by the spatial method
       - Constraint (1)
     - SPMsize_tmp_i
       - SPM size which task i preempts by the temporal method
       - Constraint (2)

     Experimental Setup and Tools
     - Simulator: SkyEye-1.2.6_rc1 (ARM920T)
     - ILP solver: GNU GLPK 4.23
     - Compiler: arm-elf-gcc 4.1.1
     - RTOS: TOPPERS/ASP Kernel (Release 1.3.2)
     - Memory configurations:
       - On-chip: 4 KBytes 4-way cache + 1 / 2 / 4 / 8 KBytes SPM
       - Off-chip main memory: Mobile DDR SDRAM
       - Energy model: CACTI 5.3
     - Task sets: tasks are selected from the EEMBC suites
       - SetA: aifftr, basefp, bitmnp, cacheb, idctrn
       - SetB: bezier, dither, ospf, pktflow, rotate, routelookup, text
       - SetC: conven, rgbcmy, rgbriq, viterb, and SetB
       - SetD: SetA and SetC
       - The periods were set to be proportional to the tasks' execution times
       - The total CPU utilization of each task set was set to about 50 %
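The period assignment described in the setup above can be sketched as follows. The slides only state that periods are proportional to execution times and that total utilization is about 50 %; the use of one common proportionality constant for all tasks, the function name `assign_periods`, and the execution times are assumptions of this sketch.

```python
from typing import List


def assign_periods(exec_times: List[float], utilization: float = 0.5) -> List[float]:
    """Choose period_i = k * exec_time_i with a single constant k.

    Each task then contributes exec_time_i / period_i = 1 / k to the
    CPU utilization, so n tasks give n / k in total; solving
    n / k = utilization yields k = n / utilization.
    """
    k = len(exec_times) / utilization
    return [k * c for c in exec_times]


# Hypothetical execution times (e.g. in milliseconds) of four tasks.
exec_times = [2.0, 5.0, 1.0, 4.0]
periods = assign_periods(exec_times)
total_utilization = sum(c / p for c, p in zip(exec_times, periods))
print(periods, total_utilization)
```

With four tasks and a 50 % target, every period is eight times the task's execution time, so each task uses 12.5 % of the CPU.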

  12. Overall Workflow
     [Figure: overall workflow]

     Results: SetC (11 tasks)
     [Figure: stacked bar chart of energy consumption (uJ, axis 0 to 1600) for Std, Spt, Tmp and Hyb at SPM sizes 1k, 2k, 4k and 8k; components: cache hit, cache miss, SPM hit, overhead; annotated reduction: -73 %]
     - Std: simple method where SPM space is partitioned equally among the tasks
     - Spt: spatial method, Tmp: temporal method, Hyb: hybrid method
