 
Efficient Utilization of Scratch-Pad Memory in Preemptive Multi-Task Systems

Hiroyuki Tomiyama
Ritsumeikan University
http://hiroyuki.tomiyama-lab.org/

Motivation
- Memory is one of the most energy-hungry subsystems in embedded systems
  - Up to 50% of total energy
- Cache improves energy efficiency by reducing off-chip memory accesses
- Cache is still energy-hungry because of
  - Tag comparison
  - Automatic replacement mechanism
  - Parallel accesses to multiple ways (in high-performance caches)
- Use of scratch-pad memory (SPM) instead of (or in addition to) cache
  - Normalized read energy (calculated by CACTI 5.0):

    Memory                 Normalized read energy
    SPM                    1
    Direct-mapped cache    1.56
    2-way cache            1.93
    4-way cache            2.54

- SPM is energy efficient but small
Overview
- How to efficiently utilize SPM in the presence of multiple tasks?
- For simplicity, this talk focuses on instruction memory
  - Static data is OK, but stack and heap data need special care
- Outline
  - SPM partitioning and code allocation for
    - Non-preemptive multi-task systems [Takase, Tomiyama and Takada, VLSI-DAT 2009]
    - Preemptive multi-task systems [Takase, Tomiyama and Takada, DATE 2010]
  - Code layout for inter-task interference minimization [Gauthier, Ishihara, Takase, Tomiyama and Takada, CASES 2010]
- Main contributors
  - Hideki Takase (Ph.D. Candidate, Nagoya University)
  - Lovic Gauthier (Associate Professor, Kyushu University)

SPM Allocation Principle
- Which memory objects should be placed in SPM?
  - Memory objects can be functions (procedures), basic blocks, or units of another granularity
  - For simplicity, we consider functions as memory objects
- Knapsack problem
  - x_i: 1 if the i-th function is placed in SPM, 0 otherwise
  - fetch_i: number of accesses to the i-th function (obtained by profiling)
  - size_i: code size of the i-th function
  - Maximize: Σ_i fetch_i × x_i
  - Subject to: Σ_i size_i × x_i ≤ SPMsize
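The knapsack formulation above can be illustrated with a small dynamic-programming sketch. This is only an illustration, not the tooling used in the talk (which relies on an ILP solver); the function name `allocate_spm` and the profile numbers are hypothetical.

```python
from typing import List, Tuple


def allocate_spm(fetch: List[int], size: List[int], spm_size: int) -> Tuple[int, List[int]]:
    """Solve the 0/1 knapsack: maximize sum(fetch[i] * x[i])
    subject to sum(size[i] * x[i]) <= spm_size.

    Returns (best total fetch count, x), where x[i] is 1 if function i
    is placed in SPM and 0 otherwise.
    """
    n = len(fetch)
    # best[i][c]: max fetch count using only functions 0..i-1 within capacity c
    best = [[0] * (spm_size + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(spm_size + 1):
            best[i][c] = best[i - 1][c]  # function i-1 stays in main memory
            if size[i - 1] <= c:
                cand = best[i - 1][c - size[i - 1]] + fetch[i - 1]
                if cand > best[i][c]:
                    best[i][c] = cand  # function i-1 goes to SPM
    # Walk back through the table to recover the decision variables x_i.
    x = [0] * n
    c = spm_size
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            x[i - 1] = 1
            c -= size[i - 1]
    return best[n][spm_size], x


# Hypothetical profile: fetch counts and code sizes (bytes) of four functions.
fetch = [900, 500, 450, 100]
size = [600, 300, 300, 200]
total, x = allocate_spm(fetch, size, 800)
print(total, x)
```

In this made-up profile the most frequently fetched function is left out of SPM, because three smaller functions together serve more fetches within the same 800 bytes. A greedy choice by fetch count alone would miss that, which is why the problem is treated as a knapsack.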
Task Execution Model
- Task states
  - Dormant / Ready / Running
  - [State diagram: Dormant, Ready and Running states; transitions labeled activate, dispatch and terminate]
- Task scheduling policy
  - All tasks are periodic and independent
  - Fixed-priority-based scheduling
  - The highest-priority task among the ready tasks is dispatched when the CPU becomes available
  - Periods and priorities of tasks are statically decided
  - No task preemption

SPM Partitioning and Code Allocation
- SPM partitioning
  - Assignment of SPM address space to tasks
- Code allocation
  - Assignment of memory objects to SPM
- Three methods
  - Spatial method
  - Temporal method
  - Hybrid method
[Figure: three timing charts, one per method, showing Task1, Task2 and Task3 over execution time; legend: task arrival, task is running, MM-SPM copy]
Spatial Method
- SPM space is exclusively partitioned and assigned to tasks
- No transfer is necessary between SPM and main memory
- Effective for large SPM
- ILP formulation of simultaneous partitioning and allocation
  - func_{i,j}: j-th function of the i-th task
  - x_{i,j}: 1 if func_{i,j} is placed in SPM
  - period_i: period of the i-th task
  - hyperperiod: least common multiple of the periods
[Figure: SPM region]

Temporal Method
- The running task may use the entire SPM space
- When a task is dispatched, its code is transferred from main memory to SPM
- Effective for small SPM
- ILP formulation of simultaneous partitioning and allocation
  - Eoverhead_{i,j}: energy overhead for the transfer of func_{i,j}
  - y_{i,j}: 1 if func_{i,j} is placed in SPM
[Figure: SPM region]
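The hyperperiod defined for the spatial formulation can be computed directly from the task periods. The sketch below also derives how many times each periodic task is activated within one hyperperiod. How these counts enter the ILP objective is not shown in the extracted slides, so treating them as per-task weights is an assumption; the helper names and the example periods are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List


def hyperperiod(periods: List[int]) -> int:
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def activations_per_hyperperiod(periods: List[int]) -> List[int]:
    """Number of times each periodic task is activated in one hyperperiod."""
    h = hyperperiod(periods)
    return [h // p for p in periods]


# Hypothetical periods (in ms) of three tasks.
periods = [10, 20, 50]
print(hyperperiod(periods))                  # 100
print(activations_per_hyperperiod(periods))  # [10, 5, 2]
```

A task with a short period runs more often per hyperperiod, so one plausible use of these counts is to scale each task's profiled fetch counts before comparing tasks against each other.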
Hybrid Method
- Mixture of the spatial and temporal approaches
  - More flexible than either approach alone
- Partition the SPM space into two regions
  - Spatial region
  - Temporal region
- The spatial region is further partitioned and assigned to tasks statically
[Figure: timing chart of Task1, Task2 and Task3 over execution time (legend: task arrival, task is running, MM-SPM copy), with the SPM divided into a spatial region and a temporal region]

Hybrid Method
- ILP formulation
  - Partitioning of SPM into a spatial region and a temporal region
  - Partitioning of the spatial region among tasks
  - Code allocation for the temporal region
Experimental Setup and Tools
- Simulator: SimpleScalar/ARM
  - An instruction-set simulator of the ARM7TDMI microprocessor
- Compiler: arm-linux-gcc 2.95.2
- ILP solver: GNU GLPK 4.23
- Memory configurations
  - On-chip: 16 KBytes 4-way cache + 4K / 8K / 12K / 16 KBytes SPM
    - Energy model: CACTI 4.2
  - Off-chip main memory: Mobile DDR SDRAM
    - Energy model: Micron System-Power Calculator
- Benchmark task sets (from the MiBench suite)
  - TasksetA: bf / tiff2rgba
  - TasksetB: cjpeg / crc / qsort / tiff2rgba
  - TasksetC: bitcnts / cjpeg / ispell / rawcaudio / sha
  - TasksetD: bitcnts / bf / crc / dijkstra / ispell / qsort / rawcaudio / sha
  - TasksetE: bitcnts / bf / cjpeg / crc / dijkstra / ispell / qsort / rawcaudio / sha / tiff2rgba

Experimental Procedure
[Figure: experimental procedure]
Results: TasksetE (10 tasks)
[Figure: stacked bar chart of energy (mJ, axis 0 to 80), broken down into cache hit, cache miss, SPM hit and overhead, for Std, Spt, Tmp and Hyb at SPM sizes of 4k, 8k, 12k and 16k; annotated reduction: -47.2%]
- Std: simple spatial method where SPM is partitioned equally among all tasks
- Spt: spatial method, Tmp: temporal method, Hyb: hybrid method

Results: TasksetA (2 tasks)
[Figure: stacked bar chart of energy (mJ, axis 0 to 4), broken down into cache hit, cache miss, SPM hit and overhead, for Std, Spt, Tmp and Hyb at SPM sizes of 4k, 8k, 12k and 16k; annotated reduction: -28.4%]
Results: TasksetA / TasksetC / TasksetE
[Figure: stacked bar chart of normalized energy consumption (axis 0 to 1.2), broken down into cache hit, cache miss, SPM hit and overhead, for setA, setC and setE at SPM sizes of 4K, 8K, 12K and 16K; bar labels: Std, Prd, Rgn, Hyb]
- The hybrid approach performs consistently well
- Increasing the SPM size is not always effective

Preemptive Multi-Task Systems
- Task states
  - Dormant / Ready / Running
  - [State diagram: Dormant, Ready and Running states; transitions labeled activate, dispatch, preempted and terminate]
- Task scheduling policy
  - All tasks are periodic and independent
  - Fixed-priority preemptive scheduling
  - Periods and priorities of tasks are statically decided
  - A higher-priority task preempts a lower-priority task under execution
SPM Partitioning and Code Allocation
- Spatial method
  - Same as for non-preemptive systems
- Temporal method
- Hybrid method

Temporal Method
- The running task may use the entire SPM space
- Program code is transferred at most twice per execution
  1. When the task gets started
  2. When a higher-priority task is completed and a preempted task resumes execution
     - The contents of the preempted task need to be restored into SPM
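The "at most twice per execution" statement suggests a simple upper bound on the transfer energy spent in one hyperperiod. The sketch below is an assumption-laden illustration, not the formulation from the talk: it assumes every job pays the transfer cost of its SPM-resident functions the maximum number of times, and the names `transfer_energy_bound`, `eoverhead` and `y` as Python lists are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List


def transfer_energy_bound(periods: List[int],
                          eoverhead: List[List[float]],
                          y: List[List[int]],
                          max_transfers_per_job: int = 2) -> float:
    """Upper bound on MM-to-SPM transfer energy in one hyperperiod.

    periods[i]      : period of task i
    eoverhead[i][j] : energy to transfer function j of task i
    y[i][j]         : 1 if function j of task i is placed in SPM, else 0
    Assumes each job transfers its SPM-resident code at most
    max_transfers_per_job times (2 per the slide: at start and on resume).
    """
    hyper = reduce(lambda a, b: a * b // gcd(a, b), periods)
    total = 0.0
    for i, period in enumerate(periods):
        jobs = hyper // period  # activations of task i per hyperperiod
        per_transfer = sum(e * sel for e, sel in zip(eoverhead[i], y[i]))
        total += jobs * max_transfers_per_job * per_transfer
    return total


# Hypothetical numbers: three tasks, transfer energies in arbitrary units.
periods = [10, 20, 50]
eoverhead = [[2.0, 1.0], [3.0], [4.0, 0.5]]
y = [[1, 0], [1], [0, 1]]
bound = transfer_energy_bound(periods, eoverhead, y)
print(bound)
```

The bound is pessimistic by construction. For example, the highest-priority task is never preempted under fixed-priority preemptive scheduling, so it would need only one transfer per job.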
Temporal Method: ILP Formulation
- Eoverhead_{i,j}: energy consumption of transferring func_{i,j}
- SPMsize_tmp_i: amount of SPM space that task i can use
- y_{i,j}: 1 if func_{i,j} is placed in SPM

Hybrid Method
- Mixture of the two methods
  - At compile time, SPM is partitioned by the spatial method
  - At run time, a higher-priority task may preempt not only the CPU but also the SPM space of lower-priority tasks
- Reduces the overhead of high-priority tasks
[Figure: four timing charts of Task1, Task2 and Task3 over execution time (legend: task arrival, task is running, MM-SPM copy); annotations: "Task1 preempts SPM spaces of Task2 and 3" and "The contents of SPM are restored"]
Hybrid Method: ILP Formulation
- SPMsize_spt_i
  - SPM size statically assigned to task i by the spatial method
  - Constraint (1)
- SPMsize_tmp_i
  - SPM size which task i preempts by the temporal method
  - Constraint (2)

Experimental Setup and Tools
- Simulator: SkyEye-1.2.6_rc1 (ARM920T)
- ILP solver: GNU GLPK 4.23
- Compiler: arm-elf-gcc 4.1.1
- RTOS: TOPPERS/ASP Kernel (Release 1.3.2)
- Memory configurations
  - On-chip: 4 KBytes 4-way cache + 1 / 2 / 4 / 8 KBytes SPM
  - Off-chip main memory: Mobile DDR SDRAM
  - Energy model: CACTI 5.3
- Task sets: tasks are selected from the EEMBC suites
  - SetA: aifftr, basefp, bitmnp, cacheb, idctrn
  - SetB: bezier, dither, ospf, pktflow, rotate, routelookup, text
  - SetC: conven, rgbcmy, rgbriq, viterb, and SetB
  - SetD: SetA and SetC
- The periods were set to be proportional to the tasks' execution times
- The total CPU utilization of the task set was set to about 50%
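The last two setup bullets pin down the periods up to rounding. If every period is T_i = k × C_i for execution time C_i, then each task contributes a utilization of C_i / T_i = 1/k, so n tasks give a total of n/k, and a target utilization U requires k = n/U. The sketch below applies this. The slides do not say how the periods were actually computed or rounded, so the helper `assign_periods` and the execution times are hypothetical.

```python
from typing import List


def assign_periods(exec_times: List[float], total_utilization: float = 0.5) -> List[float]:
    """Periods proportional to execution times with a given total CPU utilization.

    With T_i = k * C_i every task has utilization 1/k, so n tasks give n/k
    in total; solving n/k = total_utilization yields k = n / total_utilization.
    """
    n = len(exec_times)
    k = n / total_utilization
    return [k * c for c in exec_times]


# Hypothetical execution times (ms) of four tasks.
exec_times = [2.0, 5.0, 1.0, 4.0]
periods = assign_periods(exec_times)
utilization = sum(c / t for c, t in zip(exec_times, periods))
print(periods, utilization)
```

Under this scheme every task gets the same share of the CPU, and a larger task set simply stretches all periods by a larger common factor.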
Overall Workflow
[Figure: overall workflow]

Results: SetC (11 tasks)
[Figure: stacked bar chart of energy consumption (uJ, axis 0 to 1600), broken down into cache hit, cache miss, SPM hit and overhead, for Std, Spt, Tmp and Hyb at SPM sizes of 1k, 2k, 4k and 8k; annotated reduction: -73%]
- Std: simple method where SPM space is partitioned equally among the tasks
- Spt: spatial method, Tmp: temporal method, Hyb: hybrid method