Efficient Utilization of Scratch-Pad Memory in Preemptive Multi-Task Systems
Hiroyuki Tomiyama
Ritsumeikan University
http://hiroyuki.tomiyama-lab.org/

Motivation
- Memory is one of the most energy-hungry subsystems in embedded systems
  - Up to 50% of total energy
- Cache improves energy efficiency by reducing off-chip memory accesses
- Cache is still energy hungry because of
  - Tag comparison
  - Automatic replacement mechanism
  - Parallel accesses to multiple ways (in high-performance cache)
- Use of SPM instead of (in addition to) cache
  - Normalized read energy (calculated by CACTI 5.0):

    SPM    Direct-mapped cache    2-way cache    4-way cache
    1      1.56                   1.93           2.54

- SPM is energy efficient but small

Overview
- How to efficiently utilize SPM in the presence of multiple tasks?
  - For simplicity, this talk focuses on instruction memory
  - Static data is OK, but stack and heap data need special care
- Outline
  - SPM partitioning and code allocation for
    - Non-preemptive multi-task systems [Takase, Tomiyama and Takada, VLSI-DAT 2009]
    - Preemptive multi-task systems [Takase, Tomiyama and Takada, DATE 2010]
  - Code layout for inter-task interference minimization [Gauthier, Ishihara, Takase, Tomiyama and Takada, CASES 2010]
- Main Contributors
  - Hideki Takase (Ph.D. Candidate, Nagoya University)
  - Lovic Gauthier (Associate Professor, Kyushu University)

SPM Allocation Principle
- Which memory objects should be placed in SPM?
  - Memory objects can be functions (procedures), basic blocks, or other granularity
  - For simplicity, we consider functions as memory objects
- Knapsack problem
  - $x_i$: 1 if the i-th function is placed in SPM, otherwise 0
  - $\mathit{fetch}_i$: number of accesses to the i-th function (obtained by profiling)
  - $\mathit{size}_i$: code size of the i-th function
  - Maximize $\sum_i \mathit{fetch}_i \times x_i$
  - Subject to $\sum_i \mathit{size}_i \times x_i \le \mathit{SPMsize}$
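The knapsack formulation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool flow: it solves the single-task 0/1 knapsack by dynamic programming over the SPM capacity, whereas the talk uses an ILP solver. The function name `allocate_spm` and the fetch counts and code sizes in the example are hypothetical.

```python
from typing import List, Tuple


def allocate_spm(fetch: List[int], size: List[int], spm_size: int) -> Tuple[int, List[int]]:
    """Choose functions for SPM so that the total fetch count is maximal.

    fetch[i]  -- profiled number of accesses to function i
    size[i]   -- code size of function i in bytes (non-negative)
    spm_size  -- SPM capacity in bytes
    Returns (best total fetch count, x) where x[i] is 1 if function i is in SPM.
    """
    n = len(fetch)
    # best[c] = maximum total fetch count achievable within capacity c
    best = [0] * (spm_size + 1)
    # keep[i][c] = True if taking function i improved the result for capacity c
    keep = [[False] * (spm_size + 1) for _ in range(n)]
    for i in range(n):
        # Iterate capacities downwards so each function is used at most once.
        for c in range(spm_size, size[i] - 1, -1):
            candidate = best[c - size[i]] + fetch[i]
            if candidate > best[c]:
                best[c] = candidate
                keep[i][c] = True
    # Walk back through the decisions to recover x.
    x = [0] * n
    c = spm_size
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            x[i] = 1
            c -= size[i]
    return best[spm_size], x


# Hypothetical profile: four functions and a 700-byte SPM.
total, x = allocate_spm(fetch=[900, 500, 450, 100],
                        size=[600, 300, 300, 100],
                        spm_size=700)
print(total, x)
```

In this example the most frequently fetched function is left out of SPM: the three smaller functions together fill the SPM with more total fetches, which is why a greedy pick by fetch count alone is not sufficient.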

Task Execution Model
- Task states
  - Dormant / Ready / Running
- Task scheduling policy
  - All tasks are periodic and independent
  - Fixed-priority-based scheduling
  - The highest priority task among ready tasks gets dispatched when CPU becomes available
  - Periods and priorities of tasks are statically decided
  - No task preemption
[Figure: task state diagram. Dormant -> Ready (activate), Ready -> Running (dispatch), Running -> Dormant (terminate)]

SPM Partitioning and Code Allocation
- SPM partitioning
  - Assignment of SPM address space to tasks
- Code allocation
  - Assignment of memory objects to SPM
- Three methods
  - Spatial method
  - Temporal method
  - Hybrid method
[Figure: execution timelines of Task1, Task2 and Task3 under the three methods. Legend: task arrival, task is running, MM-SPM copy]

Spatial Method
- SPM space is exclusively partitioned and assigned to tasks
- No transfer necessary between SPM and main memory
- Effective for large SPM
- ILP formulation of simultaneous partitioning and allocation
  - $\mathit{func}_{i,j}$: j-th function of i-th task
  - $x_{i,j}$: 1 if $\mathit{func}_{i,j}$ is placed in SPM
  - $\mathit{period}_i$: period of i-th task
  - $\mathit{hyperperiod}$: least common multiple of periods
[Figure: SPM regions assigned to tasks]

Temporal Method
- Running task may use entire SPM space
- When dispatched, code is transferred from main memory to SPM
- Effective for small SPM
- ILP formulation of simultaneous partitioning and allocation
  - $\mathit{Eoverhead}_{i,j}$: energy overhead for transfer of $\mathit{func}_{i,j}$
  - $y_{i,j}$: 1 if $\mathit{func}_{i,j}$ is placed in SPM
[Figure: SPM region used by the running task]
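The hyperperiod used in the spatial formulation is the least common multiple of the task periods. The sketch below computes it together with the number of times each task is activated within one hyperperiod. Using that activation count to weight each task's fetch counts is an assumption made here for illustration, since the ILP objective itself is not present in the slide text. The helper names and the example periods are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List


def hyperperiod(periods: List[int]) -> int:
    """Least common multiple of all task periods (positive integers)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def activations_per_hyperperiod(periods: List[int]) -> List[int]:
    """Number of jobs each periodic task releases within one hyperperiod."""
    h = hyperperiod(periods)
    return [h // p for p in periods]


# Hypothetical periods for three tasks, in milliseconds.
periods = [10, 20, 50]
print(hyperperiod(periods))                  # 100
print(activations_per_hyperperiod(periods))  # [10, 5, 2]
```

Under this assumption, a function in a task with a short period counts more towards the objective than an equally hot function in a rarely activated task.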

Hybrid Method
- Mixture of spatial and temporal approaches
  - More flexible than the two approaches
- Partition the SPM space into two regions
  - Spatial region
  - Temporal region
- Spatial region is further partitioned and assigned to tasks statically
[Figure: execution timelines of Task1, Task2 and Task3 with the SPM split into a spatial region and a temporal region. Legend: task arrival, task is running, MM-SPM copy]

Hybrid Method: ILP Formulation
- Partitioning of SPM into spatial region and temporal one
- Partitioning of spatial region into tasks
- Code allocation for temporal region

Experimental Setup and Tools
- Simulator: SimpleScalar/ARM
  - An instruction-set simulator of ARM7TDMI microprocessor
- Compiler: arm-linux-gcc 2.95.2
- ILP solver: GNU GLPK 4.23
- Memory configurations:
  - On-chip: 16 KBytes 4-way cache + 4K / 8K / 12K / 16 KBytes SPM
    - Energy model: CACTI 4.2
  - Off-chip main memory: Mobile DDR SDRAM
    - Energy model: Micron System-Power Calculator
- Benchmark task sets (from MiBench suite)
  - TasksetA: bf / tiff2rgba
  - TasksetB: cjpeg / crc / qsort / tiff2rgba
  - TasksetC: bitcnts / cjpeg / ispell / rawcaudio / sha
  - TasksetD: bitcnts / bf / crc / dijkstra / ispell / qsort / rawcaudio / sha
  - TasksetE: bitcnts / bf / cjpeg / crc / dijkstra / ispell / qsort / rawcaudio / sha / tiff2rgba

Experimental Procedure
[Figure: experimental procedure]

Results: TasksetE (10 tasks)
[Figure: stacked bar chart of energy [mJ] (cache hit, cache miss, SPM hit, overhead) for Std, Spt, Tmp and Hyb at SPM sizes 4k, 8k, 12k and 16k. Annotated reduction: -47.2%]
- Std: Simple spatial method where SPM is partitioned equally to every task
- Spt: Spatial method, Tmp: Temporal method, Hyb: Hybrid method

Results: TasksetA (2 tasks)
[Figure: stacked bar chart of energy [mJ] (cache hit, cache miss, SPM hit, overhead) for Std, Spt, Tmp and Hyb at SPM sizes 4k, 8k, 12k and 16k. Annotated reduction: -28.4%]

Results: TasksetA / TasksetC / TasksetE
[Figure: stacked bar chart of normalized energy consumption (cache hit, cache miss, SPM hit, overhead) for Std, Rgn, Prd and Hyb at SPM sizes 4K, 8K, 12K and 16K, shown for setA, setC and setE]
- Hybrid approach is consistently good
- Increased SPM size is not always effective

Preemptive Multi-Task Systems
- Task states
  - Dormant / Ready / Running
- Task scheduling policy
  - All tasks are periodic and independent
  - Fixed-priority preemptive scheduling
  - Periods and priorities of tasks are statically decided
  - Higher-priority task preempts lower-priority task under execution
[Figure: task state diagram. Dormant -> Ready (activate), Ready -> Running (dispatch), Running -> Ready (preempted), Running -> Dormant (terminate)]

SPM Partitioning and Code Allocation
- Spatial method
  - Same as non-preemptive systems
- Temporal method
- Hybrid method

Temporal Method
- Running task may use entire SPM space
- Program code is transferred at most twice per execution
  1. When the task gets started
  2. When a higher-priority task is completed and a preempted task resumes execution
- The contents of the preempted task need to be restored into SPM

Temporal Method: ILP Formulation
- $\mathit{Eoverhead}_{i,j}$: energy consumption of transferring $\mathit{func}_{i,j}$
- $\mathit{SPMsize\_tmp}_i$: amount of SPM space that task i can use
- $y_{i,j}$: 1 if $\mathit{func}_{i,j}$ is placed in SPM

Hybrid Method
- Mixture of the two methods
  - At compile time, SPM is partitioned by the spatial method
  - At run time, a higher-priority task may preempt not only CPU but also SPM space of lower-priority tasks
- Reduces overhead of high-priority tasks
[Figure: execution timelines of Task1, Task2 and Task3. Annotations: "Task1 preempts SPM spaces of Task2 and 3" and "The contents of SPM are restored". Legend: task arrival, task is running, MM-SPM copy]

Hybrid Method: ILP Formulation
- $\mathit{SPMsize\_spt}_i$: SPM size statically assigned to task i by spatial method
  - Constraint (1)
- $\mathit{SPMsize\_tmp}_i$: SPM size which task i preempts by temporal method
  - Constraint (2)

Experimental Setup and Tools
- Simulator: SkyEye-1.2.6_rc1 (ARM920T)
- ILP solver: GNU GLPK 4.23
- Compiler: arm-elf-gcc 4.1.1
- RTOS: TOPPERS/ASP Kernel (Release 1.3.2)
- Memory configurations:
  - On-chip: 4 KBytes 4-way cache + 1 / 2 / 4 / 8 KBytes SPM
  - Off-chip main memory: Mobile DDR SDRAM
  - Energy model: CACTI 5.3
- Task sets: tasks are selected from EEMBC suites
  - SetA: aifftr, basefp, bitmnp, cacheb, idctrn
  - SetB: bezier, dither, ospf, pktflow, rotate, routelookup, text
  - SetC: conven, rgbcmy, rgbriq, viterb, and SetB
  - SetD: SetA and SetC
- The periods were set to be proportional to the tasks' execution times
- The total CPU utilization of each task set was set to about 50%
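One way to read the last two points of the setup is that every task gets a period equal to the same constant k times its execution time. Each task then contributes a utilization of 1/k, so n tasks reach a total utilization U when k = n / U. The sketch below works this out. It is an interpretation of the slide, not the authors' script, and the function name and the execution times are hypothetical.

```python
from typing import List


def periods_for_utilization(exec_times: List[float], utilization: float = 0.5) -> List[float]:
    """Periods proportional to execution times that give the requested total utilization.

    With period_i = k * exec_i, each task uses 1/k of the CPU, so the total
    utilization is n / k and therefore k = n / utilization.
    """
    k = len(exec_times) / utilization
    return [k * c for c in exec_times]


# Hypothetical execution times of three tasks, in milliseconds.
exec_times = [2.0, 5.0, 1.0]
periods = periods_for_utilization(exec_times, utilization=0.5)
print(periods)  # [12.0, 30.0, 6.0]

# Check: total utilization is the sum of exec_i / period_i.
total_utilization = sum(c / p for c, p in zip(exec_times, periods))
print(round(total_utilization, 6))  # 0.5
```

Under this reading, an 11-task set such as SetC would use k = 22, so every task would run for 1/22 of its period.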

Overall Workflow
[Figure: overall workflow]

Results: SetC (11 tasks)
[Figure: stacked bar chart of energy consumption [uJ] (cache hit, cache miss, SPM hit, overhead) for Std, Spt, Tmp and Hyb at SPM sizes 1k, 2k, 4k and 8k. Annotated reduction: -73%]
- Std: Simple method where SPM space is partitioned equally to each task
- Spt: Spatial method, Tmp: Temporal method, Hyb: Hybrid method
