CAM: Constraint-Aware Application Mapping for Embedded Systems - PowerPoint PPT Presentation


SLIDE 1

CAM: Constraint-Aware Application Mapping for Embedded Systems

Luis A. Bathen, Nikil D. Dutt

SLIDE 2

Outline

• Introduction & Motivation
• CAM Overview
• Memory-aware Macro-Pipelining
• Customized Security Policy Generation
• Related Work
• Conclusion

11/5/2010 CASA '10


SLIDE 4

Software/Hardware Co-Design

Given an existing application, designers can:
• Design a customized platform: dedicated logic, a custom memory hierarchy, and a custom communication architecture
• Take an existing platform and efficiently map the application onto it (data allocation and task mapping)
• Start with an existing platform and customize it to satisfy the requirements: add custom blocks and reuse components

In this presentation we will focus on the application mapping process for CMPs.

[Figure: mapping an application onto a CMP (CPU1..CPUn cores with SPM1..SPMn, DMA, off-chip memory) and onto a customized AMBA 2.0 platform (image/bitstream, DWT w/iDMA, data dispatcher and collector, BPC/BAC units with data FIFOs, controller/scheduler).]

SLIDE 5

Target Platforms (Chip Multiprocessors)

• Multiple low-power RISC cores: well suited for applications with high levels of parallelism
• DMA and SPM support
• Bus-based systems: still the most commonly used

[Figure: CMP with per-CPU SPM and I$, a shared bus, DMA, and RAM.]

SLIDE 6

Motivation

Typical mapping process:
• Platform definition
• Apply loop optimizations (e.g. iteration partitioning, unrolling, tiling)
• Generate the input task graph to the scheduler. What do we care about: energy? performance?
• Define task mapping and schedule
• Define data placement (over time and SPM size)
• Simulate/verify (ISS, CMP ISS?)

The whole process depends on the available resources.

[Figure: tasks T1..T5 (with T2 split into T2.1/T2.2) mapped onto CPU1/CPU2 of a CMP with SPM1/SPM2, DMA, and off-chip memory; task data sets placed over time and size.]

SLIDE 7

Motivation (Cont.)

This dependence shows the need to evaluate different optimizations, schedules, and placements for power and performance in a quick yet accurate fashion.

[Figure: the same mapping flow, now with tasks T1, T3..T5 and split tasks T2.1..T2.4 scheduled across CPU1..CPU4 of a CMP with SPM1..SPMn, DMA, and off-chip memory.]

SLIDE 8

Outline

• Introduction & Motivation
• CAM Overview
• Memory-aware Macro-Pipelining
• Customized Security Policy Generation
• Related Work
• Conclusion

SLIDE 9

CAM: Constraint-Aware Application Mapping for Embedded Systems

CAM takes an application (C/C++) and produces a data placement, schedule, and policies, balancing three interacting constraints:

Performance
• Task/kernel partitioning, macro-pipelining, early execution edges
• Fully utilize compute resources
• Increased parallelism also means increased vulnerabilities

Power
• Efficiently utilize memory resources
• Voltage/frequency scaling affects performance and limits the type of security mechanisms
• Data partitioning and distribution, data reuse, memory-aware scheduling

Security
• Very secure might mean very power hungry/slow
• Limited multiprocessor support; existing solutions are generic
• Policy generation and selective enforcement

SLIDE 10

CAM Overview

Front end: application pre-processing (CFG extraction, task graph generation, input model generation) and CMP template definition.

Middle end: task decomposition, data reuse analysis, early execution edge generation, task graph augmentation, and memory-aware macro-pipelining.

Back end: performance model generation and candidate evaluation. Do we meet the energy and performance constraints? Nope: let's see if increasing the degree of unrolling (in loops) helps, or try a different tile size.

We end up with massive task graphs, and it is a very tightly coupled process!

[Figure: flow from tasks (C1, K2, K3 in Task 1; C4, K5, C6 in Task 2) through the augmented task graph to kernels and their data (K2D, K3D, K5D) scheduled on CPU1..CPU3 and SPM1/SPM2 of candidate CMP templates with DMA and off-chip memory.]

SLIDE 11

Outline

• Introduction & Motivation
• CAM Overview
• Memory-aware Macro-Pipelining (ESTImedia '08, '09)
• Customized Security Policy Generation
• Related Work
• Conclusion

SLIDE 12

Application Domain Example (JPEG2000)

• Task set (T): DWT -> Quant. -> EBCOT
• Supports multiple levels of data parallelism: the pipeline can be instantiated per tile (t1, t2, ..., tn) and replicated across tile sets (tm, ..., tmn)

SLIDE 13

Inter-kernel Reuse Opportunities

• We target our approach to data-intensive streaming applications
• Task-level parallelism, data-level parallelism
• Examples: macroblock level (H.264); component level, tile level, code-block level (JPEG2000)
• Inter-kernel data reuse opportunities are often ignored
• Cache-based systems are not suitable to meet these types of applications

The three JPEG2000 front-end kernels (reconstructed from the slide):

    void dcls() {
        // input: B, G, R; output: B, G, R (DC level shift)
        for (i = 0; i < width; i++) {
            for (j = 0; j < height; j++) {
                B[i][j] = B[i][j] - pow(2, info->siz - 1);
                G[i][j] = G[i][j] - pow(2, info->siz - 1);
                R[i][j] = R[i][j] - pow(2, info->siz - 1);
            }
        }
    }

    void mct() {
        // input: B, G, R; output: Yr, Ur, Vr
        for (i = 0; i < width; i++) {
            for (j = 0; j < height; j++) {
                Yr[i][j] = ceil((float)(R[i][j] + (2 * (G[i][j])) + B[i][j]) / 4);
                Ur[i][j] = B[i][j] - G[i][j];
                Vr[i][j] = R[i][j] - G[i][j];
            }
        }
    }

    void tiling() {
        // input: Yr, Ur, Vr; output: n x tY, tU, tV
        for (i = 0; i < m; i += tw) {
            for (j = 0; j < n; j += th) {
                for (k = 0; k < tw; k++) {
                    for (l = 0; l < th; l++) {
                        tY[k][l] = Yr[i + k][j + l];
                        tU[k][l] = Ur[i + k][j + l];
                        tV[k][l] = Vr[i + k][j + l];
                    }
                }
                yCoeff = dwt(tY);
                yQ = quant(yCoeff);
                ebcot(yQ);
                // ... likewise for tU, tV
            }
        }
    }

11/5/2010 CASA '10

SLIDE 14

Access Patterns and Data Requirements

Our proposal: take kernels that produce large data streams and decompose them into smaller kernels producing smaller data streams.

• Task/kernel data requirements:
• DCLS: consumption = production = 3 MB
• MCT: same as DCLS, 3 MB
• Tiling: consumption same as MCT; production: 3 tiles at a time, 128x128 pixels (16 KB) each, a total of 16 KB x 3 tiles

The problem: data is read in and written out by each task. We cannot keep ALL the data in SPM and pass it to the next task.

SLIDE 15

Task Decomposition Through Transformations

• Idea: decompose each task into a series of kernels and compute nodes (non-kernels)
• Each kernel will ideally operate over a smaller set of data than the task itself

There is no dependence between the B, G, and R accesses in dcls(), so we can perform loop fission:

    void dcls() {
        // input: B, G, R; output: B, G, R
        for (i = 0; i < width; i++)
            for (j = 0; j < height; j++)
                B[i][j] = B[i][j] - pow(2, info->siz - 1);
        for (i = 0; i < width; i++)
            for (j = 0; j < height; j++)
                G[i][j] = G[i][j] - pow(2, info->siz - 1);
        for (i = 0; i < width; i++)
            for (j = 0; j < height; j++)
                R[i][j] = R[i][j] - pow(2, info->siz - 1);
    }

We can then tile the loops and generate smaller computational kernels:

    void dcls() {
        for (ii = 0; ii < m; ii += tw) {
            for (jj = 0; jj < n; jj += th) {
                for (i = ii; i < min(m, ii + tw); i++)
                    for (j = jj + i; j < min(n + i, jj + th + i); j++)
                        B[i][j - i] = B[i][j - i] - pow(2, info->siz - 1);
                // ...
            }
        }
    }

We want to tightly couple the computation with the data: each kernel consumes and produces chunks (tiles) of the different image components.

11/5/2010 CASA '10

SLIDE 16

Inter-task/Inter-kernel Dependencies

After tiling, both dcls and mct become sets of per-component tiled kernels (K1..K3 from dcls_tiled; K4..K6 from mct_tiled), with known inter-kernel and inter-task dependencies:

    void dcls_tiled() {
        for (ii = 0; ii < m; ii += tw)
            for (jj = 0; jj < n; jj += th)
                for (i = ii; i < min(m, ii + tw); i++)
                    for (j = jj + i; j < min(n + i, jj + th + i); j++)
                        B[i][j - i] = B[i][j - i] - pow(2, info->siz - 1);
        // ... likewise R[i][j-i] -= pow(2, info->siz - 1);
        //              G[i][j-i] -= pow(2, info->siz - 1);
    }

    void mct_tiled() {
        for (ii = 0; ii < m; ii += tw)
            for (jj = 0; jj < n; jj += th)
                for (i = ii; i < min(m, ii + tw); i++)
                    for (j = jj + i; j < min(n + i, jj + th + i); j++)
                        Yr[i][j - i] = ceil((float)(R[i][j - i] + (2 * (G[i][j - i])) + B[i][j - i]) / 4);
        // ... likewise Ur[i][j-i] = B[i][j-i] - G[i][j-i];
        //              Vr[i][j-i] = R[i][j-i] - G[i][j-i];
    }

[Figure: kernels K1..K3 (dcls_tiled) feeding K4..K6 (mct_tiled), with known kernel dependencies and task dependencies marked.]

SLIDE 17

Early Execution Edge Generation and Exploitation

Kernel K2a can start as soon as its dependencies (kernel iterations K1a, K1b, K1c) finish their execution; it does not have to wait for all of K1.

• Original task graph and schedule: K1 -> K2, executed back to back
• CAM's augmented task graph and pipelined kernels: K1a..K1n overlapped with K2a..K2m

Result: higher throughput and better memory utilization!

SLIDE 18

Tradeoff Between Power and Performance

• The cost function (power vs. performance vs. both) affects total latency and the number of off-chip accesses
• We need to efficiently walk the search space for the right power/performance combination

[Figure: two charts, execution cycles (millions) and off-chip data accesses, for 4/8/16/32 CPUs over Tiled_4KB/8KB/16KB with SPM_4KB/8KB/16KB configurations, each in (L), (M), and (B) cost-function variants.]

SLIDE 19

Exploration Search Space

• Each data point represents a configuration considered (pipelined tasks/degree of unrolling, SPM size, number of CPUs); performance is in billions of cycles
• The closer to the center of the spectrum, the better the proposed solution on the given platform
• To find the best solution possible, we need a good cost function to differentiate between good and bad candidates

Lines: # of CPUs for the given configuration. Data point (axis): size of SPM and tasks considered for the given configuration.

SLIDE 20

Outline

• Introduction & Motivation
• CAM Overview
• Memory-aware Macro-Pipelining
• Customized Security Policy Generation (Embedded Systems Security '10)
• Related Work
• Conclusion

SLIDE 21

Secure Software Execution on Chip Multiprocessors

• Many cores, many memories, many tasks per application, many shared resources
• CMPs allow applications to run concurrently, with parallelism within applications
• We need to run a trusted application (many tasks) despite possible spy processes running on a separate core and compromised tasks from the same application

[Figure: CMP (CPU1..CPUn cores with SPMs, DMA, off-chip memory) running Task C and Task D tasks t1, t2 on CPUi.]

SLIDE 22

Current Approaches to Guarantee Secure Software Execution

• Side-channel attacks are possible in CMP systems through resource sharing
• Software exploits leverage the use of C legacy code
• Most current secure platforms assume single-processor models (TPM-based models)

Example: the Flicker secure execution model eliminates resource sharing during the execution of sensitive code by context switching, halting the environment, and building a trusted environment. The problem: not power efficient, not performance efficient, but secure.

We need a means to provide a trusted environment for secure execution without sacrificing performance and power.

SLIDE 23

Creating a Trusted Environment Through Selective Resource Sandboxing

• Load a policy P that partitions the CMP into a trusted environment (LOCKDOWN) and an untrusted environment
• Context switch tasks with minimum context-switch (CX) penalty

Example workload on CPU0..CPU3 (duration/CX penalty, in ms): T1 (250)/CX 25, T2 (150)/CX 50, T3 (250)/CX 75, T4 (175)/CX 50, DRM (450)/CX 100.

Task start delay (ms), sandboxing vs. traditional halt:
• T1: 150 vs. 550
• T2: 250 vs. 575
• T3: (halt) 550
• T4: (halt) 500
• DRM: 50 vs. 125
• Average: 90 ms (sandboxing) vs. 460 ms (halt)

The HALT approach stops everything; selective sandboxing instead locks down only the resources the trusted tasks (e.g. DRM) need, leaving the rest of the CMP as an untrusted environment for T1..T4.

SLIDE 24

CAM: Security as a Constraint

Goal: customize security policies for different system requirements (energy savings, performance, limited CPU/memory resources).

The front end and middle end are as before (application pre-processing, CMP template definition, task decomposition, data reuse analysis, early execution edge generation, task graph augmentation). The back end now performs secure policy generation (schedule + mapping), driven by per-task security requirements (e.g. tasks t1/t2 with a secure local buffer, a secure shared buffer, and an unsecure buffer).

• Do we meet the energy and power constraints given the power/performance requirements? Nope: generate a policy using more/fewer resources, or re-define the CMP, until done creating policies
• Output: a set of policies (Policy 1, Policy 2, Policy 3), each mapping tasks to processors (P1..P4) and memories (M1, M2) with its own latency/power point
slide-25
SLIDE 25

Policy Enforcement through On‐chip Sandboxing Sandboxing

25

A A A

Initial Queue Initial Load Executing A1 : Policy 2

A1 A2 A3

μP μP μP μP μP μP μP μP m m m m m m m m Exe c μP μP μP μP μP μP μP μP m m m m m m m m c Exe c Exe c

A1 A2 A3

P1 P1 P2 P1 P2 P3 P4 P1 P1 P2 P1 P2 P3 P4 P1 P2 P1 P1 P2 P1 P2 P3

μP μP μP μP μP μP μP μP

Executing A2 : Policy 2

Policy 3 Policy 2 Policy 1 M1 M1 M1 M2 M2 M2

Latency Latency Power Latency Power Power Power Power Latency Latency

Policy 3 Policy 2 Policy 1 M1 M1 M1 M2 M2 M2

Latency Latency

Power

Latency Power Power Power

Latency

M3 Policy 3 Policy 2 Policy 1 M1 M1 M1 M2

Latency Latency Power Latency Power Power Power Latenc y Latenc y

m m m m m m m m μP μP μP μP μP μP μP μP

Executing A3 : Policy 1

Policy Selection High Load Low Load On Battery Policy 1 Policy 1 /Policy 2

m m m m m m m m

11/5/2010 CASA '10

On Power Cord Policy 2 Policy 3

SLIDE 26

Performance Effects of PoliMakE

• Exploration allows us to find the right level of sharing and resource partitioning
• No further significant improvement is found after a 4-core CMP; after 4 CPUs (2 and 2), performance is not improved as much
• Compared to the halt approach, PoliMakE can drastically improve performance

[Figure: normalized execution time across platform configurations (# CPUs x 8 KB SPMs, 1_8 through 32_8); PoliMakE vs. Halt execution time (billions of cycles) for JPEG and DRM with 1, 2, 4, 8, and 16 tasks.]

SLIDE 27

Outline

• Introduction & Motivation
• CAM Overview
• Memory-aware Macro-Pipelining
• Customized Security Policy Generation
• Related Work
• Conclusion

SLIDE 28

Related Work

• Data allocation
• Data Reuse Analysis Technique for Software-Controlled Memory Hierarchies [Issenin, DATE '04]
• Multiprocessor System-on-Chip Data Reuse Analysis for Exploring Customized Memory Hierarchies [Issenin, DAC '06]
• Memory Coloring: A Compiler Approach for Scratchpad Memory Management [Li et al., PACT '05]
• Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications [Panda, DATE '97]
=> We exploit the application's inter/intra-kernel data reuse opportunities to minimize data transfers, thereby reducing dynamic power consumption.

• Loop scheduling
• Loop Scheduling with Complete Memory Latency Hiding on Multi-core Architecture [C. Xue, ICPADS '04]
• SPM Conscious Loop Scheduling for Embedded Chip Multiprocessors [L. Xue, ICPADS '06]
=> We exploit the application's parallelism, pipelining, and data-reuse opportunities by applying different source-level transformations.

• Pipelining/scheduling heuristics
• Integrated Scratchpad Memory Optimization and Task Scheduling for MPSoC Architectures [V. Suhendra et al., CASES '06]
• Pipelined Data Parallel Task Mapping/Scheduling Technique for MPSoC [Yang, H. et al., DATE '09]
• Exploiting Coarse-grained Task, Data, and Pipeline Parallelism in Stream Programs [Gordon et al., ASPLOS '06]
=> We distribute computations with the ultimate goal of reducing unnecessary data transfers and increasing throughput.

SLIDE 29

Related Work (Cont.)

• Pure software solutions (complementary): CCured [24], StackGuard [10], SmashGuard [25], PointGuard [26]; can be complementary, but offer no side-channel protection
• Hardware assisted: Patel et al. [27], Zambreno et al. [28], Arora et al. [30]
• Platforms (complementary): ARM TrustZone [33], SECA [8], AEGIS [31]; full platform support for secure software execution might be overkill in cases where security is limited to only a few applications
• Halt/Execute: Flicker; no energy/performance awareness, nor a means to map an application to the platform (left to the programmer)
• Isolation: IBM CELL Vault, Agarwal et al. [12]; current isolation approaches do not offer efficient (power/performance) means to run applications on multiprocessors

To the best of our knowledge, we are the first to propose the idea of customized policy making to guarantee secure software execution for CMPs.

SLIDE 30

Outline

• Introduction & Motivation
• CAM Overview
• Memory-aware Macro-Pipelining
• Customized Security Policy Generation
• Related Work
• Conclusion

SLIDE 31

Conclusion

• Discussed CAM, a software mapping and scheduling methodology for multimedia and data-intensive applications
• Progressively transforms the application's code to discover and exploit inter-kernel data reuse and parallelism opportunities
• Tightly couples transformations with data reuse analysis, scheduling, and mapping; tightly couples computation with its data
• Explores, generates, and exploits customized policy making to guarantee secure software execution
• Current enhancements include reliability awareness and a move towards heterogeneous MPSoCs and CGRAs

SLIDE 32

Thank you!

SLIDE 33

Power and Performance Improvements over Standard CMP Application Mapping Approaches

• Clustering helps reduce the number of unnecessary memory transfers as well as improve throughput
• In some cases clustering hurts performance (i.e. the 8-CPU with 4 KB SPMs configuration)
• There are cases where clustering may lead to less power reduction (i.e. the 4-CPU with 32 KB SPMs configuration)

[Figure: performance improvement (%) and power savings (%) over the base SPM mapping, clustered vs. non-clustered. X-axis: platform configuration, SPM size by # CPUs (16x4, 16x8, 8x4, 8x8, 8x16, 4x4, 4x8, 4x16, 4x32); Y-axis: improvement percentage.]

SLIDE 34

Memory-aware Scheduling and Early Execution Edge Exploitation

Progressive comparison:
• Standard with the base case (initial task graph: A, B, C)
• Standard with task partitioning (A, B1..B4, C1..C4)
• After analyzing when tasks can start (early execution edges + task partitioning)
• Memory-aware task scheduling

Our approach provides: higher throughput, load balancing, and savings in off-chip memory transfers.

[Figure: progressive power savings and performance improvement (%) for task partitioning, early execution, and memory-aware scheduling on an 8 KB SPM / 8-CPU platform with 16, 32, and 64 tasks.]

SLIDE 35

Early Execution Edges

• Data is propagated through a series of filters: DWT -> Quant. -> EBCOT, with inter-task reuse
• Quantization operates over individual subbands (HH1, HH2, etc.); EBCOT operates over codeblocks from the same subband
• Question: what can we do to improve throughput?

Standard approach: Quantization waits for the DWT to finish.

Early execution edges: Quantization can be split. The live ranges of HH1, LH1, and HL1 are up after the first decomposition level, so Quantization 2 can start as soon as the DWT produces subbands LH1 and HH1, while Quantization 1 handles HL2, HH2, LH2, and LL2.

Procedure to obtain early execution edges:
• Obtain the list of independent data sets (HH1, etc.)
• Calculate the live range for each data set
• Find split points for tasks and split them

[Figure: augmented task graph DWT -> Quant. 1 <0, declevel, 3> and Quant. 2 <0, declevel, 1> over subbands HH1, HL1, LH1, HL2, HH2, LH2, LL2.]

SLIDE 36

Performance Model Generation and Evaluation

• CPU LUTs give per-instruction cycle counts and power:

    instr   cycles  min_power  max_power  avg_power  switching_power
    XORI    4       2.48E-03   3.66E-03   3.07E-03   7.19E-04
    MULI    7       4.49E-03   8.07E-03   6.28E-03   2.16E-03
    MFSPR   4       2.51E-03   2.56E-03   2.54E-03   1.83E-04

• GCC and an annotation/SystemC model generator turn the functional model into an annotated model: each statement is followed by wait()/cycle/power accounting, e.g.

    a = a + 3;
    #if PERF_MOD
    wait(ADD);
    #endif
    cycles += ADD;
    #if POWER_MOD
    uW += P_ADD;
    #endif
    if (a < b) {
        #if PERF_MOD
        wait(SLT + BNE + J);
        #endif
        cycles += SLT + BNE + J;
        #if POWER_MOD
        uW += P_SLT + P_BNE + P_J;
        #endif
        d = A[a];
        #if PERF_MOD
        wait(LW);
        #endif
        cycles += LW;
        #if POWER_MOD
        uW += P_LW;
        #endif
    }

• A SystemC ISS provides the initial profile; the platform DB, mapping info, and schedule info drive the evaluation.

SLIDE 37

Finding the Right Degree of Unrolling

• Fully unrolling the execution of each tile can generate the maximum amount of parallelism opportunities
• Partial unrolling provides less parallelism as well as decreased dependencies
• Both cases can increase or decrease performance, so we need to explore the design space to find the right combinations (Pareto)

SLIDE 38

Memory-Aware Scheduling and Pipelining

• Standard task scheduling (P0/P1): DWT, Q1..Q4, EBCOT1..EBCOT4 in order
• After analyzing when tasks can start (early execution edges): allows for further optimizations and increases throughput
• Memory-aware task scheduling: minimizes off-chip memory accesses and DMA transfers
• Software pipelining (pipelining with unrolling and memory awareness): increased throughput and reduced memory transfers in steady state

SLIDE 39

Pipelining Considering Unrolling

We need to explore different schedules/mappings in order to find the right unrolling/scheduling combinations (software pipelining):

• Scheduling 1 task set at a time (unrolling degree of 1): too many idle slots
• Scheduling 2 task sets at a time (unrolling degree of 2): a more compact schedule
• Scheduling 3 task sets at a time (unrolling degree of 3): worst performance (P2 has longer idle slots) and more idle slots than scheduling 2 task sets
• Scheduling 4 task sets at a time (unrolling degree of 4)

If the mapping is not schedulable within the MII (minimum initiation interval), retiming is done for all possible tasks.

SLIDE 40

Policy Generation Runtime

• Even if the number of tasks increases by 14x, the policy generation runtime grows by less than 2x
• Configurations: 8_4, 16_4, 16_8, 8_8

[Figure: runtime (seconds, up to ~250) vs. number of tasks (12, 29, 63, 163, 179) for each configuration.]