


November 7, 2001 Sorin Manolache 1

Specific Issues Related to Embedded Processor Architectures

Sorin Manolache sorma@ida.liu.se


Outline

  • General approach
  • SIMD example
  • VLIW example


Programmable Architectures of Interest (1)

  • Microcontrollers (MCU)

– CISC
– High code density
– Reduced computational resources

  • RISC processors

– Large file of GP registers
– Orthogonal instruction set

  • DSP

– DSP-specific data path architectures (irregular)


Programmable Architectures of Interest (2)

  • Multimedia processors

– VLIW
– High performance
– Low code density
– High power consumption

  • Application-Specific Processors (ASIP)

– Application-specific data paths
– Sometimes customizable (reg. file size, reg. width, word width)


Optimization Levels

  • Source level optimization
  • Optimized instruction set mapping
  • Assembly level optimizations


Source Level Optimization

  • Architecture independent
  • Standard optimizations

– Constant folding
– Common subexpression elimination
– Jump optimization

  • Address code transformations

– Attempt to optimize address generation for array indexes (address generation can account for up to 50% of the code)

  • Loop transformations

– Loop unrolling
– Loop folding (software pipelining)

  • Function inlining
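As a concrete illustration of the standard optimizations above, constant folding can be sketched as a recursive pass over an expression tree. This is a minimal sketch, not from the slides; the tuple-based AST encoding is an assumption for illustration.

```python
# Hypothetical mini-pass: constant folding on a tiny expression AST.
# An expression is either a leaf (number or variable name) or a
# tuple (op, left, right). Encoding is illustrative only.

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Recursively replace operator nodes whose operands are both
    constants with the computed constant."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: constant or variable
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return OPS[op](lhs, rhs)         # both operands known: fold
    return (op, lhs, rhs)

# Only the constant subexpression is folded; the variable survives:
print(fold(("*", "x", ("+", 2, 3))))     # ('*', 'x', 5)
```

A real source-level pass would of course work on the compiler's own IR rather than tuples, but the bottom-up structure is the same.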


Optimized Instruction Set Mapping

  • Mapping of the machine-independent intermediate representation (IR) to matching instructions

  • Consider:

– Special-purpose registers
– Complex instruction patterns (MAC, SIMD)
– Inter-instruction constraints (resource)

  • Phases:

– Tree pattern matching (mapping of “functionality”)
– Register allocation (mapping of data)
– Instruction scheduling (multiprocessor scheduling, with dependency and communication constraints)


Tree Pattern Matching (1)

[Figure: two expression trees built from +, – and * nodes, covered by MAC, SUB and ADD instruction patterns]


Tree Pattern Matching (2)

  • Optimal cover in linear time with dynamic programming
  • There are compiler generators for such problems
  • Limitations:

– Only for data-flow trees (DFTs), not data-flow graphs (DFGs)
– Difficult to apply to irregular data paths
– Does not apply to SIMD processors
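The dynamic-programming cover can be sketched roughly as follows. The pattern set (ADD, MUL, and a MAC that fuses a multiply feeding an add) and the unit costs are hypothetical, chosen only to show how overlapping patterns compete at each node.

```python
# Hypothetical DP tree cover: choose instruction patterns (ADD, MUL,
# MAC) that cover an expression tree at minimum total cost.
# Expressions are tuples (op, left, right); leaves are names.

def cover(node):
    """Return (cost, chosen_patterns) for the cheapest cover of node."""
    if not isinstance(node, tuple):
        return 0, []                      # leaf (register/constant): free
    op, lhs, rhs = node
    lcost, lpat = cover(lhs)
    rcost, rpat = cover(rhs)
    if op == "*":
        return 1 + lcost + rcost, lpat + rpat + ["MUL"]
    # op == "+": plain ADD always applies ...
    best = (1 + lcost + rcost, lpat + rpat + ["ADD"])
    # ... but a MAC pattern may absorb a multiply on the left operand
    if isinstance(lhs, tuple) and lhs[0] == "*":
        c1, p1 = cover(lhs[1])
        c2, p2 = cover(lhs[2])
        best = min(best, (1 + c1 + c2 + rcost, p1 + p2 + rpat + ["MAC"]))
    return best

# a*b + c: MUL followed by ADD would cost 2; a single MAC costs 1
print(cover(("+", ("*", "a", "b"), "c")))   # (1, ['MAC'])
```

Because each subtree's best cover is computed once, the whole tree is covered in time linear in its size, which is exactly why the technique breaks down once the IR is a DAG rather than a tree.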


Register Allocation and Instruction Scheduling

  • Graph colouring problems (heuristics)
  • Simulated annealing, ILP, CLP
  • Cost function: e.g. the number of register spills
  • Better results when applied concurrently with instruction scheduling (phase coupling)

  • Mutation scheduling, list scheduling
  • Algebraic transformations on operations
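A minimal sketch of the graph-colouring view: virtual registers that interfere cannot share a physical register, and a node that finds no free colour is spilled, which feeds the spill-count cost function mentioned above. The greedy most-constrained-first heuristic and all names are illustrative, not the slides' exact algorithm.

```python
# Hypothetical greedy colouring of an interference graph.
# interference: vreg -> set of vregs live at the same time.

def allocate(interference, k):
    """Map each virtual register to a colour 0..k-1 or 'spill'."""
    colours = {}
    # heuristic order: most-constrained (highest degree) first
    for v in sorted(interference, key=lambda v: -len(interference[v])):
        used = {colours[n] for n in interference[v] if n in colours}
        free = [c for c in range(k) if c not in used]
        colours[v] = free[0] if free else "spill"
    return colours

graph = {"a": {"b", "c", "d"}, "b": {"a", "c"},
         "c": {"a", "b"}, "d": {"a"}}
alloc = allocate(graph, 2)          # only 2 physical registers
print(sum(c == "spill" for c in alloc.values()))   # 1 (one vreg spilled)
```

Production allocators (Chaitin-style colouring, or the ILP/CLP formulations the slide lists) replace this greedy loop with search, but the cost they minimise is the same spill count.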


Assembly Level Optimization

  • Goals:

– Memory access optimization
– Instruction scheduling optimization

  • Consider:

– Memory architecture (many DSPs have two memory banks)
– Address generation hardware
– Instruction-level parallelism

  • Fewer pipeline stalls
  • Speed-ups of 24% reported


Even More Problems

  • Compiling for low power (Siemens reported a 60% power saving for a mobile phone in stand-by mode, achieved through software alone!)

  • Retargetability
  • Industrial trends: VLIW

Compilation for SIMD (1)

  • The authors take an ILP approach
  • A grammar is written for tree covers
  • A derivation in this grammar is a possible cover
  • A cost is associated with each rule

DFT:    store(REG, plus(REG, OFFS))
REG:    plus(REG, REG)
REG_LO: plus(REG_LO, REG_LO)
REG_HI: plus(REG_HI, REG_HI)

  • The same cost is associated with the three rules that cover an ADD
  • The problem is to find a derivation with optimal cost


Compilation for SIMD (2)

  • Constraints:

– Each node in the DFT has to be covered by exactly one machine operation
– The result destination of an operation has to be the same as the operand source of its consumers
– Common subexpressions
– SIMD selection: same operation, correct alignment in memory, correct placement relative to the SIMD register, no dependency
– Data dependency constraints
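The SIMD-selection constraint can be sketched as a pairwise mergeability check over two scalar operations. The operation encoding, field names, and the assumption of a 2-way SIMD unit over 4-byte lanes are hypothetical, used only to make the three sub-conditions concrete.

```python
# Hypothetical check for merging two scalar operations into one
# 2-way SIMD operation: same opcode, operands at adjacent aligned
# addresses, and no dependency between the two operations.

def can_pack(op1, op2, width=4):
    """op = (opcode, addr, defs, uses); True if SIMD-mergeable."""
    code1, addr1, defs1, uses1 = op1
    code2, addr2, defs2, uses2 = op2
    same_op = code1 == code2
    # low half must sit on a SIMD-aligned address, high half right after
    aligned = addr1 % (2 * width) == 0 and addr2 == addr1 + width
    # neither operation may define a value the other one uses
    no_dep  = defs1.isdisjoint(uses2) and defs2.isdisjoint(uses1)
    return same_op and aligned and no_dep

lo = ("add", 0, {"r0"}, {"a0"})
hi = ("add", 4, {"r1"}, {"a1"})
print(can_pack(lo, hi))                          # True
print(can_pack(lo, ("mul", 4, {"r1"}, {"a1"})))  # False: opcodes differ
```

In the ILP formulation of the slides, each such mergeable pair would become a 0/1 decision variable; the check above only decides which pairs are candidates at all.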


Code Generation for Irregular Data Paths

  • The authors take a CLP approach
  • Representation by means of a factored machine operation:

(Op, R, [O1, O2,..., On], ERI, Cons)

  • Works even if Op is not available on the target processor (algebraic transformations)
  • Able to model chained operations (MACs) and a large class of restrictions on instruction-level parallelism
  • For the most complex models, performance improvements of up to 50%, at the expense of sometimes 24 hours of compilation


Compilation for VLIW

  • Consider only the partitioning (mapping) and scheduling phases (code selection is considered to be done)
  • Done concurrently, by feeding back the results of the scheduling phase to a new iteration of the mapping phase
  • Mapping by means of a simulated annealing approach
  • Scheduling based on a list scheduling approach

The Problems Come from the Architecture

[Figure: clustered VLIW data path with functional units L1, S1, M1, D1 attached to register file A, units L2, S2, M2, D2 attached to register file B, cross paths X1 and X2, and shared data and address buses]


Scheduling Algorithm (1)

mark all nodes as not scheduled
S = empty_set
while there are unscheduled nodes
    m = NEXT_READY_NODE()
    S = SCHEDULE_NODE(S, m, P)
    mark m as scheduled

  • NEXT_READY_NODE selects the node that can be scheduled the earliest
  • Ties are broken by selecting the node that has the smallest ALAP time (first_fail in CLP)
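An executable sketch of this selection rule, simplified to a single functional unit with unit latencies; the DAG encoding and the ALAP values are illustrative, not the slides' full clustered machine model.

```python
# Hypothetical list scheduler: repeatedly pick the ready node with
# the earliest possible start cycle, breaking ties by smallest ALAP
# time. One functional unit, unit latency per operation.

def list_schedule(preds, alap):
    """preds: node -> set of predecessors; returns node -> cycle."""
    schedule = {}
    while len(schedule) < len(preds):
        # ready = unscheduled nodes whose predecessors are all placed
        ready = [n for n in preds
                 if n not in schedule and preds[n] <= schedule.keys()]

        def earliest(n):  # one cycle after the latest predecessor
            return max((schedule[p] + 1 for p in preds[n]), default=0)

        # primary key: earliest start; tie-break: smallest ALAP time
        m = cs = None
        m = min(ready, key=lambda n: (earliest(n), alap[n]))
        cs = earliest(m)
        while cs in schedule.values():   # single unit: one op per cycle
            cs += 1
        schedule[m] = cs
    return schedule

preds = {"a": set(), "b": set(), "c": {"a"}, "d": {"b", "c"}}
alap  = {"a": 0, "b": 1, "c": 1, "d": 2}
print(list_schedule(preds, alap))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

In the example, "a" and "b" are both ready at cycle 0; "a" wins the tie because its ALAP time is smaller, matching the first_fail intuition of committing the most time-critical node first.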


Scheduling Algorithm (2)

cs = EARLIEST(m) - 1
repeat
    cs = cs + 1
    fm = GET_NODE_UNIT(m, cs, P)
    if (fm == 0) continue
    if (m has an argument in a diff. cluster)
        CHECK_ARG_TRANSFER()
        if (at least one transfer impossible) continue
        SCHEDULE_TRANSFERS()
until m is scheduled
if (m is a LOAD) DETERMINE_LOAD_PATH(m)
if (m is a CSE) INSERT_FORWARD(S, m)


Scheduling Algorithm (3)

  • Use copies
  • If no copies are available, use the cross path rather than inserting a move instruction in an earlier step
  • Exploit commutativity
  • The approach works best when the ratio of the length of the critical path to the number of instructions is low


Conclusions

  • Generally a difficult problem
  • Additionally, the difficulty depends very much on the particular problem instance => difficult to create a retargetable compiler
  • Heuristics, but which? General (simulated annealing)? Problem-specific (CP)?