Extraction of Efficient Instruction Schedulers from Cycle-true Processor Models

Oliver Wahlen, Manuel Hohenauer, Rainer Leupers, Gerd Ascheid, Heinrich Meyr (RWTH Aachen); Xiaoning Nie (Infineon Technologies); Gunnar Braun (CoWare, Inc.)


SLIDE 1

Institute for Integrated Signal Processing Systems

Extraction of Efficient Instruction Schedulers from Cycle-true Processor Models

Oliver Wahlen, Manuel Hohenauer, Rainer Leupers, Gerd Ascheid, Heinrich Meyr (RWTH Aachen)
Xiaoning Nie (Infineon Technologies)
Gunnar Braun (CoWare, Inc.)

SLIDE 2

Motivation: Why ASIPs?

Application-Specific Instruction-Set Processors

Combine advantages of processors and ASICs:

  • Provide system programmability and reconfigurability
  • Good tradeoff: performance/power consumption/area
  • Can easily be integrated into embedded systems

[Figure: efficiency (MIPS/Watt) versus flexibility for ASICs, ASIPs (domain specific), and GPPs]

SLIDE 3

Solution: LISA Processor Design Platform

LISA: Language for Instruction-Set Architectures

[Figure: LISA 2.0 architecture specification with four phases: architecture exploration, architecture implementation, software application design, and system-on-chip integration and verification. Tools shown: C compiler (research), assembler, linker, simulator, profiler, and simulator/debugger. Products: EDGE™ Processor Designer, RIM™ Software Designer, HUB™ System Integrator.]

http://www.coware.com

SLIDE 5

Architecture Exploration Loop

[Figure: architecture exploration loop. Elements: LISA processor model; automatic generation of C compiler, assembler & linker, and simulator & profiler; application .c and application .asm; check "design criteria met?" (no: manual changes; yes: VHDL model).]

Automatic tool generation:

  • Speeds up design cycles
  • Eliminates consistency problem

C compiler in the loop:

  • Reduction in implementation and verification time
  • IP reuse

SLIDE 6

Compiler Structure and Generation

[Figure: the CoSy Compiler Development System translates .c to .asm. A C front-end engine builds the IR, optimization engines transform it, and architecture-specific backend engines (instruction selector, register allocator, scheduler, emitter) produce the assembly. The compiler backend description (.cgd) is generated semiautomatically from the LISA processor model.]

SLIDE 7

Scheduler Generation

[Figure: CoSy Compiler Development System (.c, C front-end engine, IR, optimization engines, instruction selector, register allocator, .asm) with a compiler backend description (.cgd) generated semiautomatically from the LISA processor model. The scheduler and emitter appear in the postpass tool lpacker. Related approaches cited: [EXPRESSION, PEAS-III].]

SLIDE 8

Scheduler Description

Reservation Tables [O. Wahlen, M. Hohenauer, R. Leupers, H. Meyr, 2003]

    cycle    MUL_op    ALU_op
    0
    1
    2
    3        EX_mul    EX_alu
    4        EX_mul

Elimination of Structural Hazards

Example:

    0: MUL R1,R2,R3
    1: NOP
    2: MUL R4,R5,R6
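
The structural-hazard check implied by the reservation table can be sketched as follows. This is a minimal illustration, not the lpacker implementation: the dictionary layout and the function names are hypothetical, and the table entries assume that MUL_op occupies EX_mul in cycles 3 and 4 and ALU_op occupies EX_alu in cycle 3, as read from the slide.

```python
# Reservation tables: for each operation class, the pipeline resources it
# occupies in each cycle relative to its issue cycle (values as read from
# the slide; the data layout itself is an assumption for illustration).
RESERVATION = {
    "MUL_op": {3: {"EX_mul"}, 4: {"EX_mul"}},
    "ALU_op": {3: {"EX_alu"}},
}


def conflicts(first_op, second_op, distance):
    """Return True if second_op, issued `distance` cycles after first_op,
    needs a resource in a cycle in which first_op still occupies it."""
    for cycle, resources in RESERVATION[second_op].items():
        # Express the second operation's cycle relative to the first issue.
        busy = RESERVATION[first_op].get(cycle + distance, set())
        if resources & busy:
            return True
    return False


def min_issue_distance(first_op, second_op):
    """Smallest issue distance >= 1 without a structural hazard."""
    distance = 1
    while conflicts(first_op, second_op, distance):
        distance += 1
    return distance


if __name__ == "__main__":
    print(min_issue_distance("MUL_op", "MUL_op"))
    print(min_issue_distance("MUL_op", "ALU_op"))
```

Under these assumptions two MUL operations must be issued at least two cycles apart, which matches the NOP between the two MUL instructions in the example, while an ALU operation can directly follow a MUL.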

SLIDE 9

Scheduler Description

Latency Tables and Reservation Tables [O. Wahlen, M. Hohenauer, R. Leupers, H. Meyr, 2003]

Latency table (tabs: RAW, WAW, WAR):

               MUL_in    ALU_in
    MUL_out    2         2
    ALU_out    1         1

Elimination of Dataflow Hazards

Example:

    0: MUL R3,R1,R2
    1: NOP
    2: ADD R5,R3,R4

Reservation table:

    cycle    MUL_op    ALU_op
    0
    1
    2
    3        EX_mul    EX_alu
    4        EX_mul

Elimination of Structural Hazards

Example:

    0: MUL R1,R2,R3
    1: NOP
    2: MUL R4,R5,R6
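
How a latency table translates into NOPs can be sketched as below. The sketch assumes that the table shown is the RAW tab and that rows are producer outputs and columns are consumer inputs; the names `RAW_LATENCY` and `nops_between` are hypothetical.

```python
# RAW latency table as read from the slide: (producer output, consumer input)
# -> minimum issue distance in cycles. Treating the visible tab as the RAW
# table is an assumption.
RAW_LATENCY = {
    ("MUL_out", "MUL_in"): 2,
    ("MUL_out", "ALU_in"): 2,
    ("ALU_out", "MUL_in"): 1,
    ("ALU_out", "ALU_in"): 1,
}


def nops_between(producer_out, consumer_in):
    """NOPs needed when the consumer would otherwise be the next instruction."""
    return max(0, RAW_LATENCY[(producer_out, consumer_in)] - 1)


if __name__ == "__main__":
    # MUL R3,R1,R2 followed by ADD R5,R3,R4
    print(nops_between("MUL_out", "ALU_in"))
```

With a latency of 2 between MUL_out and ALU_in, one NOP is required, as in the MUL/NOP/ADD example.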

SLIDE 10

LISA Description

    OPERATION reg_alu_instr IN pipe.ID
    {
        DECLARE {
            GROUP Opcode = { ADD || SUB };
            GROUP Rs1, Rs2, Rd = { gp_reg };
        }
        CODING { Opcode Rs2 Rs1 Rd 0b0[10] }
        SYNTAX { Opcode ~" " Rd ~" " Rs1 ~" " Rs2 }
        BEHAVIOR {
            PIPELINE_REGISTER(pipe,ID/EX).src1 = GP_Regs[Rs1];
            PIPELINE_REGISTER(pipe,ID/EX).src2 = GP_Regs[Rs2];
            PIPELINE_REGISTER(pipe,ID/EX).dst = Rd;
        }
        ACTIVATION { Opcode }
    }

[Figure: operation hierarchy. decode leads to alu and control (among others); alu covers imm_alu_instr and reg_alu_instr; reg_alu_instr is composed of Opcode (ADD, SUB, ...), Rd, Rs1, and Rs2.]

SLIDE 11

Scheduler Generation: Operation Schedule

[Figure: activation DAG from main through fetch and decode to register_alu_instr (ADD, SUB) and imm_alu_instr (ADDI, SUBI), then ALU and alu_wb. Register file R is annotated with its read ports (2 x read) and write port (1 x write). The operation schedule table lists main, fetch, decode, register_alu_instr, ADD, and alu_wb with their cycles and resource usage.]

Operation schedule for ADD (recoverable rows):

    Cycle    Operation             Resource usage
    1        register_alu_instr    2 x read of GP-register file
    2        ADD
    3        alu_wb                1 x write of GP-register file
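
The per-resource access cycles used in the latency calculation can be collected from such an operation schedule. The following is a small sketch under stated assumptions: the tuple layout, the resource name "GPR", and the function name `access_cycles` are hypothetical, and only the three rows for register_alu_instr, ADD, and alu_wb are modeled.

```python
# Operation schedule for ADD as (cycle, operation, resource accesses).
# Cycle numbers and accesses follow the slide; the data layout is an
# illustrative assumption.
ADD_SCHEDULE = [
    (1, "register_alu_instr", [("GPR", "read"), ("GPR", "read")]),
    (2, "ADD", []),
    (3, "alu_wb", [("GPR", "write")]),
]


def access_cycles(schedule, resource, kind):
    """Sorted cycles in which `resource` is accessed with the given kind."""
    return sorted({
        cycle
        for cycle, _operation, accesses in schedule
        for res, access_kind in accesses
        if res == resource and access_kind == kind
    })


if __name__ == "__main__":
    print(access_cycles(ADD_SCHEDULE, "GPR", "read"))
    print(access_cycles(ADD_SCHEDULE, "GPR", "write"))
```

The first read cycle and last write cycle are then the minimum and maximum of these lists.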

SLIDE 12

Latency Calculation

Latencies between two instructions i and j (R is a processor resource):

    Lraw(i, j) = max_R( last_write_cycle(j, R) - first_read_cycle(i, R) + 1 )

[Figure: GPR accesses of ADD R1, R2, R3 followed by SUB R4, R1, R5, shown per instruction and on a common time axis. The last write cycle is 3 and the first read cycle is 1.]

    Lraw = 3 - 1 + 1 = 3

SLIDE 13

Latency Calculation

Latencies between two instructions i and j (R is a processor resource):

    Lraw(i, j) = max_R( last_write_cycle(j, R) - first_read_cycle(i, R) + 1 )
    Lwaw(i, j) = max_R( last_write_cycle(j, R) - first_write_cycle(i, R) + 1 )

[Figure: GPR accesses of ADD R1, R2, R3 followed by SUB R1, R4, R5, shown per instruction and on a common time axis. Both the last write cycle and the first write cycle are 3.]

    Lwaw = 3 - 3 + 1 = 1

SLIDE 14

Latency Calculation

Latencies between two instructions i and j (R is a processor resource):

    Lraw(i, j) = max_R( last_write_cycle(j, R) - first_read_cycle(i, R) + 1 )
    Lwaw(i, j) = max_R( last_write_cycle(j, R) - first_write_cycle(i, R) + 1 )
    Lwar(i, j) = max_R( last_read_cycle(j, R) - first_write_cycle(i, R) )

[Figure: PC accesses of ADD R2, R1, R3 and JMP addr, shown per instruction and on a common time axis. The last read cycle is 0 and the first write cycle is 1.]

    Lwar = 0 - 1 = -1

negative latency = delay slot
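
The three latency formulas can be checked against the worked examples with a short sketch. Reading the examples, j appears to be the earlier instruction and i the later one; that reading, the usage dictionaries, and the function name `latency` are assumptions made for illustration.

```python
# Resource usage per instruction: resource -> {"read": [...], "write": [...]}
# with cycles relative to issue. Values follow the worked examples on the
# slides; the representation is hypothetical.
ADD = {"GPR": {"read": [1], "write": [3]}, "PC": {"read": [0]}}
SUB = {"GPR": {"read": [1], "write": [3]}}
JMP = {"PC": {"write": [1]}}

# hazard -> (access kind of earlier instr, access kind of later instr, offset)
HAZARDS = {
    "raw": ("write", "read", 1),
    "waw": ("write", "write", 1),
    "war": ("read", "write", 0),
}


def latency(earlier, later, hazard):
    """Maximum over all shared resources of
    last access cycle (earlier) - first access cycle (later) + offset.
    Returns None if the two instructions share no such resource."""
    kind_earlier, kind_later, offset = HAZARDS[hazard]
    values = [
        max(earlier[res][kind_earlier]) - min(later[res][kind_later]) + offset
        for res in earlier
        if res in later
        and earlier[res].get(kind_earlier)
        and later[res].get(kind_later)
    ]
    return max(values) if values else None


if __name__ == "__main__":
    print(latency(ADD, SUB, "raw"))
    print(latency(ADD, SUB, "waw"))
    print(latency(ADD, JMP, "war"))
```

The WAR latency of -1 between ADD and JMP means that JMP may be issued one cycle before ADD, which is the delay slot.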

SLIDE 15

List Scheduling Example

Data dependence DAG: ADD R1 R2 -> JMP addr (PC: -1); SUB R3 R4 -> JMP addr (PC: -1)

Ready set: ADD R1 R2, SUB R3 R4

SLIDE 16

List Scheduling Example

Data dependence DAG: ADD R1 R2 -> JMP addr (PC: -1); SUB R3 R4 -> JMP addr (PC: -1)

Ready set: SUB R3 R4

Schedule after step 1: ADD R1 R2

SLIDE 17

List Scheduling Example

Data dependence DAG: ADD R1 R2 -> JMP addr (PC: -1); SUB R3 R4 -> JMP addr (PC: -1)

Ready set: JMP addr

Schedule after step 2: ADD R1 R2; SUB R3 R4

SLIDE 18

List Scheduling Example

Data dependence DAG: ADD R1 R2 -> JMP addr (PC: -1); SUB R3 R4 -> JMP addr (PC: -1)

Ready set: (empty)

Schedule after step 3: ADD R1 R2; SUB R3 R4; JMP addr

SLIDE 19

List Scheduling Example

Data dependence DAG: ADD R1 R2 -> JMP addr (PC: -1); SUB R3 R4 -> JMP addr (PC: -1)

Ready set: (empty)

Schedule after step 4: ADD R1 R2; SUB R3 R4; JMP addr; NOP

The delay slot must be filled.
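
The list scheduling steps above can be reproduced with a minimal single-issue list scheduler. This is a sketch under stated assumptions, not the scheduler used in the presentation: negative latencies are clamped to zero because an instruction only becomes ready after all its predecessors are placed, and the function name and the `delay_slots` parameter are hypothetical.

```python
def list_schedule(instrs, edges, delay_slots):
    """Single-issue list scheduler that never revokes a placement.

    instrs:      instructions in priority order
    edges:       (pred, succ) -> latency
    delay_slots: branch instruction -> number of delay slots
    """
    cycle = {}      # instruction -> issue cycle
    schedule = []   # one entry per cycle
    remaining = list(instrs)
    while remaining:
        t = len(schedule)
        # Ready: every predecessor is placed and its (clamped) latency elapsed.
        ready = [
            ins for ins in remaining
            if all(p in cycle and cycle[p] + max(lat, 0) <= t
                   for (p, s), lat in edges.items() if s == ins)
        ]
        if ready:
            chosen = ready[0]
            remaining.remove(chosen)
            cycle[chosen] = t
            schedule.append(chosen)
        else:
            schedule.append("NOP")  # stall until something becomes ready
    # Unfilled delay slots after a branch are padded with NOPs.
    for branch, slots in delay_slots.items():
        while len(schedule) < cycle[branch] + 1 + slots:
            schedule.append("NOP")
    return schedule


if __name__ == "__main__":
    edges = {("ADD", "JMP"): -1, ("SUB", "JMP"): -1}
    print(list_schedule(["ADD", "SUB", "JMP"], edges, {"JMP": 1}))
```

Because JMP only becomes ready after ADD and SUB, it ends up last and its delay slot is padded with a NOP, giving four cycles.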

SLIDE 20

Backtracking Scheduler

  • Negative latencies can automatically be extracted from the LISA model
  • They indicate delay slots
  • Negative weights in the dependence DAG cannot be utilized by list schedulers because scheduling decisions need to be revoked

Development of a retargetable backtracking scheduler [S. G. Abraham, W. Meleis, I. D. Baev, 2000]
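
What revoking decisions buys can be illustrated with a plain exhaustive depth-first backtracking search. This is deliberately not the mixedBT algorithm or the scheduler of Abraham et al.; it is a swapped-in brute-force search that only shows how a negative edge weight lets the branch move ahead of one of its predecessors. The function name, the edge encoding (`succ` may issue no earlier than `latency` cycles after `pred`), and the `delay_slots` parameter are assumptions.

```python
def shortest_schedule(instrs, edges, delay_slots):
    """Exhaustive backtracking over single-issue instruction orders.

    edges:       (pred, succ) -> latency, meaning cycle[succ] >= cycle[pred] + latency
    delay_slots: branch instruction -> number of delay slots
    Returns (order, length in cycles including NOP padding of delay slots).
    """
    best = {"order": None, "length": None}

    def positions(order):
        return {ins: c for c, ins in enumerate(order)}

    def consistent(order):
        pos = positions(order)
        return all(pos[s] >= pos[p] + lat
                   for (p, s), lat in edges.items()
                   if p in pos and s in pos)

    def length(order):
        pos = positions(order)
        # Unfilled delay slots extend the schedule with NOPs.
        return max([len(order)] + [pos[b] + 1 + d for b, d in delay_slots.items()])

    def extend(order):
        if not consistent(order):
            return  # revoke the last placement and try another one
        if len(order) == len(instrs):
            n = length(order)
            if best["length"] is None or n < best["length"]:
                best["order"], best["length"] = list(order), n
            return
        for ins in instrs:
            if ins not in order:
                extend(order + [ins])

    extend([])
    return best["order"], best["length"]


if __name__ == "__main__":
    edges = {("ADD", "JMP"): -1, ("SUB", "JMP"): -1}
    print(shortest_schedule(["ADD", "SUB", "JMP"], edges, {"JMP": 1}))
```

For the three-instruction example the search finds a three-cycle order in which an instruction sits in the delay slot of JMP, one cycle shorter than a schedule that places JMP last and pads with a NOP. Exhaustive search does not scale beyond toy examples, which is why a priority-driven backtracking scheme is of interest.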

SLIDE 21

mixedBT Backtracking Scheduler

Concept: three scheduling modes

  1. Normal scheduling: if there is no conflict, instructions are scheduled according to their data dependencies
  2. Displace scheduling: unschedule instructions that have lower priority and are causing a structural hazard
  3. Force scheduling: if 1 and 2 are not possible, unschedule conflicts and force the scheduling of the candidate

SLIDE 22

mixedBT Backtracking Scheduler Example

[Figure: empty schedule table and the data dependence DAG with edges of weight -1 into JMP addr. ADD R1 R2, SUB R3 R4, and JMP addr are each annotated with a priority and an attempted cycle.]

Unscheduled list: SUB, ADD, JMP

    initialize_priorities
    loop (until_all_insn_scheduled) {
        try normal_schedule or displace_schedule
        else {
            force_schedule
            update_attempted_cycle
        }
    }

SLIDE 23

mixedBT Backtracking Scheduler Example

N: normal, scheduled according to data dependencies

[Figure: JMP is scheduled in normal mode (N). Unscheduled list: SUB, ADD.]

SLIDE 24

mixedBT Backtracking Scheduler Example

N: normal, scheduled according to data dependencies

[Figure: ADD is scheduled in normal mode (N). Scheduled: JMP, ADD. Unscheduled list: SUB.]

SLIDE 25

mixedBT Backtracking Scheduler Example

Candidate SUB: normal and displace scheduling are not possible

[Figure: schedule table still holds JMP and ADD.]

SLIDE 26

mixedBT Backtracking Scheduler Example

F: unschedule conflicts and force the scheduling

[Figure: SUB is force-scheduled (F). Scheduled: SUB, ADD. Unscheduled list: JMP.]

SLIDE 27

mixedBT Backtracking Scheduler Example

D: unschedule instructions that have lower priority and are causing a structural hazard

[Figure: JMP is displace-scheduled (D). Scheduled: ADD, JMP. Unscheduled list: SUB.]

SLIDE 28

mixedBT Backtracking Scheduler Example

Candidate SUB: normal and displace scheduling are not possible

[Figure: schedule table still holds ADD and JMP.]

SLIDE 29

mixedBT Backtracking Scheduler Example

F: unschedule conflicts and force the scheduling

[Figure: SUB is force-scheduled (F). Scheduled: SUB, JMP. Unscheduled list: ADD.]

SLIDE 30

mixedBT Backtracking Scheduler Example

Candidate ADD: normal and displace scheduling are not possible

[Figure: schedule table still holds SUB and JMP.]

SLIDE 31

mixedBT Backtracking Scheduler Example

F: unschedule conflicts and force the scheduling

[Figure: ADD is force-scheduled (F). Scheduled: SUB, ADD. Unscheduled list: JMP.]

SLIDE 32

mixedBT Backtracking Scheduler Example

D: unschedule instructions that have lower priority and are causing a structural hazard

[Figure: JMP is displace-scheduled (D). Scheduled: SUB, JMP. Unscheduled list: ADD.]

SLIDE 33

mixedBT Backtracking Scheduler Example

Candidate ADD: normal and displace scheduling are not possible

[Figure: schedule table still holds SUB and JMP.]

SLIDE 34

mixedBT Backtracking Scheduler Example

F: unschedule conflicts and force the scheduling

[Figure: ADD is force-scheduled (F). Scheduled: ADD, JMP. Unscheduled list: SUB.]

SLIDE 35

mixedBT Backtracking Scheduler Example

Candidate SUB: normal and displace scheduling are not possible

[Figure: schedule table still holds ADD and JMP.]

SLIDE 36

mixedBT Backtracking Scheduler Example

F: unschedule conflicts and force the scheduling

[Figure: SUB is force-scheduled (F). Schedule column: SUB, ADD, NOP. Unscheduled list: JMP.]

SLIDE 37

mixedBT Backtracking Scheduler Example

D: unschedule instructions that have lower priority and are causing a structural hazard

[Figure: JMP is displace-scheduled (D). Schedule column: SUB, JMP, NOP. Unscheduled list: ADD.]

SLIDE 38

mixedBT Backtracking Scheduler Example

N: normal, scheduled according to data dependencies

[Figure: ADD is scheduled in normal mode (N). ADD, SUB, and JMP are all scheduled; the unscheduled list is empty.]

Delay slot automatically filled.

SLIDE 39

Toolflow

[Figure: tool flow. C source (.c) -> C compiler -> instrumented assembly (.ls) -> lpacker (scheduler, code emitter) -> assembly (.s) -> lasm / llnk (assembler, linker) -> executable (.out), with the .lisa model as tool input.]

Target architecture: PP32 network processor, Infineon Technologies AG

Compilers:

  • CoSy compiler with LPacker as scheduler/emitter
  • LCC (Little C Compiler, Princeton Univ.) with LPacker as scheduler/emitter

SLIDE 40

Toolflow

[Figure: reference tool flow. C source (.c) -> C compiler -> assembly (.s) -> lasm / llnk (assembler, linker) -> executable (.out), with the .lisa model as tool input.]

Target architecture: PP32 network processor, Infineon Technologies AG

Compilers:

  • CoSy compiler with LPacker as scheduler/emitter
  • LCC (Little C Compiler, Princeton Univ.) with LPacker as scheduler/emitter

Reference compiler:

  • CoSy compiler with native list scheduler

SLIDE 41

Results: Rel. Cycle Count for PP32 NPU

[Figure: bar chart of relative cycle count (axis ticks 50 to 300) for the benchmarks frag, tos, hwacc, route, reed, md5, and crc. Series: CoSy, lcc+lpacker(list), CoSy+lpacker(list), CoSy+lpacker(mixedBT).]

SLIDE 42

Results: Rel. Code Size for PP32 NPU

[Figure: bar chart of relative code size (axis ticks 50 to 250) for the benchmarks frag, tos, hwacc, route, reed, md5, and crc. Series: CoSy, lcc+lpacker(list), CoSy+lpacker(list), CoSy+lpacker(mixedBT).]

SLIDE 43

Summary

  • It is possible to extract all scheduler-related information from LISA processor models
  • Negative latencies represent delay slots, but a list scheduler cannot utilize this information
  • Usage of the retargetable mixedBT backtracking scheduler produces up to 20% cycle count improvement compared to the list scheduler
  • The cycle count improvement of the mixedBT scheduler is up to 7% better than that of existing listBT schedulers
  • MixedBT is efficient because it behaves like a list scheduler for all instructions that do not have delay slots

SLIDE 44

Thank you!