
Code Scheduling (Kostis Sagonas, Spring 2003) - PDF document



  1. Code Scheduling
     Kostis Sagonas, Spring 2003

     Outline
     • Modern architectures
     • Delay slots
     • Introduction to instruction scheduling
     • List scheduling
     • Resource constraints
     • Interaction with register allocation
     • Scheduling across basic blocks
     • Trace scheduling
     • Scheduling for loops
     • Loop unrolling
     • Software pipelining

     Simple Machine Model
     • Instructions are executed in sequence
       – Fetch, decode, execute, store results
       – One instruction at a time
     • For branch instructions, start fetching from a different location if needed
       – Check branch condition
       – Next instruction may come from a new location given by the branch instruction

     Simple Execution Model
     5-stage pipeline: fetch, decode, execute, memory, write back
     • Fetch: get the next instruction
     • Decode: figure out what that instruction is
     • Execute: perform the ALU operation (address calculation in a memory op)
     • Memory: do the memory access in a memory op
     • Write Back: write the results back

     Execution Models (time runs left to right)
     Model 1 (one instruction at a time):
       Inst 1  IF  DE  EXE MEM WB
       Inst 2                      IF  DE  EXE MEM WB
     Model 2 (pipelined):
       Inst 1  IF  DE  EXE MEM WB
       Inst 2      IF  DE  EXE MEM WB
       Inst 3          IF  DE  EXE MEM WB
       Inst 4              IF  DE  EXE MEM WB
       Inst 5                  IF  DE  EXE MEM WB
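The difference between the two execution models can be made concrete by counting cycles. The following is a minimal sketch, assuming one cycle per stage and no stalls; the function names are illustrative and do not come from the slides.

```python
def sequential_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Model 1: an instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Model 2: a new instruction enters the pipeline each cycle (assumes no stalls)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one finishes one cycle after.
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (2, 5, 100):
        print(n, sequential_cycles(n), pipelined_cycles(n))
```

Under these assumptions the pipelined model approaches one completed instruction per cycle as the instruction count grows, which is why stalls matter so much in the rest of the slides.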

  2. Handling Branch Instructions
     Problem: We do not know the location of the next instruction until later
       – after DE in jump instructions
       – after EXE in conditional branch instructions

       Branch  IF  DE  EXE MEM WB
       ???         IF  DE  EXE MEM WB
       ???             IF  DE  EXE MEM WB
       Inst                IF  DE  EXE MEM WB

     What to do with the middle 2 instructions?

     Handling Branch Instructions
     What to do with the middle 2 instructions?
     1. Stall the pipeline in case of a branch until we know the address of the next instruction
       – wasted cycles

       Branch     IF  DE  EXE MEM WB
       Next inst              IF  DE  EXE MEM WB

     Handling Branch Instructions
     What to do with the middle 2 instructions?
     2. Delay the action of the branch
       – Make the branch take effect only after two instructions
       – The two instructions following the branch get executed regardless of the branch

       Branch              IF  DE  EXE MEM WB
       Next seq inst           IF  DE  EXE MEM WB
       Next seq inst               IF  DE  EXE MEM WB
       Branch target inst              IF  DE  EXE MEM WB

     Branch Delay Slot(s)
     MIPS has a branch delay slot
       – The instruction after a conditional branch gets executed even if the code branches to the target
       – Fetching from the branch target takes place only after that

       ble r3, foo
       (branch delay slot)

     What instruction to put in the branch delay slot?

     Filling the Branch Delay Slot
     Simple solution: put a no-op
     Wasted instruction, just like a stall

       ble r3, lbl
       noop            <- branch delay slot

     Filling the Branch Delay Slot
     Move an instruction from above the branch
     • The moved instruction executes iff the branch executes
       – Get the instruction from the same basic block as the branch
       – Don't move a branch instruction!
     • The instruction needs to be moved over the branch
       – The branch must not depend on the result of the instruction

       prev_instr
       ble r3, lbl
       prev_instr      <- branch delay slot (moved from above the branch)
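The conditions for moving an instruction from above the branch can be sketched as a small check. This is only an illustration under simplifying assumptions: it looks solely at the instruction immediately before the branch, and the `Instr` record and the `fill_branch_delay_slot` helper are hypothetical names, not something defined in the slides.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                  # printable form, e.g. "r4 = r2 + r3"
    dest: Optional[str]        # register written, or None
    srcs: Tuple[str, ...]      # registers read
    is_branch: bool = False


NOOP = Instr("noop", None, ())


def fill_branch_delay_slot(block: List[Instr]) -> List[Instr]:
    """Fill the delay slot of the branch that ends one basic block.

    Assumption: only the instruction directly above the branch is considered,
    so it never has to be moved past any other instruction.
    """
    *body, branch = block
    assert branch.is_branch, "the block is expected to end in a branch"
    if body:
        cand = body[-1]
        # Legal only if the candidate is not a branch and the branch
        # does not read the register the candidate writes.
        if not cand.is_branch and cand.dest not in branch.srcs:
            return body[:-1] + [branch, cand]
    # Fall back to the simple solution: a wasted no-op.
    return body + [branch, NOOP]


if __name__ == "__main__":
    block = [
        Instr("r4 = r2 + r3", "r4", ("r2", "r3")),
        Instr("ble r3, lbl", None, ("r3",), is_branch=True),
    ]
    for instr in fill_branch_delay_slot(block):
        print(instr.text)
```

A fuller implementation would search further up the basic block and check that the candidate can be moved past every instruction in between.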

  3. Filling the Branch Delay Slot
     Move an instruction dominated by the branch instruction

       ble r3, lbl
       (branch delay slot)
       dom_instr
       lbl: instr

     Filling the Branch Delay Slot
     Move an instruction from the branch target
       – Instruction dominated by target
       – No other ways to reach target (if so, take care of them)
       – If conditional branch, the instruction should not have a lasting effect if the branch is not taken

       ble r3, lbl
       (branch delay slot)
       dom_instr
       lbl: instr

     Load Delay Slots
     Problem: Results of the loads are not available until the end of the MEM stage

       Load         IF  DE  EXE MEM WB
       Use of load      IF  DE  EXE MEM WB

     If the value of the load is used, what to do?

     Load Delay Slots
     If the value of the load is used, what to do?
     • Always stall one cycle
     • Stall one cycle if the next instruction uses the value
       – Need hardware to do this
     • Have a delay slot for load
       – The new value is only available after two instructions
       – If the next instruction uses the register, it will get the old value

       Load         IF  DE  EXE MEM WB
       ???              IF  DE  EXE MEM WB
       Use of load          IF  DE  EXE MEM WB

     Example
     Assume 1 cycle delay on branches and 1 cycle latency for loads

     Original code:
       r2 = *(r1 + 4)
       r3 = *(r1 + 8)
       r4 = r2 + r3
       r5 = r2 - 1
       goto L1

     With no-ops in the delay slots:
       r2 = *(r1 + 4)
       r3 = *(r1 + 8)
       noop
       r4 = r2 + r3
       r5 = r2 - 1
       goto L1
       noop
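The no-op placement in the example above can be reproduced mechanically. The sketch below is an assumption-laden illustration: it parses the slide's textual notation with a regular expression, treats any line containing `= *(` as a load, treats `goto` and `ble` as branches, and uses the hypothetical helper name `insert_noops`.

```python
import re
from typing import List, Set


def regs(text: str) -> Set[str]:
    """Register names (r0, r1, ...) mentioned in a piece of text."""
    return set(re.findall(r"r\d+", text))


def is_load(instr: str) -> bool:
    return "= *(" in instr


def is_branch(instr: str) -> bool:
    return instr.startswith(("goto", "ble"))


def uses(instr: str) -> Set[str]:
    """Registers read by an instruction written in the slide notation."""
    if "=" in instr and not instr.startswith("*"):
        return regs(instr.split("=", 1)[1])   # right-hand side of an assignment
    return regs(instr)                        # stores and branches read every register they name


def insert_noops(code: List[str]) -> List[str]:
    """Pad a sequence assuming a 1-cycle load latency and a 1-cycle branch delay."""
    out: List[str] = []
    for instr in code:
        if out and is_load(out[-1]):
            loaded = out[-1].split("=", 1)[0].strip()
            if loaded in uses(instr):
                out.append("noop")            # load delay slot
        out.append(instr)
        if is_branch(instr):
            out.append("noop")                # branch delay slot
    return out


if __name__ == "__main__":
    code = ["r2 = *(r1 + 4)", "r3 = *(r1 + 8)", "r4 = r2 + r3", "r5 = r2 - 1", "goto L1"]
    print("\n".join(insert_noops(code)))
```

Running it on the five-instruction example yields the seven-instruction sequence with two no-ops; reordering the input so that `r5 = r2 - 1` follows the second load removes the load no-op.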

  4. Example (continued)
     Assume 1 cycle delay on branches and 1 cycle latency for loads

     After filling the load delay slot:
       r2 = *(r1 + 4)
       r3 = *(r1 + 8)
       r5 = r2 - 1
       r4 = r2 + r3
       goto L1
       noop

     Final code after delay slot filling:
       r2 = *(r1 + 4)
       r3 = *(r1 + 8)
       r5 = r2 - 1
       goto L1
       r4 = r2 + r3

     From a Simple Machine Model to a Real Machine Model
     • Many pipeline stages
       – MIPS R4000 has 8 stages
     • Different instructions take different amounts of time to execute
       – mult 10 cycles
       – div 69 cycles
       – ddiv 133 cycles
     • Hardware to stall the pipeline if an instruction uses a result that is not ready

     Real Machine Model cont.
     • Most modern processors have multiple execution units (superscalar)
       – If the instruction sequence is correct, multiple operations will take place in the same cycle
       – Even more important to have the right instruction sequence
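To see why instruction order matters once latencies differ, the stalls of a hardware-interlocked pipeline can be counted with a small model. This is a rough sketch under stated assumptions: a single-issue, in-order machine that stalls only on true (read-after-write) dependencies, the three latencies quoted above, and an assumed latency of 1 cycle for every other opcode. The `stall_cycles` function and the tuple encoding are hypothetical.

```python
from typing import Dict, List, Tuple

# Latencies quoted in the slides; every other opcode is assumed to take 1 cycle.
LATENCY: Dict[str, int] = {"mult": 10, "div": 69, "ddiv": 133}

Program = List[Tuple[str, str, Tuple[str, ...]]]   # (opcode, dest register, source registers)


def stall_cycles(program: Program) -> int:
    """Count cycles lost waiting for operands on a single-issue in-order pipeline."""
    ready: Dict[str, int] = {}   # register -> first cycle at which its value is usable
    cycle = 0                    # cycle at which the next instruction could issue without stalling
    stalls = 0
    for op, dest, srcs in program:
        issue = max([cycle] + [ready.get(r, 0) for r in srcs])
        stalls += issue - cycle
        ready[dest] = issue + LATENCY.get(op, 1)
        cycle = issue + 1
    return stalls


if __name__ == "__main__":
    back_to_back = [("mult", "r3", ("r1", "r2")), ("add", "r4", ("r3", "r1"))]
    interleaved = [
        ("mult", "r3", ("r1", "r2")),
        ("add", "r5", ("r1", "r2")),   # independent work placed behind the multiply
        ("add", "r4", ("r3", "r1")),
    ]
    print(stall_cycles(back_to_back), stall_cycles(interleaved))
```

Placing independent instructions behind a long-latency operation hides part of its latency, which is the effect instruction scheduling tries to achieve.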

  5. Instruction Scheduling
     Goal: Reorder instructions so that pipeline stalls are minimized

     Constraints on Instruction Scheduling:
       – Data dependencies
       – Control dependencies
       – Resource constraints

     Data Dependencies
     • If two instructions access the same variable, they can be dependent
     • Kinds of dependencies
       – True: write → read
       – Anti: read → write
       – Output: write → write
     • What to do if two instructions are dependent?
       – The order of execution cannot be reversed
       – This reduces the possibilities for scheduling

     Computing Data Dependencies
     • For basic blocks, compute dependencies by walking through the instructions
     • Identifying register dependencies is simple
       – is it the same register?
     • For memory accesses
       – simple: base + offset1 ?= base + offset2
       – data dependence analysis: a[2i] ?= a[2i+1]
       – interprocedural analysis: global ?= parameter
       – pointer alias analysis: p1 ?= p

     Representing Dependencies
     • Using a dependence DAG, one per basic block
     • Nodes are instructions, edges represent dependencies

       1: r2 = *(r1 + 4)
       2: r3 = *(r1 + 8)
       3: r4 = r2 + r3
       4: r5 = r2 - 1

       Edges (each labeled 2): 1 → 3, 2 → 3, 1 → 4

     Edge is labeled with latency:
       v(i → j) = delay required between initiation times of i and j minus the execution time required by i

     Example
       1: r2 = *(r1 + 4)
       2: r3 = *(r2 + 4)
       3: r4 = r2 + r3
       4: r5 = r2 - 1

       [Figure: dependence DAG over instructions 1 to 4 with latency-labeled edges]

     Another Example
       1: r2 = *(r1 + 4)
       2: *(r1 + 4) = r3
       3: r3 = r2 + r3
       4: r5 = r2 - 1

       [Figure: dependence DAG over instructions 1 to 4 with latency-labeled edges]
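Walking a basic block to find register dependencies, as described above, can be sketched as follows. This is a deliberately conservative illustration: it compares every pair of instructions, so it also reports dependencies that an intervening write would make redundant, it ignores memory locations entirely (no alias analysis), and it does not compute latency labels. The `(dest, srcs)` encoding and the name `dependence_edges` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Instr = Tuple[Optional[str], Tuple[str, ...]]   # (register written or None, registers read)


def dependence_edges(block: List[Instr]) -> List[Tuple[int, int, str]]:
    """Return (i, j, kind) edges with 1-based instruction numbers and i before j."""
    edges = set()
    for j, (dest_j, srcs_j) in enumerate(block):
        for i in range(j):
            dest_i, srcs_i = block[i]
            if dest_i is not None and dest_i in srcs_j:
                edges.add((i + 1, j + 1, "true"))     # write -> read
            if dest_j is not None and dest_j in srcs_i:
                edges.add((i + 1, j + 1, "anti"))     # read -> write
            if dest_i is not None and dest_i == dest_j:
                edges.add((i + 1, j + 1, "output"))   # write -> write
    return sorted(edges)


if __name__ == "__main__":
    # 1: r2 = *(r1 + 4)   2: r3 = *(r1 + 8)   3: r4 = r2 + r3   4: r5 = r2 - 1
    block = [("r2", ("r1",)), ("r3", ("r1",)), ("r4", ("r2", "r3")), ("r5", ("r2",))]
    for edge in dependence_edges(block):
        print(edge)
```

For the four-instruction block from the "Representing Dependencies" slide this reports the three true dependencies 1 → 3, 1 → 4 and 2 → 3. For the store in "Another Example" it finds only the register anti-dependence 2 → 3; the possible dependence between the load and the store through memory would need the base + offset comparison mentioned in the slides.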
