
MIPS Pipeline with Tomasulo’s Algorithm



  1. MIPS Pipeline with Tomasulo’s Algorithm
     [Block diagram. Labels: ADD, ADD, RS, IR, Issue, WB, Dispatch, DIV, LSQ, MEM, REG FILE, MULT, Common Data Bus (CDB)]

  2. Renaming Source Registers
     Example:
       A: DIVD F0, F2, F4
       B: ADDD F6, F0, F8       RAW dependency between (A, B)
       C: SUBD F8, F10, F12     WAR dependency between (B, C)
       D: DIVD F14, F8, F10     RAW dependency between (C, D)
     • A source register that is not the destination of an in-flight instruction is copied to the new storage location allocated for the source operand.
     • A source register that is the destination of an in-flight instruction is tagged with the id of the producer.
     • When the producer instruction does a write, the value and its id are broadcast on the CDB.
     • The waiting instruction copies the value from the CDB to the storage allocated for the source operand.
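The renaming rule above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware described in the slides: the names `reg_values`, `reg_writer`, `stations`, `read_source`, and `issue` are hypothetical, the register values are arbitrary, and D is written with destination F14 as in the issue example.

```python
# Hypothetical model of issue-time source renaming; all names are illustrative.
reg_values = {"F2": 2.0, "F4": 4.0, "F8": 8.0, "F10": 10.0, "F12": 12.0}
reg_writer = {}   # register name -> tag of its in-flight producer, if any
stations = {}     # tag -> reservation station entry


def read_source(reg):
    """Return ('tag', producer) if reg has an in-flight writer, else ('val', value)."""
    if reg in reg_writer:
        return ("tag", reg_writer[reg])
    return ("val", reg_values[reg])


def issue(tag, op, dest, src1, src2):
    """Allocate a station, rename both sources, then record tag as writer of dest."""
    stations[tag] = {"op": op, "src1": read_source(src1), "src2": read_source(src2)}
    reg_writer[dest] = tag   # later readers of dest will wait on this tag
    return stations[tag]


issue("RS_A", "DIVD", "F0", "F2", "F4")
issue("RS_B", "ADDD", "F6", "F0", "F8")    # F0 is in flight: tagged with RS_A
issue("RS_C", "SUBD", "F8", "F10", "F12")  # B already copied F8, so no WAR stall
issue("RS_D", "DIVD", "F14", "F8", "F10")  # F8 is in flight: tagged with RS_C

for tag, entry in stations.items():
    print(tag, entry)
```

Because B copies the value of F8 at issue, C is free to overwrite F8 later, which is how the WAR dependency between B and C stops mattering.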

  3. Example (Issue Unit)
     Issue A: DIVD F0, F2, F4
       Register file: F0 tagged with RS A

       TAG    SOURCE1  SOURCE2  OP
       RS A   v2       v4       DIVD

     • v2, v4: Values read from registers F2, F4 during issue
     • Record A as the current writer of F0

     Issue B: ADDD F6, F0, F8
       Register file: F6 tagged with RS B

       TAG    SOURCE1  SOURCE2  OP
       RS B   RS A     v8       ADDD

     • v8: Value read from F8
     • Other operand will be result of instruction A

  4. Example (Issue Unit)
     Issue C: SUBD F8, F10, F12
       Register file: F8 tagged with RS C

       TAG    SOURCE1  SOURCE2  OP
       RS C   v10      v12      SUBD

     • v10, v12: Values read from registers F10, F12 during instruction issue

     Issue D: DIVD F14, F8, F10
       Register file: F14 tagged with RS D

       TAG    SOURCE1  SOURCE2  OP
       RS D   RS C     v10      DIVD

     • v10: Value read from F10
     • Other operand will be result of instruction C

  5. Dispatch and WB Units
     • Dispatch unit selects instructions from RS whose operands are available and whose functional unit is free
     • In the example, A and C are ready to execute.
     • When execution is complete the Write Unit is notified
     • Write Unit selects one of the completed instructions in the EX/WB register for WB
     For the selected instruction, say RS I:
     • Broadcast its result on the CDB
     • Broadcast the TAG (RS I) along with the value
     • All units that are waiting on the result of the completing instruction (tag comparison) copy the broadcast value:
       • Reservation stations copy the value into the RS registers with matching tags
       • Register file copies the value into the destination register of the instruction.
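The broadcast step described above can be sketched as follows. This is a minimal model under stated assumptions: the dictionaries and the `ready` and `broadcast` helpers are hypothetical names, the operand values are arbitrary, and only instructions C and D from the example are modelled.

```python
# Hypothetical sketch of the write-back broadcast on the CDB.
reg_values = {"F8": 8.0, "F10": 10.0, "F14": 0.0}
reg_writer = {"F8": "RS_C", "F14": "RS_D"}   # register -> tag it is waiting on
stations = {
    "RS_C": {"op": "SUBD", "src1": ("val", 10.0), "src2": ("val", 12.0)},
    "RS_D": {"op": "DIVD", "src1": ("tag", "RS_C"), "src2": ("val", 10.0)},
}


def ready(entry):
    """An entry can be dispatched once both sources hold values, not tags."""
    return entry["src1"][0] == "val" and entry["src2"][0] == "val"


def broadcast(tag, value):
    """Put (tag, value) on the CDB: free the producer, wake matching consumers."""
    stations.pop(tag, None)                    # producer releases its station
    for entry in stations.values():            # reservation stations compare tags
        for field in ("src1", "src2"):
            if entry[field] == ("tag", tag):
                entry[field] = ("val", value)
    for reg, writer in list(reg_writer.items()):   # register file compares tags
        if writer == tag:
            reg_values[reg] = value
            del reg_writer[reg]


d_ready_before = ready(stations["RS_D"])
broadcast("RS_C", 10.0 - 12.0)                 # C: SUBD F8, F10, F12 completes
d_ready_after = ready(stations["RS_D"])
print(d_ready_before, d_ready_after, reg_values["F8"])
```

Every consumer matches on the tag rather than on the register name, so one broadcast can update a waiting reservation station and the register file in the same step.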

  6. Snapshot after Issue of A, B, C, D

     TAG    SOURCE1  SOURCE2  OP     Issued instruction
     RS A   v2       v4       DIVD   A: DIVD F0, F2, F4
     RS B   RS A     v8       ADDD   B: ADDD F6, F0, F8
     RS C   v10      v12      SUBD   C: SUBD F8, F10, F12
     RS D   RS C     v10      DIVD   D: DIVD F14, F8, F10

     Register file tags: F0: RS A, F6: RS B, F8: RS C, F14: RS D

  7. Example (contd ...)
     Event: C completes execution
     • Write Unit broadcasts result (RES C) of C together with its tag RS C
     • C releases Reservation Station RS C
     • All registers of the RS and all registers in the Register File monitor the CDB for the broadcast TAG
     • If the TAG matches, copy the broadcast value into the register

     ID     SOURCE1  SOURCE2  OP
     RS C   v10      v12      SUBD
     RS D   RES C    v10      DIVD

     Register file: F8 changes from tag RS C to value RES C; F14 remains tagged with RS D

     • F8 updated with result of C even though B has not yet started execution.
     • A in execution
     • D ready to be dispatched

  8. Example (contd ...)
     Event: A completes execution
     • Write Unit broadcasts result (RES A) of A along with its tag RS A
     • A releases RS A

     ID     SOURCE1  SOURCE2  OP
     RS A   v2       v4       DIVD
     RS B   RES A    v8       ADDD

     Register file: F0 changes from tag RS A to value RES A; F6 remains tagged with RS B

     • D in execution.
     • B ready to dispatch
     • When D and B complete execution, their results are broadcast and used to update F14 and F6 respectively.

  9. WAW Hazards
     A: DIVD F0, F2, F4
     B: ADDD F6, F0, F8
     C: DIVD F0, F10, F12
     D: ADDD F8, F0, F14

     ID     SOURCE1  SOURCE2  OP
     RS A   v2       v4       DIVD
     RS B   RS A     v8       ADDD
     RS C   v10      v12      DIVD

     Register file tags: F0: RS A, replaced by RS C when C issues; F6: RS B

     • Write of F0 by A effectively canceled.
     • Intermediate instructions (like B) get operands directly from A bypassing F0.


  11. WAW Hazards
      A: DIVD F0, F2, F4
      B: ADDD F6, F0, F8
      C: DIVD F0, F10, F12
      D: ADDD F8, F0, F14

      • Write of F0 by A effectively canceled.
      • Intermediate instructions (like B) get operands directly from A bypassing F0.

      ID     SOURCE1  SOURCE2  OP
      RS A   v2       v4       DIVD
      RS B   RS A     v8       ADDD
      RS C   v10      v12      DIVD
      RS D   RS C     v14      ADDD

      Register file tags: F0: RS A, replaced by RS C when C issues; F6: RS B; F8: RS D
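Why the write of F0 by A is "effectively canceled" can be seen in a small sketch of the register file's tag check. The helper names `claim` and `write_back` and the result values are hypothetical; the sketch assumes the register file accepts a broadcast only when the tag matches the register's current writer.

```python
# Hypothetical sketch of the register-file tag check that removes WAW hazards.
reg_values = {"F0": 0.0}
reg_writer = {}   # register -> tag of the most recently issued writer


def claim(reg, tag):
    """At issue, the newest writer replaces any earlier tag on the register."""
    reg_writer[reg] = tag


def write_back(tag, value):
    """Update only registers still waiting on this tag; return their names."""
    updated = []
    for reg, writer in list(reg_writer.items()):
        if writer == tag:
            reg_values[reg] = value
            del reg_writer[reg]
            updated.append(reg)
    return updated


claim("F0", "RS_A")   # A: DIVD F0, F2, F4
claim("F0", "RS_C")   # C: DIVD F0, F10, F12 overwrites A's tag on F0

c_updates = write_back("RS_C", 2.5)   # C happens to finish first
a_updates = write_back("RS_A", 1.5)   # A finishes later; F0 no longer waits on it
print(c_updates, a_updates, reg_values["F0"])
```

A's result still reaches consumers such as B through their reservation stations; only the register file ignores it, so F0 ends up holding C's value regardless of completion order.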

  12. Load Store Queue (LSQ)
      [Block diagram. Labels: ISSUE, LSQ (LOAD/STORE BUFFERS, FIFO QUEUE), DISPATCH, MEM, WRITE, CDB]
      • In-flight LOAD or STORE instructions are held in Load/Store Buffers in the LSQ unit
      • LOAD dispatched to memory when MEM is free
      • STORE dispatched when MEM is free and value to be stored is available in the Load/Store Buffer
      • STORE value either copied from REG during issue or copied from CDB while waiting
      • We will assume that the effective address is calculated during the ISSUE stage

  13. Load Store Queue (LSQ)
      • LSQ: FIFO Queue that holds descriptors of issued LOAD and STORE instructions
      • LOAD and STORE instructions wait in LSQ for memory access.

      LSQ Buffers

      ID      MEM ADDR  OPERAND  OP
      LSQ A   ea                 SD

      ID      MEM ADDR  OP
      LSQ B   ea        LD

      • ID: Identifies buffer (which also serves as the identification of the issued instruction)
      • OP: Load or Store (may be implicit if the queues for LOAD and STORE are separate)
      • MEM ADDR: Holds the effective address of the memory location to be accessed
      • OPERAND: (STORE only) Holds value to be stored in memory

      When a SD instruction is issued the Buffer may receive either the:
      • Actual operand value by copying it from the source register, or
      • Tag of the instruction producing the value if it (producer instruction) is still in flight
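A minimal sketch of these buffers, assuming a single strict FIFO in which only the head entry may be dispatched (so a load queued behind a waiting store also waits). The function names, the addresses 1000 and 2000, and the result value 6.0 are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical model of the LSQ as one FIFO of load/store buffers.
lsq = deque()


def issue_load(tag, addr):
    lsq.append({"id": tag, "op": "LD", "addr": addr})


def issue_store(tag, addr, operand):
    """operand is ('val', v) if read at issue, or ('tag', t) if still in flight."""
    lsq.append({"id": tag, "op": "SD", "addr": addr, "operand": operand})


def snoop_cdb(tag, value):
    """Waiting stores copy a broadcast value whose tag matches their operand."""
    for buf in lsq:
        if buf["op"] == "SD" and buf["operand"] == ("tag", tag):
            buf["operand"] = ("val", value)


def dispatch(mem_free):
    """Send the head of the queue to memory if MEM is free and the head is ready."""
    if not mem_free or not lsq:
        return None
    head = lsq[0]
    if head["op"] == "SD" and head["operand"][0] == "tag":
        return None            # store value not yet available
    return lsq.popleft()


issue_store("SQ_A", 1000, ("tag", "RS_I"))   # store waits on in-flight producer I
issue_load("LQ_B", 2000)

first = dispatch(mem_free=True)    # head store is still waiting for its value
snoop_cdb("RS_I", 6.0)             # producer I broadcasts its result
second = dispatch(mem_free=True)   # the store can now go to memory
third = dispatch(mem_free=True)    # then the load
print(first, second["id"], third["id"])
```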

  14. Load/Store Buffers
      Load and Store Buffers: Hold descriptors for Load and Store instructions

      ID     ADDRESS  OPERAND  OP    Instruction sequence
      SQ A   ea       v0       SD    A: SD 0(R1), F0
      SQ A   ea       RS I     SD    I: ADDD F0, F2, F4 followed by A: SD 0(R1), F0
      LQ B   ea                LD    B: LD 0(R2), F2

      Register file: F2 tagged with LQ B

  15. Example Schedule
      LOOP: A  LD   F0, 0(R1)      | temp = x[i]
            B  MUL  F4, F0, F2     | temp = temp * a
            C  SD   F4, 0(R1)      | x[i] = temp
            D  ADDI R1, R1, #8     | i++
            E  BNE  R1, R2, LOOP   | branch if R1.ne.R2

      • Within an iteration: RAW between (A, B) and (B, C) [focusing on FP registers only]
      • Across iterations WAR and WAW dependencies become apparent

      A1  LD   F0, 0(R1)
      B1  MUL  F4, F0, F2
      C1  SD   F4, 0(R1)
      D1  ADDI R1, R1, #8
      E1  BNE  R1, R2, LOOP
      A2  LD   F0, 0(R1)        WAR with B1, WAW with A1
      B2  MUL  F4, F0, F2       WAR with C1, WAW with B1
      C2  SD   F4, 0(R1)
      D2  ADDI R1, R1, #8
      E2  BNE  R1, R2, LOOP
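The dependency labels in this listing can be checked mechanically. The sketch below is an assumption-laden illustration: `classify` is a hypothetical helper that compares register names only, and it ignores any intervening writes between the two instructions.

```python
# Hypothetical helper: classify register dependencies between two instructions.
# Each instruction is (dest, sources); dest is None for SD, which writes memory.
def classify(earlier, later):
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    deps = set()
    if e_dest is not None and e_dest in l_srcs:
        deps.add("RAW")   # later reads what earlier writes
    if l_dest is not None and l_dest in e_srcs:
        deps.add("WAR")   # later overwrites what earlier reads
    if e_dest is not None and e_dest == l_dest:
        deps.add("WAW")   # both write the same register
    return deps


A = ("F0", ("R1",))         # LD  F0, 0(R1)
B = ("F4", ("F0", "F2"))    # MUL F4, F0, F2
C = (None, ("F4", "R1"))    # SD  F4, 0(R1)

# Within one iteration
print("A1,B1:", classify(A, B))
print("B1,C1:", classify(B, C))
# Across iterations (A2 and B2 are the same instructions one iteration later)
print("B1,A2:", classify(B, A))
print("A1,A2:", classify(A, A))
print("C1,B2:", classify(C, B))
print("B1,B2:", classify(B, B))
```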

  16. Assumptions in Constructing Schedule
      • MUL unit takes 4 cycles and is fully pipelined
      • There are an unlimited number of buffers in LSQ
        • Hence we never stall ISSUE for want of a buffer in the LSQ unit
      • Assume that Branches are predicted as Taken
        • The Target Address is calculated during the ISSUE stage of the Branch
        • Hence there is a 1 cycle stall between the issue of the Branch instruction and the issue of the instruction at the target address
      • When a LD or SD instruction is issued the effective memory address is calculated in the Issue stage
        • That is, the base address register Rn is read and the offset added to it
        • The effective address is put into the appropriate field of the Buffer in the LSQ
      • The integer ALU instructions do not go through the FP or MEM pipeline
        • They read the source registers in the ID stage
        • Execute at the next cycle in the EX stage
        • Write on the following cycle to the integer register
      • Forwarding is assumed to be used whenever beneficial to reduce latency of integer instructions

  17. Example Schedule
      LOOP:                          | Assume R1 = 1000 initially; Assume F2 = 200
      A1  LD   F0, 0(R1)             | temp = x[i]
      B1  MUL  F4, F0, F2            | temp = temp * a
      C1  SD   F4, 0(R1)             | x[i] = temp
      D1  ADDI R1, R1, #4            | Index next element of the array
      E1  BNE  R1, R2, LOOP          | branch if R1.ne.R2; R2 is the address past end of array

      Cycle 2. Issue A1 (F0 tagged with LQ A1)

        ID      MEM ADDR        OP
        LQ A1   &x[0] = 1000    LD

      Cycle 3. Issue B1 (F4 tagged with RS B1)

        TAG     SOURCE1  SOURCE2     OP
        RS B1   LQ A1    v2 = 200    MUL

      Cycle 4. Issue C1

        ID      MEM ADDR        OPERAND  OP
        SQ C1   &x[0] = 1000    RS B1    SD

  18. Schedule

      Cycle   1    2    3    4
      A1      IF   I    D    M
      B1           IF   I    D
      C1                IF   I
      D1                     IF

      (The chart spans cycles 1 to 21 and has rows for E1, A2, B2, C2, D2, E2; these are still empty at this point.)

      Cycle 4:
      • A1: Memory Read
      • B1: In RS in Dispatch stage
      • C1: In I stage. Will be issued to the Store Buffer at end of cycle
        • Effective Address &x[0] = 1000 and tag SQ C1
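The fetch and issue columns of this chart follow a simple pattern that can be reproduced with a short sketch. It assumes one instruction is fetched per cycle and issued in the following cycle; branch stalls and all stages after issue (dispatch, memory, write-back) are deliberately left out, and the function names are hypothetical.

```python
# Hypothetical front-end timing model: fetch one instruction per cycle,
# issue it in the next cycle. Nothing after the issue stage is modelled.
def front_end(names, first_fetch=1):
    return {name: {"IF": first_fetch + i, "I": first_fetch + i + 1}
            for i, name in enumerate(names)}


def stage_at(timing, name, cycle):
    """Report where an instruction is in the front end at a given cycle."""
    t = timing[name]
    if cycle < t["IF"]:
        return "not fetched"
    if cycle == t["IF"]:
        return "IF"
    if cycle == t["I"]:
        return "I"
    return "issued"   # already past the issue stage


timing = front_end(["A1", "B1", "C1", "D1"])
snapshot = {name: stage_at(timing, name, 4) for name in timing}
print(snapshot)
```

At cycle 4 this places C1 in the issue stage and D1 in fetch, matching the chart; A1 and B1 have already been issued and are in the later stages shown there.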
