
OUT-OF-ORDER LOADS/STORES Mahdi Nazm Bojnordi Assistant Professor - PowerPoint PPT Presentation


  1. OUT-OF-ORDER LOADS/STORES. Mahdi Nazm Bojnordi, Assistant Professor, School of Computing, University of Utah. CS/ECE 6810: Computer Architecture

  2. Loads and Stores
     - What if the IQ also had load and store instructions?
     - [Figure: out-of-order pipeline. Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit. Labeled blocks include Inst. Memory, Inst. Decoder, Branch Predictor, Front RAT, Retire RAT, Free Register List, Issue Queue (IQ), Physical Register File, FU-1 … FU-n, Data Memory, and Re-Order Buffer (ROB).]

  3-8. Memory Data Dependence
     - Can we continue executing loads/stores out of order?
       - The effective address is required for the dependence check.
     - Instructions in the issue queue:
         Load  R1 ← Mem[R2]
         Load  R3 ← Mem[R4+8]
         Store R5 → Mem[R6]
         Load  R7 ← Mem[R8+16]
         Store R9 → Mem[R10]
     - Hazards marked on the slide: possible WAR (a store following a load), possible RAW (a load following a store), possible WAW (a store following a store).
     - Does renaming help?
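The dependence check on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the `MemOp` type and the `classify` function are hypothetical names, and an address of `None` stands for an effective address that has not been computed yet.

```python
from typing import NamedTuple, Optional

class MemOp(NamedTuple):
    kind: str             # "load" or "store"
    addr: Optional[int]   # effective address; None while still unknown

# Hazard named by (older kind, younger kind); load-load pairs never conflict.
HAZARD = {
    ("store", "load"): "RAW",
    ("load", "store"): "WAR",
    ("store", "store"): "WAW",
}

def classify(older: MemOp, younger: MemOp) -> Optional[str]:
    """Classify the memory hazard between two ops given in program order."""
    name = HAZARD.get((older.kind, younger.kind))
    if name is None:
        return None                      # two loads: no hazard
    if older.addr is None or younger.addr is None:
        return "possible " + name        # cannot rule it out without both addresses
    return name if older.addr == younger.addr else None

print(classify(MemOp("store", None), MemOp("load", 0x40)))   # possible RAW
print(classify(MemOp("store", 0x40), MemOp("load", 0x40)))   # RAW
print(classify(MemOp("store", 0x40), MemOp("load", 0x48)))   # None
```

Until both effective addresses are known, every load/store pair that involves a store stays a "possible" hazard, which is why the slide labels all three cases as possible.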

  9-12. Load-Store Queue
     - Dedicated queue only for load/store instructions
       - Checks the availability of operands every cycle
     - Two steps for load/store instructions
       - Compute the effective address when the register is available
       - Send the request to memory if there are no memory hazards
     - Example from the slide animation: LSQ entry Load P34 with address P13 + 8. When P13 becomes available, the ALU computes the effective address 0xbeef00.
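The two steps can be sketched as follows. This is a simplified model under stated assumptions: `LSQEntry`, `step1_compute_address`, and `step2_can_send` are hypothetical names, the hazard check is reduced to a boolean supplied by the caller, and the value given to P13 is invented so that the result matches the 0xbeef00 address shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class LSQEntry:
    kind: str                    # "load" or "store"
    reg: str                     # destination (load) or data source (store)
    base: str                    # physical register holding the base address
    offset: int
    addr: Optional[int] = None   # filled in by step 1

def step1_compute_address(entry: LSQEntry, ready: Dict[str, int]) -> bool:
    """Step 1: compute the effective address once the base register is available."""
    if entry.addr is None and entry.base in ready:
        entry.addr = ready[entry.base] + entry.offset
    return entry.addr is not None

def step2_can_send(entry: LSQEntry, hazard_free: bool) -> bool:
    """Step 2: send to memory only with a known address and no memory hazard."""
    return entry.addr is not None and hazard_free

entry = LSQEntry("load", "P34", base="P13", offset=8)
ready: Dict[str, int] = {}                    # P13 has not been produced yet
print(step1_compute_address(entry, ready))    # False
ready["P13"] = 0xBEEEF8                       # hypothetical value for P13
print(step1_compute_address(entry, ready))    # True
print(hex(entry.addr))                        # 0xbeef00
print(step2_can_send(entry, hazard_free=True))  # True
```

In hardware the availability check would be repeated every cycle for every entry; the dictionary lookup here stands in for that wakeup logic.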

  13-15. Memory Dependence Check
     - Checking for RAW, WAR, and WAW hazards
     - Question 1: Which load instructions can be issued?
       - Due to RAW hazards, only loads that do not follow any store whose address is still unknown can be issued.
       - Can we bypass memory?
     - LSQ contents (address shown where known):
         Load   P34  0x12345
         Load   P61
         Store  P26
         Load   P11
         Load   P29  0x12345
         Store  P30  0x11111
         Load   P15  0x22222
         Load   P10  0x11111
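The load rule can be applied to the LSQ contents on this slide with a short scan. This is a sketch, assuming the entries are listed oldest first and that an empty address field means the address has not been computed; the tuple layout and the `issuable_loads` name are illustrative only.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, Optional[int]]   # (kind, register, address or None)

# LSQ contents from the slide, assumed to be in program order (oldest first).
LSQ: List[Entry] = [
    ("load",  "P34", 0x12345),
    ("load",  "P61", None),
    ("store", "P26", None),
    ("load",  "P11", None),
    ("load",  "P29", 0x12345),
    ("store", "P30", 0x11111),
    ("load",  "P15", 0x22222),
    ("load",  "P10", 0x11111),
]

def issuable_loads(lsq: List[Entry]) -> List[str]:
    """Loads with a known address that follow no store with an unknown address."""
    ready = []
    unknown_store_seen = False
    for kind, reg, addr in lsq:
        if kind == "store":
            if addr is None:
                unknown_store_seen = True   # blocks every younger load
        elif addr is not None and not unknown_store_seen:
            ready.append(reg)
    return ready

print(issuable_loads(LSQ))   # ['P34']
```

Only Load P34 qualifies: Load P61 has no address yet, and every later load follows Store P26, whose address is unknown. The sketch encodes only the rule stated on the slide; it does not model forwarding from a known older store.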

  16-17. Memory Dependence Check
     - Checking for RAW, WAR, and WAW hazards
     - Question 2: Which store instructions can be issued?
       - Due to WAW and WAR hazards, a store can be issued only when there are no older instructions. (Why?)
     - LSQ contents (address shown where known):
         Load   P34  0x12345
         Load   P61
         Store  P26
         Load   P11
         Load   P29  0x12345
         Store  P30  0x11111
         Load   P15  0x22222
         Load   P10  0x11111

  18. Memory Dependence Check
     - Checking for RAW, WAR, and WAW hazards
     - Which instructions can be issued?
     - LSQ contents (address shown where known):
         Load   P34  0x12345
         Load   P61
         Store  P26  0x22222
         Load   P11
         Load   P29  0x12345
         Store  P30  0x11111
         Load   P15  0x22222
         Load   P10  0x11111
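One way to work through this slide is to combine both rules in a single pass. The sketch below rests on several assumptions that the slides do not state: entries are listed oldest first, none of them has issued yet, a store issues only from the head of the queue, and a load whose address matches an older pending store is held back rather than served by forwarding. The `issuable` function and tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, Optional[int]]   # (kind, register, address or None)

# LSQ contents from the slide; Store P26 now has a known address.
LSQ: List[Entry] = [
    ("load",  "P34", 0x12345),
    ("load",  "P61", None),
    ("store", "P26", 0x22222),
    ("load",  "P11", None),
    ("load",  "P29", 0x12345),
    ("store", "P30", 0x11111),
    ("load",  "P15", 0x22222),
    ("load",  "P10", 0x11111),
]

def issuable(lsq: List[Entry]) -> List[str]:
    """Conservative issue check for loads and stores (assumptions in the lead-in)."""
    ready = []
    older_store_addrs = set()
    unknown_store_seen = False
    for index, (kind, reg, addr) in enumerate(lsq):
        if kind == "store":
            # Store rule: no older instructions, i.e. the store is at the head.
            if index == 0 and addr is not None:
                ready.append(reg)
            if addr is None:
                unknown_store_seen = True
            else:
                older_store_addrs.add(addr)
        elif (addr is not None and not unknown_store_seen
              and addr not in older_store_addrs):
            # Load rule: known address, no unknown or matching older store.
            ready.append(reg)
    return ready

print(issuable(LSQ))   # ['P34', 'P29']
```

Under these assumptions Load P29 becomes issuable once Store P26 resolves to a different address, while Load P15 and Load P10 each match an older store (0x22222 and 0x11111) and must either wait or receive the store's data directly.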

  19. Memory Dependence Prediction
     - Can we predict memory dependence?
       - Issue/execute load instructions even if they follow unresolved stores.
       - What if the prediction was not correct?
     - LSQ contents (address shown where known):
         Load   P34  0x12345
         Load   P61
         Store  P26
         Load   P11
         Load   P29  0x12345
         Store  P30  0x11111
         Load   P15  0x22222
         Load   P10  0x11111
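The question of a wrong prediction can be made concrete with a small sketch. It assumes one common recovery scheme, which the slide itself does not specify: when a store's address finally resolves, the LSQ is searched for younger loads that already executed with the same address, and those loads are replayed. The `Op` class, the `resolve_store` function, and the address assigned to Store P26 are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Op:
    kind: str                    # "load" or "store"
    reg: str
    addr: Optional[int] = None   # None while unresolved
    executed: bool = False       # True if a load already ran speculatively

def resolve_store(lsq: List[Op], index: int, addr: int) -> List[str]:
    """Resolve a store's address; return younger loads that ran too early."""
    lsq[index].addr = addr
    return [op.reg for op in lsq[index + 1:]
            if op.kind == "load" and op.executed and op.addr == addr]

# LSQ from the slide, oldest first (assumed). Loads P29 and P15 are predicted
# independent of the unresolved Store P26 and execute speculatively.
lsq = [
    Op("load", "P34", 0x12345, executed=True),
    Op("load", "P61"),
    Op("store", "P26"),
    Op("load", "P11"),
    Op("load", "P29", 0x12345, executed=True),
    Op("store", "P30", 0x11111),
    Op("load", "P15", 0x22222, executed=True),
    Op("load", "P10", 0x11111),
]

# Hypothetical outcome: Store P26 turns out to write 0x22222.
print(resolve_store(lsq, 2, 0x22222))   # ['P15']
```

In this scenario the prediction was right for Load P29 and wrong for Load P15, which read stale data and would have to be re-executed along with its dependents.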

  20. Out-of-order Pipeline with LSQ
     - The LSQ is an extension to the IQ
     - [Figure: out-of-order pipeline with an LSQ added. Stages: Fetch, Decode, Rename, Issue, Execute, Complete, Commit. Labeled blocks include Inst. Memory, Inst. Decoder, Branch Predictor, Front RAT, Retire RAT, Free Register List, Issue Queue (IQ), LSQ, Physical Register File, FU-1 … FU-n, Data Memory, and Re-Order Buffer (ROB).]

  21. Memory Hierarchy
     - “Ideally one would desire an indefinitely large memory capacity such that any particular [...] word would be immediately available [...] We are [...] forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” -- Burks, Goldstine, and von Neumann, 1946
     - [Figure: Core, Level 1, Level 2, Level 3; each successive level has greater capacity and is less quickly accessible.]

  22. The Memory Wall
     - The processor-memory performance gap has grown by over 50% per year
       - Processor performance historically improved ~60% per year
       - Main memory access time improves ~5% per year
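The "over 50% per year" figure follows from the two rates on this slide, as the short calculation below shows. The variable and function names are illustrative, and the rates are the approximate values quoted on the slide.

```python
cpu_growth = 0.60   # processor performance improvement per year (~60%)
mem_growth = 0.05   # main memory access time improvement per year (~5%)

def gap_factor(years: int) -> float:
    """Ratio of processor improvement to memory improvement after `years`."""
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years

# One year: 1.60 / 1.05 is about 1.52, so the gap widens by roughly 52%.
print(f"{gap_factor(1) - 1:.1%}")   # 52.4%
```

Because the gap compounds, calling `gap_factor` with a larger number of years shows how quickly the two curves diverge.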

  23. Modern Memory Hierarchy
     - Trade-off among memory speed, capacity, and cost
     - From small, fast, and expensive to big, slow, and inexpensive: Register, Cache, Memory, SSD, Disk

  24. Memory Technology
     - Random access memory (RAM) technology
       - Access time is the same for all locations (not so true anymore)
       - Static RAM (SRAM)
         - Typically used for caches
         - 6T/bit; fast, but low density, high power, expensive
       - Dynamic RAM (DRAM)
         - Typically used for main memory
         - 1T/bit; inexpensive, high density, low power, but slow

  25. RAM Cells
     - 6T SRAM cell
       - Internal feedback maintains data while power is on
       - [Figure labels: wordline, bitline, bitline]
     - 1T-1C DRAM cell
       - Needs regular refresh to preserve data
       - [Figure labels: wordline, bitline]
