
Superscalar Organization, Nima Honarmand, Spring 2018 :: CSE 502 (PowerPoint presentation)



  1. Spring 2018 :: CSE 502 Superscalar Organization Nima Honarmand

  2. Review: Instruction-Level Parallelism (ILP)
     • "Parallelism is the number of independent tasks available"
     • ILP is a measure of the inter-dependencies between insns
     • Average ILP = number of instructions / number of cycles required on an "ideal machine"
     • code1: ILP = 1, i.e., the insns must execute serially
         r1 ← r2 + 1
         r3 ← r1 / 17
         r4 ← r0 - r3
     • code2: ILP = 3, i.e., all three insns can execute at the same time
         r1 ← r2 + 1
         r3 ← r9 / 17
         r4 ← r0 - r10
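
The ILP values for code1 and code2 follow from the longest chain of true dependences. A minimal Python sketch, assuming an ideal machine with unlimited resources, unit latency and perfect fetch; the function name `ideal_ilp` and the (destination, sources) encoding are illustrative, and the immediates 1 and 17 are dropped because they are not registers.

```python
# Sketch: ILP on an "ideal machine" (unlimited resources, unit latency,
# perfect fetch). Only true (RAW) dependences are honored, so the cycle
# count is the length of the longest dependence chain.

def ideal_ilp(program):
    """program: list of (dest, [sources]) in program order; immediates omitted."""
    ready = {}            # register -> cycle in which its newest value exists
    last_cycle = 0
    for dest, srcs in program:
        # Execute one cycle after the latest producer of any source.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + 1
        last_cycle = max(last_cycle, start + 1)
    return len(program) / last_cycle

# The two snippets from the slide (constants 1 and 17 are immediates).
code1 = [("r1", ["r2"]), ("r3", ["r1"]), ("r4", ["r0", "r3"])]
code2 = [("r1", ["r2"]), ("r3", ["r9"]), ("r4", ["r0", "r10"])]

print(ideal_ilp(code1))  # 1.0
print(ideal_ilp(code2))  # 3.0
```

code1 needs three cycles for three insns because each insn waits for the previous one; code2 finishes all three in one cycle.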

  3. ILP != IPC
     • ILP usually assumes
       – Infinite resources
       – Perfect fetch and branch prediction
       – Unit latency for all instructions
     • ILP is a property of the program dataflow
     • IPC is the "real" observed metric
       – How many insns are executed per cycle
     • ILP is an upper bound on the attainable IPC
       – Specific to a particular program
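
To see why ILP only bounds IPC from above, the sketch below reschedules code2 from slide 2 with non-unit latencies while keeping resources unlimited. The latency table (a hypothetical 10-cycle divide) and the name `ipc_with_latency` are assumptions for illustration, and IPC is measured over this three-insn fragment only.

```python
# Sketch: same dataflow, but with multi-cycle latencies. The latency table is a
# hypothetical assumption for illustration, not data for any real processor.
# Resources stay unlimited, so only latency separates IPC from the ideal ILP.

def ipc_with_latency(program, latency):
    """program: list of (op, dest, [sources]); latency: op -> cycles."""
    ready = {}                 # register -> cycle its newest value is available
    last_cycle = 0
    for op, dest, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + latency[op]
        last_cycle = max(last_cycle, ready[dest])
    return len(program) / last_cycle

# code2 from slide 2: three independent insns (ILP = 3).
code2 = [("add", "r1", ["r2"]), ("div", "r3", ["r9"]), ("sub", "r4", ["r0", "r10"])]

unit = {"add": 1, "sub": 1, "div": 1}
assumed = {"add": 1, "sub": 1, "div": 10}   # hypothetical 10-cycle divide

print(ipc_with_latency(code2, unit))     # 3.0, matches the ideal ILP
print(ipc_with_latency(code2, assumed))  # 0.3
```

The dataflow is identical in both runs; only the latency assumption changes, and the one slow divide stretches the fragment to 10 cycles.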

  4. Purported Limits on ILP
     Study                        ILP limit
     Weiss and Smith [1984]       1.58
     Sohi and Vajapeyam [1987]    1.81
     Tjaden and Flynn [1970]      1.86
     Tjaden and Flynn [1973]      1.96
     Uht [1986]                   2.00
     Smith et al. [1989]          2.00
     Jouppi and Wall [1988]       2.40
     Johnson [1991]               2.50
     Acosta et al. [1986]         2.79
     Wedig [1982]                 3.00
     Butler et al. [1991]         5.8
     Melvin and Patt [1991]       6
     Wall [1991]                  7
     Kuck et al. [1972]           8
     Riseman and Foster [1972]    51
     Nicolau and Fisher [1984]    90

  5. ILP Limits of Scalar Pipelines (1)
     • Scalar upper bound on throughput
       – Limited to IPC <= 1
       – Solution: superscalar pipelines with multiple insns at each stage
     [Figure: Pentium pipeline. Prefetch → Decode1, then two parallel pipes (U-pipe and V-pipe), each with Decode2 → Execute → Writeback.]
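
The scalar bound can be seen by varying issue width in a toy in-order model. This is a sketch under stated assumptions (unit latency, perfect fetch, no limit other than issue width and true dependences); `ipc_in_order` is a hypothetical name, and the Pentium's actual U/V pairing rules are not modeled.

```python
# Sketch: an in-order, W-wide issue model. Unit latency and perfect fetch are
# assumed, so only issue width and true dependences limit throughput. Real
# pairing rules (e.g. which insns may use the Pentium's V-pipe) are not modeled.

def ipc_in_order(program, width):
    """program: list of (dest, [sources]) in program order."""
    ready = {}          # register -> first cycle its newest value can be read
    cycle = 0
    i = 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if any(ready.get(s, 0) > cycle for s in srcs):
                break                   # in-order: a stalled insn blocks younger ones
            ready[dest] = cycle + 1     # unit latency: result usable next cycle
            i += 1
            issued += 1
        cycle += 1
    return len(program) / cycle

code1 = [("r1", ["r2"]), ("r3", ["r1"]), ("r4", ["r0", "r3"])]
code2 = [("r1", ["r2"]), ("r3", ["r9"]), ("r4", ["r0", "r10"])]

for w in (1, 2, 3):
    print(w, ipc_in_order(code1, w), ipc_in_order(code2, w))
```

At width 1 both snippets reach IPC 1.0, the scalar ceiling. code2 rises to 1.5 at width 2 and 3.0 at width 3, while code1 stays at 1.0 at any width because its ILP is 1.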

  6. ILP Limits of Scalar Pipelines (2)
     • Unified pipeline: a pipeline where all instructions go through the same stages
       – Like our 5-stage pipeline
     • Unified pipelines are inefficient
       – Lower resource utilization and longer instruction latency
       – Solution: diversified pipelines
     [Figure: diversified pipeline. IF → ID → RD, then separate execution pipes ALU, MEM1 → MEM2, FP1 → FP2 → FP3 and BR, merging back into WB.]
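
The latency and utilization cost of a unified pipeline can be put in numbers. A sketch under assumptions: execute depths are taken from the figure (ALU 1, MEM 2, FP 3, BR 1 stage), the unified model pads every insn to the deepest execute path, and the instruction mix and function names are hypothetical.

```python
# Sketch with assumed numbers: execute depths follow the figure (ALU 1, MEM 2,
# FP 3, BR 1 stage); IF, ID, RD and WB are shared. The unified model pads every
# insn to the deepest execute path, the way an ALU insn in the classic 5-stage
# pipeline still passes through MEM. The instruction mix is hypothetical.

SHARED_STAGES = 4                      # IF, ID, RD, WB
EXEC_DEPTH = {"ALU": 1, "MEM": 2, "FP": 3, "BR": 1}

def latency(kind, unified):
    depth = max(EXEC_DEPTH.values()) if unified else EXEC_DEPTH[kind]
    return SHARED_STAGES + depth

def average_latency(mix, unified):
    """mix: insn kind -> count in a sample of dynamic insns."""
    total = sum(mix.values())
    return sum(n * latency(k, unified) for k, n in mix.items()) / total

def wasted_stage_slots(mix):
    """Execute-stage slots a unified pipeline spends on stages an insn does not need."""
    deepest = max(EXEC_DEPTH.values())
    return sum(n * (deepest - EXEC_DEPTH[k]) for k, n in mix.items())

mix = {"ALU": 5, "MEM": 3, "FP": 1, "BR": 1}   # hypothetical 10-insn sample
print(average_latency(mix, unified=True))    # 7.0
print(average_latency(mix, unified=False))   # 5.5
print(wasted_stage_slots(mix))               # 15
```

Under this mix, the unified pipeline adds 1.5 cycles of average latency, and 15 of its 30 execute-stage slots do no useful work.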

  7. ILP Limits of Scalar Pipelines (3)
     • Rigid pipeline stall policy
       – A stalled instruction stalls all newer instructions
       – Solution 1: out-of-order execution
     [Figure: IF → ID → RD; instructions enter the Dispatch Buffer in order and leave it out of order for the execution pipes (ALU, MEM1 → MEM2, FP1 → FP2 → FP3, BR); they enter the Reorder Buffer out of order and leave it in order for WB.]
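
A toy comparison of the rigid stall policy with out-of-order issue from a dispatch buffer. Single issue per cycle is assumed, the latencies (including a hypothetical 10-cycle load miss) are illustrative, and the function names are not from the slides.

```python
# Sketch: scalar (one issue per cycle) machine comparing the rigid in-order
# stall policy with out-of-order issue from a dispatch buffer. Latencies are
# hypothetical. Only true (RAW) dependences are tracked; the example reuses no
# destination register, so WAR/WAW hazards cannot arise.

def producers(program):
    """For each insn, indices of the older insns that produce its sources."""
    last_writer, deps = {}, []
    for i, (dest, srcs, _lat) in enumerate(program):
        deps.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i
    return deps

def total_cycles(program, out_of_order):
    deps = producers(program)
    done_at = {}                       # insn index -> cycle its result is ready
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        # In order: only the oldest waiting insn may issue.
        # Out of order: any waiting insn whose operands are ready may issue.
        window = pending if out_of_order else pending[:1]
        for i in window:
            if all(p in done_at and done_at[p] <= cycle for p in deps[i]):
                done_at[i] = cycle + program[i][2]
                pending.remove(i)
                break                  # at most one issue per cycle
        cycle += 1
    return max(done_at.values())

program = [
    ("r1", ["r2"], 10),        # load that misses: hypothetical 10-cycle latency
    ("r3", ["r1"], 1),         # needs the load result
    ("r4", ["r5", "r6"], 1),   # independent of the load
    ("r7", ["r4"], 1),         # depends only on r4
]
print(total_cycles(program, out_of_order=False))  # 13
print(total_cycles(program, out_of_order=True))   # 11
```

In order, everything behind the load waits for it; out of order, the two independent insns issue in cycles 1 and 2, in the load's shadow.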

  8. ILP Limits of Scalar Pipelines (3)
     • Rigid pipeline stall policy
       – A stalled instruction stalls all newer instructions
       – Solution 1: out-of-order execution
       – Solution 2: inter-stage buffers
     [Figure: Fetch → Instruction Buffer → Decode → Dispatch Buffer → Dispatch → Issuing Buffer → Execute → Completion Buffer → Complete → Store Buffer → Retire. Fetch, decode and dispatch proceed in program order, execution proceeds out of order, and completion and retirement are back in program order.]
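
Out-of-order execution still has to complete and retire in program order, which is the job of the completion buffer (reorder buffer) in the figure. A minimal sketch of in-order retirement; the completion cycles, the retire width and the name `retire_cycles` are hypothetical.

```python
from collections import deque

# Sketch of a reorder buffer (ROB) draining in program order. Completion
# cycles are hypothetical inputs; insns may complete in any order, but an insn
# retires only after every older insn has retired, at most `retire_width` per
# cycle, which keeps architectural state precise.

def retire_cycles(complete_at, retire_width):
    """complete_at[i]: cycle insn i finishes executing (i = program order).
    Returns the cycle in which each insn retires."""
    rob = deque(range(len(complete_at)))    # head = oldest insn
    retired_at = [None] * len(complete_at)
    cycle = 0
    while rob:
        n = 0
        # Retire from the head only; an unfinished head blocks younger insns.
        while rob and n < retire_width and complete_at[rob[0]] <= cycle:
            retired_at[rob.popleft()] = cycle
            n += 1
        cycle += 1
    return retired_at

# Insns 2 and 3 finish long before the slow insns 0 and 1.
print(retire_cycles([10, 11, 2, 3], retire_width=2))  # [10, 11, 11, 12]
```

Insns 2 and 3 complete in cycles 2 and 3 but wait in the buffer until insns 0 and 1 have retired.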

  9. ILP Limits of Scalar Pipelines (4)
     • Instruction dependencies limit parallelism
       – Frequent stalls due to data and control dependencies
       – Solution 1: renaming, for WAR and WAW register dependences
       – Solution 2: speculation, for control dependences and memory dependences
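
Renaming removes WAR and WAW dependences by giving every new value its own physical register. A sketch using a map table and a free list, assuming hypothetical register counts and illustrative names. Physical registers are never reclaimed here, whereas a real design frees them at retirement.

```python
from collections import deque

# Sketch of register renaming with a map table and a free list. Register
# counts are hypothetical; reclaiming physical registers at retirement is
# omitted for brevity, so the free list simply runs down.

def rename(program, num_arch, num_phys):
    """program: list of (dest, [sources]) over architectural regs r0..r{num_arch-1}.
    Returns the same program rewritten over physical regs p0..p{num_phys-1}."""
    map_table = {f"r{k}": f"p{k}" for k in range(num_arch)}   # initial mapping
    free_list = deque(f"p{k}" for k in range(num_arch, num_phys))
    renamed = []
    for dest, srcs in program:
        # Look up sources first, so "r1 <- r1 + 1" reads the old mapping of r1.
        phys_srcs = [map_table[s] for s in srcs]
        # Every new value gets a fresh physical register (a real core would
        # stall when the free list is empty; popleft() would raise here).
        phys_dest = free_list.popleft()
        map_table[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed

program = [
    ("r1", ["r2"]),          # I0: r1 <- r2 + 1
    ("r2", ["r3"]),          # I1: r2 <- r3 * 2   (WAR with I0 on r2)
    ("r1", ["r4", "r5"]),    # I2: r1 <- r4 - r5  (WAW with I0 on r1)
    ("r6", ["r1", "r2"]),    # I3: r6 <- r1 + r2  (RAW on I2 and I1)
]
for insn in rename(program, num_arch=8, num_phys=16):
    print(insn)
# ('p8', ['p2'])
# ('p9', ['p3'])
# ('p10', ['p4', 'p5'])
# ('p11', ['p10', 'p9'])
```

I1 now writes p9 instead of overwriting the p2 that I0 reads, and I2 writes p10 rather than I0's p8. Both false dependences disappear, while I3 still reads the values produced by I2 and I1.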

  10. Summary: ILP Limits of Scalar Pipelines
      1) Scalar upper bound on throughput
         – Limited to IPC <= 1
         – Solution: superscalar pipelines with multiple insns at each stage
      2) Inefficient unified pipeline
         – Lower resource utilization and longer instruction latency
         – Solution: diversified pipelines
      3) Rigid pipeline stall policy
         – A stalled instruction stalls all newer instructions
         – Solution: out-of-order execution and inter-stage buffers
      4) Instruction dependencies limit parallelism
         – Frequent stalls due to data and control dependencies
         – Solutions: renaming and speculation
      State of the art: Out-of-Order Superscalar Speculative Pipelines

  11. Superscalar Pipelines: Overall Picture
      • Fetch issues:
        – Fetch multiple insns
        – Branches and speculation
      • Decode issues:
        – Identify insns
        – Find dependences
      • Execution issues:
        – Dispatch insns
        – Resolve dependences
        – Forwarding networks
        – Multiple outstanding memory accesses
      • Completion issues:
        – Out-of-order completion
        – Speculative instructions
        – Precise exceptions
      [Figure: the I-cache and branch predictor feed FETCH, which fills the Instruction Buffer; DECODE feeds EXECUTE, with Memory, Integer, Floating-point and Media units; results pass through the Reorder Buffer (ROB) to COMMIT, and memory data flows through the Store Queue and D-cache. The figure distinguishes instruction flow, register data flow and memory data flow.]
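
Finding dependences at decode means comparing each insn's sources against the destinations of older insns decoded in the same cycle. A sketch of that cross-check for one decode group. The names are illustrative, and `comparator_count` assumes two source operands per insn.

```python
# Sketch of the decode-stage dependence cross-check for one group of insns
# decoded in the same cycle. The (dest, [sources]) encoding and function names
# are illustrative assumptions.

def intra_group_deps(group):
    """group: list of (dest, [sources]), oldest first.
    Returns, per insn, {source: index of the nearest older in-group producer}.
    Sources produced outside the group are omitted (they come from earlier groups)."""
    result = []
    for i, (_dest, srcs) in enumerate(group):
        deps = {}
        for s in srcs:
            # Scan older insns from youngest to oldest; the nearest writer wins.
            for j in range(i - 1, -1, -1):
                if group[j][0] == s:
                    deps[s] = j
                    break
        result.append(deps)
    return result

def comparator_count(width, srcs_per_insn=2):
    """Source-vs-destination comparisons for a full group of `width` insns:
    insn i compares each of its sources with the i older destinations."""
    return srcs_per_insn * width * (width - 1) // 2

group = [("r1", ["r2"]), ("r3", ["r1"]), ("r4", ["r0", "r3"])]   # code1 from slide 2
print(intra_group_deps(group))                   # [{}, {'r1': 0}, {'r3': 1}]
print(comparator_count(4), comparator_count(8))  # 12 56
```

Under this assumption the number of comparisons grows quadratically with decode width (12 at width 4, 56 at width 8). This is one reason wide superscalar decode logic is costly.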
