superscalar
play

Superscalar Organization Nima Honarmand Spring 2016 :: CSE 502 - PowerPoint PPT Presentation

Spring 2016 :: CSE 502 Computer Architecture Superscalar Organization Nima Honarmand Spring 2016 :: CSE 502 Computer Architecture Instruction-Level Parallelism (ILP) Recall: Parallelism is the number of independent tasks


  1. Spring 2016 :: CSE 502 – Computer Architecture Superscalar Organization Nima Honarmand

  2. Spring 2016 :: CSE 502 – Computer Architecture Instruction-Level Parallelism (ILP) • Recall: “Parallelism is the number of independent tasks available” • ILP is a measure of inter-dependencies between insns. • Average ILP = num. instruction / num. cyc required in an “ideal machine” code1: ILP = 1 i.e. must execute serially code2: ILP = 3 i.e. can execute at the same time r1  r2 + 1 r1  r2 + 1 code1: code2: r3  r9 / 17 r3  r1 / 17 r4  r0 - r10 r4  r0 - r3

  3. Spring 2016 :: CSE 502 – Computer Architecture ILP != IPC • ILP usually assumes – Infinite resources – Perfect fetch – Unit-latency for all instructions • ILP is a property of the program dataflow • IPC is the “real” observed metric – How many insns. are executed per cycle • ILP is an upper-bound on the attainable IPC – Specific to a particular program

  4. Spring 2016 :: CSE 502 – Computer Architecture Purported Limits on ILP Weiss and Smith [1984] 1.58 Sohi and Vajapeyam [1987] 1.81 Tjaden and Flynn [1970] 1.86 Tjaden and Flynn [1973] 1.96 Uht [1986] 2.00 Smith et al. [1989] 2.00 Jouppi and Wall [1988] 2.40 Johnson [1991] 2.50 Acosta et al. [1986] 2.79 Wedig [1982] 3.00 Butler et al. [1991] 5.8 Melvin and Patt [1991] 6 Wall [1991] 7 Kuck et al. [1972] 8 Riseman and Foster [1972] 51 Nicolau and Fisher [1984] 90

  5. Spring 2016 :: CSE 502 – Computer Architecture ILP Limits of Scalar Pipelines (1) • Scalar upper bound on throughput – Limited to CPI >= 1 – Solution: superscalar pipelines with multiple insns at each stage Prefetch Decode1 U-pipe V-pipe Decode2 Decode2 Execute Execute Pentium Pipeline Writeback Writeback

  6. Spring 2016 :: CSE 502 – Computer Architecture ILP Limits of Scalar Pipelines (2) • Inefficient unified IF • • • pipeline ID • • • – Lower resource utilization and longer RD • • • instruction latency EX ALU MEM1 FP1 BR – Solution: diversified pipelines MEM2 FP2 FP3 WB • • •

  7. Spring 2016 :: CSE 502 – Computer Architecture ILP Limits of Scalar Pipelines (3) • Rigid pipeline stall IF • • • policy ID • • • – A stalled RD instruction stalls • • • ( in order ) Dispatch all newer Buffer ( out of order ) instructions EX ALU MEM1 FP1 BR – Solution 1: MEM2 FP2 out-of-order execution FP3 ( out of order ) Reorder Buffer ( in order ) WB • • •

  8. Spring 2016 :: CSE 502 – Computer Architecture ILP Limits of Scalar Pipelines (3) • Rigid pipeline stall Fetch policy Instruction Buffer In – A stalled Decode Program instruction stalls Order Dispatch Buffer all newer Dispatch instructions Issuing Buffer – Solution 1: Out Execute of out-of-order Order Completion Buffer execution Complete – Solution 2: inter- In Program stage buffers Store Buffer Order Retire

  9. Spring 2016 :: CSE 502 – Computer Architecture ILP Limits of Scalar Pipelines (4) • Instruction dependencies limit parallelism – Frequent stalls due to data and control dependencies – Solution 1: renaming – for WAR and WAW register dependences – Solution 2: speculation – for control dependences and memory dependences

  10. Spring 2016 :: CSE 502 – Computer Architecture ILP Limits of Scalar Pipelines (Summary) 1. Scalar upper bound on throughput – Limited to CPI >= 1 – Solution: superscalar pipelines with multiple insns at each stage 2. Inefficient unified pipeline – Lower resource utilization and longer instruction latency – Solution: diversified pipelines 3. Rigid pipeline stall policy – A stalled instruction stalls all newer instructions – Solution: out-of-order execution and inter-stage buffers 4. Instruction dependencies limit parallelism – Frequent stalls due to data and control dependencies – Solutions: renaming and speculation State of the art: Out-of-Order Superscalar Pipelines

  11. Spring 2016 :: CSE 502 – Computer Architecture Overall Picture • Fetch issues: – Fetch multiple isns I-cache – Branches Instruction – Branch target mis-alignment Branch FETCH Flow Predictor Instruction • Decode issues: Buffer DECODE – Identify insns – Find dependences Memory Integer Floating-point Media • Execution issues: – Dispatch insns Memory – Resolve dependences Data – Bypass networks Flow EXECUTE – Multiple outstanding memory Reorder accesses Buffer Register (ROB) • Completion issues: Data COMMIT Flow D-cache Store – Out-of-order completion Queue – Speculative instructions – Precise exceptions State of the art: Out-of-Order Superscalar Pipelines

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend