
EE 457 Unit 9a
Exploiting ILP
Out-of-Order Execution

Credits
• Some of the material in this presentation is taken from:
  – Computer Architecture: A Quantitative Approach
    • John Hennessy & David Patterson
• Some of the material in this presentation is derived from course notes and slides from
  – Prof. Michel Dubois (USC)
  – Prof. Murali Annavaram (USC)
  – Prof. David Patterson (UC Berkeley)

Outline
• _________________ Parallelism
  – In-order (Io) pipeline
    • From academic 5-stage pipeline
    • To 8-stage MIPS R4000 pipeline
    • Superscalar, superpipelined
  – Out-of-Order (OoO) Execution
    • OoO Execution AND Out-of-order completion (Problem: Exceptions)
    • OoO Execution BUT In-order completion
• ________________ Parallelism
  – Chip ______________ (CMT)
  – Chip ______________ (CMP)

Other In-Order techniques
SUPERSCALAR & SUPERPIPELINING

Overview
• Superscalar = More than 1 instruction ___________________________
  – ______uperscalar = Proc. that can issue 2 instructions per clock cycle
  – Success is sensitive to ability to find independent instructions to issue in the same cycle
• Superpipelining = Many small stages to boost _________________
  – Success depends on finding instructions to schedule in the shadow of data and control hazards

[Figure: Superscalar. Instruction 1 and Instruction 2 each pass through Instruc. Fetch, Instruc. Decode, Execute, Data Memory, Write back in the same cycles.]
Superscalar: Executing more than 1 instruction per clock cycle (CPI < 1)

[Figure: Superpipelining. Instruction 1 and Instruction 2 each pass through IF1 IF2 ID EX DM1 DM2 DM3 WB.]
Superpipelining: Divide logic into many short stages (______ Clock Frequency)

2-way Superscalar
• One ALU & Data transfer (LW/SW) instruction can be issued at the same time

Instruction      Pipeline Stages
ALU or branch    IF ID EX MEM WB
LW/SW            IF ID EX MEM WB
ALU or branch       IF ID EX MEM WB
LW/SW               IF ID EX MEM WB
ALU or branch          IF ID EX MEM WB
LW/SW                  IF ID EX MEM WB

[Figure: 2-way superscalar datapath. PC, I-Cache (2 instructions), Reg. File (_ Read, _ Write), Integer Slot (ALU), LD/ST Slot (Addr. Calc., D-Cache).]

Instruction Level Parallelism (ILP)
• Although a program defines a sequential ordering of instructions, in reality many instructions can be executed in parallel.
• ILP refers to the process of finding instructions from a single program/thread of execution that can be executed in parallel
• ________________________ is what truly limits ordering
• _____________ instructions (no data dependencies) can be executed at the same time
• _____________________ also provide some ordering constraints

Program Order (In-order):
    lw  $s3,0($s4)
    and $t3,$t2,$t3
    add $t0,$t0,$s4
    or  $t5,$t3,$t2
    sub $t1,$t1,$t2
    beq $t0,$t8,L1
    xor $s0,$t1,$s2

[Figure: Dependency Graph]

We may perform execution out-of-order
Cycle 1:    /    /    /
Cycle 2:    /    /    /
Cycle 3:    /    /    /

OUT-OF-ORDER EXECUTION
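The dependency-graph exercise on the ILP slide can be explored with a small script. This is a minimal sketch, not the slide's own method, and it rests on assumptions that are not in the original: every instruction takes one cycle, there are unlimited functional units, only RAW (read-after-write) dependences on registers are tracked, and an instruction that follows a branch waits until the cycle after the branch executes. The names `PROGRAM` and `earliest_cycles` are hypothetical.

```python
# Each entry: (text, destination register or None, source registers, is_branch).
# The register lists are transcribed by hand from the slide's program listing.
PROGRAM = [
    ("lw  $s3,0($s4)",  "$s3", ["$s4"],        False),
    ("and $t3,$t2,$t3", "$t3", ["$t2", "$t3"], False),
    ("add $t0,$t0,$s4", "$t0", ["$t0", "$s4"], False),
    ("or  $t5,$t3,$t2", "$t5", ["$t3", "$t2"], False),
    ("sub $t1,$t1,$t2", "$t1", ["$t1", "$t2"], False),
    ("beq $t0,$t8,L1",  None,  ["$t0", "$t8"], True),
    ("xor $s0,$t1,$s2", "$s0", ["$t1", "$s2"], False),
]


def earliest_cycles(program, respect_branches=True):
    """Return the earliest cycle (1-based) in which each instruction could execute.

    Assumptions (not from the slides): unit latency, unlimited functional
    units, RAW register dependences only, and (optionally) a control
    dependence that keeps instructions behind the most recent branch.
    """
    last_writer = {}    # register name -> index of its most recent writer
    last_branch = None  # index of the most recent branch, if any
    cycles = []
    for i, (_, dest, srcs, is_branch) in enumerate(program):
        cycle = 1
        # RAW: wait until one cycle after each producer of a source register.
        for reg in srcs:
            if reg in last_writer:
                cycle = max(cycle, cycles[last_writer[reg]] + 1)
        # Control dependence: wait until the preceding branch has executed.
        if respect_branches and last_branch is not None:
            cycle = max(cycle, cycles[last_branch] + 1)
        cycles.append(cycle)
        if dest is not None:
            last_writer[dest] = i
        if is_branch:
            last_branch = i
    return cycles
```

Under these assumptions, `earliest_cycles(PROGRAM)` returns `[1, 1, 1, 2, 1, 2, 3]`: `lw`, `and`, `add`, and `sub` have no producers in front of them, `or` and `beq` each wait on one earlier result, and `xor` is held behind the branch. Passing `respect_branches=False` moves `xor` up to cycle 2, which shows the ordering constraint that the control dependence adds.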

Basic Blocks
• Basic Block (def.) = Sequence of instructions that will always be ________________
  – No __________________ out
  – No branch targets coming ____
  – Also called “straight-line” code
  – Average size: _____ instrucs.
• Instructions in a basic block can be overlapped if there are no data dependencies
• ________ dependences really ________________ of possible instructions to overlap
  – W/o extra hardware, we can only overlap execution of instructions within a basic block

        lw  $s3,0($s4)
        and $t3,$t2,$t3
    L1: add $t0,$t0,$s4     <- This is a basic block
        or  $t5,$t3,$t2        (starts w/ target,
        sub $t1,$t1,$t2         ends with branch)
        beq $t0,$t8,L1
        xor $s0,$t1,$s2

Out-of-Order Motivation
• Hide the impact of dynamic events such as a ______________
• Out-of-Order (OoO) Execution
  – Let ________________ instructions behind a stalled instruction execute
  – Important aspect: Completion Ordering
• Out-of-Order completion: Let the independent instruction that has been executed ____________________________________ before the stalled instruction completes
  – Problem: Exception handling
• In-Order completion: Let the independent instructions execute but ______________ their results until the stalled instruction completes

Out-of-Order Execution
• “Execution” here means ____________ the results not necessarily _____________ them to a register or memory
• Completion means _____________________ the results to register file or memory
• While we say out-of-order execution we really mean:
  – In-order Issue/Dispatch (IoD)
  – Out-of-Order Execution (OoOE)
  – In-order Completion (IoC)

[Figure: Issue/Dispatch (In-order) -> Execution (Out-of-Order) -> Completion (In-order)]

In- or Out-of-Order Completion
• IoI/IoD => OoOE => IoC
  – In-order completion is necessary to support precise exceptions [exact state at time of exception]
• We will present the concept of OoOC (out-of-order completion) which is a bit easier and then come back to the desired approach of IoC
• OoOC Issues
  – _____________…we should not commit an instruction that came after (in program order) a branch
  – Solution: Stall dispatching instructions after a branch until we resolve the outcome

    LW  $4,0($5)    // cache miss
    BEQ $4,$0,L1
    ADD $6,$7,$8    // What if we execute this ADD out of order
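The difference between finishing execution and completing (committing) in program order can be illustrated with a short calculation. This is only a sketch under stated assumptions: each instruction has a known cycle in which its result is produced, any number of instructions may commit in the same cycle, and the function name `in_order_commit` and the example cycle numbers are hypothetical rather than taken from the slides.

```python
def in_order_commit(finish_cycles):
    """Given the cycle in which each instruction (listed in program order)
    finishes executing, return the cycle in which each one may commit
    if commits must happen in program order.

    Assumption (not from the slides): several instructions may commit in
    the same cycle, so an instruction commits as soon as it has finished
    AND every earlier instruction has committed.
    """
    commit_cycles = []
    previous_commit = 0
    for finish in finish_cycles:
        # An instruction cannot commit before it finishes, nor before
        # the instruction ahead of it in program order has committed.
        previous_commit = max(previous_commit, finish)
        commit_cycles.append(previous_commit)
    return commit_cycles
```

For the three-instruction example above, suppose (hypothetically) that the `LW` cache miss finishes in cycle 10, the dependent `BEQ` in cycle 11, and the independent `ADD` in cycle 3. Then `in_order_commit([10, 11, 3])` gives `[10, 11, 11]`: the `ADD` executes early but its result is held back until the load and the branch ahead of it have completed.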

Scheduling Strategies
• _____________ Scheduling
  – ___________ re-orders instructions in such a way that no dependencies will be violated and allows for OoOE
• ____________ Scheduling
  – ______ implementing the Tomasulo algorithm or other similar approach will re-order instructions to allow for OoOE
• More Advanced Concepts
  – Branch prediction and speculative execution (execution beyond a branch, flushing if incorrect) will be covered later

Static Scheduling
• Strengths
  – Hardware simplicity [Better clock rate]
    • Power/energy advantage
    • Compiler has a global view of the program anyway, so it should be able to do a “good” job
  – Very predictable: static performance predictions are reliable
• Weaknesses
  – Requires _______________ to take advantage of new/modified architecture
  – Cannot foresee dynamic (data-dependent) events
    • Cache miss, conditional branches (can only reschedule instructions in a basic block)
  – Cannot precompute memory addresses
  – No good solution for precise exceptions with out-of-order completion

Where to Stall?
• In 5-stage pipeline (in-order execution) RAW dependency was solved by
  – ______________________ or
  – ______________ Stall

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) with callouts "LW $4", "ADD $1,$3,$4", "stall", "if necessary"]

Where to Stall?
• Simple 5-stage pipeline:
  – Dependent instructions cannot be stalled in the EX stage
    Why? What if ADD was also dependent on the instruction in ____… ADD has no place to ________ that forwarded value
• Dependent instructions stalled in the ID stage
  Thus we stall in ID so we can use the ______________ to grab dependent values. Further, stalling in ID incurs only 1 cycle penalty as would stalling in EX.

[Figure: 5-stage pipelined datapath with PC, I-Cache, Instruction Register, Register File, ALU, D-Cache, pipeline stage registers, hazard detection unit (HDU: PCWrite, IRWrite, IF.Flush, FLUSH), and Forwarding Unit (ALUSelA, ALUSelB)]
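The decision to stall a dependent instruction in ID can be written as a small predicate. This is a sketch of the load-use check commonly described for the 5-stage MIPS pipeline, not a transcription of the HDU in the figure; the function name, parameter names, and the extra `$0` exclusion are assumptions introduced here for illustration.

```python
def must_stall_in_id(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True if the instruction currently in ID should be held for one cycle.

    Parameters (hypothetical names for pipeline-register fields):
      id_ex_mem_read: True if the instruction in EX is a load (LW)
      id_ex_rt:       destination register number of that load
      if_id_rs/rt:    source register numbers of the instruction in ID

    Assumption: register $0 never creates a dependence because it is
    hard-wired to zero, so a load targeting $0 is ignored.
    """
    return (
        bool(id_ex_mem_read)
        and id_ex_rt != 0
        and id_ex_rt in (if_id_rs, if_id_rt)
    )
```

With `LW $4,...` in EX and `ADD $1,$3,$4` in ID, `must_stall_in_id(True, 4, 3, 4)` is `True`, so the ADD waits one cycle in ID and then picks up the loaded value. If the instruction in EX is not a load, or it writes a register the ADD does not read, the predicate is `False` and no stall is inserted.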

Where to Stall?
• But to implement OoO execution, we ________________ in the decode stage since that would prevent any further issuing of instructions
• Thus, now we will issue to queues for each of the multiple functional units and have the instruction stall in the queue until it is ready

[Figure: IM -> Reg -> Queues -> Functional Units (ALU, MUL, DIV, Addr Calc. + DM) -> Reg, with callout "Stalling here would _______ up the pipeline"]

Forwarding in OoO Execution
• In 5-stage pipeline later instructions carried their source register IDs into the EX stage to be compared with _____________ register IDs of their _______ instructions
• But in OoO execution, we may have ______ (earlier) instructions in front of us and cannot afford to perform so many comparisons (as well as handling the case where many earlier instructions are producing new versions of a register)
• Instead, the dispatch unit will explicitly tell the dependent instruction who to ____________ data from

[Figure: EX-stage datapath with pipeline stage registers, ALU, D-Cache, and Forwarding Unit (ALUSelA, ALUSelB) comparing rs/rt against prior WriteReg#]

Tomasulo’s Plan
• OoO Execution
• Multiple functional units
  – Integer ALU, Data memory, Multiplier, Divider
• Queues between ID and EX stages (in place of ID/EX register)
  – Allows later instructions to keep issuing even if earlier ones are stalled

OoO Execution Problems
• For the time
  – No branch prediction
  – No speculative execution beyond a branch
• So we simply stall on a conditional branch
• For the time, no support for precise exceptions
  – Even then what about hazards…
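The idea that the dispatch unit tells each dependent instruction whom to get its data from can be sketched as a table lookup at dispatch time. This is a simplified illustration under stated assumptions, not Tomasulo's full algorithm and not the slides' hardware: instructions are given as hypothetical `(op, dest, sources)` tuples, each dispatched instruction gets a tag equal to its position, and results are never written back, so every earlier producer is treated as still in flight. The names `QueueEntry`, `dispatch`, and `EXAMPLE` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    tag: int        # identifier handed out by the dispatch unit
    op: str
    dest: str
    sources: list   # ("reg", name) = read the register file,
                    # ("tag", n)    = wait for the result of entry n


def dispatch(program):
    """Dispatch instructions in program order and record, for every source
    operand, where its value will come from.

    Assumption: no instruction completes during dispatch, so a register's
    latest producer always remains the entry that last named it as dest.
    """
    latest_producer = {}   # register name -> tag of its newest producer
    entries = []
    for tag, (op, dest, srcs) in enumerate(program):
        sources = [
            ("tag", latest_producer[reg]) if reg in latest_producer else ("reg", reg)
            for reg in srcs
        ]
        entries.append(QueueEntry(tag, op, dest, sources))
        # Later readers of dest must now wait on this entry, not an older one.
        latest_producer[dest] = tag
    return entries


# Hypothetical program: two different instructions produce versions of $4.
EXAMPLE = [
    ("lw",  "$4", ["$5"]),
    ("add", "$1", ["$3", "$4"]),
    ("sub", "$4", ["$1", "$6"]),
    ("or",  "$7", ["$4", "$1"]),
]
```

Calling `dispatch(EXAMPLE)` leaves the `add` waiting on tag 0 (the `lw`) for `$4`, while the later `or` waits on tag 2 (the `sub`) for the newer version of `$4`. Each entry therefore needs to watch for only the tags it was given, rather than comparing its source register IDs against every earlier instruction.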
