
EECS 252 Graduate Computer Architecture
Lec 5 – Out-of-Order Completion
David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://www-inst.eecs.berkeley.edu/~cs252

Review
• Data stationary pipeline control
  – Micro-instruction & PC track down the pipe
  – Accumulate state
• Implementing bubbles, stalls, forwarding, multicycle operations
• Branch prediction
  – Static vs dynamic
  – N-bit saturating counters
  – Local and global history
  – Correlated predictors, Tournament, GSHARE
  – Branch target buffers, return address predictors
2/1/2005 CS252 SP05, Lec 5 OOC 2

Outline
• Relax pipeline design to allow out-of-order completions
  – Cray-1: register reservations
• Relax pipeline to allow out-of-order issue
  – CDC 6600: Scoreboard
• Compiler optimizations for ILP
• Superscalar issue
• Maybe go back and finish exceptions

Pipelining with Reg. Reservations
• Assumptions
  1. Multiple pipelined function units of different latency
     » able to accept operations at issue rate
     » may be exceptions (e.g., divide)
  2. Issue instructions in order
  3. Operand fetch in order
  4. Completion out of order
     » short ops may bypass long ones
  5. Some shared resources (e.g., reg write port)
• Implications
  – WAR hazard still resolved by pipeline flow (2 & 3)
  – RAW, WAW, and structural still present
• Design philosophy (ala Cray)
  – Resolve hazards as instruction is issued into pipeline
  – Pipeline is non-blocking

Resolving Structural Hazards
• With static pipeline flow, resource usage is known in advance
• Instruction requires X at t ticks after issue
• If reservation X[t] is clear, issue inst and set bit
• Otherwise, delay till clear
• At each tick the reservation X[] shifts by one, so will eventually clear
• Multiple resources? Range of delays?
[Figure: "shift reg." for resource X, showing the delay from NOW till the required resource is used]

Basic Issue Model
• Issue unit checks for all hazards
  – Structural, RAW, WAW
• Holds issue while hazards exist
• Upon issue, register values provided to F.U.
• Executes to completion without blocking
[Figure: Instr. Fetch, then Op Fetch & Issue, which passes op, valA, valB, rD to the function unit]
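The reservation scheme on the "Resolving Structural Hazards" slide can be sketched in a few lines of Python. This is only an illustrative model: the class name `ReservationShiftRegister`, its methods, and the depth of 8 are assumptions made for the example, not details taken from the Cray-1.

```python
class ReservationShiftRegister:
    """Hypothetical model of the per-resource "shift reg." X[]."""

    def __init__(self, depth):
        # bits[t] is True when resource X is already booked t ticks from now.
        self.bits = [False] * depth

    def try_issue(self, t):
        """Issue an instruction that needs X at t ticks after issue.

        Returns True and sets X[t] if the slot is clear; returns False
        (issue must be delayed) if the slot is already reserved.
        """
        if self.bits[t]:
            return False
        self.bits[t] = True
        return True

    def tick(self):
        # Each tick every reservation moves one slot closer to NOW,
        # so a blocked instruction will eventually see its slot clear.
        self.bits = self.bits[1:] + [False]


# Example: a long operation books a shared write port 6 ticks ahead.
write_port = ReservationShiftRegister(depth=8)
long_op_issued = write_port.try_issue(6)

for _ in range(3):
    write_port.tick()

# A short operation that needs the port 3 ticks after issue now collides.
short_op_first_try = write_port.try_issue(3)
write_port.tick()
short_op_second_try = write_port.try_issue(3)
```

In this run the short operation is held for one tick and then issues, which is the "delay till clear" behaviour described on the slide. Handling several resources would need one such register per resource, with issue allowed only when every required slot is clear.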

Hazard Resolution
• Structural
  – Op code => resource usage
  – Check resource resv
  – Set on issue
• Data
  – Add a reservation bit on each register
  – Check RegRsv for source and destination registers
  – Hold issue till clear
  – Set bit on destination register
  – Clear bit on dest reg. write
• Questions:
  – Forwarding?
Motorola 88000 "scoreboard" [sic]

Example
  Add r1 := r2 + r3
  Add r2 := r2 + 4
  Lod r5 := mem[r1+16]
  Lod r6 := mem[r1+32]
  Mul r7 := r5 * r6
  Bnz r1, foo
  Sub r7 := r0 - r0
[Figure: two copies of the issue pipeline (Instr. Fetch, then Op Fetch & Issue, passing op, valA, valB, rD)]

Cray-1 Discussion
• Technological Assumptions
• Why no forwarding?
• Longevity of the ISA?
• Instruction cache?
  – Four blocks (RR) of 16x4 "parcels"
  – Issue delayed on miss
    » 2 CP for change of block
• Branch delays?
  – Branch op code delayed till second parcel is obtained
  – 5 clocks (reg zero, nz, pos, neg)
• I/O system?

Pipelining with Scoreboarding
• Assumptions
  1. Multiple function units of different latency
     – Especially non-pipelined units
  2. Issue instructions whenever FU available, unless it would cause multiple outstanding writes to the same register
     – Operand fetch out of order
     – Completion out of order
  3. Some shared resources (e.g., reg write port)
• Implications
  – Need to resolve RAW, WAR, WAW and structural
• Design philosophy (ala CDC 6600)
  – Issue unit tracks all outstanding dependences
  – Holds issue if structural or WAW hazard
  – Informs FUs when hazards resolved
  – FUs fetch operands from register file and proceed

Scoreboard Operation
• Issue
  – Hold while FU unavailable or destination register reserved (by FU f)
• Read operands
  – SB informs FU with all sources available to fetch & go
  – Limited by read ports
• Write back
  – SB schedules one FU to write
  – Waits until no FU is waiting to fetch the old version of the reg

Example
  Add r1 := r2 + r3
  Add r2 := r2 + 4
  Lod r5 := mem[r1+16]
  Lod r6 := mem[r1+32]
  Mul r7 := r5 * r6
  Bnz r1, foo
  Sub r7 := r0 - r0
[Figure: Instr. Fetch, then Scoreboard Issue & Resolve (op, rA, rB, rD), feeding FUs that each perform op fetch and ex (op, valA, valB, rD)]
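As a rough illustration of the register-reservation rules on the "Hazard Resolution" slide (check RegRsv for sources and destination, hold issue till clear, set the bit on the destination), the sketch below computes issue cycles for the example instruction sequence. The latencies, the `Instr` tuple, and the `issue_cycles` function are hypothetical choices for this example. The model ignores structural hazards and forwarding, and treats the branch as not taken.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # destination register, or None (e.g. a branch)
    srcs: Tuple[str, ...]    # source registers
    latency: int             # ASSUMED cycles from issue until dest is written


# Latencies are illustrative assumptions, not Cray-1 figures.
PROGRAM = [
    Instr("Add r1 := r2 + r3",    "r1", ("r2", "r3"), 2),
    Instr("Add r2 := r2 + 4",     "r2", ("r2",),      2),
    Instr("Lod r5 := mem[r1+16]", "r5", ("r1",),      4),
    Instr("Lod r6 := mem[r1+32]", "r6", ("r1",),      4),
    Instr("Mul r7 := r5 * r6",    "r7", ("r5", "r6"), 6),
    Instr("Bnz r1, foo",          None, ("r1",),      1),
    Instr("Sub r7 := r0 - r0",    "r7", ("r0",),      2),
]


def issue_cycles(program):
    """In-order issue, at most one instruction per cycle.

    reserved_until[r] is the cycle at which register r's reservation bit
    clears. An instruction issues only when the bits of all its source
    registers (RAW) and its destination register (WAW) are clear.
    """
    reserved_until = {}
    cycle = 0
    issued = []
    for ins in program:
        regs = ins.srcs + ((ins.dest,) if ins.dest else ())
        ready = max((reserved_until.get(r, 0) for r in regs), default=0)
        cycle = max(cycle, ready)          # hold issue till clear
        issued.append(cycle)
        if ins.dest:
            reserved_until[ins.dest] = cycle + ins.latency  # set bit on dest
        cycle += 1                         # next instruction issues later
    return issued


cycles = issue_cycles(PROGRAM)
for ins, c in zip(PROGRAM, cycles):
    print(f"cycle {c:2d}: {ins.text}")
```

Under these assumed latencies the `Mul` is held by a RAW dependence on `r5` and `r6`, and the final `Sub` is held by a WAW dependence on `r7`, the destination the `Mul` still has reserved.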

Discussion
• Technological Assumptions
• Extend to allow forwarding?
• How do loads and stores work?
• Instruction cache?
• I/O system?

Case Study: MIPS R4000 (200 MHz)
[Figure: pipeline IF IS RF EX DF DS TC WB, drawn over instr mem, reg, ALU, data mem, reg]
• 8 Stage Pipeline:
  – IF: first half of fetching of instruction; PC selection happens here as well as initiation of instruction cache access.
  – IS: second half of access to instruction cache.
  – RF: instruction decode and register fetch, hazard checking and also instruction cache hit detection.
  – EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
  – DF: data fetch, first half of access to data cache.
  – DS: second half of access to data cache.
  – TC: tag check, determine whether the data cache access hit.
  – WB: write back for loads and register-register operations.
• 8 Stages: What is impact on Load delay? Branch delay? Why?

Case Study: MIPS R4000
TWO Cycle Load Latency
  IF IS RF EX DF DS TC WB
     IF IS RF EX DF DS TC
        IF IS RF EX DF DS
           IF IS RF EX DF
              IF IS RF EX
                 IF IS RF
                    IF IS
                       IF
THREE Cycle Branch Latency (conditions evaluated during EX phase)
  IF IS RF EX DF DS TC WB
     IF IS RF EX DF DS TC
        IF IS RF EX DF DS
           IF IS RF EX DF
              IF IS RF EX
                 IF IS RF
                    IF IS
                       IF
Delay slot plus two stalls
Branch likely cancels delay slot if not taken

MIPS R4000 Floating Point
• FP Adder, FP Multiplier, FP Divider
• Last step of FP Multiplier/Divider uses FP Adder HW
• 8 kinds of stages in FP units:

  Stage  Functional unit  Description
  A      FP adder         Mantissa ADD stage
  D      FP divider       Divide pipeline stage
  E      FP multiplier    Exception test stage
  M      FP multiplier    First stage of multiplier
  N      FP multiplier    Second stage of multiplier
  R      FP adder         Rounding stage
  S      FP adder         Operand shift stage
  U                       Unpack FP numbers

R4000 Performance
• Not ideal CPI of 1:
  – Load stalls (1 or 2 clock cycles)
  – Branch stalls (2 cycles + unfilled slots)
  – FP result stalls: RAW data hazard (latency)
  – FP structural stalls: Not enough FP hardware (parallelism)
[Figure: bar chart per benchmark (doduc, eqntott, espresso, gcc, li, nasa7, ora, spice2g6, su2cor, tomcatv), vertical axis 0 to 4.5; components: Base, Load stalls, Branch stalls, FP result stalls, FP structural stalls]

MIPS FP Pipe Stages

  FP Instr        1  2    3      4       5  6  7    8  …
  Add, Subtract   U  S+A  A+R    R+S
  Multiply        U  E+M  M      M       M  N  N+A  R
  Divide          U  A    R      D^28    …  D+A  D+R, D+R, D+A, D+R, A, R
  Square root     U  E    (A+R)^108      …  A  R
  Negate          U  S
  Absolute value  U  S
  FP compare      U  A    R

Stages:
  M  First stage of multiplier
  N  Second stage of multiplier
  R  Rounding stage
  S  Operand shift stage
  U  Unpack FP numbers
  A  Mantissa ADD stage
  D  Divide pipeline stage
  E  Exception test stage
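The "FP structural stalls" item can be made concrete with the Multiply and Add rows of the FP pipe stage table. The sketch below checks, for an add issued some number of cycles after a multiply, whether both would need the same stage hardware in the same cycle. The function names and the "smallest extra delay" stall model are simplifying assumptions for illustration. They are not a description of the real R4000 interlock logic.

```python
# Stage usage per cycle, copied from the Multiply and Add/Subtract rows.
MULTIPLY = ["U", "E+M", "M", "M", "M", "N", "N+A", "R"]
ADD = ["U", "S+A", "A+R", "R+S"]


def units(stage):
    """Split a table entry such as 'N+A' into the set of units it uses."""
    return set(stage.split("+"))


def conflicts(first, second, delay):
    """List (cycle, unit) clashes when `second` issues `delay` cycles
    after `first`. Cycles are numbered from 1 relative to `first`."""
    clashes = []
    for j, stage in enumerate(second):
        i = j + delay  # position in `first` that falls in the same cycle
        if 0 <= i < len(first):
            for unit in sorted(units(first[i]) & units(stage)):
                clashes.append((i + 1, unit))
    return clashes


def min_stall(first, second, delay):
    """Smallest extra delay that removes every clash (simplified model)."""
    extra = 0
    while conflicts(first, second, delay + extra):
        extra += 1
    return extra


for d in range(1, 8):
    print(d, conflicts(MULTIPLY, ADD, d), min_stall(MULTIPLY, ADD, d))
```

With these tables, an add issued 4 or 5 cycles after a multiply clashes on the adder (`A`) and rounding (`R`) stages, because the last steps of the multiplier use the FP adder hardware. Other issue distances from 1 to 7 show no clash.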

