

SLIDE 1

What about branches?

  • Branch outcomes are not known until EX
  • What are our options?

1

SLIDE 2

Control Hazards

2

SLIDE 3

Today

  • Quiz
  • Control Hazards
  • Midterm review
  • Return your papers

3

SLIDE 4

Key Points: Control Hazards

  • Control hazards occur when we don’t know what the next instruction is

  • Mostly caused by branches
  • Strategies for dealing with them
  • Stall
  • Guess!
  • Leads to speculation
  • Flushing the pipeline
  • Strategies for making better guesses
  • Understand the difference between stall and flush

4

SLIDE 5

Control Hazards

  • Computing the new PC

5

add $s1, $s3, $s2
sub $s6, $s5, $s2
beq $s6, $s7, somewhere
and $s2, $s3, $s1

[Pipeline diagram: Fetch → Decode → EX → Mem → Write back]

SLIDE 6

Computing the PC

  • Non-branch instruction
  • PC = PC + 4
  • When is PC ready?

6

[Pipeline diagram: Fetch → Decode → EX → Mem → Write back]

SLIDE 8

Computing the PC

  • Branch instructions
  • bne $s1, $s2, offset
  • if ($s1 != $s2) { PC = PC + offset} else {PC = PC + 4;}
  • When is the value ready?

7

[Pipeline diagram: Fetch → Decode → EX → Mem → Write back]

SLIDE 10

Computing the PC

  • Wait, when do we know?

8

if (instruction is a branch) {
    if ($s1 != $s2) { PC = PC + offset; }
    else            { PC = PC + 4; }
} else {
    PC = PC + 4;
}

[Pipeline diagram: Fetch → Decode → EX → Mem → Write back]

SLIDE 12

There is a constant control hazard

  • We don’t even know what kind of instruction we have until decode.

  • Let’s consider the non-branch case first.
  • What do we do?

9

SLIDE 13

Option 1: Smart ISA design

  • Make it very easy to tell if the instruction is a branch
  • Maybe a single bit, or just a couple
  • Decode is trivial
  • Pre-decode
  • Do part of decode when the instruction comes on chip
  • More on this later

10

add $s0, $t0, $t1
sub $t2, $s0, $t3

[Pipeline diagram: back-to-back instructions flowing through Fetch → Decode → EX → Mem → Write back in successive cycles]

SLIDE 14

Option 2: The compiler

  • Use “branch delay” slots.
  • The next N instructions after a branch are always executed

  • Good
  • Simple hardware
  • Bad
  • N cannot change.

11

SLIDE 15

Delay slots.

12

bne $t2, $s0, somewhere
add $s0, $t0, $t1        <- branch delay slot (always executed)
add $t2, $s4, $t1
...
somewhere: sub $t2, $s0, $t3

[Pipeline diagram: the delay-slot add executes while the branch resolves; the taken bne then fetches the sub at "somewhere"]

SLIDE 16

Option 3: Stall

  • What does this do to our CPI?
  • Speedup?

13

add $s0, $t0, $t1
bne $t2, $s0, somewhere
sub $t2, $s0, $t3

[Pipeline diagram: the sub waits in Fetch/Decode (stall) until the bne resolves in EX, then proceeds]

SLIDE 20

Performance impact of stalling

  • ET = I * CPI * CT
  • Branches are about 1 in 5 instructions
  • What’s the CPI for branches?
  • Speedup =
  • ET =

14

1 + 2 = 3 -- this is really the CPI for the instruction that follows the branch.
ET = 1 * (0.2*3 + 0.8*1) * 1 = 1.4
Speedup = 1/1.4 = 0.714

SLIDE 21

Option 4: Simple Prediction

  • Can a processor tell the future?
  • For non-taken branches, the new PC is ready immediately.

  • Let’s just assume the branch is not taken
  • Also called “branch prediction” or “control speculation”

  • What if we are wrong?

15

SLIDE 25

Predict Not-taken

  • We start the add, and then, when we discover the branch outcome, we squash it.

  • We “flush” the pipeline.

16

bne $t2, $s4, else
add $s0, $t0, $t1        <- fetched assuming not-taken
...
else: sub $t2, $s0, $t3

[Pipeline diagram: the add after the bne proceeds down the pipe; when the branch resolves taken in EX, the add is squashed and fetch redirects to the target]

SLIDE 29

Simple “static” Prediction

  • “static” means before run time
  • Many prediction schemes are possible
  • Predict taken
  • Pros?
  • Predict not-taken
  • Pros?

17

Predict taken: loops are common. Predict not-taken: not all branches are for loops. Backward taken / forward not taken: best of both worlds.

SLIDE 31

Implementing Backward taken/forward not taken

[Datapath diagram: PC → Instruction Memory; Register File (Read Addr 1/2, Write Addr, Write Data, Read Data 1/2); Sign Extend and Shift left 2 feed an adder that computes the branch target; ALU; Data Memory (Address, Write Data, Read Data); pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB. Annotations: "Compute target", "Insert bubble".]

SLIDE 32

Implementing Backward taken/forward not taken

  • Changes in control
  • New inputs to the control unit
  • The sign of the offset
  • The result of the branch
  • New outputs from control
  • The flush signal.
  • Inserts “noop” bits in datapath and control

20

SLIDE 33

Performance Impact

  • ET = I * CPI * CT
  • Back taken, forward not taken is 80% accurate
  • Branches are 20% of instructions
  • Changing the front end increases the cycle time by 10%
  • What is the speedup of Btfnt compared to just stalling on every branch?

21

SLIDE 34

Performance Impact

  • ET = I * CPI * CT
  • Back taken, forward not taken is 80% accurate
  • Branches are 20% of instructions
  • Changing the front end increases the cycle time by 10%
  • What is the speedup of Btfnt compared to just stalling on every branch?
  • Btfnt
  • CPI = 0.2*0.2*(1 + 2) + (1 - 0.2*0.2)*1 = 1.08
  • CT = 1.1
  • ET = 1.188
  • Stall
  • CPI = 0.2*3 + 0.8*1 = 1.4
  • CT = 1
  • ET = 1.4
  • Speedup = 1.4/1.188 = 1.18

22
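The same comparison as a short script (the second term is the fraction of instructions with no mispredict penalty, 1 - 0.2*0.2 = 0.96; names are mine):

```python
branch_frac, mispredict, penalty = 0.2, 0.2, 2   # numbers from the slide
btfnt_cpi = branch_frac * mispredict * (1 + penalty) \
            + (1 - branch_frac * mispredict) * 1.0        # = 1.08
btfnt_et = btfnt_cpi * 1.1        # front-end change: clock is 10% slower
stall_cpi = branch_frac * (1 + penalty) + (1 - branch_frac) * 1.0   # = 1.4
speedup = stall_cpi * 1.0 / btfnt_et
print(round(btfnt_et, 3), round(speedup, 2))   # 1.188 1.18
```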

SLIDE 35

The Importance of Pipeline depth

  • There are two important parameters of the pipeline that determine the impact of branches on performance
  • Branch decode time -- how many cycles it takes to identify a branch (in our case, this is less than 1)
  • Branch resolution time -- how many cycles until the real branch outcome is known (in our case, this is 2 cycles)

23

SLIDE 36

Pentium 4 pipeline

1. Branches take 19 cycles to resolve.
2. Identifying a branch takes 4 cycles.
3. Stalling is not an option.

SLIDE 37

Performance Impact

  • ET = I * CPI * CT
  • Back taken, forward not taken is 80% accurate
  • Branches are 20% of instructions
  • Changing the front end increases the cycle time by 10%
  • What is the speedup of Btfnt compared to just stalling on every branch?
  • Btfnt
  • CPI = 0.2*0.2*(1 + 2) + (1 - 0.2*0.2)*1 = 1.08
  • CT = 1.1
  • ET = 1.188
  • Stall
  • CPI = 0.2*3 + 0.8*1 = 1.4
  • CT = 1
  • ET = 1.4
  • Speedup = 1.4/1.188 = 1.18

25

What if the branch resolution time were 20 cycles instead of 2?

SLIDE 38

Performance Impact

  • ET = I * CPI * CT
  • Back taken, forward not taken is 80% accurate
  • Branches are 20% of instructions
  • Changing the front end increases the cycle time by 10%
  • What is the speedup of Btfnt compared to just stalling on every branch?
  • Btfnt
  • CPI = 0.2*0.2*(1 + 20) + (1 - 0.2*0.2)*1 = 1.8
  • CT = 1.1
  • ET = 1.98
  • Stall
  • CPI = 0.2*(1 + 20) + 0.8*1 = 5
  • CT = 1
  • ET = 5
  • Speedup = 5/1.98 = 2.53

26

SLIDE 39

Dynamic Branch Prediction

  • Long pipes demand higher accuracy than static schemes can deliver.
  • Instead of making the guess once, make it every time we see the branch.
  • Predict future behavior based on past behavior.

27

SLIDE 40

Today

  • Aeronautical engineering practicum
  • Quiz (sort of)
  • Midterm Recap
  • Dynamic branch prediction

28

SLIDE 41

Grade distribution

29

[Histogram: number of students by letter grade -- F, D-, D+, C, B-, B+, A]

SLIDE 43

Predictable control

  • Use previous branch behavior to predict future branch behavior.

  • When is branch behavior predictable?
  • Loops -- for(i = 0; i < 10; i++) {}: 9 taken branches, 1 not-taken branch. All 10 are pretty predictable.
  • Run-time constants
  • Foo(int v) { for (i = 0; i < 1000; i++) { if (v) {...} } }
  • The branch is always taken or never taken.
  • Correlated control
  • a = 10; b = <something usually larger than a>
  • if (a > 10) {}
  • if (b > 10) {}
  • Function calls
  • LibraryFunction() -- compiles to a jr (jump register) instruction, but its target is always the same.
  • BaseClass *t; // t usually points to one subclass, SubClass
  • t->SomeVirtualFunction() // will usually call the same function

31

SLIDE 46

Dynamic Predictor 1: The Simplest Thing

  • Predict that this branch will go the same way as the previous one.

  • Pros?
  • Cons?

32

Pros: Dead simple -- keep a bit in the fetch stage. Works OK for simple loops. The compiler might be able to arrange things to make it work better.
Cons: An unpredictable branch in a loop will mess everything up, and it can’t tell the difference between branches.

SLIDE 50

Dynamic Prediction 2: A table of bits

  • Give each branch its own bit in a table
  • How big does the table need to be?
  • Look up the prediction bit for the branch
  • Pros:
  • Cons:

33

Pros: It can differentiate between branches. Bad behavior by one won’t mess up others... mostly.

Cons: How big a table? Infinite! Bigger is better, but don’t mess with the cycle time -- index into it using the low-order bits of the PC. Accuracy is still not great.

SLIDE 53

Dynamic Prediction 2: A table of bits

34

  • What’s the accuracy for the inner loop’s branch?

for(i = 0; i < 10; i++) { for(j = 0; j < 4; j++) { } }

iteration | actual    | prediction | new prediction
1         | taken     | not taken  | taken
2         | taken     | taken      | taken
3         | taken     | taken      | taken
4         | not taken | taken      | not taken
1         | taken     | not taken  | taken
2         | taken     | taken      | taken
3         | taken     | taken      | taken

50% or 2 per loop
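A quick simulation of the one-bit scheme on that inner-loop branch (taken three times, then not taken, per trip through the inner loop; assuming the bit starts out predicting taken):

```python
def predict_1bit(outcomes, initial=True):
    """Predict each branch goes the same way as it did last time."""
    pred, mispredicts = initial, 0
    for actual in outcomes:
        if pred != actual:
            mispredicts += 1
        pred = actual            # remember only the most recent outcome
    return mispredicts

# inner-loop branch: taken 3x then not taken, repeated for 10 outer iterations
print(predict_1bit([True, True, True, False] * 10))   # 19: 2 per loop in steady state
```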

SLIDE 54

Dynamic prediction 3: A table of counters

  • Instead of a single bit, keep two. This gives four possible states.
  • Taken branches move the state to the right. Not-taken branches move it to the left.
  • The net effect is that we wait a bit to change our mind.

35

State                     Prediction
00 -- strongly not taken  not taken
01 -- weakly not taken    not taken
10 -- weakly taken        taken
11 -- strongly taken      taken

SLIDE 57

Dynamic Prediction 3: A table of counters

36

  • What’s the accuracy for the inner loop’s branch? (start in weakly taken)

for(i = 0; i < 10; i++) { for(j = 0; j < 4; j++) { } }

iteration | actual    | state          | prediction | new state
1         | taken     | weakly taken   | taken      | strongly taken
2         | taken     | strongly taken | taken      | strongly taken
3         | taken     | strongly taken | taken      | strongly taken
4         | not taken | strongly taken | taken      | weakly taken
1         | taken     | weakly taken   | taken      | strongly taken
2         | taken     | strongly taken | taken      | strongly taken
3         | taken     | strongly taken | taken      | strongly taken

25% or 1 per loop
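The two-bit counter can be simulated the same way (a sketch, with states 0-3 as in the table above, starting in weakly taken):

```python
def predict_2bit(outcomes, state=2):
    """2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken."""
    mispredicts = 0
    for actual in outcomes:
        if (state >= 2) != actual:
            mispredicts += 1
        state = min(3, state + 1) if actual else max(0, state - 1)
    return mispredicts

# same 4-iteration inner loop, 10 times: only the loop-exit branch is missed
print(predict_2bit([True, True, True, False] * 10))   # 10: 1 per loop (25%)
```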

SLIDE 58

Two-bit Prediction

  • The two-bit prediction scheme is used very widely and in many ways.
  • Make a table of 2-bit predictors
  • Devise a way to associate a 2-bit predictor with each dynamic branch
  • Use the 2-bit predictor for each branch to make the prediction
  • In the previous example we associated the predictors with branches using the PC.
  • We’ll call this “per-PC” prediction.

37

SLIDE 59

Associating Predictors with Branches: Using the low-order PC bits

  • When is branch behavior predictable?
  • Loops -- for(i = 0; i < 10; i++) {}: 9 taken branches, 1 not-taken branch. All 10 are pretty predictable.
  • Run-time constants
  • Foo(int v) { for (i = 0; i < 1000; i++) { if (v) {...} } }
  • The branch is always taken or never taken.
  • Correlated control
  • a = 10; b = <something usually larger than a>
  • if (a > 10) {}
  • if (b > 10) {}
  • Function calls
  • LibraryFunction() -- compiles to a jr (jump register) instruction, but its target is always the same.
  • BaseClass *t; // t usually points to one subclass, SubClass
  • t->SomeVirtualFunction() // will usually call the same function

38

Per-PC prediction: Run-time constants: Good. Loops: OK -- we miss one per loop. Correlated control: Poor -- no help. Function calls: Not applicable.

SLIDE 61

Predicting Loop Branches Revisited

39

  • What’s the pattern we need to identify?

for(i = 0; i < 10; i++) { for(j = 0; j < 4; j++) { } }

iteration | actual
1         | taken
2         | taken
3         | taken
4         | not taken
1         | taken
2         | taken
3         | taken
4         | not taken
1         | taken
2         | taken
3         | taken
4         | not taken

SLIDE 64

Dynamic prediction 4: Global branch history

  • Instead of using the PC to choose the predictor, use a bit vector made up of the previous branch outcomes.

40

iteration         | actual    | branch history | steady-state prediction
1                 | taken     | 11111          |
2                 | taken     | 11111          |
3                 | taken     | 11111          |
4                 | not taken | 11111          |
outer loop branch | taken     | 11110          | taken
1                 | taken     | 11101          | taken
2                 | taken     | 11011          | taken
3                 | taken     | 10111          | taken
4                 | not taken | 01111          | not taken
outer loop branch | taken     | 11110          | taken
1                 | taken     | 11101          | taken
2                 | taken     | 11011          | taken
3                 | taken     | 10111          | taken
4                 | not taken | 01111          | not taken
outer loop branch | taken     | 11110          | taken
1                 | taken     | 11101          | taken
2                 | taken     | 11011          | taken
3                 | taken     | 10111          | taken
4                 | not taken | 01111          | not taken

Nearly perfect
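A sketch of the idea: index a table of 2-bit counters by the last five outcomes instead of the PC (the initial history and counter values are my assumptions):

```python
def global_history_misses(outcomes, hist_bits=5):
    """2-bit counters indexed by the last hist_bits branch outcomes."""
    history = (True,) * hist_bits          # assume we start with an all-taken history
    table = {}                             # history pattern -> 2-bit counter
    misses = []
    for actual in outcomes:
        state = table.get(history, 2)      # counters start weakly taken
        misses.append((state >= 2) != actual)
        table[history] = min(3, state + 1) if actual else max(0, state - 1)
        history = history[1:] + (actual,)  # shift the new outcome in
    return misses

# inner loop (T,T,T,N) followed by the taken outer-loop branch, 30 times over
misses = global_history_misses([True, True, True, False, True] * 30)
print(sum(misses), any(misses[10:]))   # 2 False: two warm-up misses, then perfect
```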

SLIDE 66

Dynamic prediction 4: Global branch history

  • How long should the history be?

41

Infinite is a bad choice -- we would learn nothing.

  • Imagine N bits of history and a loop that executes K iterations
  • If K <= N, history will do well.
  • If K > N, history will do poorly, since the history register will always be all 1’s for the last K-N iterations. We will mis-predict the last branch.
SLIDE 69

Performance Impact of Short History

  • A loop has 5 instructions, including the branch.
  • The mis-prediction penalty is 7 cycles.
  • The baseline CPI is 1.
  • What is the speedup of the global history predictor vs the per-PC predictor if the loop executes 4 iterations and we keep 4 history bits?
  • What if it executes 40 iterations and we keep 40 history bits?
  • 4 iterations
  • Per-PC mis-prediction rate is 25%
  • CPI = 1 + 0.25 * 8 = 3
  • Global history mis-prediction rate is nearly 0%, so CPI = 1; speedup = 3
  • 40 iterations
  • Per-PC mis-prediction rate is 2.5%
  • CPI = 1 + 0.025 * 8 = 1.2
  • Global history CPI = 1; speedup = 1.2

43

With more iterations, the benefit of history decreases, so a shorter history is ok.

SLIDE 70

Associating Predictors with Branches: Global history

  • When is branch behavior predictable?
  • Loops -- for(i = 0; i < 10; i++) {}: 9 taken branches, 1 not-taken branch. All 10 are pretty predictable.
  • Run-time constants
  • Foo(int v) { for (i = 0; i < 1000; i++) { if (v) {...} } }
  • The branch is always taken or never taken.
  • Correlated control
  • a = 10; b = <something usually larger than a>
  • if (a > 10) {}
  • if (b > 10) {}
  • Function calls
  • LibraryFunction() -- compiles to a jr (jump register) instruction, but its target is always the same.
  • BaseClass *t; // t usually points to one subclass, SubClass
  • t->SomeVirtualFunction() // will usually call the same function

44

Global history: Loops: Pretty good, as long as the history is not too long. Run-time constants: Not so great. Correlated control: Good. Function calls: Not applicable.

SLIDE 71

Other ways of identifying branches

  • Use local branch history
  • Use a table of history registers (say 128), indexed by the low-order bits of the PC.
  • Also use the PC to choose between 128 predictor tables, each indexed by the history for that branch.
  • For loops this does better than global history.
  • Foo() { for(i = 0; i < 10; i++) { } }
  • If Foo is called from many places, the global history will be polluted.

45

[Diagram: the PC indexes a table of history registers and also selects among predictor tables; the branch’s history indexes within the chosen predictor table to produce the prediction]

SLIDE 73

Other Ways of Identifying Branches

  • All these schemes have different pros and cons and will work better or worse for different branches.
  • How do we get the best of all possible worlds?
  • Build them all, and have a predictor to decide which one to use on a given branch
  • For each branch, make all the different predictions, and keep track of which predictor is most often correct.
  • For future branches use the prediction from that predictor.

47
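A toy sketch of that "predictor of predictors" (class and method names are mine, not from the slides): two component predictors plus a per-branch 2-bit chooser that drifts toward whichever component has been right more often.

```python
class StaticTaken:
    """Component predictor: always predict taken."""
    def predict(self, pc): return True
    def update(self, pc, actual): pass

class TwoBitTable:
    """Component predictor: per-PC 2-bit saturating counters."""
    def __init__(self): self.state = {}
    def predict(self, pc): return self.state.get(pc, 2) >= 2
    def update(self, pc, actual):
        s = self.state.get(pc, 2)
        self.state[pc] = min(3, s + 1) if actual else max(0, s - 1)

class Tournament:
    """Per-branch 2-bit chooser between two component predictors."""
    def __init__(self, a, b):
        self.a, self.b, self.choice = a, b, {}   # choice >= 2 means trust b
    def predict(self, pc):
        use_b = self.choice.get(pc, 1) >= 2
        return self.b.predict(pc) if use_b else self.a.predict(pc)
    def update(self, pc, actual):
        pa, pb = self.a.predict(pc), self.b.predict(pc)
        c = self.choice.get(pc, 1)
        if pb == actual and pa != actual:        # b was right, a wrong
            c = min(3, c + 1)
        elif pa == actual and pb != actual:      # a was right, b wrong
            c = max(0, c - 1)
        self.choice[pc] = c
        self.a.update(pc, actual)
        self.b.update(pc, actual)

# a branch that is never taken: the chooser learns to trust the 2-bit table
t = Tournament(StaticTaken(), TwoBitTable())
hits = []
for _ in range(10):
    hits.append(t.predict(0x40) is False)
    t.update(0x40, False)
print(sum(hits))   # 8: correct after two warm-up steps
```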

SLIDE 74

Predicting Function Calls

  • Branch Target Buffers (BTB)
  • The name is unfortunate, since it’s really a jump target buffer.
  • Use a table, indexed by PC, that stores the last target of the jump.
  • When you fetch a jump, start executing at the address in the BTB.
  • Update the BTB when you find out the correct destination.

48
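A minimal sketch of a direct-mapped BTB (the table size, indexing, and the pc+4 fallback are my assumptions):

```python
class BranchTargetBuffer:
    """Direct-mapped table, indexed by PC, holding the last target of each jump."""
    def __init__(self, size=256):
        self.size = size
        self.entries = {}                     # index -> (tag_pc, target)
    def predict(self, pc):
        entry = self.entries.get((pc >> 2) % self.size)
        if entry and entry[0] == pc:          # hit: start fetching at the stored target
            return entry[1]
        return pc + 4                         # miss: fall back to sequential fetch
    def update(self, pc, target):
        # record the destination once the jump actually resolves
        self.entries[(pc >> 2) % self.size] = (pc, target)

btb = BranchTargetBuffer()
print(hex(btb.predict(0x400)))    # 0x404: nothing recorded yet
btb.update(0x400, 0x1000)
print(hex(btb.predict(0x400)))    # 0x1000: predicted from the BTB
```

Note the tag check: two jumps that map to the same index (e.g. 0x400 and 0x800 with 256 entries) do not alias each other's targets.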

SLIDE 75

Interference

  • Our schemes for associating branches with predictors are imperfect.
  • Different branches may map to the same predictor and pollute the predictor.
  • This is called “destructive interference”.
  • Using larger tables will (typically) reduce this effect.

49