Dependences and Hazards Lecture 17 CS301 Administrative Daily - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Dependences and Hazards

Lecture 17 CS301

slide-2
SLIDE 2

Administrative

  • Daily Review of today’s lecture
      - Due tomorrow (10/30) at 8am
  • HW #7 due today at 5pm
  • HW #8 assigned
      - Due 10/5 at 5pm
  • Read Chapter 4.8-4.9
slide-3
SLIDE 3

Data Dependencies

  • We want to keep the pipeline completing an instruction every cycle
  • When a later instruction depends on the result of an earlier instruction, stalls happen
  • There are 3 types of data dependencies that we’ve been talking about:
      - RAW
      - WAR
      - WAW

slide-4
SLIDE 4

RAW – Read after Write

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0

RAW: sub reads $t0 after add writes it.

slide-5
SLIDE 5

WAR - Write after Read

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0

WAR: or writes $s3 after sub reads it.

slide-6
SLIDE 6

WAW – Write after Write

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0

WAW: mul writes $t2 after sub writes it.

slide-10
SLIDE 10

Identify all of the dependences

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0

[Pipeline timing diagram: cycles 1-8, one row per instruction (IF, ID, EX, MEM, WB)]

Dependences to label: RAW, WAR, WAW
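The exercise on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original lecture: the helper names `parse` and `find_dependences` are hypothetical, and the sketch assumes every instruction has the form `op dest, src1, src2`. It compares every pair of instructions and ignores intervening writes, which is enough for the four-instruction example.

```python
PROGRAM = [
    "add $t0, $s0, $s1",
    "sub $t2, $t0, $s3",
    "or $s3, $t7, $s2",
    "mul $t2, $t7, $s0",
]

def parse(instr):
    """Split 'op dest, src1, src2' into (dest, [sources])."""
    _, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], regs[1:]

def find_dependences(program):
    """Return (kind, earlier_index, later_index, register) tuples."""
    parsed = [parse(i) for i in program]
    found = []
    for i, (dest_i, srcs_i) in enumerate(parsed):
        for j in range(i + 1, len(parsed)):
            dest_j, srcs_j = parsed[j]
            if dest_i in srcs_j:      # later reads what earlier wrote
                found.append(("RAW", i, j, dest_i))
            if dest_j in srcs_i:      # later writes what earlier read
                found.append(("WAR", i, j, dest_j))
            if dest_j == dest_i:      # both write the same register
                found.append(("WAW", i, j, dest_i))
    return found

for dep in find_dependences(PROGRAM):
    print(dep)
```

For the slide's code this reports one dependence of each kind: RAW on $t0 (add to sub), WAR on $s3 (sub to or), and WAW on $t2 (sub to mul).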

slide-16
SLIDE 16

Which dependences can cause hazards? (stalls)

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0

[Pipeline timing diagram: cycles 1-8, one row per instruction]

  • RAW: Yes (true dependence)
  • WAR: No
  • WAW: No

How do we solve data hazards? Instruction reordering

slide-19
SLIDE 19

Let’s reorder the or

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0

[Pipeline timing diagram: cycles 1-8, with the or moved ahead of the sub; the WAR dependence on $s3 turns into a RAW]

Aaaaaaah! The result of the or will be passed to the sub!

slide-22
SLIDE 22

Let’s reorder the mul

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0

[Pipeline timing diagram: cycles 1-8, with the mul moved ahead of the sub; the two writes to $t2 now happen in the wrong order]

Aaaaaaah! $t2 will be left with the result of the sub, not the mul!

slide-24
SLIDE 24

Why do we care about WAW,WAR?

  • WAR and WAW prevent instruction reordering

slide-25
SLIDE 25

How to remove WAR, WAW dependences?

add $t0, $s0, $s1
sub $t2, $t0, $s3
or  $s3, $t7, $s2
mul $t2, $t7, $s0
and $t4, $s3, $s5
add $s3, $s4, $s6

[Pipeline timing diagram: cycles 1-8]

slide-26
SLIDE 26

Register Renaming

Use a different register for that result (and all subsequent uses of that result)

add $t0, $s0, $s1
sub $t5, $t0, $s3
or  $t6, $t7, $s2
mul $t2, $t7, $s0
and $t4, $t6, $s5
add $s3, $s4, $s6

[Pipeline timing diagram: cycles 1-8]
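Renaming can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's method: the functions `rename` and `false_dependences` and the register names `p0`, `p1`, ... are hypothetical, and the sketch gives every result a fresh name, whereas the slide renames only the sub and or results.

```python
# The six-instruction example before renaming: (op, dest, [sources]).
PROGRAM = [
    ("add", "$t0", ["$s0", "$s1"]),
    ("sub", "$t2", ["$t0", "$s3"]),
    ("or",  "$s3", ["$t7", "$s2"]),
    ("mul", "$t2", ["$t7", "$s0"]),
    ("and", "$t4", ["$s3", "$s5"]),
    ("add", "$s3", ["$s4", "$s6"]),
]

def rename(program):
    """Give every result a fresh register name (p0, p1, ...)."""
    mapping = {}   # original register -> name of its latest version
    renamed = []
    for n, (op, dest, srcs) in enumerate(program):
        new_srcs = [mapping.get(s, s) for s in srcs]  # read the latest version
        mapping[dest] = f"p{n}"                       # fresh name for each write
        renamed.append((op, mapping[dest], new_srcs))
    return renamed

def false_dependences(program):
    """List WAR and WAW pairs as (kind, earlier, later, register)."""
    out = []
    for i, (_, dest_i, srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            dest_j = program[j][1]
            if dest_j in srcs_i:
                out.append(("WAR", i, j, dest_j))
            if dest_j == dest_i:
                out.append(("WAW", i, j, dest_j))
    return out

print(len(false_dependences(PROGRAM)))     # WAR/WAW pairs before renaming
print(false_dependences(rename(PROGRAM)))  # none remain after renaming
```

The RAW dependences survive renaming (for example, the and still reads the result of the or), which is why they are called true dependences.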

slide-29
SLIDE 29

Who renames registers?

  • Static register renaming
      - Compiler
      - Compiler is the one who makes assignments in the first place!
      - Number of registers limited by instruction format
  • Dynamic register renaming
      - Hardware
      - Can offer more registers
      - Number of registers limited by size of register file & clock rate

slide-32
SLIDE 32

Minimizing Data Hazards

  • Data Forwarding
  • Instruction Reordering
slide-36
SLIDE 36

Summary

  • What is the difference between a hazard and a dependence?
      - A dependence prevents reordering
      - A hazard can cause a stall
      - Hazard -> dependence, not always the converse
  • How can we get rid of WAW/WAR dependences?
      - Register renaming
  • What limits this solution?
      - The number of registers available (ISA or physical)

slide-37
SLIDE 37

Control Dependences

slide-39
SLIDE 39

Pipelined Machine

[Datapath figure: five-stage pipelined machine (Fetch, Decode, Execute, Memory, Writeback) with PC and +4 adder, instruction memory, register file (src1, src2, destreg), op/fun, rs, rt, rd and imm fields, sign extend (16 to 32 bits), shift-left-2 units, data memory, and pipeline registers between stages]

slide-42
SLIDE 42

Control Hazard

In what cycle does the nextPC get calculated for the bne? End of 4
In what cycle does the or get fetched? Beginning of 3

      bne $s0, $s1, end
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8; the instructions after the bne repeat IF until the branch is resolved]

slide-43
SLIDE 43

Barriers to Pipeline Performance

  • Uneven stages
  • Pipeline register delays
  • Data Hazards
  • Control Hazards
      - Whether an instruction will execute depends on the outcome of a conditional branch still in the pipeline

slide-45
SLIDE 45

Solution 1: Add hardware to determine branch in decode stage

In what cycle does the nextPC get calculated for the bne? End of 4
In what cycle does the or get fetched? Beginning of 3

      bne $s0, $s1, end
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8, before the change]

slide-47
SLIDE 47

Solution 1: Add hardware to determine branch in decode stage

In what cycle does the nextPC get calculated for the bne? 3
In what cycle does the or get fetched? 3

      bne $s0, $s1, end
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8, with the branch resolved in decode]

slide-48
SLIDE 48

Note

  • For the rest of this course, the branches will be determined in the decode stage
  • All other optimizations will be in addition to moving branch calculation to decode stage

slide-49
SLIDE 49

Solution 2: Branch Delay Slot

Redefine the semantics of a branch: ALWAYS execute the instruction after the branch, regardless of the outcome of the branch.

      bne $s0, $s1, end
      nop
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8, with the nop occupying the slot after the bne]

slide-50
SLIDE 50

Solution 2: Also add Branch Delay Slot

ALWAYS execute the instruction after the branch, regardless of the outcome of the branch. Try to fill that spot with an instruction from before the branch.

      bne $s0, $s1, end
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8]

slide-51
SLIDE 51

Branch Delay Slot

  • The hardware always executes the instruction after a branch
  • The compiler tries to take an instruction from before the branch and move it after the branch
  • If it can find no instruction, it inserts a nop after the branch
  • If it forgets to place a nop or instruction there, you can get incorrect execution!

slide-55
SLIDE 55

Branch Delay Slot - Limitations

  • If you have a machine with 20 pipeline stages, and it takes 10 stages to calculate branch, how many branch delay slots are there? 9
  • Can you move any instruction into branch delay slot? Only independent instructions
  • What happens as the pipeline gets deeper? More difficult to fill slots
  • Branch delay slot is only used in short pipelines!

slide-56
SLIDE 56

Solution 3: Branch Prediction

Guess which way the branch will go before calculation occurs. Clean up if predictor is wrong.

      bne $s0, $s1, end
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8]

slide-58
SLIDE 58

Solution 3: Branch Prediction

First: Always predict not taken
If we are right, how many cycles do we stall? 0

      bne $s0, $s1, end
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8, no stalls]

slide-61
SLIDE 61

Solution 3: Branch Prediction

First: Always predict not taken
If we are wrong, then flush incorrect instruction(s)
How many cycles do we stall? 1

      bne $s0, $s1, end
      or  $s3, $s0, $t3
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8; the or is fetched and then flushed, and the sw starts one cycle late]

slide-62
SLIDE 62

Solution 3: Branch Prediction

First: Always predict taken
Why will this still result in a stall?

      bne $s0, $s1, end
end:  sw  $s2, 0($t1)
      add $s5, $s4, $t1

[Pipeline timing diagram: cycles 1-8]

slide-63
SLIDE 63

Branch Prediction

  • If we are going to predict taken, we need to know the branch target earlier than the point where we determine where the branch actually goes.
      - How?

slide-64
SLIDE 64

Branch Prediction

  • Understand the nature of programs
  • Are branch directions random?
  • If not, what will correlate?

w Past behavior? w Previous branches’ behavior?

slide-67
SLIDE 67

Branch Prediction

for (i; i < n; i++)
    do some work

      slt  $t1, $s2, $s3
      beq  $t1, $0, end
loop: do some work
      addi $s2, $s2, 1
      slt  $t1, $s2, $s3
      bne  $t1, $0, loop
end:

Is beq often taken or not taken? Not Taken
Is bne often taken or not taken? Taken

Conclusion: We want a prediction that is unique to each branch. Look up prediction by PC

slide-70
SLIDE 70

First Branch Predictor

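Predict whatever happened last time
Update the predictor for next time

[State diagram: state 0 = Predict Not Taken, state 1 = Predict Taken; a taken branch (T) leads to state 1, a not-taken branch (NT) leads to state 0]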

slide-78
SLIDE 78

Branch Prediction

for (i; i < n; i++)
    do some work

      slt  $t1, $s2, $s3
      beq  $t1, $0, end
loop: do some work
      addi $s2, $s2, 1
      slt  $t1, $s2, $s3
      bne  $t1, $0, loop
end:

Iteration    1    2    …    x    1    2    …    y
CurState     0    1         1    0    1         1
Prediction   NT   T         T    NT   T         T
Reality      T    T         NT   T    T         NT
NextState    1    1         0    1    1         0

When are we wrong? First and last iteration of each loop
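The table above can be replayed with a short Python sketch. It is an illustration only: the function name `one_bit_mispredictions` and the choice of four iterations per loop run are assumptions, and the outcome list models the bne at the bottom of the loop (taken on every iteration except the last).

```python
def one_bit_mispredictions(outcomes, state=0):
    """Replay a one-bit predictor: state 0 predicts not taken, 1 predicts taken.

    outcomes is a list of booleans (True = branch taken).
    Returns (number of wrong predictions, final state).
    """
    wrong = 0
    for taken in outcomes:
        if bool(state) != taken:     # prediction disagrees with reality
            wrong += 1
        state = 1 if taken else 0    # remember what happened last time
    return wrong, state

# Two runs of a four-iteration loop: taken three times, then falls through.
one_run = [True, True, True, False]
print(one_bit_mispredictions(one_run * 2))
```

Each run of the loop costs two mispredictions, on its first and last iterations, which matches the answer on the slide.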

slide-82
SLIDE 82

Two-bit Branch Predictor

Must be wrong twice in a row to switch prediction
Update the predictor for next time
One wrong -> state 1 or 2, No wrong -> state 0 or 3

[State diagram: four states 0-3 with T and NT transition arrows; states 0 and 1 predict not taken, states 2 and 3 predict taken]

slide-83
SLIDE 83

Second Branch Predictor

Must be wrong twice in a row to switch prediction
Update the predictor for next time

[State diagram: four states 0-3, grouped into Predict Not Taken and Predict Taken, with every T and NT transition drawn]

slide-91
SLIDE 91

Branch Prediction

for (i; i < n; i++)
    do some work

      slt  $t1, $s2, $s3
      beq  $t1, $0, end
loop: do some work
      addi $s2, $s2, 1
      slt  $t1, $s2, $s3
      bne  $t1, $0, loop
end:

Iteration    1    2    …    x    1    2    …    y
CurState     2    3         3    2    3         3
Prediction   T    T         T    T    T         T
Reality      T    T         NT   T    T         NT
NextState    3    3         2    3    3         2

When are we wrong? Only when we exit the loop
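The two-bit table can be replayed the same way. This Python sketch assumes a saturating counter (add one on taken, subtract one on not taken, clamp to 0-3), which reproduces the states in the table but may differ from the exact state diagram drawn in the lecture. The function name and the four-iteration loop are likewise assumptions.

```python
def two_bit_mispredictions(outcomes, state=2):
    """Replay a two-bit predictor modeled as a saturating counter.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    outcomes is a list of booleans (True = branch taken).
    Returns (number of wrong predictions, final state).
    """
    wrong = 0
    for taken in outcomes:
        if (state >= 2) != taken:    # prediction disagrees with reality
            wrong += 1
        # Move one step toward the observed outcome, staying within 0..3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return wrong, state

# Two runs of a four-iteration loop, starting in state 2 as in the table.
one_run = [True, True, True, False]
print(two_bit_mispredictions(one_run * 2))
```

Starting from state 2, the only misprediction in each run is the loop exit, so the first-iteration miss of the one-bit predictor disappears.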

slide-92
SLIDE 92

Simplest Branch Predictors

  • Memory indexed by lower portion of address
  • Entry contains few bits specifying prediction
  • Accessed in IF stage so fetching of target occurs in next cycle

[Figure: the low-order bits of the PC (100........10110) index a table of two-bit entries: 01 11 00 00 10 01 11 00 ...]

slide-93
SLIDE 93

Real Branch Predictors

  • TargetPC saved with predictor
  • Limited space, so different branches may map to the same predictor
      - errors?
  • Prediction based on past behavior of several branches
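A table of predictors indexed by the low bits of the PC, and the aliasing it allows, can be sketched as follows. Everything here is hypothetical: the class name `BranchHistoryTable`, the eight-entry size, the starting counter value, the example addresses, and the choice to drop the two byte-offset bits of a word-aligned PC before indexing. The counters use the saturating update as an assumption.

```python
class BranchHistoryTable:
    """Small table of two-bit counters indexed by low-order PC bits."""

    def __init__(self, index_bits=3):
        self.mask = (1 << index_bits) - 1
        # Assumed starting value: 1 (predict not taken, one step from taken).
        self.counters = [1] * (1 << index_bits)

    def index(self, pc):
        # Instructions are word aligned, so skip the two byte-offset bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)

bht = BranchHistoryTable()
branch_a, branch_b = 0x00400010, 0x00400030   # made-up branch addresses
print(bht.index(branch_a), bht.index(branch_b))  # same entry: they alias
bht.update(branch_a, True)                       # branch A is taken once
print(bht.predict(branch_b))                     # B's prediction changed too
```

The last line shows the kind of error the slide asks about: branch B has never executed, yet its prediction was changed by branch A.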

slide-94
SLIDE 94

Advantages of Branch Prediction

  • No extra instructions
  • Highly predictable branches have no stalls
  • Works well with loops.
  • All hardware - no compiler necessary
slide-95
SLIDE 95

Disadvantages/Limits of Branch Prediction

  • Large penalty when wrong
      - Badly behaved branches kill performance
  • Only a few can be performed each cycle (only a problem in multi-issue machines)

slide-99
SLIDE 99

Minimizing Control Hazards

  • Calculate branch in decode stage
  • Branch delay slot
  • Branch prediction
slide-102
SLIDE 102

CPI

  • CPI = ∑((% instr)×(cycles))
  • How do hazards affect CPI?
      - Arithmetic instructions’ cycle count increases (stall cycles)
  • How do branches affect CPI?
      - Branches’ cycle count increases (stall or flush cycles)

slide-103
SLIDE 103

Summary of Optimizing Instruction Schedule

  • Identify dependencies
  • Draw timing diagram with data forwarding
  • Move instructions between stalled instructions
      - This is reordering. You may need to do register renaming to do this.
  • Reduce impact of control hazards if possible
      - Branch delay slot

slide-104
SLIDE 104

Exceptions

slide-105
SLIDE 105

What is an Exception?

  • When there is an unexpected change in control flow, control switches to OS to handle
      - Examples: Divide by zero, arithmetic overflow, undefined instruction
slide-111
SLIDE 111

Steps for Exceptions

  • Detect exception
  • Place processor in state before offending instruction
  • Record exception type
  • Record instruction’s PC in EPC
  • Transfer control to OS
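The steps above can be sketched as a small Python model. This is a bookkeeping illustration, not hardware: the function `take_exception`, the dictionary layout, the base address, and the handler address `HANDLER_PC` are all assumptions, and instructions are taken to be 4 bytes each.

```python
WORD = 4                  # bytes per instruction (assumed fixed width)
HANDLER_PC = 0x80000180   # assumed address of the OS exception handler

def take_exception(program, base_pc, faulting_index, cause):
    """Model the bookkeeping for an exception at program[faulting_index]."""
    return {
        # Step 2: older instructions complete; the offending instruction
        # and everything after it are flushed without writing back.
        "completed": program[:faulting_index],
        "flushed": program[faulting_index:],
        "cause": cause,                          # step 3: record the type
        "epc": base_pc + WORD * faulting_index,  # step 4: offending PC
        "next_pc": HANDLER_PC,                   # step 5: control goes to OS
    }

PROGRAM = ["add $s0, $0, $0", "lw $s1, 0($t0)", "undefined", "or $s3, $s4, $t3"]
result = take_exception(PROGRAM, 0x00400000, 2, "undefined instruction")
print(hex(result["epc"]), result["completed"], result["flushed"])
```

Step 1, detection, is represented only by the `faulting_index` argument; in a pipeline it happens in whichever stage notices the problem.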
slide-112
SLIDE 112

How does pipelining affect exception-handling?

slide-114
SLIDE 114

1. Detection

What happens if the third instruction is undefined?

add $s0, $0, $0
lw  $s1, 0($t0)
undefined
or  $s3, $s4, $t3

[Pipeline timing diagram: cycles 1-8, one row per instruction]

In what stage is it detected? Decode
In what cycle? 4
slide-115
SLIDE 115
1. Detection

  • Must associate exception with proper instruction
  • What happens if multiple exceptions happen in the same cycle?
      - Prioritize exceptions (earliest instructions have priority)

slide-117
SLIDE 117

2. Preserve state before instruction

add $s0, $0, $0
lw  $s1, 0($t0)
undefined
or  $s3, $s4, $t3

[Pipeline timing diagram: cycles 1-8]

What? What does that mean?!? Complete previous instructions, flush following instructions, and do not let the current instruction write back

slide-118
SLIDE 118

2. Preserve state before instruction

add $s0, $0, $0
lw  $s1, 0($t0)
undefined
or  $s3, $s4, $t3

[Pipeline timing diagram: cycles 1-8; only the add and the lw continue through MEM and WB]

slide-119
SLIDE 119
3. Record exception type

  • Place value in cause register, or
  • Use vectored interrupts
      - (exception routine address dependent on exception type)

slide-120
SLIDE 120

4. Record nPC in EPC

[Datapath figure: machine in detection cycle, showing PC, instruction memory, register file, ALU, and data memory, with the add, lw, undefined instruction, and or in flight]

slide-122
SLIDE 122
4. Record PC in EPC

  • Non-trivial because PC changes each cycle, and exceptions can be detected in several stages (decode, execute, memory)
  • Precise exceptions figure out PC in hardware
  • Imprecise exceptions let OS figure it out
slide-123
SLIDE 123
5. Transfer control to OS

  • Same as before