Dependencies and Hazards Lecture 17 CS301 Data Dependencies We - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Dependencies and Hazards

Lecture 17 CS301

slide-2
SLIDE 2

Data Dependencies

  • We want to keep the pipeline completing an instruction every cycle
  • When a later instruction depends on the result of an earlier instruction, stalls happen
  • There are 3 types of data dependencies that we’ve been talking about:
      - RAW
      - WAR
      - WAW

slide-3
SLIDE 3

RAW – Read after Write

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0

(sub reads $t0 after add writes it)

slide-4
SLIDE 4

WAR - Write after Read

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0

(or writes $s3 after sub reads it)

slide-5
SLIDE 5

WAW – Write after Write

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0

(mul writes $t2 after sub writes it)

slide-6
SLIDE 6

Identify all of the dependencies

[Figure: pipeline timing diagram for the four instructions below, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0

  • RAW: add -> sub ($t0)
  • WAR: sub -> or ($s3)
  • WAW: sub -> mul ($t2)
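
The classification above can be checked mechanically. The following is a minimal Python sketch, not part of the lecture: the tuple encoding of each instruction and the name `find_dependences` are assumptions made for illustration. It compares every pair of instructions and ignores the case where a register is redefined between the two, which does not arise in this example.

```python
from itertools import combinations

# Each instruction: (text, destination register, source registers).
PROGRAM = [
    ("add $t0, $s0, $s1", "$t0", ("$s0", "$s1")),
    ("sub $t2, $t0, $s3", "$t2", ("$t0", "$s3")),
    ("or  $s3, $t7, $s2", "$s3", ("$t7", "$s2")),
    ("mul $t2, $t7, $s0", "$t2", ("$t7", "$s0")),
]

def find_dependences(program):
    """Return (kind, earlier_index, later_index, register) tuples."""
    found = []
    pairs = combinations(enumerate(program), 2)  # every (earlier, later) pair
    for (i, (_, dst_i, src_i)), (j, (_, dst_j, src_j)) in pairs:
        if dst_i in src_j:       # later reads what earlier writes
            found.append(("RAW", i, j, dst_i))
        if dst_j in src_i:       # later writes what earlier reads
            found.append(("WAR", i, j, dst_j))
        if dst_i == dst_j:       # both write the same register
            found.append(("WAW", i, j, dst_i))
    return found

for kind, i, j, reg in find_dependences(PROGRAM):
    print(f"{kind} on {reg}: {PROGRAM[i][0]}  ->  {PROGRAM[j][0]}")
```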

slide-10
SLIDE 10

Which dependences can cause hazards? (stalls)

[Figure: pipeline timing diagram for the four instructions below, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0

  • RAW: Yes (true dependence)
  • WAR: No
  • WAW: No

How do we solve data hazards? Instruction reordering
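
To see why only the RAW dependence stalls an in-order pipeline, here is a small Python sketch under stated assumptions that are not from the slides: a five-stage pipeline with no forwarding, one instruction entering decode per cycle, and a register file that is written before it is read within a cycle (so a consumer may decode in the producer's WB cycle). The function name `decode_cycles` is hypothetical.

```python
# Each instruction: (destination register, source registers).
PROGRAM = [
    ("$t0", ("$s0", "$s1")),   # add $t0, $s0, $s1
    ("$t2", ("$t0", "$s3")),   # sub $t2, $t0, $s3
    ("$s3", ("$t7", "$s2")),   # or  $s3, $t7, $s2
    ("$t2", ("$t7", "$s0")),   # mul $t2, $t7, $s0
]

def decode_cycles(program):
    """Return the cycle in which each instruction is in ID."""
    ready = {}      # register -> cycle in which its new value is written (WB)
    decode = []
    prev_id = 1     # the first IF is cycle 1, so the first ID is cycle 2
    for dest, sources in program:
        earliest = prev_id + 1                           # in order, one ID per cycle
        for reg in sources:
            earliest = max(earliest, ready.get(reg, 0))  # wait for the producer's WB
        decode.append(earliest)
        ready[dest] = earliest + 3                       # ID -> EX -> MEM -> WB
        prev_id = earliest
    return decode

cycles = decode_cycles(PROGRAM)
stalls = cycles[-1] - (len(PROGRAM) + 1)   # without stalls the last ID is cycle 5
print(cycles, "stall cycles:", stalls)
```

Only the sub waits (for $t0). The WAR and WAW dependences never add a wait in this model, because writes happen in WB in program order.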

slide-16
SLIDE 16

Let’s reorder the or

[Figure: pipeline timing diagram for the four instructions below, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0

RAW, WAR, WAW

Moving the or ahead of the sub turns the WAR on $s3 into a RAW.

Aaaaaaah! The result of the or will be passed to the sub!

slide-19
SLIDE 19

Let’s reorder the mul

[Figure: pipeline timing diagram for the four instructions below, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0

RAW, WAR, WAW

Aaaaaaah! $t2 will be left with the result of the sub, not the mul!
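
The effect of moving the mul ahead of the sub can be reproduced with a tiny interpreter. This is an illustrative Python sketch; the starting register values in `START` are made up, and `run` and `move` are hypothetical helpers.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub,
       "or": operator.or_, "mul": operator.mul}

ORIGINAL = [
    ("add", "$t0", "$s0", "$s1"),
    ("sub", "$t2", "$t0", "$s3"),
    ("or",  "$s3", "$t7", "$s2"),
    ("mul", "$t2", "$t7", "$s0"),
]

# Arbitrary starting values, chosen only for the demonstration.
START = {"$s0": 2, "$s1": 3, "$s2": 8, "$s3": 1, "$t7": 4, "$t0": 0, "$t2": 0}

def run(program, regs):
    """Execute (op, dest, src1, src2) tuples in order on a copy of regs."""
    regs = dict(regs)
    for op, dest, a, b in program:
        regs[dest] = OPS[op](regs[a], regs[b])
    return regs

def move(program, src, dst):
    """Return a copy of program with the instruction at src moved to dst."""
    reordered = list(program)
    reordered.insert(dst, reordered.pop(src))
    return reordered

REORDERED = move(ORIGINAL, 3, 1)          # add, mul, sub, or
print(run(ORIGINAL, START)["$t2"])        # value written by the mul
print(run(REORDERED, START)["$t2"])       # value written by the sub
```

With these values the original order leaves 8 in $t2 (the product), while the reordered program leaves 4 (the difference), which is the WAW violation the slide complains about.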

slide-22
SLIDE 22

Why do we care about WAW, WAR?

  • WAR and WAW prevent instruction reordering

slide-24
SLIDE 24

How to remove WAR, WAW dependences?

[Figure: pipeline timing diagram, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $t0, $s0, $s1
sub $t2, $t0, $s3
or $s3, $t7, $s2
mul $t2, $t7, $s0
and $t4, $s3, $s5
add $s3, $s4, $s6

slide-25
SLIDE 25

Register Renaming

Use a different register for that result (and all subsequent uses of that result)

[Figure: pipeline timing diagram, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $t0, $s0, $s1
sub $t5, $t0, $s3
or $t6, $t7, $s2
mul $t2, $t7, $s0
and $t4, $t6, $s5
add $s3, $s4, $s6
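
One simple way to rename is to give every result a fresh register. The Python sketch below is a simplified illustration, not the scheme of any particular compiler or processor: the physical names `p0`, `p1`, ... and the function `rename` are assumptions, and it pretends the supply of registers is unlimited.

```python
from itertools import count

PROGRAM = [
    ("add", "$t0", "$s0", "$s1"),
    ("sub", "$t2", "$t0", "$s3"),
    ("or",  "$s3", "$t7", "$s2"),
    ("mul", "$t2", "$t7", "$s0"),
    ("and", "$t4", "$s3", "$s5"),
    ("add", "$s3", "$s4", "$s6"),
]

def rename(program):
    """Rename every destination to a fresh register and patch later uses."""
    fresh = (f"p{n}" for n in count())
    current = {}    # architectural register -> name holding its latest value
    renamed = []
    for op, dest, a, b in program:
        # Sources read whichever register holds the most recent value.
        a_new = current.get(a, a)
        b_new = current.get(b, b)
        # Each write gets its own register, so two writes never collide (WAW)
        # and a write never clobbers a value still waiting to be read (WAR).
        current[dest] = next(fresh)
        renamed.append((op, current[dest], a_new, b_new))
    return renamed

for op, dest, a, b in rename(PROGRAM):
    print(f"{op} {dest}, {a}, {b}")
```

The true (RAW) dependences survive: the sub still reads the add's result and the and still reads the or's result, just under new names.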

slide-26
SLIDE 26

Who renames registers?

  • Static register renaming
      - Compiler
      - Compiler is the one who makes assignments in the first place!
      - Number of registers limited by instruction format
  • Dynamic register renaming
      - Hardware
      - Can offer more registers
      - Number of registers limited by size of register file & clock rate

slide-29
SLIDE 29

Minimizing Data Hazards

  • Data Forwarding
  • Instruction Reordering
slide-32
SLIDE 32

Summary

  • What is the difference between a hazard and a dependence?
      - A dependence prevents reordering
      - A hazard can cause a stall
      - Hazard -> dependence, not always the converse
  • How can we get rid of WAW/WAR dependences?
      - Register renaming
  • What limits this solution?
      - The number of registers available (ISA or physical)

slide-36
SLIDE 36

Control Dependences

slide-37
SLIDE 37

In what cycle does the nextPC get calculated for the bne?
In what cycle does the or get fetched?

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB, with the control hazard marked]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

Control Hazard

slide-38
SLIDE 38

Pipelined Machine

[Figure: pipelined datapath. PC and +4 adder, instruction memory, register file (src1, src2, destreg, destdata), op/fun decode of rs, rt, rd, imm, 16-to-32-bit sign extend, << 2 shifters, data memory, and pipeline registers separating the Fetch, Decode, Execute, Memory, (Writeback) stages]

slide-39
SLIDE 39

In what cycle does the nextPC get calculated for the bne? End of 4
In what cycle does the or get fetched? Beginning of 3

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB, with the control hazard marked]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

Control Hazard

slide-42
SLIDE 42

Barriers to Pipeline Performance

  • Uneven stages
  • Pipeline register delays
  • Data Hazards
  • Control Hazards

      - Whether an instruction will execute depends on the outcome of a conditional branch still in the pipeline

slide-44
SLIDE 44

Solution 1: Add hardware to determine branch in decode stage

In what cycle does the nextPC get calculated for the bne? End of 4
In what cycle does the or get fetched? Beginning of 3

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

slide-45
SLIDE 45

Pipelined Machine

[Figure: pipelined datapath. PC and +4 adder, instruction memory, register file (src1, src2, destreg, destdata), op/fun decode of rs, rt, rd, imm, 16-to-32-bit sign extend, << 2 shifters, data memory, and pipeline registers separating the Fetch, Decode, Execute, Memory, (Writeback) stages]

slide-46
SLIDE 46

Solution 1: Add hardware to determine branch in decode stage

In what cycle does the nextPC get calculated for the bne? 3
In what cycle does the or get fetched? 3

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

slide-47
SLIDE 47

Note

  • For the rest of this course, the branches will be determined in the decode stage
  • All other optimizations will be in addition to moving branch calculation to decode stage

slide-48
SLIDE 48

Solution 2: Branch Delay Slot

Redefine the semantics of a branch: ALWAYS execute the instruction after the branch, regardless of the outcome of the branch.

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB, with a nop in the delay slot]

add $s5, $s4, $t1
bne $s0, $s1, end
nop
or $s3, $s0, $t3
end: sw $s2, 0($t1)

slide-49
SLIDE 49

Solution 2: Also add Branch Delay Slot

ALWAYS execute the instruction after the branch, regardless of the outcome of the branch. Try to fill that spot with an instruction from before the branch.

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

slide-50
SLIDE 50

Branch Delay Slot

  • The hardware always executes the instruction after a branch
  • The compiler tries to take an instruction from before the branch and move it after the branch
  • If it can find no instruction, it inserts a nop after the branch
  • If it forgets to place a nop or instruction there, you can get incorrect execution!

slide-51
SLIDE 51

Branch Delay Slot - Limitations

  • If you have a machine with 20 pipeline stages, and it takes 10 stages to calculate branch, how many branch delay slots are there? 9
  • Can you move any instruction into branch delay slot? Only independent instructions
  • What happens as the pipeline gets deeper? More difficult to fill slots
  • Branch delay slot is only used in short pipelines!

slide-55
SLIDE 55

Solution 3: Branch Prediction

Guess which way the branch will go before calculation occurs. Clean up if predictor is wrong.

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

slide-56
SLIDE 56

Solution 3: Branch Prediction

First: Always predict not taken
If we are right, how many cycles do we stall? 0

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

slide-58
SLIDE 58

Solution 3: Branch Prediction

First: Always predict not taken
If we are wrong, then flush incorrect instruction(s)
How many cycles do we stall? 1

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB; the instruction fetched after the bne is flushed after its IF]

add $s5, $s4, $t1
bne $s0, $s1, end
or $s3, $s0, $t3
end: sw $s2, 0($t1)

slide-61
SLIDE 61

Solution 3: Branch Prediction

Next: Always predict taken
Why will this still result in a stall?

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $s5, $s4, $t1
bne $s0, $s1, end
end: sw $s2, 0($t1)

slide-62
SLIDE 62

Branch Prediction

  • If we’re going to predict taken, we need to know where to branch to earlier than when we determine where the branch actually goes.
      - How?

slide-63
SLIDE 63

Branch Prediction

  • Understand the nature of programs
  • Are branch directions random?
  • If not, what will correlate?

      - Past behavior?
      - Previous branches’ behavior?

slide-64
SLIDE 64

Branch Prediction

for(i; i<n;i++) do some work

      slt $t1, $s2, $s3
      beq $t1, $0, end
loop: # do some work
      addi $s2, $s2, 1
      slt $t1, $s2, $s3
      bne $t1, $0, loop
end:

Is beq often taken or not taken? Not Taken
Is bne often taken or not taken? Taken

Conclusion: We want a prediction that is unique to each branch. Look up prediction by PC

slide-67
SLIDE 67

First Branch Predictor

Predict whatever happened last time
Update the predictor for next time

[Figure: two-state diagram. State 1 (Predict Taken): T stays in state 1, NT moves to state 0. State 0 (Predict Not Taken): NT stays in state 0, T moves to state 1]

slide-70
SLIDE 70

Branch Prediction

for(i; i<n;i++) do some work

      slt $t1, $s2, $s3
      beq $t1, $0, end
loop: # do some work
      addi $s2, $s2, 1
      slt $t1, $s2, $s3
      bne $t1, $0, loop
end:

One-bit predictor on the bne, for two runs of the loop (x and y iterations):

Iteration    1    2    …    x    1    2    …    y
CurState     0    1    …    1    0    1    …    1
Prediction   NT   T    …    T    NT   T    …    T
Reality      T    T    …    NT   T    T    …    NT
NextState    1    1    …    0    1    1    …    0

When are we wrong? First and last iteration of each loop
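
The table can be replayed in a few lines of Python. This is a sketch for illustration: the function names are hypothetical and the loop length of 4 iterations is an arbitrary choice standing in for x and y.

```python
def loop_branch(iterations):
    """Outcomes of the loop-closing bne: taken every time except the last."""
    return [True] * (iterations - 1) + [False]

def one_bit(outcomes, state=0):
    """Replay a 1-bit predictor; state 1 predicts taken, state 0 not taken.

    Returns a list of (prediction, reality, next_state) tuples.
    """
    trace = []
    for taken in outcomes:
        prediction = bool(state)
        state = 1 if taken else 0      # remember whatever happened last time
        trace.append((prediction, taken, state))
    return trace

# The loop is executed twice, 4 iterations each time.
trace = one_bit(loop_branch(4) * 2)
wrong = [n for n, (prediction, taken, _) in enumerate(trace, start=1)
         if prediction != taken]
print(wrong)   # positions of the mispredicted branches
```

The mispredictions fall on the first and last iteration of each run of the loop, matching the slide's answer.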

slide-78
SLIDE 78

Two-bit Branch Predictor

Must be wrong twice in a row to switch prediction
Update the predictor for next time
One wrong -> state 1 or 2, no wrong -> state 0 or 3

[Figure: four-state diagram, states 0 to 3, with T and NT transitions. States 2 and 3 predict taken; states 0 and 1 predict not taken]

slide-83
SLIDE 83

Branch Prediction

for(i; i<n;i++) do some work

      slt $t1, $s2, $s3
      beq $t1, $0, end
loop: # do some work
      addi $s2, $s2, 1
      slt $t1, $s2, $s3
      bne $t1, $0, loop
end:

Two-bit predictor on the bne, for two runs of the loop (x and y iterations):

Iteration    1    2    …    x    1    2    …    y
CurState     2    3    …    3    2    3    …    3
Prediction   T    T    …    T    T    T    …    T
Reality      T    T    …    NT   T    T    …    NT
NextState    3    3    …    2    3    3    …    2

When are we wrong? Only when we exit the loop
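
A two-bit predictor can be sketched the same way. The transitions out of states 1 and 2 on a second miss are not recoverable from the slide text, so this Python sketch assumes a saturating counter (taken adds 1, not taken subtracts 1); the loop traced here only visits states 2 and 3, where that assumption agrees with the table. The function name and the 4-iteration loop are hypothetical.

```python
def two_bit(outcomes, state=2):
    """Replay a 2-bit saturating counter; states 2 and 3 predict taken.

    Returns (number of mispredictions, final state).
    """
    misses = 0
    for taken in outcomes:
        prediction = state >= 2
        misses += prediction != taken
        # Assumed update rule: count up on taken, down on not taken, clamp to 0..3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses, state

# The loop-closing bne: taken 3 times, then not taken, for two runs of the loop.
outcomes = ([True] * 3 + [False]) * 2
print(two_bit(outcomes))
```

Starting in state 2, the only misses are the two loop exits, half as many as the one-bit predictor on the same sequence.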

slide-91
SLIDE 91

Simplest Branch Predictors

  • Memory indexed by lower portion of address
  • Entry contains few bits specifying prediction
  • Accessed in IF stage so fetching of target occurs in next cycle

[Figure: the low-order bits of the PC (100........10110) index a table of 2-bit entries: 01, 11, 00, 00, 10, 01, 11, 00, ...]

slide-92
SLIDE 92

Real Branch Predictors

  • TargetPC saved with predictor
  • Limited space, so different branches may map to the same predictor
      - Prediction may have been put there by another instruction with same low order address bits
      - Errors? (Prediction is just that, not a guarantee)
  • Prediction based on past behavior of several branches
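
Indexing by low-order PC bits, and the aliasing it causes, can be sketched in Python. Everything here is illustrative: the class name, the 3-bit index (8 entries), the choice to drop the two word-alignment bits, the saturating-counter update, and the example addresses are all assumptions rather than details from the slides. The saved target PC is left out.

```python
class PredictorTable:
    """A table of 2-bit counters indexed by low-order PC bits (no tags)."""

    def __init__(self, index_bits=3):
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)

    def index(self, pc):
        # Instructions are word aligned, so skip the two always-zero low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2   # True means "taken"

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


table = PredictorTable()
branch_a, branch_b = 0x00400010, 0x00400030   # hypothetical branch addresses

# Train branch_a as taken; branch_b has never executed.
table.update(branch_a, True)
table.update(branch_a, True)
print(table.index(branch_a), table.index(branch_b))   # same entry
print(table.predict(branch_b))                        # inherits branch_a's history
```

Because the two addresses share their low-order bits, the second branch is predicted from the first branch's history, which is the kind of error the slide warns about.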

slide-93
SLIDE 93

Advantages of Branch Prediction

  • No extra instructions
  • Highly predictable branches have no stalls

  • Works well with loops.
  • All hardware - no compiler necessary
slide-94
SLIDE 94

Disadvantages/Limits of Branch Prediction

  • Large penalty when wrong
      - Badly behaved branches kill performance
  • Only a few can be performed each cycle (only a problem in multi-issue machines)
      - May or may not get to this; it applies to superscalar processors

slide-95
SLIDE 95

Minimizing Control Hazards

  • Calculate branch in decode stage
  • Branch delay slot
  • Branch prediction
slide-99
SLIDE 99

CPI

  • CPI = ∑((% instr)×(cycles))
  • How do hazards affect CPI?
      - Arithmetic instructions’ cycle count increases (stall cycles)
  • How do branches affect CPI?
      - Branches’ cycle count increases
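
The formula can be worked through with a short Python sketch. The instruction mix and the per-class cycle counts below are invented for the example, not measurements from the course; `cpi` is a hypothetical helper.

```python
import math

def cpi(mix):
    """mix maps an instruction class to (fraction of instructions, average cycles)."""
    total = sum(fraction for fraction, _ in mix.values())
    if not math.isclose(total, 1.0):
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())

# Made-up mix: an ideal pipeline completes every instruction in 1 cycle.
IDEAL = {"arithmetic": (0.5, 1.0), "load/store": (0.3, 1.0), "branch": (0.2, 1.0)}

# Same mix, but data hazards add 0.2 stall cycles per arithmetic instruction
# on average, and half of the branches pay a 1-cycle misprediction penalty.
WITH_HAZARDS = {"arithmetic": (0.5, 1.2), "load/store": (0.3, 1.0), "branch": (0.2, 1.5)}

print(round(cpi(IDEAL), 2), round(cpi(WITH_HAZARDS), 2))
```

With these numbers the CPI rises from 1.0 to about 1.2: 0.5×1.2 + 0.3×1.0 + 0.2×1.5.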

slide-102
SLIDE 102

Summary of Optimizing Instruction Schedule

  • Identify dependencies
  • Draw timing diagram with data forwarding
  • Move instructions between stalled instructions
      - This is reordering. You may need to do register renaming to do this.
  • Reduce impact of control hazards if possible
      - Branch delay slot

slide-103
SLIDE 103

Exceptions

slide-104
SLIDE 104

What is an Exception?

  • When there is an unexpected change in control flow, control switches to OS to handle
      - Examples: Divide by zero, arithmetic overflow, undefined instruction
slide-105
SLIDE 105

Steps for Exceptions

  • Detect exception
  • Place processor in state before offending instruction
  • Record exception type
  • Record instruction’s PC in EPC
  • Transfer control to OS
slide-111
SLIDE 111

How does pipelining affect exception-handling?

slide-112
SLIDE 112

  • 1. Detection

What happens if the third instruction is undefined?

[Figure: pipeline timing diagram, Time ->, cycles 1-8, stages IF, ID, EX, MEM, WB]

add $s0, $0, $0
lw $s1, 0($t0)
undefined
or $s3, $s4, $t3

In what stage is it detected? Decode
In what cycle? 4
slide-114
SLIDE 114
  • 1. Detection
  • Must associate exception with proper instruction
  • What happens if multiple exceptions happen in the same cycle?
      - Prioritize exceptions (earliest instructions have priority)

slide-115
SLIDE 115

  • 2. Preserve state before instruction

[Figure: pipeline timing diagram, Time ->, cycles 1-8; the add and lw finish MEM and WB]

add $s0, $0, $0
lw $s1, 0($t0)
undefined
or $s3, $s4, $t3

What? What does that mean?!? Complete previous instructions, flush following instructions, and do not let the current instruction write back

slide-118
SLIDE 118
  • 3. Record exception type
  • Place value in cause register
      - Special status register that holds information about type of exception
          - Single entry point to OS
  • Or use vectored interrupts
      - Address to which control is transferred is determined by cause of exception (i.e., exception routine address depends on exception type)
          - Many entry points to OS
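
The difference between the two schemes can be sketched in Python. This models only what the hardware hands to the OS; the cause names, the handler addresses, and the function `take_exception` are hypothetical and chosen for illustration.

```python
# Hypothetical addresses, for illustration only.
GENERAL_HANDLER = 0x80000180
VECTOR_TABLE = {
    "undefined_instruction": 0x80000000,
    "arithmetic_overflow": 0x80000020,
}

def take_exception(cause, faulting_pc, vectored):
    """Return the state handed to the OS when an exception is taken."""
    state = {"epc": faulting_pc}             # step 4: record the PC in EPC
    if vectored:
        # Many entry points: the cause itself selects the handler address.
        state["next_pc"] = VECTOR_TABLE[cause]
    else:
        # Single entry point: record the cause and let the OS dispatch on it.
        state["cause"] = cause
        state["next_pc"] = GENERAL_HANDLER
    return state

print(take_exception("undefined_instruction", 0x00400008, vectored=False))
print(take_exception("undefined_instruction", 0x00400008, vectored=True))
```

With a cause register the OS handler starts by reading the recorded cause; with vectored interrupts that decision has already been made by the time the handler runs.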

slide-119
SLIDE 119

  • 4. Record PC in EPC

[Figure: pipelined datapath (PC, +4 adders, instruction memory, register file, ALU, data memory) showing the machine in the detection cycle, with the add, lw, undefined instruction, and or in flight]

slide-120
SLIDE 120
  • 4. Record PC in EPC
  • Non-trivial because PC changes each cycle, and exceptions can be detected in several stages (decode, execute, memory)
  • Precise exceptions figure out PC in hardware
  • Imprecise exceptions let OS figure it out
slide-122
SLIDE 122
  • 5. Transfer control to OS
  • Same as before