Chapter 7: Digital Design and Computer Architecture, 2nd Edition



slide-1
SLIDE 1

Chapter 7 <1>

Digital Design and Computer Architecture, 2nd Edition

Chapter 7

David Money Harris and Sarah L. Harris

slide-2
SLIDE 2

Chapter 7 <2>

Chapter 7 :: Topics

  • Introduction
  • Performance Analysis
  • Single-Cycle Processor
  • Multicycle Processor
  • Pipelined Processor
  • Exceptions
  • Advanced Microarchitecture
slide-3
SLIDE 3

Chapter 7 <3>

  • Microarchitecture: how to implement an architecture in hardware
  • Processor:
      – Datapath: functional blocks
      – Control: control signals

[Figure: levels of abstraction, with typical building blocks at each level]

Application Software   programs
Operating Systems      device drivers
Architecture           instructions, registers
Microarchitecture      datapaths, controllers
Logic                  adders, memories
Digital Circuits       AND gates, NOT gates
Analog Circuits        amplifiers, filters
Devices                transistors, diodes
Physics                electrons

Introduction

slide-4
SLIDE 4

Chapter 7 <4>

  • Multiple implementations for a single architecture:
      – Single-cycle: Each instruction executes in a single cycle
      – Multicycle: Each instruction is broken into a series of shorter steps
      – Pipelined: Each instruction is broken up into a series of steps, and multiple instructions execute at once

Microarchitecture

slide-5
SLIDE 5

Chapter 7 <5>

  • Program execution time:

Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)

  • Definitions:
      – CPI: cycles/instruction
      – Clock period: seconds/cycle
      – IPC: instructions/cycle = 1/CPI
  • Challenge is to satisfy constraints of:
      – Cost
      – Power
      – Performance

Processor Performance
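
A minimal sketch of the execution-time formula above. The function name and the example workload (1 billion instructions, CPI of 1.5, 1 ns clock period) are illustrative assumptions, not values from the slides.

```python
def execution_time(instructions: float, cpi: float, clock_period_s: float) -> float:
    """Execution time in seconds: instructions x cycles/instruction x seconds/cycle."""
    return instructions * cpi * clock_period_s


# Hypothetical workload, chosen only to exercise the formula.
example_seconds = execution_time(1e9, 1.5, 1e-9)
print(f"{example_seconds:.3f} s")
```

Halving any one of the three factors halves the execution time, which is why CPI and clock period have to be considered together.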

slide-6
SLIDE 6

Chapter 7 <6>

  • Consider subset of MIPS instructions:
      – R-type instructions: and, or, add, sub, slt
      – Memory instructions: lw, sw
      – Branch instructions: beq

MIPS Processor

slide-7
SLIDE 7

Chapter 7 <7>

  • Determines everything about a processor:
      – PC
      – 32 registers
      – Memory

Architectural State

slide-8
SLIDE 8

Chapter 7 <8>

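[Figure: the MIPS state elements: the 32-bit PC register (input PC', output PC), the instruction memory (A, RD), the register file (5-bit addresses A1, A2, A3; 32-bit ports RD1, RD2, WD3; write enable WE3), and the data memory (A, RD, WD, WE).]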

MIPS State Elements

slide-9
SLIDE 9

Chapter 7 <9>

  • Datapath
  • Control

Single-Cycle MIPS Processor

slide-10
SLIDE 10

Chapter 7 <10>

STEP 1: Fetch instruction

[Figure: the PC drives the instruction memory address A; the memory output RD is the instruction, Instr.]

Single-Cycle Datapath: lw fetch

slide-11
SLIDE 11

Chapter 7 <11>

STEP 2: Read source operands from RF

[Figure: Instr[25:21] drives register file address A1.]

Single-Cycle Datapath: lw Register Read

slide-12
SLIDE 12

Chapter 7 <12>

STEP 3: Sign-extend the immediate

[Figure: Instr[15:0] passes through the Sign Extend unit to produce SignImm.]

Single-Cycle Datapath: lw Immediate
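
A small sketch of the sign-extension step, assuming the 16-bit immediate is held as a Python integer. The function name is hypothetical; the result is returned as an unsigned 32-bit pattern, as the hardware would see it.

```python
def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit pattern by copying bit 15 upward."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # negative: fill the upper 16 bits with ones
        return imm16 | 0xFFFF0000
    return imm16                     # non-negative: upper 16 bits stay zero


print(hex(sign_extend16(0x0004)))
print(hex(sign_extend16(0xFFFC)))
```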

slide-13
SLIDE 13

Chapter 7 <13>

STEP 4: Compute the memory address

[Figure: the ALU adds SrcA (register file output RD1) and SrcB (SignImm) with ALUControl2:0 = 010 to produce ALUResult.]

Single-Cycle Datapath: lw address

slide-14
SLIDE 14

Chapter 7 <14>

STEP 5: Read data from memory and write it back to register file

[Figure: ALUResult drives the data memory address; ReadData is written to the register selected by Instr[20:16], with RegWrite = 1 and ALUControl2:0 = 010.]

Single-Cycle Datapath: lw Memory Read

slide-15
SLIDE 15

Chapter 7 <15>

STEP 6: Determine address of next instruction

[Figure: an adder computes PCPlus4 = PC + 4, which drives PC'. RegWrite = 1, ALUControl2:0 = 010.]

Single-Cycle Datapath: lw PC Increment

slide-16
SLIDE 16

Chapter 7 <16>

Write data in rt to memory

[Figure: Instr[20:16] drives register file address A2; RD2 supplies WriteData to the data memory WD input. MemWrite = 1, ALUControl2:0 = 010.]

Single-Cycle Datapath: sw

slide-17
SLIDE 17

Chapter 7 <17>

  • Read from rs and rt
  • Write ALUResult to register file
  • Write to rd (instead of rt)

[Figure: three multiplexers are added. ALUSrc selects SrcB (RD2 or SignImm), MemtoReg selects Result (ALUResult or ReadData), and RegDst selects WriteReg4:0 (Instr[20:16] or Instr[15:11]). For R-type: RegWrite = 1, RegDst = 1, ALUControl2:0 varies.]

Single-Cycle Datapath: R-Type

slide-18
SLIDE 18

Chapter 7 <18>

  • Determine whether values in rs and rt are equal
  • Calculate branch target address:

BTA = (sign-extended immediate << 2) + (PC+4)

[Figure: SignImm << 2 is added to PCPlus4 to form PCBranch; a multiplexer controlled by PCSrc (derived from Branch and Zero) selects PCPlus4 or PCBranch for PC'. For beq: ALUControl2:0 = 110, Branch = 1, RegDst = X, MemtoReg = X.]

Single-Cycle Datapath: beq
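
The branch target address formula above, worked as a short sketch. The function name and the example PC value are assumptions for illustration; the arithmetic follows BTA = (sign-extended immediate << 2) + (PC + 4).

```python
def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    imm16 &= 0xFFFF
    # Interpret the 16-bit field as a signed value.
    sign_imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return ((sign_imm << 2) + (pc + 4)) & 0xFFFFFFFF


# Hypothetical beq at address 0x1000 with an immediate of 3 (three instructions ahead).
print(hex(branch_target(0x1000, 3)))
# An immediate of -1 (0xFFFF) branches back to the beq itself.
print(hex(branch_target(0x1000, 0xFFFF)))
```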

slide-19
SLIDE 19

Chapter 7 <19>

[Figure: the complete single-cycle datapath with the Control Unit. Inputs: Op = Instr[31:26] and Funct = Instr[5:0]. Outputs: MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, RegWrite. PCSrc is derived from Branch and Zero.]

Single-Cycle Processor

slide-20
SLIDE 20

Chapter 7 <20>

[Figure: the Control Unit consists of a Main Decoder (input Opcode5:0; outputs MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, ALUOp1:0) and an ALU Decoder (inputs ALUOp1:0 and Funct5:0; output ALUControl2:0).]

Single-Cycle Control

slide-21
SLIDE 21

Chapter 7 <21>

[Figure: ALU symbol with N-bit inputs A and B, N-bit output Y, and 3-bit control input F.]

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT

Review: ALU
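
A behavioral sketch of the function table above for N = 32. This is a software model, not the gate-level design: the names are hypothetical, and SLT is modeled as a true signed comparison, which is an assumption (a hardware ALU that takes the sign bit of A - B can differ when the subtraction overflows).

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a: int, b: int, f: int) -> int:
    """Return Y for 32-bit inputs A, B and the 3-bit control F2:0."""
    a &= MASK32
    b &= MASK32
    not_b = ~b & MASK32
    if f == 0b000:
        return a & b
    if f == 0b001:
        return a | b
    if f == 0b010:
        return (a + b) & MASK32
    if f == 0b100:
        return a & not_b
    if f == 0b101:
        return a | not_b
    if f == 0b110:
        return (a - b) & MASK32
    if f == 0b111:
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError("F = 011 is not used")


print(alu(5, 3, 0b010), alu(5, 3, 0b110), alu(3, 5, 0b111))
```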

slide-22
SLIDE 22

Chapter 7 <22>

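[Figure: ALU implementation. F2 selects B or ~B and drives the adder carry-in; the adder produces sum S and Cout; F1:0 selects the output Y from the AND result, the OR result, S, and the zero-extended sign bit S[N-1].]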

Review: ALU

slide-23
SLIDE 23

Chapter 7 <23>

ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct
11         Not Used

ALUOp1:0   Funct          ALUControl2:0
00         X              010 (Add)
X1         X              110 (Subtract)
1X         100000 (add)   010 (Add)
1X         100010 (sub)   110 (Subtract)
1X         100100 (and)   000 (And)
1X         100101 (or)    001 (Or)
1X         101010 (slt)   111 (SLT)

Control Unit: ALU Decoder
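
The ALU decoder truth table can be sketched as a small function. The names are hypothetical; the rows are checked in the order the table lists them (00, then X1, then 1X), and raising an error for an unlisted funct value is a choice of this sketch rather than something the slides specify.

```python
# Funct field -> ALUControl2:0, used only when ALUOp1:0 = 1X.
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Map ALUOp1:0 and Funct5:0 to ALUControl2:0."""
    if alu_op == 0b00:
        return 0b010                      # add (lw, sw)
    if alu_op & 0b01:
        return 0b110                      # subtract (beq)
    if funct not in FUNCT_TO_ALUCONTROL:
        raise ValueError(f"unsupported funct {funct:06b}")
    return FUNCT_TO_ALUCONTROL[funct]     # look at Funct (R-type)


print(f"{alu_decoder(0b10, 0b100101):03b}")
```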

slide-24
SLIDE 24

Chapter 7 <24>

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000
lw           100011
sw           101011
beq          000100

[Figure: the complete single-cycle datapath with the Control Unit, for reference while filling in the table.]

Control Unit Main Decoder

slide-25
SLIDE 25

Chapter 7 <25>

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01

Control Unit: Main Decoder
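
A lookup-table sketch of the main decoder. The 1 and X entries and the ALUOp values follow the slide; the 0 entries did not survive in the extracted slide text and are filled in here on the assumption that every signal not shown as 1 or X is deasserted. The names and the use of None for a don't-care are illustrative.

```python
X = None  # don't care

SIGNALS = ("RegWrite", "RegDst", "ALUSrc", "Branch", "MemWrite", "MemtoReg", "ALUOp")

# Opcode -> control signals, in the column order of the table above.
MAIN_DECODER = {
    0b000000: (1, 1, 0, 0, 0, 0, 0b10),  # R-type
    0b100011: (1, 0, 1, 0, 0, 1, 0b00),  # lw
    0b101011: (0, X, 1, 0, 1, X, 0b00),  # sw
    0b000100: (0, X, 0, 1, 0, X, 0b01),  # beq
}


def main_decoder(opcode: int) -> dict:
    """Return the control signals for a 6-bit opcode as a name -> value mapping."""
    return dict(zip(SIGNALS, MAIN_DECODER[opcode]))


print(main_decoder(0b100011))
```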

slide-26
SLIDE 26

Chapter 7 <26>

[Figure: the single-cycle datapath executing or: ALUControl2:0 = 001, RegWrite = 1, RegDst = 1.]

Single-Cycle Datapath: or

slide-27
SLIDE 27

Chapter 7 <27>

[Figure: the complete single-cycle datapath with the Control Unit, unchanged.]

No change to datapath

Extended Functionality: addi

slide-28
SLIDE 28

Chapter 7 <28>

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000

Control Unit: addi

slide-29
SLIDE 29

Chapter 7 <29>

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000  1         0       1       0       0         0         00

Control Unit: addi

slide-30
SLIDE 30

Chapter 7 <30>

[Figure: the single-cycle datapath extended for j. Instr[25:0] << 2 forms bits 27:0 of PCJump and bits 31:28 come from PCPlus4; a multiplexer controlled by Jump selects PCJump for PC'.]

Extended Functionality: j
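
A sketch of the jump target computation the datapath labels describe (bits 25:0 shifted left by 2 into bits 27:0, bits 31:28 taken from the incremented PC). The function name and the example instruction word are assumptions for illustration.

```python
def jump_target(pc: int, instr: int) -> int:
    """PCJump = {PCPlus4[31:28], Instr[25:0] << 2}."""
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    addr26 = instr & 0x03FFFFFF          # Instr[25:0]
    return (pc_plus4 & 0xF0000000) | (addr26 << 2)


# Hypothetical j instruction: opcode 000010 with a 26-bit address field of 0x0100004.
print(hex(jump_target(0x00400000, 0x08100004)))
```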

slide-31
SLIDE 31

Chapter 7 <31>

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
j            000010

Control Unit: Main Decoder

slide-32
SLIDE 32

Chapter 7 <32>

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
j            000010  0         X       X       X       0         X         XX        1

Control Unit: Main Decoder

slide-33
SLIDE 33

Chapter 7 <33>

Program Execution Time
  = (# instructions)(cycles/instruction)(seconds/cycle)
  = # instructions × CPI × Tc

Review: Processor Performance

slide-34
SLIDE 34

Chapter 7 <34>

[Figure: the single-cycle datapath with the lw critical path highlighted: PC, instruction memory, register file read, ALU, data memory, result multiplexer, register file write. Control values for lw: RegWrite = 1, ALUSrc = 1, MemtoReg = 1, ALUControl2:0 = 010.]

Tc limited by critical path (lw)

Single-Cycle Performance

slide-35
SLIDE 35

Chapter 7 <35>

  • Single-cycle critical path:

Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup

  • Typically, limiting paths are:
      – memory, ALU, register file
      – Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup

Single-Cycle Performance

slide-36
SLIDE 36

Chapter 7 <36>

Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = ?

Single-Cycle Performance Example

slide-37
SLIDE 37

Chapter 7 <37>

Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 25 + 200 + 20] ps
   = 925 ps

Single-Cycle Performance Example
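
The same arithmetic as a short script, using the delays from the table above and the simplified critical-path expression. The dictionary and variable names are illustrative.

```python
# Element delays in picoseconds, copied from the table above.
DELAYS_PS = {
    "tpcq_PC": 30,
    "tsetup": 20,
    "tmux": 25,
    "tALU": 200,
    "tmem": 250,
    "tRFread": 150,
    "tRFsetup": 20,
}

d = DELAYS_PS
# Tc = tpcq_PC + 2*tmem + tRFread + tmux + tALU + tRFsetup
single_cycle_tc_ps = (
    d["tpcq_PC"] + 2 * d["tmem"] + d["tRFread"] + d["tmux"] + d["tALU"] + d["tRFsetup"]
)
print(single_cycle_tc_ps, "ps")
```

The two memory accesses account for 500 ps of the total, which is why memory dominates the single-cycle clock period.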

slide-38
SLIDE 38

Chapter 7 <38>

Program with 100 billion instructions:

Execution Time = # instructions × CPI × Tc
               = (100 × 10^9)(1)(925 × 10^-12 s)
               = 92.5 seconds

Single-Cycle Performance Example

slide-39
SLIDE 39

Chapter 7 <39>

  • Single-cycle:
      + simple
      - cycle time limited by longest instruction (lw)
      - 2 adders/ALUs & 2 memories
  • Multicycle:
      + higher clock speed
      + simpler instructions run faster
      + reuse expensive hardware on multiple cycles
      - sequencing overhead paid many times
  • Same design steps: datapath & control

Multicycle MIPS Processor

slide-40
SLIDE 40

Chapter 7 <40>

[Figure: the multicycle state elements: the PC register with enable (EN), a unified instruction/data memory (A, RD, WD, WE), and the register file.]

  • Replace instruction and data memories with a single unified memory (more realistic)

Multicycle State Elements

slide-41
SLIDE 41

Chapter 7 <41>

[Figure: the PC drives the memory address; the value read is stored in the instruction register (Instr), which is enabled by IRWrite.]

STEP 1: Fetch instruction

Multicycle Datapath: Instruction Fetch

slide-42
SLIDE 42

Chapter 7 <42>

[Figure: Instr[25:21] drives register file address A1; RD1 is stored in register A.]

Multicycle Datapath: lw Register Read

STEP 2a: Read source operands from RF

slide-43
SLIDE 43

Chapter 7 <43>

[Figure: Instr[15:0] passes through the Sign Extend unit to produce SignImm.]

Multicycle Datapath: lw Immediate

STEP 2b: Sign-extend the immediate

slide-44
SLIDE 44

Chapter 7 <44>

[Figure: the ALU adds SrcA (register A) and SrcB (SignImm); ALUResult is stored in register ALUOut.]

Multicycle Datapath: lw Address

STEP 3: Compute the memory address

slide-45
SLIDE 45

Chapter 7 <45>

[Figure: a multiplexer controlled by IorD selects the memory address Adr from the PC or ALUOut. With IorD = 1, the memory is read at ALUOut and the value is stored in the Data register.]

Multicycle Datapath: lw Memory Read

STEP 4: Read data from memory

slide-46
SLIDE 46

Chapter 7 <46>

[Figure: Data is written to the register selected by Instr[20:16] when RegWrite is asserted.]

Multicycle Datapath: lw Write Register

STEP 5: Write data back to register file

slide-47
SLIDE 47

Chapter 7 <47>

[Figure: multiplexers controlled by ALUSrcA and ALUSrcB1:0 (inputs 00, 01, 10, 11) select the ALU operands. With SrcA = PC and SrcB = 4, the ALU computes PC + 4, which is written to the PC when PCWrite is asserted.]

Multicycle Datapath: Increment PC

STEP 6: Increment PC

slide-48
SLIDE 48

Chapter 7 <48>

[Figure: Instr[20:16] drives register file address A2; RD2 is stored in register B, which drives the memory WD input. MemWrite enables the write.]

Write data in rt to memory

Multicycle Datapath: sw

slide-49
SLIDE 49

Chapter 7 <49>

[Figure: a multiplexer controlled by RegDst selects Instr[20:16] or Instr[15:11] for A3; a multiplexer controlled by MemtoReg selects ALUOut or Data for WD3.]

  • Read from rs and rt
  • Write ALUResult to register file
  • Write to rd (instead of rt)

Multicycle Datapath: R-Type

slide-50
SLIDE 50

Chapter 7 <50>

[Figure: SignImm << 2 is added as an ALUSrcB input; a multiplexer controlled by PCSrc selects ALUResult or ALUOut for PC'; the PC enable PCEn is derived from PCWrite, Branch, and Zero.]

  • rs == rt?
  • BTA = (sign-extended immediate << 2) + (PC+4)

Multicycle Datapath: beq

slide-51
SLIDE 51

Chapter 7 <51>

[Figure: the complete multicycle datapath with the Control Unit. Inputs: Op = Instr[31:26] and Funct = Instr[5:0].]

Multicycle Processor

slide-52
SLIDE 52

Chapter 7 <52>

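[Figure: the multicycle Control Unit. The Main Controller (FSM) takes Opcode5:0 and produces the multiplexer selects (MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA), the register enables (IRWrite, MemWrite, PCWrite, Branch, RegWrite), and ALUOp1:0. The ALU Decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0.]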

Multicycle Control

slide-53
SLIDE 53

Chapter 7 <53>

[Figure: the multicycle datapath during instruction fetch, with ALUSrcB1:0 = 01 and ALUControl2:0 = 010.]

Reset: go to S0: Fetch

Main Controller FSM: Fetch

slide-54
SLIDE 54

Chapter 7 <54>

[Figure: the multicycle datapath during instruction fetch, with ALUSrcB1:0 = 01 and ALUControl2:0 = 010.]

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite

Main Controller FSM: Fetch

slide-55
SLIDE 55

Chapter 7 <55>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode

[Figure: the multicycle datapath during decode; all multiplexer selects are don't-cares (X).]

Main Controller FSM: Decode

slide-56
SLIDE 56

Chapter 7 <56>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW: go to S2
S2: MemAdr

[Figure: the multicycle datapath during address computation, with ALUSrcA = 1, ALUSrcB1:0 = 10, and ALUControl2:0 = 010.]

Main Controller FSM: Address

slide-57
SLIDE 57

Chapter 7 <57>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW: go to S2
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

[Figure: the multicycle datapath during address computation, with ALUSrcA = 1, ALUSrcB1:0 = 10, and ALUControl2:0 = 010.]

Main Controller FSM: Address

slide-58
SLIDE 58

Chapter 7 <58>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW: go to S2
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite

Main Controller FSM: lw

slide-59
SLIDE 59

Chapter 7 <59>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW: go to S2
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite

Main Controller FSM: sw

slide-60
SLIDE 60

Chapter 7 <60>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    Op = LW or Op = SW: go to S2
    Op = R-type: go to S6
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite

Main Controller FSM: R-Type

slide-61
SLIDE 61

Chapter 7 <61>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW: go to S2
    Op = R-type: go to S6
    Op = BEQ: go to S8
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Main Controller FSM: beq

slide-62
SLIDE 62

Chapter 7 <62>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW: go to S2
    Op = R-type: go to S6
    Op = BEQ: go to S8
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Multicycle Controller FSM

slide-63
SLIDE 63

Chapter 7 <63>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW: go to S2
    Op = R-type: go to S6
    Op = BEQ: go to S8
    Op = ADDI: go to S9
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
S9: ADDI Execute
S10: ADDI Writeback

Extended Functionality: addi

slide-64
SLIDE 64

Chapter 7 <64>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW: go to S2
    Op = R-type: go to S6
    Op = BEQ: go to S8
    Op = ADDI: go to S9
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
S9: ADDI Execute
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback
    RegDst = 0, MemtoReg = 0, RegWrite

Main Controller FSM: addi

slide-65
SLIDE 65

Chapter 7 <65>

[Figure: the multicycle datapath extended for j. PCSrc widens to PCSrc1:0 and selects PC' from three inputs (00, 01, 10); the new input PCJump is formed from Instr[25:0] << 2 (bits 27:0) and PC bits 31:28.]

Extended Functionality: j

slide-66
SLIDE 66

Chapter 7 <66>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW: go to S2
    Op = R-type: go to S6
    Op = BEQ: go to S8
    Op = ADDI: go to S9
    Op = J: go to S11
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDI Execute
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback
    RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump

Main Controller FSM: j

slide-67
SLIDE 67

Chapter 7 <67>

S0: Fetch (entered on Reset)
    IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    Op = LW or Op = SW: go to S2
    Op = R-type: go to S6
    Op = BEQ: go to S8
    Op = ADDI: go to S9
    Op = J: go to S11
S2: MemAdr
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    Op = LW: go to S3
    Op = SW: go to S5
S3: MemRead
    IorD = 1
S4: Mem Writeback
    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite
    IorD = 1, MemWrite
S6: Execute
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDI Execute
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback
    RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump
    PCSrc = 10, PCWrite

Main Controller FSM: j
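
A sketch of the controller's next-state logic as lookup tables. The opcode-labeled transitions out of S1 and S2 follow the state diagram; the unlabeled arrows (S0 to S1, S3 to S4, S6 to S7, S9 to S10, and every final state back to S0) are not visible in the extracted text and are assumptions here. All names are illustrative.

```python
# Transitions out of S1 (Decode) and S2 (MemAdr) depend on the opcode.
DECODE_NEXT = {"lw": "S2", "sw": "S2", "rtype": "S6", "beq": "S8", "addi": "S9", "j": "S11"}
MEMADR_NEXT = {"lw": "S3", "sw": "S5"}

# Assumed unconditional transitions (arrows without labels in the diagram).
FIXED_NEXT = {
    "S0": "S1", "S3": "S4", "S4": "S0", "S5": "S0", "S6": "S7",
    "S7": "S0", "S8": "S0", "S9": "S10", "S10": "S0", "S11": "S0",
}


def next_state(state: str, op: str) -> str:
    """Return the state that follows `state` while executing instruction type `op`."""
    if state == "S1":
        return DECODE_NEXT[op]
    if state == "S2":
        return MEMADR_NEXT[op]
    return FIXED_NEXT[state]


def cycles_for(op: str) -> int:
    """Count the states visited from S0 until the FSM returns to S0."""
    state, count = "S0", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "S0":
            return count


for op in ("lw", "sw", "rtype", "beq", "addi", "j"):
    print(op, cycles_for(op))
```

Counting states per instruction this way gives the per-instruction cycle counts that feed the CPI calculation.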

slide-68
SLIDE 68

Chapter 7 <68>

  • Instructions take different numbers of cycles:
      – 3 cycles: beq, j
      – 4 cycles: R-Type, sw, addi
      – 5 cycles: lw
  • CPI is weighted average
  • SPECINT2000 benchmark:
      – 25% loads
      – 10% stores
      – 11% branches
      – 2% jumps
      – 52% R-type

Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12

Multicycle Processor Performance
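
The weighted average above as a short calculation. The instruction mix and cycle counts are the ones given on the slide; the dictionary layout and names are illustrative.

```python
# Instruction class -> (fraction of the instruction mix, cycles per instruction).
INSTRUCTION_MIX = {
    "loads": (0.25, 5),
    "stores": (0.10, 4),
    "branches": (0.11, 3),
    "jumps": (0.02, 3),
    "r_type": (0.52, 4),
}

average_cpi = sum(fraction * cycles for fraction, cycles in INSTRUCTION_MIX.values())
print(f"Average CPI = {average_cpi:.2f}")
```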

slide-69
SLIDE 69

Chapter 7 <69>

[Figure: the complete multicycle datapath with the Control Unit, with the critical paths through the ALU and through the memory highlighted.]

Multicycle critical path:

Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup

Multicycle Processor Performance

slide-70
SLIDE 70

Chapter 7 <70>

Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = ?

Multicycle Performance Example

slide-71
SLIDE 71

Chapter 7 <71>

Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup
   = tpcq_PC + tmux + tmem + tsetup
   = [30 + 25 + 250 + 20] ps
   = 325 ps

Multicycle Performance Example

slide-72
SLIDE 72

Chapter 7 <72>

Program with 100 billion instructions Execution Time = ?

Multicycle Performance Example

slide-73
SLIDE 73

Chapter 7 <73>

Program with 100 billion instructions

Execution Time = (# instructions) × CPI × Tc
               = (100 × 10^9)(4.12)(325 × 10^-12 s)
               = 133.9 seconds

This is slower than the single-cycle processor (92.5 seconds). Why?

Multicycle Performance Example

slide-74
SLIDE 74

Chapter 7 <74>

Program with 100 billion instructions

Execution Time = (# instructions) × CPI × Tc
               = (100 × 10^9)(4.12)(325 × 10^-12 s)
               = 133.9 seconds

This is slower than the single-cycle processor (92.5 seconds). Why?

      – Not all steps same length
      – Sequencing overhead for each step (tpcq + tsetup = 50 ps)

Multicycle Performance Example
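
A short script that reproduces the multicycle numbers and the comparison with the single-cycle design. Delays, CPI values, and instruction count come from the slides; the variable names are illustrative.

```python
# Delays in picoseconds from the table in this example.
tpcq, tsetup, tmux, tALU, tmem = 30, 20, 25, 200, 250

# Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup
multicycle_tc_ps = tpcq + tmux + max(tALU + tmux, tmem) + tsetup

instructions = 100e9
single_cycle_seconds = instructions * 1 * 925e-12
multicycle_seconds = instructions * 4.12 * multicycle_tc_ps * 1e-12

print(f"Tc = {multicycle_tc_ps} ps")
print(f"multicycle: {multicycle_seconds:.1f} s, single-cycle: {single_cycle_seconds:.1f} s")
print(f"ratio: {multicycle_seconds / single_cycle_seconds:.2f}x")
```

The clock period shrinks by a factor of about 2.8 (925 ps to 325 ps), but CPI grows by a factor of 4.12, so the multicycle design comes out slower on this mix.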

slide-75
SLIDE 75

Chapter 7 <75>

[Figure: the complete single-cycle datapath with the Control Unit, including the jump extension (PCJump selected by Jump).]

Review: Single-Cycle Processor

slide-76
SLIDE 76

Chapter 7 <76>

[Figure: the complete multicycle datapath with the Control Unit, including the jump extension (PCJump selected by PCSrc).]

Review: Multicycle Processor

slide-77
SLIDE 77

Chapter 7 <77>

  • Temporal parallelism
  • Divide single-cycle processor into 5 stages:
      – Fetch
      – Decode
      – Execute
      – Memory
      – Writeback
  • Add pipeline registers between stages

Pipelined MIPS Processor

slide-78
SLIDE 78

Chapter 7 <78>

[Figure: timing diagram, time axis 0 to 1900 ps. Each instruction goes through Fetch Instruction, Decode / Read Reg, Execute ALU, Memory Read / Write, and Write Reg. Top panel (Single-Cycle): an instruction finishes all five steps before the next one starts. Bottom panel (Pipelined): a new instruction starts each stage time, so the steps of successive instructions overlap.]

Single-Cycle vs. Pipelined

slide-79
SLIDE 79

Chapter 7 <79>

[Figure: pipeline diagram, time in cycles 1 to 10. Each instruction passes through IM, RF (read), ALU, DM, and RF (write), starting one cycle after the previous instruction.]

lw  $s2, 40($0)
add $s3, $t1, $t2
sub $s4, $s1, $s5
and $s5, $t5, $t6
sw  $s6, 20($s1)
or  $s7, $t3, $t4

Pipelined Processor Abstraction

slide-80
SLIDE 80

Chapter 7 <80>

[Figure: the single-cycle datapath (top) and the pipelined datapath (bottom), divided into Fetch, Decode, Execute, Memory, and Writeback stages by pipeline registers. Pipelined signals carry a stage suffix, for example PCF, InstrD, SrcAE, SrcBE, WriteRegE4:0, ALUOutM, ReadDataW, ResultW.]

Single-Cycle & Pipelined Datapath

slide-81
SLIDE 81

Chapter 7 <81>

[Figure: the pipelined datapath (Fetch, Decode, Execute, Memory, Writeback) with WriteReg carried through the pipeline registers as WriteRegE4:0, WriteRegM4:0, and WriteRegW4:0 before it drives register file address A3.]

WriteReg must arrive at same time as Result

Corrected Pipelined Datapath

slide-82
SLIDE 82

Chapter 7 <82>

[Figure: the pipelined datapath with the Control Unit in the Decode stage (inputs Op = Instr[31:26] and Funct = Instr[5:0]). Control signals are carried through the pipeline registers to the stage that uses them: RegDstE, ALUSrcE, and ALUControlE2:0 in Execute; MemWriteM and BranchM in Memory; RegWriteW and MemtoRegW in Writeback.]

  • Same control unit as single-cycle processor
  • Control delayed to proper pipeline stage

Pipelined Processor Control

slide-83
SLIDE 83

Chapter 7 <83>

  • When an instruction depends on a result from an instruction that hasn’t completed
  • Types:
      – Data hazard: register value not yet written back to register file
      – Control hazard: next instruction not decided yet (caused by branches)

Pipeline Hazards

slide-84
SLIDE 84

Chapter 7 <84>

[Figure: pipeline diagram, time in cycles 1 to 8. add writes $s0; the following and, or, and sub instructions all read $s0.]

add $s0, $s2, $s3
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

Data Hazard

slide-85
SLIDE 85

Chapter 7 <85>

  • Insert nops in code at compile time
  • Rearrange code at compile time
  • Forward data at run time
  • Stall the processor at run time

Handling Data Hazards

slide-86
SLIDE 86

Chapter 7 <86>

add $s0, $s2, $s3
nop
nop
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1-10: two nops delay the and instruction until add has written $s0]

  • Insert enough nops for result to be ready
  • Or move independent useful instructions forward

Compile-Time Hazard Elimination
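
The nop-insertion idea can be sketched in a few lines of Python. This is only an illustration, not a real assembler pass: the tuple format `(op, dest, sources)`, the function name `insert_nops`, and the `gap` parameter are all assumptions made for the example. A gap of 2 matches the diagram above, where two nops are enough for the result to be ready.

```python
# Hypothetical instruction format: (op, dest, sources). Not a real assembler API.
NOP = ("nop", None, ())

def insert_nops(program, gap=2):
    """Pad a program so each consumer sits at least `gap` slots after the
    instruction that produces one of its source registers."""
    scheduled = []
    for op, dest, sources in program:
        needed = 0
        # Inspect the last `gap` emitted slots; back == 0 is the slot
        # immediately before the instruction being placed.
        for back, (_, prev_dest, _) in enumerate(reversed(scheduled[-gap:])):
            if prev_dest is not None and prev_dest != "$0" and prev_dest in sources:
                needed = max(needed, gap - back)
        scheduled.extend([NOP] * needed)
        scheduled.append((op, dest, sources))
    return scheduled

program = [
    ("add", "$s0", ("$s2", "$s3")),
    ("and", "$t0", ("$s0", "$s1")),
    ("or",  "$t1", ("$s4", "$s0")),
    ("sub", "$t2", ("$s0", "$s5")),
]
padded_ops = [op for op, _, _ in insert_nops(program)]
print(padded_ops)
```

Only the and instruction needs padding here; once it has been pushed back, or and sub are already far enough from add.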

slide-87
SLIDE 87

Chapter 7 <87>

add $s0, $s2, $s3
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1-8: the result of add is forwarded to the Execute stage of the following instructions that need $s0]

Data Forwarding

slide-88
SLIDE 88

Chapter 7 <88>

[Figure: pipelined datapath with forwarding; the Hazard Unit takes RsE, RtE, WriteRegM, WriteRegW, RegWriteM, and RegWriteW and drives ForwardAE and ForwardBE, which control three-input multiplexers (00, 01, 10) in front of SrcAE and SrcBE]

Data Forwarding

slide-89
SLIDE 89

Chapter 7 <89>

  • Forward to Execute stage from either:
    – Memory stage or
    – Writeback stage
  • Forwarding logic for ForwardAE:

    if      ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then ForwardAE = 10
    else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then ForwardAE = 01
    else                                                           ForwardAE = 00

  • Forwarding logic for ForwardBE is the same, but with rsE replaced by rtE

Data Forwarding
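
As a quick check of the priority order in the forwarding logic, here is a minimal Python sketch. The function name `forward_select` and its argument names are illustrative assumptions; only the conditions come from the slide.

```python
def forward_select(rs_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Return the 2-bit ForwardAE select for source register number rs_e.

    The Memory stage is checked first because it holds the more recent result.
    """
    if rs_e != 0 and rs_e == write_reg_m and reg_write_m:
        return 0b10  # forward ALUOutM from the Memory stage
    if rs_e != 0 and rs_e == write_reg_w and reg_write_w:
        return 0b01  # forward ResultW from the Writeback stage
    return 0b00      # no hazard: use the register file value

# Register 16 is being written by the instructions in both later stages:
print(forward_select(16, 16, True, 16, True))
```

The same function gives ForwardBE when it is called with rtE instead of rsE.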

slide-90
SLIDE 90

Chapter 7 <90>

lw  $s0, 40($0)
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1-8: lw has its data only at the end of the Memory stage, too late to forward to the Execute stage of and. Trouble!]

Stalling

slide-91
SLIDE 91

Chapter 7 <91>

lw  $s0, 40($0)
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1-9: a one-cycle stall makes and repeat its Decode stage and or repeat its Fetch stage, after which the lw result can be forwarded]

Stalling

slide-92
SLIDE 92

Chapter 7 <92>

[Figure: pipelined datapath with stalling hardware; the Hazard Unit additionally takes RsD, RtD, and MemtoRegE and drives StallF and StallD (register enable inputs, EN) and FlushE (register clear input, CLR)]

Stalling Hardware

slide-93
SLIDE 93

Chapter 7 <93>

lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE
StallF = StallD = FlushE = lwstall

Stalling Logic
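
The stall condition translates almost directly into code. The sketch below assumes register numbers as plain integers and a boolean MemtoRegE; the names `lw_stall` and `hazard_outputs` are made up for the example.

```python
def lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """True when the instruction in Decode reads the register that a load
    in Execute (MemtoRegE asserted) is about to write."""
    return bool((rs_d == rt_e or rt_d == rt_e) and mem_to_reg_e)

def hazard_outputs(rs_d, rt_d, rt_e, mem_to_reg_e):
    """StallF, StallD, and FlushE all follow lwstall."""
    stall = lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e)
    return {"StallF": stall, "StallD": stall, "FlushE": stall}

# lw $s0 (register 16) in Execute, and $t0, $s0, $s1 in Decode:
print(hazard_outputs(rs_d=16, rt_d=17, rt_e=16, mem_to_reg_e=True))
```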

slide-94
SLIDE 94

Chapter 7 <94>

  • beq:
    – Branch not determined until the 4th stage of the pipeline
    – Instructions after the branch are fetched before the branch occurs
    – These instructions must be flushed if the branch is taken
  • Branch misprediction penalty:
    – Number of instructions flushed when the branch is taken
    – May be reduced by determining the branch earlier

Control Hazards

slide-95
SLIDE 95

Chapter 7 <95>

[Figure: pipelined datapath with forwarding and stalling hardware; the branch decision PCSrcM is made in the Memory stage from BranchM and ZeroM]

Control Hazards: Original Pipeline

slide-96
SLIDE 96

Chapter 7 <96>

20  beq $t1, $t2, 40
24  and $t0, $s0, $s1
28  or  $t1, $s4, $s0
2C  sub $t2, $s0, $s5
30  ...
    ...
64  slt $t3, $s2, $s3

[Figure: pipeline timing diagram, cycles 1-9: and, or, and sub are fetched before the branch is resolved ("Flush these instructions"); slt at the branch target is fetched afterwards]

Control Hazards

slide-97
SLIDE 97

Chapter 7 <97>

[Figure: pipelined datapath with early branch resolution; an equality comparator (=) in the Decode stage produces EqualD, and PCBranchD and PCSrcD are generated in Decode; a second pipeline register gains a CLR input]

Introduced another data hazard in Decode stage

Early Branch Resolution

slide-98
SLIDE 98

Chapter 7 <98>

20  beq $t1, $t2, 40
24  and $t0, $s0, $s1
28  or  $t1, $s4, $s0
2C  sub $t2, $s0, $s5
30  ...
    ...
64  slt $t3, $s2, $s3

[Figure: pipeline timing diagram, cycles 1-9: with the branch resolved in Decode, only and has been fetched ("Flush this instruction") before slt at the branch target is fetched]

Early Branch Resolution

slide-99
SLIDE 99

Chapter 7 <99>

[Figure: pipelined datapath handling data and control hazards; the Hazard Unit additionally takes BranchD and RegWriteE and drives ForwardAD and ForwardBD, which control multiplexers in front of the Decode-stage equality comparator]

Handling Data & Control Hazards

slide-100
SLIDE 100

Chapter 7 <100>

  • Forwarding logic:

    ForwardAD = (rsD != 0) AND (rsD == WriteRegM) AND RegWriteM
    ForwardBD = (rtD != 0) AND (rtD == WriteRegM) AND RegWriteM

  • Stalling logic:

    branchstall = BranchD AND RegWriteE AND ((WriteRegE == rsD) OR (WriteRegE == rtD))
                  OR
                  BranchD AND MemtoRegM AND ((WriteRegM == rsD) OR (WriteRegM == rtD))

    StallF = StallD = FlushE = lwstall OR branchstall

Control Forwarding & Stalling Logic
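
A small Python sketch of the Decode-stage forwarding and branch stall conditions follows. The function and argument names are illustrative assumptions; the conditions follow the logic on the slide, reading BranchD as gating both the Execute-stage and the Memory-stage case.

```python
def forward_ad(rs_d, write_reg_m, reg_write_m):
    """ForwardAD: forward the Memory-stage result to the Decode-stage comparator."""
    return bool(rs_d != 0 and rs_d == write_reg_m and reg_write_m)

def branch_stall(branch_d, rs_d, rt_d,
                 write_reg_e, reg_write_e,
                 write_reg_m, mem_to_reg_m):
    """Stall a branch in Decode while a source operand is still being produced."""
    result_in_execute = reg_write_e and write_reg_e in (rs_d, rt_d)
    load_in_memory = mem_to_reg_m and write_reg_m in (rs_d, rt_d)
    return bool(branch_d and (result_in_execute or load_in_memory))

# beq $t1, $t2 (registers 9 and 10) in Decode while an instruction in
# Execute is about to write register 9:
print(branch_stall(True, 9, 10, write_reg_e=9, reg_write_e=True,
                   write_reg_m=0, mem_to_reg_m=False))
```

ForwardBD is the same check with rtD in place of rsD.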

slide-101
SLIDE 101

Chapter 7 <101>

  • Guess whether branch will be taken
    – Backward branches are usually taken (loops)
    – Consider history to improve guess
  • Good prediction reduces fraction of branches requiring a flush

Branch Prediction

slide-102
SLIDE 102

Chapter 7 <102>

  • SPECINT2000 benchmark:
    – 25% loads
    – 10% stores
    – 11% branches
    – 2% jumps
    – 52% R-type
  • Suppose:
    – 40% of loads used by next instruction
    – 25% of branches mispredicted
    – All jumps flush next instruction
  • What is the average CPI?

Pipelined Performance Example

slide-103
SLIDE 103

Chapter 7 <103>

  • SPECINT2000 benchmark:
    – 25% loads
    – 10% stores
    – 11% branches
    – 2% jumps
    – 52% R-type
  • Suppose:
    – 40% of loads used by next instruction
    – 25% of branches mispredicted
    – All jumps flush next instruction
  • What is the average CPI?
    – Load/Branch CPI = 1 when no stalling, 2 when stalling
    – CPI_lw = 1(0.6) + 2(0.4) = 1.4
    – CPI_beq = 1(0.75) + 2(0.25) = 1.25

    Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.15

Pipelined Performance Example
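
The weighted average can be reproduced with a few lines of Python. The dictionary keys are arbitrary labels chosen for this sketch; the numbers are the ones given on the slide.

```python
# Instruction mix from the SPECINT2000 figures on the slide.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}

# Per-class CPI: loads and branches cost 1 cycle normally and 2 when
# they stall or flush; every jump costs 2.
cpi = {
    "load": 1 * 0.60 + 2 * 0.40,    # 40% of loads are used by the next instruction
    "store": 1,
    "branch": 1 * 0.75 + 2 * 0.25,  # 25% of branches are mispredicted
    "jump": 2,
    "rtype": 1,
}

average_cpi = sum(mix[kind] * cpi[kind] for kind in mix)
print(f"{average_cpi:.4f}")  # 1.1475, which the slide rounds to 1.15
```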

slide-104
SLIDE 104

Chapter 7 <104>

  • Pipelined processor critical path:

$$
T_c = \max \begin{cases}
t_{pcq} + t_{mem} + t_{setup} \\
2(t_{RFread} + t_{mux} + t_{eq} + t_{AND} + t_{mux} + t_{setup}) \\
t_{pcq} + t_{mux} + t_{mux} + t_{ALU} + t_{setup} \\
t_{pcq} + t_{memwrite} + t_{setup} \\
2(t_{pcq} + t_{mux} + t_{RFwrite})
\end{cases}
$$

Pipelined Performance

slide-105
SLIDE 105

Chapter 7 <105>

Element               Parameter    Delay (ps)
Register clock-to-Q   tpcq_PC      30
Register setup        tsetup       20
Multiplexer           tmux         25
ALU                   tALU         200
Memory read           tmem         250
Register file read    tRFread      150
Register file setup   tRFsetup     20
Equality comparator   teq          40
AND gate              tAND         15
Memory write          tmemwrite    220
Register file write   tRFwrite     100

$$T_c = 2(t_{RFread} + t_{mux} + t_{eq} + t_{AND} + t_{mux} + t_{setup}) = 2[150 + 25 + 40 + 15 + 25 + 20]\ \text{ps} = 550\ \text{ps}$$

Pipelined Performance Example
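
To see why the second term of the critical-path expression wins, all five candidate paths can be evaluated with the delays from the table. In this sketch the stage labels attached to the five terms (Fetch through Writeback, in the order the terms are listed) are an assumption, and the dictionary keys are made-up names.

```python
# Delays in picoseconds, taken from the table above.
d = {
    "t_pcq": 30, "t_setup": 20, "t_mux": 25, "t_alu": 200, "t_mem": 250,
    "t_rf_read": 150, "t_eq": 40, "t_and": 15,
    "t_mem_write": 220, "t_rf_write": 100,
}

# The five terms of the max{} expression, labeled by assumed stage.
stage_ps = {
    "Fetch": d["t_pcq"] + d["t_mem"] + d["t_setup"],
    "Decode": 2 * (d["t_rf_read"] + d["t_mux"] + d["t_eq"]
                   + d["t_and"] + d["t_mux"] + d["t_setup"]),
    "Execute": d["t_pcq"] + d["t_mux"] + d["t_mux"] + d["t_alu"] + d["t_setup"],
    "Memory": d["t_pcq"] + d["t_mem_write"] + d["t_setup"],
    "Writeback": 2 * (d["t_pcq"] + d["t_mux"] + d["t_rf_write"]),
}

critical_stage = max(stage_ps, key=stage_ps.get)
t_c_ps = stage_ps[critical_stage]
print(stage_ps)
print(critical_stage, t_c_ps)
```

The other four terms come out at 300, 300, 270, and 310 ps, so the 550 ps term sets the cycle time.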

slide-106
SLIDE 106

Chapter 7 <106>

Program with 100 billion instructions

$$\text{Execution Time} = (\#\ \text{instructions}) \times \text{CPI} \times T_c = (100 \times 10^{9})(1.15)(550 \times 10^{-12}) = 63\ \text{seconds}$$

Pipelined Performance Example

slide-107
SLIDE 107

Chapter 7 <107>

Processor      Execution Time (seconds)   Speedup (single-cycle as baseline)
Single-cycle   92.5                       1
Multicycle     133                        0.70
Pipelined      63                         1.47

Processor Performance Comparison
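
The pipelined execution time and the speedup column can be recomputed as a sanity check. This sketch takes the single-cycle and multicycle times from the table as given, and uses the rounded pipelined value of 63 s for the speedup, as the table does; the variable names are arbitrary.

```python
instructions = 100e9   # 100 billion instructions
cpi = 1.15             # average CPI of the pipelined processor
t_c = 550e-12          # cycle time in seconds (550 ps)

pipelined_time = instructions * cpi * t_c   # about 63.25 s, shown as 63 s
print(f"pipelined: {pipelined_time:.2f} s")

# Execution times in seconds as listed in the comparison table.
times = {"Single-cycle": 92.5, "Multicycle": 133.0, "Pipelined": 63.0}
speedup = {name: round(times["Single-cycle"] / t, 2) for name, t in times.items()}
print(speedup)
```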

slide-108
SLIDE 108

Chapter 7 <108>

  • Unscheduled function call to exception handler
  • Caused by:
    – Hardware, also called an interrupt, e.g., keyboard
    – Software, also called traps, e.g., undefined instruction
  • When exception occurs, the processor:
    – Records cause of exception (Cause register)
    – Jumps to exception handler (0x80000180)
    – Returns to program (EPC register)

Review: Exceptions

slide-109
SLIDE 109

Chapter 7 <109>

Example Exception

slide-110
SLIDE 110

Chapter 7 <110>

  • Not part of register file
    – Cause
      • Records cause of exception
      • Coprocessor 0 register 13
    – EPC (Exception PC)
      • Records PC where exception occurred
      • Coprocessor 0 register 14
  • Move from Coprocessor 0
    – mfc0 $t0, Cause
    – Moves contents of Cause into $t0

    mfc0 encoding:

    Bits     31:26    25:21   20:16     15:11        10:0
    Field    010000   00000   $t0 (8)   Cause (13)   00000000000

Exception Registers
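
The bit fields shown for mfc0 can be packed into a 32-bit word as a check on the encoding. The helper name `encode_mfc0` is hypothetical; the field positions and values are the ones on the slide.

```python
def encode_mfc0(rt, rd):
    """Assemble `mfc0 rt, rd` from the fields on the slide:
    31:26 = 010000, 25:21 = 00000, 20:16 = rt, 15:11 = rd, 10:0 = 0."""
    return (0b010000 << 26) | (0b00000 << 21) | ((rt & 0x1F) << 16) | ((rd & 0x1F) << 11)

# mfc0 $t0, Cause  ->  rt = 8 ($t0), rd = 13 (Cause)
word = encode_mfc0(rt=8, rd=13)
print(f"0x{word:08X}")  # 0x40086800
```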

slide-111
SLIDE 111

Chapter 7 <111>

Exception                  Cause
Hardware Interrupt         0x00000000
System Call                0x00000020
Breakpoint / Divide by 0   0x00000024
Undefined Instruction      0x00000028
Arithmetic Overflow        0x00000030

Extend multicycle MIPS processor to handle last two types of exceptions

Exception Causes
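
What the processor records on an exception can be modeled very roughly in Python. This is a behavioral sketch only, not the hardware: `take_exception` and the dictionary it returns are invented for illustration, and the value saved in EPC is simplified to the PC passed in.

```python
# Cause codes for the two exceptions the multicycle processor is extended to handle.
CAUSE_CODES = {
    "undefined_instruction": 0x00000028,
    "arithmetic_overflow": 0x00000030,
}
EXCEPTION_HANDLER = 0x80000180

def take_exception(pc, kind):
    """Record the cause, save the PC in EPC, and jump to the handler."""
    return {
        "Cause": CAUSE_CODES[kind],   # written when CauseWrite is asserted
        "EPC": pc,                    # written when EPCWrite is asserted
        "PC": EXCEPTION_HANDLER,      # next instruction comes from the handler
    }

state = take_exception(pc=0x00400020, kind="arithmetic_overflow")
print({name: hex(value) for name, value in state.items()})
```

The handler can later read these values with mfc0 to decide what to do and where to return.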

slide-112
SLIDE 112

Chapter 7 <112>

[Figure: multicycle datapath extended with EPC and Cause registers, enabled by EPCWrite and CauseWrite; IntCause selects between the cause codes 0x30 and 0x28; the PCSrc1:0 multiplexer gains the exception handler address 0x8000 0180 as an input; the ALU provides an Overflow output]

Exception Hardware: EPC & Cause

slide-113
SLIDE 113

Chapter 7 <113>

State                  Outputs                                                       Next state
S0: Fetch              IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00,              S1
                       PCSrc = 00, IRWrite, PCWrite
S1: Decode             ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00                         S2 (Op = LW or Op = SW), S6 (Op = R-type),
                                                                                     S8 (Op = BEQ), S9 (Op = ADDI), S11 (Op = J),
                                                                                     S14 (Op = mfc0), S12 (Op = others)
S2: MemAdr             ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00                         S3 (Op = LW), S5 (Op = SW)
S3: MemRead            IorD = 1                                                      S4
S4: Mem Writeback      RegDst = 0, MemtoReg = 01, RegWrite                           S0
S5: MemWrite           IorD = 1, MemWrite                                            S0
S6: Execute            ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10                         S7 (no Overflow), S13 (Overflow)
S7: ALU Writeback      RegDst = 1, MemtoReg = 00, RegWrite                           S0
S8: Branch             ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch     S0
S9: ADDI Execute       ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00                         S10
S10: ADDI Writeback    RegDst = 0, MemtoReg = 00, RegWrite                           S0
S11: Jump              PCSrc = 10, PCWrite                                           S0
S12: Undefined         PCSrc = 11, PCWrite, IntCause = 1, CauseWrite, EPCWrite       S0
S13: Overflow          PCSrc = 11, PCWrite, IntCause = 0, CauseWrite, EPCWrite       S0
S14: MFC0              RegDst = 0, MemtoReg = 10, RegWrite                           S0

Reset enters S0.

Control FSM with Exceptions

slide-114
SLIDE 114

Chapter 7 <114>

[Figure: multicycle datapath with mfc0 support; Instr 15:11 selects a Coprocessor 0 register (01101 = Cause, 01110 = EPC), and the selected value C0 feeds input 10 of the widened MemtoReg1:0 multiplexer]

Exception Hardware: mfc0

slide-115
SLIDE 115

Chapter 7 <115>

  • Deep Pipelining
  • Branch Prediction
  • Superscalar Processors
  • Out of Order Processors
  • Register Renaming
  • SIMD
  • Multithreading
  • Multiprocessors

Advanced Microarchitecture

slide-116
SLIDE 116

Chapter 7 <116>

  • 10-20 stages typical
  • Number of stages limited by:
    – Pipeline hazards
    – Sequencing overhead
    – Power
    – Cost

Deep Pipelining

slide-117
SLIDE 117

Chapter 7 <117>

  • Ideal pipelined processor: CPI = 1
  • Branch misprediction increases CPI
  • Static branch prediction:
    – Check direction of branch (forward or backward)
    – If backward, predict taken
    – Else, predict not taken
  • Dynamic branch prediction:
    – Keep history of last (several hundred) branches in branch target buffer, record:
      • Branch destination
      • Whether branch was taken

Branch Prediction

slide-118
SLIDE 118

Chapter 7 <118>

      add  $s1, $0, $0      # sum = 0
      add  $s0, $0, $0      # i = 0
      addi $t0, $0, 10      # $t0 = 10
for:
      beq  $s0, $t0, done   # if i == 10, branch
      add  $s1, $s1, $s0    # sum = sum + i
      addi $s0, $s0, 1      # increment i
      j    for
done:

Branch Prediction Example

slide-119
SLIDE 119

Chapter 7 <119>

  • Remembers whether branch was taken the last time and does the same thing
  • Mispredicts first and last branch of loop

1-Bit Branch Predictor

slide-120
SLIDE 120

Chapter 7 <120>

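Only mispredicts last branch of loop

[Figure: four-state diagram with states strongly taken (predict taken), weakly taken (predict taken), weakly not taken (predict not taken), and strongly not taken (predict not taken); a taken branch moves one state toward strongly taken, a not-taken branch moves one state toward strongly not taken]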

2-Bit Branch Predictor
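
The difference between the two predictors can be seen by replaying the beq outcomes of the example loop. The sketch below models both as saturating counters, which is an assumption of this example (a 1-bit counter simply remembers the last outcome). The loop is run twice so the second pass starts with the history left by the first, and the initial counter value of 0 (not taken) is also an assumption.

```python
def count_mispredictions(outcomes, bits, counter=0):
    """Replay branch outcomes through an n-bit saturating counter.

    Predict taken when the counter is in its upper half.
    Returns (mispredictions, final counter value).
    """
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    misses = 0
    for taken in outcomes:
        if (counter >= threshold) != taken:
            misses += 1
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return misses, counter

# beq at `for:` falls through for i = 0..9 and is taken once when i == 10.
loop = [False] * 10 + [True]

for bits in (1, 2):
    _, history = count_mispredictions(loop, bits)               # first pass
    misses, _ = count_mispredictions(loop, bits, history)       # second pass
    print(f"{bits}-bit predictor: {misses} misprediction(s) on the second pass")
```

On the second pass the 1-bit predictor misses the first and the last branch, while the 2-bit predictor misses only the last one.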

slide-121
SLIDE 121

Chapter 7 <121>

  • Multiple copies of datapath execute multiple instructions at once
  • Dependencies make it tricky to issue multiple instructions at once

[Figure: superscalar datapath with instruction memory, a register file with six address ports (A1-A6), multiple ALUs, and a data memory with two ports (A1, A2)]

Superscalar

slide-122
SLIDE 122

Chapter 7 <122>

lw  $t0, 40($s0)
add $t1, $s1, $s2
sub $t2, $s1, $s3
and $t3, $s3, $s4
or  $t4, $s1, $s5
sw  $s5, 80($s0)

Ideal IPC: 2
Actual IPC: 2

[Figure: pipeline timing diagram, cycles 1-8: the instructions are independent, so two issue per cycle (lw with add, sub with and, or with sw)]

Superscalar Example

slide-123
SLIDE 123

Chapter 7 <123>

lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

Ideal IPC: 2
Actual IPC: 6/5 = 1.2

[Figure: pipeline timing diagram, cycles 1-9: dependencies between the instructions force a stall, so the six instructions issue over five cycles]

Superscalar with Dependencies

slide-124
SLIDE 124

Chapter 7 <124>

  • Looks ahead across multiple instructions
  • Issues as many instructions as possible at once
  • Issues instructions out of order (as long as no dependencies)
  • Dependencies:
    – RAW (read after write): one instruction writes, later instruction reads a register
    – WAR (write after read): one instruction reads, later instruction writes a register
    – WAW (write after write): one instruction writes, later instruction writes a register

Out of Order Processor
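
The three dependency types can be told apart by comparing the destination and source registers of two instructions. The sketch below is illustrative only: the `(dest, sources)` tuple format and the name `classify` are assumptions, and the example instructions are taken from the superscalar code used in this chapter.

```python
def classify(first, second):
    """Return the dependencies of `second` on `first`, where each instruction
    is a (dest, sources) pair and `first` comes earlier in program order."""
    dest1, sources1 = first
    dest2, sources2 = second
    kinds = []
    if dest1 is not None and dest1 in sources2:
        kinds.append("RAW")   # second reads what first writes
    if dest2 is not None and dest2 in sources1:
        kinds.append("WAR")   # second writes what first reads
    if dest1 is not None and dest1 == dest2:
        kinds.append("WAW")   # both write the same register
    return kinds

lw_i  = ("$t0", ("$s0",))          # lw  $t0, 40($s0)
add_i = ("$t1", ("$t0", "$s1"))    # add $t1, $t0, $s1
sub_i = ("$t0", ("$s2", "$s3"))    # sub $t0, $s2, $s3

print(classify(lw_i, add_i))   # ['RAW']
print(classify(add_i, sub_i))  # ['WAR']
print(classify(lw_i, sub_i))   # ['WAW']
```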

slide-125
SLIDE 125

Chapter 7 <125>

  • Instruction level parallelism (ILP): number of instructions that can be issued simultaneously (average < 3)
  • Scoreboard: table that keeps track of:
    – Instructions waiting to issue
    – Available functional units
    – Dependencies

Out of Order Processor

slide-126
SLIDE 126

Chapter 7 <126>

lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

Ideal IPC: 2
Actual IPC: 6/4 = 1.5

[Figure: out-of-order pipeline timing diagram, cycles 1-8: lw and or issue first, then sw, then add and sub, then and; annotations: two cycle latency between load and use of $t0 (RAW), WAR, and RAW dependencies between the instructions]

Out of Order Processor Example

slide-127
SLIDE 127

Chapter 7 <127>

lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

Ideal IPC: 2
Actual IPC: 6/3 = 2

With $t0 renamed to $r0 in sub and in and:

lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $r0, $s2, $s3
and $t2, $s4, $r0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

[Figure: out-of-order pipeline timing diagram, cycles 1-7: lw and sub issue first, then and and or, then add and sw; remaining dependencies: 2-cycle RAW on $t0, RAW on $r0, RAW on $t3]

Register Renaming

slide-128
SLIDE 128

Chapter 7 <128>

  • Single Instruction Multiple Data (SIMD)
    – Single instruction acts on multiple pieces of data at once
    – Common application: graphics
    – Perform short arithmetic operations (also called packed arithmetic)
  • For example, add four 8-bit elements:

    padd8 $s2, $s0, $s1

    Bit position   31:24     23:16     15:8      7:0
    $s0            a3        a2        a1        a0
    $s1            b3        b2        b1        b0
    $s2            a3 + b3   a2 + b2   a1 + b1   a0 + b0

SIMD
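
The packed add can be emulated on ordinary integers to show that each 8-bit lane is added on its own. This is a sketch under one assumption the slide does not state: each lane wraps around modulo 256, with no carry into the neighboring lane. The Python function `padd8` merely mirrors the instruction name.

```python
def padd8(a, b):
    """Add four packed 8-bit elements of two 32-bit words, lane by lane.

    Assumes wraparound (modulo 256) within each lane.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift   # drop any carry out of the lane
    return result

print(f"0x{padd8(0x01020304, 0x10203040):08X}")  # 0x11223344
print(f"0x{padd8(0x000000FF, 0x00000001):08X}")  # 0x00000000, no carry into lane 1
```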

slide-129
SLIDE 129

Chapter 7 <129>

  • Multithreading

– Word processor: thread for typing, spell checking, printing

  • Multiprocessors

– Multiple processors (cores) on a single chip

Advanced Architecture Techniques

slide-130
SLIDE 130

Chapter 7 <130>

  • Process: program running on a computer

– Multiple processes can run at once: e.g., surfing Web, playing music, writing a paper

  • Thread: part of a program

– Each process has multiple threads: e.g., a word processor may have threads for typing, spell checking, printing

Threading: Definitions

slide-131
SLIDE 131

Chapter 7 <131>

  • One thread runs at once
  • When one thread stalls (for example, waiting for memory):
    – Architectural state of that thread is stored
    – Architectural state of waiting thread is loaded into processor and it runs
    – Called context switching
  • Appears to user like all threads are running simultaneously

Threads in Conventional Processor

slide-132
SLIDE 132

Chapter 7 <132>

  • Multiple copies of architectural state
  • Multiple threads active at once:
    – When one thread stalls, another runs immediately
    – If one thread can’t keep all execution units busy, another thread can use them
  • Does not increase instruction-level parallelism (ILP) of single thread, but increases throughput
  • Intel calls this “hyperthreading”

Multithreading

slide-133
SLIDE 133

Chapter 7 <133>

  • Multiple processors (cores) with a method of communication between them
  • Types:
    – Homogeneous: multiple cores with shared memory
    – Heterogeneous: separate cores for different tasks (for example, DSP and CPU in cell phone)
    – Clusters: each core has own memory system

Multiprocessors

slide-134
SLIDE 134

Chapter 7 <134>

  • Hennessy & Patterson: Computer Architecture: A Quantitative Approach
  • Conferences:
    – www.cs.wisc.edu/~arch/www/
    – ISCA (International Symposium on Computer Architecture)
    – HPCA (International Symposium on High Performance Computer Architecture)

Other Resources