Pipelining - PowerPoint PPT Presentation



slide-1
SLIDE 1

Pipelining

1

slide-2
SLIDE 2

Today

  • Quiz
  • Introduction to pipelining

2

slide-3
SLIDE 3

Pipelining

[Diagram: three Logic (10ns) blocks in series, each followed by a latch; latch outputs at 10ns, 20ns, and 30ns]

What’s the latency for one unit of work? What’s the throughput?

slide-4
SLIDE 4

Pipelining

  1. Break up the logic with latches into “pipeline stages”
  2. Each stage can act on different data
  3. Latches hold the inputs to their stage
  4. Every clock cycle data transfers from one pipe stage to the next

[Diagram: the single Logic (10ns) block, and the same logic split into Logic(2ns) stages separated by latches]

slide-5
SLIDE 5

[Diagram: the pipelined logic; results emerge at 2ns, 4ns, 6ns, 8ns, 10ns, 12ns]

What’s the latency for one unit of work? What’s the throughput?
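The figure's questions can be answered with a quick sanity check. This is a sketch with numbers read off the figure (six 2ns stages, matching the 2ns through 12ns marks) and idealized latches:

```python
# Sketch: latency vs. throughput of an ideal pipeline.
# Assumed from the figure: six stages of 2 ns each, latch overhead ignored.

def latency_ns(stage_delay_ns, num_stages):
    """One unit of work must traverse every stage in sequence."""
    return stage_delay_ns * num_stages

def throughput_per_ns(stage_delay_ns):
    """A new result emerges once per cycle; cycle time = slowest stage."""
    return 1 / stage_delay_ns

print(latency_ns(2, 6))      # 12 -> each unit of work takes 12 ns
print(throughput_per_ns(2))  # 0.5 -> one result every 2 ns
```

Latency got *worse* than the original 10ns logic block, but throughput went from one result per 10ns to one per 2ns, which is the point of pipelining.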

slide-6
SLIDE 6
Critical path review

  • Critical path is the longest possible delay between two registers in a design.
  • The critical path sets the cycle time, since the cycle time must be long enough for a signal to traverse the critical path.
  • Lengthening or shortening non-critical paths does not change performance.
  • Ideally, all paths are about the same length.

6
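As a toy illustration (the path delays here are invented), the cycle time is set by the longest register-to-register path, and nothing else:

```python
# Hypothetical register-to-register path delays in a design, in ns.
path_delays_ns = [3.2, 7.9, 5.1, 6.4]

# The critical path is the longest of these paths.
critical_path_ns = max(path_delays_ns)

# The clock period must be at least this long; speeding up the
# 3.2 ns path would change nothing.
min_cycle_time_ns = critical_path_ns

print(critical_path_ns)  # 7.9
```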

slide-7
SLIDE 7

7

Pipelining and Logic

[Diagram: a chain of logic blocks, before and after inserting pipeline registers]

  • Hopefully, critical path reduced by 1/3
slide-8
SLIDE 8
Limits on Pipelining

  • You cannot pipeline forever
  • Some logic cannot be pipelined arbitrarily -- memories
  • Some logic is inconvenient to pipeline
  • How do you insert a register in the middle of an adder?
  • Registers have a cost
  • They cost area -- choose “narrow points” in the logic
  • They cost time
  • Extra logic delay
  • Set-up and hold times

8

[Diagram: logic blocks with pipeline registers]

slide-9
SLIDE 9

Pipelining Overhead

  • Logic Delay (LD) -- How long does the logic take (i.e., the useful part)
  • Set-up time (ST) -- How long before the clock edge do the inputs to a register need to be ready?
  • Register delay (RD) -- Delay through the internals of the register
  • BaseCT -- cycle time before pipelining
  • BaseCT = LD + ST + RD
  • Total delay = BaseCT
  • PipeCT -- cycle time after pipelining N times
  • PipeCT = ?
  • Total delay = ?

9

slide-10
SLIDE 10

Pipelining Overhead

  • Logic Delay (LD) -- How long does the logic take (i.e., the useful part)
  • Set-up time (ST) -- How long before the clock edge do the inputs to a register need to be ready?
  • Register delay (RD) -- Delay through the internals of the register
  • BaseCT -- cycle time before pipelining
  • BaseCT = LD + ST + RD
  • PipeCT -- cycle time after pipelining N times
  • PipeCT = ST + RD + LD/N
  • Total time = N*ST + N*RD + LD
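Plugging invented numbers into these formulas shows how register overhead limits the benefit of deeper pipelines:

```python
# Assumed delays in ns: logic delay, register set-up time, register delay.
LD, ST, RD = 10.0, 0.5, 0.5

def pipe_ct(n):
    """Cycle time after splitting the logic into n equal stages."""
    return ST + RD + LD / n

def total_time(n):
    """Time for one unit of work to traverse all n stages."""
    return n * (ST + RD) + LD

for n in (1, 2, 5, 10):
    print(n, pipe_ct(n), total_time(n))
# Cycle time shrinks toward the ST + RD = 1.0 ns floor,
# while total latency for one unit of work grows.
```

With these numbers, going from 1 to 10 stages cuts the cycle time from 11ns to 2ns, but the register overhead means one unit of work now takes 20ns instead of 11ns.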

10

slide-11
SLIDE 11
Pipelining Difficulties

  • You cannot put registers just anywhere
  • You may not have access to the internals of some blocks
  • Ex: memories
  • Balancing the path lengths is challenging
  • There are many more potential critical paths in a pipelined design

11

slide-12
SLIDE 12
Pipelining Difficulties

  • The critical path only went down a bit.

12

[Diagram: alternating Fast Logic and Slow Logic stages; the slow stages set the cycle time]

slide-13
SLIDE 13

How to pipeline a processor

  • Break each instruction into pieces -- remember the basic algorithm for execution
  • Fetch
  • Decode
  • Collect arguments
  • Execute
  • Write back results
  • Compute next PC
  • The “classic 5-stage MIPS pipeline”
  • Fetch -- read the instruction
  • Decode -- decode and read from the register file
  • Execute -- perform arithmetic ops and address calculations
  • Memory -- access data memory
  • Write back -- store results in the register file

13
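The five stages above can be visualized with a small pipeline-diagram generator, assuming an ideal pipeline with no hazards or stalls (a sketch, not part of the slides):

```python
# Sketch: which stage each instruction occupies on each cycle,
# for an ideal 5-stage pipeline with no hazards or stalls.
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def stage_of(instr_index, cycle):
    """Stage occupied by instruction instr_index on a given cycle, or None."""
    s = cycle - instr_index  # instruction i enters Fetch on cycle i
    return STAGES[s] if 0 <= s < len(STAGES) else None

instrs = ["add", "lw", "sub"]
for cycle in range(len(instrs) + len(STAGES) - 1):
    row = [stage_of(i, cycle) or "-" for i in range(len(instrs))]
    print(f"cycle {cycle}: {row}")
```

Each row of the printout shows one cycle; each instruction is exactly one stage behind the instruction ahead of it.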

slide-14
SLIDE 14

14

Pipelining a processor

[Diagram: the datapath divided into Fetch, Decode, EX, Mem, and Write back stages]

slide-15
SLIDE 15
Impact of Pipelining

  • Break the processor into P pipe stages
  • What happens to latency?
  • L = Inst * CPI * CycleTime
  • The cycle time = ?
  • CPI = ?

15

slide-16
SLIDE 16
Impact of Pipelining

  • Break the processor into P pipe stages
  • What happens to latency?
  • L = Inst * CPI * CycleTime
  • The cycle time = CT/P
  • CPI = 1
  • CPI is an average: cycles/instruction
  • When the # of instructions is large, CPI = 1
  • If just one instruction, CPI = P

16
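Plugging invented numbers into L = Inst * CPI * CycleTime shows why pipelining speeds up programs even though a single instruction takes about as long as before:

```python
# Assumed baseline: unpipelined cycle time CT = 10 ns,
# a program of 1,000,000 instructions, and P = 5 pipeline stages.
CT_NS = 10.0
INSTS = 1_000_000
P = 5

# Unpipelined: one instruction per (long) cycle, CPI = 1.
unpipelined_time_ns = INSTS * 1 * CT_NS

# Pipelined: cycle time shrinks to CT/P; CPI ~ 1 once the pipeline is full.
pipelined_time_ns = INSTS * 1 * (CT_NS / P)

# A single instruction still needs P short cycles -- same latency as before.
one_inst_latency_ns = P * (CT_NS / P)

print(unpipelined_time_ns, pipelined_time_ns, one_inst_latency_ns)
```

The program runs P times faster, while the latency of any one instruction is unchanged, which is why CPI = P for a single instruction but CPI approaches 1 over a long run.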

slide-17
SLIDE 17

Pipelined Datapath

[Datapath diagram: PC + 4 adder, instruction memory, register file, sign extend, shift left 2, ALU, and data memory]

slide-18
SLIDE 18

Pipelined Datapath

[Datapath diagram as above, with pipeline registers inserted: IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB]

slide-19
SLIDE 19

Pipelined Datapath

[Datapath diagram: the instruction sequence add, lw, sub, sub, add, add entering the pipeline]

slide-20
SLIDE 20

Pipelined Datapath

[Same datapath diagram as the previous slide (animation frame)]

slide-21
SLIDE 21

Pipelined Datapath

[Same datapath diagram as the previous slide (animation frame)]

slide-22
SLIDE 22

Pipelined Datapath

[Same datapath diagram as the previous slide (animation frame)]

slide-23
SLIDE 23

Pipelined Datapath

[Same datapath diagram as the previous slide (animation frame)]

slide-24
SLIDE 24

Simple Pipelining Control

24

[Diagram: two instructions in flight, each occupying a different stage: Fetch, Decode, EX, Mem, Write back]

  • Compute all the control bits in decode, then pass them from stage to stage. It won’t stay this simple...
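One way to picture "pass the control bits from stage to stage" is a bundle of bits that rides along with the instruction through each pipeline register. This is a sketch; the field names and the toy decoder are invented:

```python
# Sketch: control bits computed once in Decode travel with the instruction.
# All field names here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Control:
    reg_write: bool
    mem_read: bool
    mem_write: bool
    alu_op: str

def decode(instr):
    """Compute all the control bits in the Decode stage (toy decoder)."""
    if instr == "lw":
        return Control(reg_write=True, mem_read=True, mem_write=False, alu_op="add")
    return Control(reg_write=True, mem_read=False, mem_write=False, alu_op=instr)

# Each later stage consumes the bits it needs and forwards the rest unchanged.
ctrl = decode("lw")
ex_mem = ctrl    # Dec/Exec register -> Exec/Mem register
mem_wb = ex_mem  # Exec/Mem register -> Mem/WB register
print(mem_wb.reg_write)  # True: Write back uses a bit computed back in Decode
```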

slide-25
SLIDE 25

Pipelining is Tricky

  • If all the data flows in one direction, pipelining is relatively easy.
  • Not so for processors.
  • Decode and write back both access the register file.
  • Branch instructions affect the next PC.
  • Instructions need values computed by previous instructions.

25

slide-26
SLIDE 26

Not just tricky, Hazardous!

  • Hazards are situations where pipelining does not work as elegantly as we would like
  • Caused by backward-flowing signals
  • Or by lack of available hardware
  • Three kinds
  • Data hazards -- an input is not available on the cycle it is needed
  • Control hazards -- the next instruction is not known
  • Structural hazards -- we have run out of a hardware resource
  • Detecting, avoiding, and recovering from these hazards is what makes processor design hard.
  • That, and the Xilinx tools ;-)

26

slide-27
SLIDE 27

A Structural Hazard

  • Both the decode and write back stages have to access the register file.
  • There is only one register file. A structural hazard!!
  • Solution: write early, read late
  • Writes occur at the clock edge and complete long before the end of the cycle
  • This leaves enough time for the outputs to settle for the reads.
  • Hazard avoided!

27

[Diagram: pipeline stages Fetch, Decode, EX, Mem, Write back]
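A toy model of "write early, read late" within a single cycle (the timing model here is invented): because the write-back write lands at the clock edge and the decode read happens late in the cycle, a value written this cycle is visible to a read in the same cycle, and the structural hazard never materializes.

```python
# Sketch: one register file serving both Decode (read) and Write back (write)
# in the same cycle, by ordering the write before the read within the cycle.
regfile = {"r1": 0}

def clock_cycle(write=None, read=None):
    """write=(reg, value) is applied at the clock edge; read happens late."""
    if write is not None:       # early: at the clock edge
        reg, value = write
        regfile[reg] = value
    if read is not None:        # late: after the outputs have settled
        return regfile[read]

result = clock_cycle(write=("r1", 42), read="r1")
print(result)  # 42: the same-cycle read sees the freshly written value
```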