SLIDE 1

Speeding up the Clock

  • The register-to-register delay is usually the delay path that sets the maximum clock rate.

  • From a design point of view, we can only modify the combinational logic between the registers.

– Need to shorten the maximum combinational delay path.
– Setup/hold times of the registers are fixed.

  • Can shorten the delay by placing a register in the combinational logic to break the longest delay path.

– This technique is called pipelining.
– Adds latency to the output (the number of clocks between an input value and its corresponding output result).

Registered Datapath

[Figure: inputs InA, InB, InC, InD pass through a DFF, then a block of LOGIC with delay Tpd, then a second DFF that drives outputs OutA, OutB, OutC. DFFs are rising-edge triggered.]

Clk Freq = 1 / (Tclk2q + Tpd + Tsu)

Latency = 1 clk
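As a quick numeric check of the formula above, here is a minimal Python sketch. The function name and the timing values are hypothetical, chosen only for illustration; times are in nanoseconds, so 1000 / period gives MHz.

```python
def max_clock_mhz(t_clk2q_ns: float, t_pd_ns: float, t_su_ns: float) -> float:
    """Maximum clock frequency of one register-to-register path.

    Clk Freq = 1 / (Tclk2q + Tpd + Tsu); with times in ns, 1000 / period is MHz.
    """
    period_ns = t_clk2q_ns + t_pd_ns + t_su_ns
    return 1000.0 / period_ns


# Hypothetical values: 1 ns clock-to-q, 8 ns of logic, 1 ns setup -> 10 ns period.
print(max_clock_mhz(1.0, 8.0, 1.0))  # 100.0
```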

SLIDE 2

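Add a pipeline stage

[Figure: the LOGIC block is split into two halves, each with delay Tpd/2, with a DFF inserted between them: inputs InA, InB, InC, InD → DFF → LOGIC (Tpd/2) → DFF → LOGIC (Tpd/2) → DFF → outputs OutA, OutB.]

Clk Freq = 1 / (Tclk2q + Tpd/2 + Tsu)

Latency = 2 clks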

Definitions

  • Initiation Rate – rate at which new input values are accepted

– Rate at which new computations are initiated
– Minimum initiation rate = 1 (one new input value accepted every clock cycle)

  • Latency

– Number of clock cycles between an input value and its output value
– Adding pipeline stages always increases latency

  • At some point, adding more pipeline stages does not increase clock frequency, because Tclk2q and Tsu dominate the delay.
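The last point can be made concrete with a small Python sketch. It assumes, hypothetically, that the logic delay Tpd can be split into perfectly balanced stages and that each added stage adds one clock of latency (as in the 1-clk and 2-clk examples above); the timing numbers are illustrative only.

```python
def pipeline_stats(t_clk2q_ns, t_pd_ns, t_su_ns, stages):
    """Clock frequency (MHz) and latency (clocks) when Tpd is split evenly
    across `stages` register-to-register segments.

    Assumption: perfectly balanced stages, so each sees Tpd / stages of logic.
    """
    period_ns = t_clk2q_ns + t_pd_ns / stages + t_su_ns
    return 1000.0 / period_ns, stages  # latency grows by one clock per stage


for n in (1, 2, 4, 8, 16):
    f_mhz, latency = pipeline_stats(1.0, 8.0, 1.0, n)
    print(f"{n:2d} stages: {f_mhz:6.1f} MHz, latency = {latency} clks")
```

With these numbers the frequency can never exceed 1 / (Tclk2q + Tsu) = 500 MHz, so each doubling of the stage count buys less speed while the latency keeps growing.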

SLIDE 3

Pipeline Example

[Figure: a 32-bit ripple-carry adder built from four 8-bit adders (byte0, byte1, byte2, byte3); the CO of each adder drives the CI of the next. The 32-bit inputs A and B and the 32-bit Sum are registered by DFFs.]

32-bit ripple carry adder. The longest delay path is in the carry chain from byte0 to byte3.

Can I use pipelining to speed this up?
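A behavioral Python model (not a hardware description) of the adder in the figure may help show where the long path comes from: each 8-bit slice's carry out feeds the next slice's carry in, so byte3 cannot settle until the carry has rippled through byte0 to byte2. The function names are hypothetical.

```python
MASK8 = 0xFF


def add8(a, b, ci):
    """One 8-bit adder slice: returns (8-bit sum, carry out)."""
    total = (a & MASK8) + (b & MASK8) + ci
    return total & MASK8, total >> 8


def ripple_add32(a, b):
    """32-bit ripple-carry adder built from four 8-bit slices (byte0..byte3).

    The CO of each slice is the CI of the next, so the carry chain runs
    serially from byte0 to byte3.
    """
    carry = 0
    result = 0
    for byte in range(4):
        s, carry = add8(a >> (8 * byte), b >> (8 * byte), carry)
        result |= s << (8 * byte)
    return result, carry  # 32-bit sum and final carry out


print(ripple_add32(0x000000FF, 0x00000001))  # (256, 0)
```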

Insert pipeline stage between byte1 and byte2

[Figure: the same four 8-bit adders with a DFF inserted in the carry chain between byte1 and byte2. The low 16 bits of A and B feed byte0/byte1 and produce Sum (low 16 bits); the high 16 bits of A and B feed byte2/byte3 and produce Sum (high 16 bits). The 16-bit operand and sum buses are registered by DFFs.]

WILL THIS WORK PROPERLY?

SLIDE 4

Insert pipeline stage between byte1 and byte2

[Figure: the adder split at the carry DFF between byte1 and byte2 into a low half producing Sum (low 16 bits) and a high half producing Sum (high 16 bits).]

CLOCK CYCLE 1

Insert pipeline stage between byte1 and byte2

[Figure: the same adder, one clock cycle later.]

CLOCK CYCLE 2: the high 16 bits are registered using OLD CI data (the carry held in the pipeline DFF belongs to the previous operands).
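The failure can be reproduced with a cycle-by-cycle Python sketch. It is a simplified model, assumed for illustration: the input and output DFFs are omitted (they delay everything equally), each loop iteration is one clock, and the carry DFF resets to 0. Function names are hypothetical.

```python
MASK16 = 0xFFFF


def add16(a, b, ci):
    """16-bit adder (two 8-bit slices): returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + ci
    return total & MASK16, total >> 16


def naive_pipelined_add(operands):
    """Adder with ONLY a carry DFF between byte1 and byte2.

    The high half reads the carry register, which still holds the carry
    produced by the PREVIOUS operand pair.
    """
    carry_reg = 0  # pipeline DFF in the carry chain (assumed reset to 0)
    sums = []
    for a, b in operands:
        low, carry_out = add16(a, b, 0)
        high, _ = add16(a >> 16, b >> 16, carry_reg)  # uses OLD carry-in
        sums.append((high << 16) | low)
        carry_reg = carry_out  # captured at the clock edge for the next cycle
    return sums


ops = [(0x0000FFFF, 0x00000001), (0x00000001, 0x00000001)]
print([hex(s) for s in naive_pipelined_add(ops)])  # ['0x0', '0x10002']
print([hex((a + b) & 0xFFFFFFFF) for a, b in ops])  # ['0x10000', '0x2']
```

The carry from the first addition lands in the second result, so both outputs are wrong.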

SLIDE 5

Insert pipeline stage between byte1 and byte2

[Figure: corrected adder. Besides the carry DFF between byte1 and byte2, an extra DFF stage is placed on the high 16 bits of A and B, and another on Sum (low 16 bits), so both halves of each result come from the same operands.]

  • Delay all inputs to the high-order 16 bits by the same amount.

  • Equalize the delay for the LS 16 bits of Sum.
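The corrected scheme can be checked with a cycle-by-cycle Python sketch. It is a simplified model under stated assumptions: each loop iteration is one clock, all pipeline registers reset to 0 and update together at the clock edge, the outer input/output DFFs are omitted, and one extra (0, 0) operand pair is fed in to drain the last result. Names are hypothetical.

```python
MASK16 = 0xFFFF


def add16(a, b, ci):
    """16-bit adder (two 8-bit slices): returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + ci
    return total & MASK16, total >> 16


def balanced_pipelined_add(operands):
    """Corrected adder: the high 16 bits of A and B and the low 16 bits of
    Sum are each delayed by one DFF, matching the carry DFF."""
    carry_reg = 0  # carry DFF between byte1 and byte2
    a_hi_reg = 0   # DFF delaying the high 16 bits of A
    b_hi_reg = 0   # DFF delaying the high 16 bits of B
    low_reg = 0    # DFF delaying the low 16 bits of Sum
    outputs = []
    for a, b in list(operands) + [(0, 0)]:  # extra pair drains the pipeline
        high, _ = add16(a_hi_reg, b_hi_reg, carry_reg)  # previous pair, high half
        outputs.append((high << 16) | low_reg)          # previous pair, low half
        low, carry_out = add16(a, b, 0)                 # current pair, low half
        # Clock edge: all pipeline registers update together.
        carry_reg, a_hi_reg, b_hi_reg, low_reg = carry_out, a >> 16, b >> 16, low
    return outputs[1:]  # drop the reset output; each result is one clock later


ops = [(0x0000FFFF, 0x00000001), (0x00000001, 0x00000001)]
print([hex(s) for s in balanced_pipelined_add(ops)])  # ['0x10000', '0x2']
```

Every result is now correct, at the cost of one extra clock of latency.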


SLIDE 6

Comments on Pipeline Example

  • Note that the pipeline stage broke the carry chain into two equal paths.

– Each pipeline stage should have approximately the same combinational delay.
– Clock speed will be set by the delay of the slowest pipeline stage.

  • If I inserted 2 pipeline stages, I would need to break the carry chain delay into equal thirds.

  • Could insert a pipeline stage between each BIT in order to get maximum clock speed.

– Called 'bit pipelining'
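To see why stages should be balanced, here is a Python sketch that splits a carry chain at chosen register positions and takes the slowest stage as the limit on the clock period. The 0.25 ns per-bit carry delay and the 1 ns Tclk2q and Tsu values are hypothetical, picked so the arithmetic is exact.

```python
def stage_delays(bit_delays_ns, cut_points):
    """Combinational delay of each stage when pipeline registers are
    inserted at the given bit positions of the carry chain."""
    bounds = [0] + sorted(cut_points) + [len(bit_delays_ns)]
    return [sum(bit_delays_ns[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def clock_period_ns(bit_delays_ns, cut_points, t_clk2q_ns=1.0, t_su_ns=1.0):
    """The slowest pipeline stage sets the clock period."""
    return t_clk2q_ns + max(stage_delays(bit_delays_ns, cut_points)) + t_su_ns


carry_chain = [0.25] * 32  # hypothetical: 0.25 ns of carry delay per bit

print(clock_period_ns(carry_chain, [16]))                # equal halves: 6.0
print(clock_period_ns(carry_chain, [8]))                 # unbalanced split: 8.0
print(clock_period_ns(carry_chain, [11, 22]))            # about thirds (11/11/10 bits): 4.75
print(clock_period_ns(carry_chain, list(range(1, 32))))  # bit pipelining: 2.25
```

A single register placed off-center (bit 8) is slower than no better split at all, because the clock must wait for the longer 24-bit stage.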

Latency Tolerance

  • Latency tolerance depends on the application.

  • Frequent flushing of a pipeline (discarding the partial results within the pipeline and restarting it with a new value) wastes time and makes an application latency intolerant.

  • Flushing a pipeline introduces clock cycles in which the results coming out of the pipeline are ignored; these are wasted clock cycles.

  • High latency tolerance means that you can have as many pipeline stages as you need to meet the clock rate specification.
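The cost of flushing can be estimated with a deliberately simplified Python model. The assumption (not from the slides) is that a full pipeline delivers one result per clock and that each flush wastes about latency_clks - 1 cycles of ignored output before valid results resume; the initial fill is ignored and the workload numbers are hypothetical.

```python
def effective_throughput(results, flushes, latency_clks):
    """Fraction of clock cycles that deliver a useful result.

    Simplified model: one result per clock in steady state, and each flush
    costs latency_clks - 1 cycles of discarded output.
    """
    wasted = flushes * (latency_clks - 1)
    return results / (results + wasted)


# Hypothetical workload: 1000 results with one flush every 10 results.
for depth in (5, 10, 40):
    print(depth, round(effective_throughput(1000, 100, depth), 3))
```

With no flushes the ratio stays at 1.0 for any depth, which is why a never-flushed pipeline can afford many stages.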

SLIDE 7

Two Applications

  • Graphics hardware for processing pixels is extremely latency tolerant; it is not unusual to find pipelines that have tens of stages.

– Graphics pipelines are never flushed.
– A high clock rate is EXTREMELY important because of the large number of pixels (> 1 million) that have to be supplied for every screen, at > 30 updates per second.

  • Microprocessor instruction pipelines are not very latency tolerant; most CPU pipelines are only about 5–10 stages.

– Branch instructions can cause the pipeline to be flushed. By the time you determine the direction of a branch, you may have started processing instructions that should not be in the pipeline. These are flushed and the pipeline restarted.
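The graphics numbers above translate into a minimum pixel rate. This Python sketch uses the slide's lower bounds (1 million pixels, 30 updates per second) and a hypothetical pipeline that emits one pixel per clock.

```python
def min_pixel_rate_hz(pixels_per_screen, updates_per_second):
    """Pixels that must be produced per second."""
    return pixels_per_screen * updates_per_second


rate = min_pixel_rate_hz(1_000_000, 30)
print(rate)  # 30000000
# Hypothetical: a pipeline emitting one pixel per clock needs at least this clock rate.
print(f"{rate / 1e6:.0f} MHz minimum at one pixel per clock")
```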

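[Figure: BLEND datapath computing A * F + B * (1 - F), built from REG blocks, 2/1 muxes, a 1 - F block, two combinational MULT blocks and a SAT (saturating) ADDER.]

BLEND Datapath (without pipelining). MULT is combinational.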

SLIDE 8

LPM_MULT can be pipelined via the LPM_PIPELINE parameter. Add one pipeline stage to LPM_MULT (the DFFs are inserted automatically inside LPM_MULT). We now have a LATENCY mismatch within our datapath!

[Figure: the BLEND datapath with DFFs added inside the MULT blocks; one path into the adder now has 1 clk latency while another still has 0 clk latency.]

Correct the latency mismatch by adding DFFs in the other path as well. (You may have to break delay paths in other places, or add additional pipeline stages to LPM_MULT, to meet the clock frequency target.)

[Figure: the same datapath with DFFs added to the 0 clk latency path, so both paths into the adder have equal latency.]
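The latency mismatch can be illustrated with a generic Python sketch. It is not the BLEND datapath itself: it models a hypothetical two-path datapath computing a*f + c, where the a*f path goes through a pipelined multiplier (modeled as a chain of DFFs) and the c path gets a configurable number of DFFs. DelayLine and run are illustrative names; registers reset to 0.

```python
from collections import deque


class DelayLine:
    """`depth` DFFs in series: the output is the input from `depth` clocks ago."""

    def __init__(self, depth, reset_value=0):
        self.regs = deque([reset_value] * depth)

    def clock(self, value):
        if not self.regs:  # depth 0: purely combinational path
            return value
        self.regs.append(value)
        return self.regs.popleft()


def run(samples, mult_latency, other_latency):
    """Each sample is (a, f, c); returns the adder output a*f + c per clock."""
    mult_path = DelayLine(mult_latency)
    other_path = DelayLine(other_latency)
    return [mult_path.clock(a * f) + other_path.clock(c) for a, f, c in samples]


samples = [(1, 2, 10), (3, 4, 20), (5, 6, 30)]
print(run(samples, 1, 0))  # mismatch: [10, 22, 42] adds values from different samples
print(run(samples, 1, 1))  # matched:  [0, 12, 32], each result one clock later
```

The correct sums are 12, 32 and 60; only the matched version produces them, shifted by one clock (the first output is the reset value).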