Seongmoo Heo and Krste Asanović, MIT Laboratory for Computer Science

SLIDE 1

Seongmoo Heo and Krste Asanović
MIT Laboratory for Computer Science
http://www.cag.lcs.mit.edu/scale

WVLSI 2001, April 19, 2001

SLIDE 2

Motivation

  • Flip-flops are one of the most important components in synchronous VLSI designs.

  • Critical effect on cycle time
  • Large fraction of total system power
  • Previously published work has failed to consider the effect of circuit loading on the relative ranking of flip-flop structures.

[Kawaguchi et al. ’98] [Ko and Balsara ’00] [Kong et al ’00] [Lang et al ’97] [Nikolic et al ’00] [Nogawa and Ohtomo ’98] [Stojanovic and Oklobdzija ’99] [Stollo et al ’00] [Yuan and Svensson ’91] [Zyuban and Kogge ’99] [H.P. et al ’96] [J.M. et al ’96]

  • Fixed and usually overly large output load
  • Large or non-specified input drive
  • No output buffering
SLIDE 3

Observation

  • 1. Different flip-flop designs have different inherent parasitics and output drive strength.
  • Different number and complexity of logic gates
  • Different kinds of feedback
SLIDE 5

Observation

  • 2. Output loads in a circuit vary significantly.

[Figure: histogram of flip-flop output load instances in a microprocessor datapath (a custom-designed 32-bit MIPS CPU in a 0.25µm process). Y-axis: # of instances (ticks 20 to 120). Load markers: 1.8fF (min inv gate cap), 7.2fF (4 min inv gate cap), 28.8fF (16 min inv gate cap), 115.2fF (64 min inv gate cap).]

SLIDE 6

Our Proposal

Load effects must be considered in flip-flop characterization to avoid sub-optimal selection.

  • We will present energy and delay measurements for various flip-flops across a range of output loading conditions (electrical effort, EE, and absolute load size) and show that the relative rankings of structures vary.
  • We will show that output buffering at high load can lead to better performance and lower energy consumption for some structures.

SLIDE 7

Related Work

  • Traditional Buffer Sizing
  • Logical Effort [Sutherland and Sproull]
  • Logical Effort: drive strength of a circuit structure
  • Electrical Effort: the ratio of output load to input load
  • Delay = intrinsic parasitic delay + LE × EE
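
As a minimal sketch of the delay model above, the snippet below evaluates Delay = intrinsic parasitic delay + LE × EE for a single stage. The function names and all numeric values are illustrative assumptions (the inverter and 2-input NAND figures follow the usual logical-effort convention, in units of a basic inverter delay); none of them are measurements from this work.

```python
def electrical_effort(c_load: float, c_in: float) -> float:
    """Electrical effort (EE): the ratio of output load to input load."""
    return c_load / c_in


def stage_delay(parasitic: float, logical_effort: float, ee: float) -> float:
    """Delay = intrinsic parasitic delay + LE x EE (normalized units)."""
    return parasitic + logical_effort * ee


# Illustrative values only (assumed, not taken from the slides):
# inverter: LE = 1, parasitic ~ 1; 2-input NAND: LE = 4/3, parasitic ~ 2.
ee = electrical_effort(c_load=16.0, c_in=4.0)
inv_delay = stage_delay(parasitic=1.0, logical_effort=1.0, ee=ee)
nand_delay = stage_delay(parasitic=2.0, logical_effort=4.0 / 3.0, ee=ee)
print(f"EE = {ee:.1f}, inverter delay = {inv_delay:.2f}, NAND2 delay = {nand_delay:.2f}")
```

The same load costs the NAND more delay than the inverter because both its parasitic term and its logical effort are larger, which is the kind of structural difference the flip-flop comparison has to account for.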
SLIDE 8

Overview

  • Flip-Flop Designs
  • Test Bench & Simulation Setup
  • Delay and Energy Characterization
  • Delay Analysis
  • Energy-versus-Delay Analysis
  • Summary
SLIDE 9

Flip-Flop Designs

Fully static and single-ended

[Nikolic et al ’00]

SLIDE 10

Test Bench

  • Sized clock buffer to give equal rise/fall time
  • Used a fixed, realistic input driver
  • Varied output load from 4 min inv cap (7.2fF) to 64 min inv cap (115.2fF).

  • 4 Load and Drive Configurations
  • EE4-min: min input drive, 4 min inv load (7.2fF)
  • EE16-min: min input drive, 16 min inv load (28.8fF)
  • EE64-min: min input drive, 64 min inv load (115.2fF)
  • EE4-big: 16x min input drive, 64 min inv load (115.2fF)
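
The four configurations can be tabulated with a short script. This is only a sketch: it assumes that the electrical effort in each configuration name is the output load divided by the input drive (both in multiples of a minimum inverter), which is consistent with the names above, and it uses the 1.8fF minimum inverter gate capacitance quoted earlier. The names `CONFIGS` and `summarize` are hypothetical.

```python
MIN_INV_CAP_FF = 1.8  # gate capacitance of a minimum inverter, in fF (from the slides)

# (name, input drive in multiples of min drive, output load in min inv caps)
CONFIGS = [
    ("EE4-min", 1, 4),
    ("EE16-min", 1, 16),
    ("EE64-min", 1, 64),
    ("EE4-big", 16, 64),
]


def summarize(configs):
    """Return {name: (load in fF, electrical effort)} for each configuration."""
    table = {}
    for name, drive, load in configs:
        load_ff = round(load * MIN_INV_CAP_FF, 1)
        ee = load / drive  # assumed definition: output load over input drive
        table[name] = (load_ff, ee)
    return table


for name, (load_ff, ee) in summarize(CONFIGS).items():
    print(f"{name:9s} load = {load_ff:6.1f} fF  EE = {ee:g}")
```

Under this reading, EE4-min and EE4-big share the same electrical effort but differ by 16x in absolute load, which is what lets the two effects be separated.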

[Figure: test bench schematic. The flip-flop (FF) drives output loads of 4, 16, and 64 min inv cap.]

SLIDE 11

Simulation Setup

  • 0.25µm TSMC CMOS process, Vdd=2.5V, T=25°C
  • The Hspice Levenberg-Marquardt method was used for transistor size optimization.
  • Transistor widths were optimized for each load and drive configuration to give min delay or min energy for a given delay (transistor lengths were fixed at minimum).
  • Parasitic capacitances were included in the circuit netlists.
SLIDE 12

Delay and Energy Characterization

  • Minimum D-Q delay [Stojanovic et al. ’99] (.Measure command)
  • Total energy = input energy + internal energy + clock energy – output energy

  • A single test waveform with ungated clock and data toggling every cycle
  • For a full characterization of energy dissipation, more realistic activity patterns should be considered [Heo, Krashinsky, Asanovic ARVLSI’01].

[Figure: flip-flop (FF) driving 4, 16, and 64 min inv loads.]

SLIDE 13

Speed Ranking Without Buffering

  • Delay = const. intrinsic parasitic delay + output drive delay (= load size × driving capability)

  • Driving Capability = f(# of stages, complexity)
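
To illustrate why this model makes the speed ranking load-dependent, the sketch below compares two hypothetical flip-flops, `ff_a` and `ff_b`. Their parasitic delays and per-load delay factors are made-up numbers chosen only to show a crossover; they are not data from the slides.

```python
def ff_delay(parasitic_ns: float, drive_ns_per_cap: float, load_caps: float) -> float:
    """Delay = constant intrinsic parasitic delay + load size x driving capability."""
    return parasitic_ns + drive_ns_per_cap * load_caps


# Hypothetical flip-flops (made-up numbers, not data from the slides):
# name -> (parasitic delay in ns, added delay in ns per min inv cap of load)
FLOPS = {
    "ff_a": (0.20, 0.010),  # small parasitics, weak output drive
    "ff_b": (0.35, 0.004),  # larger parasitics, strong output drive
}


def rank_by_delay(load_caps: float) -> list:
    """Return flip-flop names ordered fastest first at the given load."""
    return sorted(FLOPS, key=lambda name: ff_delay(*FLOPS[name], load_caps))


for load in (4, 16, 64):
    print(load, rank_by_delay(load))
```

With these assumed numbers the low-parasitic design wins at 4 and 16 min inv caps, while the strong-drive design wins at 64, so a ranking measured at a single load would not carry over.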

[Figure: speed ranking of PPCFF, SAFF, MSAFF, HLFF, and SSAPL without buffering (axis ticks 0.5 to 3.5).]

(Transistors sized at each load point, but only for min delay)

SLIDE 14

Influence of Buffering on Performance

[Figure: delay (ns, 0.5 to 1.5) versus load (min inv cap, 0 to 80) for SSAPL, HLFF, MSAFF, SAFF, and PPCFF. Each is shown unbuffered, with one inverter, and with two inverters.]

(Min. input drive was used.)

(Assuming no penalty for inverting output)
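
The trade-off between unbuffered and buffered outputs can be sketched with the standard logical-effort result for a path with equalized stage effort. This is not the sizing flow used for the measurements: the flip-flop output stage is reduced to a single stage with an assumed fixed drive size, and every parameter value below (`ff_drive`, `g_ff`, `p_ff`, `p_inv`) is hypothetical.

```python
def buffered_delay(load, n_inv, ff_drive=4.0, g_ff=1.0, p_ff=3.0, p_inv=1.0):
    """Normalized delay of a flip-flop output stage followed by n_inv inverters.

    Logical-effort path delay with equal stage effort:
    D = sum(parasitics) + N * F**(1/N), with F = g_ff * load / ff_drive.
    All default parameter values are illustrative assumptions.
    """
    stages = n_inv + 1
    path_effort = g_ff * load / ff_drive
    return p_ff + n_inv * p_inv + stages * path_effort ** (1.0 / stages)


def best_buffering(load, max_inv=2):
    """Return the inverter count (0..max_inv) that gives the smallest delay."""
    return min(range(max_inv + 1), key=lambda n: buffered_delay(load, n))


for load in (4, 16, 64):
    delays = [round(buffered_delay(load, n), 2) for n in range(3)]
    print(f"load = {load:2d} min inv cap, delays for 0/1/2 inverters = {delays}, "
          f"best = {best_buffering(load)}")
```

In this toy model the unbuffered output is fastest at small loads, and adding an inverter only pays off once the load is large. A single inverter also inverts the output, which the slide assumes carries no penalty.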

SLIDE 15

Speed Ranking With Buffering Allowed

  • Less speed variation compared to original flip-flops

[Figure: speed ranking of PPCFF, SAFF, MSAFF, HLFF, and SSAPL with buffering allowed (axis ticks 0.5 to 3.5).]

SLIDE 16

Energy-Delay Curve : EE4-min

EE4-min: min. drive + 4 min inv load (7.2fF)

[Figure: energy (fJ) versus delay (ns).]

Each point sized for min energy for a given delay
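
One way to read such a curve is as the set of sizings for which no other sizing is both at least as fast and lower in energy. The sketch below extracts that frontier from a list of candidate (delay, energy) points. The candidate values are hypothetical, and the function is an illustration of the idea, not the Hspice optimization used to produce the plots.

```python
def energy_delay_frontier(points):
    """Keep the (delay, energy) points not beaten by a faster-or-equal, lower-energy point.

    Returns the frontier sorted by increasing delay.
    """
    frontier = []
    best_energy = float("inf")
    for delay, energy in sorted(points):
        if energy < best_energy:  # lower energy than every point at least as fast
            frontier.append((delay, energy))
            best_energy = energy
    return frontier


# Hypothetical (delay in ns, energy in fJ) results for different transistor sizings.
candidates = [(0.30, 120.0), (0.35, 90.0), (0.40, 95.0), (0.50, 70.0), (0.45, 110.0)]
print(energy_delay_frontier(candidates))
```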

SLIDE 17

Energy-Delay Curve : EE4-min

EE4-min: min. drive + 4 min inv load (7.2fF)

[Figure: energy (fJ) versus delay (ns). Labeled curve: PPCFF-unbuf.]

SLIDE 18

Energy-Delay Curve : EE16-min

EE16-min: min. drive + 16 min inv load (28.8fF)

[Figure: energy (fJ) versus delay (ns). Labeled curve: SSAPL-unbuf.]

SLIDE 19

Energy-Delay Curve : EE16-min

EE16-min: min. drive + 16 min inv load (28.8fF)

[Figure: energy (fJ) versus delay (ns). Labeled curves: SSAPL-buf, SSAPL-unbuf.]

SLIDE 20

Energy-Delay Curve : EE16-min

EE16-min: min. drive + 16 min inv load (28.8fF)

[Figure: energy (fJ) versus delay (ns). Labeled curve: HLFF-unbuf.]

SLIDE 21

Energy-Delay Curve : EE16-min

EE16-min: min. drive + 16 min inv load (28.8fF)

[Figure: energy (fJ) versus delay (ns). Labeled curves: HLFF-buf, HLFF-unbuf.]

SLIDE 22

Energy-Delay Curve : EE16-min

EE16-min: min. drive + 16 min inv load (28.8fF)

[Figure: energy (fJ) versus delay (ns). Labeled curve: PPCFF-unbuf.]

SLIDE 23

Energy-Delay Curve : EE64-min

EE64-min: min. drive + 64 min inv load (115.2fF)

[Figure: energy (fJ) versus delay (ns). Labeled curve: MSAFF-unbuf.]

SLIDE 24

Energy-Delay Curve : EE64-min

EE64-min: min. drive + 64 min inv load (115.2fF)

[Figure: energy (fJ) versus delay (ns). Labeled curves: HLFF-unbuf, HLFF-buf.]

SLIDE 25

Energy-Delay Curve : EE4-big

EE4-big: 16x min. drive + 64 min inv load (115.2fF)

[Figure: energy (fJ) versus delay (ns).]

SLIDE 26

Energy-Delay Curve : EE4-min vs EE4-big

[Figure: energy-delay curves for EE4-min and EE4-big. Labeled curves: MSAFF-unbuf, SAFF-unbuf, PPCFF-unbuf.]

SLIDE 27

Energy-Delay Curve : EE4-min vs EE4-big

[Figure: energy-delay curves for EE4-min and EE4-big. Labeled curves in each panel: MSAFF-unbuf, SAFF-unbuf, PPCFF-unbuf.]

SLIDE 28

Summary

  • Different flip-flops have different gains and parasitics.
  • Real VLSI designs exhibit a variety of flip-flop output loads.
  • The output load size affects the relative performance and energy consumption of different flip-flop designs.
  • Therefore, output load effects should be accounted for when comparing flip-flops.

  • 1. Electrical effort
  • 2. Absolute output load size
  • 3. Output buffering