Day 2 VLSI Microprocessor Design Flow

Session A: Circuit design styles
Break
Session B: Design paths
Lunch
Session C: Verification
Break
Session D: Manufacture, fabrication testing, packaging

Today Organized Bottom-Up
• Circuit design style
• Full-custom design path
• Standard cell design path
• RTL design
• Verification strategy
• Packaging
• Manufacture & testing
Important: real designs proceed at all levels simultaneously

T0 Circuit Design Style

Datapaths and memories:
• Full-custom layout
• Regular structures
• Most of the die area
• Few design bugs
• Mostly hand-specified procedural layout and routing (some hand layout and routing)
• Sometimes exotic circuit designs (dynamic, self-timed)

Control logic:
• Standard cells
• Irregular structures
• Most of the complexity
• Most of the design bugs
• Placed and routed automatically
• Conservative static CMOS circuits

Typical design style for a modern microprocessor

T0 Die Breakdown
[Figure: T0 die breakdown into standard-cell and full-custom regions]

Global Design Style Decisions
Extremely important:
• Clock methodology and latch design
• Power, ground, and clock distribution
Must be settled early since these affect every circuit on the chip.

T0 Clock and Latch Style
Input clock signal at 2x on-chip frequency (e.g., 80MHz crystal for 40MHz Spert-II board), divided by 2 on-chip to guarantee 50% duty cycle.
Clock buffered up; last stage drives a single clock grid across the entire chip, with <1ns skew across the chip and <500ps rise/fall time.
Clock output pad allows external circuitry to phase lock to the T0 clock.
TSPC dynamic latches (as a result, T0 has a minimum operating frequency).
Also, some special pseudo-static load-enabled latches.
Very similar to the Alpha 21064 clocking strategy.
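Why dividing a 2x input clock yields a 50% duty cycle can be illustrated with a small behavioral sketch. This is only an illustration, not T0 circuitry: it assumes an ideal toggle flip-flop that changes state on every rising edge of the input, and the function names and the 30% input waveform are hypothetical. Since rising edges are evenly spaced whatever the input duty cycle, the divided output stays high for one full input period and low for the next.

```python
def divide_by_two(samples):
    """Model an ideal toggle flip-flop clocked on rising edges of `samples`.

    `samples` is a list of 0/1 values of the input clock, one per time step.
    Returns the divided-by-two output, one value per time step.
    """
    out = []
    state = 0
    prev = 0
    for s in samples:
        if prev == 0 and s == 1:   # rising edge of the 2x input clock
            state ^= 1             # toggle the divided output
        out.append(state)
        prev = s
    return out


def duty_cycle(samples):
    """Fraction of time steps during which the signal is high."""
    return sum(samples) / len(samples)


# Hypothetical 2x input clock with a poor 30% duty cycle:
# each period is 10 steps, high for 3 and low for 7.
period = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
clk_2x = period * 8             # 8 input periods
clk = divide_by_two(clk_2x)     # 4 output periods

print(duty_cycle(clk_2x))  # 0.3
print(duty_cycle(clk))     # 0.5
```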

T0 Clock Distribution
[Figure: 2x clock input feeding a clock buffer that drives the clock grid (in reality hundreds of wires) and a clock output]

T0 Latch Style
Standard-cell controller designed with edge-triggered flip-flops
• Only negative edge-triggered flip-flops
• Simpler for state machines
• Simplifies synthesis timing specification
• State stall handled with mux around flip-flop - no clock gating
Full-custom datapaths and memories used transparent latches
• p- and n-type latches transparent on clock low or high respectively
• Can steal time across clock cycle boundaries
• Can place latches in convenient place in signal flow to save area
• Simplifies double-cycling (used in vector register file, some buses)
• Special stallable n-latch (small area without clock gating)
Designed library of latches verified to operate across all process corners with clock skew/rise/fall spec, and when placed in series with other latches.
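The "mux around flip-flop" stall scheme used in the controller can be sketched behaviorally. The model below is a minimal illustration under stated assumptions: the class name `StallableFlop` and its method are hypothetical, and each call to `clock_edge` stands for one active (negative) clock edge. The point it shows is that the flop is clocked on every edge, and a stall simply recirculates the current state through a 2:1 mux instead of gating the clock.

```python
class StallableFlop:
    """Flip-flop with a recirculating mux on its D input (hypothetical model).

    The clock is never gated: the flop captures a value on every active
    clock edge. When `stall` is asserted, the mux feeds the flop's own
    output back to its input, so the stored state is unchanged.
    """

    def __init__(self, reset_value=0):
        self.q = reset_value

    def clock_edge(self, d, stall):
        # 2:1 mux in front of the flop: hold the current state on a stall,
        # otherwise take the new data.
        mux_out = self.q if stall else d
        self.q = mux_out
        return self.q


flop = StallableFlop()
stimulus = [(1, False), (0, True), (0, True), (0, False)]
trace = [flop.clock_edge(d, stall) for d, stall in stimulus]
print(trace)  # [1, 1, 1, 0]
```

The cost of this choice is one mux per stallable state bit; the benefit is that the clock network stays a simple ungated grid with controlled skew.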

T0 Power/Ground Distribution
Half of all pins were power and ground (204/408)
Chip-on-board packaging gave low-inductance path to board (~1nH per wire)
Grid across whole chip in wide M1 and M2, strapped wherever possible.
Required IR drop less than 5% of Vdd in middle of chip.
On-chip gate oxide decoupling capacitors placed everywhere possible, especially under power rails.
Enough bypass capacitance for <5% power bounce, even if power/ground wires open circuit for one cycle.

[Figure: T0 power/ground distribution. Labels: power grid in M1 and M2; every other pad is power or ground; bypass cap. under power rails; additional bypass cap. in empty space]
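The bypass-capacitance requirement above can be turned into a back-of-the-envelope estimate. If the supply wires are effectively open for one cycle, the on-chip capacitance must deliver the charge drawn in that cycle, Q = I × T, while the rail droops by no more than 5% of Vdd, which gives C ≥ I × T / (0.05 × Vdd). The sketch below is not taken from the slides: the 1 A average current and the 5 V supply are assumed values chosen purely for illustration, and the function name is hypothetical. Only the 40MHz clock and the 5% limit come from the text.

```python
def min_bypass_capacitance(avg_current_a, clock_hz, vdd_v, max_droop_fraction):
    """Capacitance (farads) needed to supply one clock cycle of charge
    while keeping the supply droop within the given fraction of Vdd."""
    charge_per_cycle_c = avg_current_a / clock_hz    # Q = I * T
    allowed_droop_v = max_droop_fraction * vdd_v     # e.g. 5% of Vdd
    return charge_per_cycle_c / allowed_droop_v      # C = Q / dV


# ASSUMED values for illustration only: 1 A average current, 5 V supply.
c_min = min_bypass_capacitance(
    avg_current_a=1.0,        # assumption, not from the slides
    clock_hz=40e6,            # 40MHz on-chip clock
    vdd_v=5.0,                # assumption
    max_droop_fraction=0.05,  # <5% power bounce
)
print(f"{c_min * 1e9:.0f} nF")  # 100 nF
```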

T0 Custom Memories
Instruction cache
• 1KB storage + tags + valid
• Classic 6T SRAM design
• One port: differential write (128b) or differential read (32b)
• 1 word line and 2 bit lines per bit cell
• Special wire to clear all valid bits in one cycle for cache flush
• Fast dynamic tag comparator built into tag sense amps - critical path
Scalar Register File
• 128B storage (32x4B registers)
• Three ports: one differential write plus two single-ended reads
• 3 word lines and 4 bit lines per bit cell
Vector Register File (trickiest piece of circuit design in T0)
• 2KB storage (16x32x4B registers)
• Eight ports: three differential writes on clock low, five single-ended reads on clock high
• Self-timed to generate all timing edges in one cycle
• 5 word lines and 6 bit lines per bit cell

T0 Datapath Design Style
Select datapath pitch, tradeoff between:
• wasted space for simple cells
• crunched inefficient design for complex cells
Vector unit has 72 λ bit pitch (late change from 80 λ to fit reticle).
Scalar unit has 80 λ bit pitch.
Decide on metal layer assignments.
Data busses in Metal 1, control/clock/Vdd/GND in Metal 2.
Roughly half of datapath bit pitch is used for busses passing by cell.
Design library of datapath cells (mostly latches and muxes).
Special cells created where needed (maybe 5% are special)
Mostly static CMOS logic and static pass-transistor logic, some critical places use dynamic logic:
• Adder carry-chains
• Branch zero comparator
• Saturation overflow comparators
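The word-line and bit-line counts quoted for the three custom memories are consistent with a simple counting rule, sketched below. The rule is an interpretation rather than something the slides state outright: each port needs one word line, a differential port needs two bit lines, a single-ended port needs one, and in the vector register file the write ports (clock low) and read ports (clock high) are assumed to reuse the same physical wires, so the cell only needs enough wires for its busier phase. The function name and data layout are hypothetical.

```python
def wires_per_cell(phases):
    """Return (word_lines, bit_lines) needed by one bit cell.

    `phases` has one entry per clock phase in which the array is accessed;
    each entry is a (differential_ports, single_ended_ports) pair.
    Wires are ASSUMED to be reused across phases, so the cell needs only
    enough wires for its busiest phase.
    """
    word_lines = max(diff + single for diff, single in phases)
    bit_lines = max(2 * diff + single for diff, single in phases)
    return word_lines, bit_lines


# Instruction cache: one differential read/write port.
icache = wires_per_cell([(1, 0)])
# Scalar register file: one differential write + two single-ended reads.
scalar_rf = wires_per_cell([(1, 2)])
# Vector register file: three differential writes on clock low,
# five single-ended reads on clock high.
vector_rf = wires_per_cell([(3, 0), (0, 5)])

print(icache)     # (1, 2)
print(scalar_rf)  # (3, 4)
print(vector_rf)  # (5, 6)

# Storage sizes quoted on the slide, in bytes.
print(32 * 4)       # 128  (scalar register file)
print(16 * 32 * 4)  # 2048 (vector register file, 2KB)
```

Without wire sharing between phases, the same rule would give the vector register file 8 word lines and 11 bit lines per cell, which suggests why double-cycling saves area.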

T0 Datapath Latch Designs
Latches mostly dynamic TSPC plus holders (a la 21064)
[Figure: p-latch and n-latch schematics annotated with transistor sizes; signals PHI, D, X, Q]

Special Pseudo-Static n-Latch
[Figure: pseudo-static n-latch schematic annotated with transistor sizes; signals PHI, D, LEN, LENB, X, Q]
Restrictive enable control line timing caused problems later

T0 Datapath Mux Designs
Muxes: n-pass-transistor with level-restoring p-transistor
[Figure: 3-input mux schematic annotated with transistor sizes; data inputs A, B, C; select lines ASEL, BSEL, CSEL; output OUT]

Example Datapath Layout
[Figure: example datapath layout]

T0 Standard Cell Designs
Started with public domain library, but hand-inspected each cell and threw away/redesigned bad cells
• Some cells had too many series transistors or bad output driver
Changed every cell to have much wider power/ground rails
• To avoid IR drop in middle of long standard cell row
Added separate clock rail into every cell
• Fits into overall clock gridding scheme
• Ensures controlled skew on clock (don’t want clock auto-routed!)
Designed our own standard cell flip-flops and latches
• Connects to special clock rail - uses our clocking methodology
• Latches used to synchronize with datapath signals
Added greater variety of inverters and buffers
• Existing buffers not big enough to drive loads on our chip
• More flexibility for synthesis to trade area and delay

T0 Pads
Pad design is especially tricky
Many esoteric device structures used to provide protection against latch-up and ESD damage
Obtained HP’s design guidelines under NDA
Designed custom pads using most of HP’s recommendations for pad protection
Pad output drivers used n-type pullup to reduce power consumption - output only swings to ~4V, not 5V
Separate power supply rings for output drivers and core logic

Summary
T0 circuit design mostly conservative, low risk
Robustness engineered into all cells and overall design
Only a few tricks where big wins possible
• Fast dynamic datapath logic to shorten critical paths
• Double-pumped vector register file to save area
• Novel output drivers to reduce power

Day 2, Session B: Design Paths
• Full-custom
• Standard cell
• Final global checks

Full-Custom Tools
Pre-existing tools used:
• Viewlogic schematic editor (commercial)
• Magic layout editor and extraction (university)
• HSpice circuit simulator (commercial)
• CAzM table-driven circuit simulator (university, now commercial)
• irsim switch-level simulator (university)
• gemini layout versus schematic compare (university)
• Dracula design rule checker (commercial)
In-house tools:
• flat SPICE netlist flattener/processor
• tilem procedural layout generator

Full-Custom Design Process
Initial specification with high-level schematic plus verbal communication (most full-custom work done before RTL finished)
Design loop:
• Viewlogic schematic design (functionality and transistor sizing)
• Timing simulations with HSpice
• Functionality simulations with irsim
• magic layout
• Extractions with magic (get real parasitics - feed back into schematic)
Iterate until design goals met.
Clock frequency target initially fixed at <50MHz to prevent over-optimization.

Example Viewlogic Schematic
[Figure: I-Cache SRAM bit schematic annotated with transistor sizes; signals RSEL, IBIT, IBITB, BIT, BITB]

Example magic Layout
[Figure: two halves of SRAM cache bits]

Standard Cell Design Path
Initial RTL (Register Transfer Level) in C++
Each RTL control block manually translated into BDS
• BDS, a limited, combinational-circuit-only hardware description language
bdsyn compiles BDS into blif (Berkeley Logic Interchange Format)
blif optimized and synthesized into gates using sis
Gate netlist input to TimberWolf place and route.
Also, generate Viewlogic schematic from gate netlist.

RTL Model
RTL (Register Transfer Level) design in C++.
RTL model is “golden reference” for whole T0 design.
Models state in every latch on every clock phase.
Ran at 1,500 cycles/second on Sparcstation-20/61.
100-1000 times faster than Verilog or VHDL RTL model.
(More on RTL in next session)
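What "models state in every latch on every clock phase" can mean in practice is sketched below. This is an illustrative Python model, not the T0 C++ RTL: the `Latch` class, the `simulate` function, and the two-latch pipeline are all hypothetical. It assumes each cycle is evaluated as a low phase followed by a high phase, with a p-latch transparent on clock low feeding an n-latch transparent on clock high.

```python
class Latch:
    """Level-sensitive latch, transparent while clk equals `open_level`."""

    def __init__(self, open_level):
        self.open_level = open_level
        self.q = 0

    def evaluate(self, clk, d):
        if clk == self.open_level:   # transparent: output follows input
            self.q = d
        return self.q                # opaque: output holds old value


def simulate(inputs):
    """Step a p-latch -> n-latch pair one clock phase at a time.

    Each input value is applied for one full cycle (low phase, then high
    phase). Returns a list of (clk, x, q) tuples, one per phase, where x
    is the p-latch output and q is the n-latch output.
    """
    p_latch = Latch(open_level=0)    # transparent on clock low
    n_latch = Latch(open_level=1)    # transparent on clock high
    trace = []
    for d in inputs:
        for clk in (0, 1):
            x = p_latch.evaluate(clk, d)
            q = n_latch.evaluate(clk, x)
            trace.append((clk, x, q))
    return trace


for step in simulate([1, 0, 1]):
    print(step)
# (0, 1, 0)
# (1, 1, 1)
# (0, 0, 1)
# (1, 0, 0)
# (0, 1, 0)
# (1, 1, 1)
```

Recording the state per phase rather than per cycle is what lets such a model be compared against latch-based full-custom blocks, at the cost of doing two evaluations per cycle.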

BDS Blocks
C++ RTL control logic was manually split into about 20 blocks that the synthesis tool could handle (by trial and error).
Each control block manually translated into equivalent BDS.
Example BDS code (piece of JTAG block):

    routine run_tdo;
      state tdo<7:0>;
      if tapcin<3> then tdo = regioin
      else if iregin<3> then tdo = regioin
      else tdo = memioin;
      tdob = not tdo;
    endroutine;

Synthesis with sis
Each BDS block was translated into logic equations in blif
Also, had to create timing specs for each block.
Optimized and synthesized by sis (Berkeley synthesis package)
Two basic synthesis scripts created:
• target minimal area
• target minimal delay
Some critical blocks were tuned with own custom synthesis scripts.
Synthesis could sometimes take infinite time or infinite memory.
=> had to split blocks further or rewrite script.
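The combinational function described by the `run_tdo` BDS routine can be restated as a short Python sketch, which may help readers unfamiliar with BDS. This is a hand translation for illustration, not output of any T0 tool. It assumes that `<3>` selects bit 3 of a signal and that `regioin`, `memioin`, and `tdob` are 8 bits wide like `tdo`; those widths are not given in the excerpt.

```python
MASK8 = 0xFF  # tdo is declared as tdo<7:0>, i.e. 8 bits


def bit(value, index):
    """Return bit `index` of an integer signal value."""
    return (value >> index) & 1


def run_tdo(tapcin, iregin, regioin, memioin):
    """Python restatement of the BDS routine; returns (tdo, tdob).

    ASSUMPTION: regioin, memioin and tdob are 8-bit signals.
    """
    if bit(tapcin, 3):
        tdo = regioin
    elif bit(iregin, 3):
        tdo = regioin
    else:
        tdo = memioin
    tdo &= MASK8
    tdob = ~tdo & MASK8   # bitwise complement, kept to 8 bits
    return tdo, tdob


print(run_tdo(0b1000, 0b0000, 0xAB, 0x12))  # (171, 84)
print(run_tdo(0b0000, 0b0000, 0xAB, 0x12))  # (18, 237)
```

Both `if` branches select `regioin`, so the logic reduces to a single 2:1 mux controlled by `tapcin<3> OR iregin<3>`; this is the kind of simplification a logic optimizer would be expected to find.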
