
3D IC Design Tools and Applications to Microarchitecture Exploration

Jason Cong

UCLA Computer Science Department
cong@cs.ucla.edu
http://cadlab.cs.ucla.edu/~cong

Outline

  • Thermal-Aware 3D IC Physical Design Flow
  • Thermal Models and Assumptions
  • 3D Routing with Thermal Via Planning
  • 3D Placement
  • 3D Floorplanning
  • 3D Architecture Exploration
  • 3D Component Modeling and Testing
  • Concluding Remarks and Future Work

Thermal Challenges in 3-D ICs

  • Key challenges of 3-D IC design:
      • Higher power density
      • Inter-layer dielectric layers
  • High-temperature effects:
      • Longer interconnect delays
      • Functional failure

Temperature increases dramatically along the z direction.

[Figure: temperature distribution along the z direction, plotting T against Z across device layers Si 1 to Si 4; labeled temperatures: 30 °C, 100 °C, 135 °C, 150 °C]

3-D IC Cooling Schemes

  • Heat sink optimization
      • Air cooling fans
      • Heat-radiating fins
      • Thermal grease, AC, etc.
  • Chip-level temperature optimization
      • Microchannel cooling
      • Floorplanning
      • Routing
      • Thermal via insertion

Thermal-Aware 3D Physical Design Flow at UCLA (2002 – 2005)

[Flow diagram. Inputs: netlist (LEF/DEF), design constraints, technology. Core tools, built around Open Access and a compact thermal model: thermal-driven 3D floorplanner, thermal-driven 3D placement, thermal-aware 3D router with thermal via planning. Outputs and analysis: CIF/GDSII, parasitic extraction, thermal simulation, timing analysis, layout verification.]

10/8/2007 UCLA VLSICAD LAB 6

3D Physical Design Flow (IBM, UCLA, and PSU) (2006 – present)

[Flow diagram. Inputs: layer and design rules (LEF), cell and via* definitions (LEF), netlist (HDL or DEF). A 3D OA database (Tech. Lib, Ref. Lib, Design) serves the thermal-driven 3D floorplanner, thermal-driven 3D placer, 3D global router, and thermal-via planner. Tier export and tier import connect 3D OA to 2D OA, where detailed routing is done by the Cadence router. Further steps: 3D RC extraction, timing interface, 3D DRC and 3D LVS, layout (GDSII). The diagram also carries the labels EinsTimer, PSU, and UCLA.]

Thermal Resistive Network [Wilkerson04]

  • Circuit stack partitioned into tiles
  • Tiles connected through thermal resistances
      • Lateral resistances: fixed
      • Vertical resistances ∝ 1/#via
  • Heat sources modeled as current sources
      • Current value = power
  • Heat sinks modeled as ground nodes

[Figure: (a) tiles stack array; (b) single tile stack with heat sources P1 to P5 at nodes 1 to 5, vertical resistances R1 to R5, and lateral resistances Rlateral]

Accurate and slow
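As a rough illustration of the resistive network model, the sketch below solves a tiny network by nodal analysis: heat sources act as current sources, the heat sink is the ground node, and the solved node values are temperature rises above the sink. The function name `solve_thermal_network`, the use of index -1 for the sink, and the two-stack example values are assumptions made for this sketch and are not taken from the original tools.

```python
def solve_thermal_network(num_nodes, resistors, power):
    """Nodal analysis of a thermal resistive network.

    resistors: list of (node_a, node_b, resistance); node -1 is the heat sink.
    power: heat injected at each node (the "current source" values).
    Returns the temperature rise of every node above the heat sink.
    """
    # Build the conductance matrix G so that G * T = P.
    g = [[0.0] * num_nodes for _ in range(num_nodes)]
    for a, b, res in resistors:
        c = 1.0 / res
        for n in (a, b):
            if n >= 0:
                g[n][n] += c
        if a >= 0 and b >= 0:
            g[a][b] -= c
            g[b][a] -= c

    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row[:] + [p] for row, p in zip(g, power)]
    for col in range(num_nodes):
        pivot = max(range(col, num_nodes), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, num_nodes):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, num_nodes + 1):
                aug[row][k] -= factor * aug[col][k]

    # Back substitution.
    temps = [0.0] * num_nodes
    for row in range(num_nodes - 1, -1, -1):
        known = sum(aug[row][k] * temps[k] for k in range(row + 1, num_nodes))
        temps[row] = (aug[row][num_nodes] - known) / aug[row][row]
    return temps


# Hypothetical example: two tile stacks (nodes 0,1 and 2,3), two layers each.
vertical = [(-1, 0, 1.0), (0, 1, 1.0), (-1, 2, 1.0), (2, 3, 1.0)]
lateral = [(0, 2, 2.0), (1, 3, 2.0)]
temps = solve_thermal_network(4, vertical + lateral, [1.0, 1.0, 1.0, 1.0])
```

Because the two stacks in this example are identical, no heat flows laterally and each stack behaves like an independent chain. Lowering a vertical resistance (the effect of adding vias) lowers the temperatures above it.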

Thermal Resistive Chain Model

  • One-dimensional heat flow analysis
  • Elmore delay-like formula [Chiang01]

$$T_4 = \sum_{i=1}^{4} \Big( R_i \sum_{j=i}^{4} P_j \Big) = \sum_{i=1}^{4} \Big( P_i \sum_{j=1}^{i} R_j \Big)$$

[Figure: single tile stack with heat sources P1 to P4 at nodes 1 to 4 and resistances R1 to R4]

Fast and rough

  • Reduce R: thermal via insertion (routing)
  • Permute P: floorplanning
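The chain model can be checked numerically with a short sketch. It evaluates the top-node temperature in two equivalent ways: summing over resistors (each R_i carries the heat of all sources at or above it) and summing over sources (each P_i sees all resistances between it and the sink). The function names and the numeric values are hypothetical, chosen only for illustration.

```python
def chain_temperature_by_resistor(resistances, powers):
    """T_n = sum_i R_i * (sum of P_j for j >= i); index 0 is nearest the heat sink."""
    return sum(r * sum(powers[i:]) for i, r in enumerate(resistances))


def chain_temperature_by_source(resistances, powers):
    """T_n = sum_i P_i * (sum of R_j for j <= i); same value, regrouped."""
    return sum(p * sum(resistances[: i + 1]) for i, p in enumerate(powers))


# Hypothetical four-layer stack.
R = [1.0, 2.0, 3.0, 4.0]
hot_near_sink = [4.0, 3.0, 2.0, 1.0]   # high-power blocks placed next to the sink
hot_far_from_sink = [1.0, 2.0, 3.0, 4.0]

t_good = chain_temperature_by_resistor(R, hot_near_sink)
t_bad = chain_temperature_by_resistor(R, hot_far_from_sink)
```

With these values the arrangement that keeps high-power blocks near the sink gives the lower top-layer temperature, which is the "permute P" lever; scaling down any R_i (more thermal vias) is the "reduce R" lever.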

Through-the-Silicon Vias (TS-Vias) in 3D ICs

  • Effective in dissipating heat
  • Regular wires have almost no effect (size/direction)
  • Two types of TS-vias
      • Signal TS-vias, part of the netlist
      • Thermal TS-vias, with no connections, introduced to reduce temperature

[Figure: cross-section of a 3D IC showing the pad, dielectric layers, metal routing layers, silicon (device layers) with Blocks 1 to 5, thermal TS-vias, and signal TS-vias]

Thermal-Aware 3D Routing Problem

  • Input
      • 3-D floorplanning (placement) result
      • Technology
      • Netlist
      • Required temperature, such as 80 °C
  • Output
      • Routed nets
      • Thermal TS-via number and locations
  • Objectives
      • Minimum wirelength
      • Minimum TS-via number

Multilevel TS-Via Planning and 3D Routing (TMARS)

[Figure: multilevel scheme over grids G0, ..., Gi, ..., Gk, with a downward pass from level 0 through level i to level k and an upward pass back to level 0, based on the thermal resistive network model]

  • Downward pass, level 0: (1) power density calculation, (2) heat flow estimation, (3) routing resource estimation
  • Downward pass, level i: (1) power density coarsening, (2) heat flow estimation, (3) routing resource coarsening
  • Level k: (1) initial routing tree generation, (2) TTS-via planning, (3) TTS-via number adjustment
  • Upward pass: (1) routing refinement, (2) TTS-via planning, (3) TTS-via number adjustment

Thermal TS-Via Planning Problem

  • Determines the thermal TS-via density for all tiles
      • Minimizing the total number of thermal TS-vias
      • Meeting capacity and temperature constraints
  • Solving through
      • Via planning proportional to ∆t (VPPT), where ∆t is the vertical temperature difference
      • Alternating direction via planning (ADVP)

[Figure: example tiles a and b with ∆ta = ta − tb; per-tile values shown: 1 3 5 8 4 10 8 6 2 5]

Thermal TS-Via Planning [Cong & Zhang, ICCAD'05]

Non-Linear Programming Formulation

  • Variable definitions, for tile L_{i,j,k}
      • a_{i,j,k}: TS-via number
      • R_{i,j,k}: vertical thermal resistance, R_{i,j,k} = γ / a_{i,j,k}
      • γ: constant
      • P_{i,j,k}: current source
      • t_{i,j,k}: temperature
      • I_{i,j,k}: heat flow
  • Objective

$$\#\text{total\_via} = \sum_{k=2}^{N} \sum_{i,j} a_{i,j,k}, \qquad a_{i,j,k} \ge \frac{\gamma\, I_{i,j,k}}{t_{i,j,k} - t_{i,j,k-1}}$$

  • Constraints
      • Capacity constraint
      • Temperature constraint
      • Kirchhoff's current law
  • Constrained NLP
      • Can be solved by a general NLP solver
      • But very time consuming

[Figure: tile with vertical resistance R_{i,j,k} = γ / a_{i,j,k}, fixed lateral resistances R, heat flow I_{i,j,k}, and temperature t_{i,j,k}]

Alternating Direction TS-Via Planning (ADVP)

  • Decompose the NLP into simplified sub-problems
  • Optimizing the via distribution in one direction at a time
  • Alternating between vertical via planning and horizontal via planning at each level
  • Updating the heat flow after every step

Vertical TS-Via Planning

  • Resistive network → resistive chain; NLP → convex programming
      • Solvable by any convex programming tool
  • Theorem:
      • With no capacity constraint: TS-via number proportional to the square root of ∆t
      • VPPT

$$a_2 : a_3 : a_4 = \sqrt{\Delta t_2} : \sqrt{\Delta t_3} : \sqrt{\Delta t_4}$$

[Figure: resistive chain with heat flows I1 to I4 at nodes 1 to 4, fixed R1, and via-dependent resistances R2 = γ/a2, R3 = γ/a3, R4 = γ/a4]
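One way to see where a square-root rule can come from is to work the chain sub-problem in closed form. This is a sketch under stated assumptions, not the paper's derivation: heat flows I_k are held fixed, there is no capacity constraint, and the total temperature drop across the via-controlled resistances R_k = γ/a_k must stay within a budget. Minimizing Σ a_k subject to Σ γ·I_k/a_k ≤ budget with a Lagrange multiplier gives a_k proportional to √I_k. If ∆t is read as the drop under equal resistances (so ∆t_k ∝ I_k), that is a_k ∝ √∆t_k. The names `vertical_via_plan` and `chain_drop` and all numbers are hypothetical.

```python
import math


def vertical_via_plan(heat_flows, gamma, temp_budget):
    """Closed-form minimizer of sum(a_k) subject to sum(gamma * I_k / a_k) <= temp_budget.

    Setting the derivative of the Lagrangian to zero gives a_k ~ sqrt(I_k);
    the scale factor follows from making the temperature constraint tight.
    """
    root_sum = sum(math.sqrt(i) for i in heat_flows)
    return [gamma * math.sqrt(i) * root_sum / temp_budget for i in heat_flows]


def chain_drop(heat_flows, vias, gamma):
    """Total temperature drop across resistances R_k = gamma / a_k."""
    return sum(gamma * i / a for i, a in zip(heat_flows, vias))


# Hypothetical values for I_2, I_3, I_4.
flows = [9.0, 4.0, 1.0]
gamma = 2.0
budget = 12.0

vias = vertical_via_plan(flows, gamma, budget)

# A uniform distribution that meets the same budget needs more vias in total.
uniform_each = gamma * sum(flows) / budget
uniform_total = uniform_each * len(flows)
```

In this example the square-root plan meets the budget with fewer total vias than the uniform plan. Real via counts are integers and subject to capacity limits, which this sketch ignores.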

Horizontal TS-Via Planning

  • Still an NLP
  • Further simplification
      • TTS-via number given
      • Even out ∆t in one layer
      • TS-via number proportional to the vertical heat flow I_{i,j,k}
  • Fast heat flow estimation
      • Through path counting
      • Error can be corrected by the accurate model

[Figure: tile in layer k with power P_{i,j,k} and vertical heat flows I_{i,j,k} and I_{i,j,k+1}; example tiles numbered 1 to 5]
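The proportional rule follows from ∆t = I·γ/a: if every tile in a layer gets a via count proportional to its vertical heat flow, all tiles see the same ∆t. The sketch below distributes a given layer budget that way, using largest-remainder rounding to keep counts integral. The function name `allocate_vias` and the rounding scheme are assumptions for illustration; capacity limits and the path-counting heat flow estimate are not modeled.

```python
def allocate_vias(total_vias, heat_flows):
    """Split total_vias across tiles in proportion to their vertical heat flow."""
    total_flow = sum(heat_flows)
    shares = [total_vias * f / total_flow for f in heat_flows]

    # Round down first, then hand leftover vias to the largest fractional parts.
    counts = [int(s) for s in shares]
    leftover = total_vias - sum(counts)
    by_fraction = sorted(range(len(shares)),
                         key=lambda i: shares[i] - counts[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical layer with three tiles.
exact = allocate_vias(10, [1.0, 3.0, 6.0])
rounded = allocate_vias(7, [1.0, 1.0, 2.0])
```

When the shares divide evenly, as in the first call, each tile's I/a ratio is identical, so ∆t is uniform across the layer.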

Experiment Setup

  • Four-layer 3D floorplanning results from 3DFP [ICCAD04]
  • MCNC and GSRC floorplanning benchmarks
  • Power density: random value (10^5 to 10^7 W/m^2)
  • Required temperature: 77 °C

Benchmark characteristics

circuit   block #   net #   Init Temp (°C)
ami33     33        123     298.8
ami49     49        408     210.7
n100      100       885     275.3
n200      200       1585    311.2
n300      300       1893    290.2

Experimental Results: Temperature Reduction

  • With thermal via insertion, temperature can be reduced to the required temperature (77 °C)
  • Thermal via insertion can reduce the maximum on-chip temperature by over 40%

[Figure: bar chart of T (°C), axis 50 to 350, for ami33, ami49, n100, n200, and n300; series: input, and after routing with thermal via insertion]

Temperature Maps of ami33 Top Layer

[Figure: two temperature maps, before and after thermal via insertion; legend bins in 1-degree steps covering 152 to 158 and 63 to 77]

Experimental Results: Different TS-Via Planners

  • All can reach the required temperature
  • m-ADVP
      • 11% reduction over flat ADVP
      • 68% reduction over TS-via insertion by temperature (m-VPPT)
      • 3.5x reduction over even TS-via distribution

[Figure: normalized TS-via number (axis 1 to 8) for ami33, ami49, n100, n200, and n300; series: m-ADVP, f-ADVP, m-VPPT, even]

Experimental Results: Final Routing Results

  • Completion rates: m-ADVP: 96.9%, m-VPPT: 93.7%, even: 73.44%
  • Normalized runtime: m-ADVP: 1.0, m-VPPT: 1.49, even: 3.8

[Figure: completion rates (axis 0.5 to 1) and runtime in seconds (axis 2 to 10) for ami33, n100, and n300; series: m-ADVP, m-VPPT, even]


2D to 3D Transformation by Local Stacking

1. 2D placement on area K*A
  • For a 3D chip with K device layers, each with area A
2. Shrink: $(x_i, y_i) \rightarrow (x_i/\sqrt{K},\ y_i/\sqrt{K})$
3. Tetris-style 3D legalization
  • Cost R = αd + βv + γt
  • Minimize displacement, #via, and thermal cost
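The shrink step and the legalization cost can be sketched in a few lines. Scaling both coordinates by 1/√K maps a placement on area K·A onto area A, after which cells overlap and must be legalized across the K layers. The function names, the default weights, and the sample values are assumptions for illustration; the actual Tetris-style legalizer is not reproduced here.

```python
import math


def shrink(cells, num_layers):
    """Scale 2D cell locations by 1/sqrt(K) so area K*A maps onto area A."""
    scale = math.sqrt(num_layers)
    return [(x / scale, y / scale) for x, y in cells]


def legalization_cost(displacement, via_count, thermal_cost,
                      alpha=1.0, beta=1.0, gamma=1.0):
    """Cost R = alpha*d + beta*v + gamma*t for one candidate cell position."""
    return alpha * displacement + beta * via_count + gamma * thermal_cost


# Hypothetical example with K = 4 device layers.
shrunk = shrink([(8.0, 4.0), (2.0, 6.0)], 4)
cost = legalization_cost(2.0, 3.0, 4.0, alpha=1.0, beta=0.5, gamma=0.25)
```

A legalizer would evaluate `legalization_cost` for each candidate (layer, position) of a cell and keep the cheapest one; raising `beta` trades wirelength for fewer vias.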

2D to 3D Transformation by Folding

Layer assignment and location mapping according to the folded order

[Figure: Folding-2 and Folding-4 examples]
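To make "layer assignment and location mapping according to the folded order" concrete, the sketch below folds a placement along the x axis like a sheet of paper: the chip is cut into equal-width strips, each strip becomes one layer, and odd strips are mirrored so that cells on either side of a fold line end up vertically aligned. This accordion scheme along a single axis is an assumption made for illustration; the slides do not spell out the mapping, and a four-layer fold may well fold in both x and y instead. The name `fold_x` is hypothetical.

```python
def fold_x(x, width, num_layers):
    """Map a 2D x-coordinate to (layer, folded_x) for an accordion fold along x."""
    strip_width = width / num_layers
    # Clamp so that x == width falls in the last strip.
    layer = min(int(x // strip_width), num_layers - 1)
    local = x - layer * strip_width
    if layer % 2 == 1:
        # Odd strips are flipped over, so their x runs in the opposite direction.
        local = strip_width - local
    return layer, local


# Hypothetical chip of width 8 folded once into two layers.
left_cell = fold_x(1.0, 8.0, 2)
right_cell = fold_x(5.0, 8.0, 2)
just_after_fold = fold_x(4.0, 8.0, 2)
```

Cells adjacent across the fold line (x just below and just above 4 here) land at nearly the same folded x on neighboring layers, so nets crossing the fold need only a short vertical connection.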

Window-based Stacking / Folding

1. Divide the 2D placement into N×N windows
2. Apply stacking or folding within each window

The effect of stacking or folding is spread out, and trade-offs are achieved by varying N

3D Placement via Transformation

  • Features
      • Existing well-performing 2D placers can be reused
      • Simple but effective transformation heuristics
      • Trade-off between wirelength and #via to adapt to different manufacturing abilities
      • Refinement through RCN graph

[Flow: 2D wirelength- and/or thermal-driven placement → 2D to 3D transformation → layer reassignment through RCN graph → 2D detailed placement for each layer; a fast thermal model and an accurate thermal model support the flow]

3D Placement Results (1/2)

  • Wirelength (stacking) compared to 2D mPL5
  • Wirelength vs. # TS via trade-offs

circuit   2D mPL5    T3Place
ibm01     5.19E+06   2.51E+06
ibm02     1.44E+07   6.95E+06
ibm03     1.37E+07   6.67E+06
ibm04     1.67E+07   8.21E+06
ibm05     4.23E+07   1.94E+07
ibm06     2.20E+07   1.09E+07
ibm07     3.73E+07   1.90E+07
ibm08     3.94E+07   1.98E+07
ibm09     3.46E+07   1.78E+07
ibm10     6.82E+07   3.61E+07
ibm11     5.02E+07   2.51E+07
ibm12     7.58E+07   3.78E+07
ibm13     6.58E+07   3.30E+07
ibm14     1.42E+08   7.40E+07
ibm15     1.65E+08   8.42E+07
ibm16     2.04E+08   1.06E+08
ibm17     3.05E+08   1.60E+08
ibm18     2.43E+08   1.28E+08
avg.      1          0.5

[Figure: number of TS vias (axis 0 to 8.00E+04) versus wirelength (axis 2.00E+07 to 4.50E+07); series: folding + sequential, stacking + sequential, folding + symmetric, stacking + symmetric (also labeled folding + 7(a), stacking + 7(a), folding + 7(b), stacking + 7(b))]

3D Placement Results (2/2)

  • Effect of temperature optimization

          LST, r = 10%   LST, r = 10%, w/ temp optimization
circuit   Temp. (°C)     WL         via #     Temp. (°C)
ibm01     276.5          2.81E+06   19020     159.8
ibm03     196.7          7.13E+06   31780     121.6
ibm04     159.6          9.11E+06   40219     96.0
ibm06     160.4          1.23E+07   50576     103.5
ibm07     107.5          2.01E+07   69111     66.4
ibm08     97.7           2.05E+07   75397     63.2
ibm09     96.1           1.94E+07   78102     60.6
ibm13     249.3          3.47E+07   127520    156.2
ibm15     136.5          8.58E+07   260681    90.1
ibm18     89.4           1.31E+08   332012    58.7
Avg.      1.0            1.08       1.06      0.63

TS Via Aware 3D Placement

  • Problem Formulation
      • Minimize WL(x,y,z) + viaCost(x,y,z)
      • Subject to the overlap-free condition
  • Relaxed Placement Model
      • Possible layout
      • Placement model

[Figure: a net connecting cells c1, c2, and c3 across Tier 0 to Tier 3 of the placement region, with its WL and viaCost components]

Problem Formulation

  • Wirelength (WL)
      • Bounding box model
  • viaCost
      • Area consumption
      • Density congestion
  • Overlap-free condition
      • Achieved by area density penalty method
      • Two interleaved sets of density control planes

[Figure: WL and viaCost for cells c1, c2, and c3]

Density Penalty Method

  • Special case: z direction fixed
      • A set of four density control planes
      • Similar to the 2D case [mPL5]
      • Penalty minimized ↔ no overlaps
  • General case: z direction relaxed
      • Two sets of density control planes (blue lines and red lines)
      • Cell is diffused and distributes area
      • Penalty minimized ↔ no overlaps

[Figure: WL and viaCost for cells c1, c2, and c3, with cell c2 shown for η = 0, η = ½, and η = 1]

Preliminary Results

  • Tradeoff curve compared with [ASPDAC'07] on ibm01
      • Achieves up to 50% #TSV reduction
      • or 12% wirelength reduction


Thermal-Aware 3D Floorplanning [ICCAD04]

  • First work in this field
  • Simulated Annealing (SA) engine
      • New local z-neighbor operations
  • Cost function

$$\text{cost} = \alpha \cdot nwl + \beta \cdot narea + \gamma \cdot nvc + \eta \cdot c_T$$

      • nwl: normalized wirelength
      • narea: normalized chip area
      • nvc: normalized interlayer via number
      • c_T: temperature cost
  • Hybrid thermal evaluation
      • At each move: uses the simplified chain model
      • At each SA temperature drop: uses the resistive network model

[Figure: example 3D floorplan with blocks a to k on layers L1, L2, and L3]
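A minimal sketch of the weighted cost and of the hybrid evaluation schedule follows. The cost is the weighted sum α·nwl + β·narea + γ·nvc + η·c_T; the schedule picks the fast chain model for ordinary moves and the slower resistive network model once per SA temperature step. The function names, default weights, and the fixed moves-per-temperature schedule are assumptions for illustration, not details of 3DFP.

```python
def floorplan_cost(nwl, narea, nvc, c_t,
                   alpha=1.0, beta=1.0, gamma=1.0, eta=1.0):
    """cost = alpha*nwl + beta*narea + gamma*nvc + eta*c_T (all terms normalized)."""
    return alpha * nwl + beta * narea + gamma * nvc + eta * c_t


def thermal_model_for_move(move_index, moves_per_temperature):
    """Choose which thermal model evaluates c_T for this SA move.

    The last move at each SA temperature uses the accurate resistive network
    model; every other move uses the fast chain model.
    """
    at_temperature_drop = (move_index + 1) % moves_per_temperature == 0
    return "resistive_network" if at_temperature_drop else "chain"


# Hypothetical normalized metrics and weights.
cost = floorplan_cost(1.0, 1.0, 1.0, 2.0, alpha=0.4, beta=0.3, gamma=0.2, eta=0.1)
schedule = [thermal_model_for_move(i, 3) for i in range(6)]
```

Setting `eta` to zero recovers a purely wirelength, area, and via driven floorplanner, which makes it easy to compare against a thermal-aware run.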

Temperature/Runtime Tradeoff

  • 3DFP-T can reduce the temperature by 56% with 9.7x runtime
  • 3DFP-T-Fast can reduce the temperature by 40% with 1.8x runtime
  • 3DFP-T-Hybrid can reduce the temperature by 50% with 3.2x runtime
  • Wirelength increase less than 6%

[Figure: normalized temperature (axis 0.2 to 1.2) versus normalized runtime (axis 5 to 15) for 3DFP, 3DFP-T, 3DFP-T-Fast, and 3DFP-T-Hybrid]

Detailed Simulation Result

[Figure: two thermal maps, without thermal optimization and with thermal optimization]

  • ami33 benchmark with 33 blocks and 4 layers
  • Generated by FEM-based thermal simulation tool (CFD-ACE+)

3D Floorplanning with Folded Blocks

Exploring vertical integration in microprocessor design requires considering both physical design and architecture.

  • True 3D packing
  • Architectural alternative selection
      • The number of layers in folded blocks
      • The partitioning method: block folding or port partitioning

3D Architectural Blocks – Issue Queue

  • Block folding
      • Fold the entries and place them on different layers
      • Effectively shortens the tag lines
  • Port partitioning
      • Place tag lines and ports on multiple layers, thus reducing both the height and width of the ISQ
      • The reduction in tag and matchline wires can help reduce both power and delay
  • Benefits from block folding
      • Maximum delay reduction of 50%, maximum area reduction of 90%, and maximum power reduction of 40%

[Figure: (a) 2D issue queue with 4 taglines; (b) block folding; (c) port partitioning]

3D Architectural Blocks – Caches

[Figure: single-layer design, wordline folding, and port partitioning]

  • 3D-CACTI: a tool to model 3D caches for area, delay, and power
      • We add the port partitioning method
      • The area impact of vias
  • Improvements
      • Port folding performs better than wordline folding for area (72% vs 51%)
      • Wordline folding is more effective in reducing the block delay (13% vs 5%)
      • Port folding also performs better in reducing power (13% vs 5%)

Corner Block List (CBL) Representation for 3D Floorplan (ICCD'07)

A 3D CBL is a 3-tuple (S, L, T)

  • S: a list of block names
  • L: corner cubic block orientations (X-, Y-, or Z-oriented)
  • T: the sequence {Tn, Tn-1, …, T2} recording the number of blocks (represented by that number of 1's, separated by a 0) covered by the corner cubic block in the uncovered block list

Example with blocks 1 to 5: S = {1 2 3 4 5}, L = (Y, Z, Y, X), T = (10, 110, 10, 1110)


3D Architecture Evaluation with Physical Planning: MEVA-3D [DAC'03 & ASPDAC'06]

  • Optimize
      • BIPS (not IPC or frequency)
      • Consider interconnect pipelining based on early floorplanning for critical paths
      • Use IPC sensitivity model [Jagannathan05]
      • Area/wirelength
      • Temperature

[Flow diagram with ESTIMATION and VALIDATION stages. Inputs: microarchitecture configuration, target frequency, critical architectural paths and sensitivity, power density estimates. Steps: 2D/3D floorplanning for performance and thermal with interconnect pipelining; performance simulation with interconnect latencies; 2D/3D thermal simulation. Data passed between steps: estimated performance, temperature, and interconnect data; power density with interconnect consideration. Outputs: performance, power, and temperature.]

IPC Sensitivity Models

  • Study sensitivity by varying the latency of path P with all other parameters fixed
  • Build mathematical models [linear, piece-wise linear, etc., or table-lookup]
      • P_BL: minimum latency along P (only from blocks)
      • P_PL: post-layout latency along P (blocks + wires)
      • Delta latency δ = (P_PL – P_BL)
      • f(P,δ): relative degraded IPC with extra δ cycles of latency on P
      • f(P,δ) = (1 – x)^δ, where x is the per-cycle IPC degradation for P
      • e.g.: 2 extra cycles, new IPC = (1-0.024)*(1-0.024)
      • IPC_PL = IPC_BL × f(P,δ)
  • We ignore path interactions and use a simple additive model to combine multiple paths

IPC_PL(P1,P2,…,PN,δ1,δ2,…,δN) = IPC_BL(P1,P2,…,PN,0,0,…,0) * f(P1,δ1) * f(P2,δ2) * … * f(PN,δN)
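The sensitivity model is simple enough to evaluate directly. The sketch below computes f(P,δ) = (1 − x)^δ for one path and multiplies the factors of several paths onto a baseline IPC, ignoring path interactions as the slide does. The function names, the baseline IPC of 2.0, and the second path's parameters are hypothetical; only the 0.024 per-cycle degradation with 2 extra cycles comes from the slide's example.

```python
def relative_ipc(per_cycle_degradation, extra_cycles):
    """f(P, delta) = (1 - x) ** delta for a single critical path."""
    return (1.0 - per_cycle_degradation) ** extra_cycles


def post_layout_ipc(baseline_ipc, paths):
    """IPC_PL = IPC_BL * f(P1, d1) * ... * f(PN, dN).

    paths: list of (per_cycle_degradation, extra_cycles) pairs, one per path.
    """
    ipc = baseline_ipc
    for x, delta in paths:
        ipc *= relative_ipc(x, delta)
    return ipc


# Slide example: 2 extra cycles at 2.4% degradation per cycle.
slide_example = relative_ipc(0.024, 2)

# Hypothetical two-path design with a baseline IPC of 2.0.
combined = post_layout_ipc(2.0, [(0.024, 2), (0.1, 1)])
```

Because each path contributes an independent factor, a floorplanner can rank candidate layouts by recomputing only the factors of paths whose wire latency changed.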

Design Example

  • An out-of-order superscalar processor micro-architecture with 4 banks of L2 cache in 70 nm technology
  • Critical paths

Baseline Processor Parameters

Wirelength Improvement from 3D Layout

[Figure: wirelength (axis 20000 to 120000) at 3G, 4G, 5G, and 6G for 2D and 3D layouts]

Assume two device layers

Performance Improvement of 3D Layout

Assume two device layers

2D vs 3D Layout

  • 2D EV6-like core: BIPS = 2.75
  • 3D EV6-like core (2 layers): BIPS = 2.94
  • Wakeup loop: the extra cycle is eliminated
  • Branch misprediction resolution loop and the L2 cache access latency: some of the extra cycles are eliminated

Assume two device layers

Maximum On-Chip Temperatures

[Figure: maximum on-chip temperature versus frequency]

HS denotes a heat sink; 3D integration allows thermal vias to be inserted to reduce the temperature.

Assume two device layers

Thermal Profiles for 2D Chip (4 GHz)

[Figure: temperature distribution in 2D integration]

Thermal Profiles for 3D Chip (4 GHz)

[Figure: temperature distribution in 3D integration with one heat sink]

[Figure: temperature distribution in 3D integration with two heat sinks and flipped upper layer]

Limitation of Component Stacking Alone

  • Extra latency seen by some critical loops:
      • Stacking can only attack wire latency between blocks
      • Further benefit can only come from attacking block latency
  • Component folding

Solution: 3D Design w/ Component Folding and Stacking

  • Explore 3D design of architectural structures that are
      • Timing/throughput critical
      • Expensive in terms of power consumption and/or thermal output
  • Possible candidates for 3D component folding
      • Instruction scheduling window
          • Issue queue can be partitioned into multiple levels via matchlines or taglines
      • On-chip caches
          • Regular structure lends itself to a wide range of partitionings
      • Register file
          • Thermally critical resource – also has a regular structure

Results from 3D Folding and Stacking

[Figure: results (axis 0.5 to 4) at 3G, 4G, 5G, and 6G for 1 layer, 2 layers, 3 layers, and 4 layers]

Over 35% performance improvement

5GHz 3 Device Layer Layout

Exploration of 3D MultiCore Systems -- MC-Sim

[Block diagram: several SESC instances (each with MINT and cores C), L2 banks with a cache controller, a central page handler, and a functional network switch that exchanges messages and message latencies with a SystemC NoC model]
SLIDE 29

Page

MC-Sim Components

A number of SESC instances

Each instance is a number of cores cooperating on a single (potentially multithreaded) application

A number of cache banks

Shared cache state that can be accessed by any SESC instance

A central page handler

To dole out physical pages to SESC instances Allows support for multitasking

A functional network switch

To functionally route messages between components

A SystemC NoC model

To accurately model latency and power Entries in the functional switch wait for an amount of time specified by the NoC

Summary

  • Very little 3D CAD support from major EDA vendors
  • A complete set of thermal-aware 3D IC physical design tools is available from the UCLA/PennState/IBM collaboration
      • 3D thermal modeling
      • 3D routing with thermal via planning
      • 3D placement
      • 3D floorplanning
  • 3D physical design tools provide the capability for early physical prototyping for microarchitecture exploration
      • Coupled with 3D physical planning
      • Consider both 3D component stacking and folding
      • Over 35% performance improvement

Further Reading

  • Y. Liu, Y. Ma, E. Kursun, J. Cong, and G. Reinman, “Fine Grain 3D Integration for Microarchitecture Design Through Cube Packing Exploration,” Proceedings of the 25th IEEE International Conference on Computer Design, Lake Tahoe, CA, pp. 259-266, October 2007.
  • J. Cong, Y. Ma, Y. Liu, E. Kursun, and G. Reinman, “3D Architecture Modeling and Exploration,” Proceedings of the 24th International VLSI/ULSI Multilevel Interconnection Conference (VMIC), Fremont, CA, pp. 231-238, September 2007.
  • G. Loh, Y. Xie, and B. Black, “3D processor Design,” IEEE Micro, 2007.
  • J. Cong, G. Luo, J. Wei, and Y. Zhang, “Thermal-Aware 3D IC Placement via Transformation,” Proceedings of the 12th Asia and South Pacific Design Automation Conference (ASP-DAC 2007), Yokohama, Japan, pp. 780-785, January 2007.
  • Y. Xie, G. Loh, B. Black, and K. Bernstein, “Design Space Exploration for 3D Architecture,” ACM Journal on Emerging Technologies in Computing Systems 2(2):65-103.
  • J. Cong and Y. Zhang, “Thermal Via Planning for 3-D ICs,” Proceedings of the 2005 IEEE/ACM Int’l Conference on Computer Aided Design, pp. 745-752, November 2005.
  • Y.-F. Tsai, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, “Three-Dimensional Cache Design Exploration Using 3DCacti,” Proceedings of the IEEE International Conference on Computer Design (ICCD 2005), pp. 519-524.

http://cadlab.cs.ucla.edu/~cong

Acknowledgements

We gratefully acknowledge support from

DARPA

Support from the primary contractors --

Collaboration with CFDRC and IBM and

Publications are available from

http://cadlab.cs.ucla.edu/~cong