A Framework for Layout-Level Logic Restructuring Hosung Leo Kim - - PowerPoint PPT Presentation



slide-1
SLIDE 1

A Framework for Layout-Level Logic Restructuring

Hosung Leo Kim John Lillis

slide-2
SLIDE 2

Motivation: Logical-to-Physical Disconnect

  • Performance is determined largely by physical-level interconnect delay.
  • Problem: timing optimization at logic-level ≠ actual performance.

[Diagram: logic-level optimization hands a fixed netlist to physical-level optimization. The disconnect: physical-level optimization is limited by the structure obtained from the logic level.]

slide-3
SLIDE 3

Past Layout-Driven Restructuring Work: Replication Based

  • Basic Operations:

– Gate Splitting
– Fanout Partitioning; enables “Path Straightening”

  • [Schabas, Brown. ISFPGA03]
  • [Beraudo, Lillis. DAC03]
  • [Hrkic, Lillis, Beraudo. TCAD06]
  • [Chen, Cong. ISFPGA05]
slide-4
SLIDE 4

Limitation of Logic Replication

  • While interconnect delay can be significantly reduced, the LUT-depth of a path remains unchanged.
  • The LUT-depth is typically determined by a technology mapper which does not have an accurate view of critical paths.

Candidate: Remapping

slide-5
SLIDE 5

Other Work

  • Redundant Wires (e.g., [Chang, Cheng, Suaris, Marek-Sadowska. DAC00])

Rewire connections while keeping logical equivalence. Predictable, but optimization scope limited.

  • [Lin, Jagannathan, and Cong. ISFPGA03]

Remap based on placement-level timing analysis. Significant restructuring, but placement of remapped cells is determined by the initial placement (not simultaneous).

  • [Singh and Brown. Integration07]

Shannon’s expansion / precomputation. Allows late signals to skip logic levels, but relatively local in nature.

slide-6
SLIDE 6

Objectives

  • Overcome limitations of basic replication (e.g., fixed LUT-depth)
  • Large and flexible remapping space
  • Explicitly account for placement freedom of remapped LUTs
  • Tight coupling with placement
slide-7
SLIDE 7

Components of Approach (FPGA Domain)

  • Placement-Level Static Timing Analysis
  • Timing-Critical Fan-in Cone Extraction
  • Induce Replication Tree [Hrkic,TCAD06]

slide-8
SLIDE 8

Components of Approach (cont’d)

  • Replication tree
  • Recursive, Exhaustive Ashenhurst LUT Decomposition
  • Subject Graph (Choice Tree)
  • Mapper and Embedder (Dynamic Programming)
  • Legalizer

[Figure: (a) Given LUT-tree with LUTs A, B, and C; (b) Choice tree; embedded tree with nodes a, e, d, f, bR, cR]

slide-9
SLIDE 9

Remapping Example

[Figure: (a) Given LUT-tree; (b) “Mini-LUT” tree after LUT-decompositions; (c) Alternative mapping; (d) Corresponding LUT-tree]

slide-10
SLIDE 10

Functional Decomposition

  • Test for decomposability
– Ashenhurst’s theorem
  • Recursively decompose
  • Simple Disjoint Functional Decomposition

[Figure: decomposition chart of f(w, x, y, z) over xy and wz, and the corresponding 1-bit (simple) disjoint decomposition of f into g1 and g2]
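The decomposability test can be illustrated with a short sketch. By Ashenhurst's theorem, a function has a simple disjoint decomposition f = g(h(bound set), free set) exactly when its decomposition chart has at most two distinct columns. The function names and the example function below are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from itertools import product

def column_multiplicity(f, n, bound):
    """Count distinct columns of the decomposition chart of f.

    f     : callable taking a tuple of n bits and returning 0 or 1
    n     : number of input variables
    bound : indices of the bound-set variables (one chart column per assignment)
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b_bits in product((0, 1), repeat=len(bound)):
        column = []
        for f_bits in product((0, 1), repeat=len(free)):
            x = [0] * n
            for idx, bit in zip(bound, b_bits):
                x[idx] = bit
            for idx, bit in zip(free, f_bits):
                x[idx] = bit
            column.append(f(tuple(x)))
        columns.add(tuple(column))
    return len(columns)

def is_simple_disjoint_decomposable(f, n, bound):
    """Ashenhurst's condition: column multiplicity of at most 2."""
    return column_multiplicity(f, n, bound) <= 2

# Hypothetical example with variables ordered (w, x, y, z).
# By construction f = g2(g1(w, z), x, y) with g1 = w XOR z.
def f_example(v):
    w, x, y, z = v
    return (w ^ z) & (x | y)

print(is_simple_disjoint_decomposable(f_example, 4, bound=(0, 3)))  # bound set {w, z}
print(is_simple_disjoint_decomposable(f_example, 4, bound=(0, 1)))  # bound set {w, x}
```

Applying the same test to every candidate bound set, and then again to the resulting sub-functions, gives the recursive decomposition described on the slide.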
slide-11
SLIDE 11

All Recursive Decompositions

[Figure: all recursive decompositions of a function f(a, b, c, d) into sub-functions g1 through g8]

slide-12
SLIDE 12

Choice Tree [Lehman,TCAD97]

[Figure: (a) Given LUT-tree with LUTs A, B, and C; (b) Choice tree with choice nodes over the alternative decompositions g1 through g8]

slide-13
SLIDE 13

Algorithm

  • Mini-LUT Tree Mapping
  • Fan-in Tree Embedding [Hrkic,TCAD06]
  • Simultaneous Remapping and Embedding

slide-14
SLIDE 14

Logic Remapping Formulation

  • Formulation
– Given a “mini-LUT” tree and arrival times at the leaves,
– map the tree to K-input LUTs, minimizing cost subject to an arrival time constraint at the root.

[Figure: example mini-LUT tree with nodes a–f and leaves g–m]
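Once a set of (cost, arrival time) solutions is known at the root, the formulation reduces to picking the cheapest solution that meets the arrival time constraint. The sketch below assumes that lower cost and earlier arrival are better; the function name is hypothetical, and the sample values are borrowed from the S[a][v23] example that appears later in the deck.

```python
def cheapest_meeting(solutions, required_arrival):
    """Pick the lowest-cost (cost, arrival) pair whose arrival meets the constraint.

    Returns None when no solution is fast enough.
    """
    feasible = [(c, a) for c, a in solutions if a <= required_arrival]
    return min(feasible, default=None)

root_solutions = [(19, 13), (20, 11), (22, 10)]
print(cheapest_meeting(root_solutions, 11))  # (20, 11)
print(cheapest_meeting(root_solutions, 9))   # None
```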

slide-15
SLIDE 15

Solution Signature

  • (c,a)
– For a sub-tree rooted at u, a solution is characterized by two parameters:
  • c: cost of the embedding (and remapping) of the sub-tree.
  • a: arrival time at u.
  • Dominance Relation
– (c,a) is not dominated by (c’,a’) when c is better than c’ or a is better than a’.

[Plot: cost vs. arrival time]
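The dominance relation can be sketched as a small pruning routine. This is a minimal illustration, assuming that "better" means a lower cost and an earlier arrival time; the function names are hypothetical.

```python
def dominates(p, q):
    """True if p = (cost, arrival) is at least as good as q in both and differs from it."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def non_dominated(solutions):
    """Keep only the non-dominated (cost, arrival) pairs, sorted by cost."""
    frontier = []
    best_arrival = float("inf")
    # Scanning in order of increasing cost, a pair survives only if it
    # arrives strictly earlier than every cheaper pair seen so far.
    for c, a in sorted(set(solutions)):
        if a < best_arrival:
            frontier.append((c, a))
            best_arrival = a
    return frontier

candidates = [(22, 10), (19, 13), (20, 11), (21, 12), (20, 14)]
print(non_dominated(candidates))  # [(19, 13), (20, 11), (22, 10)]
```

Here (21,12) and (20,14) are dropped because (20,11) is at least as cheap and arrives earlier than both.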

slide-16
SLIDE 16

Solution Sets

  • Si [u] = {(c,a)} (before finalization)
– u: signal produced by root LUT
– i: # inputs of root LUT
– c: # LUTs in subtrees
– a: the latest arrival time among the fan-ins

Example: fan-ins i (0,2) and h (0,6) feeding u give S2[u]={(0,6)}

  • Si[u] (finalized)
– “finalized” solution from Si [u]
– c: # LUTs in subtrees + 1
– a: arrival time with the root LUT included

Example: fan-ins i (0,2) and h (0,6) feeding u give S2[u]={(1,7)}

  • S[u]
– non-dominated_sol(S2[u], … , SK[u])

slide-17
SLIDE 17

Si [u] Example

For simplicity:

  • One LUT = one unit cost
  • One LUT = one unit delay
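With these unit values, the step from an unfinalized to a finalized solution can be written out in a few lines. The sketch below reproduces the two-leaf example from the previous slide (fan-ins i and h with signatures (0,2) and (0,6)); the function names are hypothetical.

```python
LUT_COST = 1   # simplification from the slide: one LUT = one unit cost
LUT_DELAY = 1  # simplification from the slide: one LUT = one unit delay

def join_fanins(fanins):
    """Unfinalized signature of a root LUT fed directly by the given fan-ins.

    Cost is the number of LUTs in the subtrees; arrival is the latest fan-in.
    """
    cost = sum(c for c, _ in fanins)
    arrival = max(a for _, a in fanins)
    return [(cost, arrival)]

def finalize(unfinalized):
    """Close the root LUT: count it in the cost and add its delay."""
    return [(c + LUT_COST, a + LUT_DELAY) for c, a in unfinalized]

s2_open = join_fanins([(0, 2), (0, 6)])  # leaves i and h
s2_final = finalize(s2_open)
print(s2_open, s2_final)  # [(0, 6)] [(1, 7)]
```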

slide-18
SLIDE 18

Si[u] and S[u] Example

  • S[b] = non-dominated_sol(S2[b], … , SK[b]) = {(1,7)}

  • Si[b]
slide-19
SLIDE 19

Computation of Si [u]

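With L contributing i inputs and R contributing K − i inputs to the root LUT:

  • i = 1: no collapsing of u and L
  • i = K−1: no collapsing of u and R
  • Otherwise: collapsing of u, L, and R

For K = 4 (i = 1, 2, 3), with L = a and R = b:

S4[u] = join(S[a],S3[b]) ∪ join(S2[a], S2[b]) ∪ join(S3[a],S[b])

[Figure: panels (a) to (d) showing node u with children a and b (and nodes c, d below), for the cases i = 1, i = 2, and i = 3 (= K–1) with K = 4]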
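The recurrence can be sketched as a bottom-up dynamic program. The code below is an illustrative reading of the slide, not the authors' implementation: it assumes a binary mini-LUT tree, unit LUT cost and delay, no interconnect delay, and that a child contributing one input is either a leaf or keeps its own (finalized) LUT. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

K = 4  # LUT input count, as in the slide's example

@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    arrival: int = 0  # used only at leaves

def prune(sols):
    """Keep non-dominated (cost, arrival) pairs, lower being better."""
    out, best = [], float("inf")
    for c, a in sorted(set(sols)):
        if a < best:
            out.append((c, a))
            best = a
    return out

def join(xs, ys):
    """Combine child solutions: costs add, arrival is the later one."""
    return prune([(cx + cy, max(ax, ay)) for cx, ax in xs for cy, ay in ys])

def solve(u, k=K):
    """Return (open_sets, best) for node u.

    open_sets[i] is the unfinalized set for a root LUT with i inputs;
    best is the finalized non-dominated set S[u].
    """
    if u.left is None:  # leaf: available at its arrival time, zero cost
        return {}, [(0, u.arrival)]
    open_l, best_l = solve(u.left, k)
    open_r, best_r = solve(u.right, k)

    def side(open_sets, best, j):
        # j == 1: the child is not collapsed and feeds one input.
        # j >= 2: the child is collapsed into u's LUT and brings j inputs.
        return best if j == 1 else open_sets.get(j, [])

    open_u = {}
    for i in range(2, k + 1):
        cands = []
        for j in range(1, i):
            cands += join(side(open_l, best_l, j), side(open_r, best_r, i - j))
        if cands:
            open_u[i] = prune(cands)
    finalized = [(c + 1, a + 1) for sols in open_u.values() for c, a in sols]
    return open_u, prune(finalized)

# Hypothetical tree: u has children a and b, each with two leaf fan-ins.
a = Node("a", Node("p", arrival=1), Node("q", arrival=1))
b = Node("b", Node("r", arrival=1), Node("s", arrival=5))
open_u, best_u = solve(Node("u", a, b))
print(open_u[4], best_u)  # [(0, 5)] [(1, 6)]
```

In this small case, collapsing a, b, and u into one 4-input LUT gives the unfinalized (0,5), and the cheapest finalized solution overall is (1,6).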

slide-20
SLIDE 20

Remapping Algorithm Example

[Figure: (a) Subject tree with internal nodes a–f and leaves labeled with (cost, arrival time): g (0,4), h (0,6), i (0,2), j (0,3), k (0,2), l (0,1), m (0,4)]

slide-21
SLIDE 21

Algorithms

  • Mini-LUT Tree Mapping
  • Fan-in Tree Embedding [Hrkic,TCAD06]
  • Simultaneous Remapping and Embedding

slide-22
SLIDE 22

Tree Embedding [Hrkic,TCAD06]

[Figure: the Embedding Algorithm places a fan-in tree (nodes a, e, d, f, bR, cR) onto the target layout graph. Labels in the figure: topology, arrival time, pin locations, target layout graph, cost metrics, arrival time. Example signatures: (0,2), (0,3), (0,4)]

slide-23
SLIDE 23

Algorithms

  • Mini-LUT Tree Mapping
  • Fan-in Tree Embedding [Hrkic,TCAD06]
  • Simultaneous Remapping and Embedding

slide-24
SLIDE 24

Simultaneous Remapping and Embedding

  • Formulation
– Given a “mini-LUT” tree with fixed leaves and root, arrival times at the leaves, and a target layout graph,
– simultaneously map the tree to K-input LUTs and embed it.

slide-25
SLIDE 25

Solution Set Si [u][v]

  • The remapped root produces signal u and is placed at v in the target layout graph.

slide-26
SLIDE 26

Solution Set Si[u][v]

  • Solutions in Si [u][w] are finalized and drive vertex v in the target layout graph.
  • Computed by a shortest weight-constrained path algorithm.

[Figure: vertices w and v in the target layout graph, signal u with fan-ins h, i, j, k, and the resulting Si[u][v]]
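One way to picture this step is a bi-criteria label-setting search that spreads finalized (cost, arrival time) solutions from w across the layout graph, keeping only non-dominated pairs at each vertex. This is a generic technique that is often used for weight-constrained shortest paths and is not necessarily the algorithm the authors used; the graph encoding, edge weights, and function name are assumptions made for illustration.

```python
import heapq

def propagate(graph, source, seeds):
    """Spread finalized (cost, arrival) solutions from `source` over the graph.

    graph : dict vertex -> list of (neighbor, wire_cost, wire_delay),
            with non-negative wire_cost and wire_delay (assumed)
    seeds : finalized solutions available at `source`
    Returns dict vertex -> non-dominated (cost, arrival) list at that vertex.
    """
    labels = {}
    heap = [(c, a, source) for c, a in seeds]
    heapq.heapify(heap)
    while heap:
        c, a, v = heapq.heappop(heap)
        accepted = labels.setdefault(v, [])
        # Skip labels that an already accepted label at v matches or beats.
        if any(c2 <= c and a2 <= a for c2, a2 in accepted):
            continue
        accepted.append((c, a))
        for nxt, wire_cost, wire_delay in graph.get(v, ()):
            heapq.heappush(heap, (c + wire_cost, a + wire_delay, nxt))
    return labels

# Hypothetical layout graph: a cheap but slow direct wire from w to v,
# and a costlier but faster two-hop route through x.
layout = {
    "w": [("x", 1, 1), ("v", 1, 5)],
    "x": [("v", 1, 1)],
    "v": [],
}
result = propagate(layout, "w", seeds=[(1, 7)])
print(result["v"])  # [(2, 12), (3, 9)]
```

Both routes survive at v because neither dominates the other, which is why the sets carry several (cost, arrival time) pairs rather than a single best path.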

slide-27
SLIDE 27

Solution Set S[u][v]

  • S[u][v] ← non-dominated-sol(S2[u][v],…,SK[u][v])
  • The best remapping regardless of the number of inputs at v in the target layout graph.

slide-28
SLIDE 28

Simultaneous Remapping and Embedding Example

[Figure: mini-LUT tree (nodes a–f, leaves g–m) with root a placed at v23, and a cost vs. arrival time plot of its solutions]

S4[a][v23]={(22,10)}
S[a][v23]={(19,13),(20,11),(22,10)}

slide-29
SLIDE 29

Experiment

  • Benchmarks
– 20 MCNC benchmark circuits
– At least 20% white space
  • Criteria of Interest
– LUT depth
– Clock period of circuits
  • Comparisons
– Timing-driven VPR placer
– Replication Tree embedder
– Arbor embedder [Kim,GLSVLSI06]
– Remapping embedder
  • Different logic-level mappers and stability effect of new algorithm

slide-30
SLIDE 30

Optimization Flow

Initial Netlist & Placement → Static Timing Analysis & Replication Tree Construction → Tree Embedding → Modified Netlist → Post-Processing & Legalization → Modified Netlist & Placement

Tree Embedding:
  • Repl Tree embedder
  • Remapping embedder
slide-31
SLIDE 31

LUT Depth Changes

LUT depth of the whole circuit (ckt) and of the critical path (Crit. Path), for the initial (Init.) and the new (New) circuit:

ckt        Init. ckt   Init. Crit. Path   New ckt   New Crit. Path
ex5p       9           8                  9         7
tseng      12          12                 12        11
apex4      8           8                  9         8
misex3     9           8                  9         8
alu4       9           9                  9         8
diffeq     14          14                 13        10
dsip       8           5                  8         4
seq        8           8                  9         7
apex2      10          10                 10        9
s298       16          16                 14        13
des        8           5                  8         4
bigkey     5           4                  5         4
frisc      23          20                 22        22
spla       10          10                 10        9
elliptic   17          15                 18        15
ex1010     10          10                 10        9
pdc        11          10                 11        8
s38417     10          10                 10        8
s38584.1   10          5                  10        5
clma       15          14                 14        13
slide-32
SLIDE 32

Routed Clock Period

  • Average Normalized Clock Period

T-VPR   Repl    Arbor   Remap
1       0.886   0.848   0.826

  • Max reduction of Remap vs. Arbor: 11.7%

[Chart: normalized routed clock period (axis 0 to 1.2) per benchmark circuit, ex5p through clma, for VPR, Repl, Arbor, and Remap]

slide-33
SLIDE 33

Different Logic-level Mappers and Stability Effect of Remap

[Chart: results for circuit seq under VPR, Repl, and Remap with the mappers FlowMap, FlowMap-r, ZMap, Praetor, and Daomap (axis 10 to 90); annotated spans: 12% and 4%]

  • FlowMap: optimal depth.
  • FlowMap-r: relaxed depth.
  • ZMap: optimal depth with simultaneous area minimization.
  • Praetor: minimized area.
  • Daomap

slide-34
SLIDE 34

Summary

  • Study of layout-level restructuring for interconnect optimization.
– Functional Decomposition
– Choice Tree
– Remapping Algorithm
– Simultaneous remapping and embedding
  • Experimental Result
– Average 17% reduction in clock period compared with T-VPR.

slide-35
SLIDE 35

Thank You!