A Framework for Layout-Level Logic Restructuring
Hosung Leo Kim, John Lillis
Motivation: Logical-to-Physical Disconnect
- Performance is determined largely by physical-level interconnect delay.
- Problem: timing optimization at the logic level ≠ actual performance.
- Logic-level optimization hands a fixed netlist to physical-level optimization, which is then limited by the structure obtained from the logic level (the disconnect).
Past Layout-Driven Restructuring Work: Replication Based
- Basic Operations:
– Gate splitting
– Fanout partitioning
– Enables “path straightening”
- [Schabas, Brown. ISFPGA03]
- [Beraudo, Lillis. DAC03]
- [Hrkic, Lillis, Beraudo. TCAD06]
- [Chen, Cong. ISFPGA05]
Limitation of Logic Replication
- While interconnect delay can be significantly reduced, the LUT-depth of a path remains unchanged.
- The LUT-depth is typically determined by a technology mapper, which does not have an accurate view of critical paths.
- Candidate: remapping
Other Work
- Redundant wires (e.g., [Chang, Cheng, Suaris, Marek-Sadowska. DAC00])
– Rewire connections while keeping logical equivalence.
– Predictable, but optimization scope is limited.
- [Lin, Jagannathan, and Cong. ISFPGA03]
– Remap based on placement-level timing analysis.
– Significant restructuring, but placement of remapped cells is determined by the initial placement (not simultaneous).
- [Singh and Brown. Integration07]
– Shannon’s expansion / precomputation.
– Allows late signals to skip logic levels, but is relatively local in nature.
Objectives
- Overcome limitations of basic replication (e.g., fixed LUT-depth)
- Large and flexible remapping space
- Explicitly account for placement freedom of remapped LUTs
- Tight coupling with placement
Components of Approach (FPGA Domain)
- Placement-level static timing analysis
- Timing-critical fan-in cone extraction
- Induce replication tree [Hrkic,TCAD06]
Components of Approach (cont’d)
- Replication tree
- Recursive, exhaustive Ashenhurst LUT decomposition
- Subject graph (choice tree)
- Mapper and embedder (dynamic programming)
- Legalizer
Remapping Example
[Figure: (a) given LUT-tree; (b) “mini-LUT” tree after LUT decompositions; (c) alternative mapping; (d) corresponding LUT-tree.]
Functional Decomposition
- Test for decomposability
– Ashenhurst’s theorem
- Recursively decompose
[Figure: decomposition chart of f over the inputs w, x, y, z, indexed by xy and wz, next to the decomposed form built from g1 and g2.]
- Simple disjoint functional decomposition (1 bit)
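The decomposability test can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a function given as a Python callable over 0/1 inputs, builds the decomposition chart for a chosen bound set, and counts distinct columns. By Ashenhurst's theorem, a simple disjoint decomposition f = g2(g1(bound inputs), free inputs) exists exactly when the chart has at most two distinct columns. The names `column_multiplicity`, `is_simple_disjoint_decomposable`, and `example` are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def column_multiplicity(f: Callable[..., int], n: int, bound: Sequence[int]) -> int:
    """Count distinct columns of the decomposition chart of an n-input function.

    Columns are indexed by assignments to the bound-set inputs, rows by
    assignments to the remaining (free) inputs.
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        column = []
        for f_vals in product((0, 1), repeat=len(free)):
            x = [0] * n
            for idx, val in zip(bound, b_vals):
                x[idx] = val
            for idx, val in zip(free, f_vals):
                x[idx] = val
            column.append(int(f(*x)))
        columns.add(tuple(column))
    return len(columns)


def is_simple_disjoint_decomposable(f: Callable[..., int], n: int,
                                    bound: Sequence[int]) -> bool:
    """Ashenhurst test: at most two distinct chart columns."""
    return column_multiplicity(f, n, bound) <= 2


def example(w: int, x: int, y: int, z: int) -> int:
    """Hypothetical 4-input function: g1 = x XOR y, g2 = g1 AND (w OR z)."""
    return (x ^ y) & (w | z)
```

With inputs ordered (w, x, y, z), the bound set {x, y} is given as indices `(1, 2)` and passes the test, while the bound set {w, x}, indices `(0, 1)`, does not. A recursive decomposer would apply the same test to g1 and g2 in turn.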
All Recursive Decompositions
[Figure: all recursive decompositions of f(a, b, c, d) into the sub-functions g1 to g8.]
Choice Tree [Lehman,TCAD97]
[Figure: (a) given LUT-tree with LUTs A (inputs i, j, k, l), B (inputs a, b, c, d), and C (inputs e, f, g, h); (b) choice tree, where each choice node groups the alternative decompositions of one LUT.]
Algorithm
- Mini-LUT tree mapping
- Fan-in tree embedding [Hrkic,TCAD06]
- Simultaneous remapping and embedding
Logic Remapping Formulation
- Formulation
– Given a “mini-LUT” tree and arrival times at the leaves,
– map the tree to K-input LUTs, minimizing cost subject to an arrival time constraint at the root.
[Figure: example mini-LUT tree over the signals a to m.]
Solution Signature
- (c,a)
– For a sub-tree rooted at u, a solution is characterized by two parameters:
– c: cost of the embedding (and remapping) of the sub-tree.
– a: arrival time at u.
- Dominance Relation
– (c,a) is not dominated by (c’,a’) when c is better (lower) than c’ or a is better (earlier) than a’.
[Figure: non-dominated solutions plotted as cost versus arrival time.]
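The dominance relation can be sketched in Python as a Pareto filter over (c,a) pairs. This is an illustrative sketch that assumes lower cost and earlier arrival time are both better; the function names and the extra candidate points in the usage note are hypothetical.

```python
from typing import List, Tuple

Solution = Tuple[float, float]  # (cost c, arrival time a)


def dominates(p: Solution, q: Solution) -> bool:
    """True if p is no worse than q in both parameters and differs from q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q


def non_dominated(solutions: List[Solution]) -> List[Solution]:
    """Return the non-dominated solutions, sorted by increasing cost."""
    frontier: List[Solution] = []
    best_arrival = float("inf")
    # After sorting by (cost, arrival), a solution survives only if it
    # arrives strictly earlier than every cheaper-or-equal solution kept so far.
    for c, a in sorted(set(solutions)):
        if a < best_arrival:
            frontier.append((c, a))
            best_arrival = a
    return frontier
```

For example, `non_dominated([(19, 13), (20, 11), (22, 10), (21, 12), (19, 14)])` keeps `(19, 13)`, `(20, 11)`, and `(22, 10)`: `(21, 12)` is dominated by `(20, 11)` and `(19, 14)` by `(19, 13)`.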
Solution Sets
- Si[u] = {(c,a)}
– u: signal produced by the root LUT
– i: # inputs of the root LUT
– c: # LUTs in the subtrees
– a: the latest arrival time among the fan-ins
– Example: fan-ins i (0,2) and h (0,6) give S2[u] = {(0,6)}
- Finalized Si[u]
– “finalized” solution from Si[u]
– c: # LUTs in the subtrees + 1
– a: the root LUT delay included
– Example: fan-ins i (0,2) and h (0,6) give finalized S2[u] = {(1,7)}
- S[u]
– non-dominated_sol(S2[u], … , SK[u]) over the finalized sets
Si [u] Example
For simplicity:
- One LUT = one unit cost
- One LUT = one unit delay
Si[u] and S[u] Example
- S[b] = non-dominated_sol(S2[b], … , SK[b]) = {(1,7)}
- Si[b]
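The finalization step and the merge into S[u] can be sketched in Python under the simplification used here (one LUT = one unit of cost and one unit of delay). This is an illustrative sketch, not the authors' code; `finalize` and `best_at_signal` are hypothetical names.

```python
from typing import Dict, List, Tuple

Solution = Tuple[int, int]  # (cost c, arrival time a)


def non_dominated(solutions: List[Solution]) -> List[Solution]:
    """Keep only solutions that no other solution beats in both cost and arrival."""
    frontier: List[Solution] = []
    best_arrival = float("inf")
    for c, a in sorted(set(solutions)):
        if a < best_arrival:
            frontier.append((c, a))
            best_arrival = a
    return frontier


def finalize(solutions: List[Solution], lut_cost: int = 1,
             lut_delay: int = 1) -> List[Solution]:
    """Account for the root LUT itself: one more LUT and one more LUT delay."""
    return [(c + lut_cost, a + lut_delay) for c, a in solutions]


def best_at_signal(sets_by_inputs: Dict[int, List[Solution]]) -> List[Solution]:
    """S[u]: non-dominated union of the finalized sets S2[u] .. SK[u]."""
    merged: List[Solution] = []
    for sols in sets_by_inputs.values():
        merged.extend(finalize(sols))
    return non_dominated(merged)
```

With the single unfinalized solution `{2: [(0, 6)]}`, `best_at_signal` returns `[(1, 7)]`, matching the example values above.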
Computation of Si [u]
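- The inputs of the root LUT are split between the two children L and R of u: i from the L side and K - i from the R side.
- i = 1: no collapsing of u and L
- i = K-1: no collapsing of u and R
- Otherwise: collapsing of u, L, and R
- Example (K = 4): S4[u] = join(S[a],S3[b]) ∪ join(S2[a],S2[b]) ∪ join(S3[a],S[b])
[Figure: panels (a) to (d) illustrating, for K = 4, the splits i = 1, i = 2, and i = 3 (= K–1) at a node u with children a and b.]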
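The recurrence can be sketched as a bottom-up dynamic program in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the mini-LUT tree is binary, one LUT costs one unit and adds one unit of delay, `join` adds costs and takes the later arrival time, and a child that contributes a single input uses its finalized set S[child]. The `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Sol = Tuple[int, int]  # (cost, arrival time)


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    arrival: int = 0  # only meaningful for leaves


def prune(sols: List[Sol]) -> List[Sol]:
    """Keep the non-dominated (cost, arrival) pairs."""
    kept: List[Sol] = []
    best = float("inf")
    for c, a in sorted(set(sols)):
        if a < best:
            kept.append((c, a))
            best = a
    return kept


def join(xs: List[Sol], ys: List[Sol]) -> List[Sol]:
    """Combine two fan-in sides: costs add, the later arrival dominates."""
    return prune([(c1 + c2, max(a1, a2)) for c1, a1 in xs for c2, a2 in ys])


def solve(u: Node, k: int) -> Tuple[Dict[int, List[Sol]], List[Sol]]:
    """Return ({i: S_i[u]}, S[u]) for a K-input LUT architecture."""
    if u.left is None or u.right is None:       # leaf: zero cost, given arrival
        return {}, [(0, u.arrival)]
    s_l, final_l = solve(u.left, k)
    s_r, final_r = solve(u.right, k)

    def side(s: Dict[int, List[Sol]], final: List[Sol], j: int) -> List[Sol]:
        # j == 1: the child is not collapsed into u and feeds one input.
        return final if j == 1 else s.get(j, [])

    s_u: Dict[int, List[Sol]] = {}
    for i in range(2, k + 1):
        cands: List[Sol] = []
        for j in range(1, i):                    # j inputs from L, i - j from R
            cands += join(side(s_l, final_l, j), side(s_r, final_r, i - j))
        s_u[i] = prune(cands)
    finalized = prune([(c + 1, a + 1) for sols in s_u.values() for c, a in sols])
    return s_u, finalized
```

For a node u with leaves i (arrival 2) and h (arrival 6), `solve` yields S2[u] = [(0, 6)] and S[u] = [(1, 7)]. For a balanced tree with four leaves and K = 4, all three nodes collapse into one LUT, giving S[root] = [(1, 1)].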
Remapping Algorithm Example
[Figure: (a) subject tree with internal nodes a to f and leaves labeled (cost, arrival time): g (0,4), h (0,6), i (0,2), j (0,3), k (0,2), l (0,1), m (0,4).]
Tree Embedding [Hrkic,TCAD06]
[Figure: fan-in tree over the nodes a, e, d, f, bR, cR, before and after embedding.]
- Inputs: topology, arrival times, pin locations, target layout graph
Embedding Algorithm
[Figure: embedding of the tree (a, e, d, f, bR, cR) in the target layout graph; annotations: cost metrics, arrival time, and the labels (0,2), (0,3), (0,4).]
Simultaneous Remapping and Embedding
- Formulation
– Given a “mini-LUT” tree with fixed leaves and root, arrival times at the leaves, and a target layout graph,
– simultaneously map the tree to K-input LUTs and embed it.
Solution Set Si [u][v]
- The remapped root produces signal u and is placed at v in the target layout graph.
Solution Set Si[u][v] (Finalized)
- Solutions Si[u][w] are finalized and drive vertex v in the target layout graph.
- Computed by a shortest weight-constrained path algorithm.
[Figure: the LUT producing u (fan-ins h, i, j, k) at vertex w, routed to vertex v.]
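The propagation of finalized solutions from a vertex w to the vertices it can drive can be illustrated with a generic bicriteria label-propagation sketch in Python. The slides only name a shortest weight-constrained path algorithm, so this is a swapped-in, simplified stand-in rather than the authors' method. It assumes every layout-graph edge carries a non-negative wire cost and wire delay; the graph encoding, the edge weights, and the name `propagate` are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Label = Tuple[int, int]                      # (cost, arrival time)
Graph = Dict[str, List[Tuple[str, int, int]]]  # v -> [(next, wire_cost, wire_delay)]


def propagate(graph: Graph, sources: Dict[str, List[Label]]) -> Dict[str, List[Label]]:
    """Return the non-dominated (cost, arrival) labels reachable at each vertex."""
    labels: Dict[str, List[Label]] = {v: [] for v in graph}
    heap = [(c, a, w) for w, sols in sources.items() for c, a in sols]
    heapq.heapify(heap)
    while heap:
        c, a, v = heapq.heappop(heap)
        # Skip labels that an existing label at v already matches or beats.
        if any(c2 <= c and a2 <= a for c2, a2 in labels[v]):
            continue
        labels[v] = [(c2, a2) for c2, a2 in labels[v]
                     if not (c <= c2 and a <= a2)] + [(c, a)]
        for nxt, wire_cost, wire_delay in graph[v]:
            heapq.heappush(heap, (c + wire_cost, a + wire_delay, nxt))
    return {v: sorted(ls) for v, ls in labels.items()}


# Hypothetical layout graph: two routes from w to v, one cheap and slow,
# one expensive and fast.
layout: Graph = {
    "w": [("x", 1, 3), ("y", 2, 1)],
    "x": [("v", 1, 3)],
    "y": [("v", 2, 1)],
    "v": [],
}
result = propagate(layout, {"w": [(1, 7)]})
```

Starting from the finalized solution (1, 7) at w, vertex v ends up with two labels, (3, 13) via x and (5, 9) via y. Neither dominates the other, so both are kept as candidates.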
Solution Set S[u][v]
- S[u][v] ← non-dominated-sol(S2[u][v],…,SK[u][v])
- The best remappings at v in the target layout graph, regardless of the number of inputs.
Simultaneous Remapping and Embedding Example
[Figure: mini-LUT tree over the signals a to m with root a embedded at vertex v23; solutions plotted as cost versus arrival time.]
- S4[a][v23] = {(22,10)}
- S[a][v23] = {(19,13),(20,11),(22,10)}
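Because the formulation minimizes cost subject to an arrival-time constraint at the root, the final choice from a root solution set can be sketched as a simple filter. This is an illustrative Python sketch; `cheapest_meeting` and the required arrival times used here are hypothetical.

```python
from typing import List, Optional, Tuple

Solution = Tuple[int, int]  # (cost, arrival time)


def cheapest_meeting(solutions: List[Solution],
                     required_arrival: int) -> Optional[Solution]:
    """Pick the lowest-cost solution whose arrival time meets the constraint."""
    feasible = [s for s in solutions if s[1] <= required_arrival]
    return min(feasible) if feasible else None


root_set = [(19, 13), (20, 11), (22, 10)]  # S[a][v23] from the example
```

With a required arrival time of 11, `cheapest_meeting(root_set, 11)` selects `(20, 11)`; relaxing the constraint to 13 selects `(19, 13)`, and a constraint of 9 leaves no feasible solution.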
Experiment
- Benchmarks
– 20 MCNC benchmark circuits
– At least 20% white space
- Criteria of Interest
– LUT depth
– Clock period of circuits
- Comparisons
– Timing-driven VPR placer
– Replication Tree embedder
– Arbor embedder [Kim,GLSVLSI06]
– Remapping embedder
- Different logic-level mappers and the stability effect of the new algorithm
Optimization Flow
- Initial netlist & placement
- Static timing analysis & replication tree construction
- Tree embedding
– Repl Tree embedder
– Remapping embedder
- Modified netlist
- Post-processing & legalization
- Modified netlist & placement
LUT Depth Changes
LUT depth of the whole circuit (ckt) and of the critical path (Crit. Path) for the initial and new circuits:
ckt        Init. ckt   Init. Crit. Path   New ckt   New Crit. Path
ex5p       9           8                  9         7
tseng      12          12                 12        11
apex4      8           8                  9         8
misex3     9           8                  9         8
alu4       9           9                  9         8
diffeq     14          14                 13        10
dsip       8           5                  8         4
seq        8           8                  9         7
apex2      10          10                 10        9
s298       16          16                 14        13
des        8           5                  8         4
bigkey     5           4                  5         4
frisc      23          20                 22        22
spla       10          10                 10        9
elliptic   17          15                 18        15
ex1010     10          10                 10        9
pdc        11          10                 11        8
s38417     10          10                 10        8
s38584.1   10          5                  10        5
clma       15          14                 14        13
Routed Clock Period
- Average Normalized Clock Period
T-VPR   Repl    Arbor   Remap
1       0.886   0.848   0.826
[Figure: per-circuit normalized clock period (axis up to 1.2) for VPR, Repl, Arbor, and Remap on ex5p, tseng, apex4, misex3, alu4, diffeq, dsip, seq, apex2, s298, des, bigkey, frisc, spla, elliptic, ex1010, pdc, s38417, s38584.1, clma.]
- Max reduction of Remap vs. Arbor: 11.7%
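The average normalized clock periods translate into relative reductions with a one-line calculation. The Python sketch below only restates the four reported averages (T-VPR 1, Repl 0.886, Arbor 0.848, Remap 0.826); the helper name `reduction_percent` is hypothetical. These are reductions of the averages and are distinct from the per-circuit maximum of 11.7%.

```python
# Average normalized clock periods as reported (T-VPR is the baseline).
avg_period = {"T-VPR": 1.0, "Repl": 0.886, "Arbor": 0.848, "Remap": 0.826}


def reduction_percent(new: str, base: str) -> float:
    """Relative reduction of the average clock period of `new` versus `base`."""
    return round((1.0 - avg_period[new] / avg_period[base]) * 100.0, 1)


remap_vs_vpr = reduction_percent("Remap", "T-VPR")    # 17.4
remap_vs_repl = reduction_percent("Remap", "Repl")    # 6.8
remap_vs_arbor = reduction_percent("Remap", "Arbor")  # 2.6
```

On average, Remap shortens the clock period by about 17.4% relative to T-VPR, 6.8% relative to Repl, and 2.6% relative to Arbor.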
Different Logic-level Mappers and Stability Effect of Remap
[Figure: bar chart for circuit seq comparing VPR, Repl, and Remap across the mappers FlowMap, FlowMap-r, ZMap, Praetor, and Daomap.]
- FlowMap: optimal depth.
- FlowMap-r: relaxed depth.
- ZMap: optimal depth with simultaneous area minimization.
- Praetor: minimized area.
- Daomap
- Span: 12% and 4%
Summary
- Study of layout-level restructuring for interconnect optimization.
– Functional decomposition
– Choice tree
– Remapping algorithm
– Simultaneous remapping and embedding
- Experimental results