
A Framework for Layout-Level Logic Restructuring Hosung Leo Kim - PowerPoint PPT Presentation



  1. A Framework for Layout-Level Logic Restructuring. Hosung Leo Kim, John Lillis

  2. Motivation: Logical-to-Physical Disconnect • Figure: logic-level optimization hands a fixed netlist to physical-level optimization, which is limited by the structure obtained from the logic level; the two stages are disconnected. • Performance is determined largely by physical-level interconnect delay. • Problem: timing optimization at the logic level ≠ actual performance.

  3. Past Layout-Driven Restructuring Work: Replication Based • Basic Operations: – Gate Splitting – Fanout Partitioning; Enables “Path Straightening” • [Schabas, Brown. ISFPGA03] • [Beraudo, Lillis. DAC03] • [Hrkic, Lillis, Beraudo. TCAD06] • [Chen, Cong. ISFPGA05]

  4. Limitation of Logic Replication • While interconnect delay can be significantly reduced, the LUT-depth of a path remains unchanged. • The LUT-depth is typically determined by a technology mapper which does not have an accurate view of critical paths. Candidate: Remapping

  5. Other Work • Redundant Wires (e.g., [Chang, Cheng, Suaris, Marek-Sadowska. DAC00]) – Rewire connections while keeping logical equivalence. – Predictable, but optimization scope limited. • [Lin, Jagannathan, and Cong. ISFPGA03] – Remap based on placement-level timing analysis. – Significant restructuring, but placement of remapped cells is determined by the initial placement (not simultaneous). • [Singh and Brown. Integration07] – Shannon’s expansion / precomputation. – Allows late signals to skip logic levels, but relatively local in nature.

  6. Objectives • Overcome limitations of basic replication (e.g., fixed LUT-depth) • Large and flexible remapping space • Explicitly account for placement freedom of remapped LUTs • Tight coupling with placement

  7. Components of Approach (FPGA Domain) • Placement-Level Static Timing Analysis → Timing-Critical Fan-in Cone Extraction → Induce Replication Tree [Hrkic,TCAD06]

  8. Components of Approach (cont’d) • Replication tree → Recursive, Exhaustive Ashenhurst LUT Decomposition → Subject Graph (Choice Tree) → Mapper and Embedder (Dynamic Programming) → Legalizer • Figure: (a) Given LUT-tree; (b) Choice tree, with choice nodes.

  9. Remapping Example • Figure: (a) Given LUT-tree; (b) “Mini-LUT” tree after LUT-decompositions; (c) Alternative mapping; (d) Corresponding LUT-tree.

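  10. Functional Decomposition • Simple Disjoint Functional Decomposition: f(w, x, y, z) = g1(g2(w, z), x, y), where g2 has a 1-bit output (simple) and the input sets {w, z} and {x, y} are disjoint. • Test for decomposability – Ashenhurst’s theorem • Recursively decompose. • Figure: decomposition chart of f, with columns indexed by wz and rows by xy. Chart rows as given: 0 0 0 0 / 0 1 1 0 / 1 0 0 1 / 1 1 1 1. The chart has only two distinct columns, so f is decomposable with {w, z} as the inputs of g2.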
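
The decomposability test on this slide can be sketched in code. Ashenhurst's theorem says that a simple disjoint decomposition exists for a chosen bound set exactly when the decomposition chart has at most two distinct columns. The function names, the bit-tuple encoding, and the example function below are assumptions made for illustration; they do not come from the slides.

```python
from itertools import product

def column_multiplicity(f, n_vars, bound):
    """Count distinct columns in the decomposition chart of f.

    f      : callable taking a tuple of n_vars bits, returning 0 or 1
    bound  : indices of the bound-set variables (they index chart columns)
    The remaining variables form the free set (they index chart rows).
    """
    free = [i for i in range(n_vars) if i not in bound]
    columns = set()
    for bound_bits in product((0, 1), repeat=len(bound)):
        column = []
        for free_bits in product((0, 1), repeat=len(free)):
            x = [0] * n_vars
            for idx, bit in zip(bound, bound_bits):
                x[idx] = bit
            for idx, bit in zip(free, free_bits):
                x[idx] = bit
            column.append(f(tuple(x)))
        columns.add(tuple(column))
    return len(columns)

def has_simple_disjoint_decomposition(f, n_vars, bound):
    # Ashenhurst: decomposable iff column multiplicity is at most 2.
    return column_multiplicity(f, n_vars, bound) <= 2

# Hypothetical example, variables ordered (w, x, y, z):
# f = ((w XOR z) AND x) OR y, which has the form g1(g2(w, z), x, y).
def example_f(bits):
    w, x, y, z = bits
    return ((w ^ z) & x) | y

decomposable_wz = has_simple_disjoint_decomposition(example_f, 4, [0, 3])
decomposable_wx = has_simple_disjoint_decomposition(example_f, 4, [0, 1])
```

Here the bound set {w, z} passes the test while {w, x} does not, which is why a recursive decomposer has to try the candidate bound sets rather than assume one.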

  11. All Recursive Decompositions • Figure: all recursive decompositions of an example function into sub-functions g1 through g8.

  12. Choice Tree [Lehman,TCAD97] • Figure: (a) Given LUT-tree; (b) Choice tree, with choice nodes over the alternative decompositions g1 through g8.

  13. Algorithm • Mini-LUT Tree Mapping • Fan-in Tree Embedding [Hrkic,TCAD06] • Simultaneous Remapping and Embedding

  14. Logic Remapping Formulation • Formulation – Given a “mini-LUT” tree and arrival times at the leaves, – map the tree to K-input LUTs, minimizing cost subject to an arrival time constraint at the root.

  15. Solution Signature • (c, a) – for a sub-tree rooted at u, a solution is characterized by two parameters: – c: cost of the embedding (and remapping) of the sub-tree. – a: arrival time at u. • Dominance Relation – (c, a) is not dominated by (c′, a′) when c is better than c′ or a is better than a′. • Figure: non-dominated solutions plotted as arrival time versus cost.
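
A minimal sketch of the dominance relation, assuming that "better" means numerically smaller for both cost and arrival time. The function names are hypothetical; the sample points are the (cost, arrival time) pairs quoted later in the deck, plus one made-up dominated point.

```python
def dominates(p, q):
    """True if p = (cost, arrival) is no worse than q in both terms and differs from q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    frontier = []
    best_arrival = float("inf")
    # Sorted by cost, then arrival: a solution survives only if it arrives
    # strictly earlier than every cheaper (or equally cheap) one seen so far.
    for cost, arrival in sorted(set(solutions)):
        if arrival < best_arrival:
            frontier.append((cost, arrival))
            best_arrival = arrival
    return frontier

# (21, 12) is a made-up point: (20, 11) is both cheaper and earlier.
candidates = [(22, 10), (21, 12), (19, 13), (20, 11)]
frontier = non_dominated(candidates)
```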

  16. Solution Sets • $\tilde{S}_i[u] = \{(c, a)\}$ (unfinalized) – u: signal produced by the root LUT – i: # inputs of the root LUT – c: # LUTs in the subtrees – a: the latest arrival among the fan-ins – Example: fan-ins h and i with solutions (0,6) and (0,2) give $\tilde{S}_2[u] = \{(0,6)\}$. • $S_i[u]$ – “finalized” solution from $\tilde{S}_i[u]$ – c: # LUTs in the subtrees + 1 – a: the root LUT included – Example: $S_2[u] = \{(1,7)\}$. • $S[u]$ = non-dominated_sol($S_2[u]$, …, $S_K[u]$)
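
The example on this slide can be reproduced in a few lines. This sketch assumes one unit of cost and one unit of delay per LUT, as stated on the next slide. The function names are hypothetical.

```python
def combine_fanins(fanin_solutions):
    """Unfinalized signature: sum the subtree costs, take the latest fan-in arrival."""
    cost = sum(c for c, _ in fanin_solutions)
    arrival = max(a for _, a in fanin_solutions)
    return (cost, arrival)

def finalize(solution, lut_cost=1, lut_delay=1):
    """Finalized signature: account for the root LUT itself."""
    cost, arrival = solution
    return (cost + lut_cost, arrival + lut_delay)

# Fan-ins h and i from the slide, as (cost, arrival time) pairs.
unfinalized = combine_fanins([(0, 6), (0, 2)])
finalized = finalize(unfinalized)
```

With these inputs the unfinalized signature is (0, 6) and the finalized one is (1, 7), which matches the two sets shown on the slide.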

  17. $\tilde{S}_i[u]$ Example • For simplicity: one LUT = one unit cost; one LUT = one unit delay.

  18. $S_i[u]$ and $S[u]$ Example • $S_i[b]$ • $S[b]$ = non-dominated_sol($S_2[b]$, …, $S_K[b]$) = {(1,7)}

  19. Computation of $\tilde{S}_i[u]$ (unfinalized set for a root LUT with i inputs; u has children L and R) • i = 1: no collapsing of u and L. • i = K − 1: no collapsing of u and R. • Otherwise: collapsing of u, L, and R. • Example with K = 4 and children a and b of u (panels show i = 1, i = 2, and i = 3 = K − 1): $\tilde{S}_4[u] = \mathrm{join}(S[a], \tilde{S}_3[b]) \cup \mathrm{join}(\tilde{S}_2[a], \tilde{S}_2[b]) \cup \mathrm{join}(\tilde{S}_3[a], S[b])$
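
The recurrence on this slide can be sketched as a small bottom-up dynamic program over a binary mini-LUT tree. This is an illustration under stated assumptions, not the authors' implementation: every mini-LUT has two inputs, a LUT costs one unit and adds one unit of delay, and a child either supplies a single input through its finalized set or is collapsed into the root and supplies two or more inputs through its unfinalized set. All names and the tuple-based tree encoding are hypothetical.

```python
from itertools import product

def prune(solutions):
    """Non-dominated (cost, arrival) pairs, smaller is better in both."""
    frontier, best = [], float("inf")
    for c, a in sorted(set(solutions)):
        if a < best:
            frontier.append((c, a))
            best = a
    return frontier

def join(left, right):
    """Pair every left solution with every right one: add costs, take the later arrival."""
    return prune((cl + cr, max(al, ar))
                 for (cl, al), (cr, ar) in product(left, right))

def map_tree(node, K):
    """Return (open_sets, final_set) for a node.

    node      : a number (leaf arrival time) or a pair (left, right)
    open_sets : {i: unfinalized solutions whose root LUT has i inputs}
    final_set : non-dominated finalized solutions over all i
    """
    if not isinstance(node, tuple):
        return {}, [(0, node)]  # a leaf costs nothing and has a given arrival
    open_l, final_l = map_tree(node[0], K)
    open_r, final_r = map_tree(node[1], K)

    def contribution(open_sets, final_set, n_inputs):
        # One input: the child is not collapsed, so use its finalized set.
        # Several inputs: the child is collapsed into the root LUT.
        return final_set if n_inputs == 1 else open_sets.get(n_inputs, [])

    open_u = {}
    for i in range(2, K + 1):
        candidates = []
        for left_inputs in range(1, i):
            candidates += join(contribution(open_l, final_l, left_inputs),
                               contribution(open_r, final_r, i - left_inputs))
        open_u[i] = prune(candidates)
    # Finalize: count the root LUT (unit cost, unit delay), then merge over i.
    final_u = prune((c + 1, a + 1) for sols in open_u.values() for c, a in sols)
    return open_u, final_u

# Two leaves with arrival times 6 and 2, as in the earlier slide example.
open_small, final_small = map_tree((6, 2), K=4)

# Four leaves, all arriving at time 1, under two mini-LUT levels.
open_k4, final_k4 = map_tree(((1, 1), (1, 1)), K=4)
open_k2, final_k2 = map_tree(((1, 1), (1, 1)), K=2)
```

For K = 4 the four-leaf tree collapses into a single LUT, while K = 2 forces three LUTs and two levels of delay. To apply the arrival time constraint from the formulation, one would pick the cheapest entry of the root's final set whose arrival time meets the constraint.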

  20. Remapping Algorithm Example • Figure (a) Subject tree: nodes a through m, annotated with (cost, arrival time) labels (0,4), (0,6), (0,2), (0,3), (0,2), (0,1), (0,4).

  21. Algorithms • Mini-LUT Tree Mapping • Fan-in Tree Embedding [Hrkic,TCAD06] • Simultaneous Remapping and Embedding

  22. Tree Embedding [Hrkic,TCAD06] • Inputs to the embedding algorithm: tree topology, arrival times, pin locations, target layout graph, cost metrics. • Figure: an example fan-in tree before and after embedding in the target layout graph.

  23. Algorithms • Mini-LUT Tree Mapping • Fan-in Tree Embedding [Hrkic,TCAD06] • Simultaneous Remapping and Embedding

  24. Simultaneous Remapping and Embedding • Formulation – Given a “mini-LUT” tree with fixed leaves and root, arrival times at the leaves, and a target layout graph, – simultaneously map the tree to K-input LUTs and embed it.

  25. Solution Set $\tilde{S}_i[u][v]$ • The remapped root produces signal u and is placed at v in the target layout graph.

  26. Solution Set $S_i[u][v]$ • Solutions $\tilde{S}_i[u][w]$ are finalized and drive vertex v in the target layout graph. • Computed by a shortest weight-constrained path algorithm. • Figure: root LUT u placed at w, driving vertex v.
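
The propagation step can be illustrated with a two-criteria label search that keeps every non-dominated (cost, arrival time) pair at each vertex. This is a generic multi-label variant of shortest-path search used here as a stand-in; the slides do not specify which weight-constrained path algorithm is used. The graph encoding, the per-edge wire cost and wire delay, and the function name are assumptions.

```python
import heapq

def propagate(graph, sources):
    """Propagate finalized solutions through a layout graph.

    graph   : {vertex: [(neighbor, wire_cost, wire_delay), ...]}, non-negative
              weights; every neighbor must also be a key of graph
    sources : {vertex: [(cost, arrival), ...]} finalized solutions placed there
    Returns {vertex: non-dominated (cost, arrival) pairs delivered to it}.
    """
    labels = {v: [] for v in graph}
    heap = [(c, a, v) for v, sols in sources.items() for c, a in sols]
    heapq.heapify(heap)
    while heap:
        c, a, v = heapq.heappop(heap)
        # Skip labels that an already kept label matches or dominates.
        if any(c2 <= c and a2 <= a for c2, a2 in labels[v]):
            continue
        # Drop kept labels that the new one dominates, then keep the new one.
        labels[v] = [(c2, a2) for c2, a2 in labels[v]
                     if not (c <= c2 and a <= a2)] + [(c, a)]
        for nxt, wire_cost, wire_delay in graph[v]:
            heapq.heappush(heap, (c + wire_cost, a + wire_delay, nxt))
    return labels

# Hypothetical layout graph: a cheap slow wire w -> v and a costlier fast detour via x.
layout = {
    "w": [("v", 1, 3), ("x", 2, 1)],
    "x": [("v", 1, 1)],
    "v": [],
}
delivered = propagate(layout, {"w": [(1, 7)]})
```

Both routes survive at v because neither is better in cost and arrival time at once, which is the trade-off the solution sets are meant to preserve.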

  27. Solution Set $S[u][v]$ • $S[u][v]$ ← non-dominated-sol($S_2[u][v]$, …, $S_K[u][v]$) • The best remapping at v in the target layout graph, regardless of the number of inputs.

  28. Simultaneous Remapping and Embedding Example • Figure: tree rooted at a, with the root placed at vertex $v_{23}$. • $S_4[a][v_{23}]$ = {(22,10)} • $S[a][v_{23}]$ = {(19,13), (20,11), (22,10)}, plotted as arrival time versus cost.

  29. Experiment • Benchmarks – 20 MCNC benchmark circuits – At least 20% white space • Comparisons – Timing-driven VPR placer – Replication Tree embedder – Arbor embedder [Kim,GLSVLSI06] – Remapping embedder • Criteria of Interest – LUT depth – Clock period of circuits • Effect of different logic-level mappers, and stability effect of the new algorithm

  30. Optimization Flow • Initial Netlist & Placement → Static Timing Analysis & Replication Tree Construction → Modified Netlist → Tree Embedding (Repl Tree embedder or Remapping embedder) → Post-Processing & Legalization → Modified Netlist & Placement

  31. LUT Depth Changes (LUT depth of the whole circuit and of its critical path, for the initial and the new circuit)

      Circuit    Init. ckt  Init. crit. path  New ckt  New crit. path
      ex5p        9          8                 9        7
      des         8          5                 8        4
      tseng      12         12                12       11
      bigkey      5          4                 5        4
      apex4       8          8                 9        8
      frisc      23         20                22       22
      misex3      9          8                 9        8
      spla       10         10                10        9
      alu4        9          9                 9        8
      elliptic   17         15                18       15
      diffeq     14         14                13       10
      ex1010     10         10                10        9
      dsip        8          5                 8        4
      pdc        11         10                11        8
      seq         8          8                 9        7
      s38417     10         10                10        8
      apex2      10         10                10        9
      s38584.1   10          5                10        5
      s298       16         16                14       13
      clma       15         14                14       13

  32. Routed Clock Period • Figure: normalized routed clock period (axis 0 to 1.2) per benchmark circuit for VPR, Repl, Arbor, and Remap. • Average Normalized Clock Period (Avg Delay): T-VPR 1, Repl 0.886, Arbor 0.848, Remap 0.826 • Max reduction of REMAP vs Arbor: 11.7%
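
The averages on this slide convert directly into percentage reductions. A small sketch of that arithmetic follows; the variable and function names are hypothetical, and the figures are the four averages quoted above.

```python
avg_normalized_period = {"T-VPR": 1.0, "Repl": 0.886, "Arbor": 0.848, "Remap": 0.826}

def reduction_percent(new, baseline):
    """Percentage reduction of `new` relative to `baseline`, to one decimal place."""
    return round((1 - new / baseline) * 100, 1)

# Average reduction of each flow relative to T-VPR.
vs_tvpr = {name: reduction_percent(period, avg_normalized_period["T-VPR"])
           for name, period in avg_normalized_period.items()}

# Average reduction of Remap relative to Arbor.
remap_vs_arbor = reduction_percent(avg_normalized_period["Remap"],
                                   avg_normalized_period["Arbor"])
```

The Remap average works out to about 17% below T-VPR. The 11.7% figure on the slide is a per-circuit maximum against Arbor, so it is not comparable to the average reduction against Arbor computed here.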

  33. Different Logic-level Mappers and Stability Effect of Remap • FlowMap: optimal depth. • FlowMap-r: relaxed depth. • ZMap: optimal depth with simultaneous area minimization. • Praetor: minimized area. • Daomap • Figure: results for circuit seq under VPR, Repl, and Remap for each mapper (FlowMap, FlowMap-r, ZMap, Praetor, Daomap), axis 0 to 90, annotated “Span: 12%” and “Span: 4%”.

  34. Summary • Study of layout-level restructuring for interconnect optimization. – Functional Decomposition – Choice Tree – Remapping Algorithm – Simultaneous remapping and embedding • Experimental Result – Average 17% reduction on clock period compared with T-VPR.

  35. Thank You!
