Clock skew optimization: another approach for sequential timing

Sequential Timing Optimization

Long path timing constraints
• Data must not reach the destination FF too late:
  $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$
[Figure: timing diagram for FFs i and j showing s_i, s_j, d_max(i,j) and T_setup]

Short path timing constraints
• A FF should not get more than one data set per period:
  $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
[Figure: timing diagram for FFs i and j showing s_i, s_j, d_min(i,j) and T_hold]
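The two constraints above can be checked mechanically for a single FF pair. The sketch below is only illustrative: the function name `check_timing` and the numeric values are assumptions, not taken from the slides.

```python
def check_timing(s_i, s_j, d_min, d_max, t_setup, t_hold, period):
    """Return (long_path_ok, short_path_ok) for one FF pair (i, j)."""
    # Long path (setup): s_i + d_max(i,j) + T_setup <= s_j + P
    long_path_ok = s_i + d_max + t_setup <= s_j + period
    # Short path (hold): s_i + d_min(i,j) >= s_j + T_hold
    short_path_ok = s_i + d_min >= s_j + t_hold
    return long_path_ok, short_path_ok


# Hypothetical pair with d_min = 1, d_max = 3, zero setup/hold times, P = 2.
print(check_timing(0.0, 0.0, 1.0, 3.0, 0.0, 0.0, 2.0))  # zero skew
print(check_timing(0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 2.0))  # s_j delayed by 1
```

With zero skew the long path check fails at P = 2; delaying the clock at FF j by one unit makes both checks pass, which is the cycle-borrowing effect that skew optimization exploits.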

Clock skew optimization
• Another approach for sequential timing optimization
• Deliberately change the arrival times of the clock at various memory elements in a circuit for cycle borrowing
  – For zero skew, delay from clock source to all FFs = T
  – Positive skew of δ at FF k
    • Change delay from clock source to FF k to T + δ
  – Negative skew of δ at FF k
    • Change delay from clock source to FF k to T – δ
• Problem statement: set skews for optimized performance

Sequential timing optimization
• Two “true” sequential timing optimization methods
  – Retiming: moving latches around in a design
    [Figure: FF pipelines around Comb Block 1 and Comb Block 2, all FFs clocked by Clk]
  – Clock skew optimization: deliberately changing clock arrival times so that the circuit is not truly “synchronous”
    [Figure: the same FF pipelines, with a Delay element on a clock line]

Finding the optimal clock period using skews
• Represented by the optimization problem below; solve for P and the optimal skews

  minimize $P$
  subject to (for all pairs of FFs (i,j) connected by a combinational path)
    $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
    $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$

• If d_max(i,j) and d_min(i,j) are constant, this is a linear program in the variables s_i and P
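As a small worked instance of this linear program, assume a three-FF pipeline in which Comb Block 1 has $d_{\max}(1,2) = 3$ and $d_{\min}(1,2) = 1$, Comb Block 2 has $d_{\max}(2,3) = d_{\min}(2,3) = 1$, and $T_{\text{setup}} = T_{\text{hold}} = 0$. These numbers are illustrative assumptions, not values from the slides.

```latex
\begin{align*}
\text{Zero skew } (s_1 = s_2 = s_3 = 0):\quad
  & s_1 + d_{\max}(1,2) = 3 \le s_2 + P = P
  && \Rightarrow\; P \ge 3 \\[4pt]
\text{Skew } s_2 = 1,\ s_1 = s_3 = 0:\quad
  & s_1 + d_{\max}(1,2) = 3 \le s_2 + P = 1 + P
  && \Rightarrow\; P \ge 2 \\
  & s_2 + d_{\max}(2,3) = 2 \le s_3 + P = P
  && \Rightarrow\; P \ge 2 \\
  & s_1 + d_{\min}(1,2) = 1 \ge s_2 = 1
  && \text{(hold, pair } (1,2)) \\
  & s_2 + d_{\min}(2,3) = 2 \ge s_3 = 0
  && \text{(hold, pair } (2,3)) \\[4pt]
\text{Lower bound from pair } (1,2):\quad
  & s_2 - s_1 \le d_{\min}(1,2),\quad
    P \ge d_{\max}(1,2) - (s_2 - s_1)
  && \Rightarrow\; P \ge d_{\max}(1,2) - d_{\min}(1,2) = 2
\end{align*}
```

Under these assumptions the skew $s_2 = 1$ lowers the achievable period from 3 to 2. The last line shows that no choice of skews can do better, because the hold constraint limits how much time FF 2 can borrow.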

Graph-based approaches
• For a constant clock period P, the linear program becomes a system of difference constraints $s_p - s_q \le \text{constant}$
• As before, perform a binary search on P
• For each value of P, build an equivalent constraint graph
  [Figure: constraint graph edge between nodes i and j with weight f(P)]
• Shortest paths in the constraint graph give a set of skews for a given value of P
• If P is infeasible, there will be a negative cycle in the graph, which is detected during the shortest-path calculations

Retiming
• Assume unit gate delays, no setup times
  [Figure: initial circuit, FFs around Comb Block 1 and Comb Block 2, P = 3; retimed circuit, P = 2]

Retiming: Definition
• Relocation of flip-flops (FFs) and latches (usually to achieve lower clock periods)
• Maintain the latency of all paths in the circuit, i.e., the number of FF stages on any input-output path must remain unchanged
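The graph-based skew search described under “Graph-based approaches” can be sketched as follows. This is a minimal illustration, not the method used in the course. The names `feasible_skews` and `min_period`, the path tuple format, and the example delays are all assumptions. The long path constraint is rewritten as $s_i - s_j \le P - d_{\max}(i,j) - T_{\text{setup}}$ and the short path constraint as $s_j - s_i \le d_{\min}(i,j) - T_{\text{hold}}$, and each becomes one edge of the constraint graph.

```python
def feasible_skews(paths, period, t_setup=0.0, t_hold=0.0):
    """paths: list of (i, j, d_min, d_max) for FF pairs joined by logic.
    Returns a dict of skews satisfying all constraints, or None."""
    nodes = {n for i, j, _, _ in paths for n in (i, j)}
    edges = []  # (src, dst, weight) encodes s[dst] <= s[src] + weight
    for i, j, d_min, d_max in paths:
        edges.append((j, i, period - d_max - t_setup))  # long path
        edges.append((i, j, d_min - t_hold))            # short path
    # Starting every node at 0 acts as a virtual source connected to all nodes.
    dist = {n: 0.0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:  # small tolerance for rounding
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist
    return None  # still relaxing after |V| passes: negative cycle


def min_period(paths, lo, hi, tol=1e-6):
    """Binary search on P; assumes hi is feasible and lo is not."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible_skews(paths, mid) is None:
            lo = mid   # negative cycle: P too small
        else:
            hi = mid   # feasible: try a smaller P
    return hi, feasible_skews(paths, hi)


# Hypothetical three-FF pipeline: block 1 delays in [1, 3], block 2 exactly 1.
period, skews = min_period([(1, 2, 1.0, 3.0), (2, 3, 1.0, 1.0)], 0.0, 3.0)
print(round(period, 3), skews)
```

For these assumed delays the search converges to a period close to 2, compared with 3 at zero skew. The returned skews are one valid assignment; shifting all of them by a constant gives another.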

Graph Notation of Circuit
• Vertices u, v are gates with delays d(u), d(v)
• w(e_uv) = # latencies between u and v
• r(u) = # latencies moved across gate u
• r(PI) = r(PO) = 0: merge them both into a “host” node h with r(h) = 0
• Retimed edge weight: $w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$
  – Example: w(e_uv) = 1, r(u) = 1, r(v) = 2 gives w_r(e_uv) = 2
[Figure: edge from u to v, shown with w(e_uv) = 2 and with w(e_uv) = 1 retimed to w_r(e_uv) = 2]

For a path from v_1 to v_k
• Consider a path of vertices v_1, v_2, v_3, …, v_k with edge weights w_12, w_23, w_34, …, w_(k-1,k)
  – Define $w(v_1 \to v_k) = w_{12} + w_{23} + \dots + w_{k-1,k}$
  – After retiming,
    $w_r(v_1 \to v_k) = w^r_{12} + w^r_{23} + \dots + w^r_{k-1,k}$
    $= [w_{12} + r(2) - r(1)] + [w_{23} + r(3) - r(2)] + \dots + [w_{k-1,k} + r(k) - r(k-1)]$
    $= w(v_1 \to v_k) + r(k) - r(1)$
  – For a cycle, v_1 = v_k, which implies that w_r = w for a cycle
  – In other words, retiming leaves the # latencies unchanged on any cycle

Constraints for retiming
• Non-negativity constraints (cannot have negative latencies)
  – w_r on each edge must be non-negative
  – For any edge from vertex u to vertex v,
    $w_r(u,v) = w(u,v) + r(v) - r(u) \ge 0$, i.e., $r(u) - r(v) \le w(u,v)$
• Period constraints (need a latency if path delay > period)
  – (or more precisely, path delay + T_setup > period)
  – For any path from vertex v_1 to vertex v_k, under clock period P,
    $w_r(v_1 \to v_k) = w(v_1 \to v_k) + r(v_k) - r(v_1) \ge 1$ if delay(v_1 to v_k) > P,
    i.e., $r(v_1) - r(v_k) \le w(v_1 \to v_k) - 1$ if delay(v_1 to v_k) > P
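The retimed edge weight and the path identity above are easy to check numerically. The sketch below is illustrative only: the helper names `retimed_weight` and `path_weight` and the three-vertex cycle are assumptions. The single-edge case uses the values from the slide (w = 1, r(u) = 1, r(v) = 2).

```python
def retimed_weight(w, r, u, v):
    """w_r(e_uv) = w(e_uv) + r(v) - r(u)."""
    return w[(u, v)] + r[v] - r[u]


def path_weight(w, path, r=None):
    """Total latency along a list of vertices, optionally after retiming."""
    total = 0
    for u, v in zip(path, path[1:]):
        total += w[(u, v)] if r is None else retimed_weight(w, r, u, v)
    return total


# Single edge from the slide: w = 1, r(u) = 1, r(v) = 2 gives w_r = 2.
print(retimed_weight({("u", "v"): 1}, {"u": 1, "v": 2}, "u", "v"))

# Hypothetical cycle a -> b -> c -> a with a legal retiming.
w = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 2}
r = {"a": 0, "b": -1, "c": 1}
path = ["a", "b", "c"]
cycle = ["a", "b", "c", "a"]
# Path: w_r = w + r(last) - r(first); cycle: latency count is unchanged.
print(path_weight(w, path), path_weight(w, path, r))
print(path_weight(w, cycle), path_weight(w, cycle, r))
```

In the cycle example every retimed edge weight stays non-negative, so this particular `r` also satisfies the non-negativity constraints.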

Example
[Figure: three-FF pipeline with Comb Block 1 and Comb Block 2, containing gates G1 to G4, all FFs clocked by Clk]

• Circuit graph:
  – Vertex weights = gate delays
  – Edge weights = # latencies
  [Figure: cycle h → G1 → G2 → G3 → G4 → h; edge G3 → G4 has weight 1, all other edges have weight 0; each gate has delay 1]
• Non-negativity constraints
  1. $r(h) - r(G1) \le 0$
  2. $r(G1) - r(G2) \le 0$
  3. $r(G2) - r(G3) \le 0$
  4. $r(G3) - r(G4) \le 1$
  5. $r(G4) - r(h) \le 0$
• Period constraints for P = 2
  6. $r(h) - r(G3) \le -1$
  7. $r(G1) - r(G3) \le -1$
  8. $r(G2) - r(G4) \le 0$
  9. $r(G2) - r(h) \le 0$

Graph-based approaches
• System of difference constraints $r(u) - r(v) \le c$
• Equivalent constraint graph: an edge from v to u with weight c
• Shortest paths in the constraint graph give a set of valid r values for a given value of P (note that the period constraints change for different values of P)
• If P is infeasible, there will be a negative cycle in the graph, which is detected during the shortest-path calculations

Corresponding shortest path problem
[Figure: constraint graph on h, G1, G2, G3, G4 with edge weights 0, 1 and -1]
• Find shortest paths from the host to get
  – r(h) = 0
  – r(G1) = 0
  – r(G2) = 0
  – r(G3) = 1
  – r(G4) = 0
• This gives the solution
  [Figure: circuit before and after retiming, FFs around Comb Block 1 and Comb Block 2]
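The solution above can be checked against the nine constraints directly. The sketch below encodes each constraint as a tuple (u, v, c) meaning r(u) – r(v) ≤ c; this encoding and the variable names are assumptions made for illustration. Because the five non-negativity constraints have the form r(u) – r(v) ≤ w(u,v), they also carry the edge weights of the circuit graph.

```python
# (u, v, c) means r(u) - r(v) <= c, numbered as in the example.
constraints = [
    ("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0),   # 1-3
    ("G3", "G4", 1), ("G4", "h", 0),                    # 4-5
    ("h", "G3", -1), ("G1", "G3", -1),                  # 6-7
    ("G2", "G4", 0), ("G2", "h", 0),                    # 8-9
]
r = {"h": 0, "G1": 0, "G2": 0, "G3": 1, "G4": 0}

# Any constraint with r(u) - r(v) > c is violated.
violated = [(u, v, c) for u, v, c in constraints if r[u] - r[v] > c]
print("violated:", violated)

# Constraints 1-5 are r(u) - r(v) <= w(u,v), so c is the edge weight w(u,v).
# Retimed weight: w_r(u,v) = w(u,v) + r(v) - r(u).
retimed = {(u, v): w + r[v] - r[u] for u, v, w in constraints[:5]}
for edge, w_r in retimed.items():
    print(edge, w_r)
```

No constraint is violated, and the single latency moves from edge G3 → G4 to edge G2 → G3, so the longest register-to-register path now spans two unit-delay gates instead of three.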

Overall scheme for minimum period retiming
• Objective: to find a retiming that minimizes the clock period (the assignment of r values may not be unique due to slack in the shortest path graph!)
  – Binary search over $P \in [0, P_{\text{unretimed}}]$
  – P_unretimed = period of unretimed circuit = upper bound on optimal P
  – Range in some iteration of the search = [P_min, P_max]
  – Build shortest path graph with non-negativity constraints (independent of P)
  – At each value of P
    • Add period constraints to shortest path graph (related to W, D matrices discussed in class; will not describe here)
    • Solve shortest path problem
    • If negative cycle found, set P_min = P; else set P_max = P
  – Iterate until range of P is sufficiently small

Finding shortest paths
• Dijkstra’s algorithm
  – O(V log V + E) for a graph with V vertices and E edges
  – Applicable only if all edge weights are non-negative
  – The latter condition does not hold in our case!
• Bellman-Ford algorithm
  – O(VE) for a graph with V vertices and E edges
  – Outline
      for I = 1 to V – 1
        for each edge (u,v) ∈ E
          update neighbor’s weights as r(v) = min[r(u) + d(u,v), r(v)]
      for each edge (u,v) ∈ E
        if r(u) + d(u,v) < r(v) then a negative cycle exists
  – Basic idea: in iteration I, update the lowest cost paths with at most I edges
  – After V – 1 iterations, if any update is still required, a negative cycle exists

“Relaxation” algorithm for retiming
• Perform a binary search on clock period P as before
• At each value of P check feasibility as follows
  – Repeat V – 1 times (where V = # vertices)
    1. Set r(u) = 0 for each vertex
    2. Perform timing analysis to find clock period of the circuit
    3. For any vertex u with delay > P, r(u)++
    4. If no such vertex exists, P is feasible
    5. Else, retime the circuit using these values of r; update the circuit and go to step 1
  – If clock period > P after V – 1 iterations, then P is infeasible
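The Bellman-Ford outline above, applied to a system of difference constraints, might look like the sketch below. It is a minimal illustration under stated assumptions: the function name `solve_difference_constraints` and the tuple encoding (u, v, c) for r(u) – r(v) ≤ c are not from the slides, and the nine constraints are those of the P = 2 retiming example. A negative cycle is reported when some edge can still be relaxed after V – 1 passes.

```python
import math


def solve_difference_constraints(constraints, source):
    """constraints: (u, v, c) meaning r(u) - r(v) <= c.
    Returns shortest-path distances from source, or None if a negative
    cycle reachable from source exists (the system is then infeasible)."""
    nodes = {n for u, v, _ in constraints for n in (u, v)}
    # r(u) <= r(v) + c corresponds to an edge v -> u with weight c.
    edges = [(v, u, c) for u, v, c in constraints]
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        for a, b, c in edges:
            if dist[a] + c < dist[b]:
                dist[b] = dist[a] + c
    # One more sweep: any edge that can still be relaxed lies on or after
    # a negative cycle.
    for a, b, c in edges:
        if dist[a] + c < dist[b]:
            return None
    return dist


example = [
    ("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0),
    ("G3", "G4", 1), ("G4", "h", 0),
    ("h", "G3", -1), ("G1", "G3", -1), ("G2", "G4", 0), ("G2", "h", 0),
]
print(solve_difference_constraints(example, "h"))

# A hypothetical infeasible pair: x - y <= -1 and y - x <= 0.
print(solve_difference_constraints([("x", "y", -1), ("y", "x", 0)], "x"))
```

For the example constraints the distances from the host are 0 for h, G1, G2 and G4 and 1 for G3, and the two-variable system is reported as infeasible. In the binary search, the infeasible outcome is the case that sets P_min = P.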

The retiming-skew relationship
• Skew
  [Figure: three-FF pipeline around Comb Block 1 and Comb Block 2, with Delay = 1 on a clock line]
• Retiming
  [Figure: the same pipeline with a FF relocated]
• Both borrow one unit of time from Comb Block 2 and lend it to Comb Block 1
• Magnitude of optimal skew = amount of delay that the FF has to move across
• Can be generalized for another approach to retiming

Can move from skews to retiming
• Moving a flip-flop across a gate G
  – left → right: equivalent to increasing its skew by delay(G)
  – right → left: equivalent to reducing its skew by delay(G)
  [Figure: FF with old skew = s moved across a gate with delay = d; new skew = s + d]
• More generally, for input FFs with skews s_1, …, s_4 and output FFs j and k
  – $s_j = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,j))$
  – $s_k = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,k))$

Another approach to retiming
• Two-phase approach
  – Phase A: Find optimal skews (complexity depends on the number of FFs, not the number of gates)
  – Phase B: Relocate FFs to retime circuit (since most FF movements are seen to be local in practice, this does not take too long)
  – Not provably better than earlier approach in terms of complexity, but practically works very well
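The skew update for moving FFs from the inputs of a block to its outputs can be written as a short function. This is a sketch under assumptions: the name `skews_after_forward_move`, the dictionary layout, and the numeric delays are hypothetical, and MAX(i, j) is taken to mean the maximum delay from input i to output j.

```python
def skews_after_forward_move(input_skews, max_delay):
    """input_skews: {i: s_i} for the FFs at the inputs.
    max_delay: {(i, j): MAX(i, j)} for each input i that reaches output j.
    Returns {j: max over i of (s_i + MAX(i, j))}."""
    result = {}
    for (i, j), d in max_delay.items():
        candidate = input_skews[i] + d
        if j not in result or candidate > result[j]:
            result[j] = candidate
    return result


# Single gate with delay d = 2 and old skew s = 0.5: new skew = s + d.
print(skews_after_forward_move({1: 0.5}, {(1, "j"): 2.0}))

# Hypothetical four-input case with outputs j and k.
s = {1: 0.0, 2: 1.0, 3: 0.0, 4: 2.0}
delays = {(1, "j"): 3.0, (2, "j"): 1.0, (3, "j"): 2.0, (4, "j"): 1.0,
          (1, "k"): 1.0, (2, "k"): 1.0, (3, "k"): 1.0, (4, "k"): 1.0}
print(skews_after_forward_move(s, delays))
```

The single-gate call reproduces the “new skew = s + d” rule, and the four-input call takes the worst case over all inputs for each output FF.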
