
Clock skew optimization: Another approach for sequential timing - PDF document



  1. Sequential Timing Optimization
     Long path timing constraints
     • Data must not reach the destination FF too late:
       $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$
     Short path timing constraints
     • An FF should not get more than one data set per period:
       $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
     [Figure: timing diagrams for FF i and FF j marking $s_i$, $s_j$, $d_{\max}(i,j)$, $T_{\text{setup}}$, $d_{\min}(i,j)$, and $T_{\text{hold}}$]
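The two constraints above can be checked mechanically for a given set of clock arrival times. The following is a minimal sketch, not taken from the slides: the function name `timing_violations`, the `(i, j, d_min, d_max)` tuple layout, and the two-stage pipeline with delays 3 and 1 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Each combinational path is (i, j, d_min, d_max); this tuple layout is an assumption.
Path = Tuple[str, str, float, float]


def timing_violations(skew: Dict[str, float], paths: List[Path], period: float,
                      t_setup: float = 0.0, t_hold: float = 0.0) -> List[str]:
    """Return the long-path (setup) and short-path (hold) constraints that fail."""
    violations = []
    for i, j, d_min, d_max in paths:
        # Long path: s_i + d_max(i,j) + T_setup <= s_j + P
        if skew[i] + d_max + t_setup > skew[j] + period:
            violations.append(f"setup {i}->{j}")
        # Short path: s_i + d_min(i,j) >= s_j + T_hold
        if skew[i] + d_min < skew[j] + t_hold:
            violations.append(f"hold {i}->{j}")
    return violations


# Hypothetical pipeline: A -> B has delay 3, B -> C has delay 1.
paths = [("A", "B", 3.0, 3.0), ("B", "C", 1.0, 1.0)]
zero_skew = {"A": 0.0, "B": 0.0, "C": 0.0}
print(timing_violations(zero_skew, paths, period=3.0))  # []
print(timing_violations(zero_skew, paths, period=2.0))  # ['setup A->B']
```

With zero skew the long path from A to B limits the period to 3 in this hypothetical circuit; delaying the clock at B by one unit removes the violation at a period of 2.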

  2. Clock skew optimization
     • Another approach for sequential timing optimization
     • Deliberately change the arrival times of the clock at various memory elements in a circuit for cycle borrowing
       – For zero skew, delay from clock source to all FF's = T
       – Positive skew of $\delta$ at FF k: change delay from clock source to FF k to $T + \delta$
       – Negative skew of $\delta$ at FF k: change delay from clock source to FF k to $T - \delta$
     • Problem statement: set skews for optimized performance

     Sequential timing optimization
     • Two "true" sequential timing optimization methods
       – Retiming: moving latches around in a design
         [Figure: Comb Block 1 and Comb Block 2 between clocked FFs, shown before and after retiming]
       – Clock skew optimization: deliberately changing clock arrival times so that the circuit is not truly "synchronous"
         [Figure: the same circuit with a Delay element added in the clock network]

     Finding the optimal clock period using skews
     • Represented by the optimization problem below; solve for P and the optimal skews
         minimize P
         subject to (for all pairs of FF's (i,j) connected by a combinational path)
           $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
           $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$
     • If $d_{\max}(i,j)$ and $d_{\min}(i,j)$ are constant, this is a linear program in the variables $s_i$ and P
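As a small worked instance of this linear program, consider a hypothetical three-FF pipeline in which Comb Block 1 has delay 3 and Comb Block 2 has delay 1. The assumptions here, none of which come from the slides, are $T_{\text{setup}} = T_{\text{hold}} = 0$, $d_{\min} = d_{\max}$ on each block, and the first and last FF pinned to zero skew ($s_1 = s_3 = 0$) so that only the middle skew $s_2$ is free.

```latex
\begin{align*}
\text{Long path, FF}_1 \to \text{FF}_2:&\quad s_1 + 3 \le s_2 + P \\
\text{Long path, FF}_2 \to \text{FF}_3:&\quad s_2 + 1 \le s_3 + P \\
\text{With } s_1 = s_3 = 0:&\quad 3 - P \le s_2 \le P - 1 \;\Rightarrow\; P \ge 2 \\
\text{At } P = 2:&\quad s_2 = 1 \\
\text{Short paths at } s_2 = 1:&\quad s_1 + 3 \ge s_2, \qquad s_2 + 1 \ge s_3
\end{align*}
```

Under these assumptions the zero-skew period is 3 (set by the long path through Comb Block 1), while a positive skew of 1 at the middle FF brings the period down to 2 without violating either short-path constraint.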

  3. Graph-based approaches
     • For a constant clock period P, the linear program becomes a system of difference constraints $s_p - s_q \le \text{constant}$
     • As before, perform a binary search on P
     • For each value of P, build an equivalent constraint graph
       [Figure: edge between nodes i and j labeled f(P)]
     • Shortest path in the constraint graph gives a set of skews for a given value of P
     • If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations

     Retiming
     • Assume unit gate delays, no setup times
       [Figure: Comb Block 1 and Comb Block 2 between clocked FFs. Initial Circuit: P=3. Retimed Circuit: P=2]

     Retiming: Definition
     • Relocation of flip-flops (FF's) and latches (usually to achieve lower clock periods)
     • Maintain the latency of all paths in the circuit, i.e., the number of FF stages on any input-output path must remain unchanged
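The binary search over P described under "Graph-based approaches" can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the lecture's implementation: the names `skews_for_period` and `min_period`, the `(i, j, d_min, d_max)` path layout, the `pinned` argument (FF's forced to share one skew, e.g. input and output FF's), and the example circuit are all hypothetical. Initializing every distance to 0 plays the role of a virtual source connected to every FF by a zero-weight edge.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Path = Tuple[str, str, float, float]  # (i, j, d_min, d_max); this layout is an assumption


def skews_for_period(ffs: Sequence[str], paths: List[Path], period: float,
                     pinned: Sequence[str] = (), t_setup: float = 0.0,
                     t_hold: float = 0.0) -> Optional[Dict[str, float]]:
    """Return skews meeting all constraints at this period, or None if infeasible."""
    edges = []  # (q, p, c) encodes the difference constraint s_p - s_q <= c
    for i, j, d_min, d_max in paths:
        edges.append((j, i, period - d_max - t_setup))  # s_i - s_j <= P - d_max - T_setup
        edges.append((i, j, d_min - t_hold))            # s_j - s_i <= d_min - T_hold
    for a, b in zip(pinned, pinned[1:]):                # tie pinned FFs to one common skew
        edges += [(a, b, 0.0), (b, a, 0.0)]
    dist = {f: 0.0 for f in ffs}  # virtual source with 0-weight edges to every FF
    for _ in range(len(ffs)):
        changed = False
        for q, p, c in edges:
            if dist[q] + c < dist[p] - 1e-12:
                dist[p] = dist[q] + c
                changed = True
        if not changed:
            return dist           # converged: shortest-path distances are valid skews
    return None                   # still relaxing after V passes: negative cycle


def min_period(ffs: Sequence[str], paths: List[Path], p_max: float,
               pinned: Sequence[str] = (), tol: float = 1e-6
               ) -> Tuple[float, Dict[str, float]]:
    """Binary search on P; p_max must be a feasible period (e.g. the zero-skew period)."""
    lo, hi = 0.0, p_max
    best = skews_for_period(ffs, paths, hi, pinned)
    if best is None:
        raise ValueError("p_max is not a feasible period")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        skews = skews_for_period(ffs, paths, mid, pinned)
        if skews is None:
            lo = mid              # negative cycle: P too small
        else:
            hi, best = mid, skews
    return hi, best


# Hypothetical pipeline: A -> B has delay 3, B -> C has delay 1; A and C share one skew.
P, s = min_period(["A", "B", "C"], [("A", "B", 3.0, 3.0), ("B", "C", 1.0, 1.0)],
                  p_max=3.0, pinned=["A", "C"])
print(round(P, 3), round(s["B"] - s["A"], 3))
```

In this hypothetical circuit the search converges to a period of about 2, with the middle FF clocked about one unit later than the other two. Only skew differences are meaningful, since adding a constant to every skew leaves all constraints unchanged.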

  4. Graph Notation of Circuit
     • Vertices are gates: u has delay = d(u), v has delay = d(v)
     • $w(e_{uv})$ = # latencies between u and v
       [Figure: edge from u to v with $w(e_{uv}) = 2$]
     • r(u) is the # latencies moved across gate u
     • r(PI) = r(PO) = 0: merge them both into a "host" node h with r(h) = 0
     • $w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$
       [Figure: edge with $w(e_{uv}) = 1$, $r(u) = 1$, $r(v) = 2$, giving $w_r(e_{uv}) = 2$]

     For a path from $v_1$ to $v_k$
     • Consider a path of vertices $v_1 \to v_2 \to v_3 \to \dots \to v_k$ with edge weights $w_{12}, w_{23}, w_{34}, \dots, w_{k-1,k}$
       – Define $w(v_1 \text{ to } v_k) = w_{12} + w_{23} + \dots + w_{k-1,k}$
       – After retiming,
         $w_r(v_1 \text{ to } v_k) = w_{12,r} + w_{23,r} + \dots + w_{k-1,k,r}$
         $= [w_{12} + r(2) - r(1)] + [w_{23} + r(3) - r(2)] + \dots + [w_{k-1,k} + r(k) - r(k-1)]$
         $= w(v_1 \text{ to } v_k) + r(k) - r(1)$
       – For a cycle, $v_1 = v_k$, which implies that $w_r = w$ for a cycle
       – In other words, retiming leaves the # latencies unchanged on any cycle

     Constraints for retiming
     • Non-negativity constraints (cannot have negative latencies)
       – $w_r$ on each edge must be non-negative
       – For any edge from vertex u to vertex v, $w_r(u,v) = w(u,v) + r(v) - r(u) \ge 0$, i.e., $r(u) - r(v) \le w(u,v)$
     • Period constraints (need a latency if path delay > period)
       – (or more precisely, path delay + $T_{\text{setup}}$ > period)
       – For any path from vertex $v_1$ to vertex $v_k$, under clock period P,
         $w_r(v_1 \text{ to } v_k) = w(v_1 \text{ to } v_k) + r(v_k) - r(v_1) \ge 1$ if $\text{delay}(v_1 \text{ to } v_k) > P$,
         i.e., $r(v_1) - r(v_k) \le w(v_1 \text{ to } v_k) - 1$ if $\text{delay}(v_1 \text{ to } v_k) > P$
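The telescoping argument above is easy to confirm numerically. The sketch below applies $w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$ to a small graph; the helper names `retime` and `path_weight`, the three-vertex cycle, and its r values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def retime(w: Dict[Edge, int], r: Dict[str, int]) -> Dict[Edge, int]:
    """Apply w_r(e_uv) = w(e_uv) + r(v) - r(u) to every edge."""
    return {(u, v): w_uv + r[v] - r[u] for (u, v), w_uv in w.items()}


def path_weight(w: Dict[Edge, int], path: List[str]) -> int:
    """Sum the edge weights (# latencies) along a path given as a vertex list."""
    return sum(w[(u, v)] for u, v in zip(path, path[1:]))


# The single-edge figure from the slide: w = 1, r(u) = 1, r(v) = 2 gives w_r = 2.
print(retime({("u", "v"): 1}, {"u": 1, "v": 2}))  # {('u', 'v'): 2}

# Hypothetical cycle a -> b -> c -> a with an arbitrary legal retiming.
w = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 2}
r = {"a": 0, "b": 1, "c": 1}
w_r = retime(w, r)

# Open path a -> b -> c: weight changes by r(c) - r(a) = 1.
print(path_weight(w, ["a", "b", "c"]), path_weight(w_r, ["a", "b", "c"]))  # 1 2
# Closed cycle: the number of latencies is unchanged.
print(path_weight(w, ["a", "b", "c", "a"]), path_weight(w_r, ["a", "b", "c", "a"]))  # 3 3
```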

  5. Example
     [Figure: circuit with Comb Block 1 and Comb Block 2 between clocked FFs, containing gates G1, G2, G3, G4]
     • Circuit graph:
       – Vertex weights = gate delays
       – Edge weights = # latencies
       [Figure: circuit graph on h, G1, G2, G3, G4; gate vertices have delay 1, and all edge weights are 0 except w(G3,G4) = 1]
     • Non-negativity constraints
       1. $r(h) - r(G1) \le 0$
       2. $r(G1) - r(G2) \le 0$
       3. $r(G2) - r(G3) \le 0$
       4. $r(G3) - r(G4) \le 1$
       5. $r(G4) - r(h) \le 0$
     • Period constraints for P = 2
       6. $r(h) - r(G3) \le -1$
       7. $r(G1) - r(G3) \le -1$
       8. $r(G2) - r(G4) \le 0$
       9. $r(G2) - r(h) \le 0$

     Graph-based approaches
     • System of difference constraints $r(u) - r(v) \le c$
     • Equivalent constraint graph
       [Figure: edge from v to u with weight c]
     • Shortest path in the constraint graph gives a set of valid r values for a given value of P (note that period constraints change for different values of P)
     • If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations

     Corresponding shortest path problem
     [Figure: constraint graph on h, G1, G2, G3, G4 with edge weights 0, 1, and -1]
     • Find shortest path from host to get
       – r(h) = 0
       – r(G1) = 0
       – r(G2) = 0
       – r(G3) = 1
       – r(G4) = 0
     • This gives the solution
       [Figure: retimed circuit with Comb Block 1 and Comb Block 2 between clocked FFs]
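The solution listed for this example can be checked directly against the nine constraints. The sketch below encodes each constraint $r(u) - r(v) \le c$ as a tuple `(u, v, c)`, an encoding chosen here for illustration, and then applies $w_r = w + r(v) - r(u)$ to the five circuit edges, whose weights are the bounds in constraints 1 to 5.

```python
from typing import Dict, List, Tuple

Constraint = Tuple[str, str, int]  # (u, v, c) encodes r(u) - r(v) <= c

constraints: List[Constraint] = [
    # Non-negativity constraints 1-5 (c is the edge weight w(u, v))
    ("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0), ("G3", "G4", 1), ("G4", "h", 0),
    # Period constraints 6-9 for P = 2
    ("h", "G3", -1), ("G1", "G3", -1), ("G2", "G4", 0), ("G2", "h", 0),
]


def violated(cons: List[Constraint], r: Dict[str, int]) -> List[Constraint]:
    """Return every constraint r(u) - r(v) <= c that the assignment r breaks."""
    return [(u, v, c) for u, v, c in cons if r[u] - r[v] > c]


unretimed = {"h": 0, "G1": 0, "G2": 0, "G3": 0, "G4": 0}
solution = {"h": 0, "G1": 0, "G2": 0, "G3": 1, "G4": 0}

print(violated(constraints, unretimed))  # [('h', 'G3', -1), ('G1', 'G3', -1)]
print(violated(constraints, solution))   # []

# Retimed edge weights: the latency on G3 -> G4 moves to G2 -> G3.
w = {(u, v): c for u, v, c in constraints[:5]}
w_r = {(u, v): w_uv + solution[v] - solution[u] for (u, v), w_uv in w.items()}
print(w_r)
```

The retimed weights place the single latency between G2 and G3, so each combinational block contains two unit-delay gates, consistent with the period of 2.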

  6. Overall scheme for minimum period retiming
     • Objective: to find a retiming that minimizes the clock period (the assignment of r values may not be unique due to slack in the shortest path graph!)
       – Binary search over P = [0, $P_{\text{unretimed}}$]
       – $P_{\text{unretimed}}$ = period of unretimed circuit = upper bound on optimal P
       – Range in some iteration of the search = [$P_{\min}$, $P_{\max}$]
       – Build shortest path graph with non-negativity constraints (independent of P)
       – At each value of P
         • Add period constraints to shortest path graph (related to W, D matrices discussed in class; will not describe here)
         • Solve shortest path problem
         • If negative cycle found, set $P_{\min}$ = P; else set $P_{\max}$ = P
       – Iterate until range of P is sufficiently small

     Finding shortest paths
     • Dijkstra's algorithm
       – O(V log V + E) for a graph with V vertices and E edges
       – Applicable only if all edge weights are non-negative
       – The latter condition does not hold in our case!
     • Bellman-Ford algorithm
       – O(VE) for a graph with V vertices and E edges
       – Outline
           for I = 1 to V – 1
             for each edge (u,v) ∈ E
               update neighbor's weights as r(v) = min[r(u) + d(u,v), r(v)]
           for each edge (u,v) ∈ E
             if r(u) + d(u,v) < r(v) then a negative cycle exists
     • Basic idea: in iteration I, update lowest cost path with at most I edges
     • After V – 1 iterations, if any update is still required, a negative cycle exists

     "Relaxation" algorithm for retiming
     • Perform a binary search on clock period P as before
     • At each value of P check feasibility as follows
       – Repeat V – 1 times (where V = # vertices)
         1. Set r(u) = 0 for each vertex
         2. Perform timing analysis to find clock period of the circuit
         3. For any vertex u with delay > P, r(u)++
         4. If no such vertex exists, P is feasible
         5. Else, retime the circuit using these values of r; update the circuit and go to step 1
       – If clock period > P after V – 1 iterations, then P is infeasible
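The Bellman-Ford outline above translates into a short feasibility check for a system of difference constraints. This is a minimal sketch, not the lecture's code: the function name `solve_difference_constraints` and the `(u, v, c)` encoding of $r(u) - r(v) \le c$ are assumptions. Each constraint becomes an edge from v to u of weight c, and a negative cycle is reported when an edge can still be relaxed after V – 1 passes. The sketch only detects negative cycles reachable from the chosen source.

```python
from typing import Dict, List, Optional, Tuple

Constraint = Tuple[str, str, int]  # (u, v, c) encodes r(u) - r(v) <= c


def solve_difference_constraints(vertices: List[str], constraints: List[Constraint],
                                 source: str) -> Optional[Dict[str, float]]:
    """Bellman-Ford from `source`; returns r values, or None on a negative cycle."""
    r: Dict[str, float] = {x: float("inf") for x in vertices}
    r[source] = 0
    # r(u) - r(v) <= c becomes an edge v -> u of weight c
    edges = [(v, u, c) for u, v, c in constraints]
    for _ in range(len(vertices) - 1):
        for a, b, c in edges:
            if r[a] + c < r[b]:
                r[b] = r[a] + c          # relax: r(b) = min[r(a) + c, r(b)]
    for a, b, c in edges:
        if r[a] + c < r[b]:              # an update is still required
            return None                  # negative cycle: constraints are infeasible
    return r


# The nine constraints of the slide's example at P = 2, solved from the host h.
example = [
    ("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0), ("G3", "G4", 1), ("G4", "h", 0),
    ("h", "G3", -1), ("G1", "G3", -1), ("G2", "G4", 0), ("G2", "h", 0),
]
print(solve_difference_constraints(["h", "G1", "G2", "G3", "G4"], example, "h"))
# {'h': 0, 'G1': 0, 'G2': 0, 'G3': 1, 'G4': 0}

# Hypothetical contradictory pair: r(a) - r(b) <= -1 and r(b) - r(a) <= 0.
print(solve_difference_constraints(["a", "b"], [("a", "b", -1), ("b", "a", 0)], "a"))  # None
```

In the overall scheme, a `None` result at a given P would correspond to setting $P_{\min}$ = P, and any other result to setting $P_{\max}$ = P.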

  7. The retiming-skew relationship
     • Skew
       [Figure: Comb Block 1 and Comb Block 2 between clocked FFs, with a clock delay element labeled Delay = 1]
     • Retiming
       [Figure: the same circuit with an FF relocated]
     • Both borrow one unit of time from Comb Block 2 and lend it to Comb Block 1
     • Magnitude of optimal skew = amount of delay that the FF has to move across
     • Can be generalized for another approach to retiming

     Can move from skews to retiming
     • Moving a flip-flop across a gate G
       – left → right ⇔ increasing its skew by delay(G)
       – right → left ⇔ reducing its skew by delay(G)
       [Figure: FF with old skew = s moved across a gate with delay = d; new skew = s + d]
     • More generally,
       $s_j = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,j))$
       $s_k = \max_{1 \le i \le 4} (s_i + \mathrm{MAX}(i,k))$
       [Figure: four FFs with skews $s_1, \dots, s_4$ on the left, moved across logic to FF j and FF k on the right]

     Another approach to retiming
     • Two-phase approach
       – Phase A: Find optimal skews (complexity depends on the number of FF's, not the number of gates)
       – Phase B: Relocate FF's to retime circuit (since most FF movements are seen to be local in practice, this does not take too long)
       – Not provably better than earlier approach in terms of complexity, but practically works very well
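The skew update for a left-to-right move can be written as a one-line function. This is a small sketch under an assumption: MAX(i,j) is read here as the maximum combinational delay from the old location of FF i to the new location of FF j, which the slide does not define explicitly. The function name `skew_after_forward_move` and the numeric values are hypothetical.

```python
from typing import Sequence


def skew_after_forward_move(input_skews: Sequence[float],
                            max_delays: Sequence[float]) -> float:
    """s_j = max over i of (s_i + MAX(i, j)) for an FF relocated to position j.

    input_skews[i] is s_i; max_delays[i] is MAX(i, j), assumed to be the
    maximum delay from the old location of FF i to the new location j.
    """
    return max(s + d for s, d in zip(input_skews, max_delays))


# Single gate: old skew s = 0.5, gate delay d = 1.0, so new skew = s + d.
print(skew_after_forward_move([0.5], [1.0]))  # 1.5

# Four source FFs merging into one relocated FF (hypothetical values).
print(skew_after_forward_move([0.0, 1.0, 0.5, 2.0], [3.0, 1.0, 2.0, 0.5]))  # 3.0
```

In Phase B of the two-phase approach, updates of this kind would be applied as each FF is moved across logic, with the aim of absorbing the skews found in Phase A.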
