Pipelined Compressor Tree Optimization using Integer Linear Programming
International Conference on Field Programmable Logic 03.09.2014 Martin Kumm, Peter Zipf
University of Kassel, Germany
Contents
1. Introduction to Compressor Trees
2. …
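Bit heap example: five input vectors, column weights $2^4, 2^3, 2^2, 2^1, 2^0$

input vectors: 22, 7, 13, 27, 21
bits per column ($2^4 \dots 2^0$): 3 2 4 3 4

$$3 \cdot 2^4 + 2 \cdot 2^3 + 4 \cdot 2^2 + 3 \cdot 2^1 + 4 \cdot 2^0 = 90$$
$$22 + 7 + 13 + 27 + 21 = 90$$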
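The bit heap example can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `column_heights` and `heap_value` are made up for this example and do not come from the slides.

```python
def column_heights(values, width):
    """Count, per bit weight 2^i, how many input vectors have that bit set.

    The result is ordered LSB first (index i holds the column of weight 2^i).
    """
    return [sum((v >> i) & 1 for v in values) for i in range(width)]


def heap_value(heights):
    """Sum of all bits in the heap, each weighted by its column weight."""
    return sum(h << i for i, h in enumerate(heights))


inputs = [22, 7, 13, 27, 21]          # input vectors from the slide
heights = column_heights(inputs, 5)   # LSB first: [4, 3, 4, 2, 3]
print(list(reversed(heights)))        # MSB first, as on the slide
print(heap_value(heights), sum(inputs))
```

Read MSB first, the heights are 3 2 4 3 4, and the weighted sum of the column heights equals the sum of the input vectors.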
[Dinechin HEART’13]
[Figure: array of full adders (FA)]
Compression with full adders (columns ordered from most to least significant). Each full adder removes 3 bits from one column (− 3) and adds one sum bit to the same column and one carry bit to the next-higher column (+ 1 1).

5 5 5 5 5       bits in stage 0
  5 × (− 3, + 1 1)
1 4 4 4 4 3     bits in stage 1
  5 × (− 3, + 1 1)
1 3 3 3 3 1     bits in stage 2
  4 × (− 3, + 1 1)
2 2 2 2 1 1     bits in stage 3
  − 2 2 2 2, + 1 1 1 1 1
1 1 1 1 1 1 1   bits in final stage
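A minimal sketch of the full-adder bookkeeping: the hypothetical helper `apply_full_adders` takes the column heights of one stage (LSB first) and the columns where a full adder is placed, and returns the heights of the next stage. The adder placements below are an assumption chosen to reproduce the stage 0 to stage 1 and the stage 2 to stage 3 transitions; the slides only give the resulting heights.

```python
def apply_full_adders(heights, fa_columns):
    """Return next-stage column heights (LSB first) after placing full adders.

    Each full adder in column c consumes 3 bits of column c and produces
    one sum bit in column c and one carry bit in column c + 1.
    """
    remaining = list(heights) + [0]        # bits not consumed by any adder
    produced = [0] * (len(heights) + 1)    # sum and carry bits
    for c in fa_columns:
        if remaining[c] < 3:
            raise ValueError(f"column {c} has fewer than 3 free bits")
        remaining[c] -= 3
        produced[c] += 1       # sum bit
        produced[c + 1] += 1   # carry bit
    result = [r + p for r, p in zip(remaining, produced)]
    while len(result) > 1 and result[-1] == 0:
        result.pop()           # drop empty top columns
    return result


stage1 = apply_full_adders([5, 5, 5, 5, 5], [0, 1, 2, 3, 4])
stage3 = apply_full_adders([1, 3, 3, 3, 3, 1], [1, 2, 3, 4])
print(list(reversed(stage1)))  # MSB first: 1 4 4 4 4 3
print(list(reversed(stage3)))  # MSB first: 2 2 2 2 1 1
```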
[Figure sequence: mapping full adders (FA) and half adders (HA) onto slice LUTs and carry logic]
Compression with GPCs, first example (columns ordered from most to least significant):

      5 5 5 5 5   bits in stage 0
−       1 4 1 5
+     1 1 1 1 1
−     1 4 1 4
+   1 1 1 1 1
=   1 6 2 2 2 1   bits in stage 1
−     6
+ 1 1 1
= 1 2 1 2 2 2 1   bits in stage 2

Compression with GPCs, second example:

        5 5 5 5 5   bits in stage 0
−         2 0 4 5
+       1 1 1 1 1
−       5 0 5
+   1 1 1 1 1
−         3   1
+         3   1
=   1 1 2 5 2 2 1   bits in stage 1
−   1 1 2 5
+ 1 1 1 1 1
−           2 2 1
+           2 2 1
= 1 1 1 1 1 2 2 1   bits in stage 2
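The first GPC example can be replayed with a small sketch. The function `apply_gpcs` and its placement format (bits taken per column MSB first, number of output bits, lowest column) are assumptions made for this illustration; the column positions are inferred from the stage heights on the slide.

```python
def apply_gpcs(heights, placements):
    """Return next-stage column heights (LSB first) after placing GPCs.

    Each placement is (taken, n_outputs, column): `taken` lists the bits
    consumed per column, MSB first, as in the (1,4,1,5;5) notation;
    `column` is the lowest column covered. Every GPC produces one output
    bit in each of `n_outputs` consecutive columns starting at `column`.
    """
    top = len(heights)
    for taken, n_outputs, column in placements:
        top = max(top, column + len(taken), column + n_outputs)
    remaining = list(heights) + [0] * (top - len(heights))
    produced = [0] * top
    for taken, n_outputs, column in placements:
        for offset, count in enumerate(reversed(taken)):
            remaining[column + offset] -= count
        for offset in range(n_outputs):
            produced[column + offset] += 1
    if min(remaining) < 0:
        raise ValueError("a GPC consumes more bits than its column holds")
    result = [r + p for r, p in zip(remaining, produced)]
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result


stage0 = [5, 5, 5, 5, 5]
stage1 = apply_gpcs(stage0, [((1, 4, 1, 5), 5, 0), ((1, 4, 1, 4), 5, 1)])
stage2 = apply_gpcs(stage1, [((6,), 3, 4)])
print(list(reversed(stage1)))  # MSB first: 1 6 2 2 2 1
print(list(reversed(stage2)))  # MSB first: 1 2 1 2 2 2 1
```

Outputs of a GPC are tracked separately from the remaining input bits, so a GPC can never consume a bit that was produced in the same stage.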
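ILP formulation

minimize
$$\sum_{s=0}^{S-1} \sum_{c=0}^{C-1} \sum_{e=0}^{E-1} c_e \, k_{s,e,c}$$

subject to

C1: $$N_{s-1,c} \le \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} M_{e,c+c'} \, k_{s-1,e,c+c'}$$ for $s = 1 \dots S-1$, $c = 0 \dots C-1$, if $D_s = 0$

C2: $$N_{s,c} = \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} K_{e,c+c'} \, k_{s-1,e,c+c'}$$ for $s = 1 \dots S-1$, $c = 0 \dots C-1$

C3: $$N_{s,c} \le \begin{cases} 2 & \text{for two-input VMA} \\ 3 & \text{for ternary VMA} \end{cases} \quad \text{if } D_s = 1$$

C4: $$\sum_{s=1}^{S-1} D_s = 1$$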
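To make the objective concrete, the sketch below evaluates $\sum c_e k_{s,e,c}$ for one candidate solution. It assumes that $c_e$ is the LUT cost of GPC type $e$ and that $k_{s,e,c}$ counts the GPCs of type $e$ placed in stage $s$ at column $c$; the slides do not spell this out, so treat both as an interpretation. The placement is the first GPC example of this deck and the costs are the #LUT6 values listed in its GPC table.

```python
# Assumed cost c_e per GPC type, in LUT6 (values from the deck's GPC table).
lut_cost = {"(1,4,1,5;5)": 4, "(6;3)": 3}

# Assumed meaning of k[(s, e, c)]: number of GPCs of type e in stage s
# whose lowest column is c. Placement taken from the first GPC example.
k = {
    (0, "(1,4,1,5;5)", 0): 1,
    (0, "(1,4,1,5;5)", 1): 1,
    (1, "(6;3)", 4): 1,
}


def objective(k, cost):
    """Total cost: sum over all stages, types and columns of c_e * k[s, e, c]."""
    return sum(cost[e] * count for (_s, e, _c), count in k.items())


total = objective(k, lut_cost)
print(total)  # total LUT6 count of this candidate solution
```

A solver would search over all admissible values of `k` for the assignment with the smallest such total while satisfying C1 to C4; this sketch only scores one given assignment.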
[Figure: two plots of #LUT (y-axis) over compressed bits (x-axis, 50 to 300); legend entry: Heuristic [8]]
[Parandeh-Afshar TRETS’11]: H. Parandeh-Afshar, A. Neogy, P. Brisk, and P. Ienne, “Compressor Tree Synthesis on Commercial High-Performance FPGAs,” ACM TRETS, 2011.
[Dinechin HEART’13]: F. de Dinechin, M. Istoan, and G. Sergent, “Fixed-Point Trigonometric Functions on FPGAs,” HEART 2013.
[Dinechin FPL’13]: N. Brunie, F. de Dinechin, M. Istoan, G. Sergent, “… Heaps,” FPL 2013.
[Matsunaga’13]: T. Matsunaga, S. Kimura, and Y. Matsunaga, “An Exact Approach for GPC-Based Compressor Tree Synthesis,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Dec. 2013.
Results (LUT4): Heuristic [Dinechin FPL’13] versus proposed ILP

Size    | Heuristic [Dinechin FPL’13]        | proposed ILP
[bits]  | LUT4    FF      Slices  fmax [MHz] | LUT4    FF      Slices  fmax [MHz]
16      | 34      20      25      501.5      | 28      21      25      562.4
25      | 45      39      29      455.2      | 46      45      39      562.1
36      | 78      63      59      489.5      | 54      56      35      491.4
49      | 123     86      73      444.8      | 79      78      46      481.9
64      | 181     108     109     412.9      | 123     120     100     471.5
81      | 209     132     117     420.7      | 141     135     106     477.8
100     | 267     173     174     414.8      | 181     178     109     454.6
121     | 332     182     181     332.6      | 242     247     211     435.4
144     | 395     243     255     376.2      | 272     273     223     441.1
169     | 492     283     277     344.8      | 309     317     197     428.3
196     | 582     328     368     355.0      | 407     416     340     423.2
225     | 622     345     410     333.9      | 444     451     349     424.3
256     | 706     386     459     343.3      | 506     518     438     410.3
Avg.:   | 312.8   183.7   195.1   401.9      | 217.8   219.6   170.6   466.5
Imp.:   | –       –       –       –          | 30.3%           12.5%   16.1%
Results (LUT6): Heuristic [Dinechin FPL’13] versus proposed ILP

Size    | Heuristic [Dinechin FPL’13]        | proposed ILP
[bits]  | LUT6    FF      Slices  fmax [MHz] | LUT6    FF      Slices  fmax [MHz]
16      | 12      7       3       478.0      | 10      9       3       639.4
25      | 24      11      6       636.5      | 26      25      7       452.9
36      | 32      13      9       595.6      | 27      36      7       603.1
49      | 44      15      12      492.4      | 35      40      10      407.7
64      | 59      19      16      407.7      | 47      48      13      506.8
81      | 76      21      20      442.9      | 56      59      15      480.1
100     | 96      47      26      435.9      | 77      98      20      437.5
121     | 116     26      32      401.6      | 89      112     25      438.6
144     | 134     28      35      383.9      | 94      121     24      469.0
169     | 161     60      43      396.8      | 119     155     30      470.6
196     | 189     76      50      358.0      | 131     160     35      408.0
225     | 216     81      56      327.2      | 192     236     57      364.0
256     | 251     74      66      338.3      | 204     251     55      372.3
Avg.:   | 108.5   36.8    28.8    438.1      | 85.2    103.8   23.2    465.4
Imp.:   | –       –       –       –          | 21.5%           19.5%   6.2%
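The Imp. rows of the two result tables can be approximately reproduced from the Avg. rows. The sketch below assumes that improvement means the relative reduction for resource counts and the relative increase for fmax; the helper name `improvement` is hypothetical. Because it starts from the rounded averages, the results can differ from the printed percentages by about 0.1 percentage points.

```python
def improvement(baseline, proposed, higher_is_better=False):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    if higher_is_better:
        return 100.0 * (proposed - baseline) / baseline
    return 100.0 * (baseline - proposed) / baseline


# Averages from the LUT4 table: heuristic versus proposed ILP.
lut4_imp = improvement(312.8, 217.8)
slices4_imp = improvement(195.1, 170.6)
fmax4_imp = improvement(401.9, 466.5, higher_is_better=True)

# Averages from the LUT6 table.
lut6_imp = improvement(108.5, 85.2)
slices6_imp = improvement(28.8, 23.2)
fmax6_imp = improvement(438.1, 465.4, higher_is_better=True)

for name, value in [("LUT4", lut4_imp), ("Slices", slices4_imp), ("fmax", fmax4_imp),
                    ("LUT6", lut6_imp), ("Slices", slices6_imp), ("fmax", fmax6_imp)]:
    print(f"{name}: {value:.2f}%")
```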
GPC / Compressor     | #LUT6 (k) | Efficiency (E = δ/k) | delay

LUT based GPCs from [Dinechin FPL’13]:
(3;2) GPC            | 1         | 1                    | τL ≈ τ
(6;3) GPC            | 3         | 1                    | τL ≈ τ
(1,5;3) GPC          | 3         | 1                    | τL ≈ τ

Improved GPC mappings from [Parandeh-Afshar TRETS’11]:
(6;3) GPC            | 3         | 1                    | 2τL + τR + 3τCC ≈ 3τ
(1,5;3) GPC          | 2         | 1.5                  | τL + 2τCC ≈ τ
(2,3;3) GPC          | 2         | 1                    | τL + 2τCC ≈ τ
(7;3) GPC            | 3         | 1.33                 | 2τL + τR + 3τCC ≈ 3τ
(5,3;4) GPC          | 3         | 1.33                 | 2τL + τR + 3τCC ≈ 3τ
(6,2;4) GPC          | 3         | 1.33                 | 2τL + τR + 3τCC ≈ 3τ

GPCs and 4:2 compressor from [Kumm MBMV’13]:
(5,0,6;5) GPC        | 4         | 1.5                  | τL + 4τCC ≈ τ
(1,4,1,5;5) GPC      | 4         | 1.5                  | τL + 4τCC ≈ τ
(1,4,0,6;5) GPC      | 4         | 1.5                  | τL + 4τCC ≈ τ
(2,0,4,5;5) GPC      | 4         | 1.5                  | 2τL + τR + 4τCC ≈ 3τ
4:2 compressor       | k         | 2 − 2/k              | τL + kτCC

Adder with k BLE:
2-input adder        | k         | 1                    | τL + kτCC
3-input adder        | k         | 2 − 2/k              | 2τL + τR + kτCC ≈ 3τ + kτCC

Proposed GPCs:
(6,0,6;5) GPC        | 4         | 1.75                 | τL + 4τCC ≈ τ
(1,3,2,5;5) GPC      | 4         | 1.5                  | τL ≈ τ
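The efficiency column can be recomputed under the assumption that δ is the number of input bits minus the number of output bits of a GPC. This definition is not stated on the slide, but it matches the listed values. The helper `efficiency` is a hypothetical name for this sketch.

```python
def efficiency(inputs, n_outputs, luts):
    """Efficiency E = delta / k for a GPC.

    `inputs` lists the input bits per column, e.g. (6, 0, 6) for a
    (6,0,6;5) GPC; delta is assumed to be inputs minus outputs and
    k is the number of LUTs.
    """
    delta = sum(inputs) - n_outputs
    return delta / luts


e_fa = efficiency((3,), 2, 1)                # (3;2) GPC, i.e. a full adder
e_15_lut = efficiency((1, 5), 3, 3)          # (1,5;3) GPC, LUT-based mapping
e_15_carry = efficiency((1, 5), 3, 2)        # (1,5;3) GPC, improved mapping
e_73 = efficiency((7,), 3, 3)                # (7;3) GPC
e_606 = efficiency((6, 0, 6), 5, 4)          # proposed (6,0,6;5) GPC
e_1325 = efficiency((1, 3, 2, 5), 5, 4)      # proposed (1,3,2,5;5) GPC
print(e_fa, e_15_lut, e_15_carry, round(e_73, 2), e_606, e_1325)
```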
[Figures: slice mappings showing LUTs, carry logic, full adders (FA) and a half adder (HA)]