A GPU Register File using Static Data Compression
Alexandra Angerd, Erik Sintorn, Per Stenström Department of Computer Science and Engineering Chalmers University of Technology Göteborg, Sweden
Limiting factors for TLP:
Threads Register file
13.5x
Sizes keep increasing! Already huge and power hungry! Instead: decrease footprint!
Tuned precision High Medium Low Register file
Narrow values Register file
(a) Example program:

    k = 0
    while k < 50 {
        i = 0
        j = k
        while i < j {
            print k
            i = i + 1
            k = k + 1
        }
    }
    print k

(b) Control-flow graph in SSA form (branch edges are labeled t/f in the original figure):

    k0 = 0
    k1 = φ(k0, k2)
    k1 < 50?
    kt = k1 ∩ [−∞, 49]
    i0 = 0
    j0 = kt
    i1 = φ(i0, i2)
    i1 < j0?
    print kt
    i2 = i1 + 1
    k2 = kt + 1
    print kf

(c) Intervals of the SSA variables:

    I[k0] = [0, 0]
    I[k1] = [0, 50]
    I[k2] = [1, 50]
    I[kt] = [0, 49]
    I[kf] = [50, 50]
    I[i0] = [0, 0]
    I[i1] = [0, 49]
    I[i2] = [1, 50]
    I[j0] = [0, 49]

(d) Merged intervals and resulting bit-widths:

    I[k] = I[kx] = [0, 50]    k : 6 bits
    I[i] = I[ix] = [0, 50]    i : 6 bits
    I[j] = I[jx] = [0, 49]    j : 6 bits
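The step from the merged intervals in panel (d) to a bit-width can be sketched as follows. This is a minimal illustration, not the analysis used in the presented work: the function name bits_needed is hypothetical, and the handling of negative bounds (two's complement) is an assumption, since the slide only shows non-negative ranges.

```python
def bits_needed(lo: int, hi: int) -> int:
    """Smallest width (in bits) that can hold every integer in [lo, hi]."""
    if lo > hi:
        raise ValueError("empty interval")
    if lo >= 0:
        # Unsigned case: enough bits to represent hi, and at least one bit.
        return max(1, hi.bit_length())
    # Signed case (assumed two's complement): n bits cover
    # [-2**(n-1), 2**(n-1) - 1], so size for both ends plus a sign bit.
    return max((-lo - 1).bit_length(), max(hi, 0).bit_length()) + 1


# Merged intervals taken from panel (d).
intervals = {"k": (0, 50), "i": (0, 50), "j": (0, 49)}
widths = {name: bits_needed(lo, hi) for name, (lo, hi) in intervals.items()}
print(widths)
```

For the three intervals on the slide this gives 6 bits each, matching panel (d).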
How to design a register file which utilizes both narrow floats AND narrow integers?
Angerd, Sintorn, Stenström. “A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs”, ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 4, December 2017.
Pereira, Rodrigues, Campos. “A Fast and Low-overhead Technique to Secure Programs Against Integer Overflows”, in Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization.
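[Figure: register file organization, baseline vs. our approach. Baseline: values V1 and V2 each occupy a full 32-bit register (R0, R1). Our approach (changes to baseline): an indirection table (entries labeled p0, p1, m0, m1) locates narrow values, so V1 and V2 (8 bits and 24 bits in the example) share 32 bits of storage.]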
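The packing idea in the diagram above (an 8-bit and a 24-bit value sharing 32 bits) can be sketched in software. This is only an illustration under stated assumptions: the function names pack and unpack and the (offset, width) table entries are hypothetical, and the actual fields of the indirection table (p0, p1, m0, m1) are not recoverable from the slide.

```python
WORD_BITS = 32  # width of one baseline register, as on the slide


def pack(values_and_widths):
    """Pack (value, width) pairs into one 32-bit word, first value in the low bits.

    Returns the packed word and a hypothetical indirection table holding
    one (offset, width) entry per packed value.
    """
    word, offset, table = 0, 0, []
    for value, width in values_and_widths:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        if offset + width > WORD_BITS:
            raise ValueError("values exceed one 32-bit register")
        word |= value << offset
        table.append((offset, width))
        offset += width
    return word, table


def unpack(word, entry):
    """Read one narrow value back using its (offset, width) table entry."""
    offset, width = entry
    return (word >> offset) & ((1 << width) - 1)


# V1 is an 8-bit value and V2 a 24-bit value, as in the figure.
word, table = pack([(0xAB, 8), (0x123456, 24)])
v1, v2 = unpack(word, table[0]), unpack(word, table[1])
print(hex(word), hex(v1), hex(v2))
```

In the baseline the same two values would occupy two 32-bit registers; here they occupy one, at the cost of a table lookup on each access.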
Bit-width    [exponent bits, mantissa bits]
32 bits      [8, 23]   (IEEE754-compliant)
28 bits      [7, 20]
24 bits      [6, 17]
20 bits      [5, 14]
16 bits      [5, 10]
12 bits      [4, 7]
8 bits       [3, 4]
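To get a feel for what the narrower formats mean numerically, the sketch below checks that each [exponent, mantissa] pair plus one sign bit adds up to the listed bit-width, and rounds a value to a given number of mantissa bits. It is an approximation under assumptions: it assumes one sign bit and an implicit leading one as in IEEE 754, and it ignores the reduced exponent range, subnormals, and whatever rounding mode the actual hardware uses. The name quantize_mantissa is hypothetical.

```python
import math

# Bit-width -> (exponent bits, mantissa bits), as listed on the slide.
FORMATS = {32: (8, 23), 28: (7, 20), 24: (6, 17), 20: (5, 14),
           16: (5, 10), 12: (4, 7), 8: (3, 4)}


def quantize_mantissa(x: float, man_bits: int) -> float:
    """Round x to man_bits explicit mantissa bits (exponent range ignored)."""
    mant, exp = math.frexp(x)          # x == mant * 2**exp, 0.5 <= |mant| < 1
    scale = 2 ** (man_bits + 1)        # +1 for the implicit leading bit
    return math.ldexp(round(mant * scale) / scale, exp)


for width, (exp_bits, man_bits) in FORMATS.items():
    # One sign bit is assumed in addition to exponent and mantissa.
    assert 1 + exp_bits + man_bits == width
    print(f"{width:2d} bits: spacing near 1.0 is 2**-{man_bits}, "
          f"pi -> {quantize_mantissa(math.pi, man_bits)}")
```

Each step down the table removes three or four mantissa bits, so the relative precision drops by a factor of 8 or 16 per step.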
Register pressure lowered in all cases
Both integer and float reduction is important
Quality: Very high
Average: 18.6% increase in IPC
SSIM ≥ 0.9
Binary: All outputs correct