Static Transformation for Heap Layout Using Memory Access Patterns
Jinseong Jeon
Computer Science, KAIST
Static Transformation
– Compilers can transform program memory layout.
2006-1
CS @ KAIST 2
[Figure: the user, the compiler with static transformation, and the computing machine]
– program behaviors: memory access patterns
– machine properties: memory hierarchy
[ Pool Allocation ] - complex pointer analysis
[ Field Layout Reconstruction ] - profiling

Node { int key; char data[6]; Node *next; } * T;

char* search(int k) {
  ...
  while (...) {
    if (h->key == k) return h->data;
    h = h->next;
  }
  ...
}

[Figure: heap layout with key (k) and next (n) stored together and data (d) stored separately]
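The effect of field layout reconstruction on the Node example can be sketched by hand in C. This is an illustration only, not output of the tool: the names NodeHot, NodeCold, and search_split are hypothetical, and the index-based links are an assumption that keeps the two arrays in step.

```c
#include <stddef.h>

#define NIL (-1)

/* Hot part: fields read on every step of the search loop. */
typedef struct { int key; int next; } NodeHot;

/* Cold part: read only once, when the key is found. */
typedef struct { char data[6]; } NodeCold;

/* Walks the list through the hot array only; the cold array is
   touched once, on a hit. Returns NULL when k is absent. */
char *search_split(NodeHot *hot, NodeCold *cold, int head, int k) {
    for (int h = head; h != NIL; h = hot[h].next) {
        if (hot[h].key == k)
            return cold[h].data;
    }
    return NULL;
}
```

With the fields split this way, the loop touches only key and next, so more list nodes fit into each cache line than with the original Node layout.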
– Based on both heap layout transformations
– How to represent memory access behaviors
– How to extract run-time behaviors from codes
– How to interpret predicted behaviors
[Figure: framework overview. Source code passes through Access Pattern Analysis, Structure Selection Analysis, Field Affinity Analysis, and the Layout Transformer to produce optimized code.]
S1.x = T1.c;
for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }

structure type projection:
  S = T; for ( ) { T = ; = T; U = ; }
conversion:
  TS, TTU  =>  TS(TTU)*
candidate selection for pool allocation:
  T, U
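The two projections used by the analyses can be sketched on a recorded access trace. The Access record and the functions project_types and project_fields are hypothetical names for this illustration; the tool itself works on source code, not on a run-time trace.

```c
#include <stddef.h>

/* One memory access: the structure type and the field it touches. */
typedef struct { char type; char field; } Access;

/* Structure type projection: keeps only the type of each access.
   out must have room for n + 1 characters. */
void project_types(const Access *trace, size_t n, char *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = trace[i].type;
    out[n] = '\0';
}

/* Field usage projection: keeps only the fields of accesses to
   structure type t. Returns the number of fields written. */
size_t project_fields(const Access *trace, size_t n, char t, char *out) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (trace[i].type == t)
            out[m++] = trace[i].field;
    out[m] = '\0';
    return m;
}
```

For the example statements with two loop iterations, the type projection gives TSTTUTTU, an instance of TS(TTU)*, and the field projection on T gives cabab, an instance of c(ab)*.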
S1.x = T1.c;
for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }

field usage projection:
  = .c; for ( ) { .a = ; = .b; }
conversion:
  c, ab  =>  c(ab)*
symbolic estimation:
  a: *, b: *, c: no closure mark
field layout reconstruction:
  a and b are stored together; c is stored separately
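– record closure marks with nesting information
– regard all closure marks as the same variable
[Figure: symbolic estimation example. Closure marks (* and **) are recorded per field for n, k, and d under the patterns (kdn(n)*)* and ((kn)*(kd+ε))*, giving polynomial estimates in x such as 2x^2+3x, 3x, x, and x^2.]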
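One simplified reading of the two bullets can be sketched as follows: every closure is treated as the same variable x, so a field that sits under d nested closures contributes x^d to its estimate. The function estimate and its coefficient table are hypothetical, the input is restricted to lowercase field letters, parentheses, * and +, and both branches of an alternation are simply counted, which over-approximates.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8    /* deepest closure nesting recorded */
#define MAX_GROUPS 64  /* deepest parenthesis nesting supported */

/* Returns 1 if the group opened at re[open] is closed by ")*". */
static int group_is_starred(const char *re, size_t open) {
    int level = 0;
    for (size_t i = open; re[i] != '\0'; i++) {
        if (re[i] == '(') level++;
        else if (re[i] == ')' && --level == 0)
            return re[i + 1] == '*';
    }
    return 0;
}

/* coef[f][d] = number of occurrences of field 'a' + f under exactly
   d nested closures, i.e. the coefficient of x^d in its estimate. */
void estimate(const char *re, int coef[26][MAX_DEPTH]) {
    int starred[MAX_GROUPS];
    int top = 0, depth = 0;
    memset(coef, 0, sizeof(int) * 26 * MAX_DEPTH);
    for (size_t i = 0; re[i] != '\0'; i++) {
        char c = re[i];
        if (c == '(') {
            if (top == MAX_GROUPS) return;   /* give up on deep nesting */
            starred[top] = group_is_starred(re, i);
            depth += starred[top++];
        } else if (c == ')') {
            if (top > 0) depth -= starred[--top];
        } else if (c >= 'a' && c <= 'z') {
            int d = depth + (re[i + 1] == '*');
            if (d < MAX_DEPTH) coef[c - 'a'][d]++;
        }
        /* '*' and '+' carry no field accesses of their own */
    }
}
```

Under these assumptions, c(ab)* gives 1 for c and x for both a and b, and (kdn(n)*)* gives x for k and d and x^2 + x for n.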
– Oi.next is converted into *(Oi + offset(next)).
– Random pointer dereferences like *(p + 4) are not allowed.
– For some accesses, extra instructions are required.
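The rewrite of Oi.next into *(Oi + offset(next)) can be written out in standard C with offsetof. The Node type is taken from the earlier example; load_next is a hypothetical helper, and the byte-wise pointer arithmetic is one way to spell the offset form, not necessarily what the transformer emits.

```c
#include <stddef.h>

typedef struct Node { int key; char data[6]; struct Node *next; } Node;

/* Reads o->next through an explicit byte offset, i.e. the form
   *(o + offset(next)) with the addition done on a char pointer. */
Node *load_next(Node *o) {
    return *(Node **)((char *)o + offsetof(Node, next));
}
```

Because the offset is derived from the field name, the transformer can recompute it when the layout changes. A hard-coded dereference such as *(p + 4) carries no field name to recompute from, which is why such accesses are ruled out.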
– For custom allocators, feed hints that pair each target structure with its custom allocator

Direct allocation:
  before:  ... = malloc(sizeof(T));
  after:   ... = _T_alloc_();
           char* _T_alloc_() { // pool allocation }

Custom allocator:
  before:  char* my_malloc(int s) { ... = malloc(s); }
           ... = my_malloc(sizeof(T));
  after:   char* my_malloc(int s) { ... = _T_alloc_(); }
           char* _T_alloc_() { // pool allocation }
           ... = my_malloc(sizeof(T));
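A minimal sketch of what a per-type pool allocator such as _T_alloc_ might look like is given below. Only the name _T_alloc_ and its char* return type come from the slide; the structure T, the chunk size, and the bump-pointer strategy are assumptions, and objects are never freed individually here.

```c
#include <stdlib.h>

typedef struct T { int key; struct T *next; } T;

#define POOL_CHUNK 256  /* objects per chunk; an arbitrary choice */

static T *chunk = NULL;           /* current chunk of T objects */
static size_t used = POOL_CHUNK;  /* slots taken in the current chunk */

/* Hands out T objects from contiguous chunks so that objects of the
   same type sit next to each other in memory. */
char *_T_alloc_(void) {
    if (used == POOL_CHUNK) {
        chunk = malloc(POOL_CHUNK * sizeof(T));
        if (chunk == NULL)
            return NULL;
        used = 0;
    }
    return (char *)&chunk[used++];
}
```

Two consecutive calls return adjacent objects, which is the locality property pool allocation aims for.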
CIL compiler and OCaml
– 2.6 GHz Pentium 4 processor
– 8 KB L1D cache, 512 KB L2 cache, 1.7 GB main memory
Benchmark     Program     Lines of Code   Structure Selection   Field Affinity   Code Transform    Total
SPECINT2000   175.vpr             11301                 7.220            0.324            0.107    7.651
SPECINT2000   300.twolf           17821                15.598            3.455            1.126   20.179
FreeBench     analyzer              763                 0.096            0.027            0.012    0.135
McGill        chomp                 378                 0.021            0.006            0.003    0.030
McGill        misr                  181                 0.003            0.002            0.001    0.006
Olden suite   bisort                597                 0.020            0.003            0.002    0.025
Olden suite   health                474                 0.024            0.004            0.002    0.030
Olden suite   mst                   408                 0.031            0.004            0.002    0.037
Olden suite   perimeter             345                 0.012            0.012            0.001    0.025
Olden suite   treeadd               154                 0.002            0.000            0.000    0.002
Olden suite   tsp                   433                 0.011            0.004            0.002    0.017
Olden suite   voronoi               975                 0.048            0.004            0.003    0.055
Ptrdist suite anagram               355                 0.031            0.003            0.001    0.035
Ptrdist suite bc                   4303                 2.028            0.634            0.193    2.855
Ptrdist suite ft                    926                 0.050            0.014            0.010    0.074
Ptrdist suite ks                    551                 0.055            0.012            0.020    0.087
[Figure: normalized L1D cache misses (1.0 = Original) for Pool and Pool + Re on 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, and ks. Values annotated on the chart: 1.99, 2.23, 0.86, 0.84.]
[Figure: normalized L2 cache misses (1.0 = Original) for Pool and Pool + Re on the same sixteen programs. Values annotated on the chart: 4.10, 4.18, 1.06, 1.00.]
Benchmark     Program     Lines of Code   Original (seconds)   Pool / Original   Pool + Re / Original
SPECINT2000   175.vpr             11301               10.959              1.01                   1.01
SPECINT2000   300.twolf           17821               435.19              0.98                   0.99
FreeBench     analyzer              763                66.64              0.41                   0.45
McGill        chomp                 378                 7.44              0.59                   0.47
McGill        misr                  181                31.39              0.99                   1.01
Olden suite   bisort                597                24.29              0.99                   0.99
Olden suite   health                474                86.05              0.71                   0.63
Olden suite   mst                   408                65.73              0.82                   0.82
Olden suite   perimeter             345                 7.19              0.78                   0.84
Olden suite   treeadd               154                10.17              0.48                   0.55
Olden suite   tsp                   433                20.44              0.96                   0.97
Olden suite   voronoi               975                11.03              0.99                   0.99
Ptrdist suite anagram               355                 1.53              0.99                   1.11
Ptrdist suite bc                   4303                 1.95              0.82                   0.81
Ptrdist suite ft                    926                 8.25              0.83                   0.73
Ptrdist suite ks                    551                 7.46              1.03                   1.03
Avg.                                                                      0.84                   0.84
– Regular expressions
– Automata reduction algorithm
according to heap layout transformations
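[Figure: control-flow graph of search. From start, the test h == NULL leads to return NotFound when true; otherwise the test h->key == k leads to return h->data when true and to h = h->next when false, which loops back to the first test. Edges are labeled with the accessed fields k, d, and n.]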
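[Figure: state elimination. Removing a state with self-loop e, incoming edges a and b, and outgoing edges c and d yields edges such as ae*c and be*d.]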
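[Figure: automaton reduction for search. The automaton over the field accesses k, d, and n is reduced step by step, through the edges kn and kd+ε, to the regular expression (kn)*(kd+ε).]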
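The reduced expression can be checked against actual runs of search by instrumenting it to log one letter per field access. The function search_traced and its trace buffer are hypothetical additions for this illustration; the analysis itself derives the pattern statically and never runs the program.

```c
#include <stddef.h>

typedef struct Node { int key; char data[6]; struct Node *next; } Node;

/* search() from the example, instrumented to append one letter per
   field access: k for key, d for data, n for next.
   cap is the size of trace and must be at least 1. */
char *search_traced(Node *h, int k, char *trace, size_t cap) {
    size_t len = 0;
    char *result = NULL;
    while (h != NULL) {
        if (len + 2 >= cap)
            break;                 /* keep room for two letters and NUL */
        trace[len++] = 'k';
        if (h->key == k) {
            trace[len++] = 'd';
            result = h->data;
            break;
        }
        trace[len++] = 'n';
        h = h->next;
    }
    trace[len] = '\0';
    return result;
}
```

On a three-node list, a hit on the last node logs knknkd and a miss logs knknkn; both are instances of the pattern (kn)* followed by either kd or nothing.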
state_compare(state s1, state s2)
  b1 ← whether ∃s'.(s' → s1, s1.dfn ≤ s'.dfn)  // 0 or 1
  b2 ← whether ∃s'.(s' → s2, s2.dfn ≤ s'.dfn)  // 0 or 1
  if b1 and not b2 then
    1   // s1 > s2
  else if not b1 and b2 then
    -1  // s1 < s2
  else if b1 and b2 then
    compare(s2.dfn, s1.dfn)  // dfn = Depth First Numbering
  else
    compare(s1.dfn, s2.dfn)
  end if
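The comparison can be sketched in C as follows. The State record, has_back_edge, and compare are hypothetical; the slide does not show a result for the case where only s2 has a back edge, so returning -1 there is an assumption made by symmetry with the first case.

```c
#include <stddef.h>

typedef struct State {
    int dfn;              /* depth-first number */
    struct State **pred;  /* predecessor states */
    size_t npred;
} State;

/* 1 if some predecessor s' satisfies s->dfn <= s'->dfn, i.e. s is
   entered by a back edge or a self-loop in the DFS numbering. */
static int has_back_edge(const State *s) {
    for (size_t i = 0; i < s->npred; i++)
        if (s->dfn <= s->pred[i]->dfn)
            return 1;
    return 0;
}

/* Three-way integer comparison: -1, 0, or 1. */
static int compare(int a, int b) { return (a > b) - (a < b); }

int state_compare(const State *s1, const State *s2) {
    int b1 = has_back_edge(s1), b2 = has_back_edge(s2);
    if (b1 && !b2) return 1;     /* s1 > s2 */
    if (!b1 && b2) return -1;    /* assumed by symmetry */
    if (b1 && b2) return compare(s2->dfn, s1->dfn);
    return compare(s1->dfn, s2->dfn);
}
```

Under this ordering, states without incoming back edges sort before loop headers, and among loop headers those with larger depth-first numbers sort first.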
worklist ← ∅
workhorse(state s)
  if s ≠ start state and s ≠ end state then
    for all s' ∈ s.successor do
      delete s' from worklist
    end for
    eliminate(s)
    for all s' ∈ s.successor do
      push s' into worklist
    end for
  end if
reduce()
  E ← {s ∈ S | ∃s'. s →ε s'}
  R ← {s ∈ E | ∄s'. s' → s, s.dfn ≤ s'.dfn}
  for all s ∈ R do
    workhorse(s)
  end for
  worklist ← S\R
  while worklist ≠ ∅ do
    workhorse(pop(worklist))
  end while
f() { ... = s.a; if (!end) f(); ... = s.b; }
[Figure: automaton for f() with edges a and b and a recursive call to f()]
Regular expression: a*abb*
Grammar: F → ab | aFb
Language: a^i b^i
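The access sequence of the recursive f() can be reproduced with a small instrumented version. The depth parameter stands in for the slide's !end test, and the names run_f, trace, and len are hypothetical.

```c
#include <stddef.h>

/* f() from the example: reads s.a, recurses while depth remains,
   then reads s.b. Each read appends a letter to trace. */
static void f(int depth, char *trace, size_t *len) {
    trace[(*len)++] = 'a';
    if (depth > 1)
        f(depth - 1, trace, len);
    trace[(*len)++] = 'b';
}

/* Records the accesses of a call chain of the given depth (>= 1).
   trace needs room for 2 * depth + 1 characters. */
void run_f(int depth, char *trace) {
    size_t len = 0;
    f(depth, trace, &len);
    trace[len] = '\0';
}
```

A chain of depth 3 logs aaabbb: the counts of a and b always match, which the grammar captures exactly while the regular expression a*abb* also admits unbalanced strings such as aab.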
– Most pools are used in a type-consistent manner
– Structure access patterns
– Repeatedly used ones
– EMPTY, NORMAL, HAVE
main:  . . foo(); . .
foo:   . . bar1(); . .
bar1:  . . while(..) bar2(); . .
bar2:  . . s->f1; s->f2; . .

bar2 ↦ NORMAL
bar1 ↦ HAVE, bar2 ↦ NORMAL
foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL
main ↦ HAVE, foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL
exc. exc.
[Figure: field affinity example for the fields key, next, and data, with data placed apart from key and next. Values annotated on the figure: 712440, 2849975, 704860, 30278, 7580, 4267275, 37858.]
main:  . s->f3; foo(); . .
foo:   . s->f1; bar1(); s->f2; . .
bar1:  s->f3; while(..) bar2(); s->f1; . .
bar2:  . s->f1; s->f2; . .

bar2.s ↦ {f1}   bar2.e ↦ {f2}   bar2.r ↦ [(f1,f2) ↦ {(0,1)}]
bar1.s ↦ {f3}   bar1.e ↦ {f1}   bar1.r ↦ [(f1,f3) ↦ {(0,1)}, (f1,f2) ↦ {(1,1), (0,1)}]
foo.s ↦ {f1}    foo.e ↦ {f2}    foo.r ↦ [(f1,f3) ↦ {(0,2)}, (f1,f2) ↦ {(1,1), (0,2)}]
main.s ↦ {f3}   main.e ↦ {f2}   main.r ↦ [(f1,f3) ↦ {(0,3)}, (f1,f2) ↦ {(1,1), (0,2)}]

where F is the set of fields
where VAR is the set of function names
Program      SLOC      time      peak     total  |   time    peak    total
175.vpr     11301      N.A.      N.A.      N.A.  |  0.154   15.97   178.68
300.twolf   17821      N.A.      N.A.      N.A.  |  0.360   27.03   313.14
analyzer      763     0.022      1.47     16.59  |  0.007    1.23    11.97
chomp         378     0.003      0.74      5.01  |  0.003    0.74     4.96
misr          181     0.003      0.49      2.87  |  0.003    0.49     2.69
bisort        597     0.002      0.74      4.79  |  0.002    0.74     4.66
health        474     0.004      0.74      5.90  |  0.002    0.74     5.47
mst           408     0.003      0.74      5.92  |  0.002    0.74     5.51
perimeter     345     0.003      0.74      4.52  |  0.002    0.74     4.19
treeadd       154     0.003      0.49      1.52  |  0.000    0.49     1.51
tsp           433     0.004      0.74      5.31  |  0.002    0.74     4.94
voronoi       975     0.005      1.72     14.64  |  0.003    1.72    14.28
anagram       355     0.002      0.74      5.33  |  0.002    0.74     5.32
bc           4303   572.897    612.93   4379.97  |  0.059    9.34   114.09
ft            926     0.006      0.98      9.07  |  0.004    0.98     8.67
ks            551     0.008      0.98      9.37  |  0.004    0.98     7.92
[Figure: normalized instruction references for Pool and Pool + Re on 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, and ks. Values annotated on the chart: 0.97, 0.94.]