Static Transformation for Heap Layout Using Memory Access Patterns


SLIDE 1

Static Transformation for Heap Layout Using Memory Access Patterns

Jinseong Jeon

Computer Science, KAIST

SLIDE 2

2006-1

  • 12-1
  • 12

CS @ KAIST 2

Static Transformation

[Diagram: user, compiler + static transformation, computing machine]

  • Compilers can transform program memory layout.
    – program behaviors: memory access patterns
    – machine properties: memory hierarchy

SLIDE 3

Heap Layout Transformation

[ Pool Allocation ] - complex pointer analysis
[ Field Layout Reconstruction ] - profiling

    struct Node { int key; char data[6]; struct Node *next; } *T;

    char *search(int k) {
        ...
        while (...) {
            if (h->key == k) return h->data;
            h = h->next;
        }
        ...
    }

[Figure: heap layout diagram with fields k and n grouped together and field d placed separately]
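The figure's grouping of k and n apart from d can be illustrated with a minimal sketch. It assumes a fixed-size pool and two parallel arrays; the names NodeHot, NodeCold, POOL_CAP, and push_front are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_CAP 1024  /* hypothetical fixed pool capacity */

/* Fields touched on every iteration of search(): key and next. */
struct NodeHot { int key; struct NodeHot *next; };

/* Field touched only on a hit: data. Stored in a parallel array. */
struct NodeCold { char data[6]; };

static struct NodeHot  hot[POOL_CAP];
static struct NodeCold cold[POOL_CAP];
static size_t pool_used = 0;

/* Allocate the next pool slot and link it in front of head. */
struct NodeHot *push_front(struct NodeHot *head, int key, const char *data) {
    assert(pool_used < POOL_CAP);
    assert(strlen(data) < sizeof cold[0].data);
    size_t i = pool_used++;
    hot[i].key = key;
    hot[i].next = head;
    strcpy(cold[i].data, data);
    return &hot[i];
}

/* Same traversal as search() on the slide; the cold part is located
 * from the slot index, which costs one extra subtraction on a hit. */
char *search(struct NodeHot *h, int k) {
    while (h != NULL) {
        if (h->key == k)
            return cold[h - hot].data;
        h = h->next;
    }
    return NULL;
}
```

With this split, the traversal reads only the packed key/next pairs, so more list nodes fit in each cache line; the data array is touched once, when a key matches.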

SLIDE 4

Goal & Direction

  • Build a static transformation for heap layout
    – Based on both heap layout transformations
  • Predict program behaviors
    – How to represent memory access behaviors
      • Regular expressions
    – How to extract run-time behaviors from code
      • Code → CFG → Automaton → R.E.
      • Then, apply optimizing techniques
    – How to interpret predicted behaviors

SLIDE 5

Overview

[Diagram: pipeline from Source code to Optimized code, with components Access Pattern Analysis, Structure Selection Analysis, Field Affinity Analysis, and Layout Transformer]

SLIDE 6

Structure Selection

    S1.x = T1.c;
    for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }

structure type projection:

    S = T; for (...) { T = ; = T; U = ; }

conversion: TS, TTU → TS(TTU)*

candidate selection for pool allocation: T, U

SLIDE 7

Field Affinity Estimation

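    S1.x = T1.c;
    for (...) { Ti.a = ...; ... = Ti.b; Uj.y = ...; }

field usage projection (fields of T only):

    = .c; for (...) { .a = ; = .b; }

conversion: c, ab → c(ab)*

symbolic estimation: [Figure: affinity graph over fields a, b, and c, with a closure mark (*)]

field layout reconstruction: [Figure: layout with a and b grouped together and c placed separately]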

SLIDE 8

Field Affinity Estimation

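  • Symbolic approach
    – record closure marks with nesting information
    – regard all closure marks as the same variable

[Figure: affinity tables over fields n, k, and d. Entries first hold closure marks (*, **); treating every mark as one variable x yields symbolic weights such as 2x^2+3x, 3x, x, and x^2.]

Example patterns: (kdn(n)*)* and ((kn)*(kd+ε))*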

SLIDE 9

Code Transformation

  • Explicit field names → field accesses on modified layouts
    – Oi.next is converted into *(Oi + offset(next)).
    – Arbitrary pointer dereferences like *(p + 4) are not allowed.
    – For some accesses, extra instructions are required.
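The rewrite of Oi.next into *(Oi + offset(next)) can be sketched in plain C with offsetof. This is an illustration under assumptions, not the transformer's actual output: the helper names load_next and store_next and the constant OFF_NEXT are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct Node { int key; char data[6]; struct Node *next; };

/* Offset of `next` in the current layout. A layout transformer would
 * substitute the offset of the field in the reconstructed layout. */
static const size_t OFF_NEXT = offsetof(struct Node, next);

/* o->next written as *(o + offset(next)), with byte-wise pointer math. */
struct Node *load_next(const struct Node *o) {
    assert(o != NULL);
    return *(struct Node *const *)((const char *)o + OFF_NEXT);
}

/* o->next = next, written against the same offset. */
void store_next(struct Node *o, struct Node *next) {
    assert(o != NULL);
    *(struct Node **)((char *)o + OFF_NEXT) = next;
}
```

Because every access goes through a named field and its offset, changing the layout only changes OFF_NEXT. An access such as *(p + 4) carries no field name, which is presumably why the slide rules it out.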

SLIDE 10

Code Transformation

  • Type-aware malloc → pool allocation routines
    – For custom allocators, feed hints which consist of target structures and corresponding custom allocators

Direct malloc:

    before:  ... = malloc(sizeof(T));
    after:   ... = _T_alloc_();

Custom allocator:

    before:  char* my_malloc(int s) { ... ... = malloc(s); }
             ... = my_malloc(sizeof(T));
    after:   char* my_malloc(int s) { ... ... = _T_alloc_(); }
             ... = my_malloc(sizeof(T));

In both cases the pool routine is generated:

    char* _T_alloc_() { // pool allocation }
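The slide leaves the body of _T_alloc_() as "pool allocation". One possible shape is a chunked bump allocator, sketched below under assumptions: the chunk size T_CHUNK, the layout of T, and the size-based dispatch inside my_malloc are all hypothetical, and the sketch never frees chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the target structure T (layout assumed for the sketch). */
typedef struct T { int key; char data[6]; struct T *next; } T;

#define T_CHUNK 256                 /* hypothetical: objects per chunk */

static T *t_chunk = NULL;           /* chunk currently being filled */
static size_t t_used = T_CHUNK;     /* starts "full" to force a first chunk */

/* Pool routine for T: consecutive calls return adjacent slots, so
 * objects of the same type end up contiguous in memory. */
char *_T_alloc_(void) {
    if (t_used == T_CHUNK) {
        t_chunk = malloc(T_CHUNK * sizeof(T));
        if (t_chunk == NULL)
            return NULL;
        t_used = 0;
    }
    return (char *)&t_chunk[t_used++];
}

/* Custom allocator after transformation. Dispatching on the request
 * size stands in for the user-supplied hint (an assumption). */
char *my_malloc(int s) {
    assert(s >= 0);
    if ((size_t)s == sizeof(T))
        return _T_alloc_();
    return malloc((size_t)s);
}
```

Call sites keep the form ... = my_malloc(sizeof(T)); only the allocator body changes, which matches the before/after pairs on the slide.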

SLIDE 11

Overview

[Diagram: pipeline from Source code to Optimized code, with components Access Pattern Analysis, Structure Selection Analysis, Field Affinity Analysis, and Layout Transformer]

SLIDE 12

Experimental Environment

  • Using the CIL compiler and OCaml
  • Red Hat 9.0 Linux PC
    – 2.6GHz Pentium 4 processor
    – 8KB L1D cache, 512KB L2 cache, 1.7GB main memory
  • GCC 3.2.2 with -O3

SLIDE 13

Analysis Time

Benchmark      Program    Lines of Code  Structure Selection  Field Affinity  Code Transform  Total
SPECINT 2000   175.vpr    11301          7.220                0.324           0.107           7.651
SPECINT 2000   300.twolf  17821          15.598               3.455           1.126           20.179
FreeBench      analyzer   763            0.096                0.027           0.012           0.135
McGill         chomp      378            0.021                0.006           0.003           0.030
McGill         misr       181            0.003                0.002           0.001           0.006
Olden suite    bisort     597            0.020                0.003           0.002           0.025
Olden suite    health     474            0.024                0.004           0.002           0.030
Olden suite    mst        408            0.031                0.004           0.002           0.037
Olden suite    perimeter  345            0.012                0.012           0.001           0.025
Olden suite    treeadd    154            0.002                0.000           0.000           0.002
Olden suite    tsp        433            0.011                0.004           0.002           0.017
Olden suite    voronoi    975            0.048                0.004           0.003           0.055
Ptrdist suite  anagram    355            0.031                0.003           0.001           0.035
Ptrdist suite  bc         4303           2.028                0.634           0.193           2.855
Ptrdist suite  ft         926            0.050                0.014           0.010           0.074
Ptrdist suite  ks         551            0.055                0.012           0.020           0.087

SLIDE 14

Cache Miss - L1D

[Figure: bar chart of normalized L1D cache misses (1.0 = Original) for Pool and Pool + Re across 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, and ks. Values printed on the chart: 1.99, 2.23; Pool 0.86; Pool + Re 0.84.]

SLIDE 15

Cache Miss - L2

[Figure: bar chart of normalized L2 cache misses (1.0 = Original) for Pool and Pool + Re across 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, and ks. Values printed on the chart: 4.10, 4.18; Pool 1.06; Pool + Re 1.00.]

SLIDE 16

Performance

Benchmark      Program    Lines of Code  Original (seconds)  Pool / Original  Pool + Re / Original
SPECINT 2000   175.vpr    11301          10.959              1.01             1.01
SPECINT 2000   300.twolf  17821          435.19              0.98             0.99
FreeBench      analyzer   763            66.64               0.41             0.45
McGill         chomp      378            7.44                0.59             0.47
McGill         misr       181            31.39               0.99             1.01
Olden suite    bisort     597            24.29               0.99             0.99
Olden suite    health     474            86.05               0.71             0.63
Olden suite    mst        408            65.73               0.82             0.82
Olden suite    perimeter  345            7.19                0.78             0.84
Olden suite    treeadd    154            10.17               0.48             0.55
Olden suite    tsp        433            20.44               0.96             0.97
Olden suite    voronoi    975            11.03               0.99             0.99
Ptrdist suite  anagram    355            1.53                0.99             1.11
Ptrdist suite  bc         4303           1.95                0.82             0.81
Ptrdist suite  ft         926            8.25                0.83             0.73
Ptrdist suite  ks         551            7.46                1.03             1.03
Avg.                                                         0.84             0.84

SLIDE 17

Contribution

  • Predict memory access patterns at compile-time

    – Regular expressions
    – Automata reduction algorithm
  • Interpret predicted patterns according to heap layout transformations

  • Cache misses are reduced by 16%
  • Execution times are reduced by 14%

SLIDE 18

Backup Slides

SLIDE 19

From CFG to Automaton

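[Figure: CFG of search() with nodes start, h == NULL, h->key == k, h->data, h = h->next, NotFound, and return, connected by T/F branch edges. In the derived automaton, transitions are labeled with the accessed fields k, d, and n.]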

SLIDE 20

State Elimination

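[Figure: eliminating a state that has a self-loop labeled e. Each path through the state is replaced by a single edge: entering on a and leaving on c gives ae*c, entering on b and leaving on d gives be*d.]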

SLIDE 21

From Automaton to R.E.

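[Figure: successive state elimination on the automaton with transitions k, d, and n. Edges are merged into kd+ε and kn, which gives (kn)*(kd+ε).]

Result: (kn)*(kd+ε)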
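One way to sanity-check the derived expression is to instrument search() so that it logs each field access, then test the log against (kn)*(kd+ε). The sketch below does this with a hand-written matcher; the names record, search_traced, matches_pattern, and TRACE_MAX are hypothetical, and the list head is passed as a parameter for convenience.

```c
#include <assert.h>
#include <stddef.h>

struct Node { int key; char data[6]; struct Node *next; };

#define TRACE_MAX 256               /* hypothetical trace buffer size */
static char trace[TRACE_MAX];
static size_t trace_len = 0;

/* Append one field letter (k, d, or n) to the access trace. */
static void record(char field) {
    assert(trace_len + 1 < TRACE_MAX);
    trace[trace_len++] = field;
    trace[trace_len] = '\0';
}

/* search() from the slides, logging every field access it performs. */
char *search_traced(struct Node *h, int k) {
    trace_len = 0;
    trace[0] = '\0';
    while (h != NULL) {
        record('k');                        /* h->key == k */
        if (h->key == k) {
            record('d');                    /* return h->data */
            return h->data;
        }
        record('n');                        /* h = h->next */
        h = h->next;
    }
    return NULL;
}

/* Returns 1 if s belongs to (kn)*(kd+ε), 0 otherwise. */
int matches_pattern(const char *s) {
    while (s[0] == 'k' && s[1] == 'n')
        s += 2;
    if (s[0] == 'k' && s[1] == 'd')
        s += 2;
    return s[0] == '\0';
}
```

A successful lookup produces some number of kn pairs followed by kd; a failed lookup produces only kn pairs, which is the ε alternative.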

SLIDE 22

State Compare

    state_compare(state s1, state s2)
        b1 ← whether ∃s'.(s' → s1, s1.dfn ≤ s'.dfn)    // 0 or 1
        b2 ← whether ∃s'.(s' → s2, s2.dfn ≤ s'.dfn)    // 0 or 1
        if b1 and not b2 then
            1                                  // s1 > s2
        else if not b1 and b2 then
            -1                                 // s1 < s2
        else if b1 and b2 then
            compare(s2.dfn, s1.dfn)            // dfn = Depth First Numbering
        else
            compare(s1.dfn, s2.dfn)
        end if
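The comparison can be written as a qsort comparator. The sketch below assumes that the flag b (whether some predecessor s' has s.dfn ≤ s'.dfn) has been computed beforehand and stored in a hypothetical field has_back_edge_in, and that compare returns -1, 0, or 1.

```c
#include <assert.h>
#include <stdlib.h>

struct State {
    int dfn;               /* depth-first number */
    int has_back_edge_in;  /* 1 if some predecessor s' has dfn <= s'.dfn */
};

/* Three-way integer comparison: -1, 0, or 1. */
static int compare_int(int a, int b) {
    return (a > b) - (a < b);
}

/* state_compare from the slide, in the signature qsort expects. */
int state_compare(const void *p1, const void *p2) {
    const struct State *s1 = p1;
    const struct State *s2 = p2;
    assert(s1 != NULL && s2 != NULL);
    int b1 = s1->has_back_edge_in;
    int b2 = s2->has_back_edge_in;
    if (b1 && !b2)
        return 1;                                  /* s1 > s2 */
    if (!b1 && b2)
        return -1;                                 /* s1 < s2 */
    if (b1 && b2)
        return compare_int(s2->dfn, s1->dfn);      /* descending dfn */
    return compare_int(s1->dfn, s2->dfn);          /* ascending dfn */
}
```

Sorting with this comparator places states without an incoming back edge first, in increasing dfn, followed by states with an incoming back edge in decreasing dfn.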

SLIDE 23

Automata Reduction

    worklist ← ∅

    workhorse(state s)
        if s ≠ start state and s ≠ end state then
            for all s' ∈ s.successor do
                delete s' from worklist
            end for
            eliminate(s)
            for all s' ∈ s.successor do
                push s' into worklist
            end for
        end if

SLIDE 24

Automata Reduction

    reduce()
        E ← {s ∈ S | ∃s'. s →ε s'}
        R ← {s ∈ E | ∄s'. s' → s, s.dfn ≤ s'.dfn}
        for all s ∈ R do
            workhorse(s)
        end for
        worklist ← S \ R
        while worklist ≠ ∅ do
            workhorse(pop(worklist))
        end while

SLIDE 25

From Intra- to Inter-proc.

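  • Intrinsically, reverse topological order of a call graph
  • For self-recursive function calls,

    f() { ... = s.a; if (!end) f(); ... = s.b; }

    Exact access pattern (not regular): F → ab | aFb, that is, a^i b^i
    Regular expression used instead: a*abb*

[Figure: automaton for f() with transitions labeled a and b]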
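The relation between a^i b^i and a*abb* can be checked on a concrete run. The sketch below records the accesses made by a recursive f() and tests them against a hand-written matcher for a*abb*. A depth parameter replaces the unspecified `end` test, and the names record, matches_approx, and TRACE_MAX are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_MAX 128               /* hypothetical trace buffer size */
static char trace[TRACE_MAX];
static size_t trace_len = 0;

/* Append one field letter (a or b) to the access trace. */
static void record(char field) {
    assert(trace_len + 1 < TRACE_MAX);
    trace[trace_len++] = field;
    trace[trace_len] = '\0';
}

/* f() from the slide; recursion stops when depth reaches 1
 * (an assumption standing in for the `end` flag). */
void f(int depth) {
    record('a');                    /* ... = s.a; */
    if (depth > 1)
        f(depth - 1);
    record('b');                    /* ... = s.b; */
}

/* Returns 1 if s belongs to a*abb*: at least one a, then at least one b. */
int matches_approx(const char *s) {
    size_t na = 0, nb = 0;
    while (*s == 'a') { s++; na++; }
    while (*s == 'b') { s++; nb++; }
    return *s == '\0' && na >= 1 && nb >= 1;
}
```

Every real trace of f() has equal counts of a and b and is accepted. The regular expression also accepts strings such as aab that f() can never produce, so it over-approximates the exact pattern.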

SLIDE 26

Structure Selection

  • “One structure per pool”

– Most pools are used in a type-consistent manner

  • Identify which structures are exhaustively used

– Structure access patterns

– Repeatedly used ones

  • Structure detection in closures

SLIDE 27

Closure Detection

  • Presence of closures

– EMPTY, NORMAL, HAVE

    main:  . . foo(); . .
    foo:   . . bar1(); . .
    bar1:  . . while(..) bar2(); . .
    bar2:  . . s->f1; s->f2; . .

Results accumulated from callee to caller:

    bar2 ↦ NORMAL
    bar1 ↦ HAVE, bar2 ↦ NORMAL
    foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL
    main ↦ HAVE, foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL

exc. exc.

SLIDE 28

Field Affinity

[Figure: field affinity graph over key, next, and data, together with a merged node (key,next). Weights shown: 712440, 2849975, 704860, 30278, 7580, 4267275, 37858. Beside it, an excerpt of an access trace: 4.key, ..., 4.next, 5.key, 5.next, 3.next, ..., 6.key.]

SLIDE 29

Affinity Relation Abstraction

    main:  ... s->f3; foo(); ...
    foo:   ... s->f1; bar1(); s->f2; ...
    bar1:  ... s->f3; while(..) bar2(); s->f1; ...
    bar2:  ... s->f1; s->f2; ...

    bar2.s ↦ {f1}   bar2.e ↦ {f2}
    bar2.r ↦ [(f1,f2) ↦ {(0,1)}]

    bar1.s ↦ {f3}   bar1.e ↦ {f1}
    bar1.r ↦ [(f1,f3) ↦ {(0,1)}, (f1,f2) ↦ {(1,1), (0,1)}]

    foo.s ↦ {f1}    foo.e ↦ {f2}
    foo.r ↦ [(f1,f3) ↦ {(0,2)}, (f1,f2) ↦ {(1,1), (0,2)}]

    main.s ↦ {f3}   main.e ↦ {f2}
    main.r ↦ [(f1,f3) ↦ {(0,3)}, (f1,f2) ↦ {(1,1), (0,2)}]

where F is the set of fields and VAR is the set of function names

SLIDE 30

Offset Calculation (1/2)

SLIDE 31

Offset Calculation (2/2)

SLIDE 32

Traditional vs. WTO based

                     Traditional                    WTO based
Program    SLOC      time      peak     total       time    peak    total
175.vpr    11301     N.A.      N.A.     N.A.        0.154   15.97   178.68
300.twolf  17821     N.A.      N.A.     N.A.        0.360   27.03   313.14
analyzer   763       0.022     1.47     16.59       0.007   1.23    11.97
chomp      378       0.003     0.74     5.01        0.003   0.74    4.96
misr       181       0.003     0.49     2.87        0.003   0.49    2.69
bisort     597       0.002     0.74     4.79        0.002   0.74    4.66
health     474       0.004     0.74     5.90        0.002   0.74    5.47
mst        408       0.003     0.74     5.92        0.002   0.74    5.51
perimeter  345       0.003     0.74     4.52        0.002   0.74    4.19
treeadd    154       0.003     0.49     1.52        0.000   0.49    1.51
tsp        433       0.004     0.74     5.31        0.002   0.74    4.94
voronoi    975       0.005     1.72     14.64       0.003   1.72    14.28
anagram    355       0.002     0.74     5.33        0.002   0.74    5.32
bc         4303      572.897   612.93   4379.97     0.059   9.34    114.09
ft         926       0.006     0.98     9.07        0.004   0.98    8.67
ks         551       0.008     0.98     9.37        0.004   0.98    7.92

SLIDE 33

Instruction Reference

[Figure: bar chart of normalized instruction references for Pool and Pool + Re across 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, and ks. Values printed on the chart, in extraction order: Pool + Re, 0.97, 0.94, Pool.]