Exhaustive Optimization Phase Order Space Exploration


Florida State University. Exhaustive Optimization Phase Order Space Exploration. Prasad A. Kulkarni, David B. Whalley, Gary S. Tyson, Jack W. Davidson. Symposium on Code Generation and Optimization - 2006.


SLIDE 1

Florida State University

Symposium on Code Generation and Optimization - 2006

Exhaustive Optimization Phase Order Space Exploration

Prasad A. Kulkarni, David B. Whalley, Gary S. Tyson, Jack W. Davidson

SLIDE 2

Optimization Phase Ordering

  • Optimizing compilers apply several optimization phases to improve the performance of applications.
  • Optimization phases interact with each other.
  • Determining the best order of applying optimization phases has been a long-standing problem in compilers.

SLIDE 3

Exhaustive Phase Order Enumeration... is it Feasible ?

  • An obvious approach to the phase ordering problem is to exhaustively evaluate all combinations of optimization phases.
  • Exhaustive enumeration is difficult
      • compilers typically contain many different optimization phases
      • optimizations may be successful multiple times for each function / program

SLIDE 4

Optimization Space Properties

  • The phase ordering problem can be made more manageable by exploiting certain properties of the optimization search space
      • optimization phases might not apply any transformations
      • many optimization phases are independent
  • Thus, many different orderings of optimization phases produce the same code.
SLIDE 5

Re-stating the Phase Ordering Problem

  • Rather than considering all attempted phase sequences, the phase ordering problem can be addressed by enumerating all distinct function instances that can be produced by any combination of optimization phases.
  • We were able to exhaustively enumerate all function instances for 109 out of 111 functions, in a few minutes for most.

SLIDE 6

Outline

  • Experimental framework
  • Algorithm for exhaustive enumeration of the phase order space

  • Search space enumeration results
  • Optimization phase interaction analysis
  • Making conventional compilation faster
  • Future work and conclusions
SLIDE 7

Experimental Framework

  • We used the VPO compilation system
      • established compiler framework, started development in 1988
      • comparable performance to gcc -O2
  • VPO performs all transformations on a single representation (RTLs), so it is possible to perform most phases in an arbitrary order.
  • Experiments use all 15 available optimization phases in VPO.
  • Target architecture was the StrongARM SA-100 processor.

SLIDE 8

VPO Optimization Phases

ID   Optimization Phase        ID   Optimization Phase
b    branch chaining           l    loop transformations
c    common subexpr. elim.     n    code abstraction
d    remv. unreachable code    o    eval. order determin.
g    loop unrolling            q    strength reduction
h    dead assignment elim.     r    reverse branches
i    block reordering          s    instruction selection
j    minimize loop jumps       u    remv. useless jumps
k    register allocation

SLIDE 9

Disclaimers

  • Did not include optimization phases normally associated with compiler front ends
      • no memory hierarchy optimizations
      • no inlining or other interprocedural optimizations
  • Did not vary how phases are applied.
  • Did not include optimizations that require profile data.

SLIDE 10

Benchmarks

  • Used one program from each of the six MiBench categories.
  • Total of 111 functions.

Category    Program        Description
auto        bitcount       test processor bit manipulation abilities
network     dijkstra       Dijkstra’s shortest path algorithm
telecomm    fft            fast fourier transform
consumer    jpeg           image compression / decompression
security    sha            secure hash algorithm
office      stringsearch   searches for given words in phrases

SLIDE 11

Outline

  • Experimental framework
  • Exhaustive enumeration of the phase order space

  • Search space enumeration results
  • Optimization phase interaction analysis
  • Making conventional compilation faster
  • Future work and conclusions
SLIDE 12

Naïve Optimization Phase Order Space Exploration

  • All combinations of optimization phase sequences are attempted.

[Figure: phase order search tree with levels L0, L1, L2; every node has one child for each of the phases a, b, c, d.]

SLIDE 13

Eliminating Consecutively Applied Phases

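  • A phase just applied in our compiler cannot be immediately active again.

[Figure: the search tree (levels L0, L1, L2) with the child that repeats its parent's phase removed, leaving three children under each L1 node.]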
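
As a rough illustration of how much this first pruning step saves, the sketch below counts phase sequences of a given length with and without the rule that a phase cannot immediately follow itself. The function names are hypothetical; the counts are simple combinatorics, not figures from the talk.

```python
def naive_count(num_phases: int, depth: int) -> int:
    """Sequences of length `depth` when any phase may follow any phase."""
    return num_phases ** depth


def no_repeat_count(num_phases: int, depth: int) -> int:
    """Sequences of length `depth` in which no phase immediately follows itself."""
    if depth == 0:
        return 1
    # Any phase can come first; each later position excludes the previous phase.
    return num_phases * (num_phases - 1) ** (depth - 1)


if __name__ == "__main__":
    # Four phases, as in the example trees; 15 phases, as in VPO.
    for phases, depth in ((4, 2), (15, 2), (15, 12)):
        print(phases, depth, naive_count(phases, depth), no_repeat_count(phases, depth))
```

For the four-phase example tree this gives 16 sequences at level L2 without pruning and 12 with it. The saving per level is small, which is why the later pruning steps matter more.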

SLIDE 14

Eliminating Dormant Phases

  • Get feedback from the compiler indicating if any transformations were successfully applied in a phase.

[Figure: the search tree (levels L0, L1, L2) after dormant phases are also pruned.]
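
The two pruning rules so far (skip the phase just applied, and do not expand below a dormant phase) can be sketched as a level-by-level enumeration. This is a minimal toy, not VPO: the "function instance" is a set of pending optimization opportunities, and the phases `phase_a`, `phase_b`, `phase_c` are invented for illustration. In this toy, `b` creates new work for `a`.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Instance = FrozenSet[str]  # toy stand-in for a function's RTLs


def phase_a(code: Instance) -> Instance:
    return code - {"x"}


def phase_b(code: Instance) -> Instance:
    if "y" not in code:
        return code
    return (code - {"y"}) | {"x"}  # removing "y" exposes a new "x" for phase a


def phase_c(code: Instance) -> Instance:
    return code - {"z"}


PHASES: Dict[str, Callable[[Instance], Instance]] = {
    "a": phase_a,
    "b": phase_b,
    "c": phase_c,
}


def enumerate_active_sequences(start: Instance) -> Tuple[List[str], int]:
    """Return all active phase sequences and the number of phases attempted."""
    sequences: List[str] = []
    attempted = 0
    frontier: List[Tuple[str, Instance]] = [("", start)]
    while frontier:
        next_frontier: List[Tuple[str, Instance]] = []
        for seq, code in frontier:
            for name, phase in PHASES.items():
                if seq and seq[-1] == name:
                    continue  # rule 1: the phase just applied cannot be active again
                attempted += 1
                new_code = phase(code)
                if new_code == code:
                    continue  # rule 2: dormant phase, nothing below it to explore
                sequences.append(seq + name)
                next_frontier.append((seq + name, new_code))
        frontier = next_frontier
    return sequences, attempted


if __name__ == "__main__":
    seqs, attempted = enumerate_active_sequences(frozenset({"x", "y", "z"}))
    print(len(seqs), "active sequences from", attempted, "attempted phases")
```

Comparing instances for equality stands in for the compiler feedback mentioned on the slide. The sketch still explores a tree, so identical instances reached by different sequences are expanded more than once.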

SLIDE 15

Detecting Identical Function Instances

  • Some optimization phases are independent
      • example: branch chaining & register allocation
  • Different phase sequences can produce the same code

    r[2] = 1;
    r[3] = r[4] + r[2];
        ⇒ instruction selection
    r[3] = r[4] + 1;

    r[2] = 1;
    r[3] = r[4] + r[2];
        ⇒ constant propagation
    r[2] = 1;
    r[3] = r[4] + 1;
        ⇒ dead assignment elimination
    r[3] = r[4] + 1;

SLIDE 16

Detecting Equivalent Function Instances

Source Code

    sum = 0;
    for (i = 0; i < 1000; i++)
        sum += a[i];

Register Allocation before Code Motion

    r[10]=0;
    r[12]=HI[a];
    r[12]=r[12]+LO[a];
    r[1]=r[12];
    r[9]=4000+r[12];
  L3
    r[8]=M[r[1]];
    r[10]=r[10]+r[8];
    r[1]=r[1]+4;
    IC=r[1]?r[9];
    PC=IC<0,L3;

Code Motion before Register Allocation

    r[11]=0;
    r[10]=HI[a];
    r[10]=r[10]+LO[a];
    r[1]=r[10];
    r[9]=4000+r[10];
  L5
    r[8]=M[r[1]];
    r[11]=r[11]+r[8];
    r[1]=r[1]+4;
    IC=r[1]?r[9];
    PC=IC<0,L5;

After Mapping Registers

    r[32]=0;
    r[33]=HI[a];
    r[33]=r[33]+LO[a];
    r[34]=r[33];
    r[35]=4000+r[33];
  L01
    r[36]=M[r[34]];
    r[32]=r[32]+r[36];
    r[34]=r[34]+4;
    IC=r[34]?r[35];
    PC=IC<0,L01;
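
One way to read the mapping step: renumber registers and labels in order of first appearance, so that two instances differing only in register or label names become textually identical. The sketch below is a simplification under that assumption. The function `remap` and the starting number 32 are taken from the example on this slide, not from VPO's source, and a real implementation would also have to respect hardware register constraints.

```python
import re

ALLOC_FIRST = (
    "r[10]=0; r[12]=HI[a]; r[12]=r[12]+LO[a]; r[1]=r[12]; r[9]=4000+r[12]; "
    "L3 r[8]=M[r[1]]; r[10]=r[10]+r[8]; r[1]=r[1]+4; IC=r[1]?r[9]; PC=IC<0,L3;"
)
MOTION_FIRST = (
    "r[11]=0; r[10]=HI[a]; r[10]=r[10]+LO[a]; r[1]=r[10]; r[9]=4000+r[10]; "
    "L5 r[8]=M[r[1]]; r[11]=r[11]+r[8]; r[1]=r[1]+4; IC=r[1]?r[9]; PC=IC<0,L5;"
)


def remap(rtls: str, first_reg: int = 32) -> str:
    """Rename registers and labels by order of first appearance."""
    reg_map = {}
    label_map = {}

    def map_reg(match):
        old = match.group(1)
        if old not in reg_map:
            reg_map[old] = first_reg + len(reg_map)
        return f"r[{reg_map[old]}]"

    def map_label(match):
        old = match.group(0)
        if old not in label_map:
            label_map[old] = f"L{len(label_map) + 1:02d}"
        return label_map[old]

    renamed = re.sub(r"r\[(\d+)\]", map_reg, rtls)
    # Labels are "L" followed by digits; "LO[a]" is not matched.
    return re.sub(r"\bL\d+\b", map_label, renamed)


if __name__ == "__main__":
    print(remap(ALLOC_FIRST) == remap(MOTION_FIRST))
```

Both listings from the slide map to the same text, so a later comparison or checksum treats them as one function instance.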

SLIDE 17

Resulting Search Space

  • Merging equivalent function instances transforms the tree to a DAG.

[Figure: the resulting DAG over levels L0, L1, L2, with edges labeled by phases a, b, c, d.]

SLIDE 18

Efficient Detection of Unique Function Instances

  • Even after pruning there may be tens or hundreds of thousands of unique instances.
  • Used a CRC (cyclic redundancy check) checksum on the bytes of the RTLs representing the instructions.
  • Used a hash table to check if an equivalent function instance already exists in the DAG.
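
A minimal sketch of the lookup, assuming function instances are available as text: the CRC of the RTL bytes is the hash key, and the table reports whether an instance was already seen. The class `InstanceTable` is hypothetical. Keeping the full text in each bucket to rule out checksum collisions is an assumption of this sketch, not something the slides state.

```python
import zlib
from typing import Dict, List


class InstanceTable:
    """Hash table of function instances keyed by the CRC-32 of their RTL text."""

    def __init__(self) -> None:
        self._buckets: Dict[int, List[str]] = {}

    def add(self, rtls: str) -> bool:
        """Store `rtls`; return True if it is new, False if already present."""
        key = zlib.crc32(rtls.encode("utf-8"))
        bucket = self._buckets.setdefault(key, [])
        if rtls in bucket:  # full comparison guards against CRC collisions
            return False
        bucket.append(rtls)
        return True

    def __len__(self) -> int:
        return sum(len(bucket) for bucket in self._buckets.values())


if __name__ == "__main__":
    table = InstanceTable()
    print(table.add("r[3]=r[4]+1;"))  # first time this instance is seen
    print(table.add("r[3]=r[4]+1;"))  # same instance reached by another sequence
    print(len(table))
```

During enumeration, a sequence whose result is already in the table adds an edge to the existing DAG node instead of creating a new subtree.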

SLIDE 19

Techniques to Make Searches Faster

  • Kept a copy of the program representation of the unoptimized function instance in memory to avoid repeated disk accesses.
  • Also kept the program representation after each active phase in memory to reduce the number of phases applied for each sequence.
  • Reduced search time by at least a factor of 5 to 10.
  • Out of 111 functions in our benchmark suite we were able to completely enumerate all instances for 109 functions.

SLIDE 20

Outline

  • Experimental framework
  • Exhaustive enumeration of the phase order space

  • Search space enumeration results
  • Optimization phase interaction analysis
  • Making conventional compilation faster
  • Future work and conclusions
SLIDE 21

Search Space Statistics

Function          Insts   Blk   Loop   Instances   Phases      Len   CF     Leaves
start_inp...(j)   1,371   88    2      74,950      1,153,279   20    153    587
parse_swi...(j)   1,228   198   1      200,397     2,990,221   18    53     2365
start_inp...(j)   1,009   72    1      39,152      597,147     16    18     324
start_inp...(j)   971     82    1      64,571      999,814     18    47     591
start_inp...(j)   795     63    1      7,018       106,793     15    37     52
fft_float(f)      680     45    4      N/A         N/A         N/A   N/A    N/A
main(f)           624     50    5      N/A         N/A         N/A   N/A    N/A
sha_trans...(h)   541     33    6      343,162     5,119,947   26    95     2964
read_scan...(j)   480     59    2      34,270      511,093     15    57     540
LZWRea...(j)      472     44    2      49,412      772,864     20    41     159
main(j)           465     40    1      33,620      515,749     17    12     153
dijkstra(d)       354     30    3      86,370      1,361,960   20    18     1168
....              ....    ....  ....   ....        ....        ....  ....   ....
average           166.7   16.9  0.9    25,362.6    381,857.7   12    27.5   182.9

SLIDE 22

Outline

  • Experimental framework
  • Exhaustive enumeration of the phase order space

  • Search space enumeration results
  • Optimization phase interaction analysis
  • Making conventional compilation faster
  • Future work and conclusions
SLIDE 23

Weighted Function Instance DAG

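  • Each node is weighted by the number of paths to a leaf node.

[Figure: weighted function instance DAG with edges labeled by phases a, b, c, d; node weights 5, 1, 1, 1, 2, 2, 1, 1, 1 and node annotations [abc], [bc], [c], [ab], [d], [a].]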
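
The weighting can be computed bottom-up: a leaf has weight 1 and an interior node's weight is the sum of its children's weights. The sketch below uses a small hypothetical DAG (not the one drawn on the slide) stored as a mapping from node to its phase-labeled children.

```python
from functools import lru_cache
from typing import Dict

# Hypothetical DAG: node -> {active phase -> child node}.
DAG: Dict[str, Dict[str, str]] = {
    "root": {"a": "n1", "b": "n2"},
    "n1": {"b": "n3", "c": "n4"},
    "n2": {"a": "n3"},
    "n3": {"c": "n5"},
    "n4": {},
    "n5": {},
}


@lru_cache(maxsize=None)
def weight(node: str) -> int:
    """Number of distinct paths from `node` to any leaf node."""
    children = DAG[node]
    if not children:
        return 1  # a leaf counts as one path
    return sum(weight(child) for child in children.values())


if __name__ == "__main__":
    for name in DAG:
        print(name, weight(name))
```

Memoizing with `lru_cache` matters because a shared node such as `n3` is reached along several paths; without it the cost would grow with the number of paths rather than the number of nodes.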

SLIDE 24

Enabling Interaction Between Phases

  • b enables a along the path a-b-a.

[Figure: the weighted function instance DAG from the previous slide, with the path a-b-a highlighted.]

SLIDE 25

Enabling Probabilities

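[Table: enabling probabilities. Rows are phases (Ph); columns are St (start) followed by phases b, c, d, g, h, i, j, k, l, n, o, q, r, s, u. Empty cells are not preserved in this text, so each row lists only its non-empty values, left to right.]

b: 0.62 0.01 0.15 0.06
c: 1.00 0.02 0.23 0.14 0.12 0.99 0.72 0.38 0.33 1.00 0.05 0.32
d: (no entries)
g: 0.01 0.18 0.01
h: 0.06 0.7 0.02 0.01 0.03 0.46
i: 0.61 0.01 0.01 0.61
j: 0.03 0.01 0.13
k: 0.01 0.11 0.81
l: 0.59 0.06 0.02 0.01 0.03 0.06
n: 0.42 0.04 0.22 0.01 0.04 0.01 0.01 0.03 0.05 0.03
o: 0.87 0.01
q: 0.16 0.08
r: 0.45 0.02 0.15 0.05 0.01
s: 1.00 0.29 0.16 0.23 0.97 0.53 0.2 1.00
u: 0.73 0.03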

SLIDE 26

Disabling Interaction Between Phases

  • b disables a along the path b-c-d.

[Figure: the weighted function instance DAG, with the path b-c-d highlighted.]
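
To make the enabling and disabling interactions concrete, the sketch below counts them over the edges of a DAG. It assumes each node records which phases are active there. A phase j enables phase i along an edge if i is dormant before j and active after it, and disables i if the reverse holds. The data and the unweighted ratio are illustrative assumptions; the talk's exact weighting is not given on the slides.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical data: active phases at each node, and edges (parent, phase, child).
ACTIVE: Dict[str, Set[str]] = {
    "n0": {"a", "b"}, "n1": {"b"}, "n2": {"a"}, "n3": {"a"}, "n4": set(),
}
EDGES: List[Tuple[str, str, str]] = [
    ("n0", "a", "n1"), ("n0", "b", "n2"), ("n1", "b", "n3"),
    ("n2", "a", "n4"), ("n3", "a", "n4"),
]


def interaction_probabilities(active, edges):
    """Return (enable, disable), each keyed by (i, j): phase j acting on phase i."""
    phases = set().union(*active.values())
    enabled, could_enable = defaultdict(int), defaultdict(int)
    disabled, could_disable = defaultdict(int), defaultdict(int)
    for parent, j, child in edges:
        for i in phases:
            if i in active[parent]:
                could_disable[(i, j)] += 1
                if i not in active[child]:
                    disabled[(i, j)] += 1
            else:
                could_enable[(i, j)] += 1
                if i in active[child]:
                    enabled[(i, j)] += 1
    enable = {k: enabled[k] / n for k, n in could_enable.items()}
    disable = {k: disabled[k] / n for k, n in could_disable.items()}
    return enable, disable


if __name__ == "__main__":
    enable, disable = interaction_probabilities(ACTIVE, EDGES)
    print(enable[("a", "b")], disable[("a", "a")])
```

In this toy data, `b` enables `a` along the path a-b-a (nodes n0, n1, n3), and every phase disables itself, matching the rule that a phase just applied cannot be immediately active again.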

SLIDE 27

Disabling Probabilities

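[Table: disabling probabilities. Rows are phases (Ph); columns are phases b, c, d, g, h, i, j, k, l, n, o, q, r, s, u. Empty cells are not preserved in this text, so each row lists only its non-empty values, left to right.]

b: 1.00 0.15 0.02 0.08 0.05 0.31
c: 1.00 0.02 0.15
d: 1.00
g: 0.35 1.00 0.19 0.02 0.03
h: 0.01 1.00
i: 0.08 0.06 1.00 0.14 0.14 0.55
j: 0.13 1.00 0.49 0.14
k: 0.04 0.01 1.00
l: 0.71 0.07 0.30 1.00 0.73
n: 0.33 0.49 0.09 0.25 0.07 0.31 0.53 1.00 0.02 0.33
o: 1.00 1.00 1.00 1.00 1.00 1.00 0.21
q: 1.00 0.12
r: 0.05 0.01 0.03 0.53 1.00
s: 0.11 1.00
u: 0.08 1.00 0.20 1.00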


SLIDE 29

Outline

  • Experimental framework
  • Exhaustive enumeration of the phase order space

  • Search space enumeration results
  • Optimization phase interaction analysis
  • Making conventional compilation faster
  • Future work and conclusions
SLIDE 30

Faster Conventional Compiler

  • We modified the VPO compiler to use enabling and disabling probabilities to decrease compilation time.

    # p[i]    - current probability of phase i being active
    # e[i][j] - probability of phase j enabling phase i
    # d[i][j] - probability of phase j disabling phase i
    For each phase i do
        p[i] = e[i][st];
    While (any p[i] > 0) do
        Select j as the current phase with highest probability of being active
        Apply phase j
        If phase j was active then
            For each phase i, where i != j do
                p[i] += ((1-p[i]) * e[i][j]) - (p[i] * d[i][j])
        p[j] = 0
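
The pseudocode above translates almost line for line into a runnable sketch. Everything concrete here is hypothetical: the toy phases operate on a set of pending opportunities rather than RTLs, and the `E` and `D` tables are made-up values, not the measured probabilities.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Instance = FrozenSet[str]


def probabilistic_compile(
    code: Instance,
    phases: Dict[str, Callable[[Instance], Instance]],
    e: Dict[str, Dict[str, float]],
    d: Dict[str, Dict[str, float]],
    start: str = "st",
) -> Tuple[Instance, List[str]]:
    """Apply phases in order of their estimated probability of being active."""
    p = {i: e[i].get(start, 0.0) for i in phases}
    applied: List[str] = []
    while any(prob > 0 for prob in p.values()):
        j = max(p, key=p.get)  # phase most likely to be active
        new_code = phases[j](code)
        if new_code != code:  # phase j was active
            applied.append(j)
            code = new_code
            for i in phases:
                if i != j:
                    p[i] += (1 - p[i]) * e[i].get(j, 0.0) - p[i] * d[i].get(j, 0.0)
        p[j] = 0.0
    return code, applied


# Toy phases: b removes "y" but exposes a new "x" for a to clean up.
TOY_PHASES = {
    "a": lambda code: code - {"x"},
    "b": lambda code: (code - {"y"}) | {"x"} if "y" in code else code,
    "c": lambda code: code - {"z"},
}
# Made-up probabilities: all phases likely active at the start, b always enables a.
E = {"a": {"st": 1.0, "b": 1.0}, "b": {"st": 1.0}, "c": {"st": 1.0}}
D = {"a": {}, "b": {}, "c": {}}

if __name__ == "__main__":
    final, order = probabilistic_compile(frozenset({"x", "y", "z"}), TOY_PHASES, E, D)
    print(order, sorted(final))
```

With these tables the loop applies a, b, a, c and stops without attempting any dormant phase. A fixed schedule that retries every phase until nothing changes would attempt more.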

SLIDE 31

Probabilistic Compilation Results

                  Old Compilation      Prob. Compilation    Prob. / Old
Function          Attempted   Active   Attempted   Active   Time    Size    Speed
start_inp...(j)   233         16       55          14       0.469   1.014   N/A
parse_swi...(j)   233         14       53          12       0.371   1.016   0.972
start_inp...(j)   270         15       55          14       0.353   1.010   N/A
start_inp...(j)   233         14       49          13       0.420   1.003   N/A
start_inp...(j)   231         11       53          12       0.436   1.004   1.000
fft_float(f)      463         28       99          25       0.451   1.012   0.974
main(f)           284         20       73          18       0.550   1.007   1.000
sha_trans...(h)   284         17       67          16       0.605   0.965   0.953
read_scan...(j)   233         13       43          10       0.342   1.018   N/A
LZWReadByte(j)    268         12       45          11       0.325   1.014   N/A
main(j)           270         12       57          14       0.375   1.007   1.000
dijkstra(d)       231         9        43          9        0.409   1.010   1.000
....              ....        ....     ....        ....     ....    ....    ....
average           230.3       8.9      47.7        9.6      0.297   1.015   1.005

SLIDE 32

Outline

  • Experimental framework
  • Exhaustive enumeration of the phase order space

  • Search space enumeration results
  • Optimization phase interaction analysis
  • Making conventional compilation faster
  • Future work and conclusions
SLIDE 33

Future Work

  • Study methods to find more equivalent performing function instances to further reduce the optimization phase order space.
  • Evaluate approaches to find the dynamically optimal function instance.
  • Improve non-exhaustive searches of the phase order space.
  • Study additional methods to improve conventional compilers.

SLIDE 34

Conclusions

  • First work to show that the optimization phase order space can often be completely enumerated (at least for the phases in our compiler).
  • First analysis of the entire phase order space to capture various phase probabilities.
  • Used phase interaction information to achieve a much faster compiler that still generates comparable code.

SLIDE 35

Optimization Phase Independence

  • a-c and c-a are independent.

[Figure: the weighted function instance DAG, with the paths a-c and c-a highlighted.]

SLIDE 36

Optimization Phase Independence

  • b-c and c-b are not independent.

[Figure: the weighted function instance DAG, with the paths b-c and c-b highlighted.]
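
One plausible reading of independence on the DAG: two phases that are both active at a node are independent there if applying them in either order reaches the same function instance. The sketch below checks that on a hypothetical DAG built so that a and c commute while b and c do not; the node names and the helper `independent` are illustrative only.

```python
from typing import Dict

# Hypothetical DAG: node -> {active phase -> child node}.
DAG: Dict[str, Dict[str, str]] = {
    "n0": {"a": "n1", "b": "n2", "c": "n3"},
    "n1": {"b": "n5", "c": "n4"},
    "n2": {"c": "n7"},
    "n3": {"a": "n4", "b": "n6"},
    "n4": {}, "n5": {}, "n6": {}, "n7": {},
}


def independent(dag: Dict[str, Dict[str, str]], node: str, x: str, y: str) -> bool:
    """True if x and y are both active at `node` and x-y and y-x reach the same node."""
    children = dag[node]
    if x not in children or y not in children:
        return False
    after_xy = dag[children[x]].get(y)
    after_yx = dag[children[y]].get(x)
    return after_xy is not None and after_xy == after_yx


if __name__ == "__main__":
    print(independent(DAG, "n0", "a", "c"))  # a-c and c-a merge at n4
    print(independent(DAG, "n0", "b", "c"))  # b-c and c-b end at different nodes
```

Because equivalent instances are already merged into single DAG nodes, the check reduces to comparing two node identifiers.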

SLIDE 37

Independence Probabilities

[Table: independence probabilities. Rows are phases (Ph); columns are phases b, c, d, g, h, i, j, k, l, n, o, q, r, s, u. Empty cells are not preserved in this text, so each row lists only its non-empty values, left to right.]

b: 0.84 0.94 0.97 0.95 0.82 0.96 0.95
c: 0.96 0.91 0.45 0.44 0.65 0.12 0.98 0.99 0.22
d: (no entries)
g: 0.84 0.96 0.98 0.84 0.96 0.98 0.96
h: 0.91 0.98 0.79 0.95 0.88 0.59 0.98 0.96
i: 0.94 0.84 0.98 0.97 0.96 0.71 0.5
j: 0.98 0.97 0.98
k: 0.97 0.45 0.79 0.97 0.87 0.81 0.30 0.99 0.82 0.97
l: 0.95 0.44 0.96 0.95 0.96 0.87 0.78 0.45 0.45
n: 0.82 0.65 0.98 0.88 0.81 0.78 0.58 0.61
o: 0.12 0.59 0.30 0.45 0.58 0.39
q: 0.98 0.89
r: 0.96 0.99 0.98 0.71 0.97 0.99 0.94
s: 0.22 0.96 0.96 0.82 0.45 0.61 0.39 0.89 0.94
u: 0.95 0.5 0.98 0.97


SLIDE 39

VPO Optimization Phases (cont...)

  • Register assignment (assigning pseudo registers to hardware registers) is implicitly performed before the first phase that requires it.
  • Some phases are applied after the sequence
      • fixing the entry and exit of the function to manage the run-time stack
      • exploiting predication on the ARM
      • performing instruction scheduling