EECS 768 Virtual Machines 1
Dynamic Binary Optimization
- Introduction
- Application profiling
- Optimizing translation blocks
- Compatibility
- Code reordering
- Other code optimizations
Original code:
    Basic Block A:
        . . .
        R3 ← …
        R7 ← ...
        R1 ← R2 + R3
        BEQ L1 if R3 == 0
    Basic Block B:
        . . .
        R6 ← R1 + R6
        …
    Basic Block C:
    L1: R1 ← …
        ...

After code motion:
    Basic Block A:
        . . .
        R3 ← …
        R7 ← ...
        BEQ L1 if R3 == 0
    Basic Block B:
        R1 ← R2 + R3        ; compensation code
        . . .
        R6 ← R1 + R6
        …
    Basic Block C:
    L1: R1 ← …
        ...

Superblock:
        . . .
        R3 ← …
        R7 ← ...
        BNE L2 if R3 != 0
        R1 ← …
        ...
    Basic Block B:
    L2: R1 ← R2 + R3        ; compensation code
        . . .
        R6 ← R1 + R6
        …
       R3 ← 100
loop:  R1 ← mem(R2)            ; load from memory
       Br found if R1 == -1    ; look for -1
       R2 ← R2 + 4
       R3 ← R3 - 1
       Br loop if R3 != 0      ; loop closing branch
       . .
found:
[Figure: histogram of static conditional branches by percent taken. X-axis: Percent Taken, in bins 0-10%, 10-20%, ..., 80-90%, >90%. Y-axis: Fraction of Static Conditional Branches, 0% to 50%.]
[Figure: Percent Dynamic Branches Decided Same As Previous Time (0% to 100%), by benchmark: 176.gcc, 181.mcf, 197.parser, 252.e, 256.bzip2, 171.swim, 173.applu, 177.mesa, 187.facerec, 189.lucas.]
[Figure: Fraction with Constant Value (0.1 to 0.7), static and dynamic, by Instruction Type: All, Add/Sub, Load, Logic, Shift, Set.]
[Figure: profiling in a conventional compiler, shown with a control flow graph of blocks A to F. Diagram labels: HLL Program, Compiler Front-end, intermediate form, Instrumented Code, Test Data, Program Execution, Program Statistics, Optimizing Compiler, Compiler Back-end, Optimized Binary.]
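[Figure: profiling in a dynamic optimizer, shown with a partially discovered control flow graph (blocks A, B, D, E). Diagram labels: Program Binary, Program Data, Interpreter, Partial Program Statistics, Translator/Optimizer, partially "discovered" code.]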
[Figure: control flow graph with blocks A to F, drawn twice with profile counts. Counts shown: 50, 48, 38, 15, 2, 13, 10, 17, 15, 12 and 65, 50, 48, 17, 25, 15.]
- Instrumentation-based profiling
– slows down program more
– requires less total time than sampling
- Hardware profiling support
– less overhead than software
– less well-supported in processors
– typically event counters
- Sampling-based profiling
– slows down program less
– requires longer time to get same amount of data
Instruction function list
. . .
branch_conditional(inst) {
    BO = extract(inst, 25, 5);
    BI = extract(inst, 20, 5);
    displacement = extract(inst, 15, 14) * 4;
    . . .
    // code to compute whether branch should be taken
    . . .
    profile_addr = lookup(PC);
    if (branch_taken) {
        profile_cnt(profile_addr, taken)++;
        PC = PC + displacement;
    } else {
        profile_cnt(profile_addr, nottaken)++;
        PC = PC + 4;
    }
}
[Figure: branch profile table. The PC is hashed (HASH) to select an entry; each entry holds a branch PC, a taken count, and a not-taken count.]
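The interpreter routine and profile table above can be sketched in Python. This is only an illustration under stated assumptions: the names BranchProfile, record, taken_fraction, and interpret_branch are hypothetical, a Python dict stands in for the hashed table, and the branch decision is passed in rather than computed from the instruction.

```python
class BranchProfile:
    """Per-branch counters keyed by branch PC (a dict plays the role of HASH)."""

    def __init__(self):
        self.counts = {}  # branch PC -> [taken count, not-taken count]

    def record(self, pc, taken):
        entry = self.counts.setdefault(pc, [0, 0])
        entry[0 if taken else 1] += 1

    def taken_fraction(self, pc):
        entry = self.counts.get(pc)
        if entry is None:
            return None  # branch never profiled
        return entry[0] / (entry[0] + entry[1])


def interpret_branch(pc, displacement, branch_taken, profile):
    """Update the profile for this branch and return the next PC."""
    profile.record(pc, branch_taken)
    return pc + displacement if branch_taken else pc + 4


# A loop-closing branch at a made-up address: taken 99 times, then falls through.
profile = BranchProfile()
for i in range(100):
    next_pc = interpret_branch(0x1000, -16, i < 99, profile)
```

After the loop, the entry for 0x1000 holds 99 taken and 1 not taken, which is the kind of strongly biased branch that code reordering exploits.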
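[Figure: hardware support for sampling. An Interval Counter is initialized (Initialize Counter) and decremented for each instruction; Zero Detect raises a TRAP, and the Program Counter (Load PC) supplies the Instruction Address recorded as the Sample PC.]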
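The interval-counter mechanism can be mimicked in software to see what a sampled profile looks like. The sketch below is an assumption-laden model, not a description of any particular processor: sample_pcs is a hypothetical name, the executed instruction stream is given as a list of PCs, and the trap handler simply records the PC and reloads the counter.

```python
from collections import Counter


def sample_pcs(pc_stream, interval):
    """Return a histogram of sampled PCs from an iterable of executed PCs."""
    samples = Counter()
    counter = interval              # Initialize Counter
    for pc in pc_stream:
        counter -= 1                # decrement for each instruction
        if counter == 0:            # Zero Detect raises the trap
            samples[pc] += 1        # trap handler records the Sample PC
            counter = interval      # reload for the next interval
    return samples


# 100 sequential 4-byte instructions, sampled every 10th instruction.
stream = list(range(0, 400, 4))
samples = sample_pcs(stream, 10)
```

With a longer interval the program is disturbed less often, but more execution time is needed to collect the same number of samples.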
[Figure: example code layout and its control flow graph. Blocks A B C D E F G are laid out in order, separated by the branches Br cond1 == true, Br cond2 == false, Br uncond, Br cond3 == true, Br uncond, Br cond4 == true. Counts shown in the graph: 30, 70, 68, 2, 97, 15, 29, 1, 1, 29, 68, 1, 3.]
[Figure: code rearrangement. Original layout: blocks A B C D E F G with branches Br cond1 == true, Br cond2 == false, Br uncond, Br cond3 == true, Br uncond, Br cond4 == true. Rearranged layout, labels as extracted: Br cond1 == false; A; Br cond3 == true; D E; Br cond2 == false, Br uncond; B C; Br cond4 == true; G F; Br uncond, Br uncond.]
[Figure: procedure inlining. Labels: Call proc xyz (two call sites), Proc xyz, Return. Blocks before: A, K, L, B and X . . . Y, Z. Blocks after: A, K, L, B and . . . Y, X, X, Z.]
[Figure: control flow graph with blocks A to G partitioned into Trace 1, Trace 2, and Trace 3. Counts shown: 30, 70, 68, 97, 15, 29, 1, 1, 29, 68, 1, 3, 2.]
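One simple way to obtain traces like those in the figure is to start at the hottest unplaced block and keep following the most frequent successor edge. The sketch below is a simplified greedy selection, not necessarily the method used on the slide. The function name form_traces is hypothetical, and the block and edge counts are illustrative values chosen for this example, not counts read from the figure.

```python
def form_traces(block_counts, edge_counts):
    """Greedily partition blocks into traces.

    block_counts: {block: execution count}
    edge_counts:  {(src, dst): edge count}
    """
    successors = {}
    for (src, dst), count in edge_counts.items():
        successors.setdefault(src, []).append((count, dst))

    placed = set()
    traces = []
    # Seed each trace with the hottest block not yet placed (ties by name).
    for seed in sorted(block_counts, key=lambda b: (-block_counts[b], b)):
        if seed in placed:
            continue
        trace = [seed]
        placed.add(seed)
        current = seed
        while True:
            candidates = [(c, d) for c, d in successors.get(current, [])
                          if d not in placed]
            if not candidates:
                break
            _, current = max(candidates)   # follow the hottest edge
            trace.append(current)
            placed.add(current)
        traces.append(trace)
    return traces


# Illustrative profile (assumed values).
blocks = {"A": 100, "B": 30, "C": 29, "D": 70, "E": 68, "F": 3, "G": 100}
edges = {("A", "B"): 30, ("A", "D"): 70, ("B", "C"): 29, ("B", "F"): 1,
         ("C", "G"): 29, ("D", "E"): 68, ("D", "F"): 2, ("E", "G"): 68,
         ("F", "G"): 3}
traces = form_traces(blocks, edges)
```

On this input the hot path A, D, E, G becomes the first trace, and the colder blocks fall into two further traces.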
[Figure: control flow graph with blocks A to G, before and after block G is duplicated (two additional copies of G). Count shown: 15.]
[Figure: code layout with block G duplicated. Original layout: blocks A B C D E F G with branches Br cond1 == true, Br cond2 == false, Br uncond, Br cond3 == true, Br uncond, Br cond4 == true, and graph counts 30, 70, 68, 2, 97, 15, 29, 1, 1, 29, 68, 1, 3. Rearranged layout, labels as extracted: Br cond1 == false; A; Br cond3 == true; D E; Br cond2 == false; B C; Br cond4 == true; G F; Br uncond, Br cond4 == true; G G; Br uncond, Br cond4 == true, Br uncond.]
[Figure: blocks A, B, C before and after optimization, with compensation code (comp) added at two points.]

[Figure: optimization flow from Original source code through Intermediate form to Optimized target code. Boxes shown:
– Collect basic blocks using profile information
– Convert to intermediate form; place in buffer
– Schedule and …
– Add compensation code; place in code cache
– Generate target code]
Source:
    …
    r4 ← r6 + 1
    r1 ← r2 + r3        ; trap?
    r1 ← r4 + r5
    r6 ← r1 * r7
Target (dead assignment removed):
    …
    R4 ← R6 + 1
    R1 ← R4 + R5
    R6 ← R1 * R7
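The dead assignment in the source code above can be found mechanically: within straight-line code, an assignment is dead if its destination is overwritten before it is read. The sketch below is a minimal version under assumptions of my own: instructions are (destination, sources) pairs, dead_assignments is a hypothetical name, and a register that is not overwritten later in the block is conservatively treated as live.

```python
def dead_assignments(block):
    """Return indices of assignments overwritten before any later read."""
    dead = []
    for i, (dest, _sources) in enumerate(block):
        for later_dest, later_sources in block[i + 1:]:
            if dest in later_sources:
                break              # value is read: the assignment is live
            if later_dest == dest:
                dead.append(i)     # overwritten first: the assignment is dead
                break
    return dead


# The source code from the example above.
source = [
    ("r4", ("r6",)),          # r4 <- r6 + 1
    ("r1", ("r2", "r3")),     # r1 <- r2 + r3   (may trap)
    ("r1", ("r4", "r5")),     # r1 <- r4 + r5
    ("r6", ("r1", "r7")),     # r6 <- r1 * r7
]
dead = dead_assignments(source)
```

The analysis flags only the second instruction. The compatibility issue is that removing it also removes the trap it might have raised, so the translated code can behave differently from the source.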
Source:
    …
    r1 ← r2 + r3
    r9 ← r1 + r5
    r6 ← r1 * r7
    r3 ← r6 + 1
    …
Target (rescheduled):
    …
    R1 ← R2 + R3
    R6 ← R1 * R7
    R9 ← R1 + R5        ; trap?
    R3 ← R6 + 1
    …
Target with saved reg. state:
    …
    R1 ← R2 + R3
    S1 ← R1 * R7
    R9 ← R1 + R5
    R6 ← S1
    R3 ← S1 + 1
    …
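The effect of saving register state can be checked with a small simulation of the three sequences. This is a sketch under assumptions: run_until_trap and architected are hypothetical helpers, the initial register values are arbitrary, the trapping instruction is the one that writes r9, and a trapping instruction is assumed not to update its destination.

```python
import operator


def run_until_trap(program, regs, trap_dest=None):
    """Execute in order; stop before the instruction that writes trap_dest."""
    regs = dict(regs)
    for dest, op, a, b in program:
        if dest == trap_dest:
            return regs            # trap: this and later instructions do not complete
        left = regs[a] if isinstance(a, str) else a
        right = regs[b] if isinstance(b, str) else b
        regs[dest] = op(left, right)
    return regs


def architected(regs):
    """Drop the translator's scratch register s1 before comparing states."""
    return {r: v for r, v in regs.items() if r != "s1"}


add, mul = operator.add, operator.mul
source = [("r1", add, "r2", "r3"), ("r9", add, "r1", "r5"),
          ("r6", mul, "r1", "r7"), ("r3", add, "r6", 1)]
rescheduled = [("r1", add, "r2", "r3"), ("r6", mul, "r1", "r7"),
               ("r9", add, "r1", "r5"), ("r3", add, "r6", 1)]
saved_state = [("r1", add, "r2", "r3"), ("s1", mul, "r1", "r7"),
               ("r9", add, "r1", "r5"),
               ("r6", add, "s1", 0),   # r6 <- s1, written as s1 + 0
               ("r3", add, "s1", 1)]

init = {"r1": 0, "r2": 2, "r3": 3, "r5": 5, "r6": 60, "r7": 7, "r9": 0, "s1": 0}
at_trap_source = architected(run_until_trap(source, init, "r9"))
at_trap_resched = architected(run_until_trap(rescheduled, init, "r9"))
at_trap_saved = architected(run_until_trap(saved_state, init, "r9"))
```

At the trap, the rescheduled target has already overwritten r6, so its state differs from the source. The version that computes into S1 leaves R6 untouched until after the trapping instruction, so the architected state matches.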
[Figure: moving a reg or mem instruction below a branch (Br) leaves a copy, reg (compensation) or mem (compensation), on the branch exit path.]

Original:
    …
    R1 ← mem(R6)
    R2 ← mem(R6+4)
    R3 ← R1 + 1
    R4 ← R1 << 2
    Br exit if R7 == 0
    R7 ← R7 + 1
    mem(R6) ← R3
After moving R4 ← R1 << 2 below the branch:
    …
    R1 ← mem(R6)
    R2 ← mem(R6+4)
    R3 ← R1 + 1
    Br exit if R7 == 0
    R4 ← R1 << 2
    R7 ← R7 + 1
    mem(R6) ← R3
Compensation code at the branch exit:
    R4 ← R1 << 2
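[Figure: moving a reg instruction above a branch. Br; reg (R) becomes reg (T); Br; R ← T.]

Original:
    …
    R2 ← R1 << 2
    Br exit if R8 == 0
    R6 ← R7 * R2
    mem(R6) ← R3
    R6 ← R2 + 2
After moving the multiply above the branch, with its result in T1:
    …
    R2 ← R1 << 2
    T1 ← R7 * R2
    Br exit if R8 == 0
    R6 ← T1
    mem(T1) ← R3
    R6 ← R2 + 2
After removing the dead assignment R6 ← T1:
    …
    R2 ← R1 << 2
    T1 ← R7 * R2
    Br exit if R8 == 0
    mem(T1) ← R3
    R6 ← R2 + 2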
[Figure: moving a mem instruction above a branch (Br; mem) is marked X, not allowed.]

    …
    R2 ← R1 << 2
    T1 ← R7 * R2
    Br exit if R8 == 0
    mem(T1) ← R3
    R6 ← R2 + 2
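[Figure: moving a reg or mem instruction above a join point. join point; reg becomes reg; join point, with reg (compensation) on the joining path. join point; mem becomes mem; join point, with mem (compensation) on the joining path.]

Original:
    …
    R1 ← R1 + 1
    R7 ← mem(R6)
    R7 ← R7 + 1
    ...
After code motion:
    …
    R1 ← R1 + 1
    R7 ← mem(R6)
    R7 ← R7 + 1
    ...
Compensation code:
    R7 ← mem(R6)

[Figure: join point followed by reg; join point followed by mem.]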
[Figure: moving a reg instruction above a reg or mem instruction. reg; reg (R) becomes reg (T); reg; R ← T. mem; reg (R) becomes reg (T); mem; R ← T.]

Original:
    …
    R1 ← R1 * 3
    mem(R6) ← R1
    R7 ← R7 << 3
    R9 ← R7 + R2
    ...
After moving the shift above the store, with its result in T1:
    …
    R1 ← R1 * 3
    T1 ← R7 << 3
    mem(R6) ← R1
    R7 ← T1
    R9 ← T1 + R2
    ...
second \ first | reg | mem | br | join
reg | extend live range | extend live range | extend live range | add compensation code at entrance
mem | not allowed | not allowed | not allowed | add compensation code at entrance
br | add compensation code at branch exit | add compensation code at branch exit | not allowed (changes control flow) | not allowed (changes control flow)
join | not allowed (can … in rare cases) | not allowed (can … in rare cases) | not allowed (changes control flow) | no effect
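A translator could consult these rules as a lookup keyed by the types of two adjacent instructions. The sketch below encodes the table under one assumption that should be stated loudly: the orientation (columns give the first instruction's type, rows give the second's) is my reading of the flattened table, checked against the code-motion examples earlier in the deck. The names REORDER, reorder_action, and can_swap are hypothetical, and the elided wording in the join row is kept as it appears.

```python
TYPES = ("reg", "mem", "br", "join")

# One row per type of the SECOND instruction; entries follow TYPES order
# for the type of the FIRST instruction (assumed orientation).
ROWS_BY_SECOND = {
    "reg": ("extend live range",) * 3
           + ("add compensation code at entrance",),
    "mem": ("not allowed",) * 3
           + ("add compensation code at entrance",),
    "br": ("add compensation code at branch exit",) * 2
          + ("not allowed (changes control flow)",) * 2,
    "join": ("not allowed (can … in rare cases)",) * 2
            + ("not allowed (changes control flow)", "no effect"),
}

REORDER = {
    (first, second): action
    for second, row in ROWS_BY_SECOND.items()
    for first, action in zip(TYPES, row)
}


def reorder_action(first, second):
    """What swapping `first` and `second` (in original order) requires."""
    return REORDER[(first, second)]


def can_swap(first, second):
    """True if the table permits the swap, possibly with extra work."""
    return not reorder_action(first, second).startswith("not allowed")
```

For example, reorder_action("reg", "br") describes moving a register instruction below a branch, and reorder_action("br", "mem") describes moving a store above a branch.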
– apply low-overhead conservative optimizations
– only apply local optimizations
– exception: HLL (Java) VMs