1
Online Phase-Adaptive Data Layout Selection
Chengliang Zhang, Microsoft (former IBM intern)
Martin Hirzel, IBM
ECOOP, 10 July 2008
2
Problem Statement
Online Phase-Adaptive Data Layout Selection
[Figure: Objects 1 to 4 laid out across cache line or page A and cache line or page B]
try, measure, decide
No training run.
3
Data Layouts from Copying Garbage Collection
[Figure: example layouts labeled BF (breadth-first) and HI (hierarchical)]
4
Layout Performance Comparison
[Figure: layout performance comparison on the 8-processor AMD machine, with regions labeled "HI faster" and "BF faster"]
5
Multi-Armed Bandit Problem
Possible layouts (the arms): BF, HI, depth-first, allocation order, popularity, random, size, type, thread
6
Layout Auditing
- try: Data reorganizer, Program
- measure: Profiler
- decide: Controller
Arrow labels in the diagram: data layout, performance, profiling decision, layout decision, reward
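The try / measure / decide cycle can be sketched as a small loop. This is only an illustration under stated assumptions, not the J9 implementation: `audit`, `run_interval`, `choose`, and `greedy` are hypothetical names, the rewards are made up, and the greedy policy merely stands in for the controller.

```python
def audit(layouts, run_interval, choose, intervals):
    """Run a try / measure / decide loop for a fixed number of intervals.

    run_interval(layout) -> reward  (hypothetical: reorganize data, run, profile)
    choose(history) -> layout       (hypothetical controller policy)
    """
    history = {layout: [] for layout in layouts}
    decisions = []
    for _ in range(intervals):
        layout = choose(history)        # decide
        reward = run_interval(layout)   # try, then measure
        history[layout].append(reward)
        decisions.append(layout)
    return decisions, history


def greedy(history):
    """Placeholder policy: try each layout once, then keep the best mean."""
    for layout, rewards in history.items():
        if not rewards:
            return layout
    return max(history, key=lambda l: sum(history[l]) / len(history[l]))


# Made-up rewards, for illustration only.
fake_rewards = {"BF": 1.0, "HI": 2.0}
decisions, _ = audit(["BF", "HI"], fake_rewards.get, greedy, intervals=5)
print(decisions)
```

A purely greedy policy like this one never revisits a layout after its first sample, which is the weakness the controller slides address.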
7
Profiler
measure: Profiler (profiling decision in, reward out)
[Figure: timeline in which the data reorganizer and the program alternate, under data layouts l3, l4, l5]
- Physical time (wall clock): r3, e3, r4, e4, r5, e5
- Virtual time (allocated bytes): v3, v4, v5
Reward for layout l_i uses historical average of:
- Virtual time v_i / program execution time e_i
- Virtual time v_(i-1) / reorganizer time r_i
(always on)
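A minimal sketch of the bookkeeping behind such a reward, assuming one pair of measurements per interval. `RewardTracker` and its methods are hypothetical names. The slide says only that the reward "uses" the two historical averages; the way `reward` combines them below (treating the two rates as stages in series) is an assumption made for illustration.

```python
from collections import defaultdict


class RewardTracker:
    """Per-layout historical averages of the two rates named on the slide."""

    def __init__(self):
        self.exec_rates = defaultdict(list)   # samples of v_i / e_i
        self.reorg_rates = defaultdict(list)  # samples of v_(i-1) / r_i

    def record(self, layout, v_prev, r, v, e):
        """Store one interval: v_prev and v in allocated bytes, r and e in seconds."""
        self.reorg_rates[layout].append(v_prev / r)
        self.exec_rates[layout].append(v / e)

    def averages(self, layout):
        """Return (average execution rate, average reorganizer rate)."""
        ex = self.exec_rates[layout]
        re = self.reorg_rates[layout]
        return sum(ex) / len(ex), sum(re) / len(re)

    def reward(self, layout):
        # ASSUMPTION: combine the two averages like rates in series.
        # The slide does not state how they are combined.
        exec_avg, reorg_avg = self.averages(layout)
        return 1.0 / (1.0 / exec_avg + 1.0 / reorg_avg)


tracker = RewardTracker()
tracker.record("HI", v_prev=100.0, r=1.0, v=200.0, e=4.0)
tracker.record("HI", v_prev=200.0, r=2.0, v=300.0, e=3.0)
print(tracker.averages("HI"), tracker.reward("HI"))
```

Measuring progress in allocated bytes rather than seconds gives a numerator that does not itself depend on how fast the program happens to run.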
8
Controller: Blind Justice
Diagram labels: rewards, profiling decision, layout decision
Goals
- Match performance of best layout
- Online
Challenges
- Confidence vs. curiosity
- Phase changes vs. noise
9
Confidence vs. Curiosity
[Figure: confidence and curiosity compared across three cases: never-tried layout, few samples / high variance, many samples / low variance; one entry is ∞]
Pick layout l if either:
- High confidence that l gives best reward
- High curiosity about l’s reward
⇒ use simulated annealing
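One way the rule above might be realized is sketched here. It is not the authors' controller: it borrows only the acceptance test from simulated annealing, and `pick_layout`, `standard_error`, and the `temperature` parameter are hypothetical. A never-tried layout is always picked, a layout with an uncertain estimate is accepted more readily, and a clearly worse, well-sampled layout is rejected in favor of the current best.

```python
import math
import random
import statistics


def standard_error(rewards):
    """Spread of the mean estimate; unbounded when fewer than two samples."""
    if len(rewards) < 2:
        return math.inf
    return statistics.stdev(rewards) / math.sqrt(len(rewards))


def pick_layout(stats, temperature, rng=random):
    """stats maps layout -> list of rewards; temperature must be > 0."""
    # Never-tried layouts are picked before anything else.
    for layout, rewards in stats.items():
        if not rewards:
            return layout
    means = {layout: statistics.mean(r) for layout, r in stats.items()}
    best = max(means, key=means.get)
    # Curiosity: propose a random layout and maybe accept it.
    candidate = rng.choice(list(stats))
    if candidate == best:
        return best
    gap = means[best] - means[candidate]          # always >= 0
    scale = standard_error(stats[candidate]) * temperature
    accept = 0.0 if scale == 0.0 else math.exp(-gap / scale)
    return candidate if rng.random() < accept else best


stats = {"BF": [1.0, 1.2, 0.9], "HI": [1.5, 1.4, 1.6], "Random": []}
print(pick_layout(stats, temperature=1.0))
```

Lowering `temperature` over time makes the choice progressively more confident and less curious.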
Phase Changes vs. Noise
Phase Adaptivity
- When layout performance changes, learn new best layout ⇒ Forget historical rewards
Noise Tolerance
- Perturbation from extraneous causes ⇒ Remember historical rewards
10
⇒ use exponential decay
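Exponential decay balances forgetting against remembering by shrinking the weight of each older reward by a constant factor per step. The sketch below assumes a simple weighted mean. `DecayedAverage` is a hypothetical name, and the default of 0.9 is taken from the Decay = 0.9 setting shown on the Stability and Settling slide.

```python
class DecayedAverage:
    """Weighted mean in which a sample seen k steps ago has weight decay**k."""

    def __init__(self, decay=0.9):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.weighted_sum = 0.0
        self.total_weight = 0.0

    def add(self, reward):
        # Age every earlier sample by one step, then add the new one at weight 1.
        self.weighted_sum = self.decay * self.weighted_sum + reward
        self.total_weight = self.decay * self.total_weight + 1.0

    def value(self):
        if self.total_weight == 0.0:
            raise ValueError("no rewards recorded yet")
        return self.weighted_sum / self.total_weight


# After a phase change from reward 1.0 to reward 3.0, the decayed
# average moves toward the new level faster than the plain mean does.
decayed, plain = DecayedAverage(0.9), DecayedAverage(1.0)
for reward in [1.0] * 10 + [3.0] * 10:
    decayed.add(reward)
    plain.add(reward)
print(decayed.value(), plain.value())
```

With `decay=1.0` nothing is forgotten and the result is the ordinary mean; smaller values adapt faster to phase changes but are more easily disturbed by noise.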
11
SASO Properties of Control Systems
- Stability
- Accuracy
- Settling
- Overshoot
- Phase adaptivity
- Overhead
Methodology
12
- 20 Java programs (DaCapo suite, SPECjvm98 suite, and a few more)
- J9 = IBM’s product Java VM
- Layouts: HI = hierarchical, BF = breadth-first, LA = layout auditing
- 4 hardware platforms: Intel-2, AMD-2, AMD-4, AMD-8
Accuracy and Overhead
13
[Figure: three bar charts comparing BF, HI, and LA on Intel-2, AMD-2, AMD-4, and AMD-8: average % slowdown vs. best; number of programs not optimal; worst % slowdown vs. best]
Stability and Settling
14
[Figure: reward and layout decision (BF or HI) over time for two settings, Decay = 0.9 and no decay (Decay = 1.0); regions marked "BF better" and "HI better"; annotated times 75s and 20s]
15
Related Work
- Lau/Arnold/Hind/Calder PLDI’06: performance auditing for JIT optimization
- Soman/Krintz/Bacon ISMM’04: switch copy vs. mark-sweep, generations or not
- Chen/Bhansali/Chilimbi/Gao/Chuang PLDI’06: throttle unless miss rate reduced
- Saavedra/Park PACT’96: adapt prefetch distance based on cancellation & latency
16
Conclusions
- Accurate
- Phase adaptive (good settling/stability)
- Negligible-overhead profiling
- Online, hardware independent
17
Clustering Layouts by Performance
[SIGMETRICS 2007]
[Figure: layouts clustered by performance, with BF and HI labeled]