Efficient Program Compilation through Machine Learning Techniques
Gennady Pekhimenko, IBM Canada
Angela Demke Brown, University of Toronto
Motivation
[Figure: My cool program -> Compiler (-O2: DCE, Peephole, Unroll, Inline) -> Executable]
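But what can we do if the executable is slow?
Replace -O2 with -O5
[Figure: at -O5 the compiler pipeline grows from Unroll, Inline, Peephole, DCE to a long chain of passes (Unroll repeated, up to Optimization 100) and produces a new, fast executable. Times shown: 1-10 minutes, few seconds.]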
Motivation (2)
[Figure: Our cool Operating System -> Compiler (-O2), 1 hour -> Executable: too slow!]
[Figure: Our cool Operating System -> Compiler (-O5), 20 hours -> New Executable]
We do not have that much time. Why did it happen?
Basic Idea
[Figure: chain of optimization passes: Unroll, Unroll, Unroll, up to Optimization 100]
Do we need all these optimizations for every function? Probably not. Compiler writers can typically solve this problem, but how?
1. Description of every function
2. Classification based on the description
3. Only certain optimizations for every class
Machine learning is well suited to this kind of problem.
Overview
- Motivation
- System Overview
- Experiments and Results
- Related Work
- Conclusions
- Future Work
Initial Experiment
3X difference on average
Initial Experiment (2)
[Figure: SPEC2000 execution time (secs) at "-O3" and "-qhot -O3" for benchmarks bzip2, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, applu, art, equake, facerec, fma3d, galgel, lucas, mesa, mgrid, sixtrack, swim, wupwise]
Classification parameters
Our System
Prepare
- extract features
- modify heuristic values
- choose transformations
- find hot methods
Gather Training Data
- compile
- measure run time
Learn (offline)
- Logistic Regression Classifier
- best feature settings
Deploy (online)
- TPO/XL Compiler
- set heuristic values
Data Preparation
Three key elements:
Feature extraction
- Total # of insts
- Loop nest level
- # and % of Loads, Stores, Branches
- Loop characteristics
- Float and Integer # and %
Heuristic values modification
- Existing XL compiler is missing functionality
- Extension was made to the existing Heuristic Context approach
Target set of transformations
- Unroll
- Wandwaving
- If-conversion
- Unswitching
- CSE
- Index Splitting ….
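The kind of per-function description listed under feature extraction can be sketched as follows. This is only an illustration: the toy instruction format (opcode, loop depth), the opcode names, and the `extract_features` helper are assumptions made for the example, not the XL compiler's actual representation.

```python
from collections import Counter

# Hypothetical opcode-to-category map; a real compiler IR defines its own.
CATEGORIES = {
    "load": "load", "store": "store", "br": "branch",
    "fadd": "float", "fmul": "float", "add": "int", "mul": "int",
}

def extract_features(instructions):
    """Build a feature dict from (opcode, loop_depth) pairs of one function."""
    total = len(instructions)
    counts = Counter(CATEGORIES.get(op, "other") for op, _ in instructions)
    features = {
        "total_insts": total,
        "loop_nest_level": max((depth for _, depth in instructions), default=0),
    }
    # Record both the count (#) and the percentage (%) of each category.
    for kind in ("load", "store", "branch", "float", "int"):
        features[f"num_{kind}"] = counts[kind]
        features[f"pct_{kind}"] = 100.0 * counts[kind] / total if total else 0.0
    return features

# Made-up function body used only to exercise the helper.
example = [("load", 1), ("fmul", 2), ("fadd", 2), ("store", 2),
           ("br", 1), ("add", 0), ("load", 2), ("br", 0)]
features = extract_features(example)
```

Keeping both counts and percentages mirrors the "# and %" wording on the slide; which of the two is more useful to a classifier is not stated in the source.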
Gather Training Data
- Try to "cut" transformations backwards (from last to first)
- If the run time is not worse than before, the transformation can be skipped
- Otherwise we keep it
- We do this for every hot function of every test
The main benefit is linear complexity: one trial per transformation instead of one per subset of transformations.
[Figure: example transformation sequence: Late Inlining, Unroll, Wandwaving]
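The backward "cut" search can be sketched as below. This is a minimal illustration under stated assumptions: `measure_run_time` is a hypothetical callback standing in for "compile with these transformations and time the result", and the effect numbers are invented for the example.

```python
def prune_transformations(transformations, measure_run_time):
    """Return the transformations worth keeping for one hot function.

    measure_run_time(enabled) is a hypothetical callback: it compiles the
    function with only `enabled` transformations and returns the run time.
    """
    enabled = list(transformations)
    best_time = measure_run_time(enabled)
    # Walk from the last transformation to the first.
    for t in reversed(transformations):
        candidate = [x for x in enabled if x != t]
        time_without = measure_run_time(candidate)
        if time_without <= best_time:
            # Not worse than before: the transformation can be skipped.
            enabled = candidate
            best_time = time_without
        # Otherwise the transformation stays enabled.
    return enabled

# Invented run-time effects (seconds) used only to exercise the search.
EFFECT = {"Late Inlining": 0.0, "Unroll": -3.0, "Wandwaving": 0.5}
calls = []

def fake_run_time(enabled):
    calls.append(tuple(enabled))
    return 10.0 + sum(EFFECT[t] for t in enabled)

kept = prune_transformations(["Late Inlining", "Unroll", "Wandwaving"], fake_run_time)
```

For n transformations the sketch performs n + 1 measurements, which is where the linear complexity comes from. Real timings are noisy, so an actual system would likely compare against a tolerance rather than use a strict `<=`; the slides do not say how this was handled.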
Learn with Logistic Regression
Input
- Function Descriptions
- Best Heuristic Values
Classifier
- Logistic Regression
- Neural Networks
- Genetic Programming
Output
- .hpredict files
- Compiler + Heuristic Values
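As a rough illustration of the learning step, the sketch below fits a logistic regression classifier that maps a function description to a yes/no decision for a single transformation. It is written in plain Python with batch gradient descent; the two features, the training rows, and the labels are invented for the example and do not come from the experiments.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias stored last) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # zip stops at the shorter sequence, so the bias w[-1] is added separately.
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """True means: keep the transformation enabled for this function."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= 0.5

# Invented training set: [loop nest level, fraction of loads] -> keep Unroll?
X = [[0.0, 0.1], [0.0, 0.3], [1.0, 0.2], [2.0, 0.4], [3.0, 0.3], [0.0, 0.5]]
y = [0, 0, 1, 1, 1, 0]
weights = train_logistic(X, y)
```

In the system described here, the labels would come from the gathered training data (the best heuristic values per hot function), and one such classifier per heuristic is a plausible arrangement, though the slides do not spell that out.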
Deployment
Online phase, for every function:
- Calculate the feature vector
- Compute the prediction
- Use this prediction as heuristic context
Overhead is negligible
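The online steps can be sketched as a feature vector going in and a set of heuristic settings coming out. The weights, the feature layout, and the `predict_heuristics` helper below are hypothetical; in particular, the real .hpredict file format is not described in the slides.

```python
import math

# Hypothetical learned model: transformation -> (feature weights, bias).
MODEL = {
    "Unroll":        ([1.5, -0.2], -1.0),
    "If-conversion": ([-0.5, 2.0], -0.5),
}

def predict_heuristics(feature_vector, model=MODEL):
    """Return {transformation: enabled?} for one function."""
    context = {}
    for name, (weights, bias) in model.items():
        z = bias + sum(w * x for w, x in zip(weights, feature_vector))
        context[name] = 1.0 / (1.0 + math.exp(-z)) >= 0.5
    return context

# Assumed feature layout: [loop nest level, fraction of branches].
heuristic_context = predict_heuristics([2.0, 0.1])
```

In this sketch each prediction costs one dot product per heuristic, which is consistent with the negligible overhead noted above.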
Experiments
- Benchmarks: SPEC2000, others from IBM customers
- Platform: IBM server, 4 x Power5 1.9 GHz, 32 GB RAM, running AIX 5.3
Results: compilation time
[Figure: normalized compilation time for Oracle and Classifier on bzip2, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, applu, art, equake, facerec, fma3d, galgel, lucas, mesa, mgrid, sixtrack, swim, wupwise, and GeoMean]
2x average speedup
Results: execution time
[Figure: execution time (secs) for Baseline, Oracle, and Classifier on bzip2, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, applu, art, equake, facerec, fma3d, galgel, lucas, mesa, mgrid, sixtrack, swim, wupwise]
New benchmarks: compilation time
[Figure: normalized compilation time per benchmark for Classifier]
New benchmarks: execution time
[Figure: execution time (secs) for Baseline and Classifier on apsi, parser, twolf, dmo, argonne]
4% speedup
Related Work
- Iterative Compilation: Pan and Eigenmann; Agakov, et al.
- Single Heuristic Tuning: Calder, et al.; Stephenson, et al.
- Multiple Heuristic Tuning: Cavazos, et al.; MILEPOST GCC
Conclusions and Future Work
- 2x average compile time decrease
- Future work: execution time improvement; -O5 level; Performance Counters for better method description
- Other benefits: Heuristic Context Infrastructure; Bug Finding
Thank you
- Raul Silvera, Arie Tal, Greg Steffan, Mathew Zaleski
- Questions?