End-to-end Deep Learning of Optimization Heuristics
http://chriscummins.cc/pact17
Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)
compilers are very complex
{hundreds, thousands, millions} of hand-coded heuristics
(out of date by time of release)

int main(int argc, char** argv) { ...   →   _main: .cfi_startproc ## BB#0: pushq %rbp ...
Machine learning in compilers
y = f(x)
decision = model(features derived from the IR)
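The y = f(x) formulation can be made concrete with a toy sketch: a decision function mapping an IR-derived feature vector to a device choice, in the spirit of a heterogeneous-mapping heuristic. The feature names and thresholds below are illustrative assumptions, not any model from the talk.

```python
# Sketch of a feature-based optimization heuristic: y = f(x).
# Features and thresholds are illustrative assumptions only.

def extract_features(kernel_stats):
    """Derive a feature vector x from raw IR statistics."""
    return {
        "compute_mem_ratio": kernel_stats["compute_ops"] / max(kernel_stats["mem_ops"], 1),
        "data_transfer_kb": kernel_stats["bytes_transferred"] / 1024,
    }

def heuristic(features):
    """The learned decision y = f(x): pick CPU or GPU."""
    # A hypothetical decision-tree-like rule: compute-heavy kernels
    # with little host<->device traffic are mapped to the GPU.
    if features["compute_mem_ratio"] > 4.0 and features["data_transfer_kb"] < 1024:
        return "GPU"
    return "CPU"

stats = {"compute_ops": 900, "mem_ops": 100, "bytes_transferred": 4096}
print(heuristic(extract_features(stats)))  # -> GPU (ratio 9.0, 4 KB transferred)
```

The pipeline slides that follow show where such a function comes from: the features are hand-designed, while the decision rule is fitted from training data.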
Machine learning in compilers
Training Programs → Driver → Best Decisions
Training Programs → Feature Extractor → Feature Vectors
(Best Decisions + Feature Vectors) → Training Data → Optimization Heuristic
Feature extraction is the human bit! 1. hard to get right
Feature space
[scatter plot: Feature “X” vs Feature “Y”]
Ways to fail:
- irrelevant: e.g. not capturing the right information
- incomplete: e.g. missing critical information
- unsuitable: e.g. wrong combination of features / model
What we have:
Training Programs → Driver + Feature Extractor → Best Decisions + Feature Vectors → Training Data → Predictive Model
What we need:
Training Programs → Driver → Best Decisions → Training Data → Predictive Model (no feature extraction)
Contributions:
- Heuristics without features
- Beats expert approach
- Learning across heuristics
Our approach
Program Code (int main(int argc, char **argv) { ...) → Deep Learning → Optimization Decision
Preprocessing
Code in → Rewriter → Encoder
Rewriter: normalize identifiers & code style
1. var/fun names: ‘foo’, ‘bar’, … to ‘a’, ‘b’, …
2. sanitize whitespace
3. consistent use of optional braces
Encoder: encode as a sequence of vocabulary indices, using a vocabulary table of characters + language keywords
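The two preprocessing steps can be sketched roughly as follows. This is a minimal toy approximation: the real rewriter is a full language-aware pass, whereas this version handles only identifier renaming and whitespace, and builds its vocabulary on the fly rather than from a fixed table of characters + keywords.

```python
import re
import string

# Toy sketch of the Rewriter + Encoder preprocessing stages.
KEYWORDS = {"int", "return", "if", "else", "for", "while"}  # tiny assumed keyword set

def rewrite(src):
    """Rewriter: rename identifiers ('foo', 'bar', ... -> 'a', 'b', ...) and
    sanitize whitespace. (Brace normalization is omitted in this sketch.)"""
    names = {}
    def rename(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        if word not in names:
            names[word] = string.ascii_lowercase[len(names) % 26]
        return names[word]
    src = re.sub(r"[A-Za-z_]\w*", rename, src)
    return re.sub(r"\s+", " ", src).strip()

def encode(src, vocab):
    """Encoder: turn source into a sequence of vocabulary indices."""
    tokens = re.findall(r"[A-Za-z_]\w*|.", src)  # keywords/identifiers, else single chars
    indices = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # grow vocabulary on first sight (toy behavior)
        indices.append(vocab[tok])
    return indices

vocab = {}
normalized = rewrite("int   foo(int bar) { return bar; }")
print(normalized)  # int a(int b) { return b; }
print(encode(normalized, vocab))
```

Normalizing identifiers before encoding means the model never has to learn that `foo` and `a` play the same role, which shrinks the effective input space.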
Our approach
Program Code → Deep Learning → Optimization Decision
Code in → Rewriter → Encoder → Embedding → Language Model → Heuristic Model
- Embedding: map vocab indices into real space
- Language Model (2-layer LSTM network): summarize sequence as vector
- Heuristic Model (2-layer DNN): predict optimization
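The embedding → sequence-summary → prediction pipeline can be sketched, vastly simplified, in numpy. The actual model uses two LSTM layers and a two-layer DNN trained end-to-end; this illustration substitutes mean-pooling for the LSTM, uses a single linear output layer, and keeps the weights random and untrained, purely to make the tensor shapes concrete. All sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 128   # assumed vocabulary size (characters + keywords)
EMBED_DIM = 64     # assumed embedding dimensionality
NUM_CLASSES = 2    # e.g. {CPU, GPU} for heterogeneous mapping

# Randomly initialized stand-ins for learned parameters.
embedding = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))  # map vocab indices into real space
W_out = rng.normal(size=(EMBED_DIM, NUM_CLASSES))
b_out = np.zeros(NUM_CLASSES)

def predict(indices):
    """Embed the token sequence, summarize it as one vector, predict a class.

    DeepTune summarizes with a 2-layer LSTM and predicts with a 2-layer DNN;
    mean-pooling and one linear layer keep this sketch short."""
    embedded = embedding[indices]      # (seq_len, EMBED_DIM)
    summary = embedded.mean(axis=0)    # (EMBED_DIM,) fixed-size sequence summary
    logits = summary @ W_out + b_out   # (NUM_CLASSES,)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()             # softmax over optimization decisions

probs = predict([5, 17, 99, 3])
print(probs.shape)  # (2,)
```

The key property to notice is that variable-length token sequences are reduced to a fixed-size vector before the prediction head, which is what lets one architecture serve programs of any length.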
How does it work?
How well does it work?
Prior Art (2 papers!)
Heterogeneous Mapping (CGO’13, Grewe et al.):
- Decision space: binary classification {CPU, GPU}
- Model: Decision Tree
- Features: 4 features, combined from 7 raw values (instruction counts / ratios)
Thread Coarsening (PACT’14, Magni et al.):
- Decision space: one-of-six classification {1, 2, 4, 8, 16, 32}
- Model: Cascading Neural Networks
- Features: 7 features, principal components of 34 raw values (instruction counts / ratios / relative deltas)
Our Approach
Heterogeneous Mapping & Thread Coarsening: both take the raw program code (int main(int argc ...) as input
Prior Art: hardware & training programs
Heterogeneous Mapping (CGO’13): 2x CPU-GPU architectures, 7 benchmark suites
Thread Coarsening (PACT’14): 4x GPU architectures, 3 benchmark suites
Results: 14% and 5% improvements over state-of-the-art
Heterogeneous Mapping speedup (256 benchmarks): state-of-the-art 2.09x, DeepTune 2.38x
Thread Coarsening speedup (17 benchmarks): state-of-the-art 1.01x, DeepTune 1.06x
Transfer Learning
general (Heterogeneous Mapping): Embedding → Language Model → Heuristic Model
specialized (Thread Coarsening): Embedding → Language Model → Heuristic Model
The specialized model's embedding and language model are initialized with values from the general model.
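The transfer step can be sketched as plain parameter copying. Parameters are dicts of lists in this toy version (a real implementation copies weight tensors between networks): the general model's language-model layers seed the specialized model, while the heuristic head is re-initialized because its output space differs per task. Layer names and sizes are assumptions.

```python
# Toy sketch of transfer learning between heuristics.

def new_model(num_classes, init=0.0):
    """A stand-in model: language-model layers plus a heuristic head."""
    return {
        "embedding": [init] * 4,
        "lstm_1": [init] * 4,
        "lstm_2": [init] * 4,
        "heuristic_head": [0.0] * num_classes,  # always task-specific
    }

LANGUAGE_MODEL_LAYERS = ("embedding", "lstm_1", "lstm_2")

def transfer(general, num_classes):
    """Initialize a specialized model with the general model's values."""
    specialized = new_model(num_classes)
    for layer in LANGUAGE_MODEL_LAYERS:
        specialized[layer] = list(general[layer])  # copy learned values across
    return specialized

# Pretend the general model was trained on heterogeneous mapping (2 classes)...
general = new_model(num_classes=2, init=0.5)
# ...and seed the thread-coarsening model (6 classes) from it.
specialized = transfer(general, num_classes=6)
print(specialized["lstm_1"])               # [0.5, 0.5, 0.5, 0.5]
print(len(specialized["heuristic_head"]))  # 6
```

The intuition is that what the language-model layers learn about program structure is task-agnostic, so it transfers even though the two heuristics predict entirely different decisions.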
With transfer learning: 14% and 11% improvements over state-of-the-art
Heterogeneous Mapping speedup: state-of-the-art 2.09x, DeepTune 2.38x
Thread Coarsening speedup: state-of-the-art 1.01x, DeepTune 1.06x, DeepTune with transfer learning 1.12x
Try it for yourself!
http://chriscummins.cc/pact17
code and data on GitHub; runs in the browser
Conclusions:
- Problem: feature design is hard
- Featureless heuristics
- First cross-domain learning
- 11-14% speedups

End-to-end Deep Learning of Optimization Heuristics
http://chriscummins.cc/pact17