Deep Learning of Optimization Heuristics


End-to-end Deep Learning of Optimization Heuristics

http://chriscummins.cc/pact17

Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)


slide-1
SLIDE 1

End-to-end Deep Learning of Optimization Heuristics

http://chriscummins.cc/pact17
slide-2
SLIDE 2

Chris Cummins, University of Edinburgh

Pavlos Petoumenos, University of Edinburgh

Zheng Wang, Lancaster University

Hugh Leather, University of Edinburgh

slide-3
SLIDE 3

compilers are very complex

int main( int argc, char** arg) {...

{ hundreds, thousands, millions } of choices

_main: .cfi_startproc ## BB#0: pushq %rbp ...

hand-coded heuristics (out of date by time of release)

slide-4
SLIDE 4

Machine learning in compilers

y = f(x)

y: optimization decision

f: model

x: features (derived from IR)

slide-5
SLIDE 5

Machine learning in compilers

Training Programs -> Driver -> Best Decisions

Training Programs -> Feature Extractor -> Feature Vectors

Feature Vectors + Best Decisions -> Training Data -> Optimization Heuristic
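The pipeline on this slide can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the feature extractor (`count_features`), the nearest-neighbour stand-in for the learned heuristic, and the two programs with their runtimes are all made up for the example and do not come from the talk.

```python
def count_features(source):
    """Hypothetical hand-written feature extractor: two crude counts."""
    return (source.count("for"), source.count("["))


def driver(runtimes):
    """Driver: pick the decision with the lowest measured runtime."""
    return min(runtimes, key=runtimes.get)


def build_training_data(programs):
    """Pair each program's feature vector with its best decision."""
    return [(count_features(src), driver(times)) for src, times in programs]


def nearest_neighbour(training_data, features):
    """Stand-in model: return the decision of the closest training point."""
    def distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(training_data, key=distance)[1]


# Made-up training programs and runtimes in seconds, purely illustrative.
programs = [
    ("for (i) for (j) c[i][j] = a[i][j] + b[i][j];", {"CPU": 2.0, "GPU": 0.5}),
    ("x = y + 1;", {"CPU": 0.1, "GPU": 0.9}),
]
training_data = build_training_data(programs)
decision = nearest_neighbour(training_data, count_features("for (k) v[k] = 0;"))
```

Everything the model sees passes through `count_features`, so a poor choice of features limits what any model can learn from the training data.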

slide-6
SLIDE 6

Machine learning in compilers

Training Programs -> Driver -> Best Decisions

Training Programs -> Feature Extractor -> Feature Vectors

Feature Vectors + Best Decisions -> Training Data -> Optimization Heuristic

the human bit!

  • 1. hard to get right
  • 2. time consuming
  • 3. repetitious
slide-7
SLIDE 7

[Plot: Feature space. Axes: Feature “X”, Feature “Y”. Labels: Use a GPU, Use a CPU, Learned Heuristic.]

slide-8
SLIDE 8

[Plot: Feature space. Axes: Feature “X”, Feature “Y”. Labels: Use a GPU, Use a CPU, Learned Heuristic.]

need good features!

slide-9
SLIDE 9

Ways to fail

irrelevant: e.g. not capturing the right information

incomplete: e.g. missing critical information

unsuitable: e.g. wrong combination of features / model

slide-10
SLIDE 10

What we have

Training Programs -> Driver -> Best Decisions

Training Programs -> Feature Extractor -> Feature Vectors

Feature Vectors + Best Decisions -> Training Data -> Predictive Model

slide-11
SLIDE 11

What we need

Training Programs -> Driver -> Best Decisions

Training Programs + Best Decisions -> Training Data -> Predictive Model

slide-12
SLIDE 12

Contributions

  • Heuristics without features
  • Beats expert approach
  • Learning across heuristics

slide-13
SLIDE 13

Our approach

Program Code -> Deep Learning -> Optimization Decision

int main(int argc, char **argv) { ...

slide-14
SLIDE 14

Our approach

Program Code -> Deep Learning -> Optimization Decision

int main(int argc, char **argv) { ...

preprocessing: Code in -> Rewriter -> Encoder

Rewriter: normalize identifiers & code style

  • 1. var/fun names: ‘foo’, ‘bar’, … to ‘a’, ‘b’, …
  • 2. sanitize whitespace
  • 3. consistent use of optional braces

Encoder: encode as sequence of vocabulary indices

Vocabulary table for characters + lang keywords
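A rough Python sketch of the Rewriter and Encoder steps listed on this slide. It is a toy under stated assumptions, not the authors' implementation: the keyword set is an arbitrary subset, identifiers are found with a regular expression rather than a real parser, at most 26 distinct names are supported, and the "consistent use of optional braces" step is left out. The function names `rewrite`, `build_vocabulary` and `encode` are hypothetical.

```python
import re
import string

# Assumed subset of language keywords; a real vocabulary would list many more.
KEYWORDS = {"int", "char", "float", "void", "if", "else", "for", "while", "return"}

# Try longer keywords first so the longest one always wins.
KEYWORD_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, KEYWORDS), key=len, reverse=True))
)


def rewrite(source):
    """Rename identifiers to 'a', 'b', ... and collapse whitespace."""
    names = {}

    def rename(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        if word not in names:
            # Toy limit: raises IndexError after 26 distinct identifiers.
            names[word] = string.ascii_lowercase[len(names)]
        return names[word]

    renamed = re.sub(r"\b[A-Za-z_]\w*\b", rename, source)
    return re.sub(r"\s+", " ", renamed).strip()


def build_vocabulary(sources):
    """Vocabulary table: keywords first, then every character seen."""
    tokens = sorted(KEYWORDS) + sorted({ch for src in sources for ch in src})
    return {token: index for index, token in enumerate(tokens)}


def encode(source, vocabulary):
    """Encode rewritten source as a sequence of vocabulary indices."""
    indices, pos = [], 0
    while pos < len(source):
        match = KEYWORD_PATTERN.match(source, pos)
        token = match.group(0) if match else source[pos]
        indices.append(vocabulary[token])
        pos += len(token)
    return indices


normalized = rewrite("int  foo(int bar) {\n  return bar;\n}")
vocabulary = build_vocabulary([normalized])
encoded = encode(normalized, vocabulary)
```

Here `normalized` is `int a(int b) { return b; }`, and `encoded` holds one index per keyword or character. Matching keywords greedily is safe in this toy only because every other identifier has already been shortened to a single letter.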

slide-15
SLIDE 15

Our approach

Program Code -> Deep Learning -> Optimization Decision

Code in -> Rewriter -> Encoder -> Embedding -> Language Model -> Heuristic Model

Embedding: map vocab indices into real space

Language Model: summarize sequence as vector (2 layer LSTM network)

Heuristic Model: predict optimization on vector (2 layer DNN)
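To make the three stages concrete, here is a forward-pass sketch in plain Python: an embedding lookup, two stacked LSTM layers whose final hidden state summarizes the sequence, and a two-layer dense network that outputs class probabilities. Only the stage names and layer counts come from the slide. The layer sizes, the ReLU and softmax activations, the random untrained weights, and the class names `LSTMLayer` and `HeuristicNet` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMLayer:
    def __init__(self, rng, input_size, hidden_size):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to [input ; previous hidden].
        self.w = {gate: rand_matrix(rng, hidden_size, input_size + hidden_size)
                  for gate in ("i", "f", "o", "g")}

    def run(self, inputs):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in inputs:
            xh = x + h  # list concatenation
            i = [sigmoid(v) for v in matvec(self.w["i"], xh)]    # input gate
            f = [sigmoid(v) for v in matvec(self.w["f"], xh)]    # forget gate
            o = [sigmoid(v) for v in matvec(self.w["o"], xh)]    # output gate
            g = [math.tanh(v) for v in matvec(self.w["g"], xh)]  # candidate
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


class HeuristicNet:
    def __init__(self, vocab_size, embed=8, hidden=8, dense=8, classes=2, seed=0):
        rng = random.Random(seed)
        self.embedding = rand_matrix(rng, vocab_size, embed)
        self.lstm1 = LSTMLayer(rng, embed, hidden)
        self.lstm2 = LSTMLayer(rng, hidden, hidden)
        self.dense1 = rand_matrix(rng, dense, hidden)
        self.dense2 = rand_matrix(rng, classes, dense)

    def predict(self, indices):
        """Return class probabilities for a non-empty index sequence."""
        vectors = [self.embedding[i] for i in indices]         # Embedding
        summary = self.lstm2.run(self.lstm1.run(vectors))[-1]  # Language Model
        hidden = [max(0.0, v) for v in matvec(self.dense1, summary)]
        return softmax(matvec(self.dense2, hidden))            # Heuristic Model


probabilities = HeuristicNet(vocab_size=10).predict([1, 2, 3])
```

With `classes=2` the output could stand for a {CPU, GPU} decision. The weights are random, so the probabilities mean nothing until the network is trained.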

slide-16
SLIDE 16

Our approach

Program Code -> Deep Learning -> Optimization Decision

Code in -> Rewriter -> Encoder -> Embedding -> Language Model -> Heuristic Model

slide-17
SLIDE 17

How does it work?

slide-18
SLIDE 18
slide-19
SLIDE 19

How well does it work?

slide-20
SLIDE 20

Prior Art

Heterogeneous Mapping: Grewe et al., CGO’13

Thread Coarsening: Magni et al., PACT’14

slide-21
SLIDE 21

Prior Art

Heterogeneous Mapping (CGO’13)

Decision Space: Binary classification {CPU, GPU}

Model: Decision Tree

Thread Coarsening (PACT’14)

Decision Space: One-of-six classification {1, 2, 4, 8, 16, 32}

Model: Cascading Neural Networks

slide-22
SLIDE 22

Prior Art: Features

Heterogeneous Mapping (CGO’13): 4 features. Combined from 7 raw values. Instruction counts / ratios.

Thread Coarsening (PACT’14): 7 features. Principal Components of 34 raw values. Instruction counts / ratios / relative deltas.

2 papers!

slide-23
SLIDE 23

Our Approach

Heterogeneous Mapping: int main(int argc ...

Thread Coarsening: int main(int argc ...

  • 1. Use the same model design for both
  • 2. No tweaking of parameters
  • 3. Minimum change - 3 line diff
slide-24
SLIDE 24

Prior Art

Heterogeneous Mapping (CGO’13)

Hardware: 2x CPU-GPU architectures

Training Programs: 7 Benchmark Suites

Thread Coarsening (PACT’14)

Hardware: 4x GPU architectures

Training Programs: 3 Benchmark Suites

slide-25
SLIDE 25

results

slide-26
SLIDE 26

14% and 5% improvements over state-of-the-art

Speedup, Heterogeneous Mapping: State-of-the-art 2.09x, DeepTune 2.38x

Speedup, Thread Coarsening: State-of-the-art 1.01x, DeepTune 1.06x

Legend: State-of-the-art, DeepTune, DeepTune w. Transfer Learning
slide-27
SLIDE 27

14% and 5% improvements over state-of-the-art

Speedup, Heterogeneous Mapping (256 benchmarks): State-of-the-art 2.09x, DeepTune 2.38x

Speedup, Thread Coarsening (17 benchmarks): State-of-the-art 1.01x, DeepTune 1.06x

Legend: State-of-the-art, DeepTune, DeepTune w. Transfer Learning

slide-28
SLIDE 28

Transfer Learning

Heterogeneous Mapping: Embedding -> Language Model -> Heuristic Model

Thread Coarsening: Embedding -> Language Model -> Heuristic Model

general: Embedding, Language Model

specialized: Heuristic Model

slide-29
SLIDE 29

Transfer Learning

Heterogeneous Mapping: Embedding -> Language Model -> Heuristic Model

Thread Coarsening: Embedding -> Language Model -> Heuristic Model

general: Embedding, Language Model

specialized: Heuristic Model

initialize with values
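One way to picture "initialize with values" is as copying the weights of the general stages from a trained model into a fresh one before training it on the second task. The sketch below assumes models are plain dictionaries of weight lists, that the general stages are the embedding and language model, and that the copy goes from the heterogeneous mapping model into the thread coarsening model. The helper `initialize_from` and all weight values are hypothetical.

```python
import copy

# Assumption: these are the stages treated as general across heuristics.
GENERAL_LAYERS = ("embedding", "language_model")


def initialize_from(source_model, target_model, layers=GENERAL_LAYERS):
    """Return a copy of target_model whose general layers start from
    the source model's learned values; other layers are left as they were."""
    initialized = copy.deepcopy(target_model)
    for name in layers:
        initialized[name] = copy.deepcopy(source_model[name])
    return initialized


# Placeholder weights, purely illustrative.
heterogeneous_mapping = {
    "embedding": [[0.1, 0.2]],
    "language_model": [[0.3]],
    "heuristic_model": [[0.9, 0.8]],       # 2 outputs: {CPU, GPU}
}
thread_coarsening = {
    "embedding": [[0.0, 0.0]],
    "language_model": [[0.0]],
    "heuristic_model": [[0.5] * 6],        # 6 outputs: {1, 2, 4, 8, 16, 32}
}
warm_start = initialize_from(heterogeneous_mapping, thread_coarsening)
```

The specialized heuristic model is not copied, since the two tasks have different decision spaces.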

slide-30
SLIDE 30

14% and 5% improvements over state-of-the-art

Speedup, Heterogeneous Mapping: State-of-the-art 2.09x, DeepTune 2.38x

Speedup, Thread Coarsening: State-of-the-art 1.01x, DeepTune 1.06x

Legend: State-of-the-art, DeepTune, DeepTune w. Transfer Learning
slide-31
SLIDE 31

14% and 11% improvements over state-of-the-art

Speedup, Heterogeneous Mapping: State-of-the-art 2.09x, DeepTune 2.38x

Speedup, Thread Coarsening: State-of-the-art 1.01x, DeepTune 1.06x, DeepTune w. Transfer Learning 1.12x
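The headline percentages appear to follow from the ratios of the reported speedups. The short check below assumes that reading, and that the lower figure in each pair is the state of the art. The function name `improvement_percent` is made up for the example.

```python
def improvement_percent(new_speedup, baseline_speedup):
    """Relative improvement of one speedup over another, in percent."""
    return (new_speedup / baseline_speedup - 1.0) * 100.0


# Speedups as reported on the slides.
heterogeneous_mapping = improvement_percent(2.38, 2.09)       # about 13.9
thread_coarsening = improvement_percent(1.06, 1.01)           # about 5.0
thread_coarsening_transfer = improvement_percent(1.12, 1.01)  # about 10.9
```

Rounded to whole numbers these give 14%, 5% and 11%, matching the slide titles.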

slide-32
SLIDE 32

Try it for yourself!

http://chriscummins.cc/pact17

code and data on GitHub

runs in the browser

[Badge: Artifact Evaluated, PACT AEC. Consistent * Complete * Well Documented * Easy to Reuse]
slide-33
SLIDE 33

End-to-end Deep Learning of Optimisation Heuristics

  • Problem: feature design is hard
  • Featureless heuristics
  • First cross-domain learning
  • 11-14% speedups

http://chriscummins.cc/pact17