Efficient Program Compilation through Machine Learning Techniques

SLIDE 1

Efficient Program Compilation through Machine Learning Techniques

Gennady Pekhimenko (IBM Canada), Angela Demke Brown (University of Toronto)

SLIDE 2

Motivation

My cool program → Compiler (-O2: DCE, Peephole, Unroll, Inline) → Executable, in a few seconds.

But what to do if the executable is slow?

Replace -O2 with -O5: the compiler now applies many more transformations (Unroll, Inline, Peephole, DCE, … up to Optimization 100) → New Fast Executable, in 1-10 minutes.

SLIDE 3

Motivation (2)

Our cool Operating System → Compiler (-O2) → Executable, after 1 hour. Too slow!

Our cool Operating System → Compiler (-O5) → New Executable, after 20 hours.

We do not have that much time. Why did this happen?

SLIDE 4

Basic Idea

Do we need all these optimizations (Unroll, …, Optimization 100) for every function? Probably not. Compiler writers can typically solve this problem, but how?

  • 1. Describe every function
  • 2. Classify functions based on the description
  • 3. Apply only certain optimizations for each class

Machine learning is well suited to solving this kind of problem.
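The three steps above can be sketched in a few lines. Everything here (the features, the class names, and the per-class optimization sets) is a hypothetical illustration, not the actual system:

```python
# Sketch of the basic idea: describe each function, classify it,
# and apply only the optimizations chosen for its class.
# All names below are illustrative placeholders.

def describe(function_ir):
    """Step 1: build a feature description of a function."""
    return {
        "num_insts": len(function_ir),
        "num_branches": sum(1 for op in function_ir if op == "br"),
    }

def classify(features):
    """Step 2: map the description to a class (here, a trivial hand rule;
    the real system learns this mapping instead)."""
    branchy = features["num_branches"] > features["num_insts"] // 4
    return "branchy" if branchy else "straight"

# Step 3: each class gets only the optimizations that pay off for it.
OPTS_BY_CLASS = {
    "branchy": ["if-conversion", "peephole"],
    "straight": ["unroll", "inline", "peephole"],
}

ir = ["load", "add", "br", "store", "br", "mul", "load", "add"]
opts = OPTS_BY_CLASS[classify(describe(ir))]
```

In the real system the description is a much richer feature vector and the class-to-optimizations mapping is learned, not hard-coded.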

SLIDE 5

Overview

  • Motivation
  • System Overview
  • Experiments and Results
  • Related Work
  • Conclusions
  • Future Work
SLIDE 6

Initial Experiment

3X difference on average

SLIDE 7

Initial Experiment (2)

[Chart: SPEC2000 execution time (secs) at -O3 vs. -qhot -O3 for bzip2, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, applu, art, equake, facerec, fma3d, galgel, lucas, mesa, mgrid, sixtrack, swim, wupwise]

SLIDE 8

Our System

Prepare (offline)
  • extract features
  • modify heuristic values
  • choose transformations
  • find hot methods

Gather Training Data (offline)
  • compile, measure run time

Learn (offline)
  • Logistic Regression Classifier → best feature settings (the classification parameters)

Deploy (online)
  • TPO/XL Compiler sets heuristic values

SLIDE 9

Data Preparation

Three key elements:

  • Feature extraction
  • Heuristic value modification
  • Target set of transformations

Features:
  • Total # of insts
  • Loop nest level
  • # and % of Loads, Stores, Branches
  • Loop characteristics
  • Float and Integer # and %

Heuristic values:
  • The existing XL compiler is missing this functionality
  • An extension was made to the existing Heuristic Context approach

Transformations:
  • Unroll
  • Wandwaving
  • If-conversion
  • Unswitching
  • CSE
  • Index Splitting
  • …
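As an illustration of the feature-extraction element, a minimal sketch that computes the counts and percentages listed above from a toy instruction list (the IR encoding and feature names are assumptions for illustration only):

```python
def extract_features(insts, loop_nest_level):
    """Compute a feature vector of the kind described above:
    total instruction count, loop nesting depth, and the count
    and percentage of loads, stores, and branches."""
    total = len(insts)
    features = {"total_insts": total, "loop_nest_level": loop_nest_level}
    for kind in ("load", "store", "branch"):
        count = sum(1 for i in insts if i == kind)
        features[f"num_{kind}"] = count
        features[f"pct_{kind}"] = 100.0 * count / total if total else 0.0
    return features

fv = extract_features(["load", "add", "store", "branch", "load"],
                      loop_nest_level=2)
```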
SLIDE 10

Gather Training Data

  • Try to “cut” transformations backwards, from last to first (e.g. Wandwaving, then Unroll, then Late Inlining)
  • If the run time is not worse than before, the transformation can be skipped
  • Otherwise we keep it
  • We do this for every hot function of every test

The main benefit is linear complexity.
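The backwards “cutting” pass can be sketched as follows; `fake_run_time` is a stand-in for actually compiling and timing the program, not the real harness. Each transformation is tried for removal exactly once, which is what makes the complexity linear in the number of transformations:

```python
def prune_transformations(transformations, run_time):
    """Walk the transformation list backwards (last to first);
    drop each one whose removal does not make run time worse,
    otherwise keep it."""
    kept = list(transformations)
    best = run_time(kept)
    for t in reversed(transformations):
        trial = [x for x in kept if x != t]
        time = trial_time = run_time(trial)
        if trial_time <= best:      # not worse than before: t can be skipped
            kept, best = trial, time
        # otherwise we keep t
    return kept

# Toy cost model (hypothetical): only "unroll" actually helps here.
def fake_run_time(opts):
    return 100 - (20 if "unroll" in opts else 0)

result = prune_transformations(["late-inlining", "unroll", "wandwaving"],
                               fake_run_time)
```

With this toy model the pass discards late-inlining and wandwaving and keeps only unroll, after one timing per transformation.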

SLIDE 11

Learn with Logistic Regression

Input: function descriptions, best heuristic values
    ↓
Classifier: Logistic Regression (alternatives considered: Neural Networks, Genetic Programming)
    ↓
Output: .hpredict files → Compiler + Heuristic Values
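A from-scratch sketch of logistic regression (the classifier chosen here), trained by stochastic gradient descent to predict whether a transformation should be applied given a function's feature vector. The training data is made up, and this is a generic sketch, not the TPO/XL implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, epochs=1000, lr=0.1):
    """Fit weights so sigmoid(w . x + b) approximates the label
    (1 = apply the transformation, 0 = skip it)."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y                      # gradient of log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5

# Toy data: the transformation pays off only when the first feature is high.
X = [[0.1, 1.0], [0.2, 0.8], [0.9, 1.0], [0.8, 0.7]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
```

The learned weights play the role of the values written to the .hpredict files consumed at deployment time.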

SLIDE 12

Deployment

Online phase, for every function:

  • Calculate the feature vector
  • Compute the prediction
  • Use this prediction as heuristic context

Overhead is negligible

SLIDE 13

Overview

  • Motivation
  • System Overview
  • Experiments and Results
  • Related Work
  • Conclusions
  • Future Work
SLIDE 14

Experiments

Benchmarks: SPEC2000, plus others from IBM customers
Platform: IBM server, 4 × Power5 @ 1.9 GHz, 32 GB RAM, running AIX 5.3

SLIDE 15

Results: compilation time

[Chart: compilation time for Oracle and Classifier, normalized to baseline, for each SPEC2000 benchmark, plus GeoMean]

2x average speedup
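The average is a geometric mean over per-benchmark normalized compile times (the GeoMean bar in the chart). A quick sketch with made-up numbers showing how a 2x average speedup would be computed:

```python
import math

def geomean(xs):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical normalized compile times (classifier / baseline) per benchmark.
normalized = [0.4, 0.5, 0.625]
speedup = 1.0 / geomean(normalized)   # geomean 0.5 -> 2x average speedup
```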

SLIDE 16

Results: execution time

[Chart: SPEC2000 execution time (secs) for Baseline, Oracle, and Classifier across the benchmarks]

SLIDE 17

New benchmarks: compilation time

[Chart: compilation time with the Classifier, normalized to baseline, for the new benchmarks]

SLIDE 18

New benchmarks: execution time

[Chart: execution time (secs) for apsi, parser, twolf, dmo, argonne: Baseline vs. Classifier]

4% speedup

SLIDE 19

Overview

  • Motivation
  • System Overview
  • Experiments and Results
  • Related Work
  • Conclusions
  • Future Work
SLIDE 20

Related Work

  • Iterative Compilation
    • Pan and Eigenmann
    • Agakov, et al.
  • Single Heuristic Tuning
    • Calder, et al.
    • Stephenson, et al.
  • Multiple Heuristic Tuning
    • Cavazos, et al.
    • MILEPOST GCC
SLIDE 21

Conclusions and Future Work

  • 2x average compile time decrease
  • Future work
    • Execution time improvement
    • -O5 level
    • Performance Counters for better method description
  • Other benefits
    • Heuristic Context Infrastructure
    • Bug Finding
SLIDE 22

Thank you

  • Raul Silvera, Arie Tal, Greg Steffan, Mathew Zaleski
  • Questions?