slide-1
SLIDE 1

An Efficient Evolutionary Algorithm for Solving Incrementally Structured Problems

Jason Ansel, Maciej Pacula, Saman Amarasinghe, Una-May O’Reilly

MIT - CSAIL

July 14, 2011

Jason Ansel (MIT) PetaBricks July 14, 2011 1 / 30

slide-4
SLIDE 4

Who are we?

  • I do research in programming languages (PL) and compilers
  • The PetaBricks language is a collaboration between:
      • A PL / compiler research group
      • An evolutionary algorithms research group
      • An applied mathematics research group
  • Our goal is to make programs run faster
  • We use evolutionary algorithms to search for faster programs
  • The PetaBricks language defines search spaces of algorithmic choices

slide-8
SLIDE 8

A motivating example

How would you write a fast sorting algorithm?

  • Insertion sort
  • Quick sort
  • Merge sort
  • Radix sort
  • Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort?

Poly-algorithms

slide-10
SLIDE 10

std::stable_sort

/usr/include/c++/4.5.2/bits/stl_algo.h, lines 3350-3367

slide-13
SLIDE 13

Is 15 the right number?

  • The best cutoff (CO) changes
  • Depends on competing costs:
      • Cost of computation (< operator, call overhead, etc.)
      • Cost of communication (swaps)
      • Cache behavior (misses, prefetcher, locality)

Sorting 100000 doubles with std::stable_sort:

  • CO ≈ 200 optimal on a Phenom 905e (15% speedup over CO = 15)
  • CO ≈ 400 optimal on an Opteron 6168 (15% speedup over CO = 15)
  • CO ≈ 500 optimal on a Xeon E5320 (34% speedup over CO = 15)
  • CO ≈ 700 optimal on a Xeon X5460 (25% speedup over CO = 15)

If the best cutoff has changed, perhaps the best algorithm has also changed
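To make the role of the cutoff concrete, here is a minimal Python sketch of a merge sort that switches to insertion sort below a tunable cutoff. It is an illustration only, not the libstdc++ code referenced on the slide; the function names and the `cutoff` parameter are assumptions made for this example.

```python
def insertion_sort(items):
    """Sort a list in place with insertion sort and return it."""
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        # The strict ">" never moves an element past an equal one (stable).
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items


def hybrid_merge_sort(items, cutoff=15):
    """Merge sort that falls back to insertion sort below `cutoff` elements."""
    if len(items) < cutoff or len(items) < 2:
        return insertion_sort(list(items))  # copy, so the input is untouched
    mid = len(items) // 2
    left = hybrid_merge_sort(items[:mid], cutoff)
    right = hybrid_merge_sort(items[mid:], cutoff)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Timing `hybrid_merge_sort` for several `cutoff` values on one machine is the kind of experiment behind the per-processor numbers above; the best value found will differ from machine to machine.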

slide-15
SLIDE 15

Algorithmic choices

Language

either {
  InsertionSort(out, in);
} or {
  QuickSort(out, in);
} or {
  MergeSort(out, in);
} or {
  RadixSort(out, in);
}

⇒

Representation

Decision tree synthesized by our evolutionary algorithm

slide-16
SLIDE 16

Decision trees

Optimized for a Xeon E7340 (8 cores):

  • N < 600: Insertion Sort
  • 600 ≤ N < 1420: Quick Sort
  • N ≥ 1420: Merge Sort (2-way)

Text notation (will be used later): I 600 Q 1420 M2
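The text notation alternates algorithm names and size cutoffs, so it can be read mechanically. The sketch below is one way to do that in Python; `parse_selector` and `choose_algorithm` are hypothetical helper names for this example, not part of PetaBricks.

```python
import bisect


def parse_selector(notation):
    """Split a string such as 'I 600 Q 1420 M2' into (cutoffs, algorithms)."""
    tokens = notation.split()
    algorithms = tokens[0::2]                 # every other token, from the first
    cutoffs = [int(t) for t in tokens[1::2]]  # the numbers in between
    return cutoffs, algorithms


def choose_algorithm(notation, n):
    """Return the algorithm label the decision tree selects for input size n."""
    cutoffs, algorithms = parse_selector(notation)
    # bisect_right counts the cutoffs that are <= n, which is the index of
    # the first range whose upper bound is still greater than n.
    return algorithms[bisect.bisect_right(cutoffs, n)]
```

For example, `choose_algorithm("I 600 Q 1420 M2", 1000)` returns `"Q"`, because 1000 falls between the two cutoffs.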

slide-17
SLIDE 17

Decision trees

Optimized for Sun Fire T200 Niagara (8 cores):

  • N < 75: Merge Sort (16-way)
  • 75 ≤ N < 1461: Merge Sort (8-way)
  • 1461 ≤ N < 2400: Merge Sort (4-way)
  • N ≥ 2400: Merge Sort (2-way)

Text notation: M16 75 M8 1461 M4 2400 M2

slide-18
SLIDE 18

The configuration encoded by the genome

  • Decision trees
  • Algorithm parameters (integers, floats)
  • Parallel scheduling / blocking parameters (integers)
  • Synthesized scalar functions (not used in the benchmarks shown)

The average PetaBricks benchmark’s genome has:

  • 1.9 decision trees
  • 10.1 algorithm/parallelism/blocking parameters
  • 0.6 synthesized scalar functions
  • 2^3107 possible configurations

slide-19
SLIDE 19

Outline

1. PetaBricks Language
2. Autotuning Problem
3. INCREA
4. Evaluation
5. Conclusions

slide-22
SLIDE 22

PetaBricks programs at runtime

Program

Request Response

Configuration:

  • point in ~100D space

Measurement:

  • performance
  • accuracy (QoS)

Offline Autotuning


slide-23
SLIDE 23

The challenges

Evaluating the objective function is expensive

  • Must run the program (at least once)
  • More expensive for unfit solutions
  • Scales poorly with larger problem sizes

Fitness is noisy

  • Randomness from parallel races and system noise
  • Testing each candidate only once often produces a worse algorithm
  • Running many trials is expensive

Decision tree structures are complex

  • Theoretically infinite size
  • We artificially bound them to 2^736 bits (23 ints) each

slide-24
SLIDE 24

Contrast two evolutionary approaches

GPEA: General Purpose Evolutionary Algorithm

  • Used as a baseline

INCREA: Incremental Evolutionary Algorithm

  • Bottom-up approach
  • Noisy fitness evaluation strategy
  • Domain informed mutation operators

slide-37
SLIDE 37

General purpose evolutionary algorithm (GPEA)

Initial population:  72.7s  10.5s  4.1s  31.2s   Cost = 118.5
Generation 2:         4.2s   5.1s  2.6s  13.2s   Cost = 25.1
Generation 3:         2.8s   0.1s  3.8s   2.3s   Cost = 9.0
Generation 4:         0.3s   0.1s  0.4s   2.4s   Cost = 3.2

  • Cost of autotuning is front-loaded in the initial (unfit) population
  • We could speed up tuning if we start with a faster initial population

Key insight

Smaller input sizes can be used to form a better initial population
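The "front-loaded" claim can be checked directly from the run times shown for the four generations. The short Python sketch below sums each generation and computes the share of the total spent on the initial population; the variable names are chosen only for this example.

```python
# Candidate run times in seconds, copied from the four generations shown.
trial_times = {
    "Initial population": [72.7, 10.5, 4.1, 31.2],
    "Generation 2": [4.2, 5.1, 2.6, 13.2],
    "Generation 3": [2.8, 0.1, 3.8, 2.3],
    "Generation 4": [0.3, 0.1, 0.4, 2.4],
}

# Cost of a generation is the sum of the run times of its candidates.
costs = {name: sum(times) for name, times in trial_times.items()}
total_cost = sum(costs.values())
initial_share = costs["Initial population"] / total_cost

for name, cost in costs.items():
    print(f"{name}: {cost:.1f} s")
print(f"Share spent on the initial population: {initial_share:.0%}")
```

In this example roughly three quarters of the total testing time goes to the first, least fit population.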

slide-41
SLIDE 41

Bottom-up evolutionary algorithm

  • Train on input size 1, to form initial population for:
  • Train on input size 2, to form initial population for:
  • Train on input size 8, to form initial population for:
  • Train on input size 16, to form initial population for:
  • Train on input size 32, to form initial population for:
  • Train on input size 64

Naturally exploits optimal substructure of problems
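The bottom-up idea can be sketched as a loop over growing input sizes in which each stage starts from the survivors of the previous one. The Python below is a toy under stated assumptions, not INCREA itself: the genome is a single integer cutoff, the sizes simply double, and `simulated_cost` is a hypothetical stand-in for actually running and timing the program.

```python
import random


def simulated_cost(size):
    """Hypothetical stand-in for timing the program on inputs of `size`.

    The toy model pretends the best cutoff grows with the input size and
    saturates at 200; a real tuner would run and time the program instead.
    """
    target = min(size, 200)
    return lambda cutoff: abs(cutoff - target)


def evolve(population, fitness, rng, generations=5, keep=2):
    """Mutate every member each generation and keep the `keep` best."""
    for _ in range(generations):
        children = [max(1, int(c * rng.lognormvariate(0.0, 0.5)))
                    for c in population]
        candidates = set(population) | set(children)
        # Ties are broken by the cutoff value so the result is deterministic.
        population = sorted(candidates, key=lambda c: (fitness(c), c))[:keep]
    return population


def bottom_up(max_size, cost_for_size, rng, initial=(1, 4096)):
    """Tune at sizes 1, 2, 4, ..., max_size, reusing each stage's survivors."""
    population = list(initial)
    size = 1
    while size <= max_size:
        population = evolve(population, cost_for_size(size), rng)
        size *= 2
    return population
```

Calling `bottom_up(64, simulated_cost, random.Random(0))` returns the surviving cutoffs after the final stage. The point of the structure is that the expensive large-input stages never have to test a completely unfit population.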

slide-42
SLIDE 42

Noisy fitness evaluation

  • Both strategies terminate slow tests early
  • GPEA uses 1 trial per candidate algorithm
  • INCREA adaptively changes the number of trials
      • Represents fitness as a probability distribution
      • Runs a single-tailed t-test to get confidence in differences
      • Runs more trials if confidence is low
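One way to picture the adaptive strategy is a comparison routine that keeps adding trials until a t statistic is decisive or a trial budget runs out. The Python sketch below is a simplified illustration, not INCREA's implementation: it uses Welch's t statistic and a fixed critical value (`t_critical`, an assumption standing in for a proper t-distribution lookup), and `run_a` / `run_b` are hypothetical callables that return one timing each.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a); positive means a looks faster."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    diff = statistics.mean(b) - statistics.mean(a)
    if se == 0.0:
        # No measured noise at all: any nonzero difference is decisive.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def compare(run_a, run_b, min_trials=3, max_trials=20, t_critical=2.0):
    """Return (verdict, trials used per candidate)."""
    a = [run_a() for _ in range(min_trials)]
    b = [run_b() for _ in range(min_trials)]
    while True:
        t = welch_t(a, b)
        if t > t_critical:
            return "a faster", len(a)
        if t < -t_critical:
            return "b faster", len(a)
        if len(a) >= max_trials:
            return "no significant difference", len(a)
        # Confidence is still low, so spend one more trial on each candidate.
        a.append(run_a())
        b.append(run_b())
```

Candidates that differ clearly are separated after the minimum number of trials, while near-ties consume more trials, which is where the extra measurement effort is actually useful.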

slide-43
SLIDE 43

Domain informed mutation operators

Mutation operators deal with larger structures in the genome

  • “Add algorithm Y to the top of decision tree X”
  • “Scale cutoff X using a lognormal distribution”

Generated fully automatically by our compiler
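The second operator can be sketched in a few lines of Python. This is an assumed form of the mutation, not the code the compiler generates: `scale_cutoff`, `sigma`, and the bounds are names and values chosen for this example. Multiplying by a lognormal factor with `mu = 0` makes halving and doubling equally likely, which suits cutoffs that range over several orders of magnitude.

```python
import random


def scale_cutoff(cutoff, rng, sigma=0.5, lower=1, upper=2**30):
    """Return `cutoff` multiplied by a lognormally distributed factor.

    With mu = 0 the median factor is 1, so the mutation is as likely to
    shrink the cutoff as to grow it. The result is clamped to
    [lower, upper] and rounded to an integer.
    """
    factor = rng.lognormvariate(0.0, sigma)
    return min(upper, max(lower, round(cutoff * factor)))
```

For example, `scale_cutoff(600, random.Random(1))` returns a new integer cutoff near 600; larger `sigma` values produce bigger jumps.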

slide-45
SLIDE 45

Experimental Setup

Measuring convergence time

  • Important to both program users and developers
  • Vital in online autotuning

Three fixed-accuracy PetaBricks programs:

  • Sort 2^20 (small input size)
  • Sort 2^23 (large input size)
  • Matrix multiply
  • Eigenvector solve

  • Representative runs
  • Average of 30 runs, with tests for statistical significance in paper
  • Run on an 8-core Xeon running Debian 5.0

slide-46
SLIDE 46

Sort 2^20: training input size

[Plot: Training Input Size (y-axis, log scale from 1 to 1e+07) versus Training Time (x-axis, 60 to 1920); series: INCREA, GPEA]

slide-47
SLIDE 47

Sort 2^20: candidates tested

[Plot: Tests Conducted (y-axis, log scale from 100 to 10000) versus Training Time (x-axis, 60 to 1920); series: INCREA, GPEA]

slide-48
SLIDE 48

Sort 2^20: performance

[Plot: Best Candidate (s) (y-axis, 0.1 to 0.5) versus Training Time (x-axis, 60 to 1920); series: INCREA, GPEA]

slide-49
SLIDE 49

Sort 2^23: performance

[Plot: Best Candidate (s) (y-axis, 0.2 to 1) versus Training Time (x-axis, 60 to 14580); series: INCREA, GPEA]

slide-50
SLIDE 50

INCREA: Sort, best algorithm at each generation

Input size   Training time (s)   Genome
2^0          6.9                 Q 64 Qp
2^1          14.6                Q 64 Qp
2^2          26.6                I
...
2^7          115.7               I
2^8          138.6               I 270 R 1310 Rp
2^9          160.4               I 270 Q 1310 Qp
2^10         190.1               I 270 Q 1310 Qp
2^11         216.4               I 270 Q 3343 Qp
2^12         250.0               I 189 R 13190 Rp
2^13         275.5               I 189 R 13190 Rp
2^14         307.6               I 189 R 17131 Rp
2^15         341.9               I 189 R 49718 Rp
2^16         409.3               I 189 R 124155 M2
2^17         523.4               I 189 Q 5585 Qp
2^18         642.9               I 189 Q 5585 Qp
2^19         899.8               I 456 Q 5585 Qp
2^20         1313.8              I 456 Q 5585 Qp

I = insertion-sort, Q = quick-sort, R = radix-sort, Mx = x-way merge-sort; p indicates run in parallel

slide-56
SLIDE 56

GPEA: Sort, best algorithm at each generation

Generation   Training time (s)   Genome
0            91.4                I 448 R
1            133.2               I 413 R
2            156.5               I 448 R
3            174.8               I 448 Q
4            192.0               I 448 Q
5            206.8               I 448 Q
6            222.9               I 448 Q 4096 Qp
7            238.3               I 448 Q 4096 Qp
8            253.0               I 448 Q 4096 Qp
9            266.9               I 448 Q 4096 Qp
10           281.1               I 371 Q 4096 Qp
11           296.3               I 272 Q 4096 Qp
12           310.8               I 272 Q 4096 Qp
...
27           530.2               I 272 Q 4096 Qp
28           545.6               I 272 Q 4096 Qp
29           559.5               I 370 Q 8192 Qp
30           574.3               I 370 Q 8192 Qp
...

I = insertion-sort, Q = quick-sort, R = radix-sort, Mx = x-way merge-sort; p indicates run in parallel

slide-57
SLIDE 57

Matrix Multiply (input size 1024x1024)

[Plot: Best Candidate (s) (y-axis, 0.5 to 3) versus Training Time (x-axis, 60 to 7680); series: INCREA, GPEA]

slide-58
SLIDE 58

Eigenvector Solve (input size 1024x1024)

[Plot: Best Candidate (s) (y-axis, 1 to 3) versus Training Time (x-axis, 60 to 15360); series: INCREA, GPEA]

slide-60
SLIDE 60

Conclusions

Take away

The technique of solving incrementally structured problems by exploiting knowledge from smaller problem instances may be more broadly applicable.

Take away

PetaBricks is a useful framework for comparing techniques for autotuning programs.


slide-61
SLIDE 61

Thanks!

Questions? http://projects.csail.mit.edu/petabricks/
