PetaBricks
A Language and Compiler for Algorithmic Choice
Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, Saman Amarasinghe
MIT - CSAIL
June 16, 2009
Jason Ansel (MIT) PetaBricks June 16, 2009 1 / 47
Outline
1. Introduction: Motivating Example, Language & Compiler Overview, Why choices
2. PetaBricks Language: Key Ideas, Compilation Example, Other Language Features
3. Results: Benchmarks, Scalability, Variable Accuracy
4. Conclusion: Final thoughts
Introduction Motivating Example
Algorithmic choice
Candidate algorithms: Quicksort, Insertionsort, Radixsort, Mergesort (N-way)
[Figure: hybrid sort composition per target, with @ marking input-size cutoffs between algorithms and N the mergesort fan-out]
STL algorithm: @15, N=2
Optimized for:
- Xeon (1 core): @98, @75, N=4
- Xeon (8 cores): @1420, @ 6, N=2
- Niagara (8 cores): @75, @1461, @2400, N=2,4,8,16
- Core 2 (2 cores): @150, @600, @1295, @38400, N=2,4,8
The PetaBricks language
- The case for autotuning is obvious
- How should the programmer represent choice?
- We present the PetaBricks programming language and compiler:
  - Choice as a fundamental language construct
  - Autotuning performed by the compiler
  - Automatically parallelized
Sort in PetaBricks

    transform Sort
    from A[n]
    to B[n]
    {
      from (A a) to (B b) {
        tunable WAYS;
        /* Mergesort */
      } or {
        /* Insertionsort */
      } or {
        /* Radixsort */
      } or {
        /* Quicksort */
      }
    }
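To make the idea of algorithmic choice concrete, here is a minimal Python sketch of a sort that picks its algorithm by input size. It is not PetaBricks code and not what the compiler generates. The names `hybrid_sort`, `CONFIG`, and `WAYS`, and the cutoff values, are assumptions made for illustration; in PetaBricks the autotuner would choose them.

```python
import heapq

# Hypothetical configuration: (upper size bound, algorithm name), checked in order.
CONFIG = [(16, "insertion"), (float("inf"), "merge")]
WAYS = 4  # hypothetical fan-out, standing in for `tunable WAYS`


def insertion_sort(xs):
    """Sort by repeated insertion; cheap for very small inputs."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def hybrid_sort(xs, config=CONFIG, ways=WAYS):
    """Pick an algorithm by input size, recursing through the same choice."""
    n = len(xs)
    ways = max(2, ways)  # a 1-way split would never shrink the problem
    algo = config[-1][1]
    for bound, name in config:
        if n < bound:
            algo = name
            break
    if algo == "insertion" or n <= 1:
        return insertion_sort(xs)
    # N-way mergesort: split into `ways` chunks, sort each chunk with
    # whatever the configuration selects for its size, then merge.
    step = -(-n // ways)  # ceiling division
    parts = [hybrid_sort(xs[i:i + step], config, ways) for i in range(0, n, step)]
    return list(heapq.merge(*parts))
```

Changing `CONFIG` or `WAYS` changes which algorithms run at which sizes without touching the sorting code, which is the property the `or` blocks above expose to the tuner.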
Introduction Language & Compiler Overview
The PetaBricks compiler
- Sort is compiled into an autotuning binary
- Trained on target architecture
  - Structured genetic tuner
  - Trained with full number of threads
  - Under 1 minute for Sort
- Results fed back into the compiler
- Final binary created
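The training step can be pictured with a much simpler stand-in for the structured genetic tuner: time every candidate on each training input and record the fastest. The sketch below does exactly that in Python. The function names `measure_seconds` and `train` are hypothetical, and exhaustive timing is an assumption chosen for brevity, not the tuner the slides describe.

```python
import time


def measure_seconds(func, data, repeats=3):
    """Best-of-`repeats` wall-clock time of func on a fresh copy of data."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(list(data))
        best = min(best, time.perf_counter() - start)
    return best


def train(candidates, training_inputs, measure=measure_seconds):
    """Return {input size: name of the fastest candidate} per training input.

    `candidates` maps a name to a callable. `measure` is injectable so the
    selection logic can be exercised without real timing noise.
    """
    choices = {}
    for data in training_inputs:
        timings = {name: measure(func, data) for name, func in candidates.items()}
        choices[len(data)] = min(timings, key=timings.get)
    return choices
```

The returned dictionary plays the role of the results that are fed back into the compiler to produce the final binary.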
Sort algorithm timings (on an 8-way Xeon E7340 system)
[Plot: time (s), 0 to 0.0025, versus input size, 250 to 1750, for InsertionSort, QuickSort, MergeSort, RadixSort, and Autotuned]
Timings on different architectures
Trained on: Mobile, Xeon 1-way, Xeon 8-way, Niagara
Run on Mobile: 1.67x, 1.47x
Run on Xeon 1-way: 1.61x, 2.50x
Run on Xeon 8-way: 1.59x, 2.14x
Run on Niagara: 1.12x, 1.51x, 1.08x
Introduction Why choices
Early compilers
[Diagram: compiler stages labeled Constrained Input Language (No choices), Parsing, Code Gen]
- Early computers (and compilers) were weak
- Parsing and code generation dominated compilation
- Needed a constrained input language to simplify compilation
Current compilers
[Diagram: compiler stages labeled Constrained Input Language (No choices), Parsing, Exposing Choices, Decisions, Code Gen]
- Current computers are much more powerful
- Compilers can do a lot more
- Input language is still constraining
- Compilation dominated by exposing choices
- Input language specifies only one:
  - Algorithmic choice
  - Iteration order choice
  - Parallelism strategy choice
  - Data layout choice
- Compiler must perform heroic analysis to reconstruct the other choices
PetaBricks compiler
[Diagram: compiler stages labeled Rich Input Language (w/ choices), Parsing, Exploring Choices & Making Decisions, Code Gen]
- We propose explicit choices in the language
- The programmer defines the space of legal:
  - Algorithmic choices
  - Iteration orders (including parallel)
  - Data layouts
- Allow compilers to focus on exploring choices
- Compiler no longer needs to reconstruct choices
Future-proof programs
- The result: programs can adapt to their environment
- Choices make programs less brittle
- Programs change with architecture, available cores, inputs, etc.
PetaBricks Language Key Ideas
Algorithmic choice in the language
- Algorithmic choice is the key aspect of PetaBricks
- Programmer can define multiple rules to compute the same data
- Compiler reuses rules to create hybrid algorithms
- Can express choices at many different granularities
Synthesized outer control flow
- Outer control flow synthesized by compiler
- Another choice that the programmer should not make:
  - By rows?
  - By columns?
  - Diagonal?
  - Reverse order?
  - Blocked?
  - Parallel?
- Instead programmer provides explicit producer-consumer relations
- Allows compiler to explore choice space
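One way to see why producer-consumer relations are enough to pick an iteration order: given what each cell consumes, any order in which producers run before consumers is legal. The Python sketch below checks candidate orders against a hypothetical rule in which each cell reads its upper and left neighbors. The rule and all function names are assumptions for illustration, not part of PetaBricks.

```python
def by_rows(h, w):
    """Visit cells row by row."""
    return [(r, c) for r in range(h) for c in range(w)]


def by_columns(h, w):
    """Visit cells column by column."""
    return [(r, c) for c in range(w) for r in range(h)]


def consumes(cell):
    """Hypothetical rule: each cell reads its upper and left neighbors."""
    r, c = cell
    neighbors = ((r - 1, c), (r, c - 1))
    return [(rr, cc) for rr, cc in neighbors if rr >= 0 and cc >= 0]


def is_legal(order, consumes):
    """True if every cell is visited after all the cells it consumes."""
    produced = set()
    for cell in order:
        if any(dep not in produced for dep in consumes(cell)):
            return False
        produced.add(cell)
    return True
```

For this rule both `by_rows` and `by_columns` are legal while the reversed row order is not, so a compiler that knows the relations can choose freely among the legal orders.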
PetaBricks Language Compilation Example
Simple example program

    transform RollingSum
    from A[n]
    to B[n]
    {
      // rule 0: use the previously computed value
      B.cell(i) from (A.cell(i) a,
                      B.cell(i-1) leftSum) {
        return a+leftSum;
      }

      // rule 1: sum all elements to the left
      B.cell(i) from (A.region(0, i) in) {
        return sum(in);
      }
    }

[Diagrams: arrays A and B, illustrating rule 0 and rule 1 in turn]
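The two RollingSum rules can be mirrored in plain Python to show that either rule, or any per-cell mix of them, yields the same output. This is a sketch under assumptions: the helper names are invented here, and rule 1 is assumed to include element `i` in its sum so that it agrees with rule 0.

```python
def rule0(a, b, i):
    """Use the previously computed value; needs B[i-1], so valid only for i >= 1."""
    return a[i] + b[i - 1]


def rule1(a, b, i):
    """Sum the elements of A up to and including i (assumed inclusive)."""
    return sum(a[: i + 1])


def rolling_sum(a, choose):
    """Fill B left to right; `choose(i)` returns the rule to use for cell i >= 1."""
    b = [0] * len(a)
    for i in range(len(a)):
        # Cell 0 has no left neighbor in B, so only rule 1 applies there.
        rule = rule1 if i == 0 else choose(i)
        b[i] = rule(a, b, i)
    return b
```

Rule 0 does constant work per cell but forces a left-to-right order, while rule 1 does more work per cell but leaves the cells independent, which is the kind of trade-off the autotuner is meant to weigh.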
Applicable regions
Compilation process: applicable regions, choice grids, choice dependency graph

    // rule 0: use the previously computed value
    B.cell(i) from (A.cell(i) a,
                    B.cell(i-1) leftSum) { return a+leftSum; }

Applicable where 1 ≤ i < n

    // rule 1: sum all elements to the left
    B.cell(i) from (A.region(0, i) in) { return sum(in); }

Applicable where 0 ≤ i < n
Choice grids
- Divide data space into symbolic regions with common sets of choices
- In this simple example:
  - A: Input (no choices)
  - B: [0, 1) = rule 1
  - B: [1, n) = rule 0 or rule 1
- Applicable regions map rules → symbolic data
- Choice grids map symbolic data → rules
[Diagram: B split at 1 and n, with R1 on the first segment and R0 or R1 on the second]
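The inversion from applicable regions (rules to data) into a choice grid (data to rules) can be sketched in a few lines of Python. The compiler works on symbolic bounds; this illustration uses a concrete `n` instead, and the function name `choice_grid` and its input format are assumptions.

```python
def choice_grid(applicable, n):
    """Invert {rule: (lo, hi)} half-open ranges into segments of [0, n).

    Returns a list of ((lo, hi), [rule names]) where each segment is a
    maximal run of indices sharing the same set of applicable rules.
    """
    points = {0, n}
    for lo, hi in applicable.values():
        points.update(p for p in (lo, hi) if 0 <= p <= n)
    points = sorted(points)

    grid = []
    for lo, hi in zip(points, points[1:]):
        rules = sorted(
            name for name, (a, b) in applicable.items() if a <= lo and hi <= b
        )
        if grid and grid[-1][1] == rules:
            # Same rule set as the previous segment: extend it.
            grid[-1] = ((grid[-1][0][0], hi), rules)
        else:
            grid.append(((lo, hi), rules))
    return grid
```

With rule 0 applicable on [1, n) and rule 1 on [0, n), the result for `n = 8` has two segments, [0, 1) with only rule 1 and [1, 8) with both rules, matching the grid for B listed above.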
Choice dependency graph
[Graph: nodes A.region(0, n), B.region(0, 1) (choices: r1), and B.region(1, n) (choices: r0, r1); edge labels (r0,=,-1) and (r1,<=),(r0,=), each appearing on two edges]
- Adds dependency edges between symbolic regions
- Edges annotated with directions and rules
- Many compiler passes on this IR to:
  - Simplify complex dependency patterns
  - Add choices
Code generation
[Diagram: PetaBricks Source Code, PetaBricks Compiler, Autotuning Binary, Choice Configuration File, Final Binary]
1. PetaBricks source code is compiled
2. An autotuning binary is created
3. Autotuning occurs, creating a choice configuration file
4. Choices are fed back into the compiler to create a final binary
Autotuning
- Based on two building blocks:
  - A genetic tuner
  - An n-ary search algorithm
- Flat parameter space
- Compiler generates a dependency graph describing this parameter space
- Entire program tuned from bottom up
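The slides do not spell out the n-ary search, so the following Python sketch shows one plausible reading: a generalization of binary search that probes `n` points per round to minimize a cost over an integer range, such as a cutoff value. It assumes the cost is strictly unimodal, and the name `nary_search` and its signature are hypothetical.

```python
def nary_search(cost, lo, hi, n=4):
    """Return an integer in [lo, hi] minimizing `cost`.

    Assumes `cost` is strictly unimodal on the range (decreasing, then
    increasing). Each round evaluates `n` evenly spaced probes and keeps
    only the interval between the neighbors of the cheapest probe.
    """
    n = max(2, n)  # a single probe could not narrow the interval
    while hi - lo > n:
        probes = [lo + (hi - lo) * k // (n + 1) for k in range(1, n + 1)]
        costs = [cost(p) for p in probes]
        best = costs.index(min(costs))
        new_lo = probes[best - 1] if best > 0 else lo
        new_hi = probes[best + 1] if best < n - 1 else hi
        lo, hi = new_lo, new_hi
    # The remaining interval is small enough to scan directly.
    return min(range(lo, hi + 1), key=cost)
```

In a tuner, `cost` would be a timing measurement of the program run with the candidate parameter value, so fewer evaluations per round trade off against more rounds.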
Parallel Runtime Library
- Task-based parallel runtime
- Thread-local deques of runnable tasks
- Use a work-stealing algorithm similar to that of Cilk
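The work-stealing discipline can be illustrated with a single-threaded Python simulation: each worker takes the newest task from its own queue, and an idle worker takes the oldest task from another worker. This is only a sketch of the scheduling policy. The `Worker` class and `run` function are hypothetical, and nothing here is concurrent or taken from the PetaBricks runtime.

```python
from collections import deque


class Worker:
    """Holds a double-ended queue of runnable tasks (here, plain labels)."""

    def __init__(self):
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)  # owner adds at the newest end

    def pop(self):
        # Owner takes the newest task (LIFO), or None if idle.
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # Thief takes the victim's oldest task, or None if there is none.
        return victim.tasks.popleft() if victim.tasks else None


def run(workers):
    """Round-robin simulation; returns (worker index, task) in execution order."""
    log = []
    while any(w.tasks for w in workers):
        for i, w in enumerate(workers):
            task = w.pop()
            if task is None:
                for victim in workers:
                    if victim is not w:
                        task = w.steal_from(victim)
                        if task is not None:
                            break
            if task is not None:
                log.append((i, task))
    return log
```

Taking from opposite ends keeps the owner on recently created work while thieves remove older tasks, so the two rarely contend for the same entry.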
PetaBricks Language Other Language Features
More PetaBricks features
- Automatic consistency checking
- The tunable keyword
- Call external code
- Custom training data generators
- Matrix versions for iterative algorithms
- Rule priorities
- where clause (for limiting applicable regions)
- Template transforms
Results Benchmarks
Eigenvector Solve
- Bisection
- QR decomposition
- Divide and conquer
[Plot: time (s), 0.02 to 0.12, versus input size, 200 to 1000, for Bisection, DC, QR, Autotuned, and Cutoff 25]
Matrix Multiply
- Basic
- Recursive decompositions
- Strassen's algorithm
- Iteration order (blocking)
- Transpose
[Plot: time (s), 1e-06 to 10000, versus input size, 1 to 10000, both on log axes, for Basic, Blocking, Transpose, Recursive, Strassen 256, and Autotuned]
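Two of the matrix multiply choices listed above, the basic algorithm and the transpose variant, can be written side by side to show that they are interchangeable rules for the same output. The pure-Python functions below are illustrative assumptions and make no claim about how the PetaBricks benchmark is written.

```python
def matmul_basic(a, b):
    """Textbook triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def matmul_transposed(a, b):
    """Transpose B first so both operands are read along rows."""
    bt = [list(col) for col in zip(*b)]
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]
```

Both functions return the same matrix; they differ only in memory access pattern, which is why the better one depends on the machine and the input size.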
Results Scalability
Scalability
[Figure: speedup (1 to 8) versus number of threads (1 to 8). Series: Autotuned Matrix Multiply, Autotuned Sort, Autotuned Poisson, Autotuned Eigenvector Solve.]
Results Variable Accuracy
Variable accuracy
Most algorithms produce exact solutions
Large class of algorithms can produce approximate solutions
  Iterative convergence
  Grid coarsening
  Others
Compiler/autotuner should be aware of variable accuracy
Compiler can examine optimal frontier of algorithms
Poisson’s equation
A variable accuracy benchmark
Accuracy level expressed as a template parameter
Autotuner exploits variable accuracy in a general way
Choices:
  Direct solve
  Jacobi iteration
  Successive over-relaxation (SOR)
  Multigrid
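To make two of the iterative choices concrete, here is a minimal Python sketch of Jacobi iteration and SOR. It is an illustration under stated assumptions, not the benchmark's implementation: it solves the 1D model problem -u'' = f on (0, 1) with zero boundary values, and the function names, the stopping rule (largest update below `tol`), and the parameters are all hypothetical.

```python
import math


def jacobi(f, n, tol=1e-10, max_iters=100000):
    """Jacobi iteration for -u'' = f with n interior points and u(0) = u(1) = 0.
    f and the returned u have n + 2 entries (boundary points included).
    Returns (u, iterations used)."""
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for it in range(1, max_iters + 1):
        new = u[:]
        for i in range(1, n + 1):
            # Every update reads only values from the previous sweep.
            new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        change = max(abs(x - y) for x, y in zip(new, u))
        u = new
        if change < tol:
            return u, it
    return u, max_iters


def sor(f, n, omega, tol=1e-10, max_iters=100000):
    """Successive over-relaxation: an in-place (Gauss-Seidel) sweep whose
    update is scaled by the relaxation factor omega."""
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for it in range(1, max_iters + 1):
        change = 0.0
        for i in range(1, n + 1):
            gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new = u[i] + omega * (gauss_seidel - u[i])
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            return u, it
    return u, max_iters
```

For this model problem the textbook relaxation factor is `omega = 2 / (1 + math.sin(math.pi / (n + 1)))`. Loosening `tol` trades accuracy for fewer sweeps, which is the kind of knob a variable-accuracy autotuner can exploit.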
Choices in Multigrid
[Figure: multigrid cycle diagram. Vertical axis: grid size (128, 64, 32, 16); horizontal axis: time. Labels: SOR Iteration, Direct Solve.]
SOR is an iterative algorithm
Multigrid changes grid coarseness to speed up convergence
Many standard shapes: V-Cycle, W-Cycle, etc.
Direct solver
Different shapes = different algorithms
Autotuned V-cycle shapes for different accuracy requirements
[Figure: autotuned cycle shapes for accuracy levels 10^1, 10^3, 10^5, and 10^7. Vertical axis: grid size (2048, 1024, 512, 256, 128, 64, 32, 16).]
Dynamic programming technique for autotuning Multigrid
[Figure: diagram relating grid size i to grid size 2i]
Partition accuracy space into discrete levels
Base space of candidate algorithms on optimal algorithms from coarser level
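The two steps above can be sketched as a bottom-up table fill. The Python below is a toy illustration, not the PetaBricks autotuner: the real tuner times candidates and measures their accuracy on training inputs, whereas here `direct_time`, `sor_time`, `smooth_time`, and the rule that a cycle through a coarse plan of level q gains q/2 decades of accuracy are all hypothetical stand-ins chosen only to keep the example deterministic.

```python
# Discrete accuracy levels, stored as exponents: level p means accuracy 10^p.
ACCURACY_LEVELS = [1, 3, 5, 7]


def direct_time(n):
    """Hypothetical cost of a direct solve on an n x n grid."""
    return 1e-9 * n ** 4


def sor_time(n, p):
    """Hypothetical cost of running SOR on an n x n grid until level p is reached."""
    return 1e-9 * p * n ** 3


def smooth_time(n):
    """Hypothetical cost of the smoothing and grid-transfer work in one cycle."""
    return 4e-9 * n ** 2


def tune(sizes):
    """Fill best[(n, p)] = (time, plan), going from coarse to fine grids.
    `sizes` must be in increasing order, each twice the previous one."""
    best = {}
    for idx, n in enumerate(sizes):
        for p in ACCURACY_LEVELS:
            candidates = [(direct_time(n), "direct"),
                          (sor_time(n, p), "sor")]
            if idx > 0:
                coarse = sizes[idx - 1]
                for q in ACCURACY_LEVELS:
                    # Assumed model: one cycle through the coarse plan of
                    # level q gains q/2 decades, so ceil(2p / q) cycles.
                    cycles = (2 * p + q - 1) // q
                    cost = cycles * (smooth_time(n) + best[(coarse, q)][0])
                    plan = "%d x cycle via %d @ 10^%d" % (cycles, coarse, q)
                    candidates.append((cost, plan))
            # Only the optimal plan per (size, level) is kept, so the next
            # finer grid searches over tuned sub-plans, not all combinations.
            best[(n, p)] = min(candidates)
    return best
```

A call such as `tune([16, 32, 64, 128])` returns one plan per grid size and accuracy level. Because each finer level considers only the winners from the level below, the search grows with the number of levels times the number of accuracy bins rather than with the number of possible cycle shapes.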
Poisson’s Equation
[Figure: run time (s, 1e-05 to 10000) versus input size (1 to 1000), logarithmic axes. Series: Direct, Jacobi, SOR, Multigrid, Autotuned.]
Conclusion Final thoughts
Related work
Languages
  Sequoia
Libraries & domain specific tuners
  STAPL
  ATLAS
  FFTW
  SPARSITY
  SPIRAL
  ...
For more information
PetaBricks makes programs future-proof by allowing them to adapt to new architectures
We plan to release PetaBricks at the end of summer
Sign up for our mailing list to be notified
For more information see: http://projects.csail.mit.edu/petabricks/
Questions?
Thank you!