
PetaBricks: A Language and Compiler for Algorithmic Choice


  1. PetaBricks: A Language and Compiler for Algorithmic Choice
     Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, Saman Amarasinghe
     Presentation: Thomas Etter

  2. Motivating example: sorting numbers
     - Algorithms: K-way MergeSort, RadixSort, QuickSort, InsertionSort
     - Different characteristics
     - Composing the best hybrid sort

  3. Motivating example: K-way MergeSort
     - Input: 6 8 0 5 3 1 7 4
     - 4-way split: 6 8 | 0 5 | 3 1 | 7 4
     - Sort parts: 6 8 | 0 5 | 1 3 | 4 7
     - 4-way merge: 0 1 3 4 5 6 7 8

  4. Motivating example: RadixSort
     - Input: 6 8 0 5 3 1 7 4
     - Look at top N bits: 0 3 1 6 5 7 4 8
     - Sort parts: 0 1 3 4 5 6 7 8

  5. Motivating example: QuickSort
     - Input: 6 8 0 5 3 1 7 4
     - Partition by pivot: 1 3 0 5 8 6 7 4
     - Swap pivot/center: 1 3 0 4 8 6 7 5
     - Sort parts: 0 1 3 4 5 6 7 8

  6. Motivating example: InsertionSort (array state after each step)
     - 6 8 0 5 3 1 7 4
     - 6 8 0 5 3 1 7 4
     - 6 8 0 5 3 1 7 4
     - 0 6 8 5 3 1 7 4
     - 0 5 6 8 3 1 7 4
     - 0 3 5 6 8 1 7 4
     - 0 1 3 5 6 8 7 4
     - 0 1 3 5 6 7 8 4
     - 0 1 3 4 5 6 7 8
     - 0 1 3 4 5 6 7 8

  8. The Problem
     - Multiple algorithms/implementations
     - Which one(s) to use? In what order?
     - Cutoff points?
     - For matrices: blocking size?
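The cutoff question above can be made concrete with a small sketch. This is not PetaBricks code: it is a plain Python illustration, assuming a 2-way MergeSort that hands small inputs to InsertionSort. The function names and the default `cutoff=16` are hypothetical; the cutoff is exactly the kind of value the slides propose to tune rather than hard-code.

```python
def insertion_sort(xs):
    """Sort a list in place with InsertionSort and return it."""
    for i in range(1, len(xs)):
        key = xs[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def hybrid_sort(xs, cutoff=16):
    """2-way MergeSort that switches to InsertionSort at or below `cutoff`.

    `cutoff` is an illustrative tunable, not a value taken from the slides.
    """
    if len(xs) <= max(cutoff, 1):
        return insertion_sort(list(xs))
    mid = len(xs) // 2
    return merge(hybrid_sort(xs[:mid], cutoff), hybrid_sort(xs[mid:], cutoff))
```

Calling `hybrid_sort([6, 8, 0, 5, 3, 1, 7, 4], cutoff=2)` sorts the example input from the earlier slides; changing `cutoff` changes only which algorithm handles which sizes, never the result.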

  9. A New Language: Why?
     - Expose algorithmic choice to the compiler
     - Parallelization
     - Automatic optimization
     - Consistency checks between choices

  10. PetaBricks: The language
      - Functional language
      - Basic construct: transform
      - Has one or more rules
      - C++ code can be directly included
      - Allows inclusion of existing libraries
      - Has facilities for dealing with matrices

          transform RollingSum
          from A[n]
          to B[n]
          {
            // rule 0: sum all elements to the left
            to (B.cell(i) b) from (A.region(0, i) in) {
              b = sum(in);
            }
            // rule 1: use the previously computed value
            to (B.cell(i) b) from (A.cell(i) a, B.cell(i-1) leftSum) {
              b = a + leftSum;
            }
          }

  11. PetaBricks: The language
      - RollingSum: [1,2,3,4,5,6] => [1,3,6,10,15,21]
      - Code: the RollingSum transform from slide 10

  12. PetaBricks: The language
      - RollingSum: [1,2,3,4,5,6] => [1,3,6,10,15,21]
      - Rule 0: O(n^2)
      - Figure: dependencies between input cells A[0]..A[6] and output cells B[0]..B[6] under rule 0
      - Code: the RollingSum transform from slide 10

  13. PetaBricks: The language
      - RollingSum: [1,2,3,4,5,6] => [1,3,6,10,15,21]
      - Rule 1: O(n)
      - Figure: dependencies between input cells A[0]..A[6] and output cells B[0]..B[6] under rule 1
      - Code: the RollingSum transform from slide 10
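To make the two rules easier to compare, here is a minimal Python sketch of each one (not PetaBricks syntax; the function names are invented for illustration). It assumes the inclusive prefix sum implied by the slide's example, `[1,2,3,4,5,6] => [1,3,6,10,15,21]`.

```python
def rolling_sum_rule0(a):
    """Rule 0: every output cell re-sums its whole prefix.

    O(n^2) total work, but each cell depends only on the input,
    so the cells are independent of one another.
    """
    return [sum(a[: i + 1]) for i in range(len(a))]


def rolling_sum_rule1(a):
    """Rule 1: reuse the previously computed output cell.

    O(n) total work, but each cell depends on its left neighbour.
    Cell 0 has no left neighbour, so it is seeded with a[0], which is
    what rule 0 would produce for that cell.
    """
    b = []
    for i, x in enumerate(a):
        b.append(x if i == 0 else x + b[i - 1])
    return b
```

Both functions return the same list for any input. They differ only in total work and in how much of the computation could run independently, which is the trade-off the compiler is meant to explore.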

  14. PetaBricks: Compilation
      - Analyse dependencies
        - B(i) = rule0(i) depends on A(0 to i)
        - B(i) = rule1(i) depends on B(i-1), A(i)
      - Code: the RollingSum transform from slide 10

  15. PetaBricks: Compilation
      - Analyse dependencies
        - B(i) = rule0(i) depends on A(0 to i)
        - B(i) = rule1(i) depends on B(i-1), A(i)
      - Compute applicable regions
        - Rule 0: [0, n)
        - Rule 1: [1, n)
      - Code: the RollingSum transform from slide 10

  16. PetaBricks: The implementation
      - Source-to-source compiler
      - Translates PetaBricks to C++
      - Compiles code for tuning
      - Autotuning system
      - Runtime library
      - Figure: PetaBricks Source -> PetaBricks Compiler -> C++ Code -> Executable, linked with the Runtime

  17. PetaBricks: Compilation
      - Analyse dependencies
        - B(i) = rule0(i) depends on A(0 to i)
        - B(i) = rule1(i) depends on B(i-1), A(i)
      - Compute applicable regions
        - Rule 0: [0, n)
        - Rule 1: [1, n)
      - Tunable parameter: splitsize
      - Code: the RollingSum transform from slide 10
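The applicable regions say that rule 0 may compute any cell, while rule 1 needs B(i-1) and so cannot compute cell 0. The sketch below shows one way the two rules could be mixed under those constraints. The slide does not define what `splitsize` controls, so the reading used here is an assumption made only for illustration: the first cell of every block of `splitsize` cells is computed with rule 0, and the remaining cells with rule 1. The helper names are hypothetical.

```python
def applicable_rules(i, n):
    """Rule ids allowed for output cell i: rule 0 on [0, n), rule 1 on [1, n)."""
    if not 0 <= i < n:
        raise IndexError("cell index out of range")
    return [0] if i == 0 else [0, 1]


def rolling_sum_hybrid(a, splitsize):
    """Mix both rules; `splitsize` semantics are an assumption, see the lead-in."""
    if splitsize < 1:
        raise ValueError("splitsize must be at least 1")
    n = len(a)
    b = [0] * n
    for i in range(n):
        if i % splitsize == 0:
            # Rule 0: depends only on the input, so block heads are independent.
            b[i] = sum(a[: i + 1])
        else:
            # Rule 1: legal here because i >= 1 whenever this branch is taken.
            assert 1 in applicable_rules(i, n)
            b[i] = a[i] + b[i - 1]
    return b
```

With `splitsize=1` this degenerates to pure rule 0, and with `splitsize >= n` it uses rule 0 once and rule 1 everywhere else. Every value in between gives the same output with a different mix of work and independence.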

  20. Tuning
      - Seed with “pure” algorithms
      - Tune bottom-up
        - Start small
        - Evolve configurations
      - Tune additional parameters
        - Parallel-sequential cutoff points
      - Figure: tuning loop: Measure -> Select N fastest -> Use existing / add level / Mutate -> Double input size -> Measure

  22. Tuning
      - Seed with “pure” algorithms
        - Start all single-algorithm implementations
      - Tune bottom-up
        - Small training input
        - Double input every iteration
        - Keep the N fastest algorithms
        - Extend/mutate the fastest algorithms
      - Tune additional parameters
        - Parallel-sequential cutoff points
      - Figure: tuning loop: Measure -> Select N fastest -> Use existing / add level / Mutate -> Double input size -> Measure
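The loop on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the PetaBricks autotuner. Several things are assumptions: a configuration is modelled as a tuple of `(min_size, algorithm)` levels, timing is replaced by a synthetic cost model (`model_cost`, with made-up constants) so the example is deterministic, and only the "add level" step is shown. Random mutation and parallel-sequential cutoffs are left out.

```python
from functools import lru_cache

ALGOS = ("insertion", "merge")  # hypothetical pool of "pure" algorithms


def pick(config, n):
    """Algorithm a config uses at size n: the level with the largest min_size <= n."""
    return max((lvl for lvl in config if lvl[0] <= n), key=lambda lvl: lvl[0])[1]


def model_cost(config, n):
    """Synthetic stand-in for a timed run; a real tuner would measure wall-clock time."""
    @lru_cache(maxsize=None)
    def cost(m):
        if m <= 1:
            return 1.0
        if pick(config, m) == "insertion":
            return 0.25 * m * m
        # "merge": linear merge step plus two recursive calls on the halves.
        return 3.0 * m + cost(m // 2) + cost(m - m // 2)
    return cost(n)


def tune(max_size, keep=2, start_size=2, cost_fn=model_cost):
    """Bottom-up tuning: seed, measure, keep the fastest, extend, double the input size."""
    # Seed with "pure" configurations: one algorithm for every input size.
    population = [((0, algo),) for algo in ALGOS]
    n = start_size
    while n <= max_size:
        # Use existing configs, and also try adding a level that switches algorithm at n.
        candidates = set(population)
        for config in population:
            for algo in ALGOS:
                if algo != pick(config, n):
                    candidates.add(tuple(sorted(config + ((n, algo),))))
        # Measure at the current size and keep the `keep` fastest (ties broken by config).
        population = sorted(candidates, key=lambda c: (cost_fn(c, n), c))[:keep]
        n *= 2
    return population[0]
```

Under this toy cost model, `tune(64)` settles on InsertionSort for small inputs and MergeSort from size 32 upward. The point of the sketch is the shape of the loop: decisions made at small sizes are kept as the lower levels of the configurations tried at larger sizes.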

  23. Automatic Blocking
      - AB[w,h] = A[c,h] * B[w,c]

          transform MatrixMultiply
          from A[c,h], B[w,c]
          to AB[w,h]
          {
            // Base case, compute a single element
            to (AB.cell(x,y) out)
            from (A.row(y) a, B.column(x) b) {
              out = dot(a, b);
            }
            // Recursively decompose in c
            to (AB ab)
            from (A.region(0,   0, c/2, h) a1,
                  A.region(c/2, 0, c,   h) a2,
                  B.region(0, 0,   w, c/2) b1,
                  B.region(0, c/2, w, c)   b2) {
              ab = MatrixAdd(MatrixMultiply(a1, b1),
                             MatrixMultiply(a2, b2));
            }
          }

  25. Automatic Blocking
      - AB[w,h] = A[c,h] * B[w,c]
      - Blocking on c is non-trivial
      - Code: the MatrixMultiply transform from slide 23

  26. Automatic Blocking
      - Code: the MatrixMultiply transform from slide 23
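The "recursively decompose in c" rule of the MatrixMultiply transform can be mirrored in plain Python as a rough sketch. It uses lists of rows rather than PetaBricks regions, so `a` has h rows of c entries and `b` has c rows of w entries. The function names and the `cutoff` parameter are illustrative assumptions; in PetaBricks the choice between the base case and the decomposition is left to the compiler and autotuner.

```python
def matmul_base(a, b):
    """Base case: each output cell is the dot product of a row of `a` and a column of `b`."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def mat_add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def matmul_split_c(a, b, cutoff=2):
    """Halve the shared dimension c recursively until it is at most `cutoff`.

    `cutoff` is a hypothetical tunable standing in for the blocking choice.
    """
    c = len(b)  # shared dimension: columns of `a`, rows of `b`
    if c <= max(cutoff, 1):
        return matmul_base(a, b)
    half = c // 2
    a1 = [row[:half] for row in a]   # left part of a
    a2 = [row[half:] for row in a]   # right part of a
    b1, b2 = b[:half], b[half:]      # top and bottom parts of b
    # A*B = A1*B1 + A2*B2 when the split runs along the shared dimension.
    return mat_add(matmul_split_c(a1, b1, cutoff), matmul_split_c(a2, b2, cutoff))
```

Splitting along c differs from splitting along w or h because both partial products cover the whole output and must be added together. This is one plausible reason the slides call blocking on c non-trivial.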
