PetaBricks and Julia
Kathleen C. Alexander
Massachusetts Institute of Technology
December 11th, 2013
Motivation
Motivation Background Approach Results Recommendations Index
The Programmer’s Dilemma
a personal example: energy landscapes
K.C. Alexander (MIT) PetaBricks and Julia 1 / 15
Which algorithm is best?
Goal: determine the best algorithm for the application, which may be machine dependent.
Parallel Programming
- many parts of these algorithms can be written in parallel
- often they can be parallelized in many different ways
- optimizing these options is a challenge
Goal: determine the best way to parallelize the program, which will be machine dependent.
Background
PetaBricks – Algorithmic Choice
PetaBricks was developed to take some of the optimization responsibility off the programmer.
Figure: the transform compiling framework (Ansel, et al. ACM SIGPLAN Conference (2009)).
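PetaBricks lets a single transform offer several algorithms and leaves the choice among them to the compiler and autotuner. The Python sketch below is not PetaBricks syntax; the names `hybrid_sort` and `cutoff` are hypothetical. It shows the kind of choice that gets exposed: a merge sort that hands small subproblems to insertion sort, with the switch-over point left as a tunable parameter.

```python
from typing import List


def insertion_sort(values: List[float]) -> List[float]:
    """O(n^2) sort that is often fastest for very short inputs."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def merge(left: List[float], right: List[float]) -> List[float]:
    """Merge two already-sorted lists into one sorted list."""
    merged: List[float] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(values: List[float], cutoff: int) -> List[float]:
    """Merge sort that switches to insertion sort at or below `cutoff` items.

    `cutoff` is the open algorithmic choice a tuner would set per machine.
    max(cutoff, 1) guarantees single-element inputs end the recursion.
    """
    if len(values) <= max(cutoff, 1):
        return insertion_sort(values)
    mid = len(values) // 2
    return merge(hybrid_sort(values[:mid], cutoff),
                 hybrid_sort(values[mid:], cutoff))
```

For example, `hybrid_sort([5, 3, 1, 4, 2], cutoff=2)` returns `[1, 2, 3, 4, 5]`. Every `cutoff` gives the same answer; only the running time changes, and the best value depends on the machine.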
PetaBricks – Autotuning
The autotuner determines the best configuration for the machine under the tuning constraints.
Figures: autotuned Sort, Eigen Problem, and Matrix Multiply (Ansel, et al. ACM SIGPLAN Conference (2009)).
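The tuning loop can be sketched as follows, assuming a plain exhaustive search: each candidate configuration is timed on a fixed training input and the fastest is kept. The actual PetaBricks autotuner searches a much larger configuration space with a more elaborate strategy. `autotune` and the demo's two sort "rules" are hypothetical illustrations.

```python
import time
from typing import Callable, Dict, Hashable, Iterable, Tuple


def autotune(run: Callable[[Hashable], object],
             configs: Iterable[Hashable],
             repeats: int = 3) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Time every configuration and return (fastest config, all timings).

    Each configuration's score is its best-of-`repeats` wall-clock time,
    which damps noise from other activity on the machine. `configs` must
    be non-empty.
    """
    timings: Dict[Hashable, float] = {}
    for config in configs:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(config)
            best = min(best, time.perf_counter() - start)
        timings[config] = best
    winner = min(timings, key=timings.__getitem__)
    return winner, timings


if __name__ == "__main__":
    # Fixed, deterministic "training input" (no random number generator).
    data = [(i * 7919) % 10007 for i in range(2000)]

    def naive_sort(xs):
        # Deliberately slow O(n^2) alternative: repeatedly extract the minimum.
        xs = list(xs)
        return [xs.pop(xs.index(min(xs))) for _ in range(len(xs))]

    rules = {"builtin": sorted, "naive": naive_sort}
    best, times = autotune(lambda name: rules[name](data), rules)
    print("selected rule:", best, times)
```

Because the winner is chosen from measurements on the machine at hand, rerunning the tuner on different hardware can pick a different configuration. That is why the slides call the best choice machine dependent.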
Julia
- Julia was developed to bridge the gap between interpreted and compiled scientific computing
- streamlining parallelization techniques has been a priority
Figure source: http://forio.com/julia/julia
Question: is there room for overlap between the PetaBricks and Julia approaches?
Approach
Options for Implementation
Julia in PetaBricks
- can use the PetaBricks autotuner and compiler
- the PetaBricks compiler would need to interpret Julia
PetaBricks in Julia
- can run PetaBricks binaries inside Julia
- there are no PetaBricks shared object files, so functions require disk I/O
- doesn’t take advantage of JuliaLang
Julia + OpenTuner
- apply the PetaBricks framework to Julia
- use OpenTuner to optimize Julia
Approach Used Here
PetaBricks in Julia
- can run PetaBricks binaries inside Julia
- there are no PetaBricks shared object files, so functions require disk I/O
- doesn’t take advantage of JuliaLang
⇒ most naive approach possible:
→ compile the PetaBricks executable, exe
→ julia> run(`$exe $in $out`)
⇒ compare with PetaBricks and Julia alone
→ gives a lower bound on the performance improvement
→ is there proof of benefit?
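The naive Julia call run(`$exe $in $out`) can be sketched in Python with `subprocess.run`. The sketch makes two assumptions: the executable takes an input path and an output path, and both files hold whitespace-separated ASCII numbers. The real PetaBricks file format may differ, for example by including size headers. `run_via_files` and the stand-in sorter script are hypothetical. The round trip through disk is exactly the I/O cost the slides flag.

```python
import os
import subprocess
import sys
import tempfile
from typing import List, Sequence

# Hypothetical stand-in for a compiled PetaBricks executable: a one-line
# Python program invoked as `prog in_path out_path` that sorts numbers.
STAND_IN_SORTER = (
    "import sys, pathlib; P = pathlib.Path; "
    "xs = sorted(float(t) for t in P(sys.argv[1]).read_text().split()); "
    "P(sys.argv[2]).write_text(' '.join(map(repr, xs)))"
)


def run_via_files(exe: Sequence[str], values: Sequence[float]) -> List[float]:
    """Run `exe in_path out_path` and return the numbers it wrote.

    Assumes whitespace-separated ASCII numbers in both files.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "in.txt")
        out_path = os.path.join(tmp, "out.txt")
        with open(in_path, "w") as f:
            f.write(" ".join(repr(v) for v in values))
        # Like Julia's run(`...`): arguments are passed without a shell,
        # and a non-zero exit status raises an error (CalledProcessError).
        subprocess.run([*exe, in_path, out_path], check=True)
        with open(out_path) as f:
            return [float(tok) for tok in f.read().split()]


if __name__ == "__main__":
    print(run_via_files([sys.executable, "-c", STAND_IN_SORTER], [3, 1, 2]))
    # prints [1.0, 2.0, 3.0]
```

Swapping `STAND_IN_SORTER` for the path of a real executable is the only change the calling side needs. The serialization format is the part that must match whatever the binary actually reads and writes.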
Results
PetaBricks – Tuning Improvements
Performance improvement: tuned and untuned PetaBricks Matrix Multiply
[Figure: wall-clock time [s] vs. size; series: tuned, untuned]
Comparing PetaBricks with Julia – Apples to Apples
PetaBricks
→ functions read in ASCII files and output the same
→ determines parallelization during autotuning
→ autotuning can take days
Julia
→ JIT-compiles for each independent execution
→ can addprocs(n), but may not parallelize
→ can be used interactively
Method
→ make both programs do I/O
→ run both programs from the shell
→ try addprocs(n) in Julia, with no other instructions
→ subtract the 'hello world' start-up time from the Julia wall-clock time
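The start-up correction in the last step can be sketched as follows: time the benchmark command, time a 'hello world' run of the same runtime, and subtract. This presumably corresponds to the "Julia-Scaled" series in the results plots, though the slides do not say so explicitly. The helpers `wall_clock` and `scaled_time` are hypothetical, and the demo uses the Python interpreter as a stand-in for julia.

```python
import subprocess
import sys
import time
from typing import Sequence


def wall_clock(cmd: Sequence[str], repeats: int = 3) -> float:
    """Best-of-`repeats` wall-clock time (seconds) to run `cmd` to completion."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(list(cmd), check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def scaled_time(cmd: Sequence[str], hello_cmd: Sequence[str],
                repeats: int = 3) -> float:
    """Wall-clock time of `cmd` minus the start-up time measured by `hello_cmd`.

    `hello_cmd` should start the same runtime and do trivial work, so the
    difference approximates the time spent on the computation itself. The
    result is clamped at zero because noise can make it slightly negative
    for very short jobs.
    """
    return max(0.0, wall_clock(cmd, repeats) - wall_clock(hello_cmd, repeats))


if __name__ == "__main__":
    # Stand-ins: in the slides' setting both commands would invoke julia.
    job = [sys.executable, "-c", "sum(i * i for i in range(3_000_000))"]
    hello = [sys.executable, "-c", "print('hello world')"]
    print(f"scaled wall-clock: {scaled_time(job, hello):.3f} s")
```

The subtraction removes runtime start-up from the comparison. It does not remove per-run JIT compilation of the benchmark code itself, which still counts against Julia.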
Comparing PetaBricks with Julia – EigenSolve
[Figure: EigenSolve, wall-clock time [s] vs. size; series: Julia, Julia-Scaled, PetaBricks]
→ Julia seems to do best for large matrices
→ however, the results were not comparable
→ this test was not a good apples-to-apples performance test
Comparing PetaBricks with Julia – Sort
[Figure: Sort, wall-clock time [s] vs. size [10^4]; series: Julia, Julia-Scaled, PetaBricks]
→ Julia and PetaBricks converge for large vectors
→ PetaBricks is better with shorter vectors
→ the effect of I/O on performance was not considered
Comparing PetaBricks with Julia – Matrix Multiply
[Figure, i5-3339 (4 CPU): wall-clock time [s] vs. size; series: Julia, Julia-Scaled, PetaBricks]
[Figure, i7-3770 (8 CPU): wall-clock time [s] vs. size; series: Julia, Julia-Scaled, Julia-Scaled-8p, PetaBricks]
→ Julia and PetaBricks converge at moderate matrix sizes on fewer cores
→ PetaBricks is better with smaller lists and larger matrices
→ using addprocs(n) with no other instruction does not use Julia's parallel functionality
Running PetaBricks from Julia
[Figure: Matrix Multiply, wall-clock time [s] vs. size; series: Julia-Scaled, PB-Julia-Scaled, PetaBricks]
→ the PetaBricks improvement can be obtained by incorporating the PetaBricks executable in Julia
→ the effect of I/O on performance was not considered
Recommendations
Recommendations
[Figure: Matrix Multiply, wall-clock time [s] vs. size; series: Julia, Julia-Scaled, PetaBricks]
→ under many circumstances, Julia performs as well as PetaBricks without days of compilation
→ there is room for improvement in Julia's start-up time
[Figure: Matrix Multiply, wall-clock time [s] vs. size; series: Julia-Scaled, PB-Julia-Scaled, PetaBricks]
→ PetaBricks performance can be achieved by using a shell command in Julia
→ implementing OpenTuner with Julia (once better documentation is available) may be a reasonable long-term goal for performance gains of this kind (Ansel, et al. MIT CSAIL Technical Report MIT-CSAIL-TR-2013-026 (2013))