Just-in-time Length Specialization of Dynamic Vector Code
Justin Talbot, Zachary DeVito, Pat Hanrahan (Tableau Research, Stanford University)
(ARRAY 2014)
Tableau
Tableau + R
Riposte: a bytecode interpreter and tracing JIT compiler for R
(How fast can it be? Don't reason from incremental changes!)
(If I know that a variable is a vector of length 4, what else can I figure out?)
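To make the question concrete, here is a minimal sketch of propagating one known length through an expression. It assumes R's recycling rule for the result length of elementwise operations (an empty operand gives an empty result; otherwise the result takes the longer operand's length) and that a reduction such as `sum` returns length 1. The expression types `Var`, `Const`, `BinOp`, `Sum` and the function `infer_length` are hypothetical, not Riposte's internal representation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    name: str  # a variable whose length is known from the environment


@dataclass
class Const:
    length: int  # a literal vector; length 1 for a scalar such as 2


@dataclass
class BinOp:
    op: str  # elementwise operator, e.g. "+" or "*"
    left: "Expr"
    right: "Expr"


@dataclass
class Sum:
    arg: "Expr"  # a reduction: always yields a length-1 result


Expr = Union[Var, Const, BinOp, Sum]


def infer_length(e: Expr, env: dict) -> int:
    """Return the length of e's result, given known variable lengths in env."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Const):
        return e.length
    if isinstance(e, Sum):
        return 1
    if isinstance(e, BinOp):
        a = infer_length(e.left, env)
        b = infer_length(e.right, env)
        # R recycling rule for the result length of an elementwise op.
        return 0 if a == 0 or b == 0 else max(a, b)
    raise TypeError(f"unknown node: {e!r}")


# If x has length 4, then x * 2 + x also has length 4 and sum(x * 2) has length 1,
# so every intermediate can be sized (and its loop unrolled) ahead of time.
env = {"x": 4}
doubled = BinOp("*", Var("x"), Const(1))
print(infer_length(BinOp("+", doubled, Var("x")), env))  # 4
print(infer_length(Sum(doubled), env))  # 1
```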
conform to each other
(large dynamically-allocated arrays)
bottlenecks
(scalars, tuples, short dynamically-allocated arrays)
computation time must be dominated by loops.
make assumptions that lead to optimizations (specialization)
(frequency × performance improvement)
(to amortize overhead)
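The trade-off in these bullets can be written as a back-of-the-envelope cost model. This is a minimal sketch under assumed parameters, not the model Riposte uses: `net_benefit`, `should_specialize`, and their inputs (how often the code runs, how often the length guard holds, the time saved per specialized run, a one-time compile cost, and a penalty per guard failure) are all hypothetical.

```python
def net_benefit(runs: float, hit_rate: float, saved_per_hit: float,
                compile_cost: float, miss_penalty: float) -> float:
    """Expected time saved by specializing one trace (all times in one unit).

    gain: how often the specialized code is reused (runs * hit_rate)
          times how much faster it is per use (saved_per_hit).
    cost: one-time compilation plus the penalty paid whenever the
          length guard fails and execution falls back to generic code.
    """
    gain = runs * hit_rate * saved_per_hit
    cost = compile_cost + runs * (1.0 - hit_rate) * miss_penalty
    return gain - cost


def should_specialize(*args: float) -> bool:
    return net_benefit(*args) > 0


# Frequently reused code amortizes the compile cost; rarely run code does not.
print(net_benefit(1000, 0.75, 4, 1000, 2))  # 1500.0
print(should_specialize(10, 0.75, 4, 1000, 2))  # False
```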
languages (JavaScript, etc.)
application areas
➡ Recycling overhead is frequently unnecessary
➡ Specialized code has a high probability of being reused
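The recycling point can be illustrated by contrasting a generic R-style elementwise add with a version specialized for operands of equal length. This is a minimal sketch over plain Python lists; `add_recycling`, `add_same_length`, and the guarded dispatcher `add` are hypothetical names, and unlike R the sketch does not warn when the longer length is not a multiple of the shorter one.

```python
def add_recycling(a: list, b: list) -> list:
    """Generic elementwise add: the shorter operand is recycled, as in R."""
    if len(a) == 0 or len(b) == 0:
        return []
    n = max(len(a), len(b))
    # The per-element modulo indexing is the recycling overhead.
    return [a[i % len(a)] + b[i % len(b)] for i in range(n)]


def add_same_length(a: list, b: list) -> list:
    """Specialized add: no recycling, valid only when len(a) == len(b)."""
    return [x + y for x, y in zip(a, b)]


def add(a: list, b: list) -> list:
    # Guard: if the lengths already conform, recycling is unnecessary.
    if len(a) == len(b):
        return add_same_length(a, b)
    return add_recycling(a, b)  # guard failed: fall back to the general path


print(add([1, 2, 3, 4], [10, 20]))  # [11, 22, 13, 24]
print(add([1, 2], [3, 4]))  # [4, 6]
```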
[Figure: average prediction rate (0% to 100%) versus vector length, binned on a log2 scale from 1 through [2^7, 2^8) to [2^15, 2^16)]
<8
abstract lengths
introduce guards
and if both are loop carried or if both aren’t
+ introduce guards
(can’t have intervening recycle operations)
(needs concrete lengths)
(needs same lengths)
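The grouping rule above can be sketched as follows. This is a minimal illustration, assuming that two vectors share an abstract length when they were observed with the same length while recording and are either both loop carried or both not, and that each grouping is protected by a runtime guard. `AbstractLength`, `LengthTable`, `can_fuse`, `can_use_short_vectors`, and the threshold of 8 (read from the "<8" above) are hypothetical.

```python
from dataclasses import dataclass, field

SHORT_VECTOR_LIMIT = 8  # assumed threshold for short-vector code generation


@dataclass(frozen=True)
class AbstractLength:
    ident: int
    observed: int  # concrete length seen while recording the trace
    loop_carried: bool


@dataclass
class LengthTable:
    guards: list = field(default_factory=list)  # (variable, expected length)
    classes: dict = field(default_factory=dict)

    def abstract(self, var: str, observed: int, loop_carried: bool) -> AbstractLength:
        key = (observed, loop_carried)
        if key not in self.classes:
            self.classes[key] = AbstractLength(len(self.classes), observed, loop_carried)
        # Guard: if var's length differs at run time, leave the specialized trace.
        self.guards.append((var, observed))
        return self.classes[key]


def can_fuse(a: AbstractLength, b: AbstractLength) -> bool:
    # Fusion needs the same abstract length; this sketch does not check
    # for intervening recycle operations.
    return a == b


def can_use_short_vectors(a: AbstractLength) -> bool:
    # Unrolling into short vectors needs a concrete, small length.
    return a.observed < SHORT_VECTOR_LIMIT


table = LengthTable()
x = table.abstract("x", 4, loop_carried=False)
y = table.abstract("y", 4, loop_carried=False)
acc = table.abstract("acc", 4, loop_carried=True)
print(can_fuse(x, y), can_fuse(x, acc), can_use_short_vectors(x))  # True False True
```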
wide range of vector lengths?
style so we can vary length of input vectors
C (clang 3.1 -O3 + autovectorization)
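As an illustration of writing a benchmark in vector style, here is a minimal random-walk sketch in which the whole state is one vector, so the same code can be run at any vector length. It is hypothetical and not the Random Walk benchmark used in this evaluation; `random_walk` and its parameters are assumptions.

```python
import random


def random_walk(n_walkers: int, steps: int, seed: int = 0) -> list:
    """Advance n_walkers independent 1-D walkers for a number of steps.

    The state is a single vector of positions, so n_walkers == 1 behaves like
    scalar code while n_walkers == 2**16 exercises long-vector code.
    """
    rng = random.Random(seed)
    pos = [0.0] * n_walkers
    for _ in range(steps):
        # One whole-vector update per step (in R, roughly pos <- pos + rnorm(n)).
        step = [rng.gauss(0.0, 1.0) for _ in range(n_walkers)]
        pos = [p + s for p, s in zip(pos, step)]
    return pos


# Sweep vector length from 1 to 2**16 on a log2 scale.
for k in (0, 8, 16):
    print(2**k, len(random_walk(2**k, steps=3)))
```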
[Figure: normalized throughput (log scale, 1× to 10000×) versus vector length (log scale, 1 to 2^16) for the benchmarks American Put, Binary Search, BlackScholes, Column Sum, Fibonacci, Mandelbrot, Mean Shift, Random Walk, Riemann zeta, and RungeKutta; series added across builds: Specialization, R, C, No Specialization, Recycling, Recycling+Short Vectors]
vector sizes, but not yet as good as hand-written C
not all.
increase compilation overhead “much”
heuristics increases performance across a wide range of vector lengths
to demonstrate that our approach works in the wild.
languages?
internal implementation details
.Map(ff_name, ...)
Runtime handles argument recycling and calls ff_name to compute each result element.
.Reduce(ff_name, base_case, ...)
Runtime handles iteration
(e.g. transcendental functions)
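The `.Map` and `.Reduce` contracts described above can be mimicked in a few lines. This is a minimal sketch, assuming `.Map` recycles its vector arguments to a common length (R-style) and calls the function once per element, and that `.Reduce` folds the elements starting from `base_case`; `ff_map` and `ff_reduce` are hypothetical Python stand-ins, not Riposte's API.

```python
import math
import operator
from typing import Callable, Sequence


def ff_map(f: Callable, *args: Sequence) -> list:
    """Stand-in for .Map(ff_name, ...): recycle the arguments to a common
    length and call f once per result element."""
    if not args or any(len(a) == 0 for a in args):
        return []
    n = max(len(a) for a in args)
    return [f(*(a[i % len(a)] for a in args)) for i in range(n)]


def ff_reduce(f: Callable, base_case, xs: Sequence):
    """Stand-in for .Reduce(ff_name, base_case, ...): the runtime drives the
    loop; f only combines the running value with the next element."""
    acc = base_case
    for x in xs:
        acc = f(acc, x)
    return acc


print(ff_map(math.exp, [0.0, 1.0]))  # a scalar transcendental function, mapped
print(ff_map(operator.mul, [1, 2, 3, 4], [10]))  # [10, 20, 30, 40]
print(ff_reduce(operator.add, 0, [1, 2, 3]))  # 6
```

The design keeps the foreign function scalar: iteration and recycling stay in the runtime, which is what would let a compiler see and specialize the surrounding loop.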