SLIDE 1

Scalarization of Index Vectors in Compiled APL

Robert Bernecky

Snake Island Research Inc
18 Fifth Street, Ward’s Island
Toronto, Canada
tel: +1 416 203 0854
bernecky@snakeisland.com

September 30, 2011

Robert Bernecky Scalarization of Index Vectors in Compiled APL

SLIDE 2

Abstract

High performance for array languages poses several unique challenges to the compiler writer, including fusion of loops over large arrays, detection and elimination of scalars represented as arbitrary arrays, and elimination or minimization of the run-time creation of index vectors. We introduce one of those challenges in the context of SAC, a functional array language, and give preliminary results on the performance of a compiler that eliminates index vectors by scalarizing them within the optimization cycle.

SLIDE 4

The Question

◮ How much faster is compiled APL than interpreted APL?
◮ The answer is NOT a scalar.

SLIDE 7

Environment

◮ Dyalog APL 13.0 vs. APEX/SAC
◮ The current SAC compiler
  ◮ a functional array language
  ◮ data-parallel nested loops: With-Loop
  ◮ array-based optimizations
  ◮ functional loops and conditionals as functions
◮ Goal: Compiled APL performance competitive with hand-coded C

SLIDE 8

Some Reasons Why APL is Slow

◮ Fixed per-primitive overheads: syntax analysis, conformability checks, function dispatch, memory management
◮ Variable per-primitive overheads
◮ Index vector materialization

[Figure: APL vs. APEX CPU Time Performance (2011-09-30). Y axis: Speedup (APL/APEX w/AWLF), log scale from 0.1 to 1,000; higher is better for APEX. X axis: benchmark name. APL: Dyalog APL 13.0; SAC: 17654:MODIFIED. Benchmarks: buildvAKS, buildvfAKS, buildv2AKS, compiotaAKS, compiotadAKS, csbenchAKS, downgradePVAKS, fdAKS, gewlfAKS, histgradeAKS, histlpAKS, histopAKS, histopfAKS, iotanAKS, ipapeAKS, ipbbAKS, ipbdAKS, ipddAKS, ipopneAKS, ipplusandAKS, lltopAKS, logd3AKS, logd4AKS, loopfsAKS, loopfvAKS, loopisAKS, mconvAKS, mconvoutAKS, nsvAKS, nthoneAKS, schedrAKS, scsAKS, testforAKS, testindxAKS, testlcvAKS, unirandAKS, unirand3AKS, upgradeBoolAKS, upgradeCharAKS, upgradePVAKS, upgradeRPVAKS.]

SLIDE 9

Why APL is Slow: Fixed Per-Primitive Overheads

[Figure: APL Primitive Overhead. Time per element for (intvec+intvec). Y axis: microseconds/element (20 to 140). X axis: number of elements in array (1 to 25).]

Who suffers? Apps dominated by operations on scalars: CRC, loopy histograms, dynamic programming, RNG
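One way to read the overhead figure is through a simple amortization model, sketched here in Python: each primitive call pays a fixed overhead once, plus a small cost per element, so the cost per element shrinks as the array grows. The constants `FIXED_US` and `PER_ELEMENT_US` are hypothetical values chosen for illustration; they are not measurements from the talk.

```python
# Hypothetical constants for illustration only; not measured values.
FIXED_US = 100.0        # assumed fixed overhead per primitive call (microseconds)
PER_ELEMENT_US = 0.01   # assumed cost of one element-wise add (microseconds)


def time_per_element(n, fixed_us=FIXED_US, per_element_us=PER_ELEMENT_US):
    """Amortized cost per element when one primitive call processes n elements."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (fixed_us + n * per_element_us) / n


if __name__ == "__main__":
    # Scalars (n = 1) pay the whole fixed overhead on a single element.
    for n in (1, 5, 25, 1000):
        print(n, round(time_per_element(n), 3))
```

Under this model, an application that mostly operates on scalars never gets to amortize the fixed cost, which is consistent with the list of sufferers above.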

SLIDE 10

Why APL is Slow: Fixed Per-Primitive Overheads

[Figure: APL vs. APEX CPU Time Performance (2011-09-30). Y axis: Speedup (APL/APEX w/AWLF), log scale from 10 to 1,000; higher is better for APEX. X axis: benchmark name (crcAKS, histlpAKS, lltopAKS, loopisAKS, scsAKS, testforAKS). APL: Dyalog APL 13.0; SAC: 17654:MODIFIED.]

SLIDE 11

Why APL is Slow: Fixed Per-Primitive Overheads

◮ Scalar-dominated apps have good serial speedup...
◮ but poor parallel speedup

[Figure: APEX/SAC Parallel Performance. Y axis: Execution time (sec), ticks 0x to 0.5x. X axis: number of threads (mt1 to mt6). Benchmarks: crc, histlp, lltop, loopis, scs, testfor. SAC (17654:MODIFIED), real time, 2011-09-30, 6-core AMD Phenom II X6 1075T.]

SLIDE 14

Why is APL Slow? Variable Per-Primitive Overheads

◮ Naive execution: Limited fn composition, e.g. sum( iota( N))
◮ Array-valued intermediate results: memory madness
◮ Who suffers? Apps dominated by operations on large arrays: Signal processing, convolution, normal move-out

[Figure: APL vs. APEX CPU Time Performance (2011-09-30). Y axis: Speedup (APL/APEX w/AWLF), log scale from 0.1 to 1,000; higher is better for APEX. X axis: benchmark name (iotanAKS, ipapeAKS, ipbdAKS, ipopneAKS, logd3AKS, logd4AKS, upgradeCharAKS). APL: Dyalog APL 13.0; SAC: 17654:MODIFIED.]
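The `sum( iota( N))` example can be sketched in Python to show what limited function composition costs. The naive form materializes the whole intermediate array before reducing it; the fused form generates and reduces in one loop. The function names are hypothetical, and 0-origin `iota` is an assumption.

```python
def sum_iota_naive(n):
    """Naive execution: materialize iota(n), then reduce it."""
    iota = list(range(n))   # array-valued intermediate result: O(n) memory
    total = 0
    for x in iota:
        total += x
    return total


def sum_iota_fused(n):
    """Fused execution: generate and reduce in one loop, no intermediate array."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    print(sum_iota_naive(10), sum_iota_fused(10))
```

Both functions return the same value; only the second avoids allocating, filling, reading back, and freeing an N-element temporary.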

SLIDE 15

Why is APL Slow? Variable Per-Primitive Overheads

◮ Who suffers? Apps dominated by operations on large arrays: Signal processing, convolution, normal move-out

[Figure: APEX/SAC Parallel Performance. Y axis: Execution time (sec), ticks 0x to 1.2x. X axis: number of threads (mt1 to mt6). Benchmarks: iotan, ipape, ipbbAKD, ipbd, ipopneAKD, ipopne, logd3, logd4, upgradeChar. SAC (17654:MODIFIED), real time, 2011-09-30, 6-core AMD Phenom II X6 1075T.]

SLIDE 19

Why is APL Slow? Materialized Index Vectors

◮ Mike Jenkins’ matrix divide model

a[;i,pi] = a[;pi,i]

◮ [i,pi] and [pi,i] are materialized index vectors
◮ A few simple changes to scalarize index vectors:

tmp = a[;i]
a[;i] = a[;pi]
a[;pi] = tmp

◮ Matrix divide model now runs twice as fast!
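The two forms of the column swap can be sketched in Python, with a matrix stored as a list of rows. This is an illustration only: the function names are hypothetical, and Python's own overheads differ from an APL interpreter's, so it shows the structure of the change rather than its speedup.

```python
def swap_columns_materialized(a, i, pi):
    """a[;i,pi] = a[;pi,i], building two temporary index vectors."""
    dst = [i, pi]   # materialized index vector
    src = [pi, i]   # materialized index vector
    for row in a:
        picked = [row[c] for c in src]       # gather a[;pi,i]
        for c, value in zip(dst, picked):    # scatter into a[;i,pi]
            row[c] = value


def swap_columns_scalarized(a, i, pi):
    """Same effect with scalar indices and one temporary column."""
    tmp = [row[i] for row in a]              # tmp = a[;i]
    for row in a:
        row[i] = row[pi]                     # a[;i] = a[;pi]
    for row, value in zip(a, tmp):
        row[pi] = value                      # a[;pi] = tmp


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6]]
    swap_columns_scalarized(m, 0, 2)
    print(m)
```

The scalarized version never builds `[i, pi]` or `[pi, i]`; every index it uses is a plain scalar.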

SLIDE 21

Why Materialized Index Vectors are Expensive

◮ Materialization of [i,pi] and [pi,i]:
◮ (* = part of the indexing itself)

*Increment reference counts on i and pi
Allocate 2-element temp vector
Initialize temp vector descriptor
Initialize temp vector elements
*Perform indexing
Deallocate 2-element temp vector
*Decrement reference counts on i and pi

SLIDE 26

Why Materialized Index Vectors are Expensive

◮ Who suffers? Apps using explicit array indexing
◮ e.g., many apps dominated by indexed assign
◮ Who suffers? Matrix divide, compress, deal, dynamic programming
◮ Who suffers? Inner products that use the CDC STAR-100 algorithm
◮ 800x800 ipplusandAKD CPU time: 45 minutes!

SLIDE 32

Eliminating Materialized Index Vectors With IVE

◮ IFL2006 paper: Index Vector Elimination (IVE)
  Bernecky, Grelck, Herhut, Scholz, Trojahner, and Shafarenko
◮ Post-optimization transformation
◮ Start with:

IV = [ i, j, k]
z = sel( IV, M)

◮ IVE: Replace: z = M[ IV] by:

IV = [ i, j, k]
offset = vect2offset( shape(M), IV)
z = idxsel( offset, M)

◮ Implication: IV is a materialized index vector!
◮ Implication: offset calculation may be liftable (LIR)
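A minimal Python sketch of the two primitives named above may help. It assumes row-major layout and an index vector whose length equals the rank of M; these Python functions are hypothetical stand-ins that borrow the slide's names, not the SAC implementation, and here `idxsel` takes the flat data of M rather than M itself.

```python
def vect2offset(shape, iv):
    """Row-major offset of index vector iv into an array of the given shape."""
    if len(shape) != len(iv):
        raise ValueError("index vector length must match array rank")
    offset = 0
    for extent, index in zip(shape, iv):
        if not 0 <= index < extent:
            raise IndexError("index out of range")
        offset = offset * extent + index   # Horner-style accumulation
    return offset


def idxsel(offset, flat):
    """Select one element from the flat (row-major) data of an array."""
    return flat[offset]


if __name__ == "__main__":
    shape = (2, 3, 4)
    flat = list(range(24))          # flat data of a 2x3x4 array
    iv = [1, 2, 3]                  # still a materialized index vector
    offset = vect2offset(shape, iv)
    print(offset, idxsel(offset, flat))
```

Splitting selection into an offset computation and a flat fetch is what makes the offset a separate value that a later pass could hoist, but `iv` itself is still allocated here.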

SLIDE 35

Eliminating Materialized Index Vectors With IVE

◮ IVE: If IV can be represented as scalars, eliminate vect2offset. Before:

IV = [ i, j, k]
offset = vect2offset( shape(M), IV)
z = idxsel( offset, M)

◮ After:

IV = [ i, j, k]
offset = idxs2offset( shape(M), i, j, k)
z = idxsel( offset, M)

◮ IV is now dead code, and can be eliminated!
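The "After" form can be sketched in Python for the rank-3 case. The names `idxs2offset3` and `select_3d` are hypothetical, row-major layout is assumed, and the rank is fixed at 3 so that the function takes only scalar arguments and never builds a vector.

```python
def idxs2offset3(shape, i, j, k):
    """Row-major offset for a rank-3 array, computed from scalar indices only."""
    _, s1, s2 = shape               # the leading extent is not needed
    return (i * s1 + j) * s2 + k


def idxsel(offset, flat):
    """Select one element from the flat (row-major) data of an array."""
    return flat[offset]


def select_3d(flat, shape, i, j, k):
    """z = M[[i, j, k]] without ever allocating the vector [i, j, k]."""
    return idxsel(idxs2offset3(shape, i, j, k), flat)


if __name__ == "__main__":
    shape = (2, 3, 4)
    flat = list(range(24))
    print(select_3d(flat, shape, 1, 2, 3))
```

Because nothing reads `IV` any more, its allocation, initialization, and deallocation all disappear along with it.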

SLIDE 40

Eliminating Materialized Index Vectors With IVE

◮ IVE: If IV is NOT given as scalars, scalarize it. Before:

offset = vect2offset( shape(M), IV)
z = idxsel( offset, M)

◮ After:

I = IV[0]
J = IV[1]
K = IV[2]
JV = [ I, J, K]
offset = vect2offset( shape(M), JV)

◮ IV is now dead code, and can be eliminated!
◮ Earlier substitution by idxs2offset now feasible.
◮ Unfortunately...

SLIDE 47

Dueling Optimizations: IVE vs. LIR

◮ IVE introduces JV = [ I, J, K]
◮ Loop-Invariant Removal (LIR) lifts I, J, K, JV out of the function
◮ Constant Folding (CF) replaces JV = [ I, J, K] by JV = IV
◮ Common-subexpression elimination (CSE) replaces JV by IV
◮ This is where we came in!
◮ So, what can we do?
◮ A kludged LIR to deal with this was deemed tasteless
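The duel can be illustrated with a toy rewriting sketch in Python. The tuple-based expression encoding and the function names are assumptions made for this example only; they are not SAC's AST or its passes, and the folding step here merges what the slide attributes to CF followed by CSE.

```python
def ive_scalarize(name, length):
    """IVE step: rebuild an index vector from its scalar elements,
    i.e. JV = [ IV[0], IV[1], ..., IV[length-1] ]."""
    return ("vec",) + tuple(("sel", k, name) for k in range(length))


def cf_fold(expr, lengths):
    """CF-like step: recognize [X[0], X[1], ..., X[n-1]] and fold it back to X."""
    if not (isinstance(expr, tuple) and expr and expr[0] == "vec"):
        return expr
    elems = expr[1:]
    if not elems:
        return expr
    first = elems[0]
    if not (isinstance(first, tuple) and len(first) == 3 and first[0] == "sel"):
        return expr
    name = first[2]
    if lengths.get(name) != len(elems):
        return expr                     # not a full copy of X
    for k, elem in enumerate(elems):
        if elem != ("sel", k, name):
            return expr                 # elements permuted or from another vector
    return name


if __name__ == "__main__":
    lengths = {"IV": 3}                 # known length of each vector variable
    jv = ive_scalarize("IV", 3)
    print(jv)
    print(cf_fold(jv, lengths))         # folds straight back to the original vector
```

Each pass is reasonable on its own; run one after the other, the second undoes the first, so the program ends up where it started.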

SLIDE 54

Biting the Bullet

◮ Another approach: move IVE into the optimization cycle
◮ Having IVE in opt cycle enables other optimizations!
◮ But, it also breaks many optimizations
◮ To prevent dueling, we must scalarize all index vector ops!
◮ e.g., unroll IV+1, guard and extrema functions...
◮ e.g., extend existing optimizations, such as CF
◮ The Good News: I had already scalarized many index vectors for Algebraic With-Loop Folding!
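As a rough Python illustration of unrolling `IV+1`, the sketch below computes the offset of the element at `IV+1` in a rank-3 array in two ways. The function names are hypothetical and row-major layout is assumed; the point is only that the unrolled form works on three scalars and never allocates a vector.

```python
def neighbor_offset_vector(shape, iv):
    """Vector form: IV + 1 materializes a new index vector before the offset."""
    jv = [x + 1 for x in iv]            # new vector allocated for IV + 1
    offset = 0
    for extent, index in zip(shape, jv):
        offset = offset * extent + index
    return offset


def neighbor_offset_scalarized(shape, i, j, k):
    """Scalarized form: IV + 1 unrolled into three scalar adds."""
    _, s1, s2 = shape
    i1 = i + 1
    j1 = j + 1
    k1 = k + 1
    return (i1 * s1 + j1) * s2 + k1


if __name__ == "__main__":
    shape = (3, 4, 5)
    print(neighbor_offset_vector(shape, [0, 1, 2]))
    print(neighbor_offset_scalarized(shape, 0, 1, 2))
```

Once every operation on index vectors has a scalar form like this, there is no vector left for a later pass to fold back.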

SLIDE 61

Current Status

◮ Moving IVE into optimization cycle worked, sort of:
◮ The Good News: ipplusandAKD CPU time: 8 seconds, instead of 45 minutes
◮ The Bad News: New primitives in AST broke some optimizations!
◮ More Bad News: I am still fixing them!
◮ The Good News: Performance is improving daily
◮ More Good News: Many opportunities exist for further optimization
◮ Even More Good News: More parallelism is coming

SLIDE 65

How Fast Is Compiled APL?

◮ Array sizes affect performance
◮ Iteration counts affect performance
◮ Indexed assigns affect performance
◮ Characterize your application, then we can provide an answer.
