Scalarization of Index Vectors in Compiled APL
Robert Bernecky
Snake Island Research Inc
18 Fifth Street, Ward’s Island, Toronto, Canada
tel: +1 416 203 0854
bernecky@snakeisland.com
September 30, 2011

Abstract
High performance for array languages poses several unique challenges to the compiler writer, including fusion of loops over large arrays, detection and elimination of scalars treated as arbitrary arrays, and eliminating or minimizing the run-time creation of index vectors. We introduce one of those challenges in the context of SAC, a functional array language, and give preliminary results on the performance of a compiler that eliminates index vectors by scalarizing them within the optimization cycle.

The Question
◮ How much faster is compiled APL than interpreted APL?
◮ The answer is NOT a scalar.

Environment
◮ Dyalog APL 13.0 vs. APEX/SAC
◮ The current SAC compiler
  ◮ a functional array language
  ◮ data-parallel nested loops: With-Loop
  ◮ array-based optimizations
  ◮ functional loops and conditionals as functions
◮ Goal: Compiled APL performance competitive with hand-coded C

Some Reasons Why APL is Slow
◮ Index vector materialization
◮ Variable per-primitive overheads
◮ Fixed per-primitive overheads: syntax analysis, conformability checks, function dispatch, memory management
[Figure: APL vs. APEX CPU Time Performance (2011-09-30). Speedup (APL/APEX w/AWLF) per benchmark, log scale from 0.1 to 1,000; higher is better for APEX. APL: Dyalog APL 13.0. SAC: 17654:MODIFIED. Benchmarks: buildvAKS, buildvfAKS, buildv2AKS, compiotaAKS, compiotadAKS, csbenchAKS, downgradePVAKS, fdAKS, gewlfAKS, histgradeAKS, histlpAKS, histopAKS, histopfAKS, iotanAKS, ipapeAKS, ipbbAKS, ipbdAKS, ipddAKS, ipopneAKS, ipplusandAKS, lltopAKS, logd3AKS, logd4AKS, loopfsAKS, loopfvAKS, loopisAKS, mconvAKS, mconvoutAKS, nsvAKS, nthoneAKS, schedrAKS, scsAKS, testforAKS, testindxAKS, testlcvAKS, unirandAKS, unirand3AKS, upgradeBoolAKS, upgradeCharAKS, upgradePVAKS, upgradeRPVAKS.]

Why APL is Slow: Fixed Per-Primitive Overheads
[Figure: APL primitive overhead, time per element for (intvec+intvec). Y axis: microseconds/element, 0 to 140. X axis: number of elements in array, 1 to 25.]
Who suffers? Apps dominated by operations on scalars: CRC, loopy histograms, dynamic programming, RNG
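The shape of that curve can be illustrated with a simple amortization model: if each primitive call pays a fixed cost regardless of array size, the cost per element falls as the array grows. The sketch below is only an illustration; the function name and both constants are hypothetical placeholders, not values measured in the talk.

```python
def time_per_element(n, fixed_overhead_us=100.0, per_element_us=0.01):
    """Amortized cost per element of one primitive applied to n elements.

    fixed_overhead_us stands in for syntax analysis, checks, dispatch and
    memory management; per_element_us stands in for the actual arithmetic.
    Both values are hypothetical, chosen only to show the trend.
    """
    return (fixed_overhead_us + n * per_element_us) / n


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(n, round(time_per_element(n), 2))
```

Under this model a one-element argument bears the whole fixed cost, which is why scalar-dominated applications are hit hardest.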

Why APL is Slow: Fixed Per-Primitive Overheads
[Figure: APL vs. APEX CPU Time Performance (2011-09-30). Speedup (APL/APEX w/AWLF) per benchmark, log scale from 10 to 1,000; higher is better for APEX. APL: Dyalog APL 13.0. SAC: 17654:MODIFIED. Benchmarks: crcAKS, histlpAKS, lltopAKS, loopisAKS, scsAKS, testforAKS.]

Why APL is Slow: Fixed Per-Primitive Overheads
◮ Scalar-dominated apps have good serial speedup...
◮ but poor parallel speedup
[Figure: APEX/SAC Parallel Performance, SAC (17654:MODIFIED), real time, 2011-09-30, on a 6-core AMD Phenom II X6 1075T. Y axis: execution time (sec), 0x to 0.5x. Series: mt1 to mt6 (number of threads). Benchmarks: crc, histlp, lltop, loopis, scs, testfor.]

Why is APL Slow? Variable Per-Primitive Overheads
◮ Naive execution: Limited fn composition, e.g. sum( iota( N))
◮ Array-valued intermediate results: memory madness
◮ Who suffers? Apps dominated by operations on large arrays: Signal processing, convolution, normal move-out
[Figure: APL vs. APEX CPU Time Performance (2011-09-30). Speedup (APL/APEX w/AWLF) per benchmark, log scale from 0.1 to 1,000; higher is better for APEX. APL: Dyalog APL 13.0. SAC: 17654:MODIFIED. Benchmarks: iotanAKS, ipapeAKS, ipbdAKS, ipopneAKS, logd3AKS, logd4AKS, upgradeCharAKS.]
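The sum( iota( N)) example can be sketched in Python to show what limited function composition costs. This is an illustration only, not APEX or SAC output: the function names are hypothetical and iota is assumed to be 0-origin. The first version builds the N-element intermediate array before reducing it; the second is what a fused loop might look like.

```python
def sum_iota_materialized(n):
    # Naive execution: iota(n) is built as a complete n-element array,
    # and only then handed to the reduction.
    iota = list(range(n))  # array-valued intermediate result
    return sum(iota)


def sum_iota_fused(n):
    # Fused execution: a single loop produces each index and accumulates
    # it immediately, so no intermediate array is ever allocated.
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    print(sum_iota_materialized(1000) == sum_iota_fused(1000))
```

Both functions return the same value; the difference is that the materialized version needs memory proportional to N for a result that is consumed once and discarded.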

Why is APL Slow? Variable Per-Primitive Overheads
◮ Who suffers? Apps dominated by operations on large arrays: Signal processing, convolution, normal move-out
[Figure: APEX/SAC Parallel Performance, SAC (17654:MODIFIED), real time, 2011-09-30, on a 6-core AMD Phenom II X6 1075T. Y axis: execution time (sec), 0x to 1.2x. Series: mt1 to mt6 (number of threads). Benchmarks: iotan, ipape, ipbbAKD, ipbd, ipopneAKD, ipopne, logd3, logd4, upgradeChar.]

Why is APL Slow? Materialized Index Vectors
◮ Mike Jenkins’ matrix divide model
    a[;i,pi] = a[;pi,i]
◮ [i,pi] and [pi,i] are materialized index vectors
◮ A few simple changes to scalarize index vectors:
    tmp = a[;i]
    a[;i] = a[;pi]
    a[;pi] = tmp
◮ Matrix divide model now runs twice as fast!
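The two forms of the column swap can be sketched in Python on a matrix stored as a list of rows. This is a minimal illustration, not the matrix divide model itself; the function names are hypothetical. The first version builds the two 2-element index vectors on every call, as in a[;i,pi] = a[;pi,i]; the second uses only the scalars i and pi, as in the three-statement rewrite.

```python
def swap_columns_index_vectors(a, i, pi):
    # a[;i,pi] = a[;pi,i], with both index vectors materialized as lists.
    dst = [i, pi]
    src = [pi, i]
    for row in a:
        picked = [row[c] for c in src]   # read a[;pi,i]
        for c, value in zip(dst, picked):
            row[c] = value               # write a[;i,pi]


def swap_columns_scalarized(a, i, pi):
    # tmp = a[;i]; a[;i] = a[;pi]; a[;pi] = tmp, using scalar indices only.
    tmp = [row[i] for row in a]
    for row in a:
        row[i] = row[pi]
    for r, row in enumerate(a):
        row[pi] = tmp[r]


if __name__ == "__main__":
    m1 = [[1, 2, 3], [4, 5, 6]]
    m2 = [[1, 2, 3], [4, 5, 6]]
    swap_columns_index_vectors(m1, 0, 2)
    swap_columns_scalarized(m2, 0, 2)
    print(m1 == m2)
```

The results are identical; the scalarized form simply never creates the temporary index vectors, which in an APL interpreter would each need allocation, initialization, and deallocation.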

Why Materialized Index Vectors are Expensive
◮ Materialization of [i,pi] and [pi,i]:
◮ (* for indexing part)
    *Increment reference counts on i and pi
    Allocate 2-element temp vector
    Initialize temp vector descriptor
    Initialize temp vector elements
    *Perform indexing
    Deallocate 2-element temp vector
    *Decrement reference counts on i and pi

Why Materialized Index Vectors are Expensive
◮ Who suffers? Apps using explicit array indexing
◮ e.g., many apps dominated by indexed assign
◮ Who suffers? Matrix divide, compress, deal, dynamic programming
◮ Who suffers? Inner products that use the CDC STAR-100 algorithm
◮ 800x800 ipplusandAKD CPU time: 45 minutes!

Eliminating Materialized Index Vectors With IVE
◮ IFL2006 paper: Index Vector Elimination (IVE)
    Bernecky, Grelck, Herhut, Scholz, Trojahner, and Shafarenko
