Data Parallel Programming in R David Padua Department - PowerPoint PPT Presentation

Data Parallel Programming in R
David Padua
Department of Computer Science
University of Illinois at Urbana-Champaign

Outline
• Parallelism
• Data Parallel Programming and abstractions
• Hierarchically Tiled Arrays
• Future plans

I. Parallelism
• Parallelism is crucial for
  – Continued gains in performance
  – Maximum performance at any given time
  – It is also the most natural way to program for reactive computing (but that is not the topic of this presentation)
• The main problem with parallelism is productivity.
• We need the right languages, libraries, and tools.

II. Data parallel programming
• In its simplest form, it is just the execution of the same operation on each element of an aggregate (array, set, database relation).
• Execution is sequential across these operations; the parallelism is within each operation.
• The crucial issue is what these operations should look like (a research problem).
• There are numerous proposals:
  – Array operations (Iverson, ca. 1960)
  – MapReduce (Google, ca. 2000)
  – Galois (Pingali, ca. 2000)
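As a rough illustration of the simplest form described above, here is a Python sketch. The list aggregate, the thread pool, and the helper name data_parallel_map are illustrative assumptions, not anything from the talk. Each call applies one operation to every element, and the second call starts only after the first has produced all of its results.

```python
from concurrent.futures import ThreadPoolExecutor

def data_parallel_map(op, aggregate, workers=4):
    """Apply the same operation to every element of an aggregate (hypothetical helper).

    Elements are independent, so the pool may evaluate them concurrently;
    pool.map still returns the results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(op, aggregate))

x = [1.0, 2.0, 3.0, 4.0]
# Operation 1: parallel across the elements of x.
y = data_parallel_map(lambda v: v * v, x)
# Operation 2: runs only after operation 1 has finished, i.e. sequential across operations.
z = data_parallel_map(lambda v: v + 1.0, y)
print(z)  # [2.0, 5.0, 10.0, 17.0]
```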

II. Data parallel programming: Array Constructs
• Popular among scientists and engineers:
  – Fortran 90 and its successors
  – MATLAB
  – R
• Parallelism was not the reason for this notation.

II. Data parallel programming: Array Constructs
• Convenient notation
  – Compact
  – Higher level of abstraction

  Element-wise addition, loop form and array form:
      do i=1,n
        do j=1,n
          C(i,j) = A(i,j) + B(i,j)
        end do
      end do

      C = A + B

  Reduction, loop form and array form:
      do i=1,n
        do j=1,n
          S = S + A(i,j)
        end do
      end do

      S = S + sum(A)

II. Data parallel programming: Array constructs
• Used in the past for parallelism: Illiac IV, Connection Machine
• Today: Intel Cilk Plus array notation (mainly for microprocessor vector extensions)

II. Data parallel programming: Benefits - Programmability
• Data parallel programming is scalable.
  – It scales with an increasing number of processors by increasing the size of the data.
• Data parallel programs using powerful operators resemble conventional, serial programs.
  – Parallelism is encapsulated.
  – Parallelism is structured.
• Portable
  – Can run on any class of machine for which the appropriate operators are implemented:
    • shared/distributed memory, vector intrinsics, GPUs
• Interoperates with R!
Operations are implemented as parallel loops in shared memory, as messages in distributed memory, and with vector intrinsics for SIMD.
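A minimal Python sketch of "parallelism is encapsulated", assuming a hypothetical par_sum operator backed by a shared-memory thread pool. The caller writes one serial-looking call; inside, the array is split into chunks, each chunk is summed by a worker, and the partial sums are combined. Note that CPython threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this shows the structure rather than the performance; the same interface could in principle be backed by message passing or SIMD intrinsics instead.

```python
from concurrent.futures import ThreadPoolExecutor

def par_sum(data, workers=4):
    """Hypothetical data-parallel sum operator.

    Splits the array into contiguous chunks, sums each chunk in a worker
    (the shared-memory "parallel loop"), then combines the partial sums.
    """
    n = len(data)
    if n == 0:
        return 0
    chunk = -(-n // workers)  # ceiling division: at most `workers` chunks
    pieces = [data[k:k + chunk] for k in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, pieces))
    return sum(partials)

# User code reads like a conventional serial program.
x = list(range(1, 1001))
total = par_sum(x)
print(total)  # 500500
```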

II. Data parallel programming: Completeness
• Can all problems be solved in the most efficient manner with data parallel programming?
• Most? All??
  – Other (lower-level) forms must be available, in the same way that we still use assembly language sometimes.

II. Data parallel programming: Completeness
• Numerical computing
  – William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 1996. Numerical Recipes in Fortran 90 (2nd Ed.): The Art of Parallel Scientific Computing. Cambridge University Press, New York, NY, USA.
• Graph algorithms
  – Aydın Buluç and John R. Gilbert. 2011. The Combinatorial BLAS: design, implementation, and applications. Int. J. High Perform. Comput. Appl. 25, 4 (November 2011), 496-509.
  – Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). ACM, New York, NY, USA, 12-25.
• Database algorithms
  – Anand Rajaraman and Jeff Ullman. 2011. Mining of Massive Datasets. Cambridge University Press.
• …

II. Data parallel programming: Translating to SPMD
• SPMD is the notation of choice for distributed-memory machines (and GPUs).
• It is easy to convert from array notation to SPMD form.
• This conversion is an optimization.

  Array notation:
      real a, b, x(1000)
      a = sin(b)
      x(:) = x(:) + a

  SPMD form (on each of p processes):
      real a, b, x(1000/p)   /* a and b are replicated */
      a = sin(b)
      x(:) = x(:) + a
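The translation above can be checked with a small Python sketch. It is only a simulation under stated assumptions: the p "processes" run one after another in a loop (a real SPMD program would use separate processes, for example under MPI), p is assumed to divide the array length, and the helper names are hypothetical. Each rank recomputes the replicated scalar a and updates only its own slice, so no communication is needed and the concatenated slices match the array-notation result.

```python
import math

def array_form(b, x):
    # Global view: a = sin(b); x(:) = x(:) + a
    a = math.sin(b)
    return [xi + a for xi in x]

def spmd_rank(rank, p, b, x_global):
    """What one of p SPMD processes does (hypothetical helper, simulated here)."""
    n_local = len(x_global) // p                      # each rank owns x(1000/p)
    x_local = x_global[rank * n_local:(rank + 1) * n_local]
    a = math.sin(b)                                   # a and b replicated: every rank computes a
    return [xi + a for xi in x_local]                 # purely local update

def spmd_form(b, x, p):
    assert len(x) % p == 0, "sketch assumes p divides the array length"
    parts = [spmd_rank(r, p, b, x) for r in range(p)]  # ranks run sequentially here
    return [v for part in parts for v in part]

x = [float(i) for i in range(1000)]
print(spmd_form(0.5, x, p=4) == array_form(0.5, x))  # True
```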

III. Hierarchically Tiled Arrays: our data parallel notation for array computations

Hierarchically Tiled Arrays
• Recognizes the importance of blocking/tiling for locality and parallel programming.
• Makes tiles first-class objects.
  – Referenced explicitly.
  – Manipulated using array operations such as reductions, gather, etc.
G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. Fraguela, M. Garzarán, D. Padua, and C. von Praun. Programming for Parallelism and Locality with Hierarchically Tiled Arrays. PPoPP, March 2006.

Hierarchically Tiled Arrays
(Figure: multi-level tiled array; labels: Distributed, Locality, Locality.)

Vector Addressing
(Figure examples: a(1:2,1:2) and a(1,1:4); in general, a(v1, …, vn).)
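The figure's examples select sub-arrays with index vectors. Here is a small Python sketch of those semantics, assuming a 2-D list and two hypothetical helpers: index, and span, which translates MATLAB's 1-based, inclusive lo:hi ranges into Python's 0-based lists. The 3x4 matrix is made up for illustration. Any index vector can be passed, not only a contiguous range.

```python
def index(a, rows, cols):
    """MATLAB-style a(rows, cols); rows and cols are 1-based index vectors."""
    return [[a[r - 1][c - 1] for c in cols] for r in rows]

def span(lo, hi):
    """The range lo:hi with 1-based, inclusive bounds."""
    return list(range(lo, hi + 1))

# Hypothetical 3x4 matrix whose entries encode their (row, column) position.
a = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34]]
print(index(a, span(1, 2), span(1, 2)))  # a(1:2,1:2) -> [[11, 12], [21, 22]]
print(index(a, [1], span(1, 4)))         # a(1,1:4)   -> [[11, 12, 13, 14]]
print(index(a, [3, 1], [2]))             # a([3 1],2) -> [[32], [12]]
```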


HTA Addressing
• Hierarchical: h{1,1:2} ↔ HTA; h{2,1} ↔ array
• Flattened: h(3,4) ↔ scalar
• Hybrid: h{1:2,2}(1:2,2) ↔ HTA
• Logical indexing: h{ i + j == 3 } ↔ HTA
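To make the four addressing modes concrete, here is a Python sketch of a two-level tiled array. The class and method names (TiledArray, tile, flat, hybrid, where) are hypothetical and are not the HTA library's API. The 4x4 array split into 2x2 tiles of 2x2 elements is also an assumption, because the original figure's shape is not recoverable. All indices are 1-based to match the slide notation.

```python
class TiledArray:
    """Hypothetical two-level HTA: a grid of equally sized 2-D tiles (1-based indices)."""

    def __init__(self, grid_rows, grid_cols, tile_rows, tile_cols, fill):
        # fill(r, c) gives the value stored at flattened position (r, c).
        self.tr, self.tc = tile_rows, tile_cols
        self.tiles = [
            [
                [[fill(ti * tile_rows + r + 1, tj * tile_cols + c + 1)
                  for c in range(tile_cols)]
                 for r in range(tile_rows)]
                for tj in range(grid_cols)
            ]
            for ti in range(grid_rows)
        ]

    def tile(self, i, j):
        """Hierarchical h{i,j}: the whole tile, returned as an array."""
        return self.tiles[i - 1][j - 1]

    def flat(self, r, c):
        """Flattened h(r,c): ignores the tiling and returns a scalar."""
        ti, ri = divmod(r - 1, self.tr)
        tj, ci = divmod(c - 1, self.tc)
        return self.tiles[ti][tj][ri][ci]

    def hybrid(self, tile_rows, tile_col, rows, col):
        """Hybrid h{tile_rows, tile_col}(rows, col): the same element range taken
        from every selected tile, so the result is still tiled."""
        return [[self.tiles[i - 1][tile_col - 1][r - 1][col - 1] for r in rows]
                for i in tile_rows]

    def where(self, pred):
        """Logical indexing h{pred(i, j)}: the tiles whose indices satisfy pred."""
        return {(i + 1, j + 1): t
                for i, row in enumerate(self.tiles)
                for j, t in enumerate(row)
                if pred(i + 1, j + 1)}

# A 4x4 array as 2x2 tiles of 2x2 elements; the value 10*r + c marks position (r, c).
h = TiledArray(2, 2, 2, 2, lambda r, c: 10 * r + c)
print(h.flat(3, 4))                              # 34
print(h.tile(2, 1))                              # [[31, 32], [41, 42]]
print(h.hybrid([1, 2], 2, [1, 2], 2))            # [[14, 24], [34, 44]]
print(sorted(h.where(lambda i, j: i + j == 3)))  # [(1, 2), (2, 1)]
```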

Higher level operations
(Figure examples: repmat(h, [1, 3]), circshift(h, [0, -1]), transpose(h).)

Higher Level Operations
• Many operators are part of the library
  – reduce, circular shift, replicate, transpose, etc.
• A map operation (hmap)
  – Applies user-defined operators to each tile of the HTA
    • and to the corresponding tiles if multiple HTAs are passed as input
  – The operator is applied in parallel across tiles
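Here is a Python sketch of tile-level versions of some of the library operators listed above. It assumes an HTA's top level is a 2-D list of tiles that move as opaque units, with strings standing in for tiles so their movement is easy to follow. The function names follow the MATLAB-style calls in the figure (circshift, repmat, transpose), but their behaviour here is an assumption. In particular, this transpose only permutes tile positions and does not transpose the contents of each tile.

```python
def circshift(grid, shift):
    """Circularly shift a grid of tiles by shift = [rows, cols] (MATLAB sign convention).

    Tiles move as units: new[i][j] = old[(i - rows) mod n][(j - cols) mod m].
    """
    rows, cols = shift
    n, m = len(grid), len(grid[0])
    return [[grid[(i - rows) % n][(j - cols) % m] for j in range(m)] for i in range(n)]

def repmat(grid, reps):
    """Replicate the tile grid reps[0] times down and reps[1] times across."""
    down, across = reps
    return [row * across for _ in range(down) for row in grid]

def transpose(grid):
    """Swap tile positions (i, j) -> (j, i); tile contents are left unchanged."""
    return [list(col) for col in zip(*grid)]

def reduce_tiles(op, grid, init):
    """Fold op over every tile in row-major order."""
    acc = init
    for row in grid:
        for t in row:
            acc = op(acc, t)
    return acc

h = [["t11", "t12", "t13"]]
print(circshift(h, [0, -1]))                # [['t12', 't13', 't11']]
print(repmat([["a", "b"]], [1, 3]))         # [['a', 'b', 'a', 'b', 'a', 'b']]
print(transpose([["a", "b"], ["c", "d"]]))  # [['a', 'c'], ['b', 'd']]
```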

User Defined Operations
    HTA X(3,3)[10]      // 3x3 tiles of 10 elements
    HTA Y(3,3)[10]
    …
    hmap( F(), X, Y )

    F(HTA x, HTA y) {
        y[i] = x[i] * x[i] - 3      // for each element i of the tile
    }
(Figure: F(…) is applied to each of the 3x3 pairs of corresponding tiles of X and Y.)
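The hmap example can be mimicked in Python. In this sketch, each HTA is assumed to be a 3x3 nested list of 10-element tiles, and a thread pool stands in for the parallel runtime. The hmap signature and implementation are hypothetical, not the HTA library's. As on the slide, F receives corresponding tiles of X and Y and updates the Y tile in place, and the nine calls are independent of one another.

```python
from concurrent.futures import ThreadPoolExecutor

def hmap(f, *htas, workers=4):
    """Hypothetical hmap: apply f to corresponding tiles of each HTA, tiles in parallel."""
    rows, cols = len(htas[0]), len(htas[0][0])

    def apply_at(pos):
        i, j = pos
        return f(*(hta[i][j] for hta in htas))

    positions = [(i, j) for i in range(rows) for j in range(cols)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every tile and re-raises any exception raised by f
        return list(pool.map(apply_at, positions))

def F(x, y):
    # Element-wise over one tile: y[i] = x[i] * x[i] - 3
    for i in range(len(x)):
        y[i] = x[i] * x[i] - 3

X = [[[float(k) for k in range(10)] for _ in range(3)] for _ in range(3)]  # 3x3 tiles of 10
Y = [[[0.0] * 10 for _ in range(3)] for _ in range(3)]
hmap(F, X, Y)
print(Y[0][0])  # [-3.0, -2.0, 1.0, 6.0, 13.0, 22.0, 33.0, 46.0, 61.0, 78.0]
```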

Cannon's Matrix Multiplication
(Figure: 3x3 block matrices A and B. The initial skew shifts block row i of A left by i and block column j of B up by j:
    A00 A01 A02        B00 B11 B22
    A11 A12 A10        B10 B21 B02
    A22 A20 A21        B20 B01 B12
Each shift-multiply-add step then multiplies corresponding blocks, accumulates the products, and shifts A left and B up by one block.)

Cannon's Matrix Multiplication
    % Main loop
    for i = 1:n
        c = c + a * b;
        a = circshift( a, [0, -1] );
        b = circshift( b, [-1, 0] );
    end
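Here is a Python sketch of the whole algorithm, combining the initial skew from the figure with the main loop. It rests on several assumptions. Matrices are plain nested lists, and a, b, and c are grids of square tiles. `a * b` on the tile grids is read as a tile-by-tile matrix product at each grid position. The helpers (matmul, matadd, circshift, to_tiles, from_tiles) are hypothetical. After the skew, step t pairs A(i,k) with B(k,j) for k = (i + j + t) mod n, so n steps cover every k. In this sketch the tile products inside one step are computed sequentially, although they are independent of each other.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def circshift(grid, rows, cols):
    # MATLAB-style circshift of the tile grid: new[i][j] = old[(i - rows) % n][(j - cols) % m]
    n, m = len(grid), len(grid[0])
    return [[grid[(i - rows) % n][(j - cols) % m] for j in range(m)] for i in range(n)]

def cannon(A, B):
    """Cannon's algorithm on n x n grids of t x t tiles."""
    n = len(A)
    t = len(A[0][0])
    # Initial skew: block row i of A moves left by i, block column j of B moves up by j.
    a = [[A[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[[[0] * t for _ in range(t)] for _ in range(n)] for _ in range(n)]
    for _ in range(n):
        # c = c + a * b: each tile pair is multiplied independently of the others
        c = [[matadd(c[i][j], matmul(a[i][j], b[i][j])) for j in range(n)] for i in range(n)]
        a = circshift(a, 0, -1)  # a = circshift(a, [0, -1])
        b = circshift(b, -1, 0)  # b = circshift(b, [-1, 0])
    return c

def to_tiles(M, t):
    # Split a square matrix into an n x n grid of t x t tiles (t must divide the size).
    n = len(M) // t
    return [[[row[bj * t:(bj + 1) * t] for row in M[bi * t:(bi + 1) * t]]
             for bj in range(n)] for bi in range(n)]

def from_tiles(G):
    # Reassemble a grid of tiles into one matrix.
    return [[v for tile in grid_row for v in tile[r]]
            for grid_row in G for r in range(len(grid_row[0]))]

M = [[i * 6 + j for j in range(6)] for i in range(6)]
N = [[(i + 2 * j) % 5 for j in range(6)] for i in range(6)]
# 6x6 matrices as a 3x3 grid of 2x2 tiles, matching the 3x3 block layout in the figure.
print(from_tiles(cannon(to_tiles(M, 2), to_tiles(N, 2))) == matmul(M, N))  # True
```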

FT
