Lower Bounds for Communication in Linear Algebra
Grey Ballard, UC Berkeley
January 9, 2012
Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227)
1
Summary
Motivation:
- Communication is costly
- We'd like to reduce communication at the algorithm level
- How much communication can we possibly avoid?

Outline:
- Communication model
- Methods of proving lower bounds
- New algorithms developed to match lower bounds
2
Memory Model
By communication we mean:
- moving data within the memory hierarchy on a sequential computer
- moving data between processors on a parallel computer

[Figure: sequential model with FAST and SLOW memory; parallel model with a local memory per processor]
3
Communication Cost Model
Measure communication in terms of messages and words:
- flop cost: γ
- cost of a message of w words: α + βw
- total running time of an algorithm (ignoring overlap): α · (# messages) + β · (# words) + γ · (# flops)
- think of α as latency + overhead cost and β as inverse bandwidth

As flop rates continue to improve more quickly than data transfer rates, the relative cost of communication (the first two terms) grows larger.
4
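The cost model above can be written as a small helper (a minimal sketch; the machine parameters and message counts below are illustrative values, not measurements of any real machine):

```python
# Simple alpha-beta-gamma cost model for an algorithm's running time
# (ignoring overlap of communication and computation).

def run_time(alpha, beta, gamma, messages, words, flops):
    """alpha: per-message latency + overhead, beta: inverse bandwidth
    (seconds per word), gamma: seconds per flop."""
    return alpha * messages + beta * words + gamma * flops

# Illustrative parameters: latency dominates for many small messages,
# bandwidth dominates for large transfers.
t = run_time(alpha=1e-6, beta=1e-9, gamma=1e-11,
             messages=1000, words=10**6, flops=10**9)
assert abs(t - 0.012) < 1e-9
```

As α and β shrink more slowly than γ, the first two terms come to dominate this sum, which is the motivation for the lower bounds that follow.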
Prior Work: Matrix Multiplication Lower Bounds
Assume an O(n³) algorithm (i.e., not Strassen-like).

Sequential case with fast memory of size M:
- lower bound on words moved between fast/slow memory: Ω(n³/√M) [HK81]
- attained by the blocked algorithm and the recursive algorithm
Parallel case with P processors (local memory of size M):
- lower bound on words communicated (along the critical path): Ω(n³/(P√M)) [ITT04]
- attained by "2D" and "3D" algorithms:

        M                Words lower bound (attained)   Reference
  2D    O(n²/P)          Ω(n²/√P)                       [Can69]
  3D    O(n²/P^(2/3))    Ω(n²/P^(2/3))                  [DNS81]

(more on these upper bounds later)
5
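The blocked algorithm that attains the sequential Ω(n³/√M) bound can be sketched as follows (a minimal numpy version; in the real algorithm the block size b is chosen near √(M/3) so that one block each of A, B, and C fits in fast memory, giving O(n³/√M) words moved — the sizes below are illustrative):

```python
import numpy as np

def blocked_matmul(A, B, b):
    """Multiply n x n matrices in b x b blocks; each triple of blocks
    (one each from A, B, C) is the working set held in fast memory."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, b):
        for j in range(0, n, b):
            for k in range(0, n, b):
                C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
    return C

rng = np.random.default_rng(0)
A, B = rng.random((8, 8)), rng.random((8, 8))
assert np.allclose(blocked_matmul(A, B, 4), A @ B)
```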
Proving lower bounds
We've used three approaches to proving communication lower bounds:
1. Reduction argument [BDHS10], [GDX11]
2. Geometric embedding [ITT04], [BDHS11b]
3. Computation graph analysis [HK81], [BDHS11a]

Blue = work in which our group at Berkeley was involved
6
Reduction Example: LU
It's easy to reduce matrix multiplication to LU:

  T ≡ [ I        -B  ]   [ I         ] [ I        -B  ]
      [ A    I       ] = [ A    I    ] [     I    A·B ] ≡ L · U
      [            I ]   [          I] [           I  ]

- LU factorization can be used to perform matrix multiplication
- the communication lower bound for matrix multiplication therefore applies to LU
- the reduction to Cholesky is a little trickier, but uses the same idea [BDHS10]
7
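A quick numerical check of this reduction (a sketch using numpy; the block size n = 3 is illustrative):

```python
import numpy as np

# Verify that the block LU factorization of T encodes the product A*B.
rng = np.random.default_rng(0)
n = 3
A, B = rng.random((n, n)), rng.random((n, n))
I, Z = np.eye(n), np.zeros((n, n))

T = np.block([[I, Z, -B],
              [A, I, Z],
              [Z, Z, I]])
L = np.block([[I, Z, Z],
              [A, I, Z],
              [Z, Z, I]])
U = np.block([[I, Z, -B],
              [Z, I, A @ B],
              [Z, Z, I]])

assert np.allclose(L @ U, T)               # T = L * U
assert np.allclose(U[n:2*n, 2*n:], A @ B)  # the (2,3) block of U is A*B
```

So any algorithm that computes the LU factors of T cheaply would also compute A·B cheaply, which is what lets the matmul lower bound transfer to LU.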
Geometric Embedding Example: Matmul
The crux of the proof is a geometric inequality from [LW49], used to prove the matrix multiplication lower bound in [ITT04].
[Figure: a box with side lengths x, y, z and its three shadows: the A shadow (area xz), the B shadow (area yz), and the C shadow (area xy)]

Volume of a box: V = xyz = √(xz) · √(yz) · √(xy)

[Figure: an arbitrary 3D set V and its three shadows on the coordinate planes]

Volume of a 3D set: V ≤ √(area(A shadow)) · √(area(B shadow)) · √(area(C shadow))
Given a limited set of data, how much useful computation can be done?
8
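The inequality can be sanity-checked on a random discrete set (a sketch; the discrete analogue of the [LW49] inequality bounds a finite voxel set's size by its three coordinate-plane projections, and the grid size and density below are illustrative):

```python
import numpy as np

# Random 3D voxel set; check |V| <= sqrt(area(A) * area(B) * area(C)),
# where A, B, C are the set's shadows on the three coordinate planes.
rng = np.random.default_rng(0)
S = rng.random((8, 8, 8)) < 0.3   # boolean occupancy grid

volume = S.sum()
shadow_areas = [S.any(axis=ax).sum() for ax in range(3)]

assert volume <= np.sqrt(np.prod(shadow_areas))
```

In the lower-bound proof, the "set" is the collection of flops performed in a segment of execution, and its shadows are the entries of A, B, and C touched by those flops.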
Extensions to the rest of O(n3) linear algebra
We extended the geometric embedding approach of [ITT04]
to other algorithms that “smell” like 3 nested loops:
- the rest of the BLAS
- Cholesky, LU, and QR factorizations
- eigenvalue and SVD reductions
- sequences of algorithms (e.g., repeated matrix squaring)
- graph algorithms (e.g., all-pairs shortest paths)

and to dense or sparse problems, in sequential or parallel cases.
see [BDHS11b] for details and proofs
9
Extensions to the rest of O(n3) linear algebra
General lower bound for O(n³) linear algebra (3 nested loops):

  # words = Ω( # flops / √(fast/local memory size) )

- this bound can be applied processor by processor, even to heterogeneous platforms
- # flops refers to the work done by that processor
- the memory size may depend on the hardware or the algorithm
- a lower bound on # messages can be derived by dividing by the memory size (the largest possible message size); this corresponds to required synchronization
10
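The bound translates directly into a back-of-the-envelope calculator (a sketch; the problem size n and memory size M below are illustrative, and the 2n³ flop count is for classical matmul):

```python
import math

def min_words(flops, M):
    """Lower bound on words moved: flops / sqrt(fast memory size)."""
    return flops / math.sqrt(M)

def min_messages(flops, M):
    """Divide by M, the largest possible message size."""
    return min_words(flops, M) / M

# Classical matmul does about 2n^3 flops.
n, M = 4096, 2**20
assert min_words(2 * n**3, M) == 2**27    # 134,217,728 words
assert min_messages(2 * n**3, M) == 128.0
```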
Computation graph analysis
- Red-blue pebble game introduced in [HK81]
- lower bounds proved for matrix multiplication, FFT, and others
- pebbling game extended in [Sav95] and later papers
[Figure: a computation graph with input/output vertices, intermediate values, and dependency edges; a subset S of vertices with its read set R_S and write set W_S]

We've connected graph expansion to communication [BDHS11a]:
- expansion describes the relationship between a subset and its neighbors in the complement
- larger expansion implies that more communication is necessary
11
Computation graph analysis example: Strassen
Strassen’s original algorithm uses 7 multiplies and 18 adds for n = 2
  M1 = (A11 + A22) · (B11 + B22)
  M2 = (A21 + A22) · B11
  M3 = A11 · (B12 − B22)
  M4 = A22 · (B21 − B11)
  M5 = (A11 + A12) · B22
  M6 = (A21 − A11) · (B11 + B12)
  M7 = (A12 − A22) · (B21 + B22)

  C11 = M1 + M4 − M5 + M7
  C12 = M3 + M5
  C21 = M2 + M4
  C22 = M1 − M2 + M3 + M6

[Figure: computation graph of one Strassen step: "Enc A" and "Enc B" encode the blocks of A and B into the seven products M1, ..., M7, and "Dec C" decodes them into the blocks of C]
12
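These formulas can be checked directly (a numpy sketch of one recursion level; a full implementation would recurse on the seven half-sized products instead of using classical multiplication for them):

```python
import numpy as np

def strassen_step(A, B):
    """One level of Strassen's recursion on even-sized square matrices,
    using classical multiplication (@) for the seven products."""
    h = A.shape[0] // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

rng = np.random.default_rng(0)
A, B = rng.random((4, 4)), rng.random((4, 4))
assert np.allclose(strassen_step(A, B), A @ B)
```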
Computation graph analysis example: Strassen
Strassen works recursively, so its graph has recursive structure
[Figure: recursive construction of Strassen's computation graph; at each level, the Enc A, Enc B, and Dec C components of the previous level are refined, yielding seven copies of the smaller graph connected through the encoding and decoding subgraphs]
Expansion properties of this graph lead to communication lower bounds
13
Lower Bounds for Strassen
The communication lower bounds are similar in form to those for classical matmul.
             Classical                   Strassen                    Strassen-like
Sequential   Ω((n/√M)^(log2 8) · M)      Ω((n/√M)^(log2 7) · M)      Ω((n/√M)^ω0 · M)
Parallel     Ω((n/√M)^(log2 8) · M/P)    Ω((n/√M)^(log2 7) · M/P)    Ω((n/√M)^ω0 · M/P)

- these lower bounds imply that Strassen and faster algorithms require less communication
- the sequential lower bound is attained by the recursive algorithm
- the parallel lower bound is attainable with a new algorithm [BDH+12]

14
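Since n/√M > 1 whenever the problem exceeds fast memory, a smaller exponent gives a smaller lower bound; a quick numeric comparison (a sketch; the n and M values are illustrative):

```python
import math

def bw_lower_bound(n, M, w):
    """Bandwidth lower bound of the form (n / sqrt(M))^w * M."""
    return (n / math.sqrt(M)) ** w * M

n, M = 2**14, 2**20
classical = bw_lower_bound(n, M, math.log2(8))  # exponent log2 8 = 3
strassen = bw_lower_bound(n, M, math.log2(7))   # exponent ~ 2.81

# A lower exponent means a smaller lower bound: Strassen is permitted
# to communicate less than any classical O(n^3) algorithm.
assert strassen < classical
```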
Lower bounds → algorithms
Proving lower bounds allows us to:
- evaluate existing algorithms
- identify possibilities for algorithmic innovation
- obtain asymptotic improvements (speedups increase with n and/or M)
15
Communication Upper Bounds - Sequential O(n3)
Algorithm             Minimizes # words                           Minimizes # messages
BLAS3                 usual blocked or recursive algorithms [Gus97, FLPR99]
Cholesky              LAPACK, [Gus97, AP00], [BDHS10]             [Gus97, AP00], [BDHS10]
Symmetric Indefinite  LAPACK (rarely), [BDD+12]                   [BDD+12]
LU with pivoting      LAPACK (rarely), [Gus97, Tol97], [GDX11]    [GDX11]
QR                    LAPACK (rarely), [FW03], [EG98], [DGHL11]   [FW03], [DGHL11]
Eig, SVD              [BDK12, BDD11]                              [BDK12, BDD11]

Blue = work in which our group at Berkeley was involved
16
Communication Upper Bounds - Parallel O(n3)
Algorithm         Reference    Factor exceeding lower     Factor exceeding lower
                               bound for # words          bound for # messages
Matrix Multiply   [Can69]      1                          1
Cholesky          ScaLAPACK    log P                      log P
LU with pivoting  [GDX11]      log P                      log P
                  ScaLAPACK    log P                      (N/P^(1/2)) log P
QR                [DGHL11]     log P                      log^3 P
                  ScaLAPACK    log P                      (N/P^(1/2)) log P
SymEig, SVD       [BDK12]      log P                      log^3 P
                  ScaLAPACK    log P                      N/P^(1/2)
NonsymEig         [BDD11]      log P                      log^3 P
                  ScaLAPACK    P^(1/2) log P              N log P

Blue = work in which our group at Berkeley was involved
Red = not optimal
Lower bounds: # words = Ω(n²/√P), # messages = Ω(√P)
*This slide assumes that one copy of the data is distributed evenly across processors

17
“2.5D” Algorithms
The lower bound for O(n³) linear algebra is Ω(n³/(P√M)) words:
- communication requirements decrease as M increases

Standard algorithms use a 2D grid of processors:
- store one copy of the data across all processors
- local memory requirements: M = Θ(n²/P)
- # words moved = Θ(n²/P^(1/2))

For matrix multiplication, can visualize processors as a 3D grid:
- requires space for P^(1/3) copies of the data
- local memory requirements: M = Θ(n²/P^(2/3))
- # words moved = Θ(n²/P^(2/3))
18
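The 2D/3D trade-off can be tabulated (a sketch of the asymptotic leading terms only; constants are dropped and the n and P values are illustrative):

```python
# Memory footprint and words moved per processor for 2D vs. 3D
# matmul layouts (leading asymptotic terms, constants omitted).

def layout_2d(n, P):
    return {"memory": n * n / P, "words": n * n / P**0.5}

def layout_3d(n, P):
    return {"memory": n * n / P**(2/3), "words": n * n / P**(2/3)}

n, P = 2**13, 64
two_d, three_d = layout_2d(n, P), layout_3d(n, P)

# 3D replicates the data (P^(1/3) copies) but moves fewer words.
assert three_d["memory"] > two_d["memory"]
assert three_d["words"] < two_d["words"]
```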
“2.5D” Algorithms
2.5D algorithms interpolate between 2D and 3D algorithms [SD11]:
- use as much local memory as is physically available, up to M = Θ(n²/P^(2/3)) as in the 3D case
- enable strong scaling (of both computation and communication)

New algorithms developed for matrix multiplication (and LU):
- perfect strong scaling achieved on BG/P ("Intrepid" at Argonne)
- speedup of 12× over 2D matmul with n = 8192 and P = 16384
19
[Figure: matrix multiplication on BG/P (n = 65,536): percentage of machine peak vs. number of nodes (256 to 2048) for 2.5D MM and 2D MM]
Parallel Strassen
The lower bound for parallel Strassen is Ω((n/√M)^(log2 7) · M/P) words.

Previous attempts to parallelize Strassen are not optimal. Our new algorithm [BDH+12]:
- uses as much local memory as is physically available, up to M = O(n²/P^(2/log2 7))
- is asymptotically optimal (up to a log P factor)
- is faster in practice than classical and previous Strassen algorithms (in fact, faster than classical peak performance)
- measured on a Cray XT4 ("Franklin" at NERSC)
20
Performance of Parallel Strassen
[Figure: effective GFlop/s per node at P = 49, 343, and 2401, comparing the new parallel Strassen algorithm, a combined algorithm, two previous parallel Strassen implementations, CA classical (2.5D), and ScaLAPACK against the absolute maximum for all classical algorithms]

fixed problem size n = 94080
21
Ongoing Work
- Extending lower bounds to arbitrary nested loops (beyond linear algebra)
- Symmetric indefinite factorization: algorithms and implementation
- Avoiding communication in sparse iterative methods
- Tuning the parallel Strassen implementation
- Sparse matrix-matrix multiplication: lower bounds and algorithms
- Autotuning implementations of new algorithms
22
Thank You!
Please contact me with questions! ballard@cs.berkeley.edu http://www.eecs.berkeley.edu/~ballard Find links to papers and other resources at the BEBOP webpage: http://bebop.cs.berkeley.edu/
23
References I
- N. Ahmed and K. Pingali.
Automatic generation of block-recursive codes. In Euro-Par ’00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, pages 368–378, London, UK, 2000. Springer-Verlag.
- G. Ballard, J. Demmel, and I. Dumitriu.
Communication-optimal parallel and sequential eigenvalue and singular value algorithms. Technical Report No. UCB/EECS-2011-14, 2011.
- G. Ballard, J. Demmel, A. Druinsky, I. Peled, O. Schwartz, and S. Toledo.
Communication avoiding symmetric indefinite factorization, 2012. In preparation.
- G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, E. Rom, and O. Schwartz.
Communication-optimal parallel algorithm for Strassen’s matrix multiplication, 2012. In preparation.
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz.
Communication-optimal parallel and sequential Cholesky decomposition. SIAM Journal on Scientific Computing, 32(6):3495–3523, 2010.
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz.
Graph expansion and communication costs of fast matrix multiplication. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures, SPAA ’11, pages 1–12, New York, NY, USA, 2011. ACM.
- G. Ballard, J. Demmel, O. Holtz, and O. Schwartz.
Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications, 32(3):866–901, 2011.
24
References II
- G. Ballard, J. Demmel, and N. Knight.
Communication avoiding successive band reduction. In Proceedings of the 17th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP ’12, New York, NY, USA, 2012. ACM. To appear.
- L. Cannon.
A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University, Bozeman, MT, 1969.
- J. Demmel, L. Grigori, M. Hoemmen, and J. Langou.
Communication-optimal parallel and sequential QR and LU factorizations. UC Berkeley Technical Report EECS-2008-89, August 2008; to appear in SIAM J. Sci. Comput., 2011.
- E. Dekel, D. Nassimi, and S. Sahni.
Parallel matrix and graph algorithms. SIAM Journal on Computing, 10(4):657–675, 1981.
- E. Elmroth and F. Gustavson.
New serial and parallel recursive QR factorization algorithms for SMP systems. In B. Kågström et al., editor, Applied Parallel Computing. Large Scale Scientific and Industrial Problems., volume 1541 of Lecture Notes in Computer Science, pages 120–128. Springer, 1998.
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran.
Cache-oblivious algorithms. In FOCS ’99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285, Washington, DC, USA, 1999. IEEE Computer Society.
25
References III
- J. D. Frens and D. S. Wise.
QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism. SIGPLAN Not., 38(10):144–154, 2003.
- L. Grigori, J. Demmel, and H. Xiang.
CALU: A communication optimal LU factorization algorithm. SIAM Journal on Matrix Analysis and Applications, 32(4):1317–1350, 2011.
- F. G. Gustavson.
Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM J. Res. Dev., 41(6):737–756, 1997.
- J. W. Hong and H. T. Kung.
I/O complexity: The red-blue pebble game. In STOC ’81: Proceedings of the thirteenth annual ACM symposium on Theory of computing, pages 326–333, New York, NY, USA, 1981. ACM.
- D. Irony, S. Toledo, and A. Tiskin.
Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017–1026, 2004.
- L. H. Loomis and H. Whitney.
An inequality related to the isoperimetric inequality. Bulletin of the AMS, 55:961–962, 1949.
- J. E. Savage.
Extending the Hong-Kung model to memory hierarchies. In Computing and Combinatorics, volume 959, pages 270–281. Springer Verlag, 1995.
26
References IV
- E. Solomonik and J. Demmel.
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In Proceedings of the 17th International Conference on Parallel Processing - Volume Part II, Euro-Par'11, pages 90–109, Berlin, Heidelberg, 2011. Springer-Verlag.
- S. Toledo.
Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl., 18(4):1065–1081, 1997.