SLIDE 1

Communication Avoiding Successive Band Reduction

Nick Knight, Grey Ballard, James Demmel

UC Berkeley

SIAM PP12

Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227). Additional support comes from Par Lab affiliates National Instruments, NEC, Nokia, NVIDIA, and Samsung.

SLIDE 2

Talk Summary

For high performance, we must reformulate existing algorithms in order to reduce data movement (i.e., avoid communication).

We want to tridiagonalize a symmetric band matrix

Application: dense symmetric eigenproblem. Only want the eigenvalues (no eigenvectors).

Our improved band reduction algorithm
  • moves asymptotically less data
  • achieves speedups over tuned libraries on a multicore platform: up to 2× serial, 6× parallel

With our band-reduction approach, two-step tridiagonalization of a dense matrix is communication-optimal for all problem sizes
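For concreteness, here is the task in SciPy terms (our illustration, not the authors' code): eigvals_banded takes a symmetric band matrix in LAPACK band storage and returns eigenvalues only; internally LAPACK reduces the band to tridiagonal form (dsbtrd, the routine compared against later).

    import numpy as np
    from scipy.linalg import eigvals_banded

    # Random symmetric band matrix in LAPACK lower band storage:
    # row k of `band` holds the k-th subdiagonal (trailing entries of
    # each row are not referenced).
    n, b = 1000, 8                      # dimension and bandwidth (illustrative)
    rng = np.random.default_rng(0)
    band = rng.standard_normal((b + 1, n))

    # Eigenvalues only, no eigenvectors: the band-to-tridiagonal
    # reduction is the step this talk accelerates.
    w = eigvals_banded(band, lower=True)
    print(w[:5])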

SLIDE 3

Motivation

By communication we mean
  • moving data within the memory hierarchy on a sequential computer
  • moving data between processors on a parallel computer

[Figure: two machine models: sequential (slow memory vs. fast memory) and parallel (many processors, each with local memory).]

Communication is expensive, so our goal is to minimize it
  • in many cases we need new algorithms
  • in many cases we can prove lower bounds and optimality

SLIDE 5

Direct vs Two-Step Tridiagonalization

Application: solving the dense symmetric eigenproblem via reduction to tridiagonal form (tridiagonalization)
  • Conventional approach (e.g. LAPACK) is direct tridiagonalization: A → T
  • Two-step approach reduces first to band, then band to tridiagonal: A → B → T

[Plot: performance in MFLOPS vs. matrix dimension n, comparing MatMul, Direct, and Two-step.]

SLIDE 6

Why is direct tridiagonalization slow?

Communication costs!

[Plot repeated from Slide 5: MFLOPS vs. n for MatMul, Direct, and Two-step.]

Approach          Flops       Words Moved
Direct            (4/3)n³     O(n³)
Two-step (1)      (4/3)n³     O(n³/√M)
Two-step (2)      O(n²√M)     O(n²√M)

M = fast memory size

Direct approach achieves O(1) data re-use. The two-step approach moves fewer words than the direct approach, using intermediate bandwidth b = Θ(√M).

Full-to-banded step (1) achieves O(√M) data re-use: this is optimal.

Band reduction step (2) achieves only O(1) data re-use.

Can we do better?
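As a back-of-envelope check of the table above (our numbers; constants hidden by the big-O are ignored), evaluating the words-moved expressions for an illustrative n and M shows both why two-step wins and why step (2) becomes the bottleneck:

    from math import sqrt

    n = 24000              # matrix dimension (illustrative)
    M = 24 * 2**20 // 8    # fast memory: a 24MB cache holds ~3M doubles

    direct   = n**3                   # direct tridiagonalization
    step1    = n**3 / sqrt(M)         # full-to-banded, with b = Theta(sqrt(M))
    step2    = n**2 * sqrt(M)         # band-to-tridiagonal, O(1) re-use
    two_step = step1 + step2

    print(f"direct   ~ {direct:.1e} words")    # ~1.4e13
    print(f"two-step ~ {two_step:.1e} words")  # ~1.0e12, dominated by step (2)

Step (2) dominates the two-step total, which is why improving the band reduction step is the focus of what follows.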

SLIDE 7

Band Reduction - previous work

1963 Rutishauser: Givens-based (down diagonals) and Householder-based
1968 Schwarz: Givens-based (up columns)
1975 Murata-Horikoshi: improved Rutishauser's Householder-based algorithm
1984 Kaufman: vectorized Schwarz's algorithm
1993 Lang: parallelized Murata-Horikoshi's algorithm (distributed-memory)
2000 Bischof-Lang-Sun: generalized everything but Schwarz's algorithm
2009 Davis-Rajamanickam: Givens-based in blocks
2011 Luszczek-Ltaief-Dongarra: parallelized Murata-Horikoshi's algorithm (shared-memory)
2011 Haidar-Ltaief-Dongarra: combined Luszczek-Ltaief-Dongarra and Davis-Rajamanickam

see A. Haidar’s talk in MS50 tomorrow

SLIDE 9

Successive Band Reduction (bulge-chasing)

constraint: c + d ≤ b

[Figure: one sweep of bulge chasing: Q1 annihilates a parallelogram of the band (c columns, d diagonals), and similarity updates Q2, ..., Q5 chase the resulting bulge down and off the band; annotations: b+1, d+1, c, c+d.]

b = bandwidth; c = columns; d = diagonals (per parallelogram)
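As a concrete (if naive) instance of bulge chasing, here is a dense-storage sketch with c = d = 1 and one bulge at a time, in the spirit of Schwarz's Givens-based algorithm; a real implementation works in band storage and blocks its updates. The code and parameter choices are ours, for illustration only.

    import numpy as np

    def sym_givens(A, k, c, s):
        """Similarity A <- G^T A G, where G rotates coordinates k and k+1."""
        G = np.array([[c, s], [-s, c]])
        A[[k, k + 1], :] = G @ A[[k, k + 1], :]
        A[:, [k, k + 1]] = A[:, [k, k + 1]] @ G.T

    def reduce_bandwidth_by_one(A, b):
        """One sweep: eliminate the b-th sub/superdiagonal of symmetric A."""
        n = A.shape[0]
        for j in range(n - b):           # zero A[j+b, j], then chase the bulge
            t, k = j, j + b - 1          # target column t, rotation plane (k, k+1)
            while k + 1 < n:
                x, y = A[k, t], A[k + 1, t]
                if abs(y) < 1e-14:       # nothing to annihilate: chase is over
                    break
                r = np.hypot(x, y)
                sym_givens(A, k, x / r, y / r)  # zeros A[k+1, t], bulges at A[k+b+1, k]
                t, k = k, k + b          # move down the band to chase the bulge

    # Demo: bandwidth 4 down to tridiagonal, spectrum preserved.
    n, b = 12, 4
    rng = np.random.default_rng(1)
    A = np.triu(np.tril(rng.standard_normal((n, n)), b), -b)
    A = (A + A.T) / 2
    w_ref = np.linalg.eigvalsh(A)
    for bw in range(b, 1, -1):
        reduce_bandwidth_by_one(A, bw)
    print(np.allclose(A, np.triu(np.tril(A, 1), -1), atol=1e-10))  # tridiagonal?
    print(np.allclose(np.linalg.eigvalsh(A), w_ref))               # eigenvalues kept?

Each rotation touches only O(b) band entries, so this unblocked sweep structure gets O(1) re-use; the parameters c and ω on the next slide are exactly what fixes that.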

SLIDE 10

How do we get data re-use?

1 Increase number of columns in parallelogram (c)
  • permits blocking Householder updates: O(c) re-use
  • constraint c + d ≤ b ⇒ trade-off between re-use and progress

2 Chase multiple bulges at a time (ω)
  • apply several updates to the band while it is in cache: O(ω) re-use
  • bulges cannot overlap, so the working set must fit in cache (see the sketch below)

[Figure: the stages of a single bulge chase (QR, PRE, SYM, POST) on the band, with block sizes b+1, d+1, c; see Slide 39 for the anatomy.]
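The cache constraint on ω can be made concrete with a rough model (our own illustration; the panel-size estimate and constants are guesses, not the authors' tuning rule): if each in-flight bulge touches a panel of about (b+1)(c+d) band entries, then ω is capped by how many such panels fit in fast memory.

    def max_bulges(M, b, c, d, safety=3):
        """Rough cap on omega: omega * (b+1) * (c+d) doubles should occupy
        at most 1/safety of the fast memory M (all numbers illustrative)."""
        per_bulge = (b + 1) * (c + d)
        return max(1, M // (safety * per_bulge))

    # e.g. a 24MB L3 (~3M doubles), b = 300, c = d = 150:
    print(max_bulges(24 * 2**20 // 8, 300, 150, 150))   # -> 11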

SLIDE 19

Data access patterns

[Figure: band data access patterns: one bulge at a time vs. four bulges at a time.]

ω = 4: same amount of work, 4× fewer words moved

SLIDE 20

Shared-Memory Parallel Implementation

Lots of dependencies: use pipelining. Threads maintain working sets which never overlap.

SLIDE 23

Communication-Avoiding SBR - theory

Tradeoff between c and ω:
  • c: number of columns in each parallelogram
  • ω: number of bulges chased at a time

CA-SBR cuts the remaining bandwidth in half at each sweep:
  • starts with big c and halves it at each sweep
  • starts with small ω and doubles it at each sweep

Alg.       Flops    Words Moved    Data Re-use
S          4n²b     O(n²b)         O(1)
M-H        6n²b     O(n²b)         O(1)
B-L-S*     5n²b     O(n² log b)    O(b / log b)
CA-SBR†    5n²b     O(n²b²/M)      O(M/b)

*SBR framework with optimal parameter choices
†assuming 1 ≤ b ≤ √M/3

We have similar theoretical improvements in dist-mem parallel case
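Our reading of the CA-SBR schedule above as code (a sketch; the initial c and ω are illustrative choices, and we end at bandwidth 1, i.e. tridiagonal):

    def ca_sbr_schedule(b):
        """Halve the remaining bandwidth at each sweep: c starts big and
        halves, omega starts small and doubles (illustrative values)."""
        sweeps, c, omega = [], b // 2, 1
        while b > 1:
            c = min(c, b - 1)          # keep c + d <= b with d >= 1
            d = b - c                  # with c = b/2 this halves the bandwidth
            sweeps.append({"b": b, "c": c, "d": d, "omega": omega})
            b, c, omega = b - d, max(1, c // 2), omega * 2
        return sweeps

    for s in ca_sbr_schedule(300):
        print(s)   # b: 300 -> 150 -> 75 -> 37 -> 18 -> 9 -> 4 -> 2 -> done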

SLIDE 24

Search Space for Autotuning

Main tuning parameters:

1 Number of sweeps and diagonals per sweep: {dᵢ}, satisfying Σᵢ dᵢ = b − 1 (ending at bandwidth 1, i.e. tridiagonal)

2 Parameters for the ith sweep
  a number of columns in each parallelogram: cᵢ, satisfying cᵢ + dᵢ ≤ bᵢ
  b number of bulges chased at a time: ωᵢ
  c number of times a bulge is chased in a row: ℓᵢ

3 Parameters for an individual bulge chase
  a algorithm choice (BLAS-1, BLAS-2, BLAS-3 varieties)
  b inner blocking size for BLAS-3
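For small b, the first two levels of this search space can be enumerated directly; a sketch under the constraints above (the per-chase choices in item 3 are omitted, and we take Σᵢ dᵢ = b − 1 as our reading of the sweep count):

    def diagonal_schedules(b):
        """Ordered splits (d_1, ..., d_k) with sum b - 1: bandwidth b to 1."""
        def splits(r):
            if r == 0:
                yield ()
                return
            for d in range(1, r + 1):
                for rest in splits(r - d):
                    yield (d,) + rest
        yield from splits(b - 1)

    def column_choices(b, ds):
        """For each sweep i, the admissible c_i satisfy c_i + d_i <= b_i."""
        bi, out = b, []
        for d in ds:
            out.append(list(range(1, bi - d + 1)))  # candidate c_i values
            bi -= d                                 # remaining bandwidth
        return out

    for ds in diagonal_schedules(4):
        print(ds, column_choices(4, ds))

The number of schedules alone grows as 2^(b-2), which is one reason the tuning reported here is heuristic rather than exhaustive.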

SLIDE 25

Experimental Platform

Intel Westmere-EX (Boxboro)
  • 4 sockets, 10 cores per socket, hyperthreading
  • 24MB shared L3 per socket, 256KB private L2 per core
  • MKL v10.3, PLASMA v2.4.1, ICC v11.1

Experiments run on a single socket (up to 10 threads)

SLIDE 26

CA-SBR vs MKL (dsbtrd), sequential

Speedup (rows: matrix dimension n; columns: bandwidth b):

n / b     50    100   150   200   250   300
24000    1.0   1.2   1.6   1.8   2.0   2.0
20000    1.0   1.1   1.5   1.8   1.9   2.0
16000    0.9   0.9   1.4   1.7   1.8   1.9
12000    1.0   0.9   1.2   1.5   1.7   1.8
8000     1.0   0.9   1.1   1.3   1.4   1.6
4000     0.9   0.9   1.1   1.2   1.2   1.2

SLIDE 27

CA-SBR (10 threads) vs CA-SBR (1 thread)

Speedup (rows: matrix dimension n; columns: bandwidth b):

n / b     50    100   150   200   250   300
24000    8.8   8.1   9.4   9.2   8.5   8.4
20000    9.2   8.8   9.2   8.9   8.2   8.3
16000    8.9   9.3   9.2   8.6   8.0   7.8
12000    9.0   9.8   8.9   7.9   7.4   7.4
8000     8.7   9.2   8.1   6.8   5.9   6.0
4000     8.2   6.7   5.6   4.4   3.6   3.6

SLIDE 28

CA-SBR vs PLASMA (pdsbrdt), 10 threads

Speedup (rows: matrix dimension n; columns: bandwidth b):

n / b     50    100   150   200   250   300
24000    4.0   3.2   4.6   4.7   5.2   5.9
20000    4.2   3.6   4.5   4.3   5.0   5.9
16000    4.4   4.6   4.5   4.1   4.8   5.5
12000    4.7   5.1   4.7   3.6   4.4   5.5
8000     6.7   5.7   5.5   4.0   3.8   5.0
4000     6.2   5.7   3.7   3.4   3.0   3.8

SLIDE 29

Best serial speedups on Boxboro

On the largest experimental problem (n = 24000, b = 300), our serial CA-SBR implementation attained a 2× speedup vs. MKL dsbtrd (p = 1 thread):

36% of dgemm peak (50% counting actual flops).

dsbtrd is a vectorized version of Schwarz's algorithm (O(1) re-use). Its performance did not improve with p, so we compared only serial implementations. MKL also provides an implementation of SBR (dsyrdb) but does not expose the band-to-tridiagonal routine, so we could not compare against it.

SLIDE 30

Best parallel speedups on Boxboro

On the largest experimental problem (n = 24000, b = 300), our multithreaded CA-SBR implementation attained a 6× speedup vs. PLASMA pdsbrdt (p = 10 threads):

30% of dgemm peak (40% counting actual flops).

In PLASMA v2.4.1, pdsbrdt is a tiled, multithreaded, dynamically scheduled implementation of the Murata-Horikoshi algorithm (O(1) re-use). We are collaborating with the PLASMA developers, who have since improved the pdsbrdt scheduler (the current version is 2.4.5). Our CA-SBR implementation is not NUMA-aware, so we restricted our tests to a single socket (10 cores).

SLIDE 31

Conclusions and Future Work

Theoretical results
  • Analysis of communication costs of existing algorithms
  • CA-SBR reduces communication below the lower bound for matmul
  • Open question: is it optimal?

Practical results
  • Heuristic tuning leads to speedups, for both the band reduction problem and the dense eigenproblem
  • Implementation exposes important tuning parameters
  • Future work: automate the tuning process

Extensions
  • Handle eigenvector updates (results here are for eigenvalues only)
  • Extend to the bidiagonal reduction (SVD) case
  • Distributed-memory parallel algorithm

SLIDE 32

Thank you! Nick Knight, Grey Ballard, James Demmel {knight,ballard,demmel}@cs.berkeley.edu

SLIDE 33

References I

Aggarwal, A., and Vitter, J. S. The input/output complexity of sorting and related problems. Comm. ACM 31, 9 (1988), 1116–1127.

Agullo, E., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Langou, J., Ltaief, H., Luszczek, P., and YarKhan, A. PLASMA users’ guide, 2009. http://icl.cs.utk.edu/plasma/.

Ballard, G., Demmel, J., Holtz, O., and Schwartz, O. Minimizing communication in linear algebra. SIAM J. Matrix Anal. Appl. 32, 3 (2011), 866–901.

Bischof, C., Lang, B., and Sun, X. A framework for symmetric band reduction. ACM Trans. Math. Softw. 26, 4 (2000), 581–601.

SLIDE 34

References II

Bischof, C. H., Lang, B., and Sun, X. Algorithm 807: The SBR Toolbox—software for successive band reduction. ACM Trans. Math. Softw. 26, 4 (2000), 602–616.

Demmel, J., Grigori, L., Hoemmen, M., and Langou, J. Communication-optimal parallel and sequential QR and LU factorizations. SIAM J. Sci. Comput. (2011). To appear.

Dongarra, J., Hammarling, S., and Sorensen, D. Block reduction of matrices to condensed forms for eigenvalue computations. J. Comput. Appl. Math. 27 (1989).

SLIDE 35

References III

Fuller, S. H., and Millett, L. I., Eds. The Future of Computing Performance: Game Over or Next Level? The National Academies Press, Washington, D.C., 2011.

Haidar, A., Ltaief, H., and Dongarra, J. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels. In Proceedings of the ACM/IEEE Conference on Supercomputing (2011).

Howell, G., Demmel, J., Fulton, C., Hammarling, S., and Marmol, K. Cache efficient bidiagonalization using BLAS 2.5 operators. ACM Trans. Math. Softw. 34, 3 (2008), 14:1–14:33.

Kaufman, L. Banded eigenvalue solvers on vector machines. ACM Trans. Math. Softw. 10 (1984), 73–86.

SLIDE 36

References IV

Kaufman, L. Band reduction algorithms revisited. ACM Trans. Math. Softw. 26 (December 2000), 551–567.

Lang, B. A parallel algorithm for reducing symmetric banded matrices to tridiagonal form. SIAM J. Sci. Comput. 14, 6 (1993), 1320–1338.

Lang, B. Efficient eigenvalue and singular value computations on shared memory machines. Par. Comp. 25, 7 (1999), 845–860.

SLIDE 37

References V

Ltaief, H., Luszczek, P., and Dongarra, J. High performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures. Tech. Rep. 247, LAPACK Working Note, May 2011. Submitted to ACM TOMS.

Luszczek, P., Ltaief, H., and Dongarra, J. Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (2011).

Murata, K., and Horikoshi, K. A new method for the tridiagonalization of the symmetric band matrix. Information Processing in Japan 15 (1975), 108–112.

SLIDE 38

References VI

Rajamanickam, S. Efficient Algorithms for Sparse Singular Value Decomposition. PhD thesis, University of Florida, 2009.

Rutishauser, H. On Jacobi rotation patterns. In Proceedings of Symposia in Applied Mathematics (1963), vol. 15, pp. 219–239.

Schwarz, H. Algorithm 183: Reduction of a symmetric bandmatrix to triple diagonal form. Comm. ACM 6, 6 (June 1963), 315–316.

Schwarz, H. Tridiagonalization of a symmetric band matrix. Numerische Mathematik 12 (1968), 231–241.

SLIDE 39

Anatomy of a bulge-chase

[Figure: anatomy of a bulge chase, with block sizes b+1, d+1, and c.]

QR: create zeros
PRE: A ← QᵀA
SYM: A ← QᵀAQ
POST: A ← AQ
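A dense-storage sketch of these four stages (our illustration, using NumPy's QR in place of blocked Householder kernels; a real implementation applies Q only to the O(b)-sized blocks the chase touches):

    import numpy as np

    def bulge_chase_update(A, rows, cols):
        """One two-sided update A <- Q^T A Q, with Q acting on index set `rows`;
        A[rows, cols] is the parallelogram being annihilated (cols left of rows)."""
        # QR: build Q that creates zeros in the parallelogram
        Q, _ = np.linalg.qr(A[np.ix_(rows, cols)], mode="complete")
        A[rows, :] = Q.T @ A[rows, :]   # PRE:  update rows (zeros A[rows, cols])
        A[:, rows] = A[:, rows] @ Q     # POST: update columns (creates the bulge)
        # SYM is the rows-by-rows diagonal block, which receives both updates.

    # Demo on a small symmetric matrix: annihilate entries 2..5 of column 0.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 8))
    A = (A + A.T) / 2
    w_ref = np.linalg.eigvalsh(A)
    bulge_chase_update(A, rows=list(range(1, 6)), cols=[0])
    print(np.round(A[2:6, 0], 12))                    # annihilated entries
    print(np.allclose(np.linalg.eigvalsh(A), w_ref))  # similarity preserved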

SLIDE 40

CA-SBR sequential performance (p = 1)

n / b     50     100    150    200    250    300
24000    1.78   1.85   2.25   2.55   2.78   2.93
20000    1.77   1.86   2.27   2.56   2.80   2.94
16000    1.77   1.87   2.27   2.57   2.80   2.95
12000    1.78   1.87   2.27   2.58   2.81   2.95
8000     1.80   1.85   2.27   2.59   2.80   2.96
4000     1.63   1.87   2.28   2.58   2.82   2.88

Table: Performance of sequential CA-SBR in GFLOPS. Each row corresponds to a matrix dimension n, and each column to a matrix bandwidth b. Effective flop rates are shown; actual performance may be up to 50% higher.

SLIDE 41

CA-SBR parallel performance (p = 10)

n / b     50      100     150     200     250     300
24000    15.59   14.92   21.17   23.43   23.48   24.79
20000    16.29   16.47   20.81   22.78   22.89   24.56
16000    15.80   17.32   20.81   22.02   22.34   23.08
12000    16.06   18.29   20.19   20.28   20.76   21.74
8000     15.64   17.14   18.39   17.62   16.56   17.80
4000     13.36   12.56   12.82   11.48   10.26   10.44

Table: Performance of parallel CA-SBR in GFLOPS. Each row corresponds to a matrix dimension n, and each column to a matrix bandwidth b. Effective flop rates are shown; actual performance may be up to 50% higher.
