SLIDE 1

Band Systems Tridiagonal Systems Cyclic Reduction

Parallel Numerical Algorithms

Chapter 4 – Sparse Linear Systems
Section 4.2 – Banded Matrices

Michael T. Heath and Edgar Solomonik

Department of Computer Science University of Illinois at Urbana-Champaign

CS 554 / CSE 512

Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 1 / 28

SLIDE 2

Outline

1. Band Systems
2. Tridiagonal Systems
3. Cyclic Reduction

SLIDE 3

Banded Linear Systems

Bandwidth (or semibandwidth) of n × n matrix A is smallest value w such that aᵢⱼ = 0 for all |i − j| > w

Matrix is banded if w ≪ n

If w ≫ p, then minor modifications of parallel algorithms for dense LU or Cholesky factorization are reasonably efficient for solving banded linear system Ax = b

If w ≲ p, then standard parallel algorithms for LU or Cholesky factorization utilize few processors and are very inefficient
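As a concrete illustration of the definition (a hypothetical Python sketch; the slides themselves contain no code), the semibandwidth of a dense matrix stored as a list of lists can be computed directly:

```python
def semibandwidth(A):
    """Return the smallest w such that A[i][j] == 0 whenever |i - j| > w.

    A is a dense n x n matrix stored as a list of lists; w = 0 means the
    matrix is diagonal, w = 1 tridiagonal, and it is "banded" when w << n.
    """
    n = len(A)
    w = 0
    for i in range(n):
        for j in range(n):
            if A[i][j] != 0 and abs(i - j) > w:
                w = abs(i - j)
    return w
```

In practice a banded matrix would be stored in compressed band form with w known in advance; this dense scan is only meant to make the definition concrete.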

SLIDE 4

Narrow Banded Linear Systems

More efficient parallel algorithms for narrow banded linear systems are based on divide-and-conquer approach in which band is partitioned into multiple pieces that are processed simultaneously

Reordering matrix by nested dissection is one example of this approach

Because of fill, such methods generally require more total work than best serial algorithm for system with dense band

We will illustrate for tridiagonal linear systems, for which w = 1, and will assume pivoting is not needed for stability (e.g., matrix is diagonally dominant or symmetric positive definite)

SLIDE 5

Tridiagonal Linear System

Tridiagonal linear system has form

$$\begin{bmatrix}
b_1 & c_1 & & & \\
a_2 & b_2 & c_2 & & \\
& \ddots & \ddots & \ddots & \\
& & a_{n-1} & b_{n-1} & c_{n-1} \\
& & & a_n & b_n
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}$$

For tridiagonal system of order n, LU or Cholesky factorization incurs no fill, but yields serial thread of length Θ(n) through task graph, and hence no parallelism

Neither cdivs nor cmods can be done simultaneously
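For comparison with the parallel methods that follow, here is a sketch of the serial Θ(n) solve (LU factorization specialized to the tridiagonal case, often called the Thomas algorithm; Python is used for illustration and is not part of the slides). Each elimination step uses the result of the previous one, which is precisely the serial thread through the task graph:

```python
def solve_thomas(a, b, c, y):
    """Solve a_i x_{i-1} + b_i x_i + c_i x_{i+1} = y_i without pivoting.

    a is the subdiagonal (a[0] unused), b the diagonal, c the superdiagonal
    (c[n-1] unused). Assumes pivoting is not needed for stability (e.g.,
    diagonal dominance). Theta(n) work, but entirely sequential.
    """
    n = len(b)
    b, y = list(b), list(y)
    for i in range(1, n):               # forward elimination: no fill occurs
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        y[i] -= w * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / b[n - 1]
    for i in range(n - 2, -1, -1):      # back substitution, also sequential
        x[i] = (y[i] - c[i] * x[i + 1]) / b[i]
    return x
```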

SLIDE 6

Tridiagonal System, Natural Order

[Figure: graph G(A), nonzero patterns of A and L, and elimination tree T(A) for tridiagonal matrix of order 15 in natural order. G(A) is a path on nodes 1, …, 15, factorization incurs no fill, and T(A) is a single chain, reflecting the purely serial elimination order.]

SLIDE 7

Two-Way Elimination

Other orderings may enable some degree of parallelism, however

For example, elimination from both ends (sometimes called twisted factorization) yields two concurrent threads (odd-numbered nodes and even-numbered nodes) through task graph and still incurs no fill
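A sketch of the idea (illustrative Python, not from the slides): eliminate the subdiagonal sweeping down through the top half and the superdiagonal sweeping up through the bottom half; the two sweeps are independent and meet at a middle equation, after which back substitution proceeds outward in both directions at once:

```python
def solve_twisted(a, b, c, y):
    """Two-way (twisted) elimination for a tridiagonal system.

    a: subdiagonal (a[0] unused), b: diagonal, c: superdiagonal (c[n-1]
    unused). Assumes no pivoting is needed. The two elimination loops are
    independent and could run concurrently, as could the two back-
    substitution loops; no fill is incurred.
    """
    n = len(b)
    b, y = list(b), list(y)
    m = n // 2
    for i in range(1, m + 1):           # sweep down: eliminate subdiagonal
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        y[i] -= w * y[i - 1]
    for i in range(n - 2, m - 1, -1):   # sweep up: eliminate superdiagonal
        w = c[i] / b[i + 1]
        b[i] -= w * a[i + 1]
        y[i] -= w * y[i + 1]
    x = [0.0] * n
    x[m] = y[m] / b[m]                  # middle equation is now decoupled
    for i in range(m - 1, -1, -1):      # substitute upward
        x[i] = (y[i] - c[i] * x[i + 1]) / b[i]
    for i in range(m + 1, n):           # substitute downward
        x[i] = (y[i] - a[i] * x[i - 1]) / b[i]
    return x
```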

SLIDE 8

Tridiagonal System, Two-Way Elimination

[Figure: G(A), nonzero patterns of A and L, and elimination tree T(A) for the same order-15 tridiagonal matrix under two-way (twisted) ordering, with odd-numbered nodes eliminated from one end and even-numbered nodes from the other. There is still no fill, and T(A) consists of two independent chains that merge at the final node.]

SLIDE 9

Odd-Even Ordering

Repeating this idea recursively gives odd-even ordering (variant of nested dissection), which yields even more parallelism, but incurs some fill

SLIDE 10

Tridiagonal System, Odd-Even Ordering

[Figure: G(A), nonzero patterns of A and L, and elimination tree T(A) for the order-15 tridiagonal matrix under odd-even ordering. Entries marked + in L are fill; T(A) is now a balanced binary tree, exposing parallelism at each level.]

SLIDE 11

Cyclic Reduction

Recursive nested dissection for tridiagonal system can be effectively implemented using cyclic reduction (or odd-even reduction)

Linear combinations of adjacent equations in tridiagonal system are used to eliminate alternate unknowns

Adding appropriate multiples of (i − 1)st and (i + 1)st equations to ith equation eliminates xᵢ₋₁ and xᵢ₊₁, respectively, from ith equation

Resulting new ith equation involves xᵢ₋₂, xᵢ, and xᵢ₊₂, but not xᵢ₋₁ or xᵢ₊₁

SLIDE 12

Cyclic Reduction

For tridiagonal system, ith equation

$$a_i\, x_{i-1} + b_i\, x_i + c_i\, x_{i+1} = y_i$$

is transformed into

$$\bar a_i\, x_{i-2} + \bar b_i\, x_i + \bar c_i\, x_{i+2} = \bar y_i$$

where

$$\bar a_i = \alpha_i\, a_{i-1}, \qquad
\bar b_i = b_i + \alpha_i\, c_{i-1} + \beta_i\, a_{i+1}, \qquad
\bar c_i = \beta_i\, c_{i+1}, \qquad
\bar y_i = y_i + \alpha_i\, y_{i-1} + \beta_i\, y_{i+1}$$

with $\alpha_i = -a_i / b_{i-1}$ and $\beta_i = -c_i / b_{i+1}$
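In code, one transformation sweep looks as follows (an illustrative Python sketch, not from the slides, using 0-based indexing with a[0] = c[n−1] = 0; boundary equations simply drop the terms that fall outside the range). Every iteration of the loop is independent of the others, so all n equations can be transformed simultaneously:

```python
def cr_step(a, b, c, y):
    """Apply one cyclic-reduction transformation to every equation.

    Input equations: a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = y[i]
    (0-based, with a[0] and c[n-1] zero). The returned arrays satisfy
    na[i] x[i-2] + nb[i] x[i] + nc[i] x[i+2] = ny[i].
    Assumes no pivoting is needed (e.g., diagonal dominance).
    """
    n = len(b)
    na, nb, nc, ny = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):                  # iterations are mutually independent
        alpha = -a[i] / b[i - 1] if i > 0 else 0.0
        beta = -c[i] / b[i + 1] if i < n - 1 else 0.0
        na[i] = alpha * a[i - 1] if i > 1 else 0.0     # couples x[i-2]
        nc[i] = beta * c[i + 1] if i < n - 2 else 0.0  # couples x[i+2]
        nb[i] = b[i] + (alpha * c[i - 1] if i > 0 else 0.0) \
                     + (beta * a[i + 1] if i < n - 1 else 0.0)
        ny[i] = y[i] + (alpha * y[i - 1] if i > 0 else 0.0) \
                     + (beta * y[i + 1] if i < n - 1 else 0.0)
    return na, nb, nc, ny
```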

SLIDE 13

Cyclic Reduction

After transforming each equation in system (handling first two and last two equations as special cases), matrix of resulting new system has form

$$\begin{bmatrix}
\bar b_1 & & \bar c_1 & & & & \\
& \bar b_2 & & \bar c_2 & & & \\
\bar a_3 & & \bar b_3 & & \bar c_3 & & \\
& \ddots & & \ddots & & \ddots & \\
& & \bar a_{n-2} & & \bar b_{n-2} & & \bar c_{n-2} \\
& & & \bar a_{n-1} & & \bar b_{n-1} & \\
& & & & \bar a_n & & \bar b_n
\end{bmatrix}$$

SLIDE 14

Cyclic Reduction

Reordering equations and unknowns to place odd indices before even indices, matrix then has form

$$\begin{bmatrix}
\bar b_1 & \bar c_1 & & & & & & \\
\bar a_3 & \bar b_3 & \ddots & & & & & \\
& \ddots & \ddots & \bar c_{n-3} & & & & \\
& & \bar a_{n-1} & \bar b_{n-1} & & & & \\
& & & & \bar b_2 & \bar c_2 & & \\
& & & & \bar a_4 & \bar b_4 & \ddots & \\
& & & & & \ddots & \ddots & \bar c_{n-2} \\
& & & & & & \bar a_n & \bar b_n
\end{bmatrix}$$

SLIDE 15

Cyclic Reduction

System breaks into two independent tridiagonal systems that can be solved simultaneously (i.e., divide-and-conquer)

Each resulting tridiagonal system can in turn be solved using same technique (i.e., recursively)

Thus, there are two distinct sources of potential parallelism:

• simultaneous transformation of equations in system
• simultaneous solution of multiple tridiagonal subsystems
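The whole recursion can be sketched compactly (illustrative serial Python, not from the slides; a parallel implementation would transform the equations of each level concurrently and hand the two half-size subsystems to separate tasks):

```python
def cyclic_reduction_solve(a, b, c, y):
    """Solve a tridiagonal system by recursive cyclic reduction.

    a: subdiagonal (a[0] = 0), b: diagonal, c: superdiagonal (c[n-1] = 0),
    all 0-based. Assumes no pivoting is needed (e.g., diagonal dominance).
    """
    n = len(b)
    if n == 1:
        return [y[0] / b[0]]
    # Transform every equation so equation i couples x[i-2], x[i], x[i+2];
    # all n transformations are mutually independent.
    na, nb, nc, ny = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        alpha = -a[i] / b[i - 1] if i > 0 else 0.0
        beta = -c[i] / b[i + 1] if i < n - 1 else 0.0
        na[i] = alpha * a[i - 1] if i > 1 else 0.0
        nc[i] = beta * c[i + 1] if i < n - 2 else 0.0
        nb[i] = b[i] + (alpha * c[i - 1] if i > 0 else 0.0) \
                     + (beta * a[i + 1] if i < n - 1 else 0.0)
        ny[i] = y[i] + (alpha * y[i - 1] if i > 0 else 0.0) \
                     + (beta * y[i + 1] if i < n - 1 else 0.0)
    # Odd/even reordering: the transformed equations split into two
    # independent tridiagonal subsystems, solved here by recursion.
    x = [0.0] * n
    for start in (0, 1):
        idx = range(start, n, 2)
        sx = cyclic_reduction_solve([na[i] for i in idx], [nb[i] for i in idx],
                                    [nc[i] for i in idx], [ny[i] for i in idx])
        for k, i in enumerate(idx):
            x[i] = sx[k]
    return x
```

Each level transforms all of its equations before recursing, which is the source of the Θ(n log n) total work discussed on the next slide.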

SLIDE 16

Cyclic Reduction

Cyclic reduction requires log₂ n steps, each of which requires Θ(n) operations, so total work is Θ(n log n)

Serially, cyclic reduction is therefore inferior to LU or Cholesky factorization, which require only Θ(n) work for tridiagonal system

But in parallel, cyclic reduction can exploit up to n-fold parallelism and requires only Θ(log n) time in best case

Often matrix becomes approximately diagonal in fewer than log₂ n steps, in which case reduction can be truncated and still attain acceptable accuracy

SLIDE 17

Cyclic Reduction

Cost for solving tridiagonal system by best serial algorithm is about

$$T_1 \approx 8\,\gamma\, n$$

where γ is time for one addition or multiplication

Cost for solving tridiagonal system serially by cyclic reduction is about

$$T_1 \approx 12\,\gamma\, n \log_2 n$$

which means that efficiency is less than 67%, even with p = 1

SLIDE 18

Parallel Cyclic Reduction

Partition: task i stores and performs reductions on ith equation of tridiagonal system, yielding n fine-grain tasks

Communicate: data from “adjacent” equations is required to perform eliminations at each of log n stages

Agglomerate: n/p equations assigned to each of p coarse-grain tasks, thereby limiting communication to only log p stages

Map: assigning contiguous rows to processors is better than cyclic mapping in this context

“Local” tridiagonal system within each processor can be solved by serial cyclic reduction or by LU or Cholesky factorization

SLIDE 19

Parallel Cyclic Reduction

Parallel execution time for cyclic reduction is about

$$T_p \approx 12\,\gamma\,(n \log_2 n)/p + (\alpha + 4\,\beta)\log p$$

Algorithm efficiency is $E_p = \Omega(1/\log n)$ relative to optimal serial counterpart, but relative to $T_1 = \Theta(n \log n)$ it is strongly and weakly log-scalable

Can decrease work to $W_p = \Theta(n \log p)$ by using work-efficient serial algorithm locally

Can lower communication cost to $\Theta(\alpha \log_k p + \beta\, k \log_k p)$ by exchanging ghost zones of size k and performing $\log_2 k$ cyclic reduction steps per exchange

SLIDE 20

Parallel Tridiagonal Linear Solve

We can achieve asymptotic work-efficiency and log-scalability via the following scheme: write A as sum of its subdiagonal, diagonal, and superdiagonal parts,

$$A = D_{-1}(x) + D_0(y) + D_{+1}(z)$$

where $D_k(v)$ places vector $v$ (here $x_1, \dots, x_{n-1}$, $y_1, \dots, y_n$, and $z_1, \dots, z_{n-1}$, respectively) on the kth diagonal

Then define cyclic permutation matrix P so that

$$PAP^T = \begin{bmatrix}
D_0(y_{\mathrm{odds}}) & D_0(z_{\mathrm{odds}}) + D_{-1}(x_{\mathrm{evens}}) \\
D_0(x_{\mathrm{odds}}) + D_{+1}(z_{\mathrm{evens}}) & D_0(y_{\mathrm{evens}})
\end{bmatrix}$$

for which LU factorization reduces to a tridiagonal matrix of dimension n/2 using O(n) fully concurrent work

SLIDE 21

Block Tridiagonal Systems

Relatively fine granularity may make cyclic reduction impractical for solving single tridiagonal system on some parallel architectures

Efficiency may be much better, however, if there are many right-hand sides for single tridiagonal system or many independent tridiagonal systems to solve

Cyclic reduction is also applicable to block tridiagonal systems, which have larger granularity and hence more favorable ratio of communication to computation and potentially better efficiency

SLIDE 22

Iterative Methods

Tridiagonal and other banded systems are often amenable to efficient parallel solution by iterative methods

For example, successive diagonal blocks of tridiagonal system can be assigned to separate tasks, each of which solves its “local” tridiagonal system as preconditioner for iterative method for overall system

SLIDE 23

References – Banded Systems

• J. Dongarra and S. Johnsson, Solving banded systems on a parallel processor, Parallel Computing 5:219-246, 1987
• S. L. Johnsson, Solving narrow banded systems on ensemble architectures, ACM Trans. Math. Software 11:271-288, 1985
• E. Polizzi and A. H. Sameh, A parallel hybrid banded system solver: the SPIKE algorithm, Parallel Computing 32:177-194, 2006
• Y. Saad and M. Schultz, Parallel direct methods for solving banded linear systems, Linear Algebra Appl. 88:623-650, 1987

SLIDE 24

References – Tridiagonal Systems

• L.-W. Chang, J. A. Stratton, H.-S. Kim, and W.-M. W. Hwu, A scalable, numerically stable, high-performance tridiagonal solver using GPUs, SC12, Salt Lake City, Utah, November 10-16, 2012
• Ö. Eğecioğlu, C. K. Koç, and A. J. Laub, A recursive doubling algorithm for solution of tridiagonal systems on hypercube multiprocessors, J. Comput. Appl. Math. 27:95-108, 1989
• M. Hegland, On the parallel solution of tridiagonal systems by wrap-around partitioning and incomplete LU factorization, Numer. Math. 59:453-472, 1991
• S. L. Johnsson, Solving tridiagonal systems on ensemble architectures, SIAM J. Sci. Stat. Comput. 8:354-392, 1987

SLIDE 25

References – Tridiagonal Systems

• A. Krechel, H.-J. Plum, and K. Stüben, Parallelization and vectorization aspects of the solution of tridiagonal linear systems, Parallel Computing 14:31-49, 1990
• V. Mehrmann, Divide and conquer methods for block tridiagonal systems, Parallel Computing 19:257-280, 1993
• E. E. Santos, Optimal and efficient parallel tridiagonal solvers using direct methods, J. Supercomputing 30:97-115, 2004
• X.-H. Sun and W. Zhang, A parallel two-level hybrid method for tridiagonal systems and its application to fast Poisson solvers, IEEE Trans. Parallel Distrib. Sys. 15:97-106, 2004

SLIDE 26

References – Multifrontal Methods

• I. S. Duff, Parallel implementation of multifrontal schemes, Parallel Computing 3:193-204, 1986
• A. Gupta, Parallel sparse direct methods: a short tutorial, IBM Research Report RC 25076, November 2010
• J. Liu, The multifrontal method for sparse matrix solution: theory and practice, SIAM Review 34:82-109, 1992
• J. A. Scott, Parallel frontal solvers for large sparse linear systems, ACM Trans. Math. Software 29:395-417, 2003

SLIDE 27

References – Scalability

• A. George, J. Liu, and E. Ng, Communication results for parallel sparse Cholesky factorization on a hypercube, Parallel Computing 10:287-298, 1989
• A. Gupta, G. Karypis, and V. Kumar, Highly scalable parallel algorithms for sparse matrix factorization, IEEE Trans. Parallel Distrib. Systems 8:502-520, 1997
• T. Rauber, G. Rünger, and C. Scholtes, Scalability of sparse Cholesky factorization, Internat. J. High Speed Computing 10:19-52, 1999
• R. Schreiber, Scalability of sparse direct solvers, in A. George, J. R. Gilbert, and J. Liu, eds., Graph Theory and Sparse Matrix Computation, pp. 191-209, Springer-Verlag, 1993

SLIDE 28

References – Nonsymmetric Sparse Systems

• I. S. Duff and J. A. Scott, A parallel direct solver for large sparse highly unsymmetric linear systems, ACM Trans. Math. Software 30:95-117, 2004
• A. Gupta, A shared- and distributed-memory parallel general sparse direct solver, Appl. Algebra Engrg. Commun. Comput. 18(3):263-277, 2007
• X. S. Li and J. W. Demmel, SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29:110-140, 2003
• K. Shen, T. Yang, and X. Jiao, S+: Efficient 2D sparse LU factorization on parallel machines, SIAM J. Matrix Anal. Appl. 22:282-305, 2000
