Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix - PowerPoint PPT Presentation

Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix Factorizations Miguel Vargas-Félix, Salvador Botello-Rionda miguelvargas@cimat.mx, botello@cimat.mx Morelia, November 6, 2013 1/28

Sparse matrices In most problems of finite element method, finite volume or isogeometric analysis we have to solve a linear system of equations A x = b . When the stiffness matrix is assembled, the relation between adjacent nodes is captured as entries in a matrix. Because a node has adjacency with only a few others, the resulting matrix has a very sparse structure. ( ⋱ ) ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ ∘ a i i ∘ a i j ∘ 0 ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ i j ∘ a j i ∘ a j j ∘ 0 ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ ∘ 0 ∘ 0 ∘ a k k ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ k ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ http://www.cimat.mx/~miguelvargas 2/28

Lets define the notation η( A ) , it indicates the number of non-zero entries of A . 556 × 556 has 309,136 entries, with η( A )= 1810, this means that only the 0.58% of For example, A ∈ℝ the entries are non zero. Black dots indicates a non zero entry in the matrix In finite element problems all matrices have symmetric structure, and depending on the problem symmetric values or not. http://www.cimat.mx/~miguelvargas 3/28

Cholesky factorization for sparse matrices T is O ( n 3 ) . For full matrices the computational complexity of Cholesky factorization A = LL To calculate entries of L L j j ( A i j − ∑ L i k L j k ) , for i > j j − 1 L i j = 1 k = 1 L j j = √ A j j − ∑ j − 1 2 . L j k k = 1 We use four strategies to reduce time and memory usage when performing this factorization on sparse matrices: 1. Reordering of rows and columns of the matrix to reduce fill-in in L . This is equivalent to use a permutation matrix to reorder the system ( P AP T ) ( Px ) = ( P b ) . 2. Use symbolic Cholesky factorization to obtain an exact L factor (non zero entries in L ). 3. Organize operations to improve cache usage. 4. Parallelize the factorization. http://www.cimat.mx/~miguelvargas 4/28

Matrix reordering We want to reorder rows and columns of A , in a way that the number of non-zero entries of L are reduced. η ( L ) indicates the number of non-zero entries of L . A = L = 556 × 556 , with η ( A ) = 1810 . To the right the lower triangular The stiffness matrix to the left A ∈ℝ matrix L , with η ( L ) = 8729 . There are several heuristics like the minimum degree algorithm [Geor81] or a nested dissection method [Kary99]. http://www.cimat.mx/~miguelvargas 5/28

By reordering we have a matrix A ' with η ( A ' ) = 1810 and its factorization L ' with η ( L ' ) = 3215. Both factorizations solve the same system of equations. A ' = L ' = We reduce the factorization fill-in by η ( L ' ) = 3215 η ( L ) = 8729 = 0.368 . To determine a “good” reordering for a matrix A that minimize the fill-in of L is an NP complete problem [Yann81]. http://www.cimat.mx/~miguelvargas 6/28

Symbolic Cholesky factorization The algorithm to determine the L i j entries that area non-zero is called symbolic Cholesky factorization [Gall90]. Let be, for all columns j = 1 … n , a j ≝ { k > j ∣ A k j ≠ 0 } , l j ≝ { k > j ∣ L k j ≠ 0 } . The sets r j will register the columns of L which structure will affect the column j of L . for i ← 1,2, … ,n r i ← ∅ A = ( a 66 ) L = ( l 66 ) · a 11 a 12 a 16 l 11 for i ← 1,2, … ,n a 21 a 22 a 23 a 24 l 21 l 22 l 32 l 33 l i ← a i a 32 a 33 a 35 · l 42 l 43 l 44 a 42 a 44 for each j ∈ r i · l 53 l 54 l 55 a 53 a 55 a 56 l i ← l i ∪ l j ∖{ i } · · l 61 l 62 l 63 l 64 l 65 a 61 a 65 if l i ≠∅ · a 2 = { 3,4 } l 2 = { 3,4,6 } p ← min { j ∈ l i } · · r p ← r p ∪ { i } · · This algorithm is very efficient, its complexity in time and space has an order of O (η( L )) . http://www.cimat.mx/~miguelvargas 7/28

How efficient is it? The next table shows results solving a 2D Poisson equation problem, comparing Cholesky and conjugate gradient with Jacobi preconditioning. Number of equations nnz(A) nnz(L) Cholesky time [s] CGJ time [s] 1,006 6,140 14,722 0.086 0.081 3,110 20,112 62,363 0.137 0.103 10,014 67,052 265,566 0.309 0.184 31,615 215,807 1’059,714 1.008 0.454 102,233 705,689 4’162,084 3.810 2.891 312,248 2’168,286 14’697,188 15.819 19.165 909,540 6’336,942 48’748,327 69.353 89.660 3’105,275 21’681,667 188’982,798 409.365 543.110 10’757,887 75’202,303 743’643,820 2,780.734 3,386.609 32,703.9 10,000 3,386.6 10,000 2,780.7 7,134.4 4,656.1 1,000 543.1 1,747.3 409.4 1,343.5 1,000 512.5 100 393.2 89.7 69.4 Memory [Mega bytes] 145.3 134.9 100 19.2 15.8 Time [s] 10 44.1 38.2 3.8 2.9 13.6 10 10.2 1 1.0 4.3 0.5 2.6 Cholesky 0.3 1.3 0.2 1 CGJ 0.1 Cholesky 0.7 0 0.1 0.1 0.1 0.4 CGJ 0 0 1,000 10,000 100,000 1,000,000 10,000,000 1,000 10,000 100,000 1,000,000 10,000,000 Equations Equations http://www.cimat.mx/~miguelvargas 8/28

Graph reordering Let G = ( V , E ) be a graph with n vertex V with m edges E , 2 1 4 7 6 3 5 A ordering (or tagging) α of G is simply a mapping from the set { 1,2, … ,n } in V , where n is the α , E α = ( V α ) . number of vertex of G . The graph tagged by α will be written G 4 5 6 3 1 7 2 http://www.cimat.mx/~miguelvargas 9/28

A , E A = ( V A ) , in it Let A be a sparse matrix, symmetric, of size n × n , the ordered graph of A , G { v i ,v j } ∈ E A if and only if A i j = A j i ≠ 0 , i ≠ j . Here v i is a vertex of V A with tag A . A = ( ● ) ● ● ● ● · ● · 2 ● ● · ● · · · ● · ● · ● · · 1 4 A , E A = ( V A ) G ● ● · ● · · ● · · ● · ● · ● 7 6 3 ● · · · · ● · 5 · · · ● ● · A , E A = ( V A ) is equivalent to a apply a permutation matrix. An ordering α applied to G T = ( ● ) ● · · · ● · · 4 · ● ● · · · ● · ● ● · · ● · 5 6 α , E α = ( V α ) G PA P · · · ● ● ● · 3 ● · · ● ● ● ● 1 7 · · ● ● ● ● · 2 · ● · · ● · For the fill-in of the Cholesky factorization, to find a “good” permutation P for A that reduces the number of non-zero entries on L means to find a “good”ordering of its graph [Geor81]. http://www.cimat.mx/~miguelvargas 10/28

Nested dissection algorithm Separator Lets consider a separator set S ⊂ V in G = ( V , E ) , when removed from G the graph is disconnected in two non-empty sets of vertex R ⊂ V and S ⊂ V , also R ∩ S =∅ , S ∩ T =∅ , R ∩ T =∅ y R ∪ S ∪ T = V In the next example V = { 1,2, … , 13 } , let S = { 1,10,13 } be a separator of G = ( V , E ) , when removed from the graph we obtain R = { 4,3,7,8,11 } and T = { 2,5,6,9,12 } 1 4 5 4 5 3 2 3 2 7 6 7 6 10 8 8 9 11 9 11 12 12 13 http://www.cimat.mx/~miguelvargas 11/28

This procedure applied in a recursive way is known as the generalized nested dissection algorithm [Lipt79]. The idea is to split the graph trying to make R and T of the same size with a small separator. This is an divide-and-conquer recursive method. Dissection G ( V , E ) If ∣ G ∣ <β The vertexes V are ordered with any order · (the minimum degree algorithm can be used) else Search for a separator set S ⊂ V such that after removing it from G · we get two sets of disconnected vertexes R and T , with ∣ R ∣ ≈ ∣ T ∣ . Remove all edges from E that have connections with entries in S . · Dissection G ( R , E ) · Dissection G ( T , E ) · Order the vertexes, putting the ones in S at the end of the order. · This algorithm is recursive and the resulting order of vertexes of G produces an efficient Cholesky factorization. An improved version of this algorithm is presented in [Kary99]. http://www.cimat.mx/~miguelvargas 12/28

This is an exaple of the nested dissection algorithm: Graph Dependency tree 1 4 5 3 2 7 6 1, 2, 3, 4, 5, 6, 7, 10 8, 9, 10, 11, 12, 8 16 13, 14, 15, 16, 11 12 9 17, 18, 19, 20 13 19 18 15 14 17 20 1 4 5 3 2 5, 2, 10, 7 6 13, 17 10 8 16 11 12 9 1, 4, 3, 7, 6, 12, 18, 20, 13 19 8, 11, 14 9, 15, 16, 19 18 15 14 17 20 http://www.cimat.mx/~miguelvargas 13/28

Graph Dependency tree 1 4 3 5, 2, 10, 7 6 13, 17 8 16 11 12 9 1, 4, 3, 7, 6, 12, 18, 20, 19 8, 11, 14 9, 15, 16, 19 18 15 14 20 1 5, 2, 10, 4 13, 17 3 7 6 8 16 7, 8 12, 15 11 12 9 19 1, 11, 18, 6, 9, 18 15 14 4, 14 20 16, 19 3 20 http://www.cimat.mx/~miguelvargas 14/28

Graph Dependency tree 1 5, 2, 10, 4 13, 17 3 6 16 7, 8 12, 15 11 9 19 1, 11, 18, 6, 9, 18 14 4, 14 20 16, 19 3 20 5, 2, 10, 1 13, 17 4 3 6 7, 8 12, 15 16 11 9 11, 18, 9, 4 19 14 20 16 18 14 1 3 20 6 19 http://www.cimat.mx/~miguelvargas 15/28

Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix - PowerPoint PPT Presentation

Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix Factorizations Miguel Vargas-Flix, Salvador Botello-Rionda miguelvargas@cimat.mx, botello@cimat.mx Morelia, November 6, 2013 1/28 Sparse matrices In most problems of finite

Chronic Aortic Dissection Treatment of Chronic Aortic Dissection Is Evolving 40% of repairs of

Nested Word Automata Jens Stimpfle 30.6.2014 Nested Words Nested Words Theoretically and

Nested Transactions Nested Transactions Flat transactions The rules for committing of

Nested and Composite Classes Lecture 14 COP 3252 Summer 2017 May 30, 2017 Nested Classes

Advanced OpenMP Lecture 6: Nested parallelism Nested parallelism Nested parallelism is

Flood-Fill Flood-fill Used in interactive paint systems. The user specify a seed by

Heuristics and biases Tina Nane 2 Heuristics and biases Lotto Icon by Dapete is

Outline CP for VRP DMP204 SCHEDULING, TIMETABLING AND ROUTING 1. Construction Heuristics

Part II: Cow Eye Dissection Dissection of a Cow s Eye After the front of the eye has been

6 Subsequences and sequential compactness 6.1 Nested intervals and nested d -cells Recall the

NEVE: Nested Virtualization Extensions for ARM Jin Tack Lim, Christo ff er Dall, Shih-Wei Li, Jason

Decision Aid Methodologies In Transportation Lecture 7: Heuristics Heuristics Shadi SHARIF

VS SIMPLE HYSTERECTOMY AND PELVIC NODE DISSECTION IN PATIENTS WITH LOW-RISK, EARLY- STAGE

VS SIMPLE HYSTERECTOMY AND PELVIC NODE DISSECTION IN PATIENTS WITH LOW-RISK, EARLY- STAGE

1 A RANDOMIZED TRIAL COMPARING RADICAL HYSTERECTOMY AND PELVIC NODE DISSECTION VS SIMPLE

Side-Channel Countermeasures Dissection and the Limits of Closed Source Security Evaluations

Scaling limits of random dissections Igor Kortchemski (work with N. Curien and B. Haas) DMA

Bivalve Anatomy & Classification Class Bivalvia ~15,000 species; includes clams, scallops,

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks David Bau, Jun-Yan

srt st tr r

Dissecting media file formats with Kaitai Struct FOSDEM 2017 Mikhail Yakshin (GreyCat) Kaitai

Closing the Smoothness and Uniformity Gap in Area Fill Synthesis Y. Chen , , A. B. Kahng, G.

ULTIMATE A Multicenter, Prospective, Randomized Trial Comparing Intravascular Ultrasound-guided

Network Dissection: Quantifying Interpretability of Deep Visual Representations By David Bau,

Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix - PowerPoint PPT Presentation

Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix Factorizations Miguel Vargas-Flix, Salvador Botello-Rionda miguelvargas@cimat.mx, botello@cimat.mx Morelia, November 6, 2013 1/28 Sparse matrices In most problems of finite

Chronic Aortic Dissection Treatment of Chronic Aortic Dissection Is Evolving 40% of repairs of

Nested Word Automata Jens Stimpfle 30.6.2014 Nested Words Nested Words Theoretically and

Nested Transactions Nested Transactions Flat transactions The rules for committing of

Nested and Composite Classes Lecture 14 COP 3252 Summer 2017 May 30, 2017 Nested Classes

Advanced OpenMP Lecture 6: Nested parallelism Nested parallelism Nested parallelism is

Flood-Fill Flood-fill Used in interactive paint systems. The user specify a seed by

Heuristics and biases Tina Nane 2 Heuristics and biases Lotto Icon by Dapete is

Outline CP for VRP DMP204 SCHEDULING, TIMETABLING AND ROUTING 1. Construction Heuristics

Part II: Cow Eye Dissection Dissection of a Cow s Eye After the front of the eye has been

6 Subsequences and sequential compactness 6.1 Nested intervals and nested d -cells Recall the

NEVE: Nested Virtualization Extensions for ARM Jin Tack Lim, Christo ff er Dall, Shih-Wei Li, Jason

Decision Aid Methodologies In Transportation Lecture 7: Heuristics Heuristics Shadi SHARIF

VS SIMPLE HYSTERECTOMY AND PELVIC NODE DISSECTION IN PATIENTS WITH LOW-RISK, EARLY- STAGE

VS SIMPLE HYSTERECTOMY AND PELVIC NODE DISSECTION IN PATIENTS WITH LOW-RISK, EARLY- STAGE

1 A RANDOMIZED TRIAL COMPARING RADICAL HYSTERECTOMY AND PELVIC NODE DISSECTION VS SIMPLE

Side-Channel Countermeasures Dissection and the Limits of Closed Source Security Evaluations

Scaling limits of random dissections Igor Kortchemski (work with N. Curien and B. Haas) DMA

Bivalve Anatomy &amp; Classification Class Bivalvia ~15,000 species; includes clams, scallops,

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks David Bau, Jun-Yan

srt st tr r

Dissecting media file formats with Kaitai Struct FOSDEM 2017 Mikhail Yakshin (GreyCat) Kaitai

Closing the Smoothness and Uniformity Gap in Area Fill Synthesis Y. Chen , , A. B. Kahng, G.

ULTIMATE A Multicenter, Prospective, Randomized Trial Comparing Intravascular Ultrasound-guided

Network Dissection: Quantifying Interpretability of Deep Visual Representations By David Bau,

Bivalve Anatomy & Classification Class Bivalvia ~15,000 species; includes clams, scallops,