SLIDE 1

Learning Algebraic Multigrid using Graph Neural Networks

Ilay Luz, Meirav Galun, Haggai Maron, Ronen Basri, Irad Yavneh

SLIDE 2

Goal: Large scale linear systems

  • Solve Ax = b
  • A is huge, need an O(n) solution!
  • Some applications:
  • Discretization of PDEs
  • Sparse graph analysis

πœ–2𝑣 πœ–π‘¦2 + πœ–2𝑣 πœ–π‘§2 = 𝑔 𝑦, 𝑧

SLIDE 3

Efficient linear solvers

  • Decades of research on efficient iterative solvers for large-scale systems

  • We focus on Algebraic Multigrid (AMG) solvers
  • Can we use machine learning to improve AMG solvers?
  • Follow-up to Greenfeld et al. (2019) on Geometric Multigrid
SLIDE 4

What AMG does

  • AMG works by successively coarsening the system of equations, and solving on multiple scales
  • The prolongation operator P creates the hierarchy
  • We want to learn a mapping $P_\theta(A)$ with fast convergence

SLIDE 5

Learning P

  • Unsupervised loss function over a distribution $\mathcal{D}$ of problem instances:

    $\min_{\theta} \; \mathbb{E}_{A \sim \mathcal{D}} \, \rho\big(M(A, P_\theta(A))\big)$

  • $\rho\big(M(A, P_\theta(A))\big)$ measures the convergence factor of the solver
  • $P_\theta(A)$ is a neural network mapping the system A to the prolongation operator P
SLIDE 6

Graph neural network

  • Sparse matrices can be represented as graphs, so we use a Graph Neural Network as the mapping $P_\theta(A)$ (a rough conversion sketch follows the figure below)

[Figure: a 7-node weighted graph and its corresponding sparse 7×7 matrix A of edge weights]
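As a rough illustration of this correspondence (my own sketch, not the authors' code), each row/column index becomes a node and each nonzero entry a weighted edge; scipy's COO format makes the conversion direct. The exact node and edge features used in the paper may differ.

```python
import numpy as np
import scipy.sparse as sp

def matrix_to_graph(A):
    """Convert a sparse matrix into (node_features, edge_index, edge_features).

    Each nonzero A[i, j] becomes a directed edge i -> j carrying its value as
    the edge feature; here the diagonal entry serves as a simple node feature.
    """
    A = sp.coo_matrix(A)
    edge_index = np.vstack([A.row, A.col])        # shape (2, nnz)
    edge_features = A.data.reshape(-1, 1)         # shape (nnz, 1)
    node_features = A.diagonal().reshape(-1, 1)   # shape (n, 1)
    return node_features, edge_index, edge_features

# Toy 3-node example
A = sp.csr_matrix(np.array([[ 2.0, -1.0,  0.0],
                            [-1.0,  2.0, -1.0],
                            [ 0.0, -1.0,  2.0]]))
nodes, edges, weights = matrix_to_graph(A)
print(edges.T, weights.ravel())
```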

SLIDE 7

Benefits of our approach

  • Unsupervised training – relies on algebraic properties
  • Generalization – learns general rules for a wide class of problems
  • Efficient training – Fourier analysis reduces the computational burden

SLIDE 8

Sample result; lower is better, ours is lower!

[Figure: convergence comparison on a Finite Element PDE problem]

SLIDE 9

Outline

  • Overview of AMG
  • Learning objective
  • Graph neural network
  • Results
SLIDE 10

1st ingredient of AMG: Relaxation

  • System of equations: $a_{i1} x_1 + a_{i2} x_2 + \dots + a_{in} x_n = b_i$
  • Rearrange: $x_i = \frac{1}{a_{ii}} \big( b_i - \sum_{j \neq i} a_{ij} x_j \big)$
  • Start with an initial guess $x^{(0)}$
  • Iterate until convergence: $x_i^{(k+1)} = \frac{1}{a_{ii}} \big( b_i - \sum_{j \neq i} a_{ij} x_j^{(k)} \big)$
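A minimal numpy sketch of this relaxation (Jacobi; the damping factor omega is my addition, and omega = 1 reproduces the update above exactly). It assumes a dense matrix with nonzero diagonal and is only an illustration, not the paper's code.

```python
import numpy as np

def jacobi(A, b, x=None, sweeps=10, omega=1.0):
    """(Damped) Jacobi relaxation: x <- x + omega * D^{-1} (b - A x).

    With omega = 1 this is the component-wise update
    x_i = (b_i - sum_{j != i} a_ij x_j) / a_ii from the slide.
    """
    A = np.asarray(A, dtype=float)
    d = np.diag(A)                            # diagonal entries a_ii
    x = np.zeros_like(b, dtype=float) if x is None else np.array(x, dtype=float)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / d       # simultaneous update of all components
    return x
```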

SLIDE 11

Relaxation smooths the error

  • Since relaxation is a local procedure, its effect is to smooth out the error
  • How can we accelerate relaxation by dealing with the low-frequency errors?

SLIDE 12

2nd ingredient of AMG: Coarsening

  • Smooth the error, and then coarsen
  • The error is no longer smooth on the coarse grid; relaxation is fast again!

[Figure: relax, then coarsen]

SLIDE 13

Putting it all together

[Diagram] Error on original problem → Relaxation (smoothing) → Restriction → Error approximated on coarsened problem → Prolongation → Smaller error on original problem
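In code form, one two-grid cycle could look like the following sketch (dense algebra, Galerkin convention with restriction P^T and coarse operator P^T A P, and a smoother such as the Jacobi sketch above); a full AMG solver recurses on the coarse system instead of solving it directly.

```python
import numpy as np

def two_grid_cycle(A, b, x, P, relax, pre=1, post=1):
    """One two-grid correction cycle with Galerkin coarsening.

    `relax(A, b, x, sweeps)` is any smoother, e.g. the Jacobi sketch above.
    """
    x = relax(A, b, x, pre)              # 1) relaxation smooths the error
    r = b - A @ x                        # 2) fine-grid residual
    A_c = P.T @ A @ P                    #    Galerkin coarse operator
    e_c = np.linalg.solve(A_c, P.T @ r)  # 3) restrict and solve on the coarse grid
    x = x + P @ e_c                      # 4) prolongate the correction back
    x = relax(A, b, x, post)             # 5) relax again on the original problem
    return x
```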

SLIDE 14

Learning objective

SLIDE 15

Prolongation operator

  • The focus of AMG is the prolongation operator P, which defines the scales and moves information between them
  • P needs to be sparse for efficiency, but must also approximate smooth errors well

SLIDE 16

Learning P

  • Quality can be quantified by estimating how much the error is reduced in each iteration:
  • $e^{(k+1)} = M(A, P) \, e^{(k)}$
  • $M(A, P) = S \big( I - P (P^T A P)^{-1} P^T A \big) S$, where S is the error-propagation matrix of the relaxation
  • Asymptotically: $\|e^{(k+1)}\| \approx \rho(M) \, \|e^{(k)}\|$
  • Spectral radius: $\rho(M) = \max\big( |\lambda_1|, \dots, |\lambda_n| \big)$
  • Our learning objective:

    $\min_{\theta} \; \mathbb{E}_{A \sim \mathcal{D}} \, \rho\big(M(A, P_\theta(A))\big)$

SLIDE 17

Graph neural network

SLIDE 18

Representing $P_\theta$

  • Sparse matrix $A \in \mathbb{R}^{n \times n}$ to sparse matrix $P \in \mathbb{R}^{n \times n_c}$
  • The mapping should be efficient
  • Matrices can be represented as graphs with edge weights
SLIDE 19

Representing $P_\theta$

[Figure: the input matrix A shown as a weighted 7-node graph; the chosen sparsity pattern of P (rows 1–7, coarse columns 1, 4, 6); and the output prolongation P with its learned entries]
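One common way to fix the sparsity pattern of P in advance (a sketch under the classical convention, assuming the set of coarse nodes is already chosen; the paper's exact rule may differ): a coarse row gets a single 1 in its own column, and a fine row i may only have nonzeros in coarse columns j where A[i, j] != 0. The GNN then only has to predict the values on this pattern.

```python
import numpy as np
import scipy.sparse as sp

def prolongation_pattern(A, coarse_nodes):
    """Sparsity pattern for P given a coarse/fine splitting.

    Row i of P: a single 1 in its own coarse column if i is coarse, otherwise
    nonzeros only in coarse columns j with A[i, j] != 0.
    """
    A = sp.csr_matrix(A)
    n = A.shape[0]
    col_of = {int(c): k for k, c in enumerate(coarse_nodes)}   # node -> coarse column
    rows, cols = [], []
    for i in range(n):
        if i in col_of:                                        # coarse node: direct injection
            rows.append(i); cols.append(col_of[i])
        else:                                                  # fine node: its coarse neighbours
            for j in A.indices[A.indptr[i]:A.indptr[i + 1]]:
                if int(j) in col_of:
                    rows.append(i); cols.append(col_of[int(j)])
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=(n, len(coarse_nodes)))

# Hypothetical usage: pattern = prolongation_pattern(A, coarse_nodes=[0, 3, 5])
```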

SLIDE 20

GNN architecture

  • Message Passing architectures can handle any graph, and have O(n) runtime
  • The Graph Nets framework of Battaglia et al. (2018) generalizes many MP variants and handles edge features

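To show the mechanism, here is a bare-bones numpy sketch of one edge-aware message-passing step in the spirit of Graph Nets: edges are updated from their endpoints, then nodes from their aggregated incoming edges. The MLPs, feature sizes, and sum aggregation are placeholder choices of mine, not the authors' exact architecture; the cost is linear in the number of nodes plus edges, which is what gives the O(n) runtime on sparse matrices.

```python
import numpy as np

def message_passing_step(node_feats, edge_index, edge_feats, edge_mlp, node_mlp):
    """One Graph-Nets-style block: update edges, then nodes.

    node_feats: (n, d_v), edge_feats: (m, d_e), edge_index: (2, m) holding
    (senders, receivers). edge_mlp / node_mlp are arbitrary callables
    (learned MLPs in the real model).
    """
    senders, receivers = edge_index
    # Edge update: each edge sees its own feature and both endpoint features.
    edge_inputs = np.concatenate(
        [edge_feats, node_feats[senders], node_feats[receivers]], axis=1)
    new_edges = edge_mlp(edge_inputs)
    # Sum-aggregate incoming edge messages per receiving node.
    agg = np.zeros((node_feats.shape[0], new_edges.shape[1]))
    np.add.at(agg, receivers, new_edges)
    # Node update: own feature concatenated with the aggregated messages.
    new_nodes = node_mlp(np.concatenate([node_feats, agg], axis=1))
    return new_nodes, new_edges
```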

SLIDE 21

Results

SLIDE 22

Spectral clustering

  • The bottleneck is an iterative eigenvector algorithm that uses a linear solver
  • Evaluate the number of iterations required to reach convergence
  • Train the network on a dataset of small 2D clusters, test on various 2D and 3D distributions
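For context, a toy sketch (mine, not the paper's pipeline) of why the eigensolver needs a linear solver: inverse iteration finds an eigenvector near a shift by repeatedly solving a linear system, and that inner solve is exactly where a multigrid, or learned, solver would be plugged in. The shift, tolerance, and use of scipy's direct spsolve are illustrative stand-ins.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def inverse_iteration(A, shift=1e-3, tol=1e-8, max_iters=100):
    """Approximate an eigenvector of A with eigenvalue near `shift`.

    Each iteration solves (A - shift*I) y = x; that solve is the bottleneck,
    and is where an AMG solver would replace the direct factorization.
    """
    n = A.shape[0]
    B = (A - shift * sp.identity(n)).tocsc()
    x = np.random.default_rng(0).standard_normal(n)
    x /= np.linalg.norm(x)
    y = x
    for _ in range(max_iters):
        y = spla.spsolve(B, x)                    # the linear solve
        y /= np.linalg.norm(y)
        # eigenvectors are defined up to sign, so compare both alignments
        if min(np.linalg.norm(y - x), np.linalg.norm(y + x)) < tol:
            break
        x = y
    return y
```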

SLIDE 23

Conclusion

  • Algebraic Multigrid is an effective O(n) solver for a wide class of linear systems Ax = b
  • The main challenge in AMG is constructing the prolongation operator P, which controls how information is passed between grids
  • We use an O(n), edge-based GNN to learn a mapping $P_\theta(A)$, without supervision
  • The GNN generalizes to larger problems, with different distributions of sparsity patterns and elements

SLIDE 24

Take home messages

  • In a well-developed field, it might make sense to apply ML to one part of the algorithm
  • Graph neural networks can be an effective tool for learning on sparse linear systems