Solvers for O(N) Electronic Structure in the Strong Scaling Limit

SLIDE 1

Solvers for O(N) Electronic Structure in the Strong Scaling Limit with Charm++

Nicolas Bock Matt Challacombe

Theoretical Division Los Alamos National Laboratory

Laxmikant V. Kalé

Parallel Programming Laboratory University of Illinois at Urbana-Champaign

12th Annual Workshop on Charm++ and its Applications, 29th April 2014

SLIDE 2

FreeON - O(N) Electronic Structure

FreeON 1.0 Cartesian-Gaussian LCAO basis http://www.freeon.org

HiCu ONX QCTC SP2/BCSR AINV/BCSR

SLIDE 3

Xylose Isomerase in FreeON

SLIDE 4

Xylose Isomerase in FreeON

SLIDE 5

FreeON – O(N) Electronic Structure

Ohloh code analysis: http://www.ohloh.net/p/freeon

SLIDE 6

Unified Solver Framework N-Body Solvers

Database Operations Gaming Collision Detection Computer Graphics Culling Machine Learning Science FMM/HOT

Sparse/Irregular

  • Linear scaling complexity, O(N)
  • With scalable parallelism, increasing core count yields proportional capability gains

SP2/SpAMM ONX 3.0 Inv.Fact./SpAMM Coulomb Exch/Corr. All N-Body!

SLIDE 7

N-Body for Electronic Structure

  • Generalize range query → metric query + ...
  • All 5 solvers as N-Body
  • Unified programming model
  • Unified data structures
  • Task-parallel decomposition
  • Clean separation between solver and runtime
  • Concise solver code

SLIDE 8

SpAMM

Sparse Approximate Matrix Multiply (SpAMM) for matrices with decay

  • Occlusion based on metric query
  • Linear scaling electronic structure (FreeON)
  • General alternative to incomplete matrix algebra (“sparsification”)

  • N-Body learning
  • On-the-fly dropping of product contributions can lead to better accuracy than GEMM, and O(N) execution time for matrices with decay.
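
The dropping idea above can be sketched in a few lines. This is a minimal illustration, not the FreeON or SpAMM library code: it assumes square matrices whose dimension is a power of two, stored as plain lists of rows, and it recomputes Frobenius norms at every call (a real quadtree would presumably cache them in its nodes). The names `spamm`, `tau`, and `leaf` are hypothetical.

```python
import math

def frobenius(M):
    """Frobenius norm of a square matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in M for x in row))

def zeros(n):
    """n x n matrix of zeros."""
    return [[0.0] * n for _ in range(n)]

def dense_multiply(A, B):
    """Plain O(n^3) product, used for the small leaf blocks."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def quadrant(M, r, c):
    """Copy of quadrant (r, c) of M, with r and c in {0, 1}."""
    h = len(M) // 2
    return [row[c * h:(c + 1) * h] for row in M[r * h:(r + 1) * h]]

def add_into(C, r, c, P):
    """Accumulate block P into quadrant (r, c) of C."""
    h = len(P)
    for i in range(h):
        for j in range(h):
            C[r * h + i][c * h + j] += P[i][j]

def spamm(A, B, tau, leaf=4):
    """Recursive multiply that drops sub-products with small norm bound.

    A sub-product is skipped when ||A|| * ||B|| < tau (assumed criterion).
    """
    n = len(A)
    if frobenius(A) * frobenius(B) < tau:
        return zeros(n)               # contribution dropped on the fly
    if n <= leaf:
        return dense_multiply(A, B)   # dense kernel at the bottom tier
    C = zeros(n)
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                P = spamm(quadrant(A, i, k), quadrant(B, k, j), tau, leaf)
                add_into(C, i, j, P)
    return C
```

For example, with a matrix whose entries decay as exp(-|i-j|), `spamm(A, A, 0.0)` reproduces the dense product up to rounding, while a larger `tau` skips more of the far-off-diagonal work at the cost of a bounded error per dropped block.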

SLIDE 9

SpAMM

Space Filling Curve Molecule Matrix/Quadtree Convolution/Octree

A) Exponential decay, B) Algebraic decay

SLIDE 10

SpAMM – Task-Parallel

  • Linked list on top tiers → recursive execution
  • Task parallelism with OpenMP at top
  • Linear quadtree on bottom tiers

    – Hashtables/Linear index
    – Kernel for efficient submatrix multiplication

  • High performance serial execution at bottom
  • Dropping is applied all the way down to 4x4
  • Non-contiguous, dynamic allocation
  • Or, contiguous allocation and position-independent data structure
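
A linear quadtree backed by a hashtable can be sketched as follows. This is an assumption-laden illustration rather than the SpAMM data structure: it uses a Morton (Z-order) index as the linear index, with row bits in the odd positions and column bits in the even positions, and a Python `dict` as the hashtable. `morton_index` and `LinearQuadtree` are hypothetical names.

```python
def morton_index(i, j, bits=16):
    """Interleave the bits of block row i and block column j.

    Assumed convention: column bits land in even positions, row bits in
    odd positions, so the four children of a quadtree node are contiguous.
    """
    index = 0
    for b in range(bits):
        index |= ((j >> b) & 1) << (2 * b)
        index |= ((i >> b) & 1) << (2 * b + 1)
    return index

class LinearQuadtree:
    """Sparse store of leaf blocks keyed by their linear (Morton) index."""

    def __init__(self):
        self.blocks = {}  # hashtable: linear index -> leaf block

    def set_block(self, i, j, block):
        self.blocks[morton_index(i, j)] = block

    def get_block(self, i, j):
        # Missing keys stand for all-zero blocks that were never stored.
        return self.blocks.get(morton_index(i, j))
```

Shifting a linear index right by two bits gives the index of the parent node, which is why no explicit pointers are needed on the bottom tiers in this sketch; the keys carry no memory addresses, so the structure is position independent.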

SLIDE 11

SpAMM – Error

SLIDE 12

SpAMM – Parallel Efficiency on Magny Cours

SLIDE 13

SpAMM – Parallel Efficiency on Xeon Phi

SLIDE 14

SpAMM - OpenMP

SLIDE 15

SpAMM - Charm++

  • Quadtree linked list → 2D chare array per tier
  • Recursive multiply → 3D chare array per tier
  • GreedyComm LB after each multiply
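
The mapping from a 2D array of matrix nodes to a 3D array of product tasks can be pictured with a small sketch. It is written in plain Python, not Charm++, and only enumerates indices: it assumes that element (i, j, k) of the 3D array computes the contribution A[i][k] · B[k][j] to C[i][j] at one tier, and that a task is skipped when the product of the two node norms falls below a tolerance. `active_products`, `norm_a`, `norm_b`, and `tau` are hypothetical names.

```python
def active_products(norm_a, norm_b, tau):
    """List the (i, j, k) product tasks at one tier that survive dropping.

    norm_a[i][k] and norm_b[k][j] are the norms of the nodes held in the
    2D arrays for A and B on this tier (assumed layout).
    """
    n = len(norm_a)
    tasks = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if norm_a[i][k] * norm_b[k][j] >= tau:
                    tasks.append((i, j, k))
    return tasks
```

For matrices with decay, most of the n³ index triples are inactive, so the surviving work is irregular, which is presumably why load balancing is run after each multiply.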
SLIDE 16

SP2/SpAMM - Charm++

SLIDE 17

SP2/SpAMM - Charm++

B3LYP/6-31G**: tolerance = 10^-10

SLIDE 18

SP2/SpAMM - Charm++

B3LYP/6-31G**: tolerance = 10^-8

SLIDE 19

SP2/SpAMM - Charm++

B3LYP/6-31G**: tolerance = 10^-6

SLIDE 20

Conclusions

  • Novel unified solver approach based on N-Body
  • First-time demonstration of O(N) electronic structure solver in strong scaling limit

    – Parallel scaling to almost 1000 (!) cores / atom
    – The competition: 1 molecule or atom / core

  • Closer alignment of programming models?

    – Singleton chares for N-Body?
    – Express same recursive task-parallel approach?

  • Holistic load balancing across solver collective?