SLIDE 17
[Flowchart: one SCF calculation, from start to done]
- start
- Guess initial molecular orbital coefficients matrix C and compute density matrix P (Eq. 10)
- pre-compute pair-wise quantities
- compute J and K (Eq. 8, 9)
- form Fock sub-matrices (Eq. 7)
- gather complete Fock matrix F
- scatter F
- compute matrix C (Eq. 5)
- gather and broadcast P
- Converge? yes: done; no: iterate again
[Timeline: numbered steps and the processes active in each]
- 1 pre-compute: master MPI processes, multiple POSIX threads
- 2 compute J and K: master MPI processes, multiple POSIX threads, GPUs
- 3 gather Fock matrix: master MPI processes, rank 0 MPI process
- 4 Distribute Fock matrix: all MPI processes
- 5 Solve eigenvalue problem: all MPI processes
- 6 final gather: all MPI processes, rank 0 MPI process
Parallelization strategy (II)
- Start as MPI program, each node
has as many MPI processes as CPU cores
- One MPI process per node is
designated as “master”
- The master MPI processes create
threads for controlling GPUs as well as CPU work threads
- MPI processes/GPU management
threads/CPU work threads are awakened or put to sleep as needed
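
The wake/sleep pattern in the last bullet can be illustrated with a small sketch. This is not the implementation from the talk: it uses Python's `threading.Condition` as a stand-in for POSIX condition variables, leaves out MPI and GPUs entirely, and the names `WorkerPool`, `run_phase`, and `close` are hypothetical. It only shows how a master can keep work threads asleep until a compute phase needs them.

```python
import threading


class WorkerPool:
    """Hypothetical pool: work threads sleep until the master posts a phase."""

    def __init__(self, n_workers):
        self.cond = threading.Condition()
        self.generation = 0    # incremented each time the master posts work
        self.task = None       # callable executed by every worker in a phase
        self.pending = 0       # workers that have not finished the phase yet
        self.shutdown = False
        self.results = []
        self.threads = [
            threading.Thread(target=self._run, args=(i,))
            for i in range(n_workers)
        ]
        for t in self.threads:
            t.start()

    def _run(self, worker_id):
        seen = 0  # last generation this worker has processed
        while True:
            with self.cond:
                # Sleep until new work is posted or shutdown is requested.
                self.cond.wait_for(
                    lambda: self.shutdown or self.generation != seen
                )
                if self.generation == seen:
                    return  # woken only to shut down
                seen = self.generation
                task = self.task
            value = task(worker_id)  # do the work outside the lock
            with self.cond:
                self.results.append(value)
                self.pending -= 1
                self.cond.notify_all()  # lets the master see completion

    def run_phase(self, task):
        """Wake all workers, run task(worker_id) on each, wait for them."""
        with self.cond:
            self.task = task
            self.results = []
            self.pending = len(self.threads)
            self.generation += 1
            self.cond.notify_all()  # wake the sleeping workers
            self.cond.wait_for(lambda: self.pending == 0)
            return sorted(self.results)

    def close(self):
        """Wake the workers one last time so they can exit."""
        with self.cond:
            self.shutdown = True
            self.cond.notify_all()
        for t in self.threads:
            t.join()


if __name__ == "__main__":
    pool = WorkerPool(4)
    print(pool.run_phase(lambda wid: wid * wid))
    pool.close()
```

Between calls to `run_phase` the workers block inside `wait_for` and consume no CPU time, which is the behavior the bullet describes for idle threads.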
IPDPS 200