Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms - PowerPoint PPT Presentation

slide-1
SLIDE 1

Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms

Arnaud Legrand, Loris Marchal, Yves Robert

Laboratoire de l'Informatique du Parallélisme, École Normale Supérieure de Lyon, France

APDCM workshop - April 2004


slide-2
SLIDE 2

Outline

Introduction
    Two Problems of Collective Communication
    Platform Model
    Framework

Series of Scatter
    Steady-state constraints
    Toy Example
    Building a schedule
    Asymptotic optimality

Series of Reduce
    Introduction to reduction trees
    Linear Program
    Periodic schedule - Asymptotic optimality
    Toy Example for Series of Reduce

Approximation for a fixed period

Conclusion

slide-8
SLIDE 8

Introduction

◮ Complex applications on grid environments require collective communication schemes:
    one to all:  Broadcast, Multicast, Scatter
    all to one:  Reduce
    all to all:  Gossip, All-to-All
◮ Numerous studies of a single communication scheme, mainly about one single broadcast
◮ Pipelining communications:
    ◮ data parallelism involves a large amount of data
    ◮ not a single communication, but a series of the same communication scheme (e.g. a series of broadcasts from the same source)
◮ Goal: maximize the throughput of the steady-state operation

slide-17
SLIDE 17

Two Problems of Collective Communication

◮ Scatter: one processor Psource sends distinct messages to the target processors Pt0, . . . , PtN
◮ Series of Scatter: Psource consecutively sends a large number of distinct messages to all targets

◮ Reduce: each participating processor Pri in Pr0, . . . , PrN owns a value vi
    ⇒ compute V = v0 ⊕ v1 ⊕ · · · ⊕ vN (⊕ is associative, non-commutative)
◮ Series of Reduce: several consecutive Reduce operations from the same set Pr0, . . . , PrN to the same target Ptarget

slide-21
SLIDE 21

Platform Model

◮ G = (V, E, c)
◮ P1, P2, . . . , Pn: processors
◮ (j, k) ∈ E: communication link between Pj and Pk
◮ c(j, k): time to transfer a unit-size message from Pj to Pk
◮ one-port for incoming communications
◮ one-port for outgoing communications

[Figure: example platform graph with processors P0, P1, P2, P3; edges labeled with the costs 10, 10, 30, 5, 5, 8]
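The platform model above is just a weighted directed graph plus the one-port rule. As a concrete reading of it, here is a minimal Python sketch (not part of the talk; the processor names reuse those of the later toy example and the edge costs are made up) that stores G = (V, E, c) and checks the one-port condition for a set of edge occupation times:

```python
from collections import defaultdict

# Hypothetical platform: c[(Pj, Pk)] = time to ship one unit-size message
# from Pj to Pk (these particular costs are illustrative only).
c = {
    ("Ps", "Pa"): 1.0,
    ("Ps", "Pb"): 2.0 / 3.0,
    ("Pa", "P0"): 4.0 / 3.0,
    ("Pb", "P0"): 4.0 / 3.0,
    ("Pb", "P1"): 1.0,
}

def transfer_time(edge, n_messages):
    """Time needed to push n unit-size messages over a single edge."""
    return n_messages * c[edge]

def one_port_ok(occupation):
    """One-port model: within one time unit, the total time a processor spends
    sending (resp. receiving) must not exceed 1.  `occupation` maps edges
    (Pj, Pk) to the fraction of time they are busy."""
    out_time, in_time = defaultdict(float), defaultdict(float)
    for (src, dst), t in occupation.items():
        out_time[src] += t
        in_time[dst] += t
    return all(t <= 1.0 for t in out_time.values()) and \
           all(t <= 1.0 for t in in_time.values())

print(transfer_time(("Ps", "Pa"), 3))                        # 3.0 time units
print(one_port_ok({("Ps", "Pa"): 0.3, ("Ps", "Pb"): 0.5}))   # True: Ps sends 0.8 <= 1
```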

slide-29
SLIDE 29

Framework

1. express the optimization problem as a set of linear constraints (variables = fraction of time a processor spends sending to one of its neighbors)
2. solve the linear program (in rational numbers)
3. use the solution to build a periodic schedule reaching the best throughput

slide-33
SLIDE 33

Series of Scatter

◮ mk: type of the messages with destination Pk
◮ s(Pi → Pj, mk): fractional number of messages of type mk sent on the edge Pi → Pj within one time unit
◮ t(Pi → Pj): fractional time spent by processor Pi to send data to its neighbor Pj within one time unit
◮ bound for this activity:

    ∀Pi, Pj,   0 ≤ t(Pi → Pj) ≤ 1

◮ on a link Pi → Pj during one time unit:

    t(Pi → Pj) = Σ_k s(Pi → Pj, mk) × c(i, j)

slide-38
SLIDE 38

Linear constraints

◮ one-port constraint for outgoing messages of Pi:

    ∀Pi,   Σ_{Pi→Pj} t(Pi → Pj) ≤ 1

◮ one-port constraint for incoming messages of Pi:

    ∀Pi,   Σ_{Pj→Pi} t(Pj → Pi) ≤ 1

◮ conservation law in node Pi for message mk (k ≠ i):

    Σ_{Pj→Pi} s(Pj → Pi, mk) = Σ_{Pi→Pj} s(Pi → Pj, mk)

[Figure: node Pi receiving 5 mk and 2 mk messages and forwarding 3 mk and 4 mk messages]

slide-44
SLIDE 44

Throughput and Linear Program

◮ throughput: total number of messages mk received by Pk per time unit

    TP = Σ_{Pj→Pk} s(Pj → Pk, mk)   (same throughput for every target node Pk)

◮ summarize these constraints in a linear program:

    Steady-State Scatter Problem on a Graph SSSP(G)
    Maximize TP, subject to
      ∀Pi, ∀Pj,   0 ≤ s(Pi → Pj) ≤ 1
      ∀Pi,   Σ_{Pj,(i,j)∈E} s(Pi → Pj) ≤ 1
      ∀Pi,   Σ_{Pj,(j,i)∈E} s(Pj → Pi) ≤ 1
      ∀Pi, Pj,   s(Pi → Pj) = Σ_{mk} send(Pi → Pj, mk) × c(i, j)
      ∀Pi ≠ Psource, ∀mk, k ≠ i,   Σ_{Pj,(j,i)∈E} send(Pj → Pi, mk) = Σ_{Pj,(i,j)∈E} send(Pi → Pj, mk)
      ∀Pk, k ∈ T,   Σ_{Pi,(i,k)∈E} send(Pi → Pk, mk) = TP
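For readers who want to experiment, the following self-contained Python sketch reconstructs SSSP(G) and solves it with scipy.optimize.linprog. It is my own reading of the slides, not the authors' code: the platform (edges and costs) is hypothetical, and the variables are the per-edge, per-message-type send rates s(Pi → Pj, mk) plus the throughput TP.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical platform: directed edges with cost c(i, j) = time per unit message.
edges = {("Ps", "Pa"): 1.0, ("Ps", "Pb"): 1.0,
         ("Pa", "P0"): 2.0, ("Pb", "P0"): 2.0, ("Pb", "P1"): 1.0}
nodes = sorted({n for e in edges for n in e})
source, targets = "Ps", ["P0", "P1"]

# Variables: s(e, mk) for every edge e and target k, plus the throughput TP (last).
var_idx = {(e, k): i for i, (e, k) in
           enumerate((e, k) for e in edges for k in targets)}
n_vars = len(var_idx) + 1
TP = n_vars - 1

A_ub, b_ub, A_eq, b_eq = [], [], [], []

# One-port constraints: total sending (resp. receiving) time of each node <= 1.
for n in nodes:
    out_row, in_row = np.zeros(n_vars), np.zeros(n_vars)
    for e, cost in edges.items():
        for k in targets:
            if e[0] == n: out_row[var_idx[(e, k)]] = cost
            if e[1] == n: in_row[var_idx[(e, k)]] = cost
    A_ub += [out_row, in_row]; b_ub += [1.0, 1.0]

# Conservation law: at every node other than the source and the final target of
# mk, messages of type mk that come in must go out.
for n in nodes:
    for k in targets:
        if n == source or n == k:
            continue
        row = np.zeros(n_vars)
        for e in edges:
            if e[1] == n: row[var_idx[(e, k)]] += 1.0
            if e[0] == n: row[var_idx[(e, k)]] -= 1.0
        A_eq.append(row); b_eq.append(0.0)

# Throughput: every target Pk receives TP messages of type mk per time unit.
for k in targets:
    row = np.zeros(n_vars)
    for e in edges:
        if e[1] == k: row[var_idx[(e, k)]] += 1.0
    row[TP] = -1.0
    A_eq.append(row); b_eq.append(0.0)

# Maximize TP  <=>  minimize -TP, all variables non-negative.
cost_vec = np.zeros(n_vars); cost_vec[TP] = -1.0
res = linprog(cost_vec, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, None)] * n_vars)
print("steady-state throughput TP =", res.x[TP])   # ~0.5 for this made-up platform
```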

slide-46
SLIDE 46

Series of Scatter - Toy Example

[Figure: platform graph with source Ps, intermediate nodes Pa and Pb, and targets P0 and P1; edges labeled with the costs c(i, j): 2/3, 4/3, 4/3, 1, 1]

slide-47
SLIDE 47

Series of Scatter - Toy Example

[Figure: value of s(Pi → Pj, mk) in the solution of the linear program: 1/4 message m0 per time unit on four edges and 1/2 message m1 per time unit on two edges]

slide-48
SLIDE 48

Series of Scatter - Toy Example

[Figure: occupation time t(Pi → Pj) of each edge: 1/3, 2/3, 3/4, 1/4, 1/6]

slide-49
SLIDE 49

Building a schedule

◮ consider the time needed for all transfers
◮ build a bipartite graph by splitting every node into a "send" copy and a "receive" copy
◮ extract matchings, using the weighted edge-coloring algorithm

[Figure: bipartite graph on Ps, Pa, Pb, P0, P1; each edge labeled with its occupation time and the messages it carries, e.g. 1/4 (1/4 m0), 1/3 (1/4 m0), 2/3 (1/2 m1), 1/6 (1/4 m0), 1/4 (1/4 m0), 1/2 (1/2 m1); the decomposition yields matchings of durations 1/2, 1/4, 1/6 and 1/12]
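The matching-extraction step can be illustrated with a small sketch. The talk relies on an optimal weighted edge-coloring algorithm; the greedy peeling below is only meant to show the shape of the output (a list of matchings with durations), and the assignment of the toy example's occupation times to particular edges is an assumption on my part.

```python
def peel_matchings(weights):
    """weights: dict {(sender, receiver): occupation time}.
    Returns a list of (duration, matching) pairs; inside one matching every
    sender and every receiver appears at most once, so all its transfers can
    run in parallel under the one-port model."""
    remaining = dict(weights)
    decomposition = []
    while any(w > 1e-12 for w in remaining.values()):
        matching, used_send, used_recv = [], set(), set()
        # Greedily pick heavy edges first, never reusing a sending or receiving port.
        for (s, r), w in sorted(remaining.items(), key=lambda kv: -kv[1]):
            if w > 1e-12 and s not in used_send and r not in used_recv:
                matching.append((s, r)); used_send.add(s); used_recv.add(r)
        duration = min(remaining[e] for e in matching)
        for e in matching:
            remaining[e] -= duration
        decomposition.append((duration, matching))
    return decomposition

# Occupation times t(Pi -> Pj) taken from the toy example (edge assignment assumed).
t = {("Ps", "Pa"): 1/4, ("Ps", "Pb"): 3/4,
     ("Pa", "P0"): 1/3, ("Pb", "P0"): 1/6, ("Pb", "P1"): 2/3}
for duration, matching in peel_matchings(t):
    print(f"{duration:.4f}: {matching}")
```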

slide-61
SLIDE 61

Building a schedule

[Figure: periodic schedule, one row per edge (Ps → Pa, Ps → Pb, Pa → P0, Pb → P0, Pb → P1), showing when each of the four matchings is scheduled within one period]

◮ least common multiple T = lcm{bi}, where ai/bi denotes the number of messages transferred in each matching
◮ ⇒ periodic schedule of period T with atomic (whole-message) transfers
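A tiny numeric sketch of this last step, with illustrative fractions (not necessarily the toy example's exact values): the period T is the lcm of the denominators, so every matching transfers a whole number of messages per period.

```python
# Requires Python 3.9+ for math.lcm with several arguments.
from fractions import Fraction
from math import lcm

per_unit = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 6), Fraction(1, 12)]
T = lcm(*(f.denominator for f in per_unit))
print("period T =", T)               # lcm(2, 4, 6, 12) = 12
print([f * T for f in per_unit])     # integral message counts per period
```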

slide-62
SLIDE 62

Asymptotic optimality

◮ No schedule can perform more tasks in K time units than the steady-state bound:

    Lemma.   opt(G, K) ≤ TP(G) × K

◮ periodic schedule ⇒ complete schedule:
    1. initialization phase (fill the buffers with messages)
    2. r periods of duration T (steady state)
    3. clean-up phase (empty the buffers)

    Lemma.   the previous algorithm is asymptotically optimal:

        lim_{K→+∞} steady(G, K) / opt(G, K) = 1

slide-80
SLIDE 80

Reduce - Reduction trees

◮ Reduce:
    ◮ each processor Pri owns a value vi
    ◮ compute V = v0 ⊕ v1 ⊕ · · · ⊕ vN (⊕ associative, non-commutative)
◮ partial result of the Reduce operation:

    v[k,m] = vk ⊕ vk+1 ⊕ · · · ⊕ vm

◮ two partial results can be merged:

    v[k,m] = v[k,l] ⊕ v[l+1,m]   (computational task Tk,l,m)

[Figure: example reduction tree on three processors holding v0, v1, v2: P2 sends v2 to P1, which computes T1,1,2; P0 sends v0 to P1, which computes T0,0,2; P1 then sends the result v[0,2] to P0]
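As a toy illustration of partial results and merge tasks (my own example, using string concatenation as a stand-in for the associative, non-commutative ⊕), the reduction tree sketched in the figure can be replayed as follows:

```python
def merge(v_kl, v_lm):        # task T_{k,l,m}: v[k,m] = v[k,l] ⊕ v[l+1,m]
    return v_kl + v_lm        # concatenation: associative but not commutative

values = ["v0", "v1", "v2"]   # v_i initially held by P0, P1, P2

# The reduction tree of the figure, as far as it can be read:
#   P2 ships v2 to P1, P1 computes T_{1,1,2}, P0 ships v0 to P1,
#   P1 computes T_{0,0,2}, and P1 ships v[0,2] to P0.
v_1_2 = merge(values[1], values[2])   # computed on P1
v_0_2 = merge(values[0], v_1_2)       # computed on P1, then sent to P0
print(v_0_2)                          # "v0v1v2" -- the order of operands is preserved
```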

slide-81
SLIDE 81

Series of Reduce

◮ each processor Pri owns a set of values v_i^t (e.g. produced at different time-steps t)
◮ perform a Reduce operation on each set {v_0^t, . . . , v_N^t} to compute V^t
◮ each reduction uses a reduction tree
◮ two reductions (t1 and t2) may use different trees

slide-82
SLIDE 82

Linear Program - Notations

◮ s(Pi → Pj, v[k,l]): fractional number of values v[k,l] sent on link Pi → Pj within one time unit
◮ t(Pi → Pj): fractional occupation time of link Pi → Pj within one time unit: 0 ≤ t(Pi → Pj) ≤ 1
◮ cons(Pi, Tk,l,m): fractional number of tasks Tk,l,m computed on processor Pi within one time unit
◮ α(Pi): time spent by processor Pi computing tasks within one time unit: 0 ≤ α(Pi) ≤ 1
◮ size(v[k,m]): size of a message containing a value v[k,m]^t
◮ w(Pi, Tk,l,m): time needed by processor Pi to compute one task Tk,l,m

slide-88
SLIDE 88

Linear Program - Constraints

◮ occupation of a link Pi → Pj:

    t(Pi → Pj) = Σ_{v[k,l]} s(Pi → Pj, v[k,l]) × size(v[k,l]) × c(i, j)

◮ occupation time of a processor Pi:

    α(Pi) = Σ_{Tk,l,m} cons(Pi, Tk,l,m) × w(Pi, Tk,l,m)

◮ "conservation law" for packets of type v[k,m]:

    Σ_{Pj→Pi} s(Pj → Pi, v[k,m]) + Σ_{k≤l<m} cons(Pi, Tk,l,m)
        = Σ_{Pi→Pj} s(Pi → Pj, v[k,m]) + Σ_{n>m} cons(Pi, Tk,m,n) + Σ_{n<k} cons(Pi, Tn,k−1,m)
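To make the conservation law concrete, the sketch below (my reading of the slide, not the authors' code) enumerates, for one value type v[k,m], which merge tasks produce it and which consume it on a processor Pi:

```python
def conservation_terms(k, m, N):
    """For the value v[k,m] (values indexed 0..N), list the merge tasks that
    produce it locally and the merge tasks that consume it locally, as in the
    conservation law: v[k,m] is produced by T_{k,l,m} with k <= l < m, and
    consumed either as the left operand of T_{k,m,n} (n > m) or as the right
    operand of T_{n,k-1,m} (n < k)."""
    produced_by = [(k, l, m) for l in range(k, m)]
    consumed_as_left = [(k, m, n) for n in range(m + 1, N + 1)]
    consumed_as_right = [(n, k - 1, m) for n in range(0, k)]
    return produced_by, consumed_as_left, consumed_as_right

# Example with N = 3 (values v0..v3) and the partial result v[1,2]:
prod, left, right = conservation_terms(1, 2, 3)
print("produced by:", prod)            # [(1, 1, 2)]
print("consumed by:", left + right)    # [(1, 2, 3), (0, 0, 2)]
```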

slide-91
SLIDE 91

Linear Program - Constraints

◮ definition of the throughput:

    TP = Σ_{Pj→Ptarget} s(Pj → Ptarget, v[0,N]) + Σ_{0≤l≤N−1} cons(Ptarget, T0,l,N)

◮ solve the following linear program over the rational numbers:

    Steady-State Reduce Problem on a Graph SSRP(G)
    Maximize TP, subject to all the previous constraints

slide-93
SLIDE 93

Building a schedule

◮ consider the reduction tree T^t associated with the computation of the t-th value V^t:
    ◮ a given tree may be used by many time-stamps t
◮ there exists an algorithm which extracts from the solution a set of weighted trees such that:
    ◮ the description is polynomial, and
    ◮ the sum of the weighted trees is equal to the original solution
◮ the same weighted edge-coloring algorithm is then used on a bipartite graph to orchestrate the communications

slide-99
SLIDE 99

Toy Example for Series of Reduce

[Figure: topology of the toy platform (processors P0, P1, P2), labeled with the costs 1, 1, 1, 2, 1, 1]

slide-100
SLIDE 100

Toy Example for Series of Reduce

[Figure: results of the linear program: fractional transfers of v[1,1], v[2,2], v[1,2] and fractional computations of T1,1,2 and T0,0,2, with rates 1/3, 2/3 and 1]

slide-101
SLIDE 101

Toy Example for Series of Reduce

[Figure: first reduction tree (weight 1/3), built from the transfers of v[2,2] and v[1,2] and the tasks T1,1,2 and T0,0,2]

slide-102
SLIDE 102

Toy Example for Series of Reduce

[Figure: second reduction tree (weight 2/3), built from the transfers of v[1,1] and v[1,2] and the tasks T1,1,2 and T0,0,2]

slide-103
SLIDE 103

Toy Example for Series of Reduce

[Figure: bipartite graph on P0, P1, P2 with edge weights 2, 2, 1, 1]

slide-104
SLIDE 104

Toy Example for Series of Reduce

[Figure: the two matchings (first matching, second matching) extracted from the bipartite graph on P0, P1, P2]

slide-107
SLIDE 107

Approximation for a fixed period

◮ our framework produces an asymptotically optimal schedule of period T, but T may be too large
◮ we can approximate the solution with a fixed period Tfixed:
    1. {T, weight(T)}: the weighted set of trees obtained by the decomposition algorithm
    2. compute r(T) = ⌊weight(T) / T × Tfixed⌋
    3. the one-port constraints are satisfied for {T, weight(T)} on a period T ⇒ they are satisfied for {T, r(T)} on a period Tfixed
    4. the performance loss is bounded:

        TP − TP* ≤ card(Trees) / Tfixed
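A small numeric sketch of the rescaling (with made-up weights and periods; I assume the rounding in step 2 is a floor, which is what the loss bound card(Trees)/Tfixed suggests):

```python
from math import floor

T = 36.0                      # period produced by the framework (may be huge)
T_fixed = 10.0                # period we are willing to pay for
weights = {"tree_1": 12.0, "tree_2": 24.0}    # uses of each tree per period T

# Each tree is replayed r(tree) = floor(weight(tree) / T * T_fixed) times per
# fixed period, so at most one instance per tree is lost per fixed period.
r = {name: floor(w / T * T_fixed) for name, w in weights.items()}
throughput_opt   = sum(weights.values()) / T
throughput_fixed = sum(r.values()) / T_fixed
print(r)                                      # {'tree_1': 3, 'tree_2': 6}
print(throughput_opt - throughput_fixed)      # ~0.1 <= len(weights) / T_fixed = 0.2
```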

slide-114
SLIDE 114

Conclusion

◮ new framework to study collective communications in a heterogeneous environment
◮ makespan is difficult to minimize ⇒ focus on throughput
◮ relaxation, use of linear programming
◮ asymptotically optimal algorithm
◮ can be extended to other communication schemes and scheduling problems