parallel jacobian accumulation

Parallel Jacobian Accumulation, by Ebadollah Varnik and Uwe Naumann (RWTH Aachen University)


  1. Parallel Jacobian Accumulation. Ebadollah Varnik, Uwe Naumann, RWTH Aachen University

  2. Content
     Introduction: Definitions, Jacobian Accumulation
     Parallel Approach: General Idea, Data Race Problem, Atomic Sub-Graphs
     Implementation: Extended Jacobian, Compressed Row Storage

  3. Definitions
     Consider the vector function $f : \mathbb{R}^{n=2} \to \mathbb{R}^{m=2}$ with
     $$
     \begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = f(v) =
     \begin{pmatrix} \exp((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)) \\ \cos((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)) \end{pmatrix}
     $$
     The code list of $f$ is the following:
       $v_2 := v_0 \cdot v_1$;
       $v_3 := \sin(v_2)$;
       $v_4 := v_2 + v_3$;
       $v_5 := \exp(v_4)$;
       $v_6 := \cos(v_4)$;
       $y_0 := v_5$;
       $y_1 := v_6$;
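
The code list above can be evaluated statement by statement while recording the local partial derivatives that label the edges of the graph on the following slides. The block below is a minimal sketch under that reading; the names `Tape` and `record` are hypothetical and not taken from the slides, and `c[j][i]` is assumed to stand for the partial derivative of $v_j$ with respect to $v_i$.

```cpp
#include <array>
#include <cmath>

// Values v0..v6 of the code list and local partials c[j][i] = d v_j / d v_i.
// (Hypothetical container, introduced only for this illustration.)
struct Tape {
    std::array<double, 7> v{};
    std::array<std::array<double, 7>, 7> c{};  // zero-initialized
};

// Evaluate the code list of f at (v0, v1) and record the local partials.
Tape record(double v0, double v1) {
    Tape t;
    t.v[0] = v0;
    t.v[1] = v1;
    t.v[2] = t.v[0] * t.v[1];   // v2 := v0 * v1
    t.c[2][0] = t.v[1];
    t.c[2][1] = t.v[0];
    t.v[3] = std::sin(t.v[2]);  // v3 := sin(v2)
    t.c[3][2] = std::cos(t.v[2]);
    t.v[4] = t.v[2] + t.v[3];   // v4 := v2 + v3
    t.c[4][2] = 1.0;
    t.c[4][3] = 1.0;
    t.v[5] = std::exp(t.v[4]);  // v5 := exp(v4)
    t.c[5][4] = t.v[5];
    t.v[6] = std::cos(t.v[4]);  // v6 := cos(v4)
    t.c[6][4] = -std::sin(t.v[4]);
    return t;
}
```

Calling `record(0.5, 2.0)` fills `t.v[5]` and `t.v[6]` with the outputs $y_0$ and $y_1$, and `t.c` with the seven edge labels of the linearized graph.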

  4. Jacobian Accumulation
     $f'$ by elimination of intermediate vertices $v_4$, $v_3$, $v_2$:
     [Figure: linearized computational graph of $f$ with vertices $v_0, \dots, v_6$ and edge labels $c_{j,i}$ (for example $c_{2,1} = [v_0]$), shown after each of the steps elim($v_4$), elim($v_3$), elim($v_2$). elim($v_4$) introduces the edges $c_{5,2}$, $c_{5,3}$, $c_{6,2}$, $c_{6,3}$ as products such as $c_{5,4} \cdot c_{4,2}$; after elim($v_2$) only $c_{5,0}$, $c_{5,1}$, $c_{6,0}$, $c_{6,1}$ remain.]
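
The elimination sequence on this slide can be reproduced numerically. The following is a minimal sketch, assuming the usual vertex elimination rule (every path i -> k -> j through the eliminated vertex k contributes the product of its two edge labels to the edge i -> j); the dense array representation and the names `Edges`, `linearize`, `eliminate` and `jacobian` are illustrative choices, not the authors' implementation.

```cpp
#include <array>
#include <cmath>
#include <initializer_list>

constexpr int N = 7;  // vertices v0..v6
// c[j][i] holds the label of edge i -> j, or 0 if the edge is absent.
using Edges = std::array<std::array<double, N>, N>;

// Local partial derivatives of the code list of f at (v0, v1).
Edges linearize(double v0, double v1) {
    Edges c{};
    const double v2 = v0 * v1;
    const double v3 = std::sin(v2);
    const double v4 = v2 + v3;
    c[2][0] = v1;
    c[2][1] = v0;
    c[3][2] = std::cos(v2);
    c[4][2] = 1.0;
    c[4][3] = 1.0;
    c[5][4] = std::exp(v4);
    c[6][4] = -std::sin(v4);
    return c;
}

// Eliminate intermediate vertex k: add c[j][k] * c[k][i] to edge i -> j
// for every predecessor i and successor j, then drop all edges incident to k.
void eliminate(Edges& c, int k) {
    for (int j = k + 1; j < N; ++j)
        for (int i = 0; i < k; ++i)
            c[j][i] += c[j][k] * c[k][i];
    for (int n = 0; n < N; ++n) {
        c[k][n] = 0.0;
        c[n][k] = 0.0;
    }
}

// Accumulate the 2x2 Jacobian in the order v4, v3, v2 used on the slide.
std::array<std::array<double, 2>, 2> jacobian(double v0, double v1) {
    Edges c = linearize(v0, v1);
    for (int k : {4, 3, 2}) eliminate(c, k);
    std::array<std::array<double, 2>, 2> J{};
    for (int r = 0; r < 2; ++r)
        for (int col = 0; col < 2; ++col)
            J[r][col] = c[5 + r][col];  // rows v5, v6; columns v0, v1
    return J;
}
```

The result of `jacobian(v0, v1)` can be checked against the analytic derivative, for example $\partial y_0 / \partial v_0 = \exp(w)\,(1 + \cos u)\,v_1$ with $u = v_0 v_1$ and $w = u + \sin u$.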

  5. General Idea (1)
     1. Graph decomposition into sub-graphs $G_i$
        ◮ local independent and dependent vertices
     2. Parallel vertex elimination on sub-graphs
        ◮ back-elimination of out-edges of local intermediate vertices
     3. Main focus is on
        ◮ Correctness → data race caused by out-of-range edges
        ◮ Load balancing

  6. General Idea (2)
     [Figure: example graph with vertices 0 to 13 decomposed into the sub-graphs $G'_1$, $G_2$ and $G'_3$, with edge labels such as $c_{7,4}$, $c_{8,6}$, $c_{9,7}$, $c_{10,8}$ and numeric annotations per sub-graph (3*2, 3*3, 3*4, 2*3, 2*4).]

  7. General Idea (3)
     [Figure: master/slave scheme with the labels Master, Slave 1, Slave 2, Elimination (twice) and Reduction (twice).]

  8. Data Race Problem
     [Figure: linearized graph of $f$ with the elimination steps elim($v_4$) and elim($v_2$) and the thread labels $t_1$, $t_2$, $t_3$; edges shown include $c_{2,0}$, $c_{2,1}$, $c_{3,0}$, $c_{3,1}$, $c_{3,2}$, $c_{4,2}$, $c_{4,3}$, $c_{5,4}$, $c_{6,4}$.]
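
One way to see where such a race can come from is to list the edges each elimination step touches and intersect the lists. The sketch below assumes that eliminating a vertex reads and deletes its in-edges and out-edges and creates or updates one edge per predecessor/successor pair; the helper names `touched`, `conflicts` and `example_graph` are hypothetical. It is an illustration of the problem, not the detection scheme used by the authors.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (source, target)

// All edges that eliminating vertex k reads, creates, updates or deletes.
std::set<Edge> touched(const std::set<Edge>& g, int k) {
    std::vector<int> pred, succ;
    for (const Edge& e : g) {
        if (e.second == k) pred.push_back(e.first);
        if (e.first == k) succ.push_back(e.second);
    }
    std::set<Edge> t;
    for (int i : pred) t.insert({i, k});  // in-edges: read, then deleted
    for (int j : succ) t.insert({k, j});  // out-edges: read, then deleted
    for (int i : pred)
        for (int j : succ) t.insert({i, j});  // fill-in or update of i -> j
    return t;
}

// Edges on which concurrent eliminations of vertices a and b would collide.
std::set<Edge> conflicts(const std::set<Edge>& g, int a, int b) {
    const std::set<Edge> ta = touched(g, a);
    const std::set<Edge> tb = touched(g, b);
    std::set<Edge> both;
    std::set_intersection(ta.begin(), ta.end(), tb.begin(), tb.end(),
                          std::inserter(both, both.begin()));
    return both;
}

// Linearized graph of f from the Definitions slide.
std::set<Edge> example_graph() {
    return {{0, 2}, {1, 2}, {2, 3}, {2, 4}, {3, 4}, {4, 5}, {4, 6}};
}
```

For the example graph, `conflicts(example_graph(), 2, 4)` contains the single edge from $v_2$ to $v_4$, that is $c_{4,2}$: one thread would read it while the other deletes it.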

  9. Atomic Sub-Graphs (1)
     [Figure: two graphs, one with local vertices $v^i_0, \dots, v^i_6$ and edge labels $c^i_{j,k}$ handled by thread $t_i$, the other with vertices $v_0, \dots, v_6$ and edge labels $c_{j,k}$ handled by thread $t_1$; each is shown before and after elim($v_4$), which introduces the edges $c_{5,2}$, $c_{5,3}$, $c_{6,2}$, $c_{6,3}$.]

  10. Atomic Sub-Graphs (1)
      [Figure: the same two graphs after elim($v_3$) and elim($v_2$), performed by thread $t_i$ on the local vertices $v^i_j$ and by thread $t_1$ on the vertices $v_j$; the remaining edges are $c_{5,0}$, $c_{5,1}$, $c_{6,0}$, $c_{6,1}$.]

  11. Atomic Code Example
      Overloaded function with atomic call:

          void foo(int n, active x[2]) {
              for (int i = 0; i < n; i++) {
                  atomic();  // atomic call, once per loop iteration
                  x[0] = exp((x[0] * x[1]) + sin(x[0] * x[1]));
                  x[1] = cos((x[0] * x[1]) + sin(x[0] * x[1]));
              }
          }

  12. Implementation
      1. Pattern Detection Mode
         ◮ Generation of the binary pattern of $C'$ by overloading
         ◮ Symbolic elimination on the binary pattern for fill-in detection
         ◮ Allocation of Compressed Row Storage (CRS)
      2. Accumulation Mode
         ◮ Initialization of CRS by overloading
         ◮ Row elimination on CRS
         ◮ Jacobian extraction
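
The symbolic elimination step of the pattern detection mode can be sketched on a boolean matrix. The block below is an illustration under stated assumptions: the pattern is written down by hand for the example function instead of being generated by overloading, rows are eliminated in the order $v_4$, $v_3$, $v_2$, and rows that have already been eliminated receive no further fill-in. The names `Pattern`, `symbolic_eliminate` and `row_sizes` are hypothetical.

```cpp
#include <array>
#include <initializer_list>

constexpr int N = 7;  // rows v0..v6
// p[j][i] is true if entry c_{j,i} of the extended Jacobian may be nonzero.
using Pattern = std::array<std::array<bool, N>, N>;

// Binary pattern of C' for the example function f.
Pattern initial_pattern() {
    Pattern p{};
    p[2][0] = p[2][1] = true;
    p[3][2] = true;
    p[4][2] = p[4][3] = true;
    p[5][4] = true;
    p[6][4] = true;
    return p;
}

// Symbolically eliminate row k: every remaining row j that references k
// inherits the pattern of row k. Entries are only ever set, never cleared,
// so the final pattern contains all fill-in.
void symbolic_eliminate(Pattern& p, std::array<bool, N>& done, int k) {
    for (int j = k + 1; j < N; ++j) {
        if (done[j] || !p[j][k]) continue;
        for (int i = 0; i < k; ++i)
            p[j][i] = p[j][i] || p[k][i];
    }
    done[k] = true;
}

// Number of entries to allocate per row after eliminating v4, v3, v2.
std::array<int, N> row_sizes() {
    Pattern p = initial_pattern();
    std::array<bool, N> done{};
    for (int k : {4, 3, 2}) symbolic_eliminate(p, done, k);
    std::array<int, N> n{};
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            if (p[j][i]) ++n[j];
    return n;
}
```

For the example, the rows $v_5$ and $v_6$ each grow from one entry to five, so storage for the fill-in can be reserved before any numerical value is computed.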

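  13. Extended Jacobian
      The extended Jacobian $C'$ of $f$ is the following:
      $$
      C' = \begin{pmatrix}
       & & & & & \\
      0 & & & & & \\
      c_{2,0} & c_{2,1} & & & & \\
      0 & 0 & c_{3,2} & & & \\
      0 & 0 & c_{4,2} & c_{4,3} & & \\
      0 & 0 & 0 & 0 & c_{5,4} & \\
      0 & 0 & 0 & 0 & c_{6,4} & 0
      \end{pmatrix}
      \begin{matrix} v_0 \\ v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \end{matrix}
      $$
      $f'$ by elimination of intermediate rows $v_4$, $v_3$, $v_2$:
      $$
      \xrightarrow{\text{elim}(v_4)}
      \begin{pmatrix}
       & & & & & \\
      0 & & & & & \\
      c_{2,0} & c_{2,1} & & & & \\
      0 & 0 & c_{3,2} & & & \\
      0 & 0 & 0 & 0 & & \\
      0 & 0 & c_{5,4} \cdot c_{4,2} & c_{5,4} \cdot c_{4,3} & 0 & \\
      0 & 0 & c_{6,4} \cdot c_{4,2} & c_{6,4} \cdot c_{4,3} & 0 & 0
      \end{pmatrix}
      \begin{matrix} v_0 \\ v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \end{matrix}
      $$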
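
The elim($v_4$) step on this slide follows a general update rule, and the remaining two steps can be worked out the same way. The derivation below is a sketch in the slide's notation; the index ranges are an assumption based on $C'$ being strictly lower triangular.

```latex
% Row elimination of v_k on the extended Jacobian C':
\[
  c_{j,i} \leftarrow c_{j,i} + c_{j,k}\, c_{k,i}
  \quad \text{for all } j > k,\ i < k,
  \qquad \text{then } c_{j,k} \leftarrow 0,\ c_{k,i} \leftarrow 0 .
\]

% Applied to the example in the order v_4, v_3, v_2:
\begin{align*}
  \text{elim}(v_4):\quad & c_{5,2} = c_{5,4}\, c_{4,2},\quad c_{5,3} = c_{5,4}\, c_{4,3},\quad
                           c_{6,2} = c_{6,4}\, c_{4,2},\quad c_{6,3} = c_{6,4}\, c_{4,3} \\
  \text{elim}(v_3):\quad & c_{5,2} \leftarrow c_{5,2} + c_{5,3}\, c_{3,2},\quad
                           c_{6,2} \leftarrow c_{6,2} + c_{6,3}\, c_{3,2} \\
  \text{elim}(v_2):\quad & c_{5,0} = c_{5,2}\, c_{2,0},\quad c_{5,1} = c_{5,2}\, c_{2,1},\quad
                           c_{6,0} = c_{6,2}\, c_{2,0},\quad c_{6,1} = c_{6,2}\, c_{2,1}
\end{align*}

% The Jacobian is the remaining block of rows v_5, v_6 and columns v_0, v_1:
\[
  f' = \begin{pmatrix} c_{5,0} & c_{5,1} \\ c_{6,0} & c_{6,1} \end{pmatrix},
  \qquad \text{for example } c_{5,0} = c_{5,4}\,(c_{4,2} + c_{4,3}\, c_{3,2})\, c_{2,0} .
\]
```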

  14. Compressed Row Storage
      $$
      C' = \begin{pmatrix}
       & & & & & \\
      0 & & & & & \\
      c_{2,0} & c_{2,1} & & & & \\
      0 & 0 & c_{3,2} & & & \\
      0 & 0 & c_{4,2} & c_{4,3} & & \\
      0 & 0 & 0 & 0 & c_{5,4} & \\
      0 & 0 & 0 & 0 & c_{6,4} & 0
      \end{pmatrix}
      \begin{matrix} v_0 \\ v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \end{matrix}
      $$
      CRS scheme for $C'$ with fill-in:
      $$
      \begin{aligned}
      \alpha &= [c_{2,0}, c_{2,1}, c_{3,2}, c_{4,2}, c_{4,3}, 0, 0, 0, 0, c_{5,4}, 0, 0, 0, 0, c_{6,4}] \\
      \kappa &= [0, 1, 2, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4] \\
      \rho &= [\underbrace{0, 0}_{v_0, v_1}, \underbrace{0}_{v_2}, \underbrace{2}_{v_3}, \underbrace{3}_{v_4}, \underbrace{5}_{v_5}, \underbrace{9}_{v_6}, 13]
      \end{aligned}
      $$
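
A CRS container with pre-allocated fill-in slots can be sketched as follows. This assumes the common convention in which the row array holds the offset of the first entry of each row plus one final end offset; the names `CRS`, `allocate`, `entry` and `example` are hypothetical. Under that convention the offsets for the example come out as 0, 0, 0, 2, 3, 5, 10, 15, so the last two values differ from the 9 and 13 printed in the slide's $\rho$, which may follow a different convention.

```cpp
#include <vector>

// Compressed Row Storage: values (alpha), column indices (kappa),
// row offsets (rho). rho[r] is the offset of the first entry of row r,
// rho.back() is the total number of stored entries.
struct CRS {
    std::vector<double> alpha;
    std::vector<int> kappa;
    std::vector<int> rho;
};

// Build a CRS from per-row column lists. Every value starts at zero so that
// fill-in slots exist before any numerical value is written into them.
CRS allocate(const std::vector<std::vector<int>>& columns) {
    CRS m;
    for (const auto& row : columns) {
        m.rho.push_back(static_cast<int>(m.kappa.size()));
        m.kappa.insert(m.kappa.end(), row.begin(), row.end());
    }
    m.rho.push_back(static_cast<int>(m.kappa.size()));
    m.alpha.assign(m.kappa.size(), 0.0);
    return m;
}

// Pointer to the stored entry (row, col), or nullptr if no slot was allocated.
double* entry(CRS& m, int row, int col) {
    for (int p = m.rho[row]; p < m.rho[row + 1]; ++p)
        if (m.kappa[p] == col) return &m.alpha[p];
    return nullptr;
}

// Column pattern of C' with fill-in, matching kappa on the slide.
CRS example() {
    return allocate({{}, {}, {0, 1}, {2}, {2, 3},
                     {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}});
}
```

With this layout, a row elimination can write a product such as $c_{5,4} \cdot c_{4,2}$ through `entry(m, 5, 2)` without reallocating, because the slot for the fill-in already exists.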
