Parallel Jacobian Accumulation
Ebadollah Varnik, Uwe Naumann
RWTH Aachen University
Content
◮ Introduction
◮ Definitions
◮ Jacobian Accumulation
◮ Parallel Approach
◮ General Idea
◮ Data Race Problem
◮ Atomic Sub-Graphs
◮ Implementation
◮ Extended Jacobian
◮ Compressed Row Storage
Definitions
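Consider the vector function $f : \mathbb{R}^{n=2} \to \mathbb{R}^{m=2}$ with

\[
\begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = f(v) =
\begin{pmatrix}
\exp\bigl((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)\bigr) \\
\cos\bigl((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)\bigr)
\end{pmatrix}
\]

The code list of f is the following:

v2 := v0 · v1;
v3 := sin(v2);
v4 := v2 + v3;
v5 := exp(v4);
v6 := cos(v4);
y0 := v5;
y1 := v6;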
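The code list can be evaluated together with the local partial derivatives that later label the edges of the graph. The following is a minimal sketch, not the authors' implementation: the struct `LinearizedCodeList` and the function `evaluate` are hypothetical names, and the map key (j, i) is assumed to stand for the label c_{j,i} of the edge from v_i to v_j.

```cpp
#include <array>
#include <cmath>
#include <map>
#include <utility>

// Hypothetical container: the values v[0..6] of the code list and the local
// partial derivatives c[(j, i)] = d v_j / d v_i of every elemental operation.
struct LinearizedCodeList {
    std::array<double, 7> v{};
    std::map<std::pair<int, int>, double> c;
};

// Evaluate the code list of f at (v0, v1) and record the local partials.
LinearizedCodeList evaluate(double v0, double v1) {
    LinearizedCodeList r;
    auto& v = r.v;
    v[0] = v0;
    v[1] = v1;
    v[2] = v[0] * v[1];     r.c[{2, 0}] = v[1];  r.c[{2, 1}] = v[0];
    v[3] = std::sin(v[2]);  r.c[{3, 2}] = std::cos(v[2]);
    v[4] = v[2] + v[3];     r.c[{4, 2}] = 1.0;   r.c[{4, 3}] = 1.0;
    v[5] = std::exp(v[4]);  r.c[{5, 4}] = v[5];              // d exp(x)/dx = exp(x)
    v[6] = std::cos(v[4]);  r.c[{6, 4}] = -std::sin(v[4]);
    return r;
}
```

Calling `evaluate(1.0, 2.0)` yields seven edge labels; y0 and y1 are `v[5]` and `v[6]`.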
Jacobian Accumulation
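f′ by elimination of the intermediate vertices v4, v3, v2:

[Figure: linearized computational graph of f (vertices v0, ..., v6; operations *, sin, +, exp, cos; edge labels c2,0, c2,1 = [v0], c3,2, c4,2, c4,3, c5,4, c6,4) followed by the graphs after elim(v4), elim(v3), and elim(v2). elim(v4) creates c5,2 = c5,4 · c4,2 together with c5,3, c6,2, c6,3; the final bipartite graph has the edges c5,0, c5,1, c6,0, c6,1.]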
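The elimination sequence v4, v3, v2 can be reproduced with a small graph-based sketch. It is an illustration under assumptions rather than the authors' code: the graph is stored as a map from (target, source) to the edge label, and `linearize` and `eliminate` are hypothetical helper names.

```cpp
#include <cmath>
#include <map>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;       // (target j, source i), label c_{j,i}
using Graph = std::map<Edge, double>;

// Linearized computational graph of f at the point (v0, v1).
Graph linearize(double v0, double v1) {
    double v2 = v0 * v1, v3 = std::sin(v2), v4 = v2 + v3;
    return {{{2, 0}, v1},  {{2, 1}, v0},  {{3, 2}, std::cos(v2)},
            {{4, 2}, 1.0}, {{4, 3}, 1.0},
            {{5, 4}, std::exp(v4)}, {{6, 4}, -std::sin(v4)}};
}

// Eliminate vertex j: connect every predecessor i with every successor k
// via c_{k,i} += c_{k,j} * c_{j,i}, then remove all edges incident to j.
void eliminate(Graph& g, int j) {
    std::vector<std::pair<int, double>> in, out;
    for (const auto& [e, c] : g) {
        if (e.first == j) in.push_back({e.second, c});    // edge i -> j
        if (e.second == j) out.push_back({e.first, c});   // edge j -> k
    }
    for (const auto& [k, ckj] : out)
        for (const auto& [i, cji] : in)
            g[Edge(k, i)] += ckj * cji;   // new fill-in edge or update
    for (const auto& [i, cji] : in) g.erase(Edge(j, i));
    for (const auto& [k, ckj] : out) g.erase(Edge(k, j));
}
```

After `eliminate(g, 4)`, `eliminate(g, 3)`, and `eliminate(g, 2)` only the four edges c5,0, c5,1, c6,0, c6,1 remain; they are the entries of f′.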
General Idea (1)
1. Graph decomposition into sub-graphs G_i
   ◮ local independent and dependent vertices
2. Parallel vertex elimination on sub-graphs
   ◮ back-elimination of out-edges of local intermediate vertices
3. Main focus is on
   ◮ Correctness
     → data race caused by out-of-range edges
   ◮ Load balancing
General Idea (2)
[Figure: computational graph with vertices 1 to 13 decomposed into the sub-graphs G′1, G2, and G′3. Edge labels include c7,4, c7,5, c8,6, c9,4, c9,5, c9,7, c10,4, c10,5, c10,6, c10,7, c10,8; the graph carries numeric annotations such as 2*3, 3*3, 3*4, 2*4 and 24, 18, 48, 54.]
General Idea (3)
[Figure: scheme with Master, Slave1, and Slave2 showing Elimination and Reduction phases.]
Data Race Problem
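[Figure: linearized graph of f with the edges c2,0, c2,1, c3,2, c4,2, c4,3, c5,4, c6,4 and the threads t1, t2, t3. elim(v4) and elim(v2) are applied by different threads; elim(v2) creates the fill-in edges c3,0 and c3,1, and both eliminations access the edge c4,2.]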
Atomic Sub-Graphs (1)
[Figure: two copies of the linearized graph of f. Thread t1 holds the vertices v0, ..., v6 with the edges c2,0, c2,1, c3,2, c4,2, c4,3, c5,4, c6,4; thread ti holds the vertices v^i_0, ..., v^i_6 with the edges c^i_{2,0}, c^i_{2,1}, c^i_{3,2}, c^i_{4,2}, c^i_{4,3}, c^i_{5,4}, c^i_{6,4}. Applying elim(v4) and elim(v^i_4) replaces the edges through v4 and v^i_4 by c5,2, c5,3, c6,2, c6,3 and c^i_{5,2}, c^i_{5,3}, c^i_{6,2}, c^i_{6,3}.]
Atomic Sub-Graphs (1, continued)
[Figure: the graph of thread t1 and its copy in thread ti after v4 and v^i_4 have been eliminated (edges c2,0, c2,1, c3,2, c5,2, c5,3, c6,2, c6,3 and their counterparts c^i). Applying elim(v3), elim(v2) in t1 and elim(v^i_3), elim(v^i_2) in ti leaves only the edges c5,0, c5,1, c6,0, c6,1 and c^i_{5,0}, c^i_{5,1}, c^i_{6,0}, c^i_{6,1}.]
Atomic Code Example
Overloaded function with atomic call:

    // active: overloaded data type of the tool
    void foo(int n, active x[2]) {
        for (int i = 0; i < n; i++) {
            atomic();  // atomic call, once per loop iteration
            x[0] = exp((x[0] * x[1]) + sin(x[0] * x[1]));
            x[1] = cos((x[0] * x[1]) + sin(x[0] * x[1]));
        }
    }
Implementation
1. Pattern Detection Mode
   ◮ Generation of the binary pattern of C′ by overloading
   ◮ Symbolic elimination on the binary pattern for fill-in detection
   ◮ Allocation of Compressed Row Storage (CRS)
2. Accumulation Mode
   ◮ Initialization of CRS by overloading
   ◮ Row elimination on CRS
   ◮ Jacobian extraction
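The fill-in detection of the pattern detection mode can be illustrated by a symbolic elimination that works on column-index sets instead of numbers. This is a sketch under assumptions: the binary pattern is stored as one `std::set<int>` per row (strictly lower triangular part only), and `Pattern` and `symbolic_elimination` are hypothetical names, not part of the presented tool.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// pattern[j] holds the columns i for which c_{j,i} is structurally nonzero.
using Pattern = std::vector<std::set<int>>;

// Symbolically eliminate the rows listed in `order`. The result contains, for
// every row, all positions occupied at some point during the elimination,
// that is, the original entries plus the detected fill-in.
Pattern symbolic_elimination(Pattern rows, const std::vector<int>& order) {
    Pattern allocated = rows;
    for (int j : order) {
        for (std::size_t k = static_cast<std::size_t>(j) + 1; k < rows.size(); ++k) {
            if (rows[k].erase(j) == 0) continue;   // row k does not reference v_j
            rows[k].insert(rows[j].begin(), rows[j].end());
            allocated[k].insert(rows[j].begin(), rows[j].end());
        }
        rows[j].clear();   // v_j is eliminated, its row is no longer needed
    }
    return allocated;
}
```

For the example function, the initial pattern is `{{}, {}, {0, 1}, {2}, {2, 3}, {4}, {4}}`; with the order 4, 3, 2 the rows of v5 and v6 each grow to the columns 0 to 4, which gives 15 positions to allocate in total.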
Extended Jacobian
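The extended Jacobian C′ of f is the following:

\[
C' = \begin{pmatrix}
v_0 & & & & & & \\
 & v_1 & & & & & \\
c_{2,0} & c_{2,1} & v_2 & & & & \\
 & & c_{3,2} & v_3 & & & \\
 & & c_{4,2} & c_{4,3} & v_4 & & \\
 & & & & c_{5,4} & v_5 & \\
 & & & & c_{6,4} & & v_6
\end{pmatrix}
\]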
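f′ by elimination of the intermediate rows v4, v3, v2:

\[
C' \xrightarrow{\mathrm{elim}(v_4)}
\begin{pmatrix}
v_0 & & & & & & \\
 & v_1 & & & & & \\
c_{2,0} & c_{2,1} & v_2 & & & & \\
 & & c_{3,2} & v_3 & & & \\
 & & & & v_4 & & \\
 & & c_{5,4} \cdot c_{4,2} & c_{5,4} \cdot c_{4,3} & & v_5 & \\
 & & c_{6,4} \cdot c_{4,2} & c_{6,4} \cdot c_{4,3} & & & v_6
\end{pmatrix}
\]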
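The row elimination can be sketched on a dense matrix before any sparse storage is involved. The code below is an illustration under assumptions: only the strictly lower triangular part of C′ is stored (the diagonal labels v_j are omitted), and `extended_jacobian` and `eliminate_row` are hypothetical names.

```cpp
#include <array>
#include <cmath>

constexpr int N = 7;   // rows and columns v0, ..., v6
using Matrix = std::array<std::array<double, N>, N>;

// Strictly lower triangular part of C' for f at the point (v0, v1).
Matrix extended_jacobian(double v0, double v1) {
    double v2 = v0 * v1, v3 = std::sin(v2), v4 = v2 + v3;
    Matrix c{};   // all entries start at zero
    c[2][0] = v1;  c[2][1] = v0;
    c[3][2] = std::cos(v2);
    c[4][2] = 1.0; c[4][3] = 1.0;
    c[5][4] = std::exp(v4);
    c[6][4] = -std::sin(v4);
    return c;
}

// Eliminate row j: every later row k that references v_j absorbs row j
// scaled by c[k][j]; afterwards column j and row j are cleared.
void eliminate_row(Matrix& c, int j) {
    for (int k = j + 1; k < N; ++k) {
        if (c[k][j] == 0.0) continue;
        for (int i = 0; i < j; ++i) c[k][i] += c[k][j] * c[j][i];
        c[k][j] = 0.0;
    }
    for (int i = 0; i < j; ++i) c[j][i] = 0.0;
}
```

After `eliminate_row(c, 4)` the rows of v5 and v6 hold the products c5,4 · c4,2, c5,4 · c4,3, c6,4 · c4,2, and c6,4 · c4,3. Eliminating the rows 3 and 2 as well leaves f′ in rows 5 and 6, columns 0 and 1.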
Compressed Row Storage
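\[
C' = \begin{pmatrix}
v_0 & & & & & & \\
 & v_1 & & & & & \\
c_{2,0} & c_{2,1} & v_2 & & & & \\
 & & c_{3,2} & v_3 & & & \\
 & & c_{4,2} & c_{4,3} & v_4 & & \\
 & & & & c_{5,4} & v_5 & \\
 & & & & c_{6,4} & & v_6
\end{pmatrix}
\]

CRS scheme for C′ with fill-in:

\[
\begin{aligned}
\alpha &= [c_{2,0}, c_{2,1}, c_{3,2}, c_{4,2}, c_{4,3}, 0, 0, 0, 0, c_{5,4}, 0, 0, 0, 0, c_{6,4}] \\
\kappa &= [0, 1, 2, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4] \\
\rho &= [\underbrace{0, 0}_{v_0, v_1}, \underbrace{\phantom{0}}_{v_2}, \underbrace{2}_{v_3}, \underbrace{3}_{v_4}, \underbrace{5}_{v_5}, \underbrace{9}_{v_6}
\end{aligned}
\]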
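A CRS layout with pre-allocated fill-in slots can be built directly from the row patterns. The following is only a sketch: `CRS`, `allocate_crs`, and `slot` are hypothetical names, and ρ is assumed to follow the common row-pointer convention (first slot of each row plus a final end marker), which may differ from the convention used by the authors.

```cpp
#include <cstddef>
#include <vector>

struct CRS {
    std::vector<double> alpha;      // values; fill-in slots start at 0
    std::vector<int> kappa;         // column index of every slot
    std::vector<std::size_t> rho;   // rho[j] = first slot of row j (assumed convention)
};

// Allocate one slot per pattern entry, row by row.
CRS allocate_crs(const std::vector<std::vector<int>>& pattern) {
    CRS crs;
    for (const auto& cols : pattern) {
        crs.rho.push_back(crs.kappa.size());
        crs.kappa.insert(crs.kappa.end(), cols.begin(), cols.end());
    }
    crs.rho.push_back(crs.kappa.size());      // end marker
    crs.alpha.assign(crs.kappa.size(), 0.0);
    return crs;
}

// Pointer to the slot of c_{j,i}, or nullptr if position (j, i) is not allocated.
double* slot(CRS& crs, int j, int i) {
    for (std::size_t s = crs.rho[j]; s < crs.rho[j + 1]; ++s)
        if (crs.kappa[s] == i) return &crs.alpha[s];
    return nullptr;
}
```

With the pattern `{{}, {}, {0, 1}, {2}, {2, 3}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}}` this produces the κ shown above and 15 slots in α; the initialization step then writes the local partials through `slot`, for example `*slot(crs, 5, 4) = c54`.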