Parallel Jacobian Accumulation, Ebadollah Varnik and Uwe Naumann, RWTH Aachen University

SLIDE 1

Parallel Jacobian Accumulation

Ebadollah Varnik Uwe Naumann RWTH Aachen University

SLIDE 2

Content

  • Introduction
    ◮ Definitions
    ◮ Jacobian Accumulation
  • Parallel Approach
    ◮ General Idea
    ◮ Data Race Problem
    ◮ Atomic Sub-Graphs
  • Implementation
    ◮ Extended Jacobian
    ◮ Compressed Row Storage

SLIDE 3

Definitions

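Consider the vector function $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$ with $n = m = 2$ and

$$
\begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = f(v) =
\begin{pmatrix}
\exp\bigl((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)\bigr) \\
\cos\bigl((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)\bigr)
\end{pmatrix}
$$

The code list of f is the following:

    v2 := v0 · v1;
    v3 := sin(v2);
    v4 := v2 + v3;
    v5 := exp(v4);
    v6 := cos(v4);
    y0 := v5;
    y1 := v6;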
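The code list can be run directly while recording the local partial derivative of each elemental operation, which are the edge labels c_{j,i} used on the following slides. The sketch below is only an illustration: the struct `CodeListResult` and the function `evaluate_code_list` are hypothetical names, not part of the authors' tool.

```cpp
#include <array>
#include <cmath>

// Values v0..v6 of one pass through the code list, plus the local partial
// derivatives c_{j,i} = d v_j / d v_i of each elemental operation.
struct CodeListResult {
    std::array<double, 7> v;
    double c20, c21, c32, c42, c43, c54, c64;
};

// Hypothetical helper: evaluates the code list of f at (v0, v1).
CodeListResult evaluate_code_list(double v0, double v1) {
    CodeListResult r{};
    r.v[0] = v0;
    r.v[1] = v1;
    r.v[2] = r.v[0] * r.v[1];   r.c20 = r.v[1]; r.c21 = r.v[0];  // product rule
    r.v[3] = std::sin(r.v[2]);  r.c32 = std::cos(r.v[2]);
    r.v[4] = r.v[2] + r.v[3];   r.c42 = 1.0;    r.c43 = 1.0;
    r.v[5] = std::exp(r.v[4]);  r.c54 = r.v[5];                  // d exp = exp
    r.v[6] = std::cos(r.v[4]);  r.c64 = -std::sin(r.v[4]);
    return r;                   // y0 = v[5], y1 = v[6]
}
```

Calling `evaluate_code_list(0.0, 5.0)` gives y0 = y1 = 1, with c2,0 = 5 and c2,1 = 0.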

SLIDE 4

Jacobian Accumulation

f′ is obtained by eliminating the intermediate vertices v4, v3, v2:

[Figure: linearized DAG of f with vertices v0 to v6 (operations *, sin, +, exp, cos) and edge labels c2,0, c2,1 = [v0], c3,2, c4,2, c4,3, c5,4, c6,4. elim(v4) introduces the edges c5,2 = c5,4 · c4,2, c5,3, c6,2, c6,3; elim(v3) leaves c5,2 and c6,2; elim(v2) leaves the Jacobian entries c5,0, c5,1, c6,0, c6,1.]
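Vertex elimination on the linearized DAG can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: `Edges`, `linearized_dag`, `eliminate_vertex` and `accumulate_jacobian` are hypothetical names, and a `std::map` stands in for whatever graph structure the real tool uses. Eliminating v_j adds c_{k,j} · c_{j,i} to the edge (k, i) for every predecessor i and successor k of j, then removes all edges incident to j.

```cpp
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Edge (j, i) of the linearized DAG carries c_{j,i} = d v_j / d v_i.
using Edges = std::map<std::pair<int, int>, double>;

// Linearized DAG of the example function at the point (v0, v1).
Edges linearized_dag(double v0, double v1) {
    const double v2 = v0 * v1;
    const double v4 = v2 + std::sin(v2);
    Edges c;
    c[{2, 0}] = v1;            c[{2, 1}] = v0;
    c[{3, 2}] = std::cos(v2);
    c[{4, 2}] = 1.0;           c[{4, 3}] = 1.0;
    c[{5, 4}] = std::exp(v4);  c[{6, 4}] = -std::sin(v4);
    return c;
}

// Eliminates intermediate vertex j from the edge set.
void eliminate_vertex(Edges& c, int j) {
    std::vector<std::pair<int, double>> in, out;  // (neighbour, edge label)
    for (const auto& e : c) {
        if (e.first.first == j) in.emplace_back(e.first.second, e.second);
        if (e.first.second == j) out.emplace_back(e.first.first, e.second);
    }
    for (const auto& o : out)
        for (const auto& i : in)
            c[{o.first, i.first}] += o.second * i.second;  // absorb or fill in
    for (const auto& i : in) c.erase(std::make_pair(j, i.first));
    for (const auto& o : out) c.erase(std::make_pair(o.first, j));
}

// Eliminates v4, v3, v2; the remaining edges are the Jacobian entries
// c_{5,0}, c_{5,1}, c_{6,0}, c_{6,1}.
Edges accumulate_jacobian(double v0, double v1) {
    Edges c = linearized_dag(v0, v1);
    for (int j : {4, 3, 2}) eliminate_vertex(c, j);
    return c;
}
```

At (v0, v1) = (0, 5) the four remaining edges are c5,0 = 10 and c5,1 = c6,0 = c6,1 = 0, which agrees with differentiating f by hand.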

SLIDE 5

General Idea (1)

1. Graph decomposition into sub-graphs Gi
   ◮ local independent and dependent vertices
2. Parallel vertex elimination on sub-graphs
   ◮ back-elimination of out-edges of local intermediate vertices
3. Main focus is on
   ◮ Correctness
     → data races caused by out-of-range edges
   ◮ Load balancing
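The decomposition step can be illustrated with a small sketch. It rests on assumptions the slides do not spell out: a sub-graph is taken to be a contiguous index range [lo, hi), a vertex counts as locally independent if it has no predecessor inside the range and as locally dependent if it has no successor inside, and "out-of-range" edges are read as edges with exactly one endpoint inside the range. All names (`SubGraph`, `classify`, `example_edges`) are hypothetical.

```cpp
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (j, i): edge from v_i to v_j

struct SubGraph {
    std::vector<int> independent;   // no predecessor inside [lo, hi)
    std::vector<int> intermediate;  // predecessor and successor inside
    std::vector<int> dependent;     // predecessor inside, no successor inside
    std::vector<Edge> crossing;     // exactly one endpoint inside [lo, hi)
};

// Assumed decomposition rule: sub-graph = contiguous index range [lo, hi).
SubGraph classify(const std::vector<Edge>& edges, int lo, int hi) {
    auto inside = [lo, hi](int v) { return lo <= v && v < hi; };
    SubGraph g;
    for (int v = lo; v < hi; ++v) {
        bool pred = false, succ = false;
        for (const Edge& e : edges) {
            if (e.first == v && inside(e.second)) pred = true;
            if (e.second == v && inside(e.first)) succ = true;
        }
        if (!pred) g.independent.push_back(v);
        else if (!succ) g.dependent.push_back(v);
        else g.intermediate.push_back(v);
    }
    for (const Edge& e : edges)
        if (inside(e.first) != inside(e.second)) g.crossing.push_back(e);
    return g;
}

// Edges of the code-list DAG of the example function f.
std::vector<Edge> example_edges() {
    return {{2, 0}, {2, 1}, {3, 2}, {4, 2}, {4, 3}, {5, 4}, {6, 4}};
}
```

For the range [2, 5) of the example DAG this yields v2 as local independent, v3 as local intermediate, v4 as local dependent, and four crossing edges (c2,0, c2,1, c5,4, c6,4).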

SLIDE 6

General Idea (2)

[Figure: example DAG with vertices 1 to 13 decomposed into the sub-graphs G′1, G2 and G′3, annotated with edge labels (c7,4, c7,5, c8,6, c9,4, c9,5, c9,7, c10,4, c10,5, c10,6, c10,7, c10,8).]

SLIDE 7

General Idea (3)

[Figure: timeline of Master, Slave1 and Slave2 with Elimination and Reduction phases.]

SLIDE 8

Data Race Problem

[Figure: linearized DAG of f (vertices v0 to v6) under the concurrent eliminations elim(v4) and elim(v2) by the threads t1, t2 and t3; edge labels c2,0, c2,1, c3,2, c4,2, c4,3, c5,4, c6,4 and the new edges c3,0, c3,1.]
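One way to see where such a race can come from is to list the edges each elimination reads or writes and intersect the two sets. This is a sketch of that check under my reading of the slide, not the authors' method; `touched_edges` and `conflicts` are hypothetical helper names.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (j, i): edge from v_i to v_j

// Edges read or written when vertex j is eliminated from the given edge set.
std::set<Edge> touched_edges(const std::set<Edge>& edges, int j) {
    std::vector<int> preds, succs;
    std::set<Edge> touched;
    for (const Edge& e : edges) {
        if (e.first == j) { preds.push_back(e.second); touched.insert(e); }
        if (e.second == j) { succs.push_back(e.first); touched.insert(e); }
    }
    for (int k : succs)
        for (int i : preds) touched.insert(Edge(k, i));  // updated or created
    return touched;
}

// Edges that two unsynchronized eliminations would both access.
std::set<Edge> conflicts(const std::set<Edge>& edges, int j1, int j2) {
    const std::set<Edge> a = touched_edges(edges, j1);
    const std::set<Edge> b = touched_edges(edges, j2);
    std::set<Edge> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(both, both.begin()));
    return both;
}

// Edge set of the example DAG.
std::set<Edge> example_edge_set() {
    return {{2, 0}, {2, 1}, {3, 2}, {4, 2}, {4, 3}, {5, 4}, {6, 4}};
}
```

For the example DAG, `conflicts(example_edge_set(), 4, 2)` contains only the edge (4, 2): it is an in-edge of v4 and an out-edge of v2, so elim(v4) and elim(v2) cannot safely run at the same time without synchronization.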

SLIDE 9

Atomic Sub-Graphs (1)

[Figure: the linearized DAG of f with vertices v0 to v6 and edges c2,0, c2,1, c3,2, c4,2, c4,3, c5,4, c6,4 (thread t1), next to an i-th copy with vertices v^i_j and edges c^i_{2,0}, c^i_{2,1}, c^i_{3,2}, c^i_{4,2}, c^i_{4,3}, c^i_{5,4}, c^i_{6,4} (thread ti). elim(v4) and elim(v^i_4) replace the edges through v4 and v^i_4 by c5,2, c5,3, c6,2, c6,3 and c^i_{5,2}, c^i_{5,3}, c^i_{6,2}, c^i_{6,3}.]

SLIDE 10

Atomic Sub-Graphs (2)

[Figure: the DAG of f after elim(v4), with edges c2,0, c2,1, c3,2, c5,2, c5,3, c6,2, c6,3 (thread t1), next to an i-th copy with edges c^i_{2,0}, c^i_{3,2}, c^i_{5,2}, c^i_{5,3}, c^i_{6,2}, c^i_{6,3} (thread ti). elim(v3), elim(v2) and elim(v^i_3), elim(v^i_2) leave the edges c5,0, c5,1, c6,0, c6,1 and c^i_{5,0}, c^i_{5,1}, c^i_{6,0}, c^i_{6,1}.]

SLIDE 11

Atomic Code Example

Overloaded function with atomic call:

    void foo(int n, active x[2])
    {
        for (int i = 0; i < n; i++) {
            atomic();
            x[0] = exp((x[0] * x[1]) + sin(x[0] * x[1]));
            x[1] = cos((x[0] * x[1]) + sin(x[0] * x[1]));
        }
    }

SLIDE 12

Implementation

1. Pattern Detection Mode
   ◮ Generation of the binary pattern of C′ by overloading
   ◮ Symbolic elimination on the binary pattern for fill-in detection
   ◮ Allocation of Compressed Row Storage (CRS)
2. Accumulation Mode
   ◮ Initialization of CRS by overloading
   ◮ Row elimination on CRS
   ◮ Jacobian extraction
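The pattern detection mode can be illustrated by a symbolic elimination on a boolean matrix. The sketch below is an assumption-laden illustration rather than the authors' code: the names `Pattern`, `example_pattern`, `storage_pattern` and `count_entries` are hypothetical, a dense `std::vector<std::vector<bool>>` stands in for the binary pattern, and the elimination order is passed in explicitly. It returns every position that holds an entry at some point during the elimination, which is the set of positions a storage scheme would need to allocate.

```cpp
#include <vector>

// pattern[j][i] is true if c_{j,i} may be nonzero (strictly lower triangle).
using Pattern = std::vector<std::vector<bool>>;

// Binary pattern of the example: c2,0 c2,1 c3,2 c4,2 c4,3 c5,4 c6,4.
Pattern example_pattern() {
    Pattern p(7, std::vector<bool>(7, false));
    const int edges[7][2] = {{2, 0}, {2, 1}, {3, 2}, {4, 2},
                             {4, 3}, {5, 4}, {6, 4}};
    for (const auto& e : edges) p[e[0]][e[1]] = true;
    return p;
}

// Symbolically eliminates the rows in `order` and returns all positions that
// are occupied at some point (original nonzeros plus fill-in).
Pattern storage_pattern(Pattern live, const std::vector<int>& order) {
    Pattern ever = live;
    const int n = static_cast<int>(live.size());
    for (int j : order) {
        for (int k = j + 1; k < n; ++k) {
            if (!live[k][j]) continue;
            for (int i = 0; i < j; ++i)
                if (live[j][i]) { live[k][i] = true; ever[k][i] = true; }
            live[k][j] = false;               // entry c_{k,j} is consumed
        }
        live[j].assign(live[j].size(), false);  // row j is eliminated
    }
    return ever;
}

// Number of positions that need storage.
int count_entries(const Pattern& p) {
    int n = 0;
    for (const auto& row : p)
        for (bool b : row) n += b ? 1 : 0;
    return n;
}
```

For the example and the order v4, v3, v2, rows v5 and v6 fill in at columns 0 to 3, so 15 positions are needed in total: 7 original entries and 8 fill-in entries.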

SLIDE 13

Extended Jacobian

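The extended Jacobian C′ of f is the following (the diagonal entries label the rows and columns):

$$
C' = \begin{pmatrix}
v_0 \\
 & v_1 \\
c_{2,0} & c_{2,1} & v_2 \\
 & & c_{3,2} & v_3 \\
 & & c_{4,2} & c_{4,3} & v_4 \\
 & & & & c_{5,4} & v_5 \\
 & & & & c_{6,4} & & v_6
\end{pmatrix}
$$

f′ is obtained by eliminating the intermediate rows v4, v3, v2. The first step, elim(v4), gives

$$
C' \xrightarrow{\;\text{elim}(v_4)\;}
\begin{pmatrix}
v_0 \\
 & v_1 \\
c_{2,0} & c_{2,1} & v_2 \\
 & & c_{3,2} & v_3 \\
 & & & & v_4 \\
 & & c_{5,4} \cdot c_{4,2} & c_{5,4} \cdot c_{4,3} & & v_5 \\
 & & c_{6,4} \cdot c_{4,2} & c_{6,4} \cdot c_{4,3} & & & v_6
\end{pmatrix}
$$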
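The row elimination step can be sketched on a dense matrix. This is an illustration only: `Matrix`, `extended_jacobian` and `eliminate_row` are hypothetical names, the diagonal labels are not stored (the diagonal stays zero), and a dense 7×7 array is used for clarity even though the slides target compressed row storage.

```cpp
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Strictly lower triangular part of C' for the example, filled with the
// given local partials; all other entries are zero.
Matrix extended_jacobian(double c20, double c21, double c32, double c42,
                         double c43, double c54, double c64) {
    Matrix C(7, std::vector<double>(7, 0.0));
    C[2][0] = c20; C[2][1] = c21;
    C[3][2] = c32;
    C[4][2] = c42; C[4][3] = c43;
    C[5][4] = c54;
    C[6][4] = c64;
    return C;
}

// Eliminates intermediate row j: every row k below j with an entry c_{k,j}
// absorbs c_{k,j} * (row j); the entry c_{k,j} and row j are then cleared.
void eliminate_row(Matrix& C, int j) {
    const int n = static_cast<int>(C.size());
    for (int k = j + 1; k < n; ++k) {
        const double ckj = C[k][j];
        if (ckj == 0.0) continue;
        for (int i = 0; i < j; ++i) C[k][i] += ckj * C[j][i];
        C[k][j] = 0.0;
    }
    for (int i = 0; i < j; ++i) C[j][i] = 0.0;
}
```

After `eliminate_row(C, 4)` the entries in rows 5 and 6, columns 2 and 3, hold the products c5,4 · c4,2, c5,4 · c4,3, c6,4 · c4,2 and c6,4 · c4,3. Eliminating rows 3 and 2 afterwards leaves the 2×2 Jacobian in rows 5 and 6, columns 0 and 1.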

SLIDE 14

Compressed Row Storage

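$$
C' = \begin{pmatrix}
v_0 \\
 & v_1 \\
c_{2,0} & c_{2,1} & v_2 \\
 & & c_{3,2} & v_3 \\
 & & c_{4,2} & c_{4,3} & v_4 \\
 & & & & c_{5,4} & v_5 \\
 & & & & c_{6,4} & & v_6
\end{pmatrix}
$$

CRS scheme for C′ with fill-in (values α, column indices κ, row offsets ρ):

$$
\begin{aligned}
\alpha &= [c_{2,0},\, c_{2,1},\, c_{3,2},\, c_{4,2},\, c_{4,3},\, 0,\, 0,\, 0,\, 0,\, c_{5,4},\, 0,\, 0,\, 0,\, 0,\, c_{6,4}] \\
\kappa &= [0,\, 1,\, 2,\, 2,\, 3,\, 0,\, 1,\, 2,\, 3,\, 4,\, 0,\, 1,\, 2,\, 3,\, 4] \\
\rho &= [\underbrace{0,\, 0}_{v_0,\, v_1},\, \underbrace{0}_{v_2},\, \underbrace{2}_{v_3},\, \underbrace{3}_{v_4},\, \underbrace{5}_{v_5},\, \underbrace{10}_{v_6},\, 15]
\end{aligned}
$$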
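A compressed row storage layout of this kind can be allocated from per-row column lists. The sketch below is a generic CRS illustration, not the authors' data structure: the struct `CRS` and the functions `allocate_crs`, `entry` and `example_columns` are hypothetical, and the column lists for the example assume the elimination order v4, v3, v2, so that rows v5 and v6 reserve zero-initialized slots for fill-in.

```cpp
#include <vector>

// Compressed row storage: values (alpha), column indices (kappa) and row
// offsets (rho). Row j occupies positions rho[j] .. rho[j + 1] - 1.
struct CRS {
    std::vector<double> alpha;
    std::vector<int> kappa;
    std::vector<int> rho;
};

// Allocates CRS for the given per-row column lists (sorted ascending);
// all values start at zero, including the slots reserved for fill-in.
CRS allocate_crs(const std::vector<std::vector<int>>& columns) {
    CRS m;
    m.rho.push_back(0);
    for (const auto& row : columns) {
        m.kappa.insert(m.kappa.end(), row.begin(), row.end());
        m.rho.push_back(static_cast<int>(m.kappa.size()));
    }
    m.alpha.assign(m.kappa.size(), 0.0);
    return m;
}

// Returns a pointer to the stored entry (j, i), or nullptr if absent.
double* entry(CRS& m, int j, int i) {
    for (int p = m.rho[j]; p < m.rho[j + 1]; ++p)
        if (m.kappa[p] == i) return &m.alpha[p];
    return nullptr;
}

// Column lists of rows v0..v6 for the example, including fill-in.
std::vector<std::vector<int>> example_columns() {
    return {{}, {}, {0, 1}, {2}, {2, 3}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}};
}
```

With `CRS m = allocate_crs(example_columns());`, a local partial such as c5,4 can be written through `*entry(m, 5, 4) = value;`, while `entry(m, 3, 0)` returns `nullptr` because row v3 never receives an entry in column 0.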