ni.com
A Graphical Dataflow Programming Approach To High Performance Computing
Somashekaracharya G. Bhaskaracharya
National Instruments, Bangalore

Outline
Graphical Dataflow Programming
LabVIEW Introduction and Demo
LabVIEW
Binary
Assembly
Text based: Fortran, Pascal; C / C++; C#, Java, Python, Ruby
LabVIEW
// s = ut + 0.5a*t*t
double displacement_in_time_t(double time, double initial_velocity, double acceleration)
{
    double displacement = initial_velocity * time;
    displacement += 0.5 * acceleration * time * time;
    return displacement;
}
Text-based code: strictly sequential
Dataflow: exploiting inherent parallelism, since nodes that do not depend on each other's outputs can execute concurrently
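In the displacement function, the products `initial_velocity * time` and `0.5 * acceleration * time * time` do not depend on each other. A dataflow diagram exposes this directly, while C needs an explicit annotation. The sketch below assumes OpenMP (compile with `-fopenmp`; without it the pragmas are ignored and the code runs sequentially with the same result). The name `displacement_parallel` is hypothetical, and for two multiplications the threading overhead would far outweigh any gain; it only shows the shape of the parallelism.

```c
/* Hypothetical: mirrors the two independent branches of the dataflow diagram. */
double displacement_parallel(double time, double initial_velocity, double acceleration)
{
    double ut = 0.0;        /* branch 1: u * t */
    double half_at2 = 0.0;  /* branch 2: 0.5 * a * t * t */

    /* Each section may run on a different thread. They only read the
       inputs and write different variables, so no locking is needed. */
    #pragma omp parallel sections
    {
        #pragma omp section
        ut = initial_velocity * time;

        #pragma omp section
        half_at2 = 0.5 * acceleration * time * time;
    }

    /* The addition needs both results, like the Add node in the diagram. */
    return ut + half_at2;
}
```

For example, `displacement_parallel(2.0, 3.0, 4.0)` gives 6 + 8 = 14.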
Measurement and Control I/O
Deployable Math and Analysis
User Interface
Technology Integration
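Shift registers propagate data across iterations.
Unindexed tunnels propagate the same data every iteration.
Indexed tunnels pass one array element per iteration (input) or build an array from the per-iteration values (output).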
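A rough C analogy for these For Loop terminals follows. It is an illustration only; LabVIEW does not generate this code. A shift register behaves like a variable carried from one iteration to the next, an unindexed tunnel like a loop-invariant value read every iteration, and an indexed tunnel like indexing an array with the loop counter. The function and parameter names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical example: scale each input element and keep a running total. */
double scale_and_accumulate(const double *in, double *out, size_t n, double gain)
{
    double total = 0.0;            /* shift register: initialized outside the loop */
    for (size_t i = 0; i < n; ++i) {
        double x = in[i];          /* indexed input tunnel: one element per iteration */
        double y = x * gain;       /* unindexed tunnel: same `gain` every iteration */
        out[i] = y;                /* indexed output tunnel: builds an array */
        total = total + y;         /* value handed to the next iteration */
    }
    return total;                  /* shift register's final value after the loop */
}
```

The running total is the only value that flows from one iteration to the next; that is exactly the kind of cross-iteration dependence that matters for parallelization.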
mov byte ptr [esi+29h],0
mov eax,dword ptr [esi+18h]
mov ebp,dword ptr [esi+14h]
mov dword ptr [esi+0Ch],eax
cmp byte ptr [esi+2Ah],1
je 0ABFFE0F
mov eax,dword ptr [esi+1Ch]
mov eax,dword ptr [eax+14h]
test eax,eax
je 0ABFFCEF
cmp byte ptr [eax+2Ah],1
jne 0ABFFCEF
jmp 0ABFFE0F
mov ecx,dword ptr [ebp+44h]
xor eax,eax
mov edx,1
lock cmpxchg dword ptr [ecx],edx
test eax,eax
jne 0ABFFCEF
mov eax,dword ptr [esi+1Ch]
lea ecx,[ebp+4Ch]
mov dword ptr [eax+10h],ecx
mov dword ptr [ebp+68h],eax
mov dword ptr [ebp+48h],esi
cmp dword ptr [eax+14h],0
jne 0ABFFD90
mov dword ptr [eax+14h],esi
mov byte ptr [ebp+1Eh],1
cmp dword ptr [esi+30h],2
je 0ABFFE39
mov byte ptr [ebp+1Bh],1
mov esi,dword ptr [ebp+360h]
mov esi,dword ptr [esi]
mov dword ptr [ebp+37Ch],esi
inc dword ptr [ebp+37Ch]
mov esi,dword ptr [ebp+48h]
cmp byte ptr [esi+3Dh],1
mov eax,dword ptr [ebp+68h]
je 0ABFFE09
cmp dword ptr [eax+28h],0
jne 0ABFFE1F
mov dword ptr [ebp+48h],0
mov dword ptr [eax+10h],esi
mov byte ptr [ebp+1Eh],0
mov ecx,dword ptr [ebp+44h]
mov dword ptr [ecx],0
cmp dword ptr [eax+14h],esi
jne 0ABFFE0F
mov dword ptr [eax+14h],0
cmp byte ptr [esi+29h],5
jne 0ABFFE0F
mov dword ptr [esi+29h],2
xor eax,eax
jmp 0ABFFD13
mov dword ptr [esi+1Ch],eax
mov dword ptr [eax+10h],esi
mov edx,dword ptr [esi+8]
mov ecx,dword ptr [esi+0Ch]
mov eax,esi
add esp,8
pop esi
mov ebp,edx
jmp ecx
add ebp,3Ch
mov dword ptr [esp],ebp
call SubrVIExit (24D6450h)
test eax,eax
je 0ABFFE02
mov esi,eax
jmp 0ABFFE0F
mov byte ptr [ebp+1Bh],0
jmp 0ABFFD90
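What follows is our reading of the listing, not a documented description of the LabVIEW runtime. The single `inc dword ptr [ebp+37Ch]` appears to be the actual work, and most of the surrounding code is scheduling bookkeeping. In particular, `xor eax,eax` / `mov edx,1` / `lock cmpxchg dword ptr [ecx],edx` looks like a try-lock (set a flag from 0 to 1 only if it is currently 0), and the later `mov dword ptr [ecx],0` releases it. A minimal C11 sketch of that pattern, with hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag guarding shared state: 0 = free, 1 = taken. */
typedef struct { atomic_int flag; } try_lock;

/* Like: xor eax,eax; mov edx,1; lock cmpxchg [flag],edx; test eax,eax */
static bool lock_try_acquire(try_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->flag, &expected, 1);
}

/* Like: mov dword ptr [flag],0 */
static void lock_release(try_lock *l)
{
    atomic_store(&l->flag, 0);
}

/* The useful work is one increment; everything around it is overhead. */
static bool guarded_increment(try_lock *l, int *counter)
{
    if (!lock_try_acquire(l))
        return false;   /* lock busy: a scheduler would retry or requeue */
    ++*counter;
    lock_release(l);
    return true;
}
```

The ratio of bookkeeping to work is the point: when a scheduled unit does this little, its overhead dominates.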
for (t = 1; t < T; ++t)
    for (i = 1; i < N - 1; ++i)        /* interior points of an N x N grid */
        for (j = 1; j < N - 1; ++j)
            grid[t][i][j] = f(grid[t-1][i-1][j], grid[t-1][i+1][j],
                              grid[t-1][i][j-1], grid[t-1][i][j+1]);
The grid is partitioned into blocks that are computed independently, each on a separate core, with data exchanged between neighboring blocks.
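Because every point at step `t` reads only values from step `t-1`, all `(i, j)` updates within one time step are independent, so the rows can be split across cores. A minimal sketch under stated assumptions: OpenMP for the work sharing (ignored, and run sequentially, without `-fopenmp`), small fixed sizes `T` and `N`, and a hypothetical `f` that averages the four neighbours. The boundary of each time step stays at zero for simplicity.

```c
#define T 4   /* number of time steps (assumed) */
#define N 6   /* grid is N x N (assumed) */

static double grid[T][N][N];   /* zero-initialized; boundary stays 0 */

/* Hypothetical update: average of the four neighbours. */
static double f(double up, double down, double left, double right)
{
    return 0.25 * (up + down + left + right);
}

static void stencil(void)
{
    for (int t = 1; t < T; ++t) {
        /* Rows of one time step are independent: threads (typically each
           given a contiguous block of rows) read only grid[t-1]. The
           implicit barrier at the end separates consecutive time steps. */
        #pragma omp parallel for
        for (int i = 1; i < N - 1; ++i)
            for (int j = 1; j < N - 1; ++j)
                grid[t][i][j] = f(grid[t-1][i-1][j], grid[t-1][i+1][j],
                                  grid[t-1][i][j-1], grid[t-1][i][j+1]);
    }
}
```

The time loop stays sequential because step `t` needs all of step `t-1`; only the spatial loops are shared out.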
Auto-parallelization of For Loops
Parallel loop instances: each instance represents an independently schedulable clump.
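One simple way to split an `n`-iteration loop into `p` parallel loop instances is to give each instance a contiguous chunk of the iteration space. This is a hypothetical static partition for illustration, not LabVIEW's actual scheduler, and the names `iter_range`, `instance_range` and `square_all` are ours. Each chunk could run as its own clump because no iteration reads another iteration's result.

```c
/* Half-open range [begin, end) of loop iterations. */
typedef struct { long begin; long end; } iter_range;

/* Hypothetical static partition: instance k of p gets about n / p
   iterations; the first n % p instances get one extra. */
static iter_range instance_range(long n, int p, int k)
{
    long base = n / p;
    long extra = n % p;
    iter_range r;
    r.begin = k * base + (k < extra ? k : extra);
    r.end = r.begin + base + (k < extra ? 1 : 0);
    return r;
}

/* Loop body with no cross-iteration dependence. The instances run one
   after another here; each k could instead be scheduled independently. */
static void square_all(const double *in, double *out, long n, int p)
{
    for (int k = 0; k < p; ++k) {
        iter_range r = instance_range(n, p, k);
        for (long i = r.begin; i < r.end; ++i)
            out[i] = in[i] * in[i];
    }
}
```

With `n = 10` and `p = 4`, the instances cover `[0, 3)`, `[3, 6)`, `[6, 8)` and `[8, 10)`.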
Data carried across iterations through shift registers
for (int i = 1; i < N; ++i)
    for (int j = 1; j < N; ++j)
        a[i][j] = a[i-1][j] + 1;

Can any loop be parallelized here?
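For this nest, `a[i][j]` reads `a[i-1][j]`, which was written in the previous iteration of the outer `i` loop, so the `i` loop carries a dependence and must stay sequential. Within one `i` iteration, different `j` values touch different elements, so the inner `j` loop can be parallelized. A minimal sketch, assuming OpenMP and a fixed size `N` chosen for illustration:

```c
#define N 5   /* assumed size for illustration */

static int a[N][N];

static void row_recurrence(void)
{
    for (int i = 1; i < N; ++i) {        /* sequential: a[i-1][j] feeds a[i][j] */
        #pragma omp parallel for
        for (int j = 1; j < N; ++j)      /* parallel: iterations are independent */
            a[i][j] = a[i-1][j] + 1;
    }
}
```

Starting from a zero array, every element with `j >= 1` ends up equal to its row index `i`.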
For a loop to be parallelized, one iteration must not depend on the results of another.
LabVIEW automatically performs cross-iteration dependence analysis.
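One textbook way to do such an analysis for uniform dependences, shown as an illustration rather than as LabVIEW's implementation: summarize each dependence by a distance vector. For `a[i][j] = a[i-1][j] + 1`, the value written at `(i, j)` is read at `(i+1, j)`, giving `(1, 0)`. A dependence is carried by the loop at its first non-zero component, and a loop can run its iterations in parallel (with the outer loops kept sequential) only if it carries no dependence. The type and function names are hypothetical.

```c
#include <stdbool.h>

#define DEPTH 2   /* loop-nest depth for this sketch: (i, j) */

typedef struct { int d[DEPTH]; } distance;

/* Level (0 = outermost) that carries the dependence: its first non-zero
   component. Returns DEPTH for a loop-independent dependence. */
static int carried_level(distance v)
{
    for (int k = 0; k < DEPTH; ++k)
        if (v.d[k] != 0)
            return k;
    return DEPTH;
}

/* Loop `level` may run in parallel, with the outer loops kept
   sequential, if no dependence is carried at that level. */
static bool loop_is_parallel(const distance *deps, int count, int level)
{
    for (int n = 0; n < count; ++n)
        if (carried_level(deps[n]) == level)
            return false;
    return true;
}
```

With the single vector `(1, 0)`, only the inner loop is parallel. With both `(1, 0)` and `(0, 1)`, as in a nest where `a[i][j]` reads `a[i-1][j]` and `a[i][j-1]`, neither loop is.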
None of these loops can be parallelized.
The loop nest is inner parallel.
The schedule of the statement instances is given by θ(i, j) = (i, j), i.e., the original row-by-row execution order.
The new, skewed schedule is θ(i, j) = (i + j, j).
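Under this schedule, all instances with the same `i + j` execute at the same time step. A minimal sketch, assuming (our illustration) a nest in which `a[i][j]` reads `a[i-1][j]` and `a[i][j-1]`: the distance vectors `(1, 0)` and `(0, 1)` map to `(1, 0)` and `(1, 1)` under θ(i, j) = (i + j, j), so both are carried by the outer wavefront loop `w = i + j`, and the inner loop over `j` becomes parallel. Sizes, names, and the use of OpenMP are assumptions.

```c
#define N 5   /* assumed size; row 0 and column 0 hold boundary values */

static long a[N][N];

/* Original order, theta(i, j) = (i, j): row by row. */
static void sweep_original(void)
{
    for (int i = 1; i < N; ++i)
        for (int j = 1; j < N; ++j)
            a[i][j] = a[i-1][j] + a[i][j-1];
}

/* Skewed order, theta(i, j) = (i + j, j): outer loop over wavefronts
   w = i + j, inner loop over j with i = w - j. */
static void sweep_wavefront(void)
{
    for (int w = 2; w <= 2 * (N - 1); ++w) {
        int j_lo = (w - (N - 1) > 1) ? w - (N - 1) : 1;  /* keeps i <= N-1 */
        int j_hi = (w - 1 < N - 1) ? w - 1 : N - 1;      /* keeps i >= 1 */
        /* Points on one wavefront read only the previous wavefront. */
        #pragma omp parallel for
        for (int j = j_lo; j <= j_hi; ++j) {
            int i = w - j;
            a[i][j] = a[i-1][j] + a[i][j-1];
        }
    }
}
```

With all boundary values set to 1, both sweeps compute the binomial coefficients `a[i][j] = C(i + j, i)`, so the skewed order changes only when each instance runs, not what it computes.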