CUDA-Based Implementation of GSLIB: The Geostatistical Software Library
Daniel Baeza
dabaeza@alges.cl
Oscar Peredo
- peredo@alges.cl
CUDA-Based Implementation of GSLIB: The Geostatistical Software - - PowerPoint PPT Presentation
CUDA-Based Implementation of GSLIB: The Geostatistical Software Library Daniel Baeza Oscar Peredo dabaeza@alges.cl operedo@alges.cl The Mining Process Exploration Evaluation Planning Operation GSLIB: The Geostatistical Software Library
Daniel Baeza
dabaeza@alges.cl
Oscar Peredo
Evaluation Exploration Planning Operation
GSLIB: The Geostatistical Software Library
applications related with geostatistics
GSLIB: Geostatistical Software Library and user's guide (1998) Deutsch, Clayton V, Journel, André G
γ
Distance Normal Scores Semivariogram Low Solubil 0. 100. 200. 300. 400. 500. .00 .20 .40 .60 .80 1.00 1.20γ
Distance Normal Scores Semivariogram High Solubi 0. 100. 200. 300. 400. 500. .00 .20 .40 .60 .80 1.00 1.20x z y Z(u) Z(u + h) b a a b
|| = h 2!(h) = 1 ∑ [z(u) - z(u + h)]
2
N(h) more variability !(h) less variability h
Input:
1 Read input parameter file; 2 Read sample data values file; 3 β ← zeros(nvar × nlag × ndir × nvarg); 4 for i ∈ {1, . . . , |Ω|} do 5
for j ∈ {i, . . . , |Ω|} do
6
for id ∈ {1, . . . , ndir} do
7
for iv ∈ {1, . . . , nvarg} do
8
for il ∈ {1, . . . , nlag} do
9
pi = (xi, yi, zi) ∈ Ω;
10
pj = (xj, y j, zj) ∈ Ω;
11
if (pi, p j) satisfy tolerances τid and ||pi − pj|| ≈ hid × il × h then
12
Save (VRi,ivheadiv or VRi,ivtailiv) and (VRj,ivheadiv or VRj,ivtailiv) according to variogram ivtypeiv into β;
13 γ ← build variogram using statistics β Write γ in the output file
Output: Output file with γ values
Setup parameters & Load data Main computation: Loop over pairs of points Read computation results
Input:
1 Read input parameter file; 2 Read sample data values file; 3 β ← zeros(nvar × nlag × ndir × nvarg); 4 for i ∈ {1, . . . , |Ω|} do 5
for j ∈ {i, . . . , |Ω|} do
6
for id ∈ {1, . . . , ndir} do
7
for iv ∈ {1, . . . , nvarg} do
8
for il ∈ {1, . . . , nlag} do
9
pi = (xi, yi, zi) ∈ Ω;
10
pj = (xj, y j, zj) ∈ Ω;
11
if (pi, p j) satisfy tolerances τid and ||pi − pj|| ≈ hid × il × h then
12
Save (VRi,ivheadiv or VRi,ivtailiv) and (VRj,ivheadiv or VRj,ivtailiv) according to variogram ivtypeiv into β;
13 γ ← build variogram using statistics β Write γ in the output file
Output: Output file with γ values
Input:
1 Read input parameter file; 2 Read sample data values file; 3 β ← zeros(nvar × nlag × ndir × nvarg); 4 for i ∈ {1, . . . , |Ω|} do 5
for j ∈ {i, . . . , |Ω|} do
6
for id ∈ {1, . . . , ndir} do
7
for iv ∈ {1, . . . , nvarg} do
8
for il ∈ {1, . . . , nlag} do
9
pi = (xi, yi, zi) ∈ Ω;
10
pj = (xj, y j, zj) ∈ Ω;
11
if (pi, p j) satisfy tolerances τid and ||pi − pj|| ≈ hid × il × h then
12
Save (VRi,ivheadiv or VRi,ivtailiv) and (VRj,ivheadiv or VRj,ivtailiv) according to variogram ivtypeiv into β;
13 γ ← build variogram using statistics β Write γ in the output file
Output: Output file with γ values
1 Read input parameter file; 2 Read sample data values file; 3 β ← zeros(nvar × nlag × ndir × nvarg); 4 for i ∈ {1, . . . , |Ω|} do 5
for j ∈ {i, . . . , |Ω|} do
6
for id ∈ {1, . . . , ndir} do
7
for iv ∈ {1, . . . , nvarg} do
8
for il ∈ {1, . . . , nlag} do
9
pi = (xi, yi, zi) ∈ Ω;
10
pj = (xj, yj, zj) ∈ Ω;
11
if (pi, pj) satisfy tolerances τid and ||pi − pj|| ≈ hid × il × h then
12
Save (VRi,ivheadiv or VRi,ivtailiv) and (VRj,ivheadiv or VRj,ivtailiv) according to variogram ivtypeiv into β;
13 γ ← build variogram using statistics β Write γ in the output file
Output: Output file with γ values
compute statistics(pi, pj) store result in shared memory
1 Read input parameter file; 2 Read sample data values file; 3 β ← zeros(nvar × nlag × ndir × nvarg); 4 for i ∈ {1, . . . , |Ω|} do 5
for j ∈ {i, . . . , |Ω|} do
6
for id ∈ {1, . . . , ndir} do
7
for iv ∈ {1, . . . , nvarg} do
8
for il ∈ {1, . . . , nlag} do
9
pi = (xi, yi, zi) ∈ Ω;
10
pj = (xj, yj, zj) ∈ Ω;
11
if (pi, pj) satisfy tolerances τid and ||pi − pj|| ≈ hid × il × h then
12
Save (VRi,ivheadiv or VRi,ivtailiv) and (VRj,ivheadiv or VRj,ivtailiv) according to variogram ivtypeiv into β;
13 γ ← build variogram using statistics β Write γ in the output file
Output: Output file with γ values
STEP 1 STEP 2 STEP 3 ... STEP N
compute statistics(pi, pj) store result in shared memory
/* x threads coord in the GPU grid */
2 idy = blockId.y*blockDim.y + threadId.y
/* y threads coord in the GPU grid */
3 Set and Initialize shared memory in the block 4
syncthreads()
5 iterx = idx 6 itery = idy 7 while (iterx & itery ∈ ChunkPoints)
/* Chunk points belongs thread (x,y) */
8 do 9
j = iterx + |Ω|
2 10
i = itery
11
compute statistics(pi, p j)
12
store result in shared memory via atomic functions
13
if (iterx > itery) then
14
i = itery
15
j = iterx
16
compute statistics(pi, pj)
17
store result in shared memory via atomic functions
18
else
19
if (iterx == itery) then
20
i = itery
21
j = itery
22
compute statistics(pi, p j)
23
store result in shared memory via atomic functions
24
i = iterx + |Ω|
2 25
j = itery + |Ω|
2 26
compute statistics(pi, pj)
27
store result in shared memory via atomic functions
28
up date(iterx, itery)
29
syncthreads()
30 save Statistics values that are in shared memory into globa memory via atomic functions
Output: Array β
Each thread compute
/* x threads coord in the GPU grid */
2 idy = blockId.y*blockDim.y + threadId.y
/* y threads coord in the GPU grid */
3 Set and Initialize shared memory in the block 4
syncthreads()
5 iterx = idx 6 itery = idy 7 while (iterx & itery ∈ ChunkPoints)
/* Chunk points belongs thread (x,y) */
8 do 9
j = iterx + |Ω|
2 10
i = itery
11
compute statistics(pi, p j)
12
store result in shared memory via atomic functions
13
if (iterx > itery) then
14
i = itery
15
j = iterx
16
compute statistics(pi, pj)
17
store result in shared memory via atomic functions
18
else
19
if (iterx == itery) then
20
i = itery
21
j = itery
22
compute statistics(pi, p j)
23
store result in shared memory via atomic functions
24
i = iterx + |Ω|
2 25
j = itery + |Ω|
2 26
compute statistics(pi, pj)
27
store result in shared memory via atomic functions
28
up date(iterx, itery)
29
syncthreads()
30 save Statistics values that are in shared memory into globa memory via atomic functions
Output: Array β
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 2 3 0 1 2 3 1 3 STEP 1 STEP 2 STEP 3 STEP 4
/* x threads coord in the GPU grid */
2 idy = blockId.y*blockDim.y + threadId.y
/* y threads coord in the GPU grid */
3 Set and Initialize shared memory in the block 4
syncthreads()
5 iterx = idx 6 itery = idy 7 while (iterx & itery ∈ ChunkPoints)
/* Chunk points belongs thread (x,y) */
8 do 9
j = iterx + |Ω|
2 10
i = itery
11
compute statistics(pi, p j)
12
store result in shared memory via atomic functions
13
if (iterx > itery) then
14
i = itery
15
j = iterx
16
compute statistics(pi, pj)
17
store result in shared memory via atomic functions
18
else
19
if (iterx == itery) then
20
i = itery
21
j = itery
22
compute statistics(pi, p j)
23
store result in shared memory via atomic functions
24
i = iterx + |Ω|
2 25
j = itery + |Ω|
2 26
compute statistics(pi, pj)
27
store result in shared memory via atomic functions
28
up date(iterx, itery)
29
syncthreads()
30 save Statistics values that are in shared memory into globa memory via atomic functions
Output: Array β
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 STEP 1 STEP 2
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 2 3 0 1 2 3 1 3 STEP 1 STEP 2 STEP 3 STEP 4
/* x threads coord in the GPU grid */
2 idy = blockId.y*blockDim.y + threadId.y
/* y threads coord in the GPU grid */
3 Set and Initialize shared memory in the block 4
syncthreads()
5 iterx = idx 6 itery = idy 7 while (iterx & itery ∈ ChunkPoints)
/* Chunk points belongs thread (x,y) */
8 do 9
j = iterx + |Ω|
2 10
i = itery
11
compute statistics(pi, p j)
12
store result in shared memory via atomic functions
13
if (iterx > itery) then
14
i = itery
15
j = iterx
16
compute statistics(pi, pj)
17
store result in shared memory via atomic functions
18
else
19
if (iterx == itery) then
20
i = itery
21
j = itery
22
compute statistics(pi, p j)
23
store result in shared memory via atomic functions
24
i = iterx + |Ω|
2 25
j = itery + |Ω|
2 26
compute statistics(pi, pj)
27
store result in shared memory via atomic functions
28
up date(iterx, itery)
29
syncthreads()
30 save Statistics values that are in shared memory into globa memory via atomic functions
Output: Array β
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 STEP 1 STEP 2
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 2 3 0 1 2 3 1 3 STEP 1 STEP 2 STEP 3 STEP 4
/* x threads coord in the GPU grid */
2 idy = blockId.y*blockDim.y + threadId.y
/* y threads coord in the GPU grid */
3 Set and Initialize shared memory in the block 4
syncthreads()
5 iterx = idx 6 itery = idy 7 while (iterx & itery ∈ ChunkPoints)
/* Chunk points belongs thread (x,y) */
8 do 9
j = iterx + |Ω|
2 10
i = itery
11
compute statistics(pi, p j)
12
store result in shared memory via atomic functions
13
if (iterx > itery) then
14
i = itery
15
j = iterx
16
compute statistics(pi, pj)
17
store result in shared memory via atomic functions
18
else
19
if (iterx == itery) then
20
i = itery
21
j = itery
22
compute statistics(pi, p j)
23
store result in shared memory via atomic functions
24
i = iterx + |Ω|
2 25
j = itery + |Ω|
2 26
compute statistics(pi, pj)
27
store result in shared memory via atomic functions
28
up date(iterx, itery)
29
syncthreads()
30 save Statistics values that are in shared memory into globa memory via atomic functions
Output: Array β
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 STEP 1 STEP 2
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 2 3 0 1 2 3 1 3 STEP 1 STEP 2 STEP 3 STEP 4
/* x threads coord in the GPU grid */
2 idy = blockId.y*blockDim.y + threadId.y
/* y threads coord in the GPU grid */
3 Set and Initialize shared memory in the block 4
syncthreads()
5 iterx = idx 6 itery = idy 7 while (iterx & itery ∈ ChunkPoints)
/* Chunk points belongs thread (x,y) */
8 do 9
j = iterx + |Ω|
2 10
i = itery
11
compute statistics(pi, p j)
12
store result in shared memory via atomic functions
13
if (iterx > itery) then
14
i = itery
15
j = iterx
16
compute statistics(pi, pj)
17
store result in shared memory via atomic functions
18
else
19
if (iterx == itery) then
20
i = itery
21
j = itery
22
compute statistics(pi, p j)
23
store result in shared memory via atomic functions
24
i = iterx + |Ω|
2 25
j = itery + |Ω|
2 26
compute statistics(pi, pj)
27
store result in shared memory via atomic functions
28
up date(iterx, itery)
29
syncthreads()
30 save Statistics values that are in shared memory into globa memory via atomic functions
Output: Array β
2 0 1 2 3 1 3 3 0 1 2 3 0 2 1 3 0 1 2 3 0 1 2 3 STEP 1 STEP 2
Distance
10x10x120 Omni-Directional Semivariogram
0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 0.00 0.20 0.40 0.60 0.80 1.00
Standard version CUDA version
Distance
510x510x1 Omni-Directional Semivariogram
0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0 0.0 10.0 20.0 30.0 40.0 50.0
Standard version CUDA version
Standard Version
Seconds 575 1150 1725 2300 Number of points 5000 12000 65000 261000 500000 800000 1000000 2095 1330 518.69 143.2 9.52 4.68 0.8
CUDA version
Seconds 12.5 25 37.5 50 Number of points 5000 12000 65000 261000 500000 800000 1000000 43.2 28.45 12.28 3.62 0.61 0.53 0.25
CUDA version
Seconds 12.5 25 37.5 50 Number of points 5000 12000 65000 261000 500000 800000 1000000 43.2 28.45 12.28 3.62 0.61 0.53 0.25
Standard Version
Seconds 575 1150 1725 2300 Number of points 5000 12000 65000 261000 500000 800000 1000000 2095 1330 518.69 143.2 9.52 4.68 0.8
SpeedUp
SpeedUp 13.75 27.5 41.25 55 Number of points 5000 12000 65000 261000 500000 800000 1000000 48.5 46.7 42.2 39.6 15.6 8.8 3.2 Sequential Parallel
43 sec 34 min
CPU GPU
simulation of GSLIB