SLIDE 1

Parallel Space-Time Kernel Density Estimation

Erik Saule†, Dinesh Panchananam†, Alexander Hohl‡, Wenwu Tang‡, Eric Delmelle‡

† Dept. of Computer Science ‡Dept. of Geography and Earth Sciences

UNC Charlotte Email: {esaule,dpanchan,ahohl,wtang4,eric.delmelle}@uncc.edu

LIG seminar (previously presented at ICPP 2017)

Erik Saule (UNC Charlotte) Shared-memory STKDE LIG seminar 1 / 32

SLIDE 2

Outline

1. Space Time Kernel Density
2. Sequential Algorithms
3. Domain-Based Parallelism
4. Point-Based Parallelism
5. Conclusion

SLIDE 3

Space Time Kernel Density

What is it?

A common way of visualizing events that carry time and place information.
Basically: voxelize space-time, and give each voxel a value that depends on the number of events neighboring it (with some kind of decay).
Essentially a generalization of density maps (e.g., population density).

What is it useful for?

Monitoring disease outbreaks
Political analysis
Social media analysis
Ornithology

SLIDE 4

Space-Time Kernel Density Estimate Formally

For a voxel with indices (X, Y, T) and center (x, y, t):

f̂(x, y, t) = 1 / (n hs² ht) · Σ_{i : di < hs, |t − ti| < ht} ks((x − xi)/hs, (y − yi)/hs) · kt((t − ti)/ht)

where di is the spatial distance from (x, y) to point i, and

ks(u, v) = (2/π) (1 − u²) (1 − v²)
kt(w) = (3/4) (1 − w²)

hs is the spatial bandwidth, ht is the temporal bandwidth, n is the number of points (events).

Each event radiates density

Similar to computing sums of radial basis functions which are typical in physics.
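The estimate at a single voxel follows directly from the formula. A minimal C++ sketch; `Event`, `stkde_at`, and the exact kernel normalization constants are illustrative assumptions of this sketch, not the paper's code:

```cpp
#include <cmath>
#include <vector>

struct Event { double x, y, t; };

const double kPi = 3.14159265358979323846;

// Epanechnikov-style kernels as read off the slide; the normalization
// constants are an assumption, not necessarily the paper's exact ones.
double ks(double u, double v) { return (2.0 / kPi) * (1 - u * u) * (1 - v * v); }
double kt(double w) { return 0.75 * (1 - w * w); }

// Density estimate at a single voxel center (x, y, t).
double stkde_at(const std::vector<Event>& ev,
                double x, double y, double t, double hs, double ht) {
    double sum = 0;
    for (const Event& e : ev) {
        double dx = x - e.x, dy = y - e.y, dt = t - e.t;
        // Only events inside the space-time cylinder contribute.
        if (std::sqrt(dx * dx + dy * dy) < hs && std::fabs(dt) < ht)
            sum += ks(dx / hs, dy / hs) * kt(dt / ht);
    }
    return sum / (ev.size() * hs * hs * ht);
}
```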

SLIDE 5

Dengue Fever in Cali, Colombia

Two bandwidth settings: hs = 2500 m, ht = 14 days; and hs = 500 m, ht = 7 days.

SLIDE 6

Outline

1. Space Time Kernel Density
2. Sequential Algorithms
3. Domain-Based Parallelism
4. Point-Based Parallelism
5. Conclusion

SLIDE 7

Voxel-Based Algorithm (VB)

Algorithm

for all voxels (X, Y, T) centered at (x, y, t) do
    sum = 0
    for all points i at (xi, yi, ti) do
        if sqrt((xi − x)² + (yi − y)²) < hs and |ti − t| ≤ ht then
            sum += ks((x − xi)/hs, (y − yi)/hs) · kt((ti − t)/ht)
    stkde[X][Y][T] = sum / (n hs² ht)

Θ(Gx Gy Gt n) distance tests; Θ(n Hs² Ht) density values.

Complexity: Θ(Gx Gy Gt n). But pleasingly parallel.
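The VB loop translates almost directly to C++/OpenMP; one pragma on the voxel loop suffices since voxels are independent. Unit voxel spacing and unnormalized kernels are simplifying assumptions of this sketch:

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y, t; };

// VB: for every voxel, scan all points. Voxels are independent, so the
// voxel loop parallelizes with a single pragma ("pleasingly parallel").
std::vector<double> stkde_vb(const std::vector<Pt>& pts, int Gx, int Gy, int Gt,
                             double hs, double ht) {
    std::vector<double> out((size_t)Gx * Gy * Gt, 0.0);
    double norm = pts.size() * hs * hs * ht;
    #pragma omp parallel for collapse(3)
    for (int X = 0; X < Gx; X++)
      for (int Y = 0; Y < Gy; Y++)
        for (int T = 0; T < Gt; T++) {
            double sum = 0;
            for (const Pt& p : pts) {  // Θ(n) distance tests per voxel
                double dx = X - p.x, dy = Y - p.y, dt = T - p.t;
                if (std::sqrt(dx * dx + dy * dy) < hs && std::fabs(dt) <= ht)
                    sum += (1 - dx * dx / (hs * hs)) * (1 - dy * dy / (hs * hs))
                         * (1 - dt * dt / (ht * ht));  // kernel constants dropped
            }
            out[((size_t)X * Gy + Y) * Gt + T] = sum / norm;
        }
    return out;
}
```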

SLIDE 8

Point-Based Algorithm (PB)

Algorithm

for all voxels (X, Y, T) do
    stkde[X][Y][T] = 0
for each point i at (xi, yi, ti) do
    for Xi − Hs ≤ X ≤ Xi + Hs do
        for Yi − Hs ≤ Y ≤ Yi + Hs do
            for Ti − Ht ≤ T ≤ Ti + Ht do
                if sqrt((xi − x)² + (yi − y)²) < hs and |ti − t| ≤ ht then
                    stkde[X][Y][T] += ks((x − xi)/hs, (y − yi)/hs) · kt((ti − t)/ht) / (n hs² ht)

Θ(Gx Gy Gt) for memory initialization; Θ(n Hs² Ht) density computations.

Complexity: Θ(Gx Gy Gt + n Hs² Ht)

(We gain back the Θ(Gx Gy Gt n) distance tests of the voxel-based algorithm.)
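Under the same simplifying assumptions (unit voxel spacing, unnormalized kernels, `stkde_pb` being an illustrative name), the point-based scatter looks like:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { double x, y, t; };

// PB: initialize the grid once, then each point scatters density only into
// the voxels of its bounding box, i.e. Θ(GxGyGt + n Hs^2 Ht) work overall.
std::vector<double> stkde_pb(const std::vector<Pt>& pts, int Gx, int Gy, int Gt,
                             double hs, double ht) {
    std::vector<double> out((size_t)Gx * Gy * Gt, 0.0);  // Θ(GxGyGt) init
    double norm = pts.size() * hs * hs * ht;
    int Hs = (int)std::ceil(hs), Ht = (int)std::ceil(ht);
    for (const Pt& p : pts) {  // Θ(n Hs^2 Ht) density updates
        int X0 = std::max(0, (int)p.x - Hs), X1 = std::min(Gx - 1, (int)p.x + Hs);
        int Y0 = std::max(0, (int)p.y - Hs), Y1 = std::min(Gy - 1, (int)p.y + Hs);
        int T0 = std::max(0, (int)p.t - Ht), T1 = std::min(Gt - 1, (int)p.t + Ht);
        for (int X = X0; X <= X1; X++)
          for (int Y = Y0; Y <= Y1; Y++)
            for (int T = T0; T <= T1; T++) {
                double dx = X - p.x, dy = Y - p.y, dt = T - p.t;
                if (std::sqrt(dx * dx + dy * dy) < hs && std::fabs(dt) <= ht)
                    out[((size_t)X * Gy + Y) * Gt + T] +=
                        (1 - dx * dx / (hs * hs)) * (1 - dy * dy / (hs * hs))
                      * (1 - dt * dt / (ht * ht)) / norm;  // kernel constants dropped
            }
    }
    return out;
}
```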

SLIDE 9

Exploiting Symmetries (PB-SYM)

For each point:
    Compute the kt values
    Compute the ks values
    Take their cross product

The complexity is the same, but it saves computation in practice.
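One way to read the symmetry trick, sketched for a single point placed at the grid origin; `point_contribution`, the kernel forms, and unit voxel spacing are assumptions of this sketch:

```cpp
#include <cmath>
#include <vector>

// For one point: evaluate the temporal kernel once per time offset and the
// spatial kernel once per (dx, dy) offset, then combine them as a cross
// product of |ks| * |kt| multiplies instead of one full kernel evaluation
// per voxel.
std::vector<double> point_contribution(int Hs, int Ht, double hs, double ht) {
    std::vector<double> ktv(2 * Ht + 1, 0.0);
    std::vector<double> ksv((2 * Hs + 1) * (2 * Hs + 1), 0.0);
    for (int dt = -Ht; dt <= Ht; dt++)
        if (std::fabs((double)dt) <= ht)
            ktv[dt + Ht] = 1 - (dt * dt) / (ht * ht);
    for (int dx = -Hs; dx <= Hs; dx++)
        for (int dy = -Hs; dy <= Hs; dy++)
            if (std::sqrt((double)(dx * dx + dy * dy)) < hs)
                ksv[(dx + Hs) * (2 * Hs + 1) + dy + Hs] =
                    (1 - dx * dx / (hs * hs)) * (1 - dy * dy / (hs * hs));
    std::vector<double> contrib;  // density the point adds to each voxel
    for (double s : ksv)
        for (double w : ktv) contrib.push_back(s * w);
    return contrib;
}
```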

SLIDE 10

Experimental settings

Instance          | n         | Gx x Gy x Gt   | Size    | Hs  | Ht
------------------|-----------|----------------|---------|-----|----
Dengue Lr-Lb      | 11056     | 148x194x728    | 79MB    | 3   | 1
Dengue Lr-Hb      | 11056     | 148x194x728    | 79MB    | 25  | 1
Dengue Hr-Lb      | 11056     | 294x386x728    | 315MB   | 2   | 1
Dengue Hr-Hb      | 11056     | 294x386x728    | 315MB   | 50  | 1
Dengue Hr-VHb     | 11056     | 294x386x728    | 315MB   | 50  | 14
PollenUS Lr-Lb    | 588189    | 131x61x84      | 2MB     | 2   | 3
PollenUS Hr-Lb    | 588189    | 651x301x84     | 62MB    | 10  | 3
PollenUS Hr-Mb    | 588189    | 651x301x84     | 62MB    | 25  | 7
PollenUS Hr-Hb    | 588189    | 651x301x84     | 62MB    | 50  | 14
PollenUS VHr-Lb   | 588189    | 6501x3001x84   | 6252MB  | 100 | 3
PollenUS VHr-VLb  | 588189    | 6501x3001x84   | 6252MB  | 50  | 3
Flu Lr-Lb         | 31478     | 117x308x851    | 117MB   | 1   | 1
Flu Lr-Hb         | 31478     | 117x308x851    | 117MB   | 2   | 3
Flu Mr-Lb         | 31478     | 233x615x1985   | 1085MB  | 2   | 3
Flu Mr-Hb         | 31478     | 233x615x1985   | 1085MB  | 4   | 7
Flu Hr-Lb         | 31478     | 581x1536x5951  | 20260MB | 5   | 7
Flu Hr-Hb         | 31478     | 581x1536x5951  | 20260MB | 10  | 21
eBird Lr-Lb       | 291990435 | 357x721x2435   | 2391MB  | 2   | 3
eBird Lr-Hb       | 291990435 | 357x721x2435   | 2391MB  | 6   | 5
eBird Hr-Lb       | 291990435 | 1781x3601x2435 | 59570MB | 10  | 3
eBird Hr-Hb       | 291990435 | 1781x3601x2435 | 59570MB | 30  | 5

Shared-memory machine (a node of Copperhead): 2× Intel Xeon E5-2667 v3 (2 × 8 cores), 128 GB of DRAM, G++ 5.3 (with OpenMP 4.0).

SLIDE 11

In practice, PB-SYM is much better

Times in seconds; the last column is the speedup of PB-SYM over PB; "—" marks entries with no reported time.

Instance          | VB        | VB-DEC   | PB        | PB-DISK  | PB-BAR   | PB-SYM    | PB-SYM speedup
------------------|-----------|----------|-----------|----------|----------|-----------|---------------
Dengue Lr-Lb      | 219.163   | 2.283    | 0.040     | 0.029    | 0.035    | 0.028     | 1.429
Dengue Lr-Hb      | 220.591   | 13.878   | 1.298     | 0.564    | 1.152    | 0.499     | 2.601
Dengue Hr-Lb      | 866.445   | 9.522    | 0.089     | 0.082    | 0.085    | 0.084     | 1.060
Dengue Hr-Hb      | 871.774   | 55.206   | 5.169     | 2.272    | 4.563    | 2.074     | 2.492
Dengue Hr-VHb     | 1056.172  | 404.845  | 51.885    | 11.478   | 42.994   | 7.431     | 6.982
PollenUS Lr-Lb    | 518.859   | 7.639    | 1.106     | 0.347    | 0.922    | 0.256     | 4.320
PollenUS Hr-Lb    | 12721.001 | 189.337  | 23.539    | 7.700    | 18.527   | 4.708     | 5.000
PollenUS Hr-Mb    | 17179.482 | 3126.947 | 357.743   | 86.129   | 295.791  | 57.528    | 6.219
PollenUS Hr-Hb    | —         | —        | 2666.104  | 583.175  | 2212.626 | 382.566   | 6.969
PollenUS VHr-Lb   | —         | —        | 2428.126  | 1004.174 | 1949.988 | 759.722   | 3.196
PollenUS VHr-VLb  | —         | —        | 603.789   | 240.236  | 488.388  | 179.834   | 3.357
Flu Lr-Lb         | 926.360   | 3.691    | 0.035     | 0.032    | 0.034    | 0.032     | 1.094
Flu Lr-Hb         | 966.328   | 3.797    | 0.081     | 0.046    | 0.070    | 0.042     | 1.929
Flu Mr-Lb         | 8591.165  | 30.355   | 0.305     | 0.278    | 0.298    | 0.277     | 1.101
Flu Mr-Hb         | 8957.175  | 32.018   | 0.714     | 0.384    | 0.608    | 0.323     | 2.211
Flu Hr-Lb         | —         | 536.091  | 5.702     | 5.089    | 5.454    | 5.059     | 1.127
Flu Hr-Hb         | —         | 591.955  | 12.795    | 6.822    | 10.992   | 7.072     | 1.809
eBird Lr-Lb       | —         | —        | 396.811   | 147.951  | 322.580  | 125.248   | 3.168
eBird Lr-Hb       | —         | —        | 6969.187  | 1897.051 | 5611.158 | 1067.395  | 6.529
eBird Hr-Lb       | —         | —        | 8373.273  | 3226.016 | 6470.764 | 2229.460  | 3.756
eBird Hr-Hb       | —         | —        | —         | —        | —        | 34577.745 | —

Clearly, PB-SYM is the algorithm to make parallel.

SLIDE 12

Parallelism in STKDE is not trivial

The problem

The algorithm has "for all points" as its outer loop. But if the cylinders of two points intersect and those points are processed at the same time, there can be a race condition when writing to the stkde array.

Naive solution

Lock the cells of stkde to avoid the race condition, or use atomics when updating stkde. That costs Θ(n Hs² Ht) locks or atomics.

A better solution

Make sure intersecting cylinders are never processed at the same time. That is the rest of the presentation.
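The naive solution can be sketched by parallelizing the point loop of PB and guarding each grid update with an OpenMP atomic; grid geometry, kernel form, and the name `stkde_pb_atomic` are illustrative assumptions:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { double x, y, t; };

// Point loop runs in parallel; every write to the shared grid is an atomic
// update, i.e. Θ(n Hs² Ht) atomics in total.
std::vector<double> stkde_pb_atomic(const std::vector<Pt>& pts, int Gx, int Gy,
                                    int Gt, double hs, double ht) {
    std::vector<double> out((size_t)Gx * Gy * Gt, 0.0);
    double norm = pts.size() * hs * hs * ht;
    int Hs = (int)std::ceil(hs), Ht = (int)std::ceil(ht);
    #pragma omp parallel for
    for (long i = 0; i < (long)pts.size(); i++) {
        const Pt& p = pts[i];
        int X0 = std::max(0, (int)p.x - Hs), X1 = std::min(Gx - 1, (int)p.x + Hs);
        int Y0 = std::max(0, (int)p.y - Hs), Y1 = std::min(Gy - 1, (int)p.y + Hs);
        int T0 = std::max(0, (int)p.t - Ht), T1 = std::min(Gt - 1, (int)p.t + Ht);
        for (int X = X0; X <= X1; X++)
          for (int Y = Y0; Y <= Y1; Y++)
            for (int T = T0; T <= T1; T++) {
                double dx = X - p.x, dy = Y - p.y, dt = T - p.t;
                if (std::sqrt(dx * dx + dy * dy) >= hs || std::fabs(dt) > ht)
                    continue;
                double d = (1 - dx * dx / (hs * hs)) * (1 - dy * dy / (hs * hs))
                         * (1 - dt * dt / (ht * ht)) / norm;
                #pragma omp atomic  // one atomic per density update
                out[((size_t)X * Gy + Y) * Gt + T] += d;
            }
    }
    return out;
}
```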

SLIDE 14

Outline

1. Space Time Kernel Density
2. Sequential Algorithms
3. Domain-Based Parallelism
4. Point-Based Parallelism
5. Conclusion

SLIDE 15

Domain Replication (PB-SYM-DR)

Each worker:
    Initializes its own memory buffer stkde_local
    Processes some points into stkde_local (with load balancing), so there is no race condition
    Participates in reducing the many stkde_local buffers into stkde

[Figure: speedup per instance of PB-SYM-DR with 1, 2, 4, 8, and 16 threads]

SLIDE 16

Why is DR bad? Some instances have little computation!

[Figure: fraction of time spent in initialization vs. compute, per instance]

(and some run out of memory)

SLIDE 17

Domain Decomposition (PB-SYM-DD)

Decompose the voxel domain into K × K × K subdomains. Each worker processes different subdomains (load balanced over the subdomains).

Each voxel is now processed by a unique thread, so no race condition

[Figure: speedup per instance of PB-SYM-DD for decompositions 1x1x1 through 64x64x64]

SLIDE 18

Why is DD bad? Work overhead. Some cylinders are cut!

[Figure: time relative to PB-SYM, per instance, for decompositions 1x1x1 through 64x64x64]

SLIDE 19

Are there better partitionings? (since ICPP17)

PB-SYM-DD partitions the space using a naive grid, but we can do better:
    Hierarchical partitionings
    Jagged partitionings
(shown in 2D, but they generalize to 3D)

SLIDE 20

Optimal decompositions can be found using dynamic programming

Hier(Xmin, Xmax, Ymin, Ymax, Tmin, Tmax, P) =
  min_{1 ≤ p < P} min(
    min_{Xmin < x < Xmax} max( Hier(Xmin, x, Ymin, Ymax, Tmin, Tmax, p),
                               Hier(x, Xmax, Ymin, Ymax, Tmin, Tmax, P − p) ),
    min_{Ymin < y < Ymax} max( Hier(Xmin, Xmax, Ymin, y, Tmin, Tmax, p),
                               Hier(Xmin, Xmax, y, Ymax, Tmin, Tmax, P − p) ),
    min_{Tmin < t < Tmax} max( Hier(Xmin, Xmax, Ymin, Ymax, Tmin, t, p),
                               Hier(Xmin, Xmax, Ymin, Ymax, t, Tmax, P − p) ) )   (1)

(yummy)

SLIDE 21

Largest block work-overhead tradeoff

[Figure: work overhead vs. 1/pmax for the GRID, HIER, and HIER-OVER partitionings]

(for PollenUS Hr-Lb) How good are these in practice? Some encouraging preliminary results, but the partitionings are expensive to compute.

SLIDE 22

Outline

1. Space Time Kernel Density
2. Sequential Algorithms
3. Domain-Based Parallelism
4. Point-Based Parallelism
5. Conclusion

SLIDE 23

Point Decomposition (PB-SYM-PD)

Partition the points on a regular A×B×C grid such that each cell is larger than the bandwidth in every dimension. For (a, b, c) ∈ {0, 1}³:
    Process in parallel the subdomains (2i + a, 2j + b, 2k + c), ∀i, j, k
    Synchronize

No two points closer than twice the bandwidth are processed simultaneously, so no two intersecting cylinders are updated at the same time.
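The eight-phase schedule can be sketched as follows; `run_phases` only counts visits, standing in for the per-subdomain scatter:

```cpp
#include <vector>

// Counts how often each subdomain is processed; the real code would scatter
// the points of subdomain (i, j, k) into the density grid instead.
std::vector<int> run_phases(int A, int B, int C) {
    std::vector<int> visits(A * B * C, 0);
    for (int a = 0; a < 2; a++)        // 2 x 2 x 2 = 8 sequential phases
      for (int b = 0; b < 2; b++)
        for (int c = 0; c < 2; c++) {
          // Within a phase, active subdomains are two cells apart in every
          // dimension, so their write regions cannot overlap.
          #pragma omp parallel for collapse(3)
          for (int i = a; i < A; i += 2)
            for (int j = b; j < B; j += 2)
              for (int k = c; k < C; k += 2)
                visits[(i * B + j) * C + k]++;
          // Implicit barrier: the next phase starts once this one is done.
        }
    return visits;
}
```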

[Figure: speedup per instance of PB-SYM-PD for decompositions 1x1x1 through 64x64x64]

SLIDE 24

Why is this bad? Too many dependencies?

Since all subdomains (2i, 2j, 2k) are done before any (2i + 1, 2j, 2k), there is a forced precedence of (0, 0, 0) over (3, 0, 0), even though they are not dependent. This performs coloring when it should perform scheduling.

Building the graph from a coloring is simple (and easily expressed in OpenMP 4.0).
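With OpenMP 4.0 task dependences, the dependence graph can replace the phase barriers. A 1D illustration where subdomain s conflicts only with its neighbors; the `border` token array and `schedule_chain` are devices of this sketch:

```cpp
#include <vector>

// Subdomain s conflicts only with s-1 and s+1. Declaring inout dependences
// on the two shared border tokens lets the runtime start a task as soon as
// its actual neighbors are done; there is no global barrier between colors.
std::vector<int> schedule_chain(int S) {
    std::vector<int> done(S, 0);
    std::vector<char> border(S + 1, 0);  // one dependence token per boundary
    char* b = border.data();
    (void)b;
    #pragma omp parallel
    #pragma omp single
    for (int parity = 0; parity < 2; parity++)   // create tasks color by color
        for (int s = parity; s < S; s += 2) {
            #pragma omp task depend(inout: b[s], b[s + 1])
            done[s] = 1;  // stand-in for processing subdomain s
        }
    return done;
}
```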

SLIDE 25

A Better Coloring with PB-SYM-PD-SCHED

We don't need a coloring of the subdomains that minimizes the number of colors; we need a coloring that minimizes the longest chain in the implied graph. Heuristic: greedily color the subdomains, highest number of points first.

[Figure: speedup per instance of PB-SYM-PD-SCHED for decompositions 1x1x1 through 64x64x64]

(If you don’t have a good eye, it is a bit better than before)

SLIDE 26

Why is it bad? The critical path is still too long!

[Figure: relative length of the critical path, per instance, for PB-SYM-PD and PB-SYM-PD-SCHED]

SLIDE 27

How hard is the coloring/edge-orientation problem of minimizing the critical path?

NP-hard on general graphs (harder than coloring).
Trivial on chains.
Polynomial on bipartite graphs (Julien Hermann made that observation); 2D meshes with a 5-point stencil and 3D meshes with a 7-point stencil are bipartite.
On 2D meshes with a 9-point stencil and 3D meshes with a 27-point stencil, one easily builds a 2^(d−1)-approximation algorithm (based on the 5-point / 7-point stencil approximation of the graph).
Is it NP-complete on those meshes? I don't know.
How good is the heuristic? I don't know.

SLIDE 28

Parallelize Long Paths (PB-SYM-PD-SCHED-REP)

One can replicate a subdomain and get perfect work parallelism (at the expense of some memory initialization and a reduction). Heuristic: for every path longer than n/(2P), add one copy to all tasks on the path.

[Figure: speedup per instance of PB-SYM-PD-SCHED-REP for decompositions 1x1x1 through 64x64x64]

(This sounds like moldable DAG scheduling.)

SLIDE 29

Outline

1. Space Time Kernel Density
2. Sequential Algorithms
3. Domain-Based Parallelism
4. Point-Based Parallelism
5. Conclusion

SLIDE 30

All methods

[Figure: speedup per instance for PB-SYM-DR, PB-SYM-DD, PB-SYM-PD, PB-SYM-PD-SCHED, and PB-SYM-PD-SCHED-REP]

Algorithms matter in sequential processing.
The real challenge in parallel processing is often algorithmic.

SLIDE 31

Future Work

Other platforms:
    GPU (not quite sure how to approach it)
    KNL
    Distributed memory
Some algorithmic problems:
    A better way to decompose for PB-SYM-DD
    Formally study the edge-orientation problem for PB-SYM-PD-SCHED
    Look deeper into the moldable-scheduling connection for PB-SYM-PD-SCHED-REP
    Model everything and derive analytical bounds on performance

SLIDE 32

What else do I do?

SLIDE 33

Thank you!

And thanks:
    Dan Janies for pointing out the Flu dataset
    Bora Uçar for pointing out the Gallai-Hasse-Roy-Vitaver theorem
    The US taxpayer

Support from US NSF XSEDE Supercomputing Resource Allocation (SES170007) is acknowledged. This material is based upon work supported by the National Science Foundation under Grant No. 1652442.

Want to know more?

Read the ICPP 17 paper.
Contact: esaule@uncc.edu
Visit: http://webpages.uncc.edu/~esaule
