Lonestar: A Suite of Parallel Irregular Programs
Milind Kulkarni, Martin Burtscher, Călin Caşcaval, and Keshav Pingali
Tuesday, April 21, 2009
Regular programs operate on dense arrays and matrices, where independent operations (e.g., multiplications) can be performed concurrently. Irregular programs are organized around pointer-based data structures such as graphs, trees, etc. Can the parallelism in such programs be exploited?
Application domains and algorithms:
Data-mining: agglomerative clustering, k-means
Bayesian inference: belief propagation, survey propagation
Compilers: iterative dataflow, elimination-based dataflow
Functional interpreters: graph reduction, static/dynamic dataflow
Maxflow: preflow-push, augmenting paths
Minimum spanning trees: Prim's, Kruskal's, Boruvka's
N-body methods: Barnes-Hut, fast multipole
Graphics: ray tracing
Linear solvers: sparse MVM, sparse Cholesky factorization
Event-driven simulation: Time Warp, Chandy-Misra-Bryant
Meshing: Delaunay mesh refinement, triangulation
Delaunay mesh refinement: a badly shaped triangle is fixed by removing its "cavity" and re-triangulating the region. Re-triangulation may create new badly shaped triangles. Bad triangles can be processed in any order; the algorithm terminates when the worklist is empty.
[Figure: mesh before and after refinement]
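The worklist pattern described above can be sketched as follows. This is a minimal skeleton only: `is_bad`, `build_cavity`, and `retriangulate` are hypothetical placeholders standing in for the real Delaunay geometry kernels, and the mesh is modeled as a plain set of triangles.

```python
from collections import deque

def refine(mesh, is_bad, build_cavity, retriangulate):
    """Worklist-driven refinement skeleton (geometry elided).

    mesh: a mutable set of triangles; is_bad(t) tests triangle quality;
    build_cavity(mesh, t) returns the triangles affected by fixing t;
    retriangulate(cavity) returns the replacement triangles.
    All three helpers are illustrative placeholders.
    """
    worklist = deque(t for t in mesh if is_bad(t))
    while worklist:                      # terminate when the worklist is empty
        bad = worklist.popleft()         # bad triangles may be taken in any order
        if bad not in mesh:              # may already be gone: an earlier cavity ate it
            continue
        cavity = build_cavity(mesh, bad)
        mesh -= cavity                   # remove the cavity ...
        fresh = retriangulate(cavity)    # ... and re-triangulate it
        mesh |= fresh
        worklist.extend(t for t in fresh if is_bad(t))  # new bad triangles
    return mesh
```

The `if bad not in mesh` check is what makes any processing order safe: a triangle queued earlier may have been swallowed by another triangle's cavity in the meantime.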
Discrete event simulation: nodes in a network process timestamped events; processing an event can generate new events to send to other nodes. Events must be processed in global time order.
[Figure: nodes A and B exchanging timestamped events]
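A sequential event-driven loop that respects global time order can be sketched with a priority queue keyed on timestamps. The event shape and the `handlers` map are illustrative assumptions, not the benchmark's actual interface.

```python
import heapq

def simulate(initial_events, handlers):
    """Sequential discrete-event loop: events run in global time order.

    initial_events: iterable of (time, node, payload) tuples.
    handlers: maps node -> function(time, payload) returning a list of new
    (time, node, payload) events to send to other nodes.
    Names and shapes here are illustrative placeholders.
    """
    pq = list(initial_events)
    heapq.heapify(pq)                             # min-heap keyed on timestamp
    log = []
    while pq:
        time, node, payload = heapq.heappop(pq)   # earliest pending event first
        log.append((time, node))
        for ev in handlers[node](time, payload):  # processing may create new events
            heapq.heappush(pq, ev)
    return log
```

Parallelizing this loop is the hard part: Time Warp processes events optimistically and rolls back on timestamp violations, while Chandy-Misra-Bryant blocks conservatively until it is safe to proceed.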
To handle irregular programs, raise the level of abstraction and find commonalities between algorithms ("Program = Algorithm + Data Structure"). Many irregular algorithms are organized around unordered worklists of active nodes; processing an active node involves accessing its neighborhood in the data structure.
Amorphous data parallelism: active nodes with non-overlapping neighborhoods can be processed in parallel; ordered algorithms must additionally respect ordering constraints.
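One way to picture the non-overlapping-neighborhoods rule is a greedy scheduler that packs active elements into conflict-free rounds; the elements within one round could safely run in parallel. This is a toy sketch, not the Galois runtime's actual mechanism, and `neighborhood` is an assumed callback returning the set of graph elements an item touches.

```python
def conflict_free_rounds(active, neighborhood):
    """Greedily pack active elements whose neighborhoods don't overlap
    into rounds; elements within a round are mutually conflict-free.

    active: list of work items; neighborhood(x): set of graph elements
    that item x reads or writes. Both names are illustrative.
    """
    pending = list(active)
    rounds = []
    while pending:
        locked, this_round, deferred = set(), [], []
        for item in pending:
            hood = neighborhood(item)
            if locked & hood:            # overlaps something already scheduled
                deferred.append(item)    # conflict: retry in a later round
            else:
                locked |= hood           # "lock" the whole neighborhood
                this_round.append(item)
        rounds.append(this_round)
        pending = deferred
    return rounds
```

Real systems detect these conflicts dynamically (e.g., by acquiring abstract locks on neighborhood elements) because the neighborhoods are not known until the operator runs.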
Application | Input Size | Iterations | Memory Footprint | Instructions/Iteration | Memory Acc/Iteration | L1d Miss Rate
AC  | 2M points | 1,999,999 | 1,039 MB | 67,920 | 13,832 | 7.1%
BH  | 220K bodies | 220,000 | 41 MB | 199,167 | 49,789 | 14.1%
DMR | 550K triangles | 1,297,380 | 2,545 MB | 72,747 | 22,684 | 31.7%
DT  | 80K points | 80,000 | 927 MB | 262,952 | 91,547 | 41.1%
SP  | 500 variables, 2100 clauses | 4,492,403 | 8 MB | 177,885 | 42,016 | 2.1%
Measuring parallelism: we want a measure of available parallelism that is independent of the number of cores, architectures, etc. Following "How Much Parallelism is There in Irregular Applications?", available parallelism is the number of independent computations executed in each step.
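The step-by-step measure just defined can be sketched as an idealized profiler: in each computation step, execute a maximal set of pending items with non-overlapping neighborhoods and record how many ran. The parameter names and callbacks (`neighborhood`, `execute`) are illustrative assumptions, not the paper's tool.

```python
def parallelism_profile(worklist, neighborhood, execute):
    """Idealized parallelism profile: each step runs a maximal set of
    pending items with non-overlapping neighborhoods and records its size.

    neighborhood(item): set of graph elements the item touches;
    execute(item): returns newly created work items.
    All three parameter names are illustrative placeholders.
    """
    profile = []
    while worklist:
        locked, step, deferred = set(), [], []
        for item in worklist:
            hood = neighborhood(item)
            if locked & hood:
                deferred.append(item)        # conflicts wait for a later step
            else:
                locked |= hood
                step.append(item)
        profile.append(len(step))            # available parallelism at this step
        created = [n for item in step for n in execute(item)]
        worklist = deferred + created        # deferred plus newly created work
    return profile
```

Because the measure counts independent computations per step rather than speedup, it is insensitive to core counts, cache sizes, and other machine details.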
DMR: initially, many triangles in the mesh are badly shaped; the shape of the parallelism profile reflects the increasing size of the mesh.
[Plot: available parallelism (up to ~16,000) vs. computation step]
Agglomerative clustering (AC): builds a dendrogram that clusters points in a bottom-up manner; the parallelism profile reflects the structure of the binary tree.
[Plot: available parallelism (up to ~300,000) vs. computation step]
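The bottom-up construction can be sketched in miniature: repeatedly merge the two closest clusters until one remains, with each merge adding an internal node of the dendrogram. This is an O(n³) toy over 1-D values using midpoints of child values as pseudo-centroids, not the kd-tree-accelerated benchmark kernel.

```python
import itertools

def agglomerate(points):
    """Bottom-up clustering sketch: repeatedly merge the two closest
    clusters until one remains, yielding a binary dendrogram.

    Clusters are nested pairs (tuples); leaves are 1-D numbers. The
    "centroid" is the midpoint of the child centroids -- a simplification.
    """
    def centroid(c):
        if isinstance(c, (int, float)):
            return c                      # leaf: the point itself
        return (centroid(c[0]) + centroid(c[1])) / 2

    clusters = list(points)
    while len(clusters) > 1:
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda p: abs(centroid(p[0]) - centroid(p[1])))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append((a, b))           # internal node of the dendrogram
    return clusters[0]
```

Early merges of nearby leaves are independent of one another, which is why the available parallelism mirrors the width of the binary tree at each level.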
Delaunay triangulation (DT): builds a mesh from a set of input points, using the same mesh data structure as DMR. There is little parallelism at the beginning because the mesh starts very small.
[Plot: available parallelism (up to ~1,200) vs. computation step]
Survey propagation (SP): a SAT heuristic that operates on a bipartite graph of variables and clauses. Processing a node involves updating the guess for a variable's truth value based on its clauses. Parallelism is roughly uniform, as the graph doesn't change dramatically until variables are assigned truth values and removed.
[Plot: available parallelism (up to ~350) vs. computation step]
Speedups were measured with the Galois system ("Optimistic Parallelism Requires Abstractions").
[Plots: speedup vs. # of cores for each benchmark]
The benchmarks achieve good parallel efficiency up to 16 cores; scaling is limited by the available parallelism and by overheads in the runtime system.
Conclusions: Lonestar is a suite of programs that exhibit amorphous data parallelism, drawn from important domains. The benchmarks perform a substantial amount of work and demonstrate that this parallelism can be exploited both in theory and in practice.
http://www.iss.ices.utexas.edu/lonestar