Massive Streaming Data Analytics: A Case Study with Clustering Coefficients
David Ediger, Karl Jiang, Jason Riedy and David A. Bader
Overview
Motivation
A Framework for Massive Streaming Data Analytics
STINGER
David Ediger, MTAAP 2010, Atlanta, GA
Even with parallelism, current
[Figure: streaming analytics framework. Labels: Insertions / Deletions; Pre-process, Sort, Reconcile; “Age off” old vertices; Alter graph; STINGER graph; Affected vertices; Update metrics; Change detection]
[Figure: vertices u and v, with increments labeled +2, +2, +1, +1]
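The increments in this figure are consistent with an edge (u, v) being inserted between two vertices that share two neighbors, assuming the labels count new triangles per vertex: each shared neighbor closes one new triangle, so u and v each gain two and each shared neighbor gains one. The following is a minimal sketch under that assumption. The names `adj`, `triangles`, `insert_edge`, and `clustering_coefficient` are hypothetical and are not taken from the slides or from STINGER.

```python
from collections import defaultdict

adj = defaultdict(set)        # vertex -> set of neighbors (undirected graph)
triangles = defaultdict(int)  # vertex -> number of triangles through it


def insert_edge(u, v):
    """Insert undirected edge (u, v) and update triangle counts incrementally."""
    if u == v or v in adj[u]:
        return  # ignore self-loops and duplicate edges
    common = adj[u] & adj[v]  # each shared neighbor closes one new triangle
    adj[u].add(v)
    adj[v].add(u)
    triangles[u] += len(common)
    triangles[v] += len(common)
    for w in common:
        triangles[w] += 1


def clustering_coefficient(v):
    """Local clustering coefficient: triangles through v over possible neighbor pairs."""
    d = len(adj[v])
    return 0.0 if d < 2 else 2 * triangles[v] / (d * (d - 1))


# u and v both neighbor a and b; inserting (u, v) then closes two triangles.
for x, y in [("u", "a"), ("u", "b"), ("v", "a"), ("v", "b")]:
    insert_edge(x, y)
insert_edge("u", "v")
```

Only the two endpoints and their shared neighbors are touched, so the cost of an update depends on the local neighborhood rather than on the size of the whole graph.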
Bit Array (32 bits; bits 10 and 23 set):
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Bloom Filter (12 bits; bits 2, 8, 10 and 11 set):
0 0 1 0 0 0 0 0 1 0 1 1
HashA(10) = 2, HashB(10) = 10
HashA(23) = 11, HashB(23) = 8
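The slide contrasts a plain bit array, with one bit per possible value, against a Bloom filter, where each value sets the bits chosen by two hash functions in a much shorter array. Below is a minimal sketch of a two-hash, 12-bit filter like the one pictured. The slide does not define HashA and HashB, so the salted SHA-256 hashes used here are an assumption and will not reproduce the slide's bit positions. The class name `BloomFilter` and its methods are likewise hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter with a small family of hash functions (sketch)."""

    def __init__(self, num_bits=12, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Assumed hash family: SHA-256 salted with the hash index.
        # These are NOT the slide's HashA/HashB, whose definitions are not given.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add(10)
bf.add(23)
```

A query that returns False proves the value was never added, while a True answer still has to be confirmed against the real data, which is the trade-off that makes such a filter suitable for an approximate algorithm.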
Image Source: cray.com
Image Source: intel.com
Algorithm   B = 1   B = 1000   B = 4000
Exact       90      25,100     50,100
Approx.     60      83,700     193,300
32 of 64P Cray XMT, 16M vertices, 134M edges