Stratosphere Demo: Triangle Enumeration & TPC-H Query 3




SLIDE 1

StratoSphere

Above the Clouds

Stratosphere Demo

Triangle Enumeration & TPC-H Query 3

Thomas Bodner, Matthias Ringwald
TU Berlin

SLIDE 2

Stratosphere: Information Management above the Clouds

Input
■ Set of undirected edges
■ Friend-of-a-Friend RDF data from the Billion Triple Challenge 2011

Goal
■ Find triples of edges that form a triangle
■ Used as preprocessing to find highly connected subgraphs

Triangle Enumeration
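
To make the goal concrete, here is a minimal single-machine sketch of triangle enumeration over a set of undirected edges. It is not the demo's code: the function name `enumerate_triangles` and the sample edge list are assumptions made for illustration, and it reports vertex triples, whereas the demo output lists edge IDs.

```python
from collections import defaultdict


def enumerate_triangles(edges):
    """Return each triangle once as a vertex triple (u, v, w) with u < v < w."""
    # Build a symmetric adjacency map; self-loops cannot be part of a triangle.
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)

    triangles = []
    for u in sorted(adjacency):
        for v in sorted(adjacency[u]):
            if v <= u:
                continue  # visit every undirected edge only once, as (u, v) with u < v
            # A common neighbour w of u and v closes the triangle u-v-w.
            for w in sorted(adjacency[u] & adjacency[v]):
                if w > v:  # enforce u < v < w so each triangle is emitted once
                    triangles.append((u, v, w))
    return triangles


# Hypothetical sample graph: two triangles sharing the edge 2-3, plus a dangling edge.
sample_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 2), (5, 1)]
print(enumerate_triangles(sample_edges))  # [(1, 2, 3), (2, 3, 4)]
```

Ordering the vertices (u < v < w) is what keeps each triangle from being reported several times, once per rotation of its edges.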

SLIDE 3

Triangle Enumeration – Step 1 & 2

SLIDE 4

Triangle Enumeration – Step 3 & 4

SLIDE 5

Triangle Enumeration on PACTs
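
The plan diagram for this slide is not part of the transcript, so the following is only a sketch of one common way to phrase triangle enumeration with Map, Reduce, and Match style second-order functions. It may differ from the plan shown in the demo. The helpers `pact_map`, `pact_reduce`, and `pact_match` are hypothetical plain-Python stand-ins and are not the Stratosphere API.

```python
from collections import defaultdict
from itertools import combinations


# Hypothetical stand-ins for second-order functions (NOT the Stratosphere API).
def pact_map(records, fn):
    """Call fn on every record independently and collect all outputs."""
    return [out for rec in records for out in fn(rec)]


def pact_reduce(records, key, fn):
    """Group records by key and call fn once per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return [out for k in sorted(groups) for out in fn(k, groups[k])]


def pact_match(left, right, left_key, right_key, fn):
    """Equi-join: call fn on every pair of records with equal keys."""
    index = defaultdict(list)
    for rec in right:
        index[right_key(rec)].append(rec)
    return [out for l in left for r in index.get(left_key(l), []) for out in fn(l, r)]


# First-order user functions for the sketch.
def order_edge(edge):
    """Map: orient each edge from the smaller to the larger vertex, drop self-loops."""
    u, v = edge
    return [(min(u, v), max(u, v))] if u != v else []


def build_triads(vertex, group):
    """Reduce: pair up the larger neighbours of a vertex into open triads."""
    others = sorted({v for _, v in group})
    return [(vertex, b, c) for b, c in combinations(others, 2)]


def close_triad(triad, edge):
    """Match: a triad whose missing edge exists is a triangle."""
    return [triad]


def triangles_pact_style(edges):
    ordered = pact_map(edges, order_edge)
    triads = pact_reduce(ordered, key=lambda e: e[0], fn=build_triads)
    return pact_match(triads, sorted(set(ordered)),
                      left_key=lambda t: (t[1], t[2]),  # the edge that would close the triad
                      right_key=lambda e: e,
                      fn=close_triad)


print(triangles_pact_style([(1, 2), (2, 3), (3, 1), (3, 4), (4, 2), (5, 1)]))
```

In this formulation the Reduce step groups edges by their smaller endpoint, so each candidate triad is generated exactly once before the Match step checks for the closing edge.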

SLIDE 6

CODE + DEMO

SLIDE 7

TPC-H Query 3

SELECT l_orderkey, o_shippriority, sum(l_extendedprice) as revenue
FROM orders, lineitem
WHERE l_orderkey = o_orderkey
  AND o_custkey IN [X]
  AND o_orderdate > [Y]
GROUP BY l_orderkey, o_shippriority

■ OLAP-style query
■ 2 tables: Orders, Lineitem
■ Join and aggregation
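
The query above combines a selection on orders, an equi-join with lineitem, and a grouped sum. The following minimal in-memory sketch mirrors those three steps. The function name `simplified_q3`, the sample rows, and the concrete values standing in for [X] and [Y] are assumptions made for illustration. The final sort is added only to make the output deterministic, since the query itself has no ORDER BY.

```python
from collections import defaultdict
from datetime import date


def simplified_q3(orders, lineitems, custkeys, min_orderdate):
    """Return sorted (l_orderkey, o_shippriority, revenue) rows."""
    # Selection: keep orders matching both predicates, indexed by the join key.
    # o_orderkey is assumed unique, as it is the primary key of the orders table.
    priority_by_order = {
        o["o_orderkey"]: o["o_shippriority"]
        for o in orders
        if o["o_custkey"] in custkeys and o["o_orderdate"] > min_orderdate
    }

    # Join and aggregation: probe the index per lineitem and sum per group.
    revenue = defaultdict(float)
    for item in lineitems:
        key = item["l_orderkey"]
        if key in priority_by_order:
            revenue[(key, priority_by_order[key])] += item["l_extendedprice"]

    return sorted((k, p, r) for (k, p), r in revenue.items())


# Hypothetical sample data, not taken from the demo.
orders = [
    {"o_orderkey": 1, "o_custkey": 10, "o_orderdate": date(1995, 3, 20), "o_shippriority": 0},
    {"o_orderkey": 2, "o_custkey": 11, "o_orderdate": date(1995, 3, 1), "o_shippriority": 0},
    {"o_orderkey": 3, "o_custkey": 99, "o_orderdate": date(1996, 1, 1), "o_shippriority": 1},
    {"o_orderkey": 4, "o_custkey": 11, "o_orderdate": date(1995, 6, 1), "o_shippriority": 1},
]
lineitems = [
    {"l_orderkey": 1, "l_extendedprice": 100.0},
    {"l_orderkey": 1, "l_extendedprice": 50.0},
    {"l_orderkey": 2, "l_extendedprice": 70.0},
    {"l_orderkey": 3, "l_extendedprice": 10.0},
    {"l_orderkey": 4, "l_extendedprice": 25.0},
]

result = simplified_q3(orders, lineitems, custkeys={10, 11}, min_orderdate=date(1995, 3, 15))
print(result)  # [(1, 0, 150.0), (4, 1, 25.0)]
```

Order 2 is dropped by the date predicate and order 3 by the customer predicate, so only orders 1 and 4 contribute revenue.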

SLIDE 8

CODE + DEMO

SLIDE 9

■ www.stratosphere.eu provides an open-source release and additional examples:
□ WordCount
□ One iteration of K-Means
□ Pair-wise shortest path computation in graphs
□ Weblog file analysis

■ "MapReduce and PACT - Comparing Data Parallel Programming Models", A. Alexandrov et al., BTW 2011
□ Compares MapReduce and PACT implementations of additional examples

Additional Information

SLIDE 10

ENUMERATING TRIANGLES FOR SOCIAL NETWORK MINING

Demo Screenshots

SLIDE 11

Enumerating Triangles – Job Preview

SLIDE 12

Enumerating Triangles – Optimized Plan

SLIDE 13

Enumerating Triangles – Nephele Schedule in Execution

SLIDE 14

Enumerating Triangles – Result

Edge 1 ID|Edge 2 ID|Edge 3 ID|
1669672241|957516469|13113271|
1174119379|957443913|195638598|
1805945648|956415427|448134175|
1950197714|956415427|448134175|
2016831532|956415427|448134175|
1669297417|956305207|315643403|
1039976411|956305207|315643403|
1467833050|953504954|878633592|
1672783901|950586510|524583308|
1840098659|949391994|562197935|
1146307869|947061533|121415420|
1564227243|945488147|536289824|
1892548695|945488147|536289824|

SLIDE 15

TPC-H QUERY 3 (SIMPLIFIED)

Demo Screenshots

SLIDE 16

TPC-H Query 3 – Plan Preview

SLIDE 17

TPC-H Query 3 – Optimized Plan

SLIDE 18

TPC-H Query 3 – Nephele Schedule in Execution

SLIDE 19

TPC-H Query 3 – Result

Order Nr.|Priority (0 = Normal)|Total Sales Volume|
2948|0|5896|
6691|0|1887|
6691|1|26837|
9507|0|61605|
9665|0|28995|
12641|0|75846|
17282|0|34564|
24964|0|124820|
27330|0|191310|
29858|0|22865|
30533|0|183198|
30561|0|91683|
41255|0|206275|