Graph Cube: On Warehousing and OLAP Multidimensional Networks


SLIDE 1

Graph Cube: On Warehousing and OLAP Multidimensional Networks

Peixiang Zhao†, Xiaolei Li‡, Dong Xin§, Jiawei Han†

†Department of Computer Science, UIUC   ‡Groupon Inc.   §Google Corporation
†pzhao4@illinois.edu, hanj@cs.illinois.edu   ‡me@xiaolei.org   §dongxin@gmail.com

June 16th, 2011

SIGMOD 2011 Athens, Greece 1 / 24

SLIDE 2

Outline

1. Introduction
2. The Graph Cube Model
3. OLAP on Graph Cube
     • Cuboid Query
     • Crossboid Query
4. Implementing Graph Cube
5. Experiment
6. Conclusion

SLIDE 3

Introduction

Recent years have seen an astounding growth of networks in a wide spectrum of application domains:
  • Communication networks
  • Social networks
  • Biological networks
  • The Web

Multidimensional networks:
  1. An underlying graph structure comprising entities and relationships
  2. Multidimensional attributes specified for and associated with the entities of the network

There exist considerable technology gaps in managing, querying and summarizing multidimensional networks effectively.

SLIDE 4

A Sample Multidimensional Network

(a) Graph: [figure not reproduced; a graph over the vertices 1 to 10]

(b) Vertex Attribute Table

ID   Gender   Location   Profession   Income
1    Male     CA         Teacher      $70,000
2    Female   WA         Teacher      $65,000
3    Female   CA         Engineer     $80,000
4    Female   NY         Teacher      $90,000
5    Male     IL         Lawyer       $80,000
6    Female   WA         Teacher      $90,000
7    Male     NY         Lawyer       $100,000
8    Male     IL         Engineer     $75,000
9    Female   CA         Lawyer       $120,000
10   Male     IL         Engineer     $95,000

Figure: A Multidimensional Network Comprising a Graph Structure and a Multidimensional Vertex Attribute Table

SLIDE 5

Introduction

Motivation: Can we extend decision-support facilities to multidimensional networks?

Data warehouses and OLAP are advantageous in the multidimensional network scenario:
  • Summarizing massive networks into different levels of granularity enables more effective analysis and exploration
  • Business intelligence: on Facebook and Twitter, advertisers and marketers exploit social networks within different multidimensional spaces to better promote their products via social targeting or viral marketing

However, in multidimensional networks much of the value and interest lies in the network itself!

Simple group-bys over numeric values in traditional data warehouses are no longer insightful and are of limited use, because they ignore the structural information of the network.

SLIDE 6

Network Aggregation vs. Traditional Group-by

(a) Aggregate Network: [figure not reproduced; vertices Male and Female, weighted by the counts in (b)]

(b) Aggregate Table

Gender   COUNT(*)
Male     5
Female   5

Figure: Multidimensional Network Aggregation vs. Traditional RDB Aggregation (Group by Gender)

(a) Aggregate Network: [figure not reproduced; vertices (Female, CA), (Male, IL), (Male, CA), (Female, WA), (Female, NY) and (Male, NY), weighted by the counts in (b)]

(b) Aggregate Table

Gender   Location   COUNT(*)
Male     CA         1
Female   CA         2
Female   WA         2
Male     IL         3
Male     NY         1
Female   NY         1

Figure: Multidimensional Network Aggregation vs. Traditional RDB Aggregation (Group by Gender and Location)
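For contrast, the traditional group-by on the vertex attribute table from Slide 4 fits in a few lines of Python. `COLUMNS`, `ROWS` and `group_by_count` are illustrative names, not from the paper. The sketch reproduces both aggregate tables above, and it never consults the edges, which is exactly the structural information that the aggregate networks retain.

```python
from collections import Counter

# Vertex attribute table from Slide 4 (Income omitted; it is not grouped on here).
COLUMNS = ("Gender", "Location", "Profession")
ROWS = {
    1: ("Male", "CA", "Teacher"),
    2: ("Female", "WA", "Teacher"),
    3: ("Female", "CA", "Engineer"),
    4: ("Female", "NY", "Teacher"),
    5: ("Male", "IL", "Lawyer"),
    6: ("Female", "WA", "Teacher"),
    7: ("Male", "NY", "Lawyer"),
    8: ("Male", "IL", "Engineer"),
    9: ("Female", "CA", "Lawyer"),
    10: ("Male", "IL", "Engineer"),
}


def group_by_count(rows, dims):
    """Equivalent of SELECT dims, COUNT(*) ... GROUP BY dims; no edge is ever read."""
    positions = [COLUMNS.index(d) for d in dims]
    return Counter(tuple(row[p] for p in positions) for row in rows.values())


print(group_by_count(ROWS, ["Gender"]))
print(group_by_count(ROWS, ["Gender", "Location"]))
```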

SLIDE 7

Introduction

Graph Cube
  • A multidimensional network can be summarized into aggregate networks at coarser levels of granularity within different multidimensional spaces, via
      • Vertex coalescence
      • Structure summarization
  • Different query models and OLAP solutions are proposed for multidimensional networks:
      • Cuboid queries
      • Crossboid queries
  • Efficient implementation is based on a combination of
      • Well-studied data cube implementation techniques
      • Special characteristics of multidimensional networks
  • The first work to systematically address warehousing and OLAP issues on large multidimensional networks

SLIDE 8

The Graph Cube Model

Definition (Multidimensional Network). A multidimensional network $N$ is a graph denoted as $N = (V, E, A)$, where $V$ is a set of vertices, $E \subseteq V \times V$ is a set of edges, and $A = \{A_1, A_2, \ldots, A_n\}$ is a set of $n$ vertex-specific attributes; i.e., for every $u \in V$ there is a tuple $A(u) = (A_1(u), A_2(u), \ldots, A_n(u))$, where $A_i(u)$ is the value of $u$ on the $i$-th attribute, $1 \le i \le n$. $A$ is called the dimensions of the network $N$.

  • Some (or all) dimensions $A_i$ can be $*$ (ALL), representing a super-aggregation along $A_i$.
  • Given a set of $n$ dimensions of a network, there exist $2^n$ multidimensional spaces (aggregations).
  • The measure within each possible space is no longer a simple numeric value, but an aggregate network.
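The $2^n$ count follows because each dimension is either kept or replaced by $*$ (ALL). A short Python sketch, with `cuboids` as a hypothetical helper name, enumerates the aggregations level by level for the three categorical dimensions of the Slide 4 example; the empty tuple stands for the apex, where every dimension is $*$.

```python
from itertools import combinations


def cuboids(dimensions):
    """Enumerate all 2^n aggregations (cuboids) of n dimensions.

    A dimension missing from a cuboid is treated as * (ALL), i.e. aggregated away.
    Cuboids are listed level by level, from the apex () to the base (all dimensions).
    """
    return [combo
            for level in range(len(dimensions) + 1)
            for combo in combinations(dimensions, level)]


dims = ("Gender", "Location", "Profession")
for c in cuboids(dims):
    print(c if c else "Apex")
```

The eight printed cuboids are the nodes of the graph cube lattice for this example.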

SLIDE 9

The Graph Cube Model

Definition (Graph Cube). Given a multidimensional network $N = (V, E, A)$, the graph cube is obtained by restructuring $N$ in all possible aggregations of $A$. For each possible aggregation $A'$ of $A$, the grouping measure is an aggregate network $G'$ w.r.t. $A'$.

Figure: The Graph Cube Lattice (levels from coarsest to finest)
  Apex
  (Gender)   (Location)   (Profession)
  (Gender, Location)   (Gender, Profession)   (Location, Profession)
  Base
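The two operations behind a cuboid's aggregate network, vertex coalescence and structure summarization, can be sketched as follows. This is one plausible reading under stated assumptions, not the paper's implementation: edges are undirected; a group vertex is weighted by the number of original vertices it absorbs; an edge between two groups is weighted by the number of original edges between their members (a self-loop when both endpoints fall in the same group). `aggregate_network` is a hypothetical name. The vertex attributes come from Slide 4, but the edge list is hypothetical because the graph itself is not recoverable from this text.

```python
from collections import Counter


def aggregate_network(attrs, edges, dims):
    """Return (vertex_weights, edge_weights) of the aggregate network for `dims`.

    attrs: vertex id -> {dimension: value}
    edges: iterable of undirected (u, v) pairs
    dims:  dimensions kept by the cuboid; all others are * (ALL)
    """
    def key(v):
        return tuple(attrs[v][d] for d in dims)

    # Vertex coalescence: each vertex joins the group given by its dimension values.
    vertex_weights = Counter(key(v) for v in attrs)
    # Structure summarization: each edge adds 1 to the edge between its endpoints'
    # groups; sorting the two group keys makes the pair order-independent.
    edge_weights = Counter(tuple(sorted((key(u), key(v)))) for u, v in edges)
    return vertex_weights, edge_weights


# Vertices 1, 2, 3 and 5 with their Slide 4 attributes; the edges are hypothetical.
ATTRS = {
    1: {"Gender": "Male", "Location": "CA"},
    2: {"Gender": "Female", "Location": "WA"},
    3: {"Gender": "Female", "Location": "CA"},
    5: {"Gender": "Male", "Location": "IL"},
}
EDGES = [(1, 2), (1, 3), (2, 3), (3, 5)]

vertex_weights, edge_weights = aggregate_network(ATTRS, EDGES, ["Gender"])
print(vertex_weights)
print(edge_weights)
```

Passing an empty `dims` yields the apex: a single vertex weighted by $|V|$ with a self-loop weighted by $|E|$.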

SLIDE 10

OLAP on Graph Cubes

Cuboid Query: returns as output the aggregate network corresponding to a specific aggregation of the dimensions of the multidimensional network.

Example queries: What is the network structure between the various genders? What is the network structure between the various gender and location combinations?

[Figures not reproduced: the aggregate networks for (Gender) and (Gender, Location), as on Slide 6]

SLIDE 11

OLAP on Graph Cubes

A cuboid query stays within a single multidimensional space and follows the traditional OLAP model. A crossboid query crosses multiple multidimensional spaces of the network, i.e., more than one cuboid is involved in a query.

Example queries: What is the network structure between the user with ID = 3 and the various locations? What is the network structure between users grouped by gender vs. users grouped by location?

[Figures not reproduced: a crossboid network linking the user with ID 3 to the locations CA, IL, WA and NY, and a crossboid network linking the gender groups Male and Female to the locations CA, IL, WA and NY]
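A crossboid query can be sketched much like a cuboid query, except that the two endpoints of an edge are grouped by different cuboids. This is an assumed reading of the definition, not the paper's algorithm: every undirected edge $(u, v)$ contributes once to the pair (group of $u$ in the first cuboid, group of $v$ in the second) and once to the reverse pairing. `crossboid` is a hypothetical name and the edges are made up; grouping one side by the vertex ID answers the "user with ID = 3" question.

```python
from collections import Counter


def crossboid(attrs, edges, dims_a, dims_b):
    """Aggregate network straddling two cuboids (one possible reading).

    Left vertices are groups of cuboid `dims_a`, right vertices are groups of
    cuboid `dims_b`. Each undirected edge (u, v) is counted in both orientations.
    """
    def key(v, dims):
        return tuple(attrs[v][d] for d in dims)

    cross = Counter()
    for u, v in edges:
        cross[(key(u, dims_a), key(v, dims_b))] += 1
        cross[(key(v, dims_a), key(u, dims_b))] += 1
    return cross


# Slide 4 attributes for vertices 1, 2, 3 and 5; the edges are hypothetical.
ATTRS = {
    1: {"ID": 1, "Gender": "Male", "Location": "CA"},
    2: {"ID": 2, "Gender": "Female", "Location": "WA"},
    3: {"ID": 3, "Gender": "Female", "Location": "CA"},
    5: {"ID": 5, "Gender": "Male", "Location": "IL"},
}
EDGES = [(1, 2), (1, 3), (2, 3), (3, 5)]

# Users grouped by gender vs. users grouped by location.
gender_x_location = crossboid(ATTRS, EDGES, ["Gender"], ["Location"])
# The user with ID = 3 vs. the locations of its neighbors.
user3 = {loc: w
         for (left, loc), w in crossboid(ATTRS, EDGES, ["ID"], ["Location"]).items()
         if left == (3,)}
print(gender_x_location)
print(user3)
```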

SLIDE 12

Cuboid Queries vs. Crossboid Queries

(a) Traditional Cuboid Queries: [figure not reproduced; each query is answered within a single cuboid of the graph cube lattice]

(b) Crossboid Queries Straddling Multiple Cuboids: [figure not reproduced; example queries "What is the network structure between users grouped by gender and users grouped by location?" and "What is the network structure between users and the locations?", drawn across the cuboids (Gender), (Location) and (Gender, Location, Profession)]

SLIDE 13

Graph Cube Implementation

Objective: compute the aggregate networks of the cuboids grouping on all possible dimension combinations of a multidimensional network.

  1. Full materialization: best query response time, worst space cost
  2. No materialization: best space cost, worst query response time
  3. Partial materialization: a small portion of the cuboids is materialized to balance the tradeoff between query response time and cube resource requirements
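Under partial or no materialization, a query on a cuboid that is not stored has to be computed from something that is. A standard data cube property (the slides do not spell out the evaluation procedure) is that a coarser cuboid can be rolled up from any finer one whose dimensions are a superset, with the base network as the fallback. The sketch assumes the weighted representation used here for illustration (group tuple to vertex weight, group pair to edge weight); `roll_up` is a hypothetical name. The vertex weights in the example are the (Gender, Location) counts from Slide 6, while the edge weights are made up.

```python
from collections import Counter


def roll_up(vertex_weights, edge_weights, source_dims, target_dims):
    """Derive a coarser cuboid's aggregate network from a finer, materialized one.

    vertex_weights: group tuple -> number of original vertices in the group
    edge_weights:   (group, group) -> number of original edges between them
    target_dims must be a subset of source_dims; the original network is not read.
    """
    if not set(target_dims) <= set(source_dims):
        raise ValueError("target_dims must be a subset of source_dims")
    positions = [source_dims.index(d) for d in target_dims]

    def project(group):
        # Drop the dimensions that the coarser cuboid aggregates to * (ALL).
        return tuple(group[p] for p in positions)

    vertices = Counter()
    for group, w in vertex_weights.items():
        vertices[project(group)] += w
    edges = Counter()
    for (g1, g2), w in edge_weights.items():
        edges[tuple(sorted((project(g1), project(g2))))] += w
    return vertices, edges


# (Gender, Location) vertex weights from Slide 6; the edge weights are hypothetical.
SOURCE_DIMS = ("Gender", "Location")
GL_VERTICES = {
    ("Male", "CA"): 1, ("Female", "CA"): 2, ("Female", "WA"): 2,
    ("Male", "IL"): 3, ("Male", "NY"): 1, ("Female", "NY"): 1,
}
GL_EDGES = {
    (("Female", "CA"), ("Male", "IL")): 2,
    (("Female", "NY"), ("Female", "WA")): 1,
}

gender_vertices, gender_edges = roll_up(GL_VERTICES, GL_EDGES, SOURCE_DIMS, ("Gender",))
print(gender_vertices)
print(gender_edges)
```

The rolled-up vertex weights agree with the (Gender) table on Slide 6, which is why a few materialized cuboids can serve many queries.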

SLIDE 14

Graph Cube Implementation: Partial Materialization

Problem: select a set $S$ of $k$ cuboids in the graph cube for materialization such that the average time taken to evaluate queries is minimized.

The partial materialization problem is NP-complete (by a reduction from set cover).

Greedy Algorithm: select $k$ cuboids with the highest size-reduction benefit.

Theorem. Let $B_{greedy}$ be the benefit of the $k$ cuboids chosen by the greedy algorithm and let $B_{opt}$ be the benefit of any optimal set of $k$ cuboids. Then $B_{greedy} \ge (1 - 1/e) \times B_{opt}$, and this bound is tight.
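A sketch of the greedy selection, written in the style of the classic data cube view selection of Harinarayan, Rajaraman and Ullman, where the same $(1 - 1/e)$ guarantee was first shown. The following are assumptions, not taken from the slides: the base cuboid is always stored and does not count toward $k$; a query on a cuboid costs the size of the smallest stored cuboid it can be rolled up from; and the benefit of a candidate is the total cost reduction over every cuboid it can answer. `greedy_select` is a hypothetical name and the sizes are invented.

```python
from itertools import combinations


def greedy_select(dims, size, k):
    """Greedy choice of k cuboids to materialize (HRU-style sketch).

    dims: all dimensions; the base cuboid (all of dims) is assumed always stored
          and is not counted in k.
    size: frozenset of dims -> estimated size of that cuboid's aggregate network.
    Assumed cost model: a query on cuboid w is answered from the smallest stored
    cuboid s with w <= s, at a cost equal to size[s].
    """
    lattice = [frozenset(c) for level in range(len(dims) + 1)
               for c in combinations(dims, level)]
    stored = {frozenset(dims)}

    def cost(w):
        return min(size[s] for s in stored if w <= s)

    def benefit(c):
        # Total cost reduction over every cuboid that could be answered from c.
        return sum(max(0, cost(w) - size[c]) for w in lattice if w <= c)

    picks = []
    for _ in range(k):
        candidates = [c for c in lattice if c not in stored]
        if not candidates:
            break
        best = max(candidates, key=benefit)
        stored.add(best)
        picks.append(best)
    return picks


DIMS = ("Gender", "Location", "Profession")
# Hypothetical cuboid sizes, for illustration only.
SIZES = {
    frozenset(): 1,
    frozenset({"Gender"}): 2,
    frozenset({"Location"}): 4,
    frozenset({"Profession"}): 3,
    frozenset({"Gender", "Location"}): 8,
    frozenset({"Gender", "Profession"}): 6,
    frozenset({"Location", "Profession"}): 12,
    frozenset(DIMS): 24,
}

for cuboid in greedy_select(DIMS, SIZES, 2):
    print(sorted(cuboid))
```

With these sizes the sketch first picks (Gender, Profession), which is small and can answer four cuboids, and then (Gender, Location).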

MinLevel Algorithm: materialize the cuboids $c$ with $\dim(c) = l_0$, where $l_0$ indicates the level in the cube lattice at which we start materializing cuboids.
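Read literally, MinLevel materializes one whole level of the lattice, taking $\dim(c)$ as the number of dimensions a cuboid keeps. The sketch below follows that reading only; how the method reconciles the size of level $l_0$ with a budget of $k$ cuboids is not stated on the slide and is deliberately left out. `min_level_select` is a hypothetical name.

```python
from itertools import combinations


def min_level_select(dims, l0):
    """MinLevel sketch: materialize every cuboid c with dim(c) == l0.

    dim(c) is taken to be the number of dimensions c keeps, i.e. its level in
    the lattice (0 for the apex, len(dims) for the base).
    """
    if not 0 <= l0 <= len(dims):
        raise ValueError("l0 must lie between 0 and len(dims)")
    return [frozenset(c) for c in combinations(dims, l0)]


for cuboid in min_level_select(("Gender", "Location", "Profession"), 2):
    print(sorted(cuboid))
```

Unlike the Greedy algorithm, this rule needs no size estimates, which keeps the selection itself trivial.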

SLIDE 15

Experimental Evaluation

DBLP data set
  • A co-authorship graph with 28,702 authors as vertices and 66,832 coauthor relationships as edges
  • Three dimensions: name, area, productivity
      • area: DB, DM, AI, IR
      • productivity: Excellent, Good, Fair, Poor

IMDB data set
  • A movie rating network with 116,164 vertices and 5,452,350 edges
  • Seven dimensions: Title, Year, Length, Budget, Rating, MPAA and Type
      • MPAA: G, PG, PG-13, R, NC-17, NR
      • Type: action, animation, comedy, drama, documentary, romance, short

SLIDE 16

Effectiveness Evaluation

(c) (Area): [figure not reproduced; aggregate network with vertices DB (7,752 authors), DM (4,590), AI (11,329) and IR (5,031)]

(d) (Productivity): [figure not reproduced; aggregate network with vertices Poor (26,170 authors), Fair (2,165), Good (321) and Excellent (46)]

Figure: Cuboid Queries of the Graph Cube on DBLP Data Set

SLIDE 17

Effectiveness Evaluation

(a) (Area, Productivity): [figure not reproduced; aggregate network over the 16 (area, productivity) groups, with vertex weights (number of authors) as follows]

Area   Poor     Fair   Good   Excellent
DB     6,825    732    161    34
DM     4,209    331    43     7
AI     10,498   747    83     1
IR     4,638    355    34     4

Figure: Cuboid Queries of the Graph Cube on DBLP Data Set

SLIDE 18

Effectiveness Evaluation

(a) Area ⋈ Productivity: [figure not reproduced; crossboid network between the area vertices DB (7,752), DM (4,590), AI (11,329), IR (5,031) and the productivity vertices Poor (26,170), Fair (2,165), Good (321), Excellent (46)]

Figure: Crossboid Queries of the Graph Cube on DBLP Data Set

SLIDE 19

Effectiveness Evaluation

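(a) Area ⋈ Base ⋈ Productivity for “Hector Garcia-Molina”: [figure not reproduced; the author's base vertex (weight 1) is linked to the area vertices DB (97), DM (4), AI (3), IR (11) and to the productivity vertices Poor (52), Fair (33), Good (24), Excellent (6)]

(b) Area ⋈ Base ⋈ Productivity for “Philip S. Yu”: [figure not reproduced; the author's base vertex (weight 1) is linked to the area vertices DB (66), DM (71), AI (4), IR (13) and to the productivity vertices Poor (71), Fair (52), Good (12), Excellent (13)]

Figure: Crossboid Queries of the Graph Cube on DBLP Data Set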

SLIDE 20

Efficiency Evaluation

(a) Time vs. # Dimensions: [plot not reproduced; runtime (seconds) against number of dimensions (1 to 3) for Raw Table and Graph Cube]

(b) Time vs. # Edges: [plot not reproduced; runtime (seconds) against number of edges (×10K, 1 to 6) for Raw Table and Graph Cube]

Figure: Full Materialization of Graph Cube for DBLP Data Set

SLIDE 21

Efficiency Evaluation

(a) Time vs. # Dimensions: [plot not reproduced; runtime (seconds) against number of dimensions (1 to 6) for Graph Cube and Raw Table]

(b) Time vs. # Edges: [plot not reproduced; runtime (seconds) against number of edges (×1M, 1 to 5) for Graph Cube and Raw Table]

Figure: Full Materialization of Graph Cube for IMDB Data Set

SLIDE 22

Efficiency Evaluation

(a) Cuboid Queries: [plot not reproduced; runtime (seconds) against number of materialized cuboids (6 to 16) for Greedy and MinLevel]

(b) Crossboid Queries: [plot not reproduced; runtime (seconds) against number of materialized cuboids (6 to 16) for Greedy and MinLevel]

Figure: Average Query Response Time w.r.t. Different Partial Materialization Algorithms

SLIDE 23

Conclusion

1. This work seeks to enhance decision-support functionality on large multidimensional networks.
2. Graph Cube: a new data warehousing model designed specifically for efficient aggregation on multidimensional networks.
3. Different query models and OLAP solutions for Graph Cube are proposed and studied.
     • Crossboid queries break the boundary of the traditional OLAP model by straddling multiple cuboids of the Graph Cube.
4. The implementation of Graph Cube is discussed, and the experimental results demonstrate the power and efficacy of Graph Cube as, to the best of our knowledge, the first tool for warehousing and OLAP on large multidimensional networks.

SLIDE 24

Thank you
