Benchmarking Summarizability Processing in XML Warehouses with Complex Hierarchies

ACM Fifteenth International Workshop on Data Warehousing and OLAP (DOLAP 2012), colocated with ACM CIKM 2012, Maui, Hawaii, USA, November 2, 2012. Benchmarking Summarizability Processing in XML Warehouses with Complex Hierarchies. By Chantola


SLIDE 1

Benchmarking Summarizability Processing in XML Warehouses with Complex Hierarchies

ACM Fifteenth International Workshop on Data Warehousing and OLAP (DOLAP 2012)

Colocated with ACM CIKM 2012, Maui, Hawaii, USA, November 2, 2012

By Chantola KIT, Marouane HACHICHA, Jérôme DARMONT

SLIDE 2

Outline

  • Introduction
  • Background
  • Benchmark Specification
  • Experimental Demonstration
  • Conclusion and Future Work

SLIDE 3

Introduction

Decision making:

  • 1. Business Intelligence (BI) is known for complex analysis
  • OLAP is a notable BI tool for multidimensional analysis
  • 2. Data warehouses (DWs): collections of historical and current data
  • XML is widely used to represent complex hierarchical data

SLIDE 4

Introduction (Cont.)

  • Effectiveness of summarizability processing on complex hierarchies
  • Benchmarks are used to support performance evaluation
  • Existing XML data warehouse benchmark: XWeB
  • Limitation of XWeB: its complex hierarchies are not scalable

SLIDE 5

XML Data Example

Sales
  sale: sale#1
    part: Part#1
      type3: LARGE
      type2: PLATE
      type1: TIN
    customer: Customer#1
      nation: USA
      region: AMERICA
    supplier: Supplier#1
      nation: FRANCE
      region: EUROPE
    date: 25/06/1998
      day: 25
      month: 06
      year: 1998
    f_quantity: 100
    f_totalamount: 2,800

SLIDE 6

Non-Strict Hierarchies

[Diagram: sale#1 from the XML data example (f_quantity 100, f_totalamount 2,800), now with two suppliers. Supplier#1 has nations FRANCE (region EUROPE) and ALGERIA (region AFRICA); Supplier#2 has nation GERMANY (region EUROPE).]

  • Supplier#1 is located in Europe and Africa
  • Europe contains two suppliers: #1 and #2
  • Total quantity supplied by Europe is 200 (wrong: the quantity of 100 from sale#1 is counted once per supplier)
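The double counting described on this slide can be sketched in a few lines of Python. This is only an illustration: the flat tuple layout, the names `naive_rollup` and `distinct_rollup`, and the supplier-to-nation links (read from the bullets above) are assumptions, and neither function is the QBS or Pedersen approach.

```python
from collections import defaultdict

# sale#1 carries a quantity of 100 and references both suppliers.
# Supplier-to-nation links are an assumed reading of the slide's diagram.
SALE = {"id": "sale#1", "f_quantity": 100}
SUPPLIER_PATHS = [
    ("Supplier#1", "FRANCE", "EUROPE"),
    ("Supplier#1", "ALGERIA", "AFRICA"),
    ("Supplier#2", "GERMANY", "EUROPE"),
]


def naive_rollup(fact, paths):
    """Add the measure once per hierarchy path, which double counts."""
    totals = defaultdict(int)
    for _supplier, _nation, region in paths:
        totals[region] += fact["f_quantity"]
    return dict(totals)


def distinct_rollup(fact, paths):
    """Add the measure once per distinct region the fact reaches."""
    totals = defaultdict(int)
    for region in {region for _supplier, _nation, region in paths}:
        totals[region] += fact["f_quantity"]
    return dict(totals)


print(naive_rollup(SALE, SUPPLIER_PATHS))     # EUROPE is credited with 200
print(distinct_rollup(SALE, SUPPLIER_PATHS))  # EUROPE is credited with 100
```

Even with the distinct roll-up, the per-region totals (100 for EUROPE plus 100 for AFRICA) exceed the grand total of 100, which is one reason non-strict hierarchies need dedicated summarizability processing.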

SLIDE 7

Incomplete Hierarchies

[Diagram: sale#1 from the XML data example, except that Part#1 has only the levels type2 (PLATE) and type1 (TIN).]

  • Part#1 has no type3 (LARGE) level
  • Total quantity of PLATE or TIN part is 0 (wrong: sale#1 has a quantity of 100)
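A minimal sketch of why a missing level yields a total of 0, using Python's standard `xml.etree.ElementTree`. The XML layout (nested levels, a `name` attribute) and the function `quantity_by_type2` are assumptions made for illustration; the slides do not show the actual document schema or queries.

```python
import xml.etree.ElementTree as ET

# Assumed layout: Part#1 has lost its type3 level, as on the slide.
SALE_XML = """
<sale id="sale#1">
  <part name="Part#1">
    <type2 name="PLATE">
      <type1 name="TIN"/>
    </type2>
  </part>
  <f_quantity>100</f_quantity>
</sale>
""".strip()


def quantity_by_type2(sale, path):
    """Sum f_quantity per type2 name found along the given path."""
    quantity = int(sale.findtext("f_quantity"))
    totals = {}
    for node in sale.findall(path):
        name = node.get("name")
        totals[name] = totals.get(name, 0) + quantity
    return totals


sale = ET.fromstring(SALE_XML)
fixed = quantity_by_type2(sale, "part/type3/type2")  # expects every level
tolerant = quantity_by_type2(sale, ".//type2")       # skips missing levels
print(fixed, tolerant)
```

The fixed path finds no `type2` node because `type3` is absent, so PLATE contributes nothing to the result; the descendant path still reaches it.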

SLIDE 8

Related Work

Relational decision support benchmarks:

  • TPC: TPC-H and TPC-DS [TPC’12]
  • SSB [VLDB/TPCTC’09]
  • DWEB [IJBIDM’07]

XML benchmarks:

  • Michigan [VLDB’02], MemBer [SIGMOD’05], X-Mach, XMark [VLDB/EEXTT’02], XOO7 [CIKM’01], and XBench [ICDE’04]

XML decision support benchmark: XWeB [VLDB/TPCTC’10]

  • Only one complex hierarchy workload
  • Complexity lies only in the part-category dimension
  • Queries on complex hierarchies are limited
  • Complex hierarchies are not scalable

SLIDE 9

Objective

Extending XWeB with:

  • Scalable complex hierarchies
  • Summarizability processing

SLIDE 10

Data Model

[Diagram: data model tree with a cardinality symbol on each edge]

Sales
  sale
    part
      type3
      type2
      type1
    customer
      nation
      region
    supplier
      nation
      region
    date
      day
      month
      year
    f_quantity
    f_totalamount

Cardinality legend:

?: 0-1 (incomplete)
•: 1 only (simple)
*: 0-many (complex)
+: 1-many (non-strict)

SLIDE 11

Generating Incomplete Hierarchies

Randomly delete ip% of the hierarchical levels (ip: incomplete percentage)

Example: the type3 level of Part#1 is randomly deleted

  • Before: part Part#1 with type3 LARGE, type2 PLATE, type1 TIN
  • After: part Part#1 with type2 PLATE, type1 TIN
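The random deletion of levels can be sketched as follows. This is a hedged illustration, not the benchmark's generator: the dictionary representation, the function name `delete_levels`, and the reading of ip as a per-level deletion probability are all assumptions.

```python
import random

LEVELS = ("type3", "type2", "type1")


def delete_levels(parts, ip, seed=0):
    """Drop each hierarchy level with probability ip percent."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    result = []
    for part in parts:
        kept = {
            key: value
            for key, value in part.items()
            if key not in LEVELS or rng.random() * 100 >= ip
        }
        result.append(kept)
    return result


parts = [{"name": "Part#1", "type3": "LARGE", "type2": "PLATE", "type1": "TIN"}]
print(delete_levels(parts, ip=50))
```

With `ip=0` every level is kept and with `ip=100` every level is removed, so the parameter scales the degree of incompleteness.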

SLIDE 12

Generating Non-strict Hierarchies

Randomly generate np non-strict hierarchies (np: non-strict percentage)

  • 1. Randomly generate an array of n non-strict hierarchies (n: number of non-strict hierarchies; e.g., n = 4)
  • 2. Convert the array into hierarchical XML data

4-non-strict-hierarchy array:

Supplier#1  FRANCE   EUROPE
Supplier#2  INDIA    ASIA
Supplier#1  ALGERIA  AFRICA
Supplier#2  GERMANY  EUROPE

[Diagram: resulting XML data for sale#1, with one subtree per supplier (supplier#1 and supplier#2) built from the array]
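The two steps above can be sketched in Python. The element and attribute names, the pool `NATIONS`, and the functions `generate_array` and `array_to_xml` are hypothetical; the slides do not describe the generator at this level of detail.

```python
import random
import xml.etree.ElementTree as ET

# Assumed pool of (nation, region) pairs, taken from the slide's example.
NATIONS = [("FRANCE", "EUROPE"), ("GERMANY", "EUROPE"),
           ("INDIA", "ASIA"), ("ALGERIA", "AFRICA")]


def generate_array(suppliers, n, seed=0):
    """Step 1: draw n (supplier, nation, region) rows at random."""
    rng = random.Random(seed)
    return [(rng.choice(suppliers), *rng.choice(NATIONS)) for _ in range(n)]


def array_to_xml(sale_id, rows):
    """Step 2: group the rows by supplier under a single sale element."""
    sale = ET.Element("sale", id=sale_id)
    by_supplier = {}
    for supplier, nation, region in rows:
        node = by_supplier.get(supplier)
        if node is None:
            node = ET.SubElement(sale, "supplier", name=supplier)
            by_supplier[supplier] = node
        nation_node = ET.SubElement(node, "nation", name=nation)
        ET.SubElement(nation_node, "region", name=region)
    return sale


# The 4-row array shown on the slide.
rows = [("Supplier#1", "FRANCE", "EUROPE"), ("Supplier#2", "INDIA", "ASIA"),
        ("Supplier#1", "ALGERIA", "AFRICA"), ("Supplier#2", "GERMANY", "EUROPE")]
sale = array_to_xml("sale#1", rows)
print(ET.tostring(sale, encoding="unicode"))
```

A random draw may repeat a (supplier, nation) pair; a fuller generator would presumably deduplicate, which this sketch leaves out.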

SLIDE 13

Generating Complex Hierarchies

  • 1. Generate an n-non-strict array (as on slide 12)
  • 2. Randomly delete some levels from the non-strict array
  • 3. Convert the array into hierarchical XML data

4-non-strict-hierarchy array:

Supplier#1  FRANCE   EUROPE
Supplier#2  INDIA    ASIA
Supplier#1  ALGERIA  AFRICA
Supplier#2  GERMANY  EUROPE

Complex-hierarchy array (after deleting levels):

Supplier#1  FRANCE   EUROPE
Supplier#2           ASIA
Supplier#1  ALGERIA  AFRICA
Supplier#2  GERMANY

[Diagram: resulting XML data for sale#1, with one subtree per supplier (supplier#1 and supplier#2) built from the complex-hierarchy array]
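A possible conversion of the complex-hierarchy array into XML, where deleted levels are simply skipped. The rows are the ones listed on the slide, with `None` marking a deleted level; the element names, the `name` attribute, and the choice to attach a region directly to the supplier when its nation is missing are assumptions.

```python
import xml.etree.ElementTree as ET

COMPLEX_ARRAY = [
    ("Supplier#1", "FRANCE", "EUROPE"),
    ("Supplier#2", None, "ASIA"),       # nation level deleted
    ("Supplier#1", "ALGERIA", "AFRICA"),
    ("Supplier#2", "GERMANY", None),    # region level deleted
]


def complex_array_to_xml(sale_id, rows):
    """Build one supplier subtree per supplier, omitting deleted levels."""
    sale = ET.Element("sale", id=sale_id)
    suppliers = {}
    for supplier, nation, region in rows:
        parent = suppliers.get(supplier)
        if parent is None:
            parent = ET.SubElement(sale, "supplier", name=supplier)
            suppliers[supplier] = parent
        if nation is not None:
            parent = ET.SubElement(parent, "nation", name=nation)
        if region is not None:
            ET.SubElement(parent, "region", name=region)
    return sale


sale = complex_array_to_xml("sale#1", COMPLEX_ARRAY)
print(ET.tostring(sale, encoding="unicode"))
```

The resulting tree is both non-strict (each supplier has several branches) and incomplete (some branches lack a level), which is what the slides call a complex hierarchy.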

SLIDE 14

Query Workload

  • Q21: sum of f_quantity, f_totalamount from part, customer, supplier, date group by part, customer, supplier, date
  • Q22: min of f_quantity from customer, part, supplier, date group by nation, type3, nation, day
  • Q23: max of f_totalamount from date, part, supplier, customer group by month, type2, nation, region
  • Q24: average of f_totalamount from supplier, part, customer, date group by region, type1, region, year
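To make the shape of these queries concrete, here is a small Python sketch of a Q22-style grouping (min of f_quantity by customer nation, type3, supplier nation, and day). The flat row format, the key names such as `c_nation` and `s_nation`, and the helper `group_aggregate` are hypothetical. The first row mirrors sale#1 from the XML data example; the other two rows are invented purely for illustration.

```python
from collections import defaultdict

facts = [
    {"c_nation": "USA", "type3": "LARGE", "s_nation": "FRANCE",
     "day": "25", "f_quantity": 100, "f_totalamount": 2800},
    # Hypothetical rows, added only to give the grouping something to do.
    {"c_nation": "USA", "type3": "LARGE", "s_nation": "FRANCE",
     "day": "25", "f_quantity": 40, "f_totalamount": 1200},
    {"c_nation": "USA", "type3": "LARGE", "s_nation": "GERMANY",
     "day": "26", "f_quantity": 70, "f_totalamount": 2100},
]


def group_aggregate(rows, keys, measure, func):
    """Group rows by the given keys and apply func to each group's measure."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in keys)].append(row[measure])
    return {group: func(values) for group, values in groups.items()}


q22 = group_aggregate(
    facts, ("c_nation", "type3", "s_nation", "day"), "f_quantity", min
)
print(q22)
```

Swapping `min` for `max`, `sum`, or an averaging function gives the other query shapes; handling incomplete or non-strict levels is deliberately left out of this sketch.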

SLIDE 15

Performance Metrics

  • Quantitative metric: response time, i.e., the execution time of the query workload
  • Qualitative metric: verifying from the results whether the summarizability issues are handled correctly

Checks on the results:

  • Resulting groups are not duplicated
  • The total of the aggregated values equals the grand total
  • The average equals the total divided by the number of values
  • Min is the least value
  • Max is the highest value
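The qualitative checks listed above could be automated along these lines. This is a sketch under assumed inputs (a list of `(group, total)` pairs and the raw measure values); the function names `check_groups` and `check_statistics` are hypothetical and not part of the benchmark.

```python
import math


def check_groups(group_rows, grand_total):
    """Check a sum query: no duplicated groups, totals add up to the grand total."""
    keys = [key for key, _total in group_rows]
    return {
        "groups_not_duplicated": len(keys) == len(set(keys)),
        "totals_equal_grand_total":
            sum(total for _key, total in group_rows) == grand_total,
    }


def check_statistics(values, average, minimum, maximum):
    """Check reported average, min, and max against the raw values."""
    return {
        "average_is_total_over_count":
            math.isclose(average, sum(values) / len(values)),
        "min_is_least": minimum == min(values),
        "max_is_highest": maximum == max(values),
    }


# A double-counted result fails the grand-total check.
print(check_groups([("EUROPE", 200), ("AFRICA", 100)], grand_total=100))
print(check_statistics([100, 40, 70], average=70.0, minimum=40, maximum=100))
```

The first call reports a mismatch because the group totals sum to 300 while the grand total is 100, which is the kind of error the qualitative metric is meant to catch.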

SLIDE 16

Experimental Study

Summarizability processing using:

  • Our proposed approach: Query-Based Approach (QBS) [COMAD’12]
  • Previous approach: Pedersen’s approach (Pedersen) [VLDB’99]

SLIDE 17

Experimental Study (Cont.)

Dataset size (KB)

No. Facts        50,000  100,000  150,000  200,000  250,000
Simple           27,700   55,390   82,800  110,577  138,015
Incomplete 5%    27,626   55,242   82,543  110,249  137,573
Non-strict 5%    28,669   57,328   85,671  114,422  142,786
Complex 5%       28,376   56,742   85,791  113,252  141,319
Incomplete 50%   25,020   50,030   74,769   99,842  124,601
Non-strict 50%   35,412   70,826  105,914  141,397  176,527
Complex 50%      32,522   65,031   97,263  129,839  162,088

SLIDE 18

Exp. Results of Simple Hierarchy Grouping

[Figure: time in ms (logarithmic axis, 1,000 to 10,000,000) versus number of facts (50,000 to 250,000), for 1D to 4D. Series: QBS; Pedersen without Overhead; Pedersen with Overhead.]

SLIDE 19

Exp. Results of QBS Simple Hierarchy Group Matching

[Figure: time in ms (logarithmic axis, 500 to 5,000,000) versus number of facts (50,000 to 250,000), for 1D to 4D. Series: QBS without Overhead, without Group Matching; QBS with Overhead, without Group Matching; QBS with Overhead, with Group Matching.]

SLIDE 20

Exp. Results of Pedersen Simple Hierarchy Group Matching

[Figure: time in ms (logarithmic axis, 500 to 5,000,000) versus number of facts (50,000 to 250,000), for 1D to 4D. Series: Pedersen without Overhead, without Group Matching; Pedersen without Overhead, with Group Matching; Pedersen with Overhead, with Group Matching.]

SLIDE 21

Exp. Results of Complex Hierarchy Grouping

[Figure: two panels, 5% and 50%. Each shows time in ms (logarithmic axis, 7,000 to 7,000,000) versus number of facts (50,000 to 250,000), for 1D to 3D. Series: QBS; Pedersen without Overhead; Pedersen with Overhead.]

SLIDE 22

Exp. Results of QBS Complex Hierarchy Grouping

[Figure: two panels, 5% and 50%. Each shows time in ms (logarithmic axis, 1,000 to 10,000,000) versus number of facts (50,000 to 250,000), for 1D to 3D. Series: Incomplete; Non-strict; Complex.]

SLIDE 23

Conclusion

  • First XML data warehouse benchmark with complex hierarchies
  • Conforms to Gray’s criteria: relevance, portability, scalability, and simplicity

Experimentation addressing summarizability processing:

  • Run-time summarizability management is feasible
  • The run time of the group matching process is still costly

Future work:

  • Improve the group matching process
  • Integrate with previous XML benchmarks: XWeB

SLIDE 24

QUESTIONS?

chantola.kit@univ-lyon2.fr
marouane.hachicha@univ-lyon2.fr
jerome.darmont@univ-lyon2.fr

Benchmark preliminary version: http://eric.uni-lyon2.fr/~ckit/DOLAP12.zip