2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 - - PowerPoint PPT Presentation

2 workloa d
SMART_READER_LITE
LIVE PREVIEW

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 - - PowerPoint PPT Presentation

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 Scan- OLAP OLTP Streaming oriented 4 Scan- OLAP OLTP Streaming Archiving oriented 4 Scan- Log- OLAP OLTP Streaming Archiving processing oriented 4 Scan- Log-


slide-1
SLIDE 1
slide-2
SLIDE 2 2
slide-3
SLIDE 3

Workloa d?

3
slide-4
SLIDE 4

OLTP

4
slide-5
SLIDE 5

OLTP OLAP

4
slide-6
SLIDE 6

OLTP OLAP

Streaming

4
slide-7
SLIDE 7

OLTP OLAP

Streaming Scan-

  • riented
4
slide-8
SLIDE 8

OLTP OLAP

Streaming Scan-

  • riented

Archiving

4
slide-9
SLIDE 9

OLTP OLAP

Streaming Scan-

  • riented

Archiving

Log- processing

4
slide-10
SLIDE 10

OLTP OLAP

Streaming Scan-

  • riented

Archiving

Log- processing

Web-search

4
slide-11
SLIDE 11

OLAP

Streaming

Log-processing Web-search Scan-oriented Archiving

OLTP

5
slide-12
SLIDE 12

OLAP

Streaming

Log-processing Web-search Scan-oriented Archiving

OLTP

5
slide-13
SLIDE 13

OLAP

Streaming

Log-processing Web-search Scan-oriented Archiving

OLTP

5
slide-14
SLIDE 14

OLAP

Streaming

Log-processing Web-search Scan-oriented Archiving

OLTP

5
slide-15
SLIDE 15

OLAP

Streaming

Log-processing Web-search Scan-oriented Archiving

OLTP

5
slide-16
SLIDE 16

OLTP OLAP

Archiving Scan-oriented

Streaming

Log-processing Web-search

6
slide-17
SLIDE 17

OLTP

OLAP

Archiving Scan-oriented

Streaming

Log-processing Web-search

6
slide-18
SLIDE 18

OLTP

OLAP

Archiving

Streaming

Log-processing Web-search Shared-scans

6
slide-19
SLIDE 19

BIG DATA

7
slide-20
SLIDE 20 8

OOPs

slide-21
SLIDE 21 9
slide-22
SLIDE 22 10
slide-23
SLIDE 23
  • Primary/Secondary Index
  • Materialized

Views

  • Vertical Partitioning
  • Horizontal Partitioning
10
slide-24
SLIDE 24
  • Primary/Secondary Index
  • Materialized

Views

  • Vertical Partitioning
  • Horizontal Partitioning
  • Human Intervention
  • Slow Response Time
  • Inefficient
10
slide-25
SLIDE 25 10
slide-26
SLIDE 26
  • Online Indexes
  • Dynamic Materialized

Views

  • Database Cracking
  • Adaptive Merging
10
slide-27
SLIDE 27
  • Online Indexes
  • Dynamic Materialized

Views

  • Database Cracking
  • Adaptive Merging
  • Still, does not change core system features
  • Several physical designs at schema level
  • No true physical data independence
  • Physical design not effective
10
slide-28
SLIDE 28

PLAN B?

11
slide-29
SLIDE 29

OLTP

12
slide-30
SLIDE 30

OLAP

13
slide-31
SLIDE 31

Archive

14
slide-32
SLIDE 32

Streaming

15
slide-33
SLIDE 33

Log-processing

16
slide-34
SLIDE 34 17

OLTP OLAP

Archiving Scan-oriented

Streaming

Log-processing Web-search

slide-35
SLIDE 35 17
slide-36
SLIDE 36 17
  • Tedious
  • Expensive
  • Complex ETL
  • Inefficient
slide-37
SLIDE 37 18

Mixed Workloads Dynamic Workloads ‘Zoo’ of systems

slide-38
SLIDE 38 18
slide-39
SLIDE 39 19
slide-40
SLIDE 40 19

Indexes

Column Row

Raw files Row+Column

slide-41
SLIDE 41 19

Indexes

Column Row

Raw files Row+Column

slide-42
SLIDE 42

OctopusDB

20
  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

slide-43
SLIDE 43 1 abc 56 887.9 2 fdg 89 445.35 3 poe 67 234.67 4 lkj 12 385.92 5 yui 17 612.13 6
  • mg
90 148.9

OctopusDB

20
  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

slide-44
SLIDE 44 1 abc 56 887.9 2 fdg 89 445.35 3 poe 67 234.67 4 lkj 12 385.92 5 yui 17 612.13 6
  • mg
90 148.9

OctopusDB

20

Log

  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

slide-45
SLIDE 45 1 abc 56 887.9 2 fdg 89 445.35 3 poe 67 234.67 4 lkj 12 385.92 5 yui 17 612.13 6
  • mg
90 148.9

OctopusDB

20

Log

Row

  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

slide-46
SLIDE 46 1 abc 56 887.9 2 fdg 89 445.35 3 poe 67 234.67 4 lkj 12 385.92 5 yui 17 612.13 6
  • mg
90 148.9

OctopusDB

20

Log

Row Column

  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

slide-47
SLIDE 47 1 abc 56 887.9 2 fdg 89 445.35 3 poe 67 234.67 4 lkj 12 385.92 5 yui 17 612.13 6
  • mg
90 148.9

OctopusDB

20

Log

Row Column

  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

Column grouped

slide-48
SLIDE 48 1 abc 56 887.9 2 fdg 89 445.35 3 poe 67 234.67 4 lkj 12 385.92 5 yui 17 612.13 6
  • mg
90 148.9

OctopusDB

20

Log

Row Column

  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

Column grouped

Index

slide-49
SLIDE 49 1 abc 56 887.9 2 fdg 89 445.35 3 poe 67 234.67 4 lkj 12 385.92 5 yui 17 612.13 6
  • mg
90 148.9

OctopusDB

20

Log

Row Column

  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

Column grouped

Index PAX

slide-50
SLIDE 50

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id

Example: Flight Tickets

21
slide-51
SLIDE 51

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id

Example: Flight Tickets

21 Col SV Row SV Log SV

Result

!

bag=customers " bag,key recent

#

( ( ))

!

b a g = t i c k e t s " b a g , k e y r e c e n t

#

( ( ) )

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id tickets customers
slide-52
SLIDE 52

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id Col SV Row SV Log SV Result

!

bag=customers " bag,key recent

#

( ( ))

!

bag=tickets " bag,key recent

#

( ( ) )

!

time>=now-7days Col SV

!

time<now-7days Cold Index SV Index SV

$ id,rid $price,rid

count(*)>=5 customer_id "

#

( )

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id Frequent Fliers (Adaptive Partial Index) customer.id tickets.customer_id

Example: Flight Tickets

21 Col SV Row SV Log SV

Result

!

bag=customers " bag,key recent

#

( ( ))

!

b a g = t i c k e t s " b a g , k e y r e c e n t

#

( ( ) )

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id tickets customers
slide-53
SLIDE 53

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id Col SV Row SV Log SV Result

!

bag=customers " bag,key recent

#

( ( ))

!

bag=tickets " bag,key recent

#

( ( ) )

!

time>=now-7days Col SV

!

time<now-7days Cold Index SV Index SV

$ id,rid $price,rid

count(*)>=5 customer_id "

#

( )

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id Frequent Fliers (Adaptive Partial Index) customer.id tickets.customer_id Primary Log Store Primary Log Store

Example: Flight Tickets

21 Col SV Row SV Log SV

Result

!

bag=customers " bag,key recent

#

( ( ))

!

b a g = t i c k e t s " b a g , k e y r e c e n t

#

( ( ) )

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id tickets customers Primary Log Store
slide-54
SLIDE 54 Use-Case Storage view definition (traditional systems) type example query row store Row SV any column store Col SV any PAX PAX SV any fractured mirrors Row SV same query for both and Col SV column groups Row SV πa1,...,ak and Col SV πak+1,...,am index Index SV any indexed row store Index SV(Row SV) any indexed column store Index SV(Col SV) any read-optimized in- dexed column store Index SV(Col SV) σt<now()−1day + differential write-
  • ptimized row store
Row SV σt≥now()−1day partial index Index SV σ420≤ak≤42000 projection index Col SV πak partial projection in- dex Index SV(Col SV) πak(σ420≤ak≤42000) DSMS Index SV σt≥now()−5min DSMS Index SV σt≥now()−5min + archive and Col SV σt<now()−5min snapshot any any replicated row store Row SV same query for both Row SV query any any dynamic view any any materialized view any any 22
slide-55
SLIDE 55 Use-Case Storage view definition (new system) type example query OLTP Row SV σt≥now()−1day + OLAP Col SV σt<now()−1day DSMS Index SV σt≥now()−5min + OLTP Row SV σt<now()−5min DSMS Index SV σt≥now()−5min + archive OLTP Row SV σnow()−1day≤t<now()−5min + archive OLAP Col SV σt<now()−1day
  • ther hybrid
any combination any
  • f the above
23
slide-56
SLIDE 56
  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • ‘Mimic’ several systems
  • No ‘zoo’ overheads
  • One-size-fits-all

OctopusDB

24
  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage

Views

slide-57
SLIDE 57

Interesting

25
slide-58
SLIDE 58 25 19
slide-59
SLIDE 59 25
slide-60
SLIDE 60

Trojan Techniques

  • Good Trojans
  • Existing system
  • Source-code not required
  • Inject additional layouts
26
slide-61
SLIDE 61

How does it work?

  • Exploit UDFs provided by

existing systems

  • Inject pieces of code
  • Hack layouts into the UDFs
  • UDF as mapping between

logical and physical view of data

27
slide-62
SLIDE 62

How does it work?

UDF Storage Layer Query Processor Relations Physical Representation

File 1 File 2 File 3 File n ....
  • Exploit UDFs provided by

existing systems

  • Inject pieces of code
  • Hack layouts into the UDFs
  • UDF as mapping between

logical and physical view of data

27
slide-63
SLIDE 63

How does it work?

UDF Storage Layer Query Processor Relations Physical Representation

File 1 File 2 File 3 File n ....
  • Exploit UDFs provided by

existing systems

  • Inject pieces of code
  • Hack layouts into the UDFs
  • UDF as mapping between

logical and physical view of data

  • Novel use of UDFs
27
slide-64
SLIDE 64 28

Use Case 1: OLAP in Row-stores

slide-65
SLIDE 65

OLTP OLAP

29
slide-66
SLIDE 66

OLTP OLAP

29
slide-67
SLIDE 67

OLTP OLAP?

  • Can we push the limits of

row stores?

29
slide-68
SLIDE 68

Trojan Columns

30 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household

Relation

slide-69
SLIDE 69

Trojan Columns

30 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household

Relation Physical Table

slide-70
SLIDE 70

Trojan Columns

30 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household

Relation Physical Table

slide-71
SLIDE 71

Trojan Columns

30 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household

Relation Physical Table

slide-72
SLIDE 72

Trojan Columns

30 Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household

Relation Physical Table

slide-73
SLIDE 73

Example: TPC-H Query 6

shipd

scanUDF

σ

ate BETWEEN

σ

selectUDF

σ

Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

scanUDF Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

selectUDF Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

31
slide-74
SLIDE 74

Example: TPC-H Query 6

shipd

scanUDF

σ

ate BETWEEN

σ

selectUDF

σ

Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

scanUDF Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

selectUDF Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

31
slide-75
SLIDE 75

Example: TPC-H Query 6

shipd

scanUDF

σ

ate BETWEEN

σ

selectUDF

σ

Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

scanUDF Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

selectUDF Result

quantity, discount extendedprice, shipdate shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24

σ π

agg (extendedprice * discount)

γ

lineitem

SCAN

31
slide-76
SLIDE 76 10 20 30 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Query Time (sec)

Standard Row Trojan Columns

71.74058 72.41696

C-Store Benchmark *

32 * Mike Stonebraker et. al. C-Store: A Column Oriented DBMS. VLDB 2005
slide-77
SLIDE 77

TPC-H Benchmark *

33 * tpch.org/tpch 25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec)

Standard Row Trojan Columns

slide-78
SLIDE 78

TPC-H Benchmark *

34 * tpch.org/tpch 225 450 675 900 Q3 Q5 Q10 Q19 Query Time (sec)

Standard Row Trojan Columns

slide-79
SLIDE 79

Micro-Benchmark

35 16

2.13 2.13 2.11 2.06 1.55 0.47 0.06

15

4.64 4.62 4.55 4.27 2.57 0.55 0.06

13

5.00 5.00 4.94 4.61 2.70 0.57 0.06

11

5.79 5.82 5.75 5.24 2.87 0.56 0.06

9

6.39 6.38 6.25 5.79 3.11 0.54 0.06

7

7.00 6.96 6.80 6.23 3.17 0.56 0.06

5

10.96 10.94 10.55 9.27 3.75 0.57 0.06

3

12.86 13.57 13.22 11.03 4.16 0.56 0.06

1

17.43 17.61 16.61 13.57 4.39 0.57 0.06

1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00 selectivity (fraction of tuples accessed) # referenced attributes (r)
slide-80
SLIDE 80

versus Column Stores

25 50 75 100 Q1 Q6 Q12 Q14 Query Time (sec)

Standard Row Trojan Columns DBMS-Y

36
slide-81
SLIDE 81
  • Column + Row storage
  • Much better performance
  • Closed source system

Trojan Columns Advantages

37
slide-82
SLIDE 82

Use Case 2: Big Data Analytics

38
slide-83
SLIDE 83

User visits from different countries

  • Scan Tasks
39

Web-log Processing

slide-84
SLIDE 84

User visits with duration greater than 10s

  • Scan Tasks
  • Selection Tasks
39

Web-log Processing

slide-85
SLIDE 85

URL and duration of each user visit

  • Scan Tasks
  • Selection Tasks
  • Projection Tasks
39

Web-log Processing

slide-86
SLIDE 86

Average PageRank visited by each user IP

  • Scan Tasks
  • Selection Tasks
  • Projection Tasks
  • Join Tasks
39

Web-log Processing

slide-87
SLIDE 87
  • Scan Tasks
  • Selection Tasks
  • Projection Tasks
  • Join Tasks
39 T L PPartblock Replicate Replicate Replicate Replicate Replicate Replicate T1 T6 H1 Fetch Fetch Fetch Store Store Store Scan Scan PPartsplit PPartsplit H2 . . . H3 Fetch Fetch Fetch Store Store Store Scan PPartsplit H4 . . . M1 Union RecReaditemize MMapmap PPartmem LPartsh LPartsh LPartsh Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine Store Store Store Mergecmp SortGrpgrp MMapcombine Store PPartsh M2 . . . M3 RecReaditemize MMapmap LPartsh Sortcmp SortGrpgrp MMapcombine Store PPartsh M4 . . . T1 T5 T2 T4 T3 T6 Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp SortGrpgrp MMapreduce R1 Store . . . R2 Data Load Phase Map Phase Shuffle Phase Reduce Phase

Web-log Processing

slide-88
SLIDE 88

Trojan Index

  • Each HDFS block sorted
  • Each block contains an index
SData T H F

... ...

DataSet Indexed Split i Trojan Index 40
slide-89
SLIDE 89

Trojan Index

  • Each HDFS block sorted
  • Each block contains an index
  • Index access in UDF
SData T H F ... ... DataSet Indexed Split i Trojan Index SData T H F ... ... DataSet Indexed Split i Trojan Index SData T H F ... ... DataSet Indexed Split i Trojan Index SData T H F ... ... DataSet Indexed Split i Trojan Index 40 HDFS Blocks
slide-90
SLIDE 90 T L PPartblock Replicate Replicate Replicate Replicate Replicate Replicate T1 T6 H1 Fetch Fetch Fetch Store Store Store Scan Scan PPartsplit PPartsplit H2 . . . H3 Fetch Fetch Fetch Store Store Store Scan PPartsplit H4 . . . M1 Union RecReaditemize MMapmap PPartmem LPartsh LPartsh LPartsh Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine Store Store Store Mergecmp SortGrpgrp MMapcombine Store PPartsh M2 . . . M3 RecReaditemize MMapmap LPartsh Sortcmp SortGrpgrp MMapcombine Store PPartsh M4 . . . T1 T5 T2 T4 T3 T6 Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp SortGrpgrp MMapreduce R1 Store T 1 . . . R2 T 2 Data Load Phase Map Phase Shuffle Phase Reduce Phase

Trojan Index Creation

sh(key k, value v, int numPartitions) k.splitID % numPartitions cmp(key k1, key k2) compare(k1.a , k2.a) grp(key k1, key k2) compare(k1.splitID , k2.splitID)             map(key k, value v) [(getSplitID() ⌅ prjai(k ⌅ v), k ⌅ v)] (key ik vset ivs)                  ⌅ ⌅ reduce(key ik, vset ivs) [(ivs ⌅ indexBuilderai(ivs))] 41
slide-91
SLIDE 91 T L PPartblock Replicate Replicate Replicate Replicate Replicate Replicate T1 T6 H1 Fetch Fetch Fetch Store Store Store Scan Scan PPartsplit PPartsplit H2 . . . H3 Fetch Fetch Fetch Store Store Store Scan PPartsplit H4 . . . M1 Union RecReaditemize MMapmap PPartmem LPartsh LPartsh LPartsh Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine Store Store Store Mergecmp SortGrpgrp MMapcombine Store PPartsh M2 . . . M3 RecReaditemize MMapmap LPartsh Sortcmp SortGrpgrp MMapcombine Store PPartsh M4 . . . T1 T5 T2 T4 T3 T6 Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp SortGrpgrp MMapreduce R1 Store T 1 . . . R2 T 2 Data Load Phase Map Phase Shuffle Phase Reduce Phase

Trojan Index Access

Algorithm 1: Trojan Index/Trojan Join split UDF Input : JobConf job, Int numSplits Output: logical data splits FileSplit [] splits; 1 File [] files = GetFiles(job); 2 foreach file in files do 3 Path path = file.getPath(); 4 InputStream in = GetInputStream(path); 5 Long offset = file.getLength(); 6 while offset > 0 do 7 in.seek(offset-FOOTER SIZE); 8 Footer footer = ReadFooter(in); 9 Long splitSize = footer.getSplitSize(); 10
  • ffset -= (splitSize + FOOTER SIZE);
11 BlockLocations blocks = GetBlockLocations(path,offset); 12 FileSplit newSplit = CreateSplit(path,offset,splitSize,blocks); 13 splits.add(newSplit); 14 end 15 end 16 return splits; 17 42
slide-92
SLIDE 92 T L PPartblock Replicate Replicate Replicate Replicate Replicate Replicate T1 T6 H1 Fetch Fetch Fetch Store Store Store Scan Scan PPartsplit PPartsplit H2 . . . H3 Fetch Fetch Fetch Store Store Store Scan PPartsplit H4 . . . M1 Union RecReaditemize MMapmap PPartmem LPartsh LPartsh LPartsh Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine Store Store Store Mergecmp SortGrpgrp MMapcombine Store PPartsh M2 . . . M3 RecReaditemize MMapmap LPartsh Sortcmp SortGrpgrp MMapcombine Store PPartsh M4 . . . T1 T5 T2 T4 T3 T6 Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp SortGrpgrp MMapreduce R1 Store T 1 . . . R2 T 2 Data Load Phase Map Phase Shuffle Phase Reduce Phase

Trojan Index Access

Algorithm 2: Trojan Index itemize.initialize UDF Input: FileSplit split, JobConf job Global FileSplit split = split; 1 Key lowKey = job.getLowKey(); 2 Global Key highKey = job.getHighKey(); 3 Int splitStart = split.getStart(); 4 Global Int splitEnd = split.getEnd(); 5 Header h = ReadHeader(split); 6 Overlap type = h.getOverlapType(lowKey,highKey); 7 Global Int offset; 8 if type == LEFT CONTAINED or type == FULL CONTAINED or type == 9 POINT CONTAINED then Index i = ReadIndex(split); 10
  • ffset = splitStart + i.lookup(lowKey);
11 else if type == RIGHT CONTAINED or type == SPAN then 12
  • ffset = splitStart;
13 else 14 // NOT CONTAINED, skip the split; 15
  • ffset = splitEnd;
16 end 17 Seek(offset); 18 42
slide-93
SLIDE 93 T L PPartblock Replicate Replicate Replicate Replicate Replicate Replicate T1 T6 H1 Fetch Fetch Fetch Store Store Store Scan Scan PPartsplit PPartsplit H2 . . . H3 Fetch Fetch Fetch Store Store Store Scan PPartsplit H4 . . . M1 Union RecReaditemize MMapmap PPartmem LPartsh LPartsh LPartsh Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine Store Store Store Mergecmp SortGrpgrp MMapcombine Store PPartsh M2 . . . M3 RecReaditemize MMapmap LPartsh Sortcmp SortGrpgrp MMapcombine Store PPartsh M4 . . . T1 T5 T2 T4 T3 T6 Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp SortGrpgrp MMapreduce R1 Store T 1 . . . R2 T 2 Data Load Phase Map Phase Shuffle Phase Reduce Phase

Trojan Index Access

size Algorithm 3: Trojan Index itemize.next UDF Input : KeyType key, ValueType value Output: has more records if offset < splitEnd then 1 Record nextRecord = ReadNextRecord(split); 2
  • ffset += nextRecord.size();
3 if nextRecord.key < highKey then 4 SetKeyValue(key, value, nextRecord); 5 return true; 6 end 7 end 8 return false; 9 42
slide-94
SLIDE 94 20 40 60 80 100 120 140 10 nodes 50 nodes 100 nodes runtime [seconds] Hadoop HadoopDB HadoopDB Chunks Hadoop++(256MB) Hadoop++(1GB)

Selection Analytical Task *

43 * Pavlo et. al. A Comparison of Approaches to large-Scale Data Analysis. SIGMOD 2009
slide-95
SLIDE 95

Trojan Index Advantages

  • Each HDFS block sorted
  • Each block contains an index
  • Index access in UDF
  • Scan + Index data accesses
  • Parallel index lookups
  • Non-invasive system changes
  • Much better performance
44
slide-96
SLIDE 96

Trojan Join

  • Each HDFS block co-

partitioning over two relations

  • Join relations are co-located
SData Tk SData Sk F DataSet ... ... Co-Partitioned Split i co-group j co-group j+1 Ht Hs Ht Hs SData Tk+1 SData Sk+1 45
slide-97
SLIDE 97

Trojan Join

  • Each HDFS block co-

partitioning over two relations

  • Join relations are co-located
  • Co-partitioned join in UDF
physical SData Tk SData Sk F DataSet ... ... Co-Partitioned Split i co-group j co-group j+1 Ht Hs Ht Hs SData Tk+1 SData Sk+1 to SData Tk SData Sk F DataSet ... ... Co-Partitioned Split i co-group j co-group j+1 Ht Hs Ht Hs SData Tk+1 SData Sk+1 co-partitioned SData Tk SData Sk F DataSet ... ... Co-Partitioned Split i co-group j co-group j+1 Ht Hs Ht Hs SData Tk+1 SData Sk+1 co-partition SData Tk SData Sk F DataSet ... ... Co-Partitioned Split i co-group j co-group j+1 Ht Hs Ht Hs SData Tk+1 SData Sk+1 45 HDFS Blocks
slide-98
SLIDE 98 500 1000 1500 2000 2500 10 nodes 50 nodes 100 nodes runtime [seconds] Hadoop HadoopDB Hadoop++(256MB) Hadoop++(1GB)

Join Analytical Task *

46 * Pavlo et. al. A Comparison of Approaches to large-Scale Data Analysis. SIGMOD 2009
slide-99
SLIDE 99

Trojan Join Advantages

  • Re- + Co- partitioned join
  • Parallel join processing
  • Non-invasive system changes
  • Much better performance
47
  • Each HDFS block co-

partitioning over two relations

  • Join relations are co-located
  • Co-partitioned join in UDF
slide-100
SLIDE 100

Trojan Layouts

  • Each HDFS block in row or column
  • Each block replica in different layout
  • Pick right layout in UDF
48 Replica 1 Replica 2 Replica 3
slide-101
SLIDE 101

1 2 3 4 5 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8

  • ver Hadoop-Row
  • ver Hadoop-PAX

Improvement Factor TPC-H Queries

Projection Analytical Task

49
slide-102
SLIDE 102

Trojan Layouts Advantages

  • Each HDFS block in row or column
  • Each block replica in different layout
  • Pick right layout in UDF
  • Row, PAX, Column-group layouts
  • Several layouts at the same time
  • Non-invasive system changes
  • Much better performance
50
slide-103
SLIDE 103 51
  • Automatically rewriting user queries
  • Avoiding UDF call overheads for low selectivity
  • Putting all Trojan Techniques together in a single system
  • What to store, How to store, Where to store
  • Trojan Techniques: one way of approaching OctopusDB
  • Trojan Techniques: first step towards OctopusDB
  • Storage

View optimization: selection, transformation, update propagation

Open Issues

slide-104
SLIDE 104

Mixed Workloads Big Problem Not Sufficient Complicated

52

OLAP

Streaming Log-processing Web-search Scan-oriented Archiving

OLTP

5

BIG DATA

7 10 17
slide-105
SLIDE 105

Mixed Workloads Big Problem Not Sufficient Complicated Our Vision First Steps

OLAP in Row-stores

Big Data Analytics

52

OLAP

Streaming Log-processing Web-search Scan-oriented Archiving

OLTP

5

BIG DATA

7 10 17
  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • ‘Mimic’ several systems
  • No ‘zoo’ overheads
  • One-size-fits-all
OctopusDB 24
  • Flexible data storage layer
  • Adapt layout to workload
  • Logical journal of data operations
  • Arbitrary physical representations
  • New concept: Storage
Views [VLDB PhD Workshop, 2010] [CIDR, 2011] Trojan Techniques
  • Existing system
  • Inject additional layouts
  • Source-code not required
  • Good use of Trojans
  • No heavy changes
  • Affect from inside
  • Similar to PAX, fpB+tree
28 30 Use Case 1: OLAP in Row-stores Use Case 2: Big Data Analytics 59

[CIDR, 2013]

(Under Submission)

[VLDB, 2010] [SOCC, 2011]