Ph.D. Matt Might, The Illustrated Guide to a Ph.D.: - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Ph.D.

Matt Might, The Illustrated Guide to a Ph.D.: http://matt.might.net/articles/phd-school-in-pictures

slide-2
SLIDE 2

ONE SIZE DOES NOT FIT ALL

slide-3
SLIDE 3

OLAP

Streaming

Log-processing Web-search Scan-oriented Archiving

OLTP

4

slide-8
SLIDE 8

OLTP

5

slide-9
SLIDE 9

5

OLAP

slide-10
SLIDE 10

5

Archive

slide-11
SLIDE 11

5

Streaming

slide-12
SLIDE 12

5

Log-processing

slide-13
SLIDE 13

6

OLTP OLAP

Archiving Scan-oriented

Streaming

Log-processing Web-search

slide-14
SLIDE 14

6

slide-15
SLIDE 15

7

Indexes

Column Row

Raw files Row+Column

slide-16
SLIDE 16

1 abc 56 887.9
2 fdg 89 445.35
3 poe 67 234.67
4 lkj 12 385.92
5 yui 17 612.13
6 omg 90 148.9

8

Storage Views

slide-17
SLIDE 17

1 abc 56 887.9
2 fdg 89 445.35
3 poe 67 234.67
4 lkj 12 385.92
5 yui 17 612.13
6 omg 90 148.9

8

Log

Storage Views

slide-18
SLIDE 18

1 abc 56 887.9
2 fdg 89 445.35
3 poe 67 234.67
4 lkj 12 385.92
5 yui 17 612.13
6 omg 90 148.9

8

Log

Row

Storage Views

slide-19
SLIDE 19

1 abc 56 887.9
2 fdg 89 445.35
3 poe 67 234.67
4 lkj 12 385.92
5 yui 17 612.13
6 omg 90 148.9

8

Log

Row Column

Storage Views

slide-20
SLIDE 20

1 abc 56 887.9
2 fdg 89 445.35
3 poe 67 234.67
4 lkj 12 385.92
5 yui 17 612.13
6 omg 90 148.9

8

Log

Row Column

Column grouped

Storage Views

slide-21
SLIDE 21

1 abc 56 887.9
2 fdg 89 445.35
3 poe 67 234.67
4 lkj 12 385.92
5 yui 17 612.13
6 omg 90 148.9

8

Log

Row Column

Column grouped

Index

Storage Views

slide-22
SLIDE 22

1 abc 56 887.9
2 fdg 89 445.35
3 poe 67 234.67
4 lkj 12 385.92
5 yui 17 612.13
6 omg 90 148.9

8

Log

Row Column

Column grouped

Index PAX

Storage Views

slide-23
SLIDE 23

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id

Example: Flight Tickets

9

slide-24
SLIDE 24

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id

Example: Flight Tickets

9

Col SV Row SV Log SV

Result

!

bag=customers

"

bag,key recent

#

( ( ) )

!

bag=tickets

"

bag,key recent

#

( ( ))

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id

tickets customers

slide-25
SLIDE 25

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id

Col SV Row SV Log SV

Result

!

bag=customers

"

bag,key recent

#

( ( ))

!

bag=tickets

"

bag,key recent

#

( ( ) )

!

time>=now-7days Col SV

!

time<now-7days

Cold

Index SV Index SV

$ id,rid $price,rid

count(*)>=5 customer_id

"

#

( )

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id

Frequent Fliers (Adaptive Partial Index)

customer.id tickets.customer_id

Example: Flight Tickets

9

Col SV Row SV Log SV

Result

!

bag=customers

"

bag,key recent

#

( ( ) )

!

bag=tickets

"

bag,key recent

#

( ( ))

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id

tickets customers

slide-26
SLIDE 26

Log SV

Result

tickets.customer_id

!

customer.*

( ))

"

a1=x1..an=xn

(

customer.id

Col SV Row SV Log SV

Result

!

bag=customers

"

bag,key recent

#

( ( ))

!

bag=tickets

"

bag,key recent

#

( ( ) )

!

time>=now-7days Col SV

!

time<now-7days

Cold

Index SV Index SV

$ id,rid $price,rid

count(*)>=5 customer_id

"

#

( )

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id

Frequent Fliers (Adaptive Partial Index)

customer.id tickets.customer_id Primary Log Store

Primary Log Store

Example: Flight Tickets

9

Col SV Row SV Log SV

Result

!

bag=customers

"

bag,key recent

#

( ( ) )

!

bag=tickets

"

bag,key recent

#

( ( ))

tickets.customer_id

$

customer.*

( ))

!

a1=x1..an=xn

(

customer.id

tickets customers

Primary Log Store
slide-27
SLIDE 27

10

WTF!

Where’s The Food!

slide-28
SLIDE 28

Rodent Store

slide-29
SLIDE 29

What to store?

Data Files

copy 1 copy 2 copy 3

slide-30
SLIDE 30

Data Files

How to store?

?

a b

+

slide-31
SLIDE 31

Data Files

Where to store?

?

slide-32
SLIDE 32

DSL DSL

Logical Data View Physical Data View WWHow! Language Physical Storage Interface

Data Management System

WWHow! Layer

slide-33
SLIDE 33

Example Use-cases

  • WWHow! File System
  • WWHow! RAID
  • WWHow! Relational DBMS
  • WWHow! Cloud
slide-34
SLIDE 34

STORE ‘/Users/Bob/Conferences/Talks/*.*’ WHAT *.(pdf | ppt), *.pdf WHERE vise4 HOW encryption(rsa) FOR *;

Store my conference talks (PDFs 2x and PPTs 1x) using RSA encryption on the university server
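The replication counts in the sentence above (PDFs 2x, PPTs 1x) appear to follow from how many WHAT patterns a file matches. The following is a minimal sketch of that reading, not the WWHow! implementation: the `WHAT` list is a hand-translated, hypothetical encoding of `*.(pdf | ppt), *.pdf`, and `copies` is an illustrative helper name.

```python
# Hypothetical hand-translation of the WHAT clause `*.(pdf | ppt), *.pdf`:
# one tuple of accepted extensions per comma-separated pattern.
WHAT = [("pdf", "ppt"), ("pdf",)]


def copies(filename, what=WHAT):
    """Count how many WHAT patterns match the file, i.e. how many copies to store."""
    if "." not in filename:
        return 0
    extension = filename.rsplit(".", 1)[-1].lower()
    # Each matching pattern contributes one stored copy.
    return sum(extension in pattern for pattern in what)


for name in ("vldb.pdf", "cidr.ppt", "notes.txt"):
    print(name, copies(name))
```

Under this reading a PDF matches both patterns and a PPT only the first, which reproduces the 2x/1x split stated on the slide.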

slide-35
SLIDE 35

I want my conference talks to be highly available

STORE ‘/Users/Bob/Conferences/Talks/*.*’ WHAT *.(pdf | ppt), *.pdf HOW encryption(rsa) FOR * PREFERENCE Availability=‘high’;

slide-36
SLIDE 36

I want my conference talks to be highly available

A job for the WWHow! data storage optimizer

STORE ‘/Users/Bob/Conferences/Talks/*.*’ WHAT *.(pdf | ppt), *.pdf HOW encryption(rsa) FOR * PREFERENCE Availability=‘high’;

slide-37
SLIDE 37

OctopusDB

19

  • Cool

Vision

  • Tough to realize
slide-38
SLIDE 38

C-Store

slide-39
SLIDE 39

21

?

slide-40
SLIDE 40

Trojan Columns

23

UDF Storage Layer Query Processor Relations Physical Representation

File 1 File 2 File 3 File n ....

Application

User Database
slide-41
SLIDE 41

24

Relation: Customer

name   phone  market_segment
smith  2134   automobile
john   3425   household
kim    6756   furniture
joe    9878   building
mark   4312   building
steve  2435   automobile
jim    5766   household
ian    8789   household

Physical Table: Customer_trojan

segment_ID  attribute_ID    blob_data
1           name            smith, john, kim, joe
1           phone           2134, 3425, 6756, 9878
1           market_segment  automobile, household, furniture, building
2           name            mark, steve, jim, ian
2           phone           4312, 2435, 5766, 8789
2           market_segment  building, automobile, household, household

Trojan Columns

slide-42
SLIDE 42

24

Trojan Columns

Components: Tuple Iterator, Data Parser, Data Accessor

(a) Convert row tuples into blobs
(b) Store blob data
(c) Get next row data

write-UDF
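The write path can be sketched in a few lines. This is only an illustration of the row-to-blob conversion using the Customer data from the slides; the function name `to_trojan`, the fixed segment size of 4, and the comma-separated text encoding of blobs are assumptions, not the actual UDF.

```python
ATTRIBUTES = ["name", "phone", "market_segment"]

CUSTOMER = [
    ("smith", 2134, "automobile"), ("john", 3425, "household"),
    ("kim", 6756, "furniture"), ("joe", 9878, "building"),
    ("mark", 4312, "building"), ("steve", 2435, "automobile"),
    ("jim", 5766, "household"), ("ian", 8789, "household"),
]


def to_trojan(rows, attributes, segment_size):
    """Convert row tuples into (segment_ID, attribute_ID, blob_data) tuples."""
    physical = []
    for start in range(0, len(rows), segment_size):
        segment = rows[start:start + segment_size]
        segment_id = start // segment_size + 1
        for position, attribute in enumerate(attributes):
            # One blob per attribute per segment: the column values, joined as text.
            blob = ", ".join(str(row[position]) for row in segment)
            physical.append((segment_id, attribute, blob))
    return physical


TROJAN = to_trojan(CUSTOMER, ATTRIBUTES, segment_size=4)
for physical_row in TROJAN:
    print(physical_row)
```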

slide-43
SLIDE 43

25

Trojan Columns

Components: Tuple Iterator, Data Parser, Data Accessor

(d) Parse blob data
(e) Reconstruct row tuples
(f) Fetch blob data
(g) End of table

read-UDF
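The read path reverses the conversion: fetch the blobs of a segment, parse them, and zip the columns back into row tuples. The sketch below assumes the same comma-separated text blobs as the Customer_trojan table; `from_trojan` is a hypothetical name, and values come back as strings because the blob encoding here carries no type information.

```python
from collections import defaultdict

ATTRIBUTES = ["name", "phone", "market_segment"]

CUSTOMER_TROJAN = [
    (1, "name", "smith, john, kim, joe"),
    (1, "phone", "2134, 3425, 6756, 9878"),
    (1, "market_segment", "automobile, household, furniture, building"),
    (2, "name", "mark, steve, jim, ian"),
    (2, "phone", "4312, 2435, 5766, 8789"),
    (2, "market_segment", "building, automobile, household, household"),
]


def from_trojan(physical, attributes):
    """Yield reconstructed row tuples from (segment_ID, attribute_ID, blob_data) rows."""
    segments = defaultdict(dict)
    for segment_id, attribute, blob in physical:
        # Parse blob data: split the text blob back into column values.
        segments[segment_id][attribute] = blob.split(", ")
    for segment_id in sorted(segments):
        columns = [segments[segment_id][attribute] for attribute in attributes]
        # Reconstruct row tuples by zipping the parsed columns position by position.
        yield from zip(*columns)


ROWS = list(from_trojan(CUSTOMER_TROJAN, ATTRIBUTES))
for row in ROWS:
    print(row)
```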

slide-44
SLIDE 44

Example: TPC-H Query 6

Query plan (top to bottom):

Result
γ     agg (extendedprice * discount)
σ     shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
π     quantity, discount, extendedprice, shipdate
SCAN  lineitem

26
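To make the plan concrete, here is a small sketch that evaluates the same selection and aggregate over a handful of rows. The rows are made up for illustration, the aggregate is taken to be a sum (as in TPC-H Query 6), and BETWEEN is treated as inclusive on both ends, following the predicate as written on the slide.

```python
from datetime import date

# Made-up lineitem rows, for illustration only.
LINEITEM = [
    {"quantity": 10, "discount": 0.06, "extendedprice": 1000.0, "shipdate": date(1994, 3, 1)},
    {"quantity": 30, "discount": 0.06, "extendedprice": 2000.0, "shipdate": date(1994, 5, 1)},   # quantity too high
    {"quantity": 5, "discount": 0.10, "extendedprice": 500.0, "shipdate": date(1994, 7, 1)},     # discount out of range
    {"quantity": 20, "discount": 0.05, "extendedprice": 400.0, "shipdate": date(1996, 1, 1)},    # shipped too late
    {"quantity": 12, "discount": 0.07, "extendedprice": 300.0, "shipdate": date(1994, 12, 31)},
]


def query6(rows):
    """Scan, select, and aggregate extendedprice * discount over the qualifying rows."""
    selected = (
        row for row in rows
        if date(1994, 1, 1) <= row["shipdate"] <= date(1995, 1, 1)
        and 0.05 <= row["discount"] <= 0.07
        and row["quantity"] < 24
    )
    return sum(row["extendedprice"] * row["discount"] for row in selected)


REVENUE = query6(LINEITEM)
print(round(REVENUE, 2))
```

Only the first and last rows qualify, so the result is 1000.0 * 0.06 + 300.0 * 0.07, up to floating-point rounding.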

slide-45
SLIDE 45

Example: TPC-H Query 6

[Figure: the Query 6 plan (Result; γ agg (extendedprice * discount); σ shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24; π quantity, discount, extendedprice, shipdate; SCAN lineitem), with the label scanUDF overlaid]

26

slide-46
SLIDE 46

Example: TPC-H Query 6

[Figure: the Query 6 plan (Result; γ agg (extendedprice * discount); σ shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24; π quantity, discount, extendedprice, shipdate; SCAN lineitem), with the labels scanUDF and selectUDF overlaid]

26

slide-47
SLIDE 47

Benchmark Results *

[Chart: query time (sec) for queries Q1 to Q7, Standard Row vs. Trojan Columns; axis ticks at 10, 20, 30; two bars carry the labels 71.74058 and 72.41696]

27

* Mike Stonebraker et al. C-Store: A Column-oriented DBMS. VLDB 2005

slide-48
SLIDE 48

Benchmark Results *

[Chart: query time (sec) for queries Q1 to Q7, Standard Row vs. Trojan Columns; axis ticks at 10, 20, 30; two bars carry the labels 71.74058 and 72.41696; annotation: 5x]

27

* Mike Stonebraker et al. C-Store: A Column-oriented DBMS. VLDB 2005

slide-49
SLIDE 49

1980s

slide-50
SLIDE 50

1990s

slide-51
SLIDE 51

2000s

slide-52
SLIDE 52

HYRISE

2010s

slide-53
SLIDE 53

7 Vertical Partitioning Algorithms

  • Brute Force
  • Navathe’s Algorithm
  • HillClimb
  • AutoPart
  • HYRISE
  • O2P
  • Trojan

29

slide-54
SLIDE 54

Four Comparison Metrics

  • How Fast?
  • How Good?
  • How Fragile?
  • Where does it make sense?

30

slide-55
SLIDE 55

Optimization Runtime

31

[Chart: optimization time (s), log-spaced axis from 0.01 to 10,000, for AutoPart, HillClimb, HYRISE, Navathe, O2P, Trojan, and BruteForce]

slide-56
SLIDE 56

Distance from Column Layouts

32

[Chart: percentage difference from column layouts [%] on TPC-H and SSB for AutoPart, HillClimb, HYRISE, Navathe, O2P, Trojan, and BruteForce; axis ticks in steps of 5]

slide-57
SLIDE 57

Effect of Buffer Size

33

[Chart: normalized estimated costs (%), 0% to 150%, against buffer size (MB, log scale, 0.01 to 10,000) for HillClimb, Navathe, Materialized views, and Column]

slide-58
SLIDE 58

Comparison’s Paper: Hadoop Vs PDBMS

slide-59
SLIDE 59

Comparison’s Paper: Hadoop Vs PDBMS

slide-60
SLIDE 60

Analytical Query Performance

[Chart, Selection Task: runtime in seconds (axis 20 to 160) on 1, 10, 25, 50, and 100 nodes for Vertica and Hadoop; annotated values 0.3, 0.8, 1.8, 4.7, 12.4]

[Chart, Join Task: runtime in seconds (axis 200 to 1800) on 1, 10, 25, 50, and 100 nodes for Vertica, DBMS-X, and Hadoop; annotated values 21.5, 28.2, 31.3, 36.1, 85.0 and 15.7, 28.0, 29.2, 29.4, 31.9]

slide-61
SLIDE 61

[Figure: the Hadoop execution plan as a pipeline of physical operators (Replicate, Fetch, Store, Scan, PPart, LPart, RecRead, MMap, Sort, SortGrp, Merge, Union, Buffer) across the Data Load, Map, Shuffle, and Reduce phases; annotated with Client, Data Node 1, Data Node 3, Mapper 1, Mapper 3, and Reducer 1]

slide-62
SLIDE 62

[Figure: the Hadoop execution plan as a pipeline of physical operators across the Data Load, Map, Shuffle, and Reduce phases]

slide-63
SLIDE 63

[Figure: the Hadoop execution plan as a pipeline of physical operators across the Data Load, Map, Shuffle, and Reduce phases]

slide-64
SLIDE 64

[Figure: the Hadoop execution plan as a pipeline of physical operators across the Data Load, Map, Shuffle, and Reduce phases]

Trojan Index Creation

map(key k, value v) ↦ [(getSplitID() ⊕ prj_ai(k ⊕ v), k ⊕ v)]
reduce(key ik, vset ivs) ↦ [(ivs ⊕ indexBuilder_ai(ivs))]
sh(key k, value v, int numPartitions) ↦ k.splitID % numPartitions
cmp(key k1, key k2) ↦ compare(k1.a_i, k2.a_i)
grp(key k1, key k2) ↦ compare(k1.splitID, k2.splitID)

37
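The interplay of the five UDFs can be simulated outside Hadoop. The sketch below is a toy in-memory model, not Hadoop++ code: records are plain dicts, `index_builder` produces a simple sparse (key, position) list rather than a real index structure, and the sort uses (splitID, attribute) as a simplifying assumption so that one partition can hold several splits.

```python
from itertools import groupby


def map_udf(split_id, record, attr):
    """Emit a composite key (splitID, index attribute) with the full record as value."""
    return (split_id, record[attr]), record


def sh(key, num_partitions):
    """Shuffle: route all records of one split to the same partition."""
    return key[0] % num_partitions


def index_builder(records, attr, step=2):
    """Toy sparse index: every step-th key with its position in the sorted split."""
    return [(rec[attr], pos) for pos, rec in enumerate(records) if pos % step == 0]


def create_trojan_indexes(splits, attr, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for split_id, records in splits.items():
        for record in records:
            key, value = map_udf(split_id, record, attr)
            partitions[sh(key, num_partitions)].append((key, value))
    indexed = {}
    for partition in partitions:
        # Sort on (splitID, attribute), then group on splitID: one reduce call per split.
        partition.sort(key=lambda kv: kv[0])
        for split_id, group in groupby(partition, key=lambda kv: kv[0][0]):
            records = [value for _, value in group]
            indexed[split_id] = (records, index_builder(records, attr))
    return indexed


SPLITS = {0: [{"a": 30}, {"a": 10}, {"a": 20}], 1: [{"a": 7}, {"a": 5}]}
INDEXED = create_trojan_indexes(SPLITS, "a", num_partitions=2)
print(INDEXED)
```

Each split ends up sorted on the index attribute and paired with its own index, which is the property the access algorithms rely on.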

slide-65
SLIDE 65

[Figure: the Hadoop execution plan as a pipeline of physical operators across the Data Load, Map, Shuffle, and Reduce phases]

Trojan Index Access

Algorithm 1: Trojan Index/Trojan Join split UDF
Input : JobConf job, Int numSplits
Output: logical data splits

FileSplit [] splits;
File [] files = GetFiles(job);
foreach file in files do
    Path path = file.getPath();
    InputStream in = GetInputStream(path);
    Long offset = file.getLength();
    while offset > 0 do
        in.seek(offset-FOOTER_SIZE);
        Footer footer = ReadFooter(in);
        Long splitSize = footer.getSplitSize();
        offset -= (splitSize + FOOTER_SIZE);
        BlockLocations blocks = GetBlockLocations(path,offset);
        FileSplit newSplit = CreateSplit(path,offset,splitSize,blocks);
        splits.add(newSplit);
    end
end
return splits;

38
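The backward walk over footers in Algorithm 1 can be tried on an in-memory file. This sketch assumes a made-up on-disk format in which every split is followed by an 8-byte big-endian footer holding only the split size; the real footer layout is not given on the slide, and `write_splits` and `logical_splits` are illustrative names.

```python
import io
import struct

FOOTER_SIZE = 8  # assumption: the footer is a single 8-byte big-endian split size


def write_splits(payloads):
    """Build a file image in which every split is followed by its footer."""
    buffer = io.BytesIO()
    for payload in payloads:
        buffer.write(payload)
        buffer.write(struct.pack(">Q", len(payload)))
    return buffer.getvalue()


def logical_splits(data):
    """Walk the footers from the end of the file and return (offset, size) per split."""
    stream = io.BytesIO(data)
    offset = len(data)
    splits = []
    while offset > 0:
        stream.seek(offset - FOOTER_SIZE)
        (split_size,) = struct.unpack(">Q", stream.read(FOOTER_SIZE))
        # Jump to the start of this split, which is also the end of the previous footer.
        offset -= split_size + FOOTER_SIZE
        splits.append((offset, split_size))
    return splits


DATA = write_splits([b"aaaa", b"bb"])
print(logical_splits(DATA))
```

Because the footers are read back to front, the splits come out in reverse file order, as they would in the pseudocode.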

slide-66
SLIDE 66

[Figure: the Hadoop execution plan as a pipeline of physical operators across the Data Load, Map, Shuffle, and Reduce phases]

Trojan Index Access

Algorithm 2: Trojan Index itemize.initialize UDF
Input: FileSplit split, JobConf job

Global FileSplit split = split;
Key lowKey = job.getLowKey();
Global Key highKey = job.getHighKey();
Int splitStart = split.getStart();
Global Int splitEnd = split.getEnd();
Header h = ReadHeader(split);
Overlap type = h.getOverlapType(lowKey,highKey);
Global Int offset;
if type == LEFT_CONTAINED or type == FULL_CONTAINED or type == POINT_CONTAINED then
    Index i = ReadIndex(split);
    offset = splitStart + i.lookup(lowKey);
else if type == RIGHT_CONTAINED or type == SPAN then
    offset = splitStart;
else
    // NOT CONTAINED, skip the split;
    offset = splitEnd;
end
Seek(offset);

38

slide-67
SLIDE 67

[Figure: the Hadoop execution plan as a pipeline of physical operators across the Data Load, Map, Shuffle, and Reduce phases]

Trojan Index Access

Algorithm 3: Trojan Index itemize.next UDF
Input : KeyType key, ValueType value
Output: has more records

if offset < splitEnd then
    Record nextRecord = ReadNextRecord(split);
    offset += nextRecord.size();
    if nextRecord.key < highKey then
        SetKeyValue(key, value, nextRecord);
        return true;
    end
end
return false;

38
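The initialize and next steps together amount to an index lookup followed by a bounded sequential scan. The following sketch collapses both into one generator over a sorted list of keys; it uses binary search in place of the split's index, reduces the overlap types to a single contained / not-contained check, and keeps the exclusive upper bound from the pseudocode. All of these are simplifying assumptions.

```python
from bisect import bisect_left


def index_scan(split_keys, low_key, high_key):
    """Yield the keys k of a sorted split with low_key <= k < high_key."""
    # Not contained: skip the split without reading any record.
    if not split_keys or high_key <= split_keys[0] or low_key > split_keys[-1]:
        return
    # initialize: jump to the first qualifying record (binary search stands in for the index).
    offset = bisect_left(split_keys, low_key)
    # next: read sequentially until the upper bound or the end of the split.
    while offset < len(split_keys):
        key = split_keys[offset]
        offset += 1
        if key < high_key:
            yield key
        else:
            return


KEYS = [1, 3, 5, 7, 9]
print(list(index_scan(KEYS, 3, 8)))
```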

slide-68
SLIDE 68

Selection Analytical Task *

[Chart: runtime [seconds], axis 20 to 140, on 10, 50, and 100 nodes for Hadoop, HadoopDB, HadoopDB Chunks, Hadoop++(256MB), and Hadoop++(1GB)]

39

* Pavlo et al. A Comparison of Approaches to Large-Scale Data Analysis. SIGMOD 2009

slide-69
SLIDE 69

Join Analytical Task *

[Chart: runtime [seconds], axis 500 to 2500, on 10, 50, and 100 nodes for Hadoop, HadoopDB, Hadoop++(256MB), and Hadoop++(1GB)]

40

* Pavlo et al. A Comparison of Approaches to Large-Scale Data Analysis. SIGMOD 2009

slide-70
SLIDE 70

41

Trojan Index Trojan Join

slide-71
SLIDE 71

42

slide-72
SLIDE 72

Traditional Layouts

43

001 alex bsc
002 tim msc
003 mat bsc
004 joel bsc
005 phil msc
006 ron msc
007 neo bsc
008 jack msc
009 jens bsc
010 tom msc

Row Column* PAX**

(default)

* A. Floratou et al. Column-Oriented Storage Techniques for MapReduce. PVLDB, April 2011

** Y. He et al. RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems. ICDE, 2011
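The three traditional layouts differ only in the order in which the same values are written. A minimal sketch using the first four records from the slide follows; the function names and the page size of two rows for PAX are assumptions chosen for illustration.

```python
RECORDS = [
    ("001", "alex", "bsc"),
    ("002", "tim", "msc"),
    ("003", "mat", "bsc"),
    ("004", "joel", "bsc"),
]


def row_layout(records):
    """Row layout: all values of one record are stored next to each other."""
    return [value for record in records for value in record]


def column_layout(records):
    """Column layout: all values of one attribute are stored next to each other."""
    return [value for column in zip(*records) for value in column]


def pax_layout(records, rows_per_page):
    """PAX: split the records into pages, then store each page column by column."""
    stored = []
    for start in range(0, len(records), rows_per_page):
        stored.extend(column_layout(records[start:start + rows_per_page]))
    return stored


print(row_layout(RECORDS))
print(column_layout(RECORDS))
print(pax_layout(RECORDS, rows_per_page=2))
```

PAX keeps every record inside one page, so tuple reconstruction stays local to the page, while values of the same attribute are still contiguous within that page.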
slide-73
SLIDE 73

Traditional Layouts

44

Row Column PAX

Non-required Reads Network Costs Data Block Placement Tuple Reconstruction

slide-74
SLIDE 74

Trojan Data Layouts

45

Replica 2 Replica 1 Replica 3

slide-75
SLIDE 75

Trojan Data Layouts

46

Non-required Reads Network Costs Data Block Placement Tuple Reconstruction

Row Column PAX Trojan

slide-76
SLIDE 76

Layout Quality

47

#Non-required Attributes Read #Joins in Tuple Reconstruction

HADOOP-ROW 525 HADOOP-PAX 139 Trojan Layout 14 20

slide-77
SLIDE 77

Projection Analytical Task

[Chart: improvement factor (axis 1 to 5) over Hadoop-Row and over Hadoop-PAX for TPC-H queries Q1 to Q8]

48

slide-78
SLIDE 78

Hadoop Aggressive Indexing Library

slide-79
SLIDE 79

Individual Jobs: Weblog, RecordReader

[Chart: RecordReader (RR) runtime [ms], axis 1000 to 4000, for MapReduce jobs Bob-Q1 to Bob-Q5 on Hadoop, Hadoop++, and HAIL; bar labels as extracted: 683 333 52 75 73 2864 2917 53 83 2776 2442 2470 21 12 2156 3358]

slide-80
SLIDE 80

Cartilage

slide-81
SLIDE 81

Cartilage

slide-82
SLIDE 82

Hadoop Stack

54

HDFS MapReduce Cartilage Query Engine Cartilage Upload Pipeline Hive HBase Pig

Data File 1 Data File 2 Data File n

...

slide-83
SLIDE 83

Hadoop Stack

55

HDFS Cartilage Query Engine Cartilage Upload Pipeline Input Data Queried Data

slide-84
SLIDE 84

Upload Plans

56

Serializer Parser

Data

Physical Partitioner Uploader

HDFS Stage 5 Stage 3 Stage 2 Stage 1 Block 3 Block 1

Logical Partitioner

Stage 4 Block 2

slide-85
SLIDE 85

Upload Plans

57

Parser Replicator 1

Data

Block 2

Locator 2 Physical Partitioner 2 Logical Partitioner Serializer 3 Locator 1 Uploader

HDFS

Physical Partitioner 1 Serializer 2 Serializer 1 Replicator 2

replica 1 replica 2 replica 1a replica 1b

Block 1 Block 4 Block 3 Block 5

slide-86
SLIDE 86

Summary

58

slide-87
SLIDE 87

59

ONE SIZE DOES NOT FIT ALL

CIDR 2011

HYRISE

WWHow! Layer

CIDR 2013 VLDB 2013 CIDR 2013 VLDB 2010 SOCC 2011 VLDB 2012 SIGMOD (demo)

slide-88
SLIDE 88

Acknowledgements

  • Jens Dittrich
  • Jorge Quiane
  • Felix Martin Schuhknecht
  • Endre Palatinus
  • Karen Khachatryan
  • Stefan Richter
  • Alexander Bunte

60

  • Sam Madden
  • Stefan Richter
  • Stefan Schuh
  • Joerg Schad
  • Yagiz Kargin
  • Vinay Setty
  • Vladimir Pavlov