2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 - PowerPoint PPT Presentation

How does it work? • Exploit UDFs provided by existing systems Query Processor • Inject pieces of code Relations • Hack layouts into the UDFs UDF Storage Layer • UDF as mapping between logical and physical view of data Physical Representation File 1 File 2 File 3 .... File n 27

How does it work? • Exploit UDFs provided by existing systems Query Processor • Inject pieces of code Relations • Hack layouts into the UDFs UDF Storage Layer • UDF as mapping between logical and physical view of data Physical Representation • Novel use of UDFs File 1 File 2 File 3 .... File n 27

Use Case 1 : OLAP in Row-stores 28

OLTP OLAP 29

OLTP OLAP? • Can we push the limits of row stores? 29

Trojan Columns Relation Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household 30

Trojan Columns Relation Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Physical Table Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household 30

Example: TPC-H Query 6 Result Result Result γ γ γ agg (extendedprice * discount) agg (extendedprice * discount) agg (extendedprice * discount) selectUDF σ σ σ shipdate BETWEEN shipdate BETWEEN shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN AND discount BETWEEN AND discount BETWEEN 0.05 AND 0.07 0.05 AND 0.07 0.05 AND 0.07 AND quantity < 24 AND quantity < 24 AND quantity < 24 π π π quantity, discount quantity, discount quantity, discount extendedprice, shipdate extendedprice, shipdate extendedprice, shipdate scanUDF S CAN S CAN S CAN lineitem lineitem lineitem shipd scanUDF selectUDF σ σ 31 σ ate BETWEEN

C-Store Benchmark * 30 Standard Row Trojan Columns Query Time (sec) 20 10 0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 71.74058 72.41696 32 * Mike Stonebraker et. al. C-Store: A Column Oriented DBMS. VLDB 2005

TPC-H Benchmark * 100 Standard Row Trojan Columns 75 Query Time (sec) 50 25 0 Q1 Q6 Q12 Q14 33 * tpch.org/tpch

TPC-H Benchmark * 900 Standard Row Trojan Columns 675 Query Time (sec) 450 225 0 Q3 Q5 Q10 Q19 34 * tpch.org/tpch

Micro-Benchmark 16 2.13 2.13 2.11 2.06 1.55 0.47 0.06 # referenced attributes (r) 15 4.64 4.62 4.55 4.27 2.57 0.55 0.06 13 5.00 5.00 4.94 4.61 2.70 0.57 0.06 11 5.79 5.82 5.75 5.24 2.87 0.56 0.06 9 6.39 6.38 6.25 5.79 3.11 0.54 0.06 7 7.00 6.96 6.80 6.23 3.17 0.56 0.06 5 10.96 10.94 10.55 9.27 3.75 0.57 0.06 3 12.86 13.57 13.22 11.03 4.16 0.56 0.06 1 17.43 17.61 16.61 13.57 4.39 0.57 0.06 1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00 selectivity (fraction of tuples accessed) 35

versus Column Stores 100 Standard Row Trojan Columns DBMS-Y 75 Query Time (sec) 50 25 0 Q1 Q6 Q12 Q14 36

Trojan Columns Advantages • Column + Row storage • Much better performance • Closed source system 37

Use Case 2: Big Data Analytics 38

Web-log Processing • Scan Tasks User visits from different countries 39

Web-log Processing • Scan Tasks • Selection Tasks User visits with duration greater than 10s 39

Web-log Processing • Scan Tasks • Selection Tasks • Projection Tasks URL and duration of each user visit 39

Web-log Processing • Scan Tasks • Selection Tasks • Projection Tasks • Join Tasks Average PageRank visited by each user IP 39

Web-log Processing T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 M1 Union M2 M3 M4 RecReaditemize RecReaditemize MMapmap MMapmap PPartmem Map Phase LPartsh LPartsh LPartsh LPartsh Sortcmp Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . Store Store Store Mergecmp SortGrpgrp • Scan Tasks MMapcombine Store Store • Selection Tasks PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch • Projection Tasks Buffer Buffer Buffer Buffer Store Store Merge • Join Tasks Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce 39 R1 Store R2

Trojan Index DataSet ... ... SData T H F Trojan Index Indexed Split i • Each HDFS block sorted • Each block contains an index 40

Trojan Index HDFS Blocks DataSet ... ... SData T Trojan Index H F Indexed Split i DataSet ... ... SData T Trojan Index H F Indexed Split i DataSet ... ... SData T Trojan Index H F Indexed Split i DataSet ... ... SData T Trojan Index H F Indexed Split i • Each HDFS block sorted • Each block contains an index • Index access in UDF 40

Trojan Index Creation T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6  map (key k , value v ) �� M1 Union M2 M3 M4    RecReaditemize RecReaditemize   [( getSplitID() ⌅ prj ai ( k ⌅ v ) , k ⌅ v )]   MMapmap MMapmap  PPartmem (key ik vset ivs ) ��  Map Phase   LPartsh LPartsh LPartsh LPartsh cmp (key k 1 , key k 2) �� compare ( k 1 . a , k 2 . a ) Sortcmp Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine . . . grp (key k 1 , key k 2) �� compare ( k 1 . splitID , k 2 . splitID ) MMapcombine . . . Store Store Store Mergecmp SortGrpgrp MMapcombine sh (key k , value v , int numPartitions ) �� k . splitID % numPartitions Store Store PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer   Store Store Merge    Store  ⌅ ⌅    Mergecmp . . . Reduce Phase reduce (key ik , vset ivs ) ��  SortGrpgrp     MMapreduce   [( ivs ⌅ indexBuilder ai ( ivs ))]  R1 Store R2 41 T � T � 1 2

Trojan Index Access T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 Algorithm 1 : Trojan Index / Trojan Join split UDF M1 Union M2 M3 M4 Input : JobConf job, Int numSplits Output : logical data splits RecReaditemize RecReaditemize 1 FileSplit [] splits; MMapmap MMapmap 2 File [] files = GetFiles( job ) ; PPartmem 3 foreach file in files do Map Phase Path path = file. getPath() ; 4 LPartsh LPartsh LPartsh LPartsh 5 InputStream in = GetInputStream( path ) ; Sortcmp Sortcmp Sortcmp Sortcmp 6 Long o ff set = file. getLength() ; 7 while o ff set > 0 do SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp in. seek( o ff set- FOOTER SIZE ) ; 8 MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . 9 Footer footer = ReadFooter( in ) ; 10 Long splitSize = footer. getSplitSize() ; Store Store Store 11 o ff set - = (splitSize + FOOTER SIZE ); Mergecmp BlockLocations blocks = GetBlockLocations( path,o ff set ) ; 12 13 FileSplit newSplit = CreateSplit( path,o ff set,splitSize,blocks ) ; SortGrpgrp 14 splits. add( newSplit ) ; MMapcombine 15 end Store Store 16 end 17 return splits; PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce R1 Store R2 42 T � T � 1 2

Trojan Index Access T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 Algorithm 2 : Trojan Index itemize .initialize UDF M1 Union M2 M3 M4 Input : FileSplit split, JobConf job RecReaditemize RecReaditemize 1 Global FileSplit split = split; 2 Key lowKey = job. getLowKey() ; MMapmap MMapmap 3 Global Key highKey = job. getHighKey() ; PPartmem 4 Int splitStart = split. getStart() ; Map Phase 5 Global Int splitEnd = split. getEnd() ; LPartsh LPartsh LPartsh LPartsh 6 Header h = ReadHeader( split ) ; Sortcmp Sortcmp Sortcmp Sortcmp 7 Overlap type = h. getOverlapType( lowKey,highKey ) ; Global Int o ff set; 8 SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp 9 if type == LEFT CONTAINED or type == FULL CONTAINED or type == MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . POINT CONTAINED then 10 Index i = ReadIndex( split ) ; Store Store Store 11 o ff set = splitStart + i. lookup( lowKey ) ; Mergecmp 12 else if type == RIGHT CONTAINED or type == SPAN then SortGrpgrp 13 o ff set = splitStart; 14 else MMapcombine 15 // NOT CONTAINED, skip the split; Store Store 16 o ff set = splitEnd; 17 end PPartsh PPartsh 18 Seek( o ff set ) ; Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce R1 Store R2 42 T � T � 1 2

Trojan Index Access T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 Algorithm 3 : Trojan Index itemize .next UDF M1 Union M2 M3 M4 Input : KeyType key, ValueType value Output : has more records RecReaditemize RecReaditemize 1 if o ff set < splitEnd then MMapmap MMapmap 2 Record nextRecord = ReadNextRecord( split ) ; PPartmem 3 o ff set += nextRecord.size(); Map Phase 4 if nextRecord.key < highKey then LPartsh LPartsh LPartsh LPartsh 5 SetKeyValue( key, value, nextRecord ) ; Sortcmp Sortcmp Sortcmp Sortcmp 6 return true; SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp 7 end 8 end MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . 9 return false; Store Store Store Mergecmp SortGrpgrp MMapcombine Store Store PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce R1 Store R2 42 T � T � 1 2 size

Selection Analytical Task * Hadoop Hadoop++(256MB) 140 HadoopDB Hadoop++(1GB) HadoopDB Chunks 120 runtime [seconds] 100 80 60 40 20 0 10 nodes 50 nodes 100 nodes 43 * Pavlo et. al. A Comparison of Approaches to large-Scale Data Analysis. SIGMOD 2009

Trojan Index Advantages • Scan + Index data accesses • Parallel index lookups • Non-invasive system changes • Much better performance • Each HDFS block sorted • Each block contains an index • Index access in UDF 44

Trojan Join DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i • Each HDFS block co- partitioning over two relations • Join relations are co-located 45

Trojan Join HDFS Blocks DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i • Each HDFS block co- partitioning over two relations • Join relations are co-located • Co-partitioned join in UDF physical 45 to co-partitioned co-partition

Join Analytical Task * 2500 Hadoop Hadoop++(256MB) HadoopDB Hadoop++(1GB) 2000 runtime [seconds] 1500 1000 500 0 10 nodes 50 nodes 100 nodes 46 * Pavlo et. al. A Comparison of Approaches to large-Scale Data Analysis. SIGMOD 2009

Trojan Join Advantages • Re- + Co- partitioned join • Parallel join processing • Non-invasive system changes • Much better performance • Each HDFS block co- partitioning over two relations • Join relations are co-located • Co-partitioned join in UDF 47

Trojan Layouts Replica 1 Replica 2 Replica 3 • Each HDFS block in row or column • Each block replica in different layout • Pick right layout in UDF 48

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 - PowerPoint PPT Presentation

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 Scan- OLAP OLTP Streaming oriented 4 Scan- OLAP OLTP Streaming Archiving oriented 4 Scan- Log- OLAP OLTP Streaming Archiving processing oriented 4 Scan- Log-

Creating LaTeX and HTML documents from within Stata using texdoc and webdoc Ben Jann University

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Log all the things! Honza Krl @honzakral Logs? Events! Log lines Twitter feed Invoices

Detecting Large-Scale System Problems by Mining Console Logs Author : Wei Xu, Ling Huang,

Harvesting Logs and Events Using MetaCentrum Virtualization Services Radoslav Bod, Daniel

SECFUZZ: Fuzz-testing Security Protocols Petar Tsankov, Mohammad Torabi Dashti, David Basin ETH

Investigation into file size distribution and its effect on disk server performance Brain Davies

MapReduce Andrew Crotty Alex Galakatos What is MapReduce? MapReduce is a framework for:

Precimonious & HiFPTuner Tuning Assistant for Floating-Point Precision Ignacio Laguna,

Towards a Methodology for Benchmarking Edge Processing Frameworks Pedro Silva, Alexandru Costan,

ADVANCED DATABASE SYSTEMS Recovery Protocols @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

A ns2-based simulation framework for performance evaluation of overlay networks Michele Amoretti

IMRT: Patient Specific QA ICPT School on Medical Physics for Radiation Therapy Justus Adamson PhD

CS535 Big Data 2/10/2019 Week 4-A Sangmi Lee Pallickara CS535 Big Data | Computer Science |

20 Schemes Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science

redo logging (fjnish) / distributed systems 1 1 last time (1) block groups keep related

Debugging & Logging Java Logging Java has built-in support for logging Logs contain

Announcements Quiz on Thursday Next assignment will be available later this week (Thursday

ONLINE DEGREE-BOUNDED STEINER Sina Dehghani Saeed Seddighin NETWORK DESIGN Ali Shafahi Fall

Logical time and logical clocks Knowing the ordering of events is important not enough with

Lamport Clocks Doug Woos Logistics notes Problem Set 1 due Friday Chandy-Lamport Snapshots

Issues with Clocks Context The tree correction protocol was based on the idea of local

CS603: Distributed Systems Lecture 4: Overcoming failures in distributed systems Cristina

Sensor Networks Where Theory Meets Practice Roger Wattenhofer ETH Zurich Distributed

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 - PowerPoint PPT Presentation

2 Workloa d? 3 OLTP 4 OLAP OLTP 4 OLAP OLTP Streaming 4 Scan- OLAP OLTP Streaming oriented 4 Scan- OLAP OLTP Streaming Archiving oriented 4 Scan- Log- OLAP OLTP Streaming Archiving processing oriented 4 Scan- Log-

Creating LaTeX and HTML documents from within Stata using texdoc and webdoc Ben Jann University

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Log all the things! Honza Krl @honzakral Logs? Events! Log lines Twitter feed Invoices

Detecting Large-Scale System Problems by Mining Console Logs Author : Wei Xu, Ling Huang,

Harvesting Logs and Events Using MetaCentrum Virtualization Services Radoslav Bod, Daniel

SECFUZZ: Fuzz-testing Security Protocols Petar Tsankov, Mohammad Torabi Dashti, David Basin ETH

Investigation into file size distribution and its effect on disk server performance Brain Davies

MapReduce Andrew Crotty Alex Galakatos What is MapReduce? MapReduce is a framework for:

Precimonious &amp; HiFPTuner Tuning Assistant for Floating-Point Precision Ignacio Laguna,

Towards a Methodology for Benchmarking Edge Processing Frameworks Pedro Silva, Alexandru Costan,

ADVANCED DATABASE SYSTEMS Recovery Protocols @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

A ns2-based simulation framework for performance evaluation of overlay networks Michele Amoretti

IMRT: Patient Specific QA ICPT School on Medical Physics for Radiation Therapy Justus Adamson PhD

CS535 Big Data 2/10/2019 Week 4-A Sangmi Lee Pallickara CS535 Big Data | Computer Science |

20 Schemes Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science

redo logging (fjnish) / distributed systems 1 1 last time (1) block groups keep related

Debugging &amp; Logging Java Logging Java has built-in support for logging Logs contain

Announcements Quiz on Thursday Next assignment will be available later this week (Thursday

ONLINE DEGREE-BOUNDED STEINER Sina Dehghani Saeed Seddighin NETWORK DESIGN Ali Shafahi Fall

Logical time and logical clocks Knowing the ordering of events is important not enough with

Lamport Clocks Doug Woos Logistics notes Problem Set 1 due Friday Chandy-Lamport Snapshots

Issues with Clocks Context The tree correction protocol was based on the idea of local

CS603: Distributed Systems Lecture 4: Overcoming failures in distributed systems Cristina

Sensor Networks Where Theory Meets Practice Roger Wattenhofer ETH Zurich Distributed

Precimonious & HiFPTuner Tuning Assistant for Floating-Point Precision Ignacio Laguna,

Debugging & Logging Java Logging Java has built-in support for logging Logs contain