Ph.D. Matt Might, The Illustrated Guide to a Ph.D.: http://matt.might.net/articles/phd-school-in-pictures

ONE SIZE DOES NOT FIT ALL

Streaming, OLAP, OLTP, Archiving, Log-processing, Web-search, Scan-oriented

OLTP

OLAP

Archive

Streaming

Log-processing

Indexes, Column, Row, Raw files, Row+Column

Storage Views

1  abc  56  887.9
2  fdg  89  445.35
3  poe  67  234.67
4  lkj  12  385.92
5  yui  17  612.13
6  omg  90  148.9

Views: Log, Row, Column, Column grouped, Index, PAX
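To make the difference between these views concrete, here is a minimal Python sketch that lays out the six sample rows in four of the ways named on the slide. The function names, the positional treatment of the four unnamed columns, and the page size are assumptions for illustration; this is not OctopusDB code.

```python
# Sample rows from the slide. The slide does not name the columns,
# so they are treated positionally here.
ROWS = [
    (1, "abc", 56, 887.9),
    (2, "fdg", 89, 445.35),
    (3, "poe", 67, 234.67),
    (4, "lkj", 12, 385.92),
    (5, "yui", 17, 612.13),
    (6, "omg", 90, 148.9),
]


def row_layout(rows):
    """Row view: all values of one tuple are stored next to each other."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Column view: all values of one attribute are stored next to each other."""
    return [list(column) for column in zip(*rows)]


def pax_layout(rows, page_size):
    """PAX view: rows are split into pages, and each page is stored column-wise."""
    pages = [rows[i:i + page_size] for i in range(0, len(rows), page_size)]
    return [column_layout(page) for page in pages]


def index_layout(rows, key_pos=0):
    """Index view: map a key value to the position of its row."""
    return {row[key_pos]: pos for pos, row in enumerate(rows)}
```

A scan that reads only one attribute touches a single list in the column view, while a point lookup by key is served most directly by the index view. That trade-off is what motivates keeping several views over the same data.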

Example: Flight Tickets

[Figure: query plan over tickets and customers, built up over several steps. Recoverable labels: Primary Log Store; Log SV, Col SV, Row SV, Index SV; bag=tickets, bag=customers; bag,key; tickets.customer_id, customer.id; customer.* with a1=x1 .. an=xn; recent (time>=now-7days); Cold (time<now-7days); price,rid; id,rid; count(*)>=5 per customer_id; Frequent Fliers SV (Adaptive Partial Index); Result.]

WTF! Where’s The Food!

Rodent Store

What to store? [Figure: data files kept as copy 1, copy 2, copy 3]

How to store? [Figure: data files]

Where to store? [Figure: data files]

Data Management System

[Figure: layered architecture. Recoverable labels: Data View (Logical), DSL, WWHow! Language, WWHow! Layer, Storage Interface, Data View (Physical).]

Example Use-cases
• WWHow! File System
• WWHow! RAID
• WWHow! Relational DBMS
• WWHow! Cloud

Store my conference talks (PDFs 2x and PPTs 1x) using RSA encryption on the university server:

STORE '/Users/Bob/Conferences/Talks/*.*'
WHAT *.(pdf | ppt), *.pdf
WHERE vise4
HOW encryption(rsa) FOR *;

I want my conference talks to be highly available:

STORE '/Users/Bob/Conferences/Talks/*.*'
WHAT *.(pdf | ppt), *.pdf
HOW encryption(rsa) FOR *
PREFERENCE Availability='high';

No WHERE clause is given here: choosing the location is a job for the WWHow! data storage optimizer.
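One plausible reading of the WHAT clause is that a file gets one copy for every pattern it matches, which would yield the "PDFs 2x and PPTs 1x" behavior described earlier. The Python sketch below illustrates that reading only. The regex translations of the two patterns and the function name are assumptions, and the slides do not confirm that WWHow! evaluates WHAT this way.

```python
import re
from collections import Counter

# Assumed regex translations of the two WHAT patterns on the slide:
#   *.(pdf | ppt)   and   *.pdf
WHAT_PATTERNS = [r".*\.(pdf|ppt)$", r".*\.pdf$"]


def copies_per_file(filenames, patterns=WHAT_PATTERNS):
    """Count one copy for every WHAT pattern that a file name matches."""
    counts = Counter()
    for name in filenames:
        for pattern in patterns:
            if re.match(pattern, name):
                counts[name] += 1
    return dict(counts)
```

Under this reading, a PDF matches both patterns and a PPT matches only the first, so the copy counts follow from the pattern list rather than from an explicit replication factor. Files matching no pattern are not stored at all.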

OctopusDB
• Cool Vision
• Tough to realize

C-Store

Trojan Columns

[Figure: system layers. Recoverable labels: Application, User, Database, Query Processor, Relations, UDF, Storage Layer, Physical Representation, File 1, File 2, File 3, ..., File n.]

Trojan Columns

Relation Customer

name   phone  market_segment
smith  2134   automobile
john   3425   household
kim    6756   furniture
joe    9878   building
mark   4312   building
steve  2435   automobile
jim    5766   household
ian    8789   household

Physical Table Customer_trojan

segment_ID  attribute_ID    blob_data
1           name            smith, john, kim, joe
1           phone           2134, 3425, 6756, 9878
1           market_segment  automobile, household, furniture, building
2           name            mark, steve, jim, ian
2           phone           4312, 2435, 5766, 8789
2           market_segment  building, automobile, household, household

write-UDF (components: Tuple Iterator, Data Parser, Data Accessor)
(a) Convert row tuples into blobs
(b) Store blob data
(c) Get next row data

read-UDF (components: Tuple Iterator, Data Parser, Data Accessor)
(d) Parse blob data
(e) Reconstruct row tuples
(f) Fetch blob data
(g) End of table
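The write and read paths can be imitated in a few lines of Python. The sketch below is an illustration under stated assumptions, not the Trojan Columns implementation: the function names, the segment size of four rows, the comma-separated blob encoding, and storing phone numbers as strings are all choices made here to reproduce the Customer_trojan table from the slide.

```python
from itertools import islice

ATTRIBUTES = ["name", "phone", "market_segment"]

# Relation Customer from the slide (phone kept as text for simplicity).
CUSTOMER = [
    ("smith", "2134", "automobile"),
    ("john", "3425", "household"),
    ("kim", "6756", "furniture"),
    ("joe", "9878", "building"),
    ("mark", "4312", "building"),
    ("steve", "2435", "automobile"),
    ("jim", "5766", "household"),
    ("ian", "8789", "household"),
]


def write_udf(rows, attributes, segment_size=4):
    """Convert row tuples into one blob per attribute and segment."""
    physical = []  # rows of (segment_ID, attribute_ID, blob_data)
    iterator = iter(rows)
    segment_id = 1
    while True:
        segment = list(islice(iterator, segment_size))  # get next row data
        if not segment:
            break
        for pos, attribute in enumerate(attributes):
            blob = ", ".join(row[pos] for row in segment)  # rows -> blob
            physical.append((segment_id, attribute, blob))  # store blob
        segment_id += 1
    return physical


def read_udf(physical, attributes):
    """Fetch blobs, parse them, and reconstruct row tuples per segment."""
    segments = {}
    for segment_id, attribute, blob in physical:  # fetch blob data
        segments.setdefault(segment_id, {})[attribute] = blob.split(", ")
    for segment_id in sorted(segments):
        columns = [segments[segment_id][a] for a in attributes]
        yield from zip(*columns)  # reconstruct row tuples
```

Calling `write_udf(CUSTOMER, ATTRIBUTES)` produces six physical rows, and feeding them to `read_udf` returns the original tuples in order, so the column-oriented layout stays hidden behind an ordinary relational interface.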

Example: TPC-H Query 6

Query plan, from the result down to the scan:

Result
γ  agg (extendedprice * discount)
σ  shipdate BETWEEN '1994-01-01' AND '1995-01-01'
   AND discount BETWEEN 0.05 AND 0.07
   AND quantity < 24
π  quantity, discount, extendedprice, shipdate
SCAN lineitem

[Figure: the same plan is shown three times side by side. The labels scanUDF and selectUDF are attached at the SCAN and σ operators of the Trojan Columns variants.]
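For readers who want to trace the plan operator by operator, the following Python sketch evaluates the same predicates over a handful of made-up lineitem rows. The sample rows and the function name are hypothetical, the aggregate is assumed to be a sum (the slide only says "agg"), and BETWEEN is treated as inclusive on both ends.

```python
from datetime import date

# Hypothetical lineitem rows: (quantity, discount, extendedprice, shipdate).
LINEITEM = [
    (10, 0.06, 1000.0, date(1994, 3, 15)),  # qualifies
    (30, 0.06, 2000.0, date(1994, 5, 1)),   # quantity too high
    (5, 0.10, 500.0, date(1994, 7, 4)),     # discount out of range
    (20, 0.05, 400.0, date(1996, 1, 1)),    # shipdate out of range
    (23, 0.07, 300.0, date(1995, 1, 1)),    # qualifies (inclusive bounds)
]


def query6(rows):
    """Scan, project, select, and aggregate as in the plan on the slide."""
    total = 0.0
    # SCAN lineitem and π: read only the four attributes the query needs.
    for quantity, discount, extendedprice, shipdate in rows:
        # σ: the three conjunctive predicates.
        if (date(1994, 1, 1) <= shipdate <= date(1995, 1, 1)
                and 0.05 <= discount <= 0.07
                and quantity < 24):
            # γ: aggregate extendedprice * discount (assumed to be a sum).
            total += extendedprice * discount
    return total
```

Because the query reads only four attributes of lineitem and filters most rows out, it is the kind of scan-oriented workload where a column-wise physical layout can reduce the amount of data touched.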

Benchmark Results*

[Figure: bar chart of query time (sec) for queries Q1 to Q7, comparing Standard Row with Trojan Columns. The axis runs from 0 to 30; two bars carry the value labels 71.74058 and 72.41696.]

* Mike Stonebraker et al. C-Store: A Column-oriented DBMS. VLDB 2005
