Cheap Data Analytics Using Cold Storage Devices
Renata Borovica-Gajic, Raja Appuswamy, and Anastasia Ailamaki
Proliferation of cold data

- "80% of enterprise data is cold, with 60% CAGR" [Horison]
- "Cold data: an incredibly valuable piece of the analysis pipeline" [Intel]

Cold Storage Devices (CSD) to the rescue

- Petabytes of storage at a cost close to tape and a latency close to disk
- Only a small subset of disks is powered at a time; serving data from an inactive group means powering one disk up and cooling another down
- [Figure: accessing an active disk (A) takes ~10 ms; accessing a spun-down disk (B) takes ~10 s]
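The two latencies above can be captured in a toy access-cost model. This is an illustrative sketch built from the numbers on the slide (the class and its names are assumptions, not a real CSD API): reads from the active group are disk-fast, while any other group first pays a group switch.

```python
# Illustrative CSD latency model (names are assumptions for illustration):
# only a few disk groups spin at a time; reading from the active group costs
# ~10 ms, while reading from any other group first needs a ~10 s spin-up.

class CSDModel:
    ACTIVE_MS = 10        # ~10 ms: read from a spinning (active) disk
    SWITCH_MS = 10_000    # ~10 s: spin up a cold group before reading

    def __init__(self, active_group):
        self.active = active_group
        self.switches = 0

    def access(self, group):
        """Return the read latency in ms, switching groups if needed."""
        if group != self.active:
            self.switches += 1
            self.active = group
            return self.SWITCH_MS + self.ACTIVE_MS
        return self.ACTIVE_MS

csd = CSDModel(active_group="A")
lat = [csd.access(g) for g in ["A", "B", "A"]]   # [10, 10010, 10010]
```

Even one stray access to a cold group costs three orders of magnitude more than a disk read, which is why access patterns over a CSD matter so much.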
CSD in the storage tiering hierarchy

[Figure: storage tiers ordered by data-access latency (ns to hours) and cost — Performance tier: DRAM, SSD, 15k RPM HDD ($$$); Capacity tier: 7200 RPM HDD ($$); Archival tier: VTL ($)]
Can we shrink tiers to further save cost?

[Figure: the same hierarchy with the capacity and archival tiers replaced by a single cold tier built on CSD, at archival-level cost ($)]
[Figure: cost of storing 100 TB — traditional 3-tier hierarchy: $159,641; CSD-based 2-tier hierarchy roughly 40% cheaper [Horison, 2015]]

CSD offers significant cost savings (~40%). But can we run queries over CSD?
Query execution over CSD

[Figure: virtualized enterprise data center — client VMs (VM1–VM3) run databases (DB1–DB3) against an HDD-based capacity tier and, over the network, a cold storage tier; each database's objects (blocks A1–A3, B1–B4, C1–C3) are spread across CSD disk groups]

Traditional setting: the DBMS assumes uniform access latency, controls the data layout, and runs static (pull-based) execution. Over a CSD, pull-based execution triggers unwarranted group switches, because consecutive requests hit objects that live in different disk groups.
What this means for an enterprise datacenter…

[Figure: average execution time (×1000 s) vs. number of clients (groups), PostgreSQL vs. ideal; and vs. group switch latency (s), CSD vs. HDD — PostgreSQL over CSD degrades sharply as clients and switch latency grow]

Setting: multitenant enterprise datacenter; clients run PostgreSQL, TPC-H SF 50, Q12; CSD shared; layout: one client per group.

Lost opportunity: the CSD is relegated to archival storage.
Need hardware-software codesign
- 1. Data access has to be hardware-driven to
minimize group switches
- 2. Query execution engine has to process data
pushed from storage in out-of-order (unpredictable) manner
- 3. Reduce data round-trips to cold storage by
smart data caching
9
Skipper to the rescue

[Figure: Skipper architecture in a virtualized enterprise data center — each client VM (VM1–VM3, DB1–DB3) runs PostgreSQL extended with a multi-way join (MJoin: one hash table per scan of A, B, C) plus a cache manager; a shared I/O scheduler with an object-to-group map sits between the network and cold storage]

1. Opportunistic execution with multi-way joins
2. Progress-driven caching
3. Rank-based I/O scheduling (a novel ranking algorithm)
Multi-way joins in PostgreSQL

Setting: query A ⋈ B ⋈ C; objects A: A1, A2; B: B1, B2; C: C1, C2.

[Figure: the VM runs PostgreSQL with a cache manager and a state manager; the MJoin keeps one hash table per input (Scan A, Scan B, Scan C) and joins whatever objects have arrived — e.g. subplan A1,B1,C1, then A2,B1,C1 once A2 arrives]

The state manager tracks subplans (one object per input):

- Initially pending: A1B1C1, A1B1C2, A1B2C1, A1B2C2, A2B1C1, A2B1C2, A2B2C1, A2B2C2
- After A1, B1, C1, A2 arrive — executed: A1B1C1, A2B1C1; pending: the remaining six

Multi-way joins enable out-of-order, opportunistic execution.
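The bookkeeping above can be sketched in a few lines. This is a minimal illustration of the idea, not PostgreSQL/Skipper code (class and method names are my own): the join tracks every subplan and executes one as soon as all of its objects have been pushed by storage, in whatever order they arrive.

```python
from itertools import product

# Sketch of out-of-order opportunistic execution state tracking:
# a subplan is one object per input; it becomes runnable once every
# one of its objects has arrived from (cold) storage.

class MJoinState:
    def __init__(self, objects_per_input):
        inputs = sorted(objects_per_input)                  # e.g. ["A","B","C"]
        self.pending = set(product(*(objects_per_input[i] for i in inputs)))
        self.executed = set()
        self.arrived = set()

    def on_arrival(self, obj):
        """Register a pushed object; return subplans that became runnable."""
        self.arrived.add(obj)
        runnable = [c for c in self.pending
                    if all(o in self.arrived for o in c)]
        for c in runnable:
            self.pending.discard(c)
            self.executed.add(c)
        return runnable

state = MJoinState({"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]})
for obj in ["A1", "B1", "C1", "A2"]:        # out-of-order push from storage
    state.on_arrival(obj)
# executed: A1B1C1 and A2B1C1; the six other subplans remain pending
```

This reproduces the slide's example: after A1, B1, C1, A2 arrive, exactly the two subplans A1B1C1 and A2B1C1 have run.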
Progress-driven caching

Setting: query A ⋈ B ⋈ C; cache size 4; the cache holds A1, A2, B1, C1 and is full; C2 arrives, so one object must be evicted. Executed: A1B1C1, A2B1C1; pending: A1B1C2, A1B2C1, A1B2C2, A2B1C2, A2B2C1, A2B2C2.

LRU would drop B1 and make no progress: without B1, the cache {A1, A2, C1, C2} completes no pending subplan. The new "max progress" policy instead scores each candidate cache state by how many pending subplans it completes: replacing A1 or A2 yields progress 1, while replacing C1 yields {A1, A2, B1, C2} with progress 2 (A1B1C2 and A2B1C2), so C1 is evicted.

Progress-driven caching minimizes data round-trips and maximizes query progress.
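A small sketch of the "max progress" eviction decision described above (function names and structure are assumptions for illustration, not Skipper's actual code): try each cached object as the victim and keep the replacement that lets the cache complete the most pending subplans.

```python
# Hypothetical sketch of progress-driven ("max progress") cache eviction.

def progress(cache, pending):
    """Number of pending subplans whose objects are all in the cache."""
    return sum(1 for combo in pending if all(o in cache for o in combo))

def evict_max_progress(cache, incoming, pending):
    """Pick the victim whose replacement by `incoming` maximizes progress."""
    best_victim, best_score = None, -1
    for victim in cache:
        candidate = (cache - {victim}) | {incoming}
        score = progress(candidate, pending)
        if score > best_score:
            best_victim, best_score = victim, score
    return best_victim, best_score

pending = {("A1", "B1", "C2"), ("A1", "B2", "C1"), ("A1", "B2", "C2"),
           ("A2", "B1", "C2"), ("A2", "B2", "C1"), ("A2", "B2", "C2")}
victim, score = evict_max_progress({"A1", "A2", "B1", "C1"}, "C2", pending)
# Evicting C1 keeps {A1, A2, B1, C2}, which completes A1B1C2 and A2B1C2
```

On the slide's example this picks C1 with progress 2, matching the hand-worked decision above.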
Rank-based scheduling

Which group should the CSD switch to next?

[Figure: the I/O scheduler keeps a group table mapping groups to requested objects — G1: O1 (DB1), O3 (DB3); G2: O2 (DB2), O4 (DB4); G3: O5 (DB5)]

- FCFS: fair but inefficient — serving O1, O2, O3, O4, O5 in arrival order switches groups on almost every request.
- Max-requests (always serve the group with the most pending requests): efficient but unfair — G3's lone request O5 starves.
- New ranking algorithm: Rank(G) = #Requests + ∑Wait. The request count provides efficiency; the accumulated waiting time provides fairness.

Rank-based scheduling balances efficiency and fairness.
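The ranking rule can be sketched as follows. This is an illustrative implementation under my own naming, not Skipper's scheduler code: each group's rank is its pending request count plus the total time those requests have waited, so busy groups win early (efficiency) while a lone request accumulates rank until it must be served (fairness).

```python
# Hypothetical sketch of rank-based I/O scheduling:
# Rank(G) = #Requests(G) + sum of per-request waiting times.

class RankScheduler:
    def __init__(self):
        self.queues = {}          # group -> arrival times of pending requests

    def request(self, group, now):
        self.queues.setdefault(group, []).append(now)

    def rank(self, group, now):
        waits = self.queues.get(group, [])
        return len(waits) + sum(now - t for t in waits)

    def next_group(self, now):
        """Switch to (and drain) the group with the highest rank."""
        best = max(self.queues, key=lambda g: self.rank(g, now))
        self.queues.pop(best)
        return best

sched = RankScheduler()
for g, t in [("G1", 0), ("G1", 0), ("G2", 0), ("G2", 0), ("G3", 0)]:
    sched.request(g, t)
first = sched.next_group(now=1)   # a two-request group is served first
# G3's single request keeps accumulating wait, so it is served eventually
```

Unlike pure max-requests, the ∑Wait term guarantees G3 cannot starve: its rank grows without bound while it waits.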
Skipper in action

[Figure: average execution time (×1000 s) vs. number of clients, and vs. group switch latency (s) — Skipper tracks the ideal closely while plain PostgreSQL degrades]

Setting: multitenant enterprise datacenter; clients run TPC-H SF 50, Q12; CSD shared; layout: one client per group.

Skipper performs within 20% of the HDD-based capacity tier and is resilient to group switch latency.
Minimizing group switches

[Figure: execution-time breakdown (%) for PostgreSQL vs. Skipper — transfer time, switch time, processing]

Setting: multitenant enterprise datacenter; 5 clients; TPC-H SF 50, Q12; CSD shared; layout: one client per group.

Skipper substantially reduces the overhead of group switches.
Conclusions

- Cold storage can substantially reduce TCO
  – But DBMS performance suffers due to pull-based execution
- Skipper enables efficient query execution over CSD with:
  – Out-of-order execution based on multi-way joins
  – A novel progress-based caching policy
  – Rank-based I/O scheduling
- Skipper makes data analytics over CSD as a service possible
  – Providers reduce cost by offloading data to CSD
  – Customers reduce cost by running inexpensive data analytics over CSD

Thank you!