GPU-Accelerated Analytics on your Data Lake.

Data Lake @blazingdb

Data Swamp

ETL Hell
[Figure: streams of arrows and binary data surrounding a block labeled DATA LAKE]

COMMON DATA LAYER

Simplify Data Storage
• Schema
• Metadata
• Data

SQL Warehouse on Data Lake

BlazingDB – How it works

Processing
• Compression/Decompression
• Filtering (Predicate Pushdown)
• Aggregations
• Transformations
• Joins
• Sorting/Ordering

Caching
• RAM Cache (Hot)
• Disk Cache (Medium)

Local Disk
• HDD
• SSD

Data Lake
• HDFS
• AWS S3
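The cache tiers listed above can be illustrated with a small read-path sketch. This is not BlazingDB's implementation: the `TieredCache` class, its LRU eviction policy, the `fetch_from_lake` callback, and the file keys are all hypothetical names chosen for illustration. It only shows the general idea of serving a read from RAM (hot) when possible, from local disk (medium) otherwise, and from the data lake (cold) as a last resort.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple


class TieredCache:
    """Toy read path: RAM cache (hot) -> disk cache (medium) -> data lake (cold)."""

    def __init__(self, fetch_from_lake: Callable[[str], bytes], ram_capacity: int = 2):
        self._fetch_from_lake = fetch_from_lake  # stands in for an HDFS / S3 read
        self._ram: "OrderedDict[str, bytes]" = OrderedDict()  # LRU order, oldest first
        self._disk: Dict[str, bytes] = {}  # stands in for files on local SSD/HDD
        self._ram_capacity = ram_capacity

    def read(self, key: str) -> Tuple[bytes, str]:
        """Return (data, tier), where tier is 'hot', 'medium', or 'cold'."""
        if key in self._ram:
            self._ram.move_to_end(key)  # mark as most recently used
            return self._ram[key], "hot"
        if key in self._disk:
            data, tier = self._disk[key], "medium"
        else:
            data, tier = self._fetch_from_lake(key), "cold"
            self._disk[key] = data  # keep a local copy for the next read
        self._promote(key, data)
        return data, tier

    def _promote(self, key: str, data: bytes) -> None:
        """Insert into RAM, evicting the least recently used entry when full."""
        self._ram[key] = data
        self._ram.move_to_end(key)
        while len(self._ram) > self._ram_capacity:
            self._ram.popitem(last=False)  # the evicted data remains on disk


# Hypothetical data lake contents, keyed by file path.
lake = {"lineitem/part-0": b"a", "lineitem/part-1": b"b", "orders/part-0": b"c"}
cache = TieredCache(fetch_from_lake=lake.__getitem__, ram_capacity=2)
reads = ["lineitem/part-0", "lineitem/part-0", "lineitem/part-1",
         "orders/part-0", "lineitem/part-0"]
tiers = [cache.read(k)[1] for k in reads]
print(tiers)  # ['cold', 'hot', 'cold', 'cold', 'medium']
```

The last read comes back as "medium" because the file was evicted from RAM but is still held in the disk cache, which corresponds to the "disk cache only" condition measured in the demo.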

BlazingDB Multi-nodal Cluster

Shared Data Architecture
[Figure: DATA LAKE]

The Nays
• No Ingest
• No Duplication
• No BlazingDB Specific ETL
• No Consistency Management
• No Vendor Lock-in

The Yays
• Incredibly Fast SQL
• Scalable, On Demand Data Warehouse
• Multi-Terabyte Queries
• Data Sharing (Across Clusters And Other Tools)
• High Concurrency

DEMO

Demo - Architecture
• HDFS on Azure
• Azure GPU Servers: NC24 V1
• 4 Servers

Queries: BlazingDB 4 Node Query times (Lower is better)
[Bar chart: query time in seconds for Query 1 through Query 5, each run Cold, Medium (Disk cache only), and Hot. Values as extracted from the chart, not matched to individual bars: 380.5, 281.1, 251.8, 154.1, 142.1, 135.5, 73.6, 73.8, 72, 63.1, 46, 46.3, 14.9, 14, 12.2]

Query 1

    select l_returnflag, l_linestatus,
           sum(l_quantity) as sum_qty,
           sum(l_extendedprice) as sum_base_price,
           sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
           sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
           avg(l_quantity) as avg_qty,
           avg(l_extendedprice) as avg_price,
           avg(l_discount) as avg_disc,
           count(l_quantity) as count_order
    from lineitem
    where l_shipdate <= '1995-06-01'
    group by l_returnflag, l_linestatus
    order by l_returnflag, l_linestatus;

[Chart: Query 1 time in seconds, Cold / Medium (Disk cache only) / Hot]

Data Points
• 6 billion row table
• Many aggregations/transformations

Query 2

    select lineitem.l_orderkey,
           sum(lineitem.l_extendedprice*(1-lineitem.l_discount)) as revenue,
           orders.o_orderdate, orders.o_shippriority
    from customer
    inner join orders on customer.c_custkey = orders.o_custkey
    inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
    where customer.c_mktsegment = 'BUILDING'
      and orders.o_orderdate < '1995-03-15'
      and lineitem.l_shipdate > '1995-03-15'
    group by lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
    order by revenue desc, orders.o_orderdate;

[Chart: Query 2 time in seconds, Cold / Medium (Disk cache only) / Hot]

Data Points
• Join 6B rows to 1.5B rows to 150M rows
• Many aggregations/transformations
• Order (sorting)

Query 3

    select nation.name,
           sum(lineitem.l_extendedprice * (1 - lineitem.l_discount)) as revenue
    from customer
    inner join orders on customer.cust_key = orders.o_custkey
    inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
    inner join supplier on lineitem.l_suppkey = supplier.s_suppkey
    inner join nation on supplier.s_nationkey = nation.nation_key
    inner join region on nation.region_key = region.r_regionkey
    where supplier.s_nationkey = nation.nation_key
      and region.r_name = 'ASIA'
      and orders.o_orderdate >= '19940101'
      and orders.o_orderdate < '19950101'
    group by nation.name
    order by revenue desc

[Chart: Query 3 time in seconds, Cold / Medium (Disk cache only) / Hot]

Data Points
• Join 6B rows to 1.5B rows to 150M rows (and many small joins)
• Multiple aggregations/transformations
• Order (sorting)

Query 4

    select sum(l_extendedprice) as sum_exprice,
           sum(l_discount) as sum_discount
    from lineitem
    where l_shipdate >= '19940101'
      and l_shipdate < '19950101'
      and l_discount >= 0.05 and l_discount <= 0.07
      and l_quantity < 24

[Chart: Query 4 time in seconds, Cold / Medium (Disk cache only) / Hot]

Data Points
• 6B row table
• Multiple aggregations/transformations
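To make the filter semantics of Query 4 concrete, here is a minimal sketch that runs the same statement against a five-row toy table using Python's built-in `sqlite3` module. It does not involve BlazingDB or the 6B-row demo data. The column types, the sample rows, and the storage of `l_shipdate` as a `YYYYMMDD` string are assumptions made only so the query can run locally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: only the four columns that Query 4 touches.
conn.execute(
    "create table lineitem ("
    " l_extendedprice real, l_discount real, l_quantity real, l_shipdate text)"
)
rows = [
    (100.0, 0.06, 10, "19940315"),  # passes every filter
    (200.0, 0.05, 23, "19941231"),  # passes: the discount bounds are inclusive
    (300.0, 0.06, 10, "19950101"),  # fails: the upper ship-date bound is exclusive
    (400.0, 0.10, 10, "19940601"),  # fails: discount is outside [0.05, 0.07]
    (500.0, 0.06, 24, "19940601"),  # fails: quantity must be below 24
]
conn.executemany("insert into lineitem values (?, ?, ?, ?)", rows)

sum_exprice, sum_discount = conn.execute(
    """
    select sum(l_extendedprice) as sum_exprice,
           sum(l_discount) as sum_discount
    from lineitem
    where l_shipdate >= '19940101'
      and l_shipdate < '19950101'
      and l_discount >= 0.05 and l_discount <= 0.07
      and l_quantity < 24
    """
).fetchone()
print(sum_exprice, round(sum_discount, 2))  # 300.0 0.11
```

Only the first two rows survive the predicates, so the result is a single row of two sums. Because the whole query reduces to filters followed by aggregation over one table, it is a natural fit for the predicate pushdown step listed on the "How it works" slide.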

Query 5

    select supplier.s_acctbal, supplier.s_suppkey, nation.name,
           part.p_partkey, part.p_mfgr, supplier.s_address,
           supplier.s_phone, supplier.s_comment
    from supplier
    inner join partsupp on supplier.s_suppkey = partsupp.ps_suppkey
    inner join nation on supplier.s_nationkey = nation.nation_key
    inner join region on nation.region_key = region.r_regionkey
    inner join part on part.p_partkey = partsupp.ps_partkey
    where part.p_size = 15
      and part.p_type in (
          'ECONOMY ANODIZED BRASS', 'ECONOMY BRUSHED BRASS',
          'ECONOMY BURNISHED BRASS', 'ECONOMY PLATED BRASS',
          'ECONOMY POLISHED BRASS', 'LARGE ANODIZED BRASS',
          'LARGE BRUSHED BRASS', 'LARGE BURNISHED BRASS',
          'LARGE PLATED BRASS', 'LARGE POLISHED BRASS',
          'SMALL ANODIZED BRASS', 'SMALL BRUSHED BRASS',
          'SMALL BURNISHED BRASS', 'SMALL PLATED BRASS',
          'SMALL POLISHED BRASS', 'STANDARD ANODIZED BRASS',
          'STANDARD BRUSHED BRASS', 'STANDARD BURNISHED BRASS',
          'STANDARD PLATED BRASS', 'STANDARD POLISHED BRASS')
      and region.r_name = 'EUROPE'
    order by supplier.s_acctbal desc, supplier.s_suppkey,
             nation.name, part.p_partkey

[Chart: Query 5 time in seconds, Cold / Medium (Disk cache only) / Hot]

Data Points
• Join multiple tables
• Many aggregations/transformations
• String comparisons

Data Pipeline - Coming Soon
• INGEST
• Common Data Layer: GPU Data Frame, Apache Arrow
• STORAGE (Data Lake)

Questions?
