Data Lake to AI on GPUs
@blazingdb
CPUs can no longer handle the growing data demands of data science workloads.

Slow Process: Preparing data and training models can take days or even weeks.

Suboptimal Infrastructure: Hundreds to tens of thousands of CPU servers are needed in data centers.
GPUs are well known for accelerating the training of machine learning and deep learning models.
Deep Learning (Neural Networks) and Machine Learning

Performance improvements increase at scale: up to a 40x improvement over CPU.
But data preparation still happens on CPUs, and it can't keep up with GPU-accelerated machine learning. Enterprise GPU users find it challenging to "Feed the Beast".

Apache Spark (all CPU): Query → ETL → ML Train

Apache Spark + GPU ML: Query → ETL (CPU) → ML Train (GPU)
An end-to-end analytics solution on GPUs is the only way to maximize GPU power.
Expertise: GPU DBMS, GPU Columnar Analytics, Data Lakes
Expertise: CUDA, Machine Learning, Deep Learning
Expertise: Python, Data Science, Machine Learning

RAPIDS (All GPU): Query → ETL → ML Train
RAPIDS, the end-to-end GPU analytics ecosystem
cuDF: Data Preparation
cuML: Machine Learning
cuGraph: Graph Analytics

Data Preparation → Model Training → Visualization
A set of open-source libraries for GPU-accelerated data preparation and machine learning.
In GPU Memory

import cudf
from cuml import KNN
import numpy as np

np_float = np.array([
    [1, 2, 3],  # Point 1
    [1, 2, 3],  # Point 2
    [1, 2, 3],  # Point 3
]).astype('float32')

gdf_float = cudf.DataFrame()
gdf_float['dim_0'] = np.ascontiguousarray(np_float[:, 0])
gdf_float['dim_1'] = np.ascontiguousarray(np_float[:, 1])
gdf_float['dim_2'] = np.ascontiguousarray(np_float[:, 2])

print('n_samples = 3, n_dims = 3')
print(gdf_float)

knn_float = KNN(n_gpus=1)
knn_float.fit(gdf_float)
Distance, Index = knn_float.query(gdf_float, k=3)  # Get 3 nearest neighbors

print(Index)
print(Distance)
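For readers without a GPU, the same 3-nearest-neighbour query can be checked with plain NumPy. This is a sketch, not the cuML implementation; the points here are invented (the slide's three identical points would make every distance zero):

```python
import numpy as np

# CPU sketch of a 3-nearest-neighbour query (no GPU required).
pts = np.array([[1., 2., 3.],
                [4., 5., 6.],
                [7., 8., 9.]], dtype='float32')

# Pairwise Euclidean distances via broadcasting: (n, 1, d) - (1, n, d).
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

# For each point, the indices and distances of its 3 nearest neighbours.
# Each point's own row includes itself at distance 0, just as cuML's
# query over the training set does.
index = np.argsort(dist, axis=1)[:, :3]
distance = np.take_along_axis(dist, index, axis=1)

print(index)
print(distance)
```

Swapping `pts` for a cuDF DataFrame and the broadcasting for `KNN.fit`/`KNN.query` gives the GPU version shown above.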
BlazingSQL: The GPU SQL Engine on RAPIDS
A SQL engine built on RAPIDS. Query enterprise data lakes lightning fast with full interoperability with the RAPIDS stack.
cuDF: Data Preparation
cuML: Machine Learning
cuGraph: Graph Analytics
BlazingSQL: The GPU SQL Engine for RAPIDS

In GPU Memory
from blazingsql import BlazingContext

bc = BlazingContext()

# Register filesystem
bc.hdfs('data', host='129.13.0.12', port=54310)

# Create table
bc.create_table('performance', file_type='parquet', path='hdfs://data/performance/')

# Execute query
result_gdf = bc.run_query('SELECT * FROM performance WHERE YEAR(maturity_date) > 2005')
print(result_gdf)
Getting Started Demo
BlazingSQL + XGBoost Loan Risk Demo
Train a model to assess risk of new mortgage loans based on Fannie Mae loan performance data.

Mortgage Data → ETL / Feature Engineering → XGBoost Training

4.22M Loans, 148M Perf. Records, CSV Files on HDFS
GPU cluster: 1 node, 16 vCPUs per node, 1 Tesla T4 GPU (2,560 CUDA cores, 16 GB VRAM)

CPU cluster: 4 nodes, 8 vCPUs and 30 GB RAM per node
RAPIDS + BlazingSQL outperforms traditional CPU pipelines
Demo Timings (ETL Phase): bar chart comparing the 3.8 GB and 15.6 GB datasets on the GPU cluster (1 x T4) vs. the CPU cluster (4 nodes); time in seconds, 0 to 3,000 s scale.
Scale up the data on a DGX with 4 x V100 GPUs.
BlazingSQL + Graphistry Netflow Analysis
Visually analyze the VAST netflow data set inside Graphistry in order to quickly detect anomalous events.
Netflow Data → ETL → Visualization

65M Events, 2 Weeks, 1,440 Devices
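The ETL step boils raw flow records down to per-device summaries before Graphistry renders them. A minimal sketch of that kind of aggregation, using pandas as a stand-in for cuDF; the column names and the simple median-based anomaly threshold are invented for illustration:

```python
import pandas as pd

# Hypothetical miniature of the netflow ETL step (column names invented):
# aggregate per-device traffic so outliers stand out in the visualization.
flows = pd.DataFrame({
    'src_device': ['a', 'a', 'b', 'b', 'b', 'c'],
    'dst_device': ['b', 'c', 'a', 'c', 'c', 'a'],
    'bytes':      [100, 150, 200, 50, 5000, 75],
})

per_device = (flows.groupby('src_device')['bytes']
                   .agg(['count', 'sum'])
                   .rename(columns={'count': 'flows', 'sum': 'total_bytes'}))

# Flag devices whose traffic is far above the median as candidate anomalies.
threshold = 10 * per_device['total_bytes'].median()
per_device['anomalous'] = per_device['total_bytes'] > threshold
print(per_device)
```

In the demo, a table like `per_device` (or the edge list itself) is what gets handed to Graphistry for visual inspection.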
Benchmarks
Netflow Demo Timings (ETL Only)
Benefits of BlazingSQL

Blazing Fast. Massive time savings with our GPU-accelerated ETL pipeline.

Stateless and Simple. Stateless underlying services reduce complexity and increase extensibility.

Data Lake to RAPIDS. Query data from data lakes directly with SQL into GPU memory, and let RAPIDS do the rest.

Minimal Code Changes Required. RAPIDS with BlazingSQL mirrors the pandas and SQL interfaces for seamless onboarding.
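A small sketch of that mirroring, using pandas as a stand-in for cuDF (which follows the pandas API); the table and column names are invented for illustration:

```python
import pandas as pd

# Hypothetical table; in the demo this would be a cuDF DataFrame
# returned by BlazingContext.run_query.
perf = pd.DataFrame({
    'loan_id': [1, 2, 3],
    'maturity_year': [2003, 2010, 2021],
})

# SQL (BlazingSQL):
#   SELECT * FROM performance WHERE maturity_year > 2005
# DataFrame (cuDF / pandas) equivalent:
recent = perf[perf['maturity_year'] > 2005]
print(recent)
```

The same filter reads naturally in either interface, which is the point: teams can onboard with whichever style they already know.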
Upcoming BlazingSQL Releases
V0.1 Query GDFs: Use the PyBlazing connection to execute SQL queries on GDFs that are loaded by the cuDF API.
V0.2 Direct Query Flat Files: Integrate the FileSystem API, adding the ability to directly query flat files (Apache Parquet & CSV) inside distributed file systems.
V0.3 Distributed Scheduler: SQL queries are fanned out across multiple GPUs and servers.
V0.4 String Support: String and string operation support.
V0.5 Physical Plan Optimizer: Partition culling for WHERE clauses and joins.