GPU OPEN ANALYTICS INITIATIVE END-TO-END ACCELERATED ANALYTICS Brad - PowerPoint PPT Presentation

GPU OPEN ANALYTICS INITIATIVE END-TO-END ACCELERATED ANALYTICS Brad Rees, Ph.D. - Senior Solution Architect - NVIDIA GTC DC, November 2017 The AI Computing Company

AGENDA – TWO PARTS
Discuss Analysis from the Perspective of Data Science
• Part 1
  • Big Data and Spark
  • GPU Barriers
• Part 2
  • GOAI
“Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data …” - Wikipedia
Better Exploration ∝ Better Science
Faster Analytics yield better Exploration
Fail Fast Needs to be Embraced
“I have not failed. I've just found 10,000 ways that won't work.” - Thomas A. Edison

the Big Data Catalyst
The Glue that Binds Big Data
• Spark has become synonymous with Hadoop and Big Data
• It’s the interface/API for big data app to app communication
• The processing layer for big data and leading ML framework

SPARK IS NOT ENOUGH
We Want More Efficiency and Speed
• Common issue is speed at scale
• Scaling out to get the necessary speed for mission critical workloads is prohibitively expensive
• Clients want core ML on GPU
Commercial, Government, HPC
We need a GPU-equivalent to Spark … But there are some Barriers

GPU ADOPTION BARRIERS
• Too much data movement
• Too many makeshift data formats
• No inter-GPU communication
• No Python API for data manipulation
• No all inclusive Machine Learning Library
Concerns:
• Too Hard to Integrate GPUs
• Not suited for Data Science

DATA MOVEMENT AND TRANSFORMATION
The bane of productivity / performance
• Too much time spent moving data
• Data movement and conversion hinder any performance gains
• No Inter-GPU Communication

DATA FORMATS
Parquet, GML, CSV, Pandas, Avro, HDFS, XML, Numpy, JSON, Pickle, CSC, ProtoBuf, CSR, COO *
Plain Text vs Binary
Compressed vs Uncompressed
* Not a complete list
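The two contrasts on this slide (plain text vs binary, compressed vs uncompressed) can be made concrete with a small sketch. It uses only the Python standard library, and the sample column is a hypothetical stand-in invented for illustration; it is not taken from the presentation and does not use any of the formats listed above.

```python
import array
import csv
import io
import zlib

# Hypothetical sample column: 10,000 readings with many repeated values.
values = [float(i % 100) for i in range(10_000)]

# Plain text: one value per CSV row.
text_buf = io.StringIO()
writer = csv.writer(text_buf)
for v in values:
    writer.writerow([v])
text_bytes = text_buf.getvalue().encode("utf-8")

# Binary: the same values packed as 8-byte doubles.
binary_bytes = array.array("d", values).tobytes()

# Compressed variants of both encodings.
text_compressed = zlib.compress(text_bytes)
binary_compressed = zlib.compress(binary_bytes)

# Reading the binary form back needs no text parsing,
# only a typed interpretation of the bytes.
restored = array.array("d")
restored.frombytes(binary_bytes)

sizes = {
    "text": len(text_bytes),
    "binary": len(binary_bytes),
    "text+zlib": len(text_compressed),
    "binary+zlib": len(binary_compressed),
}
for name, size in sizes.items():
    print(f"{name:>12}: {size} bytes")
```

Every pairing of encoding and compression is a distinct on-disk format that a consumer has to understand, which is one way to read the slide's complaint about too many makeshift formats.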

ARE THE GPU BARRIERS TOO GREAT?
Is there any hope?
☹️ Data movement
☹️ Data formats
☹️ Inter-GPU communication
☹️ No Python API for data manipulation
☹️ No all inclusive Machine Learning Library

GPU OPEN ANALYTICS INITIATIVE Luckily others were also thinking about the problems • Formed in March at Strata SJ; Launched at GTC in May • Goal: GOAI seeks to foster open collaboration between GPU analytics projects and products to enable data scientists to efficiently combine the best tools for their workflows.

ACCELERATED ANALYTICS ECOSYSTEM
Prior State (pre-March 2017)
● Fragmented with too many holes
● Still too reliant on CPU for moving data between applications
● 80-90% of data science is accelerated analytics, not deep learning yet
[Diagram, by layer]
INTERACTION: Graphistry, Jupyter NB, MapD Immerse
DATA PROCESSING AND ANALYTICS IN GPU MEMORY (Data Manipulation): MapD, Anaconda* (Dask “Python”), Fast Data (Streaming), BlazingDB (“SQL”), NV Graph
DATA STRUCTURE: Many Columnar Data Frames (everyone has their own makeshift data frame)
STORAGE: MapD (GPU RAM), BlazingDB (Disk)
Key: Open Source, Free to Use, Closed Source
* Primarily x86 w/ some GPU acceleration

ACCELERATED ANALYTICS ECOSYSTEM
Post-March 2017
[Diagram, by layer]
INTERACTION: Graphistry, Jupyter NB, MapD Immerse
DATA PROCESSING AND ANALYTICS IN GPU MEMORY (Data Manipulation): MapD, Anaconda (Dask “Python”), H2O (Data.Table “R”), H2O.ai (GPU MLlib), Fast Data (Streaming), BlazingDB (“SQL”), NV Graph
DATA STRUCTURE: Standard Columnar Data Frame (Open Sourced/Free to Use from MapD)
STORAGE: MapD (GPU RAM), BlazingDB (Disk), MapD + BlazingDB (System Memory)
Key: Open Source, Free to Use, Closed Source

LEARNING FROM APACHE ARROW
Interoperability
The Big Data ecosystem is facing similar issues: there is a major push in the big data world to remove the bottlenecks of copying and converting data between systems.
• Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations.
• Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.
• The Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.
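A minimal sketch of the two ideas named on this slide, columnar layout and zero-copy reads. It uses only Python's standard library (`array` and `memoryview`), not the Arrow API, and the three-row table is hypothetical. It illustrates the principle, not how Arrow itself is implemented.

```python
import array

# Hypothetical three-row table with two numeric fields.
rows = [(1, 10.5), (2, 20.25), (3, 30.0)]

# Columnar layout: each field lives in its own contiguous, typed buffer.
ids = array.array("q", (r[0] for r in rows))     # 64-bit signed integers
prices = array.array("d", (r[1] for r in rows))  # 64-bit floats

# A memoryview exposes the column's buffer without copying it.
price_view = memoryview(prices)
tail = price_view[1:]  # slicing a memoryview is also copy-free

# The view shares memory with the column, so a write to the column
# is visible through the view. A copy would still show 30.0.
prices[2] = 99.0
print(tail.tolist())  # [20.25, 99.0]

# A column-wise aggregate reads only that column's bytes;
# the ids buffer is never touched.
total = sum(price_view)
print(total)  # 129.75
```

In a row-oriented layout the same aggregate would have to step over every `id` to reach each price, which is the data-locality point the slide makes.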

THE GPU DATA FRAME
First GOAI Project
✓ Data movement
✓ Data formats
✓ Inter-GPU communication
✓ Python API
✓ Machine Learning Library
So … What does this get me?

SEAMLESS CALLS BETWEEN APPLICATIONS
What does GOAI get me? Big improvement for Data Science
• Load data into MapD
• Call an H2O ML algorithm
• All via Anaconda Python
• Within a Jupyter Notebook
Demos available on goai github

SEAMLESS CALLS BETWEEN APPLICATIONS
What does GOAI get me? Big improvement for Data Science
• Load data into MapD
• Call an H2O ML algorithm
• All via Anaconda Python
• Within a Jupyter Notebook
• pygdf: Python library for manipulating GDFs
  • Creating GDFs from numpy arrays and Pandas DataFrames
  • Performing math operations on columns
  • Import/export via CUDA IPC
  • Sort, join, reductions
  • JIT compilation of group by and filter kernels using Numba
Demos available on goai github

SIMPLE DATA CONVERSION Convert from Pandas and Numpy

Several Examples Available on GOAI GitHub

GOAL OF GOAI
Better Adoption with Better Usability and TCO
• Hadoop Processing, Reading from disk
  HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train
• Spark In-Memory Processing
  HDFS Read → Query → ETL → ML Train
  25-100x Improvement over Hadoop; Large TCO benefit; Less code; Language flexible; Large Adoption; Primarily In-Memory
• GPU + Spark In-Memory Processing
  HDFS Read → GPU Read → Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train
  5-10x Improvement over Spark; Small TCO benefit; More code; Language rigid; Small Adoption; Substantially on GPU
• End-to-End GPU Processing (GOAI)
  Arrow Read → Query → ETL → ML Train
  25-100x Improvement over Spark; Large TCO benefit; Same code; Language flexible; Large Adoption?; Primarily on GPU

INITIAL LIBRARIES
GPU Data Frame
github.com/gpuopenanalytics
• libgdf: C library of helper functions:
  • Copying GDF metadata block to the host and parsing it to a host-side struct
  • Importing/exporting via CUDA IPC
  • CUDA kernels to perform element-wise math operations on GDF columns
  • CUDA sort, join, and reduction operations on GDFs
• pygdf: Python library for manipulating GDFs
  • Creating GDFs from numpy arrays and Pandas DataFrames
  • Performing math operations on columns
  • Import/export via CUDA IPC
  • Sort, join, reductions
  • JIT compilation of group by and filter kernels using Numba
• dask_gdf: Extension for Dask to work with distributed GDFs
  • Same operations as pygdf, but working on GDFs chunked onto different GPUs and different servers
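The dask_gdf description (the same operations, applied to chunks that live on different GPUs or servers) follows a common split, apply, combine pattern. The sketch below shows that pattern for a group-by sum in plain CPU Python. It does not use the pygdf or dask_gdf APIs. The helper names `partition`, `partial_group_sum`, and `combine` are hypothetical and exist only for this illustration, and each list slice merely stands in for a chunk held on a separate device.

```python
from collections import defaultdict


def partition(column, n_chunks):
    """Split a column into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(column), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(column[start:end])
        start = end
    return chunks


def partial_group_sum(keys, values):
    """Per-chunk step: a group-by sum that sees one chunk only."""
    acc = defaultdict(float)
    for k, v in zip(keys, values):
        acc[k] += v
    return dict(acc)


def combine(partials):
    """Merge step: add the per-chunk results together key by key."""
    total = defaultdict(float)
    for part in partials:
        for k, v in part.items():
            total[k] += v
    return dict(total)


# Hypothetical two-column table.
keys = ["a", "b", "a", "c", "b", "a", "c"]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Each chunk pair stands in for data resident on a different GPU.
key_chunks = partition(keys, 3)
value_chunks = partition(values, 3)

partials = [partial_group_sum(k, v) for k, v in zip(key_chunks, value_chunks)]
result = combine(partials)
print(result)  # {'a': 10.0, 'b': 7.0, 'c': 11.0}
```

Only the small per-chunk dictionaries cross chunk boundaries in the merge step, so the bulk of the data never has to move, which is the property the earlier data-movement slides ask for.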

• ~8.5x speedup on half a DGX to produce a robust GLM via 10-fold cross-validation vs an 8 node Spark cluster
• Python on GPU... Numba and Pandas
• ~100x speedup using MapD on half a DGX to analyze census data vs a 20 node Spark cluster
• ~5X faster than Redshift to utilize full disk storage and system memory
• >50x speedup in performing pagerank on a graph on half a DGX vs an 8 node Spark cluster
• ~100x more cyber security data interactively visualized using an intuitive layout algorithm on a single GPU as a connected graph

MapD GPU-accelerated analytics platform Consists of MapD Core database and MapD Immerse MapD Core database is an in-GPU-memory, columnar, open-source, GPU-accelerated, SQL database. MapD Enterprise brings distributed and high availability modes, GPU-accelerated backend rendering, Kerberos/LDAP security, and ODBC/JDBC. MapD Immerse is a visual analytics platform on top of the MapD Core database that allows data scientists and analysts to interactively explore large datasets.

1.1 BILLION TAXI RIDES BENCHMARK
[Bar chart: time in milliseconds for Query 1 through Query 4 on MapD DGX-1, Kinetica DGX-1, Redshift 6-node, and Spark 11-node]
• GPU memory-based databases are 8x to 15x faster than CPU in-memory databases such as Redshift
• 100x to 485x faster than Spark on 11 servers
• Open Source core DBMS
• Free Community Edition
Source (@marklit82): MapD Benchmarks on DGX-1 from internal NVIDIA testing following guidelines of Mark Litwintschik’s blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS

BlazingDB
GPU-accelerated petabyte scale data warehouse
The BlazingDB database is a disk-based, columnar, GPU-accelerated SQL database. BlazingDB has distributed and high availability modes, JDBC, and Python/C# APIs. BlazingDB offers a Community Edition that can be downloaded for free and an Enterprise Edition that you can launch today on AWS.

BlazingDB
High performance SQL on petabyte scale
• Blazing speedup
• BlazingDB SQL is built on a columnar relational data model
• Enterprise grade security through Spring Security
• BlazingDB distributes both data and computation to multiple instances, for more data or faster query speeds
• https://blazingdb.com/
