GOAI ONE YEAR LATER - Joshua Patterson, Director AI Infrastructure

SLIDE 1

Joshua Patterson, Director AI Infrastructure 3/27/18 @datametrician

GOAI ONE YEAR LATER

SLIDE 2

THE WORLD WE ANALYZE

Realities of Data

SLIDE 3

[Chart: transistors (thousands) and single-threaded performance, 1980-2020; single-threaded perf growth slowed from 1.5x per year to 1.1x per year]

IN A FINITE CRISIS

CPU Performance Has Plateaued

SLIDE 4

[Chart: GPU-computing performance keeps growing at 1.5x per year while single-threaded CPU performance grows only 1.1x per year, projecting a 1000x gap by 2025]

IN A FINITE CRISIS

GPU Performance Grows

SLIDE 5

[Chart: peak double-precision throughput (TFLOPS), 2008-2017, NVIDIA GPU vs. x86 CPU]

IN A FINITE CRISIS

CPU Performance Has Plateaued

SLIDE 6

PRE / 52 WEEKS LATER

Fast Was Made Slow

[Diagram: App A and App B each read and load data separately, with repeated copy & convert steps between CPU and GPU as data moves among H2O.ai, Continuum, Gunrock, Graphistry, BlazingDB, and MapD; each app ends up holding its own GPU data]

SLIDE 7

GPU LEADERS UNITE

Could We Do Better Than Big Data?

SLIDE 8

TRADITIONAL DATA SCIENCE ON GPUS

Lots of glue code, plagued by copy-and-converts

[Diagram: three pipelines compared]
  • Hadoop processing, reading from disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train
  • Spark in-memory processing: HDFS Read → Query → ETL → ML Train (25-100x improvement, less code, language flexible, primarily in-memory)
  • GPU/Spark in-memory processing: HDFS Read → GPU Read → Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train (5-10x improvement, more code, language rigid, substantially on GPU)

  • Each system has a different internal memory format with mostly overlapping functionality
  • Depending on the workflow, 80+% of time and computation is wasted on the serialization, deserialization, and copying of data
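The copy-and-convert tax is easy to see even without a GPU. The following is a minimal pure-Python sketch (illustrative only, not GOAI code): two systems with different internal layouts must re-copy every value at each hand-off, which is exactly the overhead a shared memory format removes.

```python
# Illustrative sketch: why differing internal formats force
# copy-and-convert steps between systems.

def rows_to_columns(rows):
    """Convert a row-oriented table (list of dicts) to a
    column-oriented table (dict of lists) -- a full copy."""
    return {key: [row[key] for row in rows] for key in rows[0]}

def columns_to_rows(cols):
    """Convert back: another full copy of every value."""
    n = len(next(iter(cols.values())))
    return [{key: vals[i] for key, vals in cols.items()} for i in range(n)]

# "System A" stores rows; "System B" wants columns.
system_a = [{"id": 1, "v": 10.0}, {"id": 2, "v": 20.0}]
system_b = rows_to_columns(system_a)   # copy & convert #1
back_in_a = columns_to_rows(system_b)  # copy & convert #2

# Every hand-off re-copies the data; with a shared columnar format
# (the GPU Data Frame / Arrow idea), both systems would read the
# same buffer and neither conversion would run at all.
assert system_b == {"id": [1, 2], "v": [10.0, 20.0]}
assert back_in_a == system_a
```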

SLIDE 9

PRE / 52 WEEKS LATER

What We Want

[Diagram: data is read once on the CPU and loaded into a single GPU memory buffer shared by H2O.ai, Continuum, Gunrock, Graphistry, BlazingDB, and MapD]

SLIDE 10

GOAI AND THE BIG DATA ECOSYSTEM

No copy and converts on the GPU, compatible with Apache Arrow

No Copy & Converts - Full Interoperability

GPU Data Frame

[Diagram: H2O.ai, Continuum, Gunrock, Graphistry, BlazingDB, MapD, nvGRAPH, and Simantex all interoperating through the shared GPU Data Frame]

github.com/gpuopenanalytics
github.com/apache/arrow

  • All systems utilize the same memory format, so overhead for cross-system communication is minimized and projects can share features and functionality
  • Most, if not all, of the GPU Data Frame functionality is going back into Apache Arrow
  • Currently three GPU Data Frame libraries: libgdf (C library), pygdf (Python library), and dask_gdf (multi-GPU, multi-node Python library)
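The zero-copy sharing behind the GPU Data Frame can be sketched CPU-side with Python's `memoryview` (an analogy only: the real GDF shares device memory, but the principle of many consumers viewing one allocation is the same).

```python
# Sketch (plain Python, CPU memory) of the shared-buffer idea:
# several consumers view one allocation, zero copies.
import struct

# One contiguous buffer holding a column of three float64 values,
# standing in for a column in GPU device memory.
buf = bytearray(struct.pack("3d", 1.0, 2.0, 3.0))

# Two "systems" wrap the same memory; no bytes are duplicated.
view_a = memoryview(buf).cast("d")
view_b = memoryview(buf).cast("d")

view_a[1] = 42.0                 # a write through one view...
assert view_b[1] == 42.0         # ...is visible through the other
assert view_a.obj is view_b.obj  # both views share the same buffer
```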

SLIDE 11

DATA SCIENCE ON GPUS WITH GOAI + GDF

Faster Data Access, Less Data Movement

[Diagram: four pipelines compared]
  • Hadoop processing, reading from disk: HDFS Read → Query → HDFS Write → HDFS Read → ETL → HDFS Write → HDFS Read → ML Train
  • Spark in-memory processing: HDFS Read → Query → ETL → ML Train (25-100x improvement, less code, language flexible, primarily in-memory)
  • GPU/Spark in-memory processing: HDFS Read → GPU Read → Query → CPU Write → GPU Read → ETL → CPU Write → GPU Read → ML Train (5-10x improvement, more code, language rigid, substantially on GPU)
  • End-to-end GPU processing (GOAI): Arrow Read → Query → ETL → ML Train (10-25x improvement, same code, language flexible, primarily on GPU)

SLIDE 12

ANACONDA – PyGDF & DaskGDF

Moving From Traditional Flows

[Diagram: Database → ETL → Data Frame → Arrays / Sparse Matrix → Model]

Traditional Workflows

  • Data originates from a database
  • Nearly all data curation happens within the database (joins, group-bys, unions, etc.)
  • The database has already dealt with providing structure to the data, and contains nearly all usable data for ML
  • The output of the database is a data frame, where additional ETL is minimal
  • Manipulation of columns occurs here (encoding, transformations, training-variable creation)
  • The data structure is converted from a data frame to a matrix or arrays
  • Training runs many algorithms to find the most accurate method

SLIDE 13

PYGDF & DaskGDF UDFs

Python -> GPU Acceleration

  • Write a custom function in Python that gets JIT-compiled into a GPU kernel by Numba
  • Functions can be applied by row, column, or groupby group
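As a rough CPU-only illustration of this apply-by-row pattern (no Numba or GPU required; `apply_rows` and the UDF below are made-up stand-ins, not the pygdf API): the user supplies a plain Python function over column values, and the framework maps it element-wise. In pygdf, the same user function would instead be JIT-compiled by Numba into a GPU kernel.

```python
# CPU-only stand-in for the apply-by-row UDF pattern over a
# column-oriented table (dict of equal-length lists).

def apply_rows(columns, func, out_name):
    """Apply func element-wise across all input columns, storing
    the result as a new column named out_name."""
    names = list(columns)
    n = len(columns[names[0]])
    out = [func(*(columns[name][i] for name in names)) for i in range(n)]
    return {**columns, out_name: out}

# The user-defined function: plain Python arithmetic (this is the
# part Numba would compile into a GPU kernel in pygdf).
def scaled_sum(x, y):
    return 2.0 * x + y

table = {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}
result = apply_rows(table, scaled_sum, "z")
assert result["z"] == [12.0, 24.0, 36.0]
```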

SLIDE 14

ANACONDA – PyGDF & DaskGDF

To Complex Flows

[Diagram: streams, data lakes, and databases holding many data types feed multiple data frames, each flowing through its own ETL → arrays / sparse matrix → model]

Complex Workflows

  • Data originates from wherever developers can find it, and it's stored in many formats
  • With many groups using data in different ways, data is stored in formats for maximum usability, which pushes more manipulation into the ETL functions
  • The output of all these sources is varying data frames with varying structure
  • The ETL process is in charge of moving data into one usable format (from CSV, XML, JSON, database formats, Hadoop formats, etc.)
  • Data curation is performed on all the disparate data
  • Subsets of the data are created for different modeling targets
  • The rest of traditional ETL occurs, training with many algorithms occurs, and then a feedback loop back to ETL occurs
  • Back in the ETL process: if a subset of the data is the root cause of accuracy issues, new subsets are formed for new algorithmic approaches

SLIDE 15

PYGDF NEW JOINS

Faster Join Support Coming

TPCH Query 21 – End-to-End Results Using 32-bit Keys*

TIME (MS)               SF1     SF10    SF100
CPU (single-threaded)   1329    31731   465064
V100 (PCIe3)            22      164     1521    (~300x vs. CPU at SF100)
V100 (3xNVLINK2)        12      45      466     (~3.2x vs. PCIe3 at SF100)

TPCH Query 4 – End-to-End Results Using 32-bit Keys*

TIME (MS)               SF1     SF10    SF100
CPU (single-threaded)   150     2041    24960
V100 (PCIe3)            13      105     946     (~26x vs. CPU at SF100)
V100 (3xNVLINK2)        7       23      308     (~3.1x vs. PCIe3 at SF100)

*Assuming the input tables are loaded and pinned in system memory
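Behind these numbers is a relational join. A toy CPU hash join (an illustrative sketch with made-up column names, not libgdf code) shows the two phases being accelerated: build a hash table on the smaller table's key, then probe it with the larger table.

```python
# Minimal CPU sketch of a hash join: build on the smaller table,
# probe with the larger one. The GPU version runs the same two
# phases with massive parallelism.

def hash_join(build_rows, probe_rows, key):
    """Inner-join two lists of dicts on `key` (build side smaller)."""
    table = {}
    for row in build_rows:                  # build phase
        table.setdefault(row[key], []).append(row)
    out = []
    for row in probe_rows:                  # probe phase
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out

# Hypothetical mini-tables loosely shaped like TPC-H orders/lineitem.
orders = [{"o_key": 1, "status": "F"}, {"o_key": 2, "status": "O"}]
lineitem = [{"o_key": 1, "qty": 17}, {"o_key": 1, "qty": 36},
            {"o_key": 3, "qty": 8}]

joined = hash_join(orders, lineitem, "o_key")
assert len(joined) == 2                     # only o_key 1 matches
assert all(row["status"] == "F" for row in joined)
```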

SLIDE 16

PYGDF NEW JOINS

  • GPU memory capacity is not a limiting factor
  • GPU query performance is up to 2-3 orders of magnitude better than CPU
  • GPU query performance is dominated by CPU-GPU interconnect throughput
  • NVLINK systems show 3x better end-to-end query performance compared to PCIe

Thanks Nikolay Sakharnykh! S8417 – Breaking the Speed of Interconnect with Compression for Database Applications – Tuesday, Mar 27, 2:00pm, Room 210F

Takeaways

SLIDE 17

BLAZINGDB JOINS GOAI

Scale Out Data Warehousing

SLIDE 18

BLAZINGDB

Scale Out Data Warehousing

[Diagram: data lake files (schema, metadata, data) staged through RAM cache, disk cache, HDD, and SSD tiers into the GPU engine]

  • Compression/Decompression
  • Filtering (Predicate Pushdown)
  • Aggregations
  • Transformations
  • Joins
  • Sorting/Ordering

Same system, more interoperability: Parquet in, Arrow/GDF out. Compression/decompression on the GPU to improve throughput.
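Predicate pushdown, one of the bullets above, can be sketched in a few lines of plain Python (a conceptual stand-in, not BlazingDB code): the filter runs inside the scan, so rows that fail the predicate are never materialized downstream.

```python
# Illustrative sketch of predicate pushdown: apply the filter while
# scanning the source instead of after a full read.

def scan(rows, predicate=None):
    """Scan a source, optionally filtering at read time (pushdown)."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row

source = [{"amount": a} for a in (5, 50, 500, 5000)]

# Without pushdown: materialize everything, then filter a full copy.
everything = list(scan(source))
filtered_late = [r for r in everything if r["amount"] > 100]

# With pushdown: the predicate runs inside the scan; only 2 of the
# 4 rows ever leave the reader.
filtered_early = list(scan(source, lambda r: r["amount"] > 100))

assert filtered_early == filtered_late == [{"amount": 500}, {"amount": 5000}]
```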

SLIDE 19

BLAZINGDB

The Future

Coming Soon

[Diagram: ingest → BlazingDB → storage (data lake), with GDF/Arrow as the common data layer]

SLIDE 20

H2O.AI

ML On GPU: First 3 Algorithms

  • GLM
  • K-Means
  • GBM (XGBoost)
SLIDE 21

H2O.AI

ML On GPU: 2 New Algorithms

  • tSVD
  • PCA
SLIDE 22

XGBOOST

Faster, More Scalable, & Better Inferencing

Thanks Andrey Adinets, Vinay Deshpande, and Thejaswi Nanditale!

  • Scalability increase from 16GB to 100GB on DGX-1
  • Performance improvement not only on single-GPU, but in multi-GPU scaling
  • GBDT inference library

SLIDE 23

NVGRAPH

Arrow to Graph

  • Ported nvGRAPH to run natively on the GPU Data Frame, so two columns can be used as source and destination to define an unweighted graph
  • Breadth-First Search, Jaccard Similarity, and PageRank, with Python bindings
  • Developing Hornet integration for GoAi, as well as Gunrock
  • 1x P100 is 2-3 orders of magnitude faster than an i7-3930K running the NetworkX Python library
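The two-columns-define-a-graph idea can be shown with a small CPU breadth-first search (a plain-Python stand-in, not nvGRAPH's API): equal-length source and destination columns are the entire edge list, and BFS runs straight off them.

```python
# Sketch: BFS over a graph defined by two equal-length "columns"
# (source, destination), the same shape nvGRAPH consumes from a GDF.
from collections import deque

def bfs(src_col, dst_col, start):
    """Return hop distance from `start` for every reachable vertex."""
    adj = {}
    for s, d in zip(src_col, dst_col):   # build adjacency from columns
        adj.setdefault(s, []).append(d)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Edge list as two columns: edges 0->1, 0->2, 1->3, 2->3
src = [0, 0, 1, 2]
dst = [1, 2, 3, 3]
assert bfs(src, dst, 0) == {0: 0, 1: 1, 2: 1, 3: 2}
```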

SLIDE 24

IBM SNAPML

More Proof of ML on GPU

SLIDE 25

MAPD

In Memory GPU Database

MapD Core + MapD Immerse

  • MapD Core: a fast, relational, column-store database powered by GPUs (100x faster queries)
  • MapD Immerse: a visual analytics engine that leverages the speed and rendering capabilities of MapD Core (speed-of-thought visualization)

SLIDE 26

MAPD

Improvements Since GTC17

MapD Immerse
  • Multi-source dashboards
  • Multi-layer GeoCharts
  • Auto-refresh for streaming data
  • Charting: combo chart, multi-measure line chart, stacked bar

MapD Core
  • Performance: joins, string-literal comparisons
  • Data ingestion: read from Kafka, compressed files, S3
  • Major rendering performance improvements: O(1-10MM) polygons in ~ms
  • Arrow: improved GPU memory management; pymapd with bi-directional Arrow-based ingest

SLIDE 27

MAPD

MapD Presto

https://github.com/NVIDIA/presto-mapd-connector

  • 8-GPU MapD alone is up to 40x faster than a dual 20-core CPU on inferencing streaming data
  • Faux multi-node MapD Presto being developed

GPU Database Performance

[Chart: query time over 10/30/60 minutes of streaming data – Presto on JSON: 20/25/30; Presto on Parquet: 4/6/8; MapD: 0.1; Presto + MapD: 1.2]

SLIDE 28

MAPD

Dashboard Comparison vs. Kibana

SLIDE 29

MAPD

MapD Immerse vs Elastic Kibana

[Chart: time to fully load (seconds) vs. days of data – MapD Immerse (DGX) stays under 9s and MapD Immerse (P2) under 12s as the number of days grows, while Elastic Kibana's load time climbs steeply]

SLIDE 30

GRAPHISTRY

Accelerated visual graph analytics and investigation platform

SLIDE 31

GRAPHISTRY

Improvements since GTC17

[Diagram: CSV, JSON, etc. → GPU Data Frame → Arrow.js in the browser]

https://www.npmjs.com/package/arrow

SLIDE 32

TESLA V100 32GB

WORLD’S MOST ADVANCED DATA CENTER GPU, WITH 2X THE MEMORY

  • 5,120 CUDA cores
  • 640 NEW Tensor Cores
  • 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
  • 20MB SM RF | 16MB cache
  • 32GB HBM2 @ 900 GB/s | 300 GB/s NVLink

SLIDE 33

SLIDE 34

SLIDE 35

SLIDE 36

A NEW COMPUTE PLATFORM

Building It Together

[Stack diagram: APPLICATIONS → SYSTEMS → ALGORITHMS → CUDA → ARCHITECTURE]

  • Learn what the domain requires
  • Use best practices and standards
  • Build scalable systems and algorithms
  • Test Applications
  • Iterate
SLIDE 37

JOIN THE REVOLUTION

Everyone Can Help!

APACHE ARROW: https://arrow.apache.org/ @ApacheArrow
APACHE PARQUET: https://parquet.apache.org/ @ApacheParquet
GPU Open Analytics Initiative: http://gpuopenanalytics.com/ @Gpuoai

Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed!

SLIDE 38