IBM's Open-Source Based AI Developer Tools. Sumit Gupta, VP, AI. PowerPoint PPT Presentation.



SLIDE 1

IBM's Open-Source Based AI Developer Tools

Sumit Gupta
VP, AI, Machine Learning & HPC, IBM Cognitive Systems
@SumitGup | guptasum@us.ibm.com
March 2019

SLIDE 2

AI Software Portfolio Strategy

Deliver a comprehensive platform that enables data science at all skill levels


  • Prepare Data
  • Build AI Models (Machine / Deep Learning)
  • Train AI Models: Interactive or Batch, with GPU acceleration
  • Deploy & Manage Model Lifecycle
  • AI Model Performance Monitoring
  • Inference on CPU, GPU, FPGA

Scale to Enterprise-wide Deployment

  • Multiple data scientists
  • Shared Cluster / Hardware Infrastructure

Hybrid Cloud: Common experience on-premise and in public cloud

SLIDE 3

IBM Open Source Based AI Stack


Watson Studio: Data Preparation, Model Development Environment

Watson Machine Learning: Runtime Environment; Train, Deploy, Manage Models
  • Watson ML Community Edition (WML CE), including Snap ML
  • Watson ML Accelerator

Watson OpenScale: Model Metrics, Bias, and Fairness Monitoring

Auto-AI software: PowerAI Vision, IBM Auto-AI

Infrastructure: Accelerated AC922 Power9 Servers, Storage (Spectrum Scale ESS); runs on x86 & other storage too

Available on Private Cloud or Public Cloud

Previous Names: WML Accelerator = PowerAI Enterprise; WML Community Ed. = PowerAI Base

SLIDE 4

Watson ML Accelerator: Infrastructure Designed for AI

Our focus: ease of use & faster model training times.

Runs on Power9 or x86 servers with GPU accelerators and storage (ESS). The stack:
  • WML CE (Watson ML Community Edition): open-source ML frameworks, Large Model Support (LMS), Snap ML, Distributed Deep Learning (DDL)
  • IBM Spectrum Conductor: Apache Spark, cluster virtualization, job orchestration
  • Auto Hyper-Parameter Optimization (HPO), Elastic Distributed Training (EDT) & Elastic Distributed Inference (EDI)
  • Model management & execution, model life-cycle management

SLIDE 5

Snap ML (New)

Distributed High-Performance Machine Learning Library

  • Multi-core, multi-socket & GPU acceleration; distributed training across multiple CPUs & GPUs
  • GPU accelerated: Logistic Regression, Linear Regression, Ridge / Lasso Regression, Support Vector Machines
  • Multi-core CPU: Decision Trees, Random Forests (more coming)
  • CPU-GPU memory management
  • APIs for popular ML frameworks
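Snap ML's estimators follow the familiar scikit-learn fit/predict convention. The sketch below illustrates that convention with a self-contained, pure-Python ridge regression; the `snapml` import and `use_gpu` flag in the closing comment are assumptions about the library's API, not verified here.

```python
# Minimal illustration of the scikit-learn-style fit/predict convention
# that Snap ML's estimators follow. One-feature ridge regression,
# closed form: w = sum(x*y) / (sum(x^2) + alpha).
class RidgeRegression:
    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.w = None

    def fit(self, x, y):
        self.w = sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + self.alpha)
        return self

    def predict(self, x):
        return [self.w * a for a in x]

model = RidgeRegression().fit([1, 2, 3, 4], [2, 4, 6, 8])   # data follows y = 2x
pred = model.predict([5])                                   # close to 10

# With Snap ML the same pattern would look like (names assumed from IBM docs):
#   from snapml import LogisticRegression
#   clf = LogisticRegression(use_gpu=True); clf.fit(X, y); clf.predict(X)
```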

SLIDE 6


Why Do We Need High-Performance ML?

Most Popular Data Science Methods

Source: Kaggle Data Science Survey 2017

Chart legend: methods supported by Snap ML today; further support coming in 2H 2019.

  • Performance matters for:
    • Online re-training of models
    • Model selection & hyper-parameter tuning
    • Fast adaptability to changes
  • Scalability to large datasets, useful for:
    • Recommendation engines, advertising, credit fraud
    • Space exploration, weather

SLIDE 7


Logistic Regression: 46x Faster (Snap ML on GPU vs. Google TensorFlow on CPU)
  • TensorFlow: 1.1 hours on 90 x86 servers (CPU-only)
  • Snap ML: 1.53 minutes on 4 Power9 servers with GPUs
  • Workload: advertising click-through-rate prediction on the Criteo dataset (4 billion examples, 1 million features)

Ridge Regression: 3x Faster (Power NVLink GPUs vs. x86 PCIe GPUs)
  • Power server with 1 GPU: ~30-31 s per run; x86 server with 1 GPU: ~86-107 s per run, over 5 runs
  • Workload: predicting stock-price volatility from 10-K textual financial reports (482,610 examples x 4.27M features)

SLIDE 8

Snap ML is 2-4x Faster than scikit-learn (CPU-only)

Snap ML on Power9 vs. scikit-learn on x86, measured on the creditcard, susy, higgs, and epsilon datasets:
  • Decision Trees: up to 3.8x faster
  • Random Forests: up to 4.2x faster
  • Per-dataset speedups range from 2.0x to 4.2x

SLIDE 9

Summary of Performance Results for Snap ML


  • GPU vs. CPU (Snap ML vs. scikit-learn, linear models): 20-40x
  • Power vs. x86, both with GPUs (Snap ML, linear models): 3x
  • CPU only, Power vs. x86 (Snap ML vs. scikit-learn, tree models): 2-4x

SLIDE 10


Large Model Support (LMS) Enables Higher Accuracy via Larger Models

  • Store large models & datasets in system memory; transfer one layer at a time to the GPU
  • IBM AC922 Power9 server: CPU-GPU NVLink at 150 GB/s (CPU memory at 170 GB/s), 5x faster than Intel x86 PCIe Gen3
  • TensorFlow with LMS: Power + 4 GPUs delivers 4.7x more images/sec than x86 + 4 GPUs
  • Benchmark: 500 iterations of an enlarged GoogleNet model on an enlarged ImageNet dataset (2240x2240), mini-batch size = 15; both servers with 4 NVIDIA V100 GPUs
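The layer-at-a-time idea can be sketched as a toy simulation: the full model lives in host memory and only one layer is ever resident on the "device", so peak device memory is bounded by the largest layer rather than the whole model. All sizes and names below are invented for illustration; real LMS swaps tensors inside the TensorFlow graph over NVLink.

```python
# Toy model of LMS-style layer swapping. Host memory holds all four layers
# (1400 values total); the "GPU" only ever holds one layer at a time.
host_layers = [[0.1] * n for n in (400, 300, 500, 200)]  # weights per layer

DEVICE_CAPACITY = 600        # pretend the device holds 600 values, not 1400
peak = 0
activation = 1.0
for layer in host_layers:
    device = layer                         # "copy" one layer to the device
    peak = max(peak, len(device))          # track peak device residency
    activation = activation * sum(device) / len(device)  # fake forward pass

# The whole model never fits on the device, yet the forward pass completes:
assert peak <= DEVICE_CAPACITY
```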

SLIDE 11

Distributed Deep Learning (DDL)


Deep learning training takes days to weeks. DDL in WML CE extends TensorFlow and enables scaling to hundreds of servers, automatically distributing training over large datasets to hundreds of GPUs. Jobs that run for days complete within hours.

Near-ideal (95%) scaling to 256 GPUs: ResNet-50 on ImageNet-1K, Caffe with PowerAI DDL, running on S822LC Power Systems (chart: DDL actual vs. ideal scaling from 4 to 256 GPUs).
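The core of data-parallel training like DDL is that averaging per-worker gradients over equal-size shards reproduces the full-batch gradient. A minimal pure-Python sketch with toy data (no MPI, hypothetical numbers; real DDL all-reduces over the network):

```python
# Data-parallel gradient averaging: 4 "workers", 2 examples each.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.1, 8.2]   # roughly y = 2x
w = 0.5                                          # current model weight

def grad(xs, ys, w):
    """Mean-squared-error gradient d/dw of mean((w*x - y)^2) on one shard."""
    n = len(xs)
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n

# Single-node "full batch" gradient
full = grad(xs, ys, w)

# Shard the data, compute local gradients, then all-reduce (average)
shards = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, 8, 2)]
local_grads = [grad(sx, sy, w) for sx, sy in shards]
allreduced = sum(local_grads) / len(local_grads)
```

Because the shards are equal-sized, the averaged gradient matches the full-batch gradient exactly (up to floating-point rounding), which is why scaling out does not change the optimization trajectory.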

SLIDE 12

Auto Hyper-Parameter Optimization (HPO) in WML Accelerator

Manual process: train the model, manually choose parameters, monitor & prune, select the best hyperparameters; model training runs hundreds of times.

With the WML Accelerator Auto-Hyperparameter Optimizer (Auto-HPO), IBM Spectrum Conductor running Spark launches the training jobs (Training Job 1 ... Training Job n) and changes the hyperparameters automatically.

  • Lots of hyperparameters: learning rate, decay rate, batch size, optimizers (gradient descent, momentum, ...)
  • Auto-HPO has 3 search approaches: Random, Tree-structured Parzen Estimator (TPE), Bayesian
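Of the three search approaches, random search is the simplest to sketch. The loss function below is a hypothetical analytic stand-in for a real training job, which WML Accelerator would instead launch as a job via Spectrum Conductor:

```python
# Minimal random-search HPO sketch over learning rate and batch size.
import math
import random

random.seed(7)

def training_loss(lr, batch_size):
    # Hypothetical response surface: best near lr = 1e-3, batch_size = 128.
    return (math.log10(lr) + 3) ** 2 + (batch_size / 128 - 1) ** 2

best = None
for _ in range(50):                                  # 50 trial "jobs"
    params = {
        "lr": 10 ** random.uniform(-6, 0),           # log-uniform learning rate
        "batch_size": random.choice([32, 64, 128, 256, 512]),
    }
    loss = training_loss(params["lr"], params["batch_size"])
    if best is None or loss < best[0]:
        best = (loss, params)                        # keep the best trial so far
```

TPE and Bayesian optimization replace the uniform sampler with a model of past trial results, so promising regions of the search space get sampled more often.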

SLIDE 13


Elastic Distributed Training (EDT)

Example: 2 servers with 4 GPUs each (8 GPUs total). Available policies: fair share, preemption, priority.
  • T0: Job 1 starts, uses all available GPUs
  • T1: Job 2 starts, Job 1 gives up 4 GPUs
  • T2: Job 2 gets higher priority, Job 1 gives up more GPUs
  • T3: Job 1 finishes, Job 2 uses all GPUs

EDT dynamically reallocates GPUs within milliseconds, increasing job throughput and server / GPU utilization. It works with Spark & AI jobs, and with hybrid x86 & Power clusters.
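The T0-T3 timeline can be sketched as a weighted fair-share allocator. This is a toy model: the policy details (proportional shares, largest-weight-first leftover handling, the weight values) are assumptions for illustration, not Spectrum Conductor's actual algorithm.

```python
# Toy elastic GPU allocator: split GPUs proportionally to job priority weights.
TOTAL_GPUS = 8

def fair_share(weights, total=TOTAL_GPUS):
    """Return {job: gpus} proportional to each job's priority weight."""
    if not weights:
        return {}
    s = sum(weights.values())
    alloc = {job: total * w // s for job, w in weights.items()}
    leftover = total - sum(alloc.values())
    # Hand out any remaining GPUs to the highest-weight jobs first.
    for job in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        alloc[job] += 1
    return alloc

timeline = []
timeline.append(fair_share({"job1": 1}))             # T0: job1 alone
timeline.append(fair_share({"job1": 1, "job2": 1}))  # T1: job2 arrives, equal share
timeline.append(fair_share({"job1": 1, "job2": 3}))  # T2: job2 gets higher priority
timeline.append(fair_share({"job2": 3}))             # T3: job1 finished
```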

SLIDE 14


PowerAI Vision: “Point-and-Click” AI for Images & Video

1. Label Image or Video Data
2. Auto-Train AI Model
3. Package & Deploy AI Model

SLIDE 15


Core use cases

  • Image Classification
  • Object Detection
  • Image Segmentation

SLIDE 16

Automatic Labeling using PowerAI Vision


1. Manually label some image / video frames
2. Train a DL model
3. Auto-label the full dataset with the trained DL model
4. Manually correct labels on some data
5. Repeat until the labels achieve the desired accuracy
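The label-train-autolabel-correct loop can be sketched end-to-end. A 1-D threshold classifier stands in for the deep-learning model here; the data, rule, and classifier are purely illustrative, and the loop structure is the point:

```python
# Toy auto-labeling loop: label a little, train, auto-label, correct, repeat.
data = list(range(20))                    # unlabeled "frames"

def true_label(x):
    return int(x >= 12)                   # ground truth, unknown to the loop

# Step 1: manually label a small subset
labels = {0: 0, 19: 1}

def train(labels):
    """Fit a threshold between the largest 0-example and smallest 1-example."""
    zeros = [x for x, y in labels.items() if y == 0]
    ones = [x for x, y in labels.items() if y == 1]
    return (max(zeros) + min(ones)) / 2

for _ in range(5):                        # step 5: repeat a few rounds
    t = train(labels)                     # step 2: train on current labels
    auto = {x: int(x >= t) for x in data}             # step 3: auto-label all
    wrong = [x for x in data if auto[x] != true_label(x)]
    if not wrong:
        break                             # labels are accurate enough
    for x in wrong[:2]:                   # step 4: manually correct a few
        labels[x] = true_label(x)

accuracy = sum(auto[x] == true_label(x) for x in data) / len(data)
```

Each round, a handful of manual corrections tightens the model, and the model in turn labels everything else, which is what makes labeling large video datasets tractable.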

SLIDE 17

Retail Analytics: track how customers navigate the store, identify fraudulent actions, detect low inventory.

Worker Safety Compliance: zone monitoring, heat maps, detection of loitering, ensuring worker safety compliance.

Remote Inspection & Asset Management: identify faulty or worn-out equipment in remote & hard-to-reach locations.

SLIDE 18

Quality Inspection Use Cases

  • Oil & Gas
  • Semiconductor Manufacturing
  • Electronics Manufacturing
  • Travel & Transportation
  • Aerospace & Defense
  • Steel Manufacturing
  • Utilities Inspection
  • Robotic Manufacturing

SLIDE 19

AI Developer Box & AI Starter Kit

Power AI DevBox
  • Power9 + GPU desktop PC: $3,449
  • Order from: https://raptorcs.com/POWERAI/

AI Starter Kit
  • 2 AC922 accelerated servers + 1 P9 Linux storage server
  • WML Accelerator pre-installed (formerly called PowerAI Enterprise)
  • Free 30-day licenses for PowerAI Vision & WML Accelerator (free to academia)

SLIDE 20

500+ Clients using AI on Power Systems


Power AI Clients at THINK 2019

SLIDE 21

IBM AI Meetups Community Grew 10x in 9 Months

From 6K to 85K members in 9 months (June 2018 to February 2019); chart tracks members and groups per month.

https://www.meetup.com/topics/powerai/

SLIDE 22

Summary


  • Watson ML: Machine / Deep Learning Toolkit
  • Snap ML: Fast Machine Learning Framework
  • Power AI DevBox & AI Starter Kit

SLIDE 23

Get Started Today with Machine & Deep Learning


  • Build a data science team: your developers can learn at http://cognitiveclass.ai
  • Identify a low-hanging use case
  • Figure out a data strategy
  • Consider pre-built AI APIs
  • Hire consulting services

Get started today at www.ibm.biz/poweraideveloper

SLIDE 24

Additional Details


SLIDE 25

Why are Linear & Tree Models Useful?


Fast Training: GLMs can scale to datasets with billions of examples and/or features and still train in minutes to hours.

Need Less Data: machine learning models can train to "good-enough" accuracy with much less data than deep learning requires.

Interpretability: linear models explicitly assign an importance to each input feature; tree models explicitly illustrate the path to a decision.

Less Tuning: linear models involve far fewer parameters than more complex models (GBMs, neural nets).
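The interpretability point can be made concrete: after training a linear model, each feature's learned weight directly reads as its importance. A minimal sketch with hypothetical data in which the target depends only on the first feature:

```python
# Plain gradient descent on MSE for a 2-feature linear model.
samples = [(1, 2), (2, 1), (3, 3), (4, 1), (0, 2)]   # (x1, x2) pairs
targets = [3 * x1 for x1, _ in samples]               # y = 3*x1 + 0*x2

w1 = w2 = 0.0
lr = 0.05
for _ in range(2000):
    g1 = g2 = 0.0
    for (x1, x2), y in zip(samples, targets):
        err = w1 * x1 + w2 * x2 - y
        g1 += 2 * err * x1 / len(samples)
        g2 += 2 * err * x2 / len(samples)
    w1 -= lr * g1
    w2 -= lr * g2
# The learned weights expose the model's reasoning: x1 matters, x2 does not.
```

A deep network fitting the same data would spread this relationship across many layers of weights, with no single number to point at, which is exactly the contrast the slide is drawing.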