S91030 - Hybrid Machine Learning with the Kubeflow Pipelines and RAPIDS

Mar 12, 2023


SLIDE 1

S91030 - Hybrid Machine Learning with the Kubeflow Pipelines and RAPIDS

Sina Chavoshi

SLIDE 2

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

SLIDE 3

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

SLIDE 4

Building Blocks

Sight, Language, Conversation

SLIDE 5

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

SLIDE 6

Solutions / Contact Center

[Diagram labels: Customer; Phone; Chat; Contact Center Provider; Contact Center Interface; Virtual Agent; Agent Assist; Knowledge Base (PDF/HTML); Backend Fulfillment; Agent; Google Cloud Contact Center AI]

SLIDE 7

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

SLIDE 8

Cloud AI Platform

Data pipeline

Cloud Dataprep, BigQuery, Cloud Dataflow, Cloud Dataproc

Model development

Cloud ML Engine

Model deployment and management

Cloud ML Engine, Cloud Kubernetes Engine

Tools

Jupyter Notebooks

Services

ASL

Community

Kubeflow

SLIDE 9

Building & deploying real-life ML applications is hard and costly because of a lack of tooling that covers end-to-end ML development & deployment.

SLIDE 10

In addition to the actual ML...

ML Code

SLIDE 11

You have to worry about so much more.

Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analysis Tools, Machine Resource Management, Serving Infrastructure, Monitoring

ML Code

Source: Sculley et al., Hidden Technical Debt in Machine Learning Systems

SLIDE 12

AI problems today

Problems

Deployment: Brittle, opinionated infrastructure that is hard to productionize and breaks between cloud and on-prem
Talent: Machine learning expertise is scarce
Collaboration: Difficult to find and leverage existing solutions

Solutions

Reusable pipelines

SLIDE 13

01: Kubeflow

Scalable ML services on Kubernetes

Easy to get started

Out-of-the-box support for top frameworks

– PyTorch, Caffe, TensorFlow and XGBoost

Kubernetes manages dependencies, resources

Swappable & scalable

Library of ML services
GPU support
Massive scale

Meet customers where they are

GCP
On-prem with Cisco

[Diagram labels: ML microservices; Cloud (Training, Predict); On-prem (Training, Predict)]

SLIDE 14

Product Overview

RAPIDS

SLIDE 15

THE BIG PROBLEM IN DATA SCIENCE

All Data ETL

Manage Data

Structured Data Store Data Preparation

Training

Model Training Visualization

Evaluate

Scoring

Deploy

Slow Training Times for Data Scientists

SLIDE 16

RAPIDS — OPEN GPU DATA SCIENCE

Software Stack Python

Data Preparation

cuDF

Graph Analytics

cuGRAPH

Model Training

cuML

[Stack diagram labels: PYTHON; DASK/SPARK; DEEP LEARNING FRAMEWORKS; CUDNN; RAPIDS (CUML, CUDF, CUGRAPH); CUDA; APACHE ARROW on GPU Memory]

SLIDE 17

BENCHMARKS

Panels: cuIO/cuDF (Load and Data Preparation), cuML (XGBoost), End-to-End

Benchmark: 200GB CSV dataset; data preparation includes joins, variable transformations.

CPU Cluster Configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark

DGX Cluster Configuration: 5x DGX-1 on InfiniBand network

Axis: Time in seconds (shorter is better)

Legend: cuIO / cuDF (Load and Data Preparation), Data Conversion, XGBoost

SLIDE 18

AI Hub & Pipelines: Fast & simple adoption of AI

1. Search & Discover

Find best-of-breed solutions on the AI Hub which leverage Cloud AI solutions.

2. Deploy

Quick 1-click implementation of ML pipelines onto Google Cloud Platform.

3. Customize

Experiment and adjust out-of-the-box pipelines to custom use cases.

4. Run in production

Deploy customized pipelines in production.

5. Publish

Upload & share pipelines running best within your org or publicly.

Network effect

The Flywheel of AI Adoption

SLIDE 19

02: Reusable Pipelines

Enable developers to build custom ML applications by easily “stitching” and connecting various components.

Reuse instead of reimplement or reinvent
Discover, learn and replicate successful pipelines

SLIDE 20

What constitutes a Kubeflow Pipeline

Containerized implementations of ML Tasks

○ Containers provide portability, repeatability and encapsulation
○ A task can be single node or *distributed*
○ A containerized task can invoke other services

Specification of the sequence of steps

○ Specified via Python SDK

Input Parameters

○ A “Job” = Pipeline invoked w/ specific parameters
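The three ingredients above (containerized tasks, a specified sequence of steps, and input parameters) can be illustrated with a small conceptual sketch. This is not the Kubeflow Pipelines SDK; it is a standard-library Python model written only for illustration. The class names `Task` and `Pipeline`, the method names, the `example.invalid/...` image references, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Task:
    """One containerized step; `image` is a hypothetical container reference."""
    name: str
    image: str
    depends_on: tuple = ()


@dataclass
class Pipeline:
    """A named set of tasks plus the input parameters it accepts (with defaults)."""
    name: str
    tasks: list
    parameters: dict = field(default_factory=dict)

    def execution_order(self):
        # Derive the step sequence from the declared dependencies.
        graph = {t.name: set(t.depends_on) for t in self.tasks}
        return list(TopologicalSorter(graph).static_order())

    def job(self, **overrides):
        # A "job" is the pipeline invoked with specific parameter values.
        unknown = set(overrides) - set(self.parameters)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        return {
            "pipeline": self.name,
            "steps": self.execution_order(),
            "parameters": {**self.parameters, **overrides},
        }


pipeline = Pipeline(
    name="train-and-deploy",
    tasks=[
        Task("prepare", "example.invalid/prepare:latest"),
        Task("train", "example.invalid/train:latest", depends_on=("prepare",)),
        Task("deploy", "example.invalid/deploy:latest", depends_on=("train",)),
    ],
    parameters={"learning_rate": 0.1, "rounds": 100},
)

# Same pipeline definition, one specific set of parameter values.
job = pipeline.job(rounds=500)
print(job["steps"])
print(job["parameters"])
```

In this sketch the pipeline definition stays fixed while each job only varies the parameter values, which mirrors the slide's distinction between a pipeline and a job. A real pipeline would be declared through the Kubeflow Pipelines Python SDK rather than with these illustrative classes.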

SLIDE 21

03: AI Hub at a glance

1. All AI content in one place

Quick discovery of plug & play AI pipelines & other content built by teams across Google and by partners and customers.

2. Fast & simple implementation of AI on GCP

One-click deployment of AI pipelines via Kubeflow on GCP as the go-to platform for AI + hybrid & on premise.

3. Enterprise-grade internal & external sharing

Foster reuse by sharing deployable AI pipelines & other content privately within organizations & publicly.

SLIDE 22

Mission

The one place for everything AI, from experimentation to production.

SLIDE 23

Public and private AI Hub

By Google: Unique AI assets by Google
By partners: Created, shared & monetized by anyone
By customers: Content shared securely within and with other organizations

Public content + Private content

AutoML, TPUs, Cloud AI Platform, etc.

SLIDE 24

Kubeflow Pipelines enable

Workflow orchestration

Rapid, reliable experimentation

Share, re-use & compose

SLIDE 25

Demo

SLIDE 26

Visual depiction of pipeline topology

SLIDE 27

View all current and historical runs, grouped as “Experiments”

SLIDE 28

Rich visualizations of metrics

SLIDE 29

Clone an existing pipeline

SLIDE 30

Access to all config params, inputs and outputs for each run

SLIDE 31

Update parameters and submit

SLIDE 32

Easy comparison of Runs

SLIDE 33

Easy comparison of Runs

SLIDE 34

SLIDE 35
