S91030 - Hybrid Machine Learning with the Kubeflow Pipelines and RAPIDS



slide-1
SLIDE 1

S91030 - Hybrid Machine Learning with the Kubeflow Pipelines and RAPIDS

Sina Chavoshi

slide-2
SLIDE 2

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

slide-3
SLIDE 3

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

slide-4
SLIDE 4

Building Blocks

Sight Language Conversation

slide-5
SLIDE 5

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

slide-6
SLIDE 6

Solutions / Contact Center

[Diagram: Google Cloud Contact Center AI. Labels: Customer (Phone, Chat), Contact Center Provider, Contact Center Interface, Virtual Agent, Agent Assist, Knowledge Base (PDF/HTML), Backend Fulfillment, Agent]

slide-7
SLIDE 7

Cloud AI Strategy: The right approach for the right problem

Building blocks, Platform, Solutions

slide-8
SLIDE 8

Cloud AI Platform

Data pipeline

Cloud Dataprep, BigQuery, Cloud Dataflow, Cloud Dataproc

Model development

Cloud ML Engine

Model deployment and management

Cloud ML Engine, Cloud Kubernetes Engine

Tools

Jupyter Notebooks

Services

ASL

Community

Kubeflow

slide-9
SLIDE 9

Building & deploying real-life ML applications is hard and costly because of a lack of tooling that covers end-to-end ML development & deployment.

slide-10
SLIDE 10

In addition to the actual ML...

ML Code

slide-11
SLIDE 11

You have to worry about so much more.

  • Configuration
  • Data Collection
  • Data Verification
  • Feature Extraction
  • Process Management Tools
  • Analysis Tools
  • Machine Resource Management
  • Serving Infrastructure
  • Monitoring
  • ML Code

Source: Sculley et al., Hidden Technical Debt in Machine Learning Systems

slide-12
SLIDE 12

AI problems today

Problems

  • Deployment: Brittle, opinionated infrastructure that is hard to productionize and breaks between cloud and on-prem
  • Talent: Machine Learning expertise is scarce
  • Collaboration: Difficult to find, leverage existing solutions

Solutions

  • Reusable pipelines

slide-13
SLIDE 13

01: Kubeflow

Scalable ML services on Kubernetes

Easy to get started

  • Out-of-box support for top frameworks

– PyTorch, Caffe, TensorFlow and XGBoost

  • Kubernetes manages dependencies, resources

Swappable & scalable

  • Library of ML services
  • GPU support
  • Massive scale

Meet customers where they are

  • GCP
  • On-prem with Cisco

[Diagram: ML microservices (Training, Predict) shown for both Cloud and On-prem]

slide-14
SLIDE 14

Product Overview

RAPIDS

slide-15
SLIDE 15

THE BIG PROBLEM IN DATA SCIENCE

[Diagram: data science workflow. Labels: All Data, ETL, Manage Data, Structured Data Store, Data Preparation, Training, Model Training, Visualization, Evaluate, Scoring, Deploy]

Slow Training Times for Data Scientists

slide-16
SLIDE 16

RAPIDS — OPEN GPU DATA SCIENCE

Software Stack (Python)

  • Data Preparation: cuDF
  • Model Training: cuML
  • Graph Analytics: cuGraph

[Stack diagram labels: Python, Dask/Spark, Deep Learning Frameworks, cuDNN, RAPIDS (cuML, cuDF, cuGraph), CUDA, Apache Arrow on GPU Memory]

slide-17
SLIDE 17

BENCHMARKS

Benchmark: 200GB CSV dataset; data preparation includes joins and variable transformations.

[Chart panels: cuIO/cuDF (Load and Data Preparation), cuML (XGBoost), End-to-End. Legend: cuIO/cuDF (Load and Data Preparation), Data Conversion, XGBoost. Time in seconds; shorter is better.]

CPU Cluster Configuration

CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark

DGX Cluster Configuration

5x DGX-1 on InfiniBand network

slide-18
SLIDE 18

AI Hub & Pipelines: Fast & simple adoption of AI

  • 1. Search & Discover

Find best-of-breed solutions on the AI Hub which leverage Cloud AI solutions.

  • 2. Deploy

Quick 1-click implementation of ML pipelines onto Google Cloud Platform.

  • 3. Customize

Experiment and adjust out-of-the-box pipelines to custom use cases.

  • 4. Run in production

Deploy customized pipelines in production.

  • 5. Publish

Upload & share your best-running pipelines within your org or publicly.

Network effect

The Flywheel of AI Adoption

slide-19
SLIDE 19

02: Reusable Pipelines

Enable developers to build custom ML applications by easily “stitching” and connecting various components.

  • Reuse instead of reimplement or reinvent
  • Discover, learn and replicate successful pipelines
slide-20
SLIDE 20

What constitutes a Kubeflow Pipeline

  • Containerized implementations of ML Tasks

○ Containers provide portability, repeatability and encapsulation
○ A task can be single node or *distributed*
○ A containerized task can invoke other services

  • Specification of the sequence of steps

○ Specified via Python SDK

  • Input Parameters

○ A “Job” = Pipeline invoked w/ specific parameters
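
The three ingredients above (containerized tasks, an ordered specification of steps, and input parameters that turn a pipeline into a "Job") can be pictured with a small plain-Python model. This is only an illustrative sketch: it does not use the Kubeflow Pipelines SDK, and the class names (`ContainerTask`, `Pipeline`, `Job`), the image names and the parameters are all hypothetical. It assumes Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class ContainerTask:
    """One containerized ML task (hypothetical stand-in, not an SDK class)."""
    name: str
    image: str          # container image that implements the task
    args: tuple = ()    # arguments; may contain "{param}" placeholders
    after: tuple = ()   # names of tasks that must finish first


@dataclass
class Pipeline:
    """A named set of tasks plus the input parameters it accepts."""
    name: str
    params: dict        # parameter name -> default value
    tasks: list = field(default_factory=list)

    def add(self, task):
        self.tasks.append(task)
        return task

    def execution_order(self):
        """Return task names in an order that respects the dependencies."""
        graph = {t.name: set(t.after) for t in self.tasks}
        return list(TopologicalSorter(graph).static_order())


@dataclass
class Job:
    """A pipeline invoked with specific parameter values."""
    pipeline: Pipeline
    values: dict

    def resolved_commands(self):
        """Substitute parameter values into each task's arguments."""
        unknown = set(self.values) - set(self.pipeline.params)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        merged = {**self.pipeline.params, **self.values}
        by_name = {t.name: t for t in self.pipeline.tasks}
        return [
            (name, by_name[name].image,
             [a.format(**merged) for a in by_name[name].args])
            for name in self.pipeline.execution_order()
        ]


# Hypothetical three-step pipeline: prepare -> train -> deploy.
pipeline = Pipeline("train-and-deploy",
                    params={"data_path": "/data/train.csv", "max_depth": 6})
pipeline.add(ContainerTask("prepare", "example.com/prepare:latest",
                           ("--input={data_path}",)))
pipeline.add(ContainerTask("train", "example.com/train:latest",
                           ("--max-depth={max_depth}",), after=("prepare",)))
pipeline.add(ContainerTask("deploy", "example.com/deploy:latest",
                           after=("train",)))

# A "Job" is the same pipeline with one parameter overridden.
job = Job(pipeline, {"max_depth": 8})
for name, image, args in job.resolved_commands():
    print(name, image, args)
```

Keeping the pipeline definition separate from the parameter values is what lets the same pipeline be resubmitted with different settings, which is the workflow the later demo slides show.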

slide-21
SLIDE 21

03: AI Hub at a glance

  • 1. All AI content in one place

Quick discovery of plug & play AI pipelines & other content built by teams across Google and by partners and customers.

  • 2. Fast & simple implementation of AI on GCP

One-click deployment of AI pipelines via Kubeflow on GCP as the go-to platform for AI + hybrid & on premise.

  • 3. Enterprise-grade internal & external sharing

Foster reuse by sharing deployable AI pipelines & other content privately within organizations & publicly.

slide-22
SLIDE 22

Mission

The one place for everything AI, from experimentation to production.

slide-23
SLIDE 23

Public and private AI Hub

  • By Google: Unique AI assets by Google
  • By partners: Created, shared & monetized by anyone
  • By customers: Content shared securely within and with other organizations

Public content + Private content

AutoML, TPUs, Cloud AI Platform, etc.

slide-24
SLIDE 24

Kubeflow Pipelines enable

  • Workflow orchestration
  • Rapid reliable experimentation
  • Share, re-use & compose

slide-25
SLIDE 25

Demo

slide-26
SLIDE 26

Visual depiction of pipeline topology

slide-27
SLIDE 27

View all current and historical runs, grouped as “Experiments”

slide-28
SLIDE 28

Rich visualizations of metrics

slide-29
SLIDE 29

Clone an existing pipeline

slide-30
SLIDE 30

Access to all config params, inputs and outputs for each run

slide-31
SLIDE 31

Update parameters and submit

slide-32
SLIDE 32

Easy comparison of Runs

slide-33
SLIDE 33

Easy comparison of Runs
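
The run views in the demo (runs grouped into experiments, per-run parameters and outputs, side-by-side comparison) can be pictured with a small data model. The sketch below is plain Python and does not reflect how the Kubeflow Pipelines UI or API stores runs; the `Run` class, the helper functions and all values are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One pipeline run (hypothetical record, not a Kubeflow type)."""
    experiment: str
    name: str
    params: dict    # configuration parameters used for the run
    metrics: dict   # metrics the run reported


def group_by_experiment(runs):
    """Group runs under the experiment they belong to."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.experiment].append(run)
    return dict(grouped)


def compare(run_a, run_b):
    """Return the parameters that differ and metric deltas (b minus a)."""
    keys = sorted(set(run_a.params) | set(run_b.params))
    changed = {k: (run_a.params.get(k), run_b.params.get(k))
               for k in keys
               if run_a.params.get(k) != run_b.params.get(k)}
    shared = sorted(set(run_a.metrics) & set(run_b.metrics))
    deltas = {m: run_b.metrics[m] - run_a.metrics[m] for m in shared}
    return changed, deltas


# Made-up runs, for illustration only.
runs = [
    Run("xgboost-tuning", "run-1",
        {"max_depth": 6, "rounds": 100}, {"validation_errors": 120}),
    Run("xgboost-tuning", "run-2",
        {"max_depth": 8, "rounds": 100}, {"validation_errors": 95}),
    Run("baseline", "run-1",
        {"max_depth": 3, "rounds": 50}, {"validation_errors": 200}),
]

for experiment, members in group_by_experiment(runs).items():
    print(experiment, [r.name for r in members])

changed, deltas = compare(runs[0], runs[1])
print("changed params:", changed)
print("metric deltas:", deltas)
```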

slide-35
SLIDE 35

That’s a wrap.