

SLIDE 1

Towards an Enterprise-grade Machine Learning pipeline with R

Contributions to whiteboxing machine learning for interoperation with production environments

Thomas.Strehl@at.ibm.com Thomas.Weinrich@at.ibm.com Rudolf.Pailer@at.ibm.com @20200120

SLIDE 2

REnterprise: Machine Learning with R in the Enterprise

Thomas.strehl@at.ibm.com, as of 2020-01-20

  • Enterprise Environment
  • Processes, Governance, Architecture, Security
  • Requirements, Develop, Test, Release Management, Rollout
  • Documentation, Incident Management
  • CRISP-DM + MLOps -> CrispML
  • Standard process for data-mining projects
  • DevOps automation for Machine Learning
  • -> Service-oriented architecture for data preparation, training and scoring
  • Demo: rep-admin + rep-crispml
  • CrispML demo implementation on Kubernetes
  • Automated ML pipeline for R

MLOps

Whiteboxing ML for Interoperation with production environments

SLIDE 3

Environments: Development, Test, Production; Data Pipeline; Infrastructure Test

REnterprise: Classical Software Factory Lineup


Stages: Requirements → Solution Design → Development → Test (Unit: JUnit; semi-realistic, specific samples) → Production (Logging: Splunk; Monitoring: Dynatrace). Tooling: Jira + Confluence; Build: Jenkins; SCM: Bitbucket

Governance

Architecture Board, Defect Management, Release Management, Incident Management, Config Management, Test Management, Test Data Management; Scrum, stand-ups, automation. Artifacts: Artifactory; Deploy: Jenkins; Functional tests: Tosca, Selenium; Service tests: SoapUI

SLIDE 4

REnterprise: Classical Machine Learning Pipeline


  • Business understanding
  • Data understanding
  • Data preparation
  • Modeling
  • Evaluation
  • Deployment
  • Devised in the late 1990s
  • Used by around 45% of data projects

CRISP-DM (Cross-Industry Standard Process for Data Mining)

SLIDE 5

REnterprise: ML interaction with the Enterprise


  • Data Preparation + Training
  • Mass data: file system, database, DWH, Data Lake, Data Platform, …
  • Scoring
  • Batch scoring: e.g. Rscript, REST
  • Record scoring: e.g. REST
  • Governance
  • Reporting, Statistics, Performance
  • Documentation, Changes, Defects, Incidents

Interaction of Enterprise services with ML services

SLIDE 6

REnterprise: CRISP-DM generic Interfaces


  • Training and Scoring
  • DataIngestion: Raw Data
  • DataPreparation: Algorithm independent
  • DataCuration: Algorithm specific
  • Training
  • ModelTraining: Algorithm, Hyperparameters
  • ModelReport: ML KPIs
  • ModelPersist: Model Registry
  • Scoring
  • NewDataScore: persist each score with reference to metadata
  • NewDataLabel: import new ground truth
  • NewDataReport: verify new ground truth against persisted score

CrispML: Implementation of 9 methods for Training and Scoring
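The nine methods lend themselves to a REST surface; a minimal plumber sketch of two of them follows (endpoint paths, payload fields, the glm model and the iris data are illustrative assumptions, not the CrispML code):

```r
# Hypothetical sketch of two CrispML-style endpoints with plumber.
# Paths, payload fields and the glm model are illustrative assumptions.
library(plumber)

model_store <- new.env()   # stand-in for a model registry (ModelPersist)

#* ModelTraining: fit a model with the given hyperparameter
#* @post /model/train
function(maxit = 25) {
  fit <- glm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris,
             control = glm.control(maxit = as.integer(maxit)))
  model_store$fit <- fit
  list(status = "trained", aic = AIC(fit))   # ModelReport: ML KPIs
}

#* NewDataScore: score one record
#* @post /data/score
function(sepal_width, petal_length) {
  newdata <- data.frame(Sepal.Width  = as.numeric(sepal_width),
                        Petal.Length = as.numeric(petal_length))
  list(score = unname(predict(model_store$fit, newdata)))
}
```

Saved as `crispml-api.R`, such a file would be served with `plumber::pr("crispml-api.R") |> plumber::pr_run(port = 8000)`.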

SLIDE 7

REnterprise: CrispML Components

The Training area and the Scoring area each run a CrispML app (plumber) with the components: Data Ingestion, Data Preparation, Data Curation, Train Model, Persist Model, Label NewData, Score NewData, Verify NewData. Supporting services: database, storage, trainer, REST client.

CrispML: training and scoring servers and clients

SLIDE 8

REnterprise: CrispML Big Picture

Each environment runs the full ML cycle (Data Ingestion, Data Preparation, Data Curation, Train Model, Persist Model, Label NewData, Score NewData, Verify NewData), split into a Training Environment and a Scoring Environment:

  • GitLab Development Environment
  • QA Environment
  • Production Environment

Data: access to a data pool shared across environments (optional GDPR filters)
Orchestration: run the ML cycle, including training and verification, in place in each environment
All ML functionality in one R package
Deployment: stage ML functionality across environments
Complete pipeline subject to QA; fallback: use the model trained in QA

SLIDE 9

REnterprise: CrispML implementation in R


  • CrispML
  • All CRISP-DM methods designed for remote invocation
  • REST interface to ingestion, training, scoring
  • Plumber (other options: openCPU, rserver)
  • Admin Console
  • Lightweight demo implementation of remote control
  • Shiny GUI app
  • Challenge: no direct access to data, only via REST
  • Runtime Environment
  • Linux (any R platform), Kubernetes, …

CrispML: self-contained, standalone, scalable REST Docker container
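Since the admin console reaches data only via REST, its calls could look like this httr sketch (base URL and endpoint path are assumptions):

```r
# Hypothetical remote invocation of a CrispML endpoint from the Shiny admin
# console; the base URL and endpoint path are illustrative assumptions.
library(httr)

crispml_url <- function(base, path) paste0(sub("/$", "", base), "/", path)

train_remote <- function(base = "http://crispml-training:8000") {
  resp <- POST(crispml_url(base, "model/train"),
               body = list(maxit = 25), encode = "json")
  stop_for_status(resp)
  content(resp, as = "parsed")   # parsed JSON, e.g. status and KPIs
}
```

A Shiny `observeEvent` handler would call `train_remote()` and render the returned KPIs, never touching the data directly.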

SLIDE 10

REnterprise: Demo based on DevOps and Containers


  • ‘Traditional’ platform
  • E.g. Bitbucket, Jira, Confluence, Jenkins, Artifactory, VMware
  • DevOps -> MLOps
  • Pipeline automation: GitLab, Tekton
  • Containerized
  • Docker, Kubernetes
  • IBM Cloud
  • Gitlab, Tekton, Kubernetes, logDNA, sysDIG, DB2

DevOps, Cloud

SLIDE 11

Why DevOps – Traditional software delivery lifecycle

Business Owner → Plan (Requirements) → Develop (Code) → Test (Unit, UAT, …) → Release → Production → Customer

  • Failures due to inconsistent dev and production environments
  • Bottlenecks trying to deliver frequent releases to meet market demands
  • Complex, manual release processes lack repeatability and speed
  • Poor visibility into dependencies across releases, resources, and teams

SLIDE 12

Why DevOps – Transforming the software delivery lifecycle

Plan (Requirements) → Develop (Code) → Release → Production, with Test (Unit, UAT, …) running continuously alongside; Business Owner and Customer in the loop.

  • Agility & Flexibility
  • Standardization
  • Fail fast & fail early
  • Automation

SLIDE 13

DevOps: Continuous flow in Enterprise systems

  • DevOps dimensions (3): People, Processes, Technology
  • DevOps practice areas (6)
  • DevOps software lifecycle (4 stages)

SLIDE 14

DevOps Principles: Continuous everything

Dashboard everything

  • Continuous monitoring
  • Visibility to the teams

Automate everything

  • Continuous Delivery
  • Continuous Integration
  • Infra as Code

Test everything

  • Continuous testing
  • Test automation

Monitor and audit everything

  • Continuous monitoring
  • Logging and monitoring

Collaboration for speed

  • Collaborative steering
  • Collaborative Dev-Ops
  • Feedback loops
SLIDE 15

DevOps: Automation, automation, automation

  • If someone has to do the same thing more than once, it's a candidate for automation
  • If something is hard, do it repeatedly
  • Develop and test against production-like systems
  • Iterative and frequent deployments using repeatable and reliable processes
  • Continuously monitor and validate operational quality characteristics
  • Encourage a culture of experimentation and value team improvement
  • Minimize business risk: fail small and fast
  • All DevOps principles also apply to MLOps -> CrispML approach
SLIDE 16

REnterprise: Containerizing: Docker + Kubernetes


  • Docker
  • Lightweight variant of virtual server
  • Start from downloadable template and enhance along ‘Dockerfile’
  • Persist as ‘image’ and instantiate as ‘container’
  • Template images available e.g. for ‘rshiny’ and ‘plumber’ applications
  • Kubernetes
  • Orchestrator for containerized applications
  • Scaling, load balancing, system monitoring, storage, network, …
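A Dockerfile along these lines would containerize the plumber app (base image tag, package list and file names are assumptions, not the CrispML build):

```dockerfile
# Hypothetical plumber container; image tag and file names are assumptions.
FROM rstudio/plumber:latest

# install the extra R packages the API needs
RUN R -e "install.packages(c('DBI', 'odbc'), repos = 'https://cloud.r-project.org')"

COPY crispml-api.R /app/crispml-api.R
EXPOSE 8000
CMD ["/app/crispml-api.R"]
```

The `rstudio/plumber` base image plumbs the file named in `CMD` and listens on port 8000, so `docker build -t crispml . && docker run -p 8000:8000 crispml` would serve the API.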

CrispML:

SLIDE 17

REnterprise: Scaling R on Kubernetes

thomas.strehl@at.ibm.com @20200202

Kubernetes cluster: worker nodes host pods; each pod runs containers (a plumber-main with plumber-child workers plus a sidecar). An ingress ALB (nginx) routes to per-application services (Service AppX, Service AppY); DaemonSets run sysDIG and logDNA on every node; persistent storage is attached.

Scaling levers:
  • Scaling by future/promise within a container
  • Scaling by ingress ALB for different applications
  • Scaling by service across pods
  • Monitoring of nodes by DaemonSets
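"Scaling by future/promise within the container" could be sketched like this (the endpoint and the blocking work are illustrative assumptions):

```r
# Hypothetical async plumber endpoint: the handler returns a promise, so the
# main R process stays free to accept requests while a worker process
# (the "plumber-child") does the expensive work.
library(plumber)
library(promises)
library(future)

plan(multisession, workers = 2)   # plumber-child style worker processes

#* @get /score-async
function() {
  future_promise({
    Sys.sleep(2)              # stand-in for an expensive scoring call
    list(score = 0.87)        # illustrative result
  })
}
```

With a synchronous handler, two concurrent 2-second requests would serialize to 4 seconds; with `future_promise()` they overlap.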

SLIDE 18

REnterprise: Performance Testing


  • Many options
  • JMeter, Locust (Python), Grinder (Java), Gatling (Scala), …
  • loadimpact/k6
  • JavaScript, 3000+ stars on GitHub
  • Writes to InfluxDB, prebuilt Grafana dashboards, invoked as a container

CrispML: loadimpact/k6 -> influxdb -> grafana
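A k6 script for this setup might look like the following sketch (URL, payload and thresholds are assumptions); k6 pushes its metrics to InfluxDB when invoked as `k6 run --out influxdb=http://influxdb:8086/k6 script.js`:

```javascript
// Hypothetical k6 load test against a CrispML scoring endpoint.
// URL, payload and thresholds are illustrative assumptions.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,                // 10 virtual users
  duration: '1m',
  thresholds: { http_req_duration: ['p(95)<500'] },  // 95% under 500 ms
};

export default function () {
  const res = http.post('http://crispml-scoring:8000/data/score',
                        JSON.stringify({ sepal_width: 3.1, petal_length: 4.2 }),
                        { headers: { 'Content-Type': 'application/json' } });
  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(1);
}
```

Grafana then reads the `http_req_duration` series from InfluxDB for the live response-time dashboards shown later in the deck.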

SLIDE 19

REnterprise: Database and Persistent storage


  • Performance sensitive
  • Throughput depends on network latency
  • R packages
  • pool: DB connection pool
  • dbplyr: execute data frame operations in the DB
  • Kubernetes file system, DWH, Data Lake, Data Platform
  • Persist results (models, parameters, …)
  • Persist state across instances of R processes on different pods/nodes

CrispML: odbc, DBI, dbplyr -> DB2 (requires OS level driver) CrispML: Kubernetes Persistent Volume Claim
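A hedged sketch of the pool + dbplyr pattern (shown against an in-memory SQLite stand-in; the deck's actual target is DB2 via odbc/DBI):

```r
# Connection pool + in-database dplyr. SQLite replaces DB2 here purely for
# illustration; swap the driver for odbc::odbc() to reach DB2.
library(pool)
library(DBI)
library(dplyr)

pool <- dbPool(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(pool, "scores",
             data.frame(model = c("a", "a", "b"), score = c(0.9, 0.7, 0.4)))

# dbplyr translates this pipeline to SQL and runs it inside the database,
# so only the small aggregated result crosses the network.
avg_scores <- tbl(pool, "scores") |>
  group_by(model) |>
  summarise(mean_score = mean(score, na.rm = TRUE)) |>
  collect()

poolClose(pool)
```

The pool hands each request a checked-out connection and returns it afterwards, which matters once several plumber workers share one database.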

SLIDE 20

REnterprise: Application Logging


  • Protocol
  • Flow of operation, quantitative messages (parameters, results, …)
  • R packages
  • rsyslog, log4r, logger, logging, lgr, futile.logger, shinyEventLogger
  • influxdbr
  • Read/write InfluxDB; wrapper around InfluxQL (Influx query language)
  • Kubernetes
  • stdout, stderr, /var/log/*.log -> Elasticsearch, …
  • IBM Cloud: logDNA

CrispML: file, stdout, stderr; influxdbr -> InfluxDB -> Grafana
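The stdout side of this can be a few lines of base R (the field layout is an assumption; Kubernetes ships whatever lands on stdout/stderr to logDNA or Elasticsearch):

```r
# Minimal structured logger to stdout. The timestamp/level/key=value layout
# is an illustrative assumption, not the CrispML format.
log_line <- function(level, msg, ...) {
  fields <- list(...)
  extras <- if (length(fields)) {
    paste(names(fields), unlist(fields), sep = "=", collapse = " ")
  } else {
    ""
  }
  trimws(sprintf("%s [%s] %s %s",
                 format(Sys.time(), "%Y-%m-%dT%H:%M:%OS3"), level, msg, extras))
}

cat(log_line("INFO", "model trained", aic = 184, rows = 150), "\n")
```

influxdbr would forward similar measurements (tags plus fields) to InfluxDB for the Grafana dashboards.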

SLIDE 21

REnterprise: Application Monitoring


  • System resources as seen by application
  • Memory, gc activity, database connections, shiny sessions, plumber calls
  • R-Packages
  • utils (gc, memory.profile), memuse(Sys.*); profmem, bench; hprof*
  • gc(), memory.profile()
  • Slow (>150ms, >200ms)
  • Kubernetes system resource monitoring:
  • Prometheus -> Grafana
  • IBM Cloud: sysDIG

CrispML: entry/exit log -> memory.profile() -> influxdbr -> InfluxDB -> Grafana
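The entry/exit pattern could be a small wrapper like this (the measured fields are an assumption based on the gc()/memory.profile() hints above):

```r
# Hypothetical entry/exit instrumentation: log the duration and gc-reported
# memory around an R expression; CrispML would ship the record to InfluxDB.
with_metrics <- function(name, expr) {
  t0 <- Sys.time()
  mem0 <- sum(gc()[, "used"])          # cells in use at entry
  result <- force(expr)
  elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  mem1 <- sum(gc()[, "used"])          # cells in use at exit
  message(sprintf("%s: %.3fs, gc cells %d -> %d", name, elapsed, mem0, mem1))
  result
}

x <- with_metrics("toy-score", mean(rnorm(1e5)))
```

Wrapping each plumber handler body in `with_metrics()` yields the per-call entry/exit records behind the response-time dashboards.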

SLIDE 22

REnterprise: Demand handling and Build & Deploy


  • Demand Handling
  • Build and Deploy

CrispML: GitLab -> (Epic) -> User Story / Task -> Branch -> Merge -> Version


CrispML: GitLab -> Tekton -> Image Registry -> Kubernetes Deployment
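The build step of that chain could be declared roughly like this Tekton fragment (task name, registry path and builder image are assumptions, not the showcase pipeline):

```yaml
# Hypothetical Tekton Task building the CrispML image; names are assumptions.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: build-crispml-image
spec:
  params:
    - name: IMAGE
      default: registry.example.com/crispml:latest
  steps:
    - name: build-and-push
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --dockerfile=Dockerfile
        - --destination=$(params.IMAGE)
```

A GitLab webhook would hit a Tekton EventListener, which creates the PipelineRun that runs this Task and then updates the Kubernetes Deployment.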

SLIDE 23

Setup showcase CrispML

SLIDE 24

CrispML build & delivery with GitLab and Tekton Pipelines

Flow: a commit in GitLab (SCM, pipeline definition) triggers the Listener; the Listener creates a PipelineRun whose TaskRuns execute the pipeline Tasks; the resulting image is pushed to the image registry and the Kubernetes resources (pod deployments) are updated.

SLIDE 25

REnterprise: CrispML Kubernetes Services

Deployed containers: CrispML* (admin, training, scoring), k6, InfluxDB, Grafana
IBM Cloud services: logDNA, sysDIG, GitLab, Tekton, DB2, Storage

Each service (CrispML admin, CrispML training, CrispML scoring, loadimpact/k6, Tekton pipeline) reports into InfluxDB, logDNA and sysDIG; database and persistent storage are shared.

SLIDE 26

REnterprise: DEMO: CrispML, MLOps and K8S in action


  • CrispML
  • R code: Plumber REST service and remote invocation by RShiny
  • MLOps
  • GitLab repository, issue board, commit in RStudio
  • Tekton trigger, build & deployment pipeline to Kubernetes
  • loadimpact/k6 performance test
  • Docker file and performance test script
  • Kubernetes invocation of the performance test job and scaling from 1 to 3 instances
  • logDNA application logs
  • sysDIG Kubernetes system resources
  • InfluxDB + Grafana: live response-time report

SLIDE 27

REnterprise: CrispML - Shiny application

SLIDE 28

REnterprise: CrispML - Plumber REST service (train model)

SLIDE 29

REnterprise: CrispML - Shiny call to remote REST API

SLIDE 30

REnterprise: MLOps - GIT commit in RStudio

SLIDE 31

REnterprise: MLOps – Tekton build & deploy pipeline to K8S

SLIDE 32

REnterprise: MLOps - Build Pipeline: Reference to source

SLIDE 33

REnterprise: K8S - Dashboard: Deployments

SLIDE 34

REnterprise: InfluxDB - Performance Logging with InfluxdbR

SLIDE 35

REnterprise: logDNA - CrispML application log

SLIDE 36

REnterprise: k6 - Dockerfile

SLIDE 37

REnterprise: k6 - LoadTest Script and K8S job yaml

SLIDE 38

REnterprise: k6 - Performance test Report

SLIDE 39

REnterprise: InfluxDB + Grafana: k6 & Plumber response time

SLIDE 40

REnterprise: K8S Dashboard: Scale to 3 pods running CrispML

SLIDE 41

REnterprise: sysDIG – K8S Resources during pod scale-up test

SLIDE 42

REnterprise: MLOps - GitLab Repository

SLIDE 43

REnterprise: MLOps - GitLab Issue Board

SLIDE 44

REnterprise: MLOps - GitLab Merge Request

SLIDE 45

REnterprise: MLOps - Tekton Pipelines

SLIDE 46

REnterprise: MLOps - Tekton Pipeline Trigger

SLIDE 47

FIN

Thomas.Strehl@at.ibm.com Thomas.Weinrich@at.ibm.com Rudolf.Pailer@at.ibm.com @20200120

SLIDE 48

Towards an Enterprise-grade Machine Learning pipeline with R

Contributions to a machine learning oriented pipeline in an enterprise environment

Thomas.Strehl@at.ibm.com @20200120

SLIDE 49

Value / TRANSFORMATION: Operational BI and Data Warehousing → Self-Service Analytics → New Business Models (MODERNIZATION, COST REDUCTION, INSIGHT-DRIVEN). Most are here.

85% view AI as a strategic opportunity.

"I want AI!"

SLIDE 50

BUT…, business stakeholders do not trust AI.

60% of companies see regulatory constraints as a barrier to implementing AI. – IBM IBV AI 2018

63% cite availability of technical skills as a challenge to implementation. – IBM IBV AI 2018

Without expensive Data Science resources handholding multiple AI models in a production application:
  1. No way to validate whether AI models are compliant with regulations and will achieve expected business outcomes before deploying
  2. Difficult to track and measure indicators of business success in production
  3. Resource-intensive and unreliable processes for ongoing business monitoring and compliance
  4. Impossible for business users to feed subtle domain knowledge back into the model lifecycle

SLIDE 51

I have a Jupyter Notebook – Problem Solved

SLIDE 52

Skill Requirements in Data Science & AI Projects

SLIDE 53

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Hidden Technical Debt in Machine Learning Systems

SLIDE 54

Machine Learning Life Cycle

Data Governance → Model Versioning → Model Deployment → Model Monitoring → Dynamic Model Selection & Retraining

Data Science Solutions are not static by definition!

Modelling & Evaluation

SLIDE 55

Core Team & SMEs; Executive Sponsors with Core & Extended Team

Agile Governance & Steering – results are regularly shared, focus can be adjusted

Confidential

One sprint (4 weeks): Strategic Themes → Prioritized Use Cases → Planning → Prioritized Deep-Dive Topics → Daily Stand-up Meetings → Presentation of Interim Results (e.g. concepts, prototypes, MVP) → Demonstration / Review → Presentation of Pre-final Results → Retrospective (potential for optimization)

SLIDE 56

Example Governance Bodies

Leadership: Sponsor and Senior Management (Vision & Goals)

Governance bodies: Enterprise Architecture; Agile Project; Data Governance (Policies, Security, Compliance); Operations (ML-Ops)

SLIDE 57

Compliance


Requirements

  • EU General Data Protection Regulation - GDPR
  • Industry Specific Regulations
  • Bankwesengesetz (BWG, Austrian Banking Act), Telekommunikationsgesetz (TKG, Telecommunications Act), …
  • General security and data protection considerations

Solutions

  • Data Access Control
  • Pseudonymization
  • Anonymization
  • Data aggregation (e.g. k-anonymity, background knowledge attack)
  • Encryption (data at rest, data in transit)
  • Audit Logs
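The k-anonymity criterion mentioned above can be checked with a few lines of base R (the quasi-identifier columns and k = 3 are illustrative assumptions):

```r
# k-anonymity check: does every combination of quasi-identifier values
# occur at least k times? Columns and k are illustrative assumptions.
is_k_anonymous <- function(df, quasi_ids, k) {
  counts <- table(do.call(paste, df[quasi_ids]))
  all(counts >= k)
}

demo <- data.frame(zip = c("10", "10", "10", "20", "20", "20"),
                   age_band = c("30-39", "30-39", "30-39",
                                "40-49", "40-49", "40-49"))
is_k_anonymous(demo, c("zip", "age_band"), k = 3)   # TRUE for this sample
```

Aggregation alone is not sufficient: the slide's "background knowledge attack" is exactly the case where an attacker's extra knowledge singles out a record inside a k-sized group.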
SLIDE 58

Data Governance


Extract, transform and load data

  • Definition and management of data ingestion pipelines

Registration, Metadata & Discovery

  • Find and understand ingested data sets
  • Metadata for data sets
  • Versioning of data sets
  • Provenance and lineage of data sets

Access control

  • Define users and roles
  • Protect data against unauthorized access

Watson Knowledge Catalog