

SLIDE 1

Towards an Enterprise-grade Machine Learning pipeline with R

Contributions to whiteboxing machine learning for interoperation with production environments

Thomas.Strehl@at.ibm.com Thomas.Weinrich@at.ibm.com Rudolf.Pailer@at.ibm.com @20200120

SLIDE 2

REnterprise: Machine Learning with R in the Enterprise

Thomas.strehl@at.ibm.com, as of 2020-01-20

  • Enterprise Environment
  • Processes, Governance, Architecture, Security
  • Requirements, Develop, Test, Release Management, Rollout
  • Documentation, Incident Management
  • CRISP-DM + MLOps -> CrispML
  • Standard process for data-mining projects
  • DevOps automation for Machine Learning
  • -> Service-oriented architecture for data preparation, training and scoring
  • Demo: rep-admin + rep-crispml
  • CrispML demo implementation on Kubernetes
  • Automated ML pipeline for R

MLOps

Whiteboxing ML for Interoperation with production environments

SLIDE 3

Environments: Development, Test, Production; Data Pipeline; Infrastructure Test

REnterprise: Classical Software Factory Lineup


Stages: Requirements → Solution Design → Development → Test (Unit: JUnit; semi-realistic, specific samples) → Production (Logging: Splunk; Monitoring: Dynatrace). Tooling: Jira + Confluence; Build: Jenkins; SCM: Bitbucket

Governance

Architecture Board, Defect Management, Release Management, Incident Management, Config Management, Test Management, Test Data Management; Scrum, stand-ups, automation. Artifacts: Artifactory; Deploy: Jenkins; Functional tests: Tosca, Selenium; Service tests: SoapUI

SLIDE 4

REnterprise: Classical Machine Learning Pipeline


  • Business understanding
  • Data understanding
  • Data preparation
  • Modeling
  • Evaluation
  • Deployment
  • Devised in the late 1990s
  • Used by around 45% of data projects

CRISP-DM (Cross-Industry Standard Process for Data Mining)

SLIDE 5

REnterprise: ML interaction with the Enterprise


  • Data Preparation + Training
  • Mass data: file system, database, DWH, Data Lake, Data Platform, …
  • Scoring
  • Batch scoring: e.g. Rscript, REST
  • Record scoring: e.g. REST
  • Governance
  • Reporting, Statistics, Performance
  • Documentation, Changes, Defects, Incidents

Interaction of Enterprise services with ML services

SLIDE 6

REnterprise: CRISP-DM generic Interfaces


  • Training and Scoring
  • DataIngestion: Raw Data
  • DataPreparation: Algorithm independent
  • DataCuration: Algorithm specific
  • Training
  • ModelTraining: Algorithm, Hyperparameters
  • ModelReport: ML KPIs
  • ModelPersist: Model Registry
  • Scoring
  • NewDataScore: persist each score with reference to metadata
  • NewDataLabel: import new ground truth
  • NewDataReport: verify new ground truth against persisted score

CrispML: Implementation of 9 methods for Training and Scoring
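The nine methods lend themselves to a REST surface; a minimal plumber sketch of two of them follows (endpoint paths, payload fields, the glm model and the iris data are illustrative assumptions, not the CrispML code):

```r
# Hypothetical sketch of two CrispML-style endpoints with plumber.
# Paths, payload fields and the glm model are illustrative assumptions.
library(plumber)

model_store <- new.env()   # stand-in for a model registry (ModelPersist)

#* ModelTraining: fit a model with the given hyperparameter
#* @post /model/train
function(maxit = 25) {
  fit <- glm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris,
             control = glm.control(maxit = as.integer(maxit)))
  model_store$fit <- fit
  list(status = "trained", aic = AIC(fit))   # ModelReport: ML KPIs
}

#* NewDataScore: score one record
#* @post /data/score
function(sepal_width, petal_length) {
  newdata <- data.frame(Sepal.Width  = as.numeric(sepal_width),
                        Petal.Length = as.numeric(petal_length))
  list(score = unname(predict(model_store$fit, newdata)))
}
```

Saved as `crispml-api.R`, such a file would be served with `plumber::pr("crispml-api.R") |> plumber::pr_run(port = 8000)`.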

SLIDE 7

REnterprise: CrispML Components

The Training area and the Scoring area each run a CrispML app (plumber) with the components: Data Ingestion, Data Preparation, Data Curation, Train Model, Persist Model, Label NewData, Score NewData, Verify NewData. Supporting services: database, storage, trainer, REST client.

CrispML: training and scoring servers and clients

SLIDE 8

REnterprise: CrispML Big Picture

Each environment runs the full ML cycle (Data Ingestion, Data Preparation, Data Curation, Train Model, Persist Model, Label NewData, Score NewData, Verify NewData), split into a Training Environment and a Scoring Environment:

  • GitLab Development Environment
  • QA Environment
  • Production Environment

Data: access to a data pool shared across environments (optional GDPR filters)
Orchestration: run the ML cycle, including training and verification, in place in each environment
All ML functionality in one R package
Deployment: stage ML functionality across environments
Complete pipeline subject to QA; fallback: use the model trained in QA

SLIDE 9

REnterprise: CrispML implementation in R


  • CrispML
  • All CRISP-DM methods designed for remote invocation
  • REST interface to ingestion, training, scoring
  • Plumber (other options: openCPU, rserver)
  • Admin Console
  • Lightweight demo implementation of remote control
  • Shiny GUI app
  • Challenge: no direct access to data, only via REST
  • Runtime Environment
  • Linux (any R platform), Kubernetes, …

CrispML: self-contained, standalone, scalable REST Docker container
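Since the admin console reaches data only via REST, its calls could look like this httr sketch (base URL and endpoint path are assumptions):

```r
# Hypothetical remote invocation of a CrispML endpoint from the Shiny admin
# console; the base URL and endpoint path are illustrative assumptions.
library(httr)

crispml_url <- function(base, path) paste0(sub("/$", "", base), "/", path)

train_remote <- function(base = "http://crispml-training:8000") {
  resp <- POST(crispml_url(base, "model/train"),
               body = list(maxit = 25), encode = "json")
  stop_for_status(resp)
  content(resp, as = "parsed")   # parsed JSON, e.g. status and KPIs
}
```

A Shiny `observeEvent` handler would call `train_remote()` and render the returned KPIs, never touching the data directly.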

SLIDE 10

REnterprise: Demo based on DevOps and Containers


  • ‘Traditional’ platform
  • E.g. Bitbucket, Jira, Confluence, Jenkins, Artifactory, VMware
  • DevOps -> MLOps
  • Pipeline automation: GitLab, Tekton
  • Containerized
  • Docker, Kubernetes
  • IBM Cloud
  • Gitlab, Tekton, Kubernetes, logDNA, sysDIG, DB2

DevOps, Cloud

SLIDE 11

Why DevOps – Traditional software delivery lifecycle

Business Owner → Plan (Requirements) → Develop (Code) → Test (Unit, UAT, …) → Release → Production → Customer

  • Failures due to inconsistent dev and production environments
  • Bottlenecks trying to deliver frequent releases to meet market demands
  • Complex, manual release processes lack repeatability and speed
  • Poor visibility into dependencies across releases, resources, and teams

SLIDE 12

Why DevOps – Transforming the software delivery lifecycle

Plan (Requirements) → Develop (Code) → Release → Production, with Test (Unit, UAT, …) running continuously alongside; Business Owner and Customer in the loop.

  • Agility & Flexibility
  • Standardization
  • Fail fast & fail early
  • Automation

SLIDE 13

DevOps: Continuous flow in Enterprise systems

  • DevOps dimensions (3): People, Processes, Technology
  • DevOps practice areas (6)
  • DevOps software lifecycle (4 stages)

SLIDE 14

DevOps Principles: Continuous everything

Dashboard everything

  • Continuous monitoring
  • Visibility to the teams

Automate everything

  • Continuous Delivery
  • Continuous Integration
  • Infra as Code

Test everything

  • Continuous testing
  • Test automation

Monitor and audit everything

  • Continuous monitoring
  • Logging and monitoring

Collaboration for speed

  • Collaborative steering
  • Collaborative Dev-Ops
  • Feedback loops
SLIDE 15

DevOps: Automation, automation, automation

  • If someone has to do the same thing more than once, it's a candidate for automation
  • If something is hard, do it repeatedly
  • Develop and test against production-like systems
  • Iterative and frequent deployments using repeatable and reliable processes
  • Continuously monitor and validate operational quality characteristics
  • Encourage a culture of experimentation and value team improvement
  • Minimize business risk: fail small and fast
  • All DevOps principles also apply to MLOps -> CrispML approach
SLIDE 16

REnterprise: Containerizing: Docker + Kubernetes


  • Docker
  • Lightweight variant of virtual server
  • Start from downloadable template and enhance along ‘Dockerfile’
  • Persist as ‘image’ and instantiate as ‘container’
  • Template images available e.g. for ‘rshiny’ and ‘plumber’ applications
  • Kubernetes
  • Orchestrator for containerized applications
  • Scaling, load balancing, system monitoring, storage, network, …
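A Dockerfile along these lines would containerize the plumber app (base image tag, package list and file names are assumptions, not the CrispML build):

```dockerfile
# Hypothetical plumber container; image tag and file names are assumptions.
FROM rstudio/plumber:latest

# install the extra R packages the API needs
RUN R -e "install.packages(c('DBI', 'odbc'), repos = 'https://cloud.r-project.org')"

COPY crispml-api.R /app/crispml-api.R
EXPOSE 8000
CMD ["/app/crispml-api.R"]
```

The `rstudio/plumber` base image plumbs the file named in `CMD` and listens on port 8000, so `docker build -t crispml . && docker run -p 8000:8000 crispml` would serve the API.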

CrispML:

SLIDE 17

REnterprise: Scaling R on Kubernetes

thomas.strehl@at.ibm.com @20200202

Kubernetes cluster: worker nodes host pods; each pod runs containers (a plumber-main with plumber-child workers plus a sidecar). An ingress ALB (nginx) routes to per-application services (Service AppX, Service AppY); DaemonSets run sysDIG and logDNA on every node; persistent storage is attached.

Scaling levers:
  • Scaling by future/promise within a container
  • Scaling by ingress ALB for different applications
  • Scaling by service across pods
  • Monitoring of nodes by DaemonSets
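"Scaling by future/promise within the container" could be sketched like this (the endpoint and the blocking work are illustrative assumptions):

```r
# Hypothetical async plumber endpoint: the handler returns a promise, so the
# main R process stays free to accept requests while a worker process
# (the "plumber-child") does the expensive work.
library(plumber)
library(promises)
library(future)

plan(multisession, workers = 2)   # plumber-child style worker processes

#* @get /score-async
function() {
  future_promise({
    Sys.sleep(2)              # stand-in for an expensive scoring call
    list(score = 0.87)        # illustrative result
  })
}
```

With a synchronous handler, two concurrent 2-second requests would serialize to 4 seconds; with `future_promise()` they overlap.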

SLIDE 18

REnterprise: Performance Testing


  • Many options
  • JMeter, Locust (Python), Grinder (Java), Gatling (Scala), …
  • loadimpact/k6
  • JavaScript, 3000+ stars on GitHub
  • Writes to InfluxDB, prebuilt Grafana dashboards, invoked as a container

CrispML: loadimpact/k6 -> influxdb -> grafana
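A k6 script for this setup might look like the following sketch (URL, payload and thresholds are assumptions); k6 pushes its metrics to InfluxDB when invoked as `k6 run --out influxdb=http://influxdb:8086/k6 script.js`:

```javascript
// Hypothetical k6 load test against a CrispML scoring endpoint.
// URL, payload and thresholds are illustrative assumptions.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,                // 10 virtual users
  duration: '1m',
  thresholds: { http_req_duration: ['p(95)<500'] },  // 95% under 500 ms
};

export default function () {
  const res = http.post('http://crispml-scoring:8000/data/score',
                        JSON.stringify({ sepal_width: 3.1, petal_length: 4.2 }),
                        { headers: { 'Content-Type': 'application/json' } });
  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(1);
}
```

Grafana then reads the `http_req_duration` series from InfluxDB for the live response-time dashboards shown later in the deck.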

SLIDE 19

REnterprise: Database and Persistent storage


  • Performance sensitive
  • Throughput depends on network latency
  • R packages
  • pool: DB connection pool
  • dbplyr: execute data frame operations in the DB
  • Kubernetes file system, DWH, Data Lake, Data Platform
  • Persist results (models, parameters, …)
  • Persist state across instances of R processes on different pods/nodes

CrispML: odbc, DBI, dbplyr -> DB2 (requires OS level driver) CrispML: Kubernetes Persistent Volume Claim
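A hedged sketch of the pool + dbplyr pattern (shown against an in-memory SQLite stand-in; the deck's actual target is DB2 via odbc/DBI):

```r
# Connection pool + in-database dplyr. SQLite replaces DB2 here purely for
# illustration; swap the driver for odbc::odbc() to reach DB2.
library(pool)
library(DBI)
library(dplyr)

pool <- dbPool(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(pool, "scores",
             data.frame(model = c("a", "a", "b"), score = c(0.9, 0.7, 0.4)))

# dbplyr translates this pipeline to SQL and runs it inside the database,
# so only the small aggregated result crosses the network.
avg_scores <- tbl(pool, "scores") |>
  group_by(model) |>
  summarise(mean_score = mean(score, na.rm = TRUE)) |>
  collect()

poolClose(pool)
```

The pool hands each request a checked-out connection and returns it afterwards, which matters once several plumber workers share one database.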

SLIDE 20

REnterprise: Application Logging


  • Protocol
  • Flow of operation, quantitative messages (parameters, results, …)
  • R packages
  • rsyslog, log4r, logger, logging, lgr, futile.logger, shinyEventLogger
  • influxdbr
  • Read/write InfluxDB; wrapper around InfluxQL (Influx query language)
  • Kubernetes
  • stdout, stderr, /var/log/*.log -> Elasticsearch, …
  • IBM Cloud: logDNA

CrispML: file, stdout, stderr; influxdbr -> InfluxDB -> Grafana
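The stdout side of this can be a few lines of base R (the field layout is an assumption; Kubernetes ships whatever lands on stdout/stderr to logDNA or Elasticsearch):

```r
# Minimal structured logger to stdout. The timestamp/level/key=value layout
# is an illustrative assumption, not the CrispML format.
log_line <- function(level, msg, ...) {
  fields <- list(...)
  extras <- if (length(fields)) {
    paste(names(fields), unlist(fields), sep = "=", collapse = " ")
  } else {
    ""
  }
  trimws(sprintf("%s [%s] %s %s",
                 format(Sys.time(), "%Y-%m-%dT%H:%M:%OS3"), level, msg, extras))
}

cat(log_line("INFO", "model trained", aic = 184, rows = 150), "\n")
```

influxdbr would forward similar measurements (tags plus fields) to InfluxDB for the Grafana dashboards.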

SLIDE 21

REnterprise: Application Monitoring


  • System resources as seen by application
  • Memory, gc activity, database connections, shiny sessions, plumber calls
  • R-Packages
  • utils (gc, memory.profile), memuse(Sys.*); profmem, bench; hprof*
  • gc(), memory.profile()
  • Slow (>150ms, >200ms)
  • Kubernetes system resource monitoring:
  • Prometheus -> Grafana
  • IBM Cloud: sysDIG

CrispML: entry/exit log -> memory.profile() -> influxdbr -> InfluxDB -> Grafana
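The entry/exit pattern could be a small wrapper like this (the measured fields are an assumption based on the gc()/memory.profile() hints above):

```r
# Hypothetical entry/exit instrumentation: log the duration and gc-reported
# memory around an R expression; CrispML would ship the record to InfluxDB.
with_metrics <- function(name, expr) {
  t0 <- Sys.time()
  mem0 <- sum(gc()[, "used"])          # cells in use at entry
  result <- force(expr)
  elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  mem1 <- sum(gc()[, "used"])          # cells in use at exit
  message(sprintf("%s: %.3fs, gc cells %d -> %d", name, elapsed, mem0, mem1))
  result
}

x <- with_metrics("toy-score", mean(rnorm(1e5)))
```

Wrapping each plumber handler body in `with_metrics()` yields the per-call entry/exit records behind the response-time dashboards.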

SLIDE 22

REnterprise: Demand handling and Build & Deploy


  • Demand Handling
  • Build and Deploy

CrispML: GitLab -> (Epic) -> User Story / Task -> Branch -> Merge -> Version


CrispML: GitLab -> Tekton -> Image Registry -> Kubernetes Deployment
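The build step of that chain could be declared roughly like this Tekton fragment (task name, registry path and builder image are assumptions, not the showcase pipeline):

```yaml
# Hypothetical Tekton Task building the CrispML image; names are assumptions.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: build-crispml-image
spec:
  params:
    - name: IMAGE
      default: registry.example.com/crispml:latest
  steps:
    - name: build-and-push
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --dockerfile=Dockerfile
        - --destination=$(params.IMAGE)
```

A GitLab webhook would hit a Tekton EventListener, which creates the PipelineRun that runs this Task and then updates the Kubernetes Deployment.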

SLIDE 23

Setup showcase CrispML

SLIDE 24

CrispML build & delivery with GitLab and Tekton Pipelines

Flow: a commit in GitLab (SCM, pipeline definition) triggers the Listener; the Listener creates a PipelineRun whose TaskRuns execute the pipeline Tasks; the resulting image is pushed to the image registry and the Kubernetes resources (pod deployments) are updated.

SLIDE 25

REnterprise: CrispML Kubernetes Services

Deployed containers: CrispML* (admin, training, scoring), k6, InfluxDB, Grafana
IBM Cloud services: logDNA, sysDIG, GitLab, Tekton, DB2, Storage

Each service (CrispML admin, CrispML training, CrispML scoring, loadimpact/k6, Tekton pipeline) reports into InfluxDB, logDNA and sysDIG; database and persistent storage are shared.

SLIDE 26

REnterprise: DEMO: CrispML, MLOps and K8S in action


  • CrispML
  • R code: Plumber REST service and remote invocation by RShiny
  • MLOps
  • GitLab repository, issue board, commit in RStudio
  • Tekton trigger, build & deployment pipeline to Kubernetes
  • loadimpact/k6 performance test
  • Docker file and performance test script
  • Kubernetes invocation of the performance test job and scaling from 1 to 3 instances
  • logDNA application logs
  • sysDIG Kubernetes system resources
  • InfluxDB + Grafana: live response-time report

SLIDE 27

REnterprise: CrispML - Shiny application

SLIDE 28

REnterprise: CrispML - Plumber REST service (train model)

SLIDE 29

REnterprise: CrispML - Shiny call to remote REST API

SLIDE 30

REnterprise: MLOps - GIT commit in RStudio

SLIDE 31

REnterprise: MLOps – Tekton build & deploy pipeline to K8S

SLIDE 32

REnterprise: MLOps - Build Pipeline: Reference to source

SLIDE 33

REnterprise: K8S - Dashboard: Deployments

SLIDE 34

REnterprise: InfluxDB - Performance Logging with InfluxdbR

SLIDE 35

REnterprise: logDNA - CrispML application log

SLIDE 36

REnterprise: k6 - Dockerfile

SLIDE 37

REnterprise: k6 - LoadTest Script and K8S job yaml

SLIDE 38

REnterprise: k6 - Performance test Report

SLIDE 39

REnterprise: InfluxDB + Grafana: k6 & Plumber response time

SLIDE 40

REnterprise: K8S Dashboard: Scale to 3 pods running CrispML

SLIDE 41

REnterprise: sysDIG – K8S Resources during pod scale-up test

SLIDE 42

REnterprise: MLOps - GitLab Repository

SLIDE 43

REnterprise: MLOps - GitLab Issue Board

SLIDE 44

REnterprise: MLOps - GitLab Merge Request

SLIDE 45

REnterprise: MLOps - Tekton Pipelines

SLIDE 46

REnterprise: MLOps - Tekton Pipeline Trigger

SLIDE 47

FIN

Thomas.Strehl@at.ibm.com Thomas.Weinrich@at.ibm.com Rudolf.Pailer@at.ibm.com @20200120

SLIDE 48

Towards an Enterprise-grade Machine Learning pipeline with R

Contributions to a machine learning oriented pipeline in an enterprise environment

Thomas.Strehl@at.ibm.com @20200120

SLIDE 49

Value / TRANSFORMATION: Operational BI and Data Warehousing → Self-Service Analytics → New Business Models (MODERNIZATION, COST REDUCTION, INSIGHT-DRIVEN). Most are here.

85% view AI as a strategic opportunity.

"I want AI!"

SLIDE 50

BUT…, business stakeholders do not trust AI.

60% of companies see regulatory constraints as a barrier to implementing AI. – IBM IBV AI 2018

63% cite availability of technical skills as a challenge to implementation. – IBM IBV AI 2018

Without expensive Data Science resources handholding multiple AI models in a production application:
  1. No way to validate whether AI models are compliant with regulations and will achieve expected business outcomes before deploying
  2. Difficult to track and measure indicators of business success in production
  3. Resource-intensive and unreliable processes for ongoing business monitoring and compliance
  4. Impossible for business users to feed subtle domain knowledge back into the model lifecycle

SLIDE 51

I have a Jupyter Notebook – Problem Solved

SLIDE 52

Skill Requirements in Data Science & AI Projects

SLIDE 53

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Hidden Technical Debt in Machine Learning Systems

SLIDE 54

Machine Learning Life Cycle

Data Governance → Model Versioning → Model Deployment → Model Monitoring → Dynamic Model Selection & Retraining

Data Science Solutions are not static by definition!

Modelling & Evaluation

SLIDE 55

Core Team & SMEs; Executive Sponsors with Core & Extended Team

Agile Governance & Steering – results are regularly shared, focus can be adjusted

Confidential

One sprint (4 weeks): Strategic Themes → Prioritized Use Cases → Planning → Prioritized Deep-Dive Topics → Daily Stand-up Meetings → Presentation of Interim Results (e.g. concepts, prototypes, MVP) → Demonstration / Review → Presentation of Pre-final Results → Retrospective (potential for optimization)

SLIDE 56

Example Governance Bodies

Leadership: Sponsor and Senior Management (Vision & Goals)

Governance bodies: Enterprise Architecture; Agile Project; Data Governance (Policies, Security, Compliance); Operations (ML-Ops)

SLIDE 57

Compliance


Requirements

  • EU General Data Protection Regulation - GDPR
  • Industry Specific Regulations
  • Bankwesengesetz (BWG, Austrian Banking Act), Telekommunikationsgesetz (TKG, Telecommunications Act), …
  • General security and data protection considerations

Solutions

  • Data Access Control
  • Pseudonymization
  • Anonymization
  • Data aggregation (e.g. k-anonymity, background knowledge attack)
  • Encryption (data at rest, data in transit)
  • Audit Logs
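The k-anonymity criterion mentioned above can be checked with a few lines of base R (the quasi-identifier columns and k = 3 are illustrative assumptions):

```r
# k-anonymity check: does every combination of quasi-identifier values
# occur at least k times? Columns and k are illustrative assumptions.
is_k_anonymous <- function(df, quasi_ids, k) {
  counts <- table(do.call(paste, df[quasi_ids]))
  all(counts >= k)
}

demo <- data.frame(zip = c("10", "10", "10", "20", "20", "20"),
                   age_band = c("30-39", "30-39", "30-39",
                                "40-49", "40-49", "40-49"))
is_k_anonymous(demo, c("zip", "age_band"), k = 3)   # TRUE for this sample
```

Aggregation alone is not sufficient: the slide's "background knowledge attack" is exactly the case where an attacker's extra knowledge singles out a record inside a k-sized group.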
SLIDE 58

Data Governance


Extract, transform and load data

  • Definition and management of data ingestion pipelines

Registration, Metadata & Discovery

  • Find and understand ingested data sets
  • Metadata for data sets
  • Versioning of data sets
  • Provenance and lineage of data sets

Access control

  • Define users and roles
  • Protect data against unauthorized access

Watson Knowledge Catalog