Accelerate Innovation in the Enterprise with Distributed ML / DL: Solutions and Reference Architecture

Nanda Vijaydev, BlueData (recently acquired by HPE), The AI Conference, New York


SLIDE 1

Accelerate Innovation in the Enterprise with Distributed ML / DL

Solutions and Reference Architecture

Nanda Vijaydev, BlueData (recently acquired by HPE), The AI Conference, New York

SLIDE 2
Agenda

  • AI, Machine Learning (ML), and Deep Learning (DL)
  • Example Enterprise Use Cases
  • Deployment Challenges for Distributed ML / DL
  • Distributed TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL
  • Lessons Learned and Key Takeaways

SLIDE 3

AI, Machine Learning, and Deep Learning

SLIDE 4

Let’s Get Grounded… What Is AI?

What Are Machine Learning and Deep Learning?

[Diagram: Artificial intelligence, Machine learning, Deep learning]

Artificial intelligence (AI)

Mimics human behavior: any technique that enables machines to solve a task the way humans do.

Example: Siri

Machine learning (ML)

Algorithms that allow computers to learn from examples without being explicitly programmed.

Example: Google Maps

Deep learning (DL)

A subset of ML that uses deep artificial neural networks as models, inspired by the structure and function of the human brain.

Example: Self-driving car
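The machine learning definition above can be made concrete with a minimal sketch that is not part of the talk. The code is never told the rule y = 2x + 1 that generated its toy examples, yet it recovers the slope and intercept from the examples alone, using gradient descent on mean squared error. The data, the learning rate, and the helper name `fit_line` are illustrative assumptions.

```python
# Minimal illustration of "learning from examples": fit y ≈ w * x + b
# by gradient descent on mean squared error. The rule y = 2x + 1 is only
# used to generate the toy examples; the fitting code never sees it.

def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Return (w, b) that minimize mean squared error over the examples."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]  # toy examples from a rule the learner does not know
    w, b = fit_line(xs, ys)
    print(f"learned w={w:.3f}, b={b:.3f}")
```

Deep learning replaces the two-parameter line with a many-layered neural network, but the idea of adjusting parameters to reduce error on examples is the same.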

SLIDE 5

Why Should You Be Interested in AI / ML / DL?

Everyone wants AI / ML / DL and advanced analytics… but most organizations face many challenges:

  • Use cases
  • New roles, skill gaps
  • Culture and change
  • Data preparation
  • Legacy infrastructure

  • AI and advanced analytics infrastructure could constitute 15-20% of the market by 2021 [1]
  • AI and advanced analytics represent 2 of the top 3 CIO priorities
  • Enterprise AI adoption: 2.7X growth in the last 4 years [2]

[1] IDC; Goldman Sachs; HPE Corporate Strategy, 2018
[2] Gartner, “2019 CIO Survey: CIOs Have Awoken to the Importance of AI”

SLIDE 6

Key Questions Remain …

  • How do you integrate your AI and data ecosystem for ML / DL and advanced analytics?
  • How do you modernize, consume, and prepare your EDW or Hadoop big data foundation for AI?
  • How do you get started gaining intelligence from your data?
  • What is the best way to prepare your company for a data-centric, AI-driven future?
  • What opportunities does AI bring to your business? What are the major use cases?

SLIDE 7

AI / ML / DL Adoption in the Enterprise

  • Health: personalized medicine, image analytics
  • Manufacturing: predictive and prescriptive maintenance
  • Consumer tech: chatbots
  • Financial services: fraud detection, ID verification
  • Government: cybersecurity, smart cities and utilities
  • Energy: seismic and reservoir modeling
  • Service providers: media delivery
  • Retail: video surveillance, shopping patterns

SLIDE 8

Example Enterprise Use Cases

SLIDE 9

Financial Services Use Cases

Fraud Detection
  • Real-Time Transactions
  • Credit Card
  • Merchant
  • Collusion
  • Impersonation
  • Social Engineering Fraud

Risk Modeling & Creditworthiness Check
  • Loan Defaults
  • Delayed Payments
  • Liquidity
  • Market & Currencies
  • Purchases and Payments Time Series

CLV Prediction and Recommendation
  • Historical Purchase View
  • Pattern Recognition
  • Retention Strategy
  • Upsell
  • Cross-Sell
  • Nurturing

Customer Segmentation
  • Behavioral Analysis
  • Understanding Customer Quadrant
  • Effective Messaging & Improved Engagement
  • Targeted Customer Support
  • Enhanced Retention

Other
  • Image Recognition
  • NLP
  • Security
  • Video Analysis

Wide Range of ML / DL Use Cases for Wholesale / Commercial Banking, Credit Card / Payments, Retail Banking, etc.

CLV: Customer Lifetime Value

SLIDE 10

Fraud Detection Use Case

  • One of the most common use cases for ML / DL in Financial Services is to detect and prevent fraud
  • This requires:
     – Distributed Big Data processing frameworks such as Spark
     – ML / DL tools such as TensorFlow, H2O, and others
     – Continuous model training and deployment
     – Multiple large data sets

SLIDE 11

Fraud Detection Use Case (cont’d)

  • Data science teams need the ability to create distributed ML / DL environments for sandboxing as well as trial-and-error experimentation
  • This requires:
     – Hardware acceleration (e.g., Intel Xeon processors, Intel MKL)
     – Multiple different ML / DL and data science tools
     – Fast and repeatable deployment of clusters

SLIDE 12
ML / DL in Healthcare – Use Cases

  • Precision Medicine and Personal Sensing
     – Disease prediction, diagnosis, and detection (e.g., genomics research)
     – Using data from local sensors (e.g., mobile phones) to identify human behavior
  • Electronic Health Record (EHR) correlation
     – “Smart” health records
  • Improved Clinical Workflow
     – Decision support for clinicians
  • Claims Management and Fraud Detection
     – Identify fraudulent claims
  • Drug Discovery and Development

SLIDE 13
Use Case: Precision Medicine

  • Many types of data
     – Genomic, microbiome, epigenome, etc.
  • Huge volumes of data (petabytes to exabytes)

SLIDE 14

360° View of the Patient

[Diagram: Patient; Visit; Care Site; Rx; Demographics; Studies; Genomics; Diagnosis; Labs]

SLIDE 15

Deployment Challenges for Distributed ML / DL

SLIDE 16

Why Distributed ML / DL?

  • Speed
  • Large Data Volumes
  • Fault Tolerance

SLIDE 17
Distributed ML / DL – Challenges

  • Complexity, lack of repeatability and reproducibility across environments
  • Sharing data, not duplicating data
  • Need agility to scale compute resources up and down
  • Deploying multiple distributed platforms, libraries, applications, and versions
  • A one-size-fits-all environment fits none
  • Need a flexible and future-proof solution

[Diagram: Laptop, On-Prem Cluster, Off-Prem Cluster]

SLIDE 18

Example Deployment Challenges

  • How to run clusters on heterogeneous host hardware
     – CPUs and GPUs, including multiple GPU versions
  • How to maximize use of expensive hardware resources
  • How to minimize manual operations
     – Automating the cluster creation and deployment process
     – Creating reproducible clusters and reproducible results
     – Enabling on-demand provisioning and elasticity

SLIDE 19

Example Deployment Challenges (cont’d)

  • How to support the latest versions of software
     – Deployment complexity and upgrades
     – Version compatibility
  • How to ensure enterprise-class security
     – Network, storage, user authentication, and access

SLIDE 20

Modern Technology Innovations

Docker is software that performs operating-system-level virtualization, also known as containerization. Containerization allows multiple isolated instances to run on a single server.

Source: https://en.wikipedia.org/wiki/docker_(software)

  • Simplify Deployments
  • Innovate Faster
  • Deploy Anywhere

SLIDE 21

Distributed ML / DL and Containers

  • ML / DL applications are compute- and hardware-intensive
  • They can benefit from the flexibility, agility, and resource-sharing attributes of containerization
  • But care must be taken in how this is done, especially in a large-scale distributed environment

SLIDE 22

AI-Driven Solutions for the Enterprise

[Diagram: Example Industry Use Cases (Fraud Detection, Genome Research, Video Surveillance, Customer 360); Solutions; Data Science and ML / DL Tools; Data Platforms; IT; Data (HDFS/NFS, Data Store, Cloud). Callouts: User Access, Security, Time to Deploy, Multi-Tenant, Data Duplication]

SLIDE 23

Turnkey Container-Based Solution

BlueData EPIC™ Software Platform
  • IOBoost™ – Extreme performance and scalability
  • ElasticPlane™ – Self-service, multi-tenant clusters
  • DataTap™ – In-place access to data on-prem or in the cloud

[Diagram: users (Data Scientists, Developers, Data Engineers, Data Analysts); tools (BI/Analytics Tools, Bring-Your-Own, Big Data Tools, ML / DL Tools, Data Science Tools); compute (CPUs, GPUs); storage (NFS, HDFS); On-Premises and Public Cloud]

SLIDE 24

TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL

SLIDE 25

Distributed TensorFlow – Concepts

  • Running TensorFlow training in parallel on multiple devices
  • The goal is to improve accuracy and speed
  • Different layers may be trained on different nodes (model parallelism)
  • The same model can be applied to different subsets of the data on different nodes (data parallelism)
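Data parallelism can be sketched in plain Python with the workers simulated in a single process. This is a hypothetical illustration, not TensorFlow or Horovod code: every worker holds the same weight `w` for a one-parameter model y ≈ w * x, computes a gradient on its own equal-size shard of the data, and the averaged gradient drives one shared update. With equal shards the average equals the full-batch gradient, which is why synchronous data parallelism reproduces single-machine gradient descent. The function names are assumptions made for illustration.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(data, num_workers):
    """Split data into num_workers equal, contiguous shards (one per worker)."""
    if len(data) % num_workers:
        raise ValueError("this sketch assumes the data divides evenly across workers")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w, xs, ys, num_workers, lr):
    """One synchronous data-parallel SGD step using averaged per-worker gradients."""
    local_grads = [mse_grad(w, xs_i, ys_i)
                   for xs_i, ys_i in zip(shard(xs, num_workers), shard(ys, num_workers))]
    # In a real cluster, a parameter server or an allreduce performs this averaging.
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad


if __name__ == "__main__":
    xs = [float(i) for i in range(8)]
    ys = [3.0 * x for x in xs]  # toy data generated from y = 3x
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, xs, ys, num_workers=4, lr=0.01)
    print(f"w after 50 steps: {w:.4f}")
```

Model parallelism would instead split the model itself (for example, different layers) across nodes, which this sketch does not attempt.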

SLIDE 26

Distributed TensorFlow – Schemes

  • Data parallelism implementation
     – Needs to sync model parameters
     – Uses a centralized or decentralized scheme to communicate parameter updates
  • Centralized schemes use a Parameter Server to communicate parameter updates (gradients) between nodes
  • Decentralized schemes use ring-allreduce
  • Horovod is an open source framework developed by Uber that supports allreduce
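The ring-allreduce scheme can be simulated in a single process to show how it works. This is a teaching sketch, not Horovod's implementation, which relies on libraries such as MPI or NCCL. Each of N nodes splits its gradient vector into N chunks. In the reduce-scatter phase, every node passes one chunk to its right-hand neighbour N - 1 times, adding what it receives, so each node ends up owning one fully summed chunk. In the allgather phase, the summed chunks travel around the ring N - 1 more times until every node has the complete result. A centralized, parameter-server-style sum is included for comparison; all function names are hypothetical.

```python
def chunk_bounds(length, n):
    """Split range(length) into n contiguous (start, end) chunks of near-equal size."""
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for k in range(n):
        end = start + base + (1 if k < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def ring_allreduce(vectors):
    """Simulate ring-allreduce (sum) over len(vectors) nodes arranged in a ring.

    Node i only ever sends to node (i + 1) % n. Returns every node's final buffer.
    """
    n = len(vectors)
    bufs = [list(v) for v in vectors]  # copy so the inputs are not modified
    bounds = chunk_bounds(len(vectors[0]), n)

    # Phase 1, reduce-scatter: after n - 1 steps, node i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # All nodes send at once, so snapshot every outgoing chunk before applying any.
        sends = [((i - step) % n, bufs[i][slice(*bounds[(i - step) % n])]) for i in range(n)]
        for i, (k, data) in enumerate(sends):
            start = bounds[k][0]
            dst = bufs[(i + 1) % n]
            for j, value in enumerate(data):
                dst[start + j] += value

    # Phase 2, allgather: circulate the fully summed chunks so every node gets all of them.
    for step in range(n - 1):
        sends = [((i + 1 - step) % n, bufs[i][slice(*bounds[(i + 1 - step) % n])]) for i in range(n)]
        for i, (k, data) in enumerate(sends):
            start, end = bounds[k]
            bufs[(i + 1) % n][start:end] = data
    return bufs


def parameter_server_allreduce(vectors):
    """Centralized alternative: one server sums all vectors and sends the total back."""
    total = [sum(values) for values in zip(*vectors)]
    return [list(total) for _ in vectors]


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0, 5.0],
             [10.0, 20.0, 30.0, 40.0, 50.0],
             [100.0, 200.0, 300.0, 400.0, 500.0]]
    for node, buf in enumerate(ring_allreduce(grads)):
        print(f"node {node}: {buf}")
```

Both functions produce the same sum on every node; to average gradients, as data-parallel training does, divide the result by N. The difference lies in who carries the traffic: in the ring, every node sends and receives the same amount, while the centralized version funnels everything through one server.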

SLIDE 27

Meet Horovod

  • Distributed training framework for
     – TensorFlow, PyTorch, Keras
  • Separates infrastructure capabilities from ML
  • Installs easily on an existing ML framework
     – pip install horovod
  • Uses bandwidth-optimal communication protocols
     – RDMA, InfiniBand if available
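A short back-of-the-envelope calculation shows why ring-allreduce is often described as bandwidth-optimal. With a gradient of S bytes and N nodes, each node sends 2(N - 1)/N × S bytes per step (N - 1 chunks of S/N in each of the two phases), which stays below 2S no matter how many nodes join. By contrast, a single central parameter server, assumed here to be unsharded, receives N × S bytes of gradients and sends N × S bytes of parameters each step, so its traffic grows linearly with N. The 100 MB gradient size and the function names are illustrative assumptions.

```python
def ring_allreduce_bytes_per_node(model_bytes, num_nodes):
    """Bytes each node sends per step in ring-allreduce: 2 * (N - 1) / N * S."""
    if num_nodes < 2:
        return 0.0  # a single node has nothing to exchange
    return 2 * (num_nodes - 1) / num_nodes * model_bytes


def parameter_server_bytes(model_bytes, num_workers):
    """Bytes through one unsharded central server per step: N * S in plus N * S out."""
    return 2 * num_workers * model_bytes


if __name__ == "__main__":
    model_mb = 100  # illustrative gradient size in MB
    for n in (2, 4, 8, 16):
        ring = ring_allreduce_bytes_per_node(model_mb, n)
        ps = parameter_server_bytes(model_mb, n)
        print(f"N={n:2d}: ring per node {ring:6.1f} MB, central server {ps:6.0f} MB")
```

For 16 nodes, ring-allreduce moves 187.5 MB per node, versus 3,200 MB through the central server. Real parameter-server deployments often shard parameters across several servers to spread this load, which the sketch ignores.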

SLIDE 28

TensorFlow with Horovod on Docker

[Diagram: three Docker containers, each with MPI 3.1.3, TensorFlow 1.7*, and MKL; Shared Data]

Horovod cluster on multiple containers and machines

SLIDE 29

TensorFlow with Horovod

  • tensorflow_word2vec.py from the examples in the Horovod Git repository (https://github.com/horovod/horovod)
  • Data comes from shared NFS mounts, automatically surfaced into containers by BlueData
  • Passwordless SSH set up during cluster creation
  • All prerequisites installed on all nodes, including:
     – MKL (Math Kernel Library)
     – tensorflow, pytorch, scikit-learn, ... (compute frameworks)
     – openmpi (to distribute the job)
     – tensorboard for visualization
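Because every container sees the same shared NFS mount, each MPI process needs a deterministic way to pick its own subset of the input files. A common pattern, sketched here under assumptions, is to sort the file list and give rank r every size-th file starting at r. In a Horovod script the rank and size would normally come from `hvd.rank()` and `hvd.size()`; this sketch instead reads the `OMPI_COMM_WORLD_RANK` and `OMPI_COMM_WORLD_SIZE` environment variables that Open MPI's `mpirun` sets. The mount path, file names, and function names are hypothetical.

```python
import os


def shard_files(paths, rank, size):
    """Return the files assigned to `rank` out of `size` processes.

    Sorting first guarantees that every process sees the same order, since
    directory listings are not guaranteed to come back sorted.
    """
    if size < 1 or not 0 <= rank < size:
        raise ValueError(f"invalid rank={rank} for size={size}")
    return sorted(paths)[rank::size]


def rank_and_size_from_env():
    """Read rank/size exported by Open MPI's mpirun; default to a single process."""
    rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
    size = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))
    return rank, size


if __name__ == "__main__":
    # Hypothetical files on the shared NFS mount
    files = [f"/mnt/nfs/corpus/part-{i:02d}.txt" for i in range(6)]
    rank, size = rank_and_size_from_env()
    for path in shard_files(files, rank, size):
        print(f"rank {rank} reads {path}")
```

Striding by rank keeps the shards disjoint and covers every file exactly once, without any coordination between processes beyond agreeing on the sorted list.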

SLIDE 30

App Store with Pre-Built ML / DL Images

  • Docker images for multiple applications and versions
  • Ability to create and add new images

SLIDE 31

TensorFlow with Horovod

    mpirun -np 4 \
        --allow-run-as-root \
        -d -H bluedata-302.bdlocal:2,bluedata-301.bdlocal:4 \
        -bind-to none -map-by slot \
        -x LD_LIBRARY_PATH \
        -x PATH \
        -mca pml ob1 \
        -mca btl ^openib \
        python tensorflow_word2vec_logs.py
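When cluster membership changes, the `-H` host list and the `-np` process count have to be kept consistent by hand. The hypothetical helper below assembles the `mpirun` arguments from this slide out of a host-to-slots mapping and refuses to launch more processes than there are slots. The Open MPI flags are taken from the slide; the function name and defaults are assumptions, and the sketch leaves out the slide's `-d` option.

```python
import shlex


def build_mpirun_command(hosts, num_procs, script,
                         export_vars=("LD_LIBRARY_PATH", "PATH")):
    """Build an Open MPI mpirun argv list for a Horovod training script.

    hosts maps hostname -> number of slots (processes) that host may run.
    """
    total_slots = sum(hosts.values())
    if not 0 < num_procs <= total_slots:
        raise ValueError(f"-np {num_procs} does not fit in {total_slots} slots")
    host_spec = ",".join(f"{host}:{slots}" for host, slots in hosts.items())
    cmd = ["mpirun", "-np", str(num_procs), "--allow-run-as-root",
           "-H", host_spec,
           "-bind-to", "none", "-map-by", "slot"]
    for var in export_vars:
        cmd += ["-x", var]  # forward this environment variable to every rank
    # Use the ob1 point-to-point layer and exclude the openib transport
    cmd += ["-mca", "pml", "ob1", "-mca", "btl", "^openib"]
    cmd += ["python", script]
    return cmd


if __name__ == "__main__":
    cmd = build_mpirun_command(
        {"bluedata-302.bdlocal": 2, "bluedata-301.bdlocal": 4},
        num_procs=4,
        script="tensorflow_word2vec_logs.py",
    )
    print(shlex.join(cmd))  # shell-quoted, e.g. for logging or copy/paste
```

Returning an argument list rather than a single string means it can be passed straight to `subprocess.run(cmd)` without shell quoting issues around values such as `^openib`.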

SLIDE 32

Lessons Learned and Key Takeaways

SLIDE 33

Lessons Learned and Takeaways

  • Enterprises are using ML / DL today to solve difficult problems (example use cases: fraud detection, disease prediction)
  • Distributed ML / DL in the enterprise requires a complex stack, with multiple different tools (TensorFlow is one popular option)
  • The only constant is change … be prepared
     – Business needs, use cases, and tools will constantly evolve
  • Deployments are challenging, with many potential pitfalls
     – Containerization can deliver agility and cost-saving benefits

SLIDE 34

Lessons Learned and Takeaways

  • Leverage a flexible, scalable, and elastic platform for success
     – BlueData provides a turnkey container-based platform for large-scale distributed AI / ML / DL in the enterprise
     – Enterprise-grade security and performance, proven in production at leading Global 2000 organizations
     – Decouple compute from storage for greater efficiency, and deploy on-premises, in a hybrid model, or in a multi-cloud environment
     – Save time, save money, and accelerate innovation

SLIDE 35

To learn more, visit the BlueData booth in the Expo Hall

Thank You

www.bluedata.com