The Challenge
IT departments are required to host a variety of different workloads:
▪ HPC ▪ Data Science and Machine Learning ▪ General applications supporting business processes
There are multiple ways to run workloads:
▪ Containerized and non-containerized ▪ Virtualized and non-virtualized ▪ On-premises and in the cloud
It is hard to build, manage, and monitor the necessary computing infrastructure:
▪ Hard to find and retain skilled staff ▪ Hard to maximize resource utilization and scale up/down/out appropriately ▪ Hard to remain flexible, agile, and up to date
The need to compete effectively and solve complex business problems is driving new types of workloads
Transformation Trends
▪ Data-Intensive Workloads ▪ Compute-Intensive Workloads ▪ Cloud Adoption
Private and public cloud are both attractive options for IT organizations. Linux-based clusters are the preferred infrastructure for running advanced workloads and private clouds. Clustered IT infrastructure provides the foundation.
Services are Converging
Advanced clustered IT infrastructure enables the convergence. HPC, Big Data, and Machine Learning are becoming mission-critical. Cloud adoption is table stakes. The services those resources support are also converging. Smart operators use convergence to maximize innovation, insight, and agility.
Are you ready?
Whatever the approach, the enterprise datacenter needs a trusted platform for deploying, managing, and monitoring its advanced IT infrastructure.
What would it mean to your organization if you could…
- Deploy a cluster in 5 minutes?
- Extend your infrastructure into the cloud with a few clicks?
- Automate the deployment and management of new infrastructure?
- Free up specialized staff for higher-value activities?
- Retain knowledge of infrastructure management and best practices?
- Spin up and tear down clustered environments in minutes?
Recommended Approach
1. Host all workloads on clusters rather than individual servers
2. Run multiple workloads and execution paradigms on the same cluster
3. Repurpose compute resources quickly and automatically
4. Use manual or policy-driven control
5. Automatically extend on-premises infrastructure to public cloud
Introducing: Bright Software
Empowering the adoption of advanced clustered infrastructure for HPC, Data Science, and Private Clouds
Bright software automates deploying, managing, and monitoring clustered server infrastructure in the data center or in the cloud.
Ideal for managing converged IT with multiple cluster types deployed across both physical and virtual infrastructure, on premises or in the cloud.
Here’s what you can do with Bright
1. Deliver computing capacity fast
2. Provision 10 to 10,000+ nodes from bare metal in minutes
3. Repurpose servers to accommodate fluctuating workloads on the fly
4. Extend your on-premises environment to AWS and Azure dynamically
5. Automate provisioning, deployment, and management
Bright for Data Science makes it easy to use a Bright cluster for AI
[Architecture diagram: Bright Cluster Manager running Bright for Data Science and Bright HPC across Linux/GPU nodes]
Without Bright
- Not installable from OS repositories
- Time-consuming, manual installation of deep learning libraries and frameworks
- 60+ dependencies must be satisfied
- Versions must all work together
“This [solution] will be a powerful productivity multiplier for customers because these software modules take days to download and install if using the open source repositories.”
– a Bright user
With Bright: two simple commands
# yum install tensorflow cm-jupyterhub
# yum --installroot=/cm/images/ai-image \
      install cm-ml-distdeps
- The 1st command installs the frameworks into a shared directory on the head node, making them immediately available on every node in the cluster.
- Yum installs all dependencies for tensorflow and cm-jupyterhub, including all the Python dependencies.
- The 2nd command installs all library dependencies into ai-image.
▪ Uniformity: cloud nodes look and feel the same as on-premises nodes ▪ Single workload management system ▪ Same user authentication ▪ Same software images used for provisioning ▪ Same shared software environment (e.g. NFS applications tree, environment modules)
Cloud Bursting
▪ On-premises cluster extended with resources from public cloud ▪ Possible to do a gradual transition to the cloud ▪ Multi-cloud possible (e.g. some jobs to AWS, some to Azure) ▪ Applications run in the cloud as if they were running in an on-premises cluster
Achieving Uniformity
PROVISIONING
▪ Node-installer loaded from cloud machine image (instead of loading through PXE) ▪ Cloud director serves as provisioning node for all nodes in a particular cloud region ▪ Cloud director receives a copy of all software images (kept up to date automatically) ▪ Same kernel version
AUTHENTICATION
▪ Head node runs LDAP server ▪ Cloud director runs LDAP replica server ▪ AD/external LDAP also possible
WORKLOAD MANAGEMENT
▪ Typical set-up: one job queue per cloud region ▪ User decides whether to run a job on-premises or in the cloud by submitting to the corresponding queue ▪ Single queue containing all nodes also possible
Scaling node count up/down
Add/remove cloud nodes:
▪ Manually by administrator ▪ Automatically based on workload in queue using cm-scale tool
cm-scale node operations:
▪ Power on/off ▪ Create new node (in cloud) / terminate ▪ Move to new node category (i.e. re-purpose node) ▪ Subscribe to new configuration overlay (i.e. re-purpose node)
Custom policies via Python module
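The deck notes that cm-scale accepts custom policies written as Python modules, but does not show the plug-in API. As a hedged illustration only, here is a minimal, self-contained sketch of the kind of decision logic such a policy encodes — mapping queue depth to a desired cloud node count. All names (desired_node_count, jobs_per_node) are hypothetical, not Bright's actual interface:

```python
# Hypothetical sketch of policy-driven scaling logic (NOT the real
# cm-scale plug-in API): decide how many cloud nodes should be
# powered on based on the number of queued jobs.

def desired_node_count(queued_jobs, running_nodes,
                       jobs_per_node=4, min_nodes=0, max_nodes=100):
    """Return the number of nodes that should be powered on.

    queued_jobs   -- jobs waiting in the workload manager's queue
    running_nodes -- nodes currently powered on (for logging/decisions)
    jobs_per_node -- assumed job slots per node (hypothetical knob)
    """
    # Ceiling division: enough nodes to drain the queue.
    needed = -(-queued_jobs // jobs_per_node)
    # Clamp between the configured minimum and maximum.
    return max(min_nodes, min(max_nodes, needed))

if __name__ == "__main__":
    # 10 queued jobs at 4 jobs/node -> 3 nodes
    print(desired_node_count(queued_jobs=10, running_nodes=1))
```

A real policy would also account for node start-up latency and billing granularity; this sketch only shows the clamp-to-queue-depth idea.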
Moving data in/out of cloud
▪ Jobs depend on input data and produce output data ▪ cm-sub allows user to specify data dependencies for jobs ▪ Job input data will be moved into cloud before job resources are allocated ▪ Data staged on temporary storage node (dynamically spun up) ▪ Job output data will be moved back to on-premises cluster ▪ Data movement is transparent to user
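The staging sequence the bullets above describe — move input data into the cloud before resources are allocated, run the job, move output back — can be sketched as follows. This is not Bright's implementation (cm-sub performs all of this transparently); run_cloud_job and the "cloud:/staging/" prefix are hypothetical names used purely to show the ordering:

```python
# Hypothetical sketch of the stage-in / run / stage-out ordering
# that cm-sub automates for cloud jobs (not Bright's actual code).

def run_cloud_job(job, copy):
    """Execute a job whose data lives on the on-premises cluster.

    job  -- dict with 'inputs', 'outputs', and a 'run' callable
    copy -- function(src, dst) that transfers one file
    """
    staged = []
    # 1. Stage input data to temporary cloud storage BEFORE
    #    compute resources are allocated.
    for path in job["inputs"]:
        copy(path, "cloud:/staging/" + path)
        staged.append(path)
    # 2. Run the job on the cloud nodes.
    job["run"]()
    # 3. Stage output data back to the on-premises cluster.
    for path in job["outputs"]:
        copy("cloud:/staging/" + path, path)
    return staged
```

The key design point mirrored here is that data movement brackets the compute step, so expensive cloud nodes are never allocated while waiting on transfers, and the user never manages the staging storage directly.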
To manage advanced IT infrastructure for…
▪ Data Science: Machine Learning, Deep Learning, Spark, Cassandra, NoSQL
▪ HPC
▪ Big Data Analytics
▪ OpenStack / Virtual Machines
Industries served: Life Science, Energy, Pharma, Education, Government, Manufacturing, Defense, Academic Research
What is Bright Edge?
A new feature in Bright 8.2 that allows nodes of a single, centrally managed cluster to span geographic locations
▪ Simplified deployment and management of edge compute ▪ Reduced admin time for distributed clusters ▪ Promotes standardization
Customer Spotlight: Van Andel Institute
"We know that cloud computing is the wave of the future. The hybrid approach we are getting with Bright is providing a path that helps us transition."
— Zack Ramjan, Research Computing Architect, VAI
Van Andel Institute (VAI) hosts thirty individual research groups who use genomic sequencing analysis, molecular dynamics simulation, and modeling to investigate epigenetics, cancer, and neurodegenerative diseases. Bright OpenStack lets VAI manage high-performance computing (HPC) and cloud computing in the same infrastructure, greatly reducing the labor and effort needed for management and change control.
Customer Spotlight: Boeing
”This supports one of Boeing’s IT priorities to realize productivity gains by streamlining processes and tools to eliminate waste.”
— Luis Gutierrez, Global Data Center and Server Infrastructure Director
Boeing makes extensive use of IT to assist in engineering products. Over time it acquired some 13,000 servers that were distributed geographically. Administration was ad hoc, maintenance inconsistent, and utilization sub-optimal. Today, Boeing uses Bright Cluster Manager to consolidate their servers into Mega Data Centers, reducing cost and tripling server utilization to top 90%.
To manage advanced IT Infrastructure for…
COMPUTE AND DATA INTENSIVE WORKLOADS
ON PREMISES OR IN THE CLOUD
INNOVATION, INSIGHT, AND AGILITY
What Our Customers Say
“Bright cuts my own workload by 50%, and pays for itself over and over in terms of headcount savings.”
“Compared to the open source toolkits we were using, we typically save at least a day per month on just systems maintenance alone.”
“We've experienced substantial savings in terms of hardware, maintenance and a reduction in staff hours.”