The Challenge
IT departments are required to host a variety of different workloads:
▪ HPC ▪ Data Science and Machine Learning ▪ General applications supporting business processes
There are multiple ways to run workloads:
▪ Containerized and non-containerized ▪ Virtualized and non-virtualized ▪ On-premises and in the cloud
It is hard to build, manage, and monitor the necessary computing infrastructure:
▪ Hard to find and retain skilled staff ▪ Hard to maximize resource utilization and scale up/down/out appropriately ▪ Hard to remain flexible, agile, and up to date
The need to compete effectively and solve complex business problems is driving new types of workloads
Transformation Trends
▪ Data-Intensive Workloads ▪ Compute-Intensive Workloads ▪ Cloud Adoption
Private and public cloud are both attractive options for IT organizations. Linux-based clusters are the preferred infrastructure for running advanced workloads and private clouds. Clustered IT infrastructure provides the foundation.
Services are Converging
Advanced clustered IT infrastructure enables the convergence. HPC, Big Data, and Machine Learning are becoming mission-critical. Cloud adoption is table stakes. The services those resources support are also converging. Smart operators use convergence to maximize innovation, insight, and agility.
Are you ready?
Whatever the approach, the enterprise datacenter needs a trusted platform for deploying, managing, and monitoring its advanced IT infrastructure.
What would it mean to your organization if you could…
- Deploy a cluster in 5 minutes?
- Extend your infrastructure into the cloud with a few clicks?
- Automate the deployment and management of new infrastructure?
- Free up specialized staff for higher-value activities?
- Retain knowledge of infrastructure management and best practices?
- Spin up and tear down clustered environments in minutes?
Recommended Approach
1. Host all workloads on clusters rather than individual servers
2. Run multiple workloads and execution paradigms on the same cluster
3. Repurpose compute resources quickly and automatically
4. Use manual or policy-driven control
5. Automatically extend on-premises infrastructure to public cloud
Introducing: Bright Software
Empowering the adoption of advanced clustered infrastructure for HPC, Data Science, and Private Clouds
Bright software automates deploying, managing, and monitoring clustered server infrastructure in the data center or in the cloud.
Ideal for managing converged IT with multiple cluster types deployed across both physical and virtual infrastructure, on premises or in the cloud.
Here’s what you can do with Bright
1. Deliver computing capacity fast
2. Provision 10 to 10,000+ nodes from bare metal in minutes
3. Repurpose servers to accommodate fluctuating workloads on the fly
4. Extend your on-premises environment to AWS and Azure dynamically
5. Automate provisioning, deployment, and management
Bright for Data Science makes it easy to use a Bright cluster for AI
[Architecture diagram: Bright Cluster Manager running Bright for Data Science and Bright HPC across Linux/GPU nodes]
Without Bright
- Not installable from OS repositories
- Time-consuming, manual installation of deep learning libraries and frameworks
- 60+ dependencies must be satisfied
- Versions must all work together
“This [solution] will be a powerful productivity multiplier for customers because these software modules take days to download and install if using the open source repositories.”
– a Bright user
With Bright: two simple commands
# yum install tensorflow cm-jupyterhub
# yum --installroot=/cm/images/ai-image \
      install cm-ml-distdeps
- The 1st command installs the frameworks into a shared directory on the head node, making them immediately available on every node in the cluster.
- Yum installs all dependencies for tensorflow and cm-jupyterhub, including all the Python dependencies.
- The 2nd command installs all library dependencies into ai-image.
▪ Uniformity: cloud nodes look and feel the same as on-premises nodes ▪ Single workload management system ▪ Same user authentication ▪ Same software images used for provisioning ▪ Same shared software environment (e.g. NFS applications tree, environment modules)
Cloud Bursting
▪ On-premises cluster extended with resources from public cloud ▪ Possible to do a gradual transition to the cloud ▪ Multi-cloud possible (e.g. some jobs to AWS, some to Azure) ▪ Applications run in the cloud as if they were running in an on-premises cluster
Achieving Uniformity
PROVISIONING
▪ Node-installer loaded from cloud machine image (instead of loading through PXE) ▪ Cloud director serves as provisioning node for all nodes in a particular cloud region ▪ Cloud director receives a copy of all software images (kept up to date automatically) ▪ Same kernel version
AUTHENTICATION
▪ Head node runs LDAP server ▪ Cloud director runs LDAP replica server ▪ AD/external LDAP also possible
WORKLOAD MANAGEMENT
▪ Typical set-up: one job queue per cloud region ▪ User decides whether to run a job on-premises or in the cloud by submitting to the corresponding queue ▪ Single queue containing all nodes also possible
Scaling node count up/down
Add/remove cloud nodes:
▪ Manually by administrator ▪ Automatically based on workload in queue using cm-scale tool
cm-scale node operations:
▪ Power on/off ▪ Create new node (in cloud) / terminate ▪ Move to new node category (i.e. re-purpose node) ▪ Subscribe to new configuration overlay (i.e. re-purpose node)
Custom policies via Python module
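The deck notes that cm-scale accepts custom policies written as Python modules, but does not show the plug-in API. As a hedged illustration only, here is a minimal, self-contained sketch of the kind of decision logic such a policy encodes — mapping queue depth to a desired cloud node count. All names (desired_node_count, jobs_per_node) are hypothetical, not Bright's actual interface:

```python
# Hypothetical sketch of policy-driven scaling logic (NOT the real
# cm-scale plug-in API): decide how many cloud nodes should be
# powered on based on the number of queued jobs.

def desired_node_count(queued_jobs, running_nodes,
                       jobs_per_node=4, min_nodes=0, max_nodes=100):
    """Return the number of nodes that should be powered on.

    queued_jobs   -- jobs waiting in the workload manager's queue
    running_nodes -- nodes currently powered on (for logging/decisions)
    jobs_per_node -- assumed job slots per node (hypothetical knob)
    """
    # Ceiling division: enough nodes to drain the queue.
    needed = -(-queued_jobs // jobs_per_node)
    # Clamp between the configured minimum and maximum.
    return max(min_nodes, min(max_nodes, needed))

if __name__ == "__main__":
    # 10 queued jobs at 4 jobs/node -> 3 nodes
    print(desired_node_count(queued_jobs=10, running_nodes=1))
```

A real policy would also account for node start-up latency and billing granularity; this sketch only shows the clamp-to-queue-depth idea.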
Moving data in/out of cloud
▪ Jobs depend on input data and produce output data ▪ cm-sub allows user to specify data dependencies for jobs ▪ Job input data will be moved into cloud before job resources are allocated ▪ Data staged on temporary storage node (dynamically spun up) ▪ Job output data will be moved back to on-premises cluster ▪ Data movement is transparent to user
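The staging sequence the bullets above describe — move input data into the cloud before resources are allocated, run the job, move output back — can be sketched as follows. This is not Bright's implementation (cm-sub performs all of this transparently); run_cloud_job and the "cloud:/staging/" prefix are hypothetical names used purely to show the ordering:

```python
# Hypothetical sketch of the stage-in / run / stage-out ordering
# that cm-sub automates for cloud jobs (not Bright's actual code).

def run_cloud_job(job, copy):
    """Execute a job whose data lives on the on-premises cluster.

    job  -- dict with 'inputs', 'outputs', and a 'run' callable
    copy -- function(src, dst) that transfers one file
    """
    staged = []
    # 1. Stage input data to temporary cloud storage BEFORE
    #    compute resources are allocated.
    for path in job["inputs"]:
        copy(path, "cloud:/staging/" + path)
        staged.append(path)
    # 2. Run the job on the cloud nodes.
    job["run"]()
    # 3. Stage output data back to the on-premises cluster.
    for path in job["outputs"]:
        copy("cloud:/staging/" + path, path)
    return staged
```

The key design point mirrored here is that data movement brackets the compute step, so expensive cloud nodes are never allocated while waiting on transfers, and the user never manages the staging storage directly.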
To manage advanced IT infrastructure for…
▪ Data Science: Machine Learning, Deep Learning, Spark, Cassandra, NoSQL
▪ HPC
▪ Big Data Analytics
▪ OpenStack / Virtual Machines
Industries served: Life Science, Energy, Pharma, Education, Government, Manufacturing, Defense, Academic Research
What is Bright Edge?
A new feature in Bright 8.2 that allows nodes of a single, centrally managed cluster to span geographic locations
▪ Simplified deployment and management of edge compute ▪ Reduced admin time for distributed clusters ▪ Promotes standardization
Customer Spotlight: Van Andel Institute
"We know that cloud computing is the wave of the future. The hybrid approach we are getting with Bright is providing a path that helps us transition."
— Zack Ramjan, Research Computing Architect, VAI
Van Andel Institute (VAI) hosts thirty individual research groups who use genomic sequencing analysis, molecular dynamics simulation, and modeling to investigate epigenetics, cancer, and neurodegenerative diseases. Bright OpenStack lets VAI manage high-performance computing (HPC) and cloud computing in the same infrastructure, greatly reducing the labor and effort needed for management and change control.
Customer Spotlight: Boeing
”This supports one of Boeing’s IT priorities to realize productivity gains by streamlining processes and tools to eliminate waste.”
— Luis Gutierrez, Global Data Center and Server Infrastructure Director
Boeing makes extensive use of IT to assist in engineering products. Over time it acquired some 13,000 servers that were distributed geographically. Administration was ad hoc, maintenance inconsistent, and utilization sub-optimal. Today, Boeing uses Bright Cluster Manager to consolidate their servers into Mega Data Centers, reducing cost and tripling server utilization to top 90%.
To manage advanced IT Infrastructure for…
COMPUTE AND DATA INTENSIVE WORKLOADS
ON PREMISES OR IN THE CLOUD
INNOVATION, INSIGHT, AND AGILITY
What Our Customers Say
“Bright cuts my own workload by 50%, and pays for itself over and over in terms of headcount savings.”
“Compared to the open source toolkits we were using, we typically save at least a day per month on just systems maintenance alone.”
“We've experienced substantial savings in terms of hardware, maintenance and a reduction in staff hours.”