At Purdue University: RCAC Staffing - PowerPoint PPT Presentation



SLIDE 1

Preston Smith, Director of Research Services

INTRODUCTION TO RESEARCH SERVICES

July 2, 2015

RESEARCH COMPUTING AT PURDUE UNIVERSITY

SLIDE 2

RCAC Staffing

https://www.rcac.purdue.edu/about/staff/

SLIDE 3

OVERVIEW

WHO ARE WE?

  • IT Research Computing (RCAC)
  • A unit of ITaP (Information Technology at Purdue), the central IT organization at Purdue.
  • RCAC provides advanced computational resources and services to support Purdue faculty and staff researchers.

Our goal: to be the one-stop provider of choice for research computing and data services at Purdue:

  • Delivering powerful, reliable, easy-to-use, service-oriented computing and expertise to Purdue researchers.

SLIDE 4

COMMUNITY CLUSTERS

A BUSINESS MODEL FOR HPC AT PURDUE UNIVERSITY

SLIDE 5

THE FIRST COMMUNITY CLUSTERS

  • Without a large capital acquisition by the university, providing cutting-edge computing capabilities for researchers was not possible.
  • Many faculty were getting funding to acquire and operate HPC resources for themselves.
  • Solution: pool these funds to operate clusters for researchers!
  • The faculty no longer have to devote a grad student to managing their cluster!

THE POWER OF SHARING

SLIDE 6

COMMUNITY CLUSTERS

VERSION 1: THE BASIC RULES

  • You get out at least what you put in
  • Buy 1 node or 100, you get a queue that guarantees access up to that many CPUs
  • But wait, there’s more!!
  • What if your neighbor isn’t using his queue?

– You can use it, but your job is subject to preemption if he wants to run.

  • You don’t have to do the work
  • Your grad student gets to do research rather than run your cluster.

– Nor do you have to provide space in your lab for computers.

  • ITaP provides data center space, systems administration, and application support.
  • Just submit jobs!
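The queue rules above amount to a simple admission-and-preemption policy. The following is a hypothetical toy model in Python, not RCAC's actual scheduler configuration (the slides do not describe which batch system or settings enforce these rules). Each owner queue guarantees the cores its group purchased; a job that borrows a neighbor's idle cores is marked preemptible; and when an owner submits to its own queue, guest jobs are evicted until the owner's job fits. The names `Job`, `CommunityCluster`, `submit`, and the group names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    name: str
    group: str   # research group that submitted the job
    queue: str   # owner queue whose cores the job runs on
    cores: int

    @property
    def preemptible(self) -> bool:
        # Borrowing a neighbor's idle queue makes the job subject to preemption.
        return self.group != self.queue


class CommunityCluster:
    """Hypothetical toy model: each owner queue guarantees the cores its group bought."""

    def __init__(self, purchased: dict[str, int]) -> None:
        self.purchased = dict(purchased)  # cores bought by each group
        self.running: list[Job] = []

    def free_cores(self, queue: str) -> int:
        used = sum(j.cores for j in self.running if j.queue == queue)
        return self.purchased[queue] - used

    def submit(self, job: Job) -> list[Job]:
        """Start a job and return any guest jobs preempted to make room for it."""
        if job.cores > self.purchased[job.queue]:
            raise ValueError("request is larger than the queue")
        preempted: list[Job] = []
        if job.preemptible:
            # Guests may only use cores that are idle right now.
            if job.cores > self.free_cores(job.queue):
                raise RuntimeError("not enough idle cores in that queue")
        else:
            # Owners get what they bought: evict guests (most recent first) until the job fits.
            guests = [j for j in self.running if j.queue == job.queue and j.preemptible]
            while job.cores > self.free_cores(job.queue) and guests:
                victim = guests.pop()
                self.running.remove(victim)
                preempted.append(victim)
            if job.cores > self.free_cores(job.queue):
                raise RuntimeError("the group's own jobs already fill its queue")
        self.running.append(job)
        return preempted


if __name__ == "__main__":
    cluster = CommunityCluster({"smith": 16, "jones": 32})
    # A Smith-group job borrows 24 idle cores from the Jones queue.
    cluster.submit(Job("guest", group="smith", queue="jones", cores=24))
    # The Jones group submits 16 cores to its own queue; the guest is preempted.
    evicted = cluster.submit(Job("owner", group="jones", queue="jones", cores=16))
    print([j.name for j in evicted])  # ['guest']
```

Evicting the most recently started guest first is just one plausible choice for the sketch; a real scheduler may order preemption differently or requeue the evicted job rather than drop it.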
SLIDE 7

SIX COMMUNITY CLUSTERS

COATES

8,032 cores. Installed July 2009. 24 departments, 61 faculty. Retired Sep. 2014.

ROSSMANN

11,088 cores. Installed Sept. 2010. 17 departments, 37 faculty.

CARTER

10,368 cores. Installed April 2012. 26 departments, 60 faculty. #175 on the June 2013 Top 500.

CONTE

9,280 Xeon cores (69,600 Xeon Phi cores). Installed August 2013. 20 departments, 51 faculty (as of Aug. 2014). #39 on the June 2014 Top 500.

HANSEN

9,120 cores. Installed Sept. 2011. 13 departments, 26 faculty.

STEELE

7,216 cores. Installed May 2008. Retired Nov. 2013.

SLIDE 8

COMMUNITY CLUSTERS

VITAL STATS

  • 165 “owners”
  • ~1200 active users
  • 259M hours provided in 2014
  • Nationally, the gold standard for condo-style computing
  • Today, the program is part of many departments’ faculty recruiting process.
  • A selling point to attract people to Purdue!
  • Please feel free to have your faculty candidates meet with us during recruitment!

SLIDE 9

IMPACT

FACULTY PARTNERS

Cores by department:

  • Electrical and Computer Engineering: 9,816
  • OSG CMS Tier2: 9,168
  • Mechanical Engineering: 7,008
  • Aeronautics and Astronautics: 5,048
  • Earth, Atmospheric, and Planetary Sciences: 3,632
  • Chemistry: 1,936
  • Materials Engineering: 1,504
  • Chemical Engineering: 1,144
  • Biological Sciences: 1,104
  • Medicinal Chemistry and Molecular Pharmacology: 1,104
  • Mathematics: 720
  • Physics: 664
  • Biomedical Engineering: 640
  • Statistics: 520
  • Nuclear Engineering: 492
  • Civil Engineering: 448
  • Agricultural and Biological Engineering: 416
  • Industrial and Physical Pharmacy: 384
  • Commercial Partners: 304
  • Computer Science: 280
  • Other College of Agriculture: 256
  • Agronomy: 240
  • Forestry and Natural Resources: 64

SLIDE 10

IMPACT

HPC USERS AND SPONSORED DOLLARS

[Chart: sponsored award dollars in millions ($0 to $450), HPC awards vs. non-HPC awards, 1997 to 2014]
SLIDE 11

COMPUTATION

NEW MODEL: ORGANIZED BY COMMON PROFILES

Community Clusters to Cluster Communities

What neighborhoods are in our community?

  • HPC (Rice): Multiple cores or nodes, probably MPI. Benefits from high-performance network and parallel filesystem. The vast majority of campus: 80% of all work!
  • HTC (Hammer): Primarily single core. CPU-bound. No need for high-performance network.
  • Life Science/Big Memory (Snyder): Uses entire node to get large amounts of memory. Less need for high-performance network. Needs large, fast storage.
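The three neighborhoods are defined by job profile, so choosing one is essentially a small decision rule. A minimal sketch in Python, assuming a hypothetical `JobProfile` with three fields and a hypothetical 256 GB cutoff for "big memory" (the slide gives no number); it is an illustration of the profiles above, not an official placement policy.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "big memory"; the slide gives no number.
BIG_MEMORY_GB = 256


@dataclass
class JobProfile:
    cores: int          # total cores the job uses
    uses_mpi: bool      # tightly coupled across cores or nodes?
    memory_gb: float    # memory one node must hold for the job


def neighborhood(job: JobProfile) -> str:
    """Suggest a cluster neighborhood for a job profile (illustrative only)."""
    if job.memory_gb >= BIG_MEMORY_GB:
        # Life Science/Big Memory: takes a whole node for its RAM.
        return "Snyder"
    if job.uses_mpi or job.cores > 1:
        # HPC: multiple cores or nodes, fast network and parallel filesystem.
        return "Rice"
    # HTC: single-core, CPU-bound, no need for a fast network.
    return "Hammer"


if __name__ == "__main__":
    print(neighborhood(JobProfile(cores=128, uses_mpi=True, memory_gb=64)))   # Rice
    print(neighborhood(JobProfile(cores=1, uses_mpi=False, memory_gb=4)))     # Hammer
    print(neighborhood(JobProfile(cores=20, uses_mpi=False, memory_gb=512)))  # Snyder
```

Memory is checked first because a job that needs a whole node's RAM belongs on the big-memory systems regardless of how many cores it uses.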

SLIDE 12

DATA STORAGE

INFRASTRUCTURE FOR RESEARCH DATA

SLIDE 13

DATA STORAGE

WHAT IS AVAILABLE TODAY FOR HPC

  • Research computing has historically provided some storage for research data for HPC users:

– Archive (Fortress)
– Actively running jobs (Cluster Scratch - Lustre)
– Home directories

… And Purdue researchers have PURR to package, publish, and describe research data.

SLIDE 14

THE SERVICE

FEATURES

HPC researchers can at last purchase storage! A storage service for research to address many common requests:

  • 100 GB available at no charge to research groups
  • Mounted on all clusters and exported via CIFS to labs
  • Not scratch: backed up via snapshots, with disaster recovery (DR) coverage
  • Data in Depot is owned by the faculty member!
  • Sharing ability: Globus, CIFS, and WWW
  • Maintain group-wide copies of application software or shared data

SLIDE 15

A SOLUTION

ADOPTION

Well received!

  • In less than 7 months, over 105 research groups are participating.
  • Many are not HPC users!
  • Half a PB in use since November
  • A research group purchasing space has purchased, on average, 8.6 TB.

SLIDE 16

THE TECHNOLOGY

WHAT DID WE GET?

  • Approximately 2.25 PB of IBM GPFS
  • Hardware provided by a pair of Data Direct Networks SFA12k arrays, one in each of the MATH and FREH datacenters
  • 160 Gb/sec to each datacenter
  • 5x Dell R620 servers in each datacenter

SLIDE 17

DESIGN TARGETS

WHAT DO WE NEED TO DO?

Depot requirements the Research Data Depot can do, with what previous solutions offered in parentheses:

  • At least 1 PB usable capacity (previous: >1 PB)
  • 40 GB/sec throughput (previous: 5 GB/sec)
  • < 3 ms average latency, < 20 ms maximum latency (previous: variable)
  • 100k IOPS sustained (previous: 55k)
  • 300 MB/sec minimum client speed (previous: 200 MB/sec maximum)
  • Support 3000 simultaneous clients (previous: yes)
  • Filesystem snapshots (previous: yes)
  • Multi-site replication (previous: no)
  • Expandable to 10 PB (previous: yes)
  • Fully POSIX compliant, including parallel I/O (previous: no)

SLIDE 18

DATA

GUIDING PRINCIPLES

  • It’s important to think of Depot as a “data service”, not “storage”
  • It is not enough to just provide infrastructure
  • “Here’s a mountpoint, have fun”
  • Our goal: enabling the frictionless use and movement of data
  • Instrument -> Depot -> Scratch -> Fortress -> Collaborators -> and back
  • Continue to improve access for non-UNIX users
SLIDE 19

LIBRARY DATA SERVICES

HOW CAN I MANAGE ALL MY DATA?

  • Collaborations on multi-disciplinary grant proposals, both internal and external

  • Developing customized Data Management Plans
  • Organizing your data
  • Describing your data
  • Sharing your data
  • Publishing your datasets
  • Preserving your data
  • Education on data management best practices
SLIDE 20

OTHER SERVICES

BEYOND THE COMMUNITY CLUSTERS

SLIDE 21

IMPROVED NETWORKING

DEVELOPMENTS IN CAMPUS NETWORK

2014 network improvements

  • 100 Gb/sec WAN connections
  • Research Core
  • 160 Gb/sec core to each resource (up from 40 Gb/sec)
  • 20 Gb/sec research core to most of campus
  • Campus Core Upgrade

https://www.rcac.purdue.edu/news/681

SLIDE 22

GLOBUS

EVERYBODY NEEDS TO SHARE

Globus:

Transfer and share large datasets… with Dropbox-like characteristics… directly from your own storage system!
SLIDE 23

GLOBUS

STATISTICS

  • Data moved in 2014: 13 TB in, 19 TB out
  • 200k files in both directions
  • 55 unique users
  • In progress: Globus interface to Fortress

https://transfer.rcac.purdue.edu

SLIDE 24

EDUCATION

TRAINING OPPORTUNITIES

  • Programming practices – Software Carpentry
  • Parallel Programming – MPI, OpenMP
  • Big Data
  • Matlab
  • Accelerators – Xeon Phi, OpenACC, CUDA
  • UNIX 101
  • Effective use of Purdue research clusters
SLIDE 25

NEED HELP?

COFFEE BREAK CONSULTATIONS

Meet up with ITaP research computing staff and other researchers who use or are interested in High Performance Computing at Purdue. Join us for informal discussions of scientific computing along with any other topic that might spring up. We’ll be meeting at different coffee shops around campus each week. Check the coffee web page to see this week’s location and time.

rcac.purdue.edu/coffee
SLIDE 26

SCHOLAR

HPC FOR INSTRUCTION

  • Need to teach students to use HPC in a course?
  • Scholar cluster is available to any instructor at no cost.

Spring 2015: CS, STAT, CHEM, EAPS, AGRY, ANSC, ChemE

SLIDE 27

SOFTWARE SOLUTIONS

NEED A PROGRAMMER?

Bring in our expertise to help your researchers create or modify software to take advantage of the latest technology in advanced computation, web frameworks, data analysis, visualization, sharing, and management. Our software development effort can be funded through grant awards or contracts based on developer time.