SLIDE 1

NSF XSEDE Campus Champions and Extreme Scale Research Computing

  • D. Karres, Beckman Institute
  • J. Alameda, National Center for Supercomputing Applications
  • S. Kappes, National Center for Supercomputing Applications
SLIDE 2

Outline

  • Research Computing @ Illinois

– National Science Foundation Investments

  • Extreme Science and Engineering Discovery Environment (XSEDE)
  • Blue Waters

– Research IT

  • Highlighted Core Services

– Campus Champions

  • History/Motivation
  • Scope
  • Benefits
SLIDE 4

National Science Foundation Investments

  • Some context: M. Parashar, NSF Town Hall, PEARC18, July 25, 2018

– Leadership HPC

  • Blue Waters (through December 2019)
  • Phase 1: Frontera @ TACC (production ~mid 2019)

– Innovative HPC

  • Allocated through XSEDE
  • Large-scale, long-tail, data-intensive, cloud

– Services

  • XSEDE: supporting innovative HPC resources
  • XD Metrics Service (XDMoD)
  • Open Science Grid

SLIDE 5

Outline

  • Research Computing @ Illinois

– National Science Foundation Investments

  • Extreme Science and Engineering Discovery Environment (XSEDE)
  • Blue Waters

– Research IT

  • Highlighted Core Services

– Campus Champions

  • History/Motivation
  • Scope
  • Benefits
SLIDE 6

Slides adapted from:

Linda Akli, SURA

Assistant Director, Education, Training & Outreach; Manager, XSEDE Broadening Participation Program

XSEDE Overview

Fall 2018

SLIDE 7

What is XSEDE?

Foundation for a National CI Ecosystem

  • Comprehensive suite of advanced digital services that federates with other high-end facilities and campus-based resources

Unprecedented Integration of Diverse Advanced Computing Resources

  • Innovative, open architecture making possible the continuous addition of new technology capabilities and services

SLIDE 8

XSEDE Leadership

SLIDE 9

Mission and Goals

Mission: Accelerate scientific discovery

Strategic Goals:

  • Deepen and Extend Use
– Raise the general awareness of the value of advanced digital services
– Deepen the use and extend use to new communities
– Contribute to the preparation of current and next generation scholars, researchers, and engineers
  • Advance the Ecosystem
  • Sustain the Ecosystem

SLIDE 10

Total Research Funding Supported by XSEDE 2.0

$1.97 billion in research supported by XSEDE 2.0 (September 2016 – April 2018)

Research funding only. XSEDE leverages and integrates additional infrastructure, some funded by NSF (e.g., "Track 2" systems) and some not (e.g., Internet2).

Funding by agency, in $ millions (share of total):

– NSF: 754.3 (38%)
– NIH: 432.0 (22%)
– DOE: 325.0 (16%)
– DOD: 175.1 (9%)
– DOC: 55.2 (3%)
– NASA: 38.2 (2%)
– All others: 187.7 (10%)
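
The per-agency figures are consistent with the quoted $1.97 billion total; here is a quick arithmetic check in Python, using only the numbers on this slide:

```python
# Research supported by XSEDE 2.0 (Sept 2016 - Apr 2018), by agency,
# in millions of USD; figures taken from this slide.
funding = {
    "NSF": 754.3, "NIH": 432.0, "DOE": 325.0, "DOD": 175.1,
    "DOC": 55.2, "NASA": 38.2, "All others": 187.7,
}

total = sum(funding.values())             # 1967.5 -> ~$1.97 billion
print(f"total = ${total / 1000:.2f} billion")

for agency, amount in funding.items():
    # Shares match the slide's percentages to within rounding.
    print(f"{agency}: {100 * amount / total:.1f}%")
```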

SLIDE 11

XSEDE Supports a Breadth of Research

Example domains: Earthquake Science, Molecular Dynamics, Nanotechnology, Plant Science, Storm Modeling, Epidemiology, Particle Physics, Economic Analysis of Phone Network Patterns, Large Scale Video Analytics (LSVA), Decision Making Theory, Library Collection Analysis

[Images: replicating brain circuitry to direct a realistic prosthetic arm; XSEDE researchers visualize a massive Joplin, Missouri tornado]

SLIDE 12

Recovering Lost History

A collaboration of social scientists, humanities scholars, and digital researchers harnessed the power of high-performance computing to find and understand the historical experiences of black women by searching two massive databases of written works from the 18th through 20th centuries.

SLIDE 13

XSEDE Visualization and Data Resources

Visualization

  • Visualization Portal
– Remote, interactive, web-based visualization
– iPython / Jupyter Notebook integration
– RStudio integration

Storage

  • Resource file system storage: all compute/visualization allocations include access to limited disk and scratch space on the compute/visualization resource file systems to accomplish project goals
  • Archival storage: archival storage on XSEDE systems is used for large-scale persistent storage, requested in conjunction with compute and visualization resources
  • Stand-alone storage: stand-alone storage allows storage allocations independent of a compute allocation

SLIDE 14

Compute and Analytics Resources

  • Bridges: featuring interactive on-demand access, tools for gateway building, and virtualization
  • Comet: hosting a variety of tools including Amber, GAUSSIAN, GROMACS, LAMMPS, NAMD, and VisIt
  • Jetstream: a self-provisioned, scalable science and engineering cloud environment
  • Stampede2: Intel's new innovative MIC technology on a massive scale
  • SuperMIC: equipped with Intel's Xeon Phi technology; the cluster consists of 380 compute nodes
  • Wrangler: a data analytics system combining database services, flash storage, long-term replicated storage, and an analytics server; offers iRODS data management, Hadoop service reservations, and database instances

SLIDE 15

Science Gateways

The CIPRES science gateway: an NSF investment launching thousands of scientific publications with no sign of slowing down. https://sciencenode.org/feature/cipres-one-facet-in-bold-nsf-vision.php?clicked=title

SLIDE 16

XSEDE High Throughput Computing Partnership

  • Governed by the OSG consortium
  • 126 institutions with ~120 active sites, collectively supporting usage of ~2,000,000 core hours per day
  • High-throughput workflows with simple system and data dependencies are a good fit for OSG (see the sketch below)
  • Access options:
– OSG Connect, available to any researcher affiliated with a US institution and funded by a US funding agency
– OSG Virtual Organizations such as CMS and ATLAS
– XSEDE: https://portal.xsede.org/OSG-User-Guide

Open Science Grid
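
To make the high-throughput pattern concrete, below is a minimal sketch of queuing many independent jobs through HTCondor, the scheduler OSG uses, via its Python bindings. The script name and arguments are hypothetical placeholders, and details vary by submit host; treat this as an illustration of the workload shape, not an official OSG recipe:

```python
import htcondor  # HTCondor Python bindings (the scheduler used by OSG)

# Describe one job; HTCondor then queues many independent copies of it.
# "analyze.py" and its --chunk argument are hypothetical placeholders.
job = htcondor.Submit({
    "executable": "analyze.py",
    "arguments": "--chunk $(Process)",   # $(Process) expands to 0..count-1
    "output": "out/job_$(Process).out",
    "error": "out/job_$(Process).err",
    "log": "jobs.log",
    "request_cpus": "1",
    "request_memory": "2GB",
})

schedd = htcondor.Schedd()               # local submit-side scheduler daemon
schedd.submit(job, count=100)            # 100 independent jobs, no coupling
```

Because every job is independent, with simple system and data dependencies, the batch spreads naturally across OSG's distributed sites; tightly coupled MPI workloads would not fit this model.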

SLIDE 17

Accessing XSEDE - Allocations

Allocation types: Education, Research, Campus Champion, Startup

SLIDE 18

XSEDE User Support Resources

  • Technical information
  • Training
  • Help Desk / Consultants
  • Extended Collaborative Support Service (ECSS)

SLIDE 19

Workforce Development: Training

  • XSEDE Training Course Catalog, with all materials in a single location
  • Course Calendar for viewing and registering for upcoming training events
  • Online training materials relevant to XSEDE users
  • Badges available for completing selected training
  • Some events provide participation documentation
  • Training roadmaps

SLIDE 20

pearc19.pearc.org

July 28 – August 1, 2019, Chicago, IL

SLIDE 21

Welcome to XSEDE!

SLIDE 22

Outline

  • Research Computing @ Illinois

– National Science Foundation Investments

  • Extreme Science and Engineering Discovery Environment (XSEDE)
  • Blue Waters

– Research IT

  • Highlighted Core Services

– Campus Champions

  • History/Motivation
  • Scope
  • Benefits
SLIDE 23

Blue Waters Overview

SLIDE 24

Blue Waters

  • Most capable supercomputer on a university campus
  • Managed by the Blue Waters Project of the National Center for Supercomputing Applications at the University of Illinois
  • Funded by the National Science Foundation

Goal of the project: ensure researchers and educators can advance discovery in all fields of study

SLIDE 25

Blue Waters System

Top-ranked system in all aspects of its capabilities, with an emphasis on sustained performance

  • Built by Cray (2011 – 2012)
  • 45% larger than any other system Cray has ever built
  • By far the largest NSF GPU resource
  • Ranks among the Top 10 HPC systems in the world in peak performance despite its age
  • Largest memory capacity of any HPC system in the world: 1.66 PB (petabytes)
  • One of the fastest file systems in the world: more than 1 TB/s (terabyte per second)
  • Largest backup system in the world: more than 250 PB
  • Fastest external network capability of any open science site: more than 400 Gb/s (gigabits per second)

SLIDE 26

Blue Waters Ecosystem

[Diagram, flattened to a list:]

  • Blue Waters System: processors, memory, interconnect, online storage, system software, programming environment
  • Software: visualization, analysis, computational libraries, etc.
  • SEAS: Software Engineering and Application Support; petascale applications
  • Computing resource allocations; user and production support
  • WAN connections, consulting, system management, security, operations, …
  • National Petascale Computing Facility
  • EOT: Education, Outreach, and Training
  • GLCPC: Great Lakes Consortium for Petascale Computing
  • Hardware: external networking, IDS, back-up storage, import/export, etc.
  • Industry partners

SLIDE 27

Blue Waters Computing System

[System diagram, flattened to a list:]

  • Compute: 13.34 PFLOPS peak, 1.66 PB system memory
  • Online storage (Sonexion): 26 usable PB at >1 TB/sec
  • Near-line storage (Spectra Logic): 200 usable PB at 100 GB/sec
  • Scuba subsystem: storage configuration for user best access
  • WAN: 400+ Gb/sec
  • External servers behind 10/40/100 Gb Ethernet and IB switches

SLIDE 28

Blue Waters Allocations: ~600 Active Users

NSF PRAC, 80%

  • 30 – 40 teams, annual request for proposals (RFP) coordinated by NSF
  • Blue Waters project does not participate in the review process

Illinois, 7%

  • 30 – 40 teams, biannual RFP

GLCPC, 2%

  • 10 teams, annual RFP

Education, 1%

  • Classes, workshops, training events, fellowships. Continuous RFP.

Industry Innovation and Exploration, 5%

Broadening Participation: a new category for underrepresented communities

SLIDE 30

Usage by Discipline and User

Data from the Blue Waters 2016–2017 Annual Report (share of usage by discipline):

– Earth Sciences: 13.3%
– Physics: 12.3%
– Biophysics: 10.8%
– Astronomical Sciences: 10.4%
– Molecular Biosciences: 7.6%
– Stellar Astronomy and Astrophysics: 7.4%
– Atmospheric Sciences: 6.4%
– Chemistry: 5.2%
– Engineering: 4.9%
– Fluid, Particulate, and Hydraulic Systems: 4.5%
– Planetary Astronomy: 2.5%
– Materials Research: 2.5%
– Extragalactic Astronomy and Cosmology: 2.4%
– Galactic Astronomy: 2.1%
– Biochemistry and Molecular Structure and Function: 1.5%
– Biological Sciences: 1.5%
– Nuclear Physics: 1.3%
– Computer and Computation Research: 1.0%
– Neuroscience Biology: 0.8%
– Magnetospheric Physics: 0.5%
– Chemical, Thermal Systems: 0.3%
– Design and Computer-Integrated Engineering: 0.3%
– Climate Dynamics: 0.1%
– Environmental Biology: 0.1%
– Social, Behavioral, and Economic Sciences: 0.1%
– Other: 7.5%

slide-31
SLIDE 31

Recent Science Highlights

  • LIGO binary black-hole observation verification
  • 160-million-atom flu virus
  • EF5 tornado simulation
  • Arctic elevation maps
  • Earthquake rupture

slide-32
SLIDE 32

Support for Python and Containers

  • Approximately 20% of Blue Waters users use Python.
  • We provide over 260 Python packages and two Python versions.
  • Support for GPUs, ML/DL, etc.
  • Support for "Docker-like" containers using Shifter:
– MPI across nodes, with access to the native driver (see the sketch below)
– Access to the GPU from the container
  • Support for Singularity.
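
As an illustration of the Python-plus-MPI combination mentioned above, here is a minimal, generic mpi4py sketch (not a Blue Waters-specific recipe; launcher names and process counts vary by site):

```python
from mpi4py import MPI  # Python bindings over the system MPI library

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of MPI processes across nodes

# Each rank computes a disjoint slice of the work...
local = sum(i * i for i in range(rank, 1_000_000, size))

# ...and a collective reduction combines the partial results on rank 0.
total = comm.reduce(local, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum of squares below 1,000,000: {total}")
```

Launched with the site's MPI launcher (e.g., aprun on a Cray), the same script can run inside a Shifter or Singularity container, provided the container can reach the native MPI driver, which is exactly the support described above.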

slide-33
SLIDE 33

Data Science and Machine Learning

Currently available libraries:
  • TensorFlow 1.3.0 (minimal usage sketch below)

In the pipeline:
  • TensorFlow 1.4.x
  • PyTorch
  • Caffe2
  • Cray ML acceleration

Data challenge: large training datasets
  • Example/research data on Blue Waters: ImageNet
  • Seeking datasets for:
– Natural language processing (still looking for a large enough dataset)
– Biomedical data (e.g., UK Biobank: http://www.ukbiobank.ac.uk)
  • Seeking users' interests
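
For context, TensorFlow 1.3 predates eager execution, so user code follows the build-a-graph-then-run-a-session style sketched below. This is a generic TF 1.x example, not code from Blue Waters documentation:

```python
import tensorflow as tf  # TensorFlow 1.x: define a graph, run it in a Session

# Graph definition: a trivial linear map y = W*x + b.
x = tf.placeholder(tf.float32, shape=[None], name="x")
W = tf.Variable(2.0, name="W")
b = tf.Variable(0.5, name="b")
y = W * x + b

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())         # materialize variables
    print(sess.run(y, feed_dict={x: [1.0, 2.0, 3.0]}))  # -> [2.5 4.5 6.5]
```

On GPU nodes, the same graph can be placed on an accelerator with tf.device('/gpu:0').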

slide-34
SLIDE 34

Blue Waters Summary

Outstanding Computing System

  • The largest installation of Cray's most advanced technology
  • Extreme-scale Lustre file system with advances in reliability/maintainability
  • Extreme-scale archive with advanced RAIT capability

Most balanced system in the open community

  • Blue Waters can address science problems that are memory-, storage-, compute-, or network-intensive, or any combination
  • Use of innovative technologies provides a path to future systems

NCSA is a leader in developing and deploying these technologies, as well as contributing to community efforts.

slide-35
SLIDE 35

Post Blue Waters

  • Frontera: new leadership-class resource at the Texas Advanced Computing Center
– Production mid-2019
– More to come soon

slide-36
SLIDE 36

Outline

  • Research Computing @ Illinois

– National Science Foundation Investments

  • Extreme Science and Engineering Discovery Environment (XSEDE)
  • Blue Waters

– Research IT

  • Highlighted Core Services

– Campus Champions

  • History/Motivation
  • Scope
  • Benefits
slide-37
SLIDE 37

Research IT

  • Research IT Portal: https://researchit.illinois.edu/
– Aggregation of many sources of Research IT resources
– Some provided as services to the campus-wide community
– Many contributed by campus units

slide-38
SLIDE 38

Highlighted Core Research IT Services

  • Active Data Service (ADS)
– Partnership between the Research Data Service (University Library), NCSA, and Tech Services
– Mid-scale storage of actively used data
– https://www.library.illinois.edu/rds/active-data-storage-overview/

slide-39
SLIDE 39

Highlighted Core Research IT Services

  • Amazon Web Services
– Cloud services platform
– Access coordinated by Tech Services
– http://techservices.illinois.edu/services/amazon-web-services

SLIDE 40

Highlighted Core Research IT Services

  • Data Management Consultations
– Consultation with library-based subject experts on Data Management Plans, required for most funding proposals
– Offered by the University Library

slide-41
SLIDE 41

Highlighted Core Research IT Services

  • IDEALS (Illinois Digital Environment for Access to Learning and Scholarship)
– Digital repository for research and scholarship produced at the University of Illinois
– Offered by the University Library
– https://www.ideals.illinois.edu/

slide-42
SLIDE 42

Highlighted Core Research IT Services

  • Illinois Campus Cluster Program (ICCP)
– High-performance computing cluster available at Illinois
– Offered by Research IT
– Investor-based access
– On-demand Research Computing as a Service access
– https://campuscluster.illinois.edu/

slide-43
SLIDE 43

Highlighted Core Research IT Services

  • Training
– Training opportunities from many sources, aggregated on the Research IT portal
– Combination of live events and asynchronous training opportunities
– https://researchit.illinois.edu/resources?categories=8&page=1

slide-44
SLIDE 44

Upcoming Core Research IT Services

  • Research Computing Applications and Software Development Consulting
– Allocated resource: collaborate with one or more experts for up to 1 year on your research computing project
– Apply online at https://researchit.illinois.edu/initiatives

  • High Throughput Computing
– New computational resource
– Currently in pilot phase

slide-45
SLIDE 45

Outline

  • Research Computing @ Illinois

– National Science Foundation Investments

  • Extreme Science and Engineering Discovery Environment (XSEDE)
  • Blue Waters

– Research IT

  • Highlighted Core Services

– Campus Champions

  • History/Motivation
  • Scope
  • Benefits
slide-46
SLIDE 46

Campus Champions

The grassroots, ubiquitous community of practice

Slides adapted from:

Dana Brunson, XSEDE Campus Engagement Co-manager; Asst. VP for Research Cyberinfrastructure; Director, High Performance Computing Center, Oklahoma State University

With support from:

slide-47
SLIDE 47

Campus Champions: Breadth

  • 520+ champions from 260+ US academic and research-focused institutions help their local researchers use CI, especially large-scale and advanced computing
  • Every US state, every US EPSCoR jurisdiction
  • FREE to join with a letter of collaboration
  • Funded via the XSEDE project (but no longer XSEDE-focused)
  • All flavors of campus research computing professionals (directors, sysadmins, user support, trainers, coordinators, research software engineers, etc.)
  • Plus new "affiliates" from organizations that serve the larger community (XSEDE, Internet2, Globus, BDHubs, national HPC centers, etc.)
  • https://www.xsede.org/campus-champions

slide-48
SLIDE 48

Campus Champions

Research computing community facilitating computing- and data-intensive research and education

  • Champions help their local researchers and educators find and use the advanced digital services that best meet their needs
  • Champions share CI challenges and solutions

With support from: https://www.xsede.org/web/site/community-engagement/campus-champions

slide-49
SLIDE 49

Evolution of Champions

TeraGrid: 2004–2011

  • Fall 2007: planning for Champions began
  • 2008: Campus Champions officially started
  • May 2008: first Champion selected
  • December 2009: Champion Leadership Team formed

XSEDE 1: 2011–2016

  • August 2011: 100th institution joined
  • May 2012: Champion Fellows program began
  • June 2013: 200th Champion joined
  • July 2013: Domain and Student Champion program initiated
  • January 2015: Regional Champion program initiated
  • August 2015: 200th institution joined

XSEDE 2: 2016–present

  • July 2017: elected first half of Leadership Team
  • September 2017: 400th Champion joined
  • April 2018: Champion institutions in every state and EPSCoR jurisdiction
  • July 2018: fully elected Leadership Team
  • Planning for sustainability beyond XSEDE

Current numbers:

  • Total academic institutions: 244
– Academic institutions in EPSCoR jurisdictions: 74
– Minority-Serving Institutions: 48
  • Non-academic, not-for-profit organizations: 25
  • Total Campus Champion institutions: 269
  • Total number of Champions: 523

slide-50
SLIDE 50

Campus Champions Growth

[Chart: champion count growth, 2008–2018; y-axis 100–600]

slide-51
SLIDE 51

Synchronous Champions

Monthly Discussions

  • 1st Tuesday: Champion Leadership team
  • 2nd Tuesday: Community Chat
  • 3rd Tuesday: All-Champions Call
– Guest speakers
– Community updates
  • 4th Tuesday: Sustainability working group

Other Calls and Meetings

  • Ad hoc special-topics calls
  • Face-to-face meetings at regional and national conferences

slide-52
SLIDE 52

All-Champions Monthly Call

slide-53
SLIDE 53

Asynchronous Champions (CY 2017)

Longest threads in 2017:

1. Data Management Initiatives on YOUR Campus - TGR/TGW?
2. Adjusted Peak Performance for HPC clusters
3. OS flavors in HPC
4. HPC systems login access only with VPN -- good idea?
5. AMD EPYC and Intel Skylake Pricing Extremes
6. Theoretical Peak Performance
7. Champions-style job board?
8. CephFS for HPC?
9. Successful Scheduling
10. Xeon Phi on Motherboard
11. cgroups Memory Leak in RHEL 6/7.x
12. Advice needed -- Gaussian software on an HPC
13. sysadmin internships for undergrads?
14. On Evaluation of Cluster Security and Patching
15. Cluster Environment Monitoring
16. Service ticketing/tracking software
17. HPC Steering Committee
18. If you had $50K...
19. University risk due to using external computing resources
20. Cloud Costing for NSF
21. High School Student Looking for HPC
22. cloud vs local cluster stats
23. Memory leak in VASP (GPU)?
24. Origins of the Data Management Plan
25. New Campus Champion
26. Onboarding of HPC users & its challenges
27. How to submit an extension via XRAS

Slack:

  • 14,425 total messages
  • 11,752 direct messages
  • 1,525 in public channels

Email list:

  • Distinct contributors: 274
  • New contributors: 128
  • Messages: 1,409
  • Threads: 601
slide-54
SLIDE 54

Champion Focus Group at PEARC18: What value do you gain from the Champions program?

  • "…It gives you a position, not necessarily of authority, but you have expertise of how to help someone with their work, and they take you seriously."
  • "…When I talk to the CS faculty in their ivory towers, I know exactly what the cutting edge is and can interact with them."
  • "I don't think I could've survived without it. So having a job is a benefit."
  • "It's a very welcoming community… I came to the community 2 years ago… I have no hesitation about sending something to the Champions list or the Slack channel, even if it's a dumb question."
  • "There are other communities that don't receive my dumb questions well, but not the Champions community."

slide-55
SLIDE 55

Campus Champions & Friends: Community of Communities


Photo Credit: Tiffany Jolley

slide-56
SLIDE 56

Plus: SC18 All-Champions Meeting with guest speaker Dan Stanzione, Tuesday 1:30–3:30 p.m., Room D220 (Dan at 2:15)

slide-57
SLIDE 57

Questions?

  • Thank you for your attention!
  • Dean Karres (karres@Illinois.edu)
  • Jay Alameda (alameda@Illinois.edu)
  • Sandie Kappes (kappes@Illinois.edu)