User Support, Campus Integration, OSG XSEDE
Rob Gardner OSG Council Meeting June 25, 2015
149. Present to Council a 1-page document on "Enabling Campus Resource Sharing and use of remote OSG resources" in 15 minutes - Rob Gardner, Frank
resource and user community
○ Track 1: “light, quick”
  ■ Submit from Palmetto to OSG, and back
  ■ “Quick Connect” → OSG Connect to Palmetto via hosted Bosco service (ssh); see the sketch below
○ Track 2: “full OSG capabilities”
  ■ Full HTCondor-CE, OASIS+Squid, {StashCache}
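The Quick Connect path rides on BOSCO's ssh-based submission. A minimal sketch, assuming the stock bosco_cluster tool; the account name and Palmetto login host below are illustrative, not taken from the slides:
# register Palmetto as an ssh/PBS submit target (account and host are placeholders)
$ bosco_cluster --add osguser@login.palmetto.clemson.edu pbs
# verify end-to-end submission before opening it up to users
$ bosco_cluster --test osguser@login.palmetto.clemson.edu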
Working document: goo.gl/9aNkJs
Current opportunistic limit on Palmetto: capped at 500 jobs due to a PBS Pro limitation that prevents Clemson users in the general pool (non-owners) from preempting OSG jobs. Expect a fix in the next release of PBS Pro so that OSG jobs can claim additional idle cycles.
○ submit from login.osgconnect.net
○ or manage submission from a campus login host via a hosted schedd (see the submit sketch below)
○ monitor and account against:
  ■ local campus allocation
  ■ XD allocation
○ Full integration with campus IDM & signup
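As an illustration of that submit path, a sketch of a minimal HTCondor job from login.osgconnect.net; the file names and project name are placeholders, not taken from the slides:
# create a minimal submit description file (+ProjectName drives accounting)
$ cat > test.sub <<'EOF'
universe     = vanilla
executable   = run_analysis.sh
output       = job.$(Cluster).$(Process).out
error        = job.$(Cluster).$(Process).err
log          = job.log
+ProjectName = "MyCampusProject"
queue 10
EOF
$ condor_submit test.sub
# monitor from the same login host
$ condor_q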
Evaluating at:
CE services operated on behalf of campuses short of manpower, without requiring local OSG expertise; accounts provided for supported VOs.
communicate the scale and reach of OSG to outside agencies/projects - Rob G, Bo, Clemmie
○ 67% the size of XD, 35% the size of Blue Waters
○ 2.5 million CPU hours/day
○ 800M hours/year
○ 125M hours/year provided opportunistically
Rudi Eigenmann Program Director Division of Advanced Cyberinfrastructure (ACI) NSF CISE CASC Meeting, April 1, 2015
OSG as a campus research computing cluster
★ Login host
★ Job scheduler
★ Software (modules)
★ Storage
★ Tools
○ identical software on all clusters
continuously
campus bridging yum repo:
$ switchmodules oasis
$ module avail
$ module load R
$ module load namd
$ module purge
$ switchmodules local
All software accesses by module are monitored centrally for support purposes
○ on login.osgconnect.net, on campus, or laptop
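For example, the same OASIS-distributed module stack can be initialized on a worker node or a laptop with CVMFS mounted. A sketch, with the lmod init path assumed from OSG Connect documentation of the period rather than quoted from the slides:
# initialize the OASIS module environment (path is an assumption)
$ source /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/current/init/bash
$ module load R
# illustrative payload
$ Rscript analysis.R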
knowledge base
management of user documentation
○ Markdown tutorials → same place as code
○ tutorial write-ups track code samples closely
○ auto HTML generation and upload to the help desk, in seconds (sketch below)
Code and Markdown managed in GitHub. Content indexed, searchable.
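A sketch of the publishing step, assuming pandoc for the Markdown-to-HTML conversion; the help desk endpoint and token are hypothetical, since the slides do not name the upload mechanism:
# convert a tutorial and push the result to the help desk
$ pandoc tutorial-R.md -o tutorial-R.html
$ curl -s -H "Authorization: Bearer $HELPDESK_TOKEN" \
       -F "article=@tutorial-R.html" \
       https://support.example.org/api/articles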
Users see their personal history of support requests to OSG and can drill down to see the full interaction.
Private notes can be made for technical support tracking. Of course, everything is also available via email: user-support@opensciencegrid.org. A DM tweet to @osgusers also generates a ticket.
Uber-like feedback is collected (except we don’t rate users :)
programming using Python. IPython Notebook is used for instruction.
is a top source of delays and confusion.
server with login accounts that users retain indefinitely.
establishing a common baseline for the toolchain.
Developed a platform to launch per-user IPython Notebook servers:
1. User visits http://ipython.osgconnect.net and logs in.
2. A per-user IPython Docker container is launched. Docker provides user and data isolation. Containers can be shut down and re-instantiated on demand (sketch below).
3. The provisioned IPython instance is persistent and accessible via login.osgconnect.net.
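A minimal sketch of that container lifecycle; the image name, port, and data path are illustrative assumptions, not the production configuration:
# launch a per-user notebook container with a persistent data volume
$ docker run -d --name nb-$USER -p 8888:8888 \
       -v /data/$USER:/notebooks ipython/notebook
# shut down and discard the container on demand
$ docker rm -f nb-$USER
# re-instantiate later; the data volume (and hence the user's data) persists
$ docker run -d --name nb-$USER -p 8888:8888 \
       -v /data/$USER:/notebooks ipython/notebook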
to support 2015 OSG User School
Joint Software Carpentry & Open Science Grid Workshop at Duke University
Distributed high throughput computation is concerned with using many computing resources, potentially spread over large geographic distances and shared between organizations. These could be university research computing clusters, national leadership-class HPC facilities, or public cloud resources. Incorporating these into science workflows can dramatically benefit your research program. However, getting the most out of these systems requires some knowledge and skill in scientific computation. This workshop extends basic instruction on Linux programming from the Software Carpentry series with concepts and exercises on distributed high throughput computation. Participants will use resources of the Scalable Computing Support Center as well as the Open Science Grid, a national supercomputing-scale high throughput computing facility. There will be experts on hand to answer questions about distributed high throughput computing and whether it is a good fit for your science.
Cleveland (formal decision next week)
○ Distributed High Throughput Computation: a Campus Roundtable Discussion (Research Track)
Consortium, HPC Symposium (Aug 11-13, Boulder)
○ 30 minute slot shared with XSEDE
OSG as XD Provider to XSEDE
Stampede and Comet
○ 50k SU to Kettimuthu/ANL (CS: workflow modeling)
○ 100k SU to Qin/Spellman (class on gene networks)
○ 1.39M SU to Gull/UMich (PHYS: condensed matter)
○ → start a MD-HTC activity with ACI-REF?
○ Local submit to OSG validated at a scale of 1500 jobs
○ Joint use of campus and OSG resources in the same work environment
  ■ Model for other campuses, with ACI-REF as a channel
○ Quick Connect to share resources is functional
○ plan detailed studies of common application scaling properties & potential conversion to HTC workflow