SLIDE 1
Habanero Operating Committee Spring 2018 Meeting
March 6, 2018
Meeting Called By: Kyle Mandli, Chair
Introduction: George Garrett, Manager, Research Computing Services
shinobu@columbia.edu
The HPC Support Team, Research Computing Services
SLIDE 2
SLIDE 3
Agenda
- 1. Habanero Expansion Update
- 2. Storage Expansion
- 3. Additional Updates
- 4. Business Rules
- 5. Support Services
- 6. Current Usage
- 7. HPC Publications Reporting
- 8. Feedback
SLIDE 4
Habanero
SLIDE 5
Habanero - Ways to Participate
Four Ways to Participate
- 1. Purchase
- 2. Rent
- 3. Free Tier
- 4. Education Tier
SLIDE 6
Habanero Expansion Update
Habanero HPC Cluster
- 1st Round Launched in 2016 with 222 nodes (5328 cores)
- Expansion nodes went live on December 1st, 2017
– Added 80 more nodes (1920 cores)
– 12 new research groups onboarded
- Total: 302 nodes (7248 cores) after expansion
SLIDE 7
Habanero Expansion Equipment
- 80 nodes (1920 cores)
– Same CPUs (24 cores per server)
– 58 Standard servers (128 GB)
– 9 High Memory servers (512 GB)
– 13 GPU servers, each with 2 x Nvidia P100 modules
- 240 TB additional storage purchased
SLIDE 8
Compute Nodes - Types (Post-Expansion)
Type           Quantity
Standard       234
High Memory    41
GPU Servers    27
Total          302
SLIDE 9
Head Nodes
- 2 Submit nodes
– Submit jobs to compute nodes
- 2 Data Transfer nodes (10 Gb)
– scp, rdist, Globus
- 2 Management nodes
– Bright Cluster Manager, Slurm
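As an example of using the data transfer nodes, a minimal scp sketch is below; the hostname and paths are placeholders, not the cluster's actual addresses (see the user documentation for those):

    # Copy results from Habanero to your local machine through a data
    # transfer node. <dtn-hostname> and the remote path are placeholders.
    scp -r youruni@<dtn-hostname>:/path/to/your/results ./results

    # Copy an input file up to the cluster the same way.
    scp input.dat youruni@<dtn-hostname>:/path/to/your/project/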
SLIDE 10
HPC - Visualization Server
- Remote GUI access to Habanero storage
- Reduce need to download data
- Same configuration as GPU node (2 x K80)
- NICE Desktop Cloud Visualization software
SLIDE 11
Habanero Storage Expansion (Spring 2018)
- Researchers purchased around 100 TB additional storage
- Placing order with vendor (DDN) in March
- Install new drives after purchasing process completes
- Total Habanero storage after expansion: 740 TB
Contact us if you need a quota increase prior to equipment delivery.
SLIDE 12
Additional Updates
- Scheduler upgrade
– Slurm 16.05 to 17.2
– More efficient
– Bug fixes
- New test queue added
– High priority queue dedicated to interactive testing (see example script after this list)
– 4 hour max walltime
– Max 2 jobs per user
- Jupyterhub and Docker being piloted
– Contact us if interested in testing
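A minimal sketch of a batch script for the new test queue is below; the partition name "test" and the account string are assumptions, so check the Habanero user documentation for the exact names:

    #!/bin/bash
    # Short test job -- the partition name "test" and account "yourgroup"
    # are placeholders, not confirmed Habanero settings.
    #SBATCH --account=yourgroup
    #SBATCH --partition=test
    #SBATCH --time=01:00:00      # stay under the 4-hour test-queue limit
    #SBATCH --nodes=1
    #SBATCH --ntasks=1

    ./your_program

For interactive testing, the equivalent srun request (same placeholder names) would be:

    srun --pty --account=yourgroup --partition=test --time=01:00:00 /bin/bash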
SLIDE 13
Additional Updates (Continued)
- Yeti cluster updates
– Yeti round 1 was retired in November 2017
– Yeti round 2 slated for retirement in March 2019
- New HPC cluster
– RFP process
– Purchase round to commence in late Spring 2018
SLIDE 14
- Business rules set by Habanero Operating Committee
- Any rules that require revision can be adjusted
- If you have special requests, e.g. a longer walltime or a temporary bump in priority or resources, contact us and we will raise them with the Habanero OC chair as needed
Business Rules
SLIDE 15
For each account there are three types of execute nodes
- 1. Nodes owned by the account
- 2. Nodes owned by other accounts
- 3. Public nodes
Nodes
SLIDE 16
- 1. Nodes owned by the account
– Fewest restrictions
– Priority access for node owners
Nodes
SLIDE 17
- 2. Nodes owned by other accounts
– Most restrictions
– Priority access for node owners
Nodes
SLIDE 18
- 3. Public nodes
– Few restrictions
– No priority access
Public nodes: 25 total (3 GPU, 3 High Mem, 19 Standard)
Nodes
SLIDE 19
- Your maximum wall time is 5 days on nodes your group owns and on public nodes
- Your maximum wall time on other groups' nodes is 12 hours
Job wall time limits
SLIDE 20
- If your job asks for 12 hours of walltime or less, it can run on any node
- If your job asks for more than 12 hours of walltime, it can only run on nodes owned by its own account or on public nodes
12 Hour Rule
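In practice, it is the walltime requested in the job script that determines which nodes a job is eligible for. A sketch, with a placeholder account name:

    #!/bin/bash
    # Placeholder account name; adjust to your group's account string.
    #SBATCH --account=yourgroup
    # 12:00:00 or less -> eligible for any node.
    # More than 12:00:00 (up to the 5-day maximum) -> only nodes owned by
    # the job's account or public nodes.
    #SBATCH --time=12:00:00
    #SBATCH --nodes=1
    #SBATCH --ntasks=1

    ./your_program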
SLIDE 21
- Every job is assigned a priority
- Two most important factors in priority
- 1. Target share
- 2. Recent use
Fair share
SLIDE 22
- Determined by number of nodes owned by account
- All members of account have same target share
Target Share
SLIDE 23
- Number of core-hours used "recently"
- Calculated at group and user level
- Recent use counts for more than past use
- Half-life weight currently set to two weeks
Recent Use
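As an illustrative formula only (Slurm's actual fair-share calculation has additional factors), a two-week half-life means that usage recorded a time Δt ago is weighted as:

    U_{\mathrm{recent}} = \sum_j u_j \, 2^{-\Delta t_j / T_{1/2}}, \qquad T_{1/2} = \text{2 weeks}

so a core-hour consumed two weeks ago counts half as much toward recent use as one consumed today, and one consumed four weeks ago counts a quarter as much.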
SLIDE 24
- If recent use is less than target share, job priority goes up
- If recent use is more than target share, job priority goes down
- Recalculated every scheduling iteration
Job Priority
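Fair-share standing and job priority can be inspected from the command line with Slurm's reporting tools; a brief sketch (exact output columns depend on the cluster's Slurm configuration):

    # Show fair-share information (shares and recent usage) for accounts and users
    sshare -a

    # Show the priority factors, including fair-share, for your pending jobs
    sprio -u $USER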
SLIDE 25
Questions regarding business rules?
Business Rules
SLIDE 26
Email support: hpc-support@columbia.edu
Support Services
SLIDE 27
- hpc.cc.columbia.edu
- Click on "Habanero Documentation"
- https://confluence.columbia.edu/confluence/display/rcs/Habanero+HPC+Cluster+User+Documentation
User Documentation
SLIDE 28
HPC support staff are available to answer your Habanero questions in person on the first Monday of every month.
Where: Science & Engineering Library, NWC Building
When: 3-5 pm, first Monday of the month
RSVP is required: https://goo.gl/forms/v2EViPPUEXxTRMTX2
Office Hours
SLIDE 29
HPC support staff can come and talk to your group.
Topics can be general and introductory or tailored to your group.
Contact hpc-support to discuss setting up a session.
Group Information Sessions
SLIDE 30
Questions regarding support services?
Support Services
SLIDE 31
- 44 Groups
- 1080 Users
- 7 Renters
- 63 Free tier users
- Education tier
– 9 courses since launch
– 5 courses in Spring 2018
- 2,097,172 Jobs Completed
Cluster Usage (As of 03/01/2018)
SLIDE 32
Job Size
Cores          Jobs
1 - 49         2,088,654
50 - 249       5,894
250 - 499      1,590
500 - 999      479
1000+          555
SLIDE 33
Cluster Usage in Core Hours
SLIDE 34
Group Utilization
SLIDE 35
- Research conducted on the Habanero, Yeti, and/or Hotfoot machines has led to over 100 peer-reviewed publications in top-tier research journals.
- To report new publications utilizing one or more of these machines, please email srcpac@columbia.edu
HPC Publications Reporting
SLIDE 36
Any feedback about your experience with Habanero?
Feedback?
SLIDE 37