  1. Jožef Stefan Institute / SLING - Slovenian Supercomputing Network. Site Report for NDGF All Hands 2017. Barbara Krašovec, Jan Jona Javoršek. http://www.arnes.si http://www.ijs.si/ http://www.sling.si/ barbara.krasovec@arnes.si jona.javorsek@ijs.si

  2. but also: prof. dr. Andrej Filipčič, IJS, UNG; prof. dr. Borut P. Kerševan, Uni Lj, IJS; Dejan Lesjak, IJS; Peter Kacin, Arnes; Matej Žerovnik, Arnes

  3. SLING: a small national grid initiative

  4. SLING ● SiGNET at Jožef Stefan Institute: EGEE, since 2004 ● Arnes and Jožef Stefan Institute: EGI, since 2010 ● full EGI membership, no EGI Edge ● 3 years of ELIXIR collaboration ● becoming a consortium: PRACE, EUDAT ● Tasks: core services, integration, site support, user support etc.

  5. SLING Consortium: bringing everyone in ...

  6. Collaboration: CERN, Belle2, Pierre Auger ...

  7. SLING Current Centres: Arctur, Arnes, atos@ijs, CIPKeBiP, NSC@ijs, SiGNET@ijs, UNG, krn@ijs, ARSO, CI, FE ● 7 centres ● over 22,000 cores ● over 4 PB storage ● over 6 million jobs/y ● HPC, GPGPU, VM

  8. Arnes: demo, testing, common ● national VOs (generic, domain) ● registered with EGI ● 2 locations ● Nordugrid ARC ● SLURM (no CreamCE) ● LHCOne, GÉANT ● CLUSTER DATA SHEET: 4500 cores altogether (ATLAS majority, HPC-enabled), 3 CUDA GPGPU units, ~6 TB RAM

  9. „New“ space: 196 m², in-row cooling (18/77 racks)

  10. SiGNET: HPC/ATLAS at Jožef Stefan ● since 2004 ● ATLAS, Belle2 ● ARC, gLite with SLURM ● LHCone AT-NL-DK, GÉANT (both 10 Gbit/s) ● schrooted RTEs → Singularity HPC over recent Gentoo ● CLUSTER DATA SHEET: 5280 cores; 64-core AMD Opteron nodes with 256 GB RAM, 1 TB disk, 1 Gb/s; 3 x dCache servers (132 GB memory, 10 Gb/s, 2 x 60 x 6 TB); 3 x cache NFS à 50 TB

  11. SiGNET: more ● additional dCache: – 2 servers à 400 TB – Belle: independent dCache, 2 x 200 TB (mostly waiting for the move) ● services: – 1 squid for Frontier + CVMFS – 1 production ARC-CE – 3 cache servers, also data transfer servers for ARC – all supporting servers in VMs (CREAM-CE, site BDII, APEL, test ARC-CE)

  12. LHCone and GÉANT ● LHCone: 30 Gbit/s (20 IJS) ● GÉANT: 40 Gbit/s

  13. NSC@ijs: institute / common ● same VOs + IJS ● not registered with EGI ● under full load ... ● lots of spare room ● Nordugrid ARC ● SLURM ● LHCOne, GÉANT ● CLUSTER DATA SHEET: 1980 cores altogether, all HPC-enabled; 16 CUDA GPGPU units (Nvidia K40); ~1 TB RAM

  14. Other: progeria, reactor process simulations, enzyme activation

  15. Supported Users 2015 ● high energy physics ● computer science ● astrophysics ● computational chemistry ● mathematics ● bioinformatics, genetics ● material science ● language technologies ● multimedia

  16. Supported Users 2017 ● Machine Learning, Deep Learning and Monte Carlo over many fields, often on GPGPU ● computer science (with above) ● genetics (Java ⇾ R), bioinformatics ● computational chemistry (also GPGPU) ● high energy physics, astrophysics ● mathematics, language technologies ● material science, multimedia

  17. Main Differences ● University curriculum (CS) involvement ● Critical usage (genetics) ● More complex software deployments ● Ministry interest and support

  18. Modus Operandi @ SLING ● ARC Client used extensively: scripts + ARC Runner etc. ● Many single users with complicated setups: GPGPU etc. ● Some groups with critical tasks: medical, research, industrial
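The slide only names the tooling; as a minimal sketch of the kind of scripted submission meant here, the snippet below hands an xRSL job description to the standard NorduGrid ARC client (arcsub). The cluster endpoint, runtime environment name and payload are hypothetical placeholders, not actual SLING values, and a valid proxy (arcproxy) plus the arc* tools on PATH are assumed.

#!/usr/bin/env python3
"""Sketch: scripted job submission with the NorduGrid ARC client.
Cluster URL, runtime environment and payload below are placeholders."""
import subprocess
import tempfile

# xRSL job description for a small job requesting a (hypothetical) GPGPU runtime environment.
XRSL = """&
 (executable = "run.sh")
 (jobName = "sling-demo")
 (stdout = "stdout.txt")
 (stderr = "stderr.txt")
 (cpuTime = "60")
 (runTimeEnvironment = "APPS/DEMO-GPU-1.0")
"""

def submit(cluster: str) -> None:
    """Write the job description to a temporary file and hand it to arcsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".xrsl", delete=False) as f:
        f.write(XRSL)
        path = f.name
    # arcsub prints the job ID on success; arcstat/arcget then track and fetch the job.
    subprocess.run(["arcsub", "-c", cluster, path], check=True)

if __name__ == "__main__":
    submit("cluster.example.org")   # placeholder endpoint, not a real SLING CE

In practice such scripts are wrapped in loops or tools like ARC Runner to submit and track many jobs at once.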

  19. Technical Plans / Wishes ● Joint national Puppet ● RTEs + Singularity: national CVMFS (also user RW pools) ● Joint Monitoring: Icinga + Grafana ● Advanced Web Job Status Tool: GridMonitor++ ● ARC Client improvements

  20. RTEs + Singularity: portable images & HW support, repositories, Docker compatibility, GPGPU integration ... More in the following days
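As a rough illustration of the portable-image and Docker-compatibility point, the sketch below converts a Docker image to a SIF file and runs a payload inside it with the singularity CLI; the image name, bind path and payload are assumptions for the example, not SLING's actual RTE setup.

#!/usr/bin/env python3
"""Sketch: run a job payload inside a Docker-compatible Singularity image.
Image name, bind path and payload are placeholders."""
import subprocess

IMAGE = "docker://python:3.11-slim"   # any Docker registry image; Singularity converts it to SIF
SIF = "payload.sif"

def build_image() -> None:
    # 'singularity pull' flattens the Docker layers into a single portable SIF file.
    subprocess.run(["singularity", "pull", SIF, IMAGE], check=True)

def run_payload() -> None:
    # Bind the job's working directory into the container and run the payload there.
    # Adding "--nv" would also expose the host's NVIDIA driver for GPGPU jobs.
    subprocess.run(
        ["singularity", "exec", "--bind", ".:/work", "--pwd", "/work",
         SIF, "python3", "-c", "print('hello from the container')"],
        check=True,
    )

if __name__ == "__main__":
    build_image()
    run_payload()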

  21. Joint Monitoring / Web Status ● Currently separate, similar solutions – and no access for users ● A national (or wider) solution wanted ● Web status tool for users on a similar level + more info!!

  22. Web Job Status Tool ● RTE/Singularity info (in InfoSys too) ● HW details, specifically RAM and GPGPU consumption ● Queue Length and Scheduling Info ● Stats for User's Jobs
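One plausible building block for such a tool is the ARC information system, which the CEs publish over LDAP (port 2135) in the GLUE2 schema; the sketch below pulls queue and job counts from it. This is only a sketch under assumptions, not an existing SLING service, and the CE host name is a placeholder.

#!/usr/bin/env python3
"""Sketch: fetch queue/job counts for a web status page from an ARC CE's
LDAP information system (GLUE2 rendering, port 2135). Host is a placeholder."""
import subprocess

def query_glue2(host: str) -> str:
    # Anonymous LDAP search; GLUE2ComputingShare entries carry per-queue
    # running/waiting job counts that a web status page could display.
    result = subprocess.run(
        ["ldapsearch", "-x", "-LLL",
         "-H", f"ldap://{host}:2135",
         "-b", "o=glue",
         "(objectClass=GLUE2ComputingShare)",
         "GLUE2ComputingShareRunningJobs",
         "GLUE2ComputingShareWaitingJobs"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(query_glue2("arc-ce.example.org"))   # placeholder CE host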

  23. ARC CE Wishlist ● GPGPU info in accounting and InfoSys ● ARC CE load balancing + HA ~ failover mode ● testing environment / setup

  24. Questions? Andrej Filipčič, IJS, UNG; Borut Paul Kerševan, IJS, FMF; Barbara Krašovec, IJS; Dejan Lesjak, IJS; Janez Srakar, IJS; Jan Jona Javoršek, IJS; Matej Žerovnik, Arnes; Peter Kacin, Arnes. info@sling.si http://www.sling.si/

  25. ARC Client Improvements ● More bug fixes and error docs... (THANKS!) ● Python/ACT ● a wish list: – stand-alone, Docker/Singularity – GPGPU/CPU type selectors – MacOS client (old and sad) (workaround done)
