SLIDE 1

SLING - Slovenian Supercomputing Network

Site Report for NDGF All Hands 2017

Barbara Krašovec, Jan Jona Javoršek

barbara.krasovec@arnes.si jona.javorsek@ijs.si

Jožef Stefan Institute

http://www.arnes.si http://www.ijs.si/ http://www.sling.si/

SLIDE 2

but also

  • prof. dr. Andrej Filipčič, IJS, UNG
  • prof. dr. Borut P. Kerševan, Uni Lj, IJS
  • Dejan Lesjak, IJS
  • Peter Kacin, Arnes
  • Matej Žerovnik, Arnes

SLIDE 3

SLING

a small national grid initiative

SLIDE 4

SLING

  • SiGNET at Jožef Stefan Institute: EGEE, since 2004
  • Arnes and Jožef Stefan Institute: EGI, since 2010
  • full EGI membership, no EGI Edge
  • 3 years of ELIXIR collaboration
  • becoming a consortium: PRACE, EUDAT
  • Tasks: core services, integration, site support, user support etc.

SLIDE 5

SLING Consortium

Bringing everyone in ...

SLIDE 6

Collaboration

CERN, Belle2, Pierre Auger ...

SLIDE 7

SLING

Current Centres

Arctur, Arnes, atos@ijs, CIPKeBiP, NSC@ijs, SiGNET@ijs, UNG, krn@ijs, ARSO, CI, FE

  • 7 centres
  • over 22,000 cores
  • over 4 PB storage
  • over 6 million jobs/y
  • HPC, GPGPU, VM

SLIDE 8

Arnes: demo, testing, common

  • national VOs (generic, domain), ATLAS
  • registered with EGI
  • 2 locations
  • NorduGrid ARC
  • SLURM, no CREAM-CE (arc.conf sketch below)
  • LHCONE, GÉANT

CLUSTER DATA SHEET

  • 4500 cores altogether: majority HPC-enabled
  • 3 CUDA GPGPU units
  • ~6 TB RAM
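
Since SLURM sits directly behind the ARC CE here, the wiring amounts to a few arc.conf lines. A minimal sketch, assuming an ARC 5 style configuration; the hostname and queue name are placeholders, not the actual Arnes values:

    # /etc/arc.conf (fragment) -- ARC CE submitting straight to SLURM
    [common]
    hostname="arc01.example.arnes.si"
    lrms="slurm"                      # batch backend; no CREAM-CE in the chain

    [queue/grid]
    name="grid"                       # SLURM partition published to the grid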

SLIDE 9

“New” space

196 m², in-row cooling (18/77 racks)

SLIDE 10

SiGNET: HPC/Atlas at Jožef Stefan

  • since 2004
  • ATLAS, Belle2
  • ARC, gLite with SLURM
  • LHCONE AT-NL-DK, GÉANT (both 10 Gbit/s)
  • 3 x dCache servers: 132 GB mem, 10 Gb/s, 2 x 60 x 6 TB
  • 3 x cache NFS à 50 TB

CLUSTER DATA SHEET

  • 5280 cores
  • 64-core AMD Opteron nodes: 256 GB RAM, 1 TB disk, 1 Gb/s
  • schrooted RTEs → Singularity HPC (RTE sketch below)
  • very recent Gentoo
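
The schroot-to-Singularity move rides on ARC runtime environment (RTE) scripts, which ARC sources with a stage argument: 0 while processing the job description on the CE, 1 just before and 2 just after the payload on the worker node. A hypothetical sketch; the RTE name and image path are illustrative, not SiGNET's actual values:

    #!/bin/bash
    # Hypothetical RTE (e.g. ENV/SINGULARITY): point the job at a container
    # image instead of a schrooted tree.
    case "$1" in
      0) ;;                                         # job-description stage on the CE
      1) export SING_IMAGE=/net/images/wn-sl6.img   # placeholder image path
         export PATH=/opt/singularity/bin:$PATH ;;  # runs just before the payload
      2) ;;                                         # cleanup after the payload
    esac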

SLIDE 11

SiGNET: more

  • additional dCache:
    – 2 servers à 400 TB
    – Belle: independent dCache, 2 x 200 TB (mostly waiting for the move)
  • services:
    – 1 squid for Frontier + CVMFS (client config sketch below)
    – 1 production ARC-CE
    – 3 cache servers, also data transfer servers for ARC
    – all supporting servers in VMs (CREAM-CE, site BDII, APEL, test ARC-CE)
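
On the worker nodes, pointing CVMFS at that squid is two lines of standard client configuration; a sketch with the proxy hostname and repository list as placeholders:

    # /etc/cvmfs/default.local -- clients fetch through the site squid,
    # the same box that acts as Frontier proxy for ATLAS conditions data
    CVMFS_REPOSITORIES=atlas.cern.ch,belle.cern.ch
    CVMFS_HTTP_PROXY="http://squid.example.ijs.si:3128"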

SLIDE 12

LHCONE and GÉANT

  • LHCONE: 30 Gbit/s (20 IJS)
  • GÉANT: 40 Gbit/s

SLIDE 13

NSC@ijs: institute / common

  • same VOs + IJS
  • not registered with EGI
  • under full load ...
  • lots of spare room
  • NorduGrid ARC
  • SLURM (GPU GRES sketch below)
  • LHCONE, GÉANT

CLUSTER DATA SHEET

  • 1980 cores altogether: all HPC-enabled
  • 16 CUDA GPGPU units (Nvidia K40)
  • ~1 TB RAM
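
Exposing the K40s to SLURM jobs is generic-resource (GRES) configuration; a minimal sketch with illustrative node names and counts:

    # slurm.conf (fragment) -- declare GPUs as a schedulable resource
    GresTypes=gpu
    NodeName=nsc-gpu[01-04] Gres=gpu:k40:4 CPUs=16 RealMemory=64000

    # gres.conf on each GPU node -- map the resource to device files
    Name=gpu Type=k40 File=/dev/nvidia[0-3]

    # a job then asks for, e.g., one K40:
    sbatch --gres=gpu:k40:1 job.sh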

SLIDE 14

Other

  • progeria
  • reactor process simulations
  • enzyme activation

SLIDE 15

Supported Users 2015

  • high energy physics
  • computer science
  • astrophysics
  • computational chemistry
  • mathematics
  • bioinformatics, genetics
  • material science
  • language technologies
  • multimedia

SLIDE 16

Supported Users 2017

  • Machine Learning, Deep Learning and Monte Carlo over many fields, often on GPGPU
  • computer science (with above)
  • genetics (Java ⇾ R), bioinformatics
  • computational chemistry (also GPGPU)
  • high energy physics, astrophysics
  • mathematics, language technologies
  • material science, multimedia

SLIDE 17

Main Differences

  • University Curriculum (CS) involvement
  • Critical usage (genetics)
  • More complex software deployments
  • Ministry interest and support

SLIDE 18

Modus Operandi @ SLING

  • ARC Client used extensively: scripts + ARC Runner etc. (session sketch below)
  • Many single users with complicated setups: GPGPU etc.
  • Some groups with critical tasks: medical, research, industrial
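
A typical session with the standard ARC client tools, sketched below; the CE alias and RTE name are placeholders:

    # job.xrsl -- minimal xRSL job description
    & (executable = "run.sh")
      (jobname = "ml-train")
      (count = 8)                                   # cores
      (memory = 4000)                               # MB
      (walltime = "12 hours")
      (runtimeenvironment = "APPS/BASE/SINGULARITY")
      (stdout = "out.log")
      (stderr = "err.log")

    # submit, poll, retrieve
    arcsub -c arc01.example.arnes.si job.xrsl
    arcstat -a
    arcget -a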

SLIDE 19

Technical Plans / Wishes

  • Joint national Puppet
  • RTEs + Singularity: national CVMFS (also user RW pools)
  • Joint Monitoring: Icinga + Grafana
  • Advanced Web Job Status Tool: GridMonitor++
  • ARC Client improvements

SLIDE 20

RTEs + Singularity

portable images & HW support, repositories, Docker compatibility, GPGPU integration ... (sketch below)

More in the following days
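
As a taste of the Docker compatibility and GPGPU integration above, a one-line sketch (image and script names are illustrative):

    # run a Docker Hub image unmodified; --nv maps in the host NVIDIA driver
    singularity exec --nv docker://tensorflow/tensorflow:latest-gpu \
        python train.py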

SLIDE 21

Joint Monitoring Web Status

  • Currently separate similar solutions – and no access for users
  • A national (or wider) solution wanted
  • Web Status tool for users on a similar level + more info!!

SLIDE 22

Web Job Status Tool

  • RTE/Singularity info (in InfoSys too)
  • HW Details, specifically RAM and GPGPU consumption
  • Queue Length and Scheduling Info
  • Stats for User's Jobs

SLIDE 23

ARC CE Wishlist

  • GPGPU info in accounting and InfoSys
  • ARC CE load balancing + HA ~ failover mode
  • testing environment / setup

SLIDE 24

Questions?

  • Andrej Filipčič, IJS, UNG
  • Borut Paul Kerševan, IJS, FMF
  • Barbara Krašovec, IJS
  • Dejan Lesjak, IJS
  • Janez Srakar, IJS
  • Jan Jona Javoršek, IJS
  • Matej Žerovnik, Arnes
  • Peter Kacin, Arnes

info@sling.si http://www.sling.si/

SLIDE 25

ARC Client Improvements

  • More bug fixes and error docs... (THANKS!)
  • Python/ACT
  • a Wish List:
    – Stand-Alone, Docker/Singularity
    – GPGPU/CPU type selectors
    – MacOS client (old and sad) (workaround done)