

SLIDE 1

Tier 2 Computer Centres

www.hpc-uk.ac.uk

CSD3

Cambridge Service for Data Driven Discovery

SLIDE 2

Tier 2 Computer Centres

A community resource… founded on cooperation and collaboration

Each centre will give a short introduction covering (some of):

  • USP
  • Contact Details
  • Hardware
  • Access Mechanisms
  • RSE Support

Open Access Call – 12th Oct (Technical Assessment – 21st Sep)

https://www.epsrc.ac.uk/funding/calls/tier2openaccess/

SLIDE 3

Andy Turner, EPCC a.turner@epcc.ed.ac.uk

SLIDE 4

280-node HPE (SGI) ICE XA:

  • 10,080 cores (2 × 18-core Xeon per node)
  • 128 GiB memory per node
  • DDN Lustre file system
  • Single-rail FDR InfiniBand hypercube

1.9 PiB Tier-2 Data Facility:

  • DDN Web Object Scaler (WOS) appliances
  • Link to other Tier-1/2 facilities


Simple access routes

  • Free Instant Access for testing
  • (Driving Test access coming soon)
  • EPSRC RAP: Open Access Call

http://www.cirrus.ac.uk

SLIDE 5

Cirrus RSE Support

User Support

  • Freely available to all users from any institution
  • Provided by EPCC experts in a wide range of areas
  • Easily accessed through the helpdesk: just ask for the help you need
  • Help provided directly to researchers, or to RSEs working with researchers

Technical Projects

  • Explore new technologies, software, tools
  • Add new capabilities to Cirrus
  • Benchmark and profile commonly used applications
  • Work with the user community and other RSEs

Keen to work with RSEs at other institutions to help them support local users on Cirrus

SLIDE 6

http://gw4.ac.uk/isambard James Price, University of Bristol j.price@bristol.ac.uk

SLIDE 7

The System

  • Exploring Arm processor technology (see the sketch after this list)
  • Provided by Cray
  • 10,000+ ARMv8 cores
  • Cray software tools
      • Compiler, math libraries, tools...
  • Technology comparison:
      • x86, Xeon Phi (KNL), NVIDIA P100 GPUs
  • Sonexion 3000 SSU (~450 TB)
  • Phase 1 installed March 2017
  • The Arm part arrives early 2018
  • Early access nodes from September 2017
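A quick way to confirm which side of the technology comparison a given node sits on is to query the architecture at runtime; the sketch below does this in Python (purely illustrative, not an Isambard-specific tool).

```python
# Minimal architecture check: platform.machine() reports 'aarch64' on
# ARMv8 nodes like Isambard's Arm partition and 'x86_64' on the Xeon /
# Xeon Phi comparison systems.
import platform

arch = platform.machine()
if arch == "aarch64":
    print("ARMv8 (AArch64) node")
elif arch == "x86_64":
    print("x86-64 node")
else:
    print(f"Other architecture: {arch}")
```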
SLIDE 8

User Support

  • 4 x 0.5 FTEs from GW4 consortium
  • Cray/Arm centre of excellence
  • Training (porting/optimising for Arm)
  • Hackathons

Target codes

  • Will focus on the main codes from ARCHER
  • Already running on Arm:
      • VASP
      • CP2K
      • GROMACS
      • Unified Model (UM)
      • OpenFOAM
      • CloverLeaf
      • TeaLeaf
      • SNAP
  • Many more codes ported by the wider Arm HPC user community

Access

  • 25% of the machine time will be available to users from the EPSRC community
  • EPSRC RAP: Open Access Call
SLIDE 9

HPC Midlands Plus

www.hpc-midlands-plus.ac.uk

Prof. Steven Kenny, Loughborough University s.d.kenny@lboro.ac.uk

SLIDE 10

Centre Facilities

  • System supplied by ClusterVision/Huawei
  • x86 system: 14,336 x86 cores
      • 512 nodes, each with 2 × Intel Xeon E5-2680v4 CPUs (14 cores per CPU)
      • 128 GB RAM per node
  • 3:1 blocking EDR InfiniBand network, giving 756-core non-blocking islands (see the sketch after this list)
  • 1 PB GPFS filestore
  • 15% of the system made available via EPSRC RAP and seedcorn time
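The headline numbers follow directly from the node counts; a minimal sketch of the arithmetic (all values taken from the list above, with the island size inferred from 756 cores ÷ 28 cores/node):

```python
# Back-of-envelope check of the published HPC Midlands Plus figures.
nodes = 512
cores_per_node = 2 * 14                # 2 CPUs x 14 cores each
total_cores = nodes * cores_per_node   # 14,336 cores in total
island_nodes = 756 // cores_per_node   # 27 nodes per non-blocking island
print(total_cores, island_nodes)       # 14336 27
```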

SLIDE 11

Centre Facilities

  • OpenPOWER system
  • 5 × (2 × 10)-core 2.86 GHz POWER8 systems, each with 1 TB RAM, connected to the InfiniBand network
      • one with 2 × P100 GPGPUs
  • Dedicated 10 TB SSD GPFS filestore for pre-staging files
  • Aim of the system is threefold:
      • Data analysis of large datasets
      • Test bed for codes that are memory-bandwidth limited
      • On-the-fly data processing
  • Comprehensive software stack installed: www.hpc-midlands-plus.ac.uk/software-list
  • 4 FTE RSE support for academics at consortium universities
SLIDE 12

Dr Paul Richmond, EPSRC Research Software Engineering Fellow http://www.jade.ac.uk

SLIDE 13

The JADE System

  • 22 × NVIDIA DGX-1 (headline figures reconstructed in the sketch after this list)
      • 3.740 PFLOPs (FP16)
      • 2.816 TB HBM GPU memory
  • 1 PB filestore
  • P100 GPUs, optimised for deep learning
      • NVLink between devices
      • PCIe to host (dense nodes)
  • Use cases:
      • 50% ML (deep learning)
      • 30% MD
      • 20% other
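The headline figures follow from the per-GPU specifications; a rough sketch (the 21.2 TFLOPs FP16 value is NVIDIA's published peak for the SXM2 P100, not a number from this slide):

```python
# Approximate reconstruction of the JADE headline numbers.
dgx1_nodes = 22
gpus_per_node = 8                 # each DGX-1 carries 8 P100s
fp16_tflops_per_gpu = 21.2        # NVIDIA P100 (SXM2) FP16 peak
hbm_gb_per_gpu = 16

print(dgx1_nodes * gpus_per_node * fp16_tflops_per_gpu / 1000)  # ~3.73 PFLOPs
print(dgx1_nodes * gpus_per_node * hbm_gb_per_gpu / 1000)       # 2.816 TB HBM
```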
SLIDE 14

Hosting and Access

  • ATOS have been selected as the provider, following the procurement committee's review of tenders
  • Running costs to be recouped through selling time to industrial users
  • Hosted at STFC Daresbury
  • Will run the SLURM scheduler, scheduling at the node level (see the sketch after this list)
  • Resource allocation:
      • Open to all without charge
      • Some priority to supporting institutions
      • Light-touch review process (similar to DiRAC)
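Scheduling at the node level means a job requests whole DGX-1 nodes rather than individual GPUs; a minimal sketch of such a submission (the job script contents and training script are hypothetical placeholders, not documented JADE settings):

```python
# Write a whole-node SLURM batch script and submit it with sbatch.
import subprocess

job_script = """#!/bin/bash
#SBATCH --job-name=dl-train
#SBATCH --nodes=1          # node-level scheduling: one whole DGX-1
#SBATCH --time=01:00:00

srun python train.py       # train.py is a hypothetical workload
"""

with open("job.sh", "w") as f:
    f.write(job_script)

subprocess.run(["sbatch", "job.sh"], check=True)
```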
SLIDE 15

Governance and RSE Support

  • All CIs have committed RSE support time for their local institutions
      • To support local users of the JADE system
  • Training: some commitment to training offered by some CIs (EPCC, Paul Richmond EPSRC RSE Fellow)
  • Organisation Committee: RSE representative from each institution
  • Software support and requests via GitHub issue tracker
  • Governance via steering committee
      • Responsible for open calls

http://docs.jade.ac.uk

SLIDE 16

Tier 2 Hub in Materials and Molecular Modelling (MMM Hub) – “Thomas”

www.thomasyoungcentre.org

SLIDE 17

Rationale for a Tier 2 Hub in MMM

  • Rapid growth in UK MMM research has created an unprecedented need for HPC, particularly for medium-sized, high-throughput simulations
  • These were predominantly run on ARCHER (30% VASP); Tier 3 resources were too constrained
  • The installation of “Thomas” aimed to rebalance the ecosystem for the MMM community
  • It has created a UK-wide Hub for MMM that serves the entire UK MMM community
  • The Hub will build a community to foster collaborative research and the cross-fertilisation of ideas
  • Support and software engineering training are offered

SLIDE 18

“Thomas” Cluster

  • 17,280 cores; 720 nodes; 24 cores/node; 128 GB RAM/node
  • Intel OPA interconnect: 1:1 within 36-node blocks, 3:1 between blocks
  • Thomas scratch (428 TB), plus home and software storage, served by OSS pairs

[Diagram: Thomas service architecture]

Performance

  • Technical performance:
      • 523.404 TFLOP/s
      • 5.5 GiB/s I/O bandwidth
SLIDE 19

Access and Sharing

  • Access models/mechanisms:
      • 75% of machine cycles are available to the university partners funding Thomas’ hosting and operations costs
      • Funding partners: Imperial, King’s, QMUL, UCL, Belfast, Kent, Oxford, Southampton
      • 25% of cycles are available to the wider UK MMM community
  • Allocations to non-partner researchers and groups across the UK will be handled via the existing consortia (MCC & UKCP), not the T2 RAC
  • Tier 2 – Tier 1 integration via SAFE will be developed over the coming year


SLIDE 20
Thomas Support Team

  • Coordinator (Karen Stoneham) based at the TYC
  • UCL RITS Research Computing Team support (×9)
  • Online training & contact details
  • User group oversees the service at regular meetings
  • ‘Points of Contact’ at each partner institution managing allocations and account approval

SLIDE 21

CSD3

Cambridge Service for Data Driven Discovery

www.csd3.cam.ac.uk
Mike Payne, University of Cambridge resources@csd3.cam.ac.uk

SLIDE 22

CSD3

Cambridge Service for Data Driven Discovery

USPs

  • Co-locate ‘big compute’ and ‘big data’
  • Facilitate complex computational tasks/workflows

Hardware

  • 12,288 cores (2 × 16-core Intel Skylake, 384 GB per node)
  • 12,288 cores (2 × 16-core Intel Skylake, 192 GB per node)
  • 342 × Intel Knights Landing, 96 GB
  • Intel Omni-Path
  • 90 × Intel Xeon with 4 × NVIDIA P100 (16 GB), 96 GB per node
  • EDR InfiniBand
  • 50-node Hadoop cluster (see the sketch after this list)
  • Hierarchical storage (burst buffers/SSDs/etc.)
  • 5 PB disk + 10 PB tape
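As a sketch of the ‘big data’ side, a word count on the Hadoop cluster might look like the following (assuming a Spark-on-HDFS deployment, which this slide does not confirm; the paths are hypothetical):

```python
# Minimal PySpark job: count words in a file stored on HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
counts = (spark.sparkContext.textFile("hdfs:///data/input.txt")
          .flatMap(lambda line: line.split())   # split lines into words
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))     # sum counts per word
counts.saveAsTextFile("hdfs:///data/word_counts")
spark.stop()
```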


SLIDE 23

Access Mechanisms

  • Pump priming/proof of concept
  • EPSRC Open Access
  • EPSRC grants (other Research Councils?)
  • Cash (for academic/industrial/commercial users)

resources@csd3.cam.ac.uk

Aspirations

It is our intention that, over the lifetime of the CSD3 service, an increasing proportion of the computational workload will consist of more complex computational tasks that exploit multiple capabilities of the system. You, as RSEs, together with the innovative researchers you know, are uniquely placed to develop new computational methodologies. The CSD3 system is available to you for developing and testing your methodology and for demonstrating its capability.
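One concrete pattern for such multi-capability workloads is chaining a compute stage to a data-analysis stage through scheduler dependencies; a minimal sketch follows (assuming a SLURM-style scheduler; the script names are hypothetical, and CSD3's actual configuration may differ):

```python
# Chain two batch jobs so the analysis runs only if the simulation succeeds.
import subprocess

def submit(script, after_ok=None):
    """Submit a batch script; --parsable makes sbatch print just the job ID."""
    cmd = ["sbatch", "--parsable"]
    if after_ok:
        cmd.append(f"--dependency=afterok:{after_ok}")
    cmd.append(script)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()

sim_id = submit("simulate.sh")            # 'big compute' stage
submit("analyse.sh", after_ok=sim_id)     # 'big data' stage, runs afterwards
```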

RSE Support

  • Led by Filippo Spiga
  • 3 FTEs (plus additional support in some of our partner institutions)
  • Collaborative/cooperative support model
