Introduction to EuroEXA: Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon (PowerPoint presentation)



SLIDE 1

Enrico Calore, INFN Ferrara, Italy, enrico.calore@fe.infn.it

Introduction to EuroEXA

EuroEXA: Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon

Advanced Workshop on Modern FPGA-Based Technology for Scientific Computing

13/05/2019

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337

SLIDE 2

Exascale: What does it really mean?

  • 1 billion billion (10^18) FLOPs, or equivalent
  • €100M–€500M per system
  • 20–60 MW of power
SLIDE 3

EuroEXA response: concepts

  • Optimizing the components of existing systems is not effective
  • We need a holistic approach focusing on the entire stack:
    • Technology
    • Components
    • Architecture
    • Infrastructure
    • System software
    • Applications
SLIDE 4
EuroEXA in a nutshell

  • Energy efficiency
    • Tight integration
    • Customized ARM processing chiplet and FPGA acceleration
    • Advanced cooling
    • Reduced Joules/bit transfer (memory compression, UNIMEM)
  • Scalability
    • Mitigation of transfer cost (memory compression, UNIMEM, network geographic addressing)
    • Application co-design
    • Optimized programming environment (runtime systems, libraries, etc.)
    • Distributed storage on BeeGFS
  • Resilience
    • Whole-system resiliency cost–benefit analysis
    • ARM microarchitecture extensions
    • System software extensions

SLIDE 5

EuroEXA partnership

(Logos of academic/governmental partners, commercial partners, and supporters)

SLIDE 6

What is EuroEXA?

(Diagram: an alliance of multiple projects, IP, and partners, plus other partners and additional IP, formed through an open competitive exercise organised by the EU, with EU funding and monitoring)

SLIDE 7

System architecture and technology

  • ARM Processing and FPGA DataFlow
  • UNIMEM Architecture with PGAS
  • Distributed Storage on BeeGFS
  • Memory Compression Technologies
  • Unique Hybrid Geographically-Addressed, Switching and Topology Interconnect

SLIDE 8

System architecture and technology: Compute node

Technology from FORTH:

  • 12cm x 13cm
  • 4 ARM Processors and 4 FPGA Accelerators
  • M.2 SSD
  • 4 x SO-DIMMs + onboard RAM
  • Daughterboard style
  • 160Gb/s of I/O


SLIDE 9

EuroEXA node architecture: some details

(Node diagram: FPGA, UNIMEM, EXANET, ARM ISA, memory)

  • We employ FPGAs as our compute accelerator
  • We innovate around the ARM ISA hardware and software ecosystem
  • We scale up with EXANET, a low-latency HPC network
  • We support a Global Shared Address Space (GSAS) with UNIMEM

EuroEXA node architecture

SLIDE 10

Application porting and optimization

(Applications mapped onto the node architecture diagram: NEMO, SMURFF, NEST/DPSNN, InfOli, FRTM, Alya, LFRic, IFS, LBM, GADGET, AVU-GSR, Neuromarketing, Quantum Espresso, astronomy image classification)

14 applications being ported and optimized for ARM + FPGA

SLIDE 11

Co-design, demonstration and evaluation using exascale-class apps

(Radar chart positioning the applications NEMO, SMURFF, NEST/DPSNN, InfOli, FRTM, Alya, LFRic, IFS, LBM, GADGET, AVU-GSR, Neuromarketing, Quantum Espresso, and astronomy image classification along the axes IOPS, memory bandwidth, memory capacity, and FLOPS)

SLIDE 12

Co-design, demonstration and evaluation using exascale-class apps

(Same radar chart, highlighting the LBM collide kernel)

SLIDE 13

Co-design, demonstration and evaluation using exascale-class apps

(Same radar chart, highlighting the LBM propagate kernel)

SLIDE 14

Co-design, demonstration and evaluation using exascale-class apps

(Same radar chart as on SLIDE 11)

SLIDE 15

Applications

Working together with a rich mix of key HPC applications from across:

  • climate/weather (ECMWF/STFC/UoM)
  • physics/energy (INAF/INFN/Fraunhofer)
  • life-science/bioinformatics (Neurasmus/Synelixis/BSC/IMEC)
SLIDE 16

System software

  • OS and system SW adaptations
    • Device drivers, hyperconverged storage
  • Programming runtime extensions
    • MPI, task-based distributed-memory programming, FPGA programming
  • Resource allocation optimization
SLIDE 17

How to program FPGAs?

Traditional low-level approaches (e.g., using Hardware Description Languages) are difficult to embrace for scientific HPC applications. HPC scientific software has to adapt to specific characteristics:

  • Software lifetime may be very long, even tens of years.
  • Software must be portable across current and future HPC hardware architectures, which are very heterogeneous.
  • Software has to be strongly optimized to exploit the available hardware for better performance.

SLIDE 18

Directive-based approach

  • Code modifications could be minimal thanks to the annotation of pre-existing C code using #pragma directives.
  • Programming effort is needed mainly to re-organize the data structures and to efficiently design data movements.
  • If porting is needed, the programming effort would not be lost:
    – Other directive-based languages would also benefit from the data re-organization and the efficiently designed data movements.
    – Switching between directive-based languages should be just a matter of changing #pragma directives.

SLIDE 19

For CPUs and GPUs

  • OpenMP

Widely used for CPU multi-threading (lately also supporting GPUs/accelerators)

  • OpenACC

Introduced for GPUs/accelerators

SLIDE 20

For FPGAs...

High-Level Synthesis (HLS) approaches (a.k.a. algorithmic-based design) are getting popular thanks to the accelerated design time and time to market they offer.

SLIDE 21

One model for all: OmpSs

The OmpSs Programming Model

A programming model developed at BSC that extends OpenMP with new directives supporting asynchronous parallelism and heterogeneity, including devices such as GPUs and FPGAs. For Xilinx FPGAs, it relies on the Vivado HLS compiler.