Computing on FPGA
S. F. Schifano
University of Ferrara and INFN-Ferrara
Advanced Workshop on Modern FPGA Based Technology for Scientific Computing
May 14, 2019, ICTP, Trieste, Italy
S. F. Schifano (Univ. and INFN of Ferrara) Computing on FPGA May 14, 2019 1 / 35

Outline
1 Introduction
2 Spin Glass Models
3 The Janus Project
4 Spin Glass Implementation on Janus
5 Spin Glass Simulations on commodity processors

Background: Let me introduce myself
Development of computing systems optimized for computational physics:
- APEmille and apeNEXT: LQCD machines; FPGAs used to interface APE with standard commodity CPUs
- AMchip: pattern-matching processor installed at CDF; FPGAs control the configuration of the system
- Janus I+II: FPGA-based systems for spin-glass simulations
- QPACE: Cell-based machine, mainly for LQCD applications; network processor on FPGA
- AuroraScience: multi-core based machine; network processor on FPGA
- EuroEXA: hybrid ARM+FPGA exascale system; accelerator on FPGA

APEmille and apeNEXT (2000 and 2004)
$a \times b + c$, with $a, b, c \in \mathbb{C}$

Janus I (2007)
- 256 FPGAs
- 16 boards
- 8 host PCs
- Monte Carlo simulations of spin-glass systems

QPACE Machine (2008)
- Processor: IBM PowerXCell8i, an enhanced version of the PS3 processor
- 8 backplanes per rack
- 256 nodes (2048 cores)
- 16 root-cards
- 8 cold-plates
- 26 Tflops peak double-precision
- 35 kW maximum power consumption
- 773 MFLOPS / Watt
- TOP-GREEN 500 in Nov. '09 and July '10

Aurora Machine (2008)

Janus II (2012)

Spin-Glass
A spin glass is a statistical model used to study some behaviours of complex macroscopic systems, such as disordered magnetic materials.
It is an apparently trivial generalization of the ferromagnet model.

Spin-Glass Models
- Ising Model: $E(\{S\}) = -J \sum_{\langle ij \rangle} s_i \cdot s_j$, with $J > 0$ and $s_i, s_j \in \{-1, +1\}$
- Edwards-Anderson Model (Binary): $E(\{S\}) = \sum_{\langle ij \rangle} J_{ij} \cdot s_i \cdot s_j$, with $J_{ij}, s_i, s_j \in \{-1, +1\}$
- Edwards-Anderson Model (Gaussian): $E(\{S\}) = \sum_{\langle ij \rangle} J_{ij} \cdot s_i \cdot s_j$, with $J_{ij} \in \mathbb{R}$ and $s_i, s_j \in \{-1, +1\}$
- Heisenberg Model: $E(\{S\}) = \sum_{\langle ij \rangle} J_{ij} \cdot \vec{s}_i \cdot \vec{s}_j$, with $J_{ij} \in \mathbb{R}$ and $\vec{s}_i, \vec{s}_j \in \mathbb{R}^3$

The Edwards-Anderson (EA) Model
- The system variables are spins ($\pm 1$), arranged in a D-dimensional (usually D=3) lattice of size $L$.
- Each spin $s_i$ interacts only with its nearest neighbours.
- Each pair of neighbouring spins $(s_i, s_j)$ shares a coupling term $J_{ij}$.
- The energy of a configuration $\{S\}$ is computed as: $E(\{S\}) = \sum_{\langle ij \rangle} J_{ij} s_i s_j$
- Each configuration $\{S\}$ has a probability given by the Boltzmann factor: $P(\{S\}) \propto e^{-E(\{S\}) / kT}$
- Averages of macroscopic observables (e.g. the magnetization) are defined as: $\langle M \rangle = \sum_{\{S\}} M(\{S\}) \, P(\{S\})$, where $M(\{S\}) = \sum_i s_i$
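
The definitions on this slide can be checked by brute force on a very small system. The sketch below is an illustration only, not code from the talk: it assumes a 3 × 3 two-dimensional lattice with periodic boundaries (D=2 is chosen just to keep the number of configurations at 512), binary couplings, k = 1, and hypothetical helper names. It uses the energy formula exactly as written on the slide.

```python
import itertools
import math
import random

L = 3          # assumed tiny lattice side (2D, only to keep 2**(L*L) small)
N = L * L      # number of spins


def build_bonds(rng):
    """Return (i, j, J_ij) for every nearest-neighbour pair, periodic boundaries."""
    bonds = []
    for x in range(L):
        for y in range(L):
            i = x * L + y
            next_x = ((x + 1) % L) * L + y      # neighbour in the +x direction
            next_y = x * L + (y + 1) % L        # neighbour in the +y direction
            bonds.append((i, next_x, rng.choice((-1, 1))))
            bonds.append((i, next_y, rng.choice((-1, 1))))
    return bonds


def energy(spins, bonds):
    """E({S}) = sum over bonds of J_ij * s_i * s_j (as written on the slide)."""
    return sum(J * spins[i] * spins[j] for i, j, J in bonds)


def magnetization(spins):
    """M({S}) = sum over sites of s_i."""
    return sum(spins)


def exact_average_magnetization(bonds, temperature, k=1.0):
    """Enumerate all 2**N configurations and weight them by the Boltzmann factor."""
    weights, mags = [], []
    for spins in itertools.product((-1, 1), repeat=N):
        weights.append(math.exp(-energy(spins, bonds) / (k * temperature)))
        mags.append(magnetization(spins))
    z = sum(weights)                             # normalization constant
    probs = [w / z for w in weights]
    return sum(m * p for m, p in zip(mags, probs)), probs


bonds = build_bonds(random.Random(0))
avg_m, probs = exact_average_magnetization(bonds, temperature=1.0)
print(f"<M> = {avg_m:.3e} over {len(probs)} configurations")
```

Because flipping every spin leaves the energy unchanged and reverses the sign of M, the exact average magnetization vanishes up to rounding error. Full enumeration of this kind is only feasible for a handful of spins.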

Spin Glass Monte Carlo Algorithms
A lattice of size $L$ has $2^{L^3}$ different configurations (e.g. $L = 80 \Rightarrow 2^{80^3}$):
- it is practically impossible to generate all configurations
- not all configurations have the same probability, so they are not equally important.
Monte Carlo algorithms, like Metropolis and Heatbath, are adopted:
- configurations are generated according to their probability
- averages of observables are computed as unweighted sums over the Monte Carlo generated configurations: $\langle M \rangle \sim \sum_i M(\{S_i^{MC}\})$
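
To give a feeling for the size of the configuration space, the following small sketch counts the decimal digits of $2^{L^3}$ without building the number itself. The function name is hypothetical and the example is an illustration only.

```python
import math


def config_count_digits(L, dim=3):
    """Number of decimal digits of 2**(L**dim), the count of spin configurations."""
    n_spins = L ** dim
    return math.floor(n_spins * math.log10(2)) + 1


for L in (2, 4, 80):
    print(f"L = {L:3d}: 2^(L^3) has {config_count_digits(L)} decimal digits")
```

For L = 80 the count has more than 150 000 decimal digits, which is why sampling replaces enumeration.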

Metropolis Algorithm for EA
Require: set of {S} and {J}
loop                                              // loop on Monte Carlo steps
    for all s_i ∈ {S} do
        s'_i = (s_i == 1) ? −1 : 1                // tentatively flip the value of s_i
        ΔE = Σ_⟨ij⟩ [ (J_ij · s'_i · s_j) − (J_ij · s_i · s_j) ]    // compute energy change
        if ΔE ≤ 0 then
            s_i = s'_i                            // accept new value of s_i
        else
            ρ = rnd()                             // compute a random number 0 ≤ ρ ≤ 1, ρ ∈ Q
            if ρ < e^(−β ΔE) then                 // β = 1/T, T = temperature
                s_i = s'_i                        // accept new value of s_i
            end if
        end if
    end for
end loop
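
The pseudocode above can be turned into a small runnable sketch. This is an illustration under stated assumptions, not the code used on Janus: it assumes a 3D lattice with periodic boundaries and L ≥ 3, binary couplings stored per direction, Python's `random` module as the random source, and hypothetical helper names. The energy change is computed literally as in the pseudocode, with the same sign convention.

```python
import math
import random

DIRECTIONS = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def site(x, y, z, L):
    """Linear index of lattice site (x, y, z) with periodic boundaries."""
    return ((x % L) * L + (y % L)) * L + (z % L)


def make_system(L, rng):
    """Random spins and binary couplings; couplings[d][i] links i to its +d neighbour."""
    n = L ** 3
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    couplings = [[rng.choice((-1, 1)) for _ in range(n)] for _ in DIRECTIONS]
    return spins, couplings


def neighbour_sum(spins, couplings, x, y, z, L):
    """Sum of J_ij * s_j over the six nearest neighbours j of site (x, y, z)."""
    i = site(x, y, z, L)
    total = 0
    for d, (dx, dy, dz) in enumerate(DIRECTIONS):
        fwd = site(x + dx, y + dy, z + dz, L)
        bwd = site(x - dx, y - dy, z - dz, L)
        total += couplings[d][i] * spins[fwd] + couplings[d][bwd] * spins[bwd]
    return total


def total_energy(spins, couplings, L):
    """Sum of J_ij * s_i * s_j over all bonds (sign convention of the pseudocode)."""
    e = 0
    for x in range(L):
        for y in range(L):
            for z in range(L):
                i = site(x, y, z, L)
                for d, (dx, dy, dz) in enumerate(DIRECTIONS):
                    j = site(x + dx, y + dy, z + dz, L)
                    e += couplings[d][i] * spins[i] * spins[j]
    return e


def metropolis_sweep(spins, couplings, L, beta, rng):
    """One Monte Carlo step: try to flip every spin once, in lattice order."""
    for x in range(L):
        for y in range(L):
            for z in range(L):
                i = site(x, y, z, L)
                s_old = spins[i]
                s_new = -1 if s_old == 1 else 1          # tentative flip
                h = neighbour_sum(spins, couplings, x, y, z, L)
                delta_e = s_new * h - s_old * h           # energy change
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i] = s_new                      # accept the new value


rng = random.Random(1)
L = 4
spins, couplings = make_system(L, rng)
for _ in range(10):
    metropolis_sweep(spins, couplings, L, beta=1.0, rng=rng)
print("energy after 10 sweeps:", total_energy(spins, couplings, L))
```

Each proposed flip only needs the six neighbouring spins and couplings, which is what makes the update cheap to compute locally.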

Spin Glass Simulation is Computationally Challenging
$E(\{S\}) = -\sum_{\langle ij \rangle} J_{ij} s_i s_j$, with $s_i, s_j \in \{+1, -1\}$ and $J_{ij} \in \{+1, -1\}$
Frustration effects make:
- the energy function landscape corrugated
- the approach to thermal equilibrium a slowly converging process.

Spin Glass Simulation is Computationally Challenging
To bring a lattice of size $L = 48 \dots 128$ to thermal equilibrium, the typical steps of a state-of-the-art simulation campaign are:
- simulation of hundreds (thousands) of systems, called samples, with different initial values of spins and couplings,
- for each sample, the simulation is repeated 2-4 times with different initial spin values (coupling values kept fixed); these are called replicas.
Each simulation may require $10^{12} \dots 10^{13}$ Monte Carlo update steps.
$80^3 \times 10\ \text{ns} \times 10^{11}$ MC-steps $\approx 16$ years
Exploiting parallelism is necessary.
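
The 16-year estimate can be reproduced with a few lines of arithmetic. The sketch below only restates the numbers on the slide (80³ spins, 10 ns per spin update, 10¹¹ Monte Carlo steps); the function name and the 365.25-day year are assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed length of a year


def wall_clock_years(L, seconds_per_spin_update, mc_steps):
    """Time to update L**3 spins mc_steps times, one spin at a time."""
    return L ** 3 * seconds_per_spin_update * mc_steps / SECONDS_PER_YEAR


years = wall_clock_years(L=80, seconds_per_spin_update=10e-9, mc_steps=1e11)
print(f"estimated run time: {years:.1f} years")
```

The estimate scales linearly with the number of Monte Carlo steps, so longer runs on a single sequential updater become proportionally worse.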

The Janus System
Architecture:
- a cluster of 16 boards
- each board is a 2D toroidal grid of 4 × 4 FPGA-based Simulation Processors (SP)
- data links among nearest neighbours on the grid
- one Control Processor (CP) on each board
JANUS is a project carried out by BIFI, the Universities of Madrid, Extremadura, Rome and Ferrara, and by Eurotech.

The Janus I System

The Janus II System: Architecture

The Janus II System: SP
- Xilinx Virtex-7 XC7VX485T FPGA
  ◮ 485000 logic cells
  ◮ ∼ 32 Mbit embedded memory
- two banks of DDR-3 memory of 8 Gbyte

The Janus II System: CP
- Computer-on-Module (COM) system
- Intel Core i7 processor running at 2.2 GHz
- running a standard Linux OS
- one input-output FPGA connected to the PCIe bus, used to:
  ◮ configure the FPGAs of the SPs
  ◮ manage all input-output operations
  ◮ monitor code execution

Single-Spin Update Algorithm
1. Flip the value of the spin: $S'_i = \bar{S}_i = -S_i$
2. Compute the variation of energy $\Delta E_i = E'_i - E_i$:
   $E_i = -S_i \sum_{\langle j \rangle} J_{ij} S_j$
   $E'_i = -\bar{S}_i \sum_{\langle j \rangle} J_{ij} S_j = S_i \sum_{\langle j \rangle} J_{ij} S_j$
   $\Delta E_i = E'_i - E_i = -E_i - E_i = -2 E_i$
3. If $\Delta E_i < 0$, accept the new value of the spin, $S'_i = \bar{S}_i$
4. If $\Delta E_i \ge 0$:
   1. compute a random number $\rho$ ($\rho \in [0 \dots 1]$)
   2. if $\rho < e^{-\beta \Delta E_i}$, accept the new value of the spin
   3. if $\rho \ge e^{-\beta \Delta E_i}$, reject the new value of the spin
where $\beta = 1/T$ and $T$ is the value of the temperature.
With six nearest neighbours and binary couplings, the energy $E_i$ associated with site $i$ takes all even integer values in the range $[-6, 6]$, and correspondingly: $\Delta E_i \in \{-12, -8, -4, 0, +4, +8, +12\}$.
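
Since $\Delta E_i$ can only take seven values, the exponential does not have to be evaluated at every update: the acceptance thresholds can be tabulated once per temperature. The sketch below illustrates that idea in software. It is an assumption about how such a table could be organised, not a description of the Janus firmware, and the function names are hypothetical.

```python
import math
import random

DELTA_E_VALUES = (-12, -8, -4, 0, 4, 8, 12)


def build_acceptance_table(beta):
    """Acceptance probability for each possible energy change, computed once."""
    # For delta_e <= 0 the move is always accepted; avoid evaluating exp there.
    return {de: 1.0 if de <= 0 else math.exp(-beta * de) for de in DELTA_E_VALUES}


def update_spin(s_i, neighbour_terms, table, rng):
    """Return the new value of spin s_i.

    neighbour_terms holds the six products J_ij * S_j of the nearest neighbours.
    """
    e_i = -s_i * sum(neighbour_terms)     # E_i = -S_i * sum_j J_ij S_j
    delta_e = -2 * e_i                    # energy change if the spin is flipped
    if delta_e < 0 or rng.random() < table[delta_e]:
        return -s_i                       # accept the flipped value
    return s_i                            # reject, keep the old value


table = build_acceptance_table(beta=0.5)
rng = random.Random(0)
print(update_spin(+1, [-1, -1, -1, -1, -1, -1], table, rng))   # delta_e = -12, always flips
```

In the last line the spin has energy +6, so flipping lowers the energy by 12 and the move is accepted without drawing a random number.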

Random Wheel Generator Engine
The Parisi-Rapuano generator is a popular choice for spin-glass simulations:
    WHEEL[K] = WHEEL[K-24] + WHEEL[K-55]
    ρ = WHEEL[K] ⊕ WHEEL[K-61]
- WHEEL is a circular array of 64 32-bit unsigned-integer random values
- ρ is the generated pseudo-random number
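
A software sketch of the recurrence above follows. The two update rules and the 64-entry wheel of 32-bit values come from the slide; the seeding scheme (filling the wheel from Python's `random` module), the class and method names, and the conversion to a float are assumptions made only for illustration.

```python
import random

MASK32 = 0xFFFFFFFF      # keep sums within 32 bits, as for unsigned integers
WHEEL_SIZE = 64


class ParisiRapuano:
    """Shift-register generator following the two rules quoted on the slide."""

    def __init__(self, seed=12345):
        seeder = random.Random(seed)             # assumed seeding scheme
        self.wheel = [seeder.getrandbits(32) for _ in range(WHEEL_SIZE)]
        self.k = 0

    def next_u32(self):
        k, w = self.k, self.wheel
        # WHEEL[K] = WHEEL[K-24] + WHEEL[K-55], indices taken modulo the wheel size
        w[k] = (w[(k - 24) % WHEEL_SIZE] + w[(k - 55) % WHEEL_SIZE]) & MASK32
        # rho = WHEEL[K] xor WHEEL[K-61]
        rho = w[k] ^ w[(k - 61) % WHEEL_SIZE]
        self.k = (k + 1) % WHEEL_SIZE
        return rho

    def next_float(self):
        """Map the 32-bit output to the interval [0, 1)."""
        return self.next_u32() / 2 ** 32


gen = ParisiRapuano(seed=2019)
print([gen.next_u32() for _ in range(3)])
```

The largest lag is 61, so a circular buffer of 64 entries is enough to hold every value the recurrence still needs.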

Single-Spin Update Engine
Integer numbers are expensive in terms of resources.
Spins and couplings are mapped into bit-valued ({0,1}) variables:
    $S_i \to \sigma_i = (1 + S_i)/2$
    $J_{ij} \to \gamma_{ij} = (1 + J_{ij})/2$
Then the contribution to the energy at site $i$ from site $j$, $\zeta_{ij} = S_i J_{ij} S_j$, can be computed as
    $\zeta'_{ij} = 2(\sigma_i \oplus \gamma_{ij} \oplus \sigma_j) - 1$

 S_i  J_ij  S_j  ζ_ij | σ_i  γ_ij  σ_j  ζ'_ij
  -1   -1    -1   -1  |  0    0     0    -1
  -1   -1     1    1  |  0    0     1     1
  -1    1    -1    1  |  0    1     0     1
  -1    1     1   -1  |  0    1     1    -1
   1   -1    -1    1  |  1    0     0     1
   1   -1     1   -1  |  1    0     1    -1
   1    1    -1   -1  |  1    1     0    -1
   1    1     1    1  |  1    1     1     1
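
The equivalence between the integer product and the XOR form can be verified exhaustively over the eight input combinations. The sketch below does this in software; the helper names are hypothetical.

```python
import itertools


def to_bit(v):
    """Map a +/-1 variable to a bit: -1 -> 0, +1 -> 1."""
    return (1 + v) // 2


def zeta_from_bits(sigma_i, gamma_ij, sigma_j):
    """zeta' = 2 * (sigma_i xor gamma_ij xor sigma_j) - 1."""
    return 2 * (sigma_i ^ gamma_ij ^ sigma_j) - 1


rows = []
for s_i, j_ij, s_j in itertools.product((-1, 1), repeat=3):
    zeta = s_i * j_ij * s_j                                        # integer product
    zeta_prime = zeta_from_bits(to_bit(s_i), to_bit(j_ij), to_bit(s_j))
    rows.append((s_i, j_ij, s_j, zeta, zeta_prime))

for row in rows:
    print(row)
print("all rows agree:", all(z == zp for _, _, _, z, zp in rows))
```

The loop visits the combinations in the same order as the table, so the printed rows can be compared against it line by line.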
