SLIDE 1
SLIDE 2 Spin glass simulations on Janus
Dipartimento di Fisica, Università di Ferrara
raffaele.tripiccione@unife.it
UCHPC, Rodos (Greece) Aug. 27th, 2012
SLIDE 3
I'm an outsider here ---> a physicist's view of an application-specific architecture A flavor of physics-motivated, performance-paranoid, (hopefully) unconventional computer architecture However, a few points of contact with mainstream CS may still exist ...
Warning / Disclaimer / Fine print
SLIDE 4
WHAT?: spin-glass simulations in short
WHY?: computational challenges
HOW?: the JANUS systems
DID IT WORK?: measured and expected performance (and comparison with "conventional" systems)
Take-away lessons / Conclusions
On the menu today
SLIDE 5 Our computational problem
Bring a spin-glass (*) system of e.g. 48³ grid points to thermal equilibrium:
- a challenge never attempted so far --->
- follow the system for 10¹² – 10¹³ Monte Carlo (*) steps
- on ~100 independent system instances
Back-of-envelope estimate: 1 high-end CPU for 10,000 years (which is not the same as 10,000 CPUs for 1 year ...)
(*) to be defined in the next slides
SLIDE 6 Statistical mechanics in brief ....
Statistical mechanics tries to describe the macroscopic behaviour of matter in terms of average values of the microscopic structure.
A (hopefully familiar) example: explain why magnets have a transition temperature beyond which they lose their magnetic state.
SLIDE 7
The Ising model .....
The tiny little magnets are named spins; they take just two values. A "configuration" is a specific value assignment for all spins in the system. The "macro" behavior is dictated by the energy function at the "micro" level: each spin interacts only with its nearest neighbours in a discrete D-dim mesh:

U({S}) = −∑_{⟨ij⟩} J S_i S_j,   J > 0

Statistical physics bridges the gap from micro to macro ....
SLIDE 8
The spin-glass model .....
Spin-glasses are a generalization of Ising systems. They are the reference theoretical model of glassy behavior
Interesting per se A model of complexity Interesting for industrial applications
An apparently trivial change in the energy functions makes spin-glasses much more complex than Ising systems Studying these systems is a computational nightmare ...
SLIDE 9 Why are Spin Glasses so hard??
A very simple change in the energy function (defined on e.g. a discrete 3-D lattice) hides tremendously complex dynamics, due to the extremely irregular energy landscape in the configuration space (frustration):

U = −∑_{⟨ij⟩ ∈ NB} J_ij σ_i σ_j,   σ_i ∈ {1, −1},   J_ij ∈ {1, −1}
SLIDE 10 Monte Carlo algorithms
These beasts are best studied numerically by Monte Carlo algorithms. Monte Carlo algorithms navigate the configuration space in such a way that:
- ---> any configuration will show up according to its probability to be realized in the real world (at a given temperature)
MC algorithms come in several versions ...
... most versions have remarkably similar requirements in terms of their algorithmic structure.
SLIDE 11 The Metropolis algorithm
An endless loop .....
Pick up one (or several) spin(s)
Compute the energy U
Flip it/them
Compute the new energy U'
Compute ΔU = U' − U
If ΔU ≤ 0, accept the change unconditionally
else accept the change only with probability e^{−ΔU/kT}
Pick up new spin(s) and do it again
SLIDE 12
... just a few C lines
SLIDE 13 Monte Carlo algorithms
Common features:
- bit-manipulation operations on spins (+ LUT access)
- (good-quality / long-period) random numbers
- a huge degree of available parallelism
- regular program flow (orderly loops on the grid sites)
- regular, predictable memory access pattern
- information exchange (processor <-> memory) is huge, however the size of the data base is tiny
---> many small (not so small) cores, hardwired control, on-chip memory
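One standard way the bit-manipulation feature is exploited in spin-glass codes (sketched here with our own naming, not taken from the Janus sources) is multispin coding: bit k of a 64-bit word holds the spin of replica k, so a single bitwise operation acts on 64 replicas at once. XOR of a spin word with a neighbour word flags the replicas where the two spins disagree, and counting disagreements becomes bit-sliced addition:

```c
#include <stdint.h>

/* Spins stored one replica per bit: s = +1 -> bit 1, s = -1 -> bit 0.
   a, b, c are disagreement masks (spin XOR neighbour) for three bonds. */

/* Bit-sliced sum of three masks: the per-replica count (0..3) is
   returned in two bit planes, *lsb (low bit) and *msb (carry bit). */
static void bitsliced_add3(uint64_t a, uint64_t b, uint64_t c,
                           uint64_t *lsb, uint64_t *msb)
{
    uint64_t t = a ^ b;
    *lsb = t ^ c;               /* low bit of a + b + c */
    *msb = (a & b) | (t & c);   /* carry (high) bit     */
}
```

Chaining two such adders covers all six neighbours of a 3-D site; a coupling sign J_ij = ±1 is folded in by one more XOR with a coupling bit mask.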
SLIDE 14
Compute intensive, you mean??
One Monte Carlo step is roughly the (real) time in which a (real) system flips one of its spins, roughly 1 picosecond. If you want to understand what happens in just the first seconds of a real experiment, you need O(10¹²) time steps on ~100 replicas of a 100³ system ---> 10²⁰ updates. Clever programming on standard CPUs: 1 ns/spin-update ---> 3000 years.
SLIDE 15
Compute intensive, you mean??
The dynamics is dramatically slow (see picture). So even a simulated box whose size is a small multiple of the correlation length will give accurate physics results. Good news: we're in business even if we simulate a very small box .... However ....
SLIDE 16 Strong scaling vs weak scaling
Amdahl's law (strong scaling) vs... … Gustafson's law (weak scaling) In our case … enlarging system-size is meaningless, as we do not yet have the resources to study a “small” system ----> the ultimate quest for strong scaling ....
S_A = ((1−p) + p) / ((1−p) + p/N) = 1 / ((1−p) + p/N)

S_G = ((1−p) + Np) / ((1−p) + p) = (1−p) + Np
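The two laws are easy to compare numerically; the functions below are direct transcriptions, with p the parallel fraction and n the number of processors:

```c
/* Amdahl's law (strong scaling): fixed-size problem on n processors */
static double speedup_amdahl(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

/* Gustafson's law (weak scaling): problem size grows with n */
static double speedup_gustafson(double p, double n)
{
    return (1.0 - p) + n * p;
}
```

For example, with p = 0.99 and n = 1024, Amdahl caps the speedup near 91, while Gustafson still reports about 1014: this is why a strong-scaling problem like ours is so unforgiving.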
SLIDE 17
An attempt at developing, building and operating an application-driven compute engine for Monte Carlo simulations of spin-glass systems. A collaboration of:
- Universities of Rome (La Sapienza) and Ferrara
- Universities of Madrid, Zaragoza, Badajoz
- BIFI (Zaragoza)
- Eurotech
Partially supported by Microsoft, Xilinx
The JANUS project
SLIDE 18
The nature of the available parallelism
Spin-glass simulations have two levels of available parallelism:
1) Embarrassingly trivial: need statistics on several replicas ---> farm it out to independent processors
2) Trivially identified: the sweep order for the Monte Carlo update is not specified ---> can update in parallel any set of non-mutually-interacting spins; make it a black-white checkerboard: it opens the way to tens of thousands of independent threads ...
1) & 2) do not commute
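The checkerboard idea in 2) can be sketched in a few lines (illustrative code with our own helper names): on a cubic lattice, sites of equal coordinate parity do not interact, so within one colour every site update is independent and may run on a separate thread.

```c
#define L 8                      /* illustrative lattice side */

static int updates;              /* counts site visits; stands in for the
                                    real Metropolis update of one spin   */
static void update_site(int x, int y, int z)
{
    (void)x; (void)y; (void)z;
    updates++;
}

/* One full sweep as two half-sweeps: "black" sites, then "white" ones.
   All sites of one colour could be updated concurrently. */
static void checkerboard_sweep(void)
{
    for (int colour = 0; colour < 2; colour++)
        for (int x = 0; x < L; x++)
            for (int y = 0; y < L; y++)
                for (int z = 0; z < L; z++)
                    if (((x + y + z) & 1) == colour)
                        update_site(x, y, z);
}
```

Each half-sweep touches L³/2 mutually non-interacting sites, which is exactly the pool of independent threads the slide refers to.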
SLIDE 19
The ideal spin glass machine .....
A further question: what is the appropriate system scale at which this parallelism is best exploited?
One update engine:
- computes the local contribution to U
- addresses a probability table
- compares with a freshly generated random number
- assigns the new spin value
U = −∑_{⟨ij⟩ ∈ NB} J_ij σ_i σ_j
SLIDE 20 The ideal spin glass machine .....
All this is just a bunch (~1000) of gates
And in spite of that, a typical CPU core, with O(10⁷) or more gates, can process perhaps 4 spins at each clock cycle. If you can arrange your stock of gates the way it best suits the algorithm, you can easily expect ~1000 update engines on one chip.
The best structure is a massively-many-core organization ( or perhaps an application-driven GPU??)
SLIDE 21
The ideal spin glass machine .....
... is an orderly structure (a 2-D grid) of a large number of "update engines":
- each update engine handles a subset of the physical mesh
- its architectural structure is extremely simple
- each data path processes one bit at a time
- memory addressing is regular and predictable
- SIMD processing is OK
- however, memory bandwidth requirements are huge (need ~7 bits read to process one bit ...)
- however, memory can be "local to the processor"
Simple hardware structure ---> FPGAs are OK!
SLIDE 22
The JANUS machine
A parallel system of (themselves) massively parallel processor chips. The basic hardware element:
- a 2-D grid of 4 x 4 (FPGA-based) processors (SPs)
- data links among nearest neighbours on the grid
- one control processor on each board (IOP) with 2 Gbit Ethernet links to the host system
SLIDE 23
JANUS: a picture gallery
SLIDE 24 Our “large” machine
256 (16 x 16) processors, 8 host PCs ---> ~90 TIPS for spin-glass simulation. A typical simulation wall-clock time on this nice little machine goes down to a more manageable ~100 days.
SLIDE 25 The 2008 implementation (Xilinx Virtex4-LX200):
- 1024 update cores on each processor, pipelineable to one spin update per clock cycle ---> 88% of available logic resources
- system clock at 62.5 MHz ---> 16 ps average spin update time
- using a bandwidth of ~12000 read bits + 1000 written bits per clock cycle ---> 47% of available on-chip memory
JANUS as a spin-glass engine
SLIDE 26
Let's use "conventional" units first. The data path of each Processing Element (PE) performs 11 + 2 sustained pipelined ops per clock cycle (62.5 MHz). We have 1024 PEs ----> ~830 GIPS. However, 11 of those ops are on very short data words; more honestly: 7 ... 8 sustained "conventional" pipelined ops per clock cycle. We have 1024 PEs ----> ~300 GIPS ---> 10 GIPS/W, sustained by ~1 Tbyte/s combined memory bandwidth.
(Measured) Performances
SLIDE 27 Physicists like a different figure of merit ----> the spin-flip rate R, typically measured in picoseconds per flip. For each processor in the system, and for one complete element of the JANUS core (16 procs):
(Measured) Performances
R = 1/(N f) = 1/(1024 × 62.5 MHz) ≃ 16 ps/flip

R = 1/(N_p N f) = 1/(16 × 1024 × 62.5 MHz) ≃ 1 ps/flip
... as fast as Nature ...
SLIDE 28
Physics results
SLIDE 29 Spin-glass addicts like to quote the average spin-update time
Performance figures (2008-2009)
                      SUT       GUT
Janus module          16 ps     1 ps
PC (Intel Core Duo)   3000 ps   700 ps
IBM CBE (all cores)

3x – 7x !!
SLIDE 30
In the last couple of years, multi/many core processors and GPUs have entered the arena....
Performance figures (2010-2011)
Still 1x – 2x !!
SLIDE 31
The 4+-year-old Janus still has an edge on state-of-the-art commercial HPC computing architectures. Reasonable to continue along the same line, surfing on technology developments. Expected performance increase?
What next??
FPGA size         2.5 – 3.0x
Clock frequency   4.0x
SUT parallel      16x
Grand total       160 – 200x    ("log₂(Grand Total)" ≈ 7.5)
SLIDE 32
Exactly the same architecture as JANUS, but ....
- Xilinx Virtex-7 FPGAs (Virtex7-485)
- 2 DDR3 memory banks on each SP
- improved local 4x4 interconnection
- tighter coupling with the HOST (on-box CPU + PCIe gen2)
Protos in fall 2012 – Physics in early 2013 ----> simulate a 128³ Ising spin glass for 2⁴² time steps
What next?? Janus 2
SLIDE 33 Looking at the crystal ball....
How long is the (predicted) opportunity window for Janus2? A graphical answer (and some speculations)
SLIDE 34
Take-away lessons
JANUS is an extremely rewarding example of (strongly application-driven) on-chip multiprocessing:
- we designed a machine around an "unconventional problem"
- no wonder the machine turned out to be "unconventional" enough
- results were rewarding ... WHY?
SLIDE 35
Take-away lessons
Results were rewarding ..... WHY??
- there is a lot of parallelism available, and it is actually exploited;
- load is automatically balanced among the update engines;
- memory access is heavy, but patterns are predictable;
- processors (and their memories) are arranged on a regular grid;
- inter-node traffic is modest and regular.
IN SHORT: our machine tried to exploit all these features at best.