
Updates on VecGeom: focus on SIMD performance and developments
Sandro Wenzel / CERN-PH-SFT, for the VecGeom team
Geant4 collaboration meeting, Fermilab, 31.09.2015

Primary Goals of VecGeom
- Provide a multi-track interface/API to important shape functions and geometry navigation
[Figure: vectors of particles (x1 ... x4) passed to ComputeStep for multiple tracks]
Geant4 collaboration meeting (Vector session), Fermilab, 31/09/2015, Sandro Wenzel

Primary Goals of VecGeom
- Provide a multi-track interface/API to important shape functions and geometry navigation
- Gain from CPU SIMD units when processing multiple tracks
  - for simple shapes
  - for logical volumes with few daughters
- Alternatively: gain from CPU SIMD units when processing single tracks
  - for complicated shapes
  - for logical volumes with many daughters
- Code reuse and compilation on many platforms (including GPUs)

Main components of VecGeom
- "Shapes": Box, Tube, ...
  - scalar API: double DistanceToOut(Vector3D const &p, Vector3D const &d)
  - vector API: void DistanceToOut("multi-track interface ...")
- Geometry Modeller: LogicalVolume, PlacedVolume, Transformations
- Navigation: NavigationState, Navigator
  - scalar API: double ComputeStep(Vector3D, Vector3D)
  - vector API: void ComputeStep(... "multi-track interface")
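The split between a scalar API and a multi-track API can be illustrated with a minimal sketch. The types `Vec3`, `TrackBatch` and `BoxSketch` below are hypothetical stand-ins, not the actual VecGeom classes; only the general shape of the two `DistanceToOut` signatures follows the slide. The multi-track version here simply loops over a structure-of-arrays container, which is the layout that makes SIMD processing of many tracks possible.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for illustration; not the actual VecGeom types.
struct Vec3 { double x, y, z; };

// Structure-of-arrays container: one contiguous array per coordinate,
// so a loop over tracks touches memory in a SIMD-friendly pattern.
struct TrackBatch {
  std::vector<double> px, py, pz;  // positions
  std::vector<double> dx, dy, dz;  // directions (assumed normalized)
  std::size_t size() const { return px.size(); }
};

// Axis-aligned box centered at the origin with half-lengths hx, hy, hz.
struct BoxSketch {
  double hx, hy, hz;

  // Distance along one axis from an inside point to the exit plane.
  static double AxisDistance(double p, double d, double h) {
    if (d > 0.0) return (h - p) / d;
    if (d < 0.0) return (-h - p) / d;
    return std::numeric_limits<double>::infinity();
  }

  // Scalar API: one track in, one distance out.
  double DistanceToOut(Vec3 const &p, Vec3 const &d) const {
    double tx = AxisDistance(p.x, d.x, hx);
    double ty = AxisDistance(p.y, d.y, hy);
    double tz = AxisDistance(p.z, d.z, hz);
    double t = tx < ty ? tx : ty;
    return t < tz ? t : tz;
  }

  // Multi-track API: many tracks in, results written to an output array.
  void DistanceToOut(TrackBatch const &tracks, std::vector<double> &out) const {
    assert(tracks.dx.size() == tracks.size());
    out.resize(tracks.size());
    for (std::size_t i = 0; i < tracks.size(); ++i) {
      out[i] = DistanceToOut(Vec3{tracks.px[i], tracks.py[i], tracks.pz[i]},
                             Vec3{tracks.dx[i], tracks.dy[i], tracks.dz[i]});
    }
  }
};
```

In this sketch both overloads share one kernel, loosely mirroring the idea of writing the algorithm once and instantiating it for several interfaces.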


Recap of prototype status early 2014
- provided SIMD-optimized vector interfaces and algorithms for a few elementary solids and geometry base functions (implemented important functions for particle navigation)
- can run a chain of algorithms in vector/SIMD mode (vector flow):
  1. SIMD: distFromInside mother volume
  2. pick next daughter volume
  3. SIMD: transform coordinates to daughter frame
  4. SIMD: distToOutside daughter volume
  5. SIMD: update step + boundary
- good overall performance gains for such an algorithm (in a toy detector with 4 boxes, 3 tubes, 2 cones), compared to ROOT/5.34.17:

Platform                  16 particles   1024 particles   SIMD max
Intel IvyBridge (AVX)     ~2.8x          ~4.0x            4x
Intel Haswell (AVX2)      ~3.0x          ~5.0x            4x
Intel Xeon Phi (AVX512)   ~4.1x          ~4.8x            8x

gcc 4.8; -O3 -funroll-loops -mavx; no FMA
CHEP13 paper: http://arxiv.org/pdf/1312.0816.pdf
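The chain of algorithms in the vector flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: all volumes are axis-aligned boxes, daughters are placed by translation only, and the names `Box`, `Daughter`, `StepResult` and `ComputeSteps` are hypothetical. Plain loops over the track arrays stand in for the SIMD kernels, and the daughter-distance step is taken to mean the distance from the track to the daughter volume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for illustration only; not the VecGeom classes.
struct Vec3 { double x, y, z; };

constexpr double kInf = std::numeric_limits<double>::infinity();

// Axis-aligned box centered at the origin of its own frame.
struct Box {
  Vec3 half;

  // Distance from an inside point to the box surface along d.
  double DistanceToOut(Vec3 const &p, Vec3 const &d) const {
    auto axis = [](double q, double u, double h) {
      return u > 0.0 ? (h - q) / u : (u < 0.0 ? (-h - q) / u : kInf);
    };
    return std::min({axis(p.x, d.x, half.x), axis(p.y, d.y, half.y),
                     axis(p.z, d.z, half.z)});
  }

  // Distance from an outside point to the box along d (slab method);
  // returns infinity if the ray misses the box.
  double DistanceToIn(Vec3 const &p, Vec3 const &d) const {
    const double pp[3] = {p.x, p.y, p.z};
    const double dd[3] = {d.x, d.y, d.z};
    const double hh[3] = {half.x, half.y, half.z};
    double tmin = 0.0, tmax = kInf;
    for (int k = 0; k < 3; ++k) {
      if (dd[k] == 0.0) {
        if (pp[k] < -hh[k] || pp[k] > hh[k]) return kInf;
        continue;
      }
      double t1 = (-hh[k] - pp[k]) / dd[k];
      double t2 = (hh[k] - pp[k]) / dd[k];
      tmin = std::max(tmin, std::min(t1, t2));
      tmax = std::min(tmax, std::max(t1, t2));
    }
    return tmin <= tmax ? tmin : kInf;
  }
};

struct Daughter { Box box; Vec3 origin; };    // translation-only placement
struct StepResult { double step; int hit; };  // hit == -1: mother boundary

void ComputeSteps(Box const &mother, std::vector<Daughter> const &daughters,
                  std::vector<Vec3> const &pos, std::vector<Vec3> const &dir,
                  std::vector<StepResult> &out) {
  assert(pos.size() == dir.size());
  out.resize(pos.size());
  // 1) distance to leave the mother volume, for all tracks
  for (std::size_t i = 0; i < pos.size(); ++i)
    out[i] = {mother.DistanceToOut(pos[i], dir[i]), -1};
  // 2) pick the next daughter volume
  for (std::size_t j = 0; j < daughters.size(); ++j) {
    Vec3 const &o = daughters[j].origin;
    for (std::size_t i = 0; i < pos.size(); ++i) {
      // 3) transform coordinates to the daughter frame
      Vec3 local{pos[i].x - o.x, pos[i].y - o.y, pos[i].z - o.z};
      // 4) distance to the daughter volume
      double t = daughters[j].box.DistanceToIn(local, dir[i]);
      // 5) update step + boundary
      if (t < out[i].step) out[i] = {t, static_cast<int>(j)};
    }
  }
}
```

Each inner loop runs the same operation over all tracks before moving to the next stage, which is what allows every stage to be executed in SIMD mode.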

Summary of developments after prototype
- transition of the prototype into true library development: gitlab.cern.ch/VecGeom/VecGeom
  - design work ...; integration with USolids developments, ...
- porting a considerable portion of solid code to VecGeom
  - ported/adapted existing (USolids) code into generic, templated and platform-independent code which can be instantiated for the scalar + GPU + multi-track interfaces (following the VecGeom development model)
  - see table on next slide
  - focused somewhat on getting the CMS geometry treatable with VecGeom; now possible
  - a lot of effort went into validating shape algorithms
- worked on navigator structure, geometry model, etc.; very much ongoing (active R&D)
- integration of VecGeom into the Geant-V simulation framework: more or less achieved, but more effort needed


Shape development status mid 2015
(SIMD acceleration columns on the slide: Multi-Track SIMD, Internal SIMD impr)

Shape                    VecGeom   SIMD acceleration
Box                      yes       yes
Trap + Trd               yes       yes
Tube[s]                  yes       yes
Cone[s]                  yes       (incomplete)
GenericTrap/Arb8         (yes)     (yes) (yes)
Tet                                (targeted)
Polycone                 yes       (targeted)
Polyhedron               yes       yes
Torus                    yes       yes
Parallelepiped           yes       yes
Extruded solid                     (targeted)
MultiUnion                         (targeted)
Tessellated Solid                  (targeted)
Composites               yes
Templat. Composites      (yes)     (yes)
Hype, Ellipsoid, Parab   yes       yes
Orb/Sphere               yes       yes
... the rest

"... the rest" is Eltu, Twisted[*], ScaledShape, ...

Example for multi-track SIMD performance
Performance of a hollow tube segment
[Figure: bar chart in time units (0 to 1600) for DistanceToIn, SafetyToIn and In-or-Out?, comparing ROOT, Geant4, USolids, the VecGeom scalar API and the VecGeom many-track API; annotation: excellent SIMD vector performance]
Total speedup of the VecGeom many-track API compared to USolids, in the order of the functions above: 7x, 3.3x, 13.62x
gcc 4.7; -O3 -funroll-loops -mavx; no FMA; Geant4 10.1 (Release); Root 5.34.18 (Release); benchmark with 1000 particles

Multi-particle SIMD performance on Xeon Phi
- Often achieving considerable vector performance on the Intel Xeon Phi with the multi-track interface (example for the trapezoid and simple tube)
- theoretical max vector gain is 8 for double precision (register width = 512 bits)
[Figures: trapezoid benchmark and tube benchmark, Vc vectorization, Intel(R) Xeon Phi(TM); functions shown: Inside, Contains, SafetyToIn, SafetyToOut, DistanceToIn, DistanceToOut]
benchmark performed by Sofia Vallecorsa + Guilherme Amadio (Intel IPCC)
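The theoretical maximum gain is simply the number of values that fit into one SIMD register. A short sketch of that arithmetic, assuming a 64-bit `double` and register widths of 256 bits (AVX/AVX2) and 512 bits (Xeon Phi); the helper name `LanesPerRegister` is hypothetical:

```cpp
#include <cstddef>

// Number of values of type T that fit into a SIMD register of the given
// width in bits (assumes 8 bits per byte).
template <typename T>
constexpr std::size_t LanesPerRegister(std::size_t registerBits) {
  return registerBits / (8 * sizeof(T));
}

// Assuming sizeof(double) == 8:
static_assert(LanesPerRegister<double>(256) == 4, "256-bit register: 4 doubles");
static_assert(LanesPerRegister<double>(512) == 8, "512-bit register: 8 doubles");
```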

Example for 1-track SIMD improvement: Polyhedron
[Figures: timings of USolids, VecGeom noSIMD and VecGeom SIMD for DistToIn, DistToOut and SafetyToOut; panels labelled "small test" and "HBHalf@CMS"]
- for some polyhedra, considerable overall improvement compared to the USolids implementation
- for very complex shapes, the USolids implementation might be the better choice
- demonstrated gain from internal vectorization (typically a factor of about 1.4)
- test done on an Intel Core i7 (AVX) with 1000 particles
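Internal (1-track) vectorization means that the SIMD lanes are filled with parts of one shape rather than with many tracks. A minimal sketch of the idea, assuming a convex solid described by its face planes in structure-of-arrays form; `PlaneSet` and this `DistanceToOut` are hypothetical and much simpler than a real polyhedron implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical layout for illustration; not the VecGeom polyhedron classes.
// Face planes of a convex solid, one array per coefficient. Each plane obeys
// a*x + b*y + c*z + d = 0, with (a, b, c) the outward unit normal.
struct PlaneSet {
  std::vector<double> a, b, c, d;
  std::size_t size() const { return a.size(); }
};

// Distance from an inside point (px, py, pz) along the unit direction
// (ux, uy, uz) to the surface. The loop runs over planes for ONE track,
// so the compiler (or a SIMD library) can vectorize across the faces.
double DistanceToOut(PlaneSet const &planes, double px, double py, double pz,
                     double ux, double uy, double uz) {
  assert(planes.b.size() == planes.size() && planes.c.size() == planes.size() &&
         planes.d.size() == planes.size());
  const double inf = std::numeric_limits<double>::infinity();
  double best = inf;
  for (std::size_t i = 0; i < planes.size(); ++i) {
    double ndotu = planes.a[i] * ux + planes.b[i] * uy + planes.c[i] * uz;
    // Signed distance of the point to the plane (non-positive when inside).
    double sdist = planes.a[i] * px + planes.b[i] * py + planes.c[i] * pz +
                   planes.d[i];
    // Only planes the track moves towards can be exit candidates.
    double t = ndotu > 0.0 ? -sdist / ndotu : inf;
    best = t < best ? t : best;
  }
  return best;
}
```

The gain from such a loop is bounded by the number of faces per SIMD register and by the scalar work around it, which is consistent with internal vectorization giving a smaller factor than the multi-track case.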

Global library performance evaluations

A global performance evaluation of 1-track mode
- Trying to benchmark the complete geometry modeller: shapes + navigation
- Developed an X-Ray benchmark: propagate geantinos pixel by pixel
  - not a realistic benchmark (G4 is not optimized for geantino tracing) ...
  - ... but an indication that we are globally moving in the right direction

dir   G4      ROOT    VecGeom*
y     21.5s   12.7s   5.9s
z     10.7s   6.58s   4.09s

Time to obtain the X-Ray image for the CMS calorimeter along different propagation directions (* current stable state of master branch; further improvements expected)

Scaling on the Xeon Phi
- Cannot yet compile Geant-V on the Xeon Phi
- But we can compile the VecGeom X-Ray benchmark and use it for some scaling studies
- Idea: treat different pixels in different threads (OpenMP)
- Plot shows thread speedup for x-raying the CMS calorimeter
- Demonstrating:
  - thread safety of VecGeom
  - sharing of the geometry among all threads (memory reduction)
  - perfect scaling up to the number of physical cores
preliminary, plot provided by Sofia Vallecorsa (Intel IPCC@CERN)
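The idea of treating different pixels in different threads can be sketched with an OpenMP loop over image rows. This is only an illustration under simplifying assumptions: a single sphere stands in for the detector geometry, rays run parallel to z, and the names `Sphere` and `XRayImage` are hypothetical. The real benchmark navigates geantinos through the full volume hierarchy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in geometry: one sphere of radius r at the origin.
struct Sphere {
  double r;
  // Chord length of a ray parallel to z passing through (x, y).
  double PathLengthAlongZ(double x, double y) const {
    double rho2 = x * x + y * y;
    return rho2 < r * r ? 2.0 * std::sqrt(r * r - rho2) : 0.0;
  }
};

// Fill an n x n image covering [-extent, extent]^2 with path lengths.
std::vector<double> XRayImage(Sphere const &geom, int n, double extent) {
  assert(n > 0);
  std::vector<double> image(static_cast<std::size_t>(n) * n, 0.0);
  const double pixel = 2.0 * extent / n;
  // The geometry is read-only and shared by all threads; every thread
  // writes to its own set of pixels, so no locking is needed.
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int iy = 0; iy < n; ++iy) {
    for (int ix = 0; ix < n; ++ix) {
      double x = -extent + (ix + 0.5) * pixel;  // pixel center
      double y = -extent + (iy + 0.5) * pixel;
      image[static_cast<std::size_t>(iy) * n + ix] = geom.PathLengthAlongZ(x, y);
    }
  }
  return image;
}
```

Without OpenMP enabled at compile time the loop runs serially and produces the same image, which makes it easy to compare thread counts.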

Comparing VecGeom/TGeo in Geant-V
- Spent considerable time this year to make CMS@Geant-V run with VecGeom
  - many, many debugging sessions :-)
  - more or less stable now (validated by number of steps + simple observables)
- Allows for a first realistic estimate of the overall impact on total simulation time
  - 10 p-p events at 7 TeV in CMS
  - factor ~1.6 improvement in simulation runtime when switching from ROOT to VecGeom
  - using only the scalar mode of VecGeom so far; further speedup expected in future
preliminary, plot provided by Andrei Gheata

Comparing VecGeom/TGeo in Geant-V
- VecGeom has thin "NavigationStates" (no caching of the global matrix; usage of 32-bit indices rather than 64-bit volume pointers)
- this leads to a considerable memory reduction in Geant-V track objects and in the overall simulation (which also contributes positively to the speed gain)
preliminary, plot provided by Andrei Gheata
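The memory argument can be made concrete with a small sketch that stores the path through the geometry hierarchy either as pointers or as 32-bit indices into a shared volume table. The types `Volume`, `PointerState` and `IndexState` and the depth limit `kMaxDepth` are hypothetical and only illustrate the principle; they are not the VecGeom NavigationState.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration; not the VecGeom NavigationState class.
struct Volume { int id; };

constexpr std::size_t kMaxDepth = 16;  // assumed maximum hierarchy depth

// "Fat" state: one volume pointer per level (8 bytes each on 64-bit systems).
struct PointerState {
  std::array<Volume const *, kMaxDepth> path{};
  std::uint8_t depth = 0;
};

// "Thin" state: one 32-bit index per level into a global volume table.
struct IndexState {
  std::array<std::uint32_t, kMaxDepth> path{};
  std::uint8_t depth = 0;

  void Push(std::uint32_t volumeIndex) {
    assert(depth < kMaxDepth);
    path[depth++] = volumeIndex;
  }

  // Resolve the current (deepest) volume through the shared table.
  Volume const &Top(std::vector<Volume> const &table) const {
    assert(depth > 0);
    return table[path[depth - 1]];
  }
};
```

On a typical 64-bit platform the index-based path needs about half the memory of the pointer-based one, and the saving applies to every track object that carries such a state.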

Latest developments in navigation
