Porting the LHCb Stack from x86 (Intel) to aarch64 (ARM) - CHEP 2018, Sofia



SLIDE 1

Porting the LHCb Stack from x86 (Intel) to aarch64 (ARM)

CHEP 2018, Sofia

Laura Promberger (1,2), Marco Clemencic (1), Ben Couturier (1), Aritz Brosa Iartza (1,3), Niko Neufeld (1)

on behalf of the LHCb collaboration

July 12, 2018

(1) CERN  (2) Hochschule Karlsruhe - Technik und Wirtschaft  (3) Universidad de Oviedo (ES)

SLIDE 2

Motivation - The Upgrade In 2021

                        Currently (Run 2)   Upgrade (Run 3)
Data acquisition rate   50 GB/s             4 TB/s
Data recording rate     0.7 GB/s            2-10 GB/s

For the upgrade

  • Software needs major refactoring and usage of new technology
  • New HLT farm

Goal

  • Add cross-platform support to the LHCb stack

→ More flexibility in the tender for the new HLT farm

Biggest problem: Vectorization

SLIDE 3

The LHCb Stack

  • 5 million lines of code (experiment-specific projects)
  • Multiple, large projects

For this work

  • Old version of the LHCb stack (Oct 2017)
  • Not multi-threaded

[Figure: Structure of the stack - external dependencies (LCG), experiment-independent, experiment-specific]

SLIDE 4

Vectorization

                                   Vcl                      Vc
Intel AVX2                         Yes                      Yes
Intel AVX512                       Yes                      In development
PowerPC Altivec                    No                       No
ARM NEON                           No                       In development
Vectorization style                Wrapper for intrinsics   High-level, targets horizontal vectorization
Extensibility for new intrinsics   Medium (no unit tests)   Complex

→ Vcl allows 'fast' implementation of other platforms

SLIDE 5

Port to aarch64 (ARM)

LCG requires

  • Changing compile flags
    • e.g. replace -max-page-size=0x1000 by -common-page-size=0x1000
  • Changing versions of the external dependencies
  • Disabling unnecessary packages (e.g. Oracle, R)

Other projects

  • Changing compile flags
  • Replacement of Vc by:
    • Vcl
    • Scalar code

SLIDE 6

Port to aarch64 - Problems

Default signedness of char

  • Intel uses signed char
  • ARM uses unsigned char

→ Use -fsigned-char to change the default to signed char

    // Jenkins one-at-a-time hash function
    static unsigned int hash32( const char* key )
    {
        unsigned int hash = 0;
        for ( const char* k = key; *k; ++k ) {
            hash += *k;
            hash += ( hash << 10 );
            hash ^= ( hash >> 6 );
        }
        hash += ( hash << 3 );
        hash ^= ( hash >> 11 );
        hash += ( hash << 15 );
        return hash;
    }

SLIDE 7

Port to aarch64 - Problems II

Cast double to unsigned int

  • Intel assembly uses vcvttsd2si
  • ARM assembly uses fcvtzu

    if ( m_xInverted == true ) {
        strip = (unsigned int) floor( ( ( m_uMaxLocalu ) / m_pitch ) + 0.5 );
    }

Problem

    float x = -3.3;
    unsigned int y = (unsigned int) x;

Solution

    float x = -3.3;
    uint32_t y = static_cast<uint32_t>( static_cast<int>( x ) );

SLIDE 8

Performance - The machines

                          ThunderX2   E5-2630 v4   Power8+          Power9
Architecture              ARM         Intel        PowerPC          PowerPC
Platform                  aarch64     x86_64       ppc64le          ppc64le
Compiler                  GCC 7.2     GCC 6.2      GCC 7.3          GCC 7.3
Number of logical cores   224         40           128              176
Threads per core          4           2            8                4
Cores per socket          28          10           8                22
Sockets/NUMA nodes        2           2            2                2
RAM (GB)                  256         64           256              128
Largest intrinsic set     NEON        AVX2         Altivec          Altivec
CPU performance           top-notch   high-tier    cost-efficient   mid-tier

SLIDE 9

Performance - Scalability of the LHCb Stack

[Figure: Total events per second vs. number of processes (25-200) for ThunderX2 (GCC 7.2, CentOS), E5-2630 v4 (GCC 6.2, CentOS), POWER8+ (GCC 7.3, CentOS), and POWER9 (GCC 7.3, RHEL)]

SLIDE 10

Scalability II - Cost-Performance Estimations

SLIDE 11

Outlook

  • Long-term goal: Adding cross-platform support to the Run 3 LHCb stack

Requires a fully functioning cross-platform vectorization library

  • Finding a cross-platform vectorization library
  • ROOT plans to use VecCore, which has both UMESIMD and Vc as back ends

→ LHCb is evaluating a switch to VecCore instead of Vc and Vcl

  • New vectorization intrinsic set for ARM: SVE
  • First official CPU release date: Fujitsu, 2021

→ Too late for LHCb Run 3

SLIDE 12

Summary

  • Cross-platform support of the LHCb stack for aarch64 and ppc64le
  • Biggest problem: Vectorization
  • "Hackish" workarounds of Vc just for this study
  • Cost-performance estimation
  • To be considered: pricing, code not multi-threaded, less vectorization on aarch64
  • ARM and Intel quite close

→ Competitive tender for real evaluation necessary

SLIDE 13

Questions?

SLIDE 14

Vectorization

                                   Vcl                      Vc                                             UMESIMD
Intel AVX2                         Yes                      Yes                                            Yes
Intel AVX512                       Yes                      In development                                 Yes
PowerPC Altivec                    No                       No                                             Early example
ARM NEON                           No                       In development                                 Early example
Vectorization style                Wrapper for intrinsics   High-level, targets horizontal vectorization   Wrapper for intrinsics
Extensibility for new intrinsics   Medium (no unit tests)   Complex                                        Easy (unit tests available)

SLIDE 15

Performance - Scalability of the LHCb Stack normalized

[Figure: Total events per second vs. fraction of used logical cores (0.0-1.2) for ThunderX2 (GCC 7.2, CentOS), E5-2630 v4 (GCC 6.2, CentOS), POWER8+ (GCC 7.3, CentOS), and POWER9 (GCC 7.3, RHEL)]
