CMS Patatrack project

A. Bocci¹, V. Innocente¹, M. Kortelainen², F. Pantaleo¹, M. Rovere¹
¹CERN, ²FNAL
2019 Joint HSF/OSG/WLCG Workshop, March 19, 2019
FERMILAB-SLIDES-19-010-CD

This manuscript has been authored by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics.

The Patatrack group
• Patatrack was formed by people with common interests and a varied pool of expertise
  − Software optimisation
  − Heterogeneous architectures
  − Track reconstruction
  − High Level Trigger
• Work started in 2016 with the participation in the EuroHack 2016 event, sponsored by NVIDIA
• It continued from 2017 to 2019 with self-organized hackathons at CERN, a collaboration with Openlab, training and working with students, and so on
Matti Kortelainen (FNAL), CMS Patatrack project, HOW2019, 2019–03–19

The Patatrack demonstrator
• The goal is to demonstrate that part of the HLT reconstruction can be efficiently offloaded
  − Running on a single machine equipped with GPUs
• Focus on a slice of about 10 % of the HLT time consumption
  − Pixel local reconstruction
  − Pixel-only track reconstruction
  − Vertex reconstruction
• Other groups have started to work on
  − Calorimeter local reconstruction
  − Full track reconstruction
• For more details see the related talks at
  − ACAT 2019, 10–15 March, Saas-Fee (Switzerland)
  − CTD/WIT 2019, 2–5 April, Valencia (Spain)

The Patatrack demonstrator workflow
• Copy the pixel raw data to the GPU
• Pixel local reconstruction
  − Decode the raw data
  − Clustering
  − Calibrations
• Pixel-only tracking
  − Form hit doublets
  − Form hit quadruplets with a Cellular Automaton algorithm
• Optionally
  − Full track fit (Riemann, Broken-line fits)
• Some GPU algorithms are the same as the (legacy) CPU ones, others are different
  − The implementations are currently different
  − Bitwise or statistically identical physics performance
• Organized as a chain of 3 GPU producer modules
  − Pass GPU data from one producer to the next
  − Use the CMSSW “external worker” mechanism
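
The doublet-to-quadruplet step can be pictured with a minimal host-side sketch. It assumes a simplified geometry of four pixel layers, hits given as (layer, r, z) points, and a hypothetical straightness cut on the r–z slope; the names `Hit`, `Cell`, `makeDoublets` and `buildQuadruplets` are illustrative only. It builds the same cell-neighbour graph that a Cellular Automaton works on, but replaces the CA's iterative evolution with plain nested-loop chaining, so it is not the Patatrack GPU implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hit: layer index (0..3) and position in the r-z plane.
struct Hit {
  int layer;
  float r;
  float z;
};

// A "cell" is a doublet of hits on adjacent layers.
struct Cell {
  std::size_t inner;  // index of the inner hit
  std::size_t outer;  // index of the outer hit
};

// Form doublets between every pair of hits on adjacent layers.
std::vector<Cell> makeDoublets(const std::vector<Hit>& hits) {
  std::vector<Cell> cells;
  for (std::size_t i = 0; i < hits.size(); ++i)
    for (std::size_t j = 0; j < hits.size(); ++j)
      if (hits[j].layer == hits[i].layer + 1) cells.push_back({i, j});
  return cells;
}

// Two cells are neighbours if the outer hit of the first is the inner hit of
// the second and their r-z slopes agree within maxDelta (hypothetical cut).
bool compatible(const std::vector<Hit>& h, const Cell& a, const Cell& b, float maxDelta) {
  if (a.outer != b.inner) return false;
  const Hit& p0 = h[a.inner];
  const Hit& p1 = h[a.outer];
  const Hit& p2 = h[b.outer];
  const float slopeA = (p1.z - p0.z) / (p1.r - p0.r);
  const float slopeB = (p2.z - p1.z) / (p2.r - p1.r);
  return std::fabs(slopeA - slopeB) < maxDelta;
}

// Chain three compatible cells (layers 0-1, 1-2, 2-3) into quadruplets of hit indices.
std::vector<std::array<std::size_t, 4>> buildQuadruplets(const std::vector<Hit>& hits, float maxDelta) {
  assert(maxDelta > 0.f);
  const std::vector<Cell> cells = makeDoublets(hits);
  std::vector<std::array<std::size_t, 4>> quads;
  for (const Cell& c0 : cells) {
    if (hits[c0.inner].layer != 0) continue;  // start from the innermost layer
    for (const Cell& c1 : cells) {
      if (!compatible(hits, c0, c1, maxDelta)) continue;
      for (const Cell& c2 : cells) {
        if (!compatible(hits, c1, c2, maxDelta)) continue;
        quads.push_back({c0.inner, c0.outer, c1.outer, c2.outer});
      }
    }
  }
  return quads;
}
```

With four hits on a straight line plus one outlier on the last layer, only the aligned combination survives the slope cut.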

The Patatrack demonstrator (2018): timing performance on 2018 data
• 2018 data: average pileup 50
• HLT-like configuration, optimised for maximal throughput
• One Tesla V100 is 5×–7× faster than one Xeon Gold 6130
Figure: Comparison between CPU and GPU timing.
Source: CMS Collaboration, “Patatrack Demonstrator: Pixel Tracks”, November 8, 2018

CPU utilization
• Caveat: different machine (i7-4771, GeForce RTX 2080)
  − 8 threads and 8 concurrent events
• After the initialization
  − CPU utilization is roughly 50 %
  − There are roughly 4–5 external workers scheduled in parallel
• NB: this workflow is “artificially” tuned to minimize the CPU utilization
Figure: Number of running modules and number of scheduled external workers vs. time (ms).

GPU utilization
• Screenshot of the NVIDIA Visual Profiler for a random 10 ms period
• Kernels and data transfers are run in parallel

Lessons learned: design principles
• For optimal performance, follow a Data Oriented Design
  − Memory operations are costly, computations are almost free
  − Design the data structure for maximal efficiency (SoA vs. ... vs. AoS)
  − Implement the algorithms around the data structure
  − Avoid object-oriented patterns in critical code, e.g. data formats
    ⋆ inheritance, virtual functions, etc.
• Most (all?) GPU operations (memory copies, running “kernels”, etc.) should be asynchronous
  − The “kernels” run on the GPU while the CPU is doing other work
  − The GPU can transfer data to and from the host while both the CPU and the GPU are working
• Memory transfers, and especially data format conversions, between CPU and GPUs are costly
  − In some cases, almost as costly as running the original algorithm itself
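
The SoA vs. AoS trade-off can be shown with a minimal C++ sketch. The hit fields and the names `HitAoS`, `HitsSoA`, `toSoA` and `sumX` are hypothetical, and `std::vector` is used for brevity where a GPU data format would more likely live in one device buffer. In the structure of arrays each field is contiguous, so a loop (or a kernel whose neighbouring threads read neighbouring elements) that touches only `x` reads contiguous memory instead of striding over unused fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (AoS): one object per hit, fields interleaved in memory.
struct HitAoS {
  float x, y, z;
  int detId;
};

// Structure of arrays (SoA): one contiguous array per field.
struct HitsSoA {
  std::vector<float> x, y, z;
  std::vector<int> detId;

  std::size_t size() const {
    assert(y.size() == x.size() && z.size() == x.size() && detId.size() == x.size());
    return x.size();
  }

  void reserve(std::size_t n) {
    x.reserve(n);
    y.reserve(n);
    z.reserve(n);
    detId.reserve(n);
  }

  void push_back(float xi, float yi, float zi, int id) {
    x.push_back(xi);
    y.push_back(yi);
    z.push_back(zi);
    detId.push_back(id);
  }
};

// A data format conversion: itself a full pass over memory, which is why
// such conversions are costly and best kept out of the critical path.
HitsSoA toSoA(const std::vector<HitAoS>& hits) {
  HitsSoA soa;
  soa.reserve(hits.size());
  for (const auto& h : hits) soa.push_back(h.x, h.y, h.z, h.detId);
  return soa;
}

// AoS: reading x strides over y, z and detId of every hit.
float sumX(const std::vector<HitAoS>& hits) {
  float s = 0.f;
  for (const auto& h : hits) s += h.x;
  return s;
}

// SoA: reading x walks a single contiguous array.
float sumX(const HitsSoA& hits) {
  float s = 0.f;
  for (std::size_t i = 0; i < hits.size(); ++i) s += hits.x[i];
  return s;
}
```

Both layouts give the same answer; what changes is the memory access pattern, which on a GPU decides whether loads can be coalesced.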

Lessons learned: tools and architectures
• CUDA and CMSSW support different sets of compilers and C++ features
  − CUDA 10.1 supports
    ⋆ C++14
    ⋆ GCC 8, Clang 7
      ⊲ CUDA 10.0 supported GCC 7, Clang 6
  − CMSSW 10.6.X supports
    ⋆ C++17
    ⋆ GCC 7 and GCC 8, Clang 7
    ⋆ CUDA 10.1 in the latest pre-release (was 10.0 before)
• Unfortunately, we need to keep the host and device code somewhat separate
  − Host code can use C++17 features
  − Device code (and common code) is limited to C++14 features
  − You do not want to #include framework (or ROOT) headers in device code!
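
One common way to keep “common” code usable from both the host compiler and nvcc is to restrict it to C++14, include no framework or ROOT headers, and hide the CUDA qualifiers behind a macro. In the sketch below the macro name `HOST_DEVICE` and the helper `deltaPhi` are hypothetical; `__CUDACC__` is the macro nvcc defines when compiling CUDA code, so with a plain C++ compiler the qualifiers simply disappear.

```cpp
#include <cassert>

// Expand to the CUDA qualifiers only when the file is compiled by nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// "Common" code: C++14 only, no framework or ROOT headers, so it can be
// included from both host-only and device translation units.
// Assumes both inputs are already in [-pi, pi].
HOST_DEVICE inline float deltaPhi(float phi1, float phi2) {
  constexpr float pi = 3.14159265f;
  assert(phi1 >= -pi && phi1 <= pi && phi2 >= -pi && phi2 <= pi);
  float d = phi1 - phi2;  // in [-2pi, 2pi] given the precondition
  if (d > pi)
    d -= 2.f * pi;
  else if (d <= -pi)
    d += 2.f * pi;
  return d;
}
```

Host-only code that needs C++17 features would then live in separate files that never reach nvcc.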

Lessons learned: what about CMSSW?
• Redesign dedicated data formats for use on GPUs
  − In fact, they might be more efficient also on traditional CPUs
• Design a chain of algorithms (framework modules) that work on the GPU
  − Without copying data back and forth
• Take advantage of the “external worker” approach in CMSSW
  − Launch the work on the GPU, schedule other work in parallel on the CPU
• Split GPU modules in two parts
  − The part that deals with the framework and the rest of CMSSW
  − The part that deals with the GPU data structures and kernels
• Split the GPU-related work in two (or more) modules, e.g.
  − Copy data from CPU to GPU, launch kernels
  − Copy data from GPU to CPU
    ⋆ run only if another module consumes the CPU SoA
  − Transform the CPU SoA to the CPU legacy data format
    ⋆ run only if another module consumes it
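
The two-phase “external worker” idea (launch the offloaded work, let other work proceed, synchronize only when the result is needed) can be mimicked on the host with `std::async`. This is a framework-free analogy under stated assumptions: `AsyncProducer`, `acquire`, `produce` and `offloadedSum` are illustrative names with simplified, hypothetical signatures, not the CMSSW interface, and a CPU thread stands in for the GPU.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Stand-in for the offloaded computation (a CPU thread plays the GPU here).
int offloadedSum(std::vector<int> data) {
  return std::accumulate(data.begin(), data.end(), 0);
}

class AsyncProducer {
public:
  // Phase 1: launch the work and return immediately, without blocking.
  void acquire(std::vector<int> input) {
    assert(!pending_.valid());  // this sketch keeps one event in flight
    pending_ = std::async(std::launch::async, offloadedSum, std::move(input));
  }

  // Phase 2: called later, after other work has been scheduled;
  // only here does the caller wait for the result.
  int produce() {
    assert(pending_.valid());
    return pending_.get();
  }

private:
  std::future<int> pending_;
};
```

Between `acquire` and `produce` the calling thread is free to run other modules, which is the point of the approach; in the real framework the scheduler, not the module, decides when the second phase runs.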

Model for CUDA producers
• Aim to avoid blocking synchronization as much as possible
• A helper object gives the CUDA device and stream to use for the algorithms
• Memory management
  − Raw CUDA allocations and frees should be avoided within the event loop
  − Preallocating memory buffers as module member data leads to unnecessarily high GPU memory use
  − We went for a caching allocator for device and pinned-host memory that amortizes the cost of raw CUDA allocations
    ⋆ Currently based on the caching allocator of CUB
• GPU event products are like regular EDM products, but enclosed in a wrapper that also holds the CUDA device and the CUDA stream
  − Allows the consumer to set the device, and queue more work to the same CUDA stream
  − Also allows a TBB-flowgraph streaming_node style of operation
    ⋆ A module in the middle of the chain may only queue more asynchronous work
    ⋆ A later module in the chain synchronizes (with the “external worker”)
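
The caching-allocator idea can be sketched on the host: freed blocks are kept in per-size bins and handed out again instead of being returned to the system. This is a simplified illustration in the spirit of CUB's caching allocator, not its code: `CachingAllocator` is a hypothetical class, `std::malloc`/`std::free` stand in for the raw CUDA allocation calls, request sizes are rounded up to powers of two from a hypothetical 256-byte minimum, and thread safety, per-device bins and stream-ordered reuse are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <unordered_map>
#include <vector>

class CachingAllocator {
public:
  CachingAllocator() = default;
  CachingAllocator(const CachingAllocator&) = delete;
  CachingAllocator& operator=(const CachingAllocator&) = delete;

  ~CachingAllocator() {
    for (auto& bin : cache_)
      for (void* p : bin.second) std::free(p);
    for (auto& live : live_) std::free(live.first);
  }

  void* allocate(std::size_t bytes) {
    const std::size_t size = roundUp(bytes);
    std::vector<void*>& bin = cache_[size];
    void* p = nullptr;
    if (!bin.empty()) {  // cache hit: reuse a block, no raw allocation
      p = bin.back();
      bin.pop_back();
    } else {  // cache miss: fall back to the raw allocator
      p = std::malloc(size);
      if (p == nullptr) throw std::bad_alloc();
      ++rawAllocations_;
    }
    live_[p] = size;
    return p;
  }

  // Return the block to its bin instead of freeing it.
  void deallocate(void* p) {
    auto it = live_.find(p);
    assert(it != live_.end());
    cache_[it->second].push_back(p);
    live_.erase(it);
  }

  std::size_t rawAllocations() const { return rawAllocations_; }

private:
  static std::size_t roundUp(std::size_t bytes) {
    std::size_t size = 256;  // hypothetical minimum block size
    while (size < bytes) size *= 2;
    return size;
  }

  std::map<std::size_t, std::vector<void*>> cache_;  // block size -> free blocks
  std::unordered_map<void*, std::size_t> live_;      // block in use -> block size
  std::size_t rawAllocations_ = 0;
};
```

Within the event loop, repeated requests of similar sizes then hit the cache, which is what amortizes the cost; the trade-off is that cached memory is not returned to the system until the allocator is destroyed.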

Conclusions
• We have demonstrated that GPUs are an efficient alternative to traditional CPUs
  − For complex tasks like track reconstruction
• Next steps
  − Integrate the developments into the official CMSSW
  − Continue evolving the framework to make it easier to leverage GPUs
  − Focus on code portability and on avoiding code duplication as much as possible
  − Study how more algorithms and data structures could benefit from GPUs
  − Study local vs. remote offloading to GPUs

Backup material

The Patatrack demonstrator (2018): HLT pixel tracking performance
• Similar efficiency and fake rate to the legacy CPU algorithm
• More information: CMS Detector Performance Note DP-2018/059
Figure: (a) TTbar efficiency; (b) TTbar fake rate. Track reconstruction efficiency as a function of simulated track (a), and fake rate as a function of reconstructed track (b).
Source: CMS Collaboration, “Patatrack Demonstrator: Pixel Tracks”, November 8, 2018

The Patatrack demonstrator (2018): HLT pixel tracking performance
• Proper fits improve the resolution significantly
• More information: CMS Detector Performance Note DP-2018/059
Figure: (a) pT resolution vs. pT; (b) pT resolution. Track pT resolution as a function of the simulated track pT (a) and (b).
Source: CMS Collaboration, “Patatrack Demonstrator: Pixel Tracks”, November 8, 2018
