 
Event Reconstruction on Modern and Future Computer Architectures
Ivan Kisel, GSI, Darmstadt
MPI, Munich, 16 November 2010

Reconstruction Challenge in the CBM Experiment
CBM experiment at FAIR/GSI
• 1000 charged particles/collision
• 10^7 collisions/second
• The silicon detector is 1 m long
• The first plane has only 5 cm diameter
• A very high track density
• A non-homogeneous magnetic field
[Figure labels: Beam, Target, Silicon Detector]
Vocabulary: Collision = Event, Trajectory = Track, Measurement = Hit
16 November 2010, MPI Munich, Ivan Kisel, GSI

HEP Research Centers

Research Center   | Accelerator (GeV)              | Experiment | Physics
SLAC, USA         | PEP-II, e- x e+ (9 x 3.1)      | BaBar      | B-Physics
Fermilab, USA     | Tevatron, p x p̄ (1000 x 1000)  | D0         | Universal
                  |                                | CDF        | Universal
BNL, USA          | RHIC, Heavy Ions               | PHENIX     | Quark-Gluon-Plasma
                  |                                | STAR       | Quark-Gluon-Plasma
KEK, Japan        | KEK-B, e- x e+ (8 x 3.5)       | BELLE      | B-Physics
CERN, Switzerland | LHC, p x p (7000 x 7000)       | ATLAS      | Universal
                  |                                | CMS        | Universal
                  |                                | ALICE      | Quark-Gluon-Plasma
                  |                                | LHCb       | B-Physics
DESY, Germany     | HERA, e+/- x p (27.5 x 920)    | ZEUS       | Proton-Physics
                  |                                | H1         | Proton-Physics
                  |                                | HERMES     | Spin-Physics
                  |                                | HERA-B     | B-Physics
FAIR/GSI, Germany | SIS 100/300, p, Heavy Ions     | PANDA      | Quark-Physics
                  |                                | CBM        | Quark-Gluon-Plasma

ALICE (CERN):
• 5000 charged particles/collision
• 2000 proton-proton collisions/second
• 300 heavy ion collisions/second
• 15 GB/second data flow (TPC only)

From Raw Data to Physics
1. Particle Accelerator
2. Particle Detectors
3. Data Acquisition
4. Data Reconstruction
5. Physics Analysis

HEP Experiments: Collider and Fixed-Target

Schematic View of a Detector Setup
[Figure labels: Silicon Detector, Magnet, Electromagnetic Calorimeter, Hadron Calorimeter, Muon Chambers]

Methods for Event Reconstruction
[Diagram: track finding, track fitting, vertex finding/fitting, ring finding; annotations: "Kalman Filter" (twice), "Combinatorics", "Time consuming!!!"]
• Global methods: all hits are treated equivalently. Typical methods:
  • Conformal Mapping
  • Histogramming
  • Hough Transformation
• Local methods: sequential selection of candidates. Typical methods:
  • Track following
  • Kalman Filter
• Neural networks: combine local and global relations. Typical methods:
  • Perceptron
  • Hopfield network
  • Cellular Automaton
  • Elastic Net

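
As an illustration of a global method, where every hit votes with equal weight, here is a minimal Hough transformation sketch. It is not taken from the talk: it assumes straight tracks in a 2D plane without magnetic field, parameterised as rho = x cos(theta) + y sin(theta), and the function name `hough_best_line` and the bin sizes are hypothetical choices.

```python
import math
from collections import Counter


def hough_best_line(hits, n_theta=180, rho_step=0.5):
    """Toy Hough transformation for straight lines (assumed geometry).

    hits: list of (x, y) measurements.
    Returns (theta in degrees, rho, votes) of the fullest accumulator cell.
    """
    votes = Counter()
    for x, y in hits:
        # Every hit votes for all lines that could pass through it.
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(i, round(rho / rho_step))] += 1
    # Hits on a common line pile up in the same (theta, rho) cell.
    (i, rho_bin), count = votes.most_common(1)[0]
    return 180.0 * i / n_theta, rho_bin * rho_step, count
```

Calling `hough_best_line` on four hits along the line x = 2 plus one unrelated hit should return a cell with four votes. The loop over all hits and all angles is what makes the method global, and it is also the source of its cost.
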
Cellular Automaton (CA) as Track Finder
Track finding: which hits in the detector belong to the same track? Method: Cellular Automaton (CA)
Cellular Automaton:
0. Hits
1. Segments: build short track segments.
2. Counters: connect according to the track model, estimate a possible position on a track.
3. Track candidates: tree structures appear, collect segments into track candidates.
4. Tracks: select the best track candidates.
[Figure: detector layers with hits and counters 1 to 4; CBM example: 0. Hits (1000 hits), 4. Tracks (1000 tracks)]
Cellular Automaton:
• local w.r.t. data
• intrinsically parallel
• extremely simple
• very fast
Perfect for many-core CPU/GPU!

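
The four CA steps listed on this slide can be sketched in a few lines. This is a toy under stated assumptions, not the CBM or HERA-B implementation: tracks are straight lines in 2D, layers are equally spaced, and the name `ca_track_finder` and the cuts `max_slope`, `max_kink` and `min_hits` are hypothetical.

```python
def ca_track_finder(layers, max_slope=1.0, max_kink=0.1, min_hits=3):
    """Toy cellular-automaton track finder (straight 2D tracks assumed).

    layers[i] is the list of hit positions x in detector layer i.
    Returns a list of tracks, each a list of x positions, one per layer crossed.
    """
    # 1. Build short track segments between hits in neighbouring layers.
    segments = []  # (layer, left hit index, right hit index, slope)
    for layer in range(len(layers) - 1):
        for a, xa in enumerate(layers[layer]):
            for b, xb in enumerate(layers[layer + 1]):
                if abs(xb - xa) <= max_slope:
                    segments.append((layer, a, b, xb - xa))

    # 2. Counters: a segment's counter is its possible position on a track,
    #    one more than the best compatible neighbour on its left.
    counter = [1] * len(segments)
    left = [[] for _ in segments]
    for j, (lj, aj, _, sj) in enumerate(segments):
        for i in range(j):  # segments are stored in layer order
            li, _, bi, si = segments[i]
            if li == lj - 1 and bi == aj and abs(si - sj) <= max_kink:
                left[j].append(i)
                counter[j] = max(counter[j], counter[i] + 1)

    # 3. Collect segments into track candidates by walking down the counters.
    def walk_back(j):
        hits = [(segments[j][0] + 1, segments[j][2])]
        while True:
            hits.append((segments[j][0], segments[j][1]))
            prev = [i for i in left[j] if counter[i] == counter[j] - 1]
            if not prev:
                return hits[::-1]
            j = min(prev, key=lambda i: abs(segments[i][3] - segments[j][3]))

    candidates = sorted((walk_back(j) for j in range(len(segments))),
                        key=len, reverse=True)

    # 4. Select the best candidates: longest first, no shared hits.
    used, tracks = set(), []
    for cand in candidates:
        if len(cand) >= min_hits and not used.intersection(cand):
            tracks.append([layers[layer][hit] for layer, hit in cand])
            used.update(cand)
    return tracks
```

Steps 1 and 2 only ever look at a segment and its direct neighbours, which is the locality that makes the method attractive for many-core hardware. This sequential sketch does not exploit that parallelism.
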
Kalman Filter (KF) based Track Fit
Track fit: estimation of the track parameters at one or more hits along the track, using the Kalman Filter (KF)
[KF block diagram: 1 Initialising, 2 Prediction, 3 Correction, applied to the hits of a π track on the detector layers; the state is (r, C), with r = track parameters and C = precision]
State vector (position, direction and momentum): r = { x, y, z, p_x, p_y, p_z }
Kalman Filter:
1. Start with an arbitrary initialization.
2. Add one hit after another.
3. Improve the state vector.
4. Get the optimal parameters after the last hit.
Nowadays the Kalman Filter is used in almost all HEP experiments
KF as a recursive least squares method

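
The initialise / predict / correct cycle can be illustrated with a deliberately reduced state. The sketch below is an assumption-laden toy, not the fit used in any experiment: the state is only (x, tx) with tx = dx/dz, the track is a straight line with no magnetic field and no material, and `kalman_line_fit` and `sigma` are hypothetical names.

```python
def kalman_line_fit(hits, sigma=0.1):
    """Toy Kalman filter fit of a straight line x(z) = x0 + tx * z.

    hits: list of (z, measured x), ordered along z; sigma: hit resolution.
    Returns (x, tx, (c00, c01, c11)): the state and its covariance at the last hit.
    """
    # 1. Arbitrary initialisation with a very large covariance ("know nothing").
    x, tx = 0.0, 0.0
    c00, c01, c11 = 1.0e6, 0.0, 1.0e6
    z_prev = hits[0][0]

    for z, measured in hits:  # 2. add one hit after another
        dz = z - z_prev
        # Prediction: extrapolate state and covariance to the next layer.
        x += tx * dz
        c00 += 2.0 * dz * c01 + dz * dz * c11
        c01 += dz * c11
        # Correction (3. improve the state vector): weight the residual
        # by the gain K = C H^T / (H C H^T + sigma^2), with H = (1, 0).
        s = c00 + sigma * sigma
        k0, k1 = c00 / s, c01 / s
        residual = measured - x
        x += k0 * residual
        tx += k1 * residual
        c00, c01, c11 = (1.0 - k0) * c00, (1.0 - k0) * c01, c11 - k1 * c01
        z_prev = z

    # 4. After the last hit the estimate uses all measurements.
    return x, tx, (c00, c01, c11)
```

With a very large initial covariance the result approaches an ordinary least-squares line fit, which is the sense in which the KF is a recursive least squares method. Each hit costs a fixed, small amount of arithmetic.
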
Track Finding in the Pattern Tracker of HERA-B (DESY)
[Figure labels: OTR, ITR]
Extremely low resolution and efficiency of the pattern tracker of HERA-B
Track finding packages:
• CATS: Cellular Automaton
• RANGER: Kalman Filter
• TEMA: Hough Transformation

Competition CATS (CA) / RANGER (KF) / TEMA (HT) (HERA-B, DESY)
[Plots: tracking efficiency, tracking quality and time consumption; axes: Efficiency, Time/event (sec), N_inel x 50 tracks]
The reconstruction package CATS, based on the Cellular Automaton for track finding and the Kalman Filter for track fitting, outperforms alternative packages (SUSi, HOLMES, L2Sili, OSCAR, RANGER, TEMA) based on traditional methods in efficiency, accuracy and speed.

Many-Core HPC: Cores, Threads and SIMD
HEP: cope with high data rates!
• Cores and threads realize the task level of parallelism
• Vectors (SIMD) = data level of parallelism
• SIMD = Single Instruction, Multiple Data
Fundamental redesign of traditional approaches to data processing is necessary
[Figure: performance growth from 2000 through 2010 to 2015 along three axes: cores, threads and SIMD width (scalar to vector); a process split into threads (Thread1, Thread2, ...) alternating exe and r/w phases on a CPU core]

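
The two levels of parallelism can be shown structurally in a short sketch. Everything here is illustrative: `pack`, `process_event` and `process_events` are hypothetical names, the "physics" is just a sum of squares, and standard Python threads do not run such arithmetic in parallel, so the code shows how work is organised rather than any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

SIMD_WIDTH = 4  # assumed: four single-precision floats per vector register


def pack(values, width=SIMD_WIDTH, pad=0.0):
    """Group scalars into fixed-width vectors, padding the last one."""
    padded = list(values) + [pad] * (-len(values) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def process_event(momenta):
    """Data level: the same operation is applied to every lane of a vector."""
    vectors = pack(momenta)
    squared = [[p * p for p in vec] for vec in vectors]
    return sum(sum(vec) for vec in squared)


def process_events(events, n_threads=2):
    """Task level: independent events are handed to different threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(process_event, events))
```

For example, `process_events([[1.0, 2.0, 3.0], [1.0] * 5])` packs the first event into one padded vector and the second into two. On real hardware the inner list comprehension would correspond to a single SIMD instruction per vector.
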
Our Experience with Many-Core CPU/GPU Architectures
• NVIDIA GPU: 512 cores, since 2008; 63% of the maximal GPU utilization (ALICE)
• Intel/AMD CPU: 2x4 cores, since 2005; 6.5 ms/event (CBM)
• Intel MICA: 32 cores, since 2008; cooperation with Intel (ALICE/CBM)
• IBM Cell: 1+8 cores, since 2006; 70% of the maximal Cell performance (CBM)
Future systems are heterogeneous

CPU/GPU Programming Frameworks
• Intel Ct (C for throughput)
  • Extension to the C language
  • Intel CPU/GPU specific
  • SIMD exploitation for automatic parallelism
• NVIDIA CUDA (Compute Unified Device Architecture)
  • Defines hardware platform
  • Generic programming
  • Extension to the C language
  • Explicit memory management
  • Programming on thread level
• OpenCL (Open Computing Language)
  • Open standard for generic programming
  • Extension to the C language
  • Supposed to work on any hardware
  • Usage of specific hardware capabilities by extensions
• Vector classes (Vc)
  • Overload of C operators with SIMD/SIMT instructions
  • Uniform approach to all CPU/GPU families
  • Uni-Frankfurt/FIAS/GSI
Vector classes: cooperation with the Intel Ct group

Vector Classes (Vc)
Vector classes overload scalar C operators with SIMD/SIMT extensions
• Scalar: c = a+b
• SIMD: vc = _mm_add_ps(va,vb)
Vector classes:
• provide full functionality for all platforms
• support the conditional operators, e.g. phi(phi<0)+=360;
Vc increase the speed by the factor:
• SSE2 – SSE4: 4x
• future CPUs: 8x
• MICA/Larrabee: 16x
• NVIDIA Fermi: research
Vector classes enable easy vectorization of complex algorithms

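
The idea of overloading operators so that scalar-looking code acts on all lanes, including the masked update phi(phi<0)+=360, can be mimicked in a few lines. The class below is a hypothetical toy named `FloatV`; it is not the Vc interface, it uses Python's `phi[mask] += 360` spelling instead of the C++ call syntax on the slide, and it runs lane by lane rather than with real SIMD instructions.

```python
class FloatV:
    """Hypothetical fixed-width float vector; a toy, not the real Vc interface."""

    def __init__(self, lanes):
        self.lanes = list(lanes)

    def _pairs(self, other):
        # Broadcast a scalar operand to every lane.
        if not isinstance(other, FloatV):
            other = FloatV([other] * len(self.lanes))
        return zip(self.lanes, other.lanes)

    def __add__(self, other):
        # One "instruction" acts on all lanes, like _mm_add_ps on four floats.
        return FloatV(a + b for a, b in self._pairs(other))

    def __lt__(self, other):
        # A comparison yields a per-lane mask instead of a single bool.
        return [a < b for a, b in self._pairs(other)]

    def __getitem__(self, mask):
        # Reads pass through; the mask only matters on write-back.
        return self

    def __setitem__(self, mask, value):
        # Masked store: keep the old lane wherever the mask is False.
        self.lanes = [new if m else old
                      for old, new, m in zip(self.lanes, value.lanes, mask)]


def wrap_angles(phi):
    """Python spelling of the slide's phi(phi<0) += 360."""
    phi[phi < 0] += 360
    return phi
```

The masked store replaces an `if` per element: all lanes compute phi + 360, and the mask decides which results are written back. This is how conditional code is typically expressed on SIMD hardware without branching.
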
Tracking Challenge in CBM (FAIR/GSI)
• Fixed-target heavy-ion experiment
• 10^7 Au+Au collisions/s
• 1000 charged particles/collision
• Non-homogeneous magnetic field
• Double-sided strip detectors (85% combinatorial space points)
Track reconstruction in STS/MVD and displaced vertex search are required in the first trigger level.
Reconstruction packages:
• track finding: Cellular Automaton (CA)
• track fitting: Kalman Filter (KF)
• vertexing: KF Particle

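
Why double-sided strip detectors produce combinatorial space points can be seen in a small counting example. It is a simplification and does not reproduce the 85% quoted on the slide: it assumes that every fired front strip crosses every fired back strip, whereas real sensors with a small stereo angle and finite strip length allow fewer crossings. The name `strip_space_points` is hypothetical.

```python
def strip_space_points(true_hits):
    """Count space-point candidates on an idealised double-sided strip sensor.

    true_hits: list of (front strip, back strip) pairs, one per particle.
    Returns (candidates, ghosts): all strip crossings and the fake ones.
    """
    # The readout only knows which strips fired on each side ...
    front = sorted({f for f, _ in true_hits})
    back = sorted({b for _, b in true_hits})
    # ... so every front/back crossing is a possible space point.
    candidates = [(f, b) for f in front for b in back]
    true_set = set(true_hits)
    ghosts = [c for c in candidates if c not in true_set]
    return candidates, ghosts
```

Three particles on distinct strips give 3 x 3 = 9 candidates, of which 6 are combinatorial. The fake fraction grows with occupancy, and the track finder has to reject these points.
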
Kalman Filter (KF) for Track Fitting
KF was considerably reworked:
1. Parameterization of the magnetic field
2. Optimization of the algorithm
3. December 21, 1968. The Apollo 8 spacecraft has just been sent on its way to the Moon.
   003:46:31 Collins: Roger. At your convenience, would you please go P00 and Accept? We're going to update to your W-matrix.

Kalman Filter Track Fit on Cell
[Figure: from Intel P4 to Cell; 10000x faster on each CPU]
The KF speed was increased by 5 orders of magnitude
blade11bc4 @IBM, Böblingen: 2 Cell Broadband Engines with 256 kB Local Store at 2.4 GHz
Motivated by, but not restricted to Cell!
Comp. Phys. Comm. 178 (2008) 374-383

Performance of the KF Track Fit on CPU/GPU Systems
[Plot "Scalability": Time/Track (µs, logarithmic scale from 0.01 to 10.00) versus optimization stage (scalar double, single, SIMD, cores and threads with 2, 4, 8, 16, 32 threads) for 2xCell SPE (16), Woodcrest (2), Clovertown (4) and Dunnington (6); annotations: Task Level Parallelism (100x), Data Stream Parallelism (10x)]
• Scalability on different CPU architectures: speed-up 100
• Real-time performance on different Intel CPU platforms
• Real-time performance on NVIDIA GPU graphic cards
The Kalman Filter algorithm performs at ns level
CBM Progr. Rep. 2008

CBM Cellular Automaton (CA) Track Finder
[Figure: top view and front view of an event with 770 tracks; plots: scalability and efficiency]
Intel X5550, 2x4 cores at 2.67 GHz
Highly efficient reconstruction of 150 central collisions per second
