
SLIDE 1

Event Reconstruction on Modern and Future Computer Architectures

Ivan Kisel

GSI, Darmstadt

MPI, Munich, 16 November 2010

SLIDE 2

16 November 2010, MPI Munich Ivan Kisel, GSI 2/17

Reconstruction Challenge in the CBM Experiment

  • 1000 charged particles/collision
  • The silicon detector is 1 m long
  • The first plane has only 5 cm diameter
  • A very high track density
  • A non-homogeneous magnetic field
  • 10^7 collisions/second

Vocabulary: Collision = Event; Trajectory = Track; Measurement = Hit

[Figure: beam, target and silicon detector]

CBM experiment at FAIR/GSI

SLIDE 3

HEP Research Centers

Research Center    Accelerator (GeV)               Experiment  Physics
SLAC, USA          PEP-II, e- x e+ (9 x 3.1)       BaBar       B-Physics
Fermilab, USA      Tevatron, p x p (1000 x 1000)   D0          Universal
                                                   CDF         Universal
BNL, USA           RHIC, Heavy Ions                PHENIX      Quark-Gluon Plasma
                                                   STAR        Quark-Gluon Plasma
KEK, Japan         KEK-B, e- x e+ (8 x 3.5)        BELLE       B-Physics
CERN, Switzerland  LHC, p x p (7000 x 7000)        ATLAS       Universal
                                                   CMS         Universal
                                                   ALICE       Quark-Gluon Plasma
                                                   LHCb        B-Physics
DESY, Germany      HERA, e+/- x p (27.5 x 920)     ZEUS        Proton-Physics
                                                   H1          Proton-Physics
                                                   HERMES      Spin-Physics
                                                   HERA-B      B-Physics
FAIR/GSI, Germany  SIS 100/300, p, Heavy Ions      PANDA       Quark-Physics
                                                   CBM         Quark-Gluon Plasma

  • 5000 charged particles/collision
  • 2000 proton-proton collisions/second
  • 300 heavy ion collisions/second
  • 15 GB/second data flow (TPC only)

ALICE (CERN)

SLIDE 4

From Raw Data to Physics

1. Particle Accelerator
2. Particle Detectors
3. Data Acquisition
4. Data Reconstruction
5. Physics Analysis

SLIDE 5

HEP Experiments: Collider and Fixed-Target

SLIDE 6

Schematic View of a Detector Setup

[Figure: magnet, silicon detector, electromagnetic calorimeter, hadron calorimeter, muon chambers]

SLIDE 7

Methods for Event Reconstruction

  • Global Methods
      • all hits are treated equivalently
      • typical methods: Conformal Mapping, Histogramming, Hough Transformation
  • Local Methods
      • sequential selection of candidates
      • typical methods: Track Following, Kalman Filter
  • Neural Networks
      • combine local and global relations
      • typical methods: Perceptron, Hopfield Network, Cellular Automaton, Elastic Net

Stages: track finding – time consuming!!!; track fitting – Kalman Filter; vertex finding/fitting – Kalman Filter; ring finding – Combinatorics

SLIDE 8

Cellular Automaton (CA) as Track Finder

Track finding: which hits in the detector belong to the same track? – Cellular Automaton (CA)

  • 0. Hits
  • 1. Segments
  • 2. Counters
  • 3. Track Candidates
  • 4. Tracks

Cellular Automaton:

  • local w.r.t. data
  • intrinsically parallel
  • extremely simple
  • very fast

Perfect for many-core CPU/GPU!

[Figures: detector layers with hits; CBM event display: 1000 hits (0. Hits) reconstructed into 1000 tracks (4. Tracks)]

Cellular Automaton steps:

  • 1. Build short track segments.
  • 2. Connect segments according to the track model and estimate a possible position on a track.
  • 3. Tree structures appear; collect segments into track candidates.
  • 4. Select the best track candidates.
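As a sketch only (toy 1D geometry, straight-line track model, all names hypothetical, not the CBM code), the four CA steps might look like:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy CA track finder. Hits live on parallel detector layers; a "cell"
// is a segment joining hits on adjacent layers. Hits must be sorted by layer.
struct Hit  { int layer; double y; };
struct Cell { int from, to; double slope; int counter; };

// Steps 1-2: build short track segments and sweep the counters.
// Two cells are neighbours if they share a hit and agree with the
// (here: straight-line) track model; each cell is updated only from its
// local neighbours, which is what makes the CA intrinsically parallel.
std::vector<Cell> run_ca(const std::vector<Hit>& hits, double tol = 0.1) {
    std::vector<Cell> cells;
    for (std::size_t i = 0; i < hits.size(); ++i)
        for (std::size_t j = i + 1; j < hits.size(); ++j)
            if (hits[j].layer == hits[i].layer + 1)
                cells.push_back({(int)i, (int)j, hits[j].y - hits[i].y, 1});
    for (auto& c : cells)            // one forward pass suffices because
        for (const auto& n : cells)  // the cells are ordered by layer
            if (n.to == c.from && std::abs(n.slope - c.slope) < tol)
                c.counter = std::max(c.counter, n.counter + 1);
    return cells;
}

// Steps 3-4: the largest counter marks the best track candidate;
// a chain of k segments corresponds to a track of k+1 hits.
int best_track_length(const std::vector<Cell>& cells) {
    int best = 0;
    for (const auto& c : cells) best = std::max(best, c.counter);
    return best + 1;
}
```

With four collinear hits on layers 0-3 plus one noise hit, the counter sweep singles out the length-4 chain while the noise segment keeps counter 1.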
SLIDE 9

Kalman Filter (KF) based Track Fit

Track fit: estimation of the track parameters at one or more hits along the track – Kalman Filter (KF)

[Figure: detector layers with hits; state (r, C), r – track parameters, C – covariance (precision); stages: initialising → prediction → correction]

State vector: r = { x, y, z, px, py, pz } – position, direction and momentum

Nowadays the Kalman Filter is used in almost all HEP experiments.

Kalman Filter:

  • 1. Start with an arbitrary initialization.
  • 2. Add one hit after another.
  • 3. Improve the state vector.
  • 4. Get the optimal parameters after the last hit.

[Figures: the KF as a recursive least squares method; KF block diagram]
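A minimal numerical sketch of this recursion (assumption: a 1D state measured directly at each hit; the real fit propagates the 6-component state vector through the magnetic field between hits):

```cpp
#include <cassert>
#include <cmath>

// 1D Kalman Filter as a recursive least squares estimator (illustration
// only): state r with variance C, refined hit by hit.
struct KalmanFilter1D {
    double r = 0.0;   // 1. arbitrary initialisation ...
    double C = 1e9;   //    ... with a very large variance
    void add_hit(double m, double V) {  // 2. add one hit after another
        double K = C / (C + V);         //    Kalman gain
        r += K * (m - r);               // 3. improve the state vector
        C *= (1.0 - K);                 //    variance shrinks with each hit
    }
};
```

After the last hit the estimate converges to the weighted mean of all measurements, i.e. the least squares optimum (step 4); the full filter inserts a prediction step between hits.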

SLIDE 10

Track Finding in the Pattern Tracker of HERA-B (DESY)

TEMA – Hough Transformation
RANGER – Kalman Filter
CATS – Cellular Automaton

Extremely low resolution and efficiency of the pattern tracker of HERA-B (OTR/ITR)

SLIDE 11

Competition: CATS (CA) / RANGER (KF) / TEMA (HT) (HERA-B, DESY)

The reconstruction package CATS, based on the Cellular Automaton for track finding and the Kalman Filter for track fitting, outperforms alternative packages (SUSi, HOLMES, L2Sili, OSCAR, RANGER, TEMA) based on traditional methods in efficiency, accuracy and speed.

[Plots: tracking efficiency and time/event (sec) as a function of Ninel x 50 tracks]

SLIDE 12

SLIDE 13

Many-Core HPC: Cores, Threads and SIMD

Cores and threads realize the task level of parallelism; vectors (SIMD) realize the data level of parallelism.

[Figure: a process runs several threads, each executing and reading/writing in parallel; CPU evolution 2000 → 2010 → 2015]

SIMD = Single Instruction, Multiple Data: a scalar instruction processes one data element, while a vector instruction processes the full SIMD width at once.

A fundamental redesign of traditional approaches to data processing is necessary: HEP has to cope with high data rates! Performance grows with cores, threads and SIMD width.
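For illustration (assuming an x86 CPU with SSE; `_mm_add_ps` is the same intrinsic quoted later on the Vc slide), one SIMD instruction performs four additions where the scalar code needs four instructions:

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Scalar: one addition per instruction.
void add_scalar(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// SIMD: one _mm_add_ps processes a full 4-wide vector register.
void add_simd(const float* a, const float* b, float* c) {
    __m128 va = _mm_loadu_ps(a);           // load 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));  // 4 additions in one instruction
}
```

Both produce identical results; the SIMD version is what the vector classes below generate behind overloaded operators.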

SLIDE 14

Our Experience with Many-Core CPU/GPU Architectures

  • Intel/AMD CPU: 2x4 cores, since 2005
  • NVIDIA GPU: 512 cores, since 2008 – 63% of the maximal GPU utilization (ALICE)
  • Intel MIC: 32 cores, since 2008 – cooperation with Intel (ALICE/CBM)
  • IBM Cell: 1+8 cores, since 2006 – 70% of the maximal Cell performance, 6.5 ms/event (CBM)

Future systems are heterogeneous

SLIDE 15

CPU/GPU Programming Frameworks

Vector classes: Cooperation with the Intel Ct group

  • Intel Ct (C for throughput)
      • Extension to the C language
      • Intel CPU/GPU specific
      • SIMD exploitation for automatic parallelism
  • NVIDIA CUDA (Compute Unified Device Architecture)
      • Defines hardware platform
      • Generic programming
      • Extension to the C language
      • Explicit memory management
      • Programming on thread level
  • OpenCL (Open Computing Language)
      • Open standard for generic programming
      • Extension to the C language
      • Supposed to work on any hardware
      • Usage of specific hardware capabilities by extensions
  • Vector classes (Vc)
      • Overload of C operators with SIMD/SIMT instructions
      • Uniform approach to all CPU/GPU families
      • Uni-Frankfurt/FIAS/GSI
SLIDE 16

Vector Classes (Vc)

Vector classes:

  • provide full functionality for all platforms
  • support conditional operators, e.g. phi(phi < 0) += 360;

Scalar: c = a + b → SIMD: vc = _mm_add_ps(va, vb)

Vector classes enable easy vectorization of complex algorithms.

Vc increases the speed by a factor of:

  • SSE2 – SSE4: 4x
  • future CPUs: 8x
  • MIC/Larrabee: 16x
  • NVIDIA Fermi: under research

Vector classes overload scalar C operators with SIMD/SIMT extensions
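The idea can be sketched with a hand-rolled `float_v` (hypothetical, not the real Vc API): overloaded operators hide the SIMD width, and a write-mask implements the conditional `phi(phi < 0) += 360`:

```cpp
#include <cassert>

// Minimal Vc-style vector class (sketch). On a real target the loop
// bodies would compile to SIMD instructions such as _mm_add_ps.
struct float_v {
    static const int Size = 4;   // the SIMD width, hidden from the user
    float d[Size];

    float_v operator+(const float_v& o) const {
        float_v r;
        for (int i = 0; i < Size; ++i) r.d[i] = d[i] + o.d[i];
        return r;
    }

    // Lane-wise comparison yields a mask, not a single bool.
    struct Mask { bool m[Size]; };
    Mask operator<(float x) const {
        Mask k;
        for (int i = 0; i < Size; ++i) k.m[i] = d[i] < x;
        return k;
    }

    // Masked view: phi(mask) += 360 updates only the selected lanes.
    struct Masked {
        float_v& v;
        Mask k;
        void operator+=(float x) {
            for (int i = 0; i < float_v::Size; ++i)
                if (k.m[i]) v.d[i] += x;
        }
    };
    Masked operator()(Mask k) { return Masked{*this, k}; }
};
```

Usage mirrors scalar code: `c = a + b;` and `phi(phi < 0.f) += 360.f;` wrap only the negative angles, which is exactly the conditional-operator support claimed above.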

SLIDE 17

Tracking Challenge in CBM (FAIR/GSI)

  • Fixed-target heavy-ion experiment
  • 10^7 Au+Au collisions/s
  • 1000 charged particles/collision
  • Non-homogeneous magnetic field
  • Double-sided strip detectors (85% combinatorial space points)

Track reconstruction in STS/MVD and displaced vertex search are required in the first trigger level.

Reconstruction packages:

  • track finding – Cellular Automaton (CA)
  • track fitting – Kalman Filter (KF)
  • vertexing – KF Particle

SLIDE 18

Kalman Filter (KF) for Track Fitting

Parameterization of the magnetic field

December 21, 1968. The Apollo 8 spacecraft has just been sent on its way to the Moon.

003:46:31 Collins: Roger. At your convenience, would you please go P00 and Accept? We're going to update to your W-matrix.

Optimization of the algorithm

KF was considerably reworked

SLIDE 19

Kalman Filter Track Fit on Cell

Motivated by, but not restricted to, the Cell!

blade11bc4 @IBM, Böblingen: 2 Cell Broadband Engines with 256 kB Local Store at 2.4 GHz

10000x faster on each CPU (Intel P4 vs Cell); Comp. Phys. Comm. 178 (2008) 374-383

The KF speed was increased by 5 orders of magnitude.

SLIDE 20

Performance of the KF Track Fit on CPU/GPU Systems

[Plots: scalability of the KF track fit. Data stream parallelism (SIMD, 10x) and task level parallelism (cores and threads, 100x); speed-up on 2x Cell SPE (16), Woodcrest (2), Clovertown (4) and Dunnington (6); real-time performance (time/track in µs vs threads, cores and SIMD width) on Intel CPU platforms and NVIDIA GPU graphics cards]

The Kalman Filter Algorithm performs at ns level

CBM Progr. Rep. 2008

SLIDE 21

CBM Cellular Automaton (CA) Track Finder

[Figures: event display with 770 reconstructed tracks (top and front views); efficiency and scalability plots]

Highly efficient reconstruction of 150 central collisions per second (Intel X5550, 2x4 cores at 2.67 GHz)

SLIDE 22

First Level Event Selection (FLES) Complexity

FLES hierarchy: farm → sub-farm → PC → CPU/GPU socket → core → thread → vector; about 60 000 cores in 2010 (Sverre Jarp)

  • FARM CONTROL SYSTEM: monitoring the farm → reliability
  • SCHEDULER: high-level parallelism → scalability
  • ALGORITHMS: low-level parallelism → CPU/GPU load

CBM DAQ/FLES

SLIDE 23

International Tracking Workshop

45 participants from Austria, China, Germany, India, Italy, Norway, Russia, Switzerland, UK and USA

SLIDE 24

Software Evolution: Many-Core Barrier

[Timeline: 1990 → 2000 → 2010, from scalar single-core software and OOP to the many-core HPC era]

Consolidate efforts of:

  • Physicists
  • Mathematicians
  • Computer scientists
  • Developers of parallel languages
  • Many-core CPU/GPU producers

Software redesign can be synchronized between the experiments

SLIDE 25

Track Reconstruction in CBM and ALICE

Different experiments have similar reconstruction problems

CBM (FAIR/GSI) vs ALICE (CERN)

Track reconstruction is the most time-consuming part of the event reconstruction and is therefore being moved to many-core CPU/GPU platforms. In both cases track finding is based on the Cellular Automaton method and track fitting on the Kalman Filter method.

ALICE HLT Group: NVIDIA GPU, 240 cores; CBM Reco Group: Intel CPU, 8 cores

CBM: fixed-target, forward geometry, 10^7 collisions/s – ALICE: collider, cylindrical geometry, 10^4 collisions/s

SLIDE 26

Stages of Event Reconstruction: To-Do List

Track finding – time consuming!!! (detector dependent)
Track fitting – Kalman Filter (track model dependent)
Vertex finding/fitting – Kalman Filter (detector/geometry independent)
Ring finding (PID) – Combinatorics (RICH specific)

  • Generalized track finder(s)
  • Geometry representation
  • Interfaces
  • Infrastructure
  • Kalman Filter
  • Kalman Smoother
  • Deterministic Annealing Filter
  • Gaussian Sum Filter
  • Field representation
  • 3D Mathematics
  • Adaptive filters
  • Functionality
  • Physics analysis
  • Ring finders
SLIDE 27

Consolidate Efforts: Common Reconstruction Package

Host experiments: ALICE (CERN), CBM (FAIR/GSI), STAR (BNL), PANDA (FAIR/GSI)

Uni-Frankfurt/FIAS: vector classes, GPU implementation
GSI: algorithms development, many-core optimization
HEPHY (Vienna)/Uni-Gjovik: Kalman Filter track fit, Kalman Filter vertex fit
OpenLab (CERN): many-core optimization, benchmarking
Intel: Ct implementation, many-core optimization, benchmarking

Common Reconstruction Package

SLIDE 28

Follow-up Workshop

Follow-up workshop: 10-11 March 2011 at CERN; contact Sverre Jarp.