Multiprocessing in Athena
I. Performance study of Athena event and job level parallelism on multi-core systems.
II. Performance optimizations in AthenaMP.
Athena multi jobs (Athena MJ)
for i in range(4): $> Athena.py -c "EvtMax=25; SkipEvents=$i*25" Jobo.py
core-0
JOB 0: Events: [0,1,…,24]
core-1
JOB 1: Events: [25,…,49]
core-2
JOB 2: Events: [50,…,74]
core-3
JOB 3: Events: [75,…,99]
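The job-level split above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper name `mj_commands` is hypothetical, the command strings are built but not executed, and the sketch assumes the total number of events divides evenly among the jobs.

```python
def mj_commands(total_events, n_jobs, jobo="Jobo.py"):
    """Build one Athena.py command line per independent job (hypothetical helper)."""
    per_job = total_events // n_jobs  # assumes an even split, as in the 4 x 25 example
    commands = []
    for i in range(n_jobs):
        skip = i * per_job  # job i starts right after the events of jobs 0..i-1
        commands.append(
            'Athena.py -c "EvtMax=%d; SkipEvents=%d" %s' % (per_job, skip, jobo)
        )
    return commands


if __name__ == "__main__":
    for cmd in mj_commands(100, 4):
        print(cmd)
```

Each job is a fully separate process with its own initialization, so nothing is shared between the four jobs.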
LBL-ATLAS-Computing, 2010
OS-fork merge
$> Athena.py --… EvtMax=100 Jobo.py
firstEvents
core-0
WORKER 0: Events: [0, 4, 8,…,96]
core-1
WORKER 1: Events: [1, 5, 9,…,97]
core-2
WORKER 2: Events: [2, 6, 10,…,98]
core-3
WORKER 3: Events: [3, 7, 11,…,99]
Each worker writes tmp files; the tmp files are merged into the Output.
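The round-robin assignment shown in the worker diagram can be written as a one-line rule: worker w takes events w, w+n, w+2n, and so on. A minimal sketch, with `round_robin` as a hypothetical helper name that is not part of Athena:

```python
def round_robin(total_events, n_workers):
    """Return, for each worker, the list of event numbers it processes."""
    # Worker w starts at event w and then strides by the number of workers.
    return [list(range(w, total_events, n_workers)) for w in range(n_workers)]


if __name__ == "__main__":
    for w, events in enumerate(round_robin(100, 4)):
        print("WORKER %d: first=%s last=%d" % (w, events[:3], events[-1]))
```

Because the assignment is fixed up front, a slow worker cannot hand its remaining events to a faster one.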
Maximize the shared memory!
SERIAL: parent-init-fork
SERIAL: parent-merge and finalize
AthenaMP Status by S.Binet - http://indico.cern.ch/getFile.py/access?contribId=2&resId=0&materialId=slides&confId=92059
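The init, fork, and merge pattern can be illustrated with plain `os.fork`. This is a toy sketch under stated assumptions, not AthenaMP code: `run_forked` and `shared_table` are hypothetical names, the "initialized state" is a stand-in list, and the example needs a POSIX system where `os.fork` is available. State built by the parent before the fork is shared copy-on-write by the workers, which is why doing as much initialization as possible before forking maximizes shared memory.

```python
import os
import shutil
import tempfile


def run_forked(total_events, n_workers):
    # Serial step: the parent initializes once, before forking.
    shared_table = [evt * evt for evt in range(total_events)]  # stand-in for real state

    tmp_dir = tempfile.mkdtemp()
    pids = []
    for w in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: process events w, w+n, w+2n, ... and write a tmp file.
            try:
                path = os.path.join(tmp_dir, "worker_%d.tmp" % w)
                with open(path, "w") as out:
                    for evt in range(w, total_events, n_workers):
                        out.write("%d %d\n" % (evt, shared_table[evt]))
                os._exit(0)  # leave the child without running parent cleanup
            except BaseException:
                os._exit(1)
        pids.append(pid)

    # Parent waits for every worker to finish.
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError("worker %d failed" % pid)

    # Serial step: the parent merges the tmp files into one output.
    merged = []
    for w in range(n_workers):
        with open(os.path.join(tmp_dir, "worker_%d.tmp" % w)) as f:
            for line in f:
                evt, value = line.split()
                merged.append((int(evt), int(value)))
    shutil.rmtree(tmp_dir)
    merged.sort()  # restore event order after the merge
    return merged
```

Calling `run_forked(20, 4)` returns the 20 (event, value) pairs in event order; only the middle step runs in parallel, so the serial init and merge steps bound the achievable speed-up.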
Hit the memory limit, swapping
Hardware optimizations: HT, QPI, NUMA, affinity
OS optimizations: affinity, numactl, io-related, disks,
Compiler, Malloc, etc.
Shared memory, forking later after init
Queue event distribution
Intel Nehalem
coors.lbl.gov, rainier.lbl.gov
Intel sub-Nehalem
most of LXPLUS machines: Voatlas91,lxplus250,lxplus251
CPU-Memory symmetric access
AthenaMP vs. Athena MJ
Affinity: pinning each process to a separate CPU core
Floating: each process is scheduled by the OS; core switching is frequent
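On Linux, pinning can be sketched from Python with `os.sched_setaffinity`. The helper name `pin_to_core` is hypothetical and the slides do not say which mechanism AthenaMP itself uses; these calls exist only on platforms that expose the Linux scheduler affinity interface.

```python
import os


def pin_to_core(core_id):
    """Pin the calling process to a single CPU core and return its new affinity set."""
    os.sched_setaffinity(0, {core_id})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)
```

A worker would typically call `pin_to_core(worker_index)` right after it is forked, provided that core is in the set the process is allowed to use. A floating process simply never makes this call and leaves placement to the OS scheduler.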
Workers floating vs. workers pinned to CPU cores
core-0
WORKER 0: Events: [0, 4, 5,…]
core-1
WORKER 1: Events: [1, 6, 9,…]
core-2
WORKER 2: Events: [2, 8, 10,…]
core-3
WORKER 3: Events: [3, 7, 11,…]
events = multiprocessing.Queue(EvtMax + ncpus)
# queue contents: [0, 1, 2, 3, 4, …, 99, None, None, None, None]
…
for evt_nbr in iter(events.get, None):   # stop at the first None
    evt_loop_mgr.seek(evt_nbr)
    evt_loop_mgr.nextEvent()
Slower worker doesn’t get left behind
Lost evt order
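A self-contained sketch of queue-based distribution with the standard `multiprocessing` module follows. It is an illustration only: `run_queue` and `_worker` are hypothetical names, the event "processing" is a placeholder, and the `fork` start method is requested explicitly, so a POSIX system is assumed. One `None` sentinel per worker tells each worker when to stop.

```python
import multiprocessing


def _worker(index, events, results):
    # Pull event numbers until this worker receives a None sentinel.
    for evt in iter(events.get, None):
        results.put((index, evt))  # placeholder for real event processing


def run_queue(total_events, n_workers):
    ctx = multiprocessing.get_context("fork")  # assumption: fork is available
    events = ctx.Queue(total_events + n_workers)
    results = ctx.Queue()

    workers = [
        ctx.Process(target=_worker, args=(i, events, results))
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()

    # Fill the queue: all event numbers, then one sentinel per worker.
    for evt in range(total_events):
        events.put(evt)
    for _ in range(n_workers):
        events.put(None)

    # Drain the results before joining so no worker blocks on a full pipe.
    processed = [results.get() for _ in range(total_events)]
    for w in workers:
        w.join()
    return processed
```

The returned list pairs each event with the worker that took it. Which worker gets which event depends on timing, so the pairs arrive in no particular order; sorting by event number recovers a deterministic sequence.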
Round-robin event distribution vs. queue event distribution
AthenaMP shares about 0.5 GB of real …
Queue balances workers' arrival times, thus improving …
Hyper-Threading can give a 25-30% gain on events …
Affinity settings exploit CPUs better than the Linux CPU …
NUMA effects take place on Nehalem CPUs.
Architectural gains: HyperThreading, QPI, NUMA, etc.
OS gains: affinity, numactl, io-related, disks, virtual
Compiler, Malloc, etc.
Faster initialization…
Faster distribution of events to workers…
Faster merging: merging events processed by workers
Faster finalization…