Multiprocessing in Athena



  1. Multiprocessing in Athena
     I. Performance study of Athena event- and job-level parallelism on multi-core systems.
     II. Performance optimizations in AthenaMP.

  2. Athena multi jobs: Athena MJ, job-level parallelism
     for i in range(4): $> Athena.py -c "EvtMax=25; SkipEvents=$i*25" Jobo.py
     core-0 JOB 0: start, init, end. Events: [0,1,…,24]
     core-1 JOB 1: start, init, end. Events: [25,…,49]
     core-2 JOB 2: start, init, end. Events: [50,…,74]
     core-3 JOB 3: start, init, end. Events: [75,…,99]
     PARALLEL: independent jobs
     LBL-ATLAS-Computing, 2010
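The job splitting on this slide can be sketched in a few lines of Python. This is only an illustration of how the `EvtMax`/`SkipEvents` options partition 100 events over 4 independent jobs; the helper names `job_options` and `event_range` are hypothetical, and nothing here launches Athena.

```python
def job_options(n_jobs: int, events_per_job: int) -> list[str]:
    """Build the -c option string for each independent job (hypothetical helper)."""
    return [
        f"EvtMax={events_per_job}; SkipEvents={i * events_per_job}"
        for i in range(n_jobs)
    ]


def event_range(job: int, events_per_job: int) -> range:
    """Events handled by one job: it skips job*events_per_job, then reads events_per_job."""
    start = job * events_per_job
    return range(start, start + events_per_job)


if __name__ == "__main__":
    # Print the command line and event range of each of the 4 jobs from the slide.
    for i, opts in enumerate(job_options(4, 25)):
        events = event_range(i, 25)
        print(f'JOB {i}: Athena.py -c "{opts}" Jobo.py  (events {events[0]}..{events[-1]})')
```

Each job is a full, independent Athena process with its own initialization, which is why this mode shares no memory between jobs.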

  3. AthenaMP: event-level parallelism
     $> Athena.py --nprocs=4 -c EvtMax=100 Jobo.py
     Maximize the shared memory!
     Parent: init, firstEvnts, OS-fork; after the workers finish: merge, end
     Input files → workers → output tmp files → merge → output files
     core-0 WORKER 0: Events: [0, 4, 8,…,96], output tmp files
     core-1 WORKER 1: Events: [1, 5, 9,…,97], output tmp files
     core-2 WORKER 2: Events: [2, 6, 10,…,98], output tmp files
     core-3 WORKER 3: Events: [3, 7, 11,…,99], output tmp files
     SERIAL: parent-init-fork
     PARALLEL: workers event loop
     SERIAL: parent-merge and finalize
     AthenaMP Status by S. Binet - http://indico.cern.ch/getFile.py/access?contribId=2&resId=0&materialId=slides&confId=92059
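A minimal sketch of the round-robin event assignment shown on this slide, followed by the serial merge step. It only models which event numbers each worker gets; the fork itself, the shared memory and the real output files are left out, and the names `round_robin` and `merge` are hypothetical.

```python
def round_robin(evt_max: int, nprocs: int) -> list[list[int]]:
    """Worker w processes events w, w + nprocs, w + 2*nprocs, ..."""
    return [list(range(w, evt_max, nprocs)) for w in range(nprocs)]


def merge(per_worker: list[list[int]]) -> list[int]:
    """Stand-in for the serial parent merge: combine the workers' outputs in event order."""
    return sorted(evt for events in per_worker for evt in events)


if __name__ == "__main__":
    # Same numbers as the slide: 100 events over 4 workers.
    for w, events in enumerate(round_robin(100, 4)):
        print(f"WORKER {w}: first events {events[:3]}, last event {events[-1]}")
```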

  4. Memory footprint of AthenaMP & AthenaMJ
     AthenaMP: ~0.5 GB of physical memory saved per process

  5. Event throughput of AthenaMP and AthenaMJ
     Hit the memory limit, swapping
     [Plot: AthenaMP vs. AthenaMJ]

  6. 1. External optimizations (no touching complex Athena code):
        • Hardware optimizations: HT, QPI, NUMA, affinity
        • OS optimizations: affinity, numactl, I/O-related, disks, virtual machines, etc.
        • Compiler, malloc, etc.
     2. Gains from AthenaMP/Athena design improvements:
        • Shared memory, forking later after init
        • Queue event distribution
     Endless ground for improvements :)

  7. Architecture upgrades
     Intel Nehalem: coors.lbl.gov, rainier.lbl.gov
     • Hyper-Threading: two logical cores on one physical core
     • QPI: Quick Path from CPU to CPU and CPU to memory
     • Turbo Boost: dynamic change of CPU frequency
     • CPU-memory non-symmetric access (NUMA)
     Intel sub-Nehalem: most of the LXPLUS machines (voatlas91, lxplus250, lxplus251)
     • CPU-memory symmetric access

  8. Event throughput per process for RDO to ESD reco on different machines

  9. Gain from Hyper-Threading
     [Plot: AthenaMP, AthenaMJ]

  10. Setting affinity of workers to CPU cores
      Affinity: pinning each process to a separate CPU core
      Floating: each process is scheduled by the OS; core switching is frequent
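As an illustration of pinning, the sketch below uses Python's `os.sched_setaffinity`, which is available on Linux only. The slides do not say how AthenaMP sets affinity, so this is an assumption about one possible mechanism, and `pin_worker` is a hypothetical helper.

```python
import os


def pin_worker(worker_index: int):
    """Pin the calling process to one CPU core and return that core's id.

    Returns None on platforms without os.sched_setaffinity (non-Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Only pick from cores this process is currently allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    core = allowed[worker_index % len(allowed)]
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return core
```

Each worker would call `pin_worker(i)` once, right after the fork, so that worker `i` stays on its own core instead of floating.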

  11. Event throughput of workers
      [Plots: workers floating; workers pinned to CPU cores]

  12. Recent progress: event distribution using a Queue
      events = multiprocessing.Queue(EvtMax+ncpus)
      events = [0,1,2,3,4,…,99, None,None,None,None]
      evt_loop(evt=events.get(); evt != None):
          evt_loop_mgr.seek(evt_nbr)
          evt_loop_mgr.nextEvent()
      core-0 WORKER 0: Events: [0, 4, 5,…]
      core-1 WORKER 1: Events: [1, 6, 9,…]
      core-2 WORKER 2: Events: [2, 8, 10,…]
      core-3 WORKER 3: Events: [3, 7, 11,…]
      Event order is lost.
      Balance the arrival times of workers! A slower worker doesn't get left behind.
      LBL-ATLAS-Computing, 2010
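The queue scheme on this slide can be tried out with a small runnable sketch. It assumes a Unix system (the `fork` start method) and replaces the `evt_loop_mgr.seek`/`nextEvent` calls with a simple append; `queue_worker` and `distribute` are hypothetical names, not AthenaMP code. Unlike the slide, the workers are started before the queue is filled, which does not change the result.

```python
import multiprocessing as mp


def queue_worker(worker_id, events, results):
    """Pull event numbers until the None sentinel arrives, then report what was processed."""
    processed = []
    while True:
        evt = events.get()
        if evt is None:  # one sentinel per worker ends its loop
            break
        processed.append(evt)  # stand-in for seek(evt) + nextEvent()
    results.put((worker_id, processed))


def distribute(evt_max, ncpus):
    """Hand out events 0..evt_max-1 to ncpus workers through a shared queue."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    events = ctx.Queue(evt_max + ncpus)
    results = ctx.Queue()
    procs = [
        ctx.Process(target=queue_worker, args=(i, events, results))
        for i in range(ncpus)
    ]
    for p in procs:
        p.start()
    for evt in range(evt_max):
        events.put(evt)
    for _ in range(ncpus):
        events.put(None)
    # Collect one result per worker before joining, so no worker blocks on put().
    per_worker = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return per_worker
```

Calling `distribute(100, 4)` returns a dictionary from worker id to the events it processed. Which worker gets which event varies from run to run, which is the "lost event order" noted on the slide; a fast worker simply pulls more events.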

  13. Workers throughput for Queue
      [Plots: round-robin event distribution; Queue event distribution]

  14. • AthenaMP shares about 0.5 GB of real memory footprint per worker.
      • The Queue balances the workers' arrival times, thus improving MP scaling.
      • Hyper-Threading can give a 25-30% gain in event throughput.
      • Affinity settings exploit CPUs better than the Linux CPU scheduler.
      • NUMA effects take place on Nehalem CPUs.

  15. 1. Externally available performance gains (without touching the Athena code):
         • Architectural gains: Hyper-Threading, QPI, NUMA, etc.
         • OS gains: affinity, numactl, I/O-related, disks, virtual machines, etc.
         • Compiler, malloc, etc.
      2. Gains from Athena/AthenaMP design improvements:
         • Faster initialization…
         • Faster distribution of events to workers...
         • Faster merging: one writer merges the events processed by workers on the fly, without waiting for the workers to finish…
         • Faster finalization…
      Endless ground for improvements :)
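The "one writer merging on the fly" idea could look roughly like the sketch below. This is not the AthenaMP design, only one way to illustrate the idea under stated assumptions: a Unix system (the `fork` start method), the parent acting as the single writer, and a string standing in for a processed event. The names `producing_worker` and `merge_on_the_fly` are hypothetical.

```python
import multiprocessing as mp


def producing_worker(worker_id, events, out):
    """Process events and send each result to the writer as soon as it is ready."""
    for evt in events:
        out.put((evt, f"processed-by-{worker_id}"))  # stand-in for real event output
    out.put(None)  # tell the writer this worker is done


def merge_on_the_fly(evt_max, nworkers):
    """The parent acts as the single writer and merges results while workers still run."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    out = ctx.Queue()
    procs = [
        ctx.Process(
            target=producing_worker,
            args=(w, range(w, evt_max, nworkers), out),
        )
        for w in range(nworkers)
    ]
    for p in procs:
        p.start()
    merged = {}
    finished = 0
    while finished < nworkers:  # stop once every worker has sent its done marker
        item = out.get()
        if item is None:
            finished += 1
        else:
            evt, payload = item
            merged[evt] = payload
    for p in procs:
        p.join()
    return merged
```

Because results are consumed as they arrive, the merge overlaps with the event loop instead of forming a serial step at the end.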

  16. Acknowledgments
      • Paolo Calafiura, Sebastien Binet, Yushu Yao, Charles Leggett, Wim Lavrijsen
      • Keith Jackson, David Levinthal
      • Ian Hinchliffe and the LBL ATLAS Group
      • LBNL and DOE for funding
      • CERN for research
