Testing AutoFDO for Geant4
Nathalie Rauschmayr
IT-CF-FPP With help from Benedikt Hegner and Shahzad Malik Muzaffar
1/33 Testing AutoFDO for Geant4 Nathalie Rauschmayr
Introduction
Idea: Autotuning
Compile → Run → Feedback → Compile
1. Instrumentation: gcc -fprofile-generate test.c -o test
2. Run: profile data in test.gcno, test.gcda
3. Recompile: gcc -fprofile-use test.c -o test
4. Production environment
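The instrumented cycle can be scripted. A minimal Python sketch, assuming gcc is on PATH and the test program needs no arguments; the helper names `fdo_cycle` and `run_stages` are illustrative, not part of any tool. With `dry_run=True` it only prints the commands.

```python
import shlex
import subprocess
from typing import List, Sequence, Tuple

Stage = Tuple[str, List[str]]


def fdo_cycle(src: str, exe: str, run_args: Sequence[str] = ()) -> List[Stage]:
    """Return the instrumentation-based FDO steps as (stage name, argv) pairs."""
    return [
        # Instrumented build: the binary records execution counts at run time.
        ("instrument", ["gcc", "-fprofile-generate", src, "-o", exe]),
        # Training run on representative input; the counts are written when it exits.
        ("run", ["./" + exe, *run_args]),
        # Rebuild, letting gcc read the recorded profile.
        ("recompile", ["gcc", "-fprofile-use", src, "-o", exe]),
    ]


def run_stages(stages: List[Stage], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it (stopping on failure)."""
    for name, argv in stages:
        print(f"[{name}] {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_stages(fdo_cycle("test.c", "test"))
```

The separate instrumented build and training run are the overhead that AutoFDO, which samples an unmodified optimised binary, avoids.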
AutoFDO workflow:
1. Create production binary: gcc -O3 -ggdb
2. Run production binary with perf: perf record -b -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=1000009/pp ./test
3. Convert perf-profile: create_gcov --binary=./test
4. Recompile with converted perf-profile: gcc -O3 -fauto-profile=test.gcov test.c -o test
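The `perf record` event is a raw PMU event: `event=0xc4,umask=0x20` selects the counter (on Intel cores this encoding is BR_INST_RETIRED.NEAR_TAKEN), `name=` only labels it, `period=` takes one sample every N occurrences, `/pp` requests precise sampling, and `-b` records the branch stack that AutoFDO works from. A small Python sketch that assembles the four commands; the function names are hypothetical, the source and output arguments added to the first gcc call are assumptions, and the `--profile`/`--gcov` flags of `create_gcov` come from the AutoFDO README, not from the slide.

```python
import shlex
from typing import List


def perf_raw_event(event: int, umask: int, name: str, period: int, modifiers: str = "pp") -> str:
    """Build a perf raw PMU event spec: cpu/event=...,umask=...,name=...,period=.../mods."""
    return f"cpu/event={event:#x},umask={umask:#x},name={name},period={period}/{modifiers}"


def autofdo_commands(src: str, exe: str, gcov: str) -> List[List[str]]:
    """Return the AutoFDO steps (build, profile, convert, rebuild) as argv lists."""
    event = perf_raw_event(0xC4, 0x20, "br_inst_retired_near_taken", 1000009)
    return [
        # Optimised production build with debug info, so samples map back to source lines.
        ["gcc", "-O3", "-ggdb", src, "-o", exe],
        # Sample taken near branches with branch-stack (-b) records; output goes to perf.data.
        ["perf", "record", "-b", "-e", event, "./" + exe],
        # Convert perf.data into a gcov-style profile (flags as in the AutoFDO README; assumption).
        ["create_gcov", f"--binary=./{exe}", "--profile=perf.data", f"--gcov={gcov}"],
        # Rebuild using the converted profile.
        ["gcc", "-O3", f"-fauto-profile={gcov}", src, "-o", exe],
    ]


if __name__ == "__main__":
    for argv in autofdo_commands("test.c", "test", "test.gcov"):
        print(shlex.join(argv))
```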
Training data              | Run      | Number of events
FullCMS run, 100 events    | FullCMS  | 100, 500, 1k
FullCMS run, 500 events    | FullCMS  | 100, 500, 1k
FullCMS run, 1k events     | FullCMS  | 100, 500, 1k
[Figure: runtime in s of the Normal build vs. AutoFDO builds trained on 100, 500 and 1000 events; panels: processing 100, 500 and 1000 events]
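Reading the bars as a relative gain is simple arithmetic. A minimal sketch, assuming each configuration was timed several times; taking the median of the repeats is a choice made here, not something the slides state, and the function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def relative_gain(t_normal: float, t_autofdo: float) -> float:
    """Runtime reduction of the AutoFDO build in percent of the normal build (negative = slower)."""
    if t_normal <= 0:
        raise ValueError("baseline runtime must be positive")
    return 100.0 * (t_normal - t_autofdo) / t_normal


def median_gain(normal_runs: Sequence[float], autofdo_runs: Sequence[float]) -> float:
    """Compare the median runtimes of repeated measurements of both builds."""
    return relative_gain(median(normal_runs), median(autofdo_runs))
```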
Training data                | Run             | Number of events
cmsRun config1, 20 events    | cmsRun config1  | 20, 50, 100
cmsRun config1, 50 events    | cmsRun config1  | 20, 50, 100
cmsRun config1, 100 events   | cmsRun config1  | 20, 50, 100
[Figure: runtime in s of the Normal build vs. AutoFDO builds trained on 20, 50 and 100 events; panels: processing 20, 50 and 100 events]
Training data                | Run             | Number of events
cmsRun config1, 100 events   | cmsRun config2  | 20, 50, 100
[Figure: runtime in s of the Normal build vs. the AutoFDO build trained on 100 events; panels: processing 20, 50 and 100 events]
Training data                | Run             | Number of events
cmsRun config1, 100 events   | cmsRun config2  | 20, 50, 100
cmsRun config2, 100 events   | cmsRun config2  | 20, 50, 100
[Figure: runtime in s of the Normal build vs. two AutoFDO builds, each trained on 100 events; panels: processing 20, 50 and 100 events]
Training data                | Run                 | Number of events
fullcms, 100 events          | cmsRun job config2  | 20, 50, 100
[Figure: runtime in s of the Normal build vs. the AutoFDO build trained on 100 fullcms events; panels: processing 20, 50 and 100 events]
libG4processes.so libAnnotated
G4EnhancedVecAllocator.hh;122;146;0;10000;9550;d9a18bb69d5efaf3d9068625ec56d66a
G4EnhancedVecAllocator.hh;137;8389;0;225;450;6a740d527b3f213d4868919fc7d9710c
G4EnhancedVecAllocator.hh;135;8389;0;10000;9550;a17d8feb82daee40febb118864576dc9
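The annotated entries are semicolon-separated. A parsing sketch, assuming the first field is the source file, the second a line number and the last a 32-hex-digit digest; the slide does not say what the numeric fields in between mean, so they are kept as an unnamed list. `AnnotatedRecord`, `parse_record` and `parse_records` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedRecord:
    source: str          # file name, e.g. G4EnhancedVecAllocator.hh
    line: int            # assumed: source line number
    values: List[int]    # numeric fields whose meaning the slide does not state
    digest: str          # assumed: trailing hash of the entry


def parse_record(text: str) -> AnnotatedRecord:
    """Split one 'file;line;n1;...;nk;digest' entry."""
    fields = text.strip().split(";")
    if len(fields) < 3:
        raise ValueError(f"not an annotated record: {text!r}")
    source, line, *numbers, digest = fields
    return AnnotatedRecord(source, int(line), [int(n) for n in numbers], digest)


def parse_records(blob: str) -> List[AnnotatedRecord]:
    """Parse whitespace-separated entries (file names containing spaces are not supported)."""
    return [parse_record(token) for token in blob.split()]
```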
[Figure: basic block counts (×10^4) per source location (file:line in Geant4 sources such as G4VEnergyLossProcess.cc, G4UniversalFluctuation.cc and G4Transportation.cc) for 20, 50 and 100 events]
[Figure: branch probability (×10^4) at the same source locations for 20, 50 and 100 events, and without profile]
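The branch-probability axis is scaled by 10^4. Assuming the plotted values use that base (older GCC releases expressed branch probabilities as integers out of REG_BR_PROB_BASE = 10000), a taken/not-taken count pair from a profile maps onto the same scale as in this sketch. `branch_probability` is a hypothetical helper; returning None marks branches without samples, where the compiler has to fall back on static heuristics.

```python
from typing import Optional

PROB_BASE = 10_000  # assumption: probabilities expressed out of 10^4, as on the plot axis


def branch_probability(taken: int, not_taken: int, base: int = PROB_BASE) -> Optional[int]:
    """Probability that a branch is taken, scaled to `base`, rounded to nearest (ties up)."""
    if taken < 0 or not_taken < 0:
        raise ValueError("counts must be non-negative")
    total = taken + not_taken
    if total == 0:
        # No samples for this branch: no profile-based estimate is possible.
        return None
    return (taken * base + total // 2) // total
```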
1. Start perf together with the job
2. Gather profiles
3. Convert and merge profiles
4. Add compiler flag in CMake scripts
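The four steps can be sketched as command builders. This assumes each job is launched through `perf record` with the same event as before and its own output file (`-o`), that `create_gcov` accepts `--profile`/`--gcov` as in the AutoFDO README, and that the flag reaches the build through `CMAKE_CXX_FLAGS`; the job command, file names and helper names are hypothetical. The merge in step 3 is not shown because the slide does not name the tool used.

```python
from typing import List

EVENT = "cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=1000009/pp"


def profiled_job(job_argv: List[str], perf_data: str) -> List[str]:
    """Step 1: start perf together with the job; each job writes its own perf data file."""
    return ["perf", "record", "-b", "-e", EVENT, "-o", perf_data, "--", *job_argv]


def convert(binary: str, perf_data: str, gcov: str) -> List[str]:
    """Step 3 (conversion only): turn one perf data file into a gcov-style profile."""
    return ["create_gcov", f"--binary={binary}", f"--profile={perf_data}", f"--gcov={gcov}"]


def cmake_configure(source_dir: str, profile: str) -> List[str]:
    """Step 4: pass -fauto-profile to C++ compilations via CMAKE_CXX_FLAGS.

    CMake combines CMAKE_CXX_FLAGS with the build-type flags (e.g. those of Release).
    """
    return ["cmake", f"-DCMAKE_CXX_FLAGS=-fauto-profile={profile}", source_dir]


if __name__ == "__main__":
    jobs = [["./fullcms", "run_a.mac"], ["./fullcms", "run_b.mac"]]  # hypothetical jobs
    for i, job in enumerate(jobs):
        print(profiled_job(job, f"job{i}.perf.data"))
        print(convert(job[0], f"job{i}.perf.data", f"job{i}.gcov"))
    print(cmake_configure("../geant4-source", "merged.gcov"))  # hypothetical paths
```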
The perf profile reported G4NeutronHPInelasticCompFS::SelectExitChannel as the number-one hotspot, with 5.9% of the br_inst_retired_near_taken events.
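The 5.9% figure is a share of the sampled br_inst_retired_near_taken events. A sketch of that arithmetic, assuming per-symbol sample counts have already been extracted from the perf profile (the extraction is not shown); `hotspots` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def hotspots(samples: Dict[str, int], top: int = 5) -> List[Tuple[str, float]]:
    """Rank symbols by sample count and report each one's share of all samples in percent."""
    total = sum(samples.values())
    if total == 0:
        return []
    return [(symbol, 100.0 * count / total) for symbol, count in Counter(samples).most_common(top)]
```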
gcc:
G4NeutronHPInelasticCompFS.cc:182:5: note: Unroll loop 9 times
G4NeutronHPInelasticCompFS.cc:168:3: note: Unroll loop 6 times
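Optimisation notes like these are easier to compare between a normal and an AutoFDO build once parsed. A sketch that matches exactly the note wording shown above; other gcc versions or options may phrase the messages differently, and `UnrollNote`, `parse_unroll_notes` and `unroll_diff` are hypothetical names.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class UnrollNote(NamedTuple):
    file: str
    line: int
    column: int
    factor: int


NOTE_RE = re.compile(r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): note: Unroll loop (?P<n>\d+) times")


def parse_unroll_notes(text: str) -> List[UnrollNote]:
    """Collect every 'file:line:col: note: Unroll loop N times' line from a build log."""
    notes = []
    for raw in text.splitlines():
        m = NOTE_RE.match(raw.strip())
        if m:
            notes.append(UnrollNote(m["file"], int(m["line"]), int(m["col"]), int(m["n"])))
    return notes


def unroll_diff(normal: List[UnrollNote], autofdo: List[UnrollNote]) -> Dict[Tuple[str, int], Tuple[Optional[int], Optional[int]]]:
    """Map each loop location to its unroll factor in both builds (None = no note in that build)."""
    a = {(n.file, n.line): n.factor for n in normal}
    b = {(n.file, n.line): n.factor for n in autofdo}
    return {key: (a.get(key), b.get(key)) for key in sorted(a.keys() | b.keys())}
```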
>>>readelf -S fullcms | less
There are 46 section headers, starting at offset 0x176209c0:
Section Headers:
[Nr] Name Type Address Offset Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000
[ 1] .note.ABI-tag NOTE 0000000000400190 00000190 0000000000000020 0000000000000000 A 4
[...]
[29] .gnu.switches.tex PROGBITS 0000000000000000 01712830 0000000000a00569 0000000000000000 1
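Without `-W` (`--wide`), `readelf -S` can cut long section names short, so a name such as `.gnu.switches.tex` in this listing may be truncated. A stdlib-only Python sketch that reads the section header table directly and returns full names; it assumes a little-endian ELF64 file (as on x86-64) and ignores extended section numbering. The function name is hypothetical; `readelf -S -W` gives the same information.

```python
import struct
import sys
from typing import List, Tuple


def elf64_sections(data: bytes) -> List[Tuple[str, int, int]]:
    """Return (name, file offset, size) for each section of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64 files")
    # ELF64 header: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A, 0x3C, 0x3E.
    (shoff,) = struct.unpack_from("<Q", data, 0x28)
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if shnum == 0:
        return []  # no section headers (or extended numbering, which this sketch ignores)
    # Elf64_Shdr: name, type (u32); flags, addr, offset, size (u64); link, info (u32); align, entsize (u64)
    headers = [struct.unpack_from("<IIQQQQIIQQ", data, shoff + i * shentsize) for i in range(shnum)]
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    names = data[str_off:str_off + str_size]
    sections = []
    for sh_name, _type, _flags, _addr, offset, size, *_rest in headers:
        end = names.index(b"\0", sh_name)
        sections.append((names[sh_name:end].decode(), offset, size))
    return sections


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for name, offset, size in elf64_sections(f.read()):
            print(f"{name:40s} offset={offset:#x} size={size:#x}")
```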
>>>head
/data/geant4.10.01.p03/source/event/src/G4EventManager.cc: /data/geant4.10.01.p03/
/data/geant4.10.01.p03/source/event/src/G4SmartTrackStack.cc: /data/geant4.10.01.p03/
/data/geant4.10.01.p03/source/event/src/G4StackManager.cc: /data/geant4.10.01.p03/
/data/geant4.10.01.p03/source/externals/clhep/src/Evaluator.cc: /data/geant4.10.01.p03/
/data/geant4.10.01.p03/source/externals/clhep/src/LorentzRotation.cc
/data/geant4.10.01.p03/source/externals/clhep/src/LorentzVector.cc
/data/geant4.10.01.p03/source/externals/clhep/src/LorentzVectorL.cc