OPTIMIZING BUILDS
ON WINDOWS
SOME PRACTICAL CONSIDERATIONS
Alexandre Ganea, Ubisoft alexandre.ganea@ubisoft.com 2019 Bay Area LLVM Developers' Meeting, Oct.22-23
SUMMARY
PART 1
PREAMBLE
PART 2
EXPERIMENTS
PART 3
PROPOSAL
PART 4
NEXT STEPS
PART 1
CHALLENGES
Lines of Code (Assassin's Creed, Far Cry)
Game production constraints @ Ubisoft

Editor Build
  20,000 .CPP
  25,000 .H
  23 GB .OBJ
  9 GB .DEBUG$T
  10 M TYPE RECORDS
  42 M SYMBOLS
  300 M .EXE
  2 GB .PDB
  Windows 10
  Fastbuild, distributed
  Always Unity builds

Concurrent AAA games: 20 - 25
LoC/game: 30 - 50 M
Programmers/title: 100 - 250
Code Changes/day: 100 - 150 (peak: 400)
Build targets/platform: 5 - 6
Platforms/Game: 4+
Code workspace: 70 - 100 GB
Data workspace: 100 - 200 GB
Game builds/day: 100 - 150
Stripped Build: 1 - 6 GB
Final Build: 50 - 90 GB
AAA GAME, CLEAN REBUILD, X64 EDITOR RELEASE (FASTBUILD)
[Bar chart. Series: Compiler, Linker. Configurations: 2017 (MSVC); 2018 (MSVC); Fall 2018 (MSVC + LLD); 2019 (MSVC + LLD); 2019 (Clang); 100% cache hit, local SSD; 100% cache hit, 1 Gbps network.]
Data labels, in extraction order: 08 min 50 sec, 08 min 33 sec, 08 min 50 sec, 08 min 20 sec, 07 min 00 sec, 04 min 00 sec, 04 min 15 sec, 10 min 20 sec, 06 min 46 sec, 01 min 18 sec, 43 sec, 29 sec, 29 sec, 29 sec.
PART 2
EXPERIMENTS
FASTBUILD CACHE READ ALGORITHM
[Flow diagram. Commands: clang-cl /E; md5sum; curl https://store/; clang-cl; clang-scan-deps; while read x; do md5sum $x; done. Files: a.cpp, deps.txt, deps+MD5.txt. Outcomes: found, not found. Timings: 5-10 sec, 0.02 sec, 0.02 sec.]
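The labels on this slide suggest a cache lookup keyed on hashes of a translation unit's inputs. Below is a minimal sketch of how such a key could be derived from a dependency list, assuming the key chains the command line with each dependency's path and content hash. It is not FASTBuild's actual implementation: `Dep`, `fnv1a` and `cacheKey` are hypothetical names, and FNV-1a stands in for MD5, which the C++ standard library does not provide.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One input of a translation unit: its path and its bytes.
struct Dep {
  std::string Path;
  std::string Contents;
};

// FNV-1a (64-bit), used here only as a stand-in for MD5.
inline std::uint64_t fnv1a(const std::string &Bytes,
                           std::uint64_t Seed = 14695981039346656037ull) {
  std::uint64_t H = Seed;
  for (unsigned char C : Bytes) {
    H ^= C;
    H *= 1099511628211ull;
  }
  return H;
}

// Hypothetical cache key: chains the command line with each dependency's
// path and content hash, in the order the dependency scanner reported them.
inline std::uint64_t cacheKey(const std::string &CommandLine,
                              const std::vector<Dep> &Deps) {
  assert(!CommandLine.empty() && "a key needs the compiler invocation");
  std::uint64_t Key = fnv1a(CommandLine);
  for (const Dep &D : Deps) {
    Key = fnv1a(D.Path, Key);
    Key = fnv1a(std::to_string(fnv1a(D.Contents)), Key);
  }
  return Key;
}
```

With a key of this kind, a change to any listed header or to the command line yields a different key, so the cache lookup can be decided from file hashes alone, without preprocessing the source first.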
100% NETWORK CACHE HITS
AAA GAME, X64 EDITOR RELEASE (FASTBUILD)
[Bar chart. Series: Compiler/Cache, Linker. Configurations: VS2017 15.9.16; Network cache; Network cache + clang-scan-deps. Annotations: clang-scan-deps + network cache; LLD (MSVC OBJs + ghash).]
Data labels, in extraction order: 06 min 10 sec, 04 min 05 sec, 35 sec, 40 sec, 40 sec, 40 sec.
Intel Xeon W-2135 @ 3.7 GHz, 128 GB, NVMe SSD, 1Gbps Network
7 GB –> 22.6 GB 50k files
11.5% process time
CLANG-SCAN-DEPS STANDALONE (50K FILES)
avg ~90% cpu
Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD
STRINGMAP
sizeof(std::error_code) -> 16 bytes
sizeof(llvm::ErrorOr<DirectoryEntry&>) -> 24 bytes
sizeof(llvm::StringMapEntry<llvm::ErrorOr<DirectoryEntry&>>) -> 32 bytes (+string contents)
DOWN THE RABBIT HOLE
STRINGMAP: MEMORY LAYOUT
[Diagram labels. Bucket array of StringMapEntry* (NumBuckets entries): nullptr, nullptr, nullptr, 0x15f238a92, nullptr, nullptr, nullptr. Hash array of uint32_t (NumBuckets entries): 0x12345678. StringMapEntry fields: size_t count, T value, string (count bytes).]
STRINGMAP (VTUNE)
STRINGMAP STATS
187 M samples
[Histogram: Hash collisions / call. X axis: 1 to 49. Data labels, in extraction order: 60.2%, 14.7%, 8.0%, 5.3%, 3.5%, 1.7%, 0.8%, 0.5%, 0.2%, 0.1%, 0.1%.]
[Histogram: Cachelines hit / call. X axis: 1 to 45. Data labels, in extraction order: 79.4%, 11.7%, 3.5%, 1.6%, 1.0%, 0.5%, 0.3%, 0.2%, 0.1%, 0.1%.]
DenseMap<uint64_t,T> + xxHash64() + StringSaver
DenseMap<__int128,T> + XXH128() + StringSaver
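The two lines above propose keying the map on a precomputed hash and keeping the strings in separate storage. The following is a minimal sketch of that idea using only the standard library, so every piece is a stand-in: `std::unordered_map` for `DenseMap`, FNV-1a for `xxHash64()`, and `std::deque<std::string>` for `StringSaver`. `HashedStringMap`, `tryEmplace` and `lookup` are hypothetical names. The sketch assumes that two distinct keys never share a 64-bit hash; a real implementation would have to justify that, for example with the 128-bit variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>

// FNV-1a (64-bit); stand-in for xxHash64().
inline std::uint64_t hash64(const std::string &S) {
  std::uint64_t H = 14695981039346656037ull;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ull;
  }
  return H;
}

template <typename T> class HashedStringMap {
public:
  // Inserts Value under Key if absent. Returns the stored value and whether
  // an insertion happened. Keys are compared by hash only (see assumption).
  std::pair<T *, bool> tryEmplace(const std::string &Key, T Value) {
    const std::uint64_t H = hash64(Key);
    auto It = Map.find(H);
    if (It != Map.end())
      return {&It->second.Value, false};
    Saved.push_back(Key); // deque keeps element addresses stable
    auto Res = Map.emplace(H, Entry{&Saved.back(), std::move(Value)});
    return {&Res.first->second.Value, true};
  }

  // Returns the value stored under Key, or nullptr if there is none.
  const T *lookup(const std::string &Key) const {
    auto It = Map.find(hash64(Key));
    return It == Map.end() ? nullptr : &It->second.Value;
  }

  std::size_t size() const { return Map.size(); }

private:
  struct Entry {
    const std::string *Key; // points into Saved; kept for diagnostics
    T Value;
  };
  std::deque<std::string> Saved;                // stand-in for StringSaver
  std::unordered_map<std::uint64_t, Entry> Map; // stand-in for DenseMap
};
```

The design choice illustrated here is that a lookup compares fixed-size integers and never touches the key bytes, which is what removes the string comparisons and the extra pointer chase per probe.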
LINK AAA GAME, X64 EDITOR RELEASE (22.8 GB MSVC OBJS)
[Bar chart. Configurations: VS2019 16.2; LLD 9.0; LLD 8 + // GHASH. Data labels, in extraction order: 58 sec, 62 sec, 49 sec.]
Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD
[Bar chart. Configurations: Clang 9.0, no Ghash; Clang 8.0 + // Ghash (12-byte buckets); Clang 8.0 + // Ghash (8-byte buckets); Clang 8.0 + // Ghash (8-byte buckets) + 2MB pages. Data labels, in extraction order: 19.21 s, 7.42 s, 5.29 s, 4.20 s.]
[Diagram labels. 12-byte bucket: GHash (uint64_t), TypeIndex (uint32_t). 8-byte bucket: GHash and TypeIndex (uint64_t).]
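The labels above contrast a 12-byte bucket with an 8-byte bucket that holds both GHash and TypeIndex. One way such packing could work is sketched below, under stated assumptions: the upper 32 bits of the cell keep a truncated hash, the lower 32 bits keep the TypeIndex plus one (so an all-zero cell means "empty"), and a bucket is claimed with a single atomic compare-and-swap. The bit split, the linear probing and the names `packCell` and `PackedTable` are hypothetical and are not taken from the slides or from LLD.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-byte cell: high 32 bits = truncated hash,
// low 32 bits = TypeIndex + 1 (0 is reserved for "empty").
inline std::uint64_t packCell(std::uint64_t GHash, std::uint32_t TypeIndex) {
  assert(TypeIndex != UINT32_MAX && "TypeIndex + 1 must fit in 32 bits");
  return (GHash & 0xFFFFFFFF00000000ull) | (std::uint64_t(TypeIndex) + 1);
}
inline std::uint32_t cellHashBits(std::uint64_t Cell) {
  return std::uint32_t(Cell >> 32);
}
inline std::uint32_t cellTypeIndex(std::uint64_t Cell) {
  return std::uint32_t(Cell) - 1;
}

class PackedTable {
public:
  explicit PackedTable(std::size_t NumBuckets) : Buckets(NumBuckets) {
    assert(NumBuckets > 0);
    for (auto &B : Buckets)
      B.store(0, std::memory_order_relaxed);
  }

  // Returns the TypeIndex that owns this hash: the existing one if the hash
  // is already present, otherwise the one just inserted.
  std::uint32_t insertOrGet(std::uint64_t GHash, std::uint32_t TypeIndex) {
    const std::uint64_t New = packCell(GHash, TypeIndex);
    std::size_t I = GHash % Buckets.size();
    for (std::size_t Probes = 0; Probes < Buckets.size(); ++Probes) {
      std::uint64_t Cur = Buckets[I].load(std::memory_order_acquire);
      if (Cur == 0 && Buckets[I].compare_exchange_strong(
                          Cur, New, std::memory_order_acq_rel))
        return TypeIndex;
      // Cur now holds the occupant (pre-existing or a racing insert).
      if (cellHashBits(Cur) == cellHashBits(New))
        return cellTypeIndex(Cur);
      I = (I + 1) % Buckets.size(); // linear probing
    }
    assert(false && "table full");
    return TypeIndex;
  }

private:
  std::vector<std::atomic<std::uint64_t>> Buckets;
};
```

Because the whole bucket fits in one 64-bit word, threads can insert concurrently without a lock, and more buckets fit in each cache line than with a 12-byte layout. The cost in this sketch is that only 32 hash bits are compared, so two hashes that differ only in their lower half are treated as equal.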
COMPILING WITH CLANG 9.0
93 ms
CLANG CC1 IN PROCMON
int main(int argc_, const char **argv_) {
  noteBottomOfStack();
  llvm::InitLLVM X(argc_, argv_);
  SmallVector<const char *, 256> argv(argv_, argv_ + argc_);
  if (llvm::sys::Process::FixupStandardFileDescriptors())
    return 1;
  llvm::InitializeAllTargets();
  return ClangDriverMain(argv);
}

int ClangDriverMain(SmallVectorImpl<const char *>& argv) {
  static LLVM_THREAD_LOCAL bool EnterPE = true;
  if (EnterPE) {
    llvm::sys::DynamicLibrary::AddSymbol("ClangDriverMain", (void*)(i..
    EnterPE = false;
  } else {
    llvm::cl::ResetAllOptionOccurrences();
  }
  auto TargetAndMode = ToolChain::getTargetAndModeFromProgramName(arg..
clang/tools/driver/driver.cpp
int Command::Execute(ArrayRef<llvm::Optional<StringRef>> Redirects,
                     std::string *ErrMsg, bool *ExecutionFailed) const {
  [...]
  typedef int (*ClangDriverMainFunc)(SmallVectorImpl<const char *> &);
  ClangDriverMainFunc ClangDriverMain = nullptr;
  [...]
  if (ClangDriverMain) {
    [...]
    llvm::CrashRecoveryContext CRC;
    CRC.EnableExceptionHandler = true;
    const void *PrettyState = llvm::SavePrettyStackState();
    int Ret = 0;
    auto ExecuteClangMain = [&]() { Ret = ClangDriverMain(Argv); };
    if (!CRC.RunSafely(ExecuteClangMain)) {
      llvm::RestorePrettyStackState(PrettyState);
      return CRC.RetCode;
    }
    return Ret;
  } else {
    auto Args = llvm::toStringRefArray(Argv.data());
    return llvm::sys::ExecuteAndWait(Executable, Args, Env, Redirects,
                                     /*secondsToWait*/ 0, /*memoryLimit*/ 0,
                                     ErrMsg, ExecutionFailed);
  }
}
clang/lib/Driver/Job.cpp
MAKING CC1 REENTRANT
CLANG DRIVER & CC1 MERGED
BYPASSING THE CC1 PROCESS
CLEAN REBUILD LLVM, CLANG & LLD
[Bar chart. Series: VS2019 16.2; Clang 9.0; Clang 9.0 + cc1 bypass. Machines: 6-core - W10 build 1803; 6-core - W10 build 1903; 36-core - W10 build 1709.]
Data labels, in extraction order: 34 min 00 sec, 28 min 00 sec, 12 min 00 sec, 32 min 30 sec, 30 min 16 sec, 13 min 10 sec, 22 min 46 sec, 19 min 54 sec, 07 min 10 sec.
LINKING RAINBOW6: SIEGE WITH THINLTO :-(
96% idle 4%
THINLTO: ALLOCATOR CONTENTION
$ LD_PRELOAD=/path/to/my/malloc.so /bin/ls
#include "rpmalloc/rpmalloc.c"

extern "C" {
_ACRTIMP _CRTRESTRICT void *malloc(size_t size) { return rpmalloc(size); }
_ACRTIMP void free(void *p) { rpfree(p); }
_ACRTIMP _CRTRESTRICT void *calloc(size_t n, size_t elem_size) {
  return rpcalloc(n, elem_size);
}
_ACRTIMP _CRTRESTRICT void *realloc(void *ptr, size_t size) {
  return rprealloc(ptr, size);
}
}

// Bypass CRT debug allocator
#ifdef _DEBUG
void *operator new(decltype(sizeof(0)) n) noexcept(false) { return malloc(n); }
void __CRTDECL operator delete(void *const block) noexcept { free(block); }
void *operator new[](std::size_t s) throw(std::bad_alloc) { return malloc(s); }
void operator delete[](void *p) throw() { free(p); }
#endif
https://github.com/mjansson/rpmalloc
llvm/lib/Support/Windows/Memory.inc
REPLACING THE CRT ALLOCATOR
THINLTO (CLEAN REBUILD)
RAINBOW 6: SIEGE, PC GAME PROFILE
[Bar chart. Series: VS 2017 15.9.16; Clang 9.0 ThinLTO; Clang 9.0 ThinLTO + rpmalloc. Machines: 6-core (W10 build 1903); 36-core (W10 build 1709).]
Data labels, in extraction order: 57 min 00 sec, 20 min 13 sec, 16 min 19 sec, 37 min 12 sec, > 1 h 30 min, 03 min 57 sec.
PART 3
PROOF-OF-CONCEPT
FASTBUILD
clang.exe lld-link.exe llvm-tblgen.exe clang-tblgen.exe llvm-lib.exe ml64.exe (masm) rc.exe cmake.exe
PREVIOUS BUILD PROCESS
Image Credit: Caterpillar
FASTBUILD
ml64.exe (masm) rc.exe cmake.exe
LLVM-BUILDOZER
clang.exe lld-link.exe llvm-tblgen.exe clang-tblgen.exe llvm-lib.exe
BUILDING WITH BUILDOZER
FASTBUILD
ml64.exe (masm) rc.exe cmake.exe
Worker 1 Worker 2 Worker 3 Worker 4 Worker 5 Local Local Local
int buildozer::ImportEXE(llvm::StringRef EXE) {
  [..]
  HINSTANCE H = LoadLibraryA(EXE.data());
  if (!H)
    return 0;
  RemapIAT(H);
  InitDebInfo();
  PatchRPMalloc(M);
  InitializeStaticTLS(H);
  InitializeCRT(M);
  FindEntryPoints(M);
  [..]
}
RUNNING THE DOZER
“LoadLibrary can also be used to load other executable modules. [..] However, do not use LoadLibrary to run an .exe file. Instead, use the CreateProcess function.” (MSDN)
int buildozer::ImportEXE(llvm::StringRef EXE) {
  [..]
  HINSTANCE H = LoadLibraryA(EXE.data());
  if (!H)
    return 0;
  RemapImportAddressTable(H);
  InitDebInfo();
  PatchRPMalloc(M);
  InitializeStaticTLS(H);
  InitializeCRT(M);
  FindEntryPoints(M);
  [..]
}
RUNNING THE DOZER
Pool.emplace(NumWorkers, [&]() {
  while (true) {
    buildozer::WorkUnit *WU = AcquireWork(..);
    if (!WU)
      break;
    int Mod = IdentifyMOD(WU);
    llvm::CrashRecoveryContext CRC;
    CRC.RunSafely([&] {
      buildozer::Launch(Mod, WU->Directory, WU->Arguments);
    });
    [..]
  }
});
Pool.join();
RUNNING THE DOZER
Local build, AAA game, x64 Editor Release
Clang 9.0: 19 min 34 sec
Buildozer: 11 min 53 sec
Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD
PART 4
NEXT STEPS
SHORT TERM
LONG TERM
[Diagram labels: PLATFORM (label repeated 5 times), each with BUILD TARGET (label repeated 6 times); DAILY COMMITS; ACTIVE BRANCHES; GAME PRODUCTION.]
Multipliers, in extraction order: 5 min, x6, x6, x100, x4, x20 (x20 appears next to GAME PRODUCTION).
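Read as one product, the figures on this slide give a total build time. This is an interpretation: the slide shows the factors but no total, and the extracted text does not say which multiplier belongs to which label, so the factors are multiplied in the order they appear. If one of the factors is the number of daily commits, the result is build time per day.

```latex
\begin{align*}
5\,\text{min} \times 6 \times 6 \times 100 \times 4 \times 20
  &= 1{,}440{,}000\ \text{min} \\
  &= 24{,}000\ \text{h} \\
  &= 1{,}000\ \text{machine-days}
\end{align*}
```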
Alexandre Ganea, Ubisoft alexandre.ganea@ubisoft.com