Building openSUSE with link-time optimizations
Jan Hubička and Martin Liška
SUSElabs jh@suse.cz, mliska@suse.cz

Outline
– What is link-time optimization?
– Link-time optimization and GCC
– Benchmarks
– Can we build openSUSE with …
Figure: traditional build. File1.c … File4.c are each compiled by GCC into File1.o … File4.o; ld / gold links the objects into a.out.
Figure: LTO build. File1.c … File4.c are each compiled by GCC into File1.o … File4.o, which also carry the IL; ld / gold, through the LTO plug-in, invokes the link-time compiler and produces a.out.
(From the linker’s resolution data, most symbols become “static”.)
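The remark about symbols becoming "static" can be illustrated with a minimal sketch. It assumes a hypothetical two-file program; the file names and the functions `scale` and `sum_scaled` are invented for illustration and are shown here in a single listing.

```c
/* Hypothetical example: two translation units shown in one listing.
 * The names scale, sum_scaled, file1.c and file2.c are assumptions,
 * not taken from the talk. */

/* ---- file1.c ---- */
/* External linkage: when each file is compiled on its own, the compiler
 * must keep an out-of-line copy, because any other object file or
 * library might call it. */
int scale(int x)
{
    return 3 * x;
}

/* ---- file2.c ---- */
int scale(int x);   /* in a real project this would come from a header */

/* With -flto, the linker's resolution data can tell GCC that scale() is
 * referenced only from inside the program. GCC may then treat it as if
 * it had been declared static and, for instance, inline it here. */
int sum_scaled(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += scale(values[i]);
    return total;
}
```

Split into the two files, this could be compiled with `gcc -O2 -flto -c file1.c file2.c` and linked with `gcc -O2 -flto` together with a file that provides `main`. Whether `scale` is actually localized or inlined depends on the compiler version and its heuristics.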
1999 (GCC 2.95): Function-at-a-time
2001 (GCC 3.0): New inliner (first high-level opt. in GCC)
2004 (GCC 3.4): Unit-at-a-time; intermodule compilation for C; inter-procedural …
2005 (GCC 4.0): New SSA optimization framework
2006 (GCC 4.1): Inter-procedural optimizations: profile-guided inlining, pure/const discovery, mod/ref, inter-procedural constant propagation, …
2008 (GCC 4.4): Inter-procedural optimization on SSA; early optimization and inlining
2010 (GCC 4.5): Basic LTO framework (5 years in development)
2011 (GCC 4.6): WHOPR (parallel link-time optimization); Firefox builds
Figure: WHOPR build. File1.c … File4.c are each compiled by GCC into File1.o … File4.o with IL; ld / gold, through the LTO plug-in, runs one whole-program analysis followed by several local optimization steps and produces a.out.
2012 (GCC 4.7.0): Memory use optimizations; new inliner heuristics; new inter-procedural constant propagation with cloning
2013 (GCC 4.8.0): Symbol table; propagation of values passed through aggregates
2014 (GCC 4.9.0): Slim LTO objects by default; on-demand loading of functions; devirtualization pass; feedback-directed code layout
2015 (GCC 5): Identical Code Folding; COMDAT optimization; One Definition Rule for C++; alignment propagation; correct command line options handling with LTO
2016 (GCC 6): Linker plug-in now detects type of output binary; C & Fortran type merging; better alias analysis
2017 (GCC 7): Inter-procedural value range propagation; bitwise propagation
2018 (GCC 8): Early debug info; profile representation rewrite; function splitting now by default; reworked runtime estimation; malloc attribute propagation
Compile time
– Parser, IL generation
– Early opts: early inliner, constant prop., forward prop., jump threading, scalar repl. of aggr., alias analysis, redundancy elim., dead store elim., dead code elim., tail recursion, switch conversion, pure/const/nothrow, EH optimization, profile guessing
– IP analysis, streaming out

Link-time serial
– Symbol & type streaming in + merging
– Inter-procedural (whole program) opts: dead symbol elim., symbol promotion, profile analysis, identical code folding, devirtualization, constant propagation, constructor/destructor merging, inlining, pure/const/nothrow, mod/ref, comdat
– Partitioning, streaming out

Link-time parallel
– Streaming in symbols, types and declarations & link
– Stream in and apply transformations
– High level opts: constant prop., complete unroll, forward prop., alias analysis, return slot opt., redundancy elim., jump threading, dead code elim., conditional store elim., copy prop., if combine, tail recursion, copy loop headers, scalar repl. of aggr., dead store elim., dead code elim., reassociation, sincos, bswap opt., loop invariant motion, partial redundancy elim., loop splitting, unroll and jam, loop distribution, loop interchange, ...
– Low level opts: common subexpression elim., forward propagation, copy propagation, partial redundancy elim., code hoisting, copy propagation, store motion, if conversion, loop invariant motion, loop unrolling, doloop optimization, web construction, copy propagation, common subexpression elim., dead store elim., instruction combine, function partitioning, instruction splitting, live range shrinking, scheduling, register allocation, global common subexpr. elim., shrink wrapping, stack adjustment opt., register renaming, constant prop., code reordering, scheduling, x87 register stack, code/data alignment, machine dependent reorg., code output
Figure: benchmark summary, generic vs. native tuning. Configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto, clang/flang 6 -Ofast -flto, ICC 18 -Ofast -flto.
Figure: GCC -Ofast relative to GCC 6 -O2. Series: GCC 6, GCC 7, GCC 8.
Figure: Clang & ICC -Ofast relative to GCC 8. Series: clang/flang 6, ICC 18.
Figure: GCC -Ofast -flto relative to -Ofast.
Figure: Clang/flang 6 and ICC 18 relative to GCC 8 (-O2 -flto). Series: clang/flang 6, ICC 18.
Benchmarks in the per-benchmark figures: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean.
The hmmer benchmark has problems and was excluded.
To build with profile feedback, you can often use:
    ./configure
    CFLAGS="-O2 -fprofile-generate" make
    make check
    make clean
    CFLAGS="-O2 -fprofile-use" make
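As a rough illustration of the kind of code that the profile-feedback recipe above can help with, consider the sketch below. The function `count_valid` and its input convention (negative values count as errors) are hypothetical and not taken from the talk.

```c
#include <stddef.h>

/* Hypothetical hot loop with a rarely taken error path.
 * When built with -fprofile-generate, GCC instruments the code so that
 * the training run ("make check" in the recipe) records how often each
 * path executes. A rebuild with -fprofile-use reads those counts and
 * can use them for decisions such as code layout and inlining. */
size_t count_valid(const int *samples, size_t n, size_t *errors)
{
    size_t valid = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < 0) {
            /* Assumed to be cold in typical training runs. */
            (*errors)++;
            continue;
        }
        valid++;
    }
    return valid;
}
```

The benefit depends on how representative the training run is: if `make check` mostly exercises error paths, the recorded profile would mark the wrong branch as hot.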
Figure: Performance relative to GCC 8 -Ofast. Series: LTO, FDO, FDO+LTO. Benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean.
Figure (title not recovered): Non-LTO vs. LTO for GCC 7 -O2, GCC 8 -O2, GCC 7 -Ofast, GCC 8 -Ofast, GCC 8 -Ofast + FDO, Clang 6 -O2, Clang 6 -Ofast, ICC 18 -Ofast.
Figure (title not recovered): generic vs. native tuning for GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto, ICC 18 -Ofast -flto.
Figure: Firefox performance relative to non-LTO build. Series: static, profile feedback. Tests: responsiveness, tp5o, dromaeo dom, Displaylist mutate, tap paint, speedometer, a11yr, svgr_opacity, tp5o, ARES6, stylebench, tart, dromaeo css, tsvgx, startup time.
Figure: size breakdown by section (EH, data, relocations, text) for the following builds: clang6 -Oz -flto, clang6 -Oz -flto=thin, clang6 -O2 -flto, clang6 -O3 -flto=thin + FDO, clang6 -O3 -flto + FDO, clang6 -O3 -flto, clang6 -O3 -flto=thin, clang6 -Oz, clang6 -Os, clang6 -O2, clang6 -O3 + FDO, clang6 -O3, gcc8 -Os -flto, gcc8 -O3 -flto + FDO, gcc8 -O2 -flto, gcc8 -O3 -flto, gcc8 -Os, gcc8 -O3 + FDO, gcc8 -O2, gcc8 -O3, gcc8 -O3 -flto + FDO, gcc7 -O3 -flto + FDO, gcc6 -O3 -flto + FDO.
– mysql_upgrade (-92%)
– Innochecksum (-94%)
– mbstream (-94%)
LD PR: https://sourceware.org/PR23079
– __asm__(".symver old_foo,foo@VERS_1.1")
– New symver function attribute must be added (GCC 9.1.0)
– e2fsprogs, btrfsprogs, …
– Fat LTO objects must be used (-ffat-lto-objects)
– LTO ELF sections should be stripped
– OBS sanitizer should be extended
– LTO mode can combine both LTO objects and assembly objects
– ltrace: error: type of 'filter_matches_symbol' does not match
– gdb: error: type 'struct ipa_sym_addresses' violates the C++ One Definition Rule [-Werror=odr]
– Example of syscall.cc in Chromium project:
    asm volatile(".text\n"
                 ".align 16, 0x90\n"
                 ".type SyscallAsm, @function\n"
                 "SyscallAsm:.cfi_startproc\n"
                 …
– Error: nacl_helper.ltrans1.ltrans.o: In function `playground2::SandboxSyscall(int, long, long, long, long, long, long)': nacl_helper.ltrans1.o:(.text+0x4503): undefined reference to `SyscallAsm'
License
This slide deck is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license. It can be shared and adapted for any purpose (even commercially) as long as Attribution is given and any derivative work is distributed under the same license. Details can be found at https://creativecommons.org/licenses/by-sa/4.0/
General Disclaimer
This document is not to be construed as a promise by any participating organisation to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. openSUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for openSUSE products remains at the sole discretion of openSUSE. Further,
without obligation to notify any person or entity of such revisions or changes. All openSUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE LLC, in the United States and other countries. All third-party trademarks are the property of their respective owners.
Credits
Template: Richard Brown, rbrown@opensuse.org
Design & Inspiration: http://opensuse.github.io/branding-guidelines/