Link Time Optimization Mechanism in GCC-4.7.2

Uday Khedker (www.cse.iitb.ac.in/~uday)
GCC Resource Center, Department of Computer Science and Engineering, Indian Institute of Technology, Bombay
EAGCC-PLDI-14 (LTO in GCC-4.7.2), 13 June 2014



Motivation for Link Time Optimization

  • Default cgraph creation is restricted to a translation unit (i.e. a single file)
    ⇒ Interprocedural analysis and optimization is restricted to a single file
  • All files (or their equivalents) are available only at link time (assuming static linking)
  • LTO enables interprocedural optimizations across different files
Uday Khedker GRC, IIT Bombay

Link Time Optimization

  • LTO framework supported since GCC-4.5.0; its partitioned (WHOPR) mode is the default for -flto from GCC-4.6.0
  • Use the -flto option during compilation
  • This generates conventional .o files with GIMPLE-level information inserted
    ◮ Complete translation is performed in this phase
  • During linking, all object modules are put together and lto1 is invoked
  • lto1 re-executes the optimization passes, starting from the function cgraph_optimize

Basic Idea: Provide a larger call graph to the regular IPA passes

Understanding LTO Framework

    /* t0.c: GCC replaces this printf call with a call to puts */
    #include <stdio.h>

    int main () { printf ("hello, world\n"); }


Assembly Output without LTO Information (1)

        .file   "t0.c"
        .section    .rodata
    .LC0:
        .string "hello, world"
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        andl    $-16, %esp
        subl    $16, %esp
        movl    $.LC0, (%esp)
        call    puts
        leave
        .cfi_restore 5
        .cfi_def_cfa 4, 4
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 4.7.2"
        .section    .note.GNU-stack,"",@progbits

Assembly Output with LTO Information in GCC-4.7.2 (2)

        .ascii  "\b"
        .text
        .section    .gnu.lto_.refs.57f4e8b14959f6c4,"",@progbits
        .string "x\234cb`d`f```b\200\001"
        .string ""
        .string "\204"
        .ascii  "\t"
        .text
        .section    .gnu.lto_.statics.57f4e8b14959f6c4,"",@progbits
        .string "x\234cb`d`b\300\016@\342\214\020&"
        .string ""
        .string "\375"
        .ascii  "\t"
        .text
        .section    .gnu.lto_.decls.57f4e8b14959f6c4,"",@progbits
        .string "x\234\215R=O\002A\020\2359Ne\303IB!\201\n\032M\224h\374\00
        .string "\3218\311\313\275\333\233\2317\363n5@\020q@p(\2565\200E\34
        .string "\2004\370!\336mB\003~\2068\017\022tB\230`\020\232\2046\241
        .string "\022Z\023\372\b\345\247\261^\t\270\341v\357\355\210\307>\0

Assembly Output with LTO Information in GCC-4.7.2 (3)

        .string "\3474\030\205KN\321;\346\034\367L\324\031\304\301"
        .string "\3040\023\202\202\031\f\324\002&\336aT\261\"
        .string "\024\313k\260\004\017\\\306 O\245\323 \375\347iWu\001\232"
        .string "\"\343\245\226\225\032\242\322\306\004\024]\261\244'\246"
        .string "\273%\262\367P\3440\360\245A\b.8\257q~\302\263\257\341"
        .string "\377\r\037\020\236h\020A\257qK-\"\277\300hO\006g\262"
        .string "\347/vE^Ovc\036\032r\343\032\232\230a\324%.N\317G\006"
        .string "\366\3442L\222\270\242\334Q\201\216\307\334o\207\276\342"
        .string "\270%&\2661\3446E\377\037\374Q\320\364\013\"P\027\003\333|
        .string "\007\257\212^\335\254\252\353bD2\345\305\300\030\231\362"
        .string "\273\326#\372[\032l\230\031j\204$\334Jg9\r\237\236\363\356
        .string "\377\335\273%d\363\346V>\271\221J\301Teu\245"
        .ascii  "o\026\005\213."
        .text
        .section    .gnu.lto_.symtab.57f4e8b14959f6c4,"",@progbits
        .string "main"
        .string ""
        .string ""
        .string ""
        .string ""

Assembly Output with LTO Information in GCC-4.7.2 (4)

        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string "\240"
        .string ""
        .string ""
        .text
        .section    .gnu.lto_.opts,"",@progbits
        .string "'-fexceptions''-mtune=generic''-march=pentiumpro''-flto'"
        .text
        .section    .rodata
    .LC0:
        .string "hello, world"

Assembly Output with LTO Information in GCC-4.7.2 (5)

        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        andl    $-16, %esp
        subl    $16, %esp
        movl    $.LC0, (%esp)
        call    puts

Assembly Output with LTO Information in GCC-4.7.2 (6)

        leave
        .cfi_restore 5
        .cfi_def_cfa 4, 4
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .comm   __gnu_lto_v1,1,1
        .ident  "GCC: (GNU) 4.7.2"
        .section    .note.GNU-stack,"",@progbits

Main Change in GCC-4.9.0

  • When a linker plugin is used, LTO output by default no longer contains object code, only LTO information ("slim" objects); -ffat-lto-objects still produces objects with both, as GCC-4.7.2 does

Interprocedural Optimizations Using LTO

Whole program optimization needs to see the entire program

  • Does it need the entire program together in memory?
    Load only the call graph without function bodies:
    ◮ Independent computation of summary information of functions
    ◮ "Adjusting" summary information through whole program analysis over the call graph
    ◮ Perform transformations independently on functions
  • Process the entire program together

Why Avoid Loading Function Bodies?

  • Practical programs could be rather large, and compilation could become very inefficient
  • Many optimization decisions can be taken by looking at the call graph alone
    ◮ Procedure Inlining: just looking at the call graph is sufficient
      Perhaps some summary size information can be used
    ◮ Procedure Cloning: some additional summary information about the actual parameters of a call is sufficient
Partitioned and Non-Partitioned LTO

Analysis (sequential) loads the complete call graph and then either
  (a) loads function summaries but not bodies, or
  (b) loads all function bodies.
Transformation then has the following choices:

  • (a) + load all function bodies: Contradiction ×
  • (a) + load function bodies one by one: ×
    ◮ IPA not possible (only one function body at a time)
    ◮ Strictly sequential transformations
  • (a) + load groups of function bodies: Partitioned Mode
    ◮ No need to load the entire program in memory
    ◮ IPA possible (multiple function bodies)
    ◮ Parallel transformations possible
    ◮ Analysis and transformations in independent processes
    ◮ Balanced partitions: -flto -flto-partition=balanced
    ◮ One partition per file: -flto -flto-partition=1to1
    ◮ Partitions by number: -flto --param lto-partitions=n
    ◮ Partitions by size: -flto --param lto-min-partition=s
  • (b) + all function bodies already loaded: Non-Partitioned Mode
    ◮ Entire program needs to be loaded in memory
    ◮ No partitions: -flto -flto-partition=none
    ◮ Strictly sequential transformations
    ◮ Analysis and transformations in the same process
Partitioned LTO (aka WHOPR Mode of LTO)

  • Three steps
    ◮ LGEN (potentially parallel)
      − generation of summary information
      − generation of translation unit information
    ◮ WPA: Whole Program Analysis (sequential)
      − reads the call graph and not function bodies
      − summary information for each function
    ◮ LTRANS: Local Transformations (potentially parallel)
  • Processing sequence
    ◮ gcc executes LGEN
    ◮ A subsequent lto1 process executes WPA
    ◮ Subsequent independent lto1 processes execute LTRANS
Non-Partitioned LTO

  • Two steps
    ◮ LGEN (potentially parallel)
      − generation of translation unit information
      − no summary
    ◮ IPA: Inter-Procedural Analysis (sequential)
      − reads the call graph and function bodies
      − performs analysis and transformation
    IPA is a whole program analysis (it processes the entire program together)
  • Processing sequence
    ◮ gcc executes LGEN
    ◮ A subsequent lto1 process executes IPA
LTO Pass Hooks

    struct ipa_opt_pass_d
    {
      struct opt_pass pass;   /* holds the execute hook: unsigned int (*execute) (void); */
      void (*generate_summary) (void);
      void (*read_summary) (void);
      void (*write_summary) (struct cgraph_node_set_def *,
                             struct varpool_node_set_def *);
      void (*write_optimization_summary) (struct cgraph_node_set_def *,
                                          struct varpool_node_set_def *);
      void (*read_optimization_summary) (void);
      void (*stmt_fixup) (struct cgraph_node *, gimple *);
      unsigned int function_transform_todo_flags_start;
      unsigned int (*function_transform) (struct cgraph_node *);
      void (*variable_transform) (struct varpool_node *);
    };

Successive builds of this slide highlight the hooks used in:
  • LGEN for Partitioned LTO
  • LGEN for Non-Partitioned LTO
  • WPA for Partitioned LTO (also uses the execute member of struct opt_pass)
  • IPA for Non-Partitioned LTO (also uses the execute member of struct opt_pass)
  • LTRANS for Partitioned LTO
lto1 Control Flow

    lto_main
      lto_init
      lto_process_name
      lto_reader_init
      read_cgraph_and_symbols
      if (flag_wpa)                  /* WPA for partitioned LTO */
        do_whole_program_analysis
          materialize_cgraph
          execute_ipa_pass_list (all_regular_ipa_passes)
          lto_wpa_write_files
      else                           /* IPA for non-partitioned LTO */
                                     /* Only LTRANS for partitioned LTO */
        materialize_cgraph
        cgraph_optimize
cc1 Control Flow: A Recap

    toplev_main                                   /* In file toplev.c */
      compile_file
        lang_hooks.parse_file => c_common_parse_file
        lang_hooks.decls.final_write_globals => c_write_global_declarations
          cgraph_finalize_compilation_unit
            cgraph_analyze_functions              /* Create GIMPLE */
              cgraph_analyze_function             /* Create GIMPLE */
              ...
            cgraph_optimize
              ipa_passes
                execute_ipa_pass_list (all_small_ipa_passes)          /* !in_lto_p */
                execute_ipa_summary_passes (all_regular_ipa_passes)
                execute_ipa_summary_passes (all_lto_gen_passes)
                ipa_write_summaries
              execute_ipa_pass_list (all_late_ipa_passes)
              cgraph_expand_all_functions
                cgraph_expand_function
                  /* Intraprocedural GIMPLE passes, expansion, and RTL passes */
cc1 and Non-Partitioned lto1

    cc1:   toplev_main ... compile_file ... cgraph_analyze_function
    lto1:  lto_main ... read_cgraph_and_symbols ... materialize_cgraph
    both:  cgraph_optimize ... ipa_passes ... cgraph_expand_all_functions ... tree_rest_of_compilation
Our Pictorial Convention

  • The source code is drawn as three parts: cc1′ (cc1-specific code), lto1′ (lto1-specific code), and common code
  • The cc1 executable is built from cc1′ and the common code
  • The lto1 executable is built from lto1′ and the common code
The GNU Tool Chain: Our First Picture

    Source Program → gcc (driver): cpp → cc1 → as → ld (invoked via collect2, linking glibc/newlib) → Target Program
The GNU Tool Chain for Non-Partitioned LTO Support

    gcc: source files → cc1 → "Fat" .s files → as → "Fat" .o files
         → collect2 → lto1 → single .s file → as → single .o file
         → collect2 → ld (+ glibc/newlib) → a.out file

Common code, executed twice for each function in the input program in single-process LTO (once during LGEN, and once more in lto1, which performs WPA + LTRANS in one process):

    cgraph_optimize
      ipa_passes
        execute_ipa_pass_list (all_small_ipa_passes)          /* !in_lto_p */
        execute_ipa_summary_passes (all_regular_ipa_passes)
        execute_ipa_summary_passes (all_lto_gen_passes)
        ipa_write_summaries
      execute_ipa_pass_list (all_late_ipa_passes)
      cgraph_expand_all_functions
        cgraph_expand_function
          /* Intraprocedural passes on GIMPLE, */
          /* expansion pass, and passes on RTL. */
Partitioned LTO (aka WHOPR LTO)

    LGEN, option -flto -c (one cc1 process per file):
      f1.c → cc1 → f1.o
      f2.c → cc1 → f2.o
      f3.c → cc1 → f3.o

    Linking, option -flto -o out
      External view: f1.o, f2.o, f3.o → gcc → out
      Internal view:
        WPA (lto1): large call graph with procedure summaries, without procedure bodies
                    (interprocedural analysis: √, transformation: ×)
                    → /tmp/ccdKEyVB.ltrans0.o (possibly multiple files)
        LTRANS (lto1, possibly multiple processes): transforms these files; the results are linked into out
Non-Partitioned LTO

    LGEN, option -flto -c (one cc1 process per file):
      f1.c → cc1 → f1.o
      f2.c → cc1 → f2.o
      f3.c → cc1 → f3.o

    Linking, option -flto -o out -flto-partition=none
      External view: f1.o, f2.o, f3.o → gcc → out
      Internal view:
        IPA + transformations (a single lto1): large call graph with procedure bodies
                    (interprocedural analysis: √, transformation: √)
        IPA can examine function bodies also