Debug Information From Metadata to Modules Adrian Prantl Duncan - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Debug Information

From Metadata to Modules

Adrian Prantl Apple Duncan Exon Smith Apple

slide-2
SLIDE 2

What is Debug Information?

  • provides a mapping from source code → binary program
  • on disk: as DWARF, a highly compressed format
  • in LLVM: as metadata (pre-finalized DWARF)

[Pie chart: DWARF debug information for clang r250459, RelWithDebInfo+Assertions. Legend: Types, Subprograms (45% of the DWARF for clang); Strings; Locations; Ranges, Inline; Line table; Accelerator. Slice sizes shown: 45%, 17%, 13%, 10%, 10%, 5%.]

slide-3
SLIDE 3

Debug Info, Scalability, and LTO

  • volume of debug info limits scalability of the compiler, particularly when using LTO
  • we attacked this problem from two sides:
      • LLVM: efficient new Metadata representation
      • Clang: emit less debug info with Module Debugging

slide-4
SLIDE 4

LLVM: efficient new Metadata representation

  • making Metadata lightweight: dropping use-lists and separating from Value

  • specialized MDNodes: syntax, isa support, and memory footprint
  • constructing Metadata graphs efficiently and distinct Metadata
  • grab bag of other major LTO optimizations
slide-5
SLIDE 5

Making Metadata lightweight

old class hierarchy

[Class diagram: Value, Argument, User, MDString, MDNode]
slide-6
SLIDE 6

How do operands work?

[Class diagram: Value, Argument, User, MDString, MDNode]

slide-7
SLIDE 7

How do operands work?

[Diagram: Value class hierarchy (Argument, User, MDString, MDNode) and a chain of Use objects]

intrusive storage for use-lists

  • Use fields: value, next, prev
  • Value fields: vtable, type, uselist, flags

slide-8
SLIDE 8

How do operands work?

[Diagram: Value class hierarchy (Argument, User, MDString, MDNode); a User followed by its Uses]

User operands are an array of Uses

  • User fields: vtable, type, uselist, flags, operands
  • Use fields: value, next, prev

slide-9
SLIDE 9

How do operands work?

[Diagram: Value class hierarchy (Argument, User, MDString, MDNode); three ValueHandles (VH) and a DenseMap with keys K1, K2, K3 and values V1, V2, V3]

ValueHandles are second-class

  • VH fields: vtable, value, next, prev
  • Value fields: vtable, type, uselist, flags

slide-10
SLIDE 10

How did operands work?

[Diagram: Value class hierarchy (Argument, User, MDString, MDNode); an MDNode followed by its ValueHandles]

old MDNode operands were an array of ValueHandles

  • MDNode fields: vtable, type, uselist, flags, fold-next, flags, operands
  • VH fields: vtable, value, next, prev

slide-11
SLIDE 11

Separating Metadata from Value

  • before: one hierarchy containing Value, Argument, User, MDString, MDNode
  • after: Value, Argument, User in one hierarchy; Metadata, MDString, MDNode in another

slide-12
SLIDE 12

Separating Metadata from Value

MDString MDNode Metadata

slide-13
SLIDE 13

Metadata is lightweight

[Class diagram: Metadata, MDString, MDNode]

  • no vtable
  • no use-lists
  • no Type pointer

Metadata base class has size of 1 pointer

  • Metadata fields: md-flags

slide-14
SLIDE 14

Metadata is lightweight

[Class diagram: Metadata, MDString, MDNode]

new MDNode operands are 4x smaller

  • MDNode fields: md-flags, node-flags, context
  • Op fields: Metadata*
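
The "4x smaller" figure can be sanity-checked with a small layout sketch. The two structs below are hypothetical stand-ins that only mirror the field lists on the slides (a ValueHandle with vtable, value, next, prev versus an operand holding a single Metadata*); they are not LLVM's actual class definitions, and the ratio assumes a platform where all object pointers have the same size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an old ValueHandle-based operand: the four
// pointer-sized fields listed on the slide (vtable, value, next, prev).
struct OldOperandSketch {
  const void *vtable;
  void *value;
  OldOperandSketch *next;
  OldOperandSketch *prev;
};

// Hypothetical stand-in for a new operand: a single Metadata pointer.
struct NewOperandSketch {
  void *metadata;
};

// How many new-style operands fit in the space of one old-style operand.
constexpr std::size_t operandSizeRatio() {
  return sizeof(OldOperandSketch) / sizeof(NewOperandSketch);
}
```

On common 32-bit and 64-bit targets the ratio comes out as 4, because the old operand carries four pointer-sized fields and the new one carries one.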

slide-15
SLIDE 15

Specialized MDNodes for debug info

[Class diagram: Metadata, MDString, MDNode, MDTuple, DINode, DILocation, DIExpression, GenericDINode, DISubrange, DIEnumerator, DIScope, DIType, DICompileUnit, DISubprogram, DILocalScope]

slide-16
SLIDE 16

MDTuple: generic MDNode

  • old MDNode syntax: !1 = metadata !{metadata !2, metadata !"string"}
  • MDTuple syntax: !1 = !{!2, !"string"}
  • isa support: if (isa<MDTuple>(N)) { ... }

[Class diagram: Metadata, MDString, MDNode, MDTuple]

slide-17
SLIDE 17

DILocation: syntax

  • old: !1 = metadata !{i32 30, i32 7, metadata !2, null}
  • new: !1 = !DILocation(line: 30, column: 7, scope: !2)

[Class diagram: Metadata, MDString, MDNode, MDTuple, DILocation]

slide-18
SLIDE 18

DILocation: isa support

old check:

if (DINode(N).isLocation()) { ... }

where isLocation() had to inspect the operands:

if (auto *N = dyn_cast<MDNode>(V))
  if ((N->getNumOperands() == 3 || N->getNumOperands() == 4) &&
      isa<ConstantInt>(N->getOperand(0)) &&
      isa<ConstantInt>(N->getOperand(1)) &&
      DINode(N).isScope(N->getOperand(2))) { ... }

new check:

if (isa<DILocation>(N)) { ... }

[Class diagram: Metadata, MDString, MDNode, DILocation]
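
The reason a dedicated class makes the check cheap can be shown with a miniature kind-tagged hierarchy. This is a sketch in the spirit of LLVM-style RTTI (a kind field in the base class plus a static classof() in each subclass); every name ending in Sketch is hypothetical and none of this is LLVM's real implementation.

```cpp
#include <cassert>

// Base class carrying a kind tag instead of relying on a vtable.
class MetadataSketch {
public:
  enum Kind { MDTupleKind, DILocationKind };
  explicit MetadataSketch(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

class MDTupleSketch : public MetadataSketch {
public:
  MDTupleSketch() : MetadataSketch(MDTupleKind) {}
  static bool classof(const MetadataSketch *M) {
    return M->getKind() == MDTupleKind;
  }
};

class DILocationSketch : public MetadataSketch {
public:
  DILocationSketch(unsigned L, unsigned C)
      : MetadataSketch(DILocationKind), Line(L), Column(C) {}
  static bool classof(const MetadataSketch *M) {
    return M->getKind() == DILocationKind;
  }
  unsigned Line;
  unsigned Column;
};

// isa-style query: a single comparison on the kind tag.
template <typename To> bool isaSketch(const MetadataSketch *M) {
  return To::classof(M);
}

// dyn_cast-style query: returns nullptr when the kind does not match.
template <typename To> const To *dynCastSketch(const MetadataSketch *M) {
  return isaSketch<To>(M) ? static_cast<const To *>(M) : nullptr;
}
```

With this shape, asking whether a node is a location is one integer comparison, whereas the old approach had to count operands and test the type of each one.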

slide-19
SLIDE 19

DILocation: memory footprint

[Class diagram: Metadata, MDString, MDNode, DILocation]

  • DILocation fields: md-flags, node-flags, context, 32-bit line, 16-bit column
  • operands (Op, each a Metadata*): scope, inlinedAt

slide-20
SLIDE 20

What about other Metadata graphs?

  • we should have more primitives for generic Metadata
  • MDInt and MDFloat: skip ConstantInt and ConstantFloat
  • vectors, dictionaries and lists (when tuples don't fit)
  • specialized nodes: syntax, isa support, and memory footprint
  • what makes a graph important and/or stable enough?
  • can we enable it for out-of-tree nodes?
slide-21
SLIDE 21

Constructing Metadata graphs

  • frontends (DIBuilder), bitcode deserialization, and lib/Linker build metadata graphs

  • need temporary nodes for forward references
  • need use-lists (and RAUW support) to replace temporary nodes
  • Metadata use-lists are second-class
  • how can we limit exposure to use-lists?
slide-22
SLIDE 22

Temporary storage for explicit use-lists

  • largely unoptimized
  • uses side storage
  • dropped automatically, except uniquing cycles

[Diagram: an MDNode (md-flags, node-flags, RAUW) with its operands (Op: Metadata*); the RAUW side storage (context, next-index, use-map); the use-map is SmallDenseMap storage whose entries each hold ref, owner?, index]

slide-23
SLIDE 23

Constructing a graph

!0 = !{!1} !1 = !{!2} !2 = !{}

how can we build this graph?

slide-24
SLIDE 24

Constructing a graph, top-down

!0 = !{!1} !1 = !{!2} !2 = !{}

1'

create temporary node for !1

slide-25
SLIDE 25

Constructing a graph, top-down

!0 = !{!1} !1 = !{!2} !2 = !{}

1'

create (unresolved) node for !0

slide-26
SLIDE 26

Constructing a graph, top-down

!0 = !{!1} !1 = !{!2} !2 = !{}

1' 2'

create temporary node for !2

slide-27
SLIDE 27

Constructing a graph, top-down

!0 = !{!1} !1 = !{!2} !2 = !{}

1' 2' 1

create (unresolved) node for !1

slide-28
SLIDE 28

Constructing a graph, top-down

1' 2' 1

!0 = !{!1} !1 = !{!2} !2 = !{}

replace temporary node for !1 with real node

slide-29
SLIDE 29

Constructing a graph, top-down

!0 = !{!1} !1 = !{!2} !2 = !{}

2' 1 2

create node for !2

slide-30
SLIDE 30

Constructing a graph, top-down

!0 = !{!1} !1 = !{!2} !2 = !{}

2' 1 2

replace temporary node for !2 with real node, resolving !1 and !0

1

slide-31
SLIDE 31

Constructing a graph, top-down

!0 = !{!1} !1 = !{!2} !2 = !{}

1 2

that was a lot of RAUW and malloc traffic...

slide-32
SLIDE 32

Constructing a graph, bottom-up

!0 = !{!1} !1 = !{!2} !2 = !{}

avoid malloc traffic and RAUW by reversing the order

slide-33
SLIDE 33

Constructing a graph, bottom-up

!0 = !{!1} !1 = !{!2} !2 = !{}

2

create node for !2

slide-34
SLIDE 34

Constructing a graph, bottom-up

!0 = !{!1} !1 = !{!2} !2 = !{}

1 2

create node for !1

slide-35
SLIDE 35

Constructing a graph, bottom-up

!0 = !{!1} !1 = !{!2} !2 = !{}

1 2

create node for !0

slide-36
SLIDE 36

Constructing a graph, bottom-up

!0 = !{!1} !1 = !{!2} !2 = !{}

1 2

no extra malloc traffic; no RAUW
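
The two construction orders can be compared with a toy model that counts allocations and RAUW operations for a chain such as !0 = !{!1}, !1 = !{!2}, !2 = !{}. This is only a sketch under simplifying assumptions: NodeSketch, GraphSketch, and the two build functions are hypothetical, each node has at most one operand, a linear scan stands in for a real use-list, and temporaries are never freed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct NodeSketch {
  bool Temporary = false;
  NodeSketch *Operand = nullptr; // at most one operand, enough for a chain
};

struct BuildStats {
  std::size_t Allocations = 0;
  std::size_t RAUWs = 0;
};

class GraphSketch {
public:
  NodeSketch *create(NodeSketch *Operand, bool Temporary = false) {
    Nodes.push_back(std::make_unique<NodeSketch>());
    ++Stats.Allocations;
    Nodes.back()->Temporary = Temporary;
    Nodes.back()->Operand = Operand;
    return Nodes.back().get();
  }

  // Replace all uses of From with To; the scan stands in for a use-list.
  void replaceAllUsesWith(NodeSketch *From, NodeSketch *To) {
    for (auto &N : Nodes)
      if (N->Operand == From)
        N->Operand = To;
    ++Stats.RAUWs;
  }

  BuildStats Stats;

private:
  std::vector<std::unique_ptr<NodeSketch>> Nodes;
};

// Top-down: node I is created before node I+1 exists, so it first points at
// a temporary that is replaced once the real node has been created.
BuildStats buildChainTopDown(std::size_t Length) {
  GraphSketch G;
  NodeSketch *PendingTemp = nullptr;
  for (std::size_t I = 0; I < Length; ++I) {
    bool IsLast = I + 1 == Length;
    NodeSketch *NextTemp =
        IsLast ? nullptr : G.create(nullptr, /*Temporary=*/true);
    NodeSketch *Real = G.create(NextTemp);
    if (PendingTemp)
      G.replaceAllUsesWith(PendingTemp, Real);
    PendingTemp = NextTemp;
  }
  return G.Stats;
}

// Bottom-up: every operand already exists when its user is created.
BuildStats buildChainBottomUp(std::size_t Length) {
  GraphSketch G;
  NodeSketch *Next = nullptr;
  for (std::size_t I = 0; I < Length; ++I)
    Next = G.create(Next);
  return G.Stats;
}
```

For the three-node chain on the slides, the top-down order in this model allocates the three real nodes plus two temporaries and performs two replacements, while the bottom-up order allocates only the three real nodes.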

slide-37
SLIDE 37

Constructing a cycle of uniqued nodes

!0 = !{!1} !1 = !{!2} !2 = !{!0}

1 2

building a cycle of uniqued nodes requires temporary nodes

slide-38
SLIDE 38

Not every node should be uniqued

  • graphs intentionally defeat uniquing when they want distinct nodes
  • !alias.scopes need distinct root nodes
  • DILexicalBlocks lack naturally discriminating operands
  • cycles of uniqued nodes need forward references and RAUW
  • cycles of uniqued nodes "look" distinct
  • we don't solve graph isomorphism
slide-39
SLIDE 39

distinct nodes are more efficient

  • distinct nodes are not uniqued

!1 = distinct !{} !2 = distinct !{}

  • note: self-references are automatically distinct, so these two forms are equivalent:

!1 = distinct !{!1} !1 = !{!1}

  • no re-uniquing penalty when operands change
  • never require use-lists (or RAUW support)
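
A minimal sketch of the difference between uniqued and distinct nodes, assuming a context that keeps a lookup table keyed on operand lists. ContextSketch and NodeId are hypothetical names used only for illustration; nodes are represented by integer ids rather than pointers to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using NodeId = std::size_t;

class ContextSketch {
public:
  // Uniqued: structurally identical requests return the same node.
  NodeId getUniqued(const std::vector<NodeId> &Ops) {
    auto It = Uniqued.find(Ops);
    if (It != Uniqued.end())
      return It->second;
    NodeId Id = allocate(Ops);
    Uniqued.emplace(Ops, Id);
    return Id;
  }

  // Distinct: always a fresh node, never entered into the lookup table.
  NodeId getDistinct(const std::vector<NodeId> &Ops) { return allocate(Ops); }

  std::size_t size() const { return Operands.size(); }

private:
  NodeId allocate(const std::vector<NodeId> &Ops) {
    Operands.push_back(Ops);
    return Operands.size() - 1;
  }

  std::vector<std::vector<NodeId>> Operands;     // operand list per node
  std::map<std::vector<NodeId>, NodeId> Uniqued; // uniquing table
};
```

Because a distinct node never enters the table, changing one of its operands later cannot collide with another node, which is why no re-uniquing step is needed for it.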

slide-40
SLIDE 40

Constructing cyclic graphs efficiently

!0 = distinct !{!1} !1 = !{!2} !2 = !{!0}

1 2

we can do better with distinct nodes

slide-41
SLIDE 41

Constructing cyclic graphs efficiently

!0 = distinct !{!1} !1 = !{!2} !2 = !{!0}

create node for !0, with a dangling operand

slide-42
SLIDE 42

Constructing cyclic graphs efficiently

!0 = distinct !{!1} !1 = !{!2} !2 = !{!0}

2

create node for !2

slide-43
SLIDE 43

Constructing cyclic graphs efficiently

!0 = distinct !{!1} !1 = !{!2} !2 = !{!0}

1 2

create node for !1

slide-44
SLIDE 44

Constructing cyclic graphs efficiently

!0 = distinct !{!1} !1 = !{!2} !2 = !{!0}

1 2

patch operand(s) for !0

slide-45
SLIDE 45

Constructing cyclic graphs efficiently

!0 = distinct !{!1} !1 = !{!2} !2 = !{!0}

  • careful scheduling avoids malloc traffic and RAUW
  • partial support in lib/Linker; not done in BitcodeReader (yet)
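
The schedule walked through on the preceding slides (create the distinct node with a dangling operand, build the uniqued nodes, then patch the operand) can be sketched as follows. CycleBuilderSketch, Dangling, and buildCycle are hypothetical names; the model reuses integer node ids and a map-based uniquing table, and is not how lib/Linker is actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using NodeId = std::size_t;
constexpr NodeId Dangling = static_cast<NodeId>(-1); // placeholder operand

struct CycleBuilderSketch {
  std::vector<std::vector<NodeId>> Operands;     // operand list per node
  std::map<std::vector<NodeId>, NodeId> Uniqued; // uniquing table

  NodeId createDistinct(std::vector<NodeId> Ops) {
    Operands.push_back(std::move(Ops));
    return Operands.size() - 1;
  }

  NodeId getUniqued(const std::vector<NodeId> &Ops) {
    auto It = Uniqued.find(Ops);
    if (It != Uniqued.end())
      return It->second;
    Operands.push_back(Ops);
    NodeId Id = Operands.size() - 1;
    Uniqued.emplace(Ops, Id);
    return Id;
  }

  // Only meant for distinct nodes: they are not in the uniquing table, so
  // an operand can change without any re-uniquing.
  void patchDistinctOperand(NodeId N, std::size_t Index, NodeId NewOp) {
    Operands[N][Index] = NewOp;
  }
};

// Builds !0 = distinct !{!1}, !1 = !{!2}, !2 = !{!0} and returns the id of !0.
NodeId buildCycle(CycleBuilderSketch &B) {
  NodeId N0 = B.createDistinct({Dangling}); // !0 with a dangling operand
  NodeId N2 = B.getUniqued({N0});           // !2 = !{!0}
  NodeId N1 = B.getUniqued({N2});           // !1 = !{!2}
  B.patchDistinctOperand(N0, 0, N1);        // patch operand of !0
  return N0;
}
```

In this model the cycle is closed with exactly three allocations and no temporary nodes, because the only operand that is written late belongs to the distinct node.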

slide-46
SLIDE 46

Grab bag: other major LTO optimizations

  • Metadata lazy-loaded (in bulk); new LTO API to expose it
  • avoided lib/Linker quadratic memory leak into LLVMContext from globals with appending linkage

  • debug info requires fewer MCSymbols (and they're cheaper)
  • Value has dropped a couple of pointers
slide-47
SLIDE 47

What progress have we made?

| compiler version | small (verify-uselistorder) | medium (llvm-lto) | large (clang) |
|---|---|---|---|
| 3.5 (r232544) | 48s, 2.27GB | 10m 35s, 22.8GB | 25m 41s, 75.6GB |
| 3.6 (r240577) | 38s, 1.40GB | 8m 32s, 15.1GB | 19m 45s, 35.9GB |
| 3.7 (r247539) | 35s, 0.79GB | 7m 52s, 9.15GB | 18m 10s, 19.3GB |
| ToT (r250621) | 34s, 0.73GB | 7m 37s, 8.11GB | 16m 23s, 17.2GB |
| 3.5 vs. ToT | 1.4x, 3.1x | 1.4x, 2.8x | 1.6x, 4.4x |

runtime and peak memory usage of ld, when linking executables from 3.6 (r240577) source tree

self-hosted clang/libLTO, using ld64-253.2 from Xcode 7 on a 2013 Mac Pro with 32GB RAM
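
The "3.5 vs. ToT" row is the ratio of the first measurement to the last one. A small sketch of that arithmetic for the large (clang) column, using hypothetical helper names:

```cpp
#include <cassert>

// Convert a "Xm Ys" wall-clock reading into seconds.
constexpr unsigned toSeconds(unsigned Minutes, unsigned Seconds) {
  return Minutes * 60 + Seconds;
}

// Improvement factor: how many times smaller the later measurement is.
constexpr double improvementFactor(double Before, double After) {
  return Before / After;
}
```

For the clang link, 25m 41s against 16m 23s gives roughly 1.57, and 75.6GB against 17.2GB gives roughly 4.40, which round to the 1.6x and 4.4x reported in the table.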

slide-48
SLIDE 48

What's left in LLVM?

  • use more distinct nodes; take more advantage of them
  • richer syntax for scoped debug info nodes
  • fine-grained lazy-loading of debug info metadata
  • debug info graphs need to be sliceable (link only what's used)
  • MC-layer diet v2 (I'm looking at you, MCRelaxableFragment)
  • leave debug info types out of LTO!
slide-49
SLIDE 49

Debug Information

  • provides a mapping from source code → binary program
  • stored in extra sections in the .o files

StringRef.o
  .debug_info: class StringRef { …
  .debug_line: 0x10 StringRef.cpp 23
               0x20 StringRef.h 128
               …
  .text:       _ZN4llvm9StringRef:

slide-50
SLIDE 50

Where does it go?

Option 1: linker leaves debug info in the .o files

[Diagram: PassManager.o, cc1_main.o, and StringRef.o, each with its own .debug_info, .debug_line, and .text sections]

StringRef.o
  .debug_info: class StringRef { …
  .debug_line: 0x10 StringRef.cpp 23
               0x20 StringRef.h 128
               …
  .text:       _ZN4llvm9StringRef:

slide-51
SLIDE 51

Where does it go?

Option 1: linker leaves debug info in the .o files

  • fast linking, slow debugging

[Diagram: bin/clang contains only the linked .text sections; .debug_info and .debug_line stay behind in StringRef.o, PassManager.o, and cc1_main.o]

which file has the definition of StringRef?

slide-52
SLIDE 52

Where does it go?

Option 2: linker links debug info together with the executable

  • typically done on Linux
  • very long link times

[Diagram: bin/clang containing .debug_info, .debug_line, and .text sections]

slide-53
SLIDE 53

Where does it go?

Option 2: linker links debug info together with the executable

  • typically done on Linux
  • very long link times
  • split DWARF: relocatable skeleton linked with executable, bulk in external .dwo

bin/clang
  .debug_info:
  .debug_line:
  .text:

StringRef.dwo
  .debug_info.dwo: class StringRef { …
  .debug_line.dwo: x+0x0 StringRef.cpp 23
                   x+0x10 StringRef.h 128
                   …
slide-54
SLIDE 54

Where does it go?

Option 3: debug info archived separately from executable

[Diagram: PassManager.o, cc1_main.o, and StringRef.o, each with its own .debug_info, .debug_line, and .text sections]

slide-55
SLIDE 55

Where does it go?

Option 3: debug info archived separately from executable

  1. dsymutil (Darwin)
  2. dwp (Linux)

[Diagram: bin/clang holds .text (_ZN4llvm9StringRef:); clang.dSYM / clang.dwp holds .debug_info (class StringRef { …) and .debug_line (0x10 StringRef.cpp 23, 0x20 StringRef.h 128, …)]

slide-56
SLIDE 56

Why is clang.dSYM 1.2GB?

  • the problem is type information, specifically, redundant type information:
  • #include "llvm/ADT/StringRef.h" at -g recursively pulls in ~46KB of types into each .o file and there are ~1500 .o files
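
Multiplying the two approximate figures gives a feel for the scale of the duplication. The helper below is a hypothetical one-liner for that arithmetic; it assumes every object file includes the header, which is an upper bound rather than a measurement.

```cpp
#include <cassert>

// Total KB of type info if every object file carries its own copy.
constexpr unsigned long duplicatedTypeInfoKB(unsigned long KBPerObjectFile,
                                             unsigned long ObjectFileCount) {
  return KBPerObjectFile * ObjectFileCount;
}
```

With ~46KB per file and ~1500 files, that is about 69,000KB, or roughly 69MB of type information traceable to this one header before any uniquing.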

slide-57
SLIDE 57

(llvm-)dsymutil

  • a new linker for debug information built on top of LLVM
  • dsymutil collects debug info from all the .o files and generates a single .dSYM bundle with all the debug info and accelerator tables for fast lookup

  • dsymutil performs ODR type uniquing for C++
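
The idea behind ODR type uniquing can be illustrated with a toy model: under the One Definition Rule, two C++ types with the same fully qualified name are assumed to be the same type, so only the first copy needs to be kept. The sketch below is a simplification with hypothetical names (TypeDefSketch, rawSize, uniquedSize); it keys on a name string only and does not reflect how dsymutil is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct TypeDefSketch {
  std::string QualifiedName; // e.g. "llvm::StringRef"
  std::size_t SizeInBytes;   // size of this type's debug info
};

using ObjectFileSketch = std::vector<TypeDefSketch>;

// Size of the type info when every object file's copy is kept.
std::size_t rawSize(const std::vector<ObjectFileSketch> &Objects) {
  std::size_t Total = 0;
  for (const auto &Obj : Objects)
    for (const auto &T : Obj)
      Total += T.SizeInBytes;
  return Total;
}

// Size of the type info when each qualified name is kept only once.
std::size_t uniquedSize(const std::vector<ObjectFileSketch> &Objects) {
  std::set<std::string> Seen;
  std::size_t Total = 0;
  for (const auto &Obj : Objects)
    for (const auto &T : Obj)
      if (Seen.insert(T.QualifiedName).second)
        Total += T.SizeInBytes;
  return Total;
}
```

The gap between the two totals grows with the number of object files that include the same headers, which is the redundancy the uniquing step targets.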
slide-58
SLIDE 58

(llvm-)dsymutil

ninja clang (1561 targets)

| build | clang.dSYM (-no-odr) | clang.dSYM |
|---|---|---|
| Regular | 1.2G | 413M |
| LTO | 369M | 388M |

measured on a 2013 Mac Pro with 12 cores at 2.7GHz and 32GB RAM

clang r250459, X86/ARM/AArch64, RelWithDebInfo+Assertions, 1 parallel LTO link

slide-59
SLIDE 59

-flimit-debug-info (also known as -fno-standalone-debug)

  • emit C++ class types only in the .o file that has the vtable of the class or an explicit template instantiation and forward declarations everywhere else
  • only C++ classes with vtables / explicit template instantiations
  • every .o file and (3rd-party) library must be built with debug info
  • debugger must scan every .o file for the definition of StringRef (LLDB does not even support that)
  • Darwin and FreeBSD default to -fstandalone-debug
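
To make "the .o file that has the vtable" concrete: under the Itanium C++ ABI, a class's vtable is emitted in the translation unit that defines its key function, that is, its first virtual function that is neither pure nor inline. The sketch below uses hypothetical classes and puts everything in one file for brevity; the comments mark what would normally be split between a header and a single .cpp file.

```cpp
#include <cassert>

// --- would live in a header included by many .cpp files (hypothetical) ---
class ShapeSketch {
public:
  virtual ~ShapeSketch();        // defined out of line: key function of ShapeSketch
  virtual int sides() const = 0; // pure virtual, cannot be the key function
};

class TriangleSketch : public ShapeSketch {
public:
  int sides() const override;    // defined out of line: key function of TriangleSketch
};

// --- would live in exactly one .cpp file (hypothetical) ---
// The object file built from that .cpp gets the vtables, and with limited
// debug info it is also the place where the full class descriptions go.
ShapeSketch::~ShapeSketch() = default;
int TriangleSketch::sides() const { return 3; }

// Any other file only needs the declarations to call through the vtable.
int countSides(const ShapeSketch &S) { return S.sides(); }
```

Every other object file that merely uses these classes would then carry only a forward declaration in its debug info, which is why the debugger has to find the one file with the definition.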

slide-60
SLIDE 60

-flimit-debug-info

ninja clang (1561 targets)

| build | _build/lib | clang.dSYM |
|---|---|---|
| Standalone | 4.1G | 413M |
| Limited | 3.1G | 402M |
| LTO, Standalone | 5.1G | 388M |
| LTO, Limited | 3.9G | 387M |

measured on a 2013 Mac Pro with 12 cores at 2.7GHz and 32GB RAM

clang r250459, X86/ARM/AArch64, RelWithDebInfo+Assertions, 1 parallel LTO link

slide-61
SLIDE 61

Clang Modules

  • Clang Modules are a saner alternative to textual #include
  • think of them as precompiled headers + additional semantics
  • on disk: .pcm file with the serialized Clang AST of header files
  • Darwin: built implicitly and stored in a global module cache
  • Linux: typically built explicitly
slide-62
SLIDE 62

Module Debugging

  • build Debug Info together with the Clang Module
  • new driver option: -gmodules (cc1: -dwarf-ext-refs -fmodule-format=obj)
  • emit COFF / ELF / Mach-O Module containers with a .clang_ast section holding the AST
  • emit full debug information for every type in the module
  • debug info contributes ~15% of the .pcm size

LLVM_Utils.pcm
  .debug_info: class StringRef { …
  .clang_ast:  class StringRef { …

Module debugging also works with precompiled headers

slide-63
SLIDE 63

Reminder: -flimit-debug-info

TableGen.o
  .debug_info: namespace { class StringRef; }   (use: forward declaration)
  .text:       call _ZN4llvm9StringRef…

StringRef.o
  .debug_info: namespace llvm { class StringRef { StringRef(const char*); …   (definition)
  .text:       …

slide-64
SLIDE 64

Module Debugging

TableGen.o
  .debug_info: dwo_name = LLVM_Utils.pcm
               dwo_id = <module_hash>
               (split DWARF for locating module debug info on disk)
               module LLVM_Utils { module ADT { namespace { class StringRef; } } }
               (use: forward declaration; metadata for rebuilding module for header file)
  .text:       call _ZN4llvm9StringRef…

LLVM_Utils.pcm
  .debug_info: module LLVM_Utils { module ADT { namespace llvm { class StringRef { StringRef(const char*); …
               (definition)
  .clang_ast:  …

slide-65
SLIDE 65

dsymutil and Clang Modules

  • dsymutil clones the debug info from all imported modules into the .dSYM bundle bottom-up
  • meanwhile using "ODR" type uniquing to resolve all forward declarations
  • top-level modules are unique: this works for C, C++ and Objective-C
  • consumers of the resulting .dSYM need not know about modules

clang.dSYM
  .debug_info:
    module Darwin { module C { module stdint { … } … } }
    module std { … module vector { … } }
    module LLVM_Utils { … }
    …
    StringRef(const char*)

slide-66
SLIDE 66

dsymutil and Clang Modules

ninja clang (1561 targets)

| build | Wall Clock | _build/lib | modules-cache | clang.dSYM |
|---|---|---|---|---|
| Standalone | 7m 30s | 4.1G | | 413M |
| Limited | 7m 23s | 3.1G | | 402M |
| -fmodules | 6m 28s | 7.2G | 322M | 564M |
| -gmodules | 4m 54s | 1.2G | 368M | 453M |
| LTO, Standalone | 26m 39s | 5.1G | | 388M |
| LTO, Limited | 26m 05s | 3.9G | | 387M |
| LTO, -fmodules | 31m 41s | 8.9G | 322M | 381M |
| LTO, -gmodules | 20m 35s | 1.6G | 369M | 407M |

measured on a 2013 Mac Pro with 12 cores at 2.7GHz and 32GB RAM

clang r250459, X86/ARM/AArch64, RelWithDebInfo+Assertions, 1 parallel LTO link

slide-72
SLIDE 72

What if consumers know about Modules?

  • LLDB is built on top of Clang
  • when evaluating an expression, LLDB
      1. loads type info from DWARF
      2. builds a Clang AST
      3. compiles and executes the Clang AST
slide-73
SLIDE 73

What if consumers know about Modules?

  • LLDB is built on top of Clang
  • when evaluating an expression, LLDB
      1. loads type info from DWARF
      2. builds a Clang AST
      3. compiles and executes the Clang AST
  • a module-aware LLDB imports the type's AST from the Clang Module

slide-74
SLIDE 74

Questions?