slide-1
SLIDE 1

Generating Optimized Code with GlobalISel

Or: GlobalISel going beyond "it works"

LLVM Dev Meeting 2019 • Volkan Keles, Daniel Sanders • Apple 1
slide-2
SLIDE 2

Agenda

  • What is GlobalISel?
  • GlobalISel Combiner and Helpers
  • Testing and Debugging
  • Declarative Combiner
Generating Optimized Code with GlobalISel • LLVM Dev Meeting 2019 2
slide-3
SLIDE 3

But first...

slide-6
SLIDE 6

History

  • In 2017, we got GlobalISel fully working for our target
  • Fast compile time, but codegen quality was significantly lower
  • Added several new features to improve codegen quality
  • By 2019, the codegen quality has improved
slide-7
SLIDE 7

Apple GPU Compiler Uses GlobalISel
slide-13
SLIDE 13

What is GlobalISel?

  • GlobalISel is a new instruction selection framework
  • Supports more global optimization (e.g. match across BasicBlocks)
  • More flexible
  • From the speed of FastISel to the quality of SelectionDAGISel
  • Easier to understand, maintain, and test
  • Keeps all state in the Machine IR (MIR)

A Proposal for Global Instruction Selection, Quentin Colombet, 2015 LLVM Developers' Meeting
slide-14
SLIDE 14

Anatomy of GlobalISel

[Diagram: LLVM-IR → IR Translator → Legalizer → Register Bank Selector → Instruction Selector. The legend distinguishes Generic Machine Instructions (gMIR), Machine Instructions (MIR), and a mix of Generic Machine Instructions and Machine Instructions.]
slide-15
SLIDE 15

IR Translator

Convert LLVM-IR into gMIR
slide-17
SLIDE 17

Legalizer

Replace unsupported operations with supported ones
slide-19
SLIDE 19

Register Bank Selector

Binds registers to a Register Bank
slide-21
SLIDE 21

Instruction Selector

Select target instructions
slide-23
SLIDE 23

Anatomy of GlobalISel

Tutorial: Head First into GlobalISel, Aditya Nandakumar, Daniel Sanders, and Justin Bogner, 2017 LLVM Developers' Meeting

slide-25
SLIDE 25

Combiner

Simplify/Optimize gMIR/MIR

[Diagram: combiner passes added to the pipeline of IR Translator, Legalizer, Register Bank Selector, and Instruction Selector. One variant shows two combiners (Combiner 1 and Combiner 2), another shows four (Combiner 1 to Combiner 4).]
slide-29
SLIDE 29

Why do we need combiners?

slide-32
SLIDE 32

CodeGen Quality

[Bar chart: Instruction Count (%), axis from 0% to 150%, comparing SelectionDAGISel, GlobalISel w/o Opt, and GlobalISel w/Opt. Annotation: <2%]
slide-35
SLIDE 35

Compile Time Performance

[Bar chart: Compile Time (%), axis from 0% to 100%, comparing SelectionDAGISel and GlobalISel]
slide-37
SLIDE 37

Compile Time Performance - ISel Only

[Bar chart: Compile Time (%), axis from 0% to 100%, comparing SelectionDAGISel and GlobalISel. Annotation: 45%]

slide-39
SLIDE 39

Features Needed

  • Common Subexpression Elimination (CSE)
  • Combiners
  • KnownBits
  • SimplifyDemandedBits
slide-42
SLIDE 42

CSE

  • Considered using MachineCSE, but it was expensive
  • We chose a continuous CSE approach
  • Instructions are CSE'd at creation time using CSEMIRBuilder
  • Information is provided by an analysis pass
  • BasicBlock-local
  • Supports a subset of generic operations
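
The idea of CSE at creation time can be sketched with a toy builder. This is only an illustration under stated assumptions: `ToyInstr` and `ToyCSEBuilder` are hypothetical names, not the CSEMIRBuilder API, and instructions are reduced to an opcode with two source registers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy stand-in for a generic instruction (hypothetical, not an LLVM type).
struct ToyInstr {
  std::string Opcode;
  int Src0;
  int Src1;
  int Dst;
};

// Hypothetical builder that CSEs instructions as they are created,
// within a single basic block.
class ToyCSEBuilder {
public:
  // Returns the register holding Opcode(Src0, Src1). If an identical
  // instruction was already built in the current block, its result
  // register is reused and nothing new is emitted.
  int build(const std::string &Opcode, int Src0, int Src1) {
    auto Key = std::make_tuple(Opcode, Src0, Src1);
    auto It = Seen.find(Key);
    if (It != Seen.end())
      return It->second; // CSE hit.
    int Dst = NextReg++;
    Instrs.push_back({Opcode, Src0, Src1, Dst});
    Seen.emplace(Key, Dst);
    return Dst;
  }

  // Entering a new block forgets earlier expressions, which keeps the
  // CSE BasicBlock-local.
  void startNewBlock() { Seen.clear(); }

  std::size_t numInstrs() const { return Instrs.size(); }

private:
  std::map<std::tuple<std::string, int, int>, int> Seen;
  std::vector<ToyInstr> Instrs;
  int NextReg = 100;
};
```

Building the same `G_ADD` twice in one block yields the same register and a single instruction; after `startNewBlock()` the expression is built again.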
slide-43
SLIDE 43

Things to be aware of

  • CSE needs to be informed of:
      • Changes to MachineInstrs (creation, modification, and erasure)
  • Installs a delegate to handle creation/erasure automatically
  • Installs a change observer to report modifications
slide-44
SLIDE 44

Compile Time Cost

  • We were expecting this to come at a big compile-time cost
  • Improved compile time for some cases
  • Later passes had less work to do
slide-45
SLIDE 45

Combiner

  • Applies a set of combine rules
  • Important for producing good code
  • Expensive in terms of compile-time
slide-47
SLIDE 47

What is a combine?

  • An optimization that transforms a pattern into something more desirable

    define i32 @foo(i8 %in) {
      %ext1 = zext i8 %in to i16
      %ext2 = zext i16 %ext1 to i32
      ret i32 %ext2
    }

slide-49
SLIDE 49

What is a combine?

  • An optimization that transforms a pattern into something more desirable

    define i32 @foo(i8 %in) {
      %ext2 = zext i8 %in to i32
      ret i32 %ext2
    }

slide-50
SLIDE 50

GlobalISel Combiner

  • GlobalISel Combiner consists of 3 main pieces
      • Combiner iterates over the MachineFunction
      • CombinerInfo specifies which operations to be combined and how
      • CombinerHelper is a library of generic combines
slide-51
SLIDE 51

GlobalISel Combiner

[Class diagram: MyTargetCombinerPass, Combiner, MyTargetCombinerInfo (derived from CombinerInfo, providing combine(…)), and CombinerHelper, connected by "Uses" relationships.]
slide-52
SLIDE 52

A Basic Combiner

    bool MyTargetCombinerInfo::combine(GISelChangeObserver &Observer,
                                       MachineInstr &MI,
                                       MachineIRBuilder &B) const {
      MyTargetCombinerHelper TCH(Observer, B, KB);
      // ...
      // Try all combines.
      if (OptimizeAggresively)
        return TCH.tryCombine(MI);
      // Combine COPY only.
      if (MI.getOpcode() == TargetOpcode::COPY)
        return TCH.tryCombineCopy(MI);
      return false;
    }

slide-53
SLIDE 53

A Simple Combine

    bool MyTargetCombinerHelper::combineExt(GISelChangeObserver &Observer,
                                            MachineInstr &MI,
                                            MachineIRBuilder &B) const {
      // ...
      // Combine zext(zext x) -> zext x
      if (MI.getOpcode() == TargetOpcode::G_ZEXT) {
        Register SrcReg = MI.getOperand(1).getReg();
        MachineInstr *SrcMI = MRI.getVRegDef(SrcReg);
        // Check if SrcMI is a G_ZEXT.
        if (SrcMI->getOpcode() == TargetOpcode::G_ZEXT) {
          SrcReg = SrcMI->getOperand(1).getReg();
          B.buildZExt(Reg, SrcReg);
          MI.eraseFromParent();
          return true;
        }
      }
      // ...
    }
slide-56
SLIDE 56

MIPatternMatch

  • Simple and easy mechanism to match generic patterns
  • Similar to what we have for LLVM IR
  • Combines can be implemented easily using matchers

    // Combine zext(zext x) -> zext x
    Register SrcReg;
    if (mi_match(Reg, MRI, m_GZExt(m_GZExt(m_Reg(SrcReg))))) {
      B.buildZExt(Reg, SrcReg);
      MI.eraseFromParent();
      return true;
    }
slide-58
SLIDE 58

A Simpler Combine

    // Combine zext(zext x) -> zext x
    // Rewrites the existing instruction in place instead of building a
    // new one and erasing the old one.
    Register SrcReg;
    if (mi_match(Reg, MRI, m_GZExt(m_GZExt(m_Reg(SrcReg))))) {
      Observer.changingInstr(MI);
      MI.getOperand(1).setReg(SrcReg);
      Observer.changedInstr(MI);
      return true;
    }
slide-60
SLIDE 60

Informing the Observer

  • Observer needs to be informed when something changed
  • createdInstr() and erasedInstr() are handled automatically
  • changingInstr() and changedInstr() must be called manually; they are mandatory around MRI.setRegClass(), MO.setReg(), etc.
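
The manual notification rule can be illustrated with a toy model. This is a sketch, not the GISelChangeObserver interface: `ToyInstr`, `ToyObserver`, and `replaceSource` are hypothetical names, and the observer only records what it is told.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction with one source register (hypothetical type).
struct ToyInstr {
  std::string Opcode;
  int Src;
};

// Hypothetical observer that logs the notifications it receives.
struct ToyObserver {
  std::vector<std::string> Log;
  void changingInstr(const ToyInstr &MI) {
    Log.push_back("changing " + MI.Opcode);
  }
  void changedInstr(const ToyInstr &MI) {
    Log.push_back("changed " + MI.Opcode);
  }
};

// An in-place edit is bracketed by the two notifications, so that
// anything tracking the instruction (a CSE map, a worklist) can drop
// its stale view before the edit and pick up the new one afterwards.
void replaceSource(ToyInstr &MI, int NewSrc, ToyObserver &Observer) {
  Observer.changingInstr(MI);
  MI.Src = NewSrc;
  Observer.changedInstr(MI);
}
```

Calling `replaceSource` on a `G_ZEXT` leaves two log entries, one before and one after the operand is rewritten.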
slide-64
SLIDE 64

KnownBits Analysis

  • Many combines are only valid for certain cases
  • (a + 1) → (a | 1) is only valid if (a & 1) == 0
  • We added an analysis pass to provide this information
  • Currently provides known-ones, known-zeros, and unknowns
slide-67
SLIDE 67

Example

? = Unknown

    %1:(s32) = G_CONSTANT i32 0xFF0
    %2:(s32) = G_AND %0, %1
    %3:(s32) = G_CONSTANT i32 0x0FF
    %4:(s32) = G_AND %2, %3

          Value
    %0    0x????????
    %1    0x00000FF0
    %2    0x00000??0
    %3    0x000000FF
    %4    0x000000?0

slide-68
SLIDE 68

Example

? = Unknown

    %5:(s32) = G_CONSTANT i32 0x0F0
    %4:(s32) = G_AND %2, %3

          Value
    %0    0x????????
    %1    0x00000FF0
    %2    0x00000??0
    %3    0x000000FF
    %4    0x000000?0
    %5    0x000000F0
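
The table above can be reproduced with a few lines of bit arithmetic. The sketch below is a toy model, not LLVM's KnownBits class: `ToyKnownBits` and the helper functions are hypothetical names, and values are fixed at 32 bits. A bit that is set in neither mask corresponds to a `?` in the table.

```cpp
#include <cassert>
#include <cstdint>

// Toy known-bits record (hypothetical type).
struct ToyKnownBits {
  std::uint32_t Zero; // bits known to be 0
  std::uint32_t One;  // bits known to be 1
};

// Nothing is known about the value.
ToyKnownBits unknown() { return {0u, 0u}; }

// Every bit of a constant is known.
ToyKnownBits constant(std::uint32_t V) { return {~V, V}; }

// AND: a result bit is 0 if it is 0 in either operand, and 1 only if
// it is 1 in both operands.
ToyKnownBits knownAnd(ToyKnownBits L, ToyKnownBits R) {
  return {L.Zero | R.Zero, L.One & R.One};
}

// (a + 1) -> (a | 1) is only valid if bit 0 of a is known to be zero.
bool canTurnAddOneIntoOr(ToyKnownBits A) { return (A.Zero & 1u) != 0; }
```

With `%0` unknown, `knownAnd(unknown(), constant(0xFF0))` leaves only bits 4 to 11 unknown (0x00000??0), and a further AND with 0x0FF narrows that to bits 4 to 7 (0x000000?0). Bit 0 is then known to be zero, so the add-to-or rewrite would be allowed on that value.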

slide-71
SLIDE 71

Why an Analysis Pass?

  • In SelectionDAGISel, computeKnownBits() is just a function
  • In GlobalISel, it’s an Analysis Pass
  • It allows us to add support for:
      • Caching within a pass
      • Caching between passes
      • Early exit when enough is known
  • Allows us to have alternative implementations
slide-72
SLIDE 72

Extending KnownBits

    // Out-of-class definition: 'override' and the default argument
    // belong on the declaration inside the class.
    void MyTargetLowering::computeKnownBitsForTargetInstr(
        GISelKnownBits &Analysis, Register R, KnownBits &Known,
        const APInt &DemandedElts, const MachineRegisterInfo &MRI,
        unsigned Depth) const {
      // ...
      switch (Opcode) {
      // ...
      case TargetOpcode::ANDWrr: {
        Analysis.computeKnownBitsImpl(MI.getOperand(2).getReg(), Known,
                                      DemandedElts, Depth + 1);
        Analysis.computeKnownBitsImpl(MI.getOperand(1).getReg(), Known2,
                                      DemandedElts, Depth + 1);
        // A result bit is one only if it is one in both operands, and
        // zero if it is zero in either operand.
        Known.One &= Known2.One;
        Known.Zero |= Known2.Zero;
        break;
      }
      // ...
      }
      // ...
    }
slide-73
SLIDE 73

KnownBits Analysis

  • Allows optimizations that otherwise wouldn't be possible
  • Available to any MachineFunction pass
  • Caching will make it cheaper than SelectionDAGISel's equivalent
slide-76
SLIDE 76

SimplifyDemandedBits

  • Essentially a special case of Combine
  • Tries to eliminate calculations that only contribute to bits that are never read
  • If demand mask is 0xF0:
      • (a << 16) | (b & 0xFFFF) → (b & 0xFFFF)
  • Not upstreamed yet, but we plan to fix that soon
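
The 0xF0 example can be checked directly. The sketch below is a plain arithmetic illustration with hypothetical function names, not the SimplifyDemandedBits implementation: it compares the original and simplified expressions only on the bits a consumer actually reads.

```cpp
#include <cassert>
#include <cstdint>

// The original expression from the slide.
std::uint32_t original(std::uint32_t A, std::uint32_t B) {
  return (A << 16) | (B & 0xFFFFu);
}

// The simplified expression: the shifted operand is dropped.
std::uint32_t simplified(std::uint32_t B) { return B & 0xFFFFu; }

// True if X and Y agree on every bit selected by the demand mask.
bool agreeOnDemandedBits(std::uint32_t X, std::uint32_t Y,
                         std::uint32_t Demanded) {
  return ((X ^ Y) & Demanded) == 0;
}
```

Because `a << 16` can only set bits 16 and above, the two expressions agree on any demand mask confined to the low 16 bits, such as 0xF0. They stop agreeing once a high bit is demanded, which is why the rewrite depends on the mask.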
slide-78
SLIDE 78

Testing

[Diagram: the SelectionDAGISel pipeline (LLVM-IR, SelectionDAG, MIR) next to the GlobalISel pipeline (IR Translator, Legalizer, Register Bank Selector, Instruction Selector), whose intermediate results are labeled LLVM-IR, gMIR, gMIR + MIR, gMIR + MIR, and MIR.]
slide-79
SLIDE 79

Unit Testing

  • Unit Testable too
  • We use FileCheck as a library to check results
  • It allows us to test exactly what optimizations do

[Diagram: the Legalizer, between gMIR and gMIR + MIR, broken into individual steps]
slide-80
SLIDE 80

Debugging

  • It is error prone to implement optimizations from scratch
      • Special cases
      • Floating Point Precision issues (e.g. x * y + z → fma(x, y, z))
  • Porting can be difficult too due to differences vs SelectionDAGISel
  • It is especially hard to debug on GPUs
      • Xcode has a tool to debug shaders, but it relies on the compiler being correct
slide-86
SLIDE 86

BlockExtractor

  • LLVM Pass used by llvm-extract
  • Promotes specified BasicBlocks to functions
  • Exploitable to find critical block(s) for a bug
  • GlobalISel can be disabled per function

[Diagram: a function with basic blocks BB1 to BB4, some compiled with GlobalISel and the others with SDAGISel]
slide-91
SLIDE 91

BlockExtractor

  • Search space still too large?
      • Split the BasicBlocks and repeat

[Diagram: the same function after splitting, with basic blocks BB1 to BB6, again divided between GlobalISel and SDAGISel]
slide-92
SLIDE 92

BlockExtractor

  • All the components are upstream
  • You will need a driver script to put them together

    $ ./bin/llvm-extract -o - -S \
        -b 'foo:bb9;bb20' <input> > extracted.ll
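
The search a driver script performs can be sketched as a simple bisection. Everything here is an assumption made for illustration: `findCriticalBlock` is a hypothetical helper, and `Fails` stands in for "extract these blocks, compile them with GlobalISel and the rest with SelectionDAGISel, then run the test". The sketch also assumes exactly one block is responsible and that the failure reproduces with the full list.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Narrows a non-empty list of basic block names down to the single
// block whose GlobalISel output triggers the failure.
std::string findCriticalBlock(
    std::vector<std::string> Blocks,
    const std::function<bool(const std::vector<std::string> &)> &Fails) {
  assert(!Blocks.empty());
  while (Blocks.size() > 1) {
    auto Mid = Blocks.begin() +
               static_cast<std::ptrdiff_t>(Blocks.size() / 2);
    std::vector<std::string> Lower(Blocks.begin(), Mid);
    std::vector<std::string> Upper(Mid, Blocks.end());
    // Keep whichever half still reproduces the failure.
    Blocks = Fails(Lower) ? Lower : Upper;
  }
  return Blocks.front();
}
```

In a real driver the oracle would shell out to llvm-extract and the compiler. When more than one block is involved, a more general reduction strategy would be needed.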
slide-93
SLIDE 93

Advice

slide-94
SLIDE 94

Advice: Minimize Fallbacks

  • Falling back:
      • Wastes compile time
      • Skews quality metrics

[Chart: Compile Time against Development Progress, for SelectionDAGISel and GlobalISel]
slide-95
SLIDE 95

Advice: Track Metrics Closely

  • Catch regressions early
  • Celebrate wins
slide-96
SLIDE 96

Advice: Identify Key Optimizations

  • Identify important optimizations
  • Code Coverage Insights
  • Minimize with BlockExtractor

[Diagram: two builds of a function with basic blocks BB1 to BB6, one using SelectionDAGISel throughout and one mixing in GlobalISel. Labels: 40 instrs; 45 instrs, 5 due to BB4.]
slide-97
SLIDE 97

Advice: Starting a Combiner

  • Simple combines go a long way
  • PreLegalizerCombiner and PostLegalizerCombiner are easy starting points

[Chart: Gain against Effort]
slide-98
SLIDE 98

Advice: Freedom

  • Remember: Not a fixed pipeline
  • Can replace passes
  • Insert a pass where appropriate

[Diagram: the IR Translator, Legalizer, and Instruction Selector with additional passes: Peephole, Trivial RegBank Selector, Errata Fixups, Extract Invariants, Select Intrinsics, Just Make it Faster]
slide-99
SLIDE 99

Work In Progress

slide-100
SLIDE 100

Declarative Combiner

  • Modify RuleSets
      • Targets may wish to disable rules or make them only apply in certain circumstances
  • Analyze RuleSets
      • Enables various kinds of tooling
  • Optimize RuleSets
slide-101
SLIDE 101

Goals

  • Test Combine rules in isolation
  • More debuggable: infinite loops and large rule-sets
  • More target control
  • Enable tools: profilers, coverage, static-analysis, proof engines
  • Be independent of algorithm used
slide-102
SLIDE 102

Declarative Rule

    def : GICombineRule<
      (defs reg:$D, reg:$S),
      (match (G_ZEXT s16:$t1, s8:$S),
             (G_ZEXT s32:$D, s16:$t1)),
      (apply (G_ZEXT s32:$D, s8:$S))>;

[Diagram: G_ZEXT s8 → s16 followed by G_ZEXT s16 → s32 becomes a single G_ZEXT s8 → s32]
slide-103
SLIDE 103

Why not SelectionDAG's patterns?

  • They can't describe several classes of DAG
  • Only bottom-up, tree-like DAGs with limited node sharing
  • Can't describe multiple results from one instruction
slide-104
SLIDE 104

Example - SelectionDAG Style

  • Analyzes def of the G_SEXT, G_ANYEXT, G_ZEXT in isolation
  • Folds the G_LOAD down into the extend, duplicating the load
  • Volatile/Atomics rejected unless hasOneUse()

[Diagram: a G_LOAD feeding G_SEXT, G_ZEXT, and G_ANYEXT, shown with G_SEXTLOAD, G_ZEXTLOAD, and G_EXTLOAD]
slide-105
SLIDE 105

Example - GlobalISel Style

[Diagram: the same G_LOAD with G_SEXT, G_ZEXT, and G_ANYEXT, G_SEXTLOAD, G_ZEXTLOAD, and G_EXTLOAD, next to a form built from G_SEXTLOAD, G_ZEXT, G_TRUNC, G_ANYEXT, G_AND, and G_CONSTANT]
slide-106
SLIDE 106

Debug Info

    def : GICombineRule<
      (defs reg:$D, reg:$S, instr:$MI0, instr:$MI1),
      (match (G_ZEXT $t0, $S):$MI0,
             (G_ZEXT $D, $t0):$MI1,
             (isScalarType type:$D),
             (isLargerType type:$D, type:$S)),
      (apply (G_ZEXT $D, $S, (debug_locations $MI0, $MI1)))>;

slide-107
SLIDE 107

Rule Selection

  • CombinerHelpers are declared in TableGen
  • Specifies a class name and a list of combines in priority order
  • Generated combiner ensures this order is honoured but still optimizes

    def MyPreLegalizerCombinerHelper : GICombinerHelper<
      "MyGenPreLegalizerCombinerHelper",
      [copy_prop, fold_add_0, fold_mul_1,
       postpone_sext_for_add, postpone_zext_for_add,
       postpone_sext_for_sub, postpone_zext_for_sub,
       extending_loads]>;

slide-108
SLIDE 108

Rule Selection

  • CombineGroups may be specified to factor out:
  • Common combines (e.g. identities)
  • Common target features (e.g. unfused_muladd, load_multiple, bswap)

def identities : GICombineGroup<[fold_add_0, fold_mul_1]>;
def trivial_combines : GICombineGroup<[copy_prop, identities]>;
def postpone_extends : GICombineGroup<[
  postpone_sext_for_add, postpone_zext_for_add,
  postpone_sext_for_sub, postpone_zext_for_sub]>;

def MyPreLegalizerCombinerHelper : GICombinerHelper<
  "MyGenPreLegalizerCombinerHelper",
  [trivial_combines, postpone_extends, extending_loads]>;
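
One way to read the grouped declaration is that nested groups expand, left to right, into the same flat priority list as the ungrouped helper. The C++ below is a minimal sketch of that reading. It is an assumption about how expansion could work, not the actual TableGen backend, and `GroupMap`, `flatten` and `exampleRules` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a name found in the map is a group; any other name is
// an individual combine rule.
using GroupMap = std::map<std::string, std::vector<std::string>>;

// Expands groups depth-first and left to right, so the order in which rules
// are listed (their priority) is preserved.
inline void flatten(const GroupMap &Groups,
                    const std::vector<std::string> &Names,
                    std::vector<std::string> &Out, unsigned Depth = 0) {
  assert(Depth < 64 && "groups nested too deeply (cycle?)");
  for (const std::string &N : Names) {
    auto It = Groups.find(N);
    if (It == Groups.end())
      Out.push_back(N); // leaf: an individual rule
    else
      flatten(Groups, It->second, Out, Depth + 1); // nested group
  }
}

// Flattens the helper from the slide using the groups defined there.
inline std::vector<std::string> exampleRules() {
  GroupMap Groups = {
      {"identities", {"fold_add_0", "fold_mul_1"}},
      {"trivial_combines", {"copy_prop", "identities"}},
      {"postpone_extends",
       {"postpone_sext_for_add", "postpone_zext_for_add",
        "postpone_sext_for_sub", "postpone_zext_for_sub"}},
  };
  std::vector<std::string> Out;
  flatten(Groups, {"trivial_combines", "postpone_extends", "extending_loads"},
          Out);
  return Out;
}
```

Under this model, `exampleRules()` yields the same eight rules, in the same order, as the ungrouped helper declaration.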
slide-110
SLIDE 110

Rule Selection

  • Generated combiner includes a command line option when asked:
      • -myprelegalizercombiner-disable-rule=1
      • -myprelegalizercombiner-disable-rule=0-50,75-100
      • -myprelegalizercombiner-disable-rule=fold_2_plus_2_to_5

def MyPreLegalizerCombinerHelper : GICombinerHelper<
  "MyGenPreLegalizerCombinerHelper",
  [trivial_combines, postpone_extends, extending_loads]> {
  let DisableRuleOption = "myprelegalizercombiner-disable-rule";
}
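
The three option forms above (a single ID, comma-separated ID ranges, and a rule name) could be parsed along the following lines. This is a hedged sketch in standalone C++, not the code the generated combiner actually contains. `RuleIDs`, `parseNumber` and `parseDisabledRules` are hypothetical, and the name-to-ID table is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

using RuleIDs = std::map<std::string, unsigned>; // hypothetical name -> ID table

// Parses an unsigned decimal number; rejects empty, non-digit or very long
// input (at most 9 digits, so the value always fits in 32 bits).
inline bool parseNumber(const std::string &S, unsigned &V) {
  if (S.empty() || S.size() > 9)
    return false;
  V = 0;
  for (char C : S) {
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return false;
    V = V * 10 + static_cast<unsigned>(C - '0');
  }
  return true;
}

// Adds every rule ID named by Spec to Disabled. Returns false if any
// comma-separated item is not an ID, a known rule name, or a Lo-Hi range.
inline bool parseDisabledRules(const std::string &Spec, const RuleIDs &Names,
                               std::set<unsigned> &Disabled) {
  std::stringstream SS(Spec);
  std::string Item;
  while (std::getline(SS, Item, ',')) {
    unsigned N = 0, Lo = 0, Hi = 0;
    std::string::size_type Dash = Item.find('-');
    RuleIDs::const_iterator It = Names.find(Item);
    if (parseNumber(Item, N)) {
      Disabled.insert(N); // single ID, e.g. "1"
    } else if (It != Names.end()) {
      Disabled.insert(It->second); // rule name
    } else if (Dash != std::string::npos &&
               parseNumber(Item.substr(0, Dash), Lo) &&
               parseNumber(Item.substr(Dash + 1), Hi) && Lo <= Hi) {
      for (unsigned I = Lo; I <= Hi; ++I) // inclusive range, e.g. "0-50"
        Disabled.insert(I);
    } else {
      return false; // invalid rule identifier
    }
  }
  return true;
}
```

In this sketch, ranges are treated as inclusive at both ends. Whether the real option does the same is an assumption.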
slide-111
SLIDE 111

Rule Selection

  • Sometimes a group is almost what we want, but it has a small flaw
  • Combiners (and maybe groups in future) can modify their contents
  • Exact modifiers are still to be decided

def MyPreLegalizerCombinerHelper : GICombinerHelper<
  "MyGenPreLegalizerCombinerHelper",
  [trivial_combines, postpone_extends, extending_loads]> {
  let DisableRuleOption = "myprelegalizercombiner-disable-rule";
  let Modifiers = [(disable_rule copy_prop),
                   (add_predicate lower_add_to_or, (when_profitable $d))];
}
slide-112
SLIDE 112

Integration

  • TableGen generates a Combiner
  • Integrate it into CombinerInfo via the constructor and a small change to combine()

AArch64GenPreLegalizerCombinerHelper Generated;
if (!Generated.parseCommandLineOption())
  report_fatal_error("Invalid rule identifier");
if (Generated.tryCombineAll(Observer, MI, B))
  return true;

slide-113
SLIDE 113

Extensibility


def : GICombineRule<
  (defs reg:$D, reg:$S),
  (match (G_ZEXT s32:$t1, s8:$S),
         (G_ZEXT s16:$D, s32:$t1),
         (require (allof O3, armv8, neon)),
         (a_b_testing "Experiment54")),
  (apply (G_ZEXT s16:$D, s8:$S),
         (debug_print "Investigate this test"),
         (tweet "@llvmorg" "Optimization win! 👼"))>;

slide-114
SLIDE 114

Development Tools

  • Coverage - Are rules tested? Do they trigger in practice?
  • Profiler - Are they worth their cost?
  • Controlled application of rules - If I applied them in this order, would the outcome be better?
slide-115
SLIDE 115

Debugging Tools

  • Rule Bisection - Which one caused a miscompilation?
  • N-stable Loop Detection - Why doesn't my combiner terminate?
  • Rule Proving (i.e. Alive for the backend?) - Are my rules correct?
  • State machine debugger? - What is the combiner doing?
  • MIR Patches - Re-construct intermediate MIR
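
Rule bisection can be sketched as a binary search over rule IDs, for example by disabling a suffix of the rules with a disable-rule range on each trial. The C++ below is a minimal illustration under stated assumptions: the `Oracle` callback is hypothetical (in practice it would rebuild and run a failing test), and the search assumes the result is good with no rules enabled, bad with all rules enabled, and that once a prefix of rules is bad every longer prefix is bad too.

```cpp
#include <cassert>
#include <functional>

// Hypothetical oracle: returns true if the miscompilation is still present
// when only rules with ID < NumEnabled may fire, i.e. when rules
// NumEnabled..NumRules-1 are disabled.
using Oracle = std::function<bool(unsigned NumEnabled)>;

// Returns the ID of the rule whose enabling first turns a good result bad.
inline unsigned bisectRules(unsigned NumRules, const Oracle &IsBad) {
  assert(NumRules > 0 && !IsBad(0) && IsBad(NumRules));
  unsigned Good = 0;       // invariant: !IsBad(Good)
  unsigned Bad = NumRules; // invariant: IsBad(Bad)
  while (Bad - Good > 1) {
    unsigned Mid = Good + (Bad - Good) / 2;
    if (IsBad(Mid))
      Bad = Mid;
    else
      Good = Mid;
  }
  return Bad - 1; // enabling this rule flips the result from good to bad
}
```

The search needs on the order of log2(NumRules) trials, which matters when each trial is a full compile and test run.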
slide-122
SLIDE 122

Recap

  • In 2017, we got GlobalISel working but code quality wasn't there yet
  • Added continuous CSE, Combine, KnownBits, and SimplifyDemandedBits
  • By 2019:
      • Compile time 45% faster than SelectionDAGISel
      • Generated code quality on par with SelectionDAGISel
  • Other targets are actively working on GlobalISel
  • We shipped it! You might even be using it!
slide-123
SLIDE 123