SLIDE 1

Compiler Development (CMPSC 401)

Code Optimization Janyl Jumadinova April 15, 2019

Janyl Jumadinova Compiler Development (CMPSC 401) April 15, 2019 1 / 56

SLIDE 4

Code Optimization

Goal: Optimize generated code by exploiting machine-dependent properties not visible at the IR level.

  • A critical step in most compilers, but often very messy.
  • Techniques developed for one machine may be completely useless on another.
  • Techniques developed for one language may be completely useless with another.

SLIDE 5

Machine Code

ARM vs. Intel's x86: ARM has an advantage in terms of power consumption, making it attractive for all sorts of battery-operated devices.

SLIDE 6

x86 Overview

Address space: 2^32 bytes

Data types:

  • 8-, 16-, 32-, and 64-bit integers, signed and unsigned
  • 32- and 64-bit floating point
  • Binary-coded decimal
  • 64-, 128-, and 256-bit vectors of integers/floats

SLIDE 7

x86 Registers Overview

16-bit integer registers:

  • General purpose (with exceptions): AX, BX, CX, DX
  • Pointer registers: SP (stack pointer), BP (base pointer)
  • For array indexing: DI, SI
  • Segment registers: CS, DS, SS, ES (legacy)
  • FLAGS register to store flags, e.g. CF, OF, ZF
  • Instruction pointer: IP

SLIDE 8

For instructions, see the Intel Software Developer's Manual.

"The x86 isn't all that complex... it just doesn't make a lot of sense." (Mike Johnson, AMD, 1994)


SLIDES 10-17

Processor Pipelines

(Figure-only slides; pipeline diagrams not captured in this transcript.)

SLIDE 18

Instruction Scheduling

Because of processor pipelining, the order in which instructions are executed can impact performance. Instruction scheduling is the reordering or insertion of machine instructions to increase performance. All good optimizing compilers have some form of instruction-scheduling support.

SLIDE 19

Data Dependencies

A data dependency in machine code is a set of instructions whose behavior depends on one another; intuitively, a set of instructions that cannot be reordered around each other.

Three types of data dependencies: read-after-write (true), write-after-read (anti), and write-after-write (output).
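The three kinds can be illustrated with straight-line code; this is a sketch using Python variables as stand-ins for registers (the variable names are made up for illustration):

```python
# Read-after-write (true dependency): `b` reads the value `a` wrote,
# so the second statement cannot move above the first.
a = 1
b = a + 1

# Write-after-read (anti-dependency): the write to `a` must not move
# above the read of `a` on the line before it.
c = a * 2
a = 7

# Write-after-write (output dependency): the two writes to `d` must
# keep their order so the last one wins.
d = 3
d = 4

print(b, c, d)  # 2 2 4
```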

SLIDES 20-22

Finding Data Dependencies

(Figure-only slides; worked examples not captured in this transcript.)

SLIDE 25

Data Dependencies

The graph of the data dependencies in a basic block is called the data dependency graph. It is always a directed acyclic graph:

  • Directed: one instruction depends on the other.
  • Acyclic: no circular dependencies allowed.

We can schedule the instructions in a basic block in any order as long as we never schedule a node before all of its parents.

Idea: Do a topological sort of the data dependency graph and output instructions in that order.
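This idea can be sketched directly as Kahn's topological-sort algorithm; the instruction names and dependency edges below are made up for illustration:

```python
from collections import defaultdict, deque

def schedule(instrs, deps):
    """Emit the instructions of a basic block in a topological order of
    the data dependency graph: a node becomes ready only once all of
    its parents have been emitted (Kahn's algorithm)."""
    succs = defaultdict(list)
    preds_left = {i: 0 for i in instrs}
    for before, after in deps:       # edge: `after` depends on `before`
        succs[before].append(after)
        preds_left[after] += 1
    ready = deque(i for i in instrs if preds_left[i] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for s in succs[node]:        # emitting `node` may unblock others
            preds_left[s] -= 1
            if preds_left[s] == 0:
                ready.append(s)
    return order

# i2 and i3 both read the value i1 writes; i4 reads both results.
order = schedule(["i1", "i2", "i3", "i4"],
                 [("i1", "i2"), ("i1", "i3"), ("i2", "i4"), ("i3", "i4")])
print(order)  # i1 comes first and i4 last; i2/i3 may appear in either order
```

Any order this produces respects every dependency edge, which is exactly the correctness condition stated above.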

SLIDES 26-28

Instruction Scheduling

(Figure-only slides; scheduling examples not captured in this transcript.)

SLIDE 30

Small Problem

There can be many valid topological orderings of a data dependency graph. How do we pick one that works well with the pipeline? In general, finding the fastest instruction schedule is known to be NP-hard. Heuristics are used in practice:

  • Schedule instructions that can run to completion without interference before instructions that cause interference.
  • Schedule instructions with more dependents before instructions with fewer dependents.
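The second heuristic can be folded into the topological sort by picking, among the ready instructions, the one with the most dependents. A minimal sketch (instruction names and edges are made up; a real scheduler would also model latencies and functional units):

```python
from collections import defaultdict

def list_schedule(instrs, deps):
    """Greedy list scheduling: among all currently ready instructions,
    emit the one with the most dependents first, since it unblocks the
    most remaining work."""
    succs = defaultdict(list)
    preds_left = {i: 0 for i in instrs}
    for before, after in deps:
        succs[before].append(after)
        preds_left[after] += 1
    ready = [i for i in instrs if preds_left[i] == 0]
    order = []
    while ready:
        # Heuristic: most dependents first.
        ready.sort(key=lambda i: len(succs[i]), reverse=True)
        node = ready.pop(0)
        order.append(node)
        for s in succs[node]:
            preds_left[s] -= 1
            if preds_left[s] == 0:
                ready.append(s)
    return order

# i1 feeds three later instructions while i2 feeds only one,
# so i1 is emitted first even though i2 is also ready.
order = list_schedule(["i2", "i1", "i3", "i4", "i5"],
                      [("i1", "i3"), ("i1", "i4"), ("i1", "i5"), ("i2", "i5")])
print(order[0])  # i1
```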

SLIDE 32

More Advanced Scheduling

Modern optimizing compilers can do far more aggressive scheduling to obtain impressive performance gains.

Loop unrolling:

  • Expand out several loop iterations at once.
  • Use the previous algorithm to schedule the instructions more intelligently.
  • Can find pipeline-level parallelism across loop iterations.

Software pipelining:

  • Loop unrolling on steroids; can convert loops using tens of cycles into loops averaging two or three cycles.
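As a sketch of the unrolling transformation (written in Python for readability; a compiler performs this on machine code), here is a dot-product loop unrolled four ways. The four independent accumulators are what break the single add-chain dependency and let a pipelined CPU overlap iterations:

```python
def dot_unrolled(a, b):
    """Dot product with the loop body unrolled four times, plus a
    cleanup loop for the leftover n % 4 iterations."""
    n = len(a)
    s0 = s1 = s2 = s3 = 0
    i = 0
    while i + 4 <= n:                # unrolled main loop
        s0 += a[i]     * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    while i < n:                     # leftover iterations
        s0 += a[i] * b[i]
        i += 1
    return s0 + s1 + s2 + s3

print(dot_unrolled([1, 2, 3, 4, 5], [2, 2, 2, 2, 2]))  # 30
```

The result is identical to the rolled loop; only the instruction-level structure changes.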

SLIDE 33

Memory Caches

Because computers use different types of memory, there are a variety of memory caches in the machine. Caches are designed to anticipate common use patterns, and compilers often have to rewrite code to take maximal advantage of these designs.
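A toy model makes the point. This is a hedged sketch of a tiny fully-associative LRU cache (the line size and line count are made-up parameters, not any real machine's):

```python
def miss_count(addresses, line_size=8, num_lines=4):
    """Count misses in a small fully-associative LRU cache whose lines
    each hold `line_size` consecutive addresses."""
    cache = []                        # most recently used at the end
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.remove(line)        # hit: refresh its LRU position
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.pop(0)          # evict the least recently used
        cache.append(line)
    return misses

sequential = list(range(32))          # walks memory in order
strided = [i * 8 for i in range(32)]  # jumps a full line every access
print(miss_count(sequential), miss_count(strided))  # 4 32
```

The sequential walk misses once per line and then gets seven free hits; the strided walk misses on every single access.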

SLIDE 35

Locality

Empirically, many programs exhibit temporal locality and spatial locality.

  • Temporal locality: memory read recently is likely to be read again.
  • Spatial locality: memory read recently will likely have nearby objects read as well.

Most memory caches are designed to exploit temporal and spatial locality by

  • holding recently-used memory addresses in cache, and
  • loading nearby memory addresses into cache.
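For example, a row-major 2-D array can be summed in two loop orders. Python lists here stand in for a real row-major array, so the cache effect itself only shows up in a compiled language, but the access-pattern difference is exactly the one a compiler's loop-interchange pass targets:

```python
def sum_row_major(matrix):
    """Inner loop walks consecutive elements of one row: adjacent
    accesses fall on the same cache line (good spatial locality)."""
    total = 0
    for row in matrix:
        for x in row:
            total += x
    return total

def sum_column_major(matrix):
    """Inner loop strides down a column: each access jumps a whole row
    ahead, touching a new cache line every time (poor locality)."""
    total = 0
    for j in range(len(matrix[0])):
        for i in range(len(matrix)):
            total += matrix[i][j]
    return total

m = [[1, 2, 3], [4, 5, 6]]
print(sum_row_major(m), sum_column_major(m))  # 21 21, same result
```

Both versions compute the same sum; only the traversal order, and hence the cache behavior, differs.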

SLIDES 36-42

Memory Caches

(Figure-only slides; cache diagrams not captured in this transcript.)

SLIDE 44

Improving Locality

Programmers frequently write code without understanding the locality implications, and languages don't expose low-level memory details. Some compilers are capable of rewriting code to take advantage of locality:

  • Loop reordering
  • Structure peeling
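A sketch of what structure peeling does to the data layout, with Python objects standing in for a C struct (the field names and `array` choice are made up for illustration):

```python
from array import array

# Array-of-structs layout: each record interleaves a hot field (`x`)
# with cold ones, so a loop that only reads `x` still drags the cold
# bytes through the cache.
particles = [{"x": float(i), "y": 0.0, "label": f"p{i}"} for i in range(4)]

# After peeling, each field lives in its own contiguous array; a loop
# over `xs` now touches only the bytes it actually needs.
xs = array("d", (p["x"] for p in particles))
ys = array("d", (p["y"] for p in particles))
labels = [p["label"] for p in particles]

hot_sum = sum(xs)     # the hot loop reads only the peeled `x` array
print(hot_sum)        # 0.0 + 1.0 + 2.0 + 3.0 = 6.0
```

The compiler version of this transformation rewrites every field access in the program to index the peeled arrays instead of the original struct.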

SLIDES 45-50

Loop Reordering

(Figure-only slides; worked example not captured in this transcript.)

SLIDES 51-60

Structure Peeling

(Figure-only slides; worked example not captured in this transcript.)

SLIDE 61

Summary

  • Instruction scheduling optimizations try to take advantage of the processor pipeline.
  • Locality optimizations try to take advantage of cache behavior.
  • Parallelism optimizations try to take advantage of multi-core machines.

There are many more optimizations out there!

SLIDES 62-63

Where we have been

(Figure-only slides; course-overview diagrams not captured in this transcript.)

SLIDE 64

Why Study Compilers?

  • Build a large, ambitious software system.
  • See theory come to life.
  • Learn how programming languages work.
  • Learn tradeoffs in language design.
