
SLIDE 1

Compilation 2016

Instruction Selection

Aslan Askarov aslan@cs.au.dk      Partially based on slides by E. Ernst

SLIDE 2

Where are we?

High-level source code → Lexing/Parsing → Semantic analysis → Translation to LLVM-- IR → Instruction selection → Register allocation → Low-level target code

SLIDE 3

Instruction selection: translating IR elements into target instructions

How to pick instructions for different IR elements?
When the IR is relatively simple, such as LLVM--, the process is relatively straightforward

most of the hard work is done by the codegen
When the IR is a bit more complex, such as the textbook's Tree IR language, there is more work to be done in this phase

Maximal Munch algorithm

SLIDE 4

Tree IR language (from Textbook)

A simple tree expression language:
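A minimal sketch of how the node kinds used in these slides might be represented, assuming a hypothetical encoding as nested Python tuples. Only a subset of the textbook's language is covered (CONST, TEMP, BINOP, MEM and the statement MOVE). BINOP nodes are stored under their operator symbol, which is the shorthand notation adopted in these slides. The upper-case constructor names simply mirror the IR and are illustrative, not taken from any real compiler.

```python
# Hypothetical encoding of a subset of the Tree IR as nested tuples.
# Expressions: CONST i, TEMP t, BINOP(op, e1, e2), MEM(e); statement: MOVE(dst, src).

OP_SYMBOL = {"PLUS": "+", "MINUS": "-", "MUL": "*", "DIV": "/"}


def CONST(i):
    return ("CONST", i)


def TEMP(t):
    return ("TEMP", t)


def BINOP(op, e1, e2):
    # Store the operator symbol directly, i.e. the shorthand +(e1, e2).
    return (OP_SYMBOL[op], e1, e2)


def MEM(e):
    return ("MEM", e)


def MOVE(dst, src):
    return ("MOVE", dst, src)


def show(tree):
    """Render a tree in shorthand notation, e.g. MEM(+(TEMP t, CONST 8))."""
    kind = tree[0]
    if kind in ("CONST", "TEMP"):
        return f"{kind} {tree[1]}"
    return f"{kind}({', '.join(show(child) for child in tree[1:])})"


access = MEM(BINOP("PLUS", TEMP("t"), CONST(8)))
print(show(access))  # MEM(+(TEMP t, CONST 8))
```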

SLIDE 5

Instruction Selection for Tree IR language

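Each IR node does one thing; real machine instructions typically do several things
Ex: a typical memory access, MEM(BINOP(PLUS, e, CONST c)), consists of three IR nodes (plus the subtree e), yet a single load instruction can perform it
This is good: the IR should be primitive
Instruction selection = finding ways to express IR trees using instructions
NB: using shorthand notation: BINOP(PLUS, e, CONST c) is written +(e, CONST c), so the memory access above becomes MEM(+(e, CONST c))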

SLIDE 6

Describing Instructions

Basic device: the tree pattern
Matching idea:
A tree pattern is a partial tree, called a tile
At the top: concrete nodes
At the bottom: blanks standing for subtrees, called leaves
Repeated matching (tiling) reconstructs the IR tree: every node is covered by exactly one tile
Reading off the instruction sequence: reverse of the top-down traversal order (the code for a tile's leaves must come before the tile's own instruction)
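The matching idea can be sketched as follows, assuming trees and tiles are both hypothetical nested tuples of the same shape, with the string "_" marking a leaf (a blank). The function `match` (a name chosen for illustration) returns the subtrees that sit under the tile's leaves, i.e. exactly the parts that still have to be tiled; a CONST inside a tile matches any constant. The example tile is the memory-load pattern M[rj + c] used for Jouette below.

```python
# Tiles are written like IR trees; "_" marks a leaf (a blank for any subtree).


def match(tile, tree):
    """Return the subtrees under the tile's leaves, or None if the tile does not match."""
    if tile == "_":
        return [tree]
    if tile[0] != tree[0] or len(tile) != len(tree):
        return None
    if tile[0] == "CONST":
        return []  # a CONST node in a tile matches any constant value
    leaves = []
    for sub_tile, subtree in zip(tile[1:], tree[1:]):
        sub_leaves = match(sub_tile, subtree)
        if sub_leaves is None:
            return None
        leaves.extend(sub_leaves)
    return leaves


load_tile = ("MEM", ("+", "_", ("CONST", "c")))  # LOAD ri ← M[rj + c]
tree = ("MEM", ("+", ("TEMP", "FP"), ("CONST", 8)))
print(match(load_tile, tree))                     # [('TEMP', 'FP')]
print(match(load_tile, ("MEM", ("TEMP", "FP"))))  # None
```

Tiling then recurs on each returned subtree, never on the nodes the tile itself covers.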

SLIDE 7

For illustration: Jouette

We need a concrete instruction set
Hypothetical (RISC) CPU architecture ‘Jouette’
Instructions:
Three-address format: flexible locations
Arithmetic operations: only in registers
Addressing modes: only one, register + fixed offset (M[rj + c])
Register r0 always contains zero

ADD   ri ← rj + rk
MUL   ri ← rj * rk
SUB   ri ← rj - rk
DIV   ri ← rj / rk
ADDI  ri ← rj + c
SUBI  ri ← rj - c
LOAD  ri ← M[rj + c]

SLIDE 8

Jouette Tiles

Two categories:
‘Expression tile’: produces a result in a register
‘Statement tile’: creates a side-effect
Special case: a register is an atomic expression
Tile: TEMP t (shorthand: TEMP); instruction: none (no name), the value is already in register ri

SLIDE 9

Jouette Expression Tiles

Main arithmetic operations: unique patterns

ADD  ri ← rj + rk   tile: +(_, _)
MUL  ri ← rj * rk   tile: *(_, _)
SUB  ri ← rj - rk   tile: -(_, _)
DIV  ri ← rj / rk   tile: /(_, _)

SLIDE 10

Jouette Expression Tiles

Arithmetic operations involving an immediate: multiple interpretations, hence multiple patterns

ADDI  ri ← rj + c   tiles: +(_, CONST c), +(CONST c, _), CONST c (using rj = r0)
SUBI  ri ← rj - c   tile: -(_, CONST c)

SLIDE 11

Jouette Expression Tiles

Reading from memory: many interpretations

LOAD  ri ← M[rj + c]   tiles: MEM(+(_, CONST c)), MEM(+(CONST c, _)), MEM(CONST c), MEM(_)

SLIDE 12

Jouette Statement Tiles

Storing in memory: larger tiles

STORE  M[ri + c] ← rj   tiles: MOVE(MEM(+(_, CONST c)), _), MOVE(MEM(+(CONST c, _)), _), MOVE(MEM(CONST c), _), MOVE(MEM(_), _)

SLIDE 13

Jouette Statement Tiles

Moving in memory
(Not a typical RISC instruction, but illustrative)
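NB: store tiles always match the two nodes MOVE and MEM of MOVE(MEM(_), _) simultaneously (a MEM that is the destination of a MOVE is a store, not a load)
MOVEM  M[ri] ← M[rj]   tile: MOVE(MEM(_), MEM(_))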

SLIDE 14

Example Tilings

Consider an IR tree for a[i] := x

MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))

Discuss how this tree can specify that assignment!
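One way to see how the tree specifies the assignment is to evaluate it. The sketch below assumes, for illustration only, that a and x are frame offsets of local variables, that the word at FP + a holds the address of the array, that i lives in a temporary, and that array elements are 4 bytes wide (the CONST 4 in the tree). A MEM that is the destination of a MOVE means "store to this address"; everywhere else MEM means "load from this address". The helper names and all concrete addresses and values are made up.

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}


def eval_exp(e, regs, mem, offsets):
    """Evaluate an expression tree to an integer."""
    kind = e[0]
    if kind == "CONST":
        return offsets.get(e[1], e[1])  # symbolic offsets such as "a" are looked up
    if kind == "TEMP":
        return regs[e[1]]
    if kind == "MEM":
        return mem[eval_exp(e[1], regs, mem, offsets)]  # MEM used as a value: load
    left = eval_exp(e[1], regs, mem, offsets)
    right = eval_exp(e[2], regs, mem, offsets)
    return ARITH[kind](left, right)


def exec_move(stm, regs, mem, offsets):
    """Execute MOVE(dst, src), updating regs or mem in place."""
    _, dst, src = stm
    value = eval_exp(src, regs, mem, offsets)
    if dst[0] == "MEM":  # MEM as a MOVE destination: store
        mem[eval_exp(dst[1], regs, mem, offsets)] = value
    else:  # TEMP destination: write a register
        regs[dst[1]] = value


FP = ("TEMP", "FP")
assign = ("MOVE",
          ("MEM", ("+", ("MEM", ("+", FP, ("CONST", "a"))),
                        ("*", ("TEMP", "i"), ("CONST", 4)))),
          ("MEM", ("+", FP, ("CONST", "x"))))

regs = {"FP": 1000, "i": 3}
offsets = {"a": 8, "x": 12}
mem = {1008: 2000, 1012: 42}  # a holds the array address 2000; x holds 42
exec_move(assign, regs, mem, offsets)
print(mem[2012])  # 42, i.e. a[3] := x with 4-byte elements
```

The address computed on the left is (contents of FP + a) + i * 4, i.e. the address of a[i]; the value stored is the contents of FP + x, i.e. x.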

SLIDE 15

Example Tilings

One way to tile this IR tree for a[i] := x

LOAD   r1 ← M[FP + a]     tile: MEM(+(_, CONST a))
ADDI   r2 ← r0 + 4        tile: CONST 4
MUL    r2 ← ri * r2       tile: *(_, _)
ADD    r1 ← r1 + r2       tile: +(_, _)
LOAD   r2 ← M[FP + x]     tile: MEM(+(_, CONST x))
STORE  M[r1 + 0] ← r2     tile: MOVE(MEM(_), _)

SLIDE 16

Example Tilings

Another way to tile this IR tree for a[i] := x

LOAD   r1 ← M[FP + a]     tile: MEM(+(_, CONST a))
ADDI   r2 ← r0 + 4        tile: CONST 4
MUL    r2 ← ri * r2       tile: *(_, _)
ADD    r1 ← r1 + r2       tile: +(_, _)
ADDI   r2 ← FP + x        tile: +(_, CONST x)
MOVEM  M[r1] ← M[r2]      tile: MOVE(MEM(_), MEM(_))

SLIDE 17

Example Tilings

An “anti-optimal” tiling of the tree for a[i] := x

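ADDI   r1 ← r0 + a        tile: CONST a
ADD    r1 ← FP + r1       tile: +(_, _)
LOAD   r1 ← M[r1 + 0]     tile: MEM(_)
ADDI   r2 ← r0 + 4        tile: CONST 4
MUL    r2 ← ri * r2       tile: *(_, _)
ADD    r1 ← r1 + r2       tile: +(_, _)
ADDI   r2 ← r0 + x        tile: CONST x
ADD    r2 ← FP + r2       tile: +(_, _)
LOAD   r2 ← M[r2 + 0]     tile: MEM(_)
STORE  M[r1 + 0] ← r2     tile: MOVE(MEM(_), _)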
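To check that the example tilings really compute the same assignment, the complete instruction sequences of the three tilings of a[i] := x can be run on a tiny Jouette interpreter. This is a sketch under assumptions: the tuple encoding of instructions and the helper `run` are hypothetical, r0 is treated as always holding zero, DIV is modelled as Python floor division, and the frame layout (FP, the offsets a and x, the array address, the value of i) is invented for illustration.

```python
import operator

ARITH = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul,
         "DIV": operator.floordiv}


def run(program, regs, mem):
    """Execute a list of Jouette instructions; return the final memory (inputs untouched)."""
    regs, mem = dict(regs), dict(mem)
    regs["r0"] = 0  # r0 always contains zero
    for op, *args in program:
        if op in ARITH:        # ri ← rj op rk
            d, s, t = args
            regs[d] = ARITH[op](regs[s], regs[t])
        elif op == "ADDI":     # ri ← rj + c
            d, s, c = args
            regs[d] = regs[s] + c
        elif op == "SUBI":     # ri ← rj - c
            d, s, c = args
            regs[d] = regs[s] - c
        elif op == "LOAD":     # ri ← M[rj + c]
            d, s, c = args
            regs[d] = mem[regs[s] + c]
        elif op == "STORE":    # M[ri + c] ← rj
            s, c, t = args
            mem[regs[s] + c] = regs[t]
        elif op == "MOVEM":    # M[ri] ← M[rj]
            d, s = args
            mem[regs[d]] = mem[regs[s]]
        else:
            raise ValueError(f"unknown instruction {op}")
    return mem


A, X = 8, 12  # hypothetical frame offsets of a and x
tiling1 = [("LOAD", "r1", "FP", A), ("ADDI", "r2", "r0", 4), ("MUL", "r2", "ri", "r2"),
           ("ADD", "r1", "r1", "r2"), ("LOAD", "r2", "FP", X), ("STORE", "r1", 0, "r2")]
tiling2 = tiling1[:4] + [("ADDI", "r2", "FP", X), ("MOVEM", "r1", "r2")]
tiling3 = [("ADDI", "r1", "r0", A), ("ADD", "r1", "FP", "r1"), ("LOAD", "r1", "r1", 0),
           ("ADDI", "r2", "r0", 4), ("MUL", "r2", "ri", "r2"), ("ADD", "r1", "r1", "r2"),
           ("ADDI", "r2", "r0", X), ("ADD", "r2", "FP", "r2"), ("LOAD", "r2", "r2", 0),
           ("STORE", "r1", 0, "r2")]

regs = {"FP": 1000, "ri": 3}
mem = {1000 + A: 2000, 1000 + X: 42}  # a points to an array at address 2000; x = 42
for tiling in (tiling1, tiling2, tiling3):
    print(len(tiling), run(tiling, regs, mem)[2012])  # 6 42, then 6 42, then 10 42
```

All three leave the same memory behind, with a[3] (address 2012) set to x; they differ only in cost: 6, 6 and 10 instructions.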

SLIDE 18

Optimal vs Optimum Tilings

What’s the “best” tiling?
Minimal number of instructions?
Best performance at runtime?
Compositionality assumption: the cost of a tiling is the sum of the costs of its tiles (in reality, cost is not additive!)
Choice here: minimal number of instructions
Optimal: no two adjacent tiles can be combined into a single tile of lower cost
Optimum: No tiling has lower cost
Property for optimal: local, for optimum: global
Note that optimum ⇒ optimal, not vice versa

SLIDE 19

Comparing Criteria

Obviously, optimal is easier to achieve than optimum
Then, how valuable is optimum?
RISC CPU architecture: not terribly important
each tile is small, so optimal and optimum tilings are often identical
CISC CPU architecture: more important
larger tiles, many choices everywhere

SLIDE 20

Algorithm: Maximal Munch

A greedy algorithm, fast, easy to understand
Idea:
Start from root of IR tree, work downward
At each node N, choose biggest tile that matches
Recur on the leaves of the chosen tile (not on the children of N!)
Note: it never gets stuck as long as every kind of node has a single-node tile
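A compact sketch of Maximal Munch for the Jouette tiles, assuming a hypothetical tuple encoding of trees (with "_" as a tile leaf) and naive register naming: every tile gets a fresh rN instead of the register reuse shown on the slides. Ties between equally large tiles go to the tile listed first. Each tile's instruction is appended only after the code for its leaves, so the output order is the reverse of the top-down order in which tiles are chosen. The class and method names are illustrative.

```python
# Maximal Munch over the Jouette tiles (a sketch).
# Trees: ("MOVE", d, s), ("MEM", e), ("+"|"-"|"*"|"/", l, r), ("CONST", c), ("TEMP", t).
# In tiles, "_" is a leaf and ("CONST", "c") matches any constant.

EXP_TILES = [  # (tile, template): {d} = result register, {r} = leaf registers, {c} = constants
    (("+", "_", "_"), "ADD {d} <- {r[0]} + {r[1]}"),
    (("*", "_", "_"), "MUL {d} <- {r[0]} * {r[1]}"),
    (("-", "_", "_"), "SUB {d} <- {r[0]} - {r[1]}"),
    (("/", "_", "_"), "DIV {d} <- {r[0]} / {r[1]}"),
    (("+", "_", ("CONST", "c")), "ADDI {d} <- {r[0]} + {c[0]}"),
    (("+", ("CONST", "c"), "_"), "ADDI {d} <- {r[0]} + {c[0]}"),
    (("CONST", "c"), "ADDI {d} <- r0 + {c[0]}"),
    (("-", "_", ("CONST", "c")), "SUBI {d} <- {r[0]} - {c[0]}"),
    (("MEM", ("+", "_", ("CONST", "c"))), "LOAD {d} <- M[{r[0]} + {c[0]}]"),
    (("MEM", ("+", ("CONST", "c"), "_")), "LOAD {d} <- M[{r[0]} + {c[0]}]"),
    (("MEM", ("CONST", "c")), "LOAD {d} <- M[r0 + {c[0]}]"),
    (("MEM", "_"), "LOAD {d} <- M[{r[0]} + 0]"),
]
STM_TILES = [
    (("MOVE", ("MEM", ("+", "_", ("CONST", "c"))), "_"), "STORE M[{r[0]} + {c[0]}] <- {r[1]}"),
    (("MOVE", ("MEM", ("+", ("CONST", "c"), "_")), "_"), "STORE M[{r[0]} + {c[0]}] <- {r[1]}"),
    (("MOVE", ("MEM", ("CONST", "c")), "_"), "STORE M[r0 + {c[0]}] <- {r[0]}"),
    (("MOVE", ("MEM", "_"), "_"), "STORE M[{r[0]} + 0] <- {r[1]}"),
    (("MOVE", ("MEM", "_"), ("MEM", "_")), "MOVEM M[{r[0]}] <- M[{r[1]}]"),
]


def match(tile, tree, leaves, consts):
    """Match tile against tree, collecting leaf subtrees and constant values."""
    if tile == "_":
        leaves.append(tree)
        return True
    if tile[0] != tree[0] or len(tile) != len(tree):
        return False
    if tile[0] == "CONST":
        consts.append(tree[1])
        return True
    return all(match(t, s, leaves, consts) for t, s in zip(tile[1:], tree[1:]))


def size(tile):
    """Number of IR nodes a tile covers (leaves are not counted)."""
    if tile == "_":
        return 0
    if tile[0] == "CONST":
        return 1
    return 1 + sum(size(t) for t in tile[1:])


class MaximalMunch:
    def __init__(self):
        self.code = []
        self.next_reg = 0

    def biggest_tile(self, tiles, tree):
        best = None
        for tile, template in tiles:
            leaves, consts = [], []
            if match(tile, tree, leaves, consts) and (best is None or size(tile) > best[0]):
                best = (size(tile), template, leaves, consts)
        if best is None:
            raise ValueError(f"no tile matches {tree[0]}")
        return best[1:]

    def munch_exp(self, tree):
        """Tile an expression; return the register holding its value."""
        if tree[0] == "TEMP":  # a register is an atomic expression: no instruction
            return tree[1]
        template, leaves, consts = self.biggest_tile(EXP_TILES, tree)
        regs = [self.munch_exp(leaf) for leaf in leaves]  # recur on the tile's leaves
        self.next_reg += 1
        dst = f"r{self.next_reg}"
        self.code.append(template.format(d=dst, r=regs, c=consts))
        return dst

    def munch_stm(self, tree):
        template, leaves, consts = self.biggest_tile(STM_TILES, tree)
        regs = [self.munch_exp(leaf) for leaf in leaves]
        self.code.append(template.format(r=regs, c=consts))


FP = ("TEMP", "FP")
tree = ("MOVE",
        ("MEM", ("+", ("MEM", ("+", FP, ("CONST", "a"))),
                      ("*", ("TEMP", "ri"), ("CONST", 4)))),
        ("MEM", ("+", FP, ("CONST", "x"))))
muncher = MaximalMunch()
muncher.munch_stm(tree)
print("\n".join(muncher.code))
```

On the a[i] := x tree this reproduces the second tiling (LOAD, ADDI, MUL, ADD, ADDI, MOVEM), up to register names.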

SLIDE 21

Maximal Munch Example

The second tiling for a[i] := x

LOAD   r1 ← M[FP + a]     tile: MEM(+(_, CONST a))
ADDI   r2 ← r0 + 4        tile: CONST 4
MUL    r2 ← ri * r2       tile: *(_, _)
ADD    r1 ← r1 + r2       tile: +(_, _)
ADDI   r2 ← FP + x        tile: +(_, CONST x)
MOVEM  M[r1] ← M[r2]      tile: MOVE(MEM(_), MEM(_))

SLIDE 22

Optimum Algorithm

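An algorithm based on dynamic programming, a bit more complex than maximal munch

Idea:
Start from the bottom of the IR tree, work upward (recursion: process the children, then the current node)

Concept: assign a cost to each node (bottom up)
At each node, compute the cost of each matching tile T by adding the cost of T to the costs of T's leaves; the node's cost is the minimum over these tiles, and the cheapest tile is remembered
Then emit code from the root downward, using the remembered tiles

The resulting tiling is optimum (for the chosen cost measure)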
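The dynamic-programming algorithm can be sketched as follows, under a hypothetical tuple encoding of trees and the Jouette tile set, assuming every instruction costs 1 (so cost = number of instructions, the choice made on the "Optimal vs Optimum" slide). `label` is the bottom-up pass that records the cheapest tile for every node; `emit` then walks down from the root along the recorded tiles. Both names are illustrative; on ties, the tile listed first wins.

```python
import itertools

EXP_TILES = [  # (tile, template); "_" is a leaf, ("CONST", "c") matches any constant
    (("+", "_", "_"), "ADD {d} <- {r[0]} + {r[1]}"),
    (("*", "_", "_"), "MUL {d} <- {r[0]} * {r[1]}"),
    (("-", "_", "_"), "SUB {d} <- {r[0]} - {r[1]}"),
    (("/", "_", "_"), "DIV {d} <- {r[0]} / {r[1]}"),
    (("+", "_", ("CONST", "c")), "ADDI {d} <- {r[0]} + {c[0]}"),
    (("+", ("CONST", "c"), "_"), "ADDI {d} <- {r[0]} + {c[0]}"),
    (("CONST", "c"), "ADDI {d} <- r0 + {c[0]}"),
    (("-", "_", ("CONST", "c")), "SUBI {d} <- {r[0]} - {c[0]}"),
    (("MEM", ("+", "_", ("CONST", "c"))), "LOAD {d} <- M[{r[0]} + {c[0]}]"),
    (("MEM", ("+", ("CONST", "c"), "_")), "LOAD {d} <- M[{r[0]} + {c[0]}]"),
    (("MEM", ("CONST", "c")), "LOAD {d} <- M[r0 + {c[0]}]"),
    (("MEM", "_"), "LOAD {d} <- M[{r[0]} + 0]"),
]
STM_TILES = [
    (("MOVE", ("MEM", ("+", "_", ("CONST", "c"))), "_"), "STORE M[{r[0]} + {c[0]}] <- {r[1]}"),
    (("MOVE", ("MEM", ("+", ("CONST", "c"), "_")), "_"), "STORE M[{r[0]} + {c[0]}] <- {r[1]}"),
    (("MOVE", ("MEM", ("CONST", "c")), "_"), "STORE M[r0 + {c[0]}] <- {r[0]}"),
    (("MOVE", ("MEM", "_"), "_"), "STORE M[{r[0]} + 0] <- {r[1]}"),
    (("MOVE", ("MEM", "_"), ("MEM", "_")), "MOVEM M[{r[0]}] <- M[{r[1]}]"),
]


def match(tile, tree, leaves, consts):
    """Match tile against tree, collecting leaf subtrees and constant values."""
    if tile == "_":
        leaves.append(tree)
        return True
    if tile[0] != tree[0] or len(tile) != len(tree):
        return False
    if tile[0] == "CONST":
        consts.append(tree[1])
        return True
    return all(match(t, s, leaves, consts) for t, s in zip(tile[1:], tree[1:]))


def label(tree, table):
    """Bottom-up: label all children first, then record the cheapest tile for this node."""
    for child in tree[1:]:
        if isinstance(child, tuple):
            label(child, table)
    if tree[0] == "TEMP":  # already in a register: no instruction, cost 0
        table[tree] = (0, None, None)
        return
    tiles = STM_TILES if tree[0] == "MOVE" else EXP_TILES
    for tile, template in tiles:
        leaves, consts = [], []
        if match(tile, tree, leaves, consts):
            cost = 1 + sum(table[leaf][0] for leaf in leaves)  # cost of T + costs of T's leaves
            if tree not in table or cost < table[tree][0]:
                table[tree] = (cost, tile, template)


def emit(tree, table, code, fresh):
    """Top-down along the chosen tiles; a tile's instruction follows its leaves' code."""
    if tree[0] == "TEMP":
        return tree[1]
    _, tile, template = table[tree]
    leaves, consts = [], []
    match(tile, tree, leaves, consts)
    regs = [emit(leaf, table, code, fresh) for leaf in leaves]
    dst = None if tree[0] == "MOVE" else f"r{next(fresh)}"  # statements produce no value
    code.append(template.format(d=dst, r=regs, c=consts))
    return dst


FP = ("TEMP", "FP")
tree = ("MOVE",
        ("MEM", ("+", ("MEM", ("+", FP, ("CONST", "a"))),
                      ("*", ("TEMP", "ri"), ("CONST", 4)))),
        ("MEM", ("+", FP, ("CONST", "x"))))
table, code = {}, []
label(tree, table)
emit(tree, table, code, itertools.count(1))
print(table[tree][0])  # 6
print("\n".join(code))
```

For a[i] := x the optimum cost is 6, the same as the maximal-munch result; because the STORE tiles are listed before MOVEM, the tie is resolved in favour of the first example tiling.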

SLIDE 23

Algorithm Complexity

Parameters:
N : number of nodes in given IR tree
T : number of tiles
K : average number of non-leaf nodes in tiles
K' : maximum number of nodes to examine to see which tiles match
T' : average number of tiles matching at a node
Maximal Munch: (N/K) · (K' + T')
Optimum (dynamic programming) algorithm: N · (K' + T')
But both are linear in the size N of the IR tree, since K, K' and T' are constants for a given tile set!
“No problem!”

SLIDE 24

Tree Grammars

Motivation: some CPUs, e.g., Motorola 68000, have register classes: data vs. address registers

Problem: with the previous algorithms, a sub-tiling may produce its result in the wrong class of register

Idea:
Specify tiles as CFG rules, e.g. the ones below
Non-terminal indicates the register class (a: address, d: data)
Derivation creates the IR tree
Ambiguity = alternative tilings
Tools exist (code-generator generators), used much like parser generators

d ➜ MEM(+(a, CONST))
d ➜ MEM(+(CONST, a))
d ➜ MEM(CONST)
d ➜ MEM(a)
d ➜ a
a ➜ d
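The effect of register classes can be sketched with a labeller that computes, for every node and every non-terminal, the cheapest derivation. This is a hypothetical sketch in the spirit of code-generator generators, not the interface of any real tool: it uses only the six rules above plus an assumed rule a ➜ TEMP (temporaries live in address registers, cost 0), gives every other rule cost 1, and treats the chain rules d ➜ a and a ➜ d as register moves between the two classes.

```python
# Cheapest derivation per non-terminal ("a" = address register, "d" = data register).
# Trees: ("MEM", e), ("+", l, r), ("CONST", c), ("TEMP", t).
# In rule patterns, a string is a non-terminal leaf; ("CONST", None) matches any constant.

INF = float("inf")
RULES = [  # (lhs, pattern, cost)
    ("d", ("MEM", ("+", "a", ("CONST", None))), 1),
    ("d", ("MEM", ("+", ("CONST", None), "a")), 1),
    ("d", ("MEM", ("CONST", None)), 1),
    ("d", ("MEM", "a"), 1),
    ("a", ("TEMP", None), 0),  # assumed rule: temporaries live in address registers
]
CHAIN = [("d", "a", 1), ("a", "d", 1)]  # d ➜ a and a ➜ d: moves between register classes


def match(pattern, tree, leaves):
    """Collect (non-terminal, subtree) pairs for the pattern's leaves."""
    if isinstance(pattern, str):
        leaves.append((pattern, tree))
        return True
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return False
    if pattern[0] in ("CONST", "TEMP"):
        return True
    return all(match(p, t, leaves) for p, t in zip(pattern[1:], tree[1:]))


def label(tree, memo):
    """Bottom-up: map each non-terminal to the cheapest cost of deriving tree from it."""
    for child in tree[1:]:
        if isinstance(child, tuple):
            label(child, memo)
    costs = {}
    for lhs, pattern, cost in RULES:
        leaves = []
        if match(pattern, tree, leaves):
            total = cost + sum(memo[sub].get(nt, INF) for nt, sub in leaves)
            if total < costs.get(lhs, INF):
                costs[lhs] = total
    changed = True
    while changed:  # close the costs under the chain rules
        changed = False
        for lhs, rhs, cost in CHAIN:
            if costs.get(rhs, INF) + cost < costs.get(lhs, INF):
                costs[lhs] = costs[rhs] + cost
                changed = True
    memo[tree] = costs
    return costs


p = ("TEMP", "p")
inner = ("MEM", ("+", p, ("CONST", 8)))      # loads a pointer into a data register
outer = ("MEM", ("+", inner, ("CONST", 4)))  # uses that pointer as an address (class a)
memo = {}
print(label(outer, memo))  # {'d': 3, 'a': 4}
print(memo[inner])         # {'d': 1, 'a': 2}
```

In the example, the inner load leaves the pointer in a data register (d, cost 1); using it as an address costs one extra move (a, cost 2), so the outer load costs 3 in total.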

SLIDE 25

CPU Architecture Issues

RISC was largely designed to fit well with modern code generation

RISC features, good and bad:
many registers (e.g., 32)
every register can do everything (just one class)
arithmetic operations only on registers (some early RISCs even lacked an integer MUL instruction)
three-address instructions (flexible placement)
just one memory addressing mode (M[reg+const])
uniform instruction size (e.g., 32 bits)
every instruction has a single effect/result

SLIDE 26

Summary

IR nodes do one thing, instructions do many
Tree patterns, tiles, ‘leaves’ of tiles
Instruction selection: Cover IR tree with tiles
Jouette architecture, instruction set
Jouette statement tiles, expression tiles
Example tilings
Optimum vs. optimal tilings
Algorithms: Maximal munch; dyn. programming
Tree grammars