Instruction Selection Aslan Askarov aslan@cs.au.dk Partially based - - PowerPoint PPT Presentation

SLIDE 1

Compilation 2016

Instruction Selection

Aslan Askarov aslan@cs.au.dk
 
 
 Partially based on slides by E. Ernst

SLIDE 2

Where are we?

High-level source code ➜ Lexing/Parsing ➜ Semantic analysis ➜ Translation to LLVM-- IR ➜ Instruction selection ➜ Register allocation ➜ Low-level target code

SLIDE 3

Instruction selection: translating IR elements into target instructions

  • How to pick instructions for different IR elements?
  • When the IR is relatively simple, such as LLVM--, the process is relatively straightforward
  • most of the hard work is done by the codegen
  • When the IR is a bit more complex, such as the textbook's IR Tree language, there is more work to be done in this phase
  • Maximal Munch algorithm
SLIDE 4

Tree IR language (from Textbook)

  • A simple tree expression language:

signature TREE =
sig
  type label = Temp.label

  datatype stm = MOVE of exp * exp
               | EXP of exp
               | JUMP of exp * label list
               | CJUMP of relop * exp * exp * label * label
               | SEQ of stm * stm
               | LABEL of label

       and exp = CONST of int
               | NAME of label
               | TEMP of Temp.temp
               | BINOP of binop * exp * exp
               | MEM of exp
               | CALL of exp * exp list
               | ESEQ of stm * exp

       and binop = PLUS | MINUS | MUL | DIV
                 | AND | OR | LSHIFT | RSHIFT | ARSHIFT | XOR

       and relop = EQ | NE | LT | GT | LE | GE
                 | ULT | ULE | UGT | UGE
  ...
end
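To experiment with tiles, a subset of this datatype can be mirrored in Python. This is a hypothetical sketch, not part of the course code: it keeps only CONST, TEMP, BINOP, MEM and MOVE, and the class names (`Const`, `Temp`, `Binop`, `Mem`, `Move`) and the printer `show` are illustrative assumptions. `show` renders a tree in the shorthand notation used on the slides, where BINOP(PLUS, …) is written as `+(…)`.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical Python mirror of a *subset* of the TREE signature:
# CONST, TEMP, BINOP, MEM (expressions) and MOVE (a statement).

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Temp:
    name: str

@dataclass(frozen=True)
class Binop:
    op: str          # "PLUS", "MINUS", "MUL", "DIV", ...
    left: "Exp"
    right: "Exp"

@dataclass(frozen=True)
class Mem:
    addr: "Exp"

Exp = Union[Const, Temp, Binop, Mem]

@dataclass(frozen=True)
class Move:
    dst: Exp
    src: Exp

# Shorthand symbols for binary operators, as used on the slides
SYMBOL = {"PLUS": "+", "MINUS": "-", "MUL": "*", "DIV": "/"}

def show(t) -> str:
    """Render a tree in shorthand notation, e.g. MEM(+(TEMP fp, CONST 8))."""
    if isinstance(t, Const):
        return f"CONST {t.value}"
    if isinstance(t, Temp):
        return f"TEMP {t.name}"
    if isinstance(t, Binop):
        return f"{SYMBOL.get(t.op, t.op)}({show(t.left)}, {show(t.right)})"
    if isinstance(t, Mem):
        return f"MEM({show(t.addr)})"
    if isinstance(t, Move):
        return f"MOVE({show(t.dst)}, {show(t.src)})"
    raise TypeError(f"unsupported node: {t!r}")

# A typical memory access M[fp + 8]
print(show(Mem(Binop("PLUS", Temp("fp"), Const(8)))))  # MEM(+(TEMP fp, CONST 8))
```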

SLIDE 5

Instruction Selection for Tree IR language

  • Each IR node does one thing; real machine instructions typically do several things
  • Ex: a typical memory access, M[e + c], is the tree MEM(BINOP(PLUS, e, CONST c))
  • This is good: IR should be primitive
  • Instruction selection = find ways to express IR trees using instructions
  • NB: using shorthand notation, the same tree is written MEM(+(e, CONST c))

SLIDE 6

Describing Instructions

  • Basic device: the tree pattern
  • Matching idea
  • A tree pattern is a partial tree, a tile
  • From the top: concrete nodes
  • At the bottom: blanks, standing for subtrees, called leaves

  • Repeated matching (tiling) reconstructs an IR tree
  • Read off the instruction sequence: it is the reverse of the top-down traversal order, since a tile's leaves must be computed before the tile itself
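A minimal sketch of matching a tile at a node, assuming trees and patterns are encoded as nested Python tuples (a hypothetical encoding chosen for brevity): `"_"` is a blank, and a pattern node `("CONST",)` matches any constant. `match` returns the subtrees sitting under the blanks, i.e. the tile's leaves, which is where tiling continues.

```python
# Hypothetical tuple encoding: ("CONST", 8), ("TEMP", "fp"), ("+", l, r),
# ("MEM", e), ("MOVE", dst, src).  In a pattern, "_" is a blank and a
# bare ("CONST",) or ("TEMP",) node matches any constant or temporary.
BLANK = "_"

def match(pattern, tree):
    """Match a tile pattern at the root of tree.

    Returns the subtrees under the pattern's blanks (the tile's leaves,
    left to right), or None if the tile does not match here.
    """
    if pattern == BLANK:
        return [tree]                      # a blank stands for any subtree
    if pattern[0] != tree[0]:
        return None                        # concrete nodes must agree
    if pattern[0] in ("CONST", "TEMP"):
        return []                          # any value matches; no leaves
    if len(pattern) != len(tree):
        return None
    leaves = []
    for p, t in zip(pattern[1:], tree[1:]):
        sub = match(p, t)
        if sub is None:
            return None
        leaves.extend(sub)
    return leaves

load_tile = ("MEM", ("+", BLANK, ("CONST",)))          # LOAD ri <- M[rj + c]
tree = ("MEM", ("+", ("TEMP", "fp"), ("CONST", 8)))
print(match(load_tile, tree))                          # [('TEMP', 'fp')]
print(match(("MEM", ("+", ("CONST",), BLANK)), tree))  # None
```

The same tree is also matched by the smaller tile `("MEM", BLANK)`, whose single leaf is the whole `+` subtree; choosing among such alternatives is what the tiling algorithms decide.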

SLIDE 7

For illustration: Jouette

  • Need a concrete instruction set
  • Hypothetical (RISC) CPU architecture ‘Jouette’
  • Instructions: see the list below
  • Three-address format: flexible locations
  • Arithmetic operations: only in registers
  • Addressing modes: only one, a register plus a fixed offset
  • Register r0 always contains zero

ADD  ri ← rj + rk
MUL  ri ← rj * rk
SUB  ri ← rj - rk
DIV  ri ← rj / rk
ADDI ri ← rj + c
SUBI ri ← rj - c
LOAD ri ← M[rj + c]

SLIDE 8

Jouette Tiles

  • Two categories:
  • ‘Expression tile’: produces a result in a register
  • ‘Statement tile’: creates a side-effect
  • Special case: a register is an atomic expression

Name: (none)    Effect: ri    Tile: TEMP t    (shorthand: TEMP)

SLIDE 9

Jouette Expression Tiles

  • Main arithmetic operations: unique patterns

ADD ri ← rj + rk    pattern: +(_, _)
SUB ri ← rj - rk    pattern: -(_, _)
MUL ri ← rj * rk    pattern: *(_, _)
DIV ri ← rj / rk    pattern: /(_, _)

SLIDE 10

Jouette Expression Tiles

  • Arithmetic operations involving an immediate: multiple interpretations, hence multiple patterns

ADDI ri ← rj + c    patterns: +(_, CONST), +(CONST, _), CONST (for a bare CONST, rj is r0, which holds zero)
SUBI ri ← rj - c    pattern: -(_, CONST)

SLIDE 11

Jouette Expression Tiles

  • Reading from memory: many interpretations

LOAD ri ← M[rj + c]    patterns: MEM(+(_, CONST)), MEM(+(CONST, _)), MEM(CONST), MEM(_)

SLIDE 12

Jouette Statement Tiles

  • Storing in memory: larger tiles

STORE M[ri + c] ← rj    patterns: MOVE(MEM(+(_, CONST)), _), MOVE(MEM(+(CONST, _)), _), MOVE(MEM(CONST), _), MOVE(MEM(_), _)

SLIDE 13

Jouette Statement Tiles

  • Moving in memory
  • (Not a typical RISC instruction, but illustrative)
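  • NB: store tiles always match the two nodes MOVE(MEM(_), _) simultaneously, since a MEM that is the destination of a MOVE denotes a store address, not a load

MOVEM M[ri] ← M[rj]    pattern: MOVE(MEM(_), MEM(_))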

SLIDE 14

Example Tilings

  • Consider an IR tree for a[i] := x

Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))

Discuss how this tree can specify that assignment!

SLIDE 15

Example Tilings

  • One way to tile this IR tree for a[i] := x

Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))

LOAD r1 ← M[FP + a]
ADDI r2 ← r0 + 4
MUL r2 ← ri * r2
ADD r1 ← r1 + r2
LOAD r2 ← M[FP + x]
STORE M[r1 + 0] ← r2
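One way to convince ourselves that this tiling really performs a[i] := x is to execute it. The sketch below is a toy Jouette interpreter; the tuple encoding of instructions, the frame offsets `A` and `X`, and all concrete register and memory values are hypothetical, and it assumes that r0 always reads as zero and that DIV truncates toward zero.

```python
# Jouette instructions as tuples (hypothetical encoding for this sketch):
#   ("ADD", ri, rj, rk)   ri <- rj + rk     (likewise SUB, MUL, DIV)
#   ("ADDI", ri, rj, c)   ri <- rj + c      (likewise SUBI)
#   ("LOAD", ri, rj, c)   ri <- M[rj + c]
#   ("STORE", ri, c, rj)  M[ri + c] <- rj
#   ("MOVEM", ri, rj)     M[ri] <- M[rj]

def trunc_div(x, y):
    """Integer division rounding toward zero (an assumption about DIV)."""
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y >= 0) else -q

ARITH = {
    "ADD": lambda x, y: x + y,
    "SUB": lambda x, y: x - y,
    "MUL": lambda x, y: x * y,
    "DIV": trunc_div,
}

def run(program, regs, mem):
    """Execute a list of instructions; return the final registers and memory."""
    regs, mem = dict(regs), dict(mem)

    def get(r):
        return 0 if r == "r0" else regs[r]   # r0 always reads as zero

    for op, *args in program:
        if op in ARITH:
            ri, rj, rk = args
            regs[ri] = ARITH[op](get(rj), get(rk))
        elif op in ("ADDI", "SUBI"):
            ri, rj, c = args
            regs[ri] = get(rj) + c if op == "ADDI" else get(rj) - c
        elif op == "LOAD":
            ri, rj, c = args
            regs[ri] = mem[get(rj) + c]
        elif op == "STORE":
            ri, c, rj = args
            mem[get(ri) + c] = get(rj)
        elif op == "MOVEM":
            ri, rj = args
            mem[get(ri)] = mem[get(rj)]
        else:
            raise ValueError(f"unknown instruction {op}")
    return regs, mem

A, X = 8, 12   # hypothetical frame offsets of the variables a and x
first_tiling = [
    ("LOAD", "r1", "FP", A),     # r1 <- M[FP + a]   (base address of array a)
    ("ADDI", "r2", "r0", 4),     # r2 <- r0 + 4
    ("MUL", "r2", "ri", "r2"),   # r2 <- ri * r2
    ("ADD", "r1", "r1", "r2"),   # r1 <- r1 + r2     (address of a[i])
    ("LOAD", "r2", "FP", X),     # r2 <- M[FP + x]
    ("STORE", "r1", 0, "r2"),    # M[r1 + 0] <- r2
]
regs, mem = run(first_tiling, {"FP": 1000, "ri": 3}, {1000 + A: 2000, 1000 + X: 42})
print(mem[2000 + 4 * 3])  # 42
```

With FP = 1000, i = 3, the array's base address 2000 stored at FP + a and the value 42 stored at FP + x, the final STORE writes 42 to address 2000 + 4·3 = 2012.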

SLIDE 16

Example Tilings

  • Another way to tile this IR tree for a[i] := x

Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))

LOAD r1 ← M[FP + a]
ADDI r2 ← r0 + 4
MUL r2 ← ri * r2
ADD r1 ← r1 + r2
ADDI r2 ← FP + x
MOVEM M[r1] ← M[r2]

SLIDE 17

Example Tilings

  • An “anti-optimal” tiling of the tree for a[i] := x

Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))

ADDI r1 ← r0 + a
ADD r1 ← FP + r1
LOAD r1 ← M[r1 + 0]
ADDI r2 ← r0 + 4
MUL r2 ← ri * r2
ADD r1 ← r1 + r2
ADDI r2 ← r0 + x
ADD r2 ← FP + r2
LOAD r2 ← M[r2 + 0]
STORE M[r1 + 0] ← r2

SLIDE 18

Optimal vs Optimum Tilings

  • What’s the “best” tiling?
  • Minimal number of instructions?
  • Best performance at runtime?
  • Compositionality assumption: the cost of a tiling can be computed from the costs of its individual tiles (in reality, cost is not additive!)
  • Choice here: minimal number of instructions
  • Optimal: no two neighboring tiles can be combined into a single tile of lower cost
  • Optimum: no tiling has lower total cost
  • The property for optimal is local; for optimum it is global
  • Note that optimum ⇒ optimal, not vice versa
SLIDE 19

Comparing Criteria

  • Obviously, optimal is easier to achieve than optimum
  • Then, how valuable is optimum?
  • RISC CPU architecture: not terribly important
  • each tile is small, so optimal and optimum tilings are often identical
  • CISC CPU architecture: more important
  • larger tiles, many choices everywhere
SLIDE 20

Algorithm: Maximal Munch

  • A greedy algorithm, fast, easy to understand
  • Idea:
  • Start from root of IR tree, work downward
  • At each node N, choose the biggest tile that matches
  • Recur on the leaves of the chosen tile (not on the children of N!)
  • Note: never gets stuck if every single-node tile exists
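The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the textbook's implementation: trees are nested tuples (a hypothetical encoding), only the Jouette tiles from the earlier slides are included, every result gets a fresh register name (r1, r2, and so on) instead of reusing registers, and each tile's instruction is emitted only after its leaves have been munched, which yields the reverse of the top-down order.

```python
import itertools

BLANK = "_"        # a blank: one of the tile's leaves
C = ("CONST",)     # matches any CONST node and captures its value

def match(pat, tree, leaves, consts):
    """Match tile pattern pat at the root of tree, collecting leaves and constants."""
    if pat == BLANK:
        leaves.append(tree)
        return True
    if pat[0] != tree[0]:
        return False
    if pat[0] == "CONST":
        consts.append(tree[1])
        return True
    return len(pat) == len(tree) and all(
        match(p, t, leaves, consts) for p, t in zip(pat[1:], tree[1:]))

# (pattern, instruction template); listed from the biggest tile to the smallest.
EXP_TILES = [
    (("MEM", ("+", BLANK, C)), lambda d, r, c: f"LOAD {d} ← M[{r[0]} + {c[0]}]"),
    (("MEM", ("+", C, BLANK)), lambda d, r, c: f"LOAD {d} ← M[{r[0]} + {c[0]}]"),
    (("MEM", C),               lambda d, r, c: f"LOAD {d} ← M[r0 + {c[0]}]"),
    (("+", BLANK, C),          lambda d, r, c: f"ADDI {d} ← {r[0]} + {c[0]}"),
    (("+", C, BLANK),          lambda d, r, c: f"ADDI {d} ← {r[0]} + {c[0]}"),
    (("-", BLANK, C),          lambda d, r, c: f"SUBI {d} ← {r[0]} - {c[0]}"),
    (("MEM", BLANK),           lambda d, r, c: f"LOAD {d} ← M[{r[0]} + 0]"),
    (C,                        lambda d, r, c: f"ADDI {d} ← r0 + {c[0]}"),
] + [
    ((op, BLANK, BLANK), lambda d, r, c, op=op, name=name: f"{name} {d} ← {r[0]} {op} {r[1]}")
    for op, name in [("+", "ADD"), ("-", "SUB"), ("*", "MUL"), ("/", "DIV")]
]
STM_TILES = [
    (("MOVE", ("MEM", ("+", BLANK, C)), BLANK), lambda r, c: f"STORE M[{r[0]} + {c[0]}] ← {r[1]}"),
    (("MOVE", ("MEM", ("+", C, BLANK)), BLANK), lambda r, c: f"STORE M[{r[0]} + {c[0]}] ← {r[1]}"),
    (("MOVE", ("MEM", C), BLANK),               lambda r, c: f"STORE M[r0 + {c[0]}] ← {r[0]}"),
    (("MOVE", ("MEM", BLANK), ("MEM", BLANK)),  lambda r, c: f"MOVEM M[{r[0]}] ← M[{r[1]}]"),
    (("MOVE", ("MEM", BLANK), BLANK),           lambda r, c: f"STORE M[{r[0]} + 0] ← {r[1]}"),
]

class MaximalMunch:
    """Greedy top-down tiling; instructions come out bottom-up."""

    def __init__(self):
        self.code = []
        self.fresh = itertools.count(1)          # names for fresh result registers

    def munch_exp(self, tree):
        if tree[0] == "TEMP":
            return tree[1]                       # already in a register: no instruction
        for pat, emit in EXP_TILES:              # first match = biggest tile
            leaves, consts = [], []
            if match(pat, tree, leaves, consts):
                regs = [self.munch_exp(leaf) for leaf in leaves]  # recur on leaves
                dst = f"r{next(self.fresh)}"
                self.code.append(emit(dst, regs, consts))         # after its operands
                return dst
        raise ValueError(f"no tile matches {tree!r}")

    def munch_stm(self, tree):
        for pat, emit in STM_TILES:
            leaves, consts = [], []
            if match(pat, tree, leaves, consts):
                regs = [self.munch_exp(leaf) for leaf in leaves]
                self.code.append(emit(regs, consts))
                return
        raise ValueError(f"no tile matches {tree!r}")

# a[i] := x, with frame offsets a and x kept symbolic
tree = ("MOVE",
        ("MEM", ("+", ("MEM", ("+", ("TEMP", "FP"), ("CONST", "a"))), ("*", ("TEMP", "ri"), ("CONST", 4)))),
        ("MEM", ("+", ("TEMP", "FP"), ("CONST", "x"))))
mm = MaximalMunch()
mm.munch_stm(tree)
print("\n".join(mm.code))
```

On the tree for a[i] := x it chooses MOVEM at the root and emits six instructions: LOAD, ADDI, MUL, ADD, ADDI, MOVEM.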
SLIDE 21

Maximal Munch Example

  • The second tiling for a[i] := x

Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))

LOAD r1 ← M[FP + a]
ADDI r2 ← r0 + 4
MUL r2 ← ri * r2
ADD r1 ← r1 + r2
ADDI r2 ← FP + x
MOVEM M[r1] ← M[r2]

SLIDE 22

Optimum Algorithm

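  • An algorithm based on dynamic programming, a bit more complex than maximal munch
  • Idea:
  • Start from the bottom of the IR tree, work upward (recursion: process the children, then the current node)
  • Concept: assign a cost to each node (bottom up)
  • At each node, compute the cost of each matching tile T by adding the cost of T to the costs of T's leaves, and record the cheapest tile
  • Finally, emit instructions by following the recorded tiles from the root, leaves before the tile itself
  • Solution is optimum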
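A sketch of the bottom-up labeling, under toy assumptions: trees are nested tuples (a hypothetical encoding), every Jouette tile costs one instruction, a TEMP costs nothing, and the names `TILES`, `label` and `selected` are illustrative. Ties are broken in favour of the tile listed first.

```python
BLANK = "_"
C = ("CONST",)   # matches any CONST node

def match(pat, tree, leaves):
    """Match tile pattern pat at the root of tree, collecting its leaves."""
    if pat == BLANK:
        leaves.append(tree)
        return True
    if pat[0] != tree[0]:
        return False
    if pat[0] == "CONST":
        return True
    return len(pat) == len(tree) and all(
        match(p, t, leaves) for p, t in zip(pat[1:], tree[1:]))

# Jouette tiles as (instruction, pattern); each costs one instruction here.
TILES = [
    ("LOAD", ("MEM", ("+", BLANK, C))), ("LOAD", ("MEM", ("+", C, BLANK))),
    ("LOAD", ("MEM", C)), ("LOAD", ("MEM", BLANK)),
    ("ADDI", ("+", BLANK, C)), ("ADDI", ("+", C, BLANK)), ("ADDI", C),
    ("SUBI", ("-", BLANK, C)),
    ("ADD", ("+", BLANK, BLANK)), ("SUB", ("-", BLANK, BLANK)),
    ("MUL", ("*", BLANK, BLANK)), ("DIV", ("/", BLANK, BLANK)),
    ("STORE", ("MOVE", ("MEM", ("+", BLANK, C)), BLANK)),
    ("STORE", ("MOVE", ("MEM", ("+", C, BLANK)), BLANK)),
    ("STORE", ("MOVE", ("MEM", C), BLANK)),
    ("STORE", ("MOVE", ("MEM", BLANK), BLANK)),
    ("MOVEM", ("MOVE", ("MEM", BLANK), ("MEM", BLANK))),
]

def label(tree, table):
    """Fill table[node] = (cost, tile, leaves) for every node, bottom-up."""
    if tree in table:
        return
    for child in tree[1:]:
        if isinstance(child, tuple):
            label(child, table)                  # children first ...
    if tree[0] == "TEMP":
        table[tree] = (0, None, [])              # already in a register
        return
    best = None
    for name, pat in TILES:                      # ... then the current node
        leaves = []
        if match(pat, tree, leaves):
            cost = 1 + sum(table[leaf][0] for leaf in leaves)
            if best is None or cost < best[0]:
                best = (cost, name, leaves)
    if best is None:
        raise ValueError(f"no tile matches {tree!r}")
    table[tree] = best

def selected(tree, table):
    """Instructions of the chosen tiles, leaves before the tile itself."""
    _cost, name, leaves = table[tree]
    out = []
    for leaf in leaves:
        out += selected(leaf, table)
    if name is not None:
        out.append(name)
    return out

# a[i] := x, with frame offsets a and x kept symbolic
tree = ("MOVE",
        ("MEM", ("+", ("MEM", ("+", ("TEMP", "FP"), ("CONST", "a"))), ("*", ("TEMP", "ri"), ("CONST", 4)))),
        ("MEM", ("+", ("TEMP", "FP"), ("CONST", "x"))))
table = {}
label(tree, table)
print(table[tree][0])          # 6
print(selected(tree, table))   # ['LOAD', 'ADDI', 'MUL', 'ADD', 'LOAD', 'STORE']
```

For a[i] := x the optimum cost is 6 instructions. With this tile order the root gets STORE, reproducing the LOAD/ADDI/MUL/ADD/LOAD/STORE sequence; the MOVEM alternative at the root also costs 6, so both tilings are optimum under this cost model.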
SLIDE 23

Algorithm Complexity

  • Parameters:
  • N : number of nodes in given IR tree
  • T : number of tiles
  • K : average number of non-leaf nodes in tiles
  • K' : maximum number of nodes to check to see which tiles match
  • T' : average number of tiles matching at a node
  • Maximal Munch: time proportional to (K' + T') · N/K
  • Optimum (dyn. pgm.) algorithm: time proportional to (K' + T') · N
  • Even the optimum algorithm is linear in the size of the IR tree!
  • “No problem!”
SLIDE 24

Tree Grammars

  • Motivation: some CPUs, e.g., the Motorola 68000, have register classes: data vs. address registers
  • Problem: using the previous algorithms, a sub-tiling may produce its result in the wrong type of register
  • Idea:
  • Specify tiles as CFG rules:
  • Non-terminal indicates register class (here d = data, a = address)
  • Derivation creates IR tree
  • Ambiguity = alternative tilings
  • Tools exist (code-generator generators), used much like parser generators

d ➜ MEM(+(a, CONST))
d ➜ MEM(+(CONST, a))
d ➜ MEM(CONST)
d ➜ MEM(a)
d ➜ a
a ➜ d
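A rough sketch of how such a grammar can drive tiling: for every node, compute the cheapest cost of deriving it from each nonterminal, then close those costs under the chain rules d ➜ a and a ➜ d. The rule costs, the move cost of 1 for the chain rules, and the base rule a ➜ TEMP (needed so the example has something at its leaves) are assumptions for illustration; this is not the algorithm of any particular code-generator generator.

```python
import math

NONTERMINALS = ("a", "d")   # a = address register, d = data register
C = ("CONST",)              # matches any CONST node

# (lhs, pattern, cost). The four MEM rules are the loads from the slide;
# "a -> TEMP" is a hypothetical base rule so that leaves can be derived.
RULES = [
    ("d", ("MEM", ("+", "a", C)), 1),
    ("d", ("MEM", ("+", C, "a")), 1),
    ("d", ("MEM", C), 1),
    ("d", ("MEM", "a"), 1),
    ("a", ("TEMP",), 0),
]
# Chain rules d -> a and a -> d: a register-to-register move (assumed cost 1)
CHAIN = [("d", "a", 1), ("a", "d", 1)]

def match(pat, tree, need):
    """Match pat at tree; collect (subtree, nonterminal) for each leaf."""
    if pat in NONTERMINALS:
        need.append((tree, pat))
        return True
    if pat[0] != tree[0]:
        return False
    if pat[0] in ("CONST", "TEMP"):
        return True
    return len(pat) == len(tree) and all(
        match(p, t, need) for p, t in zip(pat[1:], tree[1:]))

def label(tree, table):
    """table[node][nt] = cheapest cost of deriving node from nonterminal nt."""
    if tree in table:
        return table[tree]
    for child in tree[1:]:
        if isinstance(child, tuple):
            label(child, table)                  # children first
    cost = {nt: math.inf for nt in NONTERMINALS}
    for lhs, pat, c in RULES:
        need = []
        if match(pat, tree, need):
            total = c + sum(table[t][nt] for t, nt in need)
            cost[lhs] = min(cost[lhs], total)
    changed = True
    while changed:                               # close under chain rules
        changed = False
        for lhs, rhs, c in CHAIN:
            if cost[rhs] + c < cost[lhs]:
                cost[lhs] = cost[rhs] + c
                changed = True
    table[tree] = cost
    return cost

table = {}
print(label(("MEM", ("+", ("TEMP", "t1"), ("CONST", 8))), table))  # {'a': 2, 'd': 1}
```

The load lands in a data register at cost 1; getting the same value into an address register costs one extra move, which is exactly the information the plain tiling algorithms could not express.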

SLIDE 25

CPU Architecture Issues

  • RISC was largely designed to fit well with modern code generation

  • RISC features, good and bad:
  • many registers (e.g., 32)
  • every register can do everything (just one class)
  • arithmetic operations only on registers (no MUL?)
  • three-address instructions (flexible placement)
  • just one memory addressing mode (M[reg+const])
  • uniform instruction size (e.g., 32 bit)
  • every instruction has a single effect/result
SLIDE 26

Summary

  • IR nodes do one thing, instructions many
  • Tree patterns, tiles, ‘leaves’ of tiles
  • Instruction selection: Cover IR tree with tiles
  • Jouette architecture, instruction set
  • Jouette statement tiles, expression tiles
  • Example tilings
  • Optimum vs. optimal tilings
  • Algorithms: Maximal munch; dyn. programming
  • Tree grammars