Extended Lattice-Based Memory Allocation

Alain Darte, Alexandre Isoard, Tomofumi Yuki. Laboratoire de l'Informatique du Parallélisme, Lyon, France. 25th International Conference on Compiler Construction, March 17–18, 2016, Barcelona, Spain.


slide-1
SLIDE 1

Extended Lattice-Based Memory Allocation

Alain Darte Alexandre Isoard Tomofumi Yuki

Laboratoire de l’Informatique du Parallélisme Lyon, France

25th International Conference on Compiler Construction

March 17–18, 2016, Barcelona, Spain

  • A. Darte, A. Isoard, T. Yuki (LIP, Lyon)

Extended Lattice-Based Memory Allocation Compiler Construction 2016 1 / 20

slide-2
SLIDE 2

Motivation

We want to automatically find compact memory allocations, because:

Domain-specific languages (DSLs) abstract away memory allocation:
◮ Alpha
◮ Single Assignment C
◮ ...

Array expansion increases parallelism but also the memory footprint:
◮ Parallelizing/optimizing compilers

Compact allocations are useful for programming the memory hierarchy:
◮ Kernel offloading to GPU
◮ High-level synthesis for FPGA
◮ Flat mode of Xeon Phi

slide-3
SLIDE 3

Static intra-array optimization

Principle: reuse memory locations (values without overlapping lifetimes)
◮ Reuse within a given array (element-wise)
◮ Reduce memory of temporary arrays
◮ Static optimization (at compile time)

for (int t = 0; t < n-1; ++t)
  for (int i = 0; i < n; ++i)
    A[t+1][i] = f(A[t][i-1], A[t][i], A[t][i+1]);

Uses n² storage.

for (int t = 0; t < n-1; ++t)
  for (int i = 0; i < n; ++i)
    A[(t+1)%2][i] = f(A[t%2][i-1], A[t%2][i], A[t%2][i+1]);

Uses 2n storage.

slide-4
SLIDE 4

Static intra-array optimization

The same loop nest, A[t+1][i] = f(A[t][i-1], A[t][i], A[t][i+1]), uses n² storage with a full array. A skewed modular mapping needs even less than the 2n of the row-wise modulo:

for (int t = 0; t < n-1; ++t)
  for (int i = 0; i < n; ++i) {
    int b = n+1;
    int x = i-t+n;  /* shifted by n so that x stays non-negative under C's % */
    A[x%b] = f(A[x%b], A[(x+1)%b], A[(x+2)%b]);
  }

Uses n + 1 storage!
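The three allocations of this stencil (full array, row modulo 2, skewed modulo n + 1) can be compared with a small sketch. The stencil body `f`, the convention that out-of-range neighbours read as 0, and the function names are assumptions made for illustration; the slides leave boundary handling out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stencil body: any function of the three neighbours would do. */
long f(long left, long centre, long right) {
    return (left + 2 * centre + right) % 1000003;
}

/* Full expansion: n * n cells. Out-of-range neighbours read as 0 (assumption). */
void run_full(int n, const long *in, long *out) {
    long *A = malloc((size_t)n * n * sizeof *A);
    memcpy(A, in, (size_t)n * sizeof *A);
    for (int t = 0; t < n - 1; ++t)
        for (int i = 0; i < n; ++i) {
            long l = i > 0 ? A[t * n + i - 1] : 0;
            long r = i < n - 1 ? A[t * n + i + 1] : 0;
            A[(t + 1) * n + i] = f(l, A[t * n + i], r);
        }
    memcpy(out, A + (size_t)(n - 1) * n, (size_t)n * sizeof *A);
    free(A);
}

/* Row t is stored in row t % 2: 2 * n cells. */
void run_mod2(int n, const long *in, long *out) {
    long *A = malloc((size_t)2 * n * sizeof *A);
    memcpy(A, in, (size_t)n * sizeof *A);
    for (int t = 0; t < n - 1; ++t) {
        long *cur = A + (t % 2) * n, *next = A + ((t + 1) % 2) * n;
        for (int i = 0; i < n; ++i) {
            long l = i > 0 ? cur[i - 1] : 0;
            long r = i < n - 1 ? cur[i + 1] : 0;
            next[i] = f(l, cur[i], r);
        }
    }
    memcpy(out, A + ((n - 1) % 2) * n, (size_t)n * sizeof *A);
    free(A);
}

/* Value (t, i) is stored in cell (i - t + n) % (n + 1): n + 1 cells.
   The "+ n" keeps the left operand of % non-negative. */
void run_skewed(int n, const long *in, long *out) {
    int b = n + 1;
    long *A = malloc((size_t)b * sizeof *A);
    for (int i = 0; i < n; ++i) A[(i + n) % b] = in[i];
    for (int t = 0; t < n - 1; ++t)
        for (int i = 0; i < n; ++i) {
            int x = i - t + n;                 /* cell of (t, i) before the modulo */
            long l = i > 0 ? A[(x - 1) % b] : 0;
            long c = A[x % b];
            long r = i < n - 1 ? A[(x + 1) % b] : 0;
            A[(x - 1) % b] = f(l, c, r);       /* (t+1, i) overwrites the dead (t, i-1) */
        }
    for (int i = 0; i < n; ++i) out[i] = A[(i - (n - 1) + n) % b];
    free(A);
}

/* Returns 1 when the three allocations produce the same final row. */
int allocations_agree(int n) {
    long *in = malloc((size_t)n * sizeof *in), *a = malloc((size_t)n * sizeof *a);
    long *m = malloc((size_t)n * sizeof *m), *s = malloc((size_t)n * sizeof *s);
    for (int i = 0; i < n; ++i) in[i] = (i * 7 + 3) % 11;
    run_full(n, in, a);
    run_mod2(n, in, m);
    run_skewed(n, in, s);
    int ok = memcmp(a, m, (size_t)n * sizeof *a) == 0
          && memcmp(a, s, (size_t)n * sizeof *a) == 0;
    free(in); free(a); free(m); free(s);
    return ok;
}
```

Calling `allocations_agree(n)` from a small driver checks that the smaller buffers never overwrite a value that is still needed. The skewed version relies on each new value landing exactly in the cell of the neighbour that has just been read for the last time.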

slide-5
SLIDE 5

History: shift in approach

Schedule-based approaches:
◮ Wilde & Rajopadhye (1996)
◮ De Greef, Catthoor et al. (1997)
◮ Lefebvre-Feautrier (1998)
◮ Quilleré & Rajopadhye (2000)
◮ Thies, Amarasinghe et al. (2007)

Separation of concerns (live-range conflict vs. modular mapping search):
◮ Universal Occupancy Vectors, Strout et al. (1998)
◮ Lattice-based, Darte et al. (2005)
◮ SMO, Bhaskaracharya et al. (2015)
◮ Extended lattice-based, Darte et al. (this paper)

slide-6
SLIDE 6

Our contribution

Basis selection heuristic driven by reuse vectors

Provides:
◮ Support for a wide range of languages (cf. our paper at IMPACT'16)
◮ Method to optimize the mapping for size
◮ Natural treatment of non-convex unions of polyhedra
◮ Parametric analysis and parametric modular mapping
◮ Simple to use (only requires the conflict difference set)

Scope:
◮ Intra-array optimization ☛ Inter-array?
◮ Size-focused ☛ Locality?
◮ Affine mapping ☛ Redundant storage, live-range splitting?

slide-7
SLIDE 7

Successive modulo

for (int t = 0; t < n; ++t)
  for (int i = 0; i < n; ++i)
    A[t][i] = A[t-1][i-1] + A[t-1][i] + A[t-1][i+1];

[Figure: iteration space with axes t and i]

Canonical basis (Lefebvre-Feautrier): A[t][i] → A[t % 2][i % n]
Skewed basis: A[t][i] → A[(i-t) % (n+1)]

slide-12
SLIDE 12

Tiling case

After tiling, we usually get this kind of conflict set (live-out set):

[Figure: conflict set in the (t, i) plane]

Canonical basis: A[t][i] → A[t % n][i % n]
Skewed basis: A[t][i] → A[(i-t) % (2n-1)]

Bhaskaracharya et al.: look for hyperplanes intersecting at most one point.

slide-17
SLIDE 17

Tiling in general (with longer dependences)

After tiling, we also get this kind of conflict set (live-out set):

[Figure: conflict set in the (t, i) plane]

Bhaskaracharya et al. basis: A[t][i] → A[(i-2t) % (5n-4)]
Best basis: A[t][i] → A[t%2][(i-t)%(2n-1)]

Bhaskaracharya et al.: look for hyperplanes intersecting at most one point.
Ours: finds the basis that minimizes reuse distance.
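The gain of the two-dimensional mapping over the single modulo can be checked by comparing the sizes implied by the moduli. This is a quick arithmetic check added for illustration, not a result stated on the slide:

```latex
\[
  \underbrace{5n-4}_{\text{single modulo}}
  \quad\text{vs.}\quad
  \underbrace{2\,(2n-1)}_{\text{two moduli}} = 4n-2,
  \qquad
  4n-2 < 5n-4 \iff n > 2 .
\]
```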

slide-20
SLIDE 20

Running example

[Figure: iteration space with axes t and i]

Mapping: A[t][i] → A[t%2][i]

for (int t = 0; t < n; ++t)
  for (int i = 0; i < n; ++i)
    A[t][i] = A[t-1][i-1] + A[t-1][i] + A[t-1][i+1] + A[t-2][i];

slide-21
SLIDE 21

Running example (skewed)

[Figure: skewed iteration space with axes t and i]

for (int t = 0; t < n; ++t)
  for (int i = t; i < n+t; ++i) {
    int j = i-t;
    A[t][j] = A[t-1][j-1] + A[t-1][j] + A[t-1][j+1] + A[t-2][j];
  }

slide-22
SLIDE 22

Running example (tiled)

[Figure: tiled iteration space with axes t and i]

slide-25
SLIDE 25

Conflict set to conflict differences (visualization)

[Figure: conflict set in the (t, i) plane and the corresponding conflict differences in the (δt, δi) plane]
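To make the notion of conflict differences concrete, the sketch below enumerates them by brute force for the untiled running example under its sequential schedule, then checks a modular mapping against them. The lifetime model (a value lives from its write to its last read, the last row is live-out) and all names are assumptions for illustration; the paper works with polyhedral sets rather than enumeration.

```c
/* Sequential execution time of statement instance (t, i). */
int birth(int n, int t, int i) { return t * n + i; }

/* Time of the last instance reading A[t][i] in
   A[t][i] = A[t-1][i-1] + A[t-1][i] + A[t-1][i+1] + A[t-2][i].
   Values of the last row are assumed live-out (alive until the end). */
int death(int n, int t, int i) {
    if (t + 2 < n) return birth(n, t + 2, i);
    if (t + 1 < n) return birth(n, t + 1, i + 1 < n ? i + 1 : i);
    return n * n;
}

/* Two values conflict when each one is written before the other one dies. */
int conflict(int n, int t1, int i1, int t2, int i2) {
    return birth(n, t1, i1) < death(n, t2, i2)
        && birth(n, t2, i2) < death(n, t1, i1);
}

/* Is (dt, di) the difference of two conflicting values? */
int is_conflict_difference(int n, int dt, int di) {
    for (int t = 0; t < n; ++t)
        for (int i = 0; i < n; ++i) {
            int t2 = t - dt, i2 = i - di;
            if (t2 < 0 || t2 >= n || i2 < 0 || i2 >= n) continue;
            if (conflict(n, t, i, t2, i2)) return 1;
        }
    return 0;
}

/* Modular mapping (t, i) -> (row 0 of m applied to (t, i)) mod b[0],
                             (row 1 of m applied to (t, i)) mod b[1]. */
typedef struct { int m[2][2]; int b[2]; } mapping;

/* Valid iff no non-zero conflict difference is mapped to (0, 0). */
int mapping_is_valid(int n, mapping s) {
    for (int dt = -(n - 1); dt <= n - 1; ++dt)
        for (int di = -(n - 1); di <= n - 1; ++di) {
            if (dt == 0 && di == 0) continue;
            if (!is_conflict_difference(n, dt, di)) continue;
            if ((s.m[0][0] * dt + s.m[0][1] * di) % s.b[0] == 0 &&
                (s.m[1][0] * dt + s.m[1][1] * di) % s.b[1] == 0)
                return 0;
        }
    return 1;
}
```

With this model, (2, 0) is not a conflict difference because A[t][i] is read for the last time by the very instance that writes A[t+2][i], which is why the mapping A[t%2][i] is accepted while a single row is rejected.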

slide-26
SLIDE 26

Running example (successive modulo)

[Figure: conflict differences in the (δt, δi) plane with the successive directions m0 and m1]

Canonical basis: extent n − 1 along m0, then, after slicing, extent n − 1 along m1:

\[ x \mapsto \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} x \bmod \begin{pmatrix} n \\ n \end{pmatrix} \]

Best basis: extent 2n − 2 along m0, then, after slicing, extent 1 along m1:

\[ x \mapsto \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} x \bmod \begin{pmatrix} 2n-1 \\ 2 \end{pmatrix} \]
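The successive-modulo computation for a given basis can be sketched on an explicit set of conflict differences. The set used here is a made-up toy set, chosen only so that the moduli match the numbers of this running example; it is not the conflict set of the paper. The rule "modulus = 1 + maximal extent" is read off the pairs shown on the slide (n − 1 and n, 2n − 2 and 2n − 1, 1 and 2) and should be taken as an assumption.

```c
#include <stdlib.h>

/* Hypothetical toy set of conflict differences (NOT the one from the paper):
   all (dx, dy) with |dx|, |dy| <= n-1, except diagonal points (k, k), |k| >= 2. */
int in_toy_set(int n, int dx, int dy) {
    if (abs(dx) > n - 1 || abs(dy) > n - 1) return 0;
    return dx != dy || abs(dx) <= 1;
}

/* Successive modulo for a 2-D basis with rows m0 and m1:
   b[0] = 1 + max |m0.d| over the set,
   b[1] = 1 + max |m1.d| over the slice m0.d == 0. */
void successive_moduli(int n, const int m0[2], const int m1[2], int b[2]) {
    int e0 = 0, e1 = 0;
    for (int dx = -(n - 1); dx <= n - 1; ++dx)
        for (int dy = -(n - 1); dy <= n - 1; ++dy) {
            if (!in_toy_set(n, dx, dy)) continue;
            int p0 = abs(m0[0] * dx + m0[1] * dy);
            if (p0 > e0) e0 = p0;
            if (p0 == 0) {
                int p1 = abs(m1[0] * dx + m1[1] * dy);
                if (p1 > e1) e1 = p1;
            }
        }
    b[0] = e0 + 1;
    b[1] = e1 + 1;
}

/* Valid iff no non-zero difference of the set is mapped to (0, 0). */
int valid_on_toy_set(int n, const int m0[2], const int m1[2], const int b[2]) {
    for (int dx = -(n - 1); dx <= n - 1; ++dx)
        for (int dy = -(n - 1); dy <= n - 1; ++dy) {
            if ((dx == 0 && dy == 0) || !in_toy_set(n, dx, dy)) continue;
            if ((m0[0] * dx + m0[1] * dy) % b[0] == 0 &&
                (m1[0] * dx + m1[1] * dy) % b[1] == 0)
                return 0;
        }
    return 1;
}
```

On this toy set with n = 5, the canonical basis yields moduli (5, 5), that is 25 cells, while the skewed basis (1, −1), (0, 1) yields (9, 2), that is 18 cells, and both mappings pass the validity check.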

slide-34
SLIDE 34

Heuristic 1: Mapping space

Idea: a greedy algorithm
◮ Find the direction m0 that minimizes b0;
◮ Slice orthogonally to m0;
◮ Then continue with m1 not collinear to m0;
◮ ...

Pros:
◮ The first modulus is always better than that of Lefebvre-Feautrier
◮ Implementable with Farkas' lemma
◮ Naturally gives a unimodular basis

Cons:
◮ No guarantee on the next moduli: slicing is not taken into account
◮ Gives bad results on most tiled codes, e.g. (n, n) has a smaller first modulus than (2n − 1, 2)
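The weakness listed under "Cons" can be reproduced with a brute-force version of the greedy choice. In this sketch an exhaustive search over small integer directions stands in for the Farkas-based formulation mentioned on the slide, and the conflict-difference set is a made-up toy set, not the one from the paper; both are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical toy set of conflict differences (NOT the one from the paper):
   all (dx, dy) with |dx|, |dy| <= n-1, except diagonal points (k, k), |k| >= 2. */
int in_toy_set(int n, int dx, int dy) {
    if (abs(dx) > n - 1 || abs(dy) > n - 1) return 0;
    return dx != dy || abs(dx) <= 1;
}

/* 1 + max |m.d| over the toy set; when slice is non-NULL, only differences
   with slice.d == 0 are considered. */
int modulus(int n, const int m[2], const int *slice) {
    int extent = 0;
    for (int dx = -(n - 1); dx <= n - 1; ++dx)
        for (int dy = -(n - 1); dy <= n - 1; ++dy) {
            if (!in_toy_set(n, dx, dy)) continue;
            if (slice && slice[0] * dx + slice[1] * dy != 0) continue;
            int p = abs(m[0] * dx + m[1] * dy);
            if (p > extent) extent = p;
        }
    return extent + 1;
}

/* Greedy choice: m0 minimises b[0] among directions with entries in [-2, 2];
   m1 then minimises b[1] on the slice among directions that complete m0
   into a unimodular basis. Returns the resulting size b[0] * b[1]. */
int greedy_size(int n, int b[2]) {
    int m0[2] = {0, 0};
    b[0] = b[1] = 0;                       /* 0 means "not set yet" */
    for (int a = -2; a <= 2; ++a)
        for (int c = -2; c <= 2; ++c) {
            int m[2] = {a, c};
            if (a == 0 && c == 0) continue;
            int v = modulus(n, m, NULL);
            if (b[0] == 0 || v < b[0]) { b[0] = v; m0[0] = a; m0[1] = c; }
        }
    for (int a = -2; a <= 2; ++a)
        for (int c = -2; c <= 2; ++c) {
            int m[2] = {a, c};
            if (abs(m0[0] * c - m0[1] * a) != 1) continue;  /* |det| must be 1 */
            int v = modulus(n, m, m0);
            if (b[1] == 0 || v < b[1]) b[1] = v;
        }
    return b[0] * b[1];
}
```

On the toy set with n = 5 the greedy search settles on a canonical direction, because its first modulus 5 beats the 9 of the diagonal, and ends with 5 × 5 = 25 cells, whereas the basis starting with the diagonal would need only 9 × 2 = 18.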

slide-37
SLIDE 37

Running example (admissible lattice)

An admissible lattice corresponds to a valid storage mapping.

[Figure: conflict differences in the (δt, δi) plane with lattice vectors a0 and a1; a1 is chosen after extruding along a0]

Canonical basis:

\[ A = \begin{pmatrix} n & 0 \\ 0 & n \end{pmatrix} \]

Best basis:

\[ A = \begin{pmatrix} 2 & 2-n \\ 2 & n \end{pmatrix} \]

Good basis:

\[ A = \begin{pmatrix} 2 & 1-n \\ 2 & n \end{pmatrix} \]

slide-40
SLIDE 40

Heuristic 2: Lattice space

Determinant of the lattice = size of the mapping:
◮ Find a first lattice vector a0 with minimal norm;
◮ Extrude along a0;
◮ Find a second lattice vector a1 with minimal norm;
◮ ...

Pros:
◮ Finds most constant reuse vectors
◮ Gives a unimodular basis (if enforced)

Cons:
◮ Requires star-shaping (and the complement might be costly)
◮ Depends on the norm and on the basis description
◮ Cannot go back to mapping space if parametric

slide-43
SLIDE 43

From lattice space back to mapping space

The lattice is the kernel of the mapping ☛ we need to “invert” it.

For instance:

\[ A = \begin{pmatrix} 2 & 1-2n \\ 2 & 0 \end{pmatrix} \quad\Longrightarrow\quad x \mapsto \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} x \bmod \begin{pmatrix} 2n-1 \\ 2 \end{pmatrix} \]

However:

\[ A = \begin{pmatrix} n & 2 \\ 1-n & 2 \end{pmatrix} \quad\Longrightarrow\quad x \mapsto \;?\; x \bmod \begin{pmatrix} 2n-1 \\ 2 \end{pmatrix} \]

Problem:
◮ Heuristic 2 might give a parametric axis
◮ “Inverting” a parametric lattice is hard!

Fix (Heuristic 3):
◮ Apply Heuristic 2 as long as we get constant (reuse) vectors
◮ “Invert” the partial lattice (back to mapping space)
◮ Complete the basis with Heuristic 1 (restricted to the orthogonal)
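The easy direction of this correspondence, from a mapping to its kernel lattice, can be sketched for a 2×2 unimodular matrix: the kernel of x ↦ (M x) mod b is generated by the columns of M⁻¹ · diag(b0, b1). The function names are hypothetical, and the sketch does not attempt the hard direction discussed on the slide (inverting a parametric lattice).

```c
/* Kernel lattice of x -> (M x) mod b for a unimodular 2x2 integer matrix M:
   the basis vectors are the columns of M^{-1} * diag(b[0], b[1]). */
void kernel_lattice(int M[2][2], const int b[2], int A[2][2]) {
    int det = M[0][0] * M[1][1] - M[0][1] * M[1][0];   /* assumed to be +1 or -1 */
    /* For det = +1 or -1, 1/det == det, so the inverse stays integral. */
    int inv[2][2] = {{ det * M[1][1], -det * M[0][1]},
                     {-det * M[1][0],  det * M[0][0]}};
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            A[r][c] = inv[r][c] * b[c];                 /* scale column c by b[c] */
}

/* Absolute determinant of the lattice basis = number of memory cells. */
int lattice_size(int A[2][2]) {
    int d = A[0][0] * A[1][1] - A[0][1] * A[1][0];
    return d < 0 ? -d : d;
}

/* Does the vector (x, y) belong to the kernel of the mapping? */
int in_kernel(int M[2][2], const int b[2], int x, int y) {
    return (M[0][0] * x + M[0][1] * y) % b[0] == 0
        && (M[1][0] * x + M[1][1] * y) % b[1] == 0;
}
```

For n = 5 and the mapping (x, y) ↦ (x − y, y) mod (9, 2), the computed basis has columns (9, 0) and (2, 2), its determinant 18 equals the product of the moduli, and the vectors (2, 2) and (1 − 2n, 0) = (−9, 0) both lie in the kernel.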

slide-46
SLIDE 46

Running example (Bhaskaracharya et al.)

Bhaskaracharya et al. actually work on an asymmetric conflict relation:
◮ Find a hyperplane that does not intersect any point
◮ Or find a hyperplane that intersects the minimum number of polyhedra

The result depends on the representation:

[Figure: first representation of the conflict differences, axes δx and δy]
Gives the canonical axis: (1, 0)

[Figure: second representation of the conflict differences, axes δx and δy]
Gives a skewed diagonal: (3, −2)

slide-47
SLIDE 47

Results

Example          Algorithm  Reuse vector  Mapping                                                       Reduction
LBM-D2Q9         LeFe       -             (t, i, j) → (t, i, j) mod (2, N, N)                           1
                 BBC        -             (t, i, j) → (i − 2t, j) mod (N + 2, N)                        2
                 Lattice    (1, 1, 1)     (t, i, j) → (i − t, j − t) mod (N + 1, N + 1)                 2
LBM-D3Q27        LeFe       -             (t, i, j, k) → (t, i, j, k) mod (2, N, N, N)                  1
                 BBC        -             (t, i, j, k) → (i − 2t, j, k) mod (N + 2, N, N)               2
                 Lattice    (1, 1, 1, 1)  (t, i, j, k) → (k − t, i − t, j − t) mod (N, N + 1, N + 1)    2
diamond-tile     LeFe       -             (t, i) → (t, i) mod (B, 2B − 1)                               1
                 BBC        -             (t, i) → (t − 3i) mod (6B − 5)                                B/3
                 Lattice    (2, 0)        (t, i) → (i, t) mod (2B − 1, 2)                               B/2
Running example  LeFe/BBC   -             (x, y) → (x, y) mod (N, N)                                    1
                 Lattice    (2, 2)        (x, y) → (x − y, y) mod (2N − 1, 2)                           N/4
heat-2d-tiled    LeFe/BBC   -             (t, i, j) → (t, i, j) mod (B, B, B)                           1
                 Lattice    (1, 1, 1)     (t, i, j) → (i − j, j − t) mod (2B − 1, 3B − 2)               B/6

LeFe: Lefebvre-Feautrier. BBC: Bhaskaracharya et al. Lattice: this work.
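Assuming the Reduction column reports the asymptotic ratio between the size of the Lefebvre-Feautrier mapping and the size of each mapping (an interpretation, not stated on the slide), the entries can be re-derived from the moduli:

```latex
\[
  \frac{2N^2}{(N+2)\,N} \approx 2, \qquad
  \frac{B\,(2B-1)}{6B-5} \approx \frac{B}{3}, \qquad
  \frac{B\,(2B-1)}{2\,(2B-1)} = \frac{B}{2},
\]
\[
  \frac{N \cdot N}{2\,(2N-1)} \approx \frac{N}{4}, \qquad
  \frac{B^3}{(2B-1)(3B-2)} \approx \frac{B}{6}.
\]
```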

slide-48
SLIDE 48

Conclusion

Internal improvements:
◮ Heuristic 2 finding parametric vectors, but of constant direction
◮ Parametric star-shaping

Future work:
◮ Optimizing the partial constant lattice in a global way?
◮ Optimizing all moduli in a global way?
◮ Locality awareness / redundant storage
◮ Inter-array optimization
◮ More examples, more languages, and more applications

Thank you for your attention!