Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques (PowerPoint PPT Presentation)



SLIDE 1

Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques

Sven Verdoolaege, Manjunath Kudlur, Rob Schreiber, Harinath Kamepalli

Cerebras Systems

January 22, 2020

SLIDE 2

Outline

1. Target Architecture
2. Code Generation
3. SIMD Code Generation
4. Conclusion

SLIDE 3

Outline (current section: Target Architecture)

SLIDE 4

Cerebras CS-1

Largest chip ever built
46,225 mm2 of silicon
1.2 trillion transistors
400,000 AI-optimized cores
18 gigabytes of on-chip memory
9 PByte/s memory bandwidth
100 Pbit/s fabric bandwidth
TSMC 16nm process

SLIDE 5

Interesting Features

Dataflow scheduling in hardware
◮ Triggered by data
◮ Filters out sparse zero data
◮ Skips unnecessary processing


SLIDE 7

Sparse Tensor Communication

(Figure: a tensor whose non-zero entries are 42, 57, 13.)

Dense communication: send every element (figure: 42 57 13 send).

Sparse communication: break up the tensor into chunks (e.g., rows) and only send
◮ non-zero entry + position in chunk
◮ end-of-chunk

(Figure: sparse stream 42 1 57 13 2 eoc eoc eoc send.)
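As a rough illustration of this encoding (my own Python sketch, not the actual CS-1 wavelet format), each chunk of a tensor can be turned into (value, position) pairs followed by an end-of-chunk marker:

def encode_sparse(rows):
    """Encode each row (chunk) as (value, position) pairs plus an 'eoc' marker."""
    stream = []
    for row in rows:
        for pos, val in enumerate(row):
            if val != 0:
                stream.append((val, pos))   # non-zero entry + position in chunk
        stream.append("eoc")                # end-of-chunk marker
    return stream

# Example: three rows whose non-zeros are 42, 57, 13
print(encode_sparse([[0, 42, 0], [57, 0, 0], [0, 0, 13]]))
# [(42, 1), 'eoc', (57, 0), 'eoc', (13, 2), 'eoc']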


SLIDE 9

Interesting Features

Dataflow scheduling in hardware
◮ Triggered by data
◮ Filters out sparse zero data
◮ Skips unnecessary processing

Powerful SIMD Engine
◮ Performs some number of operations per cycle
◮ Mimics a normalized loop nest of depth at most four
⇒ removes the overhead of software-managed loops


SLIDE 11

SIMD Instructions

Loop code:

  handle(uint16_t index, half data) {
    for (int c3 = 0; c3 <= 4; c3 += 1)
      for (int c4 = 0; c4 <= 4; c4 += 1)
        dx_local[2 * dy_index_0 + c3][2 * index + c4] += (data) * (W_local[0][c3][c4]);
  }

SIMD instruction:

  handle(uint16_t index, half data) {
    set_base_address(dx, &dx_local[2 * dy_index_0][2 * index]);
    invoke_simd(fmach, dx, W, data, index);
  }

  void main() {
    configure(/* 5,5; W_local: i,j -> 0,i,j; dx_local: i,j -> i,j */);
    set_base_address(W, &W_local[0][0][0]);
  }

SLIDE 12

Outline (current section: Code Generation)


SLIDE 14

Code Generation Overview

(Pipeline figure: LAIR code and the LAIR map go through DTG and codegen to produce C-level code.)

LAIR ⇒ DSL written by hand or extracted from TensorFlow (Abadi et al. 2016)

SLIDE 15

LAIR Example

  lair matvec<T=float16>(M, N): T W[M][N], T x[N] -> T y[M] {
    all (i, j) in (M, N)
      y[i] += W[i][j] * x[j]
  }

A lair node defines one or more output tensors in terms of input tensors.
Each statement has a zero-based rectangular set of instances.
LAIR is single assignment (at tensor level).
All accesses are affine (not piecewise, not quasi-affine).
Each tensor in a statement is accessed through a single index expression.
Other nodes combine and/or specialize lair nodes
⇒ e.g., M = 32 and N = 16
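To make the statement semantics concrete, here is a small Python sketch (my own illustration, not part of the toolchain) of what the matvec lair node computes once specialized to M = 32 and N = 16:

import numpy as np

# Reference semantics of the matvec lair node (illustration only):
# all (i, j) in (M, N): y[i] += W[i][j] * x[j]
M, N = 32, 16
W = np.random.rand(M, N).astype(np.float16)
x = np.random.rand(N).astype(np.float16)

y = np.zeros(M, dtype=np.float16)
for i in range(M):          # zero-based rectangular instance set of size M x N
    for j in range(N):
        y[i] += W[i][j] * x[j]

# Agrees with a plain matrix-vector product up to float16 rounding.
print(np.max(np.abs(y - W.astype(np.float32) @ x.astype(np.float32))))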


SLIDE 17

Code Generation Overview

(Pipeline figure: LAIR code and the LAIR map go through DTG and codegen to produce C-level code.)

LAIR ⇒ DSL written by hand or extracted from TensorFlow (Abadi et al. 2016)

The LAIR map contains information, in isl (V. 2010) notation, about
◮ the size of the target rectangle of PEs
◮ how input and output tensors are communicated
◮ where computations are performed

SLIDE 18

LAIR Map Example

  lair matvec<T=float16>(M, N): T W[M][N], T x[N] -> T y[M] {
    all (i, j) in (M, N)
      y[i] += W[i][j] * x[j]
  }

Mapping of a 32 × 16 matrix-vector multiplication to 4 × 4 PEs.
(Figure: the PE grid with axes PEx and PEy, with the x and y tensors on its edges.)

  size:        { PE[4, 4] }
  compute_map: { ff[i, j] -> PE[j//4, i//8] }
  iport_map:   { x[i=0:15] -> [PE[i//4, -1] -> index[i%4]] }
  oport_map:   { y[i=0:31] -> [PE[4, i//8] -> index[i%8]] }
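To illustrate what the compute_map means (a sketch I added for clarity, not generated output), one can simply enumerate which (i, j) instances land on each PE:

from collections import defaultdict

# Sketch: enumerate compute_map { ff[i, j] -> PE[j//4, i//8] }
# for the 32 x 16 matvec mapped onto a 4 x 4 rectangle of PEs (illustration only).
M, N = 32, 16
instances_on_pe = defaultdict(list)
for i in range(M):
    for j in range(N):
        instances_on_pe[(j // 4, i // 8)].append((i, j))

print(len(instances_on_pe))              # 16 PEs in the 4 x 4 rectangle
print(len(instances_on_pe[(0, 0)]))      # 32 instances per PE (an 8 x 4 block)
print(instances_on_pe[(1, 2)][:3])       # [(16, 4), (16, 5), (16, 6)]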

SLIDE 20

Task Graph Construction

Code generation consists of:
◮ Parse LAIR and LAIR map
◮ Construct task graph
◮ Detect SIMD opportunities
◮ Write out code

Task graph construction: split the LAIR specification into communication tasks and computation tasks.

Two types of tasks:
◮ react to an incoming tensor element
◮ read in an entire tensor or operate on local memory

SLIDE 21

Outline (current section: SIMD Code Generation)

SLIDE 22

SIMD Code Generation

⇒ detect sets of computation instances that can be performed by SIMD instructions
⇒ determine
◮ supported instruction
◮ “fixed” instance set sizes
◮ accesses of the form offset + linear in iterators

“fixed” sizes: may depend on the PE, but not on the tensor element.
Otherwise, configuration needs to be performed before each invocation.


SLIDE 24

SIMD Instructions (repeat of the loop code and SIMD instruction example from SLIDE 11)

SLIDE 25

Challenge

Recall: the lair node guarantees
◮ each statement has a zero-based rectangular set of instances
◮ all accesses are affine (not piecewise, not quasi-affine)

SIMD detection requirements:
◮ “fixed” instance set sizes
◮ accesses of the form offset + linear in iterators

Trivial?


SLIDE 31

Trivial Example

  lair matvec<T=float16>(M, N): T W[M][N], T x[N] -> T y[M] {
    all (i, j) in (M, N)
      y[i] += W[i][j] * x[j]
  }
  compute_map: { ff[i, j] -> PE[j//4, i//8] }

(Figures: the i × j grid of computation instances; their mapping to PEs; the computation instances on one PE, with axes 4PEx and 8PEy; the instances triggered by the arrival of one x-value.)

⇒ Size: [8, 1]
⇒ Access to y: y[8PEy + i′] (local coordinates: i′, j′)
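As an illustration (a sketch I added, over explicit points), the instances that one arriving x-value triggers on a given PE do form an 8 × 1 box, and the y accesses take the stated form:

# Sketch: instances triggered on PE[PEx, PEy] by the arrival of x[j]
# under compute_map { ff[i, j] -> PE[j//4, i//8] } (illustration only).
PEx, PEy, j = 2, 1, 9                         # example PE and arriving x index
assert j // 4 == PEx                          # x[j] is routed to PE column PEx
S = [(i, j) for i in range(32) if (j // 4, i // 8) == (PEx, PEy)]
print(S)                                      # (8, 9), (9, 9), ..., (15, 9): an 8 x 1 box
print([8 * PEy + ip for ip in range(8)])      # y accesses: y[8*PEy + i'], i' in [0, 8)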


SLIDE 33

Size Computation

Input: S, the set of instances executed on a PE on arrival of a tensor element.

Compute the element-wise minimum and maximum of S.
Construct { x : min ≤ x ≤ max }.
Check equal to S ⇒ S is a dense box.
Size: max − min + 1.
Check that the size does not depend on “index”.
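A minimal Python sketch of this check (my own illustration over an explicit set of points; the actual implementation works symbolically on isl sets):

from itertools import product

def dense_box_size(S):
    """Return the box size if S is a dense box, otherwise None (illustration only)."""
    dims = len(next(iter(S)))
    lo = tuple(min(p[d] for p in S) for d in range(dims))
    hi = tuple(max(p[d] for p in S) for d in range(dims))
    # Construct { x : min <= x <= max } and check that it equals S.
    box = set(product(*[range(l, h + 1) for l, h in zip(lo, hi)]))
    if box != set(S):
        return None                        # not a dense box
    return tuple(h - l + 1 for l, h in zip(lo, hi))

# The matvec instances triggered on one PE by one x-value form an 8 x 1 box:
print(dense_box_size({(i, 9) for i in range(8, 16)}))   # (8, 1)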


SLIDE 40

Convolution

  lair C(): float16 x[8], float16 W[3] -> float16 y[6] {
    all (w, rw) in (8 - 3 + 1, 3)
      y[w] += x[w + rw] * W[rw]
  }
  compute_map: { C[w, rw] -> PE[0, 0] }

(Figure: the w × rw grid of computation instances. The arrival of an x-value triggers the instances with w + rw equal to the arriving index, which lie on a diagonal of the grid.)

Compute the minimum and maximum; construct { x : min ≤ x ≤ max }
⇒ not a dense box


SLIDE 42

Variable Compression

Variable compression (Meister 2004): pick an affine transformation (with inverse) mapping a lower-dimensional set to a full-dimensional set (in a lower-dimensional space).

(Figure: a lower-dimensional set A lying on a line in 2-D and its full-dimensional 1-D compression B, related by B[i] → A[1 + 2i, 3i].)
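A small Python sketch of this idea (my own illustration, with the transformation from the figure hard-coded; in practice it is derived from the equality constraints of the set):

def decompress(i):
    """Map a point of the compressed 1-D set B to the original 2-D set A."""
    return (1 + 2 * i, 3 * i)

def compress(a):
    """Inverse map: recover the compressed coordinate of a point of A."""
    x, y = a
    assert y % 3 == 0 and x == 1 + 2 * (y // 3), "point not on the line"
    return y // 3

A = [decompress(i) for i in range(4)]      # [(1, 0), (3, 3), (5, 6), (7, 9)]
print(A)
print([compress(a) for a in A])            # [0, 1, 2, 3] -- full-dimensional in 1-D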


SLIDE 44

Size Computation

Input: S, the set of instances executed on a PE on arrival of a tensor element.

Apply variable compression to S to obtain S′.
Compute the element-wise minimum and maximum of S′.
Construct { x : min ≤ x ≤ max }.
Check equal to S′ ⇒ S′ is a dense box.
Size: max − min + 1.
Check that the size does not depend on “index”.


SLIDE 51

Convolution

  lair C(): float16 x[8], float16 W[3] -> float16 y[6] {
    all (w, rw) in (8 - 3 + 1, 3)
      y[w] += x[w + rw] * W[rw]
  }
  compute_map: { C[w, rw] -> PE[0, 0] }

(Figure: the w × rw computation instances triggered by the arrival of an x-value, and the compressed 1-D instances i obtained by variable compression.)

Compress; compute the minimum and maximum; construct { x : min ≤ x ≤ max }
⇒ a dense box
Size: max − min + 1 ⇒ [1], [2] or [3] depending on “index”
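To see where the [1], [2] or [3] comes from, here is a small sketch (my own, over explicit points) that compresses the triggered instances to the rw coordinate and measures the box size for each arriving index:

# Sketch: per-index size of the compressed instance set for the convolution
# y[w] += x[w + rw] * W[rw], with w in [0, 5] and rw in [0, 2] (illustration only).
for index in range(8):                          # arriving x[index]
    triggered = [(w, rw) for w in range(6) for rw in range(3) if w + rw == index]
    compressed = [rw for (w, rw) in triggered]  # 1-D after variable compression
    size = max(compressed) - min(compressed) + 1
    print(index, sorted(compressed), size)
# Sizes per index: 1, 2, 3, 3, 3, 3, 2, 1 -- always a dense box, but the size
# depends on "index", so configuration would be needed before each invocation.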

SLIDE 52

Fixed Size Box Hull Approximation

Fixed size box hull approximation:

Result: a box containing the input set with
◮ variable offset (in particular, may involve “index”)
◮ fixed size (in particular, does not involve “index”)

Approach: look for suitable constraints in the representation of the input set.
May fail to produce a result.
(Also used by PPCG (V. et al. 2013) to obtain the mapping to shared memory.)
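For the convolution example, the constraints 0 ≤ rw ≤ 2 of the compressed set already yield a box of fixed size 3 whose offset does not involve “index”. A small sketch of this (my own illustration; the real implementation derives the box symbolically from the set's constraints):

# Sketch: a fixed size box hull for the compressed convolution instances
# { rw : 0 <= rw <= 2, 0 <= index - rw <= 5 } (illustration only).
box = range(0, 3)                               # offset 0, fixed size 3
for index in range(8):
    triggered = {rw for rw in range(3) if 0 <= index - rw <= 5}
    extra = set(box) - triggered                # padding instances near the borders
    assert triggered <= set(box)                # the box contains the input set
    print(index, sorted(triggered), "extra:", sorted(extra))
# Each extra instance would write y[index - rw] for a w outside [0, 5], i.e. to a
# location disjoint from those written by the real instances of this invocation.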



SLIDE 58

Size Computation

Input: S, the set of instances executed on a PE on arrival of a tensor element.

Apply variable compression to S to obtain S′.
Try and compute a fixed size box hull of S′.
If successful and the extra instances write to disjoint locations, then use the box size. Stop.
Compute the element-wise minimum and maximum of S′.
Construct { x : min ≤ x ≤ max }.
Check equal to S′ ⇒ S′ is a dense box.
Size: max − min + 1.
Check that the size does not depend on “index”.


SLIDE 61

Convolution

  lair C(): float16 x[8], float16 W[3] -> float16 y[6] {
    all (w, rw) in (8 - 3 + 1, 3)
      y[w] += x[w + rw] * W[rw]
  }
  compute_map: { C[w, rw] -> PE[0, 0] }

(Figure: the w × rw computation instances for an arriving x-value, the compressed 1-D instances i, and the fixed size box hull around them.)

Compress; try and compute the box hull.
Extra instances write to disjoint locations.

SLIDE 62

Outline (current section: Conclusion)

SLIDE 63

Conclusion

Achieving good performance on the Cerebras CS-1 requires generation of SIMD instructions.
A heuristics-based approach can detect opportunities in many cases, using
◮ variable compression
◮ fixed size box hull approximation

Effective use of polyhedral compilation techniques (other than affine scheduling).

SLIDE 64

References I

Abadi, Martín, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng (Nov. 2016). “TensorFlow: A System for Large-Scale Machine Learning”. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). Savannah, GA: USENIX Association, pp. 265–283.

Meister, Benoît (Dec. 2004). “Stating and Manipulating Periodicity in the Polytope Model. Applications to Program Analysis and Optimization”. PhD thesis. Université Louis Pasteur.

V., Sven (2010). “isl: An Integer Set Library for the Polyhedral Model”. In: Mathematical Software - ICMS 2010. Ed. by Komei Fukuda, Joris van der Hoeven, Michael Joswig, and Nobuki Takayama. Vol. 6327. Lecture Notes in Computer Science. Springer, pp. 299–302. doi: 10.1007/978-3-642-15582-6_49.

SLIDE 65

References II

V., Sven, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, and Francky Catthoor (2013). “Polyhedral parallel code generation for CUDA”. In: ACM Trans. Archit. Code Optim. 9.4, p. 54. doi: 10.1145/2400682.2400713.