SLIDE 1

CS 293S Parallelism and Dependence Theory

Yufei Ding Reference Book: “Optimizing Compilers for Modern Architecture” by Allen & Kennedy

Slides adapted from Louis-Noël Pouchet, Mary Hall

SLIDE 2

End of Moore's law necessitates parallel computing

The end of Moore's law necessitates a means of increasing performance beyond simply producing more complex chips. One such method is to employ cheaper and less complex chips in parallel architectures.

SLIDE 3

Amdahl’s law

If f is the fraction of the code parallelized, and the parallelized version runs on a p-processor machine with no communication or parallelization overhead, the speedup is

speedup = 1 / ((1 − f) + f/p)

If f = 50%, the maximum speedup (as p → ∞) would be 1 / (1 − 0.5) = 2.
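The formula can be sanity-checked with a few lines of Python (the `speedup` helper is ours, not from the slides):

```python
def speedup(f, p):
    """Amdahl's law: f = parallelized fraction, p = processor count."""
    return 1.0 / ((1.0 - f) + f / p)

# With f = 50%, the speedup approaches 2 as p grows.
print(speedup(0.5, 2))     # 1.333...
print(speedup(0.5, 1000))  # close to 2
```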

SLIDE 4

Data locality

Temporal locality occurs when the same data is used several times within a short time period.

Spatial locality occurs when data elements located near each other are used within a short period of time.

Better locality → fewer cache misses. An important form of spatial locality occurs when all the elements that appear on one cache line are used together.

  • 1. Parallelism and data locality are often correlated.
  • 2. The same or similar set of techniques applies to exploiting parallelism and maximizing data locality.

SLIDE 5

Data locality

Kernels can often be written in many semantically equivalent ways, but with widely varying data locality and performance.

(a) Zeroing an array column-by-column:

for (j=1; j<N; j++)
  for (i=1; i<N; i++)
    A[i, j] = 0;

(b) Zeroing an array row-by-row:

for (i=1; i<N; i++)
  for (j=1; j<N; j++)
    A[i, j] = 0;

(c) Zeroing an array row-by-row in parallel (processor p of M processors):

b = ceil(N/M);
for (i = b*p; i < min(N, b*(p+1)); i++)
  for (j=1; j<N; j++)
    A[i, j] = 0;
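The gap between row-by-row and column-by-column traversal can be illustrated with a deliberately tiny cache model: a single cached line holding 8 consecutive elements. This is our simplifying assumption, not a real memory hierarchy:

```python
# Toy model: one cached line of 8 array elements, N x N row-major array.
# Count misses for the two traversal orders.
LINE = 8
N = 16

def count_misses(addresses):
    cached_line = None
    misses = 0
    for addr in addresses:
        line = addr // LINE
        if line != cached_line:   # not the line we have cached: miss
            misses += 1
            cached_line = line
    return misses

row_major = [i * N + j for i in range(N) for j in range(N)]  # row-by-row
col_major = [i * N + j for j in range(N) for i in range(N)]  # column-by-column

print(count_misses(row_major))  # 32  (one miss per cache line)
print(count_misses(col_major))  # 256 (stride N > line size: every access misses)
```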


SLIDE 7

How to get efficient parallel programs?

Programmer: writing correct and efficient sequential programs is not easy; writing parallel programs that are correct and efficient is even harder.
  • Must reason about data locality and data dependence
  • Debugging is hard

Compiler? Correctness vs. efficiency.

Simple assumptions:
  • No pointers and no pointer arithmetic
  • Affine: affine loops + affine array accesses + …

SLIDE 8

Affine Array Accesses

Common patterns of data accesses (i, j, k are loop indexes): A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1], A[i,j], A[i-1, j+1]

Array indexes are affine expressions of the surrounding loop indexes:

Loop indexes: i1, i2, ..., in
Integer constants: c0, c1, ..., cn
Array index: cn*in + cn-1*in-1 + ... + c1*i1 + c0

An affine expression is a linear expression plus a constant term (c0).
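One way to see what the affine restriction buys the compiler: the coefficients of an affine index function can be recovered by probing it at a few points, and a non-affine access fails a linearity check. A sketch (the helper name `affine_coeffs` is ours):

```python
# Infer the affine coefficients of a 2-index access function by probing,
# then verify linearity on a grid of sample points.
def affine_coeffs(f):
    c0 = f(0, 0)
    ci = f(1, 0) - c0
    cj = f(0, 1) - c0
    # verify f(i, j) == ci*i + cj*j + c0 on sample points
    ok = all(f(i, j) == ci * i + cj * j + c0
             for i in range(-3, 4) for j in range(-3, 4))
    return (ci, cj, c0) if ok else None

print(affine_coeffs(lambda i, j: 2 * i + 1))  # (2, 0, 1) -> affine
print(affine_coeffs(lambda i, j: i - 1))      # (1, 0, -1)
print(affine_coeffs(lambda i, j: i * j))      # None -> not affine
```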

SLIDE 9

Affine loop

All loop bounds and contained control conditions have to be expressible as affine expressions in the enclosing loop index variables.

Affine array accesses.

No pointers, and no possible aliasing (e.g., overlap of two arrays) between statically distinct base addresses.

SLIDE 10

Loop/Array Parallelism

The loop below is parallelizable because each iteration accesses a different set of data:

for (i=1; i<N; i++)
  C[i] = A[i] + B[i];

We can execute the loop on a computer with N processors by giving each processor a unique ID p = 1, 2, …, N−1 and having each processor execute the same code: C[p] = A[p] + B[p];

SLIDE 11

Parallelism & Dependence

for (i=1; i<N; i++)
  A[i] = A[i-1] + B[i];

A[1] = A[0] + B[1];
A[2] = A[1] + B[2];
A[3] = A[2] + B[3];
…
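Running the unrolled iterations above sequentially versus "all at once" (every iteration reading the original A) shows why this loop is not parallelizable. The concrete values of A and B are ours:

```python
# The loop A[i] = A[i-1] + B[i] carries a true dependence: iteration i
# reads the value iteration i-1 just wrote. Executing all iterations in
# parallel, each reading the ORIGINAL A, gives a different answer.
A0 = [1, 0, 0, 0, 0]
B  = [0, 1, 1, 1, 1]

seq = A0[:]
for i in range(1, 5):
    seq[i] = seq[i - 1] + B[i]   # sequential semantics

par = A0[:]
for i in range(1, 5):
    par[i] = A0[i - 1] + B[i]    # naive parallel: stale reads

print(seq)  # [1, 2, 3, 4, 5]
print(par)  # [1, 2, 1, 1, 1]
```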

SLIDE 12

Focus of this lecture

Data Dependence
  • True, anti-, and output dependence
  • Source and sink
  • Distance vector, direction vector
  • Relation between reordering transformations and direction vectors

Loop dependence
  • Loop-carried dependence
  • Loop-independent dependence

Dependence graph

SLIDE 13

Dependence Concepts

Assume statement S2 depends on statement S1.

  • 1. True dependence (RAW hazard): read after write. Denoted by S1 δ S2.
  • 2. Antidependence (WAR hazard): write after read. Denoted by S1 δ⁻¹ S2.
  • 3. Output dependence (WAW hazard): write after write. Denoted by S1 δ⁰ S2.

SLIDE 14

Source and Sink

Source: the statement (instance) executed earlier.
Sink: the statement (instance) executed later.
Graphically, a dependence is an edge from source to sink.

S1  PI = 3.14
S2  R = 5.0
S3  AREA = PI * R ** 2

S1 and S2 are sources; S3 is the sink.

SLIDE 15

Dependence in Loops

Let us look at two different loops:

DO I = 1, N
S1  A(I+1) = A(I) + B(I)
ENDDO

DO I = 1, N
S1  A(I+2) = A(I) + B(I)
ENDDO

  • In both cases, statement S1 depends on itself
  • However, there is a significant difference
  • We need a formalism to describe and distinguish such dependences
SLIDE 16

Data Dependence Analysis

Objective: compute the set of statement instances which are dependent.

Possible approaches:
  • Distance vector: compute an indicator of the distance between two dependent iterations
  • Dependence polyhedron: compute a list of sets of dependent instances, with a set of dependence polyhedra for each pair of statements
SLIDE 17

Program Abstraction Level

Statement vs. instance of a statement:

For (i = 1; i <= 10; i++)
  A[i] = A[i-1] + 1

A[4] = A[3] + 1 is the instance of the statement executed at i = 4.

SLIDE 18

Iteration Domain

Iteration Vector: an n-level loop nest can be represented as an n-entry vector, each component corresponding to one level's loop iterator.

For (x1=L1; x1<U1; x1++)
  …
  For (x2=L2; x2<U2; x2++)
    …
    For (xn=Ln; xn<Un; xn++)
      <some statement S1>

The iteration vector (2, 1, …) denotes the instance of S1 executed during the 2nd iteration of the x1 loop and the 1st iteration of the x2 loop.

SLIDE 19

Iteration Domain

Dimension of the iteration domain: decided by loop nesting levels.
Bounds of the iteration domain: decided by loop bounds, using inequalities.

For (i=1; i<=n; i++)
  For (j=1; j<=n; j++)
    if (i <= n+2-j)
      b[j] = b[j] + a[i];
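For the guarded loop nest above, the iteration domain can be enumerated directly (n = 4 is our arbitrary choice):

```python
# Enumerate the iteration domain: all (i, j) with 1 <= i, j <= n
# that also satisfy the guard i <= n + 2 - j.
n = 4
domain = [(i, j)
          for i in range(1, n + 1)
          for j in range(1, n + 1)
          if i <= n + 2 - j]
print(len(domain))  # 13 of the 16 grid points satisfy the guard
```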

SLIDE 20

Modeling Iteration Domains

Representing iteration bounds by affine functions: the domain is a system of affine inequalities in the loop indexes. For the previous loop: i ≥ 1, i ≤ n, j ≥ 1, j ≤ n, i ≤ n + 2 − j.

SLIDE 21

Loop Normalization

Algorithm: replace the loop bounds and step:

for (i = L; i < U; i += S)  →  for (i = 1; i < (U-L+S)/S; i++)

Then replace each reference to the original loop variable i with:

i * S - S + L

SLIDE 22

Examples: Loop Normalization

For (i=4; i<=N; i+=6)
  For (j=0; j<=N; j+=2)
    A[i] = 0

becomes

For (ii=1; ii<=(N+2)/6; ii++)
  For (jj=1; jj<=(N+2)/2; jj++)
    i = ii*6 - 6 + 4
    j = jj*2 - 2
    A[i] = 0
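The i-loop half of the example can be checked mechanically: the normalized loop should visit exactly the i values of the original (N = 20 is our arbitrary choice):

```python
# Original: for (i = 4; i <= N; i += 6)
N = 20
original = list(range(4, N + 1, 6))  # [4, 10, 16]

# Normalized: ii runs 1 .. (N+2)//6, i.e. (U - L + S) / S with L = 4, S = 6,
# and each use of i is rewritten as ii*S - S + L.
L, S = 4, 6
normalized = [ii * S - S + L for ii in range(1, (N + 2) // 6 + 1)]

print(original)    # [4, 10, 16]
print(normalized)  # [4, 10, 16]
```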

SLIDE 23

Distance/Direction Vectors

The distance vector is a vector d such that d_k = sink_k - source_k,
i.e., the difference between the iteration vectors of sink and source.

The direction vector is a vector D(i, j) such that:
  D_k = "<" if d(i, j)_k > 0
  D_k = ">" if d(i, j)_k < 0
  D_k = "=" otherwise

SLIDE 24

Example 1:

DO I = 1, N
S1  A(I+1) = A(I) + B(I)
ENDDO

  • Dependence distance vector of the true dependence: source access A(I+1); sink access A(I)
  • Consider a memory location A(x): iteration vector of the source: (x-1); iteration vector of the sink: (x)
  • Distance vector: (x) - (x-1) = (1)
  • Direction vector: (<)
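The distance vector derived above can be confirmed by brute force over a small iteration space (N = 10 is our arbitrary choice):

```python
# Iteration i1 writes A[i1 + 1]; iteration i2 reads A[i2].
# Collect sink - source distances over all conflicting pairs.
N = 10
distances = {i2 - i1
             for i1 in range(1, N + 1)   # write A[i1 + 1]
             for i2 in range(1, N + 1)   # read  A[i2]
             if i1 + 1 == i2}
print(distances)  # {1}
```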

SLIDE 25

Example 2:

DO I = 1, N
  DO J = 1, M
    DO K = 1, L
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

What is the dependence distance vector of the true dependence?
What is the dependence distance vector of the anti-dependence?

SLIDE 26

Example 2:

DO I = 1, N
  DO J = 1, M
    DO K = 1, L
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

For the true dependence:
  Distance vector: (1, 0, -1); direction vector: (<, =, >)

For the assumed anti-dependence:
  Distance vector: (-1, 0, 1); direction vector: (>, =, <)
  The sink would execute before the source: the assumed anti-dependence is invalid!
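The distance and direction vectors can be confirmed by brute force over a small iteration space (the 3×3×3 bounds and the `direction` helper are ours):

```python
# Brute-force the true-dependence distance vector for
# A(I+1, J, K-1) = A(I, J, K) over a small (I, J, K) space.
import itertools

def direction(d):
    return tuple('<' if x > 0 else '>' if x < 0 else '=' for x in d)

iters = list(itertools.product(range(1, 4), repeat=3))
dists = set()
for (i1, j1, k1) in iters:        # source: writes A(i1+1, j1, k1-1)
    for (i2, j2, k2) in iters:    # sink:   reads  A(i2, j2, k2)
        same_cell = (i1 + 1, j1, k1 - 1) == (i2, j2, k2)
        if same_cell and (i1, j1, k1) < (i2, j2, k2):  # source runs first
            dists.add((i2 - i1, j2 - j1, k2 - k1))

print(dists)                  # {(1, 0, -1)}
print(direction((1, 0, -1)))  # ('<', '=', '>')
```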

SLIDE 27

Example 3:

What is the dependence distance vector of the true dependence?
What is the dependence distance vector of the anti-dependence?

DO K = 1, L
  DO J = 1, M
    DO I = 1, N
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

SLIDE 28

Example 3:

DO K = 1, L
  DO J = 1, M
    DO I = 1, N
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

For the assumed true dependence:
  Distance vector: (-1, 0, 1); direction vector: (>, =, <)

For the anti-dependence:
  Distance vector: (1, 0, -1); direction vector: (<, =, >)

The assumed true dependence is invalid!

SLIDE 29

  • The true dependence turns into an anti-dependence: "write then read" turns into "read then write".
  • Reflected in the direction vector of the true dependence: (<, =, >) turns into (>, =, <).

Example 2:

DO I = 1, N
  DO J = 1, M
    DO K = 1, L
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

Example 3:

DO K = 1, L
  DO J = 1, M
    DO I = 1, N
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

SLIDE 30

Example 4:

DO J = 1, M
  DO I = 1, N
    DO K = 1, L
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

What is the dependence distance vector of the true dependence?
What is the dependence distance vector of the anti-dependence?
Is this program equivalent to Example 2?

SLIDE 31

Consider the true dependence.

Example 2 (loop order I, J, K):

DO I = 1, N
  DO J = 1, M
    DO K = 1, L
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

Distance vector: (1, 0, -1); direction vector: (<, =, >). Source: write; sink: read.

Example 4 (loop order J, I, K):

DO J = 1, M
  DO I = 1, N
    DO K = 1, L
S1      A(I+1, J, K-1) = A(I, J, K) + 10
    ENDDO
  ENDDO
ENDDO

Distance vector: (0, 1, -1); direction vector: (=, <, >). Source: write; sink: read.

  • The true dependence stays a true dependence: "write then read" stays "write then read". So it is still a true dependence.
  • Reflected in the direction vector of the true dependence: (<, =, >) turns into (=, <, >).

SLIDE 32

Reordering Transformations

Definition: a reordering transformation merely changes the order of execution of the code; it adds or deletes nothing.

A reordering transformation does not eliminate dependences. However, it can change the execution order of the original sink and source, causing incorrect behavior.

SLIDE 33

“Any reordering transformation that preserves every dependence in a program preserves the meaning of that program.”

  — Fundamental Theorem of Dependence
SLIDE 34

Theorem of loop reordering

Direction Vector Transformation. Let T be a reordering transformation that is applied to a loop nest and that does not rearrange the statements in the body of the loop. Then the transformation is valid if, after it is applied, none of the direction vectors for dependences with source and sink in the original nest has a leftmost non-"=" component that is ">".

Follows from the Fundamental Theorem of Dependence:
  • All dependences still exist
  • None of the dependences have been reversed

SLIDE 35

Procedure to Check Validity of a Loop Reordering

  • 1. List the direction vectors of all data dependences in the original program.
  • 2. According to the new order of loops, exchange the elements of each direction vector to derive the new direction vectors.
  • 3. If every direction vector has "<" as its first non-"=" entry, the transformation is valid. An all-"=" vector stays an all-"=" vector; it does not affect the correctness of loop reordering.
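The three-step procedure can be sketched as a small checker (the function name `valid_after_reorder` is ours). The test case uses the slide-36 nest A(H, I+1, J-2, K+3) = A(H, I, J, K), whose only dependence has direction vector (=, <, >, <) in (H, I, J, K) order:

```python
# Permute each direction vector according to the new loop order, then
# check that the leftmost non-"=" entry is "<".
def valid_after_reorder(direction_vectors, new_order):
    for dv in direction_vectors:
        permuted = [dv[k] for k in new_order]
        for entry in permuted:
            if entry == '=':
                continue
            if entry == '>':
                return False   # this dependence would be reversed
            break              # leftmost non-"=" is '<': this dv is fine
    return True

dvs = [('=', '<', '>', '<')]
print(valid_after_reorder(dvs, [0, 1, 2, 3]))  # True  (original order)
print(valid_after_reorder(dvs, [0, 2, 1, 3]))  # False (interchanging I and J)
```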

SLIDE 36

Example

Is interchanging the I and J loops valid?

DO H = 1, 10
  DO I = 1, 10
    DO J = 1, 10
      DO K = 1, 10
S         A(H, I+1, J-2, K+3) = A(H, I, J, K) + B
      ENDDO
    ENDDO
  ENDDO
ENDDO

DO H = 1, 10
  DO J = 1, 10
    DO I = 1, 10
      DO K = 1, 10
S         A(H, I+1, J-2, K+3) = A(H, I, J, K) + B
      ENDDO
    ENDDO
  ENDDO
ENDDO

SLIDE 37

Loop-Carried and Loop-Independent Dependences

If in a loop statement S2 depends on S1, there are two possible ways this dependence can occur:

  • Source and sink happen on different iterations: a loop-carried dependence.
  • S1 and S2 execute on the same iteration: a loop-independent dependence.

SLIDE 38

Loop-Carried Dependence

Example:

DO I = 1, N
S1  A(I+1) = F(I)
S2  F(I+1) = A(I)
ENDDO

SLIDE 39

Loop-Carried Dependence

Dependence Level: the level of a loop-carried dependence is the index of the leftmost non-"=" entry of D(i, j) for the dependence.

For instance:

DO I = 1, 10
  DO J = 1, 10
    DO K = 1, 10
S1      A(I, J, K+1) = A(I, J, K)
    ENDDO
  ENDDO
ENDDO

The direction vector for S1 is (=, =, <), so the level of the dependence is 3.

A level-k true dependence between S1 and S2 is denoted by S1 δk S2.

The iterations of a loop can be executed in parallel if the loop carries no dependences.
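The level definition is directly computable (the helper name `level` is ours):

```python
# Level of a loop-carried dependence: 1-based index of the leftmost
# non-"=" entry of the direction vector; None if the vector is all "="
# (a loop-independent dependence).
def level(direction_vector):
    for idx, entry in enumerate(direction_vector, start=1):
        if entry != '=':
            return idx
    return None

print(level(('=', '=', '<')))  # 3
print(level(('<', '=', '>')))  # 1
```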

SLIDE 40

Loop-Independent Dependences

Example:

DO I = 1, 10
S1  A(I) = ...
S2  ... = A(I)
ENDDO

More complicated example:

DO I = 1, 9
S1  A(I) = ...
S2  ... = A(10-I)
ENDDO

SLIDE 41

Loop-Independent Dependences

Theorem 2.5. If there is a loop-independent dependence from S1 to S2, any reordering transformation that does not move statement instances between iterations and preserves the relative order of S1 and S2 in the loop body preserves that dependence.

That S2 depends on S1 with a loop-independent true dependence is denoted by S1 δ∞ S2.

The direction vector has all-"=" entries for a loop-independent dependence.

SLIDE 42

Is the reordering legal?

DO I = 1, 100
  DO J = 1, 100
    A(I+1, J) = A(I, 5) + B
  ENDDO
ENDDO

DO J = 1, 100
  DO I = 1, 100
    A(I+1, J) = A(I, 5) + B
  ENDDO
ENDDO

Direction vectors before interchange: (<, <), (<, =), (<, >)
After interchange: (<, <), (=, <), (>, <)

(>, <) has ">" as its leftmost non-"=" entry, so the interchange is not legal.

SLIDE 43

DO I = 1, 100
S1  D(I) = A(5, I)
  DO J = 1, 100
S2    A(J, I-1) = B(I) + C
  ENDDO
ENDDO

Dependence Graph
  • Nodes for statements
  • Edges for data dependences
  • Labels on edges for dependence levels and types

S1 δ1⁻¹ S2: (<), a level-1 antidependence. S1 is the source, S2 is the sink.

Important point: the order of entries in distance/direction vectors depends on the order of the loops, not on their use in the array subscripts.

Only consider loops common to both statements!

SLIDE 44

No dependence:

DO I = 1, 100
S1  D(I) = A(102, I)
  DO J = 1, 100
S2    A(J, I-1) = B(I) + C
  ENDDO
ENDDO

(S2 only writes rows A(1..100, ·), so the read A(102, I) never conflicts.)

SLIDE 45

DO I = 1, 100
S1  X(I) = Y(I) + 10
  DO J = 1, 100
S2    B(J) = A(J, N)
    DO K = 1, 100
S3      A(J+1, K) = B(J) + C(J, K)
    ENDDO
S4    Y(I+J) = A(J+1, N)
  ENDDO
ENDDO

Dependence Graph

SLIDE 46

DO I = 1, 100
S1  X(I) = Y(I) + 10
  DO J = 1, 100
S2    B(J) = A(J, N)
    DO K = 1, 100
S3      A(J+1, K) = B(J) + C(J, K)
    ENDDO
S4    Y(I+J) = A(J+1, N)
  ENDDO
ENDDO

  • 1. True dependences denoted by Si δ Sj
  • 2. Antidependences denoted by Si δ⁻¹ Sj
  • 3. Output dependences denoted by Si δ⁰ Sj

d and δ are used interchangeably.