Loop Fusion and Fission and Presburger Transformation Framework

  1. Loop Fusion and Fission and Presburger Transformation Framework

     Last time
       - Unimodular transformation framework
       - Loop permutation, loop reversal, loop skewing
       - Fourier Motzkin

     Frameworks
       - Unimodular
       - Polyhedral
       - Presburger
       - Sparse Polyhedral

     Today
       - Presburger or Kelly & Pugh transformation framework
       - Loop fusion
       - Loop fission
       - Unroll and jam

     CS553 Lecture Loop Transformations

     Loop Fusion

     Idea
       - Combine multiple loop nests into one

     Example

       Original:
         do i = 1,n
           A(i) = A(i-1)
         enddo
         do j = 1,n
           B(j) = A(j)/2
         enddo

       Fused:
         do i = 1,n
           A(i) = A(i-1)
           B(i) = A(i)/2
         enddo

     Pros
       - May improve data locality
       - Reduces loop overhead
       - Enables array contraction (opposite of scalar expansion)
       - May enable better instruction scheduling

     Cons
       - May hurt data locality
       - May hurt icache performance
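The fusion example above can be tried out directly. The following is a minimal Python sketch, not part of the original slides; the function names and the parameter `a0` (the value of A(0)) are illustrative assumptions. Lists are indexed from 1 to mirror the slide notation.

```python
def separate_loops(n, a0):
    # Index 0 of A holds A(0); indices 1..n mirror the 1-based slide code.
    A = [a0] + [0.0] * n
    B = [0.0] * (n + 1)
    for i in range(1, n + 1):
        A[i] = A[i - 1]
    for j in range(1, n + 1):
        B[j] = A[j] / 2
    return A, B


def fused_loop(n, a0):
    A = [a0] + [0.0] * n
    B = [0.0] * (n + 1)
    for i in range(1, n + 1):
        A[i] = A[i - 1]
        B[i] = A[i] / 2  # A(i) was produced earlier in this same iteration
    return A, B
```

Both versions produce the same arrays here because B(i) only needs A(i), which is already computed when the second statement of the fused body runs.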

  2. Legality of Loop Fusion

     Basic Conditions
       - Both loops must have the same structure
           - Same loop depth
           - Same loop bounds
           - Same iteration directions
         Can we relax any of these restrictions?
       - Dependences must be preserved,
         e.g., flow dependences must not become anti dependences

       Original:
         do i = 1,n
           body1
         enddo
         do i = 1,n
           body2
         enddo
       All cross-loop dependences flow from body1 to body2.

       Fused:
         do i = 1,n
           body1
           body2
         enddo
       Ensure that fusion does not introduce dependences from body2 to body1.

     Loop Fusion Example

     What are the dependences?

       Original:
         do i = 1,n
           s1  A(i) = B(i) + 1
         enddo
         do i = 1,n
           s2  C(i) = A(i)/2
         enddo
         do i = 1,n
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s2 δ^f s3

       Fused:
         do i = 1,n
           s1  A(i) = B(i) + 1
           s2  C(i) = A(i)/2
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s3 δ^a s2

     Fusion changes the dependence between s2 and s3, so fusion is illegal.

     Is there some transformation that will enable fusion of these loops?
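To see the illegal fusion concretely, here is a small Python sketch (an illustration only, not from the slides). It assumes C starts out filled with 1.0, an arbitrary choice, since C(n+1) is read but never written; the function names are hypothetical.

```python
def run_separate(B, n):
    A = [0.0] * (n + 2)
    C = [1.0] * (n + 2)  # assumed initial contents of C
    D = [0.0] * (n + 2)
    for i in range(1, n + 1):
        A[i] = B[i] + 1        # s1
    for i in range(1, n + 1):
        C[i] = A[i] / 2        # s2
    for i in range(1, n + 1):
        D[i] = 1 / C[i + 1]    # s3 reads C(i+1) after s2 has written it
    return D


def run_naively_fused(B, n):
    A = [0.0] * (n + 2)
    C = [1.0] * (n + 2)  # same assumed initial contents
    D = [0.0] * (n + 2)
    for i in range(1, n + 1):
        A[i] = B[i] + 1        # s1
        C[i] = A[i] / 2        # s2
        D[i] = 1 / C[i + 1]    # s3 now reads C(i+1) before s2 writes it
    return D
```

With B = [0.0, 1.0, 2.0, 3.0] and n = 3, the two functions return different D arrays, which is the observable effect of the flow dependence turning into an anti dependence.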

  3. Loop Fusion Example (cont)

     Loop reversal is legal for the original loops
       - Does not change the direction of any dependence in the original code
       - Will reverse the direction in the fused loop:
         s3 δ^a s2 will become s2 δ^f s3

       Reversed:
         do i = n,1,-1
           s1  A(i) = B(i) + 1
         enddo
         do i = n,1,-1
           s2  C(i) = A(i)/2
         enddo
         do i = n,1,-1
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s2 δ^f s3

       Reversed and fused:
         do i = n,1,-1
           s1  A(i) = B(i) + 1
           s2  C(i) = A(i)/2
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s2 δ^f s3

     After reversal and fusion all original dependences are preserved.

     Kelly and Pugh Transformation Framework

       - Specify iteration space as a set of integer tuples
       - Specify data dependences as relations between integer tuples
         (i.e., data dependence relations)

           { [i_1, j_1] → [i_2, j_2] | (i_1 = i_2 − 1) ∧ (j_1 = j_2 − 1) ∧ (1 ≤ i_1, j_1, i_2, j_2 ≤ n) }

       - Specify transformations as relations/mappings between integer tuples
       - Execute iterations in transformed iteration space in lexicographic order
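For a small n, the data dependence relation above can be enumerated explicitly. This Python sketch is an illustration under that assumption (real tools keep the relation symbolic rather than listing pairs); `dependence_relation` is a hypothetical name.

```python
def dependence_relation(n):
    """Enumerate { [i1,j1] -> [i2,j2] | i1 = i2-1, j1 = j2-1, 1 <= i1,j1,i2,j2 <= n }."""
    rng = range(1, n + 1)
    return {
        ((i1, j1), (i2, j2))
        for i1 in rng
        for j1 in rng
        for i2 in rng
        for j2 in rng
        if i1 == i2 - 1 and j1 == j2 - 1
    }
```

Python compares tuples lexicographically, so `src < dst` holds for every pair in the relation: each source iteration comes before its sink in the original execution order.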

  4. Specifying Loop Fusion in Kelly and Pugh Framework

       - Specify iteration space as a set of integer tuples
       - Specify data dependences as mappings between integer tuples
         (i.e., data dependence relations)
       - Specify transformations as mappings between integer tuples

     Checking Legality in Kelly & Pugh Framework

       - For each dependence [I] -> [J], where I is the source iteration and
         J is the sink, the transformed I iteration must be executed before
         the transformed J iteration, i.e., T(I) must lexicographically
         precede T(J).
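A legality check of this kind can be sketched in Python for the three-loop example (s1, s2, s3). The sketch takes each dependence to run from its source iteration to its sink iteration, so the transformed source must come first in lexicographic order. The tuple encodings `(statement, i)` and the mappings `fuse` and `reverse_then_fuse` are illustrative assumptions, not notation from the slides.

```python
def example_dependences(n):
    """(source, sink) pairs; an iteration is written (statement_number, i)."""
    deps = []
    for i in range(1, n + 1):
        deps.append(((1, i), (2, i)))      # s1 writes A(i), s2 reads A(i)
    for i in range(1, n):
        deps.append(((2, i + 1), (3, i)))  # s2 writes C(i+1), s3 reads C(i+1)
    return deps


def fuse(iteration):
    s, i = iteration
    return (i, s)    # one loop over i, statements in textual order


def reverse_then_fuse(iteration):
    s, i = iteration
    return (-i, s)   # same, but i runs from n down to 1


def is_legal(transform, dependences):
    # Python tuple comparison is lexicographic.
    return all(transform(src) < transform(dst) for src, dst in dependences)
```

Under these encodings, `is_legal(fuse, example_dependences(4))` is false because s2 at iteration i+1 would run after s3 at iteration i, while `is_legal(reverse_then_fuse, example_dependences(4))` is true.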

  5. Loop Fusion Example (cont)

     Loop reversal is legal for the original loops
       - Does not change the direction of any dependence in the original code
       - Will reverse the direction in the fused loop:
         s3 δ^a s2 will become s2 δ^f s3

       Reversed:
         do i = n,1,-1
           s1  A(i) = B(i) + 1
         enddo
         do i = n,1,-1
           s2  C(i) = A(i)/2
         enddo
         do i = n,1,-1
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s2 δ^f s3

       Reversed and fused:
         do i = n,1,-1
           s1  A(i) = B(i) + 1
           s2  C(i) = A(i)/2
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s2 δ^f s3

     After reversal and fusion all original dependences are preserved.

     Fusion Example

     Can we fuse these loop nests?

       Original:
         do i = 1,n
           X(i) = 0
         enddo
         do j = 1,n
           do k = 1,n
             X(k) = X(k)+A(k,j)*Y(j)
           enddo
         enddo
       There is a flow dependence (δ^f) from X(i) = 0 to the update of X(k).

       Fused:
         do i = 1,n
           X(i) = 0
           do k = 1,n
             X(k) = X(k)+A(k,i)*Y(i)
           enddo
         enddo
       Fusion of these loops would violate this dependence.

  6. Fusion Example (cont)

     Use loop interchange to preserve dependences

       After interchange:
         do i = 1,n
           X(i) = 0
         enddo
         do k = 1,n
           do j = 1,n
             X(k) = X(k)+A(k,j)*Y(j)
           enddo
         enddo

       Fused:
         do i = 1,n
           X(i) = 0
           do j = 1,n
             X(i) = X(i)+A(i,j)*Y(j)
           enddo
         enddo

     The flow dependence (δ^f) from X(i) = 0 to the update of X(i) is
     preserved in both versions.

     Loop Fission (Loop Distribution)

     Idea
       - Split a loop nest into multiple loop nests (the inverse of fusion)

     Example

       Original:
         do i = 1,n
           A(i) = B(i) + 1
           C(i) = A(i)/2
         enddo

       After fission:
         do i = 1,n
           A(i) = B(i) + 1
         enddo
         do i = 1,n
           C(i) = A(i)/2
         enddo

     Motivation?
       - Produces multiple (potentially) less constrained loops
       - May improve locality
       - Enables other transformations, such as interchange

     Legality?
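Returning to the fusion example with X, A and Y, the three variants (original nests, naive fusion, interchange followed by fusion) can be compared with a short Python sketch. This is an illustration only; it uses 0-based indexing, and the function names are hypothetical.

```python
def original_nests(A, Y):
    n = len(Y)
    X = [0.0] * n
    for i in range(n):
        X[i] = 0.0
    for j in range(n):
        for k in range(n):
            X[k] = X[k] + A[k][j] * Y[j]
    return X


def naive_fused(A, Y):
    n = len(Y)
    X = [0.0] * n
    for i in range(n):
        X[i] = 0.0  # wipes contributions that earlier iterations added to X(i)
        for k in range(n):
            X[k] = X[k] + A[k][i] * Y[i]
    return X


def interchanged_then_fused(A, Y):
    n = len(Y)
    X = [0.0] * n
    for i in range(n):
        X[i] = 0.0
        for j in range(n):
            X[i] = X[i] + A[i][j] * Y[j]  # only X(i) is touched in iteration i
    return X
```

After interchange, iteration i of the second nest touches only X(i), so placing the initialization X(i) = 0 in the same iteration keeps the initialization ahead of every update.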

  7. Loop Fission (cont)

     Legality
       - Fission is legal when the loop body contains no cycles in the
         dependence graph

       Original:
         do i = 1,n
           body1
           body2
         enddo

       After fission:
         do i = 1,n
           body1
         enddo
         do i = 1,n
           body2
         enddo

     Cycles cannot be preserved because after fission all cross-loop
     dependences flow from body1 to body2.

     Loop Fission Example

     Recall our fusion example. Can we perform fission on this loop?

       Original:
         do i = 1,n
           s1  A(i) = B(i) + 1
           s2  C(i) = A(i)/2
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s3 δ^a s2

       After fission, keeping the statement order:
         do i = 1,n
           s1  A(i) = B(i) + 1
         enddo
         do i = 1,n
           s2  C(i) = A(i)/2
         enddo
         do i = 1,n
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s2 δ^f s3

  8. Loop Fission Example (cont)

     If there are no cycles, we can reorder the loops with a topological sort.

     Can we perform fission on this loop?

       Original:
         do i = 1,n
           s1  A(i) = B(i) + 1
           s2  C(i) = A(i)/2
           s3  D(i) = 1/C(i+1)
         enddo
       Dependences: s1 δ^f s2, s3 δ^a s2

       After fission, with loops in topological order:
         do i = 1,n
           s1  A(i) = B(i) + 1
         enddo
         do i = 1,n
           s3  D(i) = 1/C(i+1)
         enddo
         do i = 1,n
           s2  C(i) = A(i)/2
         enddo
       Dependences: s1 δ^f s2, s3 δ^a s2

     Loop Unrolling

     Motivation
       - Reduces loop overhead
       - Improves effectiveness of other transformations
           - Code scheduling
           - CSE

     The Transformation
       - Make n copies of the loop body, where n is the unrolling factor
       - Adjust loop bounds accordingly
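The reordering step in the fission example can be sketched with a topological sort over the statement-level dependence graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the helper name `fission_order` and the string labels for statements are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def fission_order(dependences):
    """dependences: iterable of (source_stmt, sink_stmt) pairs.

    Returns an order for the distributed loops in which every source
    precedes its sink, or None if the dependence graph has a cycle.
    """
    predecessors = {}
    for src, dst in dependences:
        predecessors.setdefault(src, set())
        predecessors.setdefault(dst, set()).add(src)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None
```

For the example, `fission_order([("s1", "s2"), ("s3", "s2")])` places s2 last; whether s1 or s3 comes first is not determined by the dependences.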

  9. Loop Unrolling (cont)

     Example

       Original:
         do i=1,n
           A(i) = B(i) + C(i)
         enddo

       Unrolled by 2:
         do i=1,n by 2
           A(i) = B(i) + C(i)
           A(i+1) = B(i+1) + C(i+1)
         enddo

     Details
       - When is loop unrolling legal?
       - Handle end cases with a cloned copy of the loop
       - Enter this special case if the remaining number of iterations is
         less than the unrolling factor

     Loop Balance

     Problem
       - We’d like to produce loops with the right balance of memory
         operations and floating point operations
       - The ideal balance is machine-dependent
           - e.g. How many load-store units are connected to the L1 cache?
           - e.g. How many functional units are provided?

     Example

         do j = 1,2*n
           do i = 1,m
             A(j) = A(j) + B(i)
           enddo
         enddo

       - The inner loop has 1 memory operation per iteration and 1 floating
         point operation per iteration
       - If our target machine can only support 1 memory operation for every
         two floating point operations, this loop will be memory bound

     What can we do?
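The end-case handling for unrolling described above can be sketched as follows. This is a Python illustration with an unrolling factor of 2 and 0-based indexing; the function names are hypothetical.

```python
def add_rolled(B, C):
    n = len(B)
    A = [0.0] * n
    for i in range(n):
        A[i] = B[i] + C[i]
    return A


def add_unrolled_by_2(B, C):
    n = len(B)
    A = [0.0] * n
    i = 0
    # Main loop: two copies of the body per trip.
    while i + 1 < n:
        A[i] = B[i] + C[i]
        A[i + 1] = B[i + 1] + C[i + 1]
        i += 2
    # Cleanup loop: a clone of the original body, entered only when fewer
    # iterations remain than the unrolling factor.
    while i < n:
        A[i] = B[i] + C[i]
        i += 1
    return A
```

When n is odd, the main loop stops one element short and the cleanup loop handles the last iteration.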

  10. Unroll and Jam

     Idea
       - Restructure loops so that loaded values are used many times per
         iteration

     Unroll and Jam
       - Unroll the outer loop some number of times
       - Fuse (jam) the resulting inner loops

     Example

       Original:
         do j = 1,2*n
           do i = 1,m
             A(j) = A(j) + B(i)
           enddo
         enddo

       Unroll the outer loop:
         do j = 1,2*n by 2
           do i = 1,m
             A(j) = A(j) + B(i)
           enddo
           do i = 1,m
             A(j+1) = A(j+1) + B(i)
           enddo
         enddo

     Unroll and Jam Example (cont)

       Jam the inner loops:
         do j = 1,2*n by 2
           do i = 1,m
             A(j) = A(j) + B(i)
             A(j+1) = A(j+1) + B(i)
           enddo
         enddo

       - The inner loop has 1 load per iteration and 2 floating point
         operations per iteration
       - We reuse the loaded value of B(i)
       - The loop balance matches the machine balance
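The reuse of B(i) after unroll and jam can be made visible by counting reads of B. The Python sketch below is a simple model, not a measurement of real memory traffic; it assumes A has an even number of elements (2*n in the slides), updates A in place, and uses hypothetical function names.

```python
def balance_original(A, B):
    loads_of_B = 0
    for j in range(len(A)):
        for i in range(len(B)):
            A[j] = A[j] + B[i]
            loads_of_B += 1      # one read of B per floating point add
    return loads_of_B


def balance_unroll_and_jam(A, B):
    # Assumes len(A) is even, matching the 2*n trip count in the slides.
    loads_of_B = 0
    for j in range(0, len(A), 2):
        for i in range(len(B)):
            b = B[i]             # read once ...
            loads_of_B += 1
            A[j] = A[j] + b      # ... and used by two adds
            A[j + 1] = A[j + 1] + b
    return loads_of_B
```

With 4 elements in A and 3 in B, the original nest reads B 12 times and the jammed nest reads it 6 times, while both perform the same 12 additions.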

  11. Unroll and Jam (cont)

     Legality
       - When is Unroll and Jam legal?

     Disadvantages
       - What limits the degree of unrolling?

     Concepts

     Loop transformation
       - Loop fusion
       - Loop fission
       - Unroll and jam

     Kelly & Pugh Transformation Framework
       - Iteration spaces as constrained sets of integer tuples
       - Data dependences as relations between integer tuples
       - Transformations as relations/mappings between integer tuples

  12. Next Time

     Lecture
       - Automatic Parallelization

     Reading
       - Automatic Parallelization
