Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen and Nicolas Vasilache
ALCHEMY, INRIA Futurs / University of Paris-Sud XI
March 12, 2007 Fifth International
Outline: CGO’07
◮ 1 point in the space ⇔ 1 distinct legal program version
◮ suitable for various exploration methods
◮ 99% of the best speedup attained within 20 runs of a dedicated heuristic
◮ wall-clock optimal transformation discoverable on small kernels
Scheduling in the Polyhedral Model: A Motivating Example CGO’07
Transformation   Description
reversal         Changes the direction in which a loop traverses its iteration range
skewing          Makes the bounds of a given loop depend on an outer loop counter
interchange      Exchanges two loops in a perfectly nested loop, a.k.a. permutation
peeling          Extracts one iteration of a given loop
shifting         Allows loops to be reordered
fusion           Fuses two loops, a.k.a. jamming
distribution     Splits a single loop nest into many, a.k.a. fission or splitting
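A minimal sketch, not taken from the slides, of how three of the transformations described above (reversal, interchange, skewing) change the way a hypothetical two-deep loop nest is traversed. The function names and the n-by-n nest are illustrative assumptions. Each function returns the sequence of (i, j) iterations in execution order.

```python
def original(n):
    """Visit order of a plain i/j loop nest (hypothetical example)."""
    return [(i, j) for i in range(n) for j in range(n)]

def reversal(n):
    """The inner loop traverses its range in the opposite direction."""
    return [(i, j) for i in range(n) for j in reversed(range(n))]

def interchange(n):
    """The two loops are exchanged: j becomes the outer loop."""
    return [(i, j) for j in range(n) for i in range(n)]

def skewing(n):
    """The inner bounds depend on the outer counter (jj = i + j)."""
    return [(i, jj - i) for i in range(n) for jj in range(i, i + n)]

if __name__ == "__main__":
    for transform in (original, reversal, interchange, skewing):
        print(transform.__name__, transform(2))
```

Every variant executes the same set of iterations; only the order (or the shape of the loop bounds) changes. Whether a given reordering is legal depends on the data dependences of the loop body, which this sketch does not model.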
Scheduling in the Polyhedral Model: Overview CGO’07
1
◮ Efficiently construct a space of all legal, distinct affine schedules
[Table columns: #Sched., #Legal]
◮ Rely on the polyhedral model and Integer Linear Programming to
◮ The search space will encompass unique, distinct compositions of
2
◮ Perform an exhaustive scan to discover the wall-clock optimal schedule, and
◮ Build an efficient heuristic to accelerate the space traversal
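To make the gap between the number of schedules and the number of legal schedules concrete, here is a small brute-force sketch. It is not the construction used in the slides, which relies on the polyhedral model and Integer Linear Programming. The kernel (S1: x[i] = 0, followed by S2: x[i] += a[i][j] * y[j] in an inner j loop), the coefficient range {-1, 0, 1} and the fixed problem size N are all assumptions made for illustration. Checking at a fixed N only approximates legality for every problem size.

```python
from itertools import product

N = 4                 # fixed problem size used for the check (assumption)
COEFFS = (-1, 0, 1)   # bounded range for schedule coefficients (assumption)

def theta_s1(a1, c1, i):
    """One-dimensional affine schedule of S1(i)."""
    return a1 * i + c1

def theta_s2(a2, b2, c2, i, j):
    """One-dimensional affine schedule of S2(i, j)."""
    return a2 * i + b2 * j + c2

def is_legal(a1, c1, a2, b2, c2):
    """True if every dependence target runs strictly after its source."""
    for i in range(N):
        for j in range(N):
            t2 = theta_s2(a2, b2, c2, i, j)
            # S1(i) writes x[i] before S2(i, j) updates it.
            if t2 < theta_s1(a1, c1, i) + 1:
                return False
            # S2(i, j) updates x[i] before S2(i, j2) does, for j < j2.
            for j2 in range(j + 1, N):
                if theta_s2(a2, b2, c2, i, j2) < t2 + 1:
                    return False
    return True

# Each candidate is a tuple (a1, c1, a2, b2, c2).
candidates = list(product(COEFFS, repeat=5))
legal = [c for c in candidates if is_legal(*c)]
print(f"#Sched = {len(candidates)}, #Legal = {len(legal)}")
```

Even on this tiny example, only a small fraction of the candidate schedules respects the dependences, which is why building the space of legal schedules directly pays off compared with generating candidates and filtering them.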
Search Space Construction: Preliminaries CGO’07
DS1 = [matrix] . (i, j, n, 1)^T ≥ 0
[nonzero matrix entries as extracted: 1 −1 −1 1 1 −1 −1 1 −1 −1 1 2; matrix layout not recoverable]
Search Space Construction: Preliminaries CGO’07
9
Search Space Construction: Preliminaries CGO’07
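DS1δS2 : [matrix] . (iS1, iS2, jS2, 1)^T, with equality rows (= 0) and inequality rows (≥ 0); columns annotated "S1 iterations" and "S2 iterations"
[nonzero matrix entries as extracted: 1 −1 1 −1 −1 3 1 −1 −1 3 1 −1 −1 3; matrix layout not recoverable]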
Search Space Construction: Way to Go CGO’07
◮ Causality condition
◮ Farkas Lemma
Valid Farkas Multipliers
◮ Identification
◮ Projection
◮ Reduce redundancy
◮ Detect implicit equalities
Valid Transformation Coefficients
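For reference, the causality condition and the affine form of Farkas' Lemma can be written out as follows. This is the standard textbook formulation for a dependence from S1 to S2, using the D_{S1δS2} notation of the preliminaries. The symbols θ, A, b and λ are notation chosen here, not taken from the slides.

```latex
% Causality condition: the target instance executes strictly after its source.
\forall \langle x_{S1}, x_{S2} \rangle \in \mathcal{D}_{S1 \delta S2}:\quad
  \theta_{S2}(x_{S2}) - \theta_{S1}(x_{S1}) - 1 \;\ge\; 0

% Affine form of Farkas' Lemma: for a non-empty polyhedron
% \mathcal{D} = \{ x \mid A x + b \ge 0 \}, an affine function f is
% non-negative everywhere on \mathcal{D} if and only if
f(x) \;=\; \lambda_0 + \lambda^{T} (A x + b),
  \qquad \lambda_0 \ge 0,\ \lambda \ge 0
```

Applying the lemma to the left-hand side of the causality condition and equating the coefficients of each variable (identification) gives linear constraints that tie the schedule coefficients to the Farkas multipliers. Eliminating the multipliers (projection) leaves constraints on the schedule coefficients alone.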
[Table columns: Benchmark, #Sched, #Legal, Time]
Search Space Exploration: Framework for Iterative Optimization CGO’07
[Diagram labels: SCoP representation; Iterative compilation and run of base source code with transformed SCoP; Code generation; Polyhedral computing libraries; C compilable code; Polyhedral representation of SCoP; Bounded search space]
Search Space Exploration: Exhaustive Scan CGO’07
[Plots: Cycles vs. Transformation identifier; panels: matmult, locality]
[Plots: Cycles vs. Transformation identifier; panels: crout (two views with different cycle ranges)]
Search Space Exploration: Heuristic Scan CGO’07
[Plots: Maximum speedup achieved (in %) vs. Runs, Decoupling vs. Random; panels: locality, matmult, mvt]
[Plots: Cycles vs. Transformation identifier; panels: locality, matmult]
[Plot: Cycles (M); labels: matvecttransp, Original]
Conclusion: CGO’07
Questions: CGO’07
Questions: A Transformation Example CGO’07