Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time. Louis-Noel Pouchet, Cedric Bastoul, Albert Cohen and Nicolas Vasilache. Presented by: Tom St. John, Dept of Computer & Information Sciences, University of Delaware.


slide-1
SLIDE 1

Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Louis-Noel Pouchet, Cedric Bastoul, Albert Cohen and Nicolas Vasilache

Presented by: Tom St. John

Dept of Computer & Information Sciences, University of Delaware

slide-2
SLIDE 2

Introduction

  • Focus on Loop Nest Optimization for regular loops
  • Automatic method for parallelism extraction/loop transformation
  • Combine iterative methods with the polyhedral model
  • Solution is independent of the compiler and the target machine

slide-3
SLIDE 3

Contribution

Search Space Construction

  • One point in the search space maps to one distinct legal program version
  • Suitable for various exploration methods

Performance

  • 99% of best speedup attained within 20 runs of a dedicated heuristic
  • Wall-clock optimal transformation discoverable on small kernels

slide-4
SLIDE 4

A Motivating Example

slide-5
SLIDE 5

One-Dimensional Scheduling

Original Schedule

  • Specify the outer-most loop only
  • Initial outer-most loop is i
slide-6
SLIDE 6

One-Dimensional Scheduling

Distribute Loops

  • Specify the outer-most loop only
  • All instances of S1 execute before the first instance of S2
slide-7
SLIDE 7

One-Dimensional Scheduling

Distribute Loops and Loop Interchange for S2

  • Specify the outer-most loop only
  • The outer-most loop for S2 becomes j
slide-8
SLIDE 8

One-Dimensional Scheduling

Distribute Loops and Loop Interchange for S2
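
The three schedules above can be tried out on a small kernel. The sketch below assumes a matrix-vector product with two statements (S1: x[i] = 0, S2: x[i] += a[i][j] * y[j]); the kernel, the function name `run` and the concrete schedule functions are illustrative assumptions, since the figures on these slides did not survive extraction. Each statement instance receives a one-dimensional timestamp, and instances with equal timestamps keep their original relative order, which is one reading of "specify the outer-most loop only".

```python
def run(schedule_s1, schedule_s2, a, y):
    """Execute every statement instance in the order given by 1-D schedules."""
    n = len(y)
    instances = []
    for i in range(n):
        # (timestamp, original position, statement name, iterators)
        instances.append((schedule_s1(i, n), (i, 0, 0), "S1", (i,)))
        for j in range(n):
            instances.append((schedule_s2(i, j, n), (i, 1, j), "S2", (i, j)))
    # Sort by timestamp; ties fall back to the original program order.
    instances.sort(key=lambda inst: (inst[0], inst[1]))

    x = [None] * n  # None exposes any read of x[i] before S1 initialised it
    for _, _, stmt, iters in instances:
        if stmt == "S1":
            x[iters[0]] = 0
        else:
            i, j = iters
            x[i] += a[i][j] * y[j]
    return x


# Assumed schedules for the three slides (n is the problem-size parameter).
SCHEDULES = {
    "original":    (lambda i, n: i, lambda i, j, n: i),
    "distributed": (lambda i, n: i, lambda i, j, n: i + n),
    "interchange": (lambda i, n: i, lambda i, j, n: j + n),
}

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [1, 0, 2]
results = {name: run(s1, s2, a, y) for name, (s1, s2) in SCHEDULES.items()}
print(results)
```

All three entries of `results` hold the same vector, which is what a legal schedule must guarantee. A schedule that places S2 before S1, for example `lambda i, j, n: i - n`, fails with a `TypeError` in this sketch because x[i] is read before it is initialised.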

slide-9
SLIDE 9

One-Dimensional Scheduling

  • A schedule is an affine function of the iteration vector and the parameters

slide-10
SLIDE 10

One-Dimensional Scheduling

  • A schedule is an affine function of the iteration vector and the parameters
  • For -1 ≤ t ≤ 1, there are 3^7 = 2187 possible schedules
slide-11
SLIDE 11

One-Dimensional Scheduling

  • A schedule is an affine function of the iteration vector and the parameters
  • For -1 ≤ t ≤ 1, there are 3^7 = 2187 possible schedules
  • However, only 129 legal distinct schedules
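
The count of 3^7 follows from seven schedule coefficients that each take a value in {-1, 0, 1}. The sketch below enumerates that space under stated assumptions: a two-statement kernel where S1 has one iterator i and S2 has iterators i and j, one parameter n, and a single dependence from S1(i) to S2(i, j). The helper names are hypothetical, and the legality test is a sampled brute-force check that allows equal timestamps (assuming ties are resolved by the original program order). It is not the exact polyhedral test from the paper, so the number of candidates it keeps is not meant to reproduce the 129 quoted on the slide.

```python
from itertools import product

COEFFS = (-1, 0, 1)


def theta_s1(t, i, n):
    # S1 has one iterator: coefficients for i, n and the constant term
    return t[0] * i + t[1] * n + t[2]


def theta_s2(t, i, j, n):
    # S2 has two iterators: coefficients for i, j, n and the constant term
    return t[3] * i + t[4] * j + t[5] * n + t[6]


def respects_s1_to_s2(t, sizes=(1, 2, 3, 4)):
    """True if S2(i, j) is never scheduled before S1(i) for the sampled sizes."""
    return all(
        theta_s2(t, i, j, n) - theta_s1(t, i, n) >= 0
        for n in sizes
        for i in range(n)
        for j in range(n)
    )


candidates = list(product(COEFFS, repeat=7))  # 3 + 4 coefficients in total
kept = [t for t in candidates if respects_s1_to_s2(t)]
print(len(candidates), "candidates,", len(kept), "pass the sampled check")
```

Even this simplified filter discards a large share of the candidates, which illustrates why building the space of legal schedules directly is preferable to generating candidates and testing them one by one.
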
slide-12
SLIDE 12

Overview

slide-13
SLIDE 13

Search Space Construction

  • Efficiently construct a space of all legal, distinct affine schedules

slide-14
SLIDE 14

Search Space Construction

  • Efficiently construct a space of all legal, distinct affine schedules

slide-15
SLIDE 15

Search Space Construction

  • Efficiently construct a space of all legal, distinct affine schedules
  • Rely on the polyhedral model and integer linear programming to guarantee completeness and correctness of the space properties
  • Search space will encompass unique, distinct compositions of reversal, skewing, interchange, fusion, peeling, shifting and distribution

slide-16
SLIDE 16

Search Space Exploration

  • Perform an exhaustive scan to discover the wall-clock optimal schedule and to provide evidence of the intricacy of the best transformation

  • Build an efficient heuristic to accelerate the space traversal
slide-17
SLIDE 17

Search Space Construction

slide-18
SLIDE 18

Polyhedral Representation

Static Control Parts (SCoP)

  • Loops have affine control only
slide-19
SLIDE 19

Polyhedral Representation

Static Control Parts (SCoP)

  • Loops have affine control only
  • Iteration domain – represented as integer polyhedra
slide-20
SLIDE 20

Polyhedral Representation

Static Control Parts (SCoP)

  • Loops have affine control only
  • Iteration domain – represented as integer polyhedra
  • Memory accesses – static references, represented as affine functions of the iteration vector and the parameters

slide-21
SLIDE 21

Polyhedral Representation

Static Control Parts (SCoP)

  • Loops have affine control only
  • Iteration domain – represented as integer polyhedra
  • Memory accesses – static references, represented as affine functions of the iteration vector and the parameters
  • Data dependence between S1 and S2 – a subset of the Cartesian product of their iteration domains

slide-22
SLIDE 22

Polyhedral Representation

Static Control Parts (SCoP)

  • Loops have affine control only
  • Iteration domain – represented as integer polyhedra
  • Memory accesses – static references, represented as affine functions of the iteration vector and the parameters
  • Data dependence between S1 and S2 – a subset of the Cartesian product of their iteration domains
  • Reduced dependence graph labelled by dependence polyhedra
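
The representation listed above can be made concrete with a small sketch. It assumes a hypothetical two-statement kernel (S1 with iterator i writing x[i], S2 with iterators i and j reading x[i], both bounded by a parameter n); the constant names and helper functions are illustrative and are not the data structures of any particular polyhedral library. Iteration domains are stored as rows of affine inequalities, accesses as affine rows, and the dependence is computed by brute force as a subset of the Cartesian product of the two domains.

```python
from itertools import product

# Each row (c..., c_n, c_1) encodes the inequality c . (iterators, n, 1) >= 0.
DOMAIN_S1 = [(1, 0, 0), (-1, 1, -1)]            # 0 <= i <= n - 1
DOMAIN_S2 = [(1, 0, 0, 0), (-1, 0, 1, -1),      # 0 <= i <= n - 1
             (0, 1, 0, 0), (0, -1, 1, -1)]      # 0 <= j <= n - 1

# Affine access functions over (iterators, n, 1): both statements touch x[i].
WRITE_S1 = (1, 0, 0)
READ_S2 = (1, 0, 0, 0)


def affine(row, iters, n):
    """Evaluate one affine row on an iteration vector and the parameter n."""
    return sum(c * v for c, v in zip(row, iters + (n, 1)))


def points(domain, depth, n):
    """Integer points of a domain, found by scanning a slightly larger box."""
    box = range(-1, n + 1)
    return [p for p in product(box, repeat=depth)
            if all(affine(row, p, n) >= 0 for row in domain)]


def dependence_pairs(n):
    """Pairs (S1 instance, S2 instance) that access the same cell of x."""
    return [(s, t)
            for s in points(DOMAIN_S1, 1, n)
            for t in points(DOMAIN_S2, 2, n)
            if affine(WRITE_S1, s, n) == affine(READ_S2, t, n)]


pairs = dependence_pairs(3)
print(len(points(DOMAIN_S2, 2, 3)), "instances of S2,", len(pairs), "dependent pairs")
```

A real dependence analysis also requires the source instance to execute before the target in the original program; in this assumed kernel that holds for every pair found, because S1(i) precedes S2(i, j). It also keeps the dependence in symbolic form as a polyhedron rather than enumerating it for one value of n.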

slide-23
SLIDE 23

Space Construction

slide-24
SLIDE 24

Space Construction

slide-25
SLIDE 25

Search Space Construction

slide-26
SLIDE 26

Search Space Construction

slide-27
SLIDE 27

Search Space Construction

slide-28
SLIDE 28

Search Space Construction

  • Solve the constraint system
  • Use optimized Fourier-Motzkin projection algorithm
  • Reduces redundancy
  • Detects implicit equalities
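
For readers unfamiliar with the projection step, here is a minimal sketch of one textbook Fourier-Motzkin elimination. It is the plain algorithm, not the optimized variant the slide refers to: redundancy removal and detection of implicit equalities are left out, and the function name `eliminate` and the example system are assumptions made for illustration.

```python
from fractions import Fraction


def eliminate(constraints, k):
    """Project out variable k from constraints of the form sum(c[v] * x[v]) <= b."""
    pos, neg, rest = [], [], []
    for coeffs, bound in constraints:
        if coeffs[k] > 0:
            pos.append((coeffs, bound))    # upper bound on x[k]
        elif coeffs[k] < 0:
            neg.append((coeffs, bound))    # lower bound on x[k]
        else:
            rest.append((coeffs, bound))   # does not involve x[k]

    projected = list(rest)
    for pc, pb in pos:
        for nc, nb in neg:
            sp = 1 / Fraction(pc[k])       # scale so x[k] has coefficient +1
            sn = 1 / Fraction(-nc[k])      # scale so x[k] has coefficient -1
            coeffs = tuple(sp * p + sn * q for p, q in zip(pc, nc))
            projected.append((coeffs, sp * pb + sn * nb))
    return projected


# x >= 0, y >= 0, x + y <= 4, x - y <= 1, written as "<=" rows over (x, y)
system = [((-1, 0), 0), ((0, -1), 0), ((1, 1), 4), ((1, -1), 1)]
shadow = eliminate(system, 1)              # project out y

upper = min(Fraction(b) / c[0] for c, b in shadow if c[0] > 0)
lower = max(Fraction(b) / c[0] for c, b in shadow if c[0] < 0)
print("projection onto x:", lower, "<= x <=", upper)
```

Every pair of one upper and one lower bound produces a new constraint, so the number of rows can grow quadratically per eliminated variable. That growth is the reason redundancy reduction matters in practice.
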
slide-29
SLIDE 29

Search Space Construction

slide-30
SLIDE 30

Search Space Construction

  • One point in the search space corresponds to one set of legal schedules w.r.t. the dependence

slide-31
SLIDE 31

Search Space Construction

Algorithm

  • Add constraints obtained for each dependence
  • Bound the search space
  • Search space – represented by a set of linear constraints on the schedule coefficients (Z-polytope)
  • Every integral point in the search space corresponds to a distinct program version where the semantics are preserved

slide-32
SLIDE 32

Search Space Exploration

slide-33
SLIDE 33

Workflow

  • CLooG – http://cloog.org
  • PipLib – http://piplib.org
  • PolyLib – http://icps.u-strasbg.fr/polylib/
slide-34
SLIDE 34

Exhaustive Scan

Performance Distribution (1)

slide-35
SLIDE 35

Exhaustive Scan

Performance Distribution (2)

slide-36
SLIDE 36

Exhaustive Scan

Performance Comparison

slide-37
SLIDE 37

Heuristic Scan

Propose a decoupling heuristic:

  • The general form of the schedule is embedded in the iterator coefficients
  • Decouple the schedule
  • Parameter and constant coefficients are less critical, but can be used to refine the search

slide-38
SLIDE 38

Heuristic Scan

Addressing scalability to larger SCoPs

  • Impose a static or dynamic maximum on the number of runs (limit exploration to domain)
  • Replace the exhaustive enumeration of combinations with a limited set of random trials in the domain
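
The decoupling idea and the run limit can be sketched as a two-phase search. This is a simplified illustration under stated assumptions, not the heuristic as implemented by the authors: `decoupled_search`, its arguments, the half-and-half budget split and the toy cost function are all hypothetical, and in a real setting `cost` would generate, compile and time the program version for a schedule. The list of legal points is assumed to be non-empty and the cost finite.

```python
import random
from itertools import product


def decoupled_search(legal_points, iter_idx, cost, max_runs, seed=0):
    """Explore iterator coefficients first, then refine the other coefficients."""
    rng = random.Random(seed)

    # Group schedules that share the same iterator coefficients.
    groups = {}
    for point in legal_points:
        key = tuple(point[k] for k in iter_idx)
        groups.setdefault(key, []).append(point)

    # Phase 1: one run per iterator-coefficient combination. If there are
    # too many combinations, fall back to a limited set of random trials.
    keys = list(groups)
    phase1_budget = max(1, max_runs // 2)
    if len(keys) > phase1_budget:
        keys = rng.sample(keys, phase1_budget)

    best_key, best_point, best_cost, runs = None, None, float("inf"), 0
    for key in keys:
        point = groups[key][0]
        value = cost(point)
        runs += 1
        if value < best_cost:
            best_key, best_point, best_cost = key, point, value

    # Phase 2: refine parameter and constant coefficients in the best group.
    for point in groups[best_key][1:]:
        if runs >= max_runs:
            break
        value = cost(point)
        runs += 1
        if value < best_cost:
            best_point, best_cost = point, value
    return best_point, best_cost, runs


# Toy stand-in for a measured execution time; coefficient 0 plays the iterator role.
toy_points = list(product((-1, 0, 1), repeat=3))
toy_cost = lambda p: (p[0] - 1) ** 2 + p[1] ** 2 + p[2] ** 2
best, value, runs = decoupled_search(toy_points, iter_idx=(0,), cost=toy_cost, max_runs=20)
print(best, value, runs)
```

In this toy run the search evaluates 11 of the 27 points and still reaches the best one, because the iterator coefficient alone is enough to pick the right group in the first phase.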

slide-39
SLIDE 39

Heuristic Scan

Results

slide-40
SLIDE 40

Conclusions

  • Implemented an optimization and transformation framework on top of the compiler

  • Achieved promising speedup and fast heuristic convergence
  • Optimal transformation can be discovered for small kernels
slide-41
SLIDE 41

Ongoing/Future Work

  • Combine with state-of-the-art feedback-directed iterative methods

  • Part II – Multidimensional Schedules (PLDI 08)
  • Integrate into GCC GRAPHITE branch