More Definite Results From the Pluto Scheduling Algorithm
By
Athanasios Konstantinidis
Supervisor
Paul H. J. Kelly
About Me
PhD student at Imperial College London supervised by Paul H. J. Kelly.
Compiler and Language support for
GPGPUs, Cell BE, Multicore etc.).
[Figure: framework overview. Labels: ROSE AST, Control Graph Extraction, Main graph, Polyhedral Model extraction, poly graph, Polyhedral framework, Dependence Analysis, Constraints, PLuTo Scheduling, Affine Transformations, Polyhedral scanning algorithm (CLooG), CLooG IR, CLooG graph extraction, CLooG graph, CUDA graph extraction, CUDA graph, Graph to AST, ROSE AST]
Does not require file I/O for syntactic post-processing
The layout of the constraints can affect the scheduling solutions
the original iteration space.
iteration space.
and minimum communication between hyperplane instances (i.e. between different loop iterations).
[Figure: diagrams with axes labelled space and time]
MAX + scalar dimensions
lexicographic minimum solution.
Components (SCC) – loop distribution – and remove the killed dependences
[Figure: layout of the constraint system. Labels: Affine Form, Structure Parameters, Farkas Lemma Parameters, Unknown schedule coefficients, Constant, Identification, h-transformation, Cost]
depend on the ordering of the transformation coefficients.
Ordering of Transformation Coefficients
Cost = 1
i j
for i = 0, N
  for j = 0, N
    A[i][j] = A[i-1][j] * A[i-1][j-1]
0 1
solutions both having Cost = 1.
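The tie between the two cost-1 hyperplanes can be checked with a small brute-force sketch. This is not the PLuTo implementation: it assumes the example loop nest has the two constant dependence distances (1, 0) and (1, 1), takes the cost of a hyperplane to be the largest distance it induces, and restricts the search to small non-negative coefficients. The names `cost`, `candidates` and `lexmin` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Assumed distance vectors for the example loop nest:
# A[i][j] reads A[i-1][j] -> (1, 0) and A[i-1][j-1] -> (1, 1).
DEPENDENCES = [(1, 0), (1, 1)]


def cost(h):
    """Largest dependence distance along hyperplane h, or None if h is illegal."""
    deltas = [h[0] * d[0] + h[1] * d[1] for d in DEPENDENCES]
    if min(deltas) < 0:
        return None  # a negative distance would violate a dependence
    return max(deltas)


def candidates(bound=2):
    """All legal non-zero hyperplanes with coefficients in [0, bound]."""
    out = []
    for h in product(range(bound + 1), repeat=2):
        if h != (0, 0) and cost(h) is not None:
            out.append(h)
    return out


def lexmin(order):
    """Lexicographic minimum of (cost, coefficients listed in `order`)."""
    keyed = [((cost(h),) + tuple(h[k] for k in order), h) for h in candidates()]
    return min(keyed)[1]


first = lexmin(order=(0, 1))   # unknowns laid out as (cost, c_i, c_j)
second = lexmin(order=(1, 0))  # unknowns laid out as (cost, c_j, c_i)
print(first, second)
```

Under these assumptions both (c_i, c_j) = (0, 1) and (1, 0) have cost 1, and the lexicographic minimum picks a different one depending only on the order in which the coefficients are laid out.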
Order 1 : Cost = 1 0 1 0
Order 2 : Cost = 1 0 1 0
Order 1 : Cost = 1 1 0 0
Order 2 : Cost = 1 1 0 0
[Figures: iteration-space diagrams over axes i and j]
[Figure: schedules over axes i and j. Panel labels: Fully Parallel Inner Loop, Pipeline/Wavefront, Wavefront/pipeline, Non-parallel loops]
[Figure: wavefront schedule over axes i and j, annotated with Start-up Cost and Drain Cost]
Better spatial/temporal Locality along a wavefront
Depend on structure parameters
Number of Read-after-Read dependences that lie within the wavefront
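To make the dependence on structure parameters concrete, the sketch below counts the iterations on each wavefront of a rectangular iteration space. It assumes the wavefront hyperplane is i + j = t and measures start-up and drain as the number of wavefronts narrower than the widest one, before and after the peak. Both the hyperplane and that measure are illustrative assumptions, not definitions taken from the slides, and the function names are hypothetical.

```python
def wavefront_sizes(n, m):
    """Number of iterations on each wavefront i + j = t of an n x m iteration space."""
    sizes = [0] * (n + m - 1)
    for i in range(n):
        for j in range(m):
            sizes[i + j] += 1
    return sizes


def startup_and_drain(n, m):
    """Count the under-filled wavefronts before and after the widest ones."""
    sizes = wavefront_sizes(n, m)
    peak = max(sizes)
    startup = sizes.index(peak)      # wavefronts before the first full one
    drain = sizes[::-1].index(peak)  # wavefronts after the last full one
    return startup, drain


print(wavefront_sizes(4, 6))
print(startup_and_drain(4, 6))
```

In this sketch the start-up and drain counts change with the extents n and m, which is one way to read the remark that these costs depend on the structure parameters.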
degrees of parallelism.
Bit vector, one entry per dimension i, with two cases: e extends along i / e does not extend along i
Boolean, with two cases: e extends in only 1 dimension / e extends in more than 1 dimension
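The two case definitions can be sketched for a dependence given as a constant distance vector. The encoding is an assumption made for illustration: a bit is 1 when the dependence e has a non-zero component along that dimension, and the Boolean is True when exactly one bit is set. `direction_bits` and `single_dimension` are hypothetical names.

```python
def direction_bits(e):
    """Bit i is 1 when the distance vector e has a non-zero component along dimension i."""
    return [1 if component != 0 else 0 for component in e]


def single_dimension(e):
    """True when e extends in exactly one dimension (assumed encoding)."""
    return sum(direction_bits(e)) == 1


e1 = (1, 0)  # e.g. the A[i-1][j] dependence of the earlier loop nest
e2 = (1, 1)  # e.g. the A[i-1][j-1] dependence
print(direction_bits(e1), single_dimension(e1))
print(direction_bits(e2), single_dimension(e2))
```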
[Figure: dependence edges e1 and e2 drawn over axes i and j]
Fully parallel dimension
we are effectively pushing them towards inner nest levels.
minimize communication.
parallelism.
degrees of parallelism.
loops.