slide-1
SLIDE 1

A Comparative Study of Modulo Scheduling Techniques

Josep M. Codina, Josep Llosa and Antonio González

  • Dept. of Computer Architecture

Universitat Politècnica de Catalunya Barcelona, SPAIN E-mail: {jmcodina,josepll,antonio}@ac.upc.es

UPC

UNIVERSITAT POLITÈCNICA DE CATALUNYA

slide-2
SLIDE 2

Software Pipelining

UPC

INTRODUCTION

Instruction Scheduling for VLIW/Superscalar Processors

  • VLIW processors in the DSP market
  • EPIC/IPF

Loop Scheduling: Software Pipelining

  • Loops consume most of the application's execution time
  • Software pipelining a loop is an NP-complete problem
  • Software pipelining is a big family of techniques

Modulo Scheduling is based on heuristics

slide-3
SLIDE 3

Motivation

UPC

INTRODUCTION

Modulo Scheduling is a framework in which techniques are defined

  • Different factors to take into account
  • Many techniques fit in the framework, each built on different ideas

Proposals in the literature are evaluated without a common

  • Platform (i.e. compiler)
  • Benchmarks
  • Target architectures
  • Measures

Lack of a thorough comparison

slide-4
SLIDE 4

Objectives

UPC

INTRODUCTION

Perform a comparison of state-of-the-art MS techniques

  • Qualitative
  • Quantitative

The work is targeted at compiler writers

  • Is one of the techniques better than the others for all architectures?
  • Which is the most powerful technique for a given architecture?

slide-5
SLIDE 5

Talk Outline

UPC

  • Modulo Scheduling Background
  • Selection Criteria
  • Techniques Compared
  • Study Environment
  • Results
  • Conclusions

slide-7
SLIDE 7

Basic Ideas

UPC

MODULO SCHEDULING

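[Figure: overlapped execution of Iterations 1 to 4. Each iteration is divided into stages (Stage 1, Stage 2, Stage 3) and a new iteration starts every Initiation Interval (II). The resulting code has a Prolog, a Kernel and an Epilog.]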
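
The overlap pictured on this slide can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names and the example numbers (4 iterations, an iteration length of 6 cycles, II = 2) are assumptions, not taken from the talk. It also assumes every iteration is padded to a whole number of II-cycle stages.

```python
import math


def stage_count(iteration_length: int, ii: int) -> int:
    """Number of II-cycle stages that one iteration is split into."""
    return math.ceil(iteration_length / ii)


def total_cycles(n_iterations: int, iteration_length: int, ii: int) -> int:
    """Cycles for the whole loop (prolog + kernel + epilog), stages padded to II."""
    return (n_iterations + stage_count(iteration_length, ii) - 1) * ii


def active_stages(slot: int, n_iterations: int, stages: int) -> list[tuple[int, int]]:
    """(iteration, stage) pairs running in the slot-th II-cycle window.

    Slots are 0-based; iterations and stages are 1-based.
    Iteration i starts in slot i - 1.
    """
    pairs = []
    for iteration in range(1, n_iterations + 1):
        stage = slot - (iteration - 1) + 1
        if 1 <= stage <= stages:
            pairs.append((iteration, stage))
    return pairs


# Hypothetical example: 4 iterations, 6-cycle iteration, II = 2.
stages = stage_count(6, 2)
for slot in range(4 + stages - 1):
    print(slot, active_stages(slot, 4, stages))
print("total cycles:", total_cycles(4, 6, 2))
```

With these numbers each iteration has 3 stages, the third II-cycle window is the first one in which all three stages are active (the kernel), and the loop takes 12 cycles instead of the 24 that a non-overlapped execution would need.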

slide-8
SLIDE 8

Basic Scheme

UPC

MODULO SCHEDULING

  • Find the MII and set II = MII
  • Look for a schedule
  • Found it? Yes: done. No: increase the II and look for a schedule again
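
The flowchart on this slide amounts to a small driver loop. The following is a minimal sketch under stated assumptions: `try_schedule` stands in for whichever technique looks for a schedule at a given II, and `max_ii` is a hypothetical safety bound; neither name comes from the talk.

```python
from typing import Callable, Optional

Schedule = dict[str, int]  # operation name -> issue cycle


def modulo_schedule(
    mii: int,
    try_schedule: Callable[[int], Optional[Schedule]],
    max_ii: int,
) -> tuple[int, Schedule]:
    """Start at II = MII and increase the II until a schedule is found."""
    ii = mii
    while ii <= max_ii:
        schedule = try_schedule(ii)   # "Look for a schedule"
        if schedule is not None:      # "Found it?" -> yes
            return ii, schedule
        ii += 1                       # "Found it?" -> no: increase the II
    raise RuntimeError(f"no schedule found up to II = {max_ii}")


def toy_scheduler(ii: int) -> Optional[Schedule]:
    """Hypothetical stand-in that only succeeds once II reaches 3."""
    return {"a": 0, "b": 1} if ii >= 3 else None


print(modulo_schedule(2, toy_scheduler, max_ii=10))
```

The techniques compared later in the talk differ only in what happens inside `try_schedule`; the outer loop is common to all of them.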

slide-9
SLIDE 9

Basic Scheme

UPC

MODULO SCHEDULING

MII depends on

  • Resources
  • Recurrences
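
The two bounds behind the MII can be written down directly. This is a sketch of the usual textbook formulation, not code from the talk: it assumes each operation occupies one unit of one resource for one cycle, and that the recurrences are given explicitly as (total latency, total iteration distance) pairs. All names and numbers are illustrative.

```python
import math


def res_mii(uses: dict[str, int], units: dict[str, int]) -> int:
    """Resource bound: the busiest resource needs ceil(uses / units) cycles."""
    return max((math.ceil(uses[r] / units[r]) for r in uses), default=1)


def rec_mii(recurrences: list[tuple[int, int]]) -> int:
    """Recurrence bound: each dependence cycle needs ceil(latency / distance)."""
    return max((math.ceil(lat / dist) for lat, dist in recurrences), default=1)


def mii(uses: dict[str, int], units: dict[str, int],
        recurrences: list[tuple[int, int]]) -> int:
    """The MII is the larger of the two bounds."""
    return max(res_mii(uses, units), rec_mii(recurrences))


# Hypothetical loop: 3 memory ops on 2 ports, 4 FP ops on 2 FP units,
# and two recurrences (latency 6 over distance 1, latency 8 over distance 3).
uses = {"mem": 3, "fp": 4}
units = {"mem": 2, "fp": 2}
recurrences = [(6, 1), (8, 3)]
print(res_mii(uses, units), rec_mii(recurrences), mii(uses, units, recurrences))
```

In this example the resources alone would allow II = 2, but the first recurrence forces II = 6, so the loop is recurrence bound.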
slide-10
SLIDE 10

Basic Scheme

UPC

MODULO SCHEDULING

Look for a schedule

  • Ordering the nodes
  • Finding a feasible cycle
      • Top-Down/Bottom-up
      • Bi-directional
  • When no feasible cycle
      • Use of backtracking
      • Increase the II
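
Finding a feasible cycle is commonly done against a modulo reservation table, in which a resource used at cycle c is busy in row c mod II of every iteration. The sketch below is a simplified illustration of that idea; the class, the function and the one-resource example are assumptions, not part of the talk. `direction=1` scans upwards from the earliest start (top-down) and `direction=-1` scans downwards from the latest start (bottom-up).

```python
from typing import Optional


class ModuloReservationTable:
    """Tracks how many units of each resource are taken in each of the II rows."""

    def __init__(self, ii: int, units: dict[str, int]) -> None:
        self.ii = ii
        self.units = units
        self.used = {res: [0] * ii for res in units}

    def is_free(self, resource: str, cycle: int) -> bool:
        return self.used[resource][cycle % self.ii] < self.units[resource]

    def reserve(self, resource: str, cycle: int) -> None:
        if not self.is_free(resource, cycle):
            raise ValueError(f"{resource} is busy in row {cycle % self.ii}")
        self.used[resource][cycle % self.ii] += 1


def find_cycle(mrt: ModuloReservationTable, resource: str,
               start: int, direction: int = 1) -> Optional[int]:
    """Try II consecutive cycles from start; more would only revisit the same rows."""
    for step in range(mrt.ii):
        cycle = start + direction * step
        if mrt.is_free(resource, cycle):
            return cycle
    return None  # no feasible cycle: backtrack or increase the II


# Hypothetical example: II = 2 and a single memory port.
mrt = ModuloReservationTable(2, {"mem": 1})
mrt.reserve("mem", 0)
print(find_cycle(mrt, "mem", 4))                # top-down from cycle 4
print(find_cycle(mrt, "mem", 4, direction=-1))  # bottom-up from cycle 4
```

A bi-directional technique would choose between the two scan directions per operation; when `find_cycle` returns `None`, the scheduler has to fall back on backtracking or a larger II.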
slide-11
SLIDE 11

Basic Scheme

UPC

MODULO SCHEDULING

Can we meet the constraints?

  • Resources
  • Dependences
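
Both kinds of constraint can be checked mechanically for a candidate schedule. The sketch below is illustrative and uses assumed names and a made-up three-operation loop. A dependence from u to v with a given latency and iteration distance is taken to require t(v) + II × distance ≥ t(u) + latency, and two operations conflict on a resource when they fall in the same row modulo II.

```python
from collections import Counter
from typing import NamedTuple


class Dep(NamedTuple):
    src: str
    dst: str
    latency: int
    distance: int  # number of iterations between producer and consumer


def violated_dependences(schedule: dict[str, int], deps: list[Dep], ii: int) -> list[Dep]:
    """Dependences for which the consumer is issued too early."""
    return [d for d in deps
            if schedule[d.dst] + ii * d.distance < schedule[d.src] + d.latency]


def resources_ok(schedule: dict[str, int], resource_of: dict[str, str],
                 units: dict[str, int], ii: int) -> bool:
    """True if no resource is oversubscribed in any row modulo II."""
    usage = Counter((resource_of[op], cycle % ii) for op, cycle in schedule.items())
    return all(count <= units[res] for (res, _), count in usage.items())


# Hypothetical loop: load -> mul -> add, with add feeding mul of the next iteration.
schedule = {"load": 0, "mul": 2, "add": 5}
deps = [Dep("load", "mul", 2, 0), Dep("mul", "add", 3, 0), Dep("add", "mul", 1, 1)]
resource_of = {"load": "mem", "mul": "fp", "add": "fp"}
units = {"mem": 1, "fp": 1}

for ii in (3, 4):
    print(ii, violated_dependences(schedule, deps, ii),
          resources_ok(schedule, resource_of, units, ii))
```

For this example the schedule fails both checks at II = 3 (the loop-carried dependence is violated and the two FP operations share a row) and passes both at II = 4.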
slide-12
SLIDE 12

Basic Scheme

UPC

MODULO SCHEDULING

  • The larger the II, the more likely to find a schedule
  • The larger the II, the lower the performance
  • II lower than the length of a single iteration
slide-13
SLIDE 13

Backtracking

UPC

MODULO SCHEDULING

Not always beneficial

  • Can produce better schedules
  • Can just lengthen the process of finding a schedule

In some cases, there is no feasible schedule for a given II

Backtracking must be limited

BudgetRatio:

  • Ratio of the maximum number of operation scheduling steps attempted, before increasing the II, to the number of operations in the loop
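
The effect of the BudgetRatio can be sketched as a step counter around the scheduling loop. This is an illustration under assumptions, not the implementation used in the study: `place_next` is a hypothetical callback that schedules one operation (possibly unscheduling others) and returns how many operations remain unscheduled, and rounding the budget up is a choice made here for non-integer ratios such as 2.5.

```python
import math
from typing import Callable


def scheduling_budget(budget_ratio: float, num_ops: int) -> int:
    """Maximum number of operation scheduling steps allowed at one II."""
    return math.ceil(budget_ratio * num_ops)


def schedule_with_budget(num_ops: int, budget_ratio: float,
                         place_next: Callable[[], int]) -> bool:
    """Return True if all operations get scheduled before the budget runs out."""
    budget = scheduling_budget(budget_ratio, num_ops)
    remaining = num_ops
    while remaining > 0:
        if budget == 0:
            return False          # give up at this II; the caller increases it
        remaining = place_next()  # one scheduling step, may evict other operations
        budget -= 1
    return True


# Hypothetical run on a 4-operation loop in which one step evicts an operation,
# so 6 scheduling steps are needed instead of 4.
steps = [3, 2, 3, 2, 1, 0]
print(schedule_with_budget(4, 1, iter(steps).__next__))    # budget of 4 steps
print(schedule_with_budget(4, 2.5, iter(steps).__next__))  # budget of 10 steps
```

A ratio of 1 leaves no room for backtracking at all, which is why larger ratios can find schedules at a lower II at the price of more scheduling steps.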

slide-15
SLIDE 15

UPC

SELECTION CRITERIA

Value of the code generated

  • Parallelism
  • Register pressure
  • Code size
  • Execution time

Effectiveness/Cost of the technique

slide-16
SLIDE 16

UPC

SELECTION CRITERIA

Parallelism

  • Effectiveness in exploiting ILP
  • What is the difference between II and MII?

slide-17
SLIDE 17

UPC

SELECTION CRITERIA

Register pressure

  • Software pipelining places high demands on registers
  • How many registers are needed?
  • How many loops can be scheduled within a given number of registers?
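
One common way to answer "how many registers are needed?" is MaxLive, the largest number of values simultaneously live in any cycle of the kernel. The sketch below is a simplified illustration with assumed names and made-up lifetimes: each value is live from its definition cycle up to, but not including, its last-use cycle, and lifetimes are folded modulo II because copies from successive iterations overlap.

```python
def max_live(lifetimes: list[tuple[int, int]], ii: int) -> int:
    """Largest number of live values in any of the II rows of the kernel.

    lifetimes holds (definition cycle, last-use cycle) pairs taken from the
    schedule of a single iteration; a value is live over [start, end).
    """
    live = [0] * ii
    for start, end in lifetimes:
        for cycle in range(start, end):
            live[cycle % ii] += 1  # a lifetime longer than II counts more than once
    return max(live)


def average_live(lifetimes: list[tuple[int, int]], ii: int) -> float:
    """Average number of live values per kernel cycle."""
    return sum(end - start for start, end in lifetimes) / ii


# Hypothetical kernel with II = 2 and three values.
lifetimes = [(0, 3), (1, 2), (2, 5)]
print(max_live(lifetimes, 2), average_live(lifetimes, 2))
```

Here the three values need 4 registers at the busiest point of the kernel, although only 3.5 are live on average.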

slide-18
SLIDE 18

UPC

SELECTION CRITERIA

Code size

  • Crucial in embedded domains
  • Stages of a schedule
slide-19
SLIDE 19

UPC

SELECTION CRITERIA

Execution time

  • Main objective
slide-20
SLIDE 20

UPC

SELECTION CRITERIA

Effectiveness/Cost of the technique

  • Can all the loops be scheduled?
  • Compilation time
slide-22
SLIDE 22

Techniques

UPC

TECHNIQUES COMPARED

Modulo Scheduling Techniques

  • Iterative Modulo Scheduling (IMS)
  • Swing Modulo Scheduling (SMS)
  • Slack Modulo Scheduling (Slack MS)
  • Integrated Register-sensitive Iterative Software Pipelining method (IRIS)

Complementary techniques

  • Stage Modulo Scheduling (Stage MS)

Stage MS

  • Post-pass that can be applied after an MS technique
  • To reduce the register pressure
  • Without increasing the II
  • Moves operations by multiples of II
  • Various heuristics. We selected the 3UP+RSS heuristic
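
Why moving an operation by a multiple of II is safe for resources can be seen in a small sketch. It only illustrates the underlying idea and is not the 3UP+RSS heuristic: the function names and the numbers are assumptions, and it considers a single producer with a single consumer.

```python
def latest_stage_shift(producer_cycle: int, consumer_cycle: int,
                       latency: int, ii: int) -> int:
    """Largest k >= 0 with producer_cycle + k * ii + latency <= consumer_cycle."""
    slack = consumer_cycle - latency - producer_cycle
    return max(slack // ii, 0)


def shift_by_stages(cycle: int, ii: int, k: int) -> int:
    """Move an operation k stages later; its row modulo II does not change."""
    return cycle + k * ii


# Hypothetical example: II = 2, producer at cycle 0 with latency 1, consumer at cycle 5.
ii = 2
k = latest_stage_shift(0, 5, 1, ii)
new_cycle = shift_by_stages(0, ii, k)
print(k, new_cycle, new_cycle % ii == 0 % ii)
print("lifetime before:", 5 - 0, "after:", 5 - new_cycle)
```

The producer moves from cycle 0 to cycle 4, stays in the same row of the kernel, still satisfies the dependence, and its value is live for 1 cycle instead of 5, so the II is untouched while the register pressure drops.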

slide-23
SLIDE 23

Main Differences

UPC

TECHNIQUES COMPARED

Order of nodes

  • IMS: Top-Down
  • SMS: priority to recurrences; no pred. and succ. scheduled in partial schedule
  • Slack MS: dynamic; based on slack
  • IRIS: Top-Down

Finding a cycle

  • IMS: Top-Down
  • SMS: bi-directional; close to pred. or succ.
  • Slack MS: bi-directional; close to pred. or succ. depending on the benefit
  • IRIS: Stage MS heuristics

Backtracking

  • IMS: Yes
  • SMS: No
  • Slack MS: Yes
  • IRIS: Yes

slide-24
SLIDE 24

Qualitative Comparison

UPC

TECHNIQUES COMPARED

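[Table: for each criterion, the feature of each technique (IMS, SMS, Slack MS, IRIS) that bears on it.]

Parallelism

  • IMS: Backtracking
  • SMS: Order
  • Slack MS: Order, Backtracking
  • IRIS: Backtracking

Register Pressure

  • IMS: No
  • SMS: Order, Bi-directional
  • Slack MS: Bi-directional
  • IRIS: Stage Heuristics

Code Size

  • IMS, SMS, Slack MS, IRIS: Yes

Effectiveness and Cost

  • [Entries are "Backtracking" or "No"; the per-technique cells are not recoverable from the slide text.]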

slide-26
SLIDE 26

Environment

UPC

STUDY ENVIRONMENT

Platform (i.e. compiler)

  • ICTINEO

Benchmarks

  • SPECfp95
  • Perfect Club
  • 1936 loops

Target architectures

  • Some architectures varying the complexity
  • Low Complexity architecture (less constrained)
  • Medium Complexity architecture
  • Complex architecture (more constrained)

slide-27
SLIDE 27

Architectures Description

UPC

STUDY ENVIRONMENT

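Architectures: Low Complexity, Medium Complexity and Complex Architecture

[Table: configuration of the three architectures. Parameters listed on the slide; the per-architecture assignment is not recoverable from the slide text.]

  • Issue width: 4-issue, 8-issue
  • 2 Int FU and 2 FP FU
  • 2 memory ports
  • Register ports: 4 write ports and 8 read ports, or unlimited register ports
  • Pipelining: fully pipelined ops, or fully pipelined simple ops with non-pipelined complex ops

Latencies (Low/Medium, Complex)

  • MEM: 2, 3
  • INT ADD, SUB, COMP: 1, 3
  • INT MUL: 2, 4
  • INT DIV, MOD, SQRT: 6, 8
  • FP ADD, SUB, COMP: 3, 5
  • FP MUL: 6, 8
  • FP DIV, MOD, SQRT: 18, 20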

slide-28
SLIDE 28

Methodology

UPC

STUDY ENVIRONMENT

Study of the BudgetRatio for each architecture: 1, 2.5, 5 and 10

  • Effectiveness
  • Performance
  • Cost

Measures for each technique with and without Stage MS

  • Effectiveness and cost
  • Parallelism
  • Register pressure
  • Code size
  • Execution time

slide-30
SLIDE 30

BudgetRatio Study

UPC

RESULTS

Low Complexity Architecture

[Figure: three plots against BudgetRatio (1, 2.5, 5, 10) for IMS, IRIS and Slack. Performance: Sum II / Sum MII. Cost: total time. Effectiveness: % of non-scheduled ops.]

[Table: BudgetRatio per architecture (Low Complexity, Medium Complexity, Complex Architecture); values on the slide: 5, 2.5, 10.]

slide-31
SLIDE 31

II vs MII

UPC

RESULTS

[Figure: Average (II/MII) of IMS, SMS, IRIS and Slack on the Low and Medium architectures.]

slide-32
SLIDE 32

Register Pressure

UPC

RESULTS

[Figure: MaxLive/MinAvg of IMS, IMS+ST, SMS, SMS+ST, IRIS, IRIS+ST, Slack and Slack+ST on the Low and Medium architectures.]

slide-33
SLIDE 33

Execution Time

UPC

RESULTS

[Figure: execution time in millions of cycles of IMS, IMS+ST, SMS, SMS+ST, IRIS, IRIS+ST, Slack and Slack+ST. Two panels: Low Complexity Architecture and Medium Complexity Architecture.]

slide-34
SLIDE 34

Complex Architecture

UPC

RESULTS

[Figure: three plots for the Complex Architecture. II vs MII: Average (II/MII) of IMS, SMS, IRIS and Slack. MaxLive/MinAvg of IMS, IMS+ST, SMS, SMS+ST, IRIS, IRIS+ST, Slack and Slack+ST. Execution time in millions of cycles of the same eight configurations.]

slide-36
SLIDE 36

Summary

UPC

CONCLUSIONS

We have performed a comparison of well-known techniques

  • IMS, SMS, Slack MS, IRIS and Stage MS

We have used a common

  • Compiler platform
  • Benchmarks
  • Target architectures
  • Measures

To perform a quantitative comparison

  • Study of the BudgetRatio
  • Measures of the techniques (code quality, effectiveness and cost)

slide-37
SLIDE 37

Results Conclusion

UPC

CONCLUSIONS

Some interesting results:

  • SMS achieves the best II and register pressure for the low/medium complexity architectures
  • IMS achieves a slightly better II for the complex architecture
  • Slack MS is close to SMS in register pressure
  • IMS and IRIS are improved by Stage MS but are still behind SMS and Slack MS

Bottom line

  • SMS is generally the best or close to the best

The tool can be used to assess any particular microarchitecture
