A Comparative Study of Modulo Scheduling Techniques

  1. A Comparative Study of Modulo Scheduling Techniques. Josep M. Codina, Josep Llosa and Antonio González, Dept. of Computer Architecture, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain. E-mail: {jmcodina,josepll,antonio}@ac.upc.es

  2. Software Pipelining (Introduction)
     - Instruction scheduling for VLIW/superscalar processors
       - VLIW processors in the DSP market
       - EPIC/IPF
     - Loop scheduling: software pipelining
       - Loops consume most of the application's execution time
     - Software pipelining a loop is an NP-complete problem
     - Software pipelining is a big family of techniques
       - Modulo scheduling, based on heuristics

  3. Motivation (Introduction)
     - Modulo scheduling is a framework within which techniques are defined
       - Different factors must be taken into account
       - Many techniques, built on different ideas, fit in the framework
     - Proposals in the literature are evaluated without a common
       - Platform (i.e. compiler)
       - Benchmarks
       - Target architectures
       - Measures
     - Lack of a thorough comparison

  4. Objectives (Introduction)
     - Perform a comparison of state-of-the-art MS techniques
       - Qualitative
       - Quantitative
     - The work is targeted at compiler writers
       - Is one of the techniques better than the others for all architectures?
       - Which is the most powerful technique for a given architecture?

  5. Talk Outline
     - Modulo Scheduling Background
     - Selection Criteria
     - Techniques Compared
     - Study Environment
     - Results
     - Conclusions

  7. Basic Ideas (Modulo Scheduling). [Figure: iterations 1 to 4 executing overlapped, each divided into stages 1 to 3 and started one initiation interval (II) apart; the overlapped code forms a prolog, a kernel and an epilog.]
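The overlap shown on this slide can be sketched in a few lines. This is only an illustration, assuming every iteration has the same number of stages and a new iteration starts every II cycles; the function name and the 0-based numbering are choices made here, not part of the talk.

```python
def pipeline_slots(num_iterations, num_stages):
    """For each II-long time slot, list the (iteration, stage) pairs running.

    Iterations and stages are numbered from 0; iteration i enters stage s
    in slot i + s because a new iteration starts every II cycles.
    """
    slots = []
    for slot in range(num_iterations + num_stages - 1):
        active = [(it, slot - it) for it in range(num_iterations)
                  if 0 <= slot - it < num_stages]
        slots.append(active)
    return slots

# Four iterations of three stages each, as in the figure.
slots = pipeline_slots(num_iterations=4, num_stages=3)
# Slots in which every stage is busy correspond to the kernel.
kernel_slots = [i for i, s in enumerate(slots) if len(s) == 3]
```

The slots before the first kernel slot play the role of the prolog, and those after the last one play the role of the epilog.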

  8. Basic Scheme (Modulo Scheduling). Flowchart: find the MII and set II = MII; look for a schedule; found it? If yes, done; if no, increase the II and look for a schedule again.

  9. Basic Scheme (Modulo Scheduling), step "Find MII and set II = MII". The MII depends on
     - Resources
     - Recurrences
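The two bounds named on this slide are commonly combined as MII = max(ResMII, RecMII). The sketch below is a minimal illustration, assuming a simple resource model (each operation uses one unit of one resource class for one cycle) and recurrences supplied as (total latency, total distance) pairs. The function names and input shapes are hypothetical and not taken from the talk.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource bound: the most heavily used resource class limits the II."""
    return max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)

def rec_mii(recurrences):
    """Recurrence bound: a cycle of latency L spanning D iterations needs
    at least ceil(L / D) cycles per iteration."""
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)

def mii(op_counts, unit_counts, recurrences):
    """Minimum initiation interval: the larger of the two bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(recurrences))

# Hypothetical loop: 5 memory ops on 2 ports, 3 FP ops on 2 FP units,
# and one recurrence of latency 6 that spans 2 iterations.
example_mii = mii({"mem": 5, "fp": 3}, {"mem": 2, "fp": 2}, [(6, 2)])
```

In this example both bounds evaluate to 3, so the search for a schedule would start at II = 3.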

  10. Basic Scheme (Modulo Scheduling), step "Look for a schedule"
      - Ordering the nodes
      - Finding a feasible cycle
        - Top-down/bottom-up
        - Bi-directional
      - When there is no feasible cycle
        - Use of backtracking
        - Increase the II
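As an illustration of the "look for a schedule" step for one fixed II, here is a minimal sketch. It assumes a top-down order supplied by the caller, no backtracking, and operations that occupy one unit of their resource for one cycle. It is not any of the techniques compared in the talk, and all names and the example loop are hypothetical.

```python
def try_schedule(ops, edges, units, ii):
    """Place ops top-down for a fixed II; return start cycles or None.

    ops   : list of (name, resource), already in scheduling order
    edges : list of (src, dst, latency, distance) dependences
    units : dict resource -> number of units available per cycle
    """
    times = {}   # op name -> start cycle
    mrt = {}     # (cycle mod II, resource) -> units in use
    for name, res in ops:
        # Earliest cycle allowed by predecessors that are already placed.
        earliest = 0
        for src, dst, lat, dist in edges:
            if dst == name and src in times:
                earliest = max(earliest, times[src] + lat - ii * dist)
        # II consecutive cycles visit every row of the reservation table once.
        for cycle in range(earliest, earliest + ii):
            row = (cycle % ii, res)
            if mrt.get(row, 0) < units[res]:
                times[name] = cycle
                mrt[row] = mrt.get(row, 0) + 1
                break
        else:
            return None   # no feasible cycle: the caller must increase the II
    # Edges whose source was placed after its destination were not checked above.
    for src, dst, lat, dist in edges:
        if times[dst] < times[src] + lat - ii * dist:
            return None
    return times

# Hypothetical loop body: load -> mul -> add -> store, with add feeding
# itself in the next iteration (distance 1).
ops = [("load", "mem"), ("mul", "fp"), ("add", "fp"), ("store", "mem")]
edges = [("load", "mul", 2, 0), ("mul", "add", 3, 0),
         ("add", "store", 1, 0), ("add", "add", 1, 1)]
units = {"mem": 1, "fp": 1}
schedule = try_schedule(ops, edges, units, ii=2)
```

With one memory port and one FP unit the loop cannot be scheduled at II = 1, because the two FP operations compete for the single row of the reservation table; at II = 2 the greedy placement succeeds.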

  11. Basic Scheme (Modulo Scheduling), step "Found it?". Can we meet the constraints?
      - Resources
      - Dependences
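The two kinds of constraint can be written down compactly. The formulation below is the usual one for modulo scheduling, stated under the assumption that each operation u occupies one unit of a single resource class res(u) for one cycle; the symbols (t for start cycles, lambda for latency, delta for dependence distance in iterations, N_r for the number of units of class r) are notation chosen here, not taken from the slides.

```latex
% Dependence constraint, for every edge (u, v) with latency \lambda_{u,v}
% and distance \delta_{u,v}:
t_v - t_u \;\ge\; \lambda_{u,v} - II \cdot \delta_{u,v}

% Resource constraint, for every resource class r and every row m of the
% modulo reservation table:
\bigl|\{\, u : \mathrm{res}(u) = r,\ t_u \bmod II = m \,\}\bigr| \;\le\; N_r,
\qquad 0 \le m < II
```

A larger II relaxes both inequalities at once: it subtracts more from the right-hand side of every loop-carried dependence and adds rows to the reservation table.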

  12. Basic Scheme (Modulo Scheduling), step "Increase the II"
      - The larger the II, the more likely it is that a schedule is found
      - The larger the II, the lower the performance
      - II lower than the length of a single iteration
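The whole flowchart reduces to a short driver loop. The sketch below assumes the scheduler for a fixed II is passed in as a callable that returns None on failure, and that the caller supplies an upper limit for the II (for instance the length of a single iteration, beyond which overlapping brings no benefit). Both the names and the toy scheduler are hypothetical.

```python
def find_schedule(mii, try_schedule, max_ii):
    """Start at II = MII and increase the II until a schedule is found."""
    ii = mii
    while ii <= max_ii:
        schedule = try_schedule(ii)   # returns None when nothing fits
        if schedule is not None:
            return ii, schedule       # "Yes" branch of the flowchart
        ii += 1                       # "No" branch: increase the II
    return None                       # give up: no schedule up to max_ii

# Toy stand-in for a real scheduler: pretend only II >= 5 is feasible.
def toy_scheduler(ii):
    return {"a": 0, "b": 1} if ii >= 5 else None

result = find_schedule(3, toy_scheduler, max_ii=10)
```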

  13. Backtracking (Modulo Scheduling)
      - Not always beneficial
        - Can produce better schedules
        - Can merely lengthen the process of finding a schedule
        - In some cases, there is no feasible schedule for a given II
      - Backtracking must be limited
        - BudgetRatio: ratio of the maximum number of operation scheduling steps attempted before increasing the II to the number of operations in the loop
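A minimal sketch of how such a limit can be enforced follows. It assumes, as in Rau's iterative modulo scheduling, that the budget of scheduling steps equals BudgetRatio times the number of operations in the loop. The step function is a hypothetical stand-in for a real scheduler step that may eject already placed operations.

```python
def attempt_with_budget(num_ops, budget_ratio, step):
    """Run scheduling steps until every op is placed or the budget is spent.

    step(unscheduled) performs one scheduling step and returns the new
    number of unscheduled ops (it can stay the same when backtracking
    ejects an op that was already placed).
    """
    budget = int(budget_ratio * num_ops)
    unscheduled = num_ops
    steps = 0
    while unscheduled > 0 and steps < budget:
        unscheduled = step(unscheduled)
        steps += 1
    return unscheduled == 0, steps

def make_toy_step():
    """Hypothetical step: every third call places one op but ejects another."""
    calls = {"n": 0}
    def step(unscheduled):
        calls["n"] += 1
        return unscheduled if calls["n"] % 3 == 0 else unscheduled - 1
    return step

tight = attempt_with_budget(6, 1, make_toy_step())     # budget of 6 steps
loose = attempt_with_budget(6, 2.5, make_toy_step())   # budget of 15 steps
```

With a BudgetRatio of 1 the attempt runs out of budget and the II would be increased; with 2.5 the same loop is scheduled after 8 steps.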

  15. Selection Criteria
      - Value of the code generated
        - Parallelism
        - Register pressure
        - Code size
        - Execution time
      - Effectiveness/cost of the technique

  16. Selection Criteria: Parallelism
      - Effectiveness in exploiting ILP
      - What is the difference between II and MII?
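The gap between II and MII can be summarised over a set of loops in more than one way. The sketch below computes two aggregates of the kind used later in the talk (a ratio of sums and an average of per-loop ratios); the function name and the three (II, MII) pairs are made up for illustration only.

```python
def ii_vs_mii(loops):
    """Return (sum II / sum MII, average of II / MII) for (II, MII) pairs."""
    ratio_of_sums = sum(ii for ii, _ in loops) / sum(mii for _, mii in loops)
    mean_of_ratios = sum(ii / mii for ii, mii in loops) / len(loops)
    return ratio_of_sums, mean_of_ratios

# Hypothetical results for three loops: two reach their MII, one does not.
ratio_of_sums, mean_of_ratios = ii_vs_mii([(4, 4), (6, 5), (10, 10)])
```

A value of 1 means every loop was scheduled at its MII. The ratio of sums gives more weight to loops with large II values, while the mean of ratios treats every loop equally.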

  17. Selection Criteria: Register pressure
      - Software pipelining puts high demands on registers
      - How many registers are needed?
      - How many loops can be scheduled within a given number of registers?
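One common way to answer "how many registers are needed?" is MaxLive, the largest number of values simultaneously live in any cycle of the kernel. The sketch below assumes each value's lifetime is given as a half-open interval [definition cycle, last-use cycle) within a single iteration; that convention, the function name and the example lifetimes are assumptions made here.

```python
def max_live(lifetimes, ii):
    """Maximum number of simultaneously live values in the kernel.

    A lifetime longer than II wraps around the kernel and is counted once
    per overlapping iteration in the rows it covers.
    """
    live = [0] * ii
    for start, end in lifetimes:
        for cycle in range(start, end):
            live[cycle % ii] += 1
    return max(live)

# Hypothetical lifetimes of three values in one iteration.
lifetimes = [(0, 2), (2, 5), (5, 6)]
pressure_at_ii_2 = max_live(lifetimes, ii=2)
pressure_at_ii_3 = max_live(lifetimes, ii=3)
```

For the same lifetimes a smaller II overlaps more iterations, so more values are live at once: the example needs 3 registers at II = 2 but only 2 at II = 3.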

  18. Selection Criteria: Code size
      - Crucial in embedded domains
      - Stages of a schedule
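The number of stages follows directly from the start cycles and the II. The sketch below assumes start cycles are normalised so that the earliest operation starts at cycle 0; the names and the example schedule are hypothetical.

```python
def stage_of(times, ii):
    """Stage index of each operation: its start cycle divided by the II."""
    return {op: t // ii for op, t in times.items()}

def stage_count(times, ii):
    """Number of stages spanned by one iteration of the schedule."""
    return max(stage_of(times, ii).values()) + 1

# Hypothetical schedule of one iteration with II = 2.
times = {"load": 0, "mul": 2, "add": 5, "store": 7}
num_stages = stage_count(times, ii=2)
```

With explicit prolog and epilog code, a schedule of SC stages produces SC - 1 kernel-sized blocks before the kernel and SC - 1 after it, which is why the stage count matters for code size.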

  19. Selection Criteria: Execution time
      - Main objective

  20. Selection Criteria: Effectiveness/cost of the technique
      - Can all the loops be scheduled?
      - Compilation time

  22. Techniques (Techniques Compared)
      - Modulo scheduling techniques
        - Iterative Modulo Scheduling (IMS)
        - Swing Modulo Scheduling (SMS)
        - Slack Modulo Scheduling (Slack MS)
        - Integrated Register-sensitive Iterative Software Pipelining method (IRIS)
      - Complementary techniques
        - Stage Modulo Scheduling (Stage MS)
          - Post-pass that can be applied after an MS technique
          - Reduces the register pressure without increasing the II
          - Moves operations by II
          - Various heuristics; we selected the 3UP+RSS heuristic

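  23. Main Differences (Techniques Compared)

      |                 | IMS      | SMS                                                                            | Slack MS                                                           | IRIS                |
      |-----------------|----------|--------------------------------------------------------------------------------|--------------------------------------------------------------------|---------------------|
      | Order of nodes  | Top-down | Priority to recurrences; no pred. and succ. scheduled in the partial schedule  | Dynamic; based on slack                                            | Top-down            |
      | Finding a cycle | Top-down | Bi-directional; close to pred. or succ.                                        | Bi-directional; close to pred. or succ. depending on the benefit   | Stage MS heuristics |
      | Backtracking    | Yes      | No                                                                             | Yes                                                                | Yes                 |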

  24. Qualitative Comparison (Techniques Compared)

      |                    | IMS          | SMS                   | Slack MS            | IRIS             |
      |--------------------|--------------|-----------------------|---------------------|------------------|
      | Parallelism        | Backtracking | Order                 | Order; backtracking | Backtracking     |
      | Register pressure  | No           | Order; bi-directional | Bi-directional      | Stage heuristics |
      | Code size          | Yes          | Yes                   | Yes                 | Yes              |
      | Effectiveness/cost | Backtracking | No backtracking       | Backtracking        | Backtracking     |

  26. Environment (Study Environment)
      - Platform (i.e. compiler): ICTINEO
      - Benchmarks: 1936 loops
        - SPECfp95
        - Perfect Club
      - Target architectures: several architectures of varying complexity, from less constrained to more constrained
        - Low complexity architecture
        - Medium complexity architecture
        - Complex architecture

  27. Architectures Description (Study Environment)
      - Architectures: Low Complexity, Medium Complexity, Complex
      - Pipelining: fully pipelined ops, or fully pipelined simple ops with non-pipelined complex ops
      - Issue width: 8-issue or 4-issue
      - Register ports: unlimited, or 4 write ports and 8 read ports
      - 2 memory ports
      - 2 Int FU and 2 FP FU

      Latencies (in cycles):

      | Operation           | Low/Medium | Complex |
      |---------------------|------------|---------|
      | MEM                 | 2          | 3       |
      | INT ADD, SUB, COMP  | 1          | 3       |
      | INT MUL             | 2          | 4       |
      | INT DIV, MOD, SQRT  | 6          | 8       |
      | FP ADD, SUB, COMP   | 3          | 5       |
      | FP MUL              | 6          | 8       |
      | FP DIV, MOD, SQRT   | 18         | 20      |

  28. Methodology (Study Environment)
      - Study of the BudgetRatio for each architecture: 1, 2.5, 5 and 10
        - Effectiveness
        - Performance
        - Cost
      - Measures for each technique with and without Stage MS
        - Effectiveness and cost
        - Parallelism
        - Register pressure
        - Code size
        - Execution time

  30. BudgetRatio Study (Results). [Figure: charts for IMS, IRIS and Slack on the low complexity, medium complexity and complex architectures. Panels: performance, effectiveness, cost. Y axes: Sum II / Sum MII, % non scheduled ops, total time. X axis: BudgetRatio (1, 2.5, 5, 10).]

  31. II vs MII (Results). [Figure: average II/MII for IMS, SMS, IRIS and Slack on the low and medium complexity architectures; y axis from 1 to 1.014.]

  32. Register Pressure (Results). [Figure: MaxLive/MinAvg for IMS, IMS+ST, SMS, SMS+ST, IRIS, IRIS+ST, Slack and Slack+ST on the low and medium complexity architectures; y axis from 1 to 1.9.]

  33. Execution Time (Results). [Figure: execution time in millions of cycles for IMS, IMS+ST, SMS, SMS+ST, IRIS, IRIS+ST, Slack and Slack+ST; one chart for the low complexity architecture (axis from 28000 to 28700) and one for the medium complexity architecture (axis from 28000 to 42000).]

  34. Complex Architecture (Results). [Figure: three charts for the complex architecture: II vs MII (average II/MII, axis from 1 to 1.07) for IMS, SMS, IRIS and Slack; register pressure (MaxLive/MinAvg, axis from 1 to 1.8) and execution time (millions of cycles, axis from 28000 to 42000) for IMS, IMS+ST, SMS, SMS+ST, IRIS, IRIS+ST, Slack and Slack+ST.]
