Compiler Optimisation 6: Instruction Scheduling, Hugh Leather


  1. Compiler Optimisation 6 – Instruction Scheduling Hugh Leather IF 1.18a hleather@inf.ed.ac.uk Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2019

  2. Introduction
     This lecture:
       - Scheduling to hide latency and exploit ILP
       - Dependence graph
       - Local list scheduling + priorities
       - Forward versus backward scheduling
       - Software pipelining of loops

  3. Latency, functional units, and ILP
       - Instructions take clock cycles to execute (latency)
       - Modern machines issue several operations per cycle
       - Cannot use results until ready, can do something else
       - Execution time is order-dependent
       - Latencies not always constant (cache, early exit, etc.)

     Operation            Cycles
     load, store          3
     load ∉ cache         100s
     loadI, add, shift    1
     mult                 2
     div                  40
     branch               0-8

  4. Machine types
       - In order: deep pipelining allows multiple instructions
       - Superscalar: multiple functional units, can issue > 1 instruction
       - Out of order: large window of instructions can be reordered dynamically
       - VLIW: compiler statically allocates to FUs

  5-12. Effect of scheduling
     Superscalar, 1 FU: new op each cycle if operands ready
     Simple schedule for a := 2*a*b*c
     (loads/stores take 3 cycles, mults 2, adds 1)

     Cycle  Operation                     Operands waiting
     1      loadAI  r_arp, @a ⇒ r1        r1
     2                                    r1
     3                                    r1
     4      add     r1, r1 ⇒ r1           r1
     5      loadAI  r_arp, @b ⇒ r2        r2
     6                                    r2
     7                                    r2
     8      mult    r1, r2 ⇒ r1           r1
     9      loadAI  r_arp, @c ⇒ r2        r1, r2
     10                                   r2
     11                                   r2
     12     mult    r1, r2 ⇒ r1           r1
     13                                   r1
     14     storeAI r1 ⇒ r_arp, @a        store to complete
     15                                   store to complete
     16                                   store to complete
     Done

     Note on cycle 9: the load of c can issue while r1 is still pending, because the next op does not use r1.

  13-20. Effect of scheduling
     Superscalar, 1 FU: new op each cycle if operands ready
     Schedule loads early for a := 2*a*b*c
     (loads/stores take 3 cycles, mults 2, adds 1)

     Cycle  Operation                     Operands waiting
     1      loadAI  r_arp, @a ⇒ r1        r1
     2      loadAI  r_arp, @b ⇒ r2        r1, r2
     3      loadAI  r_arp, @c ⇒ r3        r1, r2, r3
     4      add     r1, r1 ⇒ r1           r1, r2, r3
     5      mult    r1, r2 ⇒ r1           r1, r3
     6                                    r1
     7      mult    r1, r3 ⇒ r1           r1
     8                                    r1
     9      storeAI r1 ⇒ r_arp, @a        store to complete
     10                                   store to complete
     11                                   store to complete
     Done

     Uses one more register
     11 versus 16 cycles: 31% faster!
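The two cycle counts above can be reproduced with a small simulation. This is a minimal sketch, assuming a single-issue in-order machine where an op issued in cycle t with latency λ makes its result readable in cycle t + λ. The function name `run_in_order`, the tuple layout, and the register name `rarp` are illustrative choices, not part of the lecture.

```python
# Latencies from the slides: loads/stores 3 cycles, mults 2, adds 1.
LATENCY = {"loadAI": 3, "storeAI": 3, "mult": 2, "add": 1}

def run_in_order(ops):
    """Issue ops in program order, at most one per cycle, stalling until
    every source register is ready. Returns (issue cycles, last busy cycle)."""
    ready = {}       # register -> first cycle in which its value can be read
    cycle = 0
    issue = []
    last_busy = 0
    for opcode, srcs, dst in ops:
        cycle += 1                                        # one op per cycle
        cycle = max([cycle] + [ready.get(r, 0) for r in srcs])
        issue.append(cycle)
        done = cycle + LATENCY[opcode]                    # result readable here
        if dst is not None:
            ready[dst] = done
        last_busy = max(last_busy, done - 1)
    return issue, last_busy

# (opcode, source registers, destination register); the store has no destination.
SIMPLE = [
    ("loadAI", ["rarp"], "r1"),
    ("add", ["r1", "r1"], "r1"),
    ("loadAI", ["rarp"], "r2"),
    ("mult", ["r1", "r2"], "r1"),
    ("loadAI", ["rarp"], "r2"),
    ("mult", ["r1", "r2"], "r1"),
    ("storeAI", ["r1"], None),
]
LOADS_EARLY = [
    ("loadAI", ["rarp"], "r1"),
    ("loadAI", ["rarp"], "r2"),
    ("loadAI", ["rarp"], "r3"),
    ("add", ["r1", "r1"], "r1"),
    ("mult", ["r1", "r2"], "r1"),
    ("mult", ["r1", "r3"], "r1"),
    ("storeAI", ["r1"], None),
]

print(run_in_order(SIMPLE)[1], run_in_order(LOADS_EARLY)[1])  # 16 11
```

Under these assumptions the issue cycles come out as 1, 4, 5, 8, 9, 12, 14 for the simple order and 1, 2, 3, 4, 5, 7, 9 with the loads hoisted, matching the two tables.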

  21. Scheduling problem
       - Schedule maps operations to cycles: $\forall a \in \mathit{Ops},\ S(a) \in \mathbb{N}$
       - Respect latency: $\forall a, b \in \mathit{Ops},\ a \text{ depends on } b \implies S(a) \ge S(b) + \lambda(b)$
       - Respect functional units: no more ops per type per cycle than the FUs can handle
       - Length of schedule: $L(S) = \max_{a \in \mathit{Ops}} \left( S(a) + \lambda(a) \right)$
       - Schedule $S$ is time-optimal if $\forall S_1,\ L(S) \le L(S_1)$
       - Problem: find a time-optimal schedule [3]
       - Even local scheduling with many restrictions is NP-complete
     [3] A schedule might also be optimal in terms of registers, power, or space
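The constraints above translate directly into a checker. The following is a minimal sketch, assuming schedules are dictionaries from operation name to issue cycle; the names `is_valid`, `schedule_length`, `fu_of`, `fu_count`, and the operation labels (`la`, `m1`, ...) are hypothetical, chosen to mirror the a := 2*a*b*c example.

```python
from collections import Counter

def schedule_length(S, lat):
    """L(S) = max over ops of S(a) + lambda(a)."""
    return max(S[a] + lat[a] for a in S)

def is_valid(S, lat, deps, fu_of, fu_count):
    """Check the latency and functional-unit constraints for schedule S.
    deps maps each op to the ops it depends on."""
    for a, preds in deps.items():
        for b in preds:
            if S[a] < S[b] + lat[b]:          # a issued before b's result is ready
                return False
    used = Counter((S[a], fu_of[a]) for a in S)  # ops per (cycle, FU type)
    return all(n <= fu_count[fu] for (_, fu), n in used.items())

# The "loads early" schedule, with one FU of a single (assumed) type "fu".
LAT = {"la": 3, "lb": 3, "lc": 3, "add": 1, "m1": 2, "m2": 2, "st": 3}
DEPS = {"add": ["la"], "m1": ["add", "lb"], "m2": ["m1", "lc"], "st": ["m2"]}
FU_OF = {a: "fu" for a in LAT}
FU_COUNT = {"fu": 1}

good = {"la": 1, "lb": 2, "lc": 3, "add": 4, "m1": 5, "m2": 7, "st": 9}
bad = dict(good, m2=6)   # second mult issued before the first one finishes

print(is_valid(good, LAT, DEPS, FU_OF, FU_COUNT))  # True
print(is_valid(bad, LAT, DEPS, FU_OF, FU_COUNT))   # False
print(schedule_length(good, LAT))                  # 12
```

With this definition L(S) is the first cycle after every operation has completed, so the schedule whose last busy cycle is 11 has L(S) = 12 here; the off-by-one depends on how cycles are counted and is an assumption of the sketch.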

  22. List scheduling
     Local greedy heuristic to produce schedules for single basic blocks
       1. Rename to avoid anti-dependences
       2. Build dependency graph
       3. Prioritise operations
       4. For each cycle
            1. Choose the highest priority ready operation & schedule it
            2. Update ready queue
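Steps 3 and 4 above can be sketched as a short forward list scheduler. This is an illustrative version, not the lecture's own code: it assumes renaming and graph construction are already done, uses the latency-weighted longest path to the end of the block as the priority (one common choice), and models the machine as `width` identical functional units. The names `list_schedule`, `width`, and the operation labels are hypothetical.

```python
def list_schedule(lat, deps, width=1):
    """Greedy forward list scheduling for one basic block.
    lat: op -> latency; deps: op -> list of ops it depends on.
    Returns op -> issue cycle (cycles start at 1)."""
    succs = {a: [] for a in lat}
    for a, preds in deps.items():
        for b in preds:
            succs[b].append(a)

    # Priority: latency-weighted longest path from the op to the end.
    prio = {}
    def rank(a):
        if a not in prio:
            prio[a] = lat[a] + max((rank(s) for s in succs[a]), default=0)
        return prio[a]
    for a in lat:
        rank(a)

    S = {}
    cycle = 1
    while len(S) < len(lat):
        # Ready: unscheduled, and every predecessor's result is available.
        ready = [a for a in lat if a not in S and
                 all(b in S and S[b] + lat[b] <= cycle for b in deps.get(a, []))]
        ready.sort(key=lambda a: -prio[a])   # highest priority first
        for a in ready[:width]:
            S[a] = cycle
        cycle += 1
    return S

LAT = {"la": 3, "lb": 3, "lc": 3, "add": 1, "m1": 2, "m2": 2, "st": 3}
DEPS = {"add": ["la"], "m1": ["add", "lb"], "m2": ["m1", "lc"], "st": ["m2"]}
print(list_schedule(LAT, DEPS))
```

On the a := 2*a*b*c example with one functional unit, this priority places the three loads in cycles 1 to 3 and the store in cycle 9, which is the "schedule loads early" ordering. Recomputing the ready list every cycle keeps the sketch short; a real scheduler would maintain the ready queue incrementally.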

  23-24. List scheduling
     Dependence/Precedence graph
       - Schedule operation only when operands ready
       - Build dependency graph of read-after-write (RAW) deps
       - Label with latency and FU requirements
       - Anti-dependences (WAR) restrict movement
     Example: a = 2*a*b*c [dependence graph figure]
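A dependence graph for the example can be built in one pass over the block. The sketch below is an assumption-laden illustration: it tracks only register RAW and WAR dependences, ignores memory and write-after-write dependences, and uses hypothetical names (`build_deps`, `rarp`, the op labels).

```python
def build_deps(ops):
    """ops: list of (name, source registers, destination register or None).
    Returns (raw, war): for each op, the set of earlier ops it must follow."""
    last_writer = {}                 # register -> op that last wrote it
    readers = {}                     # register -> ops reading its current value
    raw = {name: set() for name, _, _ in ops}
    war = {name: set() for name, _, _ in ops}
    for name, srcs, dst in ops:
        for r in srcs:
            if r in last_writer:
                raw[name].add(last_writer[r])      # read after write
            readers.setdefault(r, []).append(name)
        if dst is not None:
            for reader in readers.get(dst, []):
                if reader != name:
                    war[name].add(reader)          # write after read
            last_writer[dst] = name
            readers[dst] = []
    return raw, war

# a = 2*a*b*c with r2 reused for both b and c (no renaming).
REUSED = [
    ("la", ["rarp"], "r1"),
    ("add", ["r1", "r1"], "r1"),
    ("lb", ["rarp"], "r2"),
    ("m1", ["r1", "r2"], "r1"),
    ("lc", ["rarp"], "r2"),
    ("m2", ["r1", "r2"], "r1"),
    ("st", ["r1"], None),
]
# Same block after renaming: c is loaded into r3 instead.
RENAMED = REUSED[:4] + [("lc", ["rarp"], "r3"), ("m2", ["r1", "r3"], "r1"), ("st", ["r1"], None)]

raw, war = build_deps(REUSED)
print(war["lc"])                         # {'m1'}
print(build_deps(RENAMED)[1]["lc"])      # set()
```

Without renaming, the load of c carries a WAR edge from the first mult, so it cannot move above that mult. After renaming to r3 the edge disappears and the load is free to be scheduled early.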
