Hardware Acceleration for Programs in SSA Form



  1. Hardware Acceleration for Programs in SSA Form
     Manuel Mohr, Artjom Grudnitsky, Tobias Modschiedler, Lars Bauer, Sebastian Hack, Jörg Henkel
     Institute for Program Structures and Data Organization, Karlsruhe Institute of Technology (KIT); IPD, ITEC; Saarland University, Computer Science
     October 1, 2013
     KIT, University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association, www.kit.edu

  2. SSA-Based Register Allocation
     Diagram: compiler phases placed under two columns, "Not in Static Single Assignment Form" and "In Static Single Assignment Form"
     Front end: Parsing
     Middle end: Optimizations
     Back end: Register Allocation, replaced in the final build of the slide by SSA-Based Register Allocation
     Fewer spills but more shuffle code

  6. Register Transfer Graphs
     Shuffle code = parallel copy operations between registers
     Register Transfer Graph (RTG), figure over r1, r2, r3, r4, r5
     Nodes: registers
     Directed edge (r1, r2): after the copies, the value of r1 must be in r2
     At most one incoming edge per node
     No incoming edge: the register value is irrelevant after the copies
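
The RTG rules on this slide can be made concrete with a small sketch. It is not taken from the talk: the names `from_edges` and `apply_parallel_copy` and the example edges are hypothetical. The graph is stored as a destination-to-source mapping, so "at most one incoming edge per node" holds by construction, and all sources are read before any destination is written, which is what "parallel copy" means.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: dst -> src. A dict keyed by destination
# cannot hold two incoming edges for the same register.
RTG = Dict[str, str]


def from_edges(edges: Iterable[Tuple[str, str]]) -> RTG:
    """Build an RTG from (src, dst) edges, rejecting a second incoming edge."""
    rtg: RTG = {}
    for src, dst in edges:
        if dst in rtg:
            raise ValueError(f"{dst} has more than one incoming edge")
        rtg[dst] = src
    return rtg


def apply_parallel_copy(regs: Dict[str, int], rtg: RTG) -> Dict[str, int]:
    """Perform all copies at once: read every source, then write."""
    old = dict(regs)   # snapshot taken before any write
    new = dict(regs)   # registers without an incoming edge keep their value
    for dst, src in rtg.items():
        new[dst] = old[src]
    return new


# Illustrative edges only: a 3-cycle r1 -> r2 -> r3 -> r1 plus a copy r1 -> r4.
example = from_edges([("r1", "r2"), ("r2", "r3"), ("r3", "r1"), ("r1", "r4")])
before = {"r1": 10, "r2": 20, "r3": 30, "r4": 40, "r5": 50}
after = apply_parallel_copy(before, example)
```

Registers without an incoming edge (r5 here) are simply left alone in this sketch; the slide only says their value is irrelevant after the copies.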

  8. Motivation
     Number and size of RTGs depend on the quality of the allocation
     Reduction is an NP-complete problem
     Example RTG (figure over r1, r2, r3, r4, r5, r6, r7, r8)
     ⇒ On standard hardware, implementation may be expensive: 5 % to 20 % of all generated instructions (SPEC)
     Implementation of the example RTG with mov and xor (15 instructions, laid out in three columns on the slide):
       mov r2, r1    xor r6, r7    xor r4, r5
       mov r3, r2    xor r6, r5    xor r5, r4
       mov r7, r8    xor r5, r6    xor r4, r3
       xor r6, r7    xor r6, r5    xor r3, r4
       xor r7, r6    xor r5, r4    xor r4, r3
     Question 1: Is it possible to create an instruction set extension that allows implementing an RTG in one processor cycle?
     Question 2: Is it worth it?
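
To see what the 15 instructions on this slide do, the sketch below replays them on a small register model. Two things are assumptions, not statements from the talk: the slide's three columns are read top to bottom, left to right, and operands are in "source, destination" order (so `mov r2, r1` copies r2 into r1 and `xor a, b` computes b = b xor a). Under that reading, each group of three xor instructions swaps two registers. The helper names `run` and `swap` are hypothetical.

```python
def run(program, regs):
    """Execute (op, src, dst) tuples on a copy of the register dict."""
    regs = dict(regs)
    for op, src, dst in program:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "xor":
            regs[dst] ^= regs[src]
        else:
            raise ValueError(f"unknown op {op}")
    return regs


def swap(a, b):
    """The classic three-xor swap, spelled as on the slide."""
    return [("xor", a, b), ("xor", b, a), ("xor", a, b)]


# Slide instructions in the assumed (column-major) order.
PROGRAM = (
    [("mov", "r2", "r1"), ("mov", "r3", "r2"), ("mov", "r7", "r8")]
    + swap("r6", "r7")
    + swap("r6", "r5")
    + swap("r5", "r4")
    + swap("r4", "r3")
)

initial = {f"r{i}": 100 + i for i in range(1, 9)}
final = run(PROGRAM, initial)

# For each register, the register whose original value it now holds.
origin = {
    dst: next(src for src, v in initial.items() if v == final[dst])
    for dst in final
}
```

With these assumptions, `origin` describes a cycle over r3, r4, r5, r6, r7 together with three plain copies (into r1, r2 and r8), which is consistent with every register having at most one incoming edge.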

  12. Fundamental Hardware Constraints
      Changing the contents of multiple registers in one cycle is very costly
      Idea: modify the access to the register file instead of its contents
      Swap r1 and r2: exchange the access to r1 and r2
      Register file (figure): r1 = 42, r2 = 23
      ⇒ Restriction to permutations of registers
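
The idea of changing the access rather than the contents can be sketched in software as an indirection table in front of the register file. This is only an illustration of the principle, not the hardware design from the talk; the class `PermutedRegisterFile` and its methods are hypothetical names.

```python
class PermutedRegisterFile:
    """Register file with a logical-to-physical mapping in front of it."""

    def __init__(self, values):
        self.physical = list(values)              # stored contents
        self.mapping = list(range(len(values)))   # logical index -> physical index

    def read(self, reg):
        return self.physical[self.mapping[reg]]

    def write(self, reg, value):
        self.physical[self.mapping[reg]] = value

    def swap(self, a, b):
        """Exchange the access paths of two logical registers; no data moves."""
        self.mapping[a], self.mapping[b] = self.mapping[b], self.mapping[a]


# Values from the slide: r1 holds 42, r2 holds 23 (index 0 is unused here).
rf = PermutedRegisterFile([0, 42, 23])
rf.swap(1, 2)
```

After the swap, reading r1 yields 23 and reading r2 yields 42 while the stored values stay where they were. The mapping has to remain a bijection: if two logical registers pointed at the same physical slot, a later write to one would also change the other, which is one way to see why the approach is restricted to permutations.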

  16. ISA Extension
      Add permutation instructions to the SPARC V8 ISA
      32 registers ⇒ 5 bits to identify one register
      7 bits for opcode ⇒ 25 bits left for encoding 5 register numbers
      Instruction word layout (field boundaries from the slide's bit markers 31, 27, 24, 21, 19, 14, 9, 4, 0):
        bits 31-28: 0001
        bits 27-25: a1
        bits 24-22: 000
        bits 21-20: a2
        bits 19-15: b
        bits 14-10: c
        bits 9-5: d
        bits 4-0: e
      Two new instructions:
      permi5: implement a cyclic RTG with up to 5 elements
      permi23: implement two independent cycles with 2 and up to 3 elements
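
The bit budget on this slide (4 + 3 + 3 + 2 + 5 + 5 + 5 + 5 = 32) can be checked with a small packing sketch. Several details are assumptions: that a1 holds the upper three bits and a2 the lower two bits of register number a, and that the constant fields are exactly the `0001` and `000` shown. The slide does not say how permi5 and permi23 are told apart, so the sketch does not model that. `encode_permi` and `decode_permi` are hypothetical names.

```python
def encode_permi(a, b, c, d, e):
    """Pack five 5-bit register numbers into the 32-bit layout from the slide."""
    for reg in (a, b, c, d, e):
        if not 0 <= reg < 32:
            raise ValueError(f"register number out of range: {reg}")
    a1, a2 = a >> 2, a & 0b11        # assumed split of a into 3 + 2 bits
    return (
        (0b0001 << 28)               # bits 31-28: constant field
        | (a1 << 25)                 # bits 27-25
        | (0b000 << 22)              # bits 24-22: constant field
        | (a2 << 20)                 # bits 21-20
        | (b << 15)                  # bits 19-15
        | (c << 10)                  # bits 14-10
        | (d << 5)                   # bits 9-5
        | e                          # bits 4-0
    )


def decode_permi(word):
    """Recover (a, b, c, d, e) from an encoded word."""
    a = (((word >> 25) & 0b111) << 2) | ((word >> 20) & 0b11)
    return (
        a,
        (word >> 15) & 0b11111,
        (word >> 10) & 0b11111,
        (word >> 5) & 0b11111,
        word & 0b11111,
    )


word = encode_permi(1, 2, 3, 4, 5)
```

Splitting a around the constant `000` field is what lets five 5-bit register numbers fit next to a 7-bit opcode in one 32-bit word.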

  18. Examples
      RTG over r1, r2, r3, r4, r5 (figure): permi5 r1, r2, r3, r4, r5
      RTG over r1, r2 (figure): permi5 r1, r2
      RTG over r1, r2, r3, r4 (figure): permi23 r1, r2, r3, r4
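
A behavioural model of the two instructions, as far as the slides describe them, might look like the sketch below. The direction of rotation (the value of each operand moves to the next operand, and the last wraps around to the first) and the rule that the first two operands of permi23 form the 2-cycle are assumptions; the Python functions are illustrative and operate on a plain dict rather than on hardware state.

```python
def apply_cycle(regs, cycle):
    """Move the value of cycle[i] to cycle[i + 1], wrapping around at the end."""
    if len(set(cycle)) != len(cycle):
        raise ValueError("registers in a cycle must be distinct")
    old = [regs[r] for r in cycle]            # read everything first
    for i, value in enumerate(old):
        regs[cycle[(i + 1) % len(cycle)]] = value


def permi5(regs, *cycle):
    """One cycle with 2 to 5 elements."""
    if not 2 <= len(cycle) <= 5:
        raise ValueError("permi5 takes 2 to 5 registers")
    apply_cycle(regs, cycle)


def permi23(regs, *operands):
    """Two independent cycles: the first two operands, then the remaining 2 or 3."""
    if not 4 <= len(operands) <= 5:
        raise ValueError("permi23 takes 4 or 5 registers")
    if len(set(operands)) != len(operands):
        raise ValueError("the two cycles must not share registers")
    apply_cycle(regs, operands[:2])
    apply_cycle(regs, operands[2:])


regs5 = {"r1": 1, "r2": 2, "r3": 3, "r4": 4, "r5": 5}
permi5(regs5, "r1", "r2", "r3", "r4", "r5")

regs2 = {"r1": 1, "r2": 2}
permi5(regs2, "r1", "r2")

regs23 = {"r1": 1, "r2": 2, "r3": 3, "r4": 4}
permi23(regs23, "r1", "r2", "r3", "r4")
```

The three calls mirror the three examples on the slide: a 5-cycle, a swap expressed as a 2-element permi5, and two swaps expressed as one permi23.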

  21. Code Generation
      Goal: generate efficient code using permi instructions for all RTGs
      Question: which RTGs can be implemented using only permi?
      RTGs in permutation form (figure over r1, r2 and r3, r4, r5)
      A permutation can be written as a product of cycles
      Cycles can be implemented with permi instructions
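
The step "a permutation can be written as a product of cycles" corresponds to a standard decomposition, sketched below. The function name `cycles_of` and the example permutation are hypothetical, and the sketch stops at listing the cycles; how cycles are then mapped onto permi instructions is not covered by the slides shown here.

```python
def cycles_of(perm):
    """Split a permutation (src -> dst mapping) into its non-trivial cycles."""
    if sorted(perm) != sorted(perm.values()):
        raise ValueError("not a permutation: sources and destinations differ")
    seen = set()
    result = []
    for start in perm:
        if start in seen:
            continue
        cycle = []
        reg = start
        while reg not in seen:       # follow edges until the cycle closes
            seen.add(reg)
            cycle.append(reg)
            reg = perm[reg]
        if len(cycle) > 1:           # registers mapped to themselves need no code
            result.append(cycle)
    return result


# Illustrative RTG in permutation form: a swap and a 3-cycle.
perm = {"r1": "r2", "r2": "r1", "r3": "r4", "r4": "r5", "r5": "r3"}
found = cycles_of(perm)
```

For this example the decomposition yields a 2-cycle over r1, r2 and a 3-cycle over r3, r4, r5, which is exactly the shape a single permi23 is described as handling.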
