Hardware Acceleration for Programs in SSA Form

Manuel Mohr, Artjom Grudnitsky, Tobias Modschiedler, Lars Bauer, Sebastian Hack, Jörg Henkel
Saarland University, Computer Science
Institute for Program Structures and Data Organization (IPD) and ITEC, Karlsruhe Institute of Technology (KIT)
KIT: University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association (www.kit.edu)
October 1, 2013

SSA-Based Register Allocation
[Figure: compiler pipeline, each phase marked as "not in Static Single Assignment form" or "in Static Single Assignment form"]
Front end: parsing
Middle end: optimizations
Back end: register allocation, replaced by SSA-based register allocation
Fewer spills but more shuffle code

Register Transfer Graphs
Shuffle code = parallel copy operations between registers
[Figure: example RTG over registers r1 to r5]
Register Transfer Graph (RTG):
Nodes: registers
Directed edge (r1, r2): after the copies, the value of r1 must be in r2
At most one incoming edge per node
No incoming edge: the register's value is irrelevant after the copies
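The RTG definition on this slide can be sketched in a few lines of Python. The representation as (source, destination) pairs and the names `check_rtg` and `parallel_copy` are assumptions made for illustration; they do not come from the talk.

```python
def check_rtg(edges):
    """Raise ValueError if any register has more than one incoming edge."""
    seen = set()
    for _src, dst in edges:
        if dst in seen:
            raise ValueError(f"register {dst} has more than one incoming edge")
        seen.add(dst)


def parallel_copy(regs, edges):
    """Return the register contents after performing all copies at once.

    regs maps register names to values; edges is a list of (src, dst) pairs.
    All sources are read before any destination is written.
    """
    check_rtg(edges)
    result = dict(regs)
    for src, dst in edges:
        result[dst] = regs[src]  # read from the old state, write to the new one
    return result


# Hypothetical example: r1 is copied to r2 and r3, while r4 and r5 swap.
regs = {"r1": 10, "r2": 20, "r3": 30, "r4": 40, "r5": 50}
edges = [("r1", "r2"), ("r1", "r3"), ("r4", "r5"), ("r5", "r4")]
after = parallel_copy(regs, edges)
```

In this example r1 has no incoming edge, so its value after the copies is irrelevant; the sketch simply leaves it unchanged.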

Motivation
Number and size of RTGs depend on the quality of the allocation
Reducing them is an NP-complete problem
[Figure: example RTG over registers r1 to r8]
⇒ On standard hardware, implementation may be expensive: 5 % to 20 % of all generated instructions (SPEC)
Example implementation of the RTG on standard hardware (15 instructions, read left to right, top to bottom):
  mov r2, r1
  mov r3, r2
  mov r7, r8
  xor r6, r7    xor r7, r6    xor r6, r7
  xor r6, r5    xor r5, r6    xor r6, r5
  xor r5, r4    xor r4, r5    xor r5, r4
  xor r4, r3    xor r3, r4    xor r4, r3
Question 1: Is it possible to create an instruction set extension that allows implementing an RTG in one processor cycle?
Question 2: Is it worth it?
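The xor instructions in the sequence on this slide come in groups of three, which is the classic way to exchange two registers without a temporary. A minimal Python sketch of that idiom follows; `xor_swap` is a hypothetical helper that models registers as dictionary entries.

```python
def xor_swap(regs, a, b):
    """Exchange regs[a] and regs[b] with three XORs and no temporary.

    a and b must name different registers; swapping a register with
    itself this way would zero it.
    """
    regs[a] ^= regs[b]  # a = a0 ^ b0
    regs[b] ^= regs[a]  # b = b0 ^ (a0 ^ b0) = a0
    regs[a] ^= regs[b]  # a = (a0 ^ b0) ^ a0 = b0


regs = {"r6": 0x2A, "r7": 0x17}
xor_swap(regs, "r6", "r7")
```

Each swap costs three instructions, so the four swaps and three moves on the slide add up to 15 instructions for a single RTG.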

Fundamental Hardware Constraints
Changing the contents of multiple registers in one cycle is very costly
Idea: modify the access to the register file instead of its contents
Swap r1 and r2: exchange the access to r1 and r2
[Figure: register file with r1 = 42 and r2 = 23]
⇒ Restriction to permutations of registers
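One way to picture "modifying the access instead of the contents" is a lookup table between architectural register names and physical storage. The sketch below is a software model under that assumption; the class `RemappedRegisterFile` and its methods are hypothetical and say nothing about how the actual hardware is built.

```python
class RemappedRegisterFile:
    """Register file whose architectural names go through a lookup table."""

    def __init__(self, num_regs):
        self.physical = [0] * num_regs        # stored values, never moved
        self.mapping = list(range(num_regs))  # architectural -> physical index

    def read(self, reg):
        return self.physical[self.mapping[reg]]

    def write(self, reg, value):
        self.physical[self.mapping[reg]] = value

    def swap(self, a, b):
        """Exchange the access to registers a and b without touching contents."""
        self.mapping[a], self.mapping[b] = self.mapping[b], self.mapping[a]


rf = RemappedRegisterFile(32)
rf.write(1, 42)
rf.write(2, 23)
rf.swap(1, 2)  # r1 now reads 23 and r2 reads 42; physical storage is unchanged
```

In this model the table must remain a one-to-one mapping, otherwise some physical register would become unreachable. That is why only permutations of registers can be expressed this way, and copies that duplicate a value cannot.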

ISA Extension
Add permutation instructions to the SPARC V8 ISA
32 registers ⇒ 5 bits to identify one register
7 bits for opcode ⇒ 25 bits left for encoding 5 register numbers
Instruction format:
  Bits    31-28   27-25   24-22   21-20   19-15   14-10   9-5   4-0
  Field   0001    a1      000     a2      b       c       d     e
Two new instructions:
permi5: implement a cyclic RTG with up to 5 elements
permi23: implement two independent cycles with 2 and up to 3 elements
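The bit layout on this slide can be turned into a small encoder. This is a sketch under stated assumptions: the slide shows that register a is split into a 3-bit part a1 and a 2-bit part a2, but not which part holds the upper bits, so the split below is a guess. The function name `encode_permi` is hypothetical.

```python
OPCODE_HI = 0b0001  # bits 31-28, from the slide
OPCODE_LO = 0b000   # bits 24-22, from the slide


def encode_permi(a, b, c, d, e):
    """Pack five 5-bit register numbers into a 32-bit instruction word."""
    for reg in (a, b, c, d, e):
        if not 0 <= reg < 32:
            raise ValueError(f"register number out of range: {reg}")
    a1 = a >> 2    # assumption: upper 3 bits of a go to bits 27-25
    a2 = a & 0b11  # assumption: lower 2 bits of a go to bits 21-20
    return (
        (OPCODE_HI << 28) | (a1 << 25) | (OPCODE_LO << 22) | (a2 << 20)
        | (b << 15) | (c << 10) | (d << 5) | e
    )


word = encode_permi(1, 2, 3, 4, 5)
```

The slide does not show how permi5 and permi23 are told apart, or how cycles with fewer than five registers fill the unused fields, so the sketch leaves both open.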

Examples
[Figure: RTG over r1 to r5] permi5 r1, r2, r3, r4, r5
[Figure: RTG over r1 and r2] permi5 r1, r2
[Figure: RTG over r1 to r4] permi23 r1, r2, r3, r4
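The effect of the two instructions on register values can be modeled in Python. Two assumptions are made here that the slides do not spell out: each operand's value moves to the next operand in the list (wrapping around), and permi23 treats its first two operands as the 2-cycle and the remaining ones as the second cycle. The function names mirror the mnemonics, but the model itself is hypothetical.

```python
def rotate(regs, cycle):
    """Move each register's value to the next register in cycle, wrapping around."""
    old = [regs[r] for r in cycle]
    for i, r in enumerate(cycle):
        regs[r] = old[i - 1]  # index -1 wraps to the last element


def permi5(regs, *cycle):
    """Assumed semantics: one cycle over 2 to 5 registers."""
    if not 2 <= len(cycle) <= 5:
        raise ValueError("permi5 takes 2 to 5 registers")
    rotate(regs, cycle)


def permi23(regs, *operands):
    """Assumed semantics: a 2-cycle on the first two operands, a 2- or 3-cycle on the rest."""
    if not 4 <= len(operands) <= 5:
        raise ValueError("permi23 takes 4 or 5 registers")
    rotate(regs, operands[:2])
    rotate(regs, operands[2:])


regs_a = {f"r{i}": i for i in range(1, 6)}
permi5(regs_a, "r1", "r2", "r3", "r4", "r5")

regs_b = {f"r{i}": i for i in range(1, 6)}
permi23(regs_b, "r1", "r2", "r3", "r4")
```

The model moves values for clarity; the hardware idea from the earlier slide is to change only the access to the registers, with the same visible result.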

Code Generation
Goal: generate efficient code using permi instructions for all RTGs
Question: which RTGs can be implemented using only permi?
RTGs in permutation form:
A permutation can be written as a product of cycles
Cycles can be implemented with permi instructions
[Figure: example RTG in permutation form over r1 to r5]
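The step from a permutation-form RTG to permi instructions can be sketched as follows. This is a simplified illustration and not necessarily the code generator from the talk: it uses only permi5, it assumes that each operand's value moves to the next operand in the list, and it assumes the input really is a permutation. All function names are hypothetical.

```python
def rtg_cycles(edges):
    """Split a permutation-form RTG, given as (src, dst) edges, into cycles."""
    succ = dict(edges)
    seen, cycles = set(), []
    for start in succ:
        if start in seen:
            continue
        cycle, reg = [], start
        while reg not in seen:
            seen.add(reg)
            cycle.append(reg)
            reg = succ[reg]
        if len(cycle) > 1:  # a self-loop needs no code
            cycles.append(cycle)
    return cycles


def permi5_operands(cycle):
    """Split one cycle into operand lists of at most five registers each."""
    ops, rest = [], list(cycle)
    while len(rest) > 5:
        ops.append(rest[:5])
        # rest[0] now holds the value that belongs in rest[5], so it
        # becomes the head of the remaining, shorter cycle.
        rest = [rest[0]] + rest[5:]
    ops.append(rest)
    return ops


def run(regs, operand_lists):
    """Simulate the assumed permi5 semantics on a register dictionary."""
    for cycle in operand_lists:
        old = [regs[r] for r in cycle]
        for i, r in enumerate(cycle):
            regs[r] = old[i - 1]


# Hypothetical example: one cycle r1 -> r2 -> ... -> r7 -> r1.
edges = [(f"r{i}", f"r{i % 7 + 1}") for i in range(1, 8)]
code = [ops for cycle in rtg_cycles(edges) for ops in permi5_operands(cycle)]
```

With this scheme each permi5 puts four registers into their final state, so a cycle of k registers takes ⌈(k − 1) / 4⌉ instructions. The sketch ignores permi23, which could pack two short cycles into a single instruction.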
