Improving Program Efficiency by Packing Instructions into Registers

Stephen Hines, Joshua Green, Gary Tyson, David Whalley
Computer Science Dept., Florida State University
June 7, 2005

◆ Introduction
• Embedded Processor Design Constraints
  – Power Consumption
  – Static Code Size
  – Execution Time
• Fetch logic consumes 36% of total processor power on StrongARM
  – Instruction Cache (IC) and/or ROM: lower power than a large memory store, but still a fairly large, flat storage method
• Instruction encodings can be wasteful with bits
  – Nowhere near theoretical compression limits
  – Maximize functionality, but simplify decoding (fixed length)
  – Most applications only apply a subset of available instructions

◆ Access of Data & Instructions
[Figure: memory hierarchy with the labels Main Memory, L2 Cache, L1 Data Cache, L1 Instruction Cache, Data Register File, and a "???" placeholder]
• Each lower layer is designed to improve accessibility of current/frequent items, albeit at a reduction in number of available items
• Caching is beneficial, but compilers can do better for the “most frequently” accessed data items (e.g. Register Allocation)
• Instructions have no analogue to the Data Register File (RF)

◆ Instruction Register File (IRF)
[Figure: pipeline diagram with the labels IF Stage, First Half of ID Stage, PC, ROM or L1 IC, instruction buffer, IRF, IF/ID]
• Stores frequently occurring instructions as specified by the compiler (potentially in a partially decoded state)
• Allows multiple instruction fetch with packed instructions

◆ Dynamic Instruction Redundancy
[Figure: Total Instruction Frequency (%) versus Number of Distinct Instructions (16 to 128), with curves for susan, pgp, patricia, gsm, jpeg, ghostscript, and the average]
• Profiling the largest benchmark in each category of MiBench
• 32-entry IRF can capture 66.51% of all dynamic instructions executed on average
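The coverage figure quoted above can be read as "what fraction of executed instructions belong to the N most frequent distinct instructions". A minimal sketch of that measurement follows; the function name `irf_coverage` and the toy trace are hypothetical and are not taken from the MiBench profiling setup.

```python
from collections import Counter

def irf_coverage(trace, irf_size=32):
    """Fraction of dynamic instructions that fall within the irf_size
    most frequently executed distinct instructions."""
    counts = Counter(trace)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(irf_size))
    return covered / total

# Toy dynamic trace (illustrative only): 10 executed instructions, 3 distinct.
trace = ["addu r5,r5,r4"] * 6 + ["lw r3,8(r29)"] * 3 + ["nop"] * 1
print(irf_coverage(trace, irf_size=2))  # 0.9
```

Plotting this value for increasing `irf_size` would give a curve of the same kind as the one on the slide.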

◆ ISA Modifications
• MIPS ISA: commonly known and provides simple encoding
• RISA (Register ISA): instructions available via IRF access
• MISA (Memory ISA): instructions available in memory
  – Create new instruction formats that can reference multiple RISA instructions (Tightly Packed)
  – Modify original instructions to be able to pack an additional RISA instruction reference (Loosely Packed)
• Increase packing abilities
  – Parameterization
  – Positional Register Specifiers

◆ Tightly Packed Instruction Format

Field:  opcode   inst1    inst2    inst3    inst4/param   s       inst5/param
Width:  6 bits   5 bits   5 bits   5 bits   5 bits        1 bit   5 bits

• New opcodes for this T-format of MISA instructions
• Supports sequential execution of up to 5 RISA instructions from the IRF
  – Unnecessary fields are padded with nop
• Supports up to 2 parameters replacing instruction slots
  – Parameters can come from 32-entry Immediate Table (IMM)
  – Each IRF entry retains a default immediate value as well
  – Branches use these 5 bits for displacements
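The field widths above add up to one 32-bit word (6 + 5 + 5 + 5 + 5 + 1 + 5). A minimal sketch of packing and unpacking such a word follows. The most-significant-first bit order, the numeric opcode value, and the helper names `pack_t`, `unpack_t`, and `signed5` are assumptions for illustration; the slides do not specify them, and the meaning of the `s` bit is left opaque here.

```python
# (name, width in bits), assumed to be laid out most significant field first.
T_FORMAT = [("opcode", 6), ("inst1", 5), ("inst2", 5), ("inst3", 5),
            ("inst4", 5), ("s", 1), ("inst5", 5)]

def pack_t(fields):
    """Pack a dict of field values into one 32-bit T-format word.
    Missing fields default to 0; negative values are stored in two's complement."""
    word = 0
    for name, width in T_FORMAT:
        value = fields.get(name, 0)
        if not -(1 << (width - 1)) <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | (value & ((1 << width) - 1))
    return word

def unpack_t(word):
    """Split a 32-bit word back into unsigned field values."""
    fields = {}
    shift = 32
    for name, width in T_FORMAT:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

def signed5(value):
    """Interpret a 5-bit field as a signed displacement."""
    return value - 32 if value & 0x10 else value

HYPOTHETICAL_OPCODE = 0b111010  # placeholder; real T-format opcodes are not given

word = pack_t({"opcode": HYPOTHETICAL_OPCODE, "inst1": 1, "inst2": 3,
               "inst3": 2, "inst4": 3, "s": 1, "inst5": -5})
fields = unpack_t(word)
print(fields["inst1"], fields["inst2"], fields["inst3"])  # 1 3 2
print(signed5(fields["inst5"]))  # -5
```

The 5-bit signed range (−16 to 15) is what limits branch displacements that are carried in a parameter slot.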

◆ Positional Register Specifiers

#    RTL                  RTL (positional)
1    r[2]=R[r[29]+4];     r[2]=R[r[29]+4];
2    r[2]=r[2]+r[5];      s[0]=s[0]+r[5];
3    R[r[29]+4]=r[2];     R[u[2]+4]=s[0];
     ...                  ...
4    r[3]=R[r[29]+4];     r[3]=R[r[29]+4];
5    r[3]=r[3]+r[5];      s[0]=s[0]+r[5];
6    R[r[29]+4]=r[3];     R[u[2]+4]=s[0];

• Abstract out common register usage patterns (e.g. load/add/store)
• Increases code redundancy, so greater opportunity for compression
• Positional register values can be obtained via modifications to standard pipeline register forwarding logic
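The redundancy gain can be checked directly on the six RTLs from the table: counting distinct instructions in each column shows how many IRF entries each form would need. This is only a count over the slide's own example, not a model of how the specifiers are resolved in hardware.

```python
# The two columns of the table, transcribed as strings.
plain = [
    "r[2]=R[r[29]+4];", "r[2]=r[2]+r[5];", "R[r[29]+4]=r[2];",
    "r[3]=R[r[29]+4];", "r[3]=r[3]+r[5];", "R[r[29]+4]=r[3];",
]
positional = [
    "r[2]=R[r[29]+4];", "s[0]=s[0]+r[5];", "R[u[2]+4]=s[0];",
    "r[3]=R[r[29]+4];", "s[0]=s[0]+r[5];", "R[u[2]+4]=s[0];",
]

distinct_plain = len(set(plain))
distinct_positional = len(set(positional))
print(distinct_plain, distinct_positional)  # 6 4
```

With positional specifiers, the add and the store become identical across both sequences, so two IRF entries serve four instructions.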

◆ Compiler Modifications
[Figure: compilation flow. C Source Files → VPO Compiler → Profiling Executable → Static and Dynamic Profile Data → IRF Analyzer → IRF/IMM Data → VPO Compiler → Executable]
• VPO: Very Portable Optimizer targeted for SimpleScalar MIPS/Pisa
• IRF-resident instructions are selected by a greedy algorithm using profile data including parameterization/positional hints
• Iterative packing process using a sliding window to allow branch displacements to slip into (5-bit) range
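One plausible reading of the selection step is a frequency-ranked pass over the profile in which instructions that differ only in their immediate share one entry. The sketch below assumes that reading; it is not the IRF analyzer's actual algorithm. The profile layout, the reserved `nop` entry at index 0, and the name `select_irf` are all hypothetical.

```python
from collections import Counter, defaultdict

def select_irf(profile, irf_size=32):
    """Pick IRF entries from profile data.

    profile maps (template, immediate) -> dynamic count, where template is the
    instruction text with its immediate abstracted out and immediate is None
    for instructions that have none. Returns a list of (template, default)."""
    by_template = defaultdict(Counter)
    for (template, imm), count in profile.items():
        by_template[template][imm] += count

    # Rank templates by total dynamic count across all immediates.
    ranked = sorted(by_template.items(),
                    key=lambda item: sum(item[1].values()),
                    reverse=True)

    irf = [("nop", None)]  # assumption: entry 0 is reserved for nop
    for template, imms in ranked[:irf_size - 1]:
        default_imm = imms.most_common(1)[0][0]  # most frequent immediate
        irf.append((template, default_imm))
    return irf

# Toy profile (illustrative counts only).
profile = {
    ("addiu r[5], r[3], IMM", 1): 50,
    ("addiu r[5], r[3], IMM", 32): 20,
    ("addu r[5], r[5], r[4]", None): 60,
    ("andi r[3], r[3], IMM", 63): 40,
    ("lw r[3], IMM(r[29])", 8): 5,
}
for index, entry in enumerate(select_irf(profile, irf_size=4)):
    print(index, entry)
```

Grouping by template is what lets the less frequent `addiu ... 32` reuse the `addiu` entry through a parameter instead of needing its own slot.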

Instruction Register File
#     Instruction               Default
0     nop                       NA
1     addiu r[5], r[3], 1       1
2     beq r[5], r[0], 0         None
3     addu r[5], r[5], r[4]     NA
4     andi r[3], r[3], 63       63
...   ...                       ...

Immediate Table
#     Value
...   ...
3     32
4     63
...   ...

Original Code Sequence
lw r[3], 8(r[29])
andi r[3], r[3], 63
addiu r[5], r[3], 32
addu r[5], r[5], r[4]
beq r[5], r[0], −8

Marked IRF Sequence
lw r[3], 8(r[29])
IRF[4], default (4)
IRF[1], param (3)
IRF[3]
IRF[2], param (branch −8)

Packed Code Sequence
lw r[3], 8(r[29]) {4}
param3_AC {1,3,2} {3,−5}

Encoded Packed Sequence
opcode      rs      rt      immediate   irf
lw          29      3       8           4

opcode      inst1   inst2   inst3   param   s   param
param3_AC   1       3       2       3       1   −5
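The packed sequence can be expanded back into individual instructions using the two tables from this example. The sketch below assumes that `param3_AC` means "three IRF instructions, with the first parameter applied to the first instruction and the second to the third"; that interpretation is inferred from this one example and is not stated on the slides. The helper names and the `{imm}` template notation are hypothetical.

```python
# Tables transcribed from the example; {imm} marks the immediate slot.
IRF = {
    0: ("nop", None),
    1: ("addiu r[5], r[3], {imm}", 1),
    2: ("beq r[5], r[0], {imm}", None),
    3: ("addu r[5], r[5], r[4]", None),
    4: ("andi r[3], r[3], {imm}", 63),
}
IMM = {3: 32, 4: 63}

def expand_irf(index, imm=None):
    """Return the instruction text for an IRF entry, using its default
    immediate unless an explicit one is supplied."""
    template, default = IRF[index]
    if "{imm}" not in template:
        return template
    return template.format(imm=default if imm is None else imm)

def expand_loose(misa_text, irf_index):
    """Loosely packed: a regular instruction followed by one IRF reference."""
    return [misa_text, expand_irf(irf_index)]

def expand_param3_ac(insts, params):
    """Tightly packed, assumed semantics: params[0] indexes IMM for the first
    instruction, params[1] is a branch displacement for the third."""
    first, second, third = insts
    imm_index, displacement = params
    return [
        expand_irf(first, IMM[imm_index]),
        expand_irf(second),
        expand_irf(third, displacement),
    ]

expanded = (expand_loose("lw r[3], 8(r[29])", 4)
            + expand_param3_ac((1, 3, 2), (3, -5)))
for line in expanded:
    print(line)
```

The output matches the original five instructions except for the branch displacement, which the slide shows as −8 before packing and −5 in the packed form.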
