Fast Software-managed Code Decompression
Charles Lefurgy and Trevor Mudge
Advanced Computer Architecture Laboratory
Electrical Engineering and Computer Science Dept.
The University of Michigan, Ann Arbor
Compiler and Architecture Support for Embedded Systems (CASES)
2
Motivation
- Problem: embedded code size
– Constraints: cost, area, and power
– Fit program in on-chip memory
– Compilers vs. hand-coded assembly
- Solution: code compression
– Reduce compiled code size
– Take advantage of instruction repetition
- Benefits
– On-chip memory used more effectively
– Trade off performance for code density
– Systems use cheaper processors with smaller on-chip memories
[Figure: two embedded-system block diagrams (CPU, RAM, I/O): one stores the original program in ROM, the other stores the compressed program in ROM]
3
Hardware or software decompression?
- Hardware
– Faster translation
– Examples: CodePack, MIPS-16, Thumb
- Software
– Smaller physical area
– Lower cost
– Quicker re-targeting to new compression algorithms
– Rivals HW solutions on some (loopy) benchmarks
4
Kirovski et al., 1997
- Overview
– Procedure compression
– Decompress and execute 1 procedure at a time
– Store decompressed code in procedure cache
– Cache management
- Results
– 60% compression ratio on SPARC
– 166% execution penalty with 64KB procedure cache
[Figure: compilation and run-time flow: HLL procedures F() {...} and G() {...} are compiled to native code (F: load r5,4 ...; G: addi r7,8 ...) and LZ-compressed; at run time a decompressor and P-cache manager regenerate the native code on demand]
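As a rough illustration of this decompress-on-call flow, here is a minimal C sketch; the P-cache allocator, LZ routine, and per-procedure bookkeeping are assumptions for illustration, not the published design.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the procedure-granularity scheme: on a call to a procedure
 * whose native code is not resident, LZ-decompress its whole body into
 * a software-managed procedure cache (P-cache), then dispatch to it.
 * pcache_alloc() and lz_decompress() are assumed interfaces. */
struct proc {
    const uint8_t *lz_body;    /* LZ-compressed image in ROM */
    size_t         lz_len;     /* compressed length */
    size_t         native_len; /* decompressed (native) length */
    void          *resident;   /* NULL until placed in the P-cache */
};

extern void *pcache_alloc(size_t n);  /* assumed P-cache manager hook */
extern void  lz_decompress(void *dst, const uint8_t *src, size_t len);

/* Return an executable entry point for p, decompressing on demand. */
void *proc_entry(struct proc *p) {
    if (p->resident == NULL) {
        void *buf = pcache_alloc(p->native_len);  /* may evict others */
        lz_decompress(buf, p->lz_body, p->lz_len);
        p->resident = buf;
    }
    return p->resident;
}
```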
5
Dictionary compression algorithm
- Dictionary contains unique instructions
- Replace each program instruction with a short index
[Figure: the original program's .text segment holds 32-bit instructions (add r1,r2,r3; add r1,r2,r3; add r1,r2,r4; add r1,r2,r4; add r1,r2,r4); the compressed .text segment holds 16-bit indices (5, 5, 30, 30, 30); the .dictionary segment holds each unique 32-bit instruction once (add r1,r2,r3 at index 5, add r1,r2,r4 at index 30)]
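A minimal sketch of the encoding step, assuming 32-bit instruction words, 16-bit indices, and at most 64K unique instructions; a real tool would rewrite the object file's .text section and patch branch targets, which this omits.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative encoder: collect each unique 32-bit instruction word
 * into a dictionary and emit one 16-bit index per instruction. Linear
 * search keeps the sketch short; a real tool would hash. Assumes the
 * program has fewer than 64K unique instructions. */
#define MAX_DICT 65536

static uint32_t dict[MAX_DICT];
static size_t   dict_size;

static uint16_t dict_index(uint32_t insn) {
    for (size_t i = 0; i < dict_size; i++)
        if (dict[i] == insn)
            return (uint16_t)i;          /* already in .dictionary */
    dict[dict_size] = insn;              /* first occurrence: add it */
    return (uint16_t)dict_size++;
}

/* Compress n instruction words from text[] into 16-bit indices. */
void compress_text(const uint32_t *text, uint16_t *indices, size_t n) {
    for (size_t i = 0; i < n; i++)
        indices[i] = dict_index(text[i]);
}
```

On the five-instruction example above, the output is five 16-bit indices plus a two-entry dictionary, roughly halving the .text segment.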
6
Compression ratio
- Compression ratios
– Dictionary: 65% - 82%
– LZRW1: 55% - 63%

Benchmark    Original size (bytes)    Dict. compression    LZRW1 compression
cc1                      1,083,168                65.4%                60.4%
vortex                     495,248                65.8%                55.5%
go                         310,576                69.6%                63.9%
perl                       267,568                73.7%                60.2%
ijpeg                      198,272                77.2%                61.5%
mpeg2enc                   119,600                82.5%                60.5%
pegwit                      88,800                79.5%                56.7%

compression ratio = compressed size / original size
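For example, applying the formula to the table: cc1 under dictionary compression occupies about 0.654 × 1,083,168 ≈ 708,000 bytes; lower ratios mean better compression.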
7
Decompression code
- Simple
– Small static code size: 25 instructions
- Fast
– Less than 3 instructions per output byte
– 74 dynamic instructions per decompressed cache line
- Algorithm
– Invoke decompressor on L1 I-cache miss
– Decompress 1 complete cache line
– For each instruction in the cache line:
- Read index
- Reference dictionary with index to get instruction
- Put instruction in I-cache
- HW Support
– L1-cache miss exception
– Write into I-cache
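A C sketch of this miss handler under stated assumptions: 32-byte lines of 4-byte instructions, a compressed .text segment indexed by native instruction number starting at address 0, and an icache_fill() primitive standing in for the write-into-I-cache support. The paper's actual handler is ~25 assembly instructions; this is only illustrative.

```c
#include <stdint.h>

#define LINE_BYTES 32                  /* I-cache line size */
#define INSN_BYTES 4                   /* fixed-width instructions */
#define LINE_INSNS (LINE_BYTES / INSN_BYTES)

extern const uint32_t dictionary[];    /* .dictionary: unique instructions */
extern const uint16_t indices[];       /* compressed .text: one index/insn */
extern void icache_fill(uint32_t vaddr, uint32_t insn);  /* assumed HW hook */

/* Invoked by the L1 I-cache miss exception; fills one whole line.
 * Assumes native .text begins at virtual address 0. */
void miss_handler(uint32_t miss_vaddr) {
    uint32_t line  = miss_vaddr & ~(uint32_t)(LINE_BYTES - 1);
    uint32_t first = line / INSN_BYTES;  /* instruction # of line start */
    for (uint32_t i = 0; i < LINE_INSNS; i++) {
        uint32_t insn = dictionary[indices[first + i]];  /* index -> insn */
        icache_fill(line + INSN_BYTES * i, insn);        /* fill line word */
    }
}
```

Eight lookups of a few instructions each, plus exception overhead, is consistent with the 74 dynamic instructions per line quoted above.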
8
Optimizations
- Partial decompression (see sketch after this list)
– decompress from the missed instruction to the end of the cache line
– use a valid bit per word in the cache line to mark instructions at the beginning of the line as invalid
– avoids decompressing instructions that may never be executed
– up to 12% speedup
- Second register file
– Many embedded processors have an additional register file
– Avoid save/restore of registers when the decompressor runs
– 2nd register file with partial decompression: up to 16% speedup
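Keeping the same assumed interfaces as the earlier sketch, partial decompression only changes the fill loop: words before the missed instruction are marked invalid via the per-word valid bits (modeled here by an assumed icache_mark_invalid()) rather than decompressed.

```c
#include <stdint.h>

#define LINE_BYTES 32
#define INSN_BYTES 4
#define LINE_INSNS (LINE_BYTES / INSN_BYTES)

extern const uint32_t dictionary[];
extern const uint16_t indices[];
extern void icache_fill(uint32_t vaddr, uint32_t insn);  /* assumed */
extern void icache_mark_invalid(uint32_t vaddr);         /* assumed: clears
                                                            the word's valid bit */

/* Fill only from the missed word to the end of the line; earlier words
 * stay invalid and are decompressed by a later miss if ever executed
 * (e.g. reached by a branch back into the line). */
void miss_handler_partial(uint32_t miss_vaddr) {
    uint32_t line  = miss_vaddr & ~(uint32_t)(LINE_BYTES - 1);
    uint32_t first = line / INSN_BYTES;
    uint32_t skip  = (miss_vaddr - line) / INSN_BYTES;
    for (uint32_t i = 0; i < LINE_INSNS; i++) {
        uint32_t va = line + INSN_BYTES * i;
        if (i < skip)
            icache_mark_invalid(va);   /* skipped: may never execute */
        else
            icache_fill(va, dictionary[indices[first + i]]);
    }
}
```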
9
Simulation environment
- SimpleScalar
– Modified to support compression
- 5-stage, in-order pipeline
– Simple embedded processor
- D-cache
– 8KB, 16B lines, 2-way
- I-cache
– 1 to 64KB, 32B lines, 2-way
- Memory
– 10-cycle latency, 2-cycle rate
10
Performance: cc1
[Graph: slowdown relative to native code (1x to 6x) vs. I-cache size (1KB to 64KB) for compressed, partial, partial+regfile, and native code]
11
Performance: ijpeg
[Graph: slowdown relative to native code (1x to 6x) vs. I-cache size (1KB to 64KB) for compressed, partial, partial+regfile, and native code]
12
Performance summary
- Data from CINT95 and MediaBench with several cache sizes
- Control slowdown by optimizing I-cache miss ratio
– Code layout may help
[Graph: slowdown relative to native code (1x to 6x) vs. I-cache miss ratio (0% to 15%) for compressed, partial, and partial+regfile]
13
Performance summary, cont.
- Magnification of previous graph
- Slowdown under 3x when I-cache miss ratio is under 2%
- Slowdown under 2x when I-cache miss ratio is under 1%
[Graph: magnified view: slowdown relative to native code (1x to 4x) vs. I-cache miss ratio (0.0% to 3.0%) for compressed, partial, and partial+regfile]
14
Conclusions
- Line-based decompression beats procedure-based
– use normal cache as the decompression buffer
– no fragmentation management as in procedure-based decompression
– order of magnitude performance difference: a previous decompressor with procedure granularity had 100x slowdown on gcc and go [Kirovski97]
- Compressed code fills the gap between native and interpreted code
– near the quick execution of native code
– near the small size of interpreted code
15