Approximating to the Last Bit
Thierry Moreau, Adrian Sampson, Luis Ceze
{moreau, luisceze}@cs.washington.edu, asampson@cornell.edu
WAX 2016, co-located with ASPLOS 2016, April 3rd 2016

What this Talk is About
How many bits in a program are really that important?
1 - AXE: Quality Tuning Framework
2 - PERFECT Benchmark Study

Precision Tuning
More precision means a larger memory footprint, more data movement, and more energy used in computation.
[Figure: precision levels, float and double compared with widths from 1 to n bits]
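The slide does not say how a value's precision is actually reduced. As a minimal sketch, assuming precision reduction is modeled by zeroing the low mantissa bits of an IEEE 754 single-precision value, the effect of keeping fewer bits can be tried out as below. The function name `truncate_float32` and the truncation model itself are illustrative assumptions, not the mechanism used by AXE or ACCEPT.

```python
import struct


def truncate_float32(x: float, kept_mantissa_bits: int) -> float:
    """Round x to float32, then zero all but the top mantissa bits.

    Hypothetical model of reduced precision: the sign and exponent are
    kept, and only `kept_mantissa_bits` of the 23 explicit mantissa bits
    survive.
    """
    if not 0 <= kept_mantissa_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    dropped = 23 - kept_mantissa_bits
    # Mask with the low `dropped` bits cleared.
    mask = (0xFFFFFFFF >> dropped) << dropped
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


if __name__ == "__main__":
    for kept in (23, 16, 8, 4):
        print(kept, truncate_float32(3.14159, kept))
```

Fewer kept bits give a coarser value, which is the trade-off the slide describes: less data to store and move, at the cost of accuracy.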

AXE Precision Tuning Framework
Goal: maximize bit-savings given a quality target

AXE Precision Tuning Framework
[Diagram: kernel.c and a quality target go into the AXE framework, which reports quality & bit-savings and instruction-level precision requirements]
Built on top of ACCEPT, the approximate C/C++ compiler: http://accept.rocks

AXE Precision Tuning Framework
[Figure: bit-savings for each instruction (instruction 0, instruction 1, instruction 2, …, instruction n-1, instruction n) next to a quality scale running from bad to OK, shown for three configurations]
Default (no bit-savings)
Coarse-Grained Precision Reduction
Fine-Grained Precision Reduction

PERFECT Benchmark Suite
Application domain: kernels
PERFECT Application 1: Discrete Wavelet Transform, 2D Convolution, Histogram Equalization
Space Time Adaptive Processing: Outer Product, System Solve, Inner Product
Synthetic Aperture Radar: Interpolation 1, Interpolation 2, Back Projection
Wide Area Motion Imaging: Debayer, Image Registration, Change Detection
Required Kernels: FFT 1D, FFT 2D
Metric: Signal to Noise Ratio (SNR), from 120dB to 10dB (0.0001% to 31.6% MSE)
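The SNR and error figures quoted in the deck (120dB with 0.0001%, 10dB with 31.6%, and the 0.001% and 10% labels on the bit-savings chart) are all consistent with error = 10^(-SNR/20). The sketch below applies that relation; it is inferred from the quoted pairs, not stated on the slides, and the function name is hypothetical.

```python
def snr_db_to_error_percent(snr_db: float) -> float:
    """Convert an SNR in dB to an error percentage.

    Assumes the relation error = 10 ** (-SNR / 20), which reproduces the
    SNR/error pairs quoted in the deck.
    """
    return 100.0 * 10.0 ** (-snr_db / 20.0)


if __name__ == "__main__":
    for snr in (10, 20, 100, 120):
        print(f"{snr} dB -> {snr_db_to_error_percent(snr):.4g}% error")
```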

1 - PERFECT Dynamic Instruction Mix
[Pie chart, legend: Safe to approximate / Precise]
control 11%, load/store 27%, int arith 25%, int arith 4%, math 1%, fp arith 31%

1 - PERFECT Dynamic Instruction Mix
Long-latency ops are all safe to approximate: math 1%, fp arith 31%

1 - PERFECT Dynamic Instruction Mix
Memory ops are mostly safe to approximate (mostly data vs. pointers): load/store 27%

1 - PERFECT Dynamic Instruction Mix
Control and address computation must remain precise: control 11%, int arith 25%

2 - Bit-Savings over Approximate Instructions
[Bar chart: bit-savings against average SNR, running from approximate (low SNR) to high quality (high SNR)]
Average SNR (dB):  10   20   40   60   80   100  120
Bit-savings:       83%  74%  57%  48%  40%  32%  26%
Marked on the chart: Approximate Computing, 10% MSE; PERFECT Manual, 0.001% MSE

Future Architectural Challenges
Mechanisms to translate bit-savings into energy savings?
New data types/representations?
ISA extensions?

Thank You!
Approximating to the Last Bit
Thierry Moreau, Luis Ceze, Adrian Sampson
{moreau, luisceze}@cs.washington.edu, asampson@cornell.edu
WAX 2016, co-located with ASPLOS 2016, April 3rd 2016

Backup Slides

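Bit Savings
Explore the opportunity for precision reduction in a hardware-agnostic way:
$$\mathrm{BitSavings} = \sum_{\mathrm{insn} \in \mathrm{static}} \frac{\mathrm{precision}_{\mathrm{ref}} - \mathrm{precision}_{\mathrm{approx}}}{\mathrm{precision}_{\mathrm{ref}}} \times \frac{\mathrm{execs}_{\mathrm{insn}}}{\mathrm{execs}_{\mathrm{total}}}$$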
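The bit-savings metric on this slide is a weighted average: each static instruction contributes its fractional precision reduction, weighted by its share of dynamic executions. A minimal sketch of that computation follows. The `InsnProfile` record and its field names are hypothetical, and the total execution count is assumed to be the sum over the listed instructions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class InsnProfile:
    """Hypothetical per-instruction record used only for this sketch."""
    precision_ref: int     # bits in the reference (full-precision) run
    precision_approx: int  # bits kept in the approximate configuration
    execs: int             # dynamic execution count of this instruction


def bit_savings(profiles: Iterable[InsnProfile]) -> float:
    """Execution-weighted fraction of bits saved over the given instructions."""
    profiles = list(profiles)
    # Assumption: execs_total is the sum over the instructions passed in.
    total = sum(p.execs for p in profiles)
    if total == 0:
        return 0.0
    return sum(
        (p.precision_ref - p.precision_approx) / p.precision_ref * p.execs / total
        for p in profiles
    )


if __name__ == "__main__":
    example = [InsnProfile(32, 16, 100), InsnProfile(32, 32, 100)]
    print(bit_savings(example))
```

In the example, one instruction keeps half of its bits and the other keeps all of them, and both run equally often, so a quarter of all bits are saved.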

Framework Overview
[Diagram] Annotated Program -> ACCEPT static analysis -> ILPC* -> ACCEPT error injection & instrumentation -> Approximate Binary -> Execution & Quality Assessment
Program Inputs & Quality Metrics feed Execution & Quality Assessment
Quality Results feed the Autotuner
Output: Quality Configuration & Bit Savings
* Instruction-level Precision Configuration
Built on top of ACCEPT, the approximate C/C++ compiler: http://accept.rocks

Program Annotation
[Framework diagram repeated, with the Annotated Program stage highlighted]
void conv2d (pix *in, pix *out, flt *filter) {
  for (row) {
    for (col) {
      flt sum = 0
      int dstPos = …
      for (row_offset) {
        for (col_offset) {
          int srcPos = …
          int fltPos = …
          sum += in[srcPos] * filter[fltPos]
        }
      }
      out[dstPos] = sum / normFactor
    }
  }
}

Program Annotation
[Framework diagram repeated, with the Annotated Program stage highlighted]
Key: use the APPROX type qualifier
void conv2d (APPROX pix *in, APPROX pix *out, APPROX flt *filter) {
  for (row) {
    for (col) {
      APPROX flt sum = 0
      int dstPos = …
      for (row_offset) {
        for (col_offset) {
          int srcPos = …
          int fltPos = …
          sum += in[srcPos] * filter[fltPos]
        }
      }
      out[dstPos] = sum / normFactor
    }
  }
}

Program Annotation: tips on annotating programs faster
[Framework diagram repeated, with the Annotated Program stage highlighted]
Before:
  typedef float flt
  typedef int pix
After:
  typedef APPROX float flt
  typedef APPROX int pix
Takeaways:
Annotating data is intuitive (~10 mins to annotate a kernel)
Variables used to index arrays cannot be safely approximated

Static Analysis
[Framework diagram repeated, with the ACCEPT static analysis stage highlighted; the annotated conv2d kernel is shown next to the configuration ACCEPT derives from it]
Instruction-Level Precision Configuration (ILPC):
  conv2d:13:7:load:Int32
  conv2d:13:10:load:Float
  conv2d:13:11:fmul:Float
  conv2d:13:12:fadd:Float
  conv2d:15:1:fdiv:Float
  conv2d:15:7:store:Int32
ACCEPT identifies safe-to-approximate instructions from data annotations using flow analysis
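The ILPC entries on this slide are colon-separated records. A minimal sketch of reading one into a structured form is given below. The field names are guesses: the slide does not label the fields, so the two numeric fields are kept as neutral positions rather than interpreted, and `IlpcEntry` and `parse_ilpc_line` are hypothetical names, not part of ACCEPT.

```python
from typing import NamedTuple


class IlpcEntry(NamedTuple):
    """One ILPC record; field names are assumptions made for this sketch."""
    function: str   # e.g. "conv2d"
    pos_a: int      # first numeric field (meaning not stated on the slide)
    pos_b: int      # second numeric field (meaning not stated on the slide)
    opcode: str     # e.g. "load", "fmul", "store"
    type_name: str  # e.g. "Float", "Int32"


def parse_ilpc_line(line: str) -> IlpcEntry:
    """Split a 'function:a:b:opcode:type' record into its five fields."""
    parts = line.strip().split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    function, pos_a, pos_b, opcode, type_name = parts
    return IlpcEntry(function, int(pos_a), int(pos_b), opcode, type_name)


if __name__ == "__main__":
    print(parse_ilpc_line("conv2d:13:11:fmul:Float"))
```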

Error Injection
[Framework diagram repeated, with the ACCEPT error injection & instrumentation stage highlighted]
Instruction-Level Precision Configuration (ILPC) -> ACCEPT -> Approximate Binary
ILPC entries: conv2d:13:7:load:Int32, conv2d:13:10:load:Float, conv2d:13:11:fmul:Float, conv2d:13:12:fadd:Float, conv2d:15:1:fdiv:Float, conv2d:15:7:store:Int32
Each instruction in the ILPC acts as a quality knob that the autotuner can use to maximize bit-savings

Quality Assessment
[Framework diagram repeated, with the Execution & Quality Assessment stage highlighted]
[Diagram: eval.py takes the outputs of the Reference Binary and the Approximate Binary and reports a quality score, here 10dB SNR]
The programmer provides a quality assessment script to evaluate quality on the program output
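The contents of eval.py are not shown on the slide. As a minimal sketch of what such a quality assessment script might compute, the function below scores an approximate output against a reference output as an SNR in dB. The function name, the flat-list input format, and the exact SNR definition (signal energy over error energy) are assumptions made for illustration.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], approximate: Sequence[float]) -> float:
    """SNR in dB of `approximate` measured against `reference`.

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - approx)^2)).
    """
    if len(reference) != len(approximate):
        raise ValueError("outputs must have the same length")
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approximate))
    if noise == 0:
        return math.inf   # outputs are identical
    if signal == 0:
        return -math.inf  # all-zero reference with a nonzero error
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    ref = [10.0, -10.0, 10.0, -10.0]
    approx = [9.0, -9.0, 9.0, -9.0]
    print(f"{snr_db(ref, approx):.1f} dB")
```

An autotuner can compare a score like this against the quality target to decide whether a given precision configuration is acceptable.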

Autotuner
[Framework diagram repeated, with the Autotuner stage highlighted]
Greedy iterative algorithm: reduces the precision requirement of the instruction that impacts quality the least
  config k: error = 0.10%
  …
  config [k+1, i-1]: error = 5.91%
  config [k+1, i]: error = 0.30%
  config [k+1, i+1]: error = 0.12%
  …
  config [k+2, i-1]: error = 5.91%
  config [k+2, i]: error = 0.33%
  config [k+2, i+1]: error = 1.6%
  …
Finds a solution in $O(m^2 n)$ in the worst case, where m is the number of static safe-to-approximate instructions and n is the number of precision levels for all instructions
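The greedy search described on this slide can be sketched as follows: at each step, try lowering every instruction's precision by one level, keep the single change that hurts quality the least, and stop when no change stays within the error budget. This is an illustrative reading of the slide, not the AXE implementation. `greedy_tune`, its parameters, and the `evaluate_error` callback are hypothetical names.

```python
from typing import Callable, Dict, List


def greedy_tune(
    insns: List[str],
    max_bits: Dict[str, int],
    evaluate_error: Callable[[Dict[str, int]], float],
    error_budget: float,
) -> Dict[str, int]:
    """Greedily remove precision one bit at a time within an error budget.

    `evaluate_error` stands in for running the approximate binary and the
    quality script on a candidate configuration.
    """
    config = {insn: max_bits[insn] for insn in insns}
    while True:
        best = None  # (error, instruction) of the least harmful reduction
        for insn in insns:
            if config[insn] == 0:
                continue  # nothing left to remove from this instruction
            trial = dict(config)
            trial[insn] -= 1
            err = evaluate_error(trial)
            if err <= error_budget and (best is None or err < best[0]):
                best = (err, insn)
        if best is None:
            return config  # every further reduction exceeds the budget
        config[best[1]] -= 1


if __name__ == "__main__":
    full = {"fmul": 4, "fadd": 4}
    weights = {"fmul": 1.0, "fadd": 0.1}

    def toy_error(cfg: Dict[str, int]) -> float:
        # Toy model: each removed bit adds a fixed, per-instruction error.
        return sum(weights[i] * (full[i] - cfg[i]) for i in cfg)

    print(greedy_tune(list(full), full, toy_error, error_budget=0.45))
```

Each outer iteration evaluates at most m candidate configurations, and at most m times n single-level reductions can be accepted in total, so the number of evaluations in this sketch is bounded by m^2 n, in line with the worst case quoted on the slide.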
