
Approximating to the Last Bit: Thierry Moreau, Adrian Sampson, Luis Ceze


  1. Approximating to the Last Bit. Thierry Moreau, Adrian Sampson, Luis Ceze ({moreau, luisceze}@cs.washington.edu, asampson@cornell.edu). WAX 2016, co-located with ASPLOS 2016, April 3rd 2016.

  2. What this Talk is About: How many bits in a program are really that important? (1) AXE: Quality Tuning Framework. (2) PERFECT Benchmark Study.

  3. Precision Tuning: More precision means a larger memory footprint, more data movement, and more energy used in computation.

  4. Precision Tuning (build of slide 3): adds the labels "float" and "double".

  5. Precision Tuning (build of slide 3): adds the labels "float", "double", "1", and "n".
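To make the idea of "fewer bits than float" concrete, here is a minimal sketch that keeps only the leading mantissa bits of a single-precision value. It is an illustration only: the slides do not say how AXE models reduced precision, so the truncation-by-masking approach and the function name `truncate_float32` are assumptions.

```python
import struct


def truncate_float32(x: float, kept_mantissa_bits: int) -> float:
    """Keep only the top `kept_mantissa_bits` of the 23-bit float32 mantissa.

    Hypothetical helper: truncation (rounding toward zero) is one simple way
    to emulate a narrower floating-point format, not necessarily AXE's way.
    """
    if not 0 <= kept_mantissa_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    # Reinterpret the value as its 32-bit IEEE 754 encoding.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    dropped = 23 - kept_mantissa_bits
    # Zero the low-order mantissa bits; sign and exponent are untouched.
    mask = (0xFFFFFFFF << dropped) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]
```

For example, `truncate_float32(1.75, 1)` returns `1.5`: binary 1.11 loses its second fractional bit.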

  6. AXE Precision Tuning Framework. Goal: maximize bit-savings given a quality target.

  7. AXE Precision Tuning Framework. [Diagram: kernel.c and a quality target go into the AXE framework, which reports quality & bit-savings and instruction-level precision requirements.] Built on top of ACCEPT, the approximate C/C++ compiler: http://accept.rocks

  8. AXE Precision Tuning Framework: Default (no bit-savings). [Figure: bit-savings per instruction (instruction 0, 1, 2, …, n-1, n), with a quality indicator ranging from bad to OK.]

  9. AXE Precision Tuning Framework: Coarse-Grained Precision Reduction. [Figure: bit-savings per instruction (instruction 0, 1, 2, …, n-1, n), with a quality indicator ranging from bad to OK.]

  10. AXE Precision Tuning Framework: Fine-Grained Precision Reduction. [Figure: bit-savings per instruction (instruction 0, 1, 2, …, n-1, n), with a quality indicator ranging from bad to OK.]

  11. PERFECT Benchmark Suite

      Application Domain               Kernels
      PERFECT Application 1            Discrete Wavelet Transform, 2D Convolution, Histogram Equalization
      Space Time Adaptive Processing   Outer Product, System Solve, Inner Product
      Synthetic Aperture Radar         Interpolation 1, Interpolation 2, Back Projection
      Wide Area Motion Imaging         Debayer, Image Registration, Change Detection
      Required Kernels                 FFT 1D, FFT 2D

      Metric: Signal to Noise Ratio (SNR), from 120dB to 10dB (0.0001% to 31.6% MSE).
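The paired ranges on this slide (120dB to 10dB, 0.0001% to 31.6%) are consistent with the relation error = 10^(-SNR/20). The sketch below encodes that relation; it is inferred from the slide's numbers rather than stated by the authors, and both function names are hypothetical.

```python
import math


def snr_db_to_error_percent(snr_db: float) -> float:
    """Convert an SNR in dB to a relative error in percent.

    Assumes error = 10^(-SNR/20), which reproduces the slide's endpoints.
    """
    return 100.0 * 10.0 ** (-snr_db / 20.0)


def error_percent_to_snr_db(error_percent: float) -> float:
    """Inverse conversion: relative error in percent to SNR in dB."""
    if error_percent <= 0:
        raise ValueError("error must be positive")
    return -20.0 * math.log10(error_percent / 100.0)
```

Under this assumption, 10dB maps to about 31.6% and 120dB to 0.0001%, matching the two endpoints quoted on the slide.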

  12. 1 - PERFECT Dynamic Instruction Mix. [Chart: control 11%, load/store 27%, int arith 25%, int arith 4%, math 1%, fp arith 31%. Legend: Safe to approximate / Precise.]

  13. 1 - PERFECT Dynamic Instruction Mix. Highlighted: math 1%, fp arith 31%. Long-latency ops are all safe to approximate.

  14. 1 - PERFECT Dynamic Instruction Mix. Highlighted: load/store 27%. Memory ops are mostly safe to approximate (mostly data vs. pointers).

  15. 1 - PERFECT Dynamic Instruction Mix. Highlighted: control 11%, int arith 25%. Control and address computation must remain precise.

  16. 2 - Bit-Savings over Approximate Instructions. [Plot: bit-savings (0% to 100%) vs. average SNR (dB); x-axis ticks 10, 20, 40, 60, 80, 100, 120; data labels 83%, 74%, 57%, 48%, 40%, 32%, 26%; regions labeled "Approximate" and "High Quality".]

  17. 2 - Bit-Savings over Approximate Instructions. [Same plot, annotated "PERFECT Manual: 0.001% MSE".]

  18. 2 - Bit-Savings over Approximate Instructions. [Same plot, annotated "Approximate Computing: 10% MSE" and "PERFECT Manual: 0.001% MSE".]

  19. Future Architectural Challenges: Mechanisms to translate bit-savings into energy savings? New data types/representations? ISA extensions?

  20. Thank You!

  21. Backup Slides

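  22. Bit Savings: explore the opportunity for precision reduction in a hardware-agnostic way.

      $$\mathit{BitSavings} = \sum_{\mathit{insn}_{\mathit{static}}} \frac{\mathit{precision}_{\mathit{ref}} - \mathit{precision}_{\mathit{approx}}}{\mathit{precision}_{\mathit{ref}}} \times \frac{\mathit{execs}}{\mathit{execs}_{\mathit{total}}}$$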
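The bit-savings metric on this slide reads as a sum over static instructions of each instruction's relative precision reduction, weighted by its share of dynamic executions. A minimal sketch of that reading follows; the `InsnProfile` record and the choice to default `execs_total` to the sum over the listed instructions are assumptions, not something the slide specifies.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class InsnProfile:
    """Hypothetical per-static-instruction record."""
    precision_ref: int     # bits used by the reference (precise) version
    precision_approx: int  # bits used after precision reduction
    execs: int             # dynamic execution count of this instruction


def bit_savings(profiles: Iterable[InsnProfile],
                execs_total: Optional[int] = None) -> float:
    """Execution-weighted fraction of bits saved, between 0.0 and 1.0."""
    profiles = list(profiles)
    if execs_total is None:
        # Assumption: normalize by the executions of the listed instructions.
        execs_total = sum(p.execs for p in profiles)
    if execs_total == 0:
        return 0.0
    return sum(
        (p.precision_ref - p.precision_approx) / p.precision_ref
        * p.execs / execs_total
        for p in profiles
    )
```

With one instruction reduced from 32 to 16 bits and executed 300 times, and another left at 32 bits and executed 100 times, the result is 0.5 × 0.75 = 0.375.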

  23. Framework Overview. [Diagram with these boxes and labels: Annotated Program; ACCEPT static analysis & instrumentation; Approximate Binary; ILPC*; Execution & error injection; Quality Assessment; Program Inputs & Quality Metrics; Output Quality Results; Autotuner; Configuration & Bit Savings.] * ILPC: Instruction-level Precision Configuration. Built on top of ACCEPT, the approximate C/C++ compiler: http://accept.rocks

  24. Program Annotation

      void conv2d (pix *in, pix *out, flt *filter) {
        for (row) {
          for (col) {
            flt sum = 0
            int dstPos = …
            for (row_offset) {
              for (col_offset) {
                int srcPos = …
                int fltPos = …
                sum += in[srcPos] * filter[fltPos]
              }
            }
            out[dstPos] = sum / normFactor
          }
        }
      }

  25. Program Annotation

      void conv2d (APPROX pix *in, APPROX pix *out, APPROX flt *filter) {
        for (row) {
          for (col) {
            APPROX flt sum = 0
            int dstPos = …
            for (row_offset) {
              for (col_offset) {
                int srcPos = …
                int fltPos = …
                sum += in[srcPos] * filter[fltPos]
              }
            }
            out[dstPos] = sum / normFactor
          }
        }
      }

      Key: use the APPROX type qualifier.

  26. Program Annotation: tips on annotating programs faster

      Before:
        typedef float flt
        typedef int pix

      After:
        typedef APPROX float flt
        typedef APPROX int pix

      Takeaways:
      - Annotating data is intuitive (~10 mins to annotate a kernel).
      - Variables used to index arrays cannot be safely approximated.

  27. Static Analysis

      void conv2d (APPROX pix *in, APPROX pix *out, APPROX flt *filter) {
        for (row) {
          for (col) {
            APPROX flt sum = 0
            int dstPos = …
            for (row_offset) {
              for (col_offset) {
                int srcPos = …
                int fltPos = …
                sum += in[srcPos] * filter[fltPos]
              }
            }
            out[dstPos] = sum / normFactor
          }
        }
      }

      Instruction-Level Precision Configuration (ILPC) produced by ACCEPT:

        conv2d:13:7:load:Int32
        conv2d:13:10:load:Float
        conv2d:13:11:fmul:Float
        conv2d:13:12:fadd:Float
        conv2d:15:1:fdiv:Float
        conv2d:15:7:store:Int32

      ACCEPT identified safe-to-approximate instructions from data annotations using flow analysis.
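Each ILPC entry on this slide is a colon-separated record such as `conv2d:13:11:fmul:Float`. The sketch below splits one entry into named fields. The slide does not document the format, so the field names are guesses: in particular, reading the two numbers as a source line and an instruction index is an assumption, and `IlpcEntry` and `parse_ilpc_entry` are hypothetical names.

```python
from typing import NamedTuple


class IlpcEntry(NamedTuple):
    """One ILPC record; field names are assumed, not taken from ACCEPT."""
    function: str   # enclosing function, e.g. "conv2d"
    line: int       # first number (assumed: source line)
    index: int      # second number (assumed: instruction index on that line)
    opcode: str     # e.g. "load", "fmul", "store"
    type_name: str  # operand type, e.g. "Float" or "Int32"


def parse_ilpc_entry(text: str) -> IlpcEntry:
    """Parse a record of the form function:number:number:opcode:type."""
    parts = text.strip().split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    function, line, index, opcode, type_name = parts
    return IlpcEntry(function, int(line), int(index), opcode, type_name)
```

Parsing `conv2d:13:11:fmul:Float` this way yields a record whose `opcode` is `fmul` and whose `type_name` is `Float`.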

  28. Error Injection

      Instruction-Level Precision Configuration (ILPC), applied by ACCEPT to produce the approximate binary:

        conv2d:13:7:load:Int32
        conv2d:13:10:load:Float
        conv2d:13:11:fmul:Float
        conv2d:13:12:fadd:Float
        conv2d:15:1:fdiv:Float
        conv2d:15:7:store:Int32

      Each instruction in the ILPC acts as a quality knob that the autotuner can use to maximize bit-savings.

  29. Quality Assessment. [Diagram: the outputs of the reference binary and the approximate binary are compared by eval.py, which reports a score, e.g. 10dB SNR.] The programmer provides a quality assessment script to evaluate quality on the program output.
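As an illustration of what such a quality script might compute, here is a minimal SNR calculation over two output sequences. It uses the standard definition SNR = 10·log10(signal power / noise power); the function name `snr_db` and the flat-list input format are assumptions and do not reflect the interface ACCEPT expects from `eval.py`.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], approx: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB of `approx` relative to `reference`."""
    if len(reference) != len(approx):
        raise ValueError("outputs must have the same length")
    # Signal power: energy of the precise output.
    signal = sum(r * r for r in reference)
    # Noise power: energy of the difference between the two outputs.
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise == 0:
        return math.inf  # identical outputs: no measurable error
    if signal == 0:
        return -math.inf  # all-zero reference with a nonzero error
    return 10.0 * math.log10(signal / noise)
```

A reference output of `[10.0, 0.0]` against an approximate output of `[9.0, 0.0]` has signal power 100 and noise power 1, which gives 20 dB.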

  30. Autotuner

      Greedy iterative algorithm: at each step, reduces the precision requirement of the instruction that impacts quality the least.

        config k: error = 0.10%
        … config [k+1, i-1]: error = 5.91% | config [k+1, i]: error = 0.30% | config [k+1, i+1]: error = 0.12% …
        … config [k+2, i-1]: error = 5.91% | config [k+2, i]: error = 0.33% | config [k+2, i+1]: error = 1.6% …

      Finds a solution in O(m^2 n) worst case, where m is the number of static safe-to-approximate instructions and n is the number of levels of precision for all instructions.
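The greedy search described on this slide can be sketched as follows. This is one reading of the slide, not the authors' implementation: `measure_error` is a hypothetical callback standing in for running the approximate binary and the quality script, and precision levels are modeled as integers where 0 means full precision.

```python
from typing import Callable, List


def greedy_tune(num_insns: int,
                num_levels: int,
                measure_error: Callable[[List[int]], float],
                max_error: float) -> List[int]:
    """Greedily lower per-instruction precision while error stays in budget.

    config[i] is the reduction level of instruction i (0 = full precision,
    num_levels - 1 = most aggressive). Each step tries one more level of
    reduction on every instruction and keeps the least harmful candidate.
    """
    config = [0] * num_insns
    while True:
        best_err, best_insn = None, None
        for i in range(num_insns):
            if config[i] + 1 >= num_levels:
                continue  # instruction i is already at its lowest precision
            trial = config.copy()
            trial[i] += 1
            err = measure_error(trial)
            if best_err is None or err < best_err:
                best_err, best_insn = err, i
        # Stop when nothing can be reduced or even the best step is too lossy.
        if best_insn is None or best_err > max_error:
            return config
        config[best_insn] += 1
```

Each step evaluates at most m candidates and there are at most m·n steps, which matches the O(m^2 n) worst case quoted on the slide.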
