Architecture Support for Disciplined Approximate Programming
Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze (University of Washington); Doug Burger (Microsoft Research)
ASPLOS 2012


SLIDE 1

Architecture Support for Disciplined Approximate Programming

Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze (University of Washington)
Doug Burger (Microsoft Research)

ASPLOS 2012

SLIDE 2

mobile devices: battery usage
data centers: power & cooling costs
dark silicon: utilization wall

SLIDE 3

Disciplined approximate programming

Precise (✗ no errors allowed): references, jump targets, JPEG header
Approximate (✓ errors tolerable): pixel data, neuron weights, audio samples, video frames

The EnerJ programming language: safely interleave approximate and precise operations

SLIDE 4

SLIDE 5

[Figure: trading energy for errors]

SLIDE 6

Perfect correctness is not required

information retrieval, machine learning, sensory data, scientific computing, physical simulation, games, augmented reality, computer vision

SLIDE 7

@Approx float[] nums;
⋮
@Approx float total = 0.0f;
for (@Precise int i = 0; i < nums.length; ++i)
  total += nums[i];
return total / nums.length;

Disciplined approximate programming The EnerJ programming language

SLIDE 8

(EnerJ example code from slide 7)

Disciplined approximate programming The EnerJ programming language

approximate data storage

SLIDE 9

(EnerJ example code from slide 7)

Disciplined approximate programming The EnerJ programming language

approximate operations
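The annotated fragment above can be imitated in plain Java. A minimal sketch, assuming a hypothetical noisy() helper that stands in for the errors approximate storage and operations may introduce (the @Approx/@Precise qualifiers themselves are enforced by EnerJ's type checker, not by Java):

```java
import java.util.Random;

public class ApproxMean {
    static final Random rng = new Random(42);

    // Hypothetical error model: occasionally perturb a value slightly,
    // the way an approximate register or ALU might.
    static float noisy(float x) {
        return rng.nextInt(1000) == 0 ? x * 1.01f : x;
    }

    // Plain-Java analogue of the @Approx mean loop from the slide.
    static float mean(float[] nums) {
        float total = 0.0f;                   // @Approx in EnerJ
        for (int i = 0; i < nums.length; ++i) // loop index stays @Precise
            total = noisy(total + nums[i]);   // approximate add and store
        return total / nums.length;
    }

    public static void main(String[] args) {
        System.out.println(mean(new float[] {1f, 2f, 3f, 4f})); // near 2.5
    }
}
```

The point of the sketch: only the data marked approximate carries error; the control flow (the loop index) remains exact.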

SLIDE 10

Hardware support for disciplined approximate programming

[Diagram: EnerJ code → Compiler → Truffle Core]

(EnerJ example code from slide 7)

SLIDE 11

Hardware support for disciplined approximate programming

[Diagram: EnerJ code → Compiler → Truffle Core]

Compiler-directed approximation
Simplifies the hardware implementation
Safety checks at compile time
No expensive checks at run time

SLIDE 12

Approximation-aware ISA
Dual-voltage microarchitecture
Energy savings results

Hardware support for disciplined approximate programming

SLIDE 13

Approximation-aware ISA
Dual-voltage microarchitecture
Energy savings results

Hardware support for disciplined approximate programming

SLIDE 14

Approximation-aware languages need:

Approximate operations
Approximate data
Fine-grained interleaving

[Figure: ALU with operations + − × ÷ & |; storage: registers, caches, main memory; a finely interleaved instruction stream: ADD R1 R2 R3, MOV R3 R4, JMP 0x01234, STL R1 0xABCD, LDF R2 0xBCDE, …]

SLIDE 15

Approximation-aware languages need:

Approximate operations
Approximate data

[Figure: the same ALU and storage diagram]

per instruction (operations) / per cache line (data)

SLIDE 16

Traditional, precise semantics

ADD r1 r2 r3: writes the sum of r1 and r2 to r3

SLIDE 17

Approximate semantics

ADD r1 r2 r3: writes some value to r3 (not necessarily the exact sum of r1 and r2)

Informally: r3 gets something that approximates the sum r1 + r2. The actual error pattern depends on microarchitecture, voltage, process variation, …

SLIDE 18

Undefined behavior

ADD r1 r2 r3:

???

SLIDE 19

Approximate semantics

ADD r1 r2 r3: writes some value to r3

Informally: r3 gets something that approximates the sum r1 + r2. No other register is modified. Does not jump to an arbitrary address. No floating-point division exception is raised. No missiles are launched. ⋮
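The contract can be stated as code. A sketch in plain Java, where the bit-flip error model is invented for illustration; the only guarantee mirrored here is that nothing but the result is affected:

```java
import java.util.Random;

// Sketch of the ADD.a contract: the destination receives "some value" that
// approximates a + b, but no other state changes -- no trap, no jump, no
// other register. The specific error model below is hypothetical.
public class ApproxAdd {
    static final Random rng = new Random(7);

    static int addApprox(int a, int b) {
        int sum = a + b;
        if (rng.nextInt(100) == 0)       // rare low-voltage timing error
            sum ^= 1 << rng.nextInt(8);  // perturbs low-order result bits only
        return sum;                      // side-effect free: nothing else touched
    }

    public static void main(String[] args) {
        System.out.println(addApprox(40, 2)); // close to 42
    }
}
```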

SLIDE 20

An ISA extension with approximate semantics

Operations (ALU): ADD.a MUL.a CMPLE.a AND.a XNOR.a SRL.a ADDF.a DIVF.a …

Storage (registers, caches, main memory): LDL.a STL.a STF.a LDF.a …

SLIDE 21

Dual-voltage pipeline

[Figure: pipeline stages Fetch, Decode, Reg Read, Execute, Memory, Write Back; structures include the Branch Predictor, Instruction Cache, ITLB, Decoder, Register File, Integer FU, FP FU, Data Cache, and DTLB, divided into a data movement & processing plane and a control plane]

SLIDE 22

Dual-voltage pipeline

[Figure: dual-voltage structures highlighted — Register File, Integer FU, FP FU, Data Cache]

SLIDE 23

Dual-voltage pipeline

[Figure: Register File — switch (dynamic); Integer FU, FP FU — replicate (dynamic); Data Cache — switch (static)]

SLIDE 24

Dual-voltage functional units: shadow structures

[Figure: Execute stage with shadow functional units; operands feed both units, one structure is active at a time, and the active one drives the result]

SLIDE 25

Dual-voltage functional units: shadow structures

Issue width unchanged (the scheduler is unaware of shadowing)
The inactive unit is power-gated
No voltage-change latency
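Functionally, the shadow arrangement is a mux on the instruction's .a bit. A sketch under that assumption; the low-voltage unit's error behavior (a stuck low bit) is purely illustrative:

```java
// Sketch of shadow functional units: a precise adder and a low-voltage
// "shadow" adder share one slot in the execute stage. The instruction's
// .a bit selects which unit is active, so the scheduler never sees two
// units and no voltage transition is needed. Names are illustrative.
public class ShadowFu {
    interface Adder { int add(int a, int b); }

    static final Adder precise = (a, b) -> a + b;       // VDDH unit
    static final Adder shadow  = (a, b) -> (a + b) | 1; // VDDL unit: toy
                                                        // error, low bit stuck

    // Mux on the .a bit; the inactive unit would be power-gated.
    static int execute(int a, int b, boolean approxBit) {
        return (approxBit ? shadow : precise).add(a, b);
    }

    public static void main(String[] args) {
        System.out.println(execute(2, 2, false)); // exact: 4
        System.out.println(execute(2, 2, true));  // approximate result
    }
}
```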

SLIDE 26

Approximate storage: register modes

[Figure: register file r1–r8; r4 is in approximate mode]

precise mode / approximate mode
Reads from registers in approximate mode may return any value.

SLIDE 27

Approximate storage: register modes

[Figure: register file r1–r8]

⋮
ADD r1 r2 r3

SLIDE 28

Approximate storage: register modes

[Figure: register file r1–r8; r3 switches to approximate mode]

⋮
ADD.a r1 r2 r3

The destination register’s mode is set to match the writing instruction.

SLIDE 29

Approximate storage: register modes

[Figure: register file r1–r8; r3 approximate, r4 precise]

ADD r2 r3.a r4

Register operands must be marked with the register’s mode. (Otherwise, read garbage.)
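The register-mode rules on the last few slides amount to a small state machine. A sketch, with a fixed constant standing in for "may return any value":

```java
// Sketch of Truffle's per-register precision modes: each write sets the
// destination's mode to match the writing instruction, and a read whose
// marked mode disagrees with the register's mode yields unspecified data.
// Names and the garbage constant are illustrative, not the real design.
public class RegModes {
    final long[] regs = new long[8];
    final boolean[] approx = new boolean[8]; // one mode bit per register

    // A write sets the destination's mode to match the instruction.
    void write(int r, long value, boolean approxWrite) {
        regs[r] = value;
        approx[r] = approxWrite;
    }

    // An operand marked with the wrong mode reads garbage.
    long read(int r, boolean approxRead) {
        if (approx[r] != approxRead)
            return 0xDEADBEEFL; // stand-in for "any value"
        return regs[r];
    }

    public static void main(String[] args) {
        RegModes rf = new RegModes();
        rf.write(3, 42L, true);                // e.g. ADD.a ... r3
        System.out.println(rf.read(3, true));  // r3.a: reads 42
        System.out.println(rf.read(3, false)); // unmarked r3: garbage
    }
}
```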

SLIDE 30

Registers and caches: dual-voltage SRAMs

[Figure: DV-SRAM subarray. A precision column stores one bit per row; row selection drives data (read), data (write), and precision signals; the precision bit selects VDDH or VDDL per access (for sense amplifiers and precharge).]

SLIDE 31

Registers and caches: dual-voltage SRAMs

Mixture of precise and approximate data
The instruction stream gives the access precision levels (compiler-specified)

SLIDE 32

Approximate storage: caches

[Figure: register file r1–r8 and the cache; an approximate load fills a cache line at low precision]

⋮
LDL.a 0x…

Data enters the cache with the precision of the access.
Compiler: consistently treat data as approximate or precise. (Otherwise, read garbage.)

SLIDE 33

Also in the paper:
Approximate main memory
Detailed DV-SRAM design
Voltage level-shifter and mux circuits
Replicated pipeline registers
Broadcast network details

[Level-shifter and mux schematics with VddH/VddL signal annotations omitted]
SLIDE 34

Approximation-aware ISA
Dual-voltage microarchitecture
Energy savings results

Hardware support for disciplined approximate programming

SLIDE 35

Energy savings results

Simulated EnerJ programs: precision-annotated Java [PLDI’11]
  Scientific kernels, mobile app, game engine, imaging, raytracer
Modified McPAT models for OoO (Alpha 21264) and in-order cores [Li, Ahn, Strong, Brockman, Tullsen, Jouppi; MICRO’09]
  65 nm process, 1666 MHz, 1.5 V nominal (VDDH)
  4-wide (OoO) and 2-wide (in-order)
  Includes overhead of additional muxing, shadow FUs, etc.
Extended CACTI for DV-SRAM structures [Muralimanohar, Balasubramonian, and Jouppi; MICRO’07]
  64 KB (OoO) and 32 KB (in-order) L1 cache
  Line size: 16 bytes
  Includes precision column overhead

SLIDE 36

Energy savings on in-order core

7–24% energy saved on average
Raytracer saves 14–43% energy

[Chart: energy reduction over a non-Truffle baseline (−10% to 50%) for fft, imagefill, jmeint, lu, mc, raytracer, smm, sor, zxing, and the average, at VDDL = 0.75 V, 0.94 V, 1.13 V, and 1.31 V]

SLIDE 37

Energy savings on OoO core

Energy savings up to 17%
Efficiency loss up to 5% in the worst case

[Chart: energy reduction over a non-Truffle baseline (−10% to 50%) for the same benchmarks at VDDL = 0.75 V, 0.94 V, 1.13 V, and 1.31 V]

SLIDE 38

Application accuracy trade-off

[Chart: output quality-of-service loss (0–100%) across error rates 10⁻⁸–10⁻² for fft, imagefill, jmeint, lu, mc, raytracer, smm, sor, zxing]

Application-specific output quality metrics
Error resilience varies across applications

SLIDE 39

Hardware support for disciplined approximate programming

[Diagram: EnerJ code → Compiler → Truffle Core]

int p = 5;
@Approx int a = 7;
for (int x = 0..) {
  a += func(2);
  @Approx int z;
  z = p * 2;
  p += 4;
}
a /= 9;
func2(p);
a += func(2);
@Approx int y;
z = p * 22 + z;
p += 10;

[Code colored by operating voltage: VDDH (precise) vs. VDDL (approximate)]

SLIDE 40

Hardware support for disciplined approximate programming

Approximation-aware ISA: tightly coupled with language-level precision information

Dual-voltage microarchitecture: the data plane can run at a lower voltage; low-complexity design relying on compiler support

Significant energy savings: up to 43% vs. a baseline in-order core

SLIDE 41

Future work on disciplined approximate programming

Approximate accelerators
Precision-aware programmer tools
Non-voltage approximation techniques

SLIDE 42