Bespoke Processors for Applications with Ultra-low Area and Power - PowerPoint PPT Presentation

Bespoke Processors for Applications with Ultra-low Area and Power Constraints by Cherupalli et al. ISCA ‘17 Jielun Tan, Tim Wesley

Overview Motivation Intro to Bespoke Benchmarks and Results Discussion

General Purpose CPUs in ULP Ultra-Low Power applications (IoT, wearables, implantables) typically use small, general purpose microprocessors ● Amortized cost of development ● Most capabilities of these processors are never used by the application ○ Unused gates still drain power and take up area

What about ASICs and FPGAs? ● Both are expensive to develop ● ASICs ○ IPs required for different applications ○ Expensive at small scales ● FPGAs ○ Often larger than needed, to accommodate programmability ○ May still use too much power

Algorithm Usage Examples

Bespoke Processors--Tuning Process ● Bespoke processor design flow: ○ First use traditional module-level removal ○ Next use Input-Independent Gate Activity Analysis ○ Finally, cut-and-stitch the netlist to form the final design

Input-Independent Gate Activity Analysis 1. Load binary into memory 2. Set application inputs to Xs 3. After each cycle is simulated, the toggled gates are marked “keep” 4. If an X propagates to the PC, we have a possible branch a. Explore all possible branch paths, depth-first b. Remember the most conservative state (most Xs) i. Take union of gates of branches if most conservative is missing a few c. If branch is re-encountered i. Skip check if this state is a substate of that most conservative state ii. Merge lists of activated gates and make the result the new conservative state 5. Lists of all gates that are never toggled, along with their constant values, are passed to the cut-and-stitch function
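Steps 2, 3 and 5 above can be illustrated with a small three-valued simulation. This is a minimal sketch, not the authors' tool: the netlist format, the gate names (g1..g4) and the signal names (en, data, mode) are hypothetical, the circuit is purely combinational, and the branch exploration of step 4 is left out.

```python
X = "X"  # unknown value standing in for an application input

def and3(a, b):
    # A 0 on either input forces the output to 0 even if the other is X.
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X

def or3(a, b):
    # A 1 on either input forces the output to 1 even if the other is X.
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X

def not3(a):
    return X if a == X else 1 - a

OPS = {"AND": and3, "OR": or3, "NOT": not3}

def evaluate(netlist, inputs):
    """Evaluate gates in listed (topological) order; return gate -> value."""
    values = dict(inputs)
    for name, op, srcs in netlist:
        values[name] = OPS[op](*(values[s] for s in srcs))
    return {name: values[name] for name, _, _ in netlist}

def activity_analysis(netlist, cycles):
    """Mark a gate 'keep' if its output is X or changes between cycles.

    Returns (keep, constants), where constants maps every gate that was
    never marked to the constant value it held.
    """
    keep, last = set(), {}
    for inputs in cycles:
        now = evaluate(netlist, inputs)
        for gate, value in now.items():
            if value == X or (gate in last and last[gate] != value):
                keep.add(gate)
        last = now
    constants = {g: v for g, v in last.items() if g not in keep}
    return keep, constants

# Hypothetical netlist: (gate name, operation, input nets).
netlist = [
    ("g1", "AND", ("en", "data")),
    ("g2", "OR", ("g1", "mode")),
    ("g3", "NOT", ("en",)),
    ("g4", "AND", ("data", "mode")),
]
# The binary fixes en and mode; data is an application input, so it is X.
cycles = [{"en": 0, "data": X, "mode": 1}] * 2
keep, constants = activity_analysis(netlist, cycles)
print(sorted(keep), constants)
```

In this toy run only g4 can be affected by the unknown input, so it is the only gate marked "keep"; g1, g2 and g3 hold constant values and would be handed to the cut-and-stitch step.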

Cutting and Stitching 1. After X propagation, untoggled gates are removed from the netlist and replaced by a constant voltage 2. Rerun logic synthesis for further optimizations a. Typically, gates that have constant inputs can be reduced to even simpler logic 3. Place and route (this is not any further optimized)
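The effect of steps 1 and 2 can be sketched as constant folding over a toy netlist. This is only an illustration under stated assumptions: the netlist format and all gate and net names are hypothetical, and the simple folding rules below stand in for the re-synthesis that a real logic synthesis tool would perform.

```python
def cut_and_stitch(netlist, constants):
    """Cut gates with known constant outputs, then simplify what remains.

    netlist:   list of (name, op, input nets) in topological order,
               with op in {"AND", "OR", "NOT"}.
    constants: gate name -> 0/1 for gates that never toggled.
    Returns (kept gates, known constant nets, aliases for gates that
    collapsed to a plain wire).
    """
    known = dict(constants)
    alias = {}
    kept = []
    for name, op, srcs in netlist:
        if name in constants:
            continue  # cut: the gate is replaced by its constant
        srcs = [alias.get(s, s) for s in srcs]  # stitch around removed gates
        vals = [known.get(s) for s in srcs]     # None means "not constant"
        if op == "NOT":
            if vals[0] is None:
                kept.append((name, op, tuple(srcs)))
            else:
                known[name] = 1 - vals[0]
            continue
        ctrl = 0 if op == "AND" else 1  # value that forces the output
        if ctrl in vals:
            known[name] = ctrl
            continue
        live = [s for s, v in zip(srcs, vals) if v is None]
        if not live:
            known[name] = 1 - ctrl      # all inputs are non-controlling
        elif len(live) == 1:
            alias[name] = live[0]       # gate degenerates to a wire
        else:
            kept.append((name, op, tuple(live)))
    return kept, known, alias

# Hypothetical example: g1 never toggled and sat at 0.
netlist = [
    ("g1", "AND", ("en", "data")),
    ("g2", "OR", ("g1", "sensor")),
    ("g3", "AND", ("g2", "data")),
]
kept, known, alias = cut_and_stitch(netlist, {"g1": 0})
print(kept, known, alias)
```

Here g1 is cut, g2 reduces to a wire from sensor because its other input is a constant 0, and only g3 survives as a real gate with its input reconnected.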

Input-independent Gate Activity Analysis Example

Benchmarks ● Baseline ○ openMSP430 with TSMC 65nm ○ Operating @1V @100MHz ○ Bare metal simulation or FreeRTOS ○ Either completely general purpose, or traditionally optimized for an application by removing modules ● Each benchmark is then run on a Bespoke processor optimized for that benchmark ○ All unused modules are removed ○ X propagation and cut-and-stitch are performed

Used Gates per Benchmark

Results ● Reduction in gate count, area and power for a bespoke design vs. unmodified baseline

Results ● Reduction in gate count, area, and power in bespoke design vs. module optimized baseline

Results

Multiple Programs ● Multiple programs? ○ Run bespoke tuning process on each and take the union of the results ● Ceiling at 80%... test suite does not activate all gates
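Taking the union of per-program results can be sketched with plain sets. The program names, gate names and gate counts below are made up for illustration and are not results from the paper.

```python
def bespoke_gate_set(per_program_gates):
    """Union of the gates each program may toggle."""
    needed = set()
    for gates in per_program_gates.values():
        needed |= gates
    return needed

# Hypothetical per-program analysis results.
per_program_gates = {
    "binSearch": {"g1", "g2", "g5"},
    "fft": {"g2", "g3"},
}
all_gates = {f"g{i}" for i in range(1, 9)}  # g1..g8 in the baseline core

needed = bespoke_gate_set(per_program_gates)
removed_fraction = 1 - len(needed) / len(all_gates)
print(sorted(needed), removed_fraction)
```

Each added program can only grow the union, so the removable fraction shrinks (or stays the same) as more programs must be supported.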

In-Field Updates ● Bug fixes may need to be deployed, which may change the toggled gates ● Milu mutation testing tool used to emulate changes in the program for future updates ○ Type I: conditional operator changes (AND -> OR) ○ Type II: computation operator mutants (add -> multiply) ○ Type III: loop conditional operator mutants (less than -> less than or equal to)

Coverage for In-Field Updates ● Between 25% and 100% of mutants for each type are covered ● 70% of all mutants of all types are covered ● If mutants are significantly different, then they can be considered as independent programs ● Overhead of between 1% and 40% ● Total area reductions between 23% and 66%, total power reductions between 13% and 53%
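One way to read "covered" is that every gate a mutant can toggle is already present in the bespoke design. The sketch below assumes that reading; the mutant names and gate sets are hypothetical and do not reproduce the reported numbers.

```python
def coverage(design_gates, mutant_gates):
    """Return (covered mutant names, fraction covered).

    Assumption: a mutant is covered when its gate set is a subset of
    the gates kept in the bespoke design.
    """
    covered = [name for name, gates in mutant_gates.items()
               if gates <= design_gates]
    return covered, len(covered) / len(mutant_gates)

design_gates = {"g1", "g2", "g3"}
mutant_gates = {
    "m1": {"g1", "g2"},
    "m2": {"g2", "g4"},  # needs g4, which the design removed
    "m3": {"g3"},
    "m4": {"g1", "g3"},
}
covered, fraction = coverage(design_gates, mutant_gates)

# Gates that would have to be kept as well to support the uncovered mutant.
extra = mutant_gates["m2"] - design_gates
print(covered, fraction, extra)
```

Under this reading, supporting an uncovered mutant means treating it as another program and adding its extra gates to the design, which is where the overhead comes from.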

Coverage for In-Field Updates, cont. ● An instruction that can be executed in one program is not necessarily executable in another program ○ A particular ADD instruction may only use 16 bits out of a 32-bit ALU ● A tailored bespoke processor can support arbitrary software updates by supporting a Turing-complete instruction (e.g. subneg) or a set of them ○ A program written using a Turing-complete instruction can consist solely of that instruction
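To show how a program can consist solely of subneg, here is a minimal interpreter and an addition routine built from three subneg instructions. The encoding is an assumption made for this sketch: each instruction is a tuple (a, b, c) meaning mem[b] = mem[b] - mem[a], then jump to c if the result is negative, otherwise fall through. The memory cell names A, B and Z are hypothetical.

```python
def run_subneg(program, mem, max_steps=1000):
    """Run a subneg-only program until the program counter leaves it."""
    pc = 0
    steps = 0
    while 0 <= pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        a, b, c = program[pc]
        mem[b] -= mem[a]                      # subtract
        pc = c if mem[b] < 0 else pc + 1      # branch if negative
        steps += 1
    return mem

# B = B + A using only subneg; Z is a scratch cell that starts at 0.
# Every branch target equals the fall-through address, so the branch
# outcome never changes the control flow of this routine.
add_program = [
    ("A", "Z", 1),  # Z = Z - A  = -A
    ("Z", "B", 2),  # B = B - Z  = B + A
    ("Z", "Z", 3),  # Z = Z - Z  = 0 (restore scratch cell)
]
mem = run_subneg(add_program, {"A": 5, "B": 7, "Z": 0})
print(mem)
```

A processor that keeps the gates needed for this one instruction can run any update recompiled to it, at the cost of more instructions per operation.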

System Code ● Application analysis of system code for FreeRTOS shows 57% of the gates are never used by the OS ● When benchmarks are evaluated individually with FreeRTOS ○ 37% unused in the worst case ○ 49% unused on average ● Running 15 benchmarks on top of FreeRTOS still shows 27% of gates unused

Generality and Limitations ● Hardware with non-deterministic behaviors needs additional techniques to be Bespoke tuned ○ Branch predictors ○ Caches ○ Speculative operations ○ Out-of-order cores ● Xs need to be injected as the results of ○ ...branch predictions ○ ...tag checks ○ ...values where speculation may be used ● Extending the X-prop process to explore data flow graphs may allow analysis of OoO to work

Discussion Points 1. All of the examples they tested are just algorithms such as binary search or FFT. But actual applications, even in IoT and smaller, typically do more than just, e.g., binary search. Do Bespoke tuned processors have any value for real-world programs? 2. Is using Milu and adding mutations representative of what in-field updates would actually change? 3. Can the Bespoke tuning process be used for lowering power consumption of high-performance accelerators? 4. Is Bespoke tuning better or worse for certain cases than technologies such as HLS, Simulate-and-Eliminate, or just making an ASIC design?

Related Works ● High-Level Synthesis ○ Additional development costs ■ New high-level specs of application behavior need to be defined ■ High-level spec needs to also be verified ○ C to ASICs is very difficult to do, especially to do efficiently ○ Unlikely to support multiple applications or in-field updates ● Simulate-and-Eliminate ○ Simulates the target application with a user-provided set of inputs on multiple base designs ■ Requires significant user input ■ Only considers high-level, manually-identified components ■ Relies on user inputs to determine unused components--user may forget a test case!
