Logic Synthesis in the Twilight of Moore’s Law
Near-threshold, Heterogeneous, 3D Design: Looking for a New Toolbox
Luca Benini, IIS-ETHZ & DEI-UNIBO

IoT: a System View
- Sense: MEMS IMU, MEMS microphone, ULP imager, EMG/ECG/EIT. Power: 100 µW ÷ 2 mW.
- Analyze and Classify: µController (e.g. Cortex-M), L2 memory, IOs. 1 ÷ 25 MOPS, 1 ÷ 10 mW; 1 ÷ 2000 MOPS, 1 ÷ 10 mW.
- Transmit: short range, BW; long range, low BW; low rate (periodic) data; SW update, commands. Idle: ~1 µW; active: ~50 mW.
- Battery + harvesting powered → a few mW power envelope.

How efficient?
- Target: 10^12 ops/J ↓ 1 pJ/op ↓ 1 GOPS/mW
- Moore’s law has slowed to roughly 2½ years, or roughly 30 months (a 25% increase in the time between semiconductor process nodes).
- How to do that?
[RuchIBM11]
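The three forms of the efficiency target are the same number in different units. A minimal Python sketch of the conversion follows; the helper name `power_mw` is an illustrative assumption, not something from the slides.

```python
# Unit check for the efficiency target: 10^12 ops/J, 1 pJ/op, 1 GOPS/mW.
OPS_PER_JOULE = 1e12  # target stated on the slide

# Energy per operation: invert ops/J, then convert J to pJ.
energy_per_op_pj = (1.0 / OPS_PER_JOULE) * 1e12

# 1 W = 1 J/s, so ops/J equals (ops/s) per W. Scale to GOPS per mW.
gops_per_mw = OPS_PER_JOULE * 1e-3 / 1e9


def power_mw(gops, pj_per_op):
    """Power in mW needed to sustain `gops` GOPS at `pj_per_op` pJ/op."""
    watts = gops * 1e9 * pj_per_op * 1e-12
    return watts * 1e3


if __name__ == "__main__":
    print(energy_per_op_pj, gops_per_mw)
    # 2000 MOPS (upper end of the system-view slide) at the 1 pJ/op target.
    print(power_mw(2.0, 1.0))
```

At the 1 pJ/op target, 2000 MOPS costs about 2 mW, which is inside the 1 ÷ 10 mW envelope quoted on the system-view slide.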

Minimum energy operation
Source: Vivek De, Intel, DATE 2013
Near-Threshold Computing (NTC):
1. Don’t waste energy pushing devices in strong inversion
2. Recover performance with parallel execution
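The two NTC points can be illustrated with a toy model. This is a sketch under loud assumptions, not the Intel data behind the slide: the threshold voltage, slope factor, leakage weight, and the smooth on-current expression are all invented for illustration, and energies are in arbitrary units.

```python
import math


def on_current(v, vt=0.35, n=1.5, phi_t=0.026):
    """Smooth on-current model (arbitrary units); all constants are assumed."""
    return math.log1p(math.exp((v - vt) / (2.0 * n * phi_t))) ** 2


def gate_delay(v):
    """Delay ~ C*V/I, with the capacitance folded into the unit."""
    return v / on_current(v)


def energy_per_op(v, leak=0.1):
    """Dynamic energy (~V^2) plus leakage energy (~V * delay)."""
    return v * v + leak * v * gate_delay(v)


def minimum_energy_voltage(v_lo=0.20, v_hi=1.20, steps=100):
    """Sweep the supply voltage and return the grid point of least energy."""
    grid = [v_lo + (v_hi - v_lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=energy_per_op)


def cores_to_match(v, v_ref=1.0):
    """Cores needed at voltage v to match one core's throughput at v_ref."""
    return math.ceil(gate_delay(v) / gate_delay(v_ref))


if __name__ == "__main__":
    v_min = minimum_energy_voltage()
    print("minimum-energy voltage (toy model):", round(v_min, 2))
    print("cores at 0.5 V to match one core at 1.0 V:", cores_to_match(0.5))
```

In this toy model the energy minimum sits well below the nominal supply because leakage energy grows as the gates slow down, and operating at reduced voltage needs several cores to recover the throughput of one core at 1.0 V.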

PULP – Parallel Ultra Low Power

Near-Threshold Multiprocessing
- Open source hardware & software
- Shared L1 I$ with multi-instruction load (I$ banks I$B 0 ... I$B k)
- Private loop/prefetch buffer (IL0)
- 4-stage, in-order ORISC cores, PE 0 ... PE N-1; 2..16 cores
- Micro-MMU (demux), peripherals, +ExtM
- Tightly coupled DMA
- Shared L1 data memory + atomic variables (L1 TCDM+T&S, banks MB 0 ... MB M)
- NT but parallel → max. energy efficiency when active + strong PM for (partial) idleness

PULP Chips
Technology:       28nm UTBB FD-SOI
Transistors:      Flip well, L = 24 nm
Cluster area:     1.3 mm^2
VDD range:        0.32V - 1.15V (memories: 0.45V - 1.15V)
BB range:         0V - 1.75V
SRAM macros:      8 x 32 kbit (TCDM)
SCM macros:       16 x 4 kbit (TCDM), 4 x 2 x 4 kbit (I$)
Gates:            200K
Frequency range:  NO BB: 40.5 - 710 MHz; MAX FBB: 63.5 - 825 MHz
Power range:      NO FBB: 0.56 - 85 mW; MAX FBB: 6.9 - 480 mW
ISSCC15 (student presentation), Hot Chips 15, ISSCC16 (paper + student presentation)

Variability!
- Temperature awareness
- BB/leakage management is essential

Synthesis Challenge
- An extensive set of parameters to consider: supplies, poly biasing, body biasing, gate sizing
- Subject to temperature, reliability, and mission profile constraints
- Target frequency (Vdd, Pb, BB) choice becomes a power-delay trade-off exercise

Optimization and Trade-off
[Plot: Power (FF, 125C), a.u. vs. Freq (SS, 125C), a.u., marking a non-optimized design and the optimum]
Conditions:
- 28nm UTBB FDSOI
- VDD min (0.5V) < VDD < VDD max (1.3V)
- Pb min (0) < Pb < Pb max (16nm)
- Bb min (0) < Bb < Bb max (2.0V)
- Pdyn/Pstat ratio = 50%
- Power, Perf corners in speed and power
An optimized design means:
- Maximize performance for given power
- Minimize power for given performance
- Area constraint
The optimum vector is a function (Vdd, Pb, BB):
- Strongly dependent on chosen corners
- Static + Dynamic

Dynamic Body Bias
Dynamic adaptation can also be used to «remove» extremely adverse corners and ease MC-MM optimization

ULP Bottleneck: Memory
[Figure: 256x32 6T SRAMs vs. SCM, 2x-4x]
- “Standard” 6T SRAMs: high VDDMIN; bottleneck for energy efficiency
- Near-threshold SRAMs (8T): lower VDDMIN; area/timing overhead (25%-50%); high active energy; low technology portability
- Standard Cell Memories: wide supply voltage range; lower read/write energy (2x - 4x); easy technology portability; major area overhead (2x)
Need help exploring memory tradeoffs!

Static vs. Dynamic again…
[Block diagram: hybrid memory system. Voltage domains: SoC voltage domain (0.8V), cluster voltage domain (0.5V-0.8V), SRAM voltage domain (0.5V - 0.8V). Blocks: L2 memory, peripherals, bridges, peripheral interconnect, cluster bus, DMA, PE #0 ... #N-1 with I$, instruction bus, low latency interconnect to RMUs, SRAM #0 ... #M-1, SCM #0 ... #M-1]

Approximate Computing to the Rescue

Approximate → Adequate
- Less-than-perfect results perceived as correct by the users, e.g. image processing (filtering)
- [Images: RGB to GRAYSCALE vs. RGB to GRAYSCALE (+10% error)]
- Approximation is not always acceptable → application and program phase dependent!
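One way to picture an approximate RGB-to-grayscale conversion is to swap the weighted sum for shifts and adds. The sketch below is an assumption for illustration: the slides do not say which approximation produced the "+10% error" image. The exact version uses the standard BT.601 luma weights.

```python
def gray_exact(r, g, b):
    """Grayscale with BT.601 luma weights, rounded to the nearest integer."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def gray_approx(r, g, b):
    """Multiplier-free approximation: (R + 2G + B) / 4 using a shift."""
    return (r + 2 * g + b) >> 2


def max_abs_error(step=17):
    """Largest exact-vs-approximate gap over a coarse grid of 8-bit colors."""
    levels = range(0, 256, step)  # 0, 17, ..., 255
    return max(
        abs(gray_exact(r, g, b) - gray_approx(r, g, b))
        for r in levels
        for g in levels
        for b in levels
    )


if __name__ == "__main__":
    worst = max_abs_error()
    print("worst-case error:", worst, "of 255 gray levels")
```

Whether an error of this size is acceptable depends on the application and program phase, which is the point the slide makes.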

Approximate Storage?
- Retention voltage: SCM 0.25V, 6T-SRAM 0.29V
- Probability of flip-bit error on a single bit during read/write operations:

Voltage (V)       0.50    0.55    0.60    0.65     0.70     0.75     0.80
P(flip-bit) SCM   0.0     0.0     0.0     0.0      0.0      0.0      0.0
P(flip-bit) 6T    0.0037  0.0012  0.0003  5.24e-5  4.35e-6  4.16e-8  0.0

Energy vs. precision tradeoff → big range!
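The per-bit probabilities can be turned into a per-word figure. The sketch below assumes bit flips are independent and uses a 32-bit word (matching the 256x32 memories mentioned earlier); neither assumption is stated on this slide, and `p_word_error` is an illustrative helper.

```python
# Per-bit flip probabilities copied from the slide's table.
VOLTAGES = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80]
P_FLIP_6T = [0.0037, 0.0012, 0.0003, 5.24e-5, 4.35e-6, 4.16e-8, 0.0]
P_FLIP_SCM = [0.0] * len(VOLTAGES)


def p_word_error(p_bit, bits=32):
    """Probability that at least one of `bits` independent bits flips."""
    return 1.0 - (1.0 - p_bit) ** bits


if __name__ == "__main__":
    for v, p6t, pscm in zip(VOLTAGES, P_FLIP_6T, P_FLIP_SCM):
        print(f"{v:.2f} V  6T: {p_word_error(p6t):.3e}  SCM: {p_word_error(pscm):.3e}")
```

Under these assumptions, roughly one in nine 32-bit accesses to the 6T SRAM sees at least one flipped bit at 0.50 V, while the SCM stays error-free across the whole range.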

Acceleration

Recovering more silicon efficiency
[Figure: efficiency in GOPS/W (axis ticks 1, 3, 6, >100) across SW (general-purpose computing, CPU), mixed (throughput computing, GPGPU), and HW (HW IP), with the accelerator gap marked]
Closing the accelerator efficiency gap with agile customization

Learn to Accelerate
- Brain-inspired (deep convolutional networks) systems are high performers in many tasks over many domains
- CNN: 93.4% accuracy (ImageNet 2014); human: 85% (untrained), 94.9% (trained) [Karpathy15]
- Image recognition [Russakovsky et al., 2014]; speech recognition [Hannun et al., 2014]
- [Figure labels: Spiking NN; Accelerator]
- Flexible acceleration: learned CNN weights are “the program”

Computational Effort
- Computational effort ~90%
- 7.5 GOp for 320x240 image
- 260 GOp for FHD
- 1050 GOp for 4k UHD
Origami: a CNN accelerator
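To relate these operation counts to the energy target, the sketch below computes operations per pixel and energy per frame. It assumes FHD means 1920x1080 and 4k UHD means 3840x2160, and it reuses the 1 pJ/op target from the efficiency slide as a default; the function names are illustrative.

```python
# (pixels per frame, operations per frame) for each workload on the slide.
WORKLOADS = {
    "320x240": (320 * 240, 7.5e9),
    "FHD": (1920 * 1080, 260e9),       # assumed 1920x1080
    "4k UHD": (3840 * 2160, 1050e9),   # assumed 3840x2160
}


def ops_per_pixel(name):
    """Operations spent per input pixel for the named workload."""
    pixels, ops = WORKLOADS[name]
    return ops / pixels


def energy_per_frame_mj(name, pj_per_op=1.0):
    """Energy per frame in mJ at the given energy per operation."""
    _, ops = WORKLOADS[name]
    return ops * pj_per_op * 1e-12 * 1e3


def sustained_power_mw(name, fps, pj_per_op=1.0):
    """Average power in mW to process `fps` frames per second."""
    return energy_per_frame_mj(name, pj_per_op) * fps


if __name__ == "__main__":
    for name in WORKLOADS:
        print(name, round(ops_per_pixel(name)), energy_per_frame_mj(name))
```

Even at 1 pJ/op, a single 320x240 frame costs about 7.5 mJ, so a few frames per second already exceeds a power envelope of a few mW.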

Origami: The Architecture

Smooth Degradation with Vdd ↓
[Images: 0% bit flips vs. 1% bit flips]
Synthesis tools are really needed to explore the approximation space for these «arithmetically dense» architectures:
1. Numerical precision
2. Controlled error tolerance
67% energy improvement
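A simple way to experiment with error tolerance is to inject random bit flips into pixel data and measure the damage. The sketch below is a generic independent-bit-flip model, assumed for illustration; it is not the error behaviour measured on Origami, and both function names are hypothetical.

```python
import random


def inject_bit_flips(pixels, p_flip, bits=8, seed=0):
    """Flip each bit of each value independently with probability p_flip."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    out = []
    for value in pixels:
        mask = 0
        for bit in range(bits):
            if rng.random() < p_flip:
                mask |= 1 << bit
        out.append(value ^ mask)
    return out


def mean_abs_error(reference, degraded):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(reference, degraded)) / len(reference)


if __name__ == "__main__":
    image = [i % 256 for i in range(4096)]  # synthetic 8-bit test data
    for p in (0.0, 0.01, 0.05):
        noisy = inject_bit_flips(image, p)
        print(f"p_flip={p:.2f}  mean abs error={mean_abs_error(image, noisy):.2f}")
```

Sweeping `p_flip` gives a first feel for how much bit-level error a given workload can absorb before results stop being adequate.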

Conclusions
- IoT energy efficiency requirements are super-tight
- Technology scaling alone is not doing the job for us
- Ultra-low power “traditional computing” architecture and circuits are needed, but not sufficient in the long run
- Approximation for energy efficiency is a promising direction
- SW and SW-abstractions are key
- Need synthesis tools more than ever!

Next bottleneck - IO
Key Challenges:
1. Minimize Epb for IO
2. Maximize cluster idleness while doing IO
Flexible and low-pin count interface layer: (Quasi)-Serial is better

ULP Serial Phy
- A 0.45-0.7V 1-6Gb/s 0.29-0.58pJ/bit Source Synchronous Transceiver Using Automatic Phase Calibration in 65nm CMOS (0.15 mm^2)
- On 36-inch SMA cable: BER < 10^-10 with 0.15UI timing margin
- Source-synchronous, pseudo-differential, unterminated, voltage mode, 200mVpp, 1/8 rate CLK, self-calibrating PLL-based phase generator
- Low-cost SIP + die stacking option for processor + memories + sensors becomes viable
Departement Informationstechnologie und Elektrotechnik
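Energy per bit and data rate multiply directly into link power, since 1 pJ/bit at 1 Gb/s is 1 mW. The sketch below applies that to the transceiver figures; pairing the lowest energy with the lowest rate and the highest with the highest is an assumption, as the slide only gives the two ranges.

```python
def link_power_mw(pj_per_bit, gbps):
    """Link power in mW: (pJ/bit) * (Gb/s) = 1e-12 J * 1e9 /s = 1e-3 W."""
    return pj_per_bit * gbps


if __name__ == "__main__":
    # Assumed pairings of the slide's range endpoints.
    print("low end :", link_power_mw(0.29, 1.0), "mW")
    print("high end:", link_power_mw(0.58, 6.0), "mW")
```

Under that pairing the link stays within a few mW, the same order as the active power envelope of the whole node.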
