ASIC accelerators



ASIC accelerators

1

To read more…

This day’s papers:

Reagen et al., “Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators”
Shao et al., “The Aladdin Approach to Accelerator Design and Modeling” (Computer magazine version)

Supplementary reading:

Han et al., “EIE: Efficient Inference Engine on Compressed Deep Neural Network”
Shao et al., “Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures”

1

A Note on Quoting Papers

I didn’t look closely enough at paper reviews earlier in the semester
Some paper reviews were copying phrases from papers
You must make it obvious you are doing so
This will get you in tons of trouble later if you don’t have good habits
Usually better off rewriting completely, even if your grammar is poor
Consistent style — easier to read

2

Homework 3 Questions?

Part 1 — due tomorrow, 11:59PM
Part 2 — serial codes out

3


Accelerator motivation

end of transistor scaling
specialization as a way to further improve performance, especially performance per watt
key challenge: how do we design/test custom chips quickly?

4

Behavioral High-Level Synthesis

take C-like code, produce HW
problem (according to Aladdin paper): requires lots of tuning…
to handle/eliminate dependencies
to make memory accesses/etc. efficient

5

Data Flow Graphs

int sum_ab = a + b;
int sum_cd = c + d;
int result = sum_ab + sum_cd;

[Figure: DFG — a and b feed one +, c and d feed another +, and their results feed a final + producing result]

6

DFG scheduling

two add functional units:

[Figure: schedule with a + b and c + d in cycle 1, the final + in cycle 2]

one add functional unit:

[Figure: schedule with a + b, then c + d, then the final + — three cycles]

7


DFG realization — data path

[Figure: datapath — MUXes select among a/c and b/d as the adder inputs; registers hold sum_ab and sum_cd; final output is result]

plus control logic

selectors for MUXes, write enable for regs

8

Dynamic DDG

Aladdin trick:

use dynamic (runtime) dependencies
assume someone will figure out scheduling HW

full synthesis:

actually need to make working control logic
need to figure out memory/register connections

9

Dynamic Data Dependency Graph

10

full synthesis: tuning

11


tuning: false dependencies

“the reason is that when striding over a partitioned array being read from and written to in the same cycle, though accessing different elements of the array, the HLS compiler conservatively adds loop-carried dependences.”

12

Aladdin area/power modeling

functional unit power/area + memory power/area
library of functional units

tested via microbenchmarks

memory model

select latency, number of ports (read/write units)

13

Missing area/power modeling

control logic accounting
wire lengths, etc., etc.

14

Pareto-optimum

Pareto-optimum: can’t make anything better without making something worse

15


design space example (GEMM)

16

Neural Networks (1)

[Figure: network with inputs I1–I4, hidden units a1–a4 and b1–b3, output c1 → out]

real world: out_real = F(I1, I2, I3, I4)
compute approximation: out_pred ≈ F̂(I1, I2, I3, I4)

using intermediate values a_i, b_i

17

Neural Networks (2)

[Figure: same network — inputs I1–I4, hidden units a1–a4 and b1–b3, output c1 → out]

a1 = K(w_a1,1 I1 + w_a1,2 I2 + · · · + w_a1,4 I4)
b1 = K(w_b1,1 a1 + w_b1,2 a2 + w_b1,3 a3)
w’s — weights, selected by training

18

Neural Networks (3)

neuron: a1 = K(w_a1,1 I1 + w_a1,2 I2 + · · · + w_a1,4 I4)
K(x) — activation function, e.g. 1 / (1 + e^−x)

close to 0 as x approaches −∞
close to 1 as x approaches +∞
differentiable

19


Minerva’s problem

evaluating neural networks
train model once, deploy in portable devices
example: handwriting recognizer
goal: low-power, low-cost (≈ area) ASIC

20

High-level design

21

Tradeoffs

mathematical — design of neural network

hardware — size of memory, number of calculations

mathematical — precision of calculations

hardware — size of memory, number of calculations

hardware — amount of inter-neuron parallelism

i.e. approx. number of cores

hardware — amount of intra-neuron parallelism

i.e. pipeline depth

22

Neural network parameters

23


“intrinsic inaccuracy”

24

intrinsic inaccuracy assumption

don’t care if precision variation is similar to training variation
sensible?

25

HW tradeoffs (1)

26

HW tradeoffs (2)

27


parameters varied

functional unit placement (in pipeline)
number of lanes

28

HW pipeline

29

Decreasing precision (1)

from another neural network ASIC accelerator paper:

30

Decreasing precision (2)

from another neural network ASIC accelerator paper:

31


Pruning

short-circuit calculations close to zero
statically — remove neurons with almost all zero weights
dynamically — compute 0 if input is near-zero, without checking weights

32

SRAM danger zone

33

Traditional reliability techniques

don’t run at low voltage/etc.
redundancy — error-correcting codes

34

Algorithmic fault handling

calculations are approximate anyways
“noise” from imprecise training data, rounding, etc.
physical faults can just be more noise

35


round-down on faults

36

design exploration

huge number of variations:
amount of parallel computations
width of computations/storage
size of models
best power per accuracy

37

note: other papers on this topic

EIE — same conference

omitted zero weights in a more compact way

noted: lots of tricky branching on GPUs/CPUs
solved general sparse matrix-vector multiply problem

38

design tradeoffs in the huge

next time: Warehouse-Scale Computers
AKA datacenters — most common modern supercomputer
no paper review
reading on schedule: Barroso et al., The Datacenter as a Computer, chapters 1, 3, and 6

39


next week — security

general areas of HW security:
protect programs from each other — page tables, kernel mode, etc.
protect programs from adversaries — bounds checking, etc.
protect programs from people manipulating the hardware
next week’s paper: last category

40