Flexible Hardware Design at Low Levels - PowerPoint PPT Presentation
Flexible Hardware Design at Low Levels of Abstraction
Emil Axelsson
Hardware Description and Verification, May 2009
Why low-level?
  gadget a b = case a of
    2 -> thing (b+10)
    3 -> thing (b+20)
    _ -> fixNumber a
Related question: Why is some software written in C? (But the difference between high and low level is much greater in hardware.)

Ideal:
Software-like code → magic compiler → chip masks
Why low-level?
Reality:
“Ascii schematic” → chain of synthesis tools → chip masks
Reiterate to improve timing/power/area/etc.
Very costly / time-consuming
Each fabrication costs ≈ $1,000,000
Failing abstraction
A realistic flow cannot avoid low-level awareness.

Paradox:
Modern designs require a higher abstraction level
...but...
Modern chip technologies make abstraction harder
Main problem: routing wires dominate signal delays and power consumption.

Controlling the wires is key to performance!
Gate vs. wire delay under scaling
(Plot: relative delay vs. process technology node [nm])
Physical design level
Certain high-performance components (e.g. arithmetic) need to be designed at an even lower level.

Physical level:
A set of connected standard cells (implemented gates)
Absolute or relative positions of cells (placement)
Shape of connecting wires (routing)
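The three ingredients above can be pictured as a small data type. This is a hypothetical sketch for illustration only, not Wired's actual representation; all names are invented:

```haskell
-- Hypothetical sketch of the physical design level: placed standard
-- cells plus routed wires. Invented names; not Wired's real API.
type Pos = (Double, Double)   -- position on the chip surface

data Cell = Cell
  { cellKind :: String        -- standard cell, e.g. "NAND2_X1"
  , cellPos  :: Pos           -- placement: absolute position
  } deriving Show

newtype Wire = Wire [Pos]     -- routing: wire shape as a polyline
  deriving Show

data Layout = Layout [Cell] [Wire]

-- A trivial two-cell layout with one connecting wire.
example :: Layout
example = Layout
  [ Cell "NAND2_X1" (0, 0), Cell "INV_X1" (5, 0) ]
  [ Wire [(1, 0), (5, 0)] ]
```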
Physical design level
Design by interfacing to physical CAD tools:
Call automatic tools for certain tasks (mainly routing)
Often done through scripting code

Tedious; hard to explore the design space; limited design reuse.
Aim of this work: Raise the abstraction level of physical design!
Two ways to raise abstraction
Automatic synthesis
+ Powerful abstraction
– May not be optimal for e.g. high-performance arithmetic
– Opaque (hard to control the result)
– Unstable (heuristics-based)
Language-based techniques (higher-order functions, recursion, etc.)
+ Transparent, stable
– Still quite low-level
– Somewhat limited to regular circuits
Our approach
Lava
Gate-level hardware description in Haskell.

Parameterized module generators: Haskell programs that generate circuits. Generators can be smart, e.g. optimize for speed in a given environment.

Basic placement expressed through combinators. Used successfully to generate high-performance FPGA cores.
Wired: Extension to Lava
Finer control over geometry. More accurate performance models: feedback from timing/power analysis enables self-optimizing generators.

Wire-awareness (unique to Wired):
Performance analysis based on wire-length estimates
Control of routing through "guides" (experimental)
...
Monads in Haskell
Haskell functions are pure. Side-effects can be "simulated" using monads:

  add a b = do
    as <- get
    put (a:as)
    return (a+b)

  prog = do
    a <- add 5 6
    b <- add a 7
    add b 8

  *Main> runState prog []
  (26, [18,11,5])    -- (result, side-effect)

Do-notation is syntactic sugar that expands to a pure program with explicit state passing. Monads can also be used to model e.g. IO, exceptions, non-determinism etc.
Monad combinators
Haskell has a general and well-understood combinator library for monadic programs
  *Main> runState (mapM (add 2) [11..13]) []
  ([13,14,15],[2,2,2])

  *Main> runState (mapM (add 2 >=> add 4) [11..13]) []
  ([17,18,19],[4,2,4,2,4,2])
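For reference, the slides' State-monad example can be made self-contained and runnable, assuming the standard mtl (or transformers) package for Control.Monad.State:

```haskell
import Control.Monad ((>=>))
import Control.Monad.State (State, get, put, runState)

-- 'add' returns a + b and logs 'a' in the state (a list).
add :: Int -> Int -> State [Int] Int
add a b = do
  as <- get
  put (a:as)
  return (a+b)

prog :: State [Int] Int
prog = do
  a <- add 5 6
  b <- add a 7
  add b 8

main :: IO ()
main = do
  print (runState prog [])                               -- (26,[18,11,5])
  print (runState (mapM (add 2) [11..13]) [])            -- ([13,14,15],[2,2,2])
  print (runState (mapM (add 2 >=> add 4) [11..13]) [])  -- ([17,18,19],[4,2,4,2,4,2])
```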
Example: Parallel prefix
Given inputs x1, x2, …, xn, compute

  y1 = x1
  y2 = x1 ∘ x2
  …
  yn = x1 ∘ x2 ∘ … ∘ xn

for ∘, an associative (but not necessarily commutative) operator.
Parallel prefix

Very central component in microprocessors. Most common use: computing carries in fast adders.

Trying different operators:

  Addition:   prefix (+)  [1,2,3,4] = [1, 1+2, 1+2+3, 1+2+3+4] = [1,3,6,10]
  Boolean OR: prefix (||) [F,F,F,T,F,T,T,F] = [F,F,F,T,T,T,T,T]
Parallel prefix
Implementation choices (relying on associativity):

  prefix (∘) [x1,x2,x3,x4] = [y1,y2,y3,y4]

  Serial:   y4 = ((x1 ∘ x2) ∘ x3) ∘ x4
  Parallel: y4 = (x1 ∘ x2) ∘ (x3 ∘ x4)
  Sharing:  y4 = y3 ∘ x4
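The "sharing" scheme is exactly Haskell's scanl1, which can serve as a reference specification for any prefix network (a sketch; prefixSpec is an invented name):

```haskell
-- Reference specification of parallel prefix: the sharing scheme
-- y_k = y_(k-1) ∘ x_k, which is precisely scanl1.
prefixSpec :: (a -> a -> a) -> [a] -> [a]
prefixSpec = scanl1

main :: IO ()
main = do
  print (prefixSpec (+) [1,2,3,4])
  -- [1,3,6,10]
  print (prefixSpec (||) [False,False,False,True,False,True,True,False])
  -- [False,False,False,True,True,True,True,True]
```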
There are many of them...
Sklansky Brent-Kung Ladner-Fischer
Parallel prefix: Sklansky
  sklansky op [a] = return [a]
  sklansky op as  = do
      let k       = length as `div` 2
          (ls,rs) = splitAt k as
      ls'  <- sklansky op ls
      rs'  <- sklansky op rs
      rs'' <- sequence [op (last ls', r) | r <- rs']
      return (ls' ++ rs'')
Simplest approach (divide-and-conquer). Purely structural (no geometry). Could have been (monadic) Lava.
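The recursion can be checked without any hardware library by instantiating it at the Identity monad — a sketch with the operator lifted into the monad; prefixSums is an invented test name:

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- The Sklansky recursion from the slide, kept generic in the monad
-- so it can be run on plain values for testing.
sklansky :: Monad m => ((a,a) -> m a) -> [a] -> m [a]
sklansky op [a] = return [a]
sklansky op as  = do
  let k       = length as `div` 2
      (ls,rs) = splitAt k as
  ls'  <- sklansky op ls
  rs'  <- sklansky op rs
  rs'' <- sequence [op (last ls', r) | r <- rs']
  return (ls' ++ rs'')

-- Prefix sums computed by the network, in the Identity monad.
prefixSums :: [Int] -> [Int]
prefixSums = runIdentity . sklansky (\(a,b) -> return (a+b))
```

prefixSums [1,2,3,4] evaluates to [1,3,6,10], matching the specification.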
Refinement: Add placement
  sklansky op [a] = space cellWidth [a]
  sklansky op as  = downwards 1 $ do
      let k       = length as `div` 2
          (ls,rs) = splitAt k as
      (ls',rs') <- rightwards 0 $
          liftM2 (,) (sklansky op ls) (sklansky op rs)
      rs'' <- rightwards 0 $
          sequence [op (last ls', r) | r <- rs']
      return (ls' ++ rs'')
Sklansky with placement
Simple PostScript output allows interactive development of the placement.
Refinement: Add routing guides
  bus = rightwards 0 . mapM bus1
    where
      bus1 = space 2750 >=> guide 3 500 >=> space 1250

  sklanskyIO op = downwards 0 $
          inputList 16 "in"
      >>= bus
      >>= space 1000
      >>= sklansky op
      >>= space 1000
      >>= bus
      >>= output "out"
Reusing standard (monadic) Haskell combinators (nothing Wired-specific)
Sklansky with guides
Refinement: More guides
  sklansky op [a] = space cellWidthD [a]
  sklansky op as  = downwards 1 $ do
      bus as
      let k       = length as `div` 2
          (ls,rs) = splitAt k as
      (ls',rs') <- rightwards 0 $
          liftM2 (,) (sklansky op ls) (sklansky op rs)
      rs'' <- rightwards 0 $
          sequence [op (last ls', r) | r <- rs']
      bus (ls' ++ rs'')
Sklansky with guides
Experiment: Compaction
Base case changed from

  sklansky op [a] = space cellWidthD [a]

to

  sklansky op [a] = return [a]
Buses were compacted separately
Export to CAD tool (Cadence SoC Encounter)
Auto-routed in Encounter. Odd rows flipped to share power rails, through a simple change in the recursive call:

  sklansky (flipY . op) ls

Layout exchanged using the DEF file format.
Fast, low-power prefix networks
Mary Sheeran has developed circuit generators in Lava that search for fast, low-power parallel prefix networks.

Initially, crude performance models:
  Delay: logical depth
  Power: number of operators

Still good results. Now using Wired to improve accuracy: static timing/power analysis using models from the cell library.
Minimal change to search algorithm
  prefix f p = memo pm
    where
      pm ([],w)  = perhaps id' ([],w)
      pm ([i],w) = perhaps id' ([i],w)
      pm (is,w) | 2^(maxd (is,w)) < length is = Fail
      pm (is,w) = (bestOn is f . dropFail)
          [ wrpC ds (prefix f p) (prefix p p) | ds <- igen ... ]
        where
          wrpC ds p1 p2 = wrp ds (perhaps id' c) (p1 c1) (p2 c2)
          ...
Plug in cost functions that analyze the placed network through Wired.
85 bits, depth 8
Design exploration
85 inputs, depth 8, varying allowed fanout.

At 128 bits, minimum depth is slower than going one level deeper (the crude delay model fails here). The accurate model is consistent with the timing report from Encounter.
  Fanout   Delay [ns]   Power [mW]
  7        0.646        15.2
  8        0.628        15.7
  9        0.624        15.9
  10       0.620        16.1
Binary multiplication
        101100      (44)
      * 001011      (11)
      --------
        101100
       101100
      000000        "Partial products"
     101100
    000000
  + 000000
  ------------
  000111100100      (= 484)

1) Generate the partial products (PPs)
2) Sum the partial products
   a) Sum until two terms left
   b) Add the two remaining terms
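The two steps above can be sketched over plain integer lists, with bits least-significant first. This is an illustration only, not gate-level code; partialProducts and sumPPs are invented helper names:

```haskell
-- Step 1: generate the partial products. Row i is the multiplicand
-- scaled by multiplier bit i and shifted left i positions.
-- Bit lists are least-significant first.
partialProducts :: [Int] -> [Int] -> [[Int]]
partialProducts as bs =
  [ replicate i 0 ++ map (* b) as | (i, b) <- zip [0..] bs ]

-- Step 2 (both a and b collapsed): sum the partial products by
-- converting each row back to an integer and adding.
sumPPs :: [[Int]] -> Int
sumPPs = sum . map fromBits
  where fromBits bs = sum (zipWith (\i b -> b * 2^i) [0..] bs)

-- 44 (101100) times 11 (001011), LSB-first bit lists:
-- sumPPs (partialProducts [0,0,1,1,0,1] [1,1,0,1,0,0]) == 484
```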
Not in this talk
Column compression multipliers
        101100
      * 001011
      --------
        101100
       101100
      000000
     101100
    000000
  + 000000
Use full adders to compress the bits in each column until only two bits remain. Each full adder produces a carry, which is forwarded to the next column. Different strategies for which order to process the bits yield very different characteristics (e.g. linear vs. logarithmic depth).
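One compression step can be sketched on plain Bool bits. This is an illustration only, not gate-level code; fullAdder and compressColumn are invented names:

```haskell
-- A full adder: three input bits in one column give a sum bit (kept
-- in the column) and a carry bit (forwarded to the next column).
fullAdder :: Bool -> Bool -> Bool -> (Bool, Bool)
fullAdder a b c = (s, cout)
  where
    s    = (a /= b) /= c                     -- three-input XOR
    cout = (a && b) || (a && c) || (b && c)  -- majority

-- Compress one column with full adders until at most two bits
-- remain, collecting the carries destined for the next column.
compressColumn :: [Bool] -> ([Bool], [Bool])
compressColumn (a:b:c:rest) =
  let (s, cout)       = fullAdder a b c
      (bits, carries) = compressColumn (s : rest)
  in  (bits, cout : carries)
compressColumn bits = (bits, [])
```

For example, a column of four ones compresses to two remaining bits plus one carry for the next column, preserving the total weight.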
High-performance multiplier (HPM)
Multiplier reduction tree with logarithmic logic depth and regular connectivity. Eriksson, Sheeran, et al. ISCAS '06.
Simple scheme:
  Process PP signals first
  Process full-adder output bits "as late as possible"
  Prioritize carry bits
Purely structural version (≈ Lava)
Show code...
Refinement 1
Refinement 2
Refinement 3
Rectangular transform
Using reduction tree in real design
By Kasyab, a Ph.D. student in Computer Engineering
Summary
Wire-aware hardware design methods are needed. Wired offers flexible hardware design at low levels of abstraction.

Sklansky:
  At Intel: 1000 lines of scripting code (Perl)
  In Wired: <50 lines (though with fewer details)

Layout-/wire-aware design exploration
Get Wired
Install Haskell Platform (to get the Cabal tool):
http://hackage.haskell.org/platform/
Install Wired:

  > cabal install Wired

Manual download:
  http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Wired