Lava 4 (relevant to take home exam): Stepping back to see the bigger picture



slide-1
SLIDE 1

Lava 4 (relevant to take home exam) Stepping back to see the bigger picture

Where can more info be found?
What are the hot research topics?

slide-2
SLIDE 2

Prefix

Given inputs x1, x2, x3 … xn
Compute x1, x1*x2, x1*x2*x3, … , x1*x2*…*xn
where * is an arbitrary associative (but not necessarily commutative) operator
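As a reference point, this specification coincides with the Prelude function scanl1. A minimal sketch (the name prefixSpec is ours and does not appear in the slides):

```haskell
-- Serial reference semantics for prefix: output k is x1 * x2 * ... * xk.
-- prefixSpec is a hypothetical name; scanl1 comes from the Haskell Prelude.
prefixSpec :: (a -> a -> a) -> [a] -> [a]
prefixSpec = scanl1
```

List concatenation is associative but not commutative, so it makes a useful test operator:

*Main> prefixSpec (+) [1..5]
[1,3,6,10,15]
*Main> prefixSpec (++) ["a","b","c"]
["a","ab","abc"]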

slide-3
SLIDE 3

Why interesting?

Microprocessors contain LOTS of parallel prefix circuits

not only in binary and FP adders, but also in address calculation, priority encoding, etc.

Overall performance depends on making them fast.
But they should also have low power consumption...
Parallel prefix is a good example of a connection pattern for which it is interesting to do better synthesis.

slide-4
SLIDE 4

Serial prefix

[Figure: serial prefix network, inputs ordered from least significant to most significant]
n = 8 inputs, depth d = 7, size s = 7 (number of ops)
Pictures are generated by symbolic evaluation of Lava descriptions.
The drawing style is specific to parallel prefix.

slide-5
SLIDE 5

serr _  [a]      = [a]
serr op (a:b:bs) = a:cs
  where
    c  = op(a,b)
    cs = serr op (c:bs)

*Main> simulate (serr plus) [1..10]
[1,3,6,10,15,21,28,36,45,55]
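The depth figure quoted for the serial network can be reproduced without Lava by running the same description on depths instead of values. The following is a sketch under assumptions: serr is copied into plain Haskell with an extra clause for the empty list, and depthOp and depthOf are hypothetical helper names, not part of Lava.

```haskell
-- Plain-Haskell copy of serr; op takes a pair, as in the Lava version.
serr :: ((a, a) -> a) -> [a] -> [a]
serr _  []           = []
serr _  [a]          = [a]
serr op (a : b : bs) = a : serr op (op (a, b) : bs)

-- Non-standard interpretation: a wire carries its depth, not a value.
-- An operator's output is one level below its deeper input.
depthOp :: (Int, Int) -> Int
depthOp (x, y) = 1 + max x y

-- Depth of a network on n >= 1 inputs that all arrive at depth 0.
depthOf :: (((Int, Int) -> Int) -> [Int] -> [Int]) -> Int -> Int
depthOf net n = maximum (net depthOp (replicate n 0))
```

*Main> serr (uncurry (+)) [1..10]
[1,3,6,10,15,21,28,36,45,55]
*Main> depthOf serr 8
7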

slide-6
SLIDE 6

Sklansky


slide-7
SLIDE 7

Sklansky

32 inputs, depth 5, 80 operators


slide-8
SLIDE 8

skl _  [a] = [a]
skl op as  = init los ++ ros'
  where
    (los,ros) = (skl op las, skl op ras)
    ros'      = fan op (last los : ros)
    (las,ras) = halveList as
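To try skl outside Lava and to check the "32 inputs, depth 5, 80 operators" figure, one can supply fan and halveList and evaluate the network symbolically. This is only a sketch: the definitions of fan and halveList are assumptions (the slides use them without showing them), and Wire, opW and measure are hypothetical names. Operators are counted by the span of inputs they combine, which assumes that no two operators in the network compute the same span.

```haskell
import Data.List (nub)

-- Assumed definition: the first wire is combined with each of the others.
fan :: ((a, a) -> a) -> [a] -> [a]
fan _  []       = []
fan op (i : is) = i : [op (i, k) | k <- is]

-- Assumed definition: split a list into two halves.
halveList :: [a] -> ([a], [a])
halveList as = splitAt (length as `div` 2) as

skl :: ((a, a) -> a) -> [a] -> [a]
skl _  []  = []
skl _  [a] = [a]
skl op as  = init los ++ ros'
  where
    (las, ras) = halveList as
    (los, ros) = (skl op las, skl op ras)
    ros'       = fan op (last los : ros)

-- Symbolic wire: inputs lo..hi combined, depth, and the operators in its cone.
data Wire = Wire { lo :: Int, hi :: Int, depth :: Int, cone :: [(Int, Int)] }

opW :: (Wire, Wire) -> Wire
opW (a, b) = Wire (lo a) (hi b) (1 + max (depth a) (depth b))
                  (nub ((lo a, hi b) : cone a ++ cone b))

-- (depth, number of operators) of a network on n >= 1 inputs.
measure :: (((Wire, Wire) -> Wire) -> [Wire] -> [Wire]) -> Int -> (Int, Int)
measure net n = (maximum (map depth outs), length (nub (concatMap cone outs)))
  where outs = net opW [Wire i i 0 [] | i <- [1 .. n]]
```

*Main> measure skl 32
(5,80)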

slide-9
SLIDE 9

Brent Kung

Fewer ops than Sklansky, at the cost of being deeper. Fanout is only 2.

slide-10
SLIDE 10

BK recursive pattern


P is another half size network operating on only the thick wires
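One way to read this recursive pattern in plain Haskell is sketched below. It is an interpretation, not the Lava code from the course: bk, weave and everyOther are hypothetical names. Adjacent pairs are combined, P (a recursive call) runs on those results only, and each remaining output needs one more operator.

```haskell
-- Sketch of the recursive pattern: pairs on top, P on the thick wires,
-- one extra operator for each output that P does not produce.
bk :: ((a, a) -> a) -> [a] -> [a]
bk _  []  = []
bk _  [a] = [a]
bk op as  = head as : weave ps (tail evens)
  where
    evens = everyOther as           -- x1, x3, x5, ...
    odds  = everyOther (tail as)    -- x2, x4, x6, ...
    ps    = bk op [op (e, o) | (e, o) <- zip evens odds]  -- P on the thick wires
    -- each P output p is used twice: as an output, and combined with the next wire
    weave (p : ps') (e : es) = p : op (p, e) : weave ps' es
    weave ps'       _        = ps'

-- Keep the 1st, 3rd, 5th, ... element of a list.
everyOther :: [a] -> [a]
everyOther (x : _ : xs) = x : everyOther xs
everyOther xs           = xs
```

*Main> bk (uncurry (++)) ["a","b","c","d","e"]
["a","ab","abc","abcd","abcde"]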

slide-11
SLIDE 11


Ladner Fischer

NOT the same as Sklansky; many books and papers are wrong about this (including slides from Digital Circuit Design course)

slide-12
SLIDE 12

Question

How do we design fast low power prefix networks?


slide-13
SLIDE 13

Answer

Generalise the above recursive constructions.
Use dynamic programming to search for a good solution.
Use Wired to increase the accuracy of power and delay estimations (see later lecture by Emil).

slide-14
SLIDE 14

BK recursive pattern

P is another half size network operating on only the thick wires.
This is an alternative view to the "forwards and backwards trees" that some of you saw in Jeppson’s course.

slide-15
SLIDE 15

BK recursive pattern generalised


Each S is a serial network like that shown earlier

slide-16
SLIDE 16

4 2 3 … 4

This sequence of numbers determines how the outer "layer" looks

slide-17
SLIDE 17

4 2 3 … 4
4 2 3 … 4
-1     +1

The sequence for the widths of the fans at the bottom is closely related

slide-18
SLIDE 18

4 2 3 … 4
3 2 3 … 5

The sequence for the widths of the fans at the bottom is closely related

slide-19
SLIDE 19

4 2 3 … 4

So just look at all possibilities for this sequence, and for each one find the best possibility for the smaller P.
Then pick the best overall!
Dynamic programming

slide-20
SLIDE 20

Search!

Need a measure function (e.g. number of operators).
Very similar to a "shortest paths" algorithm.
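The shape of such a search can be sketched in a few lines of plain Haskell. This is a deliberately simplified model and not the course code: every name below is hypothetical, the measure is fixed to the number of operators, each group of wires gets a serial network S, P is found by a recursive call, and a single layer of fans finishes the job. Each wire carries its arrival delay and the deadline its output must meet. The sketch does no memoisation and is exponential in the number of inputs, so it is only meant for tiny cases.

```haskell
import Data.Maybe (catMaybes)

type Wire = (Int, Int)   -- (arrival delay, deadline) for one position

-- All width sequences with parts between 1 and g that sum to n.
widthSeqs :: Int -> Int -> [[Int]]
widthSeqs _ 0 = [[]]
widthSeqs g n = [w : ws | w <- [1 .. min g n], ws <- widthSeqs g (n - w)]

splitBy :: [Int] -> [a] -> [[a]]
splitBy []       _  = []
splitBy (w : ws) xs = take w xs : splitBy ws (drop w xs)

-- Fewest operators meeting every deadline, or Nothing if nothing fits.
search :: Int -> [Wire] -> Maybe Int
search _ []        = Just 0
search _ [(d, dl)] = if d <= dl then Just 0 else Nothing
search g wires =
  case catMaybes [layer g (splitBy ds wires)
                 | ds <- widthSeqs g (length wires), any (> 1) ds] of
    [] -> Nothing
    cs -> Just (minimum cs)

-- Cost of one outer layer (serial groups + fans) plus the best inner P.
layer :: Int -> [[Wire]] -> Maybe Int
layer g groups
  | not firstOk || not restOk = Nothing
  | otherwise                 = fmap (+ (serialOps + fanOps)) (search g thick)
  where
    -- output delays of the serial network on each group, paired with deadlines
    outs      = [zip (scanl1 (\o d -> 1 + max o d) (map fst grp)) (map snd grp)
                | grp <- groups]
    nonLast   = map init outs
    firstOk   = all (\(o, dl) -> o <= dl) (head nonLast)
    restOk    = all (\(o, dl) -> o + 1 <= dl) (concat (tail nonLast))
    -- a P output must arrive one level before every fan output it feeds
    fanLimits = [minimum (maxBound : [dl - 1 | (_, dl) <- nl]) | nl <- tail nonLast]
                ++ [maxBound]
    thick     = [(o, min dl lim) | ((o, dl), lim) <- zip (map last outs) fanLimits]
    serialOps = sum [length grp - 1 | grp <- groups]
    fanOps    = sum [length nl | nl <- tail nonLast]

-- n inputs arriving at time 0, all outputs due by depth w, group width <= g.
minOps :: Int -> Int -> Int -> Maybe Int
minOps g n w = search g (replicate n (0, w))
```

*Main> minOps 4 4 2
Just 4
*Main> minOps 4 4 3
Just 3
*Main> minOps 4 4 1
Nothing

Caching the results of search on its argument would avoid repeating identical subproblems, which is what turns this kind of search into dynamic programming.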

slide-21
SLIDE 21

wsoE f1 g ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([d],_,w) = trywire ([d],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC ds (prefix f) | ds <- topds g h (length is)]
          where . . . .

The real code!

slide-22
SLIDE 22


f1 is the measure function being optimised for
slide-23
SLIDE 23


g is the max width of the small S and F networks. Controls fanout.
slide-24
SLIDE 24


context (is,xs,w):
is: delays in
xs: wire numbers (positions) in
w: allowed depth

slide-25
SLIDE 25


use memoisation to avoid expensive recomputation
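The memo function itself is not shown in the slides. As an illustration of the idiom only, here is a minimal sketch with a hypothetical helper memoInt that caches a function on non-negative Ints in a lazily evaluated list. It is applied to a small counting problem related to the search: how many width sequences with parts between 1 and g sum to n.

```haskell
-- memoInt is a hypothetical helper, not the memo used in the slides:
-- it caches f at 0, 1, 2, ... in a lazy list, so each entry is computed once.
memoInt :: (Int -> a) -> Int -> a
memoInt f = (map f [0 ..] !!)

-- Number of width sequences (parts between 1 and g) that sum to n >= 0.
countSeqs :: Int -> Int -> Integer
countSeqs g = go
  where
    go    = memoInt go'               -- recursive calls go through the cache
    go' 0 = 1
    go' n = sum [go (n - w) | w <- [1 .. min g n]]
```

*Main> countSeqs 4 8
108
*Main> countSeqs 8 8
128

The recursion is written against go, the cached version, so each subproblem is evaluated once however many times it is requested.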

slide-26
SLIDE 26


base case: single wire

slide-27
SLIDE 27


Fail if it is simply impossible to fit a prefix network in the available depth

slide-28
SLIDE 28


For each candidate sequence: build the resulting network (where the call of (prefix f) gives the best network for the recursive call inside).
(Needed to think hard about controlling the size of the search space.)

slide-29
SLIDE 29

parpre f1 g ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([d],_,w) = trywire ([d],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC ds (prefix f) | ds <- topds g h (length is)]
          where . . . .

The real code!

Finally, pick the best among all these candidates

slide-30
SLIDE 30

Result when minimising number of ops: depth 6, 33 inputs, fanout 7

This network is Depth Size Optimal (DSO):
depth + number of ops = 2(number of inputs) - 2
(known to be the smallest possible no. of ops for given depth, inputs)
6 + 58 = 2*33 - 2
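The DSO relation is easy to check mechanically. A minimal sketch (isDSO is a hypothetical helper name), applied to figures quoted in these slides:

```haskell
-- isDSO is a hypothetical helper: it checks depth + ops = 2 * inputs - 2.
isDSO :: Int -> Int -> Int -> Bool
isDSO depth ops inputs = depth + ops == 2 * inputs - 2
```

*Main> isDSO 6 58 33
True
*Main> isDSO 7 7 8
True
*Main> isDSO 5 80 32
False

The second line is the serial network on 8 inputs and the third is the 32-input Sklansky network, which trades many more operators for its small depth.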

slide-31
SLIDE 31

64 inputs, depth 8, size 118 (also DSO)
BUT not min. depth.
We need to move away from DSO if we want shallow networks.

slide-32
SLIDE 32

A further generalisation


slide-33
SLIDE 33

parpre1 f1 f2 g m ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([],_,w)  = trywire ([],w)
        pm ([i],_,w) = trywire ([i],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC1 ds (prefix f) (prefix f2) | ds <- topds1 g h m lis]

slide-34
SLIDE 34

extra base case for 0 inputs

slide-35
SLIDE 35

now there are 2 recursive calls

slide-36
SLIDE 36

Result

When minimising no. of ops:
gives the same as Ladner Fischer for 2^n inputs, depth n
considerably fewer ops and lower fanout elsewhere (non power of 2, deeper)
Translates into low power plus decent speed when exported to Design Compiler

slide-37
SLIDE 37


Link to Wired allows more accurate estimates. Can then explore design space

slide-38
SLIDE 38


Can also export to Cadence SoC Encounter

slide-39
SLIDE 39

Wired

Start with a Lava-like description and then gradually add placement info. + wiring "guides".
Can still use our bag of programming tricks (still embedded in Haskell).
Quick but relatively accurate design exploration.
See lecture by Emil on Thursday.

slide-40
SLIDE 40

Obvious questions

This is very low level. What about higher up, earlier in the design?
(Tentative assertion: these were general programming idioms with possible application at other levels of abstraction.)
What about the cases when such a structural approach is inappropriate?
Can we make refinement work?
Can we design appropriate GENERIC verification methods?

slide-41
SLIDE 41

Putting the designer in control

Connection patterns are an essential first step (and give some layout awareness when wanted).
We write circuit generators rather than circuit descriptions.
Everything is done behind the scenes by symbolic evaluation.
Full power of Haskell is available to the user (but we have some useful idioms to reduce the fear).
Circuit generators are short and sweet and LOOK LIKE circuit descriptions.

slide-42
SLIDE 42

It’s all about programming

Non-standard interpretation is used after generation (as we have long done) and now also to guide synthesis.
Clever circuits are a good idiom. They can control choice of components, wiring and topology, and greatly increase the expressive power of the connection patterns approach.
Having a full functional language available is great once one has had some practice.
More idioms to be discovered.
Ideas compatible with Intel’s IDV.

slide-43
SLIDE 43

We can’t only think about function

Clever circuits give a way to allow non-functional properties to influence design (even early on). Makes blocks context sensitive.
Vital as we move to deep sub-micron.
Separation of concerns is becoming less and less possible.
First experiments are (and will be) about module generation.
Remains to be seen if there are applications at higher levels.
Hopefully, a project on DSP Algorithm Design with Ericsson will explore this.

slide-44
SLIDE 44


The Big Picture (Design and Verification Languages) (see chapter in e-Book)

VHDL Verilog C

UML

slide-45
SLIDE 45


The Big Picture (Languages)

VHDL Verilog C

UML

slide-46
SLIDE 46


Intel IDV (Seger)

Forte (Intel’s FV system)

IBM SystemML (now called HDML, on sourceforge)

Masters projects possible
Behavioural Lava (York)
Lava + Wired etc.

Bluespec SV

Lustre, Esterel Cryptol

slide-47
SLIDE 47


The Big Picture (Verification methods) (see course intro., lectures by Seger and Kunz)

Equivalence Checking (formal)

Simulation

Property Checking Formal

slide-48
SLIDE 48

Kunz (Infineon, Siemens, Bosch … OneSpin)
processor and SoC verification
SAT-based
Extremely impressive!
See also work at companies like NVIDIA, Freescale, … (see panel at FMCAD 2007 (links page))

A problem is that there is a lot of unpublished work…

slide-49
SLIDE 49

Intel (Seger’s lecture)
Forte (STE)
niches (such as Floating Point Arith.)

IBM Sixth Sense
combines formal and semi-formal
emphasises scalability and automation
see great presentation by Baumgartner from FMCAD 2006 (links page)

slide-50
SLIDE 50

Hot research topics

Coverage (OneSpin look to have something very interesting, but it is not public)
Methodology, finding new FV "recipes"
Moving up in abstraction levels
Satisfiability Modulo Theories (SMT), First Order Logic
How to design (and verify) complete systems has become harder because of multicore
Getting control of non-functional properties (particularly power consumption)

slide-51
SLIDE 51

Hot research topics

Parallelisation of EDA algorithms
Protocol verification
Increasing automation of FV (e.g. transformation-based verification à la Sixth Sense)
How to build and use verification IP, reuse
Post-silicon verification

slide-52
SLIDE 52

You should think about

The two different design flows that you have seen
What was good and bad about them
YOUR opinions based on your experience (which is influenced by previous expertise)

Formal Verification
evidence about its use (suitable niches, module verification)
limitations (a main one being scalability)
what it can give when it works