CS137: Electronic Design Automation
Day 9: January 30, 2006
Parallel Prefix
CALTECH CS137 Winter2006 -- DeHon

Today
• Bit-Level
  – Addition
  – LUT Cascades
• For Sums
  – Applications
• FSMs
• SATADD
• Data Forwarding
• Pointer Jumping
  – Applications

Introduction / Reminder: Addition in Log Time

Ripple Carry Addition
• Simple “definition” of addition
• Serially resolve carry at each bit

CLA Functions
• Think about each adder bit as computing a function on the carry in
  – c[i]=g(c[i-1])
  – The particular function g depends on a[i], b[i]
  – g=f(a,b)

• What functions can g(c[i-1]) be?
  – g(x)=1
    • a[i]=b[i]=1
  – g(x)=x
    • a[i] xor b[i]=1
  – g(x)=0
    • a[i]=b[i]=0
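The carry-function view of ripple-carry addition can be written out directly. This is a minimal Python sketch, assuming unsigned integers of a fixed `width` (the names `carry_function` and `ripple_add` are hypothetical). Each bit picks one of the three functions g from a[i] and b[i], and the loop applies them one after another. That one-after-another dependence is exactly what makes ripple carry take linear time.

```python
from typing import Callable


def carry_function(a_bit: int, b_bit: int) -> Callable[[int], int]:
    """Return g such that c[i] = g(c[i-1]); g depends only on a[i], b[i]."""
    if a_bit == 1 and b_bit == 1:
        return lambda x: 1  # g(x)=1: carry out regardless of carry in
    if a_bit ^ b_bit:
        return lambda x: x  # g(x)=x: carry in passes straight through
    return lambda x: 0      # g(x)=0: no carry out regardless of carry in


def ripple_add(a: int, b: int, width: int) -> int:
    """Add two width-bit unsigned ints, resolving the carry one bit at a time."""
    carry, total = 0, 0
    for i in range(width):  # bit i must wait for the carry out of bit i-1
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        total |= (a_i ^ b_i ^ carry) << i
        carry = carry_function(a_i, b_i)(carry)
    return total | (carry << width)  # final carry out becomes bit `width`
```

For example, `ripple_add(11, 6, 4)` returns 17, with the final carry out landing in bit 4.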

Functions
• What functions can g(c[i-1]) be?
  – g(x)=1: Generate
    • a[i]=b[i]=1
  – g(x)=x: Propagate
    • a[i] xor b[i]=1
  – g(x)=0: Squash
    • a[i]=b[i]=0

Combining
• Want to combine functions
  – Compute $c[i] = g_i(g_{i-1}(c[i-2]))$
  – Compute the composition of two functions
• What function is the composition of two of these functions?
  – Same as before
    • Propagate, generate, squash

Compose Rules (LSB MSB)
The first letter is the lower-order bit's function (applied first); the second is the higher-order bit's function.
Compose   Result
GG        G
GP        G
GS        S
PG        G
PP        P
PS        S
SG        G
SP        S
SS        S

Reduce Tree
[figure]

Combining
• Do it again…
• Combine g[i-3,i-2] and g[i-1,i]
• What do we get?
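The compose rules and the reduce tree can be checked with a small software model. This sketch encodes the three carry functions as the letters G, P and S, combines neighbours pairwise, and reads off the final carry. It is a hypothetical Python model, not a hardware description. The function names are illustrative, and inputs are assumed to be unsigned `width`-bit integers. The brute-force loop rebuilds the Compose Rules table by applying both functions to each carry-in value.

```python
from typing import List

# Carry-function codes from the slides: G (g(x)=1), P (g(x)=x), S (g(x)=0).
APPLY = {"G": lambda x: 1, "P": lambda x: x, "S": lambda x: 0}


def classify(a_bit: int, b_bit: int) -> str:
    if a_bit and b_bit:
        return "G"
    return "P" if a_bit ^ b_bit else "S"


def compose(lsb: str, msb: str) -> str:
    """Function for 'apply lsb, then msb'; only a propagating msb lets lsb show through."""
    return lsb if msb == "P" else msb


# Reproduce the Compose Rules table by brute force over both carry-in values.
for lo in "GPS":
    for hi in "GPS":
        assert all(APPLY[compose(lo, hi)](x) == APPLY[hi](APPLY[lo](x)) for x in (0, 1))


def reduce_tree(funcs: List[str]) -> str:
    """Combine neighbouring functions pairwise; about log2(n) levels (funcs are LSB first)."""
    level = list(funcs)
    while len(level) > 1:
        nxt = [compose(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd leftover element passes up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def carry_out(a: int, b: int, width: int) -> int:
    funcs = [classify((a >> i) & 1, (b >> i) & 1) for i in range(width)]
    return APPLY[reduce_tree(funcs)](0)  # the carry into bit 0 is 0
```

Because composition is associative, the pairwise grouping in `reduce_tree` gives the same answer as applying the functions in a serial chain.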

Associative Reduce → Prefix
[figure]

Prefix Tree
• Shows us how to compute the Nth value in O(log(N)) time
• Can actually produce all intermediate values in this time
  – with only a constant factor more hardware

Parallel Prefix
• Important pattern
• Applicable any time the operation is associative
• Function composition is always associative

Generalizing: LUT Cascade

Cascaded LUT Delay Model
• $T_{cascade} = T(\text{3LUT}) + T(\text{mux})$
• Don’t pay
  – General interconnect
  – Full 4-LUT delay

Parallel Prefix LUT Cascade?
• Can we do better than $N \times T_{mux}$?
• Can we compute LUT cascade in O(log(N)) time?
• Can we compute mux cascade using parallel prefix?
• Can we make mux cascade associative?
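The prefix idea can be sketched in software with recursive doubling. In round d, every position combines with the value d places earlier, so all N prefixes are ready after ceil(log2(N)) rounds. This is a Kogge-Stone-style doubling network, chosen here for brevity. It is not necessarily the tree drawn in the slides. It also uses O(N log(N)) combining operators, rather than the constant-factor overhead of a tree-based prefix. Applied to the G/P/S carry functions, it yields every carry, and so a full adder. The function names are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def parallel_prefix(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix of xs under an associative op(earlier, later).
    Each round is one parallel step in hardware; there are ceil(log2(n)) rounds."""
    out, d = list(xs), 1
    while d < len(out):
        # After this round, out[i] covers xs[max(0, i - 2d + 1) .. i].
        out = [out[i] if i < d else op(out[i - d], out[i]) for i in range(len(out))]
        d *= 2
    return out


def compose(lsb: str, msb: str) -> str:
    """G/P/S carry-function composition (lsb applied first)."""
    return lsb if msb == "P" else msb


def add_parallel_prefix(a: int, b: int, width: int) -> int:
    """width-bit unsigned add, taking every carry from a prefix of G/P/S functions."""
    funcs = []
    for i in range(width):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        funcs.append("G" if a_i & b_i else "P" if a_i ^ b_i else "S")
    # prefix[i] maps the carry into bit 0 to the carry out of bit i. That carry-in
    # is 0, so only a composed G produces a 1.
    carries = [1 if f == "G" else 0 for f in parallel_prefix(funcs, compose)]
    total = 0
    for i in range(width):
        c_in = carries[i - 1] if i > 0 else 0
        total |= (((a >> i) ^ (b >> i) ^ c_in) & 1) << i
    return total | (carries[-1] << width)
```

`parallel_prefix` only relies on associativity, not commutativity. For example, with string concatenation it returns `["a", "ab", "abc", "abcd"]` for `list("abcd")`.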

Parallel Prefix Mux cascade
• How can mux transform S → mux-out? (mux-out = A when S=0, B when S=1)
  – A=0, B=0 → mux-out=0: Stop = S
  – A=1, B=1 → mux-out=1: Generate = G
  – A=0, B=1 → mux-out=S: Buffer = B
  – A=1, B=0 → mux-out=/S: Invert = I

Parallel Prefix Mux cascade
(first letter: transform of the earlier mux; second letter: the next mux, applied to the first one's output)
• SS → S   • GS → S   • BS → S   • IS → S
• SG → G   • GG → G   • BG → G   • IG → G
• SB → S   • GB → G   • BB → B   • IB → I
• SI → G   • GI → S   • BI → I   • II → B

Two-mux transforms
• How can 2 muxes transform the input?
• Can I compute 2-mux transforms from 1-mux transforms?

Generalizing mux-cascade
• How can N muxes transform the input?
• Is mux transform composition associative?

Associative Reduce Mux-Cascade
[figure]
• Can be hardwired, no general interconnect
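The mux-transform table can be generated and checked mechanically. This sketch assumes the cascade topology implied by the two-mux table: each mux's output drives the select input of the next mux, and each stage is described by its data inputs (A, B). The names `TRANSFORM`, `compose`, `mux_cascade` and `reduce_cascade` are hypothetical. Because mux-out is A when S=0 and B when S=1, the pair (A, B) is also the transform's truth table. Composition is therefore just a table lookup.

```python
from itertools import product
from typing import List, Tuple

# Keyed by the mux data inputs (A, B), which equal (out when S=0, out when S=1).
TRANSFORM = {(0, 0): "S", (1, 1): "G", (0, 1): "B", (1, 0): "I"}
APPLY = {"S": lambda s: 0, "G": lambda s: 1, "B": lambda s: s, "I": lambda s: 1 - s}


def compose(first: str, second: str) -> str:
    """Transform of two cascaded muxes; 'first' drives the select of 'second'."""
    return TRANSFORM[tuple(APPLY[second](APPLY[first](s)) for s in (0, 1))]


def mux_cascade(s0: int, stages: List[Tuple[int, int]]) -> int:
    """Serial reference: each (A, B) stage is a mux selected by the previous output."""
    s = s0
    for a, b in stages:
        s = b if s else a
    return s


def reduce_cascade(stages: List[Tuple[int, int]]) -> str:
    """Pairwise reduce of the per-stage transforms (about log2(N) levels)."""
    level = [TRANSFORM[st] for st in stages]
    while len(level) > 1:
        nxt = [compose(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Composition is associative, so any grouping (tree shape) gives the same answer.
assert all(compose(compose(x, y), z) == compose(x, compose(y, z))
           for x, y, z in product("SGBI", repeat=3))
```

The four-letter result of `reduce_cascade` can be applied to the first select input in a single step. That is why the reduction can be hardwired rather than routed through general interconnect.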

For Sums

Prefix Sum
• Common operation:
  – Want B[x] such that B[x]=A[0]+A[1]+…+A[x]
  – For I=0 to x (with B[-1]=0)
    • B[I]=B[I-1]+A[I]

Prefix Sum
• Compute in tree fashion
  – A[I]+A[I+1]
  – A[I]+A[I+1]+A[I+2]+A[I+3]
  – …
• Combine partial sums back down tree
  – S(0:7)+S(8:9)+S(10)=S(0:10)

Other simple operators
• Prefix-OR
• Prefix-AND
• Prefix-MAX
• Prefix-MIN

Find-First One
• Useful for arbitration
  – Finds first (highest-priority) requestor
  – Also magnitude finding in numbers
• How:
  – Prefix-OR
  – Locally compute X[I-1]^X[I], where X is the prefix-OR output
  – Flags the first one

Arbitration
• Often want to find first M requestors
  – E.g. assign unique memory ports to the first M processors requesting
• Prefix-sum across all potential requesters
• Counts requesters, giving a unique number to each
• Know if one of first M
  – Perhaps which resource assigned
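The next sketch follows the tree recipe: compute partial sums up the tree, then combine them back down. It assumes an associative `op` with an `identity` value, which is also used to pad the input to a power of two. The technique is the up-sweep/down-sweep (Blelloch-style) scan, named here as the choice made for this sketch. The function name `tree_scan` is hypothetical. Passing a different `op` gives Prefix-OR, Prefix-AND, Prefix-MAX or Prefix-MIN.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_scan(a: List[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Inclusive prefix of a under associative op(earlier, later)."""
    n = len(a)
    size = 1
    while size < n:
        size *= 2
    t = list(a) + [identity] * (size - n)
    d = 1
    while d < size:  # up-sweep: partial results over 2, 4, 8, ... elements
        for i in range(2 * d - 1, size, 2 * d):
            t[i] = op(t[i - d], t[i])
        d *= 2
    t[size - 1] = identity
    d = size // 2
    while d >= 1:  # down-sweep: push "everything to the left" back down the tree
        for i in range(2 * d - 1, size, 2 * d):
            left = t[i - d]
            t[i - d] = t[i]
            t[i] = op(t[i], left)
        d //= 2
    # t now holds the exclusive prefix; fold in a[i] for the inclusive result.
    return [op(t[i], a[i]) for i in range(n)]
```

For example, `tree_scan([3, 1, 4, 1, 5, 9, 2], operator.add, 0)` gives the running sums, and `tree_scan(..., max, float("-inf"))` gives the running maxima.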

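Find-first-one and arbitration can both be written as a prefix followed by purely local work. This sketch assumes requests are given as a list of 0/1 flags, with index 0 as the highest priority. It also assumes that resources 0..M-1 are handed out in priority order. The function names are hypothetical. For brevity, `itertools.accumulate` stands in for a parallel prefix network: it computes the same values, but sequentially.

```python
import operator
from itertools import accumulate
from typing import List, Optional


def find_first_one(req: List[int]) -> List[int]:
    """One-hot flag marking the first 1 in req (index 0 = highest priority)."""
    x = list(accumulate(req, operator.or_))  # prefix-OR: 1 from the first request onward
    # X[I-1] ^ X[I] is 1 only where the prefix-OR switches from 0 to 1.
    return [x[i] ^ (x[i - 1] if i > 0 else 0) for i in range(len(req))]


def arbitrate(req: List[int], m: int) -> List[Optional[int]]:
    """Resource number 0..m-1 for each of the first m requesters, else None."""
    count = list(accumulate(req))  # prefix sum: requesters at or before position i
    return [count[i] - 1 if req[i] and count[i] <= m else None for i in range(len(req))]
```

Every requester's final step uses only its own prefix value (and, for find-first-one, its neighbour's). This is what makes the approach a good fit for a prefix network.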
Partitioning
• Use something to order
  – E.g. spectral linear ordering
  – …or 1D cellular swap to produce linear order
• Parallel prefix on area of units
  – If not all same area
• Know where the midpoint is

Channel Width
• Prefix sum on delta wires at each node
  – To compute net channel widths at all points along channel
  – E.g. 1D ordered
• Maybe use with cellular placement scheme

Rank Finding
• Looking for I’th ordered element
• Do a prefix-sum on high bit only
  – Know m = number of things ≤ 01111111… (N minus the count of high bits set)
• High-low search on result
  – I.e. if m ≥ I, recurse on half with leading zero
  – If m < I, search for (I-m)’th element in half with high bit true
• Find median in O(log²(N)) time
  – One O(log(N)) prefix sum per bit examined

FA/FSM Evaluation
(regular expression recognition)

Finite Automata
• Machine has finite state: S
• On each cycle
  – Input I
  – Compute output and new state
    • Based on inputs and current state
• $O_i, S_{i+1} = f(S_i, I_i)$
• Intuitively, a sequential process
  – Must know previous state to compute next
  – Must know state to compute output

Function Specialization
• But, this is just functions
  – …and function composition is associative
• Given that we know input sequence:
  – $I_0, I_1, I_2, \ldots$
• Can compute specialized functions:
  – $f_i(s) = f(s, I_i)$
• What is $f_i(s)$?
  – Worst-case, a translation table:
    • S=0 → NS0, S=1 → NS1, …
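The Channel Width and Partitioning slides both reduce to a prefix sum over a 1D ordering. This sketch assumes each net is given as a pair of positions in that ordering. A net adds +1 to the delta at its leftmost position and -1 at its rightmost position. The prefix sum of the deltas then counts the nets crossing each gap. For partitioning, a prefix over unit areas locates the midpoint. The function names are hypothetical. `accumulate` and `bisect_left` stand in for the parallel prefix and for each unit's local comparison against half the total area.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple


def channel_widths(num_positions: int, nets: List[Tuple[int, int]]) -> List[int]:
    """Number of nets crossing gap p (between positions p and p+1) in a 1D ordering."""
    delta = [0] * num_positions
    for a, b in nets:
        left, right = min(a, b), max(a, b)
        delta[left] += 1   # the net starts occupying the channel here
        delta[right] -= 1  # ...and stops at its rightmost pin
    return list(accumulate(delta))[:-1]  # prefix sum of deltas = width at each gap


def midpoint_split(areas: List[int]) -> int:
    """Number of leading units whose total area first reaches half of the total."""
    prefix = list(accumulate(areas))  # parallel prefix on area of units
    # In hardware each unit can compare its own prefix with total/2 locally.
    return bisect_left(prefix, prefix[-1] / 2) + 1
```

For instance, with unit areas `[4, 1, 1, 1, 1]` the split falls after the first unit, since it alone holds half of the total area.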

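The Rank Finding high-low search can be sketched for the I'th smallest element. This sketch assumes non-negative integer keys below 2**width, 1-based rank I, and m taken as the count of candidates whose current bit is 0. The name `rank_find` is hypothetical, and the list filtering stands in for a prefix-sum count done in hardware. The search does one count per bit, most significant first, and keeps only the half that must contain the answer.

```python
from typing import List


def rank_find(values: List[int], i: int, width: int) -> int:
    """i'th smallest (1-based) of non-negative integers below 2**width."""
    if not 1 <= i <= len(values):
        raise ValueError("rank out of range")
    candidates = list(values)
    for bit in reversed(range(width)):  # most significant bit first
        ones = [v for v in candidates if (v >> bit) & 1]
        # Candidates with this bit 0; in hardware: N minus a prefix sum of the bit.
        m = len(candidates) - len(ones)
        if m >= i:
            candidates = [v for v in candidates if not (v >> bit) & 1]
        else:
            candidates = ones
            i -= m  # look for the (i-m)'th element among those with the bit set
    return candidates[0]  # all remaining candidates are equal
```

The median of n values is `rank_find(values, (n + 1) // 2, width)`. The loop runs once per bit, so the total time is the number of bits times one prefix-sum delay.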
Function Composition
• Now: $O_{i+m}, S_{i+m+1} = f_{i+m}(f_{i+m-1}(f_{i+m-2}(\cdots f_i(S_i))))$
• Can we compute the function composition?
  – $f_{(i+1,i)}(s) = f_{i+1}(f_i(s))$
  – What is $f_{(i+1,i)}(s)$?
• A translation table just like $f_i(s)$ and $f_{i+1}(s)$
• Table of size |S|, can fill in in O(|S|) time

Recursive Function Composition
• Now: $O_{i+m}, S_{i+m+1} = f_{i+m}(f_{i+m-1}(f_{i+m-2}(\cdots f_i(S_i))))$
• We can compute the composition
  – $f_{(i+1,i)}(s) = f_{i+1}(f_i(s))$
• Repeat to compute
  – $f_{(i+3,i)}(s) = f_{(i+3,i+2)}(f_{(i+1,i)}(s))$
  – Etc. until we have computed $f_{(i+m,i)}(s)$ in O(log(m)) steps

Implications
• If we can get the input stream:
  – Any FA can be evaluated in O(log(N)) time
  – Regular expression recognition in O(log(N))
• Any streaming operator with finite state
  – Where the input stream is independent of the output stream
  – Can be run arbitrarily fast by using parallel prefix on FSM evaluation

Saturated Addition
• $S_{i+1} = \max(\min(I_i + S_i, maxval), minval)$
• Could model as FSM with:
  – |S| = maxval - minval + 1
• So, in theory, FSM result applies
• …but |S| might be $2^{16}$, $2^{24}$

SATADD Composition
[figure]

SATADD Composition
• Can compute composition efficiently [Papadantonakis et al. FPT2005]
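The specialize-then-compose recipe fits in a few lines when each specialized function is stored as a translation table, a tuple indexed by state. This sketch assumes the transition function is given as a dict keyed by (state, symbol) and tracks states only, not outputs. The DFA `CONTAINS_AB` (has the input contained "ab" yet?) is a hypothetical example. Composing two tables costs O(|S|), and the recursive-doubling prefix uses O(log(m)) rounds of such compositions.

```python
from typing import Dict, List, Tuple

Table = Tuple[int, ...]  # Table[s] = next state when the current state is s


def specialize(delta: Dict[Tuple[int, str], int], num_states: int, symbol: str) -> Table:
    """f_i(s) = f(s, I_i) written out as a translation table of size |S|."""
    return tuple(delta[(s, symbol)] for s in range(num_states))


def compose(first: Table, second: Table) -> Table:
    """Table for 'apply first, then second'; O(|S|) work."""
    return tuple(second[s] for s in first)


def parallel_prefix(xs, op):
    """Recursive-doubling inclusive prefix: ceil(log2(n)) rounds of op(earlier, later)."""
    out, d = list(xs), 1
    while d < len(out):
        out = [out[i] if i < d else op(out[i - d], out[i]) for i in range(len(out))]
        d *= 2
    return out


def run_fsm(delta, num_states: int, start: int, inputs: str) -> List[int]:
    """State after each input symbol, computed from composed tables."""
    tables = [specialize(delta, num_states, c) for c in inputs]
    return [table[start] for table in parallel_prefix(tables, compose)]


# Hypothetical DFA: 0 = no progress, 1 = just saw 'a', 2 = has seen "ab" (absorbing).
CONTAINS_AB = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
```

For example, `run_fsm(CONTAINS_AB, 3, 0, "bbab")` returns `[0, 0, 1, 2]`, which matches stepping the DFA one symbol at a time.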

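The slides cite Papadantonakis et al. for efficient SATADD composition. The sketch below does not reproduce that paper. It only illustrates why composition can be cheap. An add-then-clamp function x ↦ min(max(x + add, lo), hi), composed with another such function, is again of the same form. So each composed "table" is three numbers instead of |S| entries. The `SatFn` representation and the `compose` formula are this sketch's own derivation, checked by brute force in the test. They are not taken from the cited work. With minval ≤ maxval, max(min(y, maxval), minval) equals min(max(y, minval), maxval), so the slide's update fits this form.

```python
from typing import List, NamedTuple


class SatFn(NamedTuple):
    """Hypothetical representation of x -> min(max(x + add, lo), hi)."""
    add: int
    lo: int
    hi: int

    def __call__(self, x: int) -> int:
        return min(max(x + self.add, self.lo), self.hi)


def compose(first: SatFn, second: SatFn) -> SatFn:
    """second(first(x)) in the same three-number form: O(1) work, not O(|S|)."""
    return SatFn(
        first.add + second.add,
        max(first.lo + second.add, second.lo),
        min(max(first.hi + second.add, second.lo), second.hi),
    )


def parallel_prefix(xs, op):
    """Recursive-doubling inclusive prefix: ceil(log2(n)) rounds of op(earlier, later)."""
    out, d = list(xs), 1
    while d < len(out):
        out = [out[i] if i < d else op(out[i - d], out[i]) for i in range(len(out))]
        d *= 2
    return out


def satadd_serial(s0: int, inputs: List[int], minval: int, maxval: int) -> List[int]:
    s, out = s0, []
    for x in inputs:
        s = max(min(x + s, maxval), minval)  # S(i+1) = max(min(I_i + S_i, maxval), minval)
        out.append(s)
    return out


def satadd_prefix(s0: int, inputs: List[int], minval: int, maxval: int) -> List[int]:
    fns = [SatFn(x, minval, maxval) for x in inputs]
    return [f(s0) for f in parallel_prefix(fns, compose)]
```

For example, with inputs `[5, 7, -20, 3, 100, -1]`, bounds -8 and 10, and a starting state of 0, both versions produce `[5, 10, -8, -5, 10, 9]`.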
SATADD Reduce Tree
[figure]

Data Forwarding
UltraScalar
From Henry, Kuszmaul, et al. ARVLSI’99, SPAA’99, ISCA’00

Ultrascalar: concept model
[figure]

Consider Machine
• Each FU has a full RF
  – FU = Functional Unit
  – RF = Register File
• Build network between FUs
  – Use network to connect producers and consumers
  – Use register names to configure interconnect
• Signal data ready along network

Ultrascalar Concept
• Linear delay
• O(1) register cost / FU
• Complete renaming at each FU
  – Different set of registers
  – So a "complete RF at each FU" means only the logical registers

Ultrascalar: cyclic prefix
[figure]
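As a software analogy for using a prefix to forward data, each instruction can learn which earlier instruction last wrote each of its source registers. It does this through an exclusive prefix under a "last writer wins" merge, which is associative. This is not the Ultrascalar circuit. The instruction format, the function names, and the use of `None` for "value comes from the initial register file" are all assumptions of this sketch. The cyclic aspect of the Ultrascalar prefix (the instruction window wrapping around) is not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction format: (destination register or None, source registers).
Instr = Tuple[Optional[str], List[str]]


def merge(earlier: Dict[str, int], later: Dict[str, int]) -> Dict[str, int]:
    """Associative 'last writer wins': later writes override earlier ones."""
    out = dict(earlier)
    out.update(later)
    return out


def parallel_prefix(xs, op):
    """Recursive-doubling inclusive prefix: ceil(log2(n)) rounds of op(earlier, later)."""
    out, d = list(xs), 1
    while d < len(out):
        out = [out[i] if i < d else op(out[i - d], out[i]) for i in range(len(out))]
        d *= 2
    return out


def forwarding_sources(program: List[Instr]) -> List[List[Optional[int]]]:
    """For each source operand, the index of the producing instruction (None = initial RF)."""
    writes = [{dst: i} if dst else {} for i, (dst, _) in enumerate(program)]
    inclusive = parallel_prefix(writes, merge)
    visible = [{}] + inclusive[:-1]  # exclusive prefix: only earlier instructions count
    return [[visible[i].get(src) for src in srcs] for i, (_, srcs) in enumerate(program)]
```

For the program `r1 = …; r2 = f(r1); r1 = g(r1, r2); use(r1, r3)`, the sources resolve to instruction 0, instructions 0 and 1, and then instruction 2 plus the initial register file.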
