Outline Multiplication in the digital domain HW mapping - PowerPoint PPT Presentation

Introduction to Structured VLSI Design: Integer Arithmetic and Pipelining
Joachim Rodrigues
Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design, jrs@eit.lth.se, Integer arithmetic

Outline
• Multiplication in the digital domain
• HW-mapping
• Pipelining optimization

Signed and Unsigned Integers
• Unsigned integer: $\sum_{i=0}^{n-1} \mathrm{bit}_i \cdot 2^i$
• Two's complement signed integer: $\mathrm{bit}_{n-1} \cdot (-2^{n-1}) + \sum_{i=0}^{n-2} \mathrm{bit}_i \cdot 2^i$
(Figure: bit positions n-1 ... 5 4 3 2 1 0.)

8-bit Signed/Unsigned Integers
Signed   Bit pattern   Unsigned
 -128    1000 0000     128
 -127    1000 0001     129
  ...    ...           ...
   -2    1111 1110     254
   -1    1111 1111     255
    0    0000 0000       0
    1    0000 0001       1
    2    0000 0010       2
    3    0000 0011       3
  ...    ...           ...
  126    0111 1110     126
  127    0111 1111     127
• MSB defines sign
• Signed overflow: counting below -128 (1000 0000) or above 127 (0111 1111)
• Unsigned overflow: counting above 255 (1111 1111)
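The two interpretation formulas above can be checked against the 8-bit table with a few lines of Python. This is a minimal sketch; the function names `unsigned_value` and `signed_value` are hypothetical and do not come from the slides.

```python
def unsigned_value(bits: str) -> int:
    """Unsigned interpretation: sum of bit_i * 2^i (hypothetical helper)."""
    bits = bits.replace(" ", "")
    n = len(bits)
    # bits[0] is the MSB, so character k carries weight 2^(n-1-k)
    return sum(int(b) << (n - 1 - k) for k, b in enumerate(bits))


def signed_value(bits: str) -> int:
    """Two's complement: bit_(n-1) * (-2^(n-1)) + sum of the lower n-1 bits."""
    bits = bits.replace(" ", "")
    n = len(bits)
    msb = int(bits[0])
    return msb * -(1 << (n - 1)) + unsigned_value(bits[1:])


if __name__ == "__main__":
    for pattern in ("1000 0000", "1111 1110", "0000 0011", "0111 1111"):
        print(pattern, signed_value(pattern), unsigned_value(pattern))
```

The same bit pattern yields two different values; only the weight of the MSB changes between the two interpretations.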

Add/Subtract
(Figure: chain of full adders with inputs $A_0, B_0 \ldots A_{n-1}, B_{n-1}$, carries $C_0 \ldots C_n$ and sum bits $S_0 \ldots S_{n-1}$.)
• The HW for sum/difference (S) doesn't care about signed/unsigned
• Unsigned overflow = carry-out & add OR no carry-out & subtract
• Signed overflow = $C_n \oplus C_{n-1}$
• True sign = $S_{n-1} \oplus$ signed overflow = $(A_{n-1} \oplus B_{n-1} \oplus C_{n-1}) \oplus (C_n \oplus C_{n-1}) = A_{n-1} \oplus B_{n-1} \oplus C_n$

Unsigned Overflow Examples
10 + 6 = 16, outside [0..15]
    1010
  + 0110
  = 0000, $C_4 = 1$
$C_n = C_4 = 1$ & add ⇔ unsigned overflow (carry-out & add)

7 - 10 = -3, outside [0..15]
    0111 - 1010 is the same as 0111 + 0101 + 1
    0111
  + 0101
  +    1
  = 1101, $C_4 = 0$
$C_n = C_4 = 0$ & subtract ⇔ unsigned overflow (no carry-out & subtract)

Signed Overflow Example
6 + 7 = 13, outside [-8..7]
    0110
  + 0111
  = 1101, $C_3 = 1$, $C_4 = 0$
$C_n \oplus C_{n-1} = C_4 \oplus C_3 = 0 \oplus 1 = 1$ ⇔ carry-outs different ⇔ signed overflow
$S_{n-1} \oplus$ signed overflow = $A_{n-1} \oplus B_{n-1} \oplus C_n = A_3 \oplus B_3 \oplus C_4 = 0 \oplus 0 \oplus 0 = 0$ ⇔ true sign = positive/zero

Multiplication
• Product = Multiplicand * Multiplier
• log(product) = log(multiplicand) + log(multiplier)
  – Width of product is (worst case) the sum of the widths of the factors
  – May overflow if a single-length product register is used
• Paper-and-pencil method
  – Conditional add (controlled by bits of multiplier) and shift
  – Partial product progressively develops into product
  – 1 product bit/cycle
• Unsigned and signed multiplication
  – Signs require extra attention
• Sequential, combinational or pipelined implementation
  – Tradeoff between hardware resources, throughput, latency, power
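The overflow rules above can be reproduced with a bit-level model of the adder chain. The sketch below is an illustration only: the name `add_sub`, the `Flags` tuple and the default width `n=4` are assumptions, and subtraction is modelled as adding the inverted operand with a carry-in of 1, as in the 7 - 10 example.

```python
from typing import NamedTuple


class Flags(NamedTuple):
    s: int                    # n-bit sum/difference S
    c_n: int                  # carry-out C_n
    unsigned_overflow: bool
    signed_overflow: bool
    true_sign: int            # S_(n-1) XOR signed overflow


def add_sub(a: int, b: int, n: int = 4, subtract: bool = False) -> Flags:
    """Ripple-carry add/subtract on n-bit patterns (hypothetical helper)."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if subtract:
        b ^= mask                       # invert B; the +1 enters as carry-in
    carry = 1 if subtract else 0        # C_0
    carries = [carry]
    s = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
        carries.append(carry)           # carries[i + 1] is C_(i+1)
    c_n, c_n1 = carries[n], carries[n - 1]
    unsigned_ovf = (c_n == 0) if subtract else (c_n == 1)
    signed_ovf = c_n ^ c_n1
    true_sign = ((s >> (n - 1)) & 1) ^ signed_ovf
    return Flags(s, c_n, unsigned_ovf, bool(signed_ovf), true_sign)


if __name__ == "__main__":
    print(add_sub(0b1010, 0b0110))                 # 10 + 6
    print(add_sub(0b0111, 0b1010, subtract=True))  # 7 - 10
    print(add_sub(0b0110, 0b0111))                 # 6 + 7
```

The sum bits are identical for signed and unsigned operands; only the interpretation of the two carry bits differs.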

Multiplying Using Paper and Pencil
We will concentrate on unsigned integers for the next few slides!
Example: Multiplicand * Multiplier = 1011 * 1110
       1011 * 1110
       0000       (*0 = zero)
   +  1011.       (*1 = copy)
   + 1011..       (*1 = copy)
   +1011...       (*1 = copy)
   10011010
• In decimal: 11 * 14 = 154
• Disadvantage: 2n-bit ALU

... more Paper and Pencil
The multiplicand is 1011. Partial product and partial multiplier share one register:
               Partial product   Partial multiplier
                    0000              1110
   (0) + 0000  ->   0000              0111
   (1) + 1011  ->   0101              1011
   (1) + 1011  ->   1000              0101
   (1) + 1011  ->   1001              1010
• The LSB "controls" whether to add "0" or the multiplicand to the partial product (0: add zero, 1: add multiplicand)
• Shifting in the carry-out prevents overflow
• Advantage: n-bit ALU

Seq. Multiplication, Initialize
(Figure: n-bit multiplicand register (load); n-bit adder with carry-out $C_n$; 2n-bit register loaded with the partial product (0) in the upper half and the multiplier in the lower half; bit 0 provides the control signal.)

Seq. Multiplication, Step
(Figure: same datapath; conditional add controlled by bit 0, then shift right of the 2n-bit partial product/multiplier register with $C_n$ shifted in.)
• Repeat step n times
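The sequential add-and-shift scheme above can be sketched as a short loop. The function name `seq_multiply_unsigned` and the returned trace are assumptions made for illustration; the register layout (partial product in the upper half, multiplier in the lower half of one 2n-bit register) follows the slides.

```python
def seq_multiply_unsigned(multiplicand: int, multiplier: int, n: int = 4):
    """Unsigned add-and-shift multiply with an n-bit adder (hypothetical helper).

    Returns the 2n-bit product and the register contents after every step.
    """
    mask = (1 << n) - 1
    mc = multiplicand & mask
    reg = multiplier & mask          # upper half (partial product) starts at 0
    trace = [reg]
    for _ in range(n):
        upper, lower = reg >> n, reg & mask
        addend = mc if reg & 1 else 0        # bit 0 controls the conditional add
        total = upper + addend               # n+1 bits: carry-out C_n plus sum
        # Shift {C_n, sum, lower} right by one; C_n becomes the new MSB
        reg = ((total << n) | lower) >> 1
        trace.append(reg)
    return reg, trace


if __name__ == "__main__":
    product, trace = seq_multiply_unsigned(0b1011, 0b1110)
    for value in trace:
        print(f"{value >> 4:04b} {value & 0xF:04b}")
    print(product)
```

For 1011 * 1110 the printed register contents follow the worked example step by step, ending in 1001 1010.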

Seq. Multiplication, Result
(Figure: n-bit multiplicand register, adder with carry-out $C_n$, and the 2n-bit register now holding the product.)
• One partial product per clock cycle => very slow

Don't forget ... Signed Multiplication
• Either transform to a multiplication of non-negative integers:
  1. Record signs and negate any negative factors.
  2. Perform unsigned multiplication.
  3. Negate the product if the recorded signs differ.
• Or directly perform signed multiplication:
  1. Take into account the sign bit of the multiplicand by shifting in true sign bits rather than carry-outs, i.e. $A_{n-1} \oplus B_{n-1} \oplus C_n$ rather than $C_n$.
  2. Take into account the sign bit of the multiplier by doing a conditional subtract rather than a conditional add during the last iteration.

Seq. signed multiplication, step
(Figure: n-bit multiplicand register; add/sub unit producing the true sign; 2n-bit partial product/multiplier register shifted right with the true sign shifted in; bit 0 controls the operation.)
• Repeat step n times
• Conditional add for iterations 1..n-1, conditional subtract for iteration n
• True sign = $A_{n-1} \oplus B_{n-1} \oplus C_n$

Multiplication by a Constant
As a designer you need to ensure that multiplication by a small constant is accomplished by a number of shifts and adds.
Some numerical examples:
  *2   (*10 in binary):        multiplicand << 1
  *3   (*11 in binary):        (multiplicand << 1) + multiplicand
  *4   (*100 in binary):       multiplicand << 2
  *5   (*101 in binary):       (multiplicand << 2) + multiplicand
  *255 (*11111111 in binary):  (multiplicand << 8) - multiplicand
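The direct signed variant described above can be modelled in the same style as the unsigned loop. This is a sketch under assumptions: the name `seq_multiply_signed` is hypothetical, operands are n-bit two's complement values, and the subtract in the last iteration is modelled as adding the inverted multiplicand with a carry-in of 1.

```python
def seq_multiply_signed(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Signed add-and-shift multiply for n-bit two's complement operands."""
    mask = (1 << n) - 1
    mc = multiplicand & mask
    reg = multiplier & mask              # upper half (partial product) = 0
    for step in range(n):
        upper, lower = reg >> n, reg & mask
        last = step == n - 1
        if reg & 1:
            # Conditional add, or conditional subtract in the last iteration
            b = (mc ^ mask) if last else mc
            carry_in = 1 if last else 0
        else:
            b, carry_in = 0, 0
        total = upper + b + carry_in
        s, c_n = total & mask, total >> n
        # True sign = A_(n-1) XOR B_(n-1) XOR C_n, shifted in instead of C_n
        true_sign = (upper >> (n - 1)) ^ (b >> (n - 1)) ^ c_n
        reg = (true_sign << (2 * n - 1)) | (s << (n - 1)) | (lower >> 1)
    # Interpret the 2n-bit register as a two's complement number
    if reg >> (2 * n - 1):
        reg -= 1 << (2 * n)
    return reg


if __name__ == "__main__":
    print(seq_multiply_signed(-5, 3))
    print(seq_multiply_signed(3, -5))
    print(seq_multiply_signed(-8, -8))
```

Shifting in the true sign keeps the partial product correctly sign-extended even when the n-bit add overflows, which is why no wider adder is needed.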

String of n-bit Adders
(Figure: partial products $Mp_0 \cdot Mc$, $Mp_1 \cdot Mc$, $Mp_2 \cdot Mc$, ..., $Mp_{n-1} \cdot Mc$ and an initial 0 feeding a string of adders that produce $P_{2n-1}$, $P_{2n-2..n}$, $P_{n-1}$, ..., $P_2$, $P_1$, $P_0$.)
• Unrolling the loop lowers latency when compared to sequential add-and-shift, at the expense of much more hardware
• n x n multiplication requires n-1 n-bit adders
• $t_{\text{saved\_latency}} = n \cdot (t_{\text{clk-out}} + t_{\text{set-up}})$

Carry-save Adders in Multipliers
(Figure: rows of full adders CSA 0 and CSA 1 with inputs $A_{i,j}$, $B_{i,j}$, $C_{i,j}$ and outputs $S_{i,j}$, $C_{i,j}$.)
• Significantly reduced delays for multi-input adders
• Full-adders with clever interconnect
• Sum and carries fed separately to the adder at the next level
• Carries drawn diagonally, sums drawn vertically
• Typically, a final (carry-propagate) adder assimilates the carries

... Pipelined Version
$MP_{i,j} = \mathrm{Multiplier}_i$ AND $\mathrm{Multiplicand}_j$
(Figure: partial-product bits $MP_{0,0} \ldots MP_{3,3}$ feeding rows of adders separated by pipeline registers, followed by a carry-propagate adder that produces $P_7 \ldots P_0$.)

6 x 6 Parallel Array Multiplier
(Figure: 6 x 6 parallel array multiplier.)
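The carry-save idea above (sum and carry words kept separate, one final carry-propagate addition) can be illustrated at the word level. The names `carry_save_add` and `multiply_csa` are hypothetical, and Python integers stand in for the bit vectors of the hardware rows.

```python
def carry_save_add(x: int, y: int, z: int):
    """One row of full adders: three operands in, sum word and carry word out.

    No carry ripples within the row; every bit position is independent.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # carries move one column left
    return sum_word, carry_word


def multiply_csa(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Unsigned n x n multiply: AND-gates, carry-save rows, one final adder."""
    # Partial product i is the multiplicand gated by multiplier bit i, shifted i places
    partial_products = [
        (multiplicand << i) if (multiplier >> i) & 1 else 0 for i in range(n)
    ]
    s, c = 0, 0
    for pp in partial_products:
        s, c = carry_save_add(s, c, pp)   # invariant: s + c == sum of rows so far
    return s + c                          # final carry-propagate adder


if __name__ == "__main__":
    print(multiply_csa(0b1011, 0b1110))
```

Each carry-save row costs one full-adder delay regardless of word width, so only the final addition has a carry chain.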

Sequential, Combinational, and Pipelined
• The sequential shift-and-add algorithm corresponds to a for-loop that may be implemented by:
  – a state machine or
  – instructions (low-end microcontrollers)
• The sequential algorithm may be unrolled and implemented as a deep combinational circuit:
  – String of n-bit adders and AND-gates, or
  – Carry-save adders, AND-gates, and final (n-1)-bit adder
  – Advantage: low latency
  – Disadvantage: more hardware
• The deep combinational circuit may be pipelined
  – Advantage: very high throughput
  – Disadvantages: pipeline latency, more hardware, and higher power

Pipelining

Laundry process
(Figure: laundry process.)
Joachim Rodrigues, Informatik og Matematisk Modellering, jnr@imm.dtu.dk

Comparison
• Non-pipelined:
  – Delay: 60 min
  – Throughput: 1/60 load per min
• Pipelined:
  – Delay: 60 min
  – Throughput: k/(40 + k*20) load per min, about 1/20 when k is large
  – Throughput 3 times better than non-pipelined
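The throughput figures in the comparison can be reproduced numerically. The sketch below assumes three stages of 20 minutes each, which is consistent with the 60-minute delay and the k/(40 + k*20) formula but is not stated explicitly on the slide; the function names are hypothetical.

```python
from fractions import Fraction


def non_pipelined_throughput(stages: int = 3, stage_time: int = 20) -> Fraction:
    """Loads per minute when each load finishes before the next one starts."""
    return Fraction(1, stages * stage_time)


def pipelined_throughput(k: int, stages: int = 3, stage_time: int = 20) -> Fraction:
    """Loads per minute for k loads: fill the pipeline once, then one load per stage time."""
    total_time = (stages - 1) * stage_time + k * stage_time   # 40 + k*20 by default
    return Fraction(k, total_time)


if __name__ == "__main__":
    base = non_pipelined_throughput()
    for k in (1, 10, 1000):
        speedup = pipelined_throughput(k) / base
        print(k, pipelined_throughput(k), float(speedup))
```

With a single load the pipeline gives no gain; the speedup approaches the number of stages only as k grows.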
