Lund University / EITF35/ Liang Liu
EITF35: Introduction to Structured VLSI Design
Part 2.1.1: Combinational circuit
Liang Liu liang.liu@eit.lth.se
Why Called “Combinational” Circuits?
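Combination: in mathematics, a combination is a selection of items from a set in which the order of selection does not matter.
Combinational circuits: the outputs depend on the present input only.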
Two basic components
Operands (data type)
Operations
‘Digital’- quantization
What does it mean?
Two basic components
Operands (data type): 0101010111100; signed/unsigned binary, floating-point, 7-segment, ...
Operations: +/-, ...
Check what is in the library!
Data Representation
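Unsigned:
$x = \sum_{i=0}^{n-1} \mathrm{bit}_i \cdot 2^i$
Signed (Two’s complement):
$x = \mathrm{bit}_{n-1} \cdot (-2^{n-1}) + \sum_{i=0}^{n-2} \mathrm{bit}_i \cdot 2^i$
Example: $11110100_2 = -12_{10}$ (the MSB is the sign bit; 2’s complement)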
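The two weighting rules can be checked with a small Python sketch. The function names `unsigned_value` and `twos_complement_value` are illustrative and not part of the slides; bit strings are assumed to be written MSB first.

```python
def unsigned_value(bits: str) -> int:
    """Interpret a bit string (MSB first) as an unsigned integer."""
    n = len(bits)
    # Bit i (counted from the LSB) has weight 2^i.
    return sum(int(b) << (n - 1 - k) for k, b in enumerate(bits))


def twos_complement_value(bits: str) -> int:
    """Interpret a bit string (MSB first) as a two's complement integer."""
    n = len(bits)
    # The MSB has weight -2^(n-1); the other bits keep their unsigned weights.
    return -int(bits[0]) * (1 << (n - 1)) + unsigned_value(bits[1:])


print(unsigned_value("11110100"))         # 244
print(twos_complement_value("11110100"))  # -12
```

The same bit pattern gives 244 or -12 depending only on how the MSB is weighted.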
8-bit Signed/Unsigned Integers
MSB defines sign

Bit pattern   Unsigned   Signed
0000 0000        0          0
0000 0001        1          1
0000 0010        2          2
0000 0011        3          3
...            ...        ...
0111 1110      126        126
0111 1111      127        127    signed overflow ↓
1000 0000      128       -128    signed overflow ↑
1000 0001      129       -127
...            ...        ...
1111 1100      252         -4
1111 1101      253         -3
1111 1110      254         -2
1111 1111      255         -1    unsigned overflow ↓
Finite Word-Length Effect
Overflow
Quantization error
Rounding, Floor, Ceil
round(0.51)=1, floor(0.51)=0, ceil(0.49)=1
Will learn more in DSP-Design course
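The three quantization modes can be compared with a short Python sketch. The helper `quantize` and its `frac_bits` parameter are assumptions made for illustration; with `frac_bits=0` it reproduces the integer examples above, and "round" is taken here to mean round-half-up.

```python
import math


def quantize(x: float, frac_bits: int, mode: str = "round") -> float:
    """Quantize x to a grid with step 2**-frac_bits using the given mode."""
    scale = 1 << frac_bits
    scaled = x * scale
    if mode == "round":
        q = math.floor(scaled + 0.5)  # round half up (an assumption)
    elif mode == "floor":
        q = math.floor(scaled)
    elif mode == "ceil":
        q = math.ceil(scaled)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return q / scale


print(quantize(0.51, 0, "round"))  # 1.0
print(quantize(0.51, 0, "floor"))  # 0.0
print(quantize(0.49, 0, "ceil"))   # 1.0
```

The quantization error is the difference between `x` and the returned value; more fractional bits shrink the step and therefore the error.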
Fixed-Point Design
DSP algorithms
for digital hardware realization
Fixed-point digital VLSI
dynamic range
Design flow (figure): Idea → Floating-Point Algorithm → Quantization → Fixed-Point Algorithm → Code Generation → Target System
Figure labels: Algorithm Level, Implementation Level, Range Estimation
“Optimum” Word-Length
Range Analysis
Fixed-point Simulation
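One simple form of range analysis is to record the values a signal takes in simulation and pick the smallest two's complement word length that covers them. The sketch below assumes integer-valued samples; `required_signed_bits` is a hypothetical helper, not a method given in the slides.

```python
def required_signed_bits(samples) -> int:
    """Smallest n such that every sample fits in [-2^(n-1), 2^(n-1) - 1]."""
    lo, hi = min(samples), max(samples)
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


print(required_signed_bits([-8, 3, 7]))  # 4
print(required_signed_bits([6, 7, 13]))  # 5
```

A range observed in simulation is only as good as the test vectors, so some margin is usually added in practice.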
Hardware Consumption Analysis
Complexity analysis
Quick prototype
Where is the cost
[Figure: surviving label “Global Cache”]
Source: Song Han, “Efficient Methods and Hardware for Deep Learning” & V. Sze et al., “Efficient Processing of Deep Neural Networks: A Tutorial and Survey”
Implement the best HW realization. Best??
[Figure labels: Flexibility, Complexity; Low power, Low cost, Flexibility; Lower power, Lower cost]
Design Trade-off
Different applications, different demands...
Thus, “just good enough” is the best in engineering.
Try to find a BALANCE between effort and cost!
Overview
Fixed-Point Representation
Add/Subtract
Multiplication
Timing & Techniques to Reduce Delay
Add/Subtract (Binary)
[Figure: n-bit ripple-carry adder; stage i takes A_i, B_i and carry-in C_i and produces S_i and carry-out C_{i+1}, for i = 0 ... n-1, with C0 = 0]
The HW for sum/difference (S) does NOT care about signed/unsigned
Overflow
Signed Overflow Example
4-bit signed addition: 6 + 7 = 13, outside [-8..7]

   0110
 + 0111
 ------
   1101      (C4 = 0, C3 = 1)

Cn ⊕ Cn-1 = C4 ⊕ C3 = 0 ⊕ 1 = 1
Carry-outs different: signed overflow
Overflow Check in Hardware?
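One common hardware check is to XOR the carry into the MSB with the carry out of the MSB. The Python sketch below models a ripple-carry adder bit by bit to show this; `ripple_add` and `signed_overflow` are illustrative names, and operands are assumed to be given as n-bit patterns.

```python
def ripple_add(a: int, b: int, n: int):
    """Add two n-bit patterns stage by stage.

    Returns (sum_pattern, carries) where carries[i] is C_i and carries[0] = 0.
    """
    carries = [0]
    s = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        ci = carries[i]
        s |= (ai ^ bi ^ ci) << i                           # full-adder sum bit
        carries.append((ai & bi) | (ai & ci) | (bi & ci))  # full-adder carry-out
    return s, carries


def signed_overflow(a: int, b: int, n: int) -> bool:
    """Signed overflow occurred if C_n and C_(n-1) differ."""
    _, c = ripple_add(a, b, n)
    return (c[n] ^ c[n - 1]) == 1


s, c = ripple_add(0b0110, 0b0111, 4)
print(format(s, "04b"), c[4], c[3])          # 1101 0 1
print(signed_overflow(0b0110, 0b0111, 4))    # True
print(signed_overflow(0b0010, 0b0011, 4))    # False
```

The sum bits are identical for signed and unsigned operands; only the overflow condition depends on the interpretation.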
Overflow in Hardware
Hardware does not take care of the overflow for you
Saturation or wrap-around or 1 more bit
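The three options can be compared on the 6 + 7 example with a small Python sketch. The helpers `wrap` and `saturate` are illustrative; both assume an n-bit two's complement result.

```python
def wrap(value: int, n: int) -> int:
    """Keep the low n bits and reinterpret them as two's complement."""
    v = value & ((1 << n) - 1)
    return v - (1 << n) if v & (1 << (n - 1)) else v


def saturate(value: int, n: int) -> int:
    """Clamp to the n-bit two's complement range [-2^(n-1), 2^(n-1) - 1]."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, value))


total = 6 + 7
print(wrap(total, 4))      # -3  (wrap-around)
print(saturate(total, 4))  # 7   (saturation)
print(wrap(total, 5))      # 13  (1 more bit)
```

Wrap-around costs nothing in hardware, saturation needs a comparator and multiplexer, and the extra bit widens every downstream operator.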
Two’s Complement Sign Extension
To add two numbers, we should represent them with the same number of bits: 0100+11100
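A minimal Python sketch of sign extension applied to the example 0100 + 11100. The helpers `sign_extend` and `add_signed_bits` are hypothetical names; bit strings are MSB first and the carry-out of the addition is dropped.

```python
def sign_extend(bits: str, width: int) -> str:
    """Extend a two's complement bit string by replicating its sign bit."""
    if width < len(bits):
        raise ValueError("target width is smaller than the operand")
    return bits[0] * (width - len(bits)) + bits


def add_signed_bits(a: str, b: str) -> str:
    """Sign-extend both operands to a common width, add, drop the carry-out."""
    width = max(len(a), len(b))
    a, b = sign_extend(a, width), sign_extend(b, width)
    total = (int(a, 2) + int(b, 2)) & ((1 << width) - 1)
    return format(total, f"0{width}b")


print(sign_extend("0100", 5))            # 00100
print(add_signed_bits("0100", "11100"))  # 00000
```

Here 0100 is +4 and 11100 is -4, so the 5-bit result 00000 is the expected sum.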
Decimal Mark in Hardware
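Matlab aligns the decimal mark automatically: 1.32 + 100.2343 = 101.5543
Hardware does NOT:
01.100 + 001.01 = ?
Aligned: 001.100 + 001.010 = 010.110
Not aligned (raw bit patterns added as they are): 01100 + 00101 = 10001, which is not the correct sum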
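The alignment step can be sketched in Python as padding both operands to the same number of integer and fractional bits before adding. `add_fixed_unsigned` is an illustrative helper that assumes non-negative operands written as bit strings with an explicit point.

```python
def add_fixed_unsigned(x: str, y: str) -> str:
    """Add two non-negative binary fixed-point strings such as '01.100'."""
    xi, xf = x.split(".")
    yi, yf = y.split(".")
    int_w = max(len(xi), len(yi))
    frac_w = max(len(xf), len(yf))
    # Pad integer parts on the left and fractional parts on the right.
    xa = xi.rjust(int_w, "0") + xf.ljust(frac_w, "0")
    ya = yi.rjust(int_w, "0") + yf.ljust(frac_w, "0")
    bits = format(int(xa, 2) + int(ya, 2), f"0{int_w + frac_w}b")
    split = len(bits) - frac_w
    return bits[:split] + "." + bits[split:]


print(add_fixed_unsigned("01.100", "001.01"))             # 010.110
print(format(int("01100", 2) + int("00101", 2), "05b"))   # 10001 (no alignment)
```

In hardware the padding is free (just wiring), but the designer has to specify it explicitly.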
[Figure: Mult vs. Add; axes: Area (mm), Delay (ns)]
Array Multiplier (unsigned)
Direct Mapping
[Figure: 4x4 array multiplier built from half adders (HA) and full adders (FA); inputs X3..X0 and Y3..Y0, product bits Z7..Z0]

       1011      Multiplicand
     * 1110      Multiplier
     ------
       0000      (*0 = zero)
    + 1011.      (*1 = copy)
   + 1011..      (*1 = copy)
  + 1011...      (*1 = copy)
   --------
   10011010
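The partial-product scheme above amounts to shift-and-add. A minimal Python sketch, assuming unsigned n-bit operands (`array_multiply` is an illustrative name, and the loop models the rows of the array rather than its gate-level structure):

```python
def array_multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Unsigned multiplication as a sum of shifted partial products."""
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:         # multiplier bit 1: copy the multiplicand
            product += multiplicand << i  # shifted left by the bit position
        # multiplier bit 0: the partial product is zero
    return product


result = array_multiply(0b1011, 0b1110)
print(format(result, "08b"), result)  # 10011010 154
```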
Don't Forget ... Signed Multiplication

    1011        (-5)
  x 0011        (+3)
  = 11110001    (-15)
Signed Multiplication
Either transform it into a multiplication of non-negative integers:

abs(-5)=5, abs(3)=3

      0101      (+5)
    x 0011      (+3)
    ------
      0101
     0101
    0000
   0000
   -------
   0001111      (+15)

The operand signs differ, so the product is then negated to give -15.
Or directly perform signed multiplication:
      1011      (-5)
    x 0011      (+3)
  --------
  11111011      (partial products are sign-extended)
  1111011
  000000
  00000
  --------
  11110001      (-15)
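A Python sketch of the direct approach, assuming n-bit two's complement patterns and a 2n-bit product. The name `signed_multiply` is illustrative. The subtraction for the multiplier's MSB is an addition to the slide example (where that bit is 0); it accounts for the MSB weight of -2^(n-1) when the multiplier is negative.

```python
def signed_multiply(a_bits: int, b_bits: int, n: int = 4) -> int:
    """Multiply two n-bit two's complement patterns; return the 2n-bit pattern."""
    width = 2 * n
    mask = (1 << width) - 1
    # Sign-extend the multiplicand to the full product width.
    a_ext = a_bits
    if a_bits & (1 << (n - 1)):
        a_ext |= mask & ~((1 << n) - 1)
    product = 0
    for i in range(n):
        if (b_bits >> i) & 1:
            partial = (a_ext << i) & mask
            if i == n - 1:
                product = (product - partial) & mask  # MSB has negative weight
            else:
                product = (product + partial) & mask
    return product


print(format(signed_multiply(0b1011, 0b0011), "08b"))  # 11110001
```

Without the sign extension of the partial products, the upper four bits of the result would be wrong.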
Multiplier in Xilinx FPGA
Embedded DSP48E1
http://www.xilinx.com/support/documentation/user_guides/ug479_7Series_DSP48E1.pdf
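    -- Excerpt: the entity use_dsp48_example (with ports clk and a) and the
    -- library/use clauses that provide "+" on std_logic_vector are not shown.
    architecture archi of use_dsp48_example is
      signal s : std_logic_vector (7 downto 0);
      -- Request that the synthesis tool implements s using a DSP48 block.
      attribute use_dsp48 : string;
      attribute use_dsp48 of s : signal is "yes";
    begin
      process (clk)
      begin
        if clk'event and clk = '1' then
          s <= s + a;  -- accumulate a on every rising clock edge
        end if;
      end process;
    end archi;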
Constant Multiplication
Software may not be smart enough to optimize.
The designer should make sure that multiplication by a small constant is implemented with shifts & adds.
Some numerical examples:
*2 (*10₂): multiplicand << 1
*3 (*11₂): (multiplicand << 1) + multiplicand
*5 (*101₂): (multiplicand << 2) + multiplicand
*255 (*11111111₂): ? (multiplicand << 8) - multiplicand
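The shift-and-add identities can be checked quickly in Python. The function names are illustrative; the parentheses matter because `<<` binds less tightly than `+` and `-` in Python (and in C).

```python
def times2(x: int) -> int:
    return x << 1


def times3(x: int) -> int:
    return (x << 1) + x


def times5(x: int) -> int:
    return (x << 2) + x


def times255(x: int) -> int:
    # 255 = 256 - 1: one shift and one subtraction instead of seven additions.
    return (x << 8) - x


for x in (1, 7, 100):
    print(x, times2(x), times3(x), times5(x), times255(x))
```

In hardware each shift is only wiring, so every one of these constant multiplications costs at most one adder or subtractor.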
Combinational Circuit Timing
Path delay = cell delay + net delay
Path Delay = 0.5 + 0.4 + 0.62 + 0.21 + 1.28 + 0.12 + 0.82 = 3.95 ns
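The path delay is the sum of every cell and net delay along the path. A tiny Python sketch using the figures from the example (`path_delay` is an illustrative helper; the source does not say which of the values are cell delays and which are net delays, so they are kept as one list):

```python
def path_delay(delays_ns) -> float:
    """Total delay of one path: the sum of its cell and net delays in ns."""
    return sum(delays_ns)


delays_ns = [0.5, 0.4, 0.62, 0.21, 1.28, 0.12, 0.82]
print(f"{path_delay(delays_ns):.2f} ns")  # 3.95 ns
```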
FPGA
How to reduce processing delay?
Example 1: Higher-Level Adder Chain
Cascaded-Chain
Calculate: $B = A_1 + A_2 + A_3 + A_4 + A_5 + A_6 + A_7 + A_8$
[Figure: cascaded adder chain; A1 + A2 first, then A3, A4, ..., A8 are added one after another to produce B]
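Higher-Level
Tree
$B = [(A_1 + A_2) + (A_3 + A_4)] + [(A_5 + A_6) + (A_7 + A_8)]$
[Figure: adder tree; the pairs (A1, A2), (A3, A4), (A5, A6), (A7, A8) are added in parallel and their sums are combined to produce B]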
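The benefit of the tree can be seen by counting how many adders lie on the longest path. The Python sketch below returns the sum together with that depth; `chain_sum` and `tree_sum` are illustrative names, and every adder is assumed to have the same delay.

```python
def chain_sum(values):
    """Cascaded chain: each adder waits for the previous one."""
    total, depth = values[0], 0
    for v in values[1:]:
        total += v
        depth += 1
    return total, depth


def tree_sum(values):
    """Balanced tree: adders on the same level work in parallel."""
    level, depth = list(values), 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:        # an odd element passes through to the next level
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth


a = [1, 2, 3, 4, 5, 6, 7, 8]
print(chain_sum(a))  # (36, 7)
print(tree_sum(a))   # (36, 3)
```

Both structures use seven adders for eight inputs; the tree only reorders them so that the longest path shrinks from seven adder delays to three.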