SLIDE 1 CENG 342 – Digital Systems
Simplified Floating-point Adder
Larry Pyeatt
SDSM&T
SLIDE 2
Binary Floating Point Representation
A floating-point number consists of three components: sign bit, exponent, and mantissa.
Example in decimal: for 12.75 the sign is +, the exponent is 2 (a scale factor of 10^2), and the mantissa is .1275, since 12.75 = +.1275 × 10^2.
In binary, the number is stored in a normalized representation: (−1)^s × .m × 2^e, where s and e represent the sign and exponent, and m is the mantissa.
Floating-point adder for a 13-bit format:
1 bit for sign, 4 bits for exponent, and 8 bits for mantissa.
Assumptions:
- Both exponent and fraction are unsigned.
- Normalized representation: the MSB of the fraction must be 1.
- If the magnitude is smaller than 0.10000000 (binary) × 2^0, it needs to be converted to 0.
- Ignore the round-off error (lower bits are discarded when shifted out).
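The 13-bit format above can be checked on a PC with a small software model. The following is a minimal sketch in Python (the slides themselves use VHDL); the function names `decode` and `encode` are hypothetical and not part of the design. It assumes the value of a word is (−1)^s × (frac / 256) × 2^exp, truncates instead of rounding, and converts magnitudes below 0.5 to 0, as stated in the assumptions.

```python
def decode(sign, exp, frac):
    """Value of a 13-bit word: (-1)^sign * 0.frac * 2^exp (frac is 8 bits)."""
    magnitude = (frac / 256.0) * (2 ** exp)
    return -magnitude if sign else magnitude


def encode(value):
    """Truncating encoder (hypothetical helper); returns (sign, exp, frac)."""
    magnitude = abs(value)
    if magnitude < 0.5:
        # Smaller than 0.10000000 (binary) * 2^0: converted to 0.
        return (0, 0, 0)
    sign = 1 if value < 0 else 0
    exp = 0
    while magnitude >= 1.0:
        # Move the binary point left until the value has the form 0.1xxxxxxx.
        magnitude /= 2.0
        exp += 1
    if exp > 15:
        raise OverflowError("exponent does not fit in 4 bits")
    frac = int(magnitude * 256)  # lower bits are discarded (no rounding)
    return (sign, exp, frac)


if __name__ == "__main__":
    # 12.75 = 1100.11 (binary) = 0.110011 (binary) * 2^4
    print(encode(12.75))
    print(decode(0, 4, 0xCC))
```

For example, 12.75 is 1100.11 in binary, so the normalized fraction is 11001100 (hex CC) with exponent 4.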
SLIDE 3
Major steps
- Sorting: find the bigger and the smaller number.
- Alignment: align the two numbers so that they have the same exponent; if necessary, raise the exponent of the smaller number to match and shift its fraction right by the exponent difference.
- Add/Subtract: perform addition when both numbers have the same sign; otherwise perform subtraction.
- Normalization: adjust the result to the normalized format.
  - After a subtraction, the result may have leading zeros. Count the number of leading 0s (n), then shift the fraction left by n bits and decrement the exponent by n.
  - If, after a subtraction, the result is too small to be normalized, make both exponent and fraction 0.
  - If, after an addition, the result generates a carry-out bit, shift the mantissa right by 1 bit and increment the exponent.
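The four steps can be sketched as a software reference model. This is a minimal Python sketch, not the design itself (the design is written in VHDL); the function name `fp_add` is hypothetical. It assumes both operands are normalized (or zero), works on the raw sign, exponent, and fraction fields, and truncates bits that are shifted out.

```python
def fp_add(sign1, exp1, frac1, sign2, exp2, frac2):
    """Add two 13-bit operands; returns (sign, exp, frac)."""
    # 1. Sorting: compare exponent and fraction together as one 12-bit number.
    if ((exp1 << 8) | frac1) > ((exp2 << 8) | frac2):
        signb, expb, fracb = sign1, exp1, frac1
        signs, exps, fracs = sign2, exp2, frac2
    else:
        signb, expb, fracb = sign2, exp2, frac2
        signs, exps, fracs = sign1, exp1, frac1

    # 2. Alignment: shift the smaller fraction right by the exponent difference.
    fraca = fracs >> (expb - exps)

    # 3. Add/subtract, keeping 9 bits (8 fraction bits plus carry-out).
    if signb == signs:
        total = (fracb + fraca) & 0x1FF
    else:
        total = (fracb - fraca) & 0x1FF

    # 4. Normalization.
    if total & 0x100:
        # Carry-out after an addition: shift right, increment the exponent.
        # The exponent wraps at 4 bits; overflow is not handled here.
        return (signb, (expb + 1) & 0xF, total >> 1)
    # Count leading 0s of the 8-bit result (at most 7, also for a zero result).
    lead0 = min(8 - total.bit_length(), 7)
    if lead0 > expb:
        # Too small to be normalized: exponent and fraction become 0.
        return (signb, 0, 0)
    return (signb, expb - lead0, (total << lead0) & 0xFF)


if __name__ == "__main__":
    # 12.75 + 0.5, with 12.75 = (0, 4, 0xCC) and 0.5 = (0, 0, 0x80)
    print(fp_add(0, 4, 0xCC, 0, 0, 0x80))
```

With 12.75 stored as exponent 4, fraction CC (hex), adding 12.75 to itself produces a carry-out and yields exponent 5 with the same fraction (25.5), while subtracting 12.0 (exponent 4, fraction C0) leaves four leading zeros and normalizes to exponent 0, fraction C0 (0.75).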
SLIDE 4
Examples in Decimal
SLIDE 5 Entity Declaration
The design uses an algorithm similar to the one for decimal addition. The suffixes 'b', 's', 'a', 'r', and 'n' in signal names stand for big number, small number, aligned number, result of addition/subtraction, and normalized number, respectively.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fp_adder is
   port (
      sign1, sign2: in std_logic;
      exp1, exp2: in std_logic_vector(3 downto 0);
      frac1, frac2: in std_logic_vector(7 downto 0);
      sign_out: out std_logic;
      exp_out: out std_logic_vector(3 downto 0);
      frac_out: out std_logic_vector(7 downto 0)
   );
end fp_adder;
SLIDE 6 Architecture – Part 1
architecture arch of fp_adder is
   -- suffix b, s, a, n for big, small, aligned,
   -- normalized number
   signal signb, signs: std_logic;
   signal expb, exps, expn: unsigned(3 downto 0);
   signal fracb, fracs, fraca, fracn: unsigned(7 downto 0);
   signal sum_norm: unsigned(7 downto 0);
   signal exp_diff: unsigned(3 downto 0);
   signal sum: unsigned(8 downto 0); -- extra bit for carry-out
   signal lead0: unsigned(2 downto 0);
begin
   -- 1st stage: sort to find the larger number
   -- Exponent and fraction both need comparison,
   -- so the two are combined and compared together
   process (sign1, sign2, exp1, exp2, frac1, frac2)
   begin
      if (exp1 & frac1) > (exp2 & frac2) then
         signb <= sign1;
         signs <= sign2;
         expb <= unsigned(exp1);
         exps <= unsigned(exp2);
         fracb <= unsigned(frac1);
         fracs <= unsigned(frac2);
SLIDE 7 Architecture – Part 2
      else
         signb <= sign2;
         signs <= sign1;
         expb <= unsigned(exp2);
         exps <= unsigned(exp1);
         fracb <= unsigned(frac2);
         fracs <= unsigned(frac1);
      end if;
   end process;

   -- 2nd stage: align smaller number
   exp_diff <= expb - exps;
   with exp_diff select
      fraca <=
         fracs                         when "0000",
         "0" & fracs(7 downto 1)       when "0001",
         "00" & fracs(7 downto 2)      when "0010",
         "000" & fracs(7 downto 3)     when "0011",
         "0000" & fracs(7 downto 4)    when "0100",
         "00000" & fracs(7 downto 5)   when "0101",
         "000000" & fracs(7 downto 6)  when "0110",
         "0000000" & fracs(7)          when "0111",
         "00000000"                    when others;

   -- 3rd stage: add/subtract
   sum <= ('0' & fracb) + ('0' & fraca) when signb=signs else
          ('0' & fracb) - ('0' & fraca);
SLIDE 8 Architecture – Part 3
   -- 4th stage: normalize
   -- 4a - count leading 0s
   lead0 <= "000" when (sum(7)='1') else
            "001" when (sum(6)='1') else
            "010" when (sum(5)='1') else
            "011" when (sum(4)='1') else
            "100" when (sum(3)='1') else
            "101" when (sum(2)='1') else
            "110" when (sum(1)='1') else
            "111";

   -- 4b - shift significand according to leading 0
   with lead0 select
      sum_norm <=
         sum(7 downto 0)             when "000",
         sum(6 downto 0) & '0'       when "001",
         sum(5 downto 0) & "00"      when "010",
         sum(4 downto 0) & "000"     when "011",
         sum(3 downto 0) & "0000"    when "100",
         sum(2 downto 0) & "00000"   when "101",
         sum(1 downto 0) & "000000"  when "110",
         sum(0) & "0000000"          when others;
SLIDE 9 Architecture – Part 4
   -- 4c - special conditions
   process(sum, sum_norm, expb, lead0)
   begin
      if sum(8)='1' then -- w/ carry out; shift frac to right
         expn <= expb + 1;
         fracn <= sum(8 downto 1);
      elsif (lead0 > expb) then -- too small to normalize;
         expn <= (others=>'0'); -- set to 0
         fracn <= (others=>'0');
      else
         expn <= expb - lead0;
         fracn <= sum_norm;
      end if;
   end process;

   sign_out <= signb;
   exp_out <= std_logic_vector(expn);
   frac_out <= std_logic_vector(fracn);
end arch;
SLIDE 10 Testing Circuit – Part 1
The floating-point adder needs 13-bit operands, so its two inputs need 26 bits in total. The S3 board cannot provide enough physical inputs to test the circuit, so we must assign constants or duplicated switch signals to the adder's inputs. The addition result is passed to hexadecimal decoders, and the results are shown on the 7-segment LEDs:
- The exponent (4 bits) is displayed on the rightmost LED.
- The lower 4 bits of the mantissa are displayed to the left of the exponent.
- The upper 4 bits of the mantissa are displayed to the left of the lower 4 bits.
- The sign is displayed on the leftmost LED.
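To know what to expect on the board, the display layout can be written as a small helper. This is a minimal Python sketch (the circuit itself is VHDL); the name `display_digits` is hypothetical, the sign bar is shown as "-" and a blank sign position as a space, and hex digits are printed in uppercase regardless of how the 7-segment patterns render them.

```python
def display_digits(sign, exp, frac):
    """Return the four display characters, leftmost LED first."""
    sign_char = "-" if sign else " "      # leftmost LED: sign
    frac_high = format(frac >> 4, "X")    # upper 4 bits of the mantissa
    frac_low = format(frac & 0xF, "X")    # lower 4 bits of the mantissa
    exp_char = format(exp, "X")           # rightmost LED: exponent
    return sign_char + frac_high + frac_low + exp_char


if __name__ == "__main__":
    # Negative result with fraction 0xCC and exponent 5
    print(display_digits(1, 5, 0xCC))
```

A negative result with fraction CC (hex) and exponent 5 would therefore read, from left to right, as the sign bar followed by C, C, and 5.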
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fp_adder_test is
   port(
      clk: in std_logic; -- will be used in 7-seg LED display time-multiplexing module
      sw: in std_logic_vector(7 downto 0);
      btn: in std_logic_vector(3 downto 0);
      an: out std_logic_vector(3 downto 0);
      sseg: out std_logic_vector(7 downto 0)
   );
end fp_adder_test;
SLIDE 11 Testing Circuit – Part 2
architecture arch of fp_adder_test is
   signal sign1, sign2: std_logic;
   signal exp1, exp2: std_logic_vector(3 downto 0);
   signal frac1, frac2: std_logic_vector(7 downto 0);
   signal sign_out: std_logic;
   signal exp_out: std_logic_vector(3 downto 0);
   signal frac_out: std_logic_vector(7 downto 0);
   signal led3, led2, led1, led0: std_logic_vector(7 downto 0);

begin
   -- set up the fp adder input signals
   sign1 <= '0';
   exp1 <= "1000";
   frac1 <= '1' & sw(1) & sw(0) & "10101";
   sign2 <= sw(7);
   exp2 <= btn;
   frac2 <= '1' & sw(6 downto 0);

   fp_add_unit: entity work.fp_adder
      port map(
         sign1=>sign1, sign2=>sign2, exp1=>exp1, exp2=>exp2,
         frac1=>frac1, frac2=>frac2,
         sign_out=>sign_out, exp_out=>exp_out,
         frac_out=>frac_out
      );
SLIDE 12 Testing Circuit – Part 3
   -- instantiate three instances of hex decoders
   -- exponent is shown on the rightmost LED
   sseg_unit_0: entity work.hex_to_sseg
      port map(hex=>exp_out, dp=>'0', sseg=>led0);

   -- lower 4 bits of the fraction
   sseg_unit_1: entity work.hex_to_sseg
      port map(hex=>frac_out(3 downto 0),
               dp=>'1', sseg=>led1);

   -- upper 4 bits of the fraction
   sseg_unit_2: entity work.hex_to_sseg
      port map(hex=>frac_out(7 downto 4),
               dp=>'0', sseg=>led2);

   -- sign is shown on the leftmost LED
   led3 <= "11111110" when sign_out='1' else -- middle bar
           "11111111";

   -- instantiate 7-seg LED display time-multiplexing module
   disp_unit: entity work.disp_mux
      port map(
         clk=>clk, reset=>'0',
         in0=>led0, in1=>led1, in2=>led2, in3=>led3,
         an=>an, sseg=>sseg
      );
end arch;
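When planning a test on the board, it helps to work out which operands a given switch and button setting produces. The following is a minimal Python sketch of the input wiring in the test circuit above (the circuit itself is VHDL); the name `operands_from_inputs` is hypothetical, and `sw` and `btn` are taken as plain integers holding the 8 switch bits and the 4 button bits.

```python
def operands_from_inputs(sw, btn):
    """Map switch/button values to the two (sign, exp, frac) operands."""
    # Operand 1: sign and exponent are constants; only two fraction bits
    # come from the switches: frac1 = '1' & sw(1) & sw(0) & "10101".
    sign1 = 0
    exp1 = 0b1000
    frac1 = 0b10000000 | (((sw >> 1) & 1) << 6) | ((sw & 1) << 5) | 0b10101

    # Operand 2: sign from sw(7), exponent from the buttons, and
    # frac2 = '1' & sw(6 downto 0), so its MSB is always 1.
    sign2 = (sw >> 7) & 1
    exp2 = btn & 0xF
    frac2 = 0b10000000 | (sw & 0x7F)

    return (sign1, exp1, frac1), (sign2, exp2, frac2)


if __name__ == "__main__":
    # All switches off, no buttons pressed
    print(operands_from_inputs(0b00000000, 0b0000))
```

Note that sw(1) and sw(0) feed both fractions, which is the "duplicated switch signals" compromise mentioned earlier: the two operands cannot be set completely independently.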