
SLIDE 1

Fixed Point Real Numbers

16-bit Unsigned with Binary Point: 8
XXXXXXXX.XXXXXXXX
Maximum/Minimum Values
11111111.11111111 = 255+255/256
00000000.00000000 = 0
16-bit Signed with Binary Point: 8
XXXXXXXX.XXXXXXXX
Maximum/Minimum Values
01111111.11111111 = 127+255/256
10000000.00000000 = -128
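
The limits above are the integer range of the raw bits divided by 2^bp. A minimal Python sketch of that relationship follows; the helper name `fx_range` is made up for illustration and is not part of the slides.

```python
def fx_range(width, bp, signed):
    """Return (min, max) real values of a fixed-point format.

    width: total number of bits, bp: bits to the right of the binary point.
    """
    if signed:
        lo_raw = -(1 << (width - 1))
        hi_raw = (1 << (width - 1)) - 1
    else:
        lo_raw = 0
        hi_raw = (1 << width) - 1
    scale = 1 << bp  # one LSB is worth 1/scale
    return lo_raw / scale, hi_raw / scale


unsigned_16_8 = fx_range(16, 8, signed=False)  # (0, 255 + 255/256)
signed_16_8 = fx_range(16, 8, signed=True)     # (-128, 127 + 255/256)
print(unsigned_16_8, signed_16_8)
```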

SLIDE 2

Multiplication of Signed FP

If a has width Wa and binary point bpa, and b has width Wb and binary point bpb, the output of the multiplier will need width Wa+Wb and a bp of bpa+bpb.
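
The rule can be checked on raw integers. The sketch below is illustrative only: `fx_mul` and `to_real` are hypothetical helper names, and the operand formats (16:8) and (10:6) are chosen just for the example.

```python
def fx_mul(a_raw, wa, bpa, b_raw, wb, bpb):
    """Multiply two signed fixed-point numbers held as raw integers.

    Returns the raw product with its width and binary point.
    """
    p_raw = a_raw * b_raw
    wp, bpp = wa + wb, bpa + bpb
    # The full-precision product always fits in wa + wb signed bits.
    assert -(1 << (wp - 1)) <= p_raw < (1 << (wp - 1))
    return p_raw, wp, bpp


def to_real(raw, bp):
    """Interpret a raw integer with bp fractional bits as a real number."""
    return raw / (1 << bp)


a_raw = round(1.5 * 2**8)     # 1.5 in a (16:8) format
b_raw = round(-2.25 * 2**6)   # -2.25 in a (10:6) format
p_raw, wp, bpp = fx_mul(a_raw, 16, 8, b_raw, 10, 6)
print(p_raw, wp, bpp, to_real(p_raw, bpp))
```

Here the product comes back as a 26-bit value with 14 fractional bits, and converting it to a real number gives -3.375, the same as 1.5 * -2.25.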

SLIDE 3

Number Representation

Previous examples of FIR filters used integer representations for the filter coefficients.

What if we have coefficients with fractional components?

Two options.
1. Apply a scaling factor to all the coefficients to get the desired resolution.
2. Use binary point numbers to represent our coefficients.

SLIDE 4

Example:

input is 8-bit signed
filter coefficients

b = [1.42 2.05 -3.23 4.71 -3.11 -5.10]
What digital values of b should we use?
What is the required output data width?

SLIDE 5

Example:

Scaling factor approach
Multiply coefficients by 100 and use a 10-bit signed format (-512 to 511).
b = [142 205 -323 471 -311 -510]

Determine the maximum output.

max = abs(-128)*sum(abs(b)) = 251136

ceil(log2(251136))+1 = 19-bit signed
19-bit signed has a range (-262144 to 262143)
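
The arithmetic of the scaling factor approach can be reproduced with a short Python sketch. The variable names and the `signed_width` helper are illustrative; the width formula is the one used on the slide.

```python
import math

b = [1.42, 2.05, -3.23, 4.71, -3.11, -5.10]


def signed_width(max_magnitude):
    """Slide formula: ceil(log2(max)) magnitude bits plus one sign bit."""
    return math.ceil(math.log2(max_magnitude)) + 1


# Scale by 100 and round to integers.
b_scaled = [round(100 * x) for x in b]

# Every coefficient must fit the 10-bit signed range.
assert all(-512 <= c <= 511 for c in b_scaled)

# Worst case: every input sample has magnitude 128 (8-bit signed minimum is -128).
max_out = 128 * sum(abs(c) for c in b_scaled)
width = signed_width(max_out)
print(b_scaled, max_out, width)
```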

SLIDE 6

Example:

Scaling factor approach
If the absolute scale of the output is to be retained, it will need to be divided by 100 to revert back to the original filter coefficients.

How do we divide by 100 in binary?
Maybe not a good approach in all instances.

SLIDE 7

Example:

Binary Point Approach
Represent coefficients with 10-bit signed and binary point at the 6th position.
This is a design choice.
XXXX.XXXXXX
Can handle values -8+(0/64) to 7+(63/64)

SLIDE 8

Example:

Binary Point Approach
Determine the digital coefficient values.
bbp = dec2bin(mod(round(64*b)+1024,1024))
bbp = [0001011011 0010000011 1100110001 0100101101 1100111001 1010111010]
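
For readers without MATLAB, the same two's complement encoding can be sketched in Python. The function name `encode` is hypothetical. Python's `round` rounds exact halves to even, unlike MATLAB's `round`, but none of these coefficients lands on a half, so the results match the slide.

```python
b = [1.42, 2.05, -3.23, 4.71, -3.11, -5.10]
W, BP = 10, 6  # 10-bit signed, 6 fractional bits


def encode(x, width=W, bp=BP):
    """Quantize x and return its two's complement bit string."""
    raw = round(x * (1 << bp))
    # Refuse values that do not fit the signed range.
    assert -(1 << (width - 1)) <= raw < (1 << (width - 1))
    # The modulo wraps negative values, like mod(raw + 1024, 1024).
    return format(raw % (1 << width), f"0{width}b")


bbp = [encode(x) for x in b]
print(bbp)
```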

SLIDE 9

Example:

Binary Point Approach
Determine the maximum output.

bbp = round(64*b)
max = abs(-128)*sum(abs(bbp)) = 160640

ceil(log2(160640))+1 = 19-bit signed
19-bit signed has a range (-262144 to 262143)

Final output is a 19-bit signed with a bp of 6.
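
As a sanity check, the raw maximum can be converted back to a real value using the output binary point. A minimal sketch with illustrative variable names:

```python
import math

b = [1.42, 2.05, -3.23, 4.71, -3.11, -5.10]
BP = 6

# Quantized coefficients as signed integers (bp of 6).
bbp = [round(x * 2**BP) for x in b]

# Worst-case raw output for an 8-bit signed (8:0) input.
max_raw = 128 * sum(abs(c) for c in bbp)
width = math.ceil(math.log2(max_raw)) + 1

# The input has bp 0, so the output keeps the coefficients' bp of 6.
max_real = max_raw / 2**BP
ideal_real = 128 * sum(abs(x) for x in b)
print(bbp, max_raw, width, max_real, ideal_real)
```

The real-valued maximum, 2510.0, is close to the unquantized bound of about 2511.36; the gap is the coefficient rounding error.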

SLIDE 10

IIR Implementation: a0 = 1

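[Block diagram: second-order IIR section with two z^-1 delays and coefficients a1 and a2. The adder marked "+ -" performs a subtraction.]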

SLIDE 11

IIR Implementation: Pipelining?

[Block diagrams: the second-order IIR section (z^-1 delays, coefficients a1 and a2) redrawn with additional z^-1 registers.]

SLIDE 12

IIR Implementation

[Block diagram: second-order IIR section with signals in_reg, dif, m1, m2, p1, p2 and out_reg.]

//functional description
assign dif = in_reg - p1;
assign m1 = dif*a1;
assign m2 = dif*a2;
always@(posedge clock) begin
    in_reg <= in;
    out_reg <= dif;
    p1 <= m1 + p2;
    p2 <= m2;
end
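
To see what the registers compute, the functional description can be stepped cycle by cycle in Python and compared with the difference equation y[n] = x[n] - a1*y[n-1] - a2*y[n-2]. This is a sketch only: the function names are invented, all registers are assumed to reset to zero, and Python integers are unbounded, so no bit-width effects are modeled.

```python
def iir_registers(x, a1, a2):
    """Step the registers of the functional description, one sample per clock."""
    in_reg = out_reg = p1 = p2 = 0
    seen = []
    for sample in x:
        seen.append(out_reg)  # out_reg as visible before this clock edge
        # Combinational logic (the assign statements).
        dif = in_reg - p1
        m1 = dif * a1
        m2 = dif * a2
        # Clock edge: every register updates from the old values at once.
        in_reg, out_reg, p1, p2 = sample, dif, m1 + p2, m2
    return seen


def iir_direct(x, a1, a2):
    """Reference: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    y = []
    for n, sample in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0
        y2 = y[n - 2] if n >= 2 else 0
        y.append(sample - a1 * y1 - a2 * y2)
    return y


x = [1, 0, 0, 0, 0, 0, 0, 0]  # unit impulse
hw = iir_registers(x, a1=1, a2=-1)
ref = iir_direct(x, a1=1, a2=-1)
print(hw, ref)
```

In this model the register output matches the reference delayed by two clocks, one for in_reg and one for out_reg.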

SLIDE 13

IIR Implementation: DSP Blocks

[Block diagram: the second-order IIR section mapped onto DSP blocks, with signals in_reg, dif, p1, p2 and out_reg.]

//dsp48 structural
always@(posedge clock) begin
    in_reg <= in;
    out_reg <= dif;
end
macc_wrap dsp2 (.C(0),.A(dif),.B(a2),.PCOUT(p2));
macc_wrap dsp1 (.PCIN(p2),.A(dif),.B(a1),.POUT(p1));
assign dif = p1 + in;

SLIDE 14

Number Representation

For the IIR filter diagram in the previous slides, there is a requirement that a0=1.

For cases when a1 and a2 are near 1 or fractional values, we cannot accurately represent these values.

Two options.
1. Add a pre-multiplier to the input to incorporate an a0 scale term.
2. If we care about the absolute scale, use binary point numbers to represent our coefficients.

Remember to keep track of binary point locations, especially in the feedback path.

SLIDE 15

IIR Implementation

[Block diagram: second-order IIR section with an a0 pre-multiplier on the input and signals in_reg, m0, dif, m1, m2, p1, p2 and out_reg. The adder marked "+ -" performs a subtraction.]

//functional description
assign dif = m0 - p1;
assign m0 = in_reg*a0;
assign m1 = dif*a1;
assign m2 = dif*a2;
always@(posedge clock) begin
    in_reg <= in;
    out_reg <= dif;
    p1 <= m1 + p2;
    p2 <= m2;
end

SLIDE 16

IIR Implementation

[Block diagram: second-order IIR section with an a0 pre-multiplier on the input and signals in_reg, m0, dif, m1, m2, p1, p2 and out_reg. The adder marked "+ -" performs a subtraction.]

Keeping track of bp locations
We will use (W:BP) notation.
Assume all values are signed.
Input is 8-bit signed (8:0)
Coefficients are (10:6)
Assume p1 is (18:6) for subtraction
m0 (18:6)
dif (19:6)
m1,m2 (29:12)
p1 (30:12)
p1 needs to have a bp of 6 so the subtraction will have equivalent input formats.
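
The (W:BP) bookkeeping above can be mechanized. The sketch below uses two hypothetical helpers, `mul_fmt` and `add_fmt`, that apply the usual growth rules: a product adds widths and binary points, and a sum or difference of aligned operands grows by one bit.

```python
def mul_fmt(a, b):
    """Format of a full-precision product of two (W, BP) operands."""
    return (a[0] + b[0], a[1] + b[1])


def add_fmt(a, b):
    """Format of a sum or difference; operands must share a binary point."""
    assert a[1] == b[1], "align binary points before adding"
    return (max(a[0], b[0]) + 1, a[1])


inp = (8, 0)          # input is 8-bit signed
coef = (10, 6)        # a0, a1, a2
p1_assumed = (18, 6)  # format assumed for p1 at the subtraction

m0 = mul_fmt(inp, coef)        # in_reg * a0
dif = add_fmt(m0, p1_assumed)  # m0 - p1
m1 = mul_fmt(dif, coef)        # dif * a1
m2 = mul_fmt(dif, coef)        # dif * a2
p2 = m2                        # p2 is m2 registered
p1 = add_fmt(m1, p2)           # m1 + p2

print(m0, dif, m1, p1)
```

The computed p1 has a bp of 12 while the subtraction expects a bp of 6, which is the mismatch the slide points out.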

SLIDE 17

IIR Implementation

[Block diagram: second-order IIR section with an a0 pre-multiplier on the input and signals in_reg, m0, dif, m1, m2, p1, p2 and out_reg. The adder marked "+ -" performs a subtraction.]

//functional description
assign dif = m0 - (p1 >>> bp); //bp is the binary point of the coefficients
assign m0 = in_reg*a0;
assign m1 = dif*a1;
assign m2 = dif*a2;
always@(posedge clock) begin
    in_reg <= in;
    out_reg <= dif;
    p1 <= m1 + p2;
    p2 <= m2;
end

SLIDE 18

IIR Implementation

[Block diagram: second-order IIR section with an a0 pre-multiplier on the input and signals in_reg, m0, dif, m1, m2, p1, p2 and out_reg. The adder marked "+ -" performs a subtraction.]

//functional description
assign dif = m0 - p1;
assign m0 = in_reg*a0;
assign m1 = (dif*a1) >>> bp; //or shift
assign m2 = (dif*a2) >>> bp; //here
always@(posedge clock) begin
    in_reg <= in;
    out_reg <= dif;
    p1 <= m1 + p2;
    p2 <= m2;
end
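
The two shift placements (on p1 before the subtraction, or on each product) both restore a bp of 6, but they truncate at different points and can differ in the last bit. The Python sketch below illustrates this with made-up raw values. It assumes the Verilog operands are declared signed, so that `>>>` is an arithmetic shift. Python's `>>` on integers also shifts arithmetically and rounds toward negative infinity.

```python
BP = 6  # binary point of the coefficients


def realign(raw, bp=BP):
    """Drop bp fractional bits with an arithmetic right shift (floors)."""
    return raw >> bp


# Hypothetical raw values with a bp of 12, chosen only for illustration.
m1_raw, p2_raw = 100, 100

# Shift once, after the addition (p1 >>> bp).
late = realign(m1_raw + p2_raw)

# Shift each product first ((dif*a) >>> bp), then add.
early = realign(m1_raw) + realign(p2_raw)

print(late, early, realign(-1))
```

Shifting late keeps more fractional bits through the addition at the cost of a wider adder. Shifting early narrows the datapath but truncates twice.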

SLIDE 19

IIR Implementation

[Block diagram: second-order IIR section with an a0 pre-multiplier mapped onto DSP blocks, with signals dif, p1, p2 and out_reg.]

//dsp48 structural
always@(posedge clock) begin
    out_reg <= dif;
end
macc_wrap dsp0 (.C(p1 >>> bp),.A(in),.B(a0),.POUT(dif));
macc_wrap dsp2 (.C(0),.A(dif),.B(a2),.PCOUT(p2));
macc_wrap dsp1 (.PCIN(p2),.A(dif),.B(a1),.POUT(p1));

SLIDE 20

IIR Implementation

[Block diagram: second-order IIR section with an a0 pre-multiplier mapped onto DSP blocks, with coefficients a1 and a2 and signals sum, p1, p2 and out_reg.]

//dsp48 structural
always@(posedge clock) begin
    out_reg <= sum;
end
macc_wrap dsp0 (.C(p1 >>> bp),.A(in),.B(a0),.POUT(sum));
macc_wrap dsp2 (.C(0),.A(sum),.B(-a2),.PCOUT(p2));
macc_wrap dsp1 (.PCIN(p2),.A(sum),.B(-a1),.POUT(p1));

SLIDE 21

IIR Implementation: Pipelining?

[Block diagrams: IIR filter with coefficients a1, a2, a3 and a4, drawn with z^-1 and z^-4 delay elements.]

We really only need to pipeline a 2nd-order IIR filter to realize 2 poles. Then we can cascade a number of them to realize M poles.

SLIDE 22

IIR Implementation: Pipelining?

[Block diagram: IIR section with coefficients a2 and a4 and z^-1 delays.]

The above diagram has 4 poles and has extra registers for pipelining. The idea is to start with an IIR filter with 4 poles (2 we want to keep and 2 of our choosing that will be canceled). Based on the 2 we want to keep, determine what the 2 additional poles need to be to eliminate the a1 and a3 terms. Pre-multiply with a cascaded FIR filter with zeros placed at the locations of the two additional poles.

SLIDE 23

IIR Implementation: Pipelining?

Turn to the math...
Z-domain

$$H(z) = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2}} = \frac{1}{(1 - p_1 z^{-1})(1 - p_2 z^{-1})}$$

Add some poles but compensate in the numerator with an FIR filter.

$$H(z) = \frac{1}{(1 - p_1 z^{-1})(1 - p_2 z^{-1})} \cdot \frac{(1 - p_3 z^{-1})(1 - p_4 z^{-1})}{(1 - p_3 z^{-1})(1 - p_4 z^{-1})}$$

Choose p3 & p4 to cancel the z^-1 & z^-3 coefficients in the denominator.

SLIDE 24

IIR Implementation: Pipelining

Using the original polynomial.

$$H(z) = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2}} \cdot \frac{1 + a_1 z^{-1} + (-a_2) z^{-2}}{1 - (-a_1) z^{-1} - a_2 z^{-2}}$$

$$H(z) = \frac{1 + a_1 z^{-1} + (-a_2) z^{-2}}{1 - (a_1^2 + 2a_2) z^{-2} + a_2^2 z^{-4}}$$
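
The cancellation of the odd-power terms can be verified by multiplying the two denominator polynomials numerically. The sketch below uses a small hand-written `poly_mul` helper and arbitrary integer coefficients. Both are illustrative and not taken from the slides.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


a1, a2 = 3, 5  # arbitrary test values

original = [1, -a1, -a2]  # 1 - a1 z^-1 - a2 z^-2
added = [1, a1, -a2]      # 1 + a1 z^-1 - a2 z^-2 (a1 negated)

new_den = poly_mul(original, added)
print(new_den)
```

The z^-1 and z^-3 coefficients come out as zero. The z^-2 coefficient is -(a1^2 + 2*a2) and the z^-4 coefficient is a2^2, so the feedback path only needs z^-2 delays.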

SLIDE 25

IIR Implementation: Pipelining

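[Block diagrams: pipelined second-order IIR filter. The feedback section uses z^-2 delays with coefficients a1^2+2a2 and a2^2, and the cascaded FIR section uses z^-1 delays with coefficients a1 and a2. The structure is redrawn with additional z^-1 registers.]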

SLIDE 26

IIR Implementation: Pipelining

Still not fully pipelined.
Add 4 zeros/poles instead of 2.

[Block diagram: feedback section with coefficients a1' and a2' and z^-2 delays, FIR section with coefficients b1, b2, b3 and b4, and additional z^-1 pipeline registers.]
