CDA 4253/CIS 6930 FPGA System Design RTL Design Methodology
Hao Zheng, Comp Sci & Eng, Univ of South Florida

Structure of a Typical Digital Design
Figure labels: Data Inputs, Control Inputs, Control Signals, Datapath (Execution ...), Controller (Control ...)
Figure labels, datapath: RF/scratch pad, ALU, MUL, Memory, Bus 1, Bus 2, Bus 3.
Figure labels, controller: next-state logic, state register (SR), output logic.
Signals between them: control inputs, control signals, status signals.
MIN_MAX_AVR
Block-diagram labels: clk, reset, in_data, in_addr, write; bus widths n, 5, n, 2.

Port      Width  Meaning
clk       1      System clock
reset     1      System reset; clears internal registers
in_data   n      Input data bus
in_addr   5      Address of the internal memory where input data is stored
write     1      Synchronous write control signal; indicates that in_data is valid
          1      Starts the computations
          1      Asserted when all results are ready
          n      Output data bus used to read results
          2      Result select: 01 = reading minimum, 10 = reading maximum, 11 = reading average
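
The interface listed above can be written down as a VHDL entity. The following is only a sketch: the names clk, reset, in_data, in_addr and write come from the block diagram, while start, done, out_data and out_addr are hypothetical names for the four ports whose names are not given in the text, and the port directions and the default n = 8 are assumptions inferred from the "Meaning" column.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity min_max_avr is
  generic (
    n : positive := 8  -- data width; the default of 8 is an assumption
  );
  port (
    clk      : in  std_logic;                       -- system clock
    reset    : in  std_logic;                       -- clears internal registers
    in_data  : in  std_logic_vector(n-1 downto 0);  -- input data bus
    in_addr  : in  std_logic_vector(4 downto 0);    -- internal memory address
    write    : in  std_logic;                       -- in_data is valid
    start    : in  std_logic;                       -- hypothetical name: starts the computations
    done     : out std_logic;                       -- hypothetical name: all results are ready
    out_data : out std_logic_vector(n-1 downto 0);  -- hypothetical name: result bus
    out_addr : in  std_logic_vector(1 downto 0)     -- hypothetical name: 01 min, 10 max, 11 average
  );
end entity min_max_avr;
```

A 5-bit in_addr addresses up to 32 stored words, which is why the average can later be formed with a shift if all 32 locations are used; that last point is an observation about the widths, not something stated on the slide.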
Figure: sorting example showing memory contents (Addr, Data) before sorting and after sorting.
Index sequence: i=0 with j=1, 2, 3; i=1 with j=2, 3; i=2 with j=3.
Legend: position of memory indexed by i; position of memory indexed by j.
Figure: sorter interface. Ports: clock, reset, din, addr, we, start, done, dout. Bus-width labels in the figure: N, k, N.
State diagram fragment:
  start=1 / rst <= 1, i <= 0
  we <= 0, sel2 <= 0, sel3 <= 0, ...
  done <= 0
i = 0
while i < k-1 do
    addr = i
    A = M[addr]
    j = i+1
    while j < k do
        addr = j
        B = M[addr]
        if A > B then
            addr = i
            M[addr] = B
            addr = j
            M[addr] = A
            A = B
        end if
        j = j+1
    end while
    i = i+1
end while
 1  i = 0
 2  while i < k-1 do
 3      addr = i
 4      A = M[addr]
 5      j = i+1
 6      while j < k do
 7          j = j+1
 8          addr = j
 9          B = M[addr]
10          if A > B then
11              addr = i
12              M[addr] = B
13              addr = j
14              M[addr] = A
15              A = B
16          end if
17      end while
18  end while
Current State  Next State  Cond         Operations
1              2           start='1'    i <= 0
2              3           i < k-1      null
2              18          !(i < k-1)   done <= '1'
3              6           true         addr <= i; A <= M[addr]; j <= i+1
6              7           j < k        null
6              17          !(j < k)     null
7              10          true         j++; addr <= j; B <= M[addr]
10             16          A > B        addr <= i; M[addr] <= B
10             16          !(A > B)     null
16             6           true         null
17             2           true         null
...            ...         ...          ...
i = 0
while i < k-1 do
    addr = i
    A = M[addr]
    j = i+1
    while j < k do
        addr = j
        B = M[addr]
        if A > B then
            addr = i
            M[addr] = B
            addr = j
            M[addr] = A
            A = B
        end if
        j = j+1
    end while
    i = i+1
end while
Current State  Next State  Cond         Operations
s0             s1          start='1'    i <= 0
s1             s2          i < k-1      addr <= i; A <= M[addr]; j <= i+1
s1             s0          !(i < k-1)   done <= '1'
s2             s3          j < k        addr <= j; B <= M[addr]
s2             s1          !(j < k)     i <= i+1
s3             s2          A > B        addr <= i; M[addr] <= B; addr <= j; M[addr] <= A; A <= B; j <= j+1
s3             s2          !(A > B)     j <= j+1
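
The four-state table above can be coded as one clocked process with a case statement. The sketch below is one possible reading, not the course's reference design. Port names follow the sorter interface figure (clock, reset, din, addr, we, start, done, dout); the generics N and AW, the constant K = 2**AW, clearing done on start, and loading the memory through we while idle are all assumptions. The memory is modeled as an array of registers so that both swap writes can happen in the single s3 cycle, as the table implies; a block RAM with one write port would need extra states.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sorter is
  generic (
    N  : positive := 8;  -- data width (assumed default)
    AW : positive := 2   -- address width (assumed); memory holds 2**AW words
  );
  port (
    clock : in  std_logic;
    reset : in  std_logic;                   -- synchronous, active-High (assumed)
    din   : in  unsigned(N-1 downto 0);
    addr  : in  unsigned(AW-1 downto 0);
    we    : in  std_logic;
    start : in  std_logic;
    done  : out std_logic;
    dout  : out unsigned(N-1 downto 0)
  );
end entity sorter;

architecture rtl of sorter is
  constant K : positive := 2**AW;            -- number of words to sort
  type state_t is (s0, s1, s2, s3);
  type mem_t is array (0 to K-1) of unsigned(N-1 downto 0);
  signal state : state_t := s0;
  signal M     : mem_t   := (others => (others => '0'));
  signal A, B  : unsigned(N-1 downto 0) := (others => '0');
  signal i, j  : natural range 0 to K := 0;
begin
  dout <= M(to_integer(addr));               -- external read port

  process (clock)
  begin
    if rising_edge(clock) then
      if reset = '1' then
        state <= s0;
        done  <= '0';
      else
        case state is
          when s0 =>                         -- idle: accept writes, wait for start
            if we = '1' then
              M(to_integer(addr)) <= din;
            end if;
            if start = '1' then
              i     <= 0;
              done  <= '0';
              state <= s1;
            end if;
          when s1 =>                         -- outer loop test
            if i < K-1 then
              A     <= M(i);
              j     <= i + 1;
              state <= s2;
            else
              done  <= '1';
              state <= s0;
            end if;
          when s2 =>                         -- inner loop test
            if j < K then
              B     <= M(j);
              state <= s3;
            else
              i     <= i + 1;
              state <= s1;
            end if;
          when s3 =>                         -- compare and swap
            if A > B then
              M(i) <= B;
              M(j) <= A;
              A    <= B;
            end if;
            j     <= j + 1;
            state <= s2;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Each inner-loop iteration costs two clock cycles (s2, then s3), which is the main saving over the one-state-per-pseudocode-line table.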
xpower = 1;
for (i = 0; i < 3; i++)
    xpower = x * xpower;

process (clk)
begin
  if rising_edge(clk) then
    if start = '1' then
      cnt    <= 3;
      xpower <= 1;  -- mirrors xpower = 1 in the loop above; assumes an integer-typed xpower
      done   <= '0';
    elsif cnt > 0 then
      cnt    <= cnt - 1;
      xpower <= xpower * x;  -- one multiplication per clock cycle
    elsif cnt = 0 then
      done <= '1';
    end if;
  end if;
end process;
Throughput: 1 data / 3 cycles = 0.33 data/cycle.
Latency: 3 cycles.
Critical path delay: 1 multiplier delay.
Throughput: 1 data / cycle.
Latency: 3 cycles + register delays.
Critical path delay: 1 multiplier delay.
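
The pipelined circuit itself is not reproduced in the text, so the following is a sketch of one arrangement that matches the figures quoted above: a new x is accepted every clock, the cube appears three clock edges later, and no register-to-register path contains more than one multiplier. The entity name cube_pipe, the generic W, and the choice to make the first stage a plain register (multiplying by 1 needs no hardware) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cube_pipe is
  generic (W : positive := 8);               -- operand width (assumed default)
  port (
    clk    : in  std_logic;
    x      : in  unsigned(W-1 downto 0);
    xpower : out unsigned(3*W-1 downto 0)    -- x**3, three clock edges after x
  );
end entity cube_pipe;

architecture rtl of cube_pipe is
  signal x1, x2 : unsigned(W-1 downto 0)   := (others => '0');  -- delayed copies of x
  signal p1     : unsigned(W-1 downto 0)   := (others => '0');  -- x
  signal p2     : unsigned(2*W-1 downto 0) := (others => '0');  -- x**2
  signal p3     : unsigned(3*W-1 downto 0) := (others => '0');  -- x**3
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- stage 1: 1 * x, so only a register is needed
      x1 <= x;
      p1 <= x;
      -- stage 2: one multiplier
      x2 <= x1;
      p2 <= p1 * x1;
      -- stage 3: one multiplier
      p3 <= p2 * x2;
    end if;
  end process;

  xpower <= p3;
end architecture rtl;
```

The delayed copies x1 and x2 keep each partial product paired with the operand it belongs to; without them, stage 3 would multiply by a newer input.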
Figure labels (repeated over several slides): stage 1, stage 2, ..., stage n; registers.
Critical path delay: 3 adders
Critical path delay: 2 adders
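
The circuits behind these two figures are not reproduced in the text. Assuming they compare two ways of summing four operands, the difference can be sketched as follows: a chain places three adders in series, while a balanced tree places only two on the longest path. The entity name add4, the generic W, and the output names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add4 is
  generic (W : positive := 8);               -- operand width (assumed default)
  port (
    a, b, c, d : in  unsigned(W-1 downto 0);
    sum_chain  : out unsigned(W-1 downto 0);
    sum_tree   : out unsigned(W-1 downto 0)
  );
end entity add4;

architecture rtl of add4 is
begin
  -- Chain: each adder waits for the previous one (3 adders in series).
  sum_chain <= ((a + b) + c) + d;

  -- Tree: a+b and c+d are computed in parallel (2 adders in series).
  sum_tree  <= (a + b) + (c + d);
end architecture rtl;
```

Both outputs carry the same value modulo 2**W. Synthesis tools may rebalance such expressions on their own, so the parentheses express intent rather than guarantee a structure.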
Figure labels: AND (four gates), OR, Reg; signals c[0], c[1], c[2], c[3], din3, din2, din1, din0, rout, enable.
Figure labels: block 1, block 2.
-- Version 1: register the inputs, then add all three operands
-- (two adders between registers)
process (clk, rst)
begin
  if rising_edge(clk) then
    rA  <= A;
    rB  <= B;
    rC  <= C;
    sum <= rA + rB + rC;
  end if;
end process;

-- Version 2: add A and B in the first stage, add C in the second
-- (one adder between registers)
process (clk, rst)
begin
  if rising_edge(clk) then
    sumAB <= A + B;
    rC    <= C;
    sum   <= sumAB + rC;
  end if;
end process;
Figure labels: stage 1, stage 2, ..., stage n. Block including all logic in stages 1 to n.
Figure labels: A, B, C, D, X, control.
A, B, C, D need to hold steady until X is processed.
– Minimize slice logic utilization.
– Maximize circuit performance.
– Utilize device resources such as block RAM components and DSP blocks.
– Control set remapping becomes impossible.
– Sequential functionality in device resources such as block RAM components and DSP blocks can be set or reset synchronously only.
– You will be unable to leverage device resources, or they will be configured sub-optimally.
– Use synchronous initialization instead.
... to be set or reset asynchronously. This allows you to assess the benefits of using synchronous set/reset.
– No Flip-Flop primitives feature both a set and a reset, whether synchronous ...
– If not rejected by the software, Flip-Flop primitives featuring both a set and a reset may adversely affect area and performance.
... model.
... expensive, ways to achieve the desired effect, such as taking advantage of the circuit global reset by defining initial contents.
... as active-High. If they are described as active-Low, the resulting inverter logic will penalize circuit performance.
For other ways to control implementation of Flip-Flops and Registers, see Mapping Logic to LUTs.
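
As an illustration of the guidelines above, a register can be described with a synchronous, active-High reset, an active-High clock enable, and initial contents given in the signal declaration. This is a minimal sketch; the entity name reg_sync, the port names, and the generic W are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_sync is
  generic (W : positive := 8);               -- register width (assumed default)
  port (
    clk : in  std_logic;
    rst : in  std_logic;                     -- synchronous reset, active-High
    ce  : in  std_logic;                     -- clock enable, active-High
    d   : in  std_logic_vector(W-1 downto 0);
    q   : out std_logic_vector(W-1 downto 0)
  );
end entity reg_sync;

architecture rtl of reg_sync is
  -- Initial contents: the value the register holds after configuration.
  signal q_r : std_logic_vector(W-1 downto 0) := (others => '0');
begin
  process (clk)                              -- clk only: nothing asynchronous
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q_r <= (others => '0');              -- reset only, no separate set
      elsif ce = '1' then
        q_r <= d;
      end if;
    end if;
  end process;

  q <= q_r;
end architecture rtl;
```

If the reset is needed only at power-up, the rst branch can be dropped and the initial value alone used, which is the cheaper alternative the guidelines point to.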
Figure labels (shown twice): stage 1, stage 2, stage 4, stage n; positively triggered, negatively triggered.