CDA 4253/CIS 6930 FPGA System Design: RTL Design Methodology

Hao Zheng
Comp Sci & Eng
Univ of South Florida

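Structure of a Typical Digital Design
• Datapath (Execution Unit): receives data inputs, produces data outputs
• Controller (Control Unit): receives control inputs, produces control outputs
• Control signals go from the controller to the datapath
• Status signals go from the datapath to the controller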

Hardware Design with RTL VHDL
[Figure: block diagram of a datapath and a controller]
• Datapath: RF/scratch pad, memory, ALU, MUL, ..., connected by Bus 1, Bus 2 and Bus 3
• Controller: state register (SR), next-state logic, output logic
• Interfaces: control inputs, control outputs, control signals, status signals

Steps of the Design Process
1. Text description
2. Define interface
3. Describe the functionality using pseudo-code
4. Convert pseudo-code to FSM in state diagram
   1. Define states and state transitions
   2. Define datapath operations in each state
5. Develop VHDL code to implement FSM
6. Develop testbench for simulation and debugging
7. Implementation and timing simulation
   • Timing simulation can reveal more bugs than pre-synthesis simulation
8. Test the implementation on FPGA boards

Min_Max_Average

Pseudocode
Input: M[i]
Outputs: max, min, average

    max = 0
    min = MAX    // the maximal constant
    sum = 0
    for i = 0 to 31 do
        d = M[i]
        sum = sum + d
        if (d < min) then
            min = d
        endif
        if (d > max) then
            max = d
        endif
    endfor
    average = sum/32

• Data M[i] are stored in memory.
• Results are stored in the internal registers.
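A minimal software reference model of the pseudocode above, of the kind that could serve as a golden model when checking simulation results. The function name `min_max_average` and the width parameter `n` are illustrative assumptions; MAX is taken to be the largest unsigned n-bit value, which the slides do not state explicitly.

```python
def min_max_average(mem, n=8):
    """Reference model of the Min_Max_Average pseudocode.

    mem: sequence of 32 unsigned n-bit integers (the memory M).
    n:   assumed data width in bits (not fixed by the slides).
    """
    assert len(mem) == 32, "the pseudocode processes exactly 32 words"
    max_val = 0
    min_val = (1 << n) - 1      # MAX: the maximal n-bit constant (assumption)
    total = 0
    for i in range(32):
        d = mem[i]
        total += d
        if d < min_val:
            min_val = d
        if d > max_val:
            max_val = d
    average = total // 32       # integer division, no fractional part
    return max_val, min_val, average


data = list(range(1, 33))       # 1, 2, ..., 32
print(min_max_average(data))    # (32, 1, 16)
```

The result order (max, min, average) follows the order in which the pseudocode initializes its variables.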

Circuit Interface
[Figure: block MIN_MAX_AVR]
Ports: clk, reset, in_data (n bits), in_addr (5 bits), write, start, done, out_data (n bits), out_addr (2 bits)

Interface Table
Port      Width  Meaning
clk       1      System clock
reset     1      System reset; clears internal registers
in_data   n      Input data bus
in_addr   5      Address of the internal memory where input data is stored
write     1      Synchronous write control signal; validity of in_data
start     1      Starts the computations
done      1      Asserted when all results are ready
out_data  n      Output data bus used to read results
out_addr  2      01: reading minimum; 10: reading maximum; 11: reading average

Datapath
[Figure: datapath derived from the Min_Max_Average pseudocode]
• Registers: sum, min, max
• Adder: sum + d
• Comparators: d < min and d > max; each controls a mux that selects between d and the current register value
• /32 block: produces average from sum
• Outputs: max, min, average
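A rough sketch of one datapath step, to make the register sizing concrete. It assumes unsigned n-bit data with n = 8 (not given in the slides) and models the /32 block as a 5-bit right shift, which is a common way to divide by a power of two; the names `datapath_step` and `average` are hypothetical.

```python
def datapath_step(regs, d):
    """One clock cycle of the datapath: update sum, min and max with word d."""
    return {
        "sum": regs["sum"] + d,                        # adder
        "min": d if d < regs["min"] else regs["min"],  # comparator + mux
        "max": d if d > regs["max"] else regs["max"],  # comparator + mux
    }


def average(sum_reg):
    """Divide by 32 by discarding the 5 least significant bits."""
    return sum_reg >> 5


n = 8                            # assumed data width
regs = {"sum": 0, "min": (1 << n) - 1, "max": 0}
for d in [255] * 32:             # worst case: every word is the n-bit maximum
    regs = datapath_step(regs, d)

print(regs["sum"].bit_length())  # 13, i.e. n + 5 bits for the sum register
print(average(regs["sum"]))      # 255
```

Under these assumptions the sum register needs n + 5 bits, while min, max and average fit in n bits.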

State Diagram for Controller
States: init, run, end
• init -> init: start=0
• init -> run: start=1 / rst<=1, done<=0
• run -> run: i < 32 / i++
• run -> end: i==32 / done<=1
Output logic: in_addr <= i; out_data <= ...
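The controller and datapath can be exercised together in a small cycle-by-cycle model. This is only a sketch: it assumes the memory returns M[i] in the same cycle the address is applied, that start stays high from `start_cycle` onward, and an 8-bit data width. The function name and the returned cycle count are illustrative, not part of the slides.

```python
def min_max_avr_fsm(mem, start_cycle=0, n=8):
    """Cycle-by-cycle model of the init/run/end controller with its datapath."""
    state, i, done = "init", 0, 0
    regs = {"sum": 0, "min": (1 << n) - 1, "max": 0}
    cycle = 0
    while state != "end":
        start = 1 if cycle >= start_cycle else 0
        if state == "init":
            if start:                       # start=1 / rst<=1, done<=0
                regs = {"sum": 0, "min": (1 << n) - 1, "max": 0}
                i, done, state = 0, 0, "run"
        elif state == "run":
            if i < 32:                      # i < 32 / i++
                d = mem[i]                  # output logic: in_addr <= i
                regs["sum"] += d
                regs["min"] = min(regs["min"], d)
                regs["max"] = max(regs["max"], d)
                i += 1
            else:                           # i == 32 / done<=1
                done, state = 1, "end"
        cycle += 1
    return regs["max"], regs["min"], regs["sum"] >> 5, done, cycle


print(min_max_avr_fsm(list(range(1, 33))))   # (32, 1, 16, 1, 34)
```

In this model the 34 cycles are one cycle to leave init, 32 cycles in run, and one cycle to detect i == 32.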

Sorting

Sorting - Example
Memory contents before sorting, during sorting (one column per i, j pair), and after sorting:

Addr  Before  i=0,j=1  i=0,j=2  i=0,j=3  i=1,j=2  i=1,j=3  i=2,j=3  After
0     3       3        2        2        1        1        1        1
1     2       2        3        3        3        3        2        2
2     4       4        4        4        4        4        4        3
3     1       1        1        1        2        2        3        4

Legend: M_i is the position of memory indexed by i; M_j is the position of memory indexed by j.

Pseudocode

    for i = 0 to k-2 do
        A = M[i]
        for j = i+1 to k-1 do
            B = M[j]
            if A > B then
                M[i] = B
                M[j] = A
                A = B
            end if
        end for
    end for

• k is a constant, the number of integers to be sorted in memory.
• M denotes memory.
• Memory address is either i or j.
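The sorting pseudocode can be run on the example data (3, 2, 4, 1) to see how memory changes at each (i, j) pair. The sketch below records a snapshot of memory before each comparison; the function name `sort_trace` and the trace format are assumptions made for illustration.

```python
def sort_trace(mem):
    """Run the sorting pseudocode and record memory before each comparison."""
    m = list(mem)
    k = len(m)
    trace = []
    for i in range(k - 1):              # i = 0 .. k-2
        a = m[i]                        # A = M[i]
        for j in range(i + 1, k):       # j = i+1 .. k-1
            trace.append((i, j, tuple(m)))
            b = m[j]                    # B = M[j]
            if a > b:
                m[i] = b
                m[j] = a
                a = b
    return m, trace


final, trace = sort_trace([3, 2, 4, 1])
for i, j, snapshot in trace:
    print(f"i={i} j={j} M={list(snapshot)}")
print("after:", final)                  # after: [1, 2, 3, 4]
```

Register A always mirrors M[i], which is why the pseudocode updates A after a swap instead of reading M[i] again.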

Sorting – Interface
[Figure: Sort block connected to Memory]
Signals shown: clock, reset, start, done, k, din (N bits), dout (N bits), addr, we

Sorting – Datapath
• Registers to hold A, B
• Memory addresses i and j
• Incrementor
• Comparator

[Figure: address generation] Registers Ri and Rj hold i and j, each with a +1 incrementor. Signals shown: enable, sel1 (mux select).

[Figure: data registers] Registers RA and RB hold A and B. Muxes choose the memory address addr from i and j, a register input from B and din, and the data on dout from A and B. Mux select signals shown: sel2, sel3.

[Figure: status signals] Three ">" comparators produce the status signals: end_j (from j and k-1), AgtB (from A and B), end_i (from i and k-2).

Sorting – Controller
• Nested loops are implemented by two FSMs: the one for the outer loop controls the one for the inner loop.
• Reuse the FSM for the single for loop in the previous example.

Sorting – Controller
States: init, outer, inner, end
• init -> init: start=0 / done<=0
• init -> outer: start=1 / rst<=1, i<=0
• outer -> end: end_i=1 / done<=1
• outer -> inner: end_i=0 / we<=0, sel2<=0, sel3<=0, ...
• inner -> outer: end_j=1 / i++
• inner -> inner: end_j=0 / j++; ...
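A sketch of how the outer and inner loop states interact through the status signals end_i and end_j. It is a coarse model, not the slides' implementation: each inner iteration is treated as a single step, memory accesses and the mux selects (we, sel2, sel3) are abstracted away, and start is assumed to be asserted immediately. The status signals are taken as i > k-2 and j > k-1.

```python
def sort_controller(mem):
    """Step-by-step model of the init/outer/inner/end controller."""
    m = list(mem)
    k = len(m)
    state, i, j, a, done, steps = "init", 0, 0, 0, 0, 0
    start = 1                               # assumed asserted from the first step
    while state != "end":
        end_i = i > k - 2                   # status signals from the comparators
        end_j = j > k - 1
        if state == "init":
            if start:                       # start=1 / rst<=1, i<=0
                i, done, state = 0, 0, "outer"
        elif state == "outer":
            if end_i:                       # end_i=1 / done<=1
                done, state = 1, "end"
            else:                           # end_i=0: load A, start inner loop
                a, j, state = m[i], i + 1, "inner"
        elif state == "inner":
            if end_j:                       # end_j=1 / i++
                i, state = i + 1, "outer"
            else:                           # end_j=0 / compare, swap, j++
                b = m[j]
                if a > b:                   # AgtB
                    m[i], m[j] = b, a
                    a = b
                j += 1
        steps += 1
    return m, done, steps


print(sort_controller([3, 2, 4, 1]))        # ([1, 2, 3, 4], 1, 14)
```

The outer state only decides whether another pass is needed; all comparisons happen while the controller sits in the inner state.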

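Behavioral Level Design
[Figure: combinational logic and a register]
• The combinational logic takes the inputs and the current register value reg, and produces reg_next and the output.
• The register, clocked by clk, stores reg_next as reg.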
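The reg / reg_next structure can be illustrated with a small model in which the combinational logic is a pure function and the register update happens once per clock edge. The 4-bit counter with an enable input used here is a hypothetical example, not taken from the slides.

```python
def combinational_logic(reg, en):
    """Compute (reg_next, output) from the current register value and input."""
    reg_next = (reg + 1) % 16 if en else reg   # hypothetical 4-bit counter
    output = 1 if reg == 15 else 0             # output depends on reg only
    return reg_next, output


reg = 0                                        # register contents after reset
outputs = []
for en in [1, 1, 0, 1]:                        # one input value per clock cycle
    reg_next, out = combinational_logic(reg, en)
    outputs.append(out)
    reg = reg_next                             # rising edge: register captures reg_next

print(reg)                                     # 3
```

Keeping the next-value computation separate from the register update mirrors the usual coding style of one combinational process and one clocked process.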

FSMD
The for loops of the sorting pseudocode are rewritten as while loops with explicit memory addresses:

    i = 0
    while i < k-1 do
        addr = i
        A = M[addr]
        j = i+1
        while j < k do
            addr = j
            B = M[addr]
            if A > B then
                addr = i
                M[addr] = B
                addr = j
                M[addr] = A
                A = B
            end if
            j = j+1
        end while
        i = i+1
    end while

FSMD
Pseudocode line numbers are used as state names:

     1  i = 0
     2  while i < k-1 do
     3      addr = i
     4      A = M[addr]
     5      j = i+1
     6      while j < k do
     7          j = j+1
     8          addr = j
     9          B = M[addr]
    10          if A > B then
    11              addr = i
    12              M[addr] = B
    13              addr = j
    14              M[addr] = A
    15              A = B
    16          end if
    17      end while
    18  end while

Current State  Next State  Cond        Operations
1              2           start='1'   i <= 0
2              3           i < k-1     null
2              18          !(i<k-1)    done <= '1'
3              6           true        addr <= i; A <= M[addr]; j <= i+1
6              7           j < k       null
6              17          !(j<k)      null
7              10          true        j++; addr <= j; B <= M[addr]
10             16          A > B       addr <= i; M[addr] <= B;
10             16          !(A > B)    null
16             6           true        null
17             2           true        null
...            ...         ...         ...

FSMD
The operations of the while-loop pseudocode are grouped into four states, s0 to s3:

Current State  Next State  Cond        Operations
s0             s1          start='1'   i <= 0
s1             s2          i < k-1     addr <= i; A <= M[addr]; j <= i+1
s1             s0          !(i<k-1)    done <= '1'
s2             s3          j < k       addr <= j; B <= M[addr]
s2             s1          !(j<k)      i <= i+1
s3             s2          A > B       addr <= i; M[addr] <= B; addr <= j; M[addr] <= A; A <= B; j <= j+1
s3             s2          !(A > B)    j <= j+1
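The four-state table can be transcribed almost row by row into a small simulator. This is a sketch under simplifying assumptions: each table row is counted as one cycle even when it lists two memory accesses, start is asserted from the first cycle, and the simulation stops as soon as done is set. The function name `fsmd_sort` is hypothetical.

```python
def fsmd_sort(mem):
    """Simulate the s0..s3 FSMD table; returns (sorted memory, cycle count)."""
    m = list(mem)
    k = len(m)
    state, i, j, a, b, done, cycles = "s0", 0, 0, 0, 0, 0, 0
    start = 1                               # assumed asserted immediately
    while not done:
        if state == "s0":
            if start:                       # s0 -> s1: i <= 0
                i, state = 0, "s1"
        elif state == "s1":
            if i < k - 1:                   # s1 -> s2: A <= M[i]; j <= i+1
                a, j, state = m[i], i + 1, "s2"
            else:                           # s1 -> s0: done <= '1'
                done, state = 1, "s0"
        elif state == "s2":
            if j < k:                       # s2 -> s3: B <= M[j]
                b, state = m[j], "s3"
            else:                           # s2 -> s1: i <= i+1
                i, state = i + 1, "s1"
        elif state == "s3":
            if a > b:                       # s3 -> s2: swap and keep A = M[i]
                m[i], m[j], a = b, a, b
            j += 1                          # both s3 rows: j <= j+1
            state = "s2"
        cycles += 1
    return m, cycles


print(fsmd_sort([3, 2, 4, 1]))              # ([1, 2, 3, 4], 20)
```

Compared with using one state per pseudocode line, grouping operations into four states reduces the number of cycles spent on transitions that perform no operation.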

Optimization for Performance

Performance Definitions
• Throughput: the number of inputs processed per unit time.
• Latency: the amount of time for an input to be processed.
• Maximizing throughput and minimizing latency are in conflict.
• Both require timing optimization:
  - Reduce delay of the critical path
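The definitions above can be turned into a small calculation. The sketch assumes the clock period equals the critical path delay (ignoring setup time and clock skew); the function name and the example numbers are hypothetical.

```python
def performance(cycles_per_input, latency_cycles, critical_path_ns):
    """Return (f_max in MHz, throughput in million inputs/s, latency in ns)."""
    f_max_mhz = 1000.0 / critical_path_ns       # clock period = critical path delay
    throughput = f_max_mhz / cycles_per_input   # inputs accepted per microsecond
    latency_ns = latency_cycles * critical_path_ns
    return f_max_mhz, throughput, latency_ns


# Hypothetical design: one input every 3 cycles, 3-cycle latency, 10 ns critical path
print(performance(3, 3, 10.0))
# Halving the critical path doubles throughput and halves latency
print(performance(3, 3, 5.0))
```

This is why reducing the critical path delay helps both metrics, even though architectural choices usually trade one against the other.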

Achieving High Throughput: Pipelining
• Divide data processing into stages
• Process different data inputs in different stages simultaneously.

Computation:

    xpower = 1;
    for (i = 0; i < 3; i++)
        xpower = x * xpower;

Iterative implementation:

    process (clk)
    begin
        if rising_edge(clk) then
            if start = '1' then
                cnt    <= 3;
                xpower <= 1;    -- initial value from the loop above (integer signals assumed)
                done   <= '0';
            elsif cnt > 0 then
                cnt    <= cnt - 1;
                xpower <= xpower * x;
            elsif cnt = 0 then
                done <= '1';
            end if;
        end if;
    end process;

• Throughput: 1 data / 3 cycles = 0.33 data / cycle.
• Latency: 3 cycles.
• Critical path delay: 1 multiplier delay
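The iterative process can be modeled one clock edge at a time to see when the result becomes available. The sketch assumes start is pulsed for a single cycle and that xpower is set to 1 when start is asserted, matching the software loop; all right-hand sides use the values from before the clock edge, as VHDL signal assignments do.

```python
def clocked_power3(x, n_cycles=6):
    """Model of the iterative x**3 process; returns (cycle, cnt, xpower, done) per edge."""
    cnt, xpower, done = 0, 1, 0
    trace = []
    for cycle in range(n_cycles):
        start = 1 if cycle == 0 else 0          # assumed single-cycle start pulse
        if start:
            cnt, xpower, done = 3, 1, 0         # xpower = 1 is an assumption
        elif cnt > 0:
            cnt, xpower = cnt - 1, xpower * x   # old values on the right-hand side
        else:                                   # cnt = 0
            done = 1
        trace.append((cycle, cnt, xpower, done))
    return trace


for row in clocked_power3(2):
    print(row)
# (3, 0, 8, 0) shows x**3 ready after three multiply cycles;
# (4, 0, 8, 1) shows done asserted one cycle later.
```

The single multiplier is busy for three cycles per input, which is what limits throughput to one result every three cycles.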

Achieving High Throughput: Pipelining
Pipelined implementation of the same computation:

    process (clk)
    begin
        if rising_edge(clk) then
            -- stage 1
            if start = '1' then
                x1      <= x;
                xpower1 <= x;
            end if;
            done1 <= start;    -- valid flag follows start every cycle
            -- stage 2
            x2      <= x1;
            xpower2 <= xpower1 * x1;
            done2   <= done1;
            -- stage 3
            xpower <= xpower2 * x2;
            done   <= done2;
        end if;
    end process;

• Throughput: 1 data / cycle
• Latency: 3 cycles + register delays.
• Critical path delay: 1 multiplier delay
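A cycle-level sketch of the three-stage pipeline, fed with one new input per clock cycle. It assumes that the valid flag of stage 1 follows start on every edge and that start is high exactly when an input is present; the function name and the (cycle, result) output format are illustrative.

```python
def pipelined_power3(xs):
    """Simulate the 3-stage x**3 pipeline; returns (cycle, result) for each valid output."""
    x1 = xp1 = x2 = xp2 = xpower = 0
    v1 = v2 = done = 0
    outputs = []
    stream = list(xs) + [None] * 3              # extra cycles to drain the pipeline
    for cycle, x in enumerate(stream):
        start = 0 if x is None else 1
        # Next values are computed from the current register contents only,
        # mirroring VHDL signal assignment inside a clocked process.
        n_xpower, n_done = xp2 * x2, v2         # stage 3
        n_x2, n_xp2, n_v2 = x1, xp1 * x1, v1    # stage 2
        if start:                               # stage 1
            n_x1, n_xp1 = x, x
        else:
            n_x1, n_xp1 = x1, xp1
        n_v1 = start
        # Clock edge: all registers update together.
        x1, xp1, v1 = n_x1, n_xp1, n_v1
        x2, xp2, v2 = n_x2, n_xp2, n_v2
        xpower, done = n_xpower, n_done
        if done:
            outputs.append((cycle, xpower))
    return outputs


print(pipelined_power3([2, 3, 4]))   # [(2, 8), (3, 27), (4, 64)]
```

An input applied in cycle c produces its result at the third clock edge, the one ending cycle c + 2, and a new result appears every cycle after that.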

