CDA 4253 FPGA System Design: Optimization Techniques
Hao Zheng, Comp Sci & Eng, Univ of South Florida

Extracted from Advanced FPGA Design by Steve Kilts

Optimization for Performance

Performance Definitions
• Throughput: the number of inputs processed per unit time.
• Latency: the amount of time for an input to be processed.
• Maximizing throughput and minimizing latency are in conflict.
• Both require timing optimization:
  - Reduce the delay of the critical path.

Achieving High Throughput: Pipelining
• Divide data processing into stages.
• Process different data inputs in different stages simultaneously.
Software reference:
    xpower = 1;
    for (i = 0; i < 3; i++)
        xpower = x * xpower;
Hardware:
    -- Non-pipelined version
    process (clk)
    begin
        if rising_edge(clk) then
            if start = '1' then
                cnt <= 3;
            end if;
            if cnt > 0 then
                cnt <= cnt - 1;
                xpower <= xpower * x;
            elsif cnt = 0 then
                done <= '1';
            end if;
        end if;
    end process;
Throughput: 1 data / 3 cycles = 0.33 data / cycle.
Latency: 3 cycles.
Critical path delay: 1 multiplier delay.
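The non-pipelined listing leaves the initial value of xpower and the signal widths implicit. A minimal self-contained sketch of the same iterative idea might look like the following. The entity name power3_iter, the 8-bit input width, and the internal names x_r and acc are assumptions made for illustration, and loading the accumulator with x on start (rather than 1) is a choice made here so that only two multiply cycles are needed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: computes x**3 iteratively with one multiplier.
entity power3_iter is
  port (
    clk    : in  std_logic;
    start  : in  std_logic;               -- loads a new input
    x      : in  unsigned(7 downto 0);
    xpower : out unsigned(23 downto 0);   -- x**3 when done = '1'
    done   : out std_logic                -- one-cycle pulse
  );
end entity;

architecture rtl of power3_iter is
  signal x_r : unsigned(7 downto 0)  := (others => '0');
  signal acc : unsigned(23 downto 0) := (others => '0');
  signal cnt : integer range 0 to 2  := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      if start = '1' then
        x_r <= x;                        -- hold the input for later cycles
        acc <= resize(x, 24);            -- acc = x
        cnt <= 2;                        -- two multiplications remain
      elsif cnt > 0 then
        acc <= resize(acc * x_r, 24);    -- reuse the single multiplier
        cnt <= cnt - 1;
        if cnt = 1 then
          done <= '1';                   -- last multiplication this cycle
        end if;
      end if;
    end if;
  end process;

  xpower <= acc;
end architecture;
```

A new input can only be accepted once every three cycles, which is the 0.33 data / cycle throughput quoted on the slide.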

Achieving High Throughput: Pipelining
Software reference:
    xpower = 1;
    for (i = 0; i < 3; i++)
        xpower = x * xpower;
Hardware:
    -- Pipelined version
    process (clk, rst)
    begin
        if rising_edge(clk) then
            if start = '1' then
                -- stage 1
                x1 <= x;
                xpower1 <= x;
                done1 <= start;
            end if;
            -- stage 2
            x2 <= x1;
            xpower2 <= xpower1 * x1;
            done2 <= done1;
            -- stage 3
            xpower <= xpower2 * x2;
            done <= done2;
        end if;
    end process;
Throughput: 1 data / cycle.
Latency: 3 cycles + register delays.
Critical path delay: 1 multiplier delay.
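To make the pipelined listing compilable on its own, it needs declarations and explicit widths. The sketch below is one way to do that, under stated assumptions: the entity name power3_pipe and the 8-bit input width are invented for illustration, and stage 1 runs every cycle here so that done1 follows start and returns to '0' when no input is presented.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: computes x**3 with a three-stage pipeline.
entity power3_pipe is
  port (
    clk    : in  std_logic;
    start  : in  std_logic;               -- marks a valid input on x
    x      : in  unsigned(7 downto 0);
    xpower : out unsigned(23 downto 0);   -- x**3
    done   : out std_logic                -- marks a valid output
  );
end entity;

architecture rtl of power3_pipe is
  signal x1, x2       : unsigned(7 downto 0)  := (others => '0');
  signal xpower1      : unsigned(7 downto 0)  := (others => '0');
  signal xpower2      : unsigned(15 downto 0) := (others => '0');
  signal done1, done2 : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- stage 1: capture the input and its valid flag
      x1      <= x;
      xpower1 <= x;
      done1   <= start;
      -- stage 2: x**2 (8 x 8 -> 16 bits)
      x2      <= x1;
      xpower2 <= xpower1 * x1;
      done2   <= done1;
      -- stage 3: x**3 (16 x 8 -> 24 bits)
      xpower  <= xpower2 * x2;
      done    <= done2;
    end if;
  end process;
end architecture;
```

Each stage has its own multiplier, so a new x can be accepted every cycle while earlier inputs are still moving through stages 2 and 3.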

Comparison
[Figures: iterative implementation; pipelined implementation]

Achieving High Throughput: Pipelining
• Loop unrolling
[Figure labels: X, Y, C, Reg]

Achieving High Throughput: Pipelining
• Loop unrolling
[Figure labels: X, C0, C1, ..., Cn, Reg, Reg, Y]

Achieving High Throughput: Pipelining
• Divide data processing into stages.
• Process different data inputs in different stages simultaneously.
[Figure: a single block from din to dout]

Achieving High Throughput: Pipelining
• Divide data processing into stages.
• Process different data inputs in different stages simultaneously.
[Figure: din → stage 1 → stage 2 → … → stage n → dout, with registers between stages]
Penalty: increase in area, as logic needs to be duplicated for different stages.

Reducing Latency
• Closely related to reducing critical path delay.
• Reducing pipeline registers reduces latency.
[Figure: din → stage 1 → stage 2 → … → stage n → dout, with registers between stages]

Reducing Latency
• Closely related to reducing critical path delay.
• Reducing pipeline registers reduces latency.
[Figure: din → stage 1 → stage 2 → … → stage n → dout, without the registers]

Timing Optimization
• Maximal clock frequency is determined by the longest path delay in any combinational logic block.
• Pipelining is one approach.
[Figure: a single block from din to dout, and the same logic split into stage 1, stage 2, …, stage n by pipeline registers]

Timing Optimization: Spatial Computing
• Extract independent operations.
• Execute independent operations in parallel.
X = A + B + C + D
Parallel version:
    process (clk, rst)
    begin
        if rising_edge(clk) then
            X1 := A + B;
            X2 := C + D;
            X <= X1 + X2;
        end if;
    end process;
Critical path delay: 2 adders
Serial version:
    process (clk, rst)
    begin
        if rising_edge(clk) then
            X1 := A + B;
            X2 := X1 + C;
            X <= X2 + D;
        end if;
    end process;
Critical path delay: 3 adders
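The two processes use variables X1 and X2 without showing their declarations. A self-contained sketch of the parallel (two-adder-deep) version could be written as follows. The entity name sum4_tree and the 8-bit operand width are assumptions for illustration; the 10-bit result is chosen so that the sum of four 8-bit values cannot overflow.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: X = A + B + C + D as a balanced adder tree.
entity sum4_tree is
  port (
    clk        : in  std_logic;
    a, b, c, d : in  unsigned(7 downto 0);
    x          : out unsigned(9 downto 0)
  );
end entity;

architecture rtl of sum4_tree is
begin
  process (clk)
    -- Variables are written before they are read on every clock edge,
    -- so they describe wires between adders, not registers.
    variable x1, x2 : unsigned(9 downto 0);
  begin
    if rising_edge(clk) then
      x1 := resize(a, 10) + resize(b, 10);   -- independent of x2
      x2 := resize(c, 10) + resize(d, 10);   -- independent of x1
      x  <= x1 + x2;                         -- second adder level
    end if;
  end process;
end architecture;
```

Because x1 and x2 do not depend on each other, the two first-level additions can be evaluated side by side, leaving two adders on the longest path instead of three.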

Timing Optimization: Avoid Unwanted Priority
    process (clk, rst)
    begin
        if rising_edge(clk) then
            if c(0) = '1' then
                r(0) <= din;
            elsif c(1) = '1' then
                r(1) <= din;
            elsif c(2) = '1' then
                r(2) <= din;
            elsif c(3) = '1' then
                r(3) <= din;
            end if;
        end if;
    end process;
Critical path delay: 3-input AND gate + 4x1 MUX.

Timing Optimization: Avoid Unwanted Priority
[Figure: logic for the if/elsif version]
Critical path delay: 3-input AND gate + 4x1 MUX.

Timing Optimization: Avoid Unwanted Priority
    process (clk, rst)
    begin
        if rising_edge(clk) then
            if c(0) = '1' then
                r(0) <= din;
            end if;
            if c(1) = '1' then
                r(1) <= din;
            end if;
            if c(2) = '1' then
                r(2) <= din;
            end if;
            if c(3) = '1' then
                r(3) <= din;
            end if;
        end if;
    end process;
Critical path delay: 2x1 MUX
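The four independent if statements can also be written as a loop over the control bits. The sketch below is one self-contained way to express that; the entity name reg_write and the choice of single-bit din with a 4-bit r are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: four registers, each with its own load enable.
entity reg_write is
  port (
    clk : in  std_logic;
    c   : in  std_logic_vector(3 downto 0);   -- one enable per register
    din : in  std_logic;
    r   : out std_logic_vector(3 downto 0)
  );
end entity;

architecture rtl of reg_write is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Each iteration is an independent "if": r(i) depends only on
      -- c(i), so no enable has to wait for a higher-priority one.
      for i in c'range loop
        if c(i) = '1' then
          r(i) <= din;
        end if;
      end loop;
    end if;
  end process;
end architecture;
```

Note that this is not functionally identical to the if/elsif form: when several bits of c are set at once, every selected register is written, whereas the priority version writes only the first one. The rewrite is safe when the enables are known to be mutually exclusive or when that behavior is acceptable.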

Timing Optimization: Avoid Unwanted Priority
[Figure: logic for the version with independent if statements]
Critical path delay: 2x1 MUX

Timing Optimization: Register Balancing
• Maximal clock frequency is determined by the longest path delay in any combinational logic block.
[Figure: two versions of din → block 1 → block 2 → dout]

Timing Optimization: Register Balancing
Version with all additions in one stage:
    process (clk, rst)
    begin
        if rising_edge(clk) then
            rA <= A;
            rB <= B;
            rC <= C;
            sum <= rA + rB + rC;
        end if;
    end process;
Version with one addition per stage:
    process (clk, rst)
    begin
        if rising_edge(clk) then
            sumAB <= A + B;
            rC <= C;
            sum <= sumAB + rC;
        end if;
    end process;

Timing Optimization: Register Balancing
    process (clk, rst)
    begin
        if rising_edge(clk) then
            rA <= A;
            rB <= B;
            rC <= C;
            sum <= rA + rB + rC;
        end if;
    end process;

Timing Optimization: Register Balancing
    process (clk, rst)
    begin
        if rising_edge(clk) then
            sumAB <= A + B;
            rC <= C;
            sum <= sumAB + rC;
        end if;
    end process;

Optimization for Area

Area Optimization: Resource Sharing
• Rolling up the pipeline: share common resources at different times, a form of temporal computing.
[Figure: din → stage 1 → stage 2 → … → stage n → dout, rolled up into a single block from din to dout that includes all logic in stages 1 to n]

Area Optimization: Resource Sharing
• Use registers to hold inputs.
• Develop an FSM to select which inputs to process in each cycle.
X = A + B + C + D
[Figure: adders combining A + B and C + D into X]

Area Optimization: Resource Sharing
• Use registers to hold inputs.
• Develop an FSM to select which inputs to process in each cycle.
X = A + B + C + D
[Figure: the adder tree next to a shared-adder datapath with control logic]
A, B, C, D need to hold steady until X is processed.
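The FSM-based sharing described here can be sketched as a single adder whose operands are selected by the current state. This is an illustration under stated assumptions, not the slide's exact circuit: the entity name sum4_shared, the state names, the start/done handshake, and the 8-bit operand width are all invented for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: X = A + B + C + D using one adder over 3 cycles.
entity sum4_shared is
  port (
    clk        : in  std_logic;
    start      : in  std_logic;
    a, b, c, d : in  unsigned(7 downto 0);   -- must stay stable until done
    x          : out unsigned(9 downto 0);
    done       : out std_logic               -- one-cycle pulse
  );
end entity;

architecture rtl of sum4_shared is
  type state_t is (idle, add_c, add_d);
  signal state         : state_t := idle;
  signal acc           : unsigned(9 downto 0) := (others => '0');
  signal op1, op2, sum : unsigned(9 downto 0);
begin
  -- Operand multiplexers in front of the shared adder.
  op1 <= resize(a, 10) when state = idle else acc;
  with state select
    op2 <= resize(b, 10) when idle,
           resize(c, 10) when add_c,
           resize(d, 10) when add_d;

  -- The only adder in the design.
  sum <= op1 + op2;

  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      case state is
        when idle =>
          if start = '1' then
            acc   <= sum;        -- A + B
            state <= add_c;
          end if;
        when add_c =>
          acc   <= sum;          -- (A + B) + C
          state <= add_d;
        when add_d =>
          acc   <= sum;          -- (A + B + C) + D
          done  <= '1';
          state <= idle;
      end case;
    end if;
  end process;

  x <= acc;
end architecture;
```

The trade-off is the one the slides describe: one adder plus multiplexers and control instead of three adders, at the cost of three cycles per result and inputs that must be held steady for the whole computation.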

Area Optimization: Resource Sharing
Merge duplicate components together.

Area Optimization: Resource Sharing
Merge duplicate components together: saves an 8-bit counter.
