VHDL for Logic Synthesis
Overview
- Design Flow for Hardware Design
- VHDL coding for synthesis
- General guidelines for hardware designers
This lecture includes content from: Nitin Yogi, Modelling for Synthesis with VHDL, Auburn University; Actel HDL Coding Style Guide; and other sources
Digital design flow (ASIC; FPGA)
System Design Flow
Figure: system design flow. Recoverable elements:
- Application domains: mixed-signal, wireless communication, embedded computing architectures
- Electronic System Level (ESL) flow: SystemC TLM, verification, profiling (VISTA)
- Simulink flow: MATLAB model (floating point), MATLAB model (fixed point), RTL coding (VHDL)
- High-level synthesis (CtoS, CatapultC, HandelC)
- HW/SW partitioning based on profiling, leading to RTL coding (VHDL) and the software flow
- ASIC logic synthesis (Synopsys); FPGA logic synthesis (Xilinx ISE)
- ASIC back end (Cadence SE); FPGA place and route (Xilinx ISE)
- ASIC DRC and LVS (Cadence Assura, Polyteda)
- Verification flow
  - based on system-level verification
  - assertions and formal verification actively used
  - smart testbenches
  - FPGA verification
  - Palladium verification (Palladium XP)
- DfT flow
  - BIST for memory and logic
  - scan for logic
- Reconfigurable IP cores, internal and external (SystemC, MATLAB, VHDL)
VLSI Levels of Abstraction
- Specification: what the chip does, inputs/outputs
- System-level modeling: major resources, connections
- Register-transfer: logic blocks, FSMs, memory, connections
- Logic: gates, flip-flops, latches, connections
- Circuit: transistors, parasitics, connections
- Layout: mask layers, polygons
Activity Flow in Digital Design
Figure: the design stages (functional design, RTL design, logic design, circuit design, physical design) lead from the specifications to the GDS description. Associated activities: behavioural description and verification; RTL description and verification; netlist synthesis and simulation; timing closure; power analysis; physical analysis (DRC, LVS, ERC).
ASIC Design flow
Figure: ASIC design flow (flowchart). Recoverable elements:
- Inputs: applications, system specification, IP library, test benches
- HDL RTL designs and HDL top module definition, then simulation (result OK?)
- Logic synthesis, then simulation (result OK? if not: is a new synthesis run sufficient?)
- Layout synthesis, then simulation (result OK? if not: is a new layout run sufficient?)
- Final chip layout
Design Views and Abstraction Models
- The ASIC design process starts with a behavioural model and proceeds through a structural model to a physical model
Figure: design views and their abstraction levels.
- Behavioural: algorithms; register transfers; signals, expressions
- Structural: gates; registers; processors; MPSoC
- Physical: transistors; cells; modules; chips
VHDL could be applied at multiple levels of abstraction
- For a detailed view, please see the course "Entwurf Digitaler Systeme"
- VHDL can be used to model a circuit at a very abstract behavioural level
- This description can be refined to the RTL level
- It can also be used to describe the structural netlist
Figure: after synthesis the netlist carries cell delays; after layout it carries cell delays and interconnect delays.
Synthesis Process and different coding Styles
- Synthesis converts the RTL model into a structural model
- The result is a netlist (VHDL, most frequently Verilog, or EDIF)
Behavioral (RTL) model
  architecture behav of mux is
  begin
    pr: process(A, B, C)
    begin
      if (C = '0') then   -- C is the select input
        Y <= A;
      else
        Y <= B;
      end if;
    end process pr;
  end;
Structural model (result of synthesis)
  -- INV, AND2 and OR2 stand for library cells (names assumed, since
  -- "not", "and" and "or" are reserved words); output port comes first
  architecture netlist of mux is
    signal CI, D, E : std_logic;
  begin
    g1: INV  port map (CI, C);
    g2: AND2 port map (D, A, CI);
    g3: AND2 port map (E, B, C);
    g4: OR2  port map (Y, D, E);
  end;
Why should we know the synthesis outcome when writing VHDL?
Behavioral model
  architecture behav of cont is
  begin
    p1: process(A, B, C1, C2)
    begin
      if (C1 = '1') and (C2 = '1') then
        Out1 <= A;
      elsif (C2 = '1') then
        Out1 <= B;
      -- else we do not care
      end if;
    end process p1;
  end;
Expected result of synthesis: a multiplexer that selects between A and B under control of C1 and C2 and drives Out1.
Obtained synthesized design: the same multiplexer followed by a latch (D, EN, Out). Out1 is not assigned when C2 = '0', so the hardware has to hold its previous value.
Why is suboptimal design dangerous?
- Additional hardware increases area and therefore cost
- Additional hardware consumes additional power and therefore reduces battery life
- A suboptimal design has reduced performance because of a longer critical path
- Ambiguities in the design create potential bugs
- (Unintentional) use of latches may lead to problems in timing analysis and to glitch generation
Why should we know the synthesis outcome when writing VHDL?
- Corrected design
Behavioral model
  architecture behav of cont is
  begin
    p1: process(A, B, C1, C2)
    begin
      if (C1 = '1') and (C2 = '1') then
        Out1 <= A;
      elsif (C2 = '1') then
        Out1 <= B;
      else
        Out1 <= B;
      end if;
    end process p1;
  end;
Synthesis result: a single multiplexer that selects between A and B under control of C1 and C2 and drives Out1; no latch is inferred.
Rule of correctly written VHDL: always define the outputs for all IF cases
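A common alternative way to satisfy this rule is a default assignment at the top of the process, so that every path through the IF statement leaves the output defined. The following is a minimal sketch; the entity name cont_default and its port list are assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical entity, modelled on the "cont" example
entity cont_default is
  port (A, B, C1, C2 : in  std_logic;
        Out1         : out std_logic);
end cont_default;

architecture behav of cont_default is
begin
  p1: process(A, B, C1, C2)
  begin
    Out1 <= B;                          -- default value, valid on every path
    if (C1 = '1') and (C2 = '1') then
      Out1 <= A;                        -- overrides the default
    end if;
  end process p1;
end behav;
```

Within a process only the last assignment to a signal takes effect, so the default costs no extra hardware.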
Typical Digital Circuits
- Combinational logic circuits
  - random logic
  - multiplexers
  - decoders
- Arithmetic functions
- Sequential logic (registers)
  - synchronous and asynchronous inputs
- Shift registers
- Finite state machines
- Memory synthesis
- More advanced circuits (FIFOs, synchronizers, clock gates)
How does a VHDL simulator work?
- VHDL blocks are simulated with an event-driven simulator
- Concurrent assignments are executed concurrently
- All assignments scheduled for a particular timestamp are updated at the same time
- Assignments that depend on those updates are themselves updated one delta cycle later
- A delta cycle is the simulation time quantum: it does not advance simulated time and cannot be seen on the time axis, but it orders the execution of events
- When all assignments are resolved (after N delta cycles), the simulator advances to the next timed event in the simulation
  X <= Y + Z;  -- X is updated one delta cycle after Y or Z changes
  W <= X - Z;  -- W is updated after X, one further delta cycle later
Please be careful with A <= B; signals A and B are not identical: A follows B with one delta cycle of delay.
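The delta-cycle ordering can be observed in a small simulation-only sketch. The entity name delta_demo and the initial values are assumptions; "wait for 0 ns" suspends the stimulus process for exactly one delta cycle.

```vhdl
entity delta_demo is
end delta_demo;

architecture sim of delta_demo is
  signal Y : integer := 2;
  signal Z : integer := 1;
  signal X : integer := 0;
  signal W : integer := 0;
begin
  X <= Y + Z;   -- updated one delta after Y or Z changes
  W <= X - Z;   -- updated one delta after X or Z changes

  stim: process
  begin
    wait for 10 ns;   -- initial values have settled: X = 3, W = 2
    Y <= 5;
    wait for 0 ns;    -- first delta: Y is updated, X is not yet
    assert (Y = 5) and (X = 3)
      report "X changed too early" severity error;
    wait for 0 ns;    -- second delta: X is updated, W is not yet
    assert (X = 6) and (W = 2)
      report "W changed too early" severity error;
    wait for 0 ns;    -- third delta: W is updated
    assert W = 5
      report "W not updated" severity error;
    wait;             -- end of stimulus
  end process stim;
end sim;
```

All three checks happen at the same simulated time of 10 ns, which is why the intermediate states do not show up in a waveform viewer.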
Variables and Signals
- Variables are used only within a process
  - They usually hold intermediate results of a calculation (they are also difficult to visualize in simulation)
  - Variables are updated immediately (without a delta cycle delay)
  - They enable sequential execution within the process
- Signal assignments always take effect after a delta cycle delay
Behavioral model
  architecture behav of cont is
  begin
    p1: process(A, B, C1, C2)
      variable temp : std_logic := '0';  -- initial value!
    begin
      if (C1 = '1') and (C2 = '1') then
        temp := A;
      elsif (C2 = '1') then
        temp := B;
      else
        temp := B;
      end if;
      Out1 <= temp;
    end process p1;
  end;
Behavioral model
  architecture behav of ex1 is
  begin
    p1: process(clk)
      variable temp1, temp2 : std_logic;
    begin
      if (clk = '1') and clk'event then
        temp1 := A;
        temp2 := temp1;
        Out1 <= temp2;   -- Out1 receives A at this clock edge
      end if;
    end process p1;
  end;
Behavioral model
  architecture behav of ex2 is
  begin
    p1: process(clk)
      variable temp1, temp2 : std_logic;
    begin
      if (clk = '1') and clk'event then
        Out1 <= temp2;   -- Out1 receives the value temp2 kept from the previous clock edge
        temp1 := A;
        temp2 := temp1;
        -- order of operations matters!
      end if;
    end process p1;
  end;
VHDL Coding Styles
- Behavioural
  architecture behav of cont is
  begin
    p1: process(A, B, C1, C2)
    begin
      if (C1 = '1') and (C2 = '1') then
        Out1 <= A;
      else
        Out1 <= B;
      end if;
    end process p1;
  end;
- Structural
  -- INV, AND2 and OR2 stand for library cells (names assumed); output port comes first
  architecture netlist of cont is
    signal S, SN, D, E : std_logic;
  begin
    g0: AND2 port map (S, C1, C2);
    g1: INV  port map (SN, S);
    g2: AND2 port map (D, A, S);
    g3: AND2 port map (E, B, SN);
    g4: OR2  port map (Out1, D, E);
  end;
- Dataflow
  architecture dataflow of cont is
  begin
    Out1 <= A when C1 = '1' and C2 = '1' else B;
  end;
Sensitivity list in Combinational Logic
Behavioral model
  architecture behav of cont is
  begin
    p1: process(A, B, C2)   -- missing C1
    begin
      if (C1 = '1') and (C2 = '1') then
        Out1 <= A;
      else
        Out1 <= B;
      end if;
    end process p1;
  end;
- All signals that affect the result of a combinational process need to be in the sensitivity list
- Otherwise the simulation results will not be representative of the synthesized hardware
- For synchronous circuits only the clock (and an asynchronous set/reset) is required in the list
- Why is this so?
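The simulator re-evaluates a process only when a signal in its sensitivity list has an event, whereas synthesis tools generally derive the logic from the process body, so an incomplete list can make simulation and hardware disagree. If the tools support VHDL-2008, the keyword "all" lets the compiler build the list. A minimal sketch follows; the entity name cont_all is an assumption.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical entity, modelled on the "cont" example
entity cont_all is
  port (A, B, C1, C2 : in  std_logic;
        Out1         : out std_logic);
end cont_all;

architecture behav of cont_all is
begin
  -- VHDL-2008: the process is sensitive to every signal it reads
  p1: process(all)
  begin
    if (C1 = '1') and (C2 = '1') then
      Out1 <= A;
    else
      Out1 <= B;
    end if;
  end process p1;
end behav;
```

This form is meant for combinational processes; clocked processes keep the explicit clock (and asynchronous reset) list.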
Multiplexer: Using “case” Statement
  entity Mux4 is
    port (in1 : in  std_logic_vector(3 downto 0);
          s1  : in  std_logic_vector(1 downto 0);
          m   : out std_logic);
  end Mux4;
  architecture behav of Mux4 is
  begin
    process(s1, in1)
    begin
      case s1 is
        when "00"   => m <= in1(0);
        when "01"   => m <= in1(1);
        when "10"   => m <= in1(2);
        when others => m <= in1(3);  -- why this?
      end case;
    end process;
  end behav;
Figure: multiplexer with data input in1, select input s1 and output m.
Multiplexer: dataflow implementation
  entity Mux4 is
    port (in1 : in  std_logic_vector(3 downto 0);
          s1  : in  std_logic_vector(1 downto 0);
          m   : out std_logic);
  end Mux4;
  architecture dataflow of Mux4 is
  begin
    with s1 select
      m <= in1(0) when "00",
           in1(1) when "01",
           in1(2) when "10",
           in1(3) when others;
  end dataflow;
This implementation is safer for inexperienced designers: there are no problems with the sensitivity list or with incomplete definition of cases.
Figure: multiplexer with data input in1, select input s1 and output m.
Priority encoder
  entity enc is
    port (in1 : in  std_logic_vector(3 downto 0);
          s1  : in  std_logic_vector(1 downto 0);
          m   : out std_logic);
  end enc;
  architecture behav of enc is
  begin
    process(s1, in1)
    begin
      if s1 = "00" then
        m <= in1(0);
      elsif s1 = "01" then
        m <= in1(1);
      elsif s1 = "10" then
        m <= in1(2);
      else
        m <= in1(3);  -- why this?
      end if;
    end process;
  end behav;
Figure: a chain of three multiplexers. The first selects between i3 and i2 (S1 = "10"), the second inserts i1 (S1 = "01"), and the last inserts i0 (S1 = "00") and drives m.
What is the difference between a priority encoder and a mux? Which one has the shorter critical path?
Synthesizing arithmetic circuits
- Basic arithmetic operations are synthesizable: +, -, * and abs
  - Special multiplier architectures are not supported by default and need to be described explicitly
- The division operator works in simulation, but it is in general not synthesizable
  - The exception is division by 2^N. How could this be implemented?
- Special operations: "+1", "-1", unary "-"
- Relational operators: "=", "/=", "<", ">", "<=", ">="
- For arithmetic functions, one of the following packages can be used (but not both at the same time): std_logic_arith or numeric_std
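One possible answer to the division question is a right shift by N positions, which synthesizes to wiring only. The sketch below assumes the numeric_std package; the entity name div_pow2, the generic N and the 8-bit width are illustrative choices.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: divides an unsigned value by 2**N
entity div_pow2 is
  generic (N : natural := 2);
  port (a : in  unsigned(7 downto 0);
        q : out unsigned(7 downto 0));
end div_pow2;

architecture dataflow of div_pow2 is
begin
  -- shift_right on UNSIGNED fills with zeros from the left,
  -- which equals integer division by 2**N
  q <= shift_right(a, N);
end dataflow;
```

For SIGNED operands shift_right replicates the sign bit, so negative values are rounded towards minus infinity, while the "/" operator truncates towards zero.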
Ranges of signals/variables
- It is important to define the correct range of a signal or variable
  Example: observe the consequences of two different declarations
    signal i1 : integer range 0 to 15;  -- how many bits?
    signal i1 : integer;
- If the value of an operand is already known, a constant should be used
    x <= y + 3;  -- is less complex in synthesis than x <= y + z;
Signed and Unsigned Arithmetic
- We cannot calculate directly with the std_logic type
- It is not clear which kind of arithmetic needs to be used
- Therefore such signals need to be converted to the SIGNED or UNSIGNED type
- The corresponding arithmetic packages need to be used
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  use IEEE.STD_LOGIC_ARITH.all;
  entity SUB is
    port ( in1, in2 : in  SIGNED(3 downto 0);
           out1     : out SIGNED(3 downto 0) );
  end SUB;
  architecture Behav of SUB is
  begin
    out1 <= in1 - in2;  -- please observe the width of the operands and result
  end Behav;
Taking overflow into account
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  use IEEE.STD_LOGIC_ARITH.all;
  entity SUB_ex is
    port ( in1, in2 : in  SIGNED(3 downto 0);
           out1     : out SIGNED(4 downto 0) );
  end SUB_ex;
  architecture Behav of SUB_ex is
  begin
    -- sign-extend both operands by one bit; the parentheses are needed
    -- because "&" and "-" have the same precedence
    out1 <= (in1(3) & in1) - (in2(3) & in2);
  end Behav;
What should we do for unsigned arithmetic?
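For unsigned operands the extension bit is '0' instead of a copy of the MSB. The following sketch makes two assumptions that differ from the slide: it uses the numeric_std package rather than std_logic_arith, and it shows an adder, because the extra bit of an unsigned adder is simply the carry. The entity name ADD_ex_u is illustrative.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity: 4-bit unsigned adder with a 5-bit result
entity ADD_ex_u is
  port ( in1, in2 : in  unsigned(3 downto 0);
         out1     : out unsigned(4 downto 0) );
end ADD_ex_u;

architecture Behav of ADD_ex_u is
begin
  -- resize on UNSIGNED pads with zeros, so the sum cannot overflow:
  -- the maximum is 15 + 15 = 30, which fits in 5 bits
  out1 <= resize(in1, 5) + resize(in2, 5);
end Behav;
```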
Combining combinational and sequential logic
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  use IEEE.STD_LOGIC_ARITH.all;
  entity SUB_ex_clk is
    port ( clk, reset : in  std_logic;
           in1, in2   : in  SIGNED(3 downto 0);
           out1       : out SIGNED(4 downto 0) );
  end SUB_ex_clk;
  architecture Behav of SUB_ex_clk is
  begin
    px: process(clk, reset)  -- why those signals?
    begin
      if reset = '1' then  -- how should we name an active-low reset?
        out1 <= (others => '0');  -- what does this mean?
      elsif clk'event and clk = '1' then
        out1 <= (in1(3) & in1) - (in2(3) & in2);
      end if;
    end process px;
  end Behav;
Combining combinational and sequential logic – Alternative approach
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  use IEEE.STD_LOGIC_ARITH.all;
  entity SUB_ex_clk is
    port ( clk, reset : in  std_logic;
           in1, in2   : in  SIGNED(3 downto 0);
           out1       : out SIGNED(4 downto 0) );
  end SUB_ex_clk;
  architecture Behav of SUB_ex_clk is
    signal out_s : SIGNED(4 downto 0);
  begin
    out_s <= (in1(3) & in1) - (in2(3) & in2);  -- combinational part
    px: process(clk, reset)                    -- sequential part
    begin
      if reset = '1' then
        out1 <= (others => '0');
      elsif clk'event and clk = '1' then
        out1 <= out_s;  -- could we visualize such a circuit after synthesis?
      end if;
    end process px;
  end Behav;
Combining combinational and sequential logic – adding conditions
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  use IEEE.STD_LOGIC_ARITH.all;
  entity ACU_ex_clk is
    port ( clk, reset, cnt : in  std_logic;
           in1, in2        : in  SIGNED(3 downto 0);
           out1            : out SIGNED(4 downto 0) );
  end ACU_ex_clk;
  architecture Behav of ACU_ex_clk is
    signal out_s : SIGNED(4 downto 0);
  begin
    -- subtract when cnt = '1', otherwise add
    out_s <= (in1(3) & in1) - (in2(3) & in2) when cnt = '1' else
             (in1(3) & in1) + (in2(3) & in2);
    px: process(clk, reset)
    begin
      if reset = '1' then
        out1 <= (others => '0');
      elsif clk'event and clk = '1' then
        out1 <= out_s;
      end if;
    end process px;
  end Behav;
Combining combinational and sequential logic – adding conditions
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  use IEEE.STD_LOGIC_ARITH.all;
  entity ACU_ex_clk is
    port ( clk, reset, cnt : in  std_logic;
           in1, in2        : in  SIGNED(3 downto 0);
           out1            : out SIGNED(4 downto 0) );
  end ACU_ex_clk;
  architecture Behav of ACU_ex_clk is
  begin
    px: process(clk, reset)
    begin
      if reset = '1' then
        out1 <= (others => '0');
      elsif clk'event and clk = '1' then
        case cnt is
          when '1'    => out1 <= (in1(3) & in1) - (in2(3) & in2);
          when others => out1 <= (in1(3) & in1) + (in2(3) & in2);
        end case;
      end if;
    end process px;
  end Behav;
Resource Sharing
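  process (S1, S2, S3, cnt)
  begin
    if (cnt = '0') then
      Out1 <= S1 * S2;
    else
      Out1 <= S3 * S2;
    end if;
  end process;
Figure: two multipliers followed by a multiplexer (inputs S1, S2, S3, select cnt) are equivalent to a multiplexer followed by a single multiplier.
  process (S1, S2, S3, cnt, Out1_s)  -- Out1_s is read below, so it belongs in the list
  begin
    if (cnt = '0') then
      Out1_s <= S1;
    else
      Out1_s <= S3;
    end if;
    Out1 <= Out1_s * S2;
  end process;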
Latches and Flip-flops (reminder)
- Sequential elements are latches and flip-flops
  - Flip-flops are more frequently used in synchronous designs
- Latches
  process (EN, D, RSTN)  -- please check the sensitivity list
  begin
    if RSTN = '0' then
      Q <= '0';          -- active-low reset (reset value assumed)
    elsif (EN = '1') then
      Q <= D;
    end if;
  end process;
- Flip-flops
  process (CLK)
  begin
    if (CLK'event and CLK = '0') then  -- what does this mean?
      Q <= D;
    end if;
  end process;
Reset/Set as synchronous and asynchronous signals
- An asynchronous reset/set corresponds to flip-flop standard cells in which activation and deactivation of the reset are not related to clock activity
  - How can this be described in VHDL?
- Asynchronous signals can be critical in a synchronous system because their timing relation to the clock is unknown
  - Metastability issue
  - Synchronization of a reset coming from the external world
- A synchronous reset behaves like any other signal within the synchronous pipeline
  - It can be seen as a multiplexer in front of the flip-flop
  - More rarely used, often in specific applications (space)
  - How can this be described in VHDL?
Figure: flip-flop with asynchronous reset (D, CLK, RST, Q); synchronous reset drawn as a multiplexer that selects '0' or D in front of a D flip-flop (CLK, Out).
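The two reset styles can be sketched side by side as follows. The entity name dff_resets and its ports are assumptions; an active-high reset is used to match the other examples.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical entity with one flip-flop per reset style
entity dff_resets is
  port (clk, rst, d : in  std_logic;
        q_async     : out std_logic;
        q_sync      : out std_logic);
end dff_resets;

architecture rtl of dff_resets is
begin
  -- Asynchronous reset: rst is in the sensitivity list and is
  -- tested before the clock edge
  async_proc: process(clk, rst)
  begin
    if rst = '1' then
      q_async <= '0';
    elsif rising_edge(clk) then
      q_async <= d;
    end if;
  end process async_proc;

  -- Synchronous reset: rst is sampled only at the clock edge and
  -- acts as a multiplexer in front of the D input
  sync_proc: process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q_sync <= '0';
      else
        q_sync <= d;
      end if;
    end if;
  end process sync_proc;
end rtl;
```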
Try your examples
- Try with simple circuits:
- 8-bit counter with load and asynchronous reset
- Shift register (shift left, shift right, rotate)
- Tri-state buffer
- Bi-directional buffer
  - Tri-state and bi-directional buffers are not frequently used for on-chip communication
Figure: buffer with Input, Output and enable (En).
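As a starting point for the first exercise, here is a minimal sketch of an 8-bit counter with load and asynchronous reset. It assumes the numeric_std package; the entity name counter8 and the port names are illustrative.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical entity for the counter exercise
entity counter8 is
  port (clk, reset, load : in  std_logic;
        din              : in  unsigned(7 downto 0);
        count            : out unsigned(7 downto 0));
end counter8;

architecture rtl of counter8 is
  signal cnt : unsigned(7 downto 0);  -- internal copy, since an "out" port is not read back
begin
  cnt_proc: process(clk, reset)
  begin
    if reset = '1' then               -- asynchronous reset
      cnt <= (others => '0');
    elsif rising_edge(clk) then
      if load = '1' then
        cnt <= din;                   -- synchronous load
      else
        cnt <= cnt + 1;               -- wraps from 255 to 0
      end if;
    end if;
  end process cnt_proc;

  count <= cnt;
end rtl;
```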
Moore/Mealy FSMs
Figure: Moore format and Mealy format (source: http://www.rz.e-technik.fh-kiel.de/~dispert/digital/digital6/dig006_2.htm)
State machine synthesis issues
- Two types of FSM models
  - Mealy model: outputs = f(inputs, state)
  - Moore model: outputs = f(state)
- Present_state and next_state
  - Enumeration type for state encoding
- Two processes
  - combinational and sequential
- Use a "case" statement rather than "if-then-elsif" to avoid generating a priority encoder
- The next state is assigned in a synchronous template
Models of Synchronous Systems
- Moore and Mealy FSMs
Moore Model in VHDL
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  entity Moore is
    port ( Inp1, clk, reset : in  std_logic;
           Out1             : out std_logic );
  end Moore;
  architecture FSM of Moore is
    type state is (s1, s2, s3);
    signal present_state, next_state : state;
  begin
    process (Inp1, present_state)  -- combinational part
    begin
      case present_state is
        when s1 =>
          Out1 <= '0';
          if (Inp1 = '1') then next_state <= s3;
          else next_state <= s2;
          end if;
        when s2 =>
          Out1 <= '0';
          if (Inp1 = '1') then next_state <= s3;
          else next_state <= s1;
          end if;
        when s3 =>
          Out1 <= '1';
          next_state <= s1;
      end case;
    end process;
    process (clk, reset)  -- sequential part
    begin
      if reset = '1' then
        present_state <= s1;
      elsif clk = '1' and clk'event then
        -- store the result of the combinational logic in the register
        present_state <= next_state;
      end if;
    end process;
  end FSM;
Mealy Model in VHDL
  library IEEE;
  use IEEE.STD_LOGIC_1164.all;
  entity Mealy is
    port ( Inp1, clk, reset : in  std_logic;
           Out1             : out std_logic );
  end Mealy;
  architecture FSM of Mealy is
    type state is (s1, s2, s3);
    signal present_state, next_state : state;
  begin
    process (Inp1, present_state)  -- combinational part
    begin
      case present_state is
        when s1 =>
          if (Inp1 = '1') then next_state <= s3; Out1 <= '0';
          else next_state <= s2; Out1 <= '1';
          end if;
        when s2 =>
          if (Inp1 = '1') then next_state <= s3; Out1 <= '0';
          else next_state <= s1; Out1 <= '1';
          end if;
        when s3 =>
          Out1 <= '1';
          next_state <= s1;
      end case;
    end process;
    process (clk, reset)  -- sequential part
    begin
      if reset = '1' then
        present_state <= s1;
      elsif clk = '1' and clk'event then
        -- store the result of the combinational logic in the register
        present_state <= next_state;
      end if;
    end process;
  end FSM;
What is the difference?
Memory Synthesis
- Approaches:
  - Sequential logic using flip-flops or latches
    - easy to use, but inefficient with respect to area and power
    - D flip-flop: about 26 transistors
    - register files in datapaths
  - SRAM (static RAM): 6 transistors per cell
    - SRAM memory standard components: no configurability, hard macros
  - DRAM: only 1 transistor per cell, but needs refresh
  - ROM, PROM, embedded flash
  - Emerging memories: RRAM, MRAM, PCRAM
  - Memory compilers: the configuration and architecture can be chosen; much more efficient than flip-flop-based memory, but with a limited number of access ports
  - Single-port, dual-port and two-port memories
Memory Generator
Source: Xilinx
Involving memories in Code
- Use memory generators to generate memory models
- Memory instances should be included in the structural VHDL code
- Put memory wrappers (glue logic) in a separate instance
- For behavioural simulation, normally use VHDL memory models
- For back-annotation, use Verilog memory models
FIFO Implementation
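- FIFOs should be implemented as circular buffers
Figure: a RAM with write data (W_data), read data (R_data), write enable (we) and read enable (re); a write pointer (counter) supplies the write address (W_add) and a read pointer (counter) supplies the read address (R_add).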
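The circular-buffer structure can be sketched behaviourally as below. This is an illustration under assumptions, not the slide's implementation: the entity name fifo_sketch, the generics, the empty/full flags and the registered read are all choices made here, and in a real design the array would normally be replaced by a generated memory instance.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical single-clock FIFO built as a circular buffer
entity fifo_sketch is
  generic (ADDR_BITS : natural := 4;
           DATA_BITS : natural := 8);
  port (clk, reset  : in  std_logic;
        we, re      : in  std_logic;
        w_data      : in  std_logic_vector(DATA_BITS-1 downto 0);
        r_data      : out std_logic_vector(DATA_BITS-1 downto 0);
        empty, full : out std_logic);
end fifo_sketch;

architecture rtl of fifo_sketch is
  type ram_t is array (0 to 2**ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS-1 downto 0);
  signal ram : ram_t;
  -- Pointers carry one extra bit so that full and empty can be told apart
  signal w_add, r_add     : unsigned(ADDR_BITS downto 0);
  signal empty_i, full_i  : std_logic;
begin
  empty_i <= '1' when w_add = r_add else '0';
  full_i  <= '1' when (w_add(ADDR_BITS) /= r_add(ADDR_BITS)) and
                      (w_add(ADDR_BITS-1 downto 0) = r_add(ADDR_BITS-1 downto 0))
             else '0';
  empty <= empty_i;
  full  <= full_i;

  -- Write and read pointers are counters that wrap around
  ptr_proc: process(clk, reset)
  begin
    if reset = '1' then
      w_add <= (others => '0');
      r_add <= (others => '0');
    elsif rising_edge(clk) then
      if we = '1' and full_i = '0' then
        w_add <= w_add + 1;
      end if;
      if re = '1' and empty_i = '0' then
        r_add <= r_add + 1;
      end if;
    end if;
  end process ptr_proc;

  -- Memory access without reset; read data appears one cycle after re
  ram_proc: process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' and full_i = '0' then
        ram(to_integer(w_add(ADDR_BITS-1 downto 0))) <= w_data;
      end if;
      if re = '1' and empty_i = '0' then
        r_data <= ram(to_integer(r_add(ADDR_BITS-1 downto 0)));
      end if;
    end if;
  end process ram_proc;
end rtl;
```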
Combinational and sequential logic – Coding Guidelines
- Avoid instances that contain only combinational logic
  - The output signals should be registered
- For a purely combinational path use a non-process (dataflow) description style
- For sequential parts always use the flip-flop template
  - For most applications it is best to use an asynchronous reset consistently
Design Organization & Partitioning
- Don't mix structural and behavioural code
  - Avoid glue logic in structural designs; if necessary, put the glue logic in a separate design entity
- Use comments to describe important issues related to the code functionality
- Give each entity a header with corresponding comments
- For large designs, distribute the files over separate folders; each folder should contain the data related to one larger structural unit of the design
- Avoid using generics at the top level; use packages with constant definitions instead
- Make clock dividers and reset synchronisers separate entities and instantiate them at the top level
Naming conventions
- The design (file) name and the entity name should be the same
  - example (VHDL): design.vhd contains "entity design is"
  - example (Verilog): design.v contains "module design"
- Port, signal, process and instance names should be meaningful
  - clk, data, reset, ack, cs, wr, rd, test_si, etc.
- Don't mix lower and upper case (even though VHDL is not case sensitive)
- Signals and variables that are active low shall be clearly marked with the suffix _n
- Every process shall have a name, formed with the suffix _proc
- Architecture names shall be formed with the suffix _arch
Architectural Decisions
- When a system is coded at RTL level, the designer has to keep the system architecture in mind
- Memory insertion must be considered
- Area trade-off
- Performance trade-off
- Power trade-off
Reset Issue
- Ensure that all registers in the design are resettable
  - Non-resettable registers are not testable and their behaviour is hard to debug
- If the design requirements call for non-resettable cells, make sure to provide a proper initialisation procedure in the simulation
- Asynchronous resets are most commonly used
  - Don't mix synchronous and asynchronous resets
- Use reset synchronizers
Clocking strategy
- Reasonable clock frequencies (rule of thumb):
  - for 0.25 um, up to 100 MHz
  - for 0.13 um, up to 166 MHz
- If possible, avoid multiple clock domains
- Clock control circuits (gates, dividers, multiplexers) should be grouped in a single entity at the top level of the design
- For high-performance designs or complex clocking (divided clock domains), use a PLL (DLL) in the design
- Take care with delta cycles!
Clock issues and clock-gating
- Don't mix rising-edge and falling-edge flip-flops
- For clock gating, use glitch-free clock gates or special standard cells (if they are available)
Synchronization
- If you have to transfer data between unrelated clock domains, use synchronizers; otherwise you will have problems with metastability
- Two-flop or single-flop synchronizers for a single bit
- For bus synchronization do not synchronize each bit individually; introduce an "enable" signal and then synchronize this enable signal
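A two-flop synchronizer for a single bit can be sketched as follows. The entity name sync2 and the signal names are assumptions; the first flip-flop may go metastable, and the second gives it one clock period to settle.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical two-flop synchronizer in the destination clock domain
entity sync2 is
  port (clk_dst, reset : in  std_logic;
        d_async        : in  std_logic;   -- comes from an unrelated clock domain
        d_sync         : out std_logic);
end sync2;

architecture rtl of sync2 is
  signal meta, stable : std_logic;
begin
  sync_proc: process(clk_dst, reset)
  begin
    if reset = '1' then
      meta   <= '0';
      stable <= '0';
    elsif rising_edge(clk_dst) then
      meta   <= d_async;   -- first stage, may be metastable
      stable <= meta;      -- second stage, used by the rest of the design
    end if;
  end process sync_proc;

  d_sync <= stable;
end rtl;
```

No logic should be placed between the two stages, and only the second-stage output should fan out.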
Making design testable
- Provide a test_mode signal
  - In this mode all clocks must come directly from the pads without any gating
  - In this mode all reset signals must come directly from the pads without any gating or registering
- Commonly used DFT strategies
  - Structural test (scan test)
  - Memory BIST
  - Logic BIST
- Advanced DFT rules cover memory insertion, combinational loops, reset definition and complex clocking
Standard and Scan Flip-Flop
Test and Verification
- Writing a good testbench is as important as making a good design
- The input data can be read from a file (textio package)
  - The output data should be compared with a golden model (coming from C, MATLAB, etc.)
- The tests should be as exhaustive as possible
  - Code coverage shall be reported to demonstrate the quality and thoroughness of the testbench
- Use assertions and avoid relying on the GUI
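A self-checking testbench in this spirit can be sketched as below. Everything in it is an assumption made for illustration: the small design under test sub5 (a 5-bit signed subtractor written with numeric_std), the testbench name, and the use of an integer expression as the golden model in place of file-based reference data.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical design under test: sign-extended subtractor
entity sub5 is
  port (in1, in2 : in  signed(3 downto 0);
        out1     : out signed(4 downto 0));
end sub5;

architecture dataflow of sub5 is
begin
  out1 <= resize(in1, 5) - resize(in2, 5);
end dataflow;

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity sub5_tb is
end sub5_tb;

architecture test of sub5_tb is
  signal in1, in2 : signed(3 downto 0) := (others => '0');
  signal out1     : signed(4 downto 0);
begin
  dut: entity work.sub5
    port map (in1 => in1, in2 => in2, out1 => out1);

  check_proc: process
  begin
    -- Exhaustive test: all 16 x 16 input combinations
    for a in -8 to 7 loop
      for b in -8 to 7 loop
        in1 <= to_signed(a, 4);
        in2 <= to_signed(b, 4);
        wait for 10 ns;
        -- Golden model: plain integer subtraction
        assert to_integer(out1) = a - b
          report "mismatch for " & integer'image(a) & " - " & integer'image(b)
          severity error;
      end loop;
    end loop;
    report "all input combinations checked";
    wait;  -- stop the stimulus
  end process check_proc;
end test;
```

Because the assertions report any mismatch in the simulator log, the result can be judged without inspecting waveforms.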
Conclusions
- Writing VHDL is not the same as writing C code
- The designer must understand the consequences of a particular coding style
- The designer should write the code so that it fully defines the resulting hardware after synthesis
- Some guidelines should be followed to achieve efficient code generation