IIT Bombay CDEEP Autumn 2009 Introduction to IMAGE Simulation

SLIDE 1

IIT Bombay

EE705/707 Lecture No. 25 Prof. M.Shojaei Baghini

Introduction to IMAGE Simulation flow

CDEEP Autumn 2009

Presented by: Anil, Powai Labs Tech. Pvt. Ltd.

SLIDE 2

Hardware accelerated design simulation process

  • Designs are evolving rapidly, doubling in size with each generation and heading toward tens of millions of gates.

  • This causes a dramatic increase in simulation run time: simulations that once took minutes or hours now take days or weeks.

  • It is therefore difficult to verify ASIC and system-on-chip (SoC) designs through software-only simulation.

  • Simulation assisted by special hardware is the best solution for speeding up the simulation of large design sections that have already been tested and accepted in RTL simulation.

SLIDE 3

Why is hardware simulation faster?

  • Assume that in the original RTL simulation the testbench accounts for 10% of the simulation time. So for every 100 seconds of simulation time, 10 seconds are spent executing the testbench and 90 seconds simulating the design itself.

  • If the design portion is moved from the software simulator into hardware, that 90% of the run can be accelerated, because the hardware evaluates the design concurrently. The 10% spent in the testbench is not accelerated.
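Because the testbench share is not accelerated, the overall gain is bounded by Amdahl's law. A minimal Python sketch, assuming the testbench portion runs at unchanged speed and the hardware-mapped design portion runs `hw_speedup` times faster (the factors tried below are illustrative, not measured IMAGE figures):

```python
def overall_speedup(testbench_fraction: float, hw_speedup: float) -> float:
    """Amdahl's law: only the design fraction (1 - testbench_fraction) is accelerated."""
    design_fraction = 1.0 - testbench_fraction
    new_time = testbench_fraction + design_fraction / hw_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # 100 s run: 10 s testbench, 90 s design (the split used on this slide).
    for factor in (10, 100, 1000):
        print(f"design {factor:>4}x faster -> overall {overall_speedup(0.10, factor):.2f}x")
    # As hw_speedup grows without bound, the overall speedup approaches 1 / 0.10 = 10x.
```

With a 10% testbench share, even a design portion that runs 1000x faster yields less than a 10x overall speedup, so the testbench quickly becomes the bottleneck.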

SLIDE 4

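Hardware accelerated design simulation process

[Figure: software simulation vs. hardware simulation of a testbench/design top containing modules A, B and C; in hardware simulation, groups of modules (e.g. A & B, or A & B & C) are moved from the simulator into hardware while the testbench stays in the simulator.]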

SLIDE 5

Hardware accelerated design simulation process

  • Typically, hardware-accelerated design simulation is carried out using an FPGA board.

[Figure: HDL simulator connected to an FPGA prototyping board]

SLIDE 6

IMAGE: First look

  • IMAGE is an integrated system of proprietary software tools, customized FPGA-based hardware and distributed synthesis servers.

  • The IMAGE system takes a specified design description and maps it to a hardware system consisting of multiple FPGAs and memory.

SLIDE 7

Mapping Design to IMAGE Hardware

[Figure: DUT mapped to IMAGE hardware: an IMAGE board with four FPGAs and two memories, connected to the host over PCI]

SLIDE 8

What is IMAGE?

  • IMAGE maps user-defined RTL onto pre-designed reconfigurable hardware.

  • IMAGE can be used to select a section of the simulation RTL, map it to FPGA hardware, and run the simulation in co-simulation mode, with part of the simulation running on a host and part of it running on IMAGE FPGA hardware.

SLIDE 9

Co-Simulation with IMAGE

[Figure: software simulation vs. IMAGE co-simulation. In software simulation, the testbench, design top and instances 1 and 2 all run in the simulator; in IMAGE co-simulation, instance 2 (the DUT) runs on IMAGE hardware while the rest stays in the simulator.]

SLIDE 10

Components in the IMAGE system

  • IMAGE mapping flow: a set of mapping tools that accepts an RTL description, analyzes it, partitions it, and maps it onto a set of hardware boards.

– This process is incremental in nature: if you make a small change in the RTL source, the turnaround time of the entire flow is correspondingly small.

  • IMAGE hardware: a set of hardware boards, each of which contains several FPGAs and a large amount of memory.

– The mapping tool flow partitions the RTL across the available boards.

  • IMAGE synthesis flow: a set of management tools that coordinate the synthesis and compute servers needed to complete the mapping process.

SLIDE 11

Components ..

The current capacity of an IMAGE system (using up to 12 cards) is 24 million ASIC gates with 200 MB of memory.

[Figure: IMAGE system components: an IMAGE server running the IMAGE mapping flow and IMAGE synthesis flow, connected to the IMAGE hardware over PCI and, over the LAN, to the synthesis servers and a work node running the IMAGE flow]

SLIDE 12

Features of the IMAGE system

Mixed language:

Supports mixed VHDL/Verilog design descriptions with various signal coding schemes. Provides full type visibility from hardware, including the values '0', '1', 'X' and 'Z'.

Multi-Clock:

The IMAGE system allows the source RTL to have an arbitrary number of clocks, clock-gating logic and asynchronous descriptions.

SLIDE 13

Features ..

Controllability (ForceIT) and Observability (HookIN) Features:

IMAGE permits the user to specify internal control and observation points which are then accessible from the host application.

[Figure: example circuit with blocks A, B, C and D, clock Clk, reset Rst and an internal signal SIG]

SLIDE 14

Features ..

Memory-Mapping (Mirage):

IMAGE provides memory entities/modules which, when used in the user RTL, enable the IMAGE tools to map these instances to on-board memory resources. The memory-mapping feature can model memories with an arbitrary number of ports, as well as ROMs. This saves FPGA resources and increases the effective capacity of the hardware.

SLIDE 15

Features ..

Black boxes and unsynthesizable entities:

A black box in IMAGE is a piece of already-synthesized RTL that needs to be "dropped in" to the hardware. An unsynthesizable entity/module is a design unit that the user marks as one that cannot be synthesized; IMAGE pulls this unit out of the hardware and places it on the host side for the co-simulation process.

SLIDE 16

Features ..

API and Simulator Interfaces:

IMAGE comes with a full-featured API with which users can build applications that work together with the hardware. IMAGE uses the Verilog PLI/DPI interface to link to a Verilog or mixed-language simulator, and the VHPI interface to link to a VHDL-only simulator. The DPI interface offers the highest performance.

SLIDE 17

Incremental flow and Design Re-use

  • The IMAGE mapping flow is incremental in nature. Small changes in the original RTL source do not require the entire process to be rerun: only the affected part is rerun.

  • This can lead to a turnaround time of a few minutes.

  • In particular, a design unit that is already mapped to IMAGE hardware may be reused in another simulation without repeating the mapping process.

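The slides do not say how IMAGE detects which parts of the RTL changed. A common way to get this behaviour is content hashing: store a digest of each design unit from the previous run and remap only the units whose digest differs. The sketch below assumes that scheme; `remap_needed` and the in-memory `cache` (a hypothetical stand-in for a persistent run database such as ImageDb) are illustrative only.

```python
import hashlib
from typing import Dict, List


def digest(source_text: str) -> str:
    """Content hash of one design unit's RTL source."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def remap_needed(units: Dict[str, str], cache: Dict[str, str]) -> List[str]:
    """Return the names of units whose source changed since the last run.

    `units` maps unit name -> RTL text; `cache` maps unit name -> digest from
    the previous run and is updated in place (hypothetical run database).
    """
    changed = []
    for name, text in sorted(units.items()):
        h = digest(text)
        if cache.get(name) != h:
            changed.append(name)
            cache[name] = h
    return changed


if __name__ == "__main__":
    db: Dict[str, str] = {}
    design = {"mux": "entity mux is ...", "counter": "entity counter is ..."}
    print(remap_needed(design, db))   # ['counter', 'mux']  (first run: every unit)
    design["mux"] = "entity mux is ... -- small fix"
    print(remap_needed(design, db))   # ['mux']  (only the edited unit)
```

Units that are unchanged keep their previous results, which is what makes a small RTL edit cheap to re-run.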
SLIDE 18

Features ..

Distributed synthesis server setup:

In the IMAGE system, a set of computers can be configured to act as servers in the IMAGE flow. These compute servers speed up the mapping process by running synthesis jobs in parallel. IMAGE allows the synthesis servers to be monitored and controlled.
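As a rough illustration of this parallelization, the sketch below assigns independent synthesis jobs to servers round-robin and runs them concurrently with Python's `concurrent.futures`. The server names and the `synthesize` function are hypothetical placeholders; a real setup would submit each job to a remote machine rather than to a local thread.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Dict, List, Tuple


def synthesize(server: str, unit: str) -> Tuple[str, str]:
    """Hypothetical placeholder for running one synthesis job on `server`."""
    return unit, f"netlist({unit})@{server}"


def distribute(units: List[str], servers: List[str]) -> Dict[str, str]:
    """Assign units to servers round-robin and run the jobs concurrently."""
    assignments = list(zip(units, cycle(servers)))
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(synthesize, server, unit) for unit, server in assignments]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    print(distribute(["alu", "regfile", "ctrl"], ["synth1", "synth2"]))
```

Round-robin is the simplest assignment policy; a scheduler that tracks server load would balance jobs of uneven size better.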

SLIDE 19

Feature: Distributed Server Setup

SLIDE 20

Features ..

IMAGE Flow Debugging:

To track down the source of a possible mismatch between your software simulation and the IMAGE-accelerated simulation, the IMAGE installation includes utilities that let you stop the IMAGE mapping flow at different points and simulate the transformed RTL.
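When comparing a software run with an IMAGE-accelerated run, the earliest divergence is usually the most informative. A minimal sketch, assuming both runs have already been reduced to lists of `(time, signal, value)` change records (for example, extracted from the two VCD files); VCD parsing is not shown, and `first_mismatch` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[int, str, str]  # (time, signal name, new value)


def first_mismatch(ref: List[Change], dut: List[Change]) -> Optional[Tuple[int, str, str, str]]:
    """Return (time, signal, ref_value, dut_value) at the earliest divergence, else None."""
    def index(trace: List[Change]) -> Dict[int, Dict[str, str]]:
        by_time: Dict[int, Dict[str, str]] = {}
        for t, sig, val in trace:
            by_time.setdefault(t, {})[sig] = val  # last change at a timestamp wins
        return by_time

    a, b = index(ref), index(dut)
    for t in sorted(set(a) | set(b)):
        ca, cb = a.get(t, {}), b.get(t, {})
        for sig in sorted(set(ca) | set(cb)):
            va, vb = ca.get(sig, "<no change>"), cb.get(sig, "<no change>")
            if va != vb:
                return t, sig, va, vb
    return None


if __name__ == "__main__":
    software = [(0, "y", "0"), (10, "y", "1"), (20, "y", "0")]
    hardware = [(0, "y", "0"), (10, "y", "1"), (20, "y", "1")]
    print(first_mismatch(software, hardware))
```

Knowing the first time and signal where the runs differ tells you where to break the mapping flow and which transformed RTL to simulate.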

SLIDE 21

Examples: IMAGE Flow

  • Simulating a design using the IMAGE flow is seamless.

  • IMAGE can use an existing simulation setup based on standard simulators (e.g. ModelSim, VCS, GHDL, Icarus Verilog).

SLIDE 22

Example 1: Mux ..

  • Consider an example to illustrate the IMAGE flow.

  • The top entity/module from which analysis begins is mux_tb, and the instance to be mapped to IMAGE hardware is called the DUT.

  • The DUT is a mux instance in VHDL, and we want to simulate the mux description using the mux_tb entity.

SLIDE 23

Example 1: Mux ..

  • The instance hierarchy of the design is as follows:

    top: mux_tb
        DUT: u1

SLIDE 24

Example 1: Mux ..

  • To map the specified design to IMAGE FPGA hardware, you need to supply the following information to IMAGE:

  • the language in which the top entity is described (in this case, VHDL);
  • the source of the RTL files that constitute mux_tb and the mux;
  • the identity of the top entity (in this case, mux_tb);
  • the identity of the instance to be mapped to IMAGE hardware, relative to the top (mux_tb) (in this case, u1);
  • the synthesis tool to be used during the mapping process (Xilinx).

SLIDE 25

Example 1: Mux ..

  • the simulation tool to be used during the final co-simulation (currently ModelSim, VCS, iverilog and GHDL are supported);
  • the simulation mode (batch_server, batch or gui). If batch_server or batch mode is selected, the simulation time limit and resolution must also be specified;
  • the simulator interface mode ("pli", "dpi", "vpi" or "vhpi"). If your simulator supports the SystemVerilog 3.1 DPI standard, use the "dpi" option for better performance.

  • Using all of the above information, a single IMAGE flow command is built that seamlessly performs the co-simulation.

SLIDE 26

Example 1: Mux ..

  • The following command is used to run the example:

    Image.py -d vhdl:mux_tb:u1 -s source \
        --sim_tool ghdl --synth_tool xilinx \
        --sim_mode batch_server -t 1 --ts us \
        --vcd_gen sig_add_wave.txt \
        --vcd_file mux_tb_dut --sim_link vhpi
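For scripted runs, the same command line can be assembled from the settings listed for this example. This Python sketch uses only the options that appear in the example command; `build_image_cmd` is a hypothetical helper, and its checks (allowed modes and links, -t/--ts required outside gui mode) mirror the rules stated on the slides rather than any documented IMAGE API.

```python
import shlex
from typing import List, Optional

SIM_MODES = {"batch_server", "batch", "gui"}
SIM_LINKS = {"pli", "vpi", "dpi", "vhpi"}


def build_image_cmd(dut: str, source: str, sim_tool: str, synth_tool: str,
                    sim_mode: str, sim_link: str,
                    t: Optional[int] = None, ts: Optional[str] = None,
                    vcd_gen: Optional[str] = None,
                    vcd_file: Optional[str] = None) -> List[str]:
    """Assemble an Image.py argument list from the example's options."""
    if sim_mode not in SIM_MODES:
        raise ValueError(f"unknown sim_mode {sim_mode!r}")
    if sim_link not in SIM_LINKS:
        raise ValueError(f"unknown sim_link {sim_link!r}")
    if sim_mode != "gui" and (t is None or ts is None):
        raise ValueError("batch modes need a time limit (-t) and resolution (--ts)")
    cmd = ["Image.py", "-d", dut, "-s", source,
           "--sim_tool", sim_tool, "--synth_tool", synth_tool,
           "--sim_mode", sim_mode]
    if t is not None and ts is not None:
        cmd += ["-t", str(t), "--ts", ts]
    if vcd_gen:
        cmd += ["--vcd_gen", vcd_gen]
    if vcd_file:
        cmd += ["--vcd_file", vcd_file]
    cmd += ["--sim_link", sim_link]
    return cmd


if __name__ == "__main__":
    args = build_image_cmd("vhdl:mux_tb:u1", "source", "ghdl", "xilinx",
                           "batch_server", "vhpi", t=1, ts="us",
                           vcd_gen="sig_add_wave.txt", vcd_file="mux_tb_dut")
    print(shlex.join(args))  # shell-safe command string (Python 3.8+)
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run` without shell quoting issues.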

SLIDE 27

The explanation of the arguments

  • -d vhdl:mux_tb:u1: a description of the DUT to be analyzed (more than one DUT can be specified). The fields in the DUT descriptor are:

    (a) vhdl: the top entity/module is described in VHDL.
    (b) mux_tb: the top entity name is "mux_tb".
    (c) u1: the instance to be mapped to hardware is the instance named u1 inside the mux_tb instance ('/' is used as the hierarchy separator).

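The three-field descriptor can be split mechanically. A minimal sketch with a hypothetical `parse_dut` helper: it splits on ':' and then splits the instance field on '/', the hierarchy separator named above. Only "vhdl" appears in the example, so the accepted language names are not checked here.

```python
from typing import List, NamedTuple


class DutSpec(NamedTuple):
    language: str             # e.g. "vhdl" (lower-cased)
    top: str                  # top entity/module name
    instance_path: List[str]  # hierarchy below the top, split on '/'


def parse_dut(descriptor: str) -> DutSpec:
    """Split a -d descriptor such as 'vhdl:mux_tb:u1' into its three fields."""
    parts = descriptor.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected language:top:instance, got {descriptor!r}")
    language, top, instance = parts
    return DutSpec(language.lower(), top, instance.split("/"))


if __name__ == "__main__":
    print(parse_dut("vhdl:mux_tb:u1"))
    print(parse_dut("vhdl:mux_tb:u1/u_core"))  # hypothetical nested instance
```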
SLIDE 28

The explanation of the arguments

  • -s source: the directories containing the RTL files are listed in the file "source".

  • --sim_tool ghdl: the simulation tool is GHDL. VCS, ModelSim and iverilog (Icarus Verilog) are the other supported simulators. To use VCS, supply vcs as the --sim_tool argument.

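The slides only say that the `source` file lists directories. Assuming one directory per line (blank lines and `#` comments ignored, relative paths taken from the list file's folder) and assuming RTL files are recognised by the `.vhd`, `.vhdl` and `.v` extensions, collecting the RTL files might look like this; `collect_rtl` is a hypothetical helper, not part of IMAGE.

```python
from pathlib import Path
from typing import List

RTL_SUFFIXES = {".vhd", ".vhdl", ".v"}  # assumed RTL file extensions


def collect_rtl(source_list: str) -> List[Path]:
    """Read a directory-list file and return the RTL files found in those directories."""
    base = Path(source_list).resolve().parent
    files: List[Path] = []
    for line in Path(source_list).read_text().splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments (assumed format)
        directory = Path(entry)
        if not directory.is_absolute():
            directory = base / directory  # relative to the list file (assumption)
        files.extend(sorted(p for p in directory.iterdir()
                            if p.is_file() and p.suffix.lower() in RTL_SUFFIXES))
    return files
```

A missing directory raises `FileNotFoundError`, which surfaces typos in the list file early instead of silently dropping RTL.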
SLIDE 29

The explanation of the arguments

  • --synth_tool xilinx: the synthesis tool is Xilinx. Synthesis jobs are distributed among remote synthesis servers, which should have Xilinx ISE 8.2 or higher installed.

  • -t 1: simulate for 1 time unit (the time unit is specified with the --ts option).

  • --ts us: the simulation resolution (time unit) is microseconds (us).

SLIDE 30

The explanation of the arguments

  • --sim_mode batch_server: the simulation is run in batch server mode. In this mode, the simulation runs on a remote IMAGE server, which returns the results to the user's machine in the form of a simulation log and .vcd files. The other available options are batch and gui.

  • The batch option is similar to batch_server, except that the simulation runs on the local machine (which is assumed to be an IMAGE simulation server).

  • In gui mode, the simulation also runs locally (the local machine must be an IMAGE simulation server). The specified simulator's GUI is invoked, and the user runs the simulation by loading the simulation script file and specifying the simulation duration.

SLIDE 31

The explanation of the arguments

  • --vcd_gen sig_add_wave.txt: the waveforms of the signals listed in sig_add_wave.txt are written to a VCD file (whose name is set with the --vcd_file option). Signals are specified by hierarchical path; for example, a line in sig_add_wave.txt of the form

      mux_tb/u1/*

    requests a dump of all signals in the instance "mux_tb/u1".

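Only the `mux_tb/u1/*` form is shown on the slide. To preview which signals a list would select, the sketch below assumes two rules: a line ending in `/*` selects every signal directly inside that instance, and any other non-blank line names one signal by its full hierarchical path. These rules, the helper `select_signals`, and the signal names in the example are assumptions; IMAGE's own matching (for example, whether `*` also reaches deeper levels) may differ.

```python
from typing import Iterable, List


def select_signals(pattern_lines: Iterable[str], signals: Iterable[str]) -> List[str]:
    """Return the signals selected by sig_add_wave.txt-style lines (assumed rules)."""
    patterns = [line.strip() for line in pattern_lines if line.strip()]
    chosen = []
    for sig in signals:
        parent = sig.rsplit("/", 1)[0] if "/" in sig else ""
        for pat in patterns:
            # 'inst/path/*' selects signals whose parent instance is exactly inst/path.
            if pat.endswith("/*") and parent == pat[:-2]:
                chosen.append(sig)
                break
            # Any other line must name the full hierarchical signal path.
            if sig == pat:
                chosen.append(sig)
                break
    return chosen


if __name__ == "__main__":
    sigs = ["mux_tb/clk", "mux_tb/u1/a", "mux_tb/u1/b", "mux_tb/u1/sel", "mux_tb/u1/y"]
    print(select_signals(["mux_tb/u1/*"], sigs))
```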
SLIDE 32

The explanation of the arguments

  • --vcd_file testbench_DUT: the waveforms of the dumped signals are saved in "testbench_DUT.vcd". The default dump file name is IMAGE.vcd.

  • --sim_link pli: the simulator interface mode (PLI/VPI/DPI/VHPI). The IMAGE system uses one of several standard simulator interfaces to link the user simulation with IMAGE hardware:

  • PLI 2.0, VPI (the upgraded PLI 2.0) or DPI (SystemVerilog 3.1 standard) to link to a Verilog or mixed-language simulator, and VHPI to link to a VHDL-only simulator.

SLIDE 33

IMAGE Simulation

  • When you run the Image.py command, the following sequence of actions takes place:

  • Analysis: the specified RTL files are parsed and the instance hierarchy is elaborated, the DUT is separated from the testbench, and a network representation of the DUT instance specified on the command line is created. The network representation is a graph of instances of VHDL entities/Verilog modules corresponding to the unique processes/always blocks or continuous assignments encountered inside the DUT.

SLIDE 34

IMAGE Simulation

  • Library synthesis: the unique process-level entities/modules are synthesized in parallel using the selected synthesis tool. Exact size estimates are derived from the synthesis results.

  • Partitioning: a partitioner splits the design across the appropriate number of FPGAs, based on the size information from the library synthesis step.

  • Hex file generation: the partitioned networks are synthesized and mapped to individual FPGAs, and the FPGA programming files (hex files) are generated.

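The slides do not describe the partitioning algorithm IMAGE uses. As a simple illustration of splitting size-annotated units across FPGAs, the sketch below applies first-fit-decreasing bin packing, a swapped-in textbook heuristic. It ignores the connections between units, which a real partitioner must also minimise, and the unit names, sizes and per-FPGA capacity are made up.

```python
from typing import Dict, List


def first_fit_decreasing(sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Pack units (name -> estimated size) into FPGAs of equal `capacity`."""
    fpgas: List[List[str]] = []  # unit names placed on each FPGA
    free: List[int] = []         # remaining capacity of each FPGA
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > capacity:
            raise ValueError(f"{name} ({size}) does not fit in one FPGA ({capacity})")
        for i, room in enumerate(free):
            if size <= room:     # first FPGA with enough room
                fpgas[i].append(name)
                free[i] -= size
                break
        else:                    # no existing FPGA fits: open a new one
            fpgas.append([name])
            free.append(capacity - size)
    return fpgas


if __name__ == "__main__":
    units = {"alu": 60, "regfile": 50, "ctrl": 30, "fifo": 20, "uart": 10}
    print(first_fit_decreasing(units, capacity=100))
```

Placing the largest units first tends to leave small gaps that the small units can fill, which is why the sizes from library synthesis matter before partitioning.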
SLIDE 35

IMAGE Simulation

  • Simulation: the testbench is loaded into the simulator and the hex files are loaded onto the board. In batch mode, the simulation runs for the specified time. In gui mode, the user must explicitly run the simulation for the desired time in the simulator. All the information needed to run this simulation is generated by the Image.py script.

  • After the Image.py command runs, two subdirectories named PL_Work and ImageDb are created in your current working directory. PL_Work contains the results of your IMAGE run, and ImageDb holds the database from previous runs, used for incremental (re-use) runs.

SLIDE 36

Signal Coding in IMAGE

  • Since IMAGE maps the user RTL to hardware that uses only the bit type (0/1 values), it has to encode user types into bit combinations.

  • IMAGE offers a variety of encodings for the VHDL std_logic type and the Verilog wire type.

– This lets the user trade off accuracy against performance/capacity in the IMAGE system.

SLIDE 37

Std_ulogic and its derived types

  • The std_ulogic type is defined in the IEEE library, in package std_logic_1164.

  • This type is universally used and is 9-valued, so a strict encoding of it needs 4 bits. However, this is expensive and often unnecessary.

  • Usually, only the 0/1 values are used. In some situations, the Z and X values are also important for accuracy. The full set of 9 values is rarely needed.

SLIDE 38

Encoding schemes

  • IMAGE provides three possible encodings of std_ulogic. The 4-bit encoding is selected with the option -r 4. Note that in this encoding, Z is assigned the code 0000.

    Z --> 0000
    U --> 0001
    X --> 0010
    0 --> 0011
    1 --> 0100
    W --> 0101
    L --> 0110
    H --> 0111
    - --> 1000

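Nine values need at least ceil(log2 9) = 4 bits, which is why the strict encoding is 4 bits wide. The sketch below writes the -r 4 table as a Python lookup and checks that every value round-trips; it is only a software model of the table, not part of IMAGE.

```python
import math

# 4-bit std_ulogic encoding from the -r 4 table.
ENC4 = {"Z": 0b0000, "U": 0b0001, "X": 0b0010, "0": 0b0011, "1": 0b0100,
        "W": 0b0101, "L": 0b0110, "H": 0b0111, "-": 0b1000}
DEC4 = {code: value for value, code in ENC4.items()}

MIN_BITS = math.ceil(math.log2(len(ENC4)))  # 9 values -> 4 bits


def roundtrip(values: str) -> str:
    """Encode then decode a string of std_ulogic characters."""
    return "".join(DEC4[ENC4[v]] for v in values)


assert len(DEC4) == len(ENC4)                               # every value has a distinct code
assert all(code < 2 ** MIN_BITS for code in ENC4.values())  # every code fits in 4 bits
assert roundtrip("UX01ZWLH-") == "UX01ZWLH-"                # the encoding is lossless
```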
SLIDE 39

Encoding schemes

  • IMAGE provides a 2-bit encoding of std_ulogic, which is selected by passing the option -r 2 to the Image.py script in the IMAGE mapping flow. The 2-bit encoding of std_ulogic is:

    Z --> 00
    U --> 11
    X --> 11
    0 --> 01
    1 --> 10
    W --> 11
    L --> 01
    H --> 10
    - --> 11

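Grouping the -r 2 table by code shows which distinctions are given up: U, X, W and - all share 11, L shares 01 with 0, and H shares 10 with 1, while Z keeps its own code. A minimal sketch (a software model of the table only):

```python
from collections import defaultdict
from typing import Dict, List

# 2-bit std_ulogic encoding from the -r 2 table.
ENC2 = {"Z": "00", "U": "11", "X": "11", "0": "01", "1": "10",
        "W": "11", "L": "01", "H": "10", "-": "11"}


def collision_classes(encoding: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each code to the std_ulogic values that share it."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for value, code in encoding.items():
        classes[code].append(value)
    return dict(sorted(classes.items()))


for code, values in collision_classes(ENC2).items():
    print(code, values)
```

Any RTL that distinguishes, say, 'X' from 'U', or 'L' from '0', will therefore behave differently in hardware under -r 2 than in software simulation.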
SLIDE 40

Encoding schemes

  • If the -r 2 option is used, some information is lost in the simulation of the hardware-mapped instance compared with software simulation.

  • However, the 2-bit encoding uses considerably fewer FPGA resources (about 3X less) than the 4-bit encoding.

SLIDE 41

Encoding schemes

  • Finally, IMAGE provides a 1-bit encoding of std_ulogic, which can be selected by passing the -r 1 option to the Image.py script. This encoding of std_ulogic is:

    Z --> 0
    U --> 1
    X --> 1
    0 --> 0
    1 --> 1
    W --> 1
    L --> 0
    H --> 1
    - --> 1

SLIDE 42

Encoding schemes

  • If the 1-bit encoding is selected, hardware resource usage drops by a further 3X relative to the 2-bit encoding.

  • IMAGE also provides a supplementary -Z option to be used with -r 1. This option effectively treats std_ulogic as equivalent to the bit type.

  • The -r 1 -Z combination offers the most compact mapping of a circuit to FPGAs and should be used whenever possible (it is the default option).

  • It can be applied reliably only when accelerating purely synthesizable designs that do not rely on checking for the U, X, W or Z literals.
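The caveat about UXWZ literals can be seen in a small software model. Below, a hypothetical tri-state check (the RTL condition `bus = 'Z'`) is evaluated on the original value and on the value after the 1-bit encoding is read back as a plain bit. The table comes from the -r 1 slide; the read-back rule (code 0 as '0', code 1 as '1') and the helper names are assumptions used only for illustration.

```python
# 1-bit std_ulogic encoding from the -r 1 table.
ENC1 = {"Z": 0, "U": 1, "X": 1, "0": 0, "1": 1, "W": 1, "L": 0, "H": 1, "-": 1}


def bus_released(value: str) -> bool:
    """RTL-style condition: true when the driver has released the bus (value is 'Z')."""
    return value == "Z"


def after_1bit(value: str) -> str:
    """Value seen in hardware after 1-bit encoding, read back as a bit (assumed rule)."""
    return "1" if ENC1[value] else "0"


for v in "01ZXU":
    print(v, bus_released(v), bus_released(after_1bit(v)))
```

For '0' and '1' the two columns agree, but for 'Z' software reports True while the encoded value reports False: a design whose behaviour depends on such a check will diverge under -r 1 -Z.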

SLIDE 43

Clocks and Timing Assumptions

  • The IMAGE system can accelerate a user design with an arbitrary number of clocks.

  • When used with an event-driven simulator, IMAGE interacts with the simulator at every simulation cycle.

  • IMAGE responds to events from the simulator immediately and creates events in the simulator for the next simulation cycle. Thus, the DUT is replaced by a single concurrent procedure that is called every simulation delta, with its response available in the next simulation delta.
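The "called every delta, response in the next delta" contract can be modelled in a few lines. In this Python sketch the DUT is an ordinary function (a 2:1 mux, chosen only for illustration) evaluated once per delta on that delta's inputs; its outputs are buffered and become visible to the testbench only in the following delta. The loop structure is an assumed model of the coupling, not IMAGE's implementation.

```python
from typing import Callable, Dict, List

Signals = Dict[str, int]


def mux_dut(inp: Signals) -> Signals:
    """Stand-in for the hardware-mapped DUT: y = b if sel else a."""
    return {"y": inp["b"] if inp["sel"] else inp["a"]}


def run_deltas(dut: Callable[[Signals], Signals], stimulus: List[Signals]) -> List[Signals]:
    """Apply one stimulus per delta; the DUT's response appears one delta later."""
    pending: Signals = {}          # outputs computed in the previous delta
    visible: List[Signals] = []    # what the testbench sees at each delta
    for inputs in stimulus:
        visible.append(dict(pending))  # testbench sees last delta's response
        pending = dut(inputs)          # DUT evaluated once on this delta's inputs
    visible.append(dict(pending))      # final response, one delta after the last input
    return visible


if __name__ == "__main__":
    stim = [{"a": 0, "b": 1, "sel": 0}, {"a": 0, "b": 1, "sel": 1}]
    print(run_deltas(mux_dut, stim))
```

The one-delta lag is what lets the hardware be treated as a single concurrent process from the simulator's point of view.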

SLIDE 44

Classification of Clocks

  • Internally, IMAGE classifies a signal/wire/reg as clock-like if an edge condition on that signal/wire/reg is checked in some assignment in the user RTL.

  • Thus, a signal A in a VHDL description is marked clock-like if A'event appears legally somewhere in the VHDL description.

  • A wire/reg B in a Verilog description is marked clock-like if the clause @(posedge B) or @(negedge B) appears legally somewhere in the Verilog description.
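A rough text-level approximation of this rule is sketched below: it scans VHDL for `name'event` and Verilog for `posedge name` / `negedge name` and reports the names found. It uses regular expressions instead of a real HDL parser, so it would also match occurrences inside comments or strings and does not check that a clause appears "legally"; treat it as an illustration of the rule, not of IMAGE's analysis.

```python
import re
from typing import Set

VHDL_EVENT = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*)\s*'\s*event\b", re.IGNORECASE)
VERILOG_EDGE = re.compile(r"\b(?:posedge|negedge)\s+([A-Za-z_][A-Za-z0-9_$]*)")


def clock_like_vhdl(text: str) -> Set[str]:
    """Signals A for which A'event appears (VHDL identifiers are case-insensitive)."""
    return {name.lower() for name in VHDL_EVENT.findall(text)}


def clock_like_verilog(text: str) -> Set[str]:
    """Wires/regs B for which posedge B or negedge B appears."""
    return set(VERILOG_EDGE.findall(text))


if __name__ == "__main__":
    print(clock_like_vhdl("if clkb'event and clkb = '0' then dout <= dint; end if;"))
    print(clock_like_verilog("always @(posedge clk or negedge rst_n) q <= d;"))
```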

SLIDE 45

Design Classification

  • IMAGE classifies designs into two types:

– Pure designs are those in which a signal A, and all signals that depend on A in a level sense, act as clocks on state elements either exclusively in an edge-triggered sense or exclusively in a level-triggered sense.

– Impure designs are those that are not pure: in such a design, a signal A controls some state latches in a level-triggered sense and other state flip-flops in an edge-triggered sense.
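The definition can be checked mechanically once a design is reduced to two facts: which signals are derived from which through level-sensitive (combinational) logic, and whether each state element uses its clock edge-triggered or level-triggered. The sketch below is a hypothetical model of that check, not IMAGE's classifier; the example encodes a latch clocked by `clk` plus a flip-flop clocked by `clkb = not clk`.

```python
from typing import Dict, List, Set, Tuple


def level_fanout(signal: str, derives: Dict[str, List[str]]) -> Set[str]:
    """The signal plus everything computed from it through level-sensitive logic."""
    seen, stack = {signal}, [signal]
    while stack:
        for nxt in derives.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_pure(derives: Dict[str, List[str]], state_elements: List[Tuple[str, str]]) -> bool:
    """state_elements holds (clock signal, 'edge' | 'level') pairs.

    Pure if, for every signal, the state elements clocked by it or by its
    level-derived signals are all edge-triggered or all level-triggered.
    """
    roots = {clk for clk, _ in state_elements} | set(derives)
    for root in roots:
        group = level_fanout(root, derives)
        kinds = {kind for clk, kind in state_elements if clk in group}
        if len(kinds) > 1:
            return False
    return True


if __name__ == "__main__":
    derives = {"clk": ["clkb"]}                        # clkb <= not clk
    mixed = [("clk", "level"), ("clkb", "edge")]       # latch on clk, flip-flop on clkb
    print(is_pure(derives, mixed))                     # impure design
```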

SLIDE 46

Example of an impure design

    process(clk, din)  -- level-triggered latch with clock clk
    begin
      if clk = '1' then
        dint <= din after 1 ns;
      end if;
    end process;

    clkb <= not clk;   -- clkb is generated from clk using levels

    process(clkb)      -- edge-triggered flip-flop with clock clkb
    begin
      if clkb'event and clkb = '0' then
        dout <= dint after 1 ns;
      end if;
    end process;

SLIDE 47

Design Classification

  • In impure designs, since IMAGE does not recognize delay clauses in the source RTL, the user must give IMAGE some information about the timing assumptions on paths.

– For instance, suppose that two paths P1 and P2 start from a signal A and re-converge at a process or always block. The user may rely on delay information to ensure correct behavior of the RTL.

– Perhaps the delay of P1 (which may be part of clock-gating logic) is known to be much less than the delay of P2 (which may be part of a data path). Thus, the relative delay of the two paths is important.

SLIDE 48

REFERENCES

  1. IMAGE User Guide.

SLIDE 49

Thank you