Review Numbers Formats and Simple Arithmetic FPGA Structure (CLB, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Review

  • Numbers Formats and Simple Arithmetic
  • FPGA Structure (CLB, Routing, IO, Clocks)
  • Pipelining (Resource VS Speed VS Latency)
  • Memories and Waveform Generation
  • ADCs and DACs applications in DSP
  • Constraints (Timing and Placement)
  • More Complex Arithmetic (Series Expansion and the CORDIC algorithm for sin & cos)
  • DSP Resources (DSP48E2 Block)
  • Filtering: FIR and IIR Implementations
  • Serial multi-rate DSP (decimation and interpolation) and applications

slide-2
SLIDE 2

Looking Forward

  • Multi-Rate, Parallel DSP (1 week)
  • FFTs (2 weeks)
  • Digital Compensation (1 week)
  • PLLs
  • AGCs
  • Complete DSP Chains & SDRs (2 weeks)
  • Miscellaneous (1 week)
  • Pseudo-Random Noise Generators and CRC checks

  • PWM and PDM (audio systems)
slide-3
SLIDE 3

Parallel Processing

  • In some instances, the timing requirements cannot be met with a serial process even after a DSP function is fully pipelined.
  • Example:
  • In desktop computers, video processing requires the values of many pixels to be simultaneously computed within the refresh rate.
  • Since many of the operations are independent, GPUs are well suited to handle the computational load in a parallel fashion.

slide-4
SLIDE 4

Parallel Processing

  • FPGAs are well suited to handle parallel tasks.
  • We need to understand what can be computed independently, or how to modify the DSP algorithm to work in a parallel fashion.
  • Like pipelining, there is a trade-off between use of resources and achievable clock rates.

  • Common Applications
  • FFTs
  • Video Processing
  • GSPS ADCs and DACs
slide-5
SLIDE 5

Parallel Processing

  • Many FPGAs now have dedicated hardware components to facilitate the use of high-speed data converters that operate at rates exceeding what the FPGA fabric can clock.
  • Gigabit Transceivers
  • Serializers
  • Deserializers
  • RFSoC Integrated ADCs and DACs
  • Extreme care must be taken to understand clock rates and data formats.
  • ODDR processing of the DAC channels on the dev board.

slide-6
SLIDE 6

GigaBit Transceiver

slide-7
SLIDE 7

Zynq RFSoC

  • DATA_ADC0[127:0]: 8x 16-bit samples. Up to 16 converters, 128 values per clock cycle.
  • DATA_ADC0[255:0]: 16x 16-bit samples. Up to 16 converters, 256 values per clock cycle.
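To make the bus packing concrete, the following is a minimal Python sketch (a software model, not HDL) that splits one 128-bit DATA_ADC0 word into eight 16-bit samples. The function name `unpack_samples` is hypothetical, and two details are assumptions not stated on the slide: sample 0 is taken to sit in the least significant bits, and samples are treated as two's complement.

```python
def unpack_samples(word, n_samples=8, width=16):
    """Split a packed bus word into signed samples.

    Assumptions (not from the slide): sample 0 occupies the least
    significant bits, and each field is two's complement.
    """
    mask = (1 << width) - 1
    samples = []
    for i in range(n_samples):
        raw = (word >> (i * width)) & mask
        # Convert the unsigned field to a signed value.
        if raw & (1 << (width - 1)):
            raw -= 1 << width
        samples.append(raw)
    return samples


# Pack eight example samples into one 128-bit word, then recover them.
example = [0, 1, -1, 100, -100, 32767, -32768, 5]
word = 0
for i, s in enumerate(example):
    word |= (s & 0xFFFF) << (16 * i)
recovered = unpack_samples(word)
```

The same function covers the 256-bit case by passing `n_samples=16`.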

slide-8
SLIDE 8

Zynq RFSoC

I/Q Mixers, Decimation, Interpolation all implemented in dedicated hardware.

slide-9
SLIDE 9

Zynq RFSoC

slide-10
SLIDE 10

Serializer

  • Part of the IO Logic
  • Data_In D8 to D1
  • Data_Out OQ
  • Achieve output data rates that are up to 14x the fabric rate.

slide-11
SLIDE 11

Detailed View

  • 4-to-1
  • Signals
  • 2 clocks
  • Global Clock: Slower clock from FPGA fabric.
  • IO Clock: High-speed Input/Output clock.

  • SDR and DDR
  • IO Data
  • 4 input lines
  • 1 output line
  • Enables
  • Training Data
slide-12
SLIDE 12

Detailed View

  • Structure
  • Registers
  • Two Columns
  • Parallel Load
  • Global Clock
  • Shift Regs, Serialized Output
  • IO Clock
  • Muxes
  • Shift data from parallel load to Shift Registers.
  • Use Training Data.
  • Width Expansion

slide-13
SLIDE 13

Detailed View

  • Operation
  • Global Clock
  • Loads parallel data from D4 to D1.
  • Strobe
  • Not used on OSERDESE2.
  • Selects mux to shift parallel data into shift registers.
  • I/O Clock
  • When Strobe is high, loads shift registers with new data.
  • When Strobe is low, shifts out the serial data.
  • Train pin selects preset training data rather than D4 to D1.
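The load-then-shift behaviour described above can be sketched as a small Python model (a behavioural illustration only, not the primitive itself). The name `serialize_4to1` is hypothetical, and the D1-first output order is an assumption; the training-data path is left out.

```python
def serialize_4to1(words):
    """Behavioural model of a 4:1 SDR serializer.

    words: iterable of (d1, d2, d3, d4) tuples, one per global-clock cycle.
    Yields one bit per I/O-clock cycle. D1-first ordering is assumed.
    """
    for d1, d2, d3, d4 in words:
        # Strobe high: the muxes load the shift register in parallel.
        shift_reg = [d1, d2, d3, d4]
        # Strobe low: one bit is shifted out per I/O-clock cycle.
        while shift_reg:
            yield shift_reg.pop(0)


# Two global-clock cycles produce eight I/O-clock cycles of serial data.
bits = list(serialize_4to1([(1, 0, 0, 0), (1, 1, 0, 1)]))
```

Each global-clock word maps to four consecutive serial bits, which is why the I/O clock must run at four times the global clock in SDR mode.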

slide-14
SLIDES 14 to 24

Detailed View

[Animation frames: slides 14 to 24 repeat the text of slide 13 unchanged. The only difference between frames is a figure label that steps through D1, D2, D3, D4, D1, D2, D3, D4, D1 on slides 15 to 23.]

slide-25
SLIDE 25

Example

slide-26
SLIDE 26

Example: 2.5 GSPS DAC

  • Think about the clock and data requirements.
  • Device requires two deinterleaved DDR data paths.
  • Data Rate (per path): 2.5 GSPS/2 = 1.25 GSPS
  • Clock Freq (per path): 2.5 GHz/4 = 625 MHz
  • FPGA OSERDES IO Clock
  • Operating in DDR mode.
  • We will drive each data path with an 8:1 OSERDESE2.
  • Our choice based on the available FPGA (Host Processor in the figure)
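The rate arithmetic above can be checked with a short Python sketch. The function name `dac_interface_rates` and its parameter names are hypothetical; the numbers are the ones from this example (2.5 GSPS DAC, two DDR data paths, 8:1 serializers).

```python
def dac_interface_rates(dac_rate_sps, n_paths=2, serdes_ratio=8):
    """Derive per-path and fabric rates for a DDR-interfaced DAC."""
    path_rate = dac_rate_sps / n_paths        # samples/s on each data path
    io_clock = path_rate / 2                  # DDR: two samples per clock period
    global_clock = path_rate / serdes_ratio   # fabric clock feeding the SerDes
    samples_per_gclk = n_paths * serdes_ratio # samples the fabric must supply
    return {
        "path_rate_sps": path_rate,
        "io_clock_hz": io_clock,
        "global_clock_hz": global_clock,
        "samples_per_gclk": samples_per_gclk,
    }


rates = dac_interface_rates(2.5e9)
```

With these inputs the fabric has to produce 16 samples on every global-clock cycle, which is the starting point for the parallel waveform generators discussed later.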

slide-27
SLIDE 27

Example: 2.5 GSPS DAC

FPGA Requirements

Global Clock

  • Not DDR
  • 1.25 GSPS/8
  • 156.25 MHz

Every clock cycle must update 16x 14-bit data samples.

    reg [13:0] D [15:0];
    always @(posedge GCLK) begin
      D[15] <= ?;
      D[14] <= ?;
      ...
      D[0] <= ?;
    end

[Figure: two 14-bit data buses, DB0 and DB1, each driven by 8:1 SerDes blocks clocked by GCLK and IOCLK. One bus serializes the odd samples D15, D13, ..., D1 and the other the even samples D14, D12, ..., D0.]

slide-28
SLIDE 28

Clock Gen. and Dist.

Informational Resources

  • IOSERDES: SelectIO Users Guide
  • BUFR & BUFIO: Clocking Users Guide
  • Instantiation: Libraries Guide

slide-29
SLIDE 29

Example: 2.5 GSPS DAC

FPGA Requirements

Global Clock

  • Not DDR
  • 1.25 GSPS/8
  • 156.25 MHz

Every clock cycle must update 16x 14-bit data samples.

    reg [13:0] D [15:0];
    always @(posedge GCLK) begin
      D[15] <= ?;
      D[14] <= ?;
      ...
      D[0] <= ?;
    end

[Figure: the DB0 and DB1 SerDes diagram with clocking added. A further 8:1 SerDes with fixed inputs (shown as 1 1 1 1) drives DCI, and BUFR (DIV) and BUFIO blocks are shown with the GCLK and IOCLK labels.]

slide-30
SLIDE 30

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • Start Simple: Using the previous example, how would a linear ramp be generated?

slide-31
SLIDE 31

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • Start Simple: Using the previous example, how would a linear ramp be generated?

    reg [13:0] D [15:0];
    initial begin
      D[15] = 14'h0F;
      D[14] = 14'h0E;
      ...
      D[1] = 14'h01;
      D[0] = 14'h00;
    end
    always @(posedge GCLK) begin
      D[15] <= D[15] + 14'h10;
      D[14] <= D[14] + 14'h10;
      ...
      D[0] <= D[0] + 14'h10;
    end

slide-32
SLIDE 32

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • Start Simple: Using the previous example, how would a linear ramp be generated?

    reg [9:0] DH = 0;
    always @(posedge GCLK) DH <= DH + 10'h01;
    wire [13:0] D [15:0];
    assign D[15] = {DH, 4'hF};
    assign D[14] = {DH, 4'hE};
    ...
    assign D[1] = {DH, 4'h1};
    assign D[0] = {DH, 4'h0};
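The two ramp implementations (sixteen 14-bit adders on slide 31, one 10-bit counter with hard-wired low bits on slide 32) should produce the same serialized stream. The Python model below is a sketch for checking that claim; the function names are hypothetical, and it assumes lane 0 is the first sample sent in each global-clock cycle.

```python
WIDTH = 14
LANES = 16
MASK = (1 << WIDTH) - 1


def ramp_adders(n_cycles):
    """Slide 31 style: 16 registers, each advanced by 16 on every GCLK."""
    d = list(range(LANES))  # D[i] is initialized to i
    out = []
    for _ in range(n_cycles):
        out.extend(d)  # lane 0 first (assumed ordering)
        d = [(x + LANES) & MASK for x in d]  # 14-bit wrap-around
    return out


def ramp_counter(n_cycles):
    """Slide 32 style: one 10-bit counter DH, lane index as the 4 LSBs."""
    dh = 0
    out = []
    for _ in range(n_cycles):
        out.extend((dh << 4) | i for i in range(LANES))
        dh = (dh + 1) & 0x3FF  # 10-bit wrap-around
    return out


# 2048 cycles covers two full wraps of the 14-bit ramp.
a = ramp_adders(2048)
b = ramp_counter(2048)
```

The counter form needs a single 10-bit adder instead of sixteen 14-bit adders, because the four low bits of each lane never change.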

slide-33
SLIDE 33

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • Using the previous example, how would a Sinusoidal Signal be generated using the CORDIC algorithm?
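One possible answer, sketched in Python: give each of the 16 lanes its own phase accumulator, start lane i at i phase steps, and advance every lane by 16 steps per global clock. This is an illustrative model only; `math.sin` stands in for a CORDIC core, and the name `parallel_sine` and the normalized-frequency parameter are assumptions.

```python
import math


def parallel_sine(freq_norm, n_cycles, lanes=16):
    """Generate a sinusoid with `lanes` parallel phase accumulators.

    freq_norm: output frequency divided by the DAC sample rate.
    Returns the samples in serialized order (lane 0 first, assumed).
    """
    step = 2 * math.pi * freq_norm
    # Lane i starts i samples into the waveform.
    phase = [i * step for i in range(lanes)]
    out = []
    for _ in range(n_cycles):
        # math.sin is a stand-in for one CORDIC core per lane.
        out.extend(math.sin(p) for p in phase)
        # Each lane skips ahead by `lanes` samples per global clock.
        phase = [p + lanes * step for p in phase]
    return out


samples = parallel_sine(0.01, 4)
reference = [math.sin(2 * math.pi * 0.01 * k) for k in range(64)]
```

Every lane runs at the same frequency word (16 times the per-sample step) and differs only in its starting phase.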

slide-34
SLIDE 34

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • Using the previous example, how would an arbitrary signal be generated from memory?
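One way to approach this, sketched in Python under stated assumptions: split the stored waveform across 16 memory banks so that bank i holds samples i, i+16, i+32, and so on, then read all banks at the same address on each global clock. The function names are hypothetical, and the waveform length is assumed to be a multiple of the lane count.

```python
def split_into_banks(waveform, lanes=16):
    """Distribute a waveform over `lanes` banks; bank i holds samples i, i+lanes, ..."""
    if len(waveform) % lanes != 0:
        raise ValueError("waveform length must be a multiple of the lane count")
    return [waveform[i::lanes] for i in range(lanes)]


def read_banks(banks, n_cycles):
    """Read one word from every bank per global clock, wrapping the address."""
    depth = len(banks[0])
    out = []
    for cycle in range(n_cycles):
        addr = cycle % depth  # all banks share a single address counter
        out.extend(bank[addr] for bank in banks)
    return out


waveform = list(range(64))
banks = split_into_banks(waveform)
stream = read_banks(banks, 8)  # two passes through the 4-deep banks
```

Because every bank sees the same address, one address counter serves all 16 memories.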

slide-35
SLIDE 35

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • Using the previous example, how would a chirp signal be generated from the CORDIC algorithm?

slide-36
SLIDE 36

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • Using the previous example, how would a chirp signal be generated from the CORDIC algorithm?
  • Simplify the problem. Generate a quadratic as the phase: $x^2$
  • Chirp: $2\pi (f_0/f_S)\,n + \pi (k/f_S^2)\,n^2 + \phi$
  • Phase: $A_0 + A_1 n + A_2 n^2$
slide-37
SLIDE 37

Waveform Generation

$\text{Phase}[n] = A_0 + A_1 n + A_2 n^2$

Expand for n0 = 0, 16, 32, ...; n1 = 1, 17, 33, ...; n2 = 2, 18, 34, ...; and so on, one sequence for each generator.

$\text{Phase}_0[n] = A_0 + A_1(16n) + A_2(16n)^2 = A_0 + (16A_1)\,n + (256A_2)\,n^2$

$\text{Phase}_1[n] = A_0 + A_1(16n + 1) + A_2(16n + 1)^2 = (A_0 + A_1 + A_2) + (16A_1 + 32A_2)\,n + (256A_2)\,n^2$

$\text{Phase}_2[n] = A_0 + A_1(16n + 2) + A_2(16n + 2)^2 = (A_0 + 2A_1 + 4A_2) + (16A_1 + 64A_2)\,n + (256A_2)\,n^2$

$\text{Phase}_3[n] = A_0 + A_1(16n + 3) + A_2(16n + 3)^2 = (A_0 + 3A_1 + 9A_2) + (16A_1 + 96A_2)\,n + (256A_2)\,n^2$

$\text{Phase}_I[n] = A_0 + A_1(16n + I) + A_2(16n + I)^2 = (A_0 + I A_1 + I^2 A_2) + (16A_1 + 32 I A_2)\,n + (256A_2)\,n^2$

Each CORDIC synthesizer would start with a different phase and frequency.
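The per-lane coefficients can be checked numerically. The Python sketch below computes the constant, linear, and quadratic coefficients for lane I and compares the result with the original phase polynomial evaluated at sample 16n + I. The function names are hypothetical, and integer coefficients are used only so that the comparison is exact.

```python
def lane_coefficients(a0, a1, a2, lane, lanes=16):
    """Coefficients of Phase_I[n] = b0 + b1*n + b2*n^2 for the given lane."""
    b0 = a0 + lane * a1 + lane * lane * a2  # starting phase of the lane
    b1 = lanes * a1 + 2 * lanes * lane * a2  # starting frequency of the lane
    b2 = lanes * lanes * a2                  # chirp rate, common to all lanes
    return b0, b1, b2


def phase(a0, a1, a2, m):
    """Original phase polynomial at full-rate sample index m."""
    return a0 + a1 * m + a2 * m * m


# Compare lane polynomials against the full-rate polynomial.
A0, A1, A2 = 3, 5, 7
mismatches = 0
for lane in range(16):
    b0, b1, b2 = lane_coefficients(A0, A1, A2, lane)
    for n in range(10):
        if b0 + b1 * n + b2 * n * n != phase(A0, A1, A2, 16 * n + lane):
            mismatches += 1
```

Only the quadratic coefficient is shared across lanes; the constant and linear terms depend on the lane index I.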

slide-38
SLIDE 38

Waveform Generation

  • Waveform Generation for a serializer becomes more complicated.
  • In general, the x16 increase in output data rate requires a x16 increase in the number of resources needed in the FPGA fabric.
  • For a sinusoid, we would need 16 CORDIC Blocks.
  • If the instantaneous bandwidth requirement does not use the entire DAC bandwidth (1/2 DAC rate), the resource requirements can be relaxed through an interpolation filter.

slide-39
SLIDE 39

Example: Hardware Interpolation

slide-40
SLIDE 40

Example: Side-note

slide-41
SLIDE 41

Waveform Generation

  • Example:
  • 1 GSPS DAC (Reconstruction Filter ?)
  • 8:1 serializer
  • GCLK = ? MSPS
  • DAC Bandwidth = ? MHz
  • The application only requires DC to 125 MHz output bandwidth.

slide-42
SLIDE 42

Waveform Generation

  • Example:
  • 1 GSPS DAC w/ 500 MHz Analog Filter
  • 8:1 serializer
  • GCLK = 125 MSPS
  • DAC Bandwidth = 500 MHz
  • Only require DC to 125 MHz output bandwidth.
  • Implies we only need to generate serialized data at a rate of 250 MSPS.
  • At 125 MSPS per generator, this would be 2 CORDIC cores or 2 AWGs.
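The sizing argument above can be written out as a short Python sketch. The function name `generators_needed` is hypothetical; it assumes the minimum generated rate is twice the required bandwidth and that each generator produces one sample per global clock.

```python
import math


def generators_needed(dac_rate_sps, serdes_ratio, bandwidth_hz):
    """Return (global clock, generator count, interpolation factor)."""
    gclk = dac_rate_sps / serdes_ratio   # one SerDes word per global clock
    min_rate = 2 * bandwidth_hz          # Nyquist rate for the wanted band
    n_generators = math.ceil(min_rate / gclk)
    interp_factor = serdes_ratio / n_generators
    return gclk, n_generators, interp_factor


# 1 GSPS DAC, 8:1 serializer, DC to 125 MHz wanted.
result = generators_needed(1e9, 8, 125e6)
```

For this example the interpolation filter has to raise the rate by a factor of 4, from 2 generated samples to 8 serialized samples per global clock.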
slide-43
SLIDE 43

Waveform Generation

  • Example:
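  • Red values are the interpolated values.
  • Interpolation filter typically uses far fewer FPGA resources.

[Figure: AWG1 (D0, D8, D16) and AWG2 (D4, D12, D20) feed a parallel interpolation filter, which drives an 8:1 SerDes. Samples on the eight filter output lanes over three successive clock cycles:

    D4,  D12, D20
    D3,  D11, D19
    D2,  D10, D18
    D1,  D9,  D17
    D0,  D8,  D16
    D-1, D7,  D15
    D-2, D6,  D14
    D-3, D5,  D13

Serialized output: ..., D0, D1, D2, D3, D4, D5, D6, D7, ...]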

slide-44
SLIDE 44

Waveform Generation

  • Example:
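  • Red values are the interpolated values.
  • Nearest Neighbor just uses routing.

[Figure: AWG1 (D0, D8, D16) and AWG2 (D4, D12, D20) drive the 8:1 SerDes with no filter block shown. Samples on the eight SerDes input lanes over three successive clock cycles:

    D4,  D12, D20
    D3,  D11, D19
    D2,  D10, D18
    D1,  D9,  D17
    D0,  D8,  D16
    D-1, D7,  D15
    D-2, D6,  D14
    D-3, D5,  D13

Serialized output: ..., D0, D1, D2, D3, D4, D5, D6, D7, ...]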
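To show why this case needs only routing, here is a minimal Python sketch of a zero-order hold, a simple form of nearest-neighbor interpolation in which each generated sample is copied to four adjacent SerDes lanes. The function name `hold_interpolate` is hypothetical, and the lane alignment is an assumption that may differ from the figure, where the interpolated samples are centered around each generated sample.

```python
def hold_interpolate(awg1, awg2, lanes=8):
    """Fill `lanes` SerDes inputs per clock from two generators by copying.

    awg1, awg2: one sample per global clock from each generator.
    Returns the serialized stream. No arithmetic is involved, so in
    hardware this corresponds to wiring each sample to several lanes.
    """
    half = lanes // 2
    out = []
    for a, b in zip(awg1, awg2):
        out.extend([a] * half + [b] * half)
    return out


# AWG1 supplies D0, D8 and AWG2 supplies D4, D12 on two clock cycles.
stream = hold_interpolate([0, 8], [4, 12])
```

A filter with real coefficients would replace the copies with weighted sums of neighboring samples, at the cost of multipliers and adders.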

slide-45
SLIDE 45

Waveform Generation

  • Example:
  • Interpolation filter does not need to be an LPF. You can use any image of the up-sampled signal, depending on the filter, or insert an I/Q Mixer.
  • Provides the flexibility of using the entire DAC bandwidth (500 MHz) by changing coefficients.

[Figure: AWG1 (D0, D8, D16) and AWG2 (D4, D12, D20) feed a parallel interpolation filter, which drives an 8:1 SerDes. Samples on the eight filter output lanes over three successive clock cycles:

    D4,  D12, D20
    D3,  D11, D19
    D2,  D10, D18
    D1,  D9,  D17
    D0,  D8,  D16
    D-1, D7,  D15
    D-2, D6,  D14
    D-3, D5,  D13

Serialized output: ..., D0, D1, D2, D3, D4, D5, D6, D7, ...]