Extending/Optimizing the USRP/RFNOC Framework for Implementing Latency-Sensitive Radio Systems

SLIDE 1

Extending/Optimizing the USRP/RFNOC Framework for Implementing Latency-Sensitive Radio Systems

Joshua Monson*, Zhongren Cao^, Pei Liu~, Travis Haroldsen*, Matthew French*
11/14/2018
*USC Information Sciences Institute, Arlington, VA
^C3-COMM Systems, Vienna, VA
~New York University
4676 Admiralty Way, Marina del Rey, CA
3811 N Fairfax Drive, Arlington, VA

SLIDE 2

USC Information Sciences Institute

Reconfigurable Computing Group @ USC/ISI

– Over 20 years performing cutting-edge FPGA research
– > 100 journal and conference publications
– Focused on FPGA and ASIC, System-Level Design, Productivity, TRUST and Security
– Custom ASIC/FPGA CAD Tool (TORC)

ISI: A large, vibrant, path-breaking research institute

– Part of USC’s Viterbi School of Engineering, located in the “Marina Tech Campus” (Marina del Rey) and in Arlington, VA
– >$80M per year in funding from a diversified base of sponsors
– ~300 people, mostly research staff
– Facilities to conduct ITAR, classified, and unclassified research

SLIDE 3

Overview

Discuss Extensions/Optimizations to the UHD/RFNOC that allowed us to meet the stringent latency requirements of the transceiver

Experiences, lessons learned, and development efforts to implement a broadband CSMA/CA-based OFDM transceiver on an Ettus E310 USRP Software Defined Radio (SDR) platform

Supports link layer latency-cooperative transmissions

SLIDE 4

COMBAT Project

Over 95% of casualties in Operations Enduring Freedom, Iraqi Freedom, and New Dawn occurred after operations transitioned from linear, conventional fights to the nonlinear, nonconventional stabilization phase

– A majority of missions during the nonconventional stabilization phase are carried out by dismount squads at the tactical edge
– Sharing situational awareness among soldiers is vital to mission success
– Multicast plays an increasingly important role in edge networks

Goal: Improve the throughput of wireless multicast and broadcast in dismounted squad networks in order to significantly enhance situational awareness at the tactical edge

Sources:

1. C. Thielenhaus, P. Traeger, E. Roles, “Reaching forward in the war against the Islamic State,” PRISM, National Defense University, 12/2016
2. E. Roles, Presentation on RAA in DARPA Industry Day, Jan. 2017

DISTRIBUTION A. Approved for public release: distribution unlimited.

SLIDE 5

COMBAT Innovations

COMBAT system works to support and improve IP multicast schemes such as the SMF in RFC 6621

– IP multicast to provide connections among clusters
– Local mobility within clusters is handled by COMBAT in the link layer

Higher Data Rate
Transparent to IP Layer
Responsive to Channel
Reduced Complexity
Efficient Channel Utilization

SLIDE 6

System Level Simulation

Simulation Setup

– Topology: Mobiles distributed on a disc, R = 100 meters
– Radio Propagation: Path loss + AWGN
– Traffic: Multicast only, from a random node
– Random Waypoint Model: Velocity (0.1-4 m/s), Pause duration (0-60 s)
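The mobility part of this setup can be sketched in a few lines. The following is a minimal illustration of a random waypoint trace on a 100 m disc using the velocity and pause ranges listed above. It is not the simulator used for the COMBAT results; the function names, the 1 s sampling step, and the uniform draws for speed and pause are assumptions made for illustration.

```python
import math
import random

DISC_RADIUS_M = 100.0          # from the slide: disc R = 100 m
SPEED_RANGE_MPS = (0.1, 4.0)   # from the slide: velocity 0.1-4 m/s
PAUSE_RANGE_S = (0.0, 60.0)    # from the slide: pause duration 0-60 s


def random_point_in_disc(rng, radius=DISC_RADIUS_M):
    """Draw a point uniformly over the disc area (sqrt corrects the radial density)."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))


def random_waypoint_trace(rng, duration_s, step_s=1.0):
    """Return (t, x, y) samples for one node (hypothetical helper, not from the deck)."""
    t = 0.0
    x, y = random_point_in_disc(rng)
    trace = [(t, x, y)]
    while t < duration_s:
        # Choose the next waypoint and a travel speed (uniform draws are an assumption).
        wx, wy = random_point_in_disc(rng)
        speed = rng.uniform(*SPEED_RANGE_MPS)
        travel_s = math.hypot(wx - x, wy - y) / speed
        n_steps = max(1, math.ceil(travel_s / step_s))
        # Move in a straight line; the disc is convex, so the node never leaves it.
        for i in range(1, n_steps + 1):
            frac = i / n_steps
            trace.append((t + frac * travel_s,
                          x + frac * (wx - x),
                          y + frac * (wy - y)))
        t += travel_s
        x, y = wx, wy
        # Pause at the waypoint before choosing the next one.
        t += rng.uniform(*PAUSE_RANGE_S)
        trace.append((t, x, y))
    return trace
```

Calling random_waypoint_trace(random.Random(0), 600.0) yields one node's positions over at least ten minutes; a multicast source would then be picked at random among such nodes.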


SLIDE 7

COMBAT Objectives

Demonstrate Improvements in Relay Throughput on Prototype OFDM Transceiver over Worst Link Scenario

Requirements:

– 10 MHz Bandwidth, 40 MSPS
– Utilize CSMA/CA

SLIDE 8

COMBAT Latency Requirements

Contention-free relay with PHY-assisted ACK/NACK

CSMA/CA deadlines

– Enable Compatibility with Existing Systems

Throughput

– RX-to-TX Latency is critical for throughput

SLIDE 9

Software-Defined Architecture

USRPs are latency-insensitive peripherals
Latency is low, but SDRs are not intended to meet stringent CSMA/CA latency deadlines

Figure: SDR signal chain. Waveform in the air, antenna(s), analog filter banks/mixers (discrete RF devices), ADC/DAC, decimation and interpolation filters H(n) in the FPGA, and a USB/Ethernet link to the host.

SLIDE 10

Enabling Advances in SDR Architecture

Standard SDR Architecture (Block Diagram of N210)

Figure: RF front end (analog filter banks, ADC/DAC, decimation and interpolation filters H(n)) and FPGA, connected to the host over an Ethernet/USB connection.

Embedded SDR Architecture (Block Diagram of E310)

Figure: RF front end (analog filter banks, ADC/DAC, decimation and interpolation filters H(n)), FPGA, and host (ARM Cortex A9).

Single Package
Host/FPGA Latency Smaller
Larger FPGA

SLIDE 11

Radio Frequency Network on Chip (RFNOC)

Figure: RF-Frontend (discrete RF devices), ADC/DAC IF, and RFNOC with Custom Accelerator0, Custom Accelerator1, and Custom Accelerator2 in the FPGA; host attached over USB/Ethernet.

Dynamically Programmable Network-on-Chip
Provides a design entry-point into the FPGA

SLIDE 12

E310 RFNOC Base Design (FPGA Only)

Figure: RFNOC Initial Configuration on E310. RFNOC with eight NOC IF ports. Blocks shown: ADC/DAC IF (twice), LOOPBACK, ZYNQ FIFO (16 Channel Host-to-RFNOC DMA), FOSPHOR, FIR FILTER, FFT, SIGNALGEN.

SLIDE 13

E310-RFNOC Base Design (FPGA Only)

Figure: E310 RFNOC base design (ADC/DAC IF, LOOPBACK, eight NOC IF ports, RFNOC, ZYNQ FIFO 16 Channel Host-to-RFNOC DMA, FOSPHOR, FIR FILTER, FFT, SIGNALGEN) with ten elements crossed out (X).

The NOC IF blocks account for ~30% of resource utilization

SLIDE 14

E310 RFNOC Base Design (FPGA Only)

Figure: E310 RFNOC base design (ADC/DAC IF, LOOPBACK, eight NOC IF ports, RFNOC, ZYNQ FIFO 16 Channel Host-to-RFNOC DMA, FOSPHOR, FIR FILTER, FFT, SIGNALGEN) with six elements crossed out (X) and the Host-to-RFNOC DMA annotated "8 Channel"*.

*Required extension/update to UHD

SLIDE 15

E310 RFNOC Base Design (FPGA Only)

Figure: reduced design with ADC/DAC IF, two NOC IF ports, and ZYNQ FIFO (16 Channel Host-to-RFNOC DMA, annotated "8 Channel").

                               LUT              FF               BRAM            DSP
Total ZYNQ 7020 Device         53,200           106,400          140             220
RFNOC (Initial)                41,247 (77%)     55,783 (52.4%)   116 (82.8%)     146 (66.3%)
RFNOC baseline (RFNOC/Radio)   12,546 (23.5%)   15,840 (14.8%)   26 (18.6%)      (0%)
COMBAT Optimized               45,319 (85.1%)   52,540 (47%)     104.5 (74.6%)   120 (52%)

Freed 53% of the FPGA Resources!
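The percentages in the table are resource counts divided by the ZYNQ 7020 device totals. The sketch below recomputes them from the counts listed on the slide, as a cross-check when editing or extending the table. The dictionary layout and the function name are illustrative choices, and the recomputed values need not agree with the printed percentages to the last digit.

```python
# Device totals and per-design usage, copied from the table above.
ZYNQ_7020 = {"LUT": 53_200, "FF": 106_400, "BRAM": 140, "DSP": 220}

DESIGNS = {
    "RFNOC (Initial)": {"LUT": 41_247, "FF": 55_783, "BRAM": 116, "DSP": 146},
    # The slide gives no DSP count for the baseline row, so it is left out here.
    "RFNOC baseline (RFNOC/Radio)": {"LUT": 12_546, "FF": 15_840, "BRAM": 26},
    "COMBAT Optimized": {"LUT": 45_319, "FF": 52_540, "BRAM": 104.5, "DSP": 120},
}


def utilization_percent(used, totals=ZYNQ_7020):
    """Return {resource: percent of the device consumed} for one design."""
    return {res: 100.0 * count / totals[res] for res, count in used.items()}


if __name__ == "__main__":
    for name, used in DESIGNS.items():
        cells = ", ".join(f"{res} {pct:.1f}%"
                          for res, pct in utilization_percent(used).items())
        print(f"{name}: {cells}")
```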

SLIDE 16

RFNOC Latency

Figure: signal path with RF-Frontend, ADC/DAC IF, three NOC IF ports, RX Block, and ZYNQ FIFO (16 Channel Host-to-RFNOC DMA, annotated "8 Channel"); two elements are marked B.

– RF-Frontend: 2.5 us, fixed, both directions
– ADC/DAC IF: ~200 ns, fixed, both directions
– RFNOC: ~.625 us + RFNOC packet buffering delay, each way
– RFNOC CLOCK = 50 MHz (400 MB/Sec)
– CE CLOCK = 40 MHz (40 MSPS, 160 MB/Sec)

Packet buffering into RFNOC is the primary cause of delay; it depends on the programmable packet size.

Static delay through RFNOC: .625 us
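The clock and bandwidth figures above are tied together by simple arithmetic, sketched below. The 8-byte bus word and 4-byte sample size are not stated on the slide; they are assumptions inferred from 400 MB/s at 50 MHz and 160 MB/s at 40 MSPS. The packet fill time is only the time needed to collect one packet's worth of samples, which shows why the buffering delay grows with packet size; it is not a model of the full RFNOC delay.

```python
RFNOC_CLOCK_HZ = 50e6        # from the slide
CE_SAMPLE_RATE_SPS = 40e6    # from the slide (CE clock 40 MHz, 40 MSPS)
BUS_BYTES_PER_CYCLE = 8      # assumption: 64-bit bus word (400 MB/s / 50 MHz)
BYTES_PER_SAMPLE = 4         # assumption: 16-bit I + 16-bit Q (160 MB/s / 40 MSPS)


def bus_bandwidth_mb_s():
    """Peak RFNOC bus bandwidth in MB/s."""
    return RFNOC_CLOCK_HZ * BUS_BYTES_PER_CYCLE / 1e6


def sample_stream_mb_s():
    """Bandwidth of the sample stream produced by the compute engine in MB/s."""
    return CE_SAMPLE_RATE_SPS * BYTES_PER_SAMPLE / 1e6


def packet_fill_time_us(samples_per_packet):
    """Time to accumulate one packet of samples at the CE sample rate, in us."""
    return samples_per_packet / CE_SAMPLE_RATE_SPS * 1e6


if __name__ == "__main__":
    print(bus_bandwidth_mb_s(), sample_stream_mb_s())
    for n in (16, 64, 256):
        print(n, "samples per packet:", packet_fill_time_us(n), "us to fill")
```

Smaller packets fill faster but spend a larger share of the bus on per-packet overhead, which is the trade-off behind the programmable packet size.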

SLIDE 17

RFNOC Latency

Figure: signal path with RF-Frontend, ADC/DAC IF, three NOC IF ports, RFNOC Block, and ZYNQ FIFO (16 Channel Host-to-RFNOC DMA, annotated "8 Channel"); two elements are marked B.

RFNOC CLOCK = 50 MHz (400 MB/Sec)
CE CLOCK = 40 MHz (40 MSPS, 160 MB/Sec)

Figure: RFNOC Throughput. x-axis: Samples Per Packet (50 to 300); y-axis: MSPS (ticks 10 to 80, scale factor 10^6). Annotations: Required Throughput; 16 Samples.

Minimum RX-to-TX Delay

RFNOC          3.0 us
ADC/DAC IF     ~250 ns
RF-Frontend    5.0 us
Total          8.25 us

SLIDE 18

Low-Latency RFNOC Extension

Figure: signal path with RF-Frontend, ADC/DAC IF, three NOC IF ports, RFNOC Block, and ZYNQ FIFO (16 Channel Host-to-RFNOC DMA, annotated "8 Channel"); two elements are marked LL (low latency).

RFNOC CLOCK = 50 MHz (400 MB/Sec)
CE CLOCK = 40 MHz (40 MSPS, 160 MB/Sec)

Minimum RX-to-TX Delay

RFNOC          3.0 us (not counted in the total)
ADC/DAC IF     ~250 ns
RF-Frontend    5.0 us
Total          5.25 us

Low-Latency RFNOC Block Benefits:

– Minimum latency limited by the RF configuration and a couple of cycles
– Selectable: select LL for TX, RX, or TX/RX
– Maintains compatibility with RFNOC/UHD; however, needs additional work to support GNURadio
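The two minimum RX-to-TX delay totals quoted in the deck (8.25 us through RFNOC, 5.25 us with the low-latency block) follow from adding the per-stage figures. A minimal sketch of that budget, with the constant and function names chosen here for illustration:

```python
RF_FRONTEND_US = 5.0   # from the slides: 2.5 us in each direction
ADC_DAC_IF_US = 0.25   # from the slides: ~250 ns
RFNOC_US = 3.0         # from the slides: RFNOC contribution on the standard path


def min_rx_to_tx_delay_us(through_rfnoc):
    """Sum the fixed stages; add the RFNOC term only when samples traverse RFNOC."""
    total = RF_FRONTEND_US + ADC_DAC_IF_US
    if through_rfnoc:
        total += RFNOC_US
    return total


if __name__ == "__main__":
    print("standard RFNOC path:", min_rx_to_tx_delay_us(True), "us")
    print("low-latency path:   ", min_rx_to_tx_delay_us(False), "us")
```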

SLIDE 19

Other Mods/Extensions to E310/UHD

Updates to REPO

– Added Block Diagram Build Flow
– Construct Vivado GUI Project for Block Diagrams
– Smoother Integration of Vivado IP

Exposed AD9361 Control

– Exposed SPI Interface to Write AD9361 Control

Debugging

– Built Custom Cable and Implemented Virtual JTAG
– Integrated Virtual JTAG Server/Driver

SLIDE 20

OFDM Transceiver Architecture

1. Upper MAC

a) Wraps payloads in packets
b) Manages COMBAT protocols
c) Moves packets to and from the Packet Buffers

2. Lower MAC

a) Configures and manages RX and TX
b) Handles latency-sensitive protocols (e.g., relay forwarding)
c) Informs the Upper MAC of received packets

3. Packet Buffers

a) TX: Hold packets that have been staged for transmission
b) RX: Hold packets that have been received and decoded

4. OFDM RX/TX

a) TX: Transforms bits to OFDM baseband samples
b) RX: Transforms OFDM baseband samples to bits

Figure: transceiver architecture on the ZYNQ 7020 FPGA. ARM (Hard IP) with Upper MAC, MicroBlaze Lower MAC, Mailbox, Packet Buffers, OFDM RX, OFDM TX, RFNOC, ADC/DAC IF, AD9361. Signal legend: TX Bits, TX Samples, RX Bits, RX Samples, Control BUS.

– 8 Rate Settings (3-27 Mb/s)
– 40 MSPS, 10 MHz BW
– Performance Statistics
– Transmit Directly from RX Buffer to Minimize Latency

SLIDE 21

Latency Analysis

Figure: transceiver block diagram on the ZYNQ 7020 FPGA (ARM Hard IP with Upper MAC, MicroBlaze Lower MAC, Mailbox, Packet Buffers, OFDM RX, OFDM TX, RFNOC, AD9361 IF, AD9361), without latency annotations.

SLIDE 22

Latency Analysis

Figure: transceiver block diagram on the ZYNQ 7020 FPGA (ARM Hard IP with Upper MAC, MicroBlaze Lower MAC, Mailbox, Packet Buffers, OFDM RX, OFDM TX, RFNOC, AD9361 IF, AD9361). Latency annotations shown: 2.5, .2

SLIDE 23

Latency Analysis

Figure: transceiver block diagram on the ZYNQ 7020 FPGA (ARM Hard IP with Upper MAC, MicroBlaze Lower MAC, Mailbox, Packet Buffers, OFDM RX, OFDM TX, RFNOC, AD9361 IF, AD9361). Latency annotations shown: 2.5, .2, 17

SLIDE 24

Latency Analysis

Figure: transceiver block diagram on the ZYNQ 7020 FPGA (ARM Hard IP with Upper MAC, MicroBlaze Lower MAC, Mailbox, Packet Buffers, OFDM RX, OFDM TX, RFNOC, AD9361 IF, AD9361). Latency annotations shown: 2.5, .2, 17, 4.1

SLIDE 25

Latency Analysis

Figure: transceiver block diagram on the ZYNQ 7020 FPGA (ARM Hard IP with Upper MAC, MicroBlaze Lower MAC, Mailbox, Packet Buffers, OFDM RX, OFDM TX, RFNOC, AD9361 IF, AD9361). Latency annotations shown: 2.5, .2, 17, 4.1, .1

SLIDE 26

Latency Analysis

Figure: transceiver block diagram on the ZYNQ 7020 FPGA (ARM Hard IP with Upper MAC, MicroBlaze Lower MAC, Mailbox, Packet Buffers, OFDM RX, OFDM TX, RFNOC, AD9361 IF, AD9361). Latency annotations shown: 2.5, .2, 17, 4.1, .1, .2

Total: 29.3 us

SLIDE 27

RFNOC Latency Measurements

Figure: measured timeline from End of Packet to Start of Relay: 58 us.

Stage   RF    RF    NOC    NOC   RX     MAC    TX
us      2.5   2.5   10.4   7.6   17.0   20.0   .1

– NOC delays primarily related to RFNOC packet size and buffering
– Opportunities to further optimize
– MAC code can be optimized
– RX-to-TX latency exceeds requirement (34 us)!
– Best-case relay of 10.98 Mb/s

SLIDE 28

Low-Latency IF Measurements

Figure: captured waveform (axis ticks only, scale factors 10^4), End of Packet to Start of Relay: ~26.6 us.

Stage   RF    RF    RX     MAC   TX
us      2.7   2.7   17.0   4.1   .1

Reduced RX-to-TX Latency by 50%!

Optimizations:

– RFNOC
   – Adjusted FC Buffers
   – Adjusted FC ACKs
– MAC Code
   – Added -O3 (50%)
   – Adjusted Algorithms
   – Overlapped more relay processing with RX
   – Added Barrel Shifter, Multiplier

Meets Latency Requirement

+4.7% Achievable Relay Throughput!

2.5% packet loss due to TX underrun
– Likely due to Flow Control ACK
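The per-stage figures can be added up and compared against the 34 us requirement quoted on the RFNOC Latency Measurements slide. The sketch below does that for both measurement slides; the list layout and helper names are illustrative. The low-latency stages sum to the ~26.6 us reading, while the stage values on the RFNOC slide sum to slightly more than its 58 us reading, so the reduction is computed from the two end-to-end readings.

```python
REQUIREMENT_US = 34.0  # RX-to-TX requirement quoted on the RFNOC measurement slide

# (stage, microseconds) pairs in the order they appear on each slide.
RFNOC_PATH_US = [("RF", 2.5), ("RF", 2.5), ("NOC", 10.4), ("NOC", 7.6),
                 ("RX", 17.0), ("MAC", 20.0), ("TX", 0.1)]
LOW_LATENCY_PATH_US = [("RF", 2.7), ("RF", 2.7), ("RX", 17.0),
                       ("MAC", 4.1), ("TX", 0.1)]

MEASURED_RFNOC_US = 58.0        # end-to-end reading on the RFNOC slide
MEASURED_LOW_LATENCY_US = 26.6  # end-to-end reading on the low-latency slide


def total_us(stages):
    """Sum the stage latencies, rounded to the slide's 0.1 us resolution."""
    return round(sum(us for _, us in stages), 1)


def meets_requirement(latency_us, requirement_us=REQUIREMENT_US):
    return latency_us <= requirement_us


def reduction_fraction(before_us, after_us):
    """Fractional latency reduction between two end-to-end readings."""
    return 1.0 - after_us / before_us


if __name__ == "__main__":
    print("RFNOC stage sum:      ", total_us(RFNOC_PATH_US), "us")
    print("low-latency stage sum:", total_us(LOW_LATENCY_PATH_US), "us")
    print("meets requirement:    ", meets_requirement(MEASURED_LOW_LATENCY_US))
    print("reduction:            ",
          round(reduction_fraction(MEASURED_RFNOC_US, MEASURED_LOW_LATENCY_US), 2))
```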

SLIDE 29

Low-Latency IF Measurements

Figure: captured waveform of an OFDM relay at the 18 Mb/s rate (axis ticks only, scale factors 10^4), with the 1st packet and its relay labeled.

Relay Latency: 22.6 us

SLIDE 30

RFNOC Review

Benefits

– Simple FPGA design entry point into the Ettus stack
– Excellent for high-level design tools like GNURadio/RFNOC
– Excellent for modular design
– Flexible stock IP to connect to RFNOC

Drawbacks

– Not intended for ultra-low latency processing
– No abstraction for low-latency signals communicating between blocks (e.g., state inputs and outputs)

We will be releasing our Low-Latency RFNOC Block and NOC Radio Extension as open-source shortly on https://github.com/ISI-RCG (likely location)
We recommend using these cores for ultra-low latency processing

SLIDE 31

Conclusion and Summary

Implemented Latency-Sensitive CSMA/CA Radio System on USRP E310/E312
Made Several Extensions and Improvements to Permit Latency Sensitivity

– Created Low-Latency Radio Block and Low-Latency RFNOC Block Shell
– UHD Updates that Permitted RX->TX without Host
– Enabled Vivado Block Diagram Build Flow
– Implemented Lower MAC in MicroBlaze Processor
– Enabled TX to Read from RX Packet Buffer
– Transfer Packets to/from FPGA Rather than Samples

Future Work:

– Push Abstraction Levels Higher (Mapping to GNURadio)

SLIDE 32

Questions
