27 March 2018 Mikael Arguedas and Morgan Quigley - - PowerPoint PPT Presentation
Architecture evolution (diagram):
- Separate devices (prototypes 0-3): two cameras on USB3 and an IMU on USB2, all attached to the USB host
- Unified camera (prototypes 4-5): FPGA with two imagers and IMU, attached to the USB host over USB3
- Unified system (prototypes 6+): FPGA with two imagers and IMU, attached to a PCIe root over PCIe
- Global-shutter 1.3 MPix imagers, 20cm baseline
- FPGA+DRAM+USB3 on daughterboard
- InertialSense µIMU-2
USB3-based design vs. PCIe-based design (diagram):
- USB3-based design: imagers -> FPGA (with DRAM) -> USB3 PHY -> any PC
- PCIe-based design: imagers -> FPGA -> PCIe PHY -> PCIe root on SBC (NVIDIA TX2)
- PCIe x2 bandwidth is similar to a (low-end) FPGA DRAM bus!
- Artix-7 DRAM bus: 16 bit @ 400 MHz DDR = ~13 Gbit/s, minus overhead
- PCIe Gen 2 x2 = 8 Gbit/s full-duplex
- DRAM buffer required in the USB3 design, since USB3 = 4 Gbit/s, minus scheduling
- Imagers capable of ~2 Gbit/s pixel rate (each)
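The bandwidth figures above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the 5 Gbit/s line rates and 8b/10b coding are the standard values for PCIe Gen 2 and USB3, while the count of two imagers is an assumption taken from the stereo diagrams, and the function names are hypothetical.

```python
def ddr_bus_gbit(width_bits, clock_mhz):
    """Raw DDR bus bandwidth in Gbit/s: two transfers per clock cycle."""
    return width_bits * clock_mhz * 2 / 1000.0


def serial_link_gbit(line_rate_gbit, lanes, efficiency=0.8):
    """Per-direction payload rate in Gbit/s of an 8b/10b-coded serial link."""
    return line_rate_gbit * lanes * efficiency


NUM_IMAGERS = 2      # assumption: one stereo pair
IMAGER_GBIT = 2.0    # "~2 Gbit pixel rate (each)" from the slide

dram = ddr_bus_gbit(16, 400)       # Artix-7 DRAM bus
pcie = serial_link_gbit(5.0, 2)    # PCIe Gen 2 x2
usb3 = serial_link_gbit(5.0, 1)    # USB3 at 5 Gbit/s signaling
peak = NUM_IMAGERS * IMAGER_GBIT   # worst-case combined pixel rate

for name, rate in [("DRAM bus", dram), ("PCIe Gen 2 x2", pcie), ("USB3", usb3)]:
    print(f"{name}: {rate:.1f} Gbit/s, headroom over imagers: {rate - peak:+.1f}")
```

Under these assumptions USB3 has no headroom over the peak pixel rate even before scheduling overhead, which is consistent with the slide's point that the USB3 design needs a DRAM buffer while PCIe x2 does not.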
- Designed around TX2
- FPGA is PCIe endpoint
- Self-contained computer vision: "just supply power"
https://cad.onshape.com/documents/12b7e4d13bded8c95b2b0603/w/4fe61ad3cda4bc70ee895fc7/e/e46ce3bf3f9ed56eb291d5e2
- Imager active area is not centered
- Use 3D-printed lens holders
- PLA is OK. Carbon-fiber is better
- Heat-set inserts for mounting + lens lock
- Stereo systems need to be very stiff
- PCB is clamped to carbon-fiber tube
- always-on MCU waits for TX2 boot
- During TX2 boot: MCU loads stage-1 FPGA image
- TX2-FPGA PCIe link established
- TX2 loads stage-2 FPGA image over PCIe
- sensors initialized over PCIe MMIO
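The boot order above can be read as a small state machine. The following is a minimal sketch of that ordering only; the state and event names are hypothetical labels for the steps on the slide, not identifiers from the actual firmware.

```python
from enum import Enum, auto


class Boot(Enum):
    WAIT_FOR_TX2 = auto()     # always-on MCU waits for TX2 boot
    STAGE1_LOADED = auto()    # MCU has loaded the stage-1 FPGA image
    PCIE_LINK_UP = auto()     # TX2-FPGA PCIe link established
    STAGE2_LOADED = auto()    # TX2 has loaded the stage-2 image over PCIe
    SENSORS_READY = auto()    # sensors initialized over PCIe MMIO


# (current state, event) -> next state; event names are hypothetical
TRANSITIONS = {
    (Boot.WAIT_FOR_TX2, "mcu_loads_stage1"): Boot.STAGE1_LOADED,
    (Boot.STAGE1_LOADED, "pcie_link_established"): Boot.PCIE_LINK_UP,
    (Boot.PCIE_LINK_UP, "tx2_loads_stage2"): Boot.STAGE2_LOADED,
    (Boot.STAGE2_LOADED, "sensors_initialized"): Boot.SENSORS_READY,
}

BOOT_EVENTS = [
    "mcu_loads_stage1",
    "pcie_link_established",
    "tx2_loads_stage2",
    "sensors_initialized",
]


def run(events):
    """Apply events in order; reject any event that arrives out of sequence."""
    state = Boot.WAIT_FOR_TX2
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {state.name}")
        state = TRANSITIONS[key]
    return state


final_state = run(BOOT_EVENTS)
print(final_state.name)
```

The point of the two-stage load is visible in the table: the stage-2 image can only arrive once the stage-1 image has brought up the PCIe link.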
Diagram: MCU, TX2, FPGA, two imagers, and IMU, with links labeled UART, PCIe, and SPI.
Image timing via IMU sync (diagram blocks: IMU, sync, decimate, trigger, image sensors, DMA write arbiter, FPGA, PCIe, TX2 DRAM)
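One plausible reading of the sync, decimate, and trigger blocks is that the FPGA fires the imager trigger on every Nth IMU sync pulse, so that each frame is aligned with an IMU sample. The sketch below illustrates that idea only; the IMU rate and decimation factor are hypothetical values, not figures from the slides.

```python
def trigger_indices(num_sync_pulses, decimation):
    """Indices of the IMU sync pulses that also fire the imager trigger."""
    if decimation < 1:
        raise ValueError("decimation must be >= 1")
    return list(range(0, num_sync_pulses, decimation))


IMU_RATE_HZ = 250   # hypothetical IMU sample rate
DECIMATION = 10     # hypothetical: one frame per 10 IMU samples (25 fps)

frames = trigger_indices(50, DECIMATION)
timestamps = [i / IMU_RATE_HZ for i in frames]

# Every frame index is also an IMU sample index, so no interpolation
# between IMU samples is needed to timestamp an image.
print(frames)
```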
Pixel path (diagram blocks): image sensors, deserialize, decode framing, FIFO, fix column ordering, DMA arbiter, PCIe, TX2 DRAM
Extreme close-up of typical indoor navigation feature (sprinkler pipe joint)
Corner detection, step by step (figure labels): corner; 7x7 circle; 7x7 circle (discretized); "unrolled" discretized circle; subtract center; threshold.
1) Find (in parallel) if there is a contiguous sequence of >= 9 pixels above threshold value.
2) For non-max suppression, find (in parallel) "how far" the sequence is above the threshold.
Imagers produce 4 pixels per clock. Solution: search in parallel.
- An example of FPGA reducing latency in (simple) pixel-wise operations
- 100s of operations per clock: 8-bit subtractions, comparisons, etc.
- Deterministic timing, keeps up with pixel rate
- Many other algorithms are FPGA-friendly: pyramids, gradients, ...
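The corner test described above (unroll the circle, subtract the center, threshold, look for 9 contiguous pixels) can be sketched in software. This is a sequential illustration, not the FPGA implementation: the 16-pixel ring of radius 3 is an assumption about how the 7x7 circle is discretized, the score definition is one plausible reading of "how far" the sequence is above the threshold, and all function names are hypothetical. In hardware, each starting position in the loop would be evaluated in parallel.

```python
# Assumed discretization: 16-pixel ring of radius 3 inside a 7x7 window,
# listed in circular order as (dx, dy) offsets from the center.
CIRCLE = [
    (0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def unroll(img, x, y):
    """Ring pixels in circular order, each with the center pixel subtracted."""
    center = img[y][x]
    return [img[y + dy][x + dx] - center for dx, dy in CIRCLE]


def corner_score(diffs, threshold, arc=9):
    """Best margin over all arcs of `arc` contiguous ring pixels.

    For each starting position, the margin is the weakest pixel in the arc
    minus the threshold. A positive result means some arc lies entirely
    above the threshold.
    """
    n = len(diffs)
    return max(
        min(diffs[(start + k) % n] for k in range(arc)) - threshold
        for start in range(n)
    )


def is_corner(diffs, threshold, arc=9):
    """Step 1 of the slide: is there a contiguous run of `arc` pixels above threshold?"""
    return corner_score(diffs, threshold, arc) > 0


# Tiny example: dark 7x7 patch with 9 bright contiguous ring pixels.
img = [[10] * 7 for _ in range(7)]
for dx, dy in CIRCLE[:9]:
    img[3 + dy][3 + dx] = 100

diffs = unroll(img, 3, 3)
print(is_corner(diffs, threshold=20), corner_score(diffs, threshold=20))
```

The score doubles as the value compared between neighboring candidates during non-max suppression. This sketch only looks for pixels brighter than the center, following the wording on the slide.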
Full system diagram (labels only):
- TX2: CPU, GPU, DRAM
- Image sensors: register file (SPI), pixel array, ADC, serializers (4), with clock, sync, and data lines
- FPGA: link train, deserializer, image decoders, pixel FIFOs, corner detectors, image stats, DMA write arbiter, control register BRAM, SPI sequencer, IMU BRAM, trigger, sync decimate
- IMU (SPI)
- TX2 to FPGA link: PCIe
- Initialization
  ○ allocate PCIe-visible RAM block
  ○ configure FPGA core, imager SPI registers, IMU registers
- Every frame
  ○ FPGA writes pixels via DMA to TX2 RAM, sends interrupt
  ○ kernel re-syncs CPU caches
  ○ kernel unblocks userland thread in ROS node
  ○ ROS node copies image into ROS message, sends it downstream
Data path (diagram): imagers -> FPGA -> DMA and MSI over PCIe -> TX2 RAM -> kernel module -> ROS driver -> image consumer nodes
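The per-frame handoff above can be mimicked in plain Python to show the ordering of events. This is a simulation only: a `bytearray` stands in for the PCIe-visible RAM block, a queue stands in for the interrupt and kernel wakeup, and a dict stands in for the ROS message. The tiny frame size and all names are hypothetical, and no real kernel or ROS API is used.

```python
import queue
import threading

WIDTH, HEIGHT = 8, 4                       # hypothetical tiny frame
dma_buffer = bytearray(WIDTH * HEIGHT)     # stands in for the PCIe-visible RAM block
frame_ready = queue.Queue()                # stands in for interrupt + kernel wakeup


def fpga_write_frame(seq):
    """Simulated FPGA: write pixels into the buffer, then signal completion."""
    dma_buffer[:] = bytes([seq % 256]) * len(dma_buffer)
    frame_ready.put(seq)


def driver_wait_and_copy(timeout=1.0):
    """Simulated driver thread: block until a frame lands, then copy it out."""
    seq = frame_ready.get(timeout=timeout)
    return {"seq": seq, "width": WIDTH, "height": HEIGHT, "data": bytes(dma_buffer)}


messages = []
for seq in range(3):
    writer = threading.Thread(target=fpga_write_frame, args=(seq,))
    writer.start()
    messages.append(driver_wait_and_copy())   # blocks until the frame is signaled
    writer.join()

print([m["seq"] for m in messages])
```

The copy into the message matters because the buffer is reused for the next frame; downstream consumers only ever see their own copy.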
Diagram labels: sensor hardware, actuator hardware
- Dynamic distributed message-passing framework
- Huge collection of open-source nodes
- Tools to parameterize, configure, and debug nodes
Security threats (diagram): camera -> camera driver -> vision node -> downstream nodes, with an evil node attached
- data sniffing by the evil node: prevented by encryption
- evil node publishing evil image data: prevented by authorization
https://github.com/osrf/tensorflow_object_detector
"Traditional" system (diagram: SSD, USB, GbE, GPU, USB, etc. all attached to the PCIe root)
- all peripherals on PCIe
- PCIe cannot reset / re-enumerate
- PCIe devices ready within 100ms
TX2-based system (diagram: Flash, USB, GbE, GPU, and USB sit on the TX2 (Tegra) SoC system fabric; the FPGA is attached to the PCIe root)
- only the FPGA hangs off PCIe
- PCIe kernel driver can be reloaded
- FPGA configure/reconfigure at any time
- elaborate "fast" configuration not needed
- all connectors on same side
- no configuration MCU
- FPGA upgrade (?)
- stack boards to reduce footprint