

SLIDE 1

IBM Research

MULTI-VEHICLE MAP FUSION USING GNU RADIO

OPTIMIZATION AND ACCELERATION OPPORTUNITIES

Augusto Vega, Akin Sisbot, Alper Buyuktosunoglu, Arun Paidimarri, David Trilla, John-David Wellman, Pradip Bose
IBM T. J. Watson Research Center

SLIDE 2

Acknowledgment

§ Thanks to the many IBM colleagues who contribute to and support different aspects of this work, to our esteemed university collaborators at Harvard, Columbia, and UIUC (Profs. David Brooks, Vijay Janapa Reddi, Gu-Yeon Wei, Luca Carloni, Ken Shepard, Sarita Adve, Vikram Adve, Sasa Misailovic), and to many brilliant graduate students and postdocs!
§ Special thanks to Dr. Thomas Rondeau, Program Manager of the DARPA MTO DSSoC Program

February 2020

This research was developed, in part, with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. This document is approved for public release: distribution unlimited.

SLIDE 3

Outline

§ Part 1: DARPA-funded EPOCHS project
  – Domain-specific (heterogeneous) SoC development
§ Part 2: EPOCHS Reference Application (“ERA”)
  – Application domain: multi-vehicle cooperative perception
§ Part 3: 802.11p Transceiver
  – Optimization and acceleration opportunities

SLIDE 4

Outline

§ Part 1: DARPA-funded EPOCHS project
  – Domain-specific (heterogeneous) SoC development

SLIDE 5

DARPA’s Domain-Specific System on Chip (DSSoC) Program

Program Manager: Dr. Tom Rondeau

§ Goal: to develop a heterogeneous system-on-chip (SoC) comprised of many cores that mix general-purpose processors, special-purpose processors, hardware accelerators, memory, and input/output (I/O) devices to significantly improve performance of applications within a domain *
§ A domain is larger than any one application
  – We target the “super” domain of embedded processors for autonomous/connected cars

[Figure labels: computer vision; software radio; “cooperative perception”. Source: IEEE Spectrum (July 2018)]

* Source: https://www.darpa.mil/program/domain-specific-system-on-chip

SLIDE 6

Application Domain: Cooperative Perception

§ Automakers use arrays of sensors to build redundancy into their systems

[Figure: “This Image is Why Self-Driving Cars Come Loaded with Many Types of Sensors. When’s a pedestrian not a pedestrian? When it’s a decal.” Source: MIT Technology Review]

SLIDE 7

Application Domain: Cooperative Perception

§ Automakers use arrays of sensors to build redundancy into their systems
§ We propose a complementary approach: multi-vehicle (cooperative) perception
  – Cars exchange locally generated maps
  – Each vehicle merges its local map with the received ones in real time

[Figure labels: sensing and computation capabilities; false predictions; car-centric; swarm-based]

SLIDE 8

Efficient Programmability Of Cognitive Heterogeneous Systems

“EPOCHS” → our proposed solution for the design challenge presented by the DSSoC program

SLIDE 9

Efficient Programmability Of Cognitive Heterogeneous Systems

Agile Flow: agile methodology to quickly design and implement an easily programmed domain-specific SoC for real-time cognitive decision engines in connected vehicles

§ “Super”-Domain: Software-Defined Radio + Computer Vision
§ Figure labels:
  – EPOCHS Reference Application
  – Compiler + Scheduler
  – Ontology & Design Space Exploration
  – Accelerators + NoC + Memory Architecture
  – Implementation
  – Domain-Specific SoC Hardware
  – FPGA Prototype
§ 10X – 100X reduction in person-years
§ FPGA prototype, emulation, optimization, software bring-up

SLIDE 11

The Big Picture (Where Does This Talk Fit In?)

DSSoC’s Full-Stack Integration (figure labels):

  • Application
  • Integrated performance analysis
  • Development Environment and Programming Languages
  • Libraries
  • Operating System
  • Compiler, linker, assembler
  • Intelligent scheduling/routing
  • Medium Access Control
  • Decoupled Software development
  • Hardware-Software Co-design
  • Heterogeneous architecture composed of Processor Elements: CPUs, graphics processing units, tensor product units, neuromorphic units, accelerators (e.g., FFT), DSPs, programmable logic, math accelerators

This talk: multi-vehicle map fusion using GNU Radio

SLIDE 12

Outline

§ Part 2: EPOCHS Reference Application (“ERA”)
  – Application domain: multi-vehicle cooperative perception

SLIDE 13

ERA: EPOCHS Reference Application

[Figure: EPOCHS Reference Application. Labels: Sensing Fabric; Communication Fabric; Computer Vision; Map Generation; Real-Time Map Fusion; V2V Communications; To other control modules]

§ “Cooperative Perception” for connected/autonomous vehicles
  – Multimodal sensing
  – Local occupancy map generation
  – DSRC-based V2V communication
  – Real-time map fusion

Contribute! https://github.com/IBM/era

SLIDE 14

ERA Main Components (Single Robot’s Viewpoint)

§ Raw sensor data is generated (simulated) using Gazebo in ERA v2
  – ERA v3 will replace Gazebo with an automotive simulation platform

[Figure: ERA pipeline from a single robot’s viewpoint. Labels: Gazebo; Depth Camera; 2D Map; Costmap 2D; ERA Msg Builder; ROS-GR Interface; GNU Radio; ERA Msg Interpreter; Map Merger; Scan; Occupancy Grid Map; Pose; ERA Msg; Payload]

SLIDE 15

ERA Main Components (Single Robot’s Viewpoint)

§ Raw sensor data is first converted into laser scans which are used to generate a 2D occupancy grid map
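
The two steps named on this slide can be sketched with NumPy. This is a minimal illustration, not the ROS nodes ERA actually runs: the pinhole parameters `fx` and `cx`, the grid size, the cell resolution, and the choice to mark only scan end-points are all assumptions made for the example.

```python
import numpy as np

def depth_row_to_scan(depth_row, fx, cx):
    """Convert one depth-image row (metres along the optical axis)
    into the bearing and range a planar laser scan would report."""
    u = np.arange(depth_row.size)
    angles = np.arctan2(u - cx, fx)        # bearing of each pixel column
    ranges = depth_row / np.cos(angles)    # axial depth -> Euclidean range
    return angles, ranges

def scan_to_grid(angles, ranges, size=20, resolution=0.5):
    """Mark scan end-points as occupied (100) in a size x size grid.
    Cells are `resolution` metres wide; the sensor sits at the grid centre."""
    grid = np.zeros((size, size), dtype=np.int8)
    valid = np.isfinite(ranges)            # depth cameras report NaN for no return
    x = ranges[valid] * np.cos(angles[valid])   # forward distance
    y = ranges[valid] * np.sin(angles[valid])   # sideways distance
    col = np.floor(x / resolution).astype(int) + size // 2
    row = np.floor(y / resolution).astype(int) + size // 2
    inside = (row >= 0) & (row < size) & (col >= 0) & (col < size)
    grid[row[inside], col[inside]] = 100
    return grid
```

A production costmap would also ray-trace the free space between the sensor and each end-point; that step is left out here to keep the sketch short.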

[Figure callouts: depth image → laser scan conversion; 2D occupancy map generation]

SLIDE 16

ERA Main Components (Single Robot’s Viewpoint)

§ Occupancy grid maps are serialized, compressed and put into a GNU Radio PDU
§ Outbound PDUs are injected into the 802.11p transceiver
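
A rough sketch of the serialize-and-compress step, using only the Python standard library. The header layout (`width`, `height`, `resolution`) and the use of zlib are assumptions for illustration; the slides do not say which wire format or compressor ERA uses.

```python
import struct
import zlib

# Hypothetical header: grid width, grid height, cell resolution in metres.
HEADER = struct.Struct("<HHf")

def pack_grid(cells, width, height, resolution):
    """Serialize a row-major occupancy grid (values -1..100) and compress it."""
    if len(cells) != width * height:
        raise ValueError("cell count does not match width * height")
    body = struct.pack(f"<{len(cells)}b", *cells)
    return zlib.compress(HEADER.pack(width, height, resolution) + body)

def unpack_grid(payload):
    """Inverse of pack_grid: decompress and rebuild the cell list."""
    raw = zlib.decompress(payload)
    width, height, resolution = HEADER.unpack_from(raw)
    cells = list(struct.unpack_from(f"<{width * height}b", raw, HEADER.size))
    return cells, width, height, resolution
```

In GNU Radio a PDU is a PMT pair of a metadata dictionary and a byte vector, so the compressed bytes could then be wrapped with something like `pmt.cons(pmt.make_dict(), pmt.init_u8vector(len(payload), list(payload)))` before being handed to the transceiver.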

802.11p transceiver (Transmitter, Receiver):

Open-source implementation by Bastian Bloessl

https://github.com/bastibl/gr-ieee802-11

SLIDE 17

ERA Main Components (Single Robot’s Viewpoint)

§ Locally and remotely generated occupancy maps are merged in real time to improve the accuracy of the view of the surroundings
§ In ERA v2, merging is merely adding maps
  – Executed several times per second (!)
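
“Merely adding maps” could look roughly like the sketch below. It assumes the two grids are already aligned in a common frame and follow the ROS occupancy-grid convention (-1 for unknown, 0 to 100 for occupancy); the clipping and the handling of unknown cells are illustrative choices, not details taken from ERA.

```python
import numpy as np

UNKNOWN = -1  # ROS occupancy-grid convention: -1 unknown, 0..100 occupancy

def merge_maps(local, remote):
    """Merge two aligned occupancy grids by adding their occupancy values."""
    local = np.asarray(local, dtype=np.int16)
    remote = np.asarray(remote, dtype=np.int16)
    known_l, known_r = local != UNKNOWN, remote != UNKNOWN
    # Unknown cells contribute nothing to the sum.
    total = np.where(known_l, local, 0) + np.where(known_r, remote, 0)
    merged = np.clip(total, 0, 100)
    # A cell stays unknown only if neither vehicle observed it.
    merged[~known_l & ~known_r] = UNKNOWN
    return merged.astype(np.int8)
```

Because the operation is element-wise, its cost grows linearly with the number of cells, which matters when the merge runs several times per second.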

SLIDE 18

Option 1: Two-Computer Setup

§ One Gazebo instance simulating a single robot/vehicle on each computer
§ Over-the-air 802.11p communication (10-MHz OFDM with up to 64-QAM modulation)
§ More info: https://github.com/IBM/era/wiki/ERA-in-two-computers
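
As a worked example of what “10-MHz OFDM with up to 64-QAM” implies for throughput, the nominal 802.11p data rates follow from 48 data subcarriers and an 8 µs OFDM symbol. The helper name below is introduced only for this illustration.

```python
from fractions import Fraction

DATA_SUBCARRIERS = 48   # data-carrying subcarriers per OFDM symbol
SYMBOL_TIME_US = 8      # 10-MHz channel: 6.4 us useful part + 1.6 us guard interval

def data_rate_mbps(bits_per_subcarrier, coding_rate):
    """Nominal PHY data rate in Mb/s for one modulation and coding scheme."""
    bits_per_symbol = DATA_SUBCARRIERS * bits_per_subcarrier * coding_rate
    return bits_per_symbol / SYMBOL_TIME_US

# Lowest and highest rates of the 10-MHz mode.
lowest = data_rate_mbps(1, Fraction(1, 2))    # BPSK, rate 1/2
highest = data_rate_mbps(6, Fraction(3, 4))   # 64-QAM, rate 3/4
```

These are raw PHY rates; preambles, MAC overhead and retransmissions reduce the throughput available for exchanging maps.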

[Figure: Robot 1 and Robot 2 each run the full ERA pipeline on their own computer; each computer drives a USRP, and the two communicate over 802.11p]

SLIDE 19

Option 2: Standalone Setup

§ Runs on a single computer, replacing over-the-air communication with network sockets
§ Easiest setup to start with
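
The idea of swapping the radio link for a socket can be sketched with a UDP loopback exchange. This is only an illustration of the concept: the function name is hypothetical, and the actual standalone setup connects two GNU Radio flowgraphs rather than two Python sockets.

```python
import socket

def send_between_robots(payload: bytes) -> bytes:
    """Send one payload from 'robot 1' to 'robot 2' over UDP on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        rx.settimeout(2.0)            # fail fast instead of blocking forever
        tx.sendto(payload, rx.getsockname())
        data, _sender = rx.recvfrom(65535)
        return data
```

Nothing upstream of the link (map generation, message building) or downstream of it (message interpretation, map merging) needs to change, which is what makes this the easiest setup to start with.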

[Figure: Robot 1 and Robot 2 each run the full ERA pipeline on the same computer and are connected through a socket]

SLIDE 20

Outline

§ Part 3: 802.11p Transceiver
  – Optimization and acceleration opportunities

SLIDE 21

802.11p Transceiver within ERA

The transceiver implements the V2V Communications block of the EPOCHS Reference Application.

§ TRANSMITTER FLOWGRAPH (from ROS to TX DAC)
  – WiFi MAC → WiFi Encoding and Packet Generation → OFDM Carrier Allocation → IFFT → Cyclic Prefix Generation
§ RECEIVER FLOWGRAPH (from RX ADC to ROS)
  – Sync Short → Sync Long → FFT → OFDM Frame Equalizer → Packet Decoder and MAC

Open-source IEEE 802.11p transceiver implemented in GNU Radio by Bastian Bloessl

https://github.com/bastibl/gr-ieee802-11
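
The last three transmitter stages (carrier allocation, IFFT, cyclic prefix) can be sketched for a single OFDM symbol in NumPy. This is a simplified illustration, not code from gr-ieee802-11: it uses the 64-point FFT and 16-sample prefix of the 802.11 OFDM PHY, and it leaves out pilot polarity scrambling, preambles and windowing.

```python
import numpy as np

FFT_SIZE, CP_LEN = 64, 16
PILOT_IDX = (-21, -7, 7, 21)
PILOTS = np.array(PILOT_IDX)
# 48 data subcarriers: -26..26 without DC and without the pilot positions.
DATA = np.array([k for k in range(-26, 27) if k != 0 and k not in PILOT_IDX])

def ofdm_symbol(data_symbols):
    """Map 48 constellation points to one time-domain OFDM symbol (80 samples)."""
    if len(data_symbols) != DATA.size:
        raise ValueError("expected 48 data symbols")
    freq = np.zeros(FFT_SIZE, dtype=complex)
    freq[DATA % FFT_SIZE] = data_symbols        # carrier allocation
    freq[PILOTS % FFT_SIZE] = [1, 1, 1, -1]     # pilot tones (scrambling omitted)
    time = np.fft.ifft(freq)                    # IFFT
    return np.concatenate([time[-CP_LEN:], time])  # prepend the cyclic prefix
```

A receiver undoes these steps in reverse: drop the first 16 samples, take a 64-point FFT, and read the data subcarriers back out.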

SLIDE 22

802.11p Transceiver within ERA

[Figure: execution-time profile of the transceiver; axis “Execution Time (%)”, 0% to 35%]

Functions identified for acceleration:

  • cexpf
  • viterbi_butterfly2
  • volk_32fc_32f_dot_prod…
  • fastnoise_source_c_impl::work
  • volk_32fc_x2_dot_prod…
  • mulsc3

SLIDE 23

Acceleration Options (for cexpf): Preliminary Results

[Figure: execution time per cexpf operation, in CPU cycles (axis ticks 5 to 40), split into computation time and memory-copy overhead. Options compared: CPU Baseline; FPGA w/DMA; CPU Vectorized; FPGA w/DMA (Fully Optimized)]

§ CPU: ARM Cortex-A53

SLIDE 24

Acceleration Options (for cexpf): Preliminary Results

§ CPU: ARM Cortex-A53
§ FPGA: Xilinx Zynq UltraScale+ MPSoC ZCU102

FPGA Implementation (dual data-path pipeline):

$e^{x + cj} = e^{x} \cdot (\cos c + j \sin c)$
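
The identity on this slide, e^(x + cj) = e^x (cos c + j sin c), can be checked with a few lines of Python. The function below is a reference sketch of what cexpf computes, written for clarity rather than speed; it is not the FPGA or NEON implementation.

```python
import cmath
import math

def cexp(z: complex) -> complex:
    """Compute e^(x + cj) as e^x * (cos c + j sin c)."""
    magnitude = math.exp(z.real)              # e^x
    real_part = magnitude * math.cos(z.imag)  # e^x * cos c
    imag_part = magnitude * math.sin(z.imag)  # e^x * sin c
    return complex(real_part, imag_part)
```

The real and imaginary parts are two independent products that share the factor e^x. That may be what the “dual data-path” label refers to, although the slide does not spell it out.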

SLIDE 25

Acceleration Options (for cexpf): Preliminary Results

§ CPU vectorization uses ARM’s SIMD extension (NEON)

SLIDE 26

Acceleration Options (for cexpf): Preliminary Results

§ Fully-optimized implementation (idealized bound)
  – 300 MHz (instead of 100 MHz)
  – Four parallel computation engines
  – Memory-copy elimination
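
One way to read the idealized bound is as a simple scaling model: compute time shrinks with the clock ratio and with the number of engines, and the copy overhead disappears. The sketch below assumes perfect scaling; the function and its parameters are hypothetical, and any cycle counts passed to it are placeholders, not measurements from the talk.

```python
def idealized_cycles(compute_cycles, copy_cycles,
                     clock_scale=3.0, engines=4, eliminate_copy=True):
    """Per-operation cost under perfect scaling (an upper bound on the gain).

    clock_scale: 300 MHz / 100 MHz = 3
    engines:     number of parallel computation engines
    """
    compute = compute_cycles / (clock_scale * engines)
    copy = 0.0 if eliminate_copy else copy_cycles
    return compute + copy
```

Real designs rarely reach this bound, since routing at higher clock rates, memory bandwidth and engine arbitration all eat into the ideal 12x compute speedup.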

SLIDE 27

ERA Roadmap

§ Version 1: two computer systems (ROS/Gazebo) linked by 802.11p over-the-air
§ Version 2: one computer system (ROS/Gazebo), using UDP
§ Version 3: two computer systems (CarSim/Apollo) linked by 802.11p over-the-air

SLIDE 28

ERA Roadmap

§ LAYER 1: World Simulators (sensor data source): CarSim, Gazebo, CARLA, LGSV
§ LAYER 2: Automotive Platforms (perception, plan and control): Apollo, Autoware
§ LAYER 3: Cooperative Vehicles Platform (swarming and V2X support): ERA
  – 802.11p, 5G, Multi-Vehicle Cooperation Logic
  – Interfaces: Apollo API, Autoware API, ROS, GNU Radio
§ Some raw sensor data is fed directly to ERA
§ ERA is only intended to enable cooperative automotive applications, with support for DSRC and, in the future, 5G
  – This makes ERA unique

SLIDE 29

Summary

§ The domain-specific (heterogeneous) SoCs era is here!
§ DARPA’s Domain-Specific System on Chip (DSSoC) Program
  – Our proposed application domain: multi-vehicle cooperative perception
  – Local sensing + V2V communications
  – The DSRC transceiver plays a critical role for real-time V2V communications

[Callouts: performance; power efficiency; ROS and GNU Radio “worlds” coexisting]

SLIDE 30

Summary

Turn ERA into a benchmark for cooperative mobility that can be easily “plugged” into existing platforms

Do you want to collaborate?

– Contact: ajvega@us.ibm.com
– GitHub: https://github.com/IBM/era

SLIDE 31

Thank You!

IBM T. J. Watson Research Center

Photo by Balthazar Korab. Source: http://www.shorpy.com/node/15488

ajvega@us.ibm.com
https://github.com/augustojv