Real-Time Sonar Beamforming on a Unix Workstation using Process - PowerPoint PPT Presentation

Real-Time Sonar Beamforming on a Unix Workstation using Process Networks and POSIX Threads Gregory E. Allen 1,2 Brian L. Evans 1 David C. Schanbacher 1 1 Embedded Signal Processing Laboratory The University of Texas at Austin 2 http://www.ece.utexas.edu/~allen/

Motivation
• Beamforming is computationally intensive (GFLOPS).
• Traditionally limited to expensive custom hardware.
• Real-time software implementation on a workstation.
  • Multi-processor workstations.
  • Real-time threads supported by modern operating systems.
  • Native signal processing.

Objectives
• Implement a 4 GFLOP sonar beamformer in software.
  • Evaluate the performance of sonar beamforming algorithms.
  • Capture parallelism and guarantee determinate bounded execution.
  • Use lightweight threads on a multiprocessor workstation.
• Assess feasibility of replacing a real-time custom hardware beamformer with a Unix workstation.

Time-Domain Beamforming
• Delay and sum weighted sensor outputs.
• Geometrically project the sensor elements onto a line to compute the time delays.

$$b(t) = \sum_{i=1}^{M} \alpha_i \, x_i(t - \tau_i)$$

where $b(t)$ is the beam output, $x_i(t)$ is the i-th sensor output, $\tau_i$ is the i-th sensor delay, and $\alpha_i$ is the i-th sensor weight.

[Figure: Projection for a beam pointing 20° off axis; x position (inches) vs. y position (inches), showing sensor elements and projected sensor elements.]
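The delay-and-sum equation can be sketched in discrete time as follows. This is a minimal illustration, not the authors' kernel: it assumes the delays have already been rounded to whole samples, and the function name `delay_and_sum` and its argument layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discrete-time delay-and-sum (illustrative sketch):
//   b[n] = sum over i of alpha[i] * x[i][n - delay[i]]
// Assumption: delays are whole numbers of samples. Samples that would be
// read before the start of a channel are treated as zero.
std::vector<float> delay_and_sum(const std::vector<std::vector<float>>& x,
                                 const std::vector<std::size_t>& delay,
                                 const std::vector<float>& alpha,
                                 std::size_t num_samples) {
    assert(x.size() == delay.size() && x.size() == alpha.size());
    std::vector<float> b(num_samples, 0.0f);
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Start at n = delay[i] so that n - delay[i] never underflows.
        for (std::size_t n = delay[i];
             n < num_samples && n - delay[i] < x[i].size(); ++n) {
            b[n] += alpha[i] * x[i][n - delay[i]];
        }
    }
    return b;
}
```

Rounding delays to whole samples is what perturbs the beam pattern, which is the motivation for interpolating to a finer time-delay resolution.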

Interpolation Beamforming
• Quantized time delays perturb the beam pattern.
• Sample at just above the Nyquist rate.
• Interpolate to obtain the desired time-delay resolution.

[Figure: Digital interpolation beamformer. Each sensor in the array is sampled by an A/D at interval ∆, interpolated up to interval δ = ∆/L, time-delayed at interval δ (N_1 δ through N_M δ), weighted (α_1 through α_M), and summed to give b[n].]

Interpolation Beamforming
• Modeled as a sparse FIR filter:
  • M: total sensors in array (80)
  • S: sensors used to calculate a beam (50)
  • D: maximum geometry delay (31)
  • P: points for interpolation filter (2)
  • B: number of beams calculated (61)

Coefficient filter length: $K = (D + P - 1)\,M$ (2560)
Non-zero coefficients: $C = P\,S$ (100)
Sparsity: $1 - C/K$ (96%)
MACs per sample: $B\,C$ (6100)

[Figure: Incoming data (1 sample) enters the data vector (1 by K), which is multiplied by the beam coefficients (K by B) to give beams 1 through B (1 by B).]
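The sizing arithmetic on this slide can be checked with a few lines of code. The struct and function names below are hypothetical; only the formulas and the parameter values (M = 80, S = 50, D = 31, P = 2, B = 61) come from the slide.

```cpp
#include <cassert>

// Sizing of the sparse FIR model (illustrative; names are hypothetical).
struct BeamformerSize {
    int M;  // total sensors in array
    int S;  // sensors used to calculate a beam
    int D;  // maximum geometry delay
    int P;  // points for interpolation filter
    int B;  // number of beams calculated

    int coefficient_length() const { return (D + P - 1) * M; }   // K
    int nonzero_coefficients() const { return P * S; }            // C
    double sparsity() const {                                     // 1 - C/K
        return 1.0 - static_cast<double>(nonzero_coefficients()) /
                         coefficient_length();
    }
    int macs_per_sample() const { return B * nonzero_coefficients(); }
};

// Builds a size description after basic sanity checks on the parameters.
BeamformerSize make_size(int M, int S, int D, int P, int B) {
    assert(M > 0 && S > 0 && S <= M && D >= 0 && P >= 1 && B > 0);
    return BeamformerSize{M, S, D, P, B};
}
```

With the slide's values, `make_size(80, 50, 31, 2, 61)` gives K = 2560, C = 100, a sparsity of about 96%, and 6100 MACs per sample, so only the non-zero coefficients are worth storing and multiplying.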

Interpolation Beamformer
• Performed in floating-point to preserve dynamic range.
• Generate sparse FIR beam coefficients using Matlab.

[Figure: Coefficients for a beam pointing 20° off axis; sample number vs. stave number.]
  • 2560-point sparse FIR filter viewed in 2-D.
  • Zero-valued coefficients are white, non-zero coefficients are black.
  • Array shape is visible in beam coefficients.

Vertical Beamforming
• Multiple vertical transducers for every horizontal position.
• Each vertical sensor column is combined into a stave.
  • No time delay or interpolation is required.
  • Staves are calculated by a simple dot product.
  • Integer-to-float conversion must be performed.
  • Output data must be interleaved.
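The dot-product step can be sketched as below. This is an illustration under stated assumptions, not the authors' kernel: the 16-bit integer sample type, the `[stave][vertical element]` memory layout, and the name `form_staves` are all assumptions, and output interleaving across fans is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Forms one float stave value per horizontal position from integer element
// data for a single time sample.
// Assumptions (not from the slides): samples are 16-bit integers, laid out
// as [stave][vertical element]; weights holds one weight per vertical element.
std::vector<float> form_staves(const std::vector<std::int16_t>& elements,
                               const std::vector<float>& weights,
                               std::size_t num_staves) {
    const std::size_t per_stave = weights.size();
    assert(elements.size() == num_staves * per_stave);
    std::vector<float> staves(num_staves, 0.0f);
    for (std::size_t s = 0; s < num_staves; ++s) {
        float acc = 0.0f;
        for (std::size_t v = 0; v < per_stave; ++v) {
            // Integer-to-float conversion happens here, once per element.
            acc += weights[v] * static_cast<float>(elements[s * per_stave + v]);
        }
        staves[s] = acc;
    }
    return staves;
}
```

Each stave needs only a handful of multiply-accumulates per input element, so the conversion and memory traffic make up a large share of the work in a kernel of this shape.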

System Block Diagram
• Vertical beamformer forms 3 sets of 80 staves from 10 vertical elements each.
• Each horizontal beamformer forms 61 beams from the 80 staves, using a two-point interpolation filter.

[Figure: Sensor (element) data feeds a three-fan vertical beamformer (500 MFLOPS). Its stave data (40 MB/sec each) feeds three digital interpolation beamformers (1200 MFLOPS each), which produce the Fan 0, Fan 1, and Fan 2 beams.]

Formal Design Methodology
• The Process Network model [Kahn, 1974].
• Superset of dataflow models of computation.
• Captures concurrency and parallelism.
• Provides correctness.
• Guarantees determinate execution of the program.

The Process Network Model
• A program is represented as a directed graph.
  • Each node represents an independent process.
  • Each edge represents a one-way FIFO queue of data.
• A node may have any number of input or output edges, and may communicate only via these edges.
• A node suspends execution when it tries to consume data from an empty queue (blocking reads).
• A node is never suspended for producing, so queues can grow without bound (non-blocking writes).

[Figure: Example process network graph.]

Bounded Scheduling
• Infinitely large queues cannot be implemented.
• The following scheduling policy will execute the program in bounded memory if it is possible [Parks, 1995]:
  1. Block when attempting to read from an empty queue.
  2. Block when attempting to write to a full queue.
  3. On artificial deadlock, increase the capacity of the smallest full queue until the producer associated with it can fire.
• Fits the thread model of concurrent programming.
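Rules 1 and 2 map naturally onto a bounded queue guarded by a mutex and two condition variables. The sketch below is an assumption-laden illustration, not the authors' code: it uses C++11 `std::thread`, `std::mutex`, and `std::condition_variable` rather than raw Pthreads calls, and the names `BoundedQueue`, `grow`, and `run_pipeline` are hypothetical. Detecting artificial deadlock (the trigger for rule 3) is not implemented; `grow` only shows what a scheduler could call once it has detected one.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// One edge of a process network: a one-way FIFO with blocking reads and,
// because memory is bounded, blocking writes.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Rule 2: block when attempting to write to a full queue.
    void put(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(value));
        not_empty_.notify_one();
    }

    // Rule 1: block when attempting to read from an empty queue.
    T get() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return value;
    }

    // Rule 3 hook: raise the capacity so a blocked producer can fire.
    void grow(std::size_t extra) {
        std::lock_guard<std::mutex> lock(mutex_);
        capacity_ += extra;
        not_full_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
    std::size_t capacity_;
};

// Two-node network: a producer node feeds a consumer node that sums its input.
long run_pipeline(int count, std::size_t capacity) {
    BoundedQueue<int> queue(capacity);
    long sum = 0;
    std::thread producer([&] { for (int i = 1; i <= count; ++i) queue.put(i); });
    std::thread consumer([&] { for (int i = 0; i < count; ++i) sum += queue.get(); });
    producer.join();
    consumer.join();
    return sum;
}
```

The result of `run_pipeline` does not depend on how the operating system interleaves the two threads or on the queue capacity, which is the determinacy property the process network model provides.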

Process Network Implementation
• Implemented in C++ using POSIX Pthreads.
• Each node corresponds to a thread.
• Low-overhead, high-performance, scalable.
• Granularity larger than a thread context switch.
• Symmetric multiprocessing operating system dynamically schedules threads.
• Efficient utilization of multiple processors.

Process Network Queues
• Nodes operate directly on queue memory, avoiding unnecessary copying.
• Queues use mirroring to keep data contiguous.
• Compensates for the lack of circular address buffers.
• Queues trade memory usage for reduced overhead.
• Virtual memory manager maintains data circularity.

[Figure: Queue data region followed by a mirror region holding the mirrored data.]
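The effect of mirroring can be illustrated with a simpler, copy-based variant. Note the substitution: the slides say the virtual memory manager maintains circularity, whereas this sketch writes each value twice (once in the data region, once in the mirror region) to get the same contiguity without any operating system support. The class name `MirroredRing` and its interface are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy-based mirrored ring buffer (a stand-in for virtual-memory mirroring).
// Storage is 2*n entries: a data region [0, n) and a mirror region [n, 2n).
// Every write lands in both, so any window of up to n recent values can be
// read as one contiguous run with no wrap-around handling in the kernel.
template <typename T>
class MirroredRing {
public:
    explicit MirroredRing(std::size_t n) : n_(n), data_(2 * n), head_(0) {
        assert(n > 0);
    }

    void push(const T& value) {
        const std::size_t i = head_ % n_;
        data_[i] = value;        // data region
        data_[i + n_] = value;   // mirror region
        ++head_;
    }

    // Pointer to the most recent `len` values, oldest first, contiguous.
    const T* latest(std::size_t len) const {
        assert(len <= n_ && len <= head_);
        const std::size_t start = (head_ - len) % n_;
        return data_.data() + start;
    }

private:
    std::size_t n_;
    std::vector<T> data_;
    std::size_t head_;  // total number of values pushed so far
};
```

Doubling the storage is the memory cost; in exchange, a consumer such as an FIR kernel can index its input window linearly. Mapping the same physical pages twice through the virtual memory system gives the same layout without the second write.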

Exploiting Parallelism

                Divide by beam    Divide by time
Latency         low               high
Memory usage    low               high
Cache usage     poor              good
Style           partial           batch
Target          embedded          workstation

[Figure: Space vs. time diagrams for the two ways of dividing the computation.]

• Strategies for high performance on a workstation:
  • Throughput is more important than memory usage or latency.
  • Keep kernel calculations smaller than the cache.
  • Calculate as much as possible while the data is in cache.

System Implementation
• Vertical beamformer forms 3 sets of 80 staves from 10 vertical elements each.
• Each horizontal beamformer forms 61 beams from the 80 staves, using a two-point interpolation filter.

[Figure: Sensor (element) data feeds a three-fan vertical beamformer (500 MFLOPS). Its stave data (40 MB/sec each) feeds three digital interpolation beamformers (1200 MFLOPS each), which produce the Fan 0, Fan 1, and Fan 2 beams.]

Integration with Process Networks
• A single CPU cannot achieve real-time performance.
• A horizontal beamformer node manages multiple worker nodes.
• The number of worker nodes is set as performance requirements dictate.
• Similar to the traditional thread pool model.
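One way to picture a manager node handing work to worker nodes is to split a block of output samples into contiguous time ranges, one per worker. The sketch below is a simplification under stated assumptions: it spawns `std::thread` workers per block instead of keeping persistent worker nodes connected by queues, and the name `run_by_time` and the `kernel(begin, end)` callback shape are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Divides the sample range [0, num_samples) by time into contiguous chunks
// and runs kernel(begin, end) on each chunk in its own worker thread.
void run_by_time(std::size_t num_samples, std::size_t num_workers,
                 const std::function<void(std::size_t, std::size_t)>& kernel) {
    assert(num_workers > 0);
    // Ceiling division so that every sample is covered.
    const std::size_t chunk = (num_samples + num_workers - 1) / num_workers;
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w) {
        const std::size_t begin = std::min(w * chunk, num_samples);
        const std::size_t end = std::min(begin + chunk, num_samples);
        if (begin == end) break;  // no samples left for further workers
        workers.emplace_back([&kernel, begin, end] { kernel(begin, end); });
    }
    for (std::thread& t : workers) t.join();
}
```

Because each worker touches a disjoint time range of the output, no locking is needed inside the kernel, and the worker count can be raised or lowered without changing the kernel itself.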

Kernel Performance Results
• Mean execution time over ten trials for 2.6 seconds of data.
• Sun Ultra Enterprise 4000 with 8 UltraSPARC-II CPUs at 336 MHz, running Solaris 2.6.

[Figure: Execution time and MFLOPS vs. CPUs for the horizontal and vertical kernels; seconds (dotted lines) and MFLOPS (solid lines) vs. threads in thread pool.]

Kernel        Performance                     Scalability
Horizontal    good at 1.22 FLOPS per cycle    good
Vertical      poor at 0.40 FLOPS per cycle    poor

System Performance Results
• Process network and thread pool results are within 1%; overhead is small.

Type               Seconds    MFLOPS
thread pool        5.053      2159.0
process network    5.024      2171.5

• Process network uses 25% less memory with lower latency.
• Scalability is evaluated by disabling CPUs.
• Process network scalability is good.
• Will continue to scale as more CPUs are added.

[Figure: Execution time and MFLOPS vs. CPUs; seconds (dotted lines) and MFLOPS (solid lines) for 2 to 8 CPUs.]
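The "within 1%" statement can be checked directly from the two rows of the table, taking the thread pool as the baseline:

```latex
\[
\frac{5.053 - 5.024}{5.053} \approx 0.0057 = 0.57\%,
\qquad
\frac{2171.5 - 2159.0}{2159.0} \approx 0.0058 = 0.58\%
\]
```

Both the execution time and the MFLOPS figures differ by roughly 0.6%, with the process network slightly ahead.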

Conclusion
• Implemented a 4 GFLOP software sonar beamformer.
  • Divide the computation by time and not by beam.
  • Use the Process Network model of computation.
  • POSIX Pthreads and a symmetric multiprocessing workstation.
• This 4 GFLOP beamforming system could execute in real time with 16 UltraSPARC-II CPUs at 336 MHz.
• We achieve real-time beamforming at a substantial savings in development cost and time.
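A rough check of the 16-CPU estimate, under two assumptions that the slides do not state explicitly: that the 2171.5 MFLOPS process network result was measured with all 8 CPUs enabled, and that throughput keeps scaling linearly with CPU count:

```latex
\[
\frac{16}{8} \times 2171.5\ \text{MFLOPS} = 4343\ \text{MFLOPS} > 4000\ \text{MFLOPS}
\]
```

Under those assumptions, 16 CPUs would leave a margin of about 8% over the 4 GFLOP requirement.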
