Computational Process Networks for Real-Time High-Throughput Signal and Image Processing Systems on Workstations
Gregory E. Allen
EE 382C - Embedded Software Systems
17 February 2000
http://www.ece.utexas.edu/~allen/
processors (e.g. 100 digital signal processors in a VME chassis)
64-bit registers, 8-bit and 16-bit fixed-point arithmetic
[Figure: thread pool with a supervisor thread, a pool of worker threads, and a queue of work requests]
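The supervisor/worker arrangement named here can be sketched in a few lines. This is a minimal illustration, not the presentation's implementation: it assumes C++11 `std::thread` in place of POSIX threads, and `WorkerPool`, `Submit`, and `ParallelSum` are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical WorkerPool: the supervisor queues work requests and a fixed
// pool of worker threads executes them.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t numWorkers) {
        assert(numWorkers > 0);
        for (std::size_t i = 0; i < numWorkers; ++i)
            workers_.emplace_back([this] { WorkerLoop(); });
    }

    // Lets the workers drain the remaining requests, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        workAvailable_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    // Called by the supervisor thread to queue one work request.
    void Submit(std::function<void()> request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requests_.push(std::move(request));
        }
        workAvailable_.notify_one();
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> request;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                workAvailable_.wait(lock, [this] { return stopping_ || !requests_.empty(); });
                if (requests_.empty()) return;  // stopping and nothing left to do
                request = std::move(requests_.front());
                requests_.pop();
            }
            request();  // run outside the lock so workers execute in parallel
        }
    }

    std::mutex mutex_;
    std::condition_variable workAvailable_;
    std::queue<std::function<void()>> requests_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Supervisor-side helper: split a sum into one work request per block.
double ParallelSum(const std::vector<double>& x, std::size_t numWorkers) {
    assert(numWorkers > 0);
    std::vector<double> partial(numWorkers, 0.0);
    const std::size_t block = (x.size() + numWorkers - 1) / numWorkers;
    {
        WorkerPool pool(numWorkers);
        for (std::size_t w = 0; w < numWorkers; ++w) {
            pool.Submit([&x, &partial, w, block] {
                const std::size_t begin = std::min(w * block, x.size());
                const std::size_t end = std::min(begin + block, x.size());
                for (std::size_t i = begin; i < end; ++i) partial[w] += x[i];
            });
        }
    }  // the pool destructor waits for every request to finish
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Each request writes only its own slot of `partial`, so the workers need no locking beyond the request queue itself.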
4-GFLOP sonar beamformers; volumes of under 50 units; 1999 technology
Comparison of Custom Hardware, Embedded COTS, and Commodity Workstation implementations. Criteria: development cost, development time, physical size (m³), reconfigurability, software portability, hardware upgradability.
more general
Design System, Cadence Signal Processing Work System)
[Figure: nodes A, B, C connected by queues P and Q; arc labels 4, 3, 2, 4]
A is the number of data words initially present
U is the number of words inserted by the producer on each firing
W is the number of words removed by the consumer on each firing
T is the number of words that must be in the queue before the consumer can fire, where T ≥ W
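A minimal sketch of the firing rule that A, U, W, and T define for a single queue. It tracks only the word count, not the data, and `CGQueue` with its member functions is a hypothetical name used for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of one queue in a computation graph.
struct CGQueue {
    std::size_t words;  // current occupancy; starts at A
    std::size_t U;      // words inserted per producer firing
    std::size_t W;      // words removed per consumer firing
    std::size_t T;      // words required before the consumer may fire

    CGQueue(std::size_t A, std::size_t u, std::size_t w, std::size_t t)
        : words(A), U(u), W(w), T(t) {
        assert(T >= W);  // the threshold may exceed what is removed, never less
    }

    // The producer always adds U words.
    void ProducerFire() { words += U; }

    // The consumer is enabled once at least T words are queued.
    bool ConsumerCanFire() const { return words >= T; }

    // Fires the consumer if enabled: it sees T words but removes only W.
    bool ConsumerFire() {
        if (!ConsumerCanFire()) return false;
        words -= W;
        return true;
    }
};
```

With T greater than W the consumer reads a window of T words and advances by W, which is how overlapping windows (for example, filter history) fit the model: `CGQueue q(0, 1, 1, 4)` describes a consumer that needs four words but discards one per firing.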
[Figure: dataflow graph with nodes A, B, C, D; arcs labeled with rates 1, P1, 1-P1, P2, 1-P2 and T/F control branches]
empty queue (blocking reads)
so queues can grow without bound
1. Block when attempting to read from an empty queue
2. Block when attempting to write to a full queue
3. On artificial deadlock, increase the capacity of the smallest full queue until its producer can fire
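The three rules above can be sketched with a bounded queue whose capacity can be raised at run time. This is an illustration under stated assumptions: `BoundedQueue`, `Grow`, and `SumThroughQueue` are hypothetical names, C++11 threads stand in for POSIX threads, and detecting artificial deadlock (deciding when to call `Grow`) is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Rule 2: block while the queue is full.
    void Write(const T& value) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return data_.size() < capacity_; });
        data_.push_back(value);
        notEmpty_.notify_one();
    }

    // Rule 1: block while the queue is empty.
    T Read() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !data_.empty(); });
        T value = data_.front();
        data_.pop_front();
        notFull_.notify_one();
        return value;
    }

    bool IsFull() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size() >= capacity_;
    }

    // Rule 3: a scheduler that has detected artificial deadlock would call
    // this on the smallest full queue to release its blocked producer.
    void Grow(std::size_t extra) {
        std::lock_guard<std::mutex> lock(mutex_);
        capacity_ += extra;
        notFull_.notify_all();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::condition_variable notFull_;
    std::deque<T> data_;
    std::size_t capacity_;
};

// Usage: a producer thread and a consumer exchange n integers through a
// queue that never holds more than `capacity` of them.
long SumThroughQueue(int n, std::size_t capacity) {
    BoundedQueue<int> q(capacity);
    std::thread producer([&q, n] {
        for (int i = 0; i < n; ++i) q.Write(i);
    });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += q.Read();
    producer.join();
    return sum;
}
```

Because reads and writes both block, the result does not depend on how the two threads are scheduled; only memory use depends on the capacity.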
e.g. digital filters and fast Fourier transforms (FFTs)
http://www.ece.utexas.edu/~allen/PNSourceCode/
[Figure labels: Pthread, Pthread]
[Figure: queue memory layout with a queue data region followed by a mirror region holding mirrored data]
(e.g. modulo addressing in DSPs)
[Figure: inputQ feeding a Node]
typedef float T;
while (true) {
    // blocking calls to get in/out data pointers
    const T* inPtr = inputQ.GetDequeuePtr(inThresh);
    T* outPtr = outputQ.GetEnqueuePtr(outThresh);
    DoComputation(inPtr, inThresh, outPtr, outThresh);
    // complete node transactions
    inputQ.Dequeue(inSize);
    outputQ.Enqueue(outSize);  // assumed counterpart of Dequeue for the output side
}
int main() {
    PNThresholdQueue<T> P (queueLen, maxThresh);
    PNThresholdQueue<T> Q (queueLen, maxThresh);
    MyProducerNode A (P);
    MyTransmuterNode B (P, Q);
    MyConsumerNode C (Q);
}
[Figure: A -> P -> B -> Q -> C]
[Figure: queue data region of length queueLen followed by a mirror region of length maxThresh]
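One way to see why a mirror region of maxThresh elements is enough: any block of up to maxThresh elements that starts inside the data region is then contiguous in memory, so a node never has to handle wraparound. The sketch below keeps the mirror in sync by explicit copying, which is an assumption made for illustration and not necessarily how PNThresholdQueue works. `MirroredQueue` is a hypothetical, single-threaded, non-blocking stand-in that returns `nullptr` where a real queue would block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class MirroredQueue {
public:
    MirroredQueue(std::size_t queueLen, std::size_t maxThresh)
        : queueLen_(queueLen), maxThresh_(maxThresh), buf_(queueLen + maxThresh) {
        assert(maxThresh > 0 && maxThresh <= queueLen);
    }

    // Pointer to `thresh` contiguous readable elements, or nullptr if fewer
    // are queued (a blocking queue would wait here instead).
    const T* GetDequeuePtr(std::size_t thresh) const {
        if (thresh > maxThresh_ || count_ < thresh) return nullptr;
        return &buf_[head_];
    }

    // Release n elements from the front of the queue.
    void Dequeue(std::size_t n) {
        assert(n <= count_);
        head_ = (head_ + n) % queueLen_;
        count_ -= n;
    }

    // Pointer to `thresh` contiguous writable slots, or nullptr if there is
    // not enough free space.
    T* GetEnqueuePtr(std::size_t thresh) {
        if (thresh > maxThresh_ || queueLen_ - count_ < thresh) return nullptr;
        return &buf_[Tail()];
    }

    // Commit n elements written through GetEnqueuePtr and copy them so the
    // start of the data region and the mirror region stay identical.
    void Enqueue(std::size_t n) {
        assert(n <= maxThresh_ && n <= queueLen_ - count_);
        const std::size_t tail = Tail();
        for (std::size_t i = tail; i < tail + n; ++i) {
            if (i >= queueLen_) {
                buf_[i - queueLen_] = buf_[i];  // written in mirror: copy back
            } else if (i < maxThresh_) {
                buf_[i + queueLen_] = buf_[i];  // written at start: copy forward
            }
        }
        count_ += n;
    }

private:
    std::size_t Tail() const { return (head_ + count_) % queueLen_; }

    std::size_t queueLen_;
    std::size_t maxThresh_;
    std::vector<T> buf_;  // data region [0, queueLen), then the mirror region
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

With queueLen 4 and maxThresh 3, a three-element write that starts at index 3 spills into the mirror region and is copied back to indices 0 and 1, so the reader still sees one contiguous block.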
[Figure: Hazard beam coverage. Side view (vertical coverage); top view (horizontal coverage)]
[Figure: Sensor positions and pointing angles (axis: x position); typical sensor directional response]
[Figure: Desired beam pointing angles (axis: x position); typical beam directional response]
$$ b(t) = \sum_{i=1}^{M} \alpha_i \, x_i(t - \tau_i) $$
b(t) is the beam output
x_i(t) is the ith sensor output
τ_i is the ith sensor delay
α_i is the ith sensor weight
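A beam formed from weighted, delayed sensor outputs can be sketched in discrete time as follows. The assumptions are that each delay is a whole number of samples and that samples before the start of a record are zero. `DelayAndSum` is a hypothetical helper, not code from the presentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// b[n] = sum over sensors i of weight[i] * x[i][n - delay[i]],
// with delays in whole samples and zeros assumed before sample 0.
std::vector<double> DelayAndSum(const std::vector<std::vector<double>>& x,
                                const std::vector<std::size_t>& delay,
                                const std::vector<double>& weight) {
    assert(x.size() == delay.size() && x.size() == weight.size());
    const std::size_t len = x.empty() ? 0 : x[0].size();
    std::vector<double> b(len, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(x[i].size() == len);
        // Start at n = delay[i] so that n - delay[i] is never negative.
        for (std::size_t n = delay[i]; n < len; ++n) {
            b[n] += weight[i] * x[i][n - delay[i]];
        }
    }
    return b;
}
```

When the delays match the arrival-time differences for a chosen direction, the sensor outputs line up and add coherently. In the usage checked here, impulses at samples 1 and 3 with delays of 2 and 0 both land on sample 3.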
[Figure: Projection for a beam pointing 20° off axis (axis: x position, inches); shows sensor elements and projected elements]
[Figure: sonar beamformer data flow]
Sensor data -> element data (40 MB/s each) -> three-fan vertical beamformer (500 MFLOPS) -> stave data (32 MB/s each) -> three digital interpolation beamformers (1200 MFLOPS each) -> Fan 0, Fan 1, and Fan 2 beams (beam data, 24 MB/s each)
Stave: the multiple vertical transducers at each horizontal position
[Figure: digital interpolation beamformer]
Stave data arrives at interval ∆, is interpolated up to interval δ = ∆/L, is time-delayed at interval δ (z^-N1 ... z^-NM), and is weighted by α1 ... αM to form b[n]
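A rough sketch of the idea: interpolating by L lets each stave be delayed by a whole number of fine steps δ = ∆/L rather than whole input samples. Two assumptions are made here that the slides do not confirm: linear interpolation stands in for a proper interpolation filter, and the beam output is produced at the original stave rate. `InterpAt` and `InterpolationBeamform` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value of x on the fine grid (L points per input sample) at index k.
// Linear interpolation is a stand-in for a real interpolation filter.
double InterpAt(const std::vector<double>& x, long k, long L) {
    if (k < 0) return 0.0;  // before the start of the record
    const std::size_t idx = static_cast<std::size_t>(k / L);
    if (idx >= x.size()) return 0.0;
    const double frac = static_cast<double>(k % L) / static_cast<double>(L);
    const double next = (idx + 1 < x.size()) ? x[idx + 1] : x[idx];
    return x[idx] + frac * (next - x[idx]);
}

// b[n] = sum over staves i of weight[i] * xi_fine[n*L - fineDelay[i]],
// where fineDelay[i] is the delay N_i counted in fine steps.
std::vector<double> InterpolationBeamform(
        const std::vector<std::vector<double>>& staves,
        const std::vector<long>& fineDelay,
        const std::vector<double>& weight,
        long L) {
    assert(L > 0);
    assert(staves.size() == fineDelay.size() && staves.size() == weight.size());
    const std::size_t len = staves.empty() ? 0 : staves[0].size();
    std::vector<double> b(len, 0.0);
    for (std::size_t i = 0; i < staves.size(); ++i) {
        assert(staves[i].size() == len);
        for (std::size_t n = 0; n < len; ++n) {
            const long k = static_cast<long>(n) * L - fineDelay[i];
            b[n] += weight[i] * InterpAt(staves[i], k, L);
        }
    }
    return b;
}
```

With L = 2 and a fine delay of 1, a stave is delayed by half an input sample, which whole-sample delays cannot express.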
Single beam output
[Figure: Performance vs. number of processors (x-axis: CPUs)]
efficiency of 94%
Real-time: 4.1 GFLOPS