Energy Efficient Adaptive Beamforming on Sensor Networks - Viktor K. Prasanna - PowerPoint PPT Presentation



SLIDE 1

Energy Efficient Adaptive Beamforming on Sensor Networks

Viktor K. Prasanna, Bhargava Gundala, Mitali Singh

Dept. of EE-Systems, University of Southern California
email: prasanna@usc.edu
http://ceng.usc.edu/~prasanna
http://pacman.usc.edu

SLIDE 2

Outline

  • Problem Definition
  • Computational Characteristics
  • Prior Solution
  • Power Optimizations
    • Sensor Node Level
    • Inter Node Level
  • Challenges/Discussion

SLIDE 3

Problem Scenario

Energy Constrained Network

[Figure: network of sensor nodes, some passive and some active]

SLIDE 4

Beamforming

Def: the technique that spatially filters the signals received from an array of sensors and estimates the spatial features of the sources.

Procedure:

  • 1. Passively and repeatedly sample acoustic propagation wave field signals
  • 2. Linearly combine the input data with a weight matrix to form a sonar beam for a particular direction of look

Adaptive Sonar Beamforming: for high SNR and high resolution, time-changing signal and noise properties are included in the derivation of the weights, making them adapt accordingly.
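The linear combination in step 2 can be sketched as a conventional (delay-and-sum) beamformer; the array geometry, look direction, and noise level below are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Weight-application sketch for one direction of look: snapshots from
# N sensors are linearly combined with a steering-derived weight
# vector. All parameters here are hypothetical.
N = 8                      # sensors in a uniform linear array
d, wavelength = 0.5, 1.0   # element spacing, in wavelengths
theta = np.deg2rad(30)     # direction of look

# Narrowband steering vector for the look direction.
n = np.arange(N)
s = np.exp(-2j * np.pi * d / wavelength * n * np.sin(theta))

# Conventional weights: undo the phase delays, then average.
w = s / N

# One snapshot: a unit plane wave from theta plus sensor noise.
rng = np.random.default_rng(0)
x = s + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

# Beam output = linear combination of the sensor samples.
y = np.conj(w) @ x
print(abs(y))   # close to 1: the look-direction signal passes at roughly unit gain
```

An adaptive beamformer differs only in how w is derived: the weights are recomputed as the signal and noise statistics change.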

SLIDE 5

Space Time Adaptive Processing

Target Detection

[Figure: each CPI (Coherent Processing Interval) is a data cube of N elements x M PRIs (Pulse Repetition Intervals) x L range gates]

SLIDE 6

MITRE RT_STAP Benchmark

[Figure: input data cube, N = 22 elements x M = 64 PRIs x L = 1920 range gates]

T_latency = 161.25 msec & T_period = 32.25 msec

Processing pipeline: Input Data -> Preprocessing Step 1 -> Preprocessing Step 2 -> Doppler Processing -> Weight Computation -> Weight Application

SLIDE 7

Input Data Cube

[Figure: input data cube, M = 64 PRIs x N = 22 elements x L = 1920 range gates]

SLIDE 8

Sonar Signal Processing

Adaptive Beamforming

[Figure: processing chain: per-element FFT (sampling rate = 10 Hz ~ 25 kHz) -> Conventional Beamforming (time-domain element space to beam space) -> FFT -> Adaptive Beamforming (frequency-domain beam space, 100 ~ 5000 beams, output rate = 1 Hz ~ 100 Hz)]

SLIDE 9

An Example Adaptive Beamformer

MVDR (Minimum Variance Distortionless Response)

[Figure: per frequency bin (F bins), the N-channel FFT output feeds covariance estimation and factorization; a linear solver combines the factored covariance with the steering vectors to produce the beamformer weights, B beams per bin]
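The "Linear Solver" box corresponds to the standard MVDR closed form w = R^-1 s / (s^H R^-1 s). A minimal NumPy sketch for one frequency bin, with illustrative channel and snapshot counts:

```python
import numpy as np

# MVDR weights for a single frequency bin: R is the sample covariance
# of the N channels, s the steering vector. Sizes are illustrative.
rng = np.random.default_rng(1)
N, K = 4, 256     # channels, snapshots used for covariance estimation

# Simulated frequency-domain snapshots (noise only, for the sketch).
X = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

# Sample covariance with diagonal loading for numerical stability.
R = X @ X.conj().T / K + 1e-3 * np.eye(N)

s = np.ones(N, dtype=complex)      # steering vector (broadside look)

# Solve R u = s rather than forming R^-1 explicitly; a Cholesky
# factorization of R (the "Factorization" box) is the usual route.
u = np.linalg.solve(R, s)
w = u / (s.conj() @ u)

# Distortionless constraint: unit response in the look direction.
print(s.conj() @ w)    # approximately 1 + 0j
```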

SLIDE 10

Computational Characteristics

[Figure: initial data layout: the data distributed across sensors S1-S4, with outputs produced per subproblem]

  • Overall processing consists of a sequence of subproblems
  • Computational requirements are different for each subproblem
  • A large amount of data is repeatedly processed in real time
  • Data access patterns change from subproblem to subproblem
  • Throughput and latency performance requirements

SLIDE 11

Adaptive Processing

[Figure: input data cube, M = 64 PRIs x N = 22 elements x L = 1920 range gates]

Key Problems

  • Doppler Processing (FFT)
  • Weight Computation (covariance matrix factorization; the adaptation step)
  • Weight Application (matrix-vector product; the apply step)

SLIDE 12

Prior Solution

Architecture: a tightly coupled collection of processors

High bandwidth, low latency network

Target detection

SLIDE 13

Key Issue: Communication Cost

Coarse grain machines: powerful processing nodes

  • SP-2: Typical Configuration
    • 640 Mflops/node
    • 64 MB – 4 GB Memory
    • 4.5 – 36.2 GB Internal Disk
  • T3E: Typical Configuration
    • 1200 Mflops/node (T3E-1200)
    • Local Memory Access Time: 87 ~ 253 nsec
    • Global Memory Access Time: 1 ~ 2 µsec (SHMEM)

Large software overhead for message transfer

  • SP-2: ~39 µsec overhead/message using MPL/MPI, ~9 nsec/byte/node transfer rate
  • local memory access: 100's of nsec

SLIDE 14

Key Idea- Data Remapping

[Figure: data access patterns of subproblems S1, S2, S3 over processors P0 ... P3, with a possible remap between consecutive subproblems]

Benefits of Remapping Must Exceed the Overhead

SLIDE 15

Impact of Data Remapping

Implementation performed on the IBM SP-2 at MHPCC
Code developed using C, MPI, and ESSL

[Chart: our results vs. results reported in IPPS '95]

SLIDE 16

Lessons learnt

Objective: adaptive beamforming on parallel machines

  • Task level parallelism
  • Minimize communication cost
  • Data Remapping

SLIDE 17

Energy Efficiency

  • Power is critical and must be conserved
  • Reduce power dissipation at the sensor node level: energy efficient algorithms
  • Decrease power dissipation at the inter-node level: optimize the communication cost between sensors

[Figure: energy-constrained network of sensors]

SLIDE 18

Power Model for a Processing Element

Power_Total = Power_Processor + Power_DataBus + Power_Memory

Power_unit = Power_Dynamic + Power_Static = 0.5 f(n) C V² f_Active + V I_Leakage

F_max ∝ (V - V_t)/V

[Figure: processor (FU + cache) and memory on a shared data bus, with separate frequency controls f_p (processor) and f_b (bus)]
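The power model above can be exercised numerically; the activity factor, capacitance, voltage, clock rate, and leakage current below are purely illustrative, chosen only to show the relative magnitudes.

```python
# Sketch of the slide's power model. All numeric values here are
# hypothetical, not measurements.
def unit_power(a, C, V, f_active, I_leak):
    dynamic = 0.5 * a * C * V**2 * f_active   # switching power
    static = V * I_leak                       # leakage power
    return dynamic + static

# a = 0.3, C = 1 nF, V = 1.5 V, f = 200 MHz, I_leak = 1 mA
p = unit_power(0.3, 1e-9, 1.5, 200e6, 1e-3)
print(round(p, 4), "W")   # 0.069 W (0.0675 dynamic + 0.0015 static)

# Lowering V cuts dynamic power quadratically, but the achievable
# clock scales as F_max proportional to (V - Vt)/V, so frequency
# must drop too.
def f_max(V, Vt, k=1.0):
    return k * (V - Vt) / V

assert f_max(1.2, 0.4) < f_max(1.8, 0.4)   # lower V, lower max clock
```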

SLIDE 19

Reduce Processor-Memory Data Traffic

Instructions for memory access consume a lot of power

Reduce # of memory accesses: reduce cache misses

  • high data reuse in cache
  • use registers

Reduce power consumed on the data bus

Instruction (Intel 486DX2)    Energy (10⁻⁸ Joules)
MOV [BX], DX                  4.30
MOV DX, [BX]                  3.53
MOV DX, BX                    2.49

SLIDE 20

Example: Matrix Multiplication

Do i = 0, n-1
  Do j = 0, n-1
    A[i,j] = 0
    Do k = 0, n-1
      A[i,j] = A[i,j] + B[i,k] x C[k,j]

Energy = αn³ + β(n + n²)n + γ(3n²) ≈ (α + β)n³

Time = n³ + lower order terms

[Figure: access patterns: A by (i,j), B by (i,k), C by (k,j); cache size = n]

SLIDE 21

Optimization I: Reduce Bus Traffic

Block Matrix Multiply

Energy = αn³ + 2β(n·n^(1/2))n + γ(3n²)

Time = n³ + lower order terms

[Figure: blocked n x n matrix multiply with √n x √n blocks]
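The blocking idea can be sketched directly; the tile size b plays the role of √n, and the matrix size here is illustrative.

```python
import numpy as np

# Blocked (tiled) matrix multiply: operating on b x b tiles that fit
# in cache raises data reuse, which is what cuts the bus-traffic term
# in the energy model.
def blocked_matmul(B, C, b):
    n = B.shape[0]
    A = np.zeros((n, n))
    for i0 in range(0, n, b):           # tile row of A
        for j0 in range(0, n, b):       # tile column of A
            for k0 in range(0, n, b):   # accumulate over tiles
                A[i0:i0+b, j0:j0+b] += (
                    B[i0:i0+b, k0:k0+b] @ C[k0:k0+b, j0:j0+b]
                )
    return A

rng = np.random.default_rng(2)
B = rng.standard_normal((8, 8))
C = rng.standard_normal((8, 8))
assert np.allclose(blocked_matmul(B, C, 4), B @ C)   # same result
```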

SLIDE 22

Optimization II: Reduce Peak Bus Bandwidth

[Figure: pipelined schedule overlapping the transfer of blocks of A, B, and C with computation, so the peak bus bandwidth requirement stays proportional to the processor rate]

Bus Data Rate ∝ Processor Rate!

SLIDE 23

Optimization III: Application directed Data Layouts

  • Applications have different data access patterns
    • Matrices accessed by rows, columns, diagonals, sub-squares
    • Tree structures accessed along paths, sub-trees
  • "Naive" data layouts degrade performance
    • Large working sets cause capacity misses
    • Improper alignment in memory causes conflict misses

[Figure: a 4 x 4 matrix a_{0,0} ... a_{3,3} mapped onto Pages 0-3 under a row-major layout vs. a block layout]
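The two layouts in the figure can be written as explicit address maps; this indexing sketch assumes pages of n elements with n a perfect square, and n = 4 as in the figure.

```python
import math

# Row-major layout: a[i][j] lives at linear address i*n + j.
def row_major(i, j, n):
    return i * n + j

# Block layout: each sqrt(n) x sqrt(n) sub-square is stored
# contiguously, so one sub-square fills exactly one page of n elements.
def block_major(i, j, n):
    b = math.isqrt(n)                  # block edge
    block = (i // b) * b + (j // b)    # which sub-square (page)
    offset = (i % b) * b + (j % b)     # position inside the sub-square
    return block * n + offset

n = 4
# a[1][1] lands on page 1 row-major but page 0 under the block layout,
# so a 2x2 working set touches one page instead of two.
print(row_major(1, 1, n) // n, block_major(1, 1, n) // n)   # 1 0
```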

SLIDE 24

Cache Friendly Algorithms

Cache friendly:
  • High data reuse
  • Low cache pollution
  • Regular access patterns

Data layouts:
  • Static data layouts (Matrix Multiply)
  • Dynamic data layouts (FFT)

SLIDE 25

Fast Fourier Transform

DFT: Cooley-Tukey Algorithm

Compute a DFT of size N = N1 x N2:

  • Step 1: compute N2 DFTs of size N1
  • Step 2: multiply by twiddle factors
  • Step 3: compute N1 DFTs of size N2

Divide and conquer recursively
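The three steps can be sketched with NumPy, using np.fft.fft as the base-case DFT kernel (the recursion bottoms out there rather than in hand-optimized kernels):

```python
import numpy as np

# One Cooley-Tukey decomposition step for N = N1 * N2, with the input
# viewed as an N1 x N2 matrix: x[n1*N2 + n2] = A[n1, n2] and the
# output indexed as X[k1 + N1*k2].
def cooley_tukey(x, N1, N2):
    N = N1 * N2
    A = x.reshape(N1, N2)
    # Step 1: N2 DFTs of size N1 (down the columns, stride-N2 data).
    B = np.fft.fft(A, axis=0)
    # Step 2: multiply twiddle factors W_N^(k1*n2).
    k1 = np.arange(N1)[:, None]
    n2 = np.arange(N2)[None, :]
    B = B * np.exp(-2j * np.pi * k1 * n2 / N)
    # Step 3: N1 DFTs of size N2 (along the rows).
    C = np.fft.fft(B, axis=1)
    # Reorder so the flat output index is k1 + N1*k2.
    return C.T.reshape(N)

x = np.random.default_rng(3).standard_normal(12)
assert np.allclose(cooley_tukey(x, 3, 4), np.fft.fft(x))   # matches a direct FFT
```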

Current Approach

MIT FFTW

  • Determine optimal factorization
  • Perform low level optimizations for kernels
  • Construct larger size FFTs from kernels

Key Assumption

All DFTs of the same size have the same execution time

SLIDE 26

Problem with Current Approach

All N-point DFTs do not have the same cost! Different factorizations lead to different data access patterns with various strides, and stride affects execution time.

Sun Ultra 1: 167MHz, L2 Cache = 512 KB = 32 K points

32-point FFT with Strided Access - Experimental Results

N = 32

[Plot: execution time (µsec) vs. stride 2^s]

SLIDE 27

Our Approach

  • Reorganize the input data layout to change non-unit-stride access to unit stride
  • Dynamic Data Layout: perform data reorganization during computation

[Figure: N2 N1-point FFTs -> data reorganization -> N1 N2-point FFTs]
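The reorganization step can be illustrated with array strides; the sizes N1 and N2 below are arbitrary.

```python
import numpy as np

# Dynamic data layout sketch: a column of the N1 x N2 view is a
# stride-N2 access; an explicit transpose copy mid-computation turns
# the same elements into a unit-stride row.
N1, N2 = 4, 8
x = np.arange(N1 * N2, dtype=np.float64)

A = x.reshape(N1, N2)
col = A[:, 3]                      # strided view of one column
assert col.strides[0] == N2 * x.itemsize   # stride of N2 elements

At = np.ascontiguousarray(A.T)     # the data reorganization (a copy)
row = At[3, :]                     # same values, now unit stride
assert row.strides[0] == x.itemsize
assert (col == row).all()
```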

SLIDE 28

Example

FFTW: 1611.125 ms
USC approach: 1039.6496 ms

54.96% improvement over state-of-the-art FFTW package on DEC Alpha

Decomposition trees for a 1024*1024 point FFT

SLIDE 29

Other Techniques for Node Level Power Optimizations ?

  • Voltage/frequency scaling: f_max ∝ (V - V_t)/V
  • Power management (idle/sleep/active states)
  • Reduce precision
  • Clock gating

Instruction (Fujitsu SPARC '934)    Energy (10⁻⁸ Joules)
MUL                                 3.26
OR                                  3.26

SLIDE 30

Current Work

Development and verification of the techniques proposed for power optimization

Existing simulators:

  • SimplePower (based on the SimpleScalar architecture)
  • JouleTrack (code length limitations)

Board level power measurements:

  • Brutus evaluation board (SA-1100)

Build a functional level power simulator:

  • Fast, with an acceptable level of accuracy
  • Develop a multiprocessor power model

SLIDE 31

Space Time Representation

A ⊗ B for N x N matrices, c = cache size

[Figure: space-time view of blocked matrix multiply: results computed in √c x √c blocks, blocks scheduled row-major]

  • Compute results in each block; schedule blocks row-major
  • N²/c steps
  • Data per step ∝ N√c
  • Operations per step ∝ Nc
  • Data reuse per step ∝ √c
  • Total traffic ∝ (N²/c) · N√c = N³/√c

SLIDE 32

Theorem: A unidirectional space-time representation leads to cache friendly algorithms => energy efficient algorithms

SLIDE 33

Network level Energy Optimization

  • Computation cost is much lower than communication cost
  • The radio interface consumes a large amount of power
  • Energy to transfer 32 bits over 100 m in a WINS sensor node = ((600 + 300) mW ÷ 100 kbits/s) x 32 = 288 x 10⁻⁶ Joules
  • Energy to execute a 32-bit instruction on an SA-1100 processor = 1 ÷ 250 MIPS/watt = 0.004 x 10⁻⁶ Joules
  • Additional overhead for bits added for error correction
  • Retransmissions are frequent due to unreliable links (e.g. wireless)

WINS sensor node, power consumed:
  • Transmission (100 m, at 100 kbits/sec): 600 mW
  • Reception: 300 mW
  • Processor (SA-1100): 250 MIPS/watt
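The two energy figures on the slide follow directly from these numbers; reproducing the arithmetic:

```python
# Energy arithmetic from the slide's WINS / SA-1100 figures.
radio_power_w = 0.600 + 0.300    # transmit + receive power, watts
radio_rate = 100e3               # radio data rate, bits per second

comm_energy = radio_power_w / radio_rate * 32   # joules per 32 bits
comp_energy = 1 / 250e6          # joules per instruction (250 MIPS/watt)

print(comm_energy)                # 0.000288 J = 288 x 10^-6 J
print(comp_energy)                # 4e-09 J = 0.004 x 10^-6 J
print(comm_energy / comp_energy)  # sending 32 bits costs ~72,000 instructions
```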

SLIDE 34

Reduce Communication Cost

  • Exploit data redundancy to reduce data traffic
  • Improve locality of computation while assigning subtasks to nodes
  • Communication limited to closely placed nodes
    • Larger distance requires higher transmission power
    • Larger distance reduces reliability of the link

SLIDE 35

Network Level Power Optimization Issues

  • Topology of the network is unknown
    • Estimation of communication cost
    • Task allocation
  • Broadcast communication model

Need: a framework for energy efficient computation in ad hoc networks