 
A Fully-Automated High Performance Geolocation Improvement Workflow for Problematic Imaging Systems

Devin White 1, Sophie Voisin 1, Christopher Davis 1, Andrew Hardin 1, Jeremy Archuleta 2, David Eberius 3

1 Scalable and High Performance Geocomputation Team, Geographic Information Science and Technology Group
2 Data Architectures Team, Computational Data Analytics Group
Oak Ridge National Laboratory
3 Innovative Computing Laboratory, Department of Electrical Engineering and Computer Science, University of Tennessee – Knoxville

GTC 2016 – April 5, 2016
Outline
• Project background
• System overview
• Scientific foundation
• Technological solution
• Current system performance

Managed by UT-Battelle for the Department of Energy
Background
• Overhead imaging systems (spaceborne and airborne) can vary substantially in their geopositioning accuracy
• The sponsor wanted an automated, near-real-time geocoordinate correction capability at ground processing nodes, upstream of their entire user community
• The extensible automated solution uses well-established photogrammetric, computer vision, and high performance computing techniques to reduce risk and uncertainty
• A robust multi-year advanced R&D portfolio is aimed at continually improving the system through science, engineering, software, and hardware innovation
• We are moving towards on-board processing

[Figure: example platforms: Satellites, Manned Aircraft, Unmanned Aerial Systems]
Isn’t This a Solved Problem?
• Systemic constraints
  – Space
  – Power
  – Quality/reliability of components
  – Subject matter expertise
  – Time
  – Budget
  – Politics
• Operational constraints
  – Collection conditions
  – Sensor and platform health
  – Existing software quality and performance
  – System independence
• Many of these issues are greatly amplified on UAS platforms
Sponsor Requirements
• Solution must:
  – Be completely automated
  – Be government-owned and based on open source/GOTS code
  – Be sensor agnostic by leveraging the Community Sensor Model framework
  – Be standards-based (NITF, OGC, etc.) to enable interoperability
  – Clearly communicate the quantified level of uncertainty using standard methods
  – Be multithreaded and hardware accelerated
  – Construct RPC and RSM replacement sensor models as well as generate SENSRB/GLAS and BLOCKA tagged record extensions (TREs)
  – Improve geolocation accuracy to within a specific value
  – Complete a run within a specific amount of time
• The first sensor supported is one of the sponsor’s most important, but also its most problematic
Technical Approach (General)
1. Ingest and preprocessing
2. Trusted source selection
3. Global localization (coarse alignment, in ground space)
4. Image registration to generate GCPs (fine alignment, in image space)
5. Sensor model resection and uncertainty propagation
6. Generation and export of new and improved metadata
PRIMUS Pipeline
• Photogrammetric Registration of Imagery from Manned and Unmanned Systems

[Diagram: Input NITF → PRIMUS → Output NITF, with R2D2 supplying Controlled Sources. Stage labels: Preprocessing, Source Selection, Orthorectification, Reprojection, Global Localization, Mosaicking, Registration, Resection, Metadata. Legend: CPU Implementation, GPU Implementation.]

Core Libraries:
• NITRO (Glycerin)
• GDAL
• Proj.4
• libpq (Postgres)
• OpenCV
• CUDA
• OpenMP
• CSM
• MSP
Source Selection
• Find and assemble trusted control imagery and elevation data that cover the spatial extent of an image.

[Diagram: Input: image → Source Selection → Imagery, Elevation]
Mosaic Generation

[Flowchart: Start → Create bounding box → Grow bounding box (150%) → Query R2D2’s DB → Returns image paths → Read images from disk → Create (elevation + geoid) mosaic; Mosaic imagery]
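The "grow bounding box" step can be sketched as below. This is a minimal illustration only: the `BBox` type and `grow` function are hypothetical names, and it assumes "150%" means each side of the box is scaled to 1.5 times its length about the box center, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in arbitrary map units (hypothetical type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def grow(box: BBox, factor: float = 1.5) -> BBox:
    """Scale a box about its center so each side is `factor` times as long.

    Assumption: a factor of 1.5 corresponds to the "150%" label on the slide.
    """
    cx = (box.min_x + box.max_x) / 2.0
    cy = (box.min_y + box.max_y) / 2.0
    half_w = (box.max_x - box.min_x) * factor / 2.0
    half_h = (box.max_y - box.min_y) * factor / 2.0
    return BBox(cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# A 4 x 2 box grows to 6 x 3 around the same center.
grown = grow(BBox(10.0, 20.0, 14.0, 22.0))
```

Padding the query box this way gives the later alignment steps control data beyond the image's nominal footprint, which matters when the initial geolocation is off.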
System Hardware
• CPU/GPU hybrid architecture
  – 12 Dell C4130 HPC nodes
  – Each node has:
    • 48 logical processors
    • 256GB of RAM
    • Dual high speed SSDs
    • 4 Tesla K80s
  – Virtual Machine option
A Note on Virtualization
• We ran VMware on one of our nodes with mixed results
• We were able to access one GPU on that node through a VM using PCI passthrough, but the other seven remained unavailable due to VMware software limitations
• VMware, GPU, and OS resource requirements limited us to two VMs per node, which is not very helpful
• We greatly appreciate the technical assistance NVIDIA provided as we conducted this experiment
• Verdict: It’s still a little too early for virtualization to be really useful for high-density compute nodes with multiple GPUs
Orthorectification Process

[Flowchart: Source image → Begin → Orthorectify. Control Selection: Create bounding box → Grow bounding box → Query R2D2’s DB → Returns image paths → Read images from disk → Create (elevation + geoid) mosaic. Next stage: Global Localization]
Orthorectification Solution
• Accelerate portions of our OpenMP-enabled code with GPUs using CUDA
  – Sensor Model calculations
  – Band Interpolation calculations
• Optimize both of the CUDA kernels and their associated memory operations
• Create in-house Transverse Mercator CUDA device functions
• Combine the Sensor Model and Band Interpolation kernels
Orthorectification Optimized
Orthorectification Performance
• JPEG2000-compressed commercial image pair (36,000 x 30,000 pixels each)
• GPU-enabled RPC orthorectification to UTM
• Each image is done in 8 seconds, using one eighth of a single node’s horsepower
• 65,000,000,000 pixels per minute per node, running on multiple nodes
• That includes building HAE terrain models on the fly from tiled global sources
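The per-node throughput figure follows from the other numbers on this slide. The sketch below reproduces the arithmetic under one assumption not stated in the slide: that "one eighth of a node" means eight images can be processed concurrently at the same 8-second rate. The function and parameter names are hypothetical.

```python
def pixels_per_minute_per_node(width: int, height: int,
                               seconds_per_image: int,
                               slots_per_node: int) -> int:
    """Throughput if `slots_per_node` images are orthorectified in parallel.

    Assumption: each image occupies 1/slots_per_node of a node and the
    per-image time does not degrade when all slots are busy.
    """
    pixels_per_image = width * height
    return pixels_per_image * slots_per_node * 60 // seconds_per_image


# 36,000 x 30,000 pixels, 8 seconds per image, 8 concurrent images per node.
rate = pixels_per_minute_per_node(36_000, 30_000, 8, 8)
```

Under these assumptions the result is 64.8 billion pixels per minute, which agrees with the quoted 65,000,000,000 after rounding.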
Global Localization - Coarse Adjustment
• Roughly determine where source and control images match.
• Adjust the sensor model.
• Triage step in the pipeline.

Input: source and control images
Output: coarse sensor model adjustments

[Diagram: source (S) and control (C) images before and after Global Localization]
Computation - Solution Space
• Solution space:
  – Each possible shift (exhaustive search)
• Solution:
  – Similarity coefficient between the source and the control sub-image

[Diagram: source image S positioned within control image C; the set of shifts forms the solution space]
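The exhaustive search over shifts can be sketched as follows. This is an illustration, not the PRIMUS implementation: `exhaustive_search` and `neg_sad` are hypothetical names, the solution-space size is assumed to be every placement of the source fully inside the control, and a negated sum of absolute differences stands in for the similarity coefficient to keep the example short.

```python
def exhaustive_search(source, control, score):
    """Score every placement of `source` inside `control`.

    Returns (best_shift, grid), where grid[dy][dx] is the similarity
    coefficient for the control sub-image at row offset dy, column offset dx.
    Assumption: only shifts keeping the source fully inside the control count.
    """
    ns, ms = len(source), len(source[0])
    nc, mc = len(control), len(control[0])
    n, m = nc - ns + 1, mc - ms + 1  # size of the solution space
    grid = [[0.0] * m for _ in range(n)]
    best, best_shift = float("-inf"), None
    for dy in range(n):
        for dx in range(m):
            window = [row[dx:dx + ms] for row in control[dy:dy + ns]]
            grid[dy][dx] = score(source, window)
            if grid[dy][dx] > best:
                best, best_shift = grid[dy][dx], (dy, dx)
    return best_shift, grid


def neg_sad(a, b):
    """Stand-in similarity: negated sum of absolute differences (not NMI)."""
    return -sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


control = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
source = [[6, 7], [10, 11]]  # copied from control at row 1, column 2
best_shift, grid = exhaustive_search(source, control, neg_sad)
```

Every cell of `grid` is independent of the others, which is what makes this search a natural fit for GPU parallelism.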
Similarity Metric
• Normalized Mutual Information

$$NMI = \frac{H_S + H_C}{H_J}$$

$$H = -\sum_{i=0}^{k} p_i \log_2 p_i$$

  – $H$ is the entropy
  – $p_i$ is the probability density function
  – $k \in [0..255]$ for S and C, $[0..65535]$ for J
• Histogram with masked area
  – Missing data
  – Artifact
  – Homogeneous area

[Diagram: Source image and mask: N_S x M_S pixels. Control image and mask: N_C x M_C pixels. Solution space: n x m NMI coefficients]
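A minimal CPU sketch of the metric above, for one source/control pair, might look like this. It assumes 8-bit pixel values passed as flat sequences, a joint bin index of `256 * s + c` (one way to obtain 65536 joint bins), and a mask where a truthy value means "keep this pixel". The function names and the zero return for a degenerate joint entropy are assumptions, not part of the original.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits from histogram counts; empty bins are skipped."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def nmi(source, control, mask=None):
    """Normalized mutual information (H_S + H_C) / H_J over unmasked pixels."""
    if mask is None:
        mask = [True] * len(source)
    pairs = [(s, c) for s, c, keep in zip(source, control, mask) if keep]
    total = len(pairs)
    if total == 0:
        return 0.0  # assumption: nothing left to compare after masking
    h_s = entropy(Counter(s for s, _ in pairs).values(), total)
    h_c = entropy(Counter(c for _, c in pairs).values(), total)
    # Joint histogram: one of 65536 bins per (source, control) value pair.
    h_j = entropy(Counter(256 * s + c for s, c in pairs).values(), total)
    if h_j == 0.0:
        return 0.0  # assumption: fully homogeneous overlap carries no signal
    return (h_s + h_c) / h_j


identical = nmi([0, 1, 2, 3], [0, 1, 2, 3])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
masked = nmi([0, 1, 2, 3, 9], [0, 1, 2, 3, 200], [1, 1, 1, 1, 0])
```

With this definition the score ranges from 1 (no shared information) to 2 (one image fully determines the other), so the best shift is the one with the largest coefficient.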
Visual Example
• Histogram computation (for normalized mutual information)
  – NVIDIA: Histogram64, Histogram256
  – Literature: joint histogram with 80x80 bins
  – Our problem: joint histogram with 65536 bins, computed n x m times on N_S x M_S data
Kernel families
• How to leverage the GPU to compute one solution, i.e. one joint histogram (65536 bins)
  – 1 kernel per NMI computation
    • Pros: use shared memory to piecewise fill the histogram
    • Cons: atomicAdd; __syncthreads for reduction; CPU call for each solution
  – 1 block per NMI computation (K1, K2)
    • Pros: use shared memory to piecewise fill the histogram; 1 kernel to evaluate all solutions
    • Cons: atomicAdd; __syncthreads for reduction
  – 1 thread per NMI computation (K3, K4, K5)
    • Pros: global memory access read only; no atomicAdd; no __syncthreads; 1 kernel to evaluate all solutions
    • Cons: stack frame 264192 bytes / thread
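The 264192-byte stack frame is consistent with each thread holding its own full set of histograms. The sketch below shows one decomposition that reproduces the figure; the 4-byte counter size and the layout (one joint histogram plus two 256-bin marginals) are assumptions, since the slide does not describe how the frame is laid out.

```python
COUNTER_BYTES = 4          # assumption: 32-bit histogram counters
JOINT_BINS = 256 * 256     # 65536 bins for the joint histogram
MARGINAL_BINS = 256        # one marginal histogram each for S and C


def per_thread_histogram_bytes(counter_bytes: int = COUNTER_BYTES) -> int:
    """Bytes of private storage if one thread owns all three histograms."""
    return counter_bytes * (JOINT_BINS + 2 * MARGINAL_BINS)


def total_bytes(threads: int) -> int:
    """Private histogram storage for `threads` concurrent NMI evaluations."""
    return threads * per_thread_histogram_bytes()


frame = per_thread_histogram_bytes()
```

At this size the per-thread histograms cannot live in registers or shared memory, which is the trade-off against the atomicAdd-based designs listed above.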