Geolocation Improvement Workflow for Problematic Imaging Systems


SLIDE 1

Devin White¹, Sophie Voisin¹, Christopher Davis¹, Andrew Hardin¹, Jeremy Archuleta², David Eberius³

¹ Scalable and High Performance Geocomputation Team

Geographic Information Science and Technology Group

² Data Architectures Team

Computational Data Analytics Group

Oak Ridge National Laboratory

³ Innovative Computing Laboratory

Department of Electrical Engineering and Computer Science

University of Tennessee – Knoxville

GTC 2016 – April 5, 2016

A Fully-Automated High Performance Geolocation Improvement Workflow for Problematic Imaging Systems

SLIDE 2

Managed by UT-Battelle for the Department of Energy

Outline

  • Project background
  • System overview
  • Scientific foundation
  • Technological solution
  • Current system performance

SLIDE 3

Background

  • Overhead imaging systems (spaceborne and airborne) can vary substantially in their geopositioning accuracy
  • The sponsor wanted an automated, near-real-time geocoordinate correction capability at ground processing nodes upstream of their entire user community
  • The extensible automated solution uses well-established photogrammetric, computer vision, and high performance computing techniques to reduce risk and uncertainty
  • A robust multi-year advanced R&D portfolio aims at continually improving the system through science, engineering, software, and hardware innovation
  • We are moving towards on-board processing

Platforms: satellites, manned aircraft, unmanned aerial systems

SLIDE 4

Isn’t This a Solved Problem?

  • Systemic constraints
    – Space
    – Power
    – Quality/reliability of components
    – Subject matter expertise
    – Time
    – Budget
    – Politics
  • Operational constraints
    – Collection conditions
    – Sensor and platform health
    – Existing software quality and performance
    – System independence
  • Many of these issues are greatly amplified on UAS platforms

SLIDE 5

Sponsor Requirements

  • Solution must:
    – Be completely automated
    – Be government-owned and based on open source/GOTS code
    – Be sensor agnostic by leveraging the Community Sensor Model framework
    – Be standards-based (NITF, OGC, etc.) to enable interoperability
    – Clearly communicate the quantified level of uncertainty using standard methods
    – Be multithreaded and hardware accelerated
    – Construct RPC and RSM replacement sensor models as well as generate SENSRB/GLAS and BLOCKA tagged record extensions (TREs)
    – Improve geolocation accuracy to within a specific value
    – Complete a run within a specific amount of time
  • The first sensor supported is one of the sponsor’s most important, but also its most problematic

SLIDE 6

Technical Approach (General)

  1. Ingest and preprocessing
  2. Trusted source selection
  3. Global localization (coarse alignment, in ground space)
  4. Image registration to generate GCPs (fine alignment, in image space)
  5. Sensor model resection and uncertainty propagation
  6. Generation and export of new and improved metadata

SLIDE 7

PRIMUS Pipeline

  • Photogrammetric Registration of Imagery from Manned and Unmanned Systems

PRIMUS: Input NITF → Source Selection → Global Localization → Registration → Resection → Metadata → Output NITF

R2D2: Reprojection, Orthorectification, Mosaicking, Controlled Sources

Core Libraries:

  • NITRO (Glycerin)
  • GDAL
  • Proj.4
  • libpq (Postgres)
  • OpenCV
  • CUDA
  • OpenMP
  • CSM
  • MSP

Diagram legend: GPU Implementation, Preprocessing, CPU Implementation

SLIDE 8

Source Selection

  • Find and assemble trusted control imagery and elevation data that cover the spatial extent of an image.

Input: image → Source Selection → elevation, imagery

SLIDE 9

Mosaic Generation

Start → Create bounding box → Grow bounding box (150%) → Query R2D2’s DB (returns image paths) → Read images from disk → Mosaic imagery → Create (elevation + geoid) mosaic
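The bounding-box steps above can be sketched as follows. This is only an illustration: it assumes that "150%" means each side of the box is scaled to 1.5 times its length about the box center, and `grow_bbox` and `intersects` are hypothetical helpers, not part of PRIMUS or R2D2. The database query is stood in for by a plain rectangle-overlap test.

```python
def grow_bbox(min_x, min_y, max_x, max_y, factor=1.5):
    """Scale a bounding box about its center (assumption: 150% == factor 1.5)."""
    cx = (min_x + max_x) / 2.0
    cy = (min_y + max_y) / 2.0
    half_w = (max_x - min_x) / 2.0 * factor
    half_h = (max_y - min_y) / 2.0 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def intersects(a, b):
    """True when two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


# Hypothetical footprints standing in for rows returned by the control database.
footprints = {
    "tile_a.tif": (-5.0, -5.0, 0.0, 0.0),
    "tile_b.tif": (50.0, 50.0, 60.0, 60.0),
}
search_box = grow_bbox(0.0, 0.0, 10.0, 20.0)
selected = sorted(p for p, box in footprints.items() if intersects(search_box, box))
```

Growing the box before the query leaves a margin of control data around the image footprint, which the later coarse search needs in order to shift the source over the control mosaic.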

SLIDE 10

System Hardware

  • CPU/GPU hybrid architecture
    – 12 Dell C4130 HPC nodes
    – Each node has:
      • 48 logical processors
      • 256 GB of RAM
      • Dual high speed SSDs
      • 4 Tesla K80s
    – Virtual machine option

SLIDE 11

A Note on Virtualization

  • We ran VMware on one of our nodes with mixed results
  • We were able to access one GPU on that node through a VM using PCI passthrough, but the other seven remained unavailable due to VMware software limitations
  • VMware, GPU, and OS resource requirements limited us to two VMs per node, which is not very helpful
  • We greatly appreciate the technical assistance NVIDIA provided as we conducted this experiment
  • Verdict: It’s still a little too early for virtualization to be really useful for high-density compute nodes with multiple GPUs

SLIDE 13

Orthorectification Process

Begin → Create bounding box → Grow bounding box → Query R2D2’s DB (returns image paths) → Read images from disk → Create (elevation + geoid) mosaic → Orthorectify

Diagram labels: Source image, Control Selection, Global Localization

SLIDE 14

Orthorectification Solution

  • Accelerate portions of our OpenMP-enabled code with GPUs using CUDA
    – Sensor Model calculations
    – Band Interpolation calculations
  • Optimize both CUDA kernels and their associated memory operations
  • Create in-house Transverse Mercator CUDA device functions
  • Combine the Sensor Model and Band Interpolation kernels

SLIDE 15

Orthorectification Optimized

SLIDE 16

Orthorectification Performance

  • JPEG2000-compressed commercial image pair (36,000 x 30,000 each)
  • GPU-enabled RPC orthorectification to UTM
  • Each image is orthorectified in 8 seconds, using one eighth of a single node’s horsepower
  • 65,000,000,000 pixels per minute per node, running on multiple nodes
  • That includes building HAE terrain models on the fly from tiled global sources
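The per-node throughput figure can be checked with a few lines of arithmetic. The sketch below assumes that "one eighth of a single node" means eight images can be processed concurrently on one node; the function name is illustrative only.

```python
def pixels_per_minute_per_node(width, height, seconds_per_image, concurrent_images):
    """Pixels orthorectified per minute when several images run side by side."""
    pixels_per_image = width * height
    images_per_minute = concurrent_images * 60.0 / seconds_per_image
    return pixels_per_image * images_per_minute


# 36,000 x 30,000 pixels, 8 s per image, 8 concurrent images (assumption).
rate = pixels_per_minute_per_node(36_000, 30_000, 8.0, 8)
print(f"{rate:.3e} pixels per minute per node")
```

Under that assumption the result is 6.48e10, which rounds to the 65,000,000,000 pixels per minute quoted on the slide.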

SLIDE 18

Global Localization - Coarse Adjustment

  • Roughly determine where source and control images match.
  • Adjust the sensor model.
  • Triage step in the pipeline.

Input: source (S) and control (C) images

Output: coarse sensor model adjustments

SLIDE 19

Computation - Solution Space

  • Solution space:
    – Each possible shift (exhaustive search)
  • Solution:
    – Similarity coefficient between the source and the control sub-image

[Figure: source image S positioned within control image C; each shift is one entry in the solution space]
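The size of the exhaustive solution space follows from the two image sizes. A minimal sketch, assuming the source must lie entirely inside the control image at every shift (the function name is illustrative):

```python
def solution_space(source_shape, control_shape):
    """Return (n, m): the number of vertical and horizontal shifts of the
    source that keep it fully inside the control image."""
    s_rows, s_cols = source_shape
    c_rows, c_cols = control_shape
    n = c_rows - s_rows + 1
    m = c_cols - s_cols + 1
    if n < 1 or m < 1:
        raise ValueError("control image must be at least as large as the source")
    return n, m


# A 512 x 256 source inside a 991 x 383 control image (sizes given as rows, cols).
n, m = solution_space((256, 512), (383, 991))
print(n, m, n * m)
```

With these sizes the count is 128 x 480 = 61440, the largest solution space used in the deck's kernel timings.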

SLIDE 20

Similarity Metric

  • Normalized Mutual Information
  • Histogram with masked area
    – Missing data
    – Artifact
    – Homogeneous area

Source image and mask: $N_S \times M_S$ pixels

Control image and mask: $N_C \times M_C$ pixels

Solution space: $n \times m$ NMI coefficients

$$NMI = \frac{H_S + H_C}{H_J}$$

$$H = -\sum_{i=0}^{k} p(i)\,\log_2 p(i)$$

$H$ is the entropy, $p(i)$ is the probability density function, and the bin range $0..k$ is $0..255$ for $S$ and $C$ and $0..65535$ for $J$.
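As a CPU reference for the metric, the sketch below computes NMI from 256-bin marginal histograms and a 65536-bin joint histogram, with an optional mask, and evaluates it for every shift. It uses NumPy and is not the PRIMUS CUDA code; the function names and the 8-bit input assumption are illustrative.

```python
import numpy as np


def entropy(counts):
    """Shannon entropy in bits of a histogram given as raw counts."""
    counts = counts[counts > 0]
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def nmi(source, control, mask=None):
    """NMI = (H_S + H_C) / H_J for two equally sized 8-bit images.
    mask (optional) is True where a pixel should be used."""
    s = np.asarray(source).astype(np.int64).ravel()
    c = np.asarray(control).astype(np.int64).ravel()
    if mask is not None:
        keep = np.asarray(mask, dtype=bool).ravel()
        s, c = s[keep], c[keep]
    h_s = entropy(np.bincount(s, minlength=256))
    h_c = entropy(np.bincount(c, minlength=256))
    h_j = entropy(np.bincount(s * 256 + c, minlength=65536))  # joint histogram
    if h_j == 0.0:
        raise ValueError("joint entropy is zero (homogeneous or fully masked area)")
    return (h_s + h_c) / h_j


def exhaustive_search(source, control):
    """Score every shift of source inside control; return the best shift and all scores."""
    rows, cols = source.shape
    n = control.shape[0] - rows + 1
    m = control.shape[1] - cols + 1
    scores = np.empty((n, m))
    for dy in range(n):
        for dx in range(m):
            scores[dy, dx] = nmi(source, control[dy:dy + rows, dx:dx + cols])
    best = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(best[0]), int(best[1])), scores
```

Each of the n x m evaluations builds its own joint histogram, so the cost grows with the solution space times the source size. That product is what the GPU kernels on the following slides target.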

SLIDE 21

Visual Example

  • Histogram computation (for normalized mutual information)
    – NVIDIA
      • Histogram64
      • Histogram256
    – Literature
      • Joint histogram 80x80 bins
    – Our problem
      • (Joint) Histogram65536, computed n x m times on N_S x M_S data

SLIDE 22

Kernel families

  • How to leverage the GPU to compute one solution / one joint histogram (65536 bins)
    – 1 kernel per NMI computation
      • Pros: use shared memory to piecewise fill the histogram
      • Cons: atomicAdd; __syncthreads() for reduction; CPU call for each solution
    – 1 block per NMI computation (K1, K2)
      • Pros: use shared memory to piecewise fill the histogram; 1 kernel to evaluate all solutions
      • Cons: atomicAdd; __syncthreads() for reduction
    – 1 thread per NMI computation (K3, K4, K5)
      • Pros: read-only global memory access; no atomicAdd; no __syncthreads(); 1 kernel to evaluate all solutions
      • Cons: stack frame of 264192 bytes / thread

SLIDE 23

Kernel details

Kernel | K1 | K2 | K3 | K4 | K5
occupancy | 100% (all kernels)
threads / block | 128 | 256 | 128 | 128 | 128
stack frame (bytes) | 2048 | 1024 | 264192 | 264192 | 264192
total memory / block | 0.26 MB | 0.26 MB | 33.81 MB | 33.81 MB | 33.81 MB
total memory / SM | 4.19 MB | 4.19 MB | 541.06 MB | 541.06 MB | 541.06 MB
total memory / GPU | 0.54 GB | 0.54 GB | 7.03 GB | 7.03 GB | 7.03 GB
memory % | 0.47% | 0.47% | 61.06% | 61.06% | 61.06%
spill stores – spill loads | 0 – 0 (all kernels)
registers | 33 | 34 | 27 | 26 | 29
smem / block (bytes) | 3072 | 3072 | | |
smem / SM (bytes) | 49152 | 49152 | | |
smem % | 42.86% | 42.86% | 0.00% | 0.00% | 0.00%
cmem[0] – cmem[2] | 448 – 20 (all kernels)

  • K1, K2: partial entropy; atomicAdd; synchronization; 1 solution / block
  • K3: 2D index for the joint histogram; 1 solution / thread
  • K4: 1D index for the joint histogram; 1 solution / thread
  • K5: no if condition for mask; 1D index for the joint histogram; 1 solution / thread
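The memory figures for the one-thread-per-solution kernels (K3 to K5) can be reproduced with simple arithmetic. The sketch below rests on assumptions that are not stated on the slides: 4-byte histogram counters, one 65536-bin joint histogram plus two 256-bin marginal histograms per thread, 16 resident blocks per SM at 128 threads per block, and 13 SMs per K80 GPU.

```python
BYTES_PER_COUNT = 4      # assumption: 32-bit counters
JOINT_BINS = 65536
MARGINAL_BINS = 256

# Assumed decomposition of the per-thread stack frame: joint + 2 marginals.
stack_frame = (JOINT_BINS + 2 * MARGINAL_BINS) * BYTES_PER_COUNT


def footprint(stack_frame_bytes, threads_per_block, blocks_per_sm, sms_per_gpu):
    """Local-memory footprint in bytes per block, per SM, and per GPU."""
    per_block = stack_frame_bytes * threads_per_block
    per_sm = per_block * blocks_per_sm
    per_gpu = per_sm * sms_per_gpu
    return per_block, per_sm, per_gpu


per_block, per_sm, per_gpu = footprint(stack_frame, 128, 16, 13)
print(stack_frame, per_block / 1e6, per_sm / 1e6, per_gpu / 1e9)
```

Under these assumptions the results are 264192 bytes per thread, about 33.8 MB per block, 541 MB per SM, and 7.03 GB per GPU, in line with the table. This is why the per-thread approach depends on the large device memory of the K80.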

SLIDE 24

Kernel Timings with Respect to Solution Space

[Figure: kernel timings. x-axis: number of solutions (0 to 60,000); y-axis: time in seconds (0 to 100, with 25 marked); series: K1*, K1, K2*, K2, K3*, K3, K4*, K4, K5*, K5. Inset: 0 to 10,000 solutions, 0 to 2.5 seconds.]

Source images:

  • 512 x 256

Mask:

  • 0% - K*
  • 50% - K

30 control images:

  • 512 x 256 – 1 solution
  • 991 x 383 – 61440 solutions

SLIDE 25

Summary for Global Localization

  • Global Localization as coarse adjustment of the sensor model
    – Problem: joint histogram computation for each solution
      • No compromise on the number of bins: 65536
      • Exhaustive search
    – Solution: leverage the K80 specifications
      • 12 GB of memory
      • 1 thread per solution
      • Less than 25 seconds for 61K solutions on a 131K-pixel image

SLIDE 27

Registration - Fine Adjustment

  • Account for Global Localization coarse resolution

[Figure: source image S and control image C]

SLIDE 28

Registration Workflows

Control keypoint list:

(X,Y) | Descriptor
(152.511,148.398) | (123, 122, …, 56)
(101.124,88.6674) | (164, 45, …, 165)
⁞ | ⁞

Source keypoint list:

(X,Y) | Descriptor
(157.511,153.398) | (123, 122, …, 56)
(106.124,93.6674) | (164, 45, …, 165)
⁞ | ⁞

  • Option “Match”: detect keypoints in the source image and in the control image, describe both keypoint lists, then match descriptors with a metric to produce the tiepoint list
  • Option “Detect From”: detect and describe keypoints in the source image, then evaluate a metric within a search window of the control image around each keypoint to produce the tiepoint list

SLIDE 29

OpenCV Library

  • Leverage OpenCV 2.4.11

Detector | CPU | GPU
BRISK | ✓ | ~
DENSE | ✓ | ~
FAST | ✓ | ✓
GFTT (w/wo HARRIS) | ✓ | ~
MSER | ✓ | ~
ORB (HARRIS/FAST) | ✓ | ✓
SIFT | ✓ | ~
SIMPLEBLOB | ✓ | ~
STAR (CenSurE) | ✓ | ~
SURF | ✓ | ✓

Descriptor | CPU | GPU
BRIEF | ✓ | ~
BRISK | ✓ | ~
FREAK | ✓ | ~
INTENSITY* | ✓ | ✓
ORB (HARRIS/FAST) | ✓ | ✓
SIFT | ✓ | ~
SURF | ✓ | ✓

Matcher | CPU | GPU | Match | Detect from
BRUTEFORCE | ✓ | ✓ | ✓ | ✓
FLANN | ✓ | ~ | |
INTENSITY based* | ✓ | ✓ | ✓ | ✓

SLIDE 30

OpenCV limitation(s)

  • OpenCV 2.4.11
    – for the current source image
      • for each keypoint
        – point to the associated template / descriptor
        – point to the associated image / collection of descriptors
        – call the GPU function to compute the metric
        – find the best match
    – The keypoints and their associated template/image are managed outside the GPU call
    – Each template/image couple locks the GPU during its function call
  • In-house
    – for the current source image
      • call the GPU function to find the best match for all keypoints using the descriptor definition and the metric
    – The keypoints and their associated template/image are managed by the GPU call
    – All the template/image couples access the GPU during the same function call

SLIDE 31

Visual comparison

  • What is the difference?
    – OpenCV 2.4.11
    – In-house

[Figure: blocks organization and threads organization for each approach. Labels: “CPU management of the pointer to the images per keypoint”, “GPU management of the blocks and threads”, “Pointer to the sub-images”.]

SLIDE 32

Back to NMI as Similarity Metric

  • Normalized Mutual Information
  • Small “images” but numerous keypoints
    – Numerous keypoints
      • up to 65536 with GPU SURF detector
    – Image / descriptor size
      • 11 x 11 intensity values to describe
    – Search area
      • 73 x 73 control sub-image
    – Solution space
      • 63 x 63

Descriptors: 11 x 11 intensity values

Search windows: 73 x 73 pixels

Solution spaces: 63 x 63 NMI coefficients

$$NMI = \frac{H_S + H_C}{H_J}$$

$$H = -\sum_{i=0}^{k} p(i)\,\log_2 p(i)$$

$H$ is the entropy, $p(i)$ is the probability density function, and the bin range $0..k$ is $0..255$ for $S$ and $C$ and $0..65535$ for $J$.

SLIDE 33

Kernel details

  • Basic Kernel (K1)
    – Find the best match for all keypoints
      • 1 block per keypoint
    – Optimize for the 63 x 63 search windows
      • 64 threads / block, 1 idle
      • each thread computes a “row” of solutions
    – Limit to 1 joint histogram per block
      • Loop over entire histogram to compute
  • Optimized Kernel (K2)
    – Sparse joint histogram
      • 65536 bins but only 121 values
    – Leverage the 11 x 11 descriptor size
      • Create 2 lists (length 121) of intensity values
      • Update joint histogram count from lists
      • Loop over lists to retrieve aggregate count
      • Set aggregate count to 0 after first retrieval

[Figure: list of indices for source, list of indices for the corresponding control subset, and the resulting joint histogram]
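The sparse joint histogram idea behind K2 can be illustrated on the CPU. The sketch below is plain Python rather than the CUDA kernel, and the function name is hypothetical. It fills a 65536-bin histogram from two short lists of intensity values, then walks the lists, not the histogram, to accumulate the joint entropy, zeroing each bin after its first retrieval so the histogram is clean for the next solution.

```python
import math


def joint_entropy_sparse(src_vals, ctl_vals, hist):
    """Joint entropy in bits of paired 8-bit values.

    hist is a preallocated list of 65536 zeros; it is all zeros again on return.
    Only the bins touched by the (at most 121) pairs are ever visited.
    """
    n = len(src_vals)
    idx = [s * 256 + c for s, c in zip(src_vals, ctl_vals)]
    for i in idx:                 # update joint histogram counts from the lists
        hist[i] += 1
    h = 0.0
    for i in idx:                 # loop over the lists, not over 65536 bins
        count = hist[i]
        if count:                 # first retrieval of this bin
            p = count / n
            h -= p * math.log2(p)
            hist[i] = 0           # reset so repeats are skipped and hist stays clean
    return h


hist = [0] * 65536
h_j = joint_entropy_sparse([1, 1, 2, 3], [5, 5, 6, 7], hist)
```

In this toy case the pairs fall into three bins with counts 2, 1, and 1, giving a joint entropy of 1.5 bits. The loop cost depends on the descriptor size (121 values) rather than on the 65536 bins, which is the saving the slide describes.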

SLIDE 34

Kernel Timings with Respect to Number of Keypoints

[Figure: kernel timings. x-axis: number of keypoints (0 to 60,000); y-axis: time in seconds (0 to 400); series: K1, K2. Inset: 0 to 10,000 keypoints, 0 to 35 seconds, with the value 17.272 annotated.]

SLIDE 35

Summary for Registration

  • Registration refines the adjustment of the sensor model
    – Problem: joint histogram computation for each solution
      • No compromise on the number of bins: 65536
      • Exhaustive search
    – Solution: leverage the K80 specifications
      • 12 GB of memory
      • 1 block per solution
      • Leverage the number of values in the descriptors: 121 (maximum) << 65536
      • Less than 100 seconds for 65K keypoints, which computes 260M NMI coefficients
      • About 10K keypoints in less than 20 seconds
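The 260M figure follows from the descriptor and search window sizes given earlier in the deck. A short check, with an illustrative function name:

```python
def nmi_coefficient_count(keypoints, descriptor_size, window_size):
    """Total NMI evaluations when every keypoint's descriptor is slid over
    every position of its square search window."""
    shifts_per_axis = window_size - descriptor_size + 1
    return keypoints * shifts_per_axis * shifts_per_axis


# 65536 keypoints, 11 x 11 descriptors, 73 x 73 search windows.
total = nmi_coefficient_count(65536, 11, 73)
print(total)
```

Each keypoint contributes 63 x 63 = 3969 coefficients, so 65536 keypoints give 260,112,384, or roughly 260M.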

SLIDE 37

PRIMUS Pipeline Timings

Dataset | Global Localization source | Solution space | Registration source
D1 | 200 x 131 | 6834 | 3600 x 2674
D2 | 258 x 67 | 4250 | 4571 x 1555
D3 | 259 x 88 | 5980 | 4725 x 1607
D4 | 318 x 92 | 5745 | 5745 x 1954

SLIDE 38

PRIMUS Pipeline Timings

[Figure: Pipeline Timings. Stacked bars for source images D1 to D4; y-axis: time in seconds (0 to 40); modules: SourceSelection, GlobalLocalization, Registration, Resection, Misc.]

[Figure: Percentage for each module. Stacked bars for source images D1 to D4; y-axis: 0% to 100%.]

SLIDE 39

Questions?