SLIDE 24
Example HPC Application - GTC16
Summary
- Registration refines the re-alignment
– Problem: a joint histogram must be computed for each solution
- No compromise on the number of bins - 65536
- Exhaustive search
– Solution: leverage the K80 specifications
- 12 GB of memory
- 1 block per solution
- Leverage the small number of values per descriptor
121 (maximum) << 65536 bins
- Less than 100 seconds for 65K keypoints
(260M NMI coefficients)
- About 10K keypoints in less than 20 seconds
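The 260M figure can be reproduced from the other numbers on these slides. The check below is a minimal sketch, assuming "65K" means 65,536 keypoints and that every keypoint is scored at each position of the 63 x 63 search window mentioned on the kernel slide; the variable names are illustrative only.

```python
# Assumption: "65K keypoints" means 65,536, and every keypoint is
# evaluated at each position of a 63 x 63 search window.
keypoints = 65_536
window_side = 63

solutions_per_keypoint = window_side * window_side  # candidate positions per keypoint
nmi_coefficients = keypoints * solutions_per_keypoint

print(solutions_per_keypoint)             # 3969
print(nmi_coefficients)                   # 260112384
print(round(nmi_coefficients / 1e6))      # 260 (millions)
```

Under those assumptions each keypoint has 3,969 candidate solutions, which gives roughly 260 million coefficients in total.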
[Figure: list of indices for source; list of indices for the corresponding control subset; joint histogram]
Kernel
Find the best match for all keypoints
1 block per keypoint
Optimized for 63 x 63 search windows
64 threads per block (1 idle); each thread computes a “row” of solutions
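The work distribution above can be emulated on the CPU to make the indexing concrete. This is only a sketch, not the kernel itself: it assumes the window is centred on the keypoint (offsets -31 to +31), that thread 63 is the idle one, and that the best match is the highest score. `score` is a hypothetical stand-in for the per-solution similarity measure.

```python
WINDOW = 63             # search window side, from the slide
THREADS_PER_BLOCK = 64  # one more than WINDOW, so one thread has no row
HALF = WINDOW // 2      # assumption: centred window, offsets -31..+31


def best_match_for_keypoint(score):
    """Emulate one block. score(dy, dx) -> float is a hypothetical similarity."""
    per_thread_best = []
    for thread in range(THREADS_PER_BLOCK):
        if thread >= WINDOW:
            continue  # the idle thread (assumed to be the last one)
        dy = thread - HALF
        # Each thread walks its own row of 63 candidate solutions.
        row_best = max((score(dy, dx), dy, dx) for dx in range(-HALF, HALF + 1))
        per_thread_best.append(row_best)
    # Stand-in for the block-level reduction that picks the overall winner.
    return max(per_thread_best)


# Toy score that peaks at offset (dy=3, dx=-5).
best = best_match_for_keypoint(lambda dy, dx: -((dy - 3) ** 2 + (dx + 5) ** 2))
print(best)  # (0, 3, -5)
```

On a GPU the outer loop would run in parallel across the threads of a block, and the final `max` would be a reduction across those threads.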
Sparse joint histogram
65536 bins but only 121 values
Leverage the 11 x 11 descriptor size
Create 2 lists (length 121) of intensity values
Update joint histogram count from lists
Loop over lists to retrieve aggregate count
Set aggregate count to 0 after first retrieval
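The four steps above can be sketched as follows. This is a CPU illustration under stated assumptions, not the presenter's kernel: it assumes 8-bit intensities (so 256 x 256 = 65,536 joint bins) and computes only the joint entropy; `joint_entropy` and `BINS_PER_AXIS` are names invented for the example.

```python
import math

BINS_PER_AXIS = 256                            # assumption: 8-bit intensities
joint = [0] * (BINS_PER_AXIS * BINS_PER_AXIS)  # 65,536 bins, allocated once


def joint_entropy(source, control):
    """Joint entropy (bits) of two equal-length intensity lists.

    Leaves `joint` all-zero on return, so it can be reused without a full clear.
    """
    n = len(source)
    # Steps 1-2: the two lists give the bin of every sample; count them.
    for a, b in zip(source, control):
        joint[a * BINS_PER_AXIS + b] += 1
    # Steps 3-4: revisit only the bins that were touched. Zeroing a bin after
    # its first read means a repeated (a, b) pair contributes exactly once.
    h = 0.0
    for a, b in zip(source, control):
        idx = a * BINS_PER_AXIS + b
        count = joint[idx]
        if count:
            p = count / n
            h -= p * math.log2(p)
            joint[idx] = 0
    return h


# Pairs (0,5) x2, (1,7), (1,9): probabilities 0.5, 0.25, 0.25.
print(joint_entropy([0, 0, 1, 1], [5, 5, 7, 9]))  # 1.5
```

The point of the design is that each solution costs at most 121 updates and 121 reads, never a pass over all 65,536 bins, and the histogram is already cleared for the next solution. An NMI coefficient would combine this joint entropy with the two marginal entropies; that step is omitted here.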