Real-time image segmentation for Homeland Security exploiting Hyper-Q concurrency. Fanny Nina-Paravecino, David Kaeli. NU-MGH CUDA Research Center, Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA


  1. Real-time image segmentation for Homeland Security exploiting Hyper-Q concurrency. Fanny Nina-Paravecino, David Kaeli. NU-MGH CUDA Research Center, Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA

  2. Homeland Security (image credit: Thinkstock.com). GTC 2015, March 17-20, San Jose, California

  3. Homeland Security Alert! (image credit: snallabolaget.com)

  4. Homeland Security
     - Constraints of the input data:
       - Noise
       - Hundreds of frames per object

  5. Homeland Security
     - One key application for Homeland Security is high-quality luggage inspection at airports
     - This task is challenging because it involves the following constraints:
       - Near real-time response needed
       - Very high accuracy needed
     - We will explore using CUDA 6.5 and new hardware features to address these needs in this important application

  6. Outline for this presentation
     - Background on the imaging analysis problem
     - Connected Component Analysis
     - Performance optimization
     - NVIDIA's Hyper-Q
     - Performance results
     - Conclusion and future work

  7. Homeland Security: image dimensions
     - One frame: a 512 x 512 DICOM image
     - Multiple frames: ~700 images

  8. Object Detection Pipeline
     - Stages shown in the figure: Input, Preprocessing, Image Segmentation, Features Extraction, Object Detection

  9. Homeland Security
     - Image segmentation plays a key role in the compute pipeline when performing object detection.
     - Multiple algorithms:
       - Graph-based image segmentation [Felzenszwalb04]
       - Level Set [Shi05]
       - Spectral Clustering [Zelnik-Manor04]
       - Connected Component Labeling [Zhao10]

  11. Connected Component Labeling
      - Connected component labeling is a good fit given the constraints of the environment
      - It identifies neighboring segments possessing similar intensities
      - Potential for efficient segmentation
      - Provides high-quality results

  12. Connected Component Labeling
      - A lot of dependencies among neighbors!
      - Groups pixels by location and intensity
      - Example grid of pixel intensities (shown on the slide before and after Connected Component Labeling):

        1 1 2 2 2
        1 2 3 3 3
        2 2 3 3 4
        2 3 4 4 4
        2 3 2 2 2

      - Result: 7 segments, even though there are only four different intensities
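
The count on this slide can be checked with a small sequential sketch. This is not the accelerated algorithm from the talk, only a plain flood-fill reference in Python; the function name `label_components` is illustrative, and 4-connectivity (up, down, left, right neighbors) is an assumption, chosen because it yields the 7 segments the slide reports.

```python
from collections import deque

def label_components(image):
    """Label 4-connected regions of equal intensity; returns (labels, count)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not labeled yet"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # already part of an earlier segment
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# The 5 x 5 grid from the slide.
GRID = [
    [1, 1, 2, 2, 2],
    [1, 2, 3, 3, 3],
    [2, 2, 3, 3, 4],
    [2, 3, 4, 4, 4],
    [2, 3, 2, 2, 2],
]

labels, num_segments = label_components(GRID)
print(num_segments)  # 7
```

Intensity 2 alone forms three separate segments here, which is why the segment count exceeds the number of intensities.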

  14. How can we improve the performance of CCL?
      - Exploit inherent parallelism!
      - Dependencies among neighbors?
      - Stripe-based Connected Component Labeling [Zhao10]
      - Restructure the label storage
      - Merge stripe-based approach?
      - Exploit CUDA's Dynamic Parallelism
      - Further optimizations
      - Explore the potential of using Hyper-Q

  15. Accelerated Connected Component Labeling
      - Two phases:
        - Phase 0: Find Spans
        - Phase 1: Merge Spans
      - [Figure: Phase 0 starts from the input image as a binary image (N x M) and produces a spans matrix (N x K), in which each pair of entries is one span, and a label index matrix (N x K/2). Phase 1 works on the spans matrix and updates the label index matrix. Other labels in the figure: threads, Child matrix.]

  16. Phase 0: Find Spans
      - Each span has two elements: (y_start, y_end)
        $\mathrm{span}_x = \{(y_{start}, y_{end}) \mid I(x, y_{start}) = I(x, y_{start}+1) = \dots = I(x, y_{end})\}$
      - A unique label is assigned immediately
      - Label matrix: a reduced intermediate matrix of labels, half the size of the span matrix
      - Example from the figure (the binary image is N x M):

        Span matrix    Label matrix
        0 0 2 2        1 2
        1 2 - -        3 -
        0 0 - -        5 -
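
A sequential Python sketch of this phase, under stated assumptions: a span is taken to be a run of foreground (non-zero) pixels in one row of the binary image, the 3 x 3 input is a guess chosen so that the output reproduces the span matrix on the slide, and the label formula (row index times slots per row, plus slot index, plus 1) is likewise a guess that happens to reproduce the slide's label matrix. The names `find_spans` and `initial_labels` are illustrative, not taken from the talk.

```python
def find_spans(binary_image):
    """Return, per row, the list of (y_start, y_end) runs of foreground pixels."""
    all_spans = []
    for row in binary_image:
        spans = []
        start = None  # start column of the run currently being scanned
        for y, pixel in enumerate(row):
            if pixel and start is None:
                start = y                    # a new run begins
            elif not pixel and start is not None:
                spans.append((start, y - 1)) # the run ended at the previous pixel
                start = None
        if start is not None:                # run reaches the end of the row
            spans.append((start, len(row) - 1))
        all_spans.append(spans)
    return all_spans

def initial_labels(all_spans, slots_per_row):
    """Give every span a unique label derived from its (row, slot) position."""
    return [[x * slots_per_row + j + 1 for j in range(len(spans))]
            for x, spans in enumerate(all_spans)]

# Assumed input, reconstructed to match the slide's span matrix.
IMAGE = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
]

spans = find_spans(IMAGE)
labels = initial_labels(spans, slots_per_row=2)
print(spans)   # [[(0, 0), (2, 2)], [(1, 2)], [(0, 0)]]
print(labels)  # [[1, 2], [3], [5]]
```

Because each label depends only on the span's own position, rows can be processed independently, which is what makes this phase easy to parallelize.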

  17. Dynamic Parallelism
      - Kepler GK110 [Whitepaper: NVIDIA's Next Generation CUDA™ Compute Architecture: Kepler™ GK110]
      - Nested parallelism
      - (Figure courtesy: NVIDIA)

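  18. Phase 1: Merge Spans
      - [Figure: flowchart of the Merge Span parent kernel. For each span there is a decision: Yes leads to one single update of the label matrix, No moves on to the next span.]
      - Example from the figure:

        Spans matrix   Label matrix   Label matrix after the update
        0 0 2 2        1 2            1 2
        1 2 - -        3 -            2 -
        0 0 - -        5 -            5 -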
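
The update in this example can be reproduced with a sequential Python sketch. It is a sketch under assumptions, not the parent kernel from the talk: two spans in adjacent rows are merged when their column ranges overlap, the smaller label wins, and the pass is repeated until nothing changes. The name `merge_spans` is illustrative, and the inputs are the span and label values read off the slide.

```python
def merge_spans(spans, labels):
    """Propagate the smallest label across overlapping spans in adjacent rows."""
    merged = [row[:] for row in labels]  # work on a copy of the label matrix
    changed = True
    while changed:                       # repeat until no label changes
        changed = False
        for x in range(1, len(spans)):
            for j, (start, end) in enumerate(spans[x]):
                for i, (p_start, p_end) in enumerate(spans[x - 1]):
                    overlaps = start <= p_end and p_start <= end
                    if overlaps and merged[x][j] != merged[x - 1][i]:
                        low = min(merged[x][j], merged[x - 1][i])
                        merged[x][j] = merged[x - 1][i] = low
                        changed = True
    return merged

# Values read off the slide: one list of (y_start, y_end) spans per row.
SPANS = [[(0, 0), (2, 2)], [(1, 2)], [(0, 0)]]
LABELS = [[1, 2], [3], [5]]

merged = merge_spans(SPANS, LABELS)
print(merged)  # [[1, 2], [2], [5]]
```

Only the span in the middle row overlaps a span above it, so a single label changes (3 becomes 2), matching the "one single update" in the figure.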

  20. Hyper-Q
      - Kepler: Hyper-Q working with CUDA streams [Whitepaper: NVIDIA's Next Generation CUDA™ Compute Architecture: Kepler™ GK110]
      - (Figure courtesy: NVIDIA)

  21. When should we use Hyper-Q?
      - Identify kernels that have low utilization of the device
      - Identify applications that allow concurrent kernel execution
      - Two tasks:
        - Analyze the applications
        - Analyze the kernels

  23. Accelerated Connected Component Labeling
      - Resource utilization per kernel
      - Find Spans:
        - SMX activity: 27%
        - Occupancy: 0.11
      - Merge Spans:
        - SMX activity: 31%
        - Occupancy: 0.09

  24. Accelerated Connected Component Labeling
      - Exploiting Hyper-Q
      - [Figure: Stream 1, Stream 2, …, Stream N run through Hyper-Q. Each stream processes 2 frames; each frame has 512 x 512 pixels.]
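
The frame-to-stream mapping on this slide amounts to a simple partition. The Python sketch below only illustrates that bookkeeping; it launches no GPU work, the function name `assign_frames` is hypothetical, and assigning contiguous frame indices to each stream is an assumption (the slide only states that each stream gets 2 frames).

```python
def assign_frames(num_frames, frames_per_stream=2):
    """Split frame indices into contiguous groups, one group per stream."""
    return [list(range(first, min(first + frames_per_stream, num_frames)))
            for first in range(0, num_frames, frames_per_stream)]

streams = assign_frames(8)
print(len(streams))  # 4
print(streams)       # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

With 2 frames per stream, 8 frames need 4 streams and 128 frames need 64.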

  25. Concurrent kernel execution
      - [Figure: timeline of Stream 1, Stream 2, Stream 3, …; within a stream the kernels run in the order Find Spans, Merge Spans, …, Re-label, …]

  26. Performance Results
      - Speedup of a stream-based ACCL run on CUDA 6.5 vs. OpenMP with 8 threads on an Intel Core i7-3770K

        # Streams   # Frames   OpenMP CCL (s)   ACCL (s)   Speedup
        4           8          2.72             1.35       2.01x
        8           16         10.79            2.73       3.94x
        16          32         42.92            5.43       7.91x
        32          64         171.18           10.79      15.32x
        64          128        1020.00          21.56      47.32x
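
One way to read these numbers is as time per frame. The Python sketch below only divides the reported times by the reported frame counts; the variable names are illustrative and no new measurements are introduced.

```python
# (streams, frames, OpenMP CCL seconds, ACCL seconds), as reported on the slide.
ROWS = [
    (4, 8, 2.72, 1.35),
    (8, 16, 10.79, 2.73),
    (16, 32, 42.92, 5.43),
    (32, 64, 171.18, 10.79),
    (64, 128, 1020.00, 21.56),
]

per_frame = [(frames, openmp_s / frames, accl_s / frames)
             for _, frames, openmp_s, accl_s in ROWS]

for frames, openmp_pf, accl_pf in per_frame:
    print(f"{frames:4d} frames: OpenMP {openmp_pf:.3f} s/frame, "
          f"ACCL {accl_pf:.3f} s/frame")
```

By this arithmetic the ACCL cost stays at about 0.17 s per frame in every row, while the OpenMP cost per frame grows with the number of frames, which is where the increasing speedup comes from.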

  28. Conclusion
      - Improved the performance of the image segmentation task for the baggage scanning problem
      - Exploited NVIDIA's Hyper-Q feature to accelerate Connected Component Labeling
      - Compared an OpenMP CCL implementation with our ACCL implementation
      - Our algorithm scales well as we increase the number of streams
      - Kernels with low occupancy are the best fit for Hyper-Q

  29. Future work
      - Combine Hyper-Q with MPI to exploit multiple grains of parallelism using multiple GPU nodes
      - Evaluate additional image segmentation algorithms that address the constraints of baggage scanning

  30. Thank you
      - Questions?
      - fninaparavecino@ece.neu.edu
