
Re-thinking CNN Frameworks for Time-Sensitive Autonomous-Driving Applications: Addressing an Industrial Challenge


  1. Re-thinking CNN Frameworks for Time-Sensitive Autonomous-Driving Applications: Addressing an Industrial Challenge. Ming Yang (1), Shige Wang (2), Joshua Bakita (1), Thanh Vu (1), F. Donelson Smith (1), James H. Anderson (1), and Jan-Michael Frahm (1). (1) The University of North Carolina at Chapel Hill; (2) General Motors Research.

  3. (1) Response time; (2) Accuracy; (3) Throughput. Image source: https://blogs.nvidia.com/blog/2016/01/04/automotive-nvidia-drive-px-2/ (icons made by Freepik from Flaticon, licensed by CC 3.0 BY). Ming Yang - RTAS 2019

  12. Our focus. (1) Response time; (2) Accuracy; (3) Throughput. Hardware resources are constrained and expensive. CNN software underutilizes the hardware. Image source: https://blogs.nvidia.com/blog/2016/01/04/automotive-nvidia-drive-px-2/

  15. Current CNN frameworks. [Figure: CPU and GPU execution timelines with gaps, labeled "Cycles not utilized".] A single CNN underutilizes the hardware.

  21. Traditional Multiple-Camera Processing Setup. [Figure: cameras 0 … C-1, each feeding its own private CNN.] Parallelism through multi-process isn’t helping. Issues: (1) Memory requirements multiply, limiting the number of instances. (2) Context switches on GPU cause overheads. (3) Fast synchronization between cameras becomes harder.

  27. Proposed Solutions. Part I: Parallel Execution for CNN frameworks. Part II: Multi-camera Composite Images to provide high throughput for multiple cameras.

  29. Let’s re-think the design of CNN frameworks. [Figure: Layer 1 … Layer n.] CNN models are graphs of layers. Processing of images can be independent, e.g., object detection. We enable parallel execution for CNN frameworks and a shared CNN for multiple cameras.

  33. Shared CNN. [Figure: cameras 0 … C-1 feed Stage 0, Stage 1, …, Stage N-1, connected by Queue 0, Queue 1, …, Queue N-1, producing detection box results; shown alongside bookkeeping data and the data for Frames 1, 2, and 3.] Generalize the concept of layers into stages. Communicate data between stages using PGM RT (a processing graph management tool). Share the CNN among multiple cameras.

  36. Different Execution Methods. SERIAL: private CNN in one process. PIPELINE: shared CNN that has one thread per stage. PARALLEL: shared CNN that has multiple threads per stage.

  40. [Figure: CPU and GPU timelines for SERIAL, PIPELINE, and PARALLEL execution, annotated "Concurrency".]
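The staged design described in the slides (layers generalized into stages, data passed between stages through queues, one CNN shared by several cameras) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `run_stage`, `run_shared_cnn`, the `SENTINEL` marker, and the toy arithmetic stages are hypothetical names, and the standard-library `queue.Queue` stands in for PGM RT, whose API is not shown in the slides. Python threads also do not reproduce real CPU/GPU concurrency, so the sketch shows only the structure, not the performance.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-input marker for a worker thread


def run_stage(stage_fn, in_q, out_q):
    """Worker loop: take a frame record, apply one stage, pass it on."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        camera_id, frame_id, data = item  # bookkeeping travels with the data
        out_q.put((camera_id, frame_id, stage_fn(data)))


def run_shared_cnn(stage_fns, frames, threads_per_stage=1):
    """Push frames from all cameras through one shared chain of stages.

    threads_per_stage=1 resembles the PIPELINE method (one thread per
    stage); a larger value resembles PARALLEL (several threads per stage).
    """
    # One queue in front of each stage, plus one for the final results.
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    groups = []
    for i, fn in enumerate(stage_fns):
        group = [
            threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
            for _ in range(threads_per_stage)
        ]
        for t in group:
            t.start()
        groups.append(group)

    for item in frames:
        queues[0].put(item)

    # Shut down stage by stage so every record reaches the last queue.
    for i, group in enumerate(groups):
        for _ in group:
            queues[i].put(SENTINEL)
        for t in group:
            t.join()

    results = {}
    while not queues[-1].empty():
        camera_id, frame_id, data = queues[-1].get()
        results[(camera_id, frame_id)] = data
    return results


if __name__ == "__main__":
    # Toy stand-ins for CNN stages; real stages would run groups of layers.
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    # Two cameras, three frames each: (camera_id, frame_id, data).
    frames = [(cam, f, cam * 10 + f) for cam in range(2) for f in range(3)]
    pipeline = run_shared_cnn(stages, frames, threads_per_stage=1)
    parallel = run_shared_cnn(stages, frames, threads_per_stage=2)
    print(pipeline == parallel)
```

Keying each result by `(camera_id, frame_id)` matters in the multi-threaded case: with several threads per stage, frames may finish out of order, so the bookkeeping fields are what let each detection result be matched back to its camera and frame.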
