Prefix sums on GPUs


  1. Prefix sums on GPUs
     Bruce Merry, Department of Computer Science, University of Cape Town
     GPGPU2 Workshop 2014

  2. Outline
     1 Definition and Applications: Motivating Problem, Definitions, Other Applications
     2 Parallel Algorithms: Kogge-Stone, Brent-Kung
     3 GPU Strategies: Reduce-then-Scan, Two-Level Prefix Sum
     Summary

  4. Problem Statement
     For every object in a set, output a list of the other objects that differ by less than some amount.
     This is deliberately vague: it could be for n-body simulation, clustering, or scattered data interpolation.
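Here is a minimal sketch of the problem. It rests on assumptions that are not in the slides: objects are plain numbers, "differ by less than some amount" means |x - y| < eps, and the search is brute force. Each iteration of the outer loop stands in for one workitem. The function name neighbour_lists is hypothetical.

```python
def neighbour_lists(values, eps):
    """Brute-force neighbour search (hypothetical 1-D stand-in for the real problem).

    Returns one list per object containing the indices of the *other*
    objects whose value differs from it by less than eps.
    """
    lists = []
    for i, x in enumerate(values):  # one iteration ~ one workitem
        lists.append([j for j, y in enumerate(values)
                      if j != i and abs(x - y) < eps])
    return lists


if __name__ == "__main__":
    print(neighbour_lists([0.0, 0.5, 3.0, 3.2, 10.0], eps=1.0))
```

The lists come out with different lengths (the last object has no neighbours). That is what makes packing them into a single output buffer non-trivial.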

  10. Output Format
     The lists should be packed together contiguously:
       A0 A1 A2 B0 B1 D0 E0 E1
     (Object A emits three records, B two, C none, D one and E two.)
     Assuming one workitem per object, how does each workitem know where to start writing?

  11. Solution
     This can be solved with a multi-pass approach:
       1. Every workitem counts how many records it will emit and writes this number to a buffer.
       2. The buffer is processed to determine the start position for each object, and these positions are written to a buffer.
       3. Each workitem reads its start position from this buffer and emits its records in the right place.
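The three passes can be sketched sequentially in Python. The function names (count_pass, offset_pass, emit_pass) are hypothetical. For brevity, the sketch takes the per-object record lists as given. On a GPU, pass 1 would only count the records and pass 3 would generate them again. Pass 2 is an exclusive prefix sum of the counts.

```python
def count_pass(lists):
    """Pass 1: each workitem writes how many records it will emit."""
    return [len(records) for records in lists]


def offset_pass(counts):
    """Pass 2: an exclusive prefix sum of the counts gives each start position.

    Written sequentially here; on a GPU this is the step that needs a
    parallel scan, because every offset depends on all earlier counts.
    """
    offsets, running = [], 0
    for c in counts:
        offsets.append(running)
        running += c
    return offsets, running  # running ends up as the total output size


def emit_pass(lists, offsets, total):
    """Pass 3: each workitem writes its records starting at its own offset."""
    out = [None] * total
    for i, records in enumerate(lists):  # iterations are independent
        for k, r in enumerate(records):
            out[offsets[i] + k] = r
    return out


if __name__ == "__main__":
    # Records for objects A..E from the output-format slide; C emits nothing.
    lists = [["A0", "A1", "A2"], ["B0", "B1"], [], ["D0"], ["E0", "E1"]]
    counts = count_pass(lists)
    offsets, total = offset_pass(counts)
    print(counts, offsets, total)
    print(emit_pass(lists, offsets, total))
```

With the records from the output-format slide, the counts [3, 2, 0, 1, 2] give offsets [0, 3, 5, 5, 6] and a total of 8. Pass 3 then reproduces A0 A1 A2 B0 B1 D0 E0 E1. Passes 1 and 3 are independent per workitem; only pass 2 couples them.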

  15. Exclusive Prefix Sum
     Given an operator ⊕ and an identity element $I$, the exclusive prefix sum of $(a_0, a_1, \ldots, a_{n-1})$ is
     $$\left(I,\; a_0,\; a_0 \oplus a_1,\; a_0 \oplus a_1 \oplus a_2,\; \ldots,\; a_0 \oplus \cdots \oplus a_{n-2}\right) = \left(\bigoplus_{j=0}^{i-1} a_j\right)_{i=0}^{n-1}$$
     where the empty sum for $i = 0$ is $I$. In other words, element $i$ is the sum of all elements strictly before $i$.
     Example (with ⊕ = + and I = 0):
       input   4 3 7 9 2 3
       output  0 4 7 14 23 25
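A sequential reference version of the definition follows. It is written as a hypothetical exclusive_scan that takes the operator and identity as parameters. The function walks the input once and writes the running total before folding in each element, so element i only ever sees a_0 ⊕ ... ⊕ a_{i-1}.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def exclusive_scan(a: List[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Element i is identity op a[0] op ... op a[i-1] (just identity when i == 0)."""
    out, acc = [], identity
    for x in a:
        out.append(acc)   # sum of everything strictly before x
        acc = op(acc, x)  # then fold x in, preserving left-to-right order
    return out


if __name__ == "__main__":
    print(exclusive_scan([4, 3, 7, 9, 2, 3], operator.add, 0))  # [0, 4, 7, 14, 23, 25]
    # 0 acts as an identity for max here only because the inputs are non-negative.
    print(exclusive_scan([4, 3, 7, 9, 2, 3], max, 0))           # [0, 4, 4, 7, 9, 9]
```

The second call uses max as ⊕ to show that the definition is not tied to addition.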

  17. Inclusive Prefix Sum
     Given an operator ⊕ and an identity element $I$, the inclusive prefix sum of $(a_0, a_1, \ldots, a_{n-1})$ is
     $$\left(a_0,\; a_0 \oplus a_1,\; a_0 \oplus a_1 \oplus a_2,\; \ldots,\; a_0 \oplus \cdots \oplus a_{n-1}\right) = \left(\bigoplus_{j=0}^{i} a_j\right)_{i=0}^{n-1}$$
     In other words, element $i$ is the sum of all elements before and including $i$.
     Example (with ⊕ = +):
       input   4 3 7 9 2 3
       output  4 7 14 23 25 28
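Below is a matching sketch for the inclusive form, using Python's itertools.accumulate. It adds a hypothetical helper, exclusive_from_inclusive, that shows how the two forms relate: shifting the inclusive result right by one and inserting I gives the exclusive result.

```python
import operator
from itertools import accumulate
from typing import Callable, List, TypeVar

T = TypeVar("T")


def inclusive_scan(a: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Element i is a[0] op a[1] op ... op a[i]; no identity is needed."""
    return list(accumulate(a, op))


def exclusive_from_inclusive(inc: List[T], identity: T) -> List[T]:
    """Shift right by one and insert the identity; the final total is dropped."""
    return ([identity] + inc[:-1]) if inc else []


if __name__ == "__main__":
    inc = inclusive_scan([4, 3, 7, 9, 2, 3], operator.add)
    print(inc)                               # [4, 7, 14, 23, 25, 28]
    print(exclusive_from_inclusive(inc, 0))  # [0, 4, 7, 14, 23, 25]
```

Conversely, inclusive element i is exclusive element i ⊕ a_i. The last inclusive element is the grand total, which the exclusive form does not contain. In the output-format problem, that total is the size of the packed buffer.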
