

  1. GPU Accelerated Markov Decision Processes in Crowd Simulation
  Sergio Ruiz, Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México. sergio.ruiz.loza@itesm.mx
  Benjamín Hernández, National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA. hernandezarb@ornl.gov

  2. Contents
  • Introduction
  • Optimization Approaches
  • Problem solving strategy
  • A simple example
  • Algorithm description
  • Results
  • Conclusions & future work

  3. Crowd Simulation
  • Path Planning
  • Local Collision Avoidance (LCA)

  4. Optimization Approaches
  • According to (Reyes et al. 2009, Foka and Trahanias 2003), Markov Decision Processes (MDPs) are computationally inefficient: as the state space grows, the problem becomes intractable.
  • Decomposition offers the possibility to solve large MDPs (Sucar 2007, Meuleau et al. 1998, Singh and Cohn 1998), either by State Space decomposition or by Process decomposition.
  • (Mausam and Weld 2004) follow the idea of concurrency to solve MDPs, generating near-optimal solutions by extending the Labeled Real-Time Dynamic Programming method.

  5. Optimization Approaches
  • (Sucar 2007) proposes a parallel implementation of weakly coupled MDPs.
  • (Jóhansson 2009) presents a dynamic programming framework that implements the Value Iteration algorithm to solve MDPs using CUDA.
  • (Noer 2013) explores the design and implementation of a point-based Value Iteration algorithm for Partially Observable MDPs (POMDPs) with approximate solutions. The GPU implementation supports belief state pruning, which avoids unnecessary calculations.

  6. Problem Solving Strategy
  • We propose a parallel Value Iteration MDP solving algorithm to guide groups of agents toward assigned goals, avoiding obstacles, at interactive rates. For optimal performance, the algorithm runs over a hexagonal grid in the context of a Fully Observable MDP.

  7. Problem Solving Strategy
  A Markov Decision Process is a tuple $M = (S, A, T, R)$:
  • S is a finite set of states. In our case, 2D cells.
  • A is a finite set of actions. In our case, 6 directions.
  • T is a transition model T(s, a, s').
  • R is a reward function R(s).
  A policy $\pi$ is a solution that specifies the action for an agent at a given state. $\pi^*$ is the optimal policy.
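  As a concrete illustration, the tuple could be flattened into per-cell arrays on the hexagonal grid. The following C++ layout is a minimal sketch under our own naming (HexMDP, neighbor, and the flattening scheme are assumptions, not code from the talk):

  ```cpp
  #include <vector>

  // Hypothetical layout of M = (S, A, T, R) for a hexagonal grid with |A| = 6.
  struct HexMDP {
      int numCells;                      // |S|: 2D hexagonal cells
      static const int numActions = 6;   // |A|: one move per hexagon edge
      // T(s, a, s'): for each (cell, action), a probability over the 6 neighbors,
      // flattened as transition[(s * numActions + a) * numActions + k].
      std::vector<float> transition;
      // R(s): immediate reward per cell (high at goals, low at obstacles).
      std::vector<float> reward;
      // Index of the k-th neighbor of each cell: neighbor[s * numActions + k].
      std::vector<int> neighbor;
  };
  ```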

  8. Problem Solving Strategy
  Value Iteration over the states:
  $$\pi_t^*(s) = \operatorname*{argmax}_a Q_t(s, a)$$
  $$Q_t(s, a) = R(s, a) + \gamma \sum_{k=0}^{5} T_{sk}^{a} V_{t-1}(k)$$
  $$V_t(s) = Q_t(s, \pi_t^*(s)); \qquad V_0(s) = 0$$
  (the sum runs over the 6 hexagonal neighbors of cell $s$)
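  Before the parallel version, a plain serial rendering of these updates may help fix ideas. This sketch reuses the hypothetical HexMDP layout above and is our own illustration, not the paper's code:

  ```cpp
  #include <vector>

  // One Value Iteration sweep: Q_t(s,a) = R(s) + gamma * sum_k T^a_sk * V_{t-1}(k),
  // V_t(s) = max_a Q_t(s,a), pi*_t(s) = argmax_a Q_t(s,a).
  void valueIterationSweep(const HexMDP& m, const std::vector<float>& Vprev,
                           std::vector<float>& V, std::vector<int>& pi, float gamma) {
      const int A = HexMDP::numActions;
      for (int s = 0; s < m.numCells; ++s) {
          float best = -1e30f;
          int bestA = 0;
          for (int a = 0; a < A; ++a) {
              float q = m.reward[s];                    // immediate reward R(s)
              for (int k = 0; k < A; ++k)               // expected discounted value
                  q += gamma * m.transition[(s * A + a) * A + k]
                             * Vprev[m.neighbor[s * A + k]];
              if (q > best) { best = q; bestA = a; }
          }
          V[s]  = best;    // V_t(s) = Q_t(s, pi*_t(s))
          pi[s] = bestA;   // pi*_t(s)
      }
  }
  // Iterate from V_0 = 0 until pi stops changing between sweeps.
  ```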

  9. Problem Solving Strategy
  • We propose to temporarily override the optimal policy when agent density in a cell is above a certain threshold $\tau$.
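  One plausible reading of this rule, as a sketch: tau, agentsPerCell, and the runner-up fallback below are our assumptions; the talk does not specify which action replaces the overridden one.

  ```cpp
  #include <vector>

  // If the cell's agent density exceeds tau, ignore pi* for this step and take
  // the action with the next-best Q-value instead (our assumed fallback).
  int chooseAction(int s, const std::vector<int>& pi, const std::vector<float>& Q,
                   const std::vector<int>& agentsPerCell, int numActions, int tau) {
      if (agentsPerCell[s] <= tau)
          return pi[s];                       // density is fine: follow the optimal policy
      int runnerUp = 0;
      float runnerUpQ = -1e30f;
      for (int a = 0; a < numActions; ++a) {  // best action other than pi*(s)
          if (a == pi[s]) continue;
          if (Q[s * numActions + a] > runnerUpQ) {
              runnerUpQ = Q[s * numActions + a];
              runnerUp = a;
          }
      }
      return runnerUp;                        // temporary override; pi itself is unchanged
  }
  ```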

  10. A simplified example
  $$\pi_t^*(s) = \operatorname*{argmax}_a Q_t(s, a), \qquad Q_t(s, a) = R(s, a) + \gamma \sum_{k=0}^{2} T_{sk}^{a} V_{t-1}(k)$$
  Reward grid (cell b2 is an obstacle):

           1    2    3    4
      a   -3   -3   -3  +100
      b   -3    #   -3  -100
      c   -3   -3   -3   -3

  What is $\pi$ for cell a3?
  $$\pi(a3) = \max\{Q(a3, N), Q(a3, W), Q(a3, E)\}$$
  A = {N, W, E}; $\gamma = 1$ (for simplicity).
  Transitions: p = 0.8 (probability of taking the intended action), q = 0.1 (probability of taking each other action).
  $$Q(a3, \mathbf{E}) = 100 + 1.0\,(0.8(100) + 0.1(-3) + 0.1(0)) = 179.7$$
  $$Q(a3, \mathbf{W}) = -3 + 1.0\,(0.1(100) + 0.8(-3) + 0.1(0)) = 4.6$$
  $$Q(a3, \mathbf{N}) = 0 + 1.0\,(0.1(100) + 0.1(-3) + 0.8(0)) = 9.7$$
  $\Rightarrow$ the maximum is $Q(a3, \mathbf{E})$, so the best action at a3 is E, toward the +100 goal.
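  As a mechanical check of the same sums (plain C++, our own addition), this tiny program replays the three Q-value computations from the slide:

  ```cpp
  #include <cstdio>

  int main() {
      const float p = 0.8f, q = 0.1f, gamma = 1.0f;
      float qE = 100.0f + gamma * (p * 100 + q * -3 + q * 0);  // 179.7
      float qW =  -3.0f + gamma * (q * 100 + p * -3 + q * 0);  //   4.6
      float qN =   0.0f + gamma * (q * 100 + q * -3 + p * 0);  //   9.7
      printf("Q(a3,E)=%.1f  Q(a3,W)=%.1f  Q(a3,N)=%.1f\n", qE, qW, qN);
      return 0;  // E has the largest Q-value, so pi(a3) = E
  }
  ```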

  11. Algorithm
  (Recapping the a3 example above.) Each iteration has three phases:
  – Data collect: the current cell needs to know the rewards of neighboring cells and out-of-bound values.
  – Input generation: build $T_{sk}^{a}$ and $R(s, a) = R_W$.
  – Value Iteration: the optimal policy is computed using parallel transformations and parallel reduction by key.

  12. Algorithm: input generation
  Transition matrix requirements:
  $$P = \begin{pmatrix} p & \cdots & p \\ \vdots & \ddots & \vdots \\ p & \cdots & p \end{pmatrix} \quad Q = \begin{pmatrix} q_i & \cdots & q_i \\ \vdots & \ddots & \vdots \\ q_i & \cdots & q_i \end{pmatrix} \quad T^a = (T_{r,c})$$
  $$D_A = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} \quad D_B = \begin{pmatrix} 0 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 0 \end{pmatrix}$$
  Dimensions: $|A| \times |A|$, i.e., each cell can compute neighboring info, with $r \in [1, MDP_{rows}]$ and $c \in [1, MDP_{columns}]$. Here $p$ is the probability of taking the intended action and $q_i$ the probability of deviating to neighbor $i$ (cf. slide 10).

  13. Algorithm: input generation
  Each per-cell block combines the masks by Hadamard (element-wise) products:
  $$T_{r,c} = P \circ D_A + Q \circ D_B = \begin{pmatrix} p & q & q \\ q & p & q \\ q & q & p \end{pmatrix} \quad (\text{shown for } |A| = 3)$$
  Transition matrix $T_{sk}^{a}$ computation, where each block $T_{r,c}$ represents a cell:
  $$T_{sk}^{a} = \begin{pmatrix} T_{1,1} & \cdots & T_{1,MDP_{columns}} \\ \vdots & \ddots & \vdots \\ T_{MDP_{rows},1} & \cdots & T_{MDP_{rows},MDP_{columns}} \end{pmatrix}$$
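  A sketch of the block construction in plain C++ (our own illustration): building one $|A| \times |A|$ block $T_{r,c}$ from the two masks, with $p$ on the diagonal via $D_A$ and $q$ elsewhere via $D_B$:

  ```cpp
  #include <cstdio>

  int main() {
      const int A = 3;                            // |A| = 3, as in the example above
      const float p = 0.8f, q = 0.1f;
      float T[A][A];
      for (int i = 0; i < A; ++i)
          for (int j = 0; j < A; ++j) {
              float dA = (i == j) ? 1.0f : 0.0f;  // D_A: identity mask
              float dB = 1.0f - dA;               // D_B: complementary mask
              T[i][j] = p * dA + q * dB;          // T_{r,c} = P o D_A + Q o D_B
          }
      for (int i = 0; i < A; ++i)                 // prints: p q q / q p q / q q p
          printf("%.1f %.1f %.1f\n", T[i][0], T[i][1], T[i][2]);
      return 0;
  }
  ```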

  14. Algorithm: Parallel Value Iteration
  1. Computation of Q-values:
  $$Q_t = R_W + \gamma\, T_{sk}^{a} V$$
  Consecutive parallel transformations (mult, mult, sum) produce a matrix $Q$ that stores an $|A|$-tuple of action values for each cell.
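  With Thrust, which the talk's implementation uses, the last two of these transformations might look as follows. This is a sketch under our own naming: the vectors TV, R, Q are assumptions, and the first mult (applying $T_{sk}^{a}$ to $V$) is assumed to have already been gathered into TV per (state, action) pair:

  ```cpp
  #include <thrust/device_vector.h>
  #include <thrust/transform.h>
  #include <thrust/functional.h>
  #include <thrust/iterator/constant_iterator.h>

  int main() {
      const int S = 1584, A = 6;               // e.g. the Office scenario, 6 hex actions
      const float gamma = 0.9f;                // discount factor (value assumed)
      thrust::device_vector<float> TV(S * A);  // sum_k T^a_sk * V_{t-1}(k), pre-gathered
      thrust::device_vector<float> R(S * A);   // immediate rewards R(s, a)
      thrust::device_vector<float> Q(S * A);   // output: |A| Q-values per cell
      // mult: scale the expected neighbor value by gamma.
      thrust::transform(TV.begin(), TV.end(),
                        thrust::make_constant_iterator(gamma),
                        TV.begin(), thrust::multiplies<float>());
      // sum: Q = R + gamma * TV, element-wise over all (state, action) pairs.
      thrust::transform(R.begin(), R.end(), TV.begin(),
                        Q.begin(), thrust::plus<float>());
      return 0;
  }
  ```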

  15. Algorithm: Parallel Value Iteration
  2. Selection of best Q-values:
  – Parallel reduction: within every consecutive $|A|$-tuple of $Q_t$, the index of the largest value gives the cell's current best action.
  3. Check for convergence:
  – Stop when $\pi_t - \pi_{t-1} = [0, \ldots, 0]$, i.e., the policy no longer changes between iterations.
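  Steps 2 and 3 map naturally onto thrust::reduce_by_key and thrust::equal. The sketch below is our own illustration, not the paper's code: the key i / |A| groups each consecutive |A|-tuple of Q-values into one cell, and an argmax functor keeps the best (value, index) pair per cell:

  ```cpp
  #include <thrust/device_vector.h>
  #include <thrust/reduce.h>
  #include <thrust/equal.h>
  #include <thrust/transform.h>
  #include <thrust/functional.h>
  #include <thrust/iterator/counting_iterator.h>
  #include <thrust/iterator/constant_iterator.h>
  #include <thrust/iterator/transform_iterator.h>
  #include <thrust/iterator/zip_iterator.h>
  #include <thrust/tuple.h>

  struct CellOfEntry {                 // maps a flat (state, action) index to its cell
      int A;
      __host__ __device__ int operator()(int i) const { return i / A; }
  };
  struct ArgMax {                      // keeps the (Q-value, index) pair with larger Q
      __host__ __device__
      thrust::tuple<float, int> operator()(thrust::tuple<float, int> a,
                                           thrust::tuple<float, int> b) const {
          return thrust::get<0>(a) >= thrust::get<0>(b) ? a : b;
      }
  };

  int main() {
      const int S = 1584, A = 6;
      thrust::device_vector<float> Q(S * A);        // Q-values from step 1
      thrust::device_vector<float> bestQ(S);
      thrust::device_vector<int> cell(S), bestIdx(S), pi(S), piPrev(S);
      auto keys = thrust::make_transform_iterator(thrust::make_counting_iterator(0),
                                                  CellOfEntry{A});
      auto vals = thrust::make_zip_iterator(
          thrust::make_tuple(Q.begin(), thrust::make_counting_iterator(0)));
      auto outs = thrust::make_zip_iterator(
          thrust::make_tuple(bestQ.begin(), bestIdx.begin()));
      // Step 2: per-cell argmax over each consecutive |A|-tuple of Q-values.
      thrust::reduce_by_key(keys, keys + S * A, vals, cell.begin(), outs,
                            thrust::equal_to<int>(), ArgMax());
      // The best action per cell is the flat index modulo |A|.
      thrust::transform(bestIdx.begin(), bestIdx.end(),
                        thrust::make_constant_iterator(A),
                        pi.begin(), thrust::modulus<int>());
      // Step 3: converged when the policy equals the previous iteration's.
      bool converged = thrust::equal(pi.begin(), pi.end(), piPrev.begin());
      return converged ? 0 : 1;
  }
  ```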

  16. Crowd Navigation Video
  https://www.youtube.com/watch?v=369td2O8dxY

  17. Results: test scenarios
  Office (1,584 cells), Maze (100×100 cells), Champ de Mars (100×100 cells).
  Implementation: CUDA Thrust, with OpenMP and CUDA backends.
  CPU: Intel Core i7 running at 3.40 GHz.
  ARM (Jetson TK1): 32-bit quad-core Cortex-A15 running at 2.32 GHz.
  GPUs: Tegra K1 (192 CUDA cores), Tesla K40c (2880 CUDA cores), GeForce GTX TITAN (2688 CUDA cores).

  18. Results: GPU performance

  19. Results: GPU speedup
  Intel CPU baseline: 8 threads. ARM CPU baseline: 4 threads.

  20. Conclusion
  • Parallelization of the proposed algorithm was made possible by formulating it in terms of "massive" matrix operations, leveraging the data parallelism of GPU computing to reduce the MDP solution time.
  • We demonstrated that standard parallel transformation and reduction operations provide the means to solve MDPs via Value Iteration with optimal performance.

  21. Conclusion
  • Taking advantage of the proposed hexagonal grid partitioning method, our implementation provides a good level of space discretization and performance.
  • We obtained a 90× speedup using GPUs, enabling us to simulate crowd behavior interactively.
  • We found the Jetson TK1 GPU to deliver remarkable performance, opening many possibilities for incorporating real-time MDP solvers in mobile robotics.

  22. Future Work
  • Reinforcement learning: evaluate different parameter values to obtain policy convergence in the fewest iterations without losing precision in the generated paths.
  • Couple the MDP solver with a Local Collision Avoidance method to obtain more precise simulation results at the microscopic level.
  • Investigate further applications of our MDP solver beyond the context of crowd simulation.

  23. GPU Accelerated Markov Decision Processes in Crowd Simulation
  Further reading: Ruiz, S. and Hernández, B., "A parallel solver for Markov Decision Process in Crowd Simulation", MICAI 2015, 14th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico, IEEE, ISBN 978-1-5090-0323-5.
  Thank you!
  Sergio Ruiz, Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México. sergio.ruiz.loza@itesm.mx
  Benjamín Hernández, National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA. hernandezarb@ornl.gov
  This research was partially supported by: CONACyT SNI-54067, CONACyT PhD scholarship 375247, an Nvidia Hardware Grant, and the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, under DOE Contract No. DE-AC05-00OR22725.

  24. Additional Results: Intel CPU

  25. Additional Results: ARM CPU
