
GPU Accelerated Markov Decision Processes in Crowd Simulation
Sergio Ruiz, Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México


SLIDE 1

GPU Accelerated Markov Decision Processes in Crowd Simulation

Sergio Ruiz
Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México
sergio.ruiz.loza@itesm.mx

Benjamín Hernández
National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA
hernandezarb@ornl.gov

SLIDE 2

Contents

  • Introduction
  • Optimization Approaches
  • Problem solving strategy
  • A simple example
  • Algorithm description
  • Results
  • Conclusions & future work
SLIDE 3

Crowd Simulation

  • Path Planning
  • Local Collision Avoidance (LCA)

SLIDE 4

Optimization Approaches

  • According to (Reyes et al. 2009, Foka and Trahanias 2003), Markov Decision Processes (MDPs) are computationally inefficient: as the state space grows, the problem becomes intractable.
  • Decomposition offers the possibility to solve large MDPs (Sucar 2007, Meuleau et al. 1998, Singh and Cohn 1998), either through State Space decomposition or Process decomposition.
  • (Mausam and Weld 2004) follow the idea of concurrency to solve MDPs, generating near-optimal solutions by extending the Labeled Real-Time Dynamic Programming method.

SLIDE 5

Optimization Approaches

  • (Sucar 2007) proposes a parallel implementation of weakly coupled MDPs.
  • (Jóhansson 2009) presents a dynamic programming framework that implements the Value Iteration algorithm to solve MDPs using CUDA.
  • (Noer 2013) explores the design and implementation of a point-based Value Iteration algorithm for Partially Observable MDPs (POMDPs) with approximate solutions. The GPU implementation supports belief state pruning, which avoids unnecessary calculations.

SLIDE 6

Problem Solving Strategy

  • We propose a parallel Value Iteration MDP solving algorithm to guide groups of agents toward assigned goals while avoiding obstacles interactively. For optimal performance, the algorithm is run over a hexagonal grid in the context of a Fully Observable MDP.
SLIDE 7

Problem Solving Strategy

  • A Markov Decision Process is a tuple M = ⟨S, A, T, R⟩:
  • S is a finite set of states. In our case, 2D cells.
  • A is a finite set of actions. In our case, 6 directions.
  • T is a transition model T(s, a, s').
  • R is a reward function R(s).
  • A policy π is a solution that specifies the action for an agent at a given state.
  • π* is the optimal policy.

[Figure: Transition]
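These components map naturally onto simple data structures. The following is a minimal C++ sketch of the tuple for a hexagonal grid; the type and field names (HexDirection, MDPGrid, and so on) are illustrative assumptions, not the authors' implementation.

```cpp
#include <array>
#include <vector>

// Hypothetical sketch of the MDP components <S, A, T, R> for a hexagonal grid.
// Type and member names are illustrative; they are not taken from the authors' code.

enum class HexDirection { E, NE, NW, W, SW, SE };   // A: 6 actions, one per hex neighbor
constexpr int kNumActions = 6;

struct MDPGrid {
    int rows = 0, cols = 0;                          // S: one state per 2D hexagonal cell
    std::vector<float> reward;                       // R(s): one reward per cell
    // T(s, a, s'): probability of reaching neighbor s' when taking action a in cell s,
    // stored as a per-cell |A| x |A| block (intended action vs. reached neighbor).
    std::vector<std::array<std::array<float, kNumActions>, kNumActions>> transition;
    std::vector<int> policy;                         // pi(s): chosen action index per cell
};
```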

SLIDE 8

Problem Solving Strategy

States Value Iteration:

$$\pi^*_t(s) = \arg\max_a Q_t(s, a)$$

$$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{5} T_{sj}(a)\, V_{t-1}(j)$$

$$V_t(s) = Q_t\left(s, \pi^*(s)\right), \qquad V_0(s) = 0$$
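For reference, the update above can be written as a plain sequential loop before parallelizing it. The sketch below is a minimal C++ version assuming a flat array of hexagonal cells with 6 neighbors each; names such as HexMDP, neighbor, and valueIteration are illustrative placeholders, not the paper's code.

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

// Sequential Value Iteration sketch for
//   Q_t(s,a) = R(s,a) + gamma * sum_j T_sj(a) * V_{t-1}(j)
//   V_t(s)   = max_a Q_t(s,a),  pi*_t(s) = argmax_a Q_t(s,a)
// All names are illustrative placeholders.

struct HexMDP {
    int numCells;
    int numActions = 6;
    std::vector<float> R;        // R[s * numActions + a]
    std::vector<float> T;        // T[(s * numActions + a) * numActions + j]
    std::vector<int>   neighbor; // neighbor[s * numActions + j], -1 if out of bounds
};

void valueIteration(const HexMDP& mdp, float gamma, float epsilon,
                    std::vector<float>& V, std::vector<int>& policy) {
    V.assign(mdp.numCells, 0.0f);            // V_0(s) = 0
    policy.assign(mdp.numCells, 0);
    std::vector<float> Vnext(mdp.numCells);

    float delta;
    do {
        delta = 0.0f;
        for (int s = 0; s < mdp.numCells; ++s) {
            float bestQ = -1e30f;
            int bestA = 0;
            for (int a = 0; a < mdp.numActions; ++a) {
                float q = mdp.R[s * mdp.numActions + a];
                for (int j = 0; j < mdp.numActions; ++j) {
                    int n = mdp.neighbor[s * mdp.numActions + j];
                    float vj = (n >= 0) ? V[n] : 0.0f;   // out-of-bound value assumed 0
                    q += gamma * mdp.T[(s * mdp.numActions + a) * mdp.numActions + j] * vj;
                }
                if (q > bestQ) { bestQ = q; bestA = a; }
            }
            Vnext[s] = bestQ;
            policy[s] = bestA;
            delta = std::max(delta, std::fabs(Vnext[s] - V[s]));
        }
        V.swap(Vnext);
    } while (delta > epsilon);   // convergence on value change; the slides later test policy equality
}
```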

SLIDE 9

Problem Solving Strategy

  • We propose to temporarily override the optimal policy when agent density in a cell is above a certain threshold τ.
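A minimal sketch of what such an override could look like in code follows. The threshold tau and the fallback rule used here (redirect to the least crowded admissible neighbor) are assumptions for illustration; the slides only state that the optimal policy is temporarily overridden.

```cpp
#include <vector>

// Hypothetical sketch: override pi*(s) when the target cell's agent density
// exceeds tau. The fallback (least crowded admissible neighbor) is an assumption.
int chooseAction(int cell, const std::vector<int>& optimalPolicy,
                 const std::vector<float>& density,
                 const std::vector<int>& neighbor,   // neighbor[cell * 6 + a], -1 if blocked
                 float tau) {
    int a = optimalPolicy[cell];
    int target = neighbor[cell * 6 + a];
    if (target >= 0 && density[target] > tau) {
        // Target cell is too crowded: redirect to the least crowded neighbor.
        float bestDensity = density[target];
        for (int alt = 0; alt < 6; ++alt) {
            int n = neighbor[cell * 6 + alt];
            if (n >= 0 && density[n] < bestDensity) {
                bestDensity = density[n];
                a = alt;
            }
        }
    }
    return a;   // otherwise keep the optimal policy
}
```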

SLIDE 10

A simplified example

Cell rewards on a 3×4 grid (rows a to c, columns 1 to 4):

a: -3, -3, -3, +100
b: -3, -3, -100
c: -3, -3, -3, -3

A = {N, W, E}, γ = 1 (for simplicity).
Transitions: p = 0.8 (probability of taking the current action), q = 0.1 (probability of taking another action).

$$\pi^*_t(s) = \arg\max_a Q_t(s, a)$$

$$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{2} T_{sj}(a)\, V_{t-1}(j)$$

What is π for cell a3?

π(a3) = argmax{ Q(a3, W), Q(a3, N), Q(a3, E) }

Q(a3, E) = 100 + 1.0·(0.8(100) + 0.1(-3) + 0.1(0)) = 179.7
Q(a3, W) = -3 + 1.0·(0.1(100) + 0.8(-3) + 0.1(0)) = 4.6
Q(a3, N) = 0 + 1.0·(0.1(100) + 0.1(-3) + 0.8(0)) = 9.7

The maximum is Q(a3, E), so the policy for cell a3 is to move East.
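The three Q-values can be verified with a few lines of code. The snippet below simply reproduces the arithmetic of the example for cell a3; the hard-coded reward and value arrays come from the grid above.

```cpp
#include <cstdio>
#include <algorithm>

// Reproduces the worked example for cell a3: actions {E, W, N},
// intended-action probability p = 0.8, stray probability q = 0.1, gamma = 1.
int main() {
    const char* names[3] = {"E", "W", "N"};
    float reward[3] = {100.0f, -3.0f, 0.0f};   // reward term used for each action
    float value[3]  = {100.0f, -3.0f, 0.0f};   // V of the cell reached by E, W, N
    float p = 0.8f, q = 0.1f, gamma = 1.0f;

    float Q[3];
    for (int a = 0; a < 3; ++a) {
        float expected = 0.0f;
        for (int j = 0; j < 3; ++j)
            expected += (j == a ? p : q) * value[j];
        Q[a] = reward[a] + gamma * expected;
        std::printf("Q(a3, %s) = %.1f\n", names[a], Q[a]);
    }
    int best = static_cast<int>(std::max_element(Q, Q + 3) - Q);
    std::printf("best action: %s\n", names[best]);   // prints E (Q = 179.7)
    return 0;
}
```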

SLIDE 11

Algorithm

– Data collect: the current cell needs to know the rewards from neighboring cells and the out-of-bound values.
– Input generation: build $T_{sj}(a)$ and $R(s, a) = R_W$.
– Value Iteration: the optimal policy is computed using parallel transformations and a parallel reduction by key.


SLIDE 12

Algorithm: input generation

  • Transition matrix requirements:

$$T_P = \begin{pmatrix} p & \cdots & p \\ \vdots & \ddots & \vdots \\ p & \cdots & p \end{pmatrix} \qquad T_Q^{r,c} = \begin{pmatrix} q_i & \cdots & q_i \\ \vdots & \ddots & \vdots \\ q_i & \cdots & q_i \end{pmatrix}$$

$$D_A = \begin{pmatrix} 1 & \cdots & \\ \vdots & \ddots & \vdots \\ & \cdots & 1 \end{pmatrix} \text{ (ones on the diagonal)} \qquad D_B = \begin{pmatrix} & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & \end{pmatrix} \text{ (ones off the diagonal)}$$

Dimensions: |A| × |A|, i.e. each cell can compute neighboring info.

$r \in [1, MDP_{rows}]$, $c \in [1, MDP_{columns}]$, with $q_i = q\, RE_{i-1}$.

SLIDE 13

Algorithm: input generation

where

$$T^{r,c} = T_P \circ D_A + T_Q^{r,c} \circ D_B = \begin{pmatrix} p & q & q \\ q & p & q \\ q & q & p \end{pmatrix}$$

Transition matrix $T_{sj}(a)$ computation:

$$T_{sj}(a) = \begin{pmatrix} T^{1,1} & \cdots & T^{1,\,MDP_{columns}} \\ \vdots & \ddots & \vdots \\ T^{MDP_{rows},\,1} & \cdots & T^{MDP_{rows},\,MDP_{columns}} \end{pmatrix}$$

Each block $T^{r,c}$ represents a cell.
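As an illustration, one such |A| × |A| block could be assembled as below: p on the diagonal for the intended action and q elsewhere. Zeroing the entries of out-of-bound neighbors is an assumption based on the "out of bound values" mentioned in the data-collect step, not something the slides specify.

```cpp
#include <array>

constexpr int A = 6;   // six hexagonal directions
using Block = std::array<std::array<float, A>, A>;

// Builds a per-cell |A| x |A| transition block: p for the intended action
// (diagonal), q otherwise. Entries for out-of-bound neighbors are zeroed
// (an assumption for illustration).
Block makeTransitionBlock(float p, float q, const std::array<bool, A>& neighborValid) {
    Block T{};
    for (int a = 0; a < A; ++a)
        for (int j = 0; j < A; ++j)
            T[a][j] = neighborValid[j] ? (a == j ? p : q) : 0.0f;
    return T;
}
```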

SLIDE 14

Algorithm: Parallel Value Iteration

  • 1. Computation of Q-values.

$$\pi_t = R_W + \gamma\, T_{sj}(a)\, V_{t-1}$$

Consecutive parallel transformations (mult, mult, sum) result in a matrix Q that stores an |A|-tuple of policies, one for each action, per cell.
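The slide describes consecutive transformations (mult, mult, sum); the sketch below fuses them into a single thrust::transform for brevity, computing one Q entry per (cell, action) pair. The flat memory layout and the functor and function names are assumptions for illustration, not the layout used in the paper.

```cpp
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>

// Sketch: one parallel transformation computing Q = R_W + gamma * T * V for every
// (cell, action) pair. Layout and names are illustrative assumptions.
struct ComputeQ {
    const float* R;         // R[s * A + a]
    const float* T;         // T[(s * A + a) * A + j]
    const float* V;         // V[s], values from the previous iteration
    const int*   neighbor;  // neighbor[s * A + j], -1 if out of bounds
    int A;
    float gamma;

    __host__ __device__
    float operator()(int idx) const {
        int s = idx / A;
        float q = R[idx];
        for (int j = 0; j < A; ++j) {
            int n = neighbor[s * A + j];
            float vj = (n >= 0) ? V[n] : 0.0f;   // out-of-bound value assumed 0
            q += gamma * T[idx * A + j] * vj;
        }
        return q;
    }
};

void computeQValues(const thrust::device_vector<float>& R,
                    const thrust::device_vector<float>& T,
                    const thrust::device_vector<float>& V,
                    const thrust::device_vector<int>& neighbor,
                    int A, float gamma,
                    thrust::device_vector<float>& Q) {   // Q has numCells * A entries
    ComputeQ op{thrust::raw_pointer_cast(R.data()),
                thrust::raw_pointer_cast(T.data()),
                thrust::raw_pointer_cast(V.data()),
                thrust::raw_pointer_cast(neighbor.data()),
                A, gamma};
    thrust::counting_iterator<int> first(0), last(static_cast<int>(Q.size()));
    thrust::transform(first, last, Q.begin(), op);
}
```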

SLIDE 15

Algorithm: Parallel Value Iteration

  • 2. Selection of best Q-values.

– Parallel reduction: from every consecutive |A|-tuple in $\pi_t$, the index of the largest value indicates the current best policy.

  • 3. Check for convergence.

– If $\pi_t - \pi_{t-1} = [0, \ldots, 0]$, the policy has converged.
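Steps 2 and 3 can be sketched with Thrust as well: a reduce_by_key keyed by cell index keeps, for every |A|-tuple, the (value, action) pair with the largest Q, and the convergence test compares the new policy with the previous one. The layout and names below are assumptions, and the comparison via thrust::equal is an equivalent formulation of the zero-difference check above.

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/equal.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/tuple.h>
#include <thrust/functional.h>

// Sketch of steps 2 and 3: per-cell argmax over each |A|-tuple of Q-values via
// reduce_by_key, then a convergence test against the previous policy.
// Layout (Q stored as cell-major |A|-tuples) and names are assumptions.

struct CellOfEntry {   // key: which cell a flat (cell, action) entry belongs to
    int A;
    __host__ __device__ int operator()(int i) const { return i / A; }
};
struct ActionOfEntry { // which action a flat entry corresponds to
    int A;
    __host__ __device__ int operator()(int i) const { return i % A; }
};
struct ArgMax {        // keep the (Q, action) pair with the larger Q
    __host__ __device__
    thrust::tuple<float, int> operator()(const thrust::tuple<float, int>& a,
                                         const thrust::tuple<float, int>& b) const {
        return thrust::get<0>(a) >= thrust::get<0>(b) ? a : b;
    }
};

// bestQ and policy must be sized to the number of cells.
// Returns true if the new policy equals the previous one (convergence).
bool selectPolicy(const thrust::device_vector<float>& Q, int A,
                  thrust::device_vector<float>& bestQ,
                  thrust::device_vector<int>& policy,
                  const thrust::device_vector<int>& previousPolicy) {
    int n = static_cast<int>(Q.size());
    auto keys = thrust::make_transform_iterator(thrust::counting_iterator<int>(0), CellOfEntry{A});
    auto acts = thrust::make_transform_iterator(thrust::counting_iterator<int>(0), ActionOfEntry{A});

    thrust::reduce_by_key(
        keys, keys + n,
        thrust::make_zip_iterator(thrust::make_tuple(Q.begin(), acts)),
        thrust::make_discard_iterator(),
        thrust::make_zip_iterator(thrust::make_tuple(bestQ.begin(), policy.begin())),
        thrust::equal_to<int>(),
        ArgMax{});

    return thrust::equal(policy.begin(), policy.end(), previousPolicy.begin());
}
```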

SLIDE 16

Crowd Navigation Video

https://www.youtube.com/watch?v=369td2O8dxY

SLIDE 17

Results: test scenarios

Office (1,584 cells), Maze (100×100 cells), Champ de Mars (100×100 cells)

Implementation: CUDA Thrust, OpenMP and CUDA backends.
CPU: Intel Core i7 running at 3.40 GHz.
ARM (Jetson TK1): 32-bit ARM quad-core Cortex-A15 running at 2.32 GHz.
GPUs: Tegra K1 (192 CUDA cores), Tesla K40c (2880 CUDA cores), GeForce GTX TITAN (2688 CUDA cores).

SLIDE 18

Results: GPU performance

SLIDE 19

Results: GPU speedup

Intel CPU baseline: 8 threads. ARM CPU baseline: 4 threads.

SLIDE 20

Conclusion

  • Parallelization of the proposed algorithm was made possible by formulating it in terms of matrix operations, leveraging the "massive" data parallelism in GPU computing to reduce the MDP solution time.
  • We demonstrated that standard parallel transformation and reduction operations provide the means to solve MDPs via Value Iteration with optimal performance.
SLIDE 21

Conclusion

  • Taking advantage of the proposed hexagonal grid partitioning method, our implementation provides a good level of space discretization and performance.
  • We obtained a 90x speedup using GPUs, enabling us to simulate crowd behavior interactively.
  • We found the Jetson TK1 GPU to have remarkable performance, opening many possibilities to incorporate real-time MDP solvers in mobile robotics.

SLIDE 22

Future Work

  • Reinforcement learning: evaluate different parameter values to obtain policy convergence in the least number of iterations without losing precision in the generated paths.
  • Couple the MDP solver with a Local Collision Avoidance method to obtain more precise simulation results at the microscopic level.
  • Investigate further applications of our MDP solver beyond the context of crowd simulation.

SLIDE 23

GPU Accelerated Markov Decision Processes in Crowd Simulation

Sergio Ruiz
Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México
sergio.ruiz.loza@itesm.mx

Benjamín Hernández
National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA
hernandezarb@ornl.gov

Thank you!

This research was partially supported by: CONACyT SNI-54067, CONACyT PhD scholarship 375247, an NVIDIA Hardware Grant, and the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, under DOE Contract No. DE-AC05-00OR22725.

Further reading: Ruiz, S., Hernández, B., "A parallel solver for Markov Decision Process in Crowd Simulation", MICAI 2015, 14th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico, IEEE, ISBN 978-1-5090-0323-5.

SLIDE 24

Additional Results: Intel CPU

SLIDE 25

Additional Results: ARM CPU