Massively Parallel A* Search on a GPU (presentation slides by Yichao Zhou and Jianyang Zeng)

SLIDE 1

Massively Parallel A* Search on a GPU

Yichao Zhou Jianyang Zeng

Institute for Interdisciplinary Information Sciences Tsinghua University, Beijing, P. R. China

Jan, 2015

Yichao Zhou, Jianyang Zeng Massively Parallel A* Search on a GPU

SLIDE 2

A Brief Introduction to GPU Computation

CPU (several cores)
  ☺ Decent single-thread performance
  ☹ Limited parallelism

GPU (thousands of cores)
  ☺ More computational units
  ☺ Energy efficient
  ☹ Needs massive parallelism

SLIDE 3

Shortest Path Search Problem

Find the shortest path from the starting node s to the goal node t.

[Figure: a graph with starting node s and goal node t]

SLIDE 4

Heuristic Function

[Figure: a path from s through an intermediate node x to t]

Heuristic function: f(x) = g(x) + h(x)
g(x): distance from the starting node s to node x
h(x): estimated distance from node x to the goal node t
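As a small illustration of the evaluation function, the sketch below computes f(x) = g(x) + h(x) for one node. The 4-connected grid, the Manhattan-distance heuristic, and the names `manhattan` and `f_value` are assumptions made for this example; the slide does not prescribe a particular h.

```python
def manhattan(x, t):
    """h(x): estimated distance from node x to goal t on a 4-connected grid."""
    return abs(x[0] - t[0]) + abs(x[1] - t[1])


def f_value(g, x, t, h=manhattan):
    """f(x) = g(x) + h(x), where g is the known distance from s to x."""
    return g + h(x, t)


# Node x = (2, 1) was reached from s in 3 steps; the goal is t = (5, 5).
print(f_value(3, (2, 1), (5, 5)))  # 3 + (3 + 4) = 10
```

A* always prefers the frontier node with the smallest f, so a heuristic that never overestimates the remaining distance keeps the search optimal.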

SLIDE 7

A* Search Example

[Figure: step-by-step A* search from s to t. Legend: Extracted, Frontier, Visited, Unknown]
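The example can be mirrored by a minimal sequential A* sketch. The grid representation, unit step costs, the Manhattan heuristic, and the function name `a_star` are assumptions for illustration, not the authors' implementation.

```python
import heapq


def a_star(grid, s, t):
    """Return the shortest path length from s to t on a 4-connected grid, or None.

    grid[r][c] == 1 marks an obstacle. Every step costs 1.
    """
    def h(x):
        return abs(x[0] - t[0]) + abs(x[1] - t[1])

    g = {s: 0}                  # best known distance from s
    frontier = [(h(s), s)]      # "Frontier": priority queue ordered by f = g + h
    visited = set()             # "Visited": nodes already expanded
    while frontier:
        _, x = heapq.heappop(frontier)   # "Extracted": node with the smallest f
        if x == t:
            return g[x]
        if x in visited:
            continue                     # stale queue entry
        visited.add(x)
        r, c = x
        for y in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= y[0] < len(grid) and 0 <= y[1] < len(grid[0])):
                continue
            if grid[y[0]][y[1]] == 1:
                continue
            new_g = g[x] + 1
            if new_g < g.get(y, float("inf")):
                g[y] = new_g
                heapq.heappush(frontier, (new_g + h(y), y))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))  # the path must go around the wall
```

Only one node is extracted per iteration here, which is the sequential bottleneck the rest of the talk addresses.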

SLIDE 13

Related Work (Parallelization of A* Search)

Solve multiple small A* search problems simultaneously on a GPU (Bleiweiss 2008)
Parallelize A* on a CPU cluster (Kishimoto, Fukunaga, and Botea 2013)
Node expansion on a GPU for Dijkstra (Sulewski, Edelkamp, and Kissmann 2011)

None of them makes general A* search work on a GPU!

SLIDE 15

Our Contribution
The first GPU-based A* search framework

Massively parallel algorithm

"Pure" GPU algorithm
  Data structures are stored on the GPU
  Minimizes data transmission overhead

General A* search algorithm
  Efficient for different problems
  Guaranteed to find the globally optimal solution

SLIDE 19

Workflow of Traditional A* Algorithm

[Figure: workflow diagram built around a priority queue. A state is extracted from the queue, expanded into successor states t1, ..., tn, the successors are deduplicated, f(t1), ..., f(tn) are computed, and the states are pushed back. Stage labels: Extract, Expand, Deduplicate, Compute, Push-Back]

SLIDE 27

Workflow of GPU-based A* Algorithm

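[Figure: workflow diagram with multiple priority queues (Priority Queue 1, 2, 3). States s1, ..., sk are extracted from the queues q1, q2, q3 in parallel, expanded into successor states t1, ..., tn, the successors are deduplicated, f(t1), ..., f(tn) are computed, and the states are pushed back to the queues. Stage labels: Extract, Expand, Deduplicate, Compute, Push-Back]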
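The Extract, Expand, Deduplicate, Compute, Push-Back loop over several priority queues can be simulated sequentially. The sketch below is not GPU code and not the authors' implementation: the function name `multi_queue_a_star`, the round-robin choice of target queue, and the placement of the termination test after push-back (a conservative choice) are all assumptions. On a GPU, each queue would be served by its own thread or thread group.

```python
import heapq


def multi_queue_a_star(neighbors, h, s, t, k=3):
    """Sequential simulation of A* with k priority queues.

    neighbors(x) yields (y, cost) pairs and h is an admissible heuristic.
    Nodes must be mutually comparable (e.g. strings), since they break heap ties.
    Returns the optimal cost from s to t, or None if t is unreachable.
    """
    queues = [[] for _ in range(k)]
    heapq.heappush(queues[0], (h(s), 0, s))   # entries are (f, g, node)
    best_g = {s: 0}                           # deduplication table
    goal_cost = None                          # cheapest path to t found so far
    target = 0                                # round-robin push target (assumption)

    while any(queues):
        successors = []
        # Extract + Expand: one state per non-empty queue.
        for q in queues:
            if not q:
                continue
            _, g, x = heapq.heappop(q)
            if g > best_g[x]:
                continue                      # stale entry, a cheaper copy exists
            if x == t:
                if goal_cost is None or g < goal_cost:
                    goal_cost = g
                continue
            for y, cost in neighbors(x):
                successors.append((g + cost, y))

        # Deduplicate + Compute + Push-Back.
        for g, y in successors:
            if g >= best_g.get(y, float("inf")):
                continue                      # already reached at least as cheaply
            best_g[y] = g
            heapq.heappush(queues[target], (g + h(y), g, y))
            target = (target + 1) % k

        # Stop once no queued state can still lead to a cheaper path to t.
        if goal_cost is not None:
            min_f = min((q[0][0] for q in queues if q), default=float("inf"))
            if goal_cost <= min_f:
                return goal_cost
    return goal_cost


edges = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 1)]}
print(multi_queue_a_star(lambda x: edges.get(x, []), lambda x: 0, "s", "t"))
```

Because several states are extracted at once, the goal may first be reached through a suboptimal path; the search therefore continues until the best goal cost is no larger than the smallest f left in any queue.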

SLIDE 30

GPU-based A* Search Example Using 3 Priority Queues

[Figure: step-by-step search from s to t in which each node is labeled with the index i of its queue. Legend: extracted node from the i-th queue, frontier node in the i-th queue, Visited, Unknown]

SLIDE 35

Evaluation

Experiment
Evaluate running time and memory usage of

  • traditional single-thread CPU-based A* search
  • our GPU-based A* search

on three problems with different characteristics

Environment Information
CPU: Intel Xeon™ E5-1620 3.6GHz
GPU: NVIDIA Tesla K20C (2496 logic cores)

SLIDE 37

Structure-Based Protein Design

Problem Description

[Figure: a protein structure is passed through an optimization algorithm to produce a 1D amino acid sequence (sample residues shown: Ile, Pro, His, Gly, Glu, Val, Ser, Asp, Ala, Trp)]

Energy Function (Optimization Target)

$$E_T = \sum_{i_r} E_1(i_r) + \sum_{i_r} \sum_{j_s,\, i<j} E_2(i_r, j_s)$$

Features
Heuristic function is computationally expensive
Tree search rather than graph search
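To make the energy function concrete, the sketch below evaluates E_T for one complete assignment, reading i_r as "choice r made at position i" (an assumption; the slide does not define the indices). The function name `total_energy`, the table layout, and the toy numbers are hypothetical.

```python
def total_energy(assignment, e1, e2):
    """E_T = sum over i of E1(i_r) + sum over pairs i < j of E2(i_r, j_s).

    assignment[i] is the choice r made at position i.
    e1[i][r] is the single-position energy.
    e2[(i, j)][(r, s)] is the pairwise energy, stored only for i < j.
    """
    n = len(assignment)
    total = sum(e1[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += e2[(i, j)][(assignment[i], assignment[j])]
    return total


# Toy instance: 2 positions with 2 choices each (made-up numbers).
e1 = [[1.0, 2.0], [0.5, 0.0]]
e2 = {(0, 1): {(0, 0): 3.0, (0, 1): -1.0, (1, 0): 0.0, (1, 1): 2.0}}
print(total_energy([0, 1], e1, e2))  # 1.0 + 0.0 + (-1.0) = 0.0
```

In a tree search over such assignments, each level fixes one more position, which is consistent with the slide's remark that this problem is a tree search rather than a graph search.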

SLIDE 39

Structure-Based Protein Design

Experiment Result: Running Time

Running time (in milliseconds)

Instance   Sequential CPU-based A*   GPU-based A* (4992 queues)
2CS7       1.01 · 10^5               2,134
2DSX       53,604                    1,318
3D3B       32,980                    909

SLIDE 40

Structure-Based Protein Design

Experiment Result: Number of Total Expanded States

Number of total expanded states

Instance   Sequential CPU-based A*   GPU-based A* (4992 queues)
2CS7       2.49 · 10^7               3.46 · 10^7
2DSX       2.61 · 10^7               3.32 · 10^7
3D3B       2.15 · 10^7               2.71 · 10^7

SLIDE 41

Sliding Puzzle

Problem Description

Heuristic Function
Using disjoint pattern database heuristic (Korf and Felner 2002)
Heuristic function requires memory access

SLIDE 42

Sliding Puzzle

Experiment Result: Running Time

Running time (in milliseconds)

Instance   Sequential CPU-based A*   GPU-based A* (2496 queues)   GPU-based A* (9984 queues)
4x4a       1,687                     184                          174
5x5a       52,972                    3,236                        1,729
5x5b       98,935                    4,532                        2,187
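The speedups implied by these running times can be checked with a few lines of arithmetic. The sketch assumes the chart values are listed series by series (CPU first, then 2496 queues, then 9984 queues) with one value per instance; the helper name `speedup` is hypothetical.

```python
# Running times in milliseconds, read from the chart under the assumption above.
cpu = {"4x4a": 1687, "5x5a": 52972, "5x5b": 98935}
gpu_9984 = {"4x4a": 174, "5x5a": 1729, "5x5b": 2187}


def speedup(baseline_ms, accelerated_ms):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


speedups = {name: round(speedup(cpu[name], gpu_9984[name]), 1) for name in cpu}
print(speedups)
```

Under this reading, the largest instance gains the most from the GPU, while the small 4x4 instance leaves less work to parallelize.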

SLIDE 43

Shortest Path

Problem Description

[Figure: grid network with starting node s and goal node t]

Features
Grid network size: 10,000 × 10,000
Simple heuristic function: diagonal distance
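One common form of the diagonal distance on an 8-connected grid is sketched below. The slide only names the heuristic, so the move costs (`d` for straight moves, `d2` for diagonal moves) and the function name `diagonal_distance` are assumptions.

```python
import math


def diagonal_distance(x, t, d=1.0, d2=math.sqrt(2)):
    """Cost of the cheapest obstacle-free path from x to t on an 8-connected grid.

    Take min(dx, dy) diagonal steps, then the remaining steps straight.
    """
    dx = abs(x[0] - t[0])
    dy = abs(x[1] - t[1])
    return d * (dx + dy) + (d2 - 2 * d) * min(dx, dy)


print(diagonal_distance((0, 0), (3, 4)))            # 3 diagonal steps + 1 straight step
print(diagonal_distance((0, 0), (3, 4), d2=1.0))    # diagonal moves cost 1: 4.0
```

The function needs only a few arithmetic operations and no memory lookups, in contrast to the pattern-database heuristic used for the sliding puzzle.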

SLIDE 44

Shortest Path

Experiment Result: Running Time

Running time (in milliseconds)

Instance   Sequential CPU-based A*   GPU-based A* (2496 queues)   GPU-based A* (9984 queues)
zigzag     58,081                    15,746                       8,322
random     32,562                    12,527                       9,563
full       8                         4,475                        8,114

SLIDE 45

Shortest Path

Experiment Result: Number of Total Expanded States

Number of total expanded states

Instance   Sequential CPU-based A*   GPU-based A* (2496 queues)   GPU-based A* (9984 queues)
zigzag     4 · 10^7                  4.8 · 10^7                   7.5 · 10^7
random     2.8 · 10^7                3.7 · 10^7                   7.6 · 10^7
full       3 · 10^3                  1.8 · 10^7                   7.5 · 10^7

SLIDE 46

Conclusion

Our Contribution
First massively parallel GPU-based A*
4x to 45x speedup for most problems

Future Work
Improvement when the parallelism of problems is limited
Multi-GPU A* search
Extension to other heuristic search algorithms

SLIDE 47

Q&A

Thank you!

Funding
National Basic Research Program of China
National Natural Science Foundation of China
