A Dynamic Programming-based MCMC Framework for Solving DCOPs with GPUs
Ferdinando Fioretto (1,2), joint work with William Yeoh (2) and Enrico Pontelli (2)
(1) University of Michigan  (2) New Mexico State University
CP 2016, Toulouse
Outline: Introduction · GPUs · DMCMC · Results · Conclusions
GPUs already power mainstream workloads such as numerical analysis (e.g., MathWorks MATLAB), bioinformatics, and deep learning.
In a Distributed Constraint Optimization Problem (DCOP), there is no centralized solver: each agent controls its own variables and coordinates with the other agents to optimize the constraints among them.
[Slide: an example binary constraint between variables xa and xb, given as a table mapping each joint assignment to a utility U]
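Such a constraint can be stored as a simple lookup table. A minimal sketch (the entries below are illustrative, loosely mirroring the slide's partially garbled example):

```python
# A binary constraint stored as a utility lookup table mapping joint
# assignments (xa, xb) to utilities U. Entries are illustrative only.
utility_table = {
    (3, 1): 20,
    (1, 2): 1,
}

def constraint_utility(xa, xb):
    """Utility of the joint assignment (xa, xb); 0 for unlisted pairs."""
    return utility_table.get((xa, xb), 0)
```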
[Slide: an agent ai owns variables x1, …, x5, partitioned into local variables Li and boundary variables Bi (those shared in constraints with other agents)]
The goal is to find a complete assignment x* that maximizes the sum of all utility functions:

x* = argmax_x ∑_{f ∈ F} f(x_f)
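This objective can be made concrete with a small brute-force sketch (the variables, domains, and utility functions below are hypothetical, chosen only for illustration — real DCOP algorithms avoid this exhaustive enumeration):

```python
from itertools import product

# Hypothetical toy problem: two variables with domain {0, 1, 2} and two
# binary utility functions. Names and values are illustrative only.
domains = {"xa": [0, 1, 2], "xb": [0, 1, 2]}

def f1(xa, xb):
    return 10 if xa != xb else 0   # rewards disagreement

def f2(xa, xb):
    return xa + xb                 # rewards large values

functions = [f1, f2]

def total_utility(assignment):
    """The quantity the DCOP maximizes: the sum of all utility functions."""
    return sum(f(assignment["xa"], assignment["xb"]) for f in functions)

# Brute-force argmax over all joint assignments (feasible only for tiny problems).
best = max(
    (dict(zip(domains, values)) for values in product(*domains.values())),
    key=total_utility,
)
```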
Source: http://xr0038.hatenadiary.jp/
[Nguyen et al., AAMAS 2013]
[Slide: the CUDA execution model — the CPU (host) launches kernels on the GPU (device); each kernel executes as a grid of thread blocks, e.g., Block (0,0) … Block (2,1), and each block runs a 2D array of threads]
[Slide: the CUDA memory hierarchy — the host exchanges data with the GPU's global and constant memory; threads within a block share a fast shared memory; each thread has its own private registers]
Introduction GPUs DMCMC Results Conclusions
12
Introduction GPUs DMCMC Results Conclusions
13
Step 1: allocate device memory and copy the input data from the host into the GPU's global memory:

cudaMalloc(&deviceV, sizeV);
cudaMemcpy(deviceV, hostV, sizeV, cudaMemcpyHostToDevice);
Step 2: invoke the kernel, which operates on the data in global memory (note that CUDA's launch configuration lists the number of blocks before the number of threads per block):

cudaKernel<<<nBlocks, nThreads>>>( … );
Step 3: copy the results from the GPU's global memory back to the host:

cudaMemcpy(hostV, deviceV, sizeV, cudaMemcpyDeviceToHost);
Gibbs sampling: each variable is first initialized, z_i^0 ← Initialize(z_i); then, at every iteration t, each variable is resampled from its conditional distribution given the current values of all the others:

z_i^t ← Sample( P(z_i | z_1^t, …, z_{i−1}^t, z_{i+1}^{t−1}, …, z_n^{t−1}) )
[Nguyen et al. AAMAS-2013]
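A minimal sketch of this update rule (the chain, domain, and scoring function below are hypothetical, not the paper's algorithm):

```python
import math
import random

# Minimal Gibbs sampler over n binary variables. The unnormalized score
# plays the role of the utility functions: here a hypothetical pairwise
# potential that rewards equal neighboring values.
def score(z):
    return sum(1.0 if z[i] == z[i + 1] else 0.0 for i in range(len(z) - 1))

def gibbs(n=5, iters=100, seed=0):
    rng = random.Random(seed)
    z = [rng.randint(0, 1) for _ in range(n)]      # z^0 <- Initialize(z)
    for _ in range(iters):
        for i in range(n):                         # resample z_i given the rest
            weights = []
            for d in (0, 1):
                z[i] = d
                weights.append(math.exp(score(z))) # unnormalized conditional
            total = sum(weights)
            z[i] = 0 if rng.random() < weights[0] / total else 1
    return z

sample = gibbs()
```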
[Slide: two neighboring agents, ai with variables x1, …, x5 and aj with variables x6, …, x10, each partitioned into local variables (Li, Lj) and boundary variables (Bi, Bj)]
[Slide: the agent's utilities over x1, …, x5 are aggregated into a joint utility table, one utility U per joint assignment, organized by local variables Li and boundary variables Bi]
[Slide: GPU parallelization of the joint utility table] Each row of the joint utility table is computed in parallel, distributed across several GPU blocks [Fioretto et al., CP 2015].
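The row-parallel structure can be illustrated with a small sketch (the variables and utility function are hypothetical): each row of the table depends only on its own joint assignment, which is exactly why the rows can be distributed across GPU blocks and computed independently.

```python
from itertools import product

# Building a joint utility table: one row per joint assignment of the
# agent's variables. Each row is an independent task (names/values below
# are illustrative only).
domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}

def utility(row):
    """Stand-in for the sum of the agent's utility functions."""
    return 10 * row["x1"] + 5 * (row["x2"] == row["x3"])

rows = [dict(zip(domains, vals)) for vals in product(*domains.values())]
table = [(row, utility(row)) for row in rows]   # one independent task per row
```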
[Slide: drawing R multiple samples in parallel — the sampling chains are independent, so each GPU block can work on its own portion of the joint utility table, producing R samples simultaneously]
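Because the R chains never communicate, each can run in its own GPU block; a sequential simulation of that idea (with hypothetical variables and utilities, using independently seeded chains and keeping the best sample found):

```python
import random

# R independent sampling chains; on a GPU each would run in its own block.
# Here they are simulated sequentially. Names and utilities are illustrative.
def utility(z):
    return 10 * z[0] + 5 * (z[1] == z[2])

def run_chain(seed, steps=50, n=3, domain=(0, 1)):
    rng = random.Random(seed)
    best = [rng.choice(domain) for _ in range(n)]
    for _ in range(steps):
        z = [rng.choice(domain) for _ in range(n)]
        if utility(z) > utility(best):
            best = z
    return best

R = 8
samples = [run_chain(seed) for seed in range(R)]   # embarrassingly parallel
best_overall = max(samples, key=utility)
```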
[Slide: the Gibbs sampling process mapped onto the GPU — blocks and threads cooperate to carry out the sampling steps]
Within an agent, each variable x_k is resampled from the local distribution

q(x_k = d | x_l ∈ L_i \ {x_k}) = (1 / Z_π) · exp( ∑_{f_j ∈ F_i} f_j(z|x_{f_j}) ),

where Z_π is the normalizing constant and F_i is the set of utility functions involving the agent's variables.
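A minimal sketch of computing and sampling from this local distribution (the variables and utility functions below are hypothetical):

```python
import math
import random

# q(x_k = d | rest) is proportional to exp(sum of utilities involving x_k).
# f1, f2 and the assignment are illustrative only.
def f1(x1, x2):
    return 5 if x1 == x2 else 0

def f2(x1, x3):
    return x1 + x3

def q_distribution(domain, assignment, k, functions):
    """Return q(x_k = d | all other variables fixed) for each d in domain."""
    weights = []
    for d in domain:
        z = dict(assignment, **{k: d})
        weights.append(math.exp(sum(f(z) for f in functions)))
    Z = sum(weights)                     # normalizing constant Z_pi
    return [w / Z for w in weights]

functions = [lambda z: f1(z["x1"], z["x2"]), lambda z: f2(z["x1"], z["x3"])]
assignment = {"x1": 0, "x2": 1, "x3": 1}
probs = q_distribution([0, 1], assignment, "x1", functions)
new_value = random.choices([0, 1], weights=probs)[0]   # sample x_1 ~ q
```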
[Figure: random networks — solution quality (ratio, 0.6–1.0) and simulated time (sec., log scale) as the number of samples S grows from 10 to 10000, comparing Gibbs (CPU), Gibbs (GPU), MH (CPU), MH (GPU), and MGM2]

On random networks, the GPU-MCMC algorithms are substantially faster than the CPU-MCMC ones, and the quality of GPU-MH solutions is comparable to that of MGM2.
Main results:
DMCMC is faster than MGM(2) and finds solutions of higher quality.
Meeting Scheduling (S = 100, R = 10). For each number of agents |A|, we report wall-clock time (wct, sec.), simulated time (st, sec.), and solution quality:

            |A| = 5                 |A| = 10                |A| = 25                |A| = 50
            wct     st     quality  wct     st     quality  wct     st     quality  wct     st     quality
DPOP        125.39  94.98  1661     …       …      …        …       …      …        …       …      …
…           7.435   0.435  1379     11.910  0.446  2766     24.211  0.417  6692     45.771  0.462  13802
MGM2        8.939   0.979  1389     23.903  1.526  2783     56.035  1.629  7116     112.54  1.788  14145
Gibbs_CPU   6.146   1.101  1638     12.093  1.190  3319     31.031  1.347  8344     62.411  1.489  16577
Gibbs_GPU   0.162   0.033  1635     0.301   0.034  3338     0.708   0.041  8344     1.416   0.048  16550
MH_CPU      0.561   0.113  1131     1.091   0.121  2775     2.281   0.176  6921     3.921   0.185  12112
MH_GPU      0.047   0.014  1143     0.102   0.016  2663     0.196   0.017  6925     0.360   0.022  11856
References
- D. T. Nguyen, W. Yeoh, and H. C. Lau. "Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm." AAMAS, 2013.
- F. Fioretto, T. Le, W. Yeoh, E. Pontelli, and T. C. Son. "Exploiting GPUs in Solving (Distributed) Constraint Optimization Problems with Dynamic Programming." CP, 2015.