Performance Issues for Parallel Implementations of Bootstrap Simulation Algorithm



SLIDE 1

Performance Issues for Parallel Implementations of Bootstrap Simulation Algorithm

22nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2010)

Ricardo M. Czekster, Paulo Fernandes, Afonso Sales and Thais Webber

Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)

PaleoProspec Project (PUCRS/Petrobras), also funded by CAPES and CNPq, Brazil

SLIDE 2

Context

Interest

The solution of large and complex state-based stochastic models to extract performance indices.

Solution

Numerical (iterative methods)

◮ Power method [Stewart 94]
◮ Arnoldi [Arnoldi 51]
◮ GMRES [Saad and Schultz 86]

Simulation

◮ Traditional [Ross 96]
◮ Monte Carlo [Häggström 02]
◮ Backward [Propp and Wilson 96]
◮ Bootstrap [Czekster et al. 10]
  ⋆ reliable estimations
  ⋆ high computational cost to generate repeated batches of samples

SLIDE 3

Context

Markovian simulation

Generation of independent samples (parallel execution)
Parallel sampling (e.g., master-worker approach)
Possible sequence of states using the transition matrix

◮ random walk or simulation trajectory
◮ huge size → huge memory cost
◮ Stochastic Automata Networks (structured formalism, underlying Markov chain)
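As a rough illustration of the memory cost noted above, consider a hypothetical model built from N = 10 automata with 4 local states each, assuming 8-byte double-precision entries (these sizes are illustrative, not taken from the experiments):

```latex
|S| = \prod_{i=1}^{N} n_i = 4^{10} = 2^{20} = 1\,048\,576 \text{ states}

\text{dense transition matrix: } |S|^2 = 2^{40} \approx 1.1 \times 10^{12} \text{ entries}
  \;\Rightarrow\; 2^{40} \times 8\,\text{B} = 2^{43}\,\text{B} = 8\,\text{TiB}

\text{probability (or counter) vector: } |S| \times 8\,\text{B} = 2^{23}\,\text{B} = 8\,\text{MiB}
```

A structured description such as a SAN avoids storing the dense matrix explicitly, so a random walk over it pays roughly the vector-sized cost rather than the matrix-sized one.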

SLIDE 4

Objective

Goal

To present a parallel implementation of Bootstrap simulation, focusing on the overall performance of the technique, by presenting a method to generate a large number of samples in less time.

Discussion

Processing vs. communication times; model size vs. number of generated samples

SLIDE 5

Outline

Stochastic Automata Networks (SAN)
Bootstrap simulation
Parallelization
Experiments and results
Conclusion and future works

SLIDE 6

Stochastic Automata Networks (SAN)

It allows the description of a large system in a structured manner, by its parts (automata).

[Figure: a SAN model with automata A, B and C, and its underlying Continuous-Time Markov Chain, whose global states (e.g., 000, 100, 111) are connected by transitions labelled α, β and γ.]

Type  Event  Rate
syn   s1     α
syn   s2     β
loc   l1     f

f = [(B == 0) && (C == 0)] × γ
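The functional rate f can be made concrete with a small C++ sketch (C++ being the language of the authors' implementation). It assumes, as the figure residue suggests but does not confirm, that A, B and C each have local states 0 and 1; `GlobalState`, `functional_rate_f` and `product_state_space` are hypothetical names, not part of any SAN tool's API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical encoding of a global SAN state: one local state per automaton.
// ASSUMPTION: automata A, B and C each have two local states, 0 and 1.
struct GlobalState {
    int a;
    int b;
    int c;
};

// Functional rate of the local event l1: f = [(B == 0) && (C == 0)] x gamma.
// The bracket evaluates to 1 when the condition holds and to 0 otherwise,
// so l1 can only fire in global states where B and C are both in state 0.
double functional_rate_f(const GlobalState& s, double gamma) {
    assert(gamma >= 0.0);  // rates are non-negative
    const bool enabled = (s.b == 0) && (s.c == 0);
    return enabled ? gamma : 0.0;
}

// Enumerates the product state space A x B x C (2 * 2 * 2 = 8 global states).
// A SAN description does not require this explicit list; it is built here
// only to show which global states enable l1.
std::vector<GlobalState> product_state_space() {
    std::vector<GlobalState> states;
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            for (int c = 0; c < 2; ++c)
                states.push_back({a, b, c});
    return states;
}
```

Under this encoding only the global states 000 and 100 enable l1 (f = γ); in the other six states f evaluates to 0, so the event is simply disabled there.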

SLIDE 8

Bootstrap

Method

It is a well-known statistical method, applied in many fields to improve accuracy when performing sample estimations for complex distributions.

In the simulation context

Bootstrap simulation provides more reliable estimations than traditional simulation [SCSC’10].

Main feature

Generation of repeated batches of samples, which helps to improve the accuracy of the method.
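For readers unfamiliar with the classical bootstrap, the sketch below shows the standard nonparametric version of the idea: resample a data set with replacement many times and summarize the resampled means. It is the generic statistical technique, not the Bootstrap simulation algorithm of these slides; `bootstrap_mean` and `BootstrapEstimate` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct BootstrapEstimate {
    double mean;       // average of the resampled means
    double std_error;  // standard deviation of the resampled means
};

// Classical nonparametric bootstrap of the sample mean: draw B resamples of
// the data with replacement, compute each resample's mean, and summarize them.
BootstrapEstimate bootstrap_mean(const std::vector<double>& data,
                                 std::size_t B, unsigned seed) {
    assert(!data.empty() && B > 1);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<double> means(B);
    for (std::size_t b = 0; b < B; ++b) {
        double sum = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) sum += data[pick(rng)];
        means[b] = sum / static_cast<double>(data.size());
    }
    double mean = 0.0;
    for (double m : means) mean += m;
    mean /= static_cast<double>(B);
    double var = 0.0;
    for (double m : means) var += (m - mean) * (m - mean);
    var /= static_cast<double>(B - 1);
    return {mean, std::sqrt(var)};
}
```

For the data {1, 2, 3, 4}, the bootstrap standard error should settle near √(1.25 / 4) ≈ 0.56, the plug-in standard error of the mean.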

SLIDE 9

Traditional simulation

[Figure: a single simulation trajectory over time, starting from an initial state; each visited state increments its counter π′₀, π′₁ or π′₂.]

Transition matrix (rows: current state; columns: next state)

       0      1      2
0    0.10   0.65   0.25
1    0.25   0.55   0.20
2    0.30   0.25   0.45

At each step a uniform number U ∈ [0, 1) is drawn, and the next state is the first column whose cumulative probability in the current row exceeds U. In the example, starting from initial state 0, the draws U = 0.08, 0.87 and 0.32 lead to states 0, 2 and 1, and each visit increments the corresponding counter.

n = trajectory length; each visited state = one sample

Mean permanence probability: $\pi = \pi' / n$
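The walk above can be sketched in C++ as a minimal illustration, not the authors' code: `kP`, `next_state` and `traditional_simulation` are hypothetical names, and the matrix is the one reconstructed on the slide (rows are current states and sum to 1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Transition matrix as reconstructed from the slide: row i holds the
// probabilities of moving from state i to states 0, 1 and 2.
constexpr std::array<std::array<double, 3>, 3> kP = {{
    {0.10, 0.65, 0.25},
    {0.25, 0.55, 0.20},
    {0.30, 0.25, 0.45},
}};

// Inverse-transform step: returns the first state j whose cumulative
// probability in row `current` exceeds the uniform draw u in [0, 1).
int next_state(int current, double u) {
    assert(current >= 0 && current < 3);
    double cumulative = 0.0;
    for (int j = 0; j < 3; ++j) {
        cumulative += kP[current][j];
        if (u < cumulative) return j;
    }
    return 2;  // guard against rounding when u is very close to 1
}

// Traditional simulation: walk n steps from `initial`, count every visited
// state in pi_prime, and return the estimate pi = pi_prime / n.
std::array<double, 3> traditional_simulation(int initial, std::size_t n,
                                             unsigned seed) {
    assert(n > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::array<std::size_t, 3> pi_prime{};  // visit counters, all zero
    int state = initial;
    for (std::size_t t = 0; t < n; ++t) {
        state = next_state(state, uniform(rng));
        ++pi_prime[state];
    }
    std::array<double, 3> pi{};
    for (int j = 0; j < 3; ++j)
        pi[j] = static_cast<double>(pi_prime[j]) / static_cast<double>(n);
    return pi;
}
```

Feeding `next_state` the draws 0.08, 0.87 and 0.32 reproduces the walk 0 → 0 → 2 → 1 from the example. For this matrix, solving πP = π gives π = (79, 168, 97) / 344 ≈ (0.230, 0.488, 0.282), which long runs of `traditional_simulation` should approach.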

SLIDE 21

Bootstrap simulation

[Figure: a single trajectory over time, starting from an initial state; at each step, each of the z bootstraps K₁, K₂, …, K_z records samples of the visited state; at the end, each bootstrap is normalized into a vector x̄₁, x̄₂, …, x̄_z.]

Transition matrix (rows: current state; columns: next state): 0.10 0.65 0.25 / 0.25 0.55 0.20 / 0.30 0.25 0.45, the same as in the traditional simulation.

The trajectory is generated as in the traditional simulation (the example reuses the draws U = 0.08, 0.87 and 0.32). At each step, each bootstrap performs n̄ trials to execute the resamplings. At the end, each bootstrap K_i is normalized into x̄_i.

n = trajectory length; K: bootstrap; z: number of bootstraps

Mean permanence probability:

$$\pi = \frac{\sum_{i=1}^{z} \bar{x}_i}{z}$$

that is, component-wise for each state j ∈ {0, 1, 2}:

$$\pi_j = \frac{\bar{x}_1[j] + \bar{x}_2[j] + \cdots + \bar{x}_z[j]}{z}$$
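A minimal C++ sketch of the bootstrap simulation, under stated assumptions: the slides say that each bootstrap performs n̄ trials to execute the resamplings but not how a trial decides whether to record the current state, so the sketch uses a hypothetical rule (a trial succeeds with probability 1/2, `kTrialAcceptProb`). The names `bootstrap_simulation`, `next_state` and `kTrialAcceptProb` are illustrative, not the authors' code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Same reconstructed transition matrix as in the traditional simulation.
constexpr std::array<std::array<double, 3>, 3> kP = {{
    {0.10, 0.65, 0.25},
    {0.25, 0.55, 0.20},
    {0.30, 0.25, 0.45},
}};

// HYPOTHETICAL acceptance rule: the slides do not say how a trial decides
// whether to count the current state; here it succeeds with probability 1/2.
constexpr double kTrialAcceptProb = 0.5;

// Inverse-transform step over row `current` of kP.
int next_state(int current, double u) {
    double cumulative = 0.0;
    for (int j = 0; j < 3; ++j) {
        cumulative += kP[current][j];
        if (u < cumulative) return j;
    }
    return 2;
}

// Bootstrap simulation sketch: one trajectory of length n feeds z bootstrap
// counters K[b]; at every step, each bootstrap runs nbar resampling trials.
std::array<double, 3> bootstrap_simulation(int initial, std::size_t n,
                                           std::size_t z, std::size_t nbar,
                                           unsigned seed) {
    assert(n > 0 && z > 0 && nbar > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::vector<std::array<std::size_t, 3>> K(z, std::array<std::size_t, 3>{});
    int state = initial;
    for (std::size_t t = 0; t < n; ++t) {
        state = next_state(state, uniform(rng));
        for (std::size_t b = 0; b < z; ++b)
            for (std::size_t trial = 0; trial < nbar; ++trial)
                if (uniform(rng) < kTrialAcceptProb) ++K[b][state];
    }
    // Normalize each bootstrap into x̄_b, then pi = (x̄_1 + ... + x̄_z) / z.
    std::array<double, 3> pi{};
    for (const auto& counts : K) {
        const std::size_t total = counts[0] + counts[1] + counts[2];
        assert(total > 0);  // holds with overwhelming probability for large n
        for (int j = 0; j < 3; ++j)
            pi[j] += static_cast<double>(counts[j]) / static_cast<double>(total);
    }
    for (double& p : pi) p /= static_cast<double>(z);
    return pi;
}
```

Because every bootstrap sees the same trajectory and the assumed acceptance rule does not depend on the state, each x̄_b approximates the visit frequencies, and their average approaches the stationary values (≈ 0.230, 0.488, 0.282); the bootstraps differ only through their random trials.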

SLIDE 31

Parallelization

Approach

Split the bootstrap sampling tasks over the processing nodes
Each node performs the full trajectory simulation but produces a different set of samples
Master-worker pattern
Implementation: C++ language and MPI primitives
Executed on a cluster of 8 Dell PowerEdge R610 servers connected by a Gigabit Ethernet network
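The master-worker split can be sketched with `std::thread` standing in for cluster nodes, since MPI code needs an MPI runtime to run. This is an illustration under assumptions, not the authors' implementation: the `Worker` signature and `master_worker` are hypothetical, and each worker is assumed to return the sum of the normalized vectors x̄_b of the bootstraps it owns, so the master only adds the partial sums and divides by z.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// A worker returns the element-wise sum of the normalized vectors x̄_b of the
// bootstraps it owns. HYPOTHETICAL signature: (node index, bootstraps owned).
using Worker = std::function<std::vector<double>(std::size_t, std::size_t)>;

// Master-worker sketch in which threads stand in for cluster nodes.
// counts[p] is the number of bootstraps assigned to node p.
std::vector<double> master_worker(const std::vector<std::size_t>& counts,
                                  std::size_t num_states, const Worker& worker) {
    std::vector<std::vector<double>> partial(counts.size());
    std::vector<std::thread> nodes;
    for (std::size_t p = 0; p < counts.size(); ++p)
        // Each thread writes only its own slot, so no locking is needed.
        nodes.emplace_back([&, p] { partial[p] = worker(p, counts[p]); });
    for (auto& node : nodes) node.join();

    // Master: gather one vector of num_states values per node (the message
    // size grows with the state space, not with n), then divide by z.
    std::vector<double> pi(num_states, 0.0);
    std::size_t z = 0;
    for (std::size_t p = 0; p < counts.size(); ++p) {
        assert(partial[p].size() == num_states);
        for (std::size_t j = 0; j < num_states; ++j) pi[j] += partial[p][j];
        z += counts[p];
    }
    assert(z > 0);
    for (double& x : pi) x /= static_cast<double>(z);
    return pi;
}
```

On a cluster, the gather-and-sum step would map naturally onto an MPI collective such as `MPI_Reduce` with the `MPI_SUM` operation; the slides only say that MPI primitives were used, not which ones.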

SLIDE 32

Experiments

Number of bootstraps assigned to nodes in each configuration (36 bootstraps in total)

Configuration (nodes)   Bootstraps per node
1                       36
2                       18, 18
3                       12, 12, 12
4                       9, 9, 9, 9
5                       7, 7, 7, 7, 8
6                       6, 6, 6, 6, 6, 6
7                       5, 5, 5, 5, 5, 5, 6
8                       4, 4, 4, 4, 5, 5, 5, 5
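The distribution in the table follows a simple rule that can be checked in code: with z = 36 bootstraps and p nodes, each node gets ⌊z/p⌋ bootstraps and the last (z mod p) nodes get one more. The rule is inferred from the table, and `distribute_bootstraps` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits z bootstraps over p nodes: every node receives z / p bootstraps and
// the last (z % p) nodes receive one extra, matching the configuration table.
std::vector<std::size_t> distribute_bootstraps(std::size_t z, std::size_t p) {
    assert(p > 0);
    std::vector<std::size_t> counts(p, z / p);
    const std::size_t extra = z % p;
    for (std::size_t i = p - extra; i < p; ++i) ++counts[i];
    return counts;
}
```

Placing the larger shares last reproduces the table exactly; any other placement of the extra bootstraps would balance the load equally well.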

Models (examples)

ASP (Alternate Service Patterns): describes an Open Queueing Network with servers that map P different service patterns.
FAS (First Available Server): indicates the availability of N servers.
RS (Resource Sharing): maps R shared resources to P processes.

SLIDE 33

Results: large models (millions of states)

[Figure: four panels for trajectory lengths n = 10⁶, 10⁷, 10⁸ and 10⁹; each plots time (s) versus number of nodes (1 to 8) for the RS, FAS and ASP models, with bars split into processing (Proc.) and communication (Comm.) time.]

SLIDE 34

Results: small models (hundreds of states)

[Figure: four panels for trajectory lengths n = 10⁶, 10⁷, 10⁸ and 10⁹; each plots time (s) versus number of nodes (1 to 8) for the RS, FAS and ASP models, with bars split into processing (Proc.) and communication (Comm.) time.]

SLIDE 35

Conclusion and future works

Summary

An efficient implementation of a novel simulation algorithm
Considerable speedup for very large models

◮ especially for long trajectories

The speedup was consistent across different SAN models

◮ nearly 5 times speedup with 8 nodes

The processing demands depend only on the simulation trajectory length (n)
The communication demands depend only on the reachable state space size of the model
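The reported speedup can be related to parallel efficiency with the usual definitions, where T_p is the execution time on p nodes (standard notation, not taken from the slides):

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}

S_8 \approx 5 \quad\Rightarrow\quad E_8 \approx \frac{5}{8} \approx 0.63
```

That is, the 8-node runs reach a little over 60% of the ideal linear speedup.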

SLIDE 36

Conclusion and future works

Future works

Study of bootstrap distribution over non-uniform memory architectures

◮ some levels of shared memory could be highly beneficial to cope with high communication (short trajectories for large models)

Blending methods

◮ combination of the parallel Bootstrap approach with more sophisticated simulation approaches (e.g., Perfect Simulation)

SLIDE 37

Thank you for your attention.