Speculative High-Performance Simulation, Alessandro Pellegrini, A.Y. 2017/2018


slide-1
SLIDE 1

Speculative High-Performance Simulation

Alessandro Pellegrini A.Y. 2017/2018

slide-2
SLIDE 2

Simulation

  • From Latin simulare (to mimic or to fake)
  • It is the imitation of the operation of a real-world process or system over time
  • It allows results to be collected long before a system is actually built (what-if analysis)
  • It can be used to drive physical systems (symbiotic simulation)
  • Widely used: medicine, biology, physics, economics, sociology, ...

slide-3
SLIDE 3

Some Examples

slide-4
SLIDE 4

Main Categories of Simulation

  • Continuous Simulation
  • Monte Carlo Simulation
  • Discrete-Event Simulation
slide-5
SLIDE 5

Wall-Clock Time vs Logical Time

  • Two different notions of time are present in a simulation
  • Wall-Clock Time: the elapsed time required to carry out a digital simulation (the shorter, the higher the performance)
  • Logical Time: the actual simulated time
    – Also referred to as simulation time

slide-6
SLIDE 6

Continuous Simulation

  • It is typically employed for modeling physical phenomena
    – Usually relies on a set of equations to be solved periodically
  • Physical phenomena are commonly expressed via differential equations
  • A continuous simulation involves repeatedly solving equations to update the state of the modeled phenomenon

slide-7
SLIDE 7

An Example: Diffusion Equation

  • Let's consider the case of the two-dimensional diffusion equation:

  • or, more compactly:
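The two formulas on this slide did not survive extraction. Assuming the standard two-dimensional diffusion (heat) equation with a constant diffusivity α (the symbol α is an assumption, it does not appear in the surviving text), they would read:

```latex
\frac{\partial u}{\partial t}
  = \alpha \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right)
\qquad \text{or, more compactly,} \qquad
u_t = \alpha \left( u_{xx} + u_{yy} \right)
```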
slide-8
SLIDE 8

An Example: Diffusion Equation

  • We approximate u(x, y, t) by a discrete function u_{i,j}^{(m)}
    – x = iΔx
    – y = jΔy
    – t = mΔt
  • This approximation is not enough for simulation: we must be able to compute a future state starting from the current one
  • We use finite differences to transform it into a recurrence relation

slide-9
SLIDE 9

Finite Difference

  • A finite difference is a mathematical expression of the form f(x+b) - f(x+a)
    – Forward difference: Δh[f](x) = f(x+h) - f(x)
    – Backward difference: ∇h[f](x) = f(x) - f(x - h)
    – Central difference: δh[f](x) = f(x + ½h) - f(x - ½h)
  • Using finite differences, the finite-difference method can be applied to solve differential equations
  • Finite differences are used to approximate derivatives: it is a discretization method
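As a minimal sketch of how the three differences approximate a derivative, each one can be divided by the step h and compared with a known derivative. The function names and the choice of sin as test function are illustrative assumptions.

```python
import math

def forward_difference(f, x, h):
    # (f(x+h) - f(x)) / h approximates f'(x) with error proportional to h
    return (f(x + h) - f(x)) / h

def backward_difference(f, x, h):
    # (f(x) - f(x-h)) / h approximates f'(x) with error proportional to h
    return (f(x) - f(x - h)) / h

def central_difference(f, x, h):
    # (f(x+h/2) - f(x-h/2)) / h approximates f'(x) with error proportional to h^2
    return (f(x + 0.5 * h) - f(x - 0.5 * h)) / h

h = 1e-4
exact = math.cos(1.0)  # derivative of sin at x = 1
for name, approx in [
    ("forward", forward_difference(math.sin, 1.0, h)),
    ("backward", backward_difference(math.sin, 1.0, h)),
    ("central", central_difference(math.sin, 1.0, h)),
]:
    print(name, abs(approx - exact))
```

The central difference is typically the most accurate of the three for the same step size.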

slide-10
SLIDE 10

An Example: Diffusion Equation

  • Applying finite (forward) difference approximations to the derivatives we obtain:
  • To simulate, we transform it into:
  • This gives us an expression of u_{i,j}^{(m+1)}
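The formulas on this slide were lost in extraction. A plausible reconstruction, assuming a forward difference in time, the usual three-point second difference in space, and a constant diffusivity α (all assumptions, not confirmed by the surviving text), is:

```latex
\frac{u_{i,j}^{(m+1)} - u_{i,j}^{(m)}}{\Delta t}
  = \alpha \left[
      \frac{u_{i+1,j}^{(m)} - 2u_{i,j}^{(m)} + u_{i-1,j}^{(m)}}{(\Delta x)^2}
    + \frac{u_{i,j+1}^{(m)} - 2u_{i,j}^{(m)} + u_{i,j-1}^{(m)}}{(\Delta y)^2}
    \right]

u_{i,j}^{(m+1)}
  = u_{i,j}^{(m)} + \alpha \, \Delta t \left[
      \frac{u_{i+1,j}^{(m)} - 2u_{i,j}^{(m)} + u_{i-1,j}^{(m)}}{(\Delta x)^2}
    + \frac{u_{i,j+1}^{(m)} - 2u_{i,j}^{(m)} + u_{i,j-1}^{(m)}}{(\Delta y)^2}
    \right]
```

The second line is the recurrence relation: the state at step m+1 depends only on values at step m.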
slide-11
SLIDE 11

Stability of the Simulation

  • This is an approximation of a continuous system
  • Is the result correct independently of the selected time step?
  • Stability reflects the sensitivity of the differential equation's solution to perturbations
  • If the solutions are stable, they converge and perturbations are damped out
  • When we step from one approximation to the next, we land on a different solution from the one we started from

slide-12
SLIDE 12

An Example: Diffusion Equation

  • In case of 2D Heat Simulation, we rewrite ut as:
  • The resulting amplification factor becomes:
  • Neumann boundary conditions lead to:
slide-13
SLIDE 13

An Example: Diffusion Equation

  • We know that -2 ≤ cos(βΔx) - 1 ≤ 0 and -2 ≤ cos(γΔy) - 1 ≤ 0

  • The right-hand inequality holds for all β and γ
  • The left-hand inequality leads to:
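The formulas of this derivation were lost in extraction. The following is a sketch of the standard stability analysis for an explicit forward-time, centered-space update of the 2D diffusion equation, under the assumption that this is the scheme the slides use. The amplification factor ξ and the diffusivity α are assumed symbols; β and γ are the wave numbers named on the slide.

```latex
% Trial solution (a single Fourier mode)
u_{i,j}^{(m)} = \xi^{m} \, e^{\mathrm{i}\beta i \Delta x} \, e^{\mathrm{i}\gamma j \Delta y}

% Amplification factor
\xi = 1 + 2\alpha\,\Delta t \left[
        \frac{\cos(\beta\Delta x) - 1}{(\Delta x)^2}
      + \frac{\cos(\gamma\Delta y) - 1}{(\Delta y)^2}
      \right]

% Stability requires
-1 \le \xi \le 1

% Worst case of the left-hand inequality (both cosines equal to -1)
\alpha\,\Delta t \left( \frac{1}{(\Delta x)^2} + \frac{1}{(\Delta y)^2} \right) \le \frac{1}{2}
```

Since both bracketed terms are never positive, ξ ≤ 1 always holds; only the lower bound constrains the time step.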
slide-14
SLIDE 14

How is this useful programmatically?

  • Simulation is an approximation of reality
  • We want our approximation to resemble reality as much as possible
  • Setting a simulation time step that violates the stability condition gives a simulation which is incorrect

slide-15
SLIDE 15

Initial and Boundary Conditions

  • u_{i,j}^{(m+1)} is derived using u_{i,j}^{(m)}
  • Then, we must give a numerical value to u_{i,j}^{(0)}
  • Furthermore, we must specify boundary conditions for the Laplacian
    – We can arbitrarily set it to 0

slide-16
SLIDE 16

Evolution of the System

slide-17
SLIDE 17

Coding the Problem

  • We repeatedly solve the differential equations
  • We rely on a loop to do this:
  • The code to update the state of the system looks like:
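The original code on this slide was not preserved. The sketch below shows one way the loop and the state update could look in plain Python, assuming an explicit finite-difference update, boundary cells fixed at 0, and a single hot cell as initial condition. The names step and simulate and all parameter values are illustrative assumptions.

```python
def step(u, alpha, dt, dx, dy):
    """Return the grid at time step m+1 given the grid u at time step m."""
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary cells keep their (zero) value
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            d2x = (u[i + 1][j] - 2 * u[i][j] + u[i - 1][j]) / dx ** 2
            d2y = (u[i][j + 1] - 2 * u[i][j] + u[i][j - 1]) / dy ** 2
            new[i][j] = u[i][j] + alpha * dt * (d2x + d2y)
    return new

def simulate(u0, alpha, dt, dx, dy, steps):
    """Main loop: repeatedly apply the update rule."""
    u = u0
    for _ in range(steps):
        u = step(u, alpha, dt, dx, dy)
    return u

# Initial conditions (assumed): a cold plate with one hot cell in the middle.
n = 11
alpha, dx, dy = 1.0, 1.0, 1.0
dt = 0.2  # alpha*dt*(1/dx^2 + 1/dy^2) = 0.4, below the usual 1/2 stability bound
u0 = [[0.0] * n for _ in range(n)]
u0[n // 2][n // 2] = 100.0

u = simulate(u0, alpha, dt, dx, dy, steps=50)
print(round(u[n // 2][n // 2], 4))
```

Raising dt above the stability bound in this sketch makes the values oscillate and grow instead of smoothing out.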

slide-18
SLIDE 18

Coding the Problem: Initial Conditions

slide-19
SLIDE 19

EXAMPLE SESSION

Heat Diffusion Simulation in Python

slide-20
SLIDE 20

What Lessons Have we Learnt?

  • Before going distributed, we must be sure that the sequential implementation is efficient
  • Stability conditions are not only a mathematician's concern!
  • Continuous simulation is actually an approximation of the continuous behaviour of a system

slide-21
SLIDE 21

Monte Carlo Simulation

  • It is generally used to evaluate some property that is time independent
  • It tries to densely explore the whole parameter space of the phenomenon
    – Monte Carlo simulations sample a probability distribution for each variable to produce hundreds or thousands of possible outcomes
  • It is used to find (approximate) solutions of mathematical problems involving a high number of variables that cannot be easily solved analytically

slide-22
SLIDE 22

An Example: Computing π

  • Let us consider a circle with r = 1
  • The area of the circle is πr² = π
  • The area of the surrounding square is (2r)² = 2² = 4
  • The ratio of the areas is: π/4
slide-23
SLIDE 23

An Example: Computing π

  • Randomly select n points (x_i, y_i), i = 1, ..., n, in the square
  • Determine the ratio m/n
    – m is the number of points such that x_i² + y_i² ≤ 1
  • Since m/n ≈ π/4, then π ≈ 4m/n
slide-24
SLIDE 24

EXAMPLE SESSION

Monte Carlo PI Approximation
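The code of this example session is not part of the transcript. A minimal sketch of the procedure described on the previous slides, assuming points drawn uniformly in the square [-1, 1] × [-1, 1], could be the following. The function name and the seed parameter are illustrative assumptions.

```python
import random

def estimate_pi(n, seed=None):
    """Estimate pi from the fraction of random points falling inside the unit circle."""
    rng = random.Random(seed)
    m = 0  # number of points inside the circle
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            m += 1
    return 4.0 * m / n

print(estimate_pi(100_000, seed=42))
```

The error of the estimate shrinks roughly as 1/√n, so each additional correct digit costs about 100 times more samples.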

slide-25
SLIDE 25

Event-Driven Programming

  • Event-Driven Programming is a programming paradigm in which the flow of the program is determined by events
    – Sensor outputs
    – User actions
    – Messages from other programs or threads
  • Based on a main loop divided into two phases:
    – Event selection/detection
    – Event handling
  • Events resemble what interrupts do in hardware systems

slide-26
SLIDE 26

Event Handlers

  • An event handler is an asynchronous callback
  • Each event represents a piece of application-level information, delivered from the underlying framework:
    – In a GUI, events can be mouse movements, key presses, action selections, ...
  • Events are processed by an event dispatcher, which manages associations between events and event handlers and notifies the correct handler
  • Events can be queued for later processing if the involved handler is busy at the moment
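The dispatcher described above can be sketched as a small class that keeps an association table and a queue of pending events. This is an illustrative, single-threaded sketch; the class name, method names, and event types are assumptions.

```python
from collections import deque

class EventDispatcher:
    def __init__(self):
        self.handlers = {}      # event type -> list of registered callbacks
        self.pending = deque()  # events queued for later processing

    def register(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type, payload=None):
        self.pending.append((event_type, payload))

    def run(self):
        while self.pending:
            # Phase 1: event selection
            event_type, payload = self.pending.popleft()
            # Phase 2: event handling
            for handler in self.handlers.get(event_type, []):
                handler(payload)

log = []
dispatcher = EventDispatcher()
dispatcher.register("key_press", lambda key: log.append("key " + key))
dispatcher.register("mouse_move", lambda pos: log.append("mouse at %s" % (pos,)))
dispatcher.post("key_press", "a")
dispatcher.post("mouse_move", (10, 20))
dispatcher.run()
print(log)
```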

slide-27
SLIDE 27

Discrete Event Simulation (DES)

  • A discrete event occurs at an instant in time and marks a change of state in the system
  • DES represents the operation of a system as a chronological sequence of events
  • If the simulation is run on top of a parallel/distributed system, it's named Parallel Discrete Event Simulation (PDES)

slide-28
SLIDE 28

DES Building Blocks

  • Clock
    – Independently of the measuring unit, the simulation must keep track of the current simulation time
    – Being discrete, time hops to the next event’s time
  • Event List
    – At least the pending event set must be maintained by the simulation architecture
    – Events can arrive at a higher rate than they can be processed

  • Random Number Generators
  • Statistics
  • Ending Condition
slide-29
SLIDE 29

DES Skeleton
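The skeleton shown on this slide was not preserved. The sketch below illustrates a sequential DES main loop built from a clock, a pending event set, and an ending condition. It is not the author's code: the Simulator class, its schedule method, and the toy arrival model are assumptions, and random number generation and statistics are left out for brevity.

```python
import heapq
import itertools

class Simulator:
    def __init__(self, end_time):
        self.clock = 0.0                  # current simulation time
        self.end_time = end_time          # ending condition
        self.events = []                  # pending event set (min-heap on timestamp)
        self.counter = itertools.count()  # tie-breaker for equal timestamps
        self.processed = 0

    def schedule(self, timestamp, handler, payload=None):
        if timestamp < self.clock:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self.events, (timestamp, next(self.counter), handler, payload))

    def run(self):
        while self.events:
            timestamp, _, handler, payload = heapq.heappop(self.events)
            if timestamp > self.end_time:
                break
            self.clock = timestamp        # time hops to the next event's time
            handler(self, payload)        # the model's handler may schedule new events
            self.processed += 1

# Toy model: an arrival that reschedules itself every 1.5 time units.
arrivals = []

def arrival(sim, n):
    arrivals.append((sim.clock, n))
    sim.schedule(sim.clock + 1.5, arrival, n + 1)

sim = Simulator(end_time=5.0)
sim.schedule(0.0, arrival, 0)
sim.run()
print(arrivals)
```

Note that the main loop knows nothing about the model: it only pops the event with the lowest timestamp and invokes its handler.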

slide-30
SLIDE 30

Implementation of a DES Kernel

  • General-purpose Simulation is easy for DES
    – No notion of model in the main-loop pseudocode!
  • Only prerequisites:
    – The model must implement actual handlers
    – The model requires APIs to inject new events in the system and pass entities' states from the kernel
  • Multiple models can be run on the same kernel
    – Core reuse
    – Model-independent optimization of the kernel

slide-31
SLIDE 31

API to Schedule Events and Set State

slide-32
SLIDE 32

API to Schedule Events and Set State

slide-33
SLIDE 33

API to Schedule Events and Set State

slide-34
SLIDE 34

Initialization and Main Loop

slide-35
SLIDE 35

Initialization and Main Loop

slide-36
SLIDE 36

Initialization and Main Loop

slide-37
SLIDE 37

Personal Communication Service

  • Networking System for mobile devices
  • Interesting to study how different configurations behave
  • Coverage area modeled as a set of adjacent hexagons

  • Explicit modeling of channel allocation
slide-38
SLIDE 38

EXAMPLE SESSION

Personal Communication Service

slide-39
SLIDE 39

Parallel Discrete Event Simulation

  • To increase the overall performance, DES models can be run on top of multiple computing nodes
    – Distributed and/or concurrent simulation

  • The main goal is transparency
  • Simulation models should not be modified
slide-40
SLIDE 40

Traditional PDES execution support

slide-41
SLIDE 41

Why are multicores important?

slide-42
SLIDE 42

Revisited PDES Architecture

[Figure: machines connected by a Communication Network; each Machine contains several CPUs running Kernel instances, and each Kernel hosts multiple LPs.]

slide-43
SLIDE 43

Revisited PDES Architecture

[Figure: machines connected by a Communication Network; each Machine contains several CPUs running Kernel instances, and each Kernel hosts multiple LPs.]

slide-44
SLIDE 44

Is this effective as is?

  • No!
  • The multithreaded nature requires several optimizations:
    – Message exchange (synchronization among threads)
    – Memory access patterns (NUMA machines)
  • Yet this opens up new opportunities
    – Load sharing
    – Exchange of pointers among LPs

slide-45
SLIDE 45

The Synchronization Problem

  • Consider a simulation program composed of several logical processes exchanging timestamped messages
  • Consider the sequential execution: this ensures that events are processed in timestamp order
  • Consider the parallel execution: the greatest opportunity arises from processing events from different LPs concurrently

  • Is correctness always ensured?
slide-46
SLIDE 46

The Synchronization Problem

[Figure: execution timelines of LPi, LPj, and LPk annotated with event timestamps; messages are exchanged between the LPs, and one of them arrives as a straggler message.]
slide-47
SLIDE 47

Conservative Synchronization

  • Consider the LP with the smallest clock value at some instant T in the simulation's execution
  • This LP could generate events relevant to every other LP in the simulation with a timestamp T
  • No LP can process any event with timestamp larger than T

slide-48
SLIDE 48

Conservative Synchronization

  • If each LP has a lookahead of L, then any new message sent by an LP must have a timestamp of at least T + L
  • Any event in the interval [T, T + L] can be safely processed
  • L is intimately related to details of the simulation model
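The lookahead rule above can be sketched as a filter over the pending events: given the smallest LP clock T and a lookahead L, everything up to T + L is safe. The function name, the data layout, and the numbers are illustrative assumptions.

```python
def safe_events(pending, lp_clocks, lookahead):
    """Return the pending events that can be processed without risking a causality error.

    pending:   list of (timestamp, lp_id) tuples
    lp_clocks: dict mapping lp_id to its current clock value
    """
    t_min = min(lp_clocks.values())  # smallest clock value T
    horizon = t_min + lookahead      # no new message can carry a timestamp below T + L
    return sorted(event for event in pending if event[0] <= horizon)

clocks = {"A": 10.0, "B": 12.0, "C": 15.0}
pending = [(10.5, "A"), (13.0, "B"), (11.9, "C"), (12.0, "A")]
print(safe_events(pending, clocks, lookahead=2.0))
```

With a lookahead of zero only events at exactly time T would be safe, which is why conservative engines depend so heavily on the model providing a good L.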

slide-49
SLIDE 49

Optimistic Synchronization: Time Warp

  • There are no state variables that are shared between LPs
  • Communications are assumed to be reliable
  • LPs need not send messages in timestamp order
  • Local Control Mechanism
    – Events not yet processed are stored in an input queue
    – Events already processed are not discarded
  • Global Control Mechanism
    – Event processing can be undone
    – A-posteriori detection of causality violation

slide-50
SLIDE 50

Time Warp: State Recoverability

[Figure: execution timelines of LPi, LPj, and LPk annotated with event timestamps. A straggler message triggers a rollback execution that recovers the state at LVT 5; an antimessage is sent, and its reception triggers a second rollback execution that recovers the state at LVT 6.]
slide-51
SLIDE 51

Rollback Operation

  • The rollback operation is fundamental to ensure a correct speculative simulation
  • It is time critical: it is often executed on the critical path of the simulation engine
  • 30+ years of research have tried to find optimized ways to increase its performance
slide-52
SLIDE 52

State Saving and Restore

  • The traditional way to support a rollback is to rely on state saving and restore
  • A state queue is introduced into the engine
  • Upon a rollback operation, the "closest" log is picked from the queue and restored

  • What are the technological problems to solve?
  • What are the methodological problems to solve?
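A minimal sketch of state saving and restore, assuming a full snapshot is logged after every event and that undone messages are cancelled through antimessages. The LogicalProcess class and its fields are illustrative assumptions, not an actual engine API.

```python
import copy

class LogicalProcess:
    def __init__(self, state):
        self.state = state
        self.lvt = 0.0                                    # local virtual time
        self.processed = []                               # processed events are not discarded
        self.state_queue = [(0.0, copy.deepcopy(state))]  # (timestamp, snapshot) logs
        self.output_queue = []                            # (send time, message)

    def process(self, timestamp, handler):
        self.lvt = timestamp
        for message in handler(self.state, timestamp):
            self.output_queue.append((timestamp, message))
        self.processed.append((timestamp, handler))
        self.state_queue.append((timestamp, copy.deepcopy(self.state)))

    def rollback(self, straggler_ts):
        # Discard logs taken after the straggler, then restore the closest remaining one.
        while self.state_queue[-1][0] > straggler_ts:
            self.state_queue.pop()
        self.lvt, snapshot = self.state_queue[-1]
        self.state = copy.deepcopy(snapshot)
        undone = [e for e in self.processed if e[0] > straggler_ts]
        self.processed = [e for e in self.processed if e[0] <= straggler_ts]
        # Messages sent by undone events must be cancelled with antimessages.
        antimessages = [m for m in self.output_queue if m[0] > straggler_ts]
        self.output_queue = [m for m in self.output_queue if m[0] <= straggler_ts]
        return undone, antimessages

def count(state, timestamp):
    state["count"] += 1
    return ["msg@%s" % timestamp]

lp = LogicalProcess({"count": 0})
for ts in (3.0, 5.5, 7.0):
    lp.process(ts, count)
undone, antimessages = lp.rollback(3.7)  # a straggler with timestamp 3.7 arrives
print(lp.lvt, lp.state, [ts for ts, _ in antimessages])
```

After the rollback, the undone events (5.5 and 7) remain in the input queue and are re-executed after the straggler.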
slide-53
SLIDE 53

State Saving and Restore

[Animated figure, slides 53-70: three horizontal axes, each labeled "Simulation Time", for the State Queue, the Input Queue, and the Output Queue of an LP. The input queue holds events with timestamps 3, 5.5, 7, 15, 21, and 33, and a "bound" marker advances as events are processed, while state logs and output messages accumulate. A straggler event with timestamp 3.7 then arrives: antimessages are sent, the bound returns to the event at time 3, and the straggler is inserted into the input queue.]
slide-71
SLIDE 71

State Saving Efficiency

  • How large is the simulation state?
  • How often do we execute a rollback? (rollback frequency)
  • How many events do we have to undo on average?

  • Can we do something better?
slide-72
SLIDE 72

Copy State Saving

slide-73
SLIDE 73

Sparse State Saving (SSS)

slide-74
SLIDE 74

Coasting Forward

  • Re-execution of already-processed events
  • These events have been artificially undone!
  • Antimessages have not been sent
  • These events must be reprocessed in silent execution
    – Otherwise, we duplicate messages in the system!
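A minimal sketch of coasting forward, assuming that only some events have a checkpoint and that handlers emit messages through a send callback, which the silent re-execution replaces with a no-op. All names here are illustrative assumptions.

```python
import copy

def replay_silently(snapshot, events, upto):
    """Rebuild the state just before time `upto`, starting from an older snapshot.

    events: list of (timestamp, handler) in timestamp order,
            where handler has the signature handler(state, timestamp, send)
    """
    state = copy.deepcopy(snapshot)

    def drop(message):
        pass  # silent execution: outgoing messages are discarded

    for timestamp, handler in events:
        if timestamp >= upto:
            break
        handler(state, timestamp, drop)
    return state

sent = []

def handler(state, timestamp, send):
    state["count"] += 1
    send(("hello", timestamp))

events = [(t, handler) for t in (1.0, 2.0, 3.0, 4.0, 5.0)]
state = {"count": 0}
checkpoint = copy.deepcopy(state)  # the only log older than the straggler below

# Forward execution: messages are really sent.
for timestamp, h in events:
    h(state, timestamp, sent.append)

# A straggler at time 2.5: restore the checkpoint, then coast forward over events 1.0 and 2.0.
restored = replay_silently(checkpoint, events, upto=2.5)
print(restored, len(sent))
```

The restored state reflects the two replayed events, while the number of sent messages is unchanged.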

slide-76
SLIDE 76

When to take a checkpoint?

  • Classical approach: periodic state saving
  • Is this efficient?

– Think in terms of memory footprint and wall-clock time requirements

  • Model-based decision making
  • This is the basis for autonomic self-optimizing systems

  • Goal: find the best-suited value for χ
slide-77
SLIDE 77

When to take a checkpoint?

  • δs: average time to take a snapshot
  • δc: the average time to execute coasting forward
  • N: total number of committed events
  • kr: number of executed rollbacks
  • γ: average rollback length
slide-79
SLIDE 79

Incremental State Saving (ISS)

  • If the state is large and scarcely updated, ISS might provide a reduced memory footprint and a non-negligible performance increase!
  • How to know what state portions have been modified?
    – Explicit API notification (non-transparent!)
    – Operator Overloading
    – Static Binary Instrumentation
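The operator-overloading option can be sketched in Python by intercepting item assignment and logging the old value before it is overwritten. The TrackedState class is an illustrative assumption; only assignments through `state[key] = value` are tracked, in-place mutation of a stored object is not.

```python
class TrackedState:
    """Dict-like state that logs the old value of an entry before overwriting it."""

    _MISSING = object()

    def __init__(self, **values):
        self._data = dict(values)
        self._undo_log = []  # (key, old value) pairs, newest last

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Incremental log: only the touched entry is saved, not the whole state.
        self._undo_log.append((key, self._data.get(key, self._MISSING)))
        self._data[key] = value

    def mark(self):
        """Return a position in the log that a later rollback can return to."""
        return len(self._undo_log)

    def rollback(self, mark):
        while len(self._undo_log) > mark:
            key, old = self._undo_log.pop()
            if old is self._MISSING:
                del self._data[key]
            else:
                self._data[key] = old

state = TrackedState(qlen=3, sent=0, table=list(range(1000)))
mark = state.mark()
state["qlen"] = 2
state["sent"] = 1
log_size = len(state._undo_log)  # two small entries, the large table was never copied
state.rollback(mark)
print(state["qlen"], state["sent"], log_size)
```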

slide-80
SLIDE 80

Reverse Computation

  • It can reduce state saving overhead
  • Each event is associated (manually or automatically) with a reverse event
  • A majority of the operations that modify state variables are constructive in nature
    – the undo operation for them requires no history
  • Destructive operations (assignment, bit-wise operations, ...) can only be restored via traditional state saving

slide-81
SLIDE 81

Reversible Operations

slide-82
SLIDE 82

Non-Reversible Operations: if/then/else

if(qlen > 0) { qlen--; sent++; }

if(qlen "was" > 0) { sent--; qlen++; }

  • The reverse event must check an "old" state variable's value, which is not available when processing it!

slide-83
SLIDE 83

Non-Reversible Operations: if/then/else

if(qlen > 0) { b = 1; qlen--; sent++; }

if(b == 1) { sent--; qlen++; }

  • Forward events are modified by inserting "bit variables";
  • They are additional state variables telling whether a particular branch was taken or not during the forward execution
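The same idea can be sketched in Python. As an assumption that goes beyond the slide, the bit is pushed on a stack so that several consecutive events can be undone in reverse order; the function and key names are illustrative.

```python
def forward(state):
    """Forward event: serve one queued item, if any, and record which branch was taken."""
    taken = state["qlen"] > 0
    state["bits"].append(taken)  # the "bit variable" for this event
    if taken:
        state["qlen"] -= 1
        state["sent"] += 1

def reverse(state):
    """Reverse event: undo the most recent forward event using its recorded bit."""
    if state["bits"].pop():
        state["sent"] -= 1
        state["qlen"] += 1

state = {"qlen": 1, "sent": 0, "bits": []}
forward(state)  # branch taken: qlen 1 -> 0, sent 0 -> 1
forward(state)  # branch not taken: nothing changes
after_forward = dict(qlen=state["qlen"], sent=state["sent"])
reverse(state)
reverse(state)
print(after_forward, state)
```

Only one bit per event is stored, instead of a copy of qlen and sent.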

slide-84
SLIDE 84

Random Number Generators

  • Fundamental support for stochastic simulation
  • They must be aware of the rollback operation!

    – Failing to roll back a random sequence might lead to incorrect results (trajectory divergence)
    – Think, for example, of the coasting forward operation
  • Computers are precise and deterministic:
    – Where does randomness come from?

slide-85
SLIDE 85

Random Number Generators

  • Practical computer "random" generators are in common use
  • They are usually referred to as pseudo-random generators
  • What is the correct definition of randomness in this context?

slide-86
SLIDE 86

Random Number Generators

“The deterministic program that produces a random sequence should be different from, and—in all measurable respects—statistically uncorrelated with, the computer program that uses its output”

  • Two different RNGs must produce statistically the same results when coupled to an application
  • The above definition might seem circular: comparing one generator to another!

  • There is a certain list of statistical tests
slide-87
SLIDE 87

Uniform Deviates

  • They are random numbers lying in a specified range (usually [0,1])
  • Other random distributions are drawn from a uniform deviate
    – An essential building block for other distributions

  • Usually, there are system-supplied RNGs:
slide-88
SLIDE 88

Problems with System-Supplied RNGs

  • If you want a random float in [0.0, 1.0):

x = rand() / (RAND_MAX + 1.0);

  • Be very (very!) suspicious of a system-supplied rand() that resembles the above-described one
  • They belong to the category of linear congruential generators: I_{j+1} = a I_j + c (mod m)
  • The recurrence will eventually repeat itself, with a period no greater than m

slide-89
SLIDE 89

Problems with System-Supplied RNGs

  • If m, a, and c are properly chosen, the period will be of maximal length (m)
    – all possible integers between 0 and m - 1 will occur at some point
  • In general, it may look like a good idea
  • Many ANSI-C implementations are flawed
slide-90
SLIDE 90

An example RNG (from libc)

slide-91
SLIDE 91

An example RNG (from libc)

This is where we can support the rollback operation: consider the seed as part of the simulation state!
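The libc code shown on the slide was not preserved. As an illustrative sketch, a linear congruential generator whose seed is an explicit field can be rolled back simply by restoring that field. The multiplier and increment below are the ones used in the sample rand() of the C standard; the modulus 2^31 and the class layout are assumptions made for this sketch.

```python
class LCG:
    """Linear congruential generator: I_{j+1} = (A * I_j + C) mod M."""

    A = 1103515245
    C = 12345
    M = 2 ** 31  # assumed modulus

    def __init__(self, seed):
        self.seed = seed  # the whole generator state is this one integer

    def next_uniform(self):
        self.seed = (self.A * self.seed + self.C) % self.M
        return self.seed / self.M  # uniform deviate in [0.0, 1.0)

rng = LCG(seed=42)
checkpoint = rng.seed                              # the seed is saved with the simulation state
first = [rng.next_uniform() for _ in range(3)]     # speculative execution draws three numbers

rng.seed = checkpoint                              # rollback: restore the seed with the state
replay = [rng.next_uniform() for _ in range(3)]    # re-execution sees the same sequence
print(first == replay)
```

Without restoring the seed, the re-executed events would draw different numbers and the trajectory would diverge.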

slide-92
SLIDE 92

Problems with System-Supplied RNGs

slide-93
SLIDE 93

Problems with System-Supplied RNGs

In an n-dimensional space, the points lie on at most m^(1/n) hyperplanes!

slide-94
SLIDE 94
Functions of Uniform Deviates

  • The probability p(x)dx of generating a number between x and x+dx is:
  • p(x) is normalized:
  • If we take some function of x like y(x):
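The three formulas of this slide were lost in extraction. Assuming the standard treatment of a uniform deviate on (0, 1) and of the transformation law of probabilities, they would read:

```latex
% Uniform probability density
p(x)\,dx =
  \begin{cases}
    dx & 0 < x < 1 \\
    0  & \text{otherwise}
  \end{cases}

% Normalization
\int_{-\infty}^{\infty} p(x)\,dx = 1

% Density of a function y(x) of the deviate
|p(y)\,dy| = |p(x)\,dx|
\quad \Longrightarrow \quad
p(y) = p(x) \left| \frac{dx}{dy} \right|
```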

slide-95
SLIDE 95

Exponential Deviates

  • Suppose that y(x) ≡ -ln(x), and that p(x) is uniform:
  • This is distributed exponentially
  • The exponential distribution is fundamental in simulation
    – Poisson-random events, for example the radioactive decay of nuclei, or the more general interarrival time
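With y = -ln(x) and x uniform, the density of y is |dx/dy| = e^(-y), which is the unit-rate exponential distribution. A minimal sketch of drawing such deviates follows; the rate parameter, which rescales the result to mean 1/rate, is an addition for illustration.

```python
import math
import random

def exponential_deviate(rng, rate=1.0):
    """Draw an exponentially distributed value by transforming a uniform deviate."""
    u = 1.0 - rng.random()  # in (0.0, 1.0], so that log(0) can never occur
    return -math.log(u) / rate

rng = random.Random(1)
samples = [exponential_deviate(rng, rate=2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # should be close to 1 / rate = 0.5
```

In a DES model, such a value is typically added to the current simulation time to obtain the timestamp of the next arrival.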

slide-96
SLIDE 96

Exponential Deviates

slide-97
SLIDE 97

Deviate Transformation

slide-98
SLIDE 98

Scheduling Events

[Figure: machines connected by a Communication Network; each Machine contains several CPUs running Kernel instances, and each Kernel hosts multiple LPs.]

slide-99
SLIDE 99

Scheduling Events

  • A single thread takes care of a certain number of LPs at any time
  • We have to avoid inter-LP rollbacks
  • Lowest-Timestamp First:
    – Scan the input queue of all LPs
    – Check the bound of each LP
    – Pick the LP whose next event is closest in simulation time
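The three Lowest-Timestamp-First steps can be sketched as a scan over per-LP input queues. The data layout is an assumption: each queue is a sorted list of timestamps, and the bound of an LP is the timestamp of its last processed event.

```python
def pick_next_lp(input_queues, bounds):
    """Return (lp_id, timestamp) of the LP whose next event has the lowest timestamp.

    input_queues: dict lp_id -> sorted list of event timestamps
    bounds:       dict lp_id -> timestamp of the last processed event
    Returns None when no LP has a pending event.
    """
    best = None
    for lp_id, queue in input_queues.items():
        # The next event of an LP is the first one past its bound.
        next_ts = next((ts for ts in queue if ts > bounds[lp_id]), None)
        if next_ts is not None and (best is None or next_ts < best[1]):
            best = (lp_id, next_ts)
    return best

queues = {"LP0": [3.0, 5.5, 7.0], "LP1": [2.0, 4.0], "LP2": [1.0]}
bounds = {"LP0": 3.0, "LP1": 2.0, "LP2": 1.0}
print(pick_next_lp(queues, bounds))
```

A linear scan is the simplest choice; it costs time proportional to the number of LPs bound to the thread.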

slide-100
SLIDE 100

Global Virtual Time

  • In a PDES system, memory usage keeps increasing
    – We do not discard events
    – We take a lot of snapshots!
  • We must find a way to implement a garbage collector
    – During the execution of an event at time T, we can schedule events at time t ≥ T

slide-101
SLIDE 101

Global Virtual Time

At a specific wall-clock time t, the GVT is defined as the minimum between:

  • All virtual times in all virtual clocks at time t;
  • The timestamps of all sent but not yet processed events at time t
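Given a consistent snapshot of the system, the definition translates directly into a minimum over two sets. The sketch below assumes such a snapshot is already available; obtaining it in a running distributed system is the difficult part and is not shown. The function name and the sample values are illustrative.

```python
def compute_gvt(local_virtual_times, in_transit_timestamps):
    """GVT = min over all LP virtual clocks and all sent-but-unprocessed event timestamps."""
    return min(list(local_virtual_times) + list(in_transit_timestamps))

lvts = [9.0, 11.0, 15.0]  # virtual clocks of three LPs
in_transit = [6.0]        # a message that was sent but not yet processed
print(compute_gvt(lvts, in_transit))
```

Ignoring the in-transit message here would yield 9.0, an overestimate: the message at time 6.0 can still roll an LP back below that value.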

slide-102
SLIDE 102

Global Virtual Time

[Animated figure, slides 102-107: execution timelines of LPi, LPj, and LPk annotated with event timestamps and a message in transit between LPs.]

slide-108
SLIDE 108

GVT Operations

  • Once a correct GVT value is determined we can perform two actions:
    – Fossil Collection: the actual garbage collection of old memory buffers
    – Termination Detection
  • GVT identifies the commitment horizon of the speculative execution
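Fossil collection can be sketched as pruning everything that a rollback can no longer need. In this illustrative sketch (names and data layout are assumptions) the newest snapshot not later than the GVT is kept, because a rollback to any time at or after the GVT may still have to restart from it.

```python
def fossil_collect(state_queue, processed_events, gvt):
    """Drop snapshots and processed events that lie before the commitment horizon.

    state_queue:      list of (timestamp, snapshot), sorted by timestamp
    processed_events: list of (timestamp, event), sorted by timestamp
    """
    keep_from = 0
    for index, (timestamp, _) in enumerate(state_queue):
        if timestamp <= gvt:
            keep_from = index  # newest snapshot that is not later than the GVT
    oldest_kept = state_queue[keep_from][0]
    kept_states = state_queue[keep_from:]
    # Events after the oldest kept snapshot may still be needed for coasting forward.
    kept_events = [e for e in processed_events if e[0] > oldest_kept]
    return kept_states, kept_events

states = [(0.0, "s0"), (3.0, "s3"), (7.0, "s7"), (15.0, "s15")]
events = [(3.0, "e3"), (5.5, "e5.5"), (7.0, "e7"), (15.0, "e15")]
kept_states, kept_events = fossil_collect(states, events, gvt=8.0)
print(kept_states, kept_events)
```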

slide-109
SLIDE 109

How Accurate is Speculative Simulation?

  • Sequential Simulation is perfect for fine-grain inspection of predicates
    – It does not scale
    – Models are getting larger and larger every day
  • Parallel/Distributed simulation has great performance
  • Fine-grain inspection is not viable
    – Process coordination is required
    – This hampers the achievable speedup

slide-110
SLIDE 110

How Accurate is Speculative Simulation?

  • Speculative Simulation inserts an additional delay
  • The inspection of a global simulation state is delayed until a portion of the simulation trajectory becomes committed
  • Inspection can be done after a GVT value has been computed

slide-111
SLIDE 111

The Completion-Shift Problem

slide-112
SLIDE 112

The Completion-Shift Problem

slide-113
SLIDE 113

The Completion-Shift Problem

slide-114
SLIDE 114

Time Warp Fundamentals

slide-115
SLIDE 115

ROOT-Sim

  • The ROme OpTimistic Simulator

https://github.com/HPDCS/ROOT-Sim

  • A general-purpose speculative simulation kernel based on both state saving and reversibility
  • Targets complete transparency towards the model developer
  • It can transparently deploy and run legacy models

slide-116
SLIDE 116

ROOT-Sim Internals

slide-117
SLIDE 117

EXAMPLE SESSION

PCS on ROOT-Sim