Stochastic Simulation Random number generation
Bo Friis Nielsen
Applied Mathematics and Computer Science Technical University of Denmark 2800 Kgs. Lyngby – Denmark Email: bfn@imm.dtu.dk
Stochastic Simulation Random number generation Bo Friis Nielsen - - PowerPoint PPT Presentation
Stochastic Simulation Random number generation Bo Friis Nielsen Applied Mathematics and Computer Science Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: bfn@imm.dtu.dk Random number generation Random number generation
Applied Mathematics and Computer Science Technical University of Denmark 2800 Kgs. Lyngby – Denmark Email: bfn@imm.dtu.dk
02443 – lecture 2 2
DTU
02443 – lecture 2 3
DTU
project.
RNG’s that can be trusted.
02443 – lecture 2 4
DTU
02443 – lecture 2 5
DTU
numbers: A sequence of independent random variable, Ui, uniformly distributed on ]0, 1[
1 1
U(0, 1) numbers.
02443 – lecture 2 6
DTU
Mechanics devices:
Other devices:
stor
02443 – lecture 2 7
DTU
An RNG is a computer algorithm that outputs a sequence of reals
relevant statistical properties as I.I.D. uniformly distributed random variables
[0; 1] can never be achieved.
02443 – lecture 2 8
DTU
(output divide by 10000)
i Zi Ui Z2
i
7182 0.7182 51,581,124 1 5811 0.5811 33,767,721 2 7677 0.7677 58,936,329 3 9363 0.9363 87,665,769 4 6657 0.6657 44,315,649 5 3156 0.3156 09,960,336 . . . . . . . . . . . . Might seem plausible - but rather dubious
02443 – lecture 2 9
DTU
Leonardo of Pisa (pseudonym: Fibonacci) dealt in the book ”Liber Abaci”(1202) with the integer sequence defined by: xi = xi−1 + xi−2 i ≥ 2 x0 = 1 x1 = 1
xi = mod( xi−1 + xi−2, M ) Ui = xi M where x = mod( y, M ) is the modulus after division ie. y − nM where n = ⌊y/M⌋ Notice xi ∈ [0, M −1]. Consequently, there is M 2 −1 possible starting values. Maximal length of period is M 2 − 1 which is only achieved for M = 2, 3.
02443 – lecture 2 10
DTU
The generator Ui = mod( aUi−1, 1 ) Ui ∈ [0, 1] illustrates the principle provided a is large, the last digits are retained. Can be implemented as (xi is an integer) xi = mod( axi−1, M ) Ui = xi M Examples are a = 23 and M = 108 + 1.
02443 – lecture 2 11
DTU
If xi can take N values, then the maximum length of a cycle is N.
02443 – lecture 2 12
DTU
02443 – lecture 2 13
DTU
LCG are defined as xi = mod( axi−1 + c, M ) Ui = xi M for a multiplier a, shift c and modulus M. We will take a, c and x0 such xi lies in (0, 1, ... , M − 1) and it looks random. Example: M = 16, a = 5, c = 1 With x0 = 3: 0 1 6 15 12 13 2 11 8 9 14 7 4 5 10 3
02443 – lecture 2 14
DTU
if)
prime, full period is attained only if a = 1.
02443 – lecture 2 15
DTU
02443 – lecture 2 16
DTU
Matsumoto and Nishimura, 1998
cryptography).
02443 – lecture 2 17
DTU
R: The Mersenne Twister is the default, many others can be chosen. Python: Mersenne Twister chosen. S-plus: XOR-shuffling between a congruential generator and a (Tausworthe) feedback shift register generator. The period is about 262 ≈ 4 · 1018, but seed dependent (!). Matlab 7.4 and higher: By default, the Mersenne Twister. Also
02443 – lecture 2 18
DTU
Definition: A sequence of pseudo-random numbers Ui is a deterministic sequence of numbers in ]0, 1[ having the same relevant statistical properties as a sequence of random numbers. The question is what are relevant statistical properties.
02443 – lecture 2 19
DTU
02443 – lecture 2 20
DTU
⋄ Visual tests/plots ⋄ χ2 test ⋄ Kolmogorov Smirnov test
⋄ Visual tests/plots ⋄ Run test up/down ⋄ Run test length of runs ⋄ Test of correlation coefficients
02443 – lecture 2 21
DTU
statistic
02443 – lecture 2 22
DTU
02443 – lecture 2 23
DTU
Thus Xj ∼ Bin(n, pj) E(Xj) = npj, Var(Xj) = npj(1 − pj) And E
√
npj(1−pj)
√
npj(1−pj)
Thus
Xj−npj
√
npj(1−pj) n→∞
∼ N(0, 1)
02443 – lecture 2 24
DTU
Recall
Xj−npj
√
npj(1−pj) n→∞
∼ N(0, 1) thus
√
npj(1−pj)
2 = (Xj−npj)2
npj(1−pj) asymp
∼ χ2(1) Consider now the case k = 2
(X1−np1)2 np1(1−p1) = (X1−np1)2(p1+1−p1) np1(1−p1)
= (X1−np1)2
np1
+ (X1−np1)2
n(1−p1)
= (X1−np1)2
np1
+ (X1−n−n(p1−1))2
n(1−p1)
= (X1−np1)2
np1
+ (−X2+np2)2
np2
= (X1−np1)2
np1
+ (X2−np2)2
np2
02443 – lecture 2 25
DTU
The general form of the test statistic is T =
nclasses
(nobserved,i − nexpected,i)2 nexpected,i
d f degrees of freedom. d f is generally nclasses − 1 − m where m is the number of estimated parameters.
02443 – lecture 2 26
DTU
distribution F(x).
F(x)
version
02443 – lecture 2 27
DTU
20 N(0, 1) variates (sorted):
0.71, 0.85, 0.87, 1.15, 1.37, 1.41, 1.81, 2.65, 3.69 Xi iid random variables with F(x) = P(X ≤ x) Each leads to a (simple) random function Fe,i(x) = 1{Xi≤x} leading to Fe(x) = 1
n
n
i=1 Fe,i(x) = 1 n
n
i=1 1{Xi≤x}
E (Fe(x)) = E 1
n
n
i=1 1{Xi≤x}
n
n
i=1 E
Var (Fe(x)) =
1 n2nF(x)(1 − F(x)) = F(x)G(x) n
Fe(x)
n→∞
∼ N
n
a stochastic process, more particularly a Brownian bridge
20 N(0, 1) variates (sorted):
0.71, 0.85, 0.87, 1.15, 1.37, 1.41, 1.81, 2.65, 3.69 Dn = sup
x {|Fn(x) − F(x)|}
the test statistic follows Kolmogorovs distribution
DTU
Level of significance (1 − α) Case Adjusted test statistic 0.850 0.900 0.950 0.975 0.990 All parameters known √n + 0.12 + 0.11
√n
1.138 1.224 1.358 1.480 1.628 N( ¯ X(n), S2(n)) √n − 0.01 + 0.85
√n
0.775 0.819 0.895 0.955 1.035 exp( ¯ X(n)) √n + 0.26 + 0.5
√n
Dn − 0.2
n
0.990 1.094 1.190 1.308
02443 – lecture 2 30
DTU
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Random numbers U_i against U_{i+1}, X_{i+1} = (5 X_i + 1)(mod 16) ’ranplot.lst’ 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Random numbers U_i against U_{i+1}, X_{i+1} = (129 X_i + 26461)(mod 65536) ’ranplot2.lst’
02443 – lecture 2 31
DTU
02443 – lecture 2 32
DTU
with the median.
distributed as N
n1 + n2 + 1, 2 n1n2(2n1n2 − n1 − n2) (n1 + n2)2(n1 + n2 − 1)
below.
(runs above) and Rb (runs below)
02443 – lecture 2 33
DTU
A test specifically designed for testing random number generators is the following UP/DOWN run test, see e.g. Donald E. Knuth, The Art
The sequence: 0.54, 0.67, |0.13, 0.89, |0.33, 0.45, 0.90, |0.01, 0.45, 0.76, 0.82, |0.24, |0.17 has runs of length 2,2,3,4,1, ... i.e. runs of consecutively increa- sing numbers.
Generate n random numbers.The observed number of runs of length 1, . . . , 5 and ≥6 are recorded in the vector R. The test statistic is calculated by: Z = 1 n − 6(R − nB)TA(R − nB)
A = 4529.4 9044.9 13568 18091 22615 27892 9044.9 18097 27139 36187 45234 55789 13568 27139 40721 54281 67852 83685 18091 36187 54281 72414 90470 111580 22615 45234 67852 90470 113262 139476 27892 55789 83685 111580 139476 172860 B =
1 6 5 24 11 120 19 720 29 5040 1 840
The test statistic is compared with a χ2(6) distribution. One should have n > 4000
02443 – lecture 2 35
DTU
81 “Simulation and the Monte Carlo Method” and Iversen 07 (in Danish). The sequence: 0.54, 0.67, 0.13, 0.89, 0.33, 0.45, 0.90, 0.01, 0.45, 0.76, 0.82, 0.24, 0.17 is converted to <, >, <, >, <, <, >, <, <, <, >, > giving in total 8 runs of length 1, 1, 1, 1, 2, 1, 3, 2
02443 – lecture 2 36
DTU
The expected number of runs of length k is n+1
12 , 11n−4 12
for runs of length 1 and 2 respectively, and 2[(k2 + 3k + 1)n − (k3 + 3k2 − k − 4)] (k + 3)! for runs of length k < N − 1. Define X to be the total number of runs, then Z = X − 2n−1
3
90
is asymptotically N(0,1).
02443 – lecture 2 37
DTU
ch = 1 n − h
n−h
UiUi+h ∼ N
7 144n
In today’s exercise you should implement everything including the tests yourself, including the chi-square and KS tests. Later, when your code is working you are free to use builtin functions.
numbers and present these numbers in a histogramme (e.g. 10 classes). ⋄ First implement the LCG yourself by experimenting with different values of “a”, “b” and “c”. ⋄ Evaluate the quality of the generators by graphical descriptive statistices (histogrammes, scatter plots) and statistical tests (χ2,Kolmogorov-Smirnov, run-tests, and correlation test. ⋄ Then apply a system available generator and perform the various statistical tests for this generator too.