PRNG REQUIREMENTS AND THE NEW MIXMAX PRNG - J. Apostolakis

SLIDE 1

PRNG REQUIREMENTS AND THE NEW MIXMAX PRNG

  • J. Apostolakis

SLIDE 2

Overview

  • What do we need from a PRNG?
  • What makes a good PRNG?
  • What Dynamical Systems tell us
  • The MIXMAX generator
  • Properties, speed
  • Availability

SLIDE 3

Particle Transport

  • Each interaction can create new, secondary particles (cascade = tree of particles)
  • Many ‘decisions’ depend on a “random” number
  • Deciding which interaction (e.g. absorption or scattering?)
  • Generating a secondary particle depends on sampling a value from a probability distribution function
  • A good PseudoRandom Number Generator must produce numbers whose behaviour cannot be distinguished from that of numbers sampled from a truly random source
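The two kinds of 'decision' listed on this slide can be sketched in a few lines of Python. This is only an illustration: the cross-section values, the exponential distribution and the function names are assumptions made for the example, and Python's built-in `random.Random` (a Mersenne Twister) stands in for whatever PRNG the simulation uses.

```python
import math
import random

def choose_interaction(rng, cross_sections):
    """Pick an interaction with probability proportional to its cross section."""
    total = sum(cross_sections.values())
    u = rng.random() * total
    running = 0.0
    for name, sigma in cross_sections.items():
        running += sigma
        if u < running:
            return name
    return name  # guard against floating-point round-off at the upper edge

def sample_exponential(rng, mean):
    """Inverse-transform sampling from an exponential pdf with the given mean."""
    u = rng.random()                    # uniform in [0, 1)
    return -mean * math.log(1.0 - u)    # 1 - u is in (0, 1], so log is defined

rng = random.Random(12345)
sigmas = {"absorption": 1.0, "scattering": 3.0}   # hypothetical relative values
counts = {"absorption": 0, "scattering": 0}
for _ in range(10000):
    counts[choose_interaction(rng, sigmas)] += 1

samples = [sample_exponential(rng, 2.0) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)
```

With a good generator the scattering fraction should come out close to 3/4 and the sample mean close to 2.0; a generator with poor statistical properties would bias exactly these quantities.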

SLIDE 4

PRNG requirements

  • “Separate” sub-sequences using seeds
  • Excellent* statistical properties - same as truly random #s
  • ‘no’/‘perfect’ correlation of numbers within a stream
  • ‘no’ correlation between streams
  • Fast implementation
  • Efficient seeding from large integer (128-bit+ ?)
  • Same sequence on all hardware & OSes for the same seed
  • Size of the state is not a critical factor for Geant4 MT - remains < 3KB

SLIDE 5

Needs of Particle Transport

  • Good statistical properties for the values
  • A stream of reliable, portable random numbers is critical
  • Large period - 30 * 10^6 steps/event * 10^10 events/year

  • Low correlation for the full sub-sequence of a stream
  • No correlations between streams (used for different events)
  • Computing performance - random number generation is 3-10% of CPU time
  • but RANLUX @ Luxury Level=5 can be > 10%
  • it matters - so we should seek to make it < 2-3%, if possible
  • Reproducibility/portability between operating systems & CPU arch.
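The period requirement on this slide can be checked with simple arithmetic, reading the figures as 30 * 10^6 steps/event and 10^10 events/year. The number of random draws per step is not given in the slides; the value below is a hypothetical placeholder. The Mersenne Twister period (2^19937 - 1) is included only as a familiar point of comparison.

```python
import math

steps_per_event = 30e6     # from the slide: 30 * 10^6 steps/event
events_per_year = 1e10     # from the slide: 10^10 events/year
draws_per_step = 10        # hypothetical: random numbers consumed per step

steps_per_year = steps_per_event * events_per_year
draws_per_year = steps_per_year * draws_per_step
log10_draws = math.log10(draws_per_year)

# Mersenne Twister (MT19937) has period 2**19937 - 1.
log10_mt_period = 19937 * math.log10(2)
```

Under these assumptions the yearly consumption is of order 10^18 numbers, so a period many orders of magnitude beyond that is needed if separate streams are not to overlap.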

SLIDE 6

Parallelism

  • Clusters: using from O(100) to O(1000) cores on a site
  • Grids - taking part of the time of many more cores, O(100,000), for one application or one experiment
  • Inside one system - SIMD instructions in CPU, multiple cores (desktop or accelerator)

  • Parallelism can be used at many levels of granularity
  • Job - different CPUs using batch processing or grid
  • Event parallelism - choice for multi-threading
  • Track parallelism - for primary tracks (or finer ?)
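Event-level parallelism is reproducible only if each event's random stream depends on the event, not on the thread or job that happens to process it. A minimal sketch of that idea follows. The hashing scheme and the names `event_rng` and `simulate_event` are assumptions for illustration, and `random.Random` is a stand-in generator. Hashing gives distinct seeds but no mathematical guarantee that the streams are uncorrelated, which is the property the slides ask of the PRNG itself.

```python
import hashlib
import random

def event_rng(run_seed, event_id):
    """Build the generator for one event from (run_seed, event_id) only."""
    digest = hashlib.sha256(f"{run_seed}:{event_id}".encode()).digest()
    seed = int.from_bytes(digest[:16], "big")   # 128-bit integer seed
    return random.Random(seed)

def simulate_event(run_seed, event_id, n_draws=5):
    """Toy 'event': just record the first few random numbers of its stream."""
    rng = event_rng(run_seed, event_id)
    return [rng.random() for _ in range(n_draws)]

# Processing order (e.g. thread scheduling) does not change any event's result.
in_order = {i: simulate_event(2015, i) for i in range(4)}
shuffled = {i: simulate_event(2015, i) for i in (3, 1, 0, 2)}
```

Here `in_order` and `shuffled` hold identical values for every event, which is the behaviour wanted when events are distributed over threads, batch jobs or grid sites.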

SLIDE 7

Status in HEP & challenge

  • Provably good PRNG: RANLUX @ luxury = 5
  • slow (can take >10% CPU)
  • unclear if the overhead is needed
  • Empirical PRNGs - fast, but without math. guarantees
  • Mersenne Twister - state of 624 words, enormous period
  • Can we resolve the dilemma - obtain a provably good PRNG which is fast?

SLIDE 8

A theory for PRNGs

  • Martin Lüscher’s RANLUX, brought to us by Fred James, remedies the most troubling issue: it provides a mathematical underpinning (Savvidy 1990)
  • The theory of dynamical systems provides a guarantee of decorrelation between nearby initial states
  • “The Lyapunov exponents of the latter set the time scales for statistical correlations in the generated random sequences. In particular, subsequences of numbers separated by several correlation times are highly decorrelated and thus provide a much better source of (pseudo) random numbers.” (Ref. 1a)
  • After 16 iterations of its core RCARRY algorithm, the initial correlation is completely lost
  • Based on Kolmogorov-Anosov mixing - paper of G. Savvidy et al. (1990)

SLIDE 9

Turning theory into a PRNG

  • Many PRNGs use a recursion relation to obtain a new set of values x’ from a previous one x
  • MIXMAX uses a specific class of matrices A, which are realisations of chaotic matrix-recursive dynamical systems

Proposed in 1990 (Akopov et al.). One matrix entry is now replaced with (3+s), and ‘s’ is optimised to obtain a good period & eigenvalues
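The recursion x' = A x can be sketched as follows. The matrix is not reproduced in this transcript, so the layout used here (first row and column of ones, 2 on the diagonal, entries growing 3, 4, ... to the left of the diagonal, and the (3+s) entry in the third row) is an assumption based on the published MIXMAX description in reference 3a. The modulus 2^61 - 1 is likewise an assumption and is not stated in the slides.

```python
P = 2**61 - 1   # assumed modulus (a Mersenne prime); not stated in the slides

def mixmax_matrix(n, s=0):
    """Assumed N x N MIXMAX-style matrix with the special entry 3 + s."""
    a = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, i + 1):
            a[i][j] = i - j + 2      # 2 on the diagonal, then 3, 4, ... to the left
    if n > 2:
        a[2][1] = 3 + s              # the entry replaced by (3+s)
    return a

def next_state(a, x, p=P):
    """One step of the recursion x' = A x (mod p), done as a dense product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) % p for row in a]

A = mixmax_matrix(4, s=0)
x = [1, 2, 3, 4]                     # arbitrary non-zero starting vector
x1 = next_state(A, x)
```

This dense product costs O(N^2) per step, which is only acceptable for illustration.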

SLIDE 10

Entropy and de-correlation

❖ One must study the spectrum of eigenvalues, and none of the eigenvalues should be on the unit circle

❖ Entropy is equal to

$$h = \sum_{|\lambda_i| > 1} \ln |\lambda_i|$$

❖ Decay of correlations is governed by entropy:

$$\tau_0 \le \frac{1}{h}$$

[Figure: eigenvalue moduli |λ| for i = 1...N; axis ticks at 1/4, 4, 16, 64]
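The entropy and the bound on the decorrelation time can be evaluated directly once the eigenvalue moduli are known. The sketch below takes the moduli as given; the example values are illustrative only and are not the spectrum of any real MIXMAX matrix.

```python
import math

def entropy(moduli):
    """h = sum of ln|lambda| over the eigenvalues with |lambda| > 1."""
    return sum(math.log(m) for m in moduli if m > 1.0)

def decorrelation_bound(moduli):
    """Upper bound on the decorrelation time, tau_0 <= 1/h."""
    return 1.0 / entropy(moduli)

# Illustrative eigenvalue moduli only, not taken from a real generator.
example_moduli = [0.25, 4.0, 16.0, 64.0]
h = entropy(example_moduli)
tau_bound = decorrelation_bound(example_moduli)
```

Eigenvalues far from the unit circle give a large h and therefore a short decorrelation time; eigenvalues with moduli only slightly above 1 contribute almost nothing to h.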

Kostas Savvidy, 3 July 2015

SLIDE 11

Other generators

❖ RCARRY, at the core of RANLUX, has a simple multiplier matrix and so gives small eigenvalues, so the decorrelation is slow. RANLUX therefore needs to iterate many times to get good decorrelation (luxury level=5)

RCARRY - the eigenvalue closest to the unit circle has |λ| ≈ 1.0085, and the farthest |λ| ≈ 1.043.

❖ The Mersenne Twister is worse - also due to a sparse matrix and polynomial.

Mersenne Twister - eigenvalues have |λ| < 1.002

Kostas Savvidy, 3 July 2015

SLIDE 12

Speed, Period of MIXMAX

  • Full matrix-matrix multiplication is too slow
  • Need a fast algorithm for the next state - done!
  • Must calculate the period for trial values of N
  • Advances from Kostas Savvidy:
  • created fast code for special structure of matrix ‘A’
  • Calculated period for some N & “magic” number ‘s’
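One way to avoid the full matrix product is to exploit the structure of A: with the matrix layout assumed here (ones in the first row and column, 2 on the diagonal, 3, 4, ... to the left of it, and one entry replaced by 3+s), consecutive rows differ by a partial sum of the state, so the next state follows in O(N) operations. This is a sketch under that assumption, with an assumed modulus of 2^61 - 1; it is not necessarily how the MIXMAX code itself is written. A dense reference step is included only to check the result.

```python
P = 2**61 - 1   # assumed modulus; not stated in the slides

def dense_step(x, s=0, p=P):
    """Reference O(N^2) step with the assumed matrix, for checking only."""
    n = len(x)
    out = []
    for i in range(n):
        total = 0
        for j in range(n):
            if i == 0 or j == 0 or j > i:
                coeff = 1
            else:
                coeff = i - j + 2
            if i == 2 and j == 1:
                coeff += s               # the special (3+s) entry
            total += coeff * x[j]
        out.append(total % p)
    return out

def fast_step(x, s=0, p=P):
    """O(N) step: row i of A equals row i-1 plus the partial sum x[1..i]."""
    n = len(x)
    new = [sum(x) % p]                   # row 0 is all ones
    partial = 0                          # running x[1] + ... + x[i]
    for i in range(1, n):
        partial = (partial + x[i]) % p
        new.append((new[-1] + partial) % p)
    if n > 2:
        new[2] = (new[2] + s * x[1]) % p # correction for the (3+s) entry
    return new

state = [7, 11, 13, 17, 19, 23]
agree = all(fast_step(state, s) == dense_step(state, s) for s in (0, -1, 6))
```

The correction for 's' is applied after the loop so that it affects only the third component, matching the single modified matrix entry.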

SLIDE 13

Particular realisations of MIXMAX

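| Size N | Magic s | Entropy (lower bound) | τ/q | Period ≈ log10(q) | q is fully factored | BigCrush |
|---|---|---|---|---|---|---|
| 10 | −1 | 6.2 | 1/4 | 165 | Yes | 33 |
| 16 | 6 | 9.9 | 1/32 | 275 | Yes | > 13 |
| 40 | 1 | 24.6 | 1/4 | 716 | Yes | 3 |
| 44 | | 27.1 | 1/4 | 789 | No | 4 |
| 60 | 4 | 37.0 | 1 | 1083 | Yes | 2 |
| 64 | 6 | 39.4 | 1/8 | 1156 | No | 1 (?) |
| 88 | 1 | 54.2 | 1/2 | 1597 | No | Pass |
| 256 | −1 | 157.7 | 1 | 4682 | No | Pass |
| 508 | 5 | 313.0 | 1 | 9309 | No | Pass |
| 720 | 1 | 443.6 | 1 | 13202 | No | Pass |
| 1000 | | 616.1 | 1/20 | 18344 | No | Pass |
| 1260 | 15 | 776.3 | 1/2 | 23118 | No | Pass |
| 3150 | −11 | 1940.8 | 1/12 | 57824 | No | Pass |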

New: N = 256, s = 487013230256099064, with good period and large eigenvalue (3000) for fast decorrelation, is the current choice

SLIDE 14

Towards production use

  • MIXMAX can provide the mathematical guarantees we obtained from RANLUX
  • Passes empirical BIGCRUSH tests
  • Skipping algorithm is available, using pre-calculated data
  • It remains only to define what is the acceptable level of decorrelation => obtain a ‘perfect’ RNG using decimation
  • Used ‘raw’ (not decimated), its implementation is faster than CLHEP’s Mersenne Twister
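Decimation means delivering only part of the generated sequence and discarding the rest, so that the delivered values are separated by enough iterations for correlations to decay. A generic sketch over any stream of values is given below; the function name and the keep/skip parameters are illustrative and not taken from the MIXMAX or RANLUX code, where the discarding is done on the generator's internal iterations.

```python
from itertools import islice

def decimate(stream, keep, skip):
    """Yield `keep` values from `stream`, then discard `skip`, repeatedly."""
    it = iter(stream)
    while True:
        block = list(islice(it, keep))
        yield from block
        if len(block) < keep:
            return                      # source exhausted
        for _ in islice(it, skip):      # throw away the next `skip` values
            pass

# Demonstration on a counter so the kept and skipped positions are visible.
first_six = list(islice(decimate(range(100), keep=2, skip=3), 6))
```

The trade-off is direct: with these parameters only 2 of every 5 generated values are used, so the cost per delivered number rises by a factor of 2.5. That is why the acceptable level of decorrelation has to be defined before choosing the amount of decimation.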

SLIDE 15

Using it in Geant4

  • A CLHEP interface to MIXMAX has been prepared
  • Submitted to CLHEP editors for inclusion in the next release
  • Available already in Geant4 ‘internal’ CLHEP
  • Should validate its use & measure the computing performance in simulations

SLIDE 16

References

  • 1. RANLUX
    a) http://cern.ch/luscher/ranlux/index.html
    b) M. Lüscher, Comp. Phys. Comm. 79 (1994) 100
  • 2. MIXMAX proposal
    N. Akopov, G. Savvidy, and N. Ter-Arutyunyan-Savvidy, J. Comput. Phys., pp. 573–579, Dec. 1991
  • 3. MIXMAX: efficient implementation & period
    a) K. G. Savvidy, Comp. Phys. Comm. 196 (November 2015) 161-165, http://dx.doi.org/10.1016/j.cpc.2015.06.003
    b) http://mixmax.hepforge.org/
  • 4. Many slides were from the recent “MIXMAX Network meeting”, 3 July 2015, at https://indico.cern.ch/event/404547/

SLIDE 17

BACKUP

SLIDE 20

Implementation of MIXMAX

Kostas Savvidy, 3 July 2015
