Generating Massive Amount of High-Quality Random Numbers using GPU



slide-1
SLIDE 1

Generating Massive Amount of High-Quality Random Numbers using GPU

Wai-Man Pang, Tien-Tsin Wong, Pheng-Ann Heng
The Computer Science and Engineering Department
The Chinese University of Hong Kong

IEEE WCCI CIGPU 2008

slide-2
SLIDE 2

Pseudo-random number generator (PRNG)

  • Provide uniform random numbers
      • Example: rand() in C
  • Important for stochastic algorithms
      • Evolutionary Computing
      • Photon-mapping rendering
  • Huge Amount
  • Speed
  • Quality
      • Poor randomness leads to slow convergence

slide-3
SLIDE 3

PRNG for Stochastic Rendering

  • Artifacts from a poor-quality PRNG

slide-4
SLIDE 4

PRNG for Stochastic Rendering

  • Result from a high-quality PRNG

slide-5
SLIDE 5

Some common PRNG

  • Linear congruential generator (LCG)
      • $R_{n+1} = a R_n + b \pmod{m}$
  • Lagged Fibonacci generator
      • $R_n = R_{n-j} \mathbin{\#} R_{n-k} \pmod{m}$ (where # is a binary operator)
  • Both need high-precision integer arithmetic
  • Cannot fit on all GPUs
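The two recurrences can be sketched in a few lines of Python. This is only an illustration: the constants `a`, `b`, `m`, the lags `j`, `k` and the choice of `+` as the binary operator are assumptions, not values from the slides, and the lagged Fibonacci generator is written in its usual form $R_n = R_{n-j} \mathbin{\#} R_{n-k}$. Note that the product `a * r` can need up to 64 bits before the modulo is taken, which is the high-precision integer arithmetic the slide refers to.

```python
from collections import deque


def lcg(seed, a=1664525, b=1013904223, m=2**32):
    """Yield R_{n+1} = (a * R_n + b) mod m. The constants are illustrative."""
    r = seed
    while True:
        r = (a * r + b) % m  # a * r may exceed 32 bits before the modulo
        yield r


def lagged_fibonacci(seed_values, j=7, k=10, m=2**32):
    """Yield R_n = (R_{n-j} + R_{n-k}) mod m, using + as the binary operator."""
    if len(seed_values) != k or not 0 < j < k:
        raise ValueError("need exactly k seed values and 0 < j < k")
    history = deque(seed_values, maxlen=k)  # holds R_{n-k} .. R_{n-1}
    while True:
        r = (history[-j] + history[-k]) % m
        history.append(r)  # the oldest value drops out automatically
        yield r


gen = lcg(seed=0)
first_lcg = [next(gen) for _ in range(3)]

fib = lagged_fibonacci(list(range(1, 11)))
first_fib = [next(fib) for _ in range(3)]
```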

slide-6
SLIDE 6

PRNG on GPU

  • Cellular Automata-based PRNG [Wolfram]
  • No high-precision integer arithmetic
  • Homogeneous cell operation and connectivity
  • Quality
      • Configure to produce a high-quality random sequence

slide-7
SLIDE 7

CA-based PRNG

  • Array of connected cells with homogeneous behavior
  • Each cell has a state and a common cell equation
  • Cell equation:

[Figure: the cell equation maps the previous state values from neighbors (X) to the output state value c_i]

slide-8
SLIDE 8

Mechanism

  • 4 cells, connectivity (-1, 2)
  • Cell equation: step(1, 3 - c1 - 2*c2)

[Figure: cells A B C D; to update cell A, its neighbors are Cell C: 0 and Cell D: 1, giving step(1, 3 - 1 - 2*0) = 1]
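The update rule for this 4-cell example can be sketched in Python as follows. The sketch assumes that a connectivity value is an offset added to the cell index with wrap-around, which reproduces the worked example where cell A reads D (offset -1) and C (offset 2). The starting values of cells A and B are assumptions; the slide only gives C = 0 and D = 1.

```python
def step(a, x):
    """Cg-style step function: 1 if x >= a, otherwise 0."""
    return 1 if x >= a else 0


def ca_update(cells, offsets=(-1, 2)):
    """Return the next state of every cell, wrapping around at the ends."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        c1 = cells[(i + offsets[0]) % n]  # neighbor at offset -1
        c2 = cells[(i + offsets[1]) % n]  # neighbor at offset +2
        new_cells.append(step(1, 3 - c1 - 2 * c2))
    return new_cells


state = [1, 0, 0, 1]          # cells A, B, C, D (A and B are assumed values)
next_state = ca_update(state)  # cell A becomes step(1, 3 - 1 - 2*0) = 1
print(next_state)              # [1, 0, 1, 1]
```

All cells are updated from the previous state, so the new values are collected in a separate list rather than written in place.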

slide-9
SLIDE 9

Mechanism (cont'd)

[Figure: successive states of cells A B C D, with the random numbers generated: 111, then 011]

slide-10
SLIDE 10

GPU Implementation Issue

  • Cell resembles texel in GPU
  • 64 cells and 4-connected CA PRNG for 32-bit random numbers
  • Cell equation evaluation
      • Fast table lookup
      • 4 connectivities = 4 inputs, $2^4 = 16$ possible outputs
  • Reorganize bits
      • Bits of a random number are scattered among texels
      • Output floating point value $f$
      • $r_i$ is the $i$-th bit in the random number

$f = (\cdots((r_0/2 + r_1)/2 + \cdots) + r_{31})/2$
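The bit reorganization can be sketched in Python as a repeated add-and-halve loop, mirroring the `outbits += states; outbits /= 2;` loop of the `pack` shader. The function name `pack_bits` is hypothetical. Each bit is folded in turn, so the last bit ends up with weight 1/2 and the first with weight $2^{-32}$.

```python
def pack_bits(bits):
    """Fold bits r_0 .. r_{n-1} into a float in [0, 1) by add-and-halve."""
    f = 0.0
    for r in bits:
        f = (f + r) / 2.0
    return f


only_last = pack_bits([0] * 31 + [1])   # 0.5
only_first = pack_bits([1] + [0] * 31)  # 2**-32
all_ones = pack_bits([1] * 32)          # 1 - 2**-32
```

Python floats are double precision, so all 32 bits survive here; a 32-bit GPU float keeps fewer mantissa bits.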

slide-11
SLIDE 11

Shader Code

    float4 caprng(in half2 coords : TEX0,
                  in const uniform samplerRECT cells) : COLOR0
    {
        float2 Connector;
        float4 newState;
        float4 neigborStates[4];
        int i;
        for (i = 0; i < 4; i++) {
            Connector.x = fmod(coords.x - connectivity(i), CA_SIZE);
            Connector.y = coords.y;
            neigborStates[i] = round(texRECT(cells, Connector));
        }
        // cell equation evaluation
        newState.x = celleqn(neigborStates);
        return newState;
    }

    float4 pack(in half2 index : TEX0,
                in const uniform samplerRECT cells) : COLOR0
    {
        int i;
        float4 outbits;
        float4 states;
        float2 texindex;
        outbits = 0;
        // packing all 32 bits
        for (i = 0; i < 32; i++) {
            texindex.x = i*2+1;
            texindex.y = index.y;
            states = texRECT(cells, texindex);
            outbits += states;
            outbits /= 2;
        }
        return outbits;
    }

slide-12
SLIDE 12

Parallelized PRNG

  • Fully utilize 4096×4096 texels (7800GTX)
  • Each cell occupies a single bit in a texel
  • Why not pack more inside each texel?
      • Fully utilize the mantissa part of the texel
      • 23 × 4 random sequences simultaneously
  • Combine 2 schemes: 64×4096×92 PRNGs

[Figure: bits of PRNG1, PRNG2, PRNG3, … laid out across the cells textures TEX0 to TEX11]
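The PRNG count can be checked with a short calculation. The reading of the factors is an assumption: 64 is taken as the number of 64-cell automata that fit in one 4096-texel row, 4096 as the number of rows, and 92 as 23 mantissa bits times 4 channels per texel.

```python
TEXTURE_WIDTH = 4096
TEXTURE_HEIGHT = 4096
CELLS_PER_PRNG = 64   # cells in one CA PRNG
MANTISSA_BITS = 23    # usable bits per 32-bit float channel
CHANNELS = 4          # assumed: four channels per texel

prngs_per_row = TEXTURE_WIDTH // CELLS_PER_PRNG   # 64
sequences_per_texel = MANTISSA_BITS * CHANNELS    # 92
total_prngs = prngs_per_row * TEXTURE_HEIGHT * sequences_per_texel
print(total_prngs)  # 24117248
```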

slide-13
SLIDE 13

Optimize for Quality

  • Genetic Algorithm
      • Find the CA-based PRNG configuration with the best quality
  • Initialize candidates
      • Encoded cell equation and connectivities
      • $2^n + n$ bits
  • Evaluate candidates by objective function
  • Generate next generation
      • Crossover
      • Mutation
  • Repeat until the objective exceeds a certain threshold
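The loop above can be sketched as a minimal genetic algorithm over bit strings. This is an illustration under stated assumptions, not the authors' implementation: the selection scheme (keep the better half), one-point crossover, bit-flip mutation, elitism and all parameter values are choices made for the sketch, and the toy objective (count of 1 bits) stands in for the real quality score of a CA PRNG.

```python
import random


def evolve(objective, genome_bits, population_size=20, generations=30,
           mutation_rate=0.02, threshold=None, seed=0):
    """Return the best bit-string candidate found by a simple GA."""
    rng = random.Random(seed)
    # Initialize candidates as random bit strings.
    population = [[rng.randint(0, 1) for _ in range(genome_bits)]
                  for _ in range(population_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        # Repeat until the objective reaches the threshold.
        if threshold is not None and objective(best) >= threshold:
            break
        ranked = sorted(population, key=objective, reverse=True)
        parents = ranked[:population_size // 2]
        children = [best[:]]  # elitism: the best candidate survives unchanged
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_bits)   # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]            # bit-flip mutation
            children.append(child)
        population = children
        best = max(population, key=objective)
    return best


# Genome length 2**n + n with n = 4, as on the slide; sum() is a toy objective.
best_candidate = evolve(sum, genome_bits=2**4 + 4)
```

Because the best candidate is copied into every new generation, the best score never decreases from one generation to the next.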

slide-14
SLIDE 14

Objective Function

  • Objective function
      • objective = w0 × e + w1 × (result of Diehard test)
      • w_i is the weighting
      • e is the n-bit entropy
      • The second term is the result of the Diehard test
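A rough Python sketch of such an objective is given below. Several details are assumptions because the slides do not specify them: the entropy is computed over non-overlapping n-bit words and divided by n so that it lies in [0, 1], the weights default to 0.5 each, and the Diehard result is passed in as a precomputed number (`diehard_score` is a hypothetical parameter; running the Diehard battery is outside this sketch).

```python
import math
from collections import Counter


def n_bit_entropy(bits, n):
    """Shannon entropy of non-overlapping n-bit words, normalized to [0, 1]."""
    words = [tuple(bits[i:i + n]) for i in range(0, len(bits) - n + 1, n)]
    counts = Counter(words)
    total = len(words)
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return entropy / n


def objective(bits, diehard_score, n=4, w0=0.5, w1=0.5):
    """Weighted sum of the n-bit entropy and an externally computed score."""
    return w0 * n_bit_entropy(bits, n) + w1 * diehard_score


sample = [0, 0, 0, 1, 1, 0, 1, 1]       # every 2-bit word appears once
print(n_bit_entropy(sample, 2))          # 1.0
print(objective(sample, 0.5, n=2))       # 0.75
```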

slide-15
SLIDE 15

Objective Function (cont'd)

  • Diehard test
      • 14 tests (e.g. birthday spacings, GCD test, etc.)
      • Chi-square
  • Overall p-value
      • Chi-square test on all p-values with Gaussian distribution
  • Best 4-connected, 64-cell CA PRNG
      • Connectivity (56, 2, 21, 49)
      • Cell equation in tightly packed format (1001100110100101)
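The reported configuration can be turned into a small software CA PRNG as sketched below. Three conventions are assumptions, since the slides do not state them: neighbors are read at `x - offset` (following `coords.x - connectivity(i)` in the shader), the four neighbor bits form the table index with the first neighbor as the most significant bit, and the packed cell equation string is indexed from left to right. A different convention would give a different sequence.

```python
CONNECTIVITY = (56, 2, 21, 49)
RULE_BITS = "1001100110100101"
CA_SIZE = 64

# Lookup table: one output bit for each of the 2**4 neighbor combinations.
RULE_TABLE = [int(ch) for ch in RULE_BITS]


def ca_step(cells):
    """Advance all 64 cells by one step using the table lookup."""
    new_cells = []
    for x in range(CA_SIZE):
        index = 0
        for offset in CONNECTIVITY:
            # Assumed convention: neighbor at x - offset, wrapping around.
            index = (index << 1) | cells[(x - offset) % CA_SIZE]
        new_cells.append(RULE_TABLE[index])
    return new_cells


def random_float(cells):
    """Pack the 32 odd-indexed cells into a float in [0, 1)."""
    f = 0.0
    for i in range(32):
        f = (f + cells[2 * i + 1]) / 2.0
    return f


def generate(seed_cells, count):
    """Return `count` floats, one per CA step, starting from seed_cells."""
    cells = list(seed_cells)
    values = []
    for _ in range(count):
        cells = ca_step(cells)
        values.append(random_float(cells))
    return values


seed_cells = [1 if i == 0 else 0 for i in range(CA_SIZE)]
numbers = generate(seed_cells, 5)
```

Reading every second cell follows `texindex.x = i*2+1` in the `pack` shader, so 64 cells yield one 32-bit value per step.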

slide-16
SLIDE 16

Convergence

Control: 10,000 photons

Generation    e         Diehard result
1             0.2673    0.0
2             0.5852    0.0
4             0.5944    0.0
8             0.9464    0.143
11            0.9514    0.3513

slide-17
SLIDE 17

Performance

  • Performance compared with CPU

Single PRNG

Random numbers generated    Software CA-PRNG    GPU CA-PRNG
1,000                       0.004s              0.064s
10,000                      0.042s              0.942s
100,000                     0.391s              10.081s
1,000,000                   4.163s              100.082s

1,000 Parallel PRNG

Random numbers generated    Software CA-PRNG    GPU CA-PRNG
10,000                      0.043s              0.004s
100,000                     0.425s              0.031s
1,000,000                   4.274s              0.31s
10,000,000                  43.003s             3.098s
100,000,000                 430s                31.875s

slide-18
SLIDE 18

Conclusion

  • CA architecture PRNG is highly suitable for GPU
      • Parallel PRNG on GPU
      • Optimization for quality
      • High quality and high performance gain
  • Future work
      • Support for variable-precision random sequences
      • Experiments with Evolutionary Computing applications

slide-19
SLIDE 19

End

Thanks for your attention

slide-20
SLIDE 20
  • Reference:
      • "Implementing High-Quality PRNG on GPU", W. M. Pang, T. T. Wong and P. A. Heng, ShaderX5: Advanced Rendering Techniques, edited by W. Engel, Charles River Media, 2007, pp. 579-590.