Data Blocking Jon K. Nilsen Department of Physics and Scientific Computing Group University of Oslo, N-0316 Oslo, Norway Spring 2008 Computational Physics II FYS4410

Outline: Data Blocking

Why blocking?
What is blocking?
Blocking in parallel VMC
Example

Why blocking? Statistical analysis

Monte Carlo simulations can be treated as computer experiments. The results can be analysed with the same statistical tools we would use to analyse laboratory experiments. As in all other experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., the error.

Why blocking? Statistical analysis

As in other experiments, Monte Carlo experiments have two classes of errors: statistical errors and systematic errors. Statistical errors can be estimated using standard tools from statistics. Systematic errors are method specific and must be treated differently from case to case. (In VMC a common source is the step length.)

What is blocking?

Say that we have a set of samples from a Monte Carlo experiment. Assuming (wrongly) that our samples are uncorrelated, our best estimate of the standard deviation of the mean $\bar{m}$ is given by

$$\sigma = \sqrt{\frac{1}{n-1}\left(\overline{m^2} - \bar{m}^2\right)}$$

If the samples are correlated, it can be shown that

$$\sigma = \sqrt{\frac{1 + 2\tau/\Delta t}{n-1}\left(\overline{m^2} - \bar{m}^2\right)}$$

where τ is the correlation time (the time between a sample and the next uncorrelated sample) and ∆t is the time between each sample.
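The two estimates above can be sketched in a few lines of C++. This is only an illustration: the function names `naive_error` and `corrected_error` are hypothetical and do not come from the course code, and the correlation time `tau` is assumed to be known, which in practice it is not (that is the problem blocking addresses).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Naive standard error of the mean, valid only for uncorrelated samples:
// sigma = sqrt( (<m^2> - <m>^2) / (n - 1) ).
double naive_error(const std::vector<double>& m) {
    assert(m.size() > 1);
    const double n = static_cast<double>(m.size());
    double sum = 0.0, sum2 = 0.0;
    for (double x : m) {
        sum += x;
        sum2 += x * x;
    }
    const double mean = sum / n;
    const double mean2 = sum2 / n;
    return std::sqrt((mean2 - mean * mean) / (n - 1.0));
}

// Error for correlated samples: the variance is inflated by the factor
// (1 + 2 tau / dt). Both tau and dt are assumed to be given by the caller.
double corrected_error(const std::vector<double>& m, double tau, double dt) {
    return naive_error(m) * std::sqrt(1.0 + 2.0 * tau / dt);
}
```

With τ = 4∆t, for instance, the factor is sqrt(1 + 8) = 3, so the naive estimate would understate the error by a factor of three.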

What is blocking?

If ∆t ≫ τ, our first estimate of σ still holds. It is much more common that ∆t < τ. In the method of data blocking we divide the sequence of samples into blocks. We then take the mean $\bar{m}_i$ of each block i = 1, ..., n_blocks and use these block means to calculate the total mean and variance. Each block must be large enough that sample j of block i is not correlated with sample j of block i + 1. The correlation time τ would be a good choice.

What is blocking?

Problem: We don't know τ.
Solution: Make a plot of the std. dev. as a function of block size. The estimate of the std. dev. of correlated data is too low, so the error will increase with increasing block size until the blocks are uncorrelated, where we reach a plateau. When the std. dev. stops increasing, the blocks are uncorrelated.
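The plateau behaviour can be tried out on synthetic data before running a real VMC calculation. The sketch below is an assumption-laden illustration, not part of the course code: `blocked_error` and `correlated_series` are hypothetical names, and the test data is a simple autoregressive series x[k] = a x[k-1] + noise, chosen only because its samples are correlated in a controlled way.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Standard error of the mean, estimated from the means of blocks
// of block_size consecutive samples.
double blocked_error(const std::vector<double>& x, std::size_t block_size) {
    const std::size_t n_blocks = x.size() / block_size;  // leftover samples are dropped
    assert(n_blocks > 1);
    double sum = 0.0, sum2 = 0.0;
    for (std::size_t i = 0; i < n_blocks; ++i) {
        double b = 0.0;
        for (std::size_t j = 0; j < block_size; ++j) b += x[i * block_size + j];
        b /= static_cast<double>(block_size);
        sum += b;
        sum2 += b * b;
    }
    const double mean = sum / n_blocks;
    const double var = sum2 / n_blocks - mean * mean;
    return std::sqrt(var / (n_blocks - 1.0));
}

// Hypothetical correlated test data: x[k] = a * x[k-1] + uniform noise.
// Values of a close to 1 give a long correlation time.
std::vector<double> correlated_series(std::size_t n, double a, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> noise(-1.0, 1.0);
    std::vector<double> x(n);
    double prev = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        prev = a * prev + noise(gen);
        x[k] = prev;
    }
    return x;
}
```

Calling `blocked_error` for a range of block sizes on `correlated_series(100000, 0.9, 1)` and plotting the result should show the estimate rising from its block-size-1 value and then levelling off once the blocks are much longer than the correlation time.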

Implementation: Main ideas

Do a parallel Monte Carlo simulation, storing all samples to files (one per process). Do the statistical analysis on these files, independently of your Monte Carlo program:
Read the files into an array.
Loop over various block sizes.
For each block size n_b, loop over the array in steps of n_b, taking the mean of elements i n_b, ..., (i + 1) n_b.
Take the mean and variance of the resulting array.
Write the results for each block size to file for later analysis.

Implementation: Example

The files vmc_para.cpp and vmc_blocking.cpp contain a parallel VMC simulator (see Morten's slides for details) and a program for doing blocking on the samples from the resulting set of files. We will go through the parts related to blocking.

Implementation: Parallel file output

The total number of samples from all processes may get very large. Hence, storing all samples on the master node is not a scalable solution. Instead we store the samples from each process in separate files. We must make sure these files have different names.

String handling:

    ostringstream ost;
    ost << "blocks_rank" << my_rank << ".dat";
    blockofile.open(ost.str().c_str(), ios::out | ios::binary);
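The naming scheme can be wrapped in a small helper so that the writer and the reader are guaranteed to agree on the file names. This is a minimal sketch; the function name `rank_filename` is hypothetical and is not taken from vmc_para.cpp.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds the per-process file name, e.g. rank 3 -> "blocks_rank3.dat".
// The prefix and suffix follow the naming used on the slide.
std::string rank_filename(int rank) {
    assert(rank >= 0);
    std::ostringstream ost;
    ost << "blocks_rank" << rank << ".dat";
    return ost.str();
}
```

Each process would call this with its own MPI rank when writing, and the analysis program would call it in a loop over all ranks when reading.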

Implementation: Parallel file output

Having separated the filenames, it is just a matter of taking the samples and storing them to file. Note that there is no need for communication between the processes in this procedure.

File dumping:

    all_energies = new double[number_cycles+1];
    mc_sampling(max_variations, number_cycles, cumulative_e,
                cumulative_e2, all_energies);
    // write number_cycles samples as raw bytes, skipping element 0
    blockofile.write((char*)(all_energies+1),
                     number_cycles*sizeof(double));
    blockofile.close();

Implementation: Reading the files

Reading the files is only about mirroring the output. To make life easier for ourselves, we find the file size, and hence the number of samples, by using the C function stat.

File loading:

    struct stat result;
    // every file is assumed to have the same size as that of rank 0
    if (stat("blocks_rank0.dat", &result) == 0) {
      local_n = result.st_size/sizeof(double);
      n = local_n*n_procs;
    }
    double* mc_results = new double[n];
    for (int i=0; i<n_procs; i++) {
      ostringstream ost;
      ost << "blocks_rank" << i << ".dat";
      ifstream infile;
      infile.open(ost.str().c_str(), ios::in | ios::binary);
      infile.read((char*)&(mc_results[i*local_n]), result.st_size);
      infile.close();
    }
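Where stat is not available, the number of samples can also be found from the stream itself. The following is a sketch of that alternative, not the approach used in vmc_blocking.cpp: the helper `read_samples` is a hypothetical name, and it assumes the stream holds nothing but raw doubles written on the same machine architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::stringstream is handy for trying this without touching disk
#include <vector>

// Reads every raw double from a seekable binary stream.
// The sample count is derived from the stream length, mirroring the
// st_size / sizeof(double) computation done with stat.
std::vector<double> read_samples(std::istream& in) {
    in.seekg(0, std::ios::end);
    const std::streamoff bytes = in.tellg();
    in.seekg(0, std::ios::beg);
    if (bytes <= 0) return std::vector<double>();  // empty or unseekable stream

    std::vector<double> samples(static_cast<std::size_t>(bytes) / sizeof(double));
    in.read(reinterpret_cast<char*>(samples.data()),
            static_cast<std::streamsize>(samples.size() * sizeof(double)));
    assert(in.good());
    return samples;
}
```

Passing an `std::ifstream` opened with `ios::in | ios::binary` works the same way, and appending the result for each rank reproduces the single array used by the analysis.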

Implementation: Blocking

Loop over block sizes, taking the mean of elements i n_b, ..., (i + 1) n_b for each block.

Loop over block sizes:

    for (int i=0; i<n_block_samples; i++) {
      block_size = min_block_size + i*block_step_length;
      blocking(mc_results, n, block_size, res);
      mean  = res[0];
      sigma = res[1];
      // n/block_size is the number of blocks (integer division)
      outfile << block_size << "\t" << mean << "\t"
              << sqrt(sigma/((n/block_size)-1.0)) << endl;
    }

Implementation: Blocking

The blocking itself is now just a matter of finding the number of blocks (note the integer division) and taking the mean of each block. Note the pointer arithmetic: adding a number i to an array pointer moves the pointer to element i in the array.

Blocking function:

    void blocking(double* vals, int n_vals, int block_size, double* res) {
      // integer division: samples that do not fill a whole block are dropped
      int n_blocks = n_vals/block_size;
      double* block_vals = new double[n_blocks];
      for (int i=0; i<n_blocks; i++)
        block_vals[i] = mean(vals + i*block_size, block_size);
      meanvar(block_vals, n_blocks, res);
      delete[] block_vals;  // free the temporary array of block means
    }
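The blocking function relies on two helpers, mean and meanvar, that are not shown on the slides. A plausible sketch is given below under stated assumptions: the signatures are inferred from how the helpers are called, and meanvar is assumed to store the mean in res[0] and the variance with a 1/n normalisation in res[1], which is consistent with the later division by (number of blocks - 1). The actual versions in vmc_blocking.cpp may differ.

```cpp
#include <cassert>

// Arithmetic mean of the n values starting at vals.
double mean(const double* vals, int n) {
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += vals[i];
    return sum / n;
}

// Stores the mean in res[0] and the variance <x^2> - <x>^2 in res[1].
// Assumption: the variance uses a 1/n normalisation.
void meanvar(const double* vals, int n, double* res) {
    assert(n > 0);
    double sum = 0.0, sum2 = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += vals[i];
        sum2 += vals[i] * vals[i];
    }
    res[0] = sum / n;
    res[1] = sum2 / n - res[0] * res[0];
}
```

For the values 1, 2, 3, 4 this gives a mean of 2.5 and a variance of 1.25, and `mean(vals + 2, 2)` averages only the last two values, which is the pointer arithmetic used in the blocking function.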

