Random Sampling (Florian Schoppmann, August 24, 2010)

  1. Random Sampling
     Florian Schoppmann, August 24, 2010
     Outline: Non-Sequential; Sequential; Sequential with Reservoir; Sequential with Reservoir and Replacement

  2. Sampling Algorithms
     Input:
     • List[1 … N]
     • Length of database N (if known)
     • Length of sample n
     Output:
     • Sample[1 … n]

  3. Considerations
     • Online vs. random-access input
     • Sequential vs. non-sequential
     • Samples for independent categories
     Desiderata:
     • Parallelizable
     • With random access, running time close to O(n)
     • Constant memory (apart from the sample itself)

  4. Random Indices
     1: m ← 0
     2: while m < n do
     3:     R ← random({1 … N})
     4:     if List[R] ∉ Sample then
     5:         m ← m + 1
     6:         Sample[m] ← List[R]
     • ca. N ln(N / (N − n + 1)) iterations in expectation
     • Space/time trade-off in line 4 (how membership in Sample is tested)
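A minimal Python sketch of the random-indices method, assuming 0-based positions and a caller-supplied random.Random instance rng; all function names are hypothetical. One deliberate deviation from the slide: it remembers the drawn positions in a hash set instead of testing List[R] ∉ Sample, which is one way to settle the line-4 trade-off (O(n) extra space for O(1) expected lookups) and also keeps duplicate values in List from stalling the loop. The two helpers compare the exact expected iteration count, the sum of N/(N − i) for i = 0, …, n − 1, with the slide's estimate.

```python
import math
import random
from typing import List, Sequence, Set, TypeVar

T = TypeVar("T")


def sample_random_indices(items: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Draw uniform positions until n distinct ones have been seen (0-based)."""
    N = len(items)
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= len(items)")
    drawn: Set[int] = set()  # the "line 4" membership test: O(n) space, O(1) expected lookup
    sample: List[T] = []
    while len(sample) < n:
        r = rng.randrange(N)  # uniform position in 0 .. N-1
        if r not in drawn:
            drawn.add(r)
            sample.append(items[r])
    return sample


def expected_iterations(N: int, n: int) -> float:
    """Exact expected loop count: the i-th new position needs N / (N - i) draws on average."""
    return sum(N / (N - i) for i in range(n))


def approx_iterations(N: int, n: int) -> float:
    """The slide's estimate N ln(N / (N - n + 1))."""
    return N * math.log(N / (N - n + 1))
```

For N = 1000 and n = 500 the exact expectation and the estimate differ by well under 1%; as n approaches N, both approach the coupon-collector cost of roughly N ln N draws.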

  5. Random Remaining Indices
     for m ← 1, …, n do
         R ← random({1 … N − m + 1})
         j ← index of the R-th non-null element in List
         Sample[m] ← List[j]
         List[j] ← null
     • Prohibitive running time Θ(nN)
     • Modifies List
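A sketch of the random-remaining-indices method in Python, mainly to make the Θ(nN) cost visible: each of the n rounds scans the list linearly to find the R-th remaining element. It assumes 0-based ranks and a random.Random instance rng, and (as a labeled deviation) marks taken positions in a separate boolean list instead of writing null into List, so the input is left unmodified and None values are allowed. The function name is hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_remaining_indices(items: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Pick the R-th not-yet-taken element n times; the linear scan makes this Theta(nN)."""
    N = len(items)
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= len(items)")
    taken = [False] * N  # stands in for "List[j] <- null"
    sample: List[T] = []
    for m in range(n):
        r = rng.randrange(N - m)  # 0-based rank among the N - m untaken positions
        seen = -1
        for j in range(N):  # Theta(N) scan for the r-th untaken position
            if not taken[j]:
                seen += 1
                if seen == r:
                    break
        taken[j] = True
        sample.append(items[j])
    return sample
```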

  6. The Fisher-Yates Shuffle
     for m ← 1, …, n do
         R ← random({m … N})
         Swap List[m] and List[R]
     Sample[1 … n] ← List[1 … n]
     • Running time Θ(n)
     • Modifies List
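The partial Fisher-Yates shuffle above translates almost line for line into Python. This sketch assumes 0-based indices (so {m … N} becomes m .. N − 1) and a random.Random instance rng; like the slide, it permutes the caller's list in place, which is what keeps the work at Θ(n). The function name is hypothetical.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def partial_fisher_yates(items: List[T], n: int, rng: random.Random) -> List[T]:
    """Shuffle only the first n positions of items in place and return them."""
    N = len(items)
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= len(items)")
    for m in range(n):
        r = rng.randrange(m, N)  # uniform in m .. N-1
        items[m], items[r] = items[r], items[m]
    return items[:n]  # items itself is left permuted
```

Python's standard random.sample already returns n distinct elements without touching its input; the sketch only makes the swap step explicit.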

  7. Probabilistic Sampling
     for t ← 1, …, N do
         with probability n/N do
             Append List[t] to Sample
     • Running time Θ(N)
     • Sample size is n only in expectation: it follows the binomial distribution B(N, n/N), whose mean is n
     • Standard deviation √(n(1 − n/N))
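A quick Python sketch of probabilistic (Bernoulli) sampling, assuming a random.Random instance rng and a hypothetical function name. The small demo repeats the sampling many times to show that the sample size fluctuates around n with a spread close to √(n(1 − n/N)), the standard deviation of B(N, n/N).

```python
import math
import random
import statistics
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_sample(items: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Keep each element independently with probability n / N."""
    p = n / len(items)
    return [x for x in items if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(0)
    N, n = 1000, 100
    sizes = [len(bernoulli_sample(range(N), n, rng)) for _ in range(1000)]
    print("mean size:", statistics.mean(sizes), "theory:", n)
    print("std. dev.:", statistics.pstdev(sizes), "theory:", math.sqrt(n * (1 - n / N)))
```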

  8. Selection Sampling
     m ← 0
     for t ← 1, …, N do
         with probability (n − m)/(N − t + 1) do
             m ← m + 1
             Sample[m] ← List[t]
     • Running time Θ(N)
     • Completely unbiased: always exactly n elements, and every n-subset is equally likely
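A Python sketch of selection sampling, assuming a random.Random instance rng; the function name is hypothetical. It keeps the slide's 1-based t, so the selection probability is (n − m)/(N − t + 1): n/N for the first element, and exactly 1 whenever the remaining elements are all needed, which is why the result always has exactly n elements. Selected elements come out in input order, and the loop stops early once the sample is full.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def selection_sample(items: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Selection sampling: exactly n elements, kept in input order."""
    N = len(items)
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= len(items)")
    sample: List[T] = []
    for t, x in enumerate(items, start=1):  # t = 1, ..., N as on the slide
        m = len(sample)
        if m == n:
            break  # sample complete
        # Select with probability (n - m) / (N - t + 1)
        if rng.random() * (N - t + 1) < n - m:
            sample.append(x)
    return sample
```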

  9. Random Number Generation (Digression)
     Running time O(n) possible by skipping rows?
     Idea 1:
     • Let S ∈ {0 … N − n} be the random variable for the number of rows to skip (N rows remaining, n still to be sampled):
       $\Pr[S \le s] = 1 - \frac{(N-n)^{\underline{s+1}}}{N^{\underline{s+1}}}$, with the falling power $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$
     Idea 2 (Vitter, 1984):
     • von Neumann's rejection and "squeeze" method
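A Python sketch of Idea 1, assuming a random.Random instance rng and hypothetical function names. It draws S by inverting Pr[S ≤ s] with a linear search, building the falling-factorial ratio one factor at a time, and then jumps over S records before selecting one. Because the search costs O(S + 1), this version still takes O(N) time overall, but it uses only n random numbers; making each skip draw cheap is what Idea 2 is about.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def draw_skip(N: int, n: int, rng: random.Random) -> int:
    """Draw S with Pr[S > s] = (N-n)^(s+1 falling) / N^(s+1 falling), by inversion.

    N = records remaining, n = records still to select (1 <= n <= N).
    """
    u = rng.random()
    s = 0
    tail = (N - n) / N  # Pr[S > 0]
    while tail > 1.0 - u:  # stop at the smallest s with Pr[S <= s] >= u
        s += 1
        tail *= (N - n - s) / (N - s)  # one more factor of the falling-power ratio
    return s


def skip_sample(items: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Sequential sampling that jumps over S records between selections."""
    N = len(items)
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= len(items)")
    sample: List[T] = []
    pos = 0  # next unprocessed record; invariant: pos + remaining == N
    remaining, to_pick = N, n
    while to_pick > 0:
        s = draw_skip(remaining, to_pick, rng)
        pos += s
        sample.append(items[pos])
        pos += 1
        remaining -= s + 1
        to_pick -= 1
    return sample
```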

  10. Vitter (1984)
     $g(x) = \frac{n}{N}\left(1 - \frac{x}{N}\right)^{n-1}$, $c = \frac{N}{N-n+1}$, $h(x) = \frac{n}{N}\left(1 - \frac{x}{N-n+1}\right)^{n-1}$
     [Figure: c·g(x), Pr[S = x], and h(x) for N = 20, n = 5, plotted over x = 0 … 15]
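A small numeric sketch that evaluates the slide's functions, written directly from the formulas above (function names hypothetical). Pr[S = s] is computed as (n/N)·(N − n)^{\underline{s}}/(N − 1)^{\underline{s}}, the probability of skipping s records and then selecting the next one. For the plotted example N = 20, n = 5, one can check that h(s) ≤ Pr[S = s] ≤ c·g(s + 1) at every integer s, the kind of bound a squeeze test and a rejection test on ⌊X⌋ (X drawn from the density g) rely on; this is only checked for the example, not proved in general.

```python
from math import prod


def skip_pmf(N: int, n: int, s: int) -> float:
    """Pr[S = s] = (n / N) * (N - n)^(s falling) / (N - 1)^(s falling)."""
    return n / N * prod((N - n - i) / (N - 1 - i) for i in range(s))


def c_times_g(N: int, n: int, x: float) -> float:
    """Rejection envelope c * g(x) with c = N / (N - n + 1)."""
    c = N / (N - n + 1)
    return c * (n / N) * (1 - x / N) ** (n - 1)


def h(N: int, n: int, x: float) -> float:
    """Squeeze function h(x) from the slide."""
    return (n / N) * (1 - x / (N - n + 1)) ** (n - 1)


if __name__ == "__main__":
    N, n = 20, 5  # the example plotted on the slide
    for s in range(N - n + 1):
        print(f"{s:2d}  h={h(N, n, s):.5f}  Pr[S=s]={skip_pmf(N, n, s):.5f}  c*g={c_times_g(N, n, s):.5f}")
```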

  11. Reservoir Sampling
     Sample[1 … n] ← List[1 … n]
     for t ← n + 1, …, N do
         with probability n/t do
             R ← random({1 … n})
             Sample[R] ← List[t]
     • Completely unbiased!
     • Expected running time O(n(1 + log(N/n))) after optimizing (Vitter, 1985)
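A direct Python rendering of the reservoir loop above, assuming a random.Random instance rng and a hypothetical function name. It consumes any iterable, so N need not be known in advance, and it flips one coin per record; it does not implement the skipping optimizations behind the O(n(1 + log(N/n))) bound.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Reservoir sampling without replacement over a stream of unknown length."""
    sample: List[T] = []
    for t, x in enumerate(stream, start=1):
        if t <= n:
            sample.append(x)  # fill the reservoir with the first n records
        elif rng.random() * t < n:  # with probability n / t ...
            sample[rng.randrange(n)] = x  # ... overwrite a uniformly chosen slot
    return sample  # fewer than n items if the stream was shorter than n
```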

  12. Reservoir, with Replacement
     for t ← 1, …, N do
         for i ← 1, …, n do
             with probability 1/t do
                 Sample[i] ← List[t]
     • Completely unbiased!
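A literal Python sketch of the loop above, assuming a non-empty stream, a random.Random instance rng, and a hypothetical function name. Each slot i behaves as an independent reservoir of size 1, so every slot ends up uniform over the whole stream and repeats are possible.

```python
import random
from typing import Iterable, List, Optional, TypeVar, cast

T = TypeVar("T")


def reservoir_sample_with_replacement(stream: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """n independent size-1 reservoirs over one pass of the stream."""
    sample: List[Optional[T]] = [None] * n
    t = 0
    for t, x in enumerate(stream, start=1):
        for i in range(n):
            if rng.random() * t < 1:  # with probability 1 / t (always true for t = 1)
                sample[i] = x
    if t == 0:
        raise ValueError("stream must not be empty")
    return cast(List[T], sample)
```

This version draws nN random numbers, so it is meant to illustrate correctness rather than speed.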

  13. Bibliography
     • Knuth (1997): The Art of Computer Programming, Vol. 2
     • Vitter (1984): Faster methods for random sampling
     • Vitter (1985): Random sampling with a reservoir
     • Park, Ostrouchov, Samatova, Geist (2004): Reservoir-based random sampling with replacement from data stream
