
Testing Continuous Distributions, by Artur Czumaj (DIMAP, Centre for Discrete Maths and its Applications, and Department of Computer Science, University of Warwick) - PowerPoint PPT Presentation


  1. Testing Continuous Distributions. Artur Czumaj, DIMAP (Centre for Discrete Maths and its Applications) & Department of Computer Science, University of Warwick. Joint work with A. Adamaszek & C. Sohler

  2. Testing probability distributions
     • General question:
       – Test a given property of a given probability distribution
       – The distribution is available only through samples drawn from it
     • Examples:
       – Is a given probability distribution uniform?
       – Are two probability distributions independent?

  3. Testing probability distributions
     For more details/introduction: see R. Rubinfeld’s talk on Wednesday
     • Typical result:
       – Given a probability distribution on n points, we can test if it’s uniform after seeing ~$\sqrt{n}$ random samples [Batu et al ’01]
     • Testing = distinguish between the uniform distribution and distributions which are $\varepsilon$-far from uniform
     • $\varepsilon$-far from uniform: $\sum_{x \in \Omega} \left| \Pr[x] - \frac{1}{n} \right| \ge \varepsilon$
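
The distance used in the definition above can be computed directly when the probabilities are known. The following is a minimal sketch, not part of the talk: the function names and the dictionary representation (points with non-zero probability only) are assumptions made for illustration.

```python
def distance_from_uniform(probs, n):
    """Return the sum over an n-point domain of |Pr[x] - 1/n|.

    `probs` maps each point with non-zero probability to that probability;
    domain points missing from `probs` are treated as having probability 0.
    (This representation is an assumption of the sketch.)
    """
    if len(probs) > n:
        raise ValueError("more support points than the domain size n")
    listed = sum(abs(p - 1.0 / n) for p in probs.values())
    # Each unlisted point contributes |0 - 1/n| = 1/n.
    unlisted = (n - len(probs)) * (1.0 / n)
    return listed + unlisted


def is_eps_far_from_uniform(probs, n, eps):
    """True if the distribution is eps-far from uniform in the sense above."""
    return distance_from_uniform(probs, n) >= eps
```

For example, a distribution that puts probability 1/2 on each of two points of a four-point domain has distance 1 from uniform under this definition.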

  4. Testing probability distributions
     For more details/introduction: see R. Rubinfeld’s talk on Wednesday
     • Typical result:
       – Given a probability distribution on n points, we can test if it’s uniform after seeing ~$\sqrt{n}$ random samples [Batu et al ’01]
     • What if the distribution has infinite support?
     • Continuous probability distributions?

  5. Testing continuous probability distributions
     • Typical result:
       – Given a probability distribution on n points, we can test if it’s uniform after seeing ~$\sqrt{n}$ random samples
       – ~$\sqrt{n}$ random samples are necessary
     • Given a continuous probability distribution on [0,1], can we test if it’s uniform?
     • Impossible
     • Follows from the lower bound for the discrete case with $n \to \infty$

  6. Testing continuous probability distributions
     • More direct proof:
     • Suppose tester A distinguishes in at most t steps between the uniform distribution and distributions that are $\varepsilon$-far from uniform
     • $D_1$: the uniform distribution
     • $D_2$ is ½-far from uniform and is defined as follows:
       – Partition [0,1] into $t^3$ intervals of identical length
       – Split each interval into two halves
       – Randomly choose one half: the chosen half gets the uniform distribution, the other half has zero probability
     • In t steps, no interval will be chosen more than once in $D_2$
     • A cannot distinguish between $D_1$ and $D_2$
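
The construction of the second distribution can be sketched as a sampler. This is an illustrative sketch only: the class name `HiddenHalves`, the helper `repeated_intervals`, and the lazy choice of halves (decided the first time an interval is hit, which is equivalent to choosing them all in advance) are assumptions, not something taken from the talk.

```python
import random


class HiddenHalves:
    """Sampler for a distribution on [0, 1) built as on the slide:
    t**3 equal intervals, each with all of its mass on one randomly chosen half."""

    def __init__(self, t, seed=0):
        self.num_intervals = t ** 3
        self.rng = random.Random(seed)
        self.chosen_half = {}  # interval index -> 0 (left half) or 1 (right half)

    def sample(self):
        """Return (interval index, sampled point)."""
        i = self.rng.randrange(self.num_intervals)
        if i not in self.chosen_half:
            # Decide lazily which half of interval i carries the probability.
            self.chosen_half[i] = self.rng.randrange(2)
        half = self.chosen_half[i]
        offset = 0.5 * half + 0.5 * self.rng.random()  # position inside the interval
        return i, (i + offset) / self.num_intervals


def repeated_intervals(t, seed=0):
    """Draw t samples and count how many of them land in an already-hit interval."""
    d2 = HiddenHalves(t, seed)
    indices = [d2.sample()[0] for _ in range(t)]
    return t - len(set(indices))
```

When `repeated_intervals(t)` returns 0, every sample fell in a fresh interval, which is the situation in which the tester sees nothing that separates the two distributions.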

  7. Testing continuous probability distributions
     • What can be tested?
     • First question: test if the distribution is indeed continuous

  8. Testing continuous probability distributions
     • Test if a probability distribution is discrete
     • A probability distribution D on $\Omega$ is discrete on N points if there is a set $X \subseteq \Omega$, $|X| \le N$, such that $\Pr_D[X] = 1$
     • D is $\varepsilon$-far from discrete on N points if $\forall X \subseteq \Omega$, $|X| \le N$: $\Pr_D[X] < 1 - \varepsilon$
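
When the point masses of D are known explicitly, the definition above reduces to looking at the N heaviest points. The sketch below illustrates this; the function names and the list-of-atom-probabilities input are assumptions of the example (a continuous part of D simply contributes no atoms, so the list may sum to less than 1).

```python
def distance_from_discrete(atom_probs, n_points):
    """Return 1 minus the largest probability any set of n_points points can have.

    `atom_probs` lists the probabilities of the individual points (atoms) of D.
    The best set X with |X| <= n_points consists of the heaviest atoms.
    """
    heaviest = sorted(atom_probs, reverse=True)[:n_points]
    return max(0.0, 1.0 - sum(heaviest))


def is_eps_far_from_discrete(atom_probs, n_points, eps):
    """True if every set X with |X| <= n_points has Pr[X] < 1 - eps."""
    return distance_from_discrete(atom_probs, n_points) > eps
```

A purely continuous distribution has no atoms, so `distance_from_discrete([], N)` is 1 for every N.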

  9. Testing if distribution is discrete on N points
     • We repeatedly draw random points from D
     • All we can see:
       – Count the frequency of each point
       – Count the number of points drawn
     • For some D (e.g., uniform or close to uniform):
       – we need $\Omega(\sqrt{N})$ samples to see the first multiple occurrence
     • Gives hope that the problem can be solved in sublinear time
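
The square-root behaviour for the uniform distribution can be observed with a small simulation. This is a rough sketch under assumed names (`draws_until_first_repeat`, `average_first_repeat`) and an arbitrary number of trials; it illustrates the birthday-paradox effect and is not an experiment from the talk.

```python
import random


def draws_until_first_repeat(n, rng):
    """Sample uniformly from n points until some point is seen a second time.

    Returns the number of draws made, including the repeated one.
    """
    seen = set()
    draws = 0
    while True:
        draws += 1
        x = rng.randrange(n)
        if x in seen:
            return draws
        seen.add(x)


def average_first_repeat(n, trials=200, seed=0):
    """Average number of draws until the first repeat over several trials."""
    rng = random.Random(seed)
    return sum(draws_until_first_repeat(n, rng) for _ in range(trials)) / trials
```

For n = 10000 the average is on the order of a hundred draws, far below n, which matches the square-root scaling.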

  10. Testing if distribution is discrete on N points
      Raskhodnikova et al ’07 (Valiant ’08):
      Distinct Elements Problem:
      • D is discrete, with each element having probability ≥ 1/N
      • Estimate the support size
      $\Omega(N^{1-o(1)})$ queries are needed to distinguish instances with support size ≤ N/100 from instances with support size ≥ N/11
      Key step: two distributions that have identical first $\log^{\Theta(1)} N$ moments
      • their expected frequencies up to $\log^{\Theta(1)} N$ are identical

  11. Testing if distribution is discrete on N points
      Raskhodnikova et al ’07 (Valiant ’08):
      Distinct Elements Problem:
      • D is discrete, with each element having probability ≥ 1/N
      • Estimate the support size
      $\Omega(N^{1-o(1)})$ queries are needed to distinguish instances with support size ≤ N/100 from instances with support size ≥ N/11
      Corollary: Testing if a distribution is discrete on N points requires $\Omega(N^{1-o(1)})$ samples

  12. Testing if distribution is discrete on N points
      • We repeatedly draw random points from D
      • All we can see:
        – Count the frequency of each point
        – Count the number of points drawn
      • Can we get O(N) time?

  13. Testing if distribution is discrete on N points
      • Testing if a distribution is discrete on N points:
        – Draw a sample $S = (s_1, \dots, s_t)$ with $t = cN/\varepsilon$
        – If S has more than N distinct elements then REJECT, else ACCEPT
      • If D is discrete on N points then we will accept D
      • We only have to prove that if D is $\varepsilon$-far from discrete on N points, then we will reject with probability > 2/3
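
The tester described on this slide can be sketched in a few lines. The slide does not fix the constant c, so the default `c=10.0` below is a placeholder assumption, as are the function name and the convention that `draw` is a zero-argument function returning one sample from D.

```python
import math
import random


def accepts_as_discrete(draw, n, eps, c=10.0):
    """Run the distinct-elements tester: draw t = ceil(c * n / eps) samples and
    REJECT (return False) if more than n distinct elements are seen,
    otherwise ACCEPT (return True).

    `c` is a placeholder constant; the slide leaves its value unspecified.
    """
    t = math.ceil(c * n / eps)
    seen = set()
    for _ in range(t):
        seen.add(draw())
        if len(seen) > n:
            return False  # REJECT: already more than n distinct elements
    return True  # ACCEPT


# Usage sketch: a distribution on 5 points is always accepted for n = 5,
# while samples from a continuous distribution are almost surely all distinct.
rng = random.Random(0)
on_five_points = accepts_as_discrete(lambda: rng.randrange(5), n=5, eps=0.1)
on_continuous = accepts_as_discrete(rng.random, n=5, eps=0.1)
```

Stopping as soon as the (n+1)-st distinct element appears does not change the outcome; it only avoids drawing samples that can no longer affect the decision.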

  14. Testing if distribution is discrete on N points
      • Testing if a distribution is discrete on N points:
        – Draw a sample $S = (s_1, \dots, s_t)$ with $t = cN/\varepsilon$
        – If S has more than N distinct elements then REJECT, else ACCEPT
      • Can we do better (if we only count distinct elements)?
      • D has:
        – 1 point with probability $1 - 4\varepsilon$
        – 2N points, each with probability $2\varepsilon/N$
      • D is $\varepsilon$-far from discrete on N points
      • We need $\Omega(N/\varepsilon)$ samples to see at least N points
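
The two claims about this hard instance can be checked by a short calculation. The following is a sketch of that check, assuming $\varepsilon < 1/4$ so that the probabilities are valid; the reasoning about the number of samples is an expectation-level argument, not the full proof.

```latex
% The probabilities sum to one:
(1 - 4\varepsilon) + 2N \cdot \frac{2\varepsilon}{N} = 1 .

% The heaviest N points are the heavy point plus N-1 light points, so for any X with |X| \le N:
\Pr_D[X] \;\le\; (1 - 4\varepsilon) + (N-1)\cdot\frac{2\varepsilon}{N}
        \;=\; 1 - 2\varepsilon - \frac{2\varepsilon}{N} \;<\; 1 - \varepsilon ,
% hence D is \varepsilon-far from discrete on N points.

% Each sample hits a light point with probability 4\varepsilon, so among t samples
\mathbb{E}[\#\text{light samples}] = 4\varepsilon t ,
% and seeing more than N distinct points requires at least N light samples,
% which needs t on the order of N/\varepsilon .
```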

  15. Testing if distribution is discrete on N points
      Assume D is $\varepsilon$-far from discrete on N points
      Order the points in $\Omega$ so that $\Pr[X_i] = p_i$ and $p_i \ge p_{i+1}$
      $A = \{X_1, \dots, X_N\}$, B = other points from the support
      $p_1 + p_2 + \dots + p_N < 1 - \varepsilon$
      $\alpha$ = # points from A drawn by the algorithm
      $\beta$ = # points from B drawn by the algorithm
      We consider 3 cases (all bounds are with prob. > 0.99):
      1) $p_N < \varepsilon/(2N) \Rightarrow \beta > N$
         • all points in B have small probability $\Rightarrow$ not too many repetitions
      2) $p_N \ge cN/\varepsilon \Rightarrow \beta \ge \varepsilon/(2p_N)$
         • points in B have small probability $\Rightarrow$ bound for the number of distinct points
      3) $p_N \ge \varepsilon/(2N) \Rightarrow \alpha \ge N - \varepsilon/(2p_N)$
         • either many distinct points from A, or $p_N$ is very small (then $\beta$ will be large)

  16. Testing if distribution is discrete on N points
      Assume D is $\varepsilon$-far from discrete on N points
      Order the points in $\Omega$ so that $\Pr[X_i] = p_i$ and $p_i \ge p_{i+1}$
      $A = \{X_1, \dots, X_N\}$, B = other points from the support
      $\alpha$ = # points from A drawn by the algorithm
      $\beta$ = # points from B drawn by the algorithm
      Main ideas: Case 2) $p_N \ge cN/\varepsilon \Rightarrow \beta \ge \varepsilon/(2p_N)$
      • Worst case: all points in B have uniform and maximum probability $= p_N$
      • $Z_i$ = random variable: number of steps to get the $i$-th new point from B
      • We have to prove that with prob. > 0.99: $\sum_{i=1}^{\varepsilon/(2p_N)} Z_i < t$
      • $Z_1, Z_2, \dots$ have geometric distributions: $E[Z_i] = \frac{1}{(r-i)\,p_N}$, where r = number of points in B
      • $\sum_{i=1}^{\varepsilon/(2p_N)} E[Z_i] \le \frac{2}{p_N}$
      → Markov gives with prob. ≥ 0.99: $\sum_{i=1}^{\varepsilon/(2p_N)} Z_i < t$
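
The step from the expectations to the Markov bound can be spelled out as follows. This is a sketch under the slide's worst-case assumption that all r points of B have probability $p_N$, combined with $\Pr[B] > \varepsilon$ (which follows from $p_1 + \dots + p_N < 1 - \varepsilon$); the threshold $p_N t \ge 200$ at the end is derived here for illustration and is not stated in the talk.

```latex
% Worst case: r \, p_N = \Pr[B] > \varepsilon. For every i \le \varepsilon/(2p_N):
(r - i)\, p_N \;\ge\; r\,p_N - \frac{\varepsilon}{2} \;>\; \frac{\varepsilon}{2}
\quad\Longrightarrow\quad
E[Z_i] = \frac{1}{(r-i)\,p_N} < \frac{2}{\varepsilon} .

% Summing over the first \varepsilon/(2p_N) new points:
\sum_{i=1}^{\varepsilon/(2p_N)} E[Z_i]
\;<\; \frac{\varepsilon}{2p_N}\cdot\frac{2}{\varepsilon}
\;=\; \frac{1}{p_N} \;\le\; \frac{2}{p_N} .

% Markov's inequality:
\Pr\!\left[\sum_{i=1}^{\varepsilon/(2p_N)} Z_i \ge t\right]
\;\le\; \frac{1}{t}\sum_{i=1}^{\varepsilon/(2p_N)} E[Z_i]
\;\le\; \frac{2}{p_N\, t} ,
% which is at most 0.01 as soon as p_N \, t \ge 200 .
```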

  17. Testing if distribution is discrete on N points
      • We repeatedly draw random points from D
      • All we can see:
        – Count the frequency of each point
        – Count the number of points drawn
      By sampling $O(N/\varepsilon)$ points one can distinguish between
      • distributions discrete on N points and
      • those $\varepsilon$-far from discrete on N points
      The algorithm may fail with prob. < 1/3

  18. Testing continuous probability distributions
      • What can we test efficiently?
        – Complexity for discrete distributions should be “independent” of the support size
      • Uniform distribution … under some conditions
      • Rubinfeld & Servedio ’05:
        – testing monotone distributions for uniformity
