Preserving Randomness for Adaptive Algorithms

William M. Hoza¹   Adam R. Klivans

August 20, 2018, RANDOM

¹Supported by the NSF GRFP under Grant DGE-1610403 and by a Harrington Fellowship from UT Austin

1 / 13

Randomized estimation algorithms

◮ Algorithm Est(C) estimates some value µ(C) ∈ R^d:

      Pr[ ‖Est(C) − µ(C)‖∞ > ε ] ≤ δ

◮ Canonical example:
  ◮ C is a Boolean circuit
  ◮ µ(C) := Pr_x[C(x) = 1]   (d = 1)
  ◮ Est(C) evaluates C at several randomly chosen points

2 / 13
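The canonical example above can be sketched as a plain Monte Carlo estimator. This is an illustration, not code from the talk: the function name `est` and the Hoeffding-style sample count are my own choices.

```python
import math
import random

def est(circuit, n_inputs, eps, delta, rng):
    # Monte Carlo estimator for the d = 1 case: mu(C) = Pr_x[C(x) = 1].
    # By a Hoeffding bound, this sample count suffices for
    # Pr[|Est(C) - mu(C)| > eps] <= delta.
    samples = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = 0
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(n_inputs)]  # random input point
        hits += circuit(x)
    return hits / samples
```

For instance, with C = 3-bit majority (acceptance probability exactly 1/2), the estimate lands near 0.5.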

Using Est as a subroutine

[Figure: the Owner sends queries C1, . . . , Ck to the Steward one at a time;
the Steward replies to each Ci with an estimate Yi ≈ µ(Ci).]

◮ Suppose Est uses n random bits
◮ Naïvely, total number of random bits = nk
◮ Can we do better?

3 / 13
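The naïve protocol can be sketched as follows, mainly to make the nk bit count concrete. The names `naive_steward` and `next_query`, and the toy estimator in the usage, are illustrative stand-ins, not from the talk.

```python
import random

def naive_steward(next_query, est, n, k, rng):
    # Naive interaction: every one of the k rounds draws n fresh random
    # bits for Est, so the total randomness consumed is n * k bits.
    answers, bits_used = [], 0
    for i in range(k):
        C = next_query(i, answers)                  # owner's next (adaptive) query
        x = [rng.randrange(2) for _ in range(n)]    # fresh n-bit random string
        bits_used += n
        answers.append(est(C, x))
    return answers, bits_used
```

Note the owner may choose Ci based on the earlier answers Y1, . . . , Y(i-1), which is exactly why the steward cannot simply fix its randomness once and reuse it blindly.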

Main result

◮ Theorem (informal): There is a steward that uses just n + O(k log(d + 1))
  random bits!
◮ Mild increases in both error and failure probability
◮ Prior work:
  ◮ [Saks, Zhou ’99] and [Impagliazzo, Zuckerman ’89] both imply stewards
  ◮ Our steward has better parameters

4 / 13

Outline of our steward

1. Compute pseudorandom bits Xi ∈ {0, 1}^n
2. Compute Wi := Est(Ci, Xi)
3. Compute Yi by carefully modifying Wi

5 / 13
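The three steps can be laid out as a skeleton. Everything here is a toy stand-in: `toy_gen` is just a seeded RNG (the actual construction, on the next slide, is an INW-generator variant), and step 3 uses plain decimal rounding as a placeholder for the shifting-and-rounding step the deck details later.

```python
import random

def toy_gen(seed, k, n):
    # Toy stand-in for Gen: expand a seed into k blocks of n bits each.
    rng = random.Random(seed)
    return [[rng.randrange(2) for _ in range(n)] for _ in range(k)]

def steward(next_query, est, seed, k, n):
    xs = toy_gen(seed, k, n)         # 1. compute pseudorandom bits X1..Xk
    ys = []
    for i in range(k):
        C = next_query(i, ys)
        w = est(C, xs[i])            # 2. Wi := Est(Ci, Xi)
        y = round(w * 10) / 10       # 3. Yi by modifying Wi (placeholder)
        ys.append(y)
    return ys
```

The point of the skeleton: all randomness is drawn up front (one seed), and each answer Yi is a deliberately coarsened version of Wi.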

Pseudorandom bits

◮ Gen : {0, 1}^s → {0, 1}^(nk): variant of the INW pseudorandom generator
◮ Before the first round, the steward computes (X1, X2, . . . , Xk) = Gen(U_s)
◮ In round i, the steward runs Est(Ci, Xi)

6 / 13

Shifting and rounding

[Figure: a number line with allowed rounding points spaced (d + 1) · 2ε apart;
the estimate Wi, lying near µ(Ci), is rounded to the nearest allowed point,
producing Yi.]

7 / 13

Analysis

◮ Theorem (informal): With high probability, for every i, ‖Yi − µ(Ci)‖∞ ≤ O(εd).
◮ Notation: For W ∈ R^d and ∆ ∈ [d + 1], define ⌊W⌉∆ ∈ R^d by rounding each
  coordinate to the nearest value y such that y ≡ 2ε∆ (mod (d + 1) · 2ε)
◮ In this notation, Yi = ⌊Wi⌉∆ for a suitable ∆ ∈ [d + 1]

8 / 13
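In code, the rounding operator ⌊W⌉∆ is a direct transcription of the definition above (variable names are mine):

```python
def round_delta(w, delta, eps, d):
    # Round each coordinate of w to the nearest y with
    # y congruent to 2*eps*delta modulo (d + 1) * 2*eps.
    period = (d + 1) * 2 * eps
    offset = 2 * eps * delta
    return [offset + period * round((wi - offset) / period) for wi in w]
```

For example, with d = 1, ε = 0.1, ∆ = 1, the allowed points are 0.2 apart from multiples of 0.4 (. . . , 0.2, 0.6, 1.0, . . .), so 0.25 rounds to 0.2; note each coordinate moves by at most half the period, i.e. (d + 1) · ε.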

Analysis (continued)

◮ Yi = ⌊Wi⌉∆
◮ If Xi = fresh randomness, then w.h.p., ⌊Wi⌉∆ = ⌊µ(Ci)⌉∆
◮ Consider d + 2 cases:
  ◮ Yi = ⌊µ(Ci)⌉1, or
  ◮ Yi = ⌊µ(Ci)⌉2, or
    . . .
  ◮ Yi = ⌊µ(Ci)⌉d+1, or
  ◮ Yi = something else.

9 / 13

Block decision tree

[Figure: a block decision tree. The root is C1; its children are the possible
second queries C2^(1), C2^(2), C2^(3) and a ⊥ node; the children of C2^(2) are
C3^(2,1), C3^(2,2), C3^(2,3) and ⊥; and so on.]

◮ A sequence (X1, . . . , Xk) determines:
  ◮ A transcript (C1, Y1, C2, Y2, . . . , Ck, Yk)
  ◮ A path P through the tree
◮ If we pick X1, . . . , Xk independently and u.a.r.,
  Pr_(X1,...,Xk)[P has a ⊥ node] ≤ kδ

10 / 13
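The kδ term above is a union bound over the k rounds. A quick numeric sanity check under the independence assumption stated on this slide (each round reaching ⊥ with probability at most δ):

```python
def exact_failure_prob(delta, k):
    # If each of k independent rounds fails with probability delta, the
    # chance that some round fails is 1 - (1 - delta)^k, which the union
    # bound caps at k * delta.
    return 1 - (1 - delta) ** k
```

For δ = 0.01 and k = 10 this gives about 0.096, safely below kδ = 0.1.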

Fooling the tree

[Figure: the same block decision tree as on the previous slide.]

◮ Tree has low memory
◮ So when X1, . . . , Xk are pseudorandom,
  Pr_(X1,...,Xk)[P has a ⊥ node] ≤ kδ + γ

11 / 13

The tree certifies correctness

[Figure: the same block decision tree as on the previous slides.]

◮ (Certification) No ⊥ nodes in P =⇒ every Yi has error O(εd)

12 / 13

Application: Randomness-efficient Goldreich-Levin

◮ Oracle access to x ∈ {0, 1}^(2^n)
◮ Theorem: Can find all Hadamard codewords that agree with x in a
  (1/2 + θ)-fraction of positions
◮ Runtime poly(n, 1/θ, log(1/δ))   (δ = failure prob)
◮ O(n + log n · log(1/δ)) random bits (independent of θ!)
◮ Previous best: O(n log(n/θ) log(1/(δθ))) random bits (Bshouty et al. ’04)
◮ Proof ingredients:
  ◮ Standard Goldreich-Levin algorithm
  ◮ Our steward with d = poly(1/θ)
  ◮ Goldreich-Wigderson sampler

13 / 13
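To make the theorem statement concrete, here is a brute-force specification of what the algorithm finds, workable only for tiny n (it enumerates all 2^n candidates, so it is emphatically not the Goldreich-Levin algorithm itself):

```python
from itertools import product

def hadamard_codeword(a):
    # Codeword of a in {0,1}^n: the table of y -> <a, y> mod 2 over all
    # 2^n points y, in lexicographic order.
    return [sum(ai * yi for ai, yi in zip(a, y)) % 2
            for y in product((0, 1), repeat=len(a))]

def heavy_codewords(x, theta):
    # Return every a whose Hadamard codeword agrees with x on more than a
    # (1/2 + theta)-fraction of the 2^n positions.
    n = len(x).bit_length() - 1
    heavy = []
    for a in product((0, 1), repeat=n):
        agree = sum(c == b for c, b in zip(hadamard_codeword(a), x))
        if agree > (0.5 + theta) * len(x):
            heavy.append(a)
    return heavy
```

If x is itself a codeword, only that codeword clears the (1/2 + θ) threshold for any θ > 0, since distinct codewords agree on exactly half the positions.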

Open questions

◮ Optimal randomness complexity when d is large?
◮ Avoid error blowup ε → O(εd)?
◮ More applications?
◮ Thanks! Questions?

14 / 13