SLIDE 1

Probabilistic Bisection Search for Stochastic Root Finding

Rolf Waeber Peter I. Frazier Shane G. Henderson

Operations Research & Information Engineering Cornell University, Ithaca, NY

Research supported by AFOSR YIP FA9550-11-1-0083, NSF CMMI 1200315

SLIDE 2

Waeber, Frazier, Henderson Probabilistic Bisection Search for Stochastic Root Finding

Shameless Commerce

www.simopt.org

Stochastic Root-Finding Probabilistic Bisection Algorithm Analysis Conclusions References

SLIDE 5

Stochastic Root-Finding Problem

[Figure: noisy observations Yn(Xn) of g on [0, 1], with the root X* marked.]

  • Consider a function g : [0, 1] → R.
  • Assumption: There exists a unique X* ∈ [0, 1] such that
      • g(x) > 0 for x < X*,
      • g(x) < 0 for x > X*.

Goal: Find X* ∈ [0, 1].

  • Can only observe Yn(Xn) = g(Xn) + εn(Xn), where εn(Xn) is a conditionally independent noise sequence with zero mean (median).

Decisions:

  • Where to place samples Xn for n = 0, 1, 2, . . .
  • How to estimate X* after n iterations.

SLIDE 6

Applications

  • Simulation optimization:
      • g(x) as a gradient
  • Finance:
      • Pricing American options
      • Estimating risk measures
  • Computer science:
      • Edge detection
      • Image detection and tracking

SLIDE 8

Stochastic Approximation [Robbins and Monro, 1951]

[Figure: g(x) on [0, 1] with the root X* marked.]

  1. Choose an initial estimate X0 ∈ [0, 1];
  2. Select a tuning sequence $(a_n)_{n \ge 0}$ with $\sum_{n=0}^{\infty} a_n^2 < \infty$ and $\sum_{n=0}^{\infty} a_n = \infty$.
     (Example: an = d/n for d > 0.)
  3. $X_{n+1} = \Pi_{[0,1]}(X_n + a_n Y_n(X_n))$, where $\Pi_{[0,1]}$ is the projection to [0, 1].

Stochastic approximation is fragile.
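The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `observe` is a hypothetical callable returning one noisy observation Y(x), and the step sizes are taken as a_n = d / (n + 1), an indexing choice made here so that the first step is defined.

```python
import random


def robbins_monro(observe, x0, n_steps, d=1.0):
    """Projected Robbins-Monro iteration on [0, 1].

    observe(x) is a hypothetical callable returning Y(x) = g(x) + noise.
    Step sizes are a_n = d / (n + 1) (assumed indexing, so a_0 exists).
    """
    x = x0
    for n in range(n_steps):
        a_n = d / (n + 1)
        # Move in the direction of the observed sign, then project onto [0, 1].
        x = min(1.0, max(0.0, x + a_n * observe(x)))
    return x


# Usage with an assumed test function g(x) = 0.4 - x and Gaussian noise.
rng = random.Random(0)
estimate = robbins_monro(lambda x: (0.4 - x) + rng.gauss(0.0, 0.1),
                         x0=0.9, n_steps=2000)
```

The fragility noted on the slide shows up through `d`: a step-size constant that is poorly matched to the slope of g near the root can slow the iteration considerably.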

SLIDE 10

Isotonic Regression

  1. Simulate at selected points in the interval (0, 1)
  2. Minimize a sum of squared deviations from the sample values
  3. Subject to a monotonicity constraint
  4. Estimate root from regression function
  5. Add points as necessary

Computationally intensive if warm starts are not possible.
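Steps 2 to 4 can be illustrated with the pool-adjacent-violators algorithm. The sketch below assumes a non-increasing fit (matching g > 0 left of the root and g < 0 right of it) and uses a simple midpoint rule for the root; both the function names and the midpoint rule are choices made here, not taken from the slides.

```python
def pava_nonincreasing(y, w=None):
    """Weighted least-squares fit to y subject to fit[0] >= fit[1] >= ..."""
    w = [1.0] * len(y) if w is None else list(w)
    blocks = []  # each block holds [mean, weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool neighbouring blocks while the non-increasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def estimate_root(xs, fit):
    """Midpoint between the last point with fit > 0 and the first with fit <= 0."""
    for i, f in enumerate(fit):
        if f <= 0:
            return xs[i] if i == 0 else 0.5 * (xs[i - 1] + xs[i])
    return xs[-1]
```

Refitting from scratch after every new design point is what makes the approach expensive when warm starts are unavailable.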

SLIDE 11

A Different Approach

What about a bisection algorithm?

[Figure: g(x) on [0, 1] with the root X* marked.]

  • Deterministic bisection algorithm will fail almost surely.
  • Need to account for the noise.
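A small sketch makes the failure mode concrete. Classical bisection trusts every observed sign, so a single wrong sign permanently discards the half-interval that contains the root. The oracle below is hypothetical and lies on exactly one chosen call; it is an illustration, not part of the original slides.

```python
def naive_bisection(sign_oracle, n_steps, lo=0.0, hi=1.0):
    """Classical bisection driven by observed signs; it trusts every sign."""
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if sign_oracle(mid) > 0:   # positive sign: root lies to the right
            lo = mid
        else:                      # negative sign: root lies to the left
            hi = mid
    return lo, hi


def make_oracle(root, wrong_at_call):
    """Hypothetical sign oracle for g > 0 left of root; lies on one call."""
    calls = {"n": 0}

    def oracle(x):
        calls["n"] += 1
        true_sign = 1 if x < root else -1
        return -true_sign if calls["n"] == wrong_at_call else true_sign

    return oracle
```

With `wrong_at_call=1` and a root at 0.3, the first query at 0.5 returns the wrong sign and the returned interval lies entirely inside [0.5, 1].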

SLIDE 12

The Probabilistic Bisection Algorithm

SLIDE 16

The Probabilistic Bisection Algorithm [Horstein, 1963]

  • Input: Zn(Xn) := sign(Yn(Xn)).
  • Assume a prior density f0 on [0, 1].

[Figure: four panels showing the density fn(x) on [0, 1], with X* and Xn marked:
  n = 0, Xn = 0.5, Zn(Xn) = −1
  n = 1, Xn = 0.38462, Zn(Xn) = −1
  n = 2, Xn = 0.29586, Zn(Xn) = 1
  n = 3, Xn = 0.36413, Zn(Xn) = 1]

SLIDE 21

Stochastic Root-Finding Revisited

[Figure: two panels: g(x) on [0, 1] with X* marked; p(x) on [0, 1] with the levels 0.5 and p and the point X* marked.]

$$Z_n(X_n) = \begin{cases} \operatorname{sign}(g(X_n)) & \text{with probability } p(X_n), \\ -\operatorname{sign}(g(X_n)) & \text{with probability } 1 - p(X_n). \end{cases}$$

  • The probability of a correct sign p(·) depends on g(·) and the noise (εn)n.
  • Stylized Setting:
      • p(·) is constant.
      • p(·) is known.
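To see how p(·) depends on g(·) and the noise, consider one concrete case. Assuming (this is not stated on the slide) that the noise is Gaussian with mean zero and standard deviation sigma, the probability of observing the correct sign is the standard normal CDF evaluated at |g(x)| / sigma. The function name is hypothetical.

```python
import math


def prob_correct_sign(g_value, sigma=1.0):
    """P(sign(g + eps) == sign(g)) for eps ~ N(0, sigma^2) (assumed noise model)."""
    z = abs(g_value) / sigma
    # Standard normal CDF at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Under this model p(x) approaches 1/2 as g(x) approaches 0, so a constant p(·) is only a stylized assumption unless g has a jump at the root.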

SLIDE 22

Stylized Setting

Waeber et al. [2013]:

  • Assume p(·) is constant and known.
  • Assume we always measure at the median Xn.
  • Then $E|X_n - X^*| = O(e^{-rn})$ for some r > 0.

SLIDE 24

Not so Stylized Setting

  • g(x) is a step function with a jump at X*, for example, in edge detection applications [Castro and Nowak, 2008].
  • Sample sequentially at point Xn and use $S_m(X_n) = \sum_{i=1}^{m} Y_{n,i}(X_n)$ to construct an α-level test of power 1 [Siegmund, 1985]:

    $$N_n = \inf\left\{ m : |S_m| \ge \left[(m+1)\left(\log(m+1) + 2\log(1/\alpha)\right)\right]^{1/2} \right\}.$$

    Then

    $$P_{X_n = X^*}\{N_n < \infty\} \le \alpha, \qquad P_{X_n \ne X^*}\{N_n < \infty\} = 1,$$

    and

    $$P_{X_n < X^*}\{S_{N_n}(X_n) > 0\} \ge 1 - \alpha/2 = p_c, \qquad P_{X_n > X^*}\{S_{N_n}(X_n) < 0\} \ge 1 - \alpha/2 = p_c.$$

[Figure: sample path of Sm(Xn) against m (ticks 5 to 20); p(x) on [0, 1] with the levels 0.5 and p and the point X* marked.]
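The stopping rule can be sketched as a short loop. In this illustration `sample` is a hypothetical zero-argument callable returning one observation Y_{n,i}(X_n), and `max_samples` is a practical cap added here; the definition on the slide has no such cap.

```python
import math


def power_one_test(sample, alpha, max_samples=1_000_000):
    """Sample until |S_m| crosses the boundary; return (sign of S_m, m).

    sample: hypothetical callable returning one noisy observation.
    max_samples: cap added for this sketch; (0, max_samples) is returned
    if the boundary is never reached within the cap.
    """
    s = 0.0
    for m in range(1, max_samples + 1):
        s += sample()
        boundary = math.sqrt(
            (m + 1) * (math.log(m + 1) + 2.0 * math.log(1.0 / alpha)))
        if abs(s) >= boundary:
            return (1 if s > 0 else -1), m
    return 0, max_samples
```

A call such as `power_one_test(lambda: 0.2 + rng.gauss(0.0, 1.0), alpha=0.3)`, with `rng` a `random.Random` instance, returns the sign fed to the bisection update together with the number of samples spent at that point. Since pc = 1 − α/2, the choice alpha = 0.3 corresponds to pc = 0.85.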

SLIDE 26

The Probabilistic Bisection Algorithm [Horstein, 1963]

Notation: p(·) = pc ∈ (1/2, 1] and qc = 1 − pc.

  1. Place a prior density f0 on the root X*; f0 has domain [0, 1]. Example: U(0, 1).
  2. For n = 0, 1, 2, . . .
     (a) Measure at the median $X_n := F_n^{-1}(1/2)$.
     (b) Update the posterior density:

         if $Z_n(X_n) = +1$: $f_{n+1}(x) = \begin{cases} 2p_c \cdot f_n(x), & \text{if } x > X_n, \\ 2q_c \cdot f_n(x), & \text{if } x \le X_n, \end{cases}$

         if $Z_n(X_n) = -1$: $f_{n+1}(x) = \begin{cases} 2q_c \cdot f_n(x), & \text{if } x > X_n, \\ 2p_c \cdot f_n(x), & \text{if } x \le X_n. \end{cases}$
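Steps 1 and 2 can be sketched with a piecewise-constant density, which is all that is needed when the prior is U(0, 1). The class and function names are hypothetical, and the sketch assumes pc < 1 so that no density value becomes zero. The value pc = 0.65 in the usage line is an assumption, chosen because it reproduces the medians 0.5, 0.38462, 0.29586, 0.36413 listed in the illustration for the responses −1, −1, +1, +1.

```python
import bisect


class Posterior:
    """Piecewise-constant density on [0, 1]: dens[i] on [breaks[i], breaks[i+1])."""

    def __init__(self):
        self.breaks = [0.0, 1.0]  # uniform prior U(0, 1)
        self.dens = [1.0]

    def median(self):
        """Return F^{-1}(1/2) by accumulating mass interval by interval."""
        acc = 0.0
        for i, d in enumerate(self.dens):
            mass = d * (self.breaks[i + 1] - self.breaks[i])
            if acc + mass >= 0.5:
                return self.breaks[i] + (0.5 - acc) / d
            acc += mass
        return 1.0

    def update(self, x, z, pc):
        """Multiply the density by 2*pc on the side indicated by z, 2*qc elsewhere."""
        qc = 1.0 - pc
        i = bisect.bisect_right(self.breaks, x) - 1
        if i < len(self.dens) and self.breaks[i] < x:
            # Split the interval containing x so that x becomes a breakpoint.
            self.breaks.insert(i + 1, x)
            self.dens.insert(i + 1, self.dens[i])
        right, left = (2 * pc, 2 * qc) if z > 0 else (2 * qc, 2 * pc)
        self.dens = [d * (right if b >= x else left)
                     for d, b in zip(self.dens, self.breaks)]


def run_pba(responses, pc):
    """Return the medians X_0, X_1, ... queried for a given list of signs Z_n."""
    post, medians = Posterior(), []
    for z in responses:
        x = post.median()
        medians.append(x)
        post.update(x, z, pc)
    return medians


medians = run_pba([-1, -1, 1, 1], pc=0.65)  # pc = 0.65 is an assumed value
```

Because each update multiplies the two sides by 2pc and 2qc while the query point splits the mass in half, the density stays normalized without an explicit renormalization step.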

SLIDES 27-39

Sample Path of Posterior Distributions

[Figure sequence: posterior density fn(x) on [0, 1], with X* and Xn marked, at the following iterations.]

    n      Xn         Zn(Xn)
    0      0.5         1
    1      0.61538     1
    2      0.70414    −1
    3      0.63587    −1
    4      0.55589    −1
    5      0.46446    −1
    10     0.39721    −1
    20     0.20046     1
    30     0.36118     1
    40     0.39722     1
    50     0.36904     1
    100    0.3752      1
    150    0.37261     1

SLIDE 40

Comparison to Stochastic Approximation

[Figure: |X* − Xn| on a logarithmic axis (ticks 10^−10, 10^−8, ..., 10^−2) against n (ticks 50 to 200) for Stochastic Approximation and Probabilistic Bisection; inset: g(x) on [0, 1] with X* marked.]

SLIDE 42

Literature Review: Probabilistic Bisection Algorithm

  • First introduced in Horstein [1963].
  • Discretized version: Burnashev and Zigangirov [1974].
  • Feige et al. [1994], Karp and Kleinberg [2007], Ben-Or and Hassidim [2008], Nowak [2008], Nowak [2009], ...
  • Survey paper: Castro and Nowak [2008]

“The [probabilistic bisection] algorithm seems to work extremely well in practice, but it is hard to analyze and there are few theoretical guarantees for it, especially pertaining error rates of convergence.”

SLIDE 43

Algorithm Analysis

SLIDE 45

Consistency

Setting for probabilistic bisection with power 1 tests:

  • X* ∈ [0, 1] fixed and unknown.
  • Xn ≠ X* for any finite n ∈ N.
  • p(Xn) ≥ pc for all n ∈ N.
  • pc ∈ (1/2, 1) is an input parameter.

Theorem

Xn → X* almost surely as n → ∞.

SLIDE 48

Analysis of Posterior Density

[Figure: the interval [0, 1] with Xn and X* marked.]

Case I: If X* < Xn: P(Zn = +1) = 1 − p(Xn) ≤ 1 − pc

  • If Zn = +1:
      fn+1(x) = 2qc · fn(x), x < Xn,
      fn+1(x) = 2pc · fn(x), x ≥ Xn,
  • If Zn = −1:
      fn+1(x) = 2pc · fn(x), x < Xn,
      fn+1(x) = 2qc · fn(x), x ≥ Xn.

SLIDE 49

Analysis of Posterior Density

[Figure: the interval [0, 1] with Xn and X* marked.]

Case II: If X* > Xn: P(Zn = +1) = p(Xn) ≥ pc

  • If Zn = +1:
      fn+1(x) = 2qc · fn(x), x < Xn,
      fn+1(x) = 2pc · fn(x), x ≥ Xn,
  • If Zn = −1:
      fn+1(x) = 2pc · fn(x), x < Xn,
      fn+1(x) = 2qc · fn(x), x ≥ Xn.

SLIDE 53

Analysis of Posterior Density cont.

  • The dynamics of fn(x) are very complicated for almost all x ∈ [0, 1].

HOWEVER, the dynamics of fn(X*) are rather simple:

$$f_{n+1}(X^*) = \begin{cases} 2p_c \cdot f_n(X^*), & \text{with probability } p(X_n) \ge p_c, \\ 2q_c \cdot f_n(X^*), & \text{with probability } q(X_n) \le q_c. \end{cases}$$

  • A sample path of fn(X*) dominates a sample path of a coupled geometric random walk (Wn)n with dynamics

$$W_{n+1} = \begin{cases} 2p_c \cdot W_n, & \text{with probability } p_c, \\ 2q_c \cdot W_n, & \text{with probability } q_c. \end{cases}$$

  • The process fn(X*) behaves almost like a geometric random walk independently of (Xn)n.

SLIDE 56

Confidence Intervals for X*

  • Notation: $\mu = p_c \ln 2p_c + q_c \ln 2q_c$.
  • For α ∈ (0, 1), define

    $$b_n = n\mu - n^{1/2}(-0.5 \ln \alpha)^{1/2}(\ln 2p_c - \ln 2q_c).$$

  • Define

    $$J_n = \operatorname{conv}\{x \in [0, 1] : f_n(x) \ge e^{b_n}\}.$$

Theorem

For α ∈ (0, 1), $P(X^* \in J_n) \ge 1 - \alpha$ for all n ∈ N.

Proof: Application of Hoeffding's inequality.
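The form of b_n can be traced back to Hoeffding's inequality as follows. This is a sketch under two assumptions that the slide does not spell out: a uniform prior, so that f_0(X*) = W_0 = 1, and the coupling f_n(X*) ≥ W_n with the geometric random walk from the posterior-density analysis. The increments ξ_i are notation introduced here.

```latex
\begin{align*}
\ln W_n &= \sum_{i=1}^{n} \xi_i, \qquad
  \xi_i = \begin{cases} \ln 2p_c & \text{with probability } p_c,\\
                        \ln 2q_c & \text{with probability } q_c, \end{cases}
  \qquad E[\xi_i] = \mu, \\
P(\ln W_n - n\mu \le -t) &\le
  \exp\!\left( \frac{-2t^2}{n(\ln 2p_c - \ln 2q_c)^2} \right)
  \quad \text{(Hoeffding, since } \ln 2q_c \le \xi_i \le \ln 2p_c \text{)}, \\
t = n^{1/2}(-0.5\ln\alpha)^{1/2}(\ln 2p_c - \ln 2q_c)
  &\;\Longrightarrow\; P(\ln W_n \le b_n) \le \alpha, \\
P(X^* \in J_n) \ge P\big(f_n(X^*) \ge e^{b_n}\big)
  &\ge P\big(W_n \ge e^{b_n}\big) \ge 1 - \alpha .
\end{align*}
```

Substituting the chosen t into the exponent gives exp(ln α) = α, which is where the factor (−0.5 ln α)^{1/2} in b_n comes from.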

SLIDE 58

Size of Confidence Interval

Theorem

Choose pc ≥ 0.85, α ∈ (0, 1). For $0 < r < \mu - q_c \ln 2p_c$ there exists an N(pc, r, α) ∈ N such that

$$P(|J_n| \le e^{-rn},\ X^* \in J_n) \ge 1 - \alpha \quad \text{for all } n \ge N(p_c, r, \alpha).$$

Proof Idea: [Figure not recovered.]
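For a piecewise-constant posterior, the interval J_n and its length can be computed directly from the threshold e^{b_n}. The sketch below assumes the density is stored as breakpoints and interval heights; that representation and the function name are choices made here for illustration.

```python
import math


def confidence_interval(breaks, dens, n, pc, alpha):
    """Return (lo, hi), the convex hull of {x : f_n(x) >= exp(b_n)}, or None.

    breaks, dens: piecewise-constant density, dens[i] on [breaks[i], breaks[i+1]).
    """
    qc = 1.0 - pc
    mu = pc * math.log(2 * pc) + qc * math.log(2 * qc)
    b_n = n * mu - math.sqrt(n) * math.sqrt(-0.5 * math.log(alpha)) * (
        math.log(2 * pc) - math.log(2 * qc))
    # Compare on the log scale to avoid overflow in exp(b_n) for large n.
    keep = [i for i, d in enumerate(dens) if d > 0 and math.log(d) >= b_n]
    if not keep:
        return None
    return breaks[keep[0]], breaks[keep[-1] + 1]
```

The length |J_n| is then `hi - lo`; the theorem says that, with probability at least 1 − α, this length eventually decays like e^{-rn} while J_n still contains X*.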

SLIDE 62

Rate of Convergence

Theorem

Define $\hat{X}_n$ to be any point in Jn, then there exists r > 0 such that $E[|X^* - \hat{X}_n|] = O(e^{-rn})$.

  • This is extremely fast compared to stochastic approximation: $O(e^{-rn})$ vs. $O(n^{-1/2})$.
  • And we have true confidence intervals for X*.
  • But n is the number of measurement points, what about total wall-clock time?

SLIDE 63

Wall-Clock Time

At each iteration of the Probabilistic Bisection Algorithm:

  • Sample sequentially at point Xn and observe $S_m(X_n) = \sum_{i=1}^{m} Y_{n,i}(X_n)$ until

    $$N_n = \inf\left\{ m : |S_m| \ge \left[(m+1)\left(\log(m+1) + 2\log(1/\alpha)\right)\right]^{1/2} \right\};$$

    then

    $$P_{X_n = X^*}\{N_n < \infty\} \le \alpha, \qquad P_{X_n \ne X^*}\{N_n < \infty\} = 1,$$

    and

    $$P_{X_n < X^*}\{S_{N_n}(X_n) > 0\} \ge 1 - \alpha/2 = p_c, \qquad P_{X_n > X^*}\{S_{N_n}(X_n) < 0\} \ge 1 - \alpha/2 = p_c.$$

  • Wall-clock time: $T_n = \sum_{i=1}^{n} N_i$.

[Figure: sample path of Sm(Xn) against m (ticks 5 to 20); p(x) on [0, 1] with the levels 0.5 and p and the point X* marked.]

SLIDE 64

Sample Paths

[Figure: six panels of X* − Xn (axis from −0.5 to 0.5) against Tn (logarithmic axis, 10^0 to 10^5): three sample paths for Robbins-Monro, an = 1/n, εn ~ N(0,1), and three for Bisection, p = 0.75, εn ~ N(0,1).]

SLIDE 65

Numerical Comparison

[Figure: E[|X* − Xn|] (logarithmic axis, 10^−4 to 10^0) against Tn (logarithmic axis, 10^0 to 10^5) for Robbins-Monro (an = 1/n) and Bisection, Siegmund (pc = 0.85).]

SLIDE 68

Rate of Convergence in Wall-Clock Time?

  • Farrell [1964]:

    $$E_{g(x)}[N] \sim (1/g(x))^2 \log\log(1/|g(x)|) \quad \text{as } g(x) \to 0,$$

    and for all tests of power one, if $P_0(N = \infty) > 0$, then

    $$\lim_{g(x) \to 0} g(x)^2 E_{g(x)}[N] = \infty.$$

Theorem

$(|X^* - X_n| (T_n)^{1/2})_n$ is not tight.

  • If g(x) → 0 as x → X*, and if we use Xn as the best estimate of X*, then the Probabilistic Bisection Algorithm with power one tests is asymptotically slower than Stochastic Approximation.

SLIDE 71

Conjecture

  • Xn might not be the best estimate for X* when we use power one tests.
  • Intuitively, observations where we spend more time should also be closer to X*, hence an estimator of the form

    $$\tilde{X}_n = \frac{1}{T_n} \sum_{i=1}^{n} N_i X_i$$

    should perform better.
  • Conjecture: For any ε > 0 it holds that

    $$E[|\tilde{X}_n - X^*|] = O\left(T_n^{-\frac{1}{2} + \epsilon}\right)$$

    (if g satisfies some growth conditions).
  • Sufficient Condition: $|X_n - X^*| = O(e^{-rn})$ for some r > 0.
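The averaged estimator is a sample-count-weighted mean of the query points. A minimal sketch, with a hypothetical function name, where `points` holds X_1, ..., X_n and `sample_counts` holds N_1, ..., N_n:

```python
def weighted_average_estimate(points, sample_counts):
    """Compute (1 / T_n) * sum_i N_i * X_i, with T_n = sum_i N_i."""
    if not points or len(points) != len(sample_counts):
        raise ValueError("points and sample_counts must be non-empty and equal in length")
    total = sum(sample_counts)  # wall-clock time T_n
    return sum(n_i * x_i for x_i, n_i in zip(points, sample_counts)) / total
```

Points where the power-one test ran for a long time receive a large weight, which matches the intuition that long tests occur close to X*.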

SLIDE 72

Numerical Comparison Cont.

[Figure: E[|X* − Xn|] (logarithmic axis, 10^−4 to 10^0) against Tn (logarithmic axis, 10^0 to 10^5) for Robbins-Monro (an = 1/n), Bisection, Siegmund (pc = 0.85), Polyak-Ruppert, and Bisection Averaging.]

SLIDE 73

Conclusions

Positive:

  • Provides a true confidence interval for the root X*.
  • Works extremely well if there is a jump at g(X*) (geometric rate of convergence).
  • Only one tuning parameter.
  • Robust finite-time performance.

Drawbacks:

  • Seems to be asymptotically slower than Stochastic Approximation (but not by much).
  • Higher computational cost.

Future Research:

  • Use parallel computing (very little switching of (Xn)n).
  • Extension to higher dimensions.

SLIDE 74

  • M. Ben-Or and A. Hassidim. The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In 49th Annual Symposium on Foundations of Computer Science (FOCS), pages 221–230. IEEE, 2008.
  • M. V. Burnashev and K. S. Zigangirov. An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61, 1974.
  • R. M. Castro and R. D. Nowak. Active learning and sampling. In A. O. Hero, D. A. Castañón, D. Cochran, and K. Kastella, editors, Foundations and Applications of Sensor Management, pages 177–200. Springer, 2008. ISBN 978-0-387-49819-5. URL http://dx.doi.org/10.1007/978-0-387-49819-5_8.
  • R. H. Farrell. Asymptotic behavior of expected sample size in certain one sided tests. Ann. Math. Statist., 35(1):36–72, 1964.
  • U. Feige, P. Raghavan, D. Peleg, and E. Upfal. Computing with noisy information. SIAM J. Comput., 23(5):1001–1018, 1994.
  • M. Horstein. Sequential transmission using noiseless feedback. IEEE Trans. Inform. Theory, 9(3):136–143, 1963.
  • R. M. Karp and R. Kleinberg. Noisy binary search and its applications. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 881–890. SIAM, 2007.
  • R. D. Nowak. Generalized binary search. In 46th Annual Allerton Conference on Communication, Control, and Computing, pages 568–574, 2008.
  • R. D. Nowak. Noisy generalized binary search. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Adv. Neural Inf. Process. Syst. 22, pages 1366–1374, 2009.
  • H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Statist., 22(3):400–407, 1951.
  • D. Siegmund. Sequential Analysis: tests and confidence intervals. Springer, 1985.
  • R. Waeber, P. I. Frazier, and S. G. Henderson. Bisection search with noisy responses. SIAM J. Control Optim., 2013.