Quantum and Classical Algorithms for Approximate Submodular Function Minimization
Yassine Hamoudi, Patrick Rebentrost, Ansis Rosmanis, Miklos Santha
arXiv: 1907.05378
A submodular function is a set function F : 2^[n] → ℝ satisfying the diminishing returns property: for all S ⊆ T ⊆ [n] and all i ∉ T,

    F(S ∪ {i}) − F(S) ≥ F(T ∪ {i}) − F(T).

Example: area covered by a set of cameras.
Example: size of a cut in a graph.
Many other examples arise in combinatorial optimization, machine learning, and economics.
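As a quick sanity check, here is a minimal sketch verifying the diminishing-returns inequality for a cut function; the graph (a 4-cycle) and the sets are illustrative choices, not from the talk.

```python
# Cut function of the 4-cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def F(S):
    """Number of edges with exactly one endpoint in S."""
    return sum((u in S) != (v in S) for u, v in edges)

S, T, i = {0}, {0, 1}, 2            # S ⊆ T and i ∉ T
gain_S = F(S | {i}) - F(S)          # = 4 - 2 = 2
gain_T = F(T | {i}) - F(T)          # = 2 - 2 = 0
assert gain_S >= gain_T             # diminishing returns holds
```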
Submodular Function Minimization (time = number of queries to the evaluation oracle for F)

Exact: find S ⊆ [n] such that F(S) = min_{S′⊆[n]} F(S′).
Polynomial-time solvable via the ellipsoid method (Grötschel, Lovász, Schrijver 1981); the best known running times are Õ(n³) and Õ(n² log M), where M = max_S |F(S)|.
(lower bound: Ω(n))

Approximate: find S ⊆ [n] such that F(S) ≤ min_{S′⊆[n]} F(S′) + ε.
Õ(n^{5/3}/ε²) (classical: Chakrabarty, Lee, Sidford, Wong 2017)    Õ(n/ε²) (quantum)
(classical lower bound: Ω(n))
Discrete Optimization ⟷ Continuous Optimization

Set function: F : {0,1}^n → ℝ, defined on the corners of the hypercube.

Lovász extension: f : [0,1]^n → ℝ, the piecewise-linear extension of F to the whole cube; at the corners it agrees with F. For n = 2: f(0,0) = F(∅), f(1,0) = F({1}), f(0,1) = F({2}), f(1,1) = F({1,2}).

F is submodular if and only if f is convex, and min_{S⊆[n]} F(S) = min_{x∈[0,1]^n} f(x), so approximate SFM reduces to minimizing the convex function f.
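The extension can be evaluated with the standard sorting formula. A minimal sketch, assuming F is given as a Python callable on frozensets with F(∅) = 0:

```python
def lovasz_extension(F, x):
    """Evaluate the Lovász extension at x ∈ [0,1]^n by the sorting formula.

    Sort coordinates decreasingly as s_1, ..., s_n, set S_i = {s_1,...,s_i};
    then f(x) = Σ_i x[s_i] * (F(S_i) - F(S_{i-1})), assuming F(∅) = 0.
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])
    value, prev, S = 0.0, 0.0, set()
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        value += x[i] * (cur - prev)
        prev = cur
    return value

# n = 2 cut example: F(∅) = F({1,2}) = 0, F({1}) = F({2}) = 1,
# whose extension is f(x1, x2) = |x1 - x2|.
F_cut = lambda S: 1.0 if len(S) == 1 else 0.0
assert abs(lovasz_extension(F_cut, [0.8, 0.3]) - 0.5) < 1e-9
```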
Convex function f : C → ℝ on a convex set C (not necessarily differentiable).

Subgradient at x: the slope g(x) of any line that lies below the graph of f and touches it at x.

Stochastic subgradient at x: a random variable g̃(x) satisfying E[g̃(x)] = g(x).

(Projected) Stochastic Subgradient Descent: repeat the update x_{t+1} = Π_C(x_t − η g̃(x_t)), i.e. take a step −η g̃(x_t) and project back onto C.

If g̃(x) has low variance, then the number of steps needed is of the same order as if we were using g(x) itself.
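A minimal sketch of the projected update over the box C = [0,1]^n (the feasible set for the Lovász extension); `subgrad` is an assumed stochastic-subgradient oracle, and the averaging follows the textbook analysis.

```python
def projected_sgd(subgrad, n, steps, eta):
    """Projected stochastic subgradient descent on C = [0,1]^n.

    subgrad(x) returns an unbiased estimate of a subgradient at x.
    Each step moves against the estimate and clamps back into the box;
    the averaged iterate enjoys the usual O(1/sqrt(steps)) guarantee.
    """
    x = [0.5] * n
    avg = [0.0] * n
    for _ in range(steps):
        g = subgrad(x)
        x = [min(1.0, max(0.0, xi - eta * gi)) for xi, gi in zip(x, g)]
        avg = [a + xi / steps for a, xi in zip(avg, x)]
    return avg
```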
For the Lovász extension f, there exists a subgradient g(x) such that ‖g(x)‖₁ = O(1) (Jegelka, Bilmes 2011; Hazan, Kale 2012).

A stochastic subgradient g̃(x) can be computed in amortized time:
Õ(n^{2/3}) (classical)    Õ(1) (quantum)
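The subgradient in question comes from the same sorting procedure as the extension itself. A sketch, with the same conventions as the evaluation code above:

```python
def lovasz_subgradient(F, x):
    """Subgradient of the Lovász extension at x via the sorting formula.

    With coordinates sorted decreasingly and S_i the nested level sets,
    g[s_i] = F(S_i) - F(S_{i-1}) is a subgradient at x; for F bounded
    in [-1, 1] its 1-norm stays O(1) (Jegelka-Bilmes, Hazan-Kale).
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])
    g = [0.0] * len(x)
    S, prev = set(), 0.0
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        g[i] = cur - prev
        prev = cur
    return g
```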
First attempt to construct g̃(x):

For any non-zero vector u ∈ ℝ^n, define the random variable (importance sampling)

    û = sgn(u_i) ‖u‖₁ · e_i    (e_i = the i-th coordinate vector),

where i is sampled with probability |u_i| / ‖u‖₁.

Unbiased:    E[û] = Σ_i (|u_i| / ‖u‖₁) · sgn(u_i) ‖u‖₁ · e_i = Σ_i u_i e_i = u.
2nd moment:  E[‖û‖₂²] = Σ_i (|u_i| / ‖u‖₁) · sgn(u_i)² ‖u‖₁² = ‖u‖₁².

For the Lovász extension, take u = g(x): since ‖g(x)‖₁ = O(1), the estimator has low variance (Jegelka, Bilmes 2011).

Problem: the distribution (|g(x)_i| / ‖g(x)‖₁)_i is hard to sample from, since writing down g(x) already takes n + 1 queries to F.
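A sketch of this estimator as code, assuming direct access to the entries of u (which is precisely what is expensive when u = g(x)); the 1-sparse output is represented as an (index, value) pair.

```python
import random

def sparse_estimate(u):
    """Importance-sampling estimator: a 1-sparse unbiased estimate of u.

    Index i is drawn with probability |u_i| / ||u||_1 and the value is
    sgn(u_i) * ||u||_1, so the expectation is exactly u and the second
    moment is ||u||_1^2 (low variance when ||u||_1 = O(1)).
    """
    norm1 = sum(abs(ui) for ui in u)
    r = random.uniform(0.0, norm1)
    acc = 0.0
    for i, ui in enumerate(u):
        acc += abs(ui)
        if r <= acc:
            return i, (norm1 if ui >= 0 else -norm1)
    return len(u) - 1, (norm1 if u[-1] >= 0 else -norm1)  # numerical fallback
```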
Second attempt:

Tool (Chakrabarty, Lee, Sidford, Wong 2017): there is an unbiased estimate d̃(x, y) of the difference d(x, y) = g(y) − g(x) that can be computed efficiently when x − y is sparse.

Our construction (T = parameter to be optimized): group the SGD iterates into blocks of length T; at the start of each block compute a fresh base estimate, and inside the block add only cheap difference estimates (each SGD step moves x along a sparse estimate, so x − x_block start stays sparse within a block):

x_0      ⟶ ĝ(x_0)
x_1      ⟶ ĝ(x_0) + d̃(x_0, x_1)
x_2      ⟶ ĝ(x_0) + d̃(x_0, x_2)
  ⋮
x_{T−1}  ⟶ ĝ(x_0) + d̃(x_0, x_{T−1})
x_T      ⟶ ĝ(x_T)
x_{T+1}  ⟶ ĝ(x_T) + d̃(x_T, x_{T+1})
x_{T+2}  ⟶ ĝ(x_T) + d̃(x_T, x_{T+2})
  ⋮
x_{2T−1} ⟶ ĝ(x_T) + d̃(x_T, x_{2T−1})
x_{2T}   ⟶ ĝ(x_{2T})
  ⋯

Each block uses T independent samples of its base estimator ĝ (one fresh draw per iterate), so a block reduces to the task: preprocess one distribution, then draw T independent samples from it.
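A sketch of the blocked estimator. Here `sample_ghat` and `d_tilde` are assumed helpers standing in for the 1-sparse estimator above and the CLSW difference tool; the iterates are passed as a list only to keep the sketch short (in the real algorithm SGD produces them online).

```python
def blocked_subgradients(xs, sample_ghat, d_tilde, T):
    """Yield one stochastic subgradient per iterate, refreshing the
    expensive base estimate only at the start of each length-T block.

    sample_ghat(x): one independent draw of the sparse estimator of g(x)
                    as a dense vector (assumed helper).
    d_tilde(x, y):  unbiased estimate of g(y) - g(x), cheap when x - y
                    is sparse (the CLSW 2017 tool; assumed helper).
    """
    anchor = None
    for t, x in enumerate(xs):
        if t % T == 0:
            anchor = x                      # block start: ĝ alone
            yield sample_ghat(anchor)
        else:
            base = sample_ghat(anchor)      # fresh draw -> independence
            diff = d_tilde(anchor, x)
            yield [b + d for b, d in zip(base, diff)]
```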
Classical sampling from p = (p_1, …, p_n): store the p_i at the leaves of a binary tree whose internal nodes hold partial sums (p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5, …); a sample is a root-to-leaf walk.

Preprocessing time: O(n). Cost per sample: O(log n). Cost for T samples: O(n + T log n).

The alias method (Walker 1974, Vose 1991) improves this to: preprocessing time O(n), cost per sample O(1), cost for T samples O(n + T).
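A minimal sketch of the partial-sums sampler; binary search over cumulative sums plays the role of the tree walk.

```python
import bisect, random

class PrefixSampler:
    """Classical sampler: O(n) preprocessing, O(log n) per sample.

    Stores cumulative sums of the weights and binary-searches a uniform
    draw; the alias method of Walker/Vose improves the per-sample cost
    to O(1) with the same O(n) preprocessing.
    """
    def __init__(self, weights):
        self.cum = []
        total = 0.0
        for w in weights:
            total += w
            self.cum.append(total)

    def sample(self):
        r = random.uniform(0.0, self.cum[-1])
        return bisect.bisect_left(self.cum, r)  # returns i w.p. p_i

sampler = PrefixSampler([0.1, 0.4, 0.2, 0.3])
draws = [sampler.sample() for _ in range(10000)]
```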
Quantum sampling (Grover 2000). Preprocessing: none (only an upper bound p_max ≥ max_i p_i is needed). Sampling (repeat T times): prepare

V : |0⟩|0⟩ ⟼ (1/√n) Σ_{i∈[n]} |i⟩ |0⟩
           ⟼ (1/√n) Σ_{i∈[n]} |i⟩ (√(p_i/p_max) |0⟩ + √(1 − p_i/p_max) |1⟩)
           = (1/√(n·p_max)) (Σ_i √(p_i) |i⟩) |0⟩ + … |1⟩,

then use amplitude amplification on the |0⟩ branch to obtain Σ_i √(p_i) |i⟩, whose measurement outputs i with probability p_i.

The |0⟩ branch has squared amplitude 1/(n·p_max), so amplification costs O(√(n·p_max)) per sample, and T samples cost O(T·√(n·p_max)), which is O(T·√n) in the worst case.

Our result: T samples at total cost O(√(nT)).
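For intuition, the state preparation is the amplitude-amplified version of the following classical rejection step; the sketch below is that classical analogue, not the quantum routine itself.

```python
import random

def rejection_sample(p, p_max):
    """Classical analogue of the Grover state-preparation step.

    Pick i uniformly, accept with probability p_i / p_max; conditioned
    on acceptance the output is distributed as p. One round accepts
    with probability sum_i p_i / (n * p_max) = 1 / (n * p_max), so this
    loop takes O(n * p_max) expected rounds; amplitude amplification of
    the accept branch brings that down to O(sqrt(n * p_max)).
    """
    n = len(p)
    while True:
        i = random.randrange(n)
        if random.random() < p[i] / p_max:
            return i
```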
The idea: split D into a heavy part and a light part. Let Heavy = {i : p_i > 1/T} (so |Heavy| ≤ T) and Light = the rest, with P_Heavy = Σ_{i∈Heavy} p_i and P_Light = Σ_{i∈Light} p_i.

Distribution D:        element i ∈ [n],    probability p_i
Distribution D_Heavy:  element i ∈ Heavy,  probability p_i / P_Heavy
Distribution D_Light:  element i ∈ Light,  probability p_i / P_Light

(In the running example with elements 1, …, 7: Heavy = {1, 3, 4} and Light = {2, 5, 6, 7}.)
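A sketch of the split, assuming p is an explicit probability vector (the quantum algorithm only has oracle access to the p_i, which is why the heavy elements must be found by quantum search).

```python
def split_heavy_light(p, T):
    """Split distribution p into heavy (p_i > 1/T) and light parts.

    At most T elements can exceed probability 1/T, so the heavy table
    is small; P_light = 1 - P_heavy assumes p sums to 1.
    """
    heavy = [i for i, pi in enumerate(p) if pi > 1.0 / T]
    light = [i for i, pi in enumerate(p) if pi <= 1.0 / T]
    P_heavy = sum(p[i] for i in heavy)
    P_light = 1.0 - P_heavy
    return heavy, P_heavy, light, P_light
```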
Preprocessing:
1. Find the heavy elements {i : p_i > 1/T} (quantum search): cost O(√(nT)).
2. Compute P_Heavy = Σ_{i∈Heavy} p_i: cost O(T), since |Heavy| ≤ T.
3. Build a classical sampling table for D_Heavy: cost O(T), since |Heavy| ≤ T.
4. Set up the light part (e.g., its largest probability, via quantum maximum finding): cost O(√n).
Sampling (repeat T times): flip a coin that is heads with probability P_Heavy.

Heads: sample from D_Heavy using the classical table. Cost per sample: O(1); total over T samples: O(T).

Tails: sample from D_Light by amplitude amplification. Cost per sample: O(√(n·p_max)), where p_max = max{p_i / P_Light : i ∈ Light} ≤ 1 / (T·P_Light).

Total expected cost of the light samples (tails occur with probability P_Light):

    O(T · P_Light · √(n·p_max)) = O(T · P_Light · √(n / (T·P_Light))) = O(√(n·T·P_Light)) = O(√(nT)).
Summary: T samples from D (equivalently, T preparations of Σ_{i∈[n]} √(p_i) |i⟩) at total cost O(√(nT)).
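Putting the pieces together, here is a classical analogue of one draw of the combined procedure, reusing split_heavy_light from the sketch above; the quantum algorithm replaces the light-branch rejection loop by amplitude amplification.

```python
import random

def heavy_light_sample(p, T, heavy, P_heavy, light):
    """One draw from p; heavy, P_heavy, light come from split_heavy_light.

    Heads (prob P_heavy): walk the small heavy table, O(1) with an alias
    table (a linear scan is used here for brevity). Tails: rejection
    sampling over the light part against the bound p_i <= 1/T; quantumly
    this loop costs only O(sqrt(n * p_max)) via amplitude amplification.
    """
    if random.random() < P_heavy:
        r = random.uniform(0.0, P_heavy)
        for i in heavy:
            r -= p[i]
            if r <= 0.0:
                return i
        return heavy[-1]
    while True:
        i = random.choice(light)
        if random.random() < p[i] * T:  # accept w.p. p_i / (1/T)
            return i
```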