Quantum and Classical Algorithms for Approximate Submodular Function Minimization
Yassine Hamoudi, Patrick Rebentrost, Ansis Rosmanis, Miklos Santha
arXiv: 1907.05378
A submodular function is a set function F : 2^[n] → ℝ satisfying the diminishing returns property: for all S ⊆ T ⊆ [n] and i ∉ T,

F(S ∪ {i}) − F(S) ≥ F(T ∪ {i}) − F(T).

Example: area covered by a set of cameras.
Example: size of a cut in a graph.
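The cut example can be checked directly against the diminishing-returns inequality; a minimal sketch (the small graph and helper names are mine, for illustration only):

```python
from itertools import combinations

# A small undirected example graph on vertices {0, 1, 2, 3} (hypothetical).
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

def cut(S):
    """Number of edges with exactly one endpoint in S (a classic submodular function)."""
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def is_submodular(f, ground):
    """Brute-force check of diminishing returns:
    f(S ∪ {i}) - f(S) >= f(T ∪ {i}) - f(T) for all S ⊆ T ⊆ ground, i ∉ T."""
    elems = list(ground)
    subsets = [frozenset(c) for r in range(len(elems) + 1)
               for c in combinations(elems, r)]
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            for i in ground - T:
                if f(S | {i}) - f(S) < f(T | {i}) - f(T):
                    return False
    return True

print(is_submodular(cut, {0, 1, 2, 3}))  # → True: cut functions are submodular
```

By contrast, a strictly supermodular function such as S ↦ |S|² fails the same check.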
Submodular function minimization: given query access to F (time = number of queries to the oracle), find S ⊆ [n] minimizing F(S).

Exact minimization was first shown to be polynomial-time by (Grötschel, Lovász, Schrijver 1981); the best known bounds are Õ(n³) and Õ(n² log M), where M = max_{S⊆[n]} |F(S)|.

Approximate minimization: find S ⊆ [n] such that F(S) ≤ min_{S′⊆[n]} F(S′) + ϵ. The best classical bound is Õ(n^{5/3}/ϵ²) (Chakrabarty, Lee, Sidford, Wong STOC'17); this work gives classical and quantum improvements.
Discrete Optimization ↔ Continuous Optimization.

Set function: F : 2^[n] → ℝ.
Lovász extension: f : [0,1]^n → ℝ, a convex function that agrees with F at the vertices of the hypercube, f(1_S) = F(S); equivalently, f(x) = E_{θ∼U[0,1]}[F({i : x_i ≥ θ})]. For n = 2, the four vertices (0,0), (1,0), (0,1), (1,1) take the values F(∅), F({1}), F({2}), F({1,2}).

Minimizing F over 2^[n] is equivalent to minimizing f over [0,1]^n.
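The Lovász extension can be evaluated by sorting the coordinates of x; a minimal sketch, assuming F(∅) = 0 (the function names and the toy F below are mine):

```python
def lovasz_extension(F, x):
    """Evaluate the Lovász extension of F at x in [0,1]^n (assumes F(∅) = 0).

    With coordinates sorted x_{pi(1)} >= ... >= x_{pi(n)},
    f(x) = sum_i x_{pi(i)} * (F(S_i) - F(S_{i-1})), S_i = {pi(1), ..., pi(i)},
    which equals E_theta[F({i : x_i >= theta})] for theta uniform in [0, 1].
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])  # decreasing coordinates
    value, prev, S = 0.0, F(frozenset()), set()
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        value += x[i] * (cur - prev)
        prev = cur
    return value

# A toy submodular function on n = 3 (concave in |S|, hence submodular):
F = lambda S: len(S) * (3 - len(S))

# At a hypercube vertex x = 1_S the extension agrees with F(S):
print(lovasz_extension(F, [1.0, 0.0, 1.0]))  # → 2.0, which is F({0, 2})
```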
Convex function f : C → ℝ on a convex set C (not necessarily differentiable).
Subgradient at x: the slope g(x) of any line that lies below the graph of f and touches it at (x, f(x)).
Stochastic subgradient at x: a random variable g̃(x) satisfying E[g̃(x)] = g(x).

(Projected) stochastic subgradient descent: repeat x_{t+1} = Π_C(x_t − η g̃(x_t)), i.e. take a step −η g̃(x_t) and project back onto C.

If g̃(x) has low variance, then the number of steps is the same as if we were using g(x).
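The projected update can be sketched in a few lines; the quadratic toy objective, noise model, and step size below are illustrative choices of mine, not from the talk:

```python
import random

def projected_sgd(stoch_grad, project, x0, eta, steps):
    """Projected stochastic subgradient descent: x <- Pi_C(x - eta * g~(x)).
    Returns the average iterate, the standard output for SGD analyses."""
    x = list(x0)
    avg = [0.0] * len(x)
    for _ in range(steps):
        g = stoch_grad(x)                                   # unbiased subgradient
        x = project([xi - eta * gi for xi, gi in zip(x, g)])  # step, then project
        avg = [a + xi / steps for a, xi in zip(avg, x)]
    return avg

# Toy instance: minimize f(x) = sum_i x_i over the box C = [0,1]^n;
# the true subgradient is the all-ones vector, perturbed by Gaussian noise.
random.seed(0)
noisy_grad = lambda x: [1.0 + random.gauss(0.0, 0.1) for _ in x]
clip = lambda x: [min(1.0, max(0.0, xi)) for xi in x]       # projection onto the box
x = projected_sgd(noisy_grad, clip, [0.5, 0.5], eta=0.05, steps=200)
print(all(xi < 0.05 for xi in x))  # the iterates converge to the minimizer 0
```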
For the Lovász extension f, there exists a subgradient g(x) with a simple closed form (Jegelka, Bilmes 2011; Hazan, Kale 2012): sorting the coordinates x_{π(1)} ≥ … ≥ x_{π(n)},

g(x)_{π(i)} = F({π(1), …, π(i)}) − F({π(1), …, π(i−1)}).

A stochastic subgradient g̃(x) can be computed in time Q = Õ(n^{2/3}) classically (Chakrabarty, Lee, Sidford, Wong STOC'17), which yields approximate minimization in time Õ(n^{5/3}/ϵ²). This work computes g̃(x) faster, both classically and quantumly, improving the minimization time accordingly.
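The exact subgradient above (Edmonds' greedy point) takes n queries to F; a minimal sketch, with helper names mine:

```python
def lovasz_subgradient(F, x):
    """Subgradient g(x) of the Lovász extension at x: sort coordinates in
    decreasing order x_{pi(1)} >= ... >= x_{pi(n)}, then
    g_{pi(i)} = F({pi(1..i)}) - F({pi(1..i-1)}).  Uses n + 1 queries to F."""
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])
    g = [0.0] * n
    prev, S = F(frozenset()), set()
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        g[i] = cur - prev            # telescoping marginal gain
        prev = cur
    return g

F = lambda S: len(S) * (3 - len(S))  # same toy submodular function as before
print(lovasz_subgradient(F, [0.9, 0.2, 0.5]))  # → [2, -2, 0]
```

Note the entries telescope, so they sum to F([n]) − F(∅); a stochastic subgradient estimates such entries from fewer queries.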
Classical sampling from D = (p_1, …, p_n): store the probabilities at the leaves of a binary tree whose internal nodes hold the sums of their subtrees (p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5, …); a sample walks down from the root, branching according to these partial sums.

Preprocessing time: O(n). Cost per sample: O(log n). Cost for T samples: O(n + T log n).
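The tree of partial sums can be implemented with a flat array; a minimal sketch (class and method names are mine):

```python
import random

class TreeSampler:
    """Sample from (p_1, ..., p_n) in O(log n) per sample after O(n) preprocessing:
    a binary tree in which each internal node stores the sum of its subtree."""

    def __init__(self, p):
        self.n = len(p)
        self.tree = [0.0] * (2 * self.n)
        self.tree[self.n:] = list(p)          # leaves hold the probabilities
        for v in range(self.n - 1, 0, -1):    # internal sums, bottom-up: O(n)
            self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]

    def sample(self):
        r, v = random.random() * self.tree[1], 1
        while v < self.n:                     # walk down from the root: O(log n)
            if r <= self.tree[2 * v]:
                v = 2 * v                     # descend left
            else:
                r -= self.tree[2 * v]
                v = 2 * v + 1                 # descend right
        return v - self.n                     # leaf index in [0, n)

random.seed(1)
s = TreeSampler([0.1, 0.2, 0.3, 0.4])
counts = [0] * 4
for _ in range(10000):
    counts[s.sample()] += 1
print(max(counts) == counts[3])  # element 3 (p = 0.4) is sampled most often
```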
Quantum sampling (Grover 2000). Preprocessing: determine p_max = max_i p_i (used in the rotation below). Sampling (repeat T times): prepare

V : |0⟩|0⟩ ⟼ (1/√n) ∑_{i∈[n]} |i⟩|0⟩
           ⟼ (1/√n) ∑_{i∈[n]} |i⟩(√(p_i/p_max) |0⟩ + √(1 − p_i/p_max) |1⟩)
            = (1/√(n·p_max)) (∑_i √p_i |i⟩)|0⟩ + … |1⟩,

then amplify the |0⟩ branch to obtain ∑_i √p_i |i⟩, which measures to a sample from D.

Cost per sample: O(√(n·p_max)). Cost for T samples: O(T·√(n·p_max)).
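The O(√(n·p_max)) per-sample cost follows from the amplitude of the |0⟩ branch; spelled out in LaTeX:

```latex
% Amplitude of the "good" |0> branch after the controlled rotation:
\frac{1}{\sqrt{n}}\sum_{i\in[n]} \sqrt{\frac{p_i}{p_{\max}}}\,|i\rangle|0\rangle
  = \frac{1}{\sqrt{n\,p_{\max}}}\Bigl(\sum_i \sqrt{p_i}\,|i\rangle\Bigr)|0\rangle,
\qquad
\Bigl\|\frac{1}{\sqrt{n\,p_{\max}}}\sum_i \sqrt{p_i}\,|i\rangle\Bigr\|
  = \sqrt{\frac{\sum_i p_i}{n\,p_{\max}}}
  = \frac{1}{\sqrt{n\,p_{\max}}}.
% Amplitude amplification boosts an amplitude a to Theta(1) in O(1/a) rounds,
% hence O(\sqrt{n\,p_{\max}}) rounds per sample.
```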
Our result: total cost O(√(nT)) for T samples.

Idea: split the distribution into heavy and light elements. Example with n = 7, Heavy = {1, 3, 4}, Light = {2, 5, 6, 7}:

Distribution D: element i ∈ [7] with probability p_i.
Distribution D_Heavy: element i ∈ Heavy with probability p_i / P_Heavy, where P_Heavy = ∑_{i∈Heavy} p_i.
Distribution D_Light: element i ∈ Light with probability p_i / P_Light, where P_Light = ∑_{i∈Light} p_i.
Preprocessing: identify the set Heavy and compute P_Heavy = ∑_{i∈Heavy} p_i.

Sampling (repeat T times): flip a coin that is heads with probability P_Heavy; on heads, sample from D_Heavy classically, and on tails, sample from D_Light with the quantum procedure.
Preprocessing: take Heavy = {i : p_i ≥ 1/T}, so that |Heavy| ≤ T. Finding Heavy costs O(√(nT)); computing P_Heavy = ∑_{i∈Heavy} p_i and building the classical sampling tree for D_Heavy each cost O(T), since |Heavy| ≤ T; finding p_max of D_Light (by quantum maximum finding) costs O(√n). Total preprocessing cost: O(√(nT)).
Sampling (repeat T times): flip a coin that is heads with probability P_Heavy.

Heads: sample from D_Heavy classically. Cost per sample: O(log n); total cost: O(T log n).
Tails: sample from D_Light quantumly. Cost per sample: O(√(n·p_max)), where p_max = max{p_i / P_Light : i ∈ Light} ≤ 1/(T·P_Light).

Total expected cost of the light samples: O(T · P_Light · √(n·p_max)) = O(√(n · T · P_Light)) ≤ O(√(nT)).
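The chain of bounds above can be checked numerically: with Heavy = {i : p_i ≥ 1/T}, any distribution satisfies |Heavy| ≤ T and T·P_Light·√(n·p_max) ≤ √(nT). A sketch of that arithmetic (function and variable names are mine):

```python
import math
import random

def hybrid_costs_ok(p, T):
    """Verify |Heavy| <= T and the light-part cost bound for the hybrid sampler."""
    n = len(p)
    heavy = [i for i in range(n) if p[i] >= 1.0 / T]
    light = [i for i in range(n) if p[i] < 1.0 / T]
    P_light = sum(p[i] for i in light)
    # p_max of D_Light is max p_i / P_light over light i, which is <= 1/(T * P_light).
    p_max = max((p[i] / P_light for i in light), default=0.0)
    light_cost = T * P_light * math.sqrt(n * p_max)  # expected light-sample cost
    return len(heavy) <= T and light_cost <= math.sqrt(n * T) + 1e-9

random.seed(2)
w = [random.random() ** 4 for _ in range(1000)]      # a skewed random distribution
p = [wi / sum(w) for wi in w]
print(hybrid_costs_ok(p, T=50))  # → True
```

The inequality holds for every distribution, since p_max ≤ 1/(T·P_Light) makes T·P_Light·√(n·p_max) ≤ √(n·T·P_Light) ≤ √(nT).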
Summary: quantum and classical speed-ups for approximate submodular function minimization, via a faster procedure for preparing T copies of ∑_{i∈[n]} √p_i |i⟩ at total cost O(√(nT)).

Open question: a speed-up for exact submodular function minimization?
(Ongoing work: solving linear systems.)