SLIDE 1

Quantum and Classical Algorithms for Approximate Submodular Function Minimization

Yassine Hamoudi, Patrick Rebentrost, Ansis Rosmanis, Miklos Santha

arXiv: 1907.05378

SLIDE 2

1. Approximate Submodular Function Minimization
2. Quantum Speed-up for Importance Sampling

SLIDE 3

Part 1: Approximate Submodular Function Minimization

SLIDES 4–6

Submodular Function

A submodular function is a set function F : 2^[n] → ℝ satisfying the diminishing returns property:

∀ A ⊂ B ⊂ [n] and i ∉ B:  F(A ∪ {i}) − F(A) ≥ F(B ∪ {i}) − F(B)

Example: area covered by cameras. [Figure: nested camera sets A ⊂ B; adding the same camera i to each, the area gained by A ∪ {i} is at least the area gained by B ∪ {i}.]

SLIDES 7–8

Submodular Function

Example: size of a cut. [Figure: a graph with nested vertex sets A ⊂ B and a vertex i ∉ B; |cut(A)| = 2, |cut(B)| = 5, |cut(A + i)| = 4, |cut(B + i)| = 6, so adding i gains 2 for the smaller set but only 1 for the larger one, as diminishing returns requires.]

SLIDES 9–10

Submodular Function

Other examples:
  • F(S) = h(|S|) is submodular iff h is concave
  • rank of a set of vectors
  • entropy of a set of random variables
  • coverage functions
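To make the diminishing returns property concrete, here is a small brute-force check (illustrative only, not from the talk) that the cut function of an example graph is submodular:

```python
from itertools import combinations

n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]  # an arbitrary small example graph

def cut(S):
    """F(S) = number of edges with exactly one endpoint in S (the slides' cut example)."""
    return sum((u in S) != (v in S) for u, v in edges)

def is_submodular(F, n):
    """Brute-force check: F(A ∪ {i}) − F(A) ≥ F(B ∪ {i}) − F(B) for all A ⊂ B, i ∉ B."""
    for bsize in range(n + 1):
        for B in map(set, combinations(range(n), bsize)):
            for asize in range(len(B) + 1):
                for A in map(set, combinations(sorted(B), asize)):
                    for i in set(range(n)) - B:
                        if F(A | {i}) - F(A) < F(B | {i}) - F(B):
                            return False
    return True

print(is_submodular(cut, n))  # True: cut functions satisfy diminishing returns
```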
SLIDES 11–17

Submodular Function Minimization

Evaluation oracle access: given S, obtain F(S). (time = # queries to the oracle)

Submodular functions can be minimized in polynomial time (Grötschel, Lovász, Schrijver 1981).

Exact minimization: find S⋆ such that F(S⋆) = min_{S⊂[n]} F(S).
  • Lee, Sidford, Wong 2015: Õ(n^3), and Õ(n^2 log M) where M = max_S |F(S)|

ε-approximate minimization: find S⋆ such that F(S⋆) ≤ min_{S⊂[n]} F(S) + ε.
  • Chakrabarty, Lee, Sidford, Wong 2017: Õ(n^(5/3)/ε^2)
  • Our result: Õ(n^(3/2)/ε^2) (classical) and Õ(n^(5/4)/ε^(5/2)) (quantum)
  • Axelrod, Liu, Sidford 2019: Õ(n/ε^2) (classical)

(lower bound: Ω(n) queries)

SLIDES 18–27

Lovász Extension

Set function (discrete optimization): F : 2^[n] → ℝ
Lovász extension (continuous optimization): f : [0,1]^n → ℝ

Example (n = 2): F(∅) = 0, F({1}) = 10, F({2}) = 6, F({1,2}) = 3. [Figure: the square [0,1]² with corners (0,0), (1,0), (0,1), (1,1) labeled by F(∅), F({1}), F({2}), F({1,2}); f agrees with F at the corners and interpolates in between.]

The Lovász extension is:
  • piecewise linear
  • convex iff F is submodular (Lovász 1983)
  • evaluable using n queries to F (see the sketch below)
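The n-query evaluation uses the standard sorting formula behind the Lovász extension; a minimal Python sketch (the dictionary encoding of F is an illustrative choice), checked against the slide's n = 2 example:

```python
def lovasz_extension(F, x):
    """Evaluate the Lovász extension f(x) of a set function F with F(∅) = 0.

    Sort the coordinates decreasingly, x_{π(1)} ≥ … ≥ x_{π(n)}, and let
    S_k = {π(1), …, π(k)}.  Then f(x) = Σ_k x_{π(k)} · (F(S_k) − F(S_{k−1})),
    which needs n queries to F beyond F(∅)."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    f, prev, S = 0.0, F(frozenset()), set()
    for i in order:
        S.add(i)
        cur = F(frozenset(S))       # one query per coordinate
        f += x[i] * (cur - prev)
        prev = cur
    return f

# Slide example (elements 0, 1 stand for 1, 2): F(∅)=0, F({1})=10, F({2})=6, F({1,2})=3.
F = {frozenset(): 0, frozenset({0}): 10, frozenset({1}): 6, frozenset({0, 1}): 3}.__getitem__
print(lovasz_extension(F, [1.0, 1.0]))  # 3.0 = F({1,2}): f agrees with F at the corners
```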
SLIDE 28

Submodular Function Minimization: two approaches

Exact minimization (Lee, Sidford, Wong 2015): cutting plane method on the Lovász extension.

ε-approximate minimization (Chakrabarty, Lee, Sidford, Wong 2017; our result; Axelrod, Liu, Sidford 2019): stochastic subgradient descent on the Lovász extension.
SLIDES 29–37

Stochastic Subgradient Descent

Convex function f : C → ℝ on a convex set C (not necessarily differentiable).

Subgradient at x: the slope g(x) of any line that lies below the graph of f and intersects it at x. [Figure: the graph of f with a supporting line of slope g(x) through (x, f(x)).]

Stochastic subgradient at x: a random variable g̃(x) satisfying E[g̃(x)] = g(x). [Figure: two candidate slopes, each taken with probability 1/2, averaging to g(x).]

(Projected) stochastic subgradient descent: from the iterate x_t ∈ C, take the step −η·g̃(x_t), then project back onto C to obtain x_{t+1}.

If g̃(x) has low variance, then the number of steps is the same as if we were using g(x).
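A minimal sketch of the loop just described; the box C = [0,1]^n (the Lovász extension's domain) and the averaged iterate are assumed, standard choices, not slide content:

```python
def projected_sgd(stoch_subgrad, x0, eta, steps):
    """Projected SGD on C = [0,1]^n: x_{t+1} = Π_C(x_t − η·g̃(x_t))."""
    x = list(x0)
    avg = [0.0] * len(x0)
    for _ in range(steps):
        g = stoch_subgrad(x)                                             # g̃(x_t), unbiased
        x = [min(1.0, max(0.0, xi - eta * gi)) for xi, gi in zip(x, g)]  # step, then project
        avg = [a + xi / steps for a, xi in zip(avg, x)]
    return avg  # for convex f the averaged iterate carries the convergence guarantee
```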

SLIDES 38–44

Stochastic Subgradient for the Lovász Extension

For the Lovász extension f, there exists a subgradient g(x) such that:
  • each coordinate g(x)_i can be computed with two queries to F (a sketch follows below)
  • subgradient descent requires O(n/ε^2) steps to get an ε-minimizer of f

(Jegelka, Bilmes 2011) and (Hazan, Kale 2012)

A stochastic subgradient g̃(x) for g(x) can be computed in time:
  • Chakrabarty, Lee, Sidford, Wong 2017: Õ(n^(2/3))
  • Our result: Õ(n^(1/2)) (classical) and Õ(n^(1/4)/ε^(1/2)) (quantum)
  • Axelrod, Liu, Sidford 2019: Õ(1)
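The two-queries bullet refers to the standard subgradient of the Lovász extension obtained from the sorted order of x; a minimal sketch of that construction (the details are not spelled out on the slides):

```python
def lovasz_subgradient_coord(F, x, i):
    """Coordinate i of the standard subgradient g(x) of the Lovász extension.

    With x_{π(1)} ≥ … ≥ x_{π(n)} and S_k = {π(1), …, π(k)}, coordinate π(k)
    of g(x) equals F(S_k) − F(S_{k−1}): exactly two queries to F."""
    order = sorted(range(len(x)), key=lambda j: -x[j])
    k = order.index(i)                   # position of i in the sorted order
    S_prev = frozenset(order[:k])        # S_{k−1}
    return F(S_prev | {i}) - F(S_prev)   # two evaluation-oracle queries
```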

SLIDES 45–51

Stochastic Subgradient for the Lovász Extension

First attempt to construct g̃(x): for any non-zero vector u ∈ ℝ^n, define the random variable

û = (0, …, 0, sgn(u_i)·‖u‖₁, 0, …, 0)   (non-zero only in the i-th coordinate)

where i is sampled with probability p_i = |u_i| / ‖u‖₁.

Unbiased:  E[û] = Σ_i (|u_i|/‖u‖₁) · sgn(u_i)·‖u‖₁ · e_i = Σ_i u_i · e_i = u

2nd moment:  E[‖û‖₂²] = Σ_i (|u_i|/‖u‖₁) · (sgn(u_i)·‖u‖₁)² = ‖u‖₁²

For the Lovász extension: u = g(x) and ‖g(x)‖₁ = O(1), so the variance is low (Jegelka, Bilmes 2011).

But the distribution (p_1, …, p_n) is hard to sample from (importance sampling).
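A direct simulation of the estimator û (illustrative, with an empirical unbiasedness check):

```python
import random

def sparse_estimator(u):
    """Sample i w.p. |u_i|/‖u‖₁ and return û = sgn(u_i)·‖u‖₁ in coordinate i."""
    norm1 = sum(abs(v) for v in u)
    i = random.choices(range(len(u)), weights=[abs(v) for v in u])[0]
    hat = [0.0] * len(u)
    hat[i] = (1.0 if u[i] > 0 else -1.0) * norm1
    return hat

u = [0.5, -0.3, 0.2]
runs = 100_000
mean = [sum(col) / runs for col in zip(*(sparse_estimator(u) for _ in range(runs)))]
print(mean)  # ≈ u (unbiased); E[‖û‖₂²] = ‖u‖₁² = 1, so the variance stays bounded
```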

SLIDES 52–63

Stochastic Subgradient for the Lovász Extension

Second attempt (Chakrabarty, Lee, Sidford, Wong 2017):

Tool: there is an unbiased estimate d̃(x, y) of d(x, y) = g(y) − g(x) that can be computed efficiently when x − y is sparse.

Our construction (T = parameter to be optimized): recompute a full estimate ĝ only at every T-th iterate (the anchor) and correct the iterates in between with d̃:

x_0 ⟶ ĝ(x_0)
x_1 ⟶ ĝ(x_0) + d̃(x_0, x_1)
x_2 ⟶ ĝ(x_0) + d̃(x_0, x_2)
⋮
x_{T−1} ⟶ ĝ(x_0) + d̃(x_0, x_{T−1})
x_T ⟶ ĝ(x_T)
x_{T+1} ⟶ ĝ(x_T) + d̃(x_T, x_{T+1})
x_{T+2} ⟶ ĝ(x_T) + d̃(x_T, x_{T+2})
⋮
x_{2T−1} ⟶ ĝ(x_T) + d̃(x_T, x_{2T−1})
x_{2T} ⟶ ĝ(x_{2T})
…

(each block consists of T independent samples)
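A sketch of the bookkeeping in this construction, assuming oracles g_hat (a full estimate such as û above) and d_tilde (the sparse-difference estimate); both names are placeholders:

```python
def anchored_estimates(xs, g_hat, d_tilde, T):
    """For iterates x_0, x_1, …: refresh the full estimate ĝ at every T-th point
    (the anchor) and correct the points in between with d̃(anchor, x_t)."""
    out = []
    for t, x in enumerate(xs):
        if t % T == 0:                       # start of a block: fresh anchor
            anchor, anchor_est = x, g_hat(x)
            out.append(anchor_est)
        else:                                # within a block: anchor + cheap correction
            out.append([a + c for a, c in zip(anchor_est, d_tilde(anchor, x))])
    return out
```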

SLIDE 64

Part 2: Quantum Speed-up for Importance Sampling

SLIDE 65

Problem

Input: a discrete probability distribution D = (p_1, …, p_n) on [n], given by evaluation oracle access:
  • Classical: i ↦ p_i
  • Quantum: U(|i⟩|0⟩) = |i⟩|p_i⟩

Output: T independent samples i_1, …, i_T ~ D.

Cost = # queries to the evaluation oracle.

SLIDES 66–70

Importance Sampling with a Classical Oracle

Binary tree: store p_1, …, p_n at the leaves and partial sums at the internal nodes [Figure: leaves p_1, p_2, p_3, p_4, p_5 with internal nodes p_1 + p_2, p_1 + p_2 + p_3, p_4 + p_5]; sample by descending from the root.
  • Preprocessing time: O(n)
  • Cost per sample: O(log n)
  • Cost for T samples: O(n + T log n)

Alias method (Walker 1974, Vose 1991), sketched below:
  • Preprocessing time: O(n)
  • Cost per sample: O(1)
  • Cost for T samples: O(n + T)
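A runnable sketch of the alias method (Vose's variant), since it is the workhorse reused later:

```python
import random

def build_alias(p):
    """Vose's alias method: O(n) preprocessing, O(1) per sample."""
    n = len(p)
    prob, alias = [0.0] * n, [0] * n
    scaled = [x * n for x in p]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l      # column s: keep s w.p. scaled[s], else emit l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                   # leftovers become certain columns
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    i = random.randrange(len(prob))           # uniform column, then a biased coin
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias([0.1, 0.2, 0.3, 0.4])
counts = [0] * 4
for _ in range(100_000):
    counts[alias_sample(prob, alias)] += 1
print([c / 100_000 for c in counts])          # ≈ [0.1, 0.2, 0.3, 0.4]
```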

SLIDES 71–80

Importance Sampling with Quantum State Preparation (Grover 2000)

Preprocessing:
  1. Compute p_max = max{p_1, …, p_n} with quantum Maximum Finding.
  2. Construct the unitary

     V : |0⟩|0⟩ ⟼ (1/√n) Σ_{i∈[n]} |i⟩|0⟩
               ⟼ (1/√n) Σ_{i∈[n]} |i⟩( √(p_i/p_max) |0⟩ + √(1 − p_i/p_max) |1⟩ )
               = (1/√(n·p_max)) (Σ_i √p_i |i⟩)|0⟩ + … |1⟩

Sampling (repeat T times):
  1. Prepare Σ_i √p_i |i⟩ with Amplitude Amplification on V, and measure it.

Preprocessing time: O(√n)
Cost per sample: O(√(n·p_max))
Cost for T samples: O(√n + T·√(n·p_max)) = O(T·√n)

Our result: O(√(T·n))
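Where the O(√(n·p_max)) per-sample cost comes from (a standard amplitude amplification count, worked out here for completeness):

```latex
\[
  V|0\rangle|0\rangle
  = \frac{1}{\sqrt{n\,p_{\max}}}\Big(\sum_{i\in[n]}\sqrt{p_i}\,|i\rangle\Big)|0\rangle
    + \cdots\,|1\rangle ,
  \qquad
  \alpha = \sqrt{\sum_{i\in[n]}\frac{p_i}{n\,p_{\max}}}
         = \frac{1}{\sqrt{n\,p_{\max}}} .
\]
% Amplitude amplification boosts a branch of amplitude \alpha to constant
% probability with O(1/\alpha) = O(\sqrt{n p_max}) applications of V, i.e.
% O(\sqrt{n p_max}) oracle queries per sample; since p_max <= 1 this is at
% most O(\sqrt{n}), giving the O(T \sqrt{n}) bound for T samples.
```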

SLIDES 81–86

Importance Sampling with a Quantum Oracle

Our result: O(√(T·n)) for obtaining T independent samples from D = (p_1, …, p_n).

Split D into two conditional distributions [Table: elements 1–7 of D with probabilities p_1, …, p_7; in the example, Heavy = {1, 3, 4} and Light = {2, 5, 6, 7}]:

  • D_Heavy (elements i with p_i ≥ 1/T): probability p_i / P_Heavy, where P_Heavy = Σ_{i∈Heavy} p_i. Sampled with the alias method.
  • D_Light (elements i with p_i < 1/T): probability p_i / P_Light, where P_Light = Σ_{i∈Light} p_i. Sampled with quantum state preparation.

SLIDES 87–91

Importance Sampling with a Quantum Oracle

Preprocessing:
  1. Compute the set Heavy ⊂ [n] of indices i such that p_i ≥ 1/T, using Grover Search.
  2. Compute P_Heavy = Σ_{i∈Heavy} p_i.
  3. Apply the preprocessing step of the alias method on D_Heavy.
  4. Apply the preprocessing step of the quantum state preparation method on D_Light.

SLIDES 92–95

Importance Sampling with a Quantum Oracle

Sampling (repeat T times): flip a coin that is heads with probability P_Heavy:
  • Heads: sample i ~ D_Heavy with the alias method.
  • Tails: sample i ~ D_Light with quantum state preparation.

A classical skeleton of this heavy/light split follows below.
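A classical skeleton of the heavy/light sampler; the quantum branch is a stand-in callable here, and hybrid_sampler and sample_light_quantum are illustrative names:

```python
import random

def hybrid_sampler(p, T, sample_light_quantum):
    """Heavy/light split: alias method on D_Heavy, a (stand-in) quantum
    state-preparation sampler on D_Light."""
    heavy = [i for i, pi in enumerate(p) if pi >= 1.0 / T]   # Grover Search, quantumly
    P_heavy = sum(p[i] for i in heavy)
    weights = [p[i] / P_heavy for i in heavy] if heavy else []
    # (A real implementation would run alias-method preprocessing on D_Heavy here;
    #  random.choices is used only for brevity.)
    samples = []
    for _ in range(T):
        if heavy and random.random() < P_heavy:              # heads: heavy branch, O(1)
            samples.append(random.choices(heavy, weights=weights)[0])
        else:                                                # tails: light branch
            samples.append(sample_light_quantum())
    return samples
```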
SLIDES 96–100

Importance Sampling with a Quantum Oracle

Preprocessing costs:
  1. Compute Heavy ⊂ [n] with Grover Search: O(√(nT)), since |Heavy| ≤ T.
  2. Compute P_Heavy = Σ_{i∈Heavy} p_i: O(T).
  3. Alias-method preprocessing on D_Heavy: O(T).
  4. Quantum-state-preparation preprocessing on D_Light: O(√n).

SLIDES 101–111

Importance Sampling with a Quantum Oracle

Sampling costs (repeat T times): flip a coin that is heads with probability P_Heavy:
  • Heads: sample i ~ D_Heavy with the alias method. Cost per sample: O(1); total cost: O(T).
  • Tails: sample i ~ D_Light with quantum state preparation. Cost per sample: O(√(n·p_max)), where p_max = max{p_i / P_Light : i ∈ Light} ≤ 1/(T·P_Light).

Total expected cost of the light branch (T·P_Light tails in expectation):

O(T · P_Light · √(n·p_max)) = O(T · P_Light · √(n / (T·P_Light))) = O(√(n·T·P_Light)) = O(√(nT))
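Putting preprocessing and sampling together confirms the claimed bound; the T ≤ n regime makes √(nT) the dominant term:

```latex
\[
  \underbrace{O(\sqrt{nT})}_{\text{preprocessing}}
  \;+\; \underbrace{O(T)}_{\text{heavy samples}}
  \;+\; \underbrace{O(\sqrt{nT})}_{\text{light samples}}
  \;=\; O(\sqrt{nT})
  \qquad\text{for } T \le n .
\]
```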

SLIDE 112

Conclusion

SLIDE 113

Open questions:
  • Can we obtain a quantum speedup for exact/approximate submodular function minimization?
  • Can we improve the lower bound for exact/approximate submodular function minimization?
  • Can we prepare T copies of the state Σ_{i∈[n]} √p_i |i⟩ in time O(√(nT))?

arXiv: 1907.05378