SLIDE 1

Discrete Sampling using Semigradient-based Product Mixtures

Alkis Gotovos (ETH Zurich), Hamed Hassani (UPenn), Andreas Krause (ETH Zurich), Stefanie Jegelka (MIT)

UAI 2018

SLIDE 2

Modeling gene alterations

[cancergenome.nih.gov] Discrete Sampling using Semigradient-based Product Mixtures Alkis Gotovos 1

SLIDE 3

Modeling gene alterations

[Figure axes: Patients × Genes]

SLIDE 6

Modeling teams in online games

[www.theverge.com]

SLIDE 7

Modeling teams in online games

[euw.leagueoflegends.com]

SLIDE 8

Modeling teams in online games

[Figure: Team 1 vs. Team 2]

SLIDE 9

Modeling teams in online games

[Figure axes: Teams × Characters]

SLIDE 14

Discrete probabilistic models

Ground set $V = \{1, \ldots, n\}$, data $D = \{S_i\}_{i=0}^{m}$ with $S_i \subseteq V$

[Figure axes: Teams × Characters]

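Model higher-order interactions: $p(S; \theta) = \frac{1}{Z(\theta)} \exp\big(F(S; \theta)\big)$, where $Z(\theta) = \sum_{S \subseteq V} \exp\big(F(S; \theta)\big)$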

  • $F(S) = \mathrm{cut}(S)$ (graph cut) → Ising model
  • $F(S) = \log |K_S|$, the log-determinant of the submatrix of a kernel K indexed by S → DPP
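To make the model concrete, the sketch below evaluates $p(S; \theta)$ exactly on a three-element ground set by enumerating all $2^n$ subsets. It is an illustration under assumptions: the helper names (`cut_value`, `det`, `log_det_submatrix`, `brute_force`), the path graph, the scalar weight θ and the kernel K are hypothetical, and V is indexed from 0. Enumeration costs $O(2^n)$ evaluations of F, which is exactly why Z(θ) is out of reach beyond tiny n.

```python
import itertools
import math


def cut_value(S, edges):
    """Number of edges (u, v) with exactly one endpoint in S."""
    return sum(1 for u, v in edges if (u in S) != (v in S))


def det(A):
    """Determinant by Gaussian elimination with partial pivoting (fine for tiny matrices)."""
    A = [row[:] for row in A]
    n, result = len(A), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return result


def log_det_submatrix(K, S):
    """log |K_S| for the principal submatrix indexed by S; the empty matrix has det 1, so log 0."""
    idx = sorted(S)
    return math.log(det([[K[i][j] for j in idx] for i in idx]))  # assumes K is positive definite


def brute_force(F, n):
    """Exact p(S) = exp(F(S)) / Z over all 2^n subsets of V = {0, ..., n-1}."""
    subsets = [frozenset(c) for k in range(n + 1) for c in itertools.combinations(range(n), k)]
    weights = {S: math.exp(F(S)) for S in subsets}
    Z = sum(weights.values())
    return {S: w / Z for S, w in weights.items()}, Z


# Ising-type model on the path 0 - 1 - 2, with F(S; theta) = theta * cut(S).
edges, theta = [(0, 1), (1, 2)], 0.5
p_ising, Z_ising = brute_force(lambda S: theta * cut_value(S, edges), 3)

# DPP with a small positive-definite kernel K; here Z equals det(K + I).
K = [[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]]
p_dpp, Z_dpp = brute_force(lambda S: log_det_submatrix(K, S), 3)
```

For the DPP, $Z = \sum_{S} |K_S| = |K + I|$ (25.5 for this K), which gives an independent check on the enumeration.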

SLIDE 20

Discrete probabilistic models

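Learn θ

  • Maximum likelihood: the log-likelihood gradient requires $\nabla_\theta \log Z(\theta) = \mathbb{E}_{S \sim p(\cdot;\, \theta)}\big[\nabla_\theta F(S; \theta)\big]$, which is #P-hard to compute in general
  • Approximate it by sampling from $p(\cdot; \theta)$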
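For a log-linear parameterization $F(S; \theta) = \theta^\top \phi(S)$, the gradient of the average log-likelihood is the data average of φ minus its expectation under $p(\cdot; \theta)$; the second term is where sampling enters. The sketch below assumes φ(S) = (|S|, cut(S)) on a three-node path (hypothetical choices) and draws exact samples by enumeration only to check the Monte Carlo estimate against the exact gradient; at realistic n those samples would come from an MCMC sampler instead.

```python
import itertools
import math
import random


def features(S, edges):
    """Hypothetical sufficient statistics phi(S) = (|S|, cut(S)), so F(S; theta) = theta . phi(S)."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    return (len(S), cut)


def model_distribution(theta, n, edges):
    """Exact p(S; theta) by enumeration (only feasible for tiny n)."""
    subsets = [frozenset(c) for k in range(n + 1) for c in itertools.combinations(range(n), k)]
    scores = [math.exp(sum(t * f for t, f in zip(theta, features(S, edges)))) for S in subsets]
    Z = sum(scores)
    return subsets, [s / Z for s in scores]


def mean_features(sets, edges):
    rows = [features(S, edges) for S in sets]
    return [sum(col) / len(rows) for col in zip(*rows)]


def exact_gradient(theta, data, n, edges):
    """Gradient of the average log-likelihood: mean of phi over the data minus E_p[phi]."""
    subsets, probs = model_distribution(theta, n, edges)
    expected = [sum(p * features(S, edges)[k] for S, p in zip(subsets, probs)) for k in range(len(theta))]
    return [d - e for d, e in zip(mean_features(data, edges), expected)]


def sampled_gradient(data, samples, edges):
    """Same gradient with E_p[phi] replaced by an average over samples drawn from p(.; theta)."""
    return [d - s for d, s in zip(mean_features(data, edges), mean_features(samples, edges))]


n, edges, theta = 3, [(0, 1), (1, 2)], [0.0, 0.0]
data = [frozenset({0, 1}), frozenset({1})]
subsets, probs = model_distribution(theta, n, edges)
# Exact draws stand in for an MCMC sampler here.
samples = random.Random(0).choices(subsets, weights=probs, k=20000)
print(exact_gradient(theta, data, n, edges), sampled_gradient(data, samples, edges))
```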

SLIDE 21

The Gibbs sampler

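[Figure: the lattice of subsets of V = {1, 2, 3}: {}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, V; each Gibbs step moves between sets that differ in at most one element]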
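A single-site Gibbs sampler for $p(S) \propto \exp(F(S))$ picks an element v and resamples whether $v \in S$ from its conditional, which only needs the marginal gain $F(S \cup \{v\}) - F(S \setminus \{v\})$. The sketch below assumes V = {0, …, n−1} and a uniformly random choice of v (a systematic scan is an equally common variant); `gibbs_sampler` is a hypothetical name, and the usage example is a small graph-cut model on the path 0-1-2.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gibbs_sampler(F, n, num_steps, seed=0):
    """Single-site Gibbs sampler for p(S) ∝ exp(F(S)) over subsets of V = {0, ..., n-1}.

    Each step picks v uniformly and sets v ∈ S with probability
    sigmoid(F(S ∪ {v}) - F(S - {v})). Returns visit frequencies of the chain.
    """
    rng = random.Random(seed)
    S, counts = frozenset(), {}
    for _ in range(num_steps):
        v = rng.randrange(n)
        with_v, without_v = S | {v}, S - {v}
        S = with_v if rng.random() < sigmoid(F(with_v) - F(without_v)) else without_v
        counts[S] = counts.get(S, 0) + 1
    return {T: c / num_steps for T, c in counts.items()}


edges = [(0, 1), (1, 2)]


def F(S):
    return 0.5 * sum(1 for u, v in edges if (u in S) != (v in S))


estimate = gibbs_sampler(F, 3, 60000)
```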

SLIDE 26

When Gibbs fails

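[Figure: two high-probability regions Ω1 and Ω2 separated by a low-probability bottleneck; Gibbs changes one element per step, so how can it get from Ω1 to Ω2?]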
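The bottleneck can be quantified on a toy bimodal model. The sketch assumes one common Curie-Weiss parameterization, $F(S) = \frac{\beta}{2n}(2|S| - n)^2$ with β = 2 (the model class of the later slides); since this F depends only on |S|, the total mass of each cardinality level can be computed exactly. The function names are hypothetical. The ratio between the middle level's mass and the heaviest level's mass shrinks exponentially in n, and a chain that changes one element per step must pass through that level to move from Ω1 (sets near ∅) to Ω2 (sets near V).

```python
import math


def level_log_mass(n, beta, k):
    """log of the total unnormalized mass of all sets of size k under the assumed
    F(S) = beta / (2n) * (2|S| - n)^2, which depends on S only through |S|."""
    return math.log(math.comb(n, k)) + beta / (2 * n) * (2 * k - n) ** 2


def bottleneck_ratio(n, beta):
    """Mass of the middle level |S| = n/2 relative to the heaviest level (n even)."""
    log_masses = [level_log_mass(n, beta, k) for k in range(n + 1)]
    return math.exp(log_masses[n // 2] - max(log_masses))


for n in (10, 20, 40):
    print(n, bottleneck_ratio(n, beta=2.0))
```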

SLIDE 37

The M3 chain

→ M3 = Mixture of Log-Modulars Metropolis

1

Mixture q(T) = 1 Zq

r

i=1

wi exp (mi(T))

2

Log-Modulars mi(T) = ∑

v∈T

miv

3

Metropolis

  • Target p(S) ∝ exp(F(S))
  • Proposal q(T)
  • Accept with probability min

{ 1, p(T)q(S)

p(S)q(T)

}
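The three ingredients fit together in a few lines. The sketch below is a minimal Python rendering under assumptions: V = {0, …, n−1}, `ProductMixture` and `m3_chain` are our own names, and the two mixture components in the usage example are hand-picked rather than built by the semigradient construction. Each component is a product distribution that includes element v independently with probability sigmoid($m_{iv}$), so a draw costs O(n) once a component is chosen.

```python
import math
import random


def softplus(x):
    """log(1 + e^x), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


class ProductMixture:
    """q(T) = (1/Z_q) * sum_i w_i * exp(m_i(T)) with log-modular m_i(T) = sum_{v in T} m[i][v]."""

    def __init__(self, weights, modulars):
        self.modulars = modulars
        self.log_w = [math.log(w) for w in weights]
        # Each component factorizes: sum_T exp(m_i(T)) = prod_v (1 + e^{m_iv}).
        log_mass = [lw + sum(softplus(x) for x in m) for lw, m in zip(self.log_w, modulars)]
        self.log_zq = logsumexp(log_mass)
        self.component_probs = [math.exp(lm - self.log_zq) for lm in log_mass]

    def sample(self, rng):
        # Pick component i w.p. w_i Z_i / Z_q, then include each v independently
        # with probability sigmoid(m_iv) = exp(-softplus(-m_iv)).
        i = rng.choices(range(len(self.modulars)), weights=self.component_probs)[0]
        return frozenset(v for v, x in enumerate(self.modulars[i])
                         if rng.random() < math.exp(-softplus(-x)))

    def log_prob(self, T):
        terms = [lw + sum(m[v] for v in T) for lw, m in zip(self.log_w, self.modulars)]
        return logsumexp(terms) - self.log_zq


def m3_chain(F, q, num_steps, seed=0):
    """Independence Metropolis-Hastings with target p ∝ exp(F) and proposal q; returns visit frequencies."""
    rng = random.Random(seed)
    S, counts = frozenset(), {}
    for _ in range(num_steps):
        T = q.sample(rng)
        # log of p(T) q(S) / (p(S) q(T)); the normalizer of p cancels.
        log_ratio = F(T) + q.log_prob(S) - F(S) - q.log_prob(T)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            S = T
        counts[S] = counts.get(S, 0) + 1
    return {T: c / num_steps for T, c in counts.items()}


edges = [(0, 1), (1, 2)]


def F(S):
    return 0.5 * sum(1 for u, v in edges if (u in S) != (v in S))


# Two hand-picked components favoring {1} and {0, 2} (an assumption, not the paper's construction).
q = ProductMixture(weights=[0.5, 0.5], modulars=[[-1.0, 1.0, -1.0], [1.0, -1.0, 1.0]])
estimate = m3_chain(F, q, 60000)
```

Because q does not depend on the current state, the acceptance ratio uses q(S) and q(T) directly; a state-dependent proposal would use q(T, S) and q(S, T) instead.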

SLIDE 41

The M3 chain

Proposal $q(T) = \frac{1}{Z_q} \sum_{i=1}^{r} w_i \exp\big(m_i(T)\big)$

→ Can sample from q in O(n) time
Proposition 1: Mixture q can approximate any distribution p arbitrarily well.
BUT: may need an exponential (in n) number of components r.
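One way to see why Proposition 1 holds, and why it can be expensive, is the extreme construction below; it is a sketch of our own, not necessarily the paper's proof. Each of the $2^n$ subsets $S_i$ gets its own component, with $m_{iv} = +c$ for $v \in S_i$ and $-c$ otherwise, and normalized weight $p(S_i)$. As c grows each component concentrates on its set and q approaches p, but the number of components is $2^n$. The function names and the target p (a small graph-cut model) are hypothetical.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_component_per_set_mixture(p, n, c):
    """Exact q for a mixture with one product component per subset S_i (r = 2^n components).

    Component i gives each element probability sigmoid(c) of agreeing with S_i, and its
    normalized weight is p(S_i), so q(T) = sum_i p(S_i) * agree^(n - |T Δ S_i|) * disagree^|T Δ S_i|.
    """
    agree, disagree = sigmoid(c), sigmoid(-c)
    return {T: sum(p_i * agree ** (n - len(T ^ S_i)) * disagree ** len(T ^ S_i) for S_i, p_i in p.items())
            for T in p}


def total_variation(p, q):
    return 0.5 * sum(abs(p[T] - q[T]) for T in p)


edges = [(0, 1), (1, 2)]
subsets = [frozenset(s) for k in range(4) for s in itertools.combinations(range(3), k)]
weights = {S: math.exp(0.5 * sum(1 for u, v in edges if (u in S) != (v in S))) for S in subsets}
Z = sum(weights.values())
p = {S: w / Z for S, w in weights.items()}
for c in (1.0, 3.0, 10.0):
    print(c, total_variation(p, one_component_per_set_mixture(p, 3, c)))
```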

SLIDE 45

The combined chain

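Gibbs step with probability α | M3 step with probability 1 − α

[Figure: state space split into two parts, Ω1 and Ω2]

Decomposition theorem [Jerrum et al., ’04]: bounds the spectral gap of the combined chain in terms of
  • the projection chain on the part indices {1, 2} (moves between Ω1 and Ω2), and
  • the restriction chains on Ω1 and on Ω2 (moves within each part)
[Figure: Gibbs and M3 moves shown for the projection and restriction chains]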

SLIDE 50

The combined chain

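Class of Ising models on the complete graph (Curie-Weiss)

[Figure: parts Ω1 and Ω2; projection chain on {1, 2}; restriction chains on Ω1 and Ω2]

  • Gibbs: $t_{\mathrm{mix}} = \Omega(e^{cn})$ [Levin et al., ’08]
  • M3: ?
  • Combo: $t_{\mathrm{mix}} = O(n^2 \log n)$ [Theorem 2]

Decomposition of the combined chain:
  • M3 on the projection chain: $t_{\mathrm{mix}} = O(1)$ [Lemma 1]
  • Gibbs on the restriction chains: $t_{\mathrm{mix}} = \Theta(n^2 \log n)$ [Ding et al., ’09]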

SLIDE 57

Constructing the mixture

Randomized construction of $q(\cdot) \propto \sum_{i=1}^{r} \exp\big(m_i(\cdot)\big)$

Input: set function F, mixture size r
for i = 1 to r do
    σ ← random permutation of V
    m_i ← SemiGradient(F, σ)
return {m_1, …, m_r}

  • Submodularity → natural diminishing returns property
  • Sub-/supergradients → modular lower/upper approx. [Iyer et al., ’13]
  • Construction works for general set functions F

SLIDE 58

Constructing the mixture

Iterative construction of $q(\cdot) \propto \sum_{i=1}^{r} \exp\big(m_i(\cdot)\big)$

Input: set function F, mixture size r
for i = 1 to r do
    σ ← Greedy$\big(F(\cdot) - \log \sum_{j=1}^{i-1} \exp(m_j(\cdot))\big)$
    m_i ← SemiGradient(F, σ)
return {m_1, …, m_r}

  • Submodularity → natural diminishing returns property
  • Sub-/supergradients → modular lower/upper approx. [Iyer et al., ’13]
  • Construction works for general set functions F
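Both constructions can be sketched in a few lines. This is a hedged rendering under assumptions: F is taken to be submodular, `semigradient` uses the standard permutation-based subgradient of Iyer et al. (the talk's SemiGradient(F, σ) may also use supergradients), the Greedy step is read as "order V by largest marginal gain of G", and all function names are hypothetical. The usage example's F, a square root of a nonnegative modular function, is a made-up submodular function.

```python
import math
import random


def semigradient(F, sigma):
    """Permutation-based subgradient: m_v = F(P_k) - F(P_{k-1}), where P_k holds the first k
    elements of sigma. For submodular F, F({}) + sum_{v in T} m_v <= F(T) for every T,
    with equality on each prefix P_k."""
    m = [0.0] * len(sigma)
    prefix = frozenset()
    for v in sigma:
        m[v] = F(prefix | {v}) - F(prefix)
        prefix = prefix | {v}
    return m


def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def randomized_construction(F, n, r, rng):
    modulars = []
    for _ in range(r):
        sigma = list(range(n))
        rng.shuffle(sigma)
        modulars.append(semigradient(F, sigma))
    return modulars


def greedy_order(G, n):
    """Order V by repeatedly appending the element with the largest marginal gain of G."""
    S, order = frozenset(), []
    while len(order) < n:
        v = max((u for u in range(n) if u not in S), key=lambda u: G(S | {u}) - G(S))
        order.append(v)
        S = S | {v}
    return order


def iterative_construction(F, n, r):
    modulars = []
    for _ in range(r):
        if modulars:
            # G is large where F exceeds the log-density the current mixture already assigns.
            def G(S):
                return F(S) - logsumexp([sum(m[v] for v in S) for m in modulars])
        else:
            G = F
        modulars.append(semigradient(F, greedy_order(G, n)))
    return modulars


a = [0.5, 1.0, 2.0, 0.3]


def F(S):  # concave function of a nonnegative modular function, hence submodular
    return math.sqrt(sum(a[v] for v in S))


rand_mods = randomized_construction(F, 4, 5, random.Random(0))
iter_mods = iterative_construction(F, 4, 3)
```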

SLIDE 62

Experiments

  • |V| = 48
  • 8.5k teams of 5 characters
  • F is a (submodular) facility location diversity model [Tschiatschek et al., ’16]

$F(S) = \sum_{i \in S} w_i + \sum_{j=1}^{L} \max_{i \in S} c_{ij}$
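The facility-location F can be written down directly, together with a brute-force check of the diminishing-returns property that makes it submodular. This is a sketch under assumptions: the instance (weights w, matrix c with L = 2) is made up, V is indexed from 0, the max over an empty set is taken as 0, and the function names are hypothetical. With c ≥ 0 each max term is submodular, and the sum over w is modular, so the check should pass.

```python
def facility_location(w, c):
    """F(S) = sum_{i in S} w_i + sum_{j=1}^{L} max_{i in S} c_ij, with the max over S = {} taken as 0."""
    L = len(c[0])

    def F(S):
        return sum(w[i] for i in S) + sum(max((c[i][j] for i in S), default=0.0) for j in range(L))

    return F


def has_diminishing_returns(F, n, tol=1e-12):
    """Brute-force submodularity check: F(A | {v}) - F(A) >= F(B | {v}) - F(B) for all A ⊆ B, v ∉ B."""
    subsets = [frozenset(v for v in range(n) if mask >> v & 1) for mask in range(1 << n)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for v in range(n):
                if v in B:
                    continue
                if F(A | {v}) - F(A) < F(B | {v}) - F(B) - tol:
                    return False
    return True


# Hypothetical instance: 4 items, L = 2, nonnegative c.
F = facility_location(w=[0.1, -0.2, 0.3, 0.0], c=[[1.0, 0.0], [0.5, 0.8], [0.2, 0.9], [0.7, 0.1]])
print(has_diminishing_returns(F, 4))
```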

SLIDE 63

Experiments

[Plot: PSRF vs. number of samples (1k to 7k); curves: Gibbs, Combo-Rand, Combo-Iter]
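The plots track PSRF, the Gelman-Rubin potential scale reduction factor: several chains are run in parallel and their between-chain and within-chain variances are compared, with values near 1 suggesting convergence. The talk does not say which statistic or which PSRF variant was used, so the sketch below assumes the basic textbook form for a single scalar statistic (for example, F(S) recorded along each chain); `psrf` is a hypothetical name.

```python
import math


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for a scalar statistic.

    chains: m >= 2 lists of equal length k >= 2, one list per independent chain.
    """
    m, k = len(chains), len(chains[0])
    means = [sum(ch) / k for ch in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = k / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in ch) / (k - 1) for ch, mu in zip(chains, means)) / m
    var_hat = (k - 1) / k * W + B / k
    return math.sqrt(var_hat / W)


print(psrf([[0, 1, 0, 1], [10, 11, 10, 11]]))  # chains stuck in different modes: far above 1
```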

SLIDE 64

Experiments

[Plot: PSRF vs. number of samples (1k to 7k); curves: Gibbs and mixture sizes r = 50, 100, 200, 400, 800]

SLIDE 65

Experiments

[Plot: PSRF vs. number of samples (1k to 7k); curves: α = 1 (Gibbs), α = 0.8, 0.6, 0.4, 0.2, α = 0 (M3)]

SLIDE 68

Conclusion

[Figure: M3 moves between Ω1 and Ω2]

  • M3 sampler → propose global moves to overcome bottlenecks
  • Combined sampler → analysis based on decomposition theorem
  • Semigradient construction → incorporate ideas from optimization
