Gibbs Sampling from k-Determinantal Point Processes - Alireza Rezaei (PowerPoint PPT Presentation)

SLIDE 1

Gibbs Sampling from 𝑘-Determinantal Point Processes

Alireza Rezaei

University of Washington

Based on joint work with Shayan Oveis Gharan

SLIDE 2

Point Process: a distribution on subsets of [𝑛] = {1, 2, …, 𝑛}.

Determinantal Point Process: there is a PSD kernel 𝐿 ∈ ℝ^{𝑛×𝑛} such that for all 𝑆 ⊆ [𝑛]: ℙ[𝑆] ∝ det(𝐿_𝑆)

[figure: the principal submatrix 𝐿_𝑆 highlighted inside the kernel 𝐿]

SLIDE 3

𝒌-DPP: the conditioning of a DPP on picking subsets of size 𝑘:

  • if |𝑆| = 𝑘: ℙ[𝑆] ∝ det(𝐿_𝑆)
  • otherwise: ℙ[𝑆] = 0

Focus of the talk: sampling from 𝑘-DPPs

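For intuition, the definition can be checked by brute force on a tiny ground set: enumerate all size-𝑘 subsets and weight each by the determinant of the corresponding principal submatrix. A minimal NumPy sketch (not from the talk; the kernel here is a toy example):

```python
import itertools

import numpy as np

def k_dpp_probabilities(L, k):
    """Brute-force the k-DPP with PSD kernel L: P[S] ∝ det(L_S) over all |S| = k."""
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    # det of each principal submatrix L_S (non-negative since L is PSD)
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    return subsets, weights / weights.sum()

# Toy PSD kernel L = B Bᵀ built from random features.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))
L = B @ B.T
subsets, probs = k_dpp_probabilities(L, 2)
```

This enumeration is exponential in 𝑛 and is only meant to make the definition concrete; avoiding it is the whole point of the sampling methods in the talk.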

SLIDE 4

DPPs are very popular probabilistic models in machine learning for capturing diversity.

Applications [Kulesza-Taskar’11, Dang’05, Nenkova-Vanderwende-McKeown’06, Mirzasoleiman-Jegelka-Krause’17]

  • Image search, document and video summarization, tweet timeline generation, pose estimation, feature selection


SLIDE 5

Continuous Domain

Input: a PSD operator 𝐿: 𝒟 × 𝒟 → ℝ and 𝑘. Select a subset 𝑆 ⊂ 𝒟 of 𝑘 points from the distribution with PDF 𝑝(𝑆) ∝ det[𝐿(𝑥, 𝑦)]_{𝑥,𝑦∈𝑆}

SLIDE 6

Continuous Domain


Applications.

  • Hyper-parameter tuning [Dodge-Jamieson-Smith’17]
  • Learning mixtures of Gaussians [Affandi-Fox-Taskar’13]

  • Ex. Gaussian kernel: 𝐿(𝑥, 𝑦) = exp(−(𝑥 − 𝑦)ᵀ Σ⁻¹ (𝑥 − 𝑦) / 2)

SLIDE 7

Random-Scan Gibbs Sampler for 𝑘-DPPs

  1. Stay at the current state 𝑆 = {𝑥_1, …, 𝑥_𝑘} with probability 1/2.
  2. Otherwise, choose 𝑥_𝑖 ∈ 𝑆 uniformly at random.
  3. Choose 𝑦 ∉ 𝑆 from the conditional distribution 𝜋(· | 𝑆 − 𝑥_𝑖 is chosen).
     Continuous case: PDF(𝑦) ∝ 𝑝(𝑥_1, …, 𝑥_{𝑖−1}, 𝑦, 𝑥_{𝑖+1}, …, 𝑥_𝑘)

[figure: the chain moves on states 𝑆 ∈ ([𝑛] choose 𝑘), swapping one element 𝑥_𝑖 for 𝑦]
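The three steps above can be transcribed directly for the discrete case, computing the conditional in step 3 exactly by evaluating det(𝐿_{(𝑆−𝑥_𝑖)+𝑦}) for every candidate 𝑦. This costs far more per step than the oracle access the talk assumes and is only viable for small 𝑛; function names are mine:

```python
import numpy as np

def gibbs_step(L, S, rng):
    """One step of the random-scan Gibbs sampler for a discrete k-DPP
    with kernel L; S is the current size-k subset (a list of indices)."""
    if rng.random() < 0.5:           # 1. lazy step: stay put with probability 1/2
        return S
    i = int(rng.integers(len(S)))    # 2. choose x_i in S uniformly at random
    rest = S[:i] + S[i + 1:]
    candidates = [y for y in range(L.shape[0]) if y not in S]
    # 3. choose y ∉ S with probability proportional to det(L_{rest + {y}})
    w = np.array([np.linalg.det(L[np.ix_(rest + [y], rest + [y])])
                  for y in candidates])
    y = candidates[rng.choice(len(candidates), p=w / w.sum())]
    return rest + [y]

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 4))
L = B @ B.T                          # toy PSD kernel, ground set of size 6
S = [0, 1, 2]
for _ in range(200):
    S = gibbs_step(L, S, rng)
```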

SLIDE 8

Main Result

Given a 𝑘-DPP 𝜋, an "approximate" sample from 𝜋 can be generated by running the Gibbs sampler for 𝜏 = Õ(𝑘⁵ · log Var_𝜋(𝑝_𝜇/𝑝_𝜋)) steps, where 𝜇 is the starting distribution.

SLIDE 9


Discrete: a simple greedy initialization gives 𝜏 = O(𝑘⁵ log 𝑘). The total running time is O(𝑛) · poly(𝑘).

  • Does not improve upon the previous MCMC methods [Anari-Oveis Gharan-R’16].
  • The mixing time is independent of 𝑛, so the running time in distributed settings is sublinear.
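The "simple greedy initialization" plausibly takes the following form: grow the starting set one element at a time, each time adding the index that maximizes the determinant of the grown submatrix. The talk does not spell out the rule, so treat this as an assumed sketch:

```python
import numpy as np

def greedy_init(L, k):
    """Assumed greedy warm start for the Gibbs chain: repeatedly add the
    index maximizing det(L_S) of the grown set S. (Sketch; the talk's
    exact greedy rule is not specified here.)"""
    S = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for y in range(L.shape[0]):
            if y in S:
                continue
            d = np.linalg.det(L[np.ix_(S + [y], S + [y])])
            if d > best_det:
                best, best_det = y, d
        S.append(best)
    return S

rng = np.random.default_rng(3)
B = rng.standard_normal((8, 5))
L = B @ B.T
S0 = greedy_init(L, 3)   # starting state for the chain
```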
SLIDE 10


Continuous: given access to conditional oracles, a starting distribution 𝜇 can be found so that 𝜏 = O(𝑘⁵ log 𝑘).

  • First algorithm with a theoretical guarantee for sampling from continuous 𝑘-DPPs.

(Access to the conditional oracles is what makes it possible to run the chain at all.)

SLIDE 11

  • Using a rejection sampler as the conditional oracle for the Gaussian kernel 𝐿(𝑥, 𝑦) = exp(−‖𝑥 − 𝑦‖²/𝜎²) defined on the unit sphere in ℝ^𝑑, the total running time is:
    • if 𝑘 = poly(𝑑): poly(𝑑, 𝜎)
    • if 𝑘 ≤ O(𝑑^{1−𝜀}) and 𝜎 = O(1): poly(𝑑) · 𝑘^{O(1/𝜀)}

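A plausible shape for the rejection sampler behind this bullet: for a Gaussian kernel 𝐿(𝑦, 𝑦) = 1, so the conditional density of the replacement point, proportional to det(𝐿_{𝑅+𝑦}) and hence to the Schur complement 1 − 𝐿(𝑦, 𝑅) 𝐿_𝑅⁻¹ 𝐿(𝑅, 𝑦) of the retained set 𝑅, lies in [0, 1]; uniform proposals on the sphere can therefore be accepted with exactly that probability. This is a reconstruction under stated assumptions, not the talk's exact sampler:

```python
import numpy as np

def sample_conditional(rest, sigma2, d, rng, max_tries=100_000):
    """Rejection sampler for step 3 of the chain in the continuous case,
    for L(x, y) = exp(-||x - y||² / sigma2) on the unit sphere in R^d.
    Target: density over y proportional to det(L_{rest + y}), i.e. to the
    Schur complement 1 - L(y, rest) L_rest⁻¹ L(rest, y), which is in [0, 1]."""
    def kern(A, C):
        sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / sigma2)
    L_rest_inv = np.linalg.inv(kern(rest, rest))
    for _ in range(max_tries):
        y = rng.standard_normal(d)
        y /= np.linalg.norm(y)                   # uniform proposal on the sphere
        v = kern(rest, y[None, :])[:, 0]
        accept = 1.0 - v @ L_rest_inv @ v        # Schur complement, in [0, 1]
        if rng.random() < accept:
            return y
    raise RuntimeError("rejection sampler did not accept")

rng = np.random.default_rng(4)
rest = rng.standard_normal((2, 3))
rest /= np.linalg.norm(rest, axis=1, keepdims=True)  # k-1 retained points
y = sample_conditional(rest, sigma2=1.0, d=3, rng=rng)
```

Proposals landing near the retained points have a small Schur complement and are usually rejected, which is how the sampler reproduces the repulsion of the DPP.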
