

  1. Gibbs Sampling from k-Determinantal Point Processes. Alireza Rezaei, University of Washington. Based on joint work with Shayan Oveis Gharan.

  2. Point Process: a distribution over subsets of [n] = {1, 2, …, n}.
Determinantal Point Process (DPP): there is a PSD kernel L ∈ ℝ^{n×n} such that for every S ⊆ [n], ℙ[S] ∝ det(L_S), where L_S is the principal submatrix of L indexed by S.
k-DPP: a DPP conditioned on picking subsets of size k:
ℙ[S] ∝ det(L_S) if |S| = k, and ℙ[S] = 0 otherwise.
Focus of the talk: sampling from k-DPPs.
DPPs are very popular probabilistic models in machine learning for capturing diversity. Applications [Kulesza-Taskar'11, Dang'05, Nenkova-Vanderwende-McKeown'06, Mirzasoleiman-Jegelka-Krause'17]: image search, document and video summarization, tweet timeline generation, pose estimation, feature selection.
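The definition above can be made concrete on a toy instance. A brief sketch (not from the talk; the kernel here is an arbitrary random PSD matrix) that computes a k-DPP exactly by enumerating all size-k subsets:

```python
# Exact k-DPP on a small ground set, by brute-force enumeration.
from itertools import combinations

import numpy as np

def kdpp_exact(L, k):
    """Map each size-k subset S of [n] to P[S] proportional to det(L_S)."""
    n = L.shape[0]
    unnorm = {S: np.linalg.det(L[np.ix_(S, S)])
              for S in combinations(range(n), k)}
    Z = sum(unnorm.values())  # normalizer: sum of size-k principal minors
    return {S: v / Z for S, v in unnorm.items()}

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
L = A @ A.T  # PSD by construction
probs = kdpp_exact(L, 2)
```

Principal minors of a PSD matrix are nonnegative, so these are valid probabilities; the normalizer Z equals the k-th elementary symmetric polynomial of L's eigenvalues.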

  5. Continuous Domain. Input: a PSD kernel L : D × D → ℝ and k. Select a subset S ⊂ D of k points from the distribution with density p(S) ∝ det [L(x, y)]_{x,y ∈ S}.
Ex. Gaussian: L(x, y) = exp(−(x − y)^T Σ^{−1} (x − y) / 2).
Applications:
• Hyper-parameter tuning [Dodge-Jamieson-Smith'17]
• Learning mixtures of Gaussians [Affandi-Fox-Taskar'13]
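As a sketch of how this density rewards diversity (a hypothetical example, not from the talk), one can evaluate the unnormalized density det [L(x, y)]_{x,y ∈ S} under an isotropic Gaussian kernel for a spread-out versus a clustered point set:

```python
# Unnormalized continuous k-DPP density under a Gaussian kernel.
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """L(x, y) = exp(-||x - y||^2 / sigma^2) for all pairs of rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma**2)

def unnormalized_density(X, sigma=1.0):
    """p(S) up to normalization, with S given as the rows of X."""
    return np.linalg.det(gaussian_kernel_matrix(X, sigma))

spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
clustered = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
# Well-separated points give a near-identity kernel matrix (det close to 1);
# nearly coincident points give an almost-singular matrix (det close to 0).
```

This is exactly the sense in which a DPP prefers diverse subsets: the determinant is the squared volume spanned by the points' feature embeddings.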

  7. Random-Scan Gibbs Sampler for k-DPPs. Current state: S = {x_1, …, x_k} with S ⊆ [n], |S| = k.
1. Stay at the current state S with probability 1/2.
2. Choose x_i ∈ S uniformly at random.
3. Choose y from the conditional distribution π(· | S − x_i is chosen), i.e. ℙ[y] ∝ det(L_{S − x_i + y}) for y outside S − x_i.
Continuous: y has PDF ∝ p(x_1, …, x_{i−1}, y, x_{i+1}, …, x_k).
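The chain above can be sketched in a few lines (a toy discrete instance with an assumed kernel; each step recomputes O(n) small determinants for clarity, rather than using rank-one determinant updates):

```python
# Random-scan Gibbs sampler for a discrete k-DPP, as described above.
import numpy as np

def gibbs_step(L, S, rng):
    """One lazy step. S is a sorted tuple of k distinct indices."""
    if rng.random() < 0.5:               # 1. stay put with probability 1/2
        return S
    i = rng.integers(len(S))             # 2. drop a uniformly random x_i
    rest = S[:i] + S[i + 1:]
    # 3. resample the freed slot from the conditional distribution:
    #    P[y] proportional to det(L_{rest + y}) over y outside rest
    #    (y = x_i is allowed, keeping the chain reversible w.r.t. the k-DPP).
    cands = [y for y in range(L.shape[0]) if y not in rest]
    w = np.array([np.linalg.det(L[np.ix_(rest + (y,), rest + (y,))])
                  for y in cands])
    y = cands[rng.choice(len(cands), p=w / w.sum())]
    return tuple(sorted(rest + (y,)))

# Toy PSD kernel (symmetric, diagonally dominant), n = 4, k = 2.
L = np.array([[2.0, 0.5, 0.3, 0.0],
              [0.5, 1.5, 0.2, 0.1],
              [0.3, 0.2, 1.0, 0.4],
              [0.0, 0.1, 0.4, 1.2]])
rng = np.random.default_rng(1)
S = (0, 1)
visited = set()
for _ in range(20000):
    S = gibbs_step(L, S, rng)
    visited.add(S)
```

On this small instance the chain quickly visits every size-2 subset, with long-run visit frequencies proportional to the 2×2 principal minors of L.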

  8. Main Result. Given a k-DPP π, an "approximate" sample from π can be generated by running the Gibbs sampler for τ = Õ(k^5 · log Var_π(p_μ / p_π)) steps, where μ is the starting distribution.
Discrete: a simple greedy initialization gives τ = O(k^5 log k). The total running time is O(n) · poly(k).
• Does not improve upon previous MCMC methods [Anari-Oveis Gharan-R'16].
• The mixing time is independent of n, so the running time in distributed settings is sublinear.
Continuous: given access to conditional oracles (what is needed just to run the chain), a starting distribution μ can be found so that τ = O(k^5 log k).
• First algorithm with a theoretical guarantee for sampling from continuous k-DPPs.
• Using a rejection sampler as the conditional oracle for the Gaussian kernel L(x, y) = exp(−‖x − y‖² / σ²) on the unit sphere in ℝ^d, the total running time is:
  • poly(d, σ) if k = poly(d);
  • poly(d) · k^{O(1/ε)} if k ≤ e^{d^{1−ε}} and σ = O(1).
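The slide does not spell out the discrete initialization; a natural reading of "simple greedy" (an assumption here, not necessarily the talk's exact procedure) is volume-greedy selection: repeatedly add the element that most increases det(L_S). A classical bound for greedy volume maximization says the greedy volume is within a k! factor of optimal, i.e. within (k!)² in determinant, which makes the starting state warm. A minimal sketch:

```python
# Volume-greedy initialization (assumed form of the "simple greedy" step).
import numpy as np

def greedy_init(L, k):
    """Grow S one element at a time, maximizing det(L_S) at each step."""
    S = []
    for _ in range(k):
        best = max((j for j in range(L.shape[0]) if j not in S),
                   key=lambda j: np.linalg.det(L[np.ix_(S + [j], S + [j])]))
        S.append(best)
    return tuple(sorted(S))

# Same toy PSD kernel as a quick check (n = 4, k = 2).
L = np.array([[2.0, 0.5, 0.3, 0.0],
              [0.5, 1.5, 0.2, 0.1],
              [0.3, 0.2, 1.0, 0.4],
              [0.0, 0.1, 0.4, 1.2]])
S0 = greedy_init(L, 2)
```

Here greedy first picks index 0 (largest diagonal entry, 2.0) and then index 1, since det(L_{0,1}) = 2.75 is the largest among the remaining choices.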
