
Learning Determinantal Processes with Moments and Cycles - PowerPoint PPT Presentation

Learning Determinantal Processes with Moments and Cycles. J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet. ICML 2017, Sydney.


  1. Learning Determinantal Processes with Moments and Cycles
     J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet
     ICML 2017, Sydney

  2. Determinantal Point Processes (DPPs)
     DPP: random subset $Y$ of $[N]$.
     • For all $J \subseteq [N]$, $\mathbb{P}[J \subseteq Y] = \det(K_J)$.
     • $K \in \mathbb{R}^{N \times N}$, symmetric, $0 \preceq K \preceq I_N$: parameter (kernel) of the DPP.
     • $K_J = (K_{i,j})_{i,j \in J}$.
     • E.g. $\mathbb{P}[1 \in Y] = K_{1,1}$ and $\mathbb{P}[1, 2 \in Y] = K_{1,1} K_{2,2} - K_{1,2}^2 \le \mathbb{P}[1 \in Y]\,\mathbb{P}[2 \in Y]$.
     • A.k.a. $L$-ensembles if $0 \prec K \prec I_N$: $\mathbb{P}[Y = J] \propto \det(L_J)$, with $L = K (I_N - K)^{-1}$.
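
As a small illustration of the marginal formula on this slide, the sketch below evaluates $\det(K_J)$ for a hand-picked 2×2 kernel. The kernel values, the helper names `det` and `principal_minor`, and the 0-based item indices are assumptions made for the example; they are not taken from the talk.

```python
from itertools import permutations


def det(m):
    """Determinant by the Leibniz formula; adequate for tiny matrices only."""
    n = len(m)
    total = 0.0
    for perm in permutations(range(n)):
        # Parity of the permutation from its inversion count.
        inversions = sum(
            1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b]
        )
        term = -1.0 if inversions % 2 else 1.0
        for row in range(n):
            term *= m[row][perm[row]]
        total += term
    return total


def principal_minor(K, J):
    """det(K_J): determinant of K restricted to rows and columns in J."""
    idx = sorted(J)
    return det([[K[i][j] for j in idx] for i in idx])


# Hypothetical symmetric kernel with eigenvalues in [0, 1] (items are 0-indexed).
K = [[0.5, 0.2],
     [0.2, 0.4]]

p1 = principal_minor(K, {0})       # P[item 0 in Y] = K_{0,0}
p2 = principal_minor(K, {1})       # P[item 1 in Y] = K_{1,1}
p12 = principal_minor(K, {0, 1})   # P[both in Y] = K_00*K_11 - K_01^2

print(p1, p2, p12)
```

The joint inclusion probability comes out no larger than the product of the two marginals, which is the repulsion the slide refers to.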

  3. Binary representation
     DPP ⟷ random binary vector of size $N$, represented as a subset of $[N]$.
     1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1,4,5,7,9,10,12,15,19}
     0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3,4,6,8,9,12,15,19}
     $(X_1, \dots, X_N) \in \{0,1\}^N \leftrightarrow Y \subseteq [N]$, with $X_i = 1 \Leftrightarrow i \in Y$.
     Model for correlated Bernoulli r.v.'s (such as Ising, …) featuring repulsion.
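
The correspondence between binary vectors and subsets can be sketched in a few lines. The function names are assumptions for illustration; items are numbered from 1 to match the slide.

```python
def bits_to_subset(bits):
    """Map a 0/1 vector (X_1, ..., X_N) to the subset {i : X_i = 1}, 1-indexed."""
    return {i for i, b in enumerate(bits, start=1) if b == 1}


def subset_to_bits(Y, N):
    """Map a subset Y of {1, ..., N} back to its 0/1 indicator vector."""
    return [1 if i in Y else 0 for i in range(1, N + 1)]


# First binary vector shown on the slide (N = 20).
bits = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
Y = bits_to_subset(bits)
print(sorted(Y))
```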

  4. Applications of DPPs
     DPPs have become popular in various applications:
     • Quantum physics (fermionic processes) [Macchi '74]
     • Document and timeline summarization [Lin, Bilmes '12; Yao et al. '16]
     • Image search [Kulesza, Taskar '11; Affandi et al. '14]
     • Bioinformatics [Batmanghelich et al. '14]
     • Neuroscience [Snoek et al. '13]
     • Modeling of wireless or cellular networks [Miyoshi, Shirai '14; Torrisi, Leonardi '14; Li et al. '15; Deng et al. '15]
     And they remain an elegant and important tool in probability theory [Borodin '11].

  5. Learning DPPs
     • Given $Y_1, Y_2, \dots, Y_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K)$, estimate $K$.
     • Approach: method of moments.
     • Problem: is $K$ identified?

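  6. Identification: $\mathcal{D}$-similarity
     • $\mathrm{DPP}(K') = \mathrm{DPP}(K) \Leftrightarrow \det(K'_J) = \det(K_J),\ \forall J \subseteq [N] \Leftrightarrow K' = DKD$ for some $D = \mathrm{diag}(\pm 1, \dots, \pm 1)$ [Oeding '11].
     • E.g. (sign patterns; rows and columns 1 and 4 are flipped):
       $K = \begin{pmatrix} + & + & + & + \\ + & + & + & + \\ + & + & + & + \\ + & + & + & + \end{pmatrix} \rightsquigarrow DKD = \begin{pmatrix} + & - & - & + \\ - & + & + & - \\ - & + & + & - \\ + & - & - & + \end{pmatrix}$
     • $K$ and $DKD$ are called $\mathcal{D}$-similar.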
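
The claim that $K$ and $DKD$ share all principal minors can be checked numerically on a small case. The sketch below uses a hypothetical 3×3 kernel and flips the sign of item 1 (0-indexed); the helper names are illustrative assumptions.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for c in range(len(m)):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total


def conjugate_by_signs(K, d):
    """Return DKD where D = diag(d) and every d[i] is +1 or -1."""
    n = len(K)
    return [[d[i] * K[i][j] * d[j] for j in range(n)] for i in range(n)]


def all_principal_minors(K):
    """Dictionary mapping each index tuple J to det(K_J)."""
    n = len(K)
    return {
        J: det([[K[i][j] for j in J] for i in J])
        for r in range(n + 1)
        for J in combinations(range(n), r)
    }


# Hypothetical symmetric kernel (values chosen for the example).
K = [[0.5, 0.2, 0.1],
     [0.2, 0.4, 0.1],
     [0.1, 0.1, 0.3]]
K_flipped = conjugate_by_signs(K, [1, -1, 1])

minors_K = all_principal_minors(K)
minors_flipped = all_principal_minors(K_flipped)
same = all(abs(minors_K[J] - minors_flipped[J]) < 1e-12 for J in minors_K)
print(same)
```

The off-diagonal entries in row and column 1 change sign, yet every principal minor agrees, so samples from the two DPPs cannot tell the kernels apart.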

  7. Method of moments
     • Diagonal entries: $K_{i,i} = \mathbb{P}[i \in Y]$, estimated by $\hat{K}_{i,i} = \frac{1}{n} \sum_{k=1}^{n} \mathbb{1}_{i \in Y_k}$.
     • Magnitude of the off-diagonal entries: $K_{i,j}^2 = K_{i,i} K_{j,j} - \mathbb{P}[i, j \in Y]$, estimated by $\hat{K}_{i,j}^2 = \left( \hat{K}_{i,i} \hat{K}_{j,j} - \frac{1}{n} \sum_{k=1}^{n} \mathbb{1}_{i,j \in Y_k} \right)_+$.
     • Signs (up to $\mathcal{D}$-similarity)? Use estimates of higher moments: $\widehat{\det}(K_J) = \frac{1}{n} \sum_{k=1}^{n} \mathbb{1}_{J \subseteq Y_k}$.
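
The first two estimators on this slide amount to counting frequencies. The sketch below is a minimal version under stated assumptions: the function name is made up, items are 0-indexed, and the ten "samples" are hand-written toy data rather than draws from an actual DPP.

```python
def estimate_diag_and_magnitudes(samples, N):
    """Method-of-moments estimates of K_ii and |K_ij| from observed subsets."""
    n = len(samples)
    # Diagonal: empirical frequency of each item.
    diag = [sum(1 for Y in samples if i in Y) / n for i in range(N)]
    mag = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            # Empirical frequency of the pair {i, j} appearing together.
            pair = sum(1 for Y in samples if i in Y and j in Y) / n
            # Positive part keeps the squared estimate non-negative.
            squared = max(diag[i] * diag[j] - pair, 0.0)
            mag[i][j] = mag[j][i] = squared ** 0.5
    return diag, mag


# Toy data for illustration only (not sampled from a DPP).
samples = [{0}, {1}, {0}, {1}, {0, 1}, {0}, set(), {1}, {0}, {1}]
diag, mag = estimate_diag_and_magnitudes(samples, N=2)
print(diag, mag[0][1])
```

Only magnitudes are recovered at this stage; the signs require the higher-order moments mentioned in the last bullet.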

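  8. Determinantal Graphs
     Definition: $G = ([N], E)$, where $\{i, j\} \in E \Leftrightarrow K_{i,j} \neq 0$.
     Examples:
     • $K = \begin{pmatrix} \ast & \ast & 0 \\ \ast & \ast & \ast \\ 0 & \ast & \ast \end{pmatrix}$ (graph on vertices 1, 2, 3 with edges {1,2} and {2,3})
     • $K = \begin{pmatrix} \ast & \ast & \ast & 0 \\ \ast & \ast & \ast & 0 \\ \ast & \ast & \ast & \ast \\ 0 & 0 & \ast & \ast \end{pmatrix}$ (graph on vertices 1, 2, 3, 4 with edges {1,2}, {1,3}, {2,3} and {3,4})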
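
Reading the edge set off a kernel is a direct translation of the definition. In the sketch below the function name and the `tol` threshold are assumptions, and the matrix only encodes the zero pattern of the 4×4 example (each `∗` replaced by a placeholder 1.0).

```python
def determinantal_graph(K, tol=0.0):
    """Edges {i, j} (1-indexed, i < j) with |K_ij| > tol."""
    N = len(K)
    return {
        (i + 1, j + 1)
        for i in range(N)
        for j in range(i + 1, N)
        if abs(K[i][j]) > tol
    }


# Zero pattern of the 4x4 example; the nonzero values are placeholders.
pattern = [[1.0, 1.0, 1.0, 0.0],
           [1.0, 1.0, 1.0, 0.0],
           [1.0, 1.0, 1.0, 1.0],
           [0.0, 0.0, 1.0, 1.0]]

edges = determinantal_graph(pattern)
print(sorted(edges))
```

With estimated entries, a positive `tol` would be needed to separate true zeros from noise.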

  9. Cycle sparsity
     • Cycle basis: family of induced cycles that span the cycle space.
       [Figure: two cycles $C_1$, $C_2$ and their sum $C_1 + C_2$]
     • Cycle sparsity: length $\ell$ of the largest cycle needed to span the cycle space.
     • Horton's algorithm: find a cycle basis with cycle lengths $\le \ell$ in $O(|E|^2 N (\ln N)^{-1})$ steps [Horton '87; Amaldi et al. '10].
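
To make the cycle-space vocabulary concrete, the sketch below adds two cycles as edge sets (symmetric difference, i.e. arithmetic over GF(2)) and computes the dimension of the cycle space as $|E| - N + c$, with $c$ the number of connected components. This is not Horton's algorithm; the graph and all names are assumptions for illustration.

```python
def edge(u, v):
    """Undirected edge as a frozenset so that {u, v} == {v, u}."""
    return frozenset((u, v))


def cycle_sum(c1, c2):
    """Sum of two cycles in the cycle space: symmetric difference of edge sets."""
    return c1 ^ c2


def cycle_space_dimension(num_vertices, edges):
    """|E| - N + (number of connected components), via union-find."""
    parent = list(range(num_vertices + 1))  # vertices are 1-indexed

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for e in edges:
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - num_vertices + components


# Two triangles sharing the edge {2, 3}.
C1 = {edge(1, 2), edge(2, 3), edge(1, 3)}
C2 = {edge(2, 3), edge(3, 4), edge(2, 4)}
C_sum = cycle_sum(C1, C2)          # the shared edge cancels
dim = cycle_space_dimension(4, C1 | C2)
print(sorted(tuple(sorted(e)) for e in C_sum), dim)
```

In this toy graph the two triangles already span the two-dimensional cycle space, so the 4-cycle obtained as their sum is not needed in a basis.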

  10. Cycle sparsity
     Theorem: $K$ is completely determined, up to $\mathcal{D}$-similarity, by its principal minors of order $\le \ell$.
     Key: signs of $\prod_{\{i,j\} \in C} K_{i,j}$ for each cycle $C$ of length $\le \ell$.

  11. Learning the signs
     • Assumption: $K \in \mathcal{K}_\alpha$, i.e., either $K_{i,j} = 0$ or $|K_{i,j}| \ge \alpha > 0$.
     • All $K_{i,i}$'s and $|K_{i,j}|$'s are estimated within $n^{-1/2}$-rate.
     • $G$ is recovered exactly w.h.p.
     • Horton's algorithm outputs a minimum basis $\mathcal{B}$.
     • For every induced cycle $C \in \mathcal{B}$: $\det(K_C) = F_C\big(K_{i,i}, K_{i,j}^2\big) + 2 (-1)^{|C|+1} \prod_{\{i,j\} \in C} K_{i,j}$.
     • Recover the sign of $\prod_{\{i,j\} \in C} K_{i,j}$ w.h.p.
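
For a cycle of length 3 the identity above can be written out in full: $F_C = K_{1,1}K_{2,2}K_{3,3} - K_{1,1}K_{2,3}^2 - K_{2,2}K_{1,3}^2 - K_{3,3}K_{1,2}^2$ and the remaining term is $2 K_{1,2} K_{1,3} K_{2,3}$. The sketch below uses this special case to read off the sign of the product from the minor, the diagonal, and the squared off-diagonal entries. It works with exact values of a hypothetical kernel; in the learning setting these inputs would be replaced by estimates. All names are assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def triangle_product_sign(diag, squares, det_KC):
    """Sign of K12*K13*K23 from det(K_C), (K11, K22, K33) and (K12^2, K13^2, K23^2)."""
    k11, k22, k33 = diag
    s12, s13, s23 = squares
    # F_C depends only on the diagonal and the squared off-diagonal entries.
    f_c = k11 * k22 * k33 - k11 * s23 - k22 * s13 - k33 * s12
    residual = det_KC - f_c  # equals 2*K12*K13*K23 when |C| = 3
    # Under the assumption |K_ij| >= alpha > 0 the residual is never zero.
    return 1 if residual > 0 else -1


# Hypothetical kernels that differ only in the sign of K12.
K_pos = [[0.5, 0.2, 0.1], [0.2, 0.4, 0.1], [0.1, 0.1, 0.3]]
K_neg = [[0.5, -0.2, 0.1], [-0.2, 0.4, 0.1], [0.1, 0.1, 0.3]]

diag = (0.5, 0.4, 0.3)
squares = (0.04, 0.01, 0.01)  # identical for both kernels

sign_pos = triangle_product_sign(diag, squares, det3(K_pos))
sign_neg = triangle_product_sign(diag, squares, det3(K_neg))
print(sign_pos, sign_neg)
```

Both kernels share the same diagonal and magnitudes, so only the order-3 minor separates them, which is why minors up to the cycle sparsity are needed.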

  12. Main result
     Theorem: Let $K \in \mathcal{K}_\alpha$ with cycle sparsity $\ell$ and let $\varepsilon > 0$. Then, the following holds with probability at least $1 - n^{-A}$: there is an algorithm that outputs $\hat{K}$ in $O(|E|^3 + nN^2)$ steps for which
     $n \gtrsim \left( \frac{1}{\alpha^2 \varepsilon^2} + \ell \left( \frac{2}{\alpha} \right)^{2\ell} \right) \ln N \ \Rightarrow\ \min_D \| \hat{K} - DKD \|_\infty \le \varepsilon$.
     Near-optimal rate in a minimax sense.

  13. Conclusions
     • Estimation of $K$ by a method of moments in polynomial time.
     • Rates of estimation characterized by the topology of the determinantal graph through its cycle sparsity $\ell$.
     • These rates are provably optimal (up to logarithmic factors).
     • Adaptation to $\ell$.
