 
Learning Determinantal Processes with Moments and Cycles
J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet
ICML 2017, Sydney
Determinantal Point Processes (DPPs)
DPP: random subset $Z$ of $[N]$.
• For all $J \subseteq [N]$, $\mathbb{P}[J \subseteq Z] = \det(K_J)$.
• $K \in \mathbb{R}^{N \times N}$, symmetric, $0 \preceq K \preceq I_N$: parameter (kernel) of the DPP.
• $K_J = (K_{i,j})_{i,j \in J}$.
• E.g.: $\mathbb{P}[1 \in Z] = K_{1,1}$ and $\mathbb{P}[\{1,2\} \subseteq Z] = K_{1,1}K_{2,2} - K_{1,2}^2 \le \mathbb{P}[1 \in Z]\,\mathbb{P}[2 \in Z]$.
• A.k.a. $L$-ensembles if $0 \prec K \prec I_N$: $\mathbb{P}[Z = J] \propto \det(L_J)$, where $L = K(I_N - K)^{-1}$.
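The defining identity $\mathbb{P}[J \subseteq Z] = \det(K_J)$ and the $L$-ensemble form can be sanity-checked numerically on a tiny ground set. The Python/NumPy sketch below is a brute-force illustration, not part of the method: it draws a random kernel with spectrum in $(0,1)$, computes $\mathbb{P}[Z = J] = \det(L_J)/\det(L + I_N)$ for all $2^N$ subsets, and sums over supersets. The helper names `random_kernel`, `atom_probabilities` and `inclusion_probability` are hypothetical, and indices are 0-based.

```python
import itertools
import numpy as np

def random_kernel(N, seed=0):
    """Random symmetric kernel K = Q diag(eigs) Q^T with eigenvalues in (0, 1)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
    eigs = rng.uniform(0.1, 0.9, size=N)
    return Q @ np.diag(eigs) @ Q.T

def atom_probabilities(K):
    """P[Z = J] for every J ⊆ {0, ..., N-1}, via the L-ensemble L = K (I - K)^{-1}."""
    N = K.shape[0]
    L = K @ np.linalg.inv(np.eye(N) - K)
    normalizer = np.linalg.det(L + np.eye(N))  # sum over J of det(L_J) equals det(L + I)
    probs = {}
    for r in range(N + 1):
        for J in itertools.combinations(range(N), r):
            # The empty set gets det(L_empty) = 1 by convention.
            minor = np.linalg.det(L[np.ix_(J, J)]) if J else 1.0
            probs[J] = minor / normalizer
    return probs

def inclusion_probability(probs, J):
    """P[J ⊆ Z], obtained by summing P[Z = S] over all supersets S of J."""
    return sum(p for S, p in probs.items() if set(J) <= set(S))

K = random_kernel(4)
probs = atom_probabilities(K)
for J in [(0,), (0, 2), (1, 2, 3)]:
    # The two printed numbers should agree up to round-off.
    print(J, inclusion_probability(probs, J), np.linalg.det(K[np.ix_(J, J)]))
```

The enumeration is exponential in $N$, so it is only meant for checking identities, not for real data.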
Binary representation
DPP ⟷ random binary vector of size $N$, represented as a subset of $[N]$.
$\{1,4,5,7,9,10,12,15,19\}$ ⟷ 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0
$\{3,4,6,8,9,12,15,19\}$ ⟷ 0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0
$(X_1, \dots, X_N) \in \{0,1\}^N \longleftrightarrow Z \subseteq [N]$: $X_i = 1 \iff i \in Z$.
Model for correlated Bernoulli r.v.'s (such as Ising, …) featuring repulsion.
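The subset/binary-vector correspondence is a plain bijection. A minimal Python sketch, using the slide's 1-based labels and its first example with $N = 20$ (the function names are hypothetical):

```python
def subset_to_binary(Z, N):
    """Encode Z ⊆ [N] = {1, ..., N} as (X_1, ..., X_N) with X_i = 1 iff i in Z."""
    return [1 if i in Z else 0 for i in range(1, N + 1)]

def binary_to_subset(X):
    """Decode a 0/1 vector back into the subset {i : X_i = 1} (1-based labels)."""
    return {i for i, x in enumerate(X, start=1) if x == 1}

Z = {1, 4, 5, 7, 9, 10, 12, 15, 19}
X = subset_to_binary(Z, 20)
print("".join(map(str, X)))  # 10011010110100100010
```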
Applications of DPPs
DPPs have become popular in various applications:
• Quantum physics (fermionic processes) [Macchi '74]
• Document and timeline summarization [Lin, Bilmes '12; Yao et al. '16]
• Image search [Kulesza, Taskar '11; Affandi et al. '14]
• Bioinformatics [Batmanghelich et al. '14]
• Neuroscience [Snoek et al. '13]
• Modeling of wireless or cellular networks [Miyoshi, Shirai '14; Torrisi, Leonardi '14; Li et al. '15; Deng et al. '15]
And they remain an elegant and important tool in probability theory [Borodin '11].
Learning DPPs
• Given $Z_1, Z_2, \dots, Z_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K)$, estimate $K$.
• Approach: method of moments.
• Problem: is $K$ identified?
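To experiment with the estimation problem one needs iid samples. The sketch below is a hypothetical brute-force sampler, not an algorithm from the talk: it assumes $0 \prec K \prec I_N$ and draws subsets from all $2^N$ candidates with probabilities $\det(L_J)/\det(L + I_N)$, so it is only usable for very small $N$ (labels are 0-based).

```python
import itertools
import numpy as np

def sample_dpp_bruteforce(K, n, seed=0):
    """Draw Z_1, ..., Z_n iid from DPP(K) by enumerating all 2^N subsets.

    Assumes 0 < K < I (so the L-ensemble exists) and a small N: the cost is
    exponential in N, so this is only a toy data generator.
    """
    N = K.shape[0]
    L = K @ np.linalg.inv(np.eye(N) - K)
    subsets = [J for r in range(N + 1) for J in itertools.combinations(range(N), r)]
    weights = np.array([np.linalg.det(L[np.ix_(J, J)]) if J else 1.0 for J in subsets])
    weights = np.clip(weights, 0.0, None)  # guard against tiny negative round-off
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(subsets), size=n, p=weights / weights.sum())
    return [set(subsets[k]) for k in idx]

# Example kernel: eigenvalues 0.5 and 0.5 ± 0.2*sqrt(2), all inside (0, 1).
K = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.5, 0.2],
              [0.0, 0.2, 0.5]])
samples = sample_dpp_bruteforce(K, n=5)
print(samples)
```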
Identification: $D$-similarity
• $\mathrm{DPP}(K') = \mathrm{DPP}(K) \iff \det(K'_J) = \det(K_J)\ \forall J \subseteq [N] \iff K' = DKD$ for some $D = \mathrm{diag}(\pm 1, \dots, \pm 1)$ [Oeding '11].
• E.g.: the sign patterns of $K$ and of $DKD$ (figure); conjugation by $D$ flips the sign of $K_{i,j}$ exactly when $d_i d_j = -1$.
• $K$ and $DKD$ are called $D$-similar.
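One direction of the equivalence is a one-line computation: $\det(D_J K_J D_J) = \det(D_J)^2 \det(K_J) = \det(K_J)$. The Python sketch below checks it numerically for one hypothetical choice of $D$ on a random symmetric matrix (positive semidefiniteness plays no role in this algebraic identity); `principal_minors` is an illustrative helper name.

```python
import itertools
import numpy as np

def principal_minors(K):
    """Map every nonempty J ⊆ {0, ..., N-1} (as a sorted tuple) to det(K_J)."""
    N = K.shape[0]
    return {J: np.linalg.det(K[np.ix_(J, J)])
            for r in range(1, N + 1)
            for J in itertools.combinations(range(N), r)}

rng = np.random.default_rng(1)
N = 5
A = rng.standard_normal((N, N))
K = (A + A.T) / 2                         # any symmetric matrix works for this check
d = np.array([1.0, -1.0, 1.0, -1.0, -1.0])  # a hypothetical choice of signs
D = np.diag(d)
K_sim = D @ K @ D                         # (DKD)_{ij} = d_i d_j K_{ij}

m, m_sim = principal_minors(K), principal_minors(K_sim)
# Largest discrepancy over all 31 principal minors: round-off level only.
print(max(abs(m[J] - m_sim[J]) for J in m))
```

Flipping the sign of a single symmetric pair $K_{i,j}, K_{j,i}$ that is not of the form $DKD$ generally changes some $3 \times 3$ minor, which is why the signs are only identified up to $D$-similarity.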
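Method of moments
• Diagonal entries: $K_{i,i} = \mathbb{P}[i \in Z]$, estimated by $\hat K_{i,i} = \frac{1}{n}\sum_{p=1}^{n} \mathbb{1}_{i \in Z_p}$.
• Magnitude of the off-diagonal entries: $K_{i,j}^2 = K_{i,i}K_{j,j} - \mathbb{P}[\{i,j\} \subseteq Z]$, estimated by $|\hat K_{i,j}| = \big(\hat K_{i,i}\hat K_{j,j} - \frac{1}{n}\sum_{p=1}^{n} \mathbb{1}_{\{i,j\} \subseteq Z_p}\big)_+^{1/2}$.
• Signs (up to $D$-similarity)? Use estimates of higher moments: $\det(K_J) = \mathbb{P}[J \subseteq Z] \approx \frac{1}{n}\sum_{p=1}^{n} \mathbb{1}_{J \subseteq Z_p}$.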
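A minimal Python sketch of these moment estimators, assuming the samples are given as Python sets over 0-based labels. The function names are hypothetical and the four-sample list is made up purely to show the arithmetic.

```python
from itertools import combinations

def moment_estimates(samples, N):
    """Method-of-moments estimates from observed subsets Z_1, ..., Z_n of {0, ..., N-1}.

    diag[i]          = (1/n) #{p : i in Z_p}                                 (estimates K_ii)
    abs_off[(i, j)]  = sqrt((diag[i] diag[j] - (1/n) #{p : {i,j} ⊆ Z_p})_+)   (estimates |K_ij|)
    """
    n = len(samples)
    diag = [sum(i in Z for Z in samples) / n for i in range(N)]
    abs_off = {}
    for i, j in combinations(range(N), 2):
        pair = sum({i, j} <= Z for Z in samples) / n
        # Positive part: sampling noise can make the plug-in value negative.
        abs_off[(i, j)] = max(diag[i] * diag[j] - pair, 0.0) ** 0.5
    return diag, abs_off

def minor_estimate(samples, J):
    """Estimate of det(K_J) = P[J ⊆ Z] by the empirical frequency of J ⊆ Z_p."""
    return sum(set(J) <= Z for Z in samples) / len(samples)

samples = [{0, 1}, {0}, {1, 2}, {0, 1, 2}]
diag, abs_off = moment_estimates(samples, N=3)
print(diag)                                                 # [0.75, 0.75, 0.5]
print({pair: round(v, 3) for pair, v in abs_off.items()})  # {(0, 1): 0.25, (0, 2): 0.354, (1, 2): 0.0}
print(minor_estimate(samples, (0, 1, 2)))                   # 0.25
```

The magnitudes come for free from first and second moments; only the signs require the higher-order minors.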
Determinantal Graphs
Definition: $G = ([N], E)$ with $\{i,j\} \in E \iff K_{i,j} \neq 0$.
Examples: [figure: two kernels $K$ with some zero off-diagonal entries, and their determinantal graphs]
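Building $G$ from a kernel is a direct reading of the definition. In the Python sketch below (hypothetical names, 0-based vertices), the optional threshold `tol` is an assumption standing in for the thresholding one would apply to an estimated kernel; with `tol=0` it matches the definition exactly. The example kernel is diagonally dominant, so by Gershgorin its eigenvalues lie in $[0.1, 0.9]$, and its graph is a 4-cycle.

```python
import numpy as np

def determinantal_graph(K, tol=0.0):
    """Edge set of G: {i, j} (stored as (i, j) with i < j) is an edge iff |K_ij| > tol.

    tol = 0 is the exact definition; a positive tol is a hypothetical
    thresholding step for an estimated kernel.
    """
    N = K.shape[0]
    return {(i, j) for i in range(N) for j in range(i + 1, N) if abs(K[i, j]) > tol}

K = np.array([[0.5, 0.2, 0.0, 0.1],
              [0.2, 0.5, 0.2, 0.0],
              [0.0, 0.2, 0.5, 0.2],
              [0.1, 0.0, 0.2, 0.5]])
print(sorted(determinantal_graph(K)))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```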
Cycle sparsity
• Cycle basis: family of induced cycles that span the cycle space (figure: cycles $C_1$, $C_2$ and their sum $C_1 + C_2$).
• Cycle sparsity: length $\ell$ of the largest cycle needed to span the cycle space.
• Horton's algorithm: finds a cycle basis with cycle lengths $\le \ell$ in $O(|E|^2 N / \ln N)$ steps [Horton '87; Amaldi et al. '10].
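The Python sketch below illustrates the idea behind Horton's algorithm on small unweighted graphs: for every vertex $v$ and edge $\{x,y\}$ it forms the candidate cycle $P(v,x) + \{x,y\} + P(y,v)$ from a BFS tree rooted at $v$, then scans candidates by length and keeps each one that is linearly independent over GF(2) of those already kept. It is a plain illustration with hypothetical names and does not attain the $O(|E|^2 N / \ln N)$ bound of [Amaldi et al. '10]; the cycle sparsity $\ell$ is read off as the longest cycle in the resulting minimum basis.

```python
from collections import deque

def bfs_parents(adj, root):
    """Shortest-path (BFS) tree rooted at root, as a parent map."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent

def tree_path(parent, x):
    """Vertices on the tree path x -> root."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def min_cycle_basis(edges):
    """Horton-style minimum cycle basis; cycles are returned as edge bitmasks."""
    edges = [tuple(sorted(e)) for e in edges]
    bit = {e: 1 << k for k, e in enumerate(edges)}
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)

    def path_mask(path):
        mask = 0
        for a, b in zip(path, path[1:]):
            mask ^= bit[tuple(sorted((a, b)))]
        return mask

    candidates = set()
    for v in adj:
        parent = bfs_parents(adj, v)
        for x, y in edges:
            if x not in parent or parent[x] == y or parent[y] == x:
                continue  # other component, or {x, y} is a tree edge
            px, py = tree_path(parent, x), tree_path(parent, y)
            if set(px) & set(py) == {v}:  # paths meet only at v: a simple cycle
                candidates.add(path_mask(px) ^ path_mask(py) ^ bit[(x, y)])

    pivots, basis = {}, []
    for mask in sorted(candidates, key=lambda m: (bin(m).count("1"), m)):
        reduced = mask
        while reduced and (reduced.bit_length() - 1) in pivots:
            reduced ^= pivots[reduced.bit_length() - 1]
        if reduced:  # independent over GF(2) of the cycles kept so far
            pivots[reduced.bit_length() - 1] = reduced
            basis.append(mask)
    return basis

def cycle_sparsity(edges):
    """Length of the longest cycle in a minimum cycle basis (0 for a forest)."""
    return max((bin(m).count("1") for m in min_cycle_basis(edges)), default=0)

square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(len(min_cycle_basis(square_with_diagonal)), cycle_sparsity(square_with_diagonal))  # 2 3
```

Because cycle vectors over GF(2) form a matroid, a greedy minimum basis also minimizes the length of its longest cycle, which is why the maximum over this basis gives $\ell$.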
Cycle sparsity
Theorem: $K$ is completely determined, up to $D$-similarity, by its principal minors of order $\le \ell$.
Key: the signs of $\prod_{\{i,j\} \in C} K_{i,j}$ for each cycle $C$ of length $\le \ell$.
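Why cycle products? Under $K \mapsto DKD$ each entry becomes $d_i d_j K_{i,j}$, and around a cycle every vertex meets exactly two cycle edges, so every $d_i$ appears squared. A short Python check of this invariance on a random symmetric test matrix (the helper `cycle_product` and the chosen $D$ are hypothetical):

```python
import numpy as np

def cycle_product(K, cycle):
    """Product of K[c_k, c_{k+1}] around the closed cycle c_0 -> c_1 -> ... -> c_0."""
    return np.prod([K[cycle[k], cycle[(k + 1) % len(cycle)]] for k in range(len(cycle))])

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
K = (A + A.T) / 2
D = np.diag([1.0, -1.0, -1.0, 1.0, -1.0])
K_sim = D @ K @ D

cycle = [0, 1, 3, 4]
# Each d_i enters twice along the cycle, so the product is a D-similarity invariant.
print(cycle_product(K, cycle), cycle_product(K_sim, cycle))
# A single entry is not invariant: here d_0 d_1 = -1, so (DKD)_{0,1} = -K_{0,1}.
print(K[0, 1], K_sim[0, 1])
```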
Learning the signs
• Assumption: $K \in \mathcal{K}_\alpha$, i.e., either $K_{i,j} = 0$ or $|K_{i,j}| \ge \alpha > 0$.
• All $K_{i,i}$'s and $|K_{i,j}|$'s are estimated at the $n^{-1/2}$ rate.
• $G$ is recovered exactly w.h.p.
• Horton's algorithm outputs a minimum cycle basis $\mathcal{B}$.
• For every induced cycle $C \in \mathcal{B}$: $\det(K_C) = F_C\big(K_{i,i}, K_{i,j}^2\big) + 2(-1)^{|C|+1}\prod_{\{i,j\} \in C} K_{i,j}$.
• Recover the sign of $\prod_{\{i,j\} \in C} K_{i,j}$ w.h.p.
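The last two bullets can be sketched as follows. Since $F_C$ depends only on the diagonal entries and the squared magnitudes, one can evaluate it on the all-nonnegative matrix $M$ with the same magnitudes and subtract $M$'s own cycle term; what remains of $\det(K_C) - F_C$ is $2(-1)^{|C|+1}\prod K_{i,j}$. The Python function below (hypothetical name, exact inputs) illustrates that identity and is not the authors' code; with estimated inputs, the sign is only trustworthy when the estimation errors are small compared with $2\prod |K_{i,j}| \ge 2\alpha^{|C|}$.

```python
import numpy as np

def cycle_sign(det_KC, diag, abs_edges):
    """Recover sign(prod_{{i,j} in C} K_ij) for an induced cycle C of length l >= 3.

    Vertices c_0, ..., c_{l-1} are listed in cycle order; diag[k] = K_{c_k c_k} and
    abs_edges[k] = |K_{c_k c_{k+1}}| (indices mod l). Based on
        det(K_C) = F_C(K_ii, K_ij^2) + 2 (-1)^(l+1) prod K_ij.
    """
    l = len(diag)
    M = np.diag(np.asarray(diag, dtype=float))
    for k in range(l):
        a, b = k, (k + 1) % l
        M[a, b] = M[b, a] = abs_edges[k]
    sgn = (-1) ** (l + 1)
    # F_C evaluated on M: remove M's cycle term 2 (-1)^(l+1) prod |K_ij|.
    F = np.linalg.det(M) - 2 * sgn * np.prod(abs_edges)
    # det(K_C) - F = 2 (-1)^(l+1) prod K_ij, so sgn * (det(K_C) - F) has the sign of prod K_ij.
    return 1 if sgn * (det_KC - F) > 0 else -1

edges = [0.2, -0.1, 0.25, -0.15]  # K_01, K_12, K_23, K_30 of an induced 4-cycle
K = 0.6 * np.eye(4)
for k, w in enumerate(edges):
    a, b = k, (k + 1) % 4
    K[a, b] = K[b, a] = w
print(cycle_sign(np.linalg.det(K), np.diag(K), np.abs(edges)))  # 1 (the product is positive)
```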
Main result
Theorem: Let $K \in \mathcal{K}_\alpha$ with cycle sparsity $\ell$ and let $\varepsilon, A > 0$. If $n \gtrsim \Big(\frac{1}{\alpha^2 \varepsilon^2} + \ell \big(\frac{2}{\alpha}\big)^{2\ell}\Big) \ln N$, then the following holds with probability at least $1 - N^{-A}$: there is an algorithm that outputs $\hat K$ in $O(|E|^3 + nN^2)$ steps for which $\min_{D} \|\hat K - DKD\|_\infty \le \varepsilon$.
Near-optimal rate in a minimax sense.
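To see how the cycle sparsity drives the sample size, the Python snippet below evaluates the order of the condition $n \gtrsim \big(\frac{1}{\alpha^2\varepsilon^2} + \ell(2/\alpha)^{2\ell}\big)\ln N$. The constant hidden in $\gtrsim$ is not given on the slide, so the parameter `C` is a placeholder and only relative sizes are meaningful; the function name and the example values of $\alpha$, $\varepsilon$ and $N$ are arbitrary.

```python
import math

def sample_size_order(alpha, eps, ell, N, C=1.0):
    """C * (1/(alpha^2 eps^2) + ell * (2/alpha)^(2*ell)) * ln N.

    C stands in for the unspecified constant hidden in the theorem's '≳';
    C = 1.0 is an arbitrary placeholder, so only ratios between values are meaningful.
    """
    return C * (1.0 / (alpha ** 2 * eps ** 2) + ell * (2.0 / alpha) ** (2 * ell)) * math.log(N)

for ell in (3, 4, 5, 6):
    print(ell, round(sample_size_order(alpha=0.5, eps=0.1, ell=ell, N=100)))
```

With $\alpha = 0.5$ the factor $(2/\alpha)^{2\ell} = 16^\ell$ quickly dominates the $1/(\alpha^2\varepsilon^2)$ term, which is the sense in which the topology of $G$ governs the rate.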
Conclusions
• Estimation of $K$ by a method of moments in polynomial time.
• Rates of estimation characterized by the topology of the determinantal graph through its cycle sparsity $\ell$.
• These rates are provably optimal (up to logarithmic factors).
• Adaptation to $\ell$.