Rates of Estimation for Discrete Determinantal Point Processes


Rates of Estimation for Discrete Determinantal Point Processes. V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel. COLT 2017, Amsterdam. Discrete DPPs: random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$.


  1. Rates of Estimation for Discrete Determinantal Point Processes. V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel. COLT 2017, Amsterdam.

  2. Discrete DPPs
     Random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$ (here $N = 20$):
     1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1,4,5,7,9,10,12,15,19}
     0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3,4,6,8,9,12,15,19}
     1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 1 ↔ {1,4,8,12,14,17,18,20}
     …
     0 0 1 0 0 1 0 1 1 0 0 0 0 0 1 1 0 1 0 0 ↔ {3,6,8,9,15,16,18}
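The correspondence between points of the hypercube and subsets can be sketched in a few lines of Python. The helper names `to_subset` and `to_bits` are illustrative and do not come from the slides; items are numbered from 1, as on the slide.

```python
def to_subset(bits):
    """Return the 1-indexed positions of the ones in a 0/1 vector."""
    return {i + 1 for i, b in enumerate(bits) if b == 1}


def to_bits(subset, n):
    """Return the length-n indicator vector of a subset of {1, ..., n}."""
    return [1 if i in subset else 0 for i in range(1, n + 1)]


# First example vector from the slide (N = 20).
bits = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
subset = to_subset(bits)
print(sorted(subset))
```

The two helpers are inverses of each other, so either representation can be used for a sample.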

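  3. Discrete DPPs
     • Probabilistic model for correlated Bernoulli r.v.
     • Feature repulsion (negative association)
     Definition: a random subset $Y \subseteq [N]$ is a DPP with kernel $K \in \mathbb{R}^{N \times N}$, symmetric, $0 \preceq K \preceq I$, if $\mathbb{P}[J \subseteq Y] = \det(K_J)$ for all $J \subseteq [N]$, where $K_J$ is the principal submatrix of $K$ indexed by $J$.
     • $K_{i,j}$ ↔ repulsion between items $i$ and $j$.
     • PMF: $\mathbb{P}[Y = J] = |\det(K - I_{\bar J})|$, where $I_{\bar J}$ is the diagonal matrix with ones on the complement of $J$ and zeros elsewhere.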
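As a sanity check on the definition, the sketch below evaluates the inclusion probabilities $\det(K_J)$ and the PMF $|\det(K - I_{\bar J})|$ for a small kernel and confirms that they agree. The 2×2 kernel and all function names are hypothetical choices for illustration, and the determinant is a naive Laplace expansion that is only sensible for tiny matrices. Items are 0-indexed in the code.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not m:
        return 1.0  # determinant of the empty matrix
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def marginal(K, J):
    """P[J is a subset of Y] = det(K_J)."""
    idx = sorted(J)
    return det([[K[i][j] for j in idx] for i in idx])


def pmf(K, J):
    """P[Y = J] = |det(K - I_Jbar)|: subtract 1 on the diagonal outside J."""
    n = len(K)
    shifted = [[K[i][j] - (1.0 if i == j and i not in J else 0.0)
                for j in range(n)] for i in range(n)]
    return abs(det(shifted))


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for size in range(n + 1):
        for J in combinations(range(n), size):
            yield frozenset(J)


# Hypothetical kernel with eigenvalues in [0, 1].
K = [[0.5, 0.3], [0.3, 0.4]]
probs = {J: pmf(K, J) for J in all_subsets(2)}
for J, p in probs.items():
    print(sorted(J), round(p, 4))
```

For this kernel, both items appear together with probability $\det(K) = 0.11$, which is below the product $0.5 \times 0.4 = 0.2$ of the individual inclusion probabilities. That gap is the repulsion the slide refers to.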

  4. Goal
     • Given $Y_1, Y_2, \ldots, Y_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K^*)$, estimate $K^*$.
     • Approach: Maximum Likelihood Estimator.
     • Question: rate of convergence of the MLE?

  5. Identification
     • $\mathrm{DPP}(K) = \mathrm{DPP}(K^*)$ ⇔ $\det(K_J) = \det(K^*_J)$ for all $J \subseteq [N]$ ⇔ $K = D K^* D$ for some $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$.
     • E.g.: $K^* \rightsquigarrow D K^* D$, which flips the signs of the selected rows and the matching columns of $K^*$. [Figure: 4×4 sign patterns of $K^*$ and $D K^* D$.]
     • Measure of the error of an estimator $\hat K$: $\ell(\hat K, K^*) = \min_D \|\hat K - D K^* D\|_F$.
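The error measure can be evaluated by brute force for small $N$. The following is a minimal sketch that enumerates all $2^N$ sign matrices $D$, so it is only practical for small ground sets. The kernel and the function names are hypothetical, and the slides do not say how the minimum is computed.

```python
from itertools import product
from math import sqrt


def flip(K, signs):
    """Return D K D for D = diag(signs): entry (i, j) is scaled by signs[i] * signs[j]."""
    n = len(K)
    return [[signs[i] * K[i][j] * signs[j] for j in range(n)] for i in range(n)]


def frobenius(A, B):
    """Frobenius norm of A - B."""
    return sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


def loss(K_hat, K_star):
    """min over D = diag(+-1) of ||K_hat - D K_star D||_F, by exhaustive search."""
    n = len(K_star)
    return min(frobenius(K_hat, flip(K_star, s)) for s in product((1, -1), repeat=n))


# Hypothetical 3x3 kernel and a sign-flipped copy of it.
K_star = [[0.5, 0.3, 0.0], [0.3, 0.4, -0.2], [0.0, -0.2, 0.6]]
K_hat = flip(K_star, (1, -1, 1))

print(frobenius(K_hat, K_star))  # positive: the matrices differ entrywise
print(loss(K_hat, K_star))       # zero: they define the same DPP
```

The plain Frobenius distance would penalize `K_hat` even though it is statistically indistinguishable from `K_star`, which is why the minimum over $D$ is taken.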

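  6. Maximum likelihood estimation
     • Log-likelihood: $\hat\Psi(K) = \sum_{J \subseteq [N]} \hat p_J \ln|\det(K - I_{\bar J})|$, where $\hat p_J$ is the empirical frequency of $J$ in the sample.
     • MLE: $\hat K \in \operatorname{argmax}_K \hat\Psi(K)$.
     • Expected log-likelihood: $\Psi(K) \triangleq \mathbb{E}[\hat\Psi(K)] = \sum_{J \subseteq [N]} p^*_J \ln|\det(K - I_{\bar J})| = \Psi(K^*) - \mathrm{KL}(\mathrm{DPP}(K^*), \mathrm{DPP}(K))$, where $p^*_J$ is the probability of $J$ under $\mathrm{DPP}(K^*)$.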
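The two log-likelihoods can be written down directly from the PMF $\mathbb{P}[Y = J] = |\det(K - I_{\bar J})|$. The sketch below assumes tiny hypothetical 2×2 kernels and exhaustive enumeration of subsets. The function names are illustrative, and no optimizer is included, since the slides do not specify how the MLE is computed.

```python
from itertools import combinations
from math import log


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def pmf(K, J):
    """P[Y = J] = |det(K - I_Jbar)| for a set J of 0-indexed items."""
    n = len(K)
    return abs(det([[K[i][j] - (1.0 if i == j and i not in J else 0.0)
                     for j in range(n)] for i in range(n)]))


def subsets(n):
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]


def empirical_loglik(K, samples):
    """Psi_hat(K): average log-probability of the observed subsets under DPP(K)."""
    return sum(log(pmf(K, J)) for J in samples) / len(samples)


def expected_loglik(K, K_star):
    """Psi(K): expectation of Psi_hat(K) when the data come from DPP(K_star)."""
    return sum(pmf(K_star, J) * log(pmf(K, J)) for J in subsets(len(K)))


K_star = [[0.5, 0.3], [0.3, 0.4]]     # hypothetical true kernel
K_other = [[0.5, 0.1], [0.1, 0.4]]    # a different kernel
K_flip = [[0.5, -0.3], [-0.3, 0.4]]   # D K_star D with D = diag(1, -1)

print(expected_loglik(K_star, K_star) - expected_loglik(K_other, K_star))  # KL divergence, positive
print(expected_loglik(K_star, K_star) - expected_loglik(K_flip, K_star))   # zero up to rounding
```

The first difference is the KL divergence between the two DPPs. The second vanishes because a sign flip leaves every principal minor unchanged, so the likelihood can only identify $K^*$ up to $D K^* D$.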

  7. Likelihood geometry
     • Fisher information: $-\nabla^2 \Psi(K^*)$.
     • [Figure: two plots of $\Psi(K)$ against $K$ near $K^*$, one with $\nabla^2 \Psi(K^*) < 0$ and one with $\nabla^2 \Psi(K^*) = 0$.]
     • What is the order of the first nondegenerate derivative of $\Psi$ at $K = K^*$?

  8. Determinantal Graphs & Irreducibility
     Definition: $G = ([N], E)$, where $\{i,j\} \in E$ ⇔ $K^*_{i,j} \neq 0$.
     • $K^*$ is irreducible iff $G$ is connected.
     • Otherwise, $K^*$ is block diagonal (up to a permutation of the items).
     • Rk: $K^*$ is block diagonal ⇒ $Y$ = union of independent DPPs.
     • Write $i \sim j$ when $i$ and $j$ are connected in $G$.
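Irreducibility is a plain graph-connectivity question, so it can be checked with a depth-first search over the nonzero pattern of the kernel. This is a minimal sketch with hypothetical function names and example kernels. The `tol` threshold for treating an entry as zero is an added assumption, useful when the kernel is only known numerically.

```python
def blocks(K, tol=0.0):
    """Connected components of the determinantal graph of K (0-indexed items)."""
    n = len(K)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                # {i, j} is an edge iff K[i][j] is nonzero.
                if j not in seen and abs(K[i][j]) > tol:
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(component))
    return components


def is_irreducible(K, tol=0.0):
    """K is irreducible iff its determinantal graph is connected."""
    return len(blocks(K, tol)) == 1


# Hypothetical kernels: a path graph 0 - 1 - 2, and one with item 2 isolated.
K_path = [[0.5, 0.3, 0.0], [0.3, 0.4, -0.2], [0.0, -0.2, 0.6]]
K_block = [[0.5, 0.3, 0.0], [0.3, 0.4, 0.0], [0.0, 0.0, 0.6]]

print(blocks(K_path), is_irreducible(K_path))
print(blocks(K_block), is_irreducible(K_block))
```

Note that `K_path` is irreducible even though its (0, 2) entry is zero: items 0 and 2 are connected through item 1, so $0 \sim 2$.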

  9. Main Results: Irreducible case
     Theorem 1: $K^*$ irreducible ⇔ $\nabla^2 \Psi(K^*)$ is negative definite.
     Statistical consequences:
     ➢ $\ell(\hat K, K^*) = O_{\mathbb{P}}(n^{-1/2})$
     ➢ CLT

  10. Main Results: Block diagonal case (1)
     Theorem 2: $\operatorname{Ker} \nabla^2 \Psi(K^*) = \{H \in \mathbb{R}^{N \times N} : H_{i,j} = 0, \ \forall i \sim j\}$. $\nabla^2 \Psi(K^*)$ is negative definite along directions supported on the blocks of $K^*$.
     Theorem 3: For $H \in \operatorname{Ker} \nabla^2 \Psi(K^*) \setminus \{0\}$: $\nabla^3 \Psi(K^*)(H^{\otimes 3}) = 0$ and $\nabla^4 \Psi(K^*)(H^{\otimes 4}) < 0$.
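The degeneracy described by these two theorems can be seen by hand on the smallest reducible example. The computation below is an illustration under assumptions that are not in the slides: $N = 2$, $K^* = \tfrac12 I$ (two singleton blocks), and the off-block direction $H$ with $H_{1,2} = H_{2,1} = 1$ and zero diagonal. It uses the PMF $\mathbb{P}[Y = J] = |\det(K - I_{\bar J})|$.

```latex
K_t = K^* + tH = \begin{pmatrix} 1/2 & t \\ t & 1/2 \end{pmatrix},
\qquad |t| < \tfrac12, \qquad p^*_J = \tfrac14 \ \text{for all } J \subseteq \{1,2\}.

\mathbb{P}_{K_t}[Y = \emptyset] = \mathbb{P}_{K_t}[Y = \{1,2\}] = \tfrac14 - t^2,
\qquad
\mathbb{P}_{K_t}[Y = \{1\}] = \mathbb{P}_{K_t}[Y = \{2\}] = \tfrac14 + t^2.

\Psi(K_t) = \tfrac14 \left[ 2\ln\!\left(\tfrac14 - t^2\right) + 2\ln\!\left(\tfrac14 + t^2\right) \right]
          = \tfrac12 \ln\!\left(\tfrac1{16} - t^4\right),

\Psi(K_t) - \Psi(K^*) = \tfrac12 \ln\!\left(1 - 16 t^4\right) = -8 t^4 + O(t^8).
```

In this example the first three derivatives of $t \mapsto \Psi(K_t)$ vanish at $t = 0$ and the fourth equals $-8 \cdot 4! = -192 < 0$, which matches the pattern stated in Theorem 3: the expected log-likelihood is quartic, not quadratic, along the off-block direction.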

  11. Main Results: Block diagonal case (2)
     Statistical consequences:
     ➢ $\ell(\hat K, K^*) = O_{\mathbb{P}}(n^{-1/6})$
     ➢ $\ell(\hat K_S, K^*_S) = O_{\mathbb{P}}(n^{-1/2})$ for all blocks $S$ of $K^*$.

  12. Conclusions
     • Rates of convergence of the MLE: $n^{-1/2}$ if $K^*$ is irreducible, $n^{-1/6}$ otherwise.
     • Rate only determined by connectedness of the determinantal graph.
     • Hidden constants can be arbitrarily large in $N$: e.g., if $G$ is a path graph.
     • In another paper we show that the sample complexity of a method-of-moments* estimator is determined by the cycle sparsity of $G$.
     * Learning Determinantal Point Processes from Moments and Cycles, J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet, ICML 2017.
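To get a feel for the gap between the two rates, the arithmetic below treats each rate as an exact scaling $\text{error} \approx C\, n^{-\alpha}$ with a fixed constant $C$. This is a simplifying assumption, since the slides only state $O_{\mathbb{P}}$ bounds with constants that may depend on $N$.

```latex
C\,(n')^{-\alpha} = \tfrac12\, C\, n^{-\alpha}
\iff n' = 2^{1/\alpha}\, n,
\qquad
\alpha = \tfrac12 \Rightarrow n' = 4n,
\qquad
\alpha = \tfrac16 \Rightarrow n' = 64n.
```

Under that assumption, halving the error takes 4 times as many samples in the irreducible case and 64 times as many in the block diagonal case.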
