Learning Determinantal Processes with Moments and Cycles (PowerPoint presentation)

SLIDE 1

Learning Determinantal Processes with Moments and Cycles

  • J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet

ICML 2017, Sydney

SLIDE 2

Determinantal Point Processes (DPPs)

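DPP: random subset $Y$ of $[N]$

  • For all $J \subseteq [N]$,

$$\mathbb{P}[J \subseteq Y] = \det(K_J)$$

  • $K \in \mathbb{R}^{N \times N}$, symmetric, $0 \preceq K \preceq I_N$: parameter (kernel) of the DPP
  • $K_J = (K_{i,j})_{i,j \in J}$
  • E.g.

$$\mathbb{P}[1 \in Y] = K_{1,1}, \qquad \mathbb{P}[\{1,2\} \subseteq Y] = K_{1,1}K_{2,2} - K_{1,2}^2 \le \mathbb{P}[1 \in Y]\,\mathbb{P}[2 \in Y].$$

  • A.k.a. $L$-ensembles if $0 \prec K \prec I_N$: $\mathbb{P}[Y = J] \propto \det(L_J)$, with

$$L = K(I_N - K)^{-1}.$$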
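To make the two formulas on this slide concrete, here is a small NumPy sketch. The kernel values are made up for illustration, and vertices are 0-based in the code rather than 1-based as on the slide. It computes $\mathbb{P}[J \subseteq Y] = \det(K_J)$ directly and checks it against brute-force enumeration of the $L$-ensemble probabilities, using the standard normalisation $\sum_J \det(L_J) = \det(L + I_N)$. The helper names `inclusion_prob` and `l_ensemble_probs` are hypothetical.

```python
import itertools
import numpy as np

def inclusion_prob(K, J):
    """P[J ⊆ Y] = det(K_J) for Y ~ DPP(K); the empty minor is 1 by convention."""
    J = list(J)
    if not J:
        return 1.0
    return float(np.linalg.det(K[np.ix_(J, J)]))

def l_ensemble_probs(K):
    """P[Y = J] for every J ⊆ {0, ..., N-1}, assuming 0 ≺ K ≺ I_N.

    Uses L = K (I_N - K)^{-1} and P[Y = J] = det(L_J) / det(L + I_N).
    Enumerates all 2^N subsets, so it is only meant for small N.
    """
    N = K.shape[0]
    I = np.eye(N)
    L = K @ np.linalg.inv(I - K)
    Z = np.linalg.det(L + I)
    probs = {}
    for r in range(N + 1):
        for J in itertools.combinations(range(N), r):
            J = list(J)
            minor = np.linalg.det(L[np.ix_(J, J)]) if J else 1.0
            probs[frozenset(J)] = float(minor / Z)
    return probs

# Made-up kernel with all eigenvalues strictly between 0 and 1.
K = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.4, 0.1],
              [0.0, 0.1, 0.3]])

probs = l_ensemble_probs(K)
brute = sum(p for J, p in probs.items() if {0, 1} <= J)  # P[{0, 1} ⊆ Y] by summation
print(round(brute, 6), round(inclusion_prob(K, [0, 1]), 6))  # 0.16 0.16
```

Both routes give $K_{0,0}K_{1,1} - K_{0,1}^2 = 0.16$, which is below $K_{0,0}K_{1,1} = 0.2$, illustrating the repulsion inequality on the slide.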

SLIDE 3

Binary representation

$(X_1, \dots, X_N) \in \{0,1\}^N \;\leftrightarrow\; Y \subseteq [N]$, with $X_i = 1 \iff i \in Y$.

Example ($N = 20$):

1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1, 4, 5, 7, 9, 10, 12, 15, 19}
0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3, 4, 6, 8, 9, 12, 15, 19}

DPP ⟷ random binary vector of size $N$, represented as a subset of $[N]$.

Model for correlated Bernoulli r.v.'s (such as the Ising model, …) featuring repulsion.
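The correspondence is only a change of encoding. A minimal sketch using the two example rows, with 1-based labels as on the slide (the function names are hypothetical):

```python
def to_subset(x):
    """Binary vector (X_1, ..., X_N) -> Y = {i : X_i = 1}, with 1-based labels."""
    return {i for i, xi in enumerate(x, start=1) if xi == 1}

def to_vector(Y, N):
    """Subset Y of [N] (1-based labels) -> binary vector (X_1, ..., X_N)."""
    return [1 if i in Y else 0 for i in range(1, N + 1)]

row1 = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
row2 = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
print(sorted(to_subset(row1)))  # [1, 4, 5, 7, 9, 10, 12, 15, 19]
print(sorted(to_subset(row2)))  # [3, 4, 6, 8, 9, 12, 15, 19]
```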

SLIDE 4

Applications of DPPs

DPPs have become popular in various applications:

  • Quantum physics (fermionic processes) [Macchi '74]
  • Document and timeline summarization [Lin, Bilmes '12; Yao et al. '16]
  • Image search [Kulesza, Taskar '11; Affandi et al. '14]
  • Bioinformatics [Batmanghelich et al. '14]
  • Neuroscience [Snoek et al. '13]
  • Modeling of wireless or cellular networks [Miyoshi, Shirai '14; Torrisi, Leonardi '14; Li et al. '15; Deng et al. '15]

And they remain an elegant and important tool in probability theory [Borodin '11].

SLIDE 5

Learning DPPs

  • Given $Y_1, Y_2, \dots, Y_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K)$, estimate $K$.
  • Approach: method of moments
  • Problem: Is $K$ identified?
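To experiment with the estimation problem on toy examples, one can draw iid samples by enumerating all $2^N$ subsets. This sketch assumes the standard identity $\mathbb{P}[Y = J] = |\det(K - I_{\bar J})|$, where $I_{\bar J}$ is the diagonal 0/1 matrix with ones on the complement of $J$. It is exponential in $N$ and is a stand-in chosen for simplicity, not a sampler described on these slides; the function names are hypothetical and labels are 0-based.

```python
import itertools
import numpy as np

def exact_pmf(K):
    """All subsets J of {0, ..., N-1} with P[Y = J] = |det(K - I_{J^c})|.

    I_{J^c} is the diagonal 0/1 matrix with ones on the complement of J.
    Enumerates 2^N subsets, so this is only usable for small N.
    """
    N = K.shape[0]
    subsets, probs = [], []
    for r in range(N + 1):
        for J in itertools.combinations(range(N), r):
            comp = np.ones(N)
            for j in J:
                comp[j] = 0.0
            subsets.append(frozenset(J))
            probs.append(abs(np.linalg.det(K - np.diag(comp))))
    probs = np.array(probs)
    return subsets, probs / probs.sum()  # renormalise to absorb rounding error

def sample_dpp(K, n, rng):
    """Draw Y_1, ..., Y_n iid from DPP(K) by sampling from the enumerated pmf."""
    subsets, probs = exact_pmf(K)
    idx = rng.choice(len(subsets), size=n, p=probs)
    return [subsets[i] for i in idx]

# Made-up kernel (0-based labels); any symmetric K with 0 ⪯ K ⪯ I_N works.
K = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.4, 0.1],
              [0.0, 0.1, 0.3]])
rng = np.random.default_rng(0)
samples = sample_dpp(K, 5000, rng)
print(np.mean([0 in Y for Y in samples]))  # should be close to K[0, 0] = 0.5
```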

SLIDE 6

Identification: $\mathcal{D}$-similarity

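  • $\mathrm{DPP}(K') = \mathrm{DPP}(K) \iff \det(K'_J) = \det(K_J)$ for all $J \subseteq [N]$

$\iff K' = DKD$ for some $D = \mathrm{diag}(\pm 1, \pm 1, \dots, \pm 1)$.

  • E.g. (sign patterns), with $D = \mathrm{diag}(1, -1, -1, 1)$ flipping the signs of rows and columns 2 and 3:

$$K = \begin{pmatrix} + & + & + & + \\ + & + & + & + \\ + & + & + & + \\ + & + & + & + \end{pmatrix} \;\rightsquigarrow\; DKD = \begin{pmatrix} + & - & - & + \\ - & + & + & - \\ - & + & + & - \\ + & - & - & + \end{pmatrix}$$

  • $K$ and $DKD$ are called $\mathcal{D}$-similar.

[Oeding '11]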
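A quick numerical check of the easy direction of this equivalence: since $\det(D_J K_J D_J) = \det(D_J)^2 \det(K_J) = \det(K_J)$, every principal minor is unchanged by $K \mapsto DKD$. The sketch uses the $D$ of the example (0-based in code) and an arbitrary positive kernel; the converse direction [Oeding '11] is not checked here, and `principal_minors` is a hypothetical helper.

```python
import itertools
import numpy as np

def principal_minors(K):
    """Map each nonempty J ⊆ {0, ..., N-1} (as a tuple) to det(K_J)."""
    N = K.shape[0]
    return {J: np.linalg.det(K[np.ix_(J, J)])
            for r in range(1, N + 1)
            for J in itertools.combinations(range(N), r)}

# Made-up kernel with positive entries; its Gershgorin discs lie in [0.1, 0.9].
K = np.array([[0.5, 0.1, 0.2, 0.1],
              [0.1, 0.5, 0.1, 0.2],
              [0.2, 0.1, 0.5, 0.1],
              [0.1, 0.2, 0.1, 0.5]])
D = np.diag([1.0, -1.0, -1.0, 1.0])   # flips rows and columns 2 and 3 (1-based)
K_sim = D @ K @ D

print(np.sign(K_sim).astype(int))      # the sign pattern of DKD from the slide
m, m_sim = principal_minors(K), principal_minors(K_sim)
print(all(np.isclose(m[J], m_sim[J]) for J in m))  # True
```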

SLIDE 7

Method of moments

  • Diagonal entries: $K_{i,i} = \mathbb{P}[i \in Y]$
  • Magnitude of the off-diagonal entries: $K_{i,j}^2 = K_{i,i}K_{j,j} - \mathbb{P}[\{i,j\} \subseteq Y]$
  • Signs (up to $\mathcal{D}$-similarity)? Use estimates of higher moments.

Estimators:

$$\hat K_{i,i} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{i \in Y_k}, \qquad \hat K_{i,j}^2 = \Big(\hat K_{i,i}\hat K_{j,j} - \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{\{i,j\} \subseteq Y_k}\Big)_+,$$

$$\widehat{\det(K_J)} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{J \subseteq Y_k}.$$

SLIDE 8

Determinantal Graphs

Definition: $G = ([N], E)$, where $\{i,j\} \in E \iff K_{i,j} \neq 0$.

Examples: (figure: two sparse kernels $K$, with nonzero entries marked ∗, and their determinantal graphs)
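The determinantal graph is read off the support of $K$. In this sketch `tol` is a hypothetical parameter: `tol=0` matches the definition exactly, while a positive threshold would be needed when working with estimated entries. Labels are 0-based.

```python
import numpy as np

def determinantal_graph(K, tol=0.0):
    """Edges {i, j} (as pairs i < j, 0-based) with |K_{i,j}| > tol."""
    N = K.shape[0]
    return {(i, j) for i in range(N) for j in range(i + 1, N) if abs(K[i, j]) > tol}

# Made-up kernel whose determinantal graph is the 4-cycle 0-1-2-3-0.
K = np.array([[0.5, 0.2, 0.0, 0.2],
              [0.2, 0.5, 0.2, 0.0],
              [0.0, 0.2, 0.5, 0.2],
              [0.2, 0.0, 0.2, 0.5]])
print(sorted(determinantal_graph(K)))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```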

SLIDE 9

Cycle sparsity

  • Cycle basis: family of induced cycles that spans the cycle space
  • Cycle sparsity: length $\ell$ of the largest cycle needed to span the cycle space
  • Horton's algorithm: finds a cycle basis with cycle lengths $\le \ell$ in $O(|E|^2 N / \ln N)$ steps [Horton '87; Amaldi et al. '10]

(figure: two cycles $C_1$, $C_2$ and their sum $C_1 + C_2$ in the cycle space)
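Horton's idea can be sketched in pure Python for unweighted graphs: for every vertex $v$ and edge $\{x, y\}$, form the candidate cycle made of the shortest-path-tree paths from $v$ to $x$ and to $y$ plus the edge $\{x, y\}$, keep the simple ones, sort them by length, and greedily keep those that are linearly independent over GF(2). Cycles are stored as edge bitmasks, so the sum $C_1 + C_2$ is a XOR. This is a didactic version that makes no attempt at the $O(|E|^2 N/\ln N)$ running time of [Amaldi et al. '10]; the cycle sparsity $\ell$ is taken as the length of the longest cycle in the returned minimum basis. All function names are hypothetical and labels are 0-based.

```python
from collections import deque

def bfs_parents(adj, root):
    """Parent pointers of a breadth-first (shortest-path) tree rooted at `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent

def path_to_root(parent, x):
    """Vertices on the tree path from x up to the root."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def minimum_cycle_basis(N, edges):
    """Horton-style minimum cycle basis of a simple unweighted graph on 0..N-1.

    Each cycle is returned as a frozenset of edges (i, j) with i < j.
    """
    edges = sorted({(min(e), max(e)) for e in edges})
    bit = {e: b for b, e in enumerate(edges)}
    adj = {v: set() for v in range(N)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    def edge_mask(path):
        """Bitmask (over edge indices) of the edges along a vertex path."""
        m = 0
        for a, b in zip(path, path[1:]):
            m |= 1 << bit[(min(a, b), max(a, b))]
        return m

    # Horton candidates: tree path v -> x, edge {x, y}, tree path y -> v.
    candidates = set()
    for v in range(N):
        parent = bfs_parents(adj, v)
        for x, y in edges:
            if x not in parent or y not in parent:
                continue  # edge lies in another connected component
            px, py = path_to_root(parent, x), path_to_root(parent, y)
            if set(px) & set(py) != {v}:
                continue  # paths share more than v: not a simple cycle
            c = edge_mask(px) ^ edge_mask(py) ^ (1 << bit[(x, y)])
            if bin(c).count("1") == len(px) + len(py) - 1:  # no edge used twice
                candidates.add(c)

    # Greedy: shortest first, keep a cycle if it is independent over GF(2).
    pivots, basis = {}, []
    for c in sorted(candidates, key=lambda m: bin(m).count("1")):
        r = c
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                basis.append(c)
                break
            r ^= pivots[top]
    return [frozenset(e for e, b in bit.items() if (c >> b) & 1) for c in basis]

def cycle_sparsity(N, edges):
    """Length of the longest cycle in a minimum cycle basis (0 for a forest)."""
    return max((len(c) for c in minimum_cycle_basis(N, edges)), default=0)

k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(cycle_sparsity(4, k4))                                        # 3
print(cycle_sparsity(5, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]))  # 5
```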

SLIDE 10

Cycle sparsity

{𝑗,π‘˜}∈𝐷

𝐿𝑗,π‘˜ Theorem: 𝐿 is completely determined, up to 𝒠-similarity, by its principal minors of order ≀ β„“. Key: Signs of for each cycle of length ≀ β„“.

SLIDE 11

Learning the signs

  • Assumption: $K \in \mathcal{K}_\alpha$, i.e., either $K_{i,j} = 0$ or $|K_{i,j}| \ge \alpha > 0$
  • All $K_{i,i}$'s and $|K_{i,j}|$'s are estimated at rate $n^{-1/2}$
  • $G$ is recovered exactly w.h.p.
  • Horton's algorithm outputs a minimum cycle basis $\mathcal{B}$
  • For every induced cycle $C \in \mathcal{B}$,

$$\det(K_C) = F_C\big(K_{i,i}, K_{i,j}^2\big) + 2(-1)^{|C|+1} \prod_{\{i,j\} \in C} K_{i,j}$$

  • Recover the sign of $\prod_{\{i,j\} \in C} K_{i,j}$ w.h.p.

SLIDE 12

Main result

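Theorem: Let $K \in \mathcal{K}_\alpha$ with cycle sparsity $\ell$ and let $\varepsilon > 0$. Then the following holds with probability at least $1 - n^{-A}$: there is an algorithm that outputs $\hat K$ in $O(|E|^3 + nN^2)$ steps for which

$$n \gtrsim \Big(\frac{1}{\alpha^2 \varepsilon^2} + \ell \Big(\frac{2}{\alpha}\Big)^{2\ell}\Big) \ln N \quad\Longrightarrow\quad \min_{D} \|\hat K - DKD\|_\infty \le \varepsilon.$$

Near-optimal rate in a minimax sense.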
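Two quantities from the theorem, written out as a hedged sketch. The loss $\min_D \|\hat K - DKD\|_\infty$ is computed by brute force over the $2^{N-1}$ distinct sign patterns (since $D$ and $-D$ give the same $DKD$). The sample-size expression is evaluated with the unspecified constant hidden in $\gtrsim$ set to 1, an assumption made only to show how the $\ell(2/\alpha)^{2\ell}$ term comes to dominate as $\ell$ grows. Function names and kernel values are hypothetical.

```python
import itertools
import numpy as np

def d_similarity_loss(K_hat, K):
    """min over D = diag(±1) of max_{i,j} |K_hat - D K D|_{i,j}.

    Brute force over 2^(N-1) sign vectors: D and -D give the same DKD,
    so the first sign is fixed to +1.
    """
    N = K.shape[0]
    best = np.inf
    for rest in itertools.product([1.0, -1.0], repeat=N - 1):
        d = np.array((1.0,) + rest)
        best = min(best, float(np.max(np.abs(K_hat - np.outer(d, d) * K))))
    return best

def sample_size_scale(alpha, eps, ell, N):
    """(1/(alpha^2 eps^2) + ell (2/alpha)^(2 ell)) ln N, i.e. the bound with its
    unspecified constant set to 1 (an assumption, for comparing orders of magnitude)."""
    return (1.0 / (alpha ** 2 * eps ** 2) + ell * (2.0 / alpha) ** (2 * ell)) * np.log(N)

K = np.array([[ 0.5, 0.2, 0.0, -0.2],
              [ 0.2, 0.5, 0.2,  0.0],
              [ 0.0, 0.2, 0.5,  0.2],
              [-0.2, 0.0, 0.2,  0.5]])
D = np.diag([1.0, -1.0, -1.0, 1.0])
K_hat = D @ K @ D
K_hat[0, 1] += 0.01
K_hat[1, 0] += 0.01
print(d_similarity_loss(K_hat, K))  # ≈ 0.01: the sign flip D itself is not penalised
for ell in (3, 4, 5):
    print(ell, sample_size_scale(alpha=0.5, eps=0.1, ell=ell, N=100))
```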

SLIDE 13

Conclusions

  • Estimation of $K$ by a method of moments in polynomial time
  • Rates of estimation characterized by the topology of the determinantal graph through its cycle sparsity $\ell$

  • These rates are provably optimal (up to logarithmic factors)
  • Adaptation to β„“.