multiresolution analysis for the statistical analysis of incomplete - - PowerPoint PPT Presentation


multiresolution analysis for the statistical analysis of incomplete rankings Eric Sibony Anna Korba Stéphan Clémençon NIPS Workshop on Multiresolution Methods for Large-Scale Learning December 12 2015 LTCI UMR 5141, Telecom ParisTech/CNRS


slide-1
SLIDE 1

multiresolution analysis for the statistical analysis of incomplete rankings

Eric Sibony, Anna Korba, Stéphan Clémençon

NIPS Workshop on Multiresolution Methods for Large-Scale Learning, December 12, 2015

LTCI UMR 5141, Telecom ParisTech/CNRS

slide-2
SLIDE 2

introduction

Why rankings? Ranking data naturally appear in a wide variety of situations:
∙ elections
∙ survey answers
∙ expert judgments
∙ race results
∙ competition rankings
∙ customer behaviors
∙ user preferences
∙ …

slide-3
SLIDE 3

introduction

Probabilistic modeling on rankings

Catalog of items: $\llbracket n \rrbracket := \{1, \dots, n\}$

Full ranking $a_1 \succ \dots \succ a_n$ ⇔ permutation $\sigma \in S_n$ that maps an item to its rank: $\sigma(a_i) = i$

The variability of full rankings is therefore modeled by a probability distribution $p$ over the set of permutations $S_n$. $p$ is called a ranking model.
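The correspondence between a full ranking and a permutation can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary encoding (item mapped to rank) are assumptions, not notation from the slides.

```python
def ranking_to_permutation(ranking):
    """Map a full ranking (preferred item first) to sigma, with sigma[item] = rank."""
    return {item: rank for rank, item in enumerate(ranking, start=1)}


def permutation_to_ranking(sigma):
    """Inverse map: list the items by increasing rank."""
    return sorted(sigma, key=sigma.get)


# Hypothetical example on n = 3 items: the ranking 3 > 1 > 2.
ranking = [3, 1, 2]
sigma = ranking_to_permutation(ranking)
print(sigma)                          # item -> rank
print(permutation_to_ranking(sigma))  # back to the ranking
```

Here item 3 receives rank 1 because it is listed first, which matches the convention $\sigma(a_i) = i$.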

slide-4
SLIDE 4

introduction

Example: probability distribution over S5 (APA dataset)


slide-5
SLIDE 5

introduction

Probabilistic modeling on rankings

“Parametric” models (psychological interpretation):
∙ Thurstone
∙ Mallows
∙ Plackett-Luce
∙ …

“Nonparametric” approaches (mathematical interpretation):
∙ Distance-based
∙ Independence modeling
∙ Fourier analysis
∙ …

Why multiresolution analysis? To exploit another relevant structure of rankings.

slide-6
SLIDE 6

fourier analysis on the symmetric group

slide-7
SLIDE 7

abstract fourier analysis

Fourier analysis consists in decomposing a signal into projections on subspaces that are stable under translations.

Example: Fourier series. For $e_k(x) = e^{2i\pi kx}$, the space $\mathbb{C}e_k$ is stable under the translations $T_a : f \mapsto f(\cdot - a)$ for all $a \in \mathbb{R}/\mathbb{Z}$. The Fourier coefficient $\hat{f}(k)$ is defined by

$$\hat{f}(k) = \langle f, e_k \rangle = \int_0^1 f(x)\, e^{2i\pi kx}\, dx$$
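Stability under translations can be checked numerically: translating $f$ by $a$ only multiplies its $k$-th coefficient by a phase, so the projection stays in the same one-dimensional space. The sketch below assumes a Riemann-sum approximation on a regular grid, keeps the sign convention of the formula above, and uses a test signal chosen purely for illustration.

```python
import cmath
import math


def fourier_coefficient(samples, k):
    """Riemann-sum approximation of the integral of f(x) e^{2 i pi k x} over [0, 1)."""
    n = len(samples)
    return sum(v * cmath.exp(2j * math.pi * k * j / n) for j, v in enumerate(samples)) / n


def translate(samples, shift):
    """Samples of T_a f = f(. - a) for a = shift / n, with periodic wrap-around."""
    n = len(samples)
    return [samples[(j - shift) % n] for j in range(n)]


n, shift, k = 64, 5, 3
a = shift / n
# Hypothetical test signal sampled at x_j = j / n.
f = [math.cos(2 * math.pi * 3 * j / n) + 0.5 * math.sin(2 * math.pi * j / n) for j in range(n)]

before = fourier_coefficient(f, k)
after = fourier_coefficient(translate(f, shift), k)
phase = cmath.exp(2j * math.pi * k * a)
print(abs(after - phase * before))  # numerically zero
```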

slide-8
SLIDE 8

abstract fourier analysis on the symmetric group

Let $L(S_n) := \{f : S_n \to \mathbb{R}\}$. Translations on $L(S_n)$ are the operators $T_\tau : f \mapsto f(\cdot\,\tau^{-1})$ defined for $\tau \in S_n$.

Theorem (from group representation theory)

$$L(S_n) \cong \bigoplus_{\lambda \vdash n} d_\lambda S^\lambda$$

∙ $\lambda \vdash n$: indexes of the irreducible representations of $S_n$
∙ $S^\lambda$: space of the irreducible representation indexed by $\lambda$
∙ $d_\lambda = \dim S^\lambda$
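Taking dimensions on both sides of the decomposition gives $n! = \sum_{\lambda \vdash n} d_\lambda^2$. The sketch below checks this for $n = 5$. It relies on the hook length formula for $d_\lambda$, a standard fact that is not stated on the slides; the function names are illustrative.

```python
from math import factorial


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dimension(shape):
    """d_lambda from the hook length formula: n! divided by the product of hook lengths."""
    product = 1
    for row, length in enumerate(shape):
        for col in range(length):
            arm = length - col - 1                                    # cells to the right
            leg = sum(1 for below in shape[row + 1:] if below > col)  # cells below
            product *= arm + leg + 1
    return factorial(sum(shape)) // product


n = 5
dims = {shape: dimension(shape) for shape in partitions(n)}
total = sum(d * d for d in dims.values())
print(dims)
print(total, factorial(n))
```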

slide-9
SLIDE 9

abstract fourier transform

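Let $\rho_\lambda : S_n \to \mathbb{R}^{d_\lambda \times d_\lambda}$ be a representative of the irreducible representation indexed by $\lambda$.

$$\hat{f}(\lambda) = \sum_{\sigma \in S_n} f(\sigma)\rho_\lambda(\sigma) \ \text{“}= \langle f, \rho_\lambda \rangle\text{”} \in \mathbb{R}^{d_\lambda \times d_\lambda}$$

that is, the “projection on $d_\lambda S^\lambda$”. The Fourier transform is then defined by

$$\mathcal{F} : f \mapsto \big(\hat{f}(\lambda)\big)_{\lambda \vdash n}$$

It satisfies the classic properties:
∙ Parseval identity
∙ Inverse Fourier transform
∙ Turns convolution into (matrix) product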
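The convolution property can be checked on $S_3$ with a short sketch. One simplification should be stated plainly: instead of an irreducible $\rho_\lambda$, the code uses the permutation representation (permutation matrices), which is reducible but is still a group homomorphism, so the same identity holds. The convolution convention $(f * g)(\sigma) = \sum_\tau f(\sigma\tau^{-1}) g(\tau)$ is also an assumption.

```python
from itertools import permutations
import random

n = 3
group = list(permutations(range(n)))  # sigma as a tuple: sigma[i] is the image of i


def compose(s, t):
    """(s o t)(i) = s(t(i))."""
    return tuple(s[t[i]] for i in range(n))


def inverse(s):
    inv = [0] * n
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)


def perm_matrix(s):
    """Permutation representation: entry [j][i] is 1 when s(i) = j."""
    return [[1.0 if s[i] == j else 0.0 for i in range(n)] for j in range(n)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def fourier(f):
    """Sum over the group of f(sigma) * rho(sigma), here with rho = perm_matrix."""
    total = [[0.0] * n for _ in range(n)]
    for s in group:
        m = perm_matrix(s)
        for r in range(n):
            for c in range(n):
                total[r][c] += f[s] * m[r][c]
    return total


def convolve(f, g):
    """(f * g)(sigma) = sum over tau of f(sigma o tau^{-1}) * g(tau)."""
    return {s: sum(f[compose(s, inverse(t))] * g[t] for t in group) for s in group}


rng = random.Random(0)
f = {s: rng.random() for s in group}
g = {s: rng.random() for s in group}
lhs = fourier(convolve(f, g))
rhs = matmul(fourier(f), fourier(g))
max_gap = max(abs(lhs[r][c] - rhs[r][c]) for r in range(n) for c in range(n))
print(max_gap)  # numerically zero
```

With this representation, entry `[j][i]` of `fourier(f)` is the sum of `f` over permutations sending `i` to `j`, so for a ranking model it collects first-order quantities of the form $\mathbb{P}[\Sigma(i) = j]$.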

slide-10
SLIDE 10

specificities

Fourier coefficients are matrices.

“Frequencies” $\lambda$ are not numbers (no canonical total order). They are partitions of $n$: tuples $(\lambda_1, \dots, \lambda_r) \in \mathbb{N}^r$ such that $\lambda_1 \geq \dots \geq \lambda_r$ and $\sum_{i=1}^r \lambda_i = n$.

$$(n),\ (n-1, 1),\ (n-2, 2),\ (n-2, 1, 1),\ \dots$$

The canonical partial order on partitions however orders the Fourier coefficients by “levels of smoothness”.
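The slide does not name the partial order. Assuming it is the usual dominance order (λ dominates μ when every partial sum of λ is at least the corresponding partial sum of μ), a minimal sketch of the comparison is:

```python
from itertools import accumulate


def dominates(lam, mu):
    """True when lam dominates mu: each partial sum of lam is >= that of mu."""
    if sum(lam) != sum(mu):
        raise ValueError("both partitions must sum to the same n")
    length = max(len(lam), len(mu))
    lam_padded = list(lam) + [0] * (length - len(lam))
    mu_padded = list(mu) + [0] * (length - len(mu))
    return all(a >= b for a, b in zip(accumulate(lam_padded), accumulate(mu_padded)))


# The first partitions listed on the slide, written out for n = 5.
chain = [(5,), (4, 1), (3, 2), (3, 1, 1)]
is_chain = all(dominates(chain[i], chain[i + 1]) for i in range(len(chain) - 1))

# The order is only partial: for n = 6 these two partitions are incomparable.
incomparable = not dominates((3, 3), (4, 1, 1)) and not dominates((4, 1, 1), (3, 3))
print(is_chain, incomparable)
```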

slide-11
SLIDE 11

utilizations

Classic methods of Fourier analysis apply to ranking data:
∙ Band-limited approximation (e.g. [Huang et al., 2009])
∙ Phase-magnitude decomposition (e.g. [Kakarala, 2011])
∙ Analysis of random walks (e.g. [Diaconis, 1988])
∙ Construction of kernels (e.g. [Kondor and Barbosa, 2010])
∙ Hypothesis testing (e.g. [Diaconis, 1989])

slide-12
SLIDE 12

looking for a new representation

slide-13
SLIDE 13

natural extension

As in classic Fourier analysis, Fourier coefficients contain global information on $S_n$.

$$\hat{f}(\lambda) = \sum_{\sigma \in S_n} f(\sigma)\rho_\lambda(\sigma)$$

⇒ The Fourier transform only characterizes the global smoothness of a function.

Probability distributions over $S_n$ may show local irregularities.

⇒ One needs some form of multiresolution analysis to characterize the local smoothness of a function.

slide-14
SLIDE 14

construction of a multiresolution analysis

Natural attempt
∙ Fourier analysis is constructed from translations
∙ Multiresolution analysis should be constructed from translations and dilations

Problem: there is no equivalent of dilations in a discrete setting.

slide-15
SLIDE 15

“space-scale” decomposition

Relevant approach: directly construct a multiresolution analysis that characterizes local singularities.

The multiresolution analysis introduced in [Kondor and Dempsey, 2012] characterizes singularities $f$ localized both
∙ in “space”: $f$ with a small support in $S_n$
∙ in “scale/frequency”: $\mathcal{F}f$ with a small support in $\{\lambda \vdash n\}$

slide-16
SLIDE 16

item localization

Some modern applications require a different type of localization. In these applications, observed rankings are incomplete: they only involve small subsets of items among the catalog $\llbracket n \rrbracket$,

$$a_1 \succ \dots \succ a_k \quad \text{with } k \ll n$$

(e.g. user preferences). Such applications require “item localization”.

slide-17
SLIDE 17
our purpose : “item-scale” decomposition

Does Fourier analysis offer some “item localization”? No.

⇒ We introduce a multiresolution analysis that characterizes singularities $f$ localized both
∙ in “items”: $f$ only “impacts” the rankings of a subset of items
∙ in “scale/frequency”: $\mathcal{F}f$ with a small support in $\{\lambda \vdash n\}$

slide-18
SLIDE 18
our purpose : “item-scale” decomposition

What do we mean by “item localization”?


slide-19
SLIDE 19

rank information localization

slide-20
SLIDE 20

rank information

Permutation $\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 3 & 1 \end{pmatrix}$ ↔ Ranking $5 \succ 1 \succ 4 \succ 3 \succ 2$

Absolute rank information
∙ What is the rank $\sigma(3)$ of item 3? 4
∙ What item $\sigma^{-1}(2)$ is ranked at 2nd position? 1
∙ What are the ranks $\sigma(\{2, 4, 5\})$ of items $\{2, 4, 5\}$? $\{5, 3, 1\}$

Relative rank information
∙ How are items 1 and 3 relatively ordered? $1 \succ 3$
∙ How are the items of the subset $\{2, 4, 5\}$ relatively ordered? $5 \succ 4 \succ 2$
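The answers on this slide can be reproduced with a few dictionary lookups. The encoding below (item mapped to rank) and the helper name `induced_ranking` are illustrative choices, not part of the talk.

```python
# The permutation of the slide, stored as item -> rank.
sigma = {1: 2, 2: 5, 3: 4, 4: 3, 5: 1}
inverse = {rank: item for item, rank in sigma.items()}


def induced_ranking(sigma, subset):
    """Order the items of `subset` by increasing rank (preferred item first)."""
    return sorted(subset, key=lambda item: sigma[item])


# Absolute rank information.
rank_of_3 = sigma[3]
item_at_2 = inverse[2]
ranks_245 = {sigma[i] for i in (2, 4, 5)}

# Relative rank information.
order_13 = induced_ranking(sigma, {1, 3})
order_245 = induced_ranking(sigma, {2, 4, 5})

print(rank_of_3, item_at_2, ranks_245, order_13, order_245)
```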

slide-21
SLIDE 21

rank information

Permutation $\sigma$ ↔ Ranking $\sigma^{-1}(1) \succ \dots \succ \sigma^{-1}(n)$

Absolute rank information
∙ What is the rank $\sigma(i)$ of item $i$?
∙ What item $\sigma^{-1}(j)$ is ranked at $j$th position?
∙ What are the ranks $\sigma(\{i, j, k\})$ of items $\{i, j, k\}$?

Relative rank information
∙ How are items $a$ and $b$ relatively ordered?
∙ How are the items of the subset $A$ relatively ordered?

slide-22
SLIDE 22

rank information

Random permutation $\Sigma$ ↔ Random ranking $\Sigma^{-1}(1) \succ \dots \succ \Sigma^{-1}(n)$

Absolute rank information
∙ What is the law of the rank $\Sigma(i)$ of item $i$?
∙ What is the law of the item $\Sigma^{-1}(j)$ ranked at $j$th position?
∙ What is the law of the ranks $\Sigma(\{i, j, k\})$ of items $\{i, j, k\}$?

Relative rank information
∙ What is the probability $\mathbb{P}[\Sigma(a) < \Sigma(b)]$ that $a$ is ranked higher than $b$?
∙ What is the law of the ranking $\Sigma_{|A}$ induced by $\Sigma$ on the subset $A$?

slide-23
SLIDE 23

marginals of a ranking model

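For a random permutation $\Sigma$ drawn from a ranking model $p$, all these laws are marginals of $p$.

Example

$$\mathbb{P}[\Sigma(i) = j] = \sum_{\sigma \in S_n,\ \sigma(i) = j} p(\sigma)$$

$$\mathbb{P}[\Sigma(a) < \Sigma(b)] = \sum_{\sigma \in S_n,\ \sigma(a) < \sigma(b)} p(\sigma)$$

Associated marginal operators

$$M^{(n-1,1)}_i : p \mapsto \text{law of } \Sigma(i)$$

$$M_{\{a,b\}} : p \mapsto \text{law of } \mathbb{I}\{\Sigma(a) < \Sigma(b)\}$$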
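Both sums can be evaluated by brute force when $n$ is small. The sketch below uses a randomly generated ranking model on $S_4$, which is purely hypothetical data for illustration; the function names are assumptions as well.

```python
from itertools import permutations
import random

n = 4
rng = random.Random(0)

# Each sigma is a tuple with sigma[i - 1] = rank of item i.
perms = list(permutations(range(1, n + 1)))
weights = [rng.random() for _ in perms]
total = sum(weights)
p = {sigma: w / total for sigma, w in zip(perms, weights)}  # hypothetical ranking model


def rank_marginal(i, j):
    """P[Sigma(i) = j]: sum of p over permutations giving rank j to item i."""
    return sum(prob for sigma, prob in p.items() if sigma[i - 1] == j)


def pairwise_marginal(a, b):
    """P[Sigma(a) < Sigma(b)]: sum of p over permutations ranking a above b."""
    return sum(prob for sigma, prob in p.items() if sigma[a - 1] < sigma[b - 1])


law_of_rank_1 = [rank_marginal(1, j) for j in range(1, n + 1)]
print(law_of_rank_1)
print(pairwise_marginal(1, 2), pairwise_marginal(2, 1))
```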

slide-24
SLIDE 24

marginals of a ranking model

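Absolute marginals. For $\lambda \vdash n$,

$$M^\lambda_{A_1, \dots, A_r} : p \mapsto \text{law of } (\Sigma(A_1), \dots, \Sigma(A_r))$$

where $(A_1, \dots, A_r)$ is a partition of $\llbracket n \rrbracket$ such that $|A_i| = \lambda_i$.

Relative marginals. For $A \subset \llbracket n \rrbracket$ with $|A| \geq 2$,

$$M_A : p \mapsto \text{law of } \Sigma_{|A}$$

where $\Sigma_{|A}$ is the ranking induced by $\Sigma$ on the items of $A$.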

slide-25
SLIDE 25

marginals localize nested levels of rank information

Example for absolute marginals: the knowledge of all $(n-2, 1, 1)$ marginals induces the knowledge of all $(n-1, 1)$ marginals.

$$M^{(n-1,1)}_i p(j) = \mathbb{P}[\Sigma(i) = j] = \sum_{j' \neq j} \mathbb{P}[\Sigma(i) = j, \Sigma(i') = j'] = \sum_{j' \neq j} M^{(n-2,1,1)}_{(i,i')} p(j, j') \quad \text{for all } i' \neq i.$$
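The identity can be verified numerically on a small example. The ranking model below is random and hypothetical, and the function names are illustrative.

```python
from itertools import permutations
import random

n = 4
rng = random.Random(1)

# Each sigma is a tuple with sigma[i - 1] = rank of item i.
perms = list(permutations(range(1, n + 1)))
weights = [rng.random() for _ in perms]
total = sum(weights)
p = {sigma: w / total for sigma, w in zip(perms, weights)}  # hypothetical ranking model


def first_order(i, j):
    """(n-1, 1) marginal: P[Sigma(i) = j]."""
    return sum(prob for sigma, prob in p.items() if sigma[i - 1] == j)


def second_order(i, i2, j, j2):
    """(n-2, 1, 1) marginal: P[Sigma(i) = j, Sigma(i2) = j2]."""
    return sum(prob for sigma, prob in p.items()
               if sigma[i - 1] == j and sigma[i2 - 1] == j2)


i, i2, j = 1, 3, 2
# Summing the second-order marginal over j' != j recovers the first-order one.
recovered = sum(second_order(i, i2, j, j2) for j2 in range(1, n + 1) if j2 != j)
print(recovered, first_order(i, j))
```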

slide-26
SLIDE 26

marginals localize nested levels of rank information

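Example for relative marginals: the knowledge of the marginal on $\{a, b, c\}$ induces the knowledge of the marginal on $\{b, c\}$.

$$\begin{aligned}
M_{\{b,c\}} p(b \succ c) &= \mathbb{P}[\Sigma(b) < \Sigma(c)] \\
&= \mathbb{P}[\Sigma(a) < \Sigma(b) < \Sigma(c)] + \mathbb{P}[\Sigma(b) < \Sigma(a) < \Sigma(c)] + \mathbb{P}[\Sigma(b) < \Sigma(c) < \Sigma(a)] \\
&= M_{\{a,b,c\}} p(a \succ b \succ c) + M_{\{a,b,c\}} p(b \succ a \succ c) + M_{\{a,b,c\}} p(b \succ c \succ a)
\end{aligned}$$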
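The same nesting can be checked by brute force: the pairwise probability that $b$ is preferred to $c$ is the sum of the three triple probabilities in which $b$ precedes $c$. The model is random and hypothetical, and `relative_marginal` is an illustrative helper name.

```python
from itertools import permutations
import random

n = 4
rng = random.Random(2)

# Each sigma is a tuple with sigma[i - 1] = rank of item i.
perms = list(permutations(range(1, n + 1)))
weights = [rng.random() for _ in perms]
total = sum(weights)
p = {sigma: w / total for sigma, w in zip(perms, weights)}  # hypothetical ranking model


def relative_marginal(subset):
    """Law of the ranking induced on `subset`, keyed by tuples (preferred item first)."""
    law = {}
    for sigma, prob in p.items():
        induced = tuple(sorted(subset, key=lambda item: sigma[item - 1]))
        law[induced] = law.get(induced, 0.0) + prob
    return law


a, b, c = 1, 2, 3
triple = relative_marginal((a, b, c))
pair = relative_marginal((b, c))
recovered = triple[(a, b, c)] + triple[(b, a, c)] + triple[(b, c, a)]
print(recovered, pair[(b, c)])
```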

slide-27
SLIDE 27

fourier analysis localizes absolute rank information

A classic result from $S_n$ representation theory (Young’s rule) says informally that:
1. Absolute marginals are nested according to the canonical order on partitions.
2. The part of information of a function $f : S_n \to \mathbb{R}$ that is specific to its $\lambda$-marginals $M^\lambda f$ is contained in its Fourier coefficient $\hat{f}(\lambda)$:

$$M^\lambda_{A_1, \dots, A_r} f \ \text{“=”}\ M^\lambda_{A_1, \dots, A_r} \mathcal{F}^{-1}\Big[\hat{f}(\lambda) + \sum_{\mu \triangleleft \lambda} K_{\mu,\lambda}\, \hat{f}(\mu)\Big]$$

slide-28
SLIDE 28

fourier analysis localizes absolute rank information

Illustration from Jonathan Huang’s thesis

slide-29
SLIDE 29

fourier analysis does not localize relative rank information

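[Figure: $f$ and its relative marginals on triples ($M_{\{1,2,3\}}f$, $M_{\{1,2,4\}}f$, $M_{\{1,3,4\}}f$, $M_{\{2,3,4\}}f$) and on pairs ($M_{\{1,2\}}f$, $M_{\{1,3\}}f$, $M_{\{1,4\}}f$, $M_{\{2,3\}}f$, $M_{\{2,4\}}f$, $M_{\{3,4\}}f$)]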

slide-30
SLIDE 30

the mra representation

slide-31
SLIDE 31

main result

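Theorem ([Clémençon et al., 2014]). Denote by $\bar{\mathcal{P}}(E) := \{\emptyset\} \cup \{A \subset E \mid |A| \geq 2\}$. There exist
∙ a “wavelet transform” $\Psi : f \mapsto (\Psi_B f)_{B \in \bar{\mathcal{P}}(\llbracket n \rrbracket)}$
∙ “wavelet synthesis operators” $\phi_A$
such that for any $f : S_n \to \mathbb{R}$,

$$f = \phi_{\llbracket n \rrbracket} \sum_{B \in \bar{\mathcal{P}}(\llbracket n \rrbracket)} \Psi_B f$$

and for all $A \in \bar{\mathcal{P}}(\llbracket n \rrbracket)$,

$$M_A f = \phi_A \sum_{B \in \bar{\mathcal{P}}(A)} \Psi_B f$$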
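The index set $\bar{\mathcal{P}}(E)$ is easy to enumerate, which shows how many wavelet components enter each sum. A minimal sketch, with `p_bar` as an illustrative name:

```python
from itertools import combinations


def p_bar(ground_set):
    """The empty set together with every subset of `ground_set` of size >= 2."""
    items = sorted(ground_set)
    subsets = [frozenset()]
    for size in range(2, len(items) + 1):
        subsets.extend(frozenset(c) for c in combinations(items, size))
    return subsets


index_n = p_bar({1, 2, 3, 4})  # components needed to reconstruct f on S_4
index_A = p_bar({1, 2, 3})     # components needed for the marginal on A = {1, 2, 3}
print(len(index_n), len(index_A))
print([sorted(b) for b in index_A])
```

In general $\bar{\mathcal{P}}(E)$ has $2^{|E|} - |E|$ elements, since only the singletons are excluded.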

slide-32
SLIDE 32

main result

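Example

For a function $f$:

$$M_{\{1,2,3\}} f = \phi_{\{1,2,3\}} \big[ \Psi_\emptyset f + \Psi_{\{1,2\}} f + \Psi_{\{1,3\}} f + \Psi_{\{2,3\}} f + \Psi_{\{1,2,3\}} f \big]$$

For a ranking model $p$:

$$\mathbb{P}(2 \succ 1 \succ 3) = \frac{1}{6} + \frac{1}{2} \Big[ \Big( \mathbb{P}(2 \succ 1) - \frac{1}{2} \Big) + \Big( \mathbb{P}(1 \succ 3) - \frac{1}{2} \Big) \Big] + \text{residual}$$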
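The last formula can be explored numerically on three items. The sketch below assumes that the same pattern applies to every ranking $a \succ b \succ c$ (uniform term $1/6$ plus the centered probabilities of its two adjacent pairs); the slide only shows it for $2 \succ 1 \succ 3$, so this extension is an assumption. Under it, the residual carries no pairwise information: its mass on "$a$ before $b$" vanishes for every pair.

```python
from itertools import permutations
import random

items = (1, 2, 3)
rng = random.Random(3)

rankings = list(permutations(items))  # (a, b, c) stands for a > b > c
weights = [rng.random() for _ in rankings]
total = sum(weights)
p = {r: w / total for r, w in zip(rankings, weights)}  # hypothetical ranking model


def pair_mass(law, a, b):
    """Mass that `law` puts on the rankings where a is preferred to b."""
    return sum(v for r, v in law.items() if r.index(a) < r.index(b))


def pairwise_part(r):
    """1/6 plus half the centered probabilities of the two adjacent pairs of r."""
    a, b, c = r
    return 1 / 6 + 0.5 * ((pair_mass(p, a, b) - 0.5) + (pair_mass(p, b, c) - 0.5))


residual = {r: p[r] - pairwise_part(r) for r in rankings}
residual_pairs = [pair_mass(residual, a, b) for a, b in permutations(items, 2)]
print(max(abs(v) for v in residual_pairs))  # numerically zero
```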

slide-33
SLIDE 33

main result

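[Figure: $f$ and its relative marginals on triples ($M_{\{1,2,3\}}f$, $M_{\{1,2,4\}}f$, $M_{\{1,3,4\}}f$, $M_{\{2,3,4\}}f$) and on pairs ($M_{\{1,2\}}f$, $M_{\{1,3\}}f$, $M_{\{1,4\}}f$, $M_{\{2,3\}}f$, $M_{\{2,4\}}f$, $M_{\{3,4\}}f$)]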

slide-34
SLIDE 34

main result

Proof ingredients of the Theorem
∙ Combinatorial relationship between the wavelet synthesis operators $\phi_A$ and the marginal operators $M_B$
∙ Recent result in algebraic topology (from [Reiner et al., 2013])

Framework
∙ The wavelet transform is used as a mapping to a feature space
∙ It can be computed with a “fast wavelet transform”

slide-35
SLIDE 35

main practical use

The MRA representation characterizes the functions $f : S_n \to \mathbb{R}$ with known marginal values: $M_A f = G_A$ for $A \in \mathcal{A}$, where $\mathcal{A}$ is any collection of subsets.

slide-36
SLIDE 36

main practical use

[Figure: bar plots of a function $f$ over the 24 permutations of $S_4$ (1234, 1243, …, 4321), of its marginals on triples ($M_{\{1,2,3\}}f$, $M_{\{1,2,4\}}f$, $M_{\{1,3,4\}}f$, $M_{\{2,3,4\}}f$) and of its marginals on pairs ($M_{\{1,2\}}f$, $M_{\{1,3\}}f$, $M_{\{1,4\}}f$, $M_{\{2,3\}}f$, $M_{\{2,4\}}f$, $M_{\{3,4\}}f$)]

slide-37
SLIDE 37

use for the statistical analysis of incomplete rankings

Setting: dataset of $N$ IID censored observations $(\Sigma^{(1)}_{|A_1}, \dots, \Sigma^{(N)}_{|A_N})$ with
∙ $\Sigma^{(i)} \sim p$, a ranking model
∙ $A_i \sim \nu$, a probability distribution over $\bar{\mathcal{P}}(\llbracket n \rrbracket)$

Method: perform the statistical analysis through the wavelet transform $\Psi$ (see [Sibony et al., 2015])
∙ Easy framework to combine heterogeneous information and exploit the structure of ranking models
∙ The complexity only depends on the complexity of the dataset
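The observation scheme of this setting can be simulated as follows. Everything concrete here is a hypothetical choice made for illustration: the random ranking model $p$, a uniform $\nu$ that puts no mass on the empty set, and the sample size. The sketch only generates the data and does not implement the wavelet-based method.

```python
from itertools import combinations, permutations
import random

n = 4
items = list(range(1, n + 1))
rng = random.Random(4)

# Hypothetical ranking model p: unnormalized random weights on full rankings.
rankings = list(permutations(items))  # preferred item first
p_weights = [rng.random() for _ in rankings]

# Hypothetical nu: uniform over the subsets of size >= 2 (zero mass on the empty set).
subsets = [c for size in range(2, n + 1) for c in combinations(items, size)]
nu_weights = [1.0] * len(subsets)


def draw_observation():
    """Draw Sigma ~ p and A ~ nu, then return the ranking induced by Sigma on A."""
    full = rng.choices(rankings, weights=p_weights)[0]
    observed = set(rng.choices(subsets, weights=nu_weights)[0])
    return tuple(item for item in full if item in observed)


dataset = [draw_observation() for _ in range(1000)]
print(dataset[:5])
```

Each observation is an incomplete ranking on between 2 and $n$ items, so the dataset mixes information on pairs, triples and full rankings.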

slide-38
SLIDE 38

conclusion

slide-39
SLIDE 39

conclusion

Summary
∙ The statistical analysis of some ranking data requires localizing the information involved in marginal projections of a ranking model
∙ We have constructed a multiresolution analysis that localizes relative rank information and therefore applies to modern large-scale settings

Future directions: application to various statistical problems, regularization procedures, extension to incomplete rankings with ties, …

slide-40
SLIDE 40

Thank you


slide-41
SLIDE 41

Clémençon, S., Jakubowicz, J., and Sibony, E. (2014). Multiresolution analysis of incomplete rankings. ArXiv e-prints.

Diaconis, P. (1988). Group representations in probability and statistics. Institute of Mathematical Statistics Lecture Notes - Monograph Series. Institute of Mathematical Statistics, Hayward, CA.

Diaconis, P. (1989). A generalization of spectral analysis with application to ranked data. The Annals of Statistics, 17(3):949–979.

Huang, J., Guestrin, C., and Guibas, L. (2009). Fourier theoretic probabilistic inference over permutations. JMLR, 10:997–1070.

Kakarala, R. (2011). A signal processing approach to Fourier analysis of ranking data: the importance of phase. IEEE Transactions on Signal Processing, pages 1–10.

Kondor, R. and Barbosa, M. S. (2010). Ranking with kernels in Fourier space. In Proceedings of COLT’10, pages 451–463.

Kondor, R. and Dempsey, W. (2012). Multiresolution analysis on the symmetric group. In Neural Information Processing Systems 25.

Reiner, V., Saliola, F., and Welker, V. (2013). Spectra of symmetrized shuffling operators. Memoirs of the American Mathematical Society, 228(1072).

Sibony, E., Clémençon, S., and Jakubowicz, J. (2015). MRA-based statistical learning from incomplete rankings. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 1432–1441.