Clustering Rankings in the Fourier Domain

  1. Clustering Rankings in the Fourier Domain
     Stéphan Clémençon, Romaric Gaudel and Jérémie Jakubowicz
     LTCI, Telecom ParisTech (TSI), UMR Institut Telecom/CNRS No. 5141
     ECML PKDD, September 2011

  2. Distributions on rankings
     Many applications consider ranked data / distributions on rankings (uniform distribution with respect to constraints)
     ◮ Top-k lists
       ⋆ Ranks of the k most preferred objects: 3 > 2 > 5 > ...
     ◮ Preference data
       ⋆ Preferences on k (randomly) picked objects: ... > 3 > ... > 2 > ... > 5 > ...
       ⋆ "sushi" dataset
     ◮ Bucket order
       ⋆ Preferences on groups of objects: 3, 2 > 5, 1, 7 > 4, 6, 8
     S. Clémençon & R. Gaudel & J. Jakubowicz (LTCI), Clustering Rankings in the Fourier Domain, ECML PKDD, September 2011

  3. Representation for distributions on rankings
     Probability table
     ◮ n! (factorial n) coefficients
     Fourier representation [Diaconis, 1989; Kondor & Barbosa, 2010]
     ◮ n! coefficients
     ◮ Few relevant coefficients in practice
     Parametric models
     ◮ Mallows [Mallows, 1957]
     ◮ Plackett-Luce [Luce, 1959; Plackett, 1975]

  5. Contributions
     Clustering of rankings through sparse Fourier representation
     Position
     ◮ Clustering of distributions on rankings
       ⋆ Gather ranking distributions with similar shapes
     Proposed approach
     ◮ Work in the Fourier representation
       ⋆ Sparse representation of 1 distribution ⇒
       ⋆ Sparse difference between representations of 2 distributions

  6. Outline
     1 Sparsity in the Fourier Representation
     2 Sparse Clustering of Rankings
     3 Numerical Experiments

  7. Fourier representation
     For functions on the real line
     Functions are decomposed on the sinusoidal basis
     $f(x) = 1.1 + 2.1\cos(x) + 3.2\cos(2x) + 1.5\cos(3x) + 0.2\cos(4x) + 0.01\cos(5x) + \dots$
     [Figure: f drawn as the sum of its sinusoidal components]
     The information is contained in a few (low-frequency) coefficients
     ⇒ Reduced storage/transfer/computation costs
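The claim that a few low-frequency coefficients carry most of the information can be checked numerically on the slide's own example. The sketch below is illustrative only: it assumes the series stops at the six listed terms (the slide's trailing "..." is dropped), and the names `COEFFS`, `evaluate` and `relative_error` are made up for this example.

```python
import math

# Cosine coefficients read off the slide: f(x) = sum_k COEFFS[k] * cos(k * x).
# Assumption: the terms hidden behind the slide's "..." are ignored.
COEFFS = [1.1, 2.1, 3.2, 1.5, 0.2, 0.01]


def evaluate(coeffs, x):
    """Evaluate sum_k coeffs[k] * cos(k * x) at a single point x."""
    return sum(c * math.cos(k * x) for k, c in enumerate(coeffs))


def relative_error(coeffs, n_kept, n_grid=1000):
    """Relative L2 error on [0, 2*pi) when only the n_kept lowest frequencies are kept."""
    grid = [2.0 * math.pi * i / n_grid for i in range(n_grid)]
    full = [evaluate(coeffs, x) for x in grid]
    low = [evaluate(coeffs[:n_kept], x) for x in grid]
    num = sum((a - b) ** 2 for a, b in zip(full, low))
    den = sum(a ** 2 for a in full)
    return math.sqrt(num / den)


# Keep the constant term and the three lowest frequencies only.
rel_err = relative_error(COEFFS, n_kept=4)
print(f"relative L2 error with 4 of 6 coefficients: {rel_err:.4f}")
```

With these six coefficients, dropping the two highest-frequency terms changes the function by less than 5% in relative L2 norm.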

  9. Fourier representation
     For functions on $S_n$ [Diaconis, 1989]
     There is no simple basis (corresponding to eigenspaces of dimension 1)
     ⇒ Fourier coefficients are matrices indexed by the set $\mathcal{R}_n$ of all integer partitions of $n$
     [Figure: $\mathcal{F} f$ drawn as a tuple of matrices]
     $\mathcal{R}_n = \{ \xi = (n_1, \dots, n_k) \in \mathbb{N}^{*k} : n_1 \ge \dots \ge n_k, \ \sum_{i=1}^{k} n_i = n, \ 1 \le k \le n \}$
     "Low-frequency" coefficients are related to low order summaries ($\mathbb{P}[\sigma(i, j) = (k, \ell)]$)
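To make the index set $\mathcal{R}_n$ concrete, the following sketch enumerates the integer partitions of n and computes the size $d_\xi$ of the matrix attached to each partition. The slide does not say how $d_\xi$ is obtained; the hook length formula used here is the standard result for irreducible representations of $S_n$, and the function names are chosen for this example.

```python
from math import factorial


def partitions(n, largest=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dimension(shape):
    """Matrix size d_xi for the partition `shape`, via the hook length formula."""
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1                              # cells to the right
            leg = sum(1 for r in shape[i + 1:] if r > j)   # cells below
            hooks *= arm + leg + 1
    return factorial(sum(shape)) // hooks


n = 5
dims = {xi: dimension(xi) for xi in partitions(n)}
for xi, d in dims.items():
    print(xi, d)
```

For n = 5 this lists 7 partitions with dimensions 1, 4, 5, 6, 5, 4, 1. Their squares sum to 120 = 5!, so the Fourier representation has exactly as many scalar coefficients as the probability table.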

  10. Example: Mallows($S_5$)
      Exponential distribution on rankings, γ = 0.1
      [Figure: two panels, '"Temporal" coefficients' and "Fourier coefficients", each plotted against the coefficient index (0 to 120); the rankings [3 2 4 1 5], [3 5 4 2 1] and [1 2 4 5 3] appear as labels]
      Remark:
      ◮ A few relevant parameters when using the Fourier representation
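A probability table like the "temporal" one in this example can be generated in a few lines. The slide only says "exponential distribution on rankings" with γ = 0.1; the sketch below assumes the usual Mallows form, with probability proportional to exp(-γ · d(σ, σ₀)), takes d to be the Kendall tau distance, and picks the identity ranking as centre σ₀. All three choices are assumptions made for illustration.

```python
import math
from itertools import permutations


def kendall_tau(sigma, tau):
    """Number of pairs of items ranked in opposite order by sigma and tau."""
    n = len(sigma)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0
    )


def mallows(center, gamma):
    """Return {ranking: probability}, proportional to exp(-gamma * distance to center)."""
    rankings = list(permutations(sorted(center)))
    weights = [math.exp(-gamma * kendall_tau(r, center)) for r in rankings]
    z = sum(weights)  # normalising constant
    return {r: w / z for r, w in zip(rankings, weights)}


center = (1, 2, 3, 4, 5)  # assumed centre, not taken from the slide
table = mallows(center, gamma=0.1)
print(len(table), max(table.values()), min(table.values()))
```

With a small γ the table is close to uniform, so all 120 entries are needed to describe it, which is the situation the slide contrasts with the Fourier representation.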

  11. Uncertainty principle
      Balancing Sparsity
      Theorem (inspired by [Donoho & Stark, 1989])
      Let $f \in \mathbb{C}[S_n]$ with Fourier transform $\mathcal{F} f$. Denote by $\operatorname{supp}(f) = \{ \sigma \in S_n : f(\sigma) \neq 0 \}$ and by $\operatorname{supp}(\mathcal{F} f) = \{ \xi \in \mathcal{R}_n : \mathcal{F} f(\xi) \neq 0 \}$ the support of $f$ and that of its Fourier transform, respectively. Then we have:
      $\# \operatorname{supp}(f) \cdot \sum_{\xi \in \operatorname{supp}(\mathcal{F} f)} d_\xi^2 \ge n!$
      Direct consequence
      ◮ Both representations cannot be simultaneously sparse
      [Figure: "Distortion with Mallows($S_5$)", distortion against the number of used coefficients, one curve each for γ = 10, γ = 1 and γ = 0.1]
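Two extreme cases show that the bound is attained. The worked example below assumes the usual convention $\mathcal{F} f(\xi) = \sum_{\sigma \in S_n} f(\sigma) \rho_\xi(\sigma)$, where $\rho_\xi$ is the irreducible representation indexed by $\xi$; the slides do not state the definition, so this convention and the symbol $\rho_\xi$ are assumptions.

```latex
% Case 1: a point mass f = \delta_{\sigma_0}.
% \mathcal{F} f(\xi) = \rho_\xi(\sigma_0) is invertible, hence non-zero for every \xi.
\# \operatorname{supp}(f) = 1,
\qquad
\sum_{\xi \in \mathcal{R}_n} d_\xi^2 = n!
\quad\Longrightarrow\quad
1 \cdot n! = n! .

% Case 2: the uniform distribution f \equiv 1/n!.
% \sum_{\sigma} \rho_\xi(\sigma) = 0 for every non-trivial \xi, so only the
% trivial representation \xi = (n), with d_{(n)} = 1, survives.
\# \operatorname{supp}(f) = n!,
\qquad
\sum_{\xi \in \operatorname{supp}(\mathcal{F} f)} d_\xi^2 = 1
\quad\Longrightarrow\quad
n! \cdot 1 = n! .
```

In both cases one representation is as sparse as possible and the other is completely dense, which matches the direct consequence stated on the slide.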

  12. Outline
      1 Sparsity in the Fourier Representation
      2 Sparse Clustering of Rankings
      3 Numerical Experiments

  13. Clustering of rankings
      Aim
      ◮ Gather distributions on rankings with similar shape
      Objective function
      ◮ Minimize (over all partitions $\mathcal{C}$)
        $\widehat{M}(\mathcal{C}) = \sum_{l=1}^{L} \sum_{1 \le i, j \le N} \| f_i - f_j \|^2 \cdot \mathbb{I}\{ (f_i, f_j) \in \mathcal{C}_l^2 \}$
        $\phantom{\widehat{M}(\mathcal{C})} = \frac{1}{n!} \sum_{\xi \in \mathcal{R}_n} d_\xi \sum_{l=1}^{L} \sum_{1 \le i, j \le N : (f_i, f_j) \in \mathcal{C}_l^2} \| \mathcal{F} f_i(\xi) - \mathcal{F} f_j(\xi) \|^2_{HS(d_\xi)}$
      with $d_\xi \times d_\xi$ the dimension of the matrix indexed by $\xi$
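The first form of the objective, written on probability tables, is easy to evaluate directly. The sketch below is a toy illustration: the four distributions on $S_3$ (six entries each), the two candidate partitions and the function names are all invented for this example.

```python
def sq_dist(f, g):
    """Squared Euclidean distance between two probability tables."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def objective(tables, labels):
    """Sum of squared distances over all ordered pairs (i, j) lying in the same cluster."""
    total = 0.0
    for i, fi in enumerate(tables):
        for j, fj in enumerate(tables):
            if labels[i] == labels[j]:
                total += sq_dist(fi, fj)
    return total


# Four made-up distributions on S_3: two concentrated on the first rankings,
# two concentrated on the last ones.
tables = [
    [0.5, 0.3, 0.1, 0.1, 0.0, 0.0],
    [0.4, 0.4, 0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.1, 0.3, 0.5],
    [0.0, 0.0, 0.1, 0.1, 0.4, 0.4],
]
good = [0, 0, 1, 1]  # similar shapes grouped together
bad = [0, 1, 0, 1]   # dissimilar shapes grouped together

print(objective(tables, good), objective(tables, bad))
```

The partition that groups similar shapes gets the smaller value (about 0.08 against about 2.64 here), which is what minimising the criterion over partitions rewards.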

  14. Managing sparsity
      Aim
      ◮ Gather distributions on rankings with similar shape
      ◮ Use few Fourier coefficients
      New objective function [Witten & Tibshirani, 2010]
      ◮ Minimize (over all partitions $\mathcal{C}$ and all weight vectors $\omega$)
        $\widehat{M}_\omega(\mathcal{C}) = \sum_{\xi \in \mathcal{R}_n} \frac{\omega_\xi d_\xi}{n!} \sum_{l=1}^{L} \sum_{1 \le i, j \le N : (f_i, f_j) \in \mathcal{C}_l^2} \| \mathcal{F} f_i(\xi) - \mathcal{F} f_j(\xi) \|^2_{HS(d_\xi)}$
      with $\omega = (\omega_\xi)_{\xi \in \mathcal{R}_n} \in \mathbb{R}_+^{\# \mathcal{R}_n}$, $\| \omega \|_{l_2}^2 \le 1$ and $\| \omega \|_{l_1} \le \lambda$
      Remark:
      ◮ Fixing $\omega = (1/\sqrt{\# \mathcal{R}_n}, \dots, 1/\sqrt{\# \mathcal{R}_n})$ leads to the initial optimization problem (without $\omega$)

  15. Algorithm
      Initialize $\omega = (1/\sqrt{\# \mathcal{R}_n}, \dots, 1/\sqrt{\# \mathcal{R}_n})$
      Until convergence, iterate steps 1 and 2
      1 Fixing the weight vector $\omega$, minimize $\widehat{M}_\omega(\mathcal{C})$ with respect to the partition $\mathcal{C}$
      2 Fixing the partition $\mathcal{C}$, minimize $\widehat{M}_\omega(\mathcal{C})$ with respect to $\omega$
      Remarks
      ◮ Step 1 is performed by a standard clustering algorithm
      ◮ Step 2 admits a closed form [Witten & Tibshirani, 2010]
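The slide does not reproduce the closed form used in step 2. As a rough sketch, the weight update of Witten & Tibshirani's sparse clustering framework is a soft-thresholding of per-feature scores followed by an l2 normalisation, with the threshold found by bisection so that the l1 constraint holds. In that framework the update is stated for maximising a weighted sum of non-negative per-feature scores; how the per-coefficient quantities of this talk map to such scores is an assumption here, and `scores`, `lam` and the function names are hypothetical.

```python
import math


def soft_threshold(values, delta):
    """Soft-thresholding max(a - delta, 0), applied entrywise to non-negative values."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalise(values):
    """Scale to unit l2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def update_weights(scores, lam, n_iter=60):
    """Maximise sum_k w[k] * scores[k] s.t. ||w||_2 <= 1, ||w||_1 <= lam, w >= 0."""
    if lam < 1.0:
        raise ValueError("lam must be at least 1")
    a = [max(s, 0.0) for s in scores]
    w = l2_normalise(a)
    if sum(w) <= lam:
        return w  # l1 constraint inactive: no thresholding needed
    lo, hi = 0.0, max(a)
    for _ in range(n_iter):  # bisection on the threshold delta
        mid = 0.5 * (lo + hi)
        if sum(l2_normalise(soft_threshold(a, mid))) > lam:
            lo = mid
        else:
            hi = mid
    return l2_normalise(soft_threshold(a, hi))


# One hypothetical score per Fourier coefficient (one per partition xi).
scores = [4.0, 3.0, 0.5, 0.1]
w = update_weights(scores, lam=1.2)
print(w)
```

A small `lam` drives the weights of low-scoring coefficients to exactly zero, which is how the method ends up using few Fourier coefficients. With equal scores and a large enough `lam`, the update returns the uniform vector used for initialisation on the slide.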

  16. Outline
      1 Sparsity in the Fourier Representation
      2 Sparse Clustering of Rankings
      3 Numerical Experiments

  17. Experiments
      Aim
      ◮ Recover clustering information
      ◮ Use few coefficients
      Datasets
      ◮ Mallows (synthetic)
        ⋆ Exponential distribution on rankings
      ◮ Top-k lists (synthetic)
        ⋆ Uniform distribution on rankings
      ◮ E-commerce dataset
        ⋆ List of purchased products (ordered by date)

  18. Mallows($S_7$), γ = 1
      [Figure: two panels over the same ten rankings of $S_7$, titled '"Temporal" representation (3 coefficients selected)' and "Fourier representation (54 coefficients selected)"]
      Remarks:
      ◮ The Fourier representation uses few coefficients (compared to n! = 5,040)
      ◮ The Fourier representation recovers the clustering information
