Clustering Rankings in the Fourier Domain
Stéphan Clémençon and Romaric Gaudel and Jérémie Jakubowicz
LTCI, Telecom Paristech (TSI) UMR Institut Telecom/CNRS No. 5141
ECML PKDD, September 2011
Distributions on rankings
Many applications:
◮ Top-k lists
  ⋆ Ranking of the k most preferred objects
◮ Preference data
  ⋆ Preferences on k (randomly) picked objects
◮ Bucket order
  ⋆ Preferences on groups of objects
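To make the three kinds of data concrete, here is a minimal Python sketch that derives each of them from one full ranking. The function names and the encoding (a ranking is a list of objects, most preferred first) are assumptions made for illustration; in applications these partial data are observed directly rather than derived from a full ranking.

```python
from itertools import combinations

def top_k(ranking, k):
    """Top-k list: the k most preferred objects, best first."""
    return list(ranking[:k])

def pairwise_preferences(ranking, picked):
    """Pairs (a, b), meaning 'a is preferred to b', among the picked objects."""
    position = {obj: r for r, obj in enumerate(ranking)}
    ordered = sorted(picked, key=position.__getitem__)
    return list(combinations(ordered, 2))

def bucket_order(ranking, sizes):
    """Consecutive groups of objects; no order is kept inside a group."""
    buckets, start = [], 0
    for size in sizes:
        buckets.append(set(ranking[start:start + size]))
        start += size
    return buckets

# Hypothetical full ranking of five objects, most preferred first.
ranking = ["b", "d", "a", "c", "e"]
print(top_k(ranking, 2))                               # ['b', 'd']
print(pairwise_preferences(ranking, {"a", "c", "e"}))  # [('a', 'c'), ('a', 'e'), ('c', 'e')]
print(bucket_order(ranking, [2, 3]))                   # two buckets: {b, d} then {a, c, e}
```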
Clustering Rankings in the Fourier Domain ECML PKDD, September 2011 2 / 20
◮ n! (factorial n) coefficients
◮ Few relevant coefficients in practice
◮ Mallows
◮ Plackett-Luce
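The gap between the n! coefficients of a probability table and the handful of parameters of a parametric model shows up in a short Python sketch. It tabulates the Mallows model (here with Kendall's tau distance, one common choice) and the Plackett-Luce model on all rankings by brute-force enumeration. The function names and the encoding (a ranking is a tuple of objects 0..n−1, best first) are assumptions, and enumeration is only feasible for small n.

```python
import math
from itertools import permutations

def kendall_tau(sigma, pi):
    """Number of object pairs that the two rankings order differently."""
    pos_s = {o: r for r, o in enumerate(sigma)}
    pos_p = {o: r for r, o in enumerate(pi)}
    objs = list(sigma)
    return sum(
        (pos_s[a] - pos_s[b]) * (pos_p[a] - pos_p[b]) < 0
        for i, a in enumerate(objs)
        for b in objs[i + 1:]
    )

def mallows(n, theta, center):
    """Mallows model: P(sigma) proportional to exp(-theta * d(sigma, center))."""
    perms = list(permutations(range(n)))
    weights = [math.exp(-theta * kendall_tau(s, center)) for s in perms]
    z = sum(weights)
    return {s: w / z for s, w in zip(perms, weights)}

def plackett_luce(n, w):
    """Plackett-Luce: objects are picked best first, each with probability
    proportional to its weight among the objects not yet picked."""
    probs = {}
    for s in permutations(range(n)):
        p = 1.0
        for i in range(n):
            p *= w[s[i]] / sum(w[o] for o in s[i:])
        probs[s] = p
    return probs

n = 4
p_m = mallows(n, theta=1.0, center=(0, 1, 2, 3))  # 1 parameter (+ center)
p_pl = plackett_luce(n, w=[4.0, 3.0, 2.0, 1.0])   # n weights
print(len(p_m), math.factorial(n))  # 24 24: one table entry per ranking
```

Both models are described by a few numbers, while the tables they produce have n! entries.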
◮ Clustering of distributions on rankings
  ⋆ Gather ranking distributions with similar shapes
◮ Work in the Fourier representation
  ⋆ Sparse representation of a single distribution
  ⋆ Sparse difference between the representations of two distributions
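As a concrete picture of the Fourier representation, here is a minimal Python sketch of the Fourier transform on S_3, the smallest non-trivial case. It assumes the common convention f̂(ρ) = Σ_σ f(σ) ρ(σ) over the irreducible representations ρ, builds the three irreducible representations of S_3 by hand (trivial, sign, and the 2-dimensional standard one obtained by restricting permutation matrices to the vectors summing to zero), and checks the Plancherel identity. The distribution f is made up, and this hand-built approach does not scale to general n.

```python
import math
from itertools import permutations
import numpy as np

n = 3
perms = list(permutations(range(n)))  # sigma as a tuple: sigma[i] = sigma(i)

def perm_matrix(s):
    """Permutation matrix with P e_i = e_{s(i)}."""
    P = np.zeros((n, n))
    for i, si in enumerate(s):
        P[si, i] = 1.0
    return P

def sign(s):
    """+1 for even permutations, -1 for odd ones (parity of inversions)."""
    inversions = sum(s[i] > s[j] for i in range(n) for j in range(i + 1, n))
    return -1.0 if inversions % 2 else 1.0

# Orthonormal basis of the sum-zero subspace: eigenvectors of the centering
# matrix for eigenvalue 1 (eigh sorts eigenvalues, so the eigenvalue 0 of the
# all-ones direction comes first). Restricting permutation matrices to this
# subspace gives the 2-dimensional irreducible "standard" representation.
_, vecs = np.linalg.eigh(np.eye(n) - np.ones((n, n)) / n)
Q = vecs[:, 1:]

irreps = {
    "trivial": lambda s: np.ones((1, 1)),
    "sign": lambda s: np.array([[sign(s)]]),
    "standard": lambda s: Q.T @ perm_matrix(s) @ Q,
}

def fourier(f):
    """Fourier coefficients f_hat(rho) = sum over sigma of f(sigma) rho(sigma)."""
    return {name: sum(f[s] * rho(s) for s in perms) for name, rho in irreps.items()}

# A hypothetical distribution on S_3.
f = dict(zip(perms, [0.4, 0.2, 0.15, 0.1, 0.1, 0.05]))
f_hat = fourier(f)

print(sum(c.size for c in f_hat.values()))  # 6 coefficients = 3!
# Plancherel: sum_sigma f(sigma)^2 = (1/n!) sum_rho d_rho ||f_hat(rho)||_HS^2
lhs = sum(v ** 2 for v in f.values())
rhs = sum(c.shape[0] * np.sum(c ** 2) for c in f_hat.values()) / math.factorial(n)
print(np.isclose(lhs, rhs))  # True
```

The coefficient count 1 + 1 + 2² = 6 = 3! matches the n! coefficients of the probability table; the point of the Fourier view is that only a few of them may matter.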
[Figure: two plots indexed by the 120 rankings of 5 objects, with the rankings [ 3 2 4 1 5 ], [ 3 5 4 2 1 ] and [ 1 2 4 5 3 ] labeled; vertical ranges 0.000 to 0.012 (first plot) and −0.06 to 0.06 (second plot)]
◮ Only a few relevant parameters are needed when using the Fourier representation
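◮ Uncertainty principle: $\#\mathrm{supp}(f) \cdot \sum_{\xi:\, \hat{f}(\xi) \neq 0} d_\xi^{2} \geq n!$
◮ Both representations cannot be sparse simultaneously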
[Figure: distortion (0.2 to 1) versus number of used coefficients (up to 120), for γ = 10, γ = 1 and γ = 0.1]
◮ Gather distributions on rankings with similar shape
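◮ Minimize (on all partitions C = {C_1, ..., C_L}) a within-cluster criterion: for each cluster C_l, l = 1, ..., L, it adds up the distances between the Fourier representations of the distributions in C_l, measured with the Hilbert-Schmidt norms $\|\cdot\|_{HS(d_\xi)}$ of the Fourier coefficients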
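A minimal Python sketch of such a criterion, assuming it is a K-means-type sum of squared distances between all pairs of distributions within each cluster, computed on vectors of Fourier coefficients (the flattened encoding of the coefficients is hypothetical). It also checks the standard identity that turns the pairwise form into the usual K-means form, which is one reason a standard clustering algorithm can handle this kind of criterion.

```python
import numpy as np

def within_pairwise(X, labels):
    """Sum over clusters of squared distances between all pairs of members."""
    total = 0.0
    for l in np.unique(labels):
        Xl = X[labels == l]
        diffs = Xl[:, None, :] - Xl[None, :, :]
        total += 0.5 * np.sum(diffs ** 2)  # each unordered pair is counted twice
    return total

def within_centered(X, labels):
    """Same quantity in K-means form: |C_l| times squared deviations from the mean."""
    total = 0.0
    for l in np.unique(labels):
        Xl = X[labels == l]
        total += len(Xl) * np.sum((Xl - Xl.mean(axis=0)) ** 2)
    return total

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))  # hypothetical: 12 distributions, 6 coefficients each
labels = rng.integers(0, 3, size=12)
print(np.isclose(within_pairwise(X, labels), within_centered(X, labels)))  # True
```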
◮ Gather distributions on rankings with similar shape
◮ Use few Fourier coefficients
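◮ Minimize (on all partitions C, and all weight vectors ω) the same criterion, with the contribution of each Fourier coefficient ξ, measured in $\|\cdot\|_{HS(d_\xi)}$, weighted by $\omega_\xi$, subject to $\|\omega\|_{\ell_2} \leq 1$ and $\|\omega\|_{\ell_1} \leq \lambda$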
◮ Fixing $\omega = (1/\sqrt{\#R_n}, \ldots, 1/\sqrt{\#R_n})$ leads back to the initial optimization problem
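The remark about uniform weights can be checked in two lines, assuming (hypothetically) that the weighted criterion has the form $\Phi(C, \omega) = \sum_{\xi \in R_n} \omega_\xi D_\xi(C)$, where $D_\xi(C)$ is the contribution of coefficient ξ to the unweighted criterion $D(C) = \sum_{\xi \in R_n} D_\xi(C)$:

```latex
\omega_\xi = \frac{1}{\sqrt{\#R_n}} \;\; (\xi \in R_n)
\quad\Longrightarrow\quad
\|\omega\|_{\ell_2} = \sqrt{\#R_n \cdot \tfrac{1}{\#R_n}} = 1,
\qquad
\|\omega\|_{\ell_1} = \sqrt{\#R_n},
\\[4pt]
\Phi(C, \omega) = \frac{1}{\sqrt{\#R_n}} \sum_{\xi \in R_n} D_\xi(C) = \frac{D(C)}{\sqrt{\#R_n}} .
```

Multiplying by a positive constant does not change the minimizing partitions, and this ω satisfies the constraints as soon as $\lambda \geq \sqrt{\#R_n}$.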
◮ Step 1 is performed by a standard clustering algorithm
◮ Step 2 admits a closed-form solution
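One closed form consistent with the ℓ2 and ℓ1 constraints on ω (and with the uniform-weight remark) is the weight update of Witten and Tibshirani's sparse K-means. The Python sketch below implements that update as an assumption, not as the authors' exact Step 2. Given a score a_ξ per coefficient (in sparse K-means, the between-cluster dispersion of coefficient ξ, which the weights maximize), the maximizer of ωᵀa under ‖ω‖_2 ≤ 1, ‖ω‖_1 ≤ λ and ω ≥ 0 (the sign constraint is also assumed) is a normalized soft-thresholding of a, with the threshold found by bisection. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(a, delta):
    """S(a, delta) = sign(a) * max(|a| - delta, 0), elementwise."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def update_weights(a, lam, n_iter=100):
    """argmax of w @ a subject to ||w||_2 <= 1, ||w||_1 <= lam and w >= 0.

    Assumes lam >= 1 and at least one positive score in a.
    """
    a_pos = np.maximum(a, 0.0)
    w = a_pos / np.linalg.norm(a_pos)
    if np.sum(w) <= lam:
        return w  # threshold 0 already satisfies the l1 constraint
    # ||S||_1 / ||S||_2 does not increase with the threshold: bisect on it.
    lo, hi = 0.0, np.max(a_pos)
    for _ in range(n_iter):
        delta = 0.5 * (lo + hi)
        s = soft_threshold(a_pos, delta)
        if np.sum(s) / np.linalg.norm(s) > lam:
            lo = delta
        else:
            hi = delta
    s = soft_threshold(a_pos, hi)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        raise ValueError("lam is too small for the tied largest scores")
    return s / norm

a = np.array([5.0, 3.0, 1.0, 0.5, -2.0])  # hypothetical per-coefficient scores
w = update_weights(a, lam=1.5)
print(np.round(w, 3), np.sum(w))  # sparse weights, l1 norm close to 1.5
```

In an alternating scheme of this kind, Step 1 would run K-means on the coefficients rescaled by √ω_ξ, and Step 2 would recompute the scores a from the new partition before updating ω again.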
◮ Recover the clustering information
◮ Use few coefficients
◮ Mallows (synthetic)
  ⋆ Exponential distribution on rankings
◮ Top-k lists (synthetic)
  ⋆ Uniform distribution on rankings
◮ E-commerce dataset
  ⋆ List of purchased products (ordered by date)
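Synthetic Mallows data can be generated without enumerating all n! rankings. A minimal Python sketch, assuming the Kendall's-tau Mallows model centered at the ranking (0, 1, ..., n−1) with dispersion φ = e^{−θ} in [0, 1], uses the repeated insertion model: objects are inserted one at a time, and inserting object i at position j creates exactly i − j discordant pairs, so P(σ) is proportional to φ^{d(σ, id)}. The function name and parameters are hypothetical; to center on another ranking σ0, relabel each sampled object k as σ0[k].

```python
import random

def sample_mallows(n, phi, rng=random):
    """Draw one ranking of objects 0..n-1 (best first) from a Mallows model
    centered at (0, 1, ..., n-1): P(sigma) proportional to phi ** d(sigma, id)."""
    ranking = []
    for i in range(n):
        # Position j puts object i ahead of the i - j smaller objects already
        # placed after it, i.e. it adds i - j discordant pairs.
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, i)
    return ranking

rng = random.Random(0)
print(sample_mallows(7, phi=0.5, rng=rng))  # a ranking of 0..6, typically close to the center
```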
[Figure: two plots over the same ten rankings of 7 objects (e.g. [ 6 5 1 7 4 2 3 ], [ 4 7 3 6 5 1 2 ], [ 1 5 6 7 4 2 3 ]), listed in a different order in each plot; vertical ranges 0.00 to 0.20 and 0.02 to 0.14]
◮ The Fourier representation recovers the clustering information
◮ The Fourier representation uses few coefficients (compared to n! = 5,040)
[Figure: one plot indexed by partial rankings of 8 objects, each shown by its first four positions (e.g. 2 < 1 < 3 < 6 < ..., 1 < 3 < 2 < 4 < ..., 6 < 8 < 7 < 3 < ..., 1 < 2 < 3 < 5 < ...); vertical range 0.000 to 0.030]
◮ The Fourier representation recovers the clustering information
◮ The “temporal” representation is useless (the example distributions have disjoint supports)
◮ The Fourier representation uses few coefficients
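Why disjoint supports make the “temporal” view uninformative can be checked on a small example. The Python sketch below, under assumed encodings (objects 0..n−1, a ranking as a tuple best first, a top-k list as a tuple prefix), builds the uniform distribution on the rankings that start with a given top-k list. Distributions built from different lists have disjoint supports, so all their ℓ2 distances equal sqrt(2/(n−k)!). The matrix of first-order marginals, which (with this convention) is the Fourier coefficient at the permutation representation, i.e. the trivial plus the (n−1, 1) irreducible component, still separates lists that share objects from lists that do not. All function names are hypothetical.

```python
import math
from itertools import permutations
import numpy as np

def uniform_on_top_k(prefix, n):
    """Uniform distribution on the rankings of n objects that start with `prefix`."""
    rest = [o for o in range(n) if o not in prefix]
    tails = list(permutations(rest))
    return {tuple(prefix) + t: 1.0 / len(tails) for t in tails}

def temporal_distance(p, q):
    """l2 distance between the two probability tables on S_n."""
    support = set(p) | set(q)
    return math.sqrt(sum((p.get(s, 0.0) - q.get(s, 0.0)) ** 2 for s in support))

def marginals(p, n):
    """M[i, r] = P(object i is at rank r): the first-order marginals."""
    M = np.zeros((n, n))
    for s, prob in p.items():
        for r, obj in enumerate(s):
            M[obj, r] += prob
    return M

n = 5
A = uniform_on_top_k((0, 1), n)
B = uniform_on_top_k((0, 2), n)  # shares its top object with A
C = uniform_on_top_k((3, 4), n)  # shares no top object with A
print(temporal_distance(A, B), temporal_distance(A, C))  # equal: sqrt(2 / 3!)
print(np.linalg.norm(marginals(A, n) - marginals(B, n)),
      np.linalg.norm(marginals(A, n) - marginals(C, n)))  # about 1.633 < 2.309
```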
◮ Four groups of users
◮ The clustering focuses on a few coefficients
◮ Based on the Fourier representation
◮ Based on a sparse clustering criterion
◮ Better understanding of the class of distributions with a sparse Fourier representation